Homework 7

School

University of St Thomas

Course

745

Subject

Computer Science

Date

Jan 9, 2024

Type

pdf

Pages

3

Uploaded by trevorfostermn

Trevor Foster

Section 1: Elasticsearch

1. In three sentences or less, explain how a content search task differs from analytical queries executed on a columnar data store.

A content search task typically involves searching for specific values or patterns across the entire content of a dataset, often using text-based queries, while analytical queries executed on a columnar data store focus on aggregating, summarizing, or analyzing specific columns of structured data for insights. Content search is geared towards discovering relevant information, whereas analytical queries aim to extract meaningful patterns and trends from organized data columns, optimizing performance for analytics and reporting tasks. The former is often associated with unstructured or semi-structured data, while the latter is common in structured, columnar databases designed for analytical workloads.

2. Name three useful features that Lucene offers for processing text.

   1. Full-text search
   2. Scoring and ranking
   3. Tokenization and analyzers

3. What is the top-level Apache search server that is considered an alternative to Elasticsearch?

Apache Solr. Like Elasticsearch, Apache Solr is an open-source search platform built on Apache Lucene.

4. Explain the intuition behind both the term frequency and the inverse document frequency calculations in a TFIDF similarity score.

TF measures how frequently a term occurs in a document. The higher the frequency, the more likely the term is important within that specific document. IDF measures the rarity of a term across the entire collection of documents. The rarer the term, the higher its IDF value, indicating higher importance.

5. In simple terms explain what this Lucene query was written to find: "information retrieval"~12

The query "information retrieval"~12 is a proximity search. In simple terms, it looks for documents where the terms "information" and "retrieval" appear within 12 words of each other. The "~12" specifies the maximum allowed distance between the two terms. This type of query is useful for finding documents where the specified terms are close to each other in the text, indicating a more specific relationship or context between them.

6. Describe Beats' role in the Elastic Stack. How does this differ from Logstash?

Beats are specialized, lightweight shippers for specific data types, while Logstash is a more versatile and powerful data processing engine designed for complex ETL tasks. The choice between them depends on the specific requirements of the data pipeline and on how much processing the data needs before it is indexed into Elasticsearch.

7. Name and describe two different data producers available in Beats.

   1. Filebeat: designed to ship log files, allowing you to collect, parse, and forward log data from different sources.
   2. Metricbeat: designed for collecting and shipping metric data from various sources, providing insights into the performance and health of systems and services.

8. If you wanted to exclude documents in Elasticsearch from being ranked in a query, would you use the query context or the filter context?

Use the filter context. Filter clauses decide whether a document is included or excluded but do not contribute to its score.

Section 2: Search queries

Explore the Shakespeare dataset using three new queries of your choosing. Make sure each query returns some results. For each, 1) Explain the query in human terms (what are the conditions that affect scoring and filtering, what is the query searching for overall) and 2) provide a screenshot of the query and result.

    {
      "query": {
        "match": {
          "speaker": "Romeo"
        }
      }
    }

Query 1 explanation

Search for all lines spoken by a specific character, such as "Romeo." The query matches documents whose "speaker" field matches "Romeo."

    {
      "query": {
        "match": {
          "text_entry": "love"
        }
      }
    }

Query 2 explanation

Search for lines that contain a specific word, for instance, lines containing the word "love." The query uses a match clause to find documents where the "text_entry" field contains the specified word.

    {
      "query": {
        "bool": {
          "must": {
            "match": {
              "play_name": "Hamlet"
            }
          },
          "filter": {
            "term": {
              "act": 1
            }
          }
        }
      }
    }

Query 3 explanation

Filter lines based on a specific play and act. This is a bool query with a must clause for the play, which is scored, and a filter clause for the act, which only includes or excludes documents. Together they ensure that only lines from the specified play and act are retrieved.