Homework 7

School

University of St Thomas

Course

745

Subject

Computer Science

Date

Jan 9, 2024

Uploaded by trevorfostermn

Homework 7
Trevor Foster

Section 1: Elasticsearch

1. In three sentences or less, explain how a content search task differs from analytical queries executed on a columnar data store.

A content search task typically looks for specific values or patterns across the entire content of a dataset, often using text-based queries, while analytical queries on a columnar data store aggregate, summarize, or analyze specific columns of structured data. Content search is geared toward discovering relevant information, whereas analytical queries extract patterns and trends from organized columns, with performance tuned for analytics and reporting. The former is usually associated with unstructured or semi-structured data, the latter with structured, columnar databases designed for analytical workloads.

2. Name three useful features that Lucene offers for processing text.

   1. Full-text search
   2. Scoring and ranking
   3. Tokenization and analyzers

3. What is the top-level Apache search server that is considered an alternative to Elasticsearch?

Apache Solr. Like Elasticsearch, Apache Solr is an open-source search platform built on Apache Lucene.

4. Explain the intuition behind both the term frequency and the inverse document frequency calculations in a TF-IDF similarity score.

TF measures how frequently a term occurs in a document. The higher the frequency, the more likely the term is important within that specific document. IDF measures the rarity of a term across the entire collection of documents. The rarer the term, the higher its IDF value, so a match on a rare term says more about a document than a match on a term that appears almost everywhere.

5. In simple terms, explain what this Lucene query was written to find: "information retrieval"~12

The query "information retrieval"~12 is a proximity search. In simple terms, it looks for documents where the terms "information" and "retrieval" appear within 12 words of each other. The "~12" specifies the maximum allowed distance between the two terms. This type of query is useful when you want to find documents where the specified terms are close to each other in the text, indicating a more specific relationship or context between them.

6. Describe Beats' role in the Elastic Stack. How does this differ from Logstash?

Beats are specialized, lightweight shippers for specific data types, while Logstash is a more versatile and powerful data processing engine designed for complex ETL tasks. The choice between them depends on the requirements of the data pipeline and on how much processing the data needs before it is indexed into Elasticsearch.

7. Name and describe two different data producers available in Beats.

   1. Filebeat: designed to ship log files, allowing you to collect, parse, and forward log data from different sources.
   2. Metricbeat: designed to collect and ship metric data from various sources, providing insight into the performance and health of systems and services.

8. If you wanted to exclude documents in Elasticsearch from being ranked in a query, would you use the query context or the filter context?

Use the filter context within a query. The filter context is designed for conditions that decide whether a document is included or excluded but do not affect its score.

Section 2: Search queries

Explore the Shakespeare dataset using three new queries of your choosing. Make sure each query returns some results. For each, 1) explain the query in human terms (what are the conditions that affect scoring and filtering, what is the query searching for overall) and 2) provide a screenshot of the query and result.

{
  "query": {
    "match": { "speaker": "Romeo" }
  }
}

Query 1 explanation
Search for all lines spoken by a specific character, such as "Romeo." The match clause returns documents whose "speaker" field matches "Romeo."

{
  "query": {
    "match": { "text_entry": "love" }
  }
}

Query 2 explanation
Search for lines that contain a specific word, in this case "love." The match clause returns documents whose "text_entry" field contains the specified word.

{
  "query": {
    "bool": {
      "must": { "match": { "play_name": "Hamlet" } },
      "filter": { "term": { "act": 1 } }
    }
  }
}

Query 3 explanation
Filter lines based on a specific play and act. This is a bool query with a must clause for the play, which runs in query context and contributes to the score, and a filter clause for the act, which only includes or excludes documents. Together they ensure that only lines from the specified play and act are retrieved.

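The TF and IDF intuition from question 4 can be made concrete with a small calculation. The following is a minimal sketch, assuming a made-up three-document corpus, whitespace tokenization, and the textbook formulation tf * log(N / df). The function names are hypothetical, and this is not Lucene's actual scoring code (recent Lucene and Elasticsearch versions default to BM25, which refines the same idea).

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
DOCS = {
    "d1": "to be or not to be",
    "d2": "to love and to be loved",
    "d3": "the quality of mercy is not strained",
}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def term_frequency(term, tokens):
    """Share of the document's tokens that equal `term`."""
    return Counter(tokens)[term] / len(tokens)


def inverse_document_frequency(term, corpus):
    """log(N / df): grows as fewer documents contain `term`."""
    df = sum(1 for tokens in corpus.values() if term in tokens)
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, doc_id, corpus):
    """Textbook TF-IDF weight of `term` in one document."""
    return term_frequency(term, corpus[doc_id]) * inverse_document_frequency(term, corpus)


CORPUS = {doc_id: tokenize(text) for doc_id, text in DOCS.items()}

if __name__ == "__main__":
    for term in ("to", "mercy"):
        for doc_id in CORPUS:
            print(term, doc_id, round(tf_idf(term, doc_id, CORPUS), 3))
```

In this toy corpus "to" appears in two of the three documents and "mercy" in only one, so "mercy" receives the larger IDF even though "to" has the higher term frequency in d1.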
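The split between query context and filter context described in question 8 and used in Query 3 can also be sketched in code. This is an illustrative sketch only: `build_bool_query` is a hypothetical helper, not part of any Elasticsearch client, and it only assembles the request body as a Python dict. Sending the body to a cluster is left out.

```python
import json


def build_bool_query(scored, filtered):
    """Hypothetical helper that assembles a bool query body.

    Clauses in `scored` go under "must" (query context, they contribute
    to _score). Clauses in `filtered` go under "filter" (filter context,
    they only include or exclude documents).
    """
    return {
        "query": {
            "bool": {
                "must": list(scored),
                "filter": list(filtered),
            }
        }
    }


# Same conditions as Query 3: score on the play name, filter on the act.
body = build_bool_query(
    scored=[{"match": {"play_name": "Hamlet"}}],
    filtered=[{"term": {"act": 1}}],
)

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

Moving a clause from `scored` to `filtered` keeps the same set of matching documents for that condition but removes its influence on ranking, which is the behavior question 8 asks about.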