Bailey Davidson DAT 260 Module Three Assignment

School

Southern New Hampshire University

Course

260

Subject

Electrical Engineering

Date

Feb 20, 2024

DAT 260 Module Three Assignment
Big Data Analysis Tools

I. Tool Comparison Table

Tool: Hive
Strengths:
- Supports SQL; its query language is based on SQL.
- Allows users to define tasks using Java or Python.
Weaknesses:
- Only stores structured data.
- High latency.
Best Used: Hive is a good fit when large volumes of data need to be analyzed easily with SQL-style queries. I think this will fit well in medicine (Baturina, 2019).

Tool: Spark
Strengths:
- Uses in-memory caching to run fast analytic queries.
- Supports a variety of languages for developers (Amazon, 2024).
Weaknesses:
- Does not have its own storage system.
- Not well suited to multiple users.
Best Used: Spark is among the fastest of these tools and supports several developer languages, so it may be best for software engineers.

Tool: Flink
Strengths:
- Can perform on a large scale.
- Real-time processing.
- “Supports a wide range of connectors to third-party systems” (Taylor, 2024).
Weaknesses:
- Limited long-term storage.
- Reduced data type storage.
Best Used: Because of its real-time processing and large-scale abilities, Flink may be best for financial businesses (Baturina, 2019).

Tool: Pentaho
Strengths:
- Encourages big data to be analyzed at the source.
- Offers tools for visualization and reporting.
Weaknesses:
- Slower than other tools.
- Although it is open source, its community support is weaker than that of other tools.
Best Used: Pentaho is best for those who want a single tool from beginning to end: people with small projects who need to mine, analyze, and then present data. It is a good fit for small businesses or for contractors who help businesses.
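To make the "supports SQL" strength listed for Hive more concrete, here is a minimal sketch of the kind of SQL aggregate query that an SQL-based tool is designed to run. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for a SQL engine, not Hive itself, and the `admissions` table, its columns, and the sample values are hypothetical.

```python
import sqlite3

# Hypothetical sample data, used only for illustration.
ROWS = [
    ("cardiology", 12),
    ("cardiology", 8),
    ("oncology", 5),
    ("oncology", 10),
    ("pediatrics", 7),
]


def admissions_by_department(rows):
    """Return total admissions per department using a SQL GROUP BY query."""
    # sqlite3 is only a stand-in for a SQL engine here; it is not Hive.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE admissions (department TEXT, patients INTEGER)")
        conn.executemany("INSERT INTO admissions VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT department, SUM(patients) AS total "
            "FROM admissions GROUP BY department ORDER BY department"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for department, total in admissions_by_department(ROWS):
        print(department, total)
```

The point of the sketch is that the analysis is expressed as a declarative query (group the rows, then sum each group) rather than as hand-written loops, which is why an SQL-based language can be approachable for analysts who already know SQL.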

II. Reflection

I am not sure what field I would like to start my career in. However, I do have experience in medical administration, and if I were to continue working in that field, I believe the best fit would be either Flink or Hive. I chose Flink because it can connect to many third-party systems and offers real-time processing; in the medical field, many different categories of information are stored in several different places. Hive would also be a strong choice because it can both store and analyze large amounts of data. It would therefore be very useful to be able to access and analyze that information directly from each source. On the other hand, if I were working for a company that kept nearly all of its information in one source and also needed to use it for reporting, then Pentaho would be the way to go.

If I were to continue in medical administration, I would probably use Hive or Flink to analyze “personalized treatment, patient admissions prediction, and practice management and optimization” (Baturina, 2019). Using the data to find trends in these areas would be very beneficial for any provider in the medical field. It would give patients better care by showing what works best and which areas need improving.

Resources

An introduction to big data. Opensource.com. (2024). https://opensource.com/resources/big-data

Baturina, O. (2019, May 2). 40 Stats and Real-Life Examples of How Companies Use Big Data. ScienceSoft. https://www.scnsoft.com/blog/big-data-use-cases-stats-and-examples#real-life-examples

Taylor, D. (2024, January 18). Top 15 Big Data Tools and Software (Open Source) 2024. Guru99. https://www.guru99.com/big-data-tools.html

What is Apache Spark? Amazon. (2024). https://aws.amazon.com/what-is/apache-spark/