Assignment 9
Exercise 1
A) What is the Kappa architecture and how does it differ from the lambda architecture?
The Kappa architecture is a data processing architecture built around stream processing alone. It can be seen as a simplification
of the lambda architecture: where lambda maintains two parallel paths - a batch layer over the full historical dataset and a
speed (streaming) layer for recent data - whose results must be merged and whose logic must be kept in sync, Kappa keeps a
single stream-processing path fed from an immutable log such as Kafka. Historical or "batch" results are produced by replaying
that log through the same streaming code, which removes the duplicated logic and reduces the overall complexity of the system.
Kappa is a good fit for use cases that require rapid processing of continuously arriving data and the ability to respond quickly
to changing data, without operating a separate batch pipeline.
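As an illustration of this single-path idea, here is a minimal kafka-python sketch (the same library used in Exercise 2) of Kappa-style reprocessing: instead of running a separate batch job, the immutable log is replayed from the earliest offset through the same processing function used for live data. The topic name 'events' and the process() handler are assumptions made for the example.

from kafka import KafkaConsumer
import json

def process(event):
    # single processing path shared by live and replayed data (placeholder logic)
    print(event)

# To "re-run the batch job" in a Kappa design, replay the log from the beginning
replay_consumer = KafkaConsumer(
    'events',                          # hypothetical topic holding the full history
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',      # start from the oldest retained record
    consumer_timeout_ms=5000           # stop iterating once the log is exhausted
)

for msg in replay_consumer:
    process(json.loads(msg.value.decode('utf-8')))

replay_consumer.close()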
B) What are the advantages and drawbacks of pure streaming versus micro-batch real-time processing systems?
Pure streaming systems process each record the moment it arrives, so they offer the lowest latency and immediate responses,
which matters for time-sensitive applications such as fraud detection or stock trading. The drawbacks are higher per-record
processing overhead and a more demanding infrastructure, since state management, ordering, and fault tolerance must be handled
at the level of individual events.
Micro-batch real-time processing systems instead group incoming records into small batches and process each batch as a unit.
This amortizes scheduling overhead, allows more efficient use of computing resources, and simplifies fault tolerance, because
recovery can restart from the last completed batch or checkpoint without losing significant amounts of data. The cost is added
latency: results are delayed by at least the batch interval, so responses are near real-time rather than truly immediate.
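To make the trade-off concrete, here is a small library-free Python sketch (the event source and handler are made up for illustration): the pure-streaming path handles each event immediately, while the micro-batch path buffers events for a short interval and handles them together.

import time

def handle(event):
    print('processed', event)          # placeholder per-event logic

# Pure streaming: every event is handled the moment it arrives
# (lowest latency, but per-record overhead on every event).
def pure_streaming(source):
    for event in source:
        handle(event)

# Micro-batching: events are buffered for a short interval and handled together
# (overhead is amortized, but results lag by up to the batch interval).
def micro_batch(source, interval_sec=2):
    batch, deadline = [], time.time() + interval_sec
    for event in source:
        batch.append(event)
        if time.time() >= deadline:
            for e in batch:
                handle(e)
            batch, deadline = [], time.time() + interval_sec
    for e in batch:                    # flush whatever is left when the source ends
        handle(e)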
C) In a few sentences, describe the data processing pipeline in Storm.
A Storm application is organized as a topology, a directed graph of spouts and bolts through which the data stream flows. Spouts
ingest data from external sources such as message queues or APIs and emit it as streams of tuples; bolts then filter, transform,
and aggregate those tuples, and the final bolts typically write results to a database or data warehouse for further analysis or
use in business applications. Storm runs the topology on a distributed cluster, with each node processing a portion of the data
stream in real time, and its reliability and scalability make it a common choice for big data processing in a variety of
industries, including finance, healthcare, and retail.
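As a purely conceptual sketch (plain Python, not the actual Storm API, which is Java-based), the spout-to-bolt flow looks roughly like this; all names are illustrative only.

# "Spout": emits raw tuples from a source
def sentence_spout():
    for sentence in ["storm processes streams", "spouts feed bolts"]:
        yield sentence

# "Bolt": transforms each tuple (here, splits sentences into words)
def split_bolt(sentences):
    for sentence in sentences:
        for word in sentence.split():
            yield word

# "Bolt": aggregates state (here, counts words)
def count_bolt(words):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

# "Topology": the DAG spout -> split bolt -> count bolt
print(count_bolt(split_bolt(sentence_spout())))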
D) How does Spark Streaming shift the Spark batch processing approach to work on real-time data streams?
Spark Streaming extends the Spark batch engine to real-time data by discretizing the incoming stream into small micro-batches
(DStreams, i.e. sequences of RDDs) at a fixed interval, typically on the order of seconds. Each micro-batch is then processed by
the same Spark engine and largely the same APIs used for batch jobs, so existing batch code and libraries can be reused on live
data with near-real-time latency. This lets applications ingest, process, and analyze data continuously from sources such as
social media feeds, IoT devices, and sensors, and act on the results far sooner than a periodic batch job would allow.
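A minimal PySpark sketch of this micro-batch model, using the classic DStream API and assuming a local Spark installation with a text source on localhost:9999 (for example started with nc -lk 9999); the details are illustrative rather than part of the assignment.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "StreamingWordCount")
ssc = StreamingContext(sc, 5)                      # 5-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)    # ingest the raw text stream
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))   # per-batch word counts
counts.pprint()                                    # print each micro-batch result

ssc.start()
ssc.awaitTermination()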
Exercise 2
Creating Kafka topics and listing the topics:
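The terminal commands and output for this step are not reproduced here; as a sketch, the same topic creation and listing can be done from Python with kafka-python's admin client, assuming a broker running on localhost:9092.

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers='localhost:9092')

# create the 'sample' topic used in Parts A and B (single partition, no replication)
admin.create_topics([NewTopic(name='sample', num_partitions=1, replication_factor=1)])

# list all topics on the broker to confirm 'sample' exists
print(admin.list_topics())

admin.close()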
Part A
Program to produce messages to the Kafka topic ‘sample’
put.py
from kafka import KafkaProducer
import json

# create Kafka producer object
my_kafka_producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

# create list of dictionaries, each containing a key-value pair
my_key_value_pairs = [
    {'key': 'MYID', 'value': 'A20512469'},
    {'key': 'MYNAME', 'value': 'ARJUN MOHAN KUMAR'},
    {'key': 'MYEYECOLOR', 'value': 'black'}
]

# encode the keys and values in each dictionary as bytes
for row in my_key_value_pairs:
    row['key'] = bytes(row['key'], encoding='utf-8')
    row['value'] = bytes(json.dumps(row['value']), encoding='utf-8')

# send the key-value pairs to the Kafka topic
for row in my_key_value_pairs:
    my_kafka_producer.send('sample', key=row['key'], value=row['value'])

# close the Kafka producer connection
my_kafka_producer.close()
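One note on the producer: send() is asynchronous in kafka-python, so messages are batched in the background; close() waits for outstanding sends to complete, and an explicit my_kafka_producer.flush() before closing is a common way to make that delivery step visible in the code.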
Part B
Program to consume messages from the Kafka topic ‘sample’
get.py
# Import the necessary libraries
from kafka import KafkaConsumer
import json

# Create a Kafka consumer instance and connect it to the broker
my_consumer = KafkaConsumer('sample', bootstrap_servers=['localhost:9092'])

# Loop through the messages received from the Kafka topic
for msg in my_consumer:
    # Decode the message key and value and store them in variables
    msg_key = msg.key.decode('utf-8')
    msg_value = json.loads(msg.value.decode('utf-8'))
    # Print the key-value pair
    print('Key={}, Value={}'.format(msg_key, msg_value))

# Close the Kafka consumer connection
my_consumer.close()
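Note that iterating over a KafkaConsumer blocks waiting for new messages, so the close() call above is only reached once the loop is interrupted (for example with Ctrl-C); passing consumer_timeout_ms when constructing the consumer is one way to make the loop end automatically after the topic goes quiet.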
Output: