Assignment 9
Exercise 1
A) What is the Kappa architecture and how does it differ from the lambda architecture?
The Kappa architecture is a data processing architecture built around stream processing alone. It can be seen as a simplification
of the lambda architecture: where lambda maintains two parallel paths - a batch layer over the full historical dataset and a
speed (streaming) layer for recent data - whose results must be merged and whose logic must be kept in sync, Kappa keeps a
single stream-processing path fed from an immutable log such as Kafka. Historical or "batch" results are produced by replaying
that log through the same streaming code, which removes the duplicated logic and reduces the overall complexity of the system.
Kappa is a good fit for use cases that require rapid processing of continuously arriving data and the ability to respond quickly
to changing data, without operating a separate batch pipeline.
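As an illustration of this single-path idea, here is a minimal kafka-python sketch (the same library used in Exercise 2) of Kappa-style reprocessing: instead of running a separate batch job, the immutable log is replayed from the earliest offset through the same processing function used for live data. The topic name 'events' and the process() handler are assumptions made for the example.

from kafka import KafkaConsumer
import json

def process(event):
    # single processing path shared by live and replayed data (placeholder logic)
    print(event)

# To "re-run the batch job" in a Kappa design, replay the log from the beginning
replay_consumer = KafkaConsumer(
    'events',                          # hypothetical topic holding the full history
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',      # start from the oldest retained record
    consumer_timeout_ms=5000           # stop iterating once the log is exhausted
)

for msg in replay_consumer:
    process(json.loads(msg.value.decode('utf-8')))

replay_consumer.close()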
B) What are the advantages and drawbacks of pure streaming versus micro-batch real-time processing systems?
Pure streaming systems process each record the moment it arrives, so they offer the lowest latency and immediate responses,
which matters for time-sensitive applications such as fraud detection or stock trading. The drawbacks are higher per-record
processing overhead and a more demanding infrastructure, since state management, ordering, and fault tolerance must be handled
at the level of individual events.
Micro-batch real-time processing systems instead group incoming records into small batches and process each batch as a unit.
This amortizes scheduling overhead, allows more efficient use of computing resources, and simplifies fault tolerance, because
recovery can restart from the last completed batch or checkpoint without losing significant amounts of data. The cost is added
latency: results are delayed by at least the batch interval, so responses are near real-time rather than truly immediate.
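To make the trade-off concrete, here is a small library-free Python sketch (the event source and handler are made up for illustration): the pure-streaming path handles each event immediately, while the micro-batch path buffers events for a short interval and handles them together.

import time

def handle(event):
    print('processed', event)          # placeholder per-event logic

# Pure streaming: every event is handled the moment it arrives
# (lowest latency, but per-record overhead on every event).
def pure_streaming(source):
    for event in source:
        handle(event)

# Micro-batching: events are buffered for a short interval and handled together
# (overhead is amortized, but results lag by up to the batch interval).
def micro_batch(source, interval_sec=2):
    batch, deadline = [], time.time() + interval_sec
    for event in source:
        batch.append(event)
        if time.time() >= deadline:
            for e in batch:
                handle(e)
            batch, deadline = [], time.time() + interval_sec
    for e in batch:                    # flush whatever is left when the source ends
        handle(e)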
C) In a few sentences, describe the data processing pipeline in Storm.
A Storm application is organized as a topology, a directed graph of spouts and bolts through which the data stream flows. Spouts
ingest data from external sources such as message queues or APIs and emit it as streams of tuples; bolts then filter, transform,
and aggregate those tuples, and the final bolts typically write results to a database or data warehouse for further analysis or
use in business applications. Storm runs the topology on a distributed cluster, with each node processing a portion of the data
stream in real time, and its reliability and scalability make it a common choice for big data processing in a variety of
industries, including finance, healthcare, and retail.
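As a purely conceptual sketch (plain Python, not the actual Storm API, which is Java-based), the spout-to-bolt flow looks roughly like this; all names are illustrative only.

# "Spout": emits raw tuples from a source
def sentence_spout():
    for sentence in ["storm processes streams", "spouts feed bolts"]:
        yield sentence

# "Bolt": transforms each tuple (here, splits sentences into words)
def split_bolt(sentences):
    for sentence in sentences:
        for word in sentence.split():
            yield word

# "Bolt": aggregates state (here, counts words)
def count_bolt(words):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

# "Topology": the DAG spout -> split bolt -> count bolt
print(count_bolt(split_bolt(sentence_spout())))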
D) How does Spark Streaming shift the Spark batch processing approach to work on real-time data streams?
Spark Streaming extends the Spark batch engine to real-time data by discretizing the incoming stream into small micro-batches
(DStreams, i.e. sequences of RDDs) at a fixed interval, typically on the order of seconds. Each micro-batch is then processed by
the same Spark engine and largely the same APIs used for batch jobs, so existing batch code and libraries can be reused on live
data with near-real-time latency. This lets applications ingest, process, and analyze data continuously from sources such as
social media feeds, IoT devices, and sensors, and act on the results far sooner than a periodic batch job would allow.
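A minimal PySpark sketch of this micro-batch model, using the classic DStream API and assuming a local Spark installation with a text source on localhost:9999 (for example started with nc -lk 9999); the details are illustrative rather than part of the assignment.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "StreamingWordCount")
ssc = StreamingContext(sc, 5)                      # 5-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)    # ingest the raw text stream
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))   # per-batch word counts
counts.pprint()                                    # print each micro-batch result

ssc.start()
ssc.awaitTermination()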
Exercise 2
Creating Kafka topics and listing the topics:
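The terminal commands and output for this step are not reproduced here; as a sketch, the same topic creation and listing can be done from Python with kafka-python's admin client, assuming a broker running on localhost:9092.

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers='localhost:9092')

# create the 'sample' topic used in Parts A and B (single partition, no replication)
admin.create_topics([NewTopic(name='sample', num_partitions=1, replication_factor=1)])

# list all topics on the broker to confirm 'sample' exists
print(admin.list_topics())

admin.close()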
Part A
Program to produce messages to the Kafka topic ‘sample’
put.py
from kafka import KafkaProducer
import json

# create Kafka producer object
my_kafka_producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

# create list of dictionaries, each containing a key-value pair
my_key_value_pairs = [
    {'key': 'MYID', 'value': 'A20512469'},
    {'key': 'MYNAME', 'value': 'ARJUN MOHAN KUMAR'},
    {'key': 'MYEYECOLOR', 'value': 'black'}
]

# encode the keys and values in each dictionary as bytes
for row in my_key_value_pairs:
    row['key'] = bytes(row['key'], encoding='utf-8')
    row['value'] = bytes(json.dumps(row['value']), encoding='utf-8')

# send the key-value pairs to the Kafka topic
for row in my_key_value_pairs:
    my_kafka_producer.send('sample', key=row['key'], value=row['value'])

# close the Kafka producer connection
my_kafka_producer.close()
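One note on the producer: send() is asynchronous in kafka-python, so messages are batched in the background; close() waits for outstanding sends to complete, and an explicit my_kafka_producer.flush() before closing is a common way to make that delivery step visible in the code.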
Part B
Program to consume messages from the Kafka topic ‘sample’
get.py
# Import the necessary libraries
from kafka import KafkaConsumer
import json

# Create a Kafka consumer instance and connect it to the broker
my_consumer = KafkaConsumer('sample', bootstrap_servers=['localhost:9092'])

# Loop through the messages received from the Kafka topic
for msg in my_consumer:
    # Decode the message key and value and store them in variables
    msg_key = msg.key.decode('utf-8')
    msg_value = json.loads(msg.value.decode('utf-8'))
    # Print the key-value pair
    print('Key={}, Value={}'.format(msg_key, msg_value))

# Close the Kafka consumer connection
my_consumer.close()
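Note that iterating over a KafkaConsumer blocks waiting for new messages, so the close() call above is only reached once the loop is interrupted (for example with Ctrl-C); passing consumer_timeout_ms when constructing the consumer is one way to make the loop end automatically after the topic goes quiet.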
Output: