Big-data

Published on May 20, 2024
Real-Time Data Pipeline with Apache Kafka, Spark, and Hive
Big-Data Data-Science Cloud
This article outlines a scalable, Docker-based architecture that ingests data streams from Reddit, processes them with Apache Kafka and Spark, and stores the results in Apache Hive for analytical querying.
Published on January 20, 2024
HDFS; Top-3 IPs for each hour of IP stream
Big-Data Data-Science
A MapReduce walkthrough covering each mapper and reducer step, environment configuration, and the generation of intermediate results.
