All Posts

Published on
May 20, 2024
Real-Time Data Pipeline with Apache Kafka, Spark, and Hive
Big-Data Data-Science Cloud
This article outlines a scalable, Docker-based architecture for handling data streams from Reddit, processing them with Apache Kafka and Spark, and storing the results in Apache Hive for analytical querying.
Published on
March 2, 2024
Document Retrieval System with Pinecone and LlamaIndex
LLM RAG LlamaIndex
In this article, we'll walk through setting up a document retrieval system using Pinecone for vector storage and LlamaIndex for managing the ingestion and querying processes. This step-by-step guide will help you understand how to integrate these tools for efficient and intelligent document searches.
Published on
January 20, 2024
HDFS; Top-3 IPs for each hour of IP stream
Big-Data Data-Science
A walkthrough of a MapReduce process, with emphasis on each mapper and reducer step, environment configuration, and the generation of intermediate results.
Published on
May 2, 2021
Go Emotion on Llama-3-8b
Machine-Learning LLM Fine-Tune
Go Emotion is a model fine-tuned from Llama-3 that makes it easy to extract emotions from text.

Real-Time Data Pipeline with Apache Kafka, Spark, and Hive