Hello, I'm
Arun Kumar
Building production-grade ETL pipelines, backend systems, and AI-powered tools. Currently at Technocas & Zank AI. Oracle Cloud GenAI Certified.
Get to Know Me
About Me

I'm a Software Engineering student at SZABIST, Karachi (graduating 2027). I currently hold two roles at once: Data Engineer at Technocas and Backend Developer at Zank AI, a US-based fintech startup.
I specialize in building end-to-end data pipelines, scraping systems, and production backend infrastructure. From wrangling 10k+ product records out of e-commerce sites to designing Snowflake ETL workflows with Airflow, I build things that work at scale.
I also trained in Cloud Data Engineering at SMIT and hold an Oracle Cloud GenAI certification. I'm open to remote roles in Cloud Data Engineering, Backend, or AI integration.
Where I've Worked
Work Experience
Data Engineer
Technocas·Karachi, Pakistan
Owning the full data collection and delivery lifecycle — engineering scrapers that handle bot defenses, building ETL pipelines, and delivering clean, warehouse-ready datasets for business consumption.
- ▹Built production-grade scrapers in Python using Requests, BeautifulSoup, Playwright, and Apify, covering everything from static HTML pages to fully dynamic, JavaScript-rendered platforms, including Pinterest.
- ▹Designed multi-stage ETL pipelines that normalize raw scraped data, resolve duplicates, and enforce schema contracts before loading into the data warehouse.
- ▹Automated end-to-end data workflows via Python scheduling, eliminating manual extraction runs and cutting human intervention in data delivery to near zero.
- ▹Implemented transformation logic to handle inconsistent formats, null fields, and nested JSON structures across heterogeneous source sites.
- ▹Integrated processed datasets with Metabase to power reporting dashboards and business intelligence queries.
- ▹Built resilient scraping infrastructure with retry logic, request throttling, and session management to sustain throughput against anti-bot systems.
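As an illustration of the retry and throttling ideas in the last bullet, here is a minimal sketch, not the production code. The names `Throttle` and `fetch_with_retry`, the default delays, and the jitter scheme are all assumptions made for the example. The sleep and clock functions are injectable so the logic can be exercised without real waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class Throttle:
    """Enforces a minimum interval between consecutive calls (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_with_retry(
    fetch: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Call fetch() until it succeeds, backing off exponentially between failures."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter scales the sleep to between 50% and 100% of the nominal delay
            sleep(delay * (0.5 + jitter() / 2))
    raise ValueError("attempts must be at least 1")
```

In a scraper, `fetch` would typically wrap a single HTTP call, for example a lambda around a `requests.Session` GET followed by `raise_for_status()`, with `Throttle.wait()` called before each request. Which exceptions are worth retrying depends on the target site, so catching every `Exception` here is a simplification.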
Target Sites → Requests / BeautifulSoup / Playwright / Apify → Cleaning & Normalization → ETL → Data Warehouse → Metabase
Raw website HTML becomes a queryable warehouse record — fully automated, no manual steps.
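The cleaning and normalization stage in the flow above can be sketched roughly as follows. This is an illustrative example under assumptions: the three-field `SCHEMA`, the field names `sku`, `title`, and `price`, and the nested `{"amount": ...}` price shape are hypothetical and not taken from the real pipeline.

```python
from typing import Any, Iterable, Optional

# Hypothetical schema contract: field name -> required Python type
SCHEMA = {"sku": str, "title": str, "price": float}


def to_float(value: Any) -> Optional[float]:
    """Coerce a price that may be nested, formatted as text, or missing."""
    if isinstance(value, dict):  # nested JSON such as {"amount": "1,299.00"}
        value = value.get("amount")
    if isinstance(value, str):
        value = value.replace(",", "").replace("$", "").strip()
    try:
        return float(value)
    except (TypeError, ValueError):
        return None  # null or unparseable values are left for the contract check


def normalize(raw: dict[str, Any]) -> dict[str, Any]:
    """Flatten one scraped record and coerce its fields to consistent formats."""
    return {
        "sku": str(raw.get("sku") or "").strip().upper(),
        "title": " ".join(str(raw.get("title") or "").split()),
        "price": to_float(raw.get("price")),
    }


def conforms(record: dict[str, Any]) -> bool:
    """True when every schema field is non-empty and has the required type."""
    return all(
        isinstance(record.get(name), kind) and record[name] != ""
        for name, kind in SCHEMA.items()
    )


def clean(rows: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Normalize rows, drop contract violations, and keep the first row per SKU."""
    seen: set[str] = set()
    out: list[dict[str, Any]] = []
    for row in rows:
        record = normalize(row)
        if not conforms(record) or record["sku"] in seen:
            continue
        seen.add(record["sku"])
        out.append(record)
    return out
```

Dropping rows that fail the contract is one possible policy. A real pipeline might instead route them to a quarantine table for review.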
Backend Developer
Zank AI·Remote — USA
Building core backend infrastructure for a US-based fintech startup — REST API design, database architecture, authentication systems, and banking workflow engineering.
- ▹Architected and shipped RESTful APIs with FastAPI powering core banking workflows: account management, transaction processing, and user onboarding.
- ▹Designed and enforced JWT-based authentication and role-based access control (RBAC) across all protected endpoints.
- ▹Modeled relational database schemas in PostgreSQL optimized for financial data integrity, referential consistency, and concurrent-safe operations.
- ▹Engineered backend workflows for fintech-specific features — fund transfers, balance ledgers, and statement generation.
- ▹Implemented Redis caching for high-frequency API responses and session data, reducing database load on hot paths.
- ▹Containerized backend services with Docker for consistent local development and production deployment environments.
- ▹Built and maintained a double-entry ledger system for tracking financial transactions with full audit trail support.
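To show the core invariant behind a double-entry ledger, here is a small in-memory sketch. It is a simplification under stated assumptions: the `Entry` and `Ledger` classes, the signed-amount convention (positive increases an account, negative decreases it), and the account names are hypothetical, and a real system would persist the journal in PostgreSQL inside transactions.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    account: str
    amount: Decimal  # signed: positive increases the account, negative decreases it


@dataclass
class Ledger:
    # Append-only journal of (description, entries); nothing is edited in place,
    # which is what gives the audit trail
    journal: list[tuple[str, tuple[Entry, ...]]] = field(default_factory=list)

    def post(self, description: str, *entries: Entry) -> None:
        """Record a transaction only if its entries sum to zero."""
        if len(entries) < 2:
            raise ValueError("a transaction needs at least two entries")
        if sum(e.amount for e in entries) != 0:
            raise ValueError("unbalanced transaction")
        self.journal.append((description, entries))

    def transfer(self, source: str, destination: str, amount: Decimal) -> None:
        """Move a positive amount from one account to another."""
        if amount <= 0:
            raise ValueError("transfer amount must be positive")
        self.post(
            f"transfer {source}->{destination}",
            Entry(source, -amount),
            Entry(destination, amount),
        )

    def balance(self, account: str) -> Decimal:
        """Derive a balance by replaying every journal entry for the account."""
        return sum(
            (e.amount for _, entries in self.journal for e in entries
             if e.account == account),
            Decimal("0"),
        )
```

Amounts use `Decimal` rather than `float` so that cents add up exactly, and balances are derived from the journal instead of being stored separately, so every figure can be traced back to the entries that produced it.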
React Frontend → FastAPI Endpoints → JWT Auth → Redis Cache → PostgreSQL / Ledger → Banking Logic → Docker → API Response
Shipping backend systems that handle real user financial data in a live US fintech product.
Software Engineer (AI)
HexaVibes Solutions·Karachi, Pakistan
- ▹Integrated ML models into production applications, building inference wrappers and API layers to expose model outputs as usable product features.
- ▹Profiled and optimized model inference pipelines to reduce prediction latency for real-time use cases.
- ▹Delivered AI-powered features across cross-functional teams, translating research outputs into stable, deployable backend services.
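Latency profiling of the kind mentioned above usually starts with a simple timing harness. The sketch below is a generic example, not the tooling used at HexaVibes: `time_calls`, `summarize`, the warm-up count, and the nearest-rank percentile method are assumptions for illustration.

```python
import statistics
import time
from typing import Callable, Sequence


def time_calls(fn: Callable[[], object], runs: int = 100, warmup: int = 5) -> list[float]:
    """Return per-call wall-clock latencies in milliseconds, after a warm-up phase."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms: Sequence[float]) -> dict[str, float]:
    """Summarize latencies with the mean plus nearest-rank p50 and p95."""
    ordered = sorted(samples_ms)

    def percentile(p: int) -> float:
        # Nearest-rank method using integer ceiling division
        rank = max(1, (p * len(ordered) + 99) // 100)
        return ordered[rank - 1]

    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(50),
        "p95": percentile(95),
        "max": ordered[-1],
    }
```

For real-time use cases the tail (p95 and max) often matters more than the mean, which is why the summary reports both.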
Agentic AI Developer
UXGENIE·Karachi, Pakistan
- ▹Designed and implemented multi-step agentic AI workflows using LLM orchestration for automated UX research tooling.
- ▹Built automation pipelines that replaced manual UX research tasks through AI-driven data extraction and synthesis.
Frontend Developer
High Tech Software House·Karachi, Pakistan
- ▹Built pixel-perfect, responsive landing pages and portfolio sites in React (TypeScript) with Tailwind CSS.
- ▹Implemented component-driven UI architecture ensuring cross-browser compatibility and consistent design fidelity.
Freelance Engineer
Fiverr·Remote — Global
- ▹Delivering custom data engineering, scraping pipelines, and backend solutions for international clients across e-commerce, research, and analytics domains.
- ▹Scoped, architected, and shipped complete client projects end-to-end — from requirements to deployment.
What I've Built
Featured Projects
Production-style ETL DAG that detects CSV files in S3, auto-creates the Snowflake table schema, and loads data using COPY INTO. Includes SMTP email alerts and robust Sensors/Operators.
Full-featured chat app with one-to-one and group conversations. Built on the MERN stack with Socket.IO for instant message delivery and a fully responsive UI.
Browser-based real-time surveillance system that detects humans via webcam using TensorFlow.js — no server required. Optimized for low-latency inference in the browser.
Deep Dives
Case Studies
Airflow ETL: S3 → Snowflake
Fully orchestrated cloud data pipeline with schema detection, automated loading, and alerting
What
A zero-touch ETL pipeline from cloud storage to data warehouse
Built an Apache Airflow DAG that monitors an S3 bucket for incoming CSV files, automatically detects and creates the Snowflake table schema, loads data using COPY INTO, and dispatches SMTP email alerts on completion or failure — no manual steps at any stage.
Why
Manual data loading doesn't scale and breaks silently
Loading files from S3 to Snowflake by hand is error-prone and collapses under volume. The business needed a system that detects new data automatically, handles schema changes without engineer intervention, and alerts the team so no one babysits a data job overnight.
Where
Cloud-orchestrated, warehouse-native, alert-driven
Airflow runs as the scheduler and orchestration layer. S3 Sensors listen for new file arrivals and trigger the DAG. Snowflake receives the cleaned load via COPY INTO. SMTP delivers success and failure notifications to the team. All components are cloud-native with no on-prem dependency.
System Architecture
S3 Bucket → Airflow S3 Sensor → DAG Trigger → Schema Auto-Detection → COPY INTO Snowflake → SMTP Alert
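The schema auto-detection and COPY INTO steps in this flow can be approximated in plain Python. This is a sketch under assumptions, not the DAG's actual code: `infer_type`, `build_statements`, the three-type mapping, the sample size, and the stage path format are hypothetical, and table and column names are assumed to come from a trusted source because they are interpolated directly into SQL.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Pick the narrowest of three Snowflake types that fits every non-empty value."""
    present = [v for v in values if v.strip()]
    if not present:
        return "VARCHAR"
    for caster, sql_type in ((int, "NUMBER(38,0)"), (float, "FLOAT")):
        try:
            for v in present:
                caster(v)
            return sql_type
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "VARCHAR"


def build_statements(
    csv_text: str, table: str, stage_path: str, sample_rows: int = 100
) -> tuple[str, str]:
    """Build CREATE TABLE and COPY INTO statements from a CSV header and sample."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sample = [row for _, row in zip(range(sample_rows), reader)]
    columns = []
    for i, name in enumerate(header):
        values = [row[i] for row in sample if i < len(row)]
        columns.append(f'"{name.strip().upper()}" {infer_type(values)}')
    create = f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})"
    copy = (
        f"COPY INTO {table} FROM {stage_path} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 "
        "FIELD_OPTIONALLY_ENCLOSED_BY = '\"')"
    )
    return create, copy
```

In an Airflow DAG, a task downstream of the S3 sensor would read a sample of the new file, call something like `build_statements`, and run the two statements against Snowflake. Inferring types from a sample is a design choice with a known risk: a later row can contain a value the sample never saw, so a production pipeline may prefer wider types or a validation pass before loading.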
Outcome
Files land in S3, the warehouse table is updated, and the team is notified, with no human in the loop and no manual SQL.
Stack
Kind Words
What People Say
Arun built our entire scraping and ETL pipeline from scratch. Clean code, on time, and the data quality was exactly what we needed. Will hire again.
Client via Fiverr
E-Commerce Startup
Arun consistently delivered scalable backend solutions at Zank AI. His understanding of API design and database optimization is well above his experience level.
Team Lead
Zank AI — US Fintech
He integrated our ML models into production seamlessly and improved inference performance significantly. Great communicator and fast learner.
Tech Lead
HexaVibes Solutions
What I Do Best
Core Expertise
Data Engineering & ETL Pipelines
Design and ship end-to-end data pipelines — raw source ingestion to warehouse-ready structured tables. Orchestrate multi-stage ETL workflows with schema handling, validation, and automated alerting.
Web Scraping & Data Extraction
Extract structured data from websites ranging from static HTML pages to fully JavaScript-rendered platforms. Build resilient scrapers with anti-bot handling, retry logic, and multi-source pipelines at scale.
Backend API Development
Build production-grade REST APIs — authentication, RBAC, database modeling, caching, and financial-grade reliability. Architect backend systems that handle real user data and concurrent requests safely.
Cloud & Infrastructure
Deploy, containerize, and scale backend systems on AWS. Comfortable across S3, EC2, Lambda, SQS, and Glue — with Docker for containerization and Redis for caching and session management.
Workflow Automation & Scripting
Automate repetitive data and business workflows end-to-end using Python. Eliminate manual steps at every layer of the stack, from scheduled ETL triggers and data reporting to n8n pipelines.
What I Work With
Technical Skills
Languages
Data & Scraping
Frameworks
Databases
Cloud & DevOps
Tools & Platforms
Credentials
Certifications
Oracle Cloud Infrastructure 2025 Certified Generative AI Professional
Oracle
Oracle Cloud Infrastructure Certified AI Foundations Associate
Oracle
Google Prompting Essentials
Google Career Certificates
AWS Educate: Introduction to Cloud 101
Amazon Web Services
Object Oriented Programming in Java
Coursera / Online
Let's Talk
Get In Touch
Open to remote roles in Data Engineering, Backend, or AI integration. Whether it's a job, project, or just a hello — my inbox is always open.