Pallavi Miriyala


Building intelligent systems for production-scale AI

AI/ML Engineer

Designing intelligent systems that combine AI, infrastructure, and production-scale engineering — from multimodal inference to autonomous agents.


pallavi@ai-systems ~ zsh

[✓] CUDA connected

[✓] Qwen-VL initialized

[✓] Vector DB online

[✓] API gateway active

AI Pipeline · Live (running)

Data Input → GPU Inference → Embedding → Vector Search → LLM Reasoning → API Response ✓

Scenes Processed (Movie Intelligence Pipeline)

Embeddings Indexed (FAISS vector search)

Sub-second Retrieval (vector similarity search)

GPU-Accelerated (ONNX Runtime · CUDA)

Systems I Build

Production-focused AI engineering.

AI Pipelines

Production-grade ML workflows, multimodal inference, embedding systems, and automated intelligence pipelines at scale.

Computer Vision

GPU-accelerated inference with ONNX Runtime, YOLO, RetinaFace, InsightFace, and large-scale scene analysis systems.

Agentic AI

Autonomous reasoning systems using ReAct workflows, LLM orchestration, tool-calling, and intelligent multi-step automation.

Cloud Infrastructure

AWS deployments, Dockerized services, PostgreSQL at scale, Cloudflare networking, CI/CD, and MLflow tracking.

Engineering Case Studies

Systems designed for real-world scale.

Production-grade AI — built, deployed, and running.

01 / pipeline

Video Ingestion → Face Detection (RetinaFace) → ONNX Runtime GPU Inference → Embedding → FAISS Index → Qwen VL Scene Description → FastAPI Production Layer

Engineering Challenge

Resolved production-critical CUDA/TensorRT runtime conflicts (nvinfer_10.dll): diagnosed ONNX Runtime execution-provider incompatibilities and migrated to CUDAExecutionProvider, restoring full GPU inference at 360K+ scene scale.
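A minimal sketch of pinning ONNX Runtime to CUDAExecutionProvider, assuming (as the challenge describes) that the TensorRT provider's runtime libraries are what conflict. `select_providers` and `make_session` are hypothetical helpers; `ort.get_available_providers()`, `ort.InferenceSession(path, providers=[...])` and `session.get_providers()` are standard onnxruntime calls.

```python
from typing import Iterable, List

# Preference order. TensorrtExecutionProvider is deliberately left out
# (assumption: its libraries, e.g. nvinfer_10.dll, are missing or mismatched).
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def select_providers(available: Iterable[str]) -> List[str]:
    """Return preferred providers that are actually available, in order.

    CPUExecutionProvider is always kept as a last-resort fallback.
    """
    available = set(available)
    chosen = [p for p in PREFERRED if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def make_session(model_path: str):
    """Create an ONNX Runtime session bound to CUDA, falling back to CPU."""
    import onnxruntime as ort  # lazy import so select_providers works without it

    providers = select_providers(ort.get_available_providers())
    session = ort.InferenceSession(model_path, providers=providers)
    # get_providers() reports what the session really bound to; a silent
    # CPU fallback shows up here rather than as an exception.
    if "CUDAExecutionProvider" not in session.get_providers():
        print("warning: CUDAExecutionProvider not active; running on CPU")
    return session
```

Listing providers explicitly, rather than accepting whatever the build enables, keeps one broken provider from taking the whole session down.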

Production ML System · Mango Mass Media

Movie Scene Intelligence Pipeline

End-to-end AI pipeline that processes 360,000+ movie scenes with automated face detection, face recognition, and scene classification. It powers dam-studio-master, a React/Node.js digital asset management (DAM) platform that serves the processed media in production.

Qwen VL · FAISS · ONNX Runtime · RetinaFace · PostgreSQL · AWS RDS · FastAPI
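The Embedding → FAISS Index stage is nearest-neighbour search over face or scene embeddings. This is a minimal numpy sketch of exact inner-product search with the same semantics as `faiss.IndexFlatIP`; L2-normalizing the embeddings first turns the inner product into cosine similarity. `FlatIPIndex` is a hypothetical stand-in for illustration, not the pipeline's code.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product equals cosine similarity."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


class FlatIPIndex:
    """Exact (brute-force) inner-product index, mirroring faiss.IndexFlatIP."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)

    @property
    def ntotal(self) -> int:
        return self.vectors.shape[0]

    def add(self, x: np.ndarray) -> None:
        x = np.asarray(x, dtype=np.float32)
        if x.ndim != 2 or x.shape[1] != self.dim:
            raise ValueError(f"expected shape (n, {self.dim}), got {x.shape}")
        self.vectors = np.vstack([self.vectors, x])

    def search(self, queries: np.ndarray, k: int):
        """Return (scores, ids), each of shape (n_queries, k), best match first."""
        q = np.asarray(queries, dtype=np.float32)
        scores = q @ self.vectors.T  # (n_queries, ntotal)
        k = min(k, self.ntotal)
        ids = np.argsort(-scores, axis=1)[:, :k]
        return np.take_along_axis(scores, ids, axis=1), ids
```

With faiss installed, the equivalent calls on float32 arrays are `faiss.normalize_L2(x)`, `index = faiss.IndexFlatIP(dim)`, `index.add(x)` and `D, I = index.search(q, k)`. A flat index is exact; approximate indexes trade some recall for speed as the collection grows.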

02 / pipeline

Production Data Stream → Statistical Drift Detection → LLM Decision Agent → Root Cause Analysis → Adaptive Retraining → MLflow Version Tracking

Engineering Challenge

Designed a modular, framework-agnostic architecture compatible with TensorFlow, PyTorch, and scikit-learn, so the agent drops into an existing MLOps stack without changes to the pipeline.

Autonomous MLOps Agent

DriftGuard AI

Intelligent monitoring system that detects data drift and concept drift, automatically triggers adaptive retraining, and uses an LLM-powered Decision Agent (GPT / Claude) to reason about root causes and explain every remediation action in plain language.

Python · Scikit-learn · GPT-4 · Claude · MLflow · Statistical Drift Detection
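The card does not say which statistical test DriftGuard AI uses. As one common choice, here is a sketch of the Population Stability Index (PSI) for data drift on a single feature: bin the reference sample at its quantiles, compare bin fractions with the current sample, and sum `(cur - ref) * ln(cur / ref)`. The 0.2 threshold is a widely used rule of thumb, assumed here rather than taken from the project.

```python
import numpy as np


def population_stability_index(reference, current, n_bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a reference (training) sample and a current (production) sample."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges at the reference quantiles give equal-mass bins.
    interior = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    # searchsorted maps every value to a bin index in [0, n_bins - 1].
    ref_counts = np.bincount(np.searchsorted(interior, reference, side="right"), minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(interior, current, side="right"), minlength=n_bins)
    # Clip empty bins so the log term stays finite.
    ref_frac = np.clip(ref_counts / len(reference), eps, None)
    cur_frac = np.clip(cur_counts / len(current), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


def has_drifted(reference, current, threshold: float = 0.2) -> bool:
    """Flag drift using the common PSI rule of thumb (> 0.2 = significant shift)."""
    return population_stability_index(reference, current) > threshold
```

A drift flag like this would be the trigger that hands context to the LLM Decision Agent, which then reasons about root cause before any retraining.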


Multi-Agent MLOps

auto-ml-pipeline-agent

Autonomous 4-agent system (Monitor → Diagnose → Strategize → Execute) for self-healing ML pipelines with real-time metrics collection and AI-driven optimization.

Python · Multi-Agent · LLM · MLOps
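The Monitor → Diagnose → Strategize → Execute chain can be pictured as four functions passing one shared state object. In this sketch, simple thresholds stand in for the LLM reasoning each agent would actually do. The thresholds, diagnosis labels and playbook actions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical thresholds; real agents would reason over metrics with an LLM.
ACCURACY_FLOOR = 0.85
LATENCY_CEILING_MS = 200.0
PLAYBOOK = {"accuracy_degradation": "retrain_model", "latency_regression": "scale_out_inference"}


@dataclass
class PipelineState:
    metrics: Dict[str, float]
    alert: bool = False
    diagnosis: Optional[str] = None
    plan: Optional[str] = None
    actions: List[str] = field(default_factory=list)


def monitor(s: PipelineState) -> PipelineState:
    s.alert = (s.metrics.get("accuracy", 1.0) < ACCURACY_FLOOR
               or s.metrics.get("latency_ms", 0.0) > LATENCY_CEILING_MS)
    return s


def diagnose(s: PipelineState) -> PipelineState:
    if s.alert:
        low_acc = s.metrics.get("accuracy", 1.0) < ACCURACY_FLOOR
        s.diagnosis = "accuracy_degradation" if low_acc else "latency_regression"
    return s


def strategize(s: PipelineState) -> PipelineState:
    if s.diagnosis:
        s.plan = PLAYBOOK[s.diagnosis]
    return s


def execute(s: PipelineState) -> PipelineState:
    if s.plan:
        s.actions.append(s.plan)  # a real executor would call training/deploy systems
    return s


AGENTS: List[Callable[[PipelineState], PipelineState]] = [monitor, diagnose, strategize, execute]


def run_cycle(metrics: Dict[str, float]) -> PipelineState:
    """Run one self-healing cycle over a snapshot of metrics."""
    state = PipelineState(metrics=dict(metrics))
    for agent in AGENTS:
        state = agent(state)
    return state
```

Keeping each agent a pure state-to-state function makes it easy to swap one stage for an LLM call without touching the others.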

MLOps Automation

auto-ml-guardian

Proactive ML model health management with Human-in-the-Loop controls that route critical changes for approval. Framework-agnostic across scikit-learn, PyTorch, and TensorFlow.

Python · PyTorch · TensorFlow · Scikit-learn
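Human-in-the-Loop routing boils down to a severity gate: low-risk changes apply automatically, while anything at or above a threshold waits in a queue for explicit approval. This is a minimal sketch of that idea. The severity levels, the default threshold, and the `ChangeRouter` class are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Severity(Enum):
    LOW = 1
    HIGH = 2
    CRITICAL = 3


@dataclass
class ProposedChange:
    description: str
    severity: Severity


class ChangeRouter:
    """Auto-apply low-risk changes; queue risky ones for a human reviewer."""

    def __init__(self, approval_threshold: Severity = Severity.HIGH):
        self.approval_threshold = approval_threshold
        self.applied: List[ProposedChange] = []
        self.pending: List[ProposedChange] = []

    def submit(self, change: ProposedChange) -> str:
        if change.severity.value >= self.approval_threshold.value:
            self.pending.append(change)
            return "pending_approval"
        self.applied.append(change)
        return "applied"

    def approve(self, index: int) -> ProposedChange:
        change = self.pending.pop(index)
        self.applied.append(change)  # a real system would now execute the change
        return change

    def reject(self, index: int) -> ProposedChange:
        return self.pending.pop(index)
```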

Real-Time ML Pipeline

stream-anomaly-guardian

Industrial IoT anomaly detection with Apache Kafka streaming, ADWIN concept drift detection, and self-healing model retraining for high-velocity sensor data.

Python · Apache Kafka · River/ADWIN · Scikit-learn · Docker
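ADWIN (ADaptive WINdowing) keeps a window of recent values and drops its older part whenever two sub-windows have means that differ by more than a Hoeffding-style bound. River's `drift.ADWIN` is the production-grade implementation. What follows is a naive ADWIN0-style version for illustration only, written for values in [0, 1] such as per-event error indicators. `SimpleADWIN` is a hypothetical name.

```python
import math
from collections import deque


class SimpleADWIN:
    """Naive ADWIN0-style drift detector for a stream of values in [0, 1]."""

    def __init__(self, delta: float = 0.002, max_window: int = 1000):
        self.delta = delta
        self.window = deque(maxlen=max_window)

    def _cut_found(self) -> bool:
        n = len(self.window)
        if n < 2:
            return False
        values = list(self.window)
        total = sum(values)
        delta_prime = self.delta / n
        head_sum = 0.0
        # Try every split W = W0 (older) + W1 (newer).
        for i in range(1, n):
            head_sum += values[i - 1]
            n0, n1 = i, n - i
            mu0, mu1 = head_sum / n0, (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-mean-style sub-window size
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mu0 - mu1) >= eps_cut:
                return True
        return False

    def update(self, x: float) -> bool:
        """Add a value; return True if drift was detected (window shrank)."""
        self.window.append(x)
        drift = False
        while self._cut_found():
            self.window.popleft()  # discard the oldest, now-stale data
            drift = True
        return drift
```

Checking every split costs O(n) per update, which is why real implementations use exponential-histogram buckets. For high-velocity Kafka streams, the library version is the sensible choice.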

Vision AI Agent

sentient-desktop-agent

AI agent that proactively assists by understanding your screen. Uses LLaVA locally for visual perception and GPT-4o / Claude 3.5 for reasoning in a continuous Perception-Reasoning-Action cycle.

Python · LLaVA · Ollama · GPT-4o · Claude
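The perception step, running LLaVA locally through Ollama, can be sketched as one call to Ollama's `/api/generate` endpoint. The screenshot is passed as a base64 string in the `images` field, and with `"stream": false` the reply text arrives in the `response` field. The helper names and default prompt are hypothetical, and screen capture itself is left out.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_llava_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body for a single, non-streaming multimodal generation."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_screenshot(image_bytes: bytes,
                        prompt: str = "Describe what the user is doing on this screen.") -> str:
    """Perception step: send a screenshot to local LLaVA and return its description."""
    payload = json.dumps(build_llava_request(image_bytes, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

The returned description would then go to the reasoning model (GPT-4o or Claude), which decides on an action before the cycle repeats.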

DevOps AI System

devops-maestro-agent

Multi-agent LLM system for autonomous DevOps — Planner, Knowledge Retrieval, Diagnosis, and Solution agents collaborate to diagnose incidents and troubleshoot infrastructure.

Python · Multi-Agent · RAG · LLM
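The Knowledge Retrieval agent's job is to pull the most relevant runbook entries before the Diagnosis agent reasons about an incident. As a dependency-free sketch, this uses bag-of-words cosine similarity in place of the embedding-based retrieval a RAG system would normally use. The function names and runbook texts are illustrative only.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return up to k documents with non-zero similarity, most similar first."""
    q = Counter(tokenize(query))
    scored = sorted(
        ((cosine(q, Counter(tokenize(doc))), i) for i, doc in enumerate(documents)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [documents[i] for score, i in scored[:k] if score > 0]
```

The retrieved passages would then be pasted into the Diagnosis agent's prompt as context.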

Adaptive AI Agent

adapti-persona-agent

Context-aware agent that dynamically adopts expert personas (Engineer, Strategist, Analyst) with ChromaDB vector memory, tool orchestration, and RAG-powered adaptive assistance.

Python · GPT-4o · ChromaDB · RAG · LangChain
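Dynamic persona adoption needs some router that maps a request to an expert persona before retrieval and generation. This sketch uses keyword overlap, a deliberately simple stand-in. The actual agent may use embeddings or an LLM classifier, and the personas' prompts and keyword sets below are hypothetical placeholders.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Persona:
    name: str
    system_prompt: str
    keywords: FrozenSet[str]


# Hypothetical persona definitions.
PERSONAS: List[Persona] = [
    Persona("Engineer", "You are a pragmatic senior software engineer.",
            frozenset({"bug", "deploy", "latency", "code", "error", "api"})),
    Persona("Strategist", "You are a product strategist.",
            frozenset({"roadmap", "market", "pricing", "growth", "plan", "competitors"})),
    Persona("Analyst", "You are a careful data analyst.",
            frozenset({"data", "metric", "metrics", "trend", "report", "forecast"})),
]


def select_persona(query: str, default: str = "Analyst") -> Persona:
    """Pick the persona whose keywords overlap the query most; fall back to a default."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    best = max(PERSONAS, key=lambda p: len(tokens & p.keywords))
    if not tokens & best.keywords:
        return next(p for p in PERSONAS if p.name == default)
    return best
```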

Agentic Workflows

agentic-canvas

Framework for orchestrating complex human-in-the-loop agentic workflows with state persistence and dynamic tool integration across multi-step tasks.

Python · Multi-Agent · State Persistence
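State persistence is what lets a human-in-the-loop workflow pause, wait for a person, and resume later, even in a different process. A minimal sketch, assuming JSON checkpoints: the workflow state is written after every step, and execution stops when a step hands control to a human. All names here are hypothetical, not agentic-canvas's API.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class WorkflowState:
    workflow_id: str
    step: int = 0                  # index of the next step to run
    awaiting_human: bool = False   # True while paused for human input
    history: List[str] = field(default_factory=list)
    data: Dict[str, Any] = field(default_factory=dict)


def to_json(state: WorkflowState) -> str:
    return json.dumps(asdict(state))


def from_json(raw: str) -> WorkflowState:
    return WorkflowState(**json.loads(raw))


def file_checkpointer(directory: Path) -> Callable[[WorkflowState], None]:
    """Save to <directory>/<workflow_id>.json by writing a temp file, then renaming it."""
    def save(state: WorkflowState) -> None:
        tmp = directory / f"{state.workflow_id}.json.tmp"
        tmp.write_text(to_json(state))
        tmp.replace(directory / f"{state.workflow_id}.json")  # atomic on POSIX filesystems
    return save


def advance(
    state: WorkflowState,
    steps: List[Tuple[str, bool]],  # (step name, needs human input afterwards)
    checkpoint: Callable[[WorkflowState], None],
) -> WorkflowState:
    """Run steps until the workflow ends or a step hands control to a human."""
    while state.step < len(steps) and not state.awaiting_human:
        name, needs_human = steps[state.step]
        state.history.append(name)  # a real step would call tools or agents here
        state.step += 1
        state.awaiting_human = needs_human
        checkpoint(state)           # persist after every step so work can resume
    return state
```

To resume, load the last checkpoint, clear `awaiting_human` once the person responds, and call `advance` again.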

ML Classification

Multiple Disease Prediction

Multi-class classifier that predicts diseases from user-reported symptoms with 92%+ accuracy, tuned with Grid Search hyperparameter optimization and deployed as an interactive Streamlit app.

Scikit-learn · Pandas · Streamlit
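Grid Search tuning in scikit-learn means cross-validating every combination in a parameter grid and refitting the best one. This sketch runs on synthetic binarized features standing in for a symptom matrix. The random-forest model, the grid values and the dataset are assumptions, and the sketch does not reproduce the project's 92%+ figure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_and_evaluate(random_state: int = 0):
    """Grid-search a random forest on synthetic 'symptom' data; return (best params, test accuracy)."""
    # Synthetic stand-in: 600 patients x 20 symptom columns, 4 disease classes.
    X, y = make_classification(
        n_samples=600, n_features=20, n_informative=8,
        n_classes=4, random_state=random_state,
    )
    X = (X > 0).astype(int)  # binarize to mimic present/absent symptom flags

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state,
    )
    param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid, cv=5, scoring="accuracy",
    )
    # 4 combinations x 5 folds = 20 fits, then one refit of the best on all training data.
    search.fit(X_train, y_train)
    return search.best_params_, search.score(X_test, y_test)
```

Scoring on a held-out split, after tuning only on the training portion, keeps the reported accuracy from being inflated by the search itself.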

Browser Extension

Text Summarizer Extension

Chrome extension using Google Gemini AI for real-time text summarization and Q&A on selected web content, with secure API key storage and customizable output.

JavaScript · Gemini API · Chrome Extension

Infrastructure & Scaling

Building AI systems end-to-end.

I care about everything below the model — GPU inference optimization, vector retrieval at scale, deployment pipelines, networking, and cloud-native architecture that holds up in production.

GPU-accelerated ONNX Runtime inference — CUDA / TensorRT

PostgreSQL at scale — GIN indexes, materialized views, AWS RDS

Cloud-native deployments — AWS, Docker, Cloudflare CDN

Agentic AI orchestration — ReAct, tool-calling, LLM pipelines
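A ReAct loop alternates model "thoughts", tool calls and observations until the model commits to an answer. This is a minimal sketch, assuming a text protocol of `Action: tool[input]` and `Final Answer: ...` lines, a common ReAct prompt convention that is not necessarily the one used in these systems. `llm` is any prompt-to-completion callable and is stubbed in the usage.

```python
import re
from typing import Callable, Dict

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Alternate Thought/Action (from the LLM) with Observation (from a tool)."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)  # the model sees the full trajectory so far
        transcript += output + "\n"
        final = FINAL_RE.search(output)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(output)
        if action is None:
            transcript += "Observation: no valid Action line found\n"
            continue
        name, arg = action.group(1), action.group(2)
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool '{name}'"
        transcript += f"Observation: {observation}\n"
    raise RuntimeError(f"no final answer within {max_steps} steps")
```

Usage with a scripted stand-in for the model:

```python
responses = iter([
    "Thought: I should check the GPUs.\nAction: gpu_count[]",
    "Thought: I have the count.\nFinal Answer: 2 GPUs",
])
answer = react_loop(lambda prompt: next(responses), {"gpu_count": lambda _: "2"}, "How many GPUs?")
```

The `max_steps` cap matters in production: without it, a model that never emits a final answer would loop indefinitely.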

Production Stack

User Interface Layer
│
Cloudflare CDN + Edge Routing
│
React + FastAPI Services
│
AI Inference & Vector Search (active)
│
PostgreSQL + AWS RDS

Engineering Philosophy

Building intelligent systems that scale beyond experimentation.

I'm deeply interested in how modern AI systems work end-to-end — from model behavior and inference optimization to deployment architecture and production reliability.

Communication Channel