Sreeharsha Paruchuri

Hi, I'm Harsha, a robotics graduate student at Carnegie Mellon University's Robotics Institute, pursuing the Master of Science in Robotic Systems Development (MRSD). I'm currently working on giving robots the ability to perceive complex 3D environments and make informed decisions for long-horizon tasks through Reinforcement Learning.

At IIIT-Hyderabad, I explored robotics and computer vision with Prof. Madhava Krishna and computational social science with Prof. Vinoo Alluri, learning to approach problems from multiple angles and design interdisciplinary solutions. Currently, I'm sharpening that foundation through MRSD's blend of coursework, systems engineering, and entrepreneurship training, which challenges me to think beyond code and consider scalability, reliability, and teamwork when building complex robotic systems.

Previously, I worked as a Pre-Doctoral Research Fellow at TCS Research, where I investigated reinforcement learning methods that let embodied agents integrate audio-visual perception for spatial understanding and navigation, bypassing explicit SLAM through learned internal representations.

Research Interests: 3D Computer Vision and Long-Horizon Robot Intelligence for embodied agents.

Email  /  LinkedIn  /  GitHub  /  Resume

News

Oct 2025: Finished in 3rd place in the CMU VLA Challenge and will be presenting our work at IROS 2025! 🏆
Sep 2025: Conducted an in-person lab on Backpropagation and Training Convolutional Neural Networks. 📚
Aug 2025: Completed my summer internship at Mach9 in the Bay Area, focusing on 3D Computer Vision. 🚀
Apr 2025: Successfully demonstrated our Apple Vision Pro + Robot assisted semi-autonomous Total Knee Arthroplasty system. 🩺
Aug 2024: Started graduate school at CMU’s Robotics Institute! 🤖
May 2024: Submitted our paper on Audio-Visual Navigation to CoRL 2024. 🤞
Feb 2024: Admitted to CMU’s MRSD program!
Feb 2024: Invited to Google Research Week 2024 in Bangalore, India. 🎓
Nov 2023: Placed 4th internationally in the Habitat Open Vocabulary Mobile Manipulation Challenge at NeurIPS 2023.
Jul 2022: Started working at TCS Research, Kolkata, on long-horizon robot navigation using Deep Reinforcement Learning.
Jul 2022: Graduated from IIIT-H with honours in Robot Perception! Grateful to my family and friends for making it possible. ✨
May 2022: Built an autonomous sanitization robot on a budget of $5000 and finished as runner-up in a nationwide competition. 🧼

Industry Experience

Mach9
May 2025 - Aug 2025
Perception Software Engineering Intern
Focus: Developed a CUDA-accelerated 2D-3D feature correspondence pipeline and fine-tuned Vision-Language Models to speed up quality assurance for outdoor surveying systems (a simplified projection sketch follows this list)
• Developed and deployed Pavement Symbol Extraction functionality in the Digital Surveyor software via CUDA-accelerated coordinate frame transformations and segmentation masks.
• Implemented unit and CI testing pipelines with GitHub Actions to validate CUDA kernels and vector-field clustering modules, ensuring production-grade reliability.
• Fine-tuned a Vision-Language Model (VLM) to build a secondary-inference pipeline that classifies extracted open-set painted symbols according to user specifications.
• Conducted 70+ controlled ablation experiments with A/B testing on Hungarian assigner costs, loss weights, model queries, multi-scale deformable attention, and encoder-decoder expressivity, boosting the production model's performance by 4%.
• Utilised methods from the object-detection literature to qualitatively capture a DETR-based polyline detection model's uncertainty, expediting downstream Quality Assurance and Quality Control processes and saving company and customer resources.
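For flavour, here is a minimal NumPy sketch of the 2D-3D association idea behind such a pipeline: projecting surveyed 3D points into an image so that pixel-level segmentation masks can be tied back to world coordinates. The intrinsics, pose, and point cloud below are hypothetical stand-ins, not Mach9 code.

    import numpy as np

    def project_points(points_w, K, R, t):
        """Project Nx3 world points into pixel coordinates with a pinhole model.
        K: 3x3 intrinsics, (R, t): world-to-camera rotation and translation."""
        points_c = points_w @ R.T + t        # transform into the camera frame
        uv = points_c @ K.T                  # apply intrinsics
        uv = uv[:, :2] / uv[:, 2:3]          # perspective divide -> pixel coordinates
        return uv, points_c[:, 2]            # pixel locations and depths

    # Hypothetical usage: points whose projections land inside a symbol mask
    # inherit that symbol's label in 3D.
    K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
    uv, depth = project_points(np.random.rand(100, 3), K, R, t)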
Tata Consultancy Services Research
July 2022 - July 2024
Pre-Doctoral Research Fellow
Focus: Led research in cognitive robotics, emphasizing navigation built on audio-visual feature correspondence and reinforcement learning for active SLAM.
Audio-Visual Navigation: Led development of an embodied AI agent with multimodal sensing, training an online RL policy with a novel class-agnostic reward and reducing path length by 21% (a toy version of this style of reward shaping is sketched below)
Offline RL for Indoor Robot Navigation: Built a simulation pipeline to collect large-scale trajectory datasets and trained a Causal Decision Transformer with early multimodal fusion; integrated environment randomization, behavior cloning baselines, and replay buffer curation to improve policy robustness and sample efficiency
CLIP-Enhanced Scene Graphs: Designed a contrastive-learning framework to compute visual-language embeddings, leveraging GNNs to model object-region relationships
Open Vocabulary Manipulation (NeurIPS 2023): Developed an active SLAM exploration algorithm conditioned on a probabilistic semantic map, improving task success by 60%.
• Volunteered for the Project Synergy initiative by TCS wherein volunteers taught written and spoken English to students in a Bangla-medium government school.
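The papers describe the actual objectives; purely as an illustration of the kind of class-agnostic, distance-based reward shaping common in audio-goal navigation, here is a toy sketch (all names and constants are hypothetical):

    def shaped_reward(prev_dist, curr_dist, reached_goal,
                      step_penalty=0.01, success_bonus=10.0):
        """Dense navigation reward: progress toward the sounding goal, a small
        per-step cost, and a terminal bonus. Class-agnostic because it never
        references the goal object's semantic category."""
        reward = (prev_dist - curr_dist) - step_penalty
        if reached_goal:
            reward += success_bonus
        return reward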
Robotics Research Center (RRC, IIIT-H)
Jan 2020 - June 2022
Research Assistant
Focus: Worked on dense 3D reconstruction and SLAM techniques such as pose-graph optimization for indoor and outdoor autonomy on self-driving vehicles.
Autonomous Sanitization Robot: Designed and implemented an end-to-end robotic system during COVID-19 to autonomously sanitize indoor spaces, integrating computer vision, Visual-SLAM, and coverage-based navigation
Sim-to-Real Deployment: Built Gazebo simulation environments for iterative testing, then transferred stack to hardware platform with onboard sensors and sanitization actuators; finished runner-up among 140 teams
LiDAR SLAM: Evaluated LiDAR odometry and mapping approaches such as LOAM using CARLA simulation and outdoor driving data, analyzing localization accuracy and map consistency
Depth Estimation: Implemented stereo and monocular depth estimation methods on driving datasets including KITTI and NuScenes, developing a ROS package for multi-view bundle adjustment
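As a quick reminder of the stereo geometry underlying the depth-estimation work: for a rectified pair with focal length fx (pixels) and baseline B (metres), depth is Z = fx * B / disparity. A small NumPy sketch with hypothetical, KITTI-like parameters:

    import numpy as np

    def disparity_to_depth(disparity, fx, baseline):
        """Convert a disparity map (pixels) into metric depth; zero disparity maps to infinity."""
        depth = np.full(disparity.shape, np.inf)
        valid = disparity > 0
        depth[valid] = fx * baseline / disparity[valid]
        return depth

    depth = disparity_to_depth(np.array([[64.0, 32.0, 0.0]]), fx=721.5, baseline=0.54)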

I have also been a part of

Bosch Research and Technology Center
May 2021 - Aug 2021
Computer Vision Intern
• Fused laser, camera, and odometry data using Kalman filtering to boost online multi-object tracking performance by 11% IoU on outdoor autonomous driving datasets (a minimal filter sketch follows below)
• Augmented difficult-to-obtain real-world LiDAR datasets using synthetic data from generative models and physics engines, improving 3D object detection networks for outdoor scenarios
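The Bosch stack itself is proprietary; as a sketch of the predict/update cycle used when fusing odometry with camera and laser detections, here is a minimal constant-velocity Kalman filter in NumPy (state layout, matrices, and noise values are hypothetical):

    import numpy as np

    dt = 0.1
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])  # constant-velocity motion model
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])                                # only position is measured
    Q = 0.01 * np.eye(4)   # process noise (hypothetical)
    Rm = 0.5 * np.eye(2)   # measurement noise (hypothetical)

    def kf_step(x, P, z):
        """One predict + update for state x = [px, py, vx, vy] given measurement z = [px, py]."""
        x, P = F @ x, F @ P @ F.T + Q                 # predict with the motion model
        S = H @ P @ H.T + Rm
        K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
        x = x + K @ (z - H @ x)                       # correct with the detection
        P = (np.eye(4) - K @ H) @ P
        return x, P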
PreCog Lab, IIIT-H
2020 - 2021
Research Assistant - Information Retrieval and Computational Social Science
• Applied statistical machine learning and Music Information Retrieval to analyze lyrical regularities as early indicators of mental illness; published results at INTERSPEECH 2021
• Scraped Reddit data to link music-sharing trends with mental health during COVID-19 using BERT embeddings and DBSCAN clustering; published in a medical journal

Projects & Research

Augmented-Reality and Robot Assisted Knee Surgery
website / code
Gathered and analyzed requirements from user studies, competitor analysis, and sponsor input to inform system development. Processed 3D and RGB data from the Apple Vision Pro to detect bone models in the environment via ICP registration.
Project Leadership: Led a 5-person team as Project Manager, driving scheduling, sponsor communication, and system integration for an AR-assisted surgical robotics platform
Accuracy-driven Perception: Achieved sub-4 mm drilling accuracy in total knee arthroplasty using a KUKA MED7 arm with multi-stage pointcloud registration (SAM2 + ICP; a minimal rigid-alignment sketch follows this list)
AR Integration: Integrated Apple Vision Pro for dynamic bone tracking and real-time surgeon-in-the-loop planning across long surgical horizons
Motion Planning: Designed and deployed a ROS + MoveIt planning subsystem that adaptively updates as surgical pins are drilled, enabling safe trajectory generation
Hardware Development: Built a custom 3D-printed drill end-effector with embedded control electronics, activated via trajectory execution for autonomous drilling
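The full perception stack (SAM2 segmentation followed by multi-stage ICP against the bone model) lives in the linked repository; the core alignment step inside each ICP iteration is the closed-form rigid transform between corresponded point sets, sketched below with NumPy (inputs are hypothetical):

    import numpy as np

    def best_fit_transform(src, dst):
        """Least-squares rigid transform (R, t) aligning corresponded Nx3 points src -> dst (Kabsch)."""
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        H = (src - mu_s).T @ (dst - mu_d)         # cross-covariance of centred points
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                  # guard against reflections
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        return R, t

One ICP iteration pairs each scanned point with its nearest model point, solves for (R, t) as above, applies the transform, and repeats until the alignment error converges.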
3D Foundation-Models for Monocular Video Reconstruction
report / code
Implemented semantic-geometric feature fusion using cross-attention between foundation model embeddings (DINOv2, Depth Anything) in a hierarchical state representation to recover camera extrinsics. Devised an adaptive keyframe selection strategy for confidence-aware pointmap refinement using a DUST3R-style architecture.
Foundation Model Fusion: Designed a cross-attention mechanism to combine DINOv2 semantic features with Depth Anything geometric priors, achieving robust 3D scene understanding from monocular video (see the fusion sketch after this list)
Adaptive Keyframe Selection: Developed confidence-aware algorithm that dynamically selects optimal frames for reconstruction, improving pointmap quality by 30% over uniform sampling
DUST3R Architecture: Implemented hierarchical state representation with multi-scale feature pyramids to handle camera motion estimation and dense 3D reconstruction simultaneously
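As an illustration of the fusion block described above (not the exact implementation from the report), here is a minimal PyTorch sketch in which semantic tokens, e.g. DINOv2 patch features, attend to geometric tokens, e.g. Depth Anything features; dimensions and names are hypothetical:

    import torch
    import torch.nn as nn

    class SemanticGeometricFusion(nn.Module):
        """Semantic tokens query geometric tokens via cross-attention, then a residual MLP."""
        def __init__(self, dim=768, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm_q, self.norm_kv = nn.LayerNorm(dim), nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, sem_tokens, geo_tokens):
            q, kv = self.norm_q(sem_tokens), self.norm_kv(geo_tokens)
            fused, _ = self.attn(q, kv, kv)      # semantic queries attend to geometry
            x = sem_tokens + fused               # residual connection
            return x + self.mlp(x)

    sem = torch.randn(2, 196, 768)               # (batch, patches, dim), hypothetical shapes
    geo = torch.randn(2, 196, 768)
    out = SemanticGeometricFusion()(sem, geo)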
CMU VLA Challenge
problem / code
Built a Vision-Language Navigation (VLN) system that answered natural language queries by combining Gemini 2.5 Pro embodied reasoning with a custom ROS state machine. The system produced numerical answers, object references, or waypoint plans under a strict 10-minute limit.
Natural Language Understanding: Used Gemini 2.5 Pro to classify queries and reason over spatial relations such as “closest to the window”
State Machine: Designed a ROS state machine to coordinate exploration, mapping, and answering with dynamic transitions (a stripped-down sketch follows this list)
Deployment Ready: Deployed the system on a real robot for the challenge through clean Docker containerization.
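The real node graph is in the linked code; stripped of ROS plumbing, the control flow boils down to a small state machine like the sketch below (callback names and thresholds are hypothetical):

    from enum import Enum, auto

    class State(Enum):
        EXPLORE = auto()
        QUERY_VLM = auto()
        ANSWER = auto()
        DONE = auto()

    def step(state, ctx):
        """One tick; ctx holds the map, the language query, timers, and callbacks."""
        if state is State.EXPLORE:
            done_exploring = ctx["map_coverage"] > 0.8 or ctx["time_left"] < 120
            return State.QUERY_VLM if done_exploring else State.EXPLORE
        if state is State.QUERY_VLM:
            ctx["answer"] = ctx["reason_fn"](ctx["query"], ctx["map"])   # e.g. a Gemini 2.5 Pro call
            return State.ANSWER
        if state is State.ANSWER:
            ctx["publish_fn"](ctx["answer"])   # numeric answer, object reference, or waypoint plan
            return State.DONE
        return State.DONE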
Pose Graph Optimization for 2D SLAM
report / code
Implemented a 2D SLAM backend where noisy odometry and loop closure constraints were refined into a globally consistent trajectory. Used JAX to compute residuals and Jacobians, applied nonlinear least-squares optimization, and validated improvements with RPY and APE error metrics. Explored the role of confidence weighting in the information matrix and compared against g2o optimization with robust kernels (a minimal residual-and-Jacobian sketch follows this list).
Iterative Optimization: Built custom nonlinear solver in JAX with residual and Jacobian computation for pose updates
Loop Closures: Studied effect of varying odometry vs loop closure confidence weights on trajectory quality visualized in g2o_viewer
Error Evaluation: Quantified improvements via RPY drift and Absolute Pose Error reduction compared to initial odometry
Literature Review: Analyzed “Past, Present & Future of SLAM” survey, contextualizing open problems in robustness and scalability with deep learning-based approaches
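In the spirit of the project's JAX backend (the real residual and information-matrix setup is in the linked code), here is a minimal sketch of one 2D pose-graph edge residual and its Jacobians via automatic differentiation; the poses and measurement below are hypothetical:

    import jax
    import jax.numpy as jnp

    def edge_residual(xi, xj, z_ij):
        """Residual for one odometry/loop-closure edge between poses (x, y, theta):
        the relative pose of xj predicted in xi's frame minus the measurement z_ij."""
        c, s = jnp.cos(xi[2]), jnp.sin(xi[2])
        Ri_T = jnp.array([[c, s], [-s, c]])                            # transpose of R(theta_i)
        dp = Ri_T @ (xj[:2] - xi[:2])                                  # predicted relative translation
        r = jnp.concatenate([dp, jnp.array([xj[2] - xi[2]])]) - z_ij
        return r.at[2].set(jnp.arctan2(jnp.sin(r[2]), jnp.cos(r[2])))  # wrap the angle term

    xi, xj = jnp.zeros(3), jnp.array([1.0, 0.0, 0.1])
    z_ij = jnp.array([1.0, 0.0, 0.1])
    J_i = jax.jacfwd(edge_residual, argnums=0)(xi, xj, z_ij)   # feeds the Gauss-Newton normal equations
    J_j = jax.jacfwd(edge_residual, argnums=1)(xi, xj, z_ij)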
Music, Mental Health, and Representation Learning
publication / code
Applied BERT-based sentiment analysis and k-means clustering to uncover nuanced links between language and acoustic music features in data scraped from mental-health-related subreddits during COVID-19. This research contributed to understanding the relationship between music and mental health through computational methods (a compressed clustering sketch follows this list).
BERT Sentiment Analysis: Fine-tuned transformer models on mental health discourse to extract emotional patterns from 50k+ Reddit posts, achieving 87% accuracy in mood classification
Music Information Retrieval: Developed acoustic feature extraction pipeline using librosa and essentia to correlate musical elements (tempo, key, valence) with psychological states
COVID-19 Impact Study: Applied k-means clustering and statistical analysis to identify significant behavioral shifts in music consumption patterns during pandemic, published findings at INTERSPEECH 2021
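A compressed sketch of the clustering side of the analysis, assuming post embeddings have already been computed with a BERT-style encoder; the file name, cluster counts, and DBSCAN parameters are hypothetical (the real pipeline is in the linked code):

    import numpy as np
    from sklearn.cluster import KMeans, DBSCAN

    embeddings = np.load("post_embeddings.npy")   # hypothetical (num_posts, 768) array

    kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(embeddings)    # coarse thematic clusters
    dbscan = DBSCAN(eps=0.5, min_samples=10, metric="cosine").fit(embeddings)   # dense sub-groups; -1 = noise

    labels_k, labels_db = kmeans.labels_, dbscan.labels_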

Education

Carnegie Mellon University, School of Computer Science
Aug 2024 - May 2026
Master of Science in Robotic Systems Development (MRSD)
CGPA: 4.11/4.0
Teaching: Introduction to Deep Learning
Coursework: Learning for 3D Vision, Generative AI, Deep RL
Learning for 3D vision: 3D generation, volume rendering + NeRFs, Gaussian splatting + diffusion-guided optimization, classifier-free guidance, PointNet classification and segmentation
Generative AI: grouped query attention + RoPE in GPT-2, diffusion models and VAEs, parameter-efficient fine-tuning + DPO, In-Context Learning, Mixture of Experts, paper review
Deep reinforcement learning: policy gradients, Q-learning, Performance Difference Lemma, actor-critic methods, Proximal Policy Optimization, evolutionary methods, DAgger, Imitation Learning
Robot autonomy, mobility and control: grasping, Kalman filtering, control for drones, humanoids
Advanced computer vision: homography, Lucas-Kanade tracking, photometric stereo
Systems engineering: functional architecture, unit tests, project management
International Institute of Information Technology, Hyderabad
Aug 2018 - July 2022
Bachelor of Technology in Electronics and Communication Engineering (Honours)
Major CGPA: 9.02/10
Awards: Dean's Merit List, Undergraduate Research Award
Coursework: Statistics in AI, Topics in Applied Optimization, Mobile Robotics
Statistical methods in AI: PCA, SVMs, Bayesian inference, logistic regression, image classification
Applied optimization: linear programming, convex optimization, singular value decomposition
Computer vision: camera calibration, SIFT, GrabCut, Mask R-CNN, bag of words, Viola-Jones
Mobile robotics: pose-graph optimization, epipolar geometry, RRT, non-linear optimization
Digital image processing: edge detection, morphological operations, alpha blending
Data structures and algorithms: graph algorithms, dynamic programming, complexity analysis
Operating systems: process management, memory allocation, file systems, concurrency
Linear algebra: matrix operations, eigenvalues, vector spaces, linear transformations
Compilers: lexical analysis, parsing, code generation and grammar
Game theory: Nash equilibrium, mechanism design

Teaching Experience

Carnegie Mellon University
Introduction to Deep Learning (11-785)
Course Website
Description: Comprehensive graduate-level course covering neural networks, CNNs, RNNs, transformers, and modern deep learning architectures.
• Created educational content, including slides and tutorials, for NumPy fundamentals and loss functions (Focal Loss, Chamfer Loss, RLHF); a compact focal-loss sketch appears after this list
• Designed a Colab notebook and slides to lead a lab on backpropagation and training convolutional neural networks.
• Collaborated with instructional team to revise and update homework assignments for RNNs, GRUs, Transformers, Language Generation, Diffusion models and PEFT
• Conducted over 40 hours of office hours, labs, and hackathon events, providing hands-on instruction and problem-solving support for undergraduate and graduate students
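As an example of the loss-function material, here is a compact binary focal loss in PyTorch, roughly as it might be presented to students (following Lin et al.; this is an illustrative sketch, not the course's reference solution):

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        """Binary focal loss: down-weights easy examples so training focuses on hard ones."""
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** gamma * bce).mean()

    loss = focal_loss(torch.randn(8), torch.randint(0, 2, (8,)).float())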
International Institute of Information Technology, Hyderabad
Multiple Courses
Courses: Mobile Robotics, Music Mind and Technology, Introduction to Coding Theory
CS7.503.M21: Mobile Robotics: One of IIIT-H's most popular courses, providing students with a comprehensive toolkit for research at the intersection of robotics and computer vision, covering SLAM, computer vision, and planning algorithms
CS9.434.S22: Music, Mind and Technology: An interdisciplinary course using algorithms and mathematics to explore how music is perceived by individuals and groups. Served as head TA, designing evaluations for over 60 graduate and undergraduate students
EC5.205.S21: Introduction to Coding Theory: A fascinating subject building on Shannon's Theory of Communication, exploring the mathematical foundations that underpin everyday communication systems

Technical Skills

Languages: Python, C++, MATLAB, CUDA, Java, Go, Swift
ML/AI: PyTorch, TensorFlow, Scikit-learn, PyTorch3D
Tools: ROS2, Unity 3D, OpenCV, Xcode, Django, Git
Miscellaneous: Rust, JAX, Docker, Kubernetes, AWS, GCP

Original Template taken from here!