Sreeharsha Paruchuri

Hi, I'm Harsha, a robotics graduate student at Carnegie Mellon University's Robotics Institute, pursuing the Master of Science in Robotic Systems Development (MRSD). I'm currently working on giving robots the ability to perceive complex 3D environments and make informed decisions for long-horizon tasks through Reinforcement Learning.

At IIIT-Hyderabad, I explored robotics and computer vision with Prof. Madhava Krishna and computational social science with Prof. Vinoo Alluri, learning to approach problems from multiple angles and design interdisciplinary solutions. Currently, I'm sharpening that foundation through MRSD's blend of coursework, systems engineering, and entrepreneurship training, which challenges me to think beyond code and consider scalability, reliability, and teamwork when building complex robotic systems.

Previously, I worked as a Pre-Doctoral Research Fellow at TCS Research, where I investigated reinforcement learning methods that let embodied agents integrate audio-visual perception for spatial understanding and navigation, bypassing explicit SLAM through learned internal representations.

Research Interests: 3D Computer Vision and Long-Horizon Robot Intelligence for embodied agents.

Email  /  LinkedIn  /  GitHub  /  Resume

News

Oct 2025: Finished in 3rd place in the CMU VLA Challenge and will be presenting our work at IROS 2025! 🏆
Sep 2025: Conducted an in-person lab on Backpropagation and Training Convolutional Neural Networks. 📚
Aug 2025: Completed my summer internship at Mach9 in the Bay Area, focusing on 3D Computer Vision. 🚀
Apr 2025: Successfully demonstrated our Apple Vision Pro + Robot assisted semi-autonomous Total Knee Arthroplasty system. 🩺
Aug 2024: Started graduate school at CMU’s Robotics Institute! 🤖
May 2024: Submitted our paper on Audio-Visual Navigation to CoRL 2024. 🤞
Feb 2024: Admitted to CMU’s MRSD program!
Feb 2024: Invited to Google Research Week 2024 in Bangalore, India. 🎓
Nov 2023: Placed 4th internationally in the Habitat Open Vocabulary Mobile Manipulation Challenge at NeurIPS 2023.
Jul 2022: Started working at TCS Research, Kolkata, on long-horizon robot navigation using Deep Reinforcement Learning.
Jul 2022: Graduated from IIIT-H with honours in Robot Perception! Grateful to my family and friends for making it possible. ✨
May 2022: Built an autonomous sanitization robot on a budget of $5000 and finished as runner-up in a nationwide competition. 🧼

Industry Experience

Mach9
May 2025 - Aug 2025
Perception Software Engineering Intern
Focus: Developed a CUDA-accelerated 2D-3D feature correspondence pipeline and fine-tuned Vision-Language Models to speed up quality assurance for outdoor surveying systems (a simplified projection sketch follows this list)
• Developed and deployed Pavement Symbol Extraction functionality in the Digital Surveyor software via CUDA-accelerated coordinate frame transformations and segmentation masks.
• Implemented unit and CI testing pipelines with GitHub Actions to validate CUDA kernels and vector-field clustering modules, ensuring production-grade reliability.
• Fine-tuned a Vision-Language Model (VLM) to build a secondary-inference pipeline that classifies extracted open-set painted symbols according to user specifications.
• Conducted 70+ controlled ablation experiments with A/B testing on Hungarian assigner costs, loss weights, model queries, multi-scale deformable attention, and encoder-decoder expressivity, boosting the production model's performance by 4%.
• Utilised methods from the object-detection literature to qualitatively capture a DETR-based polyline detection model's uncertainty, expediting downstream Quality Assurance and Quality Control processes and saving company and customer resources.
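For flavour, here is a minimal NumPy sketch of the 2D-3D association idea behind such a pipeline: projecting surveyed 3D points into an image so that pixel-level segmentation masks can be tied back to world coordinates. The intrinsics, pose, and point cloud below are hypothetical stand-ins, not Mach9 code.

    import numpy as np

    def project_points(points_w, K, R, t):
        """Project Nx3 world points into pixel coordinates with a pinhole model.
        K: 3x3 intrinsics, (R, t): world-to-camera rotation and translation."""
        points_c = points_w @ R.T + t        # transform into the camera frame
        uv = points_c @ K.T                  # apply intrinsics
        uv = uv[:, :2] / uv[:, 2:3]          # perspective divide -> pixel coordinates
        return uv, points_c[:, 2]            # pixel locations and depths

    # Hypothetical usage: points whose projections land inside a symbol mask
    # inherit that symbol's label in 3D.
    K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
    uv, depth = project_points(np.random.rand(100, 3), K, R, t)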
Tata Consultancy Services Research
July 2022 - July 2024
Pre-Doctoral Research Fellow
Focus: Led research in cognitive robotics, emphasizing navigation built on audio-visual feature correspondence and reinforcement learning for active SLAM.
Audio-Visual Navigation: Led development of an embodied AI agent with multimodal sensing, training an online RL policy with a novel class-agnostic reward and reducing path length by 21% (a toy version of this style of reward shaping is sketched below)
Offline RL for Indoor Robot Navigation: Built a simulation pipeline to collect large-scale trajectory datasets and trained a Causal Decision Transformer with early multimodal fusion; integrated environment randomization, behavior cloning baselines, and replay buffer curation to improve policy robustness and sample efficiency
CLIP-Enhanced Scene Graphs: Designed a contrastive-learning framework to compute visual-language embeddings, leveraging GNNs to model object-region relationships
Open Vocabulary Manipulation (NeurIPS 2023): Developed an active SLAM exploration algorithm conditioned on a probabilistic semantic map, improving task success by 60%.
• Volunteered for the Project Synergy initiative by TCS wherein volunteers taught written and spoken English to students in a Bangla-medium government school.
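The papers describe the actual objectives; purely as an illustration of the kind of class-agnostic, distance-based reward shaping common in audio-goal navigation, here is a toy sketch (all names and constants are hypothetical):

    def shaped_reward(prev_dist, curr_dist, reached_goal,
                      step_penalty=0.01, success_bonus=10.0):
        """Dense navigation reward: progress toward the sounding goal, a small
        per-step cost, and a terminal bonus. Class-agnostic because it never
        references the goal object's semantic category."""
        reward = (prev_dist - curr_dist) - step_penalty
        if reached_goal:
            reward += success_bonus
        return reward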
Robotics Research Center (RRC, IIIT-H)
Jan 2020 - June 2022
Research Assistant
Focus: Worked on dense 3D reconstruction and SLAM techniques such as pose-graph optimization for indoor and outdoor autonomy on self-driving vehicles.
Autonomous Sanitization Robot: Designed and implemented an end-to-end robotic system during COVID-19 to autonomously sanitize indoor spaces, integrating computer vision, Visual-SLAM, and coverage-based navigation
Sim-to-Real Deployment: Built Gazebo simulation environments for iterative testing, then transferred stack to hardware platform with onboard sensors and sanitization actuators; finished runner-up among 140 teams
LiDAR SLAM: Evaluated LiDAR odometry and mapping approaches such as LOAM using CARLA simulation and outdoor driving data, analyzing localization accuracy and map consistency
Depth Estimation: Implemented stereo and monocular depth estimation methods on driving datasets including KITTI and NuScenes, developing a ROS package for multi-view bundle adjustment
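As a quick reminder of the stereo geometry underlying the depth-estimation work: for a rectified pair with focal length fx (pixels) and baseline B (metres), depth is Z = fx * B / disparity. A small NumPy sketch with hypothetical, KITTI-like parameters:

    import numpy as np

    def disparity_to_depth(disparity, fx, baseline):
        """Convert a disparity map (pixels) into metric depth; zero disparity maps to infinity."""
        depth = np.full(disparity.shape, np.inf)
        valid = disparity > 0
        depth[valid] = fx * baseline / disparity[valid]
        return depth

    depth = disparity_to_depth(np.array([[64.0, 32.0, 0.0]]), fx=721.5, baseline=0.54)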

I have also been a part of

Bosch Research and Technology Center
May 2021 - Aug 2021
Computer Vision Intern
• Fused laser, camera, and odometry data using Kalman filtering to boost online multi-object tracking performance by 11% IoU on outdoor autonomous driving datasets (a minimal filter sketch follows below)
• Augmented difficult-to-obtain real-world LiDAR datasets using synthetic data from generative models and physics engines, improving 3D object detection networks for outdoor scenarios
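The Bosch stack itself is proprietary; as a sketch of the predict/update cycle used when fusing odometry with camera and laser detections, here is a minimal constant-velocity Kalman filter in NumPy (state layout, matrices, and noise values are hypothetical):

    import numpy as np

    dt = 0.1
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])  # constant-velocity motion model
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])                                # only position is measured
    Q = 0.01 * np.eye(4)   # process noise (hypothetical)
    Rm = 0.5 * np.eye(2)   # measurement noise (hypothetical)

    def kf_step(x, P, z):
        """One predict + update for state x = [px, py, vx, vy] given measurement z = [px, py]."""
        x, P = F @ x, F @ P @ F.T + Q                 # predict with the motion model
        S = H @ P @ H.T + Rm
        K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
        x = x + K @ (z - H @ x)                       # correct with the detection
        P = (np.eye(4) - K @ H) @ P
        return x, P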
PreCog Lab, IIIT-H
2020 - 2021
Research Assistant - Information Retrieval and Computational Social Science
• Applied statistical machine learning and Music Information Retrieval to analyze lyrical regularities as early indicators of mental illness; published results at INTERSPEECH 2021
• Scraped Reddit data to link music-sharing trends with mental health during COVID-19 using BERT embeddings and DBSCAN clustering; published in a medical journal

Projects & Research

Augmented-Reality and Robot Assisted Knee Surgery
website / code
Gathered and analyzed requirements from user studies, competitor analysis, and sponsor input to inform system development. Processed 3D and RGB data from the Apple Vision Pro to detect bone models in the environment via ICP registration.
Project Leadership: Led a 5-person team as Project Manager, driving scheduling, sponsor communication, and system integration for an AR-assisted surgical robotics platform
Accuracy-driven Perception: Achieved sub-4 mm drilling accuracy in total knee arthroplasty using a KUKA MED7 arm with multi-stage pointcloud registration (SAM2 + ICP; a minimal rigid-alignment sketch follows this list)
AR Integration: Integrated Apple Vision Pro for dynamic bone tracking and real-time surgeon-in-the-loop planning across long surgical horizons
Motion Planning: Designed and deployed a ROS + MoveIt planning subsystem that adaptively updates as surgical pins are drilled, enabling safe trajectory generation
Hardware Development: Built a custom 3D-printed drill end-effector with embedded control electronics, activated via trajectory execution for autonomous drilling
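The full perception stack (SAM2 segmentation followed by multi-stage ICP against the bone model) lives in the linked repository; the core alignment step inside each ICP iteration is the closed-form rigid transform between corresponded point sets, sketched below with NumPy (inputs are hypothetical):

    import numpy as np

    def best_fit_transform(src, dst):
        """Least-squares rigid transform (R, t) aligning corresponded Nx3 points src -> dst (Kabsch)."""
        mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
        H = (src - mu_s).T @ (dst - mu_d)         # cross-covariance of centred points
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                  # guard against reflections
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        return R, t

One ICP iteration pairs each scanned point with its nearest model point, solves for (R, t) as above, applies the transform, and repeats until the alignment error converges.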
3D Foundation-Models for Monocular Video Reconstruction
report / code
Implemented semantic-geometric feature fusion using cross-attention between foundation model embeddings (DINOv2, Depth Anything) in a hierarchical state representation to recover camera extrinsics. Devised an adaptive keyframe selection strategy for confidence-aware pointmap refinement using a DUST3R-style architecture.
Foundation Model Fusion: Designed a cross-attention mechanism to combine DINOv2 semantic features with Depth Anything geometric priors, achieving robust 3D scene understanding from monocular video (see the fusion sketch after this list)
Adaptive Keyframe Selection: Developed confidence-aware algorithm that dynamically selects optimal frames for reconstruction, improving pointmap quality by 30% over uniform sampling
DUST3R Architecture: Implemented hierarchical state representation with multi-scale feature pyramids to handle camera motion estimation and dense 3D reconstruction simultaneously
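As an illustration of the fusion block described above (not the exact implementation from the report), here is a minimal PyTorch sketch in which semantic tokens, e.g. DINOv2 patch features, attend to geometric tokens, e.g. Depth Anything features; dimensions and names are hypothetical:

    import torch
    import torch.nn as nn

    class SemanticGeometricFusion(nn.Module):
        """Semantic tokens query geometric tokens via cross-attention, then a residual MLP."""
        def __init__(self, dim=768, heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm_q, self.norm_kv = nn.LayerNorm(dim), nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, sem_tokens, geo_tokens):
            q, kv = self.norm_q(sem_tokens), self.norm_kv(geo_tokens)
            fused, _ = self.attn(q, kv, kv)      # semantic queries attend to geometry
            x = sem_tokens + fused               # residual connection
            return x + self.mlp(x)

    sem = torch.randn(2, 196, 768)               # (batch, patches, dim), hypothetical shapes
    geo = torch.randn(2, 196, 768)
    out = SemanticGeometricFusion()(sem, geo)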
CMU VLA Challenge
problem / code
Built a Vision-Language Navigation (VLN) system that answered natural language queries by combining Gemini 2.5 Pro embodied reasoning with a custom ROS state machine. The system produced numerical answers, object references, or waypoint plans under a strict 10-minute limit.
Natural Language Understanding: Used Gemini 2.5 Pro to classify queries and reason over spatial relations such as “closest to the window”
State Machine: Designed a ROS state machine to coordinate exploration, mapping, and answering with dynamic transitions (a stripped-down sketch follows this list)
Deployment Ready: Deployed the system on a real robot for the challenge through clean Docker containerization.
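The real node graph is in the linked code; stripped of ROS plumbing, the control flow boils down to a small state machine like the sketch below (callback names and thresholds are hypothetical):

    from enum import Enum, auto

    class State(Enum):
        EXPLORE = auto()
        QUERY_VLM = auto()
        ANSWER = auto()
        DONE = auto()

    def step(state, ctx):
        """One tick; ctx holds the map, the language query, timers, and callbacks."""
        if state is State.EXPLORE:
            done_exploring = ctx["map_coverage"] > 0.8 or ctx["time_left"] < 120
            return State.QUERY_VLM if done_exploring else State.EXPLORE
        if state is State.QUERY_VLM:
            ctx["answer"] = ctx["reason_fn"](ctx["query"], ctx["map"])   # e.g. a Gemini 2.5 Pro call
            return State.ANSWER
        if state is State.ANSWER:
            ctx["publish_fn"](ctx["answer"])   # numeric answer, object reference, or waypoint plan
            return State.DONE
        return State.DONE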
Pose Graph Optimization for 2D SLAM
report / code
Implemented a 2D SLAM backend where noisy odometry and loop closure constraints were refined into a globally consistent trajectory. Used JAX to compute residuals and Jacobians, applied nonlinear least-squares optimization, and validated improvements with RPY and APE error metrics. Explored the role of confidence weighting in the information matrix and compared against g2o optimization with robust kernels (a minimal residual-and-Jacobian sketch follows this list).
Iterative Optimization: Built custom nonlinear solver in JAX with residual and Jacobian computation for pose updates
Loop Closures: Studied effect of varying odometry vs loop closure confidence weights on trajectory quality visualized in g2o_viewer
Error Evaluation: Quantified improvements via RPY drift and Absolute Pose Error reduction compared to initial odometry
Literature Review: Analyzed “Past, Present & Future of SLAM” survey, contextualizing open problems in robustness and scalability with deep learning-based approaches
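In the spirit of the project's JAX backend (the real residual and information-matrix setup is in the linked code), here is a minimal sketch of one 2D pose-graph edge residual and its Jacobians via automatic differentiation; the poses and measurement below are hypothetical:

    import jax
    import jax.numpy as jnp

    def edge_residual(xi, xj, z_ij):
        """Residual for one odometry/loop-closure edge between poses (x, y, theta):
        the relative pose of xj predicted in xi's frame minus the measurement z_ij."""
        c, s = jnp.cos(xi[2]), jnp.sin(xi[2])
        Ri_T = jnp.array([[c, s], [-s, c]])                            # transpose of R(theta_i)
        dp = Ri_T @ (xj[:2] - xi[:2])                                  # predicted relative translation
        r = jnp.concatenate([dp, jnp.array([xj[2] - xi[2]])]) - z_ij
        return r.at[2].set(jnp.arctan2(jnp.sin(r[2]), jnp.cos(r[2])))  # wrap the angle term

    xi, xj = jnp.zeros(3), jnp.array([1.0, 0.0, 0.1])
    z_ij = jnp.array([1.0, 0.0, 0.1])
    J_i = jax.jacfwd(edge_residual, argnums=0)(xi, xj, z_ij)   # feeds the Gauss-Newton normal equations
    J_j = jax.jacfwd(edge_residual, argnums=1)(xi, xj, z_ij)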
Music, Mental Health, and Representation Learning
publication / code
Applied BERT-based sentiment analysis and k-means clustering to uncover nuanced links between language and acoustic music features in data scraped from mental-health-related subreddits during COVID-19. This research contributed to understanding the relationship between music and mental health through computational methods (a compressed clustering sketch follows this list).
BERT Sentiment Analysis: Fine-tuned transformer models on mental health discourse to extract emotional patterns from 50k+ Reddit posts, achieving 87% accuracy in mood classification
Music Information Retrieval: Developed acoustic feature extraction pipeline using librosa and essentia to correlate musical elements (tempo, key, valence) with psychological states
COVID-19 Impact Study: Applied k-means clustering and statistical analysis to identify significant behavioral shifts in music consumption patterns during pandemic, published findings at INTERSPEECH 2021
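A compressed sketch of the clustering side of the analysis, assuming post embeddings have already been computed with a BERT-style encoder; the file name, cluster counts, and DBSCAN parameters are hypothetical (the real pipeline is in the linked code):

    import numpy as np
    from sklearn.cluster import KMeans, DBSCAN

    embeddings = np.load("post_embeddings.npy")   # hypothetical (num_posts, 768) array

    kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(embeddings)    # coarse thematic clusters
    dbscan = DBSCAN(eps=0.5, min_samples=10, metric="cosine").fit(embeddings)   # dense sub-groups; -1 = noise

    labels_k, labels_db = kmeans.labels_, dbscan.labels_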

Education

Carnegie Mellon University, School of Computer Science
Aug 2024 - May 2026
Master of Science in Robotic Systems Development (MRSD)
CGPA: 4.11/4.0
Teaching: Introduction to Deep Learning
Coursework: Learning for 3D Vision, Generative AI, Deep RL
Learning for 3D vision: 3D generation, volume rendering + NeRFs, Gaussian splatting + diffusion-guided optimization, classifier-free guidance, PointNet classification and segmentation
Generative AI: grouped query attention + RoPE in GPT-2, diffusion models and VAEs, parameter-efficient fine-tuning + DPO, In-Context Learning, Mixture of Experts, paper review
Deep reinforcement learning: policy gradients, Q-learning, Performance Difference Lemma, actor-critic methods, Proximal Policy Optimization, evolutionary methods, DAgger, Imitation Learning
Robot autonomy, mobility and control: grasping, Kalman filtering, control for drones, humanoids
Advanced computer vision: homography, Lucas-Kanade tracking, photometric stereo
Systems engineering: functional architecture, unit tests, project management
International Institute of Information Technology, Hyderabad
Aug 2018 - July 2022
Bachelor of Technology in Electronics and Communication Engineering (Honours)
Major CGPA: 9.02/10
Awards: Dean's Merit List, Undergraduate Research Award
Coursework: Statistics in AI, Topics in Applied Optimization, Mobile Robotics
Statistical methods in AI: PCA, SVMs, Bayesian inference, logistic regression, image classification
Applied optimization: linear programming, convex optimization, singular value decomposition
Computer vision: camera calibration, SIFT, GrabCut, Mask R-CNN, bag of words, Viola-Jones
Mobile robotics: pose-graph optimization, epipolar geometry, RRT, non-linear optimization
Digital image processing: edge detection, morphological operations, alpha blending
Data structures and algorithms: graph algorithms, dynamic programming, complexity analysis
Operating systems: process management, memory allocation, file systems, concurrency
Linear algebra: matrix operations, eigenvalues, vector spaces, linear transformations
Compilers: lexical analysis, parsing, code generation and grammar
Game theory: Nash equilibrium, mechanism design

Teaching Experience

Carnegie Mellon University
Introduction to Deep Learning (11-785)
Course Website
Description: Comprehensive graduate-level course covering neural networks, CNNs, RNNs, transformers, and modern deep learning architectures.
• Created educational content, including slides and tutorials, for NumPy fundamentals and loss functions (Focal Loss, Chamfer Loss, RLHF); a compact focal-loss sketch appears after this list
• Designed a Colab notebook and slides to lead a lab on backpropagation and training convolutional neural networks.
• Collaborated with instructional team to revise and update homework assignments for RNNs, GRUs, Transformers, Language Generation, Diffusion models and PEFT
• Conducted over 40 hours of office hours, labs, and hackathon events, providing hands-on instruction and problem-solving support for undergraduate and graduate students
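As an example of the loss-function material, here is a compact binary focal loss in PyTorch, roughly as it might be presented to students (following Lin et al.; this is an illustrative sketch, not the course's reference solution):

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        """Binary focal loss: down-weights easy examples so training focuses on hard ones."""
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** gamma * bce).mean()

    loss = focal_loss(torch.randn(8), torch.randint(0, 2, (8,)).float())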
International Institute of Information Technology, Hyderabad
Multiple Courses
Courses: Mobile Robotics, Music Mind and Technology, Introduction to Coding Theory
CS7.503.M21: Mobile Robotics: One of IIIT-H's most popular courses, providing students with a comprehensive toolkit for research at the intersection of robotics and computer vision, covering SLAM, computer vision, and planning algorithms
CS9.434.S22: Music, Mind and Technology: An interdisciplinary course using algorithms and mathematics to explore how music is perceived by individuals and groups. Served as head TA, designing evaluations for over 60 graduate and undergraduate students
EC5.205.S21: Introduction to Coding Theory: A fascinating subject building on Shannon's Theory of Communication, exploring the mathematical foundations that underpin everyday communication systems

Technical Skills

Languages: Python, C++, MATLAB, CUDA, Java, Go, Swift
ML/AI: PyTorch, TensorFlow, Scikit-learn, PyTorch3D
Tools: ROS2, Unity 3D, OpenCV, Xcode, Django, Git
Miscellaneous: Rust, JAX, Docker, Kubernetes, AWS, GCP

Original Template taken from here!