Intelligent robotics with digital-twin alignment: semantic navigation, manipulation, planning, and human-to-robot action transformation

Format

Thesis

Abstract

This dissertation advances AI-empowered indoor robotics through four interconnected contributions that unify navigation, manipulation, semantic planning, and human-to-robot action transformation within a digital-twin-aligned framework. GRIP, a grid-aware semantic navigation module, integrates symbolic scene understanding with hybrid search-and-policy execution to achieve robust, context-aware object-goal navigation (ObjectNav). PathFormer, a transformer-based manipulation model structured around a 3D spatial-semantic grid, generates smooth, interpretable, and physically consistent trajectories that remain tightly aligned with digital-twin simulation. KG-Transformer, a knowledge-guided semantic planner, uses a lightweight digital twin to calibrate execution, veto unsafe behaviors, and autonomously repair failing plans across diverse indoor environments. ActionFormer, an action-generation transformer, introduces a unified imitation-learning pipeline that integrates human-activity recognition, human-motion generation, and robot-motion generation. It supports more than 20 complex human activities and produces robot-ready demonstrations that generalize across platforms, enabling end-to-end imitation learning from video and landmark sequences. Collectively, these contributions establish a coherent foundation for AI-empowered robotics grounded in digital-twin intelligence.

Across benchmarks and real-world deployments, GRIP achieves up to 9.6% higher success rate and more than 2× gains in path efficiency (SPL and SAE). PathFormer produces manipulation trajectories that stay consistent with the digital twin, validated through robust sim-to-real transfer. KG-Transformer achieves 99.6% executability, delivers a 4.6-point improvement on unseen-scene tasks, and eliminates safety violations in both simulated and multi-robot execution. ActionFormer attains state-of-the-art performance in human-activity recognition and high execution accuracy across more than 20 activities, generating realistic human-motion traces and corresponding robot-motion trajectories for embodied robotic demonstration. Together, these advances deliver a trustworthy, semantically aligned, high-performance simulation-to-reality pipeline that significantly enhances the adaptability, reliability, and real-world readiness of autonomous indoor robotic systems.

Table of Contents

Introduction -- GRIP: a unified framework for grid-based relay and co-occurrence-aware planning in dynamic environments -- PathFormer: a transformer with 3D grid constraints for digital twin robot-arm trajectory generation -- KG-Transformer: evidential knowledge-graph planning for safe language-to-action execution -- ActionFormer: a unified framework for human-to-robot action generation -- Conclusion and future work

Degree

Ph.D. (Doctor of Philosophy)