Ziyu Zhong

📍 Lund, Sweden
✉️ ziyu.zhong@eit.lth.se
LinkedIn

About

I am currently a doctoral student in Networked and Extended Reality (XR) at Lund University.
My research lies at the intersection of:

Multimodal machine learning (pose, video, IMU/mocap),
Real-time XR/AR systems (CloudXR, edge-assisted rendering),
Latency-aware prediction and inference optimization.

Before my PhD, I completed an M.Sc. in Computer Science at The University of Hong Kong and a B.A. in Digital Media Development at York University.
I also worked as a software engineer at Tencent (QQ, PCG) and ByteDance (Video Architecture), building large-scale multimedia and graphics-related systems.

I enjoy building end-to-end systems that go from data collection and calibration, through ML modeling and optimization, to interactive prototypes in XR, AR, and mobile environments.

Research Interests

Vision / XR foundation models and representation learning
XR/VR/AR pose tracking and motion prediction
Latency-aware inference and edge/cloud offloading

Education

Ph.D. in Networked & Extended Reality (XR), Lund University, Sweden
2023 – Present
M.Sc. in Computer Science (Multimedia Computing), The University of Hong Kong, Hong Kong
2020 – 2022
B.A. in Digital Media Development, York University, Canada
2016 – 2020

Projects and Publications

Conference Papers

[VRST] Predictability-Aware Motion Prediction for Edge XR via High-Order Error-State Kalman Filtering
ACM Symposium on Virtual Reality Software and Technology (VRST), ACM, 2025.
Authors: Ziyu Zhong, …
Summary: Latency-aware motion prediction for distributed / cloud XR rendering, compensating motion-to-photon latency using SLAM-based and ML-based pose prediction.
DOI: 10.1145/3756884.3765973

Real-Time Translation System (M.Sc. Thesis)

Designed a full real-time simultaneous translation pipeline:
- speech-to-text,
- neural translation (Transformer-based),
- low-latency delivery to mobile clients.
Built a production-style cloud service and an Android demo app to showcase the end-to-end system.
Focused on balancing latency, quality of experience (QoE), and model complexity, which gave me practical experience deploying ML systems under real-world conditions.

GPU-Accelerated Max-Flow for Real-Time Workloads

Implemented a CUDA-based Max-Flow solver, offloading CPU bottlenecks to GPU.
Achieved an ≈80% reduction in runtime compared to the CPU baseline.
Used NVIDIA Nsight tools to analyze warp divergence, memory throughput, and kernel-level bottlenecks.
This work informs my approach to efficient inference and acceleration.

Industry Experience

Software Engineer — QQ (PCG)

Tencent, Shenzhen, China · Dec 2020 – Feb 2023

Maintained and evolved a large-scale Android codebase for QQ (PCG), working across legacy modules and modern Kotlin/Jetpack components.
Owned end-to-end features in high-traffic UI flows, coordinating changes across app, rendering, and service layers to ensure stability and backward compatibility.
Refactored critical paths (Java/Kotlin/C++) to reduce ANRs and jank, profiling with Systrace/Perfetto and optimizing RecyclerView diffing, threading, and I/O.
Integrated and hardened cross-platform UI via the PTS2 2D engine (C++), including a DSL parser and shared rendering/layout primitives for Android/iOS.
Improved build and release reliability with Gradle/CMake optimizations, modularization, and CI pipelines; monitored performance and crashes via Grafana and internal metrics.
Collaborated with multi-team stakeholders (graphics, backend, QA) to triage device-specific issues, validate fixes across diverse Android OEMs, and ship weekly releases at scale.

Software Engineer Intern — Video Architecture

ByteDance, Beijing, China · Jun 2020 – Aug 2020

Gained deep familiarity with the cloud gaming and cloud phone codebase, tracing end-to-end data paths across capture, encode, transport, and render pipelines.
Designed and validated signaling flows (session setup/teardown, device–cloud state sync, QoS/bitrate adaptation) using ByteRTC/WebRTC-like mechanisms, including NAT traversal, ICE/STUN/TURN, and custom control channels.
Improved reliability and latency by profiling message queues and transport layers (UDP/TCP), optimizing handshake, heartbeat, and reconnect logic for heterogeneous devices.
Implemented device–cloud state synchronization (Java/C++ with ByteRTC) for low-latency streaming.
Worked on a cross-platform AirPlay streaming SDK (Android/Windows), focusing on multimedia performance and compatibility.

Technical Skills

Machine Learning: PyTorch, TensorFlow; temporal models; basic GPU acceleration.
Programming: Python, C/C++, Java, Kotlin, Rust, CUDA, Shell, CMake.
Systems & Tools: Git, Docker, Kubernetes, Slurm, GDB, TSan, Wireshark, Grafana, NVIDIA Nsight Systems/Compute.
Platforms: Android, Linux, macOS, Windows; Raspberry Pi, Arduino.
Web & Data: HTML/CSS/JavaScript, SQL, DB2.