Research
Trajectory representation learning is one of my established research topics. It aims to learn representations of trajectories (of vehicles, people, vessels, etc.) that generalize across downstream tasks. This also extends to building multi-task or foundation models for trajectories that can perform several tasks at once.
UniTE: A Survey and Unified Pipeline for Pre-training Spatiotemporal Trajectory Embeddings
Surveys existing methods for learning reusable representations of movement trajectories and brings them together into a single modular framework with shared code, so that new methods can be built, compared, and evaluated on common ground.
UVTM: Universal Vehicle Trajectory Modeling with ST Feature Domain Generation
A single model for vehicle GPS trajectories that handles many tasks such as travel time estimation, trajectory recovery, and trajectory prediction, instead of maintaining a separate model for each. It stays accurate even when trajectories are sparse or only part of their features are available, by learning to rebuild dense and complete trajectories from incomplete ones.
Pre-training General Trajectory Embeddings with Maximum Multi-view Entropy Coding
Learns general-purpose representations of movement trajectories from unlabeled data that capture both travel behavior and spatial and temporal patterns. The learned representations avoid task-specific bias so they transfer well across many downstream tasks.
Pre-training Context and Time Aware Location Embeddings from Spatial-Temporal Trajectories for User Next Location Prediction
Pre-trains location representations from movement trajectories that capture how the meaning of a place changes with its surrounding context and the time of visit, leading to more accurate prediction of a user's next location.
TransferTraj: A Vehicle Trajectory Learning Model for Region and Task Transferability
Learns from vehicle GPS trajectories in a way that transfers across different geographic regions and different prediction tasks without retraining, removing the need to keep separate specialized models. Handles each task by treating it as recovering hidden parts of a trajectory, so one model serves many tasks even with limited data.
Spatiotemporal data mining is a broader topic that includes the above. It covers research on extracting and using patterns from large-scale data that carries spatial information and changes over time (usually sourced from transportation scenarios).
DiSGMM: A Method for Time-varying Microscopic Weight Completion on Road Networks
Fills in missing fine-grained, time-varying traffic conditions on road networks, such as travel speeds on individual road segments during specific time periods, when observations are sparse both across segments and within each segment. Estimates the full range of likely conditions for each segment and time rather than a single value.
Origin-Destination Travel Time Oracle for Map-based Services
Estimates the travel time between an origin and a destination at a given departure time by learning from many historical trips that connect the same pair of locations. It first infers a likely route between the two points and then predicts the travel time along it, improving accuracy for map-based navigation services.
RIPCN: A Road Impedance Principal Component Network for Probabilistic Traffic Flow Forecasting
Forecasts future traffic flow across a road network while also estimating how uncertain each prediction is, combining transportation theory with learned models to capture how congestion shifts traffic between connected roads over time.
PLMTrajRec: A Scalable and Generalizable Trajectory Recovery Method with Pre-trained Language Models
Recovers the missing points in sparse movement trajectories to restore detailed paths, while needing only a small amount of dense training data and generalizing across trajectories recorded at different sampling rates.
Path-LLM: A Multi-Modal Path Representation Learning by Aligning and Fusing with Large Language Models
Learns representations of paths in a road network by combining the network structure with text that describes physical and regional context, which earlier methods left out. The combined view improves accuracy on tasks such as ranking paths and estimating travel time, including settings with little or no labeled data.
AI for materials science is a new research direction I am exploring. The promise is that data-mining-driven AI can enable the discovery of new materials with tailored properties, which is a non-trivial process in traditional materials design.
Inverse Design of Amorphous Materials with Targeted Properties
Generates atomic structures of disordered materials such as glasses that match desired target properties, and refines them into stable low-energy configurations. Also introduces new datasets of amorphous materials to support this kind of design.
AMDEN: Amorphous Materials DEnoising Network
Research on Inverse Design of Materials Using Diffusion Probabilistic Models
This project develops diffusion probabilistic models, first to understand the relationship between chemistry/structure and material properties, and then to enable the inverse design of new materials with specific properties.
SculptDrug: A Spatial Condition-Aware Bayesian Flow Model for Structure-based Drug Design
Generates drug molecules that fit a target protein's three-dimensional structure, keeping the molecules shaped to sit within the protein's surface and consistent with both its overall form and fine details. This produces more accurate candidate molecules for structure-based drug discovery.