Tracking Any Point (TAP) plays a crucial role in motion analysis. Video-based approaches rely on iterative local matching for tracking, but they assume linear motion during the blind time between frames, which leads to loss of the target point under large displacements or nonlinear motion. Event cameras, with their high temporal resolution and freedom from motion blur, provide continuous, fine-grained motion information, capturing subtle variations with microsecond precision. This paper presents an event-based framework for tracking any point that tackles the challenges posed by the spatial sparsity and motion sensitivity of events through two tailored modules. Specifically, to resolve ambiguities caused by event sparsity, a motion-guidance module incorporates kinematic features into the local matching process. In addition, a variable motion aware module ensures temporally consistent responses that are insensitive to varying velocities, thereby enhancing matching precision. To validate the approach, an event dataset for tracking any point is constructed by simulation and used in experiments together with two real-world datasets. The results show that the proposed method outperforms existing state-of-the-art (SOTA) methods while running 150% faster with a comparable number of model parameters.
(a) Framework overview. Given the event data and the initial positions of the target points, the model initializes the locations for subsequent time steps along with appearance features. It then iteratively computes kinematic features and updates the appearance correlation map at each point to refine the trajectory. (b) Motion-Guidance Module (MGM). MGM extracts kinematic features from gradient information in the event stream; these features guide appearance feature matching and, together with the appearance features, form a dynamic-appearance matching space. (c) Variable Motion Aware Module (VMAM). VMAM leverages the kinematic features from MGM to produce temporally consistent feature responses, yielding robust correlation maps.
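The iterative refinement loop in (a) can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the paper's implementation: the function names (`extract_kinematic_features`, `refine_trajectory`), the local-window heuristic standing in for MGM, and the damped update rule are all assumptions introduced for exposition.

```python
import numpy as np

def extract_kinematic_features(events, positions):
    """Hypothetical stand-in for MGM: for each tracked point, summarize
    nearby event activity as a displacement cue (a crude flow proxy).
    `events` is an (N, 4) array of (x, y, t, polarity) tuples."""
    feats = np.zeros_like(positions, dtype=float)
    for i, (px, py) in enumerate(positions):
        # Events falling inside a small spatial window around the point.
        near = events[(np.abs(events[:, 0] - px) < 3)
                      & (np.abs(events[:, 1] - py) < 3)]
        if len(near):
            # Offset from the point to the local event centroid.
            feats[i] = near[:, :2].mean(axis=0) - np.array([px, py])
    return feats

def refine_trajectory(events, init_positions, n_iters=4):
    """Initialize locations from the given points, then iteratively
    nudge them using kinematic features (motion guidance)."""
    positions = init_positions.astype(float).copy()
    for _ in range(n_iters):
        motion = extract_kinematic_features(events, positions)
        positions += 0.5 * motion  # damped update toward event activity
    return positions
```

In the actual framework the kinematic features additionally reshape the appearance correlation map (via VMAM) rather than directly displacing the point; the sketch only conveys the iterate-and-refine structure.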