Abstract: Video Moment Retrieval is a common task for evaluating the performance of visual-language models: it involves localising the start and end times of moments in videos from query sentences. The ...