large language models, one-shot video tuning, segmentation guidance, text-to-video generation, video semantic analysis, video thumbnail generation.