Enhancing Long Video Question Answering with Scene-Localized Frame Grouping

1 Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China; 2 School of Electronic and Computer Engineering, Peking University, Shenzhen, China; 3 School of Computer Science, Wuhan University, Wuhan, China; 4 University of Science and Technology of China, Hefei, China
Main Figures

Figure 1: The SLFG method

Comparison Figure 1

Figure 2a: SceneQA task definition

Comparison Figure 2

Figure 2b: LVBench vs. LVSQA

Citation

@article{your_paper_2024,
  title={Your paper title},
  author={Author 1 and Author 2 and Author 3},
  journal={Journal or conference name},
  year={2024}
}