Xiang Li

Xiang Li (李翔)
Associate Professor, College of Computer Science, Nankai University
Address: No. 38, Tongyan Road, Haihe Education Park, Tianjin, China
Email: xiang.li.implus [at] {nankai.edu.cn}
Research Group Page: IMPlus@PCALab

About Me [GitHub] [Google Scholar] [Research Group]

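Xiang Li is an Associate Professor and doctoral supervisor at Nankai University. He was selected for Nankai's Hundred Young Academic Leaders program (南开百青), received the May Fourth Youth Medal, was selected for the Postdoctoral Innovative Talent Program (Tier A), received a nomination for the CCF Outstanding Doctoral Dissertation Award and the Wu Wenjun AI Outstanding Youth Award, and is listed among Stanford's World's Top 2% Scientists. He has published more than 40 papers in CCF-A conferences and journals, with over 17,000 Google Scholar citations; two of his first-author papers have been cited more than 3,000 and 1,500 times respectively. His work has been closely followed by the team of Prof. Hinton, Nobel Prize in Physics and Turing Award laureate, has become a standard component of the YOLO series of mainstream lightweight object detectors used in industry, and received an annual Best Paper Award nomination from the journal CVMJ. Over many years he has led teams to multiple championships and top placements in leading international big-data and AI competitions, with cumulative prize money exceeding 1.3 million. These include first place in the Jittor AI Challenge (1st of 154 teams), the global championship of the first Di-Tech algorithm competition held by Didi Research Institute (1st of 7,664 teams), and first place in the first Ali Mobile Recommendation Algorithm contest of the Alibaba Tianchi Big Data Competition (1st of 7,186 teams); the CCTV science and education channel program "Approaching Science" (走近科学) devoted an episode to this achievement. In recent years he has led his team in opening up the research area of "Intelligent Research Service", which aims to use multimodal large-model technology to improve researchers' efficiency. Some of the results have been deployed as automated research news through the research-community account "JianLun" (减论), which accumulated more than one million learning visits in half a year and has been well received by the research community. He is the founder of the JianLun IP, the initiator of the Xinya Program (新芽计划), and a working-group committee member of the 2025 CCF SPP (CCF学生领航计划).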

I'm an Associate Professor in the College of Computer Science, Nankai University, in the team of Ming-Ming Cheng. I received my PhD from the Department of Computer Science and Technology, Nanjing University of Science and Technology (NJUST), in 2020. My advisor was Prof. Jian Yang of NJUST, a Changjiang Scholar, and my co-advisor was Prof. Xiaolin Hu of Tsinghua University. I started my postdoctoral career at NJUST as a candidate for the 2020 Postdoctoral Innovative Talent Program, supervised by Prof. Jinhui Tang. In 2016, I spent 8 months as a research intern at Microsoft Research Asia, supervised by Prof. Tao Qin and Prof. Tie-Yan Liu. I was also a visiting scholar at Momenta, mainly focusing on monocular perception algorithms.

My recent works are mainly on:

Intelligent Research Service
LLM Agent
Neural architecture design (CNN/Transformer)
Object detection/recognition
Unsupervised learning
Knowledge distillation

We are looking for self-motivated students! Please feel free to contact me by email (and attach your CV). We will not push you, but you should always be self-driven toward your own goal, i.e., making solid and impactful contributions to the CV/AI community.

Team

Associate Professor

戴一冕

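Status: Associate Professor

Research interests: Infrared small targets

Achievements: Led the release of several open-source datasets, including SIRST V1, SIRST V2, DenseSIRST, HazyDet, and GrokLST. Principal investigator of 5 projects, including an NSFC Young Scientists Fund grant, a China Postdoctoral Science Foundation general grant, and university-industry collaborations. Main results published in well-known international journals such as IJCV and IEEE TGRS. Over 3,500 Google Scholar citations; listed among Stanford's World's Top 2% Scientists. Awards: Second Prize for Science and Technology Achievement from the Henan Provincial Department of Education (ranked second), runner-up in the remote sensing object detection track of the first Guangdong-Hong Kong-Macao Greater Bay Area International Algorithm Case Competition, and First Prize in the "Jilin-1" Cup Satellite Remote Sensing Application Youth Innovation and Entrepreneurship Competition.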

PhD Students

杨凌风

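Status: PhD student

Research interests: Multimodal perception, large models

Achievements: Runner-up in the first "Zhengtu Cup" Campus Machine Vision AI Competition (2021), champion of the 2nd Jittor AI Challenge (2022), and team First Prize in the 5th Open Source Innovation Competition (2022). Several papers in top conferences and journals including CVPR, NeurIPS, and TPAMI; 500+ Google Scholar citations. Core member of the automatic video editing algorithm R&D for Insta360; two invention patent applications. Recipient of the 2024 Dean's Medal, Outstanding Graduate, National Scholarship for Doctoral Students, and Outstanding Graduate Student Cadre honors.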

赵鹏海

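Status: PhD student

Research interests: Intelligent research service

Achievements: Algorithm R&D lead for the "JianLun Agent System" and the "JianLun App", advancing the intelligent automation of research workflows. The JianLun IP academic accounts have 30,000+ followers across platforms and nearly 1 million views, and the HuggingFace interface has been called more than 5,000 times. Third Prize in the Guangdong-Hong Kong-Macao International Algorithm Case Competition; multiple papers in top international conferences such as AAAI and in SCI-indexed journals. Plans to keep exploring the deep integration and innovative application of intelligent technology in research workflows.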

李宇轩

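Status: PhD student

Research interests: Remote sensing perception, large models

Achievements: Champion of the 2nd Jittor AI Challenge (2022), team First Prize in the 5th Open Source Innovation Competition (2022), and Second Prize in the first Guangdong-Hong Kong-Macao Greater Bay Area International Algorithm Case Competition (2022). Several papers in top conferences and journals including NeurIPS, IJCV, and ICCV; 1,300+ Google Scholar citations. Problem setter for the 2024 PRCV competition and the 7th Global Campus AI Algorithm Elite Competition (2025).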

陈震元

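Status: PhD student

Research interests: Multi-agent perception, large models

Achievements: Algorithm architecture lead at the Nankai-Xin'ao Zhixin Lab (南开-新奥质信实验室); core member of the automatic video editing algorithm R&D for Insta360. Multiple first- and third-place finishes in the semantic segmentation and object localization tracks of the CVPR 2020 and CVPR 2021 imperfect-data challenges. Third place in the 3rd Jittor Challenge and Second Prize in the 6th Open Source Design Competition (2023). Former intern at JD Explore Academy and Megvii. Applicant for one PCT international patent and two domestic patents.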

李政

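Status: PhD student

Research interests: Multimodal models, model compression

Achievements: Two gold medals in Kaggle competitions; Kaggle Master. Multiple first-author papers in venues such as ICCV, CVPR, and AAAI; 450+ Google Scholar citations. National Scholarship for Graduate Students. Research internships at Megvii, Ant Group, and Alibaba DAMO Academy.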

武戈

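Status: PhD student

Research interests: Multimodal models, model compression

Achievements: Third Prize in the first Guangdong-Hong Kong-Macao Greater Bay Area International Algorithm Case Competition (2022); third place in the 3rd Jittor Challenge and Second Prize in the 6th Open Source Design Competition (2023). One first-author ECCV paper. Named a Henan Province Merit Student as an undergraduate.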

唐文浩

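Status: PhD student

Research interests: Intelligence science

Achievements: Focuses on high-resolution image analysis. Publications: pavement distress recognition (ACM MM 2022; IEEE T-ITS 2021, 2023) and computational pathology (ICCV 2023, CVPR 2024). National Scholarship during master's studies.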

吴俐伽

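Status: PhD student

Research interests: Large models, intelligent research service

Achievements: Core member of the JianLun App backend R&D team. Silver medals at ICPC 2024 Chengdu and Shenyang, bronze medal at the 2024 ICPC East Asia final, silver medal at ICPC 2023 Hangzhou, silver medal at CCPC 2024 Chongqing, and silver medals at CCPC 2023 Shenzhen and Qinhuangdao.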

李林一

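Status: PhD student

Research interests: Large models, agents

Achievements: Algorithm lead for a sub-project of the Nankai-Xin'ao Zhixin Lab (南开-新奥质信实验室); silver medal at the 2023 ICPC Xi'an Invitational; national First Prize in the Lanqiao Cup; national First Prize in the Programming Ladder Tournament (程序设计天梯赛).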

彭晨旭

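Status: PhD student

Research interests: Remote sensing perception, interactive models

Achievements: Kaggle medals: 2 gold, 3 silver, 1 bronze. Champion of the 2021 CCF BDCI infant ultrasound hemangioma segmentation contest; champion of the 2022 Baidu Temporal Action Localization Competition; champion of the 2023 iFLYTEK PET Image Analysis and Disease Prediction Competition; champion of Track 1 of the 4th Anti-UAV Challenge at CVPR 2025; champion of the ICPR 2024 weakly supervised infrared small target detection contest; champion of the ICPR 2024 lightweight infrared small target detection contest; champion of the PRCV 2024 wide-area infrared small target detection contest; First Prize in the 2024 Chang Guang Satellite high-resolution road extraction competition; First Prize in the 2017 Jiangsu Province Advanced Mathematics Competition; Second Prize in the 2019 China Undergraduate Mathematical Contest in Modeling. Several papers in venues including NeuroImage, Medical Physics, and PRCV; 100+ Google Scholar citations.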

王晨旭

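Status: PhD student

Research interests: Remote sensing perception, semi-supervised learning

Achievements: Focuses on remote sensing object detection. Champion of Track 1 and runner-up of Track 2 of the 4th Anti-UAV Challenge at CVPR 2025; Third Prize in the 2024 Chang Guang Satellite high-resolution road extraction competition. One first-author AAAI 2025 paper and one second-author NeurIPS 2024 paper.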

Master's Students

张鑫

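Status: Master's student

Research interests: Computer vision, remote sensing perception

Achievements: Runner-up in the first Guangdong-Hong Kong-Macao Greater Bay Area International Algorithm Case Competition, Third Prize in the 2023 Jittor AI Competition, runner-up in the 2024 ISPRS remote sensing image interpretation contest, and champion of the 2025 Chang Guang "Jilin-1" Cup Remote Sensing Application Competition. One first-author CVPR paper and one co-first-author ECCV paper. As an undergraduate: National Scholarship, Chongqing Outstanding Graduation Thesis, and one of Chongqing University's Top Ten Outstanding Communist Youth League Members.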

李丹阳

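Status: Master's student

Research interests: Reasoning segmentation with large models, remote sensing change detection

Achievements: Champion of the 2024 ISPRS multi-modal remote sensing application algorithm interpretation contest, champion of the 2025 Chang Guang "Jilin-1" Cup Remote Sensing Application Competition, champion of Track 1 and runner-up of Track 2 of the 4th Anti-UAV Challenge at CVPR 2025, and National Scholarship (undergraduate), among other honors.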

庞天傲

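Status: Master's student

Research interests: JianLun backend algorithms, intelligent research service

Achievements: Second Prize, Tianjin Division, China Undergraduate Mathematical Contest in Modeling; scored 410+ on the 2025 national postgraduate entrance examination.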

过翔天

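Status: Master's student

Research interests: Multimodal models, remote sensing perception

Achievements: Excellence Award in the 2022 Delta Cup International Solar Energy Competition; scored 410+ on the 2025 national postgraduate entrance examination.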

Undergraduate Students

章壹程

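Status: Undergraduate

Research interests: Remote sensing perception, large models

Achievements: Silver medals at ICPC 2024 Chengdu and Hangzhou; Second Prize (Tianjin) in the 2024 mathematical modeling contest.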

刘砚桐

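Status: Undergraduate

Research interests: JianLun backend, intelligent research service

Achievements: Silver medals at ICPC 2024 Chengdu and Hangzhou; gold medal at the 2024 ICPC Xi'an Invitational; champion of the 2025 Chang Guang "Jilin-1" Cup Remote Sensing Application Competition.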

邢清画

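Status: Undergraduate

Research interests: Large models, intelligent research service

Achievements: National Encouragement Scholarship at Nankai University; second author of one AAAI 2025 paper; contributed to developing and optimizing parts of the JianLun Agent algorithms.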

田晋宇

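Status: Undergraduate

Research interests: Large models, intelligent research service

Achievements: National Encouragement Scholarship at Nankai University; fourth author of one AAAI 2025 paper; contributed to developing and optimizing parts of the JianLun Agent algorithms.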

周重天

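Status: Undergraduate

Research interests: JianLun backend, intelligent research service

Achievements: Contributed to developing and optimizing the JianLun App backend algorithms.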

杨峥芃

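Status: Undergraduate

Research interests: JianLun product, intelligent research service

Achievements: Third Prize (Problem D) in the 2nd "Jilin-1" Cup Satellite Remote Sensing Application Youth Innovation and Entrepreneurship Competition; main lead of the JianLun project for the Internet+ competition.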

王雨萌

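Status: Undergraduate

Research interests: JianLun algorithms, intelligent research service

Achievements: Nankai University Gongneng Scholarship; Second Prize, Tianjin Division, 2024 China Undergraduate Mathematical Contest in Modeling; JianLun algorithm lead for the Nankai "Huoshan Cup" (火山杯).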

李政霖

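Status: Undergraduate

Research interests: JianLun product, intelligent research service

Achievements: Lead for the community section of the JianLun product design; JianLun funding applications.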

郭鑫隆

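Status: Undergraduate

Research interests: JianLun product, intelligent research service

Achievements: Nankai University Academic Excellence Scholarship; JianLun product manager.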

李颖贤

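Status: Undergraduate

Research interests: Multi-agent perception, large models

Achievements: National Scholarship at Nankai University; participates in POI algorithm research and AVT-related work.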

李昱

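Status: Undergraduate

Research interests: JianLun product, intelligent research service

Achievements: JianLun product manager and UI designer; JianLun funding applications.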

刘祥宇

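Status: Undergraduate

Research interests: JianLun product, intelligent research service

Achievements: JianLun product manager; designer of the "My" (我的) section.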

钱俊玮

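Status: Undergraduate

Research interests: JianLun algorithms, intelligent research service

Achievements: R&D lead for author identification (作者定位) and citation evaluation on the JianLun platform.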

陶文烁

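Status: Undergraduate

Research interests: JianLun algorithms, intelligent research service

Achievements: First Prize in the works category of the Nankai University Physics Academic Competition; main lead of the JianLun project for the Internet+ competition.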

向宇涵

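Status: Undergraduate

Research interests: JianLun algorithms, intelligent research service

Achievements: Nankai University Gongneng Scholarship; First Prize, Tianjin Division, 2023 China Undergraduate Mathematical Contest in Modeling; works on metadata extraction from base data.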

张耕嘉

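Status: Undergraduate

Research interests: JianLun algorithms, intelligent research service

Achievements: Nankai University Gongneng Scholarship; H award (Honorable Mention) in the 2023 American undergraduate mathematical modeling contest; provincial Second Prize, Tianjin Division, 2024 China Undergraduate Mathematical Contest in Modeling; Merit Award in the PolarDB Database Innovation Design Competition.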

许洋

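Status: Undergraduate

Research interests: JianLun algorithms, intelligent research service

Achievements: Nankai University Gongneng Scholarship.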

朱佳慧

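Status: Undergraduate

Research interests: JianLun product, intelligent research service

Achievements: Third Prize (Problem D) in the 2nd "Jilin-1" Cup Satellite Remote Sensing Application Youth Innovation and Entrepreneurship Competition.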

Honor

(See more details (codes, solutions, summaries) in [AICompetition of Group Page])

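Best Paper Award, 1st place in Track 1, and 2nd place in Track 2 at the 4th Anti-UAV Workshop & Challenge (a CVPR workshop)
First Prize (software professional group) won by the team's undergraduates in the 2025 Nankai University "Huoshan Cup" (火山杯) AI Agent Innovation and Application Competition, 5,000 RMB bonus

Grand Prize, First Prize, and Third Prize in the 2nd "Jilin-1" Cup Chang Guang Satellite Remote Sensing Application Youth Innovation and Entrepreneurship Competition (2025), 24,000 RMB bonus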

1st place in Change Detection in High-resolution and Multi-temporal Optical Images and 2nd place in Forgery Detection in Multi-scenario Remote Sensing Images of Typical Objects, 2024 TC I Contest on Intelligent Interpretation for Multi-modal Remote Sensing Application, 13,000 RMB total bonus

Second place of IACC International Algorithm Case Competition, namely the remote sensing detection, 100,000 RMB bonus (2nd from 116 teams)

Champion of 2022 Jittor AI competition, namely the landscape picture generation, 50,000 RMB bonus (1st from 154 teams)

Second place of 2020 Zhengtu Cup's first AI competition, namely the industrial defect detection algorithm, 150,000 RMB bonus (2nd from 900 teams)

Champion of the 2016 Didi Di-Tech first big data competition, namely the travel demand prediction algorithm, 100,000 US dollars bonus (1st from 7664 teams)

Champion of 2015 Alibaba Tianchi's first big data competition, namely Ali mobile recommendation algorithm, 300,000 RMB bonus (1st from 7186 teams)

2015 Dean Medal of School of Computer Science, Nanjing University of Science and Technology, 2016 Presidential Medal of Nanjing University of Science and Technology, 2016 National Scholarship

ACM-ICPC Asia Regional Contest, Silver Medal (1st)

Honors Gallery

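Outstanding Graduation Theses

张鑫 - Chongqing Outstanding Graduation Thesis

高森森 - Nankai University Outstanding Graduation Thesis

Competition Award Certificates

CVPRW First Prize

CVPRW Best Paper Award

CVPRW Second Prize

Huoshan Cup (火山杯) First Prize

Chang Guang Satellite Cup First Prize

Chang Guang Satellite Cup Grand Prize

ISPRS Change Detection Track Champion

ISPRS Forgery Detection Track Runner-up

2022 Jittor AI Challenge Champion

2022 Guangdong-Hong Kong-Macao Greater Bay Area Algorithm Competition Runner-up

2020 Zhengtu Cup Campus Machine Vision Competition Runner-up

2015 Alibaba Big Data Competition Champion

2016 Didi Research Institute Big Data Competition Champion

5th Open Source Innovation Competition Team First Prize

2023 Jittor Challenge Second Prize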

News

2025-06-26: 2 papers accepted in ICCV 2025, including ATPrompt, an attribute-guided prompt learning approach!
2025-04-09: I have been honored as a recipient of Nankai University's 2025 May Fourth Youth Medal.
2025-02-27: 2 papers accepted in CVPR 2025, including RSAR, a SOTA Restricted State Angle Resolver and Rotated SAR Benchmark!
2024-10-07: LSKNet: A foundation lightweight backbone for remote sensing published in IJCV.
2024-09-26: 3 papers accepted in NeurIPS 2024, including SARDet100K.
2024-09-16: I was selected for the 2024 World’s Top 2% Scientists list released by Stanford University, USA. See details here.
2024-07-02: 2 papers accepted in ECCV 2024.
2024-06-22: JianLun (减论) IP Project is officially started, focusing on efficient AI understanding and education.
2024-05-29: 1 paper Zone Evaluation accepted in TPAMI.
2024-03-07: Reached the final round of the Wu Wenjun AI Outstanding Youth Award.
2024-02-27: 2 papers (PromptKD, CrossKD) accepted in CVPR 2024.
2023-09-23: 1 paper (FGVP) accepted in NeurIPS 2023.
2023-07-14: 3 papers (including LSKNet, ADNet) accepted in ICCV 2023.
2023-04-25: 1 paper DUAL for Panoramic Depth Completion accepted in ICML 2023.
2023-01: Nomination Award for the CCF Excellent Doctoral Dissertation Incentive Plan. See first round, final result.
2022-11-19: 4 papers (including Curriculum Temperature, DesNet) accepted in AAAI 2023.
2022-09-15: 2 papers (RecursiveMix, DTG-SSOD) accepted in NeurIPS 2022.
2022-07-05: 3 papers (RigNet, M3PT, PseCo) accepted in ECCV 2022.
2022-05-20: 1 paper (UM-MAE) is publicly available in arXiv.
2022-03-02: 1 paper (dynamicMLP) accepted (oral) in CVPR 2022.
2021-12-01: 1 paper (KD for object detection) accepted in AAAI 2022.
2021-05-05: 1 paper (PAN++) is accepted by TPAMI 2021.
2021-03-01: 1 paper (GFocalv2) accepted in CVPR 2021.
2020-09-25: 1 paper (GFocal) accepted in NeurIPS 2020.
2019-12-01: 1 paper (Understanding the disharmony v2) accepted in AAAI 2020.
2019-03-15: 3 papers (SKNet, Understanding the disharmony v1, PSENet) accepted in CVPR 2019.
2018-06-16: 1 paper (MixNet) accepted in IJCAI 2018.
2018-03-01: 1 paper (ST-CGAN) accepted in CVPR 2018.
2016-09-30: 1 paper (LightRNN) accepted in NeurIPS 2016.

Selected Publications

(* indicates equal contribution, # corresponding author)

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think.
Ge Wu, Shen Zhang, Ruijing Shi, Shanghua Gao, Zhenyuan Chen, Lei Wang, Zhaowei Chen, Hongcheng Gao, Yao Tang, Jian Yang, Ming-Ming Cheng, Xiang Li#
in arXiv, 2025
[Paper] [BibTex] [Code]

[Blog(Chinese)]
REG is a simple yet effective method that entangles low-level image latents with a single high-level class token derived from pretrained foundation models for denoising. REG significantly improves generation quality and training convergence efficiency.

Advancing Textual Prompt Learning with Anchored Attributes.
Zheng Li, Yibing Song, Ming-Ming Cheng, Xiang Li#, Jian Yang#
in ICCV, 2025
[Paper] [BibTex] [Code]

[Blog(Chinese)] [Chinese Version]
ATPrompt introduces a new attribute-anchored prompt format that can be seamlessly integrated into existing textual prompt learning methods and achieve general improvements.

RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark
Xin Zhang, Xue Yang, Yuxuan Li, Jian Yang, Ming-Ming Cheng, Xiang Li#
in CVPR, 2025
[Paper] [BibTex] [Code]

[Blog(Chinese)] [Chinese Version]
A large-scale multi-class rotated SAR object detection dataset. A unified perspective to analyze the angle boundary discontinuity problem. A weakly supervised model achieved SOTA results.

Fine-Grained Visual Text Prompting
Lingfeng Yang, Xiang Li#, Yueze Wang, Xinlong Wang, Jian Yang#
in TPAMI, 2025
[Paper] [BibTex] [Code]

FGVTP is an improved fine-grained multimodal prompting method over FGVP, enhancing large multimodal models’ localization and grounding via consistent visual–textual alignment.

From Words to Worth: Newborn Article Impact Prediction with LLM
Penghai Zhao, Qinghua Xing, Kairan Dou, Jinyu Tian, Ying Tai, Jian Yang, Ming-Ming Cheng, Xiang Li#
in AAAI, 2025
[Paper] [BibTex] [Code]

[Blog(Chinese)]
This paper introduces an LLM-based method to predict newborn article impact from titles and abstracts, supported by a new normalized indicator (TNCSI_SP) and a 12K curated dataset.

SARDet-100K: Towards Open-Source Benchmark and Toolkit for Large-Scale SAR Object Detection
Yuxuan Li, Xiang Li#, Weijie Li, Qibin Hou, Li Liu, Ming-Ming Cheng, Jian Yang#
in NeurIPS, 2024 (Spotlight🎈)
[Paper] [BibTex] [Code]

[Blog(Chinese)]
SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created. A SAR object detection pretrain method: Multi-Stage with Filter Augmentation (MSFA) is proposed to tackle the domain gap problems from the perspective of data input, domain transition, and model migration.

Cascade prompt learning for vision-language model adaptation
Ge Wu*, Xin Zhang*, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li#
in ECCV, 2024
[Paper] [BibTex] [Code]

[Blog(Chinese)] [Chinese Version]
CasPL is a cascade prompt learning framework that integrates generic and specific expertise via boosting and adapting prompts. It is plug-and-play for existing methods and achieves state-of-the-art results with improved performance–efficiency trade-offs on 11 classification datasets.

LSKNet: A Foundation Lightweight Backbone for Remote Sensing
Yuxuan Li, Xiang Li#, Yimian Dai, Qibin Hou, Li Liu, Yongxiang Liu, Ming-Ming Cheng, Jian Yang#
in IJCV, 2024
[Paper] [BibTex] [Code]

[Blog(Chinese)]
LSKNet can dynamically adjust its large spatial receptive field to better model the ranging context of various categories of objects in remote sensing scenarios. The lightweight LSKNet backbone network sets new state-of-the-art scores on standard remote sensing classification, object detection, semantic segmentation and change detection benchmarks.

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models.
Zheng Li, Xiang Li#, Xinyi Fu, Xin Zhang, Weiqiang Wang, Shuo Chen, Jian Yang#.
in CVPR, 2024
[Paper] [BibTex] [Code]

[Blog(Chinese)] [Chinese Version] [Video(Chinese)]
PromptKD is a simple and effective prompt-driven unsupervised distillation framework for VLMs (e.g., CLIP), with state-of-the-art performance.

Fine-Grained Visual Prompting
Lingfeng Yang, Yueze Wang, Xiang Li#, Xinlong Wang, Jian Yang#
in NeurIPS, 2023
[Paper] [BibTex] [Code]

[Blog(Chinese)] [Video(Chinese)]
FGVP is a visual prompting technique that improves referring expression comprehension by highlighting regions of interest via fine-grained segmentation, achieving better accuracy with faster inference than state-of-the-art methods.

Large Selective Kernel Network for Remote Sensing Object Detection
Yuxuan Li, Qibin Hou, Zhaohui Zheng, Ming-Ming Cheng, Jian Yang#, Xiang Li#
in ICCV, 2023
[Paper] [BibTex] [Code]

LSKNet can dynamically adjust its large spatial receptive field to better model the ranging context of various categories of objects in remote sensing scenarios.

Curriculum Temperature for Knowledge Distillation
Zheng Li, Xiang Li#, Lingfeng Yang, Borui Zhao, Renjie Song, Lei Luo, Jun Li, Jian Yang#
in AAAI, 2023
[Paper] [BibTex] [Code]

[Blog(Chinese)]
CTKD organizes the distillation task from easy to hard through a dynamic and learnable temperature. The temperature is learned during the student’s training process with a reversed gradient that aims to maximize the distillation loss in an adversarial manner.

RecursiveMix: Mixed Learning with History
Lingfeng Yang*, Xiang Li*, Borui Zhao, Renjie Song, Jian Yang#
in NeurIPS (Spotlight), 2022
[Paper] [BibTex] [Code]

RecursiveMix is a simple but effective data augmentation technique that is the first to leverage historical input-prediction-label triplets.

DTG-SSOD: Dense Teacher Guidance for Semi-Supervised Object Detection
Gang Li, Xiang Li#, Yujie Wang, Yichao Wu, Ding Liang, Shanshan Zhang#
in NeurIPS, 2022
[Paper] [BibTex] [Code(to be released)]

DTG-SSOD explores a novel “dense-to-dense” paradigm, instead of the traditional “sparse-to-dense” paradigm, for effective semi-supervised object detection.

PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection
Gang Li, Xiang Li#, Yujie Wang, Yichao Wu, Ding Liang, Shanshan Zhang#
in ECCV, 2022
[Paper] [BibTex] [Code]

[Blog(Chinese)] [Video(Chinese)]
PseCo delves into two key techniques of semi-supervised learning (i.e., pseudo labeling and consistency training) for SSOD, and integrates object detection properties into them.

Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li, Wenhai Wang, Lingfeng Yang, Jian Yang#
in arXiv, 2022
[Paper] [BibTex] [Code]

[Blog(Chinese)]
UM-MAE is an efficient and general technique that supports MAE-style MIM Pre-training for popular Pyramid-based Vision Transformers (e.g., PVT, Swin).

Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information
Lingfeng Yang, Xiang Li#, Renjie Song, Borui Zhao, Juntian Tao, Shihao Zhou, Jiajun Liang, Jian Yang#
in CVPR (Oral), 2022
[Paper] [BibTex] [Code]

A very simple and effective approach for fine-grained recognition tasks using auxiliary knowledge such as geographical and temporal information. We achieve SOTA results and took third place in the iNaturalist challenge at FGVC8 (CVPR21 workshop).

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation
Gang Li*, Xiang Li*, Yujie Wang, Shanshan Zhang#, Yichao Wu, Ding Liang
in AAAI, 2022
[Paper] [BibTex]
Rank mimicking and prediction-guided feature imitation for knowledge distillation of dense object detectors: a simple and effective approach!

PVTv2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
Technical Report, 2021
[Paper] [Code]

[Blog(Chinese)] [Report] [Talk] [BibTex]
A better PVT.

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan#, Kaitao Song, Ding Liang, Tong Lu#, Ping Luo, Ling Shao
in ICCV, 2021 (oral presentation)
[Paper] [Code]

[Blog(Chinese)] [Report] [Talk] [BibTex]
A pure Transformer backbone for dense prediction, such as object detection and semantic segmentation.

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text
Wenhai Wang*, Enze Xie*, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu#, Chunhua Shen
TPAMI, 2021
[Paper] [Code]

[BibTex]
We extend PSENet (CVPR'19) and PAN (ICCV'19) to a text spotting system.

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
Xiang Li*, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
in CVPR, 2021
[Paper] [Code]

[BibTex]
The improved version of GFocal!

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
Xiang Li*, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
in NeurIPS, 2020
[Paper] [Code]

We propose the generalized focal loss for learning improved representations in dense object detectors. GFocal is officially included in [MMDetection], and is an important part of the [winning solution] in the GigaVision contest (object detection and tracking tracks) hosted at an ECCV 2020 workshop (winner: DeepBlueAI team).

Selective kernel networks
Xiang Li*, Wenhai Wang, Xiaolin Hu, Jian Yang
in CVPR, 2019
[Paper] [BibTex] [Code]

We propose a selective kernel mechanism for convolution.

Understanding the disharmony between dropout and batch normalization by variance shift
Xiang Li*, Shuo Chen, Xiaolin Hu, Jian Yang
in CVPR, 2019
[Paper] [BibTex]
We explore and address the disharmony between dropout and batch normalization.

Understanding the disharmony between weight normalization family and weight decay
Xiang Li*, Shuo Chen, Jian Yang
in AAAI, 2020
[Paper]
We explore and address the disharmony between weight normalization family and weight decay.

LightRNN: Memory and computation-efficient recurrent neural networks
Xiang Li*, Tao Qin, Jian Yang, Tie-Yan Liu
in NeurIPS, 2016
[Paper] [BibTex]
We propose memory- and computation-efficient recurrent neural networks for language modeling.

Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal
Jifeng Wang*, Xiang Li*, Jian Yang
in CVPR, 2018
[Paper] [BibTex] [Dataset]

We release a new dataset for joint shadow detection and removal.

Shape Robust Text Detection with Progressive Scale Expansion Network
Wenhai Wang*, Enze Xie*, Xiang Li*, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao
in CVPR, 2019
[Paper] [Poster] [BibTex] [Code]

We propose a segmentation-based text detector that can precisely detect text instances with arbitrary shapes.

Mixed Link Networks
Wenhai Wang*, Xiang Li*, Jian Yang, Tong Lu
in IJCAI, 2018
[Paper] [Poster] [BibTex] [Code]

We propose a parameter-efficient convolutional neural network for image classification.

Review Services

Journal Reviewer
IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
IEEE Transactions on Image Processing (TIP)
IEEE Transactions on Multimedia (TMM)
International Journal of Computer Vision (IJCV)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Conference Reviewer
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 2021, 2022, 2023
AAAI Conference on Artificial Intelligence (AAAI), 2019, 2020, 2021, 2022, 2023
European Conference on Computer Vision (ECCV), 2022