Research
Semantic Segmentation & Panoptic Segmentation
Semantic segmentation is a fundamental and challenging problem in computer vision that aims to segment and parse a natural image into regions associated with semantic categories, covering both stuff (e.g., sky, road, grass) and discrete objects (e.g., person, car, bicycle). The task has potential applications such as autonomous driving, robot perception, and image editing. Variations in scale, occlusion, and changing illumination of objects and stuff make it challenging to label each pixel correctly. To address these difficulties, we have developed a series of methods for the semantic segmentation and panoptic segmentation tasks.
[1] Jun Fu, Jing Liu, Haijie Tian, Zhiwei Fang, Hanqing Lu: Dual Attention Network for Scene Segmentation. CVPR (2019).
[2] Jun Fu, Jing Liu, Yuhang Wang, Hanqing Lu: Stacked Deconvolutional Network for Semantic Segmentation. CoRR abs/1708.04943 (2017)
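The per-pixel labeling described above can be sketched in a few lines: a segmentation network produces one score map per class, and each pixel is assigned the class with the highest score. The class names and scores below are illustrative placeholders, not taken from the papers above.

```python
# Minimal sketch of the final prediction step in semantic segmentation:
# argmax over per-class score maps. Inputs are toy values.

CLASSES = ["sky", "road", "person"]  # hypothetical label set

def segment(score_maps):
    """score_maps: [C][H][W] nested lists of per-class scores.
    Returns an [H][W] label map of argmax class indices."""
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [
        [max(range(len(score_maps)), key=lambda c: score_maps[c][y][x])
         for x in range(w)]
        for y in range(h)
    ]

# Toy 2x2 "image" with one score map per class
scores = [
    [[0.9, 0.1], [0.2, 0.3]],  # sky
    [[0.0, 0.7], [0.8, 0.1]],  # road
    [[0.1, 0.2], [0.1, 0.6]],  # person
]
labels = segment(scores)
print([[CLASSES[c] for c in row] for row in labels])
```

In a real system the score maps come from a deep network (e.g., with the attention modules studied in [1]); the argmax step is the same.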
Visual Question Answering
The Visual Question Answering (VQA) task aims at answering a natural language question about a given image. It has gained increasing attention as an interdisciplinary subject spanning computer vision and natural language processing. From an application perspective, VQA improves human-computer interaction and can be applied to many scenarios, such as smart home management systems. From a research perspective, VQA requires a simultaneous understanding of both images and questions, and can be considered a component of a Visual Turing Test. We have studied VQA for several years and published a series of influential papers.
[1] Zhiwei Fang, Jing Liu, Yanyuan Qiao, Qu Tang, Yong Li, Hanqing Lu: Enhancing Visual Question Answering Using Dropout. ACM MM (2018).
[2] Zhiwei Fang, Jing Liu, Xueliang Liu, Qu Tang, Yong Li, Hanqing Lu: BTDP: Toward Sparse Fusion with Block Term Decomposition Pooling for Visual Question Answering. TOMCCAP (2018).
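The skeleton of a typical VQA model can be sketched as follows: encode the image and the question into feature vectors, fuse them, and score candidate answers. The fusion here is a simple element-wise product, a common baseline (the BTDP paper above uses a more sophisticated block term decomposition pooling); all feature values, weights, and answers are toy placeholders.

```python
# Hedged sketch of a VQA pipeline: fuse image and question features,
# then run a linear classifier over a fixed answer vocabulary.

ANSWERS = ["yes", "no", "red"]  # hypothetical answer vocabulary

def fuse(img_feat, q_feat):
    """Element-wise product fusion of two equal-length feature vectors."""
    return [i * q for i, q in zip(img_feat, q_feat)]

def answer_scores(fused, weights):
    """Linear classifier: one weight row per candidate answer."""
    return [sum(w * f for w, f in zip(row, fused)) for row in weights]

img_feat = [0.5, 1.0, -0.5]   # toy image embedding
q_feat = [1.0, 0.2, 0.4]      # toy question embedding
W = [[1.0, 0.0, 0.0],         # toy per-answer weight rows
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

fused = fuse(img_feat, q_feat)
scores = answer_scores(fused, W)
pred = ANSWERS[max(range(len(scores)), key=scores.__getitem__)]
print(pred)
```

Real models replace the toy embeddings with CNN and RNN/transformer encoders and learn the fusion and classifier weights end to end.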
Image Captioning
The image captioning task aims at automatically generating human-like captions for images, and has emerged as a prominent interdisciplinary research problem at the intersection of computer vision and natural language processing. It has many important industrial applications, such as visual intelligence in chatbots, photo sharing on social media, and assistive facilities for visually impaired people. We have done a series of works to deal with the challenges in this task.
[1] Longteng Guo, Jing Liu, Peng Yao, Jiangwei Li, Hanqing Lu: MSCap: Multi-Style Image Captioning with Unpaired Stylized Text. CVPR (2019).
[2] Xinxin Zhu, Lixiang Li, Jing Liu, Ziyi Li, Haipeng Peng, Xinxin Niu: Image Captioning with Triple-Attention and Stack Parallel LSTM. Neurocomputing (2018).
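The generation step common to captioning models can be illustrated with greedy decoding, the simplest way to turn per-step word probabilities into a sentence: at each step, pick the most probable next word until the end token appears. The "model" below is a canned table of toy distributions, not a real network.

```python
# Illustrative sketch of greedy caption decoding over toy per-step
# word distributions.

END = "<end>"

def greedy_decode(step_probs):
    """step_probs: list of {word: prob} dicts, one per decoding step."""
    caption = []
    for probs in step_probs:
        word = max(probs, key=probs.get)
        if word == END:
            break
        caption.append(word)
    return " ".join(caption)

toy_steps = [
    {"a": 0.7, "the": 0.3},
    {"dog": 0.6, "cat": 0.4},
    {"runs": 0.5, END: 0.2},
    {END: 0.9, "fast": 0.1},
]
print(greedy_decode(toy_steps))
```

In practice the distributions are produced step by step by a conditioned language model (e.g., the attention-augmented LSTMs in [2]), and beam search is often used instead of the greedy choice.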
Image Retrieval
With advances in digital cameras, high-quality mobile devices, and internet technologies, increasingly large numbers of images are available on the web, which necessitates effective and efficient image retrieval techniques. Among these techniques, content-based image retrieval (CBIR) has been extensively studied over the past decades. We have studied CBIR for several years and published a series of influential papers on general image retrieval, sketch retrieval, clothing retrieval, and commodity retrieval.
[1] Zhiwei Fang, Jing Liu, Yuhang Wang, Yong Li, Jinhui Tang, Hanqing Lu: Object-aware Deep Network for Commodity Image Retrieval. ICMR (2016).
[2] Jing Liu, Zechao Li, Hanqing Lu: Sparse Semantic Metric Learning for Image Retrieval. Multimedia Systems (2014).
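The retrieval step common to CBIR systems can be sketched as follows: each image is represented by a feature vector, and the database is ranked by cosine similarity to the query. The vectors and image ids below are toy stand-ins for learned features.

```python
# Minimal sketch of content-based image retrieval: rank a database of
# feature vectors by cosine similarity to a query vector.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank(query, database):
    """Return database image ids sorted by similarity, most similar first."""
    return sorted(database,
                  key=lambda img_id: cosine(query, database[img_id]),
                  reverse=True)

query = [1.0, 0.0, 1.0]
db = {
    "img_a": [0.9, 0.1, 0.8],   # close to the query
    "img_b": [0.0, 1.0, 0.0],   # orthogonal to the query
    "img_c": [0.5, 0.5, 0.5],
}
print(rank(query, db))
```

Real systems learn the feature space (e.g., via the metric learning of [2] or the object-aware features of [1]) and use approximate nearest-neighbor indexes instead of an exhaustive sort, but the ranking principle is the same.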