Top 10 Must-Read CVPR 2023 Papers
CVPR 2023, the IEEE/CVF Conference on Computer Vision and Pattern Recognition, marked a significant milestone as it took place in Canada for the first time. From June 18th to 22nd, Vancouver became the gathering place for thousands of renowned computer scientists, engineers, researchers, and leaders from academia and industry. With over 4,000 attendees, the conference buzzed with an excitement and anticipation reminiscent of the moment ChatGPT transformed natural language processing.
One of the prominent themes at CVPR was image generation, which has been gaining traction ever since the emergence of Generative Adversarial Networks (GANs) and diffusion models. This year's event showcased innovative research focused on image editing and enhancing user control over generated content. Among the noteworthy papers in this domain was "Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models" by Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, and Tom Goldstein. The paper tackled the crucial problem of detecting object-level replication in generated images. To address it, the authors studied and compared ten different image feature extractors drawn from self-supervised learning and image retrieval techniques. Applying the best-performing extractors to various diffusion models, such as Denoising Diffusion Probabilistic Models (DDPMs) and latent diffusion models trained on datasets of different scales, they found that while replication was easily detected for small- and medium-scale training sets, content replication of various forms still occurred frequently in samples from Stable Diffusion models trained at larger scale.
Another award-winning paper in image generation was "On Distillation of Guided Diffusion Models" by Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik Kingma, Stefano Ermon, Jonathan Ho, and Tim Salimans. The work was recognized for helping democratize diffusion models and generative AI by making them accessible to people with limited computational resources, especially those outside the AI community. It proposed a two-stage distillation process for guided diffusion models. In the first stage, a student (distilled) model is trained to match the output of the original guided model across a range of guidance strengths. In the second stage, the student is distilled further to progressively reduce the number of discrete time steps. The approach applies to both pixel-space and latent-space diffusion models and supports text-guided image editing and inpainting.
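To make that first distillation stage concrete, here is a minimal sketch of the underlying idea, under my own assumptions: the frozen teacher's conditional and unconditional predictions are combined with a randomly sampled guidance weight (the classifier-free guidance rule), and a student that is additionally conditioned on that weight is trained to reproduce the combined output in a single forward pass. The function signatures, guidance-weight range, and mean-squared-error objective are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def stage1_distillation_step(student, teacher, x_t, t, cond,
                             w_range=(0.0, 4.0)):
    """One training step of stage-1 guided-diffusion distillation (sketch).

    student(x_t, t, cond, w): a model additionally conditioned on the
                              guidance weight w (an assumption here).
    teacher(x_t, t, cond):    the frozen pretrained diffusion model;
                              cond=None gives its unconditional prediction.
    x_t, t, cond:             noised inputs, timesteps, and conditioning.
    """
    # Sample one guidance strength per example.
    w = torch.empty(x_t.shape[0], device=x_t.device).uniform_(*w_range)
    w_ = w.view(-1, 1, 1, 1)

    with torch.no_grad():
        pred_cond = teacher(x_t, t, cond)        # conditional prediction
        pred_uncond = teacher(x_t, t, None)      # unconditional prediction
        # Classifier-free guidance: extrapolate toward the conditional output.
        pred_guided = (1 + w_) * pred_cond - w_ * pred_uncond

    # The guidance-conditioned student matches the guided teacher in one call.
    pred_student = student(x_t, t, cond, w)
    return F.mse_loss(pred_student, pred_guided)
```

The point of the sketch is that, after this stage, a single student evaluation replaces the two teacher evaluations that classifier-free guidance normally requires at every sampling step.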
The conference also showcased numerous advances in Neural Radiance Fields (NeRF), a technique for reconstructing 3D scenes from collections of 2D images and rendering them from novel viewpoints. Researchers presented work on scaling NeRF to larger scenes, improving its efficiency, handling dynamic scenes, and operating with fewer input images. Two recommended papers that received acclaim were "MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures" by Zhiqin Chen, Thomas Funkhouser, Peter Hedman, and Andrea Tagliasacchi, and "RobustNeRF: Ignoring Distractors With Robust Losses" by Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J. Fleet, and Andrea Tagliasacchi. The former leveraged the polygon rasterization pipeline to render neural fields efficiently on mobile hardware, while the latter proposed a robust optimization framework that models distractors as outliers, making NeRF practical for reconstructing realistic scenes captured in cluttered, changing environments.
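For a rough sense of how distractors can be treated as outliers, the snippet below sketches a trimmed photometric loss: rays whose residuals fall above a chosen quantile are simply excluded from the gradient. RobustNeRF's actual estimator is more sophisticated (patch-based and iteratively reweighted), so the hard percentile cutoff here is only an illustrative stand-in.

```python
import torch

def trimmed_photometric_loss(pred_rgb, gt_rgb, inlier_fraction=0.8):
    """Robust NeRF-style loss sketch: rays whose residual lands in the top
    (1 - inlier_fraction) quantile are treated as distractors/outliers and
    excluded from the gradient.

    pred_rgb, gt_rgb: (num_rays, 3) tensors of rendered / observed colors.
    """
    residuals = (pred_rgb - gt_rgb).pow(2).sum(dim=-1)       # per-ray error
    cutoff = torch.quantile(residuals.detach(), inlier_fraction)
    inliers = residuals.detach() <= cutoff                    # outlier mask
    return residuals[inliers].mean()

# Usage inside a training step (pred_rgb comes from volume rendering):
# loss = trimmed_photometric_loss(pred_rgb, gt_rgb)
# loss.backward()
```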
Prompt learning was another notable area of focus at CVPR 2023. "Visual Programming: Compositional Visual Reasoning Without Training" by Tanmay Gupta and Aniruddha Kembhavi, one of the conference's award-winning papers, offered an intriguing approach in which a language model is prompted to generate Python-like programs (accompanied by HTML visual rationales) that compose off-the-shelf vision modules to solve complex visual tasks without any task-specific training. Additionally, "Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting" by Syed Talal Wasim, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, and Mubarak Shah presented a unified training scheme that balances supervised and zero-shot performance by learning multimodal prompts. The method encodes video information at three levels: global video-level prompts, local frame-level prompts, and a summary prompt, enabling effective performance in both supervised and zero-shot settings.
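The multimodal prompting idea can be pictured with a small sketch: learnable prompt vectors at the video level, at the frame level, and a single summary token are concatenated with per-frame CLIP embeddings before they enter a (frozen) transformer encoder. The token counts, dimensions, and module layout below are assumptions chosen for clarity rather than Vita-CLIP's actual architecture.

```python
import torch
import torch.nn as nn

class MultimodalVideoPrompts(nn.Module):
    """Sketch of prepending learnable prompts to per-frame CLIP tokens."""
    def __init__(self, dim=512, num_frames=8, num_global=8, num_local=4):
        super().__init__()
        # Global video-level prompts shared across all frames.
        self.global_prompts = nn.Parameter(torch.randn(num_global, dim) * 0.02)
        # Local frame-level prompts, one small set per frame.
        self.local_prompts = nn.Parameter(
            torch.randn(num_frames, num_local, dim) * 0.02)
        # A single learnable "summary" token meant to aggregate the video.
        self.summary_prompt = nn.Parameter(torch.randn(1, dim) * 0.02)

    def forward(self, frame_tokens):
        """frame_tokens: (batch, num_frames, dim) per-frame CLIP embeddings.
        Returns a token sequence ready for a frozen transformer encoder."""
        b, t, d = frame_tokens.shape
        local = self.local_prompts.unsqueeze(0).expand(b, -1, -1, -1)
        # Interleave each frame token with its frame-level prompts.
        per_frame = torch.cat([frame_tokens.unsqueeze(2), local], dim=2)
        per_frame = per_frame.reshape(b, -1, d)
        global_ = self.global_prompts.unsqueeze(0).expand(b, -1, -1)
        summary = self.summary_prompt.unsqueeze(0).expand(b, -1, -1)
        return torch.cat([summary, global_, per_frame], dim=1)
```

Only the prompt parameters are trained in this kind of scheme, which is what lets a single backbone balance supervised fine-tuning with preserved zero-shot behavior.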
In the realm of autonomous driving, the award-winning paper "Planning-oriented Autonomous Driving" by Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, and Hongyang Li stood out for unifying perception, prediction, and planning in a single end-to-end framework optimized toward the final planning objective. Another significant contribution in this field was "SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images" by Nikhil Gosala, Kürsat Petek, Paulo L. J. Drews-Jr, Wolfram Burgard, and Abhinav Valada. The paper introduced a lifting module that populates a voxel grid with spatial and semantic information extracted from the frontal-view image. The authors used implicit supervision during an initial pretraining phase and explicit supervision with pseudolabels during a subsequent refinement step.
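To illustrate what a lifting module of this kind might look like, the sketch below projects the center of each bird's-eye-view voxel into the frontal image using known camera intrinsics and bilinearly samples the 2D feature map at those locations. The pinhole projection, grid layout, and validity handling are simplifying assumptions, not SkyEye's exact module.

```python
import torch
import torch.nn.functional as F

def lift_features_to_voxels(feat_2d, voxel_centers, intrinsics):
    """Fill a 3D voxel grid with frontal-view image features (sketch).

    feat_2d:       (C, H, W)  feature map from the frontal image.
    voxel_centers: (X, Y, Z, 3) voxel-center coordinates in the camera frame.
    intrinsics:    (3, 3) pinhole camera matrix (assumed known).
    """
    C, H, W = feat_2d.shape
    X, Y, Z, _ = voxel_centers.shape
    pts = voxel_centers.reshape(-1, 3)                    # (N, 3)
    uvw = pts @ intrinsics.T                              # project to image
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)         # perspective divide
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1)
    grid = grid * 2 - 1
    sampled = F.grid_sample(feat_2d[None], grid[None, :, None, :],
                            align_corners=True)           # (1, C, N, 1)
    voxels = sampled[0, :, :, 0].reshape(C, X, Y, Z)
    # Zero out voxels that project behind the camera.
    valid = (pts[:, 2] > 0).reshape(1, X, Y, Z)
    return voxels * valid
```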
Lastly, knowledge distillation was a topic of interest, with two noteworthy papers capturing attention. "Coaching a Teachable Student" by Jimuyang Zhang, Zanming Huang, and Eshed Ohn-Bar explored using knowledge distillation to train a student model from a teacher model while incorporating a coaching mechanism. The second was "Decentralized Learning with Multi-Headed Distillation" by Andrey Zhmoginov, Mark Sandler, Nolan Miller, Gus Kristiansen, and Max Vladymyrov. Their technique enables multiple agents, each holding private non-IID data, to learn from one another without sharing data, weights, or weight updates. The approach is communication efficient: it leverages an unlabeled public dataset and equips each client with multiple auxiliary heads, which greatly improves training efficiency when data is heterogeneous. Individual models maintain and improve performance on their own private tasks while also gaining substantially on the globally aggregated data distribution. The paper further examines the effects of data and model-architecture heterogeneity and of the communication graph topology, showing that agents achieve significant performance gains compared to learning in isolation.
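For intuition, the loss below sketches one agent's training step under such a scheme: a standard supervised loss on its private labeled data, plus distillation terms that match its auxiliary heads to the softened predictions other agents produced on a shared unlabeled public batch. The number of heads, the temperature, and the peer-to-head assignment are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Agent(nn.Module):
    """One client: a shared backbone, a private-task head, and auxiliary
    heads used to distill from other agents on public unlabeled data."""
    def __init__(self, backbone, feat_dim, num_classes, num_aux_heads=3):
        super().__init__()
        self.backbone = backbone
        self.private_head = nn.Linear(feat_dim, num_classes)
        self.aux_heads = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(num_aux_heads))

    def forward(self, x):
        z = self.backbone(x)
        return self.private_head(z), [h(z) for h in self.aux_heads]

def agent_loss(agent, private_x, private_y, public_x, peer_logits, tau=2.0):
    """peer_logits: list of (batch, num_classes) predictions received from
    other agents on the same public batch; no raw data or weights change hands."""
    logits, _ = agent(private_x)
    loss = F.cross_entropy(logits, private_y)        # supervised private task
    _, aux = agent(public_x)
    for head_logits, target in zip(aux, peer_logits):
        soft_target = F.softmax(target / tau, dim=-1)
        log_probs = F.log_softmax(head_logits / tau, dim=-1)
        loss = loss + F.kl_div(log_probs, soft_target,
                               reduction='batchmean') * tau * tau
    return loss
```

Because only predictions on the public batch are exchanged, the communication cost per round is small and independent of model size, which is where the decentralized setting gets its efficiency.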
Overall, CVPR 2023 showcased groundbreaking research and developments across various areas of computer vision, with these top 10 recommended papers highlighting the remarkable advancements and potential for future breakthroughs.