🥥 COCONut: Crafting the Future of Segmentation Datasets with Exquisite Annotations in the Era of ✨Big Data✨

ByteDance Research, USA

The COCONut dataset and supported tasks

🔥 Highlights

1. We introduce COCONut, a modern, universal segmentation dataset that encompasses about 383K images and 5.18M human-verified panoptic segmentation masks, along with COCONut-val, a new validation set that serves as a novel and challenging testbed.

2. We present benchmark results on COCONut. As the training set grows, we observe consistent improvement in semantic, instance, and panoptic segmentation and in object detection, a new state-of-the-art result on open-vocabulary segmentation, and more controllable image generation from semantic masks. Our experimental results also highlight the superior value of human annotations compared to pseudo-labels.


In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segmentation systems. However, the COCO segmentation benchmark has seen comparatively slow improvement over the last decade. Originally equipped with coarse polygon annotations for 'thing' instances, it gradually incorporated coarse superpixel annotations for 'stuff' regions, which were subsequently heuristically amalgamated to yield panoptic segmentation annotations. These annotations, executed by different groups of raters, have resulted not only in coarse segmentation masks but also in inconsistencies between segmentation types. In this study, we undertake a comprehensive reevaluation of the COCO segmentation annotations. By enhancing the annotation quality and expanding the dataset to encompass 383K images with more than 5.18M panoptic masks, we introduce COCONut, the COCO Next Universal segmenTation dataset. COCONut harmonizes segmentation annotations across semantic, instance, and panoptic segmentation with meticulously crafted high-quality masks, and establishes a robust benchmark for all segmentation tasks. To our knowledge, COCONut stands as the inaugural large-scale universal segmentation dataset verified by human raters. We anticipate that the release of COCONut will significantly contribute to the community's ability to assess the progress of novel neural networks.
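To illustrate what harmonized annotations across the three tasks can mean in practice, here is a minimal sketch of how one panoptic annotation can yield consistent semantic and instance label maps. It assumes the standard COCO panoptic convention (a segment id packed into the RGB channels as R + 256·G + 256²·B, plus a per-segment `category_id`). The function names, the toy category ids, and the void label 255 are hypothetical choices for this example and are not taken from the COCONut release.

```python
def rgb_to_segment_id(r, g, b):
    """Decode one pixel of a COCO-panoptic-style PNG into its segment id."""
    return r + 256 * g + 256 * 256 * b


def split_panoptic(segment_ids, segments_info, thing_ids, void_label=255):
    """Derive semantic and instance maps from a single panoptic annotation.

    segment_ids   : 2D list of per-pixel segment ids (0 = unlabeled)
    segments_info : list of dicts with "id" and "category_id" keys
    thing_ids     : set of category ids treated as countable 'thing' classes
    """
    # Every labeled segment contributes its category to the semantic map.
    category_of = {seg["id"]: seg["category_id"] for seg in segments_info}

    # Only 'thing' segments get an instance index; 'stuff' stays at 0.
    instance_of = {}
    for seg in segments_info:
        if seg["category_id"] in thing_ids:
            instance_of[seg["id"]] = len(instance_of) + 1

    semantic = [[category_of.get(s, void_label) for s in row] for row in segment_ids]
    instance = [[instance_of.get(s, 0) for s in row] for row in segment_ids]
    return semantic, instance


# Toy 2x3 image: two 'thing' segments of category 1, one 'stuff' segment of
# category 100, and one unlabeled pixel (segment id 0).
toy_segment_ids = [[1, 1, 2],
                   [3, 3, 0]]
toy_segments_info = [
    {"id": 1, "category_id": 1},
    {"id": 2, "category_id": 1},
    {"id": 3, "category_id": 100},
]
toy_semantic, toy_instance = split_panoptic(toy_segment_ids, toy_segments_info, thing_ids={1})
```

Because both maps come from the same panoptic segments, the semantic and instance labels cannot disagree at any pixel, which is the kind of cross-task consistency that separately collected annotations lack.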

COCO vs. COCONut dataset annotation comparison

COCONut dataset

Human-assisted annotation pipeline


  author    = {Xueqing Deng and Qihang Yu and Peng Wang and Xiaohui Shen and Liang-Chieh Chen},
  title     = {COCONut: Modernizing COCO Segmentation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2024},