Academic Paper

Fine-Grained Human Hair Segmentation Using a Text-to-Image Diffusion Model
Document Type
Periodical
Source
IEEE Access. 12:13912-13922, 2024
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Hair
Image segmentation
Feature extraction
Semantics
Visualization
Face recognition
Task analysis
Hair segmentation
fine-grained segmentation
generative model
diffusion model
text-to-image diffusion model
Figaro-1k
CelebAMask-HQ
Face Synthetics
Language
ISSN
2169-3536
Abstract
Human hair segmentation is essential for face recognition and for natural-looking style transfer. However, it remains a challenging task because of the diverse appearances and complex patterns of hair in images. In this study, we propose a novel method that uses diffusion-based generative models, which have been extensively studied in recent years, to capture and finely segment human hair. In diffusion-based models, the internal visual representation formed during the denoising process contains rich pixel-level information. Motivated by this property, we apply diffusion-based models to fine-grained human hair segmentation. Specifically, we extract from a diffusion-based model a representation that carries pixel-level semantic information and then train a segmentation network on it. To segment hair more finely, our approach uses the representation from a text-to-image diffusion model conditioned on text, which provides information more relevant to human hair and thereby yields detailed hair masks. To validate our method, we conducted experiments on three hair-related datasets with distinct characteristics: Figaro-1k, CelebAMask-HQ, and Face Synthetics. The proposed method outperforms existing methods on all three datasets in terms of mIoU (mean intersection over union), accuracy, precision, and F1-score. The improvement is particularly evident in its ability to separate human hair from background and non-hair elements, which demonstrates the effectiveness of our method on hair with complex characteristics. Our research contributes not only to the fine-grained segmentation of human hair but also to the application of generative models in semantic segmentation tasks. We hope that the proposed method will be applied to detailed semantic segmentation in various fields in the future.
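The abstract names mIoU, accuracy, precision, and F1-score but does not say how they are computed. The following is a minimal sketch of the common definitions for a binary hair/background mask, not the authors' evaluation code. The function name `binary_mask_metrics`, the flat 0/1 label format, and the choice to average IoU over the two classes (hair and background) are assumptions made for illustration.

```python
def binary_mask_metrics(pred, target):
    """Compute common segmentation metrics for binary masks.

    pred, target: equal-length flat sequences of 0/1 labels (1 = hair).
    This input format is an assumption for illustration only.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Pixel-level confusion counts, treating hair (1) as the positive class.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    iou_hair = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)

    return {
        # Assumed here: mIoU is the mean of the per-class IoU values.
        "miou": (iou_hair + iou_background) / 2,
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Toy 4-pixel example: one false positive, no false negatives.
    print(binary_mask_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In practice a 2-D mask would be flattened before the call, and the metrics would be aggregated over a whole test set. Whether aggregation is per image or over all pixels is a further choice the abstract leaves open.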