Journal Article

DiCENet: Dimension-Wise Convolutions for Efficient Networks
Document Type
Periodical
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(5):2416-2425, May 2022
Subject
Computing and Processing
Bioengineering
Computer architecture
Tensors
Standards
Kernel
Convolutional codes
Task analysis
Object detection
Deep convolutional neural network
image classification
object detection
semantic segmentation
efficient networks
Language
English
ISSN
0162-8828 (Print)
2160-9292 (CD)
1939-3539 (Electronic)
Abstract
We introduce a novel and generic convolutional unit, the DiCE unit, that is built using dimension-wise convolutions and dimension-wise fusion. The dimension-wise convolutions apply light-weight convolutional filtering across each dimension of the input tensor, while dimension-wise fusion efficiently combines these dimension-wise representations, allowing the DiCE unit to efficiently encode the spatial and channel-wise information contained in the input tensor. The DiCE unit is simple and can be seamlessly integrated into any architecture to improve its efficiency and performance. Compared to depth-wise separable convolutions, the DiCE unit shows significant improvements across different architectures. When DiCE units are stacked to build the DiCENet model, we observe significant improvements over state-of-the-art models across various computer vision tasks, including image classification, object detection, and semantic segmentation. On the ImageNet dataset, DiCENet delivers 2-4 percent higher accuracy than state-of-the-art manually designed models (e.g., MobileNetv2 and ShuffleNetv2). DiCENet also generalizes better than state-of-the-art separable-convolution-based efficient networks, including neural architecture search-based methods (e.g., MobileNetv3 and MixNet), to tasks (e.g., object detection) that are often run on resource-constrained devices.
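To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of a DiCE-style unit for a 4D input of shape (N, C, H, W). This is an illustrative reconstruction, not the authors' implementation: the class name DiCEUnitSketch and all variable names are ours, and the fusion step is approximated here by a single point-wise convolution over the concatenated dimension-wise responses, whereas the paper's dimension-wise fusion is more elaborate.

    import torch
    import torch.nn as nn

    class DiCEUnitSketch(nn.Module):
        """Sketch of a DiCE-style unit: one lightweight grouped convolution
        per tensor dimension (channel, width, height), followed by a simple
        fusion of the three dimension-wise responses."""

        def __init__(self, channels, height, width, kernel_size=3):
            super().__init__()
            pad = kernel_size // 2
            # Channel-wise branch: a standard depth-wise convolution,
            # filtering each channel independently over (H, W).
            self.conv_c = nn.Conv2d(channels, channels, kernel_size,
                                    padding=pad, groups=channels, bias=False)
            # Width-wise branch: transpose W into the channel position and
            # filter each width slice independently over (H, C).
            self.conv_w = nn.Conv2d(width, width, kernel_size,
                                    padding=pad, groups=width, bias=False)
            # Height-wise branch: transpose H into the channel position and
            # filter each height slice independently over (C, W).
            self.conv_h = nn.Conv2d(height, height, kernel_size,
                                    padding=pad, groups=height, bias=False)
            # Fusion (simplified assumption): a 1x1 point-wise convolution
            # over the concatenated responses, mapping 3*C channels back to C.
            self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1,
                                  bias=False)

        def forward(self, x):  # x: (N, C, H, W)
            out_c = self.conv_c(x)
            out_w = self.conv_w(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
            out_h = self.conv_h(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
            return self.fuse(torch.cat([out_c, out_w, out_h], dim=1))

    # Usage: output shape matches the input shape, (1, 32, 56, 56).
    x = torch.randn(1, 32, 56, 56)
    y = DiCEUnitSketch(channels=32, height=56, width=56)(x)

Note that the width-wise and height-wise branches reuse standard grouped 2D convolutions by transposing the corresponding axis into the channel position; this keeps each branch as cheap as a depth-wise convolution, but it is also why the sketch fixes H and W at construction time.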