Academic Paper

Two-stage Neural Architecture Optimization with Separated Training and Search
Document Type
Conference
Source
2023 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, Jun. 2023
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Training
Representation learning
Pipelines
Semantics
Estimation
Transformers
Feature extraction
Neural Architecture Search
Representation Learning
Automated Machine Learning
Language
English
ISSN
2161-4407
Abstract
Neural architecture search (NAS) has been a popular research topic for designing deep neural networks (DNNs) automatically. It can significantly improve the efficiency of designing neural architectures for given learning tasks. Recently, instead of searching directly in the original neural architecture space, many NAS approaches learn continuous representations of neural architectures for architecture search or performance estimation. In particular, Neural Architecture Optimization (NAO) is a representative method that encodes neural architectures into continuous representations with an auto-encoder and then performs continuous optimization in the encoded space using gradient-based methods. However, as NAO only considers top-ranked architectures when learning the continuous representation, it may fail to construct a satisfactory continuous optimization space that contains the expected high-quality neural architectures. Motivated by this, in this paper we propose a two-stage NAO (TNAO) to learn a more complete continuous representation of neural architectures, which provides a better optimization space for NAS. Specifically, we design a pipeline that separates the training and search stages: we first build the training set by randomly sampling from the entire neural architecture search space, with the aim of collecting well-distributed neural architectures for training. Moreover, to effectively exploit architectural semantic information with limited data, we propose an improved Transformer auto-encoder for learning the continuous representation, supervised by ranking information of neural architecture performance. Lastly, for more effective optimization of neural architectures, we adopt a population-based swarm intelligence algorithm, competitive swarm optimization (CSO), with a newly designed remapping scoring scheme. To evaluate the efficiency of the proposed TNAO, comprehensive experimental studies are conducted on two common search spaces, NAS-Bench-101 and NAS-Bench-201. An architecture within the top 0.02% of performance is discovered on NAS-Bench-101, and the best architecture on the CIFAR-10 dataset is obtained on NAS-Bench-201.
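
For readers unfamiliar with the search component mentioned in the abstract, the sketch below illustrates the generic competitive swarm optimization (CSO) update applied to a continuous latent space: particles are paired at random, the winner of each pairwise comparison is kept, and the loser learns from the winner and the swarm mean. This is a minimal sketch under stated assumptions; the `score` surrogate, latent dimensionality, and hyper-parameters are illustrative placeholders, not the paper's actual remapping scoring scheme or Transformer auto-encoder.

```python
# Minimal, hypothetical sketch of competitive swarm optimization (CSO) in a
# continuous latent space of neural architectures. The score() surrogate and
# the latent dimension are assumptions for illustration only.
import numpy as np

def cso_search(score, dim=32, swarm_size=40, iters=100, phi=0.1, seed=0):
    """Run a basic CSO loop and return the best latent vector found.

    `score` maps a latent vector to a higher-is-better value (in TNAO this
    role would be played by a learned performance predictor)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(swarm_size, dim))  # latent positions
    v = np.zeros_like(x)                                 # velocities
    for _ in range(iters):
        mean = x.mean(axis=0)                            # swarm mean position
        pairs = rng.permutation(swarm_size).reshape(-1, 2)  # random pairing
        for i, j in pairs:
            # Winner keeps its position; the loser is updated toward the
            # winner and the swarm mean.
            w, l = (i, j) if score(x[i]) >= score(x[j]) else (j, i)
            r1, r2, r3 = rng.uniform(size=(3, dim))
            v[l] = r1 * v[l] + r2 * (x[w] - x[l]) + phi * r3 * (mean - x[l])
            x[l] = x[l] + v[l]
    best = max(range(swarm_size), key=lambda k: score(x[k]))
    return x[best]

if __name__ == "__main__":
    # Toy usage with a stand-in scoring function (negative distance to a
    # fixed target latent vector), purely to show the search loop running.
    target = np.full(32, 0.5)
    best_latent = cso_search(lambda z: -np.linalg.norm(z - target))
    print("best score found:", -np.linalg.norm(best_latent - target))
```

In the paper's pipeline the latent vectors would come from the learned continuous representation of architectures and the returned optimum would be decoded back into a discrete architecture; both of those components are omitted here.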