Academic Paper

Efficient Parallel Implementations of PIPO Block Cipher on CPU and GPU
Document Type
Periodical
Author
Source
IEEE Access. 10:85995-86007, 2022
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Graphics processing units
Computer architecture
Servers
Encryption
Ciphers
Cryptography
Parallel processing
AVX-2
AVX-512
block cipher
CUDA
GPU
parallel processing
PIPO
SIMD
Language
ISSN
2169-3536
Abstract
Data encryption is essential for securely managing clients’ data on servers in a data-centric ICT environment. Clients must encrypt data before transmitting it to servers or other clients, and encrypting a large volume of data takes considerable time. Therefore, for servers and clients to communicate with each other both securely and smoothly, data encryption must be optimized on both the server side and the client side. The server environment in particular is responsible for managing and processing large amounts of client data. In this paper, we present two highly optimized software implementations of the PIPO cipher, one for CPU environments and one for GPU environments. PIPO was proposed at ICISC’19 as a lightweight block cipher. For optimization, we take full advantage of two parallel processing technologies: AVX-related instructions on the CPU and the NVIDIA CUDA platform on the GPU. For the CPU, we process multiple plaintext blocks at once, 32 blocks with the AVX2 instruction set and 64 blocks with the AVX-512 instruction set, together with the proposed arithmetic techniques. For the GPU, we propose data alignment and data combining methods and a method of using PTX inline assembly that takes the characteristics of the GPU architecture into account. On an Intel Core i9-11900K (3.50 GHz), our PIPO software using AVX2 achieves a performance improvement of 839.64% (resp. 985.46% with AVX-512) over the existing reference code; to the best of our knowledge, this is the first PIPO software to use AVX-512 instructions. Finally, on an RTX 2080Ti, our PIPO GPU software achieves a throughput of up to 1110.08 Gbps.
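The abstract does not spell out the proposed arithmetic techniques, so the following is only a minimal sketch of one generic idea behind processing 32 blocks in a 256-bit register. It assumes each of the 32 byte lanes holds one byte from a different block, and that a per-byte rotation is needed. AVX2 offers no per-byte shift or rotate instruction, so a common workaround is to shift the whole register and mask off the bits that crossed a lane boundary. The Python model below represents the register as an integer. The names `LANES`, `broadcast`, `rotl8_lanes`, and `rotl8` are hypothetical and not taken from the paper.

```python
# Illustrative model only: one 256-bit register is a Python int with 32 byte lanes.
LANES = 32                      # assumption: one byte from each of 32 blocks
WIDTH = 8 * LANES
FULL = (1 << WIDTH) - 1         # keeps results within 256 bits


def broadcast(byte):
    """Repeat an 8-bit value in every lane of the wide word."""
    return int.from_bytes(bytes([byte]) * LANES, "little")


def rotl8_lanes(word, r):
    """Rotate every byte lane left by r bits using whole-word shifts and masks."""
    r %= 8
    if r == 0:
        return word
    hi_mask = broadcast((0xFF << r) & 0xFF)  # bits that remain in-lane after << r
    lo_mask = broadcast(0xFF >> (8 - r))     # bits that wrap around after >> (8 - r)
    left = (word << r) & FULL & hi_mask      # drop bits that leaked in from the lane below
    right = (word >> (8 - r)) & lo_mask      # drop bits that leaked in from the lane above
    return left | right


def rotl8(b, r):
    """Scalar reference: rotate a single byte left by r bits."""
    r %= 8
    return ((b << r) | (b >> (8 - r))) & 0xFF


if __name__ == "__main__":
    lanes = [(i * 37 + 11) & 0xFF for i in range(LANES)]
    word = int.from_bytes(bytes(lanes), "little")
    rotated = rotl8_lanes(word, 3).to_bytes(LANES, "little")
    # Every lane should match the scalar rotation of the same byte.
    print(list(rotated) == [rotl8(b, 3) for b in lanes])
```

On real hardware the two masked shifts and the final OR would each map to a single vector instruction acting on all 32 lanes, which is where the parallel speedup over a byte-at-a-time loop would come from. The actual instruction sequence used in the paper may differ.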