Academic Paper

EC-Crypto: Highly Efficient Area-Delay Optimized Elliptic Curve Cryptography Processor
Document Type
Periodical
Source
IEEE Access. 11:56649-56662, 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Galois fields
Elliptic curve cryptography
Field programmable gate arrays
Computer architecture
Arithmetic
Hardware acceleration
Throughput
Elliptic curve cryptography (ECC)
finite field multiplication
field programmable gate array (FPGA)
hardware acceleration
finite field arithmetic
Language
ISSN
2169-3536
Abstract
Security protocols based on elliptic curve cryptography (ECC) require much shorter keys than other public key cryptography (PKC) schemes, which makes ECC the most suitable option for resource-limited devices. This paper presents a highly efficient, area-delay optimized ECC crypto processor over the general prime field ($\mathbb{F}_{p}$). It is built on a novel finite field multiplier (FFM) that incorporates several optimization techniques to reduce latency and hardware resource consumption. The proposed FFM architecture embeds a finite field adder/subtractor (FFAS) unit, which is used to perform FFAS operations instead of deploying a dedicated unit. Common Z (Co-Z) coordinates with the Montgomery ladder method are used to compute point multiplication, a core operation in all ECC-based crypto protocols. The work also proposes an efficient scheduling strategy that executes low-level finite field arithmetic primitives with minimum latency on the employed finite field arithmetic units. As a result of these techniques, the proposed ECC processor is optimized for hardware resources, latency, and throughput. It is described in Verilog HDL, synthesized, and implemented on Virtex-7, Kintex-7, and Virtex-6 FPGA platforms using the Xilinx Vivado and ISE Design Suite tools. On the Virtex-7 FPGA platform, it computes a single 256-bit scalar multiplication in $0.7~\text{ms}$, consumes just 6.2K slices, and delivers a throughput of 1428 operations per second. The implementation results show that the design outperforms the state of the art by providing a better area-delay product and higher efficiency. It therefore has the potential to be deployed in many applications where both latency and resource requirements are critical.
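For readers unfamiliar with the Montgomery ladder named in the abstract, the following is a minimal software sketch of the ladder's control flow. It is not the paper's design: it uses plain affine coordinates rather than Co-Z coordinates, Python's built-in modular arithmetic rather than the proposed FFM/FFAS hardware, and a small textbook toy curve ($y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ with base point $(5, 1)$) rather than a 256-bit curve. The names `P_MOD`, `A`, `G`, `ec_add`, and `montgomery_ladder` are hypothetical and chosen only for illustration.

```python
# Toy parameters (assumption, not from the paper): y^2 = x^3 + 2x + 2 over F_17.
P_MOD, A = 17, 2
G = (5, 1)  # base point on the toy curve


def ec_add(p1, p2):
    """Add two affine points on the toy curve; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p2 is the negation of p1 (also covers doubling with y = 0)
    if p1 == p2:
        # Tangent slope for point doubling.
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Chord slope for adding two distinct points.
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def montgomery_ladder(k, point, bits=8):
    """Compute k * point; every bit costs one addition and one doubling."""
    r0, r1 = None, point  # invariant: r1 - r0 == point
    for i in reversed(range(bits)):
        if (k >> i) & 1:
            r0, r1 = ec_add(r0, r1), ec_add(r1, r1)
        else:
            r0, r1 = ec_add(r0, r0), ec_add(r0, r1)
    return r0


if __name__ == "__main__":
    print(montgomery_ladder(2, G))
```

Because each iteration performs the same pair of operations regardless of the key bit, the ladder has a regular operation sequence. The sketch requires Python 3.8 or later for `pow(x, -1, m)`, and `k` must fit in `bits` bits.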