Academic Paper

Investigation and Simulation of Hardware Errors in Kernel Logs of Linux-based Server Systems
Document Type
Conference
Source
2021 6th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM), pp. 1-7, Sep. 2021
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Fault diagnosis
Error analysis
Social networking (online)
Linux
Random access memory
Maintenance engineering
Hardware
Hardware errors
Kernel logs
Linux servers
RAM
Hard disk
CPU
Language
Abstract
In modern server systems, business-critical applications run on different types of infrastructure, such as cloud systems, physical machines and virtualized environments. Over time, and often under high load, various hardware faults occur in servers and manifest as errors, resulting in malfunction or even server breakdown. CPU, RAM and hard disk drives (HDD) are the hardware components whose errors concern server administrators the most. In this work, selected RAM, HDD and CPU errors that have been observed, or can be simulated, in the kernel ring buffer logs of two groups of Linux servers are investigated. Moreover, HDD and RAM error statistics are presented for the two groups of servers. A better understanding of such errors can lead to more efficient analysis of kernel logs, which are commonly used for fault diagnosis and prediction. In addition, this work summarizes ways of simulating hardware errors in RAM and HDD in order to test the error detection and correction mechanisms of a Linux server.
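As a rough illustration of the kind of kernel log analysis the abstract describes, the following minimal sketch counts log lines per hardware component. It is not the authors' method. The keyword patterns, the function names `classify` and `count_errors`, and the sample lines are all assumptions chosen for illustration; real kernel messages vary by kernel version, driver and hardware.

```python
import re
from collections import Counter

# Hypothetical keyword patterns (assumptions, not taken from the paper).
# Adjust them to the messages your kernel and drivers actually emit.
PATTERNS = {
    "RAM": re.compile(r"\bEDAC\b|memory (read )?error", re.IGNORECASE),
    "HDD": re.compile(r"I/O error|\bata\d+(\.\d+)?: (failed command|exception)",
                      re.IGNORECASE),
    "CPU": re.compile(r"\bmce\b|machine check", re.IGNORECASE),
}


def classify(line):
    """Return the label of the first pattern that matches, or None."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return None


def count_errors(lines):
    """Count matching lines per hardware component."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


# Made-up sample lines that only imitate the general shape of kernel messages.
SAMPLE = [
    "EDAC MC0: 1 CE memory read error on DIMM_A1",
    "blk_update_request: I/O error, dev sda, sector 123456",
    "mce: [Hardware Error]: Machine check events logged",
    "usb 1-1: new high-speed USB device number 2 using xhci_hcd",
]

if __name__ == "__main__":
    print(dict(count_errors(SAMPLE)))
```

To apply the sketch to a real server, the same functions could be given the lines of a saved kernel ring buffer dump, for example text captured from `dmesg`, instead of `SAMPLE`.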