Academic Paper

Research Report: Progress on Building a File Observatory for Secure Parser Development
Document Type
Conference
Source
2022 IEEE Security and Privacy Workshops (SPW), pp. 168-175, May 2022
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Privacy
Observatories
Philosophical considerations
Conferences
User interfaces
Fuzzing
Portable document format
LangSec
language-theoretic security
file corpus creation
file forensics
text extraction
parser resources
Language
ISSN
2770-8411
Abstract
Parsing untrusted data is notoriously challenging. Failure to handle maliciously crafted data correctly can (and does) lead to a wide range of vulnerabilities. The language-theoretic security (LangSec) philosophy seeks to obviate the need for developers to apply ad hoc solutions by offering, instead, formally correct and verifiable input handling throughout the software development lifecycle. One of the key components in developing secure parsers is a broad-coverage corpus that helps developers understand the problem space for a given format and that can potentially serve as a source of seeds for fuzzing and other automated testing. In this paper, we offer an update on work reported at the LangSec 2021 conference on the development of a file observatory to gather, and enable analysis of, a diverse collection of files at scale. The initial focus of the observatory is on Portable Document Format (PDF) files and file formats typically embedded in PDFs. We report on refactoring the ingest process, applying new analytic methods, and improving the user interface.