Academic Paper

Research Report: Building a File Observatory for Secure Parser Development
Document Type
Conference
Source
2021 IEEE Security and Privacy Workshops (SPW), pp. 121-127, May 2021
Subject
Computing and Processing
Privacy
Observatories
Conferences
Computer bugs
Buildings
Fuzzing
Portable document format
LangSec
language-theoretic security
file corpus creation
file forensics
text extraction
parser resources
Language
Abstract
Parsing untrusted data is notoriously challenging. Failure to handle maliciously crafted data correctly can (and does) lead to a wide range of vulnerabilities. The language-theoretic security (LangSec) philosophy seeks to obviate the need for developers to apply ad hoc solutions by instead offering formally correct and verifiable input handling throughout the software development lifecycle. One of the key components in developing secure parsers is a broad-coverage corpus that enables developers to understand the problem space for a given format and that can potentially serve as a source of seeds for fuzzing and other automated testing. In this paper, we offer an update on work initially reported at the LangSec 2020 conference on the development of a file observatory to gather, and enable analysis of, a diverse collection of files at scale. The initial focus of the observatory is on Portable Document Format (PDF) files and file formats typically embedded in PDFs. Here, we report on the addition of a bug tracker corpus and new analytic methods.