Academic Paper

Dynamic Inference From IoT Traffic Flows Under Concept Drifts in Residential ISP Networks
Document Type
Periodical
Source
IEEE Internet of Things Journal (IEEE Internet Things J.), 10(17):15761-15773, Sep. 2023
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Internet of Things
Context modeling
Data models
Home automation
Training
Performance evaluation
Behavioral sciences
Concept drifts
IoT
IPFIX data
machine learning
traffic inference
Language
ISSN
2327-4662
2372-2541
Abstract
Millions of vulnerable consumer IoT devices in home networks enable cybercrime, putting user privacy and Internet security at risk. Internet service providers (ISPs) are best positioned to mitigate these risks by automatically inferring the active IoT devices in each household and notifying users of vulnerable ones. Developing a scalable inference method that performs robustly across thousands of home networks is a nontrivial task. This article focuses on the challenges of developing and applying data-driven inference models when labeled data on device behavior is limited and the data distribution changes across time and space (concept drifts). Our contributions are fourfold: 1) we collect and analyze more than six million network traffic flows from 24 types of consumer IoT devices in 12 real homes over six weeks to highlight the challenge of temporal and spatial concept drifts in the network behavior of IoT devices, and we publicly release our training and testing instances; 2) we analyze the performance of two inference strategies in the presence of concept drifts, namely global inference (one model trained on the combined labeled data from all training homes) and contextualized inference (several models, each trained on the labeled data from one training home); 3) to manage concept drifts, we develop a method that dynamically applies the “best” model (from a set) to the network traffic of unseen homes during the testing phase, yielding better performance in a fifth of scenarios when labels are available for the testing data (an ideal but unrealistic setting); and 4) we develop a method that automatically selects the best model without needing labels for the unseen data (a realistic inference setting) and show that it can achieve 94% of the ideal model’s accuracy.
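
The difference between global inference, contextualized inference, and label-free model selection can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy nearest-centroid classifier, the two-dimensional flow features, the home and device names, and above all the selection rule (pick the model with the highest mean prediction confidence on the unlabeled flows). The abstract does not state which criterion the authors use, so this is a stand-in and not their method.

```python
import math
from collections import defaultdict


class CentroidModel:
    """Toy nearest-centroid classifier (hypothetical stand-in for a real model)."""

    def __init__(self, flows, labels):
        grouped = defaultdict(list)
        for x, y in zip(flows, labels):
            grouped[y].append(x)
        # One centroid per device type: the per-feature mean of its flows.
        self.centroids = {
            y: tuple(sum(col) / len(xs) for col in zip(*xs))
            for y, xs in grouped.items()
        }

    def predict_proba(self, x):
        # Softmax over negative Euclidean distances to each centroid.
        weights = {y: math.exp(-math.dist(x, c)) for y, c in self.centroids.items()}
        total = sum(weights.values())
        return {y: w / total for y, w in weights.items()}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)


def train_global(homes):
    """Global inference: one model on the combined labeled data of all homes."""
    flows = [x for xs, _ in homes.values() for x in xs]
    labels = [y for _, ys in homes.values() for y in ys]
    return CentroidModel(flows, labels)


def train_contextualized(homes):
    """Contextualized inference: one model per training home."""
    return {home: CentroidModel(xs, ys) for home, (xs, ys) in homes.items()}


def select_model(models, unlabeled_flows):
    """Assumed rule: choose the model with the highest mean top-class confidence."""
    def mean_confidence(model):
        scores = [max(model.predict_proba(x).values()) for x in unlabeled_flows]
        return sum(scores) / len(scores)
    return max(models, key=lambda name: mean_confidence(models[name]))


# Hypothetical labeled flows; the second feature shifts between homes
# to mimic spatial concept drift.
training_homes = {
    "home_a": ([(1.0, 1.0), (3.0, 1.0)], ["camera", "plug"]),
    "home_b": ([(1.0, 5.0), (3.0, 5.0)], ["camera", "plug"]),
}

candidates = train_contextualized(training_homes)
candidates["global"] = train_global(training_homes)

# Unlabeled flows from an unseen home that happens to resemble home_b.
unseen_flows = [(1.1, 5.0), (2.9, 5.1)]
best = select_model(candidates, unseen_flows)
predictions = [candidates[best].predict(x) for x in unseen_flows]
```

In this toy setup the model trained on the home whose traffic most resembles the unseen home is chosen over the global model, which is the intuition behind applying a per-home "best" model under spatial drift. A confidence-based rule can fail when a model is confidently wrong, so it should be read as one possible heuristic only.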