Academic Paper

Beyond the Web Graph: Mining the Information Architecture of the WWW with Navigation Structure Graphs
Document Type
Conference
Source
2011 International Conference on Emerging Intelligent Data and Web Technologies (EIDWT), pp. 99-106, Sep. 2011
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Navigation
Humans
Organizations
Information architecture
Cascading style sheets
Data mining
Visualization
Web structure mining
Web graph
hierarchy extraction
Language
Abstract
Large Web sites contain a plethora of different menus and navigation aids, which implement systems of content organization as hierarchies, linear structures or matrices. Humans are able to decode the fine-grained content organization because they are aware of the different access methods provided by navigation systems and understand the higher-level information architecture. In contrast, current methods of link analysis cannot extract such a detailed model of the information architecture and are not able to recognize site boundaries and content hierarchies the way humans do. In this paper, we present a new approach to mining navigation systems that increases the precision of Web structure mining. Instead of analyzing the complete Web graph spanned by pages and hyperlinks, subgraphs called Navigation Structure Graphs (NSGs) are analyzed. An NSG represents the hyperlinks belonging to a certain navigation system. We demonstrate the capabilities of NSGs for analyzing the organization of Web sites and present our research on mining NSGs.