Automatic Web data collection from non-reactive sources by means of normative systems and Semantic Web Technologies
Web-based data collection is becoming increasingly important in many social science fields. It is not restricted to Web surveys; it also includes non-reactive data collected from heterogeneous Web sources by means of various techniques. A scientific methodology for Web-based data collection has not yet been developed. Such a methodology must address two perspectives. For those who will analyse the data, the relevant components are data validity, reliability, and quality. For data providers, it is essential to be able to constrain access to their data and to know how the data will be stored and used. The guidelines that capture these requirements are currently expressed in natural language. As a result, when large amounts of data are extracted automatically by specialized software, complying with those norms becomes very difficult. Developing new technologies to support Web-based data collection is therefore an important open issue. In this project we propose to tackle it by developing models and techniques to express Web-based data collection guidelines, rules, and policies at different levels of abstraction. We want to develop new techniques for automatic data extraction from the Web that rely on Semantic Web technologies and automated reasoning to plan actions compliant with the guidelines. Finally, we plan to implement a demonstration tool that uses these technologies for Web-based data collection.