A System for Automatic Web Data Collection by means of Norms and Semantic Web Technologies
Web-based collection of non-reactive data is becoming increasingly important in many social science fields. Being able to introduce policies that automatically regulate the collection of such data is therefore an important open issue. In this project we propose to tackle this problem, with a focus on guaranteeing that the activities performed during data collection comply with a given set of norms/policies. Starting in February 2013, we worked on a SERI project in which we proposed: (i) the definition of an OWL model of policies and obligations, usable for expressing policies that regulate the management of data extracted from the Web and stored in an OWL ontology; (ii) the definition of an OWL Social Network Ontology for semantically expressing the data that are extracted and then enriched with semantic analysis techniques; (iii) the definition of a software component able to enforce the obligations that regulate how the data must be treated before they can be used for social research. The research project described in this proposal is a continuation and an extension of that project. We aim at: (i) improving the application-independent model of policies, expressed using Semantic Web technologies, so that it can express prohibitions and permissions on non-reactive data automatically extracted from the Web, as well as conditions on the role and the context of the user who is accessing the data; (ii) completing the implementation of the semantic analysis components that add semantics to the data collected from social networks and store them in an OWL ontology; (iii) organizing into a unified framework the implementation of the various components required to realize a system able to assist social scientists in the different phases of their work, from automatic data collection to norm/policy enforcement and monitoring.