LIBER Innovation Award

The Institute for Information Law is pleased to announce that a paper written by Lucie Guibault, Christian Handke and Joan-Josep Vallbé has received the LIBER Innovation Award. This award is given to the three most innovative and relevant papers submitted to the LIBER Conference, which will be held in London from 24 to 26 June 2015.


Is Europe falling behind in data mining? Copyright law's impact on data mining in academic research.


Abstract:
This paper discusses how different levels of copyright protection affect the text and data mining (TDM) performance of academic researchers in the main research areas.
Copyright protection is determined at the national level. The scope of rights and exceptions varies per country: in some countries, exceptions expressly allow TDM to take place, while in others such activities are restricted. In most countries, the law is unclear. Statutory copyright exceptions, where they exist, can be interpreted in different ways. The assessment of the lawfulness of TDM therefore falls to the judgment of the individual researcher. Depending on the researcher's knowledge or perception of the law, TDM may be deemed allowed, probably allowed, probably not allowed or restricted. This paper assesses the consequences of the different levels of copyright protection for TDM activities.

Our aim is to explain the variation across countries in research output on data mining. For this, we collected data from Thomson Reuters' Web of Science. To identify the research output of interest, we counted, for each year from 1992 to 2014, all published research containing the expression “data mining” in the extended abstract whose authors reside in one of the 31 largest national economies, which include 14 EU member states. To control for the total research output of the respective countries, our dependent variable is the ratio of this absolute academic TDM output to the total research output of each country. Our unit of analysis is the country-year, measured as the proportion of TDM research in total research output.
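The dependent variable described above can be sketched in a few lines of Python. This is only an illustration of the ratio, not the authors' actual data pipeline: the function name `tdm_share`, the country labels and all counts are hypothetical placeholders, and country-years without any recorded output are simply skipped here as an assumption.

```python
from typing import Dict, Tuple

# A country-year key, e.g. ("A", 2014). Labels are hypothetical.
CountryYear = Tuple[str, int]


def tdm_share(
    tdm_counts: Dict[CountryYear, int],
    total_counts: Dict[CountryYear, int],
) -> Dict[CountryYear, float]:
    """Return the share of TDM publications in total output per country-year."""
    shares: Dict[CountryYear, float] = {}
    for key, total in total_counts.items():
        if total <= 0:
            # Assumption: country-years without recorded output are skipped.
            continue
        # A missing TDM count is treated as zero TDM publications.
        shares[key] = tdm_counts.get(key, 0) / total
    return shares


# Made-up counts, for illustration only.
tdm_counts = {("A", 2013): 50, ("A", 2014): 80, ("B", 2014): 30}
total_counts = {
    ("A", 2013): 10000,
    ("A", 2014): 10000,
    ("B", 2013): 5000,
    ("B", 2014): 6000,
}

shares = tdm_share(tdm_counts, total_counts)
for (country, year), share in sorted(shares.items()):
    print(f"{country} {year}: {share:.4f}")
```

Dividing by total output keeps large research systems from dominating the comparison merely because they publish more overall.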

Other control variables include the rule of law (as reported by the World Bank), which captures the level of enforcement of copyright, and the size and wealth of countries.
To estimate the effect of copyright law on the share of TDM in total research output, we fit a multilevel linear regression model with varying intercepts for country and year.
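One way to write down a model of this kind is given below. The abstract does not state the exact specification, so the notation is an assumption: $y_{ct}$ is the TDM share of country $c$ in year $t$, $C_{ct}$ is a measure of how restrictive copyright law is, $\mathbf{x}_{ct}$ collects the control variables, and $u_c$ and $v_t$ are the varying intercepts for country and year.

```latex
\begin{align}
  y_{ct} &= \beta_0 + \beta_1 C_{ct} + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ct}
            + u_c + v_t + \varepsilon_{ct}, \\
  u_c &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  v_t \sim \mathcal{N}(0, \sigma_v^2), \qquad
  \varepsilon_{ct} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{align}
```

Under this sketch, a negative $\beta_1$ would correspond to a lower TDM share under more restrictive copyright law.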

The data illustrate the rapid growth of TDM-related articles in total research output across all countries. We find a highly significant effect of copyright law: the more restrictive copyright law in most European countries is associated with a significantly lower share of TDM output. Data mining makes up a higher share of total research output in countries with more permissive copyright laws. Some Asian countries in particular overperform in terms of their TDM research output. What is more, the share of TDM in total research output grows more rapidly in the less restrictive countries.