
SWD IA on EU copyright modernisation – Text and data mining: the problem and options

Text and Data Mining (TDM) is a term commonly used to describe the automated processing (“machine reading”) of large volumes of text and data to uncover new knowledge or insights. TDM can be a powerful scientific research tool to analyse big corpuses of text and data such as scientific publications or research datasets. Copyright issues contribute to the slow development of TDM in EU research.

Copyright is relevant in this context as TDM may often involve copying (e.g. downloading) of the content to be analysed, which can be protected by the “right of reproduction” under copyright law. The current EU copyright rules lay down exceptions permitting the use of content for the purposes of non-commercial scientific research. However, a considerable level of legal uncertainty exists in practice. Research organisations do not always know whether TDM is copyright-relevant at all, whether it may be covered by an exception or whether a specific rightholders’ authorisation is required.

Researchers consider this situation to be particularly problematic as regards protected content to which they already have lawful access on the basis of a subscription purchased by their library or institution. Subscriptions to scientific publications may currently include an authorisation to perform TDM, prohibit it altogether, or leave the matter unclear. Fragmentation in the single market is also an emerging problem as Member States (MS) have started to adopt national TDM exceptions referring to the research exceptions in the current EU rules.


Without EU intervention, TDM would continue to be based on rightholders’ prior authorisation and its development would depend on the development of market-based initiatives to facilitate TDM licensing. At the same time, an increasing number of MS could decide to adopt national TDM exceptions in the context of the current research exceptions (Article 5(3)(a) of the InfoSoc Directive and Articles 6(2)(b) and 9(b) of the Database Directive).

Rightholders would support the baseline option as they are generally opposed to an intervention in this area. Researchers consider that legislative intervention is needed and would therefore strongly criticise a lack of EU action.

Option 1 – Fostering industry self-regulation initiatives without changes to the EU legal framework

Non-legislative option. The Commission would encourage stakeholders, notably publishers and researchers, to identify collaborative solutions to facilitate TDM, in particular for content subscribed to by research organisations.

Structured dialogues between researchers and publishers would be organised to allow both sides to express their views, notably with regard to researchers’ needs and the technical safeguards publishers could use to protect their content without creating unnecessary or disproportionate burdens for researchers. Building on existing initiatives such as “Cross Ref”, this option could also support and promote further technical solutions, such as platforms that facilitate TDM in practice by allowing researchers to access publishers’ data in one go, common standards for data formats, or the creation of trusted intermediaries ensuring a safe environment for the mining of content.

The Commission would monitor the implementation of the commitments made by publishers to allow TDM for scientific purposes and to amend their licences accordingly. If no substantial improvements are achieved in the mid-term, the Commission would consider proposing legislative changes as described in Options 2 to 4.

Rightholders would support this non-legislative option. STM publishers in particular have asked the Commission to pursue a self- or co-regulatory approach on TDM, following up on the Licences for Europe dialogue. They consider that collaborative solutions identified together with non-commercial researchers would be a balanced way forward and could yield concrete results more quickly. Researchers, on the other hand, are not in favour of additional stakeholder dialogues unless these are accompanied by legislative changes.

Option 2 – Mandatory exception covering text and data mining for non-commercial scientific research purposes

This option would make mandatory for MS the implementation of an exception to the rights of reproduction and of database extraction, with the following elements:

Beneficiaries: any user who has lawful access to content protected by copyright or by the sui generis database right (e.g. a subscription to a scientific journal). Lawful access would cover access to content through authorisation by content owners (e.g. subscriptions to scientific journals) as well as access to publicly available content (e.g. open access content).

Permitted uses: lawful users would be permitted to carry out the reproductions which are necessary for the TDM process, as long as the TDM is carried out for non-commercial scientific research purposes (within the meaning of the current research exceptions in the EU copyright rules which are subject to the “non-commercial purposes” condition). The exception would not permit any communication to the public of the content being mined.

Relationship with the licensing market: given that lawful access will often be granted through contracts, legislative intervention would also make clear that contractual terms that prevent or restrict uses permitted under the exception are null and void. At the same time, rightholders would be allowed to apply proportionate measures which are necessary to guarantee the security of the content as long as this does not unduly hamper uses covered by the exception. Additionally, the legislative instrument would encourage stakeholder dialogues aiming at setting up best practices and mutually agreed technical solutions with regard to security aspects.

Compensation: the exception would not be subject to the payment of fair compensation to rightholders as its specific features, notably the lawful access condition, allow rightholders to keep generating revenues from the access to their content, notably through subscription licences.

Interaction with the current exceptions: the current research exceptions in the InfoSoc and Database directives would remain untouched and continue to apply outside the scope of the new TDM exception. The exception under this option would also be without prejudice to the transient copies exception under Article 5(1) of the InfoSoc Directive.

Rightholders, publishers in particular, are strongly opposed to a legislative intervention introducing a TDM exception at EU level. Their main concern is that an exception would facilitate the misuse and piracy of their content and cause them to lose business opportunities in the future. Among the legislative options, this one would be the least opposed by rightholders as it is clearly limited to TDM carried out for non-commercial research purposes.

Option 3 – Mandatory exception applicable to public interest research organisations covering text and data mining for the purposes of both non-commercial and commercial scientific research

This option is the same as Option 2 on all points except the beneficiaries of the exception and the purpose of the scientific research, which would be as follows:

The exception would only apply to research organisations carrying out research in the public interest; commercial companies would not be beneficiaries of the exception under this option. The concept of research organisations would be defined in the legal instrument to encompass different organisations across MS which have as their primary goal to conduct scientific research either on a not-for-profit basis or pursuant to a public interest mission. This would cover, for example, universities, research institutes and similar research organisations.

At the same time, the exception would go beyond Option 2 in the sense that it would permit research organisations as defined above to carry out TDM on content to which they have lawful access, irrespective of the non-commercial or commercial purpose of their scientific research. This would notably cover research projects carried out in the framework of Public-Private Partnerships (PPPs), which may have an ultimate commercial outcome.

Researchers generally view this option favourably as it would increase legal certainty for their organisations to perform TDM, including in the context of PPPs. At the same time, part of the research community has expressed the concern that the concept of public interest organisation could be difficult to define and, more generally, that a TDM exception should extend to anybody who has lawful access and cover both non-commercial and commercial research. Rightholders are against any legal intervention, but they may prefer this option to a broader exception as the intervention would be limited to public interest research organisations.

Option 4 – Mandatory exception applicable to anybody who has lawful access (including both public interest research organisations and businesses) covering text and data mining for any scientific research purposes

This option is the same as Option 2, except that the exception would permit any user who has lawful access to carry out TDM for the purposes of both non-commercial and commercial scientific research. Unlike the other legislative options, the exception would be limited neither to non-commercial use (Option 2) nor to specific beneficiaries (Option 3). In practice, this intervention would cover TDM for scientific research beyond the public research area, notably when carried out by commercial operators such as life science companies.

The research community supports this option as it would fully meet their objective that anybody who has lawful access should be entitled to mine the content without additional authorisation or conditions. This option would be strongly opposed by rightholders. Publishers in particular take the view that such a broad exception would significantly interfere with the TDM licensing market in the commercial sector, mainly in the area of life science. Commercial companies carrying out scientific research have generally not raised problems with commercial TDM licences, nor have they generally requested the Commission to take action in this area.