BIG DATA PROTECTION IN IP (By Apsi Adithya Kumar)

The interface between "Big Data" and IP matters both because of the impact of intellectual property (IP) rights in Big Data and because Big Data may itself infringe IP rights. This article looks at both sides of the coin, focusing on several IP rights, namely copyright, patent, data exclusivity and trade secret/confidential information.
The term "Big Data" can be defined in a number of ways, but four features are most often cited: volume, veracity, velocity and variety. Big Data corpora are often generated automatically, so the question of the quality or trustworthiness of the data (veracity) is crucial. If all four features are present, a Big Data corpus likely has significant "value".
In many cases, Big Data corpora are protected by secrecy, a form of protection that relies on trade secret law combined with technological protection against hacking and with contracts. A publicly available corpus, in contrast, must rely on erga omnes IP protection, if it deserves protection to begin with. Copyright protects collections of data; the sui generis database right (in the EU) might apply; and data exclusivity rights in clinical trial data may be relevant.
The human-written (AI) software used to collect (including search and social media apps), store and analyse Big Data corpora is considered a literary work eligible for copyright protection, subject to possible exclusions and limitations. The analysis that follows focuses on the harder question of the protection of the Big Data corpora themselves and of the outputs generated from the processing of such corpora.
A Committee of Experts meeting under the auspices of the World Intellectual Property Organization (WIPO), which administers the Berne Convention, concluded that the only mandatory requirement for a literary or artistic work to be protected by the Convention is that it must be “original”. Article 2 of the Convention, when discussing the protection of “collections”, states that “collections of literary or artistic works such as encyclopaedias and anthologies which, by reason of the selection and arrangement of their contents, constitute intellectual creations shall be protected as such, without prejudice to the copyright in each of the works forming part of such collections”. An encyclopedia therefore carries two layers of copyright. First, there is a right in each entry and in each illustration or photograph, which is typically either transferred or licensed to the maker or publisher of the collection. Second, there is an "organizational layer": a right in the collection itself, granted to its maker on the basis of the "selection or arrangement" of the individual entries, photographs and illustrations. The collection as a whole is generally treated as a collective work.
Big Data is sometimes defined in direct contrast to the notion of an SQL database reflected in the TRIPS Agreement. Big Data software is unlikely to “select or arrange” the data in a way that would meet the originality criterion and trigger copyright protection.
Data generated by AI-based text and data mining (TDM) systems that have initially high but fast-declining value, such as financial information relevant to stock market transactions, could be subject to copyright protection in some jurisdictions. Other possible bases of protection are the tort of misappropriation applicable to "hot news" in US law and the protection against parasitic behaviour available in a number of European systems.
The use of noSQL technologies may mean that Big Data corpora are not protected by the sui generis right in the Database Directive. The Directive refers to the database maker's investment in "obtaining, verification or presentation of the contents" and then provides a right "to prevent extraction and/or re-utilization" of that data. The Court of Justice of the European Union defined "investment" in obtaining the data as "resources used to seek out existing materials and collect them in the database", a definition that does not cover the resources used to create the materials which make up a database. The main argument for this distinction is that the Database Directive's economic rationale is to promote and reward investment in database production, not in generating new data.
An AI-capable TDM system might be used to enhance the use of patent information. The "patent bargain" is basically a fair disclosure of an invention in exchange for a limited monopoly on its use, especially on a commercial basis. AI applications in this field already go further, and the trajectory of their development leads to some potentially remarkable conclusions.
There is a concern that TDM tools might prevent the use of clinical trial data, which is seen as a negative development. This is because it is the collected clinical trials, and their ability to provide a large and comprehensive dataset, that make them valuable. It is not the specific health and safety outcomes proven by those data that are so valuable.
Patents may become more difficult to obtain due to massive Big Data-based AI disclosures of possibly new incremental innovations. Such a system could conceivably disclose new molecules and predict their efficacy. In such a case, it would be nearly impossible to patent the drug unless it were patented by the AI "inventor". The data exclusivity right might fill that void.
Trade secret law can also be applied to Big Data. Trade secret and confidential information law could be used to protect data acquired for purposes of TDM. Trade secret law typically works far better for business information than for private data. One might expect that default contracts will not adequately protect users or consumers, though privacy or consumer protection laws may impose limits on contractual freedom. The protection of confidential information could apply to "data coming from a machine-to-machine process", as well as to the use of such data by companies in the so-called "collaborative economy 3.0", where they share their Big Data with each other. Third parties may also enjoy welfare gains, since this regime, when applied to knowledge commons such as the Internet of Things (IoT), enables spillovers; its presence may therefore not necessarily be perceived as a bad thing.
Excessive restrictions on access, and lock-in effects created by major data-gathering entities, might have negative welfare impacts warranting governmental intervention in "data-driven platform markets characterized by strong network and lock-in effects - and in new technological contexts that might otherwise be ripe for competitive innovation".
In sum, the interfaces between Big Data and IP are about finding ways to adapt IP rights to allow and set proper parameters for the generation, processing and use of Big Data. This includes an analysis of how Big Data may infringe IP rights. There is also, however, the issue of rights in Big Data. Courts and legislators have years of questions to answer on both constraints on and protection of Big Data.
Big data is currently a hot topic in many fields, including management and marketing, scientific research, national security, government transparency, and open data. Both the public and private sectors are increasingly utilising big data analytics. This study aims to provide an overview of the issues as we see them and to contribute to the big data discussion.
In this field, technological capabilities and the range of possible applications are developing quickly, and there is ongoing debate regarding the consequences of big data. Our goal is to balance the many privacy hazards connected with big data against the benefits that big data provides to organisations, individuals, and society as a whole. We believe that adhering to essential data protection rules and measures will aid the long-term sustainability of big data's developing benefits. Those benefits cannot simply be traded against the right to privacy.
This creates a tension for intellectual property law, because there are continuing attempts to better protect authors' rights in the digital age, efforts that could be perceived as incompatible with the needs of big data.

Put simply, big data depends on the largely unrestricted use of data, whereas traditional IP protection aims to restrict such use. An IP attorney can help navigate the complexities of this situation, but at first glance the two pull in opposite directions.
In order to proceed with a big data project without violating the Copyright Act, the project manager should theoretically contact the individual authors represented in the dataset and obtain permission.
The article references several examples of big data and cites reports and other publications. Information is taken from publicly available sources and links are provided in the footnotes.
IP Protection and Big Data
Many large bodies of information are protected by secrecy, a form of protection that relies on trade secret law coupled with technical safeguards against hacking and with contracts. When working out which IP rights may apply, it is therefore important to distinguish between publicly available corpora and large undisclosed ones (such as the Google databases that power its search engine and advertising). Secret corpora are often protected from competitors in practice by secrecy alone: a competitor's remaining option is to build its own competing corpus to gain market share. Publicly available data, by contrast, must rely on intellectual property rights for its protection.
The proposed EU General Data Protection Regulation[1] contains a number of provisions that would have a bearing on the use of personal data in big data analytics: data minimisation and data anonymisation, with the burden of proof on data controllers; the need for transparency; a shift in the balance of power; data protection by design and by default; and the possible extension of data protection responsibilities to organisations outside the EU.

Personal data must be restricted to the minimum necessary with respect to the purpose of processing, and may be processed only if, and for as long as, that purpose cannot be achieved by processing information that does not contain personal data (Art. 5(c) EU GDPR).
Furthermore, the “right to be forgotten” (Article 17 EU GDPR)[3] means that the data subject may request the deletion of personal data if it is no longer necessary for the purpose for which it was collected or processed. A recent decision of the European Court of Justice under the current directive (which is implemented by the DPA) also supports this direction[4].
Under the proposed regulation, data controllers must not only have a “transparent and easily accessible policy” for the processing of personal data, but must also communicate with data subjects “in an understandable manner using clear and simple language appropriate to the data subject”.
Data controllers would have to put in place methods to ensure that only the bare minimum of personal data is utilised and that it is stored for no longer than is necessary for the processing. Big data is frequently described as a power dynamic that benefits corporations and governments. The Regulation implies a desire to change the power balance in favour of the individual by giving them more explicit rights over their personal data processing[5]. While the Commissioner supports the Regulation's protections in general, it is critical to ensure that the provisions are effective in reality, which requires more thought about what this degree of prescription would accomplish. The Regulation clearly aims to address some of the most pressing data protection concerns raised by big data analytics, but whether it will be implemented in its current form remains to be seen.
Challenges To Patent System Posed By Big Data


Although big data is not patentable in and of itself, the algorithms and software programs that process it may be covered by patent law. Furthermore, while big data content cannot generally be patented, it may be protected by a patent if the company seeking to assert its rights can articulate it as an innovation that is inherently unique, provides an economic advantage and can be put to industrial use[6]. Consider the case of a computer that was used to assess the qualifications of candidates for a vocational training programme[7]. The Court distinguished two types of computer use: the first is using a computer to carry out a scheme or plan in which the computer only acts as an intermediary, and the second is using a computer in a way that improves its functionality or solves a technical problem outside the computer's normal use. "Putting a business process or strategy into a computer is not patentable unless the computer performs the scheme or method in an inventive manner," the Court stated. Obtaining patents for computer-generated inventions can also be difficult, as output generated by autonomous artificial intelligence is not patentable. This challenges traditional notions of intellectual property rights (IPR).
Patent challenges
Document Authentication and Management
The issues examined under this group are the falsification of records, the structuring of smart contracts, and the handling of contracts in general.
The following firms have successfully handled this issue:
• IBM – Handling multiple types of contract templates stored in a blockchain, where the template opened is determined by the event type and the event records entered.
• Coinplug – Checking the validity of a bank's transaction records by comparing the initial and subsequent records issued for a client.
• Bank of America – Using a private blockchain framework to improve and streamline the validation of records moving between two different storage devices.
• Alibaba – Validation and verification of records between clients by at least one user checking the record and then transferring it to a central server, which distributes the validated record to all other blockchain clients.
Data Sharing and Consistency
This group relates to patent families that seek to verify data exchange across the system.
The following firms have successfully solved this issue:
• IBM – A method of securing a supplier's media material by storing it on a server and only transferring it to a media player application after verification.
• Coinplug – Registering on a local computer only the Merkle tree root values that represent the entire blockchain, rather than the complete blockchain.
• Bank of America – Using highly complex hashes for each shared data record to help the framework distinguish between shared and other data records.
• Alibaba – Choosing a consensus node for data sharing and consistency by means of a voting mechanism, to streamline the processing steps required to verify the blockchain.
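The Coinplug entry above turns on the fact that a Merkle root is a single short hash that stands in for a whole set of records. A minimal Python sketch of that idea follows. It is an illustration only, not the method claimed in any of the patents listed; the function names, the sample records, and the choice of SHA-256 with duplication of the last hash on odd-sized levels are all assumptions made for the example.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    """Return the raw 32-byte SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(records: List[bytes]) -> bytes:
    """Compute a Merkle root over records (hypothetical helper).

    Assumption: when a level has an odd number of hashes, the last
    hash is duplicated. Real systems differ on this detail.
    """
    if not records:
        raise ValueError("at least one record is required")
    # Leaf level: hash every record.
    level = [sha256(r) for r in records]
    # Repeatedly hash adjacent pairs until one hash remains.
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical sample records standing in for ledger entries.
records = [b"record-1", b"record-2", b"record-3"]
root = merkle_root(records)

# Changing any single record changes the root, which is why storing
# only the root is enough to detect tampering with the full set.
tampered_root = merkle_root([b"record-1", b"record-2", b"record-X"])
print(root.hex())
print(root != tampered_root)
```

The design point is that the root has a fixed size regardless of how many records it summarises, so a local machine can hold the root alone and still check the integrity of data it later receives.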
Security and Secrecy
This group relates to patent families that aim to address challenges of information and transaction protection and encryption.
• IBM – A method of securing a provider's media material by storing it on a server and only delivering it to a media player application after authentication.
• Coinplug – An authentication framework based on a blockchain-based electronic wallet.
• Bank of America – A framework that keeps track of asset accessibility and converts non-secure instruments into secure instruments that require customer and mark approval before access is granted.
• Alibaba – Using a blockchain platform to verify transaction requests before transferring assets to the client.
Transactions in General
This group includes arrangements for information exchanges and trade, as well as tracking and determining the types of transfers.
• IBM – Using blockchain to identify parties with outstanding transactions.
• Coinplug – Using blockchain to track transactions between parties without requiring recognition of a public address or the use of QR codes.
• Bank of America – Using a blockchain framework to allow customers to migrate data from one bank to the next without the need for an aggregator.
• Alibaba – A blockchain smart contract framework that determines the optimum contract template to use based on the business transactions required.

Text Mining And Copyright

In text and data mining, enormous amounts of copyrighted material are frequently copied. To 'mine' books and other content, researchers must utilise computer programmes to access, copy, and process them. Even if researchers have legal access to and can read the content, such as through their university library, copying a significant percentage of those works may be illegal.
Copyright, on the other hand, was never meant to restrict the use of a work's ideas, facts, or information. In a recent case involving internet browsing, the UK Supreme Court reaffirmed this principle, saying, "Broadly speaking, producing or distributing copies or adaptations of a protected work is an infringement. Simply looking at or reading it is not an infringement."[8] Text and data mining might be considered a technology that merely substitutes for human sight and reading. As a result, copying in the context of a text mining process could be seen as a byproduct of the technology's operation rather than an activity aimed at exploitation of copyright-protected content.
In this regard, copyright owners (publishers) have traditionally been ready to allow academics to 'mine' works in their catalogues, particularly if the research might result in mutually advantageous outcomes, such as the development of software tools that increase the value of their catalogues. Readers and researchers are partners of copyright owners rather than competitors.

Exception For Text And Data Analysis

Copyright laws in the United Kingdom allow academics to make copies of works "for text and data analysis."
This means that if a person has legal access to a work, they can create a duplicate of it in order to do a computational analysis of the information contained within it.
The exception is subject to the following conditions:
1) The computational analysis must be for the purpose of non-commercial research.
2) The copy is accompanied by an appropriate acknowledgement (unless this is practically impossible)
Under these conditions, copyright is also infringed if a copy is transferred to a third party or used for a purpose other than that permitted by the exception (although the researcher could ask the owner for permission to do either of these things).
Furthermore, copies made for text and data analysis may not be sold or let for hire.
The regulations also provide that the acts covered by the exception cannot be contracted out of: contractual terms that purport to limit or prevent the performance of the acts authorised by the exception are unenforceable. The exception applies to all sorts of copyright works, as well as recordings of performances, even though text and data analysis is primarily concerned with mining literary works.
Policymakers considering enacting an explicit TDM exception or limitation should consider the following questions:
1) Does the exception apply to only one right (reproduction) or to all rights (including adaptation/derivation)?
2) Are contractual overrides possible?
3) Must the content come from a lawful source?
4) What kind of dissemination of the data, if any, is possible?
5) Must the TDM be for a non-commercial purpose?
On the first question, if allowing TDM is considered a normatively valid goal, the right holder should no longer be able to block it by using one right fragment from the bundle of copyright rights. Irini Stamatoudi concluded from an analysis of the rights involved that right fragments beyond reproduction and adaptation were far less applicable[9]. Nonetheless, it seems safer to phrase the exception or limitation as a non-infringing use, as in section 107 of the US Copyright Act (fair use).[10]
Second, for the same reason, contractual overrides should be prohibited. Unless there was only one TDM provider for a given type of work, it is difficult to see how they might be effective. Even if a clause barring contractual overrides is not mentioned in the text of the statute, contract law principles may render the restriction invalid.[11]
On the surface, the lawful source requirement in French law appears compelling. It is difficult to argue against requiring the data's source to be legitimate. However, putting it into practice poses certain difficulties.
To begin with, a human user may not always be able to determine whether or not a source is legal; the issue may be even more unclear for a machine.
Second, determining the legality of a foreign source may entail an examination of the law of the country of origin, because copyright infringement is determined using the lex loci delicti, which necessitates first determining the source's origin. Perhaps a requirement focusing on sources that the user knew, or would have been grossly negligent in not knowing, were illegal might be more appropriate.[12]
The final two questions are a little more challenging. It may be important to communicate the information to others who are interested in the project if the data comprises copyrighted content. German legislation exempts a "restricted circle of people for cooperative scientific research," as well as "third parties for the purpose of checking the quality of scientific research." This reflects the rationale of a scientific exception: project-based work by a small group of scientists, subject to checking by peer reviewers. Such a rule would not permit the use of TDM to scan libraries of books and make snippets available to the public, as Google Books does.[13]
We understand that big data analytics can assist society in a variety of ways, including scientific and medical research. These advantages come on top of better products and services for consumers and business advantages. Nonetheless, these benefits should not come at the expense of an unjustifiable violation of privacy. Data protection principles should not be regarded as a stumbling block to progress, but rather as a foundation for promoting privacy rights and encouraging the development of innovative ways to inform and involve the public. Transparency regarding the aim and impact of analytics is not just required by law; it may also boost people's confidence as "digital citizens" in the age of big data.
To summarise, the interfaces between Big Data and IP are all about finding ways to adapt IP rights to allow and set proper parameters for the generation, processing, and use of Big Data. This includes a look at how Big Data might infringe intellectual property rights. There is also, however, the issue of rights in Big Data. Courts and legislators have years of questions to answer on both the constraints on and the protection of Big Data.