Federated Learning: New Opportunities for Data Protection Compliance

Federated machine learning is a relatively new approach to machine learning in which an algorithm is trained on local nodes or devices. The algorithm and the data sets are stored only on the local devices, in a decentralized way. Federated learning enables devices to learn collaboratively while keeping the data local. Only the parameters of the locally trained models, not the underlying data (which is typically non-IID across devices), are exchanged with a global model.
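The exchange of parameters described above can be illustrated with a minimal sketch of federated averaging in plain Python. Everything in it is an assumption made for illustration: the one-dimensional linear model, the synthetic client data, and the function names `local_update`, `federated_average`, `train`, and `make_clients` are hypothetical and not taken from any particular framework. Each simulated client trains on its own data, and only the resulting weights are sent back to be averaged.

```python
def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps on one client's private data.

    Only the updated (w, b) pair is returned; `data` never leaves the client.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error for the model y = w * x + b.
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Average client parameters, weighted by each client's data set size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def train(clients, rounds=300):
    """Simulate the server loop: broadcast, local training, aggregation."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


def make_clients(points_per_client=20):
    """Build synthetic, non-IID client data: each client sees a different
    range of x values, but all points follow y = 3x + 1 (assumed for the demo)."""
    ranges = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
    clients = []
    for lo, hi in ranges:
        xs = [lo + (hi - lo) * i / (points_per_client - 1)
              for i in range(points_per_client)]
        clients.append([(x, 3.0 * x + 1.0) for x in xs])
    return clients


if __name__ == "__main__":
    w, b = train(make_clients())
    print(f"global model: y = {w:.3f} * x + {b:.3f}")
```

In this sketch the server only ever sees `(w, b)` pairs. A real deployment would add communication, client selection, and further protection of the updates themselves, none of which is shown here.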

Traditional machine learning techniques mostly rely on centralized data storage, where local devices upload their data to a single cloud. In many scenarios, this makes it difficult for machine learning to comply with increasingly strict data protection regulations, especially the GDPR. Based on the GDPR, the German data protection authorities set out seven requirements for AI applications in their Hambach Declaration on Artificial Intelligence (2019):

  1. AI may not turn people into objects
  2. AI may only be used for constitutionally legitimate purposes and may not override the purpose limitation principle
  3. AI must be transparent, comprehensible, and explainable
  4. AI must avoid discrimination
  5. Data minimization applies to AI
  6. AI needs accountability
  7. AI needs technical and organizational standards

The principle of data minimization is in tension with machine learning, which generally benefits from as much training data as possible. Federated learning eases this tension: personal data is stored and used for training only on local devices and is not exchanged with a centralized cloud.

There are many so-called privacy-preserving technologies (PPTs) to ensure data protection for users. These technologies involve data aggregation and advanced cryptography: data anonymization, differential privacy, secure multi-party computation (SMC), and homomorphic encryption.
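To give a feel for how such techniques can combine aggregation with cryptographic ideas, here is a heavily simplified sketch of additive masking, the basic idea behind secure aggregation, which is one form of secure multi-party computation. It is an illustration under stated assumptions, not a secure protocol: the names `pairwise_masks`, `mask_updates`, and `aggregate` are hypothetical, the updates are assumed to be already quantized to integers, and a single shared random generator stands in for the pairwise key agreement that a real protocol would require.

```python
import random

MODULUS = 2**32  # assumed: all values are integers handled modulo 2^32


def pairwise_masks(num_clients, rng):
    """Draw one random mask per client pair (i, j) with i < j.

    Client i adds the mask and client j subtracts it, so every mask
    cancels out once all masked values are summed.
    """
    masks = [0] * num_clients
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            m = rng.randrange(MODULUS)
            masks[i] = (masks[i] + m) % MODULUS
            masks[j] = (masks[j] - m) % MODULUS
    return masks


def mask_updates(updates, rng):
    """Return what each client would send: its update plus its net mask."""
    masks = pairwise_masks(len(updates), rng)
    return [(u + m) % MODULUS for u, m in zip(updates, masks)]


def aggregate(masked_updates):
    """Server side: sum the masked values; the masks cancel modulo MODULUS."""
    return sum(masked_updates) % MODULUS


if __name__ == "__main__":
    updates = [12, 7, 30]  # hypothetical quantized client updates
    masked = mask_updates(updates, random.Random(42))
    print("values seen by the server:", masked)
    print("recovered sum:", aggregate(masked))
```

The server learns the sum of the updates while each individual masked value looks random to it. Handling of clients that drop out, real key exchange, and protection against collusion are deliberately left out of this sketch.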

If you achieve legally compliant data anonymization, so that the data is no longer personal data, you do not need to comply with the GDPR, because the GDPR applies only to the processing of personal data. In practice, data anonymization is very hard to accomplish because most applications need an identifier.

Federated learning is a promising approach to mitigating the risk of data breaches because it keeps personal data on the user’s device. However, the technology is still at an early stage, and more research is needed.

ePrivacy is part of a major research project funded by the German Federal Ministry of Education and Research. The objective of the project is to develop a data custodianship scheme for federated artificial intelligence in medicine, in cooperation with the University of Hamburg and the Medical University of Greifswald. Because of strict data protection requirements, large amounts of medical data can currently be used only to a very limited extent, even though this highly sensitive data is essential for drug and therapy research. The project therefore addresses the processing of big medical data in compliance with data protection regulations in Germany. It is also part of the European data strategy.