New article: Making a Datomic system GDPR-compliant


Feedback welcome!


Nice work!
Technically, you do not need to “delete” data anyway; you can blacklist data, and that is an allowed solution as well.


What does blacklisting involve? Encryption with an ephemeral key?
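By "encryption with an ephemeral key" I mean something like crypto-shredding: encrypt each record under its own key, and implement erasure by destroying the key. A toy sketch (all names are made up; the XOR one-time pad here stands in for a real cipher, and a real system would use something like AES-GCM plus an external key-management service):

```python
import secrets

class CryptoShredStore:
    """Toy illustration of erasure via key destruction ("crypto-shredding").

    Each record is XOR-encrypted with its own random one-time key; deleting
    the key turns the stored bytes into unrecoverable noise, even though the
    ciphertext itself is never scrubbed from disk.
    """

    def __init__(self):
        self._keys = {}  # user_id -> per-record one-time key
        self._data = {}  # user_id -> ciphertext (kept even after erasure)

    def put(self, user_id, plaintext: bytes):
        key = secrets.token_bytes(len(plaintext))
        self._keys[user_id] = key
        self._data[user_id] = bytes(p ^ k for p, k in zip(plaintext, key))

    def get(self, user_id):
        key = self._keys.get(user_id)
        if key is None:
            return None  # key destroyed: the ciphertext alone reveals nothing
        return bytes(c ^ k for c, k in zip(self._data[user_id], key))

    def erase(self, user_id):
        # Erasure = destroying the key, not locating and wiping every copy
        # of the ciphertext bytes.
        self._keys.pop(user_id, None)
```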


Even more stupid: keep a list of data that is not supposed to be there anymore, and kill the data on retrieval. The point of the GDPR is not that someone’s email no longer exists as a sequence of bytes on your disks; the point is that you will not use it or display it to users once asked not to.
[or so our German consultants say. I’m not a lawyer. YMMV]
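For what it's worth, the "blacklist + kill on retrieval" idea above can be sketched in a few lines (all names here are hypothetical, and the `db` is just a dict standing in for a real store):

```python
# Ids whose personal data must no longer surface, per erasure requests.
erased_ids = {"user-42"}

def fetch_user(db: dict, user_id: str):
    """Return a user record, enforcing the erasure blacklist on every read."""
    if user_id in erased_ids:
        # "Kill on retrieval": physically delete the record now that we've
        # touched it, and behave as if it never existed.
        db.pop(user_id, None)
        return None
    return db.get(user_id)
```

The blacklist check has to sit in front of *every* read path for this to hold, which is exactly the fragility discussed further down the thread.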


Interesting… would love to see the rationale behind that. I’m guessing it is an assumption based on precedent. Reading through Article 17 of GDPR, it looks pretty clear at first read:

The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay

The question is: what does “erasure” mean? It certainly implies deletion from disk, but I’m sure there will be a bunch of cases that will help define this.

Initial Googling found this page, which traces the history of this “right to be forgotten” back to a 1995 directive where a:

data subject has the right under certain conditions to ask search engines to remove links with personal data

However, part of the ruling was that:

Deleting the search engine results linked to the data subject’s name does not mean that the content is deleted from its original publication location

Which makes sense when you’re talking about the responsibility of the search engine… maybe that’s being used as a precedent, though, to say that what matters is not whether the bytes exist on disk, but whether the person appears forgotten to any reasonable search?

I’m no lawyer, so I really have no idea, but that’s a huge difference in terms of cost of implementation. Will be interesting to see where this ends up.

Oh, and great post @vvvvalvalval! It’s great to see some discussion on implementation and not just high-level debate on what the lawyers have written :slight_smile:


It seems a number of people are interpreting this article as legal advice, so I added a disclaimer in the beginning: “this article is not legal advice; its goal is to give you options, not to tell you what you’re supposed to do.”

Having said that, I think the legal and ethical discussion around these issues is also worth having:

As someone who gets legal counselling about this (which may or may not be good), I’m very skeptical about these interpretations; that’s not how we read the GDPR at all here. The GDPR talks about user consent (the user should proactively consent to any processing of her personal data, and should be able to modify or withdraw that consent) and also talks about erasure, so presumably those are different things. “Not using/displaying data” is nothing more than abiding by consent; it’s not erasure.

I do agree that ‘erasing data’ means making it hard to access more than it means ‘wiping out any occurrence of this sequence of bytes from the universe’, but I’m pretty sure it means more than “flagging the data as not to be used”. I know of some companies that were audited by the CNIL in France for GDPR-related issues, and I can tell you their approach was much stricter.

I don’t want to indulge in fear-selling: again, one of the main points of the article is that data erasure with Datomic is not that hard to achieve.

I also think we need to put ourselves in the shoes of our users, and genuinely ask ourselves what it means to protect privacy. Even if you have flagged the data as ‘must not be processed / read’, what guarantee do you have that this flagging metadata won’t be left behind in a future refactoring or data migration? How do you know your successors will be as ethical as you are, and have the discipline to say no when the manager asks for an export of all emails in the database? I don’t think some metadata is an appropriate level of protection here. An appropriate level of protection might be you having to tell your manager: “this data has been erased for privacy-regulation reasons, and we can’t retrieve it with a database query; if we want to retrieve it we’ll have to go all the way to the datacenter hard drives and unreliably scan them for residual data, and by the way I’ve never done that, so it’s likely to take weeks”.


About the legal aspects, see also the comments on Reddit:


European Studies graduate here (anyway, not a lawyer…). It seems pretty clear that erasure must be an option, but making data completely anonymous and not retrievable in an atomic way (meaning we can’t get John Lee’s record as John Lee’s record, only in aggregate with all the other records) seems reasonable as well.

The issue is there are various levels of permissions: I may want you to retain my data (let’s say you’re a bank), but I may refuse my consent to use them to profile me or to draw inferences from my vector.

Fun fact: either the GDPR will be somewhat relaxed, or this is the end of blockchain. It’s too difficult to make it compliant under all aspects.

P.S.: please don’t trust regular lawyers; there are lawyers who specialize in European law and even in GDPR issues. Talk with them — the others are not prepared.