Ideas for Clojurians-log

Hi Arne, I couldn’t dredge up your email, so I’m reaching you here. I have been toying with the idea of building a discovery dashboard for the Clojurians database in hyperfiddle. There are a lot of amazing conversations that are just buried and rotting in Slack, and you’ve already built a starting point for solving this problem. I loaded up your sample data and did a quick first pass at it here: http://tank.hyperfiddle.net/:clojurians/ or, for maybe a better example, http://tank.hyperfiddle.net/:clojurians!user-profile/‘stuarthalloway’ . One interesting thing is that anybody can create and share interesting queries backed by the database you’ve created.

I am wondering, first, do you have any future plans for the Clojurians log? And second, is opening the dataset to public query something you’d be interested in collaborating on? Also, what does the community think?

That’s pretty cool stuff @dustingetz! I previously discussed opening up the dataset with Arne and while this would generally be cool there are some concerns around privacy and giving too structured access to this data.

On one hand a tool as flexible as hyperfiddle makes this concern even more real, on the other hand HTML is pretty structured too and if people really wanted to mine this data they totally could.

/cc @seancorfield who’s a Clojurians admin as well and might have his own thoughts on the topic.

Hi @dustingetz, since you found the demo data I suppose you also read the section titled “Why don’t you just make the raw logs public?”

While it is generally known that much of Clojurians is logged, it is not something we explicitly have people’s consent for, so from a privacy perspective we are on thin ice, which makes me wary of opening up the data set to novel uses.

The example you provide where you are able to search for all posts by a specific user is also exactly the kind of thing that would enable abuse.

Before considering a use case like this I’d like to see, at a minimum:

  • broad consultation with the community, especially from marginalized groups who are often the most vulnerable to online abuse
  • features added to the clojurians-log app so people can manage their own data, so they can do things like deleting their complete history, deleting a certain time range, or deleting individual posts
  • channel maintainers should also get an easier way to opt out of logging their channel, and to delete their channel’s history

Yes, there are definitely privacy issues around the ability to “mine” conversation data. Even tho’ Clojurians is technically a “public” space, it is actually walled off from true public browsing – and the ability to publicly browse logs of all (most) channels is already a bit of a concern for some people.

I agree with @plexus that we definitely want people to be able to control how much of their conversations are searchable in this manner before we open up a search like this. GDPR considerations alone would likely mean that the Clojurians log system should provide a way to “forget my data” at a minimum.

@plexus, @seancorfield, what do you suggest a possible next step is?

One thing I have considered is replicating only #datomic as an experiment, and blasting it all over social media, to see if it becomes a thing people care about.

The data security concerns, if the community actually voices them, could perhaps be mitigated by technology and then resolved without further thought: for example, by letting people associate themselves with their content by email address and remove it if it is theirs. We don’t have to expose the d/history for query (it currently isn’t exposed).

My absolute preference would be not to open this can of worms. I was hoping not to have to spell this out, but clojurians-log is living on borrowed time. We are breaking the law, and as soon as someone starts complaining about that I will pull the plug.

Right To Be Forgotten is one aspect of the GDPR, but there’s also explicit consent, right to data portability, the need of a privacy policy, …

That’s not to mention the role of Slack itself. I’ve gone through Slack’s ToS before and couldn’t immediately find any terms that we’re breaking, but I wouldn’t bet on it.

So far I have kept clojurians-log up because no one has complained and it’s a useful resource for the community, although I have processed a few requests to delete things by hand, which also isn’t scalable.

If you really want to make this data available for more uses, then start by helping to get clojurians-log out of these murky legal waters.

  • Get some response from Slack that they won’t force us to pull the plug
  • implement a mechanism for people to give consent to storing and displaying their data
  • ask everyone we have data on to provide consent, and hide data of everyone who hasn’t opted in
  • implement a mechanism for a person to delete their data
  • implement a mechanism for a person to download their data

These last few could all be done through private messages with the log bot, for instance. That would provide an easy way to authenticate user requests as well.

Sorry to be the party pooper here, but I actually think this could be a great opportunity for us to do the right thing and clear up any ambiguity about the state of this data, not to mention some fun technical challenges to solve.

I don’t really see the point of collecting the data and then not opening it up. And one of the reasons people still hang around Slack is the reassurance that tons of advice will be preserved in the open data set we know as clojurians-log. If clojurians-log is living on borrowed time, I think a lot more Slack users will be much more motivated to find a more open, resilient history of Clojure advice.

Just to clarify, I totally sympathize with the position you’re in. It’d probably be a good idea to sort out the legalities of preserving it, especially the beginners channel. As an aside, if ClojureVerse had a decent chat interface with explicitly preserved history, I bet most of Slack would migrate here.

This is not an answer to everything you raised, but note that Slack itself appears not to remove actual message content upon request, just real name and address. Given the importance of the definition of “personal data” to GDPR, I wonder whether it is so cut and dried.

The main point here is that once we allow full access to the dataset, the genie is out of the bottle. There is no more “delete” possible because the internet does not forget. We will no longer be able to track everyone with a copy of the data and ask them to delete parts of it. At the moment we can still do that, which means that if we get a request to rectify the situation we can.

If people are serious about enabling new uses, then start by writing a Slack bot that lets a person register their consent.

person> hi @logbot
logbot> Hi @person, I understand these commands: `help`, `consent`, `delete`.
person> consent
logbot> Do you agree to make your full past and future chat history available publicly in a machine readable format? [yes/no]
person> Yes
logbot> Thank you, your preference has been recorded!

This can be a standalone thing, there are plenty of slack bot tutorials out there. Record this information so the log app can use it. From there we can iterate and improve.
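As a rough sketch of the conversation logic above, here is what the bot's state handling could look like in Python. It deliberately leaves out the Slack connection itself; the class name `ConsentBot`, the in-memory `consent` dict, and the `handle` method are all made up for illustration, and a real bot would persist the answers somewhere the log app can read them. The `delete` command is only advertised here, not implemented.

```python
from dataclasses import dataclass, field

# Question text copied from the example dialog above.
CONSENT_QUESTION = (
    "Do you agree to make your full past and future chat history "
    "available publicly in a machine readable format? [yes/no]"
)


@dataclass
class ConsentBot:
    # user id -> True/False (hypothetical in-memory store)
    consent: dict = field(default_factory=dict)
    # users who were asked the question and have not answered yet
    pending: set = field(default_factory=set)

    def handle(self, user: str, text: str) -> str:
        """Return the bot's reply to one private message from `user`."""
        command = text.strip().lower()
        if user in self.pending:
            if command in ("yes", "no"):
                self.pending.discard(user)
                self.consent[user] = command == "yes"
                return "Thank you, your preference has been recorded!"
            return "Please answer yes or no."
        if command == "consent":
            self.pending.add(user)
            return CONSENT_QUESTION
        # Anything else gets the greeting that lists the commands.
        return (
            f"Hi @{user}, I understand these commands: "
            "`help`, `consent`, `delete`."
        )


bot = ConsentBot()
bot.handle("person", "consent")
bot.handle("person", "Yes")
print(bot.consent)  # {'person': True}
```

Because the messages arrive as private messages from an authenticated Slack user, the `user` argument doubles as the proof of who is giving consent.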

I think that there is a problem here: either you hold and process data you shouldn’t (would you be able to delete all my messages from the logs if I asked?), or people, by joining the service, have already given their consent to making that data public.

Asking for consent might either be redundant or harmful (are you going to delete all the messages of those not responding? What if someone answers late and gets angry 'cause someone deleted their messages from the logs?)

Point is: Slack is not great for these things, exactly for this reason. It was born for internal usage in companies where people already sign mountains of paper about data sharing, ownership, etcetera.

It would probably be worth at least considering moving Clojurians somewhere else.

P.S.: removing names and avatars from conversations, and not keeping an ID that would let you reverse it, means making the data anonymous, which makes it possible to release all messages publicly without any issue.

We would not delete anything unless asked for it by the data owner (the user who posted it). We would only distribute machine-readable logs of people who gave consent. This is not perfect, but it would be a huge improvement. In general when confronted with a GDPR violation you are given time to rectify the situation before getting fined. This approach would ensure we are still able to do so.
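To make the rule concrete, a minimal sketch in Python of an export step that only lets through messages from people who opted in. The message shape (`user`, `text` keys) and the `consent` mapping are assumptions for illustration, not the actual clojurians-log schema.

```python
def exportable_messages(messages, consent):
    """Keep only messages whose author has explicitly opted in.

    `consent` maps a user id to True/False. Users who never answered
    are missing from the mapping and are treated as not consenting.
    """
    return [m for m in messages if consent.get(m["user"]) is True]


# Hypothetical sample data.
messages = [
    {"user": "alice", "text": "How do I use transducers?"},
    {"user": "bob", "text": "Try (into [] xform coll)."},
    {"user": "carol", "text": "Thanks!"},
]
consent = {"alice": True, "bob": False}  # carol never answered

for m in exportable_messages(messages, consent):
    print(m["user"], m["text"])
```

Nothing is deleted here: the full log stays where it is, and only the machine-readable export is filtered, so a later "yes" or "no" can still be honored.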

I’m not going to argue with that. About a dozen alternatives have been discussed and launched. Go through the history of the “community-development” channel to find some of that history or google for “clojurians slackpocalypse”. Apparently convincing 6000+ people to pack up and come along with you isn’t that easy.

I run ClojureVerse because I believe in communities owned by the community. I run clojurians-log as a stopgap solution because people are creatures of habit and will continue to treat Slack as the only game in town.

This is simplistic and ill-informed. Effectively anonymizing data is an unsolved problem.

This is what people specialized in the matter say; we like to make things more complex than they need to be. I agree that there are methods to work out who someone is without names and such, but the Commission and the EU Court (here is a source: https://www.pdpjournals.com/docs/88197.pdf) are fine with it as long as that identity can’t be rebuilt.

If messages are all anonymous, without any ID that makes it possible to group them, then even if I can deduce some info from a message, I would have to enrich some “user” profile data with this info. That won’t be possible, because I don’t have profiles anymore and I don’t even know whether that user has written other messages, or which ones they are.
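A small Python sketch of the kind of stripping described here, under assumed field names (`user`, `avatar`, `ts`, `channel`, `text`): every message keeps only its channel and text, and no identifier survives that could group messages by author. It does nothing about names or personal details that appear inside the message text itself, so it illustrates the proposal rather than settling whether that counts as anonymous.

```python
def strip_identity(messages):
    """Return copies of the messages with only channel and text kept.

    User ids, avatars and timestamps are dropped, and no replacement
    pseudonym is generated, so messages cannot be grouped by author.
    """
    return [{"channel": m["channel"], "text": m["text"]} for m in messages]


# Hypothetical sample data.
raw = [
    {"user": "U123", "avatar": "a.png", "ts": "1528900000.000100",
     "channel": "datomic", "text": "Is d/history exposed?"},
    {"user": "U123", "avatar": "a.png", "ts": "1528900050.000200",
     "channel": "datomic", "text": "Never mind, found it."},
]

for m in strip_identity(raw):
    print(m)
```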

We shouldn’t be so afraid of the GDPR. There is a lot of misinformation around (often because of ill-informed lawyers who understand neither EU law nor what processing data means).

Anyway, I perfectly understand what the issues are and I agree that it isn’t easy to deal with a large community of people.

Knowledge rot can be solved many ways. Opening the dataset was just one idea, but there is a continuum of choices that fit the privacy spectrum in different ways. Here is another idea: Tweetstorms, but for Slack. You use the :zap: reaction to flag a thread, which turns it approximately into http://www.dustingetz.com/:datomic-ion-launch-day-questions/ , with an admin panel to handle the privacy workflow as well as revise/flag/disavow. Something like this would save a lot of people a lot of time, and especially help out beginners in a huge way.

Not to minimize the complexities or risk, but just to offer some more information on this topic: Stack Exchange’s position is that they need only delete user account information (not contributed content), possibly with the ability to redact specific personal data in an answer or question.

I’m not actually arguing any position here, for what it is worth. I just think it is interesting.

I agree, both the GDPR and the EU DPO always talk about “reason”; ergo, since today it is nearly impossible to identify someone by their text, that’s it. It’s not like we have geolocated data, that would be an issue :slight_smile:

I would like to stress the fact that I’m not arguing anything either, just discussing and contributing. I like @dustingetz’s idea as well, but I wonder what happens if the thread isn’t a thread but is just in the stream?

I don’t know, but it is software we are talking about, so we will solve it!
