There is a thread on meta.discourse.org discussing this. It’s not easy because Google makes it hard, but someone seems to have built a somewhat working scraper with Selenium + the ability to import the data.
Something worth watching. At the top they cite an article about the Google products most likely to be killed off next, and it’s true that Groups seems to have been neglected a bit in recent years.
I previously thought about moving the Clojure Berlin group over here. It might be a good small test of this thing. We’ll see when I get to that, though.
I admit I haven’t been paying much attention recently, but is the NNTP interface to Google Groups no longer an option? I’d think that would be much simpler to fetch from than their mess-of-a-not-quite-SPA reader interface. Is there a specific group we want/need to import? For many groups I follow, secondary mirrors (such as Gmane) have done a decent job of grabbing much of what’s on Google.
Will definitely keep an eye on this, though…
Newsgroups and mailing lists are actually treated separately by Google. The user interface is identical, but the groups created within GG are not accessible through NNTP.
There’s http://www.gmane.org/, which can create an NNTP-accessible archive of any mailing list. If that was set up early in the group’s life, then that’s a good option. Alternatively, whoever created the group might have the whole history in IMAP somewhere.
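If the history does turn up in someone’s IMAP account, dumping it to an mbox file is not much work with Python’s standard library. Here is a rough sketch, not tested against any real server. The function names `fetch_folder` and `save_to_mbox` are made up for illustration, and the host, credentials, and folder name are placeholders you would have to fill in.

```python
import imaplib
import mailbox


def fetch_folder(host, user, password, folder):
    """Yield every message in an IMAP folder as raw bytes (read-only)."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True so the export does not mark anything as read
        conn.select(folder, readonly=True)
        _, data = conn.search(None, "ALL")
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            # parts[0] is an (envelope, raw message) pair
            yield parts[0][1]


def save_to_mbox(raw_messages, mbox_path):
    """Append raw RFC 822 messages (bytes) to an mbox file; return the count."""
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for raw in raw_messages:
            box.add(raw)
            count += 1
    finally:
        # close() flushes pending messages to disk
        box.close()
    return count
```

Usage would be something like `save_to_mbox(fetch_folder("imap.example.org", "me", "secret", "clojure-berlin"), "group.mbox")`, with all four arguments being hypothetical values. Keeping the fetch and the write separate means the mbox half can be fed from any other source of raw messages as well.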
Once you start digging, the whole story of Google Groups is so sad. Google bought up the few companies with a big historical Usenet archive. That data should have been on archive.org. You can’t properly browse those historical posts, only type stuff in a search box and hope you get lucky. The whole product hasn’t been meaningfully changed in years, making you wonder if it’s the next thing on the chopping block. It’s textbook embrace, extend, extinguish.