A proposal for an archive revisiter

Thursday, November 8th, 2018

In his long list of statistical notes, Gwern includes a proposal for an archive revisiter:

One reason to take notes/clippings and leave comments in stimulating discussions is to later benefit by having references & citations at hand, and gradually build up an idea from disparate threads and make new connections between them. For this purpose, I make extensive excerpts from web pages & documents I read into my Evernote clippings (functioning as a commonplace book), and I comment constantly on Reddit, LessWrong, HN, etc. While expensive in time & effort, I often go back, months or years later, and search for a particular thing and expand & integrate it into another writing or expand it out to an entire essay of its own. (I also value highly not being in the situation where I believe something but I do not know why I believe it other than the conviction I read it somewhere, once.)

This sort of personal information management using simple personal information managers like Evernote works well enough when I have a clear memory of what the citation/factoid was, perhaps because it was so memorable, or when the citations or comments are in a nice cluster (perhaps because there was a key phrase in them or I kept going back & expanding a comment), but it loses out on key benefits to this procedure: serendipity and perspective.

As time passes, one may realize the importance of an odd tidbit or have utterly forgotten something or events considerably changed its meaning; in this case, you would benefit from revisiting & rereading that old bit & experiencing an aha! moment, but you don’t realize it. So one thing you could do is reread all your old clippings & comments, appraising them for reuse.

But how often? And it’s a pain to do so. And how do you keep track of which you’ve already read? One thing I do for my emails is semi-annually I (try to) read through my previous 6 months of email to see what might need to be followed up on10 or mined for inclusion in an article. (For example, an ignored request for data, or a discussion of darknet markets with a journalist I could excerpt into one of my DNM articles so I can point future journalists at that instead.) This is already difficult, and it would be even harder to expand. I have read through my LessWrong comment history… once. Years ago. It would be more difficult now. (And it would be impossible to read through my Reddit comments as the interface only goes back ~1000 comments.)

Simply re-reading periodically in big blocks may work but is suboptimal: there is no interface easily set up to reread them in small chunks over time, no constraints which avoid far too many reads, nor is there any way to remove individual items which you are certain need never be reviewed again. Reviewing is useful but can be an indefinite timesink. (My sent emails are not too hard to review in 6-month chunks, but my IRC logs are bad – 7,182,361 words in one channel alone – and my >38k Evernote clippings are worse; any lifestreaming will exacerbate the problem by orders of magnitude.) This is probably one reason that people who keep journals or diaries don’t reread Nor can it be crowdsourced or done by simply ranking comments by public upvotes (in the case of Reddit/LW/HN comments), because the most popular comments are ones you likely remember well & have already used up, and the oddities & serendipities you are hoping for are likely unrecognizable to outsiders.

This suggests some sort of reviewing framework where one systematically reviews old items (sent emails, comments, IRC logs by oneself), putting in a constant amount of time regularly and using some sort of ever expanding interval between re-reads as an item becomes exhausted & ever more likely to not be helpful. Similar to the logarithmically-bounded number of backups required for indefinite survival of data (Sandberg & Armstrong 2012), Deconstructing Deathism – Answering Objections to Immortality, Mike Perry 2013 (note: this is an entirely different kind of problem than those considered in Freeman Dyson’s immortal intelligences in Infinite in All Directions, which are more fundamental), discusses something like what I have in mind in terms of an immortal agent trying to review its memories & maintain a sense of continuity, pointing out that if time is allocated correctly, it will not consume 100% of the agent’s time but can be set to consume some bounded fraction.


So you could imagine some sort of software along the lines of spaced repetition systems like Anki, Mnemosyne, or Supermemo which you spend, say, 10 minutes a day at, simply rereading a selection of old emails you sent, lines from IRC with n lines of surrounding context, Reddit & LW comments etc; with an appropriate backoff & time-curve, you would reread each item maybe 3 times in your lifetime (eg first after a delay of a month, then a year or two, then decades). Each item could come with a rating function where the user rates it as an important or odd-seeming or incomplete item and to be exposed again in a few years, or as totally irrelevant and not to be shown again – as for many bits of idle chit-chat, mundane emails, or intemperate comments is not an instant too soon! (More positively, anything already incorporated into an essay or otherwise reused likely doesn’t need to be resurfaced.)

This wouldn’t be the same as a spaced repetition system which is designed to recall an item as many times as necessary, at the brink of forgetting, to ensure you memorize it; in this case, the forgetting curve & memorization are irrelevant and indeed, the priority here is to try to eliminate as many irrelevant or useless items as possible from showing up again so that the review doesn’t waste time.

More specifically, you could imagine an interface somewhat like Mutt which reads in a list of email files (my local POP email archives downloaded from Gmail with getmail4, filename IDs), chunks of IRC dialogue (a grep of my IRC logs producing lines written by me +- 10 lines for context, hashes for ID), LW/Reddit comments downloaded by either scraping or API via the BigQuery copy up to 2015, and stores IDs, review dates, and scores in a database. One would use it much like a SRS system, reading individual items for 10 or 20 minutes, and rating them, say, upvote (this could be useful someday, show me this ahead of schedule in the future) / downvote (push this far off into the future) / delete (never show again). Items would appear on an expanding schedule.


As far as I know, some to-do/self-help systems have something like a periodic review of past stuff, and as I mentioned, spaced repetition systems do something somewhat similar to this idea of exponential revisits, but there’s nothing like this at the moment.


  1. Harry Jones says:

    Evernote, zim, kjots and the like. I was well into writing my own before I discovered them. There should be a name for this sort of thing (no, they’re not PIMs, and note taking software doesn’t quite capture it.)

    I suspect there are two problems: 1. only a minority of people can even see the value in such a thing and 2. one size does not fit all. Different minds work in different ways. Even after learning that this class of thingies existed I still develop my own because I’m always thinking of some feature I want to add or some way to rejigger the UI to make it more efficient for me.

    It seems to me that if you’re in the market at all, you want to roll your own. It has to fit your own individual way of thinking. It’s a extension of one’s own mind.

  2. Magus says:

    Who needs AI when we have Gwern?

Leave a Reply