A proposal for an archive revisiter

Thursday, November 8th, 2018

In his long list of statistical notes, Gwern includes a proposal for an archive revisiter:

One reason to take notes/clippings and leave comments in stimulating discussions is to later benefit by having references & citations at hand, and gradually build up an idea from disparate threads and make new connections between them. For this purpose, I make extensive excerpts from web pages & documents I read into my Evernote clippings (functioning as a commonplace book), and I comment constantly on Reddit, LessWrong, HN, etc. While expensive in time & effort, I often go back, months or years later, and search for a particular thing and expand & integrate it into another writing or expand it out to an entire essay of its own. (I also value highly not being in the situation where I believe something but I do not know why I believe it other than the conviction I read it somewhere, once.)

This sort of personal information management using simple personal information managers like Evernote works well enough when I have a clear memory of what the citation/factoid was, perhaps because it was so memorable, or when the citations or comments are in a nice cluster (perhaps because there was a key phrase in them or I kept going back & expanding a comment), but it loses out on key benefits to this procedure: serendipity and perspective.

As time passes, one may realize the importance of an odd tidbit or have utterly forgotten something or events considerably changed its meaning; in this case, you would benefit from revisiting & rereading that old bit & experiencing an aha! moment, but you don’t realize it. So one thing you could do is reread all your old clippings & comments, appraising them for reuse.

But how often? And it’s a pain to do so. And how do you keep track of which you’ve already read? One thing I do for my emails is semi-annually I (try to) read through my previous 6 months of email to see what might need to be followed up on or mined for inclusion in an article. (For example, an ignored request for data, or a discussion of darknet markets with a journalist I could excerpt into one of my DNM articles so I can point future journalists at that instead.) This is already difficult, and it would be even harder to expand. I have read through my LessWrong comment history… once. Years ago. It would be more difficult now. (And it would be impossible to read through my Reddit comments as the interface only goes back ~1000 comments.)

Simply re-reading periodically in big blocks may work but is suboptimal: there is no interface easily set up to reread them in small chunks over time, no constraints which avoid far too many reads, nor is there any way to remove individual items which you are certain need never be reviewed again. Reviewing is useful but can be an indefinite timesink. (My sent emails are not too hard to review in 6-month chunks, but my IRC logs are bad – 7,182,361 words in one channel alone – and my >38k Evernote clippings are worse; any lifestreaming will exacerbate the problem by orders of magnitude.) This is probably one reason that people who keep journals or diaries don’t reread. Nor can it be crowdsourced or done by simply ranking comments by public upvotes (in the case of Reddit/LW/HN comments), because the most popular comments are ones you likely remember well & have already used up, and the oddities & serendipities you are hoping for are likely unrecognizable to outsiders.

This suggests some sort of reviewing framework where one systematically reviews old items (sent emails, comments, IRC logs by oneself), putting in a constant amount of time regularly and using some sort of ever expanding interval between re-reads as an item becomes exhausted & ever more likely to not be helpful. Similar to the logarithmically-bounded number of backups required for indefinite survival of data (Sandberg & Armstrong 2012), Deconstructing Deathism – Answering Objections to Immortality, Mike Perry 2013 (note: this is an entirely different kind of problem than those considered in Freeman Dyson’s immortal intelligences in Infinite in All Directions, which are more fundamental), discusses something like what I have in mind in terms of an immortal agent trying to review its memories & maintain a sense of continuity, pointing out that if time is allocated correctly, it will not consume 100% of the agent’s time but can be set to consume some bounded fraction.

[...]

So you could imagine some sort of software along the lines of spaced repetition systems like Anki, Mnemosyne, or Supermemo which you spend, say, 10 minutes a day at, simply rereading a selection of old emails you sent, lines from IRC with n lines of surrounding context, Reddit & LW comments etc; with an appropriate backoff & time-curve, you would reread each item maybe 3 times in your lifetime (eg first after a delay of a month, then a year or two, then decades). Each item could come with a rating function where the user rates it as an important or odd-seeming or incomplete item and to be exposed again in a few years, or as totally irrelevant and not to be shown again – which, for many bits of idle chit-chat, mundane emails, or intemperate comments, is not an instant too soon! (More positively, anything already incorporated into an essay or otherwise reused likely doesn’t need to be resurfaced.)
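To make the "maybe 3 times in your lifetime" arithmetic concrete, here is a minimal sketch of an expanding schedule. The function name `review_dates`, the 30-day first delay, the growth factor of 15, and the 80-year horizon are all assumptions chosen for illustration; the proposal only says "a month, then a year or two, then decades".

```python
from datetime import date, timedelta


def review_dates(created, first_delay_days=30, growth=15, horizon_years=80):
    """Return the dates on which an item created on `created` resurfaces.

    Each gap between showings is `growth` times the previous gap.
    All default values are illustrative assumptions, not from the proposal.
    """
    horizon = created + timedelta(days=round(horizon_years * 365.25))
    dates = []
    delay = first_delay_days
    due = created
    while True:
        due = due + timedelta(days=delay)
        if due > horizon:
            break  # the next showing would fall outside the assumed lifetime
        dates.append(due)
        delay *= growth
    return dates


if __name__ == "__main__":
    for d in review_dates(date(2018, 11, 8)):
        print(d.isoformat())
```

With these assumed defaults the gaps are 30 days, 450 days (about 15 months), and 6,750 days (about 18.5 years); the fourth gap would be roughly 277 years, so an item is shown three times within the 80-year horizon.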

This wouldn’t be the same as a spaced repetition system which is designed to recall an item as many times as necessary, at the brink of forgetting, to ensure you memorize it; in this case, the forgetting curve & memorization are irrelevant and indeed, the priority here is to try to eliminate as many irrelevant or useless items as possible from showing up again so that the review doesn’t waste time.

More specifically, you could imagine an interface somewhat like Mutt which reads in a list of email files (my local POP email archives downloaded from Gmail with getmail4, filename IDs), chunks of IRC dialogue (a grep of my IRC logs producing lines written by me +- 10 lines for context, hashes for ID), LW/Reddit comments downloaded by either scraping or API via the BigQuery copy up to 2015, and stores IDs, review dates, and scores in a database. One would use it much like a SRS system, reading individual items for 10 or 20 minutes, and rating them, say, upvote (this could be useful someday, show me this ahead of schedule in the future) / downvote (push this far off into the future) / delete (never show again). Items would appear on an expanding schedule.
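The storage and rating loop described above could be sketched roughly as follows, using Python's standard `sqlite3` module. This is one possible shape under stated assumptions, not Gwern's design: the table layout, the function names (`open_db`, `add_item`, `due_items`, `rate`, `content_id`), the multipliers for upvote and downvote, and the 100-year cap are all hypothetical.

```python
import hashlib
import sqlite3
from datetime import date, timedelta

# Every constant below is an illustrative assumption.
FIRST_DELAY = 30        # days before an item is first shown
UPVOTE_FACTOR = 5       # upvote: interval grows slowly, so it returns sooner
DOWNVOTE_FACTOR = 45    # downvote: interval grows fast, pushed far off
MAX_INTERVAL = 36500    # cap at about 100 years to keep dates in range

SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    id       TEXT PRIMARY KEY,            -- filename or content hash
    body     TEXT NOT NULL,
    interval INTEGER NOT NULL,            -- days used for the last scheduling
    due      TEXT NOT NULL,               -- ISO date of the next showing
    score    INTEGER NOT NULL DEFAULT 0,
    deleted  INTEGER NOT NULL DEFAULT 0   -- 1 means never show again
)
"""


def content_id(text):
    """Hash an IRC chunk or comment to get a stable ID."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def add_item(db, item_id, body, today):
    """Register a new item; re-adding an existing ID is a no-op."""
    due = today + timedelta(days=FIRST_DELAY)
    db.execute(
        "INSERT OR IGNORE INTO items (id, body, interval, due) VALUES (?, ?, ?, ?)",
        (item_id, body, FIRST_DELAY, due.isoformat()),
    )


def due_items(db, today, limit=20):
    """Return up to `limit` (id, body) pairs that are due, oldest first."""
    rows = db.execute(
        "SELECT id, body FROM items WHERE deleted = 0 AND due <= ? "
        "ORDER BY due LIMIT ?",
        (today.isoformat(), limit),
    )
    return rows.fetchall()


def rate(db, item_id, verdict, today):
    """Apply 'upvote', 'downvote', or 'delete' and reschedule the item."""
    if verdict == "delete":
        db.execute("UPDATE items SET deleted = 1 WHERE id = ?", (item_id,))
        return
    factor, delta = {
        "upvote": (UPVOTE_FACTOR, 1),
        "downvote": (DOWNVOTE_FACTOR, -1),
    }[verdict]
    (interval,) = db.execute(
        "SELECT interval FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    new_interval = min(interval * factor, MAX_INTERVAL)
    due = today + timedelta(days=new_interval)
    db.execute(
        "UPDATE items SET interval = ?, due = ?, score = score + ? WHERE id = ?",
        (new_interval, due.isoformat(), delta, item_id),
    )
```

A daily session would call `due_items` for today's date, show each body, and pass the reader's verdict to `rate`. Keeping "delete" as a flag rather than removing the row means a later re-import of the same email or IRC chunk does not bring it back.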

[...]

As far as I know, some to-do/self-help systems have something like a periodic review of past stuff, and as I mentioned, spaced repetition systems do something somewhat similar to this idea of exponential revisits, but there’s nothing like this at the moment.

Posted in Education, Technology | 2 Comments »

Comments

Harry Jones says:

November 8, 2018 at 11:41 am

Evernote, zim, kjots and the like. I was well into writing my own before I discovered them. There should be a name for this sort of thing (no, they’re not PIMs, and note taking software doesn’t quite capture it.)

I suspect there are two problems: 1. only a minority of people can even see the value in such a thing and 2. one size does not fit all. Different minds work in different ways. Even after learning that this class of thingies existed I still develop my own because I’m always thinking of some feature I want to add or some way to rejigger the UI to make it more efficient for me.

It seems to me that if you’re in the market at all, you want to roll your own. It has to fit your own individual way of thinking. It’s an extension of one’s own mind.

Magus says:

November 9, 2018 at 12:15 am

Who needs AI when we have Gwern?


Isegoria
