On a new interface for writing

September 14, 2024

We need a better AI interface for writers and essayists.

I want to share some ideas I've had about how we can build better AI systems for writing and knowledge management. These stem from my experience writing in Obsidian and using Claude and Perplexity as both intuition and refinement engines.

[!important] The text you see here is from LM Sacasas's "What do Human Beings Need". I have used it here for illustration purposes only. I do not own it and did not contribute to it. The illustrations here do not show how to improve the article; they provide examples of how an interface could be built. I really admire Michael's writing and urge you to read his original article, linked above.

Chat is not enough

Chat as an interface is insufficient for writing, journalling and thinking more broadly. I believe it is an intermediate interface until we have more involved and capable models.

Some of the limitations I've found are:

  1. Conclusivity - chat replicates conversations. Speech is ephemeral and final. You have a thought, you voice the thought, someone hears it and they render their glimpse of your thought. The chat interface mimics this well but this cannot cross over into writing. Writing, journalling, etc. is fundamentally iterative. It's not decisive or be-all-end-all. You have to write, rewrite, read, reread, edit, delete, copy, search and do it over and over again. Chat is poorly suited to keep up with this.

  2. Poor spatial integration - This I mean with respect to the interface. Different parts of your article have different considerations. They require a different touch. Chat treats them all the same. Furthermore, chat is entirely limited to the div that it resides in. Unless you have some way of transferring knowledge/insights from the chat to the article without having to Ctrl+C / Ctrl+V, it does not offer much. Sage from buildspace had a neat way of doing this by asking the model to change some information about your profile. However, it is still limited to stuff you know you want to fix.

  3. Context is prompt inhibited - I got this idea from Linus Lee's talk on Generative Interfaces. Essentially when you converse with a colleague or work together on something, there is an implicit, underlying context between the two of you. Not just the environmental and situational context. But also your history together, previous conversations, previous ideas, etc. Even with a stranger, there is a collective frame of reference which is an avenue for fruitful conversation (btw Noam Chomsky thought about these things way before we did). But with chat, all that context is lost. We need to manually add that in there. This is problematic since we are poor information compressors.

Ideas for a better interface

Through writing in Obsidian, ideas about what it takes to be a supportive writing interface have marinated and bubbled up into the following list. It's going to come as no surprise that we're in a "scanner" era for AI interfaces. I borrowed this term from Balaji Srinivasan, who used it on Tim Ferriss's podcast:

Balaji Srinivasan: So you can think of a piece of paper, and then you can think of a scanner and then you can think of a text file. Right. So you go from physical to this hybrid, like physical to digital scanner and then digital native. Another example would be you’ve got a face-to-face meeting, you’ve got Zoom, and then you’ve got virtual reality. Zoom is also like a scanner. It’s taking the offline and putting it online. But then VR is like digital native. With me?

Tim Ferriss: I am.

Balaji Srinivasan: Third example, physical cash, then something like PayPal or fintech, which is just basically the scanner, taking that and putting it online. And then you have crypto, which swaps out the backend that is just now natively digital, it doesn’t have a physical form.

So the quest now is to build AI-native interfaces for writing and other creative endeavours. Some of the ideas here are windows into how we can build these interfaces. This is by no means a comprehensive list as many of the points haven't fully crystallized.

Connecting to infinity

In a previous essay, On Accessing Infinities, I stressed the importance of connecting your local idea lake with a wider knowledge domain (Google, Wikipedia, etc.). However, to be effective and valuable, this connection has to be terse and didactic.

In the context of writing with AI, I've tried to illustrate my point below.

With this mechanism, I believe that one's ability to research or even find relevant material improves. The downside I see is that you can more easily get tunnel-visioned. If the model only suggests articles in line with what you start with, you're going to go down that path. There should be some contrarian parameter that we can set for the model (just like temperature) that makes it attempt to contradict what you've written.

This would not be that hard to implement right now with Exa.ai, ChatGPT and an Obsidian extension. Perhaps a project idea.
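To make the contrarian parameter a little more concrete, here is a minimal sketch of how it could re-rank suggested articles. It assumes each candidate already carries a relevance score and an agreement score; how those are produced (a search tool, a model call) is left open, and every name in the block is hypothetical rather than part of Exa.ai, ChatGPT or Obsidian.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    relevance: float  # assumed: 0..1, how on-topic the article is
    agreement: float  # assumed: -1..1, +1 agrees with the draft, -1 contradicts it


def rank(candidates, contrarian=0.0):
    """Sort candidates; contrarian=0 favours agreeing sources, 1 favours opposing ones."""
    if not 0.0 <= contrarian <= 1.0:
        raise ValueError("contrarian must be between 0 and 1")
    # Map contrarian in [0, 1] onto a stance weight in [+1, -1].
    stance_weight = 1.0 - 2.0 * contrarian

    def score(c):
        # Relevance always matters; stance only boosts or dampens it.
        return c.relevance * (1.0 + stance_weight * c.agreement)

    return sorted(candidates, key=score, reverse=True)
```

At `contrarian=0.5` the stance term cancels out and the ranking falls back to relevance alone, which is a reasonable neutral setting for the knob.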

Smart friend? Keen mentor? Witty opponent? All of the above?

I really want the writing experience with an AI model to be like having a really really, REALLY smart friend by your side and just occasionally pointing stuff out like "Hey, this could be better?". Like Grammarly but instead of staying true to grammatical rules, it helps you stay true to your voice.

Obviously, this would work better the more you write. The system would develop an understanding of your diction, your style, and your interests. Your taste, essentially. And I envision one that can help you refine it.

Couple of things to note here:

  1. These models/agents would have to coinhabit your workspace. This means that they would need to have context on your previous writing, what you liked reading, what you didn't like reading, etc. This would be difficult to pin down let alone implement. I believe we could start by taking the article as context and when you're working on something
  2. Maggie Appleton does an excellent job of fleshing this idea out in her LLM Sketchbook article. She calls them daemons (honestly a better name than what I came up with: characters. Too techbro-coded?). Essentially, you can have daemons that manifest different personas (like Devil's Advocate, Synthesizer, etc.), work in the background on your article, and point out things you haven't thought about or should think about. Check out this video
  3. Once again, I do believe there should be some contrarian parameter. Maybe this could be a daemon. You don't want a parrot for a friend, do you? (only a stochastic parrot it would seem). But, there has to be some means for the system to reflect and digress from your argument just like a friend/mentor would. LLMs would need reasoning for that so there has to be some other stand-in mechanism for this behaviour.

Random Walks and Reader-generated Essays

In his essay Reader-generated Essays, Henrik Karlsson posits that an essay is a walk through a network of ideas. Each idea is itself a user-written bullet list that captures its kernel. These ideas are connected via hyperlinks (Obsidian-style).

Most people could benefit from writing down their thoughts in nested bullet points, instead of in sprawling paragraphs, so they can graphically see the relationship between arguments and discover if they are stuck in a subpoint and have lost the main thread.

By creating specialization, where an AI assistant takes care of communication, we can focus on improving our ideas. I think that is a valuable complementarity that we should seek to develop, and it should be within reach with today's technology.

This way we (the writer) can focus on generating, curating, pruning and cultivating the ideas (the notes). To write an essay, we can have a model assist us (by writing a draft) while we walk through our graph of ideas.

The model draft only serves as a jumping-off point for a possible rendering of the ideas in an essay format. Ideally, the writer would take parts of it and stitch them together with their characteristic prose.
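The walk itself is easy to picture in code. Below is a minimal sketch, assuming a vault stored as a dictionary of notes, each with its bullets and outgoing links; the note titles, the `VAULT` structure and the function names are all made up for illustration. It picks the next note at random, whereas a real tool would more likely let the writer choose each step.

```python
import random

# Hypothetical vault: note title -> (bullet points, titles of linked notes)
VAULT = {
    "Chat is not enough": (["chat mimics speech", "writing is iterative"],
                           ["Scanner era", "Daemons"]),
    "Scanner era": (["paper -> scanner -> text file"], ["AI-native interfaces"]),
    "Daemons": (["background personas critique drafts"], ["AI-native interfaces"]),
    "AI-native interfaces": (["iteration alongside the human"], []),
}


def walk(vault, start, max_steps, rng=None):
    """Follow links from `start`, never revisiting a note, for at most max_steps hops."""
    if rng is None:
        rng = random.Random()
    path = [start]
    current = start
    for _ in range(max_steps):
        _, links = vault[current]
        unvisited = [title for title in links if title not in path]
        if not unvisited:
            break  # dead end: every linked note is already on the path
        current = rng.choice(unvisited)
        path.append(current)
    return path


def outline(vault, path):
    """Flatten the walked notes into the bullet outline a model would draft from."""
    lines = []
    for title in path:
        bullets, _ = vault[title]
        lines.append(title)
        lines.extend(f"  - {bullet}" for bullet in bullets)
    return "\n".join(lines)
```

The string returned by `outline` is what would be handed to a model as the raw material for a draft; the drafting step is deliberately left out here.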

The Engelbart Zoom

Douglas Engelbart, in his conception of the OHS (see About OHS - Doug Engelbart Institute), specified that there should be a mechanism for the user to zoom in and out of the knowledge domain at will in order to find the most relevant information.

We'll need more facile ways to traverse our knowledge domains as if we are flying around in an information space. We need to be able to quickly skim across the landscapes, and dive down into whatever detail suits our needs in the moment, zooming in and out of detail as desired.

I want to propose 2 different but related zoom mechanisms that I believe would approximate what Engelbart had in mind for the OHS.

File Optics

The first zoom mechanism is at a file level. When you write essays or have a collection of bullet lists, it becomes cumbersome to keep track of the idea flow. Many times, you need to zoom out to the gist or breakdown of what you're reading or where you're heading. Vice versa, sometimes you want a higher-order overview so that you can zoom in on what part you want.

The image below illustrates the aforementioned zoom-in and zoom-out mechanisms.

Zoom out helps you look at your argument as a whole and the overall structure. In this format, the model will look at relevant articles from the web and your vault and assist in structure and flow. Right now, zooming based on sections and subsections would be ideal. It maintains structural and semantic coherence. If the model were to come up with its own headings, it would throw the user off.

Zoom in helps you look at the finer-grained points in your sections. The model would ideally bring up relevant points for you to consider about the tone and how best to argue your point.

An LLM could easily provide this functionality. Have it break down and summarize the article based on section or topic differences. The spectrogram functionality (which I will detail shortly) also provides this.
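As a rough illustration of zooming by section, here is a minimal sketch that renders the same document at three levels of detail while keeping the writer's own headings. The `Section` type and the zoom levels are assumptions made for this example, and `first_sentence` is only a crude stand-in for the summary a model would write.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    heading: str
    paragraphs: list = field(default_factory=list)
    children: list = field(default_factory=list)


def first_sentence(text):
    # Crude placeholder for a model-written gist of the paragraph.
    end = text.find(". ")
    return text if end == -1 else text[: end + 1]


def zoom(section, level, depth=0):
    """Render a section: level 0 = headings only, 1 = plus a gist, 2 = full text."""
    indent = "  " * depth
    lines = [indent + section.heading]
    if level == 1:
        lines += [indent + "  " + first_sentence(p) for p in section.paragraphs[:1]]
    elif level >= 2:
        lines += [indent + "  " + p for p in section.paragraphs]
    for child in section.children:
        lines += zoom(child, level, depth + 1)
    return lines
```

Because the tree mirrors the article's sections and subsections, zooming out never invents headings the writer did not choose.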

Graph Optics

The next zoom mechanism applies to the graph of ideas. I got this idea from using Obsidian. Imagine you had a graph of ideas as shown below. (DISCLAIMER: This is not my Obsidian graph. I got it online from here. It serves the point I am trying to illustrate.)

Within your graph of ideas, suppose you could see clusters of what you've been thinking about the most. Suppose you could glance at these clusters and identify how different articles/thoughts/ideas interact. So in the zoomed-out view, you'd get something like this. Different parts of your graph have different labels.

Now something piques your interest, and you zoom in to a certain section. You now get a deeper and deeper dive into what you've been cooking with the articles and mini-essays.

Spectrograms for writing

This idea sprang from Linus's AI-First User Interfaces talk. When we look at images, we don't look pixel by pixel. When we listen to music, we don't do a Fourier analysis. But when we read text, we have to go through it "pixel by pixel".

  • a picture can be broken down into a colour histogram. You can break it down further into pixels (which are atomic)
  • a song can be broken down into a frequency spectrogram. You can break it down further into the constituent frequencies (which are atomic)
  • a book essentially can be broken down (atomically) into words. But what is the statistic? What is the breakdown?

Essentially, the question is: what are some of the aggregation statistics for text/writing?

Style Spectrogram

The simplest spectrogram would be a tone analysis spectrogram. But these are the ones you can tune. You can ask it to be more playful (funny) or elocutive (instructional), etc.

Inspiration Spectrogram

This spectrogram allows you to view a breakdown of what authors/ideas influence your writing. This would come in the way of

  1. references
  2. common ideas discussed
  3. The subject being discussed

So now the model analyses your writing, finds which references and ideas you borrow, and shows the degree of influence of each (in terms of the number of words dedicated to the idea).
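The "degree of influence" statistic can be sketched in a few lines. This assumes the hard part, attributing each passage to a source or idea, has already been done by a model or by hand; the function name and input shape are hypothetical.

```python
from collections import Counter


def influence_shares(passages):
    """passages: list of (source, text) pairs. Returns source -> share of total words."""
    counts = Counter()
    for source, text in passages:
        counts[source] += len(text.split())  # whitespace word count
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the result from the largest influence to the smallest.
    return {source: n / total for source, n in counts.most_common()}
```

The resulting shares sum to 1, so they can be drawn directly as the bars of the spectrogram.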

Walkthrough graph Spectrogram

This is an idea out of Henrik Karlsson's reader-generated essay writeup. Say you have a walk of some notes in your vault. You walk through them and the model generates an essay. But say you want it to focus on essay B more than essay A. Now you can tune it.

So you can tune the settings for how "weighted" these essays are in influencing the final essay. You could really experiment and see how different the essays turn out by tweaking the knobs.
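One simple reading of these knobs is a word budget: each note gets a share of the draft proportional to its weight. The sketch below assumes that interpretation; `word_budget` and its inputs are illustrative names, and a real system might instead pass the weights to the model as instructions.

```python
def word_budget(weights, total_words):
    """Split total_words across notes in proportion to their knob settings."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    # Rounding means the budgets may be off by a word or two in total.
    return {note: round(total_words * w / total_weight) for note, w in weights.items()}
```

For example, setting essay B's knob to three times essay A's in an 800-word draft gives A roughly 200 words and B roughly 600.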

Caveat

  • While there are methods to do topic modelling / feature extraction from models, none of them integrates as well as what I have described here. Sometimes the topics generated are quite generic and lack context. And context is the product.
  • I still am not sure if existing writing tools can be patched with these features. Would we need an entirely new writing software for this? Could Notion or Obsidian implement something like this?

Some more from the kitchen

Here are some other ones that I will be fleshing out:

  1. Version control - having some version control/branching mechanism for writing. Maybe some interface/design of pursuing a different thread or going in a different direction. Like a branch in a repository. If it doesn't pan out, ok. git checkout main. If it does. Lovely. git checkout main and git merge alternate-idea.
  2. Epi - Maggie Appleton wrote about "Epi" in her essay on LLM Interfaces, and I think it could be exceptionally valuable. The whole idea of the AI-generated / AI-powered context menu is super underrated, and I think such menus could be really powerful.
  3. Rabbitholing - this would be something along the lines of reading/writing. I love Andy Matuschak's working notes page.
    I think it's a phenomenal example of how rabbit holing can work well. When you read an article you can click a link and click once more and keep going to your heart's desire. But your train of thought is captured.

Caveats

As mentioned earlier, this system is on the rare side of cooked.

  • No discovery - the system I have described here is useful only after you've got an idea and you've started writing a little bit. However, for writing and idea generation, a lot of the work is in time spent not writing. The time spent ruminating about ideas and letting them simmer, and cook in the cauldron of your mind is just as important as the act of writing. I haven't figured out how a system could contribute to that just yet.

  • Idea tracking - the system above once again does not help you in the idea-generation phase. Having a method for idea tracking (prompts for writing) would be golden as well. Once again, I haven't figured out how to do this. Not without egregious privacy violations, at least.

A new ethos for AI for creatives

I truly believe that the next step for AI systems in creative spaces like drawing, writing, film, etc. will be that of iteration. Iterative improvement alongside the human should be the goal and raison d'etre for such systems.

Once we have moved past the chat interface, and AI models have become specialized and more adept at manipulating tools, a holistic interface for writing can be built. For now, toggling between Obsidian, Claude, Perplexity and Chrome seems to be it.

Until next time!

(P.S. Shout out to Liam, sophy, Devin and Chirayu for reading drafts of this and providing their feedback.)