Beyond Prompts
- All I want is to steer toward acceptable results rather than just tweaking prompts and hoping for the best, with a judicious balance of constraints and freedom.
- Finding interaction patterns that give more calibrated control could be key. How can we discover interfaces that unlock deeper and more tailored integrations between users and generative models beyond sentence prompts? This could significantly augment creative and knowledge work.
The Curse of Indirection
- Current interfaces for working with generative AI models are indirect—we manipulate models mainly through text prompts and conversations. This adds friction and distance between the user’s intent and the model’s output.
- Current text prompts place generative models at arm’s length, like trying to steer a car from the passenger seat. More direct, integrated ways of manipulating models could improve workflows and give users a proper driver’s seat for precise guidance.
- Quoting Kate Compton’s Casual Creators theory: “the possibility space should be narrow enough to exclude broken artifacts… but broad enough to contain surprising artifacts as well. The surprising quality of the artifacts motivates the user to explore the possibility space in search of new discoveries, a motivation which disappears if the space is too uniform.”
- Context menus inside documents that let users branch out of their current vertical by highlighting text/keywords could be a way to overcome indirection.
- An example of a process that might generate better value: [Hyperlinks on results](https://www.notion.so/re-engineering-prompt-interfaces-6a572f1089c64981884d0558338a0f7b?pvs=21). Clicking a link expands the topic based on the prompt being discussed; clicking back minimizes it. Word exploration through clickable words that function as ever-expanding tree toggles.
- Even within the space of text-based interaction, we want to keep the lineage of information as it changes over time instead of overwriting facts. For example, using Rich Hickey’s perspective on updating information: “If my favorite color was red and now it’s blue, we don’t go back and change the fact that my favorite color was red to be blue – that’s wrong. Instead, we add a new, updated fact that my favorite color is now blue, but the old fact remains historically true.”
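Hickey’s append-only idea can be sketched as a tiny fact log. This is just an illustrative Python toy (the `FactLog` class and its method names are my own invention, not any real library):

```python
from datetime import datetime, timezone

class FactLog:
    """Append-only store: updates add new facts instead of overwriting old ones."""

    def __init__(self):
        self._facts = []  # list of (entity, attribute, value, recorded_at)

    def assert_fact(self, entity, attribute, value):
        # Nothing is ever deleted; we only ever append.
        self._facts.append((entity, attribute, value, datetime.now(timezone.utc)))

    def current(self, entity, attribute):
        # The latest assertion wins, but the old facts remain.
        matches = [f for f in self._facts if f[0] == entity and f[1] == attribute]
        return matches[-1][2] if matches else None

    def history(self, entity, attribute):
        # Every value the attribute has ever had, oldest first.
        return [f[2] for f in self._facts if f[0] == entity and f[1] == attribute]

log = FactLog()
log.assert_fact("me", "favorite-color", "red")
log.assert_fact("me", "favorite-color", "blue")
print(log.current("me", "favorite-color"))   # blue
print(log.history("me", "favorite-color"))   # ['red', 'blue']
```

The point is the interface shape: a text tool built this way could show you the lineage of any claim instead of silently replacing it.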
Exploring Latent Spaces
- Moving through latent spaces quickly and viscerally is an alternative to conversational prompts. This ties to the idea that we lack tactile, direct ways to guide text generation. New interaction techniques that let users directly manipulate directions and vectors in latent space could unlock more creative possibilities. Traversing latent idea spaces via prompts resembles blind navigation through a text adventure game; new interaction paradigms could make exploration more immersive, putting users in the driver’s seat and letting them use these vectors to explore and discover in latent space.
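To make “manipulating directions and vectors in latent space” concrete, here’s a minimal Python sketch. The vectors and the `steer` function are made-up stand-ins; a real system would get embeddings from an actual encoder, but the arithmetic of a slider-style control would look like this:

```python
# Toy 3-d vectors standing in for real model embeddings (numbers are made up).
def add(u, v):   return [a + b for a, b in zip(u, v)]
def sub(u, v):   return [a - b for a, b in zip(u, v)]
def scale(u, k): return [a * k for a in u]

formal = [0.9, 0.1, 0.0]
casual = [0.1, 0.9, 0.0]
draft  = [0.5, 0.5, 0.3]

# A "direction" in latent space, found by subtracting two reference points.
formality = sub(formal, casual)

def steer(vec, direction, t):
    """Nudge a point along a direction: t is the slider position,
    instead of typing 'make it more formal' into a prompt box."""
    return add(vec, scale(direction, t))

steered = steer(draft, formality, 0.5)
```

The slider value `t` becomes the steering wheel: continuous, reversible, and visceral in a way a sentence prompt isn’t.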
- What most traditional apps that sit on top of large amounts of data do is take a commodity in a database and layer curation and recommendation on top, in ways more usable and friendly than just giving people a search box and pushing them out the door. Is there a way to add horizontal expansion to search, instead of vertical digging that requires reformulating the query each time? How do you break out of hierarchical directories that don’t scale (i.e., Yahoo’s directory), even when the hierarchies are just ranked search results from the user’s prompts?
- I keep referring back to the COVID times, when I started using Roam Research for my note-taking. I was in college and had time. Back-propagation by directly playing around with interfaces: I didn’t start out as a programmer, so I’ve always wondered how to intuitively manipulate end products in ways that change the source. Extending this with what I said about language, how can we use the newfound ability to code on command to back-propagate information processing? So instead of opening my weather app and scrolling through the forecast for when it’s warm enough to leave my apartment in a cold NYC December without a jacket, I’d move the temperature slider to a higher degree and back-propagate to the matching dates.
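The weather-slider idea above is really just query inversion: instead of scanning dates for temperatures, you pick a temperature and the interface answers with dates. A minimal sketch (the forecast numbers are invented; a real app would pull them from a weather API):

```python
# Hypothetical December forecast, date -> high temperature in °F.
forecast = {
    "Dec 01": 28, "Dec 02": 35, "Dec 03": 52,
    "Dec 04": 49, "Dec 05": 58, "Dec 06": 41,
}

def dates_at_least(forecast, min_temp_f):
    """Invert the usual lookup: the slider picks a temperature,
    and the interface back-propagates to the matching dates."""
    return [day for day, temp in forecast.items() if temp >= min_temp_f]

print(dates_at_least(forecast, 50))  # ['Dec 03', 'Dec 05']
```

Dragging the slider just re-runs `dates_at_least` with a new threshold, so the output updates live as you move it.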
I keep thinking of what the nested knowledge graphs of Roam Research would look like if they were autogenerated instead of us manually creating interlinks. Learning would be awesome!
- This is my Roam pages’ networked graph from COVID, when I had the time to interlink notes. It was fun and useful, but never ended up working for me due to the intensive writing process: typing “[[ ]]” whenever I tried to interlink topics, and having to manually remember what to even link.
- Especially useful when the tool picks up on notes that are proper names that need to be clarified further…
- It can help me notice connections between ideas in my notes that I wouldn’t have even thought to make myself, even if I were trying to find interesting notes to link together. With a smarter system, a similar interface could even automatically discover and show links from your notes to high-quality articles or online sources that you may not have seen yet, automatically crawling the web on your behalf.
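One way such auto-linking could work is by scoring note similarity and suggesting `[[links]]` above a threshold. Here’s a toy Python sketch using word-count cosine similarity as a crude stand-in for real embeddings (the notes, titles, and threshold are all invented):

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical notes; a real system would embed the full note text.
notes = {
    "latent-spaces": "steering vectors latent space exploration",
    "roam-notes": "interlinking notes knowledge graph",
    "note-graphs": "auto generated knowledge graph from notes",
}

vectors = {title: Counter(text.split()) for title, text in notes.items()}

# Suggest a [[link]] whenever two notes are similar enough.
suggestions = [
    (a, b) for a, b in combinations(vectors, 2)
    if cosine(vectors[a], vectors[b]) > 0.4
]
```

Swap the word counts for model embeddings and the same loop finds the connections you’d never have typed `[[ ]]` for yourself.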
Going back to the analogy of driving cars: in addition to giving you the seat and a steering wheel, it’s allowing you to have a windshield to look through and see where you want to maneuver!
Balancing Guardrails and Possibilities
- As noted in the initial thoughts, providing guardrails for safety while preserving expansive capability spaces is an important challenge. At the end of the day, we need windshields for a reason! Permitting expansive possibility spaces risks accidents or misuse, or even just the drifting of attention. However, back-propagating user edits to tune outputs could strike this balance.
Anyway, tying this back to current professional parsing tools: the finance, legal, and medical sectors deal with highly complex and structured data sets. Beyond the commonly cited reasons for working with these data structures (high-stakes decisions, the precision demanded by regulatory compliance, and room for automation/personalization), I think the complexity offers fertile ground for experimenting with innovative interaction models to manage, interpret, and manipulate such data effectively, given its pre-existing formatting.
I’ll leave with this: static information media severely limits what ideas we can express and manipulate. We’re limited by how much we can conveniently represent, and so much thinking is still trapped inside the head. Dynamic, interactive media could empower entirely new realms of thinking.
Anecdotally, one summer at FB Messenger my main role was message search and surfacing results well. I looked at the best way to give the searcher what they want, even before they realize they want it. It all started with simple descriptive stats about usage of in-chat search: people commonly searched for numbers, emails, passwords, or dates/locations. What if there’s a way to use people’s current usage flow to add layers that guide them toward more discovery, instead of just waiting on the user-guided flow?
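Those usage stats could directly drive the interface: detect what kind of thing the person is searching for and surface a matching filter before they ask. A toy Python sketch (the patterns and categories are invented for illustration; a real system would learn them from the usage data):

```python
import re

# Hypothetical detectors for what people commonly searched in chats.
PATTERNS = {
    "email":  re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "date":   re.compile(
        r"\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\w*\s+\d{1,2}\b",
        re.I,
    ),
    "number": re.compile(r"\b\d{3,}\b"),
}

def suggest_filters(query: str) -> list[str]:
    """Guess which structured filter to surface alongside plain search results."""
    return [kind for kind, pat in PATTERNS.items() if pat.search(query)]

print(suggest_filters("meet me dec 14"))         # ['date']
print(suggest_filters("her email was a@b.com"))  # ['email']
```

The suggestions ride along with the normal results, so the user who ignores them loses nothing.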
Obviously the worry here is overwhelming the user by providing buttons/flows they didn’t ask for. But I believe there is a world where it can be done empathetically!
Another worry might be tools like Harvey or Hebbia using users’ VDRs to bring up knowledge graphs and predefined prompts, which might seem intrusive and insecure. I hope the only things standing between our current state and this becoming the norm are some time and better enterprise AI security systems.
These are just wonderings I’ve written down to help me visualize my thoughts. Regardless, let me know what you think, or which lines/topics you’d want to explore further.