November 2023
Meditron: open-source medical LLM
Nov 28
Stable Diffusion Video
Nov 21
Temporally stable gen AI video
OpenAI dev day keynote
Nov 6
Released some interesting stuff: GPT-4 Turbo with a 128k context window, prices cut to roughly a third on average, plus Code Interpreter and retrieval in the API. I'm excited to dig into these to see where they netted out. They also released a bunch of features for ChatGPT.
October 2023
Ultra-fast deep-learned CNS tumour classification during surgery
Oct 18
MemGPT – LLMs with self-editing memory for unbounded context
Oct 16
See paper. I take a similar but simpler approach with Edie in that long-term memory about the user is uncategorized. As noted in the HN comments this enables graceful degradation of recall that, on its surface, seems nice. However it can be off-putting in a chat context because humans don't tend to forget information in the same way.
The key is to model how humans handle this graceful degradation, either through traditional means or by training.
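For concreteness, a toy sketch of what uncategorized memory with human-like decay might look like (not Edie's or MemGPT's actual implementation; the half-life, rehearsal boost, and relevance function are all assumptions):

```python
import math
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    created_at: float = field(default_factory=time.time)
    recalls: int = 0  # times retrieved; stands in for human "rehearsal"

def recall_score(m: Memory, relevance: float, now: float,
                 half_life: float = 86_400 * 30) -> float:
    """Blend relevance with exponential forgetting; each past retrieval
    slows decay, loosely mimicking how human recall degrades."""
    age = now - m.created_at
    retention = math.exp(-age / (half_life * (1 + m.recalls)))
    return relevance * retention

def recall(memories: list[Memory], relevance_fn, k: int = 5) -> list[Memory]:
    now = time.time()
    ranked = sorted(memories,
                    key=lambda m: recall_score(m, relevance_fn(m), now),
                    reverse=True)
    top = ranked[:k]
    for m in top:
        m.recalls += 1  # retrieval itself reinforces the memory
    return top
```

In a real system `relevance_fn` would be embedding similarity against the current conversation; the point is just that forgetting is a smooth function of age and reinforcement rather than a hard cutoff.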
HN discussion on “ChatGPT’s system prompts”
Oct 15
Repo author did a great job answering why he believes these are not hallucinations.
PGVector vs Pinecone
Oct 14
Supabase puts out a ton of high-quality content on LLM app building blocks. Love their preference for battle-tested infrastructure. Seems like PGVector is coming along and will be able to handle large scale.
I wonder if the “you’re not web scale” adage applies here. Yes if one or few embeddings per datum, no if 10s or 100s of embeddings per datum.
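For reference, the "one embedding per datum" shape looks roughly like this with PGVector. A minimal sketch via psycopg2; the table, DSN, and dimensions are hypothetical:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""CREATE TABLE IF NOT EXISTS docs (
    id bigserial PRIMARY KEY, body text, embedding vector(1536))""")

# One embedding per document; the zero vector stands in for a model's output.
embedding = str([0.0] * 1536)  # pgvector accepts '[x, y, ...]' text literals
cur.execute("INSERT INTO docs (body, embedding) VALUES (%s, %s)",
            ("hello", embedding))

# Nearest-neighbour search with the <-> (L2 distance) operator.
cur.execute("SELECT id, body FROM docs ORDER BY embedding <-> %s::vector LIMIT 5",
            (embedding,))
print(cur.fetchall())
conn.commit()
```

At 10s or 100s of embeddings per datum you'd add a join table and an index (IVFFlat/HNSW), which is where the scale question above starts to bite.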
Efficient Streaming Language Models with Attention Sinks
Oct 2
September 2023
A Hacker’s Guide to Language Models
Video, posted by Alexandr Kurlin to the YC AI/ML group. Succinct, one-layer-down explanation of LLMs.
Generative AI’s Act Two
Sept 20
Landscape overview from Sequoia. Lots of interesting stuff here.
“Some of the best consumer companies have 60-65% DAU/MAU; WhatsApp’s is 85%. By contrast, generative AI apps have a median of 14% (with the notable exception of Character and the “AI companionship” category).” — This makes sense to me based on exploratory work on Edie. Part of the reason is personification of the LLM. Your brain participates in the process, similar to rubber ducking.
Subliminal Advertisement
Sept 19
Do half of AI researchers believe that there's a 10% chance AI will kill us all?
Sept 19
Answer: maybe.
Notes on the Foundation DB paper
Sept 18
Not directly related to generative AI, but the more I learn about FDB the more I think it’s worth a serious look at building the next generation of technology on a more sound footing.
Lantern: PG vector extension that includes embedding gen via CLIP or HF models.
Sept 13
August 2023
July 2023
AI generated full South Park episode
July 20
I mean, it’s not a great episode but that’s not really the point.
Unverified pastebin on GPT4 internals
July 11
The Last of Us starring Ellen Page and Hugh Jackman
July 7
Deepfake actors into entire shows. As one commenter put it — “The future of movie streaming: please select movie, please select actor.”
June 2023
What are embeddings?
June 25
Fantastic resource, see linked PDF. Covers much of the history of embeddings.
The Secret Sauce behind 100K context window in LLMs: all tricks in one place
June 18
AI Photojournalism
June 10
None of these people are real.
MusicGen: Simple and Controllable Music Generation
June 10
Controlnet QR Codes
June 7 — Warning, mildly NSFW
May 2023
Japan Goes All In: Copyright Doesn’t Apply To AI Training
May 31
OpenAI's plans according to Sam Altman
May 31
Production AI systems are really hard
Firsthand account of trying to build a radiology AI system, the first to be FDA-approved. HN thread here.
What if we set GPT-4 free in Minecraft?
May 26
Autonomous agents, different approach than AutoGPT etc. They build up a library of generated code actions and create embeddings of the doc strings. Thoughts → actions → vector search for code actions.
Raises the question: has the 3D simulation that births AGI already been created? Is it Minecraft?
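A stripped-down sketch of that retrieval loop. The `embed()` here is a toy stand-in, not the paper's model, and the skills are invented examples:

```python
import math

def embed(text: str) -> list[float]:
    # Toy placeholder: hash character bigrams into a small vector.
    # A real system would call an embedding model here.
    vec = [0.0] * 64
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % 64] += 1.0
    return vec

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

skill_library: dict[str, tuple[list[float], str]] = {}

def add_skill(docstring: str, code: str) -> None:
    # Index each generated code action by an embedding of its docstring.
    skill_library[docstring] = (embed(docstring), code)

def retrieve_skill(thought: str) -> str:
    # Thought -> vector search over docstrings -> best code action.
    q = embed(thought)
    best = max(skill_library.values(), key=lambda entry: cosine(q, entry[0]))
    return best[1]

add_skill("Chop down the nearest tree and collect logs", "def chop_tree(bot): ...")
add_skill("Craft a wooden pickaxe from logs", "def craft_pickaxe(bot): ...")
print(retrieve_skill("I need wood, find a tree and harvest it"))
```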
Can ChatGPT-4 write a good melody?
May 21
Patrick Collison interviews Sam Altman
May 17
promptfoo - the prompt engineering helper
May 14
“promptfoo helps you tune LLM prompts systematically across many relevant test cases.”
ReAct: Synergizing reasoning and acting in language models
May 14
Brex’s prompt engineering guide
May 14
Simon Willison uses GPT4 to design sqlite-history
May 13
Raw transcript between Simon and GPT4. Raw transcripts are fun as you get to see how other people interact with the LLM, pick up tips, etc.
As for the content, it reinforces my belief that current LLMs are really good sounding boards, but you need to be an expert in the subject matter to use them effectively (at least for creating new things; not true for learning, etc.).
ImageBind
Meta — May 10
Multimodal model, combine images, text, depth maps, motion data, etc and generate between each modality.
Generate 3D objects conditioned on text or images
OpenAI project.
Thoughts
- 100k to 1M token windows are coming; does that change things? If more context is always better, then token budget management will be important regardless of tech progress (a minimal selection sketch follows this list).
- Is there a theoretical max valuable context for a given prompt? It seems like there is low max valuable context for some queries (“what’s 1+1?”), high max valuable context for others (“what is the history of the world?”), and variable max valuable context (“who am I?”).
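The sketch mentioned above: greedy context selection under a token budget. The scoring function and token estimate are placeholders (a real version would use a tokenizer and embedding similarity):

```python
def select_context(snippets: list[str], score, budget_tokens: int) -> list[str]:
    """Greedily pack the highest-scoring snippets into the prompt budget."""
    chosen, used = [], 0
    for snippet in sorted(snippets, key=score, reverse=True):
        cost = len(snippet.split())  # crude token estimate
        if used + cost <= budget_tokens:
            chosen.append(snippet)
            used += cost
    return chosen
```

Even with a 1M-token window this shape survives; only `budget_tokens` changes, and the "max valuable context" question becomes what `score` should return per query type.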
April 2023
Guiding questions / interests:
- LLM Memory / context window selection
- How to create a test suite that ensures desired traits are carried through prompt and model changes.
- How to design an agent inner monologue that identifies user goals and preferences.
EVA — AI relational database
Declarative approach to AI and non-AI system composition. Where in an imperative system like LangChain you explicitly define program flow, in a declarative system like EVA you define what you want and the system figures out how (I’m assuming).
EVA describes itself as “supporting database applications that operate on both structured (tables, feature vectors) and unstructured data (videos, podcasts, PDFs, etc.) using deep learning models”.
Something to investigate wrt building centralized LLM-based applications.
Snapchat’s MyAI system message
OP jailbroke Snapchat’s assistant to reveal its system message. It’s what you’d expect, but there are some interesting tidbits:
- “Create a natural, easygoing, back-and-forth flow to the dialogue. Don’t go into a monologue!”
- “You must ALWAYS be extremely concise! 99% of the time, your lines should be a sentence or two. Summarize your responses to be as brief as possible.”
- “Provide fun, harmless, and lighthearted preferences, but never have negative opinions or make adversarial judgements on sensitive topics such as: politics, religions, …”
- A significant portion of the system message is spent ensuring the assistant does not reveal the user’s location unless it’s explicitly included in the system message (inclusion indicates the user has authorized location services).
The meta takeaway from this (and ChatGPT jailbreaks) is that the system message shouldn’t be sensitive or secret information — users will find a way to trick the LLM into revealing any information used to generate the response.
Partnering with Harvey: Putting LLMs to Work
From the article: “they understand both the problem (rote cognitive labor) and the solution (large language models).” Fantastic insight that we can use to pattern match whether LLMs are a good solution for a particular problem.
Understanding the real-world use cases where LLMs are better than traditional approaches is important in cutting through the hype, and it tracks that they’re good at rote cognitive labor based on current successes (e.g. Jasper).
Flux Copilot: AI for hardware design
Custom-trained LLM for hardware design. Knows the entire context of your project including component list, connections, and related part datasheets.
Can You Put it All Together: Evaluating Conversational Agents’ Ability to Blend Skills
Eric Michael Smith et al — April 17, 2020
Authors trained a “dialogue manager” to effectively combine existing single-task models (each with existing benchmarks) into a larger blended model. The dialogue manager routed prompts to the best-suited single-task model.
While this doesn’t apply to prompting directly, this paper inspired me to consider a similar approach for improving agent responses by implementing a prompt preprocessor / planner to adjust the downstream system message for both ‘act as’ instructions and context facts.
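A hedged sketch of what that prompt preprocessor might look like. The routes and keyword classifier are invented stand-ins for a trained dialogue manager:

```python
# Persona fragments keyed by detected conversational skill (hypothetical).
ROUTES = {
    "empathy": "You are a warm, supportive listener. Acknowledge feelings first.",
    "knowledge": "You are a precise assistant. Answer factually.",
    "personal": "You know the user's stated goals and preferences; weave them in.",
}

def classify(message: str) -> str:
    # Toy keyword heuristic standing in for a trained classifier or LLM call.
    lowered = message.lower()
    if any(w in lowered for w in ("feel", "sad", "stressed")):
        return "empathy"
    if any(w in lowered for w in ("how", "what", "why")):
        return "knowledge"
    return "personal"

def build_system_message(message: str, context_facts: list[str]) -> str:
    # Preprocessor output: 'act as' instructions + context facts for the
    # downstream system message.
    persona = ROUTES[classify(message)]
    return persona + "\n\nKnown about the user:\n" + "\n".join(context_facts)

print(build_system_message("Why is the sky blue?", ["Prefers short answers."]))
```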
The Inverse Scaling Prize
As models grow they typically produce better results for the same prompt, except when they don’t. This prize incentivizes finding prompts that trend worse.
A Standard Model of the Mind: Toward a Common Computational Framework across Artificial Intelligence, Cognitive Science, Neuroscience, and Robotics
Laird et al — Winter 2017
Overview of different cognitive architectural models.
Prompt Engineering
Lilian Weng — March 2023
Discusses techniques like self-consistency sampling, chain-of-thought prompting, and tool augmentation (e.g. search engines, APIs), as in Program-Aided Language models (PAL) and Tool Augmented Language Models (TALM).
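Self-consistency sampling is simple enough to sketch: sample several chain-of-thought completions at nonzero temperature and majority-vote on the final answers. `sample_llm` is a placeholder for any LLM call whose output ends in “Answer: X”:

```python
from collections import Counter

def self_consistent_answer(prompt: str, sample_llm, n: int = 5) -> str:
    answers = []
    for _ in range(n):
        completion = sample_llm(prompt)  # temperature > 0 for diverse chains
        answers.append(completion.rsplit("Answer:", 1)[-1].strip())
    return Counter(answers).most_common(1)[0][0]  # majority vote
```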
Prompt Engineering vs Blind Prompting
Mitchell Hashimoto — April 2023
Founder of HashiCorp explains how to move from ‘try random stuff’ (blind prompting) to a more rigorous method for evolving prompts over time (prompt engineering).
Generative Agents: Interactive Simulacra of Human Behavior
Joon Sung Park et al — April 7, 2023
Uses techniques like observation, planning, and reflection to create autonomous genai-powered agents. Discusses the design of the memory that powers goal definition and inner monologue.
Edge of Realism
April 25, 2023
Getting Scary This Time: I Separated the Face & Hand Motion in a Video
April 23, 2023
Method for temporal (frame-to-frame) consistency. Extract keyframes from the original video, tile them into a grid, run the grid through Stable Diffusion, split back into keyframes, EbSynth between them, etc. Workflow here: https://www.reddit.com/r/StableDiffusion/comments/11zeb17/tips_for_temporal_stability_while_changing_the/
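The “tile into a grid” step is easy to sketch with Pillow; processing all keyframes in one SD pass is what keeps the style consistent across them. File names and grid size here are hypothetical:

```python
from PIL import Image

def make_grid(frame_paths: list[str], cols: int = 2) -> Image.Image:
    frames = [Image.open(p) for p in frame_paths]
    w, h = frames[0].size            # assumes uniform frame size
    rows = -(-len(frames) // cols)   # ceiling division
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, frame in enumerate(frames):
        grid.paste(frame, ((i % cols) * w, (i // cols) * h))
    return grid

grid = make_grid(["key_000.png", "key_015.png", "key_030.png", "key_045.png"])
grid.save("keyframe_grid.png")  # run through img2img, then split back apart
```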
Insights
- Big businesses can be built on using LLMs to solve rote cognitive work (Jasper, Harvey).
- LLMs enable human, emotional connection with software.
- GPT4 is not good enough (or even the right tool) to eliminate the human in the loop. This might be a fundamental issue with the technology, not something fixed with more parameters / data.
- Context window sizes will increase but never be unlimited. Therefore context selection will always be a problem to solve.
- Prompt injection / exfil is really hard to solve. Design systems that treat this info as public.
Working through
- https://dl.acm.org/doi/10.1145/219717.219808
- https://escholarship.org/uc/item/6x5933cw
- https://dl.acm.org/doi/pdf/10.1145/3526113.3545616 — Social Simulacra
- https://books.google.com/books?hl=en&lr=&id=cLofEAAAQBAJ&oi=fnd&pg=PR13&dq=info:-n7r9kLT8vQJ:scholar.google.com&ots=XtJLwxvN_p&sig=n-8cszQzAloPsG_UN1U0xnYGlmY#v=onepage&q&f=false — soar cognitive architecture
- https://scholar.google.com/citations?view_op=view_citation&hl=en&user=ea6cjVUAAAAJ&citation_for_view=ea6cjVUAAAAJ:HhcuHIWmDEUC — standard model of the mind
- Jsonformer — structured output from LLMs
- Interesting comment from YC founder kcorbitt about a similar system where extraction is done one datum at a time.
- https://arxiv.org/abs/2305.01625