Temporally stable gen AI video
Released some interesting stuff: GPT4 Turbo with a 128k context window, and prices cut to roughly a third on average. Code interpreter and retrieval too; I’m excited to dig into these to see where they netted out. They also released a bunch of features for ChatGPT.
See paper. I take a similar but simpler approach with Edie in that long term memory about the user is uncategorized. As noted in the HN comments this enables graceful degradation of recall that, on its surface, seems nice. However it can be off-putting in a chat context because humans don’t tend to forget information in the same way.
The key is to model how humans handle this graceful degradation, either through traditional means or by training.
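One traditional means is an explicit decay function over memory recency, so recall fades smoothly instead of dropping out abruptly. A minimal sketch, assuming an importance-weighted exponential decay (the half-life and scoring shape are my own placeholder choices, not from the paper):

```python
import math
import time


def recall_score(last_accessed_ts, importance, now=None, half_life_hours=24.0):
    """Score a memory for recall: importance weighted by exponential recency decay.

    Older memories lose score smoothly, which is closer to how human
    recall degrades than a hard cutoff.
    """
    now = time.time() if now is None else now
    age_hours = max(0.0, (now - last_accessed_ts) / 3600)
    recency = math.exp(-math.log(2) * age_hours / half_life_hours)
    return importance * recency
```

Retrieval would then surface the top-k memories by score, rather than forgetting a memory all at once.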
The repo author did a great job of explaining why he believes these are not hallucinations.
Supabase puts out a ton of high quality content on LLM app building blocks. Love their preference of battle tested infrastructure. Seems like PGVector is coming along and will be able to handle large scale.
I wonder if the “you’re not web scale” adage applies here. Yes if one or few embeddings per datum, no if 10s or 100s of embeddings per datum.
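Quick back-of-envelope on that point (the dimension and row counts are illustrative, assuming 4-byte floats and ignoring index overhead):

```python
def index_size_gb(num_datums, embeddings_per_datum, dim=1536, bytes_per_float=4):
    """Raw vector storage for an embedding index, ignoring index overhead."""
    return num_datums * embeddings_per_datum * dim * bytes_per_float / 1e9


# 10M rows, 1 embedding each: ~61 GB -- fits on one beefy Postgres box.
single = index_size_gb(10_000_000, 1)

# 10M rows, 100 embeddings each: ~6 TB -- now you might actually be "web scale".
many = index_size_gb(10_000_000, 100)
```

At one embedding per datum you stay comfortably inside single-node PGVector territory; at 100 per datum the adage stops applying.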
Video, posted by Alexandr Kurlin to the YC AI/ML group. Succinct one layer down explanation of LLMs.
Landscape overview from Sequoia. Lots of interesting stuff here.
“Some of the best consumer companies have 60-65% DAU/MAU; WhatsApp’s is 85%. By contrast, generative AI apps have a median of 14% (with the notable exception of Character and the “AI companionship” category).” — This makes sense to me based on exploratory work on Edie. Part of the reason is personification of the LLM. Your brain participates in the process similar to rubber ducking.
Not directly related to generative AI, but the more I learn about FDB the more I think it’s worth a serious look at building the next generation of technology on a more sound footing.
I mean, it’s not a great episode but that’s not really the point.
Deepfake actors into entire shows. As one commenter put it — “The future of movie streaming: please select movie, please select actor.”
Fantastic resource, see linked PDF. Covers much of the history of
None of these people are real.
June 7 — Warning, mildly NSFW
Firsthand account of trying to build a radiology AI system (the first FDA-approved one). HN thread here.
Autonomous agents, different approach than AutoGPT etc. They build up a library of generated code actions and create embeddings of the doc strings. Thoughts → actions → vector search for code actions.
Raises the question: has the 3D simulation that births AGI already been created? Is it Minecraft?
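The thoughts → actions → vector search loop can be sketched with a toy in-memory store. Everything here is a stand-in: the bag-of-characters `embed` replaces a real embedding model on the docstrings, and the action strings are made up.

```python
import math


def embed(text):
    """Stand-in embedding: bag-of-characters counts. A real system would
    call an embedding model on the docstring instead."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class ActionLibrary:
    """Store generated code actions keyed by an embedding of their docstring."""

    def __init__(self):
        self.actions = []  # (docstring, code, embedding)

    def add(self, docstring, code):
        self.actions.append((docstring, code, embed(docstring)))

    def search(self, thought, k=1):
        """Map a thought to the closest stored actions by docstring similarity."""
        q = embed(thought)
        ranked = sorted(self.actions, key=lambda a: cosine(q, a[2]), reverse=True)
        return [(d, c) for d, c, _ in ranked[:k]]
```

As the agent generates working code it calls `add`; later thoughts resolve to existing actions via `search` instead of regenerating code from scratch.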
“promptfoo helps you tune LLM prompts systematically across many relevant test cases.”
Raw transcript between Simon and GPT4. Raw transcripts are fun as you get to see how other people interact with the LLM, pick up tips, etc.
As for the content, it reinforces my belief that current LLMs are really good sounding boards, but you need to be an expert in the subject matter to use them effectively for creating new things (less true for learning, etc).
Meta — May 10
Multimodal model: combines images, text, depth maps, motion data, etc., and generates between modalities.
- 100k to 1M token windows coming, does that change things? If more context is always better then token budget management will be important regardless of tech progress.
- Is there a theoretical max valuable context for a given prompt? It seems like there is low max valuable context for some queries (“what’s 1+1?”), high max valuable context for others (“what is the history of the world?”), and variable max valuable context (“who am I?”).
Guiding questions / interests:
- LLM Memory / context window selection
- How to create a test suite that ensures desired traits are carried through prompt and model changes.
- How to design an agent inner monologue that identifies user goals and preferences.
Declarative approach to AI and non-AI system composition. Whereas in an imperative system like LangChain you explicitly define program flow, in a declarative system like EVA you define what you want and the system figures out how (I’m assuming).
EVA describes itself as “supporting database applications that operate on both structured (tables, feature vectors) and unstructured data (videos, podcasts, PDFs, etc.) using deep learning models”.
Something to investigate wrt building centralized LLM-based applications.
OP jailbroke Snapchat’s assistant into revealing its system message. It’s what you’d expect, but there are some interesting tidbits:
- “Create a natural, easygoing, back-and-forth flow to the dialogue. Don’t go into a monologue!”
- “You must ALWAYS be extremely concise! 99% of the time, your lines should be a sentence or two. Summarize your responses to be as brief as possible.”
- “Provide fun, harmless, and lighthearted preferences, but never have negative opinions or make adversarial judgements on sensitive topics such as: politics, religions, …”
- A significant portion of the system message is spent ensuring the assistant does not reveal the user’s location unless it’s explicitly included in the system message (inclusion indicates the user has authorized location services).
The meta takeaway from this (and ChatGPT jailbreaks) is that the system message shouldn’t be sensitive or secret information — users will find a way to trick the LLM into revealing any information used to generate the response.
From the article: “they understand both the problem (rote cognitive labor) and the solution (large language models).” Fantastic insight that we can use to pattern match whether LLMs are a good solution for a particular problem.
Understanding the real-world use cases where LLMs are better than traditional approaches is important in cutting through the hype, and it tracks that they’re good at rote cognitive labor based on current successes (e.g. Jasper).
Custom-trained LLM for hardware design. Knows the entire context of your project including component list, connections, and related part datasheets.
Eric Michael Smith et al — April 17, 2020
Authors trained a “dialogue manager” to effectively combine existing single-task models (each with existing benchmarks) into a larger blended model. The dialogue manager routed prompts to the best suited single-task model.
While this doesn’t apply to prompting directly, this paper inspired me to consider a similar approach for improving agent responses by implementing a prompt preprocessor / planner to adjust the downstream system message for both ‘act as’ instructions and context facts.
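A minimal version of that preprocessor idea. The paper trains a dialogue manager to do the routing; here a keyword heuristic stands in for it, and the categories and system messages are illustrative:

```python
# Map routing categories to 'act as' system messages. The trained dialogue
# manager from the paper is replaced by a keyword heuristic for sketching.
SYSTEM_MESSAGES = {
    "empathy": "You are a supportive listener. Acknowledge feelings first.",
    "knowledge": "You are a precise research assistant. Cite what you know.",
    "default": "You are a helpful assistant.",
}

ROUTES = {
    "empathy": ("feel", "sad", "worried", "stressed"),
    "knowledge": ("what is", "explain", "history of", "how does"),
}


def preprocess(prompt):
    """Pick the downstream system message for a prompt."""
    lowered = prompt.lower()
    for category, keywords in ROUTES.items():
        if any(kw in lowered for kw in keywords):
            return SYSTEM_MESSAGES[category]
    return SYSTEM_MESSAGES["default"]
```

The same hook could also inject context facts, not just ‘act as’ instructions, before the prompt reaches the main model.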
As models grow they typically produce better results for the same prompt, except when they don’t. This prize incentivizes finding prompts that trend worse.
A Standard Model of the Mind: Toward a Common Computational Framework across Artificial Intelligence, Cognitive Science, Neuroscience, and Robotics
Laird et al — Winter 2017
Overview of different cognitive architectural models.
Lilian Weng — March 2023
Discusses techniques like self-consistency sampling, chain-of-thought prompting, and external augmentation (e.g. search engines, APIs), ex: Program-Aided Language Models (PAL) and Tool Augmented Language Models (TALM).
Mitchell Hashimoto — April 2023
Founder of HashiCorp explains how to move from ‘try random stuff’ (blind prompting) to a more rigorous method for evolving prompts over time (prompt engineering).
Joon Sung Park et al — April 7, 2023
Uses techniques like observation, planning, and reflection to create autonomous genAI-powered agents. Discusses the design of the memory that powers goal definition and inner monologue.
April 25, 2023
April 23, 2023
Method for temporal (frame-to-frame) consistency: extract keyframes from the original video, assemble them into a grid, run it through SD, split back into keyframes, EbSynth between them, etc. Workflow here: https://www.reddit.com/r/StableDiffusion/comments/11zeb17/tips_for_temporal_stability_while_changing_the/
- Big businesses can be built on using LLMs to solve rote cognitive work (Jasper, Harvey).
- LLMs enable human, emotional connection with software.
- GPT4 is not good enough (or even the right tool) to eliminate human in the loop. This might be a fundamental issue with the technology, not fixed with more parameters / data.
- Context window sizes will increase but never be unlimited. Therefore context selection will always be a problem to solve.
- Prompt injection / exfil is really hard to solve. Design systems that treat this info as public.
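A sketch of what the context-selection problem looks like in practice: greedily pack the highest-relevance snippets until the token budget runs out. Token counting by whitespace here is a crude stand-in for the model’s real tokenizer, and the scoring is assumed to come from an upstream retriever:

```python
def select_context(snippets, budget_tokens):
    """snippets: list of (relevance_score, text). Greedily take the most
    relevant snippets that still fit in the token budget."""
    chosen, used = [], 0
    for score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = len(text.split())  # crude token count; use a real tokenizer
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen
```

Even at a 1M-token window this kind of budgeted selection still matters: more context costs money and latency, and irrelevant context can degrade the answer.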
- https://dl.acm.org/doi/pdf/10.1145/3526113.3545616 — Social Simulacra
- https://books.google.com/books?hl=en&lr=&id=cLofEAAAQBAJ&oi=fnd&pg=PR13&dq=info:-n7r9kLT8vQJ:scholar.google.com&ots=XtJLwxvN_p&sig=n-8cszQzAloPsG_UN1U0xnYGlmY#v=onepage&q&f=false — soar cognitive architecture
- https://scholar.google.com/citations?view_op=view_citation&hl=en&user=ea6cjVUAAAAJ&citation_for_view=ea6cjVUAAAAJ:HhcuHIWmDEUC — standard model of the mind
- Jsonformer — structured output from LLMs
- Interesting comment from YC founder kcorbitt about a similar system where extraction is done one datum at a time.