Temporally stable gen AI video
Released some interesting stuff: GPT4 Turbo with a 128k context window, and prices cut to roughly a third on average. Code interpreter and retrieval too; I’m excited to dig into these to see where they netted out. They also released a bunch of features for ChatGPT.
See paper. I take a similar but simpler approach with Edie in that long term memory about the user is uncategorized. As noted in the HN comments this enables graceful degradation of recall that, on its surface, seems nice. However it can be off-putting in a chat context because humans don’t tend to forget information in the same way.
The key is to model how humans handle this graceful degradation, either through traditional means or by training.
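One traditional means is an explicit decay function over memory recency, so recall fades smoothly instead of dropping out abruptly. A minimal sketch, assuming an importance-weighted exponential decay (the half-life and scoring shape are my own placeholder choices, not from the paper):

```python
import math
import time


def recall_score(last_accessed_ts, importance, now=None, half_life_hours=24.0):
    """Score a memory for recall: importance weighted by exponential recency decay.

    Older memories lose score smoothly, which is closer to how human
    recall degrades than a hard cutoff.
    """
    now = time.time() if now is None else now
    age_hours = max(0.0, (now - last_accessed_ts) / 3600)
    recency = math.exp(-math.log(2) * age_hours / half_life_hours)
    return importance * recency
```

Retrieval would then surface the top-k memories by score, rather than forgetting a memory all at once.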
The repo author did a great job of explaining why he believes these are not hallucinations.
Supabase puts out a ton of high quality content on LLM app building blocks. Love their preference of battle tested infrastructure. Seems like PGVector is coming along and will be able to handle large scale.
I wonder if the “you’re not web scale” adage applies here. Yes if one or few embeddings per datum, no if 10s or 100s of embeddings per datum.
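Quick back-of-envelope on that point (the dimension and row counts are illustrative, assuming 4-byte floats and ignoring index overhead):

```python
def index_size_gb(num_datums, embeddings_per_datum, dim=1536, bytes_per_float=4):
    """Raw vector storage for an embedding index, ignoring index overhead."""
    return num_datums * embeddings_per_datum * dim * bytes_per_float / 1e9


# 10M rows, 1 embedding each: ~61 GB -- fits on one beefy Postgres box.
single = index_size_gb(10_000_000, 1)

# 10M rows, 100 embeddings each: ~6 TB -- now you might actually be "web scale".
many = index_size_gb(10_000_000, 100)
```

At one embedding per datum you stay comfortably inside single-node PGVector territory; at 100 per datum the adage stops applying.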
Video, posted by Alexandr Kurlin to the YC AI/ML group. Succinct one layer down explanation of LLMs.
Landscape overview from Sequoia. Lots of interesting stuff here.
“Some of the best consumer companies have 60-65% DAU/MAU; WhatsApp’s is 85%. By contrast, generative AI apps have a median of 14% (with the notable exception of Character and the “AI companionship” category).” — This makes sense to me based on exploratory work on Edie. Part of the reason is personification of the LLM. Your brain participates in the process similar to rubber ducking.
Not directly related to generative AI, but the more I learn about FDB the more I think it’s worth a serious look at building the next generation of technology on a more sound footing.
I mean, it’s not a great episode but that’s not really the point.
Deepfake actors into entire shows. As one commenter put it — “The future of movie streaming: please select movie, please select actor.”
Fantastic resource, see linked PDF. Covers much of the history of
None of these people are real.
June 7 — Warning, mildly NSFW
Firsthand account of trying to build a radiology AI system (the first FDA-approved one). HN thread here.
Autonomous agents, different approach than AutoGPT etc. They build up a library of generated code actions and create embeddings of the doc strings. Thoughts → actions → vector search for code actions.
Raises the question: has the 3D simulation that births AGI already been created? Is it Minecraft?
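The thoughts → actions → vector search loop can be sketched with a toy in-memory store. Everything here is a stand-in: the bag-of-characters `embed` replaces a real embedding model on the docstrings, and the action strings are made up.

```python
import math


def embed(text):
    """Stand-in embedding: bag-of-characters counts. A real system would
    call an embedding model on the docstring instead."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class ActionLibrary:
    """Store generated code actions keyed by an embedding of their docstring."""

    def __init__(self):
        self.actions = []  # (docstring, code, embedding)

    def add(self, docstring, code):
        self.actions.append((docstring, code, embed(docstring)))

    def search(self, thought, k=1):
        """Map a thought to the closest stored actions by docstring similarity."""
        q = embed(thought)
        ranked = sorted(self.actions, key=lambda a: cosine(q, a[2]), reverse=True)
        return [(d, c) for d, c, _ in ranked[:k]]
```

As the agent generates working code it calls `add`; later thoughts resolve to existing actions via `search` instead of regenerating code from scratch.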
“promptfoo helps you tune LLM prompts systematically across many relevant test cases.”
Raw transcript between Simon and GPT4. Raw transcripts are fun as you get to see how other people interact with the LLM, pick up tips, etc.
As for the content, it reinforces my belief that current LLMs are really good sounding boards, but you need to be an expert in the subject matter to use them effectively for creating new things (less true for learning, etc).
Meta — May 10
Multimodal model: combines images, text, depth maps, motion data, etc., and generates between modalities.
- 100k to 1M token windows coming, does that change things? If more context is always better then token budget management will be important regardless of tech progress.
- Is there a theoretical max valuable context for a given prompt? It seems like there is low max valuable context for some queries (“what’s 1+1?”), high max valuable context for others (“what is the history of the world?”), and variable max valuable context (“who am I?”).
Guiding questions / interests:
- LLM Memory / context window selection
- How to create a test suite that ensures desired traits are carried through prompt and model changes.
- How to design an agent inner monologue that identifies user goals and preferences.
Declarative approach to AI and non-AI system composition. Whereas in an imperative system like LangChain you explicitly define program flow, in a declarative system like EVA you define what you want and the system figures out how (I’m assuming).
EVA describes itself as “supporting database applications that operate on both structured (tables, feature vectors) and unstructured data (videos, podcasts, PDFs, etc.) using deep learning models”.
Something to investigate wrt building centralized LLM-based applications.
OP jailbroke Snapchat’s assistant into revealing its system message. It’s what you’d expect, but there are some interesting tidbits:
- “Create a natural, easygoing, back-and-forth flow to the dialogue. Don’t go into a monologue!”
- “You must ALWAYS be extremely concise! 99% of the time, your lines should be a sentence or two. Summarize your responses to be as brief as possible.”
- “Provide fun, harmless, and lighthearted preferences, but never have negative opinions or make adversarial judgements on sensitive topics such as: politics, religions, …”
- A significant portion of the system message is spent ensuring the assistant does not reveal the user’s location unless it’s explicitly included in the system message (inclusion indicates the user has authorized location services).
The meta takeaway from this (and ChatGPT jailbreaks) is that the system message shouldn’t be sensitive or secret information — users will find a way to trick the LLM into revealing any information used to generate the response.
From the article: “they understand both the problem (rote cognitive labor) and the solution (large language models).” Fantastic insight that we can use to pattern match whether LLMs are a good solution for a particular problem.
Understanding the real-world use cases where LLMs are better than traditional approaches is important in cutting through the hype, and it tracks that they’re good at rote cognitive labor based on current successes (e.g. Jasper).
Custom-trained LLM for hardware design. Knows the entire context of your project including component list, connections, and related part datasheets.
Eric Michael Smith et al — April 17, 2020
Authors trained a “dialogue manager” to effectively combine existing single-task models (each with existing benchmarks) into a larger blended model. The dialogue manager routed prompts to the best suited single-task model.
While this doesn’t apply to prompting directly, this paper inspired me to consider a similar approach for improving agent responses by implementing a prompt preprocessor / planner to adjust the downstream system message for both ‘act as’ instructions and context facts.
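A minimal version of that preprocessor idea. The paper trains a dialogue manager to do the routing; here a keyword heuristic stands in for it, and the categories and system messages are illustrative:

```python
# Map routing categories to 'act as' system messages. The trained dialogue
# manager from the paper is replaced by a keyword heuristic for sketching.
SYSTEM_MESSAGES = {
    "empathy": "You are a supportive listener. Acknowledge feelings first.",
    "knowledge": "You are a precise research assistant. Cite what you know.",
    "default": "You are a helpful assistant.",
}

ROUTES = {
    "empathy": ("feel", "sad", "worried", "stressed"),
    "knowledge": ("what is", "explain", "history of", "how does"),
}


def preprocess(prompt):
    """Pick the downstream system message for a prompt."""
    lowered = prompt.lower()
    for category, keywords in ROUTES.items():
        if any(kw in lowered for kw in keywords):
            return SYSTEM_MESSAGES[category]
    return SYSTEM_MESSAGES["default"]
```

The same hook could also inject context facts, not just ‘act as’ instructions, before the prompt reaches the main model.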
As models grow they typically produce better results for the same prompt, except when they don’t. This prize incentivizes finding prompts that trend worse.
A Standard Model of the Mind: Toward a Common Computational Framework across Artificial Intelligence, Cognitive Science, Neuroscience, and Robotics
Laird et al — Winter 2017
Overview of different cognitive architectural models.
Lilian Weng — March 2023
Discusses techniques like self-consistency sampling, chain-of-thought prompting, and external augmentation (e.g. search engines, APIs), ex: Program-Aided Language Models (PAL) and Tool Augmented Language Models (TALM).
Mitchell Hashimoto — April 2023
Founder of HashiCorp explains how to move from ‘try random stuff’ (blind prompting) to a more rigorous method for evolving prompts over time (prompt engineering).
Joon Sung Park et al — April 7, 2023
Uses techniques like observation, planning, and reflection to create autonomous genAI-powered agents. Discusses the design of the memory that powers goal definition and inner monologue.
April 25, 2023
April 23, 2023
Method for temporal (frame-to-frame) consistency: extract keyframes from the original video, assemble them into a grid, run it through SD, split back into keyframes, EbSynth between them, etc. Workflow here: https://www.reddit.com/r/StableDiffusion/comments/11zeb17/tips_for_temporal_stability_while_changing_the/
- Big businesses can be built on using LLMs to solve rote cognitive work (Jasper, Harvey).
- LLMs enable human, emotional connection with software.
- GPT4 is not good enough (or even the right tool) to eliminate human in the loop. This might be a fundamental issue with the technology, not fixed with more parameters / data.
- Context window sizes will increase but never be unlimited. Therefore context selection will always be a problem to solve.
- Prompt injection / exfil is really hard to solve. Design systems that treat this info as public.
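A sketch of what the context-selection problem looks like in practice: greedily pack the highest-relevance snippets until the token budget runs out. Token counting by whitespace here is a crude stand-in for the model’s real tokenizer, and the scoring is assumed to come from an upstream retriever:

```python
def select_context(snippets, budget_tokens):
    """snippets: list of (relevance_score, text). Greedily take the most
    relevant snippets that still fit in the token budget."""
    chosen, used = [], 0
    for score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = len(text.split())  # crude token count; use a real tokenizer
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen
```

Even at a 1M-token window this kind of budgeted selection still matters: more context costs money and latency, and irrelevant context can degrade the answer.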
- https://dl.acm.org/doi/pdf/10.1145/3526113.3545616 — Social Simulacra
- https://books.google.com/books?hl=en&lr=&id=cLofEAAAQBAJ&oi=fnd&pg=PR13&dq=info:-n7r9kLT8vQJ:scholar.google.com&ots=XtJLwxvN_p&sig=n-8cszQzAloPsG_UN1U0xnYGlmY#v=onepage&q&f=false — soar cognitive architecture
- https://scholar.google.com/citations?view_op=view_citation&hl=en&user=ea6cjVUAAAAJ&citation_for_view=ea6cjVUAAAAJ:HhcuHIWmDEUC — standard model of the mind
- Jsonformer — structured output from LLMs
- Interesting comment from YC founder kcorbitt about a similar system where extraction is done one datum at a time.