Claude RAG Brain Injury Analysis
This talk covers building a RAG + LLM system using Claude 100k to analyze medical records and research in order to understand a complex brain injury and its treatment.
On August 4th, my son suffered a severe brain hemorrhage from a rare and undiagnosed brain condition, thrusting me onto the steep learning curves of the medical system, neurosurgery, rehabilitation and other topics. So I built a RAG + LLM system using Claude 100k to help me quickly learn and understand. It works by ingesting his medical records, my log of detailed notes (in the OSX Notes app), and a full index and summaries of relevant research papers.
- Claude 100k
  Anthropic's Claude 100k offers a 100,000-token context window (approx. 75,000 words), enough to analyze hundreds of pages of documents in a single prompt.
  Claude 100k, developed by Anthropic, is a significant leap in large language model capacity, expanding the context window from 9,000 to 100,000 tokens. This upgrade allows the model to process approximately 75,000 words (hundreds of pages) in a single prompt, e.g., an entire novel or a full annual report. The technology enables rapid, deep analysis: users can ingest dense materials like financial statements, legal briefs, or entire codebases and extract synthesized insights in seconds, which can reduce analysis time from hours of human reading to under a minute. It's a game-changer for complex information retrieval and long-form conversational coherence.
- RAG
  RAG (Retrieval-Augmented Generation) is the GenAI framework that grounds LLMs (like GPT-4) on external, verified data, reducing model hallucinations and providing verifiable sources.
  RAG is a critical GenAI architecture: it mitigates the LLM 'hallucination' problem by inserting a retrieval step before generation. A user query is vectorized, then used to query an external knowledge base (e.g., a Pinecone vector database) for relevant document chunks (often segments of a few hundred tokens, e.g., 512). These retrieved facts augment the original prompt, providing the LLM (e.g., Gemini or Llama 3) the specific, current, or proprietary context required. This process helps keep the final response accurate and grounded in domain-specific data, while avoiding the high cost and latency of full model retraining.
- OSX Notes
- Dynamic Megaprompt
  Adaptive, large-scale prompt engineering: it tailors extensive input to instance-specific needs for superior LLM output.
  Dynamic Megaprompting is the next-gen evolution of prompt engineering, combining the depth of a multi-thousand-line mega-prompt with instance-aware adaptability. This technique employs a small, trainable learning network to adjust prompt properties dynamically (position, length, representation). It moves past fixed soft prompts. For example, a static 1500-word instruction set becomes a real-time, optimized context delivery system, boosting model performance and generalization across diverse inputs. We get precision at scale: maximum context, minimum overhead.
- LLM
  Large Language Models (LLMs) are deep learning models, built on the Transformer architecture, that process and generate human-quality text and code at scale.
  LLMs are a class of foundation models: massive, pre-trained neural networks (often with billions to trillions of parameters) that leverage the self-attention mechanism of the Transformer architecture (introduced in 2017) to predict the next token in a sequence. Trained on vast datasets (e.g., Common Crawl, which spans billions of web pages), these models, such as GPT-4, Gemini, and Claude, acquire predictive power over syntax and semantics. They function as general-purpose sequence models, enabling critical applications such as complex content generation, language translation, and automated code completion (e.g., GitHub Copilot). Their core value: generalizing across diverse tasks with minimal task-specific fine-tuning.
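
The retrieval step described in the RAG entry above (vectorize the query, fetch the most relevant chunks, prepend them to the prompt) can be illustrated with a minimal Python sketch. This is not the system from the talk: the bag-of-words "embedding" is a toy stand-in for a real embedding model and vector database, the function names (`embed`, `retrieve`, `build_prompt`) are hypothetical, the sample chunks are invented placeholder text, and the 75,000-word budget is only a rough proxy for a 100,000-token context window.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy 'embedding': a term-count vector (assumption: stands in for a real model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, retrieved, max_words=75_000):
    """Prepend retrieved chunks to the question, stopping before the word budget is exceeded."""
    context = []
    used = len(tokenize(query))
    for chunk in retrieved:
        n = len(tokenize(chunk))
        if used + n > max_words:
            break
        context.append(chunk)
        used += n
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return f"Context:\n{sources}\n\nQuestion: {query}"


# Invented placeholder chunks, one per source type mentioned in the talk abstract.
CHUNKS = [
    "Medical record: patient admitted and monitored overnight.",
    "Personal note: asked the rehabilitation team about the therapy schedule.",
    "Paper summary: outcomes following surgical treatment.",
]

QUESTION = "What did the rehabilitation team say about therapy?"
top = retrieve(QUESTION, CHUNKS, k=1)
prompt = build_prompt(QUESTION, top)
print(prompt)
```

The resulting `prompt` string is what would be sent to the LLM. In a real pipeline the word count would be replaced by the model's own tokenizer, and the linear scan over chunks by a vector index.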