Bichen Wu - Centropy, Inc. | LinkedIn
-
Avi Chawla
Daily Dose of Data Science • 169K followers
What is Function Calling & MCP for LLMs? (explained with visuals and code) Before MCPs became popular, AI workflows relied on traditional Function Calling for tool access. Now, MCP is standardizing it for Agents/LLMs. The visual below explains how Function Calling and MCP work under the hood. Let's learn more! In Function Calling: - The LLM receives a prompt. - The LLM decides the tool. - The programmer implements a procedure to accept a tool call request from the LLM and prepare a function call. The tool call request is found in the LLM's response when you prompt it. - A backend service executes the tool. This Function Calling takes place within our stack: - We host the tool. - We implement the logic to determine which tool to invoke and with what parameters. - We execute it. So Function Calling requires us to wire everything manually. MCP simplifies this! Instead of hard-wiring tools, MCP: - Standardizes defining, hosting, and exposing tools. - Makes it easy to discover tools, understand schemas, and use them. - Demands approval before invoking them. - Detaches implementation from consumption. For instance, whenever you integrate an MCP server, you never write a line of Python code to integrate the tools. Instead, you just integrate the MCP server, and everything beyond this follows a standard protocol handled by the MCP client and the LLM: - They identify the MCP tool. - They prepare the input arguments. - They invoke the tool. - They use the tool's output to generate a response. Everything happens through a standard (but abstracted) protocol. So here's the key point: MCP and Function Calling are not in conflict. They're two sides of the same workflow. - Function Calling helps an LLM decide what it wants to do. - MCP ensures that tools are reliably available, discoverable, and executable, without you needing to custom-integrate everything. For example, an agent might say, "I need to search the web," using function calling. 
That request can be routed through MCP to select from available web search tools, invoke the correct one, and return the result. Check the workflow in the diagram below. I created it while taking inspiration from Femke Plantinga's post on this topic. If you want to understand this better with code, I have linked my article in the comments. ____ If you want to learn AI/ML engineering, I have put together a free PDF (530+ pages) with 150+ core DS/ML lessons. Get here: https://lnkd.in/gi6xKmDc ____ Find me → Avi Chawla Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
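The manual wiring described above can be sketched in a few lines of Python. This is a toy mock, not a real LLM SDK call: the tool registry, the schema, and the mocked response are all illustrative, but the shape of the loop (LLM decides the tool, we resolve and execute it) matches the post.

```python
import json

# We host the tool: a plain function plus a schema the model is shown.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

TOOL_SCHEMAS = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"city": {"type": "string"}},
}]

def handle_llm_response(response: dict) -> str:
    """Find the tool-call request in the LLM's response, prepare the
    function call, and execute it in our own backend."""
    call = response["tool_call"]             # the LLM decided the tool
    fn = TOOLS[call["name"]]                 # we resolve the implementation
    args = json.loads(call["arguments"])     # we prepare the parameters
    return fn(**args)                        # we execute it ourselves

# Mocked LLM response carrying a tool-call request:
mock_response = {"tool_call": {"name": "get_weather",
                               "arguments": '{"city": "Paris"}'}}
print(handle_llm_response(mock_response))  # Sunny in Paris
```

Every piece of this loop lives in our stack, which is exactly the manual wiring MCP standardizes away.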
-
Anshu Agarwal
Converge • 6K followers
Yesterday, after Berkeley SkyDeck Demo Day, I had the privilege of attending a fireside chat with Prof. Ion Stoica (UC Berkeley professor, co-founder of Databricks, Anyscale, LMArena). The conversation, moderated by Chon Tang (co-founder of SkyDeck), touched on one of the most pressing debates in tech today: Should AI development be open source or closed source? Prof. Stoica’s perspective was both clear and thought-provoking: On siloed efforts: Many brilliant researchers are working at the frontier labs, but often in parallel — repeating similar work. From a human capital standpoint, that’s not efficient. On AI’s societal importance: If we believe AI is critical for society, then we must ensure our collective talent is applied in the most effective, responsible way. On collaboration and openness: For researchers to truly collaborate, they need shared artifacts. That means not just open weights, but open data, open algorithms, open evaluations — a full, 360-degree open source approach. It was an inspiring reminder that the way we structure openness in AI will deeply influence how fast — and how responsibly — we advance the field.
-
Avi Chawla
Daily Dose of Data Science • 169K followers
What is Function Calling & MCP for LLMs? (explained with visuals and code) Before MCPs became popular, AI workflows relied on traditional Function Calling for tool access. Now, MCP is standardizing it for Agents/LLMs. The visual below explains how Function Calling and MCP work under the hood. Today, let's learn: - Function calling by building custom tools for Agents. - How MCPs help by building a local MCP client with mcp-use and using tools from Browserbase MCP server. In Function Calling: - The LLM receives a prompt. - The LLM decides the tool. - The programmer implements a procedure to accept a tool call request from the LLM and prepare a function call. The tool call request is found in the LLM's response when you prompt it. - A backend service executes the tool. This Function Calling takes place within our stack: - We host the tool. - We implement the logic to determine which tool to invoke and with what parameters. - We execute it. So Function Calling requires us to wire everything manually. MCP simplifies this! Instead of hard-wiring tools, MCP: - Standardizes defining, hosting, and exposing tools. - Makes it easy to discover tools, understand schemas, and use them. - Demands approval before invoking them. - Detaches implementation from consumption. For instance, whenever you integrate an MCP server, you never write a line of Python code to integrate the tools. Instead, you just integrate the MCP server, and everything beyond this follows a standard protocol handled by the MCP client and the LLM: - They identify the MCP tool. - They prepare the input arguments. - They invoke the tool. - They use the tool's output to generate a response. Everything happens through a standard (but abstracted) protocol. So here's the key point: MCP and Function Calling are not in conflict. They're two sides of the same workflow. - Function Calling helps an LLM decide what it wants to do. 
- MCP ensures that tools are reliably available, discoverable, and executable, without you needing to custom-integrate everything. For example, an agent might say, “I need to search the web,” using function calling. That request can be routed through MCP to select from available web search tools, invoke the correct one, and return the result. Check the workflow in the diagram below. In this setup, to build a local MCP client, I used mcp-use because it lets us connect any LLM to MCP servers & build private MCP clients, unlike Claude/Cursor. - Compatible with Ollama & LangChain - Stream Agent output async - Built-in debugging mode, etc Find the mcp-use GitHub repo in the comments! ____ Find me → Avi Chawla Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
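For intuition, the discover → select → invoke flow can be sketched as a toy client/server pair. To be clear, this is not the real mcp-use or MCP SDK API; the class and method names below are invented for illustration, but they mirror the protocol's shape: the client only ever sees tool names and schemas, never implementations.

```python
class ToyMCPServer:
    """Hosts tools and exposes them through a standard discovery call."""
    def __init__(self):
        self._tools = {
            "web_search": {
                "schema": {"query": "string"},
                "fn": lambda query: f"results for '{query}'",
            }
        }

    def list_tools(self):
        # Discovery: names and input schemas only, no implementation details.
        return {name: t["schema"] for name, t in self._tools.items()}

    def call_tool(self, name, **kwargs):
        return self._tools[name]["fn"](**kwargs)

class ToyMCPClient:
    """Consumes tools without knowing how they are implemented."""
    def __init__(self, server):
        self.server = server

    def run(self, tool_name, **kwargs):
        available = self.server.list_tools()   # identify the MCP tool
        assert tool_name in available, f"unknown tool: {tool_name}"
        return self.server.call_tool(tool_name, **kwargs)  # invoke it

client = ToyMCPClient(ToyMCPServer())
print(client.run("web_search", query="MCP spec"))  # results for 'MCP spec'
```

Swapping in a different server (a different tool implementation) requires no client changes, which is the "detaches implementation from consumption" point above.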
-
Marcelo De Santis
The Ascent • 50K followers
When it comes to autonomous driving, Tesla and Waymo couldn’t be more different in how they approach the problem, and what they believe it will take to solve it. Waymo is all-in on the “hardware-heavy” school of thought: LiDAR, radar, HD maps, redundant compute, and highly curated geo-fenced environments. It’s a masterpiece of engineering, no doubt. And it works, but only in specific cities, under tightly controlled conditions, and with a very high operational cost. Tesla, on the other hand, is betting on vision and scale. No LiDAR. No HD maps. Just neural networks trained on billions of real-world miles, constantly updated from a global fleet of cars. Less hardware. Lower cost. More scale. And most importantly, a direct path to consumer vehicles, not just robotaxi pilots. From a tech perspective, the contrast is clear: • Waymo builds autonomy like it’s building a #NASA mission. • Tesla builds it like it’s scaling a #smartphone feature. I’m leaning toward Tesla’s approach, not because it’s perfect, but because it’s data-rich, less dependent on expensive hardware, and more #economically viable in the long run. Of course, the race isn’t over. #Regulations, edge cases, and safety validation will shape the outcome. But if autonomy is ultimately a software problem at scale, I think Tesla may be closer than we think. HITEC Angeles Investors The Tech Series Porsche AG Pirelli NASA - National Aeronautics and Space Administration General Motors Rivian BYD U.S. Department of Transportation OpenAI
-
Emilio Andere
Wafer • 10K followers
what if you could bring up a complete PyTorch backend for a new chip overnight? Meta recently published TritorX - a system that generates functionally correct Triton kernels for their custom ASIC (MTIA). they get LLMs to write kernels, but optimize for coverage instead of performance. to give a bit of context: enabling a new PyTorch backend is non-trivial. PyTorch supports roughly 3,500+ operators, and new accelerators must implement/lower a large subset of these (often hundreds of kernels) before common models can run with basic functional parity. this can make backend bring-up a multi-month to multi-year engineering effort. TritorX attacks this bottleneck by using a finite-state-machine (FSM) agent instead of the usual free-form tool calling. the LLM iterates through a state machine: generate kernel → lint → compile → test → debug → repeat. they explicitly chose an FSM over a fully agentic design because it's more predictable and debuggable in production. they also don't feed the LLM all available MTIA documentation - just PyTorch docstrings. surprisingly, the model learns hardware quirks (32-byte alignment, scatter store restrictions) autonomously, using only compiler errors and crash dumps. TritorX generated 481 ATen operators at 84.7% coverage, passing 20,000+ OpInfo tests. a full sweep takes about 2 hours on 200 MTIA devices. 80%+ of required operators worked out of the box. they also ran it on simulation for future hardware and hit 73% coverage on chips that don't exist yet! importantly, TritorX isn't validated only on unit tests: Meta decomposes real production models (NanoGPT, DLRM, and large-scale recommendation systems) into their individual operator invocations and feeds these real model inputs directly into the validation loop. 
this not only enables backend bring-up without full end-to-end model support, but also matures previously generated kernels to specific production workloads, refining general operators to reliably support the exact shapes, dtypes, and argument patterns that the real production models exercise. this is an exciting early update for alternative hardware accelerators. more cost efficient bring-up gives more opportunity for alternative accelerators to compete with NVIDIA. paper: https://lnkd.in/eKsgr-uV
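the state machine described above can be sketched as a plain loop. the stage names come from the post; the hook functions, feedback plumbing, and retry budget are my illustrative assumptions, not Meta's implementation.

```python
from enum import Enum, auto

class State(Enum):
    GENERATE = auto(); LINT = auto(); COMPILE = auto()
    TEST = auto(); DEBUG = auto(); DONE = auto(); FAILED = auto()

def fsm_kernel_loop(llm_generate, lint, compile_, run_tests, max_debug=3):
    """Each state runs one deterministic step; its outcome (plus any
    error feedback) picks the next state. No free-form tool calling."""
    state, kernel, feedback, debug_rounds = State.GENERATE, None, None, 0
    while state not in (State.DONE, State.FAILED):
        if state is State.GENERATE:
            kernel = llm_generate(feedback)   # LLM writes/repairs the kernel
            state = State.LINT
        elif state is State.LINT:
            ok, feedback = lint(kernel)
            state = State.COMPILE if ok else State.DEBUG
        elif state is State.COMPILE:
            ok, feedback = compile_(kernel)   # compiler errors become feedback
            state = State.TEST if ok else State.DEBUG
        elif state is State.TEST:
            ok, feedback = run_tests(kernel)  # crash dumps become feedback
            state = State.DONE if ok else State.DEBUG
        elif state is State.DEBUG:
            debug_rounds += 1
            state = State.FAILED if debug_rounds > max_debug else State.GENERATE
    return state, kernel

# toy hooks: the first attempt fails tests, the repaired attempt passes.
attempts = iter(["bad kernel", "good kernel"])
state, kernel = fsm_kernel_loop(
    llm_generate=lambda fb: next(attempts),
    lint=lambda k: (True, None),
    compile_=lambda k: (True, None),
    run_tests=lambda k: (k == "good kernel", "crash dump"),
)
print(state, kernel)
```

the predictability argument is visible here: every transition is enumerable, so a stuck run can be replayed state by state.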
-
Ashu Garg
Foundation Capital • 41K followers
For AI infra startups, NVIDIA’s expanding footprint raises the familiar question: where can startups win? One answer is by pursuing what NVIDIA isn’t optimizing for - open, interoperable infrastructure and domain-specific solutions. Startups like Anyscale, built on open systems like Ray, are betting on this dynamic - prioritizing flexibility, developer control, cross-platform compatibility. Speed-to-deployment is another underappreciated vector. Many enterprises know they need to adopt AI, but lack the internal tooling or expertise to move quickly. Startups that help them get from "AI strategy" to "AI in production" - whether through observability, orchestration, cost optimization, or governance - can create immediate value. And as the ecosystem diversifies, startups that bridge across accelerators, clouds, and deployment environments will be especially well-positioned.
-
Evolving AI
124K followers
💻 Anysphere, the company behind the AI coding assistant Cursor, has raised $900 million in a funding round led by Thrive Capital, tripling its valuation to $9 billion. Andreessen Horowitz and Accel also participated in the round. Cursor has gained popularity among developers for its AI-powered code generation capabilities, reportedly producing nearly a billion lines of code daily through natural language prompts. The tool has achieved $200 million in annual recurring revenue and is used by companies like Stripe, Spotify, and OpenAI. This significant investment reflects the growing trend of AI-driven developer tools, with "vibe coding" — a term popularized by AI researcher and OpenAI cofounder Andrej Karpathy — becoming increasingly prevalent. Vibe coding emphasizes a more intuitive and fluid coding experience, allowing developers to interact with AI assistants to generate and modify code seamlessly. The surge in funding for Anysphere underscores the broader shift in investor focus from foundational AI models to application-level tools that integrate AI into everyday workflows. In 2024 alone, AI application startups raised $8.2 billion, more than double the previous year's total. As AI continues to transform software development, tools like Cursor are at the forefront, offering developers enhanced productivity and a new approach to coding. Want to keep up with AI? 🤖 Follow Evolving AI to stay ahead of your competition (trusted by +3 million followers online) ✉️ Join 50,000+ newsletter readers and stay updated on the latest AI insights: https://lnkd.in/em9B--mb
-
Mark Moyou, PhD
NVIDIA • 18K followers
I just crossed 4 years at NVIDIA. Here are some lessons I have learned about the journey and how to best prep yourself for a company like NVIDIA. The picture shows my PhD advisor, Anthony O. Smith, and PhD labmates Kaleb Smith, Ph.D. and Rana Haber, who also work at NVIDIA, at the GTC 2025 conference. The order from left to right, starting from Kaleb, shows when we got into NVIDIA. So three people from the same lab in one great company, hopefully more. Here are the lessons... Your success began many years in advance -- Dr. Smith had been using NVIDIA GPUs for LiDAR data processing since 2012. He introduced our lab to GPUs back then, and I became aware of parallel computing on GPUs. The person behind you (in chronological order) may open the door in front of you -- Kaleb Smith, Ph.D. graduated from our lab after me, but got to NVIDIA before me and was able to refer me. Do great work even if it is not impactful, so others can build on it -- Kaleb Smith, Ph.D. took some of my early, very slow one-shot object tracking work and accelerated it on GPUs as one of his first PhD projects; this laid the foundation for other great GPU work that eventually led us to NVIDIA. Know your strengths and find a crew of folks that can help improve your weaknesses -- Rana Haber was great at math, Kaleb Smith, Ph.D. at deep learning, myself at neither. I was just persistent and extremely curious. Many dumb questions were asked and answered. Do your own research and trust what you find -- I was a conference volunteer for many years. In 2015, I saw Andrej Karpathy, Ilya Sutskever, and Ian Goodfellow on a small stage, and I knew their deep learning research would lead to something big. I told my advisor I wanted to switch to deep learning, got denied, but kept an eye on it. You don't need a PhD, but it helps -- Having a PhD is a signal to an employer that you can independently solve challenging, novel problems and can learn new domains quickly. Think of it as a risk mitigation strategy for the employer. 
So if you don't have a PhD, show how you can solve hard problems and learn new things independently. A degree is not enough, accumulate career capital -- The book "So Good They Can't Ignore You" fundamentally shifted my trajectory. I went after accumulating as much extra career capital as I could to best position myself for the job market. If someone takes their time to mentor you, listen and implement. Don't be a lazy mentee -- Your success is built on the work of others; how far you go is a reflection of the sacrifice of time and effort of those who came before you. Honor your ancestors and mentors by pushing yourself to be your best. Be the most persistent person in the room; there are very smart people in the world, but persistence closes the gap. Stay technical as long as you can, and master your presentation skills. The fastest way to build your reputation is to create more opportunity for others. Focus on doing your best work today; tomorrow will take care of itself.
-
Sharada Yeluri
Astera Labs • 21K followers
The paper “Challenges and Research Directions for LLM Inference Hardware” by Patterson and Ma frames what the industry is already talking about - inference is becoming constrained less by compute and more by memory bandwidth and capacity. As model sizes push toward 10 trillion parameters, an HBM-only strategy becomes economically and physically untenable. The authors argue for a tiered memory hierarchy and position High Bandwidth Flash (HBF) as a capacity-oriented extension of the memory system rather than as traditional storage. HBF borrows the 3D-stacking model from HBM but applies it to NAND flash. Instead of a discrete SSD controller connected to flash over a narrow interface, HBF integrates logic and NAND dies into a single vertical stack. A logic base die - similar in role to HBM’s base layer - implements the PHY, multi-channel flash controllers, and ECC. Vertical TSVs run through the stack, enabling thousands of parallel data paths and aggregate read bandwidths of ~1.6 TB/s per stack. A single HBF stack can provide ~512 GB of capacity, compared to 48–64 GB per HBM4 stack. The interface uses wide, synchronous, DDR-style signaling to the accelerator - similar to HBM. There are clear trade-offs, though. HBF operates at microsecond-scale latencies. Accesses are page-based (kilobytes), and small random reads quickly degrade effective bandwidth. Write endurance is limited (on the order of 10⁵ cycles), making HBF unsuitable as a general-purpose cache or a write-heavy tier. For this reason, HBM remains critical for KV cache, activations, and other latency-sensitive accesses. Some systems are tiering KV cache - keeping hot KV in HBM while spilling colder data into CPU-attached LPDDR over coherent high-speed interconnects. HBF fits naturally with read-mostly data: model weights and slowly changing retrieval corpora. Weights are large, read-bandwidth-hungry, and not written during inference. 
Increasing per-node weight capacity by an order of magnitude reduces the number of accelerators required to fit a model, directly impacting system scale, power, and network complexity. HBF has shifted from theory to a formal industry roadmap following the 2025 SanDisk and SK Hynix alliance, with standardization underway and sampling slated for late 2026, targeting a full market debut on inference platforms by early 2027. Kioxia is pursuing an incremental path, moving multi-terabyte flash closer to accelerators over PCIe 6.0 with aggressive prefetching. Different implementations, same direction: flash is being pulled into the memory hierarchy! HBF is a serious attempt to address the capacity bottleneck. The open questions are no longer about whether this makes sense, but how to architect systems around it - software scheduling of page-based reads to hide microsecond latencies, balancing between HBM/HBF for different inference workloads, reducing tail latency, etc. Overall, I am excited to see more innovation for the memory wall problem!
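A quick back-of-envelope with the capacity figures above (~512 GB per HBF stack vs. ~64 GB per HBM4 stack) shows why per-node weight capacity changes the accelerator math. The 8-bit-weights assumption is mine, purely for illustration:

```python
def min_stacks(model_params_trillions, bytes_per_param, gb_per_stack):
    """Minimum number of memory stacks needed just to hold the weights."""
    weight_gb = model_params_trillions * 1e12 * bytes_per_param / 1e9
    return int(-(-weight_gb // gb_per_stack))  # ceiling division

# 10T parameters at 1 byte/param (8-bit weights) -> 10,000 GB of weights.
hbm_stacks = min_stacks(10, 1, 64)    # HBM4: ~64 GB per stack
hbf_stacks = min_stacks(10, 1, 512)   # HBF:  ~512 GB per stack
print(hbm_stacks, hbf_stacks)  # 157 20
```

Roughly 8x fewer stacks for the same weights, before even counting the accelerators those stacks must attach to. That is the order-of-magnitude capacity lever the post describes.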
-
Cameron R. Wolfe, Ph.D.
Netflix • 23K followers
Vision Large Language Models (vLLMs) extend text-based LLMs to understand visual modalities like images and video. Here’s exactly how they work... Top-level view. vLLM architectures have two primary components: 1. LLM backbone: standard decoder-only transformer. 2. Vision encoder: usually a CLIP / ViT model (with optional Perceiver Resampler for handling videos). Visual tokens. The vision encoder takes an image (or video) as input and returns a fixed-size set of visual “token” vectors to represent this image as an output. Usually, we create visual tokens by sampling vectors from multiple layers of the CLIP model and concatenating them together, which ensures little perceptual information is lost. Unified embedding. Now, we have a set of text and image (or video) token vectors as input. The first common vLLM architecture simply: - Concatenates the two modalities of vectors together. - Passes these concatenated vectors as input to a decoder-only transformer. The size of the visual token vectors may not match that of the text token vectors, so we linearly project the visual token vectors into the correct dimension. The unified embedding architecture is conceptually simple, but it increases the length of input passed to the LLM, which increases computational costs during training / inference. These visual tokens are passed through every layer of our powerful LLM backbone! Cross-modality attention. Instead of concatenating text and vision token vectors, we just pass text token vectors as input to the LLM. To incorporate vision info, we can add cross-attention modules that perform cross-attention between the text and vision token vectors into select layers of the LLM, usually every second or fourth layer. This architectural variant, which looks similar to the transformer decoder, is called the cross-modality attention architecture. This architecture merges visual info into the LLM using cross-attention (more efficient) instead of increasing input length. 
Additionally, it adds new layers into the model for fusing visual / text info, rather than relying on existing LLM layers. So, we can leave the LLM backbone fixed during training and only train the added layers, ensuring the LLM’s performance on text-only tasks stays the same. Training vLLMs. vLLMs are trained similarly to any other LLM: using next-token prediction. There are two strategies for training a vLLM like this: 1. Natively multi-modal: train the model from scratch using multi-modal data from the beginning and throughout the entire training process. 2. Compositional: first pretrain the vision encoder / LLM backbone separately, then perform a separate training stage to fuse them together. Many examples of natively multi-modal LLMs now exist (e.g., Gemini or GPT-4o). However, natively multi-modal training can be complex, and compositional training has several benefits (e.g., model development can be parallelized, we can leverage existing text-based LLMs, etc.).
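The unified-embedding step (project visual tokens into the text dimension, then concatenate the sequences) can be sketched in plain Python. A real model uses a learned nn.Linear and a transformer backbone; the dimensions and fixed projection values here are purely illustrative:

```python
def matmul(x, w):
    """(n, d_in) @ (d_in, d_out) -> (n, d_out), plain-Python."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]

d_text, d_vision = 4, 6
text_tokens = [[0.1] * d_text for _ in range(5)]      # 5 text tokens
visual_tokens = [[0.2] * d_vision for _ in range(3)]  # 3 visual tokens

# Linear projection (learned in practice; fixed here) maps d_vision -> d_text
# so the two modalities share an embedding dimension.
proj = [[0.5] * d_text for _ in range(d_vision)]

projected = matmul(visual_tokens, proj)  # 3 tokens, now d_text-dimensional
unified = projected + text_tokens        # concatenated sequence: 3 + 5 = 8
print(len(unified), len(unified[0]))  # 8 4
```

The cost trade-off is visible in the shapes: the LLM now attends over 8 tokens instead of 5, and every extra visual token flows through every backbone layer.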
-
Zuhayeer Musa
Levels.fyi • 60K followers
Stock vesting schedules have been morphing, as we all know. And Waymo just slipped a new experiment into the mix. Two recent Waymo offers shared on Levels.fyi tell the story: - An L3 software-engineer package that vests RSUs over two years, 50 percent each year. - An L5 package that keeps the classic four-year, evenly split design. That contrast hints at a deliberate split strategy: shorter horizons for early-career hires, longer runways for senior talent. The logic tracks. New grads often treat their first role as a springboard, so a two-year vest ensures most of their grant actually pays out before they move on. For seasoned engineers, who hold bigger grants and deeper domain knowledge, Waymo still wants the retention pull of a four-year schedule. Waymo isn’t alone. Across tech, equity timelines are turning into a tuning knob: - Lyft issues single-year new-hire grants, then refreshes annually. - Uber front-loads at 35 / 30 / 20 / 15. - NVIDIA moved to 40 / 30 / 20 / 10. - Pinterest compresses to roughly 50 / 33 / 17 over three years. - A handful of high-growth startups grant two-year blocks by default. What’s driving the shift? 1. Pay-for-performance. Shorter cycles let companies re-price grants and reward top performers without being locked into four-year promises. 2. Dilution math. Fewer unvested shares lower equity overhang, especially valuable when stock prices spike. 3. Flexibility. Refreshers become the main retention lever, not year-three and year-four vest cliffs. 4. Talent liquidity. For early-career hires, faster access to equity can feel more tangible than a promise that’s years away. If you’re negotiating, remember that earlier liquidity doesn’t always mean bigger upside. Always compare total grant size, ask how refreshers are determined, and run the tax math on concentrated vesting events. Have you seen a two-year or front-loaded schedule in your offer recently? 
Comment below with your thoughts and I’ll send over a vesting report we put together here at Levels.fyi. #vestingschedule
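The schedules above are easy to compare with a few lines of Python; the $400k grant size is illustrative, and taxes/refreshers are ignored:

```python
def vested_by_year(total_grant, schedule):
    """Cumulative vested value at the end of each year for a
    fractional vesting schedule (fractions must sum to 1)."""
    assert abs(sum(schedule) - 1.0) < 1e-9
    out, acc = [], 0.0
    for frac in schedule:
        acc += total_grant * frac
        out.append(round(acc))
    return out

grant = 400_000  # illustrative grant value
even = vested_by_year(grant, [0.25] * 4)                 # classic 4-year even split
front = vested_by_year(grant, [0.35, 0.30, 0.20, 0.15])  # Uber-style front-load
two_year = vested_by_year(grant, [0.50, 0.50])           # Waymo L3-style 2-year
print(even, front, two_year)
```

Note what this makes concrete: after two years, the even schedule has delivered $200k, the front-loaded one $260k, and the two-year schedule the full $400k. That is exactly why comparing total grant size and refresh policy matters more than headline numbers.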
-
George Z. Lin
4K followers
Recent research by UIUC and Intel Labs has introduced a new jailbreak technique for Large Language Models (LLMs) known as InfoFlood. This method takes advantage of a vulnerability termed "Information Overload," where excessive linguistic complexity can circumvent safety mechanisms without the need for traditional adversarial prefixes or suffixes. InfoFlood operates through a three-stage process: Linguistic Saturation, Rejection Analysis, and Saturation Refinement. Initially, it reformulates potentially harmful queries into more complex structures. If the first attempt does not succeed, the system analyzes the response to iteratively refine the query until a successful jailbreak is achieved. Empirical validation across four notable LLMs—GPT-4o, GPT-3.5-turbo, Gemini 2.0, and LLaMA 3.1—indicates that InfoFlood significantly surpasses existing methods, achieving success rates up to three times higher on various benchmarks. The study underscores significant vulnerabilities in current AI safety measures, as widely used defenses, such as OpenAI’s Moderation API, proved ineffective against InfoFlood attacks. This situation raises important concerns regarding the robustness of AI alignment systems and highlights the necessity for more resilient safety interventions. As LLMs become increasingly integrated into diverse applications, addressing these vulnerabilities is crucial for ensuring the responsible deployment of AI technologies and enhancing their safety against emerging adversarial techniques. Arxiv: https://lnkd.in/eBty6G7z
-
Simon Lancaster 🇺🇸🇨🇦🇵🇹
University of Waterloo • 34K followers
🚀 When I wrote that “the LLM era is peaking, the SLM era is just getting started,” it was a directional bet. Now, 𝗡𝗩𝗜𝗗𝗜𝗔 has doubled down with another paper: the future of AI isn’t about SLM vs LLM, it’s about the two working side by side. 💡 Key takeaways: • 𝗛𝘆𝗯𝗿𝗶𝗱 𝗯𝗲𝗮𝘁𝘀 𝗺𝗼𝗻𝗼𝗹𝗶𝘁𝗵𝗶𝗰: Hybrid systems have the edge when it comes to cost efficiency, stability, and maintainability. • 𝗦𝗟𝗠𝘀 𝗮𝗿𝗲 𝗱𝗶𝘀𝘁𝗶𝗹𝗹𝗲𝗱 𝗟𝗟𝗠𝘀: We haven’t seen many natively designed SLMs yet. Once those appear, efficiency and reliability could jump another order of magnitude. • 𝗦𝗺𝗮𝗹𝗹𝗲𝗿 𝗺𝗼𝗱𝗲𝗹𝘀 𝗵𝗮𝗹𝗹𝘂𝗰𝗶𝗻𝗮𝘁𝗲 𝗹𝗲𝘀𝘀: Narrower scope means tighter alignment, perfect for structured agent tasks. • 𝗔𝗱𝗼𝗽𝘁𝗶𝗼𝗻 𝗺𝗶𝗴𝗵𝘁 𝗯𝗲 𝘀𝗹𝗼𝘄: Billions already sunk into LLM infra mean inertia is real. 🏭 The insight for founders and operators: The next edge in AI won’t come from running ever-bigger models in the cloud, it will come from designing modular systems that match model size to task complexity. Use the right tool for the right job. 🤔 So where do you see this going—toward a fusion of SLMs and LLMs, or just another round of monster-sized models? 📖 NVIDIA’s new report is here: https://lnkd.in/gvVWJ-uT
-
Armin Parchami
Snorkel AI • 26K followers
NVIDIA Research just published an interesting paper stating that SLMs aren't just "budget alternatives" to LLMs; they're often superior for real-world applications. Their analysis of popular agents like MetaGPT, Open Operator, and Cradle shows that 40-70% of current LLM calls could be effectively replaced by specialized SLMs. Meanwhile, Andrej Karpathy envisions an even more radical future: a small "cognitive core" model that runs locally by default, with always-on capabilities, multimodal input/output, and the ability to dial up reasoning power when needed while delegating complex tasks to cloud-based oracles. The convergence is clear: we're moving from monolithic, cloud-dependent LLMs toward modular, efficient architectures. SLMs offer 10-30x better cost efficiency, lower latency, and can be fine-tuned faster and at a reasonable cost. For agentic systems, repetitive and specialized tasks are better handled with careful context engineering and specialized SLMs. The future isn't about choosing between big and small models; it's about intelligent orchestration of both. Paper: https://lnkd.in/gZAgH4Y6 Andrej Karpathy's X post: https://lnkd.in/g23777Mc
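The "intelligent orchestration" idea can be sketched as a simple router that sends easy calls to an SLM and falls back to an LLM otherwise. The heuristic and relative costs below are my illustrative assumptions, not figures from the paper:

```python
SLM_COST, LLM_COST = 1, 20  # relative cost per call (illustrative)

def is_simple(task: str) -> bool:
    # Toy heuristic: short, template-like requests go to the SLM.
    return len(task.split()) < 12 and "reason step by step" not in task

def route(task: str) -> str:
    return "slm" if is_simple(task) else "llm"

calls = [
    "extract the date from this email",
    "classify sentiment: 'great product'",
    "reason step by step about this multi-constraint travel plan ...",
]
routed = [route(c) for c in calls]
cost = sum(SLM_COST if r == "slm" else LLM_COST for r in routed)
print(routed, cost)  # ['slm', 'slm', 'llm'] 22
```

With all three calls on the LLM this toy workload would cost 60; routing drops it to 22, which is the kind of saving the 40-70% replacement figure implies.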
-
Ameer Haj Ali, PhD
Stealth Startup • 8K followers
🎯 Just released my conversation with Ion Stoica - the legendary Berkeley professor and co-founder of $62B Databricks and Anyscale. After working with Ion as my PhD advisor, manager, and company advisor, I finally got him to share the patterns behind his success and what's coming next in AI. Key insights that stood out: 🔸 "Execution is everything" - Even the best ideas fail without proper execution. The only way to know if your idea is good is to execute it well first. 🔸 The card game analogy - "You're dealt cards you can't change. Everyone gets unlucky sometimes. Focus on playing your hand optimally vs complaining about luck." This mindset shift changed how I approach every challenge. 🔸 Why China might win the AI race - They have the talent, data, and increasingly the infrastructure. Plus better collaboration between academia and industry. 🔸 The next big bet - Vertical integration across the AI stack. Just as a Formula 1 car optimizes everything together, AI systems will need tight integration from hardware to application. 🔸 Building reliable AI - "There's no silver bullet. You need precise specifications and verification - just like good management." 🔸 How AI wrappers can differentiate - "It's like early internet days - everyone's building apps. Winners will be determined by business model innovation, not just tech. Find better alignment between your costs and customer value." 🔸 Building a $62B company - Databricks succeeded by: betting on cloud (versus on-premise), focusing on data scientists when few existed, execution, and timing major secular trends perfectly. For young AI founders: Ion's advice is to bet on vertical integration, reliability, and business model innovation. What resonated most with you? Timestamps in comments ⬇️ Full conversation: https://lnkd.in/gK3fQRgb #AI #Startups #TechLeadership #ArtificialIntelligence #Berkeley #Databricks
-
Andrew Feldman
Cerebras Systems • 40K followers
Ok, let's see just how cool blisteringly fast code generation can be. Here is a very long prompt borrowed from Daniel Kim. It's important to see the prompt, so I put it below. Sit back and marvel at just how fast Cerebras Systems inference is. Prompt: a Python script that uses Pygame to simulate a single red ball bouncing inside a rotating regular hexagon, all in SI units with a conversion of 100 pixels = 1 meter. The window should be 800×600 pixels at 60 FPS, with a black background. The hexagon is centered on the window, has a radius of 250 pixels (2.5 m), and rotates clockwise at 60 degrees per second. The hexagon does not respond to gravity, but the ball does. Use a gravitational acceleration of 9.81 m/s², and give the ball a restitution of 0.92 so it bounces. The ball should be 10 pixels in radius (0.1 m), colored red, and start near the hexagon’s center but offset upwards by 1 meter. It should have an initial horizontal velocity of 1 m/s (i.e., 100 pixels/s). To prevent tunneling through the hexagon edges, implement multiple substeps (e.g., 5) each frame and carefully resolve collisions by pushing the ball out of the polygon when penetration occurs, then reflecting its velocity. Render the hexagon’s edges in white with a line width of 3, draw the red ball as a circle, and label the window caption as “Red Ball in a Rotating Hexagon (SI Units).” End the script cleanly when the user closes the window.
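For reference, the physics core of that prompt boils down to two small pieces: computing the rotated hexagon's vertices and reflecting the ball's velocity off an edge with restitution. Here is a minimal sketch of just those pieces (rendering, substepping, and push-out omitted; one possible decomposition, not the generated script):

```python
import math

def reflect(vel, normal, restitution=0.92):
    """Reflect a 2D velocity off a surface with unit normal `normal`,
    scaling the normal component by the restitution coefficient."""
    vn = vel[0] * normal[0] + vel[1] * normal[1]  # normal component of velocity
    if vn >= 0:  # already moving away from the wall; no bounce
        return vel
    scale = (1 + restitution) * vn
    return (vel[0] - scale * normal[0], vel[1] - scale * normal[1])

def hexagon_vertices(cx, cy, radius, angle):
    """Vertices of a regular hexagon centered at (cx, cy), rotated by
    `angle` radians (6 points, 60 degrees apart)."""
    return [(cx + radius * math.cos(angle + i * math.pi / 3),
             cy + radius * math.sin(angle + i * math.pi / 3))
            for i in range(6)]

# Ball moving straight down into a floor whose normal points up:
v = reflect((1.0, -2.0), (0.0, 1.0))
print(v)  # tangential part kept; normal part flipped and damped by 0.92
```

Each frame, the full script would recompute the vertices at the current rotation angle, detect penetration against each edge, push the ball out, then apply `reflect` with the edge normal.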