Tokenizers in Language Models

This post is divided into five parts; they are: • Naive Tokenization • Stemming and Lemmatization • Byte-Pair Encoding (BPE) • WordPiece • SentencePiece and Unigram The simplest form of tokenization splits text into tokens based on whitespace.

Digital Marketing Courses to Sell Digital Marketing Courses

There’s a strange loop taking over social media right now. Scroll through TikTok, YouTube Live, or Instagram, and you’ll see a parade of “digital marketing experts” promoting their latest PDF guide, online course, or coaching program. What’s it about? Digital marketing. But not the kind that helps actual businesses improve performance, it’s a course on …

Interleaved Reasoning for Large Language Models via Reinforcement Learning

Long chain-of-thought (CoT) significantly enhances large language models’ (LLM) reasoning capabilities. However, the extensive reasoning traces lead to inefficiencies and an increased time-to-first-token (TTFT). We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions. We observe that models inherently possess the ability …

multi agent 1

Part 3: Building an AI-powered assistant for investment research with multi-agent collaboration in Amazon Bedrock and Amazon Bedrock Data Automation

In the financial services industry, analysts need to switch between structured data (such as time-series pricing information), unstructured text (such as SEC filings and analyst reports), and audio/visual content (earnings calls and presentations). Each format requires different analytical approaches and specialized tools, creating workflow inefficiencies. Add on top of this the intense time pressure resulting …