---
date: '2025-10-02'
description: an introduction to inference
id: '7'
modified: 2026-06-07 01:18:28 GMT-04:00
seealso:
  - '[[thoughts/Transformers|Transformers]]'
  - '[[thoughts/LLMs|LLMs]]'
  - '[[thoughts/vllm|vLLM]]'
socials:
  link: https://tsfm.ca/lecture-seven
tags:
  - ml
  - tsfm
title: lecture seven
created: '2025-10-02'
published: '2025-10-02'
pageLayout: default
slug: thoughts/tsfm/7
permalink: https://aarnphm.xyz/thoughts/tsfm/7.md
generator:
  quartz: v4.6.0
  hostedProvider: Cloudflare
  baseUrl: aarnphm.xyz
full: https://aarnphm.xyz/llms-full.txt
---
![[thoughts/Autoregressive models]]

![[thoughts/LLMs#inference]]

## arithmetic intensity

$$
\text{AI} = \frac{\text{FLOPs}}{\text{Bytes}}
$$

or, equivalently, “operations per byte moved”: how much compute a kernel performs for each byte it reads from or writes to memory
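
a minimal sketch of the ratio for a single matrix multiply, to make the definition concrete. The function name `matmul_arithmetic_intensity` and the shapes are hypothetical, picked only for illustration. It assumes an idealized memory model where each operand is read once and the output is written once, with 2 FLOPs per multiply-add and 2 bytes per element (fp16/bf16):

```python
def matmul_arithmetic_intensity(
    m: int, k: int, n: int, bytes_per_element: int = 2
) -> float:
    """Arithmetic intensity (FLOPs per byte) of an (m, k) @ (k, n) matmul.

    Assumptions (idealized, not measured):
      - one multiply and one add per inner-product term, so 2 * m * k * n FLOPs
      - each input matrix is read once and the output is written once
    """
    flops = 2 * m * k * n
    bytes_moved = (m * k + k * n + m * n) * bytes_per_element
    return flops / bytes_moved


# Illustrative shapes: a 4096 x 4096 weight matrix applied to
# 1 row of activations versus 512 rows of activations.
single_row_ai = matmul_arithmetic_intensity(m=1, k=4096, n=4096)
many_rows_ai = matmul_arithmetic_intensity(m=512, k=4096, n=4096)

print(f"m=1:   {single_row_ai:.2f} FLOPs/byte")
print(f"m=512: {many_rows_ai:.2f} FLOPs/byte")
```

under these assumptions the single-row case lands just below 1 FLOP per byte, while the 512-row case reaches roughly 410, because the same weight bytes are reused across many more rows.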

![[thoughts/quantization#floating point]]

