查询执行流程
SQL 执行的生命周期
SQL 文本
↓
1. 词法/语法解析 → AST(抽象语法树)
↓
2. 语义分析 → 查询树
↓
3. 重写规则 → 展开视图、子查询
↓
4. 查询规划 → 选择最优执行计划
↓
5. 执行器 → 按计划执行
↓
结果集1. 词法/语法解析
把 SQL 文本解析成"解析树":
sql
SELECT id, name FROM users WHERE age > 18;解析后的 AST 结构:
SELECT
├── 目标列表
│ ├── id
│ └── name
├── FROM 子句
│ └── users
└── WHERE 子句
└── age > 18bash
# 可以查看解析树
SHOW debug_print_parse;
# 需要开启详细日志才能看到2. 语义分析
检查语义并生成查询树:
- 表是否存在
- 列是否存在
- 数据类型是否匹配
- 权限检查
- 绑定变量类型检查
sql
-- 生成查询树,内部表示
EXPLAIN (FORMAT YAML) SELECT * FROM users WHERE id = 1;
# 输出格式示例:
# ...
# Node Type: Seq Scan
# Relation Name: users
# Filter: (id = 1)3. 重写规则
将规则应用到查询树上:
视图展开
sql
CREATE VIEW active_users AS
SELECT id, name FROM users WHERE status = 'active';
SELECT * FROM active_users;
-- 规则重写后,相当于执行:
SELECT id, name FROM users WHERE status = 'active';规则系统
sql
-- 查看表的规则
SELECT rulename, definition
FROM pg_rules
WHERE tablename = 'users';
-- 创建规则
CREATE RULE users_update AS
ON UPDATE TO users
DO ALSO
INSERT INTO audit_log (table_name, action, old_data, new_data)
VALUES ('users', 'UPDATE', OLD.*, NEW.*);4. 查询规划与优化
这是最复杂的部分,PostgreSQL 使用成本模型选择最优计划。
成本模型
sql
-- 成本常量
show seq_page_cost; -- 顺序扫描一个页: 1.0
show random_page_cost; -- 随机扫描一个页: 4.0
show cpu_tuple_cost; -- 处理一行: 0.01
show cpu_index_scan_cost; -- 索引扫描一行: 0.005
show cpu_operator_cost; -- 执行一个操作符: 0.0025统计信息
规划器依赖统计信息估算行数:
sql
-- 查看表统计信息
SELECT * FROM pg_stats WHERE tablename = 'users';
-- 常用统计字段:
-- null_frac: NULL 比例
-- n_distinct: 不同值数量
-- most_common_vals: 最常见值
-- most_common_freqs: 最常见值频率
-- histogram_bounds: 直方图边界
-- 更新统计信息
ANALYZE users;
-- 自动统计信息收集
-- autovacuum_enabled = on代价估算示例
sql
EXPLAIN SELECT * FROM users WHERE age > 30;
-- Seq Scan on users (cost=0.00..100.00 rows=1000 width=50)
-- ↑ ↑ ↑ ↑
-- 启动成本 总成本 估算行数 行宽度决定扫描方式
sql
-- 小表:顺序扫描更快(避免随机读)
EXPLAIN SELECT * FROM users WHERE id = 1; -- 可能选 Seq Scan
-- 大表有索引:索引扫描
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com'; -- Index Scan
-- 范围查询
EXPLAIN SELECT * FROM orders WHERE created_at > '2024-01-01'; -- 可能选 Bitmap ScanJoin 方式选择
sql
-- Nested Loop: 小表驱动大表,有索引
EXPLAIN SELECT * FROM users u, orders o WHERE u.id = o.user_id;
-- Hash Join: 大表连接,无索引
EXPLAIN SELECT * FROM users u JOIN orders o ON u.id = o.user_id
WHERE o.amount > 1000;
-- Merge Join: 已排序的数据
EXPLAIN SELECT * FROM users ORDER BY id JOIN orders ORDER BY user_id
ON u.id = o.user_id;代价计算公式
总代价 = 启动代价 + (读取页数 × 页代价) + (处理行数 × 行代价) + (索引扫描额外代价)
例如:顺序扫描 1000 行,100 个 8KB 页
总代价 = 0 + 100 × 1.0 + 1000 × 0.01 = 111强制/禁用特定计划
sql
-- 禁用顺序扫描
SET enable_seqscan = off;
-- 禁用嵌套循环
SET enable_nestloop = off;
-- 禁用哈希连接
SET enable_hashjoin = off;
-- 只用于调试,正常不要用5. 执行器
执行器按照计划树执行,使用"火山模型"(Iterator Model)。
火山模型
sql
-- SELECT * FROM users WHERE age > 18;
执行器计划树:
Aggregate
↑
Filter (age > 18)
↑
Seq Scan (users)- 每个节点有
Init()、Exec()、End() - 数据从叶子节点向上流动
- 一次一元组(one tuple at a time)
执行阶段详解
sql
-- 简单查询
SELECT name FROM users WHERE age > 20 ORDER BY name;
-- 完整执行流程:
1. Seq Scan 读取每一行
2. Filter 检查 age > 20
3. Sort 排序
4. Result 返回并行查询
sql
-- 查看是否支持并行
SHOW max_worker_processes; -- 最大进程数
SHOW max_parallel_workers_per_gather; -- 每个节点并行数
-- 强制并行度
SET max_parallel_workers_per_gather = 4;
-- 禁用并行
SET max_parallel_workers_per_gather = 0;sql
EXPLAIN SELECT * FROM users, orders WHERE users.id = orders.user_id;
-- Parallel Seq Scan on users
-- -> Parallel Hash Join
-- -> Parallel Seq Scan on orders缓存层
PostgreSQL 缓存结构
应用层
↓
查询解析/规划(每个会话独立)
↓
PostgreSQL 共享缓冲区 (shared_buffers)
↓
操作系统缓存(OS Cache)
↓
磁盘页面读取流程
sql
-- 查询数据页
SELECT * FROM users WHERE id = 1;
-- 执行器检查 shared_buffers
-- 命中 → 直接读取
-- 未命中 → 从 OS Cache 读取
-- OS Cache 也未命中 → 从磁盘读取
-- 放入 shared_buffers缓存命中率
sql
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) /
nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as hit_ratio
FROM pg_statio_user_tables
WHERE tablename = 'users';常见执行计划类型
顺序扫描 (Seq Scan)
sql
EXPLAIN SELECT * FROM users WHERE status = 'inactive';
-- Seq Scan on users (cost=0.00..100.00 rows=500 width=50)
-- Filter: (status = 'inactive')索引扫描 (Index Scan)
sql
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- Index Scan on idx_users_email (cost=0.00..8.01 rows=1 width=50)
-- Index Cond: ((email)::text = 'test@example.com'::text)位图扫描 (Bitmap Scan)
sql
EXPLAIN SELECT * FROM users WHERE age > 30;
-- Bitmap Heap Scan on users (cost=4.00..10.00 rows=5 width=50)
-- Recheck Cond: (age > 30)
-- -> Bitmap Index Scan on idx_users_age (cost=0.00..4.00 rows=5 width=0)索引-only扫描 (Index Only Scan)
sql
EXPLAIN SELECT email FROM users WHERE email LIKE 'a%';
-- Index Only Scan using idx_users_email on users
-- Index Cond: ((email)::text >= 'a'::text AND (email)::text < 'b'::text)
-- Heap Fetches: 0 -- 不用读表,只需读索引嵌套循环 (Nested Loop)
sql
EXPLAIN SELECT * FROM users u, orders o WHERE u.id = o.user_id;
-- Nested Loop (cost=0.00..100.00 rows=1000 width=50)
-- -> Seq Scan on users (cost=0.00..10.00 rows=100 width=50)
-- -> Index Scan using idx_orders_user_id on orders
-- Index Cond: (user_id = u.id)哈希连接 (Hash Join)
sql
EXPLAIN SELECT * FROM users u JOIN orders o ON u.id = o.user_id;
-- Hash Join (cost=0.00..100.00 rows=1000 width=50)
-- Hash Cond: (u.id = o.user_id)
-- -> Seq Scan on users (cost=0.00..10.00 rows=100 width=50)
-- -> Hash (cost=10.00..10.00 rows=1000 width=50)
-- -> Seq Scan on orders (cost=0.00..10.00 rows=1000 width=50)合并连接 (Merge Join)
sql
EXPLAIN SELECT * FROM users u JOIN orders o ON u.id = o.user_id
ORDER BY u.id;
-- Merge Join (cost=0.00..100.00 rows=1000 width=50)
-- Merge Cond: (u.id = o.user_id)
-- -> Sort (cost=0.00..10.00 rows=100 width=50)
-- Sort Key: u.id
-- -> Seq Scan on users
-- -> Sort (cost=0.00..10.00 rows=1000 width=50)
-- Sort Key: o.user_id
-- -> Seq Scan on orders性能分析
EXPLAIN 详解
sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM users WHERE email = 'test@example.com';关键输出:
actual time: 实际执行时间rows: 实际返回行数loops: 执行次数Buffers: 缓存命中情况
慢查询分析
sql
-- 方法1:pg_stat_statements
CREATE EXTENSION pg_stat_statements;
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- 方法2:日志
ALTER SYSTEM SET log_min_duration_statement = 1000;
SELECT pg_reload_conf();常用优化手段
sql
-- 1. 确保统计信息准确
ANALYZE users;
-- 2. 查看是否用索引
EXPLAIN SELECT * FROM users WHERE email = 'x';
-- 3. 添加合适索引
CREATE INDEX idx_users_email ON users(email);
-- 4. 减少返回数据量
SELECT id, name FROM users; -- 比 SELECT * 快
-- 5. 使用批量插入
INSERT INTO users VALUES (1,'a'), (2,'b'), (3,'c');
-- 6. 预计算/缓存
CREATE MATERIALIZED VIEW monthly_stats AS ...