awk is one of those tools that feels cryptic until the moment it clicks. Then you reach for it constantly: extracting a column from logs, summing a CSV field, filtering rows that match a condition, counting unique values. It lives on every Unix system and needs no install.
This guide covers the patterns you'll actually use — not awk programming, just the 20 one-liners that solve 90% of real problems.
awk splits each line on whitespace (or your chosen delimiter) into fields: $1 is the first, $2 the second, $NF the last. $0 is the whole line.
awk '{print $2}' file.txt
No flag needed — whitespace delimited by default. Use $1 for the first column, $NF for the last. Works on ls -l, log files, anything space-separated.
awk '{print $1, $NF}' file.txt
NF is the count of fields on the current line, so $NF is always the last field regardless of how many columns there are.
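To make the field variables concrete, here is a small sketch run against made-up input (the names and values are hypothetical). The two rows have different column counts, yet $NF still picks the last field of each.

```shell
# Hypothetical sample input: rows with differing numbers of columns
data='alice 30 nyc
bob 25 sf admin'

# Print the field count, the first field, and the last field of each line
result=$(printf '%s\n' "$data" | awk '{print NF, $1, $NF}')
printf '%s\n' "$result"
# 3 alice nyc
# 4 bob admin
```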
awk -F',' '{print $1, $3}' data.csv
-F',' sets the field separator to comma. Use -F'\t' for TSV, -F':' for /etc/passwd-style files.
awk -F',' 'BEGIN{OFS="\t"} {print $1, $2, $3}' data.csv
OFS (output field separator) controls what awk puts between fields in a print statement. Set it in BEGIN so it applies to every row. Here: read CSV, write TSV.
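A short sketch of the CSV-to-TSV conversion described above, using invented data. It also shows why the commas inside print matter: they are what tells awk to insert OFS.

```shell
# Hypothetical CSV input
data='id,name,team
1,ana,core
2,raj,infra'

# Comma between print arguments: awk joins them with OFS (a tab here)
tsv=$(printf '%s\n' "$data" | awk -F',' 'BEGIN{OFS="\t"} {print $1, $2, $3}')
printf '%s\n' "$tsv"

# No comma: the fields are concatenated and OFS is not used
glued=$(printf '%s\n' "$data" | awk -F',' 'BEGIN{OFS="\t"} NR==1 {print $1 $2}')
printf '%s\n' "$glued"
# idname
```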
awk -F',' '{print $3, $1, $2}' data.csv
Handy when you need to reorder columns for a different tool — no need to load a spreadsheet or write a Python script.
awk '/error/' file.log
The pattern between the slashes is a regex matched against the whole line. Like grep 'error', but you can follow the pattern with an action block, for example /error/ {print $1}.
awk '!/error/' file.log
Prefix with ! to negate. Equivalent to grep -v 'error', but again composable with field operations.
awk '$3 == "prod" {print}' servers.txt
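Tests only the third field, and only for an exact match. Use ~ for a regex match: $3 ~ /prod/ matches any value containing "prod", so "prod" and "production" but also "nonprod". Anchor it as $3 ~ /^prod/ to require the prefix. Use !~ to negate.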
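The difference between an exact comparison and a regex match on a field is easiest to see side by side. This sketch uses an invented server list; the hostnames and environment labels are assumptions for illustration.

```shell
# Hypothetical server list: name, region, environment
data='web1 us-east prod
web2 us-east production
web3 eu-west nonprod
web4 eu-west staging'

# Exact string comparison on field 3
exact=$(printf '%s\n' "$data" | awk '$3 == "prod" {print $1}')
# Unanchored regex: matches "prod" anywhere in field 3
loose=$(printf '%s\n' "$data" | awk '$3 ~ /prod/ {print $1}')
# Anchored regex: field 3 must start with "prod"
anchored=$(printf '%s\n' "$data" | awk '$3 ~ /^prod/ {print $1}')

printf 'exact:\n%s\n' "$exact"        # web1
printf 'loose:\n%s\n' "$loose"        # web1, web2, web3
printf 'anchored:\n%s\n' "$anchored"  # web1, web2
```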
awk '$5 > 1000 {print $1, $5}' data.txt
awk does arithmetic natively. This prints fields 1 and 5 (say, a name and a value) for any row where the 5th field is over 1000. Useful for log analysis ("show me requests slower than 1000ms").
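One thing to watch with numeric filters is a header row: a non-numeric field is compared as a string rather than a number, so it may slip through the test. A possible guard, sketched here on invented request data, is to force the field to a number with +0 (a header then counts as 0).

```shell
# Hypothetical request log with a header row; field 5 is latency in ms
data='name method path status ms
req1 GET /a 200 350
req2 GET /b 200 1520
req3 POST /c 500 2100'

# $5+0 forces a numeric comparison; the header's "ms" becomes 0 and is dropped
slow=$(printf '%s\n' "$data" | awk '$5+0 > 1000 {print $1, $5}')
printf '%s\n' "$slow"
# req2 1520
# req3 2100
```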
awk '/START/,/END/' file.txt
The comma creates a range: awk prints from a line matching START through the next line matching END, inclusive, and starts a new range at the next START. If END never appears, the range runs to the end of the file. Useful for extracting a section from a log or config file.
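Here is a small sketch of the range pattern on made-up input with two START/END blocks, followed by one common way to print only the lines between the markers. The flag variable name on is an arbitrary choice.

```shell
# Hypothetical input containing two marked sections
data='a
START
b
END
c
START
d
END
e'

# Range pattern: both sections are printed, markers included
with_markers=$(printf '%s\n' "$data" | awk '/START/,/END/')
printf '%s\n' "$with_markers"

# Flag variant: set a flag on START, clear it on END, print while it is set
inner=$(printf '%s\n' "$data" | awk '/START/ {on=1; next} /END/ {on=0} on')
printf '%s\n' "$inner"
# b
# d
```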
awk's END block runs once after all lines are processed — perfect for totals, averages, and counts.
awk '{sum += $1} END {print sum}' numbers.txt
The most common awk one-liner. Use printf "%.2f\n", sum instead of print sum if you want two decimal places.
awk '{sum += $1} END {print sum/NR}' numbers.txt
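In the END block, NR holds the total number of records (lines) read, so dividing by it gives the mean. Blank lines count toward NR, and empty input makes the division fail, so guard with NR > 0 if either can happen.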
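The sum and average one-liners can be checked on a tiny invented data set. The second command is a sketch of a guarded average: it prints nothing for empty input rather than dividing by zero.

```shell
# Hypothetical input: one number per line
nums='10
20
30
40'

total=$(printf '%s\n' "$nums" | awk '{sum += $1} END {print sum}')
echo "$total"
# 100

# Guarded mean with two decimal places
mean=$(printf '%s\n' "$nums" | awk '{sum += $1} END {if (NR > 0) printf "%.2f\n", sum/NR}')
echo "$mean"
# 25.00

# Empty input: the guard skips the division, so nothing is printed
empty=$(awk '{sum += $1} END {if (NR > 0) printf "%.2f\n", sum/NR}' < /dev/null)
```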
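awk '/error/ {count++} END {print count+0}' app.log
Like grep -c 'error' app.log, with one catch: count is unset when nothing matches, so count+0 is what makes awk print 0 instead of an empty line. The advantage over grep is that you can add conditions on fields: $4 == "500" && /timeout/ {count++}.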
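A sketch of conditional counting on an invented log, where field 4 is assumed to be the HTTP status. Printing count+0 ensures a 0 appears when nothing matches.

```shell
# Hypothetical log: date, time, method, status, message
log='2024-01-01 10:00:01 GET 200 ok
2024-01-01 10:00:02 GET 500 upstream timeout
2024-01-01 10:00:03 POST 500 db error
2024-01-01 10:00:04 GET 500 read timeout'

# Count lines that are status 500 AND mention "timeout"
timeouts=$(printf '%s\n' "$log" | awk '$4 == "500" && /timeout/ {count++} END {print count+0}')
echo "$timeouts"
# 2

# No matches: count was never set, and count+0 turns that into 0
panics=$(printf '%s\n' "$log" | awk '/panic/ {count++} END {print count+0}')
echo "$panics"
# 0
```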
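awk 'BEGIN{max=-999999} {if($1+0 > max) max=$1+0} END{print max}' data.txt
$1+0 coerces the field to a number, so both the comparison and the stored value are numeric. A non-numeric header coerces to 0, so skip it with NR>1 if your values can be negative. BEGIN{max=-999999} is an arbitrary floor: lower it if your data can go below that, or drop it if you know the data is positive.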
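If picking a floor value feels fragile, one alternative (a sketch, not the only way) is to seed max from the first record. The example below uses all-negative invented data, which is the case where a badly chosen starting value gives the wrong answer.

```shell
# Hypothetical input: all values are negative
nums='-7
-2
-15'

# Seed max from the first record, then keep the larger value
max=$(printf '%s\n' "$nums" | awk 'NR == 1 || $1+0 > max {max = $1+0} END {print max}')
echo "$max"
# -2

# The same shape works for a minimum by flipping the comparison
min=$(printf '%s\n' "$nums" | awk 'NR == 1 || $1+0 < min {min = $1+0} END {print min}')
echo "$min"
# -15
```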
awk '{count[$1]++} END {for (k in count) print count[k], k}' file.txt | sort -rn
awk's associative arrays make frequency tables trivial. Pipe to sort -rn to rank by count. Replace $1 with any field — HTTP status codes, usernames, error types.
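As an illustration, here is the frequency table run on a few invented access-log lines where field 1 is assumed to be the HTTP status code. The order of for (k in count) is unspecified, which is why the sort step matters.

```shell
# Hypothetical access log: status code, path
log='200 /index
404 /missing
200 /about
500 /api
200 /index
404 /nope'

# Count occurrences of each status, then rank by count (highest first)
ranked=$(printf '%s\n' "$log" | awk '{count[$1]++} END {for (k in count) print count[k], k}' | sort -rn)
printf '%s\n' "$ranked"
# 3 200
# 2 404
# 1 500
```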
awk 'NR>1' file.txt
NR is the current line number (record number). NR>1 skips line 1. Use NR==1 {next} {print} as an explicit alternative; the trailing {print} is needed, because NR==1 {next} on its own prints nothing.
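Both ways of dropping a header line can be compared on a small invented CSV. They should produce identical output.

```shell
# Hypothetical CSV with a header row
data='name,score
ana,90
raj,85'

# Short form: a bare condition prints every line for which it is true
short=$(printf '%s\n' "$data" | awk 'NR>1')

# Explicit form: skip record 1, print everything else
explicit=$(printf '%s\n' "$data" | awk 'NR==1 {next} {print}')

printf '%s\n' "$short"
# ana,90
# raj,85
```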
awk '{print NR, $0}' file.txt
$0 is the entire line. The result is similar to cat -n, but you can add conditions ("print only lines 10-20": NR>=10 && NR<=20).
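A quick sketch that combines line numbering with a line-range condition, using invented input and a smaller range (lines 2 to 4) so the output fits here.

```shell
# Hypothetical five-line input
data='alpha
beta
gamma
delta
epsilon'

# Number the lines, but only print lines 2 through 4
numbered=$(printf '%s\n' "$data" | awk 'NR>=2 && NR<=4 {print NR, $0}')
printf '%s\n' "$numbered"
# 2 beta
# 3 gamma
# 4 delta
```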
awk 'NR%3==0' file.txt
Prints every 3rd line (lines 3, 6, 9 ...). Change the modulus to sample at different rates. Useful for downsampling high-frequency logs.
awk '$2 == "staging" {$2 = "prod"} {print}' servers.txt
Assigning to a field ($2 = "prod") changes it in awk's output; the file itself is not modified. awk then rebuilds the changed line with OFS between fields, so runs of spaces on that line collapse to single spaces. For CSV, pass -F',' and set OFS="," in BEGIN.
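The rebuild behaviour is visible when the input has irregular spacing. In this sketch (invented data), only the line whose field was assigned gets rebuilt with single spaces; the untouched line keeps its original spacing.

```shell
# Hypothetical input with column-aligned spacing
data='web1   staging   us-east
web2   prod      eu-west'

result=$(printf '%s\n' "$data" | awk '$2 == "staging" {$2 = "prod"} {print}')
printf '%s\n' "$result"
# web1 prod us-east
# web2   prod      eu-west
```

If the mixed spacing is a problem, forcing a rebuild on every line (for example with $1=$1) makes the output uniform.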
awk -F',' 'BEGIN{OFS="\t"} {$1=$1; print}' data.csv
The $1=$1 trick forces awk to rebuild $0 from the current fields using OFS — without it, awk prints the original line unchanged. A no-op assignment that has a real side effect.
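To see the effect of $1=$1, compare the same command with and without it. This sketch uses "|" as OFS instead of a tab, purely so the separator is visible in the output.

```shell
# Hypothetical one-line CSV input
row='a,b,c'

# Without the assignment, $0 is never rebuilt, so OFS has no effect
unchanged=$(printf '%s\n' "$row" | awk -F',' 'BEGIN{OFS="|"} {print}')
echo "$unchanged"
# a,b,c

# With $1=$1, awk rebuilds $0 from its fields using OFS
rebuilt=$(printf '%s\n' "$row" | awk -F',' 'BEGIN{OFS="|"} {$1=$1; print}')
echo "$rebuilt"
# a|b|c
```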
awk is a text-processing tool built into every Unix system. Use it when you need to work with structured text — extracting columns, filtering rows by field values, doing math on fields, or reformatting files. For anything involving fields and columns, awk is cleaner than sed.
sed is best for line-level substitutions (find-and-replace across a file). awk is best for column-based work: it automatically splits each line into fields and lets you do arithmetic, conditional logic, and aggregation. Use sed to swap text; use awk to process structured data.
Use the -F flag: awk -F',' for CSV, awk -F'\t' for TSV, awk -F':' for colon-delimited (like /etc/passwd). You can also set it with FS="," inside a BEGIN block.
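The two ways of setting the delimiter can be sketched on a couple of invented /etc/passwd-style lines. Both forms should give the same result.

```shell
# Hypothetical colon-delimited input in the style of /etc/passwd
passwd_like='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

# Delimiter set with the -F flag
with_flag=$(printf '%s\n' "$passwd_like" | awk -F':' '{print $1, $NF}')

# Same delimiter set through FS in a BEGIN block
with_begin=$(printf '%s\n' "$passwd_like" | awk 'BEGIN{FS=":"} {print $1, $NF}')

printf '%s\n' "$with_flag"
# root /bin/bash
# daemon /usr/sbin/nologin
```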
Basic awk with -F',' breaks on quoted fields that contain commas (e.g. "Smith, John"). For properly quoted CSVs, reach for Python's csv module or the miller tool (mlr). For simple CSVs without embedded commas, awk -F',' works perfectly.
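The quoted-field problem is easy to reproduce. In this sketch, an invented row with three logical columns is split into four fields, because awk -F',' knows nothing about quotes.

```shell
# Hypothetical CSV row: three logical columns, one containing a comma
row='1,"Smith, John",NYC'

# awk splits on every comma, including the one inside the quotes
field_count=$(printf '%s\n' "$row" | awk -F',' '{print NF}')
echo "$field_count"
# 4

# Field 2 is only the first half of the quoted name
second=$(printf '%s\n' "$row" | awk -F',' '{print $2}')
echo "$second"
# "Smith
```

Checking NF this way is also a cheap test for whether a file is safe to process with a plain comma delimiter: every row should report the same count.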
These 20 patterns cover most of what you'll reach for awk to do. If you want the broader shell context these fit into, see the 25 bash one-liners guide — awk appears there too, integrated with find, sort, and df. For shortening repetitive commands, the bash aliases guide shows how to wrap your most-used awk patterns into two-keystroke shortcuts.