
Awk Command Explained With Practical Examples

The awk command is a powerful text-processing utility in Unix/Linux that scans input line by line, splits each line into fields, and executes pattern-action rules to filter, transform, and summarize data.

It excels at quick one-liners for parsing logs, CSVs, and system outputs, making it essential for DevOps, SREs, and data wrangling on the command line. If you work on Linux servers or manage code deployments, learning the awk command is one of the highest-leverage skills you can acquire.

This beginner-friendly guide explains awk step by step with practical examples, from simple printing to real-world log analysis. You’ll also learn tips that pros use daily across hosting and cloud environments.


What is the awk Command?

awk is a pattern-driven processing language available on most Unix-like systems (Linux, macOS, BSD).


Named after its creators (Aho, Weinberger, Kernighan), it reads input line by line (records), splits each into fields, and runs actions when patterns match. It’s ideal for extracting columns, filtering rows, computing aggregates, and building quick reports.

Unlike sed or grep, awk understands “columns” natively and supports variables, arrays, conditionals, and functions, giving it enough power to replace small scripts while staying lightweight and fast.
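
As a quick taste of that power, here is a minimal sketch, assuming a hypothetical orders.txt with a product name in column 1 and an amount in column 2, that combines a numeric test with an associative array in a single pass:

# Sum amounts per product, skipping rows with a non-positive amount (hypothetical orders.txt)
awk '$2+0 > 0 { total[$1] += $2 } END { for (p in total) print p, total[p] }' orders.txt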


awk Syntax and Basics

Pattern-Action Structure

At its core, an awk program is a list of pattern { action } rules. awk executes the action only for lines where the pattern matches. Without a pattern, the action runs for all lines; without an action, the default is to print matching lines.

awk 'pattern { action }' file
awk '{ print $0 }' file          # print every line (default action shown explicitly)
awk '/error/' file               # print lines matching regex "error"

Fields, Records, and Delimiters

awk splits each line (record) into fields using a delimiter. By default, the field separator (FS) is any whitespace. Fields are accessible as $1, $2, …, and the whole line is $0. You can change FS via -F or inside a BEGIN block.

# space-separated
awk '{ print $1, $3 }' data.txt

# comma-separated (CSV-like)
awk -F, '{ print $1, $3 }' data.csv

# set output field separator for pretty printing
awk -F, 'BEGIN{ OFS="\t" } { print $1, $3 }' data.csv

Useful Built-in Variables

Common built-ins you’ll use all the time:

  • NR: current record (line) number
  • FNR: current record number in the current file
  • NF: number of fields on the current line
  • FS/OFS: input/output field separator
  • RS/ORS: input/output record separator
  • $0: entire line; $1..$NF: fields

# print line number and first field
awk '{ print NR, $1 }' file

# print last field of each line
awk '{ print $NF }' file
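
RS and ORS are worth knowing too. As a small sketch, assuming a hypothetical notes.txt where entries are separated by blank lines, setting RS to an empty string switches awk into paragraph mode so each block becomes one record:

# Treat blank-line-separated paragraphs as single records, separated in the output by a divider
awk 'BEGIN{ RS=""; ORS="\n---\n" } { print $0 }' notes.txt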

Getting Started: Essential awk Examples

# Reorder fields 3 and 1; add a dash between
awk '{ print $3 "-" $1 }' access.log

# Print specific columns from top output (skip header)
top -b -n1 | awk 'NR>7 { print $1, $9, $12 }'

Filter Rows by Value or Regex

# Lines where 3rd field equals "FAILED"
awk '$3=="FAILED"' audit.log

# Requests returning 500 status code in Nginx access log (status is $9 or $8 depending on format)
awk '$9==500' /var/log/nginx/access.log

# Regex match: case-insensitive search for "timeout"
awk 'tolower($0) ~ /timeout/' app.log

Calculate Sums, Averages, and Min/Max

# Sum the 2nd column
awk '{ sum += $2 } END { print sum }' metrics.txt

# Average of column 4 (skip lines starting with #)
awk '$0 !~ /^#/ { n++; total += $4 } END { if (n>0) print total/n }' data.txt

# Track min and max
awk 'NR==1{min=max=$2} { if($2<min)min=$2; if($2>max)max=$2 } END{ print "min",min,"max",max }' stats.txt

Count Unique Values and Frequencies

Associative arrays make grouping trivial.

# Count requests per IP (IP assumed to be $1)
awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' access.log | sort -nr | head

Work With CSV and Custom Delimiters

For simple CSV (no quoted commas), -F, works. For more complex CSV, gawk’s FPAT can help match fields as tokens rather than splitting on commas.

# Basic CSV, print columns 1 and 3
awk -F, 'BEGIN{OFS=","} {print $1, $3}' users.csv

# gawk: handle commas inside quotes (basic FPAT pattern)
gawk 'BEGIN{ FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," } { print $1, $3 }' users.csv

Use BEGIN and END Blocks

BEGIN runs before any input is read; END runs after all lines are processed. Both are handy for printing headers, producing summaries, and setting FS/OFS once.

awk 'BEGIN{ print "User,Count" } { c[$1]++ } END{ for (u in c) print u, c[u] }' log.txt

Practical Real-World Use Cases


Analyze Web Server Logs (Apache/Nginx)

Ops teams routinely use awk to troubleshoot traffic, performance, and security. Examples assume a common combined log format; adjust field numbers to your actual format.

# Top 10 IPs by request volume
awk '{ ip=$1; hits[ip]++ } END{ for(ip in hits) print hits[ip], ip }' access.log | sort -nr | head -10

# Top 10 requested URLs (path often $7)
awk '{ path=$7; c[path]++ } END{ for(p in c) print c[p], p }' access.log | sort -nr | head -10

# Status code distribution (status often $9)
awk '{ sc=$9; c[sc]++ } END{ for(s in c) print s, c[s] }' access.log | sort -k1,1

# Total bytes sent (bytes often $10)
awk '$10 ~ /^[0-9]+$/ { bytes += $10 } END { print "Total bytes:", bytes }' access.log

Pipe awk output to sort, uniq, or head for quick reports. Combine with grep to pre-filter dates or virtual hosts.
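
For example, here is a rough sketch of that pre-filtering pattern; the date string and the status-code field are assumptions you should adjust to your own format:

# Narrow to one day with grep first, then count status codes for just that day
grep '10/Oct/2025' access.log | awk '{ c[$9]++ } END{ for (s in c) print s, c[s] }' | sort -k2,2nr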

Monitor System and App Metrics

# CPU usage from mpstat: average user+system
mpstat 1 5 | awk '/Average/ { print "CPU %usr+%sys:", $3 + $5 }'

# Memory usage from /proc/meminfo
awk '/MemTotal/ {t=$2} /MemAvailable/ {a=$2} END{ printf "Mem Used: %.2f%%\n", (t-a)/t*100 }' /proc/meminfo

# Slow queries over 2s in application log (assuming duration is field 6 in seconds)
awk '$6+0 > 2 { print }' app.log

Clean and Transform Data

# Lowercase each line and trim leading/trailing spaces
awk '{ gsub(/^ +| +$/,""); $0=tolower($0); print }' raw.txt

# Replace delimiter: tabs to commas
awk 'BEGIN{ OFS=","; FS="\t" } { print $1,$2,$3 }' input.tsv > output.csv

Join awk With Other CLI Tools

awk shines in pipelines. It complements grep (pre-filtering), sort (ordering), uniq (dedupe), and sed (text substitution). Use each tool for what it’s best at to keep commands small and readable.
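
As an illustrative sketch (the 5xx pattern and the path field number are assumptions for a combined log format), each tool does one small job:

# grep narrows to 5xx responses, awk pulls the request path, sort+uniq count, head keeps the top 10
grep ' 50[0-9] ' access.log | awk '{ print $7 }' | sort | uniq -c | sort -nr | head -10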


Advanced awk Techniques

Grouping and Aggregation by Keys

# Revenue per customer (customer_id in $1, amount in $3)
awk -F, '{ rev[$1] += $3 } END { for (id in rev) printf "%s,%0.2f\n", id, rev[id] }' sales.csv | sort -t, -k2,2nr | head

Range Patterns and Multi-file Processing

Range patterns match from one condition to another; FNR resets per file, NR is global across all files.

# Print lines between markers
awk '/BEGIN_REPORT/,/END_REPORT/' app.log

# Join on the first field: build a lookup from left.txt, then append the matching value to each line of right.txt
awk 'FNR==NR { a[$1]=$2; next } { print $0, a[$1] }' left.txt right.txt

Functions, Conditionals, and Arrays

# Derive buckets with if/else and functions
awk '{
  score=$2
  if (score>=90) grade="A"
  else if (score>=80) grade="B"
  else grade="C"
  print toupper($1), grade, length($1)
}' grades.txt

Performance and Portability Tips

  • Prefer simple patterns and minimal gsub calls on huge files.
  • Use numeric comparisons (e.g., $3+0 >= 100) to avoid string semantics; see the sketch after this list.
  • Set FS/OFS once in BEGIN for consistency.
  • For very large data, stream to external sort for heavy ordering.
  • Stick to POSIX awk for portability; use gawk features (FPAT, asort) when needed.
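
Here is a small sketch of the numeric-comparison tip, assuming a hypothetical latency.log with response times in milliseconds in column 3:

# $3+0 forces a numeric comparison even if the field carries stray whitespace
awk '$3+0 >= 100 { slow++ } END { print slow+0, "requests took 100 ms or more" }' latency.log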

Common Pitfalls and Best Practices

  • Quoting: Wrap awk programs in single quotes to prevent the shell from expanding characters. Escape single quotes carefully.
  • Field numbers: Log formats differ; verify which column holds status, path, or bytes in your environment.
  • CSV complexity: Real CSVs include quoted commas and newlines; for robust CSV, consider gawk with FPAT or specialized tools like Miller.
  • Locale: Sorting and case conversion may vary with locale. Set LC_ALL=C for predictable behavior, as shown after this list.
  • Testing: Start with a small sample (head) and add conditions incrementally.
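
For the locale tip, the fix usually amounts to pinning the locale on the sorting stage, for example:

# Force byte-wise ordering so the report sorts the same way on every machine
awk '{ c[$1]++ } END{ for (k in c) print c[k], k }' access.log | LC_ALL=C sort -nr | head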

awk vs sed vs grep: When to Use Which

  • Use grep to find lines that match a pattern (fast filtering).
  • Use sed to substitute or edit text in place (stream editor).
  • Use awk to parse fields, compute values, group, and summarize (structured processing).

Use grep to narrow input, awk to compute, and sed to clean output. This pipeline-first mindset keeps command lines efficient and maintainable.
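
Put together, that flow might look like the following sketch, where the error pattern and the service field position are assumptions:

# grep narrows to errors, awk counts per service (assumed to be field 5), sed prettifies the output
grep -i 'error' app.log | awk '{ c[$5]++ } END{ for (s in c) print s, c[s] }' | sed 's/^/service: /'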


How awk Helps Hosting and DevOps Teams

From spotting abusive IPs to quantifying 5xx spikes, awk lets you turn raw logs into answers in seconds, with no heavy tooling needed. In hosting environments, this speed accelerates incident response, capacity planning, and performance tuning, especially when you are SSH’d into production servers with limited resources.
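
As one hedged example, quantifying a 5xx spike per minute can be a single line, assuming a combined log format where the status code is $9 and the timestamp field is $4 (e.g., [10/Oct/2025:13:55:36):

# Count 5xx responses per minute of the timestamp
awk '$9 ~ /^5/ { minute=substr($4, 2, 17); c[minute]++ } END{ for (m in c) print c[m], m }' access.log | sort -nr | head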

At YouStable, our managed hosting and VPS customers often ask for practical observability without complex stacks. awk, along with standard Linux tools, provides immediate insight.

If you want proactive monitoring, optimized web stacks, and hands on support that speaks your language, our team can help you operationalize these techniques at scale.

More Practical awk One-Liners You’ll Reuse

# Remove duplicate lines while keeping first occurrence
awk '!seen[$0]++' file.txt

# Show lines with more than N fields (e.g., malformed)
awk 'NF > 10' data.txt

# Add header to computed report
awk 'BEGIN{print "status,count"} {c[$9]++} END{for(s in c) print s "," c[s]}' access.log

# Extract the date from the timestamp field ($4 in combined log format) and count hits per day
awk '{ d=$4; gsub(/\[|\]/,"",d); day=substr(d,1,11); hits[day]++ } END{ for (k in hits) print hits[k], k }' access.log | sort -nr

FAQs

1. What is the awk command used for in Linux?

awk is a text-processing language for scanning files line by line, splitting lines into fields, and running pattern-action rules. It’s used to extract columns, filter rows, compute statistics, generate reports, and transform data, especially from logs, CSVs, and command outputs.

2. How do I print specific columns with awk?

Use $1, $2, … to reference fields. For CSVs, set -F, and optionally OFS for output. Example: awk -F, 'BEGIN{OFS=","} {print $1,$3}' file.csv prints the first and third columns as comma-separated output.

3. What is the difference between NR and FNR in awk?

NR is the total line count across all input files; it keeps increasing. FNR is the line count within the current file and resets to 1 when awk starts a new file. Use FNR==NR patterns to build lookup maps from the first file, then process the second.
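
A quick way to see the difference is to print both counters while awk reads two small placeholder files:

# FNR restarts at 1 when awk moves on to b.txt, while NR keeps counting across both files
awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' a.txt b.txt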

4. Can awk handle complex CSV files with quoted commas?

Basic awk with -F, struggles when fields contain commas within quotes. gawk improves this via FPAT to define fields as tokens. For fully robust CSV (escaped quotes, newlines), consider dedicated tools like Miller or xsv.

5. Which is better: awk, sed, or grep?

They solve different problems. grep searches, sed edits streams, and awk parses and computes. For structured analysis (columns, grouping, sums), awk is best. Combine them in pipelines for the most efficient, readable command lines.

Mastering the awk command will pay off quickly in any Linux, DevOps, or hosting workflow. Keep these examples handy, adapt them to your log formats, and you’ll turn raw text into actionable insight in seconds.

Sanjeet Chauhan

Sanjeet Chauhan is a blogger & SEO expert, dedicated to helping websites grow organically. He shares practical strategies, actionable tips, and insights to boost traffic, improve rankings, & maximize online presence.
