How to Filter “access.log” Columns Using AWK

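If you’ve administered a web server, you’ve undoubtedly encountered its access log. By default, nginx writes the log to “/var/log/nginx/access.log,” while Apache’s default location depends on the distribution: typically “/var/log/apache2/access.log” on Debian and Ubuntu, and “/var/log/httpd/access_log” on Red Hat-based systems. (The file “/etc/httpd/conf/httpd.conf” is Apache’s configuration file, where the log path is set, not the log itself.)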

Access logs are typically configured to house information in a log format called “combined,” which consists of nine columns.

Example of an “access.log” File

- - [1/Dec/2019:01:01:01 -0400] "GET / HTTP/1.1" 200 12345 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +"

“Combined”-Format “access.log” Column Headings:

1. Remote address
2. Remote logname (usually “-”)
3. Remote user
4. Local time
5. Request line
6. HTTP status
7. Bytes sent (body)
8. HTTP referrer
9. HTTP user agent

What’s interesting about the format is that its columns are delimited inconsistently. Some are wrapped in double quotes, some are bare, and column 4 (the local time) is wrapped in square brackets with a space inside. That poses a challenge when we use a tool like “awk.”

How, then, can you use awk to get columns? By default, awk splits each line on runs of whitespace, so any quoted or bracketed column that contains a space is broken into several fields.
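To see the problem concretely, here is a minimal sketch that runs awk with its default splitting on a single log line. The sample line is hypothetical (192.0.2.1 is a documentation-only address and “ExampleBot” is a made-up user agent), and the variable names are only for illustration.

```shell
# Hypothetical sample line in "combined" format (not taken from a real log).
line='192.0.2.1 - - [1/Dec/2019:01:01:01 -0400] "GET / HTTP/1.1" 200 12345 "-" "Mozilla/5.0 (compatible; ExampleBot/1.0)"'

# Default behaviour: awk splits on every run of spaces or tabs.
fields=$(printf '%s\n' "$line" | awk '{ print NF }')
fourth=$(printf '%s\n' "$line" | awk '{ print $4 }')
fifth=$(printf '%s\n' "$line" | awk '{ print $5 }')

# Prints: 14 fields; $4=[1/Dec/2019:01:01:01 $5=-0400]
echo "$fields fields; \$4=$fourth \$5=$fifth"
```

The nine logical columns come out as 14 fields, and the timestamp is cut in two at its internal space.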

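The way to do this is to describe what a column looks like, rather than what separates the columns, by setting the FPAT variable with awk’s -v option. FPAT is a GNU awk (gawk) feature, so it is not available in other implementations such as mawk.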

AWK Example Using FPAT

$ awk -v FPAT='[^ ]*|"[^"]*"|\\[[^]]*\\]' '{ print $5 }' access.log

FPAT stands for “field pattern,” and as you can see above, we’re defining columns using regex: a run of non-space characters, a double-quoted string, or a string enclosed in square brackets. With this pattern, $5 is the quoted request column.
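As a rough sketch of how the pattern can be used beyond printing one column, the snippet below counts the fields, reads the status column, and strips the quotes from the request. It assumes GNU awk is installed as “gawk”. The sample line is hypothetical (192.0.2.1 is a documentation-only address). It also uses a slight variant of the pattern with “+” instead of “*” for the unquoted case, so that empty matches cannot occur.

```shell
# Hypothetical sample line in "combined" format (not taken from a real log).
line='192.0.2.1 - - [1/Dec/2019:01:01:01 -0400] "GET / HTTP/1.1" 200 12345 "-" "Mozilla/5.0 (compatible; ExampleBot/1.0)"'

# Requires GNU awk: FPAT is a gawk extension.
# A field is: a run of non-spaces, OR a "quoted string", OR a [bracketed string].
out=$(printf '%s\n' "$line" |
  gawk -v FPAT='[^ ]+|"[^"]*"|\\[[^]]*\\]' '{
      request = $5
      gsub(/^"|"$/, "", request)   # strip the surrounding double quotes
      print NF, $6, request        # field count, HTTP status, request line
  }')

# Prints: 9 200 GET / HTTP/1.1
echo "$out"
```

The same pattern works against a whole file by replacing the printf pipeline with the log path, and a condition such as $6 != 200 in front of the action would limit the output to non-200 responses.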

Topics of interest: built-in awk variables