Tuesday 5 February 2013

Find all unique URLs from Apache log files

I needed to build a list of all the unique URLs that had been requested on a website served by Apache.

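Here's what I came up with using awk and sed.  It keeps any request that got an HTTP 2xx or 3xx response and strips off the query string (everything from the first ? onwards).  It assumes the common or combined log format, where field 9 is the status code and field 7 is the request path.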


awk '$9 ~ /^[23]/ {print $7}' somelogs* | sed 's/[?].*$//' | sort -u
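
If you want this as something reusable, here is a minimal sketch that does the filtering, the query-string stripping and the de-duplication in a single awk step. The function name unique_urls is made up for illustration, and it makes the same assumption about the log format: whitespace-separated fields with the request path in field 7 and the status code in field 9.

```shell
# Hypothetical helper: print each unique request path once, sorted.
# Assumes the Apache common/combined log format (path in $7, status in $9).
unique_urls() {
    awk '$9 ~ /^[23]/ {
        path = $7
        sub(/[?].*/, "", path)          # drop the query string, if any
        if (!seen[path]++) print path   # print a path only the first time it appears
    }' "$@" | sort
}
```

Call it as unique_urls somelogs* to get the same kind of list. Tracking what has already been printed inside awk means sort only has to deal with the unique paths, which may help on large logs.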