Notes on code, technology, caffeine and everything in between.

Access Log

Dec 17, 2022
tl;dr: I'm parsing the access logs of NGINX with a way too complicated solution to get an idea of whether anybody reads this blog.

Logging and statistics are quite a sensitive topic nowadays. Some say that knowing how many users a service has, and how they use it, is important to build better products. Others say that analyzing product usage is equivalent to spying. I’d say - as a product developer and also a fan of data protection - the answer lies somewhere in between. It’s not so much about whether data is collected at all. It’s more that only the required data is collected and stored accordingly. And if it’s a paid product, like my iPhone, I personally want clear communication about what data is collected, how, and how to opt out.

I’ve set up this blog now, as described in my previous posts, and I will continue it as just a little side project to craft something. But before I keep working on the style and other stuff I have in mind, I want to know how many people visit this page. If I had to guess, I’d say it’s just me and the Google index robot (and maybe my wife checking out what I’m wasting my time with). But you never know.

I hate Google Analytics, Matomo and all the other professional “spy solutions” with their massive databases, cookies and more. And I love minimalist approaches. So I just take what’s already there: the access log of NGINX, which runs as the webserver. (I say “engine x”, by the way - the pronunciation was a long-lasting discussion topic somewhere and has finally been settled by the NGINX developers.)

What’s inside the access log?

NGINX logs every resource a client (identified by IP) accesses, along with the status code, the user agent and some more information. That’s what every web server does by default. The catch is log retention. By default, logrotate.d is set to keep logs for 52 days. As my webserver hasn’t been running that long yet - I’m constantly changing things - I have no idea if that’s true. I’ll have to test it. First of all, I set the NGINX Docker container to save the access log to a persistent volume.
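A minimal docker-compose sketch of what that volume mount could look like - the service name and host paths are placeholders, not my actual setup. One detail worth knowing: the official nginx image symlinks access.log to stdout, but bind-mounting a host directory over /var/log/nginx hides that symlink, so NGINX writes a regular file into the mount instead.

# docker-compose.yml - minimal sketch, paths are assumptions
services:
  nginx:
    image: nginx:latest
    ports:
      - "80:80"
      - "443:443"
    volumes:
      # persist the log directory on the host, so access.log
      # survives container restarts and logrotate can reach it
      - ./logs:/var/log/nginx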

But how do you do statistics with a log file?

After some googling, I found a tool called goaccess. It is a command-line log file analyzer. As a little bonus, it can generate an HTML report, which presents the log file data in a nice GUI instead of in the terminal.

Great idea, just put a report.html on your server so everyone can view it!

I get the irony. No, I’m not that stupid. Every hour or so, I generate the newest report in a folder that isn’t publicly accessible and transfer it to a local machine here. The command to generate the report is:

cat /path/to/logs/access.log | docker run --rm -i -e LANG=$LANG allinurl/goaccess -a -o html --log-format COMBINED - > report.html

(Docker required)

I’ve put that into a script, which also uses scp to transfer the log files from my webserver to my local machine at home. That script is executed by a cron job every hour.
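One possible shape for that script, assuming it runs on the local machine and pulls the log over SSH - the hostname, paths and filenames here are placeholders, not my real ones:

#!/usr/bin/env bash
# access-report.sh - hourly report job; host and paths are placeholders
set -euo pipefail

REMOTE="user@webserver.example"          # SSH login for the webserver
REMOTE_LOG="/path/to/logs/access.log"    # the persisted NGINX access log
REPORT_DIR="/srv/reports"                # local folder served by the home NGINX

# pull the current access log from the webserver
scp "$REMOTE:$REMOTE_LOG" /tmp/access.log

# generate the HTML report locally via the goaccess Docker image
cat /tmp/access.log | docker run --rm -i -e LANG="${LANG:-en_US.UTF-8}" allinurl/goaccess \
  -a -o html --log-format COMBINED - > "$REPORT_DIR/report.html"

And the matching crontab entry to run it at the top of every hour:

0 * * * * /home/me/bin/access-report.sh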

I run another little NGINX webserver at home just to view the most recent report. As logrotate deletes data older than 52 days, this should be GDPR compliant, and since the report is regenerated from the rotated log every hour, the old data automatically disappears from my local machine as well.
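That home webserver doesn’t need more than a minimal server block - the port and root here are assumptions matching the placeholder paths above:

server {
    listen 8080;
    server_name _;

    # serve the hourly report as the index page
    root /srv/reports;
    index report.html;
}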

So, let’s see if I’m right and nobody even reads this stuff here!