Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I worked on a project last year where we tried using Traefik on Kubernetes together with Let's Encrypt certs. It worked... sometimes.

We had significant issues with Traefik not allocating or renewing certs, resulting in some painful outages. The worst part was that there was no workaround; when adding a new domain to an ingress, it was completely incomprehensible why Traefik wasn't requesting a cert, or indeed why it wasn't renewing older ones that were close to expiration. We filed GitHub issues with concrete errors, but they were never addressed. At the time, I tried to debug Traefik to understand how it worked and maybe chase down some of those bugs. I don't like to speak ill of other people's code — let's just say that peeking under the covers made me realize perfectly why Traefik was so brittle and buggy.

We eventually ditched Traefik in favour of Google Load Balancer ingresses, combined with Cert-Manager for Let's Encrypt, and this combination worked flawlessly out of the box despite not being a 1.0 release at the time. The beauty of this setup is that the control plane (cert and ingress configuration) is kept separate from the data plane (web server), so the two can be maintained and upgraded/replaced separately.



Seems that when any popular project has lacking documentation this creates an opportunity for users to swoop in and own part of that story.

I did this with traefik and consequently many of my blog posts about it are my top visited pages.

And to be fair it the Traefik team invests in developer success and advocacy. They even send you swag for making contributions like popular posts.

I agree to parent posts though the docs lack concrete examples to take the ambiguity out. And debugging logs is painful sometimes.


I second this. It's incredible complex to debug how Traefik understand it's configuration, and also documentation and examples over the internet are very confusing because the version 1.x vs 2.x changes.


Yep. I believe part of the wonkiness comes from the way the configuration is stored. They have this weird design where the config is mapped to key/value stores using an abstraction. You can use a TOML file, YAML file, Etcd, Redis, etc. If you use Let's Encrypt, it also uses this mechanism (e.g. Etcd) to store the state.

It ends up being confusing and brittle, and exposes the underlying store as an API (you can modify Etcd directly and the changes are picked up). There's no intermediate layer that validates or controls the lifecycle of the config or state. You can end up in a situation where you break Traefik by pushing an invalid configuration, for example.


Can't they just take the text based format and create a config tool that reads TOML/YAML and then writes that configuration to etcd, redis or whatever else they support?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: