Netbox is awesome and I have just started to plan some automations to use its data (I currently just use the GUI).
However, if I use it as a source of truth for what is essentially config management, it would be replacing something that's currently running on top of git and GitHub.
I'm not sure I'm willing to lose version history, easy reverts, the pull request mechanism, etc. for this critical data. How have you dealt with this?
I don't see why you can't have both. AFAIK, Netbox doesn't do any automation out of the box; you need to build that on your own. It's not going to start changing the configs on your devices. Your configs are not necessarily going to tell you how many rack units a given device takes up or how it's oriented in a cabinet. Being able to visualize the infrastructure and your IP space is extremely helpful.
I've certainly dealt with that in the past, but the reports, scripts, and webhooks in Netbox helped us automate our documentation QA.
Whenever data is altered, QA gets an immediate ticket to validate it and the reports give them the ability to quickly check it against our real-time inventory system.
To catch the case where data isn't entered, all project tasks and work tickets need to be associated with one or more Netbox updates, and QA confirms that as well.
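For anyone wondering what the webhook-to-ticket step described above might look like, here is a minimal sketch. It assumes the webhook body carries `event`, `model`, `username` and `data` keys, which is how I remember Netbox's payload looking, so check it against your version. The `ticket_from_webhook` function and the `documentation-qa` queue name are made up for illustration, and actually filing the ticket is left to whatever ticketing system you use.

```python
from typing import Any


def ticket_from_webhook(payload: dict[str, Any]) -> dict[str, str]:
    """Turn a Netbox-style change webhook payload into a QA ticket dict.

    The payload keys used here (event, model, username, data) are an
    assumption about the webhook body; verify them for your Netbox version.
    """
    event = payload.get("event", "unknown")
    model = payload.get("model", "unknown")
    user = payload.get("username", "unknown")
    data = payload.get("data") or {}
    # Prefer a human-readable name; fall back to the object ID.
    name = data.get("name") or data.get("display") or f"id={data.get('id', '?')}"
    return {
        "title": f"Validate NetBox change: {model} {name} ({event})",
        "body": f"{user} {event} {model} {name}. Check it against the inventory system.",
        "queue": "documentation-qa",  # hypothetical queue name
    }
```

A small web endpoint would receive the POST, pass the decoded JSON to this function, and hand the result to the ticketing API. Keeping the mapping as a pure function makes it easy to test without a running Netbox.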
There's no solution to this problem that's purely technical. In my experience you need to make it as easy as possible with your tooling, but you still have to have processes in place to catch mistakes and omissions.
The NAPALM link seems to be wrong/broken on the NAPALM page under additional features. It links to what looks like a strange feed aggregator. Not sure if anyone from the project is around, I was curious what that library/API/protocol/whatever was.
This is cool for people on the supply side of pure infrastructure. I wish it went a little further up the stack to include applications. It seems to have some rudimentary support via the TCP/IP service options, but it still lacks any concept of clusters.
It would be amazing to me to be able to have it run an analysis like "if we lost this network switch, how many applications would be inaccessible?". This looks like it has the underpinnings to tell me what hosts are going to lose network access, but not to step further up the stack and see what applications are on those hosts, and whether those apps are clustered.
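To make the kind of analysis I mean concrete, here is a rough sketch. It assumes you have already pulled two mappings from somewhere: which switches each host is cabled to, and which hosts each application runs on. Both mappings, the function name and the sample data are hypothetical; Netbox could plausibly supply the first mapping, but the second is exactly the part that's missing.

```python
def apps_lost_if_switch_fails(switch, host_uplinks, app_hosts):
    """Return the apps that have no reachable host left once `switch` fails.

    host_uplinks: host -> set of switches the host is connected to
    app_hosts:    app  -> set of hosts it runs on (more than one = clustered)
    """
    # A host goes dark only if the failed switch was its sole uplink.
    dead_hosts = {h for h, sws in host_uplinks.items() if sws and sws <= {switch}}
    # An app is lost only if every host it runs on went dark.
    lost = [app for app, hosts in app_hosts.items() if hosts and hosts <= dead_hosts]
    return sorted(lost)


# Hypothetical sample data for illustration only.
host_uplinks = {
    "web1": {"sw1"},
    "web2": {"sw2"},
    "db1": {"sw1"},
    "db2": {"sw1", "sw2"},  # dual-homed
}
app_hosts = {
    "shop": {"web1", "web2"},   # clustered across switches
    "billing": {"db1"},         # single host
    "reports": {"db1", "db2"},  # clustered, one node dual-homed
}

print(apps_lost_if_switch_fails("sw1", host_uplinks, app_hosts))  # ['billing']
```

This only looks one hop up from the host. A real version would have to walk the whole topology (uplinks between switches, routers, power) and know about app-to-app dependencies, but the set logic stays the same.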
There's basically nothing I've seen for people like SREs that live in the space between infrastructure and code. The infra people have tools like this, and the developers have all kinds of tools to analyze their code, but there's nothing to analyze the mashup of code and infrastructure we call "deployed applications".