What is the downside to using Terraform?

I like Terraform, and I use it a lot, but I don't think you should use it if you have any other alternative for getting your configuration into source control. I've written about this in the context of using Terraform to manage Kubernetes specifically (where there are lots of alternatives) but I want to answer the more general question as well.

The main downside to using Terraform is that Terraform requires managing state, and "managing state" is a fancy computer term that means "sometimes it gets all fucked up."

NB: For production applications – anything serving traffic on the web that has more than one engineer and at least a few thousand dollars in revenue – absolutely check your configuration into source control. Yes, all of it. Terraform is a pretty good way to do this, and if it's the only option there's a reason for that. Use it.

But the main reason that Terraform is so useful is Terraform state. And the main problem with using it is that then you have to deal with Terraform state. And Terraform state can get all fucked up, and then you have to spend time unfucking your Terraform state, time you could be using to write more Terraform.

Terraform doesn't just take your .tf files and make reality match them. It shouldn't and it can't. It uses your .tf files to control resources that it's managing.

Terraform needs state for a handful of things, but the main one is dealing with resources where it has no other way of knowing whether a given resource is the one it created or is managing. State lets Terraform actually update things, instead of just endlessly creating new copies, without the risk of deleting something that merely resembles the thing it made. It needs that when two things are true:

(1) There's no way to unambiguously identify a resource besides writing down an ID that's created after the resource is created.

(2) It's not safe to delete and recreate resources.
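
To make those two conditions concrete, here's a toy sketch in Python. None of this is a real Terraform or cloud API: `FakeCloud`, `apply`, and the `state` dict are made up for illustration. The fake provider only hands out an ID after a resource is created, so the only way to update instead of duplicating is to write that ID down.

```python
import itertools


class FakeCloud:
    """Hypothetical provider: the server assigns IDs at creation time."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.resources = {}  # id -> settings

    def create(self, settings):
        rid = f"res-{next(self._ids)}"
        self.resources[rid] = dict(settings)
        return rid

    def update(self, rid, settings):
        self.resources[rid] = dict(settings)


def apply(cloud, config, state):
    """Make the cloud match config, using state (name -> id) to find what we own."""
    for name, settings in config.items():
        if name in state:
            cloud.update(state[name], settings)   # we know which one is ours
        else:
            state[name] = cloud.create(settings)  # write the new ID down
    return state


cloud = FakeCloud()
config = {"web": {"size": "small"}}

state = apply(cloud, config, {})
config["web"]["size"] = "large"
state = apply(cloud, config, state)
with_state = len(cloud.resources)      # still one resource, updated in place

apply(cloud, config, {})               # same config, but the state was lost
without_state = len(cloud.resources)   # a second copy now exists
```

With the recorded ID the second `apply` updates the existing resource. Without it, the same config quietly produces a duplicate, because nothing about the config says which existing resource it refers to.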

Lots of systems don't have one or both of these properties. This is why I wrote that post about Kubernetes specifically. Kubernetes has gone to a lot of work, architecturally, not to have either of these properties. Kubernetes allows you to identify resources precisely using only facts you knew about them before you made them. And it's generally safe to recreate Kubernetes resources. (It has to be, because Kubernetes deletes and recreates them all the time for its own reasons.)

When a system doesn't give you those guarantees, you have to write down resource IDs. And once you've done that you have state.

The problem with state is that now, in order to successfully update something that's managed by Terraform, you have to have that state, and you have to have the right state. If someone else comes along and changes reality out from underneath you without checking the corresponding changes into the state file, your changes won't work.
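
Here's a small Python sketch of that failure mode. Again, this is a made-up `FakeCloud`, not Terraform or any real provider: a coworker replaces a resource by hand, the recorded ID goes stale, and the next update has nothing to land on.

```python
class FakeCloud:
    """Hypothetical provider that assigns IDs when resources are created."""

    def __init__(self):
        self.resources = {}  # id -> settings
        self._next = 1

    def create(self, settings):
        rid = f"res-{self._next}"
        self._next += 1
        self.resources[rid] = dict(settings)
        return rid

    def update(self, rid, settings):
        if rid not in self.resources:
            raise LookupError(f"{rid} not found")
        self.resources[rid] = dict(settings)

    def delete(self, rid):
        del self.resources[rid]


cloud = FakeCloud()
state = {"web": cloud.create({"size": "small"})}  # what our state file says

# A coworker replaces the resource by hand and never touches the state.
cloud.delete(state["web"])
cloud.create({"size": "small"})

# Our next change targets the ID we wrote down, which no longer exists.
try:
    cloud.update(state["web"], {"size": "large"})
    outcome = "updated"
except LookupError as err:
    outcome = f"failed: {err}"
```

The replacement resource is sitting right there with the same settings, but the state still points at the old ID, so the update fails instead of finding it.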

The basic way to deal with this, by the way, is to have a mutex, a little identifier that you check out when you start a process that will make changes to Terraform state. You can automate the mutex in various ways (including by using special ways of storing Terraform state) but "checking out the mutex" can be as simple as announcing "I'm starting to write or apply some Terraform" in the team Slack.
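
If you want something one step more automated than Slack, a lock file is about the simplest version. This is a minimal sketch in Python, assuming everyone runs from a machine that shares one filesystem; `state_lock`, the lock path, and the owner names are all hypothetical, and this is not how Terraform's own locking works.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def state_lock(path, owner):
    """Hold an exclusive lock file for the duration of a state-changing run."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one caller wins.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        with open(path) as f:
            holder = f.read()
        raise RuntimeError(f"state is locked by {holder}")
    with os.fdopen(fd, "w") as f:
        f.write(owner)  # record who holds the lock
    try:
        yield
    finally:
        os.remove(path)  # release the lock even if the run fails


lock_path = os.path.join(tempfile.mkdtemp(), "terraform.lock")

with state_lock(lock_path, "alice"):
    # While alice holds the lock, bob's attempt is refused.
    try:
        with state_lock(lock_path, "bob"):
            second = "acquired"
    except RuntimeError as err:
        second = str(err)

released = not os.path.exists(lock_path)
```

The whole point is the exclusive create: whoever makes the file first holds the mutex, and everyone else finds out who to go bother. A crashed process that never reaches the cleanup step would leave the file behind, which is the lock-file version of state getting fucked up.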