/r/Terraform
Terraform discussion, resources, and other HashiCorp news.
This subreddit is for Terraform (IaC - Infrastructure as Code) discussions to get help, educate others and share the wealth of news.
Feel free to reach out to mods to make this subreddit better.
Rules:
Be nice to each other!
MultiLink_Spam == Perm.Ban
The less directly hardcoded stuff, the better (I guess?), which is why we try to use locals, especially when they contain arguments which are likely to be used elsewhere/multiple times.
However, is there a point where it becomes too much? I'm working on a project now and not sure if I'm starting to add too much to locals. I've found that the more I have in locals, the better the rest of my code looks -- however, the more unreadable it becomes.
E.g.:
Using name = local.policies.user_policy looks better than using name = "UserReadWritePolicy".
However, "UserReadWritePolicy" no longer being in the iam.tf code means the policy becomes unclear, and you now need to jump over to locals.tf to have a look - or to read more of the iam.tf code to get a better understanding.
And like, what about stuff like hardcoding the lambda filepath, runtime, handler etc - better to keep it clean by moving all over to locals, or keep them in the lambda.tf file?
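For illustration, this is roughly the kind of locals block I mean (values other than the policy name above are made up):

locals {
  policies = {
    user_policy  = "UserReadWritePolicy"
    admin_policy = "AdminReadOnlyPolicy"
  }

  lambda = {
    filename = "build/handler.zip"
    runtime  = "python3.12"
    handler  = "app.handler"
  }
}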
Is there a specific best practice to follow for this? Is there a balance?
Hello fellow terraformers. I'm hoping some of you can help me resolve why my ECS Service is timing out when I run terraform destroy. My ECS uses a managed capacity provider, which is fulfilled by an Auto Scaling Group using EC2 instances.
I can manually unstick the ECS Service destroy by terminating the EC2 Instances in the Auto Scaling Group. This seems to let the destroy process complete successfully.
My thinking is that due to how terraform constructs its dependency graph, when applying resources the Auto Scaling Group is created first, and then the ECS Service second. This is fine and expected, but when destroying resources the ECS Service attempts to be destroyed before the Auto Scaling Group. Unfortunately I think I need the Auto Scaling Group to destroy first (and thereby also the EC2 Instances), so that the ECS Service can then exit cleanly. I believe it is correct to ask terraform to destroy the Auto Scaling Group first, because it seems to continue happily when the instances are terminated.
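For context, the dependency chain in my setup looks roughly like this (a simplified sketch; names are placeholders and several required arguments are omitted or assumed to exist elsewhere):

resource "aws_autoscaling_group" "ecs" {
  name                = "ecs-asg"
  min_size            = 0
  max_size            = 3
  vpc_zone_identifier = var.private_subnet_ids # assumed variable

  launch_template {
    id      = aws_launch_template.ecs.id # assumed to exist elsewhere
    version = "$Latest"
  }
}

resource "aws_ecs_capacity_provider" "ecs" {
  name = "ec2-capacity"

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.ecs.arn
  }
}

resource "aws_ecs_cluster_capacity_providers" "this" {
  cluster_name       = aws_ecs_cluster.this.name # assumed to exist elsewhere
  capacity_providers = [aws_ecs_capacity_provider.ecs.name]
}

resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.app.arn # assumed to exist elsewhere
  desired_count   = 1

  capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.ecs.name
    weight            = 1
  }
}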
The state I am stuck in, is that on destroy the ECS Service is deleted, but there is still one task running (as seen under the cluster), and an EC2 Instance in the Auto Scaling Group that has lost contact with the ECS Agent running on the EC2 Instance.
I have tried setting depends_on and force_delete in various ways, but it doesn't seem to change the fundamental problem of the Auto Scaling Group not terminating the EC2 Instances.
Is there another way to think about this? Is there another way to force_destroy the ECS Service/Cluster or make the Auto Scaling Group be destroyed first so that the ECS can be destroyed cleanly?
I would rather not run two commands, a terraform destroy -target ASG, followed by terraform destroy. I have no good reason not to, other than being a procedural purist who doesn't want to admit that running two commands is the best way to do this. >:) It is probably what I will ultimately fall back on if I (we) can't figure this out.
Thanks for reading, and for the comments.
Edit: The final running task is a GitHub Actions agent, which will run until it's stopped or upon completing a workflow job. It will happily run until the end of time if no workflow jobs are given to it. Its job is to remain in a 'listening' state for more jobs. This may have some impact on the process above.
Hey guys, my team is building a cool new product, and we would like to know if this is something you would benefit from: https://app.youform.com/forms/lm7dgoso
Update: apparentlymart was right on; there was a call I had missed and somehow grep wasn't picking up on. I guess if that happens to anyone else, just keep digging because IT IS there...somewhere ;)
I'm fairly new to Terraform and inherited some old code at work that I have been updating to the latest version of TF.
After running terraform init when I thought I had it all complete, I discovered I missed fixing a call to aws_alb which is now aws_lb, so TF tried to load a provider 'hashicorp/alb'. I fixed the load balancer call, went to init again, and saw it is still trying to load that provider even though the terraform providers command shows no modules dependent on hashicorp/alb.
I nuked my .terraform directory and the state file but it's still occurring. Is there something else I can do to get rid of this call to the non-existent provider? I have grep'ed the hell out of the directory and there is nothing referencing aws_alb instead of aws_lb. I also ran TF_LOG to get the debugging information, but it wasn't helpful.
Hey everyone, my team and I are building a tool that makes it easy to optimize your cloud infrastructure costs using a combination of AI and static Terraform analysis. This project is only a month old so I’d love to hear your feedback to see if we’re building in the right direction!
You can try the tool without signing up at infra.new
Capabilities:
We just added a GitHub integration so you can easily pull in your existing Terraform configuration and view its costs / optimize it.
I’d love to hear your thoughts!
For our prod and test environments, they have their own IAM account - so we're good there. But for our dev account we have 5 people "playing" in this area and I'm not sure how best to manage this. If I bring up a consul dev cluster I don't want another team member to accidentally destroy it.
I've considered having a wrapper script around terraform itself set a different key in "state.config" as described at https://developer.hashicorp.com/terraform/language/backend#partial-configuration.
Or, we could utilize workspaces named for each person - and then we can easily use the ${terraform.workspace} syntax to keep Names and such different per person.
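For illustration, the workspace route would look roughly like this (a sketch; the resource is just an example):

resource "aws_instance" "consul" {
  # other arguments omitted
  tags = {
    Name  = "consul-${terraform.workspace}"
    Owner = terraform.workspace
  }
}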
What's the best pattern here?
Hi there…
I am setting up a new IaC setup and decided to go with a child --> parent module model.
This is for Azure, and since Azure AVM modules have some provider issues, I was advised not to consume their publicly available modules and instead to create my own from scratch.
So I am setting up a Postgres module (child module) from scratch (using the Terraform Registry), and it has an azurerm_resource_group resource.
But I don't want to add a resource_group at the Postgres level, because the parent module will have the resource_group section that will span across other Azure modules (it should help me with grouping all resources).
I am trying to understand the very basic logic of getting rid of the resource_group from this section (Terraform Registry) and adding it at the parent module.
If I remove the resource_group section here, other resources have dependencies on it - so how can I fix this section, community?
How can I achieve this?
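For illustration, the pattern I'm imagining is something like this (a rough sketch; names and the Postgres arguments are placeholders):

# modules/postgres/variables.tf (sketch)
variable "resource_group_name" {
  type = string
}

variable "location" {
  type = string
}

# modules/postgres/main.tf - no azurerm_resource_group here
resource "azurerm_postgresql_flexible_server" "this" {
  name                = "example-postgres"
  resource_group_name = var.resource_group_name
  location            = var.location
  version             = "16"
  sku_name            = "B_Standard_B1ms"
  # administrator credentials, storage, networking etc. omitted
}

# parent module / root configuration
resource "azurerm_resource_group" "shared" {
  name     = "rg-shared"
  location = "westeurope"
}

module "postgres" {
  source              = "./modules/postgres"
  resource_group_name = azurerm_resource_group.shared.name
  location            = azurerm_resource_group.shared.location
}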
As always, cheers!!
Hi there...
I am setting up our IaC and designing the Terraform module structure.
From my own experience a few years ago in another organization, I learned it this way:
EKS, S3, Lambda terraform modules get their own separate gitlab repos and will be called from a parent repo:
Dev (main.tf) will have modules of EKS, S3 & Lambda
QA (main.tf) will have modules of EKS, S3 & Lambda
Stg (main.tf) will have modules of EKS, S3 & Lambda
Prod (main.tf) will have modules of EKS, S3 & Lambda
So it's easy for us to maintain the version that's needed for each env. I can see some of the posts here almost following the same structure.
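For reference, each env's main.tf looks roughly like this, with each env pinning its own ref (a sketch; the repo URLs and versions are made up):

# dev/main.tf (sketch)
module "eks" {
  source = "git::https://gitlab.example.com/infra/terraform-eks.git?ref=v1.4.0"
  # inputs omitted
}

module "s3" {
  source = "git::https://gitlab.example.com/infra/terraform-s3.git?ref=v2.1.0"
}

module "lambda" {
  source = "git::https://gitlab.example.com/infra/terraform-lambda.git?ref=v0.9.0"
}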
I want to see if this is (still) a good implementation, or if there are other ways the community has evolved to manage this child-parent structure in Terraform 🙋🏻♂️🙋🏻♂️
Cheers!
So far, to me, the responsible thing to do under Terragrunt when there are dependencies between modules is to pass outputs to inputs. However, I've more recently needed to use AWS Secrets Manager, and so I'm putting my passwords in there and passing an ARN. Given I am creating secrets with a methodical name, "<environment>-<application>" etc., I don't need the ARN, I can work it out myself, right?
As I am storing a database password in there, why don't I also store the url, port, protocol etc and then just get all those similar attributes back trivially in the same way?
It feels like the sort of thing you can swing back and forth over, what's right, what's consistent, and what's an abuse of functionality.
Currently I'm trying to decide if I pass a database credentials ARN from the RDS module to the ECS module, or just work it out, as I know what it will definitely be. The problem I had here was that I'd destroyed the RDS module state, so it wasn't there to provide to the ECS module, and it was being fed a mock value by Terragrunt... But yeah, the string I don't "know" is entirely predictable, yet my code broke because I don't "predict" it.
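For illustration, the "work it out by name" route I'm weighing would look something like this (a sketch; it assumes the <environment>-<application> naming convention above and a JSON secret payload):

data "aws_secretsmanager_secret" "db" {
  name = "${var.environment}-${var.application}" # assumed variables matching the naming convention
}

data "aws_secretsmanager_secret_version" "db" {
  secret_id = data.aws_secretsmanager_secret.db.id
}

locals {
  # Assuming the secret stores a JSON blob with url/port/password/etc.
  db = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)
}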
Any best practice tips in this area?
I'm working on a module to create Azure AI Services environments that deploy the Deepseek R1 model. The model is defined in ARM's JSON syntax as follows:
{
  "type": "Microsoft.MachineLearningServices/workspaces/serverlessEndpoints",
  "apiVersion": "2024-07-01-preview",
  "name": "foobarname",
  "location": "eastus",
  "dependsOn": [
    "[resourceId('Microsoft.MachineLearningServices/workspaces', 'foobarworkspace')]"
  ],
  "sku": {
    "name": "Consumption",
    "tier": "Free"
  },
  "properties": {
    "modelSettings": {
      "modelId": "azureml://registries/azureml-deepseek/models/DeepSeek-R1"
    },
    "authMode": "Key",
    "contentSafety": {
      "contentSafetyStatus": "Enabled"
    }
  }
},
Is there a way for me to deploy this via the azurerm TF resource provider? I don't see anything listed in the azurerm documentation for this sort of resource, and I was hoping to keep it all within azurerm if at all possible.
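The only fallback I'm aware of is the AzAPI provider rather than azurerm; an untested sketch of what that might look like (the parent workspace reference is assumed, and depending on the azapi provider version the body may need to be a plain HCL object instead of jsonencode):

resource "azapi_resource" "deepseek_endpoint" {
  type      = "Microsoft.MachineLearningServices/workspaces/serverlessEndpoints@2024-07-01-preview"
  name      = "foobarname"
  location  = "eastus"
  parent_id = azurerm_machine_learning_workspace.example.id # assumed to exist elsewhere

  body = jsonencode({
    sku = {
      name = "Consumption"
      tier = "Free"
    }
    properties = {
      modelSettings = {
        modelId = "azureml://registries/azureml-deepseek/models/DeepSeek-R1"
      }
      authMode = "Key"
      contentSafety = {
        contentSafetyStatus = "Enabled"
      }
    }
  })
}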
I have my ARM_SUBSCRIPTION_ID environment variable set, but when I try to run terraform plan it doesn't detect it.
I installed terraform using brew.
How can I fix this?
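For reference, hard-coding the subscription in the provider block works as a sanity check (a sketch; the value is a placeholder), but I'd prefer the environment variable to be picked up:

provider "azurerm" {
  features {}
  # Normally picked up from ARM_SUBSCRIPTION_ID; the value here is a placeholder.
  subscription_id = "00000000-0000-0000-0000-000000000000"
}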
Hi everyone! I'm using Terragrunt in my job, and I was wondering how to add a prefix to every resource I create, so resources become easier to identify for debugging and billing. E.g. if the project name is "System foobar", every resource has "foobar-<resource>" as its name.
Is there any way to achieve this?
Sorry for my English, and thanks in advance.
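For illustration, the kind of thing I'm after (a rough sketch; the project name and resource are made up):

# root terragrunt.hcl (sketch)
locals {
  project = "foobar"
}

inputs = {
  name_prefix = local.project
}

# child units that include the root inherit name_prefix in their inputs,
# and the underlying module builds names from it:
resource "aws_s3_bucket" "example" {
  bucket = "${var.name_prefix}-data"
}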
There are currently 2 ways to declare a subnet in the Terraform azurerm provider:
In-line, inside a VNet
resource "azurerm_virtual_network" "example" { ... subnet { name = "subnet1" address_prefixes = ["10.0.1.0/24"] }
Using azurerm_subnet resource
resource "azurerm_subnet" "example" { name = "example-subnet" resource_group_name = azurerm_resource_group.example.name virtual_network_name = azurerm_virtual_network.example.name address_prefixes = ["10.0.1.0/24"] }
Why would you use the 2nd option? Are there any advantages?
Hey there,
we are planning to implement the Cloud Adoption Framework (CAF) in Azure and Landing Zones in our company. Currently, I am the only one managing the Azure service, while many tasks are handled by our Managed Service Provider (MSP). The MSP will also drive the transition to CAF and Landing Zones.
I am currently pursuing the AZ-104 certification and aim to continue my education afterward. The company has asked me how long it would take for me, with no prior experience in Terraform, to manage the Landing Zones, and what would be necessary for this (i.e., how they can best support me on this journey).
What do you think about this? So far, I have no experience with Bicep or Terraform.
resource "aws_db_instance" "test-db" {
engine = "postgres"
db_name = "testdb"
identifier = "test-db"
instance_class = "db.m5.large"
allocated_storage = 100
publicly_accessible = true
backup_retention_period= 7
multi_az = true
storage_type = "gp3"
username = var.db_username
password = var.db_password
vpc_security_group_ids = [aws_security_group.example.id]
skip_final_snapshot = true
blue_green_update {
enabled = true
}
Here's my code
Error:
│ Error: updating RDS DB Instance (test-db): creating Blue/Green Deployment: waiting for Green environment: unexpected state 'storage-initialization', wanted target 'available, storage-optimization'. last error: %!s(<nil>)
Not sure what mistake I am making.
Is there any way to reduce the noise of the plan output? I've got some resources that contain huge JSON docs (Grafana dashboard definitions) which cause thousands of lines of plan output rather than just a few dozen.
Using the template provided in the URL, I tried provisioning an Amazon Bedrock knowledge base using Terraform. But I am unable to create the OpenSearch index using Terraform.
Error is as below.
opensearch_index.forex_kb: Creating...
Error: elastic: Error 403 (Forbidden): 403 Forbidden [type=Forbidden]
Note: I am able to create the index manually but not via terraform.
Hi folks, I very recently picked up Terraform Cloud and wanted to know how folks are getting the most out of it, mainly around automation and self-service. I love the drift detection and the health checks enabled for all the workspaces, but I noticed there wasn't anything built in to automatically handle drift, at least for specific workspaces or projects, to eliminate some extra manual labor. I'd love to hear how folks are handling this, if at all, plus any other ideas or recommendations for best practices, automation, self-service, etc. A bit of context: I use GHA for my plan/apply/linting pipeline integrated with git, along with Terraform and AWS for all my infrastructure. As for self-service, I'm leaning towards Waypoint since it's native and seems to check all the right boxes.
Afternoon all, still very new to Terraform, and I'm certain this is a real basic issue, but I'm not having any luck finding the answer.
I have a module that creates several Azure resources including a container, SAS token, key vault, secret, endpoints, etc. A SAS token is generated and the value is written to the secret. I have noticed that the secret value is prefixed with a "?" before the SAS token.
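Roughly what I'm doing, simplified (names are placeholders, and I'm assuming the SAS comes from the azurerm_storage_account_sas data source):

data "azurerm_storage_account_sas" "example" {
  connection_string = azurerm_storage_account.example.primary_connection_string
  # services / resource_types / permissions / start / expiry omitted for brevity
}

resource "azurerm_key_vault_secret" "sas" {
  name         = "container-sas"
  key_vault_id = azurerm_key_vault.example.id
  # The SAS output seems to include a leading "?"; if that's the culprit,
  # trimprefix() would strip it before writing the secret.
  value = trimprefix(data.azurerm_storage_account_sas.example.sas, "?")
}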
Any idea what I could be doing wrong with declaring the value?
Thanks in advance.
Hello everyone,
I'm currently trying to create private networks, subnets, and OVH cloud instances using Terraform, specifically with the OpenStack provider.
The problem is that I manage to create everything, but the instances don't have an assigned IP on the dashboard. To be more precise, the instances show that they have a private IP assigned in the general menu, but the detailed view of each instance shows that they have no IP assigned.
I tried to create an instance manually to test and it got IPs assigned, but for the Terraform-created ones they do not show up.
I looked at all of the documentation and many examples on the internet, and whatever I do it never works.
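For reference, a simplified sketch of the kind of configuration I mean (names, CIDR, image and flavor are placeholders):

resource "openstack_networking_network_v2" "private" {
  name = "private-net"
}

resource "openstack_networking_subnet_v2" "private" {
  name       = "private-subnet"
  network_id = openstack_networking_network_v2.private.id
  cidr       = "192.168.10.0/24"
  ip_version = 4
}

resource "openstack_compute_instance_v2" "vm" {
  name        = "instance-1"
  image_name  = "Ubuntu 22.04"
  flavor_name = "d2-4"

  network {
    uuid = openstack_networking_network_v2.private.id
  }
}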
Can you please help me?
I currently have a setup which involves Terraform/Terragrunt with a certain directory structure. We also have another codebase which rewrites the older one using only Terraform, and using OpenTofu. The directory (state) structure is changing, and the module/resource code is also changing. I'm looking for approaches to import/migrate the state/resources onto the new IaC.
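For illustration, one mechanism I'm looking at is the declarative import block supported by recent Terraform and OpenTofu releases (a sketch; the address and ID are placeholders):

import {
  # Address in the new codebase and the provider-specific ID of the existing resource;
  # both values here are placeholders.
  to = module.network.aws_vpc.main
  id = "vpc-0123456789abcdef0"
}

Combined with terraform state mv (or tofu state mv) for resources whose addresses only change location, that seems like one workable path, but I'd like to hear how others have approached it.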
Hi everyone! I’m excited to share my first Terraform provider for HAProxy. I’m new to Go and provider development, so this has been a big learning experience.
The provider lets you manage frontend/backends, SSL, and load balancing configuration for HAProxy.
You can check it out here: https://github.com/cepitacio/terraform-provider-haproxy
Thank you!
I'll start off with that my career has been cybersecurity and nearly 3 years ago I did a lateral move as our first cloud security engineer. We use GCP with Gitlab.
I've been working on taking over the infrastructure for one of our security tools from a different team that has managed it so far. What I'm running into is that this tool vendor doesn't use any sort of versioning for their modules to set up the tool infrastructure.
Right now both our prod and non-prod infrastructure are in the same directory, with prod.tf and non-prod.tf. If I put together an MR that just adds a comment to the dev file, the terraform plan would, as expected, update both prod and non-prod - which is what I expected but don't want.
Would the solution be as "simple" as creating two sub-directories under our infra/ directory where all of the Terraform resides, a prod and a non-prod one, and then moving all of the Terraform into the respective sub-folders? I assume that I'll need to deal with state and do terraform import statements.
Hopefully this makes sense and I've got the right idea; if I don't, what would be a good solution? For me the nuclear option would be to create an entirely new repo for dev and migrate everything to the new repo.
Hi everyone,
I hope you’re doing well!
I’m currently working on a project involving Azure and Terraform, and I’ve run into an issue during terraform apply. The error I’m facing seems to be related to the resource provider registration. Specifically, I’m getting an error stating that the required resource provider Microsoft.TimeSeriesInsights wasn’t properly registered.
I’ve already reviewed my provider.tf file but couldn’t pinpoint any clear issue. I was wondering if there’s something I need to adjust in the provider configuration.
Here’s what I’ve tried so far:
I considered manually registering the resource provider using the Azure CLI with:
az provider register --namespace Microsoft.TimeSeriesInsights
I also saw that adding skip_provider_registration = true in the provider configuration can disable Terraform’s automatic resource provider registration.
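i.e. something roughly like this (a sketch; I believe azurerm 4.x replaces this argument with resource_provider_registrations = "none"):

provider "azurerm" {
  features {}
  # Disables automatic resource provider registration, per the docs mentioned above.
  skip_provider_registration = true
}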
In your experience, which approach works best? Or is there something else I’m missing? Any insights would be greatly appreciated!
Thanks in advance for your help!
Experienced engineer here. Can someone please explain to me what problem Terraform actually solves, compared to using the Azure CLI or Azure ARM templates? Or the AWS equivalent?
All it gives me is pain. State locking, state files, pain... for no benefit?
Why would I want two sources of truth for what's going on in my infrastructure? Why can't I just say what I want my infrastructure to be, have it compared to what's ACTUALLY THERE (not a state file), and then changed to what I want it to be? This is how ARM deployments work. And it's way better.
Edit: Seems like the answer is that it's good for people that have infrastructure spread across multiple providers with different APIs and want one source of truth / tool for everything. I consistently see it used to manage a single cloud provider, adding unnecessary complexity, which I find annoying and prompted the post. Thanks for the replies, you crazy terraform bastards.
Hi!
I'm trying to create a Linux function app under a Consumption plan in Azure, but I always get the error below:
Site Name: "my-func-name"): performing CreateOrUpdate: unexpected status 400 (400 Bad Request) with response: {"Code":"BadRequest","Message":"Creation of storage file share failed with: 'The remote server returned an error: (403) Forbidden.'. Please check if the storage account is accessible.","Target":null,"Details":[{"Message":"Creation of storage file share failed with: 'The remote server returned an error: (403) Forbidden.'. Please check if the storage account is accessible."},{"Code":"BadRequest"},{"ErrorEntity":{"ExtendedCode":"99022","MessageTemplate":"Creation of storage file share failed with: '{0}'. Please check if the storage account is accessible.","Parameters":["The remote server returned an error: (403) Forbidden."],"Code":"BadRequest","Message":"Creation of storage file share failed with: 'The remote server returned an error: (403) Forbidden.'. Please check if the storage account is accessible."}}],"Innererror":null}
I was using modules and such but to try to nail the problem I created a single main.tf file but still get the same error. Any ideas on what might be wrong here?
main.tf
# We strongly recommend using the required_providers block to set the
# Azure Provider source and version being used
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=4.12.0"
    }
  }

  backend "azurerm" {
    storage_account_name = "somesa" # CHANGEME
    container_name       = "terraform-state"
    key                  = "testcase.tfstate" # CHANGEME
    resource_group_name  = "my-rg"
  }
}

# Configure the Microsoft Azure Provider
provider "azurerm" {
  features {}
  subscription_id = "<my subscription id>"
}

resource "random_string" "random_name" {
  length  = 12
  upper   = false
  special = false
}

resource "azurerm_resource_group" "rg" {
  name     = "rg-myrg-eastus2"
  location = "eastus2"
}

resource "azurerm_storage_account" "sa" {
  name                            = "sa${random_string.random_name.result}"
  resource_group_name             = azurerm_resource_group.rg.name
  location                        = azurerm_resource_group.rg.location
  account_tier                    = "Standard"
  account_replication_type        = "LRS"
  allow_nested_items_to_be_public = false

  blob_properties {
    change_feed_enabled = false
    delete_retention_policy {
      days                     = 7
      permanent_delete_enabled = true
    }
    versioning_enabled = false
  }

  cross_tenant_replication_enabled  = false
  infrastructure_encryption_enabled = true
  public_network_access_enabled     = true
}

resource "azurerm_service_plan" "function_plan" {
  name                = "plan-myfunc"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  os_type             = "Linux"
  sku_name            = "Y1" # Consumption Plan
}

resource "azurerm_linux_function_app" "main_function" {
  name                 = "myfunc-app"
  resource_group_name  = azurerm_resource_group.rg.name
  location             = azurerm_resource_group.rg.location
  service_plan_id      = azurerm_service_plan.function_plan.id
  storage_account_name = azurerm_storage_account.sa.name

  site_config {
    application_stack {
      python_version = "3.11"
    }
    use_32_bit_worker = false
  }

  # Managed Identity Configuration
  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_role_assignment" "func_storage_blob_contributor" {
  scope                = azurerm_storage_account.sa.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_linux_function_app.main_function.identity[0].principal_id
}

resource "azurerm_role_assignment" "func_storage_file_contributor" {
  scope                = azurerm_storage_account.sa.id
  role_definition_name = "Storage File Data SMB Share Contributor"
  principal_id         = azurerm_linux_function_app.main_function.identity[0].principal_id
}

resource "azurerm_role_assignment" "func_storage_contributor" {
  scope                = azurerm_storage_account.sa.id
  role_definition_name = "Storage Account Contributor"
  principal_id         = azurerm_linux_function_app.main_function.identity[0].principal_id
}
Hey there, I'm trying to manipulate the following data structure (this is a variable called vendor_ids_map, typed as map(map(map(string))))...
{
  "vendor-1": {
    "availability-zone-1": {
      "ID-1": "<some-id>"
      "ID-2": "<some-other-id>"
      ...Other IDs
    },
    "availability-zone-2": {
      "ID-1": "<another-id>"
      "ID-2": "<yet-another-id>"
      "ID-3": "<and-another-id>"
      ...Other IDs
    },
    ...Other availability zones
  },
  "vendor-2": {
    "availability-zone-1": {
      "ID-1": "<some-id-1>"
      "ID-2": "<some-other-id-1>"
      ...Other IDs
    },
    "availability-zone-2": {
      "ID-1": "<another-id-1>"
      "ID-2": "<yet-another-id-1>"
      ...Other IDs
    },
    ...Other availability zones
  },
  ...Other vendors
}
...Into something like this...
{
  "vendor-1-ID-1": {
    "vendor": "vendor-1",
    "ID": "ID-1",
    "items": ["<some-id>", "<another-id>"]
  },
  "vendor-1-ID-2": {
    "vendor": "vendor-1",
    "ID": "ID-2",
    "items": ["<some-other-id>", "<yet-another-id>"]
  },
  "vendor-1-ID-3": {
    "vendor": "vendor-1",
    "ID": "ID-3",
    "items": ["<and-another-id>"]
  },
  "vendor-2-ID-1": {
    "vendor": "vendor-2",
    "ID": "ID-1",
    "items": ["<some-id-1>", "<another-id-1>"]
  },
  "vendor-2-ID-2": {
    "vendor": "vendor-2",
    "ID": "ID-2",
    "items": ["<some-other-id-1>", "<yet-another-id-1>"]
  },
  ...Other IDs that were specified in any of the `availability-zone` maps, for any of the vendors
}
...Basically what I'm trying to achieve is: the values for each of the matching IDs across all availability zones for a particular vendor are collected into a single array represented by a single key for that ID, for that vendor. Availability zone doesn't matter. But it does need to be dynamic, so if a new ID comes in for a particular AZ for a particular vendor, or a vendor is added/removed, etc. it should work out of the box.
The idea is to iterate over each of these to create resources... I will need the vendor and ID as part of the each.value
object (I guess I could also just split the key, but that feels a bit messy), as well as the array of items for that ID. If anybody has a better data structure suited for achieving this than what I've put, that's also fine - this is just what I thought would be easiest.
That said, I've been scratching my head at this for a little while now, and can't crack getting those nested IDs concatenated across nested maps... So I thought I'd ask the question in case someone a bit cleverer than myself has any ideas :) Thanks!
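For concreteness, one possible shape using flatten() and for-expression grouping (a sketch that assumes the vendor_ids_map variable above):

locals {
  # One object per (vendor, availability zone, ID) combination.
  vendor_id_entries = flatten([
    for vendor, azs in var.vendor_ids_map : [
      for az, ids in azs : [
        for id, value in ids : {
          vendor = vendor
          id     = id
          value  = value
        }
      ]
    ]
  ])

  # Group entries by "<vendor>-<ID>", collapsing availability zones.
  grouped = {
    for entry in local.vendor_id_entries :
    "${entry.vendor}-${entry.id}" => entry...
  }

  # Final shape: vendor, ID and the list of items per key.
  vendor_ids_flat = {
    for key, group in local.grouped : key => {
      vendor = group[0].vendor
      ID     = group[0].id
      items  = [for entry in group : entry.value]
    }
  }
}

The ... after the value expression switches the for expression into grouping mode, so duplicate "<vendor>-<ID>" keys collect a list of entries instead of raising an error. A resource could then use for_each = local.vendor_ids_flat and read each.value.vendor, each.value.ID and each.value.items.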
Let's say I declared a variable hostname in variable.tf. In which scenario should I use var.hostname, and in which ${var.hostname}?
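For illustration, a minimal sketch (the resource and tag names are just examples):

variable "hostname" {
  type = string
}

resource "aws_instance" "example" {
  # other arguments omitted
  tags = {
    Name = var.hostname                  # direct reference: the whole value is the variable
    FQDN = "${var.hostname}.example.com" # interpolation: the variable is part of a larger string
  }
}

Writing "${var.hostname}" on its own is redundant in current Terraform versions; it warns about interpolation-only expressions, and I believe terraform fmt will simplify it to var.hostname.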