Who's Watching?
After reducing the RDS size, I was eager to watch the monthly bill come down. Earlier, RDS was costing around $11/day; now it should have been around $3/day. But after a few days of monitoring, it didn't budge below $7/day. Splitting the cost by usage type showed that some RDS storage cost had actually increased. On inspecting, I found that this was due to the snapshots I had created before deleting the previous RDS instance. AWS provides free snapshot storage only up to the size of the RDS instance (200 GB in my case), but the old snapshots were 2 TB. Hence, they were beyond the free storage and were costing around $4/day. After deleting these snapshots, the RDS bill came down to the expected $3/day.
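For reference, this is roughly how you can spot orphaned manual snapshots from the CLI (the snapshot identifier in the delete command is a placeholder):
~ $ # List manual snapshots and their size (GiB)
~ $ aws rds describe-db-snapshots \
>     --snapshot-type manual \
>     --query 'DBSnapshots[].[DBSnapshotIdentifier,AllocatedStorage]' \
>     --output table
~ $ # Delete the ones no longer needed (identifier below is a placeholder)
~ $ aws rds delete-db-snapshot --db-snapshot-identifier old-prod-db-snapshot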
While I was hell-bent on RDS costs, something else had started sprawling. I was expecting the monthly bill to go down, but instead it had started to increase. On further inspection, CloudWatch was the new offender: it was costing $80/day. So all the effort to reduce the bill had been in vain.
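If you prefer doing this breakdown from the CLI instead of the Cost Explorer console, something along these lines works (the dates are placeholders):
~ $ # Daily CloudWatch cost, split by usage type
~ $ aws ce get-cost-and-usage \
>     --time-period Start=2025-06-01,End=2025-06-08 \
>     --granularity DAILY \
>     --metrics UnblendedCost \
>     --filter '{"Dimensions":{"Key":"SERVICE","Values":["AmazonCloudWatch"]}}' \
>     --group-by Type=DIMENSION,Key=USAGE_TYPE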
The first thing to inspect in CloudWatch was logs. Were we publishing too many logs? A few devs had touched the ECS pipeline around that time, so I guessed ECS logging had been set to debug mode and was costing more. But that wasn't the reason. In fact, all of logging was costing only $1/day. Still, $75/day was coming from data processing. What kind of data processing? I had no idea.
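For what it's worth, a quick smell test for "are logs the problem?" is to see which log groups hold the most data (storedBytes is cumulative, not per-day ingestion, so treat it as a rough signal only):
~ $ # Largest log groups by stored bytes
~ $ aws logs describe-log-groups \
>     --query 'logGroups[].[storedBytes,logGroupName]' \
>     --output text | sort -rn | head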
Since the timeline of the cost increase matched our migration date, it had to have something to do with the RDS migration. We use DMS to sync the production DB with the analytics DB (a CDC pipeline). When I checked its config, logging was enabled, and that was our culprit (or so I thought). Logging was indeed enabled, but it wasn't costing us anything; it was hardly pushing 1 GB of logs.
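If you'd rather check the same thing from the CLI, the logging flag lives inside the task settings, which come back as a JSON string; a crude grep is enough to eyeball it:
~ $ # Is task logging enabled?
~ $ aws dms describe-replication-tasks \
>     --query 'ReplicationTasks[].ReplicationTaskSettings' \
>     --output text | grep -o 'EnableLogging[^,}]*'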
Since it was billed as data processing, it had to be something related to metrics. I checked PutMetricData / GetMetricData usage, but that turned up nothing. So I listed all the metrics:
~ $ # List ALL metrics to find custom namespaces
~ $ aws cloudwatch list-metrics \
> --query 'Metrics[].Namespace' \
> --output text | tr '\t' '\n' | sort | uniq -c | sort -rn
18493 Glue
640 AWS/Usage
462 AWS/EBS
264 AWS/EC2
153 AWS/RDS
123 ECS/ContainerInsights
121 AWS/DMS
111 AWS/NetworkELB
64 AWS/ApplicationELB
45 AWS/Logs
34 AWS/Firehose
30 AWS/PrivateLinkEndpoints
28 AWS/Events
26 AWS/Lambda
26 AWS/EKS
20 AWS/S3
17 AWS/Glue
15 AWS/NATGateway
15 AWS/Athena
8 AWS/States
8 AWS/Location
6 AWS/Rekognition
6 AWS/HealthLake
5 AWS/Bedrock/DataAutomation
3 AWS/KMS
2 AWS/ECS
1 AWS/SecretsManager
As can be seen, Glue alone accounted for 18k+ metrics, and that was the culprit. So I started digging deeper into the Glue jobs. There were two of them. Claude gave me commands to disable metrics in them, but I was hesitant: whoever created these Glue jobs likely used the AWS Console, not the CLI. So I went hunting in the visual interface instead. And there it was: under Job Details → Monitoring options, “Job metrics” was checked. I confirmed it with Claude and disabled the setting. This should reduce our CloudWatch billing from today.
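For anyone whose Glue jobs were created via CLI or IaC instead of the console, the same toggle usually shows up as the --enable-metrics special parameter in the job's DefaultArguments; removing that key (or unchecking the console box, as I did) is what turns job metrics off. The job name below is a placeholder:
~ $ # Check whether job metrics are enabled for a job (name is a placeholder)
~ $ aws glue get-job --job-name my-etl-job \
>     --query 'Job.DefaultArguments' --output json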
Just a single click cost us around $80/day. So, knowing what to click is crucial.