


AWS Athena is an interactive query engine that lets us run SQL queries on raw data stored in S3 buckets. It is a completely serverless solution, meaning you do not need to deploy or manage any infrastructure to use it, and you do not need to pay for data warehouse resources such as clusters. It is based on Presto, which was developed by Facebook in 2012 and open-sourced in 2013 as a data warehousing tool.

When we use AWS Athena to query data, we actually leverage three AWS services together:

- Amazon S3 to store your data,
- AWS Glue Data Catalog to create a catalog of the data that you store,
- AWS Athena to query it.

Note that AWS Athena doesn't store your data or any copy of it. Since the data is already stored in your S3 bucket and cataloged using AWS Glue, it doesn't need to. AWS Glue stores the catalog information related to the data that has been stored.

Unlike RDBMS databases, Athena works asynchronously: when you send a query, you get a query execution ID as the response. You can poll AWS Athena for the status of the query and, once it has completed, use this ID to get the query result through the AWS CLI, SDK or Management Console. Athena also saves the query result to an S3 bucket that you configure during setup.

Athena reads data in Amazon S3 using SerDe (serializer/deserializer) libraries. The available data formats include CSV (comma-separated), TSV (tab-separated), custom-delimited, JSON, Apache Avro, Parquet and ORC. With line-based text SerDes such as Regex and Grok, you can even define your own format, which makes it very flexible.

Data sources other than S3 can be queried using Athena Federated Query, which puts AWS Lambda between AWS Athena and your data storage to run serverless queries. The data connectors available for Athena Federated Query are Amazon CloudWatch Logs, Amazon DynamoDB, MongoDB (AWS managed or self-hosted), MySQL (Amazon RDS or self-hosted), PostgreSQL (Amazon RDS or self-hosted), Elasticsearch (Amazon Elasticsearch or self-hosted) and Redis (Amazon ElastiCache or self-hosted). One great advantage is that you can use this feature to run asynchronous queries on a relational database from within your application.

Using AWS Athena and Amazon QuickSight to Visualize Data

I used WHO's COVID-19 data, published as a CSV file on their website, which contains the official daily counts of COVID-19 cases, deaths and vaccine utilization reported to WHO by countries, territories and areas. In this data, countries are identified by ISO Alpha-2 country codes, so to visualize it on a map we need the ISO Alpha-2 codes together with the coordinates (latitude/longitude) of the countries. For that I used a gist shared on GitHub by a community member.

Download the data and upload it to different folders in an Amazon S3 bucket using this structure:

➜ ~ aws s3 ls s3://sufle-athena-bucket --recursive
22:55:01 12720 country/countries_codes_and_coordinates.csv

Let's go to the AWS Athena service in the AWS Management Console. If this is the first time you use the service, it will ask you for some initial settings:

- Query result location: where AWS Athena will store the results of the queries that you run. Remember that, as I mentioned, Athena runs asynchronously and saves its results to a bucket. Create a new bucket in Amazon S3 if you like, or select an existing one. In our example we have selected sufle-athena-output-bucket.

After you are done with the initial configuration, click the "Connect data source" button on the "Data sources" tab to start creating your first catalog.

On "Step 1: Choose a data source", choose:

- Choose where your data is located: Query data in Amazon S3
- Choose a metadata catalog: AWS Glue Data Catalog

On the next step, "Step 2: Connection details", select:

- Connection details: under "Choose an AWS Glue Data Catalog", select "AWS Glue Data Catalog in this account"
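
The asynchronous flow described earlier (send a query, receive a query execution ID, poll until the query completes) can be sketched in Python roughly as follows. This is only a minimal sketch, not an official client: the helper name `wait_for_query`, the polling interval and the poll limit are my own illustrative choices, and `client` is assumed to behave like `boto3.client("athena")`, whose `get_query_execution` call reports the query state.

```python
import time

# Terminal states reported by Athena's GetQueryExecution API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(client, query_execution_id, poll_seconds=1.0,
                   max_polls=60, sleep=time.sleep):
    """Poll Athena until the query reaches a terminal state; return that state.

    `client` is assumed to behave like boto3.client("athena"). The helper name,
    the interval and the poll limit are illustrative, not part of any AWS API.
    """
    for _ in range(max_polls):
        response = client.get_query_execution(QueryExecutionId=query_execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)  # still QUEUED or RUNNING, so wait and ask again
    raise TimeoutError(
        f"query {query_execution_id} did not finish after {max_polls} polls"
    )


# Hypothetical usage; it needs boto3 and AWS credentials, so it is left commented out.
# import boto3
# athena = boto3.client("athena")
# started = athena.start_query_execution(
#     QueryString="SELECT 1",
#     ResultConfiguration={"OutputLocation": "s3://sufle-athena-output-bucket/"},
# )
# query_id = started["QueryExecutionId"]
# if wait_for_query(athena, query_id) == "SUCCEEDED":
#     results = athena.get_query_results(QueryExecutionId=query_id)
```

Passing the client and the `sleep` function in as arguments keeps the helper easy to try out with a stand-in object before pointing it at a real account.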

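
Before uploading the two files, it can help to check locally that the ISO Alpha-2 codes in the WHO data really match the codes in the coordinates file. The sketch below is one way to do that in plain Python. The column names (`Country_code`, `Alpha-2 code`, `Latitude (average)`, `Longitude (average)`) are assumptions about the two CSV headers and may need adjusting, the function names are hypothetical, and the sample values used with it are made up. It is only a local sanity check and says nothing about how the join is later expressed in Athena.

```python
import csv
import io


def index_coordinates(countries_csv, code_col="Alpha-2 code",
                      lat_col="Latitude (average)",
                      lon_col="Longitude (average)"):
    """Map each ISO Alpha-2 code to a (latitude, longitude) pair.

    Column names are assumed; adjust them to the actual CSV header.
    """
    # skipinitialspace lets quoted fields that follow ", " parse correctly.
    reader = csv.DictReader(io.StringIO(countries_csv), skipinitialspace=True)
    coords = {}
    for row in reader:
        code = row[code_col].strip().upper()
        coords[code] = (float(row[lat_col]), float(row[lon_col]))
    return coords


def attach_coordinates(who_csv, coords, code_col="Country_code"):
    """Yield WHO rows extended with latitude/longitude.

    Rows whose country code is missing from `coords` are skipped.
    """
    for row in csv.DictReader(io.StringIO(who_csv)):
        code = row[code_col].strip().upper()
        if code in coords:
            lat, lon = coords[code]
            yield {**row, "latitude": lat, "longitude": lon}


# Made-up sample values, only to show the shape of the data.
sample_countries = (
    '"Country","Alpha-2 code","Latitude (average)","Longitude (average)"\n'
    '"Exampleland", "XX", "10.5", "20.25"\n'
)
sample_who = "Date_reported,Country_code,New_cases\n2021-01-01,XX,7\n"

sample_rows = list(attach_coordinates(sample_who, index_coordinates(sample_countries)))
```

Rows that drop out of the result point to country codes that exist in one file but not in the other, which is worth knowing before building a map on top of the data.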