
Building and Orchestrating ETL Pipelines by Using Athena and Step Functions
Lab overview and objectives
In this lab, you will use AWS Step Functions to build an extract, transform, and load (ETL) pipeline that uses Amazon Simple Storage Service (Amazon S3), an AWS Glue Data Catalog, and Amazon Athena to process a large dataset.
Step Functions can help you automate business processes by creating workflows, also referred to as state machines. In this lab, you will use Step Functions to build a workflow that invokes Athena to take a series of actions. An example of an action is running a query to discover if AWS Glue tables exist.
The AWS Glue Data Catalog provides a persistent metadata store, including table definitions, schemas, and other control information. This information will help you to create the ETL pipeline.
Athena is a serverless interactive query service that simplifies analyzing data in Amazon S3 by using standard SQL.

You will design the workflow so that if AWS Glue tables don't exist, the workflow will invoke additional Athena queries to create them. If the tables do exist, the workflow will invoke an additional Athena query to create a view that combines data from two tables. You can then query that view to make interesting time-based and location-based discoveries in the large dataset.
After completing this lab, you should be able to do the following:
• Create and test a Step Functions workflow by using Step Functions Studio.
• Create an AWS Glue database and tables.
• Store data on Amazon S3 in Parquet format to use less storage space and to promote faster data reads.
• Partition data that is stored on Amazon S3 and use Snappy compression to optimize performance.
• Create an Athena view.
• Add an Athena view to a Step Functions workflow.
• Construct an ETL pipeline by using Step Functions, Amazon S3,
Athena, and AWS Glue.

This lab will require approximately 120 minutes to complete.
AWS service restrictions
In this lab environment, access to AWS services and service actions might be restricted to the ones that are needed to complete the lab instructions. You might encounter errors if you attempt to access other services or perform actions beyond the ones that are described in this lab.
Previously, you created a proof of concept (POC) to demonstrate how to use AWS Glue to infer a data schema and manually adjust column names. Then, you used Athena to query the data. Although Mary likes this approach, each time that she starts a new project she must complete many manual steps. She has asked you to create a reusable data pipeline that will help her to quickly start building new data processing projects.
One of Mary’s projects is to study New York City taxi data. She knows the column names for the table data and has already created views and ingestion SQL commands for you. She wants to study taxi usage patterns in New York City in the early part of 2020.

Mary has requested that you store the table data partitioned by month in Parquet format with Snappy compression. This will promote efficiency and reduce cost. Because it is a POC, Mary is OK with you using hard-coded values for column names, partitions, views, and S3 bucket information.
Mary has provided the following:
• Links to access the taxi data
• The partitions that she would like to create (pickup_year and pickup_month)
• SQL ingestion scripts
• A script that will create a view in SQL that she wants to use for this particular project
When you start the lab, the environment will contain the resources that are shown in the following diagram.
By the end of the lab, you will have created the architecture that is shown in the following diagram.
After doing some research, you decided to take advantage of the flexibility of Step Functions to create the ETL pipeline logic. With Step Functions, you can handle initial runs where the table data and SQL view don't exist, in addition to subsequent runs where the tables and view do exist.
OK, let’s get started!
Accessing the AWS Management Console
1. At the top of these instructions, choose Start Lab.
o The lab session starts.
o A timer displays at the top of the page and shows the time remaining in the session.
Tip: To refresh the session length at any time, choose Start Lab again before the timer reaches 0:00.
o Before you continue, wait until the circle icon to the right of the AWS link in the upper-left corner turns green. When the lab environment is ready, the AWS Details panel will also display.
2. To connect to the AWS Management Console, choose the AWS link in the upper-left corner.

o A new browser tab opens and connects you to the console.
Tip: If a new browser tab does not open, a banner or icon is usually at the top of your browser with the message that your browser is preventing the site from opening pop-up windows. Choose the banner or icon, and then choose Allow pop-ups.
Task 1: Analyzing existing resources and loading the source data
In this first task, you will analyze an AWS Identity and Access Management (IAM) role and an S3 bucket that were created for you. Then, you will copy the source taxi data from a public S3 bucket into your bucket. You will use this data later in the lab when you create a Step Functions workflow.
3. Open all the AWS service consoles that you will use during this lab.
Tip: Since you will use the consoles for many AWS services throughout this lab, it will be easier to have each console open in a separate browser tab.

o In the search box to the right of Services, search for Step Functions.
o Open the context menu (right-click) on the Step Functions entry that appears in the search results, and choose the option to open the link in a new tab.
o Repeat this same process to open the AWS service consoles for each of these additional services:
§ AWS Glue
§ Athena
§ IAM
§ Amazon S3
§ AWS Cloud9
o Confirm that you now have each of the six AWS service consoles open in different browser tabs.
4. Analyze the existing IAM role that is named StepLabRole.
o In the IAM console, in the navigation pane, choose Roles.
o Search for StepLabRole and choose the link for the role when it appears in the results.
o On the Permissions tab, expand and view the Policy-For-Step IAM policy that is attached to the role.
Analysis: When you create the Step Functions workflow, you will associate this role with the workflow. This policy will allow the workflow to make calls to the Athena, Amazon S3, AWS Glue, and AWS Lake Formation services.
5. Analyze the existing S3 bucket.
o In the S3 console, in the list of buckets, choose the link for the bucket that has gluelab in its name.
Notice that it doesn’t currently hold any objects. Later in this lab, you will reference this bucket in the Step Functions workflow that you configure.
o Copy the bucket name to a text file. You will use this name multiple times later in this lab.
6. Connect to the AWS Cloud9 IDE.
o In the Cloud9 console, on the Your environments page, under Cloud9 Instance, choose Open IDE.
7. Load data into your bucket from the source dataset.

o Run the following commands in the Cloud9 bash terminal. Replace <bucket-name> with your actual bucket name (the one with gluelab in the name).
Important: Be sure to keep the quotes around the bucket name.
mybucket="<bucket-name>"
echo $mybucket
Tip: You might be prompted about safely pasting multiline text. To disable this prompt for the future, clear Ask before pasting multiline code. Choose Paste.
Analysis: With these commands, you assigned your bucket name to a shell variable. You then echoed the value of that variable to the terminal. Saving the bucket name as a variable will be useful when you run the next few commands.
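As a sketch of how the variable gets reused, the following self-contained example uses a hypothetical bucket name (gluelab-example-bucket is for illustration only, not your real bucket) to show how the shell expands $mybucket inside the quoted S3 URIs that the upcoming copy commands build:

```shell
# Hypothetical bucket name for illustration only; in the lab, assign
# your actual gluelab bucket name instead.
mybucket="gluelab-example-bucket"

# Double quotes let the shell expand $mybucket inside the S3 URI,
# which is the same pattern the copy commands in this task use.
dest="s3://$mybucket/nyctaxidata/data/yellow_tripdata_2020-01.csv"
echo "$dest"
```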
o Copy the yellow taxi data for January into a prefix (folder) in your bucket called nyctaxidata/data.
wget -qO- https://aws-tc-largeobjects.s3.us-west-2.amazonaws.com/CUR-TF-200-ACDENG-1/step-lab/yellow_tripdata_2020-01.csv | aws s3 cp - "s3://$mybucket/nyctaxidata/data/yellow_tripdata_2020-01.csv"
Note: The command takes about 20 seconds to complete. The file that you are copying is approximately 500 MB in size.

Wait for the terminal prompt to display again before continuing.
o Copy the yellow taxi data for February into a prefix in your bucket called nyctaxidata/data.
wget -qO- https://aws-tc-largeobjects.s3.us-west-2.amazonaws.com/CUR-TF-200-ACDENG-1/step-lab/yellow_tripdata_2020-02.csv | aws s3 cp - "s3://$mybucket/nyctaxidata/data/yellow_tripdata_2020-02.csv"
Tip: Much more taxi data is available, and in a production solution, you would likely want to include many years of data. However, for POC purposes, using 2 months of data will suffice.
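The copy commands above stream each download straight into Amazon S3: wget -qO- writes the downloaded file to stdout, and aws s3 cp - reads from stdin, so the roughly 500 MB files never have to be saved to the Cloud9 instance's disk first. A self-contained analogue of the same pipe, with a local file standing in for the S3 destination:

```shell
# Stream producer output through a pipe into a destination, mirroring
# "wget -qO- <url> | aws s3 cp - s3://bucket/key" without needing AWS.
printf 'col_a,col_b\n1,2\n3,4\n' | cat - > /tmp/streamed.csv

# Confirm all three lines reached the destination file
lines=$(wc -l < /tmp/streamed.csv | tr -d ' ')
echo "$lines"
```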
o Copy the location information (lookup table) into a prefix in your bucket called nyctaxidata/lookup.
Important: The space in the taxi _zone_lookup.csv file name is intentional.
wget -qO- https://aws-tc-largeobjects.s3.us-west-2.amazonaws.com/CUR-TF-200-ACDENG-1/step-lab/taxi+_zone_lookup.csv | aws s3 cp - "s3://$mybucket/nyctaxidata/lookup/taxi _zone_lookup.csv"
8. Analyze the structure of the data that you copied.

The data in the lookup table has the following structure. The following are the first few lines of the file:
"LocationID","Borough","Zone","service_zone"
1,"EWR","Newark Airport","EWR"
2,"Queens","Jamaica Bay","Boro Zone"
3,"Bronx","Allerton/Pelham Gardens","Boro Zone"
4,"Manhattan","Alphabet City","Yellow Zone"
5,"Staten Island","Arden Heights","Boro Zone"
6,"Staten Island","Arrochar/Fort Wadsworth","Boro Zone"
7,"Queens","Astoria","Boro Zone"
8,"Queens","Astoria Park","Boro Zone"
...truncated
Analysis: The structure is defined by listing the column names on the first line. Mary is familiar with these column names; therefore, the SQL commands that she provided will work without modification later in the lab.
The yellow taxi data file structure for January and February is similar to the following:
VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
1,2020-01-01 00:28:15,2020-01-01 00:33:03,1,1.20,1,N,238,239,1,6,3,0.5,1.47,0,0.3,11.27,2.5
1,2020-01-01 00:35:39,2020-01-01 00:43:04,1,1.20,1,N,239,238,1,7,3,0.5,1.5,0,0.3,12.3,2.5
1,2020-01-01 00:47:41,2020-01-01 00:53:52,1,.60,1,N,238,238,1,6,3,0.5,1,0,0.3,10.8,2.5
…truncated

As with the lookup table file, the first line in each file defines the column names.
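A quick way to confirm this structure yourself is to inspect the header row. The following sketch recreates the first few lines of the lookup file (sample rows shown above) and prints the column names from line 1:

```shell
# Recreate the first lines of the lookup file, using the sample
# data shown earlier in this task
cat > /tmp/lookup_sample.csv <<'EOF'
"LocationID","Borough","Zone","service_zone"
1,"EWR","Newark Airport","EWR"
2,"Queens","Jamaica Bay","Boro Zone"
EOF

# The first line defines the column names; strip the quotes and
# separate the names with spaces
header=$(head -n 1 /tmp/lookup_sample.csv | tr -d '"' | tr ',' ' ')
echo "$header"
```

In the Cloud9 terminal you could run the same head command against the real files before uploading them, which is a handy sanity check that Mary's SQL column names line up with the data.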
Congratulations! In this task, you successfully loaded the source data. Now, you can start building.
Task 2: Automating creation of an AWS Glue database
In this task, you will create a Step Functions workflow that will use Athena to check whether an AWS Glue database exists. If the database doesn’t already exist, Athena will create it.
9. Begin to create a workflow.
o In the Step Functions console, to open the navigation pane, choose the menu icon, and then choose State machines.
o Choose Create state machine.
o Keep Design your workflow visually selected.
o For Type, keep Standard selected, and choose Next.
The Step Functions Workflow Studio interface displays.

10. Design the workflow by using the Workflow Studio interface.
o If a Welcome to Workflow Studio message appears, dismiss it by choosing the X icon.
Notice that a starter workflow, with Start and End tasks, is already defined, as shown in the following image.
o In the Actions panel on the left, search for Athena
o Drag the StartQueryExecution task to the canvas between the Start and End tasks, as shown in the following image.
Analysis: Athena uses the AWS Glue Data Catalog to store and retrieve table metadata for data that is stored in Amazon S3. The table metadata indicates to the Athena query engine how to find, read, and process the data that you want to query.
o In the Inspector panel on the right:
§ Change State name to Create Glue DB
§ Keep the Integration type as Optimized.

§ For API Parameters, replace the default JSON code with the following. Replace <bucket-name> with your actual bucket name (the one with gluelab in the name).
{
  "QueryString": "CREATE DATABASE if not exists nyctaxidb",
  "WorkGroup": "primary",
  "ResultConfiguration": {
    "OutputLocation": "s3://<bucket-name>/athena/"
  }
}
§ Select Wait for task to complete.
Note: This ensures that the workflow will wait until the task is complete before continuing to any additional downstream tasks. This particular task is complete when Athena verifies that the database exists or creates it.
o Keep Next state as Go to end.
o At the top of the page, choose Next.
11. Review the settings and finish creating the workflow.

The Review generated code page shows the JSON code that was created as a result of the settings that you chose in the Workflow Studio interface.
Analysis: In the JSON code, notice how Step Functions will invoke Athena to run a query. The query will check if a database named nyctaxidb already exists. If it doesn’t, the query will create the database. The database will be stored in the gluelab S3 bucket in a folder named athena.
o Choose Next.
o For State machine name, enter WorkflowPOC
o For Permissions, select Choose an existing role, and ensure that StepLabRole is selected.
o Keep the other default settings, and choose Create state machine.
12. Test the workflow.
Now that you have created a workflow, run it and see what happens on this first run.
o Choose Start execution.
o For Name, enter TaskTwoTest and then choose Start execution.

Important: Be sure to name your Start execution tests exactly as documented in these lab instructions; otherwise, you might not receive full credit for your work later when you submit the lab for a score.
On the Details tab at the top of the page, the status first shows as Running.
The initial Graph inspector view shows the Create Glue DB step in blue, as shown in the following image.
o Wait a minute or two while the workflow runs.
When the Create Glue DB step turns green, as shown in the following image, the step succeeded.
13. Verify that a result file was created in the S3 bucket.
o In the S3 console, choose the link for the gluelab bucket, or if you are already on that page, use the refresh icon to refresh the page.

You should see a new athena prefix (folder) in the bucket.
o Choose the athena link to view the contents.
The folder contains a text file. Notice that the size of the file is 0 B, which indicates that the file is empty.
14. Verify that the AWS Glue database was created.
o In the AWS Glue console, in the navigation pane, under Data Catalog, choose Databases.
o Select the nyctaxidb database.
Notice that the database currently doesn’t have any tables. This is expected. You will add steps to the workflow later to create tables. However, this is great progress for now!
In this task, you successfully created an AWS Glue database by using a Step Functions workflow.
Task 3: Creating the task to check whether tables exist in the AWS Glue database

In this task, you will update the workflow so that it will check whether tables exist in the AWS Glue database that you just created.
15. Add another task to your workflow.
o In the Step Functions console, choose the WorkflowPOC state machine, and then choose Edit.
o Choose Workflow Studio on the right side of the page.
o In the Actions panel, search for Athena
o Drag another StartQueryExecution task to the canvas between the Create Glue DB task and the End task.
16. Configure the task and save the change.
o With the new StartQueryExecution task selected, in the Inspector panel, change State name to Run Table Lookup
After you rename the state, the workflow displays as shown in the following image.

o For API Parameters, replace the default JSON code with the following. Replace <bucket-name> with your actual bucket name (the one with gluelab in the name).
{
  "QueryString": "show tables in nyctaxidb",
  "WorkGroup": "primary",
  "ResultConfiguration": {
    "OutputLocation": "s3://<bucket-name>/athena/"
  }
}
o Select Wait for task to complete.
o Keep Next state as Go to end.
o At the top of the page, choose Apply and exit.
Confirm the definition. It should look similar to the following JSON code.
{
  "Comment": "A description of my state machine",
  "StartAt": "Create Glue DB",
  "States": {
    "Create Glue DB": {
      "Type": "Task",
      "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
      "Parameters": {
        "QueryString": "CREATE DATABASE if not exists nyctaxidb",
        "WorkGroup": "primary",
        "ResultConfiguration": {
          "OutputLocation": "s3://<bucket-name>/athena/"
        }
      },
      "Next": "Run Table Lookup"
    },
    "Run Table Lookup": {
      "Type": "Task",
      "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
      "Parameters": {
        "QueryString": "show tables in nyctaxidb",
        "WorkGroup": "primary",
        "ResultConfiguration": {
          "OutputLocation": "s3://<bucket-name>/athena/"
        }
      },
      "End": true
    }
  }
}
o Choose Save.
o When prompted about how the IAM role might need new permissions, choose Save anyway.
Note: Recall that you previously reviewed the permissions that are granted to this IAM role. The permissions are sufficient to complete all the tasks in this lab.
17. Test the updated workflow.
o Choose Start execution.
o For Name, enter TaskThreeTest and then choose Start execution.

Watch as the workflow runs each task and the tasks change from white to blue to green in the Graph inspector section. The following image shows the graph after the workflow succeeds.
In the Execution event history section, notice that the status of each task is provided in addition to the time that each took to run.
The workflow takes about 1 minute to run, and it will not find any tables.
o After the workflow completes, in the Graph inspector section, choose the Run Table Lookup task.
o In the Details panel to the right, choose the Step output tab.
On or about line 9, notice that the task generated a QueryExecutionId. You will use this in the next task.
o In the Amazon S3 console, choose the link for the gluelab bucket, and then choose the athena link.
Notice that the folder (prefix) has more files now.

Tip: You may need to refresh the browser tab to see them.
The .txt files are blank, but a metadata file now exists and contains some data. AWS Glue will use the metadata file internally.
Congratulations! In this task, you updated the workflow by adding a task that checks whether tables exist in the AWS Glue database.
Task 4: Adding routing logic to the workflow based on whether AWS Glue tables exist
In this task, you will reference the execution ID that the Run Table Lookup task returns to check for existing tables in the AWS Glue database. You will also use a choice state to determine the logical route to follow based on the result of the previous task.
18. Update the workflow to look up query results.
o In the Step Functions console, choose the WorkflowPOC state machine, and then choose Edit.
o Choose Workflow Studio on the right side of the page.

o In the Actions panel, search for Athena
o Drag a GetQueryResults task to the canvas between the Run Table Lookup task and the End task.
Do not use a GetQueryExecution task.
o With the GetQueryResults task selected, change State name to Get lookup query results
o For API Parameters, replace the existing contents with what is in the code block below.
{
  "QueryExecutionId.$": "$.QueryExecution.QueryExecutionId"
}
Analysis: This task will use the query execution ID that the prior task made available as an output value. By passing the value along, the next task (which you haven’t added yet) can use the value to evaluate whether tables were found.
Note: You don’t need to internally poll for this task to complete, so you don’t need to select Wait for task to complete.
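To make the JSONPath concrete, the following sketch builds a hypothetical, truncated step-output payload (the real Run Table Lookup output contains many more fields) and pulls out the same nested value that $.QueryExecution.QueryExecutionId selects:

```shell
# Hypothetical, truncated step-output payload for illustration only
cat > /tmp/step_output.json <<'EOF'
{
  "QueryExecution": {
    "QueryExecutionId": "11111111-2222-3333-4444-555555555555"
  }
}
EOF

# The JSONPath $.QueryExecution.QueryExecutionId walks the same nesting
# that this sed expression extracts from the sample payload
qid=$(sed -n 's/.*"QueryExecutionId": "\([^"]*\)".*/\1/p' /tmp/step_output.json)
echo "$qid"
```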
o Choose Apply and exit.

Confirm the definition. It should look similar to the following JSON code where the placeholders contain your actual gluelab bucket name.
{
  "Comment": "A description of my state machine",
  "StartAt": "Create Glue DB",
  "States": {
    "Create Glue DB": {
      "Type": "Task",
      "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
      "Parameters": {
        "QueryString": "CREATE DATABASE if not exists nyctaxidb",
        "WorkGroup": "primary",
        "ResultConfiguration": {
          "OutputLocation": "s3://<bucket-name>/athena/"
        }
      },
      "Next": "Run Table Lookup"
    },
    "Run Table Lookup": {
      "Type": "Task",
      "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
      "Parameters": {
        "QueryString": "show tables in nyctaxidb",
        "WorkGroup": "primary",
        "ResultConfiguration": {
          "OutputLocation": "s3://<bucket-name>/athena/"
        }
      },
      "Next": "Get lookup query results"
    },
    "Get lookup query results": {
      "Type": "Task",
      "Resource": "arn:aws:states:::athena:getQueryResults",
      "Parameters": {
        "QueryExecutionId.$": "$.QueryExecution.QueryExecutionId"
      },
      "End": true
    }
  }
}
o Choose Save, and then choose Save anyway.
19. Add a choice state to the workflow.
o Choose Workflow Studio on the right side of the page.
o In the Actions panel, choose the Flow tab.
o Drag a Choice state to the canvas between the Get lookup query results task and the End task.
o With the Choice state selected, change State name to ChoiceStateFirstRun
o In the Choice Rules section, for Rule #1, choose the edit icon and configure the following:
§ Choose Add conditions.
§ Keep the default Simple condition.
§ For Not, choose NOT.
§ For Variable, enter the following:
$.ResultSet.Rows[0].Data[0].VarCharValue
§ For Operator, choose is present.

§ Ensure that your settings match the following image.
§ Choose Save conditions.
20. Add two pass states to the workflow.
o Drag a Pass state to the canvas after the ChoiceStateFirstRun state, on the left side under the arrow that is labeled not….
o With the Pass state selected, change State name to REPLACE ME TRUE STATE
Note: This is a temporary name, which you will update later.
o Drag another Pass state to the canvas after the ChoiceStateFirstRun state, on the right side under the arrow that is labeled Default.
o With the Pass state selected, change State name to REPLACE ME FALSE STATE
Your workflow canvas should now look like the following image:
o Choose Apply and exit.

o Choose Save, and then choose Save anyway.
Analysis: When you run the workflow and the Get lookup query results task is complete, the choice state will evaluate the results of the last query.
If tables aren't found (the $.ResultSet.Rows[0].Data[0].VarCharValue logic evaluates this), the workflow will take the REPLACE ME TRUE STATE route. In the next task, you will replace this state with a process to create tables.
Otherwise, if tables are found, the workflow will take the Default route (the REPLACE ME FALSE STATE route). Later in this lab, you will replace this state with a process to check for any new data (for example, February taxi data) and then insert it into an existing table.
Congratulations! In this task, you successfully added a choice state to support evaluating the results of the Get lookup query results task.
Task 5: Creating the AWS Glue table for the yellow taxi data

In this task, you will define logic in the workflow that will create AWS Glue tables if they don’t exist.
21. Add another Athena StartQueryExecution task to the workflow and configure it to create a table.
o To return to the canvas, choose Workflow Studio.
o In the Actions panel, search for athena
o Drag a StartQueryExecution task to the canvas between the ChoiceStateFirstRun state and the REPLACE ME TRUE STATE state.
o With the StartQueryExecution task selected, change State name to Run Create data Table Query
o For Integration type, keep Optimized selected.
o For API Parameters, replace the default JSON code with the following. Replace both <bucket-name> placeholders with your actual bucket name (the one with gluelab in the name).
{
  "QueryString": "CREATE EXTERNAL TABLE nyctaxidb.yellowtaxi_data_csv( vendorid bigint, tpep_pickup_datetime string, tpep_dropoff_datetime string, passenger_count bigint, trip_distance double, ratecodeid bigint, store_and_fwd_flag string, pulocationid bigint, dolocationid bigint, payment_type bigint, fare_amount double, extra double, mta_tax double, tip_amount double, tolls_amount double, improvement_surcharge double, total_amount double, congestion_surcharge double) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://<bucket-name>/nyctaxidata/data/' TBLPROPERTIES ('skip.header.line.count'='1')",
  "WorkGroup": "primary",
  "ResultConfiguration": {
    "OutputLocation": "s3://<bucket-name>/athena/"
  }
}
Analysis: Recall that you reviewed the structure of the source data files that you copied into your gluelab bucket. The yellow_tripdata_2020-01.csv and yellow_tripdata_2020-02.csv source files are in comma-separated value (CSV) format.
The first line in each file defines the columns of data that are contained in the file. The columns include vendorid, tpep_pickup_datetime, and the other columns that are defined in the CREATE EXTERNAL TABLE SQL statement that you just entered for the task.
The CSV file doesn't define data types for each column of data, but your AWS Glue table does define them (for example, as bigint and string). Note that, by defining the table as EXTERNAL, you indicate that the table data will remain in Amazon S3, in the location defined by the LOCATION part of the command (s3://<bucket-name>/nyctaxidata/data/).
The QueryString that you are sending to Athena in this task uses the Create Table as Select (CTAS) feature of Athena. CTAS statements use standard SELECT queries to create new tables. By using this feature, you can extract, transform, and load data into Amazon S3 for processing. For more information, see "Using CTAS and INSERT INTO for ETL and Data Analysis" at https://docs.aws.amazon.com/athena/latest/ug/ctas-insert-into-etl.html.
o Select Wait for task to complete.
Note: It is important for the table to be fully created before the workflow continues.
o For Next state, choose Go to end.
o On the canvas, select the REPLACE ME TRUE STATE state and delete it by pressing the Delete key.
Important: Verify that the REPLACE ME TRUE STATE state is no longer on the canvas.

Your workflow canvas should now look like the following image:
o Choose Apply and exit.
o Choose Save, and then choose Save anyway.
22. Test the workflow.
o Choose Start execution.
o For Name, enter TaskFiveTest and then choose Start execution.
It will take a few minutes for each step in the workflow to go from white, to blue, to green. Wait for the workflow to complete successfully.
This run will not create a new database. However, because it won’t find any tables in the database, it should take the route with the Run Create data Table Query task, as shown in the following image.

23. Verify that the updated workflow created a table in the AWS Glue database the first time that you ran it.
o In the Amazon S3 console, navigate to the contents of the athena folder in your gluelab bucket.
Notice that the folder contains another metadata file and additional empty text files.
Note: The empty text files are basic output files from Step Functions tasks. You can ignore them.
o In the AWS Glue console, in the navigation pane, choose Tables.
Notice that a yellowtaxi_data_csv table now exists. This is the AWS Glue table that Athena created when your Step Functions workflow invoked the Run Create data Table Query task.
o To view the schema details, choose the link for the yellowtaxi_data_csv table.
The schema looks like the following image.

24. Run the workflow again to test the other choice route.
o Choose Start execution.
o For Name, enter and then choose Start execution again.
o Wait for the workflow to complete successfully.
Analysis: You want to ensure that, if the workflow finds the new table (as it should this time), the workflow will take the other choice route and invoke the REPLACE ME FALSE STATE state.
The following image shows the completed workflow.
This run didn’t re-create the database or try to overwrite the table that was created during the previous run. Step Functions did generate some output files in Amazon S3 with updated AWS Glue metadata.
o In the Step Functions console, choose the link for the WorkflowPOC state machine.

In this task, you successfully created an AWS Glue table that points to the yellow taxi data.
Task 6: Creating the AWS Glue table for location lookup data
In this task, you will create another AWS Glue table by updating and running the Step Functions workflow. The new table will reference the taxi _zone_lookup.csv source data file in Amazon S3. After you create this table, you will be able to join the yellow taxi data table with the lookup table in a later task. Joining the two tables will help you to make more sense of the data.
Recall that the lookup table holds taxi activity location information. The following text box shows the column names from the first line of the CSV-formatted source data file. The following also shows the first line of data in the file and provides an example of the data types that appear in each column.
“LocationID”,”Borough”,”Zone”,”service_zone” 1,”EWR”,”Newark Airport”,”EWR”
The query will again use CTAS to have Athena create an external table.
26. Update the workflow to create the lookup table.

o Still in the Step Functions console, use the method that you used in the previous step