This quickstart shows you how to repeat a set of Tasks using a Loop Task and a Workflow Task. You will iteratively call the Workflow you just created in quickstart 2 to get monthly data for the first 3 months in 2016.
Objectives
- Find the top 50 presidential candidates who received the most individual donations for each of the first 3 months in 2016.
- Save the results to 3 BigQuery tables named <tableName>_YYYYMM.
- Export the tables to 3 Cloud Storage files named <filename>_YYYYMM.
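The _YYYYMM naming convention above can be sketched in a few lines of Python. This is an illustration only; monthly_names and the "topCandidates" base name are hypothetical, not part of the tool.

```python
# Sketch of the naming convention used in this quickstart: for each of the
# first 3 months of 2016, the Workflow writes one BigQuery table and one
# Cloud Storage file suffixed with _YYYYMM. "topCandidates" is a
# placeholder base name chosen for this example.
def monthly_names(base, year, months):
    """Return the _YYYYMM-suffixed names for each reporting month."""
    return [f"{base}_{year}{month:02d}" for month in months]

tables = monthly_names("topCandidates", 2016, [1, 2, 3])
# tables == ["topCandidates_201601", "topCandidates_201602", "topCandidates_201603"]
```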
Copying the sample Workflow
- Open the sample Workflow: quickstart_003
- Click on the Copy button at the bottom of the page.
- Click OK when prompted "Do you want to make a copy of this workflow?".
- Specify the Workflow ID.
The Workflow ID is global and must be unique. It must start with a letter and can contain only letters, numbers, and underscores. One way to make your Workflow ID unique is to incorporate your email address into it (for example, the username portion of bigquery.user@gmail.com).
- For the Task named get_data_for_month, specify the child Workflow to execute.
First, clear the pre-existing child Workflow ID "quickstart_002" by highlighting the text and then hitting the delete key on your keyboard.
Once cleared, enter the Workflow ID of the Workflow you just created in Quickstart 2.
- For the Task named get_data_for_month, specify the input arguments to pass to the child Workflow.
- Click on Arguments
- For var_reporting_year, first select "set-value" as source, then enter 2016 in the text box.
- For var_reporting_month, specify the source to the month field of the record parameter var_iterate_month.
From the source dropdown list, select var_iterate_month.
When prompted for the field name, enter month in the text box, then click OK.
- For var_destination_table, first select "set-value" as source, then enter the destination table where you want to store the results in the text box.
The destination table should be of the format <project>:<dataset>.<table>. For example:
myProject:myDataset.topCandidates
- Lastly, for var_destination_storage, select "set-value" as source, then enter the destination Cloud Storage location where you want to export the resulting table.
The destination storage should be of the format <bucket>/<file>. For example:
myBucket/topCandidates
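The two destination formats above can be checked with simple patterns. The regexes below are illustrative assumptions for a quick sanity check, not the tool's actual validation rules.

```python
import re

# <project>:<dataset>.<table>, e.g. myProject:myDataset.topCandidates
TABLE_RE = re.compile(r"^[\w.-]+:\w+\.\w+$")
# <bucket>/<file>, e.g. myBucket/topCandidates
STORAGE_RE = re.compile(r"^[\w.-]+/.+$")

def is_valid_table(dest):
    """Loosely check the <project>:<dataset>.<table> form (assumption)."""
    return bool(TABLE_RE.match(dest))

def is_valid_storage(dest):
    """Loosely check the <bucket>/<file> form (assumption)."""
    return bool(STORAGE_RE.match(dest))
```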
- Click Save to save the Workflow.
Running the Workflow
- Click on the Run button to run the Workflow.
- Upon completion, you will see the resulting data in 3 destination BigQuery tables. Following the example here, they are:
myProject:myDataset.topCandidates_201601
myProject:myDataset.topCandidates_201602
myProject:myDataset.topCandidates_201603
You can preview the data from the BigQuery console UI.
- You will also see the exported data in 3 Cloud Storage files. You can view the files from the Cloud Storage console UI.
Clean up
You can delete the 3 BigQuery tables and 3 Cloud Storage files created by the Workflow execution.
Workflow explained
- This Workflow demonstrates two core Tasks: Loop Task and Execute Workflow Task.
- The first BigQuery Task named get_months creates the input table for the Loop Task to iterate over. The resulting temporary BigQuery table contains 3 rows, representing the first 3 months that we want to get donations data for.
The resulting table contains 3 rows:

| month |
|-------|
| 01    |
| 02    |
| 03    |
- Next, we enter a Loop Task. This Loop Task iterates over the temporary table we just created <var_get_months_output>.
- Each row we iterate over is stored in the record custom parameter <var_iterate_month>.
Custom parameter is created and managed by user in the Parameters panel.
So, for the very first iteration, the value <var_iterate_month[month]> will be 01. For the last iteration, the value <var_iterate_month[month]> will be 03.
- Next, the Execute Workflow Task named get_data_for_month executes the child Workflow you just created in quickstart 2.
- You pass parameters into the child Workflow, specifying the reporting year as 2016. More importantly, you set the reporting month to the iterating month. So, as the Loop iterates, the months 01, 02, and 03 are passed in turn as input to the child Workflow, and results for each month are generated.
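The Loop Task plus Execute Workflow Task pair can be sketched as a plain Python loop. Everything here is a stand-in: run_child_workflow is a hypothetical function mimicking the quickstart 2 Workflow, and the destinations are the example values used earlier.

```python
# Minimal sketch of the Loop Task: iterate over the month rows from
# get_months and invoke the child Workflow once per row.
def run_child_workflow(var_reporting_year, var_reporting_month,
                       var_destination_table, var_destination_storage):
    """Hypothetical stand-in for the quickstart 2 child Workflow:
    just report where that month's results would land."""
    suffix = f"{var_reporting_year}{var_reporting_month}"
    return (f"{var_destination_table}_{suffix}",
            f"{var_destination_storage}_{suffix}")

# The Loop Task's input table (the get_months output) as plain records.
month_rows = [{"month": "01"}, {"month": "02"}, {"month": "03"}]

results = [
    run_child_workflow(
        var_reporting_year="2016",
        var_reporting_month=row["month"],   # i.e. var_iterate_month[month]
        var_destination_table="myProject:myDataset.topCandidates",
        var_destination_storage="myBucket/topCandidates",
    )
    for row in month_rows
]
# results[0] == ("myProject:myDataset.topCandidates_201601",
#                "myBucket/topCandidates_201601")
```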
- The results are stored in the destination table you specified.
- The resulting BigQuery table is exported to the Cloud Storage location you specified.
View Completed Workflow
A completed version of the Workflow in this quickstart can be found at https://magnus.potens.io/?l=cXVpY2tzdGFydF8wMDM=