Cloud Resource connections reference Cloud Storage buckets. The reference is established when creating an external table. For the connection to read data in the bucket, permissions need to be granted to it. After the connection is created, a service account id is generated. This id is listed in the navigator popup.
Grant permissions to the Connection using the Service account id
- In External Connections, hover the connection to display the connection info popup.
- In the popup identify and copy the service account id.
- In Google Cloud, navigate to IAM of the project or bucket.
- Add the service account id as principal and apply a role such as Storage Object Viewer.
Use Cloud Resource Connection
A Cloud Resource Connection can be used to create a BigLake table. To create a BigLake Table:
- Navigate to the dataset you would like to create the table and right click to open the context menu.
- In the context menu, select Create > External Table.
- In the Create External Table modal
- Select Google Cloud Storage as the Source.
- Input fields as normal.
- Check Use Connection. This will enable using a connection and thus create a BigLake table. After checking, a list of connections will display.
- In the Connection drop down, only those connections of type Cloud Resource will display. From the list, select the Cloud Resource connection created in the previous step.
- Click OK to create the table.
Cloud SQL
Data that resides in Cloud SQL can be queried from BigQuery using a connection. Cloud SQL connections can be either MySQL or PostgreSQL. Cloud SQL connections can be in federated queries.
Create Cloud SQL Connection
- In the navigator, right click External Connections and select Create Connection.
- In Create Connection modal
- For Type, select either Cloud SQL – MySQL or Cloud SQL PostgreSQL.
- ID is the name of the connection.
- Select a location that is compatible with the external data source region.
- Friendly name and description are optional.
- For Cloud SQL inputs
- The instance id is the Cloud SQL to connect to. This id is obtained from the Google Cloud SQL page.
- Database name, username, and password are MySQL/PostgreSQL database and use credentials.
- When ready click ok to submit.
- After created, the connection will be listed under External Connections with the name in the format of [location].[connection id].
Use Cloud SQL Connection
Federated Query
Cloud SQL connections can be used to query data via federated queries. A BigQuery federated query allows querying of data in BigQuery that resides in Cloud SQL without copying or moving the data into BigQuery.
Requirements
BigQuery Cloud SQL federated queries must be made against a Cloud SQL instance that has public IP connectivity.
Syntax
Federated queries are made using EXTERNAL_QUERY. The syntax is:
EXTERNAL_QUERY('connection_id', '''external_database_query'''[, 'options'])
For example,
SELECT *
FROM
EXTERNAL_QUERY(
'connection_id',
'''SELECT * FROM table'''
);
Connection id is in the format of projects/PROJECT_ID/location/LOCATION/connections/CONNECTION_ID and can be found in the popup information for the connection by hovering over the connection the External Connections.
The easiest way to begin an external query is by right clicking the connection in the navigator and selecting EXTERNAL QUERY from the context menu. This will insert a starting template to the query text area with the connection id for the connection.
Amazon Web Services (AWS)
Data in AWS can be queried in BigQuery via BigQuery Omni. To use AWS via BigQuery Omni, create a connection, create an external table using that connection, and run queries using the external table.
Requirements
An AWS connection has an AWS identity and IAM user. Permissions are granted to users through AWS IAM roles. Create an AWS policy and role to allow BigQuery to access the AWS bucket.
Create AWS Connection
- In the navigator, right click External Connections and select Create Connection
- In Create Connection modal
- For Type, select BigLake on AWS (via BigQuery Omni).
- ID is the name of the connection.
- Select a location. When using AWS, the only available option is aws-us-east-1
- Friendly name and description are optional.
- AWS role is the IAM role id created on AWS in the format of arn:aws:iam::AWS_ACCOUNT_ID:role/ROLE_NAME.
- When ready click ok to submit.
- After created, the connection will be listed under External Connections with the name in the format of [location].[connection id].
Grant Permissions to AWS Connection
AWS connections reference Amazon S3 buckets. The reference is based on the AWS role and policy. For the connection to read data in the bucket, permissions need to be granted to it. After the connection is created, a BigQuery Google identity will be generated that can be used to establish a trust relationship to the AWS role or added to a custom AWS identity provider. This id is listed in the navigator popup.
Use AWS Connection
An AWS Connection can be used to create a BigLake table via BigQuery Omni. To create an external table:
- Navigate to the dataset where you would like to create the external table and right click to open the context menu. The dataset needs to be in aws-us-east1.
- In the context menu, select Create > External Table.
- In the Create External Table modal
- Select Amazon S3 as the Source.
- Input fields as normal.
- In the Connection drop down, only those connections of type AWS will display. From the list, select the AWS connection created in the previous step.
- Click OK to create the table.
External Connections
Connections are listed in the navigator. Depending on the connection type, different context menu items will display.
Share is available for all types. It displays inherited permissions and item specific permissions. Item specific permissions can be added and removed.
Modify Info is available for all types. Depending on the type, different metadata can be updated. For Cloud Resource friendly name and description are available. AWS can modify the Role Id. Cloud SQL can modify the database information.
Delete is available for all types and deletes the connection.
External Query is available for Cloud SQL connections and inserts a template EXTERNAL_QUERY into the query.
Export Results is available for AWS connections and inserts a template to EXPORT DATA WITH CONNECTION. This can be used to export query results to Amazon S3.
Load Data is available for AWS connections and inserts a template to LOAD DATA INTO. This can be used to transfer AWS data to BigQuery.
External Tables
External tables can now be classified as a BigLake table when they use a connection to reference source data. To identify a BigLake table, in the navigator hover over the external table to bring up the popup info. In the External Data Configuration, if Connection ID is displayed then this is an indication that it is a BigLake table.
If the external table is not a BigLake table, an option will display in the context menu to Upgrade to BigLake. This will export CREATE OR REPLACE EXTERNAL TABLE template to the query. This will allow creation of an external table with the use of a connection.