问题描述:

I have plenty of tables sharded by date. I am exploring the way to move on and use the new "Partitioned Tables" instead.

I have found this reference for converting tables: https://cloud.google.com/bigquery/docs/creating-partitioned-tables#converting_dated_tables_into_a_partitioned_table

However, as I have many processes relating to existing tables (loading data or Querying), I am looking for a phased transition approach.

Meaning: I would like to change the load process and the table definition at 1st, and afterwards, slowly change all relating processes.

  1. How should I go about this transition?

    (We have considered and ruled out the option to duplicate everything and maintain each process twice due to obvious reasons.)

  2. Is there a way to use the old table names convention (MyTable_YYYYMMDD) when querying a partition of the new transitioned table?
  3. Can I use the older Table wildcard functions to query it until I transform all my query templates?
  4. Are there existing partitioned tables on a public dataset where we can experiment?

Thanks

网友答案:

This is not exactly quite what you want, but may serve the purpose. With standard SQL, BigQuery supports the equivalent of table wildcard functions:

Documentation for this feature is going to be available soon, but to summarize:

We provide a pseudo column _TABLE_SUFFIX that provides a way to address the table. For example, the following query allows you to read from tables myproject.mydataset.20150105, and myproject.mydataset.20150106 (assuming value1 is a field in both).

SELECT _TABLE_SUFFIX AS table_name, SUM(value1) FROM myproject.mydataset.* WHERE _TABLE_SUFFIX = '20150105' OR _TABLE_SUFFIX = '20150106' GROUP BY 1;

For your use case:

You could first convert your queries against date sharded tables to use _TABLE_SUFFIX pseudo column. Then, when you are able to convert these tables to partitioned tables. After that point, you will need a minor change to your scripts to use the partitioning pseudo column (since the pseudo column name is a bit different in these two cases).

网友答案:

My few cents:

Can I use the older Table wildcard functions to query it until I transform all my query templates?

Directly against new partitioned table - No. You will most likely get something like "table wildcards matches no table"

Is there a way to use the old table names convention (MyTable_YYYYMMDD) when querying a partition of the new transitioned table?

The option I see is to create respective daily Views named after new_partitioned table but following new_partitioned_YYYYMMDD naming.
So while having use of new partitioning feature you will still use old syntax before you do full migration.
This comes with a little extra headache as you will need to make sure you have those daily views created daily going forward for new partitions - but if you are looking for temporary solution - this should be not big deal.

Finally quick comment on reference for converting tables:

It is not clearly stated in documentation, but looks like you cannot just use bq partition command to convert existing sharded to new partioned. First you actually need create that partitioned table. If not - you will get new table w/o schema so it will be useless. It might be a bug - dont know

相关阅读:
Top