Преобразование устаревшей функции сглаживания SQL в стандартный SQL

#google-bigquery

#google-bigquery

Вопрос:

У меня есть следующее, написанное на #LegacySQL :

    SELECT
      customer_email,
      submitted_at,
      title,
      answers.choices.labels answer_choices,
      answers.number score,
      answers.boolean true_false,
      metadata.platform device_type
    FROM
      (FLATTEN([test-test:sample.responses], answers)) resp
      LEFT JOIN [test-test:sample.forms] forms
    ON resp.answers.field.id = forms.id
    ORDER BY 1 ASC
  

Он возвращает результаты такими, какими я их хочу, вот так:

  ------------------ ------------------------- ------------------------------------------------------------ -------------------- ------- ------------ ------------- 
|  customer_email  |      submitted_at       |                           title                            |   answer_choices   | score | true_false | device_type |
 ------------------ ------------------------- ------------------------------------------------------------ -------------------- ------- ------------ ------------- 
| myname@gmail.com | 2018-12-25 04:00:02 UTC | How would you rate this product?                           |                    |    10 |            | other       |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | What did you enjoy the most about your experience with us? | Delivery           |       |            | other       |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | What other product(s) would you like to see us make?       | Additional Colors  |       |            | other       |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | What other product(s) would you like to see us make?       | Additional Designs |       |            | other       |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | What color(s) would you want to see?                       | Green              |       |            | other       |
 ------------------ ------------------------- ------------------------------------------------------------ -------------------- ------- ------------ ------------- 
  

Я пытаюсь преобразовать в BigQuery StandarSQL , и я собрал следующее:

 SELECT
  customer_email,
  submitted_at,
  title,
  answers.choices.labels answer_choices,
  answers.number score,
  answers.boolean true_false,
  metadata.platform device_type
FROM
`sample.responses` resp, unnest(answers) answers
LEFT JOIN `sample.forms` forms
ON answers.field.id  = forms.id
ORDER BY 1 ASC
  

К сожалению, он возвращает ее в виде записей, подобных so:

  ------------------ ------------------------- ------------------------- ------------------------------------------------------------ -------------------- ------- ------------ ------------- 
|  customer_email  |      submitted_at       |        landed_at        |                           title                            |   answer_choices   | score | true_false | device_type |
 ------------------ ------------------------- ------------------------- ------------------------------------------------------------ -------------------- ------- ------------ ------------- 
| myname@gmail.com | 2018-12-25 04:00:02 UTC | 2018-12-25 03:59:07 UTC | What did you enjoy the most about your experience with us? | Delivery           | null  | null       | other       |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | 2018-12-25 03:59:07 UTC | What other product(s) would you like to see us make?       | Additional Colors  | null  | null       | other       |
|                  |                         |                         |                                                            | Additional Designs |       |            |             |
| myname@gmail.com | 2018-12-25 04:00:02 UTC | 2018-12-25 03:59:07 UTC | What color(s) would you want to see?                       | Green              | null  | null       | other       |
 ------------------ ------------------------- ------------------------- ------------------------------------------------------------ -------------------- ------- ------------ ------------- 
  

Что я делаю неправильно?

Комментарии:

1. answers.choices.labels повторяется (массив)?

2. Да? Я полагаю, что так

Ответ №1:

Следует иметь в виду, что устаревший SQL выполняет LEFT JOIN с массивами при их выравнивании, в то время как оператор запятой стандартного SQL выполняет обычный JOIN . Если вы используете LEFT JOIN при отмене вложения массива, вы получите строку в выходных данных (со NULL значением элемента), когда массив пуст, тогда как если вы используете обычный JOIN , строка будет опущена. Это похоже на то, как также работают соединения между таблицами. Смотрите руководство по миграции для получения более подробной информации о семантике.

Я думаю, что вы хотите для своего запроса использовать LEFT JOIN между таблицей и answers массивом:

 SELECT
  customer_email,
  submitted_at,
  title,
  answer.choices.labels AS answer_choices,
  answer.number score,
  answer.boolean true_false,
  metadata.platform device_type
FROM
  `sample.responses` resp
LEFT JOIN UNNEST(resp.answers) AS answer
LEFT JOIN `sample.forms` forms
ON answer.field.id  = forms.id
ORDER BY 1 ASC
  

Если вы также хотите выполнить выравнивание answer.choices.labels , вы можете изменить его на:

 SELECT
  customer_email,
  submitted_at,
  title,
  label AS answer_choices,
  answer.number score,
  answer.boolean true_false,
  metadata.platform device_type
FROM
  `sample.responses` resp
LEFT JOIN UNNEST(resp.answers) AS answer
LEFT JOIN UNNEST(answer.choices.labels) AS label
LEFT JOIN `sample.forms` forms
ON answer.field.id  = forms.id
ORDER BY 1 ASC
  

Ответ №2:

Предполагая, что это запрос для построения исходных данных

 SELECT
  'myname@gmail.com' AS customer_email,
  CAST('2018-12-25 04:00:02 UTC' AS TIMESTAMP) AS submitted_at,
  CAST('2018-12-25 03:59:07 UTC' AS TIMESTAMP) AS landed_at,
  'How would you rate this product?' as title,
  STRUCT (
    [
      STRUCT('' as label)
    ] AS choices,
    10 as number,
    NULL as boolean
  ) answers,
  STRUCT (
    'other' AS platform
  ) metadata

UNION ALL

SELECT
  'myname@gmail.com' AS customer_email,
  CAST('2018-12-25 04:00:02 UTC' AS TIMESTAMP) AS submitted_at,
  CAST('2018-12-25 03:59:07 UTC' AS TIMESTAMP) AS landed_at,
  'What did you enjoy the most about your experience with us?' as title,
  STRUCT (
    [
      STRUCT('' as label)
    ] AS choices,
    NULL as number,
    NULL as boolean
  ) answers,
  STRUCT (
    'other' AS platform
  ) metadata

UNION ALL

SELECT
  'myname@gmail.com' AS customer_email,
  CAST('2018-12-25 04:00:02 UTC' AS TIMESTAMP) AS submitted_at,
  CAST('2018-12-25 03:59:07 UTC' AS TIMESTAMP) AS landed_at,
  'What other product(s) would you like to see us make?' as title,
  STRUCT (
    [
      STRUCT('Additional Designs' as label),
      STRUCT('Additional Colors' as label)
    ] AS choices,
    NULL as number,
    NULL as boolean
  ) answers,
  STRUCT (
    'other' AS platform
  ) metadata

UNION ALL

SELECT
  'myname@gmail.com' AS customer_email,
  CAST('2018-12-25 04:00:02 UTC' AS TIMESTAMP) AS submitted_at,
  CAST('2018-12-25 03:59:07 UTC' AS TIMESTAMP) AS landed_at,
  'What color(s) would you want to see?' as title,
  STRUCT (
    [
      STRUCT('Green' as label)
    ] AS choices,
    NULL as number,
    NULL as boolean
  ) answers,
  STRUCT (
    'other' AS platform
  ) metadata
  

Тогда ваш запрос будет

 WITH joined_table AS (
    SELECT
        customer_email,
        submitted_at,
        landed_at,
        title,
        answers.choices answers_choices,
        answers.number score,
        answers.boolean true_false,
        metadata.platform device_type
    FROM `sample.responses` resp
    LEFT JOIN `sample.forms` forms ON resp.answers.field.id  = forms.id
)

SELECT 
    customer_email,
    submitted_at,
    landed_at,
    title,
    unnested_answers_choices.label as answer_choices,
    score,
    true_false,
    device_type
FROM joined_table
CROSS JOIN UNNEST(answers_choices) AS unnested_answers_choices