DB

Why comparison and conditional aggregation fails in BigQuery

Share this for your friends.

Why comparison and conditional aggregation fails in BigQuery

Why did comparison and conditional aggregation fail in BigQuery ?


BigQuery(BQ) is very useful for data analysis or processing.

It is good at handling huge data. It returns summary result in short time.


When we handle big data, sometimes we want to extract or summarise data that meets specific conditions.

Usually we use comparison operator like below.

WHERE COLUMN1 = "A"
SUM(IF(COLUMN1 = "A",1,0)

But it fails in specific case.

Why didn't it work.

So today I introduce about "Why comparison and conditional aggregation fails in BigQuery".

Author


Mid-carieer engineer (AI, system). Good at Python and SQL.

Advantage to read

You can understand "Why comparison and conditional aggregation fails in BigQuery". Then you don't have to concern about comparison and conditional aggregation.


Data

First, prepare data.

Import this csv file as null_sample table.

Data

col1,col2,col3
a,b,c
,b,c
,,c
a,b,
a,,


Then you can see table like below.

Table

Row col1 col2 col3
1 null b c
2 null null c
3 a b c
4 a b null
5 a null null



Comparison and conditional aggregation

They are fine example of comparison and conditional aggregation.

We use operator =.

SQL

SELECT * FROM test.null_sample
WHERE col1 = "a"


Result

Row col1 col2 col3
1 a b c
2 a b null
3 a null null


In order to get conditional sum, we can use SUM and IF function.

SQL

SELECT
SUM(IF(col1="a",1,0)) as sum_col1_a
FROM test.null_sample


Result

Row sum_col1_a
1 3



Example of failure

Then they are examples of failure.

They use != and its comparison does not work.

SQL

SELECT * FROM test.null_sample
WHERE col1 != "a"


Result

No result


Even if conditional aggregation, it fails like below.

SQL

SELECT
SUM(IF(col1="a",1,0)) as sum_col1_a,
SUM(IF(col1!="a",1,0)) as sum_col1_not_a,
SUM(IF(col2!="a",1,0)) as sum_col2_not_a,
SUM(IF(col3!="a",1,0)) as sum_col3_not_a
FROM test.null_sample


Result

Row sum_col1_a sum_col1_not_a sum_col2_not_a sum_col3_not_a
1 3 0 3 3


Sample table has 5 records.

In the table, about non- a count, it should be 2 in col1.

And it should be 5 in col2 or col3.

Why did it fail to aggregate.



Why comparison and conditional aggregation fails in BigQuery

The reason why comparison and conditional aggregation fails in BigQuery is null.

null is special that does not return true to both = nor !=.

So if you want to compare data,you should replace null with other value by IFNULL or COALESCE.

SQL

SELECT * FROM test.null_sample
WHERE IFNULL(col1,"") != "a"


Result

Row col1 col2 col3
1 null b c
2 null null c


For conditional aggregation, we should use IFNULL or COALESCE.

SQL

SELECT
SUM(IF(IFNULL(col1,"")="a",1,0)) as sum_col1_a,
SUM(IF(IFNULL(col1,"")!="a",1,0)) as sum_col1_not_a,
SUM(IF(IFNULL(col2,"")!="a",1,0)) as sum_col2_not_a,
SUM(IF(IFNULL(col3,"")!="a",1,0)) as sum_col3_not_a
FROM test.null_sample


Result

Row sum_col1_a sum_col1_not_a sum_col2_not_a sum_col3_not_a
1 3 2 5 5


In this case we got correct non-a records and non-a count.



Conclusion

Today I explained about "Why comparison and conditional aggregation fails in BigQuery".

The reason why comparison and conditional aggregation fails in BigQuery is null.

It fails when we try to compare null directly.

Solution is this.

Point

  • Replace null by IFNULL or COALESCE


null is very complicated.



There are some other articles about BigQuey.

If you interested in them, please read them.


Share this for your friends.

If you felt this article is useful, please share.

にほんブログ村 IT技術ブログへ

-DB
-,

© 2023 ITips