
BigQuery(BQ) is very useful for data analysis or processing.
It is good at handling huge data. It returns summary result in short time.
When we handle big data, sometimes we want to extract or summarise data that meets specific conditions.
Usually we use comparison operator like below.
WHERE COLUMN1 = "A"
SUM(IF(COLUMN1 = "A",1,0)
But it fails in specific case.
Why didn't it work.
So today I introduce about "Why comparison and conditional aggregation fails in BigQuery".
Author

Advantage to read
You can understand "Why comparison and conditional aggregation fails in BigQuery". Then you don't have to concern about comparison and conditional aggregation.
Data
First, prepare data.
Import this csv file as null_sample
table.
Data
col1,col2,col3 a,b,c ,b,c ,,c a,b, a,,
Then you can see table like below.
Table
Row | col1 | col2 | col3 |
1 | null | b | c |
2 | null | null | c |
3 | a | b | c |
4 | a | b | null |
5 | a | null | null |
Comparison and conditional aggregation
They are fine example of comparison and conditional aggregation.
We use operator =
.
SQL
SELECT * FROM test.null_sample WHERE col1 = "a"
Result
Row | col1 | col2 | col3 |
1 | a | b | c |
2 | a | b | null |
3 | a | null | null |
In order to get conditional sum, we can use SUM
and IF
function.
SQL
SELECT SUM(IF(col1="a",1,0)) as sum_col1_a FROM test.null_sample
Result
Row | sum_col1_a |
1 | 3 |
Example of failure
Then they are examples of failure.
They use !=
and its comparison does not work.
SQL
SELECT * FROM test.null_sample WHERE col1 != "a"
Result
Even if conditional aggregation, it fails like below.
SQL
SELECT SUM(IF(col1="a",1,0)) as sum_col1_a, SUM(IF(col1!="a",1,0)) as sum_col1_not_a, SUM(IF(col2!="a",1,0)) as sum_col2_not_a, SUM(IF(col3!="a",1,0)) as sum_col3_not_a FROM test.null_sample
Result
Row | sum_col1_a | sum_col1_not_a | sum_col2_not_a | sum_col3_not_a |
1 | 3 | 0 | 3 | 3 |
Sample table has 5 records.
In the table, about non- a
count, it should be 2 in col1.
And it should be 5 in col2 or col3.
Why did it fail to aggregate.
Why comparison and conditional aggregation fails in BigQuery
The reason why comparison and conditional aggregation fails in BigQuery is null
.
null
is special that does not return true to both =
nor !=
.
So if you want to compare data,you should replace null with other value by IFNULL
or COALESCE
.
SQL
SELECT * FROM test.null_sample WHERE IFNULL(col1,"") != "a"
Result
Row | col1 | col2 | col3 |
1 | null | b | c |
2 | null | null | c |
For conditional aggregation, we should use IFNULL
or COALESCE
.
SQL
SELECT SUM(IF(IFNULL(col1,"")="a",1,0)) as sum_col1_a, SUM(IF(IFNULL(col1,"")!="a",1,0)) as sum_col1_not_a, SUM(IF(IFNULL(col2,"")!="a",1,0)) as sum_col2_not_a, SUM(IF(IFNULL(col3,"")!="a",1,0)) as sum_col3_not_a FROM test.null_sample
Result
Row | sum_col1_a | sum_col1_not_a | sum_col2_not_a | sum_col3_not_a |
1 | 3 | 2 | 5 | 5 |
In this case we got correct non-a
records and non-a
count.
Conclusion
Today I explained about "Why comparison and conditional aggregation fails in BigQuery".
The reason why comparison and conditional aggregation fails in BigQuery is null
.
It fails when we try to compare null
directly.
Solution is this.
Point
- Replace null by
IFNULL
orCOALESCE

null
is very complicated.There are some other articles about BigQuey.
If you interested in them, please read them.
Read more