BigQuery(BQ) is very useful for data analysis or processing.

It is good at handling huge data. It returns summary result in short time.

And we want average, median, mode value when we analyse data.

How can we get these values in BigQuery ?

So today I introduce about **"How to get average, median, mode value in BigQuery"**.

Author

Advantage to read

You can understand "How to get average, median, mode value in BigQuery".

## Data

First, we have to prepare data.

Load this CSV file into table `avg_median_sample`

.

text_col,num_col1,num_col2,num_col3 a,10,1,100 a,10,2,100 a,10,3,100 a,10,4,100 a,10,5,100 a,10,6,100 a,10,7,100 a,10,8,100 a,10,9,100 a,10,10,100 b,12,11,200 b,12,12,200 b,12,13,200 b,12,14,200 c,15,15,200 c,15,16,200 c,15,17,300 c,15,18,300 c,15,19,10000

Then we can see table like below.

Row | text_col | num_col1 | num_col2 | num_col3 |

1 | a | 10 | 1 | 100 |

2 | a | 10 | 2 | 100 |

3 | a | 10 | 3 | 100 |

4 | a | 10 | 4 | 100 |

5 | a | 10 | 5 | 100 |

6 | a | 10 | 6 | 100 |

7 | a | 10 | 7 | 100 |

8 | a | 10 | 8 | 100 |

9 | a | 10 | 9 | 100 |

10 | a | 10 | 10 | 100 |

11 | b | 12 | 11 | 200 |

12 | b | 12 | 12 | 200 |

13 | b | 12 | 13 | 200 |

14 | b | 12 | 14 | 200 |

15 | c | 15 | 15 | 200 |

16 | c | 15 | 16 | 200 |

17 | c | 15 | 17 | 300 |

18 | c | 15 | 18 | 300 |

19 | c | 15 | 19 | 10000 |

## How to get average value

Once data is prepared, get average value.

In order to get average value, you should use `AVG()`

function.

SELECT text_col, avg(num_col1) as avg_1, avg(num_col2) as avg_2, avg(num_col3) as avg_3 FROM test.avg_median_sample GROUP BY text_col

Result is like below.

Row | text_col | avg_1 | avg_2 | avg_3 |

1 | a | 10 | 5.5 | 100 |

2 | b | 12 | 12.5 | 200 |

3 | c | 15 | 17 | 2200 |

## How to get median value

Next, get median value.

Median value is not a calculated value.

It is middle value in ordered list.

We can get median value with `PERCENTILE_CONT()`

function.

We use it as below.

SELECT distinct text_col, PERCENTILE_CONT(num_col1, 0.5) OVER(PARTITION BY text_col) AS median_1, PERCENTILE_CONT(num_col2, 0.5) OVER(PARTITION BY text_col) AS median_2, PERCENTILE_CONT(num_col3, 0.5) OVER(PARTITION BY text_col) AS median_3 FROM test.avg_median_sample ORDER BY text_col

Then we get this result.

Row | text_col | median_1 | median_2 | median_3 |

1 | a | 10 | 5.5 | 100 |

2 | b | 12 | 12.5 | 200 |

3 | c | 15 | 17 | 300 |

Average value of `text_col="c"`

and `num_col3`

was 2200.

Median value is 300. So we can understand this difference.

And if you change second parameter of `PERCENTILE_CONT()`

from `0.5`

to `0.1`

, you can get 10% position value. If `0.75`

, you can get 75% position value.

## How to get mode value

Then try to get mode value.

The mode is the value that appears most frequently in a data set.

Frequency is important. `100`

and `100.1`

are different values.

So mode doesn't match for real numbers.

It is good to use for checking group frequency.

In order to get mode value, you vcan use `APPROX_TOP_COUNT()`

.

SELECT text_col, APPROX_TOP_COUNT(num_col1, 1) top1, APPROX_TOP_COUNT(num_col2, 1) top2, APPROX_TOP_COUNT(num_col3, 1) top3 FROM test.avg_median_sample GROUP BY text_col ORDER BY text_col

Result is below.

Row | text_col | top1.value | top1.count | top2.value | top2.count | top3.value | top3.count |

1 | a | 10 | 10 | 1 | 1 | 100 | 10 |

2 | b | 12 | 4 | 11 | 1 | 200 | 4 |

3 | c | 15 | 5 | 15 | 1 | 200 | 2 |

As you see, `.value`

is value that appears frequently. And `.count`

means appeared count.

## Conclusion

Today I described about **"How to get average, median, mode value in BigQuery"**.

In order to get these values, we can use functions below.

SQL functions

`AVG()`

`PERCENTILE_CONT()`

`APPROX_TOP_COUNT()`

`AVG()`

is easy to use. But others are little bit complicated.There are some IT tips.

If you interested to this article, please read them too.