The basic idea of complex datatypes is to store multiple values in a single column. So if you are working with a Hive database and you query a column, but then you notice “This value I need is trapped in a column among other values…” you just came across a complex a.k.a. nested datatype.

There are three types: arrays, maps and structs. First, you have to understand, which types are present. Depending on the datatype, there are different ways how you can access the values.

  • array(ARRAY<data_type>): It is an ordered collection of elements. The elements in the array must be of the same type. So you have for example an array of strings, integers etc.

  • map (MAP<primitive_type, data_type>): It is an unordered collection of key-value pairs. Keys must be of primitive types. Values can be of any type. A map can have an arbitary number of elements.

  • struct(:STRUCT<col_name : data_type): It is a collection of elements of different types. The concept of a struct is very similar to a row in a table: structs have a fixed number of fields (“columns”). Every field has a predefined type.


type what is it? how do they look like example
Array ordered collection of elements of the same type a comma-separated list surrounded with […] ["a","b"]
Map unordered collection of key-value paris comma-separated list of key:value pairs, surrounded with {…} {"a":1, "b":2}
Struct collection of elements of different types written using the JSON map format {"a": 1, "b":"c"}

Use DESCRIBE <table> to get a list of the datatypes of the columns in your table. You may notice, that one complex data type is nested into another, e.g. you may see an array of structs. But don’t worry, if you understand each type separately, you can untangle the nested structures aswell. So let’s have a look at each of those types and how we can work with them.

Arrays

So arrays are an ordered collection of elements of the same type. You could compare them to lists of the same type in Python or vectors in R. Since they are ordered, you can use a numeric index to access the elements. The index starts at 0.

For example, the table students has a column extra_curriculum, which is an array of strings.


first_name extra_curriculum
Tom [‘orchestra’]
Ann [‘orchestra’, ‘art’]

You can get the first value in the array, by indexing it with [0].

SELECT 
	first_name, 
	extra_curriculum[0] AS first_extra_curriculum 
FROM 
	students
;

This returns:


first_name first_extra_curriculum
Tom orchestra
Ann orchestra

Let’s now say you want to have one row for each element in your array (The equivalent of Rs gather-function). Hive has build-in table generating functions (UDTF) for that. These UDTFs transform a single input row into multiple output rows. The basic syntax is

LATERAL VIEW explode(expression) tableAlias AS columnAlias

This returns 0 to many rows: one row for each element from the input array. So running the following query on our students table

SELECT 
	first_name, 
	extra_curriculum 
FROM 
	students
	LATERAL VIEW EXPLODE (extra_curriculum) ec AS extra_curriculum
;

will result in a table, with each entry in extra_curriculum as a row.


first_name extra_curriculum
Tom orchestra
Ann orchestra
Ann art

Maps

Maps are used for key-value pairs. They are very similar to dictionaries in Python or a named vector in R. You can access the key-value pairs with the name of the key in brackets.

In this example, you have the table students with column grade, MAP<string, string>, which maps different subjects to their letter grade.


first_name grade
Tom {‘math’: ‘B’, ‘english’: ‘B’}
Ann {‘math’: ‘A’, ‘english’: ‘B’, ‘biology’: ‘C’}

You can extract all the math grades by using the key math in brackets.

SELECT 
	first_name, 
	grade["math"] AS math_grade 
FROM 
	students
;
first_name math_grade
Tom B
Ann A

Structs

Structs are written in JSON format. You can access the values using the dot notation for the field to extact the value.


first_name teacher
Tom {‘math’: ‘Mrs Johnson’, ‘english’: ‘Mr Miller’, ‘nr_teachers’: 2}
Ann {‘math’: ‘Mrs Johnson’, ‘english’: ‘Mrs Thomson’, ‘biology’: ‘Mr Chu’, ‘nr_teachers’: 3}
SELECT 
	first_name, 
	teacher.math AS math_teacher 
FROM 
	student
;
first_name math_teacher
Tom Mrs Johnson
Ann Mrs Johnson

Summary

To sum it up, depending on the complex data type, you can query elements differently. This last table gives an overview on how.


Type How to extract elements Example
Array by index my_array[0]
Map by key my_map['key']
Struct by dot notation my_struct.name