Understanding Default Delimiters

In Hive, by default, each fields are delimited by CTRL-A (ie Octal \001).

If a field is a complex type(such as ARRAY, STRUCT or MAP) containing Primitive Type(like INT, STRING), then each value inside the complex type is delimited by CTRL-B (\002)

Example field : array('val1', 'val2')

If a field is a complex type containing complex type(we call this as Nested Type), then delimiters CTRL-B (\002) followed by CTRL-C(\003), CTRL-D(\004), etc... are used depending on the depth of the nesting.

Example field : array(named_struct('var1', 'val1'))

Let us test this concept with an example(a Table containing 4 fields)
create table marks as
select 
   'bob'                                           as name, 
   named_struct('engtot', '104', 'mathtot', '135') as total, 
   array(
      array(
         named_struct(
            'eng', 
            '33',
            'math',
            '44'
         ),
         named_struct(
            'eng',          
            '35',
            'math',
            '45'
         )
      ),
      array(
         named_struct(
            'eng', 
            '36',
            'math',
            '46'
         )
      )      
   )                                               as testmarks,
   map('engavg', '34.6', 'mathavg', '45')          as average;

hive> desc marks;
name                 string
total                struct
testmarks            array<array<struct<eng:string,math:string>>>
average              map   

hive> select * from marks;
bob   {"engtot":"104","mathtot":"135"} [[{"eng":"33","math":"44"},{"eng":"35","math":"45"}],[{"eng":"36","math":"46"}]] {"engavg":"34.6","mathavg":"45"}

Let us pull the table data into Local file system & verify the delimiters in VI Editor

hdfs dfs -getmerge /user/hive/warehouse/marks result.txt
vi result.txt
bob^A104^B135^A33^D44^C35^D45^B36^D46^Aengavg^C34.6^Bmathavg^C45

Let us dissect this data for clarity,

bob                           ^A
104            ^B 135         ^A
33^D44^C35^D45 ^B 36^D46      ^A
engavg^C34.6   ^B mathavg^C45
(Note : In a map, Key & Value is always delimited by ^C)

2 comments: