HIVE

1. When doing multi table joins, except the last ( right most ) table, other tables are buffered in the reducers. So to have less memory footprint on reducers, largest tables should be the last one as it is not buffered but streamed to reducers.   For Impala , it is different.

2.  You can filter the unwanted records within the join itself.

SELECT a.*, b.* FROM a LEFT OUTER JOIN b
ON (a.key=b.key AND b.name='Tim' AND a.name='Kim')

3. Joins are left associative. So execution happens from left to right and the result of joins is 
fed into next join operation.

4. To find the value of variable 
  
set hive.execution.engine

5. To find values of all variables

set 

6. If you want to pass different hive configuration settings in hive shell, you can use --hiveconf. For example

hive --hiveconf hivi.optimize.sort.dynamic.partition=true --hiveconf hive.exec.dynamic.partition.mode=nonstrict

7.  In case you accidentally drop the partitions of an external table, you can easily add those back.

ALTER TABLE  table_name ADD IF NOT EXISTS PARTITION (column_name1 = 'value1', column_name2='value2')  
LOCATION '/absolute_hdfs_path/table_name/column_name1 = value1/column_name2=value2'

8. For an external table if you delete the partition directory from HDFS without deleting the partition metadata 
in hive, your queries will throw errors. It will complain that partition directory not being found.

9.  Few handy commands

hdfs dfs -du -s <hdfs_dir>  : Will list the space consumed by each subdirectory under the <hdfs_dir>
hdfs dfs -ls -R  <hdfs_dir> : Will list directories under <hdfs_dir> recusrively
hdfs dfs -rm -skipTrash -R <hdfs_dir> : To remove the <hdfs_dir> and subdirectories and immediately freeup the space.

10. To import a partition into an external table. Remember the data directory has to be outside table directory, if it is already under the table 
then all you need is to add the partition, import is not needed.

import external table <table_name>  partition(coumn1='val1')  from  location '<path>'

11.  In HIVE queries, within where clause there are restrictions. See for example 

select  cast(split(key,'_')[1] as int)  from table_a where key='20181003_20180830_1'  - works
select  key  from table_a where cast(split(key,'_')[1] as int) =20180830  - ClassCastException , compilation error

12. in the latest version of hive , you can easily rename a partition

alter table table1 partition(col1=valule1)  rename to partition(col1=value2) ;






Comments

Popular posts from this blog

SQL

Analytics

DBeaver