HIVE
1. When doing multi table joins, except the last ( right most ) table, other tables are buffered in the reducers. So to have less memory footprint on reducers, largest tables should be the last one as it is not buffered but streamed to reducers. For Impala , it is different.
5. To find values of all variables
set
6. If you want to pass different hive configuration settings in hive shell, you can use --hiveconf. For example
hive --hiveconf hivi.optimize.sort.dynamic.partition=true --hiveconf hive.exec.dynamic.partition.mode=nonstrict
import external table <table_name> partition(coumn1='val1') from location '<path>'
11. In HIVE queries, within where clause there are restrictions. See for example
select cast(split(key,'_')[1] as int) from table_a where key='20181003_20180830_1' - works
select key from table_a where cast(split(key,'_')[1] as int) =20180830 - ClassCastException , compilation error
12. in the latest version of hive , you can easily rename a partition
alter table table1 partition(col1=valule1) rename to partition(col1=value2) ;
2. You can filter the unwanted records within the join itself.
3. Joins are left associative. So execution happens from left to right and the result of joins is
fed into next join operation.
4. To find the value of variable
set hive.execution.engine
5. To find values of all variables
set
6. If you want to pass different hive configuration settings in hive shell, you can use --hiveconf. For example
hive --hiveconf hivi.optimize.sort.dynamic.partition=true --hiveconf hive.exec.dynamic.partition.mode=nonstrict
7. In case you accidentally drop the partitions of an external table, you can easily add those back.
ALTER TABLE table_name ADD IF
NOT EXISTS PARTITION (column_name1 =
'value1', column_name2='value2')
LOCATION
'/absolute_hdfs_path/table_name/column_name1 = value1/column_name2=value2'
8. For an external table if you delete the partition directory from HDFS without deleting the partition metadata
in hive, your queries will throw errors. It will complain that partition directory not being found.
9. Few handy commands
hdfs dfs -du -s <hdfs_dir> : Will list the space consumed by each subdirectory under the <hdfs_dir>
hdfs dfs -ls -R <hdfs_dir> : Will list directories under <hdfs_dir> recusrively
hdfs dfs -rm -skipTrash -R <hdfs_dir> : To remove the <hdfs_dir> and subdirectories and immediately freeup the space.
10. To import a partition into an external table. Remember the data directory has to be outside table directory, if it is already under the table
then all you need is to add the partition, import is not needed.
11. In HIVE queries, within where clause there are restrictions. See for example
select cast(split(key,'_')[1] as int) from table_a where key='20181003_20180830_1' - works
select key from table_a where cast(split(key,'_')[1] as int) =20180830 - ClassCastException , compilation error
12. in the latest version of hive , you can easily rename a partition
alter table table1 partition(col1=valule1) rename to partition(col1=value2) ;
Comments
Post a Comment