Transactions in Hive

Full ACID semantics at the row level have been supported since Hive 0.13. Before that, they were supported only at the partition level.

  • Atomicity: An entire operation is treated as a single unit. Either the whole operation takes effect or none of it does.
  • Consistency: Once an operation completes, every subsequent operation sees its result.
  • Isolation: An operation by one user does not affect the operations of other users.
  • Durability: Once an operation is done, its result persists from then on.
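To make atomicity concrete, here is a minimal Python sketch of an all-or-nothing update on an in-memory table. It only illustrates the concept and is not how Hive implements it. The function name apply_atomically and the dict-based "table" are hypothetical. The changes are staged on a private copy and published only if every one of them succeeds.

```python
def apply_atomically(table, changes):
    """Apply every change in `changes` to `table`, or none of them.

    Hypothetical illustration: `table` maps a row key to a row value and each
    change is a (key, value) pair, where value None means "delete the row".
    """
    staged = dict(table)  # work on a private copy; `table` is untouched so far
    for key, value in changes:
        if value is None:
            if key not in staged:
                # Failing here leaves `table` exactly as it was.
                raise KeyError(f"cannot delete missing row {key!r}")
            del staged[key]
        else:
            staged[key] = value
    # All changes succeeded: publish them together.
    table.clear()
    table.update(staged)
    return table


rows = {1: "a", 2: "b"}
apply_atomically(rows, [(3, "c"), (1, None)])
print(rows)  # {2: 'b', 3: 'c'}

try:
    apply_atomically(rows, [(4, "d"), (99, None)])  # row 99 does not exist
except KeyError as err:
    print("rolled back:", err)
print(rows)  # {2: 'b', 3: 'c'}  (row 4 was not added either)
```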
At present, Hive supports isolation only at the snapshot level. Various DBMSs offer the following isolation levels:
  • Snapshot: The transaction sees the data as it was when the transaction began, for its entire lifetime. Changes made by other transactions in the meantime are never visible to it.
  • Read Uncommitted (Dirty Read): Uncommitted updates from other transactions can be seen.
  • Read Committed: Only updates that are already committed at the time of a read are seen by this transaction.
  • Repeatable Read: Data that the transaction has read does not change if it is read again. This is typically implemented with read locks on data being read and write locks on data being created, updated or deleted, held until the transaction ends.
  • Serializable: The most restrictive level. Concurrent transactions appear as if they had been executed one after another (serially).
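The difference between dirty, read-committed and snapshot reads can be shown with a toy multi-version store in Python. This is a hypothetical sketch for illustration only: the class VersionedStore and its methods are invented here and do not reflect Hive internals, and it does not model the locking behind repeatable read or serializable.

```python
class VersionedStore:
    """Toy multi-version key/value store (illustration only, not Hive's design)."""

    def __init__(self):
        self.committed = []   # (commit_id, key, value) entries in commit order
        self.pending = {}     # txn name -> {key: value} of uncommitted writes
        self.next_commit = 1

    def begin(self):
        # A snapshot is just the highest commit id visible when a transaction starts.
        return self.next_commit - 1

    def write(self, txn, key, value):
        self.pending.setdefault(txn, {})[key] = value

    def commit(self, txn):
        for key, value in self.pending.pop(txn, {}).items():
            self.committed.append((self.next_commit, key, value))
        self.next_commit += 1

    def read(self, key, level, snapshot=None):
        if level == "dirty":
            # Uncommitted writes of other transactions are visible.
            for writes in self.pending.values():
                if key in writes:
                    return writes[key]
        if level == "snapshot":
            if snapshot is None:
                raise ValueError("snapshot reads need the id returned by begin()")
            limit = snapshot                # frozen at transaction start
        else:
            limit = self.next_commit - 1    # latest committed state
        value = None
        for commit_id, k, v in self.committed:
            if k == key and commit_id <= limit:
                value = v
        return value


store = VersionedStore()
store.write("t0", "price", 10)
store.commit("t0")                 # commit id 1

snap = store.begin()               # this reader's snapshot sees commits <= 1
store.write("t2", "price", 20)     # t2 has not committed yet

print(store.read("price", "dirty"))           # 20 (uncommitted value)
print(store.read("price", "read_committed"))  # 10
store.commit("t2")                             # commit id 2
print(store.read("price", "read_committed"))  # 20 (now committed)
print(store.read("price", "snapshot", snap))  # 10 (still the snapshot value)
```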
Transactions are supported only for the ORC file format and are turned off by default. Working with ACID tables has further requirements:
  • The transaction manager must be set to DbTxnManager (hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager).
  • Tables must be bucketed to use the ACID feature.
  • External tables cannot be ACID tables, because they are beyond the control of the compactor.
  • The table must be marked as an ACID table in its TBLPROPERTIES ('transactional'='true').
  • Further configuration must be set in the Hive session (for example hive.support.concurrency=true) to enable ACID transactions for these tables. Otherwise, depending on the Hive version and the operation, a statement is either executed in a non-ACID manner or not allowed at all.
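The requirements above can be sketched as a small Python helper that emits the HiveQL for the session settings and a transactional table. The helper acid_setup_statements and its arguments (table name, columns, bucket column, bucket count) are hypothetical. The configuration properties, the DbTxnManager class and the 'transactional'='true' table property are the ones Hive uses. hive.enforce.bucketing is only needed before Hive 2.0. Compaction must additionally be enabled on the metastore side (hive.compactor.initiator.on=true and hive.compactor.worker.threads greater than 0); that is server configuration and is not part of the generated statements.

```python
def acid_setup_statements(table, columns, bucket_col, buckets):
    """Build HiveQL that enables ACID in a session and creates an ACID table.

    Hypothetical helper: `columns` is a list of (name, type) pairs.
    """
    session = [
        "SET hive.support.concurrency=true;",
        "SET hive.enforce.bucketing=true;",  # only needed before Hive 2.0
        "SET hive.exec.dynamic.partition.mode=nonstrict;",
        "SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;",
    ]
    cols = ", ".join(f"{name} {col_type}" for name, col_type in columns)
    ddl = (
        f"CREATE TABLE {table} ({cols}) "
        f"CLUSTERED BY ({bucket_col}) INTO {buckets} BUCKETS "  # must be bucketed
        "STORED AS ORC "                                        # ORC only
        "TBLPROPERTIES ('transactional'='true');"               # marks it as ACID
    )
    return session + [ddl]


for stmt in acid_setup_statements("sales", [("id", "INT"), ("amount", "DOUBLE")], "id", 4):
    print(stmt)
```

The table is created as a managed (not EXTERNAL) table, since external tables cannot be ACID.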
