SQL

Joins - Not always very obvious ( Following example uses Hive ).

create table x_temp (
name string,
p_date int  )

insert into x_temp(name,p_date) values
('x1',20181025),('x2', 20181026), ('x3',20181027)

create table y_temp (
name string,
s_date int  )

insert into y_temp(name,s_date) values
('x1',20181025),('x2', 20181026), ('x3',20181027)

create table z_temp (
name string,
p_date int,
s_date int
)

insert into z_temp(name,p_date,s_date) values
('z1','20181025','20181025')

select x.*, y.* from x_temp x, y_temp y, z_temp z
 where x.p_date != z.p_date and  y.s_date != z.s_date)

generates 4 records...

x.name,x.p_date,y.name,y.s_date
x2, 20181026, x2, 20181026
x3, 20181027, x2, 20181026
x2, 20181026, x3, 20181027
x3, 20181027, x3, 20181027

select x.*, y.* from x_temp x, y_temp y, z_temp z
 where x.p_date != z.p_date or  y.s_date != z.s_date)

generates 8 records

x.name,x.p_date,y.name,y.s_date
x2,20181026,x1,20181025
x3,20181027,x1,20181025
x1,20181025,x2,20181026
x2,20181026,x2,20181026
x3,20181027,x2,20181026
x1,20181025,x3,20181027
x2,20181026,x3,20181027
x3,20181027,x3,20181027

I thought this may be the way to get all combinations of x.p_date and y.s_date which don't exist in z_temp  ( p_date and z.s_date ). I was wrong.

Do the following:

insert into z_temp(name,p_date,s_date) values
('z1','20181025','20181026')

Run the same query again, you will get 16 records

x.name,x.p_date,y.name,y.s_date
x1,20181025,x1,20181025
x2,20181026,x1,20181025
x2,20181026,x1,20181025
x3,20181027,x1,20181025
x3,20181027,x1,20181025
x1,20181025,x2,20181026
x2,20181026,x2,20181026
x2,20181026,x2,20181026
x3,20181027,x2,20181026
x3,20181027,x2,20181026
x1,20181025,x3,20181027
x1,20181025,x3,20181027
x2,20181026,x3,20181027
x2,20181026,x3,20181027
x3,20181027,x3,20181027
x3,20181027,x3,20181027

So finally I figure out the following query for what Iwas looking for.

 select x.*, y.* from x_temp x, y_temp y left join z_temp z
 on  (x.p_date = z.p_date and  y.s_date = z.s_date)
 where z.p_date is null and z.s_date is null

this returns 7 records I wanted.

x.name,x.p_date,y.name,y.s_date
x2,20181026,x1,20181025
x3,20181027,x1,20181025
x2,20181026,x2,20181026
x3,20181027,x2,20181026
x1,20181025,x3,20181027
x2,20181026,x3,20181027
x3,20181027,x3,20181027

Comments

Popular posts from this blog

Analytics

HIVE