SQL SELF JOIN Tutorial Example
Hi everyone. So I need help understanding the following example in the SELF JOIN lesson.
SELECT DISTINCT japan_investments.company_name, japan_investments.company_permalink FROM tutorial.crunchbase_investments_part1 japan_investments JOIN tutorial.crunchbase_investments_part1 gb_investments ON japan_investments.company_name = gb_investments.company_name AND gb_investments.investor_country_code = 'GBR' AND gb_investments.funded_at > japan_investments.funded_at WHERE japan_investments.investor_country_code = 'JPN' ORDER BY 1
I'm not following how SQL is going through and comparing the rows. For example, from my understanding:
AND gb_investments.funded_at > japan_investments.funded_at
Can never be true since both of those entries are in the same column/row position. In other words, gb_investments.funded_at should equal japan_investments.funded_at.
Also, I don't see how the WHERE statement can work since all of the rows have already been filtered country_code = 'GBR'. Indeed, if you print all the rows to this query, there are no rows with 'JPN' as the country_code.
What this code is doing is joining a table to it's self, thus making a Cartesian product of all rows ( https://en.wikipedia.org/wiki/Cartesian_product ) with two sets of values for each output row access from the japan_investments side and gb_investments set of input values.
The join first limits only for companies with the same name, and that the GB side is from GBR and the gb side was funded after the JP side. Then finally the JP side is from JPN.
Now the all these clause could have been done in the ON or the WHERE because it's an INNER JOIN that are equivalent (in this case).
But the real thing to note is we have access to both the "left" and "right" version of the table values via their aliases, thus can apply filters to the matching rows.
create table investments (name text, country text, funded_at timestamp); insert into investments values ('A','GBR', '2018-05-01'), -- not funed in JPN ('B','GBR', '2018-05-01'),('B','JPN', '2018-05-02'), -- funded in JPN after GBR ('C','GBR', '2018-05-01'),('C','JPN', '2018-04-01'), -- funced in GBR after JPN (the only answer) ('D','JPN', '2018-04-03'); -- funced in GBR after JPN (the only answer) select distinct japan_investments.name --,japan_investments.company_permalink FROM investments japan_investments JOIN investments gb_investments ON japan_investments.name = gb_investments.name AND gb_investments.country = 'GBR' AND gb_investments.funded_at > japan_investments.funded_at WHERE japan_investments.country = 'JPN' ORDER BY 1
which gives only the answer C. Given how I inserted the values into the base table in company name pairs you can visualize how the data is getting compared after the japan_investments.name = gb_investments.name clause has run.Reply