Passenger (pid, pbithday, pcity) Station (sid, sname, sxcoord, sycoord)
Passenger (pid, pbithday, pcity)
Station (sid, sname, sxcoord, sycoord)
CheckIn (pid, departure_sid, arrive_sid, departure_date, arrive_date)
departure_sid reference sid in Station
arrive_sid reference sid in Station
pid references pid in Passenger
there are 400 million Passenger, 500 stations, and 20 billion checkIn stored in the schema. For simplicity, assume that every attribute is of size 8 bytes, and the size of each tuple is thus 8*x bytes for a table with x attributes, and that the size of a table is simply the size of a tuple multiplied by the number of tuples (i.e., 100% occupancy and no gaps or space for block metadata). Now consider the following queries:
SELECT P.pid
FROM Passenger p, CheckIn C, Station S
WHERE P.pid = C.pid
and C.departure_sid = S.sid
and S.sname = "LA"
and year(P.pbithday) = 2000
and date(C.departure_date) = "2015-12-12"
question:
1. first join P with C, and then join result with S
2. first join S with C, and then join result with P
Which way would be faster for this query,
or would it not make much of a difference in this case? and explain way.

Step by step
Solved in 2 steps









