vendredi 17 juin 2016

MS Access query with multiple joins returning duplicate rows

I am working with Access to query multiple tables that have passenger info in them. I've been able to distill this down to what I think is causing the issue, but I am unsure how to fix it.

Essentially there are three tables:

PASSENGERS

ID   | PASSENGER_NAME | COST_CENTER
-------------------------------------
1    | John Wright    | 31231       
2    | Cheryl Brown   | 54555       
3    | Adam Yang      | 65655       

FARE_LEVEL

ID   | TICKET_NUMBER  | PASSENGER_NAME | TICKET_AMT | IS_REFUND
---------------------------------------------------------------
1    | 14325435       | John Wright    | $632.64    | 0    
2    | 46746745       | Adam Yang      | $797.32    | 0  
3    | 45354434       | Cheryl Brown   | $2331.00   | 0  
4    | 67876456       | Cheryl Brown   | $990.11    | 0     
5    | 34654546       | Adam Yang      | $552.71    | 0  
6    | 14325435       | John Wright    | -$632.64   | 1    
7    | 87989879       | John Wright    | $123.11    | 0  
8    | 99124324       | Adam Yang      | $1114.42   | 0  
9    | 77231235       | Adam Yang      | $6232.32   | 0 

INCENTIVE_LOG

ID   | PASSENGER_NAME | INCENTIVE_AMT
-------------------------------------
1    | Adam Yang      | $1000.00       
2    | Cheryl Brown   | $1000.00       
3    | John Wright    | $1000.00   
4    | John Wright    | $1000.00       
5    | John Wright    | $1000.00   

I am using this query to join them:

SELECT
PASSENGERS.PASSENGER_NAME,
Sum(FARE_LEVEL.TICKET_AMT) AS SumOfFARES,
Count(INCENTIVE_LOG.INCENTIVE) AS CountOfINCENTIVE,
Sum(INCENTIVE_LOG.INCENTIVE) AS SumOfINCENTIVE
FROM (PASSENGERS LEFT OUTER JOIN FARE_LEVEL ON PASSENGERS.PASSENGER_NAME = FARE_LEVEL.PASSENGER_NAME)
INNER JOIN INCENTIVE_LOG ON PASSENGERS.PASSENGER_NAME = INCENTIVE_LOG.PASSENGER_NAME
GROUP BY PASSENGERS.PASSENGER_NAME;

The result I am looking for is the sum of all fares for each passenger in the PASSENGER table as well as the sum of all incentives taken for each passenger in the PASSENGER table. So for "John Wright" I would hope to see

PASSENGER_NAME | SumOfFares | CountOfIncentive | SumOfIncentive
---------------------------------------------------------------
John Wright    | $123.11    | 3                | $3000.00   

Instead I am getting get a result that seems to double count. When I change the GROUP BY to

GROUP BY PASSENGERS.PASSENGER_NAME, INCENTIVE_LOG.ID, FARE_LEVEL.ID;

I can see all of the duplicated rows that it is summing in error. It appears to be creating a row for every combination of FARE_LEVEL and INCENTIVE row that corresponds to a passenger so if someone had 7 fares and 3 incentives it would create 21 rows.

I am fairly certain the issue is in my join, but I am not sure how to fix it.

EDIT: I was able to solve the issue by creating two subqueries within the original query

SELECT
    PASSENGERS.PASSENGER_NAME,
    Sum(Query4.SumOfTICKET_AMT) AS SumOfFARES,
    Sum(Query2.SumOfINCENTIVE) AS Incentive
FROM (PASSENGERS 
    LEFT OUTER JOIN (
        SELECT PASSENGERS.PASSENGER_NAME, Sum(FARE_LEVEL.TICKET_AMT) AS SumOfTICKET_AMT
        FROM PASSENGERS LEFT OUTER JOIN FARE_LEVEL ON PASSENGERS.PASSENGER_NAME = FARE_LEVEL.PASSENGER_NAME
        GROUP BY PASSENGERS.PASSENGER_NAME 
    ) AS Query4
    ON PASSENGERS.PASSENGER_NAME = Query4.PASSENGER_NAME)
    INNER JOIN (
        SELECT PASSENGERS.PASSENGER_NAME, Sum(INCENTIVE_LOG.INCENTIVE) AS SumOfINCENTIVE
        FROM PASSENGERS LEFT OUTER JOIN INCENTIVE_LOG ON PASSENGERS.PASSENGER_NAME = INCENTIVE_LOG.PASSENGER_NAME
        GROUP BY PASSENGERS.PASSENGER_NAME
    ) AS Query2
    ON PASSENGERS.PASSENGER_NAME = Query2.PASSENGER_NAME
GROUP BY PASSENGERS.PASSENGER_NAME;

Aucun commentaire:

Enregistrer un commentaire