I have recently started working on Hadoop; my past experience is in ETL. I need to build parent-child hierarchies from the input below.
Input
Parent_Id Child_Id
FAC001 FAC001
FAC001 FAC002
FAC002 FAC003
FAC003 FAC004
FAC004 FAC005
AAA005 AAA005
AAA005 AAA001
AAA001 AAA006
Desired Output
Top_Parent_Id Parent_Id Child_Id Level
FAC001 FAC001 FAC001 1
FAC001 FAC001 FAC002 2
FAC001 FAC002 FAC003 3
FAC001 FAC003 FAC004 4
FAC001 FAC004 FAC005 5
AAA005 AAA005 AAA005 1
AAA005 AAA005 AAA001 2
AAA005 AAA001 AAA006 3
Can you suggest a way to achieve this? I have implemented the same logic in Hive using self-joins, but only up to a predefined number of levels. I am wondering whether the same can be done in Spark or Pig for an arbitrary, data-dependent number of levels.
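A minimal sketch of the iterative idea, written in plain Python rather than Spark or Pig so it runs anywhere: instead of a fixed number of self-joins, keep joining the newest level (the "frontier") against the edge list until a pass produces no new rows. Assumptions, not taken from the question: a root is marked by a self-loop row such as FAC001 FAC001, and each child has exactly one parent. The names `EDGES` and `build_hierarchy` are hypothetical.

```python
from collections import defaultdict

# Sample edges from the question as (parent_id, child_id) pairs.
EDGES = [
    ("FAC001", "FAC001"),
    ("FAC001", "FAC002"),
    ("FAC002", "FAC003"),
    ("FAC003", "FAC004"),
    ("FAC004", "FAC005"),
    ("AAA005", "AAA005"),
    ("AAA005", "AAA001"),
    ("AAA001", "AAA006"),
]


def build_hierarchy(edges):
    """Return (top_parent_id, parent_id, child_id, level) rows.

    Assumes roots are marked by self-loop rows (parent_id == child_id).
    Each pass of the while loop plays the role of one self-join; the loop
    stops when a pass adds nothing, so the depth is not fixed in advance.
    """
    # Adjacency list of real parent -> child links (self-loops excluded).
    children = defaultdict(list)
    for parent, child in edges:
        if parent != child:
            children[parent].append(child)

    # Level 1: one row per root self-loop.
    frontier = [(p, p, c, 1) for p, c in edges if p == c]
    result = list(frontier)
    # Nodes already placed; skipping them guards against cycles
    # (and relies on the single-parent assumption).
    seen = {row[2] for row in frontier}

    while frontier:
        next_frontier = []
        for top, _parent, child, level in frontier:
            for grandchild in children[child]:
                if grandchild in seen:
                    continue
                seen.add(grandchild)
                next_frontier.append((top, child, grandchild, level + 1))
        result.extend(next_frontier)
        frontier = next_frontier
    return result


if __name__ == "__main__":
    for row in build_hierarchy(EDGES):
        print(*row)
```

In Spark the same loop could be expressed by repeatedly joining a frontier DataFrame with the edge DataFrame and stopping when the join returns no rows. Each pass is one join, so very deep hierarchies mean many jobs; GraphX's Pregel API may be worth a look in that case.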
Note: a parent ID is not guaranteed to be numerically or alphabetically less than its child ID, so the solution must not rely on ordering.
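Because ordering cannot be trusted, the top parents can be identified by set membership alone. The sketch below assumes, as an illustration, that a top parent is any ID that never appears as the child of a different ID; this also works if the self-loop rows are missing. The function name `find_roots` is hypothetical.

```python
def find_roots(edges):
    """Return the IDs that are never the child of a *different* ID.

    Uses only set operations, so the result does not depend on how the
    IDs compare numerically or alphabetically.
    """
    nodes = set()
    has_other_parent = set()
    for parent, child in edges:
        nodes.update((parent, child))
        if parent != child:
            has_other_parent.add(child)
    return nodes - has_other_parent


# Edges from the question, plus a hypothetical chain whose root sorts last.
SAMPLE = [
    ("FAC001", "FAC001"), ("FAC001", "FAC002"), ("FAC002", "FAC003"),
    ("FAC003", "FAC004"), ("FAC004", "FAC005"),
    ("AAA005", "AAA005"), ("AAA005", "AAA001"), ("AAA001", "AAA006"),
]
UNORDERED = [("Z9", "A1"), ("A1", "M5")]

if __name__ == "__main__":
    print(find_roots(SAMPLE))
    print(find_roots(UNORDERED))
```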
I appreciate any input.
Thanks in advance.