Monday, 13 June 2016

The Hadoop Hierarchy Puzzle

I recently started working with Hadoop; my past experience is in ETL. I now have a problem where I want to build parent-child hierarchies. The input is below.

INPUT

Parent_Id Child_Id

FAC001    FAC001

FAC001    FAC002

FAC002    FAC003

FAC003    FAC004

FAC004    FAC005

AAA005    AAA005 

AAA005    AAA001 

AAA001    AAA006 

Desired Output

Top_Parent_Id Parent_Id Child_Id Level

FAC001        FAC001    FAC001   1

FAC001        FAC001    FAC002   2

FAC001        FAC002    FAC003   3

FAC001        FAC003    FAC004   4

FAC001        FAC004    FAC005   5

AAA005        AAA005    AAA005   1

AAA005        AAA005    AAA006   2

Can you please suggest a way to achieve this? I have implemented the same logic in Hive, where I was able to build the hierarchies up to a predefined number of levels (using self joins). I was wondering whether the same can be implemented in Spark or Pig for an arbitrary, dynamic number of levels.

Note: a parent ID is not necessarily less than its child ID, numerically or alphabetically, so the solution should not rely on ordering.
I appreciate all input.

Thanks in advance.
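
For reference, the level-by-level expansion described above can be sketched in plain Python (not Spark or Pig). This is only an illustration under stated assumptions: a row whose Parent_Id equals its Child_Id is assumed to mark a top-level parent, and the function name build_hierarchy is hypothetical. Each pass of the while loop plays the role of one more self join, and the loop stops when a pass finds no new children, so the depth does not need to be known in advance. No ordering of the IDs is used.

```python
def build_hierarchy(edges):
    """Return (top_parent, parent, child, level) rows for (parent, child) edges.

    Assumption: a self-referencing row (parent == child) marks a top-level parent.
    """
    children = {}
    roots = []
    for parent, child in edges:
        if parent == child:
            roots.append(parent)
        else:
            children.setdefault(parent, []).append(child)

    rows = []
    for root in roots:
        rows.append((root, root, root, 1))
        frontier = [root]   # nodes discovered in the previous pass
        seen = {root}       # guards against cycles in the data
        level = 1
        while frontier:     # one pass per level, like one more self join
            level += 1
            next_frontier = []
            for parent in frontier:
                for child in children.get(parent, []):
                    if child in seen:
                        continue
                    seen.add(child)
                    rows.append((root, parent, child, level))
                    next_frontier.append(child)
            frontier = next_frontier
    return rows
```

Called with the INPUT rows as a list of (Parent_Id, Child_Id) tuples, this yields the five FAC001 rows at levels 1 to 5. For the AAA005 group it follows the input edges as given, so AAA001 appears at level 2 and AAA006 at level 3. In Spark, the same idea would presumably be a loop that joins the current frontier with the edge table until the join returns no rows.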
