I am doing this in Spark. My table looks like this:
cityId  PhysicalAddress  EmailAddress         ..many other columns of other meta info...
1       b st             something@email.com
1       b st             something@email.com  <- some rows can be entirely duplicates
1       a avenue         random@gmail.com
2       c square         anything@yahoo.com
2       d blvd           d@d.com
There is no primary key on this table, and I want to grab one random row for each distinct cityId.
e.g. this is a correct answer:
cityId  PhysicalAddress  EmailAddress         ..many other columns
1       b st             something@email.com
2       c square         anything@yahoo.com
e.g. this is also a correct answer:
cityId  PhysicalAddress  EmailAddress         ..many other columns
1       a avenue         random@gmail.com
2       c square         anything@yahoo.com
One way that comes to mind is to use a group by. However, that requires me to use an aggregate function (such as min()) on the other columns, whereas I just want to pull out an entire row (it doesn't matter which one).