mercredi 15 juin 2016

Trying to do a natural join using python standard library

Ok so here is the challenge I am facing .

I have 4 tables: shops (attributes: shop_id,name,lat,lng) products (attributes: product_id,shop_id,title,popularity tags (attributes: tag_id,tag) taggings (attributes: tagging_id,shop_id,tag_id)

I want to do a natural join over all those tables to products one table only. (a natural join is the same as an inner join with condition that shops.shop_id = products.shop_id = taggings.shop_id AND tags.tag_id = taggings.tag_id)

import csv
from collections import Counter

print "This is going to take a while ..."    

shops = list(csv.reader(open('data/shops.csv'),delimiter=','))
products = list(csv.reader(open('data/products.csv'),delimiter=','))
tag = list(csv.reader(open('data/tags.csv'),delimiter=','))
taggings = list(csv.reader(open('data/taggings.csv'),delimiter=','))


shops_products = []



hash_ = {}
for idx, row1 in enumerate(shops):
    hash_[row1[0]] = idx 


for row2 in products:
    if row2[1] in hash_:
        shops_products.append(shops[hash_[row2[1]]] + row2)






print "shops_products done ..."
print len(shops_products)










tags_taggings = []

hash_ = {}
for idx, row1 in enumerate(tag):
    hash_[row1[0]] = idx 


for row2 in taggings:
    if row2[2] in hash_:
        tags_taggings.append(tag[hash_[row2[2]]] + row2)

print "tags_taggings done ..."
print len(tags_taggings)

so far so good. Now I have 2 tables named shops_products, tags_taggings. If I merge them together I get the table that I want.

The challenge is that shops_products has many rows with the same shops_id AND tags_taggings has many rows with the same shop_id as well.

so I cannot merge them the way I was merging tables shops,products,tags,taggings .

I could merge them doing a nested loop. but That would result in O(n^2) complexity , over 40 minutes to complete.

I would like your help in figuring out a way to do this the way I was joining my other tables as you see in the code. That would be a satisfactory O(n) and finish fairly quick .

Thank you for reading.

Aucun commentaire:

Enregistrer un commentaire