1. Find names appearing in both log files. From this create a Set of customers that appear in both. In python it's just creating a set from the first file, creating another set from the second file and unionizing them. O(N)
2. concatenate both files to treat as one. create a Map with key: Customer and value: Set[Page]. This is basically iterating through the log, when you see a customer append the customer_id to the map and add the page to the set if it already exists. O(N)
3. Filter the map for all customers with length(Set[Page]) > 1. To get the Set of all customers that visited more than one page. O(N)
4. Combine the sets of customers who visited multiple pages with customers that appeared in both log files into a new set. O(N). You can do this with, again the union operator in python.
The irony is I'm more able to do this in python then I am in SQL. For SQL stuff at work I just look it up. I never remember the exact syntax.
Total runtime O(N)
total memory O(N)
This is just your basic hashmap stuff for fast look up and aggregation. Usually fang questions are harder than this.
1. Find names appearing in both log files. From this create a Set of customers that appear in both. In python it's just creating a set from the first file, creating another set from the second file and unionizing them. O(N)
2. concatenate both files to treat as one. create a Map with key: Customer and value: Set[Page]. This is basically iterating through the log, when you see a customer append the customer_id to the map and add the page to the set if it already exists. O(N)
3. Filter the map for all customers with length(Set[Page]) > 1. To get the Set of all customers that visited more than one page. O(N)
4. Combine the sets of customers who visited multiple pages with customers that appeared in both log files into a new set. O(N). You can do this with, again the union operator in python.
The irony is I'm more able to do this in python then I am in SQL. For SQL stuff at work I just look it up. I never remember the exact syntax. Total runtime O(N) total memory O(N)
This is just your basic hashmap stuff for fast look up and aggregation. Usually fang questions are harder than this.