问题描述:

I have a list of over 100000 values and I am iterating over these values and checking if each one is contained in another list of random values (of the same size).

I am doing this by using `if item[x] in randomList`

.

How efficient is this? Does python do some sort of hashing for each container or is it internally doing a straight up search of the other container to find the element I am looking for?

Also, if it does this search linearly, then does it create a dictionary of the randomList and do the lookup with that?

`in`

is implemented by the `__contains__`

magic method of the object it applies to, so the efficiency is dependent upon that. For instance, `set`

, `dict`

and `frozenset`

will be hash based lookups, while `list`

will require a linear search. However, `xrange`

(or `range`

in Python 3.x) has a `__contains__`

method that doesn't require a linear search, but instead can use the start/stop/step information to determine a truthy value. (eg: `7 in xrange(4, 1000000)`

isn't done linearly).

Custom classes are free to implement `__contains__`

however they see fit but ideally should provide some information about how it does so in documentation if "not obvious".

You will want to pre-convert your list to a set, where hashing can be used for O(1) lookup.

See http://wiki.python.org/moin/TimeComplexity

(Normally, you have to search every element in a 'classical' list to tell if something is in it (unless your data structure also keeps a set of all elements, but that would add a huge amount of time and space complexity, and the programmer can implement it themselves).)

ninjagecko already answered the specific case of lists (and tuples FWIW), the general answer is : "it depends on the 'container' implementation" (I quote 'container' cause it can as well be a generator). For builtin types, you want sets or dicts for a fast lookup. Else you'd better check the doc or implementation for your container.

You can do this without using a set if the two lists are sorted, as follows:

```
def gen(P1, P2):
i, j = 0, 0
I, J = len(P1), len(P2)
while i != I and j != J:
if P1[i] == P2[j]:
yield P1[i]
i, j = i + 1, j + 1
else:
if P1[i] < P2[j]:
i += 1
else:
j += 1
```

This returns the intersection of P1 and P2:

```
>>> P1 = [1, 2, 3, 5, 90]
>>> P2 = [2, 3, 5, 30, 48]
>>> list(gen(P1, P2))
[2, 3, 5]
```

The algorithm is described here.