I have a Python dictionary with a large number of keys (~1.5 million). The value associated with each key is a number, and I only want to report on keys whose value is greater than two.

My current code looks something like:

``````ks_ignored = 0
for k in d.keys():
    if d[k] > 2:
        print "Key(%s) has value %s" % (k, d[k])
    else:
        ks_ignored += 1
``````

My final report shows that about 1.4 million keys were ignored, and the script takes a very long time to run (about 6 hours). Is there a simple way to loop over only the keys whose value is greater than 2, without performing the check inside the loop, that would substantially speed this up?

Use a dictionary comprehension to get the key/value pairs that pass the check:

``````valid_kv = {k:v for k,v in d.iteritems() if v > 2}
``````

Ignored keys:

``````ks_ignored = len(d) - len(valid_kv)
``````
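
The snippets above are Python 2 (`iteritems()` does not exist in Python 3). As a rough sketch of the same idea under Python 3, assuming a small made-up dictionary in place of the real 1.5-million-key one:

```python
# Python 3 sketch; the sample dictionary is invented for illustration.
d = {"a": 1, "b": 5, "c": 2, "d": 9}

# Keep only the pairs whose value exceeds 2 (items() replaces iteritems()).
valid_kv = {k: v for k, v in d.items() if v > 2}

# Every pair that was filtered out counts as ignored.
ks_ignored = len(d) - len(valid_kv)

for k, v in valid_kv.items():
    print("Key(%s) has value %s" % (k, v))
print("ignored:", ks_ignored)
```

Note that the comprehension still evaluates `v > 2` once per entry; it moves the check out of the reporting loop rather than removing it.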

If what you want is to loop over the result, `itertools.ifilter()` should work for you. The following compares the execution time of a list comprehension, `filter()` and `itertools.ifilter()`:

``````import time
import itertools

l = [i for i in range(1000000)]

t1 = time.time()
r1 = [i for i in l if i > 100]
t2 = time.time()

t3 = time.time()
r2 = filter(lambda i: i>100, l)
t4 = time.time()

t5 = time.time()
r3 = itertools.ifilter(lambda i: i>100, l)
t6 = time.time()

print t2-t1
print t4-t3
print t6-t5
``````

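Output (because `itertools.ifilter()` is lazy, its figure covers only creating the iterator, not the filtering itself):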

``````0.151000022888  # lc
0.100000143051  # filter
0.000999927520752  # ifilter
``````
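
Since an iterator-based filter does no work until it is consumed, the third timing above is not directly comparable to the first two. A fairer comparison consumes every result inside the timed region. The following is only a sketch, written for Python 3 (where the built-in `filter()` is itself lazy and `itertools.ifilter()` no longer exists); the helper name `time_call` and the list size are assumptions chosen for illustration:

```python
import time

def time_call(fn):
    """Return (elapsed_seconds, result) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result

# Smaller than the original list, just to keep the run short.
data = list(range(100000))

# List comprehension: all filtering happens inside the timed call.
t_lc, r_lc = time_call(lambda: [i for i in data if i > 100])

# list(...) forces the lazy filter object to do all of its work while timed.
t_filter, r_filter = time_call(lambda: list(filter(lambda i: i > 100, data)))

# Creating the filter object alone performs no filtering yet.
t_lazy, lazy = time_call(lambda: filter(lambda i: i > 100, data))

print("list comprehension:  ", t_lc)
print("filter, consumed:    ", t_filter)
print("filter, not consumed:", t_lazy)
```

The actual figures depend on the machine and Python version, so none are quoted here.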

Applied to your dictionary (iterating over `d` yields its keys), this becomes:

``````res = itertools.ifilter(lambda item: d[item]>2, d)
``````

If you also need the number of items that do not satisfy the condition, you can use `filter()`, which returns a list in Python 2:

``````res = filter(lambda item: d[item]>2, d)
ks_ignored = len(d) - len(res)
``````

Or:

``````ks_ignored = len(filter(lambda item: d[item]<=2, d))
``````
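
In Python 3, `len(filter(...))` raises a `TypeError` because `filter()` returns an iterator rather than a list. One possible alternative, shown here as a sketch with an invented sample dictionary, counts the ignored entries with a generator expression and so never builds an intermediate list:

```python
# Python 3 sketch; the sample dictionary is invented for illustration.
d = {"x": 0, "y": 3, "z": 2, "w": 7}

# Count entries at or below the threshold without materializing a list.
ks_ignored = sum(1 for v in d.values() if v <= 2)

# Lazy iterator over the keys whose value exceeds the threshold.
res = filter(lambda key: d[key] > 2, d)

kept_keys = sorted(res)
print("kept:", kept_keys)
print("ignored:", ks_ignored)
```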
