Question:

In Python, theoretically, which method should be faster, test1 or test2 (assuming the same value of x)? I have tried using %timeit but see very little difference.

import numpy as np

class Tester():
    def __init__(self):
        self.x = np.arange(100000)

    def test1(self):
        return np.sum(self.x * self.x)

    def test2(self, x):
        return np.sum(x*x)

Answer:

In any implementation of Python, the time will be overwhelmingly dominated by the multiplication of two vectors with 100,000 elements each. Everything else is noise compared to that. Make the vector much smaller if you're really interested in measuring other overheads.
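To see the call overhead at all, the payload has to be tiny. Here is a minimal sketch using the standard timeit module, with a plain int standing in for the array so that the arithmetic costs almost nothing. The Holder class and the best_time helper are hypothetical names made up for this illustration, not part of the question's code, and the absolute numbers will depend on the machine and Python version.

```python
import timeit

class Holder:
    """Hypothetical stand-in for Tester, with an int instead of an array."""
    def __init__(self):
        self.x = 3

    def test1(self):
        # Reads x through the instance, twice.
        return self.x * self.x

    def test2(self, x):
        # Reads x as a local (argument), twice.
        return x * x

def best_time(func, *args, number=100_000, repeat=5):
    """Best-of-`repeat` total seconds for `number` calls of func(*args)."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number))

h = Holder()
t_attr = best_time(h.test1)
t_arg = best_time(h.test2, h.x)
print(f"attribute access: {t_attr:.4f} s")
print(f"argument access:  {t_arg:.4f} s")
```

The lambda wrapper adds the same small cost to both measurements, so only the difference between the two numbers is meaningful.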

In CPython, test2() will most likely be a little faster. It has an "extra" argument, but arguments are unpacked "at C speed" so that doesn't matter much. Arguments are accessed the same way as local variables, via the LOAD_FAST opcode, which is a simple array[index] access.

In test1(), each occurrence of self.x causes the string "x" to be looked up in the dictionary self.__dict__. That's slower than an indexed array access. But compared to the time taken by the long-winded multiplication, it's basically nothing.
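The difference between the two access paths can be inspected with the standard dis module. The sketch below uses a hypothetical Holder class without NumPy, so that only the lookups of x show up in the bytecode. Exact opcode names vary between CPython versions (newer releases may fuse or rename the LOAD_FAST variants), so treat the printed listing as illustrative.

```python
import dis

class Holder:
    """Hypothetical stand-in for Tester; NumPy is not needed to inspect bytecode."""
    def __init__(self):
        self.x = 3

    def via_attribute(self):
        return self.x * self.x

    def via_argument(self, x):
        return x * x

def opnames(func):
    """Return the list of opcode names in func's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]

attr_ops = opnames(Holder.via_attribute)
arg_ops = opnames(Holder.via_argument)

print("via_attribute:", attr_ops)
print("via_argument: ", arg_ops)
print("LOAD_ATTR count:", attr_ops.count("LOAD_ATTR"), "vs", arg_ops.count("LOAD_ATTR"))
```

The attribute version carries one LOAD_ATTR per occurrence of self.x, while the argument version reads x only through the LOAD_FAST family of opcodes.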

Answer:

I know this sort of misses the point of the question, but since you tagged the question with numpy and are looking at speed differences for a large array, I thought I would mention that the faster solutions are something else entirely.

What you're computing is a dot product, so use numpy.dot, which does the multiplying and summing together in one pass and, for floating-point arrays, can hand the work to an external BLAS library (BLAS rather than LAPACK). (For convenience I'll use the syntax of test1, despite @Tim's answer, because no extra argument needs to be passed.)

def test3(self):
    return np.dot(self.x, self.x)

or possibly even faster (and certainly more general):

def test4(self):
    return np.einsum('i,i->', self.x, self.x)
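As a quick sanity check that the three formulations compute the same thing, here is a small standalone sketch. The function names are made up for this illustration, and the input is deliberately float64: np.arange(n) produces integers, and NumPy only hands np.dot to BLAS for floating-point and complex types, so the dtype is an assumption that affects which code path runs.

```python
import numpy as np

def sum_sq_multiply(x):
    # Builds a temporary array x*x, then reduces it.
    return np.sum(x * x)

def sum_sq_dot(x):
    # Multiplies and accumulates without the temporary array.
    return np.dot(x, x)

def sum_sq_einsum(x):
    # 'i,i->' multiplies matching elements and sums over i.
    return np.einsum('i,i->', x, x)

x = np.arange(5, dtype=np.float64)   # 0, 1, 2, 3, 4
results = [sum_sq_multiply(x), sum_sq_dot(x), sum_sq_einsum(x)]
print(results)   # each entry equals 0 + 1 + 4 + 9 + 16 = 30
```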

Here are some tests:

In [363]: paste
class Tester():
    def __init__(self, n):
        self.x = np.arange(n)
    def test1(self):
        return np.sum(self.x * self.x)
    def test2(self, x):
        return np.sum(x*x)
    def test3(self):
        return np.dot(self.x, self.x)
    def test4(self):
        return np.einsum('i,i->', self.x, self.x)
## -- End pasted text --

In [364]: t = Tester(10000)

In [365]: np.allclose(t.test1(), [t.test2(t.x), t.test3(), t.test4()])
Out[365]: True

In [366]: timeit t.test1()
10000 loops, best of 3: 37.4 µs per loop

In [367]: timeit t.test2(t.x)
10000 loops, best of 3: 37.4 µs per loop

In [368]: timeit t.test3()
100000 loops, best of 3: 15.2 µs per loop

In [369]: timeit t.test4()
100000 loops, best of 3: 16.5 µs per loop

In [370]: t = Tester(10)

In [371]: timeit t.test1()
100000 loops, best of 3: 16.6 µs per loop

In [372]: timeit t.test2(t.x)
100000 loops, best of 3: 16.5 µs per loop

In [373]: timeit t.test3()
100000 loops, best of 3: 3.14 µs per loop

In [374]: timeit t.test4()
100000 loops, best of 3: 6.26 µs per loop

And speaking of small, almost syntactic, speed differences, consider calling the array's sum method rather than the standalone np.sum function:

def test1b(self):
    return (self.x*self.x).sum()

gives:

In [385]: t = Tester(10000)

In [386]: timeit t.test1()
10000 loops, best of 3: 40.6 µs per loop

In [387]: timeit t.test1b()
10000 loops, best of 3: 37.3 µs per loop

In [388]: t = Tester(3)

In [389]: timeit t.test1()
100000 loops, best of 3: 16.6 µs per loop

In [390]: timeit t.test1b()
100000 loops, best of 3: 14.2 µs per loop