Problem description:

In Python, theoretically, which method should be faster out of `test1` and `test2` (assuming the same value of `x`)? I have tried using `%timeit` but see very little difference.

```
import numpy as np

class Tester():
    def __init__(self):
        self.x = np.arange(100000)

    def test1(self):
        return np.sum(self.x * self.x)

    def test2(self, x):
        return np.sum(x * x)
```

In any implementation of Python, the time will be overwhelmingly dominated by the multiplication of two vectors with 100,000 elements each. Everything else is noise compared to that. Make the vector much smaller if you're really interested in measuring other overheads.
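To see the call and lookup overheads without the multiplication drowning them out, here is a minimal sketch that swaps the array for a plain integer. The class name `Tiny` and the loop count are made up for illustration, and the numbers will vary by machine and Python version, so treat the output as a rough indication only.

```python
import timeit


class Tiny:
    """Hypothetical stand-in for Tester with a trivially cheap payload."""

    def __init__(self):
        self.x = 3  # a plain int, so the multiplication costs almost nothing

    def test1(self):
        return self.x * self.x  # two attribute lookups on self

    def test2(self, x):
        return x * x  # two local-variable reads


n = 200_000
ns = {"Tiny": Tiny}
# The setup runs inside timeit's timing function, so t and x are locals there
# and both statements pay the same price to reach them.
setup = "t = Tiny(); x = t.x"

# Take the best of several repeats to reduce noise from other processes.
time_attr = min(timeit.repeat("t.test1()", setup=setup, globals=ns, number=n, repeat=5))
time_arg = min(timeit.repeat("t.test2(x)", setup=setup, globals=ns, number=n, repeat=5))

print(f"test1 (self.x):   {time_attr / n * 1e9:.1f} ns per call")
print(f"test2 (argument): {time_arg / n * 1e9:.1f} ns per call")
```

Any difference that shows up here is on the order of nanoseconds per call, which is why it disappears next to a 100,000-element multiplication.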

In CPython, `test2()` will most likely be a little faster. It has an "extra" argument, but arguments are unpacked "at C speed" so that doesn't matter much. Arguments are accessed the same way as local variables, via the `LOAD_FAST` opcode, which is a simple `array[index]` access.

In `test1()`, each instance of `self.x` causes the string "x" to be looked up in the dictionary `self.__dict__`. That's slower than an indexed array access. But compared to the time taken by the long-winded multiplication, it's basically nothing.
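One way to check this yourself is to look at the bytecode with the standard `dis` module. The sketch below uses a hypothetical `Probe` class with a plain integer in place of the array, so that only the variable accesses show up. Exact opcode names differ between CPython versions (newer releases fuse or rename the `LOAD_FAST` family), so the code only counts `LOAD_ATTR` and prints the rest.

```python
import dis


class Probe:
    """Hypothetical minimal class; a plain int stands in for the NumPy array."""

    def __init__(self):
        self.x = 3

    def test1(self):
        return self.x * self.x

    def test2(self, x):
        return x * x


def opnames(func):
    """Return the opcode names of a function, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


ops1 = opnames(Probe.test1)
ops2 = opnames(Probe.test2)

# test1 needs one LOAD_ATTR per `self.x`; test2 only reads the local x.
print("test1:", ops1)
print("test2:", ops2)
print("attribute loads in test1:", ops1.count("LOAD_ATTR"))
print("attribute loads in test2:", ops2.count("LOAD_ATTR"))
```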

I know this sort of misses the point of the question, but since you tagged the question with `numpy` and are looking at speed differences for a large array, I thought I would mention that a different approach entirely can be faster.

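What you're doing is a dot product, so use `numpy.dot`, which does the multiplying and summing together in one pass instead of first building a temporary array of products. For floating-point arrays NumPy hands this to an external BLAS library (not LAPACK); integer arrays such as the `np.arange` output used here go through NumPy's own loops. (For convenience I'll use the syntax of `test1`, despite @Tim's answer, because no extra argument needs to be passed.)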

```
def test3(self):
    return np.dot(self.x, self.x)
```

or possibly even faster (and certainly more general):

```
def test4(self):
    return np.einsum('i,i->', self.x, self.x)
```
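In case the `'i,i->'` subscript string looks cryptic: both operands are indexed by `i`, and because `i` is missing from the output side of `->`, it gets summed over. A plain-Python reference of that reduction might look like the sketch below. The function name `einsum_ii_reference` is made up for illustration, and this loop is of course far slower than the NumPy call; it only spells out the meaning.

```python
def einsum_ii_reference(a, b):
    """Plain-Python meaning of 'i,i->': the sum over i of a[i] * b[i]."""
    if len(a) != len(b):
        raise ValueError("both operands must have the same length")
    total = 0
    for a_i, b_i in zip(a, b):
        total += a_i * b_i  # multiply matching elements, accumulate the sum
    return total


values = list(range(5))
print(einsum_ii_reference(values, values))  # 0 + 1 + 4 + 9 + 16 = 30
```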

Here are some tests:

```
In [363]: paste
class Tester():
    def __init__(self, n):
        self.x = np.arange(n)
    def test1(self):
        return np.sum(self.x * self.x)
    def test2(self, x):
        return np.sum(x*x)
    def test3(self):
        return np.dot(self.x, self.x)
    def test4(self):
        return np.einsum('i,i->', self.x, self.x)
## -- End pasted text --
In [364]: t = Tester(10000)
In [365]: np.allclose(t.test1(), [t.test2(t.x), t.test3(), t.test4()])
Out[365]: True
In [366]: timeit t.test1()
10000 loops, best of 3: 37.4 µs per loop
In [367]: timeit t.test2(t.x)
10000 loops, best of 3: 37.4 µs per loop
In [368]: timeit t.test3()
100000 loops, best of 3: 15.2 µs per loop
In [369]: timeit t.test4()
100000 loops, best of 3: 16.5 µs per loop
In [370]: t = Tester(10)
In [371]: timeit t.test1()
100000 loops, best of 3: 16.6 µs per loop
In [372]: timeit t.test2(t.x)
100000 loops, best of 3: 16.5 µs per loop
In [373]: timeit t.test3()
100000 loops, best of 3: 3.14 µs per loop
In [374]: timeit t.test4()
100000 loops, best of 3: 6.26 µs per loop
```

And speaking of small, almost syntactic, speed differences, consider calling the array's `.sum()` method rather than the standalone `np.sum` function:

```
def test1b(self):
    return (self.x*self.x).sum()
```

gives:

```
In [385]: t = Tester(10000)
In [386]: timeit t.test1()
10000 loops, best of 3: 40.6 µs per loop
In [387]: timeit t.test1b()
10000 loops, best of 3: 37.3 µs per loop
In [388]: t = Tester(3)
In [389]: timeit t.test1()
100000 loops, best of 3: 16.6 µs per loop
In [390]: timeit t.test1b()
100000 loops, best of 3: 14.2 µs per loop
```