Question:

I am benchmarking the cache behaviour of two search algorithms that operate on a sorted range of items, using Cachegrind. I have n items in a vector, and another vector that holds all valid indices. I shuffle the index vector with std::random_shuffle and then perform n successful lookups on the items in the first vector. The function I am benchmarking looks roughly as follows:

template <typename Iterator>
void lookup_in_random_order(Iterator begin, Iterator end)
{
    const std::size_t N = std::distance(begin, end);
    std::vector<std::size_t> idx(N);
    std::iota(idx.begin(), idx.end(), 0);

    std::srand(std::time(0));
    std::random_shuffle(idx.begin(), idx.end());

    // Warm the cache -- I don't care about measuring this loop.
    for(std::size_t i = 0; i < N; ++i)
        my_search(begin, end, idx[i]);

    std::random_shuffle(idx.begin(), idx.end());

    // This one I would care about!
    for(std::size_t i = 0; i < N; ++i)
    {
        int s = idx[i];
        // Especially this line, of course.
        my_search(begin, end, s);
    }
}

I compile my code with g++ (with -g and -O2). I run Cachegrind and then cg_annotate. I get something like the following:
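(For reference, an editor's sketch of the workflow described above; the source file and binary names here are assumptions, and the exact cachegrind.out suffix is the profiled process's PID:)

```shell
# Build with debug info so cg_annotate can map events to source lines.
g++ -g -O2 bench.cpp -o bench

# Run under Cachegrind; this writes cachegrind.out.<pid>.
valgrind --tool=cachegrind ./bench

# Produce the per-line annotation shown below.
cg_annotate cachegrind.out.<pid>
```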

       Ir I1mr ILmr        Dr    D1mr    DLmr Dw D1mw DLmw
        .    .    .         .       .       .  .    .    .  template <typename Iterator>
       17    2    2         0       0       0  6    0    0  void lookup_in_random_order(Iterator begin, Iterator end)
        .    .    .         .       .       .  .    .    .  {
        .    .    .         .       .       .  .    .    .      const std::size_t N = std::distance(begin, end);
        .    .    .         .       .       .  .    .    .      std::vector<std::size_t> idx(N);
        .    .    .         .       .       .  .    .    .      std::iota(idx.begin(), idx.end(), 0);
        .    .    .         .       .       .  .    .    .
        4    0    0         0       0       0  2    1    1      std::srand(std::time(0));
        .    .    .         .       .       .  .    .    .      std::random_shuffle(idx.begin(), idx.end());
        .    .    .         .       .       .  .    .    .
3,145,729    0    0         0       0       0  0    0    0      for(std::size_t i = 0; i < N; ++i)
        .    .    .         .       .       .  .    .    .          my_search(begin, end, idx[i]);
        .    .    .         .       .       .  .    .    .
        .    .    .         .       .       .  .    .    .      std::random_shuffle(idx.begin(), idx.end());
        .    .    .         .       .       .  .    .    .
3,145,729    1    1         0       0       0  0    0    0      for(std::size_t i = 0; i < N; ++i)
        .    .    .         .       .       .  .    .    .      {
1,048,575    0    0 1,048,575 132,865 131,065  0    0    0          int s = idx[i];
        .    .    .         .       .       .  .    .    .          my_search(begin, end, s);
        .    .    .         .       .       .  .    .    .      }
        7    0    0         6       1       1  0    0    0  }

For some reason, some lines (especially the most interesting one!) consist of dots. Now, the Cachegrind manual says "Events not applicable for a line are represented by a dot. This is useful for distinguishing between an event which cannot happen, and one which can but did not."

How should this be interpreted? My first thought was that the compiler had optimized my searches away, but that seemed unlikely, because the program did spend quite a bit of time running. Still, I tried compiling without the -O2 flag, and it seemed to work, in the sense that every line with a call to my_search now recorded some numbers (no dots anymore!). However, this doesn't seem like the right way to go, for obvious reasons.

In general, is there a way to tell Cachegrind "look at this line especially; I am very interested in how many cache misses it causes"?
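(An editor's note on the last question: Cachegrind itself has no per-line or per-region switch, but Callgrind can run the same cache simulation and restrict event collection to a chosen function. A sketch, assuming the benchmark binary is named ./bench; the function-name pattern must be matched against your actual symbols:)

```shell
# Cache simulation with collection toggled on only while the function
# of interest is executing.
valgrind --tool=callgrind --cache-sim=yes \
         --toggle-collect='lookup_in_random_order*' ./bench

# Annotate; with --cache-sim=yes the D1mr/DLmr columns correspond
# to Cachegrind's read-miss counters.
callgrind_annotate --auto=yes callgrind.out.*
```

This narrows the report to the region of interest, though per-line attribution under -O2 still suffers from the same inlining effects discussed below.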

Answer:

My guess is that -O2 allows the compiler to automatically inline the functions on the lines where you see the dots. Cachegrind cannot see the inlined function calls, because the calls have disappeared. Try the compiler option "-fno-inline".

Of course you will probably have different cache performance numbers with and without inlining.
