IT博客汇 | A False-Sharing Test

A False-Sharing Test

dutor发表于 2013-03-01 14:53:14

　　今天通过酷壳一篇推荐了解了一下 Cache 的“伪共享”(False Sharing). 写了小程序做了个简单的测试：

//! false_sharing.cpp
#include <iostream>
#include <pthread.h>
#include <sys/time.h>
using namespace std;
 
template <size_t NPAD, size_t NTH = 4, size_t NLOOP = (1<<30)>
class FalseSharing {
public:
  void run() {
    struct timeval b, e;
    gettimeofday(&b, NULL);
    pthread_t threads[NTH];
    for (size_t i = 0; i < NTH; ++i) {
      pthread_create(&threads[i], NULL, hook, reinterpret_cast<void*> (i));
    }
 
    for (size_t i = 0; i < NTH; ++i) {
      pthread_join(threads[i], NULL);
    }
    gettimeofday(&e, NULL);
    cout<<"padding bytes: "<<NPAD<<endl;;
    cout<<"thread count: "<<NTH<<endl;
    cout<<"loop count: "<<NLOOP<<endl;
    cout<<"elapsed(ms): "<< ((e.tv_sec-b.tv_sec)*1000000 + (e.tv_usec-b.tv_usec)) / 1000 <<endl;
    cout<<endl;
  }
 
private:
  static void* hook(void *args) {
    size_t ith = reinterpret_cast<size_t> (args);
    for (size_t i = 0; i < NLOOP; ++i) {
      ++s[ith].n;
    }
    return NULL;
  }
private:
  struct S {
    size_t  n;
    char    padding[NPAD];
  };
  static S s[NTH];
};
 
template <size_t NPAD, size_t NTH, size_t NLOOP>
typename FalseSharing<NPAD, NTH, NLOOP>::S FalseSharing<NPAD, NTH, NLOOP>::s[NTH];
 
int
main() {
  FalseSharing<0> test1;
  test1.run();
  FalseSharing<56> test2;
  test2.run();
  return 0;
};

$ cat /proc/cpuinfo | egrep 'model name|cache_alignment'
model name	: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
cache_alignment	: 64
model name	: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
cache_alignment	: 64
model name	: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
cache_alignment	: 64
model name	: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
cache_alignment	: 64
$ g++ false_sharing.cpp -pthread
$ time ./a.out 
padding bytes   : 0
thread count    : 4
loop count      : 1073741824
elapsed(ms)     : 12638
 
padding bytes   : 56
thread count    : 4
loop count      : 1073741824
elapsed(ms)     : 3888
 
real	0m16.530s
user	1m3.688s
sys	0m0.044s

PS. 上面结果是在一台2核4线程的i7-3520笔记本上面测得的，在一台4核16线程的E5520(2.27GHz)机器上，结果更加触目惊心：

padding bytes: 0
thread count: 4
loop count: 1073741824
elapsed(ms): 13646
 
padding bytes: 56
thread count: 4
loop count: 1073741824
elapsed(ms): 3548
 
 
padding bytes: 0
thread count: 16
loop count: 1073741824
elapsed(ms): 43848
 
padding bytes: 56
thread count: 16
loop count: 1073741824
elapsed(ms): 5051

UPDATE
　　有意思的是，根据Felix同学的测试性能差距并没有这么大，详情请移步到这里。