I recently read up on memory barriers and ran a reproduction test against a classic example program.
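A brief introduction to the program: there are four global variables, x, y, r1, and r2, all initialized to 0. Thread 1 executes x = 1; r1 = y; and thread 2 executes y = 1; r2 = x. From the programmer's point of view, once both sequences have run, r1 and r2 cannot both be 0, because whichever store executes first should be visible to the other thread's later load. When memory accesses are reordered at run time, however, r1 and r2 can both end up as 0. Using CPU memory barriers to separate the store from the load in each thread rules out this reordering and ensures the program produces the expected result.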
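To make the reasoning above concrete, here is a minimal sketch that enumerates every interleaving of the two threads in a deliberately simplified model. It is not how a real CPU works: "reordering" is modeled simply as allowing each thread's load to run before its own store. The names `Op`, `Outcome`, `interleave`, and `outcomes` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

enum class Op { Store, Load };
using Outcome = std::pair<int, int>;  // (r1, r2)

// Recursively explore every interleaving of thread 1's ops (t1) and
// thread 2's ops (t2), keeping each thread's own order as given.
void interleave(const std::vector<Op>& t1, const std::vector<Op>& t2,
                std::size_t i, std::size_t j,
                int x, int y, int r1, int r2, std::set<Outcome>& out) {
    if (i == t1.size() && j == t2.size()) {
        out.insert({r1, r2});
        return;
    }
    if (i < t1.size()) {
        if (t1[i] == Op::Store)
            interleave(t1, t2, i + 1, j, 1, y, r1, r2, out);  // x = 1
        else
            interleave(t1, t2, i + 1, j, x, y, y, r2, out);   // r1 = y
    }
    if (j < t2.size()) {
        if (t2[j] == Op::Store)
            interleave(t1, t2, i, j + 1, x, 1, r1, r2, out);  // y = 1
        else
            interleave(t1, t2, i, j + 1, x, y, r1, x, out);   // r2 = x
    }
}

// Collect all reachable (r1, r2) pairs. With reordering disallowed, each
// thread runs store-then-load; with it allowed, load-then-store is also tried.
std::set<Outcome> outcomes(bool allow_store_load_reordering) {
    std::vector<std::vector<Op>> orders{{Op::Store, Op::Load}};
    if (allow_store_load_reordering)
        orders.push_back({Op::Load, Op::Store});
    std::set<Outcome> out;
    for (const auto& a : orders)
        for (const auto& b : orders)
            interleave(a, b, 0, 0, 0, 0, 0, 0, out);
    return out;
}
```

In this model, `outcomes(false)` never contains (0, 0), while `outcomes(true)` does, which matches the behavior the test program is designed to catch.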
```cpp
//~ reordering.cpp
#include <pthread.h>
#include <semaphore.h>
#include <iostream>

sem_t begin1;
sem_t begin2;
sem_t end;

volatile int x = 0, y = 0, r1 = 0, r2 = 0;

void *thread1(void *) {
    while (true) {
        sem_wait(&begin1);  // wait for main to start this iteration
        x = 1;
        //asm volatile("mfence" ::: "memory");  //~ barrier to the processors
        //asm volatile("" ::: "memory");        //~ barrier to compiler
        r1 = y;
        sem_post(&end);     // tell main this thread is done
    }
    return NULL;
}

void *thread2(void *) {
    while (true) {
        sem_wait(&begin2);
        y = 1;
        //asm volatile("mfence" ::: "memory");  //~ barrier to the processors
        //asm volatile("" ::: "memory");        //~ barrier to compiler
        r2 = x;
        sem_post(&end);
    }
    return NULL;
}

int main(int argc, char *argv[]) {
    sem_init(&begin1, 0, 0);
    sem_init(&begin2, 0, 0);
    sem_init(&end, 0, 0);

    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);

    int detected = 0;
    for (int iterations = 1; ; iterations++) {
        x = y = 0;
        //~ sync beginning
        sem_post(&begin1);
        sem_post(&begin2);
        //~ sync end
        sem_wait(&end);
        sem_wait(&end);
        // Both loads saw 0: each load took effect before the other thread's store.
        if (r1 == 0 && r2 == 0) {
            ++detected;
            std::cerr << detected << " reordering detected out of "
                      << iterations << " iterations" << std::endl;
            std::cerr << "one time per " << iterations / detected
                      << " iteration" << std::endl;
        }
    }
    return 0;
}
```
```
$ g++ reordering.cpp -pthread
$ ./a.out
1 reordering detected out of 511 iterations
one time per 511 iteration
2 reordering detected out of 717 iterations
one time per 358 iteration
3 reordering detected out of 30946 iterations
one time per 10315 iteration
[...]
48 reordering detected out of 281289 iterations
one time per 5860 iteration
49 reordering detected out of 281419 iterations
one time per 5743 iteration
50 reordering detected out of 289078 iterations
one time per 5781 iteration
```
Test environment: Intel Core i7-3520M, 4 logical cores (2 physical cores with Hyper-Threading)
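For comparison, the processor barrier that is commented out in the listing can also be written portably with C++11 atomics. The following is only a sketch under several assumptions: it swaps the inline `mfence` for `std::atomic_thread_fence`, swaps the POSIX semaphores for spin-waits on two counters, and runs a bounded number of rounds. The names `round_no`, `done`, `worker`, and `count_reorderings` are hypothetical and not part of the original program.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;
std::atomic<int> round_no{0};  // bumped by the driver to start a round
std::atomic<int> done{0};      // counts finished worker rounds

// Each worker stores 1 to its own variable, then loads the other one.
void worker(std::atomic<int>& mine, std::atomic<int>& other,
            int& result, int rounds) {
    for (int i = 1; i <= rounds; ++i) {
        while (round_no.load(std::memory_order_acquire) < i)
            std::this_thread::yield();
        mine.store(1, std::memory_order_relaxed);
        // Full barrier between the store and the load; this is the
        // portable counterpart of the "mfence" line in the listing.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        result = other.load(std::memory_order_relaxed);
        done.fetch_add(1, std::memory_order_release);
    }
}

// Runs the experiment and returns how often r1 and r2 were both 0.
int count_reorderings(int rounds) {
    assert(rounds > 0);
    round_no.store(0);
    done.store(0);
    std::thread t1(worker, std::ref(x), std::ref(y), std::ref(r1), rounds);
    std::thread t2(worker, std::ref(y), std::ref(x), std::ref(r2), rounds);
    int detected = 0;
    for (int i = 1; i <= rounds; ++i) {
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        round_no.store(i, std::memory_order_release);  // start round i
        while (done.load(std::memory_order_acquire) < 2 * i)
            std::this_thread::yield();                 // wait for both workers
        if (r1 == 0 && r2 == 0) ++detected;
    }
    t1.join();
    t2.join();
    return detected;
}
```

With the fence in place, the C++ memory model forbids the outcome where both loads return 0, so `count_reorderings` should return 0 for any number of rounds. Removing the fence line brings the experiment back to the situation measured above.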