Re: [PATCH v6 0/7] tlb flush optimization on x86

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Alex Shi <alex.shi@intel.com>
To: Alex Shi <alex.shi@intel.com>
Cc: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
	arnd@arndb.de, rostedt@goodmis.org, fweisbec@gmail.com,
	jeremy@goop.org, riel@redhat.com, luto@mit.edu, avi@redhat.com,
	len.brown@intel.com, dhowells@redhat.com, fenghua.yu@intel.com,
	borislav.petkov@amd.com, yinghai@kernel.org, ak@linux.intel.com,
	cpw@sgi.com, steiner@sgi.com, akpm@linux-foundation.org,
	penberg@kernel.org, a.p.zijlstra@chello.nl, hughd@google.com,
	kamezawa.hiroyu@jp.fujitsu.com, viro@zeniv.linux.org.uk,
	linux-kernel@vger.kernel.org, yongjie.ren@intel.com
Subject: Re: [PATCH v6 0/7] tlb flush optimization on x86
Date: Thu, 17 May 2012 16:40:04 +0800	[thread overview]
Message-ID: <4FB4B964.6050501@intel.com> (raw)
In-Reply-To: <1337233375-840-1-git-send-email-alex.shi@intel.com>

On 05/17/2012 01:42 PM, Alex Shi wrote:

> Thanks Peter Z, Peter Anvin, Nick Piggin, and many others' comments!
> 
> The main change of this version is on generic mmu_gather code.
> It was tested with arm cross-compiler.
> 
> Thanks Rongjie's testing, that show the real case performance gain.
> 
> Alex Shi
> 
> [PATCH v6 1/7] x86/tlb: unify TLB_FLUSH_ALL definition
> [PATCH v6 2/7] x86/tlb_info: get last level TLB entry number of CPU
> [PATCH v6 3/7] x86/flush_tlb: try flush_tlb_single one by one in
> [PATCH v6 4/7] x86/tlb: fall back to flush all when meet a THP large
> [PATCH v6 5/7] x86/tlb: add tlb_flushall_shift for specific CPU
> [PATCH v6 6/7] x86/tlb: enable tlb flush range support for generic
> [PATCH v6 7/7] x86/tlb: add tlb_flushall_shift knob into debugfs




Here is the macro benchmark to measure munmap change:

tlb_flushall_shift = -1
[alexs@lkp-ne04 tlb]$ 
[alexs@lkp-ne04 tlb]$ for t in `echo 4 8 16  `; do echo "=============== t = $t ===================="; for i in `echo  8 16 32  `; do sudo  ./munmap -t $t -n $i; done done
=============== t = 4 ====================
munmap use 164ms 5032ns/time, memory access uses 81605 times/thread/ms, cost 12ns/time
munmap use 86ms 5251ns/time, memory access uses 83378 times/thread/ms, cost 11ns/time
munmap use 46ms 5642ns/time, memory access uses 87212 times/thread/ms, cost 11ns/time
=============== t = 8 ====================
munmap use 197ms 6036ns/time, memory access uses 69295 times/thread/ms, cost 14ns/time
munmap use 96ms 5896ns/time, memory access uses 71895 times/thread/ms, cost 13ns/time
munmap use 62ms 7608ns/time, memory access uses 83895 times/thread/ms, cost 11ns/time
=============== t = 16 ====================
munmap use 274ms 8367ns/time, memory access uses 37860 times/thread/ms, cost 26ns/time
munmap use 139ms 8543ns/time, memory access uses 38137 times/thread/ms, cost 26ns/time
munmap use 74ms 9033ns/time, memory access uses 38349 times/thread/ms, cost 26ns/time
[alexs@lkp-ne04 tlb]$ 
[alexs@lkp-ne04 tlb]$ 
tlb_flushall_shift = 5
[alexs@lkp-ne04 tlb]$ for t in `echo 4 8 16  `; do echo "=============== t = $t ===================="; for i in `echo  8 16 32  `; do sudo  ./munmap -t $t -n $i; done done
=============== t = 4 ====================
munmap use 212ms 6485ns/time, memory access uses 114003 times/thread/ms, cost 8ns/time
munmap use 130ms 7972ns/time, memory access uses 110725 times/thread/ms, cost 9ns/time
munmap use 45ms 5581ns/time, memory access uses 87866 times/thread/ms, cost 11ns/time
=============== t = 8 ====================
munmap use 253ms 7734ns/time, memory access uses 94578 times/thread/ms, cost 10ns/time
munmap use 147ms 9012ns/time, memory access uses 83851 times/thread/ms, cost 11ns/time
munmap use 63ms 7713ns/time, memory access uses 87473 times/thread/ms, cost 11ns/time
=============== t = 16 ====================
munmap use 369ms 11284ns/time, memory access uses 38854 times/thread/ms, cost 25ns/time
munmap use 264ms 16131ns/time, memory access uses 37870 times/thread/ms, cost 26ns/time
munmap use 73ms 8981ns/time, memory access uses 38309 times/thread/ms, cost 26ns/time

The munmap.c file is here:
---

/*
   munmap.c
   This is a macrobenchmark for TLB flush range testing.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

   Copyright (C) Intel 2012
   Coypright Alex Shi alex.shi@intel.com 

   gcc -o munmap munmap.c -lrt -lpthread -O2

    #perf stat -e r881,r882,r884 -e r801,r802,r810,r820,r840,r880,r807 -e rc01 -e r4901,r4902,r4910,r4920,r4940,r4980 -e r5f01  -e rbd01,rdb20  -e r4f02 -e r8004,r8201,r8501,r8502,r8504,r8510,r8520,r8540,r8580  -e rae01,rc820,rc102,rc900 -e r8600  -e rcb10  ./munmap 
*/

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>
#include <time.h>
#include <sys/types.h>
#include <pthread.h>

#define FILE_SIZE	(1024*1024*1024)

#define PAGE_SIZE 	4096
#define HPAGE_SIZE 	4096*512

#ifndef MAP_HUGETLB
#define MAP_HUGETLB	0x40000
#endif


long getnsec(clockid_t clockid) {
        struct timespec ts;
        if (clock_gettime(clockid, &ts) == -1)
                perror("clock_gettime failed");
        return (long) ts.tv_sec * 1000000000 + (long) ts.tv_nsec;
}

//data for threads
struct data{
	int *readp;
	void *startaddr;
	int rw;
	int loop;
};
volatile int * threadstart;
//thread for memory accessing
void *accessmm(void *data){
	struct data *d = data;
	long *actimes;
	char x;
	int i, k;
	int randn[PAGE_SIZE];
	
	for (i=0;i<PAGE_SIZE; i++)
		randn[i] = rand();

	actimes = malloc(sizeof(long));

	while (*threadstart == 0 )
		usleep(1);

	if (d->rw == 0)
		for (*actimes=0; *threadstart == 1; (*actimes)++)
			for (k=0; k < *d->readp; k++)
				x = *(volatile char *)(d->startaddr + randn[k]%FILE_SIZE); 
	else
		for (*actimes=0; *threadstart == 1; (*actimes)++)
			for (k=0; k < *d->readp; k++)
				*(char *)(d->startaddr + randn[k]%FILE_SIZE) = 1; 
	return actimes;
}

int main(int argc, char *argv[])
{
        static  char            optstr[] = "n:l:p:w:ht:";
	int n = 8;	/* default flush entries number */
	int l = 1; 	/* default loop times */
	int p = 512;	/* default accessed page number, after munmap */
	int er = 0, rw = 0, h = 0, t = 0; /* d: debug; h: use huge page; t thread number */
	int pagesize = PAGE_SIZE; /*default for regular page */
	volatile char x;
	long protindex = 0;

	int i, j, k, c;
	void *m1, *startaddr;
	unsigned long *startaddr2[1024*512];
	volatile void *tempaddr;
	clockid_t clockid = CLOCK_MONOTONIC;
	unsigned long start, stop, mptime, actime;
	int randn[PAGE_SIZE];

	pthread_t	pid[1024];
	void * res;
	struct data data;

	for (i=0;i<PAGE_SIZE; i++)
		randn[i] = rand();

        while ((c = getopt(argc, argv, optstr)) != EOF)
                switch (c) {
                case 'n':
                        n = atoi(optarg);
                        break;
                case 'p':
                        p = atoi(optarg);
                        break;
                case 'h':
                        h = 1;
                        break;
                case 'w':
                        rw = atoi(optarg);
                        break;
                case 't':
                        t = atoi(optarg);
                        break;
                case '?':
                        er = 1;
                        break;
                }
        if (er) {
                printf("usage: %s %s\n", argv[0], optstr);
                exit(1);
	}

	//printf("my pid is %d n=%d p=%d t=%d\n", getpid(), n, p, t);	
	if (h == 0){
		startaddr = mmap(0, FILE_SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
		for (j = 0; j < FILE_SIZE/PAGE_SIZE/n; j++) {
			startaddr2[j] = mmap(0, PAGE_SIZE*n, PROT_READ|PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
			if (startaddr2[j] == MAP_FAILED) {
				perror("mmap");
				exit(1);
			}
			*startaddr2[j] = 1;
		}
		pagesize = PAGE_SIZE;
	} else {
		startaddr = mmap(0, FILE_SIZE, PROT_READ|PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED | MAP_HUGETLB, -1, 0);
		for (j = 0; j < FILE_SIZE/HPAGE_SIZE/n; j++) {
			startaddr2[j] = mmap(0, HPAGE_SIZE*n, PROT_READ|PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
			if (startaddr2[j] == MAP_FAILED) {
				perror("mmap");
				exit(1);
			}
			*startaddr2[j] = 1;
		}
		pagesize = HPAGE_SIZE;
	}
	if (startaddr == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	start = getnsec(clockid);
	//access whole memory, will generate many page faults 
	for (tempaddr = startaddr; tempaddr < startaddr + FILE_SIZE; tempaddr += pagesize)
		memset((char *)tempaddr, 0, 1);
        stop = getnsec(clockid);
//	printf("get 256K pages with one byte writing uses %lums, %luns/time \n", 
//		(stop - start)/1000000, (stop-start)*pagesize/FILE_SIZE);

	//thread created, and goes to sleep
	threadstart = malloc(sizeof(int));
	*threadstart = 0;
	data.readp = &p; data.startaddr = startaddr; data.rw = rw; data.loop = l;
	for (i=0; i< t; i++)
		if(pthread_create(&pid[i], NULL, accessmm, &data))
			perror("pthread create");
	//wait for randn[] filling.
	if (t!=0)	sleep(1);

	mptime = actime = 0;
	if (t != 0)
		start = getnsec(clockid);
	//kick threads, let them running.
	*threadstart = 1;
	for (j = 0; j < FILE_SIZE/pagesize/n; j++) {

		if (t == 0)
			start = getnsec(clockid);

		if(munmap(startaddr2[j], n*pagesize)==-1) {
			perror("munmap");
			goto end;
		}
		if (t == 0) {
			stop = getnsec(clockid);
			mptime += stop - start;
		}

		if (t == 0) {
			// access p number pages 
			start = stop; 
			if (rw == 0)
				for (k=0; k < p; k++)
					x = *(volatile char *)(startaddr + randn[k]%FILE_SIZE);
			else
				for (k=0; k < p; k++)
					*(char *)(startaddr + randn[k]%FILE_SIZE) = 1;
			actime += getnsec(clockid) - start;
		} 
	}
	//to avoid accessmm miss *threadstart == 1
	usleep(10000);//sleep 10ms
	*threadstart = 0;
	if (t != 0) {
		stop = getnsec(clockid);
		mptime += stop - start;
	}

	//get threads' result.
	for (i=0; i< t; i++) {
		if (pthread_join(pid[i], &res))
			perror("pthread_join");
		actime += *(long*)res;
	}
	l = FILE_SIZE/pagesize/n;
end:
	if ( t == 0 ) 
	       	printf("munmap use %lums %luns/time, memory access uses %lums %luns/time \n",
			 mptime/1000000, mptime/(l), actime/1000000, actime/p/l);
	else
		printf("munmap use %lums %luns/time, memory access uses %ld times/thread/ms, cost %ldns/time\n",
			 mptime/1000000, mptime/(l), actime*p*1000000/t/mptime, mptime*t/(actime*p));
	exit(0);
}

next prev parent reply	other threads:[~2012-05-17  8:41 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-17  5:42 [PATCH v6 0/7] tlb flush optimization on x86 Alex Shi
2012-05-17  5:42 ` [PATCH v6 1/7] x86/tlb: unify TLB_FLUSH_ALL definition Alex Shi
2012-05-17  5:42 ` [PATCH v6 2/7] x86/tlb_info: get last level TLB entry number of CPU Alex Shi
2012-05-17  5:42 ` [PATCH v6 3/7] x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range Alex Shi
2012-05-17  5:42 ` [PATCH v6 4/7] x86/tlb: fall back to flush all when meet a THP large page Alex Shi
2012-05-17  5:42 ` [PATCH v6 5/7] x86/tlb: add tlb_flushall_shift for specific CPU Alex Shi
2012-05-17  5:42 ` [PATCH v6 6/7] x86/tlb: enable tlb flush range support for generic mmu on x86 Alex Shi
2012-05-17  5:42 ` [PATCH v6 7/7] x86/tlb: add tlb_flushall_shift knob into debugfs Alex Shi
2012-05-17  8:40 ` Alex Shi [this message]
2012-05-17  8:49   ` [PATCH v6 0/7] tlb flush optimization on x86 Alex Shi
2012-05-18  0:16 ` Alex Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FB4B964.6050501@intel.com \
    --to=alex.shi@intel.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=avi@redhat.com \
    --cc=borislav.petkov@amd.com \
    --cc=cpw@sgi.com \
    --cc=dhowells@redhat.com \
    --cc=fenghua.yu@intel.com \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=jeremy@goop.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@mit.edu \
    --cc=mingo@redhat.com \
    --cc=penberg@kernel.org \
    --cc=riel@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=steiner@sgi.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yinghai@kernel.org \
    --cc=yongjie.ren@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox