From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Wang, Yalin" <Yalin.Wang@sonymobile.com>,
	"'arnd@arndb.de'" <arnd@arndb.de>,
	"'linux-arch@vger.kernel.org'" <linux-arch@vger.kernel.org>,
	"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
	"'linux@arm.linux.org.uk'" <linux@arm.linux.org.uk>,
	"'linux-arm-kernel@lists.infradead.org'"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [RFC] change non-atomic bitops method
Date: Tue, 3 Feb 2015 12:39:32 +0200
Message-ID: <20150203103932.GA14259@node.dhcp.inet.fi>
In-Reply-To: <20150203011730.GA15653@node.dhcp.inet.fi>

[-- Attachment #1: Type: text/plain, Size: 586 bytes --]

On Tue, Feb 03, 2015 at 03:17:30AM +0200, Kirill A. Shutemov wrote:
> Results for 10 runs on my laptop -- i5-3427U (IvyBridge 1.8 Ghz, 2.8Ghz Turbo
> with 3MB LLC):

I screwed up the inner loop condition and step. As a result, the benchmark
touched the same cache line 8 times and scanned only SIZE/8 of memory. The
fixed test is attached.
Build flags                     Avg (s)   Stddev (s)
baseline                        14.0663   0.0182
-DCHECK_BEFORE_SET              13.8594   0.0458
-DCACHE_HOT                     12.3896   0.0867
-DCACHE_HOT -DCHECK_BEFORE_SET  11.7480   0.2497

And now it's faster *with* the check. Sometimes the CPU is just too clever. ;)
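
The effect of the loop fix can be checked with plain index arithmetic. The
sketch below is not part of the attached test: `count_touches` and
`struct touch_stats` are hypothetical names, and it assumes the buffer is
cache-line aligned and walked as an array of unsigned long.

```c
#include <stddef.h>

#define CACHE_LINE 64

/* Hypothetical helper type: summary of which cache lines a loop touches. */
struct touch_stats {
	unsigned long accesses;	/* total loop iterations */
	unsigned long lines;	/* distinct cache lines touched */
	unsigned long bytes;	/* bytes covered by those cache lines */
};

/*
 * Count accesses and distinct cache lines for a loop of the form
 *	for (i = 0; i < limit; i += step) p[i] = ...;
 * where p points to unsigned long. Assumes step > 0 and that p is
 * cache-line aligned. Indices only grow, so a new line is detected by
 * comparing against the previously seen line number.
 */
static struct touch_stats count_touches(unsigned long limit, unsigned long step)
{
	struct touch_stats s = { 0, 0, 0 };
	unsigned long i, line, last_line = (unsigned long)-1;

	for (i = 0; i < limit; i += step) {
		line = i * sizeof(unsigned long) / CACHE_LINE;
		if (line != last_line) {
			s.lines++;
			last_line = line;
		}
		s.accesses++;
	}
	s.bytes = s.lines * CACHE_LINE;
	return s;
}
```

For the fixed loop, `count_touches(SIZE / sizeof(unsigned long),
CACHE_LINE / sizeof(unsigned long))` gives one access per cache line over the
whole mapping. The broken loop is not shown in this message; one guess that
matches the description on an LP64 system is `count_touches(SIZE / CACHE_LINE, 1)`,
which gives eight accesses per line over SIZE/8 bytes.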

-- 
 Kirill A. Shutemov

[-- Attachment #2: test.c --]
[-- Type: text/plain, Size: 901 bytes --]

#include <stdio.h>
#include <time.h>
#include <sys/mman.h>

#ifdef CACHE_HOT
#define SIZE (2UL << 20)
#define TIMES 100000
#else
#define SIZE (1UL << 30)
#define TIMES 100
#endif

#define CACHE_LINE 64

int main(int argc, char **argv)
{
	struct timespec a, b, diff;
	unsigned long i, *p, times = TIMES;

	p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
			MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &a);
	while (times--) {
		for (i = 0; i < SIZE / sizeof(*p);
				i += CACHE_LINE / sizeof(*p)) {
#ifdef CHECK_BEFORE_SET
			if (p[i] != times)
#endif
				p[i] = times;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &b);

	diff.tv_sec = b.tv_sec - a.tv_sec;
	if (a.tv_nsec > b.tv_nsec) {
		diff.tv_sec--;
		diff.tv_nsec = 1000000000 + b.tv_nsec - a.tv_nsec;
	} else
		diff.tv_nsec = b.tv_nsec - a.tv_nsec;

	printf("%ld.%09ld\n", (long)diff.tv_sec, diff.tv_nsec);
	return 0;
}
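
The elapsed-time calculation at the end of main() borrows a second whenever
the nanosecond field would go negative. As a sketch, the same logic could be
factored into a helper; `timespec_sub` and `NSEC_PER_SEC` are hypothetical
names here, not part of the attached test, and the helper assumes `end` is
not earlier than `start`.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: return end - start as a normalized timespec
 * (0 <= tv_nsec < NSEC_PER_SEC). Assumes end is not earlier than start.
 */
static struct timespec timespec_sub(struct timespec end, struct timespec start)
{
	struct timespec diff;

	diff.tv_sec = end.tv_sec - start.tv_sec;
	diff.tv_nsec = end.tv_nsec - start.tv_nsec;
	if (diff.tv_nsec < 0) {
		/* Borrow one second when the nanosecond field underflows. */
		diff.tv_sec--;
		diff.tv_nsec += NSEC_PER_SEC;
	}
	return diff;
}
```

With such a helper, the tail of main() would reduce to
`diff = timespec_sub(b, a);` followed by the printf.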
