From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Wang, Yalin" <Yalin.Wang@sonymobile.com>,
"'arnd@arndb.de'" <arnd@arndb.de>,
"'linux-arch@vger.kernel.org'" <linux-arch@vger.kernel.org>,
"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
"'linux@arm.linux.org.uk'" <linux@arm.linux.org.uk>,
"'linux-arm-kernel@lists.infradead.org'"
<linux-arm-kernel@lists.infradead.org>
Subject: Re: [RFC] change non-atomic bitops method
Date: Tue, 3 Feb 2015 03:17:30 +0200
Message-ID: <20150203011730.GA15653@node.dhcp.inet.fi>
In-Reply-To: <20150202152909.13bfd11f192fb0268b2ab4bf@linux-foundation.org>
On Mon, Feb 02, 2015 at 03:29:09PM -0800, Andrew Morton wrote:
> On Mon, 2 Feb 2015 11:55:03 +0800 "Wang, Yalin" <Yalin.Wang@sonymobile.com> wrote:
>
> > This patch change non-atomic bitops,
> > add a if() condition to test it, before set/clear the bit.
> > so that we don't need dirty the cache line, if this bit
> > have been set or clear. On SMP system, dirty cache line will
> > need invalidate other processors cache line, this will have
> > some impact on SMP systems.
> >
> > --- a/include/asm-generic/bitops/non-atomic.h
> > +++ b/include/asm-generic/bitops/non-atomic.h
> > @@ -17,7 +17,9 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
> >  	unsigned long mask = BIT_MASK(nr);
> >  	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> >
> > -	*p |= mask;
> > +	if ((*p & mask) == 0)
> > +		*p |= mask;
> > +
> >  }
>
> hm, maybe.
>
> It will speed up set_bit on an already-set bit. But it will slow down
> set_bit on a not-set bit. And the latter case is presumably much, much
> more common.
>
> How do we know the patch is a net performance gain?

Let's try to measure. The micro benchmark:

#include <stdio.h>
#include <time.h>
#include <sys/mman.h>

#ifdef CACHE_HOT
#define SIZE (2UL << 20)
#define TIMES 10000000
#else
#define SIZE (1UL << 30)
#define TIMES 10000
#endif

int main(int argc, char **argv)
{
	struct timespec a, b, diff;
	unsigned long i, *p, times = TIMES;

	p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
			MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);

	clock_gettime(CLOCK_MONOTONIC, &a);
	while (times--) {
		for (i = 0; i < SIZE/64/sizeof(*p); i++) {
#ifdef CHECK_BEFORE_SET
			if (p[i] != times)
#endif
				p[i] = times;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &b);

	diff.tv_sec = b.tv_sec - a.tv_sec;
	if (a.tv_nsec > b.tv_nsec) {
		diff.tv_sec--;
		diff.tv_nsec = 1000000000 + b.tv_nsec - a.tv_nsec;
	} else
		diff.tv_nsec = b.tv_nsec - a.tv_nsec;

	printf("%lu.%09lu\n", diff.tv_sec, diff.tv_nsec);
	return 0;
}

Results for 10 runs on my laptop -- i5-3427U (IvyBridge 1.8 GHz, 2.8 GHz Turbo,
with 3MB LLC):

                                  Avg (s)    Stddev
baseline                          21.5351    0.5315
-DCHECK_BEFORE_SET                21.9834    0.0789
-DCACHE_HOT                       14.9987    0.0365
-DCACHE_HOT -DCHECK_BEFORE_SET    29.9010    0.0204

The difference between -DCACHE_HOT and -DCACHE_HOT -DCHECK_BEFORE_SET appears
huge, but if you convert it to CPU cycles per inner-loop iteration @ 2.8 GHz,
it's 1.02530 and 2.04401 CPU cycles respectively.

Basically, the check is free on a decent CPU.
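
For anyone who wants to check the conversion, here is a minimal sketch of the
arithmetic. It assumes a 64-bit unsigned long (so the CACHE_HOT inner loop runs
4096 iterations per pass) and that the CPU stayed at the 2.8 GHz turbo clock
for the whole run. The helper name cycles_per_iter() is made up for
illustration; it is not part of the benchmark.

```c
#include <stdio.h>

/*
 * Convert a measured wall-clock time into CPU cycles per inner-loop
 * iteration. Hypothetical helper, for illustration only.
 *
 *   seconds: average run time reported by the benchmark
 *   hz:      assumed CPU clock (2.8e9 for the turbo frequency)
 *   times:   outer loop count (TIMES in the benchmark)
 *   size:    mapping size in bytes (SIZE in the benchmark)
 */
static double cycles_per_iter(double seconds, double hz,
			      unsigned long times, unsigned long size)
{
	/* Same inner-loop bound as the benchmark: SIZE/64/sizeof(*p). */
	unsigned long iters = size / 64 / sizeof(unsigned long);

	return seconds * hz / ((double)times * (double)iters);
}
```

With the CACHE_HOT parameters, cycles_per_iter(14.9987, 2.8e9, 10000000,
2UL << 20) and the same call with 29.9010 come out at roughly 1.0253 and
2.0440 cycles, which is where the two figures above come from.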
--
Kirill A. Shutemov