From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Akira Yokosawa <akiyks@gmail.com>
Cc: perfbook@vger.kernel.org
Subject: Re: [RFC PATCH] count_lim_sig: Add pair of smp_wmb() and smp_rmb()
Date: Thu, 18 Oct 2018 08:15:19 -0700
Message-ID: <20181018151519.GI2674@linux.ibm.com>
In-Reply-To: <5796b290-9c04-8e33-703a-b823cedd16c0@gmail.com>

On Thu, Oct 18, 2018 at 10:03:56PM +0900, Akira Yokosawa wrote:
> On 2018/10/17 17:37:39 -0700, Paul E. McKenney wrote:
> > On Thu, Oct 18, 2018 at 07:21:38AM +0900, Akira Yokosawa wrote:
> >> On 2018/10/17 08:10:52 -0700, Paul E. McKenney wrote:
> >>> On Tue, Oct 16, 2018 at 08:04:00AM +0900, Akira Yokosawa wrote:
> >>>> From 7b01fc0f19cfa010536d7eb53e4d0cda1e6b801f Mon Sep 17 00:00:00 2001
> >>>> From: Akira Yokosawa <akiyks@gmail.com>
> >>>> Date: Mon, 15 Oct 2018 23:46:52 +0900
> >>>> Subject: RFC [PATCH] count_lim_sig: Add pair of smp_wmb() and smp_rmb()
> >>>>
> >>>> This message-passing pattern requires smp_wmb()--smp_rmb() pairing.
> >>>>
> >>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> >>>> ---
> >>>> Hi Paul,
> >>>>
> >>>> I'm not sure this addition of memory barriers is actually required,
> >>>> but it does look like so.
> >>>>
> >>>> And I'm aware that you have avoided using weaker memory barriers in
> >>>> CodeSamples.
> >>>>
> >>>> Thoughts?
> >>>
> >>> Hello, Akira,
> >>>
> >>> I might be missing something, but it looks to me like this ordering is
> >>> covered by heavyweight ordering in the signal handler entry/exit and
> >>> the gblcnt_mutex. So what sequence of events leads to the failure
> >>> scenario that you are seeing?
> >>
> >> So the fastpaths in add_count() and sub_count() are not protected by
> >> gblcnt_mutex. The slowpath in flush_local_count() waits for the
> >> transition of theft from REQ to READY, clears counter and countermax,
> >> and finally assigns IDLE to theft.
> >>
> >> So, the fastpaths can see (theft == IDLE) but see a non-zero value of
> >> counter or countermax, can't they?
> >
> > Maybe, maybe not. Please lay out a sequence of events showing a problem,
> > as in load by load, store by store, line by line. Intuition isn't as
> > helpful as one might like for this kind of stuff. ;-)
>
> Gotcha!
>
> I've not exhausted the timing variations, but now I see that when
> flush_local_count() sees (*theftp[t] == THEFT_READY), its counterpart in
> add_count() or sub_count() has already exited the fastpath (the region
> marked by counting == 1).
>
> So the race I imagined has never existed.

I know that feeling!!!

> Thanks for your nice suggestion!

Well, there might well be another race. My main concern is whether or not
signal-handler entry/exit really provides full ordering on all platforms.

Thoughts?

							Thanx, Paul

> >> One theory for why this cannot happen is that all the per-thread
> >> variables of a thread reside in a single cache line, so if the fastpaths
> >> see the updated value of theft, they are guaranteed to see the latest
> >> values of both counter and countermax.
> >
> > Good point, but we need to avoid that sort of assumption unless we
> > placed the variables into a struct and told the compiler to align it
> > appropriately. And even then, hardware architectures normally don't
> > make this sort of guarantee. There is too much that can go wrong, from
> > ECC errors to interrupts at just the wrong time, and much else besides.
>
> Absolutely!
>
> Thanks, Akira
>
> >
> > Thanx, Paul
> >
> >> I might be completely missing something, though.
> >>
> >> Thanks, Akira
> >>
> >>>
> >>> Thanx, Paul
> >>>
> >>>> Thanks, Akira
> >>>> --
> >>>> CodeSamples/arch-arm/arch-arm.h | 2 ++
> >>>> CodeSamples/arch-arm64/arch-arm64.h | 2 ++
> >>>> CodeSamples/arch-ppc64/arch-ppc64.h | 2 ++
> >>>> CodeSamples/arch-x86/arch-x86.h | 2 ++
> >>>> CodeSamples/count/count_lim_sig.c | 21 +++++++++++++--------
> >>>> 5 files changed, 21 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/CodeSamples/arch-arm/arch-arm.h b/CodeSamples/arch-arm/arch-arm.h
> >>>> index 065c6f1..6f0707b 100644
> >>>> --- a/CodeSamples/arch-arm/arch-arm.h
> >>>> +++ b/CodeSamples/arch-arm/arch-arm.h
> >>>> @@ -41,6 +41,8 @@
> >>>> /* __sync_synchronize() is broken before gcc 4.4.1 on many ARM systems. */
> >>>> #define smp_mb() __asm__ __volatile__("dmb" : : : "memory")
> >>>>
> >>>> +#define smp_rmb() __asm__ __volatile__("dmb ish" : : : "memory")
> >>>> +#define smp_wmb() __asm__ __volatile__("dmb ishst" : : : "memory")
> >>>>
> >>>> #include <stdlib.h>
> >>>> #include <sys/time.h>
> >>>> diff --git a/CodeSamples/arch-arm64/arch-arm64.h b/CodeSamples/arch-arm64/arch-arm64.h
> >>>> index 354f1f2..a6ccf33 100644
> >>>> --- a/CodeSamples/arch-arm64/arch-arm64.h
> >>>> +++ b/CodeSamples/arch-arm64/arch-arm64.h
> >>>> @@ -41,6 +41,8 @@
> >>>> /* __sync_synchronize() is broken before gcc 4.4.1 on many ARM systems. */
> >>>> #define smp_mb() __asm__ __volatile__("dmb ish" : : : "memory")
> >>>>
> >>>> +#define smp_rmb() __asm__ __volatile__("dmb ishld" : : : "memory")
> >>>> +#define smp_wmb() __asm__ __volatile__("dmb ishst" : : : "memory")
> >>>>
> >>>> #include <stdlib.h>
> >>>> #include <time.h>
> >>>> diff --git a/CodeSamples/arch-ppc64/arch-ppc64.h b/CodeSamples/arch-ppc64/arch-ppc64.h
> >>>> index 7b0b025..2d6a2b5 100644
> >>>> --- a/CodeSamples/arch-ppc64/arch-ppc64.h
> >>>> +++ b/CodeSamples/arch-ppc64/arch-ppc64.h
> >>>> @@ -42,6 +42,8 @@
> >>>>
> >>>> #define smp_mb() __asm__ __volatile__("sync" : : : "memory")
> >>>>
> >>>> +#define smp_rmb() __asm__ __volatile__("lwsync" : : : "memory")
> >>>> +#define smp_wmb() __asm__ __volatile__("lwsync" : : : "memory")
> >>>>
> >>>> /*
> >>>> * Generate 64-bit timestamp.
> >>>> diff --git a/CodeSamples/arch-x86/arch-x86.h b/CodeSamples/arch-x86/arch-x86.h
> >>>> index 9ea97ca..2765bfc 100644
> >>>> --- a/CodeSamples/arch-x86/arch-x86.h
> >>>> +++ b/CodeSamples/arch-x86/arch-x86.h
> >>>> @@ -52,6 +52,8 @@ __asm__ __volatile__(LOCK_PREFIX "orl %0,%1" \
> >>>> __asm__ __volatile__("mfence" : : : "memory")
> >>>> /* __asm__ __volatile__("lock; addl $0,0(%%esp)" : : : "memory") */
> >>>>
> >>>> +#define smp_rmb() barrier()
> >>>> +#define smp_wmb() barrier()
> >>>>
> >>>> /*
> >>>> * Generate 64-bit timestamp.
> >>>> diff --git a/CodeSamples/count/count_lim_sig.c b/CodeSamples/count/count_lim_sig.c
> >>>> index c316426..26a2a76 100644
> >>>> --- a/CodeSamples/count/count_lim_sig.c
> >>>> +++ b/CodeSamples/count/count_lim_sig.c
> >>>> @@ -89,6 +89,7 @@ static void flush_local_count(void) //\lnlbl{flush:b}
> >>>> *counterp[t] = 0;
> >>>> globalreserve -= *countermaxp[t];
> >>>> *countermaxp[t] = 0; //\lnlbl{flush:thiev:e}
> >>>> + smp_wmb(); //\lnlbl{flush:wmb}
> >>>> WRITE_ONCE(*theftp[t], THEFT_IDLE); //\lnlbl{flush:IDLE}
> >>>> } //\lnlbl{flush:loop2:e}
> >>>> } //\lnlbl{flush:e}
> >>>> @@ -115,10 +116,12 @@ int add_count(unsigned long delta) //\lnlbl{b}
> >>>>
> >>>> WRITE_ONCE(counting, 1); //\lnlbl{fast:b}
> >>>> barrier(); //\lnlbl{barrier:1}
> >>>> - if (READ_ONCE(theft) <= THEFT_REQ && //\lnlbl{check:b}
> >>>> - countermax - counter >= delta) { //\lnlbl{check:e}
> >>>> - WRITE_ONCE(counter, counter + delta); //\lnlbl{add:f}
> >>>> - fastpath = 1; //\lnlbl{fasttaken}
> >>>> + if (READ_ONCE(theft) <= THEFT_REQ) { //\lnlbl{check:b}
> >>>> + smp_rmb(); //\lnlbl{rmb}
> >>>> + if (countermax - counter >= delta) { //\lnlbl{check:e}
> >>>> + WRITE_ONCE(counter, counter + delta);//\lnlbl{add:f}
> >>>> + fastpath = 1; //\lnlbl{fasttaken}
> >>>> + }
> >>>> }
> >>>> barrier(); //\lnlbl{barrier:2}
> >>>> WRITE_ONCE(counting, 0); //\lnlbl{clearcnt}
> >>>> @@ -154,10 +157,12 @@ int sub_count(unsigned long delta)
> >>>>
> >>>> WRITE_ONCE(counting, 1);
> >>>> barrier();
> >>>> - if (READ_ONCE(theft) <= THEFT_REQ &&
> >>>> - counter >= delta) {
> >>>> - WRITE_ONCE(counter, counter - delta);
> >>>> - fastpath = 1;
> >>>> + if (READ_ONCE(theft) <= THEFT_REQ) {
> >>>> + smp_rmb();
> >>>> + if (counter >= delta) {
> >>>> + WRITE_ONCE(counter, counter - delta);
> >>>> + fastpath = 1;
> >>>> + }
> >>>> }
> >>>> barrier();
> >>>> WRITE_ONCE(counting, 0);
> >>>> --
> >>>> 2.7.4
> >>>>
> >>>
> >>
> >
>
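
For reference, the message-passing pattern that the quoted patch's commit
log refers to can be sketched in isolation as follows. This is a minimal
illustration, not code from CodeSamples: payload, ready, writer(),
reader(), and run_mp_demo() are hypothetical names, and C11 release and
acquire fences stand in for smp_wmb() and smp_rmb(). They are somewhat
stronger than those barriers, but they pair in the same way. It assumes
POSIX threads are available.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names; none of these appear in count_lim_sig.c. */
static int payload;		/* plain data published by the writer */
static atomic_int ready;	/* flag announcing that payload is valid */

static void *writer(void *arg)
{
	(void)arg;
	payload = 42;					/* store the data */
	atomic_thread_fence(memory_order_release);	/* smp_wmb() role */
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
	return NULL;
}

static void *reader(void *arg)
{
	int *seen = arg;

	while (!atomic_load_explicit(&ready, memory_order_relaxed))
		continue;				/* wait for the flag */
	atomic_thread_fence(memory_order_acquire);	/* smp_rmb() role */
	*seen = payload;				/* load the data */
	return NULL;
}

/* Returns the payload value observed by the reader, or -1 on error. */
int run_mp_demo(void)
{
	pthread_t w, r;
	int seen = -1;

	payload = 0;
	atomic_store(&ready, 0);
	if (pthread_create(&r, NULL, reader, &seen) != 0)
		return -1;
	if (pthread_create(&w, NULL, writer, NULL) != 0) {
		atomic_store(&ready, 1);	/* let the reader exit */
		pthread_join(r, NULL);
		return -1;
	}
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

With both fences in place, a reader that observes ready == 1 also observes
the payload store. Remove either fence and that guarantee is lost on weakly
ordered hardware, which is the concern the patch raises for theft versus
counter and countermax.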
Thread overview: 8+ messages
2018-10-15 23:04 [RFC PATCH] count_lim_sig: Add pair of smp_wmb() and smp_rmb() Akira Yokosawa
2018-10-17 15:10 ` Paul E. McKenney
2018-10-17 22:21 ` Akira Yokosawa
2018-10-18 0:37 ` Paul E. McKenney
2018-10-18 13:03 ` Akira Yokosawa
2018-10-18 15:15 ` Paul E. McKenney [this message]
2018-10-18 22:43 ` Akira Yokosawa
2018-10-19 0:32 ` Paul E. McKenney