From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Bharata B Rao <bharata.rao@gmail.com>,
Li Zefan <lizf@cn.fujitsu.com>, Ingo Molnar <mingo@elte.hu>,
Paul Menage <menage@google.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] cpuacct: add a branch prediction
Date: Thu, 26 Feb 2009 17:29:15 -0800
Message-ID: <20090227012915.GF6634@linux.vnet.ibm.com>
In-Reply-To: <20090227095856.ef8c1c05.kamezawa.hiroyu@jp.fujitsu.com>

On Fri, Feb 27, 2009 at 09:58:56AM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 26 Feb 2009 08:45:09 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > On Thu, Feb 26, 2009 at 09:06:24PM +0900, KAMEZAWA Hiroyuki wrote:
> > > Peter Zijlstra wrote:
> > > > On Thu, 2009-02-26 at 20:17 +0900, KAMEZAWA Hiroyuki wrote:
> > > >> Peter Zijlstra wrote:
> > > >> > On Thu, 2009-02-26 at 19:28 +0900, KAMEZAWA Hiroyuki wrote:
> > > >> >
> > > >> >> Taking hierarchy mutex while reading will make read-side stable.
> > > >> >
> > > >> > We're talking about scheduling here, taking a mutex to stop scheduling
> > > > >> > won't work, nor will it be acceptable to use anything that will.
> > > >> >
> > > > >> No mutex is necessary, anyway.
> > > > >> The hierarchy-walker function works fine under the RCU read lock,
> > > > >> if a small amount of jitter is allowed.
> > > >
> > > > Right, should be doable -- and looking at the code, we have this
> > > > horrible 32 bit exception in there that locks the rq in order to read
> > > > the 64bit value.
> > > >
> > > > > Would be grand to get rid of that. How bad would it be for userspace to
> > > > > get the occasionally fubarred value?
> > > >
> > > > From a user-support point of view, if a terribly broken value is reported,
> > > > it will become a user incident and annoy me (us) ;)
> > >
> > > > I'd like to get rid of rq->lock, too. Hmm, could some routine like
> > > > atomic64_read() help here? (But I don't want to use atomic_t here.)
> >
> > atomic64_read() will not help you on a 32-bit machine. Here is the
> > sequence of events that will cause the aforementioned user incidents and
> > consequent annoyance:
> >
> > o The value of the counter is (2^32)-1, or 0xffffffff.
> >
> > o CPU 0 reads the high-order 32 bits of the counter, getting zero.
> >
> > o CPU 1 increments the low-order 32 bits of the counter, resulting
> > in zero, but notes that there is a carry out of this field.
> >
> > o CPU 0 reads the low-order 32 bits of the counter, getting zero.
> >
> > o CPU 1 increments the high-order 32 bits of the counter, so that
> > the new value of the counter is 2^32, or 0x100000000.
> >
> > So CPU 0 gets a value that is -way- off.
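
The interleaving above can be replayed deterministically in a single
thread. This is only a sketch and not code from the thread: the struct
torn_counter layout and the replay_torn_read() helper are hypothetical
names made up for illustration.

```c
#include <stdint.h>

/* Hypothetical layout: one 64-bit counter kept as two 32-bit halves. */
struct torn_counter {
	uint32_t lo;
	uint32_t hi;
};

/*
 * Replay the racy interleaving in program order and return the value
 * that the reader ("CPU 0") assembles from its two separate loads.
 */
static uint64_t replay_torn_read(struct torn_counter *c)
{
	uint32_t seen_hi, seen_lo;

	seen_hi = c->hi;	/* CPU 0 reads the high-order half */
	c->lo += 1;		/* CPU 1 increments the low-order half */
	seen_lo = c->lo;	/* CPU 0 reads the low-order half */
	if (c->lo == 0)
		c->hi += 1;	/* CPU 1 propagates the carry, too late */

	return ((uint64_t)seen_hi << 32) | seen_lo;
}
```

Starting from lo = 0xffffffff and hi = 0, the reader assembles 0, although
the counter was 0xffffffff before the update and is 0x100000000 after it.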
> >
> > The usual trick is something like the following for counter read:
> >
> > 1. Read the high-order 32 bits of the counter.
> >
> > 2. Do a memory barrier, smp_mb().
> >
> > 3. Read the low-order 32 bits of the counter.
> >
> > 4. Do another memory barrier, again smp_mb().
> >
> > 5. Read the high-order 32 bits of the counter again.
> >
> > If it is the same as the value obtained in step 1 (or the previous
> > execution of step 5), then we are done. (This works even in case
> > of complete 64-bit overflow, though we should be very lucky to
> > live that long!) Otherwise, go to step 2.
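
The five read-side steps above might look roughly like the following in
userspace C11. This is a sketch under assumptions: struct split_ctr and
split_ctr_read() are hypothetical names, and a sequentially consistent
atomic_thread_fence() stands in for the kernel's smp_mb(). As conceded
later in this message, the read loop is only sufficient if the update
side is effectively atomic.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: one 64-bit counter kept as two 32-bit halves. */
struct split_ctr {
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

static uint64_t split_ctr_read(struct split_ctr *c)
{
	uint32_t hi, lo, hi_again;

	/* Step 1: read the high-order half. */
	hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
	for (;;) {
		/* Step 2: full barrier (stand-in for smp_mb()). */
		atomic_thread_fence(memory_order_seq_cst);
		/* Step 3: read the low-order half. */
		lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
		/* Step 4: another full barrier. */
		atomic_thread_fence(memory_order_seq_cst);
		/* Step 5: read the high-order half again. */
		hi_again = atomic_load_explicit(&c->hi, memory_order_relaxed);
		if (hi_again == hi)
			return ((uint64_t)hi << 32) | lo;
		/* The high half moved: retry from step 2 with the new value. */
		hi = hi_again;
	}
}
```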
> >
> > But it is also necessary to modify the counter update:
> >
> > 1. Increment the low-order 32 bits of the counter. If no overflow
> > occurred, we are done, otherwise, continue through this sequence
> > of steps.
> >
> > 2. Do a memory barrier, smp_mb().
> >
> > 3. Increment the high-order 32 bits of the counter.
> >
> > How to detect overflow in step 1? Well, if we are incrementing, we can
> > just test for the new value being zero. Otherwise, if we are adding
> > a 32-bit number, if the new value of the low-order 32 bits of counter
> > is less than the old value, overflow occurred (but make sure that the
> > comparison is unsigned!).
> >
> > This all assumes that you are adding a 32-bit quantity to the counter.
> > Adding 64-bit values is not much harder.
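
The update steps and the unsigned carry test described above could be
sketched as follows in userspace C11. The names struct carry_ctr and
carry_ctr_add() are hypothetical, a single writer is assumed, and the
fence again stands in for smp_mb(). It only illustrates the carry
detection: between the two stores a reader can still see the low half
wrapped without the high half bumped, which is the problem raised in the
reply below.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: one 64-bit counter kept as two 32-bit halves. */
struct carry_ctr {
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/*
 * Add a 32-bit delta, assuming exactly one writer.
 * Returns 1 if a carry was propagated into the high half, else 0.
 */
static int carry_ctr_add(struct carry_ctr *c, uint32_t delta)
{
	uint32_t old_lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
	uint32_t new_lo = old_lo + delta;	/* wraps modulo 2^32 */

	/* Step 1: update the low-order half. */
	atomic_store_explicit(&c->lo, new_lo, memory_order_relaxed);

	/* Unsigned comparison: the sum wrapped only if it got smaller. */
	if (new_lo >= old_lo)
		return 0;

	/* Step 2: full barrier (stand-in for smp_mb()). */
	atomic_thread_fence(memory_order_seq_cst);

	/* Step 3: propagate the carry into the high-order half. */
	atomic_fetch_add_explicit(&c->hi, 1, memory_order_relaxed);
	return 1;
}
```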
> >
> > Does this approach work for you?
> >
>
> Thank you. I'll try some and post if it seems easy to read/merge.
> Hmm, but in your approach, can't we see the counter go backward?
> (If the reader sees only the low 32 bits incremented.)

Ouch, indeed! The update would need to be atomic for my approach to
work. My apologies for my confusion!

> Can't we use seq_counter in include/linux/seqlock.h ?
> There is only one writer and we don't need a write-side lock.

Yes, seqlock should work fine, good point!
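
For illustration only, here is a userspace C11 sketch of the
sequence-counter idea with a single writer and no write-side lock. It is
not the kernel implementation and not the patch that followed; struct
seq_ctr64, seq_ctr64_add(), and seq_ctr64_read() are hypothetical names.
In the kernel one would presumably use the seqcount helpers from
include/linux/seqlock.h rather than open-coding this.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: 64-bit value in two halves, guarded by a sequence count. */
struct seq_ctr64 {
	_Atomic uint32_t seq;	/* even: stable, odd: update in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Writer side. Exactly one writer is assumed, so no lock is taken. */
static void seq_ctr64_add(struct seq_ctr64 *c, uint64_t delta)
{
	uint32_t seq = atomic_load_explicit(&c->seq, memory_order_relaxed);
	uint64_t v;

	v = (uint64_t)atomic_load_explicit(&c->hi, memory_order_relaxed) << 32;
	v |= atomic_load_explicit(&c->lo, memory_order_relaxed);
	v += delta;

	/* Make the count odd before touching the data. */
	atomic_store_explicit(&c->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&c->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&c->hi, (uint32_t)(v >> 32), memory_order_relaxed);

	/* Make the count even again once both halves are written. */
	atomic_store_explicit(&c->seq, seq + 2, memory_order_release);
}

/* Reader side: retry while an update is in flight or has intervened. */
static uint64_t seq_ctr64_read(struct seq_ctr64 *c)
{
	uint32_t s1, s2, lo, hi;

	do {
		s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
		lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);

	return ((uint64_t)hi << 32) | lo;
}
```

A reader that overlaps an update sees either an odd count or a changed
count and retries, so it never returns a mix of old and new halves and
the reported value does not go backward.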
Thanx, Paul