linux-kernel.vger.kernel.org archive mirror
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>, dipankar <dipankar@in.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	rostedt <rostedt@goodmis.org>,
	David Howells <dhowells@redhat.com>,
	Eric Dumazet <edumazet@google.com>,
	dvhart <dvhart@linux.intel.com>, fweisbec <fweisbec@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	bobby prani <bobby.prani@gmail.com>, ldr709 <ldr709@gmail.com>
Subject: Re: [PATCH RFC tip/core/rcu] SRCU rewrite
Date: Thu, 17 Nov 2016 15:38:08 +0000 (UTC)
Message-ID: <1452112.5481.1479397088449.JavaMail.zimbra@efficios.com>
In-Reply-To: <1189271890.5446.1479396692585.JavaMail.zimbra@efficios.com>

----- On Nov 17, 2016, at 10:31 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Nov 17, 2016, at 10:07 AM, Lai Jiangshan jiangshanlai@gmail.com wrote:
> 
>> On Thu, Nov 17, 2016 at 10:31 PM, Boqun Feng <boqun.feng@gmail.com> wrote:
>>> On Thu, Nov 17, 2016 at 08:18:51PM +0800, Lai Jiangshan wrote:
>>>> On Tue, Nov 15, 2016 at 10:37 PM, Paul E. McKenney
>>>> <paulmck@linux.vnet.ibm.com> wrote:
>>>> > On Tue, Nov 15, 2016 at 09:44:45AM +0800, Boqun Feng wrote:
>>>>
>>>> >>
>>>> >> __srcu_read_lock() used to be called with preemption disabled. I guess
>>>> >> the reason was because we have two percpu variables to increase. So with
>>>> >> only one percpu right, could we remove the preempt_{dis,en}able() in
>>>> >> srcu_read_lock() and use this_cpu_inc() here?
>>>> >
>>>> > Quite possibly...
>>>> >
>>>>
>>>
>>> Hello, Lai ;-)
>>>
>>>> it will be nicer if it is removed.
>>>>
>>>> The reason for disabling preemption was also that we
>>>> have to disallow any preemption between the fetching of the idx
>>>> and the increment, so that we have at most NR_CPUS worth
>>>> of readers using the old index that haven't incremented the counters.
>>>>
>>>
>>> After reading the comment for a while, I actually have a question, maybe
>>> I am missing something ;-)
>>>
>>> Why would "at most NR_CPUS worth of readers using the old index that
>>> haven't incremented the counters" save us from overflowing the counter?
>>>
>>> Please consider the following case in current implementation:
>>>
>>>
>>> {sp->completed = 0} so idx = 1 in srcu_advance_batches(...)
>>>
>>> one thread A is currently in __srcu_read_lock() and using idx = 1 and
>>> about to increase the percpu c[idx], and ULONG_MAX __srcu_read_lock()s
>>> have been called and returned with idx = 1, please note I think this is
>>> possible because I assume we may have some code like this:
>>>
>>>         unsigned long i = 0;
>>>         for (; i < ULONG_MAX; i++)
>>>                 srcu_read_lock(); // return the same idx 1;
>> 
>> this is the wrong usage of the api.
>> 
>> 
>> you might rewrite it as:
>> 
>>        unsigned long index[2] = {0, 0};
>>        unsigned long i = 0;
>>        for (;  index[1] < ULONG_MAX; i++)
>>                 index[srcu_read_lock()]++;
>> 
>> 
>> I think we should add documentation to disallow this kind of usage.
>> a reader should consume at least 4 bytes of memory.
>> 
> 
> (the analysis below refers to the rewritten SRCU scheme)
> 
> Let's presume we use the API correctly, as you describe (saving
> the returned index of srcu_read_lock() somewhere).
> 
> So for the sake of argument, we can either call srcu_read_lock
> in a loop (during which we can be migrated), or call it
> concurrently from various threads. The effect in terms of
> overflow is pretty much the same.
> 
> What is done here is incrementing per-cpu split-counters. In
> the worst-case scenario, let's assume we're incrementing those
> counters for a single index (0 or 1).
> 
> If we think about this system-wide, we don't really care about
> the overflow of a single CPU counter, because what matters is
> the difference between the overall nr_lock - nr_unlock counts
> for a given index, once summed up by synchronize_srcu().
> 
> So the only situation that could lead to an overflow that matters
> is if synchronize_srcu() sees ULONG_MAX more increments of nr_lock
> than the observed number of nr_unlock increments.
> 
> So the bound is not only about the number of concurrent SRCU
> readers, but also about the number of SRCU readers that may
> appear between the moment synchronize_srcu() reads the nr_unlock
> per-cpu counters and the moment it reads the nr_lock counters.
> 
> This maximum bound of ULONG_MAX - 1 therefore applies to the
> sum of:
> - number of concurrent SRCU read-side critical sections active
>  at the same time,
> - number of SRCU read-side critical sections beginning after
>  synchronize_srcu() has read the nr_unlock counters, before
>  it reads the nr_lock counters.

Now that I think of it, since we flip the period before summing
the nr_unlock counters, no new readers can appear within the
target period while synchronize_srcu() executes. So it ends up
being a limit on the number of SRCU read-side critical sections
active at the same time (you can scratch the second bullet above).

Thanks,

Mathieu



> You guys seem to see cases that would require a lower max nr
> reader bound, but I'm afraid I don't quite understand them.
> 
> Thanks,
> 
> Mathieu
> 
> 
>>>
>>> And none of the corresponding srcu_read_unlock() has been called;
>>>
>>> In this case, at the time thread A increases the percpu c[idx], that
>>> will result in an overflow, right? So even one reader using old idx will
>>> result in overflow.
>>>
>>>
>>> I think the reason we won't be hit by overflow is not that we have few
>>> readers using the old idx; it's that ULONG_MAX + 1 calls to
>>> __srcu_read_lock() for the same idx are unlikely, right? And the reason
>>> for this is more complex: we won't have that many threads in
>>> the system, no thread will nest srcu many levels deep, and there
>>> won't be a lot of readers using the old idx.
>>>
>>> And this will still be true if we use the new mechanism and shrink the
>>> preemption-disabled section, right?
>>>
>>> Regards,
>>> Boqun
>>>
>>>> if we remove the preempt_{dis,en}able(), we must change the
>>>> "NR_CPUS" in the comment into ULONG_MAX/4. (I assume
>>>> one on-going reader needs at least 4 bytes on the stack.) It is still safe.
>>>>
>>>> But we still need to think more about whether we want to remove the preempt_{dis,en}able().
>>>>
>>>> Thanks
>>>> Lai
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
