From: Ingo Molnar <mingo@kernel.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Andy Lutomirski <luto@amacapital.net>,
Andrew Morton <akpm@linux-foundation.org>,
Denys Vlasenko <dvlasenk@redhat.com>,
Brian Gerst <brgerst@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <Waiman.Long@hp.com>
Subject: Re: [PATCH 02/12] x86/mm/hotplug: Remove pgd_list use from the memory hotplug code
Date: Mon, 15 Jun 2015 22:33:53 +0200
Message-ID: <20150615203353.GB13273@gmail.com>
In-Reply-To: <20150615004030.GK3913@linux.vnet.ibm.com>

* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> On Sun, Jun 14, 2015 at 09:38:25PM +0200, Oleg Nesterov wrote:
> > On 06/14, Oleg Nesterov wrote:
> > >
> > > On 06/14, Ingo Molnar wrote:
> > > >
> > > > * Oleg Nesterov <oleg@redhat.com> wrote:
> > > >
> > > > > > + spin_lock(&pgd_lock); /* Implies rcu_read_lock() for the task list iteration: */
> > > > > ^^^^^^^^^^^^^^^^^^^^^^^
> > > > >
> > > > > Hmm, but it doesn't if PREEMPT_RCU? No, no, I do not pretend I understand how it
> > > > > actually works ;) But, say, rcu_check_callbacks() can be called from irq and
> > > > > since spin_lock() doesn't increment current->rcu_read_lock_nesting this can lead
> > > > > to rcu_preempt_qs()?
> > > >
> > > > No, RCU grace periods are still defined by 'heavy' context boundaries such as
> > > > context switches, entering idle or user-space mode.
> > > >
> > > > PREEMPT_RCU is like traditional RCU, except that blocking is allowed within the
> > > > RCU read critical section - that is why it uses a separate nesting counter
> > > > (current->rcu_read_lock_nesting), not the preempt count.
> > >
> > > Yes.
> > >
> > > > But if a piece of kernel code is non-preemptible, such as a spinlocked region or
> > > > an irqs-off region, then those are still natural RCU read lock regions, regardless
> > > > of the RCU model, and need no additional RCU locking.
> > >
> > > I do not think so. Yes I understand that rcu_preempt_qs() itself doesn't
> > > finish the gp, but if there are no other rcu-read-lock holders then it
> > > seems synchronize_rcu() on another CPU can return _before_ spin_unlock(),
> > > this CPU no longer needs rcu_preempt_note_context_switch().
> > >
> > > OK, I can be easily wrong, I do not really understand the implementation
> > > of PREEMPT_RCU. Perhaps preempt_disable() can actually act as rcu_read_lock()
> > > > with the _current_ implementation. Still this doesn't look right even if it
> > > happens to work, and Documentation/RCU/checklist.txt says:
> > >
> > > 11. Note that synchronize_rcu() -only- guarantees to wait until
> > > all currently executing rcu_read_lock()-protected RCU read-side
> > > critical sections complete. It does -not- necessarily guarantee
> > > that all currently running interrupts, NMIs, preempt_disable()
> > > code, or idle loops will complete. Therefore, if your
> > > read-side critical sections are protected by something other
> > > than rcu_read_lock(), do -not- use synchronize_rcu().
> >
> >
> > I've even checked this ;) I applied the stupid patch below and then
> >
> > $ taskset 2 perl -e 'syscall 157, 666, 5000' &
> > [1] 565
> >
> > $ taskset 1 perl -e 'syscall 157, 777'
> >
> > $
> > [1]+ Done taskset 2 perl -e 'syscall 157, 666, 5000'
> >
> > $ dmesg -c
> > SPIN start
> > SYNC start
> > SYNC done!
> > SPIN done!
>
> Please accept my apologies for my late entry to this thread.
> Youngest kid graduated from university this weekend, so my
> attention has been elsewhere.

Congratulations! :-)

> If you were to disable interrupts instead of preemption, I would expect
> that the preemptible-RCU grace period would be blocked -- though I am
> not particularly comfortable with people relying on disabled interrupts
> blocking a preemptible-RCU grace period.
>
> Here is what can happen if you try to block a preemptible-RCU grace
> period by disabling preemption, assuming that there are at least two
> online CPUs in the system:
>
> 1. CPU 0 does spin_lock(), which disables preemption.
>
> 2. CPU 1 starts a grace period.
>
> 3. CPU 0 takes a scheduling-clock interrupt. It raises softirq,
> and the RCU_SOFTIRQ handler notes that there is a new grace
> period and sets state so that a subsequent quiescent state on
> this CPU will be noted.
>
> 4. CPU 0 takes another scheduling-clock interrupt, which checks
> current->rcu_read_lock_nesting, and notes that there is no
> preemptible-RCU read-side critical section in progress. It
> again raises softirq, and the RCU_SOFTIRQ handler reports
> the quiescent state to core RCU.
>
> 5. Once each of the other CPUs reports a quiescent state, the
> grace period can end, despite CPU 0 having preemption
> disabled the whole time.
>
> So Oleg's test is correct: disabling preemption is not sufficient
> to block a preemptible-RCU grace period.
I stand corrected!
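
To make the steps above concrete, here is a rough userspace toy model (not kernel code; every toy_* name is made up for illustration). It assumes only what the scenario describes: the scheduling-clock tick looks at the RCU nesting counter and not at the preempt count, so a spinlocked region on its own does not hold off the quiescent state.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace toy model only; these are hypothetical stand-ins, not kernel API. */
struct toy_task {
	int preempt_count;          /* raised by the spin_lock() stand-in */
	int rcu_read_lock_nesting;  /* raised only by the rcu_read_lock() stand-in */
};

static void toy_spin_lock(struct toy_task *t)
{
	t->preempt_count++;         /* disables "preemption", nothing else */
}

static void toy_spin_unlock(struct toy_task *t)
{
	assert(t->preempt_count > 0);
	t->preempt_count--;
}

static void toy_rcu_read_lock(struct toy_task *t)
{
	t->rcu_read_lock_nesting++;
}

static void toy_rcu_read_unlock(struct toy_task *t)
{
	assert(t->rcu_read_lock_nesting > 0);
	t->rcu_read_lock_nesting--;
}

/*
 * Stand-in for the check in step 4: the tick consults only the RCU
 * nesting counter. The preempt count is deliberately ignored here.
 */
static bool toy_tick_reports_qs(const struct toy_task *t)
{
	return t->rcu_read_lock_nesting == 0;
}
```

In this model a task that has only called toy_spin_lock() still reports a quiescent state at the tick, while one that has also called toy_rcu_read_lock() does not.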
> The usual suggestion would be to add rcu_read_lock() just after the lock is
> acquired and rcu_read_unlock() just before each release of that same lock.
Will fix it that way.
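
A minimal sketch of that ordering, as a userspace toy with made-up toy_* stand-ins rather than the actual patch: the RCU read-side section is opened right after the lock is taken and closed right before the lock is dropped, so the whole walk runs with a non-zero nesting count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is kernel API. */
static bool toy_lock_held;      /* models pgd_lock being held */
static int toy_rcu_nesting;     /* models current->rcu_read_lock_nesting */

static void toy_spin_lock(void)
{
	assert(!toy_lock_held);
	toy_lock_held = true;
}

static void toy_spin_unlock(void)
{
	assert(toy_lock_held);
	toy_lock_held = false;
}

static void toy_rcu_read_lock(void)
{
	toy_rcu_nesting++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_nesting > 0);
	toy_rcu_nesting--;
}

/* Walks ntasks pretend tasks and returns how many were visited. */
static int toy_walk_tasks(int ntasks)
{
	int visited = 0;

	toy_spin_lock();
	toy_rcu_read_lock();            /* just after the lock is acquired */

	for (int i = 0; i < ntasks; i++) {
		/* Every step of the walk is inside the RCU read-side section. */
		assert(toy_lock_held && toy_rcu_nesting > 0);
		visited++;
	}

	toy_rcu_read_unlock();          /* just before the lock is released */
	toy_spin_unlock();
	return visited;
}
```

If the function had several exit paths, each one would need its own toy_rcu_read_unlock() ahead of the unlock, which is the "each release" part of the suggestion.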
Thanks,
Ingo