From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: tglx@linutronix.de
Cc: linux-rt-users@vger.kernel.org
Subject: Quick review of -rt RCU-related patches
Date: Tue, 4 Oct 2011 10:47:56 -0700 [thread overview]
Message-ID: <20111004174755.GA4762@linux.vnet.ibm.com> (raw)
Hello, Thomas,
A quick review of the RCU-related -rt patches, along with a list of
RCU bug fixes on their way to mainline. One of these is not yet in
-tip, so the patch is included.
No smoking guns, at least unless the system is being hammered by a
network-based DoS attack.
Thanx, Paul
------------------------------------------------------------------------
fs-add-missing-rcu-protection.patch
Just adds a rcu_read_lock()/rcu_read_unlock() pair.
Should be inoffensive.
rcu-reduce-lock-section.patch
Prevents a wakeup within an irq-disabled raw spinlock.
But which wakeup?
o rcu_report_exp_rnp() is only permitted to do a wakeup
if called with rrupts enabled.
o All calls enable wakeups -except- for the call from
sync_rcu_preempt_exp_init(), but in that case, there
is nothing to wake up -- it is in fact running doing
the initialization.
In theory, this should be OK. If it was broken, the symptom
would be an expedited grace period blocking forever.
signal-fix-up-rcu-wreckage.patch
Makes __lock_task_sighand() do local_irq_save_nort() instead
of local_irq_save(). The RCU implementation should be OK with
this. The signal implementation might or might not.
sched-might-sleep-do-not-account-rcu-depth.patch
Yow!!! For CONFIG_PREEMPT_RT_FULL, this redefines
sched_rcu_preempt_depth() to always return zero.
This means that might_sleep() will never complain about
blocking in an RCU read-side critical section. I guess that
this is necessary, though it would be better to have some
way to complain for general sleeping (e.g., waiting on
network receive) as opposed to blocking on a lock that is
subject to priority inheritance.
rcu-force-preempt-rcu-for-rt.patch
This one should be obsoleted by commit 8008e129dc9, which is
queued in -tip, hopefully for 3.2.
Except for the RT_GROUP_SCHED change, which is unrelated.
rcu-disable-the-rcu-bh-stuff-for-rt.patch
This implements RCU-bh in terms of RCU-preempt, but disables
BH around the resulting RCU-preempt read-side critical section.
May be vulnerable to network-based denial-of-service attacks,
which could OOM a system with this patch.
The implementation of rcu_read_lock_bh_held() is weak, but OK. In
an ideal world, it would also complain if not local_bh_disable().
_paul_e__mckenney_-eliminate_-_rcu_boosted.patch
Looks fine, but I might not be the best reviewer for this one.
peter_zijlstra-frob-rcu.patch
Looks OK. Hmmm... Should this one go to mainline?
Oh, looks equivalent, actually. So why the change?
Possible fixes from the 3.2-bound RCU commits in -tip:
7eb4f455 (Make rcu_implicit_dynticks_qs() locals be correct size)
Symptoms: misbehavior on 32-bit systems after a given CPU went
through about 2^30 dyntick-idle transitions. You would
have to work pretty hard to get this one to happen.
5c51dd73 (Prevent early boot set_need_resched() from __rcu_pending())
Symptoms: boot-time hangs for rare configurations.
037067a1 (Prohibit grace periods during early boot)
Symptoms: boot-time hangs for rare configurations.
06ae115a (Avoid having just-onlined CPU resched itself when RCU is idle)
Symptoms: boot-time hangs for rare configurations.
5b61b0ba (Wire up RCU_BOOST_PRIO for rcutree)
Symptoms: The system uses RT priority 1 for boosting regardless
of the value of CONFIG_RCU_BOOST_PRIO.
e90c53d3 (Remove rcu_needs_cpu_flush() to avoid false quiescent states)
Symptoms: Too-short grace periods for RCU_FAST_NO_HZ=y.
But simpler to set RCU_FAST_NO_HZ=n.
And the following is a patch to a theoretical problem with expedited
grace periods.
------------------------------------------------------------------------
rcu: Avoid RCU-preempt expedited grace-period botch
Because rcu_read_unlock_special() samples rcu_preempted_readers_exp(rnp)
after dropping rnp->lock, the following sequence of events is possible:
1. Task A exits its RCU read-side critical section, and removes
itself from the ->blkd_tasks list, releases rnp->lock, and is
then preempted. Task B remains on the ->blkd_tasks list, and
blocks the current expedited grace period.
2. Task B exits from its RCU read-side critical section and removes
itself from the ->blkd_tasks list. Because it is the last task
blocking the current expedited grace period, it ends that
expedited grace period.
3. Task A resumes, and samples rcu_preempted_readers_exp(rnp) which
of course indicates that nothing is blocking the nonexistent
expedited grace period. Task A is again preempted.
4. Some other CPU starts an expedited grace period. There are several
tasks blocking this expedited grace period queued on the
same rcu_node structure that Task A was using in step 1 above.
5. Task A examines its state and incorrectly concludes that it was
the last task blocking the expedited grace period on the current
rcu_node structure. It therefore reports completion up the
rcu_node tree.
6. The expedited grace period can then incorrectly complete before
the tasks blocked on this same rcu_node structure exit their
RCU read-side critical sections. Arbitrarily bad things happen.
This commit therefore takes a snapshot of rcu_preempted_readers_exp(rnp)
prior to dropping the lock, so that only the last task thinks that it is
the last task, thus avoiding the failure scenario laid out above.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 4b9b9f8..7986053 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -312,6 +312,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
{
int empty;
int empty_exp;
+ int empty_exp_now;
unsigned long flags;
struct list_head *np;
#ifdef CONFIG_RCU_BOOST
@@ -382,8 +383,10 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
/*
* If this was the last task on the current list, and if
* we aren't waiting on any CPUs, report the quiescent state.
- * Note that rcu_report_unblock_qs_rnp() releases rnp->lock.
+ * Note that rcu_report_unblock_qs_rnp() releases rnp->lock,
+ * so we must take a snapshot of the expedited state.
*/
+ empty_exp_now = !rcu_preempted_readers_exp(rnp);
if (!empty && !rcu_preempt_blocked_readers_cgp(rnp)) {
trace_rcu_quiescent_state_report("preempt_rcu",
rnp->gpnum,
@@ -406,7 +409,7 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
* If this was the last task on the expedited lists,
* then we need to report up the rcu_node hierarchy.
*/
- if (!empty_exp && !rcu_preempted_readers_exp(rnp))
+ if (!empty_exp && empty_exp_now)
rcu_report_exp_rnp(&rcu_preempt_state, rnp);
} else {
local_irq_restore(flags);
next reply other threads:[~2011-10-04 17:58 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-10-04 17:47 Paul E. McKenney [this message]
2011-10-04 22:05 ` Quick review of -rt RCU-related patches Thomas Gleixner
2011-10-04 22:12 ` Peter Zijlstra
2011-10-04 23:15 ` Paul E. McKenney
2011-10-04 22:13 ` Peter Zijlstra
2011-10-04 23:15 ` Paul E. McKenney
2011-10-04 23:27 ` Thomas Gleixner
2011-10-04 23:56 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111004174755.GA4762@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=linux-rt-users@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).