From: Frederic Weisbecker <fweisbec@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Jones <davej@redhat.com>,
Paul McKenney <paulmck@linux.vnet.ibm.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Richard Guy Briggs <rgb@redhat.com>,
Eric Paris <eparis@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: audit: rcu_read_lock() used illegally while idle
Date: Thu, 4 Dec 2014 00:49:24 +0100
Message-ID: <20141203234922.GD31369@lerouge>
In-Reply-To: <CALCETrU321dBwOgeQOO9aao4D-exmFHBFSoWKJJ6yoCgufofzQ@mail.gmail.com>
On Wed, Dec 03, 2014 at 02:12:43PM -0800, Andy Lutomirski wrote:
> On Wed, Dec 3, 2014 at 2:08 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > I don't know. It's possible that something went wrong with the recent entry_64.S
> > and ptrace.c rework.
> >
> > Previously we expected to set context tracking to user state from syscall_trace_exit()
> > and to kernel state from syscall_trace_enter(). And if anything using RCU
> > was called between syscall_trace_exit() and the actual return to userspace, the code
> > had to be wrapped between user_exit() *code* user_enter().
> >
> > So it looked like this:
> >
> >
> > syscall {
> >     //enter kernel
> >     syscall_trace_enter() {
> >         user_exit();
> >     }
> >
> >     syscall()
> >
> >     syscall_trace_enter() {
>
> Do you mean syscall_trace_leave()? But syscall_trace_leave isn't called here...
Right :-)
>
> >         user_enter();
> >     }
> >
> >     while (test_thread_flag(TIF_EXIT_WORK)) {
> >         if (need_resched()) {
> >             schedule_user() {
> >                 user_exit();
> >                 schedule()
> >                 user_enter();
> >             }
> >         }
> >
> >         if ( need signal ) {
> >             do_notify_resume() {
> >                 user_exit()
> >                 handle signal and stuff
> >                 user_enter()
> >             }
> >         }
>
> ... it's called hereabouts or so.
>
> >     }
> > }
> >
> > This is suboptimal but it doesn't impact the syscall fastpath
> > and it's correct from cputime accounting and RCU point of views.
> >
> > Now maybe the recent logic rework broke the above assumptions?
>
> The big rework was entry, not exit, so I don't see the issue.
And you're right actually :-) I just rewound to the time when I added
SCHEDULE_USER, and things actually happen a bit differently than I thought.
It looks like things haven't changed much since then:
syscall {
    //enter kernel
    syscall_trace_enter() {
        user_exit();
    }
    syscall()
    while (test_thread_flag(TIF_ALLWORK_MASK)) {
        if (need_resched()) {
            schedule_user() {
                user_exit();
                schedule()
                user_enter();
            }
        } else {
            if (test_thread_flag(TIF_WORK_SYSCALL_EXIT)) {
                syscall_trace_leave() {
                    user_enter();
                }
            } else if (test_thread_flag(TIF_DO_NOTIFY_MASK)) {
                do_notify_resume() {
                    user_exit()
                    handle signal and stuff
                    user_enter()
                }
            } else {
                //ignored but unexpected, should we warn?
            }
        }
    }
}
So schedule_user() may well be called before syscall_trace_leave() after all.
This means that schedule_user() can call user_exit() when we are already in
the kernel from the context tracking POV. Hence we have a context tracking imbalance,
or a double call to user_exit() if you prefer.
Things probably happened to work because double user_foo() calls are simply ignored,
and we've been lucky enough that it didn't explode in most scenarios.
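To make the imbalance concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code: struct ctx_model and the model_*/replay_* names are made up for illustration, and the only behavior assumed is the one described above, namely that a redundant user_enter()/user_exit() call is silently ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model of context tracking; NOT the kernel implementation. */
enum ctx_state { CTX_KERNEL, CTX_USER };

struct ctx_model {
	enum ctx_state state;   /* what context tracking believes */
	int ignored;            /* redundant user_enter()/user_exit() calls */
};

static void model_user_exit(struct ctx_model *m)
{
	assert(m != NULL);
	if (m->state == CTX_KERNEL) {
		m->ignored++;   /* double call: silently ignored */
		return;
	}
	m->state = CTX_KERNEL;
}

static void model_user_enter(struct ctx_model *m)
{
	assert(m != NULL);
	if (m->state == CTX_USER) {
		m->ignored++;   /* double call: silently ignored */
		return;
	}
	m->state = CTX_USER;
}

/* Pre-patch schedule_user(): assumes it is entered in user state. */
static void model_schedule_user_old(struct ctx_model *m)
{
	model_user_exit(m);
	/* schedule() would run here */
	model_user_enter(m);
}

/*
 * Replays the path from the pseudocode above: need_resched() is seen
 * in the exit work loop before syscall_trace_leave() has run.
 */
struct ctx_model replay_resched_before_trace_leave(void)
{
	struct ctx_model m = { CTX_USER, 0 };

	model_user_exit(&m);            /* syscall_trace_enter() */
	model_schedule_user_old(&m);    /* user_exit() ignored, user_enter() taken */
	return m;                       /* we are still running kernel code here */
}
```

In this model the replay ends in CTX_USER with one ignored call, although the CPU would still be executing kernel code on its way to syscall_trace_leave(). Anything using RCU from that point on would run while context tracking believes we are in userspace, which would be consistent with the warning in the subject line.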
The fix would be to change schedule_user() to handle random context tracking states.
exception_enter/exit() act like context_tracking_save()/context_tracking_restore()
so they fit pretty well there:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 24beb9b..6fe82fb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2869,15 +2869,18 @@ EXPORT_SYMBOL(schedule);
 #ifdef CONFIG_CONTEXT_TRACKING
 asmlinkage __visible void __sched schedule_user(void)
 {
+	enum ctx_state prev_ctx;
+
 	/*
 	 * If we come here after a random call to set_need_resched(),
 	 * or we have been woken up remotely but the IPI has not yet arrived,
-	 * we haven't yet exited the RCU idle mode. Do it here manually until
-	 * we find a better solution.
+	 * the context tracking is in a random state depending on which stage
+	 * we are on resuming to userspace. Exception_enter/exit() handle that
+	 * well by saving and restoring the current context tracking state.
 	 */
-	user_exit();
+	prev_ctx = exception_enter();
 	schedule();
-	user_enter();
+	exception_exit(prev_ctx);
 }
 #endif
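The save/restore behavior the patch relies on can be sketched in the same toy style. This is a userspace model under stated assumptions, not the kernel's code: cur_state and the model_* functions are hypothetical names, and the only semantics assumed are the ones stated above, i.e. exception_enter() records the current state before switching to kernel state and exception_exit() restores whatever was recorded.

```c
#include <assert.h>

/* Toy userspace model; names are illustrative, not the kernel's. */
enum ctx_state { CTX_KERNEL, CTX_USER };

static enum ctx_state cur_state;

/* Save the current state, then make sure we are in kernel state. */
static enum ctx_state model_exception_enter(void)
{
	enum ctx_state prev = cur_state;

	cur_state = CTX_KERNEL;
	return prev;
}

/* Go back to user state only if that is where we came from. */
static void model_exception_exit(enum ctx_state prev)
{
	if (prev == CTX_USER)
		cur_state = CTX_USER;
}

/*
 * Patched schedule_user() in the model: works whatever state it is
 * entered in, and returns the state it leaves behind.
 */
enum ctx_state model_schedule_user(enum ctx_state entry_state)
{
	enum ctx_state prev;

	cur_state = entry_state;
	prev = model_exception_enter();
	assert(cur_state == CTX_KERNEL);   /* schedule() would run here */
	model_exception_exit(prev);
	return cur_state;
}
```

Whichever state the model is entered in, it comes back out in that same state, so a call made before syscall_trace_leave() no longer flips the tracking to user while we are still in the kernel.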
> In any case, might it make sense to add warnings to user_exit and
> user_enter to ensure that they're called in the state in which they
> should be called?
Yeah, I think we need to do that. We'll more easily detect issues like this
one.
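As a rough illustration of what such warnings could look like, here is a userspace sketch. Everything in it is hypothetical (struct ctx_model, checked_user_exit(), checked_user_enter(), the message text); it only shows the idea of flagging a call made in the wrong state instead of silently ignoring it, not how the kernel would actually implement it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy userspace model; names are illustrative, not the kernel's. */
enum ctx_state { CTX_KERNEL, CTX_USER };

struct ctx_model {
	enum ctx_state state;   /* what context tracking believes */
	int warnings;           /* calls made in an unexpected state */
};

static void warn_unexpected(struct ctx_model *m, const char *who)
{
	m->warnings++;
	fprintf(stderr, "WARNING: %s called in unexpected state\n", who);
}

/* user_exit() is only expected while tracking says "user". */
void checked_user_exit(struct ctx_model *m)
{
	assert(m != NULL);
	if (m->state != CTX_USER)
		warn_unexpected(m, "user_exit()");
	m->state = CTX_KERNEL;
}

/* user_enter() is only expected while tracking says "kernel". */
void checked_user_enter(struct ctx_model *m)
{
	assert(m != NULL);
	if (m->state != CTX_KERNEL)
		warn_unexpected(m, "user_enter()");
	m->state = CTX_USER;
}
```

With checks like these, the double user_exit() from the old schedule_user() path would be reported the first time it happens rather than going unnoticed.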
Thread overview: 19+ messages
2014-12-03 18:19 audit: rcu_read_lock() used illegally while idle Dave Jones
2014-12-03 19:29 ` Paul E. McKenney
2014-12-03 20:06 ` Andy Lutomirski
2014-12-03 20:19 ` Dave Jones
2014-12-03 20:38 ` Andy Lutomirski
2014-12-03 22:08 ` Frederic Weisbecker
2014-12-03 22:12 ` Andy Lutomirski
2014-12-03 23:49 ` Frederic Weisbecker [this message]
2014-12-03 23:18 ` [PATCH] context_tracking: Restore previous state in schedule_user Andy Lutomirski
2014-12-03 23:26 ` Andy Lutomirski
2014-12-03 23:31 ` Dave Jones
2014-12-03 23:58 ` Frederic Weisbecker
2014-12-04 0:04 ` Andy Lutomirski
2014-12-04 0:30 ` Dave Jones
2014-12-04 0:38 ` Andy Lutomirski
2014-12-04 1:13 ` Frederic Weisbecker
2014-12-03 23:37 ` [PATCH v2] " Andy Lutomirski
2014-12-03 23:50 ` Paul E. McKenney
2014-12-04 0:01 ` Frederic Weisbecker