From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>, Dave Jones <davej@redhat.com>,
Al Viro <viro@ZenIV.linux.org.uk>,
Linus Torvalds <torvalds@linux-foundation.org>,
Linux Kernel <linux-kernel@vger.kernel.org>,
xfs@oss.sgi.com
Subject: Re: 3.14-rc2 XFS backtrace because irqs_disabled.
Date: Wed, 12 Feb 2014 09:57:53 -0600
Message-ID: <52FB9A01.8060601@sandeen.net>
In-Reply-To: <20140212061038.GC13997@dastard>

On 2/12/14, 12:10 AM, Dave Chinner wrote:
> On Wed, Feb 12, 2014 at 12:50:27AM -0500, Dave Jones wrote:
>> On Wed, Feb 12, 2014 at 04:40:43PM +1100, Dave Chinner wrote:
>>
>> > None of the XFS code disables interrupts in that path, nor does it
>> > call outside XFS except to dispatch IO. The stack is pretty deep at
>> > this point and I know that the standard (non stacked) IO stack can
>> > consume >3kb of stack space when it gets down to having to do memory
>> > reclaim during GFP_NOIO allocation at the lowest level of SCSI
>> > drivers. Stack overruns typically show up with symptoms like we are
>> > seeing.
>> > ..
>> >
>> > Dave, before chasing ghosts, can you (like Eric originally asked)
>> > turn on stack overrun detection?
>>
>> CONFIG_DEBUG_STACKOVERFLOW ? Already turned on.
>
> That only checks stack usage when an interrupt is taken. If no
> interrupts are taken when stack usage is within 128 bytes of
> overflow, then it doesn't catch it.
>
> I tend to use CONFIG_DEBUG_STACK_USAGE=y as it records the maximum
> stack usage of a process via canary overwrites and it records it in
> do_exit(). I also use the stack tracer to record the largest stack
> usage seen so I know exactly what code paths are approaching stack
> overruns...
>
> Cheers,
>
> Dave.
>
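
To make the high-water-mark idea quoted above concrete, here is a minimal userspace sketch. It assumes, as I understand CONFIG_DEBUG_STACK_USAGE to work, that the stack starts out zeroed and that the unused portion is found by scanning from the stack end for the first non-zero word. Everything named sim_* and STACK_WORDS is hypothetical and exists only for this illustration; it is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_WORDS 1024	/* hypothetical stack size, in words */

/* Simulated downward-growing stack: words[0] is the stack end (lowest address). */
struct sim_stack {
	unsigned long words[STACK_WORDS];
};

/* A fresh stack is all zeroes, so untouched words can be recognised later. */
static void sim_stack_init(struct sim_stack *s)
{
	memset(s->words, 0, sizeof(s->words));
}

/* Pretend a call chain dirtied `used` words, counted from the top of the stack. */
static void sim_stack_use(struct sim_stack *s, size_t used)
{
	assert(used <= STACK_WORDS);
	for (size_t i = STACK_WORDS - used; i < STACK_WORDS; i++)
		s->words[i] = 0xdeadbeefUL;	/* any non-zero value will do */
}

/* Count the words, starting at the stack end, that were never written. */
static size_t sim_stack_not_used(const struct sim_stack *s)
{
	size_t n = 0;

	while (n < STACK_WORDS && s->words[n] == 0)
		n++;
	return n;
}
```

The count only ever shrinks: a shallow call after a deep one leaves the deeper mark in place, which is what makes this a record of the maximum usage rather than the current usage. A frame that happens to leave zeroes at its deepest point would be under-counted, so treat the result as an estimate.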

I'm not sure if I'm off base here, but maybe this would make sense: check
for a corrupted stack in __might_sleep. Compile tested only,
possibly inelegant, and/or completely wrong, but:

From: Eric Sandeen <sandeen@redhat.com>

sched: Test for corrupted task_struct in __might_sleep

If a thread overruns the stack, it may corrupt the task_struct,
leading to false positives on tests like irqs_disabled().
Warn if this seems to be the case.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131e..6920c3c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6934,6 +6934,8 @@ static inline int preempt_count_equals(int preempt_offset)
 
 void __might_sleep(const char *file, int line, int preempt_offset)
 {
+	struct task_struct *tsk = current;
+	unsigned long *stackend;
 	static unsigned long prev_jiffy;	/* ratelimiting */
 
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
@@ -6952,6 +6954,11 @@ void __might_sleep(const char *file, int line, int preempt_offset)
 			in_atomic(), irqs_disabled(),
 			current->pid, current->comm);
 
+	/* A corrupted stack can cause a false positive on irqs_disabled etc */
+	stackend = end_of_stack(tsk);
+	if (tsk != &init_task && *stackend != STACK_END_MAGIC)
+		printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
+
 	debug_show_held_locks(current);
 	if (irqs_disabled())
 		print_irqtrace_events(current);
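
For readers who want to see the check in isolation, the following is a rough userspace sketch of the same end-of-stack magic test. It is a simplification under stated assumptions, not the kernel's layout: the stack is modelled as a plain array whose lowest word holds the magic, and every sim_* name and SIM_STACK_WORDS is hypothetical. Only the STACK_END_MAGIC value is taken from the kernel (include/uapi/linux/magic.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_END_MAGIC	0x57AC6E9DUL	/* value from include/uapi/linux/magic.h */
#define SIM_STACK_WORDS	512		/* hypothetical stack size, in words */

/* Simulated task: stack[0] is the lowest address, i.e. the end of the stack. */
struct sim_task {
	unsigned long stack[SIM_STACK_WORDS];
};

/* Stand-in for end_of_stack(): the last word a well-behaved task may never touch. */
static unsigned long *sim_end_of_stack(struct sim_task *t)
{
	return &t->stack[0];
}

/* Plant the magic when the task is set up. */
static void sim_task_init(struct sim_task *t)
{
	*sim_end_of_stack(t) = STACK_END_MAGIC;
}

/* The test the patch adds: a missing magic means an overrun or stray write. */
static bool sim_stack_corrupted(struct sim_task *t)
{
	return *sim_end_of_stack(t) != STACK_END_MAGIC;
}

/* Simulate frames growing downward by `depth` words from the top of the stack. */
static void sim_push(struct sim_task *t, size_t depth)
{
	assert(depth <= SIM_STACK_WORDS);
	for (size_t i = 0; i < depth; i++)
		t->stack[SIM_STACK_WORDS - 1 - i] = 0;
}
```

In this model, sim_push(&t, SIM_STACK_WORDS - 1) leaves the magic intact while sim_push(&t, SIM_STACK_WORDS) clobbers it. As with the patch, the check only detects an overrun after the fact, and only if the overrunning code actually wrote over the magic word.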