From: Robin Holt <holt@sgi.com>
To: linux-ia64@vger.kernel.org
Subject: [PATCH] get_wchan on running task sometimes MCAs the machine.
Date: Thu, 17 May 2007 11:16:52 +0000
Message-ID: <20070517111651.GA760@lnx-holt.americas.sgi.com>
Make ia64's get_wchan safer by not unwinding a running task's stack.
Stolen from i386's get_wchan.
Signed-off-by: Robin Holt <holt@sgi.com>
---
We have seen one customer machine experience four MCAs in the last
13 days. All have a similar failure in that the processor is trying to
access some hardware reserved memory. I believe this is occurring because
the unwind code called from get_wchan references some memory from another
task while it is being changed by that task. One factor may be the large
number of I/O adapters spread throughout the system with the enormous
number of disks on the back side. IIRC, we have six dual-port FC HBAs
connecting via multiple paths to more than 6,000 disks. The machine is
under heavy I/O load. The customer application seems to fork one task
for each MPI rank (16) and then each of those creates 30+ pthreads.
The parent process then seems to be calling through proc_tgid_stat
... get_wchan, unw_unwind where it references an illegal address.
Of the four failures I have looked at, only one had a value similar to
the illegal address. The other three appear to have been overwritten.
In all cases, the reference appears to be within a few cache lines of
the end of physical memory.
I am speculating that this is due to get_wchan operating on a running
task. If I wave my hands enough, I can make this feel like it makes
sense. That is, until you realize that this most recent failure (the
one with the similar value still in the stack page) was when this
task was unwinding its own stack. I can see some evidence we _MAY_
have taken an interrupt recently, but I still have not found a way to
explain this failure.
Any suggestions would be greatly appreciated.
All that said, I have put together the following simple patch, stolen
directly from i386's get_wchan. If the task is running, why even try?
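For readers without the kernel tree at hand, here is a minimal user-space
sketch of the guard the patch adds. Everything in it is an assumption made
for illustration: struct task_sketch, current_task, unwind_sketch() and
get_wchan_sketch() are hypothetical stand-ins, not kernel definitions, and
the state constants are placeholder values. It shows only the early-return
logic, not the ia64 unwinder.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel task states (placeholder values). */
#define TASK_RUNNING        0
#define TASK_INTERRUPTIBLE  1

/* Hypothetical, heavily reduced model of a task. */
struct task_sketch {
	long state;           /* TASK_RUNNING or a sleeping state */
	unsigned long wchan;  /* pretend result of a successful unwind */
};

/* Stand-in for the kernel's notion of the calling task. */
static struct task_sketch *current_task;

/* Pretend unwinder: only meaningful while the target's stack is stable. */
static unsigned long unwind_sketch(const struct task_sketch *p)
{
	return p->wchan;
}

/*
 * Same shape as the guard in the patch: refuse to unwind when there is
 * no task, when the task is the caller itself, or when it is runnable.
 */
unsigned long get_wchan_sketch(struct task_sketch *p)
{
	if (!p || p == current_task || p->state == TASK_RUNNING)
		return 0;

	return unwind_sketch(p);
}
```

Note that in a real kernel the state test is only a snapshot: the target
could start running right after the check, so a guard of this kind
narrows the window rather than closing it.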
Index: linux-tot-20070517/arch/ia64/kernel/process.c
===================================================================
--- linux-tot-20070517.orig/arch/ia64/kernel/process.c	2007-05-17 05:39:54.000000000 -0500
+++ linux-tot-20070517/arch/ia64/kernel/process.c 2007-05-17 05:44:26.820535382 -0500
@@ -763,6 +763,9 @@ get_wchan (struct task_struct *p)
 	unsigned long ip;
 	int count = 0;
+	if (!p || p == current || p->state == TASK_RUNNING)
+		return 0;
+
 	/*
 	 * Note: p may not be a blocked task (it could be current or
 	 * another process running on some other CPU. Rather than