From: Frederic Weisbecker <fweisbec@gmail.com>
To: sparclinux@vger.kernel.org
Subject: Re: [PATCH 7/7] sparc64: Add function graph tracer support.
Date: Fri, 16 Apr 2010 23:14:12 +0000
Message-ID: <20100416231408.GA10006@nowhere>
In-Reply-To: <20100412.234300.212396783.davem@davemloft.net>

On Fri, Apr 16, 2010 at 01:47:01PM -0700, David Miller wrote:
> From: Frederic Weisbecker <fweisbec@gmail.com>
> Date: Fri, 16 Apr 2010 17:44:21 +0200
> 
> > """(note the hrtimer warnings are normal. This is a hang prevention
> > that was first added because of the function graph tracer, but it
> > now serves as a general protection for hrtimer. It's roughly
> > similar to the balancing problem: servicing timers is so slow that
> > timers re-expire before we exit the servicing loop, so we risk an
> > endless loop)."""
> 
> I don't think it's normal in this case, I suspect we loop because
> of some kind of corruption.
> 
> > That said, I think it also means there is a problem. It's normal
> > for this to happen in a guest, but not on a normal box. Maybe there
> > is contention in the tracer fast path that slows the machine down.
> 
> I think it's looping not because of contention, but because of
> corrupted memory/registers.
> 
> > Do you have CONFIG_DEBUG_LOCKDEP enabled? This was one of the
> > sources of this contention (fixed recently in -tip, but only
> > for .35).
> 
> I'm using PROVE_LOCKING but not DEBUG_LOCKDEP.
> 
> Anyways, consistently my machine crashes with completely corrupted
> registers in either irq_exit() or __do_softirq().  Usually we get an
> unaligned access of some sort, either accessing the stack (because %fp
> is garbage) or via an indirect call (usually because %i7 is garbage).
> 
> One thing that's interesting about the softirq path is that it uses
> the softirq stack.  The only thing that guards us from jumping onto
> the softirq_stack are the tests done by do_softirq(), namely
> !in_interrupt() and whether softirqs are pending.
> 
> What if preempt_count() got corrupted in such a way that we end up
> evaluating in_interrupt() to zero when we shouldn't?
> 
> If that happens, and this makes us jump onto the top of softirq stack
> of the current cpu multiple times, that could cause some wild
> corruptions.
> 
> Another thing I've noticed is that there appears to be some kind of
> pattern to many of the register corruptions I've seen.  There is
> a pattern of 64-bit values that often looks like this (in memory
> order):
> 
> 0xffffffffc3300000
> 0xffffffffc33000cc
> 0xffffffffc3d00000
> 0xffffffffc3d000cc
> 0xffffffffc4000000
> 0xffffffffc40000cc
> 
> and, from another trace:
> 
> 0xffffffffc6100000
> 0xffffffffc61000cc
> 0xffffffffc6a00000
> 0xffffffffc6a000cc
> 0xffffffffc6e00000
> 0xffffffffc6e000cc
> 
> They look like some kind of descriptor.  The closest thing I could
> find were the scatter-gather descriptors used by the Fusion mptsas
> driver.  I can't find a way that the descriptors would be formed
> exactly like the above, but it does come close.
> 
> For example, drivers/message/fusion/mptscsih.c:mptscsih_qcmd()
> has this call:
> 
> 		ioc->add_sge((char *)&pScsiReq->SGL,
> 			MPT_SGE_FLAGS_SSIMPLE_READ | 0,
> 			(dma_addr_t) -1);
> 
> which puts -1 into the address field, but this doesn't exactly line up
> because the 32-bit SGE descriptors are in the order "flags" then
> "address" not the other way around.
> 
> Ho hum... anyways, just looking for clues.  If those are mptsas
> descriptors, then it would be consistent with how I've found that the
> block I/O path seems to invariably be involved during the crashes.
> In another trace (that time with PROVE_LOCKING disabled) I saw
> the host->host_lock passed down into spin_lock_irqsave() being NULL.
> And this was in the software interrupt handler.



Hmm, just a random idea: do you think it could be due to stack overflows?
The function graph tracer consumes more stack, because it calls into the
function graph handlers, the ring buffer, etc...

It digs deeper into the stack than would happen without tracing.


Thread overview: 48+ messages
2010-04-13  6:43 [PATCH 7/7] sparc64: Add function graph tracer support David Miller
2010-04-13 19:18 ` Frederic Weisbecker
2010-04-13 19:39 ` Rostedt
2010-04-13 19:45 ` Frederic Weisbecker
2010-04-13 21:34 ` David Miller
2010-04-13 21:35 ` David Miller
2010-04-13 21:51 ` Frederic Weisbecker
2010-04-13 21:52 ` Steven Rostedt
2010-04-13 21:56 ` David Miller
2010-04-13 21:57 ` David Miller
2010-04-13 21:57 ` Frederic Weisbecker
2010-04-13 22:05 ` Frederic Weisbecker
2010-04-13 22:11 ` David Miller
2010-04-13 23:34 ` David Miller
2010-04-13 23:56 ` David Miller
2010-04-14  1:59 ` David Miller
2010-04-14  9:04 ` David Miller
2010-04-14 15:29 ` Frederic Weisbecker
2010-04-14 15:48 ` Frederic Weisbecker
2010-04-14 23:08 ` David Miller
2010-04-16  9:12 ` David Miller
2010-04-16 15:44 ` Frederic Weisbecker
2010-04-16 20:47 ` David Miller
2010-04-16 22:51 ` David Miller
2010-04-16 23:14 ` Frederic Weisbecker [this message]
2010-04-16 23:17 ` David Miller
2010-04-17  7:51 ` David Miller
2010-04-17 16:59 ` Frederic Weisbecker
2010-04-17 17:22 ` Frederic Weisbecker
2010-04-17 21:24 ` David Miller
2010-04-17 21:25 ` David Miller
2010-04-17 21:29 ` David Miller
2010-04-17 21:34 ` Frederic Weisbecker
2010-04-17 21:38 ` Frederic Weisbecker
2010-04-17 21:38 ` David Miller
2010-04-17 21:41 ` Frederic Weisbecker
2010-04-18 15:31 ` Frederic Weisbecker
2010-04-18 21:19 ` David Miller
2010-04-19  7:56 ` David Miller
2010-04-19  8:15 ` David Miller
2010-04-19 19:52 ` Frederic Weisbecker
2010-04-19 19:56 ` David Miller
2010-04-19 20:37 ` Frederic Weisbecker
2010-04-20  5:51 ` David Miller
2010-04-20  7:50 ` David Miller
2010-04-20 13:58 ` Frederic Weisbecker
2010-04-20 21:17 ` David Miller
2010-04-20 22:52 ` Steven Rostedt