From: Michael Vrable <mvrable@cs.ucsd.edu>
To: xen-devel@lists.xensource.com
Subject: Re: Grant Table Network Issues
Date: Sun, 14 Aug 2005 08:39:53 -0700
Message-ID: <20050814153953.GA5037@vrable.net>
In-Reply-To: <85c87b7ce2e667949306ca7da953d219@cl.cam.ac.uk>

On Sun, Aug 14, 2005 at 09:29:02AM +0100, Keir Fraser wrote:
> On 13 Aug 2005, at 19:59, Michael Vrable wrote:
> 
> >The line causing trouble is "BUG_ON(in_irq())".  In this example, I had
> >tcpdump running in both domains; this seems to trigger the problem more
> >reliably.  I've also seen a similar crash with a TCP connection, but it
> >takes a few packets before this shows up (the handshake completes, and
> >the crash happens about the time data packets come back from domain-0;
> >if checksumming optimizations are enabled, it seems the packets are
> >dropped so I don't see a crash but I don't get any data either).
> 
> On the stack trace, at irq_exit() you definitely have no hardirqs or 
> softirqs in progress. But somehow, at kmap_skb_frag(), the hardirq 
> section of the preempt mask has become non-zero. You can't have been 
> preempted to another cpu during any of this because the preempt mask is 
> continuously non-zero throughout original irq handling and subsequent 
> softirq handling.
> 
> The only code between irq_exit and kmap_skb_frag on the stack trace is 
> unmodified Linux code. Assuming that is all correct (and presumably the 
> same whether we enable grant tables or not) I might guess another 
> interrupt arrives and the handler corrupts things?

I discovered the cause of this and other crashes yesterday: when grant
tables are enabled in the netback driver, net_tx_action() and
net_tx_action_dealloc() in netback.c each allocate a large array on the
kernel stack ("gnttab_map_grant_ref_t map_ops[MAX_PENDING_REQS]" and
"gnttab_unmap_grant_ref_t unmap_ops[MAX_PENDING_REQS]").  These arrays
are too large for the kernel stack, so the stack overflows.

This in turn causes a number of very spectacular crashes.  One of the
more subtle crashes is the in_irq() bug: the preempt count is stored in
the current thread info, at the bottom of the stack, so an overflowing
stack frame overwrites it and the hardirq bits can end up non-zero.
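
To make the mechanism concrete, here is a small user-space sketch (not
kernel code) of how a large frame can clobber the preempt count.  The
struct, the 8 KB THREAD_SIZE, and the HARDIRQ_MASK value are simplified
assumptions for illustration; the real layout and constants depend on
the kernel version and configuration.  The helper in_irq_after_frame()
is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values for illustration only; real ones depend on the kernel config. */
#define THREAD_SIZE  8192u
#define HARDIRQ_MASK 0x0fff0000u

/* Simplified stand-in for the kernel's struct thread_info. */
struct thread_info_sketch {
    uint32_t flags;
    uint32_t preempt_count;
};

/* thread_info shares one allocation with the stack and sits at its lowest address. */
union thread_union_sketch {
    struct thread_info_sketch info;
    unsigned char stack[THREAD_SIZE];
};

/*
 * Simulate a function that is entered with depth_used bytes of stack
 * already consumed, places frame_bytes of locals below that, and fills
 * them with `fill`.  Returns nonzero if the hardirq bits of the
 * preempt count are set afterwards, i.e. if in_irq() would be true.
 * Writes are clamped to the simulated region; a real overflow would
 * also run past the bottom of the allocation.
 */
static int in_irq_after_frame(size_t depth_used, size_t frame_bytes,
                              unsigned char fill)
{
    union thread_union_sketch t;
    size_t sp, low;

    memset(&t, 0, sizeof t);                 /* preempt_count starts at 0 */
    if (depth_used > THREAD_SIZE)
        depth_used = THREAD_SIZE;

    sp = THREAD_SIZE - depth_used;           /* stack grows toward offset 0 */
    low = frame_bytes > sp ? 0 : sp - frame_bytes;
    memset(&t.stack[low], fill, sp - low);   /* "initialise the locals" */

    return (t.info.preempt_count & HARDIRQ_MASK) != 0;
}
```

With these assumed numbers, a 1 KB frame entered 512 bytes deep leaves
the preempt count alone, while an 8 KB frame at the same depth reaches
offset 0 and makes the hardirq bits non-zero.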

Allocating the arrays statically fixes the problem for me.  Steve Hand
says he'll likely be committing a fix soon.
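
For reference, a minimal sketch of the kind of change I mean, not the
actual patch.  The struct map_op_sketch, the value of MAX_PENDING_REQS,
and the function names are placeholders I made up for illustration; the
real code uses gnttab_map_grant_ref_t and does far more work.

```c
#include <stddef.h>

/* Assumed value for illustration; the real constant lives in netback. */
#define MAX_PENDING_REQS 256

/* Hypothetical stand-in for gnttab_map_grant_ref_t. */
struct map_op_sketch {
    unsigned long host_addr;
    unsigned int  ref;
    unsigned int  flags;
};

/*
 * Before: "struct map_op_sketch map_ops[MAX_PENDING_REQS];" was a local
 * variable, so every call consumed this many bytes of kernel stack.
 */
static size_t map_ops_footprint(void)
{
    return sizeof(struct map_op_sketch) * MAX_PENDING_REQS;
}

/* After: the array has static storage and no longer lives on the stack. */
static struct map_op_sketch tx_map_ops[MAX_PENDING_REQS];

/* Fill in up to MAX_PENDING_REQS operations; returns how many were set up. */
static size_t tx_action_sketch(size_t nr_reqs)
{
    size_t i;

    if (nr_reqs > MAX_PENDING_REQS)
        nr_reqs = MAX_PENDING_REQS;

    for (i = 0; i < nr_reqs; i++) {
        tx_map_ops[i].host_addr = 0;
        tx_map_ops[i].ref = (unsigned int)i;
        tx_map_ops[i].flags = 0;
    }
    return nr_reqs;
}
```

The trade-off is that a static array makes the function non-reentrant,
so this only works if the function never runs concurrently with itself.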

--Michael Vrable
