From: "J. Bruce Fields" <bfields@fieldses.org>
To: Simon Kirby <sim@hostway.ca>
Cc: linux-nfs@vger.kernel.org, Greg Banks <gnb-xTcybq6BZ68@public.gmane.org>
Subject: Re: kernel NULL pointer dereference in rpcb_getport_done (2.6.29.4)
Date: Fri, 10 Jul 2009 18:34:08 -0400
Message-ID: <20090710223408.GR10700@fieldses.org>
In-Reply-To: <20090709172739.GG13617@hostway.ca>

On Thu, Jul 09, 2009 at 10:27:39AM -0700, Simon Kirby wrote:
> Hello,
> 
> It seems this email to Greg Banks is bouncing (no longer works at SGI),

Yes, I've cc'd his new address.  (But he's on vacation.)

> and I see git commit 59a252ff8c0f2fa32c896f69d56ae33e641ce7ad is still
> in HEAD (and still causing problems for our load).
> 
> Can somebody else eyeball this, please?  I don't understand enough about
> this particular change to fix the request latency / queue backlogging
> that this patch seems to introduce.
> 
> It would seem to me that this patch is flawed because svc_xprt_enqueue()
> is edge-triggered upon the arrival of packets, but the NFS threads
> themselves cannot then pull another request off of the socket queue. 
> This patch likely helps with the particular benchmark, but not in our
> load case where there is a heavy mix of cached and uncached NFS requests.

That sounds plausible.  I'll need to take some time to look at it.

--b.

> 
> Simon-
> 
> On Mon, Jun 22, 2009 at 02:11:26PM -0700, Simon Kirby wrote:
> 
> > On Sat, Jun 20, 2009 at 10:09:41PM -0700, Simon Kirby wrote:
> > 
> > > Actually, we just saw another similar crash on another machine which is
> > > an NFS client from this server (no nfsd running).  Same backtrace, but
> > > this time RAX was "32322e32352e3031", which is obviously ASCII
> > > ("22.25.01"), so memory scribbling seems to definitely be happening...
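
The ASCII reading quoted above can be checked with a small helper. This is only a sketch: reg_to_ascii() is a hypothetical name, not anything from the kernel or this thread. It prints the eight bytes of a 64-bit register value either most-significant byte first (the way the hex dump reads left to right) or in the order they would sit in memory on a little-endian machine such as x86-64. The thread does not say which order the corrupting string had in memory.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: render the 8 bytes of a
 * 64-bit register value as text.
 *   msb_first != 0: most-significant byte first (hex dump order)
 *   msb_first == 0: least-significant byte first (little-endian memory order)
 * Non-printable bytes are replaced with '?'.  out must hold 9 chars.
 */
static void reg_to_ascii(uint64_t value, int msb_first, char out[9])
{
	assert(out != NULL);
	memset(out, 0, 9);	/* guarantees NUL termination */

	for (int i = 0; i < 8; i++) {
		int shift = msb_first ? 56 - 8 * i : 8 * i;
		unsigned char byte = (unsigned char)(value >> shift);

		out[i] = isprint(byte) ? (char)byte : '?';
	}
}
```

For the value 0x32322e32352e3031, msb_first = 1 gives "22.25.01" as quoted above, and msb_first = 0 gives "10.52.22".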
> > 
> > Good news: 2.6.30 seems to have fixed whatever the original scribbling
> > source was.  I see at least a couple of suspect commits in the log, but
> > I'm not sure which yet.
> > 
> > However, with 2.6.30, it seems 59a252ff8c0f2fa32c896f69d56ae33e641ce7ad
> > is causing us a large performance regression.  The server's response
> > latency is huge compared to normal.  I suspected this patch was the
> > culprit, so I wrote over the instruction that loads SVC_MAX_WAKING before
> > this comparison:
> > 
> > +	if (pool->sp_nwaking >= SVC_MAX_WAKING) {
> > +		/* too many threads are runnable and trying to wake up */
> > +		thread_avail = 0;
> > +	}
> > 
> > ...when I raised SVC_MAX_WAKING to 40ish, the problem for us disappeared.
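
For readers following along, the effect of the quoted check can be modelled in userspace. This is a simplified sketch, not the kernel code: struct pool_model, its idle_threads field, and model_thread_avail() are hypothetical names; only sp_nwaking, thread_avail, and SVC_MAX_WAKING appear in the thread. The limit is passed as a parameter so that different values can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-pool state; not the kernel struct. */
struct pool_model {
	unsigned int sp_nwaking;	/* threads woken but not yet running */
	unsigned int idle_threads;	/* assumed field: threads waiting for work */
};

/*
 * Returns true if the model would hand a new request to a thread.
 * max_waking plays the role of SVC_MAX_WAKING.
 */
static bool model_thread_avail(const struct pool_model *pool,
			       unsigned int max_waking)
{
	assert(pool != NULL);

	bool thread_avail = pool->idle_threads > 0;

	if (pool->sp_nwaking >= max_waking) {
		/* too many threads are runnable and trying to wake up */
		thread_avail = false;
	}
	return thread_avail;
}
```

In this model, a pool with idle threads still refuses to wake one once sp_nwaking reaches the limit, which is why raising the limit changes the outcome.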
> > 
> > The problem is that with just 72 nfsd processes running, the NFS socket
> > has a ~1 MB backlog of packets on it, even though "ps" shows most of the
> > nfsd threads are not blocked.  This is on an 8 core system, with high NFS
> > packet rates.  More NFS threads (300) made no difference.
> > 
> > As soon as I raised SVC_MAX_WAKING, the load average went up again to
> > what it normally was before with 2.6.29, but the socket's receive backlog
> > went down to nearly 0 again, and the request latency is now back to
> > normal.
> > 
> > I think the issue here is that whatever calls svc_xprt_enqueue() isn't
> > doing it again as soon as the threads sleep again, but only when the next
> > packet comes in, or something...
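
The hypothesis above can be illustrated with a toy model. It is a sketch under strong assumptions and says nothing definitive about what the kernel does: model_backlog() and its parameters are hypothetical, time advances in discrete steps, idle threads are assumed to be plentiful, each packet arrival may wake at most one thread subject to the wake limit, and a woken thread handles exactly one request. The recheck_on_idle flag models the alternative in which a thread keeps pulling requests until the queue is empty before it sleeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not kernel code): returns the number of
 * requests still queued after `steps` time steps with `arrivals`
 * packets arriving per step.
 */
static size_t model_backlog(unsigned int steps, unsigned int arrivals,
			    unsigned int max_waking, bool recheck_on_idle)
{
	size_t queued = 0;

	for (unsigned int t = 0; t < steps; t++) {
		unsigned int nwaking = 0;

		/* Each arrival is one "edge": it may wake one thread. */
		for (unsigned int a = 0; a < arrivals; a++) {
			queued++;
			if (nwaking < max_waking)
				nwaking++;
		}

		/* Each woken thread takes one request off the queue. */
		size_t served = nwaking < queued ? nwaking : queued;
		queued -= served;

		/* Alternative: running threads drain the queue before sleeping. */
		if (recheck_on_idle && served > 0)
			queued = 0;
	}
	return queued;
}
```

With 8 arrivals per step and a wake limit of 5, the edge-only model leaves 3 requests behind every step, so the backlog grows without bound. Raising the limit above the arrival rate, or letting threads recheck the queue, keeps the backlog at zero in this model.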
> > 
> > Simon-
