* [bisect] NFS regression breaks X
@ 2007-05-09 21:30 Jeff Garzik
2007-05-09 21:51 ` Linus Torvalds
2007-05-09 22:03 ` Trond Myklebust
0 siblings, 2 replies; 9+ messages in thread
From: Jeff Garzik @ 2007-05-09 21:30 UTC
To: Chuck Lever, Trond Myklebust
Cc: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List,
NeilBrown, Adrian Bunk
Original bug report, with hardware and software info:
http://lkml.org/lkml/2007/5/8/667
I love bisect :) bisect has identified the following commit as the one
that causes my GNOME login to die, within 10 seconds of logging in:
commit 2bea90d43a050bbc4021d44e59beb34f384438db
Author: Chuck Lever <chuck.lever@oracle.com>
Date: Thu Mar 29 16:47:53 2007 -0400
SUNRPC: RPC buffer size estimates are too large
100% reproducible, verified regression. My home directory is an NFSv4
mount, and the problem appears on my client workstation, so this makes
some sense:
> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> pretzel:/ on /g type nfs4 (rw,noatime,proto=tcp,addr=10.10.10.1)
As an aside, let me express the hope that the NFS developers develop
better patch creation methods. My bisect compile repeatedly died at
> CHK include/linux/version.h
> CHK include/linux/utsrelease.h
> CHK include/linux/compile.h
> fs/nfs/pagelist.c:239: error: conflicting types for ‘nfs_pageio_init’
> include/linux/nfs_page.h:80: error: previous declaration of ‘nfs_pageio_init’ was here
> make[2]: *** [fs/nfs/pagelist.o] Error 1
> make[1]: *** [fs/nfs] Error 2
> make: *** [fs] Error 2
which indicates that someone on the NFS team did not create
wholly-contained patches when submitted to the kernel. Build breakage
should not be fixed in a later commit (unless the breakage already went
upstream), because -- as we see here -- it breaks bisection.
Jeff, occasionally guilty of same, and trying to reform
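To make the point about build breakage and bisection concrete, here is a minimal sketch in standalone C. It assumes a strictly linear history in which the first commit is good, the last is bad, and every commit after the first bad one is also bad. It is not how git bisect is implemented (git walks a commit graph), and the names commit_state, bisect_range, pick_testable and bisect are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum commit_state { COMMIT_GOOD, COMMIT_BAD, COMMIT_UNBUILDABLE };

/* Inclusive range of commits that may still be the first bad one. */
struct bisect_range {
	size_t first;
	size_t last;
};

/*
 * Pick a buildable commit strictly between known-good g and known-bad b,
 * searching outward from the midpoint. Returns b when every commit in
 * between fails to build, i.e. when nothing is left to test.
 */
static size_t pick_testable(const enum commit_state *c, size_t g, size_t b)
{
	size_t mid = g + (b - g) / 2;

	for (size_t d = 0;; d++) {
		int in_range = 0;

		if (mid + d < b) {		/* candidate above the midpoint */
			in_range = 1;
			if (c[mid + d] != COMMIT_UNBUILDABLE)
				return mid + d;
		}
		if (d < mid - g) {		/* candidate below the midpoint */
			in_range = 1;
			if (c[mid - d] != COMMIT_UNBUILDABLE)
				return mid - d;
		}
		if (!in_range)
			return b;
	}
}

/* Narrow the first bad commit; assumes c[0] is good and c[n-1] is bad. */
static struct bisect_range bisect(const enum commit_state *c, size_t n)
{
	size_t g = 0, b;

	assert(n >= 2);
	b = n - 1;
	while (b - g > 1) {
		size_t t = pick_testable(c, g, b);

		if (t == b)
			break;			/* only unbuildable commits remain */
		if (c[t] == COMMIT_GOOD)
			g = t;
		else
			b = t;
	}
	return (struct bisect_range){ g + 1, b };
}
```

With the history {good, good, unbuildable, bad, bad, bad}, bisect() can only narrow the culprit to commits 2..3, because the unbuildable commit sits right next to the first bad one. With no unbuildable commits it pins down a single commit.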
* Re: [bisect] NFS regression breaks X
2007-05-09 21:30 [bisect] NFS regression breaks X Jeff Garzik
@ 2007-05-09 21:51 ` Linus Torvalds
2007-05-09 22:17 ` Trond Myklebust
2007-05-09 22:48 ` Jeff Garzik
2007-05-09 22:03 ` Trond Myklebust
1 sibling, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2007-05-09 21:51 UTC
To: Jeff Garzik
Cc: Chuck Lever, Trond Myklebust, Andrew Morton,
Linux Kernel Mailing List, NeilBrown, Adrian Bunk
On Wed, 9 May 2007, Jeff Garzik wrote:
>
> I love bisect :)
Yeah, me too.
> bisect has identified the following commit as the one that
> causes my GNOME login to die, within 10 seconds of logging in:
>
> commit 2bea90d43a050bbc4021d44e59beb34f384438db
Ok, that commit looks nice in many ways, so I'd be loath to revert it
entirely.
Can you try it with this patch that re-introduces the "total overkill"
slack calculations, and probably makes them even worse (there's a few
left-shifts added, and now it will left-shift the extra slack too).
But there are also some unexplained changes in that patch, so maybe the
size allocation isn't the big problem. For example, now "call_allocate()"
will set task->tk_status to zero which is totally strange.
But if it's the allocation that is too small, this patch may help.
Trond, Chuck?
Linus
--
 net/sunrpc/clnt.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index d8fbee4..5b78692 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -738,6 +738,8 @@ call_reserveresult(struct rpc_task *task)
 	rpc_exit(task, status);
 }
 
+#define RPC_SLACK_SPACE		(1024u)	/* apparently NOT total overkill */
+
 /*
  * 2.	Allocate the buffer. For details, see sched.c:rpc_malloc.
  *	(Note: buffer memory is freed in xprt_release).
@@ -745,7 +747,7 @@ call_reserveresult(struct rpc_task *task)
 static void
 call_allocate(struct rpc_task *task)
 {
-	unsigned int slack = task->tk_auth->au_cslack;
+	unsigned int slack;
 	struct rpc_rqst *req = task->tk_rqstp;
 	struct rpc_xprt *xprt = task->tk_xprt;
 	struct rpc_procinfo *proc = task->tk_msg.rpc_proc;
@@ -764,6 +766,9 @@ call_allocate(struct rpc_task *task)
 		BUG_ON(proc->p_replen == 0);
 	}
 
+	/* Apparently not total overkill */
+	slack = max(task->tk_auth->au_cslack, RPC_SLACK_SPACE);
+
 	/*
 	 * Calculate the size (in quads) of the RPC call
 	 * and reply headers, and convert both values
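The effect of the clamp in the patch above can be sketched in standalone userspace C. This is an illustration only, not kernel code. RPC_SLACK_SPACE and the max() of au_cslack come from the patch. The helper names clamp_slack and quads_to_bytes, and the header and body sizes in the usage note, are hypothetical. The conversion by a left shift of 2 assumes sizes counted in quads (4-byte words), as the patch comment says.

```c
#include <assert.h>
#include <limits.h>

#define RPC_SLACK_SPACE 1024u	/* value taken from the patch above */

/* Same effect as max(task->tk_auth->au_cslack, RPC_SLACK_SPACE). */
static unsigned int clamp_slack(unsigned int au_cslack)
{
	return au_cslack > RPC_SLACK_SPACE ? au_cslack : RPC_SLACK_SPACE;
}

/*
 * Hypothetical helper: add up sizes counted in quads and convert the
 * total to bytes with a left shift by 2. The slack is shifted along
 * with everything else.
 */
static unsigned int quads_to_bytes(unsigned int hdr, unsigned int slack,
				   unsigned int body)
{
	unsigned int quads = hdr + slack + body;

	assert(quads <= (UINT_MAX >> 2));	/* the shift must not overflow */
	return quads << 2;
}
```

For example, an authentication flavor reporting a slack of 2 quads is raised to 1024 quads by clamp_slack(), and quads_to_bytes(6, 1024, 10) then yields 4160 bytes, of which 4096 are slack. That is the sense in which the slack gets left-shifted too.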
* Re: [bisect] NFS regression breaks X
2007-05-09 21:51 ` Linus Torvalds
@ 2007-05-09 22:17 ` Trond Myklebust
2007-05-09 22:48 ` Jeff Garzik
1 sibling, 0 replies; 9+ messages in thread
From: Trond Myklebust @ 2007-05-09 22:17 UTC
To: Linus Torvalds
Cc: Jeff Garzik, Chuck Lever, Andrew Morton,
Linux Kernel Mailing List, NeilBrown, Adrian Bunk
On Wed, 2007-05-09 at 14:51 -0700, Linus Torvalds wrote:
>
> On Wed, 9 May 2007, Jeff Garzik wrote:
> >
> > I love bisect :)
>
> Yeah, me too.
>
> > bisect has identified the following commit as the one that
> > causes my GNOME login to die, within 10 seconds of logging in:
> >
> > commit 2bea90d43a050bbc4021d44e59beb34f384438db
>
> Ok, that commit looks nice in many ways, so I'd be loath to revert it
> entirely.
>
> Can you try it with this patch that re-introduces the "total overkill"
> slack calculations, and probably makes them even worse (there's a few
> left-shifts added, and now it will left-shift the extra slack too).
>
> But there are also some unexplained changes in that patch, so maybe the
> size allocation isn't the big problem. For example, now "call_allocate()"
> will set task->tk_status to zero which is totally strange.
>
> But if it's the allocation that is too small, this patch may help.
>
> Trond, Chuck?
We'd really like to fix this by getting the pre-allocation tables right
in nfs4xdr.c. Chuck has already identified one incorrect value:
http://linux-nfs.org/cgi-bin/gitweb.cgi?p=nfs-2.6.git;a=commitdiff;h=6ce7dc940701cf3fde3c6e826a696b333092cbb1;hp=aa3d1faebe6e214cd96be0e587571477ff6fd9fc
that I just asked you to pull as part of a series of bugfixes. I'm
hoping that will suffice to fix Jeff's case too.
Trond
* Re: [bisect] NFS regression breaks X
2007-05-09 21:51 ` Linus Torvalds
2007-05-09 22:17 ` Trond Myklebust
@ 2007-05-09 22:48 ` Jeff Garzik
1 sibling, 0 replies; 9+ messages in thread
From: Jeff Garzik @ 2007-05-09 22:48 UTC
To: Linus Torvalds
Cc: Chuck Lever, Trond Myklebust, Andrew Morton,
Linux Kernel Mailing List, NeilBrown, Adrian Bunk
Linus Torvalds wrote:
> Can you try it with this patch that re-introduces the "total overkill"
> slack calculations, and probably makes them even worse (there's a few
> left-shifts added, and now it will left-shift the extra slack too).
I'll try this patch, and separately, the Trond push that just appeared
on LKML.
But after losing several hours discovering this "known issue," I need to
attend to my own patches for today first. So, testing won't be immediate.
Jeff
* Re: [bisect] NFS regression breaks X
2007-05-09 21:30 [bisect] NFS regression breaks X Jeff Garzik
2007-05-09 21:51 ` Linus Torvalds
@ 2007-05-09 22:03 ` Trond Myklebust
2007-05-09 22:52 ` Andrew Morton
1 sibling, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2007-05-09 22:03 UTC
To: Jeff Garzik
Cc: Chuck Lever, Andrew Morton, Linus Torvalds,
Linux Kernel Mailing List, NeilBrown, Adrian Bunk
On Wed, 2007-05-09 at 17:30 -0400, Jeff Garzik wrote:
> Original bug report, with hardware and software info:
> http://lkml.org/lkml/2007/5/8/667
>
> I love bisect :) bisect has identified the following commit as the one
> that causes my GNOME login to die, within 10 seconds of logging in:
>
> commit 2bea90d43a050bbc4021d44e59beb34f384438db
> Author: Chuck Lever <chuck.lever@oracle.com>
> Date: Thu Mar 29 16:47:53 2007 -0400
>
> SUNRPC: RPC buffer size estimates are too large
>
> 100% reproducible, verified regression. My home directory is an NFSv4
> mount, and the problem appears on my client workstation, so this makes
> some sense:
> > sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> > pretzel:/ on /g type nfs4 (rw,noatime,proto=tcp,addr=10.10.10.1)
Known issue. Could you try applying the commit at
http://linux-nfs.org/cgi-bin/gitweb.cgi?p=nfs-2.6.git;a=commitdiff;h=6ce7dc940701cf3fde3c6e826a696b333092cbb1;hp=aa3d1faebe6e214cd96be0e587571477ff6fd9fc
and see if that suffices to fix the issue?
>
> As an aside, let me express the hope that the NFS developers develop
> better patch creation methods. My bisect compile repeatedly died at
>
> > CHK include/linux/version.h
> > CHK include/linux/utsrelease.h
> > CHK include/linux/compile.h
> > fs/nfs/pagelist.c:239: error: conflicting types for ‘nfs_pageio_init’
> > include/linux/nfs_page.h:80: error: previous declaration of ‘nfs_pageio_init’ was here
> > make[2]: *** [fs/nfs/pagelist.o] Error 1
> > make[1]: *** [fs/nfs] Error 2
> > make: *** [fs] Error 2
>
> which indicates that someone on the NFS team did not create
> wholly-contained patches when submitted to the kernel. Build breakage
> should not be fixed in a later commit (unless the breakage already went
> upstream), because -- as we see here -- it breaks bisection.
>
> Jeff, occasionally guilty of same, and trying to reform
The problem is that the above only appears to break on 64-bit compiles.
gcc gave no errors at all on a 386, and so the problem was detected only
after the merge.
Cheers
Trond
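One common way a prototype mismatch can show up only on 64-bit builds is a declaration that uses size_t while the definition uses unsigned int: the two are the same type on i386 but different types on x86_64. The thread does not say which types were involved in nfs_pageio_init, so the sketch below is an assumption about the mechanism, and demo_pageio_init and size_t_is_unsigned_int are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Sanity check of the premise: size_t is at least as wide as unsigned int. */
static_assert(sizeof(size_t) >= sizeof(unsigned int),
	      "size_t narrower than unsigned int");

/* Hypothetical declaration, as it might appear in a header. */
size_t demo_pageio_init(size_t bsize);

/*
 * A definition spelled with a different parameter type, such as
 *
 *	size_t demo_pageio_init(unsigned int bsize) { return bsize; }
 *
 * compiles where size_t is unsigned int and is rejected with
 * "conflicting types" where size_t is a wider type. The definition
 * below matches the declaration, so it builds on both.
 */
size_t demo_pageio_init(size_t bsize)
{
	return bsize;
}

/* Returns 1 when size_t and unsigned int are the same type on this target. */
static int size_t_is_unsigned_int(void)
{
	return _Generic((size_t)0, unsigned int: 1, default: 0);
}
```

Building such code for both a 32-bit and a 64-bit target before sending it out is one way to catch this class of error early.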
* Re: [bisect] NFS regression breaks X
2007-05-09 22:03 ` Trond Myklebust
@ 2007-05-09 22:52 ` Andrew Morton
2007-05-10 0:10 ` Trond Myklebust
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2007-05-09 22:52 UTC
To: Trond Myklebust
Cc: Jeff Garzik, Chuck Lever, Linus Torvalds,
Linux Kernel Mailing List, NeilBrown, Adrian Bunk
On Wed, 09 May 2007 18:03:26 -0400
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> On Wed, 2007-05-09 at 17:30 -0400, Jeff Garzik wrote:
> > Original bug report, with hardware and software info:
> > http://lkml.org/lkml/2007/5/8/667
> >
> > I love bisect :) bisect has identified the following commit as the one
> > that causes my GNOME login to die, within 10 seconds of logging in:
> >
> > commit 2bea90d43a050bbc4021d44e59beb34f384438db
> > Author: Chuck Lever <chuck.lever@oracle.com>
> > Date: Thu Mar 29 16:47:53 2007 -0400
> >
> > SUNRPC: RPC buffer size estimates are too large
> >
> > 100% reproducible, verified regression. My home directory is an NFSv4
> > mount, and the problem appears on my client workstation, so this makes
> > some sense:
> > > sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> > > pretzel:/ on /g type nfs4 (rw,noatime,proto=tcp,addr=10.10.10.1)
>
> Known issue.
It's a bit rough that Jeff spent a large amount of time hunting down an
already-known bug. That's normally my job :(
This five-week-old diff only ever appeared in 2.6.21-mm1, which was
released four days ago. It was then whizzed into mainline. We thus lost
five weeks public testing which would probably have saved Jeff his pain.
What went wrong?
* Re: [bisect] NFS regression breaks X
2007-05-09 22:52 ` Andrew Morton
@ 2007-05-10 0:10 ` Trond Myklebust
2007-05-10 13:36 ` Chuck Lever
2007-05-11 22:54 ` Jeff Garzik
0 siblings, 2 replies; 9+ messages in thread
From: Trond Myklebust @ 2007-05-10 0:10 UTC
To: Andrew Morton
Cc: Jeff Garzik, Chuck Lever, Linus Torvalds,
Linux Kernel Mailing List, NeilBrown, Adrian Bunk
On Wed, 2007-05-09 at 15:52 -0700, Andrew Morton wrote:
> It's a bit rough that Jeff spent a large amount of time hunting down an
> already-known bug. That's normally my job :(
The bug was reported by Florin Iucha (on lkml!) on Saturday. It has only
just been debugged, and I was in fact in the middle of marshalling the
fixes.
> This five-week-old diff only ever appeared in 2.6.21-mm1, which was
> released four days ago. It was then whizzed into mainline. We thus lost
> five weeks public testing which would probably have saved Jeff his pain.
>
> What went wrong?
Probably my fault. I've had a couple of weeks of heavy travel due to
various circumstances that were beyond my control, and so I had little
time in which to test the stuff and push it out.
Another factor that is affecting us is the slow but gradual collapse of
the OSDL NFSv4 regression testing effort.
Trond
* Re: [bisect] NFS regression breaks X
2007-05-10 0:10 ` Trond Myklebust
@ 2007-05-10 13:36 ` Chuck Lever
2007-05-11 22:54 ` Jeff Garzik
1 sibling, 0 replies; 9+ messages in thread
From: Chuck Lever @ 2007-05-10 13:36 UTC
To: Andrew Morton
Cc: Trond Myklebust, Jeff Garzik, Linus Torvalds,
Linux Kernel Mailing List, NeilBrown, Adrian Bunk
Trond Myklebust wrote:
> On Wed, 2007-05-09 at 15:52 -0700, Andrew Morton wrote:
>> It's a bit rough that Jeff spent a large amount of time hunting down an
>> already-known bug. That's normally my job :(
>
> The bug was reported by Florin Iucha (on lkml!) on Saturday. It has only
> just been debugged, and I was in fact in the middle of marshalling the
> fixes.
>
>> This five-week-old diff only ever appeared in 2.6.21-mm1, which was
>> released four days ago. It was then whizzed into mainline. We thus lost
>> five weeks public testing which would probably have saved Jeff his pain.
>>
>> What went wrong?
>
> Probably my fault. I've had a couple of weeks of heavy travel due to
> various circumstances that were beyond my control, and so I had little
> time in which to test the stuff and push it out.
>
> Another factor that is affecting us is the slow but gradual collapse of
> the OSDL NFSv4 regression testing effort.
I had expected that many of these issues would be caught by the OSDL
test harness. I learned only yesterday that it was no longer available,
so I am making an effort to broaden my personal test regime.
* Re: [bisect] NFS regression breaks X
2007-05-10 0:10 ` Trond Myklebust
2007-05-10 13:36 ` Chuck Lever
@ 2007-05-11 22:54 ` Jeff Garzik
1 sibling, 0 replies; 9+ messages in thread
From: Jeff Garzik @ 2007-05-11 22:54 UTC
To: Trond Myklebust, Andrew Morton
Cc: Chuck Lever, Linus Torvalds, Linux Kernel Mailing List, NeilBrown,
Adrian Bunk, Michal Piotrowski
ACK -- this regression was fixed by Trond's recent NFS bugfix push upstream.
Jeff