* [PATCH] NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!"
@ 2011-01-24 20:50 Chuck Lever
2011-01-24 21:10 ` Nick Bowler
2011-01-24 22:40 ` J. Bruce Fields
0 siblings, 2 replies; 3+ messages in thread
From: Chuck Lever @ 2011-01-24 20:50 UTC (permalink / raw)
To: nbowler, trond.myklebust; +Cc: bfields, linux-nfs, linux-kernel
Nick Bowler <nbowler@elliptictech.com> reports:
> We were just having some NFS server troubles, and my client machine
> running 2.6.38-rc1+ (specifically, commit 2b1caf6ed7b888c95) crashed
> hard (syslog output appended to this mail).
>
> I'm not sure what the exact timeline was or how to reproduce this,
> but the server was rebooted during all this. Since I've never seen
> this happen before, it is possibly a regression from previous kernel
> releases. However, I recently updated my nfs-utils (on the client) to
> version 1.2.3, so that might be related as well.
[ BUG output redacted ]
When done searching, the for_each_host loop in next_host_state() falls
through and returns the final host on the host chain without bumping
it's reference count.
Since the host's ref count is only one at that point, releasing the
host in nlm_host_rebooted() attempts to destroy the host prematurely,
and therefore hits a BUG().
Likely, the original intent of the for_each_host behavior in
next_host_state() was to handle the case when the host chain is empty.
Searching the chain and finding no suitable host to return needs to be
handled as well.
Defensively restructure next_host_state() always to return NULL when
the loop falls through.
Introduced by commit b10e30f6 "lockd: reorganize nlm_host_rebooted".
Cc: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
I was able to reproduce this BUG on my client running 2.6.38-rc2
from earlier today. Here is a proposed fix. I can't reproduce the
BUG with this patch applied.
However, my reproducer hit the BUG in nlmsvc_release_host(). Nick hit
roughly the same BUG in nlmclnt_release_host(), which suggests his
"client" was also acting as a server. The symptoms are similar enough
that I believe this patch should be sufficient to address both cases.
fs/lockd/host.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index 5f1bcb2..b7c99bf 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -520,7 +520,7 @@ static struct nlm_host *next_host_state(struct hlist_head *cache,
struct nsm_handle *nsm,
const struct nlm_reboot *info)
{
- struct nlm_host *host = NULL;
+ struct nlm_host *host;
struct hlist_head *chain;
struct hlist_node *pos;
@@ -532,12 +532,13 @@ static struct nlm_host *next_host_state(struct hlist_head *cache,
host->h_state++;
nlm_get_host(host);
- goto out;
+ mutex_unlock(&nlm_host_mutex);
+ return host;
}
}
-out:
+
mutex_unlock(&nlm_host_mutex);
- return host;
+ return NULL;
}
/**
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!"
2011-01-24 20:50 [PATCH] NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!" Chuck Lever
@ 2011-01-24 21:10 ` Nick Bowler
2011-01-24 22:40 ` J. Bruce Fields
1 sibling, 0 replies; 3+ messages in thread
From: Nick Bowler @ 2011-01-24 21:10 UTC (permalink / raw)
To: Chuck Lever; +Cc: trond.myklebust, bfields, linux-nfs, linux-kernel
On 2011-01-24 15:50 -0500, Chuck Lever wrote:
> Nick Bowler <nbowler@elliptictech.com> reports:
>
> > We were just having some NFS server troubles, and my client machine
> > running 2.6.38-rc1+ (specifically, commit 2b1caf6ed7b888c95) crashed
> > hard (syslog output appended to this mail).
[...]
> I was able to reproduce this BUG on my client running 2.6.38-rc2
> from earlier today. Here is a proposed fix. I can't reproduce the
> BUG with this patch applied.
>
> However, my reproducer hit the BUG in nlmsvc_release_host(). Nick hit
> roughly the same BUG in nlmclnt_release_host(), which suggests his
> "client" was also acting as a server.
Yeah, there are NFS exports on the client machine as well; although
as nobody had them mounted (and due to the timing of the crash) I hadn't
considered them.
> The symptoms are similar enough that I believe this patch should be
> sufficient to address both cases.
I'll try the patch, but since I don't know how to reproduce the failure
(rebooting the server just to see if it crashes again isn't an option),
I'll have to take your word for it.
Thanks,
--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!"
2011-01-24 20:50 [PATCH] NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!" Chuck Lever
2011-01-24 21:10 ` Nick Bowler
@ 2011-01-24 22:40 ` J. Bruce Fields
1 sibling, 0 replies; 3+ messages in thread
From: J. Bruce Fields @ 2011-01-24 22:40 UTC (permalink / raw)
To: Chuck Lever; +Cc: nbowler, trond.myklebust, linux-nfs, linux-kernel
On Mon, Jan 24, 2011 at 03:50:26PM -0500, Chuck Lever wrote:
> Nick Bowler <nbowler@elliptictech.com> reports:
>
> > We were just having some NFS server troubles, and my client machine
> > running 2.6.38-rc1+ (specifically, commit 2b1caf6ed7b888c95) crashed
> > hard (syslog output appended to this mail).
> >
> > I'm not sure what the exact timeline was or how to reproduce this,
> > but the server was rebooted during all this. Since I've never seen
> > this happen before, it is possibly a regression from previous kernel
> > releases. However, I recently updated my nfs-utils (on the client) to
> > version 1.2.3, so that might be related as well.
>
> [ BUG output redacted ]
>
> When done searching, the for_each_host loop in next_host_state() falls
> through and returns the final host on the host chain without bumping
> it's reference count.
>
> Since the host's ref count is only one at that point, releasing the
> host in nlm_host_rebooted() attempts to destroy the host prematurely,
> and therefore hits a BUG().
>
> Likely, the original intent of the for_each_host behavior in
> next_host_state() was to handle the case when the host chain is empty.
> Searching the chain and finding no suitable host to return needs to be
> handled as well.
>
> Defensively restructure next_host_state() always to return NULL when
> the loop falls through.
>
> Introduced by commit b10e30f6 "lockd: reorganize nlm_host_rebooted".
Whoops, thanks for finding that. The fix looks right to me.
--b.
>
> Cc: J. Bruce Fields <bfields@fieldses.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>
> I was able to reproduce this BUG on my client running 2.6.38-rc2
> from earlier today. Here is a proposed fix. I can't reproduce the
> BUG with this patch applied.
>
> However, my reproducer hit the BUG in nlmsvc_release_host(). Nick hit
> roughly the same BUG in nlmclnt_release_host(), which suggests his
> "client" was also acting as a server. The symptoms are similar enough
> that I believe this patch should be sufficient to address both cases.
>
>
> fs/lockd/host.c | 9 +++++----
> 1 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/lockd/host.c b/fs/lockd/host.c
> index 5f1bcb2..b7c99bf 100644
> --- a/fs/lockd/host.c
> +++ b/fs/lockd/host.c
> @@ -520,7 +520,7 @@ static struct nlm_host *next_host_state(struct hlist_head *cache,
> struct nsm_handle *nsm,
> const struct nlm_reboot *info)
> {
> - struct nlm_host *host = NULL;
> + struct nlm_host *host;
> struct hlist_head *chain;
> struct hlist_node *pos;
>
> @@ -532,12 +532,13 @@ static struct nlm_host *next_host_state(struct hlist_head *cache,
> host->h_state++;
>
> nlm_get_host(host);
> - goto out;
> + mutex_unlock(&nlm_host_mutex);
> + return host;
> }
> }
> -out:
> +
> mutex_unlock(&nlm_host_mutex);
> - return host;
> + return NULL;
> }
>
> /**
>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2011-01-24 22:40 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-24 20:50 [PATCH] NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!" Chuck Lever
2011-01-24 21:10 ` Nick Bowler
2011-01-24 22:40 ` J. Bruce Fields
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).