From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: "Stephen R. van den Berg" <srb-PCMv+cxZuL0@public.gmane.org>
Cc: Andrew Morton <akpm@linux-foundation.org>, linux-nfs@vger.kernel.org
Subject: Re: Fw: Deadlock regression in v2.6.31.6
Date: Wed, 25 Nov 2009 09:31:52 -0500
Message-ID: <1259159512.3314.12.camel@localhost>
In-Reply-To: <64b4daae0911250056g3364d24l98850a272dcfe483-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, 2009-11-25 at 09:56 +0100, Stephen R. van den Berg wrote:
> > The problem vanishes as soon as I run v2.6.31.5 (neither kernel contains
> > any significant modules).
>
> I did a bisect, and it turns out that the problem is there in 2.6.31.5 as well.

This makes sense. There have been no RPC level changes between 2.6.31.5
and 2.6.31.6.

> The traces are still valid. This is on an NFS mounted root partition
> (NFSv3 over TCP), no other filesystems mounted (except a tmpfs here or
> there). I turned on some debugging in net/sunrpc/sched.c, and the
> following happens when I execute "apt-get --reinstall install man-db"
> (it happens every time, so it is very reproducible):
>
> RPC: 9697 setting alarm for 60000 ms
> RPC: 9697 __rpc_wake_up_task (now 7827)
> RPC: 9697 disabling timer
> RPC: 9697 removed from queue cfa72d88 "xprt_pending"
> RPC: __rpc_wake_up_task done
> RPC: 9697 __rpc_execute flags=0x1 cf849c44
> RPC: 9697 sleep_on(queue "xprt_pending" time 7828)
> RPC: 9697 added to queue cfa72d88 "xprt_pending"
> RPC: 9697 setting alarm for 60000 ms
> RPC: 9697 __rpc_wake_up_task (now 7830)
> RPC: 9697 disabling timer
> RPC: 9697 removed from queue cfa72d88 "xprt_pending"
> RPC: __rpc_wake_up_task done
> RPC: 9697 __rpc_execute flags=0x1 cf849c44
> RPC: 9697 sleep_on(queue "xprt_pending" time 7831)
> RPC: 9697 added to queue cfa72d88 "xprt_pending"
> RPC: 9697 setting alarm for 60000 ms
> RPC: 9697 __rpc_wake_up_task (now 7833)
> RPC: 9697 disabling timer
> RPC: 9697 removed from queue cfa72d88 "xprt_pending"
> RPC: __rpc_wake_up_task done
> RPC: 9697 __rpc_execute flags=0x1 cf849c44
> RPC: 9697 sleep_on(queue "xprt_pending" time 7835)
> RPC: 9697 added to queue cfa72d88 "xprt_pending"
> RPC: 9697 setting alarm for 60000 ms
> RPC: 9697 __rpc_wake_up_task (now 7836)
> RPC: 9697 disabling timer
> RPC: 9697 removed from queue cfa72d88 "xprt_pending"
> RPC: __rpc_wake_up_task done
> RPC: 9697 __rpc_execute flags=0x1 cf849c44
> RPC: 9697 sleep_on(queue "xprt_pending" time 7838)
> RPC: 9697 added to queue cfa72d88 "xprt_pending"
> RPC: 9697 setting alarm for 60000 ms
> RPC: 9697 __rpc_wake_up_task (now 7839)
> RPC: 9697 disabling timer
> RPC: 9697 removed from queue cfa72d88 "xprt_pending"
> RPC: __rpc_wake_up_task done
> RPC: 9697 __rpc_execute flags=0x1 cf849c44
> RPC: 9697 sleep_on(queue "xprt_pending" time 7841)
> RPC: 9697 added to queue cfa72d88 "xprt_pending"
> RPC: 9697 setting alarm for 60000 ms
> RPC: 9697 __rpc_wake_up_task (now 7842)
> RPC: 9697 disabling timer
> RPC: 9697 removed from queue cfa72d88 "xprt_pending"
> RPC: __rpc_wake_up_task done
> RPC: 9697 __rpc_execute flags=0x1 cf849c44
> RPC: 9697 sleep_on(queue "xprt_pending" time 7844)
> RPC: 9697 added to queue cfa72d88 "xprt_pending"
> RPC: 9697 setting alarm for 60000 ms
> RPC: 9697 __rpc_wake_up_task (now 7845)
> RPC: 9697 disabling timer
> RPC: 9697 removed from queue cfa72d88 "xprt_pending"
> RPC: __rpc_wake_up_task done
>
> Ad infinitum.
> The cf849c44 is the task parameter which I printed as well.
> It looks like an endless loop in the state machine.
> The kernel hangs at this point; the only way to get out of there is
> using SysBreak.
> I tried debugging it further, but I got lost in the state machine (I think).

This just means that the RPC client is waiting for a reply from the NFS
server.
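
For anyone reading the trace above, the sleep/wake cadence of a task can be
tabulated with a short script. This is only a rough sketch: the two regular
expressions are modelled on the debug lines quoted in this message, the
function name sleep_durations is made up for illustration, and the "time" /
"now" values are assumed to be a tick counter such as jiffies.

```python
import re

# Patterns modelled on the debug lines quoted in this thread (an assumption
# about the exact format; adjust if your debug output differs).
SLEEP_RE = re.compile(r'RPC:\s+(\d+) sleep_on\(queue "([^"]+)" time (\d+)\)')
WAKE_RE = re.compile(r'RPC:\s+(\d+) __rpc_wake_up_task \(now (\d+)\)')


def sleep_durations(log_text):
    """Return (task_id, ticks_asleep) for each sleep/wake pair in the log."""
    pending = {}    # task id -> timestamp at which it went to sleep
    durations = []
    for line in log_text.splitlines():
        m = SLEEP_RE.search(line)
        if m:
            pending[m.group(1)] = int(m.group(3))
            continue
        m = WAKE_RE.search(line)
        if m and m.group(1) in pending:
            woke_at = int(m.group(2))
            durations.append((m.group(1), woke_at - pending.pop(m.group(1))))
    return durations


if __name__ == "__main__":
    import sys
    # Usage: pipe the kernel log (e.g. dmesg output) into this script.
    for task, ticks in sleep_durations(sys.stdin.read()):
        print(f"task {task}: woken after {ticks} ticks")
```

On the trace quoted here this reports wakeups only a tick or two after each
sleep_on, even though the alarm is set for 60000 ms each time.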
Does 'netstat -t' show that there is an active TCP connection to the
server's nfs port?

Does wireshark show that the client should have received a reply?

Trond
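
The 'netstat -t' check asked about above can also be done by reading
/proc/net/tcp directly, which is handy on a minimal NFS-root system. The
following is a minimal sketch, not a definitive tool: the function name
nfs_connections is hypothetical, it assumes the server listens on the
standard NFS port 2049, it only looks at IPv4 sockets, and the address
decoding assumes a little-endian host such as x86.

```python
import socket
import struct

NFS_PORT = 2049          # standard NFS port (assumed; the server may differ)
TCP_ESTABLISHED = 0x01   # state code for ESTABLISHED in /proc/net/tcp


def nfs_connections(proc_net_tcp_text, port=NFS_PORT):
    """Return (remote_ip, is_established) for sockets whose remote port matches."""
    results = []
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        addr_hex, port_hex = fields[2].split(":")     # rem_address column
        if int(port_hex, 16) != port:
            continue
        # The address is printed as a host-order 32-bit hex value; unpacking
        # it as little-endian is correct on x86 and similar hosts only.
        ip = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
        results.append((ip, int(fields[3], 16) == TCP_ESTABLISHED))
    return results


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        for ip, established in nfs_connections(f.read()):
            state = "ESTABLISHED" if established else "not established"
            print(f"{ip}:{NFS_PORT} {state}")
```

An empty result, or only non-established entries, would suggest the client
has no live TCP connection to the server's nfs port.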