public inbox for linux-nfs@vger.kernel.org
From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: "Stephen R. van den Berg" <srb-PCMv+cxZuL0@public.gmane.org>
Cc: Andrew Morton <akpm@linux-foundation.org>, linux-nfs@vger.kernel.org
Subject: Re: Fw: Deadlock regression in v2.6.31.6
Date: Wed, 25 Nov 2009 09:31:52 -0500
Message-ID: <1259159512.3314.12.camel@localhost>
In-Reply-To: <64b4daae0911250056g3364d24l98850a272dcfe483-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, 2009-11-25 at 09:56 +0100, Stephen R. van den Berg wrote: 
> > The problem vanishes as soon as I run v2.6.31.5 (neither kernel contains
> > any significant modules).
> 
> I did a bisect, and it turns out that the problem is there in 2.6.31.5 as well.

This makes sense. There have been no RPC-level changes between 2.6.31.5
and 2.6.31.6.

> The traces are still valid.  This is on an NFS mounted root partition
> (NFSv3 over TCP), no other filesystems mounted (except a tmpfs here or
> there).  I turned on some debugging in net/sunrpc/sched.c, and the
> following happens when I execute "apt-get --reinstall install man-db"
> (it happens every time, so it is very reproducible):
> 
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7827)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7828)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7830)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7831)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7833)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7835)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7836)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7838)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7839)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7841)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7842)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7844)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7845)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> 
> Ad infinitum.
> The cf849c44 is the task parameter which I printed as well.
> It looks like an endless loop in the state machine.
> The kernel hangs at this point; the only way to get out of there is
> using SysBreak.
> I tried debugging it further, but I got lost in the state machine (I think).

This just means that the RPC client is waiting for a reply from the NFS
server.
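
For anyone who wants to summarize a trace like the one quoted above instead of reading it line by line, here is a rough sketch. It assumes the debug lines have exactly the format shown in the quoted log; the function name summarize_rpc_trace is made up for illustration and is not part of any kernel tooling. It only counts wake-ups and sleeps per task and does not diagnose anything on its own.

```python
import re
from collections import Counter

# Patterns assume the sunrpc debug format shown in the quoted trace.
WAKE = re.compile(r"RPC:\s+(\d+) __rpc_wake_up_task \(now (\d+)\)")
SLEEP = re.compile(r'RPC:\s+(\d+) sleep_on\(queue "([^"]+)" time (\d+)\)')


def summarize_rpc_trace(text):
    """Count wake-ups per task id and sleeps per (task id, queue name)."""
    wakes = Counter()
    sleeps = Counter()
    for line in text.splitlines():
        match = WAKE.search(line)
        if match:
            wakes[int(match.group(1))] += 1
            continue
        match = SLEEP.search(line)
        if match:
            sleeps[(int(match.group(1)), match.group(2))] += 1
    return wakes, sleeps
```

A task whose wake-up and sleep counts on "xprt_pending" keep growing together is going round the same wait cycle as task 9697 in the quoted log.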

Does 'netstat -t' show that there is an active TCP connection to the
server's NFS port?
Does Wireshark show that the client should have received a reply?
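
If netstat is not available on the NFS-root client, the same check can be approximated by reading /proc/net/tcp. The following is only a sketch under a few assumptions: the server listens on the standard NFS port 2049, the connection is IPv4, and the file comes from a little-endian machine (the addresses are hex dumps in host byte order). The helper name established_to_port is hypothetical.

```python
import socket
import struct

NFS_PORT = 2049          # assumed: standard NFS port
TCP_ESTABLISHED = 0x01   # state code used in /proc/net/tcp


def _decode(addr):
    """Turn 'HEXIP:HEXPORT' into (dotted_ip, port); assumes little-endian dump."""
    ip_hex, port_hex = addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def established_to_port(proc_net_tcp_text, port=NFS_PORT):
    """Return (remote_ip, remote_port) for ESTABLISHED sockets to the given port."""
    peers = []
    # The first line of /proc/net/tcp is a column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        remote_ip, remote_port = _decode(fields[2])
        state = int(fields[3], 16)
        if state == TCP_ESTABLISHED and remote_port == port:
            peers.append((remote_ip, remote_port))
    return peers
```

On a live client this would be called as established_to_port(open("/proc/net/tcp").read()); an empty list suggests there is no established connection to the server's NFS port.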

Trond

