From: David Flynn <davidf@rd.bbc.co.uk>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David Flynn <davidf@rd.bbc.co.uk>,
	linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: NFS4 BAD_STATEID loop (kernel 3.0.4)
Date: Mon, 24 Oct 2011 14:50:27 +0000
Message-ID: <20111024145027.GF32587@rd.bbc.co.uk>
In-Reply-To: <1319463165.2734.1.camel@lade.trondhjem.org>

* Chuck Lever (chuck.lever@oracle.com) wrote:
> Can you tell us a little more about the server?  Which release of
> Solaris?  What hardware?

SunOS 5.10 Generic_141444-09
(sparc)

* Trond Myklebust (Trond.Myklebust@netapp.com) wrote:
> I'm assuming then that your network trace showed no sign of any OPEN
> calls of that particular file, just retries of the WRITE?

Correct.

However, the good news is that it has just happened again (and it is
certainly not quota related).

The blocked task:
[179068.773206] INFO: task bash:3293 blocked for more than 120 seconds.
[179068.779660] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[179068.787701] bash            D 0000000000000004     0  3293      1 0x00000000
[179068.795173]  ffff88001f97fca8 0000000000000086 ffff880426876008 0000000000012a40
[179068.802992]  ffff88001f97ffd8 0000000000012a40 ffff88001f97e000 0000000000012a40
[179068.810745]  0000000000012a40 0000000000012a40 ffff88001f97ffd8 0000000000012a40
[179068.818810] Call Trace:
[179068.821496]  [<ffffffff81110030>] ? __lock_page+0x70/0x70
[179068.827204]  [<ffffffff8160007c>] io_schedule+0x8c/0xd0
[179068.832952]  [<ffffffff8111003e>] sleep_on_page+0xe/0x20
[179068.838823]  [<ffffffff816008ff>] __wait_on_bit+0x5f/0x90
[179068.844734]  [<ffffffff81110203>] wait_on_page_bit+0x73/0x80
[179068.850798]  [<ffffffff81085bf0>] ? autoremove_wake_function+0x40/0x40
[179068.857879]  [<ffffffff8111c5e5>] ? pagevec_lookup_tag+0x25/0x40
[179068.864173]  [<ffffffff81110436>] filemap_fdatawait_range+0xf6/0x1a0
[179068.870721]  [<ffffffffa02167d0>] ? nfs_destroy_directcache+0x20/0x20 [nfs]
[179068.877963]  [<ffffffff8111bae1>] ? do_writepages+0x21/0x40
[179068.883744]  [<ffffffff811116bb>] ? __filemap_fdatawrite_range+0x5b/0x60
[179068.890867]  [<ffffffff81111730>] filemap_write_and_wait_range+0x70/0x80
[179068.898025]  [<ffffffff8119cc6a>] vfs_fsync_range+0x5a/0x90
[179068.904197]  [<ffffffff8119cd0c>] vfs_fsync+0x1c/0x20
[179068.909721]  [<ffffffffa020ac74>] nfs_file_flush+0x54/0x80 [nfs]
[179068.916069]  [<ffffffff8116ee7f>] filp_close+0x3f/0x90
[179068.921611]  [<ffffffff8116f8a7>] sys_close+0xb7/0x120
[179068.927328]  [<ffffffff8160a702>] system_call_fastpath+0x16/0x1b

$ echo 0 >/proc/sys/sunrpc/rpc_debug
[180179.009328] -pid- flgs status -client- --rqstp- -timeout ---ops--
[180179.015540] 40304 0801      0 ffff8804241ae800   (null)        0 ffffffffa023cd40 nfsv4 WRITE a:call_start q:NFS client

And our ping-pong (more details at the end):
14:07:07.307191 IP vc-fs1.rd.bbc.co.uk.1837702678 > home.rd.bbc.co.uk.nfs: 300 getattr fh 0,0/22
14:07:07.307471 IP home.rd.bbc.co.uk.nfs > vc-fs1.rd.bbc.co.uk.1837702678: reply ok 52 getattr ERROR: unk 10025
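
tcpdump prints the reply status above as "unk 10025" rather than a name.
That is the same code the Wireshark decode further down labels
NFS4ERR_BAD_STATEID. A minimal sketch of the lookup follows; the helper
name decode_tcpdump_status and the small table are illustrative only (the
values are taken from RFC 3530 and are not part of tcpdump or any other
tool):

```python
import re

# Subset of NFSv4 status codes (values as defined in RFC 3530).
# Only the state-related codes relevant to this kind of loop are listed.
NFS4_STATUS = {
    0: "NFS4_OK",
    10011: "NFS4ERR_EXPIRED",
    10022: "NFS4ERR_STALE_CLIENTID",
    10023: "NFS4ERR_STALE_STATEID",
    10024: "NFS4ERR_OLD_STATEID",
    10025: "NFS4ERR_BAD_STATEID",
    10026: "NFS4ERR_BAD_SEQID",
}

# tcpdump's fallback format for a status it cannot name.
UNKNOWN_STATUS = re.compile(r"ERROR: unk (\d+)")


def decode_tcpdump_status(line):
    """Return the NFSv4 status name for an 'ERROR: unk N' line, or None."""
    match = UNKNOWN_STATUS.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return NFS4_STATUS.get(code, "unknown status %d" % code)


reply = ("14:07:07.307471 IP home.rd.bbc.co.uk.nfs > "
         "vc-fs1.rd.bbc.co.uk.1837702678: reply ok 52 getattr ERROR: unk 10025")
print(decode_tcpdump_status(reply))
```

Run over a longer capture, the same lookup makes it easy to see whether
every reply in the loop carries the same status.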

This system is up at the moment; if you require further detail, I can
provide it.

NB: the system this occurred on is running kernel 3.0.4.
Mount options are as per earlier.

Kind regards,

..david

No.     Time            Source                Destination           Protocol Size  Info
     39 15:33:59.077143 172.29.190.28         172.29.120.140        NFS      370   V4 COMPOUND Call (Reply In 40) <EMPTY> PUTFH;WRITE;GETATTR

Frame 39: 370 bytes on wire (2960 bits), 370 bytes captured (2960 bits)
Ethernet II, Src: ChelsioC_07:49:6f (00:07:43:07:49:6f), Dst: All-HSRP-routers_be (00:00:0c:07:ac:be)
Internet Protocol, Src: 172.29.190.28 (172.29.190.28), Dst: 172.29.120.140 (172.29.120.140)
Transmission Control Protocol, Src Port: omginitialrefs (900), Dst Port: nfs (2049), Seq: 40433, Ack: 7449, Len: 304
Remote Procedure Call, Type:Call XID:0x43ce4e16
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 0
    Operations (count: 3)
        Opcode: PUTFH (22)
            filehandle
                length: 36
                [hash (CRC-32): 0x6e4b15f3]
                decode type as: unknown
                filehandle: 7df3a75d5e1cd908000ab44c5b000000efc80200000a0300...
        Opcode: WRITE (38)
            stateid
            offset: 11474
            stable: FILE_SYNC4 (2)
            Write length: 68
            Data: <DATA>
                length: 68
                contents: <DATA>
        Opcode: GETATTR (9)
            GETATTR4args
                attr_request
                    bitmap[0] = 0x00000018
                        [2 attributes requested]
                        mand_attr: FATTR4_CHANGE (3)
                        mand_attr: FATTR4_SIZE (4)
                    bitmap[1] = 0x00300000
                        [2 attributes requested]
                        recc_attr: FATTR4_TIME_METADATA (52)
                        recc_attr: FATTR4_TIME_MODIFY (53)
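
The GETATTR request in frame 39 carries its attribute list as 32-bit
bitmap words: bit b of word i selects attribute number 32*i + b. The
sketch below shows how the two words above expand to the four attributes
Wireshark lists. The function name decode_attr_bitmap is hypothetical,
and the name table holds only the attributes that appear in this decode:

```python
# Attribute numbers and names exactly as shown in the frame 39 decode.
ATTR_NAMES = {
    3: "FATTR4_CHANGE",
    4: "FATTR4_SIZE",
    52: "FATTR4_TIME_METADATA",
    53: "FATTR4_TIME_MODIFY",
}


def decode_attr_bitmap(words):
    """Expand a list of 32-bit bitmap words into ascending attribute numbers."""
    attrs = []
    for index, word in enumerate(words):
        for bit in range(32):
            if word & (1 << bit):
                # Bit `bit` of word `index` is attribute 32*index + bit.
                attrs.append(32 * index + bit)
    return attrs


requested = decode_attr_bitmap([0x00000018, 0x00300000])
for number in requested:
    print(number, ATTR_NAMES.get(number, "(not in this table)"))
```

0x00000018 has bits 3 and 4 set, and 0x00300000 has bits 20 and 21 set,
which map to attributes 52 and 53 in the second word.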

No.     Time            Source                Destination           Protocol Size  Info
     40 15:33:59.077433 172.29.120.140        172.29.190.28         NFS      122   V4 COMPOUND Reply (Call In 39) <EMPTY> PUTFH;WRITE

Frame 40: 122 bytes on wire (976 bits), 122 bytes captured (976 bits)
Ethernet II, Src: Cisco_1e:f7:80 (00:13:5f:1e:f7:80), Dst: ChelsioC_07:49:6f (00:07:43:07:49:6f)
Internet Protocol, Src: 172.29.120.140 (172.29.120.140), Dst: 172.29.190.28 (172.29.190.28)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: omginitialrefs (900), Seq: 7449, Ack: 40737, Len: 56
Remote Procedure Call, Type:Reply XID:0x43ce4e16
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4ERR_BAD_STATEID (10025)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 2)
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: WRITE (38)
            Status: NFS4ERR_BAD_STATEID (10025)

