public inbox for linux-nfs@vger.kernel.org
From: Jiufei Xue <jiufei.xue@linux.alibaba.com>
To: Olga Kornievskaia <aglo@umich.edu>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
	Anna Schumaker <Anna.Schumaker@netapp.com>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	Joseph Qi <joseph.qi@linux.alibaba.com>
Subject: Re: [bug report] task hang while testing xfstests generic/323
Date: Fri, 1 Mar 2019 16:49:55 +0800	[thread overview]
Message-ID: <57796366-0f08-2cfb-6f85-27d5485c07af@linux.alibaba.com> (raw)
In-Reply-To: <3b6597b0-41b4-255a-5ebb-cf34ad95315c@linux.alibaba.com>



On 2019/3/1 1:08 PM, Jiufei Xue wrote:
> Hi Olga,
> 
>> On 2019/3/1 6:26 AM, Olga Kornievskaia wrote:
>> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <jiufei.xue@linux.alibaba.com> wrote:
>>>
>>> Hi,
>>>
>>> when I tested xfstests generic/323 with NFSv4.1 and v4.2, the task
>>> occasionally became a zombie while one of its threads hung with the
>>> following stack:
>>>
>>> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>>> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
>>> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
>>> [<0>] nfs_file_release+0x35/0x50 [nfs]
>>> [<0>] __fput+0xa2/0x1c0
>>> [<0>] task_work_run+0x82/0xa0
>>> [<0>] do_exit+0x2ac/0xc20
>>> [<0>] do_group_exit+0x39/0xa0
>>> [<0>] get_signal+0x1ce/0x5d0
>>> [<0>] do_signal+0x36/0x620
>>> [<0>] exit_to_usermode_loop+0x5e/0xc2
>>> [<0>] do_syscall_64+0x16c/0x190
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [<0>] 0xffffffffffffffff
>>>
>>> Since commit 12f275cdd163 ("NFSv4: Retry CLOSE and DELEGRETURN on
>>> NFS4ERR_OLD_STATEID"), the client retries the CLOSE when the stateid
>>> generation number on the client is lower than the one on the server.
>>>
>>> The original intention of this commit was to retry the operation while
>>> racing with an OPEN. However, in this case the stateid generations
>>> remain mismatched forever.
>>>
>>> Any suggestions?
>>
>> Can you include a network trace of the failure? Is it possible that
>> the server has crashed on reply to the close and that's why the task
>> is hung? What server are you testing against?
>>
> I am testing the local server, the testcase is attached below.
> 
> # uname -r
> 5.0.0-rc8
> 
> # cat /etc/exports
> /mnt/vdd *(rw,no_subtree_check,no_root_squash)
> 
> # cat test.sh
> mount /dev/vdd /mnt/vdd
> systemctl start nfs
> mount -t nfs -o rw,relatime,vers=4.1 127.0.0.1:/mnt/vdd /mnt/nfsscratch
> 
> for ((i=0;i<10000;i++))
> do
>         echo "loop $i started"
>         /usr/local/xfstests/src/aio-dio-regress/aio-last-ref-held-by-io 1 100 /mnt/nfsscratch/aio-testfile
> 	sleep 5
> done
> 
> The command aio-last-ref-held-by-io from xfstests spawns 100 threads. Each
> thread repeatedly does open/io_submit/close/io_getevents/io_destroy until
> the parent exits because of a timeout.
> 
> 
> Here is the log when the task hung.
> 
> [10841.121523] nfs4_close_prepare: begin!
> [10841.121526] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=10
> [10841.121528] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [10841.121530] nfs4_close_prepare: done!
> [10841.121536] encode_sequence: sessionid=1551405749:3822331530:1:0 seqid=59495845 slotid=0 max_slotid=0 cache_this=1
> [10841.121583] nfsd_dispatch: vers 4 proc 1
> [10841.121588] __find_in_sessionid_hashtbl: 1551405749:3822331530:1:0
> [10841.121590] nfsd4_sequence: slotid 0
> [10841.121592] check_slot_seqid enter. seqid 59495845 slot_seqid 59495844
> [10841.121596] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9 61e12d9e c03beba6 0000000b)
> [10841.121607] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9 61e12d9e c03beba6 0000000b)
> [10841.121613] NFSD: nfsd4_close on file aio-testfile
> [10841.121615] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid = (5c7892b5/e3d4268a/000ab905/0000746b)
> [10841.121619] nfsv4 compound returned 10024
> [10841.121621] --> nfsd4_store_cache_entry slot 00000000f540ddd9
> [10841.121625] renewing client (clientid 5c7892b5/e3d4268a)
> [10841.122526] decode_attr_type: type=00
> ......
> [10841.122558] nfs4_close_done: begin!
> [10841.122561] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=10
> [10841.122564] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [10841.122566] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [10841.122568] nfs41_sequence_process: Error 0 free the slot
> [10841.122571] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [10841.122575] nfs4_close_done: done, ret = 0!
> [10841.122584] nfs4_close_prepare: begin!
> ......
> The client keeps retrying the CLOSE.
> 
>> I have seen a trace where the CLOSE would get ERR_OLD_STATEID and
>> would still retry with the same open stateid until the client got a
>> reply to the OPEN that changed the state; only then would it retry
>> the CLOSE with the updated stateid.
>>
> 
> Yes, you are right. I have also noticed that. But what happens if the
> OPEN task exits after receiving a signal? How can the client update the stateid?
> 
> Thanks,
> Jiufei
>

I have added a tracepoint, nfsd_inc_and_copy_stateid, and enabled the
following tracepoints: nfs4_open_file and nfs4_open_stateid_update on the
client, and nfsd_inc_and_copy_stateid on the server.

Here is some trace output from right before the problem. The stateid
generation on the server is 31856 (0x7c70), which is higher than the
client's (31855), and the client never updates its stateid.

aio-last-ref-he-5224  [000] ...1  1105.173446: nfs4_open_file: error=0 (OK) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:11 fhandle=0x0ca0e657 name=00:2b:2/aio-testfile stateid=31854:0x2a82450b openstateid=31854:0x2a82450b
 aio-last-ref-he-5222  [000] ...1  1105.173568: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
            nfsd-2991  [000] ...1  1105.173661: nfsd_inc_and_copy_stateid: client 5c78ec26:60bfa0c2 stateid 00095d13:00007c6f
 aio-last-ref-he-5234  [000] ...1  1105.174000: nfs4_open_file: error=0 (OK) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:11 fhandle=0x0ca0e657 name=00:2b:2/aio-testfile stateid=31854:0x2a82450b openstateid=31854:0x2a82450b
 aio-last-ref-he-5151  [001] ...1  1105.174226: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
 aio-last-ref-he-5235  [001] ...1  1105.174544: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
 aio-last-ref-he-5141  [001] ...4  1105.174752: nfs4_open_stateid_update: error=-4 (0x4) fileid=00:2b:11 fhandle=0x0ca0e657 stateid=25925:0xe0cb0a41
 aio-last-ref-he-5151  [001] ...4  1105.178258: nfs4_open_stateid_update: error=-4 (0x4) fileid=00:2b:11 fhandle=0x0ca0e657 stateid=31855:0x2a82450b
           <...>-5181  [001] .N.1  1105.180254: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
 aio-last-ref-he-5151  [001] ...1  1105.180670: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
 aio-last-ref-he-5235  [001] ...1  1105.180929: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
 aio-last-ref-he-5181  [001] ...1  1105.181179: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
 aio-last-ref-he-5141  [001] ...1  1105.181816: nfs4_open_file: error=0 (OK) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:11 fhandle=0x0ca0e657 name=00:2b:2/aio-testfile stateid=31855:0x2a82450b openstateid=31855:0x2a82450b
            nfsd-2990  [001] ...1  1105.182033: nfsd_inc_and_copy_stateid: client 5c78ec26:60bfa0c2 stateid 00095d13:00007c70

Thanks,
Jiufei


Thread overview: 26+ messages
2019-02-28 10:10 [bug report] task hang while testing xfstests generic/323 Jiufei Xue
2019-02-28 22:26 ` Olga Kornievskaia
2019-02-28 23:56   ` Trond Myklebust
2019-03-01  5:19     ` Jiufei Xue
2019-03-01  5:08   ` Jiufei Xue
2019-03-01  8:49     ` Jiufei Xue [this message]
2019-03-01 13:08       ` Trond Myklebust
2019-03-02 16:34         ` Jiufei Xue
2019-03-04 15:20         ` Jiufei Xue
2019-03-04 15:50           ` Trond Myklebust
2019-03-05  5:09             ` Jiufei Xue
2019-03-05 14:45               ` Trond Myklebust
2019-03-06  9:59                 ` Jiufei Xue
2019-03-06 16:09                   ` bfields
2019-03-10 22:20                     ` Olga Kornievskaia
2019-03-11 14:30                       ` Trond Myklebust
2019-03-11 15:07                         ` Olga Kornievskaia
2019-03-11 15:13                           ` Olga Kornievskaia
2019-03-15  6:30                             ` Jiufei Xue
2019-03-15 20:33                               ` Olga Kornievskaia
2019-03-15 20:55                                 ` Trond Myklebust
2019-03-16 14:11                                 ` Jiufei Xue
2019-03-19 15:33                                   ` Olga Kornievskaia
2019-03-11 15:12                         ` Trond Myklebust
2019-03-11 15:14                           ` Olga Kornievskaia
2019-03-11 15:28                             ` Trond Myklebust
