From: Jim Rees <rees@umich.edu>
To: linux-nfs@vger.kernel.org
Cc: peter honeyman <honey@citi.umich.edu>
Subject: hung iozone client bug recap
Date: Thu, 3 Mar 2011 17:22:19 -0500 [thread overview]
Message-ID: <20110303222219.GA16426@merit.edu> (raw)
I've asked about this before but now I have some new information.
When I run a multithreaded iozone test against an EMC server, I reliably get
a hung client. We have reproduced this problem with a wide range of client
kernel versions, from 2.6.33.3-85.fc13.x86_64 up to benny's pnfs-all-latest,
and on both 4.0 and 4.1.
Trond thinks the server is failing to reply to some rpc, but looking at the
trace I don't see it. The hang happens about 39 seconds in. The trace is
here:
http://www.citi.umich.edu/projects/nfsv4/pnfs/block/80.pcap.gz
The iozone command is like this. Note the -o option for O_SYNC, this seem
to be important. There are three threads on each of four client.
iozone -r 2m -s 256m -w -W -c -t 12 -i 0 -+m ioz-4x3 -o -+n
The call trace is usually something like this:
INFO: task iozone:3587 blocked for more than 120 seconds.
[<ffffffff810c1314>] ? sync_page+0x0/0x45
[<ffffffff814297bc>] io_schedule+0x6e/0xb0
[<ffffffff810c1355>] sync_page+0x41/0x45
[<ffffffff81429cf8>] __wait_on_bit+0x43/0x76
[<ffffffff810c14ae>] wait_on_page_bit+0x6d/0x74
[<ffffffff8106484b>] ? wake_bit_function+0x0/0x2e
[<ffffffff810c94c0>] ? pagevec_lookup_tag+0x20/0x29
[<ffffffff810c1751>] filemap_fdatawait_range+0x9f/0x173
[<ffffffff810c18ce>] filemap_write_and_wait_range+0x3e/0x51
[<ffffffff8111fa53>] vfs_fsync_range+0x5a/0xad
[<ffffffff8111faf9>] generic_write_sync+0x53/0x55
[<ffffffff810c1d4b>] generic_file_aio_write+0x86/0xa2
[<ffffffffa0321bf8>] nfs_file_write+0xed/0x169 [nfs]
[<ffffffff811017c5>] do_sync_write+0xbf/0xfc
[<ffffffff810f4dc9>] ? __slab_free+0x28/0x22e
[<ffffffff81204b7d>] ? might_fault+0x1c/0x1e
[<ffffffff811be22b>] ? security_file_permission+0x11/0x13
[<ffffffff81101d23>] vfs_write+0xa9/0x106
[<ffffffff81101e36>] sys_write+0x45/0x69
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Here's a sample rpc queue (not from the same run as the trace):
echo 0 >/proc/sys/sunrpc/rpc_debug
r38-13-78: [ 1999.083961] -pid- flgs status -client- --rqstp- -timeout ---ops--
r38-13-78: [ 1999.083987] 34411 0001 0 ffff880247b2e800 ffff8802550e6930 60000 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_pending
r38-13-78: [ 1999.084004] 34412 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084021] 34413 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084036] 34414 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084050] 34415 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084064] 34416 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084078] 34417 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084092] 34418 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084105] 34419 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084119] 34420 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084137] 34421 0001 0 ffff880247b2e800 ffff8802550e6dc8 60000 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_pending
r38-13-78: [ 1999.084153] 34422 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084167] 34423 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084181] 34424 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084195] 34425 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084209] 34426 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084223] 34427 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084236] 34428 0001 0 ffff880247b2e800 ffff8802550e7260 60000 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_pending
r38-13-78: [ 1999.084250] 34429 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084262] 34430 0001 0 ffff880247b2e800 ffff8802550e6498 60000 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_pending
r38-13-78: [ 1999.084276] 34431 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084288] 34432 0001 0 ffff880247b2e800 ffff8802550e6620 60000 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_pending
r38-13-78: [ 1999.084301] 34433 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084316] 34434 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084330] 34435 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084344] 34436 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084357] 34437 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084369] 34438 0001 -11 ffff880247b2e800 ffff8802550e6188 60000 ffffffffa03c3810 nfsv4 WRITE a:call_transmit_status q:xprt_pending
r38-13-78: [ 1999.084402] 34439 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084419] 34440 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084433] 34441 0001 0 ffff880247b2e800 ffff8802550e6c40 60000 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_pending
r38-13-78: [ 1999.084449] 34442 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084465] 34443 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084481] 34444 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084498] 34445 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084514] 34446 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084529] 34447 0001 0 ffff880247b2e800 ffff8802550e73e8 60000 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_pending
r38-13-78: [ 1999.084545] 34448 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084561] 34449 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084577] 34450 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084592] 34451 0001 0 ffff880247b2e800 ffff8802550e70d8 60000 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_pending
r38-13-78: [ 1999.084608] 34452 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084625] 34453 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084642] 34454 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084658] 34455 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084674] 34456 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084690] 34457 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084706] 34458 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084722] 34459 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084738] 34460 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084754] 34461 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084770] 34462 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084786] 34463 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084802] 34464 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084817] 34465 0001 -11 ffff880247b2e800 ffff8802550e6f50 0 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_sending
r38-13-78: [ 1999.084833] 34466 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084848] 34467 0001 0 ffff880247b2e800 ffff8802550e7570 60000 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_pending
r38-13-78: [ 1999.084864] 34468 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084878] 34469 0001 -11 ffff880247b2e800 ffff8802550e6310 0 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_sending
r38-13-78: [ 1999.084894] 34470 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:NFS client
r38-13-78: [ 1999.084910] 34471 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084925] 34472 0001 0 ffff880247b2e800 ffff8802550e76f8 60000 ffffffffa03c3810 nfsv4 WRITE a:call_status q:xprt_pending
r38-13-78: [ 1999.084943] 34473 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084959] 34474 0001 0 ffff880247b2e800 (null) 0 ffffffffa03c3810 nfsv4 WRITE a:rpc_prepare_task q:ForeChannel Slot table
r38-13-78: [ 1999.084974] 18306 0280 0 ffff880253a4a000 ffff8802550e6ab8 60000 ffffffffa03cdb80 nfsv4 SEQUENCE a:call_status q:xprt_pending
next reply other threads:[~2011-03-03 22:22 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-03 22:22 Jim Rees [this message]
2011-03-04 16:28 ` hung iozone client bug recap Steve Dickson
2011-03-04 17:49 ` Jim Rees
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110303222219.GA16426@merit.edu \
--to=rees@umich.edu \
--cc=honey@citi.umich.edu \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.