linux-nfs.vger.kernel.org archive mirror
From: Menyhart Zoltan <Zoltan.Menyhart@bull.net>
To: linux-nfs@vger.kernel.org
Subject: NFS v4 client blocks when using UDP protocol
Date: Tue, 29 Mar 2011 16:32:25 +0200
Message-ID: <4D91ED79.4020302@bull.net>
In-Reply-To: <EB0F26AF-C0DB-4861-ABB5-72E39CB38090@bull.net>

Hi,

The NFS v4 client blocks under heavy load when using the UDP protocol.
It makes no further progress, and the test program "iozone" cannot be killed.

"nfsiostat 10 -p" reports 2 "nfs_writepages()" calls in every 10-second interval:

server:/ mounted on /mnt/test:
    op/s		rpc bklog
    0.00 	   0.00
read:             ops/s		   kB/s		  kB/op		retrans		avg RTT (ms)	avg exe (ms)
		  0.000 	  0.000 	  0.000        0 (0.0%) 	  0.000 	  0.000
write:            ops/s		   kB/s		  kB/op		retrans		avg RTT (ms)	avg exe (ms)
		  0.000 	  0.000 	  0.000        0 (0.0%) 	  0.000 	  0.000
0 nfs_readpage() calls read 0 pages
0 nfs_readpages() calls read 0 pages
0 nfs_updatepage() calls
0 nfs_writepage() calls wrote 0 pages
2 nfs_writepages() calls wrote 0 pages (0.0 pages per call)

Examining the test program with "crash" shows:

PID: 6185   TASK: ffff8804047020c0  CPU: 0   COMMAND: "iozone"
  #0 [ffff88036ce638b8] schedule at ffffffff814d9459
  #1 [ffff88036ce63980] schedule_timeout at ffffffff814da152
  #2 [ffff88036ce63a30] io_schedule_timeout at ffffffff814d8f2f
  #3 [ffff88036ce63a60] balance_dirty_pages at ffffffff811213d1
  #4 [ffff88036ce63b80] balance_dirty_pages_ratelimited_nr at ffffffff811217e4
  #5 [ffff88036ce63b90] generic_file_buffered_write at ffffffff8110d993
  #6 [ffff88036ce63c60] __generic_file_aio_write at ffffffff8110f230
  #7 [ffff88036ce63d20] generic_file_aio_write at ffffffff8110f4cf
  #8 [ffff88036ce63d70] nfs_file_write at ffffffffa05e4c2e
  #9 [ffff88036ce63dc0] do_sync_write at ffffffff81170bda
#10 [ffff88036ce63ef0] vfs_write at ffffffff81170ed8
#11 [ffff88036ce63f30] sys_write at ffffffff81171911
#12 [ffff88036ce63f80] system_call_fastpath at ffffffff8100b172
     RIP: 0000003709a0e490  RSP: 00007fffba8eac78  RFLAGS: 00000246
     RAX: 0000000000000001  RBX: ffffffff8100b172  RCX: 0000003709a0ec10
     RDX: 0000000000400000  RSI: 00007fe469600000  RDI: 0000000000000003
     RBP: 0000000000000298   R8: 0000000000000000   R9: 0000000000000000
     R10: 0000000000000000  R11: 0000000000000246  R12: 00007fe469600000
     R13: 0000000000eb4c20  R14: 0000000000400000  R15: 0000000000eb4c20
     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
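(For reference, the backtrace above comes from the "crash" utility on the live
system; the invocation below is only a sketch, and the debuginfo path is an
assumption that depends on the distribution:)

```shell
# Attach crash to the running kernel (vmlinux debuginfo path may differ):
crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
# At the crash> prompt, dump the stuck task's kernel stack:
#   bt 6185
```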

No activity can be seen on the network with wireshark.
Rebooting the client and restarting the test shows a different blocking point:

PID: 4723   TASK: ffff8803fc235580  CPU: 2   COMMAND: "iozone"
  #0 [ffff880417e2dc08] schedule at ffffffff814d9459
  #1 [ffff880417e2dcd0] io_schedule at ffffffff814d9c43
  #2 [ffff880417e2dcf0] sync_page at ffffffff8110d01d
  #3 [ffff880417e2dd00] __wait_on_bit at ffffffff814da4af
  #4 [ffff880417e2dd50] wait_on_page_bit at ffffffff8110d1d3
  #5 [ffff880417e2ddb0] wait_on_page_writeback_range at ffffffff8110d5eb
  #6 [ffff880417e2deb0] filemap_write_and_wait_range at ffffffff8110d7b8
  #7 [ffff880417e2dee0] vfs_fsync_range at ffffffff8119f11e
  #8 [ffff880417e2df30] vfs_fsync at ffffffff8119f1ed
  #9 [ffff880417e2df40] do_fsync at ffffffff8119f22e
#10 [ffff880417e2df70] sys_fsync at ffffffff8119f280
#11 [ffff880417e2df80] system_call_fastpath at ffffffff8100b172
     RIP: 0000003709a0ebb0  RSP: 00007fff5517dc28  RFLAGS: 00000212
     RAX: 000000000000004a  RBX: ffffffff8100b172  RCX: 00000000000000ef
     RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000003
     RBP: 0000000000008000   R8: 00000000ffffffff   R9: 0000000000000000
     R10: 0000000000000000  R11: 0000000000000246  R12: ffffffff8119f280
     R13: ffff880417e2df78  R14: 00000000011d7c30  R15: 0000000000000000
     ORIG_RAX: 000000000000004a  CS: 0033  SS: 002b

Sometimes console messages like the following appear:

INFO: task iozone:4723 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
iozone        D 0000000000000002     0  4723   4683 0x00000080
ffff880417e2dcc8 0000000000000086 ffff880417e2dc48 ffffffffa042fc99
ffff88036e53d440 ffff8803e386aac0 ffff880402cf4610 ffff88036e53d448
ffff8803fc235b38 ffff880417e2dfd8 000000000000f558 ffff8803fc235b38
Call Trace:
[<ffffffffa042fc99>] ? rpc_run_task+0xd9/0x130 [sunrpc]
[<ffffffff81098aa9>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff8110cfe0>] ? sync_page+0x0/0x50
[<ffffffff814d9c43>] io_schedule+0x73/0xc0
[<ffffffff8110d01d>] sync_page+0x3d/0x50
[<ffffffff814da4af>] __wait_on_bit+0x5f/0x90
[<ffffffff8110d1d3>] wait_on_page_bit+0x73/0x80
[<ffffffff8108df40>] ? wake_bit_function+0x0/0x50
[<ffffffff81122f25>] ? pagevec_lookup_tag+0x25/0x40
[<ffffffff8110d5eb>] wait_on_page_writeback_range+0xfb/0x190
[<ffffffff8110d7b8>] filemap_write_and_wait_range+0x78/0x90
[<ffffffff8119f11e>] vfs_fsync_range+0x7e/0xe0
[<ffffffff8119f1ed>] vfs_fsync+0x1d/0x20
[<ffffffff8119f22e>] do_fsync+0x3e/0x60
[<ffffffff8119f280>] sys_fsync+0x10/0x20
[<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

This problem can be systematically reproduced with the following test:

iozone -a -c -e -E -L 64 -m -S 4096 -f /mnt/test/file -g 20G -n 1M -y 4K
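(For readers unfamiliar with the switches, here is the same reproducer with the
flag meanings spelled out, as I read them from iozone(1); the annotations are
mine, not part of the original command:)

```shell
# Reproducer, annotated per iozone(1):
#   -a        full automatic mode (sweep of record and file sizes)
#   -c        include close() in the timing
#   -e        include flush (fsync/fflush) in the timing
#   -E        run the extension tests
#   -L 64     processor cache line size hint, in bytes
#   -m        use multiple internal buffers
#   -S 4096   processor cache size hint, in kilobytes
#   -g 20G    maximum file size
#   -n 1M     minimum file size
#   -y 4K     minimum record size
iozone -a -c -e -E -L 64 -m -S 4096 -f /mnt/test/file -g 20G -n 1M -y 4K
```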

It can be reproduced on both types of network we have (see below).

Switching on RPC debugging with "sysctl -w sunrpc.rpc_debug=65535" gives:

-pid- flgs status -client- --rqstp- -timeout ---ops--
28139 0001    -11 ffff88040f403400   (null)        0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog
29167 0001    -11 ffff88040f403400 ffff8803ff405130        0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend
29172 0001    -11 ffff88040f403400   (null)        0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog
...
51364 0001    -11 ffff88040f403400   (null)        0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog
55355 0001    -11 ffff88040f403400 ffff8803ff405450        0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend
55356 0001    -11 ffff88040f403400 ffff8803ff404000        0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend
55357 0001    -11 ffff88040f403400 ffff8803ff404c80        0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend
55369 0001    -11 ffff88040f403400 ffff8803ff4055e0        0 ffffffffa06157e0 nfsv4 WRITE a:call_status q:xprt_resend
...
55370 0001    -11 ffff88040f403400   (null)        0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog
55371 0001    -11 ffff88040f403400   (null)        0 ffffffffa06157e0 nfsv4 WRITE a:call_reserveresult q:xprt_backlog
...

There are no error messages in "dmesg" nor in "/var/log/messages" on the server side.


Our test HW:
2 x Supermicro R460, each machine with 2 dual-core Xeon 5150 CPUs at 2.66 GHz
InfiniBand Mellanox Technologies MT26418 in DDR mode, IP over IB
Intel Corporation 80003ES2LAN Gigabit Ethernet Controller, point to point direct connection

SW:
Kernels up to 2.6.38-rc8


Server side:

A software RAID device "/dev/md0", mounted on "/md0", is exported:

/dev/md0:
          Version : 0.90
    Creation Time : Wed Feb 18 17:18:34 2009
       Raid Level : raid0
       Array Size : 357423360 (340.87 GiB 366.00 GB)
     Raid Devices : 5
    Total Devices : 5
Preferred Minor : 0
      Persistence : Superblock is persistent
      Update Time : Wed Feb 18 17:18:34 2009
            State : clean
   Active Devices : 5
Working Devices : 5
   Failed Devices : 0
    Spare Devices : 0
       Chunk Size : 256K
             UUID : 05fae940:c7540b1e:d0f324d8:8f753e72
           Events : 0.1

      Number   Major   Minor   RaidDevice State
         0       8       16        0      active sync   /dev/sdb
         1       8       32        1      active sync   /dev/sdc
         2       8       48        2      active sync   /dev/sdd
         3       8       64        3      active sync   /dev/sde
         4       8       80        4      active sync   /dev/sdf

/etc/exports:
/md0 *(rw,insecure,no_subtree_check,sync,fsid=0)

The client machine mounts it as follows:

mount -o async,udp 192.168.1.79:/ /mnt/test	# Ethernet
or
mount -o async,udp 192.168.42.79:/ /mnt/test	# IP over IB
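As a sanity check that the mount really negotiated UDP, the proto= option can
be pulled out of /proc/mounts (a generic sketch, not part of the original
report; the mount point /mnt/test is taken from the commands above):

```shell
# Print the proto= mount option for the NFS mount at /mnt/test.
awk '$2 == "/mnt/test" {
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++)
        if (opts[i] ~ /^proto=/)
            print opts[i]
}' /proc/mounts
```

On this setup it should print "proto=udp"; "nfsstat -m" shows the same
per-mount options.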


Have you already seen this problem?
Thank you in advance,

Zoltan Menyhart


