public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Benoit Plessis <benoit@plessis.info>
To: linux-kernel@vger.kernel.org
Subject: Re: NFS regression? Odd delays and lockups accessing an NFS export.
Date: Wed, 10 Sep 2008 04:51:31 +0200	[thread overview]
Message-ID: <48C73633.9030300@plessis.info> (raw)


Hi,

I've just found this thread, and i'm relieved in some way, that i'm not 
the only one
experiencing this.

NFS Server: NetApp Data On Tap v 7.2.4 (not changed

NFS Client: Dell and HP machines, using Broadcom network card
    # ethtool -i eth0
    driver: bnx2
    version: 1.7.4
    firmware-version: 4.0.3 UMP 1.1.9
    bus-info: 0000:05:00.0

All machines are running on NFS Root on local LAN. Previous 
configuration with kernel 2.6.22.14
was working quite fine except for some quick lost, solved within a few 
second
    Sep 10 01:03:54 kernel: nfs: server 10.250.255.253 not responding, 
still trying
    Sep 10 01:03:54 kernel: nfs: server 10.250.255.253 OK
That's already something strange that shouldn't happens

And now, after an upgrade to 2.6.25.16:
Sep 10 03:26:31 web-31-mtp-1 kernel: nfs: server 10.250.255.253 not 
responding, still trying
Sep 10 03:27:40 web-31-mtp-1 kernel: INFO: task apache2:29263 blocked 
for more than 120 seconds.
Sep 10 03:27:40 web-31-mtp-1 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 10 03:27:40 web-31-mtp-1 kernel: apache2       D 
0000000000000000     0 29263   2288
Sep 10 03:27:40 web-31-mtp-1 kernel:  ffff81005d1ad958 0000000000200086 
0000000000000000 ffff810001005830
Sep 10 03:27:40 web-31-mtp-1 kernel:  ffff81009f8c8080 ffffffff805734a0 
ffff81009f8c82b8 0000000100970659
Sep 10 03:27:40 web-31-mtp-1 kernel:  00000000ffffffff 0000000000000000 
0000000000000000 0000000000000000
Sep 10 03:27:40 web-31-mtp-1 kernel: Call Trace:
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff8046ac20>] 
rpc_wait_bit_killable+0x0/0x40
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff8046ac3c>] 
rpc_wait_bit_killable+0x1c/0x40
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80493f4f>] 
__wait_on_bit+0x4f/0x80
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff8046ac20>] 
rpc_wait_bit_killable+0x0/0x40
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80493ffa>] 
out_of_line_wait_on_bit+0x7a/0xa0
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80249350>] 
wake_bit_function+0x0/0x30
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff804672d3>] 
xprt_connect+0x83/0x170
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff8046b141>] 
__rpc_execute+0xd1/0x280
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff804644e1>] 
rpc_run_task+0x51/0x70
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff8046459c>] 
rpc_call_sync+0x3c/0x60
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff802edbba>] 
nfs3_rpc_wrapper+0x3a/0x60
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff802ee2f1>] 
nfs3_proc_access+0x141/0x240
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff8029d51f>] dput+0x1f/0xf0
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff802e0abf>] 
nfs_lookup_revalidate+0x22f/0x3e0
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80249529>] 
remove_wait_queue+0x19/0x60
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80298c91>] 
free_poll_entry+0x11/0x20
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80298ccc>] 
poll_freewait+0x2c/0x90
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80299026>] 
do_sys_poll+0x2f6/0x370
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff802dee71>] 
nfs_do_access+0xe1/0x340
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff802df1aa>] 
nfs_permission+0xda/0x1a0
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80292d1d>] 
permission+0xbd/0x150
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80294b7e>] 
__link_path_walk+0x7e/0xee0
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff803e865b>] 
sock_aio_read+0x16b/0x180
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80295a3a>] 
path_walk+0x5a/0xc0
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80295cc3>] 
do_path_lookup+0x83/0x1c0
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff802947a5>] 
getname+0xe5/0x210
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff802968bb>] 
__user_walk_fd+0x4b/0x80
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff8028eacf>] 
vfs_stat_fd+0x2f/0x80
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff802255af>] 
sys32_stat64+0x1f/0x50
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80327df1>] 
__up_read+0x21/0xb0
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff804955c9>] 
error_exit+0x0/0x51
Sep 10 03:27:40 web-31-mtp-1 kernel:  [<ffffffff80224dd2>] 
ia32_sysret+0x0/0xa
Sep 10 03:27:40 web-31-mtp-1 kernel:
Sep 10 03:27:40 web-31-mtp-1 kernel: INFO: task apache2:29339 blocked 
for more than 120 seconds.
Sep 10 03:27:40 web-31-mtp-1 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
....
Sep 10 03:31:56 web-31-mtp-1 kernel: nfs: server 10.250.255.253 OK

During this time the serveur isn't available, albeit it still reply to 
icmp message.
I've a bunch of identical server with this new kernel, and this doesn't 
happen on every one.
Only 30% of then seem to be affected.
All machines are running the latest same BIOS/Network Firmware.
Also all machines are using a bonded interface in failover mode.

There is no particular messages on the netapp side, only this common 
message:

    Average size of NFS TCP packets received from host: X.X.X.X is 3858.
    This is less than the MTU (9000 bytes) of the interface involved in
    the data transfer.
    The maximum segment size being used by TCP for this host is: 8960.
    Low average size of packets received by this system might be
    because of a misconfigured client system, or a poorly written
    client application.
    Press enter to continue

There isn't even any "retransmission timeouts" recorded during the 
client side hang.

The kernel is a vanilla 2.6.25.16 compiled for amd64 architecture with 
bnx2 and NFS root included in the kernel.
I'll try the latest 2.6.24 and then revert to 2.6.22 if necessary


And 'voila',
that's all i can think of at this time, don't hesitate to ask for more 
informations.


Oh yes, the first problems appeared on one machine while his twin sister 
doesn't have any trouble,
the only difference at this time was the value of 
"net.core.netdev_max_backlog" (2500 on the machine without
trouble, 1000 on the other). But now this happens on machines with the 
exact same configuration,
without many sysctl tweak, only:
    # increase TCP max buffer size
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    # increase Linux autotuning TCP buffer limits
    # min, default, and max number of bytes to use
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216

    # don't cache ssthresh from previous connection
    net.ipv4.tcp_no_metrics_save = 1
    # recommended to increase this for 1000 BT or higher
    net.core.netdev_max_backlog = 2500


PS: Please CC me the reply

-- 
Benoit


             reply	other threads:[~2008-09-10  3:03 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-10  2:51 Benoit Plessis [this message]
  -- strict thread matches above, loose matches on Subject: below --
2008-08-18  2:02 NFS regression? Odd delays and lockups accessing an NFS export Grant Coady
2008-08-18 18:50 ` Athanasius
2008-08-18 19:19   ` Trond Myklebust
2008-08-18 19:37   ` J. K. Cliburn
2008-08-18 23:13     ` Athanasius
2009-05-12 20:27     ` Frank Filz
2009-05-13  0:05       ` Jay Cliburn
2008-08-18 19:20 ` Trond Myklebust
2008-08-20  1:10   ` Grant Coady
2008-08-20 23:17   ` Grant Coady
2008-08-22 10:23   ` Ian Campbell
2008-08-22 18:08     ` Trond Myklebust
2008-08-22 18:13       ` Ian Campbell
2008-08-22 19:33         ` John Ronciak
2008-08-22 20:00           ` Ian Campbell
2008-08-22 21:15             ` John Ronciak
2008-08-22 21:23             ` Trond Myklebust
2008-08-22 21:37               ` Ian Campbell
2008-08-22 21:56                 ` Trond Myklebust
2008-08-22 22:41                   ` Ian Campbell
2008-08-24 18:53                   ` Ian Campbell
2008-08-24 19:17                     ` Trond Myklebust
2008-08-24 19:19                       ` Trond Myklebust
2008-08-24 22:09                         ` Ian Campbell
2008-08-24 22:15                           ` Trond Myklebust
2008-08-25  9:59                             ` Ian Campbell
2008-08-25 16:04                             ` Tom Tucker
2008-08-25 16:54                               ` Trond Myklebust
2008-08-25 20:15                                 ` Tom Tucker
2008-08-26 19:27                               ` J. Bruce Fields
2008-08-27 14:43                                 ` Tom Tucker
2008-08-30 15:47                                   ` Ian Campbell
2008-08-31 19:30                                     ` J. Bruce Fields
2008-08-31 19:44                                       ` Ian Campbell
2008-08-31 19:46                                         ` J. Bruce Fields
2008-08-31 19:49                                           ` Ian Campbell
2008-08-31 19:51                                             ` Tom Tucker
2008-08-31 19:51                                         ` Tom Tucker
2008-08-31 21:18                                           ` Ian Campbell
2008-09-01 17:20                                             ` Tom Tucker
2008-09-01 17:46                                               ` Ian Campbell
2008-09-10  8:40                                               ` Ian Campbell
2008-09-12 22:43                                                 ` J. Bruce Fields
2008-09-12 23:15                                                   ` Tom Tucker
2008-09-13  8:57                                                     ` Ian Campbell
2008-09-16  5:48                                                       ` Ian Campbell
2008-09-16 11:38                                                         ` Tom Tucker
2008-09-16 15:03                                                           ` Ian Campbell
2008-09-16 15:58                                                             ` Tom Tucker
2008-09-16 16:24                                                               ` Ian Campbell
2008-09-23  7:59                                                                 ` Ian Campbell
2008-09-23 11:33                                                                   ` Ian Campbell
2008-09-23 17:03                                                                     ` J. Bruce Fields
2008-09-26 15:37                                                                       ` Ian Campbell
2008-09-26 18:17                                                                         ` Ian Campbell
2008-09-27  3:54                                                                           ` J. Bruce Fields
2008-09-27 10:16                                                                             ` Ian Campbell
2008-08-25 21:39                           ` Roger Heflin
2008-08-25 20:23                   ` Grant Coady
2008-08-25 22:11                     ` Trond Myklebust
2008-08-26  0:29                       ` Grant Coady
2008-08-26  0:59                         ` Muntz, Daniel
2008-08-26  1:06                       ` Grant Coady

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48C73633.9030300@plessis.info \
    --to=benoit@plessis.info \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox