All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: linux-nfs@vger.kernel.org
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: very slow NFS boot on linux-next and -mm
Date: Sun, 3 May 2009 18:54:56 +0800	[thread overview]
Message-ID: <20090503105456.GA30449@localhost> (raw)

Hi,

My NFSROOT box sometimes will get stuck for a dozen seconds during boot.

Here is the stucked process:

[   48.983671] gzip          D 0000000000000000  2880  2976   2966
[   48.983671]  ffff88007b087ae8 0000000000000046 0000000000000000 ffffffff81079c58
[   48.983671]  00000000001d1a00 000000000000cda8 ffff88007b87dc80 ffffffff817934a0
[   48.983671]  ffff88007b87e010 ffff880005bd1a00 000000007b087a98 ffff880005bd1a18
[   48.983671] Call Trace:
[   48.983671]  [<ffffffff81079c58>] ? mark_held_locks+0x68/0x90
[   48.983671]  [<ffffffff810709cd>] ? getnstimeofday+0x9d/0x140
[   48.983671]  [<ffffffff8152cb85>] schedule+0x15/0x50
[   48.983671]  [<ffffffff8152cc28>] io_schedule+0x68/0xb0
[   48.983671]  [<ffffffff810c1935>] sync_page+0x85/0xc0
[   48.983671]  [<ffffffff8152d482>] __wait_on_bit+0x62/0x90
[   48.983671]  [<ffffffff810c18b0>] ? sync_page+0x0/0xc0
[   48.983671]  [<ffffffff810c1c64>] wait_on_page_bit+0x74/0x80
[   48.983671]  [<ffffffff81066490>] ? wake_bit_function+0x0/0x50
[   48.983671]  [<ffffffff810cd5d7>] ? pagevec_lookup_tag+0x27/0x40
[   48.983671]  [<ffffffff810c2906>] wait_on_page_writeback_range+0xc6/0x150
[   48.983671]  [<ffffffff810cb600>] ? do_writepages+0x30/0x50
[   48.983671]  [<ffffffff810c2a29>] ? __filemap_fdatawrite_range+0x59/0x70
[   48.983671]  [<ffffffff810c29be>] filemap_fdatawait+0x2e/0x40
[   48.983671]  [<ffffffff810c3e8d>] filemap_write_and_wait+0x4d/0x60
[   48.983671]  [<ffffffff8118de75>] nfs_setattr+0x135/0x150
[   48.983671]  [<ffffffff81070031>] ? current_kernel_time+0x41/0x70
[   48.983671]  [<ffffffff81079c58>] ? mark_held_locks+0x68/0x90
[   48.983671]  [<ffffffff810512d7>] ? current_fs_time+0x27/0x30
[   48.983671]  [<ffffffff81076dc0>] ? debug_mutex_free_waiter+0x80/0xc0
[   48.983671]  [<ffffffff8111bce1>] notify_change+0x121/0x330
[   48.983671]  [<ffffffff81129993>] utimes_common+0xd3/0x1c0
[   48.983671]  [<ffffffff81129b7c>] do_utimes+0xfc/0x110
[   48.983671]  [<ffffffff81079fad>] ? trace_hardirqs_on_caller+0x16d/0x1c0
[   48.983671]  [<ffffffff81129cbc>] sys_utimensat+0x3c/0xb0
[   48.983671]  [<ffffffff8100c2b2>] system_call_fastpath+0x16/0x1b

The NFSROOT client that got stuck is running latest linux-next, the
server side is running 2.6.30-rc3.

Switching the client kernel to latest -mm makes it very reproducible.
Here is another stack dump:

[  180.399845] udevd         D ffff8800280291e0  3056  1388      1
[  180.399845]  ffff88007bdeb7b8 0000000000000046 ffff88007bdec000 ffff88007bdec7d0
[  180.399845]  ffff88007bdeb718 00000000001d1000 000000000000cd68 ffff88007bdec000
[  180.399845]  ffff88007d3dc000 ffff88007bdec378 0000000100000286 00000000ffff8a5d
[  180.399845] Call Trace:
[  180.399845]  [<ffffffff8107673d>] ? trace_hardirqs_on_caller+0x16d/0x1c0
[  180.399845]  [<ffffffff8107679d>] ? trace_hardirqs_on+0xd/0x10
[  180.399845]  [<ffffffff814e3071>] rpc_wait_bit_killable+0x21/0x40
[  180.399845]  [<ffffffff81549552>] __wait_on_bit+0x62/0x90
[  180.399845]  [<ffffffff814e3050>] ? rpc_wait_bit_killable+0x0/0x40
[  180.399845]  [<ffffffff814e3050>] ? rpc_wait_bit_killable+0x0/0x40
[  180.399845]  [<ffffffff815495f9>] out_of_line_wait_on_bit+0x79/0x90
[  180.399845]  [<ffffffff81063270>] ? wake_bit_function+0x0/0x50
[  180.399845]  [<ffffffff814e3965>] __rpc_execute+0x105/0x2b0
[  180.399845]  [<ffffffff814e3b3d>] rpc_execute+0x2d/0x40
[  180.399845]  [<ffffffff814dc5ea>] rpc_run_task+0x4a/0xc0
[  180.399845]  [<ffffffff814dc772>] rpc_call_sync+0x42/0x70
[  180.399845]  [<ffffffff811c6bdd>] nfs3_rpc_wrapper+0x2d/0x70
[  180.399845]  [<ffffffff811c73af>] nfs3_proc_getattr+0x4f/0x90
[  180.399845]  [<ffffffff811b8e26>] __nfs_revalidate_inode+0xc6/0x220
[  180.399845]  [<ffffffff81012ad0>] ? native_sched_clock+0x20/0x70
[  180.399845]  [<ffffffff81012b29>] ? sched_clock+0x9/0x10
[  180.399845]  [<ffffffff81074405>] ? lock_release_holdtime+0x35/0x1c0
[  180.399845]  [<ffffffff811b33fe>] nfs_lookup_revalidate+0x41e/0x580
[  180.399845]  [<ffffffff81012ad0>] ? native_sched_clock+0x20/0x70
[  180.399845]  [<ffffffff81012ad0>] ? native_sched_clock+0x20/0x70
[  180.399845]  [<ffffffff81012b29>] ? sched_clock+0x9/0x10
[  180.399845]  [<ffffffff81074405>] ? lock_release_holdtime+0x35/0x1c0
[  180.399845]  [<ffffffff81116d42>] ? __d_lookup+0x142/0x210
[  180.399845]  [<ffffffff81116c00>] ? __d_lookup+0x0/0x210
[  180.399845]  [<ffffffff8110bb77>] do_lookup+0x67/0x260
[  180.399845]  [<ffffffff811b60c8>] ? nfs_permission+0x178/0x1e0
[  180.399845]  [<ffffffff8110e3c4>] __link_path_walk+0x9a4/0x1250
[  180.399845]  [<ffffffff81074405>] ? lock_release_holdtime+0x35/0x1c0
[  180.399845]  [<ffffffff8110ee8c>] path_walk+0x5c/0xb0
[  180.399845]  [<ffffffff8110ef43>] do_path_lookup+0x63/0xb0
[  180.399845]  [<ffffffff8110ffeb>] do_filp_open+0xfb/0xb40
[  180.399845]  [<ffffffff81012ad0>] ? native_sched_clock+0x20/0x70
[  180.399845]  [<ffffffff81012b29>] ? sched_clock+0x9/0x10
[  180.399845]  [<ffffffff812777ae>] ? _raw_spin_unlock+0x7e/0xa0
[  180.399845]  [<ffffffff8111b571>] ? alloc_fd+0x131/0x160
[  180.399845]  [<ffffffff810ffa57>] do_sys_open+0x87/0x140
[  180.399845]  [<ffffffff810ffb50>] sys_open+0x20/0x30
[  180.399845]  [<ffffffff8100c072>] system_call_fastpath+0x16/0x1b

I noticed that when udevd is waiting for NFS IO, there is a steady
250K/s network traffic.

Any suggestions on further debugging?

Thanks,
Fengguang

             reply	other threads:[~2009-05-03 10:56 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-03 10:54 Wu Fengguang [this message]
2009-05-03 12:40 ` very slow NFS boot on linux-next and -mm Wu Fengguang
2009-05-03 12:40   ` Wu Fengguang
2009-05-03 12:56   ` Kay Sievers
2009-05-03 12:56     ` Kay Sievers
     [not found]     ` <20090504133019.GA14300@localhost>
2009-05-06 13:42       ` Kay Sievers
2009-05-06 13:42         ` Kay Sievers
2009-05-07 11:59         ` [PATCH] inotify: report rounded-up event size to user space Wu Fengguang
2009-05-07 11:59           ` Wu Fengguang
2009-05-07 12:16           ` Eric Paris
2009-05-07 12:16             ` Eric Paris

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090503105456.GA30449@localhost \
    --to=fengguang.wu@intel.com \
    --cc=Trond.Myklebust@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.