linux-nfs.vger.kernel.org archive mirror
From: Hans de Bruin <jmdebruin@xmsnet.nl>
To: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Linux NFS mailing list <linux-nfs@vger.kernel.org>
Subject: Re: nfsroot client will not start firefox or thunderbird from 3.4.0 nfsserver
Date: Mon, 04 Jun 2012 22:13:41 +0200
Message-ID: <4FCD16F5.20407@xmsnet.nl>
In-Reply-To: <4FCC8E6D.9020106@openvz.org>

On 06/04/2012 12:31 PM, Konstantin Khlebnikov wrote:
> Hans de Bruin wrote:
>> On 06/01/2012 09:11 PM, Hans de Bruin wrote:
>>> On 05/29/2012 12:19 AM, Hans de Bruin wrote:
>>>> I just upgraded my home server from kernel 3.3.5 to 3.4.0 and ran into
>>>> some trouble. My laptop, an nfsroot client, will not run firefox and
>>>> thunderbird anymore. When I start these programs from an xterm, the
>>>> cursor goes to the next line and waits indefinitely.
>>>>
>>>> I do not know if there is any order in lsof's output. Running lsof | grep
>>>> firefox (or thunderbird) shows ......./.parentlock as the last line.
>>>>
>>>> It does not matter whether the client is running a 3.4.0 or a 3.3.0
>>>> kernel, or if the server is running on top of xen or not.
>>>>
>>>> There is some noise in the server's dmesg:
>>>>
>>>> [ 241.256684] INFO: task kworker/u:2:801 blocked for more than 120 seconds.
>>>> [ 241.256691] "echo 0> /proc/sys/kernel/hung_task_timeout_secs"
>>>
>>> ...
>>>
>>> On an almost identical test system, firefox and thunderbird segfault after
>>> upgrading to 3.4.0. It would have been nice if it behaved exactly
>>> like my home server. I bisected the segfault to:
>>>
>>> commit 0fc9d1040313047edf6a39fd4d7c7defdca97c62
>>> Author: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>> Date: Wed Mar 28 14:42:54 2012 -0700
>>>
>>> radix-tree: use iterators in find_get_pages* functions
>>>
>>>
>>> When I revert that commit on top of 3.4.0, the segfaults are gone, but both
>>> firefox and thunderbird go into the same wait-indefinitely mode as the
>>> home server.
>>>
>>> I am going to make a bit-wise copy from my home server to my
>>> test server and try again.
>>>
>>
>> The bit-wise copy also segfaults firefox and thunderbird at the same
>> commit.
>>
>
> I think the bug is somewhere in NFS; that patch only exposed it.
> Please try running it with the debug patch from the attachment.

Before I could start firefox from an xterm, the lines below appeared on
the server:

[  241.260076] INFO: task kworker/u:2:791 blocked for more than 120 seconds.
[  241.260084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  241.260090] kworker/u:2     D 000000000000000c     0   791      2 0x00000000
[  241.260102]  ffff8801390b1cf0 0000000000000046 0000000000012d00 0000000000012d00
[  241.260113]  0000000000012d00 ffff880139141470 0000000000012d00 ffff8801390b1fd8
[  241.260124]  ffff8801390b1fd8 0000000000012d00 ffff880139cdc420 ffff880139141470
[  241.260135] Call Trace:
[  241.260152]  [<ffffffff81513116>] schedule+0x64/0x66
[  241.260162]  [<ffffffff812005a6>] cld_pipe_upcall+0x95/0xd1
[  241.260173]  [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[  241.260182]  [<ffffffff81200a5e>] nfsd4_cld_grace_done+0x50/0x8a
[  241.260191]  [<ffffffff81200f8b>] nfsd4_record_grace_done+0x18/0x1a
[  241.260200]  [<ffffffff811fab2f>] laundromat_main+0x4a/0x213
[  241.260210]  [<ffffffff81069aeb>] ? need_resched+0x1e/0x28
[  241.260218]  [<ffffffff81513035>] ? __schedule+0x49d/0x4b5
[  241.260227]  [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[  241.260237]  [<ffffffff8105b8ad>] process_one_work+0x190/0x28d
[  241.260248]  [<ffffffff8105c4e7>] worker_thread+0x105/0x189
[  241.260260]  [<ffffffff81513a75>] ? _raw_spin_unlock_irqrestore+0x1a/0x1d
[  241.260274]  [<ffffffff8105c3e2>] ? manage_workers.clone.17+0x173/0x173
[  241.260287]  [<ffffffff8105ff30>] kthread+0x8a/0x92
[  241.260325]  [<ffffffff815158a4>] kernel_thread_helper+0x4/0x10
[  241.260335]  [<ffffffff8105fea6>] ? kthread_freezable_should_stop+0x47/0x47
[  241.260343]  [<ffffffff815158a0>] ? gs_change+0x13/0x13
[  361.260025] INFO: task kworker/u:2:791 blocked for more than 120 seconds.
[  361.260032] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.260039] kworker/u:2     D 000000000000000c     0   791      2 0x00000000
[  361.260051]  ffff8801390b1cf0 0000000000000046 0000000000012d00 0000000000012d00
[  361.260062]  0000000000012d00 ffff880139141470 0000000000012d00 ffff8801390b1fd8
[  361.260072]  ffff8801390b1fd8 0000000000012d00 ffff880139cdc420 ffff880139141470
[  361.260084] Call Trace:
[  361.260099]  [<ffffffff81513116>] schedule+0x64/0x66
[  361.260110]  [<ffffffff812005a6>] cld_pipe_upcall+0x95/0xd1
[  361.260121]  [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[  361.260130]  [<ffffffff81200a5e>] nfsd4_cld_grace_done+0x50/0x8a
[  361.260139]  [<ffffffff81200f8b>] nfsd4_record_grace_done+0x18/0x1a
[  361.260148]  [<ffffffff811fab2f>] laundromat_main+0x4a/0x213
[  361.260158]  [<ffffffff81069aeb>] ? need_resched+0x1e/0x28
[  361.260166]  [<ffffffff81513035>] ? __schedule+0x49d/0x4b5
[  361.260175]  [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[  361.260185]  [<ffffffff8105b8ad>] process_one_work+0x190/0x28d
[  361.260194]  [<ffffffff8105c4e7>] worker_thread+0x105/0x189
[  361.260203]  [<ffffffff81513a75>] ? _raw_spin_unlock_irqrestore+0x1a/0x1d
[  361.260213]  [<ffffffff8105c3e2>] ? manage_workers.clone.17+0x173/0x173
[  361.260222]  [<ffffffff8105ff30>] kthread+0x8a/0x92
[  361.260231]  [<ffffffff815158a4>] kernel_thread_helper+0x4/0x10
[  361.260241]  [<ffffffff8105fea6>] ? kthread_freezable_should_stop+0x47/0x47
[  361.260249]  [<ffffffff815158a0>] ? gs_change+0x13/0x13
[  481.260010] INFO: task kworker/u:2:791 blocked for more than 120 seconds.
[  481.260019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  481.260028] kworker/u:2     D 000000000000000c     0   791      2 0x00000000
[  481.260043]  ffff8801390b1cf0 0000000000000046 0000000000012d00 0000000000012d00
[  481.260058]  0000000000012d00 ffff880139141470 0000000000012d00 ffff8801390b1fd8
[  481.260073]  ffff8801390b1fd8 0000000000012d00 ffff880139cdc420 ffff880139141470
[  481.260088] Call Trace:
[  481.260107]  [<ffffffff81513116>] schedule+0x64/0x66
[  481.260120]  [<ffffffff812005a6>] cld_pipe_upcall+0x95/0xd1
[  481.260135]  [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[  481.260147]  [<ffffffff81200a5e>] nfsd4_cld_grace_done+0x50/0x8a
[  481.260159]  [<ffffffff81200f8b>] nfsd4_record_grace_done+0x18/0x1a
[  481.260172]  [<ffffffff811fab2f>] laundromat_main+0x4a/0x213
[  481.260185]  [<ffffffff81069aeb>] ? need_resched+0x1e/0x28
[  481.260196]  [<ffffffff81513035>] ? __schedule+0x49d/0x4b5
[  481.260206]  [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[  481.260215]  [<ffffffff8105b8ad>] process_one_work+0x190/0x28d
[  481.260225]  [<ffffffff8105c4e7>] worker_thread+0x105/0x189
[  481.260234]  [<ffffffff81513a75>] ? _raw_spin_unlock_irqrestore+0x1a/0x1d
[  481.260243]  [<ffffffff8105c3e2>] ? manage_workers.clone.17+0x173/0x173
[  481.260252]  [<ffffffff8105ff30>] kthread+0x8a/0x92
[  481.260262]  [<ffffffff815158a4>] kernel_thread_helper+0x4/0x10
[  481.260271]  [<ffffffff8105fea6>] ? kthread_freezable_should_stop+0x47/0x47
[  481.260279]  [<ffffffff815158a0>] ? gs_change+0x13/0x13


dmesg on the client side:

[   27.607606] gtk-query-immod[1976]: segfault at 1d2d1f30 ip b7734391 sp bfe3e984 error 4 in ld-2.13.so[b772b000+1d000]
[   48.136763] start_kdeinit (2086): /proc/2086/oom_adj is deprecated, please use /proc/2086/oom_score_adj instead.
[   75.801804] blueman-applet[2150]: segfault at 1cf2cf30 ip b7741391 sp bfb456b8 error 4 in ld-2.13.so[b7738000+1d000]
[  140.226371] firefox[2175]: segfault at 1b065f30 ip b76f6391 sp bfb15db8 error 4 in ld-2.13.so[b76ed000+1d000]
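
A minimal sketch, not part of the original report, for reading the three segfault lines above: it computes where each crash lands inside ld-2.13.so (the ip value minus the mapping base shown in brackets) and decodes the "error" field. It assumes the standard x86 page-fault error-code bits (bit 0 set: protection violation, clear: page not present; bit 1: write access; bit 2: fault raised in user mode) and that the kernel prints the code in hex; the helper name decode_segfault is hypothetical.

```python
import re

# The three client-side segfault lines from the report, with mail wrapping undone.
LINES = [
    "gtk-query-immod[1976]: segfault at 1d2d1f30 ip b7734391 sp bfe3e984 error 4 in ld-2.13.so[b772b000+1d000]",
    "blueman-applet[2150]: segfault at 1cf2cf30 ip b7741391 sp bfb456b8 error 4 in ld-2.13.so[b7738000+1d000]",
    "firefox[2175]: segfault at 1b065f30 ip b76f6391 sp bfb15db8 error 4 in ld-2.13.so[b76ed000+1d000]",
]

# program[pid]: segfault at ADDR ip IP sp SP error ERR in OBJECT[BASE+SIZE]
PATTERN = re.compile(
    r"(?P<comm>\S+)\[\d+\]: segfault at [0-9a-f]+ "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)


def decode_segfault(line):
    """Hypothetical helper: return (program, object, offset in object, fault description)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line: " + line)
    # Offset of the faulting instruction relative to the start of the mapping.
    offset = int(m["ip"], 16) - int(m["base"], 16)
    err = int(m["err"], 16)  # assumption: error code is printed in hex
    fault = "protection violation" if err & 1 else "not-present page"
    access = "write" if err & 2 else "read"
    mode = "user" if err & 4 else "kernel"
    return m["comm"], m["obj"], offset, f"{mode}-mode {access}, {fault}"


if __name__ == "__main__":
    for line in LINES:
        comm, obj, offset, desc = decode_segfault(line)
        print(f"{comm}: {obj}+{offset:#x} ({desc})")
```

Applied to these lines, all three crashes land at the same offset, 0x9391, inside ld-2.13.so, and error 4 decodes to a user-mode read of a not-present page.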


The firefox crash on the client side produces no messages on the server side.

The md5sums of ld-2.13.so are identical on the server and the client, and
across kernel versions.



Did I miss the output of the debug patch?


-- 
Hans

Thread overview: 12+ messages
2012-05-28 22:19 nfsroot client will not start firefox or thunderbird from 3.4.0 nfsserver Hans de Bruin
2012-06-01 19:11 ` Hans de Bruin
2012-06-03 15:00   ` Hans de Bruin
2012-06-04 10:31     ` Konstantin Khlebnikov
2012-06-04 20:13       ` Hans de Bruin [this message]
2012-06-08 20:51 ` Hans de Bruin
2012-06-10  9:52 ` Jeff Layton
2012-06-10 13:56   ` Hans de Bruin
2012-06-11 10:22     ` Jeff Layton
2012-06-11 11:11     ` Jeff Layton
2012-06-13 18:18       ` Hans de Bruin
2012-06-14  1:37         ` Jeff Layton