From: Hans de Bruin <jmdebruin@xmsnet.nl>
To: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Linux NFS mailing list <linux-nfs@vger.kernel.org>
Subject: Re: nfsroot client will not start firefox or thunderbird from 3.4.0 nfsserver
Date: Mon, 04 Jun 2012 22:13:41 +0200 [thread overview]
Message-ID: <4FCD16F5.20407@xmsnet.nl> (raw)
In-Reply-To: <4FCC8E6D.9020106@openvz.org>
On 06/04/2012 12:31 PM, Konstantin Khlebnikov wrote:
> Hans de Bruin wrote:
>> On 06/01/2012 09:11 PM, Hans de Bruin wrote:
>>> On 05/29/2012 12:19 AM, Hans de Bruin wrote:
>>>> I just upgraded my home server from kernel 3.3.5 to 3.4.0 and ran into
>>>> some trouble. My laptop, a nfsroot client, will not run firefox and
>>>> thunderbird anymore. When I start these programs from an xterm, the
>>>> cursor goes to the next line and waits indefinitely.
>>>>
>>>> I do not know if there is any order is lsof's output. A lsof | grep
>>>> firefox or thunderbird shows ......./.parentlock as the last line.
>>>>
>>>> It does not matter whether the client is running a 3.4.0 or a 3.3.0
>>>> kernel, or if the server is running on top of xen or not.
>>>>
>>>> There is some noise in the servers dmesg:
>>>>
>>>> [ 241.256684] INFO: task kworker/u:2:801 blocked for more than 120
>>>> seconds.
>>>> [ 241.256691] "echo 0> /proc/sys/kernel/hung_task_timeout_secs"
>>>
>>> ...
>>>
>>> On a almost identical testsystem firefox en thunderbird segfault after
>>> upgrading to 3.4.0. I would have been nice if it would behave exaclty
>>> like my home server. I bisected the segfault to:
>>>
>>> commit 0fc9d1040313047edf6a39fd4d7c7defdca97c62
>>> Author: Konstantin Khlebnikov<khlebnikov@openvz.org>
>>> Date: Wed Mar 28 14:42:54 2012 -0700
>>>
>>> radix-tree: use iterators in find_get_pages* functions
>>>
>>>
>>> When I revert that on top of 3.4.0 the segfaults are gone but both
>>> firefox en thunderbird go in the lets wait indefinitely mode like the
>>> homeserver.
>>>
>>> I am going to make a bit-wise copy from from my homeserver to my
>>> testserver and try again.
>>>
>>
>> The bit-wise copy also segfaults firefox and thunderbird at the same
>> commit.
>>
>
> I think bug somewhere in NFS, that patch only highlighted it.
> Please, try to run it with debug patch from attachment.
Before I can start firefox from an xterm the lines below are shown on
the server:
[ 241.260076] INFO: task kworker/u:2:791 blocked for more than 120 seconds.
[ 241.260084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 241.260090] kworker/u:2 D 000000000000000c 0 791 2
0x00000000
[ 241.260102] ffff8801390b1cf0 0000000000000046 0000000000012d00
0000000000012d00
[ 241.260113] 0000000000012d00 ffff880139141470 0000000000012d00
ffff8801390b1fd8
[ 241.260124] ffff8801390b1fd8 0000000000012d00 ffff880139cdc420
ffff880139141470
[ 241.260135] Call Trace:
[ 241.260152] [<ffffffff81513116>] schedule+0x64/0x66
[ 241.260162] [<ffffffff812005a6>] cld_pipe_upcall+0x95/0xd1
[ 241.260173] [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[ 241.260182] [<ffffffff81200a5e>] nfsd4_cld_grace_done+0x50/0x8a
[ 241.260191] [<ffffffff81200f8b>] nfsd4_record_grace_done+0x18/0x1a
[ 241.260200] [<ffffffff811fab2f>] laundromat_main+0x4a/0x213
[ 241.260210] [<ffffffff81069aeb>] ? need_resched+0x1e/0x28
[ 241.260218] [<ffffffff81513035>] ? __schedule+0x49d/0x4b5
[ 241.260227] [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[ 241.260237] [<ffffffff8105b8ad>] process_one_work+0x190/0x28d
[ 241.260248] [<ffffffff8105c4e7>] worker_thread+0x105/0x189
[ 241.260260] [<ffffffff81513a75>] ? _raw_spin_unlock_irqrestore+0x1a/0x1d
[ 241.260274] [<ffffffff8105c3e2>] ? manage_workers.clone.17+0x173/0x173
[ 241.260287] [<ffffffff8105ff30>] kthread+0x8a/0x92
[ 241.260325] [<ffffffff815158a4>] kernel_thread_helper+0x4/0x10
[ 241.260335] [<ffffffff8105fea6>] ?
kthread_freezable_should_stop+0x47/0x47
[ 241.260343] [<ffffffff815158a0>] ? gs_change+0x13/0x13
[ 361.260025] INFO: task kworker/u:2:791 blocked for more than 120 seconds.
[ 361.260032] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 361.260039] kworker/u:2 D 000000000000000c 0 791 2
0x00000000
[ 361.260051] ffff8801390b1cf0 0000000000000046 0000000000012d00
0000000000012d00
[ 361.260062] 0000000000012d00 ffff880139141470 0000000000012d00
ffff8801390b1fd8
[ 361.260072] ffff8801390b1fd8 0000000000012d00 ffff880139cdc420
ffff880139141470
[ 361.260084] Call Trace:
[ 361.260099] [<ffffffff81513116>] schedule+0x64/0x66
[ 361.260110] [<ffffffff812005a6>] cld_pipe_upcall+0x95/0xd1
[ 361.260121] [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[ 361.260130] [<ffffffff81200a5e>] nfsd4_cld_grace_done+0x50/0x8a
[ 361.260139] [<ffffffff81200f8b>] nfsd4_record_grace_done+0x18/0x1a
[ 361.260148] [<ffffffff811fab2f>] laundromat_main+0x4a/0x213
[ 361.260158] [<ffffffff81069aeb>] ? need_resched+0x1e/0x28
[ 361.260166] [<ffffffff81513035>] ? __schedule+0x49d/0x4b5
[ 361.260175] [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[ 361.260185] [<ffffffff8105b8ad>] process_one_work+0x190/0x28d
[ 361.260194] [<ffffffff8105c4e7>] worker_thread+0x105/0x189
[ 361.260203] [<ffffffff81513a75>] ? _raw_spin_unlock_irqrestore+0x1a/0x1d
[ 361.260213] [<ffffffff8105c3e2>] ? manage_workers.clone.17+0x173/0x173
[ 361.260222] [<ffffffff8105ff30>] kthread+0x8a/0x92
[ 361.260231] [<ffffffff815158a4>] kernel_thread_helper+0x4/0x10
[ 361.260241] [<ffffffff8105fea6>] ?
kthread_freezable_should_stop+0x47/0x47
[ 361.260249] [<ffffffff815158a0>] ? gs_change+0x13/0x13
[ 481.260010] INFO: task kworker/u:2:791 blocked for more than 120 seconds.
[ 481.260019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 481.260028] kworker/u:2 D 000000000000000c 0 791 2
0x00000000
[ 481.260043] ffff8801390b1cf0 0000000000000046 0000000000012d00
0000000000012d00
[ 481.260058] 0000000000012d00 ffff880139141470 0000000000012d00
ffff8801390b1fd8
[ 481.260073] ffff8801390b1fd8 0000000000012d00 ffff880139cdc420
ffff880139141470
[ 481.260088] Call Trace:
[ 481.260107] [<ffffffff81513116>] schedule+0x64/0x66
[ 481.260120] [<ffffffff812005a6>] cld_pipe_upcall+0x95/0xd1
[ 481.260135] [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[ 481.260147] [<ffffffff81200a5e>] nfsd4_cld_grace_done+0x50/0x8a
[ 481.260159] [<ffffffff81200f8b>] nfsd4_record_grace_done+0x18/0x1a
[ 481.260172] [<ffffffff811fab2f>] laundromat_main+0x4a/0x213
[ 481.260185] [<ffffffff81069aeb>] ? need_resched+0x1e/0x28
[ 481.260196] [<ffffffff81513035>] ? __schedule+0x49d/0x4b5
[ 481.260206] [<ffffffff811faae5>] ? nfsd4_exchange_id+0x23e/0x23e
[ 481.260215] [<ffffffff8105b8ad>] process_one_work+0x190/0x28d
[ 481.260225] [<ffffffff8105c4e7>] worker_thread+0x105/0x189
[ 481.260234] [<ffffffff81513a75>] ? _raw_spin_unlock_irqrestore+0x1a/0x1d
[ 481.260243] [<ffffffff8105c3e2>] ? manage_workers.clone.17+0x173/0x173
[ 481.260252] [<ffffffff8105ff30>] kthread+0x8a/0x92
[ 481.260262] [<ffffffff815158a4>] kernel_thread_helper+0x4/0x10
[ 481.260271] [<ffffffff8105fea6>] ?
kthread_freezable_should_stop+0x47/0x47
[ 481.260279] [<ffffffff815158a0>] ? gs_change+0x13/0x13
dmesg on the client side:
[ 27.607606] gtk-query-immod[1976]: segfault at 1d2d1f30 ip b7734391
sp bfe3e984 error 4 in ld-2.13.so[b772b000+1d000]
[ 48.136763] start_kdeinit (2086): /proc/2086/oom_adj is deprecated,
please use /proc/2086/oom_score_adj instead.
[ 75.801804] blueman-applet[2150]: segfault at 1cf2cf30 ip b7741391 sp
bfb456b8 error 4 in ld-2.13.so[b7738000+1d000]
[ 140.226371] firefox[2175]: segfault at 1b065f30 ip b76f6391 sp
bfb15db8 error 4 in ld-2.13.so[b76ed000+1d000]
The firefox dump on client side produces no messages on server side.
md5sum's of ld-2.13.so are equal on server and client and across
kernlversions.
Did I miss the output off the debug patch?
--
Hans
next prev parent reply other threads:[~2012-06-04 20:13 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-28 22:19 nfsroot client will not start firefox or thunderbird from 3.4.0 nfsserver Hans de Bruin
2012-06-01 19:11 ` Hans de Bruin
2012-06-03 15:00 ` Hans de Bruin
2012-06-04 10:31 ` Konstantin Khlebnikov
2012-06-04 20:13 ` Hans de Bruin [this message]
2012-06-08 20:51 ` Hans de Bruin
2012-06-10 9:52 ` Jeff Layton
2012-06-10 13:56 ` Hans de Bruin
2012-06-11 10:22 ` Jeff Layton
2012-06-11 11:11 ` Jeff Layton
2012-06-13 18:18 ` Hans de Bruin
2012-06-14 1:37 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FCD16F5.20407@xmsnet.nl \
--to=jmdebruin@xmsnet.nl \
--cc=khlebnikov@openvz.org \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.