From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nikola Ciprich Subject: 2.6.32 nfs regression? Date: Mon, 28 Dec 2009 13:10:18 +0100 Message-ID: <20091228121018.GA13987@develbox.linuxbox.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: nikola.ciprich-Jp3n8lUXroTtwjQa/ONI9g@public.gmane.org, Linux kernel list To: linux-nfs@vger.kernel.org Return-path: Received: from gwu.lbox.cz ([62.245.111.132]:56865 "EHLO gwu.lbox.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750892AbZL1MTB (ORCPT ); Mon, 28 Dec 2009 07:19:01 -0500 Sender: linux-nfs-owner@vger.kernel.org List-ID: Hi, I just upgraded two my machines to 2.6.32.1, and am getting lots of nfs-related error messages and backtraces. those two repeat quite often: [60144.870302] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket [60144.870436] nfsd: peername failed (err 107)! and the backtraces: [697321.291127] INFO: task nfsd:3707 blocked for more than 120 seconds. [697321.291132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [697321.291135] nfsd D 00000000ffffffff 0 3707 2 0x00000080 [697321.291143] ffff880077997b30 0000000000000046 0000000000000000 ffffffff8106e995 [697321.291150] ffff880000000000 ffff8800779980c0 ffff880077998000 ffffffff8153a220 [697321.291157] ffff8800779983b8 ffff8800751d90f0 000000010a626ceb 0000000000000002 [697321.291164] Call Trace: [697321.291175] [] ? prepare_to_wait+0x25/0x80 [697321.291187] [] start_this_handle+0x246/0x530 [jbd2] [697321.291194] [] ? autoremove_wake_function+0x0/0x40 [697321.291202] [] jbd2_journal_start+0xc3/0xf0 [jbd2] [697321.291223] [] ext4_journal_start_sb+0x31/0x80 [ext4] [697321.291235] [] ext4_dirty_inode+0x25/0x60 [ext4] [697321.291242] [] __mark_inode_dirty+0x35/0x180 [697321.291247] [] inode_setattr+0x6c/0x180 [697321.291259] [] ext4_setattr+0x14b/0x340 [ext4] [697321.291264] [] notify_change+0x177/0x350 [697321.291279] [] nfsd_setattr+0x26d/0x510 [nfsd] [697321.291291] [] nfsd3_proc_setattr+0x7f/0xd0 [nfsd] [697321.291301] [] nfsd_dispatch+0xb5/0x230 [nfsd] [697321.291323] [] svc_process+0x477/0x780 [sunrpc] [697321.291333] [] ? nfsd+0x0/0x150 [nfsd] [697321.291342] [] nfsd+0xbd/0x150 [nfsd] [697321.291347] [] kthread+0x8e/0xa0 [697321.291352] [] child_rip+0xa/0x20 [697321.291357] [] ? kthread+0x0/0xa0 [697321.291361] [] ? child_rip+0x0/0x20 [697321.291365] 2 locks held by nfsd/3707: [697321.291368] #0: (hash_sem){......}, at: [] exp_readlock+0x10/0x20 [nfsd] [697321.291383] #1: (&sb->s_type->i_mutex_key#14){......}, at: [] nfsd_setattr+0x3d7/0x510 [nfsd] [697321.291403] INFO: task jbd2/dm-7-8:4448 blocked for more than 120 seconds. [697321.291406] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [697321.291409] jbd2/dm-7-8 D 00000000ffffffff 0 4448 2 0x00000080 [697321.291415] ffff8800722e7ce0 0000000000000046 0000000000000000 ffffffff8106e995 [697321.291422] ffff880000000000 ffff8800722d4f00 ffff8800722d4e40 ffff88007f8ace40 [697321.291428] ffff8800722d51f8 ffff8800751d9230 000000010a626d1b 0000000000000002 [697321.291435] Call Trace: [697321.291440] [] ? prepare_to_wait+0x25/0x80 [697321.291449] [] jbd2_journal_commit_transaction+0x1ea/0x1890 [jbd2] [697321.291455] [] ? lock_timer_base+0x36/0x70 [697321.291460] [] ? autoremove_wake_function+0x0/0x40 [697321.291466] [] ? try_to_del_timer_sync+0x44/0x100 [697321.291475] [] kjournald2+0xba/0x240 [jbd2] [697321.291479] [] ? autoremove_wake_function+0x0/0x40 [697321.291487] [] ? kjournald2+0x0/0x240 [jbd2] [697321.291492] [] kthread+0x8e/0xa0 [697321.291497] [] child_rip+0xa/0x20 [697321.291501] [] ? kthread+0x0/0xa0 [697321.291506] [] ? child_rip+0x0/0x20 [697321.291509] no locks held by jbd2/dm-7-8/4448. [698041.288035] INFO: task nfsd:3703 blocked for more than 120 seconds. [698041.288040] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [698041.288044] nfsd D 00000000ffffffff 0 3703 2 0x00000080 [698041.288051] ffff88007a309930 0000000000000046 0000000000000000 0000000000000000 [698041.288058] 0000000000000000 ffff88007a1200c0 ffff88007a120000 ffffffff8153a220 [698041.288065] ffff88007a12le_move+0x28/0x60 [698041.288522] [] ? file_move+0x28/0x60 [698041.288527] [] vfs_writev+0x39/0x60 [698041.288537] [] nfsd_vfs_write+0x103/0x410 [nfsd] [698041.288543] [] ? dentry_open+0x4d/0xb0 [698041.288553] [] ? nfsd_open+0x15c/0x1e0 [nfsd] [698041.288563] [] nfsd_write+0xe5/0x100 [nfsd] [698041.288575] [] nfsd3_proc_write+0xfe/0x140 [nfsd] [698041.288584] [] nfsd_dispatch+0xb5/0x230 [nfsd] [698041.288599] [] svc_process+0x477/0x780 [sunrpc] [698041.288610] [] ? nfsd+0x0/0x150 [nfsd] [698041.288618] [] nfsd+0xbd/0x150 [nfsd] [698041.288623] [] kthread+0x8e/0xa0 [698041.288628] [] child_rip+0xa/0x20 there's ext4 mentioned, so I'm not 100% sure if it ain't ext4-related, but on second machine (without ext4) I'm getting similar messages: [704103.575633] nfsd: peername failed (err 107)! [704280.628046] INFO: task reiserfs/0:2637 blocked for more than 120 seconds. [704280.628050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [704280.628053] reiserfs/0 D 0000000000000000 0 2637 2 0x00000000 [704280.628058] ffff880139267c80 0000000000000046 0000000000000000 000000008103d1e9 [704280.628063] 0000000000000000 ffff88013f1400c0 ffff88013f140000 ffffffff8153a220 [704280.628067] ffff88013f1403b8 0000000000000000 000000010a7cf74c 0000000000000000 [704280.628071] Call Trace: [704280.628092] [] ? flush_commit_list+0x145/0x6f0 [reiserfs] [704280.628099] [] mutex_lock_nested+0x132/0x330 [704280.628106] [] ? flush_commit_list+0x145/0x6f0 [reiserfs] [704280.628115] [] flush_commit_list+0x145/0x6f0 [reiserfs] [704280.628123] [] flush_commit_list+0x237/0x6f0 [reiserfs] [704280.628131] [] ? flush_async_commits+0x0/0x70 [reiserfs] [704280.628138] [] flush_async_commits+0x57/0x70 [reiserfs] [704280.628144] [] worker_thread+0x235/0x390 [704280.628147] [] ? worker_thread+0x1e3/0x390 [704280.628151] [] ? autoremove_wake_function+0x0/0x40 [704280.628155] [] ? worker_thread+0x0/0x390 [704280.628158] [] kthread+0x8e/0xa0 [704280.628162] [] child_rip+0xa/0x20 [704280.628165] [] ? kthread+0x0/0xa0 [704280.628168] [] ? child_rip+0x0/0x20 [704280.628170] 3 locks held by reiserfs/0/2637: [704280.628172] #0: (reiserfs){......}, at: [] worker_thread+0x1e3/0x390 [704280.628178] #1: (&(&journal->j_work)->work){......}, at: [] worker_thread+0x1e3/0x390 [704280.628184] #2: (&jl->j_commit_mutex){......}, at: [] flush_commit_list+0x145/0x6f0 [reiserfs] [704329.500101] rpc-srv/tcp: nfsd: got error -32 when sending 140 bytes - shutting down socket which leads me to suspicion the problems are somewhere within nfs code... both machines are SMP x86_64. I'll update to 2.6.32.2 on Wednesday, but according to changelog, it doesn't seem to fix anything related... Could somebody please have a look at this? I'll be glad to provide further information if needed... -- ------------------------------------- Ing. Nikola CIPRICH LinuxBox.cz, s.r.o. 28. rijna 168, 709 01 Ostrava tel.: +420 596 603 142 fax: +420 596 621 273 mobil: +420 777 093 799 www.linuxbox.cz mobil servis: +420 737 238 656 email servis: servis-Jp3n8lUXroTtwjQa/ONI9g@public.gmane.org ------------------------------------- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751403AbZL1MTG (ORCPT ); Mon, 28 Dec 2009 07:19:06 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751282AbZL1MTE (ORCPT ); Mon, 28 Dec 2009 07:19:04 -0500 Received: from gwu.lbox.cz ([62.245.111.132]:56865 "EHLO gwu.lbox.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750892AbZL1MTB (ORCPT ); Mon, 28 Dec 2009 07:19:01 -0500 X-Greylist: delayed 587 seconds by postgrey-1.27 at vger.kernel.org; Mon, 28 Dec 2009 07:19:00 EST Date: Mon, 28 Dec 2009 13:10:18 +0100 From: Nikola Ciprich To: linux-nfs@vger.kernel.org Cc: nikola.ciprich@linuxbox.cz, Linux kernel list Subject: 2.6.32 nfs regression? Message-ID: <20091228121018.GA13987@develbox.linuxbox.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-12-10) X-Milter-Copy-Status: I X-Antivirus: on proxybox by Kaspersky antivirus, engine 5.5.10, data 3167693 records(28-12-2009) X-Envelope-From: ciprich@linuxbox.cz X-Spam-Score: N/A (trusted relay) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, I just upgraded two my machines to 2.6.32.1, and am getting lots of nfs-related error messages and backtraces. those two repeat quite often: [60144.870302] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket [60144.870436] nfsd: peername failed (err 107)! and the backtraces: [697321.291127] INFO: task nfsd:3707 blocked for more than 120 seconds. [697321.291132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [697321.291135] nfsd D 00000000ffffffff 0 3707 2 0x00000080 [697321.291143] ffff880077997b30 0000000000000046 0000000000000000 ffffffff8106e995 [697321.291150] ffff880000000000 ffff8800779980c0 ffff880077998000 ffffffff8153a220 [697321.291157] ffff8800779983b8 ffff8800751d90f0 000000010a626ceb 0000000000000002 [697321.291164] Call Trace: [697321.291175] [] ? prepare_to_wait+0x25/0x80 [697321.291187] [] start_this_handle+0x246/0x530 [jbd2] [697321.291194] [] ? autoremove_wake_function+0x0/0x40 [697321.291202] [] jbd2_journal_start+0xc3/0xf0 [jbd2] [697321.291223] [] ext4_journal_start_sb+0x31/0x80 [ext4] [697321.291235] [] ext4_dirty_inode+0x25/0x60 [ext4] [697321.291242] [] __mark_inode_dirty+0x35/0x180 [697321.291247] [] inode_setattr+0x6c/0x180 [697321.291259] [] ext4_setattr+0x14b/0x340 [ext4] [697321.291264] [] notify_change+0x177/0x350 [697321.291279] [] nfsd_setattr+0x26d/0x510 [nfsd] [697321.291291] [] nfsd3_proc_setattr+0x7f/0xd0 [nfsd] [697321.291301] [] nfsd_dispatch+0xb5/0x230 [nfsd] [697321.291323] [] svc_process+0x477/0x780 [sunrpc] [697321.291333] [] ? nfsd+0x0/0x150 [nfsd] [697321.291342] [] nfsd+0xbd/0x150 [nfsd] [697321.291347] [] kthread+0x8e/0xa0 [697321.291352] [] child_rip+0xa/0x20 [697321.291357] [] ? kthread+0x0/0xa0 [697321.291361] [] ? child_rip+0x0/0x20 [697321.291365] 2 locks held by nfsd/3707: [697321.291368] #0: (hash_sem){......}, at: [] exp_readlock+0x10/0x20 [nfsd] [697321.291383] #1: (&sb->s_type->i_mutex_key#14){......}, at: [] nfsd_setattr+0x3d7/0x510 [nfsd] [697321.291403] INFO: task jbd2/dm-7-8:4448 blocked for more than 120 seconds. [697321.291406] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [697321.291409] jbd2/dm-7-8 D 00000000ffffffff 0 4448 2 0x00000080 [697321.291415] ffff8800722e7ce0 0000000000000046 0000000000000000 ffffffff8106e995 [697321.291422] ffff880000000000 ffff8800722d4f00 ffff8800722d4e40 ffff88007f8ace40 [697321.291428] ffff8800722d51f8 ffff8800751d9230 000000010a626d1b 0000000000000002 [697321.291435] Call Trace: [697321.291440] [] ? prepare_to_wait+0x25/0x80 [697321.291449] [] jbd2_journal_commit_transaction+0x1ea/0x1890 [jbd2] [697321.291455] [] ? lock_timer_base+0x36/0x70 [697321.291460] [] ? autoremove_wake_function+0x0/0x40 [697321.291466] [] ? try_to_del_timer_sync+0x44/0x100 [697321.291475] [] kjournald2+0xba/0x240 [jbd2] [697321.291479] [] ? autoremove_wake_function+0x0/0x40 [697321.291487] [] ? kjournald2+0x0/0x240 [jbd2] [697321.291492] [] kthread+0x8e/0xa0 [697321.291497] [] child_rip+0xa/0x20 [697321.291501] [] ? kthread+0x0/0xa0 [697321.291506] [] ? child_rip+0x0/0x20 [697321.291509] no locks held by jbd2/dm-7-8/4448. [698041.288035] INFO: task nfsd:3703 blocked for more than 120 seconds. [698041.288040] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [698041.288044] nfsd D 00000000ffffffff 0 3703 2 0x00000080 [698041.288051] ffff88007a309930 0000000000000046 0000000000000000 0000000000000000 [698041.288058] 0000000000000000 ffff88007a1200c0 ffff88007a120000 ffffffff8153a220 [698041.288065] ffff88007a12le_move+0x28/0x60 [698041.288522] [] ? file_move+0x28/0x60 [698041.288527] [] vfs_writev+0x39/0x60 [698041.288537] [] nfsd_vfs_write+0x103/0x410 [nfsd] [698041.288543] [] ? dentry_open+0x4d/0xb0 [698041.288553] [] ? nfsd_open+0x15c/0x1e0 [nfsd] [698041.288563] [] nfsd_write+0xe5/0x100 [nfsd] [698041.288575] [] nfsd3_proc_write+0xfe/0x140 [nfsd] [698041.288584] [] nfsd_dispatch+0xb5/0x230 [nfsd] [698041.288599] [] svc_process+0x477/0x780 [sunrpc] [698041.288610] [] ? nfsd+0x0/0x150 [nfsd] [698041.288618] [] nfsd+0xbd/0x150 [nfsd] [698041.288623] [] kthread+0x8e/0xa0 [698041.288628] [] child_rip+0xa/0x20 there's ext4 mentioned, so I'm not 100% sure if it ain't ext4-related, but on second machine (without ext4) I'm getting similar messages: [704103.575633] nfsd: peername failed (err 107)! [704280.628046] INFO: task reiserfs/0:2637 blocked for more than 120 seconds. [704280.628050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [704280.628053] reiserfs/0 D 0000000000000000 0 2637 2 0x00000000 [704280.628058] ffff880139267c80 0000000000000046 0000000000000000 000000008103d1e9 [704280.628063] 0000000000000000 ffff88013f1400c0 ffff88013f140000 ffffffff8153a220 [704280.628067] ffff88013f1403b8 0000000000000000 000000010a7cf74c 0000000000000000 [704280.628071] Call Trace: [704280.628092] [] ? flush_commit_list+0x145/0x6f0 [reiserfs] [704280.628099] [] mutex_lock_nested+0x132/0x330 [704280.628106] [] ? flush_commit_list+0x145/0x6f0 [reiserfs] [704280.628115] [] flush_commit_list+0x145/0x6f0 [reiserfs] [704280.628123] [] flush_commit_list+0x237/0x6f0 [reiserfs] [704280.628131] [] ? flush_async_commits+0x0/0x70 [reiserfs] [704280.628138] [] flush_async_commits+0x57/0x70 [reiserfs] [704280.628144] [] worker_thread+0x235/0x390 [704280.628147] [] ? worker_thread+0x1e3/0x390 [704280.628151] [] ? autoremove_wake_function+0x0/0x40 [704280.628155] [] ? worker_thread+0x0/0x390 [704280.628158] [] kthread+0x8e/0xa0 [704280.628162] [] child_rip+0xa/0x20 [704280.628165] [] ? kthread+0x0/0xa0 [704280.628168] [] ? child_rip+0x0/0x20 [704280.628170] 3 locks held by reiserfs/0/2637: [704280.628172] #0: (reiserfs){......}, at: [] worker_thread+0x1e3/0x390 [704280.628178] #1: (&(&journal->j_work)->work){......}, at: [] worker_thread+0x1e3/0x390 [704280.628184] #2: (&jl->j_commit_mutex){......}, at: [] flush_commit_list+0x145/0x6f0 [reiserfs] [704329.500101] rpc-srv/tcp: nfsd: got error -32 when sending 140 bytes - shutting down socket which leads me to suspicion the problems are somewhere within nfs code... both machines are SMP x86_64. I'll update to 2.6.32.2 on Wednesday, but according to changelog, it doesn't seem to fix anything related... Could somebody please have a look at this? I'll be glad to provide further information if needed... -- ------------------------------------- Ing. Nikola CIPRICH LinuxBox.cz, s.r.o. 28. rijna 168, 709 01 Ostrava tel.: +420 596 603 142 fax: +420 596 621 273 mobil: +420 777 093 799 www.linuxbox.cz mobil servis: +420 737 238 656 email servis: servis@linuxbox.cz -------------------------------------