From mboxrd@z Thu Jan 1 00:00:00 1970 From: "J. Bruce Fields" Subject: Re: 2.6.32 nfs regression? Date: Wed, 6 Jan 2010 14:05:34 -0500 Message-ID: <20100106190534.GD6612@fieldses.org> References: <20091228121018.GA13987@develbox.linuxbox.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: linux-nfs@vger.kernel.org, nikola.ciprich@linuxbox.cz, Linux kernel list , linux-ext4@vger.kernel.org To: Nikola Ciprich Return-path: Content-Disposition: inline In-Reply-To: <20091228121018.GA13987@develbox.linuxbox.cz> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org On Mon, Dec 28, 2009 at 01:10:18PM +0100, Nikola Ciprich wrote: > Hi, > I just upgraded two my machines to 2.6.32.1, and am getting lots of nfs-related > error messages and backtraces. > > those two repeat quite often: > [60144.870302] rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket -ECONNRESET (I wonder why?) > [60144.870436] nfsd: peername failed (err 107)! This looks harmless. > and the backtraces: > [697321.291127] INFO: task nfsd:3707 blocked for more than 120 seconds. > [697321.291132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [697321.291135] nfsd D 00000000ffffffff 0 3707 2 0x00000080 > [697321.291143] ffff880077997b30 0000000000000046 0000000000000000 ffffffff8106e995 > [697321.291150] ffff880000000000 ffff8800779980c0 ffff880077998000 ffffffff8153a220 > [697321.291157] ffff8800779983b8 ffff8800751d90f0 000000010a626ceb 0000000000000002 > [697321.291164] Call Trace: > [697321.291175] [] ? prepare_to_wait+0x25/0x80 > [697321.291187] [] start_this_handle+0x246/0x530 [jbd2] > [697321.291194] [] ? autoremove_wake_function+0x0/0x40 > [697321.291202] [] jbd2_journal_start+0xc3/0xf0 [jbd2] > [697321.291223] [] ext4_journal_start_sb+0x31/0x80 [ext4] > [697321.291235] [] ext4_dirty_inode+0x25/0x60 [ext4] > [697321.291242] [] __mark_inode_dirty+0x35/0x180 > [697321.291247] [] inode_setattr+0x6c/0x180 > [697321.291259] [] ext4_setattr+0x14b/0x340 [ext4] This looks likely to be an ext4 problem, but I don't know. --b. > [697321.291264] [] notify_change+0x177/0x350 > [697321.291279] [] nfsd_setattr+0x26d/0x510 [nfsd] > [697321.291291] [] nfsd3_proc_setattr+0x7f/0xd0 [nfsd] > [697321.291301] [] nfsd_dispatch+0xb5/0x230 [nfsd] > [697321.291323] [] svc_process+0x477/0x780 [sunrpc] > [697321.291333] [] ? nfsd+0x0/0x150 [nfsd] > [697321.291342] [] nfsd+0xbd/0x150 [nfsd] > [697321.291347] [] kthread+0x8e/0xa0 > [697321.291352] [] child_rip+0xa/0x20 > [697321.291357] [] ? kthread+0x0/0xa0 > [697321.291361] [] ? child_rip+0x0/0x20 > [697321.291365] 2 locks held by nfsd/3707: > [697321.291368] #0: (hash_sem){......}, at: [] exp_readlock+0x10/0x20 [nfsd] > [697321.291383] #1: (&sb->s_type->i_mutex_key#14){......}, at: [] nfsd_setattr+0x3d7/0x510 [nfsd] > [697321.291403] INFO: task jbd2/dm-7-8:4448 blocked for more than 120 seconds. > [697321.291406] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [697321.291409] jbd2/dm-7-8 D 00000000ffffffff 0 4448 2 0x00000080 > [697321.291415] ffff8800722e7ce0 0000000000000046 0000000000000000 ffffffff8106e995 > [697321.291422] ffff880000000000 ffff8800722d4f00 ffff8800722d4e40 ffff88007f8ace40 > [697321.291428] ffff8800722d51f8 ffff8800751d9230 000000010a626d1b 0000000000000002 > [697321.291435] Call Trace: > [697321.291440] [] ? prepare_to_wait+0x25/0x80 > [697321.291449] [] jbd2_journal_commit_transaction+0x1ea/0x1890 [jbd2] > [697321.291455] [] ? lock_timer_base+0x36/0x70 > [697321.291460] [] ? autoremove_wake_function+0x0/0x40 > [697321.291466] [] ? try_to_del_timer_sync+0x44/0x100 > [697321.291475] [] kjournald2+0xba/0x240 [jbd2] > [697321.291479] [] ? autoremove_wake_function+0x0/0x40 > [697321.291487] [] ? kjournald2+0x0/0x240 [jbd2] > [697321.291492] [] kthread+0x8e/0xa0 > [697321.291497] [] child_rip+0xa/0x20 > [697321.291501] [] ? kthread+0x0/0xa0 > [697321.291506] [] ? child_rip+0x0/0x20 > [697321.291509] no locks held by jbd2/dm-7-8/4448. > [698041.288035] INFO: task nfsd:3703 blocked for more than 120 seconds. > [698041.288040] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [698041.288044] nfsd D 00000000ffffffff 0 3703 2 0x00000080 > [698041.288051] ffff88007a309930 0000000000000046 0000000000000000 0000000000000000 > [698041.288058] 0000000000000000 ffff88007a1200c0 ffff88007a120000 ffffffff8153a220 > [698041.288065] ffff88007a12le_move+0x28/0x60 > [698041.288522] [] ? file_move+0x28/0x60 > [698041.288527] [] vfs_writev+0x39/0x60 > [698041.288537] [] nfsd_vfs_write+0x103/0x410 [nfsd] > [698041.288543] [] ? dentry_open+0x4d/0xb0 > [698041.288553] [] ? nfsd_open+0x15c/0x1e0 [nfsd] > [698041.288563] [] nfsd_write+0xe5/0x100 [nfsd] > [698041.288575] [] nfsd3_proc_write+0xfe/0x140 [nfsd] > [698041.288584] [] nfsd_dispatch+0xb5/0x230 [nfsd] > [698041.288599] [] svc_process+0x477/0x780 [sunrpc] > [698041.288610] [] ? nfsd+0x0/0x150 [nfsd] > [698041.288618] [] nfsd+0xbd/0x150 [nfsd] > [698041.288623] [] kthread+0x8e/0xa0 > [698041.288628] [] child_rip+0xa/0x20 > > there's ext4 mentioned, so I'm not 100% sure if it ain't ext4-related, > but on second machine (without ext4) I'm getting similar messages: > > [704103.575633] nfsd: peername failed (err 107)! > [704280.628046] INFO: task reiserfs/0:2637 blocked for more than 120 seconds. > [704280.628050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [704280.628053] reiserfs/0 D 0000000000000000 0 2637 2 0x00000000 > [704280.628058] ffff880139267c80 0000000000000046 0000000000000000 000000008103d1e9 > [704280.628063] 0000000000000000 ffff88013f1400c0 ffff88013f140000 ffffffff8153a220 > [704280.628067] ffff88013f1403b8 0000000000000000 000000010a7cf74c 0000000000000000 > [704280.628071] Call Trace: > [704280.628092] [] ? flush_commit_list+0x145/0x6f0 [reiserfs] > [704280.628099] [] mutex_lock_nested+0x132/0x330 > [704280.628106] [] ? flush_commit_list+0x145/0x6f0 [reiserfs] > [704280.628115] [] flush_commit_list+0x145/0x6f0 [reiserfs] > [704280.628123] [] flush_commit_list+0x237/0x6f0 [reiserfs] > [704280.628131] [] ? flush_async_commits+0x0/0x70 [reiserfs] > [704280.628138] [] flush_async_commits+0x57/0x70 [reiserfs] > [704280.628144] [] worker_thread+0x235/0x390 > [704280.628147] [] ? worker_thread+0x1e3/0x390 > [704280.628151] [] ? autoremove_wake_function+0x0/0x40 > [704280.628155] [] ? worker_thread+0x0/0x390 > [704280.628158] [] kthread+0x8e/0xa0 > [704280.628162] [] child_rip+0xa/0x20 > [704280.628165] [] ? kthread+0x0/0xa0 > [704280.628168] [] ? child_rip+0x0/0x20 > [704280.628170] 3 locks held by reiserfs/0/2637: > [704280.628172] #0: (reiserfs){......}, at: [] worker_thread+0x1e3/0x390 > [704280.628178] #1: (&(&journal->j_work)->work){......}, at: [] worker_thread+0x1e3/0x390 > [704280.628184] #2: (&jl->j_commit_mutex){......}, at: [] flush_commit_list+0x145/0x6f0 [reiserfs] > [704329.500101] rpc-srv/tcp: nfsd: got error -32 when sending 140 bytes - shutting down socket > > which leads me to suspicion the problems are somewhere within nfs code... > both machines are SMP x86_64. > I'll update to 2.6.32.2 on Wednesday, but according to changelog, it doesn't > seem to fix anything related... > Could somebody please have a look at this? I'll be glad to provide further > information if needed... > > -- > ------------------------------------- > Ing. Nikola CIPRICH > LinuxBox.cz, s.r.o. > 28. rijna 168, 709 01 Ostrava > > tel.: +420 596 603 142 > fax: +420 596 621 273 > mobil: +420 777 093 799 > www.linuxbox.cz > > mobil servis: +420 737 238 656 > email servis: servis@linuxbox.cz > ------------------------------------- > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html