From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx198.postini.com [74.125.245.198])
	by kanga.kvack.org (Postfix) with SMTP id A7BE36B008C
	for ; Tue, 11 Dec 2012 03:23:34 -0500 (EST)
Date: Tue, 11 Dec 2012 16:23:27 +0800
From: Fengguang Wu
Subject: Re: livelock in __writeback_inodes_wb ?
Message-ID: <20121211082327.GA15706@localhost>
References: <20121128145515.GA26564@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20121128145515.GA26564@redhat.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Dave Jones , linux-mm@kvack.org, Linux Kernel
Cc: Andrew Morton , Jan Kara , linux-fsdevel@vger.kernel.org

On Wed, Nov 28, 2012 at 09:55:15AM -0500, Dave Jones wrote:
> We had a user report the soft lockup detector kicked after 22
> seconds of no progress, with this trace..

Where is the original report? The reporter may be able to provide some
clues about the workload that triggered the bug.

> :BUG: soft lockup - CPU#1 stuck for 22s! [flush-8:16:3137]
> :Pid: 3137, comm: flush-8:16 Not tainted 3.6.7-4.fc17.x86_64 #1
> :RIP: 0010:[] [] __list_del_entry+0x2c/0xd0
> :Call Trace:
> : [] redirty_tail+0x5e/0x80
> : [] __writeback_inodes_wb+0x72/0xd0
> : [] wb_writeback+0x23b/0x2d0
> : [] wb_do_writeback+0xac/0x1f0
> : [] ? __internal_add_timer+0x130/0x130
> : [] bdi_writeback_thread+0x8b/0x230
> : [] ? wb_do_writeback+0x1f0/0x1f0
> : [] kthread+0x93/0xa0
> : [] kernel_thread_helper+0x4/0x10
> : [] ? kthread_freezable_should_stop+0x70/0x70
> : [] ? gs_change+0x13/0x13
>
> Looking over the code, is it possible that something could be
> dirtying pages faster than writeback can get them written out,
> keeping us in this loop indefinitely ?

The bug reporter should know best whether there is heavy IO.

However, I suspect it's not directly caused by heavy IO: we release
&wb->list_lock before each __writeback_single_inode() call, which
starts writeback IO for each inode.

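To make the locking pattern concrete, here is a rough user-space sketch
of "drop the list lock around each per-inode write". It is not the
kernel code: every sketch_* name is hypothetical, a pthread mutex stands
in for the wb->list_lock spinlock, and the singly linked queue is only a
stand-in for the real inode lists.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumptions). */
struct sketch_inode {
	int dirty;                  /* 1 while the inode needs writeback */
	struct sketch_inode *next;  /* simple queue link */
};

struct sketch_wb {
	pthread_mutex_t list_lock;  /* stands in for wb->list_lock */
	struct sketch_inode *b_io;  /* inodes queued for writeback */
	long lock_drops;            /* times list_lock was released in the loop */
};

/* Stands in for __writeback_single_inode(); runs without list_lock held. */
static void sketch_write_inode(struct sketch_inode *inode)
{
	inode->dirty = 0;
}

/*
 * Write back every queued inode. The lock is held only while the queue
 * is manipulated and is released around each per-inode write, so other
 * lock users get a chance to run once per inode.
 * Returns the number of inodes written.
 */
static long sketch_writeback(struct sketch_wb *wb)
{
	long written = 0;

	assert(wb != NULL);

	pthread_mutex_lock(&wb->list_lock);
	while (wb->b_io != NULL) {
		struct sketch_inode *inode = wb->b_io;

		/* Dequeue while the lock is held. */
		wb->b_io = inode->next;
		inode->next = NULL;
		wb->lock_drops++;

		/* Release the lock before starting IO for this inode. */
		pthread_mutex_unlock(&wb->list_lock);
		sketch_write_inode(inode);
		written++;
		pthread_mutex_lock(&wb->list_lock);
	}
	pthread_mutex_unlock(&wb->list_lock);

	return written;
}
```

In this sketch the lock is released once per inode that actually gets
written. A long hold time would therefore need a path through the loop
that never reaches the unlock, which is the kind of case worth looking
for in the real code.
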
> Should there be something in this loop periodically poking
> the watchdog perhaps ?

It seems we failed to release &wb->list_lock in wb_writeback() for a
long time (dozens of seconds). That is, inode_sleep_on_writeback() is
somehow not being called. However, it's not obvious to me how this can
happen.

Thanks,
Fengguang