From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tao Ma
Subject: Re: [Bug 18632] "INFO: task" dpkg "blocked for more than 120 seconds.
Date: Thu, 09 Jun 2011 22:51:56 +0800
Message-ID: <4DF0DE0C.2020902@tao.ma>
References: <201106082138.p58Lchgj002615@demeter2.kernel.org>
 <20110608150241.8412a63d.akpm@linux-foundation.org>
 <20110609033217.GA10741@localhost>
 <20110609035426.GA12061@localhost>
 <20110609082718.GA10335@infradead.org>
 <20110609090906.GA19186@localhost>
 <20110609110214.GA9017@infradead.org>
 <20110609121117.GA5768@localhost>
 <4DF0D4D6.2060800@tao.ma>
 <20110609142133.GA12658@infradead.org>
 <20110609143256.GE29913@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Christoph Hellwig, Wu Fengguang, Dave Chinner, Andrew Morton, Jan Kara,
 "linux-fsdevel@vger.kernel.org", "bugzilla-daemon@bugzilla.kernel.org",
 "daaugusto@gmail.com", "kernel-bugzilla@cygnusx-1.org",
 "listposter@gmail.com", "justincase@yopmail.com", "clopez@igalia.com",
 Jens Axboe, Shaohua Li
To: Vivek Goyal
Return-path:
Received: from oproxy6-pub.bluehost.com ([67.222.54.6]:34547 "HELO
 oproxy6-pub.bluehost.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with SMTP id S1751595Ab1FIOwM (ORCPT ); Thu, 9 Jun 2011 10:52:12 -0400
In-Reply-To: <20110609143256.GE29913@redhat.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On 06/09/2011 10:32 PM, Vivek Goyal wrote:
> On Thu, Jun 09, 2011 at 10:21:33AM -0400, Christoph Hellwig wrote:
>> On Thu, Jun 09, 2011 at 10:12:38PM +0800, Tao Ma wrote:
>>> Just want to say more about the situation here. Actually the flusher is
>>> far too easily blocked by the sync requests. And whenever it is
>>> blocked, it takes quite a long time to get back (because of several cfq
>>> designs), so do you think we can use WRITE_SYNC for the bdev inodes in
>>> flusher? AFAICS, in most of the cases when a volume is mounted, the
>>> writeback for a bdev inode means the metadata writeback. And they are
>>> very important for a file system and should be written as far as
>>> possible. I ran my test cases with the change, and now the livelock
>>> doesn't show up anymore.
>>
>> It's not a very good guesstimate for metadata. A lot of metadata is
>> either stored in directories (often happens for directories) or doesn't
>> use the pagecache writeback functions at all.
>>
>> The major problem here seems to be that async requests simply starve
>> sync requests far too much.
>
> You mean sync requests starve async requests?
>
> It is possible that CFQ can starve async requests for a long time in
> the presence of sync requests. If that's the case, all the reported issues
> should go away with the deadline scheduler.
>
> As I mentioned in the other mail, one commit made the dice very heavily
> loaded in favor of sync requests.
>
> commit f8ae6e3eb8251be32c6e913393d9f8d9e0609489
> Author: Shaohua Li
> Date: Fri Jan 14 08:41:02 2011 +0100
>
>     block cfq: make queue preempt work for queues from different workload
>
>
> I will do some quick tests and try to write a small patch where we can
> keep track of how many times sync and async workloads have been scheduled
> and make sure we don't starve async req completely.

Oh, so you mean the patch is the culprit? I will try reverting it to see
whether the system works better.

Regards,
Tao