From: Tejun Heo
Date: Mon, 10 Oct 2011 11:37:30 -0700
Subject: Re: [PATCH 3/4] xfs: revert to using a kthread for AIL pushing
Message-ID: <20111010183730.GJ8100@google.com>
In-Reply-To: <20111010132611.GA1248@infradead.org>
References: <20111006183257.036884724@bombadil.infradead.org> <20111006183549.770414484@bombadil.infradead.org> <20111010014509.GT3159@dastard> <20111010055546.GA1641@x4.trippels.de> <20111010132611.GA1248@infradead.org>
To: Christoph Hellwig
Cc: Stefan Priebe, Markus Trippelsdorf, xfs@oss.sgi.com
List-Id: XFS Filesystem from SGI

Hello,

On Mon, Oct 10, 2011 at 09:26:11AM -0400, Christoph Hellwig wrote:
> On Mon, Oct 10, 2011 at 07:55:46AM +0200, Markus Trippelsdorf wrote:
> > Wouldn't it be possible to verify that the problem also goes away with
> > this simple one-liner?
>
> We've been through a few variants, and none fixed it, while Stefan had
> to try them on production machines.
>
> To be honest, I'm not convinced at all that a workqueue was such a good
> idea for the AIL in particular.
> It works extremely well for cases where we can easily define a work
> item, e.g. an object that gets queued up and a method on it that gets
> executed. But for the AIL we really have a changing target that needs
> more or less constant pushing, and the target keeps changing while we
> execute our work. Conceptually it fits the idea of a thread much
> better, with the added benefit of not having to find a combination of
> workqueue flags that gets the exact behaviour we need (execution ASAP,
> without any limits imposed by other items or required memory
> allocations).
>
> And unlike the various per-cpu threads we used to have, it is only one
> thread per filesystem anyway.

I don't know the XFS internals at all, so I don't have too strong an
opinion at this point, but don't we at least need to understand what is
going on?

The CPU_INTENSIVE / HIGHPRI flags shouldn't cause a deadlock unless
some work items busy-loop waiting for another work item to do something
(busy yielding might achieve a similar effect, though); they don't
change the forward-progress guarantee. The only thing that can cause a
stall is the lack of MEM_RECLAIM. One thing to be careful about is that
each workqueue has only one rescuer, so if more than one work item has
inter-dependencies, they can still deadlock and need to be served by
different workqueues.

The reasons for moving away from using kthreads directly are twofold:
resources and correctness. I went through a number of kthread users
while auditing freezer usage recently, and more than half of them get
the synchronization against kthread_stop() or the freezer wrong (to be
fair, the rules are quite tricky). The problem with those bugs is that
they are really obscure race conditions and won't trigger easily.

Thank you.

--
tejun
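[Editor's note: to make the one-rescuer-per-workqueue caveat above concrete, here is a kernel-space sketch, not code from the XFS tree; the workqueue names and init function are hypothetical. Each WQ_MEM_RECLAIM workqueue gets exactly one rescuer thread, so two work items that can wait on each other must live on different workqueues.]

```c
/*
 * Hedged sketch (hypothetical names, not XFS code): under memory
 * pressure a WQ_MEM_RECLAIM workqueue falls back to its single rescuer
 * thread.  If two inter-dependent work items share one workqueue, the
 * rescuer can get stuck executing one item while the other never runs,
 * so each dependency chain endpoint gets its own workqueue.
 */
#include <linux/workqueue.h>

static struct workqueue_struct *ail_wq;	/* pushes the AIL */
static struct workqueue_struct *log_wq;	/* work the AIL push may wait on */

static int __init example_wq_init(void)
{
	ail_wq = alloc_workqueue("example-ail",
				 WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	if (!ail_wq)
		return -ENOMEM;

	/* Separate wq => separate rescuer; no shared-rescuer deadlock. */
	log_wq = alloc_workqueue("example-log", WQ_MEM_RECLAIM, 0);
	if (!log_wq) {
		destroy_workqueue(ail_wq);
		return -ENOMEM;
	}
	return 0;
}
```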
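[Editor's note: the kthread_stop() synchronization Tejun calls tricky usually breaks in one specific way: the thread tests its wakeup condition and then sleeps without setting its task state first, so the wake_up_process() issued by kthread_stop() can be lost. A hedged kernel-space sketch of the correct pattern follows; the function name and the 50 ms interval are illustrative, not taken from the thread.]

```c
/*
 * Hedged sketch of a correctly stoppable pusher thread (not XFS code).
 * set_current_state() must come before the final kthread_should_stop()
 * check: otherwise a stop request arriving between the check and
 * schedule_timeout() is missed and the thread sleeps through it.
 */
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/jiffies.h>

static int example_push_thread(void *data)
{
	while (!kthread_should_stop()) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (kthread_should_stop()) {
			__set_current_state(TASK_RUNNING);
			break;
		}
		/* Sleep briefly, then push again; wakeups cut this short. */
		schedule_timeout(msecs_to_jiffies(50));

		/* ... do one round of pushing here ... */
	}
	return 0;
}
```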