Date: Wed, 24 Jun 2009 12:04:14 +0200
From: Jens Axboe
To: Andrew Morton
Cc: Linus Torvalds, Linux Kernel, hch@infradead.org
Subject: Re: merging the per-bdi writeback patchset
Message-ID: <20090624100414.GJ31415@kernel.dk>
In-Reply-To: <20090623080137.33cdc45c.akpm@linux-foundation.org>
References: <20090623081156.GT31415@kernel.dk> <20090623014835.1fc8fb14.akpm@linux-foundation.org> <20090623085505.GU31415@kernel.dk> <20090623080137.33cdc45c.akpm@linux-foundation.org>

On Tue, Jun 23 2009, Andrew Morton wrote:
> On Tue, 23 Jun 2009 10:55:05 +0200 Jens Axboe wrote:
> > On Tue, Jun 23 2009, Andrew Morton wrote:
> > > On Tue, 23 Jun 2009 10:11:56 +0200 Jens Axboe wrote:
> > > >
> > > > Things are looking good for this patchset and it's been in -next
> > > > for almost a week without any reports of problems. So I'd like to
> > > > merge it for 2.6.31 if at all possible. Any objections?
> > >
> > > erk. I was rather expecting I'd have time to have a look at it all.
> >
> > OK, we can wait if we have to; I'm just trying to avoid having to
> > keep this fresh for one full cycle. I have posted this patchset 11
> > times over the past months, though, so it's not like it's a new
> > piece of work :-)
>
> Yeah, sorry.
>
> > > It's unclear to me actually _why_ the performance changes which
> > > were observed have actually occurred. In fact it's a bit unclear
> > > (to me) why the patchset was written and what it sets out to
> > > achieve :(
> >
> > It started out as an attempt to get rid of pdflush's uneven
> > writeout. If you look at various pdflush-intensive workloads, even
> > on a single disk you often have 5 or more pdflush threads working
> > the same device. That's just not optimal.
>
> That's a bug, isn't it? This
>
>         /* Is another pdflush already flushing this queue? */
>         if (current_is_pdflush() && !writeback_acquire(bdi))
>                 break;
>
> isn't working.

But that check is done on a per-inode basis. I didn't look further into
the problem, to be honest; I just noticed that you very quickly get a
handful of pdflush threads ticking along.

> > Another issue was starvation with request allocation. Given that
> > pdflush does non-blocking writes (it has to, by design), pdflush can
> > potentially be starved if someone else is working the device.
>
> hm, true. 100% starved, or just "slowed down"? The latter I trust -
> otherwise there are still failure modes?

Just slowed down in the cases I have seen, and I suspect this is where
the lumpiness comes from as well. In theory, though, you could starve
the pdflush thread indefinitely.
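To illustrate why: the pdflush writeback loop runs with a nonblocking
writeback_control and backs off when the queue is congested, along the
lines of the sketch below. This is simplified from memory of
background_writeout() in mm/page-writeback.c, and below_background_thresh()
is a stand-in for the real dirty-limit check, so treat it as a sketch
rather than a verbatim quote:

        struct writeback_control wbc = {
                .bdi            = NULL,         /* all devices */
                .sync_mode      = WB_SYNC_NONE,
                .older_than_this = NULL,
                .nr_to_write    = 0,
                .nonblocking    = 1,            /* never block on the queue */
        };

        for (;;) {
                if (below_background_thresh())  /* stand-in for real check */
                        break;

                wbc.encountered_congestion = 0;
                wbc.nr_to_write = MAX_WRITEBACK_PAGES;
                writeback_inodes(&wbc);

                if (wbc.nr_to_write > 0) {
                        /*
                         * We wrote less than we wanted to. If the queue
                         * was congested, back off and retry later instead
                         * of sleeping on request allocation like a normal
                         * writer would.
                         */
                        if (wbc.encountered_congestion)
                                congestion_wait(WRITE, HZ/10);
                        else
                                break;
                }
        }

A task doing ordinary buffered writes will happily sleep for a free
request, so under load it keeps winning request slots while pdflush
keeps yielding.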
> > > A long time ago the XFS guys (Dave Chinner iirc) said that XFS
> > > needs more than one thread per device to keep the device
> > > saturated. Did that get addressed?
> >
> > It supports up to 32 threads per device, but Chinner et al have been
> > silent. So the support is there, and there's a
> > super_operations->inode_get_wb() to map a dirty inode to a writeback
> > device. Nobody is doing that yet, though.
>
> OK.
>
> How many kernel threads do the 1000-spindle people end up with?

If all 1000 spindles are exposed and flushing dirty data, you get 1000
threads. Realistically, you'll likely use some sort of dm/md frontend,
though, and then you only get one thread per dm/md device.
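To make the inode_get_wb() part concrete, the idea is that a filesystem
spreads its dirty inodes across several writeback threads on the same
bdi, along these lines. Consider this a hypothetical sketch: the hook
name is from the patchset, but the prototype and the wb/wb_cnt fields
are illustrative, not the actual queued-up code:

        /*
         * Hypothetical example: hash dirty inodes across the bdi's
         * writeback threads. The exact prototype and the wb/wb_cnt
         * fields are illustrative only.
         */
        static struct bdi_writeback *myfs_inode_get_wb(struct inode *inode)
        {
                struct backing_dev_info *bdi =
                                inode->i_mapping->backing_dev_info;

                /* Simple static partitioning by inode number */
                return &bdi->wb[inode->i_ino % bdi->wb_cnt];
        }

        static const struct super_operations myfs_super_ops = {
                /* other methods omitted */
                .inode_get_wb   = myfs_inode_get_wb,
        };

-- 
Jens Axboe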