From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jens Axboe
Subject: Re: [RFC] page-writeback: move indoes from one superblock together
Date: Thu, 24 Sep 2009 14:35:19 +0200
Message-ID: <20090924123519.GF23126@kernel.dk>
References: <1253775260.10618.10.camel@sli10-desk.sh.intel.com> <20090924100136.GA25778@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "Li, Shaohua", lkml, Peter Zijlstra, Andrew Morton, Chris Mason, linux-fsdevel@vger.kernel.org, Jan Kara
To: Wu Fengguang
Return-path:
Content-Disposition: inline
In-Reply-To: <20090924100136.GA25778@localhost>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Thu, Sep 24 2009, Wu Fengguang wrote:
> On Thu, Sep 24, 2009 at 02:54:20PM +0800, Li, Shaohua wrote:
> > __mark_inode_dirty adds inode to wb dirty list in random order. If a disk has
> > several partitions, writeback might keep spindle moving between partitions.
> > To reduce the move, better write big chunk of one partition and then move to
> > another. Inodes from one fs usually are in one partition, so ideally move inodes
> > from one fs together should reduce spindle move. This patch tries to address
> > this. Before per-bdi writeback is added, the behavior is write inodes
> > from one fs first and then another, so the patch restores previous behavior.
> > The loop in the patch is a bit ugly, should we add a dirty list for each
> > superblock in bdi_writeback?
> >
> > Test in a two partition disk with attached fio script shows about 3% ~ 6%
> > improvement.
>
> A side note: given the noticeable performance gain, I wonder if it
> deserves to generalize the idea to do whole disk location ordered
> writeback. That should benefit many small file workloads more than
> 10%. Because this patch only sorted 2 partitions and inodes in 5s
> time window, while the below patch will roughly divide the disk into
> 5 areas and sort inodes in a larger 25s time window.
>
> http://lkml.org/lkml/2007/8/27/45
>
> Judging from this old patch, the complexity cost would be about 250
> lines of code (need a rbtree).

First of all, nice patch, I'll add it to the current tree. I too was
pondering using an rbtree for sb+dirty_time insertion and extraction.
But for 100 inodes or less, I bet that just doing the re-sort in
writeback time ends up being cheaper on the CPU cycle side.

-- 
Jens Axboe
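
To illustrate the "re-sort at writeback time" alternative to an rbtree, here is a minimal userspace sketch. It is not code from the patch or from the kernel: struct dirty_inode, its fields (ino, sb_id, dirtied_when) and the function resort_for_writeback() are hypothetical names chosen for this example, and timestamp wraparound is ignored. It sorts a small batch of dirty inodes by superblock first and by dirty time second, using the C library's qsort().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dirty inode; not the kernel's struct inode. */
struct dirty_inode {
    unsigned long ino;          /* inode number, for identification only */
    int sb_id;                  /* assumed: small id naming the superblock */
    unsigned long dirtied_when; /* assumed: time the inode was first dirtied */
};

/* Order by superblock first, then oldest-dirtied first within a superblock. */
static int cmp_sb_then_time(const void *a, const void *b)
{
    const struct dirty_inode *x = a;
    const struct dirty_inode *y = b;

    if (x->sb_id != y->sb_id)
        return (x->sb_id > y->sb_id) - (x->sb_id < y->sb_id);
    return (x->dirtied_when > y->dirtied_when) -
           (x->dirtied_when < y->dirtied_when);
}

/* Re-sort the whole batch once, just before it is written back. */
static void resort_for_writeback(struct dirty_inode *inodes, size_t n)
{
    assert(inodes != NULL || n == 0);
    if (n > 1)
        qsort(inodes, n, sizeof(*inodes), cmp_sb_then_time);
}
```

With this approach, inserting a newly dirtied inode stays a plain append and the O(n log n) cost is paid once per writeback pass. For a batch of around 100 entries that is a small, bounded amount of work, which is the trade-off the reply above is betting on. An rbtree keyed on the same (sb, dirty time) pair would instead pay an O(log n) cost on every insertion and removal.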