From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Moyer
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] a few storage topics
Date: Tue, 24 Jan 2012 15:13:40 -0500
References: <20120123161857.GC28526@quack.suse.cz>
	<20120123175353.GD30782@redhat.com>
	<20120124151504.GQ4387@shiny>
	<20120124165631.GA8941@infradead.org>
	<186EA560-1720-4975-AC2F-8C72C4A777A9@dilger.ca>
	<20120124184054.GA23227@infradead.org>
	<20120124190732.GH4387@shiny>
	<20120124200932.GB20650@quack.suse.cz>
Reply-To: device-mapper development
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Jan Kara
Cc: Andreas Dilger, Andrea Arcangeli, "linux-scsi@vger.kernel.org",
	Mike Snitzer, Christoph Hellwig, "dm-devel@redhat.com",
	fengguang.wu@gmail.com, Boaz Harrosh, "linux-fsdevel@vger.kernel.org",
	"lsf-pc@lists.linux-foundation.org", Chris Mason
In-Reply-To: <20120124200932.GB20650@quack.suse.cz> (Jan Kara's message of
	"Tue, 24 Jan 2012 21:09:32 +0100")
Sender: dm-devel-bounces@redhat.com
Errors-To: dm-devel-bounces@redhat.com
List-Id: linux-fsdevel.vger.kernel.org

Jan Kara writes:

> On Tue 24-01-12 14:14:14, Jeff Moyer wrote:
>> Chris Mason writes:
>>
>> >> All three filesystems use the generic mpages code for reads, so they
>> >> all get the same (bad) I/O patterns.  Looks like we need to fix this
>> >> up ASAP.
>> >
>> > Can you easily run btrfs through the same rig?  We don't use mpages
>> > and I'm curious.
>>
>> The readahead code was to blame here.  I wonder if we can change the
>> logic there to not break larger I/Os down into smaller-sized ones.
>> Fengguang, doing a dd if=file of=/dev/null bs=1M results in 128K I/Os
>> when 128KB is the read_ahead_kb value.  Is there any heuristic you
>> could apply to avoid breaking larger I/Os up like this?  Does that
>> make sense?
> Well, not breaking up I/Os would be fairly simple, as
> ondemand_readahead() already knows how much we want to read.  We just
> artificially trim the submitted I/O to read_ahead_kb.  That is done so
> that you don't trash the page cache (possibly evicting pages that have
> not yet been copied to userspace) when several processes are doing
> large reads.

Do you really think applications issue large reads and then don't use
the data?  I mean, I've seen some bad programming, so I can believe
that would be the case.  Still, I'd like to think it doesn't happen.
;-)

> Maybe 128 KB is too small a default these days, but OTOH no one
> prevents you from raising it (e.g. SLES uses 1 MB as its default).

For some reason, I thought it had been bumped to 512 KB by default.
Must be that overactive imagination of mine...  Anyway, if all of the
distros start bumping the default, don't you think it's time to
consider bumping it upstream, too?  I thought a lot of work had gone
into keeping readahead from being too aggressive, so the downside of a
larger read_ahead_kb setting should be fairly small.

Cheers,
Jeff
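
P.S. For anyone who wants to poke at this themselves, here is a minimal
sketch of the knobs involved.  "sda" is just a placeholder for whatever
device you are testing, and the writes need root:

    # current per-device readahead window, in KB (128 is the usual default)
    cat /sys/block/sda/queue/read_ahead_kb

    # raise it to 1 MB, the SLES default mentioned above
    echo 1024 > /sys/block/sda/queue/read_ahead_kb

    # reproduce the test: issue 1 MB reads and watch the request sizes
    # the device actually sees; iostat's avgrq-sz column is in 512-byte
    # sectors, so a value of 256 means the 1 MB reads are being chopped
    # into eight 128 KB requests
    dd if=file of=/dev/null bs=1M &
    iostat -x 1 sda

Note that the sysfs setting does not survive a reboot; distros that bump
the default typically do it from a udev rule or an init script.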