From: Jeff Moyer <jmoyer@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: Andreas Dilger <adilger@dilger.ca>,
Andrea Arcangeli <aarcange@redhat.com>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
Mike Snitzer <snitzer@redhat.com>,
"neilb@suse.de" <neilb@suse.de>,
Christoph Hellwig <hch@infradead.org>,
"dm-devel@redhat.com" <dm-devel@redhat.com>,
fengguang.wu@gmail.com, Boaz Harrosh <bharrosh@panasas.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>,
Chris Mason <chris.mason@oracle.com>,
"Darrick J.Wong" <djwong@us.ibm.com>
Subject: Re: [Lsf-pc] [dm-devel] [LSF/MM TOPIC] a few storage topics
Date: Tue, 24 Jan 2012 15:59:02 -0500
Message-ID: <x49liowkeax.fsf@segfault.boston.devel.redhat.com>
In-Reply-To: <20120124203936.GC20650@quack.suse.cz> (Jan Kara's message of "Tue, 24 Jan 2012 21:39:36 +0100")

Jan Kara <jack@suse.cz> writes:
> On Tue 24-01-12 15:13:40, Jeff Moyer wrote:
>> Jan Kara <jack@suse.cz> writes:
>>
>> > On Tue 24-01-12 14:14:14, Jeff Moyer wrote:
>> >> Chris Mason <chris.mason@oracle.com> writes:
>> >>
>> >> >> All three filesystems use the generic mpages code for reads, so they
>> >> >> all get the same (bad) I/O patterns. Looks like we need to fix this up
>> >> >> ASAP.
>> >> >
>> >> > Can you easily run btrfs through the same rig? We don't use mpages and
>> >> > I'm curious.
>> >>
>> >> The readahead code was to blame here. I wonder if we can change the
>> >> logic there to not break larger I/Os down into smaller ones. Fengguang,
>> >> doing a dd if=file of=/dev/null bs=1M results in 128 KB I/Os when
>> >> read_ahead_kb is 128. Is there any heuristic you could apply to avoid
>> >> breaking larger I/Os up like this? Does that make sense?
>> > Well, not breaking up I/Os would be fairly simple, as ondemand_readahead()
>> > already knows how much we want to read. We just trim the submitted I/O to
>> > read_ahead_kb artificially. That is done so that you don't thrash the page
>> > cache (possibly evicting pages you have not yet copied to userspace) when
>> > several processes are doing large reads.
>>
>> Do you really think applications issue large reads and then don't use
>> the data? I mean, I've seen some bad programming, so I can believe that
>> would be the case. Still, I'd like to think it doesn't happen. ;-)
> No, I meant a cache thrashing problem. Suppose that we always read ahead
> as much as the user asks and there are, say, 100 processes each wanting to
> read 4 MB. Then you need 400 MB of page cache so that all the reads can
> fit. If you don't have that much, reads for process 50 may evict pages we
> already preread for process 1 before process 1 has had CPU time to copy
> the data to its userspace buffer, so that read is wasted.
Yeah, you're right, cache thrashing is an issue. In my tests, I didn't
actually see the *initial* read come through as a full 1MB I/O, though.
That seems odd to me.
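
To make the trimming concrete, here is a minimal sketch of the arithmetic
Jan describes -- it is not the real ondemand_readahead() code, just an
illustration of why a 1 MB read() shows up on disk as 128 KB I/Os when
read_ahead_kb is left at 128:

/*
 * Hedged sketch, not mm/readahead.c: the readahead window submitted for a
 * sequential read is clamped to the per-device read_ahead_kb limit,
 * regardless of how large the application's read() was.
 */
#include <stdio.h>

#define PAGE_SIZE 4096UL

static unsigned long clamp_ra_window(unsigned long req_pages,
				     unsigned long ra_pages)
{
	/* Trim the request to the configured readahead window. */
	return req_pages < ra_pages ? req_pages : ra_pages;
}

int main(void)
{
	unsigned long ra_pages  = 128 * 1024 / PAGE_SIZE;  /* read_ahead_kb = 128 */
	unsigned long req_pages = 1024 * 1024 / PAGE_SIZE; /* bs=1M read */
	unsigned long win = clamp_ra_window(req_pages, ra_pages);

	printf("requested %lu KB, window submitted is %lu KB\n",
	       req_pages * PAGE_SIZE / 1024, win * PAGE_SIZE / 1024);
	return 0;
}

So a single 1 MB request ends up serviced as eight 128 KB windows, which
matches the I/O sizes I saw.
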
>> > Maybe 128 KB is too small a default these days, but OTOH no one prevents
>> > you from raising it (e.g. SLES uses 1 MB as the default).
>>
>> For some reason, I thought it had been bumped to 512KB by default. Must
>> be that overactive imagination I have... Anyway, if all of the distros
>> start bumping the default, don't you think it's time to consider bumping
>> it upstream, too? I thought there was a lot of work put into not being
>> too aggressive on readahead, so the downside of having a larger
>> read_ahead_kb setting was fairly small.
> Yeah, I believe 512KB should be pretty safe these days, except in the
> embedded world. OTOH the average desktop user doesn't really care, so it's
> mostly servers with beefy storage that do... (note that, as I wrote, we
> raised read_ahead_kb for SLES but not for openSUSE or SLED, the enterprise
> desktop distro).
Fair enough.
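
For anyone who wants to experiment with the tunable, below is a hedged
example (the device name "sda" is just a placeholder) that reads and raises
/sys/block/<dev>/queue/read_ahead_kb from C. The same thing can be done
from the shell with blockdev --setra, which takes 512-byte sectors rather
than KB.

/*
 * Example only: bump read_ahead_kb for one device via sysfs.
 * "sda" is an assumed device name; writing requires root.
 */
#include <stdio.h>

int main(void)
{
	const char *path = "/sys/block/sda/queue/read_ahead_kb";
	unsigned long cur, want = 512;	/* 512 KB, as discussed above */
	FILE *f = fopen(path, "r+");

	if (!f) {
		perror(path);
		return 1;
	}
	if (fscanf(f, "%lu", &cur) == 1)
		printf("current read_ahead_kb: %lu\n", cur);
	rewind(f);
	fprintf(f, "%lu\n", want);
	if (fclose(f) != 0) {		/* the write is flushed on close */
		perror("write");
		return 1;
	}
	printf("read_ahead_kb set to %lu\n", want);
	return 0;
}
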
Cheers,
Jeff