From: Andrew Morton <akpm@linux-foundation.org>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
<linux-fsdevel@vger.kernel.org>,
Li Shaohua <shaohua.li@intel.com>,
Clemens Ladisch <clemens@ladisch.de>,
Jens Axboe <jens.axboe@oracle.com>,
Rik van Riel <riel@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH 1/8] block: limit default readahead size for small devices
Date: Mon, 21 Nov 2011 14:52:47 -0800
Message-ID: <20111121145247.0e37dc36.akpm@linux-foundation.org>
In-Reply-To: <20111121093846.121502745@intel.com>
On Mon, 21 Nov 2011 17:18:20 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:
> Linus reports a _really_ small & slow (505kB, 15kB/s) USB device,
> on which blkid runs unpleasantly slow. He manages to optimize the blkid
> reads down to 1kB+16kB, but still kernel read-ahead turns it into 48kB.
>
> lseek 0, read 1024 => readahead 4 pages (start of file)
I'm disturbed that the code did a 4 page (16kbyte?) readahead after an
lseek. Given the high probability that the next read will occur after
a second lseek, that's a mistake.
Was an lseek to offset 0 special-cased?
> lseek 1536, read 16384 => readahead 8 pages (page contiguous)
>
> The readahead heuristics involved here are reasonable ones in general.
> So it's good to fix blkid with fadvise(RANDOM), as Linus already did.
>
> For the kernel part, Linus suggests:
> So maybe we could be less aggressive about read-ahead when the size of
> the device is small? Turning a 16kB read into a 64kB one is a big deal,
> when it's about 15% of the whole device!
>
> This looks reasonable: smaller devices tend to be slower (USB sticks as
> well as micro/mobile/old hard disks).
Spose so. Obviously there are other characteristics which should be
considered when choosing a readahead size, but one of them can be disk
size and that's what this change does.
In a better world, userspace would run a
work-out-what-readahead-size-to-use script each time a distro is
installed and when new storage devices are added/detected. Userspace
would then remember that readahead size for subsequent bootups.
In the real world, we shovel guaranteed-to-be-wrong guesswork into the
kernel and everyone just uses the results. Sigh.
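For reference, the fadvise(RANDOM) fix mentioned in the quoted text can be sketched from userspace roughly as follows. This is a minimal sketch, not blkid's actual code: the helper name open_for_random_reads() is hypothetical, and it assumes a Linux system where posix_fadvise() with POSIX_FADV_RANDOM is honoured for the file in question.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from blkid): open a file or block device
 * read-only and hint that accesses will be random, so the kernel has no
 * reason to read ahead on this descriptor.
 * Returns the open fd on success, -1 on failure.
 */
int open_for_random_reads(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	/* offset 0 with len 0 covers the whole file; returns 0 on success */
	if (posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A prober would then lseek() and read() on the returned descriptor as before; only the hint is new.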
> --- linux-next.orig/block/genhd.c 2011-10-31 00:13:51.000000000 +0800
> +++ linux-next/block/genhd.c 2011-11-18 11:27:08.000000000 +0800
> @@ -623,6 +623,26 @@ void add_disk(struct gendisk *disk)
> WARN_ON(retval);
>
> disk_add_events(disk);
> +
> + /*
> + * Limit default readahead size for small devices.
> + *     disk size    readahead size
> + *            1M                8k
> + *            4M               16k
> + *           16M               32k
> + *           64M               64k
> + *          256M              128k
> + *            1G              256k
> + *            4G              512k
> + *           16G             1024k
> + *           64G             2048k
> + *          256G             4096k
> + */
> + if (get_capacity(disk)) {
> + unsigned long size = get_capacity(disk) >> 9;
get_capacity() returns sector_t. This expression will overflow with a
2T disk. I'm not sure if we successfully support 2T disks on 32-bit
machines, but changes like this will guarantee that we don't :)
> + size = 1UL << (ilog2(size) / 2);
I think there's a rounddown_pow_of_two() hiding in that expression?
> + bdi->ra_pages = min(bdi->ra_pages, size);
I don't have a clue why that min() is in there. It needs a comment,
please.
> + }
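To make the arithmetic in the quoted hunk easier to check, here is a userspace sketch of the same computation. It is an illustration under stated assumptions, not the patch itself: floor_log2(), ra_pages_cap() and rounddown_pow2() are hypothetical stand-ins for the kernel's ilog2(), the patch's expression, and rounddown_pow_of_two(); 4k pages are assumed when comparing against the table; and a 64-bit type is used throughout so the shift cannot be truncated on a 32-bit build.

```c
#include <stdint.h>

/* Stand-in for the kernel's ilog2(): floor(log2(n)). n must be > 0. */
unsigned int floor_log2(uint64_t n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/*
 * Mirrors the patch's expression. capacity_sectors is in 512-byte sectors,
 * as returned by get_capacity(); the result is a readahead cap in pages.
 */
uint64_t ra_pages_cap(uint64_t capacity_sectors)
{
	uint64_t size = capacity_sectors >> 9;	/* now in 256 KiB units */

	/*
	 * Assumption of this sketch: devices below 256 KiB get one page.
	 * The log2 of 0 is not defined, so the case is handled explicitly.
	 */
	if (size == 0)
		return 1;

	return UINT64_C(1) << (floor_log2(size) / 2);
}

/* Stand-in for rounddown_pow_of_two(), for comparison. n must be > 0. */
uint64_t rounddown_pow2(uint64_t n)
{
	return UINT64_C(1) << floor_log2(n);
}
```

With these helpers, a 1M disk (2048 sectors) gives 2 pages (8k) and a 256G disk gives 1024 pages (4096k), matching the table in the comment. Because the log is halved before shifting, the expression behaves like the square root of size rounded down to a power of two rather than a plain round-down: for size = 16, rounddown_pow2() returns 16 while the patch's expression yields 4.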