From: Andrew Morton <akpm@linux-foundation.org>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	<linux-fsdevel@vger.kernel.org>,
	Li Shaohua <shaohua.li@intel.com>,
	Clemens Ladisch <clemens@ladisch.de>,
	Jens Axboe <jens.axboe@oracle.com>,
	Rik van Riel <riel@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH 1/8] block: limit default readahead size for small devices
Date: Mon, 21 Nov 2011 14:52:47 -0800
Message-ID: <20111121145247.0e37dc36.akpm@linux-foundation.org>
In-Reply-To: <20111121093846.121502745@intel.com>

On Mon, 21 Nov 2011 17:18:20 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> Linus reports a _really_ small & slow (505kB, 15kB/s) USB device,
> on which blkid runs unpleasantly slow. He manages to optimize the blkid
> reads down to 1kB+16kB, but still kernel read-ahead turns it into 48kB.
> 
>      lseek 0,    read 1024   => readahead 4 pages (start of file)

I'm disturbed that the code did a 4 page (16kbyte?) readahead after an
lseek.  Given the high probability that the next read will occur after
a second lseek, that's a mistake.

Was an lseek to offset 0 special-cased?

>      lseek 1536, read 16384  => readahead 8 pages (page contiguous)
> 
> The readahead heuristics involved here are reasonable ones in general.
> So it's good to fix blkid with fadvise(RANDOM), as Linus already did.
> 
> For the kernel part, Linus suggests:
>   So maybe we could be less aggressive about read-ahead when the size of
>   the device is small? Turning a 16kB read into a 64kB one is a big deal,
>   when it's about 15% of the whole device!
> 
> This looks reasonable: smaller devices tend to be slower (USB sticks as
> well as micro/mobile/old hard disks).
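
For reference, the userspace side of this (the fadvise(RANDOM) hint mentioned in the quoted text) could look roughly like the sketch below. It is not the actual blkid change: the helper name sketch_random_read() is hypothetical, and only posix_fadvise() and pread() are real interfaces.

```c
#define _POSIX_C_SOURCE 200809L	/* exposes posix_fadvise() and pread() */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: declare a random access pattern on fd, then read
 * len bytes at offset off.  Returns the byte count from pread(), or -1
 * on error with errno set.
 */
static ssize_t sketch_random_read(int fd, off_t off, void *buf, size_t len)
{
	assert(buf != NULL);

	/*
	 * offset 0 with len 0 covers the whole file.  The call is only a
	 * hint, so this sketch deliberately ignores a failure here.
	 */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

	return pread(fd, buf, len, off);
}
```

A probing tool would call such a helper once per superblock offset it wants to inspect, instead of relying on lseek() plus read() with the default readahead behaviour.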

Spose so.  Obviously there are other characteristics which should be
considered when choosing a readahead size, but one of them can be disk
size and that's what this change does.

In a better world, userspace would run a
work-out-what-readahead-size-to-use script each time a distro is
installed and when new storage devices are added/detected.  Userspace
would then remember that readahead size for subsequent bootups.

In the real world, we shovel guaranteed-to-be-wrong guesswork into the
kernel and everyone just uses the results.  Sigh.

> --- linux-next.orig/block/genhd.c	2011-10-31 00:13:51.000000000 +0800
> +++ linux-next/block/genhd.c	2011-11-18 11:27:08.000000000 +0800
> @@ -623,6 +623,26 @@ void add_disk(struct gendisk *disk)
>  	WARN_ON(retval);
>  
>  	disk_add_events(disk);
> +
> +	/*
> +	 * Limit default readahead size for small devices.
> +	 *        disk size    readahead size
> +	 *               1M                8k
> +	 *               4M               16k
> +	 *              16M               32k
> +	 *              64M               64k
> +	 *             256M              128k
> +	 *               1G              256k
> +	 *               4G              512k
> +	 *              16G             1024k
> +	 *              64G             2048k
> +	 *             256G             4096k
> +	 */
> +	if (get_capacity(disk)) {
> +		unsigned long size = get_capacity(disk) >> 9;

get_capacity() returns sector_t.  This expression will overflow with a
2T disk.  I'm not sure if we successfully support 2T disks on 32-bit
machines, but changes like this will guarantee that we don't :)

> +		size = 1UL << (ilog2(size) / 2);

I think there's a rounddown_pow_of_two() hiding in that expression?

> +		bdi->ra_pages = min(bdi->ra_pages, size);

I don't have a clue why that min() is in there.  It needs a comment,
please.

> +	}
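
To make the arithmetic in the hunk easier to check against the table in its comment, here is a small userspace sketch of the same computation. It is an illustration under stated assumptions, not the patch itself: sketch_ilog2(), sketch_ra_cap_pages() and sketch_ra_cap_bytes() are hypothetical names, pages are assumed to be 4 KiB, and the intermediate is kept in a 64-bit type.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Userspace stand-in for the kernel's ilog2(): floor(log2(v)), v > 0. */
static unsigned int sketch_ilog2(uint64_t v)
{
	unsigned int log = 0;

	assert(v > 0);
	while (v >>= 1)
		log++;
	return log;
}

/*
 * Readahead cap in pages for a disk of `sectors` 512-byte sectors.
 * Same expression as the quoted hunk.  Returns 0 when the shifted
 * size is zero (capacity below 256 KiB), leaving that case to the
 * caller; the quoted hunk does not handle it separately.
 */
static unsigned long sketch_ra_cap_pages(uint64_t sectors)
{
	uint64_t size = sectors >> 9;	/* capacity in 256 KiB units */

	if (!size)
		return 0;
	/* Halving the exponent gives a power-of-two square root. */
	return 1UL << (sketch_ilog2(size) / 2);
}

/* Same cap expressed in bytes, for comparison with the table. */
static uint64_t sketch_ra_cap_bytes(uint64_t sectors)
{
	return (uint64_t)sketch_ra_cap_pages(sectors) * SKETCH_PAGE_SIZE;
}
```

With these assumptions a 1G disk (2097152 sectors) yields 64 pages, i.e. 256k, and a 1M disk (2048 sectors) yields 2 pages, i.e. 8k, which agrees with the table.  Because the exponent is halved, the result grows with the square root of the capacity rather than with the capacity itself.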


