linux-fsdevel.vger.kernel.org archive mirror
From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	Wu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 01/11] readahead: limit readahead size for small devices
Date: Tue, 02 Feb 2010 23:28:36 +0800
Message-ID: <20100202153316.375570078@intel.com>
In-Reply-To: 20100202152835.683907822@intel.com

[-- Attachment #1: readahead-size-for-tiny-device.patch --]
[-- Type: text/plain, Size: 7210 bytes --]

Linus reports a _really_ small and slow (505kB, 15kB/s) USB device,
on which blkid runs unpleasantly slowly. He managed to optimize the blkid
reads down to 1kB+16kB, but kernel readahead still turns them into 48kB:

     lseek 0,    read 1024   => readahead 4 pages (start of file)
     lseek 1536, read 16384  => readahead 8 pages (page contiguous)

The readahead heuristics involved here are reasonable in general, so
the right fix on the blkid side is fadvise(RANDOM), as Linus has already done.

For the kernel part, Linus suggests:
  So maybe we could be less aggressive about read-ahead when the size of
  the device is small? Turning a 16kB read into a 64kB one is a big deal,
  when it's about 15% of the whole device!

This looks reasonable: smaller devices tend to be slower (USB sticks as
well as micro/mobile/old hard disks).

Given that the non-rotational attribute is not always reported, we can
use the disk size as a hint for the maximum readahead size. The formula
we use generates the following concrete limits:

        disk size    readahead size
     (scale by 4)      (scale by 2)
               2M                4k
               8M                8k
              32M               16k
             128M               32k
             512M               64k
               2G              128k
               8G              256k
              32G              512k
             128G             1024k

The formula was derived from the following data, collected with this script:

	#!/bin/sh

	# Please make sure BDEV is not mounted or opened by others;
	# writing read_ahead_kb requires root.
	BDEV=sdb

	for rasize in 4 16 32 64 128 256 512 1024 2048
	do
		echo "$rasize" > "/sys/block/$BDEV/queue/read_ahead_kb"
		time dd if="/dev/$BDEV" of=/dev/null bs=4k count=102400
	done

The guiding principle is that the formula must not limit the readahead
size so much that it hurts any device's sequential read performance.

The Intel SSD is special in that its throughput keeps increasing steadily
with larger readahead sizes. However, it may take years for Linux to raise
its default readahead size to 2MB, so we do not give this device much
weight in the formula.

SSD 80G Intel x25-M SSDSA2M080

	rasize	first run time/throughput	second run time/throughput
	------------------------------------------------------------------
	  4k	3.40038 s,	123 MB/s	3.42842 s,	122 MB/s
	  8k	2.7362 s,	153 MB/s	2.74528 s,	153 MB/s
	 16k	2.59808 s,	161 MB/s	2.58728 s,	162 MB/s
	 32k	2.50488 s,	167 MB/s	2.49138 s,	168 MB/s
	 64k	2.12861 s,	197 MB/s	2.13055 s,	197 MB/s
	128k	1.92905 s,	217 MB/s	1.93176 s,	217 MB/s
	256k	1.75896 s,	238 MB/s	1.78963 s,	234 MB/s
	512k	1.67357 s,	251 MB/s	1.69112 s,	248 MB/s
	  1M	1.62115 s,	259 MB/s	1.63206 s,	257 MB/s
==>	  2M	1.56204 s,	269 MB/s	1.58854 s,	264 MB/s
	  4M	1.57949 s,	266 MB/s	1.57426 s,	266 MB/s

Here ==> marks the readahead size at which throughput reaches its plateau.

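SSD 30G SanDisk SATA 5000

	rasize	first run time/throughput	second run time/throughput	third run time/throughput
	-----------------------------------------------------------------------------------------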
	  4k	14.1593 s,	29.6 MB/s	14.1699 s,	29.6 MB/s	14.1782 s,	29.6 MB/s
	  8k	8.05231 s,	52.1 MB/s	8.04463 s,	52.1 MB/s	8.04758 s,	52.1 MB/s
	 16k	6.81751 s,	61.5 MB/s	6.81564 s,	61.5 MB/s	6.8146 s,	61.5 MB/s
	 32k	6.24176 s,	67.2 MB/s	6.2438 s,	67.2 MB/s	6.24645 s,	67.1 MB/s
	 64k	5.87828 s,	71.4 MB/s	5.87858 s,	71.3 MB/s	5.87481 s,	71.4 MB/s
	128k	5.71649 s,	73.4 MB/s	5.71804 s,	73.4 MB/s	5.72055 s,	73.3 MB/s
==>	256k	5.62466 s,	74.6 MB/s	5.62304 s,	74.6 MB/s	5.62114 s,	74.6 MB/s
	512k	5.61532 s,	74.7 MB/s	5.62098 s,	74.6 MB/s	5.61818 s,	74.7 MB/s
	  1M	5.50888 s,	76.1 MB/s	5.6204 s,	74.6 MB/s	5.62281 s,	74.6 MB/s

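USB stick 32G Teclast CoolFlash idVendor=1307, idProduct=0165

	rasize	first run time/throughput	second run time/throughput	third run time/throughput
	-----------------------------------------------------------------------------------------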
	  4k	53.1635 s,	7.9 MB/s 	53.155 s,	7.9 MB/s 	53.107 s,	7.9 MB/s
	  8k	23.4061 s,	17.9 MB/s	23.3955 s,	17.9 MB/s	23.4222 s,	17.9 MB/s
	 16k	17.1077 s,	24.5 MB/s	17.0909 s,	24.5 MB/s	17.0875 s,	24.5 MB/s
	 32k	14.6029 s,	28.7 MB/s	14.5913 s,	28.7 MB/s	14.5951 s,	28.7 MB/s
	 64k	14.5483 s,	28.8 MB/s	14.5344 s,	28.9 MB/s	14.5333 s,	28.9 MB/s
==>	128k	13.7497 s,	30.5 MB/s	13.7364 s,	30.5 MB/s	13.731 s,	30.5 MB/s
	256k	13.5521 s,	30.9 MB/s	13.5415 s,	31.0 MB/s	13.5554 s,	30.9 MB/s
	512k	13.5414 s,	31.0 MB/s	13.5631 s,	30.9 MB/s	13.5654 s,	30.9 MB/s
	  1M	13.574 s,	30.9 MB/s	13.5686 s,	30.9 MB/s	13.5667 s,	30.9 MB/s

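USB stick 4G SanDisk Cruzer idVendor=0781, idProduct=5151

	rasize	first run time/throughput	second run time/throughput	third run time/throughput
	-----------------------------------------------------------------------------------------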
	  4k	65.3449 s,	6.4 MB/s 	65.3759 s,	6.4 MB/s 	65.3405 s,	6.4 MB/s
	  8k	31.2002 s,	13.4 MB/s	31.1914 s,	13.4 MB/s	31.6836 s,	13.2 MB/s
	 16k	23.5281 s,	17.8 MB/s	23.4705 s,	17.9 MB/s	23.5859 s,	17.8 MB/s
	 32k	19.6786 s,	21.3 MB/s	19.719 s,	21.3 MB/s	19.7548 s,	21.2 MB/s
	 64k	19.6219 s,	21.4 MB/s	19.6125 s,	21.4 MB/s	19.594 s,	21.4 MB/s
==>	128k	18.021 s,	23.3 MB/s	18.0527 s,	23.2 MB/s	18.0694 s,	23.2 MB/s
	256k	17.978 s,	23.3 MB/s	17.6483 s,	23.8 MB/s	17.9324 s,	23.4 MB/s
	512k	17.659 s,	23.8 MB/s	17.9403 s,	23.4 MB/s	17.986 s,	23.3 MB/s
	  1M	17.9437 s,	23.4 MB/s	18.0634 s,	23.2 MB/s	17.9469 s,	23.4 MB/s

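USB stick 2G idVendor=0204, idProduct=6025 SerialNumber: 08082005000113

	rasize	first run time/throughput	second run time/throughput	third run time/throughput
	-----------------------------------------------------------------------------------------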
	  4k	62.6246 s,	6.7 MB/s 	60.5872 s,	6.9 MB/s 	62.2581 s,	6.7 MB/s
	  8k	35.7505 s,	11.7 MB/s	35.764 s,	11.7 MB/s	35.7396 s,	11.7 MB/s
	 16k	33.7949 s,	12.4 MB/s	33.8041 s,	12.4 MB/s	33.8015 s,	12.4 MB/s
-->	 32k	31.3851 s,	13.4 MB/s	31.381 s,	13.4 MB/s	31.3784 s,	13.4 MB/s
	 64k	31.3478 s,	13.4 MB/s	31.3494 s,	13.4 MB/s	31.3486 s,	13.4 MB/s
==>	128k	30.7384 s,	13.6 MB/s	30.7337 s,	13.6 MB/s	30.728 s,	13.6 MB/s
	256k	30.5439 s,	13.7 MB/s	30.544 s,	13.7 MB/s	30.5433 s,	13.7 MB/s
	512k	30.5408 s,	13.7 MB/s	30.543 s,	13.7 MB/s	30.5447 s,	13.7 MB/s
	  1M	30.5919 s,	13.7 MB/s	30.5893 s,	13.7 MB/s	30.5939 s,	13.7 MB/s

Does anyone have a 512MB or 128MB USB stick to test? In any case, you get
satisfactory performance with a readahead size of 32k or more.

Tested-by: Linus Torvalds <torvalds@linux-foundation.org> 
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 block/genhd.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--- linux.orig/block/genhd.c	2010-01-21 21:17:16.000000000 +0800
+++ linux/block/genhd.c	2010-01-22 17:09:34.000000000 +0800
@@ -518,6 +518,7 @@ void add_disk(struct gendisk *disk)
 	struct backing_dev_info *bdi;
 	dev_t devt;
 	int retval;
+	unsigned long size;
 
 	/* minors == 0 indicates to use ext devt from part0 and should
 	 * be accompanied with EXT_DEVT flag.  Make sure all
@@ -551,6 +552,23 @@ void add_disk(struct gendisk *disk)
 	retval = sysfs_create_link(&disk_to_dev(disk)->kobj, &bdi->dev->kobj,
 				   "bdi");
 	WARN_ON(retval);
+
+	/*
+	 * limit readahead size for small devices
+	 *        disk size    readahead size
+	 *               2M                4k
+	 *               8M                8k
+	 *              32M               16k
+	 *             128M               32k
+	 *             512M               64k
+	 *               2G              128k
+	 *               8G              256k
+	 *              32G              512k
+	 *             128G             1024k
+	 */
+	size = get_capacity(disk) >> 12;
+	size = 1UL << (ilog2(size) / 2);
+	bdi->ra_pages = min(bdi->ra_pages, size);
 }
 
 EXPORT_SYMBOL(add_disk);


