From: Wu Fengguang <fengguang.wu@intel.com>
To: Nikanth Karthikesan <knikanth@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
Jens Axboe <jens.axboe@oracle.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH v2] Make vm_max_readahead configurable at run-time
Date: Wed, 10 Feb 2010 19:05:51 +0800
Message-ID: <20100210110551.GA1323@localhost>
In-Reply-To: <201002101623.30302.knikanth@suse.de>
Nikanth,
> Make vm_max_readahead configurable at run-time. Expose a sysctl knob
> in procfs to change it. This would ensure that new disks added would
> use this value as their default read_ahead_kb.
Do you have a use case or customer demand for it?
> Also filesystems which use default_backing_dev_info would also
> use this new value, even if they were already mounted.
>
> Currently xfs, btrfs, nilfs, raw, mtd use the default_backing_dev_info.
This sounds like a bad interface, in that users will be confused by the
tricky details of "works for new devices" and "works for some fs".

One more tricky point is that the btrfs/md/dm readahead size may not be
influenced if some of the component disks are hot-added.

So this patch is only going to work for hot-plugged disks that
contain a _standalone_ filesystem. Is this a typical use case on servers?
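
To make the "works for new devices" point concrete, here is a minimal
userspace sketch of the behaviour as I read the quoted patch. It is not
kernel code: every model_* name is hypothetical, and 4 KiB pages are
assumed. The knob is sampled once when a queue is allocated, so only the
default bdi and queues allocated afterwards see a new value.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, standing in for PAGE_CACHE_SHIFT. */
#define MODEL_PAGE_SHIFT 12

/* Hypothetical stand-in for struct backing_dev_info. */
struct model_bdi {
	unsigned long ra_pages;
};

/* Stand-in for vm_max_readahead_kb, starting at the 128 KB default. */
static unsigned long model_max_readahead_kb = 128;

/* Stand-in for default_backing_dev_info. */
static struct model_bdi model_default_bdi = {
	.ra_pages = 128 >> (MODEL_PAGE_SHIFT - 10),
};

/* Convert kilobytes to pages the way the patched handler does. */
static unsigned long model_kb_to_pages(unsigned long kb)
{
	assert(MODEL_PAGE_SHIFT >= 10);
	return kb >> (MODEL_PAGE_SHIFT - 10);
}

/* Models queue allocation for a new disk: the knob is read once, here. */
static void model_alloc_queue(struct model_bdi *bdi)
{
	bdi->ra_pages = model_kb_to_pages(model_max_readahead_kb);
}

/* Models a sysctl write: only the global and the default bdi change. */
static void model_set_max_readahead_kb(unsigned long kb)
{
	model_max_readahead_kb = kb;
	model_default_bdi.ra_pages = model_kb_to_pages(kb);
}
```

In this model, a disk whose queue was allocated before
model_set_max_readahead_kb() keeps its old ra_pages, while the default
bdi and any later queue use the new value.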
Thanks,
Fengguang
>
> Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
>
> ---
>
> Index: linux-2.6/block/blk-core.c
> ===================================================================
> --- linux-2.6.orig/block/blk-core.c
> +++ linux-2.6/block/blk-core.c
> @@ -499,7 +499,7 @@ struct request_queue *blk_alloc_queue_no
> q->backing_dev_info.unplug_io_fn = blk_backing_dev_unplug;
> q->backing_dev_info.unplug_io_data = q;
> q->backing_dev_info.ra_pages =
> - (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
> + (vm_max_readahead_kb * 1024) / PAGE_CACHE_SIZE;
> q->backing_dev_info.state = 0;
> q->backing_dev_info.capabilities = BDI_CAP_MAP_COPY;
> q->backing_dev_info.name = "block";
> Index: linux-2.6/fs/fuse/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/fuse/inode.c
> +++ linux-2.6/fs/fuse/inode.c
> @@ -870,7 +870,7 @@ static int fuse_bdi_init(struct fuse_con
> int err;
>
> fc->bdi.name = "fuse";
> - fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
> + fc->bdi.ra_pages = (vm_max_readahead_kb * 1024) / PAGE_CACHE_SIZE;
> fc->bdi.unplug_io_fn = default_unplug_io_fn;
> /* fuse does it's own writeback accounting */
> fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB;
> Index: linux-2.6/include/linux/mm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/mm.h
> +++ linux-2.6/include/linux/mm.h
> @@ -1188,7 +1188,11 @@ int write_one_page(struct page *page, in
> void task_dirty_inc(struct task_struct *tsk);
>
> /* readahead.c */
> -#define VM_MAX_READAHEAD 128 /* kbytes */
> +#define INITIAL_VM_MAX_READAHEAD_KB 128
> +extern unsigned long vm_max_readahead_kb;
> +
> +int sysctl_vm_max_readahead_kb_handler(struct ctl_table *table, int write,
> + void __user *buffer, size_t *length, loff_t *ppos);
>
> int force_page_cache_readahead(struct address_space *mapping, struct file *filp,
> pgoff_t offset, unsigned long nr_to_read);
> Index: linux-2.6/mm/backing-dev.c
> ===================================================================
> --- linux-2.6.orig/mm/backing-dev.c
> +++ linux-2.6/mm/backing-dev.c
> @@ -18,7 +18,8 @@ EXPORT_SYMBOL(default_unplug_io_fn);
>
> struct backing_dev_info default_backing_dev_info = {
> .name = "default",
> - .ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
> + .ra_pages = INITIAL_VM_MAX_READAHEAD_KB
> + >> (PAGE_CACHE_SHIFT - 10),
> .state = 0,
> .capabilities = BDI_CAP_MAP_COPY,
> .unplug_io_fn = default_unplug_io_fn,
> Index: linux-2.6/mm/readahead.c
> ===================================================================
> --- linux-2.6.orig/mm/readahead.c
> +++ linux-2.6/mm/readahead.c
> @@ -17,6 +17,19 @@
> #include <linux/pagevec.h>
> #include <linux/pagemap.h>
>
> +unsigned long vm_max_readahead_kb = INITIAL_VM_MAX_READAHEAD_KB;
> +
> +int sysctl_vm_max_readahead_kb_handler(struct ctl_table *table, int write,
> + void __user *buffer, size_t *length, loff_t *ppos)
> +{
> + proc_doulongvec_minmax(table, write, buffer, length, ppos);
> +
> + default_backing_dev_info.ra_pages =
> + vm_max_readahead_kb >> (PAGE_CACHE_SHIFT - 10);
> +
> + return 0;
> +}
> +
> /*
> * Initialise a struct file's readahead state. Assumes that the caller has
> * memset *ra to zero.
> Index: linux-2.6/kernel/sysctl.c
> ===================================================================
> --- linux-2.6.orig/kernel/sysctl.c
> +++ linux-2.6/kernel/sysctl.c
> @@ -1273,7 +1273,13 @@ static struct ctl_table vm_table[] = {
> .extra2 = &one,
> },
> #endif
> -
> + {
> + .procname = "max_readahead_kb",
> + .data = &vm_max_readahead_kb,
> + .maxlen = sizeof(vm_max_readahead_kb),
> + .mode = 0644,
> + .proc_handler = sysctl_vm_max_readahead_kb_handler,
> + },
> { }
> };
>
> Index: linux-2.6/Documentation/sysctl/vm.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/sysctl/vm.txt
> +++ linux-2.6/Documentation/sysctl/vm.txt
> @@ -31,6 +31,7 @@ Currently, these files are in /proc/sys/
> - laptop_mode
> - legacy_va_layout
> - lowmem_reserve_ratio
> +- max_readahead_kb
> - max_map_count
> - memory_failure_early_kill
> - memory_failure_recovery
> @@ -263,6 +264,18 @@ The minimum value is 1 (1/1 -> 100%).
>
> ==============================================================
>
> +max_readahead_kb:
> +
> +This file contains the default maximum readahead that would be
> +used, when new disks would be added to the system.
> +
> +Also filesystems which use default_backing_dev_info would also
> +use this new value, even if they were already mounted.
> +
> +xfs, btrfs, nilfs, raw, mtd use the default_backing_dev_info.
> +
> +==============================================================
> +
> max_map_count:
>
> This file contains the maximum number of memory map areas a process
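
A side note on the arithmetic in the quoted patch: it converts kilobytes
to pages in two ways, by division in blk-core.c and fuse, and by a right
shift in backing-dev.c and readahead.c. The userspace sketch below
checks that the two forms agree. It is only an illustration: the model_*
helpers are hypothetical, and the page shift is passed in as a parameter
rather than taken from PAGE_CACHE_SHIFT.

```c
#include <assert.h>

/* Hypothetical helper: the division form, (kb * 1024) / page size. */
static unsigned long model_pages_by_division(unsigned long kb,
					     unsigned int page_shift)
{
	assert(page_shift >= 10);
	return (kb * 1024UL) / (1UL << page_shift);
}

/* Hypothetical helper: the shift form, kb >> (page shift - 10). */
static unsigned long model_pages_by_shift(unsigned long kb,
					  unsigned int page_shift)
{
	assert(page_shift >= 10);
	return kb >> (page_shift - 10);
}

/* Returns 1 if both forms agree for every kb in [0, limit_kb], else 0. */
static int model_forms_agree(unsigned long limit_kb, unsigned int page_shift)
{
	unsigned long kb;

	for (kb = 0; kb <= limit_kb; kb++) {
		if (model_pages_by_division(kb, page_shift) !=
		    model_pages_by_shift(kb, page_shift))
			return 0;
	}
	return 1;
}
```

Both forms round down, so a value smaller than one page (for example
3 KB with 4 KiB pages) yields 0 pages. The division form can overflow
for very large kilobyte values, which the shift form avoids.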