From: Johannes Weiner <hannes@cmpxchg.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v4] mm: don't cap request size based on read-ahead setting
Date: Fri, 18 Nov 2016 13:02:18 -0500 [thread overview]
Message-ID: <20161118180218.GA6411@cmpxchg.org> (raw)
In-Reply-To: <e4271a04-35cf-b082-34ea-92649f5111be@kernel.dk>
On Thu, Nov 17, 2016 at 02:23:10PM -0700, Jens Axboe wrote:
> We ran into a funky issue, where someone doing 256K buffered reads saw
> 128K requests at the device level. Turns out it is read-ahead capping
> the request size, since we use 128K as the default setting. This doesn't
> make a lot of sense - if someone is issuing 256K reads, they should see
> 256K reads, regardless of the read-ahead setting, if the underlying
> device can support a 256K read in a single command.
>
> To make matters more confusing, there's an odd interaction with the
> fadvise hint setting. If we tell the kernel we're doing sequential IO on
> this file descriptor, we can get twice the read-ahead size. But if we
> tell the kernel that we are doing random IO, hence disabling read-ahead,
> we do get nice 256K requests at the lower level. This is because
> ondemand and forced read-ahead behave differently, with the latter doing
> the right thing. An application developer will be, rightfully,
> scratching his head at this point, wondering wtf is going on. A good one
> will dive into the kernel source, and silently weep.
With the FADV_RANDOM part of the changelog updated, this looks good to
me. Just a few nitpicks below.
> This patch introduces a bdi hint, io_pages. This is the soft max IO size
> for the lower level, I've hooked it up to the bdev settings here.
> Read-ahead is modified to issue the maximum of the user request size,
> and the read-ahead max size, but capped to the max request size on the
> device side. The latter is done to avoid reading ahead too much, if the
> application asks for a huge read. With this patch, the kernel behaves
> like the application expects.
>
> Signed-off-by: Jens Axboe <axboe@fb.com>
> @@ -207,12 +207,17 @@ int __do_page_cache_readahead(struct address_space
> *mapping, struct file *filp,
> * memory at once.
> */
> int force_page_cache_readahead(struct address_space *mapping, struct file
> *filp,
Linewrap (but you already knew that ;))
> - pgoff_t offset, unsigned long nr_to_read)
> + pgoff_t offset, unsigned long nr_to_read)
> {
> + struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
> + struct file_ra_state *ra = &filp->f_ra;
> + unsigned long max_pages;
> +
> if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
> return -EINVAL;
>
> - nr_to_read = min(nr_to_read, inode_to_bdi(mapping->host)->ra_pages);
> + max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
> + nr_to_read = min(nr_to_read, max_pages);
It would be useful to have the comment on not capping below optimal IO
size from ondemand_readahead() here as well.
> @@ -369,10 +374,18 @@ ondemand_readahead(struct address_space *mapping,
> bool hit_readahead_marker, pgoff_t offset,
> unsigned long req_size)
> {
> - unsigned long max = ra->ra_pages;
> + struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
> + unsigned long max_pages = ra->ra_pages;
> pgoff_t prev_offset;
>
> /*
> + * If the request exceeds the readahead window, allow the read to
> + * be up to the optimal hardware IO size
> + */
> + if (req_size > max_pages && bdi->io_pages > max_pages)
> + max_pages = min(req_size, bdi->io_pages);
> +
> + /*
> * start of file
> */
> if (!offset)
Please feel free to add:
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v4] mm: don't cap request size based on read-ahead setting
Date: Fri, 18 Nov 2016 13:02:18 -0500 [thread overview]
Message-ID: <20161118180218.GA6411@cmpxchg.org> (raw)
In-Reply-To: <e4271a04-35cf-b082-34ea-92649f5111be@kernel.dk>
On Thu, Nov 17, 2016 at 02:23:10PM -0700, Jens Axboe wrote:
> We ran into a funky issue, where someone doing 256K buffered reads saw
> 128K requests at the device level. Turns out it is read-ahead capping
> the request size, since we use 128K as the default setting. This doesn't
> make a lot of sense - if someone is issuing 256K reads, they should see
> 256K reads, regardless of the read-ahead setting, if the underlying
> device can support a 256K read in a single command.
>
> To make matters more confusing, there's an odd interaction with the
> fadvise hint setting. If we tell the kernel we're doing sequential IO on
> this file descriptor, we can get twice the read-ahead size. But if we
> tell the kernel that we are doing random IO, hence disabling read-ahead,
> we do get nice 256K requests at the lower level. This is because
> ondemand and forced read-ahead behave differently, with the latter doing
> the right thing. An application developer will be, rightfully,
> scratching his head at this point, wondering wtf is going on. A good one
> will dive into the kernel source, and silently weep.
With the FADV_RANDOM part of the changelog updated, this looks good to
me. Just a few nitpicks below.
> This patch introduces a bdi hint, io_pages. This is the soft max IO size
> for the lower level, I've hooked it up to the bdev settings here.
> Read-ahead is modified to issue the maximum of the user request size,
> and the read-ahead max size, but capped to the max request size on the
> device side. The latter is done to avoid reading ahead too much, if the
> application asks for a huge read. With this patch, the kernel behaves
> like the application expects.
>
> Signed-off-by: Jens Axboe <axboe@fb.com>
> @@ -207,12 +207,17 @@ int __do_page_cache_readahead(struct address_space
> *mapping, struct file *filp,
> * memory at once.
> */
> int force_page_cache_readahead(struct address_space *mapping, struct file
> *filp,
Linewrap (but you already knew that ;))
> - pgoff_t offset, unsigned long nr_to_read)
> + pgoff_t offset, unsigned long nr_to_read)
> {
> + struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
> + struct file_ra_state *ra = &filp->f_ra;
> + unsigned long max_pages;
> +
> if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages))
> return -EINVAL;
>
> - nr_to_read = min(nr_to_read, inode_to_bdi(mapping->host)->ra_pages);
> + max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
> + nr_to_read = min(nr_to_read, max_pages);
It would be useful to have the comment on not capping below optimal IO
size from ondemand_readahead() here as well.
> @@ -369,10 +374,18 @@ ondemand_readahead(struct address_space *mapping,
> bool hit_readahead_marker, pgoff_t offset,
> unsigned long req_size)
> {
> - unsigned long max = ra->ra_pages;
> + struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
> + unsigned long max_pages = ra->ra_pages;
> pgoff_t prev_offset;
>
> /*
> + * If the request exceeds the readahead window, allow the read to
> + * be up to the optimal hardware IO size
> + */
> + if (req_size > max_pages && bdi->io_pages > max_pages)
> + max_pages = min(req_size, bdi->io_pages);
> +
> + /*
> * start of file
> */
> if (!offset)
Please feel free to add:
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-11-18 18:02 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-17 21:23 [PATCH v4] mm: don't cap request size based on read-ahead setting Jens Axboe
2016-11-17 21:23 ` Jens Axboe
2016-11-18 5:58 ` Hillf Danton
2016-11-18 5:58 ` Hillf Danton
2016-11-18 15:09 ` Jens Axboe
2016-11-18 15:09 ` Jens Axboe
2016-11-18 18:02 ` Johannes Weiner [this message]
2016-11-18 18:02 ` Johannes Weiner
2016-11-18 19:34 ` Jens Axboe
2016-11-18 19:34 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161118180218.GA6411@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.