From: Jan Stancek <jstancek@redhat.com>
To: ltp@lists.linux.it
Subject: [LTP] [PATCH v2] syscalls/readahead02: limit max readahead to backing device max_readahead_kb
Date: Thu, 7 Mar 2019 03:18:43 -0500 (EST)
Message-ID: <1883827755.5820362.1551946723163.JavaMail.zimbra@redhat.com>
In-Reply-To: <CAOQ4uxjO+jYKzor7_eqBd8HMGs=m-EKt3jd+jRtOhh_KNPzYXQ@mail.gmail.com>
----- Original Message -----
> On Wed, Mar 6, 2019 at 6:43 PM Jan Stancek <jstancek@redhat.com> wrote:
> >
> > On Tue, Mar 05, 2019 at 10:44:57PM +0200, Amir Goldstein wrote:
> > >> > > > This is certainly better than 4K, but still feels like we are not
> > >> > > > really testing the API properly, but I'm fine with this fix.
> > >> > > >
> > >> > > > However... as follow up, how about extending the new
> > >> > > > tst_dev_bytes_written() utils from Sumit to cover also bytes_read
> > >> > > > and replace validation of readahead() from get_cached_size() diff
> > >> > > > to tst_dev_bytes_read()?
> > >> > >
> > >> > > There is something similar based on /proc/self/io. We could try
> > >> > > using that to estimate max readahead size.
> > >> > >
> > >> > > Or /sys/class/block/$dev/stat as you suggested, not sure which one
> > >> > > is more accurate/up to date.
> > >> > >
> > >> >
> > >> > I believe /proc/self/io doesn't count IO performed by kernel async
> > >> > readahead against the process that issued the readahead, but didn't
> > >> > check. The test uses /proc/self/io to check how many IO were avoided
> > >> > by readahead...
> > >>
> > >> We could do one readahead() on the entire file, then read
> > >> the file and see how many IO we didn't manage to avoid.
> > >> The difference between the file size and the IO we couldn't avoid
> > >> would be our max readahead size.
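A minimal C sketch of that estimate, assuming the test samples the `read_bytes` line of /proc/self/io before and after reading the file back. The helper names `io_field()`, `self_read_bytes()` and `estimate_max_ra()` are hypothetical, not LTP API. As noted in this thread, async readahead may not be charged to the reading process, and reading the file back can itself trigger more readahead, so the result is only an estimate.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find "key: value" in text laid out like /proc/<pid>/io; -1 if absent. */
int io_field(const char *text, const char *key, unsigned long long *out)
{
	size_t klen;
	const char *line = text;

	assert(text != NULL && key != NULL);
	klen = strlen(key);
	while (line && *line) {
		/* match only at line start, so "write_bytes" != "cancelled_write_bytes" */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			*out = strtoull(line + klen + 1, NULL, 10);
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Current read_bytes of this process; -1 if /proc/self/io is unavailable. */
int self_read_bytes(unsigned long long *out)
{
	char buf[1024];
	size_t n;
	FILE *f = fopen("/proc/self/io", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return io_field(buf, "read_bytes", out);
}

/* After one readahead() over the whole file, bytes still read from disk
 * while reading it back were not covered by readahead. */
unsigned long long estimate_max_ra(unsigned long long file_size,
				   unsigned long long read_bytes_delta)
{
	return read_bytes_delta < file_size ? file_size - read_bytes_delta : 0;
}
```

In use, `self_read_bytes()` would be called before and after reading the file back, and the difference passed to `estimate_max_ra()`.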
> >
> > This also doesn't seem 100% accurate.
> >
> > Any method to inspect the side effects of readahead appears to lead to
> > more readahead done by the kernel. E.g. sequential reads lead to more
> > async readahead started by the kernel (which tries to stay ahead by
> > async_size). The mmap() approach appears to fault in pages via
> > do_fault_around().
> >
> > MAP_NONBLOCK is gone, and mincore and pagemap don't help here.
> >
> > I'm attaching v3, where I do reads with syscalls in reverse order.
> > But occasionally it still somehow leads to a couple of extra pages being
> > read into the cache, so it still over-estimates. On ppc64le it's quite
> > significant: 4 extra pages in cache, each 64k, cause the readahead
> > loop to miss ~10MB of data.
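The reverse-order reading described above could look roughly like the sketch below. It is not the v3 patch itself: `read_backwards()` is a hypothetical helper that walks the file from the end towards offset 0 with pread(), in chunk-aligned pieces, so the access pattern never looks like a forward sequential stream. As reported above, the kernel may still pull a few extra pages into the cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read [0, size) of fd from the last chunk down to the first, with each
 * chunk aligned to a multiple of `chunk` (e.g. the page size).
 * Returns the number of bytes read, or -1 on error. */
long long read_backwards(int fd, off_t size, size_t chunk)
{
	char *buf;
	long long total = 0;
	off_t end = size;

	assert(chunk > 0);
	buf = malloc(chunk);
	if (!buf)
		return -1;

	while (end > 0) {
		/* start of the aligned chunk that contains byte end - 1 */
		off_t start = ((end - 1) / (off_t)chunk) * (off_t)chunk;
		size_t want = (size_t)(end - start);
		ssize_t n = pread(fd, buf, want, start);

		if (n < 0) {
			free(buf);
			return -1;
		}
		total += n;
		if ((size_t)n < want) /* short read: file is shorter than size */
			break;
		end = start;
	}
	free(buf);
	return total;
}
```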
> >
> > /sys/class/block/$dev/ stats appear to be increased for fs metadata
> > as well, which can also inflate the value so that we over-estimate.
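A hypothetical helper for the /sys/class/block/$dev/stat approach. It assumes the layout documented for that file, where the third field is sectors read, counted in 512-byte units regardless of the device's block size; `stat_read_bytes()` and `dev_read_bytes()` are illustrative names. Because the counter also covers metadata reads, a before/after delta is an upper bound rather than the readahead size itself.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the first three fields of a block device stat line
 * (read I/Os, read merges, read sectors) and convert sectors to bytes. */
int stat_read_bytes(const char *line, unsigned long long *bytes)
{
	unsigned long long ios, merges, sectors;

	assert(line != NULL && bytes != NULL);
	if (sscanf(line, "%llu %llu %llu", &ios, &merges, &sectors) != 3)
		return -1;
	*bytes = sectors * 512ULL; /* always 512-byte sectors */
	return 0;
}

/* Read /sys/class/block/<dev>/stat; -1 if the device or file is missing. */
int dev_read_bytes(const char *dev, unsigned long long *bytes)
{
	char path[256], line[512];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/sys/class/block/%s/stat", dev);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = stat_read_bytes(line, bytes);
	fclose(f);
	return ret;
}
```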
> >
> > I'm running out of ideas for something more accurate/stable than v2.
> >
>
> I'm trying to understand if maybe you are running off the rails with
> the estimation issue.
> What the test aims to verify is that readahead prevents waiting
> on IO in the future.
> The max readahead size estimation is a very small part
> of the test and not that significant IMO.
> Why is the test not trying to readahead more than 64MB?
> It's arbitrary. So my first proposal for the over-estimation was
> to cap the upper estimate at 1MB, i.e.:
The problem is that the max allowed readahead can be smaller than 1MB,
so we would still over-estimate.
>
> offset += MIN(max_ra_estimate, MAX_SANE_READAHEAD_ESTIMATION);
>
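To make the concern about a sub-1MB device limit concrete, the sketch below models the proposed loop. Assumptions: each iteration advances the offset by `MIN(max_ra_estimate, MAX_SANE_READAHEAD_ESTIMATION)` with the cap set to the 1MB from the proposal, and each readahead() call brings in at most `actual_max_ra` bytes (the backing device limit). `ra_missed_bytes()` is a hypothetical helper, and the model ignores any extra readahead the kernel may do on its own.

```c
#include <assert.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif
/* Hypothetical value for the 1MB cap proposed in the thread. */
#define MAX_SANE_READAHEAD_ESTIMATION (1024LL * 1024)

/* Bytes left uncached when the loop steps by the capped estimate but
 * each readahead() call covers at most actual_max_ra bytes. */
long long ra_missed_bytes(long long file_size, long long max_ra_estimate,
			  long long actual_max_ra)
{
	long long step = MIN(max_ra_estimate, MAX_SANE_READAHEAD_ESTIMATION);
	long long offset, covered = 0;

	assert(step > 0 && actual_max_ra >= 0);
	for (offset = 0; offset < file_size; offset += step) {
		/* readahead(fd, offset, step) would be issued here */
		long long left = file_size - offset;

		covered += MIN(MIN(step, actual_max_ra), left);
	}
	return file_size - covered;
}
```

For a 64MB file with a 4MB estimate and a 128KB device limit, the capped loop takes 1MB steps but caches only 128KB per step, so 56MB stay uncached; when the device limit is at least the step size, nothing is missed.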
> With this change we won't be testing if readahead of 64MB
> in one go works anymore - it looks like getting that to work reliably
> is more challenging and I'm not sure it's worth the trouble.
>
> What you ended up implementing in v2 is not what I proposed
> (you just disabled estimation altogether)
Yes, the v2/bdi limit is the highest number I'm aware of that should
work across all kernels without guessing.
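A sketch of deriving the readahead length from the bdi limit, optionally capped (v4 in this thread ends up with min(bdi limit, 2M)). It assumes a whole-disk device such as a loop device, whose `/sys/class/block/<dev>/bdi` link points at its backing device info; partitions may not have that link. `bdi_ra_path()`, `read_ra_kb()` and `ra_len_bytes()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/block/<dev>/bdi/read_ahead_kb"; -1 on a bad name or overflow. */
int bdi_ra_path(const char *devname, char *buf, size_t len)
{
	int n;

	assert(devname != NULL && buf != NULL);
	if (*devname == '\0' || strchr(devname, '/'))
		return -1;
	n = snprintf(buf, len, "/sys/class/block/%s/bdi/read_ahead_kb", devname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the limit in KiB from sysfs; -1 on error. */
long read_ra_kb(const char *path)
{
	long kb = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &kb) != 1)
		kb = -1;
	fclose(f);
	return kb;
}

/* Readahead length in bytes: the bdi limit, capped at cap_kb when cap_kb > 0. */
long long ra_len_bytes(long bdi_kb, long cap_kb)
{
	long kb = (cap_kb > 0 && cap_kb < bdi_kb) ? cap_kb : bdi_kb;

	return (long long)kb * 1024;
}
```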
> I am fine with the dynamic setting of min_sane_readahead
> in v2, but it is independent of capping the upper limit for
> estimation.
>
> So what do you say about v2 + estimation (with or without reading the
> file backwards) and an upper estimation limit?
How could we tell whether the 1MB upper limit is smaller than the max readahead limit?
Another option could be v3, but in steps equal to "max_ra / 2", given the
readahead pipelining described in mm/readahead.c:
*                        |<----- async_size ---------|
*     |------------------- size -------------------->|
*     |==================#===========================|
*     ^start             ^page marked with PG_readahead
*
* To overlap application thinking time and disk I/O time, we do
* `readahead pipelining': Do not wait until the application consumed all
* readahead pages and stalled on the missing page at readahead_index;
* Instead, submit an asynchronous readahead I/O as soon as there are
* only async_size pages left in the readahead window. Normally async_size
* will be equal to size, for maximum pipelining.
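A simplified model of the window in the diagram, assuming page indexes and the three fields shown there (start, size, async_size): the page marked with PG_readahead sits async_size pages before the end of the window, and touching it is what kicks off the next asynchronous readahead. With async_size equal to size the marker is the first page of the window, which is one reason why reading back readahead'ed pages can extend the cached range. The struct and function names are hypothetical and do not mirror the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified readahead window, in page indexes. */
struct ra_window {
	unsigned long start;      /* first page of the window */
	unsigned long size;       /* pages in the window */
	unsigned long async_size; /* pages left when async readahead is triggered */
};

/* Page carrying PG_readahead: async_size pages before the window end. */
unsigned long ra_marker_page(const struct ra_window *ra)
{
	assert(ra->async_size <= ra->size);
	return ra->start + ra->size - ra->async_size;
}

/* In this model, accessing the marked page starts the next async readahead. */
bool ra_hits_marker(const struct ra_window *ra, unsigned long page)
{
	return page == ra_marker_page(ra);
}
```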
> Does it solve the sporadic failures problem?
>
> Thanks,
> Amir.
>
Thread overview: 23+ messages
2019-03-05 12:34 [LTP] [PATCH/RFC] syscalls/readahead02: don't use cache size Jan Stancek
2019-03-05 13:53 ` Amir Goldstein
2019-03-05 15:17 ` Jan Stancek
2019-03-05 15:33 ` Amir Goldstein
2019-03-05 16:17 ` [LTP] [PATCH v2] syscalls/readahead02: limit max readahead to backing device max_readahead_kb Jan Stancek
2019-03-05 16:35 ` Amir Goldstein
2019-03-05 16:55 ` Jan Stancek
2019-03-05 20:08 ` Amir Goldstein
2019-03-05 20:22 ` Jan Stancek
2019-03-05 20:44 ` Amir Goldstein
2019-03-06 16:42 ` Jan Stancek
2019-03-07 6:41 ` Amir Goldstein
2019-03-07 8:18 ` Jan Stancek [this message]
2019-03-07 8:48 ` Amir Goldstein
2019-03-07 9:15 ` Jan Stancek
2019-03-07 9:53 ` Amir Goldstein
2019-03-07 11:25 ` Jan Stancek
2019-03-07 11:49 ` Amir Goldstein
2019-03-08 12:19 ` [LTP] [PATCH v4] syscalls/readahead02: set readahead to min(bdi limit, 2M) Jan Stancek
2019-03-08 14:29 ` Amir Goldstein
2019-03-08 14:56 ` Jan Stancek
2019-03-12 13:46 ` Li Wang
2019-03-12 15:26 ` Jan Stancek