Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Eric Wong <normalperson@yhbt.net>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue
Date: Sun, 16 Dec 2012 03:59:53 +0000	[thread overview]
Message-ID: <20121216035953.GA30689@dcvr.yhbt.net> (raw)
In-Reply-To: <20121216033601.GJ9806@dastard>

Dave Chinner <david@fromorbit.com> wrote:
> On Sun, Dec 16, 2012 at 03:04:42AM +0000, Eric Wong wrote:
> > Dave Chinner <david@fromorbit.com> wrote:
> > > On Sat, Dec 15, 2012 at 12:54:48AM +0000, Eric Wong wrote:
> > > > 
> > > >  Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <2.484832>
> > > >   After: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <0.000061>
> > > 
> > > You've basically asked fadvise() to readahead the entire file if it
> > > can. That means it is likely to issue enough readahead to fill the
> > > IO queue, and that's where all the latency is coming from. If all
> > > you are trying to do is reduce the latency of the first read, then
> > > only readahead the initial range that you are going to need to read...
> > 
> > Yes, I do want to read the whole file, eventually.  So I want to put
> > the file into the page cache ASAP and allow the disk to spin down.
> 
> Issuing readahead is not going to speed up the first read. Either
> you will spend more time issuing all the readahead, or you block
> waiting for the first read to complete. And the way you are issuing
> readahead does not guarantee the entire file is brought into the
> page cache....

I'm not relying on readahead to speed up the first read.

By using fadvise/readahead, I want a _best-effort_ attempt to
keep the file in cache.

> > But I also want the first read() to be fast.
> 
> You can't have a pony, sorry.

I want the first read() to happen sooner than it would under current
fadvise.  If it's slightly slower that w/o fadvise, that's fine.
The 1-2s slower with current fadvise is what bothers me.

> > > Also, Pushing readahead off to a workqueue potentially allows
> > > someone to DOS the system because readahead won't ever get throttled
> > > in the syscall context...
> > 
> > Yes, I'm a little worried about this, too.
> > Perhaps squashing something like the following will work?
> > 
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index 56a80a9..51dc58e 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -246,16 +246,18 @@ void wq_page_cache_readahead(struct address_space *mapping, struct file *filp,
> >  {
> >  	struct wq_ra_req *req;
> >  
> > +	nr_to_read = max_sane_readahead(nr_to_read);
> > +	if (!nr_to_read)
> > +		goto skip_ra;
> 
> You do realise that anything you read ahead will be accounted as
> inactive pages, so nr_to_read doesn't decrease at all as you fill
> memory with readahead pages...

Ah, ok, I'll see if I can rework it.

> >  	req = kzalloc(sizeof(*req), GFP_ATOMIC);
> 
> GFP_ATOMIC? Really?

Sorry, I'm really new at this.

> In reality, I think you are looking in the wrong place to fix your
> "first read" latency problem. No matter what you do, there is going
> to be IO latency on the first read. And readahead doesn't guarantee
> that the pages are brought into the page cache (ever heard of
> readahead thrashing?) so the way you are doing your readahead is not
> going to result in you being able to spin the disk down after
> issuing a readahead command...

Right, I want a _best-effort_ readahead (which seems to be what an
advisory interface should offer).

> You've really got two problems - minimal initial latency, and
> reading the file quickly and pinning it in memory until you get
> around to needing it. The first can't be made faster by using
> readahead, and the second can not be guaranteed by using readahead.

Agreed.  I think I overstated the requirements.

I want "less-bad" initial latency than I was getting.

So I don't mind if open()+fadvise()+read() is a couple of milliseconds
slower than just open()+read(), but I do mind if fadvise() takes 1-2
seconds.

> IOWs, readahead is the wrong tool for solving your problems. Minimal
> IO latency from the first read will come from just issuing pread()
> after open(), and ensuring that the file is read quickly and pinned
> in memory can really only be done by allocating RAM in the
> application to hold it until it is needed....

I definitely only want a best-effort method to put a file into memory.
I want the kernel to decide whether or not to cache it.

Thanks for looking at this!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Eric Wong <normalperson@yhbt.net>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] fadvise: perform WILLNEED readahead in a workqueue
Date: Sun, 16 Dec 2012 03:59:53 +0000	[thread overview]
Message-ID: <20121216035953.GA30689@dcvr.yhbt.net> (raw)
In-Reply-To: <20121216033601.GJ9806@dastard>

Dave Chinner <david@fromorbit.com> wrote:
> On Sun, Dec 16, 2012 at 03:04:42AM +0000, Eric Wong wrote:
> > Dave Chinner <david@fromorbit.com> wrote:
> > > On Sat, Dec 15, 2012 at 12:54:48AM +0000, Eric Wong wrote:
> > > > 
> > > >  Before: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <2.484832>
> > > >   After: fadvise64(3, 0, 0, POSIX_FADV_WILLNEED) = 0 <0.000061>
> > > 
> > > You've basically asked fadvise() to readahead the entire file if it
> > > can. That means it is likely to issue enough readahead to fill the
> > > IO queue, and that's where all the latency is coming from. If all
> > > you are trying to do is reduce the latency of the first read, then
> > > only readahead the initial range that you are going to need to read...
> > 
> > Yes, I do want to read the whole file, eventually.  So I want to put
> > the file into the page cache ASAP and allow the disk to spin down.
> 
> Issuing readahead is not going to speed up the first read. Either
> you will spend more time issuing all the readahead, or you block
> waiting for the first read to complete. And the way you are issuing
> readahead does not guarantee the entire file is brought into the
> page cache....

I'm not relying on readahead to speed up the first read.

By using fadvise/readahead, I want a _best-effort_ attempt to
keep the file in cache.

> > But I also want the first read() to be fast.
> 
> You can't have a pony, sorry.

I want the first read() to happen sooner than it would under current
fadvise.  If it's slightly slower that w/o fadvise, that's fine.
The 1-2s slower with current fadvise is what bothers me.

> > > Also, Pushing readahead off to a workqueue potentially allows
> > > someone to DOS the system because readahead won't ever get throttled
> > > in the syscall context...
> > 
> > Yes, I'm a little worried about this, too.
> > Perhaps squashing something like the following will work?
> > 
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index 56a80a9..51dc58e 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -246,16 +246,18 @@ void wq_page_cache_readahead(struct address_space *mapping, struct file *filp,
> >  {
> >  	struct wq_ra_req *req;
> >  
> > +	nr_to_read = max_sane_readahead(nr_to_read);
> > +	if (!nr_to_read)
> > +		goto skip_ra;
> 
> You do realise that anything you read ahead will be accounted as
> inactive pages, so nr_to_read doesn't decrease at all as you fill
> memory with readahead pages...

Ah, ok, I'll see if I can rework it.

> >  	req = kzalloc(sizeof(*req), GFP_ATOMIC);
> 
> GFP_ATOMIC? Really?

Sorry, I'm really new at this.

> In reality, I think you are looking in the wrong place to fix your
> "first read" latency problem. No matter what you do, there is going
> to be IO latency on the first read. And readahead doesn't guarantee
> that the pages are brought into the page cache (ever heard of
> readahead thrashing?) so the way you are doing your readahead is not
> going to result in you being able to spin the disk down after
> issuing a readahead command...

Right, I want a _best-effort_ readahead (which seems to be what an
advisory interface should offer).

> You've really got two problems - minimal initial latency, and
> reading the file quickly and pinning it in memory until you get
> around to needing it. The first can't be made faster by using
> readahead, and the second can not be guaranteed by using readahead.

Agreed.  I think I overstated the requirements.

I want "less-bad" initial latency than I was getting.

So I don't mind if open()+fadvise()+read() is a couple of milliseconds
slower than just open()+read(), but I do mind if fadvise() takes 1-2
seconds.

> IOWs, readahead is the wrong tool for solving your problems. Minimal
> IO latency from the first read will come from just issuing pread()
> after open(), and ensuring that the file is read quickly and pinned
> in memory can really only be done by allocating RAM in the
> application to hold it until it is needed....

I definitely only want a best-effort method to put a file into memory.
I want the kernel to decide whether or not to cache it.

Thanks for looking at this!

next prev parent reply	other threads:[~2012-12-16  3:59 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-15  0:54 [PATCH] fadvise: perform WILLNEED readahead in a workqueue Eric Wong
2012-12-15  0:54 ` Eric Wong
2012-12-15 22:34 ` Alan Cox
2012-12-15 22:34   ` Alan Cox
2012-12-16  0:25   ` Eric Wong
2012-12-16  0:25     ` Eric Wong
2012-12-16  3:03     ` Dave Chinner
2012-12-16  3:03       ` Dave Chinner
2012-12-16  3:35       ` Eric Wong
2012-12-16  3:35         ` Eric Wong
2012-12-16  4:15         ` Dave Chinner
2012-12-16  4:15           ` Dave Chinner
2012-12-16  5:23           ` Eric Wong
2012-12-16  5:23             ` Eric Wong
2012-12-16 21:31             ` Dave Chinner
2012-12-16 21:31               ` Dave Chinner
2012-12-16  8:48           ` Zheng Liu
2012-12-16  8:48             ` Zheng Liu
2012-12-16  2:45 ` Dave Chinner
2012-12-16  2:45   ` Dave Chinner
2012-12-16  3:04   ` Eric Wong
2012-12-16  3:04     ` Eric Wong
2012-12-16  3:09     ` Eric Wong
2012-12-16  3:09       ` Eric Wong
2012-12-16  3:36     ` Dave Chinner
2012-12-16  3:36       ` Dave Chinner
2012-12-16  3:59       ` Eric Wong [this message]
2012-12-16  3:59         ` Eric Wong
2012-12-16  4:26         ` Dave Chinner
2012-12-16  4:26           ` Dave Chinner
2012-12-16  5:17           ` Eric Wong
2012-12-16  5:17             ` Eric Wong
2013-02-22 16:45   ` Phillip Susi
2013-02-22 16:45     ` Phillip Susi
2013-02-22 21:13     ` Eric Wong
2013-02-22 21:13       ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121216035953.GA30689@dcvr.yhbt.net \
    --to=normalperson@yhbt.net \
    --cc=david@fromorbit.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.