From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Rob van der Heij <rvdheij@gmail.com>,
Hugh Dickins <hughd@google.com>, Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: fadvise: Drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
Date: Tue, 19 Feb 2013 11:57:30 +0000
Message-ID: <20130219115729.GS4365@suse.de>
In-Reply-To: <20130214123926.599fcef8.akpm@linux-foundation.org>

On Thu, Feb 14, 2013 at 12:39:26PM -0800, Andrew Morton wrote:
> On Thu, 14 Feb 2013 12:03:49 +0000
> Mel Gorman <mgorman@suse.de> wrote:
>
> > Rob van der Heij reported the following (paraphrased) on private mail.
> >
> > The scenario is that I want to keep backups from filling up the page
> > cache and purging stuff that is more likely to be used again (this is
> > with s390x Linux on z/VM, so I don't give it so much memory that
> > we don't care anymore). So I have something with LD_PRELOAD that
> > intercepts the close() call (from tar, in this case) and issues
> > a posix_fadvise() just before closing the file.
> >
> > This mostly works, except for small files (less than 14 pages)
> > that remain in page cache after the fact.
>
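
A minimal sketch of the kind of LD_PRELOAD shim described in the report above, not Rob's actual code. The helper name drop_cached_pages() is hypothetical, and forwarding to the real close() through syscall(SYS_close, ...) is an assumption made to keep the example short; a shim could equally look up the libc symbol with dlsym(RTLD_NEXT, "close").

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to drop the cached pages of a
 * regular file. Returns -1 if fd is not a regular file, otherwise the
 * posix_fadvise() result (0 on success, an errno value on failure).
 */
static int drop_cached_pages(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
		return -1;

	/* offset 0 with len 0 covers the whole file */
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}

/* Replacement close() that the dynamic linker picks up via LD_PRELOAD. */
int close(int fd)
{
	int saved_errno = errno;

	/* The advice is best effort, so failures are ignored. */
	(void)drop_cached_pages(fd);
	errno = saved_errno;

	/* Assumption: issue the raw system call instead of calling libc. */
	return (int)syscall(SYS_close, fd);
}
```

Built as a shared object (for example with gcc -shared -fPIC) and named in LD_PRELOAD when starting tar, every close() on a regular file would first issue the DONTNEED advice. The advice only discards clean pages, which matches a backup that merely reads the files.
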
> Sigh. We've had the "my backups swamp pagecache" thing for 15 years
> and it's still happening.
>

Yes. There have been variations of it too, such as applications being pushed
prematurely into swap. I'm not certain how well we currently handle that
because I haven't checked in a few months.

> It should be possible nowadays to toss your backup application into a
> container to constrain its pagecache usage. So we can type
>
> run-in-a-memcg -m 200MB /my/backup/program
>
> and voila. Does such a script exist and work?
>

Michal already gave an example. It might run slower if the backup
application has to stall in direct reclaim to keep the container within
limits though.

> > --- a/mm/fadvise.c
> > +++ b/mm/fadvise.c
> > @@ -17,6 +17,7 @@
> > #include <linux/fadvise.h>
> > #include <linux/writeback.h>
> > #include <linux/syscalls.h>
> > +#include <linux/swap.h>
> >
> > #include <asm/unistd.h>
> >
> > @@ -120,9 +121,22 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> >  		start_index = (offset+(PAGE_CACHE_SIZE-1)) >> PAGE_CACHE_SHIFT;
> >  		end_index = (endbyte >> PAGE_CACHE_SHIFT);
> >  
> > -		if (end_index >= start_index)
> > -			invalidate_mapping_pages(mapping, start_index,
> > +		if (end_index >= start_index) {
> > +			unsigned long count = invalidate_mapping_pages(mapping,
> > +						start_index, end_index);
> > +
> > +			/*
> > +			 * If fewer pages were invalidated than expected then
> > +			 * it is possible that some of the pages were on
> > +			 * a per-cpu pagevec for a remote CPU. Drain all
> > +			 * pagevecs and try again.
> > +			 */
> > +			if (count < (end_index - start_index + 1)) {
> > +				lru_add_drain_all();
> > +				invalidate_mapping_pages(mapping, start_index,
> >  						end_index);
> > +			}
> > +		}
> >  		break;
> >  	default:
> >  		ret = -EINVAL;
>
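
The page arithmetic in the hunk above can be checked in user space. The sketch below is not kernel code: it assumes 4 KiB pages, takes endbyte as the inclusive last byte of an explicit range, and uses the hypothetical names expected_pages() and needs_drain(). It only mirrors the start_index/end_index computation and the "count < expected" test from the patch.

```c
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12	/* assumption: 4 KiB pages */
#define PAGE_SIZE_ASSUMED  (1ULL << PAGE_SHIFT_ASSUMED)

/*
 * Number of pages the patch expects a DONTNEED request to discard.
 * start_index rounds the first byte up to a page boundary, so a
 * partially covered first page is skipped; end_index is the page
 * that contains the last byte.
 */
static uint64_t expected_pages(uint64_t offset, uint64_t endbyte)
{
	uint64_t start_index =
		(offset + (PAGE_SIZE_ASSUMED - 1)) >> PAGE_SHIFT_ASSUMED;
	uint64_t end_index = endbyte >> PAGE_SHIFT_ASSUMED;

	if (end_index < start_index)
		return 0;	/* the range covers no whole first page */

	return end_index - start_index + 1;
}

/*
 * Mirrors the patch's check: if fewer pages were invalidated than the
 * range contains, some may still sit on a per-cpu pagevec, so drain
 * and retry.
 */
static int needs_drain(uint64_t invalidated, uint64_t offset,
		       uint64_t endbyte)
{
	return invalidated < expected_pages(offset, endbyte);
}
```

For a 14-page range starting at offset 0, expected_pages() yields 14, so a first pass that invalidates only 10 pages would trigger the drain and the second invalidate_mapping_pages() call.
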
> Those LRU pagevecs are a right pain. They provided useful gains way
> back when I first inflicted them upon Linux, but it would be nice to
> confirm whether they're still worthwhile and if so, whether the
> benefits can be replicated with some less intrusive scheme.
>

I know. Unfortunately I've had "Implement pagevec removal and test" on my
TODO list for the guts of a year now. It's long overdue to actually sit down
and just do it. It's a similar story for the per-cpu lists in front of the
page allocator, which are overdue a check on whether they can be replaced.
I actually have a prototype replacement for that lying around, but it ran
slower in tests and has bit-rotted since, as it was based on kernel 3.4.

--
Mel Gorman
SUSE Labs