linux-mm.kvack.org archive mirror
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Andres Freund <andres@anarazel.de>,
	Rik van Riel <riel@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@fb.com
Subject: Re: [PATCH 2/3] mm: filemap: only do access activations on reads
Date: Mon, 4 Apr 2016 18:47:50 -0400
Message-ID: <20160404224750.GA14828@cmpxchg.org>
In-Reply-To: <20160404142233.cfdea284b8107768fb359efd@linux-foundation.org>

On Mon, Apr 04, 2016 at 02:22:33PM -0700, Andrew Morton wrote:
> On Mon,  4 Apr 2016 13:13:37 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > Andres Freund observed that his database workload is struggling with
> > the transaction journal creating pressure on frequently read pages.
> > 
> > Access patterns like transaction journals frequently write the same
> > pages over and over, but in the majority of cases those pages are
> > never read back. There are no caching benefits to be had for those
> > pages, so activating them and having them put pressure on pages that
> > do benefit from caching is a bad choice.
> 
> Read-after-write is a pretty common pattern: temporary files for
> example.  What are the opportunities for regressions here?

The read(s) following the write will call mark_page_accessed() and so
promote the pages if their data is in fact repeatedly accessed. That
makes sense, because the writes really don't say anything about the
cache-worthiness. One write followed by one read shouldn't mean the
data is strongly benefiting from being cached. Only multiple reads.
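
To illustrate the rule above, here is a toy model, not kernel code. It
assumes the usual two-touch behavior of mark_page_accessed(): the first
call sets the referenced bit, and the second call activates the page.
Page, read() and write() are made-up names for this sketch only.

```python
from dataclasses import dataclass


@dataclass
class Page:
    referenced: bool = False  # stands in for PageReferenced
    active: bool = False      # stands in for PageActive


def mark_page_accessed(page: Page) -> None:
    """Two-touch promotion: first call marks, second call activates."""
    if not page.active and page.referenced:
        page.active = True
        page.referenced = False
    elif not page.referenced:
        page.referenced = True


def write(page: Page) -> None:
    """With the patch applied, a write is not counted as an access."""
    return None


def read(page: Page) -> None:
    """A read counts as an access."""
    mark_page_accessed(page)
```

In this model a page that is written any number of times stays
inactive, a single read after the write only sets the referenced bit,
and the second read is what activates the page.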

What complicates that a little bit is that when the multiple reads do
happen on write-instantiated pages, the pages might have already been
aged somewhat in between, whereas fresh-faulting reads start counting
accesses from the head of the LRU right away. If both have re-use
distances shorter than memory, the LRU offset of pages instantiated by
writes could push the second access past eviction.

In that case, they would likely get picked up by refault detection and
promoted after all. So it would be one more IO, but nothing permanent.
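
A back-of-the-envelope sketch of that LRU offset, under simplifying
assumptions: distances are counted in pages added to the head of the
inactive list, the first read only sets the referenced bit and does not
move the page, and dirty-page handling and refault detection are left
out. The function name and parameters are hypothetical.

```python
from typing import Optional


def second_read_hits(inactive_size: int,
                     read_to_read: int,
                     write_to_read: Optional[int] = None) -> bool:
    """Return True if the page is still resident at its second read.

    write_to_read=None means the page was instantiated by the first read.
    """
    # A read-faulted page starts aging at its first read. A page that was
    # instantiated by a write has already aged by the write-to-read
    # distance when that first read arrives.
    head_start = 0 if write_to_read is None else write_to_read
    age_at_second_read = head_start + read_to_read
    return age_at_second_read < inactive_size
```

With a 100-page inactive list and reads 60 pages apart, the read-faulted
page is still resident at its second read and gets promoted. A page
written 60 pages before its first read has aged 120 pages by the second
read, so it is already evicted and has to come back through a refault.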

This is also somewhat compensated by the dirty cache delaying reclaim
and giving these pages another round-trip anyway - unless dirty limits
cause the pages to be written back before they reach the LRU tail.

It's really hard to tell whether that would even be an issue, since it
depends on whether a workload matching those parameters even exists. A
synthetic test doesn't really tell us much about that. I think all we
can do here is decide whether the cache semantics make logical sense.

One thing I proposed in the thread that would compensate for the LRU
offset of write-instantiated pages would be to set PageReferenced on
these pages but never call mark_page_accessed() from the write. This
wouldn't be perfect because the distance between write and read does
not necessarily predict the distance between the subsequent reads, but
it would mean that the first read would promote the pages, whereas
repeatedly written files would never be activated or refault-activated.
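
A toy model of that proposal, again not kernel code: the write sets the
referenced bit directly and never goes through mark_page_accessed(),
which is assumed to follow the usual two-touch rule. Page,
write_proposed() and read() are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Page:
    referenced: bool = False  # stands in for PageReferenced
    active: bool = False      # stands in for PageActive


def mark_page_accessed(page: Page) -> None:
    """Two-touch promotion: first call marks, second call activates."""
    if not page.active and page.referenced:
        page.active = True
        page.referenced = False
    elif not page.referenced:
        page.referenced = True


def write_proposed(page: Page) -> None:
    """Proposed write path: set the referenced bit, nothing more."""
    page.referenced = True


def read(page: Page) -> None:
    """A read counts as an access."""
    mark_page_accessed(page)
```

Here any number of writes leaves the page inactive with the referenced
bit set, and the first read is enough to activate it.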

Would that make sense? Is there something I'm missing?

> Did you consider providing userspace with a way to hint "this file is
> probably write-then-not-read"?

Yes, but I'm not too confident in that working out :(

