public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	Chris Friesen <chris.friesen@genband.com>
Subject: Re: [PATCH v2 2/2] msync: start async writeout when MS_ASYNC
Date: Fri, 22 Jun 2012 14:26:27 -0700	[thread overview]
Message-ID: <20120622142627.b6184eda.akpm@linux-foundation.org> (raw)
In-Reply-To: <1339773179-31210-3-git-send-email-pbonzini@redhat.com>

On Fri, 15 Jun 2012 17:12:59 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> msync.c says that applications had better use fsync() or fadvise(FADV_DONTNEED)
> instead of MS_ASYNC.  Both advices are really bad:
> 
> * fsync() can be a replacement for MS_SYNC, not for MS_ASYNC;
> 
> * fadvise(FADV_DONTNEED) invalidates the pages completely, which will make
>   later accesses expensive.
> 
> Even sync_file_range would not be a replacement, because the writeout is
> done synchronously and can block for an extended period of time.

This is just wrong.  sync_file_range() is, within limits, asynchronous
when SYNC_FILE_RANGE_WAIT_* are not used.

> Having the possibility to schedule a writeback immediately is an advantage
> for the applications.

Having this forced upon them is also a disadvantage.  The syscall will
now take longer, consuming more CPU: starting all that IO will add
latency.  It also moves work away from the flusher threads and into the
calling process thus increasing overall runtime and reducing SMP
utilisation.

And as bdi_wrte_congested() is a best-effort, sometime-gets-it-wrong
thing, the patch will introduce quite rare but very long delays where
msync(MS_ASYNC) waits on IO.

>  They can do the same thing that fadvise does,
> but without the invalidation part.  The implementation is also similar
> to fadvise, but with tag-and-write enabled.
> 
> One example is if you are implementing a persistent dirty bitmap.
> Whenever you set bits to 1 you need to synchronize it with MS_SYNC, so
> that dirtiness is reported properly after a host crash.  If you have set
> any bits to 0, getting them to disk is not needed for correctness, but
> it is still desirable to save some work after a host crash.  You could
> simply use MS_SYNC in a separate thread, but MS_ASYNC provides exactly
> the desired semantics and is easily done in the kernel.

This is already the case.  The current msync(MS_ASYNC) will mark the
pages for writeout within a dirty_expire_centisecs period (default 30
seconds).  This has always been why we consider the current MS_ASYNC
implementation to be standards-compliant.

If you think that some applications will *benefit* from having that 30
seconds changed to zero seconds under their feet then please describe
the reasoning.

> If the application does not want to start I/O, it can simply call msync
> with flags equal to MS_INVALIDATE.  This one remains a no-op, as it should
> be on a reasonable implementation.

Using MS_INVALIDATE is a bit of a hack.


I'm just not seeing it, sorry.  The change has risks and downsides and
forces the application to do things which it could already have done,
had it so chosen.


  reply	other threads:[~2012-06-22 21:26 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-15 15:12 [PATCH v2 0/2] msync improvements Paolo Bonzini
2012-06-15 15:12 ` [PATCH v2 1/2] msync: support syncing a small part of the file Paolo Bonzini
2012-06-22 21:12   ` Andrew Morton
2012-07-02  8:14     ` Paolo Bonzini
2012-06-15 15:12 ` [PATCH v2 2/2] msync: start async writeout when MS_ASYNC Paolo Bonzini
2012-06-22 21:26   ` Andrew Morton [this message]
2012-07-02  8:15     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120622142627.b6184eda.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=chris.friesen@genband.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox