public inbox for linux-kernel@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Chad Talbott <ctalbott@google.com>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@google.com>,
	Michael Rubin <mrubin@google.com>
Subject: Re: Metadata in sys_sync_file_range and fadvise(DONTNEED)
Date: Mon, 3 Nov 2008 09:45:13 +1100
Message-ID: <20081102224513.GH19509@disturbed>
In-Reply-To: <1786ab030810311354h1a7c8fb0q1267969d432f521c@mail.gmail.com>

On Fri, Oct 31, 2008 at 01:54:14PM -0700, Chad Talbott wrote:
> We are looking at adding calls to posix_fadvise(DONTNEED) to various
> data logging routines.  This has two benefits:
> 
>   - frequent write-out -> shorter queues give lower latency, also disk
>     is more utilized as writeout begins immediately
> 
>   - less useless stuff in page cache
> 
> One problem with fadvise() (and ext2, at least) is that associated
> metadata isn't scheduled with the data. So, for a large log file with
> a high append rate, hundreds of indirect blocks are left to be written
> out by periodic writeback.  This metadata consists of single blocks
> spaced by 4MB, leading to spikes of very inefficient disk utilization,
> deep queues and high latency.

Sounds like a filesystem bug to me, not a problem with
posix_fadvise(DONTNEED).
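A minimal sketch of the logging pattern described in the quoted text. It assumes Linux with glibc, a single writer, and an fd opened with O_APPEND. The helper name log_append_dontneed() is hypothetical. On Linux, POSIX_FADV_DONTNEED starts asynchronous writeback of dirty pages in the given range and drops pages that are already clean. Pages that are still dirty or under writeback at that moment may stay cached. The advice covers only the file's data range, so the indirect blocks that those appends allocate are not part of it and are left to periodic writeback, which is the behaviour being reported.

```c
#define _GNU_SOURCE          /* exposes posix_fadvise() in glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append one record to a log fd opened with O_APPEND,
 * then advise the kernel that the bytes just written won't be read again.
 * Assumes a single writer, so the record ends at the current file offset. */
int log_append_dontneed(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted before writing: retry */
            return -errno;
        }
        p += n;
        left -= (size_t)n;
    }

    /* With O_APPEND the offset after write() is the end of the record. */
    off_t end = lseek(fd, 0, SEEK_CUR);
    if (end < 0)
        return -errno;

    /* Data range only: metadata such as indirect blocks is not covered.
     * posix_fadvise() returns an error number instead of setting errno. */
    int rc = posix_fadvise(fd, end - (off_t)len, (off_t)len,
                           POSIX_FADV_DONTNEED);
    return rc == 0 ? 0 : -rc;
}
```

Calling it once per record keeps each advised range small, which is what gives the "writeout begins immediately" effect the quoted text is after.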

> Andrew suggests a new SYNC_FILE_RANGE_METADATA flag for
> sys_sync_file_range(), and leaving posix_fadvise() alone.

What is the interface that a filesystem will see? No filesystem has
a "metadata sync" method - is this going to fall through to some new
convoluted combination of writeback flags to an inode/mapping
that more filesystems than not can get wrong?
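For reference, here is a sketch of what a sys_sync_file_range() caller can ask for today. It assumes Linux with glibc, which exposes the call as sync_file_range() under _GNU_SOURCE. The helper name pwrite_and_flush() is hypothetical. The proposed SYNC_FILE_RANGE_METADATA flag is deliberately absent because it exists only as a suggestion in this thread. The three flags used are the ones documented in the sync_file_range(2) man page, which also states that none of these operations writes out the file's metadata.

```c
#define _GNU_SOURCE          /* sync_file_range() is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes at off, then start writeback of
 * exactly that range and wait for it, using only the flags that
 * sync_file_range() actually defines. */
int pwrite_and_flush(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t n = pwrite(fd, buf, len, off);
    if (n < 0)
        return -errno;
    if ((size_t)n != len)
        return -EIO;         /* short write: the sketch just reports it */

    unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |  /* wait for writeback already in flight */
                         SYNC_FILE_RANGE_WRITE |        /* start writeback of dirty pages */
                         SYNC_FILE_RANGE_WAIT_AFTER;    /* wait for that writeback to finish */

    /* Data pages only: no inode, no indirect blocks, no drive cache flush. */
    if (sync_file_range(fd, off, (off_t)len, flags) < 0)
        return -errno;
    return 0;
}
```

Whatever a metadata flag would add has to be expressed through this same call, which is why the question of what the filesystem actually sees matters.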

FWIW, sys_sync_file_range() is fundamentally broken for data
integrity writeback - at no time does it call a filesystem method
that can result in a barrier I/O being issued to disk after
writeback is complete. So, unlike fsync() or fdatasync(), the data
can still be lost after completion due to power failure on drives
with volatile write caches....
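The sketch below shows the data-integrity path for contrast. It assumes a POSIX system, and durable_pwrite() is a hypothetical helper. fdatasync() goes through the filesystem's fsync method. That method is where a filesystem can write out the metadata needed to read the data back. Depending on the filesystem and its mount options, it can also issue a cache flush or barrier to the drive. The sync_file_range() path never gives the filesystem that opportunity.

```c
#define _GNU_SOURCE          /* make pwrite()/fdatasync() visible under strict -std modes */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write a record at off and only report success once
 * fdatasync() has returned, i.e. after the filesystem's fsync method ran. */
int durable_pwrite(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;    /* interrupted before writing: retry */
            return -errno;
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }

    /* Writes data plus any metadata needed to retrieve it; whether a
     * drive cache flush follows is up to the filesystem. */
    if (fdatasync(fd) < 0)
        return -errno;
    return 0;
}
```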

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 8+ messages
2008-10-31 20:54 Metadata in sys_sync_file_range and fadvise(DONTNEED) Chad Talbott
2008-11-01  9:21 ` Andrew Morton
2008-11-06  0:56   ` Chad Talbott
2008-11-06  1:07     ` Andrew Morton
2008-11-06  1:27       ` Chad Talbott
2008-11-02 22:45 ` Dave Chinner [this message]
2008-11-06  1:19   ` Chad Talbott
2008-11-06  4:20     ` Dave Chinner