public inbox for linux-kernel@vger.kernel.org
* Metadata in sys_sync_file_range and fadvise(DONTNEED)
@ 2008-10-31 20:54 Chad Talbott
  2008-11-01  9:21 ` Andrew Morton
  2008-11-02 22:45 ` Dave Chinner
  0 siblings, 2 replies; 8+ messages in thread
From: Chad Talbott @ 2008-10-31 20:54 UTC
  To: linux-kernel; +Cc: Andrew Morton, Michael Rubin

We are looking at adding calls to posix_fadvise(DONTNEED) to various
data logging routines.  This has two benefits:

  - frequent write-out -> shorter queues and lower latency; the disk is
    also better utilized because write-out begins immediately

  - less useless stuff in page cache
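
A minimal sketch of what such a logging routine might look like, assuming
an append-only log on a regular file.  The function name
log_append_and_drop() is hypothetical and not taken from any existing
code; it relies on the behavior described above, where
posix_fadvise(DONTNEED) starts write-out of the dirty pages in the range.
Pages that are still dirty when the call returns stay cached until a
later call covers them.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append one record to the log, then advise the
 * kernel that the just-written range is no longer needed in page cache.
 * Returns 0 on success, -1 on failure.
 */
int log_append_and_drop(int fd, const void *buf, size_t len)
{
    /* Remember where this record starts so we can advise on its range. */
    off_t start = lseek(fd, 0, SEEK_END);
    if (start < 0)
        return -1;

    /* Handle short writes by looping until the whole record is out. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0)
            return -1;
        p += n;
        left -= (size_t)n;
    }

    /*
     * posix_fadvise() returns an error number directly (it does not
     * set errno), so any nonzero value is a failure.
     */
    if (posix_fadvise(fd, start, (off_t)len, POSIX_FADV_DONTNEED) != 0)
        return -1;

    return 0;
}
```

A caller would open the log once and call log_append_and_drop(fd, rec,
rec_len) per record, or per batch of records to amortize the advice call.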

One problem with fadvise() (on ext2, at least) is that the associated
metadata isn't scheduled for write-out along with the data.  So, for a
large log file with
a high append rate, hundreds of indirect blocks are left to be written
out by periodic writeback.  This metadata consists of single blocks
spaced by 4MB, leading to spikes of very inefficient disk utilization,
deep queues and high latency.
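
The 4MB spacing can be reproduced with a little arithmetic.  The sketch
below assumes a 4 KiB block size (the message does not state one) and
ext2's 4-byte block pointers and 12 direct blocks per inode; the function
names are made up for illustration.  The count ignores the handful of
double- and triple-indirect blocks that sit above the leaf indirect
blocks.

```c
#include <stdint.h>

#define BLOCK_PTR_BYTES 4u   /* ext2 stores 32-bit block numbers */
#define DIRECT_BLOCKS   12u  /* direct block pointers in an ext2 inode */

/* Bytes of file data mapped by one leaf indirect block. */
uint64_t data_bytes_per_indirect_block(uint64_t block_size)
{
    uint64_t ptrs_per_block = block_size / BLOCK_PTR_BYTES;
    return ptrs_per_block * block_size;
}

/*
 * Approximate number of leaf indirect blocks needed to map a file of
 * file_size bytes.  The first DIRECT_BLOCKS blocks need none.
 */
uint64_t approx_leaf_indirect_blocks(uint64_t file_size, uint64_t block_size)
{
    uint64_t direct_bytes = (uint64_t)DIRECT_BLOCKS * block_size;
    if (file_size <= direct_bytes)
        return 0;

    uint64_t span = data_bytes_per_indirect_block(block_size);
    return (file_size - direct_bytes + span - 1) / span;  /* round up */
}
```

Under those assumptions one indirect block maps 1024 * 4096 bytes = 4 MiB
of data, and a 1 GiB log needs about 256 of them, which is consistent
with "hundreds of indirect blocks" for a large file.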

Andrew suggests a new SYNC_FILE_RANGE_METADATA flag for
sys_sync_file_range(), and leaving posix_fadvise() alone.  That will
work for my purposes, but it seems like it leaves
posix_fadvise(DONTNEED) with a performance bug on ext2 (or any other
filesystem with interleaved data/metadata).  Andrew's argument is that
people have expectations about posix_fadvise() behavior as it's been
around for years in Linux.

I'd like to get a consensus on what The Right Thing is, so I can move
toward implementing it and moving the logging code onto that
interface.

Chad



Thread overview: 8+ messages
2008-10-31 20:54 Metadata in sys_sync_file_range and fadvise(DONTNEED) Chad Talbott
2008-11-01  9:21 ` Andrew Morton
2008-11-06  0:56   ` Chad Talbott
2008-11-06  1:07     ` Andrew Morton
2008-11-06  1:27       ` Chad Talbott
2008-11-02 22:45 ` Dave Chinner
2008-11-06  1:19   ` Chad Talbott
2008-11-06  4:20     ` Dave Chinner
