From: Jan Kara <jack@suse.cz>
To: Andres Freund <andres@anarazel.de>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: Triggering non-integrity writeback from userspace
Date: Sat, 24 Oct 2015 21:09:56 +0200
Message-ID: <20151024190956.GA17642@quack.suse.cz>
In-Reply-To: <20151022131555.GC4378@alap3.anarazel.de>

Hi,

On Thu 22-10-15 15:15:55, Andres Freund wrote:
> postgres regularly has to checkpoint data to disk to be able to free
> data from its journal. We currently use buffered IO and that's not
> going to change short term.
>
> In a busy database this checkpointing process can write out a lot of
> data. Currently that frequently leads to massive latency spikes
> (cf. 20140326191113.GF9066@alap3.anarazel.de) for other processes doing
> IO. These happen either when the kernel starts writeback or when, at the
> end of the checkpoint, we issue an fsync() on the datafiles.
>
> One odd issue there is that the kernel tends to do writeback in a very
> irregular manner. Even if we write data at a constant rate, writeback
> very often happens in bulk - not a good idea for preserving
> interactivity.
>
> What we're preparing to do now is to regularly issue
> sync_file_range(SYNC_FILE_RANGE_WRITE) on a few blocks shortly after
> we've written them to the OS. That way there's not too much dirty
> data in the page cache, so writeback won't cause latency spikes, and the
> fsync at the end doesn't have to write much if anything.
>
> That improves things a lot.
>
> But I still see latency spikes that shouldn't be there given the amount
> of IO. I'm wondering if that is related to the fact that
> SYNC_FILE_RANGE_WRITE ends up doing __filemap_fdatawrite_range with
> WB_SYNC_ALL specified. Given the documentation for
> SYNC_FILE_RANGE_WRITE I did not expect that:
> * SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which
> * are not presently under writeout. This is an asynchronous flush-to-disk
> * operation. Not suitable for data integrity operations.
>
> If I followed the code correctly - not a sure thing at all - that means
> bios are submitted with WRITE_SYNC specified. Not really what's needed
> in this case.
>
> Now I think the docs are somewhat clear that SYNC_FILE_RANGE_WRITE isn't
> there for data integrity, but it might be that people rely on it
> nonetheless, so I'm loath to suggest changing that. But I do wonder if
> there's a way non-integrity writeback triggering could be exposed to
> userspace. A new fadvise flag seems like a good way to do that -
> POSIX_FADV_DONTNEED actually does non-integrity writeback, but also does
> other things, so it's not suitable for us.

You are absolutely correct that sync_file_range() should issue writeback as
WB_SYNC_NONE and not wait for current writeback in progress. That was an
oversight introduced by commit ee53a891f474 (mm: do_sync_mapping_range
integrity fix) which changed do_sync_mapping_range() to use WB_SYNC_ALL
because it had other users which relied on WB_SYNC_ALL semantics. Later that
got copied over to the current sync_file_range() implementation.

I think we should just revert to the very explicitly documented behavior
of sync_file_range(). I'll send a patch for that. Thanks for the report.
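
For reference, the userspace pattern described in the quoted mail (write a
few blocks, then ask for writeback of just those blocks) could look roughly
like the sketch below. This is only an illustration and not PostgreSQL's
actual code: the function name write_with_writeback_hint() and the
FLUSH_CHUNK_BYTES constant are hypothetical, and the chunk size is an
arbitrary assumption. Only pwrite(2) and sync_file_range(2) are real
interfaces here.

```c
#define _GNU_SOURCE            /* sync_file_range() is a Linux/glibc extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical knob: bytes written between writeback hints (assumed value). */
#define FLUSH_CHUNK_BYTES ((size_t)256 * 1024)

/*
 * Write len bytes from buf to fd starting at file offset off. After each
 * chunk, ask the kernel to start writeout of that chunk without waiting for
 * it to finish. The hint gives no durability guarantee, so a failure of the
 * hint is ignored here; the caller still needs fsync() at the end.
 * Returns 0 on success, -1 on a write error (errno is set by pwrite).
 */
int write_with_writeback_hint(int fd, const char *buf, size_t len, off_t off)
{
    size_t done = 0;

    assert(buf != NULL || len == 0);

    while (done < len) {
        size_t chunk = len - done;
        if (chunk > FLUSH_CHUNK_BYTES)
            chunk = FLUSH_CHUNK_BYTES;

        off_t chunk_start = off + (off_t)done;
        size_t written = 0;

        /* pwrite() may write less than requested, so loop until done. */
        while (written < chunk) {
            ssize_t n = pwrite(fd, buf + done + written, chunk - written,
                               chunk_start + (off_t)written);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            written += (size_t)n;
        }

        /* Start writeout of the pages just dirtied; do not wait for it. */
        (void)sync_file_range(fd, chunk_start, (off_t)chunk,
                              SYNC_FILE_RANGE_WRITE);

        done += chunk;
    }
    return 0;
}
```

A caller would still issue fsync(fd) once all data has been written. The
idea is that by then little dirty data should be left for the range, so the
fsync has much less to write out.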
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR