From: Dave Chinner <david@fromorbit.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Theodore Ts'o <tytso@mit.edu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Dan Williams <dan.j.williams@intel.com>,
Ingo Molnar <mingo@redhat.com>, Jan Kara <jack@suse.com>,
Jeff Layton <jlayton@poochiereds.net>,
Matthew Wilcox <willy@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-nvdimm@ml01.01.org, x86@kernel.org,
xfs@oss.sgi.com, Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <matthew.r.wilcox@intel.com>
Subject: Re: [RFC 00/11] DAX fsynx/msync support
Date: Tue, 3 Nov 2015 07:10:29 +1100 [thread overview]
Message-ID: <20151102201029.GI10656@dastard> (raw)
In-Reply-To: <x49vb9kqy5k.fsf@segfault.boston.devel.redhat.com>
On Mon, Nov 02, 2015 at 09:22:15AM -0500, Jeff Moyer wrote:
> Dave Chinner <david@fromorbit.com> writes:
>
> > Further, REQ_FLUSH/REQ_FUA are more than just "put the data on stable
> > storage" commands. They are also IO barriers that affect scheduling
> > of IOs in progress and in the request queues. A REQ_FLUSH/REQ_FUA
> > IO cannot be dispatched before all prior IO has been dispatched and
> > drained from the request queue, and IO submitted after a queued
> > REQ_FLUSH/REQ_FUA cannot be scheduled ahead of the queued
> > REQ_FLUSH/REQ_FUA operation.
> >
> > IOWs, REQ_FUA/REQ_FLUSH not only guarantee data is on stable
> > storage, they also guarantee the order of IO dispatch and
> > completion when concurrent IO is in progress.
>
> This hasn't been the case for several years, now. It used to work that
> way, and that was deemed a big performance problem. Since file systems
> already issued and waited for all I/O before sending down a barrier, we
> decided to get rid of the I/O ordering pieces of barriers (and stop
> calling them barriers).
>
> See commit 28e7d184521 (block: drop barrier ordering by queue draining).
Yes, I realise that, even if I wasn't very clear about how I wrote
it. ;)
Correct me if I'm wrong: AFAIA, dispatch ordering (i.e. the "IO
barrier") is still enforced by the scheduler via REQ_FUA|REQ_FLUSH
-> ELEVATOR_INSERT_FLUSH -> REQ_SOFTBARRIER and subsequent IO
scheduler calls to elv_dispatch_sort() that don't pass
REQ_SOFTBARRIER in the queue.
IOWs, if we queue a bunch of REQ_WRITE IOs followed by a
REQ_WRITE|REQ_FLUSH IO, all of the prior REQ_WRITE IOs will be
dispatched before the REQ_WRITE|REQ_FLUSH IO and hence be captured
by the cache flush.
Hence once the filesystem has waited on the REQ_WRITE|REQ_FLUSH IO
to complete, we know that all the earlier REQ_WRITE IOs are on
stable storage, too. Hence there's no need for the elevator to drain
the queue to guarantee completion ordering - the dispatch ordering
and flush/fua write semantics guarantee that when the flush/fua
completes, all the IOs dispatch prior to that flush/fua write are
also on stable storage...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Jeff Layton <jlayton@poochiereds.net>,
x86@kernel.org, Theodore Ts'o <tytso@mit.edu>,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
linux-nvdimm@ml01.01.org, Jan Kara <jack@suse.com>,
linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
"J. Bruce Fields" <bfields@fieldses.org>,
linux-mm@kvack.org, Ingo Molnar <mingo@redhat.com>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Alexander Viro <viro@zeniv.linux.org.uk>,
"H. Peter Anvin" <hpa@zytor.com>,
linux-fsdevel@vger.kernel.org,
Matthew Wilcox <willy@linux.intel.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
linux-ext4@vger.kernel.org,
Dan Williams <dan.j.williams@intel.com>,
Matthew Wilcox <matthew.r.wilcox@intel.com>
Subject: Re: [RFC 00/11] DAX fsynx/msync support
Date: Tue, 3 Nov 2015 07:10:29 +1100 [thread overview]
Message-ID: <20151102201029.GI10656@dastard> (raw)
In-Reply-To: <x49vb9kqy5k.fsf@segfault.boston.devel.redhat.com>
On Mon, Nov 02, 2015 at 09:22:15AM -0500, Jeff Moyer wrote:
> Dave Chinner <david@fromorbit.com> writes:
>
> > Further, REQ_FLUSH/REQ_FUA are more than just "put the data on stable
> > storage" commands. They are also IO barriers that affect scheduling
> > of IOs in progress and in the request queues. A REQ_FLUSH/REQ_FUA
> > IO cannot be dispatched before all prior IO has been dispatched and
> > drained from the request queue, and IO submitted after a queued
> > REQ_FLUSH/REQ_FUA cannot be scheduled ahead of the queued
> > REQ_FLUSH/REQ_FUA operation.
> >
> > IOWs, REQ_FUA/REQ_FLUSH not only guarantee data is on stable
> > storage, they also guarantee the order of IO dispatch and
> > completion when concurrent IO is in progress.
>
> This hasn't been the case for several years, now. It used to work that
> way, and that was deemed a big performance problem. Since file systems
> already issued and waited for all I/O before sending down a barrier, we
> decided to get rid of the I/O ordering pieces of barriers (and stop
> calling them barriers).
>
> See commit 28e7d184521 (block: drop barrier ordering by queue draining).
Yes, I realise that, even if I wasn't very clear about how I wrote
it. ;)
Correct me if I'm wrong: AFAIA, dispatch ordering (i.e. the "IO
barrier") is still enforced by the scheduler via REQ_FUA|REQ_FLUSH
-> ELEVATOR_INSERT_FLUSH -> REQ_SOFTBARRIER and subsequent IO
scheduler calls to elv_dispatch_sort() that don't pass
REQ_SOFTBARRIER in the queue.
IOWs, if we queue a bunch of REQ_WRITE IOs followed by a
REQ_WRITE|REQ_FLUSH IO, all of the prior REQ_WRITE IOs will be
dispatched before the REQ_WRITE|REQ_FLUSH IO and hence be captured
by the cache flush.
Hence once the filesystem has waited on the REQ_WRITE|REQ_FLUSH IO
to complete, we know that all the earlier REQ_WRITE IOs are on
stable storage, too. Hence there's no need for the elevator to drain
the queue to guarantee completion ordering - the dispatch ordering
and flush/fua write semantics guarantee that when the flush/fua
completes, all the IOs dispatch prior to that flush/fua write are
also on stable storage...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
"Theodore Ts'o" <tytso@mit.edu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Dan Williams <dan.j.williams@intel.com>,
Ingo Molnar <mingo@redhat.com>, Jan Kara <jack@suse.com>,
Jeff Layton <jlayton@poochiereds.net>,
Matthew Wilcox <willy@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-nvdimm@ml01.01.org, x86@kernel.org,
xfs@oss.sgi.com, Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <matthew.r.wilcox@intel.com>
Subject: Re: [RFC 00/11] DAX fsynx/msync support
Date: Tue, 3 Nov 2015 07:10:29 +1100 [thread overview]
Message-ID: <20151102201029.GI10656@dastard> (raw)
In-Reply-To: <x49vb9kqy5k.fsf@segfault.boston.devel.redhat.com>
On Mon, Nov 02, 2015 at 09:22:15AM -0500, Jeff Moyer wrote:
> Dave Chinner <david@fromorbit.com> writes:
>
> > Further, REQ_FLUSH/REQ_FUA are more than just "put the data on stable
> > storage" commands. They are also IO barriers that affect scheduling
> > of IOs in progress and in the request queues. A REQ_FLUSH/REQ_FUA
> > IO cannot be dispatched before all prior IO has been dispatched and
> > drained from the request queue, and IO submitted after a queued
> > REQ_FLUSH/REQ_FUA cannot be scheduled ahead of the queued
> > REQ_FLUSH/REQ_FUA operation.
> >
> > IOWs, REQ_FUA/REQ_FLUSH not only guarantee data is on stable
> > storage, they also guarantee the order of IO dispatch and
> > completion when concurrent IO is in progress.
>
> This hasn't been the case for several years, now. It used to work that
> way, and that was deemed a big performance problem. Since file systems
> already issued and waited for all I/O before sending down a barrier, we
> decided to get rid of the I/O ordering pieces of barriers (and stop
> calling them barriers).
>
> See commit 28e7d184521 (block: drop barrier ordering by queue draining).
Yes, I realise that, even if I wasn't very clear about how I wrote
it. ;)
Correct me if I'm wrong: AFAIA, dispatch ordering (i.e. the "IO
barrier") is still enforced by the scheduler via REQ_FUA|REQ_FLUSH
-> ELEVATOR_INSERT_FLUSH -> REQ_SOFTBARRIER and subsequent IO
scheduler calls to elv_dispatch_sort() that don't pass
REQ_SOFTBARRIER in the queue.
IOWs, if we queue a bunch of REQ_WRITE IOs followed by a
REQ_WRITE|REQ_FLUSH IO, all of the prior REQ_WRITE IOs will be
dispatched before the REQ_WRITE|REQ_FLUSH IO and hence be captured
by the cache flush.
Hence once the filesystem has waited on the REQ_WRITE|REQ_FLUSH IO
to complete, we know that all the earlier REQ_WRITE IOs are on
stable storage, too. Hence there's no need for the elevator to drain
the queue to guarantee completion ordering - the dispatch ordering
and flush/fua write semantics guarantee that when the flush/fua
completes, all the IOs dispatch prior to that flush/fua write are
also on stable storage...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2015-11-02 20:10 UTC|newest]
Thread overview: 86+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-29 20:12 [RFC 00/11] DAX fsynx/msync support Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 01/11] pmem: add wb_cache_pmem() to the PMEM API Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 02/11] mm: add pmd_mkclean() Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 03/11] pmem: enable REQ_FLUSH handling Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 04/11] dax: support dirty DAX entries in radix tree Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 05/11] mm: add follow_pte_pmd() Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 06/11] mm: add pgoff_mkclean() Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 07/11] mm: add find_get_entries_tag() Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 08/11] fs: add get_block() to struct inode_operations Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 09/11] dax: add support for fsync/sync Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 10/11] xfs, ext2: call dax_pfn_mkwrite() on write fault Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` [RFC 11/11] ext4: add ext4_dax_pfn_mkwrite() Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 20:12 ` Ross Zwisler
2015-10-29 22:49 ` [RFC 00/11] DAX fsynx/msync support Ross Zwisler
2015-10-29 22:49 ` Ross Zwisler
2015-10-29 22:49 ` Ross Zwisler
2015-10-29 22:49 ` Ross Zwisler
2015-10-30 3:55 ` Dave Chinner
2015-10-30 3:55 ` Dave Chinner
2015-10-30 3:55 ` Dave Chinner
2015-10-30 18:39 ` Ross Zwisler
2015-10-30 18:39 ` Ross Zwisler
2015-10-30 18:39 ` Ross Zwisler
2015-11-01 23:29 ` Dave Chinner
2015-11-01 23:29 ` Dave Chinner
2015-11-01 23:29 ` Dave Chinner
2015-11-02 14:22 ` Jeff Moyer
2015-11-02 14:22 ` Jeff Moyer
2015-11-02 14:22 ` Jeff Moyer
2015-11-02 20:10 ` Dave Chinner [this message]
2015-11-02 20:10 ` Dave Chinner
2015-11-02 20:10 ` Dave Chinner
2015-11-02 21:02 ` Jeff Moyer
2015-11-02 21:02 ` Jeff Moyer
2015-11-02 21:02 ` Jeff Moyer
2015-11-04 18:34 ` Jeff Moyer
2015-11-04 18:34 ` Jeff Moyer
2015-11-04 18:34 ` Jeff Moyer
2015-11-05 8:33 ` Dave Chinner
2015-11-05 8:33 ` Dave Chinner
2015-11-05 8:33 ` Dave Chinner
2015-11-05 19:49 ` Jeff Moyer
2015-11-05 19:49 ` Jeff Moyer
2015-11-05 19:49 ` Jeff Moyer
2015-11-05 20:54 ` Jens Axboe
2015-11-05 20:54 ` Jens Axboe
2015-11-05 20:54 ` Jens Axboe
2015-10-30 18:34 ` Dan Williams
2015-10-30 18:34 ` Dan Williams
2015-10-30 18:34 ` Dan Williams
2015-10-30 19:43 ` Ross Zwisler
2015-10-30 19:43 ` Ross Zwisler
2015-10-30 19:43 ` Ross Zwisler
2015-10-30 19:51 ` Dan Williams
2015-10-30 19:51 ` Dan Williams
2015-10-30 19:51 ` Dan Williams
2015-10-30 19:51 ` Dan Williams
2015-11-01 23:36 ` Dave Chinner
2015-11-01 23:36 ` Dave Chinner
2015-11-01 23:36 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151102201029.GI10656@dastard \
--to=david@fromorbit.com \
--cc=adilger.kernel@dilger.ca \
--cc=akpm@linux-foundation.org \
--cc=bfields@fieldses.org \
--cc=dan.j.williams@intel.com \
--cc=hpa@zytor.com \
--cc=jack@suse.com \
--cc=jlayton@poochiereds.net \
--cc=jmoyer@redhat.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nvdimm@ml01.01.org \
--cc=matthew.r.wilcox@intel.com \
--cc=mingo@redhat.com \
--cc=ross.zwisler@linux.intel.com \
--cc=tglx@linutronix.de \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@linux.intel.com \
--cc=x86@kernel.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.