* Re: [PATCH 10/12] NFS: Do not serialise O_DIRECT reads and writes
[not found] ` <755A2A14-C6A9-4737-8335-0A6785490F6D@primarydata.com>
@ 2016-06-15 14:48 ` Christoph Hellwig
2016-06-15 14:52 ` Trond Myklebust
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2016-06-15 14:48 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Christoph Hellwig, linux-nfs@vger.kernel.org, xfs
On Wed, Jun 15, 2016 at 02:29:42PM +0000, Trond Myklebust wrote:
> The locking is actually simpler than XFS.
It looks way more complicated. And totally undocumented.
> We have 2 I/O modes: buffered I/O and direct I/O. The write lock is there to ensure safe transitions between those 2 modes, but once the mode is set,
> we _only_ use shared locks in order to allow parallelism.
>From reading the patch that's not what actually happens - I think you're
still taking i_rwsem exclusive for buffered writes, aren't you?
Doing that is absolutely mandatory for Posix atomicy requirements, or
you'll break tons of of applications.
But XFS allows full parallelism for direct reads and writes as long
as there is no more pagecache to flush. But if you have pages in
the pagecache you need the exclusive lock to prevent against new
pagecache pages being added.
> >The nice thing is than in 4.7-rc i_mutex has been replaced with a
> >rw_mutex so you can just use that in shared mode for direct I/O
> >as-is without needing any new lock.
>
> We would end up serialising reads and writes, since the latter grab an
> exclusive lock in generic_file_write(). Why do that if we don???t have to?
Looks at the XFS code - no serialization between direct reads and writes
as long as no buffered I/O came in inbetween.
And don't use generic_file_{read,write}_iter if you want to do direct I/O,
unfortunately locking in mm/filemap.c is totally screwed for direct I/O,
take a look at XFS which is where direct I/O came from and where we get
the locking right.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 10/12] NFS: Do not serialise O_DIRECT reads and writes
2016-06-15 14:48 ` [PATCH 10/12] NFS: Do not serialise O_DIRECT reads and writes Christoph Hellwig
@ 2016-06-15 14:52 ` Trond Myklebust
2016-06-15 14:56 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2016-06-15 14:52 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nfs@vger.kernel.org, xfs@oss.sgi.com
[-- Attachment #1.1: Type: text/plain, Size: 2240 bytes --]
On 6/15/16, 10:48, "Christoph Hellwig" <hch@infradead.org> wrote:
>On Wed, Jun 15, 2016 at 02:29:42PM +0000, Trond Myklebust wrote:
>> The locking is actually simpler than XFS.
>
>It looks way more complicated. And totally undocumented.
>
>> We have 2 I/O modes: buffered I/O and direct I/O. The write lock is there to ensure safe transitions between those 2 modes, but once the mode is set,
>> we _only_ use shared locks in order to allow parallelism.
>
>From reading the patch that's not what actually happens - I think you're
>still taking i_rwsem exclusive for buffered writes, aren't you?
>
>Doing that is absolutely mandatory for Posix atomicy requirements, or
>you'll break tons of of applications.
Yes. We continue to let the VFS manage serialisation of writes.
>But XFS allows full parallelism for direct reads and writes as long
>as there is no more pagecache to flush. But if you have pages in
>the pagecache you need the exclusive lock to prevent against new
>pagecache pages being added.
Exactly. So does this.
>> >The nice thing is than in 4.7-rc i_mutex has been replaced with a
>> >rw_mutex so you can just use that in shared mode for direct I/O
>> >as-is without needing any new lock.
>>
>> We would end up serialising reads and writes, since the latter grab an
>> exclusive lock in generic_file_write(). Why do that if we don???t have to?
>
>Looks at the XFS code - no serialization between direct reads and writes
>as long as no buffered I/O came in inbetween.
>
>And don't use generic_file_{read,write}_iter if you want to do direct I/O,
>unfortunately locking in mm/filemap.c is totally screwed for direct I/O,
>take a look at XFS which is where direct I/O came from and where we get
>the locking right.
We don’t use generic_file_* for O_DIRECT; we only use it for buffered I/O.
Disclaimer
The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful.
[-- Attachment #1.2: Type: text/html, Size: 2729 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 10/12] NFS: Do not serialise O_DIRECT reads and writes
2016-06-15 14:52 ` Trond Myklebust
@ 2016-06-15 14:56 ` Christoph Hellwig
2016-06-15 15:09 ` Trond Myklebust
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2016-06-15 14:56 UTC (permalink / raw)
To: Trond Myklebust
Cc: Christoph Hellwig, linux-nfs@vger.kernel.org, xfs@oss.sgi.com
On Wed, Jun 15, 2016 at 02:52:24PM +0000, Trond Myklebust wrote:
> >But XFS allows full parallelism for direct reads and writes as long
> >as there is no more pagecache to flush. But if you have pages in
> >the pagecache you need the exclusive lock to prevent against new
> >pagecache pages being added.
>
> Exactly. So does this.
So let's avoid bloating the inode with another rw_semaphore, and make
everyones life easier by using the existing lock.
> We don???t use generic_file_* for O_DIRECT; we only use it for buffered I/O.
I know - this was just an answer to your reference to the generic_file_*
code.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 10/12] NFS: Do not serialise O_DIRECT reads and writes
2016-06-15 14:56 ` Christoph Hellwig
@ 2016-06-15 15:09 ` Trond Myklebust
2016-06-15 15:14 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2016-06-15 15:09 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nfs@vger.kernel.org, xfs@oss.sgi.com
[-- Attachment #1.1: Type: text/plain, Size: 1383 bytes --]
On 6/15/16, 10:56, "Christoph Hellwig" <hch@infradead.org> wrote:
>On Wed, Jun 15, 2016 at 02:52:24PM +0000, Trond Myklebust wrote:
>> >But XFS allows full parallelism for direct reads and writes as long
>> >as there is no more pagecache to flush. But if you have pages in
>> >the pagecache you need the exclusive lock to prevent against new
>> >pagecache pages being added.
>>
>> Exactly. So does this.
>
>So let's avoid bloating the inode with another rw_semaphore, and make
>everyones life easier by using the existing lock.
As I said earlier, the problem with that is you end up artificially serialising buffered reads and buffered writes.
• The reads only need a shared lock in order to protect the I/O mode from flipping to O_DIRECT (and relying on page locks to protect against buffered writes).
• The writes need protection against O_DIRECT, they don’t need serialisation with other reads, but they do need to be serialised against other buffered writes.
Disclaimer
The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful.
[-- Attachment #1.2: Type: text/html, Size: 1715 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 10/12] NFS: Do not serialise O_DIRECT reads and writes
2016-06-15 15:09 ` Trond Myklebust
@ 2016-06-15 15:14 ` Christoph Hellwig
2016-06-15 15:45 ` Trond Myklebust
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2016-06-15 15:14 UTC (permalink / raw)
To: Trond Myklebust
Cc: Christoph Hellwig, linux-nfs@vger.kernel.org, xfs@oss.sgi.com
On Wed, Jun 15, 2016 at 03:09:23PM +0000, Trond Myklebust wrote:
> As I said earlier, the problem with that is you end up artificially serialising buffered reads and buffered writes.
If you actually want to be Posix compiant you need to serialize buffered
reads against buffererd writes - it's just that most Linux file systems
happen to get this wrong.
> ??? The reads only need a shared lock in order to protect the I/O mode from flipping to O_DIRECT (and relying on page locks to protect against buffered writes).
Which strictly speaking is not enough, although as said above most
Linux filesystems get this wrong. If you indeed want to keep that
(incorrect) behavior you need another lock. It's defintively not
"simpler", though.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 10/12] NFS: Do not serialise O_DIRECT reads and writes
2016-06-15 15:14 ` Christoph Hellwig
@ 2016-06-15 15:45 ` Trond Myklebust
2016-06-16 9:12 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2016-06-15 15:45 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nfs@vger.kernel.org, xfs@oss.sgi.com
[-- Attachment #1.1: Type: text/plain, Size: 1902 bytes --]
On 6/15/16, 11:14, "Christoph Hellwig" <hch@infradead.org> wrote:
>On Wed, Jun 15, 2016 at 03:09:23PM +0000, Trond Myklebust wrote:
>> As I said earlier, the problem with that is you end up artificially serialising buffered reads and buffered writes.
>
>If you actually want to be Posix compiant you need to serialize buffered
>reads against buffererd writes - it's just that most Linux file systems
>happen to get this wrong.
>
>> ??? The reads only need a shared lock in order to protect the I/O mode from flipping to O_DIRECT (and relying on page locks to protect against buffered writes).
>
>Which strictly speaking is not enough, although as said above most
>Linux filesystems get this wrong. If you indeed want to keep that
>(incorrect) behavior you need another lock. It's defintively not
>"simpler", though.
>
Serialisation is not mandatory in POSIX:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
“Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data from write() calls to subsequent read() calls. This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.”
Disclaimer
The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful.
[-- Attachment #1.2: Type: text/html, Size: 2351 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 10/12] NFS: Do not serialise O_DIRECT reads and writes
2016-06-15 15:45 ` Trond Myklebust
@ 2016-06-16 9:12 ` Christoph Hellwig
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2016-06-16 9:12 UTC (permalink / raw)
To: Trond Myklebust
Cc: Christoph Hellwig, linux-nfs@vger.kernel.org, xfs@oss.sgi.com
On Wed, Jun 15, 2016 at 03:45:37PM +0000, Trond Myklebust wrote:
> Serialisation is not mandatory in POSIX:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
>
> ???Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data from write() calls to subsequent read() calls. This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.???
That is the basic defintion, but once O_DSYNC and friends come into
play it gets more complicated:
>From http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html:
[SIO] [Option Start] If the O_DSYNC and O_RSYNC bits have been set,
read I/O operations on the file descriptor shall complete as defined
by synchronized I/O data integrity completion. If the O_SYNC and
O_RSYNC bits have been set, read I/O operations on the file descriptor
shall complete as defined by synchronized I/O file integrity completion.
[Option End]
Which directs to:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html:
3.378 Synchronized I/O Data Integrity Completion
For read, when the operation has been completed or diagnosed if
unsuccessful. The read is complete only when an image of the data
has been successfully transferred to the requesting process. If
there were any pending write requests affecting the data to be read
at the time that the synchronized read operation was requested,
these write requests are successfully transferred prior to reading
the data.
For write, when the operation has been completed or diagnosed if
unsuccessful. The write is complete only when the data specified in
the write request is successfully transferred and all file system
information required to retrieve the data is successfully
transferred.
File attributes that are not necessary for data retrieval (access
time, modification time, status change time) need not be
successfully transferred prior to returning to the calling process.
While we'll never see O_RSYNC in the kernel glibc treats it as just
O_SYNC. Either way - I'd be much happier if we could come up with
less different ways to do read/write exclusion rather than more..
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-06-16 9:12 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1465931115-30784-3-git-send-email-trond.myklebust@primarydata.com>
[not found] ` <1465931115-30784-4-git-send-email-trond.myklebust@primarydata.com>
[not found] ` <1465931115-30784-5-git-send-email-trond.myklebust@primarydata.com>
[not found] ` <1465931115-30784-6-git-send-email-trond.myklebust@primarydata.com>
[not found] ` <1465931115-30784-7-git-send-email-trond.myklebust@primarydata.com>
[not found] ` <1465931115-30784-8-git-send-email-trond.myklebust@primarydata.com>
[not found] ` <1465931115-30784-9-git-send-email-trond.myklebust@primarydata.com>
[not found] ` <1465931115-30784-10-git-send-email-trond.myklebust@primarydata.com>
[not found] ` <20160615071343.GC4318@infradead.org>
[not found] ` <755A2A14-C6A9-4737-8335-0A6785490F6D@primarydata.com>
2016-06-15 14:48 ` [PATCH 10/12] NFS: Do not serialise O_DIRECT reads and writes Christoph Hellwig
2016-06-15 14:52 ` Trond Myklebust
2016-06-15 14:56 ` Christoph Hellwig
2016-06-15 15:09 ` Trond Myklebust
2016-06-15 15:14 ` Christoph Hellwig
2016-06-15 15:45 ` Trond Myklebust
2016-06-16 9:12 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox