linux-fsdevel.vger.kernel.org archive mirror
* Is concurrent file read/write with O_DIRECT flag atomic?
@ 2017-11-12 14:59 Leo Chen
  2017-11-12 15:09 ` Liang Chen
  2017-11-13  1:09 ` Dave Chinner
From: Leo Chen @ 2017-11-12 14:59 UTC
  To: linux-fsdevel

Hi,

I apologize if this topic is not appropriate for this mailing list. I
asked the question on another channel, but haven't received any answers
yet (see
https://stackoverflow.com/questions/47245162/is-concurrent-file-read-write-with-o-direct-flag-atomic).

Basically, I have a non-sparse binary file. A writer process opens the
file with the O_DIRECT flag and keeps calling pwrite() to update the
first 128KB of the file. Meanwhile, multiple readers keep calling
pread() to read the first 128KB. The readers also open the file with
the O_DIRECT flag.

Although I could not find any documentation saying that O_DIRECT
guarantees atomicity of concurrent reads and writes, I thought that the
readers should read back consistent data. Since the data read from and
written to the file is block-aligned, I assumed that the kernel would
submit a single scatter-gather command for each pwrite() or pread(). As
far as I know from SCSI/SATA drivers, an HDD should (correct me if I'm
wrong) process each scatter-gather command atomically. Therefore, I
thought read/write operations in this scenario would be atomic.

I wrote a program to test this assumption. Surprisingly, the readers
did occasionally read back mixed data. For example, the first pwrite()
writes all 0x11, the second writes all 0x22, the third writes all
0x33, and so on. Occasionally, a reader reads back data like "0x11,
0x11, .... 0x11, 0x22, 0x22.... 0x22". The data appears to come from
two consecutive pwrite() calls. I checked the offset where the break
starts; it appears to be sector-aligned (512-byte-aligned).

Why could such a consistency issue happen? Did I miss anything in my
analysis and theory?

Thanks,
Leo

* Re: Is concurrent file read/write with O_DIRECT flag atomic?
From: Liang Chen @ 2017-11-12 15:09 UTC
  To: linux-fsdevel

On Sun, Nov 12, 2017 at 9:59 AM, Leo Chen <liangc8367@gmail.com> wrote:
>
>
> I wrote a program to test this assumption. Surprisingly, the readers
> did occasionally read back mixed data. For example, the first pwrite()
> writes all 0x11, the second writes all 0x22, the third writes all
> 0x33, and so on. Occasionally, a reader reads back data like "0x11,
> 0x11, .... 0x11, 0x22, 0x22.... 0x22". The data appears to come from
> two consecutive pwrite() calls. I checked the offset where the break
> starts; it appears to be sector-aligned (512-byte-aligned).
>
Just to clarify, the writer loop looks like:

    while (true) {
        memset(aligned_buffer, seed, 128 * 1024);
        /* rewrite the first 128KB of the file (offset 0) */
        pwrite(fd, aligned_buffer, 128 * 1024, 0);
        seed++;
    }

and the reader loop looks like:

    while (true) {
        /* read the first 128KB of the file (offset 0) */
        pread(fd, aligned_reader_buffer, 128 * 1024, 0);
        verify_aligned_reader_buffer();
    }

Thanks,
Leo

* Re: Is concurrent file read/write with O_DIRECT flag atomic?
From: Dave Chinner @ 2017-11-13  1:09 UTC
  To: Leo Chen; +Cc: linux-fsdevel

On Sun, Nov 12, 2017 at 09:59:49AM -0500, Leo Chen wrote:
> Hi,
> 
> I apologize if this topic is not appropriate for this mailing list. I
> asked the question on another channel, but haven't received any answers
> yet (see
> https://stackoverflow.com/questions/47245162/is-concurrent-file-read-write-with-o-direct-flag-atomic).
> 
> Basically, I have a non-sparse binary file. A writer process opens the
> file with the O_DIRECT flag and keeps calling pwrite() to update the
> first 128KB of the file. Meanwhile, multiple readers keep calling
> pread() to read the first 128KB. The readers also open the file with
> the O_DIRECT flag.
> 
> Although I could not find any documentation saying that O_DIRECT
> guarantees atomicity of concurrent reads and writes,

That's because there isn't any.  Concurrent O_DIRECT I/O to the same
file or block device offset gives undefined results.

> I thought that the readers should
> read back consistent data.

Only if your IO size is a single sector. O_DIRECT gives no atomicity
guarantees for IOs larger than a single sector because the
underlying storage doesn't provide atomicity guarantees for
multi-sector IOs.

IOWs, O_DIRECT delegates all responsibility for IO and cache
coherence to userspace. Your app needs to provide synchronisation of
concurrent overlapping IO because the kernel will not do it for you
with O_DIRECT.
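
One way an application could provide that synchronisation is an
advisory lock around each IO. The sketch below is an assumption, not
something taken from this thread: it uses flock() so that the writer
holds an exclusive lock during pwrite() and each reader holds a shared
lock during pread(). The helper names locked_pwrite() and
locked_pread() are hypothetical, and the file descriptors are assumed
to have been opened by the caller (with O_DIRECT and suitably aligned
buffers in the scenario above).

```c
#include <sys/file.h>   /* flock(), LOCK_SH, LOCK_EX, LOCK_UN */
#include <sys/types.h>  /* ssize_t, off_t */
#include <unistd.h>     /* pread(), pwrite() */

/*
 * Hypothetical helper: write len bytes at offset off while holding an
 * exclusive advisory lock, so no cooperating reader runs concurrently.
 * Returns the pwrite() result, or -1 if the lock could not be taken.
 */
static ssize_t locked_pwrite(int fd, const void *buf, size_t len, off_t off)
{
    if (flock(fd, LOCK_EX) != 0)
        return -1;
    ssize_t n = pwrite(fd, buf, len, off);
    flock(fd, LOCK_UN);
    return n;
}

/*
 * Hypothetical helper: read len bytes at offset off while holding a
 * shared advisory lock. Several readers may hold it at once, but not
 * while a writer holds the exclusive lock.
 * Returns the pread() result, or -1 if the lock could not be taken.
 */
static ssize_t locked_pread(int fd, void *buf, size_t len, off_t off)
{
    if (flock(fd, LOCK_SH) != 0)
        return -1;
    ssize_t n = pread(fd, buf, len, off);
    flock(fd, LOCK_UN);
    return n;
}
```

Because flock() locks are advisory, this only helps if every reader and
writer goes through these helpers. Each process also needs its own
open() of the file, since a flock() lock belongs to the open file
description and is shared by descriptors duplicated from it.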

> Since the data read from and written to the
> file is block-aligned, I assumed that the kernel would submit a single
> scatter-gather command for each pwrite() or pread().

So, you've got a RAID0 device, which means the "single read IO" is
split and sent to 8 different devices, which all race with the "single
write IO" that was also split and sent to those 8 devices....
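
To illustrate the split, here is a rough sketch of how a plain striped
layout could map a file-level IO onto member devices. Everything
specific in it is an assumption: the 16KB chunk size is chosen only so
that a 128KB IO covers all 8 devices, the mapping is simple round-robin
striping over equal-sized members, and raid0_map() and
raid0_devices_touched() are hypothetical names.

```c
#include <stdint.h>

/* Where one byte of the striped array lives on a member device. */
struct stripe_loc {
    unsigned dev;      /* index of the member device */
    uint64_t dev_off;  /* byte offset within that device */
};

/*
 * Map an array offset to (device, device offset) for round-robin
 * striping with the given chunk size (assumed layout, chunk > 0).
 */
static struct stripe_loc raid0_map(uint64_t off, uint64_t chunk, unsigned ndevs)
{
    uint64_t chunk_no = off / chunk;
    struct stripe_loc loc;
    loc.dev = (unsigned)(chunk_no % ndevs);
    loc.dev_off = (chunk_no / ndevs) * chunk + off % chunk;
    return loc;
}

/* Number of distinct member devices touched by the range [off, off + len). */
static unsigned raid0_devices_touched(uint64_t off, uint64_t len,
                                      uint64_t chunk, unsigned ndevs)
{
    if (len == 0)
        return 0;
    uint64_t first = off / chunk;
    uint64_t last = (off + len - 1) / chunk;
    uint64_t chunks = last - first + 1;
    return chunks >= ndevs ? ndevs : (unsigned)chunks;
}
```

Under these assumptions, raid0_devices_touched(0, 128 * 1024,
16 * 1024, 8) is 8: one pread() or pwrite() turns into 8 independent
device IOs, and nothing orders the read's pieces against the write's
pieces.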

> I wrote a program to test this assumption. Surprisingly, the readers
> did occasionally read back mixed data. For example, the first pwrite()
> writes all 0x11, the second writes all 0x22, the third writes all
> 0x33, and so on. Occasionally, a reader reads back data like "0x11,
> 0x11, .... 0x11, 0x22, 0x22.... 0x22". The data appears to come from
> two consecutive pwrite() calls. I checked the offset where the break
> starts; it appears to be sector-aligned (512-byte-aligned).

Yup, that's exactly where you'll see data changes (sector
boundaries) when you have concurrent IO races like this.
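
A reader-side check for this kind of tear might look like the sketch
below. It is only an illustration: the thread does not show what
verify_aligned_reader_buffer() does, so first_tear_offset() and
tear_is_sector_aligned() are hypothetical helpers, and the 512-byte
sector size is taken from the report above.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512  /* assumption: sector size from the report */

/*
 * Return the offset of the first byte that differs from buf[0], or
 * len if the whole buffer holds a single value (no tear observed).
 */
static size_t first_tear_offset(const uint8_t *buf, size_t len)
{
    for (size_t i = 1; i < len; i++) {
        if (buf[i] != buf[0])
            return i;
    }
    return len;
}

/* True if the given offset falls on a sector boundary. */
static int tear_is_sector_aligned(size_t off)
{
    return off % SECTOR_SIZE == 0;
}
```

A reader would call first_tear_offset() on each 128KB buffer it reads
and, when the result is smaller than the buffer length, log the offset
and whether it is sector-aligned.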

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com
