From: "Theodore Ts'o" <tytso@mit.edu>
To: Travis Downs <travis.downs@gmail.com>
Cc: linux-block@vger.kernel.org
Subject: Re: Semantics of racy O_DIRECT writes
Date: Thu, 9 Jan 2025 10:51:19 -0500
Message-ID: <20250109155119.GF1323402@mit.edu>
In-Reply-To: <CAOBGo4w88v0tqDiTwAPP6OQLXHGdjx1oFKaB0oRY45dmC-D1_Q@mail.gmail.com>

On Thu, Jan 09, 2025 at 11:16:41AM -0300, Travis Downs wrote:
>  - So the question is largely about the possible outcomes of doing a zero-
>    copy O_DIRECT write (where the block driver will ultimately be reading
>    directly from the pages allocated by and passed to the kernel by the
>    userspace application) in the situation where a portion of the passed
>    pages are modified in a racy way by the userspace application by a
>    subsequent O_DIRECT write.

Yeah, sorry, I thought "modified via memcpy()" meant a memcpy() into
an mmap'ed region, which would mean the data is in the page cache.  If
what you mean is one thread modifying a block while the O_DIRECT write
is underway, the answer is "it depends".  For non-Linux systems, it
will almost certainly be racy.
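
An application that cannot rule out such a race could sidestep the
question by writing from a private copy that no other thread touches.
The sketch below is only an illustration and not something proposed in
this thread; snapshot_for_dio() is a hypothetical helper name, and the
alignment value is an assumption that the caller must match to the
device's O_DIRECT requirements.  It trades the zero-copy property for
a stable buffer.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes from src into a freshly
 * allocated buffer aligned to `align` bytes, zero-padded up to a
 * multiple of `align`.  The padded length is stored in *out_len if
 * out_len is not NULL.  Returns NULL on bad arguments or allocation
 * failure.  The caller passes the copy to pwrite() on an O_DIRECT
 * file descriptor and free()s it once the write has completed.
 */
void *snapshot_for_dio(const void *src, size_t len, size_t align,
                       size_t *out_len)
{
    void *copy = NULL;
    size_t padded;

    /* posix_memalign() needs a power of two that is a multiple of
     * sizeof(void *). */
    if (src == NULL || len == 0 || align == 0 ||
        (align & (align - 1)) != 0 || align % sizeof(void *) != 0)
        return NULL;

    padded = (len + align - 1) / align * align;
    if (padded < len)               /* size_t overflow */
        return NULL;

    if (posix_memalign(&copy, align, padded) != 0)
        return NULL;

    memcpy(copy, src, len);
    memset((char *)copy + len, 0, padded - len);

    if (out_len != NULL)
        *out_len = padded;
    return copy;
}
```

Later modifications to the source buffer cannot affect the copy, so
the device only ever reads stable data, at the cost of one extra
memcpy() per write.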

For Linux, it depends on whether the block device requires stable
writes (e.g., iSCSI writes which include a checksum, SCSI devices with
DIF/DIX enabled, or some software RAID 5 block devices).  On such
devices a racy write might lead to an I/O error on the write or, in
the case of RAID 5, on a subsequent read of the block.  Linux protects
against this by marking the page read-only while the I/O is underway,
whether it's happening via buffered writeback or O_DIRECT writes, and
then marking the page read/write again afterwards.  Doing this has
performance implications, since changing the page tables and the
resulting global interprocessor interrupts are not free.  So we only
do it for those block devices that require stable writes, and even if
you are interested in a Linux-only answer, it's still "it depends".
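
To see which side of "it depends" a given disk falls on, one option is
the queue/stable_writes attribute that Linux exposes in sysfs.  The
sketch below reads it; the function names are hypothetical, and it
assumes a kernel recent enough to provide the attribute.  It only
reports whether the queue asks for stable pages and says nothing about
how a particular I/O path enforces that.

```c
#include <stdio.h>

/*
 * Parse the text of a stable_writes attribute ("0\n" or "1\n").
 * Returns 1 or 0, or -1 if the text is not recognized.
 */
int parse_stable_writes(const char *text)
{
    if (text == NULL)
        return -1;
    if (text[0] == '1' && (text[1] == '\n' || text[1] == '\0'))
        return 1;
    if (text[0] == '0' && (text[1] == '\n' || text[1] == '\0'))
        return 0;
    return -1;
}

/*
 * Report whether the named disk (e.g. "sda") requires stable writes.
 * Returns 1 if it does, 0 if it does not, and -1 if the attribute
 * cannot be read or parsed.
 */
int disk_requires_stable_writes(const char *disk)
{
    char path[256];
    char buf[16] = "";
    FILE *f;
    int n;

    n = snprintf(path, sizeof(path),
                 "/sys/block/%s/queue/stable_writes", disk);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL)
        buf[0] = '\0';
    fclose(f);

    return parse_stable_writes(buf);
}
```

A caller would pass the disk name as it appears under /sys/block and
treat -1 as "unknown" rather than as "no".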

Cheers,

					- Ted
