Re: O_DIRECT vs BLK_FEAT_STABLE_WRITES, was Re: [PATCH] btrfs: never trust the bio from direct IO

linux-mm.kvack.org archive mirror

From: David Hildenbrand <david@redhat.com>
To: Jan Kara <jack@suse.cz>, Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>, Qu Wenruo <wqu@suse.com>,
	linux-btrfs@vger.kernel.org, djwong@kernel.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-mm@kvack.org,
	martin.petersen@oracle.com, jack@suse.com
Subject: Re: O_DIRECT vs BLK_FEAT_STABLE_WRITES, was Re: [PATCH] btrfs: never trust the bio from direct IO
Date: Mon, 20 Oct 2025 21:00:50 +0200
Message-ID: <5bd1d360-bee0-4fa2-80c8-476519e98b00@redhat.com>
In-Reply-To: <xc2orfhavfqaxrmxtsbf4kepglfujjodvhfzhzfawwaxlyrhlb@gammchkzoh2m>

On 20.10.25 17:58, Jan Kara wrote:
> On Mon 20-10-25 15:59:07, Matthew Wilcox wrote:
>> On Mon, Oct 20, 2025 at 03:59:33PM +0200, Jan Kara wrote:
>>> The idea was to bounce buffer the page we are writing back in case we spot
>>> a long-term pin we cannot just wait for - hence bouncing should be rare.
>>> But in this more general setting it is challenging to not bounce buffer for
>>> every IO (in which case you'd be basically at performance of RWF_DONTCACHE
>>> IO or perhaps worse so why bother?). Essentially if you hand out the real
>>> page underlying the buffer for the IO, all other attempts to do IO to that
>>> page have to block - bouncing is no longer an option because even with
>>> bouncing the second IO we could still corrupt data of the first IO once we
>>> copy to the final buffer. And if we'd block waiting for the first IO to
>>> complete, userspace could construct deadlock cycles - like racing IO to
>>> pages A, B with IO to pages B, A. So far I'm not sure about a sane way out
>>> of this...
>>
>> There isn't one.  We might have DMA-mapped this page earlier, and so a
>> device could write to it at any time.  Even if we remove PTE write
>> permissions ...
> 
> True but writes through DMA to the page are guarded by holding a page pin
> these days so we could in theory block getting another page pin or mapping
> the page writeably until the pin is released... if we can figure out a
> convincing story for dealing with long-term pins from RDMA and dealing with
> possible deadlocks created by this.

Just FYI, because it might be interesting in this context.

For anonymous memory we have this working: we only write the folio out
if it is completely unmapped and there are no unexpected folio
references/pins (see pageout()), and we only allow writing to such a
folio ("reuse") if SWP_STABLE_WRITES is not set (see do_swap_page()).

So once we start writeback, the folio has no writable page table mappings
(it is unmapped) and no GUP pins. Consequently, when something tries to
write to it, we can just fall back to creating a page copy without causing
trouble with GUP pins.
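
To make the two checks above concrete, here is a rough userspace model in
C. It is only a sketch of the decision logic as described in this mail, not
the kernel code: struct model_folio, its fields, and all model_* names are
made up for illustration, and the real checks in pageout() and
do_swap_page() involve more state than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a folio; NOT the kernel struct. */
struct model_folio {
	int mapcount;      /* number of page table mappings */
	int refcount;      /* all references, including GUP pins */
	int expected_refs; /* references the writeback path accounts for */
};

/* Outcome of a write access to a folio that went through writeback. */
enum model_fault_action {
	MODEL_REUSE, /* write to the existing folio in place */
	MODEL_COPY,  /* fall back to creating a page copy */
};

/*
 * Writeback side: only start writing the folio out if it is completely
 * unmapped and carries no unexpected references or pins.
 */
static bool model_can_start_writeback(const struct model_folio *folio)
{
	assert(folio != NULL);
	return folio->mapcount == 0 &&
	       folio->refcount == folio->expected_refs;
}

/*
 * Fault side: reuse is only allowed when stable writes are not required
 * (the SWP_STABLE_WRITES case in the mail); otherwise copy. Copying is
 * safe because the writeback check guaranteed there were no GUP pins.
 */
static enum model_fault_action model_write_fault(bool stable_writes)
{
	return stable_writes ? MODEL_COPY : MODEL_REUSE;
}
```

In this model, a folio with an extra reference never passes
model_can_start_writeback(), which is what makes the copy in
model_write_fault() harmless: nobody else can be holding the old page
for DMA at that point.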

-- 
Cheers

David / dhildenb
