Re: O_DIRECT vs BLK_FEAT_STABLE_WRITES, was Re: [PATCH] btrfs: never trust the bio from direct IO

Linux XFS filesystem development

From: John Hubbard <jhubbard@nvidia.com>
To: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>,
	Christoph Hellwig <hch@infradead.org>, Qu Wenruo <wqu@suse.com>,
	linux-btrfs@vger.kernel.org, djwong@kernel.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-mm@kvack.org,
	martin.petersen@oracle.com, jack@suse.com
Subject: Re: O_DIRECT vs BLK_FEAT_STABLE_WRITES, was Re: [PATCH] btrfs: never trust the bio from direct IO
Date: Tue, 21 Oct 2025 09:56:21 -0700
Message-ID: <b4ee24a3-0706-47aa-b2ad-0f60f90793ee@nvidia.com>
In-Reply-To: <6hedspdzoxjtdim7nruoeh5m4mx3xecubf7einzl67jzjmi3er@o54b7v5njwk5>

On 10/21/25 1:27 AM, Jan Kara wrote:
> On Mon 20-10-25 10:55:06, John Hubbard wrote:
>> On 10/20/25 8:58 AM, Jan Kara wrote:
>>> On Mon 20-10-25 15:59:07, Matthew Wilcox wrote:
>>>> On Mon, Oct 20, 2025 at 03:59:33PM +0200, Jan Kara wrote:
>>>>> The idea was to bounce buffer the page we are writing back in case we spot
>>>>> a long-term pin we cannot just wait for - hence bouncing should be rare.
>>>>> But in this more general setting it is challenging to not bounce buffer for
>>>>> every IO (in which case you'd be basically at performance of RWF_DONTCACHE
>>>>> IO or perhaps worse so why bother?). Essentially if you hand out the real
>>>>> page underlying the buffer for the IO, all other attempts to do IO to that
>>>>> page have to block - bouncing is no longer an option because even with
>>>>> bouncing the second IO we could still corrupt data of the first IO once we
>>>>> copy to the final buffer. And if we'd block waiting for the first IO to
>>>>> complete, userspace could construct deadlock cycles - like racing IO to
>>>>> pages A, B with IO to pages B, A. So far I'm not sure about a sane way out
>>>>> of this...
>>>>
>>>> There isn't one.  We might have DMA-mapped this page earlier, and so a
>>>> device could write to it at any time.  Even if we remove PTE write
>>>> permissions ...
>>>
>>> True but writes through DMA to the page are guarded by holding a page pin
>>> these days so we could in theory block getting another page pin or mapping
>>
>> Do you mean, "setting up to do DMA is guarded by holding a FOLL_LONGTERM
>> page pin"? Or something else (that's new to me)?
> 
> I meant to say that users that end up setting up DMA to a page also hold a
> page pin (either longterm for RDMA and similar users or shortterm for
> direct IO). Do you disagree?
> 
> 								Honza

Completely agree. I see what you have in mind now. I was hung up on
the "page pins won't prevent DMA from happening, once it is set up"
point, but your idea is to detect (via the page pin) that DMA is
already set up, and make decisions from there. That part I get now.


thanks,
-- 
John Hubbard
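
The idea discussed above (detect that a page is already pinned for DMA,
then decide what to do with it) can be sketched as a small userspace
model. This is only an illustration under stated assumptions, not kernel
code: struct model_page, PIN_BIAS, and every model_* function are
hypothetical names invented for this example. The bias-in-the-refcount
trick is loosely modeled on how the kernel's folio_maybe_dma_pinned()
tells pins apart from ordinary references, and the bounce decision
covers only the narrow writeback case from the quoted text, not the
general O_DIRECT case that the thread considers unsolved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each pin adds a large bias to the refcount. */
#define PIN_BIAS 1024

struct model_page {
	int refcount;	/* ordinary references + PIN_BIAS per pin */
};

enum wb_action {
	WB_WRITE_IN_PLACE,	/* no pin seen: write the page itself */
	WB_BOUNCE,		/* pin seen: copy to a bounce buffer first */
};

/* Taking a pin is what a DMA user would do before setting up DMA. */
static void model_pin(struct model_page *p)
{
	p->refcount += PIN_BIAS;
}

static void model_unpin(struct model_page *p)
{
	assert(p->refcount >= PIN_BIAS);
	p->refcount -= PIN_BIAS;
}

/*
 * "Maybe" pinned: a page with PIN_BIAS or more ordinary references
 * would be reported as pinned too, so false positives are possible.
 */
static bool model_maybe_pinned(const struct model_page *p)
{
	return p->refcount >= PIN_BIAS;
}

/* Decide how writeback should treat the page, based on the pin check. */
static enum wb_action model_writeback_action(const struct model_page *p)
{
	return model_maybe_pinned(p) ? WB_BOUNCE : WB_WRITE_IN_PLACE;
}
```

In this model the pin does not stop a device from writing to the page.
It only lets the writeback path notice that DMA may be in flight and
pick a safer action.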


Thread overview: 28+ messages
2025-10-20  9:19 [PATCH] btrfs: never trust the bio from direct IO Qu Wenruo
2025-10-20 10:00 ` O_DIRECT vs BLK_FEAT_STABLE_WRITES, was " Christoph Hellwig
2025-10-20 10:24   ` Qu Wenruo
2025-10-20 11:45     ` Christoph Hellwig
2025-10-20 11:16   ` Jan Kara
2025-10-20 11:44     ` Christoph Hellwig
2025-10-20 13:59       ` Jan Kara
2025-10-20 14:59         ` Matthew Wilcox
2025-10-20 15:58           ` Jan Kara
2025-10-20 17:55             ` John Hubbard
2025-10-21  8:27               ` Jan Kara
2025-10-21 16:56                 ` John Hubbard [this message]
2025-10-20 19:00             ` David Hildenbrand
2025-10-21  7:49               ` Christoph Hellwig
2025-10-21  7:57                 ` David Hildenbrand
2025-10-21  9:33                   ` Jan Kara
2025-10-21  9:43                     ` David Hildenbrand
2025-10-21  9:22                 ` Jan Kara
2025-10-21  9:37                   ` David Hildenbrand
2025-10-21  9:52                     ` Jan Kara
2025-10-21  3:17   ` Qu Wenruo
2025-10-21  7:48     ` Christoph Hellwig
2025-10-21  8:15       ` Qu Wenruo
2025-10-21 11:30         ` Johannes Thumshirn
2025-10-22  2:27           ` Qu Wenruo
2025-10-22  5:04             ` hch
2025-10-22  6:17               ` Qu Wenruo
2025-10-22  6:24                 ` hch
