linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Does GUP page unpinning have to be done in the pinning context?
@ 2025-04-04 10:20 David Howells
  2025-04-04 10:29 ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: David Howells @ 2025-04-04 10:20 UTC (permalink / raw)
  To: Kirill A. Shutemov, John Hubbard, David Hildenbrand
  Cc: dhowells, hch, willy, linux-mm

Hi Kirill, John, David,

I don't know if you're the experts on GUP, but can you tell me if unpinning,
e.g. unpin_user_page(), needs to be done in the same MM context and/or the
same user context as the initial pinning?  And, if so, is there some way to
break that link?

I'm looking at how I might extend page pinning into the socket layer and
sendmsg(MSG_ZEROCOPY).  The problem is that sendmsg() merely queues the
buffers with no guarantee that it's finished with them by the time it's
returned.  You get a SO_EE_ORIGIN_ZEROCOPY message to tell you that.

I have to deal with the issue that the process that did the sendmsg(), the MM
context and the user context may all have ceased to exist by the time the
transmission completes if I don't pin them.

Thanks,
David



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Does GUP page unpinning have to be done in the pinning context?
  2025-04-04 10:20 Does GUP page unpinning have to be done in the pinning context? David Howells
@ 2025-04-04 10:29 ` David Hildenbrand
  2025-04-04 16:59   ` John Hubbard
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-04-04 10:29 UTC (permalink / raw)
  To: David Howells, Kirill A. Shutemov, John Hubbard; +Cc: hch, willy, linux-mm

On 04.04.25 12:20, David Howells wrote:
> Hi Kirill, John, David,

Hi,

> 
> I don't know if you're the experts on GUP, but can you tell me if unpinning,
> e.g. unpin_user_page(), needs to be done in the same MM context and/or the
> same user context as the initial pinning?  And, if so, is there some way to
> break that link?

gup_put_folio() seems to only rely on per-folio information (esp. 
node_stat_mod_folio).

So there should not be such a context requirement.

> 
> I'm looking at how I might extend page pinning into the socket layer and
> sendmsg(MSG_ZEROCOPY).  The problem is that sendmsg() merely queues the
> buffers with no guarantee that it's finished with them by the time it's
> returned.  You get a SO_EE_ORIGIN_ZEROCOPY message to tell you that.
> 
> I have to deal with the issue that the process that did the sendmsg(), the MM
> context and the user context may all have ceased to exist by the time the
> transmission completes if I don't pin them.

AFAIKS that should work.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Does GUP page unpinning have to be done in the pinning context?
  2025-04-04 10:29 ` David Hildenbrand
@ 2025-04-04 16:59   ` John Hubbard
  2025-04-07  6:39     ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: John Hubbard @ 2025-04-04 16:59 UTC (permalink / raw)
  To: David Hildenbrand, David Howells, Kirill A. Shutemov; +Cc: hch, willy, linux-mm

On 4/4/25 3:29 AM, David Hildenbrand wrote:
> On 04.04.25 12:20, David Howells wrote:
>> Hi Kirill, John, David,
> 
> Hi,
> 
>>
>> I don't know if you're the experts on GUP, but can you tell me if 
>> unpinning,
>> e.g. unpin_user_page(), needs to be done in the same MM context and/or 
>> the
>> same user context as the initial pinning?  And, if so, is there some 
>> way to
>> break that link?
> 
> gup_put_folio() seems to only rely on per-folio information (esp. 
> node_stat_mod_folio).
> 
> So there should not be such a context requirement.

That is correct. The essence of gup/pup is that it operates on
struct pages, and doesn't have any "moral" connection to higher
layers or additional process context.


thanks,
-- 
John Hubbard



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Does GUP page unpinning have to be done in the pinning context?
  2025-04-04 16:59   ` John Hubbard
@ 2025-04-07  6:39     ` Christoph Hellwig
  2025-04-10  2:56       ` John Hubbard
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-04-07  6:39 UTC (permalink / raw)
  To: John Hubbard
  Cc: David Hildenbrand, David Howells, Kirill A. Shutemov, willy,
	linux-mm, Jens Axboe, linux-block

On Fri, Apr 04, 2025 at 09:59:40AM -0700, John Hubbard wrote:
> > gup_put_folio() seems to only rely on per-folio information (esp.
> > node_stat_mod_folio).
> > 
> > So there should not be such a context requirement.
> 
> That is correct. The essence of gup/pup is that it operates on
> struct pages, and doesn't have any "moral" connection to higher
> layers or additional process context.

Hmm, indeed.  I misremembered why the block based direct I/O code is
doing the process context offload, which is to call set_page_dirty to
redirty pages where the dirty bit was cleared during direct I/O.

Which I think now that all block based file systems use FOLL_PIN
shouldn't be needed anymore, because no one can clear the dirty bit
while the folios are pinned?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Does GUP page unpinning have to be done in the pinning context?
  2025-04-07  6:39     ` Christoph Hellwig
@ 2025-04-10  2:56       ` John Hubbard
  2025-04-10  7:28         ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: John Hubbard @ 2025-04-10  2:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Hildenbrand, David Howells, Kirill A. Shutemov, willy,
	linux-mm, Jens Axboe, linux-block

On 4/6/25 11:39 PM, Christoph Hellwig wrote:
> On Fri, Apr 04, 2025 at 09:59:40AM -0700, John Hubbard wrote:
>>> gup_put_folio() seems to only rely on per-folio information (esp.
>>> node_stat_mod_folio).
>>>
>>> So there should not be such a context requirement.
>>
>> That is correct. The essence of gup/pup is that it operates on
>> struct pages, and doesn't have any "moral" connection to higher
>> layers or additional process context.
> 
> Hmm, indeed.  I misremembered why the block based direct I/O code is
> doing the process context offload, which is to call set_page_dirty to
> redirty pages where the dirty bit was cleared during direct I/O.
> 
> Which I think now that all block based file systems use FOLL_PIN
> shouldn't be needed anymore, because no one can clear the dirty bit
> while the folios are pinned?
> 

No one should clear the dirty bit while the pages are pinned, agreed.

This topic always worries me, because the original problem with
dirty pages is still unfixed: setting pages dirty upon unpinning
is both widely done (last time I checked), and yet broken, because
it doesn't do a mkdirty() call to set up writeback buffers.

The solution always seemed to point toward "get a file lease on that
range, before pinning", but it's a contentious design area to say
the least.


thanks,
-- 
John Hubbard



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Does GUP page unpinning have to be done in the pinning context?
  2025-04-10  2:56       ` John Hubbard
@ 2025-04-10  7:28         ` Christoph Hellwig
  2025-04-10 19:11           ` John Hubbard
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-04-10  7:28 UTC (permalink / raw)
  To: John Hubbard
  Cc: David Hildenbrand, David Howells, Kirill A. Shutemov, willy,
	linux-mm, Jens Axboe, linux-block

On Wed, Apr 09, 2025 at 07:56:07PM -0700, John Hubbard wrote:
> This topic always worries me, because the original problem with
> dirty pages is still unfixed: setting pages dirty upon unpinning
> is both widely done (last time I checked), and yet broken, because
> it doesn't do a mkdirty() call to set up writeback buffers.
> 
> The solution always seemed to point toward "get a file lease on that
> range, before pinning", but it's a contentious design area to say
> the least.

For the bio based direct I/O implementations we do set the pages
dirty before starting I/O using bio_set_pages_dirty, which uses
folio_mark_dirty and thus calls into the file systems using
->dirty_folio.  But we also do a second pass on I/O completion
before the buffers are unpinned.  Which I think now that we pin
the folios is superfluous.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Does GUP page unpinning have to be done in the pinning context?
  2025-04-10  7:28         ` Christoph Hellwig
@ 2025-04-10 19:11           ` John Hubbard
  2025-04-10 19:14             ` Matthew Wilcox
  2025-05-12  6:21             ` Christoph Hellwig
  0 siblings, 2 replies; 10+ messages in thread
From: John Hubbard @ 2025-04-10 19:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Hildenbrand, David Howells, Kirill A. Shutemov, willy,
	linux-mm, Jens Axboe, linux-block

On 4/10/25 12:28 AM, Christoph Hellwig wrote:
> On Wed, Apr 09, 2025 at 07:56:07PM -0700, John Hubbard wrote:
>> This topic always worries me, because the original problem with
>> dirty pages is still unfixed: setting pages dirty upon unpinning
>> is both widely done (last time I checked), and yet broken, because
>> it doesn't do a mkdirty() call to set up writeback buffers.
>>
>> The solution always seemed to point toward "get a file lease on that
>> range, before pinning", but it's a contentious design area to say
>> the least.
> 
> For the bio based direct I/O implementations we do set the pages
> dirty before starting I/O using bio_set_pages_dirty, which uses
> folio_mark_dirty and thus calls into the file systems using
> ->dirty_folio.  But we also do a second pass on I/O completion
> before the buffers are unpinned.  Which I think now that we pin
> the folios is superfluous.
> 

Oh actually I think I was wrong in my earlier reply about clearing
the dirty bit. Because in Jan Kara's original bug report, what
happened was that periodic writeback came in while the pages
were pinned, and cleared the dirty bit--and also deleted the
page buffers (file system specific behavior) that are required
for writeback.

So then later when the pages are unpinned and marked dirty,
that causes the next writeback to fail in an unexpected way
(it used to cause ext4 BUG checks, in fact).

So the problem here is that these pinned pages can get cleaned
while they are pinned, and then dirtied again by DMA (invisible
to the filesystem).

thanks,
-- 
John Hubbard



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Does GUP page unpinning have to be done in the pinning context?
  2025-04-10 19:11           ` John Hubbard
@ 2025-04-10 19:14             ` Matthew Wilcox
  2025-04-10 19:34               ` John Hubbard
  2025-05-12  6:21             ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2025-04-10 19:14 UTC (permalink / raw)
  To: John Hubbard
  Cc: Christoph Hellwig, David Hildenbrand, David Howells,
	Kirill A. Shutemov, linux-mm, Jens Axboe, linux-block

On Thu, Apr 10, 2025 at 12:11:42PM -0700, John Hubbard wrote:
> On 4/10/25 12:28 AM, Christoph Hellwig wrote:
> > On Wed, Apr 09, 2025 at 07:56:07PM -0700, John Hubbard wrote:
> >> This topic always worries me, because the original problem with
> >> dirty pages is still unfixed: setting pages dirty upon unpinning
> >> is both widely done (last time I checked), and yet broken, because
> >> it doesn't do a mkdirty() call to set up writeback buffers.
> >>
> >> The solution always seemed to point toward "get a file lease on that
> >> range, before pinning", but it's a contentious design area to say
> >> the least.
> > 
> > For the bio based direct I/O implementations we do set the pages
> > dirty before starting I/O using bio_set_pages_dirty, which uses
> > folio_mark_dirty and thus calls into the file systems using
> > ->dirty_folio.  But we also do a second pass on I/O completion
> > before the buffers are unpinned.  Which I think now that we pin
> > the folios is superfluous.
> > 
> 
> Oh actually I think I was wrong in my earlier reply about clearing
> the dirty bit. Because in Jan Kara's original bug report, what
> happened was that periodic writeback came in while the pages
> were pinned, and cleared the dirty bit--and also deleted the
> page buffers (file system specific behavior) that are required
> for writeback.
> 
> So then later when the pages are unpinned and marked dirty,
> that causes the next writeback to fail in an unexpected way
> (it used to cause ext4 BUG checks, in fact).
> 
> So the problem here is that these pinned pages can get cleaned
> while they are pinned, and then dirtied again by DMA (invisible
> to the filesystem).

Did we fix that already?  Because it's relatively easy to writeback
pinned pages and _not_ clear the dirty flag.  That handles the two
problems which are falsely thinking that a heavily-mapped order-0 page 
is pinned (we write it back anyway, so don't lose data on crash),
and doesn't strip the bufferheads.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Does GUP page unpinning have to be done in the pinning context?
  2025-04-10 19:14             ` Matthew Wilcox
@ 2025-04-10 19:34               ` John Hubbard
  0 siblings, 0 replies; 10+ messages in thread
From: John Hubbard @ 2025-04-10 19:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, David Hildenbrand, David Howells,
	Kirill A. Shutemov, linux-mm, Jens Axboe, linux-block

On 4/10/25 12:14 PM, Matthew Wilcox wrote:
> On Thu, Apr 10, 2025 at 12:11:42PM -0700, John Hubbard wrote:
>> On 4/10/25 12:28 AM, Christoph Hellwig wrote:
>>> On Wed, Apr 09, 2025 at 07:56:07PM -0700, John Hubbard wrote:
...
>> Oh actually I think I was wrong in my earlier reply about clearing
>> the dirty bit. Because in Jan Kara's original bug report, what
>> happened was that periodic writeback came in while the pages
>> were pinned, and cleared the dirty bit--and also deleted the
>> page buffers (file system specific behavior) that are required
>> for writeback.
>>
>> So then later when the pages are unpinned and marked dirty,
>> that causes the next writeback to fail in an unexpected way
>> (it used to cause ext4 BUG checks, in fact).
>>
>> So the problem here is that these pinned pages can get cleaned
>> while they are pinned, and then dirtied again by DMA (invisible
>> to the filesystem).
> 
> Did we fix that already?  Because it's relatively easy to writeback

Maybe not?

On a closely related note, I do still see folio_clear_dirty() being 
called from ext4's mpage_prepare_extent_to_map(), along with the ext4
bandaid [1] that was applied to mitigate the problem:

	/*
	* Should never happen but for buggy code in
	* other subsystems that call
	* set_page_dirty() without properly warning
	* the file system first.  See [1] for more
	* information.
	*
	* [1] https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz
	*/
if (!folio_buffers(folio)) {
	ext4_warning_inode(mpd->inode, "page %lu does not have buffers attached", folio->index);
	folio_clear_dirty(folio);
	folio_unlock(folio);
	continue;
}

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc5095747edf


> pinned pages and _not_ clear the dirty flag.  That handles the two
> problems which are falsely thinking that a heavily-mapped order-0 page 
> is pinned (we write it back anyway, so don't lose data on crash),
> and doesn't strip the bufferheads.

That sounds like a viable solution!

thanks,
-- 
John Hubbard



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Does GUP page unpinning have to be done in the pinning context?
  2025-04-10 19:11           ` John Hubbard
  2025-04-10 19:14             ` Matthew Wilcox
@ 2025-05-12  6:21             ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-05-12  6:21 UTC (permalink / raw)
  To: John Hubbard
  Cc: Christoph Hellwig, David Hildenbrand, David Howells,
	Kirill A. Shutemov, willy, linux-mm, Jens Axboe, linux-block

On Thu, Apr 10, 2025 at 12:11:42PM -0700, John Hubbard wrote:
> Oh actually I think I was wrong in my earlier reply about clearing
> the dirty bit. Because in Jan Kara's original bug report, what
> happened was that periodic writeback came in while the pages
> were pinned, and cleared the dirty bit--and also deleted the
> page buffers (file system specific behavior) that are required
> for writeback.
> 
> So then later when the pages are unpinned and marked dirty,
> that causes the next writeback to fail in an unexpected way
> (it used to cause ext4 BUG checks, in fact).
> 
> So the problem here is that these pinned pages can get cleaned
> while they are pinned, and then dirtied again by DMA (invisible
> to the filesystem).

I've looked around a bit.  We do skip pinned pags in shrink_folio_list
(btw, can someone please split that thing up, it's so huge that it
is completely unreadable) but that's not really relevant for clearing
the dirty bit for filemap folios these days despite comments talking
about just that.

So I guess, yes - we'd need to skip folio_maybe_dma_pinned() in
writeback, or wait for the bit to be cleared for data integrity
writeback.  Which doesn't sound too hard, but there might be
pitfalls.



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-05-12  6:21 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-04 10:20 Does GUP page unpinning have to be done in the pinning context? David Howells
2025-04-04 10:29 ` David Hildenbrand
2025-04-04 16:59   ` John Hubbard
2025-04-07  6:39     ` Christoph Hellwig
2025-04-10  2:56       ` John Hubbard
2025-04-10  7:28         ` Christoph Hellwig
2025-04-10 19:11           ` John Hubbard
2025-04-10 19:14             ` Matthew Wilcox
2025-04-10 19:34               ` John Hubbard
2025-05-12  6:21             ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).