linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [QUESTION] xas_reload() in iter_xarray_populate_pages()
@ 2025-05-26  6:35 Dev Jain
  2025-05-26  7:40 ` David Hildenbrand
  2025-06-17  5:10 ` Dev Jain
  0 siblings, 2 replies; 10+ messages in thread
From: Dev Jain @ 2025-05-26  6:35 UTC (permalink / raw)
  To: david, ziy, willy, dhowells, hughd
  Cc: linux-mm, linux-kernel, ryan.roberts, aneesh.kumar

Hello all,

After doing an xas_load() and xas_retry(), we take neither a reference nor a lock
on the folio, and we do an xas_reload(). Is this just to reduce the time window
for a race?

If the above is true, then, there is a negligible window between xas_load() and
xas_reload(), because only xas_retry() exists between them, so why to even reload()?

Thanks,
Dev



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [QUESTION] xas_reload() in iter_xarray_populate_pages()
  2025-05-26  6:35 [QUESTION] xas_reload() in iter_xarray_populate_pages() Dev Jain
@ 2025-05-26  7:40 ` David Hildenbrand
  2025-05-26 19:20   ` Matthew Wilcox
  2025-06-17  5:10 ` Dev Jain
  1 sibling, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-05-26  7:40 UTC (permalink / raw)
  To: Dev Jain, ziy, willy, dhowells, hughd
  Cc: linux-mm, linux-kernel, ryan.roberts, aneesh.kumar

On 26.05.25 08:35, Dev Jain wrote:
> Hello all,
> 
> After doing an xas_load() and xas_retry(), we take neither a reference nor a lock
> on the folio, and we do an xas_reload(). Is this just to reduce the time window
> for a race?
> 
> If the above is true, then, there is a negligible window between xas_load() and
> xas_reload(), because only xas_retry() exists between them, so why to even reload()?

The usual sequence for the pagecache is (see filemap_get_entry())

1) xas_load(): Load the entry

2) xas_retry(): Test if we have to retry immediately

3) folio || xa_is_value(folio): check if the entry stores a folio

4) folio_try_get(): try getting a folio reference, might get freed
    concurrently, so a folio_get() is not safe

5) folio != xas_reload(&xas): recheck whether the entry was changed
    concurrently

iter_xarray_get_pages()->iter_xarray_populate_pages() works on whatever 
xarray was provided to iov_iter_xarray().

erofs/netfs/orangefs seem to pass the pagecache ... so I would also 
assume that we have to use the same sequence as above.

Willy and me had a look ad that code in b57f4f4f186d ("iov_iter: convert 
iter_xarray_populate_pages() to use folios").

But looking at it now, I think that code is incorrect. At least the 
folio_get() and reload-before-folio-get is weird.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [QUESTION] xas_reload() in iter_xarray_populate_pages()
  2025-05-26  7:40 ` David Hildenbrand
@ 2025-05-26 19:20   ` Matthew Wilcox
  0 siblings, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2025-05-26 19:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dev Jain, ziy, dhowells, hughd, linux-mm, linux-kernel,
	ryan.roberts, aneesh.kumar

On Mon, May 26, 2025 at 09:40:06AM +0200, David Hildenbrand wrote:
> On 26.05.25 08:35, Dev Jain wrote:
> > Hello all,
> > 
> > After doing an xas_load() and xas_retry(), we take neither a reference nor a lock
> > on the folio, and we do an xas_reload(). Is this just to reduce the time window
> > for a race?
> > 
> > If the above is true, then, there is a negligible window between xas_load() and
> > xas_reload(), because only xas_retry() exists between them, so why to even reload()?
> 
> The usual sequence for the pagecache is (see filemap_get_entry())
> 
> 1) xas_load(): Load the entry
> 
> 2) xas_retry(): Test if we have to retry immediately
> 
> 3) folio || xa_is_value(folio): check if the entry stores a folio
> 
> 4) folio_try_get(): try getting a folio reference, might get freed
>    concurrently, so a folio_get() is not safe
> 
> 5) folio != xas_reload(&xas): recheck whether the entry was changed
>    concurrently
> 
> iter_xarray_get_pages()->iter_xarray_populate_pages() works on whatever
> xarray was provided to iov_iter_xarray().
> 
> erofs/netfs/orangefs seem to pass the pagecache ... so I would also assume
> that we have to use the same sequence as above.
> 
> Willy and me had a look ad that code in b57f4f4f186d ("iov_iter: convert
> iter_xarray_populate_pages() to use folios").
> 
> But looking at it now, I think that code is incorrect. At least the
> folio_get() and reload-before-folio-get is weird.

Well, I just converted it; I didn't think hard about what it was doing
was right.  I think I may have mentioned to dhowells that I thought the
xas_reload() was unnecessary as the folios are required to be referenced
by the caller.  So if you can look them up, they're guaranteed to not
be replaced [1].

[1] the xas_retry() cannot be skipped as it guards against a rearrangement
of the tree which can happen even with the folio pinned.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [QUESTION] xas_reload() in iter_xarray_populate_pages()
  2025-05-26  6:35 [QUESTION] xas_reload() in iter_xarray_populate_pages() Dev Jain
  2025-05-26  7:40 ` David Hildenbrand
@ 2025-06-17  5:10 ` Dev Jain
  2025-06-17  7:47   ` David Hildenbrand
  2025-06-17 13:47   ` Matthew Wilcox
  1 sibling, 2 replies; 10+ messages in thread
From: Dev Jain @ 2025-06-17  5:10 UTC (permalink / raw)
  To: david, ziy, willy, dhowells, hughd
  Cc: linux-mm, linux-kernel, ryan.roberts, aneesh.kumar


On 26/05/25 12:05 pm, Dev Jain wrote:
> Hello all,
>
> After doing an xas_load() and xas_retry(), we take neither a reference nor a lock
> on the folio, and we do an xas_reload(). Is this just to reduce the time window
> for a race?
>
> If the above is true, then, there is a negligible window between xas_load() and
> xas_reload(), because only xas_retry() exists between them, so why to even reload()?
>
> Thanks,
> Dev

I do not completely remember our discussion in THP Cabal; I recall David Howells maybe
saying that the folios are already locked, so it is safe to do xas_load and then do
a folio_get()? Even if we remove the redundant xas_reload(), I still don't understand
why we won't need xas_reload() at least after folio_get()?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [QUESTION] xas_reload() in iter_xarray_populate_pages()
  2025-06-17  5:10 ` Dev Jain
@ 2025-06-17  7:47   ` David Hildenbrand
  2025-06-17  9:18     ` Dev Jain
  2025-06-17 13:47   ` Matthew Wilcox
  1 sibling, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-06-17  7:47 UTC (permalink / raw)
  To: Dev Jain, ziy, willy, dhowells, hughd
  Cc: linux-mm, linux-kernel, ryan.roberts, aneesh.kumar

On 17.06.25 07:10, Dev Jain wrote:
> 
> On 26/05/25 12:05 pm, Dev Jain wrote:
>> Hello all,
>>
>> After doing an xas_load() and xas_retry(), we take neither a reference nor a lock
>> on the folio, and we do an xas_reload(). Is this just to reduce the time window
>> for a race?
>>
>> If the above is true, then, there is a negligible window between xas_load() and
>> xas_reload(), because only xas_retry() exists between them, so why to even reload()?
>>
>> Thanks,
>> Dev
> 
> I do not completely remember our discussion in THP Cabal; I recall David Howells maybe
> saying that the folios are already locked, so it is safe to do xas_load and then do
> a folio_get()? Even if we remove the redundant xas_reload(), I still don't understand
> why we won't need xas_reload() at least after folio_get()?

I think the points where

(a) this should go all away soon

(b) there is the expectation that the folios cannot get truncated
     concurrently. So we can do an unconditional folio_get(), don't have
     to check folio->mapping etc.

(c) The xas_reload() seems unnecessary and can be dropped.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [QUESTION] xas_reload() in iter_xarray_populate_pages()
  2025-06-17  7:47   ` David Hildenbrand
@ 2025-06-17  9:18     ` Dev Jain
  2025-06-17  9:26       ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Dev Jain @ 2025-06-17  9:18 UTC (permalink / raw)
  To: David Hildenbrand, ziy, willy, dhowells, hughd
  Cc: linux-mm, linux-kernel, ryan.roberts, aneesh.kumar


On 17/06/25 1:17 pm, David Hildenbrand wrote:
> On 17.06.25 07:10, Dev Jain wrote:
>>
>> On 26/05/25 12:05 pm, Dev Jain wrote:
>>> Hello all,
>>>
>>> After doing an xas_load() and xas_retry(), we take neither a 
>>> reference nor a lock
>>> on the folio, and we do an xas_reload(). Is this just to reduce the 
>>> time window
>>> for a race?
>>>
>>> If the above is true, then, there is a negligible window between 
>>> xas_load() and
>>> xas_reload(), because only xas_retry() exists between them, so why 
>>> to even reload()?
>>>
>>> Thanks,
>>> Dev
>>
>> I do not completely remember our discussion in THP Cabal; I recall 
>> David Howells maybe
>> saying that the folios are already locked, so it is safe to do 
>> xas_load and then do
>> a folio_get()? Even if we remove the redundant xas_reload(), I still 
>> don't understand
>> why we won't need xas_reload() at least after folio_get()?
>
> I think the points where
>
> (a) this should go all away soon
>
> (b) there is the expectation that the folios cannot get truncated
>     concurrently. So we can do an unconditional folio_get(), don't have
>     to check folio->mapping etc.

Well...pretty sure the file read path is taking the inode->i_rwsem or 
i_lock somewhere,

to synchronize with truncation/reclaim, can't figure out where. Reclaim 
takes the i_lock in __remove_mapping and

then freezes the folio reference, so the read path must lock i_lock 
somewhere.


>
> (c) The xas_reload() seems unnecessary and can be dropped.
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [QUESTION] xas_reload() in iter_xarray_populate_pages()
  2025-06-17  9:18     ` Dev Jain
@ 2025-06-17  9:26       ` David Hildenbrand
  2025-06-17  9:38         ` Dev Jain
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-06-17  9:26 UTC (permalink / raw)
  To: Dev Jain, ziy, willy, dhowells, hughd
  Cc: linux-mm, linux-kernel, ryan.roberts, aneesh.kumar

On 17.06.25 11:18, Dev Jain wrote:
> 
> On 17/06/25 1:17 pm, David Hildenbrand wrote:
>> On 17.06.25 07:10, Dev Jain wrote:
>>>
>>> On 26/05/25 12:05 pm, Dev Jain wrote:
>>>> Hello all,
>>>>
>>>> After doing an xas_load() and xas_retry(), we take neither a
>>>> reference nor a lock
>>>> on the folio, and we do an xas_reload(). Is this just to reduce the
>>>> time window
>>>> for a race?
>>>>
>>>> If the above is true, then, there is a negligible window between
>>>> xas_load() and
>>>> xas_reload(), because only xas_retry() exists between them, so why
>>>> to even reload()?
>>>>
>>>> Thanks,
>>>> Dev
>>>
>>> I do not completely remember our discussion in THP Cabal; I recall
>>> David Howells maybe
>>> saying that the folios are already locked, so it is safe to do
>>> xas_load and then do
>>> a folio_get()? Even if we remove the redundant xas_reload(), I still
>>> don't understand
>>> why we won't need xas_reload() at least after folio_get()?
>>
>> I think the points where
>>
>> (a) this should go all away soon
>>
>> (b) there is the expectation that the folios cannot get truncated
>>      concurrently. So we can do an unconditional folio_get(), don't have
>>      to check folio->mapping etc.
> 
> Well...pretty sure the file read path is taking the inode->i_rwsem or
> i_lock somewhere,
> 
> to synchronize with truncation/reclaim, can't figure out where. Reclaim
> takes the i_lock in __remove_mapping and
> 
> then freezes the folio reference, so the read path must lock i_lock
> somewhere.

I mean, if concurrent freeing of a folio would be possible, the function 
would be horribly broken :)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [QUESTION] xas_reload() in iter_xarray_populate_pages()
  2025-06-17  9:26       ` David Hildenbrand
@ 2025-06-17  9:38         ` Dev Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Dev Jain @ 2025-06-17  9:38 UTC (permalink / raw)
  To: David Hildenbrand, ziy, willy, dhowells, hughd
  Cc: linux-mm, linux-kernel, ryan.roberts, aneesh.kumar


On 17/06/25 2:56 pm, David Hildenbrand wrote:
> On 17.06.25 11:18, Dev Jain wrote:
>>
>> On 17/06/25 1:17 pm, David Hildenbrand wrote:
>>> On 17.06.25 07:10, Dev Jain wrote:
>>>>
>>>> On 26/05/25 12:05 pm, Dev Jain wrote:
>>>>> Hello all,
>>>>>
>>>>> After doing an xas_load() and xas_retry(), we take neither a
>>>>> reference nor a lock
>>>>> on the folio, and we do an xas_reload(). Is this just to reduce the
>>>>> time window
>>>>> for a race?
>>>>>
>>>>> If the above is true, then, there is a negligible window between
>>>>> xas_load() and
>>>>> xas_reload(), because only xas_retry() exists between them, so why
>>>>> to even reload()?
>>>>>
>>>>> Thanks,
>>>>> Dev
>>>>
>>>> I do not completely remember our discussion in THP Cabal; I recall
>>>> David Howells maybe
>>>> saying that the folios are already locked, so it is safe to do
>>>> xas_load and then do
>>>> a folio_get()? Even if we remove the redundant xas_reload(), I still
>>>> don't understand
>>>> why we won't need xas_reload() at least after folio_get()?
>>>
>>> I think the points where
>>>
>>> (a) this should go all away soon
>>>
>>> (b) there is the expectation that the folios cannot get truncated
>>>      concurrently. So we can do an unconditional folio_get(), don't 
>>> have
>>>      to check folio->mapping etc.
>>
>> Well...pretty sure the file read path is taking the inode->i_rwsem or
>> i_lock somewhere,
>>
>> to synchronize with truncation/reclaim, can't figure out where. Reclaim
>> takes the i_lock in __remove_mapping and
>>
>> then freezes the folio reference, so the read path must lock i_lock
>> somewhere.
>
> I mean, if concurrent freeing of a folio would be possible, the 
> function would be horribly broken :)
Yes, trusting the kernel overlords I have dropped my search : )


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [QUESTION] xas_reload() in iter_xarray_populate_pages()
  2025-06-17  5:10 ` Dev Jain
  2025-06-17  7:47   ` David Hildenbrand
@ 2025-06-17 13:47   ` Matthew Wilcox
  2025-06-18  3:14     ` Dev Jain
  1 sibling, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2025-06-17 13:47 UTC (permalink / raw)
  To: Dev Jain
  Cc: david, ziy, dhowells, hughd, linux-mm, linux-kernel, ryan.roberts,
	aneesh.kumar

On Tue, Jun 17, 2025 at 10:40:51AM +0530, Dev Jain wrote:
> 
> On 26/05/25 12:05 pm, Dev Jain wrote:
> > Hello all,
> > 
> > After doing an xas_load() and xas_retry(), we take neither a reference nor a lock
> > on the folio, and we do an xas_reload(). Is this just to reduce the time window
> > for a race?
> > 
> > If the above is true, then, there is a negligible window between xas_load() and
> > xas_reload(), because only xas_retry() exists between them, so why to even reload()?
> > 
> > Thanks,
> > Dev
> 
> I do not completely remember our discussion in THP Cabal; I recall David Howells maybe
> saying that the folios are already locked, so it is safe to do xas_load and then do
> a folio_get()? Even if we remove the redundant xas_reload(), I still don't understand
> why we won't need xas_reload() at least after folio_get()?

Because you need xas_reload() in order to solve this race:

A: load folio
B: remove folio
B: free folio
C: alloc folio
A: tryget folio
A: reload folio

If A already has a refcount on folio, folio cannot be freed, and so A
cannot get a refcount to C's folio.

The other mutexes are irrelevant here; this is purely a folio refcount
problem/solution.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [QUESTION] xas_reload() in iter_xarray_populate_pages()
  2025-06-17 13:47   ` Matthew Wilcox
@ 2025-06-18  3:14     ` Dev Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Dev Jain @ 2025-06-18  3:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: david, ziy, dhowells, hughd, linux-mm, linux-kernel, ryan.roberts,
	aneesh.kumar


On 17/06/25 7:17 pm, Matthew Wilcox wrote:
> On Tue, Jun 17, 2025 at 10:40:51AM +0530, Dev Jain wrote:
>> On 26/05/25 12:05 pm, Dev Jain wrote:
>>> Hello all,
>>>
>>> After doing an xas_load() and xas_retry(), we take neither a reference nor a lock
>>> on the folio, and we do an xas_reload(). Is this just to reduce the time window
>>> for a race?
>>>
>>> If the above is true, then, there is a negligible window between xas_load() and
>>> xas_reload(), because only xas_retry() exists between them, so why to even reload()?
>>>
>>> Thanks,
>>> Dev
>> I do not completely remember our discussion in THP Cabal; I recall David Howells maybe
>> saying that the folios are already locked, so it is safe to do xas_load and then do
>> a folio_get()? Even if we remove the redundant xas_reload(), I still don't understand
>> why we won't need xas_reload() at least after folio_get()?
> Because you need xas_reload() in order to solve this race:
>
> A: load folio
> B: remove folio
> B: free folio
> C: alloc folio
> A: tryget folio
> A: reload folio
>
> If A already has a refcount on folio, folio cannot be freed, and so A
> cannot get a refcount to C's folio.

Yes sorry, I should have written that why is an unconditional folio get being
used instead of tryget...and the answer seems to be that we already probably
have the inode lock so we are guaranteed that the folio won't get evicted.

>
> The other mutexes are irrelevant here; this is purely a folio refcount
> problem/solution.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-06-18  3:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-26  6:35 [QUESTION] xas_reload() in iter_xarray_populate_pages() Dev Jain
2025-05-26  7:40 ` David Hildenbrand
2025-05-26 19:20   ` Matthew Wilcox
2025-06-17  5:10 ` Dev Jain
2025-06-17  7:47   ` David Hildenbrand
2025-06-17  9:18     ` Dev Jain
2025-06-17  9:26       ` David Hildenbrand
2025-06-17  9:38         ` Dev Jain
2025-06-17 13:47   ` Matthew Wilcox
2025-06-18  3:14     ` Dev Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).