* Mixed page compact code and (higher order) folios for filemap
@ 2023-11-16 3:41 Qu Wenruo
2023-11-16 5:05 ` Matthew Wilcox
0 siblings, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2023-11-16 3:41 UTC (permalink / raw)
To: Linux Memory Management List, Linux FS Devel,
linux-btrfs@vger.kernel.org
Hi guys,
I'm wondering if there is any pitfalls when mixing the legacy page based
code with higher order (order >= 1) folios.
E.g. if I allocated a folio with order 2, attached some private data to
the folio, then call filemap_add_folio().
Later some one called find_lock_page() and hit the 2nd page of that folio.
I believe the regular IO is totally fine, but what would happen for the
page->private of that folio?
Would them all share the same value of the folio_attach_private()? Or
some different values?
Thanks,
Qu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap
2023-11-16 3:41 Mixed page compact code and (higher order) folios for filemap Qu Wenruo
@ 2023-11-16 5:05 ` Matthew Wilcox
2023-11-16 5:30 ` Qu Wenruo
0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2023-11-16 5:05 UTC (permalink / raw)
To: Qu Wenruo
Cc: Linux Memory Management List, Linux FS Devel,
linux-btrfs@vger.kernel.org
On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote:
> E.g. if I allocated a folio with order 2, attached some private data to
> the folio, then call filemap_add_folio().
>
> Later some one called find_lock_page() and hit the 2nd page of that folio.
>
> I believe the regular IO is totally fine, but what would happen for the
> page->private of that folio?
> Would them all share the same value of the folio_attach_private()? Or
> some different values?
Well, there's no magic ...
If you call find_lock_page(), you get back the precise page. If you
call page_folio() on that page, you get back the folio that you stored.
If you then dereference folio->private, you get the pointer that you
passed to folio_attach_private().
If you dereference page->private, *that is a bug*. You might get
NULL, you might get garbage. Just like dereferencing page->index or
page->mapping on tail pages. page_private() will also do the wrong thing
(we could fix that to embed a call to page_folio() ... it hasn't been
necessary before now, but if it'll help convert btrfs, then let's do it).
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap
2023-11-16 5:05 ` Matthew Wilcox
@ 2023-11-16 5:30 ` Qu Wenruo
2023-11-16 14:23 ` Matthew Wilcox
0 siblings, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2023-11-16 5:30 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Linux Memory Management List, Linux FS Devel,
linux-btrfs@vger.kernel.org
On 2023/11/16 15:35, Matthew Wilcox wrote:
> On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote:
>> E.g. if I allocated a folio with order 2, attached some private data to
>> the folio, then call filemap_add_folio().
>>
>> Later some one called find_lock_page() and hit the 2nd page of that folio.
>>
>> I believe the regular IO is totally fine, but what would happen for the
>> page->private of that folio?
>> Would them all share the same value of the folio_attach_private()? Or
>> some different values?
>
> Well, there's no magic ...
>
> If you call find_lock_page(), you get back the precise page. If you
> call page_folio() on that page, you get back the folio that you stored.
> If you then dereference folio->private, you get the pointer that you
> passed to folio_attach_private().
>
> If you dereference page->private, *that is a bug*. You might get
> NULL, you might get garbage. Just like dereferencing page->index or
> page->mapping on tail pages. page_private() will also do the wrong thing
> (we could fix that to embed a call to page_folio() ... it hasn't been
> necessary before now, but if it'll help convert btrfs, then let's do it).
That would be great. The biggest problem I'm hitting so far is the page
cache for metadata.
We're using __GFP_NOFAIL for the current per-page allocation, but IIRC
__GFP_NOFAIL is ignored for higher order (>2 ?) folio allocation.
And we may want that per-page allocation as the last resort effort
allocation anyway.
Thus I'm checking if there is something we can do here.
But I guess we can always go folio_private() instead as a workaround for
now?
Thanks,
Qu
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap
2023-11-16 5:30 ` Qu Wenruo
@ 2023-11-16 14:23 ` Matthew Wilcox
2023-11-16 20:33 ` Qu Wenruo
2023-11-16 22:40 ` Qu Wenruo
0 siblings, 2 replies; 7+ messages in thread
From: Matthew Wilcox @ 2023-11-16 14:23 UTC (permalink / raw)
To: Qu Wenruo
Cc: Linux Memory Management List, Linux FS Devel,
linux-btrfs@vger.kernel.org
On Thu, Nov 16, 2023 at 04:00:40PM +1030, Qu Wenruo wrote:
> On 2023/11/16 15:35, Matthew Wilcox wrote:
> > On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote:
> > > E.g. if I allocated a folio with order 2, attached some private data to
> > > the folio, then call filemap_add_folio().
> > >
> > > Later some one called find_lock_page() and hit the 2nd page of that folio.
> > >
> > > I believe the regular IO is totally fine, but what would happen for the
> > > page->private of that folio?
> > > Would them all share the same value of the folio_attach_private()? Or
> > > some different values?
> >
> > Well, there's no magic ...
> >
> > If you call find_lock_page(), you get back the precise page. If you
> > call page_folio() on that page, you get back the folio that you stored.
> > If you then dereference folio->private, you get the pointer that you
> > passed to folio_attach_private().
> >
> > If you dereference page->private, *that is a bug*. You might get
> > NULL, you might get garbage. Just like dereferencing page->index or
> > page->mapping on tail pages. page_private() will also do the wrong thing
> > (we could fix that to embed a call to page_folio() ... it hasn't been
> > necessary before now, but if it'll help convert btrfs, then let's do it).
>
> That would be great. The biggest problem I'm hitting so far is the page
> cache for metadata.
>
> We're using __GFP_NOFAIL for the current per-page allocation, but IIRC
> __GFP_NOFAIL is ignored for higher order (>2 ?) folio allocation.
> And we may want that per-page allocation as the last resort effort
> allocation anyway.
>
> Thus I'm checking if there is something we can do here.
>
> But I guess we can always go folio_private() instead as a workaround for
> now?
I don't understand enough about what you're doing to offer useful
advice. Is this for bs>PS or is it arbitrary large folios for better
performance? If the latter, you can always fall back to order-0 folios.
If the former, well, we need to adjust a few things anyway to handle
filesystems with a minimum order ...
In general, you should be using folio_private(). page->private and
page_private() will be removed eventually.
The GFP_NOFAIL warning is:
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap
2023-11-16 14:23 ` Matthew Wilcox
@ 2023-11-16 20:33 ` Qu Wenruo
2023-11-16 22:40 ` Qu Wenruo
1 sibling, 0 replies; 7+ messages in thread
From: Qu Wenruo @ 2023-11-16 20:33 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Linux Memory Management List, Linux FS Devel,
linux-btrfs@vger.kernel.org
On 2023/11/17 00:53, Matthew Wilcox wrote:
> On Thu, Nov 16, 2023 at 04:00:40PM +1030, Qu Wenruo wrote:
>> On 2023/11/16 15:35, Matthew Wilcox wrote:
>>> On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote:
>>>> E.g. if I allocated a folio with order 2, attached some private data to
>>>> the folio, then call filemap_add_folio().
>>>>
>>>> Later some one called find_lock_page() and hit the 2nd page of that folio.
>>>>
>>>> I believe the regular IO is totally fine, but what would happen for the
>>>> page->private of that folio?
>>>> Would them all share the same value of the folio_attach_private()? Or
>>>> some different values?
>>>
>>> Well, there's no magic ...
>>>
>>> If you call find_lock_page(), you get back the precise page. If you
>>> call page_folio() on that page, you get back the folio that you stored.
>>> If you then dereference folio->private, you get the pointer that you
>>> passed to folio_attach_private().
>>>
>>> If you dereference page->private, *that is a bug*. You might get
>>> NULL, you might get garbage. Just like dereferencing page->index or
>>> page->mapping on tail pages. page_private() will also do the wrong thing
>>> (we could fix that to embed a call to page_folio() ... it hasn't been
>>> necessary before now, but if it'll help convert btrfs, then let's do it).
>>
>> That would be great. The biggest problem I'm hitting so far is the page
>> cache for metadata.
>>
>> We're using __GFP_NOFAIL for the current per-page allocation, but IIRC
>> __GFP_NOFAIL is ignored for higher order (>2 ?) folio allocation.
>> And we may want that per-page allocation as the last resort effort
>> allocation anyway.
>>
>> Thus I'm checking if there is something we can do here.
>>
>> But I guess we can always go folio_private() instead as a workaround for
>> now?
>
> I don't understand enough about what you're doing to offer useful
> advice. Is this for bs>PS or is it arbitrary large folios for better
> performance?
The ultimate goal is to make nodesize (metadata block size) > PAGE_SIZE
case to go higher order folio by default, for better performance.
But use order 0 folios if we failed get higher order folios.
The current problem is the metadata allocation here is always going
page based, and using page->private.
> If the latter, you can always fall back to order-0 folios.
> If the former, well, we need to adjust a few things anyway to handle
> filesystems with a minimum order ...
>
> In general, you should be using folio_private(). page->private and
> page_private() will be removed eventually.
OK, that sounds good, we can do the cleanup first inside btrfs.
Thanks,
Qu
>
> The GFP_NOFAIL warning is:
>
> WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap
2023-11-16 14:23 ` Matthew Wilcox
2023-11-16 20:33 ` Qu Wenruo
@ 2023-11-16 22:40 ` Qu Wenruo
2023-11-17 13:36 ` Matthew Wilcox
1 sibling, 1 reply; 7+ messages in thread
From: Qu Wenruo @ 2023-11-16 22:40 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Linux Memory Management List, Linux FS Devel,
linux-btrfs@vger.kernel.org
On 2023/11/17 00:53, Matthew Wilcox wrote:
> On Thu, Nov 16, 2023 at 04:00:40PM +1030, Qu Wenruo wrote:
>> On 2023/11/16 15:35, Matthew Wilcox wrote:
>>> On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote:
>>>> E.g. if I allocated a folio with order 2, attached some private data to
>>>> the folio, then call filemap_add_folio().
>>>>
>>>> Later some one called find_lock_page() and hit the 2nd page of that folio.
>>>>
>>>> I believe the regular IO is totally fine, but what would happen for the
>>>> page->private of that folio?
>>>> Would them all share the same value of the folio_attach_private()? Or
>>>> some different values?
>>>
>>> Well, there's no magic ...
>>>
>>> If you call find_lock_page(), you get back the precise page. If you
>>> call page_folio() on that page, you get back the folio that you stored.
>>> If you then dereference folio->private, you get the pointer that you
>>> passed to folio_attach_private().
>>>
>>> If you dereference page->private, *that is a bug*. You might get
>>> NULL, you might get garbage. Just like dereferencing page->index or
>>> page->mapping on tail pages. page_private() will also do the wrong thing
>>> (we could fix that to embed a call to page_folio() ... it hasn't been
>>> necessary before now, but if it'll help convert btrfs, then let's do it).
>>
>> That would be great. The biggest problem I'm hitting so far is the page
>> cache for metadata.
>>
>> We're using __GFP_NOFAIL for the current per-page allocation, but IIRC
>> __GFP_NOFAIL is ignored for higher order (>2 ?) folio allocation.
>> And we may want that per-page allocation as the last resort effort
>> allocation anyway.
>>
>> Thus I'm checking if there is something we can do here.
>>
>> But I guess we can always go folio_private() instead as a workaround for
>> now?
>
> I don't understand enough about what you're doing to offer useful
> advice. Is this for bs>PS or is it arbitrary large folios for better
> performance? If the latter, you can always fall back to order-0 folios.
> If the former, well, we need to adjust a few things anyway to handle
> filesystems with a minimum order ...
>
> In general, you should be using folio_private(). page->private and
> page_private() will be removed eventually.
Just another question.
What about flags like PageDirty? Are they synced with folio?
The declaration goes PF_HEAD for policy, thus for order 0 it makes no
difference, but for higher order folios, we should switch to pure folio
based operations other than mixing page and folios?
Thanks,
Qu
>
> The GFP_NOFAIL warning is:
>
> WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap
2023-11-16 22:40 ` Qu Wenruo
@ 2023-11-17 13:36 ` Matthew Wilcox
0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2023-11-17 13:36 UTC (permalink / raw)
To: Qu Wenruo
Cc: Linux Memory Management List, Linux FS Devel,
linux-btrfs@vger.kernel.org
On Fri, Nov 17, 2023 at 09:10:10AM +1030, Qu Wenruo wrote:
> On 2023/11/17 00:53, Matthew Wilcox wrote:
> > On Thu, Nov 16, 2023 at 04:00:40PM +1030, Qu Wenruo wrote:
> > > On 2023/11/16 15:35, Matthew Wilcox wrote:
> > > > On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote:
> > > > > E.g. if I allocated a folio with order 2, attached some private data to
> > > > > the folio, then call filemap_add_folio().
> > > > >
> > > > > Later some one called find_lock_page() and hit the 2nd page of that folio.
> > > > >
> > > > > I believe the regular IO is totally fine, but what would happen for the
> > > > > page->private of that folio?
> > > > > Would them all share the same value of the folio_attach_private()? Or
> > > > > some different values?
> > > >
> > > > Well, there's no magic ...
> > > >
> > > > If you call find_lock_page(), you get back the precise page. If you
> > > > call page_folio() on that page, you get back the folio that you stored.
> > > > If you then dereference folio->private, you get the pointer that you
> > > > passed to folio_attach_private().
> > > >
> > > > If you dereference page->private, *that is a bug*. You might get
> > > > NULL, you might get garbage. Just like dereferencing page->index or
> > > > page->mapping on tail pages. page_private() will also do the wrong thing
> > > > (we could fix that to embed a call to page_folio() ... it hasn't been
> > > > necessary before now, but if it'll help convert btrfs, then let's do it).
> > >
> > > That would be great. The biggest problem I'm hitting so far is the page
> > > cache for metadata.
> > >
> > > We're using __GFP_NOFAIL for the current per-page allocation, but IIRC
> > > __GFP_NOFAIL is ignored for higher order (>2 ?) folio allocation.
> > > And we may want that per-page allocation as the last resort effort
> > > allocation anyway.
> > >
> > > Thus I'm checking if there is something we can do here.
> > >
> > > But I guess we can always go folio_private() instead as a workaround for
> > > now?
> >
> > I don't understand enough about what you're doing to offer useful
> > advice. Is this for bs>PS or is it arbitrary large folios for better
> > performance? If the latter, you can always fall back to order-0 folios.
> > If the former, well, we need to adjust a few things anyway to handle
> > filesystems with a minimum order ...
> >
> > In general, you should be using folio_private(). page->private and
> > page_private() will be removed eventually.
>
> Just another question.
>
> What about flags like PageDirty? Are they synced with folio?
Yes. You can SetPageDirty() in one function and then folio_test_dirty()
in another. Eventually all the PageFoo() functions will be removed,
except PageHWPoison and PageAnonExclusive.
> The declaration goes PF_HEAD for policy, thus for order 0 it makes no
> difference, but for higher order folios, we should switch to pure folio
> based operations other than mixing page and folios?
Every function in btrfs should be folio based. There should be nothing
in btrfs that deals with pages. Take a look at iomap's buffered I/O
paths for hints -- there's a per-block dirty and uptodate bit, but other
than that, everything is done with folios.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-11-17 13:37 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-16 3:41 Mixed page compact code and (higher order) folios for filemap Qu Wenruo
2023-11-16 5:05 ` Matthew Wilcox
2023-11-16 5:30 ` Qu Wenruo
2023-11-16 14:23 ` Matthew Wilcox
2023-11-16 20:33 ` Qu Wenruo
2023-11-16 22:40 ` Qu Wenruo
2023-11-17 13:36 ` Matthew Wilcox
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).