* Mixed page compact code and (higher order) folios for filemap @ 2023-11-16 3:41 Qu Wenruo 2023-11-16 5:05 ` Matthew Wilcox 0 siblings, 1 reply; 7+ messages in thread From: Qu Wenruo @ 2023-11-16 3:41 UTC (permalink / raw) To: Linux Memory Management List, Linux FS Devel, linux-btrfs@vger.kernel.org Hi guys, I'm wondering if there is any pitfalls when mixing the legacy page based code with higher order (order >= 1) folios. E.g. if I allocated a folio with order 2, attached some private data to the folio, then call filemap_add_folio(). Later some one called find_lock_page() and hit the 2nd page of that folio. I believe the regular IO is totally fine, but what would happen for the page->private of that folio? Would them all share the same value of the folio_attach_private()? Or some different values? Thanks, Qu ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap 2023-11-16 3:41 Mixed page compact code and (higher order) folios for filemap Qu Wenruo @ 2023-11-16 5:05 ` Matthew Wilcox 2023-11-16 5:30 ` Qu Wenruo 0 siblings, 1 reply; 7+ messages in thread From: Matthew Wilcox @ 2023-11-16 5:05 UTC (permalink / raw) To: Qu Wenruo Cc: Linux Memory Management List, Linux FS Devel, linux-btrfs@vger.kernel.org On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote: > E.g. if I allocated a folio with order 2, attached some private data to > the folio, then call filemap_add_folio(). > > Later some one called find_lock_page() and hit the 2nd page of that folio. > > I believe the regular IO is totally fine, but what would happen for the > page->private of that folio? > Would them all share the same value of the folio_attach_private()? Or > some different values? Well, there's no magic ... If you call find_lock_page(), you get back the precise page. If you call page_folio() on that page, you get back the folio that you stored. If you then dereference folio->private, you get the pointer that you passed to folio_attach_private(). If you dereference page->private, *that is a bug*. You might get NULL, you might get garbage. Just like dereferencing page->index or page->mapping on tail pages. page_private() will also do the wrong thing (we could fix that to embed a call to page_folio() ... it hasn't been necessary before now, but if it'll help convert btrfs, then let's do it). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap 2023-11-16 5:05 ` Matthew Wilcox @ 2023-11-16 5:30 ` Qu Wenruo 2023-11-16 14:23 ` Matthew Wilcox 0 siblings, 1 reply; 7+ messages in thread From: Qu Wenruo @ 2023-11-16 5:30 UTC (permalink / raw) To: Matthew Wilcox Cc: Linux Memory Management List, Linux FS Devel, linux-btrfs@vger.kernel.org On 2023/11/16 15:35, Matthew Wilcox wrote: > On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote: >> E.g. if I allocated a folio with order 2, attached some private data to >> the folio, then call filemap_add_folio(). >> >> Later some one called find_lock_page() and hit the 2nd page of that folio. >> >> I believe the regular IO is totally fine, but what would happen for the >> page->private of that folio? >> Would them all share the same value of the folio_attach_private()? Or >> some different values? > > Well, there's no magic ... > > If you call find_lock_page(), you get back the precise page. If you > call page_folio() on that page, you get back the folio that you stored. > If you then dereference folio->private, you get the pointer that you > passed to folio_attach_private(). > > If you dereference page->private, *that is a bug*. You might get > NULL, you might get garbage. Just like dereferencing page->index or > page->mapping on tail pages. page_private() will also do the wrong thing > (we could fix that to embed a call to page_folio() ... it hasn't been > necessary before now, but if it'll help convert btrfs, then let's do it). That would be great. The biggest problem I'm hitting so far is the page cache for metadata. We're using __GFP_NOFAIL for the current per-page allocation, but IIRC __GFP_NOFAIL is ignored for higher order (>2 ?) folio allocation. And we may want that per-page allocation as the last resort effort allocation anyway. Thus I'm checking if there is something we can do here. But I guess we can always go folio_private() instead as a workaround for now? Thanks, Qu ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap 2023-11-16 5:30 ` Qu Wenruo @ 2023-11-16 14:23 ` Matthew Wilcox 2023-11-16 20:33 ` Qu Wenruo 2023-11-16 22:40 ` Qu Wenruo 0 siblings, 2 replies; 7+ messages in thread From: Matthew Wilcox @ 2023-11-16 14:23 UTC (permalink / raw) To: Qu Wenruo Cc: Linux Memory Management List, Linux FS Devel, linux-btrfs@vger.kernel.org On Thu, Nov 16, 2023 at 04:00:40PM +1030, Qu Wenruo wrote: > On 2023/11/16 15:35, Matthew Wilcox wrote: > > On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote: > > > E.g. if I allocated a folio with order 2, attached some private data to > > > the folio, then call filemap_add_folio(). > > > > > > Later some one called find_lock_page() and hit the 2nd page of that folio. > > > > > > I believe the regular IO is totally fine, but what would happen for the > > > page->private of that folio? > > > Would them all share the same value of the folio_attach_private()? Or > > > some different values? > > > > Well, there's no magic ... > > > > If you call find_lock_page(), you get back the precise page. If you > > call page_folio() on that page, you get back the folio that you stored. > > If you then dereference folio->private, you get the pointer that you > > passed to folio_attach_private(). > > > > If you dereference page->private, *that is a bug*. You might get > > NULL, you might get garbage. Just like dereferencing page->index or > > page->mapping on tail pages. page_private() will also do the wrong thing > > (we could fix that to embed a call to page_folio() ... it hasn't been > > necessary before now, but if it'll help convert btrfs, then let's do it). > > That would be great. The biggest problem I'm hitting so far is the page > cache for metadata. > > We're using __GFP_NOFAIL for the current per-page allocation, but IIRC > __GFP_NOFAIL is ignored for higher order (>2 ?) folio allocation. > And we may want that per-page allocation as the last resort effort > allocation anyway. > > Thus I'm checking if there is something we can do here. > > But I guess we can always go folio_private() instead as a workaround for > now? I don't understand enough about what you're doing to offer useful advice. Is this for bs>PS or is it arbitrary large folios for better performance? If the latter, you can always fall back to order-0 folios. If the former, well, we need to adjust a few things anyway to handle filesystems with a minimum order ... In general, you should be using folio_private(). page->private and page_private() will be removed eventually. The GFP_NOFAIL warning is: WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap 2023-11-16 14:23 ` Matthew Wilcox @ 2023-11-16 20:33 ` Qu Wenruo 2023-11-16 22:40 ` Qu Wenruo 1 sibling, 0 replies; 7+ messages in thread From: Qu Wenruo @ 2023-11-16 20:33 UTC (permalink / raw) To: Matthew Wilcox Cc: Linux Memory Management List, Linux FS Devel, linux-btrfs@vger.kernel.org On 2023/11/17 00:53, Matthew Wilcox wrote: > On Thu, Nov 16, 2023 at 04:00:40PM +1030, Qu Wenruo wrote: >> On 2023/11/16 15:35, Matthew Wilcox wrote: >>> On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote: >>>> E.g. if I allocated a folio with order 2, attached some private data to >>>> the folio, then call filemap_add_folio(). >>>> >>>> Later some one called find_lock_page() and hit the 2nd page of that folio. >>>> >>>> I believe the regular IO is totally fine, but what would happen for the >>>> page->private of that folio? >>>> Would them all share the same value of the folio_attach_private()? Or >>>> some different values? >>> >>> Well, there's no magic ... >>> >>> If you call find_lock_page(), you get back the precise page. If you >>> call page_folio() on that page, you get back the folio that you stored. >>> If you then dereference folio->private, you get the pointer that you >>> passed to folio_attach_private(). >>> >>> If you dereference page->private, *that is a bug*. You might get >>> NULL, you might get garbage. Just like dereferencing page->index or >>> page->mapping on tail pages. page_private() will also do the wrong thing >>> (we could fix that to embed a call to page_folio() ... it hasn't been >>> necessary before now, but if it'll help convert btrfs, then let's do it). >> >> That would be great. The biggest problem I'm hitting so far is the page >> cache for metadata. >> >> We're using __GFP_NOFAIL for the current per-page allocation, but IIRC >> __GFP_NOFAIL is ignored for higher order (>2 ?) folio allocation. >> And we may want that per-page allocation as the last resort effort >> allocation anyway. >> >> Thus I'm checking if there is something we can do here. >> >> But I guess we can always go folio_private() instead as a workaround for >> now? > > I don't understand enough about what you're doing to offer useful > advice. Is this for bs>PS or is it arbitrary large folios for better > performance? The ultimate goal is to make nodesize (metadata block size) > PAGE_SIZE case to go higher order folio by default, for better performance. But use order 0 folios if we failed get higher order folios. The current problem is the metadata allocation here is always going page based, and using page->private. > If the latter, you can always fall back to order-0 folios. > If the former, well, we need to adjust a few things anyway to handle > filesystems with a minimum order ... > > In general, you should be using folio_private(). page->private and > page_private() will be removed eventually. OK, that sounds good, we can do the cleanup first inside btrfs. Thanks, Qu > > The GFP_NOFAIL warning is: > > WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap 2023-11-16 14:23 ` Matthew Wilcox 2023-11-16 20:33 ` Qu Wenruo @ 2023-11-16 22:40 ` Qu Wenruo 2023-11-17 13:36 ` Matthew Wilcox 1 sibling, 1 reply; 7+ messages in thread From: Qu Wenruo @ 2023-11-16 22:40 UTC (permalink / raw) To: Matthew Wilcox Cc: Linux Memory Management List, Linux FS Devel, linux-btrfs@vger.kernel.org On 2023/11/17 00:53, Matthew Wilcox wrote: > On Thu, Nov 16, 2023 at 04:00:40PM +1030, Qu Wenruo wrote: >> On 2023/11/16 15:35, Matthew Wilcox wrote: >>> On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote: >>>> E.g. if I allocated a folio with order 2, attached some private data to >>>> the folio, then call filemap_add_folio(). >>>> >>>> Later some one called find_lock_page() and hit the 2nd page of that folio. >>>> >>>> I believe the regular IO is totally fine, but what would happen for the >>>> page->private of that folio? >>>> Would them all share the same value of the folio_attach_private()? Or >>>> some different values? >>> >>> Well, there's no magic ... >>> >>> If you call find_lock_page(), you get back the precise page. If you >>> call page_folio() on that page, you get back the folio that you stored. >>> If you then dereference folio->private, you get the pointer that you >>> passed to folio_attach_private(). >>> >>> If you dereference page->private, *that is a bug*. You might get >>> NULL, you might get garbage. Just like dereferencing page->index or >>> page->mapping on tail pages. page_private() will also do the wrong thing >>> (we could fix that to embed a call to page_folio() ... it hasn't been >>> necessary before now, but if it'll help convert btrfs, then let's do it). >> >> That would be great. The biggest problem I'm hitting so far is the page >> cache for metadata. >> >> We're using __GFP_NOFAIL for the current per-page allocation, but IIRC >> __GFP_NOFAIL is ignored for higher order (>2 ?) folio allocation. >> And we may want that per-page allocation as the last resort effort >> allocation anyway. >> >> Thus I'm checking if there is something we can do here. >> >> But I guess we can always go folio_private() instead as a workaround for >> now? > > I don't understand enough about what you're doing to offer useful > advice. Is this for bs>PS or is it arbitrary large folios for better > performance? If the latter, you can always fall back to order-0 folios. > If the former, well, we need to adjust a few things anyway to handle > filesystems with a minimum order ... > > In general, you should be using folio_private(). page->private and > page_private() will be removed eventually. Just another question. What about flags like PageDirty? Are they synced with folio? The declaration goes PF_HEAD for policy, thus for order 0 it makes no difference, but for higher order folios, we should switch to pure folio based operations other than mixing page and folios? Thanks, Qu > > The GFP_NOFAIL warning is: > > WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Mixed page compact code and (higher order) folios for filemap 2023-11-16 22:40 ` Qu Wenruo @ 2023-11-17 13:36 ` Matthew Wilcox 0 siblings, 0 replies; 7+ messages in thread From: Matthew Wilcox @ 2023-11-17 13:36 UTC (permalink / raw) To: Qu Wenruo Cc: Linux Memory Management List, Linux FS Devel, linux-btrfs@vger.kernel.org On Fri, Nov 17, 2023 at 09:10:10AM +1030, Qu Wenruo wrote: > On 2023/11/17 00:53, Matthew Wilcox wrote: > > On Thu, Nov 16, 2023 at 04:00:40PM +1030, Qu Wenruo wrote: > > > On 2023/11/16 15:35, Matthew Wilcox wrote: > > > > On Thu, Nov 16, 2023 at 02:11:00PM +1030, Qu Wenruo wrote: > > > > > E.g. if I allocated a folio with order 2, attached some private data to > > > > > the folio, then call filemap_add_folio(). > > > > > > > > > > Later some one called find_lock_page() and hit the 2nd page of that folio. > > > > > > > > > > I believe the regular IO is totally fine, but what would happen for the > > > > > page->private of that folio? > > > > > Would them all share the same value of the folio_attach_private()? Or > > > > > some different values? > > > > > > > > Well, there's no magic ... > > > > > > > > If you call find_lock_page(), you get back the precise page. If you > > > > call page_folio() on that page, you get back the folio that you stored. > > > > If you then dereference folio->private, you get the pointer that you > > > > passed to folio_attach_private(). > > > > > > > > If you dereference page->private, *that is a bug*. You might get > > > > NULL, you might get garbage. Just like dereferencing page->index or > > > > page->mapping on tail pages. page_private() will also do the wrong thing > > > > (we could fix that to embed a call to page_folio() ... it hasn't been > > > > necessary before now, but if it'll help convert btrfs, then let's do it). > > > > > > That would be great. The biggest problem I'm hitting so far is the page > > > cache for metadata. > > > > > > We're using __GFP_NOFAIL for the current per-page allocation, but IIRC > > > __GFP_NOFAIL is ignored for higher order (>2 ?) folio allocation. > > > And we may want that per-page allocation as the last resort effort > > > allocation anyway. > > > > > > Thus I'm checking if there is something we can do here. > > > > > > But I guess we can always go folio_private() instead as a workaround for > > > now? > > > > I don't understand enough about what you're doing to offer useful > > advice. Is this for bs>PS or is it arbitrary large folios for better > > performance? If the latter, you can always fall back to order-0 folios. > > If the former, well, we need to adjust a few things anyway to handle > > filesystems with a minimum order ... > > > > In general, you should be using folio_private(). page->private and > > page_private() will be removed eventually. > > Just another question. > > What about flags like PageDirty? Are they synced with folio? Yes. You can SetPageDirty() in one function and then folio_test_dirty() in another. Eventually all the PageFoo() functions will be removed, except PageHWPoison and PageAnonExclusive. > The declaration goes PF_HEAD for policy, thus for order 0 it makes no > difference, but for higher order folios, we should switch to pure folio > based operations other than mixing page and folios? Every function in btrfs should be folio based. There should be nothing in btrfs that deals with pages. Take a look at iomap's buffered I/O paths for hints -- there's a per-block dirty and uptodate bit, but other than that, everything is done with folios. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-11-17 13:37 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-11-16 3:41 Mixed page compact code and (higher order) folios for filemap Qu Wenruo 2023-11-16 5:05 ` Matthew Wilcox 2023-11-16 5:30 ` Qu Wenruo 2023-11-16 14:23 ` Matthew Wilcox 2023-11-16 20:33 ` Qu Wenruo 2023-11-16 22:40 ` Qu Wenruo 2023-11-17 13:36 ` Matthew Wilcox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).