* Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed [not found] <20240711104840.200573-1-gshan@redhat.com> @ 2024-07-11 20:46 ` Matthew Wilcox 2024-07-11 21:03 ` David Hildenbrand 2024-07-13 11:05 ` Ryan Roberts 0 siblings, 2 replies; 10+ messages in thread From: Matthew Wilcox @ 2024-07-11 20:46 UTC (permalink / raw) To: Gavin Shan Cc: linux-mm, linux-kernel, akpm, william.kucharski, david, ryan.roberts, shan.gavin On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote: > +++ b/mm/huge_memory.c > @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, > > while (orders) { > addr = vma->vm_end - (PAGE_SIZE << order); > - if (thp_vma_suitable_order(vma, addr, order)) > + if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) && > + thp_vma_suitable_order(vma, addr, order)) > break; Why does 'orders' even contain potential orders that are larger than MAX_PAGECACHE_ORDER? We do this at the top: orders &= vma_is_anonymous(vma) ? THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE; include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER)) ... and that seems very wrong. We support all kinds of orders for files, not just PMD order. We don't support PUD order at all. What the hell is going on here? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed 2024-07-11 20:46 ` [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed Matthew Wilcox @ 2024-07-11 21:03 ` David Hildenbrand 2024-07-11 21:20 ` David Hildenbrand 2024-07-12 5:39 ` Gavin Shan 2024-07-13 11:05 ` Ryan Roberts 1 sibling, 2 replies; 10+ messages in thread From: David Hildenbrand @ 2024-07-11 21:03 UTC (permalink / raw) To: Matthew Wilcox, Gavin Shan Cc: linux-mm, linux-kernel, akpm, william.kucharski, ryan.roberts, shan.gavin On 11.07.24 22:46, Matthew Wilcox wrote: > On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote: >> +++ b/mm/huge_memory.c >> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, >> >> while (orders) { >> addr = vma->vm_end - (PAGE_SIZE << order); >> - if (thp_vma_suitable_order(vma, addr, order)) >> + if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) && >> + thp_vma_suitable_order(vma, addr, order)) >> break; > > Why does 'orders' even contain potential orders that are larger than > MAX_PAGECACHE_ORDER? > > We do this at the top: > > orders &= vma_is_anonymous(vma) ? > THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE; > > include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER)) > > ... and that seems very wrong. We support all kinds of orders for > files, not just PMD order. We don't support PUD order at all. > > What the hell is going on here? yes, that's just absolutely confusing. I mentioned it to Ryan lately that we should clean that up (I wanted to look into that, but am happy if someone else can help). There should likely be different defines for DAX (PMD|PUD) SHMEM (PMD) -- but soon more. Not sure if we want separate ANON_SHMEM for the time being. Hm. But shmem is already handles separately, so maybe we can just ignore shmem here. PAGECACHE (1 .. MAX_PAGECACHE_ORDER) ? But it's still unclear to me. 
At least DAX must stay special I think, and PAGECACHE should be capped at MAX_PAGECACHE_ORDER.

-- 
Cheers,

David / dhildenb
* Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed 2024-07-11 21:03 ` David Hildenbrand @ 2024-07-11 21:20 ` David Hildenbrand 2024-07-12 5:39 ` Gavin Shan 1 sibling, 0 replies; 10+ messages in thread From: David Hildenbrand @ 2024-07-11 21:20 UTC (permalink / raw) To: Matthew Wilcox, Gavin Shan Cc: linux-mm, linux-kernel, akpm, william.kucharski, ryan.roberts, shan.gavin On 11.07.24 23:03, David Hildenbrand wrote: > On 11.07.24 22:46, Matthew Wilcox wrote: >> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote: >>> +++ b/mm/huge_memory.c >>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, >>> >>> while (orders) { >>> addr = vma->vm_end - (PAGE_SIZE << order); >>> - if (thp_vma_suitable_order(vma, addr, order)) >>> + if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) && >>> + thp_vma_suitable_order(vma, addr, order)) >>> break; >> >> Why does 'orders' even contain potential orders that are larger than >> MAX_PAGECACHE_ORDER? >> >> We do this at the top: >> >> orders &= vma_is_anonymous(vma) ? >> THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE; >> >> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER)) >> >> ... and that seems very wrong. We support all kinds of orders for >> files, not just PMD order. We don't support PUD order at all. >> >> What the hell is going on here? > > yes, that's just absolutely confusing. I mentioned it to Ryan lately > that we should clean that up (I wanted to look into that, but am happy > if someone else can help). > > There should likely be different defines for > > DAX (PMD|PUD) > > SHMEM (PMD) -- but soon more. Not sure if we want separate ANON_SHMEM > for the time being. Hm. But shmem is already handles separately, so > maybe we can just ignore shmem here. Correction: of course <= MAX_PAGECACHE_ORDER But yeah, this needs cleanups -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed 2024-07-11 21:03 ` David Hildenbrand 2024-07-11 21:20 ` David Hildenbrand @ 2024-07-12 5:39 ` Gavin Shan 2024-07-13 1:03 ` David Hildenbrand 1 sibling, 1 reply; 10+ messages in thread From: Gavin Shan @ 2024-07-12 5:39 UTC (permalink / raw) To: David Hildenbrand, Matthew Wilcox Cc: linux-mm, linux-kernel, akpm, william.kucharski, ryan.roberts, shan.gavin On 7/12/24 7:03 AM, David Hildenbrand wrote: > On 11.07.24 22:46, Matthew Wilcox wrote: >> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote: >>> +++ b/mm/huge_memory.c >>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, >>> while (orders) { >>> addr = vma->vm_end - (PAGE_SIZE << order); >>> - if (thp_vma_suitable_order(vma, addr, order)) >>> + if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) && >>> + thp_vma_suitable_order(vma, addr, order)) >>> break; >> >> Why does 'orders' even contain potential orders that are larger than >> MAX_PAGECACHE_ORDER? >> >> We do this at the top: >> >> orders &= vma_is_anonymous(vma) ? >> THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE; >> >> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER)) >> >> ... and that seems very wrong. We support all kinds of orders for >> files, not just PMD order. We don't support PUD order at all. >> >> What the hell is going on here? > > yes, that's just absolutely confusing. I mentioned it to Ryan lately that we should clean that up (I wanted to look into that, but am happy if someone else can help). > > There should likely be different defines for > > DAX (PMD|PUD) > > SHMEM (PMD) -- but soon more. Not sure if we want separate ANON_SHMEM for the time being. Hm. But shmem is already handles separately, so maybe we can just ignore shmem here. > > PAGECACHE (1 .. MAX_PAGECACHE_ORDER) > > ? But it's still unclear to me. 
>
> At least DAX must stay special I think, and PAGECACHE should be capped at MAX_PAGECACHE_ORDER.
>

David, I can help to clean it up. Could you please confirm that the following
changes are exactly what you're suggesting? Hopefully, there is nothing I've missed.
The original issue can be fixed by the changes. With the changes applied,
madvise(MADV_COLLAPSE) returns with errno -22 in the test program.

The Fixes tag needs to be adjusted as well.

Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2aa986a5cd1b..45909efb0ef0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr;
 /*
  * Mask of all large folio orders supported for file THP.
  */
-#define THP_ORDERS_ALL_FILE	(BIT(PMD_ORDER) | BIT(PUD_ORDER))
+#define THP_ORDERS_ALL_FILE_DAX		\
+	((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER + 1) - 1))
+#define THP_ORDERS_ALL_FILE_DEFAULT	\
+	((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
+#define THP_ORDERS_ALL_FILE		\
+	(THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
 
 /*
  * Mask of all large folio orders supported for THP.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2120f7478e55..4690f33afaa6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	bool smaps = tva_flags & TVA_SMAPS;
 	bool in_pf = tva_flags & TVA_IN_PF;
 	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
+	unsigned long supported_orders;
+
 	/* Check the intersection of requested and supported orders. */
-	orders &= vma_is_anonymous(vma) ?
-			THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
+	if (vma_is_anonymous(vma))
+		supported_orders = THP_ORDERS_ALL_ANON;
+	else if (vma_is_dax(vma))
+		supported_orders = THP_ORDERS_ALL_FILE_DAX;
+	else
+		supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
+
+	orders &= supported_orders;
 	if (!orders)
 		return 0;

Thanks,
Gavin
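The mask arithmetic behind the proposed THP_ORDERS_ALL_FILE_DEFAULT can be checked in userspace. The following is only a sketch, not kernel code: BIT() is re-defined locally, and the helper names file_default_orders() and order_allowed() are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (1UL << (n))

/*
 * Model of the proposed THP_ORDERS_ALL_FILE_DEFAULT: every order from 1 up
 * to max_pagecache_order, with order 0 (a single page) masked out.
 * The function name is hypothetical.
 */
static unsigned long file_default_orders(unsigned int max_pagecache_order)
{
	/* The shift below must stay within the width of unsigned long. */
	assert(max_pagecache_order + 1 < sizeof(unsigned long) * CHAR_BIT);
	return (BIT(max_pagecache_order + 1) - 1) & ~BIT(0);
}

/* Return 1 if 'order' is set in the 'orders' bitmask, 0 otherwise. */
static int order_allowed(unsigned long orders, unsigned int order)
{
	assert(order < sizeof(unsigned long) * CHAR_BIT);
	return (orders & BIT(order)) != 0;
}
```

As an assumed example, take MAX_PAGECACHE_ORDER = 11 and PMD_ORDER = 13 (a 512MB PMD mapping with 64KB base pages). Then file_default_orders(11) yields 0xFFE, and order_allowed() rejects order 13, so a PMD-sized request never reaches the loop that the original patch had to guard.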
* Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed 2024-07-12 5:39 ` Gavin Shan @ 2024-07-13 1:03 ` David Hildenbrand 2024-07-13 4:01 ` Baolin Wang 2024-07-13 9:25 ` Gavin Shan 0 siblings, 2 replies; 10+ messages in thread From: David Hildenbrand @ 2024-07-13 1:03 UTC (permalink / raw) To: Gavin Shan, Matthew Wilcox Cc: linux-mm, linux-kernel, akpm, william.kucharski, ryan.roberts, shan.gavin On 12.07.24 07:39, Gavin Shan wrote: > On 7/12/24 7:03 AM, David Hildenbrand wrote: >> On 11.07.24 22:46, Matthew Wilcox wrote: >>> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote: >>>> +++ b/mm/huge_memory.c >>>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, >>>> while (orders) { >>>> addr = vma->vm_end - (PAGE_SIZE << order); >>>> - if (thp_vma_suitable_order(vma, addr, order)) >>>> + if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) && >>>> + thp_vma_suitable_order(vma, addr, order)) >>>> break; >>> >>> Why does 'orders' even contain potential orders that are larger than >>> MAX_PAGECACHE_ORDER? >>> >>> We do this at the top: >>> >>> orders &= vma_is_anonymous(vma) ? >>> THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE; >>> >>> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER)) >>> >>> ... and that seems very wrong. We support all kinds of orders for >>> files, not just PMD order. We don't support PUD order at all. >>> >>> What the hell is going on here? >> >> yes, that's just absolutely confusing. I mentioned it to Ryan lately that we should clean that up (I wanted to look into that, but am happy if someone else can help). >> >> There should likely be different defines for >> >> DAX (PMD|PUD) >> >> SHMEM (PMD) -- but soon more. Not sure if we want separate ANON_SHMEM for the time being. Hm. But shmem is already handles separately, so maybe we can just ignore shmem here. >> >> PAGECACHE (1 .. MAX_PAGECACHE_ORDER) >> >> ? But it's still unclear to me. 
>>
>> At least DAX must stay special I think, and PAGECACHE should be capped at MAX_PAGECACHE_ORDER.
>>
>
> David, I can help to clean it up. Could you please help to confirm the following

Thanks!

> changes are exactly what you're suggesting? Hopefully, there are nothing I've missed.
> The original issue can be fixed by the changes. With the changes applied, madvise(MADV_COLLAPSE)
> returns with errno -22 in the test program.
>
> The fix tag needs to adjusted either.
>
> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2aa986a5cd1b..45909efb0ef0 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr;
>  /*
>   * Mask of all large folio orders supported for file THP.
>   */
> -#define THP_ORDERS_ALL_FILE	(BIT(PMD_ORDER) | BIT(PUD_ORDER))

DAX, like hugetlb, doesn't have any MAX_PAGECACHE_ORDER restrictions. So this should be

/*
 * FSDAX never splits folios, so the MAX_PAGECACHE_ORDER limit does not
 * apply here.
 */
#define THP_ORDERS_ALL_FILE_DAX	(BIT(PMD_ORDER) | BIT(PUD_ORDER))

Something like that

> +#define THP_ORDERS_ALL_FILE_DAX		\
> +	((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER + 1) - 1))
> +#define THP_ORDERS_ALL_FILE_DEFAULT	\
> +	((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
> +#define THP_ORDERS_ALL_FILE		\
> +	(THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT)

Maybe we can get rid of THP_ORDERS_ALL_FILE (to prevent abuse) and fix up
THP_ORDERS_ALL instead.

>
>  /*
>   * Mask of all large folio orders supported for THP.
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2120f7478e55..4690f33afaa6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>  	bool smaps = tva_flags & TVA_SMAPS;
>  	bool in_pf = tva_flags & TVA_IN_PF;
>  	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
> +	unsigned long supported_orders;
> +
>  	/* Check the intersection of requested and supported orders. */
> -	orders &= vma_is_anonymous(vma) ?
> -			THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
> +	if (vma_is_anonymous(vma))
> +		supported_orders = THP_ORDERS_ALL_ANON;
> +	else if (vma_is_dax(vma))
> +		supported_orders = THP_ORDERS_ALL_FILE_DAX;
> +	else
> +		supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;

This is what I had in mind.

But, do we have to special-case shmem as well or will that be handled correctly?

-- 
Cheers,

David / dhildenb
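One way to see why the cap matters for DAX is to compare the two candidate masks in userspace. This is a sketch under stated assumptions: BIT() is re-defined locally, the function names are hypothetical, and the orders are passed as parameters instead of using any real architecture's values.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (1UL << (n))

/* Number of bits in unsigned long, used to guard the shifts below. */
#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * DAX mask as first proposed in the thread: PMD and PUD orders, additionally
 * capped at max_pagecache_order. The function name is hypothetical.
 */
static unsigned long dax_orders_capped(unsigned int pmd_order,
				       unsigned int pud_order,
				       unsigned int max_pagecache_order)
{
	assert(pmd_order < ULONG_BITS && pud_order < ULONG_BITS);
	assert(max_pagecache_order + 1 < ULONG_BITS);
	return (BIT(pmd_order) | BIT(pud_order)) &
	       (BIT(max_pagecache_order + 1) - 1);
}

/*
 * DAX mask as suggested in the review: PMD and PUD orders with no page
 * cache cap. The function name is hypothetical.
 */
static unsigned long dax_orders_uncapped(unsigned int pmd_order,
					 unsigned int pud_order)
{
	assert(pmd_order < ULONG_BITS && pud_order < ULONG_BITS);
	return BIT(pmd_order) | BIT(pud_order);
}
```

With assumed values of PMD order 13, PUD order 26 and a page cache cap of 11, the capped variant evaluates to 0, which would leave DAX mappings with no allowable huge order at all. The uncapped variant keeps both bits.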
* Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed 2024-07-13 1:03 ` David Hildenbrand @ 2024-07-13 4:01 ` Baolin Wang 2024-07-13 4:17 ` David Hildenbrand 2024-07-13 9:25 ` Gavin Shan 1 sibling, 1 reply; 10+ messages in thread From: Baolin Wang @ 2024-07-13 4:01 UTC (permalink / raw) To: David Hildenbrand, Gavin Shan, Matthew Wilcox Cc: linux-mm, linux-kernel, akpm, william.kucharski, ryan.roberts, shan.gavin On 2024/7/13 09:03, David Hildenbrand wrote: > On 12.07.24 07:39, Gavin Shan wrote: >> On 7/12/24 7:03 AM, David Hildenbrand wrote: >>> On 11.07.24 22:46, Matthew Wilcox wrote: >>>> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote: >>>>> +++ b/mm/huge_memory.c >>>>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct >>>>> vm_area_struct *vma, >>>>> while (orders) { >>>>> addr = vma->vm_end - (PAGE_SIZE << order); >>>>> - if (thp_vma_suitable_order(vma, addr, order)) >>>>> + if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) && >>>>> + thp_vma_suitable_order(vma, addr, order)) >>>>> break; >>>> >>>> Why does 'orders' even contain potential orders that are larger than >>>> MAX_PAGECACHE_ORDER? >>>> >>>> We do this at the top: >>>> >>>> orders &= vma_is_anonymous(vma) ? >>>> THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE; >>>> >>>> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE >>>> (BIT(PMD_ORDER) | BIT(PUD_ORDER)) >>>> >>>> ... and that seems very wrong. We support all kinds of orders for >>>> files, not just PMD order. We don't support PUD order at all. >>>> >>>> What the hell is going on here? >>> >>> yes, that's just absolutely confusing. I mentioned it to Ryan lately >>> that we should clean that up (I wanted to look into that, but am >>> happy if someone else can help). >>> >>> There should likely be different defines for >>> >>> DAX (PMD|PUD) >>> >>> SHMEM (PMD) -- but soon more. Not sure if we want separate ANON_SHMEM >>> for the time being. Hm. 
But shmem is already handles separately, so >>> maybe we can just ignore shmem here. >>> >>> PAGECACHE (1 .. MAX_PAGECACHE_ORDER) >>> >>> ? But it's still unclear to me. >>> >>> At least DAX must stay special I think, and PAGECACHE should be >>> capped at MAX_PAGECACHE_ORDER. >>> >> >> David, I can help to clean it up. Could you please help to confirm the >> following > > Thanks! > >> changes are exactly what you're suggesting? Hopefully, there are >> nothing I've missed. >> The original issue can be fixed by the changes. With the changes >> applied, madvise(MADV_COLLAPSE) >> returns with errno -22 in the test program. >> >> The fix tag needs to adjusted either. >> >> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface") >> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >> index 2aa986a5cd1b..45909efb0ef0 100644 >> --- a/include/linux/huge_mm.h >> +++ b/include/linux/huge_mm.h >> @@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr; >> /* >> * Mask of all large folio orders supported for file THP. >> */ >> -#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER)) > > DAX doesn't have any MAX_PAGECACHE_ORDER restrictions (like hugetlb). So > this should be > > /* > * FSDAX never splits folios, so the MAX_PAGECACHE_ORDER limit does not > * apply here. > */ > THP_ORDERS_ALL_FILE_DAX ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) > > Something like that > >> +#define THP_ORDERS_ALL_FILE_DAX \ >> + ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER >> + 1) - 1)) >> +#define THP_ORDERS_ALL_FILE_DEFAULT \ >> + ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0)) >> +#define THP_ORDERS_ALL_FILE \ >> + (THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT) > > Maybe we can get rid of THP_ORDERS_ALL_FILE (to prevent abuse) and fixup > THP_ORDERS_ALL instead. > >> /* >> * Mask of all large folio orders supported for THP. 
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 2120f7478e55..4690f33afaa6 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>  	bool smaps = tva_flags & TVA_SMAPS;
>>  	bool in_pf = tva_flags & TVA_IN_PF;
>>  	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
>> +	unsigned long supported_orders;
>> +
>>  	/* Check the intersection of requested and supported orders. */
>> -	orders &= vma_is_anonymous(vma) ?
>> -			THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>> +	if (vma_is_anonymous(vma))
>> +		supported_orders = THP_ORDERS_ALL_ANON;
>> +	else if (vma_is_dax(vma))
>> +		supported_orders = THP_ORDERS_ALL_FILE_DAX;
>> +	else
>> +		supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
>
> This is what I had in mind.
>
> But, do we have to special-case shmem as well or will that be handled
> correctly?

For anonymous shmem, it is now the same as anonymous THP, which can utilize
THP_ORDERS_ALL_ANON. For tmpfs, we currently only support PMD-sized THP
(support for more large orders will come in the future). Therefore, I think
we can reuse THP_ORDERS_ALL_ANON for shmem now:

if (vma_is_anonymous(vma) || shmem_file(vma->vm_file))
	supported_orders = THP_ORDERS_ALL_ANON;
......
* Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed 2024-07-13 4:01 ` Baolin Wang @ 2024-07-13 4:17 ` David Hildenbrand 2024-07-13 12:57 ` Baolin Wang 0 siblings, 1 reply; 10+ messages in thread From: David Hildenbrand @ 2024-07-13 4:17 UTC (permalink / raw) To: Baolin Wang, Gavin Shan, Matthew Wilcox Cc: linux-mm, linux-kernel, akpm, william.kucharski, ryan.roberts, shan.gavin On 13.07.24 06:01, Baolin Wang wrote: > > > On 2024/7/13 09:03, David Hildenbrand wrote: >> On 12.07.24 07:39, Gavin Shan wrote: >>> On 7/12/24 7:03 AM, David Hildenbrand wrote: >>>> On 11.07.24 22:46, Matthew Wilcox wrote: >>>>> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote: >>>>>> +++ b/mm/huge_memory.c >>>>>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct >>>>>> vm_area_struct *vma, >>>>>> while (orders) { >>>>>> addr = vma->vm_end - (PAGE_SIZE << order); >>>>>> - if (thp_vma_suitable_order(vma, addr, order)) >>>>>> + if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) && >>>>>> + thp_vma_suitable_order(vma, addr, order)) >>>>>> break; >>>>> >>>>> Why does 'orders' even contain potential orders that are larger than >>>>> MAX_PAGECACHE_ORDER? >>>>> >>>>> We do this at the top: >>>>> >>>>> orders &= vma_is_anonymous(vma) ? >>>>> THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE; >>>>> >>>>> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE >>>>> (BIT(PMD_ORDER) | BIT(PUD_ORDER)) >>>>> >>>>> ... and that seems very wrong. We support all kinds of orders for >>>>> files, not just PMD order. We don't support PUD order at all. >>>>> >>>>> What the hell is going on here? >>>> >>>> yes, that's just absolutely confusing. I mentioned it to Ryan lately >>>> that we should clean that up (I wanted to look into that, but am >>>> happy if someone else can help). >>>> >>>> There should likely be different defines for >>>> >>>> DAX (PMD|PUD) >>>> >>>> SHMEM (PMD) -- but soon more. Not sure if we want separate ANON_SHMEM >>>> for the time being. Hm. 
But shmem is already handles separately, so >>>> maybe we can just ignore shmem here. >>>> >>>> PAGECACHE (1 .. MAX_PAGECACHE_ORDER) >>>> >>>> ? But it's still unclear to me. >>>> >>>> At least DAX must stay special I think, and PAGECACHE should be >>>> capped at MAX_PAGECACHE_ORDER. >>>> >>> >>> David, I can help to clean it up. Could you please help to confirm the >>> following >> >> Thanks! >> >>> changes are exactly what you're suggesting? Hopefully, there are >>> nothing I've missed. >>> The original issue can be fixed by the changes. With the changes >>> applied, madvise(MADV_COLLAPSE) >>> returns with errno -22 in the test program. >>> >>> The fix tag needs to adjusted either. >>> >>> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface") >>> >>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>> index 2aa986a5cd1b..45909efb0ef0 100644 >>> --- a/include/linux/huge_mm.h >>> +++ b/include/linux/huge_mm.h >>> @@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr; >>> /* >>> * Mask of all large folio orders supported for file THP. >>> */ >>> -#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER)) >> >> DAX doesn't have any MAX_PAGECACHE_ORDER restrictions (like hugetlb). So >> this should be >> >> /* >> * FSDAX never splits folios, so the MAX_PAGECACHE_ORDER limit does not >> * apply here. >> */ >> THP_ORDERS_ALL_FILE_DAX ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) >> >> Something like that >> >>> +#define THP_ORDERS_ALL_FILE_DAX \ >>> + ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER >>> + 1) - 1)) >>> +#define THP_ORDERS_ALL_FILE_DEFAULT \ >>> + ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0)) >>> +#define THP_ORDERS_ALL_FILE \ >>> + (THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT) >> >> Maybe we can get rid of THP_ORDERS_ALL_FILE (to prevent abuse) and fixup >> THP_ORDERS_ALL instead. >> >>> /* >>> * Mask of all large folio orders supported for THP. 
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 2120f7478e55..4690f33afaa6 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>  	bool smaps = tva_flags & TVA_SMAPS;
>>>  	bool in_pf = tva_flags & TVA_IN_PF;
>>>  	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
>>> +	unsigned long supported_orders;
>>> +
>>>  	/* Check the intersection of requested and supported orders. */
>>> -	orders &= vma_is_anonymous(vma) ?
>>> -			THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>>> +	if (vma_is_anonymous(vma))
>>> +		supported_orders = THP_ORDERS_ALL_ANON;
>>> +	else if (vma_is_dax(vma))
>>> +		supported_orders = THP_ORDERS_ALL_FILE_DAX;
>>> +	else
>>> +		supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
>>
>> This is what I had in mind.
>>
>> But, do we have to special-case shmem as well or will that be handled
>> correctly?
>
> For anonymous shmem, it is now same as anonymous THP, which can utilize
> THP_ORDERS_ALL_ANON.
> For tmpfs, we currently only support PMD-sized THP
> (will support more larger orders in the future). Therefore, I think we
> can reuse THP_ORDERS_ALL_ANON for shmem now:
>
> if (vma_is_anonymous(vma) || shmem_file(vma->vm_file)))
> supported_orders = THP_ORDERS_ALL_ANON;
> ......
>

It should be THP_ORDERS_ALL_FILE_DEFAULT (the MAX_PAGECACHE_ORDER limitation applies).

-- 
Cheers,

David / dhildenb
* Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed 2024-07-13 4:17 ` David Hildenbrand @ 2024-07-13 12:57 ` Baolin Wang 0 siblings, 0 replies; 10+ messages in thread From: Baolin Wang @ 2024-07-13 12:57 UTC (permalink / raw) To: David Hildenbrand, Gavin Shan, Matthew Wilcox Cc: linux-mm, linux-kernel, akpm, william.kucharski, ryan.roberts, shan.gavin On 2024/7/13 12:17, David Hildenbrand wrote: > On 13.07.24 06:01, Baolin Wang wrote: >> >> >> On 2024/7/13 09:03, David Hildenbrand wrote: >>> On 12.07.24 07:39, Gavin Shan wrote: >>>> On 7/12/24 7:03 AM, David Hildenbrand wrote: >>>>> On 11.07.24 22:46, Matthew Wilcox wrote: >>>>>> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote: >>>>>>> +++ b/mm/huge_memory.c >>>>>>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct >>>>>>> vm_area_struct *vma, >>>>>>> while (orders) { >>>>>>> addr = vma->vm_end - (PAGE_SIZE << order); >>>>>>> - if (thp_vma_suitable_order(vma, addr, order)) >>>>>>> + if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) && >>>>>>> + thp_vma_suitable_order(vma, addr, order)) >>>>>>> break; >>>>>> >>>>>> Why does 'orders' even contain potential orders that are larger than >>>>>> MAX_PAGECACHE_ORDER? >>>>>> >>>>>> We do this at the top: >>>>>> >>>>>> orders &= vma_is_anonymous(vma) ? >>>>>> THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE; >>>>>> >>>>>> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE >>>>>> (BIT(PMD_ORDER) | BIT(PUD_ORDER)) >>>>>> >>>>>> ... and that seems very wrong. We support all kinds of orders for >>>>>> files, not just PMD order. We don't support PUD order at all. >>>>>> >>>>>> What the hell is going on here? >>>>> >>>>> yes, that's just absolutely confusing. I mentioned it to Ryan lately >>>>> that we should clean that up (I wanted to look into that, but am >>>>> happy if someone else can help). >>>>> >>>>> There should likely be different defines for >>>>> >>>>> DAX (PMD|PUD) >>>>> >>>>> SHMEM (PMD) -- but soon more. 
Not sure if we want separate ANON_SHMEM >>>>> for the time being. Hm. But shmem is already handles separately, so >>>>> maybe we can just ignore shmem here. >>>>> >>>>> PAGECACHE (1 .. MAX_PAGECACHE_ORDER) >>>>> >>>>> ? But it's still unclear to me. >>>>> >>>>> At least DAX must stay special I think, and PAGECACHE should be >>>>> capped at MAX_PAGECACHE_ORDER. >>>>> >>>> >>>> David, I can help to clean it up. Could you please help to confirm the >>>> following >>> >>> Thanks! >>> >>>> changes are exactly what you're suggesting? Hopefully, there are >>>> nothing I've missed. >>>> The original issue can be fixed by the changes. With the changes >>>> applied, madvise(MADV_COLLAPSE) >>>> returns with errno -22 in the test program. >>>> >>>> The fix tag needs to adjusted either. >>>> >>>> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs >>>> interface") >>>> >>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>>> index 2aa986a5cd1b..45909efb0ef0 100644 >>>> --- a/include/linux/huge_mm.h >>>> +++ b/include/linux/huge_mm.h >>>> @@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr; >>>> /* >>>> * Mask of all large folio orders supported for file THP. >>>> */ >>>> -#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER)) >>> >>> DAX doesn't have any MAX_PAGECACHE_ORDER restrictions (like hugetlb). So >>> this should be >>> >>> /* >>> * FSDAX never splits folios, so the MAX_PAGECACHE_ORDER limit does >>> not >>> * apply here. 
>>> */ >>> THP_ORDERS_ALL_FILE_DAX ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) >>> >>> Something like that >>> >>>> +#define THP_ORDERS_ALL_FILE_DAX \ >>>> + ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER >>>> + 1) - 1)) >>>> +#define THP_ORDERS_ALL_FILE_DEFAULT \ >>>> + ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0)) >>>> +#define THP_ORDERS_ALL_FILE \ >>>> + (THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT) >>> >>> Maybe we can get rid of THP_ORDERS_ALL_FILE (to prevent abuse) and fixup >>> THP_ORDERS_ALL instead. >>> >>>> /* >>>> * Mask of all large folio orders supported for THP. >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >>>> index 2120f7478e55..4690f33afaa6 100644 >>>> --- a/mm/huge_memory.c >>>> +++ b/mm/huge_memory.c >>>> @@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct >>>> vm_area_struct *vma, >>>> bool smaps = tva_flags & TVA_SMAPS; >>>> bool in_pf = tva_flags & TVA_IN_PF; >>>> bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS; >>>> + unsigned long supported_orders; >>>> + >>>> /* Check the intersection of requested and supported >>>> orders. */ >>>> - orders &= vma_is_anonymous(vma) ? >>>> - THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE; >>>> + if (vma_is_anonymous(vma)) >>>> + supported_orders = THP_ORDERS_ALL_ANON; >>>> + else if (vma_is_dax(vma)) >>>> + supported_orders = THP_ORDERS_ALL_FILE_DAX; >>>> + else >>>> + supported_orders = THP_ORDERS_ALL_FILE_DEFAULT; >>> >>> This is what I had in mind. >>> >>> But, do we have to special-case shmem as well or will that be handled >>> correctly? >> >> For anonymous shmem, it is now same as anonymous THP, which can utilize >> THP_ORDERS_ALL_ANON. >> For tmpfs, we currently only support PMD-sized THP >> (will support more larger orders in the future). Therefore, I think we >> can reuse THP_ORDERS_ALL_ANON for shmem now: >> >> if (vma_is_anonymous(vma) || shmem_file(vma->vm_file))) >> supported_orders = THP_ORDERS_ALL_ANON; >> ...... 
>>
>
>
> It should be THP_ORDERS_ALL_FILE_DEFAULT (MAX_PAGECACHE_ORDER imitation
> applies).

Yes, indeed, I missed the MAX_PAGECACHE_ORDER limitation.
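The outcome of this sub-thread (shmem takes the default file mask, DAX stays uncapped) can be modelled in userspace as follows. This is a sketch, not the kernel implementation: enum vma_kind, struct order_cfg and both functions are hypothetical, and the anonymous mask is passed in as data because the definition of THP_ORDERS_ALL_ANON is not shown in the thread.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (1UL << (n))

/* Number of bits in unsigned long, used to guard the shifts below. */
#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical classification; the kernel derives this from the VMA. */
enum vma_kind { VMA_ANON, VMA_DAX, VMA_SHMEM, VMA_FILE };

/* Illustrative limits, passed in so that no architecture is assumed. */
struct order_cfg {
	unsigned int pmd_order;
	unsigned int pud_order;
	unsigned int max_pagecache_order;
	unsigned long anon_orders;	/* stand-in for THP_ORDERS_ALL_ANON */
};

/* Pick the supported-order mask for a mapping kind. */
static unsigned long supported_orders(enum vma_kind kind,
				      const struct order_cfg *cfg)
{
	assert(cfg->pmd_order < ULONG_BITS && cfg->pud_order < ULONG_BITS);
	assert(cfg->max_pagecache_order + 1 < ULONG_BITS);

	switch (kind) {
	case VMA_ANON:
		return cfg->anon_orders;
	case VMA_DAX:
		/* No page cache cap for DAX. */
		return BIT(cfg->pmd_order) | BIT(cfg->pud_order);
	case VMA_SHMEM:
	case VMA_FILE:
	default:
		/* shmem is page cache backed, so the cap applies to it too. */
		return (BIT(cfg->max_pagecache_order + 1) - 1) & ~BIT(0);
	}
}

/* Intersect the requested orders with what the mapping kind supports. */
static unsigned long allowable_orders(unsigned long requested,
				      enum vma_kind kind,
				      const struct order_cfg *cfg)
{
	return requested & supported_orders(kind, cfg);
}
```

Under the assumed configuration of PMD order 13 and a cap of 11, a PMD-order request on a shmem mapping comes back as 0, while the same request on a DAX mapping is preserved.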
* Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed 2024-07-13 1:03 ` David Hildenbrand 2024-07-13 4:01 ` Baolin Wang @ 2024-07-13 9:25 ` Gavin Shan 1 sibling, 0 replies; 10+ messages in thread From: Gavin Shan @ 2024-07-13 9:25 UTC (permalink / raw) To: David Hildenbrand, Matthew Wilcox Cc: linux-mm, linux-kernel, akpm, william.kucharski, ryan.roberts, shan.gavin On 7/13/24 11:03 AM, David Hildenbrand wrote: > On 12.07.24 07:39, Gavin Shan wrote: >> >> David, I can help to clean it up. Could you please help to confirm the following > > Thanks! > >> changes are exactly what you're suggesting? Hopefully, there are nothing I've missed. >> The original issue can be fixed by the changes. With the changes applied, madvise(MADV_COLLAPSE) >> returns with errno -22 in the test program. >> >> The fix tag needs to adjusted either. >> >> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface") >> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >> index 2aa986a5cd1b..45909efb0ef0 100644 >> --- a/include/linux/huge_mm.h >> +++ b/include/linux/huge_mm.h >> @@ -74,7 +74,12 @@ extern struct kobj_attribute shmem_enabled_attr; >> /* >> * Mask of all large folio orders supported for file THP. >> */ >> -#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER)) > > DAX doesn't have any MAX_PAGECACHE_ORDER restrictions (like hugetlb). So this should be > > /* > * FSDAX never splits folios, so the MAX_PAGECACHE_ORDER limit does not > * apply here. > */ > THP_ORDERS_ALL_FILE_DAX ((BIT(PMD_ORDER) | BIT(PUD_ORDER)) > > Something like that > Ok. It will be corrected in v2. 
>> +#define THP_ORDERS_ALL_FILE_DAX		\
>> +	((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER + 1) - 1))
>> +#define THP_ORDERS_ALL_FILE_DEFAULT	\
>> +	((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
>> +#define THP_ORDERS_ALL_FILE		\
>> +	(THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
>
> Maybe we can get rid of THP_ORDERS_ALL_FILE (to prevent abuse) and fixup
> THP_ORDERS_ALL instead.
>

Sure, it will be removed in v2.

>>  /*
>>   * Mask of all large folio orders supported for THP.
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 2120f7478e55..4690f33afaa6 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>  	bool smaps = tva_flags & TVA_SMAPS;
>>  	bool in_pf = tva_flags & TVA_IN_PF;
>>  	bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
>> +	unsigned long supported_orders;
>> +
>>  	/* Check the intersection of requested and supported orders. */
>> -	orders &= vma_is_anonymous(vma) ?
>> -			THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>> +	if (vma_is_anonymous(vma))
>> +		supported_orders = THP_ORDERS_ALL_ANON;
>> +	else if (vma_is_dax(vma))
>> +		supported_orders = THP_ORDERS_ALL_FILE_DAX;
>> +	else
>> +		supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
>
> This is what I had in mind.
>
> But, do we have to special-case shmem as well or will that be handled correctly?
>

With the previous fixes and this one, I don't see any remaining case where
shmem could have a 512MB page cache, exceeding MAX_PAGECACHE_ORDER. Hopefully,
I haven't missed anything in the code inspection.


- regular read/write paths: covered by the previous fixes
- synchronous readahead:    covered by the previous fixes
- asynchronous readahead:   page size granularity, no huge page
- page fault handling:      covered by the previous fixes
- collapsing PTEs to PMD:   to be covered by this patch
- swapin:                   shouldn't have 512MB huge page since we don't have
                            such huge pages during swapout period
- other cases I missed (?)

Thanks,
Gavin
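
For readers following the mask arithmetic in the proposed macros, the sketch below evaluates them in user space. It is an illustration, not kernel code. The order values (PMD_ORDER = 13, PUD_ORDER = 26, MAX_PAGECACHE_ORDER = 11) are assumptions chosen to resemble a configuration with 64KB base pages, where a PMD-sized folio is 512MB. The THP_ORDERS_ALL_ANON definition is likewise assumed, the _CLIPPED suffix is a name invented here to tell the two DAX variants apart, and `enum vma_kind`, `supported_orders()`, `order_allowed()` and `check_masks()` are hypothetical helpers standing in for the vma_is_anonymous()/vma_is_dax() checks in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, illustrative orders (64KB base pages: 64KB << 13 = 512MB). */
#define PMD_ORDER		13
#define PUD_ORDER		26
#define MAX_PAGECACHE_ORDER	11

#define BIT(n)	(UINT64_C(1) << (n))

/* Assumed anon mask: orders 2..PMD_ORDER. */
#define THP_ORDERS_ALL_ANON \
	((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))

/* DAX mask as first proposed: clipped to MAX_PAGECACHE_ORDER. */
#define THP_ORDERS_ALL_FILE_DAX_CLIPPED \
	((BIT(PMD_ORDER) | BIT(PUD_ORDER)) & (BIT(MAX_PAGECACHE_ORDER + 1) - 1))

/* DAX mask as suggested in review: no page cache limit applied. */
#define THP_ORDERS_ALL_FILE_DAX \
	(BIT(PMD_ORDER) | BIT(PUD_ORDER))

/* Page cache mask: orders 1..MAX_PAGECACHE_ORDER. */
#define THP_ORDERS_ALL_FILE_DEFAULT \
	((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))

/* Hypothetical stand-in for the VMA type checks used in the diff. */
enum vma_kind { VMA_ANON, VMA_DAX, VMA_FILE };

/* Pick the supported-order mask the same way the three-way branch does. */
uint64_t supported_orders(enum vma_kind kind)
{
	if (kind == VMA_ANON)
		return THP_ORDERS_ALL_ANON;
	if (kind == VMA_DAX)
		return THP_ORDERS_ALL_FILE_DAX;
	return THP_ORDERS_ALL_FILE_DEFAULT;
}

/* Returns 1 if 'order' survives the intersection with the supported mask. */
int order_allowed(enum vma_kind kind, unsigned int order)
{
	return (BIT(order) & supported_orders(kind)) != 0;
}

/* Small self-check of the properties discussed in the thread. */
void check_masks(void)
{
	/* Both DAX orders sit above MAX_PAGECACHE_ORDER, so clipping empties the mask. */
	assert(THP_ORDERS_ALL_FILE_DAX_CLIPPED == 0);
	/* A PMD-order request on a regular file mapping is filtered out. */
	assert(!order_allowed(VMA_FILE, PMD_ORDER));
	/* The unclipped DAX mask still permits PMD order. */
	assert(order_allowed(VMA_DAX, PMD_ORDER));
}
```

Under these assumed values the clipped DAX mask comes out empty, which is one way to read the review comment that DAX should not be limited by MAX_PAGECACHE_ORDER. The page cache mask excludes PMD order, which is consistent with madvise(MADV_COLLAPSE) failing with errno 22 in the test program once the change is applied.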

* Re: [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed
  2024-07-11 20:46 ` [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed Matthew Wilcox
  2024-07-11 21:03 ` David Hildenbrand
@ 2024-07-13 11:05 ` Ryan Roberts
  1 sibling, 0 replies; 10+ messages in thread
From: Ryan Roberts @ 2024-07-13 11:05 UTC
To: Matthew Wilcox, Gavin Shan
Cc: linux-mm, linux-kernel, akpm, william.kucharski, david, shan.gavin

On 11/07/2024 21:46, Matthew Wilcox wrote:
> On Thu, Jul 11, 2024 at 08:48:40PM +1000, Gavin Shan wrote:
>> +++ b/mm/huge_memory.c
>> @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>
>>  	while (orders) {
>>  		addr = vma->vm_end - (PAGE_SIZE << order);
>> -		if (thp_vma_suitable_order(vma, addr, order))
>> +		if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) &&
>> +		    thp_vma_suitable_order(vma, addr, order))
>>  			break;
>
> Why does 'orders' even contain potential orders that are larger than
> MAX_PAGECACHE_ORDER?
>
> We do this at the top:
>
>         orders &= vma_is_anonymous(vma) ?
>                         THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
>
> include/linux/huge_mm.h:#define THP_ORDERS_ALL_FILE     (BIT(PMD_ORDER) | BIT(PUD_ORDER))
>
> ... and that seems very wrong.  We support all kinds of orders for
> files, not just PMD order.  We don't support PUD order at all.
>
> What the hell is going on here?

Just to try to justify this a little, it was my perspective when adding (anon)
mTHP that memory was either anon or file; anything that populated vma->vm_file
was file, including shmem, DAX, etc. Before my change THP could install PMD size
mappings for anon, and PMD or PUD size mappings for file memory (but yes, PUD
was only really applicable to DAX in practice, I believe).

I agree it would be good to clean this up, but I don't think the current code is
quite as mad as you're implying, Matthew?
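
The two-way anon/file split described in this message can be modelled in a few lines. This is a user-space sketch, not kernel code: the order values (PMD_ORDER = 13, PUD_ORDER = 26, MAX_PAGECACHE_ORDER = 11) are assumptions that do not come from the thread, the THP_ORDERS_ALL_ANON definition is assumed, and `old_allowable_orders()`, `exceeds_pagecache_limit()` and `demo()` are hypothetical helpers that mirror only the quoted ternary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed, illustrative orders; not taken from the thread. */
#define PMD_ORDER		13
#define PUD_ORDER		26
#define MAX_PAGECACHE_ORDER	11

#define BIT(n)	(UINT64_C(1) << (n))

/* Assumed anon mask: orders 2..PMD_ORDER. */
#define THP_ORDERS_ALL_ANON \
	((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))

/* File mask as quoted from include/linux/huge_mm.h. */
#define THP_ORDERS_ALL_FILE	(BIT(PMD_ORDER) | BIT(PUD_ORDER))

/* Model of the quoted ternary: every mapping is either anon or file. */
uint64_t old_allowable_orders(uint64_t orders, bool is_anon)
{
	orders &= is_anon ? THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
	return orders;
}

/* Returns true if any order in the mask is above MAX_PAGECACHE_ORDER. */
bool exceeds_pagecache_limit(uint64_t orders)
{
	return (orders >> (MAX_PAGECACHE_ORDER + 1)) != 0;
}

/* Small usage example: a PMD-order request on a file mapping. */
void demo(void)
{
	uint64_t got = old_allowable_orders(BIT(PMD_ORDER), false);

	/* The PMD order passes the filter even though it is above the limit. */
	assert(got == BIT(PMD_ORDER));
	assert(exceeds_pagecache_limit(got));
}
```

With these assumed values, a PMD-order request on a file mapping passes the filter while a smaller order such as 4 is dropped, which lines up with both halves of the objection quoted above: the mask admits orders above MAX_PAGECACHE_ORDER and omits the smaller orders that files do support.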
Thread overview: 10+ messages
[not found] <20240711104840.200573-1-gshan@redhat.com>
2024-07-11 20:46 ` [PATCH] mm/huge_memory: Avoid PMD-size page cache if needed Matthew Wilcox
2024-07-11 21:03 ` David Hildenbrand
2024-07-11 21:20 ` David Hildenbrand
2024-07-12 5:39 ` Gavin Shan
2024-07-13 1:03 ` David Hildenbrand
2024-07-13 4:01 ` Baolin Wang
2024-07-13 4:17 ` David Hildenbrand
2024-07-13 12:57 ` Baolin Wang
2024-07-13 9:25 ` Gavin Shan
2024-07-13 11:05 ` Ryan Roberts