* [PATCH] powerpc/mm: Handle page table allocation failures
@ 2019-05-14  1:05 Aneesh Kumar K.V
  2019-05-14  6:40 ` Michael Ellerman
From: Aneesh Kumar K.V @ 2019-05-14  1:05 UTC
  To: npiggin, paulus, mpe; +Cc: Sachin Sant, Aneesh Kumar K.V, linuxppc-dev
This fixes the crash below, which arises because page table allocation
failures are not handled while allocating hugetlb page tables.
 BUG: Kernel NULL pointer dereference at 0x0000001c
 Faulting instruction address: 0xc000000001d1e58c
 Oops: Kernel access of bad area, sig: 11 [#1]
 LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
 CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: G        W  O      5.1.0-next-20190507-autotest #1
 NIP:  c000000001d1e58c LR: c000000001d1e54c CTR: 0000000000000000
 REGS: c000000004937890 TRAP: 0300   Tainted: G        W  O       (5.1.0-next-20190507-autotest)
 MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22424822  XER: 00000000
 CFAR: c00000000183e9e0 DAR: 000000000000001c DSISR: 40000000 IRQMASK: 0
 GPR00: c000000001901a80 c000000004937b20 c000000003938700 0000000000000000
 GPR04: 0000000000400cc0 000000000003efff 000000027966e000 c000000003ba8700
 GPR08: c000000003ba8700 000000000d601125 c000000003ba8700 0000000080000000
 GPR12: 0000000022424822 c00000001ecae280 0000000000000000 0000000000000000
 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 GPR20: 0000000000000018 c0000000039e2d30 c0000000039e2d28 c0000002762da460
 GPR24: 000000000000001c 0000000000000000 0000000000000001 c000000001901a80
 GPR28: 0000000000400cc0 0000000000000000 0000000000000000 0000000000400cc0
 NIP [c000000001d1e58c] kmem_cache_alloc+0xbc/0x5a0
 LR [c000000001d1e54c] kmem_cache_alloc+0x7c/0x5a0
 Call Trace:
  [c000000001c91150] __pud_alloc+0x160/0x200 (unreliable)
  [c000000001901a80] huge_pte_alloc+0x580/0x950
  [c000000001cf7910] hugetlb_fault+0x9a0/0x1250
  [c000000001c94a80] handle_mm_fault+0x490/0x4a0
  [c0000000018d529c] __do_page_fault+0x77c/0x1f00
  [c0000000018d6a48] do_page_fault+0x28/0x50
  [c00000000183b0d4] handle_page_fault+0x18/0x38
Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
Note: I used a recent commit for the Fixes tag, but in reality we never checked for page table
allocation failures in this path. If we want to go back to the older commit, we may need this tag instead:
Fixes: a4fe3ce7699b ("powerpc/mm: Allow more flexible layouts for hugepage pagetables")
 arch/powerpc/mm/hugetlbpage.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index c5c9ff2d7afc..ae9d71da5219 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -130,6 +130,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 	} else {
 		pdshift = PUD_SHIFT;
 		pu = pud_alloc(mm, pg, addr);
+		if (!pu)
+			return NULL;
 		if (pshift == PUD_SHIFT)
 			return (pte_t *)pu;
 		else if (pshift > PMD_SHIFT) {
@@ -138,6 +140,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 		} else {
 			pdshift = PMD_SHIFT;
 			pm = pmd_alloc(mm, pu, addr);
+			if (!pm)
+				return NULL;
 			if (pshift == PMD_SHIFT)
 				/* 16MB hugepage */
 				return (pte_t *)pm;
@@ -154,12 +158,16 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 	} else {
 		pdshift = PUD_SHIFT;
 		pu = pud_alloc(mm, pg, addr);
+		if (!pu)
+			return NULL;
 		if (pshift >= PUD_SHIFT) {
 			ptl = pud_lockptr(mm, pu);
 			hpdp = (hugepd_t *)pu;
 		} else {
 			pdshift = PMD_SHIFT;
 			pm = pmd_alloc(mm, pu, addr);
+			if (!pm)
+				return NULL;
 			ptl = pmd_lockptr(mm, pm);
 			hpdp = (hugepd_t *)pm;
 		}
-- 
2.21.0
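
The pattern the patch applies (return NULL as soon as an intermediate table
allocation fails, instead of passing the NULL pointer to the next level) can
be modelled in userspace. This is only a sketch under stated assumptions, not
kernel code: struct top, struct upper, struct middle, test_alloc(),
reset_injection() and slot_alloc() are hypothetical names standing in for the
pgd/pud/pmd levels and for an allocator that can be made to fail on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ENTRIES 4

/* Hypothetical three-level table; a stand-in for pgd -> pud -> pmd. */
struct middle { int slots[ENTRIES]; };
struct upper  { struct middle *mid[ENTRIES]; };
struct top    { struct upper *up[ENTRIES]; };

/* Fault injection: the allocation whose 0-based index equals
 * fail_at returns NULL. A negative fail_at disables injection. */
static int alloc_calls;
static int fail_at = -1;

void reset_injection(int n)
{
	alloc_calls = 0;
	fail_at = n;
}

void *test_alloc(size_t size)
{
	if (alloc_calls++ == fail_at)
		return NULL;
	return calloc(1, size);
}

/* Like pud_alloc(): return the existing table or allocate one; NULL on failure. */
struct upper *upper_alloc(struct top *t, unsigned int idx)
{
	if (!t->up[idx])
		t->up[idx] = test_alloc(sizeof(struct upper));
	return t->up[idx];
}

/* Like pmd_alloc(): dereferences u, so the caller must not pass NULL. */
struct middle *middle_alloc(struct upper *u, unsigned int idx)
{
	if (!u->mid[idx])
		u->mid[idx] = test_alloc(sizeof(struct middle));
	return u->mid[idx];
}

/* Walk both levels, bailing out as soon as either allocation fails. */
int *slot_alloc(struct top *t, unsigned int i, unsigned int j, unsigned int k)
{
	struct upper *u;
	struct middle *m;

	assert(i < ENTRIES && j < ENTRIES && k < ENTRIES);

	u = upper_alloc(t, i);
	if (!u)
		return NULL;	/* otherwise middle_alloc() dereferences NULL */
	m = middle_alloc(u, j);
	if (!m)
		return NULL;	/* otherwise &m->slots[k] is a bogus pointer */
	return &m->slots[k];
}
```

Calling reset_injection(0) or reset_injection(1) before slot_alloc() makes the
first or second allocation fail. In both cases slot_alloc() returns NULL and
the caller can report the failure instead of faulting.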
* Re: [PATCH] powerpc/mm: Handle page table allocation failures
  2019-05-14  1:05 [PATCH] powerpc/mm: Handle page table allocation failures Aneesh Kumar K.V
@ 2019-05-14  6:40 ` Michael Ellerman
  2019-05-14  9:33   ` Sachin Sant
From: Michael Ellerman @ 2019-05-14  6:40 UTC
  To: Aneesh Kumar K.V, npiggin, paulus
  Cc: Sachin Sant, Aneesh Kumar K.V, linuxppc-dev
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> This fixes the crash below, which arises because page table allocation
> failures are not handled while allocating hugetlb page tables.
>
>  BUG: Kernel NULL pointer dereference at 0x0000001c
>  Faulting instruction address: 0xc000000001d1e58c
>  Oops: Kernel access of bad area, sig: 11 [#1]
>  LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>
>  CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: G        W  O      5.1.0-next-20190507-autotest #1
>  NIP:  c000000001d1e58c LR: c000000001d1e54c CTR: 0000000000000000
>  REGS: c000000004937890 TRAP: 0300   Tainted: G        W  O       (5.1.0-next-20190507-autotest)
>  MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22424822  XER: 00000000
>  CFAR: c00000000183e9e0 DAR: 000000000000001c DSISR: 40000000 IRQMASK: 0
>  GPR00: c000000001901a80 c000000004937b20 c000000003938700 0000000000000000
>  GPR04: 0000000000400cc0 000000000003efff 000000027966e000 c000000003ba8700
>  GPR08: c000000003ba8700 000000000d601125 c000000003ba8700 0000000080000000
>  GPR12: 0000000022424822 c00000001ecae280 0000000000000000 0000000000000000
>  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>  GPR20: 0000000000000018 c0000000039e2d30 c0000000039e2d28 c0000002762da460
>  GPR24: 000000000000001c 0000000000000000 0000000000000001 c000000001901a80
>  GPR28: 0000000000400cc0 0000000000000000 0000000000000000 0000000000400cc0
>  NIP [c000000001d1e58c] kmem_cache_alloc+0xbc/0x5a0
>  LR [c000000001d1e54c] kmem_cache_alloc+0x7c/0x5a0
>  Call Trace:
>   [c000000001c91150] __pud_alloc+0x160/0x200 (unreliable)
>   [c000000001901a80] huge_pte_alloc+0x580/0x950
>   [c000000001cf7910] hugetlb_fault+0x9a0/0x1250
>   [c000000001c94a80] handle_mm_fault+0x490/0x4a0
>   [c0000000018d529c] __do_page_fault+0x77c/0x1f00
>   [c0000000018d6a48] do_page_fault+0x28/0x50
>   [c00000000183b0d4] handle_page_fault+0x18/0x38
>
> Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format")
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>
> Note: I used a recent commit for the Fixes tag, but in reality we never checked for page table
> allocation failures in this path. If we want to go back to the older commit, then we may need.

If we never checked for failure in that path, is there some reason we've
only just noticed the crashes? Are we just testing under memory pressure
more effectively than we used to?
cheers
* Re: [PATCH] powerpc/mm: Handle page table allocation failures
  2019-05-14  6:40 ` Michael Ellerman
@ 2019-05-14  9:33   ` Sachin Sant
From: Sachin Sant @ 2019-05-14  9:33 UTC
  To: Michael Ellerman; +Cc: Aneesh Kumar K.V, paulus, linuxppc-dev, npiggin
> On 14-May-2019, at 12:10 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> This fixes the crash below, which arises because page table allocation
>> failures are not handled while allocating hugetlb page tables.
>> 
>> BUG: Kernel NULL pointer dereference at 0x0000001c
>> Faulting instruction address: 0xc000000001d1e58c
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> 
>> CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: G        W  O      5.1.0-next-20190507-autotest #1
>> NIP:  c000000001d1e58c LR: c000000001d1e54c CTR: 0000000000000000
>> REGS: c000000004937890 TRAP: 0300   Tainted: G        W  O       (5.1.0-next-20190507-autotest)
>> MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22424822  XER: 00000000
>> CFAR: c00000000183e9e0 DAR: 000000000000001c DSISR: 40000000 IRQMASK: 0
>> GPR00: c000000001901a80 c000000004937b20 c000000003938700 0000000000000000
>> GPR04: 0000000000400cc0 000000000003efff 000000027966e000 c000000003ba8700
>> GPR08: c000000003ba8700 000000000d601125 c000000003ba8700 0000000080000000
>> GPR12: 0000000022424822 c00000001ecae280 0000000000000000 0000000000000000
>> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR20: 0000000000000018 c0000000039e2d30 c0000000039e2d28 c0000002762da460
>> GPR24: 000000000000001c 0000000000000000 0000000000000001 c000000001901a80
>> GPR28: 0000000000400cc0 0000000000000000 0000000000000000 0000000000400cc0
>> NIP [c000000001d1e58c] kmem_cache_alloc+0xbc/0x5a0
>> LR [c000000001d1e54c] kmem_cache_alloc+0x7c/0x5a0
>> Call Trace:
>>  [c000000001c91150] __pud_alloc+0x160/0x200 (unreliable)
>>  [c000000001901a80] huge_pte_alloc+0x580/0x950
>>  [c000000001cf7910] hugetlb_fault+0x9a0/0x1250
>>  [c000000001c94a80] handle_mm_fault+0x490/0x4a0
>>  [c0000000018d529c] __do_page_fault+0x77c/0x1f00
>>  [c0000000018d6a48] do_page_fault+0x28/0x50
>>  [c00000000183b0d4] handle_page_fault+0x18/0x38
>> 
>> Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format")
>> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>> 
>> Note: I used a recent commit for the Fixes tag, but in reality we never checked for page table
>> allocation failures in this path. If we want to go back to the older commit, then we may need.
> 
> If we never checked for failure in that path, is there some reason we've
> only just noticed the crashes? Are we just testing under memory pressure
> more effectively than we used to?
> 
Actually, the reported crash seems to be caused by commit 723f268f19
("powerpc/mm: cleanup ifdef mess in add_huge_page_size()").
Reverting that commit allows the test case to run correctly without a crash.
Thanks
-Sachin