* [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file @ 2023-03-30 15:53 Ivan Orlov 2023-03-31 1:33 ` Zach O'Keefe 0 siblings, 1 reply; 11+ messages in thread From: Ivan Orlov @ 2023-03-30 15:53 UTC (permalink / raw) To: akpm Cc: Ivan Orlov, linux-mm, linux-kernel, himadrispandya, skhan, linux-kernel-mentees, shy828301, zokeefe, syzbot+9578faa5475acb35fa50 Syzkaller reported the following issue: kernel BUG at mm/khugepaged.c:1823! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 5097 Comm: syz-executor220 Not tainted 6.2.0-syzkaller-13154-g857f1268a591 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/16/2023 RIP: 0010:collapse_file mm/khugepaged.c:1823 [inline] RIP: 0010:hpage_collapse_scan_file+0x67c8/0x7580 mm/khugepaged.c:2233 Code: 00 00 89 de e8 c9 66 a3 ff 31 ff 89 de e8 c0 66 a3 ff 45 84 f6 0f 85 28 0d 00 00 e8 22 64 a3 ff e9 dc f7 ff ff e8 18 64 a3 ff <0f> 0b f3 0f 1e fa e8 0d 64 a3 ff e9 93 f6 ff ff f3 0f 1e fa 4c 89 RSP: 0018:ffffc90003dff4e0 EFLAGS: 00010093 RAX: ffffffff81e95988 RBX: 00000000000001c1 RCX: ffff8880205b3a80 RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 00000000000001c1 RBP: ffffc90003dff830 R08: ffffffff81e90e67 R09: fffffbfff1a433c3 R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000000 R13: ffffc90003dff6c0 R14: 00000000000001c0 R15: 0000000000000000 FS: 00007fdbae5ee700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdbae6901e0 CR3: 000000007b2dd000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> madvise_collapse+0x721/0xf50 mm/khugepaged.c:2693 madvise_vma_behavior mm/madvise.c:1086 [inline] madvise_walk_vmas mm/madvise.c:1260 [inline] do_madvise+0x9e5/0x4680 mm/madvise.c:1439 __do_sys_madvise mm/madvise.c:1452 [inline] __se_sys_madvise mm/madvise.c:1450 [inline] 
__x64_sys_madvise+0xa5/0xb0 mm/madvise.c:1450 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The 'xas_store' call during page cache scanning can put 'xas' into an error state (with the reproducer provided by syzkaller, the error code is -ENOMEM). However, there is no error check after the 'xas_store', and the next call to 'xas_next' at the start of the scanning cycle doesn't advance xa_index while 'xas' is in the error state, so the BUG above is hit. This patch adds xarray state error checking after the 'xas_store' and a corresponding result error code. It also adds xarray state error checking via the WARN_ON_ONCE macro, to make sure that ENOMEM or other possible errors don't occur at places where they shouldn't. Tested via syzbot. Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> --- V1 -> V2: Add WARN_ON_ONCE error checking and comments mm/khugepaged.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 92e6f56a932d..8b6580b13339 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -55,6 +55,7 @@ enum scan_result { SCAN_CGROUP_CHARGE_FAIL, SCAN_TRUNCATED, SCAN_PAGE_HAS_PRIVATE, + SCAN_STORE_FAILED, }; #define CREATE_TRACE_POINTS @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, goto xa_locked; } xas_store(&xas, hpage); + if (xas_error(&xas)) { + /* revert shmem_charge performed + * in the previous condition + */ + mapping->nrpages--; + shmem_uncharge(mapping->host, 1); + result = SCAN_STORE_FAILED; + goto xa_locked; + } nr_none++; continue; } @@ -1992,6 +2002,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, /* Finally, replace with the new page.
*/ xas_store(&xas, hpage); + /* We can't get an ENOMEM here (because the allocation happened before) + * but let's check for errors (XArray implementation can be + * changed in the future) + */ + WARN_ON_ONCE(xas_error(&xas)); continue; out_unlock: unlock_page(page); @@ -2029,6 +2044,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, /* Join all the small entries into a single multi-index entry */ xas_set_order(&xas, start, HPAGE_PMD_ORDER); xas_store(&xas, hpage); + /* Here we can't get an ENOMEM (because entries were + * previously allocated) But let's check for errors + * (XArray implementation can be changed in the future) + */ + WARN_ON_ONCE(xas_error(&xas)); xa_locked: xas_unlock_irq(&xas); xa_unlocked: -- 2.34.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file 2023-03-30 15:53 [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file Ivan Orlov @ 2023-03-31 1:33 ` Zach O'Keefe 2023-03-31 6:47 ` Ivan Orlov 2023-04-04 0:58 ` Yang Shi 0 siblings, 2 replies; 11+ messages in thread From: Zach O'Keefe @ 2023-03-31 1:33 UTC (permalink / raw) To: Ivan Orlov Cc: akpm, linux-mm, linux-kernel, himadrispandya, skhan, linux-kernel-mentees, shy828301, syzbot+9578faa5475acb35fa50 On Mar 30 19:53, Ivan Orlov wrote: > Syzkaller reported the following issue: > > kernel BUG at mm/khugepaged.c:1823! > invalid opcode: 0000 [#1] PREEMPT SMP KASAN > CPU: 1 PID: 5097 Comm: syz-executor220 Not tainted 6.2.0-syzkaller-13154-g857f1268a591 #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/16/2023 > RIP: 0010:collapse_file mm/khugepaged.c:1823 [inline] > RIP: 0010:hpage_collapse_scan_file+0x67c8/0x7580 mm/khugepaged.c:2233 > Code: 00 00 89 de e8 c9 66 a3 ff 31 ff 89 de e8 c0 66 a3 ff 45 84 f6 0f 85 28 0d 00 00 e8 22 64 a3 ff e9 dc f7 ff ff e8 18 64 a3 ff <0f> 0b f3 0f 1e fa e8 0d 64 a3 ff e9 93 f6 ff ff f3 0f 1e fa 4c 89 > RSP: 0018:ffffc90003dff4e0 EFLAGS: 00010093 > RAX: ffffffff81e95988 RBX: 00000000000001c1 RCX: ffff8880205b3a80 > RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 00000000000001c1 > RBP: ffffc90003dff830 R08: ffffffff81e90e67 R09: fffffbfff1a433c3 > R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000000 > R13: ffffc90003dff6c0 R14: 00000000000001c0 R15: 0000000000000000 > FS: 00007fdbae5ee700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007fdbae6901e0 CR3: 000000007b2dd000 CR4: 00000000003506e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > madvise_collapse+0x721/0xf50 mm/khugepaged.c:2693 > madvise_vma_behavior 
mm/madvise.c:1086 [inline] > madvise_walk_vmas mm/madvise.c:1260 [inline] > do_madvise+0x9e5/0x4680 mm/madvise.c:1439 > __do_sys_madvise mm/madvise.c:1452 [inline] > __se_sys_madvise mm/madvise.c:1450 [inline] > __x64_sys_madvise+0xa5/0xb0 mm/madvise.c:1450 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > Thanks, Ivan. In the process of reviewing this, I started thinking about whether the !shmem case was also susceptible to a similar race, and I *think* it might be. Unfortunately, my time has run out, and I haven't been able to validate; I'm less familiar with the file-side of things. The underlying problem is a race with truncation/hole-punch under OOM conditions. The nice do-while loop near the top of collapse_file() attempts to avoid this scenario by making sure enough slots are available. However, when we drop the xarray lock, we open ourselves up to concurrent removal + slot deletion. Those slots then need to be allocated again -- which under OOM conditions can fail. The syzbot reproducer picks on shmem, but I think this can occur for file as well. If we find a hole, we unlock the xarray and call page_cache_sync_readahead(), which, if it succeeds, IIUC, will have allocated a new slot in our mapping pointing to the new page. We *then* lock the page. Only after the page is locked are we protected from concurrent removal (Note: this is what provides us protection in many of the xas_store() cases; we've held the slot's contained page-lock since verifying the slot exists, protecting us from removal / reallocation races). Maybe I'm just low on caffeine at the end of the day, and am missing something, but if I had more time, I'd be looking into the file-side some more to verify. Apologies that this hasn't occurred to me until now; I was looking at one of your comments and double-checked why I *thought* we were safe.
Anyways, irrespective of that looming issue, some more notes to follow: > The 'xas_store' call during page cache scanning can potentially > translate 'xas' into the error state (with the reproducer provided > by the syzkaller the error code is -ENOMEM). However, there are no > further checks after the 'xas_store', and the next call of 'xas_next' > at the start of the scanning cycle doesn't increase the xa_index, > and the issue occurs. > > This patch will add the xarray state error checking after the > 'xas_store' and the corresponding result error code. It will > also add xarray state error checking via WARN_ON_ONCE macros, > to be sure that ENOMEM or other possible errors don't occur > at the places they shouldn't. Thanks for the additions here. I think it's worthwhile providing even more details about the specifics of the race we are fixing and/or guarding against to help people understand how that -ENOMEM comes about if the do-while loop has "Ensured" we have slots available (additionally, I think that comment can be augmented). > Tested via syzbot. > > Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com > Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc > Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> > --- > V1 -> V2: Add WARN_ON_ONCE error checking and comments > > mm/khugepaged.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index 92e6f56a932d..8b6580b13339 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -55,6 +55,7 @@ enum scan_result { > SCAN_CGROUP_CHARGE_FAIL, > SCAN_TRUNCATED, > SCAN_PAGE_HAS_PRIVATE, > + SCAN_STORE_FAILED, > }; I'm still reluctant to add a new error code for this as this seems like quite a rare race that requires OOM to trigger. I'd be happier just reusing SCAN_FAIL, or something we might get some mileage out of later: SCAN_OOM.
Also, a reminder to update include/trace/events/huge_memory.h, if you go that route. > > #define CREATE_TRACE_POINTS > @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > goto xa_locked; > } > xas_store(&xas, hpage); > + if (xas_error(&xas)) { > + /* revert shmem_charge performed > + * in the previous condition > + */ Nit: Here, and following, I think standard convention for multiline comment is to have an empty first and last line, eg: + /* + * revert shmem_charge performed + * in the previous condition + */ Though, checkpatch.pl --strict didn't seem to care. > + mapping->nrpages--; > + shmem_uncharge(mapping->host, 1); > + result = SCAN_STORE_FAILED; > + goto xa_locked; > + } > nr_none++; > continue; > } > @@ -1992,6 +2002,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > > /* Finally, replace with the new page. */ > xas_store(&xas, hpage); > + /* We can't get an ENOMEM here (because the allocation happened before) > + * but let's check for errors (XArray implementation can be > + * changed in the future) > + */ > + WARN_ON_ONCE(xas_error(&xas)); Nit: it's not just that allocation happened before -- need some guarantee we've been protected from concurrent removal. This is what made me look at the file side. > continue; > out_unlock: > unlock_page(page); > @@ -2029,6 +2044,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > /* Join all the small entries into a single multi-index entry */ > xas_set_order(&xas, start, HPAGE_PMD_ORDER); > xas_store(&xas, hpage); > + /* Here we can't get an ENOMEM (because entries were > + * previously allocated) But let's check for errors > + * (XArray implementation can be changed in the future) > + */ > + WARN_ON_ONCE(xas_error(&xas)); Ditto. Apologies I won't be around to see this change through -- I'm just out of time, and will be shutting my computer down tomorrow for 3 months. Sorry for the poor timing, for raising issues, then disappearing. 
Hopefully I'm wrong and the file-side isn't a concern. Best, Zach > xa_locked: > xas_unlock_irq(&xas); > xa_unlocked: > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file 2023-03-31 1:33 ` Zach O'Keefe @ 2023-03-31 6:47 ` Ivan Orlov 2023-04-04 0:58 ` Yang Shi 1 sibling, 0 replies; 11+ messages in thread From: Ivan Orlov @ 2023-03-31 6:47 UTC (permalink / raw) To: Zach O'Keefe Cc: akpm, linux-mm, linux-kernel, himadrispandya, skhan, linux-kernel-mentees, shy828301, syzbot+9578faa5475acb35fa50 On 3/31/23 05:33, Zach O'Keefe wrote: > Thanks, Ivan. > > In the process of reviewing this, I starting thinking if the !shmem case was > also susceptible to a similar race, and I *think* it might be. Unfortunately, my > time has ran out, and I haven't been able to validate ; I'm less familiar with > the file-side of things. > > The underlying problem is race with truncation/hole-punch under OOM condition. > The nice do-while loop near the top of collapse_file() attempts to avoid this > scenario by making sure enough slots are available. However, when we drop xarray > lock, we open ourselves up to concurrent removal + slot deletion. Those slots > then need to be allocated again -- which under OOM condition is failable. > > The syzbot reproducer picks on shmem, but I think this can occur for file as > well. If we find a hole, we unlock the xarray and call > page_cache_sync_readahead(), which if it succeeds, IIUC, will have allocated a > new slot in our mapping pointing to the new page. We *then* locks the page. Only > after the page is locked are we protected from concurrent removal (Note: this is > what provides us protection in many of the xas_store() cases ; we've held the > slot's contained page-lock since verifying the slot exists, protecting us from > removal / reallocation races). > > Maybe I'm just low on caffeine at the end of the day, and am missing something, > but if I had more time, I'd be looking into the file-side some more to verify. 
> Apologies that hasn't occurred to me until now ; I was looking at one of your > comments and double-checked why I *thought* we were safe. > > Anyways, irrespective of that looming issues, some more notes to follow: > >> The 'xas_store' call during page cache scanning can potentially >> translate 'xas' into the error state (with the reproducer provided >> by the syzkaller the error code is -ENOMEM). However, there are no >> further checks after the 'xas_store', and the next call of 'xas_next' >> at the start of the scanning cycle doesn't increase the xa_index, >> and the issue occurs. >> >> This patch will add the xarray state error checking after the >> 'xas_store' and the corresponding result error code. It will >> also add xarray state error checking via WARN_ON_ONCE macros, >> to be sure that ENOMEM or other possible errors don't occur >> at the places they shouldn't. > > Thanks for the additions here. I think it's worthwhile providing even more > details about the specifics of the race we are fixing and/or guarding against to > help ppl understand how that -ENOMEM comes about if the do-while loop has > "Ensured" we have slots available (additionally, I think that comment can be > augmented). > >> Tested via syzbot. 
>> >> Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com >> Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc >> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> >> --- >> V1 -> V2: Add WARN_ON_ONCE error checking and comments >> >> mm/khugepaged.c | 20 ++++++++++++++++++++ >> 1 file changed, 20 insertions(+) >> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >> index 92e6f56a932d..8b6580b13339 100644 >> --- a/mm/khugepaged.c >> +++ b/mm/khugepaged.c >> @@ -55,6 +55,7 @@ enum scan_result { >> SCAN_CGROUP_CHARGE_FAIL, >> SCAN_TRUNCATED, >> SCAN_PAGE_HAS_PRIVATE, >> + SCAN_STORE_FAILED, >> }; > > I'm still reluctant to add a new error code for this as this seems like quite a > rare race that requires OOM to trigger. I'd be happier just reusing SCAN_FAIL, > or, something we might get some millage out of later: SCAN_OOM. > > Also, a reminder to update include/trace/events/huge_memory.h, if you go that > route. > >> >> #define CREATE_TRACE_POINTS >> @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, >> goto xa_locked; >> } >> xas_store(&xas, hpage); >> + if (xas_error(&xas)) { >> + /* revert shmem_charge performed >> + * in the previous condition >> + */ > > Nit: Here, and following, I think standard convention for multiline comment is > to have an empty first and last line, eg: > > + /* > + * revert shmem_charge performed > + * in the previous condition > + */ > > Though, checkpatch.pl --strict didn't seem to care. > >> + mapping->nrpages--; >> + shmem_uncharge(mapping->host, 1); >> + result = SCAN_STORE_FAILED; >> + goto xa_locked; >> + } >> nr_none++; >> continue; >> } >> @@ -1992,6 +2002,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, >> >> /* Finally, replace with the new page. 
*/ >> xas_store(&xas, hpage); >> + /* We can't get an ENOMEM here (because the allocation happened before) >> + * but let's check for errors (XArray implementation can be >> + * changed in the future) >> + */ >> + WARN_ON_ONCE(xas_error(&xas)); > > Nit: it's not just that allocation happened before -- need some guarantee we've > been protected from concurrent removal. This is what made me look at the file > side. > >> continue; >> out_unlock: >> unlock_page(page); >> @@ -2029,6 +2044,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, >> /* Join all the small entries into a single multi-index entry */ >> xas_set_order(&xas, start, HPAGE_PMD_ORDER); >> xas_store(&xas, hpage); >> + /* Here we can't get an ENOMEM (because entries were >> + * previously allocated) But let's check for errors >> + * (XArray implementation can be changed in the future) >> + */ >> + WARN_ON_ONCE(xas_error(&xas)); > > Ditto. > > Apologies I won't be around to see this change through -- I'm just out of time, > and will be shutting my computer down tomorrow for 3 months. Sorry for the poor > timing, for raising issues, then disappearing. Hopefully I'm wrong and the > file-side isn't a concern. > > Best, > Zach > >> xa_locked: >> xas_unlock_irq(&xas); >> xa_unlocked: >> -- >> 2.34.1 >> Hello, Zach! Thank you very much for the detailed review! I thought that locking was our guarantee that we wouldn't get an -ENOMEM, but I didn't have a deep understanding of the problem's cause, since I'm just starting my memory management journey :) In any case, I will research this problem, and hopefully when you get back from vacation you will see a new patch that fully fixes it. Have a nice vacation! Thanks again! ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file 2023-03-31 1:33 ` Zach O'Keefe 2023-03-31 6:47 ` Ivan Orlov @ 2023-04-04 0:58 ` Yang Shi 2023-04-05 16:43 ` Zach O'Keefe 1 sibling, 1 reply; 11+ messages in thread From: Yang Shi @ 2023-04-04 0:58 UTC (permalink / raw) To: Zach O'Keefe Cc: Ivan Orlov, akpm, linux-mm, linux-kernel, himadrispandya, skhan, linux-kernel-mentees, syzbot+9578faa5475acb35fa50 On Thu, Mar 30, 2023 at 6:33 PM Zach O'Keefe <zokeefe@google.com> wrote: > > On Mar 30 19:53, Ivan Orlov wrote: > > Syzkaller reported the following issue: > > > > kernel BUG at mm/khugepaged.c:1823! > > invalid opcode: 0000 [#1] PREEMPT SMP KASAN > > CPU: 1 PID: 5097 Comm: syz-executor220 Not tainted 6.2.0-syzkaller-13154-g857f1268a591 #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/16/2023 > > RIP: 0010:collapse_file mm/khugepaged.c:1823 [inline] > > RIP: 0010:hpage_collapse_scan_file+0x67c8/0x7580 mm/khugepaged.c:2233 > > Code: 00 00 89 de e8 c9 66 a3 ff 31 ff 89 de e8 c0 66 a3 ff 45 84 f6 0f 85 28 0d 00 00 e8 22 64 a3 ff e9 dc f7 ff ff e8 18 64 a3 ff <0f> 0b f3 0f 1e fa e8 0d 64 a3 ff e9 93 f6 ff ff f3 0f 1e fa 4c 89 > > RSP: 0018:ffffc90003dff4e0 EFLAGS: 00010093 > > RAX: ffffffff81e95988 RBX: 00000000000001c1 RCX: ffff8880205b3a80 > > RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 00000000000001c1 > > RBP: ffffc90003dff830 R08: ffffffff81e90e67 R09: fffffbfff1a433c3 > > R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000000 > > R13: ffffc90003dff6c0 R14: 00000000000001c0 R15: 0000000000000000 > > FS: 00007fdbae5ee700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00007fdbae6901e0 CR3: 000000007b2dd000 CR4: 00000000003506e0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Call Trace: > > <TASK> > > 
madvise_collapse+0x721/0xf50 mm/khugepaged.c:2693 > > madvise_vma_behavior mm/madvise.c:1086 [inline] > > madvise_walk_vmas mm/madvise.c:1260 [inline] > > do_madvise+0x9e5/0x4680 mm/madvise.c:1439 > > __do_sys_madvise mm/madvise.c:1452 [inline] > > __se_sys_madvise mm/madvise.c:1450 [inline] > > __x64_sys_madvise+0xa5/0xb0 mm/madvise.c:1450 > > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 > > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > > > Thanks, Ivan. > > In the process of reviewing this, I starting thinking if the !shmem case was > also susceptible to a similar race, and I *think* it might be. Unfortunately, my > time has ran out, and I haven't been able to validate ; I'm less familiar with > the file-side of things. > > The underlying problem is race with truncation/hole-punch under OOM condition. > The nice do-while loop near the top of collapse_file() attempts to avoid this > scenario by making sure enough slots are available. However, when we drop xarray > lock, we open ourselves up to concurrent removal + slot deletion. Those slots > then need to be allocated again -- which under OOM condition is failable. > > The syzbot reproducer picks on shmem, but I think this can occur for file as > well. If we find a hole, we unlock the xarray and call > page_cache_sync_readahead(), which if it succeeds, IIUC, will have allocated a > new slot in our mapping pointing to the new page. We *then* locks the page. Only > after the page is locked are we protected from concurrent removal (Note: this is > what provides us protection in many of the xas_store() cases ; we've held the > slot's contained page-lock since verifying the slot exists, protecting us from > removal / reallocation races). IIUC, the find_lock_page() should be able to handle the race. It does check whether the page is truncated or not after locking the page. 
> > Maybe I'm just low on caffeine at the end of the day, and am missing something, > but if I had more time, I'd be looking into the file-side some more to verify. > Apologies that hasn't occurred to me until now ; I was looking at one of your > comments and double-checked why I *thought* we were safe. > > Anyways, irrespective of that looming issues, some more notes to follow: > > > The 'xas_store' call during page cache scanning can potentially > > translate 'xas' into the error state (with the reproducer provided > > by the syzkaller the error code is -ENOMEM). However, there are no > > further checks after the 'xas_store', and the next call of 'xas_next' > > at the start of the scanning cycle doesn't increase the xa_index, > > and the issue occurs. > > > > This patch will add the xarray state error checking after the > > 'xas_store' and the corresponding result error code. It will > > also add xarray state error checking via WARN_ON_ONCE macros, > > to be sure that ENOMEM or other possible errors don't occur > > at the places they shouldn't. > > Thanks for the additions here. I think it's worthwhile providing even more > details about the specifics of the race we are fixing and/or guarding against to > help ppl understand how that -ENOMEM comes about if the do-while loop has > "Ensured" we have slots available (additionally, I think that comment can be > augmented). > > > Tested via syzbot. 
> > > > Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com > > Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc > > Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> > > --- > > V1 -> V2: Add WARN_ON_ONCE error checking and comments > > > > mm/khugepaged.c | 20 ++++++++++++++++++++ > > 1 file changed, 20 insertions(+) > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > index 92e6f56a932d..8b6580b13339 100644 > > --- a/mm/khugepaged.c > > +++ b/mm/khugepaged.c > > @@ -55,6 +55,7 @@ enum scan_result { > > SCAN_CGROUP_CHARGE_FAIL, > > SCAN_TRUNCATED, > > SCAN_PAGE_HAS_PRIVATE, > > + SCAN_STORE_FAILED, > > }; > > I'm still reluctant to add a new error code for this as this seems like quite a > rare race that requires OOM to trigger. I'd be happier just reusing SCAN_FAIL, > or, something we might get some millage out of later: SCAN_OOM. > > Also, a reminder to update include/trace/events/huge_memory.h, if you go that > route. > > > > > #define CREATE_TRACE_POINTS > > @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > > goto xa_locked; > > } > > xas_store(&xas, hpage); > > + if (xas_error(&xas)) { > > + /* revert shmem_charge performed > > + * in the previous condition > > + */ > > Nit: Here, and following, I think standard convention for multiline comment is > to have an empty first and last line, eg: > > + /* > + * revert shmem_charge performed > + * in the previous condition > + */ > > Though, checkpatch.pl --strict didn't seem to care. > > > + mapping->nrpages--; > > + shmem_uncharge(mapping->host, 1); > > + result = SCAN_STORE_FAILED; > > + goto xa_locked; > > + } > > nr_none++; > > continue; > > } > > @@ -1992,6 +2002,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > > > > /* Finally, replace with the new page. 
*/ > > xas_store(&xas, hpage); > > + /* We can't get an ENOMEM here (because the allocation happened before) > > + * but let's check for errors (XArray implementation can be > > + * changed in the future) > > + */ > > + WARN_ON_ONCE(xas_error(&xas)); > > Nit: it's not just that allocation happened before -- need some guarantee we've > been protected from concurrent removal. This is what made me look at the file > side. > > > continue; > > out_unlock: > > unlock_page(page); > > @@ -2029,6 +2044,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, > > /* Join all the small entries into a single multi-index entry */ > > xas_set_order(&xas, start, HPAGE_PMD_ORDER); > > xas_store(&xas, hpage); > > + /* Here we can't get an ENOMEM (because entries were > > + * previously allocated) But let's check for errors > > + * (XArray implementation can be changed in the future) > > + */ > > + WARN_ON_ONCE(xas_error(&xas)); > > Ditto. > > Apologies I won't be around to see this change through -- I'm just out of time, > and will be shutting my computer down tomorrow for 3 months. Sorry for the poor > timing, for raising issues, then disappearing. Hopefully I'm wrong and the > file-side isn't a concern. > > Best, > Zach > > > xa_locked: > > xas_unlock_irq(&xas); > > xa_unlocked: > > -- > > 2.34.1 > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file 2023-04-04 0:58 ` Yang Shi @ 2023-04-05 16:43 ` Zach O'Keefe 2023-04-16 18:33 ` Andrew Morton 0 siblings, 1 reply; 11+ messages in thread From: Zach O'Keefe @ 2023-04-05 16:43 UTC (permalink / raw) To: Yang Shi Cc: Ivan Orlov, akpm, himadrispandya, linux-kernel, linux-kernel-mentees, linux-mm, skhan, syzbot+9578faa5475acb35fa50 [-- Attachment #1: Type: text/plain, Size: 9958 bytes --] On Mon, Apr 3, 2023 at 5:58 PM Yang Shi <shy828301@gmail.com> wrote: > On Thu, Mar 30, 2023 at 6:33 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > On Mar 30 19:53, Ivan Orlov wrote: > > > Syzkaller reported the following issue: > > > > > > kernel BUG at mm/khugepaged.c:1823! > > > invalid opcode: 0000 [#1] PREEMPT SMP KASAN > > > CPU: 1 PID: 5097 Comm: syz-executor220 Not tainted > 6.2.0-syzkaller-13154-g857f1268a591 #0 > > > Hardware name: Google Google Compute Engine/Google Compute Engine, > BIOS Google 02/16/2023 > > > RIP: 0010:collapse_file mm/khugepaged.c:1823 [inline] > > > RIP: 0010:hpage_collapse_scan_file+0x67c8/0x7580 mm/khugepaged.c:2233 > > > Code: 00 00 89 de e8 c9 66 a3 ff 31 ff 89 de e8 c0 66 a3 ff 45 84 f6 > 0f 85 28 0d 00 00 e8 22 64 a3 ff e9 dc f7 ff ff e8 18 64 a3 ff <0f> 0b f3 > 0f 1e fa e8 0d 64 a3 ff e9 93 f6 ff ff f3 0f 1e fa 4c 89 > > > RSP: 0018:ffffc90003dff4e0 EFLAGS: 00010093 > > > RAX: ffffffff81e95988 RBX: 00000000000001c1 RCX: ffff8880205b3a80 > > > RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 00000000000001c1 > > > RBP: ffffc90003dff830 R08: ffffffff81e90e67 R09: fffffbfff1a433c3 > > > R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000000 > > > R13: ffffc90003dff6c0 R14: 00000000000001c0 R15: 0000000000000000 > > > FS: 00007fdbae5ee700(0000) GS:ffff8880b9900000(0000) > knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 00007fdbae6901e0 CR3: 000000007b2dd000 CR4: 00000000003506e0 > > > DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000 > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > Call Trace: > > > <TASK> > > > madvise_collapse+0x721/0xf50 mm/khugepaged.c:2693 > > > madvise_vma_behavior mm/madvise.c:1086 [inline] > > > madvise_walk_vmas mm/madvise.c:1260 [inline] > > > do_madvise+0x9e5/0x4680 mm/madvise.c:1439 > > > __do_sys_madvise mm/madvise.c:1452 [inline] > > > __se_sys_madvise mm/madvise.c:1450 [inline] > > > __x64_sys_madvise+0xa5/0xb0 mm/madvise.c:1450 > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > > > > > > Thanks, Ivan. > > > > In the process of reviewing this, I starting thinking if the !shmem case > was > > also susceptible to a similar race, and I *think* it might be. > Unfortunately, my > > time has ran out, and I haven't been able to validate ; I'm less > familiar with > > the file-side of things. > > > > The underlying problem is race with truncation/hole-punch under OOM > condition. > > The nice do-while loop near the top of collapse_file() attempts to avoid > this > > scenario by making sure enough slots are available. However, when we > drop xarray > > lock, we open ourselves up to concurrent removal + slot deletion. Those > slots > > then need to be allocated again -- which under OOM condition is failable. > > > > The syzbot reproducer picks on shmem, but I think this can occur for > file as > > well. If we find a hole, we unlock the xarray and call > > page_cache_sync_readahead(), which if it succeeds, IIUC, will have > allocated a > > new slot in our mapping pointing to the new page. We *then* locks the > page. 
Only > > after the page is locked are we protected from concurrent removal (Note: > this is > > what provides us protection in many of the xas_store() cases ; we've > held the > > slot's contained page-lock since verifying the slot exists, protecting > us from > > removal / reallocation races). > > IIUC, the find_lock_page() should be able to handle the race. It does > check whether the page is truncated or not after locking the page. > Ugh. Duh. You’re right, and I’m not sure how I missed that implication; we *did* succeed in locking the page in page cache. Well, the above rant just looks foolish now. Anyways, sorry for the false alarm, Ivan, and thanks for keeping me honest, Yang! > > > > Maybe I'm just low on caffeine at the end of the day, and am missing > something, > > but if I had more time, I'd be looking into the file-side some more to > verify. > > Apologies that hasn't occurred to me until now ; I was looking at one of > your > > comments and double-checked why I *thought* we were safe. > > > > Anyways, irrespective of that looming issues, some more notes to follow: > > > > > The 'xas_store' call during page cache scanning can potentially > > > translate 'xas' into the error state (with the reproducer provided > > > by the syzkaller the error code is -ENOMEM). However, there are no > > > further checks after the 'xas_store', and the next call of 'xas_next' > > > at the start of the scanning cycle doesn't increase the xa_index, > > > and the issue occurs. > > > > > > This patch will add the xarray state error checking after the > > > 'xas_store' and the corresponding result error code. It will > > > also add xarray state error checking via WARN_ON_ONCE macros, > > > to be sure that ENOMEM or other possible errors don't occur > > > at the places they shouldn't. > > > > Thanks for the additions here. 
> > I think it's worthwhile providing even more details about the specifics
> > of the race we are fixing and/or guarding against, to help people
> > understand how that -ENOMEM comes about if the do-while loop has
> > "Ensured" we have slots available (additionally, I think that comment
> > can be augmented).
> >
> > > Tested via syzbot.
> > >
> > > Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com
> > > Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc
> > > Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
> > > ---
> > > V1 -> V2: Add WARN_ON_ONCE error checking and comments
> > >
> > >  mm/khugepaged.c | 20 ++++++++++++++++++++
> > >  1 file changed, 20 insertions(+)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 92e6f56a932d..8b6580b13339 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -55,6 +55,7 @@ enum scan_result {
> > >  	SCAN_CGROUP_CHARGE_FAIL,
> > >  	SCAN_TRUNCATED,
> > >  	SCAN_PAGE_HAS_PRIVATE,
> > > +	SCAN_STORE_FAILED,
> > >  };
> >
> > I'm still reluctant to add a new error code for this, as this seems like
> > quite a rare race that requires OOM to trigger. I'd be happier just
> > reusing SCAN_FAIL, or something we might get some mileage out of later:
> > SCAN_OOM.
> >
> > Also, a reminder to update include/trace/events/huge_memory.h, if you go
> > that route.
> > >  #define CREATE_TRACE_POINTS
> > > @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > >  				goto xa_locked;
> > >  			}
> > >  			xas_store(&xas, hpage);
> > > +			if (xas_error(&xas)) {
> > > +				/* revert shmem_charge performed
> > > +				 * in the previous condition
> > > +				 */
> >
> > Nit: here, and in the following, I think the standard convention for a
> > multiline comment is to have an empty first and last line, e.g.:
> >
> > +				/*
> > +				 * revert shmem_charge performed
> > +				 * in the previous condition
> > +				 */
> >
> > Though checkpatch.pl --strict didn't seem to care.
> >
> > > +				mapping->nrpages--;
> > > +				shmem_uncharge(mapping->host, 1);
> > > +				result = SCAN_STORE_FAILED;
> > > +				goto xa_locked;
> > > +			}
> > >  			nr_none++;
> > >  			continue;
> > >  		}
> > > @@ -1992,6 +2002,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > >
> > >  		/* Finally, replace with the new page. */
> > >  		xas_store(&xas, hpage);
> > > +		/* We can't get an ENOMEM here (because the allocation happened before)
> > > +		 * but let's check for errors (XArray implementation can be
> > > +		 * changed in the future)
> > > +		 */
> > > +		WARN_ON_ONCE(xas_error(&xas));
> >
> > Nit: it's not just that the allocation happened before -- we need some
> > guarantee that we've been protected from concurrent removal. This is
> > what made me look at the file side.
> >
> > >  		continue;
> > > out_unlock:
> > >  		unlock_page(page);
> > > @@ -2029,6 +2044,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > >  	/* Join all the small entries into a single multi-index entry */
> > >  	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
> > >  	xas_store(&xas, hpage);
> > > +	/* Here we can't get an ENOMEM (because entries were
> > > +	 * previously allocated) But let's check for errors
> > > +	 * (XArray implementation can be changed in the future)
> > > +	 */
> > > +	WARN_ON_ONCE(xas_error(&xas));
> >
> > Ditto.
> >
> > Apologies that I won't be around to see this change through -- I'm just
> > out of time, and will be shutting my computer down tomorrow for 3
> > months. Sorry for the poor timing, for raising issues and then
> > disappearing. Hopefully I'm wrong and the file side isn't a concern.
> >
> > Best,
> > Zach
> >
> > > xa_locked:
> > >  	xas_unlock_irq(&xas);
> > > xa_unlocked:
> > > --
> > > 2.34.1

^ permalink raw reply	[flat|nested] 11+ messages in thread
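The failure path the patch above guards can be modeled outside the kernel. The following is a minimal userspace C sketch of the charge / store / revert-on-error pattern the first hunk introduces; the names `mapping_state`, `fake_store`, and `insert_hole_slot` are invented stand-ins for the xarray slot store and the shmem accounting, and the store's -ENOMEM models the allocation failure that syzkaller injects.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the accounting state the patch touches. */
struct mapping_state {
	long nrpages;  /* page-cache accounting, like mapping->nrpages */
	long charged;  /* blocks charged, like via shmem_charge() */
};

/*
 * Stand-in for xas_store(): slot allocation can fail under OOM, which is
 * the case the syzbot reproducer provokes with fault injection.
 */
static int fake_store(int inject_oom)
{
	return inject_oom ? -ENOMEM : 0;
}

/*
 * The shape of the fix: charge optimistically, attempt the store, and on
 * error revert the accounting before bailing out with an error code
 * (SCAN_STORE_FAILED in the patch).
 */
static int insert_hole_slot(struct mapping_state *m, int inject_oom)
{
	m->nrpages++;
	m->charged++;
	if (fake_store(inject_oom)) {
		m->nrpages--;  /* mirrors mapping->nrpages-- */
		m->charged--;  /* mirrors shmem_uncharge(mapping->host, 1) */
		return -ENOMEM;
	}
	return 0;
}
```

The point of the revert is that the accounting stays balanced whether or not the store succeeds; without it, a failed store would leak one charged block per failure.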
* Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
  2023-04-05 16:43 ` Zach O'Keefe
@ 2023-04-16 18:33 ` Andrew Morton
  2023-04-16 20:39   ` Ivan Orlov
  ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Andrew Morton @ 2023-04-16 18:33 UTC (permalink / raw)
To: Zach O'Keefe
Cc: Yang Shi, Ivan Orlov, himadrispandya, linux-kernel, linux-kernel-mentees, linux-mm, skhan, syzbot+9578faa5475acb35fa50, Mike Kravetz, Kirill A. Shutemov, Matthew Wilcox

Circling back to this fix...

The BUG() is obviously real. We're unsure that Ivan's fix is the best one.
We haven't identified a Fixes:, and as this report is against the 6.2
kernel, a cc:stable will be needed.

According to the syzbot bisection
(https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc),
this is present in linux-5.19, so it might predate Zach's
58ac9a8993a13ebc changes. But that bisection claim might be misleading.

And Zach is offline for a few months. So can people please take a look and
see if we can get this wrapped up?

Matthew, the assertion failure is in the

	VM_BUG_ON(index != xas.xa_index);

which was added in 77da9389b9d5f, so perhaps you could take a look?

Thanks.
* Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
  2023-04-16 18:33 ` Andrew Morton
@ 2023-04-16 20:39 ` Ivan Orlov
  0 siblings, 0 replies; 11+ messages in thread
From: Ivan Orlov @ 2023-04-16 20:39 UTC (permalink / raw)
To: Andrew Morton, Zach O'Keefe
Cc: Yang Shi, himadrispandya, linux-kernel, linux-kernel-mentees, linux-mm, skhan, syzbot+9578faa5475acb35fa50, Mike Kravetz, Kirill A. Shutemov, Matthew Wilcox

On 4/16/23 22:33, Andrew Morton wrote:
> Circling back to this fix...
>
> The BUG() is obviously real. We're unsure that Ivan's fix is the best
> one. We haven't identified a Fixes:, and as this report is against the 6.2
> kernel, a cc:stable will be needed.
>
> According to the syzbot bisection
> (https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc),
> this is present in linux-5.19, so it might predate Zach's
> 58ac9a8993a13ebc changes. But that bisection claim might be misleading.
>
> And Zach is offline for a few months. So can people please take a look
> and see if we can get this wrapped up?
>
> Matthew, the assertion failure is in the
>
> 	VM_BUG_ON(index != xas.xa_index);
>
> which was added in 77da9389b9d5f, so perhaps you could take a look?
>
> Thanks.

Hello, Andrew!

I have been unable to reproduce the problem with any of the existing
reproducers on the 3d7cb6b04c3f3115719235cc6866b10326de34cd commit, which
was detected by the syzkaller bisection. I also tried to test whether the
problem is reproducible on this commit via syzbot, but it did not detect
the problem. It's possible that the bisection claim is misleading, as the
issue may not be consistently reproducible.

Why did you mention the 58ac9a8993a13ebc commit? I thought 99cb0dbd47a15
was considered as a "Fixes". 99cb0dbd47a15 is older than 3d7cb6b04c3f3,
and the problematic code appeared there, so the problem could probably
appear in 99cb0dbd47a15 as well.

Please correct me if I'm missing something.

Kind regards,
Ivan Orlov
* Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
  2023-04-16 18:33 ` Andrew Morton
  2023-04-16 20:39   ` Ivan Orlov
@ 2023-04-16 23:04 ` Ivan Orlov
  0 siblings, 0 replies; 11+ messages in thread
From: Ivan Orlov @ 2023-04-16 23:04 UTC (permalink / raw)
To: Andrew Morton, Zach O'Keefe
Cc: Yang Shi, himadrispandya, linux-kernel, linux-kernel-mentees, linux-mm, skhan, syzbot+9578faa5475acb35fa50, Mike Kravetz, Kirill A. Shutemov, Matthew Wilcox

On 4/16/23 22:33, Andrew Morton wrote:
> Circling back to this fix...
>
> The BUG() is obviously real. We're unsure that Ivan's fix is the best
> one. We haven't identified a Fixes:, and as this report is against the 6.2
> kernel, a cc:stable will be needed.
>
> According to the syzbot bisection
> (https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc),
> this is present in linux-5.19, so it might predate Zach's
> 58ac9a8993a13ebc changes. But that bisection claim might be misleading.
>
> And Zach is offline for a few months. So can people please take a look
> and see if we can get this wrapped up?
>
> Matthew, the assertion failure is in the
>
> 	VM_BUG_ON(index != xas.xa_index);
>
> which was added in 77da9389b9d5f, so perhaps you could take a look?
>
> Thanks.

I tested the reproducers on the 99cb0dbd47a15 commit, and they do not
trigger the problematic condition of shared memory truncation or
hole-punching. I will investigate further, as there have been many changes
in khugepaged since the 99cb0dbd47a15 commit that could potentially affect
its behavior.
* Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
  2023-04-16 18:33 ` Andrew Morton
  2023-04-16 20:39   ` Ivan Orlov
  2023-04-16 23:04   ` Ivan Orlov
@ 2023-04-17  0:45 ` Hugh Dickins
  0 siblings, 0 replies; 11+ messages in thread
From: Hugh Dickins @ 2023-04-17 0:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Zach O'Keefe, Yang Shi, Ivan Orlov, himadrispandya, linux-kernel, linux-kernel-mentees, linux-mm, skhan, syzbot+9578faa5475acb35fa50, Mike Kravetz, Kirill A. Shutemov, Matthew Wilcox

On Sun, 16 Apr 2023, Andrew Morton wrote:
>
> Circling back to this fix...
>
> The BUG() is obviously real.

It's worth remembering that syzbot's reproducer involves artificially
injecting page allocation failures. So although the bug may be "real", it
is rather in the theoretical category, way down on my own list to look at,
and I'd say not at all urgent to fix.

Hugh

> We're unsure that Ivan's fix is the best
> one. We haven't identified a Fixes:, and as this report is against the 6.2
> kernel, a cc:stable will be needed.
>
> According to the syzbot bisection
> (https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc),
> this is present in linux-5.19, so it might predate Zach's
> 58ac9a8993a13ebc changes. But that bisection claim might be misleading.
>
> And Zach is offline for a few months. So can people please take a look
> and see if we can get this wrapped up?
>
> Matthew, the assertion failure is in the
>
> 	VM_BUG_ON(index != xas.xa_index);
>
> which was added in 77da9389b9d5f, so perhaps you could take a look?
>
> Thanks.
* Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
  2023-04-16 18:33 ` Andrew Morton
  ` (2 preceding siblings ...)
  2023-04-17  0:45 ` Hugh Dickins
@ 2023-04-17 18:28 ` Ivan Orlov
  2023-04-18 21:28   ` Andrew Morton
  3 siblings, 1 reply; 11+ messages in thread
From: Ivan Orlov @ 2023-04-17 18:28 UTC (permalink / raw)
To: Andrew Morton, Zach O'Keefe
Cc: Yang Shi, himadrispandya, linux-kernel, linux-kernel-mentees, linux-mm, skhan, syzbot+9578faa5475acb35fa50, Mike Kravetz, Kirill A. Shutemov, Matthew Wilcox

On 16.04.2023 22:33, Andrew Morton wrote:
> Circling back to this fix...
>
> The BUG() is obviously real. We're unsure that Ivan's fix is the best
> one. We haven't identified a Fixes:, and as this report is against the 6.2
> kernel, a cc:stable will be needed.
>
> According to the syzbot bisection
> (https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc),
> this is present in linux-5.19, so it might predate Zach's
> 58ac9a8993a13ebc changes. But that bisection claim might be misleading.
>
> And Zach is offline for a few months. So can people please take a look
> and see if we can get this wrapped up?
>
> Matthew, the assertion failure is in the
>
> 	VM_BUG_ON(index != xas.xa_index);
>
> which was added in 77da9389b9d5f, so perhaps you could take a look?
>
> Thanks.

Tested today on the latest rc7; the bug still exists.

Kind regards,
Ivan Orlov
* Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
  2023-04-17 18:28 ` Ivan Orlov
@ 2023-04-18 21:28 ` Andrew Morton
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2023-04-18 21:28 UTC (permalink / raw)
To: Ivan Orlov
Cc: Zach O'Keefe, Yang Shi, himadrispandya, linux-kernel, linux-kernel-mentees, linux-mm, skhan, syzbot+9578faa5475acb35fa50, Mike Kravetz, Kirill A. Shutemov, Matthew Wilcox

On Mon, 17 Apr 2023 22:28:51 +0400 Ivan Orlov <ivan.orlov0322@gmail.com> wrote:

> Tested today on the latest rc7, the bug still exists.

OK, thanks. I'll send this patch upstream during the next merge window.

If you, Zach, or someone else decides to remove SCAN_STORE_FAILED and
perhaps those WARN_ON_ONCE()s, then we can address these things later.
end of thread, other threads: [~2023-04-18 21:28 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-30 15:53 [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file Ivan Orlov
2023-03-31  1:33 ` Zach O'Keefe
2023-03-31  6:47   ` Ivan Orlov
2023-04-04  0:58     ` Yang Shi
2023-04-05 16:43       ` Zach O'Keefe
2023-04-16 18:33         ` Andrew Morton
2023-04-16 20:39           ` Ivan Orlov
2023-04-16 23:04           ` Ivan Orlov
2023-04-17  0:45           ` Hugh Dickins
2023-04-17 18:28           ` Ivan Orlov
2023-04-18 21:28             ` Andrew Morton