All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ivan Orlov <ivan.orlov0322@gmail.com>
To: Zach O'Keefe <zokeefe@google.com>
Cc: shy828301@gmail.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com,
	akpm@linux-foundation.org,
	linux-kernel-mentees@lists.linuxfoundation.org
Subject: Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
Date: Fri, 31 Mar 2023 10:47:06 +0400	[thread overview]
Message-ID: <d7354d2d-8464-2cdd-a42c-582ea518c76b@gmail.com> (raw)
In-Reply-To: <20230331013301.ecgkjymaf3ws6rfb@google.com>

On 3/31/23 05:33, Zach O'Keefe wrote:

> Thanks, Ivan.
> 
> In the process of reviewing this, I starting thinking if the !shmem case was
> also susceptible to a similar race, and I *think* it might be. Unfortunately, my
> time has ran out, and I haven't been able to validate ; I'm less familiar with
> the file-side of things.
> 
> The underlying problem is race with truncation/hole-punch  under OOM condition.
> The nice do-while loop near the top of collapse_file() attempts to avoid this
> scenario by making sure enough slots are available. However, when we drop xarray
> lock, we open ourselves up to concurrent removal + slot deletion. Those slots
> then need to be allocated again -- which under OOM condition is failable.
> 
> The syzbot reproducer picks on shmem, but I think this can occur for file as
> well. If we find a hole, we unlock the xarray and call
> page_cache_sync_readahead(), which if it succeeds, IIUC, will have allocated a
> new slot in our mapping pointing to the new page. We *then* locks the page. Only
> after the page is locked are we protected from concurrent removal (Note: this is
> what provides us protection in many of the xas_store() cases ; we've held the
> slot's contained page-lock since verifying the slot exists, protecting us from
> removal / reallocation races).
> 
> Maybe I'm just low on caffeine at the end of the day, and am missing something,
> but if I had more time, I'd be looking into the file-side some more to verify.
> Apologies that hasn't occurred to me until now ; I was looking at one of your
> comments and double-checked why I *thought* we were safe.
> 
> Anyways, irrespective of that looming issues, some more notes to follow:
> 
>> The 'xas_store' call during page cache scanning can potentially
>> translate 'xas' into the error state (with the reproducer provided
>> by the syzkaller the error code is -ENOMEM). However, there are no
>> further checks after the 'xas_store', and the next call of 'xas_next'
>> at the start of the scanning cycle doesn't increase the xa_index,
>> and the issue occurs.
>>
>> This patch will add the xarray state error checking after the
>> 'xas_store' and the corresponding result error code. It will
>> also add xarray state error checking via WARN_ON_ONCE macros,
>> to be sure that ENOMEM or other possible errors don't occur
>> at the places they shouldn't.
> 
> Thanks for the additions here. I think it's worthwhile providing even more
> details about the specifics of the race we are fixing and/or guarding against to
> help ppl understand how that -ENOMEM comes about if the do-while loop has
> "Ensured" we have slots available (additionally, I think that comment can be
> augmented).
> 
>> Tested via syzbot.
>>
>> Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com
>> Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc
>> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
>> ---
>> V1 -> V2: Add WARN_ON_ONCE error checking and comments
>>
>>   mm/khugepaged.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 92e6f56a932d..8b6580b13339 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -55,6 +55,7 @@ enum scan_result {
>>   	SCAN_CGROUP_CHARGE_FAIL,
>>   	SCAN_TRUNCATED,
>>   	SCAN_PAGE_HAS_PRIVATE,
>> +	SCAN_STORE_FAILED,
>>   };
> 
> I'm still reluctant to add a new error code for this as this seems like quite a
> rare race that requires OOM to trigger. I'd be happier just reusing SCAN_FAIL,
> or, something we might get some millage out of later: SCAN_OOM.
> 
> Also, a reminder to update include/trace/events/huge_memory.h, if you go that
> route.
> 
>>   
>>   #define CREATE_TRACE_POINTS
>> @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>>   					goto xa_locked;
>>   				}
>>   				xas_store(&xas, hpage);
>> +				if (xas_error(&xas)) {
>> +					/* revert shmem_charge performed
>> +					 * in the previous condition
>> +					 */
> 
> Nit: Here, and following, I think standard convention for multiline comment is
> to have an empty first and last line, eg:
> 
>   +					/*
>   +					 * revert shmem_charge performed
>   +					 * in the previous condition
>   +					 */
> 
> Though, checkpatch.pl --strict didn't seem to care.
> 
>> +					mapping->nrpages--;
>> +					shmem_uncharge(mapping->host, 1);
>> +					result = SCAN_STORE_FAILED;
>> +					goto xa_locked;
>> +				}
>>   				nr_none++;
>>   				continue;
>>   			}
>> @@ -1992,6 +2002,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>>   
>>   		/* Finally, replace with the new page. */
>>   		xas_store(&xas, hpage);
>> +		/* We can't get an ENOMEM here (because the allocation happened before)
>> +		 * but let's check for errors (XArray implementation can be
>> +		 * changed in the future)
>> +		 */
>> +		WARN_ON_ONCE(xas_error(&xas));
> 
> Nit: it's not just that allocation happened before -- need some guarantee we've
> been protected from concurrent removal. This is what made me look at the file
> side.
> 
>>   		continue;
>>   out_unlock:
>>   		unlock_page(page);
>> @@ -2029,6 +2044,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>>   	/* Join all the small entries into a single multi-index entry */
>>   	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
>>   	xas_store(&xas, hpage);
>> +	/* Here we can't get an ENOMEM (because entries were
>> +	 * previously allocated) But let's check for errors
>> +	 * (XArray implementation can be changed in the future)
>> +	 */
>> +	WARN_ON_ONCE(xas_error(&xas));
> 
> Ditto.
> 
> Apologies I won't be around to see this change through -- I'm just out of time,
> and will be shutting my computer down tomorrow for 3 months.  Sorry for the poor
> timing, for raising issues, then disappearing. Hopefully I'm wrong and the
> file-side isn't a concern.
> 
> Best,
> Zach
> 
>>   xa_locked:
>>   	xas_unlock_irq(&xas);
>>   xa_unlocked:
>> -- 
>> 2.34.1
>>

Hello, Zach! Thank you very much for the detailed review! I thought that 
locking is our guarantee to not get an -ENOMEM, but I didn't have the 
deep understanding of the problem's cause, due to the fact I'm just 
starting my memory management journey :) In any case, I will do a 
research about this problem, and hopefully after you get out of the 
vacation you will see a new patch fully fixes this problem. Have a nice 
vacation! Thanks again!
_______________________________________________
Linux-kernel-mentees mailing list
Linux-kernel-mentees@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

WARNING: multiple messages have this Message-ID (diff)
From: Ivan Orlov <ivan.orlov0322@gmail.com>
To: Zach O'Keefe <zokeefe@google.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, himadrispandya@gmail.com,
	skhan@linuxfoundation.org,
	linux-kernel-mentees@lists.linuxfoundation.org,
	shy828301@gmail.com,
	syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com
Subject: Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file
Date: Fri, 31 Mar 2023 10:47:06 +0400	[thread overview]
Message-ID: <d7354d2d-8464-2cdd-a42c-582ea518c76b@gmail.com> (raw)
In-Reply-To: <20230331013301.ecgkjymaf3ws6rfb@google.com>

On 3/31/23 05:33, Zach O'Keefe wrote:

> Thanks, Ivan.
> 
> In the process of reviewing this, I starting thinking if the !shmem case was
> also susceptible to a similar race, and I *think* it might be. Unfortunately, my
> time has ran out, and I haven't been able to validate ; I'm less familiar with
> the file-side of things.
> 
> The underlying problem is race with truncation/hole-punch  under OOM condition.
> The nice do-while loop near the top of collapse_file() attempts to avoid this
> scenario by making sure enough slots are available. However, when we drop xarray
> lock, we open ourselves up to concurrent removal + slot deletion. Those slots
> then need to be allocated again -- which under OOM condition is failable.
> 
> The syzbot reproducer picks on shmem, but I think this can occur for file as
> well. If we find a hole, we unlock the xarray and call
> page_cache_sync_readahead(), which if it succeeds, IIUC, will have allocated a
> new slot in our mapping pointing to the new page. We *then* locks the page. Only
> after the page is locked are we protected from concurrent removal (Note: this is
> what provides us protection in many of the xas_store() cases ; we've held the
> slot's contained page-lock since verifying the slot exists, protecting us from
> removal / reallocation races).
> 
> Maybe I'm just low on caffeine at the end of the day, and am missing something,
> but if I had more time, I'd be looking into the file-side some more to verify.
> Apologies that hasn't occurred to me until now ; I was looking at one of your
> comments and double-checked why I *thought* we were safe.
> 
> Anyways, irrespective of that looming issues, some more notes to follow:
> 
>> The 'xas_store' call during page cache scanning can potentially
>> translate 'xas' into the error state (with the reproducer provided
>> by the syzkaller the error code is -ENOMEM). However, there are no
>> further checks after the 'xas_store', and the next call of 'xas_next'
>> at the start of the scanning cycle doesn't increase the xa_index,
>> and the issue occurs.
>>
>> This patch will add the xarray state error checking after the
>> 'xas_store' and the corresponding result error code. It will
>> also add xarray state error checking via WARN_ON_ONCE macros,
>> to be sure that ENOMEM or other possible errors don't occur
>> at the places they shouldn't.
> 
> Thanks for the additions here. I think it's worthwhile providing even more
> details about the specifics of the race we are fixing and/or guarding against to
> help ppl understand how that -ENOMEM comes about if the do-while loop has
> "Ensured" we have slots available (additionally, I think that comment can be
> augmented).
> 
>> Tested via syzbot.
>>
>> Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com
>> Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc
>> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
>> ---
>> V1 -> V2: Add WARN_ON_ONCE error checking and comments
>>
>>   mm/khugepaged.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 92e6f56a932d..8b6580b13339 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -55,6 +55,7 @@ enum scan_result {
>>   	SCAN_CGROUP_CHARGE_FAIL,
>>   	SCAN_TRUNCATED,
>>   	SCAN_PAGE_HAS_PRIVATE,
>> +	SCAN_STORE_FAILED,
>>   };
> 
> I'm still reluctant to add a new error code for this as this seems like quite a
> rare race that requires OOM to trigger. I'd be happier just reusing SCAN_FAIL,
> or, something we might get some millage out of later: SCAN_OOM.
> 
> Also, a reminder to update include/trace/events/huge_memory.h, if you go that
> route.
> 
>>   
>>   #define CREATE_TRACE_POINTS
>> @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>>   					goto xa_locked;
>>   				}
>>   				xas_store(&xas, hpage);
>> +				if (xas_error(&xas)) {
>> +					/* revert shmem_charge performed
>> +					 * in the previous condition
>> +					 */
> 
> Nit: Here, and following, I think standard convention for multiline comment is
> to have an empty first and last line, eg:
> 
>   +					/*
>   +					 * revert shmem_charge performed
>   +					 * in the previous condition
>   +					 */
> 
> Though, checkpatch.pl --strict didn't seem to care.
> 
>> +					mapping->nrpages--;
>> +					shmem_uncharge(mapping->host, 1);
>> +					result = SCAN_STORE_FAILED;
>> +					goto xa_locked;
>> +				}
>>   				nr_none++;
>>   				continue;
>>   			}
>> @@ -1992,6 +2002,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>>   
>>   		/* Finally, replace with the new page. */
>>   		xas_store(&xas, hpage);
>> +		/* We can't get an ENOMEM here (because the allocation happened before)
>> +		 * but let's check for errors (XArray implementation can be
>> +		 * changed in the future)
>> +		 */
>> +		WARN_ON_ONCE(xas_error(&xas));
> 
> Nit: it's not just that allocation happened before -- need some guarantee we've
> been protected from concurrent removal. This is what made me look at the file
> side.
> 
>>   		continue;
>>   out_unlock:
>>   		unlock_page(page);
>> @@ -2029,6 +2044,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>>   	/* Join all the small entries into a single multi-index entry */
>>   	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
>>   	xas_store(&xas, hpage);
>> +	/* Here we can't get an ENOMEM (because entries were
>> +	 * previously allocated) But let's check for errors
>> +	 * (XArray implementation can be changed in the future)
>> +	 */
>> +	WARN_ON_ONCE(xas_error(&xas));
> 
> Ditto.
> 
> Apologies I won't be around to see this change through -- I'm just out of time,
> and will be shutting my computer down tomorrow for 3 months.  Sorry for the poor
> timing, for raising issues, then disappearing. Hopefully I'm wrong and the
> file-side isn't a concern.
> 
> Best,
> Zach
> 
>>   xa_locked:
>>   	xas_unlock_irq(&xas);
>>   xa_unlocked:
>> -- 
>> 2.34.1
>>

Hello, Zach! Thank you very much for the detailed review! I thought that 
locking is our guarantee to not get an -ENOMEM, but I didn't have the 
deep understanding of the problem's cause, due to the fact I'm just 
starting my memory management journey :) In any case, I will do a 
research about this problem, and hopefully after you get out of the 
vacation you will see a new patch fully fixes this problem. Have a nice 
vacation! Thanks again!


  reply	other threads:[~2023-03-31  6:47 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-30 15:53 [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file Ivan Orlov
2023-03-30 15:53 ` Ivan Orlov
2023-03-31  1:33 ` Zach O'Keefe via Linux-kernel-mentees
2023-03-31  1:33   ` Zach O'Keefe
2023-03-31  6:47   ` Ivan Orlov [this message]
2023-03-31  6:47     ` Ivan Orlov
2023-04-04  0:58   ` Yang Shi
2023-04-04  0:58     ` Yang Shi
2023-04-05 16:43     ` Zach O'Keefe via Linux-kernel-mentees
2023-04-05 16:43       ` Zach O'Keefe
2023-04-16 18:33       ` Andrew Morton
2023-04-16 18:33         ` Andrew Morton
2023-04-16 20:39         ` Ivan Orlov
2023-04-16 20:39           ` Ivan Orlov
2023-04-16 23:04         ` Ivan Orlov
2023-04-16 23:04           ` Ivan Orlov
2023-04-17  0:45         ` Hugh Dickins via Linux-kernel-mentees
2023-04-17  0:45           ` Hugh Dickins
2023-04-17 18:28         ` Ivan Orlov
2023-04-17 18:28           ` Ivan Orlov
2023-04-18 21:28           ` Andrew Morton
2023-04-18 21:28             ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d7354d2d-8464-2cdd-a42c-582ea518c76b@gmail.com \
    --to=ivan.orlov0322@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel-mentees@lists.linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=shy828301@gmail.com \
    --cc=syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.