From: "Yin, Fengwei" <fengwei.yin@intel.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: <linux-mm@kvack.org>, <akpm@linux-foundation.org>,
	<willy@infradead.org>, <yuzhao@google.com>,
	<ryan.roberts@arm.com>, <ying.huang@intel.com>
Subject: Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
Date: Mon, 1 May 2023 13:50:25 +0800
Message-ID: <610aea06-e453-29de-6342-fca1fd2e21ae@intel.com>
In-Reply-To: <20230429084653.vnmionbhnodbbd2w@box.shutemov.name>

Hi Kirill,

On 4/29/2023 4:46 PM, Kirill A. Shutemov wrote:
> On Sat, Apr 29, 2023 at 04:32:34PM +0800, Yin, Fengwei wrote:
>> Hi Kirill,
>>
>> On 4/28/2023 10:02 PM, Kirill A. Shutemov wrote:
>>> On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
>>>> Hi Kirill,
>>>>
>>>> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
>>>>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>>>>>> free_transhuge_page() acquires split queue lock then check
>>>>>> whether the THP was added to deferred list or not.
>>>>>>
>>>>>> It's safe to check whether the THP is in deferred list or not.
>>>>>>    When code hit free_transhuge_page(), there is no one tries
>>>>>>    to update the folio's _deferred_list.
>>>>>>
>>>>>>    If folio is not in deferred_list, it's safe to check without
>>>>>>    acquiring lock.
>>>>>>
>>>>>>    If folio is in deferred_list, the other node in deferred_list
>>>>>>    adding/deleteing doesn't impact the return value of
>>>>>>    list_epmty(@folio->_deferred_list).
>>>>>
>>>>> Typo.
>>>>>
>>>>>>
>>>>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>>>>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>>>>>> see the 61% split_queue_lock contention:
>>>>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>>>>>     release_pages
>>>>>>    - 70.93% release_pages
>>>>>>       - 61.42% free_transhuge_page
>>>>>>          + 60.77% _raw_spin_lock_irqsave
>>>>>>
>>>>>> With this patch applied, the split_queue_lock contention is less
>>>>>> than 1%.
>>>>>>
>>>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>>>>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>> ---
>>>>>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>>>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 032fb0ef9cd1..c620f1f12247 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>>>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>>>>  	unsigned long flags;
>>>>>>  
>>>>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>>>> -	if (!list_empty(&folio->_deferred_list)) {
>>>>>> +	/*
>>>>>> +	 * At this point, there is no one trying to queue the folio
>>>>>> +	 * to deferred_list. folio->_deferred_list is not possible
>>>>>> +	 * being updated.
>>>>>> +	 *
>>>>>> +	 * If folio is already added to deferred_list, add/delete to/from
>>>>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>>>>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>>>>>> +	 * acquiring the lock.
>>>>>> +	 *
>>>>>> +	 * If folio is not in deferred_list, it's safe to check without
>>>>>> +	 * acquiring the lock.
>>>>>> +	 */
>>>>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>>>>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>>>
>>>>> Recheck under lock?
>>>> In function deferred_split_scan(), there is the following code block:
>>>>                 if (folio_try_get(folio)) {
>>>>                         list_move(&folio->_deferred_list, &list);
>>>>                 } else {
>>>>                         /* We lost race with folio_put() */
>>>>                         list_del_init(&folio->_deferred_list);
>>>>                         ds_queue->split_queue_len--;
>>>>                 }
>>>>
>>>> I am wondering what kind of "lost race with folio_put()" this can be.
>>>>
>>>> My understanding is that it's not necessary to handle this case here
>>>> because free_transhuge_page() will handle it once the folio's refcount
>>>> drops to zero. But I must be missing something here. Thanks.
>>>
>>> free_transhuge_page() gets called when the refcount is already zero. Both
>>> deferred_split_scan() and free_transhuge_page() can see the page with zero
>>> refcount. The check makes deferred_split_scan() leave the page to
>>> free_transhuge_page().
>>>
>> If deferred_split_scan() leaves the page to free_transhuge_page(), is it
>> necessary to do
>>         list_del_init(&folio->_deferred_list);
>>         ds_queue->split_queue_len--;
>>
>> Can these two lines be left to free_transhuge_page() as well? Thanks.
> 
> I *think* (my cache is cold on deferred split) we can. But since we
> already hold the lock, why not take care of it? It makes your change more
> efficient.
Thanks a lot for your confirmation. I just wanted to make sure I understand
the race here correctly (I didn't notice this part of the code before Ying
pointed it out).
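
For reference, the pattern discussed in this thread (an unlocked list_empty()
check on the free path, followed by the recheck under the lock that Kirill
asked about) can be sketched in user-space C. This is only an illustration
under stated assumptions, not the kernel code: struct split_queue, struct
item, queue_item() and unqueue_on_free() are hypothetical names, a pthread
mutex stands in for split_queue_lock, and the small list helpers only mimic
the kernel's list_head API. The unlocked read is what the patch marks with
data_race() in the kernel; portable user-space code would need an atomic or
similar for that read.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n->next = n;
}

/* A node that points at itself is not on any list. */
static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the node and re-initialise it so list_is_empty() is true again. */
static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-in for the deferred split queue. */
struct split_queue {
	pthread_mutex_t lock;
	struct list_node head;
	unsigned long len;
};

/* Hypothetical stand-in for a folio with a _deferred_list member. */
struct item {
	struct list_node deferred;
};

/* Queue the item if it is not queued yet; always done under the lock. */
static void queue_item(struct split_queue *q, struct item *it)
{
	pthread_mutex_lock(&q->lock);
	if (list_is_empty(&it->deferred)) {
		list_add_tail_node(&it->deferred, &q->head);
		q->len++;
	}
	pthread_mutex_unlock(&q->lock);
}

/*
 * Free-path removal. Returns true if this caller unqueued the item.
 * Assumption: nobody can queue the item any more once we get here.
 */
static bool unqueue_on_free(struct split_queue *q, struct item *it)
{
	bool removed = false;

	/* Unlocked fast path: an item that was never queued skips the lock. */
	if (list_is_empty(&it->deferred))
		return false;

	pthread_mutex_lock(&q->lock);
	/* Recheck: a scanner holding the lock may have unqueued it already. */
	if (!list_is_empty(&it->deferred)) {
		list_del_init_node(&it->deferred);
		q->len--;
		removed = true;
	}
	pthread_mutex_unlock(&q->lock);
	return removed;
}
```

The recheck keeps the length counter consistent if a scanner has already
done the list_del_init() and decrement between the unlocked check and the
lock acquisition, which is the "lost race with folio_put()" case quoted above.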


Regards
Yin, Fengwei

> 

