Date: Wed, 15 Oct 2025 15:01:39 -0700
To: 
mm-commits@vger.kernel.org,ziy@nvidia.com,zhengqi.arch@bytedance.com,shakeel.butt@linux.dev,ryan.roberts@arm.com,roman.gushchin@linux.dev,npache@redhat.com,muchun.song@linux.dev,mhocko@suse.com,lorenzo.stoakes@oracle.com,liam.howlett@oracle.com,lance.yang@linux.dev,hughd@google.com,harry.yoo@oracle.com,hannes@cmpxchg.org,dev.jain@arm.com,david@redhat.com,baolin.wang@linux.alibaba.com,baohua@kernel.org,songmuchun@bytedance.com,akpm@linux-foundation.org
From: Andrew Morton
Subject: + mm-thp-use-folio_batch-to-handle-thp-splitting-in-deferred_split_scan.patch added to mm-new branch
Message-Id: <20251015220140.43A0BC4CEF8@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: mm-commits@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:

The patch titled
     Subject: mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
has been added to the -mm mm-new branch.  Its filename is
     mm-thp-use-folio_batch-to-handle-thp-splitting-in-deferred_split_scan.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-thp-use-folio_batch-to-handle-thp-splitting-in-deferred_split_scan.patch

This patch will later appear in the mm-new branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews.  Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Muchun Song
Subject: mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
Date: Wed, 15 Oct 2025 14:35:32 +0800

The maintenance of the folio->_deferred_list is intricate because it's
reused in a local list.

Here are some peculiarities:

   1) When a folio is removed from its split queue and added to a local
      on-stack list in deferred_split_scan(), the ->split_queue_len isn't
      updated, leading to an inconsistency between it and the actual
      number of folios in the split queue.

   2) When the folio is split via split_folio() later, it's removed from
      the local list while holding the split queue lock.  At this time,
      the lock is not needed as it is not protecting anything.

   3) To handle the race condition with a third-party freeing or migrating
      the preceding folio, we must ensure there's always one safe (with
      raised refcount) folio before it, by delaying its folio_put().  More
      details can be found in commit e66f3185fa04 ("mm/thp: fix deferred
      split queue not partially_mapped").  It's rather tricky.

We can use the folio_batch infrastructure to handle this clearly.  In this
case, ->split_queue_len will be consistent with the real number of folios
in the split queue.  If list_empty(&folio->_deferred_list) returns false,
it's clear the folio must be in its split queue (not in a local list
anymore).

In the future, we will reparent LRU folios during memcg offline to
eliminate dying memory cgroups, which requires reparenting the split queue
to its parent first.  So this patch prepares for using
folio_split_queue_lock_irqsave() as the memcg may change then.

Link: https://lkml.kernel.org/r/4f5d7a321c72dfe65e0e19a3f89180d5988eae2e.1760509767.git.zhengqi.arch@bytedance.com
Signed-off-by: Muchun Song
Signed-off-by: Qi Zheng
Reviewed-by: Zi Yan
Acked-by: David Hildenbrand
Acked-by: Shakeel Butt
Cc: Baolin Wang
Cc: Barry Song
Cc: Dev Jain
Cc: Harry Yoo
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Lance Yang
Cc: Liam Howlett
Cc: Lorenzo Stoakes
Cc: Mariano Pache
Cc: Michal Hocko
Cc: Muchun Song
Cc: Roman Gushchin
Cc: Ryan Roberts
Signed-off-by: Andrew Morton
---

 mm/huge_memory.c |   81 +++++++++++++++++++++++----------------------
 1 file changed, 42 insertions(+), 39 deletions(-)

--- a/mm/huge_memory.c~mm-thp-use-folio_batch-to-handle-thp-splitting-in-deferred_split_scan
+++ a/mm/huge_memory.c
@@ -3860,9 +3860,18 @@ static int __folio_split(struct folio *f
 		struct lruvec *lruvec;
 		int expected_refs;
 
-		if (old_order > 1 &&
-		    !list_empty(&folio->_deferred_list)) {
-			ds_queue->split_queue_len--;
+		if (old_order > 1) {
+			if (!list_empty(&folio->_deferred_list)) {
+				ds_queue->split_queue_len--;
+				/*
+				 * Reinitialize page_deferred_list after
+				 * removing the page from the split_queue,
+				 * otherwise a subsequent split will see list
+				 * corruption when checking the
+				 * page_deferred_list.
+				 */
+				list_del_init(&folio->_deferred_list);
+			}
 			if (folio_test_partially_mapped(folio)) {
 				folio_clear_partially_mapped(folio);
 				mod_mthp_stat(old_order,
@@ -4276,35 +4285,40 @@ static unsigned long deferred_split_scan
 {
 	struct deferred_split *ds_queue;
 	unsigned long flags;
-	LIST_HEAD(list);
-	struct folio *folio, *next, *prev = NULL;
-	int split = 0, removed = 0;
+	struct folio *folio, *next;
+	int split = 0, i;
+	struct folio_batch fbatch;
+
+	folio_batch_init(&fbatch);
 
+retry:
 	ds_queue = split_queue_lock_irqsave(sc->nid, sc->memcg, &flags);
 	/* Take pin on all head pages to avoid freeing them under us */
 	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
 							_deferred_list) {
 		if (folio_try_get(folio)) {
-			list_move(&folio->_deferred_list, &list);
-		} else {
+			folio_batch_add(&fbatch, folio);
+		} else if (folio_test_partially_mapped(folio)) {
 			/* We lost race with folio_put() */
-			if (folio_test_partially_mapped(folio)) {
-				folio_clear_partially_mapped(folio);
-				mod_mthp_stat(folio_order(folio),
-					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
-			}
-			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+			folio_clear_partially_mapped(folio);
+			mod_mthp_stat(folio_order(folio),
+				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 		}
+		list_del_init(&folio->_deferred_list);
+		ds_queue->split_queue_len--;
 		if (!--sc->nr_to_scan)
 			break;
+		if (!folio_batch_space(&fbatch))
+			break;
 	}
 	split_queue_unlock_irqrestore(ds_queue, flags);
 
-	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+	for (i = 0; i < folio_batch_count(&fbatch); i++) {
 		bool did_split = false;
 		bool underused = false;
+		struct deferred_split *fqueue;
 
+		folio = fbatch.folios[i];
 		if (!folio_test_partially_mapped(folio)) {
 			/*
 			 * See try_to_map_unused_to_zeropage(): we cannot
@@ -4327,38 +4341,27 @@ static unsigned long deferred_split_scan
 		}
 		folio_unlock(folio);
 next:
+		if (did_split || !folio_test_partially_mapped(folio))
+			continue;
 		/*
-		 * split_folio() removes folio from list on success.
 		 * Only add back to the queue if folio is partially mapped.
 		 * If thp_underused returns false, or if split_folio fails
 		 * in the case it was underused, then consider it used and
 		 * don't add it back to split_queue.
 		 */
-		if (did_split) {
-			; /* folio already removed from list */
-		} else if (!folio_test_partially_mapped(folio)) {
-			list_del_init(&folio->_deferred_list);
-			removed++;
-		} else {
-			/*
-			 * That unlocked list_del_init() above would be unsafe,
-			 * unless its folio is separated from any earlier folios
-			 * left on the list (which may be concurrently unqueued)
-			 * by one safe folio with refcount still raised.
-			 */
-			swap(folio, prev);
+		fqueue = folio_split_queue_lock_irqsave(folio, &flags);
+		if (list_empty(&folio->_deferred_list)) {
+			list_add_tail(&folio->_deferred_list, &fqueue->split_queue);
+			fqueue->split_queue_len++;
 		}
-		if (folio)
-			folio_put(folio);
+		split_queue_unlock_irqrestore(fqueue, flags);
 	}
+	folios_put(&fbatch);
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	list_splice_tail(&list, &ds_queue->split_queue);
-	ds_queue->split_queue_len -= removed;
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
-
-	if (prev)
-		folio_put(prev);
+	if (sc->nr_to_scan && !list_empty(&ds_queue->split_queue)) {
+		cond_resched();
+		goto retry;
+	}
 
 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
_

Patches currently in -mm which might be from songmuchun@bytedance.com are

mm-thp-replace-folio_memcg-with-folio_memcg_charged.patch
mm-thp-introduce-folio_split_queue_lock-and-its-variants.patch
mm-thp-use-folio_batch-to-handle-thp-splitting-in-deferred_split_scan.patch
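
For readers who want to see the shape of the batched scan outside the kernel, here is a rough userspace sketch. It is not the kernel code. Every name in it (struct item, struct queue, struct batch, scan(), BATCH_SIZE, the split_ok and still_partial flags) is hypothetical, and the lock is modelled as a plain flag. It only illustrates the three properties the changelog relies on: the queue length is adjusted in the same locked section that dequeues an item, processing happens with the lock dropped, and an item is put back only if it is not already on a list.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_SIZE 4	/* hypothetical capacity, chosen small for illustration */

/* Circular doubly linked list; a node that points to itself is not queued. */
struct list { struct list *prev, *next; };

static void list_init(struct list *l) { l->prev = l->next = l; }
static bool list_is_empty(const struct list *l) { return l->next == l; }

static void list_append(struct list *n, struct list *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and re-initialise, so list_is_empty(n) is true afterwards. */
static void list_remove(struct list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct item {
	struct list link;	/* first member: a link pointer is also an item pointer */
	bool split_ok;		/* stand-in for a successful split */
	bool still_partial;	/* stand-in for "still partially mapped" */
};

struct queue { struct list head; int len; bool locked; };
struct batch { unsigned int nr; struct item *items[BATCH_SIZE]; };

/* The lock is modelled as a flag so that misuse trips an assertion. */
static void queue_lock(struct queue *q) { assert(!q->locked); q->locked = true; }
static void queue_unlock(struct queue *q) { assert(q->locked); q->locked = false; }

static void queue_init(struct queue *q)
{
	list_init(&q->head);
	q->len = 0;
	q->locked = false;
}

static void queue_add(struct queue *q, struct item *it)
{
	queue_lock(q);
	list_append(&it->link, &q->head);
	q->len++;
	queue_unlock(q);
}

/* Scan up to *budget items in batches; returns how many were "split". */
static int scan(struct queue *q, int *budget)
{
	struct batch b;
	int split = 0;

	while (*budget > 0 && !list_is_empty(&q->head)) {
		b.nr = 0;
		queue_lock(q);
		while (!list_is_empty(&q->head)) {
			struct item *it = (struct item *)q->head.next;

			/* Dequeue and fix up len together, under the lock. */
			list_remove(&it->link);
			q->len--;
			b.items[b.nr++] = it;
			if (--*budget == 0 || b.nr == BATCH_SIZE)
				break;
		}
		queue_unlock(q);

		/* Process the batch with the lock dropped. */
		for (unsigned int i = 0; i < b.nr; i++) {
			struct item *it = b.items[i];

			if (it->split_ok) {
				split++;
				continue;
			}
			if (!it->still_partial)
				continue;
			/* Requeue only if nothing else queued it meanwhile. */
			queue_lock(q);
			if (list_is_empty(&it->link)) {
				list_append(&it->link, &q->head);
				q->len++;
			}
			queue_unlock(q);
		}
	}
	return split;
}
```

In this model an item is either on the queue or in the on-stack batch, never on a second list, so q->len always matches the number of queued items and an empty link reliably means "not queued".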