From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9FF51C46CD2 for ; Wed, 24 Jan 2024 10:52:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C554D6B0074; Wed, 24 Jan 2024 05:52:30 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BB6B06B0075; Wed, 24 Jan 2024 05:52:30 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AA5FE6B0078; Wed, 24 Jan 2024 05:52:30 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 92CCA6B0074 for ; Wed, 24 Jan 2024 05:52:30 -0500 (EST) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 1AF3740383 for ; Wed, 24 Jan 2024 10:52:30 +0000 (UTC) X-FDA: 81713890860.20.04A6B7A Received: from out-185.mta0.migadu.com (out-185.mta0.migadu.com [91.218.175.185]) by imf28.hostedemail.com (Postfix) with ESMTP id 0713EC000B for ; Wed, 24 Jan 2024 10:52:27 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=Q0UrPpdx; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf28.hostedemail.com: domain of gang.li@linux.dev designates 91.218.175.185 as permitted sender) smtp.mailfrom=gang.li@linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706093548; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=hSGlNhweEQPHhtTtW35MZtCA/Y0flZ+/CGgGW317yTg=; b=2Iteruyh2E37k7GYeAlYdNVdlO87s1J8xus4h5oCJaoTlp8mZbZCW+jDFT4hS4iNus0r8d XzOwwHJYLOFWwP6Z2BTgNTEqztw6OjWYf7o1JBrerAIE/C+8OGhYP7v/5lxPnM4mYlRSbc U82iw/enYGhZRkLD0MtzUWbFBv13XXc= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=Q0UrPpdx; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf28.hostedemail.com: domain of gang.li@linux.dev designates 91.218.175.185 as permitted sender) smtp.mailfrom=gang.li@linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706093548; a=rsa-sha256; cv=none; b=7lOY+Iplt4Au/l8C3PW1kGyNG9ByrJcGP77XjUUgru77RsUxtSIHApYt2Shg/pw39VdceJ W6U7oZx0GLT//dwV+j55Oj+o8snwj47MK+ZMghSp3VorY50hz6MORyLUHfRPF2OuCcMwit CY39nfcdv9r55/9/DY4RyhgJM2Afc4E= Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1706093545; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hSGlNhweEQPHhtTtW35MZtCA/Y0flZ+/CGgGW317yTg=; b=Q0UrPpdxJ6FQa5MfWqrJnctEYjgv0R/38rUgx+1uUIwNMaa9gGQicl5F1PDnDNTlrqGeAr 6jALov1KSMUWA3jb+3+9ahqeLsn+lN6JuwjlQXfn6Plw5U0AexpYJ/Q2z/YVw3Hr1l5Iaa PvwCc4jPNOVE48V8zGkgeZUF9nwQLgc= Date: Wed, 24 Jan 2024 18:52:19 +0800 MIME-Version: 1.0 Subject: Re: [PATCH v4 7/7] hugetlb: parallelize 1G hugetlb initialization Content-Language: en-US To: Muchun Song Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, ligang.bdlg@bytedance.com, David Hildenbrand , David Rientjes , Mike Kravetz , Andrew Morton , Tim Chen References: <20240118123911.88833-1-gang.li@linux.dev> <20240118123911.88833-8-gang.li@linux.dev> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Gang Li In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 0713EC000B X-Stat-Signature: fsw4kez7tr6do9nirk1yqeapq1k3xy1j X-HE-Tag: 1706093547-678023 X-HE-Meta: U2FsdGVkX1/uW0TkJ+IdRHGK+UPuBcOPQ/oB6/0ohq0dObgODTVist/oLkzA7HYMlJljSUfw+MTQaNnyqd2pGyw5wa8ZL2txCmusS9MiFPWn9zdf44bkdBuJYGz/lCUzjRmMHb56Kmfq+3pq3u7EcuuHFL5W9UW/shcub+gPLrn01BVKaxjfB7AO9leaOtDqEreqDxAjaRf0T8EFeBiWf5JlQzrqKKSmoYxdiLq2s1Vr1hvszJQt+YnWV8pX5a7HGtrpjhtk4UtfLYoByGMgHtL/7vIGKKbVevyvcSBA2snNzk49T2AfhRXt4IUZxJsmkwJ8bbq6Tz8WubrlQIhEZnQ0x2Wip5EJzKCHuuZi4RJOZeok8VpJ514peZvfXbgBrSBMsH5iVfU8eazYQSKBeAqgKChxGz82BoHx0/NJEn8nI/G4xNrLyXNXazHURqcZVha3tZ7iSNbMc49UUIiyQaeTkz3vAmXwjYJBJU3iDuQzW49lXe2EIKKb2nXjry9MpFSAvEbnAvddEV2Iap9AucnrUawukCeDsXM0kX1LeBfP+MoBadhpJWXLY4BaOeb4Emv8mAptHuKq/CtP10bV4JSitRaR9yI1K0z+0LUxMTBhip2m4VbIdHcyFSOJSN0mNje1Kzv3hZJ+8Mpr6/7gMphKiPHR8Kwuqf+62PuzE9lwZT0ggY6gVRZrIpwcnfHGgxd3CRfqBmqLwFVm/9Gfmd8vIfUhXaI96edoDNFzSrjtN05sMEi0p3unetorUipdvaJrpQeQfS3JfF21YejOBhZ/ijOLqxlgZtDTsIvwshYWyaHpc6gUFRl0EJDiaPrMoJ/Td63W8vx9NSRT3HMRZ+QoMnto7aPyII4mwRaQ3i2a7GY+qubmf6zt5JfmBT6XwOT+zsNgv+dHtjRXoCyrEX1OG/HnX+TCmBNmOwJCuLBmY0FDSuHNrThJ1g5kpAuD5vIpVu6/QRrPcOOxKJI wGCEZLLP apQlO4TBqNl3mIWqw8dD4+DAZlC2zYOb77/aiXzbelnjIm/bpoRKV9ZLiu282UMNLSGOytM/kjM1tZV9bLZGWo8ZohTGu6NTkiMmpM1s/rdtjWG98yPlSGe2aww== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024/1/24 17:23, Muchun Song wrote: > On 2024/1/18 20:39, Gang Li wrote: >> Optimizing the initialization speed of 1G huge pages through >> parallelization. >> >> 1G hugetlbs are allocated from bootmem, a process that is already >> very fast and does not currently require optimization. Therefore, >> we focus on parallelizing only the initialization phase in >> `gather_bootmem_prealloc`. >> >> Here are some test results: >>          test          no patch(ms)   patched(ms)   saved >>   ------------------- -------------- ------------- -------- >>    256c2t(4 node) 1G           4745          2024   57.34% > > What does "256c2t" mean? A machine with 256 core and 2T memory. > >>    128c1t(2 node) 1G           3358          1712   49.02% >>        12t        1G          77000         18300   76.23% >> >> Signed-off-by: Gang Li >> Tested-by: David Rientjes >> --- >>   include/linux/hugetlb.h |  2 +- >>   mm/hugetlb.c            | 42 +++++++++++++++++++++++++++++++++-------- >>   2 files changed, 35 insertions(+), 9 deletions(-) >> >> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h >> index c1ee640d87b1..77b30a8c6076 100644 >> --- a/include/linux/hugetlb.h >> +++ b/include/linux/hugetlb.h >> @@ -178,7 +178,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct >> vm_area_struct *vma, >>   struct address_space *hugetlb_page_mapping_lock_write(struct page >> *hpage); >>   extern int sysctl_hugetlb_shm_group; >> -extern struct list_head huge_boot_pages; >> +extern struct list_head huge_boot_pages[MAX_NUMNODES]; >>   /* arch callbacks */ >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index 9b348ba418f5..2f4b77630ada 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -69,7 +69,7 @@ static bool hugetlb_cma_folio(struct folio *folio, >> unsigned int order) >>   #endif >>   static unsigned long hugetlb_cma_size __initdata; >> -__initdata LIST_HEAD(huge_boot_pages); >> +__initdata struct list_head huge_boot_pages[MAX_NUMNODES]; >>   /* for command line parsing */ >>   static struct hstate * __initdata parsed_hstate; >> @@ -3301,7 +3301,7 @@ int alloc_bootmem_huge_page(struct hstate *h, >> int nid) >>   int __alloc_bootmem_huge_page(struct hstate *h, int nid) >>   { >>       struct huge_bootmem_page *m = NULL; /* initialize for clang */ >> -    int nr_nodes, node; >> +    int nr_nodes, node = nid; > > Why not use nid directly in the following list_add()? `node` may be changed in `for_each_node_mask_to_alloc`. > >>       /* do node specific alloc */ >>       if (nid != NUMA_NO_NODE) { >> @@ -3339,7 +3339,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, >> int nid) >>           huge_page_size(h) - PAGE_SIZE); >>       /* Put them into a private list first because mem_map is not up >> yet */ >>       INIT_LIST_HEAD(&m->list); >> -    list_add(&m->list, &huge_boot_pages); >> +    list_add(&m->list, &huge_boot_pages[node]); >>       m->hstate = h; >>       return 1; >>   } >> @@ -3390,8 +3390,6 @@ static void __init >> prep_and_add_bootmem_folios(struct hstate *h, >>       /* Send list for bulk vmemmap optimization processing */ >>       hugetlb_vmemmap_optimize_folios(h, folio_list); >> -    /* Add all new pool pages to free lists in one lock cycle */ >> -    spin_lock_irqsave(&hugetlb_lock, flags); >>       list_for_each_entry_safe(folio, tmp_f, folio_list, lru) { >>           if (!folio_test_hugetlb_vmemmap_optimized(folio)) { >>               /* >> @@ -3404,23 +3402,27 @@ static void __init >> prep_and_add_bootmem_folios(struct hstate *h, >>                       HUGETLB_VMEMMAP_RESERVE_PAGES, >>                       pages_per_huge_page(h)); >>           } >> +        /* Subdivide locks to achieve better parallel performance * >> +        spin_lock_irqsave(&hugetlb_lock, flags); >>           __prep_account_new_huge_page(h, folio_nid(folio)); >>           enqueue_hugetlb_folio(h, folio); >> +        spin_unlock_irqrestore(&hugetlb_lock, flags); >>       } >> -    spin_unlock_irqrestore(&hugetlb_lock, flags); >>   } >>   /* >>    * Put bootmem huge pages into the standard lists after mem_map is up. >>    * Note: This only applies to gigantic (order > MAX_PAGE_ORDER) pages. >>    */ >> -static void __init gather_bootmem_prealloc(void) >> +static void __init __gather_bootmem_prealloc(unsigned long start, >> unsigned long end, void *arg) > > This function name could be gather_bootmem_prealloc_node. > LGTM. >> + >>   { >> +    int nid = start; >>       LIST_HEAD(folio_list); >>       struct huge_bootmem_page *m; >>       struct hstate *h = NULL, *prev_h = NULL; >> -    list_for_each_entry(m, &huge_boot_pages, list) { >> +    list_for_each_entry(m, &huge_boot_pages[nid], list) { >>           struct page *page = virt_to_page(m); >>           struct folio *folio = (void *)page; >> @@ -3453,6 +3455,22 @@ static void __init gather_bootmem_prealloc(void) >>       prep_and_add_bootmem_folios(h, &folio_list); >>   } >> +static void __init gather_bootmem_prealloc(void) >> +{ >> +    struct padata_mt_job job = { >> +        .thread_fn    = __gather_bootmem_prealloc, >> +        .fn_arg        = NULL, >> +        .start        = 0, >> +        .size        = num_node_state(N_MEMORY), >> +        .align        = 1, >> +        .min_chunk    = 1, >> +        .max_threads    = num_node_state(N_MEMORY), >> +        .numa_aware    = true, >> +    }; >> + >> +    padata_do_multithreaded(&job); >> +} >> + >>   static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate >> *h, int nid) >>   { >>       unsigned long i; >> @@ -3602,6 +3620,14 @@ static void __init >> hugetlb_hstate_alloc_pages(struct hstate *h) >>           return; >>       } >> +    /* hugetlb_hstate_alloc_pages will be called many times, init >> huge_boot_pages once*/ > > s/init/initialize/g > > And you miss a black right before "*/". OK > >> +    if (huge_boot_pages[0].next == NULL) { > > It it not intuitive. I'd like to use a 'initialied' variable Would it make the code look a bit redundant? > to indicate whether it has been initialized. BTW, it can be > marked as __initdata. > OK