From: Peng Zhang
To: Liam.Howlett@oracle.com, corbet@lwn.net, akpm@linux-foundation.org,
	willy@infradead.org, brauner@kernel.org, surenb@google.com,
	michael.christie@oracle.com, peterz@infradead.org,
	mathieu.desnoyers@efficios.com, npiggin@gmail.com, avagin@gmail.com
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Peng Zhang
Subject: [PATCH v2 6/6] fork: Use __mt_dup() to duplicate maple tree in dup_mmap()
Date: Wed, 30 Aug 2023 20:56:54 +0800
Message-Id: <20230830125654.21257-7-zhangpeng.00@bytedance.com>
In-Reply-To: <20230830125654.21257-1-zhangpeng.00@bytedance.com>
References: <20230830125654.21257-1-zhangpeng.00@bytedance.com>

Use __mt_dup() to duplicate the old maple tree in dup_mmap(), and then
directly modify the VMA entries in the new maple tree, which improves
performance. The benefit grows with the number of VMAs.

byte-unixbench[1] includes a "spawn" test that can be used to measure
the performance of fork().
I modified it slightly so that it can run with different numbers of
VMAs.

Below are the test numbers. There are 21 VMAs by default. The first row
gives the number of added VMAs. The next two rows give the number of
fork() calls completed in 10 seconds, and the last row is the relative
change. These numbers differ from the test results in v1 because this
time the benchmark is bound to a CPU, which makes the numbers more
stable.

Increment of VMAs:         0      100      200      400      800     1600     3200     6400
6.5.0-next-20230829:  111878    75531    53683    35282    20741    11317     6110     3158
Apply this patchset:  114531    85420    64541    44592    28660    16371     9038     4831
Change:               +2.37%  +13.09%  +20.23%  +26.39%  +38.18%  +44.66%  +47.92%  +52.98%

[1] https://github.com/kdlucas/byte-unixbench/tree/master

Signed-off-by: Peng Zhang
---
 kernel/fork.c | 34 ++++++++++++++++++++++++++--------
 mm/mmap.c     | 14 ++++++++++++--
 2 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 3b6d20dfb9a8..e6299adefbd8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -650,7 +650,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	int retval;
 	unsigned long charge = 0;
 	LIST_HEAD(uf);
-	VMA_ITERATOR(old_vmi, oldmm, 0);
 	VMA_ITERATOR(vmi, mm, 0);
 
 	uprobe_start_dup_mmap();
@@ -678,17 +677,39 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		goto out;
 	khugepaged_fork(mm, oldmm);
 
-	retval = vma_iter_bulk_alloc(&vmi, oldmm->map_count);
-	if (retval)
+	/* Use __mt_dup() to efficiently build an identical maple tree. */
+	retval = __mt_dup(&oldmm->mm_mt, &mm->mm_mt, GFP_NOWAIT | __GFP_NOWARN);
+	if (unlikely(retval))
 		goto out;
 
 	mt_clear_in_rcu(vmi.mas.tree);
-	for_each_vma(old_vmi, mpnt) {
+	for_each_vma(vmi, mpnt) {
 		struct file *file;
 
 		vma_start_write(mpnt);
 		if (mpnt->vm_flags & VM_DONTCOPY) {
 			vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
+
+			/*
+			 * Since the new tree is exactly the same as the old one,
+			 * we need to remove the unneeded VMAs.
+			 */
+			mas_store(&vmi.mas, NULL);
+
+			/*
+			 * Even removing an entry may require memory allocation,
+			 * and if removal fails, we use XA_ZERO_ENTRY to mark
+			 * from which VMA it failed. The case of encountering
+			 * XA_ZERO_ENTRY will be handled in exit_mmap().
+			 */
+			if (unlikely(mas_is_err(&vmi.mas))) {
+				retval = xa_err(vmi.mas.node);
+				mas_reset(&vmi.mas);
+				if (mas_find(&vmi.mas, ULONG_MAX))
+					mas_store(&vmi.mas, XA_ZERO_ENTRY);
+				goto loop_out;
+			}
+
 			continue;
 		}
 		charge = 0;
@@ -750,8 +771,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 			hugetlb_dup_vma_private(tmp);
 
 		/* Link the vma into the MT */
-		if (vma_iter_bulk_store(&vmi, tmp))
-			goto fail_nomem_vmi_store;
+		mas_store(&vmi.mas, tmp);
 
 		mm->map_count++;
 		if (!(tmp->vm_flags & VM_WIPEONFORK))
@@ -778,8 +798,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	uprobe_end_dup_mmap();
 	return retval;
 
-fail_nomem_vmi_store:
-	unlink_anon_vmas(tmp);
 fail_nomem_anon_vma_fork:
 	mpol_put(vma_policy(tmp));
 fail_nomem_policy:
diff --git a/mm/mmap.c b/mm/mmap.c
index b56a7f0c9f85..dfc6881be81c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3196,7 +3196,11 @@ void exit_mmap(struct mm_struct *mm)
 	arch_exit_mmap(mm);
 
 	vma = mas_find(&mas, ULONG_MAX);
-	if (!vma) {
+	/*
+	 * If dup_mmap() fails to remove a VMA marked VM_DONTCOPY,
+	 * xa_is_zero(vma) may be true.
+	 */
+	if (!vma || xa_is_zero(vma)) {
 		/* Can happen if dup_mmap() received an OOM */
 		mmap_read_unlock(mm);
 		return;
@@ -3234,7 +3238,13 @@ void exit_mmap(struct mm_struct *mm)
 		remove_vma(vma, true);
 		count++;
 		cond_resched();
-	} while ((vma = mas_find(&mas, ULONG_MAX)) != NULL);
+		vma = mas_find(&mas, ULONG_MAX);
+		/*
+		 * If xa_is_zero(vma) is true, it means that subsequent VMAs
+		 * do not need to be removed. Can happen if dup_mmap() fails to
+		 * remove a VMA marked VM_DONTCOPY.
+		 */
+	} while (vma != NULL && !xa_is_zero(vma));
 
 	BUG_ON(count != mm->map_count);
 
-- 
2.20.1
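
The duplicate-then-fix-up flow of this patch can be modelled in plain
userspace C. The sketch below is an illustration only, under stated
assumptions: struct toy_vma, struct toy_tree, TOY_ZERO and the toy_*()
helpers are hypothetical names invented here, a fixed array stands in
for the maple tree, and the fail_at parameter simulates an allocation
failure while removing a VM_DONTCOPY entry. It is not kernel code and
does not use the maple tree API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 16

/* Hypothetical stand-in for a VMA; not a kernel type. */
struct toy_vma {
	int dontcopy;	/* plays the role of VM_DONTCOPY */
	int id;
};

/* Sentinel playing the role of XA_ZERO_ENTRY: "stop tearing down here". */
static struct toy_vma toy_zero_entry;
#define TOY_ZERO (&toy_zero_entry)

/* Hypothetical stand-in for the tree: an ordered array of entries. */
struct toy_tree {
	struct toy_vma *slot[MAX_SLOTS];
	size_t n;
};

/* Analogue of __mt_dup(): copy the whole structure in one step. */
static void toy_dup(const struct toy_tree *old, struct toy_tree *dst)
{
	assert(old->n <= MAX_SLOTS);
	memcpy(dst->slot, old->slot, old->n * sizeof(old->slot[0]));
	dst->n = old->n;
}

/*
 * Walk the duplicated tree and fix up each entry in place.
 * Entries marked dontcopy are removed (set to NULL); all others are
 * replaced by a copy stored in clones[]. If removal "fails" at index
 * fail_at, that slot is marked with TOY_ZERO and -1 is returned, so
 * every slot from there on still refers to the parent's entries.
 */
static int toy_fixup(struct toy_tree *t, struct toy_vma *clones, size_t fail_at)
{
	for (size_t i = 0; i < t->n; i++) {
		struct toy_vma *parent = t->slot[i];

		if (parent->dontcopy) {
			if (i == fail_at) {
				t->slot[i] = TOY_ZERO;
				return -1;
			}
			t->slot[i] = NULL;
			continue;
		}
		clones[i] = *parent;
		t->slot[i] = &clones[i];
	}
	return 0;
}

/*
 * Analogue of the exit_mmap() change: count the entries that belong to
 * this tree, skipping removed slots and stopping at the sentinel so the
 * parent's entries behind it are never touched.
 */
static size_t toy_teardown(const struct toy_tree *t)
{
	size_t count = 0;

	for (size_t i = 0; i < t->n; i++) {
		if (t->slot[i] == TOY_ZERO)
			break;
		if (t->slot[i])
			count++;
	}
	return count;
}
```

For example, with four parent entries of which the second and fourth are
marked dontcopy, a successful toy_fixup() leaves two cloned entries for
toy_teardown() to release. If removal fails at the second entry instead,
toy_teardown() releases only the first clone and stops at TOY_ZERO,
leaving the remaining slots, which still point at the parent's entries,
alone.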