From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
Date: Fri, 21 Oct 2022 11:11:05 +0200
To: Peter Xu
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, Andrew Morton, Shuah Khan, Hugh Dickins,
 Vlastimil Babka, Andrea Arcangeli, "Matthew Wilcox (Oracle)",
 Jason Gunthorpe, John Hubbard
Subject: Re: [PATCH v1 6/7] mm/ksm: convert break_ksm() to use walk_page_range_vma()
Message-ID: <948fbf8f-1f0c-ace4-f7b3-64f2bbca00f8@redhat.com>
References: <20220930141931.174362-1-david@redhat.com>
 <20220930141931.174362-7-david@redhat.com>
 <9a84440f-1462-2193-7dd6-c84e8bb22232@redhat.com>

On 06.10.22 21:28, Peter Xu wrote:
> On Thu, Oct 06, 2022 at 11:20:42AM +0200, David Hildenbrand wrote:
>>>> +int break_ksm_pud_entry(pud_t *pud, unsigned long addr, unsigned long next,
>>>> +                       struct mm_walk *walk)
>>>> +{
>>>> +        /* We only care about page tables to walk to a single base page. */
>>>> +        if (pud_leaf(*pud) || !pud_present(*pud))
>>>> +                return 1;
>>>> +        return 0;
>>>> +}
>>>
>>> Is this needed?  I thought the pgtable walker handles this already.
>>>
>>> [...]
>>>
>>
>> Most probably yes. I was trying to avoid PUD splits, but I guess we
>> simply should not care in VMAs that are considered by KSM (MERGEABLE).
>> Most probably never ever happens.
>
> I was surprised the split is the default approach; didn't really notice
> that before.  Yeah maybe better to keep it.

Interestingly, one callback reduces the benchmark result by 100-200 MiB/s.
With only break_ksm_pmd_entry(), I get ~4900 MiB/s instead of ~5010 MiB/s
(~2%).

I came to the conclusion that we really shouldn't have to worry about PUD
THPs here: it could only be a file PUD, and splitting that only zaps the
entry, but doesn't PMD- or PTE-map it. Also, I think we will barely see
large PUD THPs in a mergeable mapping ... :)

[...]

>> My main motivation is to remove most of that GUP hackery here, which is
>> 1) Getting a reference on a page and waiting for migration to finish,
>>    even though both are unnecessary.
>> 2) As we don't have sufficient control, we added FOLL_MIGRATION hacks to
>>    MM core to work around limitations in the GUP-based approach.
>
> I saw one thing of adding FOLL_MIGRATION from Hugh was to have a hint for
> follow_page() users:
>
>   I'd have preferred to avoid another flag, and do it every time, in case
>   someone else makes the same easy mistake..
>
> Though..

The important thing, I think, is that FOLL_MIGRATION really only applies to
follow_page(). In the case of "modern" GUP, we will simply wait for
migration entries, handle swap entries, ... when triggering a page fault.

>
>> 3) We rely on the legacy follow_page() interface that we should really
>>    get rid of in the long term.
>
> ..this is part of an effort to remove follow_page()? More context will be
> helpful in that case.

The comment from Hugh is another example of why follow_page() adds
complexity. One might wonder how pages in the swapcache, device coherent
pages, ... would have to be handled.

Short-term, I want to clean up GUP. Long-term, we might want to consider
removing follow_page() completely.

[...]

>>
>> Yes, we have to extend the page walking code, but it's just the natural,
>> non-hacky way of doing it.
>>
>> Regarding the 4% performance degradation (if I hadn't added the
>> benchmarks, nobody would know and probably nobody would care ;) ), I am
>> not quite sure why that is the case. We're just walking page tables after
>> all in both cases. Maybe the callback-based implementation of the
>> pagewalk code is less efficient, but we might be able to improve that
>> implementation if we really care about performance here. Maybe removing
>> break_ksm_pud_entry() already improves the numbers slightly.
>
> Yeah it could be the walker is just slower.  And for !ksm walking your code
> should be faster when hitting migration entries, but that should really be
> rare anyway.
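For context, the walk_page_range_vma() callback pattern being discussed has
roughly the following shape in isolation. This is a minimal sketch with
made-up names (inspect_pmd_entry, sketch_walk_ops, walk_single_page); the
actual conversion is in the patch below:

    #include <linux/mm.h>
    #include <linux/pagewalk.h>

    /*
     * Called for every PMD entry covering the walked range, including huge
     * PMDs -- which is why break_ksm_pmd_entry() below checks pmd_leaf().
     */
    static int inspect_pmd_entry(pmd_t *pmd, unsigned long addr,
                                 unsigned long next, struct mm_walk *walk)
    {
            /*
             * Returning 0 continues the walk; a positive return value stops
             * the walk and is propagated back by walk_page_range_vma(),
             * which is how "found a KSM page" can be reported to the caller.
             */
            return 0;
    }

    static const struct mm_walk_ops sketch_walk_ops = {
            .pmd_entry = inspect_pmd_entry,
    };

    /* Caller must hold the mmap_lock; [addr, addr + 1) covers one base page. */
    static int walk_single_page(struct vm_area_struct *vma, unsigned long addr)
    {
            return walk_page_range_vma(vma, addr, addr + 1, &sketch_walk_ops,
                                       NULL);
    }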
I have the following right now:


From 7f767f9e9e673a29793cd35f1c3d66ed593b67cb Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Mon, 25 Jul 2022 10:36:20 +0200
Subject: [PATCH] mm/ksm: convert break_ksm() to use walk_page_range_vma()

FOLL_MIGRATION exists only for the purpose of break_ksm(), and actually,
there is not even the need to wait for the migration to finish, we only
want to know if we're dealing with a KSM page.

Using follow_page() just to identify a KSM page overcomplicates GUP code.
Let's use walk_page_range_vma() instead, because we don't actually care
about the page itself, we only need to know a single property -- no need
to even grab a reference.

So, get rid of follow_page() usage such that we can get rid of
FOLL_MIGRATION now and eventually be able to get rid of follow_page() in
the future.

In my setup (AMD Ryzen 9 3900X), running the KSM selftest to test unmerge
performance on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048), this results in
a performance degradation of ~2% (old: ~5010 MiB/s, new: ~4900 MiB/s).
I don't think we particularly care for now.

Interestingly, the benchmark reduction is due to the single callback.
Adding a second callback (e.g., pud_entry()) reduces the benchmark by
another 100-200 MiB/s.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/ksm.c | 49 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index c6f58aa6e731..5cdb852ff132 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -39,6 +39,7 @@
 #include <linux/freezer.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/pagewalk.h>
 
 #include <asm/tlbflush.h>
 #include "internal.h"
@@ -419,6 +420,39 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
         return atomic_read(&mm->mm_users) == 0;
 }
 
+static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
+                        struct mm_walk *walk)
+{
+        struct page *page = NULL;
+        spinlock_t *ptl;
+        pte_t *pte;
+        int ret;
+
+        if (pmd_leaf(*pmd) || !pmd_present(*pmd))
+                return 0;
+
+        pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+        if (pte_present(*pte)) {
+                page = vm_normal_page(walk->vma, addr, *pte);
+        } else if (!pte_none(*pte)) {
+                swp_entry_t entry = pte_to_swp_entry(*pte);
+
+                /*
+                 * As KSM pages remain KSM pages until freed, no need to wait
+                 * here for migration to end.
+                 */
+                if (is_migration_entry(entry))
+                        page = pfn_swap_entry_to_page(entry);
+        }
+        ret = page && PageKsm(page);
+        pte_unmap_unlock(pte, ptl);
+        return ret;
+}
+
+static const struct mm_walk_ops break_ksm_ops = {
+        .pmd_entry = break_ksm_pmd_entry,
+};
+
 /*
  * We use break_ksm to break COW on a ksm page by triggering unsharing,
  * such that the ksm page will get replaced by an exclusive anonymous page.
@@ -434,21 +468,16 @@ static inline bool ksm_test_exit(struct mm_struct *mm)
  */
 static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
 {
-        struct page *page;
         vm_fault_t ret = 0;
 
         do {
-                bool ksm_page = false;
+                int ksm_page;
 
                 cond_resched();
-                page = follow_page(vma, addr,
-                                FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
-                if (IS_ERR_OR_NULL(page))
-                        break;
-                if (PageKsm(page))
-                        ksm_page = true;
-                put_page(page);
-
+                ksm_page = walk_page_range_vma(vma, addr, addr + 1,
+                                               &break_ksm_ops, NULL);
+                if (WARN_ON_ONCE(ksm_page < 0))
+                        return ksm_page;
                 if (!ksm_page)
                         return 0;
                 ret = handle_mm_fault(vma, addr,
-- 
2.37.3


-- 
Thanks,

David / dhildenb
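As an illustration of what the unmerge benchmark in the commit message
exercises from userspace: MADV_UNMERGEABLE on a previously merged range ends
up calling break_ksm() for every page in that range. This is a rough sketch,
not the actual ksm_tests selftest:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
            /* 2 GiB, matching the "-s 2048" benchmark configuration. */
            const size_t size = 2UL << 30;
            char *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (mem == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }

            /* Fill with identical content so the pages are merge candidates. */
            memset(mem, 0x42, size);

            /* Let ksmd merge the range (requires /sys/kernel/mm/ksm/run = 1). */
            if (madvise(mem, size, MADV_MERGEABLE))
                    perror("MADV_MERGEABLE");

            /* ... give ksmd time to scan and merge the range ... */

            /*
             * Unmerging is what break_ksm() serves: each shared KSM page in
             * the range gets replaced by an exclusive anonymous page again.
             */
            if (madvise(mem, size, MADV_UNMERGEABLE))
                    perror("MADV_UNMERGEABLE");

            munmap(mem, size);
            return 0;
    }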