Message-ID: <4f64d62f-c21d-b7c8-640e-d41742bbbe7b@redhat.com>
Date: Thu, 16 Feb 2023 18:00:51 +0100
To: Peter Xu
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Axel Rasmussen,
 Mike Rapoport, Andrew Morton, Andrea Arcangeli, Nadav Amit,
 Muhammad Usama Anjum
References: <20230215210257.224243-1-peterx@redhat.com>
 <7eb2bce9-d0b1-a0e3-8be3-f28d858a61a0@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH] mm/uffd: UFFD_FEATURE_WP_ZEROPAGE

>>
>> There are various reasons why I think a UFFD_FEATURE_WP_UNPOPULATED, using
>> PTE markers, would be more beneficial:
>>
>> 1) It would be applicable to anon hugetlb
>
> Anon hugetlb should already work with non ptes with the markers?
>

... really?
I thought we'd do the whole pte marker handling only when
dealing with hugetlb/shmem. Interesting, thanks.

(we could skip population in QEMU in that case as well -- we always do it
for now)

>> 2) It would be applicable even when the zeropage is disallowed
>> (mm_forbids_zeropage())
>
> Do you mean s390 can disable zeropage with mm_uses_skeys()? So far uffd-wp
> doesn't support s390 yet, I'm not sure whether we over worried on this
> effect.
>
> Or is there any other projects / ideas that potentially can enlarge forbid
> zero pages to more contexts?

I think it was shown that zeropages can be used to build covert channels
(similar to memory deduplication, because it effectively is memory
deduplication). It's mentioned as a note in [1] under VII. A. ("Only
Deduplicate Zero Pages.")

[1] https://www.ndss-symposium.org/wp-content/uploads/2022-81-paper.pdf

>
>> 3) It would be possible to optimize even without the huge zeropage, by
>> using a PMD marker.
>
> This patch doesn't need huge zeropage being exist.

Yes, and for that reason I think it may perform worse than what we
already have in some cases. Instead of populating a single PMD you'll
have to fill a full PTE table.

>
>> 4) It would be possible to optimize even on the PUD level using a PMD
>> marker.
>
> I think 3+4 is in general an interesting idea on using pte markers on
> higher than pte levels, but that needs more changes.
>
> Firstly, keep using pte markers is somehow preallocating the pgtables, so a
> side effect of it could be speeding up future faults because they'll all
> split into pmd locks and read doesn't need to fault at all, only writes.
>
> Imagine when you hit a page fault on a pmd marker, it means you'll need to
> spread that "marker" information to child ptes and you must - it moves the
> slow operation of WP into future page faults in some way. In some cases
> (I'd say, most cases..) that's not wanted. The same to PUDs.

Right, but user space already has that option (see below).
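
To put rough numbers on the "single PMD vs. full PTE table" point under 3),
here is a small back-of-the-envelope sketch. It assumes an x86-64-style
layout (4 KiB base pages, 512 entries per page table, so one PMD entry spans
2 MiB) and a range aligned to 2 MiB; the helper names are made up for
illustration, and a PMD-level marker is only the hypothetical discussed in
this thread, not an existing kernel feature.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64-style geometry; other architectures/configs differ. */
#define PAGE_SIZE_BYTES   4096ULL
#define ENTRIES_PER_TABLE 512ULL
#define PMD_SPAN_BYTES    (PAGE_SIZE_BYTES * ENTRIES_PER_TABLE) /* 2 MiB */

/* Integer division rounding up. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/* PTE entries written when every base page gets its own entry
 * (shared zeropage mapping or PTE marker). */
static uint64_t pte_entries_needed(uint64_t len)
{
    return div_round_up(len, PAGE_SIZE_BYTES);
}

/* PTE tables (one base page each) that must exist to hold those entries,
 * assuming the range starts on a 2 MiB boundary. */
static uint64_t pte_tables_needed(uint64_t len)
{
    return div_round_up(len, PMD_SPAN_BYTES);
}

/* Entries written if one (hypothetical) PMD-level marker covered 2 MiB. */
static uint64_t pmd_entries_needed(uint64_t len)
{
    return div_round_up(len, PMD_SPAN_BYTES);
}

/* Memory consumed by the PTE tables themselves. */
static uint64_t pte_table_bytes(uint64_t len)
{
    return pte_tables_needed(len) * PAGE_SIZE_BYTES;
}
```

Under these assumptions, covering 1 GiB of untouched memory at PTE
granularity means writing 262144 entries and allocating 512 PTE tables
(2 MiB of page-table memory), versus writing 512 PMD entries with no PTE
tables at all.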
>
>>
>> Especially when uffd-wp'ing large ranges that are possibly all unpopulated
>> (thinking about the existing VM background snapshot use case either with
>> untouched memory or with things like free page reporting), we might neither
>> be reading nor writing that memory any time soon.
>
> Right, I think that's a trade-off. But I still think large portion of
> totally unpopulated memory should be rare case rather than majority, or am
> I wrong? Not to mention that requires a more involved changeset to the
> kernel.
>
> So what I proposed here is the (AFAIU) simplest solution towards providing
> such a feature in a complete form. I think we have chance to implement it
> in other ways like pte markers, but that's something we can work upon, and
> so far I'm not sure how much benefit we can get out of it yet.
>

What you propose here can already be achieved by user space fairly easily
(in fact, the QEMU implementation could be further sped up using
MADV_POPULATE_READ). Usually, we only add kernel support for something user
space can already do when there are very good reasons to (performance).

Using PTE markers would provide a real advantage IMHO for some users
(e.g., background snapshots), where we might want to avoid populating
zeropages/page tables as far as we possibly can if the VM memory is
mostly untouched.

Naturally, I wonder if UFFD_FEATURE_WP_ZEROPAGE is really worth it. Is
there another good reason to combine the populate zeropage+wp that I am
missing (e.g., atomicity by doing both in one operation)?

-- 
Thanks,

David / dhildenb
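
For reference, the user-space approach mentioned above (populate with
MADV_POPULATE_READ, then write-protect via uffd-wp) might look roughly like
the sketch below. This is not QEMU's code; the function names are made up
for illustration, error handling is minimal, and it assumes a kernel and
headers with uffd-wp for anonymous memory (Linux 5.7+) and
MADV_POPULATE_READ (Linux 5.14+). Opening a userfaultfd may also require
privileges or UFFD_USER_MODE_ONLY depending on system configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22 /* value from the Linux 5.14+ uapi headers */
#endif

/* Open a userfaultfd and negotiate write-protect support.
 * Returns the fd on success or a negative errno. */
static int uffd_open_wp(void)
{
    int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd < 0)
        return -errno;

    struct uffdio_api api = {
        .api = UFFD_API,
        .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
    };
    if (ioctl(fd, UFFDIO_API, &api) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

/* Register [addr, addr+len) for uffd-wp, populate it for reading (which
 * maps the shared zeropage into untouched anonymous memory), and then
 * write-protect it. Returns 0 on success or a negative errno. */
static int populate_and_wp(int uffd, void *addr, size_t len)
{
    assert(uffd >= 0);

    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_REGISTER_MODE_WP,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
        return -errno;

    if (madvise(addr, len, MADV_POPULATE_READ) < 0)
        return -errno;

    struct uffdio_writeprotect wp = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_WRITEPROTECT_MODE_WP,
    };
    if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) < 0)
        return -errno;
    return 0;
}

/* Tiny usage example on a fresh anonymous mapping.
 * Returns 0 on success or a negative errno if any step is unsupported. */
static int populate_and_wp_demo(void)
{
    size_t len = 16 * (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -errno;

    int rc = uffd_open_wp();
    if (rc >= 0) {
        int fd = rc;
        rc = populate_and_wp(fd, p, len);
        close(fd);
    }
    munmap(p, len);
    return rc;
}
```

Populating and write-protecting are two separate calls here, so anything
that zaps the range in between would leave it unprotected again; that gap
is what the atomicity question above is about.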