Date: Sun, 19 Dec 2021 18:59:51 +0100
From: David Hildenbrand
Organization: Red Hat
To: Linus Torvalds, Nadav Amit
Cc: Jason Gunthorpe, Linux Kernel Mailing List, Andrew Morton,
    Hugh Dickins, David Rientjes, Shakeel Butt, John Hubbard,
    Mike Kravetz, Mike Rapoport, Yang Shi, "Kirill A . Shutemov",
    Matthew Wilcox, Vlastimil Babka, Jann Horn, Michal Hocko,
    Rik van Riel, Roman Gushchin, Andrea Arcangeli, Peter Xu,
    Donald Dutile, Christoph Hellwig, Oleg Nesterov, Jan Kara,
    Linux-MM, "open list:KERNEL SELFTEST FRAMEWORK",
    "open list:DOCUMENTATION"
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)

On 19.12.21 18:44, Linus Torvalds wrote:
> David, you said that you were working on some alternative model. Is it
> perhaps along these same lines below?
>
> I was thinking that a bit in the page tables to say "this page is
> exclusive to this VM" would be a really simple thing to deal with for
> fork() and swapout and friends.
>
> But we don't have such a bit in general, since many architectures have
> very limited sets of SW bits, and even when they exist we've spent
> them on things like UFFD_WP.
>
> But the more I think about the "bit doesn't even have to be in the
> page tables", the more I think maybe that's the solution.
>
> A bit in the 'struct page' itself.
>

Exactly what I am prototyping right now.

> For hugepages, you'd have to distribute said bit when you split the
> hugepage.

Yes, that's one tricky part ...

>
> But other than that it looks quite simple: anybody who does a virtual
> copy will inevitably be messing with the page refcount, so clearing
> the "exclusive ownership" bit wouldn't be costly: the 'struct page'
> cacheline is already getting dirtied.
>
> Or what was your model you were implying you were thinking about in
> your other email? You said

I'm playing with the idea of not always setting the bit during COW, but
only setting it on GUP request (either manually if possible or via
FOLL_UNSHARE). That's a bit more tricky, but it allows for decoupling
that approach completely from the page_pin() counter.

fork() is allowed to clear the bit if page_count() == 1 and share the
page. So no GUP -> no fork() performance changes (!). Otherwise the bit
can only vanish if we swapout/migrate the page: in which case there are
no additional GUP/references on the page that rely on it!

The bit can be set directly if we have to copy the page in the fault
handler (COW or unshare).
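To make the two rules above a bit more concrete, here is a rough
userspace sketch. It is only a toy model, not kernel code: toy_page,
toy_fork_share() and toy_fault_copy() are made-up names, the refcount
field merely stands in for page_count(), and "fork() has to copy when it
may not clear the bit" is my reading of the rule, not something that is
settled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for 'struct page'; hypothetical, for illustration only. */
struct toy_page {
	int refcount;   /* models page_count() */
	bool exclusive; /* models the proposed "exclusive" bit */
};

/*
 * fork(): may clear the bit and share the page only if nobody else
 * holds a reference. Returns true if the page was shared, false if the
 * caller would have to copy it instead (assumption, see above).
 */
static bool toy_fork_share(struct toy_page *page)
{
	assert(page->refcount >= 1);

	if (page->exclusive && page->refcount != 1)
		return false; /* extra references may rely on the bit */

	page->exclusive = false;
	page->refcount++; /* reference held by the child's mapping */
	return true;
}

/*
 * Fault handler had to copy (COW or unshare): the fresh copy has a
 * single reference, so the bit can be set directly.
 */
static struct toy_page toy_fault_copy(struct toy_page *old)
{
	struct toy_page copy = { .refcount = 1, .exclusive = true };

	assert(old->refcount >= 1);
	old->refcount--; /* the faulting mapping drops the old page */
	return copy;
}
```

With no extra reference, toy_fork_share() behaves exactly like today's
fork() sharing; only a page that has the bit set and an additional
reference takes the copy path.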

Outside of COW/unshare code, the bit can only be set if
page_count() == 1 and we sync against fork(). (And that's the problem
for gup-fast-only that I'm investigating right now, because it would
then always have to fall back to the slow variant if the bit isn't
already set.)

So the bit can "vanish" whenever there is no additional reference on the
page. GUP syncs against fork() and can thereby set the bit/request to
set the bit.

I'm trying to decouple it completely from the page_pin() counter to also
be able to handle FOLL_GET (O_DIRECT reproducers, unfortunately)
correctly.

Not set in stone, just an idea I'm playing with right now ... and I
have to triple-check whether

* page is PTE mapped in the page table I'm walking
* page_count() == 1

really means that "this is the only reference". I do strongly believe
so ... :)

-- 
Thanks,

David / dhildenb