From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F151C433F5 for ; Fri, 21 Jan 2022 12:04:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5F9A36B007D; Fri, 21 Jan 2022 07:04:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5A9C06B007E; Fri, 21 Jan 2022 07:04:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 470A36B0085; Fri, 21 Jan 2022 07:04:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0121.hostedemail.com [216.40.44.121]) by kanga.kvack.org (Postfix) with ESMTP id 342D66B007D for ; Fri, 21 Jan 2022 07:04:15 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id F23A98E2CF for ; Fri, 21 Jan 2022 12:04:14 +0000 (UTC) X-FDA: 79054161228.13.297B28D Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf23.hostedemail.com (Postfix) with ESMTP id 5D0DF140015 for ; Fri, 21 Jan 2022 12:04:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1642766653; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=plqbI4uQzIC/HoJcD7z9G3CYldKJvWvJe9PZR7YBaFw=; b=Jtl9hB66IjJDMWdGKVnfu3P11Do/D3uh3ebYyEeRddOZozVKxyssAOmeWTWQEPNffcdFYY 67UMpnOMB3I1B7Ax8bJST8yPZE9kv7qBsshQ6rGPFb8iUyP/8+aI+iMed0444HHg+/hMRZ dJcMAFfz/ZxwM3M4Rwuook9u89H4az8= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-192-6-tU_z-IPkmNLcGUe31IcA-1; Fri, 21 Jan 2022 07:04:12 -0500 X-MC-Unique: 6-tU_z-IPkmNLcGUe31IcA-1 Received: by mail-wm1-f70.google.com with SMTP id bg16-20020a05600c3c9000b0034bea12c043so9347312wmb.7 for ; Fri, 21 Jan 2022 04:04:12 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent :content-language:to:cc:references:from:organization:subject :in-reply-to:content-transfer-encoding; bh=plqbI4uQzIC/HoJcD7z9G3CYldKJvWvJe9PZR7YBaFw=; b=KwHqdn4dXeit7QceK4PcnnOUSaTkcBWmzN1BYyjtG41PWjr2TvR19CafWxaPjNL7bI fbRQmkh7iRnMUvra5JnY5oCLsrbfDHvTUKjPWLFh0VAelKNxiY2vj6bw4zEifZzOQle8 YzDY3OQ7AQXkuFVhgjnMoJrjxhMtWRJQKmMN7FR3NKxXPbGZo9eJO6mWUztyVYHep6an n7N9MiNMguYDifEzOkrh5Lq8WzZzYCaWTW0vL2S6wOhxoHIfW70eQjcaT9ZLZX9aiDWH 7UG6spSOCzPhakN8mkKGd6NoR+Mb3J+0f7meoNr419Sv00yedBndcvHs+0PuUhW5z/zJ R1TA== X-Gm-Message-State: AOAM530saqZmXhn7WlKalXciUkuyhEIA+0QLRq4jZItwNBZNVTOxPsn4 ThYQMY9of4sVwOezy7X4W1n9OATHW1k06zZ9A78gPYNvRxZhbdlgM8Q5Ml4pX0vdAj+jrIe8scG 6TGjmhSIK5c0= X-Received: by 2002:a1c:e913:: with SMTP id q19mr442688wmc.87.1642766651200; Fri, 21 Jan 2022 04:04:11 -0800 (PST) X-Google-Smtp-Source: ABdhPJwvtpL32Y4x8vX37wPiF8/xHj0kcayPEkakqi21Uxs2dZI2m36gZPvmSDiRnKhy6IhJsvH1JA== X-Received: by 2002:a1c:e913:: with SMTP id q19mr442652wmc.87.1642766650942; Fri, 21 Jan 2022 04:04:10 -0800 (PST) Received: from ?IPV6:2003:cb:c709:a200:adf9:611a:39a8:435a? (p200300cbc709a200adf9611a39a8435a.dip0.t-ipconnect.de. [2003:cb:c709:a200:adf9:611a:39a8:435a]) by smtp.gmail.com with ESMTPSA id 8sm6727265wrz.57.2022.01.21.04.04.09 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 21 Jan 2022 04:04:10 -0800 (PST) Message-ID: <7744b904-4803-1c8e-3a1c-eebd30f2da91@redhat.com> Date: Fri, 21 Jan 2022 13:04:09 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.0 To: Peter Zijlstra Cc: mingo@redhat.com, tglx@linutronix.de, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-api@vger.kernel.org, x86@kernel.org, pjt@google.com, posk@google.com, avagin@google.com, jannh@google.com, tdelisle@uwaterloo.ca, mark.rutland@arm.com, posk@posk.io References: <20220120155517.066795336@infradead.org> <20220120160822.666778608@infradead.org> <20220121075157.GA20638@worktop.programming.kicks-ass.net> <20220121085917.GA22849@worktop.programming.kicks-ass.net> <10d6cc13-b96b-e1b6-8751-1b245b242738@redhat.com> <20220121114058.GE20638@worktop.programming.kicks-ass.net> From: David Hildenbrand Organization: Red Hat Subject: Re: [RFC][PATCH v2 1/5] mm: Avoid unmapping pinned pages In-Reply-To: <20220121114058.GE20638@worktop.programming.kicks-ass.net> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 5D0DF140015 X-Stat-Signature: dw9x4c7rd6zaxmp5fn1zqy35jf5ssz5w Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=Jtl9hB66; spf=none (imf23.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspamd-Server: rspam07 X-HE-Tag: 1642766654-133743 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 21.01.22 12:40, Peter Zijlstra wrote: > On Fri, Jan 21, 2022 at 10:04:45AM +0100, David Hildenbrand wrote: >> On 21.01.22 09:59, Peter Zijlstra wrote: > >>> However, I'm not quite sure what fork() does with pages that have a pin. >> >> We COW the anon pages always, and we protect against concurrent GUP >> using the >> * mmap_lock in exclusive mode for ordinary GUP >> * mm->write_protect_seq for GUP-fast > > Right, but neither the mmap_sem nor the write_protect_seq help anything > at all vs already extant page pins. > > But I just found copy_present_page()'s page_needs_cow_for_dma(), which I > think deals with exactly that case, it avoids doing CoW on pinned pages > and instead feeds the child a full copy while keeping the pinned page in > the original process. Yes, page_needs_cow_for_dma() is the magic bit. The locking I explained keep its output "reliable". > >>> Naively, a page that has async DMA activity should not be CoW'ed, or if >>> it is, care must be taken to ensure the original pages stays in the >>> original process, but I realize that's somewhat hard. >> >> That's precisely what I'm working on fixing ... and yes, it's hard. >> >> Let me know if you need any other information, I've spent way too much >> time on this than I ever panned. > > So let me try and get this right: > > - GUP post-fork breaks CoW for FOLL_WRITE/FOLL_PIN, without either > there's a problem where one task might observe changes by another. > > - GUP pre-fork prevents CoW and does a full copy. Yes, pretty much. > > And that all mostly works, except for a fair amount of 'fun' cases? I'd say some obviously broken cases, some racy cases, some fun cases :) We have three main cases. And usually, trying to tackle one triggers another. (1) Missed CoW If the child R/O pins and unmaps the page, the parent might miss to CoW and reuse the page. Security issue. Once CVE in that area is currently still applicable for THP (well, and hugetlb). (2) Unnecessary CoW We CoW instead of reusing the page, but there are no relevant pins, so it's unnecessary. (3) Wrong CoW We CoW a page that has relevant pins, losing synchronicity between GUP and the page tables. The "criticality" is (1), (3), (2). (3) can currently get triggered by anything that can map a pinned page R/O. The racy case is what I described about the swapcache. Other broken cases are mprotect() and friends (we cannot differentiate between R/O and R/W pins ...). -- Thanks, David / dhildenb