From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77D84C433EF for ; Fri, 17 Dec 2021 23:30:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D90066B0072; Fri, 17 Dec 2021 18:30:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D182D6B0073; Fri, 17 Dec 2021 18:30:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B6B666B0074; Fri, 17 Dec 2021 18:30:11 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0124.hostedemail.com [216.40.44.124]) by kanga.kvack.org (Postfix) with ESMTP id A11C06B0072 for ; Fri, 17 Dec 2021 18:30:11 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 59953180E9367 for ; Fri, 17 Dec 2021 23:30:01 +0000 (UTC) X-FDA: 78928881402.11.E1190B0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf02.hostedemail.com (Postfix) with ESMTP id 3BCB580032 for ; Fri, 17 Dec 2021 23:29:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639783800; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=44k3vX/3wa5i6/dqpN9sIfa/Ujj2HmSF1zFXiCu3Mm0=; b=eZpJbmKXftOLEjnG2eAbNheBi8rBOX0T9OPbwVNSp0WXWozv4bcTPgCPdAYFoySO9uPeYP VwPbrKK4XDgDvVN7Lgfr7dieX4bkFYOGZ2dIUW0H/QInRQLfs7R6O+ZExHiRcKxlX4RT/p xzoIupgb6tImdRKoHCweKovE0Pc0jlk= Received: from mail-wr1-f70.google.com (mail-wr1-f70.google.com [209.85.221.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-586-ExaqIU2vPZCb69Tvm13ztA-1; Fri, 17 Dec 2021 18:29:59 -0500 X-MC-Unique: ExaqIU2vPZCb69Tvm13ztA-1 Received: by mail-wr1-f70.google.com with SMTP id f13-20020adfe90d000000b001a15c110077so1047818wrm.8 for ; Fri, 17 Dec 2021 15:29:59 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent :content-language:to:cc:references:from:organization:subject :in-reply-to:content-transfer-encoding; bh=44k3vX/3wa5i6/dqpN9sIfa/Ujj2HmSF1zFXiCu3Mm0=; b=4rZivymfuXLUQd2q7Ocr5JKXqBypqAm1SZ4OzuTdcz6kKHoKgkwL+O3E4Arp4vFsiX doHD9OjEBvv2Hmj0JzbnnaOgtoy9e8LvdHUY3awjnYFAbjKf5QEyxFZp02ekeEGr75st NgxhmX11aUod6BTGM0KYyiJQbgqa2kD2ncEYD6aFeWXJZOyb3NggMKmsQthWfOg6tHZy 7MabNdR1r/K/A/KUC6QWod1qI6rZSE9/mfiYxbdMUeJEwJY+b4g6suGz3i+oN0a5Y/ov omgJ3HarL2ZPEZ/fHNn+KI6XfDZ4p3sB2ZZGeXyU3PGpFDUMzVU14vC/DkNtLOc98ZK9 qZOg== X-Gm-Message-State: AOAM533nGPtrbMAcLetXUVJeFpBum9hN+j2Tw2MW8Ksphm/h6kOGjpr9 AYtAOp29ci6mihNceXU/ccuLKi0iI+nfsxPFHz+SBUMYij0tpUiIDIDreZ7gtMlDsM96UCt4zjx TTKA5I0qH2BE= X-Received: by 2002:a5d:66d2:: with SMTP id k18mr3092029wrw.430.1639783798059; Fri, 17 Dec 2021 15:29:58 -0800 (PST) X-Google-Smtp-Source: ABdhPJwrOBcBhv3bmTYRGqBn0i+Q7SvUP66HjwLl4pDIm//o4ltTp7UThcCYRGeLCoCRxNP5fxkKRA== X-Received: by 2002:a5d:66d2:: with SMTP id k18mr3091992wrw.430.1639783797813; Fri, 17 Dec 2021 15:29:57 -0800 (PST) Received: from [192.168.3.132] (p4ff234b8.dip0.t-ipconnect.de. [79.242.52.184]) by smtp.gmail.com with ESMTPSA id z11sm9144942wmf.9.2021.12.17.15.29.56 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 17 Dec 2021 15:29:57 -0800 (PST) Message-ID: <0de1a3cb-8286-15bd-aec1-2b284bf8918a@redhat.com> Date: Sat, 18 Dec 2021 00:29:55 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0 To: Linus Torvalds Cc: Linux Kernel Mailing List , Andrew Morton , Hugh Dickins , David Rientjes , Shakeel Butt , John Hubbard , Jason Gunthorpe , Mike Kravetz , Mike Rapoport , Yang Shi , "Kirill A . Shutemov" , Matthew Wilcox , Vlastimil Babka , Jann Horn , Michal Hocko , Nadav Amit , Rik van Riel , Roman Gushchin , Andrea Arcangeli , Peter Xu , Donald Dutile , Christoph Hellwig , Oleg Nesterov , Jan Kara , Linux-MM , "open list:KERNEL SELFTEST FRAMEWORK" , "open list:DOCUMENTATION" References: <20211217113049.23850-1-david@redhat.com> <20211217113049.23850-7-david@redhat.com> <9c3ba92e-9e36-75a9-9572-a08694048c1d@redhat.com> <02cf4dcf-74e8-9cbd-ffbf-8888f18a9e8a@redhat.com> <0aa27d7d-0db6-94ee-ca16-91d19997286b@redhat.com> From: David Hildenbrand Organization: Red Hat Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb) In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Stat-Signature: dhogd38dy65epim6o5ozqg1jkn37u1z5 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 3BCB580032 Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=eZpJbmKX; spf=none (imf02.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-HE-Tag: 1639783798-957925 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 17.12.21 23:58, Linus Torvalds wrote: > On Fri, Dec 17, 2021 at 2:29 PM David Hildenbrand wrote: >> >> While I do care about future use cases, I cannot possibly see fork() not >> requiring the mmap_lock in the foreseeable future. Just so much depends >> on it as of now. > > It's not that *fork()* depends on it. > > Of course fork() takes the mmap_sem. > > It's that fast-gup really really doesn't want it, and can't take it. Oh, sorry, I was misreading your mail. > > So any fast-gup user fundamentally cannot look at mapcount(), because > that would be fundamentally wrong and racy, and could race with fork. > So we're concerned about fork() racing with gup-fast-only. gup-fast-only runs essentially lockless. As we read an atomic mapcount, the relevant thing to happen would be that we read an mapcount of 1 and decide to share the page, but there is concurrent fork() such that the mapcount is increased. So the parent process has to be the only process owning that page for this to trigger (mapcount == 1). In that situation, we would pin the page in gup-fast-only. BUT this is just like GUP before fork() and caught using the mm->write_protect_seq, so we'd immediately unpin it and not actually return it from get-user-pages-fast. No harm done AFAIKS. > And yet, as far as I can tell, that's *exactly* what your gup patches > do, with gup_pte_range() adding > > + if (!pte_write(pte) && gup_must_unshare(flags, page, false)) { > + put_compound_head(head, 1, flags); > + goto pte_unmap; > + } > > which looks at the page mapcount without holding the mmap sem at all. > > And see my other email - I think there are other examples of your > patches looking at data that isn't stable because you don't hold the > right locks. We rely on PageAnon(), PageKsm() and the mapcount. To my understanding, they are stable for our use in pagefault handling code under mmap_lock and in gup-fast because of above reasoning. > > And you can't even do the optimistic case without taking the lock, > because in your world, a COW that optimistically copies in the case of > a race condition is fundamentally *wrong* and buggy. Because in your > world-view, GUP and COW are very different and have different rules, > but you need things to be *exact*, and they aren't. > > And none of this is anything at least I can think about, because I > don't see what the "design" is. > > I really have a hard time following what the rules actually are. You > seem to think that "page_mapcount()" is a really simple rule, and I > fundamentally disagree. It's a _very_ complicated thing indeed, with > locking issues, AND YOU ACTIVELY VIOLATE THE LOCKING RULES! > > See why I'm so unhappy? I see why your unhappy, and I appreciate the productive discussion :) But I think we just have to complete the big picture of what we're proposing and how the mapcount is safe to be used for this purpose. I mean, I'm happy if you actually find a flaw in the current design proposal. > > We *did* do the page_mapcount() thing. It was bad. It forced COW to > always take the page lock. There's a very real reason why I'm pushing > my "let's have a _design_ here", instead of your "let's look at > page_mapcount without even doing the locking". The locking semantics just have to be clarified and written in stone -- if we don't find any flaws. But this will be my last mail for today, have a nice weekend Linus! -- Thanks, David / dhildenb