From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 021ADC433F5 for ; Fri, 1 Oct 2021 13:49:01 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 512D861A02 for ; Fri, 1 Oct 2021 13:49:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 512D861A02 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ziepe.ca Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id CD24894010B; Fri, 1 Oct 2021 09:48:59 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C80079400E4; Fri, 1 Oct 2021 09:48:59 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B6F1094010B; Fri, 1 Oct 2021 09:48:59 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id AA5D79400E4 for ; Fri, 1 Oct 2021 09:48:59 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 652542DD87 for ; Fri, 1 Oct 2021 13:48:59 +0000 (UTC) X-FDA: 78647999598.26.0300E66 Received: from mail-qk1-f173.google.com (mail-qk1-f173.google.com [209.85.222.173]) by imf22.hostedemail.com (Postfix) with ESMTP id 178C319AE for ; Fri, 1 Oct 2021 13:48:58 +0000 (UTC) Received: by mail-qk1-f173.google.com with SMTP id i132so9175987qke.1 for ; Fri, 01 Oct 2021 06:48:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=LG5nuVM12UKCUvMUz/H7oZ+JnPtb7xeOrg+cdikjdUw=; b=i2nZSJalGl2HFR2lljteQga9LLzznj1VY4sKfMFrOuI5TPw+9fxa9Y7VF6ORCDu6gf s1VZcbmu8prmAySSW0muDV9sPFy+ocsOAHMu7czDjdTka8/r9gDirvuLv+DjNcMP+yu2 2vpgG0jpIzFJO3o2EESr3Knc9sAVcu/3zrGpfyryk0rXsa3CM7UVEgJZI2lyiRkkfNpM 06X0/la9ZF1/JOw1nIH1hT6QVKTk7TJuv33hiHw5ih9p7MWcfmzNHEd0oxDuij3diKPb ruwnbMTM1vlfPcAVocSCOhTaVtV1FJ3tIhW47hPjbC6RUYlSTDx8/p4BvGKL4lQu/579 YBXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=LG5nuVM12UKCUvMUz/H7oZ+JnPtb7xeOrg+cdikjdUw=; b=jcGQKeB1dm4NwtLBiz+DoaTo8hmYtBesZgLXkg0kfuV6qlhxoERuPajTLUX9ROz0vt 6EAAta1kyUpZx/BI8r/9wBfizAOkqo4vkD/vuLgV2Cnpr6TBgpnfAfBosMqwtvmXMRtx ARZxfQFh7VlHPhjKGNKzXNDo41gqRIqtHKjA/uNLJigT0iMQoNI4dygqy2Pi78rgl0IH IH/tgXn449kkTO/HDG63ZFIU/dKByQlr3/sGzYOzJYT/1Rf4NUEO0NaItZk0CFaOpBgL LlayuuC+mpZhGpjCk66re1gyaJccKILCCvhVqIvFY/VMgRpZfEXLdgurRWsQo1EDWnWW 76yw== X-Gm-Message-State: AOAM530eK7WzgvdegIVnIZ5hM/+SR8o7Wwa5g3evLjGGylNAky0iqiXF RwAfYRfZiGdFyuFw57mBZRDNpg== X-Google-Smtp-Source: ABdhPJxAWiRAix1MOnP+fusB3RWFs4aefuUjIqiau0+pjC7MwPUEHTq7EjNfCISJpSPUQC5qQMYTaw== X-Received: by 2002:a37:65d6:: with SMTP id z205mr9907719qkb.522.1633096138367; Fri, 01 Oct 2021 06:48:58 -0700 (PDT) Received: from ziepe.ca (hlfxns017vw-142-162-113-129.dhcp-dynamic.fibreop.ns.bellaliant.net. [142.162.113.129]) by smtp.gmail.com with ESMTPSA id u19sm3747206qtx.40.2021.10.01.06.48.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Oct 2021 06:48:57 -0700 (PDT) Received: from jgg by mlx with local (Exim 4.94) (envelope-from ) id 1mWIuG-008R2P-Oj; Fri, 01 Oct 2021 10:48:56 -0300 Date: Fri, 1 Oct 2021 10:48:56 -0300 From: Jason Gunthorpe To: Logan Gunthorpe , Alistair Popple , Felix Kuehling , Christoph Hellwig , Dan Williams Cc: linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-pci@vger.kernel.org, linux-mm@kvack.org, iommu@lists.linux-foundation.org, Stephen Bates , Christian =?utf-8?B?S8O2bmln?= , John Hubbard , Don Dutile , Matthew Wilcox , Daniel Vetter , Jakowski Andrzej , Minturn Dave B , Jason Ekstrand , Dave Hansen , Xiong Jianxin , Bjorn Helgaas , Ira Weiny , Robin Murphy , Martin Oliveira , Chaitanya Kulkarni Subject: Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem() Message-ID: <20211001134856.GN3544071@ziepe.ca> References: <20210916234100.122368-1-logang@deltatee.com> <20210916234100.122368-20-logang@deltatee.com> <20210928195518.GV3544071@ziepe.ca> <8d386273-c721-c919-9749-fc0a7dc1ed8b@deltatee.com> <20210929230543.GB3544071@ziepe.ca> <32ce26d7-86e9-f8d5-f0cf-40497946efe9@deltatee.com> <20210929233540.GF3544071@ziepe.ca> <20210930003652.GH3544071@ziepe.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210930003652.GH3544071@ziepe.ca> X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 178C319AE X-Stat-Signature: twkxm3goure9h75azq5reayw5muyg69q Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=ziepe.ca header.s=google header.b=i2nZSJal; dmarc=none; spf=pass (imf22.hostedemail.com: domain of jgg@ziepe.ca designates 209.85.222.173 as permitted sender) smtp.mailfrom=jgg@ziepe.ca X-HE-Tag: 1633096138-900412 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Sep 29, 2021 at 09:36:52PM -0300, Jason Gunthorpe wrote: > Why would DAX want to do this in the first place?? This means the > address space zap is much more important that just speeding up > destruction, it is essential for correctness since the PTEs are not > holding refcounts naturally... It is not really for this series to fix, but I think the whole thing is probably racy once you start allowing pte_special pages to be accessed by GUP. If we look at unmapping the PTE relative to GUP fast the important sequence is how the TLB flushing doesn't decrement the page refcount until after it knows any concurrent GUP fast is completed. This is arch specific, eg it could be done async through a call_rcu handler. This ensures that pages can't cross back into the free pool and be reallocated until we know for certain that nobody is walking the PTEs and could potentially take an additional reference on it. The scheme cannot rely on the page refcount being 0 because oce it goes into the free pool it could be immeidately reallocated back to a non-zero refcount. A DAX user that simply does an address space invalidation doesn't sequence itself with any of this mechanism. So we can race with a thread doing GUP fast and another thread re-cycling the page into another use - creating a leakage of the page from one security context to another. This seems to be made worse for the pgmap stuff due to the wonky refcount usage - at least if the refcount had dropped to zero gup fast would be blocked for a time, but even that doesn't happen. In short, I think using pg special for anything that can be returned by gup fast (and maybe even gup!) is racy/wrong. We must have the normal refcount mechanism work for correctness of the recycling flow. I don't know why DAX did this, I think we should be talking about udoing it all of it, not just the wonky refcounting Alistair and Felix are working on, but also the use of MIXEDMAP and pte special for struct page backed memory. Jason