Date: Wed, 13 May 2026 11:45:14 -0700
From: "Dan Williams (nvidia)"
To: Ravi Kumar Bandi, hch@infradead.org
Cc: akpm@linux-foundation.org, bhelgaas@google.com, david@kernel.org,
 djbw@kernel.org, ilpo.jarvinen@linux.intel.com,
 linux-kernel@vger.kernel.org, ravib@amazon.com
Message-ID: <6a04c6ba958b2_107b100f9@djbw-dev.notmuch>
In-Reply-To: <20260512075022.48671-1-ravib@amazon.com>
References: <20260512075022.48671-1-ravib@amazon.com>
Subject: Re: [PATCH] resource: export iomem_get_mapping() for loadable modules
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Ravi Kumar Bandi wrote:
> On 5/11/26, Bjorn Helgaas wrote:
> > Which endpoint driver?
>
> Thank you for checking.
>
> It is Marvell PCIe driver for a device without a hot-plug pin routed
> to the CPU. The link-down event is detected by the in-tree Xilinx DMA PL
> PCIe controller driver (drivers/pci/controller/pcie-xilinx-dma-pl.c).
>
> > Why would the endpoint driver map iomem to userspace?
>
> sysfs exposed BAR resource files are mmap'd directly by a third-party
> userspace driver.
>
> > Well, we need to see the user(s) first, but once they materialize this
> > begs for a higher level API. Until then it is moot anyway.
>
> Understood. Thank you for the feedback. We will work on a higher level
> API proposal and get back.

I took a look and there are some problems here. My first thought was,
"why does the endpoint driver need to do this? The PCI core device
removal should be responsible for zapping mappings." 2 things defeat
this:

1/ for sysfs bar mappings, the unmap_mapping_range() in
   kernfs_drain_open_files() misses mappings established against the
   shared iomem_get_mapping().

2/ procfs access to BAR space has never unmapped on device removal

The practical implication of this is that userspace mappings of BARs can
survive past device removal. As for mitigations, with IO_STRICT_DEVMEM
the kernel will zap them before use, with LOCKDOWN the mappings can not
be established, and CAP_SYS_RAWIO is required (for procfs) to create
these mappings.
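
For readers following along, the kind of userspace mapping under
discussion can be sketched roughly as below. This is a minimal
illustration, not the third-party driver's actual code: map_bar() and
unmap_bar() are hypothetical helper names, the caller supplies the path
of a sysfs BAR resource file, and the sketch assumes the file size
reported by fstat() equals the BAR length.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map an entire sysfs BAR resource file.
 * Returns NULL on failure; on success *len_out holds the mapped length.
 */
static volatile uint32_t *map_bar(const char *path, size_t *len_out)
{
	struct stat st;
	void *p;
	int fd;

	assert(path && len_out);

	fd = open(path, O_RDWR | O_SYNC);
	if (fd < 0)
		return NULL;

	/* Assumption: st_size reports the BAR length for resource files. */
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
		 MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return NULL;

	*len_out = (size_t)st.st_size;
	return p;
}

/* Hypothetical helper: drop a mapping created by map_bar(). */
static void unmap_bar(volatile uint32_t *regs, size_t len)
{
	munmap((void *)regs, len);
}
```

Nothing in this code is told when the device goes away, so a mapping
like this stays usable from the process's point of view unless the
kernel zaps it.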

I recall that Sima added ioport mmap revoke support in:

    636b21b50152 PCI: Revoke mappings like devmem

...but given revoke_iomem() only evacuates at request_mem_region() time,
I do not see how that ever worked.

We could do something like the following for the kernfs regression in
the near term, or just proceed with making revoke_iomem() something that
the PCI core does unconditionally by physical address on device removal.
That would also fix the procfs gap.

Ravi, I think your time is best spent getting the PCI core to handle the
unmap on device removal.

-- >8 --
Subject: resource: Fix PCI/sysfs mmap revocation vs CONFIG_IO_STRICT_DEVMEM=n
From: Dan Williams

CONFIG_IO_STRICT_DEVMEM wants to arrange for exclusive kernel mappings
of device MMIO resources. It prevents scenarios where the kernel is
confused by a user triggered device effect while the driver is running.
It uses a shared inode address_space such that request_mem_region() can
unmap user mappings by physical resource address.

This shared inode however is not considered by
kernfs_drain_open_files(). It is also the case that the procfs method
for mapping PCI resources has historically never cleaned up user
mappings on device removal.

Fix at least the regression of unmap behavior for the
CONFIG_IO_STRICT_DEVMEM=n case by disabling the shared inode when
revoke_iomem() is disabled.

Signed-off-by: Dan Williams
---
 drivers/char/mem.c |  3 ++-
 fs/sysfs/file.c    |  2 +-
 kernel/resource.c  | 11 ++++++++---
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 5fd421e48c04..d9a79b0e4e4f 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -632,7 +632,8 @@ static int open_port(struct inode *inode, struct file *filp)
 	 * revocations when drivers want to take over a /dev/mem mapped
 	 * range.
 	 */
-	filp->f_mapping = iomem_get_mapping();
+	if (iomem_get_mapping())
+		filp->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 5709cede1d75..60947fa4a402 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -189,7 +189,7 @@ static int sysfs_kf_bin_open(struct kernfs_open_file *of)
 {
 	const struct bin_attribute *battr = of->kn->priv;
 
-	if (battr->f_mapping)
+	if (battr->f_mapping && battr->f_mapping())
 		of->file->f_mapping = battr->f_mapping();
 
 	return 0;
diff --git a/kernel/resource.c b/kernel/resource.c
index d02a53fb95d8..816bcdb9fa05 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1297,9 +1297,6 @@ static void revoke_iomem(struct resource *res)
 
 	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
 }
-#else
-static void revoke_iomem(struct resource *res) {}
-#endif
 
 struct address_space *iomem_get_mapping(void)
 {
@@ -1311,6 +1308,14 @@ struct address_space *iomem_get_mapping(void)
 	 */
 	return smp_load_acquire(&iomem_inode)->i_mapping;
 }
+#else
+static void revoke_iomem(struct resource *res) {}
+
+struct address_space *iomem_get_mapping(void)
+{
+	return NULL;
+}
+#endif
 
 static int __request_region_locked(struct resource *res, struct resource *parent,
 				   resource_size_t start, resource_size_t n,
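
The effect of having iomem_get_mapping() return NULL can be modeled in
plain userspace C. This is a toy sketch, not kernel code: every *_model
name below is a stand-in invented for illustration, and the runtime flag
strict_devmem only approximates the compile-time
CONFIG_IO_STRICT_DEVMEM choice. It shows a file keeping its own
address_space when no shared one is offered, and switching to the shared
one otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct address_space; only identity matters here. */
struct address_space_model {
	const char *name;
};

/* Stand-in for struct file. */
struct file_model {
	struct address_space_model *f_mapping;
};

static struct address_space_model shared_iomem = { "shared-iomem" };

/* Runtime stand-in for the CONFIG_IO_STRICT_DEVMEM build option. */
static bool strict_devmem;

/* Models iomem_get_mapping(): NULL when revocation is compiled out. */
static struct address_space_model *iomem_get_mapping_model(void)
{
	return strict_devmem ? &shared_iomem : NULL;
}

/*
 * Models the open path after the patch: start from the file's own
 * mapping and override it only when a shared mapping is available.
 */
static void open_model(struct file_model *f, struct address_space_model *own)
{
	struct address_space_model *shared = iomem_get_mapping_model();

	f->f_mapping = own;
	if (shared)
		f->f_mapping = shared;
}
```

With strict_devmem false the file keeps its own mapping, which is the
case where an unmap_mapping_range() against that per-file mapping can
find the user mappings again.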