public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
From: James Gowans <jgowans@amazon.com>
To: <linux-kernel@vger.kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>,
	<kexec@lists.infradead.org>, "Joerg Roedel" <joro@8bytes.org>,
	Will Deacon <will@kernel.org>, <iommu@lists.linux.dev>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"Christian Brauner" <brauner@kernel.org>,
	<linux-fsdevel@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>, <kvm@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
	Alexander Graf <graf@amazon.com>,
	David Woodhouse <dwmw@amazon.co.uk>,
	"Jan H . Schoenherr" <jschoenh@amazon.de>,
	Usama Arif <usama.arif@bytedance.com>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>,
	<madvenka@linux.microsoft.com>, <steven.sistare@oracle.com>,
	<yuleixzhang@tencent.com>
Subject: [RFC 08/18] iommu: Add allocator for pgtables from persistent region
Date: Mon, 5 Feb 2024 12:01:53 +0000	[thread overview]
Message-ID: <20240205120203.60312-9-jgowans@amazon.com> (raw)
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>

The specific IOMMU drivers will need to ability to allocate pages from a
pkernfs IOMMU pgtable file for their pgtables. Also, the IOMMU drivers
will need to ability to consistent get the same page for the root PGD
page - add a specific function to get this PGD "root" page. This is
different to allocating regular pgtable pages because the exact same
page needs to be *restored* after kexec into the pgd pointer on the
IOMMU domain struct.

To support this sort of allocation the pkernfs region is treated as an
array of 512 4 KiB pages, the first of which is an allocation bitmap.
---
 drivers/iommu/Makefile        |  1 +
 drivers/iommu/pgtable_alloc.c | 36 +++++++++++++++++++++++++++++++++++
 drivers/iommu/pgtable_alloc.h |  9 +++++++++
 3 files changed, 46 insertions(+)
 create mode 100644 drivers/iommu/pgtable_alloc.c
 create mode 100644 drivers/iommu/pgtable_alloc.h

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 769e43d780ce..cadebabe9581 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y += amd/ intel/ arm/ iommufd/
+obj-y += pgtable_alloc.o
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
diff --git a/drivers/iommu/pgtable_alloc.c b/drivers/iommu/pgtable_alloc.c
new file mode 100644
index 000000000000..f0c2e12f8a8b
--- /dev/null
+++ b/drivers/iommu/pgtable_alloc.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "pgtable_alloc.h"
+#include <linux/mm.h>
+
+/*
+ * The first 4 KiB is the bitmap - set the first bit in the bitmap.
+ * Scan bitmap to find next free bits - it's next free page.
+ */
+
+void iommu_alloc_page_from_region(struct pkernfs_region *region, void **vaddr, unsigned long *paddr)
+{
+	int page_idx;
+
+	page_idx = bitmap_find_free_region(region->vaddr, 512, 0);
+	*vaddr = region->vaddr + (page_idx << PAGE_SHIFT);
+	if (paddr)
+		*paddr = region->paddr + (page_idx << PAGE_SHIFT);
+}
+
+
+void *pgtable_get_root_page(struct pkernfs_region *region, bool liveupdate)
+{
+	/*
+	 * The page immediately after the bitmap is the root page.
+	 * It would be wrong for the page to be allocated if we're
+	 * NOT doing a liveupdate, or for a liveupdate to happen
+	 * with no allocated page. Detect this mismatch.
+	 */
+	if (test_bit(1, region->vaddr) ^ liveupdate) {
+		pr_err("%sdoing a liveupdate but root pg bit incorrect",
+				liveupdate ? "" : "NOT ");
+	}
+	set_bit(1, region->vaddr);
+	return region->vaddr + PAGE_SIZE;
+}
diff --git a/drivers/iommu/pgtable_alloc.h b/drivers/iommu/pgtable_alloc.h
new file mode 100644
index 000000000000..c1666a7be3d3
--- /dev/null
+++ b/drivers/iommu/pgtable_alloc.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/types.h>
+#include <linux/pkernfs.h>
+
+void iommu_alloc_page_from_region(struct pkernfs_region *region,
+				  void **vaddr, unsigned long *paddr);
+
+void *pgtable_get_root_page(struct pkernfs_region *region, bool liveupdate);
-- 
2.40.1



  parent reply	other threads:[~2024-02-05 12:04 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-05 12:01 [RFC 00/18] Pkernfs: Support persistence for live update James Gowans
2024-02-05 12:01 ` [RFC 01/18] pkernfs: Introduce filesystem skeleton James Gowans
2024-02-05 12:01 ` [RFC 02/18] pkernfs: Add persistent inodes hooked into directies James Gowans
2024-02-05 12:01 ` [RFC 03/18] pkernfs: Define an allocator for persistent pages James Gowans
2024-02-05 12:01 ` [RFC 04/18] pkernfs: support file truncation James Gowans
2024-02-05 12:01 ` [RFC 05/18] pkernfs: add file mmap callback James Gowans
2024-02-05 23:34   ` Dave Chinner
2024-02-05 12:01 ` [RFC 06/18] init: Add liveupdate cmdline param James Gowans
2024-02-05 12:01 ` [RFC 07/18] pkernfs: Add file type for IOMMU root pgtables James Gowans
2024-02-05 12:01 ` James Gowans [this message]
2024-02-05 12:01 ` [RFC 09/18] intel-iommu: Use pkernfs for root/context pgtable pages James Gowans
2024-02-05 12:01 ` [RFC 10/18] iommu/intel: zap context table entries on kexec James Gowans
2024-02-05 12:01 ` [RFC 11/18] dma-iommu: Always enable deferred attaches for liveupdate James Gowans
2024-02-05 17:45   ` Jason Gunthorpe
2024-02-05 12:01 ` [RFC 12/18] pkernfs: Add IOMMU domain pgtables file James Gowans
2024-02-05 12:01 ` [RFC 13/18] vfio: add ioctl to define persistent pgtables on container James Gowans
2024-02-05 17:08   ` Jason Gunthorpe
2024-02-05 12:01 ` [RFC 14/18] intel-iommu: Allocate domain pgtable pages from pkernfs James Gowans
2024-02-05 17:12   ` Jason Gunthorpe
2024-02-05 12:02 ` [RFC 15/18] pkernfs: register device memory for IOMMU domain pgtables James Gowans
2024-02-05 12:02 ` [RFC 16/18] vfio: support not mapping IOMMU pgtables on live-update James Gowans
2024-02-05 12:02 ` [RFC 17/18] pci: Don't clear bus master is persistence enabled James Gowans
2024-02-05 12:02 ` [RFC 18/18] vfio-pci: Assume device working after liveupdate James Gowans
2024-02-05 17:10 ` [RFC 00/18] Pkernfs: Support persistence for live update Alex Williamson
2024-02-07 14:56   ` Gowans, James
2024-02-07 15:28     ` Jason Gunthorpe
2024-02-05 17:42 ` Jason Gunthorpe
2024-02-07 14:45   ` Gowans, James
2024-02-07 15:22     ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240205120203.60312-9-jgowans@amazon.com \
    --to=jgowans@amazon.com \
    --cc=akpm@linux-foundation.org \
    --cc=anthony.yznaga@oracle.com \
    --cc=brauner@kernel.org \
    --cc=dwmw@amazon.co.uk \
    --cc=ebiederm@xmission.com \
    --cc=graf@amazon.com \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=jschoenh@amazon.de \
    --cc=kexec@lists.infradead.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=madvenka@linux.microsoft.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=skinsburskii@linux.microsoft.com \
    --cc=steven.sistare@oracle.com \
    --cc=usama.arif@bytedance.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=yuleixzhang@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox