Re: [PATCH 13/16] iommupt: Add the x86 PAE page table format

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	 David Hildenbrand <david@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	iommu@lists.linux.dev,  Joao Martins <joao.m.martins@oracle.com>,
	Kevin Tian <kevin.tian@intel.com>,
	kvm@vger.kernel.org,  linux-mm@kvack.org,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	 Peter Xu <peterx@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	 Tina Zhang <tina.zhang@intel.com>
Subject: Re: [PATCH 13/16] iommupt: Add the x86 PAE page table format
Date: Fri, 16 Aug 2024 12:21:18 -0700	[thread overview]
Message-ID: <Zr-mrsewGxXt1rAC@google.com> (raw)
In-Reply-To: <13-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com>

On Thu, Aug 15, 2024, Jason Gunthorpe wrote:
> This is used by x86 CPUs and can be used in both x86 IOMMUs. When the x86
> IOMMU is running SVA it is using this page table format.
> 
> This implementation follows the AMD v2 io-pgtable version.
> 
> There is nothing remarkable here, the format has a variable top and
> limited support for different page sizes and no contiguous pages support.
> 
> In principle this can support the 32 bit configuration with fewer table
> levels.

What's "the 32 bit configuration"?

> FIXME: Compare the bits against the VT-D version too.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/generic_pt/Kconfig            |   6 +
>  drivers/iommu/generic_pt/fmt/Makefile       |   2 +
>  drivers/iommu/generic_pt/fmt/defs_x86pae.h  |  21 ++
>  drivers/iommu/generic_pt/fmt/iommu_x86pae.c |   8 +
>  drivers/iommu/generic_pt/fmt/x86pae.h       | 283 ++++++++++++++++++++
>  include/linux/generic_pt/common.h           |   4 +
>  include/linux/generic_pt/iommu.h            |  12 +
>  7 files changed, 336 insertions(+)
>  create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86pae.h
>  create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86pae.c
>  create mode 100644 drivers/iommu/generic_pt/fmt/x86pae.h
> 
> diff --git a/drivers/iommu/generic_pt/Kconfig b/drivers/iommu/generic_pt/Kconfig
> index e34be10cf8bac2..a7c006234fc218 100644
> --- a/drivers/iommu/generic_pt/Kconfig
> +++ b/drivers/iommu/generic_pt/Kconfig
> @@ -70,6 +70,11 @@ config IOMMU_PT_ARMV8_64K
>  
>  	  If unsure, say N here.
>  
> +config IOMMU_PT_X86PAE
> +       tristate "IOMMU page table for x86 PAE"
> +#include "iommu_template.h"
> diff --git a/drivers/iommu/generic_pt/fmt/x86pae.h b/drivers/iommu/generic_pt/fmt/x86pae.h
> new file mode 100644
> index 00000000000000..9e0ee74275fcb3
> --- /dev/null
> +++ b/drivers/iommu/generic_pt/fmt/x86pae.h
> @@ -0,0 +1,283 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
> + *
> + * x86 PAE page table
> + *
> + * This is described in
> + *   Section "4.4 PAE Paging" of the Intel Software Developer's Manual Volume 3

I highly doubt what's implemented here is actually PAE paging, as the SDM (that
is referenced above) and most x86 folks describe PAE paging.  PAE paging is
specifically used when the CPU is in 32-bit mode (NOT including compatibility mode!).

  PAE paging translates 32-bit linear addresses to 52-bit physical addresses.

Presumably what's implemented here is what Intel calls 4-level and 5-level paging.
Those are _really_ similar to PAE paging, e.g. have the same encodings for bits
11:0, and even require CR4.PAE=1, but they aren't 100% identical.  E.g. true PAE
paging doesn't have software-available bits in 62:MAXPHYADDR.

Unfortuntately, I have no idea what name to use for this flavor.  x86pae is
actually kinda good, but I think it'll be confusing to people that are familiar
with the more canonical version of PAE paging.

> + *   Section "2.2.6 I/O Page Tables for Guest Translations" of the "AMD I/O
> + *   Virtualization Technology (IOMMU) Specification"
> + *
> + * It is used by x86 CPUs and The AMD and VT-D IOMMU HW.
> + *
> + * The named levels in the spec map to the pts->level as:
> + *   Table/PTE - 0
> + *   Directory/PDE - 1
> + *   Directory Ptr/PDPTE - 2
> + *   PML4/PML4E - 3
> + *   PML5/PML5E - 4

Any particularly reason not to use x86's (and KVM's) effective 1-based system?
(level '0' is essentially the 4KiB leaf entries in a page table)

Starting at '1' is kinda odd, but it aligns with thing like PML4/5, allows using
the pg_level enums from x86, and diverging from both x86 MM and KVM is likely
going to confuse people.
	
> + * FIXME: __sme_set
> + */
> +#ifndef __GENERIC_PT_FMT_X86PAE_H
> +#define __GENERIC_PT_FMT_X86PAE_H
> +
> +#include "defs_x86pae.h"
> +#include "../pt_defs.h"
> +
> +#include <linux/bitfield.h>
> +#include <linux/container_of.h>
> +#include <linux/log2.h>
> +
> +enum {
> +	PT_MAX_OUTPUT_ADDRESS_LG2 = 52,
> +	PT_MAX_VA_ADDRESS_LG2 = 57,
> +	PT_ENTRY_WORD_SIZE = sizeof(u64),
> +	PT_MAX_TOP_LEVEL = 4,
> +	PT_GRANUAL_LG2SZ = 12,
> +	PT_TABLEMEM_LG2SZ = 12,
> +};
> +
> +/* Shared descriptor bits */
> +enum {
> +	X86PAE_FMT_P = BIT(0),
> +	X86PAE_FMT_RW = BIT(1),
> +	X86PAE_FMT_U = BIT(2),
> +	X86PAE_FMT_A = BIT(5),
> +	X86PAE_FMT_D = BIT(6),
> +	X86PAE_FMT_OA = GENMASK_ULL(51, 12),
> +	X86PAE_FMT_XD = BIT_ULL(63),

Any reason not to use the #defines in arch/x86/include/asm/pgtable_types.h?

> +static inline bool x86pae_pt_install_table(struct pt_state *pts,
> +					   pt_oaddr_t table_pa,
> +					   const struct pt_write_attrs *attrs)
> +{
> +	u64 *tablep = pt_cur_table(pts, u64);
> +	u64 entry;
> +
> +	/*
> +	 * FIXME according to the SDM D is ignored by HW on table pointers?

Correct, only leaf entries have dirty bits.  

> +	 * io_pgtable_v2 sets it
> +	 */
> +	entry = X86PAE_FMT_P | X86PAE_FMT_RW | X86PAE_FMT_U | X86PAE_FMT_A |

What happens with the USER bit for I/O page tables?  Ignored, I assume?

> +		X86PAE_FMT_D |
> +		FIELD_PREP(X86PAE_FMT_OA, log2_div(table_pa, PT_GRANUAL_LG2SZ));
> +	return pt_table_install64(&tablep[pts->index], entry, pts->entry);
> +}

next prev parent reply	other threads:[~2024-08-16 19:21 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-15 15:11 [PATCH 00/16] Consolidate iommu page table implementations Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 01/16] genpt: Generic Page Table base API Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 02/16] genpt: Add a specialized allocator for page table levels Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 03/16] iommupt: Add the basic structure of the iommu implementation Jason Gunthorpe
2024-08-16 17:58   ` Jeff Johnson
2024-08-15 15:11 ` [PATCH 04/16] iommupt: Add iova_to_phys op Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 05/16] iommupt: Add unmap_pages op Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 06/16] iommupt: Add map_pages op Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 07/16] iommupt: Add cut_mapping op Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 08/16] iommupt: Add read_and_clear_dirty op Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 09/16] iommupt: Add a kunit test for Generic Page Table and the IOMMU implementation Jason Gunthorpe
2024-08-16 17:55   ` Jeff Johnson
2024-08-19 14:16     ` Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 10/16] iommupt: Add a kunit test to compare against iopt Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 11/16] iommupt: Add the 64 bit ARMv8 page table format Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 12/16] iommupt: Add the AMD IOMMU v1 " Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 13/16] iommupt: Add the x86 PAE " Jason Gunthorpe
2024-08-16 19:21   ` Sean Christopherson [this message]
2024-08-17  0:36     ` Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 14/16] iommupt: Add the DART v1/v2 " Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 15/16] iommupt: Add the 32 bit ARMv7s " Jason Gunthorpe
2024-08-15 15:11 ` [PATCH 16/16] iommupt: Add the Intel VT-D second stage " Jason Gunthorpe
2024-08-19  2:51   ` Zhang, Tina
2024-08-19 15:53     ` Jason Gunthorpe
2024-08-20  8:22       ` Yi Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zr-mrsewGxXt1rAC@google.com \
    --to=seanjc@google.com \
    --cc=alejandro.j.jimenez@oracle.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=david@redhat.com \
    --cc=hch@lst.de \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterx@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=tina.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox