All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>,
	David Woodhouse <dwmw2@infradead.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Joerg Roedel <joro@8bytes.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Will Deacon <will@kernel.org>,
	"patches@lists.linux.dev" <patches@lists.linux.dev>,
	Tina Zhang <tina.zhang@intel.com>,
	"Wang, Wei W" <wei.w.wang@intel.com>
Subject: Re: [PATCH 5/9] iommupt: Add the Intel VT-D second stage page table format
Date: Fri, 22 Aug 2025 11:53:21 -0300	[thread overview]
Message-ID: <20250822145321.GA1405994@nvidia.com> (raw)
In-Reply-To: <BN9PR11MB52765178E74F075CA330F7E28C3DA@BN9PR11MB5276.namprd11.prod.outlook.com>

On Fri, Aug 22, 2025 at 09:14:09AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, July 17, 2025 3:58 AM
> > 
> > The VT-D second stage format is almost the same as the x86 PAE format,
> > except the bit encodings in the PTE are different and a few new PTE
> > features, like force coherency are present.
> > 
> > Among all the formats it is unique in not having a designated present bit.
> > 
> > Comparing the performance of several operations to the existing version:
> > 
> > iommu_map()
> >    pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
> >      2^12,     53,66    ,      50,64      ,  21.21
> >      2^21,     59,70    ,      56,67      ,  16.16
> >      2^30,     54,66    ,      52,63      ,  17.17
> >  256*2^12,    384,524   ,     337,516     ,  34.34
> >  256*2^21,    387,632   ,     336,626     ,  46.46
> >  256*2^30,    376,629   ,     323,623     ,  48.48
> 
> It's a big win, thanks! Out of curiosity, is there a single aspect in the 
> new design contributing most of the improvement or is it just
> accumulated from many pieces?

I think this is principally from not rewalking so much. The new code
fills entire levels with leaf PTEs without rewalking back to the
level. This is somewhat tricky code.

> a side note - the variation between avg/min is getting bigger in the
> new code, but it's not that important for now compared to the
> overall gain. 😊

It could be a side effect of running in kvm..  I think the new code
has alot of instructions and branching so it is more sensitive to
cache performance.

> > +static inline u64 vtdss_pt_sw_bit(unsigned int bitnr)
> > +{
> > +	/* Bits marked Ignored in the specification */
> > +	switch (bitnr) {
> > +	case 0:
> > +		return BIT(10);
> > +	case 1 ... 10:
> > +		return BIT_ULL((bitnr - 1) + 52);
> 
> bit61 has a meaning now in the latest v5.0 spec, though it's not a
> good practice to repurpose an ignored bit.

"not a good practice" is an understatement.
 
> > +	case 11:
> > +		return BIT_ULL(63);
> > +	/* Remaing bits 9-3  are only available in some entries */
> 
> s/Remaing/remaining/
> 
> strictly speaking bit8 is A-bit in all entries. Probably no need to
> list each bit here or just remove the comment. People interested
> in the detail anyway will go to the spec.

I wanted to leave a note that this was not exhaustive

> btw bit2 is actually available for software usage.

That seems to be a spec mistake, it should be RSV since it used to be
X in earlier HW?
 
> > +
> > +	if (iommu_prot & IOMMU_READ)
> > +		pte |= VTDSS_FMT_R;
> > +	if (iommu_prot & IOMMU_WRITE)
> > +		pte |= VTDSS_FMT_W;
> > +	if (pt_feature(common, PT_FEAT_VTDSS_FORCE_COHERENCE))
> > +		pte |= VTDSS_FMT_SNP;
> > +
> > +	if (pt_feature(common, PT_FEAT_VTDSS_FORCE_WRITEABLE) &&
> > +	    !(iommu_prot & IOMMU_READ)) {
> 
> s/IOMMU_READ/IOMMU_WRITE/

oops got it

Thanks,
Jason

  reply	other threads:[~2025-08-22 14:53 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-16 19:57 [PATCH 0/9] Convert Intel VT-D to use the generic iommu page table Jason Gunthorpe
2025-07-16 19:57 ` [PATCH 1/9] iommu/pages: Add support for a incoherent IOMMU page walker Jason Gunthorpe
2025-07-21  8:41   ` Baolu Lu
2025-07-29 22:32     ` Jason Gunthorpe
2025-07-30  1:49       ` Baolu Lu
2025-08-11 21:21         ` Jason Gunthorpe
2025-08-15 11:28   ` Tian, Kevin
2025-08-22 21:13     ` Jason Gunthorpe
2025-07-16 19:57 ` [PATCH 2/9] iommupt: Add basic support for SW bits in the page table Jason Gunthorpe
2025-08-15 11:29   ` Tian, Kevin
2025-08-18 23:35     ` Jason Gunthorpe
2025-07-16 19:57 ` [PATCH 3/9] iommupt: Use the incoherent start/stop functions for PT_FEAT_DMA_INCOHERENT Jason Gunthorpe
2025-08-15 11:35   ` Tian, Kevin
2025-08-22 20:45     ` Jason Gunthorpe
2025-07-16 19:57 ` [PATCH 4/9] iommupt: Flush the CPU cache after any writes to the page table Jason Gunthorpe
2025-07-16 19:57 ` [PATCH 5/9] iommupt: Add the Intel VT-D second stage page table format Jason Gunthorpe
2025-07-22  3:11   ` Baolu Lu
2025-07-29 23:05     ` Jason Gunthorpe
2025-07-30  2:00       ` Baolu Lu
2025-08-22  9:14   ` Tian, Kevin
2025-08-22 14:53     ` Jason Gunthorpe [this message]
2025-07-16 19:57 ` [PATCH 6/9] iommupt/x86: Set the dirty bit only for writable PTEs Jason Gunthorpe
2025-07-21 10:02   ` Baolu Lu
2025-07-16 19:57 ` [PATCH 7/9] iommupt/x86: Support SW bits and permit PT_FEAT_DMA_INCOHERENT Jason Gunthorpe
2025-07-22  5:17   ` Baolu Lu
2025-07-29 23:13     ` Jason Gunthorpe
2025-07-30  2:35       ` Baolu Lu
2025-08-22  9:17   ` Tian, Kevin
2025-08-22 14:55     ` Jason Gunthorpe
2025-07-16 19:57 ` [PATCH 8/9] iommu/vt-d: Use the generic iommu page table Jason Gunthorpe
2025-07-22  6:44   ` Baolu Lu
2025-07-29 23:39     ` Jason Gunthorpe
2025-08-22  9:35   ` Tian, Kevin
2025-08-22 20:43     ` Jason Gunthorpe
2025-07-16 19:57 ` [PATCH 9/9] iommupt: Add a kunit test for the SW bits Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250822145321.GA1405994@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=patches@lists.linux.dev \
    --cc=robin.murphy@arm.com \
    --cc=tina.zhang@intel.com \
    --cc=wei.w.wang@intel.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.