Linux IOMMU Development
 help / color / mirror / Atom feed
From: "Suthikulpanit, Suravee" <suravee.suthikulpanit@amd.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Vasant Hegde <vasant.hegde@amd.com>,
	iommu@lists.linux.dev, joro@8bytes.org
Subject: Re: [PATCH 11/21] iommu/amd: Clean up and enhance protection domain flags
Date: Mon, 17 Jul 2023 23:38:44 -0700	[thread overview]
Message-ID: <1b5b9a47-0e4c-927c-c19a-9ee05ef66918@amd.com> (raw)
In-Reply-To: <ZLUyzrmRKhz4wwQ0@ziepe.ca>

Jason,

On 7/17/2023 7:23 AM, Jason Gunthorpe wrote:
> On Sun, Jul 16, 2023 at 08:22:37PM -0500, Suthikulpanit, Suravee wrote:
>>>>
>>>> ....
>>>>
>>>> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
>>>> index 346c32343417..907af6e41ae9 100644
>>>> --- a/drivers/iommu/amd/amd_iommu_types.h
>>>> +++ b/drivers/iommu/amd/amd_iommu_types.h
>>>> @@ -443,13 +443,11 @@
>>>>    #define MAX_DOMAIN_ID 65536
>>>>    /* Protection domain flags */
>>>> -#define PD_DMA_OPS_MASK		BIT(0) /* domain used for dma_ops */
>>>> -#define PD_DEFAULT_MASK		BIT(1) /* domain is a default dma_ops
>>>> -					      domain for an IOMMU */
>>>> -#define PD_PASSTHROUGH_MASK	BIT(2) /* domain has no page
>>>> -					      translation */
>>>> -#define PD_IOMMUV2_MASK		BIT(3) /* domain has gcr3 table */
>>>> -#define PD_GIOV_MASK		BIT(4) /* domain enable GIOV support */
>>>
>>>> +#define PD_FLAG_V1DMA		BIT(1) /* DMA-API mode w/ v1 page table */
>>>> +#define PD_FLAG_V2DMA		BIT(2) /* DMA-API mode w/ v2 page table */
>>>
>>> "DMA-API" should not be part of the driver vocabulary. Does this have
>>> anything to do with DMA-API?
>>>
>>> Only "paging" domains should have IO page tables and an IO page table
>>> type.
>>>
>>> Maybe this is just misnamed?
>>
>> These flags are for struct protection_domain in AMD IOMMU driver, and not
>> for the struct iommu_domain.
> 
> I understand that
> 
>> PD_FLAG_V[1|2]DMA flags are used to signify that the domain is using v1 or
>> v2 page table to implement IOMMU_DOMAIN_DMA (which has __IOMMU_DOMAIN_PAGING
>> | __IOMMU_DOMAIN_DMA_API).
> 
> Again, no, drivers should have no sensitivity to
> __IOMMU_DOMAIN_DMA_API, we are trying to get rid of this. Please don't
> add more to the driver.

Ok, no mention of "DMA" in the naming.

>>> Also, shouldn't it be an enum not flags?
>>
>> We use flags instead of enum because we exepect to have combination of the
>> flags. For example
>> * IOMMU_DOMAIN_DMA + amd_iommu=pgtbl_v1 = (PD_FLAG_V1DMA)
>> * IOMMU_DOMAIN_DMA + amd_iommu=pgtbl_v2 = (PD_FLAG_V2DMA | PD_FLAG_GIOV)
>> * IOMMU_DOMAIN_SVA = (PD_FLAG_SVA | PD_FLAG_GCR3)
>> * Domain for v2API = (PD_FLAG_V2API | PD_FLAG_GCR3)
> 
> You shouldn't have these distinctions. There should only be one enum
> that specifies that format the IOPTEs are, and the rest is implied.

Still not quite sure what you meant by "the rest is implied".

Anyhow, instead, we could introduce struct protection_domain.pd_mode as 
enum where:

enum amd_iommu_domain_pt_mode {
     AMD_IOMMU_PD_MODE_PT = 0, /* No page table setup */
     AMD_IOMMU_PD_MODE_V1,     /* IOMMU v1 page table */
     AMD_IOMMU_PD_MODE_V2,     /* IOMMU v2 page table */
};

Then we need a way to distinguish whether the v2 table is:
  * Managed by IOMMU driver for DMA-API, which supports only PASID=0.
  * Bound using the SVA interface
  * Bound using the AMD IOMMUv2 API

What if we introduce a struct protection_domain.gcr3_flags, and use it 
to track information for setting up DTE and GCR3 table for the following 
scenarios.

Considering an IOMMU group containing a PASID capable device.

CASE 1:
-------
When boot w/ "iommu=nopt amd_iommu=pgtable_v2", the device would be 
setup to support DMA-API with AMD IOMMU v2 page table with PASID=0 in 
the GCR3 table for this domain (i.e. DTE[GIOV]=1).

Here, we can set:
   pd_mode    = AMD_IOMMU_PD_MODE_V2;
   gcr3_flags = AMD_IOMMU_GCR3_FLAG_GIOV;   /* bit 0 */,

CASE 2:
-------
Following case 1, the device driver calls 
iommu_dev_enable_feature(IOMMU_DEV_FEAT_SVA) and iommu_sva_bind_device().

Here, we need to maintain DMA-API mapping from case 1. We would keep the 
GCR3 table from case 1, and bind IOMMU v2 page table from mm_struct to a 
non-zero PASID in the same GCR3 table for SVA support. This is possible 
because the kernel manages PASID-space starting from PASID=1.

Here, we can set:
   pd_mode    = AMD_IOMMU_PD_MODE_V2;
   gcr3_flags = (AMD_IOMMU_GCR3_FLAS_GIOV |		/* bit 0 */
                 AMD_IOMMU_GCR3_FLAG_MANAGED_PASID)	/* bit 1 */

Notes:
  - This would also allow non-PASID and PASID capable devices in the 
same IOMMU group to co-exist with full isolation support.

  - The AMD_IOMMU_GCR3_FLAGS_MANAGED_PASID is cleared when the domain no 
longer contain SVA-enabled devices.

CASE 3:
-------
Current default behavior for IOMMU group with PASID/PRI capable device 
is to setup the device in pass-through mode.

Here, we can set
   pd_mode = AMD_IOMMU_PD_MODE_PT;

Subsequently, when calling amd_iommu_bind_pasid(), the IOMMU driver 
would set
   pd_mode    = AMD_IOMMU_PD_MODE_V2;
   gcr3_flags = AMD_IOMMU_GCR3_FLAG_UNMANAGED_PASID;	/* bit 2 */

Notes:
  - Since kernel does not manage PASIDs for v2API, and the device driver 
is responsible for providing a PASID when calling 
amd_iommu_bind_pasid(), we cannot reserve PASID 0 for non-PASID device. 
Therefore, device in this group would set DTE[GIOV]=0.

  - We cannot have case 1 or 2 with case 3 in the same IOMMU group since 
GCR3 table is currently per-group/domain (i.e. one default domain 
per-group).

  - TBD: Despite the upcoming SVA support, we still need to maintain 
support for the existing AMD v2 API (which we might consider deprecating 
in the future and put on bug-fix mode only), and we cannot break 
existing device drivers.

> GCR3 is the "PASID table" right? it should not be part of the
> iommu_domain struct either, SMMUv3 is busy undoing this mistake in
> their design right now.

GCR3 table is the same as PASID table,

Could you please elaborate on this part if it should not be as part of 
the domain? What's the new design for SMMUv3?

>> It helps keeping track of how the IOMMU driver needs to setup DTEs for
>> device in a particular domain.
> 
> Yes, I understand what it is for. It is important that the right data
> be stored in the right places or you will have to redo it all. If it
> does not directly related to describing the format of the IOPTEs then
> it has no business being in the iommu_domain struct or it's driver
> variants like protection_domain. It belongs someplace else.

For AMD IOMMU, GCR3 table is setup in DTE for each device. So, 
technically, it can be setup per device. However, devices in the same 
domain can share the same mapping (e.g. a domain w/ multiple non-PASID 
capable devices would share the same IOMMU v2 page table w/ PASID=0 for 
DMA-API support). This is why we have struct protection_domain.gcr3_tbl 
to avoid duplicate GCR3 table for these devices.

Alternatively, If we don't care about the additional memory to setup 
per-device GCR3 table, we could move gcr3_tbl, gcr3_flags, and any 
related variables to the per-device struct iommu_dev_data, which 
contains device-specific data for AMD IOMMU driver.

>>>> +#define PD_FLAG_GIOV		BIT(4) /* domain enable GIOV support */
>>>
>>> This really shouldn't be a iommu_domain flag, this should be computed
>>> based on what domains are attached where
>>>
>>> Eg this seems like it should be triggered by attaching identity to the
>>> RID with PASID enabled or something like that.

Actually, I am not sure that I understand this point.

>>> Why do you need to set it for the legacy GPU path?

This should not have anything to do with the legacy GPU.

>> The PD_FLAG_GIOV is necessary for when we setup the v2 page table for
>> devices, which cannot support PASID. With the DTE[GIOV] bit set, IOMMU
>> hardware would treat the request from these devices as if it use PASID 0.
> 
> I understand that, but you are supposed to deduce this from having a
> RID bound page table installed into a PASID 0 slot but other logic, it
> is very much not part of the domain itself.

IIUC, the scenarios above should clearly describe when the DTE[GIOV] 
would is determined, and tracked by gcr3_tbl. This information is used 
when setting up DTE for each device.

> (as we all the other variations of this flag when you set an IDENTITY
> domain on the RID while having PASID operating)

I am not sure that I understand this point.

Please let me know if I am missing something, or if you have different 
ideas to share.

Thanks,
Suravee

  reply	other threads:[~2023-07-18  6:38 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-12 14:14 [PATCH 00/21] iommu/amd: Preparation for SVA Support (Part 1 of 2) Vasant Hegde
2023-07-12 14:14 ` [PATCH 01/21] iommu/amd: Remove unused amd_io_pgtable.pt_root variable Vasant Hegde
2023-07-12 14:14 ` [PATCH 02/21] iommu/amd: Consolidate timeout pre-define to amd_iommu_type.h Vasant Hegde
2023-07-12 14:14 ` [PATCH 03/21] iommu/amd: Consolidate logic to allocate protection domain Vasant Hegde
2023-07-12 14:14 ` [PATCH 04/21] iommu/amd: Refactor protection domain allocation code Vasant Hegde
2023-07-12 14:15 ` [PATCH 05/21] iommu/amd/iommu_v2: Use protection_domain in struct device_state Vasant Hegde
2023-07-12 14:15 ` [PATCH 06/21] iommu/amd: Introduce helper functions for managing GCR3 table Vasant Hegde
2023-07-12 14:15 ` [PATCH 07/21] iommu/amd: Use struct protection_domain in helper functions Vasant Hegde
2023-07-14 16:57   ` Jason Gunthorpe
2023-07-12 14:15 ` [PATCH 08/21] iommu/amd: Do not set amd_iommu_pgtable in pass-through mode Vasant Hegde
2023-07-12 14:15 ` [PATCH 09/21] iommu/amd: Miscellaneous clean up when free domain Vasant Hegde
2023-07-12 14:15 ` [PATCH 10/21] iommu/amd: Modify logic for checking GT and PPR features Vasant Hegde
2023-07-12 14:15 ` [PATCH 11/21] iommu/amd: Clean up and enhance protection domain flags Vasant Hegde
2023-07-14 17:25   ` Jason Gunthorpe
2023-07-17  1:22     ` Suthikulpanit, Suravee
2023-07-17 12:23       ` Jason Gunthorpe
2023-07-18  6:38         ` Suthikulpanit, Suravee [this message]
2023-07-18 12:36           ` Jason Gunthorpe
2023-07-28  5:43             ` Vasant Hegde
2023-07-12 14:15 ` [PATCH 12/21] iommu/amd: Use protection_domain.flags to check page table mode Vasant Hegde
2023-07-12 14:15 ` [PATCH 13/21] iommu/amd: Consolidate logic for setting up v2 API mode Vasant Hegde
2023-07-12 14:15 ` [PATCH 14/21] iommu/amd: Refactor domain allocation code for v2DMA mode Vasant Hegde
2023-07-12 14:15 ` [PATCH 15/21] iommu/amd: Setup v2 domain with max pasids Vasant Hegde
2023-07-12 14:15 ` [PATCH 16/21] iommu/amd: Introduce iommu_dev_data.flags to track device capabilities Vasant Hegde
2023-07-12 14:15 ` [PATCH 17/21] iommu/amd: Rename ats related variables Vasant Hegde
2023-07-12 14:15 ` [PATCH 18/21] iommu/amd: Enable device ATS/PASID/PRI capabilities independently Vasant Hegde
2023-07-12 14:15 ` [PATCH 19/21] iommu/amd: Delay enabling PRI to after initializing domain Vasant Hegde
2023-07-12 14:15 ` [PATCH 20/21] iommu/amd: Initialize iommu_device->max_pasids Vasant Hegde
2023-07-12 14:15 ` [PATCH 21/21] iommu/amd: Simplify amd_iommu_device_info() Vasant Hegde
2023-07-14 17:53 ` [PATCH 00/21] iommu/amd: Preparation for SVA Support (Part 1 of 2) Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1b5b9a47-0e4c-927c-c19a-9ee05ef66918@amd.com \
    --to=suravee.suthikulpanit@amd.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=joro@8bytes.org \
    --cc=vasant.hegde@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox