From: Jason Gunthorpe <jgg@nvidia.com>
To: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
joro@8bytes.org, robin.murphy@arm.com, vasant.hegde@amd.com,
ubizjak@gmail.com, jon.grimm@amd.com, santosh.shukla@amd.com,
pandoh@google.com, kumaranand@google.com
Subject: Re: [PATCH v2 3/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE
Date: Thu, 29 Aug 2024 16:28:04 -0300
Message-ID: <20240829192804.GJ3773488@nvidia.com>
In-Reply-To: <20240829180726.5022-4-suravee.suthikulpanit@amd.com>

On Thu, Aug 29, 2024 at 06:07:24PM +0000, Suravee Suthikulpanit wrote:
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 994ed02842b9..93bca5c68bca 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -85,6 +85,47 @@ static void set_dte_entry(struct amd_iommu *iommu,
> *
> ****************************************************************************/
>
> +static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
> + struct dev_table_entry *new)
> +{
> + struct dev_table_entry *dev_table = get_dev_table(iommu);
> + struct dev_table_entry *ptr = &dev_table[dev_data->devid];
> + struct dev_table_entry old;
> + u128 tmp;
> +
> + down_write(&dev_data->dte_sem);

This locking is too narrow. The critical region needs to span from
get_dte256() through update_dte256(), because the get retrieves the
value written by set_dte_irq_entry(), and that value must not change
while the new DTE is being worked on.

I suggest you copy the IRQ data from old to new here in this function,
under the lock, and then store it, so it is always fresh.

Ideally you would remove get_dte256(), because the driver *really*
shouldn't be changing the DTE in a way that assumes something is
already in the DTE (for instance, see my remarks on the nesting work).
Really, the only reason to read the DTE is to get the IRQ data.

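To make the "copy the IRQ data under the lock" suggestion concrete, here
is a minimal userspace sketch. It is not driver code: struct mock_dte,
MOCK_IRQ_MASK, the mock_*() helpers and the position of the IRQ bits are
all made up for illustration, and a C11 atomic_flag stands in for
whatever lock the driver ends up using.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the 256-bit DTE: four 64-bit words. */
struct mock_dte {
	uint64_t data[4];
};

/*
 * Assumption for illustration only: pretend the interrupt remapping
 * fields are the low 32 bits of data[2]. The real layout is not
 * reproduced here.
 */
#define MOCK_IRQ_MASK 0x00000000ffffffffULL

static atomic_flag mock_dte_lock = ATOMIC_FLAG_INIT;

static void mock_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&mock_dte_lock,
						 memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void mock_unlock(void)
{
	atomic_flag_clear_explicit(&mock_dte_lock, memory_order_release);
}

/* IRQ side: only touches the IRQ bits, under the same lock. */
static void mock_set_irq(struct mock_dte *live, uint64_t irq_bits)
{
	mock_lock();
	live->data[2] = (live->data[2] & ~MOCK_IRQ_MASK) |
			(irq_bits & MOCK_IRQ_MASK);
	mock_unlock();
}

/*
 * Domain side: the caller builds *new from scratch, without reading
 * the live entry. The live IRQ bits are merged in here, inside the
 * critical section, so they cannot change between the read and the
 * store.
 */
static void mock_update_dte(struct mock_dte *live, const struct mock_dte *new)
{
	struct mock_dte merged = *new;

	mock_lock();
	merged.data[2] &= ~MOCK_IRQ_MASK;
	merged.data[2] |= live->data[2] & MOCK_IRQ_MASK;
	*live = merged;
	mock_unlock();
}
```

With this shape the caller never needs a separate "get" step, so there
is no window between reading the old entry and storing the new one.
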
> + old.data128[0] = ptr->data128[0];
> + old.data128[1] = ptr->data128[1];
> +
> + tmp = cmpxchg128(&ptr->data128[1], old.data128[1], new->data128[1]);
> + if (tmp == old.data128[1]) {
> + if (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0])) {
> + /* Restore hi 128-bit */
> + cmpxchg128(&ptr->data128[1], new->data128[1], tmp);

I don't think you should restore. Getting here reflects a locking
error, but we still need to move forward and put some kind of correct
data in place. The code can't go backwards, so it should try to move
forwards.

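As a rough illustration of "move forwards instead of restoring", here is
a userspace sketch. The assumptions are loud ones: 64-bit C11 atomics
stand in for the two 128-bit halves, and mock_dte and
mock_store_forward() are hypothetical names, not anything in the driver.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock entry: two atomically written halves (64-bit here, 128-bit in HW). */
struct mock_dte {
	_Atomic uint64_t half[2];
};

/*
 * Install new[] over the live entry, upper half first as in the quoted
 * patch. If a half does not hold the value we expected, someone wrote
 * it without holding the lock. That is a bug worth reporting, but we
 * do not roll back: the new value is stored anyway so the entry ends
 * up fully in the new state rather than half old, half new.
 *
 * Returns true if both halves matched, false if a mismatch was seen.
 */
static bool mock_store_forward(struct mock_dte *live,
			       const uint64_t old[2], const uint64_t new[2])
{
	bool clean = true;

	for (int i = 1; i >= 0; i--) {
		uint64_t expected = old[i];

		if (!atomic_compare_exchange_strong(&live->half[i],
						    &expected, new[i])) {
			/* Unexpected writer: note it and keep going. */
			atomic_store(&live->half[i], new[i]);
			clean = false;
		}
	}
	return clean;
}
```

In the driver, the false return is where something like a WARN would
go. The point is only that the failure path still finishes the update.
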
On ordering, I don't know, is this OK?

If you are leaving/entering nesting mode I think you have to write the
[2] value in the right sequence: you don't want to have the viommu
enabled unless the host page table is set up properly. So [2] is
written last when enabling, and first when disabling. Flushes are
required after each write to ensure the HW doesn't see a bit tear
across the 128-bit words.

GuestPagingMode also has to be sequenced correctly. The GCR3 table
pointer should be invalid when it is changed, meaning you have to
write it and flush before storing the GCR3 table, and the reverse to
undo it.

The ordering, including when DTE flushes are needed, is pretty
hard. This is much simpler than, say, ARM, so I think you could open
code it, but it should be a pretty sizable bit of logic to figure out
what to do.

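The enable/disable sequencing can be sketched in userspace like this.
It assumes that "[2]" means data[2], which sits in the upper 128-bit
half, and that the host page table setup lives in the lower half.
Everything named mock_* is hypothetical, and the "flush" only records
that it happened so the order can be inspected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock DTE: data[0..1] is the low half, data[2..3] the high half. */
struct mock_dte {
	uint64_t data[4];
};

enum mock_op { MOCK_WRITE_LO, MOCK_WRITE_HI, MOCK_FLUSH };

#define MOCK_LOG_MAX 8
static enum mock_op mock_log[MOCK_LOG_MAX];
static int mock_log_len;

static void mock_record(enum mock_op op)
{
	if (mock_log_len < MOCK_LOG_MAX)
		mock_log[mock_log_len++] = op;
}

/* Stand-in for one 128-bit store of a single half. */
static void mock_write_half(struct mock_dte *live,
			    const struct mock_dte *new, int half)
{
	live->data[2 * half] = new->data[2 * half];
	live->data[2 * half + 1] = new->data[2 * half + 1];
	mock_record(half ? MOCK_WRITE_HI : MOCK_WRITE_LO);
}

/* Stand-in for a DTE invalidation; here it only logs the event. */
static void mock_flush_dte(void)
{
	mock_record(MOCK_FLUSH);
}

/*
 * Enabling: low half (host setup) first, high half (with data[2]) last.
 * Disabling: high half first, low half last.
 * A flush follows each write so HW never sees a mix of both states.
 */
static void mock_write_dte_ordered(struct mock_dte *live,
				   const struct mock_dte *new, bool enabling)
{
	int first = enabling ? 0 : 1;

	mock_write_half(live, new, first);
	mock_flush_dte();
	mock_write_half(live, new, 1 - first);
	mock_flush_dte();
}
```

A real version would also have to decide when a flush can be skipped
and how the GuestPagingMode/GCR3 steps fit in. This only shows the
shape of the two-step write.
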
> +static void get_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
> + struct dev_table_entry *dte)
> +{
> + struct dev_table_entry *ptr;
> + struct dev_table_entry *dev_table = get_dev_table(iommu);
> +
> + ptr = &dev_table[dev_data->devid];
> +
> + down_read(&dev_data->dte_sem);
> + dte->data128[0] = ptr->data128[0];
> + dte->data128[1] = ptr->data128[1];
> + up_read(&dev_data->dte_sem);

I don't think you need a rwsem either. As above, you shouldn't be
reading anyhow to build a DTE, and you can't allow the interrupt
update to run regardless, so a simple spinlock would be sufficient and
faster, I think.

> @@ -2681,16 +2732,16 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
> }
>
> list_for_each_entry(dev_data, &pdomain->dev_list, list) {
> - iommu = get_amd_iommu_from_dev_data(dev_data);
> + struct dev_table_entry dte;
>
> - dev_table = get_dev_table(iommu);
> - pte_root = dev_table[dev_data->devid].data[0];
> + iommu = get_amd_iommu_from_dev_data(dev_data);
> + get_dte256(iommu, dev_data, &dte);
>
> - pte_root = (enable ? pte_root | DTE_FLAG_HAD :
> - pte_root & ~DTE_FLAG_HAD);
> + dte.data[0] = (enable ? dte.data[0] | DTE_FLAG_HAD :
> + dte.data[0] & ~DTE_FLAG_HAD);
>

And this doesn't need all the logic just to flip one bit in a single
64-bit quantity.

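One possible reading of that, as a userspace sketch: touch only the
64-bit word that holds the flag. MOCK_FLAG_HAD is a placeholder value
and mock_set_had() is a hypothetical helper. C11 atomics are used here,
but a plain read-modify-write of the one word under the DTE lock would
do the same job.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit position, not the real DTE_FLAG_HAD definition. */
#define MOCK_FLAG_HAD (1ULL << 7)

/*
 * Set or clear the flag in one 64-bit word of the entry. No 256-bit
 * read, no cmpxchg128: other bits in the word are left untouched.
 */
static void mock_set_had(_Atomic uint64_t *word0, bool enable)
{
	if (enable)
		atomic_fetch_or(word0, MOCK_FLAG_HAD);
	else
		atomic_fetch_and(word0, ~MOCK_FLAG_HAD);
}
```
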
Jason