From: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
To: Jean-Philippe Brucker
<jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
Cc: zhongmiao-C8/M+/jPZTeaMJb+Lgu22Q@public.gmane.org,
okaya-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
rjw-LthD3rsA81gm4RdzfppkhA@public.gmane.org,
linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
sudeep.holla-5wv7dgnIgG8@public.gmane.org,
robin.murphy-5wv7dgnIgG8@public.gmane.org,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
lenb-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Subject: Re: [PATCH v2 6/7] iommu/arm-smmu-v3: Add support for PCI ATS
Date: Tue, 16 Apr 2019 11:00:42 +0100 [thread overview]
Message-ID: <20190416100042.GA32213@fuggles.cambridge.arm.com> (raw)
In-Reply-To: <0b9b600f-60e0-0740-e1db-6b684bf5a195-5wv7dgnIgG8@public.gmane.org>
On Mon, Apr 15, 2019 at 07:00:27PM +0100, Jean-Philippe Brucker wrote:
> On 15/04/2019 14:21, Will Deacon wrote:
> > On Tue, Apr 09, 2019 at 05:52:44PM +0100, Jean-Philippe Brucker wrote:
> >> PCIe devices can implement their own TLB, named Address Translation Cache
> >> (ATC). Enable Address Translation Service (ATS) for devices that support
> >> it and send them invalidation requests whenever we invalidate the IOTLBs.
> >>
> >> ATC invalidation is allowed to take up to 90 seconds, according to the
> >> PCIe spec, so it is possible to get a SMMU command queue timeout during
> >> normal operations. However we expect implementations to complete
> >> invalidation in reasonable time.
> >>
> >> We only enable ATS for "trusted" devices, and currently rely on the
> >> pci_dev->untrusted bit. For ATS we have to trust that:
> >
> > AFAICT, devicetree has no way to describe a device as untrusted, so
> > everything will be trusted by default on those systems. Is that right?
>
> Yes, although I'm adding a devicetree property for PCI in v5.2:
> https://lore.kernel.org/linux-pci/20190411211823.GU256045-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org/T/#m9f2c036748bab6e262233280b22c1e6fab4e1607
Perfect :)
> >> (a) The device doesn't issue "translated" memory requests for addresses
> >> that weren't returned by the SMMU in a Translation Completion. In
> >> particular, if we give control of a device or device partition to a VM
> >> or userspace, software cannot program the device to access arbitrary
> >> "translated" addresses.
> >
> > Any plans to look at split-stage ATS later on? I think that would allow
> > us to pass untrusted devices through to a VM behind S1 ATS.
>
> I haven't tested it so far, we can look at that after adding support for
> nested translation
Sure, just curious. Thanks.
>
> >> (b) The device follows permissions granted by the SMMU in a Translation
> >> Completion. If the device requested read+write permission and only
> >> got read, then it doesn't write.
> >
> > Guessing we just ignore execute permissions, or do we need read implies
> > exec?
>
> Without PASID, a function cannot ask for execute permission, only RO and
> RW. In this case execution by the endpoint is never permitted (because
> the Exe bit in an ATS completion is always zero).
>
> With PASID, the endpoint must explicitly ask for execute permission (and
> interestingly, can't obtain it if the page is mapped exec-only, because
> in ATS exec implies read.)
Got it, thanks again.
> [...]
> >> +static bool disable_ats_check;
> >> +module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
> >> +MODULE_PARM_DESC(disable_ats_check,
> >> + "By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
> >
> > Yikes, do we really want this option, or is it just a leftover from
> > debugging?
>
> I wasn't sure what to do with it, I'll drop it in next version
Cheers.
> [...]
> >> @@ -876,6 +911,14 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
> >> dev_err(smmu->dev, "retrying command fetch\n");
> >> case CMDQ_ERR_CERROR_NONE_IDX:
> >> return;
> >> + case CMDQ_ERR_CERROR_ATC_INV_IDX:
> >> + /*
> >> + * ATC Invalidation Completion timeout. CONS is still pointing
> >> + * at the CMD_SYNC. Attempt to complete other pending commands
> >> + * by repeating the CMD_SYNC, though we might well end up back
> >> + * here since the ATC invalidation may still be pending.
> >> + */
> >> + return;
> >
> > This is pretty bad, as it means we're unable to unmap a page safely from a
> > misbehaving device. Ideally, we'd block further transactions from the
> > problematic endpoint, but I suppose we can't easily know which one it was,
> > or inject a fault back into the unmap() path.
> >
> > Having said that, won't we get a CMD_SYNC timeout long before the ATC timeout?
>
> Yes, CMD_SYNC timeout is 1s, ATC timeout is 90s...
>
> > Perhaps we could propagate that out of arm_smmu_cmdq_issue_sync() and fail the
> > unmap?
> >
> > Not sure, what do you think?
>
> The callers of iommu_unmap() will free the page regardless of the return
> value, even though the device could still be accessing it. But I'll look
> at returning 0 if the CMD_SYNC times out, it's a good start for
> consolidating this. With dma-iommu.c it will trigger a WARN().
If it's not too tricky, that would be good.
> It gets a worse with PRI, when the invalidation comes from an MMU
> notifier and we can't even return an error. Ideally we'd hold a
> reference to these pages until invalidation completes.
Agreed.
> [...]
> >> + /*
> >> + * Find the smallest power of two that covers the range. Most
> >> + * significant differing bit between start and end address indicates the
> >> + * required span, ie. fls(start ^ end). For example:
> >> + *
> >> + * We want to invalidate pages [8; 11]. This is already the ideal range:
> >> + * x = 0b1000 ^ 0b1011 = 0b11
> >> + * span = 1 << fls(x) = 4
> >> + *
> >> + * To invalidate pages [7; 10], we need to invalidate [0; 15]:
> >> + * x = 0b0111 ^ 0b1010 = 0b1101
> >> + * span = 1 << fls(x) = 16
> >> + */
> >
> > Urgh, "The Address span is aligned to its size by the SMMU" is what makes
> > this horrible. Please can you add that to the comment?
>
> Sure (but the culprit is really the PCIe spec, with its "naturally
> aligned" ranges.)
Ok, blame them then :)
> > An alternative would be to emit multiple ATC_INV commands. Do you have a
> > feeling for what would be more efficient?
>
> With the current code, we might be removing cached entries of long-term
> mappings (tables and ring buffers) every time we unmap a buffer, in
> which case enabling ATS would certainly make the device slower.
>
> Multiple ATC_INV commands may be more efficient but I'm not too
> comfortable implementing it until I have some hardware to test it. I
> suspect we might need to optimize it a bit to avoid sending too many
> invalidations for large mappings.
Ok, just stick that in the comment as well then, please.
Will
WARNING: multiple messages have this Message-ID (diff)
From: Will Deacon <will.deacon@arm.com>
To: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Cc: zhongmiao@hisilicon.com, okaya@kernel.org, rjw@rjwysocki.net,
linux-acpi@vger.kernel.org, iommu@lists.linux-foundation.org,
sudeep.holla@arm.com, robin.murphy@arm.com,
linux-arm-kernel@lists.infradead.org, lenb@kernel.org
Subject: Re: [PATCH v2 6/7] iommu/arm-smmu-v3: Add support for PCI ATS
Date: Tue, 16 Apr 2019 11:00:42 +0100 [thread overview]
Message-ID: <20190416100042.GA32213@fuggles.cambridge.arm.com> (raw)
Message-ID: <20190416100042.WUfFLfpQeoYy2rwvJInGQehpQDKsLXjWYQqAQPNMaH8@z> (raw)
In-Reply-To: <0b9b600f-60e0-0740-e1db-6b684bf5a195@arm.com>
On Mon, Apr 15, 2019 at 07:00:27PM +0100, Jean-Philippe Brucker wrote:
> On 15/04/2019 14:21, Will Deacon wrote:
> > On Tue, Apr 09, 2019 at 05:52:44PM +0100, Jean-Philippe Brucker wrote:
> >> PCIe devices can implement their own TLB, named Address Translation Cache
> >> (ATC). Enable Address Translation Service (ATS) for devices that support
> >> it and send them invalidation requests whenever we invalidate the IOTLBs.
> >>
> >> ATC invalidation is allowed to take up to 90 seconds, according to the
> >> PCIe spec, so it is possible to get a SMMU command queue timeout during
> >> normal operations. However we expect implementations to complete
> >> invalidation in reasonable time.
> >>
> >> We only enable ATS for "trusted" devices, and currently rely on the
> >> pci_dev->untrusted bit. For ATS we have to trust that:
> >
> > AFAICT, devicetree has no way to describe a device as untrusted, so
> > everything will be trusted by default on those systems. Is that right?
>
> Yes, although I'm adding a devicetree property for PCI in v5.2:
> https://lore.kernel.org/linux-pci/20190411211823.GU256045@google.com/T/#m9f2c036748bab6e262233280b22c1e6fab4e1607
Perfect :)
> >> (a) The device doesn't issue "translated" memory requests for addresses
> >> that weren't returned by the SMMU in a Translation Completion. In
> >> particular, if we give control of a device or device partition to a VM
> >> or userspace, software cannot program the device to access arbitrary
> >> "translated" addresses.
> >
> > Any plans to look at split-stage ATS later on? I think that would allow
> > us to pass untrusted devices through to a VM behind S1 ATS.
>
> I haven't tested it so far, we can look at that after adding support for
> nested translation
Sure, just curious. Thanks.
>
> >> (b) The device follows permissions granted by the SMMU in a Translation
> >> Completion. If the device requested read+write permission and only
> >> got read, then it doesn't write.
> >
> > Guessing we just ignore execute permissions, or do we need read implies
> > exec?
>
> Without PASID, a function cannot ask for execute permission, only RO and
> RW. In this case execution by the endpoint is never permitted (because
> the Exe bit in an ATS completion is always zero).
>
> With PASID, the endpoint must explicitly ask for execute permission (and
> interestingly, can't obtain it if the page is mapped exec-only, because
> in ATS exec implies read.)
Got it, thanks again.
> [...]
> >> +static bool disable_ats_check;
> >> +module_param_named(disable_ats_check, disable_ats_check, bool, S_IRUGO);
> >> +MODULE_PARM_DESC(disable_ats_check,
> >> + "By default, the SMMU checks whether each incoming transaction marked as translated is allowed by the stream configuration. This option disables the check.");
> >
> > Yikes, do we really want this option, or is it just a leftover from
> > debugging?
>
> I wasn't sure what to do with it, I'll drop it in next version
Cheers.
> [...]
> >> @@ -876,6 +911,14 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
> >> dev_err(smmu->dev, "retrying command fetch\n");
> >> case CMDQ_ERR_CERROR_NONE_IDX:
> >> return;
> >> + case CMDQ_ERR_CERROR_ATC_INV_IDX:
> >> + /*
> >> + * ATC Invalidation Completion timeout. CONS is still pointing
> >> + * at the CMD_SYNC. Attempt to complete other pending commands
> >> + * by repeating the CMD_SYNC, though we might well end up back
> >> + * here since the ATC invalidation may still be pending.
> >> + */
> >> + return;
> >
> > This is pretty bad, as it means we're unable to unmap a page safely from a
> > misbehaving device. Ideally, we'd block further transactions from the
> > problematic endpoint, but I suppose we can't easily know which one it was,
> > or inject a fault back into the unmap() path.
> >
> > Having said that, won't we get a CMD_SYNC timeout long before the ATC timeout?
>
> Yes, CMD_SYNC timeout is 1s, ATC timeout is 90s...
>
> > Perhaps we could propagate that out of arm_smmu_cmdq_issue_sync() and fail the
> > unmap?
> >
> > Not sure, what do you think?
>
> The callers of iommu_unmap() will free the page regardless of the return
> value, even though the device could still be accessing it. But I'll look
> at returning 0 if the CMD_SYNC times out, it's a good start for
> consolidating this. With dma-iommu.c it will trigger a WARN().
If it's not too tricky, that would be good.
> It gets a worse with PRI, when the invalidation comes from an MMU
> notifier and we can't even return an error. Ideally we'd hold a
> reference to these pages until invalidation completes.
Agreed.
> [...]
> >> + /*
> >> + * Find the smallest power of two that covers the range. Most
> >> + * significant differing bit between start and end address indicates the
> >> + * required span, ie. fls(start ^ end). For example:
> >> + *
> >> + * We want to invalidate pages [8; 11]. This is already the ideal range:
> >> + * x = 0b1000 ^ 0b1011 = 0b11
> >> + * span = 1 << fls(x) = 4
> >> + *
> >> + * To invalidate pages [7; 10], we need to invalidate [0; 15]:
> >> + * x = 0b0111 ^ 0b1010 = 0b1101
> >> + * span = 1 << fls(x) = 16
> >> + */
> >
> > Urgh, "The Address span is aligned to its size by the SMMU" is what makes
> > this horrible. Please can you add that to the comment?
>
> Sure (but the culprit is really the PCIe spec, with its "naturally
> aligned" ranges.)
Ok, blame them then :)
> > An alternative would be to emit multiple ATC_INV commands. Do you have a
> > feeling for what would be more efficient?
>
> With the current code, we might be removing cached entries of long-term
> mappings (tables and ring buffers) every time we unmap a buffer, in
> which case enabling ATS would certainly make the device slower.
>
> Multiple ATC_INV commands may be more efficient but I'm not too
> comfortable implementing it until I have some hardware to test it. I
> suspect we might need to optimize it a bit to avoid sending too many
> invalidations for large mappings.
Ok, just stick that in the comment as well then, please.
Will
next prev parent reply other threads:[~2019-04-16 10:00 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-09 16:52 [PATCH v2 0/7] Add PCI ATS support to Arm SMMUv3 Jean-Philippe Brucker
2019-04-09 16:52 ` Jean-Philippe Brucker
[not found] ` <20190409165245.26500-1-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2019-04-09 16:52 ` [PATCH v2 1/7] ACPI/IORT: Check ATS capability in root complex nodes Jean-Philippe Brucker
2019-04-09 16:52 ` Jean-Philippe Brucker
[not found] ` <20190409165245.26500-2-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2019-04-15 13:21 ` Will Deacon
2019-04-15 13:21 ` Will Deacon
[not found] ` <20190415132108.GB15023-UDVVEH7NWB15pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org>
2019-04-15 18:00 ` Jean-Philippe Brucker
2019-04-15 18:00 ` Jean-Philippe Brucker
2019-04-15 18:31 ` Robin Murphy
2019-04-15 18:31 ` Robin Murphy
[not found] ` <c10c7adb-c7f6-f8c6-05cc-f4f143427a2d-5wv7dgnIgG8@public.gmane.org>
2019-04-16 16:27 ` Jean-Philippe Brucker
2019-04-16 16:27 ` Jean-Philippe Brucker
2019-04-09 16:52 ` [PATCH v2 2/7] iommu/arm-smmu-v3: Rename arm_smmu_master_data to arm_smmu_master Jean-Philippe Brucker
2019-04-09 16:52 ` Jean-Philippe Brucker
2019-04-09 16:52 ` [PATCH v2 3/7] iommu/arm-smmu-v3: Store SteamIDs in master Jean-Philippe Brucker
2019-04-09 16:52 ` Jean-Philippe Brucker
2019-04-09 16:52 ` [PATCH v2 4/7] iommu/arm-smmu-v3: Add a master->domain pointer Jean-Philippe Brucker
2019-04-09 16:52 ` Jean-Philippe Brucker
2019-04-09 16:52 ` [PATCH v2 5/7] iommu/arm-smmu-v3: Link domains and devices Jean-Philippe Brucker
2019-04-09 16:52 ` Jean-Philippe Brucker
2019-04-09 16:52 ` [PATCH v2 6/7] iommu/arm-smmu-v3: Add support for PCI ATS Jean-Philippe Brucker
2019-04-09 16:52 ` Jean-Philippe Brucker
[not found] ` <20190409165245.26500-7-jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org>
2019-04-15 13:21 ` Will Deacon
2019-04-15 13:21 ` Will Deacon
[not found] ` <20190415132121.GC15023-UDVVEH7NWB15pKCnmE3YQBJ8xKzm50AiAL8bYrjMMd8@public.gmane.org>
2019-04-15 18:00 ` Jean-Philippe Brucker
2019-04-15 18:00 ` Jean-Philippe Brucker
[not found] ` <0b9b600f-60e0-0740-e1db-6b684bf5a195-5wv7dgnIgG8@public.gmane.org>
2019-04-16 10:00 ` Will Deacon [this message]
2019-04-16 10:00 ` Will Deacon
2019-04-09 16:52 ` [PATCH v2 7/7] iommu/arm-smmu-v3: Disable tagged pointers Jean-Philippe Brucker
2019-04-09 16:52 ` Jean-Philippe Brucker
2019-04-15 13:20 ` [PATCH v2 0/7] Add PCI ATS support to Arm SMMUv3 Will Deacon
2019-04-15 13:20 ` Will Deacon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190416100042.GA32213@fuggles.cambridge.arm.com \
--to=will.deacon-5wv7dgnigg8@public.gmane.org \
--cc=iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org \
--cc=lenb-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
--cc=linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
--cc=okaya-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
--cc=rjw-LthD3rsA81gm4RdzfppkhA@public.gmane.org \
--cc=robin.murphy-5wv7dgnIgG8@public.gmane.org \
--cc=sudeep.holla-5wv7dgnIgG8@public.gmane.org \
--cc=zhongmiao-C8/M+/jPZTeaMJb+Lgu22Q@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox