* [PATCH v2 0/4] SMMUv3 CMD_SYNC optimisation
@ 2017-08-31 13:44 Robin Murphy
[not found] ` <cover.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Robin Murphy @ 2017-08-31 13:44 UTC (permalink / raw)
To: will.deacon-5wv7dgnIgG8
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
sgoutham-YGCgFSpz5w/QT0dZR+AlfA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
Hi all,
Since Nate reported a reasonable performance boost from the out-of-line
MSI polling in v1 [1], I've now implemented the equivalent for cons
polling as well - that has been boot-tested on D05 with some trivial I/O
and at least doesn't seem to lock up or explode. There's also a little
cosmetic tweaking to make the patches a bit cleaner as a series.
Robin.
[1] https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg19657.html
Robin Murphy (5):
iommu/arm-smmu-v3: Specialise CMD_SYNC handling
iommu/arm-smmu-v3: Forget about cmdq-sync interrupt
iommu/arm-smmu-v3: Use CMD_SYNC completion MSI
iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock
iommu/arm-smmu-v3: Use burst-polling for sync completion
drivers/iommu/arm-smmu-v3.c | 194 ++++++++++++++++++++++++++++++--------------
1 file changed, 135 insertions(+), 59 deletions(-)
--
2.13.4.dirty
^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <cover.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>]
* [PATCH v2 1/4] iommu/arm-smmu-v3: Specialise CMD_SYNC handling [not found] ` <cover.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org> @ 2017-08-31 13:44 ` Robin Murphy 2017-08-31 13:44 ` [PATCH v2 2/4] iommu/arm-smmu-v3: Forget about cmdq-sync interrupt Robin Murphy ` (4 subsequent siblings) 5 siblings, 0 replies; 13+ messages in thread From: Robin Murphy @ 2017-08-31 13:44 UTC (permalink / raw) To: will.deacon-5wv7dgnIgG8 Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r CMD_SYNC already has a bit of special treatment here and there, but as we're about to extend it with more functionality for completing outside the CMDQ lock, things are going to get rather messy if we keep trying to cram everything into a single generic command interface. Instead, let's break out the issuing of CMD_SYNC into its own specific helper where upcoming changes will have room to breathe. Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> --- v2: Cosmetic changes to reduce churn in later patches drivers/iommu/arm-smmu-v3.c | 54 +++++++++++++++++++++++++++++---------------- 1 file changed, 35 insertions(+), 19 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 568c400eeaed..91f28aca72c0 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -936,13 +936,22 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu) queue_write(Q_ENT(q, cons), cmd, q->ent_dwords); } +static void arm_smmu_cmdq_insert_cmd(struct arm_smmu_device *smmu, u64 *cmd) +{ + struct arm_smmu_queue *q = &smmu->cmdq.q; + bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); + + while (queue_insert_raw(q, cmd) == -ENOSPC) { + if (queue_poll_cons(q, false, wfe)) + dev_err_ratelimited(smmu->dev, "CMDQ timeout\n"); + } +} + static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu, struct arm_smmu_cmdq_ent *ent) { u64 cmd[CMDQ_ENT_DWORDS]; unsigned long flags; - bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); - struct arm_smmu_queue *q = &smmu->cmdq.q; if (arm_smmu_cmdq_build_cmd(cmd, ent)) { dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n", @@ -951,16 +960,29 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu, } spin_lock_irqsave(&smmu->cmdq.lock, flags); - while (queue_insert_raw(q, cmd) == -ENOSPC) { - if (queue_poll_cons(q, false, wfe)) - dev_err_ratelimited(smmu->dev, "CMDQ timeout\n"); - } - - if (ent->opcode == CMDQ_OP_CMD_SYNC && queue_poll_cons(q, true, wfe)) - dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n"); + arm_smmu_cmdq_insert_cmd(smmu, cmd); spin_unlock_irqrestore(&smmu->cmdq.lock, flags); } +static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu) +{ + u64 cmd[CMDQ_ENT_DWORDS]; + unsigned long flags; + bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); + struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC }; + int ret; + + arm_smmu_cmdq_build_cmd(cmd, &ent); + + spin_lock_irqsave(&smmu->cmdq.lock, flags); + arm_smmu_cmdq_insert_cmd(smmu, cmd); + ret = queue_poll_cons(&smmu->cmdq.q, true, wfe); + spin_unlock_irqrestore(&smmu->cmdq.lock, flags); + + if (ret) + dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n"); +} + /* Context descriptor manipulation functions */ static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr) { @@ -1029,8 +1051,7 @@ static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid) }; arm_smmu_cmdq_issue_cmd(smmu, &cmd); - cmd.opcode = CMDQ_OP_CMD_SYNC; - 
arm_smmu_cmdq_issue_cmd(smmu, &cmd); + arm_smmu_cmdq_issue_sync(smmu); } static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, @@ -1352,10 +1373,7 @@ static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev) /* IO_PGTABLE API */ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu) { - struct arm_smmu_cmdq_ent cmd; - - cmd.opcode = CMDQ_OP_CMD_SYNC; - arm_smmu_cmdq_issue_cmd(smmu, &cmd); + arm_smmu_cmdq_issue_sync(smmu); } static void arm_smmu_tlb_sync(void *cookie) @@ -2399,8 +2417,7 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass) /* Invalidate any cached configuration */ cmd.opcode = CMDQ_OP_CFGI_ALL; arm_smmu_cmdq_issue_cmd(smmu, &cmd); - cmd.opcode = CMDQ_OP_CMD_SYNC; - arm_smmu_cmdq_issue_cmd(smmu, &cmd); + arm_smmu_cmdq_issue_sync(smmu); /* Invalidate any stale TLB entries */ if (smmu->features & ARM_SMMU_FEAT_HYP) { @@ -2410,8 +2427,7 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass) cmd.opcode = CMDQ_OP_TLBI_NSNH_ALL; arm_smmu_cmdq_issue_cmd(smmu, &cmd); - cmd.opcode = CMDQ_OP_CMD_SYNC; - arm_smmu_cmdq_issue_cmd(smmu, &cmd); + arm_smmu_cmdq_issue_sync(smmu); /* Event queue */ writeq_relaxed(smmu->evtq.q.q_base, smmu->base + ARM_SMMU_EVTQ_BASE); -- 2.13.4.dirty ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 2/4] iommu/arm-smmu-v3: Forget about cmdq-sync interrupt [not found] ` <cover.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org> 2017-08-31 13:44 ` [PATCH v2 1/4] iommu/arm-smmu-v3: Specialise CMD_SYNC handling Robin Murphy @ 2017-08-31 13:44 ` Robin Murphy 2017-08-31 13:44 ` [PATCH v2 3/4] iommu/arm-smmu-v3: Use CMD_SYNC completion MSI Robin Murphy ` (3 subsequent siblings) 5 siblings, 0 replies; 13+ messages in thread From: Robin Murphy @ 2017-08-31 13:44 UTC (permalink / raw) To: will.deacon-5wv7dgnIgG8 Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r The cmdq-sync interrupt is never going to be particularly useful, since for stage 1 DMA at least we'll often need to wait for sync completion within someone else's IRQ handler, thus have to implement polling anyway. Beyond that, the overhead of taking an interrupt, then still having to grovel around in the queue to figure out *which* sync command completed, doesn't seem much more attractive than simple polling either. Furthermore, if an implementation both has wired interrupts and supports MSIs, then we don't want to be taking the IRQ unnecessarily if we're using the MSI write to update memory. Let's just make life simpler by not even bothering to claim it in the first place. Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> --- v2: No change drivers/iommu/arm-smmu-v3.c | 24 +----------------------- 1 file changed, 1 insertion(+), 23 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 91f28aca72c0..f066725298cd 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1296,12 +1296,6 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev) return IRQ_HANDLED; } -static irqreturn_t arm_smmu_cmdq_sync_handler(int irq, void *dev) -{ - /* We don't actually use CMD_SYNC interrupts for anything */ - return IRQ_HANDLED; -} - static int arm_smmu_device_disable(struct arm_smmu_device *smmu); static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev) @@ -1334,10 +1328,8 @@ static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev) if (active & GERROR_MSI_EVTQ_ABT_ERR) dev_warn(smmu->dev, "EVTQ MSI write aborted\n"); - if (active & GERROR_MSI_CMDQ_ABT_ERR) { + if (active & GERROR_MSI_CMDQ_ABT_ERR) dev_warn(smmu->dev, "CMDQ MSI write aborted\n"); - arm_smmu_cmdq_sync_handler(irq, smmu->dev); - } if (active & GERROR_PRIQ_ABT_ERR) dev_err(smmu->dev, "PRIQ write aborted -- events may have been lost\n"); @@ -1366,7 +1358,6 @@ static irqreturn_t arm_smmu_combined_irq_thread(int irq, void *dev) static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev) { arm_smmu_gerror_handler(irq, dev); - arm_smmu_cmdq_sync_handler(irq, dev); return IRQ_WAKE_THREAD; } @@ -2283,15 +2274,6 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu) dev_warn(smmu->dev, "failed to enable evtq irq\n"); } - irq = smmu->cmdq.q.irq; - if (irq) { - ret = devm_request_irq(smmu->dev, irq, - arm_smmu_cmdq_sync_handler, 0, - "arm-smmu-v3-cmdq-sync", smmu); - if (ret < 0) - dev_warn(smmu->dev, "failed to enable cmdq-sync irq\n"); - } - irq = smmu->gerr_irq; if (irq) { ret = devm_request_irq(smmu->dev, irq, arm_smmu_gerror_handler, @@ -2799,10 +2781,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev) if (irq > 0) smmu->priq.q.irq = irq; - irq = platform_get_irq_byname(pdev, "cmdq-sync"); - if (irq > 0) - smmu->cmdq.q.irq = irq; - irq = 
platform_get_irq_byname(pdev, "gerror"); if (irq > 0) smmu->gerr_irq = irq; -- 2.13.4.dirty ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 3/4] iommu/arm-smmu-v3: Use CMD_SYNC completion MSI [not found] ` <cover.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org> 2017-08-31 13:44 ` [PATCH v2 1/4] iommu/arm-smmu-v3: Specialise CMD_SYNC handling Robin Murphy 2017-08-31 13:44 ` [PATCH v2 2/4] iommu/arm-smmu-v3: Forget about cmdq-sync interrupt Robin Murphy @ 2017-08-31 13:44 ` Robin Murphy [not found] ` <dbf0ce00f8e249c64f3d2041acd8d91818178e52.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org> 2017-08-31 13:44 ` [PATCH v2 4/4] iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock Robin Murphy ` (2 subsequent siblings) 5 siblings, 1 reply; 13+ messages in thread From: Robin Murphy @ 2017-08-31 13:44 UTC (permalink / raw) To: will.deacon-5wv7dgnIgG8 Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r As an IRQ, the CMD_SYNC interrupt is not particularly useful, not least because we often need to wait for sync completion within someone else's IRQ handler anyway. However, when the SMMU is both coherent and supports MSIs, we can have a lot more fun by not using it as an interrupt at all. Following the example suggested in the architecture and using a write targeting normal memory, we can let callers wait on a status variable outside the lock instead of having to stall the entire queue or even touch MMIO registers. Since multiple sync commands are guaranteed to complete in order, a simple incrementing sequence count is all we need to unambiguously support any realistic number of overlapping waiters. Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> --- v2: Remove redundant 'bool msi' command member, other cosmetic tweaks drivers/iommu/arm-smmu-v3.c | 47 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index f066725298cd..311f482b93d5 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -377,7 +377,16 @@ #define CMDQ_SYNC_0_CS_SHIFT 12 #define CMDQ_SYNC_0_CS_NONE (0UL << CMDQ_SYNC_0_CS_SHIFT) +#define CMDQ_SYNC_0_CS_IRQ (1UL << CMDQ_SYNC_0_CS_SHIFT) #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT) +#define CMDQ_SYNC_0_MSH_SHIFT 22 +#define CMDQ_SYNC_0_MSH_ISH (3UL << CMDQ_SYNC_0_MSH_SHIFT) +#define CMDQ_SYNC_0_MSIATTR_SHIFT 24 +#define CMDQ_SYNC_0_MSIATTR_OIWB (0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT) +#define CMDQ_SYNC_0_MSIDATA_SHIFT 32 +#define CMDQ_SYNC_0_MSIDATA_MASK 0xffffffffUL +#define CMDQ_SYNC_1_MSIADDR_SHIFT 0 +#define CMDQ_SYNC_1_MSIADDR_MASK 0xffffffffffffcUL /* Event queue */ #define EVTQ_ENT_DWORDS 4 @@ -409,6 +418,7 @@ /* High-level queue structures */ #define ARM_SMMU_POLL_TIMEOUT_US 100 #define ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US 1000000 /* 1s! */ +#define ARM_SMMU_SYNC_TIMEOUT_US 1000000 /* 1s! 
*/ #define MSI_IOVA_BASE 0x8000000 #define MSI_IOVA_LENGTH 0x100000 @@ -504,6 +514,10 @@ struct arm_smmu_cmdq_ent { } pri; #define CMDQ_OP_CMD_SYNC 0x46 + struct { + u32 msidata; + u64 msiaddr; + } sync; }; }; @@ -617,6 +631,9 @@ struct arm_smmu_device { int gerr_irq; int combined_irq; + atomic_t sync_nr; + u32 sync_count; + unsigned long ias; /* IPA */ unsigned long oas; /* PA */ unsigned long pgsize_bitmap; @@ -878,7 +895,13 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) } break; case CMDQ_OP_CMD_SYNC: - cmd[0] |= CMDQ_SYNC_0_CS_SEV; + if (ent->sync.msiaddr) + cmd[0] |= CMDQ_SYNC_0_CS_IRQ; + else + cmd[0] |= CMDQ_SYNC_0_CS_SEV; + cmd[0] |= CMDQ_SYNC_0_MSH_ISH | CMDQ_SYNC_0_MSIATTR_OIWB; + cmd[0] |= (u64)ent->sync.msidata << CMDQ_SYNC_0_MSIDATA_SHIFT; + cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK; break; default: return -ENOENT; @@ -964,21 +987,40 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu, spin_unlock_irqrestore(&smmu->cmdq.lock, flags); } +static int arm_smmu_sync_poll_msi(struct arm_smmu_device *smmu, u32 sync_idx) +{ + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_SYNC_TIMEOUT_US); + u32 val = smp_cond_load_acquire(&smmu->sync_count, + (int)(VAL - sync_idx) >= 0 || + !ktime_before(ktime_get(), timeout)); + + return (int)(val - sync_idx) < 0 ? -ETIMEDOUT : 0; +} + static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu) { u64 cmd[CMDQ_ENT_DWORDS]; unsigned long flags; bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); + bool msi = (smmu->features & ARM_SMMU_FEAT_MSI) && + (smmu->features & ARM_SMMU_FEAT_COHERENCY); struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC }; int ret; + if (msi) { + ent.sync.msidata = atomic_inc_return(&smmu->sync_nr); + ent.sync.msiaddr = virt_to_phys(&smmu->sync_count); + } arm_smmu_cmdq_build_cmd(cmd, &ent); spin_lock_irqsave(&smmu->cmdq.lock, flags); arm_smmu_cmdq_insert_cmd(smmu, cmd); - ret = queue_poll_cons(&smmu->cmdq.q, true, wfe); + if (!msi) + ret = queue_poll_cons(&smmu->cmdq.q, true, wfe); spin_unlock_irqrestore(&smmu->cmdq.lock, flags); + if (msi) + ret = arm_smmu_sync_poll_msi(smmu, ent.sync.msidata); if (ret) dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n"); } @@ -2156,6 +2198,7 @@ static int arm_smmu_init_structures(struct arm_smmu_device *smmu) { int ret; + atomic_set(&smmu->sync_nr, 0); ret = arm_smmu_init_queues(smmu); if (ret) return ret; -- 2.13.4.dirty ^ permalink raw reply related [flat|nested] 13+ messages in thread
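The wrap-safe sequence-count comparison relied on by arm_smmu_sync_poll_msi() above can be illustrated with a standalone sketch (not driver code; the function name and toy values are assumptions for illustration): the signed difference keeps working even when the 32-bit ticket counter wraps.

#include <stdint.h>
#include <stdio.h>

/* A waiter holding ticket sync_idx is satisfied once the observed
 * completion count has reached or passed it, even across a wrap. */
static int sync_done(uint32_t observed, uint32_t sync_idx)
{
	return (int32_t)(observed - sync_idx) >= 0;
}

int main(void)
{
	printf("%d\n", sync_done(100, 90));		/* 1: already past */
	printf("%d\n", sync_done(90, 100));		/* 0: not yet */
	printf("%d\n", sync_done(2, 0xfffffffeU));	/* 1: passed, across the wrap */
	return 0;
}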
[parent not found: <dbf0ce00f8e249c64f3d2041acd8d91818178e52.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>]
* Re: [PATCH v2 3/4] iommu/arm-smmu-v3: Use CMD_SYNC completion MSI [not found] ` <dbf0ce00f8e249c64f3d2041acd8d91818178e52.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org> @ 2017-10-13 18:32 ` Will Deacon [not found] ` <20171013183237.GA30572-5wv7dgnIgG8@public.gmane.org> 0 siblings, 1 reply; 13+ messages in thread From: Will Deacon @ 2017-10-13 18:32 UTC (permalink / raw) To: Robin Murphy Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r Hi Robin, This mostly looks good. Just a few comments below. On Thu, Aug 31, 2017 at 02:44:27PM +0100, Robin Murphy wrote: > As an IRQ, the CMD_SYNC interrupt is not particularly useful, not least > because we often need to wait for sync completion within someone else's > IRQ handler anyway. However, when the SMMU is both coherent and supports > MSIs, we can have a lot more fun by not using it as an interrupt at all. > Following the example suggested in the architecture and using a write > targeting normal memory, we can let callers wait on a status variable > outside the lock instead of having to stall the entire queue or even > touch MMIO registers. Since multiple sync commands are guaranteed to > complete in order, a simple incrementing sequence count is all we need > to unambiguously support any realistic number of overlapping waiters. > > Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> > --- > > v2: Remove redundant 'bool msi' command member, other cosmetic tweaks > > drivers/iommu/arm-smmu-v3.c | 47 +++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 45 insertions(+), 2 deletions(-) > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c > index f066725298cd..311f482b93d5 100644 > --- a/drivers/iommu/arm-smmu-v3.c > +++ b/drivers/iommu/arm-smmu-v3.c > @@ -377,7 +377,16 @@ > > #define CMDQ_SYNC_0_CS_SHIFT 12 > #define CMDQ_SYNC_0_CS_NONE (0UL << CMDQ_SYNC_0_CS_SHIFT) > +#define CMDQ_SYNC_0_CS_IRQ (1UL << CMDQ_SYNC_0_CS_SHIFT) > #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT) > +#define CMDQ_SYNC_0_MSH_SHIFT 22 > +#define CMDQ_SYNC_0_MSH_ISH (3UL << CMDQ_SYNC_0_MSH_SHIFT) > +#define CMDQ_SYNC_0_MSIATTR_SHIFT 24 > +#define CMDQ_SYNC_0_MSIATTR_OIWB (0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT) > +#define CMDQ_SYNC_0_MSIDATA_SHIFT 32 > +#define CMDQ_SYNC_0_MSIDATA_MASK 0xffffffffUL > +#define CMDQ_SYNC_1_MSIADDR_SHIFT 0 > +#define CMDQ_SYNC_1_MSIADDR_MASK 0xffffffffffffcUL > > /* Event queue */ > #define EVTQ_ENT_DWORDS 4 > @@ -409,6 +418,7 @@ > /* High-level queue structures */ > #define ARM_SMMU_POLL_TIMEOUT_US 100 > #define ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US 1000000 /* 1s! */ > +#define ARM_SMMU_SYNC_TIMEOUT_US 1000000 /* 1s! */ We only ever do this when waiting for the queue to drain, so may as well just reuse the drain timeout. > #define MSI_IOVA_BASE 0x8000000 > #define MSI_IOVA_LENGTH 0x100000 > @@ -504,6 +514,10 @@ struct arm_smmu_cmdq_ent { > } pri; > > #define CMDQ_OP_CMD_SYNC 0x46 > + struct { > + u32 msidata; > + u64 msiaddr; > + } sync; > }; > }; > > @@ -617,6 +631,9 @@ struct arm_smmu_device { > int gerr_irq; > int combined_irq; > > + atomic_t sync_nr; > + u32 sync_count; It's probably worth sticking these in separate cachelines so we don't get spurious wakeups when sync_nr is incremented. (yes, I know it should be the ERG, but that can be unreasonably huge!). 
> + > unsigned long ias; /* IPA */ > unsigned long oas; /* PA */ > unsigned long pgsize_bitmap; > @@ -878,7 +895,13 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) > } > break; > case CMDQ_OP_CMD_SYNC: > - cmd[0] |= CMDQ_SYNC_0_CS_SEV; > + if (ent->sync.msiaddr) > + cmd[0] |= CMDQ_SYNC_0_CS_IRQ; > + else > + cmd[0] |= CMDQ_SYNC_0_CS_SEV; > + cmd[0] |= CMDQ_SYNC_0_MSH_ISH | CMDQ_SYNC_0_MSIATTR_OIWB; > + cmd[0] |= (u64)ent->sync.msidata << CMDQ_SYNC_0_MSIDATA_SHIFT; > + cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK; > break; > default: > return -ENOENT; > @@ -964,21 +987,40 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu, > spin_unlock_irqrestore(&smmu->cmdq.lock, flags); > } > > +static int arm_smmu_sync_poll_msi(struct arm_smmu_device *smmu, u32 sync_idx) > +{ > + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_SYNC_TIMEOUT_US); > + u32 val = smp_cond_load_acquire(&smmu->sync_count, > + (int)(VAL - sync_idx) >= 0 || > + !ktime_before(ktime_get(), timeout)); > + > + return (int)(val - sync_idx) < 0 ? -ETIMEDOUT : 0; There are some theoretical overflow issues here which I don't think will ever occur in practice, but deserve at least a comment to explain why. > +} > + > static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu) > { > u64 cmd[CMDQ_ENT_DWORDS]; > unsigned long flags; > bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); > + bool msi = (smmu->features & ARM_SMMU_FEAT_MSI) && > + (smmu->features & ARM_SMMU_FEAT_COHERENCY); I don't think this is sufficient for the case where we fail to setup MSIs and fall back on legacy IRQs. > struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC }; > int ret; > > + if (msi) { > + ent.sync.msidata = atomic_inc_return(&smmu->sync_nr); I don't think you need barrier semantics here. > + ent.sync.msiaddr = virt_to_phys(&smmu->sync_count); > + } > arm_smmu_cmdq_build_cmd(cmd, &ent); > > spin_lock_irqsave(&smmu->cmdq.lock, flags); > arm_smmu_cmdq_insert_cmd(smmu, cmd); > - ret = queue_poll_cons(&smmu->cmdq.q, true, wfe); > + if (!msi) > + ret = queue_poll_cons(&smmu->cmdq.q, true, wfe); > spin_unlock_irqrestore(&smmu->cmdq.lock, flags); > > + if (msi) > + ret = arm_smmu_sync_poll_msi(smmu, ent.sync.msidata); This looks like the queue polling should be wrapped up in a function that returns with the spinlock released. Will ^ permalink raw reply [flat|nested] 13+ messages in thread
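On the separate-cacheline suggestion above, a minimal standalone sketch of the idea (an illustrative assumption only, using alignas(64) as a stand-in for the kernel's cacheline annotations, and 64 bytes rather than the ERG):

#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Keep the word the SMMU writes (sync_count) on a different cacheline
 * from the ticket the CPUs bump (sync_nr), so that incrementing tickets
 * does not disturb waiters spinning or waiting on completions. */
struct sync_state {
	alignas(64) atomic_uint sync_nr;	/* CPU-side ticket allocator */
	alignas(64) uint32_t sync_count;	/* target of the CMD_SYNC MSI write */
};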
[parent not found: <20171013183237.GA30572-5wv7dgnIgG8@public.gmane.org>]
* Re: [PATCH v2 3/4] iommu/arm-smmu-v3: Use CMD_SYNC completion MSI [not found] ` <20171013183237.GA30572-5wv7dgnIgG8@public.gmane.org> @ 2017-10-16 12:25 ` Robin Murphy 0 siblings, 0 replies; 13+ messages in thread From: Robin Murphy @ 2017-10-16 12:25 UTC (permalink / raw) To: Will Deacon Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r On 13/10/17 19:32, Will Deacon wrote: > Hi Robin, > > This mostly looks good. Just a few comments below. > > On Thu, Aug 31, 2017 at 02:44:27PM +0100, Robin Murphy wrote: >> As an IRQ, the CMD_SYNC interrupt is not particularly useful, not least >> because we often need to wait for sync completion within someone else's >> IRQ handler anyway. However, when the SMMU is both coherent and supports >> MSIs, we can have a lot more fun by not using it as an interrupt at all. >> Following the example suggested in the architecture and using a write >> targeting normal memory, we can let callers wait on a status variable >> outside the lock instead of having to stall the entire queue or even >> touch MMIO registers. Since multiple sync commands are guaranteed to >> complete in order, a simple incrementing sequence count is all we need >> to unambiguously support any realistic number of overlapping waiters. >> >> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> >> --- >> >> v2: Remove redundant 'bool msi' command member, other cosmetic tweaks >> >> drivers/iommu/arm-smmu-v3.c | 47 +++++++++++++++++++++++++++++++++++++++++++-- >> 1 file changed, 45 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c >> index f066725298cd..311f482b93d5 100644 >> --- a/drivers/iommu/arm-smmu-v3.c >> +++ b/drivers/iommu/arm-smmu-v3.c >> @@ -377,7 +377,16 @@ >> >> #define CMDQ_SYNC_0_CS_SHIFT 12 >> #define CMDQ_SYNC_0_CS_NONE (0UL << CMDQ_SYNC_0_CS_SHIFT) >> +#define CMDQ_SYNC_0_CS_IRQ (1UL << CMDQ_SYNC_0_CS_SHIFT) >> #define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT) >> +#define CMDQ_SYNC_0_MSH_SHIFT 22 >> +#define CMDQ_SYNC_0_MSH_ISH (3UL << CMDQ_SYNC_0_MSH_SHIFT) >> +#define CMDQ_SYNC_0_MSIATTR_SHIFT 24 >> +#define CMDQ_SYNC_0_MSIATTR_OIWB (0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT) >> +#define CMDQ_SYNC_0_MSIDATA_SHIFT 32 >> +#define CMDQ_SYNC_0_MSIDATA_MASK 0xffffffffUL >> +#define CMDQ_SYNC_1_MSIADDR_SHIFT 0 >> +#define CMDQ_SYNC_1_MSIADDR_MASK 0xffffffffffffcUL >> >> /* Event queue */ >> #define EVTQ_ENT_DWORDS 4 >> @@ -409,6 +418,7 @@ >> /* High-level queue structures */ >> #define ARM_SMMU_POLL_TIMEOUT_US 100 >> #define ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US 1000000 /* 1s! */ >> +#define ARM_SMMU_SYNC_TIMEOUT_US 1000000 /* 1s! */ > > We only ever do this when waiting for the queue to drain, so may as well > just reuse the drain timeout. As you've discovered, we remove the "drain" case entirely in the end. >> #define MSI_IOVA_BASE 0x8000000 >> #define MSI_IOVA_LENGTH 0x100000 >> @@ -504,6 +514,10 @@ struct arm_smmu_cmdq_ent { >> } pri; >> >> #define CMDQ_OP_CMD_SYNC 0x46 >> + struct { >> + u32 msidata; >> + u64 msiaddr; >> + } sync; >> }; >> }; >> >> @@ -617,6 +631,9 @@ struct arm_smmu_device { >> int gerr_irq; >> int combined_irq; >> >> + atomic_t sync_nr; >> + u32 sync_count; > > It's probably worth sticking these in separate cachelines so we don't > get spurious wakeups when sync_nr is incremented. (yes, I know it should > be the ERG, but that can be unreasonably huge!). 
Good point - we've got 8K of bitmaps embedded in the structure anyway, so even maximum ERG separation is easily possible. >> + >> unsigned long ias; /* IPA */ >> unsigned long oas; /* PA */ >> unsigned long pgsize_bitmap; >> @@ -878,7 +895,13 @@ static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) >> } >> break; >> case CMDQ_OP_CMD_SYNC: >> - cmd[0] |= CMDQ_SYNC_0_CS_SEV; >> + if (ent->sync.msiaddr) >> + cmd[0] |= CMDQ_SYNC_0_CS_IRQ; >> + else >> + cmd[0] |= CMDQ_SYNC_0_CS_SEV; >> + cmd[0] |= CMDQ_SYNC_0_MSH_ISH | CMDQ_SYNC_0_MSIATTR_OIWB; >> + cmd[0] |= (u64)ent->sync.msidata << CMDQ_SYNC_0_MSIDATA_SHIFT; >> + cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK; >> break; >> default: >> return -ENOENT; >> @@ -964,21 +987,40 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu, >> spin_unlock_irqrestore(&smmu->cmdq.lock, flags); >> } >> >> +static int arm_smmu_sync_poll_msi(struct arm_smmu_device *smmu, u32 sync_idx) >> +{ >> + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_SYNC_TIMEOUT_US); >> + u32 val = smp_cond_load_acquire(&smmu->sync_count, >> + (int)(VAL - sync_idx) >= 0 || >> + !ktime_before(ktime_get(), timeout)); >> + >> + return (int)(val - sync_idx) < 0 ? -ETIMEDOUT : 0; > > There are some theoretical overflow issues here which I don't think will > ever occur in practice, but deserve at least a comment to explain why. Even if we did have 2^31 or more CPUs, the size of a queue is bounded at 2^20, so we can never have enough in-flight syncs to get near to an overflow problem. I can certainly document that if you like. >> +} >> + >> static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu) >> { >> u64 cmd[CMDQ_ENT_DWORDS]; >> unsigned long flags; >> bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); >> + bool msi = (smmu->features & ARM_SMMU_FEAT_MSI) && >> + (smmu->features & ARM_SMMU_FEAT_COHERENCY); > > I don't think this is sufficient for the case where we fail to setup MSIs > and fall back on legacy IRQs. Remember this 'MSI' is going nowhere near the GIC, so the IRQ configuration is irrelevant (especially after patch #2) - the feature bits tell us "is the SMMU capable of generating sync-completion writes?" and "are those writes coherent?", which is precisely what matters here. >> struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC }; >> int ret; >> >> + if (msi) { >> + ent.sync.msidata = atomic_inc_return(&smmu->sync_nr); > > I don't think you need barrier semantics here. You mean atomic_inc_return_relaxed() would be sufficient? TBH I don't think I'd given this any thought - I guess the coherency protocols make it impossible to do an atomic op on stale data, so that seems reasonable. >> + ent.sync.msiaddr = virt_to_phys(&smmu->sync_count); >> + } >> arm_smmu_cmdq_build_cmd(cmd, &ent); >> >> spin_lock_irqsave(&smmu->cmdq.lock, flags); >> arm_smmu_cmdq_insert_cmd(smmu, cmd); >> - ret = queue_poll_cons(&smmu->cmdq.q, true, wfe); >> + if (!msi) >> + ret = queue_poll_cons(&smmu->cmdq.q, true, wfe); >> spin_unlock_irqrestore(&smmu->cmdq.lock, flags); >> >> + if (msi) >> + ret = arm_smmu_sync_poll_msi(smmu, ent.sync.msidata); > > This looks like the queue polling should be wrapped up in a function that > returns with the spinlock released. I did ponder that, but I can't help finding such asymmetric interfaces pretty grim, and things do get better again once both cases can wait outside the lock. Robin. ^ permalink raw reply [flat|nested] 13+ messages in thread
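A quick standalone check of the bound argued above (with purely illustrative numbers): even with the ticket counter having just wrapped, at most 2^20 commands in flight keeps the signed 32-bit difference between the newest ticket and the oldest outstanding waiter nowhere near overflow.

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint32_t max_inflight = 1u << 20;		/* CMDQ size upper bound */
	uint32_t newest = 5;				/* ticket counter has just wrapped */
	uint32_t oldest = newest - max_inflight;	/* oldest possible outstanding waiter */

	/* The difference is exactly the in-flight count: far below 2^31 */
	assert((int32_t)(newest - oldest) == (int32_t)max_inflight);
	assert(max_inflight < (uint32_t)INT32_MAX);
	return 0;
}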
* [PATCH v2 4/4] iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock [not found] ` <cover.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org> ` (2 preceding siblings ...) 2017-08-31 13:44 ` [PATCH v2 3/4] iommu/arm-smmu-v3: Use CMD_SYNC completion MSI Robin Murphy @ 2017-08-31 13:44 ` Robin Murphy [not found] ` <ff239173e47dfa0fc76eaa2a25b3cbcfe8dce5e6.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org> 2017-08-31 13:44 ` [RFT] iommu/arm-smmu-v3: Use burst-polling for sync completion Robin Murphy 2017-10-13 19:05 ` [PATCH v2 0/4] SMMUv3 CMD_SYNC optimisation Will Deacon 5 siblings, 1 reply; 13+ messages in thread From: Robin Murphy @ 2017-08-31 13:44 UTC (permalink / raw) To: will.deacon-5wv7dgnIgG8 Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r Even without the MSI trick, we can still do a lot better than hogging the entire queue while it drains. All we actually need to do for the necessary guarantee of completion is wait for our particular command to have been consumed, and as long as we keep track of where it is there is no need to block other CPUs from adding further commands in the meantime. There is one theoretical (but incredibly unlikely) edge case to avoid, where cons has wrapped twice to still appear 'behind' the sync position - this is easily disambiguated by adding a generation count to the queue to indicate when prod wraps, since cons cannot wrap twice without prod having wrapped at least once. Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> --- v2: New drivers/iommu/arm-smmu-v3.c | 81 ++++++++++++++++++++++++++++++++------------- 1 file changed, 58 insertions(+), 23 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 311f482b93d5..f5c5da553803 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -417,7 +417,6 @@ /* High-level queue structures */ #define ARM_SMMU_POLL_TIMEOUT_US 100 -#define ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US 1000000 /* 1s! */ #define ARM_SMMU_SYNC_TIMEOUT_US 1000000 /* 1s! */ #define MSI_IOVA_BASE 0x8000000 @@ -540,6 +539,7 @@ struct arm_smmu_queue { struct arm_smmu_cmdq { struct arm_smmu_queue q; spinlock_t lock; + int generation; }; struct arm_smmu_evtq { @@ -770,21 +770,12 @@ static void queue_inc_prod(struct arm_smmu_queue *q) writel(q->prod, q->prod_reg); } -/* - * Wait for the SMMU to consume items. If drain is true, wait until the queue - * is empty. Otherwise, wait until there is at least one free slot. - */ -static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe) +/* Wait for the SMMU to consume items, until there is at least one free slot */ +static int queue_poll_cons(struct arm_smmu_queue *q, bool wfe) { - ktime_t timeout; - unsigned int delay = 1; + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US); - /* Wait longer if it's queue drain */ - timeout = ktime_add_us(ktime_get(), drain ? - ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US : - ARM_SMMU_POLL_TIMEOUT_US); - - while (queue_sync_cons(q), (drain ? 
!queue_empty(q) : queue_full(q))) { + while (queue_sync_cons(q), queue_full(q)) { if (ktime_compare(ktime_get(), timeout) > 0) return -ETIMEDOUT; @@ -792,8 +783,7 @@ static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe) wfe(); } else { cpu_relax(); - udelay(delay); - delay *= 2; + udelay(1); } } @@ -959,15 +949,20 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu) queue_write(Q_ENT(q, cons), cmd, q->ent_dwords); } -static void arm_smmu_cmdq_insert_cmd(struct arm_smmu_device *smmu, u64 *cmd) +static u32 arm_smmu_cmdq_insert_cmd(struct arm_smmu_device *smmu, u64 *cmd) { struct arm_smmu_queue *q = &smmu->cmdq.q; bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); while (queue_insert_raw(q, cmd) == -ENOSPC) { - if (queue_poll_cons(q, false, wfe)) + if (queue_poll_cons(q, wfe)) dev_err_ratelimited(smmu->dev, "CMDQ timeout\n"); } + + if (Q_IDX(q, q->prod) == 0) + smmu->cmdq.generation++; + + return q->prod; } static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu, @@ -997,15 +992,54 @@ static int arm_smmu_sync_poll_msi(struct arm_smmu_device *smmu, u32 sync_idx) return (int)(val - sync_idx) < 0 ? -ETIMEDOUT : 0; } +static int arm_smmu_sync_poll_cons(struct arm_smmu_device *smmu, u32 sync_idx, + int sync_gen) +{ + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_SYNC_TIMEOUT_US); + struct arm_smmu_queue *q = &smmu->cmdq.q; + bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); + unsigned int delay = 1; + + do { + queue_sync_cons(q); + /* + * If we see updates quickly enough, cons has passed sync_idx, + * but not yet wrapped. At worst, cons might have actually + * wrapped an even number of times, but that still guarantees + * the original sync must have been consumed. + */ + if (Q_IDX(q, q->cons) >= Q_IDX(q, sync_idx) && + Q_WRP(q, sync_idx) == Q_WRP(q, q->cons)) + return 0; + /* + * Otherwise, cons may have passed sync_idx and wrapped one or + * more times to appear behind it again, but in that case prod + * must also be one or more generations ahead. + */ + if (Q_IDX(q, q->cons) < Q_IDX(q, sync_idx) && + READ_ONCE(smmu->cmdq.generation) != sync_gen) + return 0; + + if (wfe) { + wfe(); + } else { + cpu_relax(); + udelay(delay); + delay *= 2; + } + } while (ktime_before(ktime_get(), timeout)); + + return -ETIMEDOUT; +} + static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu) { u64 cmd[CMDQ_ENT_DWORDS]; unsigned long flags; - bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); bool msi = (smmu->features & ARM_SMMU_FEAT_MSI) && (smmu->features & ARM_SMMU_FEAT_COHERENCY); struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC }; - int ret; + int ret, sync_idx, sync_gen; if (msi) { ent.sync.msidata = atomic_inc_return(&smmu->sync_nr); @@ -1014,13 +1048,14 @@ static void arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu) arm_smmu_cmdq_build_cmd(cmd, &ent); spin_lock_irqsave(&smmu->cmdq.lock, flags); - arm_smmu_cmdq_insert_cmd(smmu, cmd); - if (!msi) - ret = queue_poll_cons(&smmu->cmdq.q, true, wfe); + sync_idx = arm_smmu_cmdq_insert_cmd(smmu, cmd); + sync_gen = READ_ONCE(smmu->cmdq.generation); spin_unlock_irqrestore(&smmu->cmdq.lock, flags); if (msi) ret = arm_smmu_sync_poll_msi(smmu, ent.sync.msidata); + else + ret = arm_smmu_sync_poll_cons(smmu, sync_idx, sync_gen); if (ret) dev_err_ratelimited(smmu->dev, "CMD_SYNC timeout\n"); } -- 2.13.4.dirty ^ permalink raw reply related [flat|nested] 13+ messages in thread
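To make the wrap disambiguation above concrete, here is a standalone model of the two checks (illustrative only, not the driver code) for a toy eight-entry queue, where prod/cons values carry an index plus a wrap bit as in the driver:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SHIFT	3			/* toy queue: 8 entries */
#define IDX(p)	((p) & ((1u << SHIFT) - 1))
#define WRP(p)	((p) & (1u << SHIFT))

/* Has the command inserted at sync_idx definitely been consumed? */
static bool sync_consumed(uint32_t cons, uint32_t sync_idx,
			  int gen_now, int gen_at_issue)
{
	/* cons is at or past sync_idx without having wrapped past it */
	if (IDX(cons) >= IDX(sync_idx) && WRP(cons) == WRP(sync_idx))
		return true;
	/*
	 * cons appears behind sync_idx: that can only happen once cons has
	 * wrapped past it, in which case prod (and hence the generation
	 * count bumped when prod wraps) must already have moved on.
	 */
	if (IDX(cons) < IDX(sync_idx) && gen_now != gen_at_issue)
		return true;
	return false;
}

int main(void)
{
	/* sync issued at index 6, wrap bit 0, generation 0 */
	printf("%d\n", sync_consumed(7, 6, 0, 0));	/* 1: cons passed it */
	printf("%d\n", sync_consumed(2, 6, 0, 0));	/* 0: not yet */
	printf("%d\n", sync_consumed(2, 6, 1, 0));	/* 1: cons wrapped past it */
	return 0;
}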
[parent not found: <ff239173e47dfa0fc76eaa2a25b3cbcfe8dce5e6.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>]
* Re: [PATCH v2 4/4] iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock [not found] ` <ff239173e47dfa0fc76eaa2a25b3cbcfe8dce5e6.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org> @ 2017-10-13 18:59 ` Will Deacon [not found] ` <20171013185917.GB30572-5wv7dgnIgG8@public.gmane.org> 0 siblings, 1 reply; 13+ messages in thread From: Will Deacon @ 2017-10-13 18:59 UTC (permalink / raw) To: Robin Murphy Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r Hi Robin, Some of my comments on patch 3 are addressed here, but I'm really struggling to convince myself that this algorithm is correct. My preference would be to leave the code as it is for SMMUs that don't implement MSIs, but comments below anyway because it's an interesting idea. On Thu, Aug 31, 2017 at 02:44:28PM +0100, Robin Murphy wrote: > Even without the MSI trick, we can still do a lot better than hogging > the entire queue while it drains. All we actually need to do for the > necessary guarantee of completion is wait for our particular command to > have been consumed, and as long as we keep track of where it is there is > no need to block other CPUs from adding further commands in the > meantime. There is one theoretical (but incredibly unlikely) edge case > to avoid, where cons has wrapped twice to still appear 'behind' the sync > position - this is easily disambiguated by adding a generation count to > the queue to indicate when prod wraps, since cons cannot wrap twice > without prod having wrapped at least once. > > Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> > --- > > v2: New > > drivers/iommu/arm-smmu-v3.c | 81 ++++++++++++++++++++++++++++++++------------- > 1 file changed, 58 insertions(+), 23 deletions(-) > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c > index 311f482b93d5..f5c5da553803 100644 > --- a/drivers/iommu/arm-smmu-v3.c > +++ b/drivers/iommu/arm-smmu-v3.c > @@ -417,7 +417,6 @@ > > /* High-level queue structures */ > #define ARM_SMMU_POLL_TIMEOUT_US 100 > -#define ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US 1000000 /* 1s! */ > #define ARM_SMMU_SYNC_TIMEOUT_US 1000000 /* 1s! */ > > #define MSI_IOVA_BASE 0x8000000 > @@ -540,6 +539,7 @@ struct arm_smmu_queue { > struct arm_smmu_cmdq { > struct arm_smmu_queue q; > spinlock_t lock; > + int generation; > }; > > struct arm_smmu_evtq { > @@ -770,21 +770,12 @@ static void queue_inc_prod(struct arm_smmu_queue *q) > writel(q->prod, q->prod_reg); > } > > -/* > - * Wait for the SMMU to consume items. If drain is true, wait until the queue > - * is empty. Otherwise, wait until there is at least one free slot. > - */ > -static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe) > +/* Wait for the SMMU to consume items, until there is at least one free slot */ > +static int queue_poll_cons(struct arm_smmu_queue *q, bool wfe) > { > - ktime_t timeout; > - unsigned int delay = 1; > + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US); > > - /* Wait longer if it's queue drain */ > - timeout = ktime_add_us(ktime_get(), drain ? > - ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US : > - ARM_SMMU_POLL_TIMEOUT_US); > - > - while (queue_sync_cons(q), (drain ? 
!queue_empty(q) : queue_full(q))) { > + while (queue_sync_cons(q), queue_full(q)) { > if (ktime_compare(ktime_get(), timeout) > 0) > return -ETIMEDOUT; > > @@ -792,8 +783,7 @@ static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe) > wfe(); > } else { > cpu_relax(); > - udelay(delay); > - delay *= 2; > + udelay(1); > } > } > > @@ -959,15 +949,20 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu) > queue_write(Q_ENT(q, cons), cmd, q->ent_dwords); > } > > -static void arm_smmu_cmdq_insert_cmd(struct arm_smmu_device *smmu, u64 *cmd) > +static u32 arm_smmu_cmdq_insert_cmd(struct arm_smmu_device *smmu, u64 *cmd) > { > struct arm_smmu_queue *q = &smmu->cmdq.q; > bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); > > while (queue_insert_raw(q, cmd) == -ENOSPC) { > - if (queue_poll_cons(q, false, wfe)) > + if (queue_poll_cons(q, wfe)) > dev_err_ratelimited(smmu->dev, "CMDQ timeout\n"); > } > + > + if (Q_IDX(q, q->prod) == 0) > + smmu->cmdq.generation++; The readers of generation are using READ_ONCE, so you're missing something here. > + > + return q->prod; > } > > static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu, > @@ -997,15 +992,54 @@ static int arm_smmu_sync_poll_msi(struct arm_smmu_device *smmu, u32 sync_idx) > return (int)(val - sync_idx) < 0 ? -ETIMEDOUT : 0; > } > > +static int arm_smmu_sync_poll_cons(struct arm_smmu_device *smmu, u32 sync_idx, > + int sync_gen) > +{ > + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_SYNC_TIMEOUT_US); > + struct arm_smmu_queue *q = &smmu->cmdq.q; > + bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); > + unsigned int delay = 1; > + > + do { > + queue_sync_cons(q); > + /* > + * If we see updates quickly enough, cons has passed sync_idx, > + * but not yet wrapped. At worst, cons might have actually > + * wrapped an even number of times, but that still guarantees > + * the original sync must have been consumed. > + */ > + if (Q_IDX(q, q->cons) >= Q_IDX(q, sync_idx) && > + Q_WRP(q, sync_idx) == Q_WRP(q, q->cons)) Can you move this into a separate function please, like we have for queue_full and queue_empty? > + return 0; > + /* > + * Otherwise, cons may have passed sync_idx and wrapped one or > + * more times to appear behind it again, but in that case prod > + * must also be one or more generations ahead. > + */ > + if (Q_IDX(q, q->cons) < Q_IDX(q, sync_idx) && > + READ_ONCE(smmu->cmdq.generation) != sync_gen) There's another daft overflow case here which deserves a comment (and even if it *did* happen, we'll recover gracefully). > + return 0; > + > + if (wfe) { > + wfe(); This is a bit scary... if we miss a generation update, just due to store propagation latency (maybe it's buffered by the updater), I think we can end up going into wfe when there's not an event pending. Using xchg everywhere might work, but there's still a race on having somebody update generation. The ordering here just looks generally suspicious to me because you have the generation writer in arm_smmu_cmdq_insert_cmd effectively doing: Write prod Write generation and the reader in arm_smmu_sync_poll_cons doing: Read cons Read generation so I can't see anything that gives you order to guarantee that the generation is seen to be up-to-date. But it's also Friday evening and my brain is pretty fried ;) Will ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20171013185917.GB30572-5wv7dgnIgG8@public.gmane.org>]
* Re: [PATCH v2 4/4] iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock [not found] ` <20171013185917.GB30572-5wv7dgnIgG8@public.gmane.org> @ 2017-10-16 13:12 ` Robin Murphy 0 siblings, 0 replies; 13+ messages in thread From: Robin Murphy @ 2017-10-16 13:12 UTC (permalink / raw) To: Will Deacon Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r On 13/10/17 19:59, Will Deacon wrote: > Hi Robin, > > Some of my comments on patch 3 are addressed here, but I'm really struggling > to convince myself that this algorithm is correct. My preference would > be to leave the code as it is for SMMUs that don't implement MSIs, but > comments below anyway because it's an interesting idea. >From scrounging up boot logs I can see that neither the Cavium nor HiSilicon SMMUv3 implementations have MSI support (the one in D05 doesn't even have WFE), so there is a real motivation to improve scalability on current systems - it's not *just* a cool feature! > On Thu, Aug 31, 2017 at 02:44:28PM +0100, Robin Murphy wrote: >> Even without the MSI trick, we can still do a lot better than hogging >> the entire queue while it drains. All we actually need to do for the >> necessary guarantee of completion is wait for our particular command to >> have been consumed, and as long as we keep track of where it is there is >> no need to block other CPUs from adding further commands in the >> meantime. There is one theoretical (but incredibly unlikely) edge case >> to avoid, where cons has wrapped twice to still appear 'behind' the sync >> position - this is easily disambiguated by adding a generation count to >> the queue to indicate when prod wraps, since cons cannot wrap twice >> without prod having wrapped at least once. >> >> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> >> --- >> >> v2: New >> >> drivers/iommu/arm-smmu-v3.c | 81 ++++++++++++++++++++++++++++++++------------- >> 1 file changed, 58 insertions(+), 23 deletions(-) >> >> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c >> index 311f482b93d5..f5c5da553803 100644 >> --- a/drivers/iommu/arm-smmu-v3.c >> +++ b/drivers/iommu/arm-smmu-v3.c >> @@ -417,7 +417,6 @@ >> >> /* High-level queue structures */ >> #define ARM_SMMU_POLL_TIMEOUT_US 100 >> -#define ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US 1000000 /* 1s! */ >> #define ARM_SMMU_SYNC_TIMEOUT_US 1000000 /* 1s! */ >> >> #define MSI_IOVA_BASE 0x8000000 >> @@ -540,6 +539,7 @@ struct arm_smmu_queue { >> struct arm_smmu_cmdq { >> struct arm_smmu_queue q; >> spinlock_t lock; >> + int generation; >> }; >> >> struct arm_smmu_evtq { >> @@ -770,21 +770,12 @@ static void queue_inc_prod(struct arm_smmu_queue *q) >> writel(q->prod, q->prod_reg); >> } >> >> -/* >> - * Wait for the SMMU to consume items. If drain is true, wait until the queue >> - * is empty. Otherwise, wait until there is at least one free slot. >> - */ >> -static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe) >> +/* Wait for the SMMU to consume items, until there is at least one free slot */ >> +static int queue_poll_cons(struct arm_smmu_queue *q, bool wfe) >> { >> - ktime_t timeout; >> - unsigned int delay = 1; >> + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US); >> >> - /* Wait longer if it's queue drain */ >> - timeout = ktime_add_us(ktime_get(), drain ? >> - ARM_SMMU_CMDQ_DRAIN_TIMEOUT_US : >> - ARM_SMMU_POLL_TIMEOUT_US); >> - >> - while (queue_sync_cons(q), (drain ? 
!queue_empty(q) : queue_full(q))) { >> + while (queue_sync_cons(q), queue_full(q)) { >> if (ktime_compare(ktime_get(), timeout) > 0) >> return -ETIMEDOUT; >> >> @@ -792,8 +783,7 @@ static int queue_poll_cons(struct arm_smmu_queue *q, bool drain, bool wfe) >> wfe(); >> } else { >> cpu_relax(); >> - udelay(delay); >> - delay *= 2; >> + udelay(1); >> } >> } >> >> @@ -959,15 +949,20 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu) >> queue_write(Q_ENT(q, cons), cmd, q->ent_dwords); >> } >> >> -static void arm_smmu_cmdq_insert_cmd(struct arm_smmu_device *smmu, u64 *cmd) >> +static u32 arm_smmu_cmdq_insert_cmd(struct arm_smmu_device *smmu, u64 *cmd) >> { >> struct arm_smmu_queue *q = &smmu->cmdq.q; >> bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); >> >> while (queue_insert_raw(q, cmd) == -ENOSPC) { >> - if (queue_poll_cons(q, false, wfe)) >> + if (queue_poll_cons(q, wfe)) >> dev_err_ratelimited(smmu->dev, "CMDQ timeout\n"); >> } >> + >> + if (Q_IDX(q, q->prod) == 0) >> + smmu->cmdq.generation++; > > The readers of generation are using READ_ONCE, so you're missing something > here. Yeah, I was a bit back-and-forth on this. The readers want a READ_ONCE if only to prevent it being hoisted out of the polling loop, but as long as the update of generation is single-copy-atomic, the exact point at which it occurs shouldn't matter so much, since it's only written under the cmdq lock. I guess it depends how much we trust GCC's claim of the atomicity of int - I have no great objection to smmu->cmdq.generation = WRITE_ONCE(smmu->cmdq.generation + 1); other than it being really long. >> + >> + return q->prod; >> } >> >> static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu, >> @@ -997,15 +992,54 @@ static int arm_smmu_sync_poll_msi(struct arm_smmu_device *smmu, u32 sync_idx) >> return (int)(val - sync_idx) < 0 ? -ETIMEDOUT : 0; >> } >> >> +static int arm_smmu_sync_poll_cons(struct arm_smmu_device *smmu, u32 sync_idx, >> + int sync_gen) >> +{ >> + ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_SYNC_TIMEOUT_US); >> + struct arm_smmu_queue *q = &smmu->cmdq.q; >> + bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); >> + unsigned int delay = 1; >> + >> + do { >> + queue_sync_cons(q); >> + /* >> + * If we see updates quickly enough, cons has passed sync_idx, >> + * but not yet wrapped. At worst, cons might have actually >> + * wrapped an even number of times, but that still guarantees >> + * the original sync must have been consumed. >> + */ >> + if (Q_IDX(q, q->cons) >= Q_IDX(q, sync_idx) && >> + Q_WRP(q, sync_idx) == Q_WRP(q, q->cons)) > > Can you move this into a separate function please, like we have for > queue_full and queue_empty? OK, but given that it's only half of the "has cons moved past this index" operation, I'm not really sure what it could be called - queue_ahead_not_wrapped() comes to mind, but still seems a bit cryptic. >> + return 0; >> + /* >> + * Otherwise, cons may have passed sync_idx and wrapped one or >> + * more times to appear behind it again, but in that case prod >> + * must also be one or more generations ahead. >> + */ >> + if (Q_IDX(q, q->cons) < Q_IDX(q, sync_idx) && >> + READ_ONCE(smmu->cmdq.generation) != sync_gen) > > There's another daft overflow case here which deserves a comment (and even > if it *did* happen, we'll recover gracefully). You mean exactly 2^32 queue generations passing between polls? Yeah, not happening :P >> + return 0; >> + >> + if (wfe) { >> + wfe(); > > This is a bit scary... 
if we miss a generation update, just due to store > propagation latency (maybe it's buffered by the updater), I think we can > end up going into wfe when there's not an event pending. Using xchg > everywhere might work, but there's still a race on having somebody update > generation. The ordering here just looks generally suspicious to me because > you have the generation writer in arm_smmu_cmdq_insert_cmd effectively > doing: > > Write prod > Write generation > > and the reader in arm_smmu_sync_poll_cons doing: > > Read cons > Read generation > > so I can't see anything that gives you order to guarantee that the > generation is seen to be up-to-date. On reflection I'm pretty sure the below should suffice, to piggy-back off the DSB implicit in queue_insert_raw(). The readers only care about the generation *after* cons has been observed to go backwards, so the exact order of the generation and prod updates makes no practical difference other than closing that race window. Robin ----->8----- diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 12cdc5e50675..78ba8269c44c 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -959,14 +959,14 @@ static u32 arm_smmu_cmdq_insert_cmd(struct arm_smmu_device *smmu, u64 *cmd) struct arm_smmu_queue *q = &smmu->cmdq.q; bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); + if (Q_IDX(q, q->prod + 1) == 0) + smmu->cmdq.generation++; + while (queue_insert_raw(q, cmd) == -ENOSPC) { if (queue_poll_cons(q, wfe)) dev_err_ratelimited(smmu->dev, "CMDQ timeout\n"); } - if (Q_IDX(q, q->prod) == 0) - smmu->cmdq.generation++; - return q->prod; } ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFT] iommu/arm-smmu-v3: Use burst-polling for sync completion [not found] ` <cover.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org> ` (3 preceding siblings ...) 2017-08-31 13:44 ` [PATCH v2 4/4] iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock Robin Murphy @ 2017-08-31 13:44 ` Robin Murphy 2017-10-13 19:05 ` [PATCH v2 0/4] SMMUv3 CMD_SYNC optimisation Will Deacon 5 siblings, 0 replies; 13+ messages in thread From: Robin Murphy @ 2017-08-31 13:44 UTC (permalink / raw) To: will.deacon-5wv7dgnIgG8 Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r While CMD_SYNC is unlikely to complete immediately such that we never go round the polling loop, with a lightly-loaded queue it may still do so long before the delay period is up. If we have no better completion notifier, use similar logic as we have for SMMUv2 to spin a number of times before each backoff, so that we have more chance of catching syncs which complete relatively quickly and avoid delaying unnecessarily. Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> --- This is mostly here for theoretical completeness - unless it proves to actually give a measurable benefit (I have no idea), I'd be inclined not to consider it for merging. drivers/iommu/arm-smmu-v3.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index f5c5da553803..b92cd65f43f8 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -418,6 +418,7 @@ /* High-level queue structures */ #define ARM_SMMU_POLL_TIMEOUT_US 100 #define ARM_SMMU_SYNC_TIMEOUT_US 1000000 /* 1s! */ +#define ARM_SMMU_SYNC_SPIN_COUNT 10 #define MSI_IOVA_BASE 0x8000000 #define MSI_IOVA_LENGTH 0x100000 @@ -998,7 +999,7 @@ static int arm_smmu_sync_poll_cons(struct arm_smmu_device *smmu, u32 sync_idx, ktime_t timeout = ktime_add_us(ktime_get(), ARM_SMMU_SYNC_TIMEOUT_US); struct arm_smmu_queue *q = &smmu->cmdq.q; bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); - unsigned int delay = 1; + unsigned int delay = 1, spin_cnt = 0; do { queue_sync_cons(q); @@ -1022,10 +1023,13 @@ static int arm_smmu_sync_poll_cons(struct arm_smmu_device *smmu, u32 sync_idx, if (wfe) { wfe(); - } else { + } else if (++spin_cnt < ARM_SMMU_SYNC_SPIN_COUNT) { cpu_relax(); + continue; + } else { udelay(delay); delay *= 2; + spin_cnt = 0; } } while (ktime_before(ktime_get(), timeout)); -- 2.13.4.dirty ^ permalink raw reply related [flat|nested] 13+ messages in thread
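The shape of that loop as a standalone sketch, with assumed stand-ins (usleep() for udelay(), a caller-supplied done() predicate, timeout handling omitted) rather than the driver's own helpers:

#include <stdbool.h>
#include <unistd.h>

#define SYNC_SPIN_COUNT	10

/* Illustrative only: spin cheaply a few times before starting the
 * exponential backoff, so quickly-completing syncs avoid any sleep. */
static void poll_with_burst(bool (*done)(void))
{
	unsigned int delay = 1, spin_cnt = 0;

	while (!done()) {
		if (++spin_cnt < SYNC_SPIN_COUNT) {
			/* cpu_relax() equivalent would go here */
			continue;
		}
		usleep(delay);
		delay *= 2;
		spin_cnt = 0;
	}
}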
* Re: [PATCH v2 0/4] SMMUv3 CMD_SYNC optimisation [not found] ` <cover.1504182142.git.robin.murphy-5wv7dgnIgG8@public.gmane.org> ` (4 preceding siblings ...) 2017-08-31 13:44 ` [RFT] iommu/arm-smmu-v3: Use burst-polling for sync completion Robin Murphy @ 2017-10-13 19:05 ` Will Deacon [not found] ` <20171013190521.GD30572-5wv7dgnIgG8@public.gmane.org> 5 siblings, 1 reply; 13+ messages in thread From: Will Deacon @ 2017-10-13 19:05 UTC (permalink / raw) To: Robin Murphy Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r Hi Robin, On Thu, Aug 31, 2017 at 02:44:24PM +0100, Robin Murphy wrote: > Since Nate reported a reasonable performance boost from the out-of-line > MSI polling in v1 [1], I've now implemented the equivalent for cons > polling as well - that has been boot-tested on D05 with some trivial I/O > and at least doesn't seem to lock up or explode. There's also a little > cosmetic tweaking to make the patches a bit cleaner as a series. > > Robin. > > [1] https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg19657.html > > Robin Murphy (5): > iommu/arm-smmu-v3: Specialise CMD_SYNC handling > iommu/arm-smmu-v3: Forget about cmdq-sync interrupt > iommu/arm-smmu-v3: Use CMD_SYNC completion interrupt > iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock > iommu/arm-smmu-v3: Use burst-polling for sync completion What's this final mythical patch about? I don't see it in the series. Anyway, the first two patches look fine to me, but this doesn't apply on top of my iommu/devel branch so they will need rebasing. Cheers, Will ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <20171013190521.GD30572-5wv7dgnIgG8@public.gmane.org>]
* Re: [PATCH v2 0/4] SMMUv3 CMD_SYNC optimisation [not found] ` <20171013190521.GD30572-5wv7dgnIgG8@public.gmane.org> @ 2017-10-16 13:18 ` Robin Murphy 2017-10-16 15:02 ` Will Deacon 1 sibling, 0 replies; 13+ messages in thread From: Robin Murphy @ 2017-10-16 13:18 UTC (permalink / raw) To: Will Deacon Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r On 13/10/17 20:05, Will Deacon wrote: > Hi Robin, > > On Thu, Aug 31, 2017 at 02:44:24PM +0100, Robin Murphy wrote: >> Since Nate reported a reasonable performance boost from the out-of-line >> MSI polling in v1 [1], I've now implemented the equivalent for cons >> polling as well - that has been boot-tested on D05 with some trivial I/O >> and at least doesn't seem to lock up or explode. There's also a little >> cosmetic tweaking to make the patches a bit cleaner as a series. >> >> Robin. >> >> [1] https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg19657.html >> >> Robin Murphy (5): >> iommu/arm-smmu-v3: Specialise CMD_SYNC handling >> iommu/arm-smmu-v3: Forget about cmdq-sync interrupt >> iommu/arm-smmu-v3: Use CMD_SYNC completion interrupt >> iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock >> iommu/arm-smmu-v3: Use burst-polling for sync completion > > What's this final mythical patch about? I don't see it in the series. It's in the place 5/5 would be, I just tagged it as [RFT] since I have zero evidence whether it's worth the bother or not. > Anyway, the first two patches look fine to me, but this doesn't apply > on top of my iommu/devel branch so they will need rebasing. Will do. Thanks for the review, Robin. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 0/4] SMMUv3 CMD_SYNC optimisation [not found] ` <20171013190521.GD30572-5wv7dgnIgG8@public.gmane.org> 2017-10-16 13:18 ` Robin Murphy @ 2017-10-16 15:02 ` Will Deacon 1 sibling, 0 replies; 13+ messages in thread From: Will Deacon @ 2017-10-16 15:02 UTC (permalink / raw) To: Robin Murphy Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, sgoutham-YGCgFSpz5w/QT0dZR+AlfA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r On Fri, Oct 13, 2017 at 08:05:21PM +0100, Will Deacon wrote: > On Thu, Aug 31, 2017 at 02:44:24PM +0100, Robin Murphy wrote: > > Since Nate reported a reasonable performance boost from the out-of-line > > MSI polling in v1 [1], I've now implemented the equivalent for cons > > polling as well - that has been boot-tested on D05 with some trivial I/O > > and at least doesn't seem to lock up or explode. There's also a little > > cosmetic tweaking to make the patches a bit cleaner as a series. > > > > Robin. > > > > [1] https://www.mail-archive.com/iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg19657.html > > > > Robin Murphy (5): > > iommu/arm-smmu-v3: Specialise CMD_SYNC handling > > iommu/arm-smmu-v3: Forget about cmdq-sync interrupt > > iommu/arm-smmu-v3: Use CMD_SYNC completion interrupt > > iommu/arm-smmu-v3: Poll for CMD_SYNC outside cmdq lock > > iommu/arm-smmu-v3: Use burst-polling for sync completion > > What's this final mythical patch about? I don't see it in the series. > > Anyway, the first two patches look fine to me, but this doesn't apply > on top of my iommu/devel branch so they will need rebasing. No idea what I was doing wrong on Friday, but the first two patches apply cleanly now, so I'll push them out in a bit. Sorry for the noise, Will ^ permalink raw reply [flat|nested] 13+ messages in thread