Re: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Will Deacon <will.deacon@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	iommu <iommu@lists.linux-foundation.org>,
	Robin Murphy <robin.murphy@arm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Zefan Li <lizefan@huawei.com>, Xinwei Hu <huxinwei@huawei.com>,
	Tianhong Ding <dingtianhong@huawei.com>,
	Hanjun Guo <guohanjun@huawei.com>,
	John Garry <john.garry@huawei.com>, <nwatters@codeaurora.org>
Subject: Re: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction
Date: Thu, 29 Jun 2017 10:08:15 +0800	[thread overview]
Message-ID: <5954610F.9020807@huawei.com> (raw)
In-Reply-To: <20170628093207.GB11053@arm.com>



On 2017/6/28 17:32, Will Deacon wrote:
> Hi Zhen Lei,
> 
> Nate (CC'd), Robin and I have been working on something very similar to
> this series, but this patch is different to what we had planned. More below.
> 
> On Mon, Jun 26, 2017 at 09:38:46PM +0800, Zhen Lei wrote:
>> Because all TLBI commands should be followed by a SYNC command, to make
>> sure that it has been completely finished. So we can just add the TLBI
>> commands into the queue, and put off the execution until meet SYNC or
>> other commands. To prevent the followed SYNC command waiting for a long
>> time because of too many commands have been delayed, restrict the max
>> delayed number.
>>
>> According to my test, I got the same performance data as I replaced writel
>> with writel_relaxed in queue_inc_prod.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 42 +++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 37 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 291da5f..4481123 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -337,6 +337,7 @@
>>  /* Command queue */
>>  #define CMDQ_ENT_DWORDS			2
>>  #define CMDQ_MAX_SZ_SHIFT		8
>> +#define CMDQ_MAX_DELAYED		32
>>  
>>  #define CMDQ_ERR_SHIFT			24
>>  #define CMDQ_ERR_MASK			0x7f
>> @@ -472,6 +473,7 @@ struct arm_smmu_cmdq_ent {
>>  			};
>>  		} cfgi;
>>  
>> +		#define CMDQ_OP_TLBI_NH_ALL	0x10
>>  		#define CMDQ_OP_TLBI_NH_ASID	0x11
>>  		#define CMDQ_OP_TLBI_NH_VA	0x12
>>  		#define CMDQ_OP_TLBI_EL2_ALL	0x20
>> @@ -499,6 +501,7 @@ struct arm_smmu_cmdq_ent {
>>  
>>  struct arm_smmu_queue {
>>  	int				irq; /* Wired interrupt */
>> +	u32				nr_delay;
>>  
>>  	__le64				*base;
>>  	dma_addr_t			base_dma;
>> @@ -722,11 +725,16 @@ static int queue_sync_prod(struct arm_smmu_queue *q)
>>  	return ret;
>>  }
>>  
>> -static void queue_inc_prod(struct arm_smmu_queue *q)
>> +static void queue_inc_swprod(struct arm_smmu_queue *q)
>>  {
>> -	u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + 1;
>> +	u32 prod = q->prod + 1;
>>  
>>  	q->prod = Q_OVF(q, q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
>> +}
>> +
>> +static void queue_inc_prod(struct arm_smmu_queue *q)
>> +{
>> +	queue_inc_swprod(q);
>>  	writel(q->prod, q->prod_reg);
>>  }
>>  
>> @@ -761,13 +769,24 @@ static void queue_write(__le64 *dst, u64 *src, size_t n_dwords)
>>  		*dst++ = cpu_to_le64(*src++);
>>  }
>>  
>> -static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
>> +static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent, int optimize)
>>  {
>>  	if (queue_full(q))
>>  		return -ENOSPC;
>>  
>>  	queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
>> -	queue_inc_prod(q);
>> +
>> +	/*
>> +	 * We don't want too many commands to be delayed, this may lead the
>> +	 * followed sync command to wait for a long time.
>> +	 */
>> +	if (optimize && (++q->nr_delay < CMDQ_MAX_DELAYED)) {
>> +		queue_inc_swprod(q);
>> +	} else {
>> +		queue_inc_prod(q);
>> +		q->nr_delay = 0;
>> +	}
>> +
> 
> So here, you're effectively putting invalidation commands into the command
> queue without updating PROD. Do you actually see a performance advantage
> from doing so? Another side of the argument would be that we should be
Yes, my sas ssd performance test showed that it can improve about 100-150K/s(the same to I directly replace
writel with writel_relaxed). And the average execution time of iommu_unmap(which called by iommu_dma_unmap_sg)
dropped from 10us to 5us.

> moving PROD as soon as we can, so that the SMMU can process invalidation
> commands in the background and reduce the cost of the final SYNC operation
> when the high-level unmap operation is complete.
There maybe that __iowmb() is more expensive than wait for tlbi complete. Except the time of __iowmb()
itself, it also protected by spinlock, lock confliction will rise rapidly in the stress scene. __iowmb()
average cost 300-500ns(Sorry, I forget the exact value).

In addition, after applied this patcheset and Robin's v2, and my earlier dma64 iova optimization patchset.
Our net performance test got the same data to global bypass. But sas ssd still have more than 20% dropped.
Maybe we should still focus at map/unamp, because the average execution time of iova alloc/free is only
about 400ns.

By the way, patch2-5 is more effective than this one, it can improve more than 350K/s. And with it, we can
got about 100-150K/s improvement of Robin's v2. Otherwise, I saw non effective of Robin's v2. Sorry, I have
not tested how about this patch without patch2-5. Further more, I got the same performance data to global
bypass for the traditional mechanical hard disk with only patch2-5(without this patch and Robin's).

> 
> Will
> 
> .
> 

-- 
Thanks!
BestRegards

next prev parent reply	other threads:[~2017-06-29  2:09 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-26 13:38 [PATCH 0/5] arm-smmu: performance optimization Zhen Lei
2017-06-26 13:38 ` [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction Zhen Lei
2017-06-28  9:32   ` Will Deacon
2017-06-29  2:08     ` Leizhen (ThunderTown) [this message]
2017-07-17 13:06       ` John Garry
2017-07-17 14:23         ` Jonathan Cameron
2017-07-17 17:28           ` Nate Watterson
2017-07-18  9:20             ` Jonathan Cameron
2017-07-20 19:07               ` Nate Watterson
2017-07-21 10:57                 ` Jonathan Cameron
2017-08-22 15:41   ` Joerg Roedel
2017-08-23  1:21     ` Leizhen (ThunderTown)
2017-06-26 13:38 ` [PATCH 2/5] iommu: add a new member unmap_tlb_sync into struct iommu_ops Zhen Lei
2017-06-26 13:38 ` [PATCH 3/5] iommu/arm-smmu-v3: add support for unmap an iova range with only one tlb sync Zhen Lei
2017-06-26 13:38 ` [PATCH 4/5] iommu/arm-smmu: add support for unmap a memory " Zhen Lei
2017-06-26 13:38 ` [PATCH 5/5] iommu/io-pgtable: delete member tlb_sync_pending of struct io_pgtable Zhen Lei
2017-08-17 14:36 ` [PATCH 0/5] arm-smmu: performance optimization Will Deacon
2017-08-18  3:19   ` Leizhen (ThunderTown)
2017-08-18  8:39     ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5954610F.9020807@huawei.com \
    --to=thunder.leizhen@huawei.com \
    --cc=dingtianhong@huawei.com \
    --cc=guohanjun@huawei.com \
    --cc=huxinwei@huawei.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=john.garry@huawei.com \
    --cc=joro@8bytes.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=nwatters@codeaurora.org \
    --cc=robin.murphy@arm.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox