Re: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Joerg Roedel <joro@8bytes.org>
Cc: John Garry <john.garry@huawei.com>,
	Hanjun Guo <guohanjun@huawei.com>,
	Will Deacon <will.deacon@arm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Xinwei Hu <huxinwei@huawei.com>,
	iommu <iommu@lists.linux-foundation.org>,
	Zefan Li <lizefan@huawei.com>,
	Tianhong Ding <dingtianhong@huawei.com>,
	Robin Murphy <robin.murphy@arm.com>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction
Date: Wed, 23 Aug 2017 09:21:33 +0800	[thread overview]
Message-ID: <599CD89D.7080600@huawei.com> (raw)
In-Reply-To: <20170822154142.GA19533@8bytes.org>



On 2017/8/22 23:41, Joerg Roedel wrote:
> On Mon, Jun 26, 2017 at 09:38:46PM +0800, Zhen Lei wrote:
>> -static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
>> +static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent, int optimize)
>>  {
>>  	if (queue_full(q))
>>  		return -ENOSPC;
>>  
>>  	queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
>> -	queue_inc_prod(q);
>> +
>> +	/*
>> +	 * We don't want too many commands to be delayed, this may lead the
>> +	 * followed sync command to wait for a long time.
>> +	 */
>> +	if (optimize && (++q->nr_delay < CMDQ_MAX_DELAYED)) {
>> +		queue_inc_swprod(q);
>> +	} else {
>> +		queue_inc_prod(q);
>> +		q->nr_delay = 0;
>> +	}
>> +
>>  	return 0;
>>  }
>>  
>> @@ -909,6 +928,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>>  static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  				    struct arm_smmu_cmdq_ent *ent)
>>  {
>> +	int optimize = 0;
>>  	u64 cmd[CMDQ_ENT_DWORDS];
>>  	unsigned long flags;
>>  	bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
>> @@ -920,8 +940,17 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  		return;
>>  	}
>>  
>> +	/*
>> +	 * All TLBI commands should be followed by a sync command later.
>> +	 * The CFGI commands is the same, but they are rarely executed.
>> +	 * So just optimize TLBI commands now, to reduce the "if" judgement.
>> +	 */
>> +	if ((ent->opcode >= CMDQ_OP_TLBI_NH_ALL) &&
>> +	    (ent->opcode <= CMDQ_OP_TLBI_NSNH_ALL))
>> +		optimize = 1;
>> +
>>  	spin_lock_irqsave(&smmu->cmdq.lock, flags);
>> -	while (queue_insert_raw(q, cmd) == -ENOSPC) {
>> +	while (queue_insert_raw(q, cmd, optimize) == -ENOSPC) {
>>  		if (queue_poll_cons(q, false, wfe))
>>  			dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>>  	}
> 
> This doesn't look correct. How do you make sure that a given IOVA range
> is flushed before the addresses are reused?
Hi, Joerg:
	It's actullay guaranteed by the upper layer functions, for example:
	static int arm_lpae_unmap(
        ...
    	unmapped = __arm_lpae_unmap(data, iova, size, lvl, ptep);	//__arm_lpae_unmap will indirectly call arm_smmu_cmdq_issue_cmd to invalidate tlbs
	if (unmapped)
		io_pgtable_tlb_sync(&data->iop);			//a tlb_sync wait all tlbi operations finished

	
	I also described it in the next patch(2/5). Showed below:

Some people might ask: Is it safe to do so? The answer is yes. The standard
processing flow is:
	alloc iova
	map
	process data
	unmap
	tlb invalidation and sync
	free iova

What should be guaranteed is: "free iova" action is behind "unmap" and "tlbi
operation" action, that is what we are doing right now. This ensures that:
all TLBs of an iova-range have been invalidated before the iova reallocated.

Best regards,
	LeiZhen

> 
> 
> Regards,
> 
> 	Joerg
> 
> 
> .
> 

-- 
Thanks!
BestRegards

WARNING: multiple messages have this Message-ID (diff)

From: thunder.leizhen@huawei.com (Leizhen (ThunderTown))
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction
Date: Wed, 23 Aug 2017 09:21:33 +0800	[thread overview]
Message-ID: <599CD89D.7080600@huawei.com> (raw)
In-Reply-To: <20170822154142.GA19533@8bytes.org>



On 2017/8/22 23:41, Joerg Roedel wrote:
> On Mon, Jun 26, 2017 at 09:38:46PM +0800, Zhen Lei wrote:
>> -static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
>> +static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent, int optimize)
>>  {
>>  	if (queue_full(q))
>>  		return -ENOSPC;
>>  
>>  	queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
>> -	queue_inc_prod(q);
>> +
>> +	/*
>> +	 * We don't want too many commands to be delayed, this may lead the
>> +	 * followed sync command to wait for a long time.
>> +	 */
>> +	if (optimize && (++q->nr_delay < CMDQ_MAX_DELAYED)) {
>> +		queue_inc_swprod(q);
>> +	} else {
>> +		queue_inc_prod(q);
>> +		q->nr_delay = 0;
>> +	}
>> +
>>  	return 0;
>>  }
>>  
>> @@ -909,6 +928,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>>  static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  				    struct arm_smmu_cmdq_ent *ent)
>>  {
>> +	int optimize = 0;
>>  	u64 cmd[CMDQ_ENT_DWORDS];
>>  	unsigned long flags;
>>  	bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
>> @@ -920,8 +940,17 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  		return;
>>  	}
>>  
>> +	/*
>> +	 * All TLBI commands should be followed by a sync command later.
>> +	 * The CFGI commands is the same, but they are rarely executed.
>> +	 * So just optimize TLBI commands now, to reduce the "if" judgement.
>> +	 */
>> +	if ((ent->opcode >= CMDQ_OP_TLBI_NH_ALL) &&
>> +	    (ent->opcode <= CMDQ_OP_TLBI_NSNH_ALL))
>> +		optimize = 1;
>> +
>>  	spin_lock_irqsave(&smmu->cmdq.lock, flags);
>> -	while (queue_insert_raw(q, cmd) == -ENOSPC) {
>> +	while (queue_insert_raw(q, cmd, optimize) == -ENOSPC) {
>>  		if (queue_poll_cons(q, false, wfe))
>>  			dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>>  	}
> 
> This doesn't look correct. How do you make sure that a given IOVA range
> is flushed before the addresses are reused?
Hi, Joerg:
	It's actullay guaranteed by the upper layer functions, for example:
	static int arm_lpae_unmap(
        ...
    	unmapped = __arm_lpae_unmap(data, iova, size, lvl, ptep);	//__arm_lpae_unmap will indirectly call arm_smmu_cmdq_issue_cmd to invalidate tlbs
	if (unmapped)
		io_pgtable_tlb_sync(&data->iop);			//a tlb_sync wait all tlbi operations finished

	
	I also described it in the next patch(2/5). Showed below:

Some people might ask: Is it safe to do so? The answer is yes. The standard
processing flow is:
	alloc iova
	map
	process data
	unmap
	tlb invalidation and sync
	free iova

What should be guaranteed is: "free iova" action is behind "unmap" and "tlbi
operation" action, that is what we are doing right now. This ensures that:
all TLBs of an iova-range have been invalidated before the iova reallocated.

Best regards,
	LeiZhen

> 
> 
> Regards,
> 
> 	Joerg
> 
> 
> .
> 

-- 
Thanks!
BestRegards

WARNING: multiple messages have this Message-ID (diff)

From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will.deacon@arm.com>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	iommu <iommu@lists.linux-foundation.org>,
	Robin Murphy <robin.murphy@arm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Zefan Li <lizefan@huawei.com>, Xinwei Hu <huxinwei@huawei.com>,
	Tianhong Ding <dingtianhong@huawei.com>,
	Hanjun Guo <guohanjun@huawei.com>,
	John Garry <john.garry@huawei.com>
Subject: Re: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction
Date: Wed, 23 Aug 2017 09:21:33 +0800	[thread overview]
Message-ID: <599CD89D.7080600@huawei.com> (raw)
In-Reply-To: <20170822154142.GA19533@8bytes.org>



On 2017/8/22 23:41, Joerg Roedel wrote:
> On Mon, Jun 26, 2017 at 09:38:46PM +0800, Zhen Lei wrote:
>> -static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
>> +static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent, int optimize)
>>  {
>>  	if (queue_full(q))
>>  		return -ENOSPC;
>>  
>>  	queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
>> -	queue_inc_prod(q);
>> +
>> +	/*
>> +	 * We don't want too many commands to be delayed, this may lead the
>> +	 * followed sync command to wait for a long time.
>> +	 */
>> +	if (optimize && (++q->nr_delay < CMDQ_MAX_DELAYED)) {
>> +		queue_inc_swprod(q);
>> +	} else {
>> +		queue_inc_prod(q);
>> +		q->nr_delay = 0;
>> +	}
>> +
>>  	return 0;
>>  }
>>  
>> @@ -909,6 +928,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>>  static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  				    struct arm_smmu_cmdq_ent *ent)
>>  {
>> +	int optimize = 0;
>>  	u64 cmd[CMDQ_ENT_DWORDS];
>>  	unsigned long flags;
>>  	bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
>> @@ -920,8 +940,17 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  		return;
>>  	}
>>  
>> +	/*
>> +	 * All TLBI commands should be followed by a sync command later.
>> +	 * The CFGI commands is the same, but they are rarely executed.
>> +	 * So just optimize TLBI commands now, to reduce the "if" judgement.
>> +	 */
>> +	if ((ent->opcode >= CMDQ_OP_TLBI_NH_ALL) &&
>> +	    (ent->opcode <= CMDQ_OP_TLBI_NSNH_ALL))
>> +		optimize = 1;
>> +
>>  	spin_lock_irqsave(&smmu->cmdq.lock, flags);
>> -	while (queue_insert_raw(q, cmd) == -ENOSPC) {
>> +	while (queue_insert_raw(q, cmd, optimize) == -ENOSPC) {
>>  		if (queue_poll_cons(q, false, wfe))
>>  			dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>>  	}
> 
> This doesn't look correct. How do you make sure that a given IOVA range
> is flushed before the addresses are reused?
Hi, Joerg:
	It's actullay guaranteed by the upper layer functions, for example:
	static int arm_lpae_unmap(
        ...
    	unmapped = __arm_lpae_unmap(data, iova, size, lvl, ptep);	//__arm_lpae_unmap will indirectly call arm_smmu_cmdq_issue_cmd to invalidate tlbs
	if (unmapped)
		io_pgtable_tlb_sync(&data->iop);			//a tlb_sync wait all tlbi operations finished

	
	I also described it in the next patch(2/5). Showed below:

Some people might ask: Is it safe to do so? The answer is yes. The standard
processing flow is:
	alloc iova
	map
	process data
	unmap
	tlb invalidation and sync
	free iova

What should be guaranteed is: "free iova" action is behind "unmap" and "tlbi
operation" action, that is what we are doing right now. This ensures that:
all TLBs of an iova-range have been invalidated before the iova reallocated.

Best regards,
	LeiZhen

> 
> 
> Regards,
> 
> 	Joerg
> 
> 
> .
> 

-- 
Thanks!
BestRegards

next prev parent reply	other threads:[~2017-08-23  1:21 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-26 13:38 [PATCH 0/5] arm-smmu: performance optimization Zhen Lei
2017-06-26 13:38 ` Zhen Lei
2017-06-26 13:38 ` Zhen Lei
2017-06-26 13:38 ` [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction Zhen Lei
2017-06-26 13:38   ` Zhen Lei
     [not found]   ` <1498484330-10840-2-git-send-email-thunder.leizhen-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2017-06-28  9:32     ` Will Deacon
2017-06-28  9:32       ` Will Deacon
2017-06-28  9:32       ` Will Deacon
2017-06-29  2:08       ` Leizhen (ThunderTown)
2017-06-29  2:08         ` Leizhen (ThunderTown)
2017-06-29  2:08         ` Leizhen (ThunderTown)
     [not found]         ` <5954610F.9020807-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2017-07-17 13:06           ` John Garry
2017-07-17 13:06             ` John Garry
2017-07-17 13:06             ` John Garry
2017-07-17 14:23             ` Jonathan Cameron
2017-07-17 14:23               ` Jonathan Cameron
2017-07-17 14:23               ` Jonathan Cameron
     [not found]               ` <20170717222337.0000508f-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2017-07-17 17:28                 ` Nate Watterson
2017-07-17 17:28                   ` Nate Watterson
2017-07-17 17:28                   ` Nate Watterson
     [not found]                   ` <3cec10c5-82ca-2c54-dfdb-ac73b16e5bc6-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-18  9:20                     ` Jonathan Cameron
2017-07-18  9:20                       ` Jonathan Cameron
2017-07-18  9:20                       ` Jonathan Cameron
     [not found]                       ` <20170718172055.00006e84-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2017-07-20 19:07                         ` Nate Watterson
2017-07-20 19:07                           ` Nate Watterson
2017-07-20 19:07                           ` Nate Watterson
     [not found]                           ` <c1d85f28-c57b-4414-3504-16afb3a19ce0-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-07-21 10:57                             ` Jonathan Cameron
2017-07-21 10:57                               ` Jonathan Cameron
2017-07-21 10:57                               ` Jonathan Cameron
2017-08-22 15:41     ` Joerg Roedel
2017-08-22 15:41       ` Joerg Roedel
2017-08-22 15:41       ` Joerg Roedel
2017-08-23  1:21       ` Leizhen (ThunderTown) [this message]
2017-08-23  1:21         ` Leizhen (ThunderTown)
2017-08-23  1:21         ` Leizhen (ThunderTown)
2017-06-26 13:38 ` [PATCH 2/5] iommu: add a new member unmap_tlb_sync into struct iommu_ops Zhen Lei
2017-06-26 13:38   ` Zhen Lei
     [not found] ` <1498484330-10840-1-git-send-email-thunder.leizhen-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2017-06-26 13:38   ` [PATCH 3/5] iommu/arm-smmu-v3: add support for unmap an iova range with only one tlb sync Zhen Lei
2017-06-26 13:38     ` Zhen Lei
2017-06-26 13:38     ` Zhen Lei
2017-06-26 13:38   ` [PATCH 4/5] iommu/arm-smmu: add support for unmap a memory " Zhen Lei
2017-06-26 13:38     ` Zhen Lei
2017-06-26 13:38     ` Zhen Lei
2017-06-26 13:38   ` [PATCH 5/5] iommu/io-pgtable: delete member tlb_sync_pending of struct io_pgtable Zhen Lei
2017-06-26 13:38     ` Zhen Lei
2017-06-26 13:38     ` Zhen Lei
2017-08-17 14:36   ` [PATCH 0/5] arm-smmu: performance optimization Will Deacon
2017-08-17 14:36     ` Will Deacon
2017-08-17 14:36     ` Will Deacon
     [not found]     ` <20170817143650.GB30338-5wv7dgnIgG8@public.gmane.org>
2017-08-18  3:19       ` Leizhen (ThunderTown)
2017-08-18  3:19         ` Leizhen (ThunderTown)
2017-08-18  3:19         ` Leizhen (ThunderTown)
2017-08-18  8:39         ` Will Deacon
2017-08-18  8:39           ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=599CD89D.7080600@huawei.com \
    --to=thunder.leizhen@huawei.com \
    --cc=dingtianhong@huawei.com \
    --cc=guohanjun@huawei.com \
    --cc=huxinwei@huawei.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=john.garry@huawei.com \
    --cc=joro@8bytes.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=robin.murphy@arm.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.