All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Tanmay Jagdale <tanmay@marvell.com>
Cc: will@kernel.org, robin.murphy@arm.com, joro@8bytes.org,
	nicolinc@nvidia.com, mshavit@google.com,
	baolu.lu@linux.intel.com, thunder.leizhen@huawei.com,
	set_pte_at@outlook.com, smostafa@google.com,
	sgoutham@marvell.com, gcherian@marvell.com, jcm@jonmasters.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode
Date: Thu, 2 May 2024 13:25:30 -0300	[thread overview]
Message-ID: <20240502162530.GA578685@nvidia.com> (raw)
In-Reply-To: <20240425144152.52352-1-tanmay@marvell.com>

On Thu, Apr 25, 2024 at 07:41:50AM -0700, Tanmay Jagdale wrote:
> Resending the patches by Zhen Lei <thunder.leizhen@huawei.com> that add
> support for SMMU ECMDQ feature.
> 
> Tested this feature on a Marvell SoC by implementing a smmu-test driver.
> This test driver spawns a thread per CPU and each thread keeps sending
> map, table-walk and unmap requests for a fixed duration.

So this is not just measuring invalidation performance but basically
the DMA API performance to do map/unmap operations? What is "batch
size" ?

Does this HW support the range invalidation? How many invalidation
commands does earch test cycle generate?

>                    Total Requests  Average Requests  Difference
>                                       Per CPU         wrt ECMDQ
> -----------------------------------------------------------------
> ECMDQ                 239286381       2991079
> CMDQ Batch Size 1     228232187       2852902         -4.62%
> CMDQ Batch Size 32    233465784       2918322         -2.43%
> CMDQ Batch Size 64    231679588       2895994         -3.18%
> CMDQ Batch Size 128   233189030       2914862         -2.55%
> CMDQ Batch Size 256   230965773       2887072         -3.48%

If this is really 5% for a typical DMA API map/unmap cycle then that
seems interesting to me.

If it is 5% for just the invalidation command then it is harder to
say.

I'd suggest to present your results in terms of latency to do a dma
API map/unmap cycle, and to show how the results scale as you add more
threads. Does even 2 threads start to show a 4-5% gain?

Also, I'm wondering how ATS would interact, I think we have a
head-of-line blocking issue with ATS.. Allowing ATS to progress
concurrently with unrelated parallel invalidations may also be
interesting, esepcially for multi-SVA workloads.

Jason

WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Tanmay Jagdale <tanmay@marvell.com>
Cc: will@kernel.org, robin.murphy@arm.com, joro@8bytes.org,
	nicolinc@nvidia.com, mshavit@google.com,
	baolu.lu@linux.intel.com, thunder.leizhen@huawei.com,
	set_pte_at@outlook.com, smostafa@google.com,
	sgoutham@marvell.com, gcherian@marvell.com, jcm@jonmasters.org,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode
Date: Thu, 2 May 2024 13:25:30 -0300	[thread overview]
Message-ID: <20240502162530.GA578685@nvidia.com> (raw)
In-Reply-To: <20240425144152.52352-1-tanmay@marvell.com>

On Thu, Apr 25, 2024 at 07:41:50AM -0700, Tanmay Jagdale wrote:
> Resending the patches by Zhen Lei <thunder.leizhen@huawei.com> that add
> support for SMMU ECMDQ feature.
> 
> Tested this feature on a Marvell SoC by implementing a smmu-test driver.
> This test driver spawns a thread per CPU and each thread keeps sending
> map, table-walk and unmap requests for a fixed duration.

So this is not just measuring invalidation performance but basically
the DMA API performance to do map/unmap operations? What is "batch
size" ?

Does this HW support the range invalidation? How many invalidation
commands does earch test cycle generate?

>                    Total Requests  Average Requests  Difference
>                                       Per CPU         wrt ECMDQ
> -----------------------------------------------------------------
> ECMDQ                 239286381       2991079
> CMDQ Batch Size 1     228232187       2852902         -4.62%
> CMDQ Batch Size 32    233465784       2918322         -2.43%
> CMDQ Batch Size 64    231679588       2895994         -3.18%
> CMDQ Batch Size 128   233189030       2914862         -2.55%
> CMDQ Batch Size 256   230965773       2887072         -3.48%

If this is really 5% for a typical DMA API map/unmap cycle then that
seems interesting to me.

If it is 5% for just the invalidation command then it is harder to
say.

I'd suggest to present your results in terms of latency to do a dma
API map/unmap cycle, and to show how the results scale as you add more
threads. Does even 2 threads start to show a 4-5% gain?

Also, I'm wondering how ATS would interact, I think we have a
head-of-line blocking issue with ATS.. Allowing ATS to progress
concurrently with unrelated parallel invalidations may also be
interesting, esepcially for multi-SVA workloads.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2024-05-02 16:25 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-25 14:41 [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode Tanmay Jagdale
2024-04-25 14:41 ` Tanmay Jagdale
2024-04-25 14:41 ` [PATCH V3 1/2] " Tanmay Jagdale
2024-04-25 14:41   ` Tanmay Jagdale
2024-04-28  2:19   ` Leizhen (ThunderTown)
2024-04-28  2:19     ` Leizhen (ThunderTown)
2024-04-25 14:41 ` [PATCH V3 2/2] iommu/arm-smmu-v3: Ensure that a set of associated commands are inserted in the same ECMDQ Tanmay Jagdale
2024-04-25 14:41   ` Tanmay Jagdale
2024-04-28  2:08 ` [PATCH V3 0/2] iommu/arm-smmu-v3: Add support for ECMDQ register mode Leizhen (ThunderTown)
2024-04-28  2:08   ` Leizhen (ThunderTown)
2024-04-30 15:09 ` Will Deacon
2024-04-30 15:09   ` Will Deacon
2024-05-16 14:25   ` Tanmay Jagdale
2024-05-16 14:25     ` Tanmay Jagdale
2024-05-02 16:25 ` Jason Gunthorpe [this message]
2024-05-02 16:25   ` Jason Gunthorpe
  -- strict thread matches above, loose matches on Subject: below --
2024-04-25 14:30 Tanmay Jagdale
2024-04-25 14:30 ` Tanmay Jagdale

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240502162530.GA578685@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=gcherian@marvell.com \
    --cc=iommu@lists.linux.dev \
    --cc=jcm@jonmasters.org \
    --cc=joro@8bytes.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mshavit@google.com \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=set_pte_at@outlook.com \
    --cc=sgoutham@marvell.com \
    --cc=smostafa@google.com \
    --cc=tanmay@marvell.com \
    --cc=thunder.leizhen@huawei.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.