All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>,
	"will@kernel.org" <will@kernel.org>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"thierry.reding@gmail.com" <thierry.reding@gmail.com>,
	"vdumpa@nvidia.com" <vdumpa@nvidia.com>,
	"jonathanh@nvidia.com" <jonathanh@nvidia.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>,
	Jerry Snitselaar <jsnitsel@redhat.com>
Subject: Re: [PATCH v5 0/6] Add Tegra241 (Grace) CMDQV Support (part 1/2)
Date: Wed, 17 Apr 2024 09:29:26 -0300	[thread overview]
Message-ID: <20240417122926.GP3637727@nvidia.com> (raw)
In-Reply-To: <ba3047f946c6475aaf30800f9d3f9afb@huawei.com>

On Wed, Apr 17, 2024 at 09:45:34AM +0000, Shameerali Kolothum Thodi wrote:

> Just to add to that. One idea could be like to have a case where when ECMDQs are 
> detected, use that for issuing limited set of cmds(like stage 1 TLBIs) and use the
> normal cmdq for rest. Since we use stage 1 for both host and for Guest nested cases
> and TLBIs are the bottlenecks in most cases I think this should give performance
> benefits.

There is definately options to look at to improve the performance
here.

IMHO the design of the ECMDQ largely seems to expect 1 queue per-cpu
and then we move to a lock-less design where each CPU uses it's own
private per-cpu queue. In this case a VMM calling the kernel to do
invalidation would often naturally use a thread originating on a pCPU
bound to a vCPU which is substantially exclusive to the VM.

Jason

WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>,
	"will@kernel.org" <will@kernel.org>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"thierry.reding@gmail.com" <thierry.reding@gmail.com>,
	"vdumpa@nvidia.com" <vdumpa@nvidia.com>,
	"jonathanh@nvidia.com" <jonathanh@nvidia.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>,
	Jerry Snitselaar <jsnitsel@redhat.com>
Subject: Re: [PATCH v5 0/6] Add Tegra241 (Grace) CMDQV Support (part 1/2)
Date: Wed, 17 Apr 2024 09:29:26 -0300	[thread overview]
Message-ID: <20240417122926.GP3637727@nvidia.com> (raw)
In-Reply-To: <ba3047f946c6475aaf30800f9d3f9afb@huawei.com>

On Wed, Apr 17, 2024 at 09:45:34AM +0000, Shameerali Kolothum Thodi wrote:

> Just to add to that. One idea could be like to have a case where when ECMDQs are 
> detected, use that for issuing limited set of cmds(like stage 1 TLBIs) and use the
> normal cmdq for rest. Since we use stage 1 for both host and for Guest nested cases
> and TLBIs are the bottlenecks in most cases I think this should give performance
> benefits.

There is definately options to look at to improve the performance
here.

IMHO the design of the ECMDQ largely seems to expect 1 queue per-cpu
and then we move to a lock-less design where each CPU uses it's own
private per-cpu queue. In this case a VMM calling the kernel to do
invalidation would often naturally use a thread originating on a pCPU
bound to a vCPU which is substantially exclusive to the VM.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2024-04-17 12:29 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-13  3:43 [PATCH v5 0/6] Add Tegra241 (Grace) CMDQV Support (part 1/2) Nicolin Chen
2024-04-13  3:43 ` Nicolin Chen
2024-04-13  3:43 ` [PATCH v5 1/6] iommu/arm-smmu-v3: Add CS_NONE quirk Nicolin Chen
2024-04-13  3:43   ` Nicolin Chen
2024-04-19 17:12   ` Nicolin Chen
2024-04-19 17:12     ` Nicolin Chen
2024-04-13  3:43 ` [PATCH v5 2/6] iommu/arm-smmu-v3: Make arm_smmu_cmdq_init reusable Nicolin Chen
2024-04-13  3:43   ` Nicolin Chen
2024-04-13  3:43 ` [PATCH v5 3/6] iommu/arm-smmu-v3: Make __arm_smmu_cmdq_skip_err reusable Nicolin Chen
2024-04-13  3:43   ` Nicolin Chen
2024-04-13  3:43 ` [PATCH v5 4/6] iommu/arm-smmu-v3: Pass in cmdq pointer to arm_smmu_cmdq_issue_cmdlist() Nicolin Chen
2024-04-13  3:43   ` Nicolin Chen
2024-04-13  3:43 ` [PATCH v5 5/6] iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV Nicolin Chen
2024-04-13  3:43   ` Nicolin Chen
2024-04-13  3:43 ` [PATCH v5 6/6] iommu/tegra241-cmdqv: Limit CMDs for guest owned VINTF Nicolin Chen
2024-04-13  3:43   ` Nicolin Chen
2024-04-17 15:12   ` Shameerali Kolothum Thodi
2024-04-17 15:12     ` Shameerali Kolothum Thodi
2024-04-17 16:04     ` Nicolin Chen
2024-04-17 16:04       ` Nicolin Chen
2024-04-15 17:14 ` [PATCH v5 0/6] Add Tegra241 (Grace) CMDQV Support (part 1/2) Jason Gunthorpe
2024-04-15 17:14   ` Jason Gunthorpe
2024-04-17  8:01   ` Shameerali Kolothum Thodi
2024-04-17  8:01     ` Shameerali Kolothum Thodi
2024-04-17  9:45     ` Shameerali Kolothum Thodi
2024-04-17  9:45       ` Shameerali Kolothum Thodi
2024-04-17 12:29       ` Jason Gunthorpe [this message]
2024-04-17 12:29         ` Jason Gunthorpe
2024-04-17 12:24     ` Jason Gunthorpe
2024-04-17 12:24       ` Jason Gunthorpe
2024-04-17 15:13       ` Shameerali Kolothum Thodi
2024-04-17 15:13         ` Shameerali Kolothum Thodi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240417122926.GP3637727@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=iommu@lists.linux.dev \
    --cc=jonathanh@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=jsnitsel@redhat.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=thierry.reding@gmail.com \
    --cc=vdumpa@nvidia.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.