linux-arm-kernel.lists.infradead.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Michael Shavit <mshavit@google.com>
Cc: iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	linux-arm-kernel@lists.infradead.org,
	Robin Murphy <robin.murphy@arm.com>,
	Will Deacon <will@kernel.org>, Nicolin Chen <nicolinc@nvidia.com>
Subject: Re: [PATCH 04/19] iommu/arm-smmu-v3: Make STE programming independent of the callers
Date: Sun, 17 Dec 2023 09:03:55 -0400	[thread overview]
Message-ID: <ZX7xu8jN/rBZ4A/d@nvidia.com> (raw)
In-Reply-To: <CAKHBV24nfdq1q76T+DRQGNYeA-kP7ONnSjDsRFGsQCJVSrSm0w@mail.gmail.com>

On Sat, Dec 16, 2023 at 04:26:48AM +0800, Michael Shavit wrote:

> Ok, I took a proper stab at trying to unroll the loop on the github
> version of this patch (v3+)
> As you suspected, it's not easy to re-use the unrolled version for
> both STE and CD writing as we'd have to pass in callbacks for syncing
> the STE/CD and recomputing arm_smmu_{get_ste/cd}_used. 

Yes, that is why I structured it as an iterator.

> I personally find this quite a bit more readable as a sequential
> series of steps instead of a loop. It also only requires 3 STE/CD
> syncs in the worst case compared to 4 in the loop version since we

The existing version is max 3 syncs as well; it works the same way by
checking the number of critical qwords after computing the first step
in the hitless flow.

However, what you have below has the same problem as the first sketch:
it always does 3 syncs. The existing version fully minimizes the
syncs. That is why it is so complex to unroll: you have to check
before every sync whether the sync is needed at all.
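The "check before every sync" idea can be sketched like this (an
illustrative stand-in, not the driver's code: plain stores replace
WRITE_ONCE() and a counter replaces arm_smmu_sync_ste_for_sid()):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static unsigned int nsyncs;

/*
 * Write the target qwords into the live entry, but only issue the
 * (expensive) sync if at least one qword actually changed.  This is the
 * per-step check that makes the fully-minimized flow hard to unroll.
 */
static void write_qwords_and_sync(uint64_t *ste, const uint64_t *val,
				  size_t n)
{
	bool changed = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ste[i] != val[i]) {
			ste[i] = val[i];	/* WRITE_ONCE() in the kernel */
			changed = true;
		}
	}
	if (changed)
		nsyncs++;	/* arm_smmu_sync_ste_for_sid() in the driver */
}
```

If a step's computed value already matches what is in the table, the
whole sync is skipped, which is how the loop version gets below 3 syncs
in the easy cases.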

This could probably be organized so that one shared function computes
the "plan" and then a CD/STE-specific copy executes the plan. It
avoids the loop, but all the code is still basically the same; there
is just more of it.
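A rough sketch of that split, with all names and the plan encoding
hypothetical (this assumes the staged entry already has the
HW-unused bits set to their target values, i.e. the
arm_smmu_ste_set_unused_bits() step from the quoted patch):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "plan" shared between the CD and STE writers. */
struct entry_plan {
	bool hitless;		/* reachable with a single qword write? */
	size_t critical_qword;	/* the one qword that flips the config */
};

/*
 * Compare the staged entry against the target, considering only bits the
 * target configuration actually uses.  Zero or one mismatched qword means
 * a hitless single-qword transition; more than one forces the V=0 path.
 */
static struct entry_plan compute_plan(const uint64_t *staged,
				      const uint64_t *target,
				      const uint64_t *target_used,
				      size_t n)
{
	struct entry_plan plan = { .hitless = true, .critical_qword = 0 };
	size_t ndiff = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((staged[i] & target_used[i]) !=
		    (target[i] & target_used[i])) {
			plan.critical_qword = i;
			ndiff++;
		}
	}
	if (ndiff > 1) {
		/* Not hitless: must clear V first, so qword 0 is critical. */
		plan.hitless = false;
		plan.critical_qword = 0;
	}
	return plan;
}
```

The CD and STE paths would each call compute_plan() and then run their
own sync/commit sequence against it.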

I'm fine with any of these ways.

Jason

>  static void arm_smmu_write_ste(struct arm_smmu_device *smmu, u32 sid,
>                                struct arm_smmu_ste *ste,
>                                const struct arm_smmu_ste *target)
>  {
> +       struct arm_smmu_ste staging_ste;
>         struct arm_smmu_ste target_used;
> +       int writes_required = 0;
> +       u8 staging_used_diff = 0;
> +       int i = 0;
> 
> +       /*
> +        * Compute a staging STE that has all the bits currently unused by
> +        * HW set to their target values, such that committing it to the
> +        * STE table wouldn't disrupt the hardware.
> +        */
> +       memcpy(&staging_ste, ste, sizeof(staging_ste));
> +       arm_smmu_ste_set_unused_bits(&staging_ste, target);
> +
> +       /*
> +        * Determine if it's possible to reach the target configuration
> +        * from the staged configuration in a single qword write (ignoring
> +        * bits that are unused under the target configuration).
> +        */
>         arm_smmu_get_ste_used(target, &target_used);
> -               WARN_ON_ONCE(target->data[i] & ~target_used.data[i]);
> +       /*
> +        * But first sanity check that masks in arm_smmu_get_ste_used() are up
> +        * to date.
> +        */
> +       for (i = 0; i != ARRAY_SIZE(target->data); i++) {
> +               if (WARN_ON_ONCE(target->data[i] & ~target_used.data[i]))
> +                       target_used.data[i] |= target->data[i];
> +       }
> 
> +       for (i = 0; i != ARRAY_SIZE(staging_ste.data); i++) {
> +               /*
> +                * Each set bit of staging_used_diff marks the index of a
> +                * qword within the staged STE that is incorrect compared
> +                * to the target, considering only the used bits in the
> +                * target.
> +                */
> +               if ((staging_ste.data[i] &
> +                   target_used.data[i]) != (target->data[i]))
> +                       staging_used_diff |= 1 << i;
> +       }
> +       if (hweight8(staging_used_diff) > 1) {
> +               /*
> +                * More than 1 qword is mismatched and a hitless transition from
> +                * the current ste to the target ste is not possible. Clear the
> +                * V bit and recompute the staging STE.
> +                * Because the V bit is cleared, the staging STE will be equal
> +                * to the target STE except for the first qword.
> +                */
> +               staging_ste.data[0] &= ~cpu_to_le64(STRTAB_STE_0_V);
> +               arm_smmu_ste_set_unused_bits(&staging_ste, target);
> +               staging_used_diff = 1;
> +       }
> +
> +       /*
> +        * Commit the staging STE.
> +        */
> +       for (i = 0; i != ARRAY_SIZE(staging_ste.data); i++)
> +               WRITE_ONCE(ste->data[i], staging_ste.data[i]);
> +       arm_smmu_sync_ste_for_sid(smmu, sid);
> +
> +       /*
> +        * It's now possible to switch to the target configuration with a write
> +        * to a single qword. Make that switch now.
> +        */
> +       i = ffs(staging_used_diff) - 1;
> +       WRITE_ONCE(ste->data[i], target->data[i]);
> +       arm_smmu_sync_ste_for_sid(smmu, sid);

The non-hitless flow doesn't look right to me: it should set V=0, then
load all qwords but 0, then load qword 0, in exactly that sequence. If
the goal is clarity then the two programming flows should be explicitly
spelled out.
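Spelled out, that sequence looks roughly like this (a sketch with
illustrative names; a counter stands in for arm_smmu_sync_ste_for_sid()
and plain stores for WRITE_ONCE()):

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_QWORDS 8		/* an SMMUv3 STE is 8 qwords */
#define STE_V (1ULL << 0)	/* STRTAB_STE_0_V: valid bit in qword 0 */

static unsigned int nsyncs;

/* Stand-in for arm_smmu_sync_ste_for_sid(); just counts syncs here. */
static void sync_ste(void)
{
	nsyncs++;
}

/*
 * The non-hitless flow in exactly the required order:
 *   1. clear V and sync, so the HW stops interpreting the entry;
 *   2. fill in every qword except qword 0, then sync;
 *   3. write qword 0 (with V set in the target), then sync.
 */
static void write_ste_non_hitless(uint64_t *ste, const uint64_t *target)
{
	size_t i;

	ste[0] &= ~STE_V;			/* step 1: invalidate */
	sync_ste();

	for (i = 1; i < NUM_QWORDS; i++)	/* step 2: all but qword 0 */
		ste[i] = target[i];
	sync_ste();

	ste[0] = target[0];			/* step 3: qword 0 last */
	sync_ste();
}
```

Writing qword 0 last means the HW can never observe a valid entry whose
remaining qwords are a mix of old and new values.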

Jason


Thread overview: 67+ messages
2023-10-11  0:33 [PATCH 00/19] Update SMMUv3 to the modern iommu API (part 1/2) Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 01/19] iommu/arm-smmu-v3: Add a type for the STE Jason Gunthorpe
2023-10-13 10:37   ` Will Deacon
2023-10-13 14:00     ` Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 02/19] iommu/arm-smmu-v3: Master cannot be NULL in arm_smmu_write_strtab_ent() Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 03/19] iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 04/19] iommu/arm-smmu-v3: Make STE programming independent of the callers Jason Gunthorpe
2023-10-12  8:10   ` Michael Shavit
2023-10-12 12:16     ` Jason Gunthorpe
2023-10-18 11:05       ` Michael Shavit
2023-10-18 13:04         ` Jason Gunthorpe
2023-10-20  8:23           ` Michael Shavit
2023-10-20 11:39             ` Jason Gunthorpe
2023-10-23  8:36               ` Michael Shavit
2023-10-23 12:05                 ` Jason Gunthorpe
2023-12-15 20:26                 ` Michael Shavit
2023-12-17 13:03                   ` Jason Gunthorpe [this message]
2023-12-18 12:35                     ` Michael Shavit
2023-12-18 12:42                       ` Michael Shavit
2023-12-19 13:42                       ` Michael Shavit
2023-12-25 12:17                         ` Michael Shavit
2023-12-25 12:58                           ` Michael Shavit
2023-12-27 15:33                             ` Jason Gunthorpe
2023-12-27 15:46                         ` Jason Gunthorpe
2024-01-02  8:08                           ` Michael Shavit
2024-01-02 14:48                             ` Jason Gunthorpe
2024-01-03 16:52                               ` Michael Shavit
2024-01-03 17:50                                 ` Jason Gunthorpe
2024-01-06  8:36                                   ` [PATCH] " Michael Shavit
2024-01-06  8:36                                     ` [PATCH] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry_step() Michael Shavit
2024-01-10 13:34                                       ` Jason Gunthorpe
2024-01-06  8:36                                     ` [PATCH] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry Michael Shavit
2024-01-12 16:36                                       ` Jason Gunthorpe
2024-01-16  9:23                                         ` Michael Shavit
2024-01-10 13:10                                     ` [PATCH] iommu/arm-smmu-v3: Make STE programming independent of the callers Jason Gunthorpe
2024-01-06  8:50                                   ` [PATCH 04/19] " Michael Shavit
2024-01-12 19:45                                     ` Jason Gunthorpe
2024-01-03 15:42                           ` Michael Shavit
2024-01-03 15:49                             ` Jason Gunthorpe
2024-01-03 16:47                               ` Michael Shavit
2024-01-02  8:13                         ` Michael Shavit
2024-01-02 14:48                           ` Jason Gunthorpe
2023-10-18 10:54   ` Michael Shavit
2023-10-18 12:24     ` Jason Gunthorpe
2023-10-19 23:03       ` Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 05/19] iommu/arm-smmu-v3: Consolidate the STE generation for abort/bypass Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 06/19] iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste() Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 07/19] iommu/arm-smmu-v3: Move the STE generation for S1 and S2 domains into functions Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 08/19] iommu/arm-smmu-v3: Build the whole STE in arm_smmu_make_s2_domain_ste() Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 09/19] iommu/arm-smmu-v3: Hold arm_smmu_asid_lock during all of attach_dev Jason Gunthorpe
2023-10-24  2:44   ` Michael Shavit
2023-10-24  2:48     ` Michael Shavit
2023-10-24 11:50     ` Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 10/19] iommu/arm-smmu-v3: Compute the STE only once for each master Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 11/19] iommu/arm-smmu-v3: Do not change the STE twice during arm_smmu_attach_dev() Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 12/19] iommu/arm-smmu-v3: Put writing the context descriptor in the right order Jason Gunthorpe
2023-10-12  9:01   ` Michael Shavit
2023-10-12 12:34     ` Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 13/19] iommu/arm-smmu-v3: Pass smmu_domain to arm_enable/disable_ats() Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 14/19] iommu/arm-smmu-v3: Remove arm_smmu_master->domain Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 15/19] iommu/arm-smmu-v3: Add a global static IDENTITY domain Jason Gunthorpe
2023-10-18 11:06   ` Michael Shavit
2023-10-18 12:26     ` Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 16/19] iommu/arm-smmu-v3: Add a global static BLOCKED domain Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 17/19] iommu/arm-smmu-v3: Use the identity/blocked domain during release Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 18/19] iommu/arm-smmu-v3: Pass arm_smmu_domain and arm_smmu_device to finalize Jason Gunthorpe
2023-10-11  0:33 ` [PATCH 19/19] iommu/arm-smmu-v3: Convert to domain_alloc_paging() Jason Gunthorpe
