From: Jason Gunthorpe <jgg@nvidia.com>
To: Michael Shavit <mshavit@google.com>
Cc: iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
linux-arm-kernel@lists.infradead.org,
Robin Murphy <robin.murphy@arm.com>,
Will Deacon <will@kernel.org>, Nicolin Chen <nicolinc@nvidia.com>
Subject: Re: [PATCH 04/19] iommu/arm-smmu-v3: Make STE programming independent of the callers
Date: Wed, 3 Jan 2024 13:50:43 -0400
Message-ID: <20240103175043.GS50406@nvidia.com>
In-Reply-To: <CAKHBV26TD-gzv3dB8VkGSJ9T8rrynPa-DL48s_VzfF1xb7xQjA@mail.gmail.com>
On Thu, Jan 04, 2024 at 12:52:48AM +0800, Michael Shavit wrote:
> > > And then this branch is the case where you can directly switch to the
> > > entry without first setting unused bits.
> >
> > Don't make that a special case, just always set the unused bits. All
> > the setting functions should skip the sync if they didn't change the
> > entry, so we don't need to care if we call them needlessly.
> >
> > There are only three programming sequences.
>
> The different cases (ignoring clean-up) from simplest to least are:
> 1. No change because the STE is already equal to the target.
> 2. Directly writing critical word because that's the only difference.
> 3. Setting unused bits then writing critical word.
> 4. Installing breaking STE, write other words, write critical word.
Right
> Case 2. could potentially be collapsed into 3. if the routine that
> sets unused bits skips over the critical word, so that it's a nop when
> the only change is on that critical word.
Right
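To illustrate why folding case 2 into case 3 costs nothing, here is a
minimal userspace sketch. It is not driver code: the toy_* names, the
plain uint64_t entries and the sync counter are all made up for
illustration. A setter that only writes differing qwords, and only syncs
when it wrote something, turns a needless call into a nop:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter standing in for the cost of a real sync */
static unsigned int toy_sync_count;

static void toy_sync(void)
{
	toy_sync_count++;
}

/*
 * Copy target[start..start+len) into entry, touching only the qwords that
 * differ. The kernel version would use WRITE_ONCE() for the store.
 */
static bool toy_entry_set(uint64_t *entry, const uint64_t *target,
			  size_t start, size_t len)
{
	bool changed = false;
	size_t i;

	for (i = start; i != start + len; i++) {
		if (entry[i] != target[i]) {
			entry[i] = target[i];
			changed = true;
		}
	}

	/* No write happened, so there is nothing for HW to observe */
	if (changed)
		toy_sync();
	return changed;
}
```

Calling toy_entry_set() with an entry that already matches returns false
and leaves toy_sync_count untouched, so the "set unused bits" step can be
issued unconditionally.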
> > entry_qwords_used_diff should reflect required changes after setting
> > the unused bits.
>
> Ohhhhhhh, I see. Your suggestion is essentially to move this block
> into the first call to get_used_qword_diff_indexes:
> > > > > + /*
> > > > > + * Compute a staging entry that has all the bits currently
> > > > > > + * unused by HW set to their target values, such that committing
> > > > > > + * it to the entry table wouldn't disrupt the hardware.
> > > > > + */
> > > > > + memcpy(staging_entry, cur, writer->entry_length);
> > > > > + writer->ops.set_unused_bits(staging_entry, target);
> > > > > +
> > > > > + entry_qwords_used_diff =
> > > > > + writer->ops.get_used_qword_diff_indexes(staging_entry,
> > > > > + target);
>
> Such that:
> if (hweight8(entry_qwords_used_diff) > 1) => non hitless
> if (hweight8(entry_qwords_used_diff) > 0) => hitless, potentially by
> first setting some unused bits in non-critical qwords.
Yes, sorry it was unclear. Here is the full thing for what I mean:
struct arm_smmu_entry_writer_ops {
	unsigned int num_entry_qwords;
	__le64 v_bit;
	void (*get_used)(const __le64 *entry, __le64 *used);
	void (*sync)(void);
};

enum {
	NUM_ENTRY_QWORDS =
		((sizeof(struct arm_smmu_ste) > sizeof(struct arm_smmu_cd)) ?
			 sizeof(struct arm_smmu_ste) :
			 sizeof(struct arm_smmu_cd)) /
		sizeof(u64)
};
static bool entry_set(const struct arm_smmu_entry_writer_ops *ops,
		      __le64 *entry, const __le64 *target, unsigned int start,
		      unsigned int len)
{
	bool changed = false;

	entry = entry + start;
	target = target + start;
	/* Advance entry and target together, writing only qwords that differ */
	for (; len != 0; len--, entry++, target++) {
		if (*entry != *target) {
			WRITE_ONCE(*entry, *target);
			changed = true;
		}
	}

	/* Skip the sync when nothing was written */
	if (changed)
		ops->sync();
	return changed;
}

/*
 * Figure out if we can do a hitless update of entry to become target. Returns a
 * bit mask where 1 indicates that qword needs to be set disruptively.
 * unused_update is an intermediate value of entry that has unused bits set to
 * their new values.
 */
static u8 compute_qword_diff(const struct arm_smmu_entry_writer_ops *ops,
			     const __le64 *entry, const __le64 *target,
			     __le64 *unused_update)
{
	__le64 target_used[NUM_ENTRY_QWORDS];
	__le64 cur_used[NUM_ENTRY_QWORDS];
	u8 used_qword_diff = 0;
	unsigned int i;

	ops->get_used(entry, cur_used);
	ops->get_used(target, target_used);

	for (i = 0; i != ops->num_entry_qwords; i++) {
		/*
		 * Masks are up to date, the make functions are not allowed to
		 * set a bit to 1 if the used function doesn't say it is used.
		 */
		WARN_ON_ONCE(target[i] & ~target_used[i]);

		/* Bits can change because they are not currently being used */
		unused_update[i] = (entry[i] & cur_used[i]) |
				   (target[i] & ~cur_used[i]);
		/*
		 * Each bit indicates that a used bit in a qword needs to be
		 * changed after unused_update is applied.
		 */
		if ((unused_update[i] & target_used[i]) !=
		    (target[i] & target_used[i]))
			used_qword_diff |= 1 << i;
	}
	return used_qword_diff;
}

static void arm_smmu_write_entry(const struct arm_smmu_entry_writer_ops *ops,
				 __le64 *entry, const __le64 *target)
{
	__le64 unused_update[NUM_ENTRY_QWORDS];
	u8 used_qword_diff;

	used_qword_diff = compute_qword_diff(ops, entry, target, unused_update);
	if (hweight8(used_qword_diff) > 1) {
		/*
		 * At least two qwords need their used bits to be changed. This
		 * requires a breaking update: zero the V bit, write all qwords
		 * but 0, then set qword 0.
		 */
		unused_update[0] = entry[0] & (~ops->v_bit);
		entry_set(ops, entry, unused_update, 0, 1);
		/* Qwords 1..num_entry_qwords-1, so the length is one less */
		entry_set(ops, entry, target, 1, ops->num_entry_qwords - 1);
		entry_set(ops, entry, target, 0, 1);
	} else if (hweight8(used_qword_diff) == 1) {
		/*
		 * Only one qword needs its used bits to be changed. This is a
		 * hitless update: update all bits the current STE is ignoring
		 * to their new values, then update a single qword to change
		 * the STE, and finally zero any bits that are now unused.
		 */
		entry_set(ops, entry, unused_update, 0, ops->num_entry_qwords);
		entry_set(ops, entry, target, ffs(used_qword_diff) - 1, 1);
		entry_set(ops, entry, target, 0, ops->num_entry_qwords);
	} else {
		/*
		 * If everything is working properly this shouldn't do anything
		 * as unused bits should always be 0 and thus
		 * can't change.
		 */
		WARN_ON_ONCE(entry_set(ops, entry, target, 0,
				       ops->num_entry_qwords));
	}
}
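
As a rough worked example of how the diff mask classifies updates, here is
a userspace model of the diff computation above. Everything about the entry
layout is an assumption for illustration only: a two-qword entry, a TOY_V
valid bit, a two-bit TOY_CFG field, a TOY_ATTR bit, and qword 1 being live
only when the config is TOY_CFG_PTR. None of this matches the real STE/CD
format, and uint64_t stands in for __le64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_QWORDS	2
#define TOY_V		(1ULL << 0)	/* hypothetical valid bit */
#define TOY_CFG		(3ULL << 1)	/* hypothetical 2-bit config field */
#define TOY_CFG_PTR	(1ULL << 1)	/* config value that makes qword 1 live */
#define TOY_CFG_OTHER	(2ULL << 1)	/* config value that ignores qword 1 */
#define TOY_ATTR	(1ULL << 3)	/* hypothetical attribute bit in qword 0 */

/* Toy stand-in for ops->get_used(): which bits would HW look at? */
static void toy_get_used(const uint64_t *entry, uint64_t *used)
{
	used[0] = TOY_V;
	used[1] = 0;
	if (!(entry[0] & TOY_V))
		return;		/* invalid entry: only V is inspected */
	used[0] |= TOY_CFG | TOY_ATTR;
	if ((entry[0] & TOY_CFG) == TOY_CFG_PTR)
		used[1] = ~0ULL;
}

/* Portable population count for the 8-bit diff mask */
static unsigned int toy_hweight8(uint8_t v)
{
	unsigned int n = 0;

	for (; v; v >>= 1)
		n += v & 1;
	return n;
}

/* Same computation as compute_qword_diff(), on plain uint64_t qwords */
static uint8_t toy_qword_diff(const uint64_t *entry, const uint64_t *target,
			      uint64_t *unused_update)
{
	uint64_t target_used[TOY_NUM_QWORDS];
	uint64_t cur_used[TOY_NUM_QWORDS];
	uint8_t diff = 0;
	unsigned int i;

	toy_get_used(entry, cur_used);
	toy_get_used(target, target_used);
	for (i = 0; i != TOY_NUM_QWORDS; i++) {
		/* The target must not set bits its own mask calls unused */
		assert(!(target[i] & ~target_used[i]));
		unused_update[i] = (entry[i] & cur_used[i]) |
				   (target[i] & ~cur_used[i]);
		if ((unused_update[i] & target_used[i]) !=
		    (target[i] & target_used[i]))
			diff |= 1u << i;
	}
	return diff;
}

/* At most one qword with a used-bit change means the update is hitless */
static bool toy_is_hitless(uint8_t diff)
{
	return toy_hweight8(diff) <= 1;
}
```

With this toy layout, going from an invalid entry to a valid TOY_CFG_PTR
entry only flags qword 0, because qword 1 can be filled in while it is
still ignored. Changing TOY_ATTR and qword 1 at the same time flags both
qwords and so needs the breaking sequence.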
I'm fine with this. If you think it is better, please sort out the rest
of the bits and send me a diff, and I'll integrate it.
Thanks,
Jason