From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: Will Deacon <will@kernel.org>, Joerg Roedel <joro@8bytes.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Mostafa Saleh <smostafa@google.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	iommu@lists.linux.dev, Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v4 1/3] iommu/arm-smmu-v3: Introduce struct arm_smmu_event
Date: Mon, 4 Nov 2024 18:16:48 +0000
Message-ID: <ZykPkEuv70SXKiT-@google.com>
In-Reply-To: <CAE2F3rAtg_dE2NpFM-xB8fs3W+_7tANnBdA00SVfKgk-y6X5Gg@mail.gmail.com>

On Mon, Nov 04, 2024 at 09:23:31AM -0800, Daniel Mentz wrote:
> On Thu, Oct 24, 2024 at 10:02 AM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Thu, Oct 24, 2024 at 02:11:48PM +0100, Will Deacon wrote:
> > > On Fri, Oct 18, 2024 at 06:00:20PM +0000, Pranjal Shrivastava wrote:
> > > > +struct arm_smmu_event {
> > > > + struct arm_smmu_device *smmu;
> > > > + u8 id;
> > > > + u8 class;
> > > > + u16 stag;
> > > > + u32 sid;
> > > > + u32 ssid;
> > > > + u64 iova;
> > > > + u64 ipa;
> > > > + u64 raw[EVTQ_ENT_DWORDS];
> > > > + bool stall;
> > > > + bool ssid_valid;
> > > > + bool privileged;
> > > > + bool instruction;
> > > > + bool s2;
> > > > + bool read;
> > > > +};
> > >
> > > minor nit, but it might be worth seeing what pahole says about the
> > > layout of this structure in case you've got a bunch of wasted padding
> > > thanks to the mixed-size fields.
> >
> > I ran pahole with this, looks like there's only one 4-byte hole but the
> > cacheline alignment is bad (for a 64-byte cacheline):
> >
> > struct arm_smmu_event {
> > const char * master_name; /* 0 8 */
> > struct arm_smmu_device * smmu; /* 8 8 */
> > struct device * dev; /* 16 8 */
> > u8 id; /* 24 1 */
> > u8 class; /* 25 1 */
> > u16 stag; /* 26 2 */
> > u32 sid; /* 28 4 */
> > u32 ssid; /* 32 4 */
> >
> > /* XXX 4 bytes hole, try to pack */
> >
> > u64 iova; /* 40 8 */
> > u64 ipa; /* 48 8 */
> > u64 raw[4]; /* 56 32 */
> > /* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
> > bool stall; /* 88 1 */
> > bool ssid_valid; /* 89 1 */
> > bool privileged; /* 90 1 */
> > bool instruction; /* 91 1 */
> > bool s2; /* 92 1 */
> > bool read; /* 93 1 */
> > bool ttrnw; /* 94 1 */
> > bool ttrnw_valid; /* 95 1 */
> >
> > /* size: 96, cachelines: 2, members: 19 */
> > /* sum members: 92, holes: 1, sum holes: 4 */
> > /* last cacheline: 32 bytes */
> > };
> >
> > I don't think we can do much about the 4-byte hole as the members occupy
> > 92 bytes only. I assume a single 4-byte hole shall be fine?
> >
> > However, for cacheline alignment we can move the 3 top pointer-members,
> > `master_name`, `smmu` & `dev`, which improves the cacheline alignment:
>
> Can you be more explicit about what is a good cacheline alignment? I'm
> wondering if you're trying to ensure that raw[4] is contained within a
> single cacheline as opposed to spanning two adjacent cachelines. I

I'm using `pahole` [1], as Will suggested earlier. The tool prints the
layout of a structure and reports memory wasted on padding as well as
members that cross cacheline boundaries.

The tool printed "cacheline 1 boundary (64 bytes) was 24 bytes ago",
which indicates that, with the current layout, `raw[4]` crosses the
64-byte boundary: it starts at offset 56 and ends at offset 88, i.e.
24 bytes past the boundary.
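
To make that message concrete, here is a rough userspace sketch (not
kernel code) that mirrors the layout pahole reported above and checks
whether `raw[]` touches two cachelines. It assumes an LP64 target and
64-byte cachelines; `struct event_v4`, `straddles()` and
`raw_straddles()` are names made up for this illustration, and the
pointer members are plain `void *` stand-ins for the kernel types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64 /* assumption: 64-byte cachelines, as in the pahole run */

/* Userspace stand-in for the quoted layout; pointer targets are opaque. */
struct event_v4 {
	const char *master_name;
	void *smmu; /* stand-in for struct arm_smmu_device * */
	void *dev;  /* stand-in for struct device * */
	uint8_t id;
	uint8_t class;
	uint16_t stag;
	uint32_t sid;
	uint32_t ssid; /* followed by a 4-byte hole on LP64 */
	uint64_t iova;
	uint64_t ipa;
	uint64_t raw[4];
	bool stall, ssid_valid, privileged, instruction;
	bool s2, read, ttrnw, ttrnw_valid;
};

/* These match the pahole output only on an LP64 target. */
static_assert(sizeof(struct event_v4) == 96, "expected 96 bytes on LP64");
static_assert(offsetof(struct event_v4, raw) == 56, "raw[] expected at 56");

/* True if [base + off, base + off + size) touches more than one cacheline. */
static inline bool straddles(size_t base, size_t off, size_t size)
{
	return (base + off) / CACHELINE != (base + off + size - 1) / CACHELINE;
}

/* Does raw[] cross a cacheline when the struct starts at address `base`? */
static inline bool raw_straddles(size_t base)
{
	return straddles(base, offsetof(struct event_v4, raw),
			 sizeof(((struct event_v4 *)0)->raw));
}
```

With a cacheline-aligned base, `raw_straddles(0)` is true, since `raw[]`
covers bytes 56..87. With a base that is 8 bytes past a cacheline
boundary, `raw_straddles(8)` is false, which shows how much the result
depends on where the instance actually lives.
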
> doubt that this is worth optimizing for. Also, I'm wondering if this
> analysis assumes that the base address of a struct arm_smmu_event
> instance is cacheline aligned, which I am not sure is the case. I
> would solely optimize for size.

You're right, the tool does assume that the struct begins at a cacheline
boundary. I'm just sharing the tool's analysis so that we can finalize
the layout of `struct arm_smmu_event`.

IMO, if we don't have a strong opinion about this, there's no harm in
trying to lay out the struct better, even if it is a micro-optimization.

I agree that size was the main concern here, but a structure with
92 bytes of members and 64-bit fields will always carry 4 bytes of
padding.

If we remove the `smmu` & `master_name` fields and pack all the bools
into bitfields as discussed earlier, the members add up to 69 bytes
(92 - 16 - 8 + 1), so we'd need 3 bytes of padding. The size would then
drop from 96 bytes (92 bytes of members plus the 4-byte hole) to
72 bytes. I guess that's fine?
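
As a sanity check of the arithmetic, a minimal userspace sketch of one
possible packed layout could look like the following. The member order
is only an assumption for illustration, `struct event_packed` and
`padded_size()` are hypothetical names, `void *dev` stands in for
`struct device *`, and the 72-byte result assumes an LP64 target with
GCC/Clang-style bitfield allocation (bitfield layout is
implementation-defined):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: `smmu` and `master_name` dropped, bools packed. */
struct event_packed {
	void *dev; /* stand-in for struct device * */
	uint8_t id;
	uint8_t class;
	uint16_t stag;
	uint32_t sid;
	uint32_t ssid;
	/* Eight 1-bit flags; GCC/Clang place them in a single byte. */
	bool stall : 1;
	bool ssid_valid : 1;
	bool privileged : 1;
	bool instruction : 1;
	bool s2 : 1;
	bool read : 1;
	bool ttrnw : 1;
	bool ttrnw_valid : 1;
	/* 3 bytes of padding here so that iova is 8-byte aligned */
	uint64_t iova;
	uint64_t ipa;
	uint64_t raw[4];
};

/* Round `bytes` up to the next multiple of `align`. */
static inline size_t padded_size(size_t bytes, size_t align)
{
	return (bytes + align - 1) / align * align;
}

/* 69 bytes of members (92 - 16 - 8 + 1), padded to 8-byte alignment. */
static_assert(sizeof(struct event_packed) == 72, "expected 72 bytes on LP64");
static_assert(offsetof(struct event_packed, iova) == 24, "iova expected at 24");
```

Here `padded_size(69, 8)` gives 72 and `padded_size(92, 8)` gives 96,
matching the packed and current sizes respectively.
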
Thanks,
Praan

Thread overview: 47+ messages
2024-10-18 18:00 [PATCH v4 0/3] iommu/arm-smmu-v3: Parse out event records Pranjal Shrivastava
2024-10-18 18:00 ` [PATCH v4 1/3] iommu/arm-smmu-v3: Introduce struct arm_smmu_event Pranjal Shrivastava
2024-10-19 1:56 ` Nicolin Chen
2024-10-21 6:20 ` Pranjal Shrivastava
2024-10-24 13:11 ` Will Deacon
2024-10-24 14:20 ` Pranjal Shrivastava
2024-10-24 17:02 ` Pranjal Shrivastava
2024-10-24 17:03 ` Jason Gunthorpe
2024-10-24 17:37 ` Pranjal Shrivastava
2024-10-28 12:23 ` Jason Gunthorpe
2024-10-28 14:46 ` Pranjal Shrivastava
2024-11-04 17:23 ` Daniel Mentz
2024-11-04 18:16 ` Pranjal Shrivastava [this message]
2024-11-04 18:19 ` Pranjal Shrivastava
2024-11-01 14:41 ` Robin Murphy
2024-11-01 15:08 ` Pranjal Shrivastava
2024-11-04 5:25 ` Daniel Mentz
2024-11-04 8:31 ` Pranjal Shrivastava
2024-11-07 0:10 ` Daniel Mentz
2024-11-07 14:33 ` Pranjal Shrivastava
2024-11-07 0:16 ` Daniel Mentz
2024-11-07 14:57 ` Pranjal Shrivastava
2024-11-11 22:20 ` Daniel Mentz
2024-11-12 0:52 ` Pranjal Shrivastava
2024-11-12 4:01 ` Daniel Mentz
2024-11-12 8:12 ` Pranjal Shrivastava
2024-10-18 18:00 ` [PATCH v4 2/3] iommu/arm-smmu-v3: Log better event records Pranjal Shrivastava
2024-10-19 2:06 ` Nicolin Chen
2024-10-19 4:51 ` Nicolin Chen
2024-10-21 6:29 ` Pranjal Shrivastava
2024-10-21 6:26 ` Pranjal Shrivastava
2024-10-21 22:53 ` Nicolin Chen
2024-10-24 13:15 ` Will Deacon
2024-10-24 14:14 ` Pranjal Shrivastava
2024-10-29 18:53 ` Will Deacon
2024-10-29 19:59 ` Pranjal Shrivastava
2024-10-24 19:00 ` Nicolin Chen
2024-10-29 18:49 ` Will Deacon
2024-11-01 15:05 ` Robin Murphy
2024-11-01 16:06 ` Pranjal Shrivastava
2024-11-04 6:36 ` Daniel Mentz
2024-11-04 10:51 ` Pranjal Shrivastava
2024-10-18 18:00 ` [PATCH v4 3/3] iommu/arm-smmu-v3: Avoid redundant master lookup in events Pranjal Shrivastava
2024-10-19 2:08 ` Nicolin Chen
2024-10-19 1:45 ` [PATCH v4 0/3] iommu/arm-smmu-v3: Parse out event records Nicolin Chen
2024-10-21 6:33 ` Pranjal Shrivastava
2024-10-21 22:51 ` Nicolin Chen