From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: Will Deacon <will@kernel.org>, Joerg Roedel <joro@8bytes.org>,
Robin Murphy <robin.murphy@arm.com>,
Mostafa Saleh <smostafa@google.com>,
Nicolin Chen <nicolinc@nvidia.com>,
iommu@lists.linux.dev, Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v4 1/3] iommu/arm-smmu-v3: Introduce struct arm_smmu_event
Date: Mon, 4 Nov 2024 18:19:27 +0000
Message-ID: <ZykQL1UAxtWmC0P3@google.com>
In-Reply-To: <ZykPkEuv70SXKiT-@google.com>
On Mon, Nov 04, 2024 at 06:16:48PM +0000, Pranjal Shrivastava wrote:
> On Mon, Nov 04, 2024 at 09:23:31AM -0800, Daniel Mentz wrote:
> > On Thu, Oct 24, 2024 at 10:02 AM Pranjal Shrivastava <praan@google.com> wrote:
> > >
> > > On Thu, Oct 24, 2024 at 02:11:48PM +0100, Will Deacon wrote:
> > > > On Fri, Oct 18, 2024 at 06:00:20PM +0000, Pranjal Shrivastava wrote:
> > > > > +struct arm_smmu_event {
> > > > > + struct arm_smmu_device *smmu;
> > > > > + u8 id;
> > > > > + u8 class;
> > > > > + u16 stag;
> > > > > + u32 sid;
> > > > > + u32 ssid;
> > > > > + u64 iova;
> > > > > + u64 ipa;
> > > > > + u64 raw[EVTQ_ENT_DWORDS];
> > > > > + bool stall;
> > > > > + bool ssid_valid;
> > > > > + bool privileged;
> > > > > + bool instruction;
> > > > > + bool s2;
> > > > > + bool read;
> > > > > +};
> > > >
> > > > minor nit, but it might be worth seeing what pahole says about the
> > > > layout of this structure in case you've got a bunch of wasted padding
> > > > thanks to the mixed-size fields.
> > >
> > > I ran pahole on this; it looks like there's only one 4-byte hole, but
> > > the cacheline alignment is bad (for a 64-byte cacheline):
> > >
> > > struct arm_smmu_event {
> > >         const char *               master_name;    /*     0     8 */
> > >         struct arm_smmu_device *   smmu;           /*     8     8 */
> > >         struct device *            dev;            /*    16     8 */
> > >         u8                         id;             /*    24     1 */
> > >         u8                         class;          /*    25     1 */
> > >         u16                        stag;           /*    26     2 */
> > >         u32                        sid;            /*    28     4 */
> > >         u32                        ssid;           /*    32     4 */
> > >
> > >         /* XXX 4 bytes hole, try to pack */
> > >
> > >         u64                        iova;           /*    40     8 */
> > >         u64                        ipa;            /*    48     8 */
> > >         u64                        raw[4];         /*    56    32 */
> > >         /* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
> > >         bool                       stall;          /*    88     1 */
> > >         bool                       ssid_valid;     /*    89     1 */
> > >         bool                       privileged;     /*    90     1 */
> > >         bool                       instruction;    /*    91     1 */
> > >         bool                       s2;             /*    92     1 */
> > >         bool                       read;           /*    93     1 */
> > >         bool                       ttrnw;          /*    94     1 */
> > >         bool                       ttrnw_valid;    /*    95     1 */
> > >
> > >         /* size: 96, cachelines: 2, members: 19 */
> > >         /* sum members: 92, holes: 1, sum holes: 4 */
> > >         /* last cacheline: 32 bytes */
> > > };
> > >
> > > I don't think we can do much about the 4-byte hole as the members occupy
> > > 92 bytes only. I assume a single 4-byte hole shall be fine?
> > >
> > > However, for cacheline alignment we can move the 3 top pointer members,
> > > `master_name`, `smmu` & `dev`, which improves the cacheline alignment:
> >
> > Can you be more explicit about what is a good cacheline alignment? I'm
> > wondering if you're trying to ensure that raw[4] is contained within a
> > single cacheline as opposed to spanning two adjacent cachelines. I
>
> I'm using a tool `pahole` [1] as suggested by Will earlier.
> The tool prints information about the layout of structures, checking for
> memory wastage due to padding and if the struct layout causes
> mis-alignment with the caches.
>
> The tool printed, "cacheline 1 boundary (64 bytes) was 24 bytes ago",
> i.e. the 64-byte boundary falls 24 bytes before `stall` (offset 88), so
> with the current layout raw[] straddles two cachelines.
>
> > doubt that this is worth optimizing for. Also, I'm wondering if this
> > analysis assumes that the base address of a struct arm_smmu_event
> > instance is cacheline aligned, which I am not sure is the case. I
> > would solely optimize for size.
>
> You're right, the tool does assume that the struct begins at a cacheline
> boundary. I'm just sharing the analysis done by the tool to be able to
> finalize the layout of `struct arm_smmu_event`.
>
> IMO, if we don't have a strong opinion about this, there's no harm in
> trying to lay out the struct better even if it is a micro-optimization.
>
> I agree that size was the main concern here, but a structure with 92
> bytes of members and 64-bit fields is 8-byte aligned, so it will always
> carry 4 bytes of padding.
>
> If we remove the `smmu` & `master_name` fields and pack all bools into
> bitfields as discussed earlier, the members add up to 69 bytes
> (92 - 16 - 8 + 1), so we'd need 3 more bytes of padding. Hence, the
> size would be reduced from 96 bytes to 72 bytes. I guess that's fine?
>
> Thanks,
> Praan
Missed the link tag! Apologies.
[1] https://linux.die.net/man/1/pahole
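
For reference, the layout arithmetic above can be checked with a minimal
userspace sketch. Everything here is an assumption for illustration: an
LP64 target, a GCC/Clang-style compiler (for `uint8_t` bitfields), the
hypothetical names `event_current`, `event_packed`, `spans_cacheline()`
and `print_layouts()`, `void *` stand-ins for the kernel pointer types,
and one possible member ordering for the packed variant. It is not the
layout the series settled on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define EVTQ_ENT_DWORDS 4	/* matches raw[4] in the pahole output */
#define CACHELINE_BYTES 64	/* cacheline size assumed by the analysis */

/* Userspace mirror of the layout pahole reported (hypothetical name). */
struct event_current {
	const char *master_name;
	void *smmu;		/* stand-in for struct arm_smmu_device * */
	void *dev;		/* stand-in for struct device * */
	uint8_t id;
	uint8_t class;
	uint16_t stag;
	uint32_t sid;
	uint32_t ssid;		/* followed by a 4-byte hole */
	uint64_t iova;
	uint64_t ipa;
	uint64_t raw[EVTQ_ENT_DWORDS];
	bool stall, ssid_valid, privileged, instruction;
	bool s2, read, ttrnw, ttrnw_valid;
};

/* One possible packed variant: no smmu/master_name, bools as bitfields. */
struct event_packed {
	void *dev;
	uint64_t iova;
	uint64_t ipa;
	uint64_t raw[EVTQ_ENT_DWORDS];
	uint32_t sid;
	uint32_t ssid;
	uint16_t stag;
	uint8_t id;
	uint8_t class;
	uint8_t stall : 1, ssid_valid : 1, privileged : 1, instruction : 1;
	uint8_t s2 : 1, read : 1, ttrnw : 1, ttrnw_valid : 1;
};

/* Compile-time checks of the numbers quoted in the thread (LP64 only). */
_Static_assert(offsetof(struct event_current, iova) == 40, "hole after ssid");
_Static_assert(offsetof(struct event_current, raw) == 56, "raw[] offset");
_Static_assert(sizeof(struct event_current) == 96, "92 bytes + 4-byte hole");
_Static_assert(sizeof(struct event_packed) == 72, "69 bytes + 3 bytes pad");

/*
 * Returns 1 if a member at [off, off + size) crosses a cacheline boundary,
 * assuming the struct itself starts on a cacheline boundary.
 */
int spans_cacheline(size_t off, size_t size)
{
	return off / CACHELINE_BYTES != (off + size - 1) / CACHELINE_BYTES;
}

/* Prints the size of each variant and whether raw[] straddles a boundary. */
void print_layouts(void)
{
	printf("current: size %zu, raw[] spans boundary: %d\n",
	       sizeof(struct event_current),
	       spans_cacheline(offsetof(struct event_current, raw),
			       sizeof(((struct event_current *)0)->raw)));
	printf("packed:  size %zu, raw[] spans boundary: %d\n",
	       sizeof(struct event_packed),
	       spans_cacheline(offsetof(struct event_packed, raw),
			       sizeof(((struct event_packed *)0)->raw)));
}
```

Calling `print_layouts()` from a `main()` shows both sizes. The straddle
check only means something if instances are actually cacheline aligned,
which, as Daniel points out, may not be the case.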