Re: [PATCH v4 1/3] iommu/arm-smmu-v3: Introduce struct arm_smmu_event

Linux IOMMU Development

From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: Will Deacon <will@kernel.org>, Joerg Roedel <joro@8bytes.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Mostafa Saleh <smostafa@google.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	iommu@lists.linux.dev, Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v4 1/3] iommu/arm-smmu-v3: Introduce struct arm_smmu_event
Date: Mon, 4 Nov 2024 18:16:48 +0000
Message-ID: <ZykPkEuv70SXKiT-@google.com>
In-Reply-To: <CAE2F3rAtg_dE2NpFM-xB8fs3W+_7tANnBdA00SVfKgk-y6X5Gg@mail.gmail.com>

On Mon, Nov 04, 2024 at 09:23:31AM -0800, Daniel Mentz wrote:
> On Thu, Oct 24, 2024 at 10:02 AM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Thu, Oct 24, 2024 at 02:11:48PM +0100, Will Deacon wrote:
> > > On Fri, Oct 18, 2024 at 06:00:20PM +0000, Pranjal Shrivastava wrote:
> > > > +struct arm_smmu_event {
> > > > +   struct arm_smmu_device          *smmu;
> > > > +   u8                              id;
> > > > +   u8                              class;
> > > > +   u16                             stag;
> > > > +   u32                             sid;
> > > > +   u32                             ssid;
> > > > +   u64                             iova;
> > > > +   u64                             ipa;
> > > > +   u64                             raw[EVTQ_ENT_DWORDS];
> > > > +   bool                            stall;
> > > > +   bool                            ssid_valid;
> > > > +   bool                            privileged;
> > > > +   bool                            instruction;
> > > > +   bool                            s2;
> > > > +   bool                            read;
> > > > +};
> > >
> > > minor nit, but it might be worth seeing what pahole says about the
> > > layout of this structure in case you've got a bunch of wasted padding
> > > thanks to the mixed-size fields.
> >
> > I ran pahole with this, looks like there's only one 4-byte hole but the
> > cacheline alignment is bad (for a 64-byte cacheline):
> >
> > struct arm_smmu_event {
> >         const char  *              master_name;          /*     0     8 */
> >         struct arm_smmu_device *   smmu;                 /*     8     8 */
> >         struct device *            dev;                  /*    16     8 */
> >         u8                         id;                   /*    24     1 */
> >         u8                         class;                /*    25     1 */
> >         u16                        stag;                 /*    26     2 */
> >         u32                        sid;                  /*    28     4 */
> >         u32                        ssid;                 /*    32     4 */
> >
> >         /* XXX 4 bytes hole, try to pack */
> >
> >         u64                        iova;                 /*    40     8 */
> >         u64                        ipa;                  /*    48     8 */
> >         u64                        raw[4];               /*    56    32 */
> >         /* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
> >         bool                       stall;                /*    88     1 */
> >         bool                       ssid_valid;           /*    89     1 */
> >         bool                       privileged;           /*    90     1 */
> >         bool                       instruction;          /*    91     1 */
> >         bool                       s2;                   /*    92     1 */
> >         bool                       read;                 /*    93     1 */
> >         bool                       ttrnw;                /*    94     1 */
> >         bool                       ttrnw_valid;          /*    95     1 */
> >
> >         /* size: 96, cachelines: 2, members: 19 */
> >         /* sum members: 92, holes: 1, sum holes: 4 */
> >         /* last cacheline: 32 bytes */
> > };
> >
> > I don't think we can do much about the 4-byte hole as the members occupy
> > 92 bytes only. I assume a single 4-byte hole shall be fine?
> >
> > However, for cacheline alignment we can move the 3 top pointer-members,
> > `master_name`, `smmu` & `dev` which improves the cacheline alignment:
> 
> Can you be more explicit about what is a good cacheline alignment? I'm
> wondering if you're trying to ensure that raw[4] is contained within a
> single cacheline as opposed to spanning two adjacent cachelines. I

I'm using `pahole` [1], as Will suggested earlier. The tool prints the
layout of a structure, flagging holes left by padding and members that
cross a cacheline boundary.

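The tool printed "cacheline 1 boundary (64 bytes) was 24 bytes ago",
which means that with the current layout `raw[4]` (offset 56, 32 bytes)
straddles the 64-byte boundary: 8 of its bytes sit in the first
cacheline and the remaining 24 in the second.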
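
For reference, the offsets behind that message can be reproduced in
userspace. This is only a rough sketch, assuming an LP64 target (8-byte
pointers) and 64-byte cachelines; `struct event_layout_v4` and
`bytes_past_boundary()` are hypothetical names used for illustration,
and the pointers are replaced with opaque `void *`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES	64	/* assumption: same cacheline size as the pahole run */
#define EVTQ_ENT_DWORDS	4	/* matches raw[4] in the pahole output */

/* Userspace mirror of the layout pahole printed (hypothetical name) */
struct event_layout_v4 {
	const char	*master_name;
	void		*smmu;		/* stands in for struct arm_smmu_device * */
	void		*dev;		/* stands in for struct device * */
	uint8_t		id;
	uint8_t		class;
	uint16_t	stag;
	uint32_t	sid;
	uint32_t	ssid;
	/* 4-byte hole here: iova needs 8-byte alignment */
	uint64_t	iova;
	uint64_t	ipa;
	uint64_t	raw[EVTQ_ENT_DWORDS];
	bool		stall;
	bool		ssid_valid;
	bool		privileged;
	bool		instruction;
	bool		s2;
	bool		read;
	bool		ttrnw;
	bool		ttrnw_valid;
};

/* 92 bytes of members plus the 4-byte hole (LP64 only) */
static_assert(sizeof(struct event_layout_v4) == 96, "unexpected size");

/*
 * Number of bytes of a member that land after the first cacheline
 * boundary it crosses; 0 if the member fits in a single cacheline.
 * Assumes the struct itself starts on a cacheline boundary.
 */
static size_t bytes_past_boundary(size_t off, size_t size)
{
	size_t first = off / CACHELINE_BYTES;
	size_t last = (off + size - 1) / CACHELINE_BYTES;

	if (first == last)
		return 0;
	return off + size - (first + 1) * CACHELINE_BYTES;
}
```

Calling `bytes_past_boundary(offsetof(struct event_layout_v4, raw),
sizeof(((struct event_layout_v4 *)0)->raw))` gives the number of bytes
of `raw[]` that fall into the second cacheline.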

> doubt that this is worth optimizing for. Also, I'm wondering if this
> analysis assumes that the base address of a struct arm_smmu_event
> instance is cacheline aligned, which I am not sure is the case. I
> would solely optimize for size.

You're right, the tool does assume that the struct begins at a cacheline
boundary. I'm just sharing the analysis done by the tool to be able to
finalize the layout of `struct arm_smmu_event`.

IMO, if we don't have a strong opinion about this, there's no harm in
trying to lay out the struct better even if it is a micro-optimization.

I agree that size was the main concern here, but with 92 bytes of
members and 64-bit fields (8-byte alignment), the struct is rounded up
to 96 bytes, so we'll always have 4 bytes of padding somewhere.

If we remove the `smmu` & `master_name` fields and pack all the bools
into bitfields as discussed earlier, the members add up to 69 bytes
(92 - 16 - 8 + 1), so we'd need 3 more bytes of padding. Hence, the
size would drop from 96 bytes to 72 bytes. I guess that's fine?
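
The arithmetic above can be checked with a quick userspace sketch. This
is not the final layout, just one possible ordering under the same
assumptions (LP64 target, flags packed into a single byte of `bool`
bitfields); `struct event_layout_trimmed` is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVTQ_ENT_DWORDS	4	/* matches raw[4] in the pahole output */

/* Hypothetical trimmed layout: smmu and master_name dropped, bools packed */
struct event_layout_trimmed {
	void		*dev;		/* stands in for struct device * */
	uint8_t		id;
	uint8_t		class;
	uint16_t	stag;
	uint32_t	sid;
	uint32_t	ssid;
	/* eight one-bit flags share a single byte */
	bool		stall : 1;
	bool		ssid_valid : 1;
	bool		privileged : 1;
	bool		instruction : 1;
	bool		s2 : 1;
	bool		read : 1;
	bool		ttrnw : 1;
	bool		ttrnw_valid : 1;
	/* 3 bytes of padding here: iova needs 8-byte alignment */
	uint64_t	iova;
	uint64_t	ipa;
	uint64_t	raw[EVTQ_ENT_DWORDS];
};

/* 69 bytes of members + 3 bytes of padding (LP64 only) */
static_assert(sizeof(struct event_layout_trimmed) == 72, "unexpected size");
```

The `static_assert` fails at build time if the compiler lays the struct
out differently, which is a cheap way to catch a layout regression
without running pahole.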

Thanks,
Praan
