Re: [PATCH v4 1/3] iommu/arm-smmu-v3: Introduce struct arm_smmu_event

From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: Will Deacon <will@kernel.org>, Joerg Roedel <joro@8bytes.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Mostafa Saleh <smostafa@google.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	iommu@lists.linux.dev, Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v4 1/3] iommu/arm-smmu-v3: Introduce struct arm_smmu_event
Date: Mon, 4 Nov 2024 18:16:48 +0000
Message-ID: <ZykPkEuv70SXKiT-@google.com>
In-Reply-To: <CAE2F3rAtg_dE2NpFM-xB8fs3W+_7tANnBdA00SVfKgk-y6X5Gg@mail.gmail.com>

On Mon, Nov 04, 2024 at 09:23:31AM -0800, Daniel Mentz wrote:
> On Thu, Oct 24, 2024 at 10:02 AM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Thu, Oct 24, 2024 at 02:11:48PM +0100, Will Deacon wrote:
> > > On Fri, Oct 18, 2024 at 06:00:20PM +0000, Pranjal Shrivastava wrote:
> > > > +struct arm_smmu_event {
> > > > +   struct arm_smmu_device          *smmu;
> > > > +   u8                              id;
> > > > +   u8                              class;
> > > > +   u16                             stag;
> > > > +   u32                             sid;
> > > > +   u32                             ssid;
> > > > +   u64                             iova;
> > > > +   u64                             ipa;
> > > > +   u64                             raw[EVTQ_ENT_DWORDS];
> > > > +   bool                            stall;
> > > > +   bool                            ssid_valid;
> > > > +   bool                            privileged;
> > > > +   bool                            instruction;
> > > > +   bool                            s2;
> > > > +   bool                            read;
> > > > +};
> > >
> > > minor nit, but it might be worth seeing what pahole says about the
> > > layout of this structure in case you've got a bunch of wasted padding
> > > thanks to the mixed-size fields.
> >
> > I ran pahole with this, looks like there's only one 4-byte hole but the
> > cacheline alignment is bad (for a 64-byte cacheline):
> >
> > struct arm_smmu_event {
> >         const char  *              master_name;          /*     0     8 */
> >         struct arm_smmu_device *   smmu;                 /*     8     8 */
> >         struct device *            dev;                  /*    16     8 */
> >         u8                         id;                   /*    24     1 */
> >         u8                         class;                /*    25     1 */
> >         u16                        stag;                 /*    26     2 */
> >         u32                        sid;                  /*    28     4 */
> >         u32                        ssid;                 /*    32     4 */
> >
> >         /* XXX 4 bytes hole, try to pack */
> >
> >         u64                        iova;                 /*    40     8 */
> >         u64                        ipa;                  /*    48     8 */
> >         u64                        raw[4];               /*    56    32 */
> >         /* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
> >         bool                       stall;                /*    88     1 */
> >         bool                       ssid_valid;           /*    89     1 */
> >         bool                       privileged;           /*    90     1 */
> >         bool                       instruction;          /*    91     1 */
> >         bool                       s2;                   /*    92     1 */
> >         bool                       read;                 /*    93     1 */
> >         bool                       ttrnw;                /*    94     1 */
> >         bool                       ttrnw_valid;          /*    95     1 */
> >
> >         /* size: 96, cachelines: 2, members: 19 */
> >         /* sum members: 92, holes: 1, sum holes: 4 */
> >         /* last cacheline: 32 bytes */
> > };
> >
> > I don't think we can do much about the 4-byte hole as the members occupy
> > 92 bytes only. I assume a single 4-byte hole shall be fine?
> >
> > However, for cacheline alignment we can move the 3 top pointer-members,
> > `master_name`, `smmu` & `dev`, which improves the cacheline alignment:
> 
> Can you be more explicit about what is a good cacheline alignment? I'm
> wondering if you're trying to ensure that raw[4] is contained within a
> single cacheline as opposed to spanning two adjacent cachelines. I

I'm using a tool, `pahole` [1], as suggested by Will earlier.
It prints the layout of a structure, flagging memory wasted on padding
and members that end up misaligned with respect to cachelines.

The tool printed "cacheline 1 boundary (64 bytes) was 24 bytes ago",
hinting that with the current layout `raw[4]` straddles a cacheline
boundary: it starts at offset 56, so 24 of its 32 bytes spill into the
second cacheline.
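
To make the pahole comment concrete, here is a minimal userspace sketch
(not kernel code) that mirrors the layout quoted above. It assumes an
LP64 target, a 64-byte cacheline and, as pahole does, a struct that
starts on a cacheline boundary. The helper names `straddles_cacheline()`
and `bytes_past_boundary()` are hypothetical, and the two pointer types
are opaque stand-ins for the kernel ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE	64	/* assumed cacheline size, as in the pahole run */
#define EVTQ_ENT_DWORDS	4	/* matches raw[4] in the pahole output */

struct arm_smmu_device;		/* opaque stand-ins for the kernel types */
struct device;

/* Same member order as the pahole output quoted above. */
struct arm_smmu_event {
	const char		*master_name;
	struct arm_smmu_device	*smmu;
	struct device		*dev;
	uint8_t			id;
	uint8_t			class;
	uint16_t		stag;
	uint32_t		sid;
	uint32_t		ssid;	/* followed by a 4-byte hole */
	uint64_t		iova;
	uint64_t		ipa;
	uint64_t		raw[EVTQ_ENT_DWORDS];
	bool			stall;
	bool			ssid_valid;
	bool			privileged;
	bool			instruction;
	bool			s2;
	bool			read;
	bool			ttrnw;
	bool			ttrnw_valid;
};

/* These only hold on LP64 targets (8-byte pointers, 8-byte aligned u64). */
static_assert(sizeof(struct arm_smmu_event) == 96, "expected 96 bytes");
static_assert(offsetof(struct arm_smmu_event, iova) == 40, "hole after ssid");
static_assert(offsetof(struct arm_smmu_event, raw) == 56, "raw at offset 56");

/* True if [off, off + len) crosses a cacheline boundary (base assumed aligned). */
bool straddles_cacheline(size_t off, size_t len)
{
	return off / CACHELINE != (off + len - 1) / CACHELINE;
}

/* Bytes of [off, off + len) that lie past the first boundary it crosses. */
size_t bytes_past_boundary(size_t off, size_t len)
{
	if (!straddles_cacheline(off, len))
		return 0;
	return (off + len) - (off / CACHELINE + 1) * CACHELINE;
}
```

With `raw` at offset 56 and 32 bytes long, `bytes_past_boundary(56, 32)`
evaluates to 24, which is the "24 bytes ago" in the pahole comment.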

> doubt that this is worth optimizing for. Also, I'm wondering if this
> analysis assumes that the base address of a struct arm_smmu_event
> instance is cacheline aligned, which I am not sure is the case. I
> would solely optimize for size.

You're right, the tool does assume that the struct begins at a cacheline
boundary. I'm just sharing the analysis done by the tool to be able to
finalize the layout of `struct arm_smmu_event`.

IMO, if we don't have a strong opinion about this, there's no harm in
trying to lay out the struct better even if it is a micro-optimization.

I agree that size was the main concern here, but if we have 92 bytes of
members that include 64-bit fields, we'll always have 4 bytes of padding
somewhere, since the struct size is rounded up to a multiple of 8.

If we remove the `smmu` & `master_name` fields and pack all bools in
bitfields as discussed earlier, the members would add up to 69 bytes
(92 - 16 - 8 + 1), so we'd need 3 bytes of padding. Hence, the size
would be reduced from 96 bytes to 72 bytes. I guess that's fine?
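
The arithmetic above can be checked with a small userspace sketch. This
is only an illustration under assumptions, not the proposed patch: the
struct name `arm_smmu_event_packed`, the member order and the helper
`round_up_size()` are hypothetical, `struct device` is an opaque
stand-in, and the sizes assume an LP64 target with GCC or Clang packing
the eight `bool` bitfields into a single byte:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVTQ_ENT_DWORDS	4	/* matches raw[4] in the pahole output */

struct device;			/* opaque stand-in for the kernel type */

/* Hypothetical layout: `smmu` and `master_name` dropped, bools as bitfields. */
struct arm_smmu_event_packed {
	struct device	*dev;
	uint8_t		id;
	uint8_t		class;
	uint16_t	stag;
	uint32_t	sid;
	uint32_t	ssid;
	bool		stall : 1;
	bool		ssid_valid : 1;
	bool		privileged : 1;
	bool		instruction : 1;
	bool		s2 : 1;
	bool		read : 1;
	bool		ttrnw : 1;
	bool		ttrnw_valid : 1;
	/* 3 bytes of padding here so that iova is 8-byte aligned */
	uint64_t	iova;
	uint64_t	ipa;
	uint64_t	raw[EVTQ_ENT_DWORDS];
};

/* These only hold under the assumptions stated above. */
static_assert(offsetof(struct arm_smmu_event_packed, iova) == 24,
	      "1 byte of flags plus 3 bytes of padding before iova");
static_assert(sizeof(struct arm_smmu_event_packed) == 72, "expected 72 bytes");

/* Round n up to the next multiple of align (align must be non-zero). */
size_t round_up_size(size_t n, size_t align)
{
	return (n + align - 1) / align * align;
}
```

Here `round_up_size(92 - 16 - 8 + 1, 8)` gives 72, and
`round_up_size(92, 8)` gives the 96 bytes that pahole reports for the
current layout.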

Thanks,
Praan
