* [PATCH net-next 1/2] mlx5/core: avoid memory barrier in eq_update_ci()
@ 2024-11-01 3:46 Caleb Sander Mateos
2024-11-01 3:46 ` [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci() Caleb Sander Mateos
0 siblings, 1 reply; 19+ messages in thread
From: Caleb Sander Mateos @ 2024-11-01 3:46 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Caleb Sander Mateos, netdev, linux-rdma, linux-kernel
The memory barrier in eq_update_ci() after the doorbell write is a
significant hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see
3% of CPU time spent on the mfence. As explained in [1], this barrier is
only needed to preserve the ordering of writes to the doorbell register.
Use writel() instead of __raw_writel() for the doorbell write to provide
this ordering without the need for a full memory barrier.
memory-barriers.txt guarantees MMIO writes using writel() appear to the
device in the same order they were made. On strongly-ordered
architectures like x86, writel() adds no overhead to __raw_writel();
both translate into a single store instruction. Removing the mb() avoids
the costly mfence instruction.
[1]: https://lore.kernel.org/netdev/CALzJLG8af0SMfA1C8U8r_Fddb_ZQhvEZd6=2a97dOoBcgLA0xg@mail.gmail.com/
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
index 4b7f7131c560..f03736711343 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
@@ -68,13 +68,12 @@ static inline struct mlx5_eqe *next_eqe_sw(struct mlx5_eq *eq)
static inline void eq_update_ci(struct mlx5_eq *eq, int arm)
{
__be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
u32 val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
- __raw_writel((__force u32)cpu_to_be32(val), addr);
- /* We still want ordering, just not swabbing, so add a barrier */
- mb();
+ /* Ensure ordering of consecutive doorbell writes */
+ writel((__force u32)cpu_to_be32(val), addr);
}
int mlx5_eq_table_init(struct mlx5_core_dev *dev);
void mlx5_eq_table_cleanup(struct mlx5_core_dev *dev);
int mlx5_eq_table_create(struct mlx5_core_dev *dev);
--
2.45.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-01 3:46 [PATCH net-next 1/2] mlx5/core: avoid memory barrier in eq_update_ci() Caleb Sander Mateos
@ 2024-11-01 3:46 ` Caleb Sander Mateos
2024-11-03 3:55 ` Parav Pandit
0 siblings, 1 reply; 19+ messages in thread
From: Caleb Sander Mateos @ 2024-11-01 3:46 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Caleb Sander Mateos, netdev, linux-rdma, linux-kernel
The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci(). The
only additional work done by mlx5_eq_update_ci() is to increment
eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to avoid
the duplication.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 859dcf09b770..078029c81935 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct mlx5_eq *eq, u32 cc)
}
EXPORT_SYMBOL(mlx5_eq_get_eqe);
void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm)
{
- __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
- u32 val;
-
eq->cons_index += cc;
- val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
-
- __raw_writel((__force u32)cpu_to_be32(val), addr);
- /* We still want ordering, just not swabbing, so add a barrier */
- wmb();
+ eq_update_ci(eq, arm);
}
EXPORT_SYMBOL(mlx5_eq_update_ci);
static void comp_irq_release_pci(struct mlx5_core_dev *dev, u16 vecidx)
{
--
2.45.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* RE: [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-01 3:46 ` [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci() Caleb Sander Mateos
@ 2024-11-03 3:55 ` Parav Pandit
2024-11-03 22:18 ` Caleb Sander
0 siblings, 1 reply; 19+ messages in thread
From: Parav Pandit @ 2024-11-03 3:55 UTC (permalink / raw)
To: Caleb Sander Mateos, Saeed Mahameed, Leon Romanovsky,
Tariq Toukan, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
> From: Caleb Sander Mateos <csander@purestorage.com>
> Sent: Friday, November 1, 2024 9:17 AM
>
> The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci(). The only
> additional work done by mlx5_eq_update_ci() is to increment
> eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to avoid
> the duplication.
>
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 859dcf09b770..078029c81935 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct
> mlx5_eq *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
>
> void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
> - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
> - u32 val;
> -
> eq->cons_index += cc;
> - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
> -
> - __raw_writel((__force u32)cpu_to_be32(val), addr);
> - /* We still want ordering, just not swabbing, so add a barrier */
> - wmb();
> + eq_update_ci(eq, arm);
Long ago I had similar rework patches to get rid of __raw_writel(), which I never got chance to push,
Eq_update_ci() is using full memory barrier.
While mlx5_eq_update_ci() is using only write memory barrier.
So it is not 100% deduplication by this patch.
Please have a pre-patch improving eq_update_ci() to use wmb().
Followed by this patch.
> }
> EXPORT_SYMBOL(mlx5_eq_update_ci);
>
> static void comp_irq_release_pci(struct mlx5_core_dev *dev, u16 vecidx) {
> --
> 2.45.2
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-03 3:55 ` Parav Pandit
@ 2024-11-03 22:18 ` Caleb Sander
2024-11-05 5:22 ` Parav Pandit
0 siblings, 1 reply; 19+ messages in thread
From: Caleb Sander @ 2024-11-03 22:18 UTC (permalink / raw)
To: Parav Pandit
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
On Sat, Nov 2, 2024 at 8:55 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Caleb Sander Mateos <csander@purestorage.com>
> > Sent: Friday, November 1, 2024 9:17 AM
> >
> > The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci(). The only
> > additional work done by mlx5_eq_update_ci() is to increment
> > eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to avoid
> > the duplication.
> >
> > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > ---
> > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
> > 1 file changed, 1 insertion(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > index 859dcf09b770..078029c81935 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct
> > mlx5_eq *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
> >
> > void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
> > - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
> > - u32 val;
> > -
> > eq->cons_index += cc;
> > - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
> > -
> > - __raw_writel((__force u32)cpu_to_be32(val), addr);
> > - /* We still want ordering, just not swabbing, so add a barrier */
> > - wmb();
> > + eq_update_ci(eq, arm);
> Long ago I had similar rework patches to get rid of __raw_writel(), which I never got chance to push,
>
> Eq_update_ci() is using full memory barrier.
> While mlx5_eq_update_ci() is using only write memory barrier.
>
> So it is not 100% deduplication by this patch.
> Please have a pre-patch improving eq_update_ci() to use wmb().
> Followed by this patch.
Right, patch 1/2 in this series is changing eq_update_ci() to use
writel() instead of __raw_writel() and avoid the memory barrier:
https://lore.kernel.org/lkml/20241101034647.51590-1-csander@purestorage.com/
Are you suggesting something different? If so, it would be great if
you could clarify what you mean.
Best,
Caleb
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-03 22:18 ` Caleb Sander
@ 2024-11-05 5:22 ` Parav Pandit
2024-11-05 16:06 ` Caleb Sander
0 siblings, 1 reply; 19+ messages in thread
From: Parav Pandit @ 2024-11-05 5:22 UTC (permalink / raw)
To: Caleb Sander
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
> From: Caleb Sander <csander@purestorage.com>
> Sent: Monday, November 4, 2024 3:49 AM
>
> On Sat, Nov 2, 2024 at 8:55 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Caleb Sander Mateos <csander@purestorage.com>
> > > Sent: Friday, November 1, 2024 9:17 AM
> > >
> > > The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci().
> > > The only additional work done by mlx5_eq_update_ci() is to increment
> > > eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to
> > > eq->avoid
> > > the duplication.
> > >
> > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > > ---
> > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
> > > 1 file changed, 1 insertion(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > index 859dcf09b770..078029c81935 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct
> > > mlx5_eq *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
> > >
> > > void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
> > > - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
> > > - u32 val;
> > > -
> > > eq->cons_index += cc;
> > > - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
> > > -
> > > - __raw_writel((__force u32)cpu_to_be32(val), addr);
> > > - /* We still want ordering, just not swabbing, so add a barrier */
> > > - wmb();
> > > + eq_update_ci(eq, arm);
> > Long ago I had similar rework patches to get rid of __raw_writel(),
> > which I never got chance to push,
> >
> > Eq_update_ci() is using full memory barrier.
> > While mlx5_eq_update_ci() is using only write memory barrier.
> >
> > So it is not 100% deduplication by this patch.
> > Please have a pre-patch improving eq_update_ci() to use wmb().
> > Followed by this patch.
>
> Right, patch 1/2 in this series is changing eq_update_ci() to use
> writel() instead of __raw_writel() and avoid the memory barrier:
> https://lore.kernel.org/lkml/20241101034647.51590-1-
> csander@purestorage.com/
This patch has two bugs.
1. writel() writes the MMIO space in LE order. EQ updates are in BE order.
So this will break on ppc64 BE.
2. writel() issues the barrier BEFORE the raw_writel().
As opposed to that eq update needs to have a barrier AFTER the writel().
Likely to synchronize with other CQ related pointers update.
> Are you suggesting something different? If so, it would be great if you could
> clarify what you mean.
>
So I was suggesting to keep __raw_writel() as is and replace mb() with wmb().
> Best,
> Caleb
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-05 5:22 ` Parav Pandit
@ 2024-11-05 16:06 ` Caleb Sander
2024-11-06 5:44 ` Parav Pandit
0 siblings, 1 reply; 19+ messages in thread
From: Caleb Sander @ 2024-11-05 16:06 UTC (permalink / raw)
To: Parav Pandit
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
On Mon, Nov 4, 2024 at 9:22 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Caleb Sander <csander@purestorage.com>
> > Sent: Monday, November 4, 2024 3:49 AM
> >
> > On Sat, Nov 2, 2024 at 8:55 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Caleb Sander Mateos <csander@purestorage.com>
> > > > Sent: Friday, November 1, 2024 9:17 AM
> > > >
> > > > The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci().
> > > > The only additional work done by mlx5_eq_update_ci() is to increment
> > > > eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to
> > > > eq->avoid
> > > > the duplication.
> > > >
> > > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > > > ---
> > > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
> > > > 1 file changed, 1 insertion(+), 8 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > index 859dcf09b770..078029c81935 100644
> > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct
> > > > mlx5_eq *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
> > > >
> > > > void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
> > > > - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
> > > > - u32 val;
> > > > -
> > > > eq->cons_index += cc;
> > > > - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
> > > > -
> > > > - __raw_writel((__force u32)cpu_to_be32(val), addr);
> > > > - /* We still want ordering, just not swabbing, so add a barrier */
> > > > - wmb();
> > > > + eq_update_ci(eq, arm);
> > > Long ago I had similar rework patches to get rid of __raw_writel(),
> > > which I never got chance to push,
> > >
> > > Eq_update_ci() is using full memory barrier.
> > > While mlx5_eq_update_ci() is using only write memory barrier.
> > >
> > > So it is not 100% deduplication by this patch.
> > > Please have a pre-patch improving eq_update_ci() to use wmb().
> > > Followed by this patch.
> >
> > Right, patch 1/2 in this series is changing eq_update_ci() to use
> > writel() instead of __raw_writel() and avoid the memory barrier:
> > https://lore.kernel.org/lkml/20241101034647.51590-1-
> > csander@purestorage.com/
> This patch has two bugs.
> 1. writel() writes the MMIO space in LE order. EQ updates are in BE order.
> So this will break on ppc64 BE.
Okay, so this should be writel(cpu_to_le32(val), addr)?
>
> 2. writel() issues the barrier BEFORE the raw_writel().
> As opposed to that eq update needs to have a barrier AFTER the writel().
> Likely to synchronize with other CQ related pointers update.
I was referencing this prior discussion about the memory barrier:
https://lore.kernel.org/netdev/CALzJLG8af0SMfA1C8U8r_Fddb_ZQhvEZd6=2a97dOoBcgLA0xg@mail.gmail.com/
From Saeed's message, it sounds like the memory barrier is only used
to ensure the ordering of writes to the doorbell register, not the
ordering of the doorbell write relative to any other writes. If some
other write needs to be ordered after the doorbell write, please
explain what it is. As Gal Pressman pointed out, a wmb() at the end of
a function doesn't make much sense, as there are no further writes in
the function to order. If the doorbell write needs to be ordered
before some other write in a caller function, the memory barrier
should probably move to the caller.
>
> > Are you suggesting something different? If so, it would be great if you could
> > clarify what you mean.
> >
> So I was suggesting to keep __raw_writel() as is and replace mb() with wmb().
wmb() would certainly be cheaper than mb(), but I would like to
understand the requirement for the barrier in the first place. The
fence instruction is very expensive.
Thanks,
Caleb
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-05 16:06 ` Caleb Sander
@ 2024-11-06 5:44 ` Parav Pandit
2024-11-06 23:44 ` Caleb Sander
0 siblings, 1 reply; 19+ messages in thread
From: Parav Pandit @ 2024-11-06 5:44 UTC (permalink / raw)
To: Caleb Sander
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
> From: Caleb Sander <csander@purestorage.com>
> Sent: Tuesday, November 5, 2024 9:36 PM
>
> On Mon, Nov 4, 2024 at 9:22 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Caleb Sander <csander@purestorage.com>
> > > Sent: Monday, November 4, 2024 3:49 AM
> > >
> > > On Sat, Nov 2, 2024 at 8:55 PM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > >
> > > >
> > > > > From: Caleb Sander Mateos <csander@purestorage.com>
> > > > > Sent: Friday, November 1, 2024 9:17 AM
> > > > >
> > > > > The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci().
> > > > > The only additional work done by mlx5_eq_update_ci() is to
> > > > > increment
> > > > > eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to
> > > > > eq->avoid
> > > > > the duplication.
> > > > >
> > > > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > > > > ---
> > > > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
> > > > > 1 file changed, 1 insertion(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > > index 859dcf09b770..078029c81935 100644
> > > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > > @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct
> > > > > mlx5_eq *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
> > > > >
> > > > > void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
> > > > > - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
> > > > > - u32 val;
> > > > > -
> > > > > eq->cons_index += cc;
> > > > > - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
> > > > > -
> > > > > - __raw_writel((__force u32)cpu_to_be32(val), addr);
> > > > > - /* We still want ordering, just not swabbing, so add a barrier */
> > > > > - wmb();
> > > > > + eq_update_ci(eq, arm);
> > > > Long ago I had similar rework patches to get rid of
> > > > __raw_writel(), which I never got chance to push,
> > > >
> > > > Eq_update_ci() is using full memory barrier.
> > > > While mlx5_eq_update_ci() is using only write memory barrier.
> > > >
> > > > So it is not 100% deduplication by this patch.
> > > > Please have a pre-patch improving eq_update_ci() to use wmb().
> > > > Followed by this patch.
> > >
> > > Right, patch 1/2 in this series is changing eq_update_ci() to use
> > > writel() instead of __raw_writel() and avoid the memory barrier:
> > > https://lore.kernel.org/lkml/20241101034647.51590-1-
> > > csander@purestorage.com/
> > This patch has two bugs.
> > 1. writel() writes the MMIO space in LE order. EQ updates are in BE order.
> > So this will break on ppc64 BE.
>
> Okay, so this should be writel(cpu_to_le32(val), addr)?
>
That would break the x86 side because device should receive in BE format regardless of cpu endianness.
Above code will write in the LE format.
So an API foo_writel() need which does
a. write memory barrier
b. write to MMIO space but without endineness conversion.
> >
> > 2. writel() issues the barrier BEFORE the raw_writel().
> > As opposed to that eq update needs to have a barrier AFTER the writel().
> > Likely to synchronize with other CQ related pointers update.
>
> I was referencing this prior discussion about the memory barrier:
> https://lore.kernel.org/netdev/CALzJLG8af0SMfA1C8U8r_Fddb_ZQhvEZd6=2
> a97dOoBcgLA0xg@mail.gmail.com/
> From Saeed's message, it sounds like the memory barrier is only used to
> ensure the ordering of writes to the doorbell register, not the ordering of the
> doorbell write relative to any other writes. If some other write needs to be
> ordered after the doorbell write, please explain what it is.
Not write, reading of the CQE likely requires read barrier.
> As Gal Pressman
> pointed out, a wmb() at the end of a function doesn't make much sense, as
> there are no further writes in the function to order. If the doorbell write needs
> to be ordered before some other write in a caller function, the memory barrier
> should probably move to the caller.
It is the two EQ doorbell writes that needs to be ordered with respect to each other.
So please audit the code for CQE processing ensure that there is read barrier after valid bit.
And removal of this read barrier does not affect there.
It would be best if you can test on ARM (non x86_64) platform for this change.
>
> >
> > > Are you suggesting something different? If so, it would be great if
> > > you could clarify what you mean.
> > >
> > So I was suggesting to keep __raw_writel() as is and replace mb() with
> wmb().
>
> wmb() would certainly be cheaper than mb(), but I would like to understand
> the requirement for the barrier in the first place. The fence instruction is very
> expensive.
>
To order two doorbell writes of the same EQ.
> Thanks,
> Caleb
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-06 5:44 ` Parav Pandit
@ 2024-11-06 23:44 ` Caleb Sander
2024-11-07 2:36 ` Saeed Mahameed
0 siblings, 1 reply; 19+ messages in thread
From: Caleb Sander @ 2024-11-06 23:44 UTC (permalink / raw)
To: Parav Pandit
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
On Tue, Nov 5, 2024 at 9:44 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Caleb Sander <csander@purestorage.com>
> > Sent: Tuesday, November 5, 2024 9:36 PM
> >
> > On Mon, Nov 4, 2024 at 9:22 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Caleb Sander <csander@purestorage.com>
> > > > Sent: Monday, November 4, 2024 3:49 AM
> > > >
> > > > On Sat, Nov 2, 2024 at 8:55 PM Parav Pandit <parav@nvidia.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > From: Caleb Sander Mateos <csander@purestorage.com>
> > > > > > Sent: Friday, November 1, 2024 9:17 AM
> > > > > >
> > > > > > The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci().
> > > > > > The only additional work done by mlx5_eq_update_ci() is to
> > > > > > increment
> > > > > > eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to
> > > > > > eq->avoid
> > > > > > the duplication.
> > > > > >
> > > > > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > > > > > ---
> > > > > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
> > > > > > 1 file changed, 1 insertion(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > > > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > > > index 859dcf09b770..078029c81935 100644
> > > > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > > > > @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct
> > > > > > mlx5_eq *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
> > > > > >
> > > > > > void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
> > > > > > - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
> > > > > > - u32 val;
> > > > > > -
> > > > > > eq->cons_index += cc;
> > > > > > - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
> > > > > > -
> > > > > > - __raw_writel((__force u32)cpu_to_be32(val), addr);
> > > > > > - /* We still want ordering, just not swabbing, so add a barrier */
> > > > > > - wmb();
> > > > > > + eq_update_ci(eq, arm);
> > > > > Long ago I had similar rework patches to get rid of
> > > > > __raw_writel(), which I never got chance to push,
> > > > >
> > > > > Eq_update_ci() is using full memory barrier.
> > > > > While mlx5_eq_update_ci() is using only write memory barrier.
> > > > >
> > > > > So it is not 100% deduplication by this patch.
> > > > > Please have a pre-patch improving eq_update_ci() to use wmb().
> > > > > Followed by this patch.
> > > >
> > > > Right, patch 1/2 in this series is changing eq_update_ci() to use
> > > > writel() instead of __raw_writel() and avoid the memory barrier:
> > > > https://lore.kernel.org/lkml/20241101034647.51590-1-
> > > > csander@purestorage.com/
> > > This patch has two bugs.
> > > 1. writel() writes the MMIO space in LE order. EQ updates are in BE order.
> > > So this will break on ppc64 BE.
> >
> > Okay, so this should be writel(cpu_to_le32(val), addr)?
> >
> That would break the x86 side because device should receive in BE format regardless of cpu endianness.
> Above code will write in the LE format.
>
> So an API foo_writel() need which does
> a. write memory barrier
> b. write to MMIO space but without endineness conversion.
Got it, thanks. writel(bswap_32(val, addr)) should work, then? I
suppose it may introduce a second bswap on BE architectures, but
that's probably worth it to avoid the memory barrier.
>
> > >
> > > 2. writel() issues the barrier BEFORE the raw_writel().
> > > As opposed to that eq update needs to have a barrier AFTER the writel().
> > > Likely to synchronize with other CQ related pointers update.
> >
> > I was referencing this prior discussion about the memory barrier:
> > https://lore.kernel.org/netdev/CALzJLG8af0SMfA1C8U8r_Fddb_ZQhvEZd6=2
> > a97dOoBcgLA0xg@mail.gmail.com/
> > From Saeed's message, it sounds like the memory barrier is only used to
> > ensure the ordering of writes to the doorbell register, not the ordering of the
> > doorbell write relative to any other writes. If some other write needs to be
> > ordered after the doorbell write, please explain what it is.
> Not write, reading of the CQE likely requires read barrier.
But mlx5_eq_update_ci() is already using wmb(), which only imposes an
ordering on writes. So if the existing code is correct, the memory
barrier cannot be required to order the doorbell write with respect to
a read of the CQE.
>
> > As Gal Pressman
> > pointed out, a wmb() at the end of a function doesn't make much sense, as
> > there are no further writes in the function to order. If the doorbell write needs
> > to be ordered before some other write in a caller function, the memory barrier
> > should probably move to the caller.
> It is the two EQ doorbell writes that needs to be ordered with respect to each other.
> So please audit the code for CQE processing ensure that there is read barrier after valid bit.
> And removal of this read barrier does not affect there.
>
> It would be best if you can test on ARM (non x86_64) platform for this change.
Unfortunately I don't have access to any platform besides x86_64 with
ConnectX cards.
>
> >
> > >
> > > > Are you suggesting something different? If so, it would be great if
> > > > you could clarify what you mean.
> > > >
> > > So I was suggesting to keep __raw_writel() as is and replace mb() with
> > wmb().
> >
> > wmb() would certainly be cheaper than mb(), but I would like to understand
> > the requirement for the barrier in the first place. The fence instruction is very
> > expensive.
> >
> To order two doorbell writes of the same EQ.
Right, I understand why the memory barrier is needed with the existing
__raw_writel(), as it provides no ordering guarantees. But writel()
seems to guarantee the necessary ordering of writes to the EQ's
doorbell. This is the relevant documentation from memory-barriers.txt
(I assume the mutual exclusion of interrupt handlers is equivalent to
holding a spinlock):
2. A writeX() issued by a CPU thread holding a spinlock is ordered
before a writeX() to the same peripheral from another CPU thread
issued after a later acquisition of the same spinlock. This ensures
that MMIO register writes to a particular device issued while holding
a spinlock will arrive in an order consistent with acquisitions of
the lock.
Do you still think the barrier is necessary if writel() is used instead?
If you feel strongly about keeping the wmb(), I can do that. It's
certainly an improvement over the full memory barrier. But fences are
quite expensive, so I would prefer to remove it if it's not necessary
for correctness.
Thanks,
Caleb
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-06 23:44 ` Caleb Sander
@ 2024-11-07 2:36 ` Saeed Mahameed
2024-11-07 2:43 ` Parav Pandit
2024-11-07 4:45 ` Caleb Sander
0 siblings, 2 replies; 19+ messages in thread
From: Saeed Mahameed @ 2024-11-07 2:36 UTC (permalink / raw)
To: Caleb Sander
Cc: Parav Pandit, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
On 06 Nov 15:44, Caleb Sander wrote:
>On Tue, Nov 5, 2024 at 9:44 PM Parav Pandit <parav@nvidia.com> wrote:
>>
>>
>> > From: Caleb Sander <csander@purestorage.com>
>> > Sent: Tuesday, November 5, 2024 9:36 PM
>> >
>> > On Mon, Nov 4, 2024 at 9:22 PM Parav Pandit <parav@nvidia.com> wrote:
>> > >
>> > >
>> > >
>> > > > From: Caleb Sander <csander@purestorage.com>
>> > > > Sent: Monday, November 4, 2024 3:49 AM
>> > > >
>> > > > On Sat, Nov 2, 2024 at 8:55 PM Parav Pandit <parav@nvidia.com> wrote:
>> > > > >
>> > > > >
>> > > > >
>> > > > > > From: Caleb Sander Mateos <csander@purestorage.com>
>> > > > > > Sent: Friday, November 1, 2024 9:17 AM
>> > > > > >
>> > > > > > The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci().
>> > > > > > The only additional work done by mlx5_eq_update_ci() is to
>> > > > > > increment
>> > > > > > eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to
>> > > > > > eq->avoid
>> > > > > > the duplication.
>> > > > > >
>> > > > > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
>> > > > > > ---
>> > > > > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
>> > > > > > 1 file changed, 1 insertion(+), 8 deletions(-)
>> > > > > >
>> > > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> > > > > > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> > > > > > index 859dcf09b770..078029c81935 100644
>> > > > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> > > > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> > > > > > @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct
>> > > > > > mlx5_eq *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
>> > > > > >
>> > > > > > void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
>> > > > > > - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
>> > > > > > - u32 val;
>> > > > > > -
>> > > > > > eq->cons_index += cc;
>> > > > > > - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
>> > > > > > -
>> > > > > > - __raw_writel((__force u32)cpu_to_be32(val), addr);
>> > > > > > - /* We still want ordering, just not swabbing, so add a barrier */
>> > > > > > - wmb();
>> > > > > > + eq_update_ci(eq, arm);
>> > > > > Long ago I had similar rework patches to get rid of
>> > > > > __raw_writel(), which I never got chance to push,
>> > > > >
>> > > > > Eq_update_ci() is using full memory barrier.
>> > > > > While mlx5_eq_update_ci() is using only write memory barrier.
>> > > > >
>> > > > > So it is not 100% deduplication by this patch.
>> > > > > Please have a pre-patch improving eq_update_ci() to use wmb().
>> > > > > Followed by this patch.
>> > > >
>> > > > Right, patch 1/2 in this series is changing eq_update_ci() to use
>> > > > writel() instead of __raw_writel() and avoid the memory barrier:
>> > > > https://lore.kernel.org/lkml/20241101034647.51590-1-
>> > > > csander@purestorage.com/
>> > > This patch has two bugs.
>> > > 1. writel() writes the MMIO space in LE order. EQ updates are in BE order.
>> > > So this will break on ppc64 BE.
>> >
>> > Okay, so this should be writel(cpu_to_le32(val), addr)?
>> >
>> That would break the x86 side because device should receive in BE format regardless of cpu endianness.
>> Above code will write in the LE format.
>>
>> So an API foo_writel() need which does
>> a. write memory barrier
>> b. write to MMIO space but without endineness conversion.
>
>Got it, thanks. writel(bswap_32(val, addr)) should work, then? I
>suppose it may introduce a second bswap on BE architectures, but
>that's probably worth it to avoid the memory barrier.
>
The existing mb() needs to be changed to wmb(), this will provide a more
efficient fence on most architectures.
I don't understand why you are still discussing the use of writel(), yes
it will work but you are introducing two unconditional swaps per doorbell
write.
Just replace the existing mb with wmb() in eq_update_ci()
And if you have time to write one extra patch, please reuse eq_update_ci()
inside mlx5_eq_update_ci().
mlx5_eq_update_ci(eq, cc, arm) {
eq->cons_index += cc;
eq_update_ci(eq, arm);
}
So we won't have two different implementations of EQ doorbell ringing
anymore.
Thanks,
Saeed.
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-07 2:36 ` Saeed Mahameed
@ 2024-11-07 2:43 ` Parav Pandit
2024-11-07 4:45 ` Caleb Sander
1 sibling, 0 replies; 19+ messages in thread
From: Parav Pandit @ 2024-11-07 2:43 UTC (permalink / raw)
To: Saeed Mahameed, Caleb Sander
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
> From: Saeed Mahameed <saeed@kernel.org>
> Sent: Thursday, November 7, 2024 8:06 AM
>
> On 06 Nov 15:44, Caleb Sander wrote:
> >On Tue, Nov 5, 2024 at 9:44 PM Parav Pandit <parav@nvidia.com> wrote:
> >>
> >>
> >> > From: Caleb Sander <csander@purestorage.com>
> >> > Sent: Tuesday, November 5, 2024 9:36 PM
> >> >
> >> > On Mon, Nov 4, 2024 at 9:22 PM Parav Pandit <parav@nvidia.com>
> wrote:
> >> > >
> >> > >
> >> > >
> >> > > > From: Caleb Sander <csander@purestorage.com>
> >> > > > Sent: Monday, November 4, 2024 3:49 AM
> >> > > >
> >> > > > On Sat, Nov 2, 2024 at 8:55 PM Parav Pandit <parav@nvidia.com>
> wrote:
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > > From: Caleb Sander Mateos <csander@purestorage.com>
> >> > > > > > Sent: Friday, November 1, 2024 9:17 AM
> >> > > > > >
> >> > > > > > The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci().
> >> > > > > > The only additional work done by mlx5_eq_update_ci() is to
> >> > > > > > increment
> >> > > > > > eq->cons_index. Call eq_update_ci() from
> >> > > > > > eq->mlx5_eq_update_ci() to avoid
> >> > > > > > the duplication.
> >> > > > > >
> >> > > > > > Signed-off-by: Caleb Sander Mateos
> >> > > > > > <csander@purestorage.com>
> >> > > > > > ---
> >> > > > > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
> >> > > > > > 1 file changed, 1 insertion(+), 8 deletions(-)
> >> > > > > >
> >> > > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> >> > > > > > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> >> > > > > > index 859dcf09b770..078029c81935 100644
> >> > > > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> >> > > > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> >> > > > > > @@ -802,19 +802,12 @@ struct mlx5_eqe
> >> > > > > > *mlx5_eq_get_eqe(struct mlx5_eq *eq, u32 cc) }
> >> > > > > > EXPORT_SYMBOL(mlx5_eq_get_eqe);
> >> > > > > >
> >> > > > > > void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
> >> > > > > > - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
> >> > > > > > - u32 val;
> >> > > > > > -
> >> > > > > > eq->cons_index += cc;
> >> > > > > > - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
> >> > > > > > -
> >> > > > > > - __raw_writel((__force u32)cpu_to_be32(val), addr);
> >> > > > > > - /* We still want ordering, just not swabbing, so add a barrier
> */
> >> > > > > > - wmb();
> >> > > > > > + eq_update_ci(eq, arm);
> >> > > > > Long ago I had similar rework patches to get rid of
> >> > > > > __raw_writel(), which I never got chance to push,
> >> > > > >
> >> > > > > Eq_update_ci() is using full memory barrier.
> >> > > > > While mlx5_eq_update_ci() is using only write memory barrier.
> >> > > > >
> >> > > > > So it is not 100% deduplication by this patch.
> >> > > > > Please have a pre-patch improving eq_update_ci() to use wmb().
> >> > > > > Followed by this patch.
> >> > > >
> >> > > > Right, patch 1/2 in this series is changing eq_update_ci() to
> >> > > > use
> >> > > > writel() instead of __raw_writel() and avoid the memory barrier:
> >> > > > https://lore.kernel.org/lkml/20241101034647.51590-1-
> >> > > > csander@purestorage.com/
> >> > > This patch has two bugs.
> >> > > 1. writel() writes the MMIO space in LE order. EQ updates are in BE
> order.
> >> > > So this will break on ppc64 BE.
> >> >
> >> > Okay, so this should be writel(cpu_to_le32(val), addr)?
> >> >
> >> That would break the x86 side because device should receive in BE format
> regardless of cpu endianness.
> >> Above code will write in the LE format.
> >>
> >> So an API foo_writel() need which does a. write memory barrier b.
> >> write to MMIO space but without endineness conversion.
> >
> >Got it, thanks. writel(bswap_32(val, addr)) should work, then? I
> >suppose it may introduce a second bswap on BE architectures, but that's
> >probably worth it to avoid the memory barrier.
> >
>
> The existing mb() needs to be changed to wmb(), this will provide a more
> efficient fence on most architectures.
>
> I don't understand why you are still discussing the use of writel(), yes it will
> work but you are introducing two unconditional swaps per doorbell write.
>
> Just replace the existing mb with wmb() in eq_update_ci()
>
> And if you have time to write one extra patch, please reuse eq_update_ci()
> inside mlx5_eq_update_ci().
>
> mlx5_eq_update_ci(eq, cc, arm) {
> eq->cons_index += cc;
> eq_update_ci(eq, arm);
> }
>
> So we won't have two different implementations of EQ doorbell ringing
> anymore.
>
Yes, and its easy to read to understand why cpu_to_be32 is done.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-07 2:36 ` Saeed Mahameed
2024-11-07 2:43 ` Parav Pandit
@ 2024-11-07 4:45 ` Caleb Sander
2024-11-07 5:14 ` Saeed Mahameed
1 sibling, 1 reply; 19+ messages in thread
From: Caleb Sander @ 2024-11-07 4:45 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Parav Pandit, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
On Wed, Nov 6, 2024 at 6:36 PM Saeed Mahameed <saeed@kernel.org> wrote:
>
> On 06 Nov 15:44, Caleb Sander wrote:
> >On Tue, Nov 5, 2024 at 9:44 PM Parav Pandit <parav@nvidia.com> wrote:
> >>
> >>
> >> > From: Caleb Sander <csander@purestorage.com>
> >> > Sent: Tuesday, November 5, 2024 9:36 PM
> >> >
> >> > On Mon, Nov 4, 2024 at 9:22 PM Parav Pandit <parav@nvidia.com> wrote:
> >> > >
> >> > >
> >> > >
> >> > > > From: Caleb Sander <csander@purestorage.com>
> >> > > > Sent: Monday, November 4, 2024 3:49 AM
> >> > > >
> >> > > > On Sat, Nov 2, 2024 at 8:55 PM Parav Pandit <parav@nvidia.com> wrote:
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > > From: Caleb Sander Mateos <csander@purestorage.com>
> >> > > > > > Sent: Friday, November 1, 2024 9:17 AM
> >> > > > > >
> >> > > > > > The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci().
> >> > > > > > The only additional work done by mlx5_eq_update_ci() is to
> >> > > > > > increment
> >> > > > > > eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to
> >> > > > > > eq->avoid
> >> > > > > > the duplication.
> >> > > > > >
> >> > > > > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> >> > > > > > ---
> >> > > > > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
> >> > > > > > 1 file changed, 1 insertion(+), 8 deletions(-)
> >> > > > > >
> >> > > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> >> > > > > > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> >> > > > > > index 859dcf09b770..078029c81935 100644
> >> > > > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> >> > > > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> >> > > > > > @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct
> >> > > > > > mlx5_eq *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
> >> > > > > >
> >> > > > > > void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
> >> > > > > > - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
> >> > > > > > - u32 val;
> >> > > > > > -
> >> > > > > > eq->cons_index += cc;
> >> > > > > > - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
> >> > > > > > -
> >> > > > > > - __raw_writel((__force u32)cpu_to_be32(val), addr);
> >> > > > > > - /* We still want ordering, just not swabbing, so add a barrier */
> >> > > > > > - wmb();
> >> > > > > > + eq_update_ci(eq, arm);
> >> > > > > Long ago I had similar rework patches to get rid of
> >> > > > > __raw_writel(), which I never got chance to push,
> >> > > > >
> >> > > > > Eq_update_ci() is using full memory barrier.
> >> > > > > While mlx5_eq_update_ci() is using only write memory barrier.
> >> > > > >
> >> > > > > So it is not 100% deduplication by this patch.
> >> > > > > Please have a pre-patch improving eq_update_ci() to use wmb().
> >> > > > > Followed by this patch.
> >> > > >
> >> > > > Right, patch 1/2 in this series is changing eq_update_ci() to use
> >> > > > writel() instead of __raw_writel() and avoid the memory barrier:
> >> > > > https://lore.kernel.org/lkml/20241101034647.51590-1-
> >> > > > csander@purestorage.com/
> >> > > This patch has two bugs.
> >> > > 1. writel() writes the MMIO space in LE order. EQ updates are in BE order.
> >> > > So this will break on ppc64 BE.
> >> >
> >> > Okay, so this should be writel(cpu_to_le32(val), addr)?
> >> >
> >> That would break the x86 side because device should receive in BE format regardless of cpu endianness.
> >> Above code will write in the LE format.
> >>
> >> So an API foo_writel() need which does
> >> a. write memory barrier
> >> b. write to MMIO space but without endineness conversion.
> >
> >Got it, thanks. writel(bswap_32(val, addr)) should work, then? I
> >suppose it may introduce a second bswap on BE architectures, but
> >that's probably worth it to avoid the memory barrier.
> >
>
> The existing mb() needs to be changed to wmb(), this will provide a more
> efficient fence on most architectures.
>
> I don't understand why you are still discussing the use of writel(), yes
> it will work but you are introducing two unconditional swaps per doorbell
> write.
Well, no memory fence is cheaper still than a wmb(). But it's your
driver, so if you prefer to use wmb() rather than switch to writel(),
that's fine. I'll update the patch series.
As for the bytes swaps in writel(bswap_32(val), addr), it would still
be 1 on LE architectures, but 2 instead of 0 on BE architectures.
Certainly a bit inefficient, but probably less overhead than the
memory barrier currently adds on strongly-ordered architectures.
>
> Just replace the existing mb with wmb() in eq_update_ci()
>
> And if you have time to write one extra patch, please reuse eq_update_ci()
> inside mlx5_eq_update_ci().
>
> mlx5_eq_update_ci(eq, cc, arm) {
> eq->cons_index += cc;
> eq_update_ci(eq, arm);
> }
>
> So we won't have two different implementations of EQ doorbell ringing
> anymore.
Isn't this what my patch 2 (at the start of this reply chain) already
does? If you are suggesting something else, please clarify.
Thanks for the reviews,
Caleb
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-07 4:45 ` Caleb Sander
@ 2024-11-07 5:14 ` Saeed Mahameed
2024-11-07 18:30 ` [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci() Caleb Sander Mateos
0 siblings, 1 reply; 19+ messages in thread
From: Saeed Mahameed @ 2024-11-07 5:14 UTC (permalink / raw)
To: Caleb Sander
Cc: Parav Pandit, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
On 06 Nov 20:45, Caleb Sander wrote:
>On Wed, Nov 6, 2024 at 6:36 PM Saeed Mahameed <saeed@kernel.org> wrote:
>>
>> On 06 Nov 15:44, Caleb Sander wrote:
>> >On Tue, Nov 5, 2024 at 9:44 PM Parav Pandit <parav@nvidia.com> wrote:
>> >>
>> >>
>> >> > From: Caleb Sander <csander@purestorage.com>
>> >> > Sent: Tuesday, November 5, 2024 9:36 PM
>> >> >
>> >> > On Mon, Nov 4, 2024 at 9:22 PM Parav Pandit <parav@nvidia.com> wrote:
>> >> > >
>> >> > >
>> >> > >
>> >> > > > From: Caleb Sander <csander@purestorage.com>
>> >> > > > Sent: Monday, November 4, 2024 3:49 AM
>> >> > > >
>> >> > > > On Sat, Nov 2, 2024 at 8:55 PM Parav Pandit <parav@nvidia.com> wrote:
>> >> > > > >
>> >> > > > >
>> >> > > > >
>> >> > > > > > From: Caleb Sander Mateos <csander@purestorage.com>
>> >> > > > > > Sent: Friday, November 1, 2024 9:17 AM
>> >> > > > > >
>> >> > > > > > The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci().
>> >> > > > > > The only additional work done by mlx5_eq_update_ci() is to
>> >> > > > > > increment
>> >> > > > > > eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to
>> >> > > > > > eq->avoid
>> >> > > > > > the duplication.
>> >> > > > > >
>> >> > > > > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
>> >> > > > > > ---
>> >> > > > > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
>> >> > > > > > 1 file changed, 1 insertion(+), 8 deletions(-)
>> >> > > > > >
>> >> > > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> >> > > > > > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> >> > > > > > index 859dcf09b770..078029c81935 100644
>> >> > > > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> >> > > > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> >> > > > > > @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct
>> >> > > > > > mlx5_eq *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
>> >> > > > > >
>> >> > > > > > void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
>> >> > > > > > - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
>> >> > > > > > - u32 val;
>> >> > > > > > -
>> >> > > > > > eq->cons_index += cc;
>> >> > > > > > - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
>> >> > > > > > -
>> >> > > > > > - __raw_writel((__force u32)cpu_to_be32(val), addr);
>> >> > > > > > - /* We still want ordering, just not swabbing, so add a barrier */
>> >> > > > > > - wmb();
>> >> > > > > > + eq_update_ci(eq, arm);
>> >> > > > > Long ago I had similar rework patches to get rid of
>> >> > > > > __raw_writel(), which I never got chance to push,
>> >> > > > >
>> >> > > > > Eq_update_ci() is using full memory barrier.
>> >> > > > > While mlx5_eq_update_ci() is using only write memory barrier.
>> >> > > > >
>> >> > > > > So it is not 100% deduplication by this patch.
>> >> > > > > Please have a pre-patch improving eq_update_ci() to use wmb().
>> >> > > > > Followed by this patch.
>> >> > > >
>> >> > > > Right, patch 1/2 in this series is changing eq_update_ci() to use
>> >> > > > writel() instead of __raw_writel() and avoid the memory barrier:
>> >> > > > https://lore.kernel.org/lkml/20241101034647.51590-1-
>> >> > > > csander@purestorage.com/
>> >> > > This patch has two bugs.
>> >> > > 1. writel() writes the MMIO space in LE order. EQ updates are in BE order.
>> >> > > So this will break on ppc64 BE.
>> >> >
>> >> > Okay, so this should be writel(cpu_to_le32(val), addr)?
>> >> >
>> >> That would break the x86 side because device should receive in BE format regardless of cpu endianness.
>> >> Above code will write in the LE format.
>> >>
>> >> So an API foo_writel() need which does
>> >> a. write memory barrier
>> >> b. write to MMIO space but without endineness conversion.
>> >
>> >Got it, thanks. writel(bswap_32(val, addr)) should work, then? I
>> >suppose it may introduce a second bswap on BE architectures, but
>> >that's probably worth it to avoid the memory barrier.
>> >
>>
>> The existing mb() needs to be changed to wmb(), this will provide a more
>> efficient fence on most architectures.
>>
>> I don't understand why you are still discussing the use of writel(), yes
>> it will work but you are introducing two unconditional swaps per doorbell
>> write.
>
>Well, no memory fence is cheaper still than a wmb(). But it's your
>driver, so if you prefer to use wmb() rather than switch to writel(),
>that's fine. I'll update the patch series.
yest wmb() please.
>As for the bytes swaps in writel(bswap_32(val), addr), it would still
>be 1 on LE architectures, but 2 instead of 0 on BE architectures.
>Certainly a bit inefficient, but probably less overhead than the
>memory barrier currently adds on strongly-ordered architectures.
>
yes we all agree:
mb << writel() < _raw_writel() + wmb ()
>>
>> Just replace the existing mb with wmb() in eq_update_ci()
>>
>> And if you have time to write one extra patch, please reuse eq_update_ci()
>> inside mlx5_eq_update_ci().
>>
>> mlx5_eq_update_ci(eq, cc, arm) {
>> eq->cons_index += cc;
>> eq_update_ci(eq, arm);
>> }
>>
>> So we won't have two different implementations of EQ doorbell ringing
>> anymore.
>
>Isn't this what my patch 2 (at the start of this reply chain) already
>does? If you are suggesting something else, please clarify.
>
Sorry i missed it,
Yes this is perfect, thanks for the patches.
Let's do the wmb() and call it a day :) ..
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci()
2024-11-07 5:14 ` Saeed Mahameed
@ 2024-11-07 18:30 ` Caleb Sander Mateos
2024-11-07 18:30 ` [PATCH net-next v2 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci() Caleb Sander Mateos
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2024-11-07 18:30 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Parav Pandit, Caleb Sander Mateos, netdev, linux-rdma,
linux-kernel
The memory barrier in eq_update_ci() after the doorbell write is a
significant hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see
3% of CPU time spent on the mfence instruction.
98df6d5b877c ("net/mlx5: A write memory barrier is sufficient in EQ ci
update") already relaxed the full memory barrier to just a write barrier
in mlx5_eq_update_ci(), which duplicates eq_update_ci(). So replace mb()
with wmb() in eq_update_ci() too.
On strongly ordered architectures, no barrier is actually needed because
the MMIO writes to the doorbell register are guaranteed to appear to the
device in the order they were made. However, the kernel's ordered MMIO
primitive writel() lacks a convenient big-endian interface.
Therefore, we opt to stick with __raw_writel() + a barrier.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
v2: keep memory barrier instead of using ordered writel()
drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
index 4b7f7131c560..b1edc71ffc6d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
@@ -70,11 +70,11 @@ static inline void eq_update_ci(struct mlx5_eq *eq, int arm)
__be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
u32 val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
__raw_writel((__force u32)cpu_to_be32(val), addr);
/* We still want ordering, just not swabbing, so add a barrier */
- mb();
+ wmb();
}
int mlx5_eq_table_init(struct mlx5_core_dev *dev);
void mlx5_eq_table_cleanup(struct mlx5_core_dev *dev);
int mlx5_eq_table_create(struct mlx5_core_dev *dev);
--
2.45.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next v2 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-07 18:30 ` [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci() Caleb Sander Mateos
@ 2024-11-07 18:30 ` Caleb Sander Mateos
2024-11-08 10:49 ` Parav Pandit
2024-11-08 10:46 ` [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci() Parav Pandit
2024-11-12 2:38 ` Jakub Kicinski
2 siblings, 1 reply; 19+ messages in thread
From: Caleb Sander Mateos @ 2024-11-07 18:30 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Parav Pandit, Caleb Sander Mateos, netdev, linux-rdma,
linux-kernel
The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci(). The
only additional work done by mlx5_eq_update_ci() is to increment
eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to avoid
the duplication.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 859dcf09b770..078029c81935 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct mlx5_eq *eq, u32 cc)
}
EXPORT_SYMBOL(mlx5_eq_get_eqe);
void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm)
{
- __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
- u32 val;
-
eq->cons_index += cc;
- val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
-
- __raw_writel((__force u32)cpu_to_be32(val), addr);
- /* We still want ordering, just not swabbing, so add a barrier */
- wmb();
+ eq_update_ci(eq, arm);
}
EXPORT_SYMBOL(mlx5_eq_update_ci);
static void comp_irq_release_pci(struct mlx5_core_dev *dev, u16 vecidx)
{
--
2.45.2
^ permalink raw reply related [flat|nested] 19+ messages in thread
* RE: [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci()
2024-11-07 18:30 ` [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci() Caleb Sander Mateos
2024-11-07 18:30 ` [PATCH net-next v2 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci() Caleb Sander Mateos
@ 2024-11-08 10:46 ` Parav Pandit
2024-11-11 11:59 ` Tariq Toukan
2024-11-12 2:38 ` Jakub Kicinski
2 siblings, 1 reply; 19+ messages in thread
From: Parav Pandit @ 2024-11-08 10:46 UTC (permalink / raw)
To: Caleb Sander Mateos, Saeed Mahameed, Leon Romanovsky,
Tariq Toukan, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
> From: Caleb Sander Mateos <csander@purestorage.com>
> Sent: Friday, November 8, 2024 12:01 AM
>
> The memory barrier in eq_update_ci() after the doorbell write is a significant
> hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see 3% of CPU
> time spent on the mfence instruction.
>
> 98df6d5b877c ("net/mlx5: A write memory barrier is sufficient in EQ ci
> update") already relaxed the full memory barrier to just a write barrier in
> mlx5_eq_update_ci(), which duplicates eq_update_ci(). So replace mb() with
> wmb() in eq_update_ci() too.
>
> On strongly ordered architectures, no barrier is actually needed because the
> MMIO writes to the doorbell register are guaranteed to appear to the device in
> the order they were made. However, the kernel's ordered MMIO primitive
> writel() lacks a convenient big-endian interface.
> Therefore, we opt to stick with __raw_writel() + a barrier.
>
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> ---
> v2: keep memory barrier instead of using ordered writel()
>
> drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
> b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
> index 4b7f7131c560..b1edc71ffc6d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
> @@ -70,11 +70,11 @@ static inline void eq_update_ci(struct mlx5_eq *eq, int
> arm)
> __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
> u32 val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
>
> __raw_writel((__force u32)cpu_to_be32(val), addr);
> /* We still want ordering, just not swabbing, so add a barrier */
> - mb();
> + wmb();
> }
>
> int mlx5_eq_table_init(struct mlx5_core_dev *dev); void
> mlx5_eq_table_cleanup(struct mlx5_core_dev *dev); int
> mlx5_eq_table_create(struct mlx5_core_dev *dev);
> --
> 2.45.2
Reviewed-by: Parav Pandit <parav@nvidia.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [PATCH net-next v2 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-07 18:30 ` [PATCH net-next v2 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci() Caleb Sander Mateos
@ 2024-11-08 10:49 ` Parav Pandit
2024-11-11 11:59 ` Tariq Toukan
0 siblings, 1 reply; 19+ messages in thread
From: Parav Pandit @ 2024-11-08 10:49 UTC (permalink / raw)
To: Caleb Sander Mateos, Saeed Mahameed, Leon Romanovsky,
Tariq Toukan, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
> From: Caleb Sander Mateos <csander@purestorage.com>
> Sent: Friday, November 8, 2024 12:01 AM
>
> The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci(). The only
> additional work done by mlx5_eq_update_ci() is to increment
> eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to avoid
> the duplication.
>
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 859dcf09b770..078029c81935 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct mlx5_eq
> *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
>
> void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
> - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
> - u32 val;
> -
> eq->cons_index += cc;
> - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
> -
> - __raw_writel((__force u32)cpu_to_be32(val), addr);
> - /* We still want ordering, just not swabbing, so add a barrier */
> - wmb();
> + eq_update_ci(eq, arm);
> }
> EXPORT_SYMBOL(mlx5_eq_update_ci);
>
> static void comp_irq_release_pci(struct mlx5_core_dev *dev, u16 vecidx) {
> --
> 2.45.2
Reviewed-by: Parav Pandit <parav@nvidia.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci()
2024-11-08 10:49 ` Parav Pandit
@ 2024-11-11 11:59 ` Tariq Toukan
0 siblings, 0 replies; 19+ messages in thread
From: Tariq Toukan @ 2024-11-11 11:59 UTC (permalink / raw)
To: Parav Pandit, Caleb Sander Mateos, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
On 08/11/2024 12:49, Parav Pandit wrote:
>
>> From: Caleb Sander Mateos <csander@purestorage.com>
>> Sent: Friday, November 8, 2024 12:01 AM
>>
>> The logic of eq_update_ci() is duplicated in mlx5_eq_update_ci(). The only
>> additional work done by mlx5_eq_update_ci() is to increment
>> eq->cons_index. Call eq_update_ci() from mlx5_eq_update_ci() to avoid
>> the duplication.
>>
>> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/eq.c | 9 +--------
>> 1 file changed, 1 insertion(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> index 859dcf09b770..078029c81935 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>> @@ -802,19 +802,12 @@ struct mlx5_eqe *mlx5_eq_get_eqe(struct mlx5_eq
>> *eq, u32 cc) } EXPORT_SYMBOL(mlx5_eq_get_eqe);
>>
>> void mlx5_eq_update_ci(struct mlx5_eq *eq, u32 cc, bool arm) {
>> - __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
>> - u32 val;
>> -
>> eq->cons_index += cc;
>> - val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
>> -
>> - __raw_writel((__force u32)cpu_to_be32(val), addr);
>> - /* We still want ordering, just not swabbing, so add a barrier */
>> - wmb();
>> + eq_update_ci(eq, arm);
>> }
>> EXPORT_SYMBOL(mlx5_eq_update_ci);
>>
>> static void comp_irq_release_pci(struct mlx5_core_dev *dev, u16 vecidx) {
>> --
>> 2.45.2
>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci()
2024-11-08 10:46 ` [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci() Parav Pandit
@ 2024-11-11 11:59 ` Tariq Toukan
0 siblings, 0 replies; 19+ messages in thread
From: Tariq Toukan @ 2024-11-11 11:59 UTC (permalink / raw)
To: Parav Pandit, Caleb Sander Mateos, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
On 08/11/2024 12:46, Parav Pandit wrote:
>
>> From: Caleb Sander Mateos <csander@purestorage.com>
>> Sent: Friday, November 8, 2024 12:01 AM
>>
>> The memory barrier in eq_update_ci() after the doorbell write is a significant
>> hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see 3% of CPU
>> time spent on the mfence instruction.
>>
>> 98df6d5b877c ("net/mlx5: A write memory barrier is sufficient in EQ ci
>> update") already relaxed the full memory barrier to just a write barrier in
>> mlx5_eq_update_ci(), which duplicates eq_update_ci(). So replace mb() with
>> wmb() in eq_update_ci() too.
>>
>> On strongly ordered architectures, no barrier is actually needed because the
>> MMIO writes to the doorbell register are guaranteed to appear to the device in
>> the order they were made. However, the kernel's ordered MMIO primitive
>> writel() lacks a convenient big-endian interface.
>> Therefore, we opt to stick with __raw_writel() + a barrier.
>>
>> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
>> ---
>> v2: keep memory barrier instead of using ordered writel()
>>
>> drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
>> b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
>> index 4b7f7131c560..b1edc71ffc6d 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h
>> @@ -70,11 +70,11 @@ static inline void eq_update_ci(struct mlx5_eq *eq, int
>> arm)
>> __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
>> u32 val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
>>
>> __raw_writel((__force u32)cpu_to_be32(val), addr);
>> /* We still want ordering, just not swabbing, so add a barrier */
>> - mb();
>> + wmb();
>> }
>>
>> int mlx5_eq_table_init(struct mlx5_core_dev *dev); void
>> mlx5_eq_table_cleanup(struct mlx5_core_dev *dev); int
>> mlx5_eq_table_create(struct mlx5_core_dev *dev);
>> --
>> 2.45.2
>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci()
2024-11-07 18:30 ` [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci() Caleb Sander Mateos
2024-11-07 18:30 ` [PATCH net-next v2 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci() Caleb Sander Mateos
2024-11-08 10:46 ` [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci() Parav Pandit
@ 2024-11-12 2:38 ` Jakub Kicinski
2 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2024-11-12 2:38 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
David S. Miller, Eric Dumazet, Paolo Abeni, Parav Pandit, netdev,
linux-rdma, linux-kernel
On Thu, 7 Nov 2024 11:30:51 -0700 Caleb Sander Mateos wrote:
> The memory barrier in eq_update_ci() after the doorbell write is a
> significant hot spot in mlx5_eq_comp_int(). Under heavy TCP load, we see
> 3% of CPU time spent on the mfence instruction.
Applied, thanks.
In the future please avoid sending patches in reply to older version.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-11-12 2:38 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-01 3:46 [PATCH net-next 1/2] mlx5/core: avoid memory barrier in eq_update_ci() Caleb Sander Mateos
2024-11-01 3:46 ` [PATCH net-next 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci() Caleb Sander Mateos
2024-11-03 3:55 ` Parav Pandit
2024-11-03 22:18 ` Caleb Sander
2024-11-05 5:22 ` Parav Pandit
2024-11-05 16:06 ` Caleb Sander
2024-11-06 5:44 ` Parav Pandit
2024-11-06 23:44 ` Caleb Sander
2024-11-07 2:36 ` Saeed Mahameed
2024-11-07 2:43 ` Parav Pandit
2024-11-07 4:45 ` Caleb Sander
2024-11-07 5:14 ` Saeed Mahameed
2024-11-07 18:30 ` [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci() Caleb Sander Mateos
2024-11-07 18:30 ` [PATCH net-next v2 2/2] mlx5/core: deduplicate {mlx5_,}eq_update_ci() Caleb Sander Mateos
2024-11-08 10:49 ` Parav Pandit
2024-11-11 11:59 ` Tariq Toukan
2024-11-08 10:46 ` [PATCH net-next v2 1/2] mlx5/core: relax memory barrier in eq_update_ci() Parav Pandit
2024-11-11 11:59 ` Tariq Toukan
2024-11-12 2:38 ` Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).