* [PATCH] mlx5: only schedule EQ comp tasklet if necessary
From: Caleb Sander Mateos @ 2024-10-27  4:06 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Caleb Sander Mateos, netdev, linux-rdma, linux-kernel

Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
whose completions don't need to be processed in tasklet context, this
overhead is unnecessary. Atomic operations are needed to schedule, lock,
and clear the tasklet. And when mlx5_cq_tasklet_cb() runs, it acquires a
spin lock to access the list of CQs enqueued for processing.

Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
to be processed in tasklet context, so it can schedule the tasklet. CQs
that need tasklet processing have their interrupt comp handler set to
mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
don't need tasklet processing won't schedule the tasklet. To avoid
scheduling the tasklet multiple times during the same interrupt, only
schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
queue was empty before the new CQ was pushed to it.

Note that the mlx4 driver works the same way: it schedules the tasklet
in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 5 +++++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 5 +----
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 4caa1b6f40ba..25f3b26db729 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -69,22 +69,27 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct *t)
 static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
 				   struct mlx5_eqe *eqe)
 {
 	unsigned long flags;
 	struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
+	bool schedule_tasklet = false;
 
 	spin_lock_irqsave(&tasklet_ctx->lock, flags);
 	/* When migrating CQs between EQs will be implemented, please note
 	 * that you need to sync this point. It is possible that
 	 * while migrating a CQ, completions on the old EQs could
 	 * still arrive.
 	 */
 	if (list_empty_careful(&cq->tasklet_ctx.list)) {
 		mlx5_cq_hold(cq);
+		schedule_tasklet = list_empty(&tasklet_ctx->list);
 		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
 	}
 	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
+
+	if (schedule_tasklet)
+		tasklet_schedule(&tasklet_ctx->task);
 }
 
 /* Callers must verify outbox status in case of err */
 int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 		   u32 *in, int inlen, u32 *out, int outlen)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 68cb86b37e56..66fc17d9c949 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -112,17 +112,17 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 	struct mlx5_eq_comp *eq_comp =
 		container_of(nb, struct mlx5_eq_comp, irq_nb);
 	struct mlx5_eq *eq = &eq_comp->core;
 	struct mlx5_eqe *eqe;
 	int num_eqes = 0;
-	u32 cqn = -1;
 
 	eqe = next_eqe_sw(eq);
 	if (!eqe)
 		goto out;
 
 	do {
+		u32 cqn;
 		struct mlx5_core_cq *cq;
 
 		/* Make sure we read EQ entry contents after we've
 		 * checked the ownership bit.
 		 */
@@ -145,13 +145,10 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 	} while ((++num_eqes < MLX5_EQ_POLLING_BUDGET) && (eqe = next_eqe_sw(eq)));
 
 out:
 	eq_update_ci(eq, 1);
 
-	if (cqn != -1)
-		tasklet_schedule(&eq_comp->tasklet_ctx.task);
-
 	return 0;
 }
 
 /* Some architectures don't latch interrupts when they are disabled, so using
  * mlx5_eq_poll_irq_disabled could end up losing interrupts while trying to
-- 
2.45.2



* RE: [PATCH] mlx5: only schedule EQ comp tasklet if necessary
From: Parav Pandit @ 2024-10-29  4:08 UTC (permalink / raw)
  To: Caleb Sander Mateos, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org

Hi 

> From: Caleb Sander Mateos <csander@purestorage.com>
> Sent: Sunday, October 27, 2024 9:37 AM
> 
> Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet to call
> mlx5_cq_tasklet_cb() if it processes any completions. For CQs whose
> completions don't need to be processed in tasklet context, this overhead is
> unnecessary. Atomic operations are needed to schedule, lock, and clear the
> tasklet. And when mlx5_cq_tasklet_cb() runs, it acquires a spin lock to access
> the list of CQs enqueued for processing.
> 
> Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
> overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs to
> be processed in tasklet context, so it can schedule the tasklet. CQs that need
> tasklet processing have their interrupt comp handler set to
> mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that don't
> need tasklet processing won't schedule the tasklet. To avoid scheduling the
> tasklet multiple times during the same interrupt, only schedule the tasklet in
> mlx5_add_cq_to_tasklet() if the tasklet work queue was empty before the
> new CQ was pushed to it.
> 
> Note that the mlx4 driver works the same way: it schedules the tasklet in
> mlx4_add_cq_to_tasklet() and only if the work queue was empty before.
> 
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/cq.c | 5 +++++
> drivers/net/ethernet/mellanox/mlx5/core/eq.c | 5 +----
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> index 4caa1b6f40ba..25f3b26db729 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> @@ -69,22 +69,27 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct *t)
> static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
>  				   struct mlx5_eqe *eqe)
>  {
>  	unsigned long flags;
>  	struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
> +	bool schedule_tasklet = false;
> 
>  	spin_lock_irqsave(&tasklet_ctx->lock, flags);
>  	/* When migrating CQs between EQs will be implemented, please note
>  	 * that you need to sync this point. It is possible that
>  	 * while migrating a CQ, completions on the old EQs could
>  	 * still arrive.
>  	 */
>  	if (list_empty_careful(&cq->tasklet_ctx.list)) {
>  		mlx5_cq_hold(cq);
> +		schedule_tasklet = list_empty(&tasklet_ctx->list);
>  		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
>  	}
>  	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
> +
> +	if (schedule_tasklet)
> +		tasklet_schedule(&tasklet_ctx->task);
>  }
> 
>  /* Callers must verify outbox status in case of err */  int mlx5_create_cq(struct
> mlx5_core_dev *dev, struct mlx5_core_cq *cq,
>  		   u32 *in, int inlen, u32 *out, int outlen) diff --git
> a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 68cb86b37e56..66fc17d9c949 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -112,17 +112,17 @@ static int mlx5_eq_comp_int(struct notifier_block
> *nb,
>  	struct mlx5_eq_comp *eq_comp =
>  		container_of(nb, struct mlx5_eq_comp, irq_nb);
>  	struct mlx5_eq *eq = &eq_comp->core;
>  	struct mlx5_eqe *eqe;
>  	int num_eqes = 0;
> -	u32 cqn = -1;
> 
>  	eqe = next_eqe_sw(eq);
>  	if (!eqe)
>  		goto out;
> 
>  	do {
> +		u32 cqn;
>  		struct mlx5_core_cq *cq;
> 
A small nit: cqn should be declared after cq, to follow the netdev coding guidelines in [1].

>  		/* Make sure we read EQ entry contents after we've
>  		 * checked the ownership bit.
>  		 */
> @@ -145,13 +145,10 @@ static int mlx5_eq_comp_int(struct notifier_block
> *nb,
>  	} while ((++num_eqes < MLX5_EQ_POLLING_BUDGET) && (eqe =
> next_eqe_sw(eq)));
> 
>  out:
>  	eq_update_ci(eq, 1);
> 
> -	if (cqn != -1)
> -		tasklet_schedule(&eq_comp->tasklet_ctx.task);
> -
The current code processes many EQEs and performs the check for tasklet_schedule only once, in the cqn check.
With this change, the additional check will be done on every EQE.
This will make the interrupt handler marginally slower.
Returning a bool from comp() won't be good either, and we cannot inline things here due to the function pointer.

The cost of scheduling a null tasklet is higher than this if (schedule_tasklet) check.
In another series internally, I am working to reduce the cost of comp() itself, unrelated to this change,
so it is OK to have the additional check introduced here.

Apart from that,

Reviewed-by: Parav Pandit <parav@nvidia.com>

[1] https://elixir.bootlin.com/linux/v6.11.5/source/Documentation/process/maintainer-netdev.rst#L358

>  	return 0;
>  }
> 
>  /* Some architectures don't latch interrupts when they are disabled, so using
>   * mlx5_eq_poll_irq_disabled could end up losing interrupts while trying to
> --
> 2.45.2
> 



* Re: [PATCH] mlx5: only schedule EQ comp tasklet if necessary
From: Caleb Sander @ 2024-10-29 16:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Mon, Oct 28, 2024 at 9:08 PM Parav Pandit <parav@nvidia.com> wrote:
>
> Hi
>
> > From: Caleb Sander Mateos <csander@purestorage.com>
> > Sent: Sunday, October 27, 2024 9:37 AM
> >
> > Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet to call
> > mlx5_cq_tasklet_cb() if it processes any completions. For CQs whose
> > completions don't need to be processed in tasklet context, this overhead is
> > unnecessary. Atomic operations are needed to schedule, lock, and clear the
> > tasklet. And when mlx5_cq_tasklet_cb() runs, it acquires a spin lock to access
> > the list of CQs enqueued for processing.
> >
> > Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
> > overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs to
> > be processed in tasklet context, so it can schedule the tasklet. CQs that need
> > tasklet processing have their interrupt comp handler set to
> > mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that don't
> > need tasklet processing won't schedule the tasklet. To avoid scheduling the
> > tasklet multiple times during the same interrupt, only schedule the tasklet in
> > mlx5_add_cq_to_tasklet() if the tasklet work queue was empty before the
> > new CQ was pushed to it.
> >
> > Note that the mlx4 driver works the same way: it schedules the tasklet in
> > mlx4_add_cq_to_tasklet() and only if the work queue was empty before.
> >
> > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/cq.c | 5 +++++
> > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 5 +----
> >  2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> > index 4caa1b6f40ba..25f3b26db729 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> > @@ -69,22 +69,27 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct *t)
> > static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
> >                                  struct mlx5_eqe *eqe)
> >  {
> >       unsigned long flags;
> >       struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
> > +     bool schedule_tasklet = false;
> >
> >       spin_lock_irqsave(&tasklet_ctx->lock, flags);
> >       /* When migrating CQs between EQs will be implemented, please note
> >        * that you need to sync this point. It is possible that
> >        * while migrating a CQ, completions on the old EQs could
> >        * still arrive.
> >        */
> >       if (list_empty_careful(&cq->tasklet_ctx.list)) {
> >               mlx5_cq_hold(cq);
> > +             schedule_tasklet = list_empty(&tasklet_ctx->list);
> >               list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
> >       }
> >       spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
> > +
> > +     if (schedule_tasklet)
> > +             tasklet_schedule(&tasklet_ctx->task);
> >  }
> >
> >  /* Callers must verify outbox status in case of err */  int mlx5_create_cq(struct
> > mlx5_core_dev *dev, struct mlx5_core_cq *cq,
> >                  u32 *in, int inlen, u32 *out, int outlen) diff --git
> > a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > index 68cb86b37e56..66fc17d9c949 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > @@ -112,17 +112,17 @@ static int mlx5_eq_comp_int(struct notifier_block
> > *nb,
> >       struct mlx5_eq_comp *eq_comp =
> >               container_of(nb, struct mlx5_eq_comp, irq_nb);
> >       struct mlx5_eq *eq = &eq_comp->core;
> >       struct mlx5_eqe *eqe;
> >       int num_eqes = 0;
> > -     u32 cqn = -1;
> >
> >       eqe = next_eqe_sw(eq);
> >       if (!eqe)
> >               goto out;
> >
> >       do {
> > +             u32 cqn;
> >               struct mlx5_core_cq *cq;
> >
> A small nit, cqn should be declared after cq to follow the netdev coding guidelines of [1].

Sure, will fix. Thanks for the reference.

>
> >               /* Make sure we read EQ entry contents after we've
> >                * checked the ownership bit.
> >                */
> > @@ -145,13 +145,10 @@ static int mlx5_eq_comp_int(struct notifier_block
> > *nb,
> >       } while ((++num_eqes < MLX5_EQ_POLLING_BUDGET) && (eqe =
> > next_eqe_sw(eq)));
> >
> >  out:
> >       eq_update_ci(eq, 1);
> >
> > -     if (cqn != -1)
> > -             tasklet_schedule(&eq_comp->tasklet_ctx.task);
> > -
> Current code processes many EQEs and performs the check for tasklet_schedule only once in the cqn check.
> While this change, on every EQE, the additional check will be done.
> This will marginally make the interrupt handler slow.
> Returning a bool from comp() wont be good either, and we cannot inline things here due to function pointer.
>
> The cost of scheduling null tasklet is higher than this if (schedule_tasklet) check.
> In other series internally, I am working to reduce the cost of comp() itself unrelated to this change.
> so it ok to have the additional check introduced here.

Right, there's definitely a tradeoff here.
From what I could tell, there is only one CQ type that processes
completions in tasklet context (user Infiniband CQs, running
mlx5_ib_cq_comp()). All others handle their completions in interrupt
context. Ideally the CQ types that don't need it would not pay the
cost of the tasklet schedule and execution. There are several atomic
operations involved in the tasklet path which are fairly expensive. In
our TCP-heavy workload, we see 4% of the CPU time spent on the
tasklet_trylock() in tasklet_action_common.constprop.0, with a smaller
amount spent on the atomic operations in tasklet_schedule(),
tasklet_clear_sched(), and acquiring the spinlock in
mlx5_cq_tasklet_cb().
I agree the additional branch per EQE should be cheaper than
scheduling the unused tasklet, but the cost would be paid by
Infiniband workloads while non-Infiniband workloads see the benefit.
How about instead scheduling the tasklet in mlx5_eq_comp_int() if any
of the CQs have a tasklet completion handler? That should get the best
of both worlds: skipping the tasklet schedule for CQs that don't need
it while ensuring the tasklet is only scheduled once per interrupt.
Something like this:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 68cb86b37e56..f0ba3725b8e9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -112,9 +112,9 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
        struct mlx5_eq_comp *eq_comp =
                container_of(nb, struct mlx5_eq_comp, irq_nb);
        struct mlx5_eq *eq = &eq_comp->core;
+       bool schedule_tasklet = false;
        struct mlx5_eqe *eqe;
        int num_eqes = 0;
-       u32 cqn = -1;

        eqe = next_eqe_sw(eq);
        if (!eqe)
@@ -122,6 +122,7 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,

        do {
                struct mlx5_core_cq *cq;
+               u32 cqn;

                /* Make sure we read EQ entry contents after we've
                 * checked the ownership bit.
@@ -134,6 +135,7 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
                if (likely(cq)) {
                        ++cq->arm_sn;
                        cq->comp(cq, eqe);
+                       schedule_tasklet |= !!cq->tasklet_ctx.comp;
                        mlx5_cq_put(cq);
                } else {
                        dev_dbg_ratelimited(eq->dev->device,
@@ -147,7 +149,7 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 out:
        eq_update_ci(eq, 1);

-       if (cqn != -1)
+       if (schedule_tasklet)
                tasklet_schedule(&eq_comp->tasklet_ctx.task);

        return 0;

Thanks,
Caleb


* RE: [PATCH] mlx5: only schedule EQ comp tasklet if necessary
From: Parav Pandit @ 2024-10-30  3:16 UTC (permalink / raw)
  To: Caleb Sander
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org



> From: Caleb Sander <csander@purestorage.com>
> Sent: Tuesday, October 29, 2024 10:02 PM
> 
> On Mon, Oct 28, 2024 at 9:08 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > Hi
> >
> > > From: Caleb Sander Mateos <csander@purestorage.com>
> > > Sent: Sunday, October 27, 2024 9:37 AM
> > >
> > > Currently, the mlx5_eq_comp_int() interrupt handler schedules a
> > > tasklet to call
> > > mlx5_cq_tasklet_cb() if it processes any completions. For CQs whose
> > > completions don't need to be processed in tasklet context, this
> > > overhead is unnecessary. Atomic operations are needed to schedule,
> > > lock, and clear the tasklet. And when mlx5_cq_tasklet_cb() runs, it
> > > acquires a spin lock to access the list of CQs enqueued for processing.
> > >
> > > Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid
> > > this overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing
> > > the CQs to be processed in tasklet context, so it can schedule the
> > > tasklet. CQs that need tasklet processing have their interrupt comp
> > > handler set to mlx5_add_cq_to_tasklet(), so they will schedule the
> > > tasklet. CQs that don't need tasklet processing won't schedule the
> > > tasklet. To avoid scheduling the tasklet multiple times during the
> > > same interrupt, only schedule the tasklet in
> > > mlx5_add_cq_to_tasklet() if the tasklet work queue was empty before
> > > the new CQ was pushed to it.
> > >
> > > Note that the mlx4 driver works the same way: it schedules the
> > > tasklet in
> > > mlx4_add_cq_to_tasklet() and only if the work queue was empty before.
> > >
> > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > > ---
> > >  drivers/net/ethernet/mellanox/mlx5/core/cq.c | 5 +++++
> > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 5 +----
> > >  2 files changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> > > b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> > > index 4caa1b6f40ba..25f3b26db729 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> > > @@ -69,22 +69,27 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct
> > > *t) static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
> > >                                  struct mlx5_eqe *eqe)  {
> > >       unsigned long flags;
> > >       struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
> > > +     bool schedule_tasklet = false;
> > >
> > >       spin_lock_irqsave(&tasklet_ctx->lock, flags);
> > >       /* When migrating CQs between EQs will be implemented, please
> note
> > >        * that you need to sync this point. It is possible that
> > >        * while migrating a CQ, completions on the old EQs could
> > >        * still arrive.
> > >        */
> > >       if (list_empty_careful(&cq->tasklet_ctx.list)) {
> > >               mlx5_cq_hold(cq);
> > > +             schedule_tasklet = list_empty(&tasklet_ctx->list);
> > >               list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
> > >       }
> > >       spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
> > > +
> > > +     if (schedule_tasklet)
> > > +             tasklet_schedule(&tasklet_ctx->task);
> > >  }
> > >
> > >  /* Callers must verify outbox status in case of err */  int
> > > mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
> > >                  u32 *in, int inlen, u32 *out, int outlen) diff
> > > --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > index 68cb86b37e56..66fc17d9c949 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > > @@ -112,17 +112,17 @@ static int mlx5_eq_comp_int(struct
> > > notifier_block *nb,
> > >       struct mlx5_eq_comp *eq_comp =
> > >               container_of(nb, struct mlx5_eq_comp, irq_nb);
> > >       struct mlx5_eq *eq = &eq_comp->core;
> > >       struct mlx5_eqe *eqe;
> > >       int num_eqes = 0;
> > > -     u32 cqn = -1;
> > >
> > >       eqe = next_eqe_sw(eq);
> > >       if (!eqe)
> > >               goto out;
> > >
> > >       do {
> > > +             u32 cqn;
> > >               struct mlx5_core_cq *cq;
> > >
> > A small nit, cqn should be declared after cq to follow the netdev coding
> guidelines of [1].
> 
> Sure, will fix. Thanks for the reference.
> 
> >
> > >               /* Make sure we read EQ entry contents after we've
> > >                * checked the ownership bit.
> > >                */
> > > @@ -145,13 +145,10 @@ static int mlx5_eq_comp_int(struct
> > > notifier_block *nb,
> > >       } while ((++num_eqes < MLX5_EQ_POLLING_BUDGET) && (eqe =
> > > next_eqe_sw(eq)));
> > >
> > >  out:
> > >       eq_update_ci(eq, 1);
> > >
> > > -     if (cqn != -1)
> > > -             tasklet_schedule(&eq_comp->tasklet_ctx.task);
> > > -
> > Current code processes many EQEs and performs the check for
> tasklet_schedule only once in the cqn check.
> > While this change, on every EQE, the additional check will be done.
> > This will marginally make the interrupt handler slow.
> > Returning a bool from comp() wont be good either, and we cannot inline
> things here due to function pointer.
> >
> > The cost of scheduling null tasklet is higher than this if (schedule_tasklet)
> check.
> > In other series internally, I am working to reduce the cost of comp() itself
> unrelated to this change.
> > so it ok to have the additional check introduced here.
> 
> Right, there's definitely a tradeoff here.
> From what I could tell, there is only one CQ type that processes completions
> in tasklet context (user Infiniband CQs, running mlx5_ib_cq_comp()). All
> others handle their completions in interrupt context. Ideally the CQ types
> that don't need it would not pay the cost of the tasklet schedule and
> execution. There are several atomic operations involved in the tasklet path
> which are fairly expensive. In our TCP-heavy workload, we see 4% of the CPU
> time spent on the
> tasklet_trylock() in tasklet_action_common.constprop.0, with a smaller
> amount spent on the atomic operations in tasklet_schedule(),
> tasklet_clear_sched(), and acquiring the spinlock in mlx5_cq_tasklet_cb().
Please include these perf stats in the commit log in v1; they help to decide on the trade-off and also quantify it.

> I agree the additional branch per EQE should be cheaper than scheduling the
> unused tasklet, but the cost would be paid by Infiniband workloads while
> non-Infiniband workloads see the benefit.
> How about instead scheduling the tasklet in mlx5_eq_comp_int() if any of
> the CQs have a tasklet completion handler? That should get the best of both
> worlds: skipping the tasklet schedule for CQs that don't need it while
> ensuring the tasklet is only scheduled once per interrupt.
> Something like this:
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 68cb86b37e56..f0ba3725b8e9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -112,9 +112,9 @@ static int mlx5_eq_comp_int(struct notifier_block
> *nb,
>         struct mlx5_eq_comp *eq_comp =
>                 container_of(nb, struct mlx5_eq_comp, irq_nb);
>         struct mlx5_eq *eq = &eq_comp->core;
> +       bool schedule_tasklet = false;
>         struct mlx5_eqe *eqe;
>         int num_eqes = 0;
> -       u32 cqn = -1;
> 
>         eqe = next_eqe_sw(eq);
>         if (!eqe)
> @@ -122,6 +122,7 @@ static int mlx5_eq_comp_int(struct notifier_block
> *nb,
> 
>         do {
>                 struct mlx5_core_cq *cq;
> +               u32 cqn;
> 
>                 /* Make sure we read EQ entry contents after we've
>                  * checked the ownership bit.
> @@ -134,6 +135,7 @@ static int mlx5_eq_comp_int(struct notifier_block
> *nb,
>                 if (likely(cq)) {
>                         ++cq->arm_sn;
>                         cq->comp(cq, eqe);
> +                       schedule_tasklet |= !!cq->tasklet_ctx.comp;
>                         mlx5_cq_put(cq);
>                 } else {
>                         dev_dbg_ratelimited(eq->dev->device,
> @@ -147,7 +149,7 @@ static int mlx5_eq_comp_int(struct notifier_block
> *nb,
>  out:
>         eq_update_ci(eq, 1);
> 
> -       if (cqn != -1)
> +       if (schedule_tasklet)
>                 tasklet_schedule(&eq_comp->tasklet_ctx.task);
> 
>         return 0;
> 
The issue with the above change is that it makes the interrupt layer assume knowledge of the CQ layer (the upper layer above the interrupt), i.e. whether the tasklet is to be used or not.
I would rather keep the two layers separate, as in your patch.

> Thanks,
> Caleb


* [PATCH v2] mlx5: only schedule EQ comp tasklet if necessary
From: Caleb Sander Mateos @ 2024-10-30 17:06 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Caleb Sander Mateos, Parav Pandit, netdev, linux-rdma,
	linux-kernel

Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
whose completions don't need to be processed in tasklet context, this
adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
time spent on the tasklet_trylock() in tasklet_action_common(), with a
smaller amount spent on the atomic operations in tasklet_schedule(),
tasklet_clear_sched(), and locking the spinlock in mlx5_cq_tasklet_cb().
TCP completions are handled by mlx5e_completion_event(), which schedules
NAPI to poll the queue, so they don't need tasklet processing.

Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
to be processed in tasklet context, so it can schedule the tasklet. CQs
that need tasklet processing have their interrupt comp handler set to
mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
don't need tasklet processing won't schedule the tasklet. To avoid
scheduling the tasklet multiple times during the same interrupt, only
schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
queue was empty before the new CQ was pushed to it.

The additional branch in mlx5_add_cq_to_tasklet(), called for each EQE,
may add a small cost for the userspace Infiniband CQs whose completions
are processed in tasklet context. But this seems worth it to avoid the
tasklet overhead for CQs that don't need it.

Note that the mlx4 driver works the same way: it schedules the tasklet
in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 5 +++++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 5 +----
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 4caa1b6f40ba..25f3b26db729 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -69,22 +69,27 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct *t)
 static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
 				   struct mlx5_eqe *eqe)
 {
 	unsigned long flags;
 	struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
+	bool schedule_tasklet = false;
 
 	spin_lock_irqsave(&tasklet_ctx->lock, flags);
 	/* When migrating CQs between EQs will be implemented, please note
 	 * that you need to sync this point. It is possible that
 	 * while migrating a CQ, completions on the old EQs could
 	 * still arrive.
 	 */
 	if (list_empty_careful(&cq->tasklet_ctx.list)) {
 		mlx5_cq_hold(cq);
+		schedule_tasklet = list_empty(&tasklet_ctx->list);
 		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
 	}
 	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
+
+	if (schedule_tasklet)
+		tasklet_schedule(&tasklet_ctx->task);
 }
 
 /* Callers must verify outbox status in case of err */
 int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 		   u32 *in, int inlen, u32 *out, int outlen)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 859dcf09b770..3fd2091c11c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -112,14 +112,14 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 	struct mlx5_eq_comp *eq_comp =
 		container_of(nb, struct mlx5_eq_comp, irq_nb);
 	struct mlx5_eq *eq = &eq_comp->core;
 	struct mlx5_eqe *eqe;
 	int num_eqes = 0;
-	u32 cqn = -1;
 
 	while ((eqe = next_eqe_sw(eq))) {
 		struct mlx5_core_cq *cq;
+		u32 cqn;
 
 		/* Make sure we read EQ entry contents after we've
 		 * checked the ownership bit.
 		 */
 		dma_rmb();
@@ -142,13 +142,10 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 			break;
 	}
 
 	eq_update_ci(eq, 1);
 
-	if (cqn != -1)
-		tasklet_schedule(&eq_comp->tasklet_ctx.task);
-
 	return 0;
 }
 
 /* Some architectures don't latch interrupts when they are disabled, so using
  * mlx5_eq_poll_irq_disabled could end up losing interrupts while trying to
-- 
2.45.2



* RE: [PATCH v2] mlx5: only schedule EQ comp tasklet if necessary
From: Parav Pandit @ 2024-10-31  3:14 UTC (permalink / raw)
  To: Caleb Sander Mateos, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org


> From: Caleb Sander Mateos <csander@purestorage.com>
> Sent: Wednesday, October 30, 2024 10:36 PM
> Subject: [PATCH v2] mlx5: only schedule EQ comp tasklet if necessary
>
I didn't pay attention to the subject in the previous review.

It should be:

[PATCH net-next v2] mlx5/core: Schedule EQ comp tasklet only if necessary

 



* [PATCH net-next v3] mlx5/core: Schedule EQ comp tasklet only if necessary
From: Caleb Sander Mateos @ 2024-10-31 16:34 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Caleb Sander Mateos, Parav Pandit, netdev, linux-rdma,
	linux-kernel

Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
whose completions don't need to be processed in tasklet context, this
adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
time spent on the tasklet_trylock() in tasklet_action_common(), with a
smaller amount spent on the atomic operations in tasklet_schedule(),
tasklet_clear_sched(), and locking the spinlock in mlx5_cq_tasklet_cb().
TCP completions are handled by mlx5e_completion_event(), which schedules
NAPI to poll the queue, so they don't need tasklet processing.

Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
to be processed in tasklet context, so it can schedule the tasklet. CQs
that need tasklet processing have their interrupt comp handler set to
mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
don't need tasklet processing won't schedule the tasklet. To avoid
scheduling the tasklet multiple times during the same interrupt, only
schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
queue was empty before the new CQ was pushed to it.

The additional branch in mlx5_add_cq_to_tasklet(), called for each EQE,
may add a small cost for the userspace Infiniband CQs whose completions
are processed in tasklet context. But this seems worth it to avoid the
tasklet overhead for CQs that don't need it.

Note that the mlx4 driver works the same way: it schedules the tasklet
in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
---
v3: revise commit message
v2: reorder variable declarations, describe CPU profile results

 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 5 +++++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 5 +----
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 4caa1b6f40ba..25f3b26db729 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -69,22 +69,27 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct *t)
 static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
 				   struct mlx5_eqe *eqe)
 {
 	unsigned long flags;
 	struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
+	bool schedule_tasklet = false;
 
 	spin_lock_irqsave(&tasklet_ctx->lock, flags);
 	/* When migrating CQs between EQs will be implemented, please note
 	 * that you need to sync this point. It is possible that
 	 * while migrating a CQ, completions on the old EQs could
 	 * still arrive.
 	 */
 	if (list_empty_careful(&cq->tasklet_ctx.list)) {
 		mlx5_cq_hold(cq);
+		schedule_tasklet = list_empty(&tasklet_ctx->list);
 		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
 	}
 	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
+
+	if (schedule_tasklet)
+		tasklet_schedule(&tasklet_ctx->task);
 }
 
 /* Callers must verify outbox status in case of err */
 int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 		   u32 *in, int inlen, u32 *out, int outlen)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 859dcf09b770..3fd2091c11c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -112,14 +112,14 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 	struct mlx5_eq_comp *eq_comp =
 		container_of(nb, struct mlx5_eq_comp, irq_nb);
 	struct mlx5_eq *eq = &eq_comp->core;
 	struct mlx5_eqe *eqe;
 	int num_eqes = 0;
-	u32 cqn = -1;
 
 	while ((eqe = next_eqe_sw(eq))) {
 		struct mlx5_core_cq *cq;
+		u32 cqn;
 
 		/* Make sure we read EQ entry contents after we've
 		 * checked the ownership bit.
 		 */
 		dma_rmb();
@@ -142,13 +142,10 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 			break;
 	}
 
 	eq_update_ci(eq, 1);
 
-	if (cqn != -1)
-		tasklet_schedule(&eq_comp->tasklet_ctx.task);
-
 	return 0;
 }
 
 /* Some architectures don't latch interrupts when they are disabled, so using
  * mlx5_eq_poll_irq_disabled could end up losing interrupts while trying to
-- 
2.45.2



* Re: [PATCH net-next v3] mlx5/core: Schedule EQ comp tasklet only if necessary
From: Paolo Abeni @ 2024-11-05 12:21 UTC (permalink / raw)
  To: Caleb Sander Mateos, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski
  Cc: Parav Pandit, netdev, linux-rdma, linux-kernel

On 10/31/24 17:34, Caleb Sander Mateos wrote:
> Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
> to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
> whose completions don't need to be processed in tasklet context, this
> adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
> time spent on the tasklet_trylock() in tasklet_action_common(), with a
> smaller amount spent on the atomic operations in tasklet_schedule(),
> tasklet_clear_sched(), and locking the spinlock in mlx5_cq_tasklet_cb().
> TCP completions are handled by mlx5e_completion_event(), which schedules
> NAPI to poll the queue, so they don't need tasklet processing.
> 
> Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
> overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
> to be processed in tasklet context, so it can schedule the tasklet. CQs
> that need tasklet processing have their interrupt comp handler set to
> mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
> don't need tasklet processing won't schedule the tasklet. To avoid
> scheduling the tasklet multiple times during the same interrupt, only
> schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
> queue was empty before the new CQ was pushed to it.
> 
> The additional branch in mlx5_add_cq_to_tasklet(), called for each EQE,
> may add a small cost for the userspace Infiniband CQs whose completions
> are processed in tasklet context. But this seems worth it to avoid the
> tasklet overhead for CQs that don't need it.
> 
> Note that the mlx4 driver works the same way: it schedules the tasklet
> in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.
> 
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> Reviewed-by: Parav Pandit <parav@nvidia.com>

@Saeed, @Leon, @Tariq: I assume you will apply this one and include it in
the next mlx5 PR. Please correct me if I'm wrong.

Thanks,

Paolo



* Re: [PATCH net-next v3] mlx5/core: Schedule EQ comp tasklet only if necessary
From: Tariq Toukan @ 2024-11-05 17:28 UTC (permalink / raw)
  To: Paolo Abeni, Caleb Sander Mateos, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski
  Cc: Parav Pandit, netdev, linux-rdma, linux-kernel



On 05/11/2024 14:21, Paolo Abeni wrote:
> On 10/31/24 17:34, Caleb Sander Mateos wrote:
>> Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
>> to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
>> whose completions don't need to be processed in tasklet context, this
>> adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
>> time spent on the tasklet_trylock() in tasklet_action_common(), with a
>> smaller amount spent on the atomic operations in tasklet_schedule(),
>> tasklet_clear_sched(), and locking the spinlock in mlx5_cq_tasklet_cb().
>> TCP completions are handled by mlx5e_completion_event(), which schedules
>> NAPI to poll the queue, so they don't need tasklet processing.
>>
>> Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
>> overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
>> to be processed in tasklet context, so it can schedule the tasklet. CQs
>> that need tasklet processing have their interrupt comp handler set to
>> mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
>> don't need tasklet processing won't schedule the tasklet. To avoid
>> scheduling the tasklet multiple times during the same interrupt, only
>> schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
>> queue was empty before the new CQ was pushed to it.
>>
>> The additional branch in mlx5_add_cq_to_tasklet(), called for each EQE,
>> may add a small cost for the userspace Infiniband CQs whose completions
>> are processed in tasklet context. But this seems worth it to avoid the
>> tasklet overhead for CQs that don't need it.
>>
>> Note that the mlx4 driver works the same way: it schedules the tasklet
>> in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.
>>
>> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
> 
> @Saeed, @Leon, @Tariq: I assume you will apply this one and include in
> the next mlx5 PR. please correct me if I'm wrong.
> 

Hi Paolo,

I am doing the mlx5 core/en maintainer work right now. I work
differently from Saeed, as I do not have a kernel.org branch of my own
(nor do I use Saeed's, of course).

Please consider applying this one and any others once I acknowledge/review.

Acked-by: Tariq Toukan <tariqt@nvidia.com>

Regards,
Tariq



* Re: [PATCH net-next v3] mlx5/core: Schedule EQ comp tasklet only if necessary
From: Saeed Mahameed @ 2024-11-05 18:56 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Parav Pandit, netdev, linux-rdma, linux-kernel

On 31 Oct 10:34, Caleb Sander Mateos wrote:
>Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
>to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
>whose completions don't need to be processed in tasklet context, this
>adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
>time spent on the tasklet_trylock() in tasklet_action_common(), with a
>smaller amount spent on the atomic operations in tasklet_schedule(),
>tasklet_clear_sched(), and locking the spinlock in mlx5_cq_tasklet_cb().
>TCP completions are handled by mlx5e_completion_event(), which schedules
>NAPI to poll the queue, so they don't need tasklet processing.
>
>Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
>overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
>to be processed in tasklet context, so it can schedule the tasklet. CQs
>that need tasklet processing have their interrupt comp handler set to
>mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
>don't need tasklet processing won't schedule the tasklet. To avoid
>scheduling the tasklet multiple times during the same interrupt, only
>schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
>queue was empty before the new CQ was pushed to it.
>
>The additional branch in mlx5_add_cq_to_tasklet(), called for each EQE,
>may add a small cost for the userspace Infiniband CQs whose completions
>are processed in tasklet context. But this seems worth it to avoid the
>tasklet overhead for CQs that don't need it.
>
>Note that the mlx4 driver works the same way: it schedules the tasklet
>in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.
>
>Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
>Reviewed-by: Parav Pandit <parav@nvidia.com>
>---
>v3: revise commit message
>v2: reorder variable declarations, describe CPU profile results
>
> drivers/net/ethernet/mellanox/mlx5/core/cq.c | 5 +++++
> drivers/net/ethernet/mellanox/mlx5/core/eq.c | 5 +----
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
>index 4caa1b6f40ba..25f3b26db729 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
>@@ -69,22 +69,27 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct *t)
> static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
> 				   struct mlx5_eqe *eqe)
> {
> 	unsigned long flags;
> 	struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
>+	bool schedule_tasklet = false;
>
> 	spin_lock_irqsave(&tasklet_ctx->lock, flags);
> 	/* When migrating CQs between EQs will be implemented, please note
> 	 * that you need to sync this point. It is possible that
> 	 * while migrating a CQ, completions on the old EQs could
> 	 * still arrive.
> 	 */
> 	if (list_empty_careful(&cq->tasklet_ctx.list)) {
> 		mlx5_cq_hold(cq);

The condition here is counter-intuitive; please add a comment that relates
it to the tasklet routine mlx5_cq_tasklet_cb(), something like:
/* If this list isn't empty, the tasklet is already scheduled and has not yet
 * started executing the list. The spinlock here guarantees that the addition
 * of this CQ will be seen by the next execution, so rescheduling the tasklet
 * is not required.
 */

One other way to do this is to set tasklet_ctx.sched_flag = true inside
mlx5_add_cq_to_tasklet(), and then schedule once at the end of EQ IRQ
processing if (tasklet_ctx.sched_flag == true), to avoid "too early"
scheduling. But since the tasklet can't run until the IRQ handler returns,
I think your solution shouldn't suffer from "too early" scheduling.
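
For illustration only, a rough sketch of that alternative (the sched_flag
field is hypothetical and does not exist in the driver today; untested):

static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
				   struct mlx5_eqe *eqe)
{
	struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
	unsigned long flags;

	spin_lock_irqsave(&tasklet_ctx->lock, flags);
	if (list_empty_careful(&cq->tasklet_ctx.list)) {
		mlx5_cq_hold(cq);
		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
		/* hypothetical field: only record that a schedule is needed */
		tasklet_ctx->sched_flag = true;
	}
	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
}

and then, once per interrupt at the end of mlx5_eq_comp_int():

	if (eq_comp->tasklet_ctx.sched_flag) {
		eq_comp->tasklet_ctx.sched_flag = false;
		tasklet_schedule(&eq_comp->tasklet_ctx.task);
	}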

Acked-by: Saeed Mahameed <saeedm@nvidia.com>


>+		schedule_tasklet = list_empty(&tasklet_ctx->list);
> 		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
> 	}
> 	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
>+
>+	if (schedule_tasklet)
>+		tasklet_schedule(&tasklet_ctx->task);
> }
>
> /* Callers must verify outbox status in case of err */
> int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
> 		   u32 *in, int inlen, u32 *out, int outlen)
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>index 859dcf09b770..3fd2091c11c8 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
>@@ -112,14 +112,14 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
> 	struct mlx5_eq_comp *eq_comp =
> 		container_of(nb, struct mlx5_eq_comp, irq_nb);
> 	struct mlx5_eq *eq = &eq_comp->core;
> 	struct mlx5_eqe *eqe;
> 	int num_eqes = 0;
>-	u32 cqn = -1;
>
> 	while ((eqe = next_eqe_sw(eq))) {
> 		struct mlx5_core_cq *cq;
>+		u32 cqn;
>
> 		/* Make sure we read EQ entry contents after we've
> 		 * checked the ownership bit.
> 		 */
> 		dma_rmb();
>@@ -142,13 +142,10 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
> 			break;
> 	}
>
> 	eq_update_ci(eq, 1);
>
>-	if (cqn != -1)
>-		tasklet_schedule(&eq_comp->tasklet_ctx.task);
>-
> 	return 0;
> }
>
> /* Some architectures don't latch interrupts when they are disabled, so using
>  * mlx5_eq_poll_irq_disabled could end up losing interrupts while trying to
>-- 
>2.45.2
>
>


* Re: [PATCH net-next v3] mlx5/core: Schedule EQ comp tasklet only if necessary
From: Caleb Sander @ 2024-11-05 20:39 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Parav Pandit, netdev, linux-rdma, linux-kernel

On Tue, Nov 5, 2024 at 10:56 AM Saeed Mahameed <saeed@kernel.org> wrote:
>
> On 31 Oct 10:34, Caleb Sander Mateos wrote:
> >Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
> >to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
> >whose completions don't need to be processed in tasklet context, this
> >adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
> >time spent on the tasklet_trylock() in tasklet_action_common(), with a
> >smaller amount spent on the atomic operations in tasklet_schedule(),
> >tasklet_clear_sched(), and locking the spinlock in mlx5_cq_tasklet_cb().
> >TCP completions are handled by mlx5e_completion_event(), which schedules
> >NAPI to poll the queue, so they don't need tasklet processing.
> >
> >Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
> >overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
> >to be processed in tasklet context, so it can schedule the tasklet. CQs
> >that need tasklet processing have their interrupt comp handler set to
> >mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
> >don't need tasklet processing won't schedule the tasklet. To avoid
> >scheduling the tasklet multiple times during the same interrupt, only
> >schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
> >queue was empty before the new CQ was pushed to it.
> >
> >The additional branch in mlx5_add_cq_to_tasklet(), called for each EQE,
> >may add a small cost for the userspace Infiniband CQs whose completions
> >are processed in tasklet context. But this seems worth it to avoid the
> >tasklet overhead for CQs that don't need it.
> >
> >Note that the mlx4 driver works the same way: it schedules the tasklet
> >in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.
> >
> >Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> >Reviewed-by: Parav Pandit <parav@nvidia.com>
> >---
> >v3: revise commit message
> >v2: reorder variable declarations, describe CPU profile results
> >
> > drivers/net/ethernet/mellanox/mlx5/core/cq.c | 5 +++++
> > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 5 +----
> > 2 files changed, 6 insertions(+), 4 deletions(-)
> >
> >diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> >index 4caa1b6f40ba..25f3b26db729 100644
> >--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> >+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> >@@ -69,22 +69,27 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct *t)
> > static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
> >                                  struct mlx5_eqe *eqe)
> > {
> >       unsigned long flags;
> >       struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
> >+      bool schedule_tasklet = false;
> >
> >       spin_lock_irqsave(&tasklet_ctx->lock, flags);
> >       /* When migrating CQs between EQs will be implemented, please note
> >        * that you need to sync this point. It is possible that
> >        * while migrating a CQ, completions on the old EQs could
> >        * still arrive.
> >        */
> >       if (list_empty_careful(&cq->tasklet_ctx.list)) {
> >               mlx5_cq_hold(cq);
>
> The condition here is counter intuitive, please add a comment that relates
> to the tasklet routine mlx5_cq_tasklet_cb, something like.
> /* If this list isn't empty, the tasklet is already scheduled, and not yet
>   * executing the list, the spinlock here guarantees the addition of this CQ
>   * will be seen by the next execution, so rescheduling the tasklet is not
>   * required */

Sure, will send out a v4.

>
> One other way to do this, is to flag tasklet_ctx.sched_flag = true, inside
> mlx5_add_cq_to_tasklet, and then schedule once at the end of eq irq processing
> if (tasklet_ctx.sched_flag == true). to avoid "too" early scheduling, but
> since the tasklet can't run until the irq handler returns, I think your
> solution shouldn't suffer from "too" early scheduling ..

Right, that was my thinking behind the list_empty(&tasklet_ctx->list) check.


* [PATCH net-next v4] mlx5/core: Schedule EQ comp tasklet only if necessary
From: Caleb Sander Mateos @ 2024-11-05 20:39 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Caleb Sander Mateos, Parav Pandit, netdev, linux-rdma,
	linux-kernel

Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
whose completions don't need to be processed in tasklet context, this
adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
time spent on the tasklet_trylock() in tasklet_action_common(), with a
smaller amount spent on the atomic operations in tasklet_schedule(),
tasklet_clear_sched(), and locking the spinlock in mlx5_cq_tasklet_cb().
TCP completions are handled by mlx5e_completion_event(), which schedules
NAPI to poll the queue, so they don't need tasklet processing.

Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
to be processed in tasklet context, so it can schedule the tasklet. CQs
that need tasklet processing have their interrupt comp handler set to
mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
don't need tasklet processing won't schedule the tasklet. To avoid
scheduling the tasklet multiple times during the same interrupt, only
schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
queue was empty before the new CQ was pushed to it.

The additional branch in mlx5_add_cq_to_tasklet(), called for each EQE,
may add a small cost for the userspace Infiniband CQs whose completions
are processed in tasklet context. But this seems worth it to avoid the
tasklet overhead for CQs that don't need it.

Note that the mlx4 driver works the same way: it schedules the tasklet
in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
---
v4: add comment describing tasklet schedule condition
v3: revise commit message
v2: reorder variable declarations, describe CPU profile results

 drivers/net/ethernet/mellanox/mlx5/core/cq.c | 11 +++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/eq.c |  5 +----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 4caa1b6f40ba..1fd403713baf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -69,22 +69,33 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct *t)
 static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
 				   struct mlx5_eqe *eqe)
 {
 	unsigned long flags;
 	struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
+	bool schedule_tasklet = false;
 
 	spin_lock_irqsave(&tasklet_ctx->lock, flags);
 	/* When migrating CQs between EQs will be implemented, please note
 	 * that you need to sync this point. It is possible that
 	 * while migrating a CQ, completions on the old EQs could
 	 * still arrive.
 	 */
 	if (list_empty_careful(&cq->tasklet_ctx.list)) {
 		mlx5_cq_hold(cq);
+		/* If the tasklet CQ work list isn't empty, mlx5_cq_tasklet_cb()
+		 * is scheduled/running and hasn't processed the list yet, so it
+		 * will see this added CQ when it runs. If the list is empty,
+		 * the tasklet needs to be scheduled to pick up the CQ. The
+		 * spinlock avoids any race with the tasklet accessing the list.
+		 */
+		schedule_tasklet = list_empty(&tasklet_ctx->list);
 		list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
 	}
 	spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
+
+	if (schedule_tasklet)
+		tasklet_schedule(&tasklet_ctx->task);
 }
 
 /* Callers must verify outbox status in case of err */
 int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
 		   u32 *in, int inlen, u32 *out, int outlen)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 859dcf09b770..3fd2091c11c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -112,14 +112,14 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 	struct mlx5_eq_comp *eq_comp =
 		container_of(nb, struct mlx5_eq_comp, irq_nb);
 	struct mlx5_eq *eq = &eq_comp->core;
 	struct mlx5_eqe *eqe;
 	int num_eqes = 0;
-	u32 cqn = -1;
 
 	while ((eqe = next_eqe_sw(eq))) {
 		struct mlx5_core_cq *cq;
+		u32 cqn;
 
 		/* Make sure we read EQ entry contents after we've
 		 * checked the ownership bit.
 		 */
 		dma_rmb();
@@ -142,13 +142,10 @@ static int mlx5_eq_comp_int(struct notifier_block *nb,
 			break;
 	}
 
 	eq_update_ci(eq, 1);
 
-	if (cqn != -1)
-		tasklet_schedule(&eq_comp->tasklet_ctx.task);
-
 	return 0;
 }
 
 /* Some architectures don't latch interrupts when they are disabled, so using
  * mlx5_eq_poll_irq_disabled could end up losing interrupts while trying to
-- 
2.45.2



* Re: [PATCH net-next v4] mlx5/core: Schedule EQ comp tasklet only if necessary
From: Jakub Kicinski @ 2024-11-11 22:13 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni, Parav Pandit, netdev,
	linux-rdma, linux-kernel

On Tue,  5 Nov 2024 13:39:59 -0700 Caleb Sander Mateos wrote:
> Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
> to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
> whose completions don't need to be processed in tasklet context, this
> adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
> time spent on the tasklet_trylock() in tasklet_action_common(), with a
> smaller amount spent on the atomic operations in tasklet_schedule(),
> tasklet_clear_sched(), and locking the spinlock in mlx5_cq_tasklet_cb().
> TCP completions are handled by mlx5e_completion_event(), which schedules
> NAPI to poll the queue, so they don't need tasklet processing.

Applied, thanks
