* [PATCH net-next v2 1/4] net: dql: Avoid calling BUG() when WARN() is enough
2024-04-08 17:25 [PATCH v2 0/4] net : dqs: optimize if stall threshold is not set Breno Leitao
@ 2024-04-08 17:25 ` Breno Leitao
2024-04-08 17:25 ` [PATCH net-next v2 2/4] net: dql: Separate queue function responsibilities Breno Leitao
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2024-04-08 17:25 UTC (permalink / raw)
To: kuba, davem, pabeni, edumazet; +Cc: linux-kernel, netdev
If the dql_queued() function receives an invalid argument, WARN about it
and continue, instead of crashing the kernel.
This was raised by checkpatch, when I am refactoring this code (see
following patch/commit)
WARNING: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/linux/dynamic_queue_limits.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 5693a4be0d9a..ff9c65841ae8 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -91,7 +91,8 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
{
unsigned long map, now, now_hi, i;
- BUG_ON(count > DQL_MAX_OBJECT);
+ if (WARN_ON_ONCE(count > DQL_MAX_OBJECT))
+ return;
dql->last_obj_cnt = count;
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net-next v2 2/4] net: dql: Separate queue function responsibilities
2024-04-08 17:25 [PATCH v2 0/4] net : dqs: optimize if stall threshold is not set Breno Leitao
2024-04-08 17:25 ` [PATCH net-next v2 1/4] net: dql: Avoid calling BUG() when WARN() is enough Breno Leitao
@ 2024-04-08 17:25 ` Breno Leitao
2024-04-08 17:25 ` [PATCH net-next v2 3/4] net: dql: Optimize stall information population Breno Leitao
2024-04-08 17:25 ` [PATCH net-next v2 4/4] net: dqs: make struct dql more cache efficient Breno Leitao
3 siblings, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2024-04-08 17:25 UTC (permalink / raw)
To: kuba, davem, pabeni, edumazet; +Cc: linux-kernel, netdev
The dql_queued() function currently handles both queuing object counts
and populating bitmaps for reporting stalls.
This commit splits the bitmap population into a separate function,
allowing for conditional invocation in scenarios where the feature is
disabled.
This refactor maintains functionality while improving code
organization.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/linux/dynamic_queue_limits.h | 44 ++++++++++++++++------------
1 file changed, 25 insertions(+), 19 deletions(-)
diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index ff9c65841ae8..9980df0b7247 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -83,28 +83,11 @@ struct dql {
#define DQL_MAX_OBJECT (UINT_MAX / 16)
#define DQL_MAX_LIMIT ((UINT_MAX / 2) - DQL_MAX_OBJECT)
-/*
- * Record number of objects queued. Assumes that caller has already checked
- * availability in the queue with dql_avail.
- */
-static inline void dql_queued(struct dql *dql, unsigned int count)
+/* Populate the bitmap to be processed later in dql_check_stall() */
+static inline void dql_queue_stall(struct dql *dql)
{
unsigned long map, now, now_hi, i;
- if (WARN_ON_ONCE(count > DQL_MAX_OBJECT))
- return;
-
- dql->last_obj_cnt = count;
-
- /* We want to force a write first, so that cpu do not attempt
- * to get cache line containing last_obj_cnt, num_queued, adj_limit
- * in Shared state, but directly does a Request For Ownership
- * It is only a hint, we use barrier() only.
- */
- barrier();
-
- dql->num_queued += count;
-
now = jiffies;
now_hi = now / BITS_PER_LONG;
@@ -134,6 +117,29 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
WRITE_ONCE(DQL_HIST_ENT(dql, now_hi), map | BIT_MASK(now));
}
+/*
+ * Record number of objects queued. Assumes that caller has already checked
+ * availability in the queue with dql_avail.
+ */
+static inline void dql_queued(struct dql *dql, unsigned int count)
+{
+ if (WARN_ON_ONCE(count > DQL_MAX_OBJECT))
+ return;
+
+ dql->last_obj_cnt = count;
+
+ /* We want to force a write first, so that cpu do not attempt
+ * to get cache line containing last_obj_cnt, num_queued, adj_limit
+ * in Shared state, but directly does a Request For Ownership
+ * It is only a hint, we use barrier() only.
+ */
+ barrier();
+
+ dql->num_queued += count;
+
+ dql_queue_stall(dql);
+}
+
/* Returns how many objects can be queued, < 0 indicates over limit. */
static inline int dql_avail(const struct dql *dql)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net-next v2 3/4] net: dql: Optimize stall information population
2024-04-08 17:25 [PATCH v2 0/4] net : dqs: optimize if stall threshold is not set Breno Leitao
2024-04-08 17:25 ` [PATCH net-next v2 1/4] net: dql: Avoid calling BUG() when WARN() is enough Breno Leitao
2024-04-08 17:25 ` [PATCH net-next v2 2/4] net: dql: Separate queue function responsibilities Breno Leitao
@ 2024-04-08 17:25 ` Breno Leitao
2024-04-08 17:25 ` [PATCH net-next v2 4/4] net: dqs: make struct dql more cache efficient Breno Leitao
3 siblings, 0 replies; 8+ messages in thread
From: Breno Leitao @ 2024-04-08 17:25 UTC (permalink / raw)
To: kuba, davem, pabeni, edumazet; +Cc: linux-kernel, netdev
When Dynamic Queue Limit (DQL) is set, it always populate stall
information through dql_queue_stall(). However, this information is
only necessary if a stall threshold is set, stored in struct
dql->stall_thrs.
dql_queue_stall() is cheap, but not free, since it does have memory
barriers and so forth.
Do not call dql_queue_stall() if there is no stall threshold set, and
save some CPU cycles.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/linux/dynamic_queue_limits.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 9980df0b7247..869afb800ea1 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -137,7 +137,9 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
dql->num_queued += count;
- dql_queue_stall(dql);
+ /* Only populate stall information if the threshold is set */
+ if (READ_ONCE(dql->stall_thrs))
+ dql_queue_stall(dql);
}
/* Returns how many objects can be queued, < 0 indicates over limit. */
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net-next v2 4/4] net: dqs: make struct dql more cache efficient
2024-04-08 17:25 [PATCH v2 0/4] net : dqs: optimize if stall threshold is not set Breno Leitao
` (2 preceding siblings ...)
2024-04-08 17:25 ` [PATCH net-next v2 3/4] net: dql: Optimize stall information population Breno Leitao
@ 2024-04-08 17:25 ` Breno Leitao
2024-04-10 0:21 ` Jakub Kicinski
3 siblings, 1 reply; 8+ messages in thread
From: Breno Leitao @ 2024-04-08 17:25 UTC (permalink / raw)
To: kuba, davem, pabeni, edumazet; +Cc: linux-kernel, netdev
With the previous change, struct dqs->stall_thrs will be in the hot path
(at queue side), even if DQS is disabled.
The other fields accessed in this function (last_obj_cnt and num_queued)
are in the first cache line, let's move this field (stall_thrs) to the
very first cache line, since there is a hole there.
This does not change the structure size, since it moves an short (2
bytes) to 4-bytes whole in the first cache line.
This is the new structure format now:
struct dql {
unsigned int num_queued; /* Total ever queued */
unsigned int last_obj_cnt; /* Count at last queuing */
...
short unsigned int stall_thrs; /* 12 2 */
/* XXX 2 bytes hole, try to pack */
...
/* --- cacheline 1 boundary (64 bytes) --- */
...
/* Longest stall detected, reported to user */
short unsigned int stall_max; /* 100 2 */
/* XXX 2 bytes hole, try to pack */
};
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/linux/dynamic_queue_limits.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 869afb800ea1..281298e77a15 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -50,6 +50,9 @@ struct dql {
unsigned int adj_limit; /* limit + num_completed */
unsigned int last_obj_cnt; /* Count at last queuing */
+ /* Stall threshold (in jiffies), defined by user */
+ unsigned short stall_thrs;
+
unsigned long history_head; /* top 58 bits of jiffies */
/* stall entries, a bit per entry */
unsigned long history[DQL_HIST_LEN];
@@ -71,8 +74,6 @@ struct dql {
unsigned int min_limit; /* Minimum limit */
unsigned int slack_hold_time; /* Time to measure slack */
- /* Stall threshold (in jiffies), defined by user */
- unsigned short stall_thrs;
/* Longest stall detected, reported to user */
unsigned short stall_max;
unsigned long last_reap; /* Last reap (in jiffies) */
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH net-next v2 4/4] net: dqs: make struct dql more cache efficient
2024-04-08 17:25 ` [PATCH net-next v2 4/4] net: dqs: make struct dql more cache efficient Breno Leitao
@ 2024-04-10 0:21 ` Jakub Kicinski
2024-04-10 13:52 ` Breno Leitao
0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2024-04-10 0:21 UTC (permalink / raw)
To: Breno Leitao; +Cc: davem, pabeni, edumazet, linux-kernel, netdev
On Mon, 8 Apr 2024 10:25:56 -0700 Breno Leitao wrote:
> With the previous change, struct dqs->stall_thrs will be in the hot path
> (at queue side), even if DQS is disabled.
>
> The other fields accessed in this function (last_obj_cnt and num_queued)
> are in the first cache line, let's move this field (stall_thrs) to the
> very first cache line, since there is a hole there.
>
> This does not change the structure size, since it moves an short (2
> bytes) to 4-bytes whole in the first cache line.
Doesn't this move the cache line bouncing problem to the other side?
Eric said "copy" I read that as "have two fields with the same value".
I think it's single digit number of alu instructions we'd be saving
here, not super convinced patch 3 is the right trade off...
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v2 4/4] net: dqs: make struct dql more cache efficient
2024-04-10 0:21 ` Jakub Kicinski
@ 2024-04-10 13:52 ` Breno Leitao
2024-04-11 1:45 ` Jakub Kicinski
0 siblings, 1 reply; 8+ messages in thread
From: Breno Leitao @ 2024-04-10 13:52 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: davem, pabeni, edumazet, linux-kernel, netdev
On Tue, Apr 09, 2024 at 05:21:49PM -0700, Jakub Kicinski wrote:
> On Mon, 8 Apr 2024 10:25:56 -0700 Breno Leitao wrote:
> > With the previous change, struct dqs->stall_thrs will be in the hot path
> > (at queue side), even if DQS is disabled.
> >
> > The other fields accessed in this function (last_obj_cnt and num_queued)
> > are in the first cache line, let's move this field (stall_thrs) to the
> > very first cache line, since there is a hole there.
> >
> > This does not change the structure size, since it moves an short (2
> > bytes) to 4-bytes whole in the first cache line.
>
> Doesn't this move the cache line bouncing problem to the other side?
I think so. Looking at dql_check_stall(), it only uses fields in the
second cache line, except now 'dql->stall_thrs' that is in the first
cache line.
> Eric said "copy" I read that as "have two fields with the same value".
Sorry, I misunderstood it. I can create two fields, and update them
together at the only place where they will be updated
(bql_set_stall_thrs).
> I think it's single digit number of alu instructions we'd be saving
> here, not super convinced patch 3 is the right trade off...
Right. I was more concerned about the write barriers (smp_wmb()) inside
the loop which happen quite frequently.
But, if you think that the this is not the right approach, I can drop
this whole patchset. Do you think a profiler will us here?
Thanks!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v2 4/4] net: dqs: make struct dql more cache efficient
2024-04-10 13:52 ` Breno Leitao
@ 2024-04-11 1:45 ` Jakub Kicinski
0 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2024-04-11 1:45 UTC (permalink / raw)
To: Breno Leitao; +Cc: davem, pabeni, edumazet, linux-kernel, netdev
On Wed, 10 Apr 2024 06:52:56 -0700 Breno Leitao wrote:
> > Doesn't this move the cache line bouncing problem to the other side?
>
> I think so. Looking at dql_check_stall(), it only uses fields in the
> second cache line, except now 'dql->stall_thrs' that is in the first
> cache line.
We do read num_queued at the beginning of dql_completed().
So maybe we we move the read of the threshold there we will be fine.
^ permalink raw reply [flat|nested] 8+ messages in thread