netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/3] net_sched: sch_sfq: reject a derived limit of 1
@ 2025-03-28 20:16 Octavian Purdila
  2025-03-28 20:16 ` [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration Octavian Purdila
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Octavian Purdila @ 2025-03-28 20:16 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri
  Cc: davem, edumazet, kuba, pabeni, horms, shuah, netdev,
	Octavian Purdila

Because sfq parameters can influence each other there can be
situations where although the user sets a limit of 2 it can be lowered
to 1:

$ tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 depth 1
$ tc qdisc show dev dummy0
qdisc sfq 1: dev dummy0 root refcnt 2 limit 1p quantum 1514b depth 1 divisor 1024

$ tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 10 depth 1 divisor 1
$ tc qdisc show dev dummy0
qdisc sfq 2: root refcnt 2 limit 1p quantum 1514b depth 1 divisor 1

As a limit of 1 is invalid, this patch series moves the limit
validation to after all configuration changes have been done. To do
so, the configuration is done in a temporary work area then applied to
the internal state.

The patch series also adds new test cases.

Octavian Purdila (3):
  net_sched: sch_sfq: use a temporary work area for validating
    configuration
  net_sched: sch_sfq: move the limit validation
  selftests/tc-testing: sfq: check that a derived limit of 1 is rejected

 net/sched/sch_sfq.c                           | 70 ++++++++++++++-----
 .../tc-testing/tc-tests/qdiscs/sfq.json       | 36 ++++++++++
 2 files changed, 90 insertions(+), 16 deletions(-)

-- 
2.49.0.472.ge94155a9ec-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration
  2025-03-28 20:16 [PATCH net 0/3] net_sched: sch_sfq: reject a derived limit of 1 Octavian Purdila
@ 2025-03-28 20:16 ` Octavian Purdila
  2025-03-30 23:49   ` Cong Wang
  2025-03-28 20:16 ` [PATCH net 2/3] net_sched: sch_sfq: move the limit validation Octavian Purdila
  2025-03-28 20:16 ` [PATCH net 3/3] selftests/tc-testing: sfq: check that a derived limit of 1 is rejected Octavian Purdila
  2 siblings, 1 reply; 9+ messages in thread
From: Octavian Purdila @ 2025-03-28 20:16 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri
  Cc: davem, edumazet, kuba, pabeni, horms, shuah, netdev,
	Octavian Purdila

Many configuration parameters have influence on others (e.g. divisor
-> flows -> limit, depth -> limit) and so it is difficult to correctly
do all of the validation before applying the configuration. And if a
validation error is detected late it is difficult to roll back a
partially applied configuration.

To avoid these issues use a temporary work area to update and validate
the configuration and only then apply the configuration to the
internal state.

Signed-off-by: Octavian Purdila <tavip@google.com>
---
 net/sched/sch_sfq.c | 60 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 48 insertions(+), 12 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 65d5b59da583..027a3fde2139 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
 	struct red_parms *p = NULL;
 	struct sk_buff *to_free = NULL;
 	struct sk_buff *tail = NULL;
+	/* work area for validating changes before committing them */
+	struct {
+		int limit;
+		unsigned int divisor;
+		unsigned int maxflows;
+		int perturb_period;
+		unsigned int quantum;
+		u8 headdrop;
+		u8 maxdepth;
+		u8 flags;
+	} tmp;
+
 
 	if (opt->nla_len < nla_attr_size(sizeof(*ctl)))
 		return -EINVAL;
@@ -656,36 +668,60 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
 		NL_SET_ERR_MSG_MOD(extack, "invalid limit");
 		return -EINVAL;
 	}
+
 	sch_tree_lock(sch);
+
+	/* copy configuration to work area */
+	tmp.limit = q->limit;
+	tmp.divisor = q->divisor;
+	tmp.headdrop = q->headdrop;
+	tmp.maxdepth = q->maxdepth;
+	tmp.maxflows = q->maxflows;
+	tmp.perturb_period = q->perturb_period;
+	tmp.quantum = q->quantum;
+	tmp.flags = q->flags;
+
+	/* update and validate configuration */
 	if (ctl->quantum)
-		q->quantum = ctl->quantum;
-	WRITE_ONCE(q->perturb_period, ctl->perturb_period * HZ);
+		tmp.quantum = ctl->quantum;
+	tmp.perturb_period = ctl->perturb_period * HZ;
 	if (ctl->flows)
-		q->maxflows = min_t(u32, ctl->flows, SFQ_MAX_FLOWS);
+		tmp.maxflows = min_t(u32, ctl->flows, SFQ_MAX_FLOWS);
 	if (ctl->divisor) {
-		q->divisor = ctl->divisor;
-		q->maxflows = min_t(u32, q->maxflows, q->divisor);
+		tmp.divisor = ctl->divisor;
+		tmp.maxflows = min_t(u32, tmp.maxflows, tmp.divisor);
 	}
 	if (ctl_v1) {
 		if (ctl_v1->depth)
-			q->maxdepth = min_t(u32, ctl_v1->depth, SFQ_MAX_DEPTH);
+			tmp.maxdepth = min_t(u32, ctl_v1->depth, SFQ_MAX_DEPTH);
 		if (p) {
-			swap(q->red_parms, p);
-			red_set_parms(q->red_parms,
+			red_set_parms(p,
 				      ctl_v1->qth_min, ctl_v1->qth_max,
 				      ctl_v1->Wlog,
 				      ctl_v1->Plog, ctl_v1->Scell_log,
 				      NULL,
 				      ctl_v1->max_P);
 		}
-		q->flags = ctl_v1->flags;
-		q->headdrop = ctl_v1->headdrop;
+		tmp.flags = ctl_v1->flags;
+		tmp.headdrop = ctl_v1->headdrop;
 	}
 	if (ctl->limit) {
-		q->limit = min_t(u32, ctl->limit, q->maxdepth * q->maxflows);
-		q->maxflows = min_t(u32, q->maxflows, q->limit);
+		tmp.limit = min_t(u32, ctl->limit, tmp.maxdepth * tmp.maxflows);
+		tmp.maxflows = min_t(u32, tmp.maxflows, tmp.limit);
 	}
 
+	/* commit configuration, no return from this point further */
+	q->limit = tmp.limit;
+	q->divisor = tmp.divisor;
+	q->headdrop = tmp.headdrop;
+	q->maxdepth = tmp.maxdepth;
+	q->maxflows = tmp.maxflows;
+	WRITE_ONCE(q->perturb_period, tmp.perturb_period);
+	q->quantum = tmp.quantum;
+	q->flags = tmp.flags;
+	if (p)
+		swap(q->red_parms, p);
+
 	qlen = sch->q.qlen;
 	while (sch->q.qlen > q->limit) {
 		dropped += sfq_drop(sch, &to_free);
-- 
2.49.0.472.ge94155a9ec-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 2/3] net_sched: sch_sfq: move the limit validation
  2025-03-28 20:16 [PATCH net 0/3] net_sched: sch_sfq: reject a derived limit of 1 Octavian Purdila
  2025-03-28 20:16 ` [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration Octavian Purdila
@ 2025-03-28 20:16 ` Octavian Purdila
  2025-03-28 20:16 ` [PATCH net 3/3] selftests/tc-testing: sfq: check that a derived limit of 1 is rejected Octavian Purdila
  2 siblings, 0 replies; 9+ messages in thread
From: Octavian Purdila @ 2025-03-28 20:16 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri
  Cc: davem, edumazet, kuba, pabeni, horms, shuah, netdev,
	Octavian Purdila, syzbot

It is not sufficient to directly validate the limit on the data that
the user passes as it can be updated based on how the other parameters
are changed.

Move the check at the end of the configuration update process to also
catch scenarios where the limit is indirectly updated, for example
with the following configurations:

tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 depth 1
tc qdisc add dev dummy0 handle 1: root sfq limit 2 flows 1 divisor 1

This fixes the following syzkaller reported crash:

------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:6
index 65535 is out of range for type 'struct sfq_head[128]'
CPU: 1 UID: 0 PID: 3037 Comm: syz.2.16 Not tainted 6.14.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x201/0x300 lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:231 [inline]
 __ubsan_handle_out_of_bounds+0xf5/0x120 lib/ubsan.c:429
 sfq_link net/sched/sch_sfq.c:203 [inline]
 sfq_dec+0x53c/0x610 net/sched/sch_sfq.c:231
 sfq_dequeue+0x34e/0x8c0 net/sched/sch_sfq.c:493
 sfq_reset+0x17/0x60 net/sched/sch_sfq.c:518
 qdisc_reset+0x12e/0x600 net/sched/sch_generic.c:1035
 tbf_reset+0x41/0x110 net/sched/sch_tbf.c:339
 qdisc_reset+0x12e/0x600 net/sched/sch_generic.c:1035
 dev_reset_queue+0x100/0x1b0 net/sched/sch_generic.c:1311
 netdev_for_each_tx_queue include/linux/netdevice.h:2590 [inline]
 dev_deactivate_many+0x7e5/0xe70 net/sched/sch_generic.c:1375

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 10685681bafc ("net_sched: sch_sfq: don't allow 1 packet limit")
Signed-off-by: Octavian Purdila <tavip@google.com>
---
 net/sched/sch_sfq.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 027a3fde2139..a9f20cc98a1a 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -664,10 +664,6 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
 		if (!p)
 			return -ENOMEM;
 	}
-	if (ctl->limit == 1) {
-		NL_SET_ERR_MSG_MOD(extack, "invalid limit");
-		return -EINVAL;
-	}
 
 	sch_tree_lock(sch);
 
@@ -709,6 +705,12 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
 		tmp.limit = min_t(u32, ctl->limit, tmp.maxdepth * tmp.maxflows);
 		tmp.maxflows = min_t(u32, tmp.maxflows, tmp.limit);
 	}
+	if (tmp.limit == 1) {
+		sch_tree_unlock(sch);
+		kfree(p);
+		NL_SET_ERR_MSG_MOD(extack, "invalid limit");
+		return -EINVAL;
+	}
 
 	/* commit configuration, no return from this point further */
 	q->limit = tmp.limit;
-- 
2.49.0.472.ge94155a9ec-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net 3/3] selftests/tc-testing: sfq: check that a derived limit of 1 is rejected
  2025-03-28 20:16 [PATCH net 0/3] net_sched: sch_sfq: reject a derived limit of 1 Octavian Purdila
  2025-03-28 20:16 ` [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration Octavian Purdila
  2025-03-28 20:16 ` [PATCH net 2/3] net_sched: sch_sfq: move the limit validation Octavian Purdila
@ 2025-03-28 20:16 ` Octavian Purdila
  2 siblings, 0 replies; 9+ messages in thread
From: Octavian Purdila @ 2025-03-28 20:16 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri
  Cc: davem, edumazet, kuba, pabeni, horms, shuah, netdev,
	Octavian Purdila

Because the limit is updated indirectly when other parameters are
updated, there are cases where even though the user requests a limit
of 2 it can actually be set to 1.

Add the following test cases to check that the kernel rejects them:
- limit 2 depth 1 flows 1
- limit 2 depth 1 divisor 1

Signed-off-by: Octavian Purdila <tavip@google.com>
---
 .../tc-testing/tc-tests/qdiscs/sfq.json       | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/sfq.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/sfq.json
index 50e8d72781cb..28c6ce6da7db 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/sfq.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/sfq.json
@@ -228,5 +228,41 @@
         "matchCount": "0",
         "teardown": [
         ]
+    },
+    {
+        "id": "7f8f",
+        "name": "Check that a derived limit of 1 is rejected (limit 2 depth 1 flows 1)",
+        "category": [
+            "qdisc",
+            "sfq"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [],
+        "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root sfq limit 2 depth 1 flows 1",
+        "expExitCode": "2",
+        "verifyCmd": "$TC qdisc show dev $DUMMY",
+        "matchPattern": "sfq",
+        "matchCount": "0",
+        "teardown": []
+    },
+    {
+        "id": "5168",
+        "name": "Check that a derived limit of 1 is rejected (limit 2 depth 1 divisor 1)",
+        "category": [
+            "qdisc",
+            "sfq"
+        ],
+        "plugins": {
+            "requires": "nsPlugin"
+        },
+        "setup": [],
+        "cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root sfq limit 2 depth 1 divisor 1",
+        "expExitCode": "2",
+        "verifyCmd": "$TC qdisc show dev $DUMMY",
+        "matchPattern": "sfq",
+        "matchCount": "0",
+        "teardown": []
     }
 ]
-- 
2.49.0.472.ge94155a9ec-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration
  2025-03-28 20:16 ` [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration Octavian Purdila
@ 2025-03-30 23:49   ` Cong Wang
  2025-04-01  9:26     ` Paolo Abeni
  0 siblings, 1 reply; 9+ messages in thread
From: Cong Wang @ 2025-03-30 23:49 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: jhs, jiri, davem, edumazet, kuba, pabeni, horms, shuah, netdev

On Fri, Mar 28, 2025 at 01:16:32PM -0700, Octavian Purdila wrote:
> Many configuration parameters have influence on others (e.g. divisor
> -> flows -> limit, depth -> limit) and so it is difficult to correctly
> do all of the validation before applying the configuration. And if a
> validation error is detected late it is difficult to roll back a
> partially applied configuration.
> 
> To avoid these issues use a temporary work area to update and validate
> the configuration and only then apply the configuration to the
> internal state.
> 
> Signed-off-by: Octavian Purdila <tavip@google.com>
> ---
>  net/sched/sch_sfq.c | 60 ++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 48 insertions(+), 12 deletions(-)
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 65d5b59da583..027a3fde2139 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
>  	struct red_parms *p = NULL;
>  	struct sk_buff *to_free = NULL;
>  	struct sk_buff *tail = NULL;
> +	/* work area for validating changes before committing them */
> +	struct {
> +		int limit;
> +		unsigned int divisor;
> +		unsigned int maxflows;
> +		int perturb_period;
> +		unsigned int quantum;
> +		u8 headdrop;
> +		u8 maxdepth;
> +		u8 flags;
> +	} tmp;

Thanks for your patch. It reminds me again about the lacking of complete
RCU support in TC. ;-)

Instead of using a temporary struct, how about introducing a new one
called struct sfq_sched_opt and putting it inside struct sfq_sched_data?
It looks more elegant to me.

Regards,
Cong

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration
  2025-03-30 23:49   ` Cong Wang
@ 2025-04-01  9:26     ` Paolo Abeni
  2025-04-01 10:47       ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Abeni @ 2025-04-01  9:26 UTC (permalink / raw)
  To: Cong Wang, Octavian Purdila
  Cc: jhs, jiri, davem, edumazet, kuba, horms, shuah, netdev

On 3/31/25 1:49 AM, Cong Wang wrote:
> On Fri, Mar 28, 2025 at 01:16:32PM -0700, Octavian Purdila wrote:
>> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
>> index 65d5b59da583..027a3fde2139 100644
>> --- a/net/sched/sch_sfq.c
>> +++ b/net/sched/sch_sfq.c
>> @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
>>  	struct red_parms *p = NULL;
>>  	struct sk_buff *to_free = NULL;
>>  	struct sk_buff *tail = NULL;
>> +	/* work area for validating changes before committing them */
>> +	struct {
>> +		int limit;
>> +		unsigned int divisor;
>> +		unsigned int maxflows;
>> +		int perturb_period;
>> +		unsigned int quantum;
>> +		u8 headdrop;
>> +		u8 maxdepth;
>> +		u8 flags;
>> +	} tmp;
> 
> Thanks for your patch. It reminds me again about the lacking of complete
> RCU support in TC. ;-)
> 
> Instead of using a temporary struct, how about introducing a new one
> called struct sfq_sched_opt and putting it inside struct sfq_sched_data?
> It looks more elegant to me.

I agree with that. It should also make the code more compact. @Octavian,
please update the patch as per Cong's suggestion.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration
  2025-04-01  9:26     ` Paolo Abeni
@ 2025-04-01 10:47       ` Eric Dumazet
  2025-04-01 11:13         ` Paolo Abeni
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2025-04-01 10:47 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Cong Wang, Octavian Purdila, jhs, jiri, davem, kuba, horms, shuah,
	netdev

On Tue, Apr 1, 2025 at 11:27 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 3/31/25 1:49 AM, Cong Wang wrote:
> > On Fri, Mar 28, 2025 at 01:16:32PM -0700, Octavian Purdila wrote:
> >> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> >> index 65d5b59da583..027a3fde2139 100644
> >> --- a/net/sched/sch_sfq.c
> >> +++ b/net/sched/sch_sfq.c
> >> @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
> >>      struct red_parms *p = NULL;
> >>      struct sk_buff *to_free = NULL;
> >>      struct sk_buff *tail = NULL;
> >> +    /* work area for validating changes before committing them */
> >> +    struct {
> >> +            int limit;
> >> +            unsigned int divisor;
> >> +            unsigned int maxflows;
> >> +            int perturb_period;
> >> +            unsigned int quantum;
> >> +            u8 headdrop;
> >> +            u8 maxdepth;
> >> +            u8 flags;
> >> +    } tmp;
> >
> > Thanks for your patch. It reminds me again about the lacking of complete
> > RCU support in TC. ;-)
> >
> > Instead of using a temporary struct, how about introducing a new one
> > called struct sfq_sched_opt and putting it inside struct sfq_sched_data?
> > It looks more elegant to me.
>
> I agree with that. It should also make the code more compact. @Octavian,
> please update the patch as per Cong's suggestion.

The concern with this approach was data locality.

I had in my TODO list a patch to remove (accumulated over time) holes
and put together hot fields.

Something like :

struct sfq_sched_data {
int                        limit;                /*     0   0x4 */
unsigned int               divisor;              /*   0x4   0x4 */
u8                         headdrop;             /*   0x8   0x1 */
u8                         maxdepth;             /*   0x9   0x1 */
u8                         cur_depth;            /*   0xa   0x1 */
u8                         flags;                /*   0xb   0x1 */
unsigned int               quantum;              /*   0xc   0x4 */
siphash_key_t              perturbation;         /*  0x10  0x10 */
struct tcf_proto *         filter_list;          /*  0x20   0x8 */
struct tcf_block *         block;                /*  0x28   0x8 */
sfq_index *                ht;                   /*  0x30   0x8 */
struct sfq_slot *          slots;                /*  0x38   0x8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct red_parms *         red_parms;            /*  0x40   0x8 */
struct tc_sfqred_stats     stats;                /*  0x48  0x18 */
struct sfq_slot *          tail;                 /*  0x60   0x8 */
struct sfq_head            dep[128];             /*  0x68 0x200 */
/* --- cacheline 9 boundary (576 bytes) was 40 bytes ago --- */
unsigned int               maxflows;             /* 0x268   0x4 */
int                        perturb_period;       /* 0x26c   0x4 */
struct timer_list          perturb_timer;        /* 0x270  0x28 */

/* XXX last struct has 4 bytes of padding */

/* --- cacheline 10 boundary (640 bytes) was 24 bytes ago --- */
struct Qdisc *             sch;                  /* 0x298   0x8 */

/* size: 672, cachelines: 11, members: 20 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 32 bytes */
};


With this patch :

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 65d5b59da583..f8fec2bc0d25 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -110,10 +110,11 @@ struct sfq_sched_data {
        unsigned int    divisor;        /* number of slots in hash table */
        u8              headdrop;
        u8              maxdepth;       /* limit of packets per flow */
-
-       siphash_key_t   perturbation;
        u8              cur_depth;      /* depth of longest slot */
        u8              flags;
+       unsigned int    quantum;        /* Allotment per round: MUST
BE >= MTU */
+
+       siphash_key_t   perturbation;
        struct tcf_proto __rcu *filter_list;
        struct tcf_block *block;
        sfq_index       *ht;            /* Hash table ('divisor' slots) */
@@ -132,7 +133,6 @@ struct sfq_sched_data {

        unsigned int    maxflows;       /* number of flows in flows array */
        int             perturb_period;
-       unsigned int    quantum;        /* Allotment per round: MUST
BE >= MTU */
        struct timer_list perturb_timer;
        struct Qdisc    *sch;
 };

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration
  2025-04-01 10:47       ` Eric Dumazet
@ 2025-04-01 11:13         ` Paolo Abeni
  2025-04-01 16:36           ` Octavian Purdila
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Abeni @ 2025-04-01 11:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Octavian Purdila, jhs, jiri, davem, kuba, horms, shuah,
	netdev

On 4/1/25 12:47 PM, Eric Dumazet wrote:
> On Tue, Apr 1, 2025 at 11:27 AM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 3/31/25 1:49 AM, Cong Wang wrote:
>>> On Fri, Mar 28, 2025 at 01:16:32PM -0700, Octavian Purdila wrote:
>>>> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
>>>> index 65d5b59da583..027a3fde2139 100644
>>>> --- a/net/sched/sch_sfq.c
>>>> +++ b/net/sched/sch_sfq.c
>>>> @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
>>>>      struct red_parms *p = NULL;
>>>>      struct sk_buff *to_free = NULL;
>>>>      struct sk_buff *tail = NULL;
>>>> +    /* work area for validating changes before committing them */
>>>> +    struct {
>>>> +            int limit;
>>>> +            unsigned int divisor;
>>>> +            unsigned int maxflows;
>>>> +            int perturb_period;
>>>> +            unsigned int quantum;
>>>> +            u8 headdrop;
>>>> +            u8 maxdepth;
>>>> +            u8 flags;
>>>> +    } tmp;
>>>
>>> Thanks for your patch. It reminds me again about the lacking of complete
>>> RCU support in TC. ;-)
>>>
>>> Instead of using a temporary struct, how about introducing a new one
>>> called struct sfq_sched_opt and putting it inside struct sfq_sched_data?
>>> It looks more elegant to me.
>>
>> I agree with that. It should also make the code more compact. @Octavian,
>> please update the patch as per Cong's suggestion.
> 
> The concern with this approach was data locality.

I did not consider that aspect.

How about not using the struct at all, then?

	int cur_limit;
	// ...
	u8 cur_flags;

the 'tmp' struct is IMHO not so nice.

> I had in my TODO list a patch to remove (accumulated over time) holes
> and put together hot fields.
> 
> Something like :
> 
> struct sfq_sched_data {
> int                        limit;                /*     0   0x4 */
> unsigned int               divisor;              /*   0x4   0x4 */
> u8                         headdrop;             /*   0x8   0x1 */
> u8                         maxdepth;             /*   0x9   0x1 */
> u8                         cur_depth;            /*   0xa   0x1 */
> u8                         flags;                /*   0xb   0x1 */
> unsigned int               quantum;              /*   0xc   0x4 */
> siphash_key_t              perturbation;         /*  0x10  0x10 */
> struct tcf_proto *         filter_list;          /*  0x20   0x8 */
> struct tcf_block *         block;                /*  0x28   0x8 */
> sfq_index *                ht;                   /*  0x30   0x8 */
> struct sfq_slot *          slots;                /*  0x38   0x8 */
> /* --- cacheline 1 boundary (64 bytes) --- */
> struct red_parms *         red_parms;            /*  0x40   0x8 */
> struct tc_sfqred_stats     stats;                /*  0x48  0x18 */
> struct sfq_slot *          tail;                 /*  0x60   0x8 */
> struct sfq_head            dep[128];             /*  0x68 0x200 */
> /* --- cacheline 9 boundary (576 bytes) was 40 bytes ago --- */
> unsigned int               maxflows;             /* 0x268   0x4 */
> int                        perturb_period;       /* 0x26c   0x4 */
> struct timer_list          perturb_timer;        /* 0x270  0x28 */
> 
> /* XXX last struct has 4 bytes of padding */
> 
> /* --- cacheline 10 boundary (640 bytes) was 24 bytes ago --- */
> struct Qdisc *             sch;                  /* 0x298   0x8 */
> 
> /* size: 672, cachelines: 11, members: 20 */
> /* paddings: 1, sum paddings: 4 */
> /* last cacheline: 32 bytes */
> };
> 
> 
> With this patch :
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 65d5b59da583..f8fec2bc0d25 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -110,10 +110,11 @@ struct sfq_sched_data {
>         unsigned int    divisor;        /* number of slots in hash table */
>         u8              headdrop;
>         u8              maxdepth;       /* limit of packets per flow */
> -
> -       siphash_key_t   perturbation;
>         u8              cur_depth;      /* depth of longest slot */
>         u8              flags;
> +       unsigned int    quantum;        /* Allotment per round: MUST
> BE >= MTU */
> +
> +       siphash_key_t   perturbation;
>         struct tcf_proto __rcu *filter_list;
>         struct tcf_block *block;
>         sfq_index       *ht;            /* Hash table ('divisor' slots) */
> @@ -132,7 +133,6 @@ struct sfq_sched_data {
> 
>         unsigned int    maxflows;       /* number of flows in flows array */
>         int             perturb_period;

Would it make any sense to additionally move 'maxflows' and
'perturb_period' at the top, just after 'perturbation'?

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration
  2025-04-01 11:13         ` Paolo Abeni
@ 2025-04-01 16:36           ` Octavian Purdila
  0 siblings, 0 replies; 9+ messages in thread
From: Octavian Purdila @ 2025-04-01 16:36 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Eric Dumazet, Cong Wang, jhs, jiri, davem, kuba, horms, shuah,
	netdev

On Tue, Apr 1, 2025 at 4:13 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 4/1/25 12:47 PM, Eric Dumazet wrote:
> > On Tue, Apr 1, 2025 at 11:27 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 3/31/25 1:49 AM, Cong Wang wrote:
> >>> On Fri, Mar 28, 2025 at 01:16:32PM -0700, Octavian Purdila wrote:
> >>>> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> >>>> index 65d5b59da583..027a3fde2139 100644
> >>>> --- a/net/sched/sch_sfq.c
> >>>> +++ b/net/sched/sch_sfq.c
> >>>> @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt,
> >>>>      struct red_parms *p = NULL;
> >>>>      struct sk_buff *to_free = NULL;
> >>>>      struct sk_buff *tail = NULL;
> >>>> +    /* work area for validating changes before committing them */
> >>>> +    struct {
> >>>> +            int limit;
> >>>> +            unsigned int divisor;
> >>>> +            unsigned int maxflows;
> >>>> +            int perturb_period;
> >>>> +            unsigned int quantum;
> >>>> +            u8 headdrop;
> >>>> +            u8 maxdepth;
> >>>> +            u8 flags;
> >>>> +    } tmp;
> >>>
> >>> Thanks for your patch. It reminds me again about the lacking of complete
> >>> RCU support in TC. ;-)
> >>>
> >>> Instead of using a temporary struct, how about introducing a new one
> >>> called struct sfq_sched_opt and putting it inside struct sfq_sched_data?
> >>> It looks more elegant to me.
> >>
> >> I agree with that. It should also make the code more compact. @Octavian,
> >> please update the patch as per Cong's suggestion.
> >
> > The concern with this approach was data locality.
>
> I did not consider that aspect.
>
> How about not using the struct at all, then?
>
>         int cur_limit;
>         // ...
>         u8 cur_flags;
>
> the 'tmp' struct is IMHO not so nice.
>

Thanks Paolo and Cong for the review.

I'll drop the structure, my initial thoughts were that by grouping
them into a structure it would make it easier to understand the
purpose of those variables, especially since we have a lot of local
variables now.

> > I had in my TODO list a patch to remove (accumulated over time) holes
> > and put together hot fields.
> >
> > Something like :
> >
> > struct sfq_sched_data {
> > int                        limit;                /*     0   0x4 */
> > unsigned int               divisor;              /*   0x4   0x4 */
> > u8                         headdrop;             /*   0x8   0x1 */
> > u8                         maxdepth;             /*   0x9   0x1 */
> > u8                         cur_depth;            /*   0xa   0x1 */
> > u8                         flags;                /*   0xb   0x1 */
> > unsigned int               quantum;              /*   0xc   0x4 */
> > siphash_key_t              perturbation;         /*  0x10  0x10 */
> > struct tcf_proto *         filter_list;          /*  0x20   0x8 */
> > struct tcf_block *         block;                /*  0x28   0x8 */
> > sfq_index *                ht;                   /*  0x30   0x8 */
> > struct sfq_slot *          slots;                /*  0x38   0x8 */
> > /* --- cacheline 1 boundary (64 bytes) --- */
> > struct red_parms *         red_parms;            /*  0x40   0x8 */
> > struct tc_sfqred_stats     stats;                /*  0x48  0x18 */
> > struct sfq_slot *          tail;                 /*  0x60   0x8 */
> > struct sfq_head            dep[128];             /*  0x68 0x200 */
> > /* --- cacheline 9 boundary (576 bytes) was 40 bytes ago --- */
> > unsigned int               maxflows;             /* 0x268   0x4 */
> > int                        perturb_period;       /* 0x26c   0x4 */
> > struct timer_list          perturb_timer;        /* 0x270  0x28 */
> >
> > /* XXX last struct has 4 bytes of padding */
> >
> > /* --- cacheline 10 boundary (640 bytes) was 24 bytes ago --- */
> > struct Qdisc *             sch;                  /* 0x298   0x8 */
> >
> > /* size: 672, cachelines: 11, members: 20 */
> > /* paddings: 1, sum paddings: 4 */
> > /* last cacheline: 32 bytes */
> > };
> >
> >
> > With this patch :
> >
> > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> > index 65d5b59da583..f8fec2bc0d25 100644
> > --- a/net/sched/sch_sfq.c
> > +++ b/net/sched/sch_sfq.c
> > @@ -110,10 +110,11 @@ struct sfq_sched_data {
> >         unsigned int    divisor;        /* number of slots in hash table */
> >         u8              headdrop;
> >         u8              maxdepth;       /* limit of packets per flow */
> > -
> > -       siphash_key_t   perturbation;
> >         u8              cur_depth;      /* depth of longest slot */
> >         u8              flags;
> > +       unsigned int    quantum;        /* Allotment per round: MUST
> > BE >= MTU */
> > +
> > +       siphash_key_t   perturbation;
> >         struct tcf_proto __rcu *filter_list;
> >         struct tcf_block *block;
> >         sfq_index       *ht;            /* Hash table ('divisor' slots) */
> > @@ -132,7 +133,6 @@ struct sfq_sched_data {
> >
> >         unsigned int    maxflows;       /* number of flows in flows array */
> >         int             perturb_period;
>
> Would it make any sense to additionally move 'maxflows' and
> 'perturb_period' at the top, just after 'perturbation'?
>
> Thanks,
>
> Paolo
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-04-01 16:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-28 20:16 [PATCH net 0/3] net_sched: sch_sfq: reject a derived limit of 1 Octavian Purdila
2025-03-28 20:16 ` [PATCH net 1/3] net_sched: sch_sfq: use a temporary work area for validating configuration Octavian Purdila
2025-03-30 23:49   ` Cong Wang
2025-04-01  9:26     ` Paolo Abeni
2025-04-01 10:47       ` Eric Dumazet
2025-04-01 11:13         ` Paolo Abeni
2025-04-01 16:36           ` Octavian Purdila
2025-03-28 20:16 ` [PATCH net 2/3] net_sched: sch_sfq: move the limit validation Octavian Purdila
2025-03-28 20:16 ` [PATCH net 3/3] selftests/tc-testing: sfq: check that a derived limit of 1 is rejected Octavian Purdila

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).