* [PATCH] net/sched: netem: account for backlog updates from child qdisc
@ 2024-11-25 23:18 Martin Ottens
2024-11-26 0:12 ` Stephen Hemminger
2024-12-03 3:13 ` Jakub Kicinski
0 siblings, 2 replies; 10+ messages in thread
From: Martin Ottens @ 2024-11-25 23:18 UTC (permalink / raw)
Cc: Martin Ottens, Stephen Hemminger, Jamal Hadi Salim, Cong Wang,
Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, netdev, linux-kernel
When netem is used with a child qdisc, the child qdisc can use
'qdisc_tree_reduce_backlog' to inform its parent, netem, about created or
dropped SKBs. This function updates 'qlen' and the backlog statistics of
netem, but netem does not account for changes made by a child qdisc. If a
child qdisc creates new SKBs during enqueue and informs its parent about
this, netem's 'qlen' value is increased. When netem dequeues the newly
created SKBs from the child, the 'qlen' in netem is not updated. If 'qlen'
reaches the configured limit, the enqueue function stops working, even
though the tfifo is not full.
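For illustration, a condensed sketch of the interaction (paraphrased, not
the exact kernel code; the helpers named here are the real ones, the rest
is simplified):

    /* A child qdisc that turned one SKB into nb segments reports the
     * nb - 1 extra packets upward; a negative 'n' increases the
     * parent's qlen and backlog counters:
     */
    qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);

    /* netem_enqueue() then compares the inflated counter against the
     * tfifo limit and starts dropping everything:
     */
    if (unlikely(sch->q.qlen >= sch->limit))
            qdisc_drop_all(skb, sch, to_free);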
Reproduce the bug:
Ensure that the sender machine has GSO enabled. Configure netem as root
qdisc and tbf as its child on the outgoing interface of the machine
as follows:
$ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100
$ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms
Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
client on the machine. Check the qdisc statistics:
$ tc -s qdisc show dev <oif>
tbf segments the GSO SKBs (tbf_segment) and updates netem's 'qlen'.
The interface fully stops transferring packets and "locks". In this case,
the child qdisc and tfifo are empty, but 'qlen' indicates the tfifo is at
its limit and no more packets are accepted.
This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is
only decreased when a packet is returned by its dequeue function, and not
during enqueuing into the child qdisc. External updates to 'qlen' are thus
accounted for and only the behavior of the backlog statistics changes.
These statistics now show the total number/length of SKBs held in the tfifo
and in the child qdisc. (Note: the 'backlog' byte-statistic of netem is
incorrect in the example above even after the patch is applied due to a
bug in tbf. See my previous patch ([PATCH] net/sched: tbf: correct backlog
statistic for GSO packets)).
Signed-off-by: Martin Ottens <martin.ottens@fau.de>
---
net/sched/sch_netem.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index fe6fed291a7b..71ec9986ed37 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -79,6 +79,8 @@ struct netem_sched_data {
struct sk_buff *t_head;
struct sk_buff *t_tail;
+ u32 t_len;
+
/* optional qdisc for classful handling (NULL at netem init) */
struct Qdisc *qdisc;
@@ -383,6 +385,7 @@ static void tfifo_reset(struct Qdisc *sch)
rtnl_kfree_skbs(q->t_head, q->t_tail);
q->t_head = NULL;
q->t_tail = NULL;
+ q->t_len = 0;
}
static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
@@ -412,6 +415,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
rb_link_node(&nskb->rbnode, parent, p);
rb_insert_color(&nskb->rbnode, &q->t_root);
}
+ q->t_len++;
sch->q.qlen++;
}
@@ -518,7 +522,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
1<<get_random_u32_below(8);
}
- if (unlikely(sch->q.qlen >= sch->limit)) {
+ if (unlikely(q->t_len >= sch->limit)) {
/* re-link segs, so that qdisc_drop_all() frees them all */
skb->next = segs;
qdisc_drop_all(skb, sch, to_free);
@@ -702,8 +706,8 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
tfifo_dequeue:
skb = __qdisc_dequeue_head(&sch->q);
if (skb) {
- qdisc_qstats_backlog_dec(sch, skb);
deliver:
+ qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
return skb;
}
@@ -719,8 +723,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (time_to_send <= now && q->slot.slot_next <= now) {
netem_erase_head(q, skb);
- sch->q.qlen--;
- qdisc_qstats_backlog_dec(sch, skb);
+ q->t_len--;
skb->next = NULL;
skb->prev = NULL;
/* skb->dev shares skb->rbnode area,
@@ -747,16 +750,21 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (net_xmit_drop_count(err))
qdisc_qstats_drop(sch);
qdisc_tree_reduce_backlog(sch, 1, pkt_len);
+ sch->qstats.backlog -= pkt_len;
+ sch->q.qlen--;
}
goto tfifo_dequeue;
}
+ sch->q.qlen--;
goto deliver;
}
if (q->qdisc) {
skb = q->qdisc->ops->dequeue(q->qdisc);
- if (skb)
+ if (skb) {
+ sch->q.qlen--;
goto deliver;
+ }
}
qdisc_watchdog_schedule_ns(&q->watchdog,
@@ -766,8 +774,10 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (q->qdisc) {
skb = q->qdisc->ops->dequeue(q->qdisc);
- if (skb)
+ if (skb) {
+ sch->q.qlen--;
goto deliver;
+ }
}
return NULL;
}
--
2.39.5
* Re: [PATCH] net/sched: netem: account for backlog updates from child qdisc
2024-11-25 23:18 [PATCH] net/sched: netem: account for backlog updates from child qdisc Martin Ottens
@ 2024-11-26 0:12 ` Stephen Hemminger
2024-11-26 11:13 ` Martin Ottens
2024-12-03 3:13 ` Jakub Kicinski
1 sibling, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2024-11-26 0:12 UTC (permalink / raw)
To: Martin Ottens
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev
On Tue, 26 Nov 2024 00:18:25 +0100
Martin Ottens <martin.ottens@fau.de> wrote:
> When netem is used with a child qdisc, the child qdisc can use
> 'qdisc_tree_reduce_backlog' to inform its parent, netem, about created or
> dropped SKBs. This function updates 'qlen' and the backlog statistics of
> netem, but netem does not account for changes made by a child qdisc. If a
> child qdisc creates new SKBs during enqueue and informs its parent about
> this, netem's 'qlen' value is increased. When netem dequeues the newly
> created SKBs from the child, the 'qlen' in netem is not updated. If 'qlen'
> reaches the configured limit, the enqueue function stops working, even
> though the tfifo is not full.
>
> Reproduce the bug:
> Ensure that the sender machine has GSO enabled. Configure netem as root
> qdisc and tbf as its child on the outgoing interface of the machine
> as follows:
> $ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100
> $ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms
>
> Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
> client on the machine. Check the qdisc statistics:
> $ tc -s qdisc show dev <oif>
>
> tbf segments the GSO SKBs (tbf_segment) and updates the netem's 'qlen'.
> The interface fully stops transferring packets and "locks". In this case,
> the child qdisc and tfifo are empty, but 'qlen' indicates the tfifo is at
> its limit and no more packets are accepted.
>
> This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is
> only decreased when a packet is returned by its dequeue function, and not
> during enqueuing into the child qdisc. External updates to 'qlen' are thus
> accounted for and only the behavior of the backlog statistics changes.
> These statistics now show the total number/length of SKBs held in the tfifo
> and in the child qdisc. (Note: the 'backlog' byte-statistic of netem is
> incorrect in the example above even after the patch is applied due to a
> bug in tbf. See my previous patch ([PATCH] net/sched: tbf: correct backlog
> statistic for GSO packets)).
>
> Signed-off-by: Martin Ottens <martin.ottens@fau.de>
Does this also address the issue with running a non-work-conserving
inner qdisc?
See: https://marc.info/?l=linux-netdev&m=172779429511319&w=2
* Re: [PATCH] net/sched: netem: account for backlog updates from child qdisc
2024-11-26 0:12 ` Stephen Hemminger
@ 2024-11-26 11:13 ` Martin Ottens
0 siblings, 0 replies; 10+ messages in thread
From: Martin Ottens @ 2024-11-26 11:13 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev
On 26.11.24 01:12, Stephen Hemminger wrote:
> Does this also address the issue with running a non-work-conserving
> inner qdisc?
>
> See: https://marc.info/?l=linux-netdev&m=172779429511319&w=2
The bug that my patch fixes only occurs when netem interacts with
an inner qdisc like tbf; the behavior when netem interacts with its
parent qdiscs is not changed by the patch.
Therefore, the patch does not address the UaF bug.
* Re: [PATCH] net/sched: netem: account for backlog updates from child qdisc
2024-11-25 23:18 [PATCH] net/sched: netem: account for backlog updates from child qdisc Martin Ottens
2024-11-26 0:12 ` Stephen Hemminger
@ 2024-12-03 3:13 ` Jakub Kicinski
2024-12-04 11:47 ` Martin Ottens
2024-12-04 12:29 ` [PATCH v2] " Martin Ottens
1 sibling, 2 replies; 10+ messages in thread
From: Jakub Kicinski @ 2024-12-03 3:13 UTC (permalink / raw)
To: Martin Ottens
Cc: Stephen Hemminger, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman, netdev,
linux-kernel
On Tue, 26 Nov 2024 00:18:25 +0100 Martin Ottens wrote:
> @@ -702,8 +706,8 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
> tfifo_dequeue:
> skb = __qdisc_dequeue_head(&sch->q);
> if (skb) {
> - qdisc_qstats_backlog_dec(sch, skb);
> deliver:
> + qdisc_qstats_backlog_dec(sch, skb);
> qdisc_bstats_update(sch, skb);
> return skb;
> }
Isn't this sort of a change all we need?
I don't understand why we need to perform packet accounting
in a separate new member (t_len). You seem to fix qlen accounting,
anyway, and I think sch->limit should apply to the qdisc and all
its children, not just the qdisc directly (since most classful qdiscs
don't hold packets).
I'm not a qdisc expert, so if you feel confident about this code you
need to explain the thinking in the commit message.
--
pw-bot: cr
* Re: [PATCH] net/sched: netem: account for backlog updates from child qdisc
2024-12-03 3:13 ` Jakub Kicinski
@ 2024-12-04 11:47 ` Martin Ottens
2024-12-04 12:29 ` [PATCH v2] " Martin Ottens
1 sibling, 0 replies; 10+ messages in thread
From: Martin Ottens @ 2024-12-04 11:47 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Stephen Hemminger, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman, netdev,
linux-kernel
On 03.12.24 04:13, Jakub Kicinski wrote:
> I don't understand why we need to perform packet accounting
> in a separate new member (t_len). You seem to fix qlen accounting,
> anyway, and I think sch->limit should apply to the qdisc and all
> its children. Not just qdisc directly (since most classful qdiscs
> don't hold packets).
Netem is a classful qdisc, but different from others because it holds
packets in its internal tfifo and optionally also in a child qdisc.
However, the limit check currently compares sch->q.qlen against
sch->limit, and 'qlen' only considers the packets in the tfifo, not
the packets held by a child. Child qdiscs, on the other hand, expect
'qlen' to refer to the number of packets that are in netem and all
its children together. If a child changes 'qlen' (using
'qdisc_tree_reduce_backlog'), 'qlen' no longer matches the number of
packets in the tfifo and the comparison against sch->limit breaks.
By adding t_len, the number of packets in the tfifo is tracked
independently of 'qlen', so 'qlen' can be changed by child qdiscs
without causing unwanted behavior. t_len is required because the
limit option of netem currently refers to the maximum number of
packets in the tfifo - this way, the behavior of netem is not changed
by this patch.
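To summarize the intended accounting after the patch (pseudo-C, just to
make the invariants explicit, not taken verbatim from the diff):

    /* q->t_len   : packets currently sitting in netem's own tfifo     */
    /* sch->q.qlen: packets held by netem and its child qdisc together,
     *              so 'qdisc_tree_reduce_backlog' calls from the child
     *              keep it consistent instead of breaking the limit
     *              check
     */
    if (unlikely(q->t_len >= sch->limit)) /* limit still means "tfifo full" */
            qdisc_drop_all(skb, sch, to_free);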
> I'm not a qdisc expert, so if you feel confident about this code you
> need to explain the thinking in the commit message..
With the patch I try to fix the error without changing the behavior
of netem (e.g., without changing the meaning of the limit option so
that it covers both the tfifo length and the packets in the child
qdisc). Maybe there
are even better approaches - I am happy about any feedback. I will
revise the explanation in the patch to make this clearer and
resubmit it.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2] net/sched: netem: account for backlog updates from child qdisc
2024-12-03 3:13 ` Jakub Kicinski
2024-12-04 11:47 ` Martin Ottens
@ 2024-12-04 12:29 ` Martin Ottens
2024-12-05 12:40 ` Jamal Hadi Salim
1 sibling, 1 reply; 10+ messages in thread
From: Martin Ottens @ 2024-12-04 12:29 UTC (permalink / raw)
Cc: Martin Ottens, Stephen Hemminger, Jamal Hadi Salim, Cong Wang,
Jiri Pirko, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, netdev, linux-kernel
In general, 'qlen' of any classful qdisc should keep track of the
number of packets that the qdisc itself and all of its children hold.
In case of netem, 'qlen' only accounts for the packets in its internal
tfifo. When netem is used with a child qdisc, the child qdisc can use
'qdisc_tree_reduce_backlog' to inform its parent, netem, about created
or dropped SKBs. This function updates 'qlen' and the backlog statistics
of netem, but netem does not account for changes made by a child qdisc.
'qlen' then indicates the wrong number of packets in the tfifo.
If a child qdisc creates new SKBs during enqueue and informs its parent
about this, netem's 'qlen' value is increased. When netem dequeues the
newly created SKBs from the child, the 'qlen' in netem is not updated.
If 'qlen' reaches the configured sch->limit, the enqueue function stops
working, even though the tfifo is not full.
Reproduce the bug:
Ensure that the sender machine has GSO enabled. Configure netem as root
qdisc and tbf as its child on the outgoing interface of the machine
as follows:
$ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100
$ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms
Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
client on the machine. Check the qdisc statistics:
$ tc -s qdisc show dev <oif>
tbf segments the GSO SKBs (tbf_segment) and updates netem's 'qlen'.
The interface fully stops transferring packets and "locks". In this case,
the child qdisc and tfifo are empty, but 'qlen' indicates the tfifo is at
its limit and no more packets are accepted.
This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is
only decreased when a packet is returned by its dequeue function, and not
during enqueuing into the child qdisc. External updates to 'qlen' are thus
accounted for and only the behavior of the backlog statistics changes. As
in other qdiscs, 'qlen' then keeps track of how many packets are held in
netem and all of its children. As before, sch->limit remains the
maximum number of packets in the tfifo. Netem's backlog statistics
likewise cover the SKBs held in the tfifo and in the child qdisc.
(Note: the 'backlog' byte-statistic of netem is incorrect in the example
above even after the patch is applied due to a bug in tbf. See my
previous patch ([PATCH] net/sched: tbf: correct backlog statistic for
GSO packets)).
Signed-off-by: Martin Ottens <martin.ottens@fau.de>
---
net/sched/sch_netem.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index fe6fed291a7b..71ec9986ed37 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -79,6 +79,8 @@ struct netem_sched_data {
struct sk_buff *t_head;
struct sk_buff *t_tail;
+ u32 t_len;
+
/* optional qdisc for classful handling (NULL at netem init) */
struct Qdisc *qdisc;
@@ -383,6 +385,7 @@ static void tfifo_reset(struct Qdisc *sch)
rtnl_kfree_skbs(q->t_head, q->t_tail);
q->t_head = NULL;
q->t_tail = NULL;
+ q->t_len = 0;
}
static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
@@ -412,6 +415,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
rb_link_node(&nskb->rbnode, parent, p);
rb_insert_color(&nskb->rbnode, &q->t_root);
}
+ q->t_len++;
sch->q.qlen++;
}
@@ -518,7 +522,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
1<<get_random_u32_below(8);
}
- if (unlikely(sch->q.qlen >= sch->limit)) {
+ if (unlikely(q->t_len >= sch->limit)) {
/* re-link segs, so that qdisc_drop_all() frees them all */
skb->next = segs;
qdisc_drop_all(skb, sch, to_free);
@@ -702,8 +706,8 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
tfifo_dequeue:
skb = __qdisc_dequeue_head(&sch->q);
if (skb) {
- qdisc_qstats_backlog_dec(sch, skb);
deliver:
+ qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
return skb;
}
@@ -719,8 +723,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (time_to_send <= now && q->slot.slot_next <= now) {
netem_erase_head(q, skb);
- sch->q.qlen--;
- qdisc_qstats_backlog_dec(sch, skb);
+ q->t_len--;
skb->next = NULL;
skb->prev = NULL;
/* skb->dev shares skb->rbnode area,
@@ -747,16 +750,21 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (net_xmit_drop_count(err))
qdisc_qstats_drop(sch);
qdisc_tree_reduce_backlog(sch, 1, pkt_len);
+ sch->qstats.backlog -= pkt_len;
+ sch->q.qlen--;
}
goto tfifo_dequeue;
}
+ sch->q.qlen--;
goto deliver;
}
if (q->qdisc) {
skb = q->qdisc->ops->dequeue(q->qdisc);
- if (skb)
+ if (skb) {
+ sch->q.qlen--;
goto deliver;
+ }
}
qdisc_watchdog_schedule_ns(&q->watchdog,
@@ -766,8 +774,10 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (q->qdisc) {
skb = q->qdisc->ops->dequeue(q->qdisc);
- if (skb)
+ if (skb) {
+ sch->q.qlen--;
goto deliver;
+ }
}
return NULL;
}
--
2.39.5
* Re: [PATCH v2] net/sched: netem: account for backlog updates from child qdisc
2024-12-04 12:29 ` [PATCH v2] " Martin Ottens
@ 2024-12-05 12:40 ` Jamal Hadi Salim
2024-12-07 16:37 ` Martin Ottens
0 siblings, 1 reply; 10+ messages in thread
From: Jamal Hadi Salim @ 2024-12-05 12:40 UTC (permalink / raw)
To: Martin Ottens
Cc: Stephen Hemminger, Cong Wang, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
linux-kernel
On Wed, Dec 4, 2024 at 7:29 AM Martin Ottens <martin.ottens@fau.de> wrote:
>
> In general, 'qlen' of any classful qdisc should keep track of the
> number of packets that the qdisc itself and all of its children hold.
> In case of netem, 'qlen' only accounts for the packets in its internal
> tfifo. When netem is used with a child qdisc, the child qdisc can use
> 'qdisc_tree_reduce_backlog' to inform its parent, netem, about created
> or dropped SKBs. This function updates 'qlen' and the backlog statistics
> of netem, but netem does not account for changes made by a child qdisc.
> 'qlen' then indicates the wrong number of packets in the tfifo.
> If a child qdisc creates new SKBs during enqueue and informs its parent
> about this, netem's 'qlen' value is increased. When netem dequeues the
> newly created SKBs from the child, the 'qlen' in netem is not updated.
> If 'qlen' reaches the configured sch->limit, the enqueue function stops
> working, even though the tfifo is not full.
>
> Reproduce the bug:
> Ensure that the sender machine has GSO enabled. Configure netem as root
> qdisc and tbf as its child on the outgoing interface of the machine
> as follows:
> $ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100
> $ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms
>
> Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
> client on the machine. Check the qdisc statistics:
> $ tc -s qdisc show dev <oif>
>
> tbf segments the GSO SKBs (tbf_segment) and updates the netem's 'qlen'.
> The interface fully stops transferring packets and "locks". In this case,
> the child qdisc and tfifo are empty, but 'qlen' indicates the tfifo is at
> its limit and no more packets are accepted.
>
Would be nice to see the before and after (your change) output of the
stats to illustrate
> This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is
> only decreased when a packet is returned by its dequeue function, and not
> during enqueuing into the child qdisc. External updates to 'qlen' are thus
> accounted for and only the behavior of the backlog statistics changes. As
> in other qdiscs, 'qlen' then keeps track of how many packets are held in
> netem and all of its children. As before, sch->limit remains as the
> maximum number of packets in the tfifo. The same applies to netem's
> backlog statistics.
> (Note: the 'backlog' byte-statistic of netem is incorrect in the example
> above even after the patch is applied due to a bug in tbf. See my
> previous patch ([PATCH] net/sched: tbf: correct backlog statistic for
> GSO packets)).
>
> Signed-off-by: Martin Ottens <martin.ottens@fau.de>
Fixes tag missing - probably since 2.6.x. Stephen?
Your fix seems reasonable but I am curious: does this only happen with
TCP? If yes, perhaps the GSO handling is contributing?
Can you run iperf with udp and see if the issue shows up again? Or
ping -f with size 1024.
You can slow down tbf for example with something like: tbf rate 20kbit
buffer 1600 limit 3000
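For instance, something along these lines (untested; <oif> and <server>
are placeholders):

$ tc qdisc replace dev <oif> parent 1:0 tbf rate 20kbit buffer 1600 limit 3000
$ iperf3 -c <server> -u -b 50M -t 30
$ ping -f -s 1024 <server>
$ tc -s qdisc show dev <oif>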
cheers,
jamal
> ---
> net/sched/sch_netem.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fe6fed291a7b..71ec9986ed37 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -79,6 +79,8 @@ struct netem_sched_data {
> struct sk_buff *t_head;
> struct sk_buff *t_tail;
>
> + u32 t_len;
> +
> /* optional qdisc for classful handling (NULL at netem init) */
> struct Qdisc *qdisc;
>
> @@ -383,6 +385,7 @@ static void tfifo_reset(struct Qdisc *sch)
> rtnl_kfree_skbs(q->t_head, q->t_tail);
> q->t_head = NULL;
> q->t_tail = NULL;
> + q->t_len = 0;
> }
>
> static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
> @@ -412,6 +415,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
> rb_link_node(&nskb->rbnode, parent, p);
> rb_insert_color(&nskb->rbnode, &q->t_root);
> }
> + q->t_len++;
> sch->q.qlen++;
> }
>
> @@ -518,7 +522,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> 1<<get_random_u32_below(8);
> }
>
> - if (unlikely(sch->q.qlen >= sch->limit)) {
> + if (unlikely(q->t_len >= sch->limit)) {
> /* re-link segs, so that qdisc_drop_all() frees them all */
> skb->next = segs;
> qdisc_drop_all(skb, sch, to_free);
> @@ -702,8 +706,8 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
> tfifo_dequeue:
> skb = __qdisc_dequeue_head(&sch->q);
> if (skb) {
> - qdisc_qstats_backlog_dec(sch, skb);
> deliver:
> + qdisc_qstats_backlog_dec(sch, skb);
> qdisc_bstats_update(sch, skb);
> return skb;
> }
> @@ -719,8 +723,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
>
> if (time_to_send <= now && q->slot.slot_next <= now) {
> netem_erase_head(q, skb);
> - sch->q.qlen--;
> - qdisc_qstats_backlog_dec(sch, skb);
> + q->t_len--;
> skb->next = NULL;
> skb->prev = NULL;
> /* skb->dev shares skb->rbnode area,
> @@ -747,16 +750,21 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
> if (net_xmit_drop_count(err))
> qdisc_qstats_drop(sch);
> qdisc_tree_reduce_backlog(sch, 1, pkt_len);
> + sch->qstats.backlog -= pkt_len;
> + sch->q.qlen--;
> }
> goto tfifo_dequeue;
> }
> + sch->q.qlen--;
> goto deliver;
> }
>
> if (q->qdisc) {
> skb = q->qdisc->ops->dequeue(q->qdisc);
> - if (skb)
> + if (skb) {
> + sch->q.qlen--;
> goto deliver;
> + }
> }
>
> qdisc_watchdog_schedule_ns(&q->watchdog,
> @@ -766,8 +774,10 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
>
> if (q->qdisc) {
> skb = q->qdisc->ops->dequeue(q->qdisc);
> - if (skb)
> + if (skb) {
> + sch->q.qlen--;
> goto deliver;
> + }
> }
> return NULL;
> }
> --
> 2.39.5
>
* Re: [PATCH v2] net/sched: netem: account for backlog updates from child qdisc
2024-12-05 12:40 ` Jamal Hadi Salim
@ 2024-12-07 16:37 ` Martin Ottens
2024-12-09 21:13 ` Jamal Hadi Salim
0 siblings, 1 reply; 10+ messages in thread
From: Martin Ottens @ 2024-12-07 16:37 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Stephen Hemminger, Cong Wang, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
linux-kernel
On 05.12.24 13:40, Jamal Hadi Salim wrote:
> Would be nice to see the before and after (your change) output of the
> stats to illustrate
Setup is as described in my patch. I used a larger limit of
1000 for netem so that the overshoot of the qlen becomes more
visible. Kernel is from the current net-next tree (the patch to
sch_tbf referenced in my patch is already applied (1596a135e318)).
TCP before the fix (qlen is 1150p, exceeding the maximum of 1000p,
netem qdisc becomes "locked" and stops accepting packets):
qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
Sent 2760196 bytes 1843 pkt (dropped 389, overlimits 0 requeues 0)
backlog 4294560030b 1150p requeues 0
qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
Sent 2760196 bytes 1843 pkt (dropped 327, overlimits 7356 requeues 0)
backlog 0b 0p requeues 0
UDP (iperf3 sends 50Mbit/s) before the fix, no issues here:
qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
Sent 71917940 bytes 48286 pkt (dropped 2415, overlimits 0 requeues 0)
backlog 643680b 432p requeues 0
qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
Sent 71917940 bytes 48286 pkt (dropped 2415, overlimits 341057 requeues 0)
backlog 311410b 209p requeues 0
TCP after the fix (UDP is not affected by the fix):
qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
Sent 94859934 bytes 62676 pkt (dropped 15, overlimits 0 requeues 0)
backlog 573806b 130p requeues 0
qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
Sent 94859934 bytes 62676 pkt (dropped 324, overlimits 248442 requeues 0)
backlog 4542b 3p requeues 0
> Your fix seems reasonable but I am curious: does this only happen with
> TCP? If yes, perhaps the
> GSO handling maybe contributing?
> Can you run iperf with udp and see if the issue shows up again? Or
> ping -f with size 1024.
I was only able to reproduce this behavior with tbf and it happens
only when GSO packets are segmented inside the tbf child qdisc. As
shown above, UDP is therefore not affected. The behavior also occurs
if this configuration is used on the "outgoing" interface of a system
that just forwards packets between two networks and GRO is enabled on
the "incoming" interface.
* Re: [PATCH v2] net/sched: netem: account for backlog updates from child qdisc
2024-12-07 16:37 ` Martin Ottens
@ 2024-12-09 21:13 ` Jamal Hadi Salim
2024-12-09 22:44 ` Jamal Hadi Salim
0 siblings, 1 reply; 10+ messages in thread
From: Jamal Hadi Salim @ 2024-12-09 21:13 UTC (permalink / raw)
To: Martin Ottens
Cc: Stephen Hemminger, Cong Wang, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
linux-kernel
On Sat, Dec 7, 2024 at 11:37 AM Martin Ottens <martin.ottens@fau.de> wrote:
>
> On 05.12.24 13:40, Jamal Hadi Salim wrote:
> > Would be nice to see the before and after (your change) output of the
> > stats to illustrate
>
> Setup is as described in my patch. I used a larger limit of
> 1000 for netem so that the overshoot of the qlen becomes more
> visible. Kernel is from the current net-next tree (the patch to
> sch_tbf referenced in my patch is already applied (1596a135e318)).
>
Ok, wasn't aware of this one.
>
> TCP before the fix (qlen is 1150p, exceeding the maximum of 1000p,
> netem qdisc becomes "locked" and stops accepting packets):
>
> qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
> Sent 2760196 bytes 1843 pkt (dropped 389, overlimits 0 requeues 0)
> backlog 4294560030b 1150p requeues 0
> qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
> Sent 2760196 bytes 1843 pkt (dropped 327, overlimits 7356 requeues 0)
> backlog 0b 0p requeues 0
>
> UDP (iperf3 sends 50Mbit/s) before the fix, no issues here:
>
> qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
> Sent 71917940 bytes 48286 pkt (dropped 2415, overlimits 0 requeues 0)
> backlog 643680b 432p requeues 0
> qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
> Sent 71917940 bytes 48286 pkt (dropped 2415, overlimits 341057 requeues 0)
> backlog 311410b 209p requeues 0
>
> TCP after the fix (UDP is not affected by the fix):
>
> qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
> Sent 94859934 bytes 62676 pkt (dropped 15, overlimits 0 requeues 0)
> backlog 573806b 130p requeues 0
> qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
> Sent 94859934 bytes 62676 pkt (dropped 324, overlimits 248442 requeues 0)
> backlog 4542b 3p requeues 0
>
backlog being > 0 is a problem, unless your results are captured
mid-test (instead of at the end of the test).
I will validate on net-next and with your patch.
> > Your fix seems reasonable but I am curious: does this only happen with
> > TCP? If yes, perhaps the
> > GSO handling maybe contributing?
> > Can you run iperf with udp and see if the issue shows up again? Or
> > ping -f with size 1024.
>
> I was only able to reproduce this behavior with tbf and it happens
> only when GSO packets are segmented inside the tbf child qdisc. As
> shown above, UDP is therefore not affected. The behavior also occurs
> if this configuration is used on the "outgoing" interface of a system
> that just forwards packets between two networks and GRO is enabled on
> the "incoming" interface.
Ok, will do a quick check since I have cycles.
cheers,
jamal
* Re: [PATCH v2] net/sched: netem: account for backlog updates from child qdisc
2024-12-09 21:13 ` Jamal Hadi Salim
@ 2024-12-09 22:44 ` Jamal Hadi Salim
0 siblings, 0 replies; 10+ messages in thread
From: Jamal Hadi Salim @ 2024-12-09 22:44 UTC (permalink / raw)
To: Martin Ottens
Cc: Stephen Hemminger, Cong Wang, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
linux-kernel
On Mon, Dec 9, 2024 at 4:13 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Sat, Dec 7, 2024 at 11:37 AM Martin Ottens <martin.ottens@fau.de> wrote:
> >
> > On 05.12.24 13:40, Jamal Hadi Salim wrote:
> > > Would be nice to see the before and after (your change) output of the
> > > stats to illustrate
> >
> > Setup is as described in my patch. I used a larger limit of
> > 1000 for netem so that the overshoot of the qlen becomes more
> > visible. Kernel is from the current net-next tree (the patch to
> > sch_tbf referenced in my patch is already applied (1596a135e318)).
> >
>
> Ok, wasn't aware of this one.
>
> >
> > TCP before the fix (qlen is 1150p, exceeding the maximum of 1000p,
> > netem qdisc becomes "locked" and stops accepting packets):
> >
> > qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
> > Sent 2760196 bytes 1843 pkt (dropped 389, overlimits 0 requeues 0)
> > backlog 4294560030b 1150p requeues 0
> > qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
> > Sent 2760196 bytes 1843 pkt (dropped 327, overlimits 7356 requeues 0)
> > backlog 0b 0p requeues 0
> >
> > UDP (iperf3 sends 50Mbit/s) before the fix, no issues here:
> >
> > qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
> > Sent 71917940 bytes 48286 pkt (dropped 2415, overlimits 0 requeues 0)
> > backlog 643680b 432p requeues 0
> > qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
> > Sent 71917940 bytes 48286 pkt (dropped 2415, overlimits 341057 requeues 0)
> > backlog 311410b 209p requeues 0
> >
> > TCP after the fix (UDP is not affected by the fix):
> >
> > qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
> > Sent 94859934 bytes 62676 pkt (dropped 15, overlimits 0 requeues 0)
> > backlog 573806b 130p requeues 0
> > qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
> > Sent 94859934 bytes 62676 pkt (dropped 324, overlimits 248442 requeues 0)
> > backlog 4542b 3p requeues 0
> >
>
> backlog being > 0 is a problem, unless your results are captured
> mid-test (instead of at the end of the test).
> I will validate on net-next and with your patch.
>
Ok, so this seems sane to me - but can you please put output in the
commit message reflecting the state after the test has completed?
Something like, before patch (highlighting stuck backlog on netem):
qdisc netem 1: root refcnt 2 limit 1000 delay 1s seed 17105543349430145291
Sent 35220 bytes 43 pkt (dropped 7, overlimits 0 requeues 0)
backlog 4294958212b 0p requeues 0
qdisc tbf 8003: parent 1: rate 50Mbit burst 1600b lat 224us
Sent 35220 bytes 43 pkt (dropped 17, overlimits 1 requeues 0)
backlog 0b 0p requeues 0
And after your patch:
qdisc netem 1: root refcnt 2 limit 1000 delay 1s seed 11503045766577034723
Sent 42864 bytes 49 pkt (dropped 5, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc tbf 8001: parent 1: rate 50Mbit burst 1600b lat 224us
Sent 42864 bytes 49 pkt (dropped 16, overlimits 5 requeues 0)
backlog 0b 0p requeues 0
backlog is now shown as cleared.
Coincidentally, after removing your tbf patch (which is already in
net-next) and rerunning the test, it didn't seem to matter whether GSO
was on or off (as you can see below, the backlog is stuck on tbf):
GSO off:
qdisc netem 1: root refcnt 2 limit 1000 delay 1s seed 12925321237200695918
Sent 26284 bytes 39 pkt (dropped 7, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc tbf 8001: parent 1: rate 50Mbit burst 1600b lat 224us
Sent 26284 bytes 39 pkt (dropped 17, overlimits 1 requeues 0)
backlog 4294959726b 0p requeues 0
GSO on:
qdisc netem 1: root refcnt 2 limit 1000 delay 1s seed 18236003995023052493
Sent 35224 bytes 43 pkt (dropped 7, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc tbf 8002: parent 1: rate 50Mbit burst 1600b lat 224us
Sent 35224 bytes 43 pkt (dropped 17, overlimits 1 requeues 0)
backlog 4294958212b 0p requeues 0
Please resubmit the patch - add my Acked-by and include the proper
before/after stats.
The Fixes tag is likely: Linux-2.6.12-rc2
cheers,
jamal