* [PATCH] pkt_sched: fq: fix non TCP flows pacing
@ 2013-10-07 21:44 Eric Dumazet
2013-10-08 20:53 ` David Miller
2013-10-08 22:16 ` [PATCH v2 net] " Eric Dumazet
0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2013-10-07 21:44 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Steinar H. Gunderson
From: Eric Dumazet <edumazet@google.com>
Steinar reported FQ pacing was not working for UDP flows.
It looks like the initial sk->sk_pacing_rate value of 0 was
a wrong choice. We should init it to ~0U like sk_max_pacing_rate
Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes
no real sense. (The default rate is really : ~0U)
While debugging this issue, I realized sk_pacing_rate is shared between
transport and packet scheduler without locking / barriers :
We should use ACCESS_ONCE() to make sure compiler wont perform
multiple loads or stores.
Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/sock.c | 1 +
net/ipv4/tcp_input.c | 7 ++++++-
net/sched/sch_fq.c | 20 +++++++++-----------
3 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c
index 2bd9b3f..fd6afa2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2331,6 +2331,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
#endif
sk->sk_max_pacing_rate = ~0U;
+ sk->sk_pacing_rate = ~0U;
/*
* Before updating sk_refcnt, we must commit prior changes to memory
* (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fa6cf1f..d95f875 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -755,7 +755,12 @@ static void tcp_update_pacing_rate(struct sock *sk)
if (tp->srtt > 8 + 2)
do_div(rate, tp->srtt);
- sk->sk_pacing_rate = min_t(u64, rate, sk->sk_max_pacing_rate);
+ /* ACCESS_ONCE() is needed because FQ fetches sk_pacing_rate without
+ * any lock. We want to make sure compiler wont use sk_pacing_rate
+ * with intermediate values.
+ */
+ ACCESS_ONCE(sk->sk_pacing_rate) = min_t(u64, rate,
+ sk->sk_max_pacing_rate);
}
/* Calculate rto without backoff. This is the second half of Van Jacobson's
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index a2fef8b..46b2adb 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -472,20 +472,16 @@ begin:
if (f->credit > 0 || !q->rate_enable)
goto out;
- if (skb->sk && skb->sk->sk_state != TCP_TIME_WAIT) {
- rate = skb->sk->sk_pacing_rate ?: q->flow_default_rate;
+ rate = q->flow_max_rate;
+ if (skb->sk && skb->sk->sk_state != TCP_TIME_WAIT)
+ rate = min(ACCESS_ONCE(skb->sk->sk_pacing_rate), rate);
- rate = min(rate, q->flow_max_rate);
- } else {
- rate = q->flow_max_rate;
- if (rate == ~0U)
- goto out;
- }
- if (rate) {
+ if (rate != ~0U) {
u32 plen = max(qdisc_pkt_len(skb), q->quantum);
u64 len = (u64)plen * NSEC_PER_SEC;
- do_div(len, rate);
+ if (likely(rate))
+ do_div(len, rate);
/* Since socket rate can change later,
* clamp the delay to 125 ms.
* TODO: maybe segment the too big skb, as in commit
@@ -735,12 +731,14 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
if (opts == NULL)
goto nla_put_failure;
+ /* TCA_FQ_FLOW_DEFAULT_RATE is not used anymore,
+ * do not bother giving its value
+ */
if (nla_put_u32(skb, TCA_FQ_PLIMIT, sch->limit) ||
nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT, q->flow_plimit) ||
nla_put_u32(skb, TCA_FQ_QUANTUM, q->quantum) ||
nla_put_u32(skb, TCA_FQ_INITIAL_QUANTUM, q->initial_quantum) ||
nla_put_u32(skb, TCA_FQ_RATE_ENABLE, q->rate_enable) ||
- nla_put_u32(skb, TCA_FQ_FLOW_DEFAULT_RATE, q->flow_default_rate) ||
nla_put_u32(skb, TCA_FQ_FLOW_MAX_RATE, q->flow_max_rate) ||
nla_put_u32(skb, TCA_FQ_BUCKETS_LOG, q->fq_trees_log))
goto nla_put_failure;
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] pkt_sched: fq: fix non TCP flows pacing
2013-10-07 21:44 [PATCH] pkt_sched: fq: fix non TCP flows pacing Eric Dumazet
@ 2013-10-08 20:53 ` David Miller
2013-10-08 21:24 ` Eric Dumazet
2013-10-08 22:16 ` [PATCH v2 net] " Eric Dumazet
1 sibling, 1 reply; 5+ messages in thread
From: David Miller @ 2013-10-08 20:53 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, sesse
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 07 Oct 2013 14:44:29 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> Steinar reported FQ pacing was not working for UDP flows.
>
> It looks like the initial sk->sk_pacing_rate value of 0 was
> a wrong choice. We should init it to ~0U like sk_max_pacing_rate
>
> Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes
> no real sense. (The default rate is really : ~0U)
>
> While debugging this issue, I realized sk_pacing_rate is shared between
> transport and packet scheduler without locking / barriers :
>
> We should use ACCESS_ONCE() to make sure compiler wont perform
> multiple loads or stores.
>
> Reported-by: Steinar H. Gunderson <sesse@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Is this meant for net or net-next? It doesn't apply cleanly to the
former.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] pkt_sched: fq: fix non TCP flows pacing
2013-10-08 20:53 ` David Miller
@ 2013-10-08 21:24 ` Eric Dumazet
0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2013-10-08 21:24 UTC (permalink / raw)
To: David Miller; +Cc: netdev, sesse
On Tue, 2013-10-08 at 16:53 -0400, David Miller wrote:
> Is this meant for net or net-next? It doesn't apply cleanly to the
> former.
Oh right, I need to respin for net tree, sorry for the confusion.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 net] pkt_sched: fq: fix non TCP flows pacing
2013-10-07 21:44 [PATCH] pkt_sched: fq: fix non TCP flows pacing Eric Dumazet
2013-10-08 20:53 ` David Miller
@ 2013-10-08 22:16 ` Eric Dumazet
2013-10-09 1:54 ` David Miller
1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2013-10-08 22:16 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Steinar H. Gunderson
From: Eric Dumazet <edumazet@google.com>
Steinar reported FQ pacing was not working for UDP flows.
It looks like the initial sk->sk_pacing_rate value of 0 was
a wrong choice. We should init it to ~0U (unlimited)
Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes
no real sense. The default rate is really unlimited, and we
need to avoid a zero divide.
Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
I removed the ACCESS_ONCE() stuff, as it adds conflicts for
next (net / net-next) merge. I'll send a separate patch later.
net/core/sock.c | 1 +
net/sched/sch_fq.c | 20 +++++++++-----------
2 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c
index 5b6beba..0b39e7a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2319,6 +2319,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
sk->sk_ll_usec = sysctl_net_busy_read;
#endif
+ sk->sk_pacing_rate = ~0U;
/*
* Before updating sk_refcnt, we must commit prior changes to memory
* (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 48501a2..a9dfdda 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -472,20 +472,16 @@ begin:
if (f->credit > 0 || !q->rate_enable)
goto out;
- if (skb->sk && skb->sk->sk_state != TCP_TIME_WAIT) {
- rate = skb->sk->sk_pacing_rate ?: q->flow_default_rate;
+ rate = q->flow_max_rate;
+ if (skb->sk && skb->sk->sk_state != TCP_TIME_WAIT)
+ rate = min(skb->sk->sk_pacing_rate, rate);
- rate = min(rate, q->flow_max_rate);
- } else {
- rate = q->flow_max_rate;
- if (rate == ~0U)
- goto out;
- }
- if (rate) {
+ if (rate != ~0U) {
u32 plen = max(qdisc_pkt_len(skb), q->quantum);
u64 len = (u64)plen * NSEC_PER_SEC;
- do_div(len, rate);
+ if (likely(rate))
+ do_div(len, rate);
/* Since socket rate can change later,
* clamp the delay to 125 ms.
* TODO: maybe segment the too big skb, as in commit
@@ -735,12 +731,14 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
if (opts == NULL)
goto nla_put_failure;
+ /* TCA_FQ_FLOW_DEFAULT_RATE is not used anymore,
+ * do not bother giving its value
+ */
if (nla_put_u32(skb, TCA_FQ_PLIMIT, sch->limit) ||
nla_put_u32(skb, TCA_FQ_FLOW_PLIMIT, q->flow_plimit) ||
nla_put_u32(skb, TCA_FQ_QUANTUM, q->quantum) ||
nla_put_u32(skb, TCA_FQ_INITIAL_QUANTUM, q->initial_quantum) ||
nla_put_u32(skb, TCA_FQ_RATE_ENABLE, q->rate_enable) ||
- nla_put_u32(skb, TCA_FQ_FLOW_DEFAULT_RATE, q->flow_default_rate) ||
nla_put_u32(skb, TCA_FQ_FLOW_MAX_RATE, q->flow_max_rate) ||
nla_put_u32(skb, TCA_FQ_BUCKETS_LOG, q->fq_trees_log))
goto nla_put_failure;
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2 net] pkt_sched: fq: fix non TCP flows pacing
2013-10-08 22:16 ` [PATCH v2 net] " Eric Dumazet
@ 2013-10-09 1:54 ` David Miller
0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2013-10-09 1:54 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, sesse
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 08 Oct 2013 15:16:00 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> Steinar reported FQ pacing was not working for UDP flows.
>
> It looks like the initial sk->sk_pacing_rate value of 0 was
> a wrong choice. We should init it to ~0U (unlimited)
>
> Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes
> no real sense. The default rate is really unlimited, and we
> need to avoid a zero divide.
>
> Reported-by: Steinar H. Gunderson <sesse@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> I removed the ACCESS_ONCE() stuff, as it adds conflicts for
> next (net / net-next) merge. I'll send a separate patch later.
Applied, thanks Eric.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-10-09 1:54 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-07 21:44 [PATCH] pkt_sched: fq: fix non TCP flows pacing Eric Dumazet
2013-10-08 20:53 ` David Miller
2013-10-08 21:24 ` Eric Dumazet
2013-10-08 22:16 ` [PATCH v2 net] " Eric Dumazet
2013-10-09 1:54 ` David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).