From: John Fastabend <john.fastabend@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net
Cc: netdev@vger.kernel.org
Subject: Re: [net PATCH] net: sched, fix OOO packets with pfifo_fast
Date: Sat, 24 Mar 2018 15:10:29 -0700 [thread overview]
Message-ID: <32acd1d0-48b0-eaff-c971-9febdd33fcbd@gmail.com> (raw)
In-Reply-To: <71d325ad-5270-0491-b858-75674d9fe09a@gmail.com>
On 03/24/2018 02:15 PM, Eric Dumazet wrote:
>
>
> On 03/24/2018 01:13 PM, John Fastabend wrote:
>> After the qdisc lock was dropped in pfifo_fast we allow multiple
>> enqueue threads and dequeue threads to run in parallel. On the
>> enqueue side the skb bit ooo_okay is used to ensure all related
>> skbs are enqueued in-order. On the dequeue side though there is
>> no similar logic. What we observe is with fewer queues than CPUs
>> it is possible to re-order packets when two instances of
>> __qdisc_run() are running in parallel. Each thread will dequeue
>> a skb and then whichever thread calls the ndo op first will
>> be sent on the wire. This doesn't typically happen because
>> qdisc_run() is usually triggered by the same core that did the
>> enqueue. However, drivers will trigger __netif_schedule()
>> when queues are transitioning from stopped to awake using the
>> netif_tx_wake_* APIs. When this happens netif_schedule() calls
>> qdisc_run() on the same CPU that did the netif_tx_wake_* which
>> is usually done in the interrupt completion context. This CPU
>> is selected with the irq affinity which is unrelated to the
>> enqueue operations.
>>
>> To resolve this we add a RUNNING bit to the qdisc to ensure
>> only a single dequeue per qdisc is running. Enqueue and dequeue
>> operations can still run in parallel and also on multi queue
>> NICs we can still have a dequeue in-flight per qdisc, which
>> is typically per CPU.
>>
>> Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
>> Reported-by: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> ---
>> include/net/sch_generic.h | 1 +
>> net/sched/sch_generic.c | 13 ++++++++++---
>> 2 files changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index 2092d33..8da3267 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -30,6 +30,7 @@ struct qdisc_rate_table {
>> enum qdisc_state_t {
>> __QDISC_STATE_SCHED,
>> __QDISC_STATE_DEACTIVATED,
>> + __QDISC_STATE_RUNNING,
>> };
>>
>> struct qdisc_size_table {
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index 7e3fbe9..29a1b47 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -377,12 +377,17 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
>> struct netdev_queue *txq;
>> struct net_device *dev;
>> struct sk_buff *skb;
>> - bool validate;
>> + bool more, validate;
>>
>> /* Dequeue packet */
>> + if (test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
>> + return false;
>> +
[...]
>
> This adds a pair of atomic operations in fast path, only for pfifo_fast sake.
>
Yeah, we can wrap these in a `if (TCQ_F_NOLOCK)` to avoid it in cases
its not needed. Alternatively, for net we could turn off NOLOCK in
pfifo_fast and fix it net-next with something more complete.
> qdisc_restart() name is misleading, this is used from __qdisc_run()
>
>
I'll change it in net-next.
.John
prev parent reply other threads:[~2018-03-24 22:10 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-24 20:13 [net PATCH] net: sched, fix OOO packets with pfifo_fast John Fastabend
2018-03-24 21:15 ` Eric Dumazet
2018-03-24 22:10 ` John Fastabend [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=32acd1d0-48b0-eaff-c971-9febdd33fcbd@gmail.com \
--to=john.fastabend@gmail.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=jiri@resnulli.us \
--cc=netdev@vger.kernel.org \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.