netdev.vger.kernel.org archive mirror
From: John Fastabend <john.fastabend@gmail.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>, David Miller <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: [net PATCH v2] net: sched, fix OOO packets with pfifo_fast
Date: Mon, 26 Mar 2018 11:16:45 -0700
Message-ID: <7f8636e3-c04f-18b6-7e6c-0f28bc54edbb@gmail.com>
In-Reply-To: <CAM_iQpWNX-9p-bo+caUyJ8yfsNDS1a2pV9LNvHK4=y3ec4qRVw@mail.gmail.com>

On 03/26/2018 10:30 AM, Cong Wang wrote:
> On Sat, Mar 24, 2018 at 10:25 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> After the qdisc lock was dropped in pfifo_fast we allow multiple
>> enqueue threads and dequeue threads to run in parallel. On the
>> enqueue side the skb bit ooo_okay is used to ensure all related
>> skbs are enqueued in-order. On the dequeue side though there is
>> no similar logic. What we observe is with fewer queues than CPUs
>> it is possible to re-order packets when two instances of
>> __qdisc_run() are running in parallel. Each thread will dequeue
>> a skb and then whichever thread calls the ndo op first will
>> be sent on the wire. This doesn't typically happen because
>> qdisc_run() is usually triggered by the same core that did the
>> enqueue. However, drivers will trigger __netif_schedule()
>> when queues are transitioning from stopped to awake using the
>> netif_tx_wake_* APIs. When this happens netif_schedule() calls
>> qdisc_run() on the same CPU that did the netif_tx_wake_* which
>> is usually done in the interrupt completion context. This CPU
>> is selected with the irq affinity which is unrelated to the
>> enqueue operations.
> 
> Interesting. Why this is unique to pfifo_fast? For me it could
> happen to other qdisc's too, when we release the qdisc root
> lock in sch_direct_xmit(), another CPU could dequeue from
> the same qdisc and transmit the skb in parallel too?
> 

Agreed. My guess is that it never happens in the locked case because
the timing is tighter. Or, if it is happening, it's infrequent enough
that no one has noticed the OOO packets.

For net-next we could probably clean this up. I was just
going for something simple in net that didn't penalize all
qdiscs, as Eric noted. At least this patch doesn't make it any
worse, and we have been living with the above race for
years.

> ...
> 
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index 7e3fbe9..39c144b 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -373,24 +373,33 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>>   */
>>  static inline bool qdisc_restart(struct Qdisc *q, int *packets)
>>  {
>> +       bool more, validate, nolock = q->flags & TCQ_F_NOLOCK;
>>         spinlock_t *root_lock = NULL;
>>         struct netdev_queue *txq;
>>         struct net_device *dev;
>>         struct sk_buff *skb;
>> -       bool validate;
>>
>>         /* Dequeue packet */
>> +       if (nolock && test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
>> +               return false;
>> +
> 
> Nit: you probably want to move the comment below this if check,
> or simply remove it since it is useless...
> 

Hmm, I was planning to do a comment-rewrite patch in net-next to bring
the comments in sch_generic.c up to date, and I'll delete it there. I
think we can live with the extra line in net. Also, Eric pointed out
that qdisc_restart is not really a good name for this routine anymore.

.John


Thread overview: 9+ messages
2018-03-25  5:25 [net PATCH v2] net: sched, fix OOO packets with pfifo_fast John Fastabend
2018-03-26 16:36 ` David Miller
2018-03-26 17:10   ` John Fastabend
2018-03-26 17:30 ` Cong Wang
2018-03-26 18:16   ` John Fastabend [this message]
2018-04-18  7:28     ` Paolo Abeni
2018-04-18 16:44       ` John Fastabend
2018-04-19  8:00         ` Paolo Abeni
2018-05-08 16:17         ` Paolo Abeni
