netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>,
	Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>, David Miller <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: [net PATCH v2] net: sched, fix OOO packets with pfifo_fast
Date: Wed, 18 Apr 2018 09:28:32 +0200	[thread overview]
Message-ID: <1524036512.2599.4.camel@redhat.com> (raw)
In-Reply-To: <7f8636e3-c04f-18b6-7e6c-0f28bc54edbb@gmail.com>

Hi,

let me revive this old thread...

On Mon, 2018-03-26 at 11:16 -0700, John Fastabend wrote:
> On 03/26/2018 10:30 AM, Cong Wang wrote:
> > On Sat, Mar 24, 2018 at 10:25 PM, John Fastabend
> > <john.fastabend@gmail.com> wrote:
> > > After the qdisc lock was dropped in pfifo_fast we allow multiple
> > > enqueue threads and dequeue threads to run in parallel. On the
> > > enqueue side the skb bit ooo_okay is used to ensure all related
> > > skbs are enqueued in-order. On the dequeue side though there is
> > > no similar logic. What we observe is with fewer queues than CPUs
> > > it is possible to re-order packets when two instances of
> > > __qdisc_run() are running in parallel. Each thread will dequeue
> > > a skb and then whichever thread calls the ndo op first will
> > > be sent on the wire. This doesn't typically happen because
> > > qdisc_run() is usually triggered by the same core that did the
> > > enqueue. However, drivers will trigger __netif_schedule()
> > > when queues are transitioning from stopped to awake using the
> > > netif_tx_wake_* APIs. When this happens netif_schedule() calls
> > > qdisc_run() on the same CPU that did the netif_tx_wake_* which
> > > is usually done in the interrupt completion context. This CPU
> > > is selected with the irq affinity which is unrelated to the
> > > enqueue operations.
> > 
> > Interesting. Why this is unique to pfifo_fast? For me it could
> > happen to other qdisc's too, when we release the qdisc root
> > lock in sch_direct_xmit(), another CPU could dequeue from
> > the same qdisc and transmit the skb in parallel too?
> > 
> 
> Agreed, my guess is it never happens because the timing is
> tighter in the lock case. Or if it is happening its infrequent
> enough that no one noticed the OOO packets.

I think the above could not happend due to the qdisc seqlock - which is
not acquired by NOLOCK qdiscs.

> For net-next we probably could clean this up. I was just
> going for something simple in net that didn't penalize all
> qdiscs as Eric noted. This patch doesn't make it any worse
> at least. And we have been living with the above race for
> years.

I've benchmarked this patch is some different scenario, and in my
testing it introduces a measurable regression in uncontended/lightly
contended scenarios. The measured peak negative delta is with a pktgen
thread using "xmit_mode queue_xmit":

before: 27674032 pps
after: 23809052 pps

I spend some time searching a way to improve this, without success.

John, did you had any chance to look at this again?

Thanks,

Paolo

  reply	other threads:[~2018-04-18  7:28 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-25  5:25 [net PATCH v2] net: sched, fix OOO packets with pfifo_fast John Fastabend
2018-03-26 16:36 ` David Miller
2018-03-26 17:10   ` John Fastabend
2018-03-26 17:30 ` Cong Wang
2018-03-26 18:16   ` John Fastabend
2018-04-18  7:28     ` Paolo Abeni [this message]
2018-04-18 16:44       ` John Fastabend
2018-04-19  8:00         ` Paolo Abeni
2018-05-08 16:17         ` Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1524036512.2599.4.camel@redhat.com \
    --to=pabeni@redhat.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).