From: Paolo Abeni
Subject: Re: [net PATCH v2] net: sched, fix OOO packets with pfifo_fast
Date: Wed, 18 Apr 2018 09:28:32 +0200
Message-ID: <1524036512.2599.4.camel@redhat.com>
To: John Fastabend, Cong Wang
Cc: Eric Dumazet, Jiri Pirko, David Miller,
 Linux Kernel Network Developers
In-Reply-To: <7f8636e3-c04f-18b6-7e6c-0f28bc54edbb@gmail.com>
References: <20180325052505.4098.36912.stgit@john-Precision-Tower-5810>
 <7f8636e3-c04f-18b6-7e6c-0f28bc54edbb@gmail.com>

Hi,

let me revive this old thread...

On Mon, 2018-03-26 at 11:16 -0700, John Fastabend wrote:
> On 03/26/2018 10:30 AM, Cong Wang wrote:
> > On Sat, Mar 24, 2018 at 10:25 PM, John Fastabend wrote:
> > > After the qdisc lock was dropped in pfifo_fast we allow multiple
> > > enqueue threads and dequeue threads to run in parallel. On the
> > > enqueue side the skb bit ooo_okay is used to ensure all related
> > > skbs are enqueued in-order. On the dequeue side though there is
> > > no similar logic. What we observe is that with fewer queues than
> > > CPUs it is possible to re-order packets when two instances of
> > > __qdisc_run() are running in parallel. Each thread will dequeue
> > > an skb, and whichever thread calls the ndo op first will have
> > > its skb sent on the wire. This doesn't typically happen because
> > > qdisc_run() is usually triggered by the same core that did the
> > > enqueue. However, drivers will trigger __netif_schedule()
> > > when queues are transitioning from stopped to awake using the
> > > netif_tx_wake_* APIs. When this happens, netif_schedule() calls
> > > qdisc_run() on the same CPU that did the netif_tx_wake_*, which
> > > is usually the interrupt completion context. This CPU is
> > > selected by the irq affinity, which is unrelated to the
> > > enqueue operations.
> >
> > Interesting. Why is this unique to pfifo_fast? To me it could
> > happen to other qdiscs too: when we release the qdisc root
> > lock in sch_direct_xmit(), another CPU could dequeue from
> > the same qdisc and transmit the skb in parallel too?
> >
>
> Agreed, my guess is it never happens because the timing is
> tighter in the locked case. Or if it is happening, it's infrequent
> enough that no one noticed the OOO packets.

I think the above cannot happen, due to the qdisc seqlock - which is
not acquired by NOLOCK qdiscs.

> For net-next we probably could clean this up. I was just
> going for something simple in net that didn't penalize all
> qdiscs, as Eric noted. This patch doesn't make it any worse
> at least. And we have been living with the above race for
> years.

I've benchmarked this patch in a few different scenarios, and in my
testing it introduces a measurable regression in the uncontended/
lightly contended case. The peak measured regression is with a pktgen
thread using "xmit_mode queue_xmit":

before:
27674032 pps

after:
23809052 pps

I spent some time looking for a way to improve this, without success.

John, did you have a chance to look at this again?

Thanks,

Paolo
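
P.S.: to make the reordering scenario easier to see outside the kernel,
below is a minimal user space sketch, plain pthreads rather than the real
qdisc code; the qdisc_run()/dequeue()/xmit() names are only illustrative.
Two threads drain the same FIFO and "transmit" what they dequeued. Built
with -DNO_GUARD, each thread can hold one dequeued packet and race on the
transmit step, so the wire order may differ from the enqueue order, which
is the NOLOCK situation discussed above. With the trylock guard only one
thread runs the drain loop at a time, which is roughly what the qdisc
"running" seqlock gives the locked qdiscs.

/* Toy illustration only, not kernel code. Build: gcc -O2 -pthread */
#include <pthread.h>
#include <stdio.h>

#define NPKTS 16

static int fifo[NPKTS];
static int head, tail = NPKTS;		/* FIFO is pre-filled in main() */
static pthread_mutex_t fifo_lock = PTHREAD_MUTEX_INITIALIZER;

static int wire[NPKTS];
static int wire_len;
static pthread_mutex_t wire_lock = PTHREAD_MUTEX_INITIALIZER;

/* stand-in for the per-qdisc "running" guard */
static pthread_mutex_t running = PTHREAD_MUTEX_INITIALIZER;

static int dequeue(void)
{
	int pkt = -1;

	pthread_mutex_lock(&fifo_lock);
	if (head < tail)
		pkt = fifo[head++];
	pthread_mutex_unlock(&fifo_lock);
	return pkt;
}

static void xmit(int pkt)
{
	/* record the order in which packets hit the "wire" */
	pthread_mutex_lock(&wire_lock);
	wire[wire_len++] = pkt;
	pthread_mutex_unlock(&wire_lock);
}

static void *qdisc_run(void *unused)
{
#ifndef NO_GUARD
	/* only one thread may drain the queue at a time */
	if (pthread_mutex_trylock(&running))
		return NULL;
#endif
	for (int pkt = dequeue(); pkt >= 0; pkt = dequeue())
		xmit(pkt);
#ifndef NO_GUARD
	pthread_mutex_unlock(&running);
#endif
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	for (int i = 0; i < NPKTS; i++)
		fifo[i] = i;

	pthread_create(&t1, NULL, qdisc_run, NULL);
	pthread_create(&t2, NULL, qdisc_run, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	for (int i = 0; i < wire_len; i++)
		printf("%d ", wire[i]);
	printf("\n");
	return 0;
}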