From mboxrd@z Thu Jan 1 00:00:00 1970 From: John Fastabend Subject: Re: [net PATCH v2] net: sched, fix OOO packets with pfifo_fast Date: Mon, 26 Mar 2018 10:10:06 -0700 Message-ID: <03243235-f9ae-e44b-a0d7-0b8f3294dd2a@gmail.com> References: <20180325052505.4098.36912.stgit@john-Precision-Tower-5810> <20180326.123643.803872307508307757.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Cc: eric.dumazet@gmail.com, xiyou.wangcong@gmail.com, jiri@resnulli.us, netdev@vger.kernel.org, Jakob Unterwurzacher To: David Miller Return-path: Received: from mail-pl0-f66.google.com ([209.85.160.66]:32894 "EHLO mail-pl0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751890AbeCZRKV (ORCPT ); Mon, 26 Mar 2018 13:10:21 -0400 Received: by mail-pl0-f66.google.com with SMTP id c11-v6so12336108plo.0 for ; Mon, 26 Mar 2018 10:10:21 -0700 (PDT) In-Reply-To: <20180326.123643.803872307508307757.davem@davemloft.net> Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 03/26/2018 09:36 AM, David Miller wrote: > From: John Fastabend > Date: Sat, 24 Mar 2018 22:25:06 -0700 > >> After the qdisc lock was dropped in pfifo_fast we allow multiple >> enqueue threads and dequeue threads to run in parallel. On the >> enqueue side the skb bit ooo_okay is used to ensure all related >> skbs are enqueued in-order. On the dequeue side though there is >> no similar logic. What we observe is with fewer queues than CPUs >> it is possible to re-order packets when two instances of >> __qdisc_run() are running in parallel. Each thread will dequeue >> a skb and then whichever thread calls the ndo op first will >> be sent on the wire. This doesn't typically happen because >> qdisc_run() is usually triggered by the same core that did the >> enqueue. However, drivers will trigger __netif_schedule() >> when queues are transitioning from stopped to awake using the >> netif_tx_wake_* APIs. When this happens netif_schedule() calls >> qdisc_run() on the same CPU that did the netif_tx_wake_* which >> is usually done in the interrupt completion context. This CPU >> is selected with the irq affinity which is unrelated to the >> enqueue operations. >> >> To resolve this we add a RUNNING bit to the qdisc to ensure >> only a single dequeue per qdisc is running. Enqueue and dequeue >> operations can still run in parallel and also on multi queue >> NICs we can still have a dequeue in-flight per qdisc, which >> is typically per CPU. >> >> Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") >> Reported-by: Jakob Unterwurzacher >> Signed-off-by: John Fastabend > > Applied, thanks John. > Great, also off-list email from Jakob (I forgot to add him to the CC list here, oops) told me to add, Tested-by: Jakob Unterwurzacher Also in net-next I'll look to see if we can avoid doing the extra atomics especially in cases where they are not actually needed. For example the 1:1 qdisc to txq mappings. It seems a bit evasive though for net. Finally just an FYI but I think I'll look at a distributed counter soon so we can get a lockless token bucket. I need the counter for BPF as well so coming soon. Thanks, John