From mboxrd@z Thu Jan  1 00:00:00 1970
From: arnd@arndb.de (Arnd Bergmann)
Date: Fri, 04 Dec 2015 22:30:35 +0100
Subject: [PATCH net-next v2 4/4] net: mvneta: Spread out the TX queues management on all CPUs
In-Reply-To: <1449256350.25029.36.camel@edumazet-glaptop2.roam.corp.google.com>
References: <1449254700-32685-1-git-send-email-gregory.clement@free-electrons.com>
 <1449254700-32685-5-git-send-email-gregory.clement@free-electrons.com>
 <1449256350.25029.36.camel@edumazet-glaptop2.roam.corp.google.com>
Message-ID: <10523514.jXrVKo414l@wuerfel>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Friday 04 December 2015 11:12:30 Eric Dumazet wrote:
> On Fri, 2015-12-04 at 19:45 +0100, Gregory CLEMENT wrote:
> > With this patch each CPU is associated with its own set of TX queues. At
> > the same time, the SKB received in mvneta_tx is bound to the queue
> > associated with the CPU sending the data. Thanks to this, the next IRQ
> > will be received on the same CPU, allowing more data to be sent.
> >
> > It also makes throughput and latency more predictable when multiple
> > threads are sending out data on different CPUs.
> >
> > As an example, on Armada XP GP with an iperf bound to one CPU and a ping
> > bound to another CPU, without this patch the ping round trip was about
> > 2.5ms (and could reach 3s!), whereas with this patch it was around
> > 0.7ms (and sometimes went up to 1.2ms).
>
> This really looks like you need something smarter than the pfifo_fast qdisc,
> and maybe BQL (I did not check whether this driver already implements it).

I suggested this change, as well as the BQL implementation that Marcin did.
I believe he hasn't posted that yet while he does some more testing, but it
should come soon.

	Arnd