From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>,
Eric Dumazet <edumazet@google.com>,
netdev <netdev@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Jiri Pirko <jiri@mellanox.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Alexei Starovoitov <ast@plumgrid.com>,
Alexander Duyck <aduyck@mirantis.com>,
Tom Herbert <tom@herbertland.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>, Rik van Riel <riel@redhat.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/2] net: threadable napi poll loop
Date: Wed, 11 May 2016 16:38:36 +0200 [thread overview]
Message-ID: <1462977516.4444.73.camel@redhat.com> (raw)
In-Reply-To: <1462972112.23934.139.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, 2016-05-11 at 06:08 -0700, Eric Dumazet wrote:
> On Wed, 2016-05-11 at 11:48 +0200, Paolo Abeni wrote:
> > Hi Eric,
> > On Tue, 2016-05-10 at 15:51 -0700, Eric Dumazet wrote:
> > > On Wed, 2016-05-11 at 00:32 +0200, Hannes Frederic Sowa wrote:
> > >
> > > > Not only did we want to present this solely as a bugfix but also as as
> > > > performance enhancements in case of virtio (as you can see in the cover
> > > > letter). Given that a long time ago there was a tendency to remove
> > > > softirqs completely, we thought it might be very interesting, that a
> > > > threaded napi in general seems to be absolutely viable nowadays and
> > > > might offer new features.
> > >
> > > Well, you did not fix the bug, you worked around by adding yet another
> > > layer, with another sysctl that admins or programs have to manage.
> > >
> > > If you have a special need for virtio, do not hide it behind a 'bug fix'
> > > but add it as a features request.
> > >
> > > This ksoftirqd issue is real and a fix looks very reasonable.
> > >
> > > Please try this patch, as I had very good success with it.
> >
> > Thank you for your time and your effort.
> >
> > I tested your patch on the bare metal "single core" scenario, disabling
> > the unneeded cores with:
> > CPUS=`nproc`
> > for I in `seq 1 $CPUS`; do echo 0 > /sys/devices/system/node/node0/cpu$I/online; done
> >
> > And I got a:
> >
> > [ 86.925249] Broke affinity for irq <num>
> >
>
> Was it fatal, or simply a warning that you are removing the cpu that was
> the only allowed cpu in an affinity_mask ?
The above message is emitted with pr_notice() by the x86 version of
fixup_irqs(). It's not fatal, the host is alive and well after that. The
un-patched kernel does not emit it on cpus disabling.
I'll try to look into this later.
> Looks another bug to fix then ? We disabled CPU hotplug here at Google
> for our production, as it was notoriously buggy. No time to fix dozens
> of issues added by a crowd of developers that do not even know a cpu can
> be unplugged.
>
> Maybe some caller of local_bh_disable()/local_bh_enable() expected that
> current softirq would be processed. Obviously flaky even before the
> patches.
>
> > for each irq number generated by a network device.
> >
> > In this scenario, your patch solves the ksoftirqd issue, performing
> > comparable to the napi threaded patches (with a negative delta in the
> > noise range) and introducing a minor regression with a single flow, in
> > the noise range (3%).
> >
> > As said in a previous mail, we actually experimented something similar,
> > but it felt quite hackish.
>
> Right, we are networking guys, and we feel that messing with such core
> infra is not for us. So we feel comfortable adding a pure networking
> patch.
>
> >
> > AFAICS this patch adds three more tests in the fast path and affect all
> > other softirq use case. I'm not sure how to check for regression there.
>
> It is obvious to me that ksoftird mechanism is not working as intended.
>
> Fixing it might uncover bugs from parts of the kernel relying on the
> bug, indirectly or directly. Is it a good thing ?
>
> I can not tell before trying.
>
> Just by looking at /proc/{ksoftirqs_pid}/sched you can see the problem,
> as we normally schedule ksoftird under stress but most of the time,
> the softirq items were processed by another tasks as you found out.
>
>
> >
> > The napi thread patches are actually a new feature, that also fixes the
> > ksoftirqd issue: hunting the ksoftirqd issue has been the initial
> > trigger for this work. I'm sorry for not being clear enough in the cover
> > letter.
> >
> > The napi thread patches offer additional benefits, i.e. an additional
> > relevant gain in the described test scenario, and do not impact on other
> > subsystems/kernel entities.
> >
> > I still think they are worthy, and I bet you would disagree, but could
> > you please articulate more which parts concern you most and/or are more
> > bloated ?
>
> Just look at the added code. napi_threaded_poll() is very buggy, but
> honestly I do not want to fix the bugs you added there. If you have only
> one vcpu, how jiffies can ever change since you block BH ?
Uh, we have likely the same issue in the net_rx_action() function, which
also execute with bh disabled and check for jiffies changes even on
single core hosts ?!?
Aren't jiffies updated by the timer interrupt ? and thous even with
bh_disabled ?!?
> I was planning to remove cond_resched_softirq() that we no longer use
> after my recent changes to TCP stack,
> and you call it again (while it is obviously buggy since it does not
> check if a BH is pending, only if a thread needs the cpu)
I missed that, thank you for pointing out.
> I prefer fixing the existing code, really. It took us years to
> understand it and maybe fix it.
>
> Just think of what will happen if you have 10 devices (10 new threads in
> your model) and one cpu.
>
> Instead of the nice existing netif_rx() doing 64 packets per device
> rounds, you'll now rely on process scheduler behavior that has no such
> granularity.
>
> Adding more threads is the natural answer of userland programmers, but
> in the kernel it is not the right answer. We already have mechanism,
> just use them and fix them if they are broken.
>
> Sorry, I really do not think your patches are the way to go.
> But this thread is definitely interesting.
Oh, this is a far better comment that I would have expected ;-)
Cheers,
Paolo
next prev parent reply other threads:[~2016-05-11 14:38 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-10 14:11 [RFC PATCH 0/2] net: threadable napi poll loop Paolo Abeni
2016-05-10 14:11 ` [RFC PATCH 1/2] net: implement threaded-able napi poll loop support Paolo Abeni
2016-05-10 14:11 ` [RFC PATCH 2/2] net: add sysfs attribute to control napi threaded mode Paolo Abeni
2016-05-10 14:29 ` [RFC PATCH 0/2] net: threadable napi poll loop Eric Dumazet
2016-05-10 15:51 ` David Miller
2016-05-10 16:03 ` Paolo Abeni
2016-05-10 16:08 ` Eric Dumazet
2016-05-10 20:22 ` Paolo Abeni
2016-05-10 20:45 ` David Miller
2016-05-10 20:50 ` Rik van Riel
2016-05-10 20:52 ` David Miller
2016-05-10 21:01 ` Rik van Riel
2016-05-10 20:46 ` Hannes Frederic Sowa
2016-05-10 21:09 ` Eric Dumazet
2016-05-10 21:31 ` Eric Dumazet
2016-05-10 21:35 ` Rik van Riel
2016-05-10 21:53 ` Eric Dumazet
2016-05-10 22:02 ` Eric Dumazet
2016-05-10 22:44 ` Eric Dumazet
2016-05-10 22:02 ` Rik van Riel
2016-05-11 17:55 ` Eric Dumazet
2016-05-10 22:32 ` Hannes Frederic Sowa
2016-05-10 22:51 ` Eric Dumazet
2016-05-11 6:55 ` Peter Zijlstra
2016-05-11 13:13 ` Hannes Frederic Sowa
2016-05-11 14:40 ` Eric Dumazet
2016-05-11 15:01 ` Rik van Riel
2016-05-11 15:50 ` Eric Dumazet
2016-05-11 21:56 ` Eric Dumazet
2016-05-12 20:07 ` Paolo Abeni
2016-05-12 20:49 ` Eric Dumazet
2016-05-12 20:58 ` Paolo Abeni
2016-05-12 21:05 ` Eric Dumazet
2016-05-13 16:50 ` Paolo Abeni
2016-05-13 17:03 ` Eric Dumazet
2016-05-13 17:19 ` Paolo Abeni
2016-05-13 17:36 ` Eric Dumazet
2016-05-16 13:10 ` Paolo Abeni
2016-05-16 13:38 ` Eric Dumazet
2016-05-11 9:48 ` Paolo Abeni
2016-05-11 13:08 ` Eric Dumazet
2016-05-11 13:39 ` Hannes Frederic Sowa
2016-05-11 13:47 ` Hannes Frederic Sowa
2016-05-11 14:38 ` Paolo Abeni [this message]
2016-05-11 14:45 ` Eric Dumazet
2016-05-11 22:47 ` Hannes Frederic Sowa
2016-05-10 15:57 ` Thomas Gleixner
2016-05-10 20:41 ` Paolo Abeni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1462977516.4444.73.camel@redhat.com \
--to=pabeni@redhat.com \
--cc=aduyck@mirantis.com \
--cc=ast@plumgrid.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=eric.dumazet@gmail.com \
--cc=hannes@stressinduktion.org \
--cc=jiri@mellanox.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=tom@herbertland.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.