From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: trevor_davenport@selinc.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: process_backlog interruptions with 3.10.47-rt50
Date: Wed, 11 Feb 2015 05:31:16 +0100
Message-ID: <1423629076.9360.28.camel@gmail.com>
In-Reply-To: <OF70711DAC.356CE873-ON88257DE7.007C4A72-88257DE8.00776286@selinc.com>
On Tue, 2015-02-10 at 13:43 -0800, trevor_davenport@selinc.com wrote:
> I've recently encountered a problem after upgrading from 3.0.57-rt82 to
> 3.10.47-rt50 where process_backlog gets interrupted and does not resume
> for a while, which results in packets not being processed in time. I see
> net_rx_action, which then calls process_backlog (as the poll method to
> process the backlog of packets queued up by netif_rx), but then after the
> interruption, it does not finish for about 5ms. In the older kernel it
> would finish based on the priority of ksoftirqd. This is no longer the
> case.
>
> I have priorities configured so that hard interrupts are highest,
> ksoftirqd next (both are SCHED_FIFO) and then my program is currently
> SCHED_OTHER but I still do not see the rx softirq finish before my program
> runs.
>
> This is all on a single-core powerpc device. I do not see these problems
> with a net device which uses NAPI directly (as such I'm updating my driver
> to use NAPI) but it seems like there is a real bug here somewhere. I have
> not been able to find any mention of similar problems (perhaps few people
> are using netif_rx these days).
>
> I've attached a recording from perf which shows the problem. Specifically,
> you see net_rx_action run at time 213.079014 and then it doesn't finish
> until about 5ms later at time 213.084953, which is not the case on the older
> kernels. It seems something has changed with softirq handling, or
> process_backlog needs to be adapted for it. My suspicion is this has
> something to do with the work mentioned in
> 210dc110063cf040d3209fddf766f6fcafccdc34 but I'm not an expert in this
> area of the kernel.

Your suspicion is correct. Your net traffic is being handled by your
SCHED_OTHER database task, which lost the CPU for a while because it is
a SCHED_OTHER task. It's a behavior change from previous rt kernels, but
not a bad one. At the rt mini-summit of whatever year that was, this
change was shown to be a massive win. Low priority network traffic now
won't hinder a high priority task getting to the CPU, and should a high
priority task block because your low priority task was preempted while
holding the softirq lock it wants, priority inheritance (PI) will kick in.

-Mike
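
Since the explanation above hinges on which scheduling class the task
doing the work runs in, it can help to check the policy and priority of
the tasks involved (your program, ksoftirqd, the IRQ threads). What
follows is only a minimal sketch in Python, assuming a Linux system where
the os module exposes the sched_* calls; the helper names policy_name and
describe_task are made up for illustration and are not part of any kernel
or rt tooling.

```python
import os

# Names of the scheduling policies that Python's os module exposes on Linux.
_POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR",
                 "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, name)
}


def policy_name(policy):
    """Return a readable name for a scheduling policy value."""
    # The kernel may OR SCHED_RESET_ON_FORK into the policy; mask it off.
    policy &= ~getattr(os, "SCHED_RESET_ON_FORK", 0)
    return _POLICY_NAMES.get(policy, "unknown(%d)" % policy)


def describe_task(pid=0):
    """Return (policy name, rt priority) for a pid; 0 means the caller."""
    policy = os.sched_getscheduler(pid)
    priority = os.sched_getparam(pid).sched_priority
    return policy_name(policy), priority


if __name__ == "__main__":
    name, priority = describe_task(0)
    print("pid %d: %s, rt priority %d" % (os.getpid(), name, priority))
```

Passing the pid of ksoftirqd/0 or of your own program to describe_task
shows whether it is SCHED_FIFO or SCHED_OTHER. A SCHED_OTHER task always
reports rt priority 0.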