From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: Joe Lawrence <joe.lawrence@stratus.com>,
netdev@vger.kernel.org, Jiri Pirko <jiri@resnulli.us>
Subject: Re: [PATCH] team: add rescheduling jiffy delay on !rtnl_trylock
Date: Wed, 1 Oct 2014 23:43:08 -0700
Message-ID: <20141002064308.GN5015@linux.vnet.ibm.com>
In-Reply-To: <20140929160601.GD15925@htj.dyndns.org>
On Mon, Sep 29, 2014 at 12:06:01PM -0400, Tejun Heo wrote:
> (cc'ing Paul and quoting the whole body)
>
> Paul, this is a fix for RCU sched stall observed w/ a work item
> requeueing itself waiting for the RCU grace period. As the self
> requeueing work item ends up being executed by the same kworker, the
> worker task never stops running in the absence of a higher priority
> task and it seems to delay RCU grace period for a very long time on
> !PREEMPT kernels. As each work item denotes a boundary which no
> synchronization construct stretches across, I wonder whether it'd be a
> good idea to add a notification for the end of RCU critical section
> between executions of work items.
It sounds like a great idea to me! I suggest invoking
rcu_note_context_switch() between executions of work items.
Thanx, Paul
> Thanks.
>
> On Mon, Sep 29, 2014 at 11:54:45AM -0400, Joe Lawrence wrote:
> > Hello Jiri,
> >
> > I've been debugging a hang on RHEL7 that seems to originate in the
> > teaming driver and the team_notify_peers_work/team_mcast_rejoin_work
> > rtnl_trylock rescheduling logic. Running a stand-alone minimal driver
> > mimicking the same schedule_delayed_work(.., 0) reproduces the problem on
> > RHEL7 and upstream kernels [1].
> >
> > A quick summary of the hang:
> >
> > 1 - systemd-udevd issues an ioctl that heads down dev_ioctl (grabs the
> > rtnl_mutex), dev_ifsioc, dev_change_name and finally
> > synchronize_sched. In every vmcore I've taken of the hang, this
> > thread is waiting on the RCU.
> >
> > 2 - A kworker thread goes to 100% CPU.
> >
> > 3 - Inspecting the running thread on the CPU that rcusched reported as
> > holding up the RCU grace period usually shows it in either
> > team_notify_peers_work, team_mcast_rejoin_work, or somewhere in the
> > workqueue code (process_one_work). This is the same CPU/thread as
> > #2.
> >
> > 4 - team_notify_peers_work and team_mcast_rejoin_work want the rtnl_lock
> > that systemd-udevd in #1 has, so they try to play nice by calling
> > rtnl_trylock and rescheduling on failure. Unfortunately with 0
> > jiffy delay, process_one_work will "execute immediately" (ie, after
> > others already in queue, but before the next tick). With the stock
> > RHEL7 !CONFIG_PREEMPT at least, this creates a tight loop on
> > process_one_work + rtnl_trylock that spins the CPU in #2.
> >
> > 5 - Sometime minutes later, RCU seems to be kicked by a side effect of
> > a smp_apic_timer_interrupt. (This was the only other interesting
> > function reported by ftrace function tracer).
> >
> > See the patch below for a potential workaround. Giving at least 1 jiffy
> > should give process_one_work some breathing room before calling back
> > into team_notify_peers_work/team_mcast_rejoin_work and attempting to
> > acquire the rtnl_lock mutex.
> >
> > Regards,
> >
> > -- Joe
> >
> > [1] http://marc.info/?l=linux-kernel&m=141192244232345&w=2
> >
> > -->8--- -->8--- -->8--- -->8---
> >
> > From fc5bbf5771b5732f7479ac6e84bbfdde05710023 Mon Sep 17 00:00:00 2001
> > From: Joe Lawrence <joe.lawrence@stratus.com>
> > Date: Mon, 29 Sep 2014 11:09:05 -0400
> > Subject: [PATCH] team: add rescheduling jiffy delay on !rtnl_trylock
> >
> > Give the CPU running the kworker handling team_notify_peers_work and
> > team_mcast_rejoin_work functions some scheduling air by specifying a
> > non-zero delay.
> >
> > Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
> > ---
> > drivers/net/team/team.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> > index ef10302..d46df38 100644
> > --- a/drivers/net/team/team.c
> > +++ b/drivers/net/team/team.c
> > @@ -633,7 +633,7 @@ static void team_notify_peers_work(struct work_struct *work)
> >  	team = container_of(work, struct team, notify_peers.dw.work);
> >
> >  	if (!rtnl_trylock()) {
> > -		schedule_delayed_work(&team->notify_peers.dw, 0);
> > +		schedule_delayed_work(&team->notify_peers.dw, 1);
> >  		return;
> >  	}
> >  	call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, team->dev);
> > @@ -673,7 +673,7 @@ static void team_mcast_rejoin_work(struct work_struct *work)
> >  	team = container_of(work, struct team, mcast_rejoin.dw.work);
> >
> >  	if (!rtnl_trylock()) {
> > -		schedule_delayed_work(&team->mcast_rejoin.dw, 0);
> > +		schedule_delayed_work(&team->mcast_rejoin.dw, 1);
> >  		return;
> >  	}
> >  	call_netdevice_notifiers(NETDEV_RESEND_IGMP, team->dev);
> > --
> > 1.7.10.4
> >
>
> --
> tejun
>
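
The interaction described above (a zero-delay self-requeue that never lets
the CPU report a quiescent state, versus either a 1-jiffy delay or a
quiescent-state report between work items) can be illustrated with a small
userspace toy model. This is only a sketch under simplifying assumptions,
not kernel code: struct sim, report_qs(), work_fn() and simulate() are
hypothetical names, report_qs() merely stands in for the kind of hook that
rcu_note_context_switch() would provide, and the model ignores the timer
tick and everything else that eventually ends the stall on a real system.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): one non-preemptible CPU,
 * one worker, one work item that requeues itself when trylock fails. */
struct sim {
	bool lock_held;     /* lock held by a task waiting for a grace period */
	unsigned long now;  /* current tick */
	unsigned long due;  /* tick at which the work item becomes runnable */
};

/* The CPU reports a quiescent state. In this model that ends the grace
 * period, so the lock holder can finish and drop the lock. */
static void report_qs(struct sim *s)
{
	s->lock_held = false;
}

/* The work item: returns true once it gets the lock and completes,
 * otherwise requeues itself 'delay' ticks from now. */
static bool work_fn(struct sim *s, unsigned long delay)
{
	if (s->lock_held) {              /* trylock failed */
		s->due = s->now + delay; /* requeue with the given delay */
		return false;
	}
	return true;
}

/* Returns how many times the work item ran before it succeeded,
 * or -1 if it was still spinning after max_runs executions. */
int simulate(unsigned long delay, bool qs_between_items, int max_runs)
{
	struct sim s = { .lock_held = true, .now = 0, .due = 0 };
	int runs = 0;

	assert(max_runs > 0);

	while (runs < max_runs) {
		if (s.due > s.now) {
			/* Nothing runnable: the worker sleeps until the next
			 * tick. The context switch is a quiescent state. */
			report_qs(&s);
			s.now++;
			continue;
		}
		runs++;
		if (work_fn(&s, delay))
			return runs;
		if (qs_between_items)
			report_qs(&s); /* hook between work item executions */
	}
	return -1; /* never left the requeue loop */
}
```

In this model, simulate(0, false, 1000) returns -1 because the worker never
sleeps and never reports a quiescent state, while simulate(1, false, 1000)
and simulate(0, true, 1000) both return 2: the work item fails once, the
quiescent state is reported, and the second execution gets the lock.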
Thread overview: 9+ messages
2014-09-29 15:54 [PATCH] team: add rescheduling jiffy delay on !rtnl_trylock Joe Lawrence
2014-09-29 16:06 ` Tejun Heo
2014-10-02 6:43 ` Paul E. McKenney [this message]
2014-10-03 19:37 ` Joe Lawrence
2014-10-04 8:37 ` Paul E. McKenney
2014-10-05 2:13 ` Tejun Heo
2014-10-05 12:53 ` Joe Lawrence
2014-10-05 14:08 ` Paul E. McKenney
2014-10-05 16:11 ` Tejun Heo