netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] netconsole: prevent deadlock when removing net driver that netconsole is using
@ 2011-04-22 14:06 Neil Horman
  2011-04-22 14:44 ` Neil Horman
  2011-04-22 18:10 ` [PATCH] netconsole: fix deadlock when removing net driver that netconsole is using (v2) Neil Horman
  0 siblings, 2 replies; 6+ messages in thread
From: Neil Horman @ 2011-04-22 14:06 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, David S. Miller

A deadlock was reported to me recently that occured when netconsole was being
used in a virtual guest.  If the virtio_net driver was removed while netconsole
was setup to use an interface that was driven by that driver, the guest
deadlocked.  No backtrace was provided because netconsole was the only console
configured, but it became clear pretty quickly what the problem was.  In
netconsole_netdev_event, if we get an unregister event, we call
__netpoll_cleanup with the target_list_lock held and irqs disabled.
__netpoll_cleanup can, if pending netpoll packets are waiting call
cancel_delayed_work_sync, which is a sleeping path.  the might_sleep call in
that path gets triggered, causing a console warning to be issued.  The
netconsole write handler of course tries to take the target_list_lock again,
which we already hold, causing deadlock.

The fix is pretty striaghtforward.  Simply drop the target_list_lock and
re-enable irqs prior to calling __netpoll_cleanup, the re-acquire the lock, and
restart the loop.  Confirmed by myself to fix the problem reported.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/netconsole.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index dfb67eb..f365840 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -670,6 +670,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 	      event == NETDEV_BONDING_DESLAVE || event == NETDEV_GOING_DOWN))
 		goto done;
 
+restart:
 	spin_lock_irqsave(&target_list_lock, flags);
 	list_for_each_entry(nt, &target_list, list) {
 		netconsole_target_get(nt);
@@ -683,9 +684,12 @@ static int netconsole_netdev_event(struct notifier_block *this,
 				 * rtnl_lock already held
 				 */
 				if (nt->np.dev) {
+					spin_unlock_irqrestore(&target_list_lock,
+							       flags);
 					__netpoll_cleanup(&nt->np);
 					dev_put(nt->np.dev);
 					nt->np.dev = NULL;
+					goto restart;
 				}
 				/* Fall through */
 			case NETDEV_GOING_DOWN:
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] netconsole: prevent deadlock when removing net driver that netconsole is using
  2011-04-22 14:06 [PATCH] netconsole: prevent deadlock when removing net driver that netconsole is using Neil Horman
@ 2011-04-22 14:44 ` Neil Horman
  2011-04-22 18:10 ` [PATCH] netconsole: fix deadlock when removing net driver that netconsole is using (v2) Neil Horman
  1 sibling, 0 replies; 6+ messages in thread
From: Neil Horman @ 2011-04-22 14:44 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

On Fri, Apr 22, 2011 at 10:06:56AM -0400, Neil Horman wrote:
> A deadlock was reported to me recently that occured when netconsole was being
> used in a virtual guest.  If the virtio_net driver was removed while netconsole
> was setup to use an interface that was driven by that driver, the guest
> deadlocked.  No backtrace was provided because netconsole was the only console
> configured, but it became clear pretty quickly what the problem was.  In
> netconsole_netdev_event, if we get an unregister event, we call
> __netpoll_cleanup with the target_list_lock held and irqs disabled.
> __netpoll_cleanup can, if pending netpoll packets are waiting call
> cancel_delayed_work_sync, which is a sleeping path.  the might_sleep call in
> that path gets triggered, causing a console warning to be issued.  The
> netconsole write handler of course tries to take the target_list_lock again,
> which we already hold, causing deadlock.
> 
> The fix is pretty striaghtforward.  Simply drop the target_list_lock and
> re-enable irqs prior to calling __netpoll_cleanup, the re-acquire the lock, and
> restart the loop.  Confirmed by myself to fix the problem reported.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: "David S. Miller" <davem@davemloft.net>
> ---
>  drivers/net/netconsole.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> index dfb67eb..f365840 100644
> --- a/drivers/net/netconsole.c
> +++ b/drivers/net/netconsole.c
> @@ -670,6 +670,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
>  	      event == NETDEV_BONDING_DESLAVE || event == NETDEV_GOING_DOWN))
>  		goto done;
>  
> +restart:
>  	spin_lock_irqsave(&target_list_lock, flags);
>  	list_for_each_entry(nt, &target_list, list) {
>  		netconsole_target_get(nt);
> @@ -683,9 +684,12 @@ static int netconsole_netdev_event(struct notifier_block *this,
>  				 * rtnl_lock already held
>  				 */
>  				if (nt->np.dev) {
> +					spin_unlock_irqrestore(&target_list_lock,
> +							       flags);
>  					__netpoll_cleanup(&nt->np);
>  					dev_put(nt->np.dev);
>  					nt->np.dev = NULL;
> +					goto restart;
>  				}
>  				/* Fall through */
>  			case NETDEV_GOING_DOWN:
> -- 
> 1.7.4.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Crap, hold on this, sorry.  Just saw that I forgot to issue a
netconsole_target_put to balance the get in the restart path.  I'll send a new
patch shortly.
Neil


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] netconsole: fix deadlock when removing net driver that netconsole is using (v2)
  2011-04-22 14:06 [PATCH] netconsole: prevent deadlock when removing net driver that netconsole is using Neil Horman
  2011-04-22 14:44 ` Neil Horman
@ 2011-04-22 18:10 ` Neil Horman
  2011-04-22 18:40   ` David Miller
  1 sibling, 1 reply; 6+ messages in thread
From: Neil Horman @ 2011-04-22 18:10 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, David S. Miller

A deadlock was reported to me recently that occured when netconsole was being
used in a virtual guest.  If the virtio_net driver was removed while netconsole
was setup to use an interface that was driven by that driver, the guest
deadlocked.  No backtrace was provided because netconsole was the only console
configured, but it became clear pretty quickly what the problem was.  In
netconsole_netdev_event, if we get an unregister event, we call
__netpoll_cleanup with the target_list_lock held and irqs disabled.
__netpoll_cleanup can, if pending netpoll packets are waiting call
cancel_delayed_work_sync, which is a sleeping path.  the might_sleep call in
that path gets triggered, causing a console warning to be issued.  The
netconsole write handler of course tries to take the target_list_lock again,
which we already hold, causing deadlock.

The fix is pretty striaghtforward.  Simply drop the target_list_lock and
re-enable irqs prior to calling __netpoll_cleanup, the re-acquire the lock, and
restart the loop.  Confirmed by myself to fix the problem reported.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/netconsole.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index dfb67eb..eb41e44 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -671,6 +671,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
 		goto done;
 
 	spin_lock_irqsave(&target_list_lock, flags);
+restart:
 	list_for_each_entry(nt, &target_list, list) {
 		netconsole_target_get(nt);
 		if (nt->np.dev == dev) {
@@ -683,9 +684,16 @@ static int netconsole_netdev_event(struct notifier_block *this,
 				 * rtnl_lock already held
 				 */
 				if (nt->np.dev) {
+					spin_unlock_irqrestore(
+							      &target_list_lock,
+							      flags);
 					__netpoll_cleanup(&nt->np);
+					spin_lock_irqsave(&target_list_lock,
+							  flags);
 					dev_put(nt->np.dev);
 					nt->np.dev = NULL;
+					netconsole_target_put(nt);
+					goto restart;
 				}
 				/* Fall through */
 			case NETDEV_GOING_DOWN:
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] netconsole: fix deadlock when removing net driver that netconsole is using (v2)
  2011-04-22 18:10 ` [PATCH] netconsole: fix deadlock when removing net driver that netconsole is using (v2) Neil Horman
@ 2011-04-22 18:40   ` David Miller
  2011-04-22 21:31     ` Neil Horman
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2011-04-22 18:40 UTC (permalink / raw)
  To: nhorman; +Cc: netdev

From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri, 22 Apr 2011 14:10:59 -0400

> @@ -683,9 +684,16 @@ static int netconsole_netdev_event(struct notifier_block *this,
>  				 * rtnl_lock already held
>  				 */
>  				if (nt->np.dev) {
> +					spin_unlock_irqrestore(
> +							      &target_list_lock,
> +							      flags);
>  					__netpoll_cleanup(&nt->np);
> +					spin_lock_irqsave(&target_list_lock,
> +							  flags);
>  					dev_put(nt->np.dev);
>  					nt->np.dev = NULL;
> +					netconsole_target_put(nt);
> +					goto restart;

If you drop the lock here, another cpu can put the device and set it
to NULL.

In which case you'll double release when you regrab the lock here.

Too bad you can't NULL out the device before dropping the lock,
because that way would be able to ensure you'd be the only releaser.

But that won't work because __netpoll_cleanup() wants the device
pointer to be set.

Hmmm...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] netconsole: fix deadlock when removing net driver that netconsole is using (v2)
  2011-04-22 18:40   ` David Miller
@ 2011-04-22 21:31     ` Neil Horman
  2011-04-22 21:34       ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Neil Horman @ 2011-04-22 21:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Fri, Apr 22, 2011 at 11:40:33AM -0700, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Fri, 22 Apr 2011 14:10:59 -0400
> 
> > @@ -683,9 +684,16 @@ static int netconsole_netdev_event(struct notifier_block *this,
> >  				 * rtnl_lock already held
> >  				 */
> >  				if (nt->np.dev) {
> > +					spin_unlock_irqrestore(
> > +							      &target_list_lock,
> > +							      flags);
> >  					__netpoll_cleanup(&nt->np);
> > +					spin_lock_irqsave(&target_list_lock,
> > +							  flags);
> >  					dev_put(nt->np.dev);
> >  					nt->np.dev = NULL;
> > +					netconsole_target_put(nt);
> > +					goto restart;
> 
> If you drop the lock here, another cpu can put the device and set it
> to NULL.
> 
> In which case you'll double release when you regrab the lock here.
> 
> Too bad you can't NULL out the device before dropping the lock,
> because that way would be able to ensure you'd be the only releaser.
> 
> But that won't work because __netpoll_cleanup() wants the device
> pointer to be set.
> 
> Hmmm...
> 
I understand what you're saying here, but I think we're ok in this particular
case.  I say that because all other callers of __netpoll_cleanup, call it via
netpoll_cleanup, which does the dev_put under protection of the rtnl_lock.  This
call is also under the rtnl_lock protection, its just taken when the event
notification is made (thats why we call __netpoll_cleanup instead of
netpoll_cleanup).  The target_list_lock just protects the integrity of the
netconsole_target list.  If someone disables a netconsole via configfs, they'll
block on the rtnl_lock.  Since no path through configfs takes the
target_list_lock and rtnl (via netpoll_cleanup) in any nested fashion, we're
safe from deadlock.

Best
Neil


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] netconsole: fix deadlock when removing net driver that netconsole is using (v2)
  2011-04-22 21:31     ` Neil Horman
@ 2011-04-22 21:34       ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2011-04-22 21:34 UTC (permalink / raw)
  To: nhorman; +Cc: netdev

From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri, 22 Apr 2011 17:31:45 -0400

> I understand what you're saying here, but I think we're ok in this particular
> case.  I say that because all other callers of __netpoll_cleanup, call it via
> netpoll_cleanup, which does the dev_put under protection of the rtnl_lock.  This
> call is also under the rtnl_lock protection, its just taken when the event
> notification is made (thats why we call __netpoll_cleanup instead of
> netpoll_cleanup).  The target_list_lock just protects the integrity of the
> netconsole_target list.  If someone disables a netconsole via configfs, they'll
> block on the rtnl_lock.  Since no path through configfs takes the
> target_list_lock and rtnl (via netpoll_cleanup) in any nested fashion, we're
> safe from deadlock.

That's right, rtnl saves us.

Thanks for explaining, applied, thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-04-22 21:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-22 14:06 [PATCH] netconsole: prevent deadlock when removing net driver that netconsole is using Neil Horman
2011-04-22 14:44 ` Neil Horman
2011-04-22 18:10 ` [PATCH] netconsole: fix deadlock when removing net driver that netconsole is using (v2) Neil Horman
2011-04-22 18:40   ` David Miller
2011-04-22 21:31     ` Neil Horman
2011-04-22 21:34       ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).