From: Michal Ostrowski <mostrows@watson.ibm.com>
To: Stefan Rompf <srompf@isg.de>
Cc: netdev@oss.sgi.com
Subject: Re: Deadlock in sungem/ip_auto_config/linkwatch
Date: Mon, 05 Jan 2004 11:19:25 -0500
Message-ID: <1073319565.2043.98923.camel@brick.watson.ibm.com>
In-Reply-To: <200401051550.51063.srompf@isg.de>

This can get pretty hairy.

Suppose the linkwatch code backs off because rtnl_sem is legitimately
held by thread A.  Meanwhile, thread B calls flush_scheduled_work() in
order to wait for pending linkwatch events to complete.

With the proposed solution this results in incorrect behaviour:
flush_scheduled_work() returns even though the linkwatch work has not
really been done.  (Admittedly, I'm not sure whether such a scenario is
actually feasible.)
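
A toy user-space model may make this concern concrete.  The sketch below
is Python, not kernel code, and everything in it is an assumption made
for illustration: WorkQueue, TryLock and demo() are hypothetical names,
the "threads" are simulated sequentially, and the flush is assumed to
cover only the items that were queued when it started.

```python
from collections import deque


class WorkQueue:
    """Toy single-threaded stand-in for the shared work queue."""

    def __init__(self):
        self.pending = deque()

    def schedule_work(self, fn):
        self.pending.append(fn)

    def flush_scheduled_work(self):
        # Modelling assumption: a flush only covers items that were
        # already queued when it started; re-queued items are left over.
        for _ in range(len(self.pending)):
            self.pending.popleft()()


class TryLock:
    """Non-blocking lock standing in for rtnl_sem (hypothetical model)."""

    def __init__(self):
        self.held = False

    def try_acquire(self):
        if self.held:
            return False
        self.held = True
        return True

    def release(self):
        self.held = False


def demo():
    wq = WorkQueue()
    rtnl = TryLock()
    state = {"linkwatch_done": False}

    def linkwatch_event():
        # Back-off variant: if the lock is busy, re-queue and return.
        if not rtnl.try_acquire():
            wq.schedule_work(linkwatch_event)
            return
        try:
            state["linkwatch_done"] = True
        finally:
            rtnl.release()

    wq.schedule_work(linkwatch_event)

    rtnl.try_acquire()             # "thread A" legitimately holds the lock
    wq.flush_scheduled_work()      # "thread B" flushes while A holds it
    done_after_flush = state["linkwatch_done"]

    rtnl.release()                 # A lets go; a later pass does the work
    wq.flush_scheduled_work()
    return done_after_flush, state["linkwatch_done"]
```

In this model the first flush returns while the linkwatch work is still
outstanding; only the second pass, after the lock is released, completes
it.  Whether the real work queue behaves this way in the scenario above
is exactly the open question.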

My initial thought was to use a separate work queue, un-entangled from
the global queue that flush_scheduled_work() operates on.  This would
allow linkwatch events to be synchronized against explicitly.  With this
solution, though, I think it would be nice not to need a thread per CPU
for the linkwatch work queue.
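
As a rough illustration of the separate-queue idea, here is another toy
Python model (again not kernel code).  The names WorkQueue, global_wq,
linkwatch_wq and demo() are hypothetical, and the queues are drained
sequentially rather than by worker threads.

```python
from collections import deque


class WorkQueue:
    """Toy queue; flush() runs everything pending and returns the count."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()

    def queue_work(self, fn):
        self.pending.append(fn)

    def flush(self):
        ran = 0
        while self.pending:
            self.pending.popleft()()
            ran += 1
        return ran


def demo():
    global_wq = WorkQueue("global")        # shared queue (model)
    linkwatch_wq = WorkQueue("linkwatch")  # dedicated queue (model)
    rtnl = {"held": False}
    log = []

    def other_work():
        log.append("other")

    def linkwatch_event():
        # In this model, running while the lock is held elsewhere
        # represents the deadlock, so treat it as a hard error.
        assert not rtnl["held"], "would block on rtnl_sem"
        log.append("linkwatch")

    global_wq.queue_work(other_work)
    linkwatch_wq.queue_work(linkwatch_event)

    rtnl["held"] = True                    # caller holds the lock
    global_wq.flush()                      # safe: no linkwatch work here
    still_pending = len(linkwatch_wq.pending)
    rtnl["held"] = False                   # lock released

    linkwatch_wq.flush()                   # explicit linkwatch sync point
    return log, still_pending
```

Flushing the shared queue no longer drags linkwatch work along with it,
and linkwatch gets its own explicit synchronization point.  The model
says nothing about the per-CPU thread cost mentioned above.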

On the other hand, ic_open_devs appears to be the only place where
rtnl_sem is held while going into a driver's open() function, and so
maybe the right rule is that rtnl_sem is not held when calling
dev->open().


-- 
Michal Ostrowski <mostrows@watson.ibm.com>


On Mon, 2004-01-05 at 09:50, Stefan Rompf wrote:
> On Monday, 05 January 2004 14:07, Michal Ostrowski wrote:
> 
> > ic_open_devs grabs rtnl_sem with an rtnl_shlock() call.
> >
> > The sungem driver at some point calls gem_init_one, which calls
> > netif_carrier_*, which in turn calls schedule_work (linkwatch_event).
> >
> > linkwatch_event in turn needs rtnl_sem.
> 
> Good catch! The sungem driver shows clearly that we need some way to remove
> queued work without scheduling and waiting for other events.
>
> I will change the linkwatch code to use rtnl_shlock_nowait() and back off and
> retry in case of failure this week. Call it a workaround, but it increases
> overall system stability.
>
> Btw, what is the planned difference between rtnl_shlock() and rtnl_exlock()?
> Even though the latter is a null operation right now, I don't want to hold
> more locks than needed in the linkwatch code.
> 
> Stefan

