public inbox for linux-kernel@vger.kernel.org
* calling flush_scheduled_work()
@ 2004-03-12 20:58 Tim Hockin
  2004-03-12 22:00 ` Trond Myklebust
  2004-03-12 23:27 ` Andrew Morton
  0 siblings, 2 replies; 11+ messages in thread
From: Tim Hockin @ 2004-03-12 20:58 UTC (permalink / raw)
  To: Linux Kernel mailing list

We've recently bumped into an issue, and I'm not sure which is the real bug.

In short we have a case where mntput() is called from the keventd workqueue.
When that mntput() hit an NFS mount, we got a deadlock.  It turns out that
deep in the RPC code, someone calls flush_scheduled_work().  Deadlock.

So what is the real bug?

Is it verboten to call mntput() from keventd?  What other things might lead
to a flush_scheduled_work() and must therefore be avoided?

Should callers of flush_scheduled_work() be changed to use private
workqueues?  There are 31 calls that I got from grep.  25 are in drivers/, 1
in ncpfs, 3 in nfs4, 2 in sunrpc.  The drivers/ are *probably* ok. Should
those other 6 be changed?
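
To illustrate the private-workqueue option, here is a minimal user-space sketch in Python, not kernel code. Two single-threaded executors stand in for keventd and for a hypothetical private queue; the names `mntput_work`, `rpc_cleanup`, and `flush_private_from_shared` are invented for the model. It only shows that a work item on the shared queue can wait for work on a different queue, because a different thread services it.

```python
from concurrent.futures import ThreadPoolExecutor


def flush_private_from_shared():
    """Model: a shared-queue work item waits on a private queue."""
    log = []
    # One worker thread each, like a single-threaded workqueue.
    with ThreadPoolExecutor(max_workers=1) as shared, \
         ThreadPoolExecutor(max_workers=1) as private:

        def rpc_cleanup():
            # Hypothetical work that the RPC layer wants flushed.
            log.append("rpc cleanup ran")

        def mntput_work():
            # Runs on the shared queue's thread. Waiting here is safe
            # because the private queue has its own worker thread.
            private.submit(rpc_cleanup).result()
            log.append("mntput work finished")

        shared.submit(mntput_work).result()
    return log
```

The model says nothing about the cost of an extra kernel thread per private queue, which is part of the trade-off being asked about.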

Either way, it seems like there should maybe be a check and a badness
warning if flush_workqueue is called from that workqueue.
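
As a rough illustration of such a check, the following is a user-space Python model, not the kernel implementation. The `WorkQueue` class, its `flush()` method, and `SelfFlushError` are hypothetical names for this sketch. The model refuses a flush issued from the queue's own worker thread, which is the case that would otherwise wait on itself forever.

```python
import queue
import threading


class SelfFlushError(RuntimeError):
    """Raised when flush() is called from the queue's own worker."""


class WorkQueue:
    """Toy single-threaded work queue (hypothetical, not the kernel API)."""

    def __init__(self):
        self.last_error = None
        self._items = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            func = self._items.get()
            try:
                func()
            except Exception as exc:
                # Keep the worker alive; remember the failure.
                self.last_error = exc
            finally:
                self._items.task_done()

    def schedule_work(self, func):
        self._items.put(func)

    def flush(self):
        # The item the worker is running is still outstanding, so a
        # join() from the worker itself could never return.
        if threading.current_thread() is self._worker:
            raise SelfFlushError("flush() called from its own worker")
        self._items.join()


def demo():
    """Return (normal flush result, whether the self-flush was caught)."""
    wq = WorkQueue()
    results, caught = [], []

    def nested_flush():
        try:
            wq.flush()
        except SelfFlushError:
            caught.append(True)

    wq.schedule_work(lambda: results.append(1))
    wq.schedule_work(nested_flush)
    wq.flush()
    return results, caught
```

In this model the check turns a silent hang into an immediate error; a kernel equivalent would presumably print a warning rather than raise.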

Which avenue should we follow?  Our own problem can be fixed differently,
but I didn't want to just ignore this unstated assumption that it is safe to
call flush_scheduled_work() anywhere.

Tim

-- 
Tim Hockin
Sun Microsystems, Linux Software Engineering
thockin@sun.com
All opinions are my own, not Sun's

* Re: calling flush_scheduled_work()
@ 2004-03-13 11:45 Stefan Rompf
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Rompf @ 2004-03-13 11:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton

Andrew Morton wrote:

>> In short we have a case where mntput() is called from the keventd
>> workqueue.  When that mntput() hit an NFS mount, we got a deadlock.
>> It turns out that deep in the RPC code, someone calls
>> flush_scheduled_work().  Deadlock.
> 
> Seems simple enough to fix the workqueue code to handle this situation.

Code fixing one corner case won't help. Some time ago, there was a
deadlock between a network driver that called flush_scheduled_work() while 
the kernel held the rtnl semaphore and work scheduled by the linkwatch code 
that needs rtnl.

I had posted a patch that changed linkwatch not to block waiting for rtnl, 
however it was dropped in favor of fixing the driver (I don't own that card, 
so I can't tell you if it works by now).

However, this is another example of the problem: Any code can
schedule_work(), and any other code can wait in any place for this work to
complete. As long as we don't have some known consensus on what functions that
run inside the keventd workqueue may (not) do, and when it is ok to call
flush_scheduled_work(), we are always at risk that the workqueue mechanism
creates a deadlock by accident.
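
The lock-versus-flush pattern described above can be modelled in user space. This is a hedged Python sketch, not kernel code: `rtnl` is an ordinary lock standing in for the rtnl semaphore, a single-threaded executor stands in for keventd, and `linkwatch_work` and `run_scenario` are names invented for the model. The work item uses a timeout so the model reports the stall instead of hanging.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_scenario(flush_while_locked):
    """Return True if the work item could take the lock, else False."""
    rtnl = threading.Lock()   # stand-in for the rtnl semaphore
    outcome = []

    def linkwatch_work():
        # A real work item would block indefinitely; the timeout lets
        # the model record the stall and move on.
        got = rtnl.acquire(timeout=0.5)
        outcome.append(got)
        if got:
            rtnl.release()

    with ThreadPoolExecutor(max_workers=1) as keventd:
        if flush_while_locked:
            with rtnl:
                # "Flush" while holding the lock that the work needs.
                keventd.submit(linkwatch_work).result()
        else:
            with rtnl:
                pending = keventd.submit(linkwatch_work)
            # Lock dropped first, then wait for the work to finish.
            pending.result()
    return outcome[0]
```

With the timeout removed, the first scenario would never finish: the flusher waits for the work item, and the work item waits for the lock the flusher holds.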

Stefan


Thread overview: 11+ messages
2004-03-12 20:58 calling flush_scheduled_work() Tim Hockin
2004-03-12 22:00 ` Trond Myklebust
2004-03-12 22:38   ` Tim Hockin
2004-03-12 23:19     ` Trond Myklebust
2004-03-12 23:27       ` Tim Hockin
2004-03-12 23:27 ` Andrew Morton
2004-03-13  0:20   ` Tim Hockin
2004-03-13  1:17   ` Trond Myklebust
2004-03-13  2:12     ` Andrew Morton
2004-03-13  2:24       ` Trond Myklebust
  -- strict thread matches above, loose matches on Subject: below --
2004-03-13 11:45 Stefan Rompf
