calling flush_scheduled

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* calling flush_scheduled_work()
@ 2004-03-12 20:58 Tim Hockin
  2004-03-12 22:00 ` Trond Myklebust
  2004-03-12 23:27 ` Andrew Morton
  0 siblings, 2 replies; 11+ messages in thread
From: Tim Hockin @ 2004-03-12 20:58 UTC (permalink / raw)
  To: Linux Kernel mailing list

We've recently bumped into an issue, and I'm not sure which is the real bug.

In short we have a case where mntput() is called from the kevetd workqueue.
When that mntput() hit an NFS mount, we got a deadlock.  It turns out that
deep in the RPC code, someone calls flush_scheduled_work().  Deadlock.

So what is the real bug?

Is it verboten to call mntput() from keventd?  What other things might lead
to a flush_scheduled_work() and must therefore be avoided?

Should callers of flush_scheduled_work() be changed to use private
workqueues?  There are 31 calls that I got from grep.  25 are in drivers/, 1
in ncpfs, 3 in nfs4, 2 in sunrpc.  The drivers/ are *probably* ok. Should
those other 6 be changed?

Either way, it seems like there should maybe be a check and a badness
warning if flush_workqueue is called from that workqueue.

Which avenue should we follow?  Our own problem can be fixed differently,
but I didn't want to just ignore this unstated assumption that it is safe to
call flush_scheduled_work() anywhere.

Tim

-- 
Tim Hockin
Sun Microsystems, Linux Software Engineering
thockin@sun.com
All opinions are my own, not Sun's

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: calling flush_scheduled_work()
  2004-03-12 20:58 calling flush_scheduled_work() Tim Hockin
@ 2004-03-12 22:00 ` Trond Myklebust
  2004-03-12 22:38   ` Tim Hockin
  2004-03-12 23:27 ` Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2004-03-12 22:00 UTC (permalink / raw)
  To: Tim Hockin; +Cc: Linux Kernel mailing list

På fr , 12/03/2004 klokka 15:58, skreiv Tim Hockin:
> We've recently bumped into an issue, and I'm not sure which is the real bug.
> 
> In short we have a case where mntput() is called from the kevetd workqueue.
> When that mntput() hit an NFS mount, we got a deadlock.  It turns out that
> deep in the RPC code, someone calls flush_scheduled_work().  Deadlock.
> 
> So what is the real bug?
> 
> Is it verboten to call mntput() from keventd?  What other things might lead
> to a flush_scheduled_work() and must therefore be avoided?
> 
> Should callers of flush_scheduled_work() be changed to use private
> workqueues?  There are 31 calls that I got from grep.  25 are in drivers/, 1
> in ncpfs, 3 in nfs4, 2 in sunrpc.  The drivers/ are *probably* ok. Should
> those other 6 be changed?

It would be dead easy to change RPC+NFS to use their own, workqueue. In
fact I've already considered that several times in the places where the
NFSv4 state stuff gets hairy. ... but it might be nice not to have to
set up all these (mostly) idle threads for exclusive use by NFS just in
order to catch a corner case.

Could we therefore perhaps consider setting up one or more global
workqueues that would be alternatives to keventd?

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: calling flush_scheduled_work()
  2004-03-12 22:00 ` Trond Myklebust
@ 2004-03-12 22:38   ` Tim Hockin
  2004-03-12 23:19     ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Tim Hockin @ 2004-03-12 22:38 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Tim Hockin, Linux Kernel mailing list

On Fri, Mar 12, 2004 at 05:00:48PM -0500, Trond Myklebust wrote:
> It would be dead easy to change RPC+NFS to use their own, workqueue. In
> fact I've already considered that several times in the places where the
> NFSv4 state stuff gets hairy. ... but it might be nice not to have to
> set up all these (mostly) idle threads for exclusive use by NFS just in
> order to catch a corner case.
> 
> Could we therefore perhaps consider setting up one or more global
> workqueues that would be alternatives to keventd?

That just divides up the problem, but doesn't solve it.  The simplest would
be to print a badness and somehow get out of the flush.  Then anyone who is
triggering the corner case (us, in this case) can just solve it ourselves.
It's not pretty.

In short, it's dubious that ANYONE call flush_scheduled_work() on a
workqueue that they don't own.

Tim

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: calling flush_scheduled_work()
  2004-03-12 22:38   ` Tim Hockin
@ 2004-03-12 23:19     ` Trond Myklebust
  2004-03-12 23:27       ` Tim Hockin
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2004-03-12 23:19 UTC (permalink / raw)
  To: Tim Hockin; +Cc: Tim Hockin, Linux Kernel mailing list

På fr , 12/03/2004 klokka 17:38, skreiv Tim Hockin:
> That just divides up the problem, but doesn't solve it.  The simplest would
> be to print a badness and somehow get out of the flush.  Then anyone who is
> triggering the corner case (us, in this case) can just solve it ourselves.
> It's not pretty.

Ewww.... That's saying "I'm just going to ignore the problem and pretend
I can continue". At least the hang will tell you that there *is* a
problem and points to where it is happening.

If I understood your description of where it is happening, then in this
case, continuing is likely to lead to an Oops later when all those open
RPC up/downcall pipes find that their parent directory has disappeared
from underneath them.

> In short, it's dubious that ANYONE call flush_scheduled_work() on a
> workqueue that they don't own.

Huh? You're saying that just because work has been scheduled on some
third party thread, you should not be allowed to wait on completion of
that work? That is a clearly unreasonable statement... Imagine doing
even memory allocation in such an environment...

I agree that we need to identify and solve the deadlocking problems, but
let's not start constricting developers with completely arbitrary rules.

The *real* problem here is that mput() is being run from keventd, and is
triggering code that was assumed to be running from an ordinary process
context. Let's concentrate on fixing that...

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: calling flush_scheduled_work()
  2004-03-12 23:19     ` Trond Myklebust
@ 2004-03-12 23:27       ` Tim Hockin
  0 siblings, 0 replies; 11+ messages in thread
From: Tim Hockin @ 2004-03-12 23:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Tim Hockin, Linux Kernel mailing list

On Fri, Mar 12, 2004 at 06:19:14PM -0500, Trond Myklebust wrote:
> Ewww.... That's saying "I'm just going to ignore the problem and pretend
> I can continue". At least the hang will tell you that there *is* a
> problem and points to where it is happening.

Well, it needs to be non-silent.  Whether that is a BUG() or a badness or a
panic..

> > In short, it's dubious that ANYONE call flush_scheduled_work() on a
> > workqueue that they don't own.
> 
> Huh? You're saying that just because work has been scheduled on some
> third party thread, you should not be allowed to wait on completion of
> that work? That is a clearly unreasonable statement... Imagine doing
> even memory allocation in such an environment...

Waiting for completion of your work is one thing.  But you can't know what
else has been scheduled.  In this case RPC doesn't know that our work is
calling it.  By waiting for ALL work, it is assuming (silently) that it will
never be called from a keventd.

Either that assumption is true, in which case there needs to be a BUG and a
way out, or that assumption is false.  I'd like to believe that the
assumption is false.

> The *real* problem here is that mput() is being run from keventd, and is
> triggering code that was assumed to be running from an ordinary process
> context. Let's concentrate on fixing that...

Well, we did fix it in a different way.  I just wanted to bring this up as
something that warrants at least a BUG().  But a BUG() and deadlock is
almost a panic().

So if mntput() just CAN NOT be called from keventd, that is a fine answer,
but something should be written about that.  And all the other things that
can not be called from keventd.

-- 
Tim Hockin
Sun Microsystems, Linux Software Engineering
thockin@sun.com
All opinions are my own, not Sun's

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: calling flush_scheduled_work()
  2004-03-12 20:58 calling flush_scheduled_work() Tim Hockin
  2004-03-12 22:00 ` Trond Myklebust
@ 2004-03-12 23:27 ` Andrew Morton
  2004-03-13  0:20   ` Tim Hockin
  2004-03-13  1:17   ` Trond Myklebust
  1 sibling, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2004-03-12 23:27 UTC (permalink / raw)
  To: thockin; +Cc: linux-kernel

Tim Hockin <thockin@sun.com> wrote:
>
> We've recently bumped into an issue, and I'm not sure which is the real bug.
> 
> In short we have a case where mntput() is called from the kevetd workqueue.
> When that mntput() hit an NFS mount, we got a deadlock.  It turns out that
> deep in the RPC code, someone calls flush_scheduled_work().  Deadlock.

Seems simple enough to fix the workqueue code to handle this situation.

Wanna test this for me?

---

 25-akpm/kernel/workqueue.c |    8 ++++++++
 1 files changed, 8 insertions(+)

diff -puN kernel/workqueue.c~flush_scheduled_work-deadlock-fix kernel/workqueue.c
--- 25/kernel/workqueue.c~flush_scheduled_work-deadlock-fix	Fri Mar 12 15:24:29 2004
+++ 25-akpm/kernel/workqueue.c	Fri Mar 12 15:25:46 2004
@@ -229,6 +229,14 @@ void fastcall flush_workqueue(struct wor
 			continue;
 		cwq = wq->cpu_wq + cpu;
 
+		if (cwq->thread == current) {
+			/*
+			 * Probably keventd trying to flush its own queue.
+			 * So just run it by hand rather than deadlocking
+			 */
+			run_workqueue(cwq);
+			continue;
+		}
 		spin_lock_irq(&cwq->lock);
 		sequence_needed = cwq->insert_sequence;
 

_


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: calling flush_scheduled_work()
  2004-03-12 23:27 ` Andrew Morton
@ 2004-03-13  0:20   ` Tim Hockin
  2004-03-13  1:17   ` Trond Myklebust
  1 sibling, 0 replies; 11+ messages in thread
From: Tim Hockin @ 2004-03-13  0:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Fri, Mar 12, 2004 at 03:27:47PM -0800, Andrew Morton wrote:
> Seems simple enough to fix the workqueue code to handle this situation.

Indeed.  I should have done that myself.  Thanks.

> Wanna test this for me?

Works for me.

-- 
Tim Hockin
Sun Microsystems, Linux Software Engineering
thockin@sun.com
All opinions are my own, not Sun's

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: calling flush_scheduled_work()
  2004-03-12 23:27 ` Andrew Morton
  2004-03-13  0:20   ` Tim Hockin
@ 2004-03-13  1:17   ` Trond Myklebust
  2004-03-13  2:12     ` Andrew Morton
  1 sibling, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2004-03-13  1:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Tim Hockin, linux-kernel

På fr , 12/03/2004 klokka 18:27, skreiv Andrew Morton:
> Seems simple enough to fix the workqueue code to handle this situation.
> 
> Wanna test this for me?

What if one of the workloads on the queue has another call to
flush_scheduled_work()? (which again finds itself calling another
instance, ...)
This fix doesn't really resolve the deadlock issue. It just reduces the
possibilities of it having any consequences...

Could you please at least include a dump_stack() before the call to
run_workqueue()) so that we can catch the bug in the act, and attempt to
fix it.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: calling flush_scheduled_work()
  2004-03-13  1:17   ` Trond Myklebust
@ 2004-03-13  2:12     ` Andrew Morton
  2004-03-13  2:24       ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2004-03-13  2:12 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: thockin, linux-kernel

Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
>
> På fr , 12/03/2004 klokka 18:27, skreiv Andrew Morton:
> > Seems simple enough to fix the workqueue code to handle this situation.
> > 
> > Wanna test this for me?
> 
> What if one of the workloads on the queue has another call to
> flush_scheduled_work()? (which again finds itself calling another
> instance, ...)

I fixed that with the old solution-via-changelog:


  Because keventd is a resource which is shared between unrelated parts
  of the kernel it is possible for one person's workqueue handler to
  accidentally call another person's flush_scheduled_work().  thockin
  managed it by calling mntput() from a workqueue handler.  It deadlocks.

  It's simple enough to fix: teach flush_scheduled_work() to go direct
  when it discovers that the calling thread is the one which should be
  running the work.

  Note that this can cause recursion.  The depth of that recursion is
  equal to the number of currently-queued works which themselves want to
  call flush_scheduled_work().  If this ever exceeds three I'll eat my hat.


Really, it's so unlikely it's hardly worth bothering about.

> This fix doesn't really resolve the deadlock issue. It just reduces the
> possibilities of it having any consequences...
> 
> Could you please at least include a dump_stack() before the call to
> run_workqueue()) so that we can catch the bug in the act, and attempt to
> fix it.

How's this?



Add a debug check for workqueues nested more than three deep via the
direct-run-workqueue() path.


---

 25-akpm/kernel/workqueue.c |   10 ++++++++++
 1 files changed, 10 insertions(+)

diff -puN kernel/workqueue.c~flush_scheduled_work-recursion-detect kernel/workqueue.c
--- 25/kernel/workqueue.c~flush_scheduled_work-recursion-detect	2004-03-12 17:54:32.859424968 -0800
+++ 25-akpm/kernel/workqueue.c	2004-03-12 18:11:09.939845704 -0800
@@ -47,6 +47,7 @@ struct cpu_workqueue_struct {
 	struct workqueue_struct *wq;
 	task_t *thread;
 
+	int run_depth;		/* Detect run_workqueue() recursion depth */
 } ____cacheline_aligned;
 
 /*
@@ -140,6 +141,13 @@ static inline void run_workqueue(struct 
 	 * done.
 	 */
 	spin_lock_irqsave(&cwq->lock, flags);
+	cwq->run_depth++;
+	if (cwq->run_depth > 3) {
+		/* morton gets to eat his hat */
+		printk("%s: recusrion depth exceeded: %d\n",
+			__FUNCTION__, cwq->run_depth);
+		dump_stack();
+	}
 	while (!list_empty(&cwq->worklist)) {
 		struct work_struct *work = list_entry(cwq->worklist.next,
 						struct work_struct, entry);
@@ -157,6 +165,7 @@ static inline void run_workqueue(struct 
 		cwq->remove_sequence++;
 		wake_up(&cwq->work_done);
 	}
+	cwq->run_depth--;
 	spin_unlock_irqrestore(&cwq->lock, flags);
 }
 
@@ -286,6 +295,7 @@ struct workqueue_struct *create_workqueu
 	wq = kmalloc(sizeof(*wq), GFP_KERNEL);
 	if (!wq)
 		return NULL;
+	memset(wq, 0, sizeof(*wq));
 
 	for (cpu = 0; cpu < NR_CPUS; cpu++) {
 		if (!cpu_online(cpu))

_


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: calling flush_scheduled_work()
  2004-03-13  2:12     ` Andrew Morton
@ 2004-03-13  2:24       ` Trond Myklebust
  0 siblings, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2004-03-13  2:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Tim Hockin, linux-kernel

På fr , 12/03/2004 klokka 21:12, skreiv Andrew Morton:
> > Could you please at least include a dump_stack() before the call to
> > run_workqueue()) so that we can catch the bug in the act, and attempt to
> > fix it.
> 
> How's this?

Excellent (aside from the spelling error)... Just for the record:
chocolate hats will not be acceptable as a forfeit. 8-)

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: calling flush_scheduled_work()
@ 2004-03-13 11:45 Stefan Rompf
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Rompf @ 2004-03-13 11:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton

Andrew Morton wrote:

>> In short we have a case where mntput() is called from the kevetd
>> workqueue.
>> When that mntput() hit an NFS mount, we got a deadlock.  It turns out
>> that
>> deep in the RPC code, someone calls flush_scheduled_work().  Deadlock.
> 
> Seems simple enough to fix the workqueue code to handle this situation.

Code fixing one corner case won't help. Some time ago, there has been a 
deadlock between a network driver that called flush_scheduled_work() while 
the kernel held the rtnl semaphore and work scheduled by the linkwatch code 
that needs rtnl.

I had posted a patch that changed linkwatch not to block waiting for rtnl, 
however it was dropped in favor of fixing the driver (I don't own that card, 
so I can't tell you if it works by now)

However, this is another example for the problem: Any code can 
schedule_work(), any other code can wait in any place for this work to 
complete. As long as we don't have some known consent on what functions that 
runs inside the keventd workqueue may (not) do, and when it is ok to call 
flush_scheduled_work(), we are always at risk that the workqueue mechanism 
creates a deadlock by accident.

Stefan

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2004-03-13 11:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-03-12 20:58 calling flush_scheduled_work() Tim Hockin
2004-03-12 22:00 ` Trond Myklebust
2004-03-12 22:38   ` Tim Hockin
2004-03-12 23:19     ` Trond Myklebust
2004-03-12 23:27       ` Tim Hockin
2004-03-12 23:27 ` Andrew Morton
2004-03-13  0:20   ` Tim Hockin
2004-03-13  1:17   ` Trond Myklebust
2004-03-13  2:12     ` Andrew Morton
2004-03-13  2:24       ` Trond Myklebust
  -- strict thread matches above, loose matches on Subject: below --
2004-03-13 11:45 Stefan Rompf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox