All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roland Dreier <roland@topspin.com>
To: "Shawn Starr" <spstarr@sh0n.net>
Cc: "'Andrew Morton'" <akpm@digeo.com>, <rml@tech9.net>,
	<rmk@arm.linux.org.uk>, <linux-kernel@vger.kernel.org>
Subject: Re: [BUG][2.5.66bk9+] - tty hangings - patches, dmesg & sysctl+T info
Date: 08 Apr 2003 21:16:33 -0700	[thread overview]
Message-ID: <524r58nw4e.fsf@topspin.com> (raw)
In-Reply-To: <003001c2fe3d$6eab1080$030aa8c0@unknown>

I am completely convinced that the patch you are using (which I
believe is the same as the one Andrew calls
"tty-shutdown-race-fix.patch") is the problem.  What happens is that
release_dev() in tty_io.c calls cancel_delayed_work(), which calls
del_timer_sync() without decrementing nr_queued for keventd_wq.

When flush_scheduled_work() gets called it sleeps on the work_done
waitqueue.  The only place work_done gets woken up is in
run_workqueue, and it only happens if
atomic_dec_and_test(&cwq->nr_queued) decrements nr_queued to 0.
But after calling cancel_delayed_work(), that can never happen (we
deleted the timer that was going to add the work that we're waiting
for).

It seems to me that the implementation of cancel_delayed_work() is
not quite right.  We need to decrement nr_queued if we actually
stopped the work from being added to the workqueue.

Andrew, I've never seen a reply from you about this, can you tell me
if I'm missing something here?

By the way, I assume that the process below is the one that's hung:

    bash          D C04CDC68 4233453816  8524      1                8387 (L-TLB)
    Call Trace:
     [<c010cd75>] do_IRQ+0x235/0x370
     [<c01394a5>] flush_workqueue+0x305/0x450
     [<c010ac18>] common_interrupt+0x18/0x20
     [<c011de30>] default_wake_function+0x0/0x20
     [<c011de30>] default_wake_function+0x0/0x20
     [<c0257a44>] release_dev+0x6a4/0x860
     [<c01566ab>] zap_pmd_range+0x4b/0x70
     [<c0258204>] tty_release+0x94/0x1b0
     [<c016dd7c>] __fput+0xac/0x100
     [<c0258170>] tty_release+0x0/0x1b0

It would seem to be stuck in the flush_workqueue() called from
release_dev(), just as I would expect.

Shawn, can you try the patch below instead of Andrew's ttyfix2?

 - Roland

===== drivers/char/tty_io.c 1.72 vs edited =====
--- 1.72/drivers/char/tty_io.c	Thu Apr  3 10:20:22 2003
+++ edited/drivers/char/tty_io.c	Tue Apr  8 20:23:44 2003
@@ -1286,8 +1286,15 @@
 	}
 	
 	/*
-	 * Make sure that the tty's task queue isn't activated. 
+	 * Prevent flush_to_ldisc() from rescheduling the work for later.  Then
+	 * kill any delayed work.
 	 */
+	clear_bit(TTY_DONT_FLIP, &tty->flags);
+	cancel_delayed_work(&tty->flip.work);
+
+	/*
+	 * Wait for ->hangup_work and ->flip.work handlers to terminate
+ 	 */
 	flush_scheduled_work();
 
 	/* 
===== include/linux/workqueue.h 1.4 vs edited =====
--- 1.4/include/linux/workqueue.h	Mon Nov  4 13:12:06 2002
+++ edited/include/linux/workqueue.h	Tue Apr  8 20:42:41 2003
@@ -63,5 +63,12 @@
 
 extern void init_workqueues(void);
 
+/*
+ * Kill off a pending schedule_delayed_work().  Note that the work callback
+ * function may still be running on return from cancel_delayed_work().  Run
+ * flush_scheduled_work() to wait on it.
+ */
+extern int cancel_delayed_work(struct work_struct *work);
+
 #endif
 
===== kernel/workqueue.c 1.6 vs edited =====
--- 1.6/kernel/workqueue.c	Tue Feb 11 14:57:54 2003
+++ edited/kernel/workqueue.c	Tue Apr  8 20:27:50 2003
@@ -125,6 +125,24 @@
 	return ret;
 }
 
+int cancel_delayed_work(struct work_struct *work) {
+	struct cpu_workqueue_struct *cwq = work->wq_data;
+	int ret;
+
+	ret = del_timer_sync(&work->timer);
+	if (ret) {
+		/*
+		 * Wake up 'work done' waiters (flush) if we just
+		 * removed the last thing on the workqueue.
+		 */
+		if (atomic_dec_and_test(&cwq->nr_queued))
+			wake_up(&cwq->work_done);
+
+	}
+
+	return ret;
+}
+
 static inline void run_workqueue(struct cpu_workqueue_struct *cwq)
 {
 	unsigned long flags;
@@ -378,5 +396,5 @@
 
 EXPORT_SYMBOL(schedule_work);
 EXPORT_SYMBOL(schedule_delayed_work);
+EXPORT_SYMBOL(cancel_delayed_work);
 EXPORT_SYMBOL(flush_scheduled_work);
-

  parent reply	other threads:[~2003-04-09  4:05 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-04-06 20:09 [BUG][2.5.66bk9+] - changes to timers still broken - we don't oops anymore Shawn Starr
2003-04-06 20:20 ` Roland Dreier
2003-04-06 20:38 ` Andrew Morton
2003-04-06 21:00   ` Shawn Starr
2003-04-09  2:12   ` [BUG][2.5.66bk9+] - tty hangings - patches, dmesg & sysctl+T info Shawn Starr
2003-04-09  4:12     ` Andrew Morton
2003-04-09  4:47       ` Roland Dreier
2003-04-09  4:56         ` Andrew Morton
2003-04-09  4:16     ` Roland Dreier [this message]
2003-04-09  4:27       ` Andrew Morton
2003-04-09  4:52         ` Roland Dreier
2003-04-09  5:01           ` Shawn Starr
  -- strict thread matches above, loose matches on Subject: below --
2003-04-12 16:08 Shawn Starr

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=524r58nw4e.fsf@topspin.com \
    --to=roland@topspin.com \
    --cc=akpm@digeo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rmk@arm.linux.org.uk \
    --cc=rml@tech9.net \
    --cc=spstarr@sh0n.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.