public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* 2.6.22-rc6 spurious hangs
@ 2007-06-28 14:00 Thomas Sattler
  2007-06-28 14:47 ` Oleg Nesterov
  2007-06-29 11:15 ` Tomi Orava
  0 siblings, 2 replies; 22+ messages in thread
From: Thomas Sattler @ 2007-06-28 14:00 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Alan Cox, Oleg Nesterov, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 3799 bytes --]

Hi there ...

I'm observing seldom hangs with linux 2.6. I can't tell when exactly it
happened the first time, I think somewhere around 2.6.16 or 2.6.17. I
see it about once or twice a month. With absolutely nothing in the logs.
So far I asked for help:

 - in the -ck list

Mon Sep 4 10:22:06 EST 2006, [ck] ck-patches seem to break DVB-T drivers
(see http://bhhdoa.org.au/pipermail/ck/2006-September/thread.html#6385)

 - in the linux-dvb list

Wed Sep 6 19:02:29 CEST 2006, [linux-dvb] driver problems when using
ck-patchset
(http://www.linuxtv.org/pipermail/linux-dvb/2006-September/thread.html#12649)

 - in the DaLUG (german, currently no archive) 14.09.2006


But nobody could help me so far.

Here is what I do:

I was running different kernels with different patchsets. It happened in
the past on -ck kernels (staircase), vanilla scheduler and cfs. As far
as I can remember the following patches were allways applied: squashfs
and vesa-tng.

Currently I'm running 2.6.22rc6 with cfs-v18, vesa-tng and an
XFS-lockdep patch:

http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22-rc6-v18.patch
http://dev.gentoo.org/~spock/projects/vesafb-tng/archive/vesafb-tng-1.0-rc2-2.6.20-rc2.patch
see http://marc.info/?l=linux-kernel&m=118286232709378&w=2

I also installed these kernel modules via gentoo portage:

ati-drivers-8.37.6-r1
fuse-2.6.4-r1
kqemu-1.3.0_pre11
truecrypt-4.3

kqemu and truecrypt weren't loaded, but ati-drivers and fuse were.

The box I talk about is an IBM T41p with 1.7GHz Pentium M and 512MB RAM.
The distribution in use is gentoo, quite up to date. Attached to the box
is an USB2.0 DVB-T receiver (Cinergy T², Terratec).

In rare cases the keyboard stops working when the T² stops streaming DVB
to the box. It happens when I record the stream to disk as well as when
I stream it to mplayer.

If end of streaming is caused by a keypress, 'q' or 'enter' on mplayer,
that key gets stuck. It's repeated until I reboot the box.

If the recording was scheduled and stops by itself no more keys are
recognized. The keyboard is dead. The laptop's own and the attached
USB-Keyboard. Magic-Sys-Keys are still working.

I can still use the mouse to move windows around, start new xterms via
icewm's panel or copy and past single characters from an xterm to other
xterms.

I can also close most of the open windows, for example firefox and most
xterms. I cannot close an xterm which is started as 'xterm -e top' by
icewm or a vncviewer. Both windows stay open but lose their content.

If a root shell is open I can enter 'reboot' or 'halt' but most of the
time this doesn't reboot or halt. I get the message for an upcoming
shotdown in all xterms but the box doesn't come down.

The systemload continously increases but there is nothing to see in top why.

Ingo Molnar told me to enable CONFIG_PROVE_LOCKING but xfs triggers it
long before the box hangs. I tested the patch mentioned above but it was
triggered by xfs again, see [1] and I didn't reboot between this and the
last hung. [1] http://marc.info/?l=linux-kernel&m=118295294529681&w=2

As Ingo told me I run 'echo t > /proc/sysrq-trigger' this time. The
corresponding part of my syslogs is attached, as well as my kernel config.


Another thing I observed with the T² is that it doesn't work if it's
already connected when the laptop boots up. I need to power off,
disconnect and boot. If I connect the T² after bootup it works. I can
also rmmod it's driver when it's not in use.

If I boot the box with the T² connected I cannot use it, the blue led in
the T² is always off and I cannot rmmod the driver. (I don't know
whether I ever tired to rmmod the driver before I tried to use the T².)


Please CC me as I'm not subscribed to the list.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german

[-- Attachment #2: messages.gz --]
[-- Type: application/gzip, Size: 4203 bytes --]

[-- Attachment #3: config.gz --]
[-- Type: application/gzip, Size: 12255 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-28 14:00 2.6.22-rc6 spurious hangs Thomas Sattler
@ 2007-06-28 14:47 ` Oleg Nesterov
  2007-06-28 14:58   ` Thomas Sattler
  2007-06-29 11:15 ` Tomi Orava
  1 sibling, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2007-06-28 14:47 UTC (permalink / raw)
  To: Thomas Sattler; +Cc: Linux Kernel Mailing List, Alan Cox, Ingo Molnar

On 06/28, Thomas Sattler wrote:
>
> As Ingo told me I run 'echo t > /proc/sysrq-trigger' this time. The
> corresponding part of my syslogs is attached, as well as my kernel config.

xs_connect() and release_dev() are blocked on flush_workqueue(). Perhaps
this is OK, but may be not.

Could you try the patch below? It dumps some info when flush_workqueue()
hangs.

Oleg.

--- OLD/kernel/sched.c~TST	2007-04-05 12:20:35.000000000 +0400
+++ OLD/kernel/sched.c	2007-06-02 15:41:53.000000000 +0400
@@ -4177,6 +4177,20 @@ struct task_struct *idle_task(int cpu)
 	return cpu_rq(cpu)->idle;
 }
 
+struct task_struct *get_cpu_curr(int cpu)
+{
+	unsigned long flags;
+	struct task_struct *curr;
+	struct rq *rq = cpu_rq(cpu);
+
+	spin_lock_irqsave(&rq->lock, flags);
+	curr = rq->curr;
+	get_task_struct(curr);
+	spin_unlock_irqrestore(&rq->lock, flags);
+
+	return curr;
+}
+
 /**
  * find_process_by_pid - find a process with a matching PID value.
  * @pid: the pid in question.
--- OLD/kernel/workqueue.c~TST	2007-06-02 13:34:57.000000000 +0400
+++ OLD/kernel/workqueue.c	2007-06-03 11:28:54.000000000 +0400
@@ -49,6 +49,7 @@ struct cpu_workqueue_struct {
 	struct task_struct *thread;
 
 	int run_depth;		/* Detect run_workqueue() recursion depth */
+	int jobs;
 } ____cacheline_aligned;
 
 /*
@@ -253,6 +254,7 @@ static void run_workqueue(struct cpu_wor
 
 		cwq->current_work = work;
 		list_del_init(cwq->worklist.next);
+		cwq->jobs++;
 		spin_unlock_irq(&cwq->lock);
 
 		BUG_ON(get_wq_data(work) != cwq);
@@ -328,6 +330,47 @@ static void insert_wq_barrier(struct cpu
 	insert_work(cwq, &barr->work, tail);
 }
 
+extern struct task_struct *get_cpu_curr(int cpu);
+
+static void flush_wait(struct cpu_workqueue_struct *cwq, struct completion *done)
+{
+	const int cpu = task_cpu(cwq->thread);
+	struct task_struct *curr;
+	struct work_struct *work;
+	int old_pid, state, jobs;
+
+again:
+	state = cwq->thread->state;
+	work = cwq->current_work;
+	jobs = cwq->jobs;
+
+	curr = get_cpu_curr(cpu);
+	old_pid = curr->pid;
+	put_task_struct(curr);
+
+	if (wait_for_completion_timeout(done, HZ * 30))
+		return;
+
+	printk(KERN_ERR "ERR!! %s flush hang: %p %p %d %d %d %d\n", cwq->thread->comm,
+			work, cwq->current_work, jobs, cwq->jobs,
+			state, (int)cwq->thread->state);
+
+	curr = get_cpu_curr(cpu);
+	printk(KERN_ERR "CURR: %d %d %s %ld %ld\n", old_pid, curr->pid,
+			curr->comm, curr->nivcsw, curr->nvcsw);
+	put_task_struct(curr);
+
+	spin_lock_irq(&cwq->lock);
+	list_for_each_entry(work, &cwq->worklist, entry)
+		print_symbol("    %s\n", (unsigned long) work->func);
+	printk("    ----\n");
+	if (cwq->current_work)
+		print_symbol("    %s\n", (unsigned long) cwq->current_work->func);
+	spin_unlock_irq(&cwq->lock);
+
+	goto again;
+}
+
 static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
 {
 	int active;
@@ -351,7 +394,7 @@ static int flush_cpu_workqueue(struct cp
 		spin_unlock_irq(&cwq->lock);
 
 		if (active)
-			wait_for_completion(&barr.done);
+			flush_wait(cwq, &barr.done);
 	}
 
 	return active;


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-28 14:47 ` Oleg Nesterov
@ 2007-06-28 14:58   ` Thomas Sattler
  2007-06-28 15:00     ` Ingo Molnar
       [not found]     ` <20070628150826.GA487@tv-sign.ru>
  0 siblings, 2 replies; 22+ messages in thread
From: Thomas Sattler @ 2007-06-28 14:58 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Linux Kernel Mailing List, Alan Cox, Ingo Molnar

>> As Ingo told me I run 'echo t > /proc/sysrq-trigger' this time. The
>> corresponding part of my syslogs is attached, as well as my kernel config.
> 
> Could you try the patch below? It dumps some info when flush_workqueue()
> hangs.

I'm compiling a patched kernel right now. As I wrote in my former mail the
whole thing not easy to trigger. So it can take some time to get the info.

Thanks so far,
Thomas

-- 
keep mailinglists in english, feel free to send PM in german

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-28 14:58   ` Thomas Sattler
@ 2007-06-28 15:00     ` Ingo Molnar
       [not found]     ` <20070628150826.GA487@tv-sign.ru>
  1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2007-06-28 15:00 UTC (permalink / raw)
  To: Thomas Sattler; +Cc: Oleg Nesterov, Linux Kernel Mailing List, Alan Cox


* Thomas Sattler <tsattler@gmx.de> wrote:

> >> As Ingo told me I run 'echo t > /proc/sysrq-trigger' this time. The
> >> corresponding part of my syslogs is attached, as well as my kernel config.
> > 
> > Could you try the patch below? It dumps some info when flush_workqueue()
> > hangs.
> 
> I'm compiling a patched kernel right now. As I wrote in my former mail 
> the whole thing not easy to trigger. So it can take some time to get 
> the info.

i'd suggest to increase CONFIG_LOG_BUF_SHIFT=17 to =20 in your .config - 
to make sure the full SysRq-t dump is captured when you do it.

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
       [not found]     ` <20070628150826.GA487@tv-sign.ru>
@ 2007-06-28 17:35       ` Thomas Sattler
  2007-06-28 17:49         ` Thomas Sattler
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Sattler @ 2007-06-28 17:35 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Linux Kernel Mailing List, Alan Cox, Ingo Molnar

>>>> As Ingo told me I run 'echo t > /proc/sysrq-trigger' this time. The
>>>> corresponding part of my syslogs is attached, as well as my kernel config.
>>> Could you try the patch below? It dumps some info when flush_workqueue()
>>> hangs.
>> I'm compiling a patched kernel right now. As I wrote in my former mail the
>> whole thing not easy to trigger. So it can take some time to get the info.
> 
> Forgot to say, if you manage to trigger the hang, please wait a couple of
> minutes to collect more info from flush_wait().

Seems today is my lucky day: I triggered it in just a few minutes.

The logfile is attached.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-28 17:35       ` Thomas Sattler
@ 2007-06-28 17:49         ` Thomas Sattler
  2007-06-28 18:10           ` Oleg Nesterov
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Sattler @ 2007-06-28 17:49 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Linux Kernel Mailing List, Alan Cox, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 95 bytes --]

Here is the logfile.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german

[-- Attachment #2: messages.gz --]
[-- Type: application/gzip, Size: 4077 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-28 17:49         ` Thomas Sattler
@ 2007-06-28 18:10           ` Oleg Nesterov
  2007-06-28 21:31             ` Thomas Sattler
  2007-06-29  7:13             ` Thomas Sattler
  0 siblings, 2 replies; 22+ messages in thread
From: Oleg Nesterov @ 2007-06-28 18:10 UTC (permalink / raw)
  To: Thomas Sattler; +Cc: Linux Kernel Mailing List, Alan Cox, Ingo Molnar

On 06/28, Thomas Sattler wrote:
>
> Here is the logfile.

> Jun 28 18:26:03 pearl ERR!! events/0 flush hang: f573759c f573759c 5782 5782 0 1
> Jun 28 18:26:03 pearl CURR: 5557 5557 dvbd 0 129024
> Jun 28 18:26:03 pearl wq_barrier_func+0x0/0xd
> Jun 28 18:26:03 pearl cache_reap+0x0/0xe3
> Jun 28 18:26:03 pearl ----
> Jun 28 18:26:03 pearl cinergyt2_query_rc+0x0/0x2e9 [cinergyT2]
> ...
> Jun 28 19:23:03 pearl ERR!! events/0 flush hang: f573759c f573759c 5782 5782 1 1
> Jun 28 19:23:03 pearl CURR: 5557 5557 dvbd 0 129138
> Jun 28 19:23:03 pearl wq_barrier_func+0x0/0xd
> Jun 28 19:23:03 pearl cache_reap+0x0/0xe3
> Jun 28 19:23:03 pearl console_callback+0x0/0xca
> Jun 28 19:23:03 pearl flush_to_ldisc+0x0/0xf7
> Jun 28 19:23:03 pearl rekey_seq_generator+0x0/0x4e
> Jun 28 19:23:03 pearl ----
> Jun 28 19:23:03 pearl cinergyt2_query_rc+0x0/0x2e9 [cinergyT2]

cinergyt2_query_rc() hangs. I'll try to look tomorrov, but I know nothing
about drivers/media/dvb/.

This log is strange, flush_work() should report every 30 seconds, but it
looks like wait_for_completion_timeout(HZ * 30) returns a lot earlier.

Could you also show the result of sysrq-T ?

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-28 18:10           ` Oleg Nesterov
@ 2007-06-28 21:31             ` Thomas Sattler
  2007-06-29  7:13             ` Thomas Sattler
  1 sibling, 0 replies; 22+ messages in thread
From: Thomas Sattler @ 2007-06-28 21:31 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Linux Kernel Mailing List, Alan Cox, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 504 bytes --]

> Could you also show the result of sysrq-T ?
I was so happy that I could trigger it that fast ...
... that I forgot to press Alt-Sysrq-t before reboot.
:-(

But, I could trigger it again. :-)

This time I can offer:

 - Debug output from Oleg's patch (11x, every 30s)
 - Alt-Sysrq-t (3x, about 30s between them)

There is no lockdep stuff but lockdep must have
been running. It's enabled and did not fire
before the the bug was triggered.

The logfile is attached.
(yes it is, I checked twice)

Thomas


[-- Attachment #2: messages.gz --]
[-- Type: application/gzip, Size: 28637 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-28 18:10           ` Oleg Nesterov
  2007-06-28 21:31             ` Thomas Sattler
@ 2007-06-29  7:13             ` Thomas Sattler
  2007-06-29 13:09               ` Oleg Nesterov
  1 sibling, 1 reply; 22+ messages in thread
From: Thomas Sattler @ 2007-06-29  7:13 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Linux Kernel Mailing List, Alan Cox, Ingo Molnar

>> Jun 28 19:23:03 pearl cinergyt2_query_rc+0x0/0x2e9 [cinergyT2]
> 
> cinergyt2_query_rc() hangs. I'll try to look tomorrov, but I know nothing
> about drivers/media/dvb/.

Does this mean the problem is in the cinergyt2 driver? I'm having similar
problems with another box but with different hardware. While my laptop is
used as a test system the other one is used as a 'productive' TV-recorder.
I hoped we could trace the bug on the test system and fix the productive
one at the same time. :-/

The other box ("silver") is a desktop, which has two Hauppauge Nova-T DVB-T
PCI cards and one (analog) Hauppauge WinTV PVR-350. Silver only hangs if
the (digital) recording process has to much priority: (silver is running
2.6.21.5-cfs-v17 +squashfs +ivtv)

As I wanted to give as much priority to the recording process as possible
I firstly run dvbd as SCHED_RR. This hung the box quite often, sometimes
after an uptime of several minutes, sometimes after two weeks.

I switched to -ck and run dvbd as SCHED_ISO which worked without *any*
problem for about 18 months. As -ck is discontinued I switched to CFS and
the box hung again (twice until I understood why) when dvbd was running as
nice -15.

ATM dvbd runs with nice -12 but yesterday, during a rsync-transfer of
several >4G files, a recording was broken. 29 seconds of the recorded
stream are lost because the system load was at 5 for about three hours.

Perhaps the 29 missing seconds are caused not by to less CPU time but by
the havy IO of rsync. But on the other hand dvbd is also running at IO
realtime prio 4 (ionice) while rsync run as IO normal.

Any hints?
Thomas

-- 
keep mailinglists in english, feel free to send PM in german

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-28 14:00 2.6.22-rc6 spurious hangs Thomas Sattler
  2007-06-28 14:47 ` Oleg Nesterov
@ 2007-06-29 11:15 ` Tomi Orava
  1 sibling, 0 replies; 22+ messages in thread
From: Tomi Orava @ 2007-06-29 11:15 UTC (permalink / raw)
  To: Thomas Sattler
  Cc: Linux Kernel Mailing List, Alan Cox, Oleg Nesterov, Ingo Molnar


Hello,

> I'm observing seldom hangs with linux 2.6. I can't tell when exactly it
> happened the first time, I think somewhere around 2.6.16 or 2.6.17. I
> see it about once or twice a month. With absolutely nothing in the logs.
> So far I asked for help:

<snip>

> The box I talk about is an IBM T41p with 1.7GHz Pentium M and 512MB RAM.
> The distribution in use is gentoo, quite up to date. Attached to the box
> is an USB2.0 DVB-T receiver (Cinergy T², Terratec).
>
> In rare cases the keyboard stops working when the T² stops streaming DVB
> to the box. It happens when I record the stream to disk as well as when
> I stream it to mplayer.

I just wanted you to know that you are not alone with this problem.
I've seen the same symptoms on my parents machine during the last 1-2 months
--> ie. keyboard hangs completely, mouse works and the host either hangs
completely sooner or later with only the following usb debug messages in
the logs (pasted below).

I believe that there is certainly some sort of bug in cinergyT2-driver
as I've not seen the hang happen whenever the cinergyT2 usb-receiver
has been disconnected from the machine.

Host configuration is:
2.6.22-rc4-git4 kernel + latest DVB drivers from v4l-dvb tree
(previously also tested with FC6 kernels with same symptoms)
MythTV-software (from devel SVN)

Asus P4P800-E motherboard (ich5 & PDC20378 sata-controllers)
1GB ram
Intel P4 2.8MHz
Terratec Cinergy T2 dvb-t receiver

Regards,
Tomi Orava


--------------------------------------------------
ehci_hcd 0000:00:1d.7: devpath 1 ep2in 3strikes
hub 1-0:1.0: state 7 ports 8 chg 0000 evt 0002
ehci_hcd 0000:00:1d.7: GetStatus port 1 status 001002 POWER sig=se0 CSC
hub 1-0:1.0: port 1, status 0100, change 0001, 12 Mb/s
usb 1-1: USB disconnect, address 2
usb 1-1: unregistering device
usb 1-1: usb_disable_device nuking all URBs
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a3c0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a340 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a2c0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a240 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f71a6c40 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f7e7f0c0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c37dc0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c37e40 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f71a6d40 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9af40 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f71a61c0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9ae40 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9adc0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9ad40 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9acc0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9ac40 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9abc0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9ab40 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9aac0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9aa40 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a9c0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a940 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a8c0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a840 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a7c0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a740 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a6c0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a640 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a5c0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a540 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a4c0 pipe c0010280 ep2in-bulk
ehci_hcd 0000:00:1d.7: shutdown urb f6c9a440 pipe c0010280 ep2in-bulk
cinergyT2: cinergyt2_submit_stream_urb(): urb submission failed (err = -19)!
last message repeated 31 times
---------------------------------------------------------------------
-- 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-29  7:13             ` Thomas Sattler
@ 2007-06-29 13:09               ` Oleg Nesterov
  2007-06-29 13:16                 ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2007-06-29 13:09 UTC (permalink / raw)
  To: Thomas Sattler
  Cc: Linux Kernel Mailing List, Alan Cox, Ingo Molnar, Daniel Mack,
	Holger Waechtler

On 06/29, Thomas Sattler wrote:
>
> >> Jun 28 19:23:03 pearl cinergyt2_query_rc+0x0/0x2e9 [cinergyT2]
> > 
> > cinergyt2_query_rc() hangs. I'll try to look tomorrov, but I know nothing
> > about drivers/media/dvb/.
> 
> Does this mean the problem is in the cinergyt2 driver? I'm having similar
> problems with another box but with different hardware. While my laptop is
> used as a test system the other one is used as a 'productive' TV-recorder.
> I hoped we could trace the bug on the test system and fix the productive
> one at the same time. :-/

Yes, I think cinergyt2 is buggy.

> Jun 28 23:02:42 pearl events/0      S 00000280     0     5      2 (L-TLB)
> Jun 28 23:02:42 pearl c1d4ded0 00000096 55136d42 00000280 c0302fa1 c1d4ad10 55136d42 00000280
> Jun 28 23:02:42 pearl c1d5c170 f737cd40 c1d4deec f56eca50 00000246 c1d4df0c c0303b81 00000000
> Jun 28 23:02:42 pearl 00000002 c0303c94 00000280 c1d4ab80 f56eca7c f56eca7c c1d4ab80 f56eca50
> Jun 28 23:02:42 pearl Call Trace:
> Jun 28 23:02:42 pearl [<c0303b81>] __mutex_lock_interruptible_slowpath+0x192/0x293
> Jun 28 23:02:42 pearl [<c0303c94>] mutex_lock_interruptible+0x12/0x15
> Jun 28 23:02:42 pearl [<f96ae120>] cinergyt2_query_rc+0x2d/0x2e9 [cinergyT2]
> Jun 28 23:02:42 pearl [<c012a79b>] run_workqueue+0x7f/0x130
> Jun 28 23:02:42 pearl [<c012b036>] worker_thread+0xb7/0xc5
> Jun 28 23:02:42 pearl [<c012d577>] kthread+0x39/0x5e
> Jun 28 23:02:42 pearl [<c010485b>] kernel_thread_helper+0x7/0x10


> Jun 28 23:02:42 pearl dvbd          D 00000281     0  5643      1 (NOTLB)
> Jun 28 23:02:42 pearl f6fd1e28 00000086 0a067524 00000281 c0543800 c1d5c300 0a067524 00000281
> Jun 28 23:02:42 pearl 00000000 f737cd40 f6fd1e38 00c3fba6 f6fd1e90 f6fd1e70 c0303704 f6fd1f00
> Jun 28 23:02:42 pearl 000458ee f702cf28 c05442a8 00c3fba6 c0124e58 c1d5c170 c0543800 c03036ff
> Jun 28 23:02:42 pearl Call Trace:
> Jun 28 23:02:42 pearl [<c0303704>] schedule_timeout+0x6e/0x8c
> Jun 28 23:02:42 pearl [<c03032b7>] wait_for_completion_timeout+0x72/0xcf
> Jun 28 23:02:42 pearl [<c012a9a4>] flush_cpu_workqueue+0x9d/0x19a
> Jun 28 23:02:42 pearl [<c012ad6e>] flush_workqueue+0x12/0x15
> Jun 28 23:02:42 pearl [<c012ad7e>] flush_scheduled_work+0xd/0xf
> Jun 28 23:02:42 pearl [<f96ae011>] cinergyt2_release+0x4c/0xa4 [cinergyT2]
> Jun 28 23:02:42 pearl [<c0160159>] __fput+0xaf/0x14b
> Jun 28 23:02:42 pearl [<c016020b>] fput+0x16/0x18
> Jun 28 23:02:42 pearl [<c015ddf3>] filp_close+0x54/0x5c
> Jun 28 23:02:42 pearl [<c015edcb>] sys_close+0x6e/0xa6
> Jun 28 23:02:42 pearl [<c0103c4e>] sysenter_past_esp+0x5f/0x99

cinergyt2_release() does flush_scheduled_work() under cinergyt2->sem.
flush_scheduled_work() hangs because cinergyt2_query_rc() waits for
the same cinergyt2->sem.

->disconnect_pending is used without any locks/barriers, perhaps this
is the reason.

I'll try to look further tomorrow. In any case, cinergyT2 should not
use flush_scheduled_work() at all.

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-29 13:09               ` Oleg Nesterov
@ 2007-06-29 13:16                 ` Ingo Molnar
  2007-06-29 13:58                   ` Oleg Nesterov
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2007-06-29 13:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Thomas Sattler, Linux Kernel Mailing List, Alan Cox, Daniel Mack,
	Holger Waechtler


* Oleg Nesterov <oleg@tv-sign.ru> wrote:

> Yes, I think cinergyt2 is buggy.

> cinergyt2_release() does flush_scheduled_work() under cinergyt2->sem. 
> flush_scheduled_work() hangs because cinergyt2_query_rc() waits for 
> the same cinergyt2->sem.
> 
> ->disconnect_pending is used without any locks/barriers, perhaps this 
> is the reason.
> 
> I'll try to look further tomorrow. In any case, cinergyT2 should not 
> use flush_scheduled_work() at all.

would the hack below be worth trying, to see whether there are any 
further problems?

	Ingo

Index: linux/drivers/media/dvb/cinergyT2/cinergyT2.c
===================================================================
--- linux.orig/drivers/media/dvb/cinergyT2/cinergyT2.c
+++ linux/drivers/media/dvb/cinergyT2/cinergyT2.c
@@ -523,7 +523,6 @@ static int cinergyt2_release (struct ino
 
 	if (!cinergyt2->disconnect_pending && (file->f_flags & O_ACCMODE) != O_RDONLY) {
 		cancel_delayed_work(&cinergyt2->query_work);
-		flush_scheduled_work();
 		cinergyt2_sleep(cinergyt2, 1);
 	}
 
@@ -974,8 +973,6 @@ static void cinergyt2_disconnect (struct
 {
 	struct cinergyt2 *cinergyt2 = usb_get_intfdata (intf);
 
-	flush_scheduled_work();
-
 	cinergyt2_unregister_rc(cinergyt2);
 
 	cancel_delayed_work(&cinergyt2->query_work);
@@ -1002,7 +999,6 @@ static int cinergyt2_suspend (struct usb
 		cancel_delayed_work(&cinergyt2->query_work);
 		if (cinergyt2->streaming)
 			cinergyt2_stop_stream_xfer(cinergyt2);
-		flush_scheduled_work();
 		cinergyt2_sleep(cinergyt2, 1);
 	}
 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-29 13:16                 ` Ingo Molnar
@ 2007-06-29 13:58                   ` Oleg Nesterov
  2007-06-29 14:10                     ` Ingo Molnar
  2007-06-29 21:21                     ` Mauro Carvalho Chehab
  0 siblings, 2 replies; 22+ messages in thread
From: Oleg Nesterov @ 2007-06-29 13:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Sattler, Linux Kernel Mailing List, Alan Cox, Daniel Mack,
	Holger Waechtler

On 06/29, Ingo Molnar wrote:
> * Oleg Nesterov <oleg@tv-sign.ru> wrote:
> 
> > Yes, I think cinergyt2 is buggy.
> 
> > cinergyt2_release() does flush_scheduled_work() under cinergyt2->sem. 
> > flush_scheduled_work() hangs because cinergyt2_query_rc() waits for 
> > the same cinergyt2->sem.
> > 
> > ->disconnect_pending is used without any locks/barriers, perhaps this 
> > is the reason.

I misread cinergyt2_release, it checks !->disconnect_pending, so it is
very clear why cinergyt2_query_rc() tries to take the mutex.

> > I'll try to look further tomorrow. In any case, cinergyT2 should not 
> > use flush_scheduled_work() at all.
> 
> would the hack below be worth trying, to see whether there are any 
> further problems?
> 
> 	Ingo
> 
> Index: linux/drivers/media/dvb/cinergyT2/cinergyT2.c
> ===================================================================
> --- linux.orig/drivers/media/dvb/cinergyT2/cinergyT2.c
> +++ linux/drivers/media/dvb/cinergyT2/cinergyT2.c
> @@ -523,7 +523,6 @@ static int cinergyt2_release (struct ino
>  
>  	if (!cinergyt2->disconnect_pending && (file->f_flags & O_ACCMODE) != O_RDONLY) {
>  		cancel_delayed_work(&cinergyt2->query_work);
> -		flush_scheduled_work();
>  		cinergyt2_sleep(cinergyt2, 1);
>  	}

I don't think we can just kill flush_scheduled_work(). We can use
cancel_rearming_delayed_work() instead of
cancel_delayed_work()+flush_scheduled_work()

Still we can't do this under cinergyt2->sem, because cinergyt2_query()
takes it too. This all looks very wrong to me, I hope maintaners can
explain.

I think cinergyt2_query() and cinergyt2_query_rc() should not use
->disconnect_pending at all. cinergyt2_disconnect() should set
->disconnect_pending = 1 and cancel both delayed_works.

cinergyt2_release() checks !->disconnect_pending and does the cancel
without mutex.

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-29 13:58                   ` Oleg Nesterov
@ 2007-06-29 14:10                     ` Ingo Molnar
  2007-06-29 16:59                       ` Dmitry Torokhov
  2007-06-29 21:21                     ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2007-06-29 14:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Thomas Sattler, Linux Kernel Mailing List, Alan Cox, Daniel Mack,
	Holger Waechtler, Dmitry Torokhov, Mauro Carvalho Chehab,
	Mariusz Kozlowski, v4l-dvb-maintainer


* Oleg Nesterov <oleg@tv-sign.ru> wrote:

> > > ->disconnect_pending is used without any locks/barriers, perhaps 
> > > this is the reason.
> 
> I misread cinergyt2_release, it checks !->disconnect_pending, so it is 
> very clear why cinergyt2_query_rc() tries to take the mutex.
> 
> > > I'll try to look further tomorrow. In any case, cinergyT2 should not 
> > > use flush_scheduled_work() at all.
> > 
> > would the hack below be worth trying, to see whether there are any 
> > further problems?
[...]
> I don't think we can just kill flush_scheduled_work(). We can use 
> cancel_rearming_delayed_work() instead of 
> cancel_delayed_work()+flush_scheduled_work()
> 
> Still we can't do this under cinergyt2->sem, because cinergyt2_query() 
> takes it too. This all looks very wrong to me, I hope maintaners can 
> explain.

i've Cc:-ed the maintainers.

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-29 14:10                     ` Ingo Molnar
@ 2007-06-29 16:59                       ` Dmitry Torokhov
  2007-06-29 17:23                         ` Oleg Nesterov
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Torokhov @ 2007-06-29 16:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Thomas Sattler, Linux Kernel Mailing List,
	Alan Cox, Daniel Mack, Holger Waechtler, Mauro Carvalho Chehab,
	Mariusz Kozlowski, v4l-dvb-maintainer

[-- Attachment #1: Type: text/plain, Size: 1883 bytes --]

On 6/29/07, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Oleg Nesterov <oleg@tv-sign.ru> wrote:
>
> > > > ->disconnect_pending is used without any locks/barriers, perhaps
> > > > this is the reason.
> >
> > I misread cinergyt2_release, it checks !->disconnect_pending, so it is
> > very clear why cinergyt2_query_rc() tries to take the mutex.
> >
> > > > I'll try to look further tomorrow. In any case, cinergyT2 should not
> > > > use flush_scheduled_work() at all.
> > >
> > > would the hack below be worth trying, to see whether there are any
> > > further problems?
> [...]
> > I don't think we can just kill flush_scheduled_work(). We can use
> > cancel_rearming_delayed_work() instead of
> > cancel_delayed_work()+flush_scheduled_work()
> >
> > Still we can't do this under cinergyt2->sem, because cinergyt2_query()
> > takes it too. This all looks very wrong to me, I hope maintaners can
> > explain.
>
> i've Cc:-ed the maintainers.
>

Well, not really maintainer but I think the short term soluton (at
least for the RC part) is to alter cinergyt2_query_rc to take
cinergyt2->sem only around cinergyt2_command(). Ther rest of the
polling function need not be protected as it does nto tun concurently
with itself.

The longer trem solution is to convert cinergyt2 to use polled input
device framework (as in attached patch - untested). Unfortunately it
depends on adding suspend/resume support to polled devices now that
workqueues are not being freezed again.

Overall the driver makes me cringe...

- Why does it take cinergyt2->sem in its cinergyt2_poll() function
before calling poll_wait()? Most of the places that try to wake up
polling process (such as FE_SET_FRONTEND) attempt to take the same
mutex and will promptly deadlock unless I am missing something.

- cinergyt2_query() wakes up pollers before altering poll condition.

- cinergyt2->iunuse is racy...

-- 
Dmitry

[-- Attachment #2: cinergyt-to-polldev.patch --]
[-- Type: text/plain, Size: 11978 bytes --]

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
---
 drivers/media/dvb/cinergyT2/cinergyT2.c |  209 ++++++++++++++------------------
 1 file changed, 94 insertions(+), 115 deletions(-)

Index: linux/drivers/media/dvb/cinergyT2/cinergyT2.c
===================================================================
--- linux.orig/drivers/media/dvb/cinergyT2/cinergyT2.c
+++ linux/drivers/media/dvb/cinergyT2/cinergyT2.c
@@ -26,7 +26,8 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/usb.h>
-#include <linux/input.h>
+#include <linux/input-polldev.h>
+#include <linux/usb/input.h>
 #include <linux/dvb/frontend.h>
 #include <linux/mutex.h>
 #include <linux/mm.h>
@@ -140,11 +141,9 @@ struct cinergyt2 {
 	struct urb *stream_urb [STREAM_URB_COUNT];
 
 #ifdef ENABLE_RC
-	struct input_dev *rc_input_dev;
+	struct input_polled_dev *rc_input;
 	char phys[64];
-	struct delayed_work rc_query_work;
-	int rc_input_event;
-	u32 rc_last_code;
+	int last_keycode;
 	unsigned long last_event_jiffies;
 #endif
 };
@@ -200,9 +199,9 @@ static const uint32_t rc_keys[] = {
 	CINERGYT2_RC_EVENT_TYPE_NEC,	0xa35ceb04,	KEY_NEXT
 };
 
-static int cinergyt2_command (struct cinergyt2 *cinergyt2,
-			      char *send_buf, int send_buf_len,
-			      char *recv_buf, int recv_buf_len)
+static int cinergyt2_command(struct cinergyt2 *cinergyt2,
+			     char *send_buf, int send_buf_len,
+			     char *recv_buf, int recv_buf_len)
 {
 	int actual_len;
 	char dummy;
@@ -232,7 +231,7 @@ static void cinergyt2_control_stream_tra
 	cinergyt2_command(cinergyt2, buf, sizeof(buf), NULL, 0);
 }
 
-static void cinergyt2_sleep (struct cinergyt2 *cinergyt2, int sleep)
+static void cinergyt2_sleep(struct cinergyt2 *cinergyt2, int sleep)
 {
 	char buf [] = { CINERGYT2_EP1_SLEEP_MODE, sleep ? 1 : 0 };
 	cinergyt2_command(cinergyt2, buf, sizeof(buf), NULL, 0);
@@ -476,7 +475,7 @@ static uint16_t compute_tps (struct dvb_
 	return tps;
 }
 
-static int cinergyt2_open (struct inode *inode, struct file *file)
+static int cinergyt2_open(struct inode *inode, struct file *file)
 {
 	struct dvb_device *dvbdev = file->private_data;
 	struct cinergyt2 *cinergyt2 = dvbdev->priv;
@@ -490,7 +489,6 @@ static int cinergyt2_open (struct inode 
 		return err;
 	}
 
-
 	if ((file->f_flags & O_ACCMODE) != O_RDONLY) {
 		cinergyt2_sleep(cinergyt2, 0);
 		schedule_delayed_work(&cinergyt2->query_work, HZ/2);
@@ -514,7 +512,7 @@ static void cinergyt2_unregister(struct 
 	kfree(cinergyt2);
 }
 
-static int cinergyt2_release (struct inode *inode, struct file *file)
+static int cinergyt2_release(struct inode *inode, struct file *file)
 {
 	struct dvb_device *dvbdev = file->private_data;
 	struct cinergyt2 *cinergyt2 = dvbdev->priv;
@@ -537,18 +535,18 @@ static int cinergyt2_release (struct ino
 	return dvb_generic_release(inode, file);
 }
 
-static unsigned int cinergyt2_poll (struct file *file, struct poll_table_struct *wait)
+static unsigned int cinergyt2_poll(struct file *file, struct poll_table_struct *wait)
 {
 	struct dvb_device *dvbdev = file->private_data;
 	struct cinergyt2 *cinergyt2 = dvbdev->priv;
-       unsigned int mask = 0;
+	unsigned int mask = 0;
 
 	if (cinergyt2->disconnect_pending || mutex_lock_interruptible(&cinergyt2->sem))
 		return -ERESTARTSYS;
 
 	poll_wait(file, &cinergyt2->poll_wq, wait);
 
-       if (cinergyt2->pending_fe_events != 0)
+	if (cinergyt2->pending_fe_events != 0)
 		mask |= (POLLIN | POLLRDNORM | POLLPRI);
 
 	mutex_unlock(&cinergyt2->sem);
@@ -722,142 +720,128 @@ static struct dvb_device cinergyt2_fe_te
 
 #ifdef ENABLE_RC
 
-static void cinergyt2_query_rc (struct work_struct *work)
+static int cinergyt2_get_keycode(struct cinergyt2 *cinergyt2,
+				 struct cinergyt2_rc_event *event)
 {
-	struct cinergyt2 *cinergyt2 =
-		container_of(work, struct cinergyt2, rc_query_work.work);
-	char buf[1] = { CINERGYT2_EP1_GET_RC_EVENTS };
-	struct cinergyt2_rc_event rc_events[12];
-	int n, len, i;
+	int i;
 
-	if (cinergyt2->disconnect_pending || mutex_lock_interruptible(&cinergyt2->sem))
-		return;
+	if (event->type == CINERGYT2_RC_EVENT_TYPE_NEC && event->value == ~0)
+		return cinergyt2->last_keycode;
 
-	len = cinergyt2_command(cinergyt2, buf, sizeof(buf),
-				(char *) rc_events, sizeof(rc_events));
-	if (len < 0)
-		goto out;
-	if (len == 0) {
-		if (time_after(jiffies, cinergyt2->last_event_jiffies +
-			       msecs_to_jiffies(150))) {
-			/* stop key repeat */
-			if (cinergyt2->rc_input_event != KEY_MAX) {
-				dprintk(1, "rc_input_event=%d Up\n", cinergyt2->rc_input_event);
-				input_report_key(cinergyt2->rc_input_dev,
-						 cinergyt2->rc_input_event, 0);
-				input_sync(cinergyt2->rc_input_dev);
-				cinergyt2->rc_input_event = KEY_MAX;
-			}
-			cinergyt2->rc_last_code = ~0;
+	for (i = 0; i < ARRAY_SIZE(rc_keys); i += 3) {
+		if (rc_keys[i + 0] == event->type &&
+		    rc_keys[i + 1] == le32_to_cpu(event->value)) {
+			return rc_keys[i + 2];
 		}
-		goto out;
 	}
-	cinergyt2->last_event_jiffies = jiffies;
 
-	for (n = 0; n < (len / sizeof(rc_events[0])); n++) {
-		dprintk(1, "rc_events[%d].value = %x, type=%x\n",
-			n, le32_to_cpu(rc_events[n].value), rc_events[n].type);
-
-		if (rc_events[n].type == CINERGYT2_RC_EVENT_TYPE_NEC &&
-		    rc_events[n].value == ~0) {
-			/* keyrepeat bit -> just repeat last rc_input_event */
-		} else {
-			cinergyt2->rc_input_event = KEY_MAX;
-			for (i = 0; i < ARRAY_SIZE(rc_keys); i += 3) {
-				if (rc_keys[i + 0] == rc_events[n].type &&
-				    rc_keys[i + 1] == le32_to_cpu(rc_events[n].value)) {
-					cinergyt2->rc_input_event = rc_keys[i + 2];
-					break;
+	return KEY_MAX;
+}
+
+static void cinergyt2_query_rc(struct input_polled_dev *poll_dev)
+{
+	struct cinergyt2 *cinergyt2 = poll_dev->private;
+	struct input_dev *input_dev = poll_dev->input;
+	char buf[1] = { CINERGYT2_EP1_GET_RC_EVENTS };
+	struct cinergyt2_rc_event rc_events[12];
+	int n, len, code;
+
+	mutex_lock(&cinergyt2->sem);
+	len = cinergyt2_command(cinergyt2, buf, sizeof(buf),
+				(char *)rc_events, sizeof(rc_events));
+	mutex_unlock(&cinergyt2->sem);
+
+	if (len > 0) {
+		/* we have new data */
+
+		for (n = 0; n < len / sizeof(rc_events[0]); n++) {
+			dprintk(1, "rc_events[%d].value = %x, type=%x\n",
+				n, le32_to_cpu(rc_events[n].value),
+				rc_events[n].type);
+
+			code = cinergyt2_get_keycode(cinergyt2, &rc_events[n]);
+
+			if (code != KEY_MAX) {
+				if (code != cinergyt2->last_keycode &&
+				    cinergyt2->last_keycode != KEY_MAX) {
+					/* emit a key-up for old key */
+					dprintk(1, "rc_input_event=%d UP\n",
+						cinergyt2->last_keycode);
+					input_report_key(input_dev,
+						 cinergyt2->last_keycode, 0);
 				}
-			}
-		}
 
-		if (cinergyt2->rc_input_event != KEY_MAX) {
-			if (rc_events[n].value == cinergyt2->rc_last_code &&
-			    cinergyt2->rc_last_code != ~0) {
-				/* emit a key-up so the double event is recognized */
-				dprintk(1, "rc_input_event=%d UP\n", cinergyt2->rc_input_event);
-				input_report_key(cinergyt2->rc_input_dev,
-						 cinergyt2->rc_input_event, 0);
+				dprintk(1, "rc_input_event=%d\n", code);
+				input_report_key(input_dev, code, 1);
+				input_sync(input_dev);
+
+				cinergyt2->last_keycode = code;
+				cinergyt2->last_event_jiffies = jiffies;
 			}
-			dprintk(1, "rc_input_event=%d\n", cinergyt2->rc_input_event);
-			input_report_key(cinergyt2->rc_input_dev,
-					 cinergyt2->rc_input_event, 1);
-			input_sync(cinergyt2->rc_input_dev);
-			cinergyt2->rc_last_code = rc_events[n].value;
 		}
-	}
-
-out:
-	schedule_delayed_work(&cinergyt2->rc_query_work,
-			      msecs_to_jiffies(RC_QUERY_INTERVAL));
+	} else if (len == 0 &&
+		   cinergyt2->last_keycode != KEY_MAX &&
+		   time_after(jiffies,
+			      cinergyt2->last_event_jiffies +
+					msecs_to_jiffies(150))) {
+		/* stop key repeat */
+		dprintk(1, "rc_input_event=%d Up\n", cinergyt2->last_keycode);
+		input_report_key(input_dev, cinergyt2->last_keycode, 0);
+		input_sync(input_dev);
 
-	mutex_unlock(&cinergyt2->sem);
+		cinergyt2->last_keycode = KEY_MAX;
+	}
 }
 
 static int cinergyt2_register_rc(struct cinergyt2 *cinergyt2)
 {
+	struct input_polled_dev *rc_input;
 	struct input_dev *input_dev;
 	int i;
 	int err;
 
-	input_dev = input_allocate_device();
-	if (!input_dev)
-		return -ENOMEM;
-
 	usb_make_path(cinergyt2->udev, cinergyt2->phys, sizeof(cinergyt2->phys));
 	strlcat(cinergyt2->phys, "/input0", sizeof(cinergyt2->phys));
-	cinergyt2->rc_input_event = KEY_MAX;
-	cinergyt2->rc_last_code = ~0;
-	INIT_DELAYED_WORK(&cinergyt2->rc_query_work, cinergyt2_query_rc);
+	cinergyt2->last_keycode = KEY_MAX;
+
+	cinergyt2->rc_input = rc_input = input_allocate_polled_device();
+	if (!rc_input)
+		return -ENOMEM;
+
+	rc_input->private = cinergyt2;
+	rc_input->poll = cinergyt2_query_rc;
+	rc_input->poll_interval = RC_QUERY_INTERVAL;
 
+	input_dev = rc_input->input;
 	input_dev->name = DRIVER_NAME " remote control";
 	input_dev->phys = cinergyt2->phys;
 	input_dev->evbit[0] = BIT(EV_KEY) | BIT(EV_REP);
 	for (i = 0; i < ARRAY_SIZE(rc_keys); i += 3)
 		set_bit(rc_keys[i + 2], input_dev->keybit);
-	input_dev->keycodesize = 0;
-	input_dev->keycodemax = 0;
-	input_dev->id.bustype = BUS_USB;
-	input_dev->id.vendor = cinergyt2->udev->descriptor.idVendor;
-	input_dev->id.product = cinergyt2->udev->descriptor.idProduct;
-	input_dev->id.version = 1;
-	input_dev->cdev.dev = &cinergyt2->udev->dev;
+	usb_to_input_id(cinergyt2->udev, &input_dev->id);
+	input_dev->dev.parent = &cinergyt2->udev->dev;
 
-	err = input_register_device(input_dev);
+	err = input_register_polled_device(rc_input);
 	if (err) {
-		input_free_device(input_dev);
+		input_free_polled_device(rc_input);
 		return err;
 	}
 
-	cinergyt2->rc_input_dev = input_dev;
-	schedule_delayed_work(&cinergyt2->rc_query_work, HZ/2);
-
 	return 0;
 }
 
 static void cinergyt2_unregister_rc(struct cinergyt2 *cinergyt2)
 {
-	cancel_delayed_work(&cinergyt2->rc_query_work);
-	input_unregister_device(cinergyt2->rc_input_dev);
-}
-
-static inline void cinergyt2_suspend_rc(struct cinergyt2 *cinergyt2)
-{
-	cancel_delayed_work(&cinergyt2->rc_query_work);
-}
+	struct input_polled_dev *rc_input = cinergyt2->rc_input;
 
-static inline void cinergyt2_resume_rc(struct cinergyt2 *cinergyt2)
-{
-	schedule_delayed_work(&cinergyt2->rc_query_work, HZ/2);
+	input_unregister_polled_device(rc_input);
+	input_free_polled_device(rc_input);
 }
 
 #else
 
 static inline int cinergyt2_register_rc(struct cinergyt2 *cinergyt2) { return 0; }
 static inline void cinergyt2_unregister_rc(struct cinergyt2 *cinergyt2) { }
-static inline void cinergyt2_suspend_rc(struct cinergyt2 *cinergyt2) { }
-static inline void cinergyt2_resume_rc(struct cinergyt2 *cinergyt2) { }
 
 #endif /* ENABLE_RC */
 
@@ -892,7 +876,7 @@ static void cinergyt2_query (struct work
 	mutex_unlock(&cinergyt2->sem);
 }
 
-static int cinergyt2_probe (struct usb_interface *intf,
+static int cinergyt2_probe(struct usb_interface *intf,
 		  const struct usb_device_id *id)
 {
 	struct cinergyt2 *cinergyt2;
@@ -970,15 +954,13 @@ bailout:
 	return -ENOMEM;
 }
 
-static void cinergyt2_disconnect (struct usb_interface *intf)
+static void cinergyt2_disconnect(struct usb_interface *intf)
 {
 	struct cinergyt2 *cinergyt2 = usb_get_intfdata (intf);
 
-	flush_scheduled_work();
-
 	cinergyt2_unregister_rc(cinergyt2);
 
-	cancel_delayed_work(&cinergyt2->query_work);
+	cancel_delayed_rearming_work(&cinergyt2->query_work);
 	wake_up_interruptible(&cinergyt2->poll_wq);
 
 	cinergyt2->demux.dmx.close(&cinergyt2->demux.dmx);
@@ -998,7 +980,6 @@ static int cinergyt2_suspend (struct usb
 	if (1) {
 		struct cinergyt2 *cinergyt2 = usb_get_intfdata (intf);
 
-		cinergyt2_suspend_rc(cinergyt2);
 		cancel_delayed_work(&cinergyt2->query_work);
 		if (cinergyt2->streaming)
 			cinergyt2_stop_stream_xfer(cinergyt2);
@@ -1026,8 +1007,6 @@ static int cinergyt2_resume (struct usb_
 		schedule_delayed_work(&cinergyt2->query_work, HZ/2);
 	}
 
-	cinergyt2_resume_rc(cinergyt2);
-
 	mutex_unlock(&cinergyt2->sem);
 	return 0;
 }

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-29 16:59                       ` Dmitry Torokhov
@ 2007-06-29 17:23                         ` Oleg Nesterov
  0 siblings, 0 replies; 22+ messages in thread
From: Oleg Nesterov @ 2007-06-29 17:23 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Ingo Molnar, Thomas Sattler, Linux Kernel Mailing List, Alan Cox,
	Daniel Mack, Holger Waechtler, Mauro Carvalho Chehab,
	Mariusz Kozlowski, v4l-dvb-maintainer

On 06/29, Dmitry Torokhov wrote:
>
> Well, not really maintainer but I think the short term soluton (at
> least for the RC part) is to alter cinergyt2_query_rc to take
> cinergyt2->sem only around cinergyt2_command(). Ther rest of the
> polling function need not be protected as it does nto tun concurently
> with itself.

Can't comment, because I know nothing about this stuff.

But, unless I misread this patch, it doesn't solve the problem.
cinergyt2_release() calls flush_scheduled_work() under ->sem,
but ->query_work takes it too, no?

Quoting myself,
>
>  I think cinergyt2_query() and cinergyt2_query_rc() should not use
>  ->disconnect_pending at all. cinergyt2_disconnect() should set
>  ->disconnect_pending = 1 and cancel both delayed_works.
>
>  cinergyt2_release() checks !->disconnect_pending and does the cancel
>  without mutex.

Possible?

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-29 13:58                   ` Oleg Nesterov
  2007-06-29 14:10                     ` Ingo Molnar
@ 2007-06-29 21:21                     ` Mauro Carvalho Chehab
  2007-06-29 21:34                       ` Markus Rechberger
  1 sibling, 1 reply; 22+ messages in thread
From: Mauro Carvalho Chehab @ 2007-06-29 21:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Thomas Sattler, Linux Kernel Mailing List, Alan Cox,
	Daniel Mack, Holger Waechtler

> Still we can't do this under cinergyt2->sem, because cinergyt2_query()
> takes it too. This all looks very wrong to me, I hope maintaners can
> explain.

AFAIK, the driver authors are not working anymore with CinergyT2. The
last patch we have on development tree from Holger is dated as Dec, 3
2004. Since them, only internal kernel API changes and a few sparse
fixes from other contributors were applied.

Also, none of the current DVB maintainers seem to have any hardware for
testing.


Cheers,
Mauro


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-29 21:21                     ` Mauro Carvalho Chehab
@ 2007-06-29 21:34                       ` Markus Rechberger
  2007-06-30 14:19                         ` Oleg Nesterov
  0 siblings, 1 reply; 22+ messages in thread
From: Markus Rechberger @ 2007-06-29 21:34 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Oleg Nesterov, Ingo Molnar, Thomas Sattler,
	Linux Kernel Mailing List, Alan Cox, Daniel Mack,
	Holger Waechtler

On 6/29/07, Mauro Carvalho Chehab <mchehab@infradead.org> wrote:
> > Still we can't do this under cinergyt2->sem, because cinergyt2_query()
> > takes it too. This all looks very wrong to me, I hope maintaners can
> > explain.
>
> AFAIK, the driver authors are not working anymore with CinergyT2. The
> last patch we have on development tree from Holger is dated as Dec, 3
> 2004. Since them, only internal kernel API changes and a few sparse
> fixes from other contributors were applied.
>
> Also, none of the current DVB maintainers seem to have any hardware for
> testing.
>

I received a Mail a while ago that this driver is open to the
community, it duplicates some code because the developers wanted to
use this driver for testing another DVB API which never took off.
Best would be to remove the duplicated code in that driver and make it
look like all other DVB drivers.

Markus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-29 21:34                       ` Markus Rechberger
@ 2007-06-30 14:19                         ` Oleg Nesterov
  2007-07-01 18:54                           ` Thomas Sattler
  0 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2007-06-30 14:19 UTC (permalink / raw)
  To: Markus Rechberger
  Cc: Mauro Carvalho Chehab, Ingo Molnar, Thomas Sattler,
	Linux Kernel Mailing List, Alan Cox, Daniel Mack,
	Holger Waechtler

On 06/29, Markus Rechberger wrote:
>
> On 6/29/07, Mauro Carvalho Chehab <mchehab@infradead.org> wrote:
> >> Still we can't do this under cinergyt2->sem, because cinergyt2_query()
> >> takes it too. This all looks very wrong to me, I hope maintaners can
> >> explain.
> >
> >AFAIK, the driver authors are not working anymore with CinergyT2. The
> >last patch we have on development tree from Holger is dated as Dec, 3
> >2004. Since them, only internal kernel API changes and a few sparse
> >fixes from other contributors were applied.
> >
> >Also, none of the current DVB maintainers seem to have any hardware for
> >testing.
> >
> 
> I received a Mail a while ago that this driver is open to the
> community, it duplicates some code because the developers wanted to
> use this driver for testing another DVB API which never took off.
> Best would be to remove the duplicated code in that driver and make it
> look like all other DVB drivers.

OK.

Thomas, any chance you could try the patch below? It is very, very stupid,
it was done without any understanding of this code, and of course it is
completely untested. I doubt very much it is correct, and even if it is
correct it is definitely not good. It would be great if Dmitry can take a look.

This patch shifts cancel_rearming_delayed_work() out of ->sem. Another mutex,
->wq_sem, was added to protect against the concurrent open/resume.

Oleg.

--- t/drivers/media/dvb/cinergyT2/cinergyT2.c~cinergy	2007-06-19 17:09:15.000000000 +0400
+++ t/drivers/media/dvb/cinergyT2/cinergyT2.c	2007-06-30 18:01:43.000000000 +0400
@@ -118,6 +118,7 @@ struct cinergyt2 {
 	struct dvb_demux demux;
 	struct usb_device *udev;
 	struct mutex sem;
+	struct mutex wq_sem;
 	struct dvb_adapter adapter;
 	struct dvb_device *fedev;
 	struct dmxdev dmxdev;
@@ -482,14 +483,14 @@ static int cinergyt2_open (struct inode 
 	struct cinergyt2 *cinergyt2 = dvbdev->priv;
 	int err = -ERESTARTSYS;
 
-	if (cinergyt2->disconnect_pending || mutex_lock_interruptible(&cinergyt2->sem))
-		return -ERESTARTSYS;
+	if (cinergyt2->disconnect_pending || mutex_lock_interruptible(&cinergyt2->wq_sem))
+		goto out;
 
-	if ((err = dvb_generic_open(inode, file))) {
-		mutex_unlock(&cinergyt2->sem);
-		return err;
-	}
+	if (mutex_lock_interruptible(&cinergyt2->sem))
+		goto out_unlock1;
 
+	if ((err = dvb_generic_open(inode, file)))
+		goto out_unlock2;
 
 	if ((file->f_flags & O_ACCMODE) != O_RDONLY) {
 		cinergyt2_sleep(cinergyt2, 0);
@@ -498,8 +499,12 @@ static int cinergyt2_open (struct inode 
 
 	atomic_inc(&cinergyt2->inuse);
 
+out_unlock2:
 	mutex_unlock(&cinergyt2->sem);
-	return 0;
+out_unlock1:
+	mutex_unlock(&cinergyt2->wq_sem);
+out:
+	return err;
 }
 
 static void cinergyt2_unregister(struct cinergyt2 *cinergyt2)
@@ -519,15 +524,17 @@ static int cinergyt2_release (struct ino
 	struct dvb_device *dvbdev = file->private_data;
 	struct cinergyt2 *cinergyt2 = dvbdev->priv;
 
-	mutex_lock(&cinergyt2->sem);
+	mutex_lock(&cinergyt2->wq_sem);
 
 	if (!cinergyt2->disconnect_pending && (file->f_flags & O_ACCMODE) != O_RDONLY) {
-		cancel_delayed_work(&cinergyt2->query_work);
-		flush_scheduled_work();
+		cancel_rearming_delayed_work(&cinergyt2->query_work);
+
+		mutex_lock(&cinergyt2->sem);
 		cinergyt2_sleep(cinergyt2, 1);
+		mutex_unlock(&cinergyt2->sem);
 	}
 
-	mutex_unlock(&cinergyt2->sem);
+	mutex_unlock(&cinergyt2->wq_sem);
 
 	if (atomic_dec_and_test(&cinergyt2->inuse) && cinergyt2->disconnect_pending) {
 		warn("delayed unregister in release");
@@ -838,13 +845,13 @@ static int cinergyt2_register_rc(struct 
 
 static void cinergyt2_unregister_rc(struct cinergyt2 *cinergyt2)
 {
-	cancel_delayed_work(&cinergyt2->rc_query_work);
+	cancel_rearming_delayed_work(&cinergyt2->rc_query_work);
 	input_unregister_device(cinergyt2->rc_input_dev);
 }
 
 static inline void cinergyt2_suspend_rc(struct cinergyt2 *cinergyt2)
 {
-	cancel_delayed_work(&cinergyt2->rc_query_work);
+	cancel_rearming_delayed_work(&cinergyt2->rc_query_work);
 }
 
 static inline void cinergyt2_resume_rc(struct cinergyt2 *cinergyt2)
@@ -907,6 +914,7 @@ static int cinergyt2_probe (struct usb_i
 	usb_set_intfdata (intf, (void *) cinergyt2);
 
 	mutex_init(&cinergyt2->sem);
+	mutex_init(&cinergyt2->wq_sem);
 	init_waitqueue_head (&cinergyt2->poll_wq);
 	INIT_DELAYED_WORK(&cinergyt2->query_work, cinergyt2_query);
 
@@ -974,11 +982,8 @@ static void cinergyt2_disconnect (struct
 {
 	struct cinergyt2 *cinergyt2 = usb_get_intfdata (intf);
 
-	flush_scheduled_work();
-
 	cinergyt2_unregister_rc(cinergyt2);
-
-	cancel_delayed_work(&cinergyt2->query_work);
+	cancel_rearming_delayed_work(&cinergyt2->query_work);
 	wake_up_interruptible(&cinergyt2->poll_wq);
 
 	cinergyt2->demux.dmx.close(&cinergyt2->demux.dmx);
@@ -992,21 +997,22 @@ static int cinergyt2_suspend (struct usb
 {
 	struct cinergyt2 *cinergyt2 = usb_get_intfdata (intf);
 
-	if (cinergyt2->disconnect_pending || mutex_lock_interruptible(&cinergyt2->sem))
+	if (cinergyt2->disconnect_pending || mutex_lock_interruptible(&cinergyt2->wq_sem))
 		return -ERESTARTSYS;
 
 	if (1) {
-		struct cinergyt2 *cinergyt2 = usb_get_intfdata (intf);
 
 		cinergyt2_suspend_rc(cinergyt2);
-		cancel_delayed_work(&cinergyt2->query_work);
+		cancel_rearming_delayed_work(&cinergyt2->query_work);
+
+		mutex_lock(&cinergyt2->sem);
 		if (cinergyt2->streaming)
 			cinergyt2_stop_stream_xfer(cinergyt2);
-		flush_scheduled_work();
 		cinergyt2_sleep(cinergyt2, 1);
+		mutex_unlock(&cinergyt2->sem);
 	}
 
-	mutex_unlock(&cinergyt2->sem);
+	mutex_unlock(&cinergyt2->wq_sem);
 	return 0;
 }
 
@@ -1014,9 +1020,15 @@ static int cinergyt2_resume (struct usb_
 {
 	struct cinergyt2 *cinergyt2 = usb_get_intfdata (intf);
 	struct dvbt_set_parameters_msg *param = &cinergyt2->param;
+	int err = -ERESTARTSYS;
 
-	if (cinergyt2->disconnect_pending || mutex_lock_interruptible(&cinergyt2->sem))
-		return -ERESTARTSYS;
+	if (cinergyt2->disconnect_pending || mutex_lock_interruptible(&cinergyt2->wq_sem))
+		goto out;
+
+	if (mutex_lock_interruptible(&cinergyt2->sem))
+		goto out_unlock1;
+
+	err = 0;
 
 	if (!cinergyt2->sleeping) {
 		cinergyt2_sleep(cinergyt2, 0);
@@ -1029,7 +1041,10 @@ static int cinergyt2_resume (struct usb_
 	cinergyt2_resume_rc(cinergyt2);
 
 	mutex_unlock(&cinergyt2->sem);
-	return 0;
+out_unlock1:
+	mutex_unlock(&cinergyt2->wq_sem);
+out:
+	return err;
 }
 
 static const struct usb_device_id cinergyt2_table [] __devinitdata = {


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-06-30 14:19                         ` Oleg Nesterov
@ 2007-07-01 18:54                           ` Thomas Sattler
  2007-07-01 19:47                             ` Oleg Nesterov
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Sattler @ 2007-07-01 18:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Markus Rechberger, Mauro Carvalho Chehab, Ingo Molnar,
	Linux Kernel Mailing List, Alan Cox, Daniel Mack,
	Holger Waechtler

> Thomas, any chance you could try the patch below? It is very, very stupid,
> it was done without any understanding of this code, and of course it is
> completely untested. I doubt very much it is correct, and even if it is
> correct it is definitely not good. It would be great if Dmitry can take a look.

I'm still testing but I couldn't break it until now.
And I didn't find any drawbacks yet.

Thomas

-- 
keep mailinglists in english, feel free to send PM in german

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-07-01 18:54                           ` Thomas Sattler
@ 2007-07-01 19:47                             ` Oleg Nesterov
  2007-07-02  4:48                               ` Thomas Sattler
  0 siblings, 1 reply; 22+ messages in thread
From: Oleg Nesterov @ 2007-07-01 19:47 UTC (permalink / raw)
  To: Thomas Sattler
  Cc: Markus Rechberger, Mauro Carvalho Chehab, Ingo Molnar,
	Linux Kernel Mailing List, Alan Cox, Daniel Mack,
	Holger Waechtler

On 07/01, Thomas Sattler wrote:
>
> > Thomas, any chance you could try the patch below? It is very, very stupid,
> > it was done without any understanding of this code, and of course it is
> > completely untested. I doubt very much it is correct, and even if it is
> > correct it is definitely not good. It would be great if Dmitry can take a look.
> 
> I'm still testing but I couldn't break it until now.
> And I didn't find any drawbacks yet.

Great, thanks a lot Thomas!

I'll re-send the patch tomorrow with a proper changelog unless you find
something else or maintainers have any objections.

Oleg.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: 2.6.22-rc6 spurious hangs
  2007-07-01 19:47                             ` Oleg Nesterov
@ 2007-07-02  4:48                               ` Thomas Sattler
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Sattler @ 2007-07-02  4:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Markus Rechberger, Mauro Carvalho Chehab, Ingo Molnar,
	Linux Kernel Mailing List, Alan Cox, Daniel Mack,
	Holger Waechtler

>>> Thomas, any chance you could try the patch below?
>> I'm still testing but I couldn't break it until now.
> Great, thanks a lot Thomas!
The box is still running without a problem,
it seems the bug is fixed.

Thanks a lot,
Thomas

-- 
keep mailinglists in english, feel free to send PM in german

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2007-07-02  4:48 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-28 14:00 2.6.22-rc6 spurious hangs Thomas Sattler
2007-06-28 14:47 ` Oleg Nesterov
2007-06-28 14:58   ` Thomas Sattler
2007-06-28 15:00     ` Ingo Molnar
     [not found]     ` <20070628150826.GA487@tv-sign.ru>
2007-06-28 17:35       ` Thomas Sattler
2007-06-28 17:49         ` Thomas Sattler
2007-06-28 18:10           ` Oleg Nesterov
2007-06-28 21:31             ` Thomas Sattler
2007-06-29  7:13             ` Thomas Sattler
2007-06-29 13:09               ` Oleg Nesterov
2007-06-29 13:16                 ` Ingo Molnar
2007-06-29 13:58                   ` Oleg Nesterov
2007-06-29 14:10                     ` Ingo Molnar
2007-06-29 16:59                       ` Dmitry Torokhov
2007-06-29 17:23                         ` Oleg Nesterov
2007-06-29 21:21                     ` Mauro Carvalho Chehab
2007-06-29 21:34                       ` Markus Rechberger
2007-06-30 14:19                         ` Oleg Nesterov
2007-07-01 18:54                           ` Thomas Sattler
2007-07-01 19:47                             ` Oleg Nesterov
2007-07-02  4:48                               ` Thomas Sattler
2007-06-29 11:15 ` Tomi Orava

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox