[PATCH] input: polldev can cause crash in case of polling disabled

linux-input.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] input: polldev can cause crash in case of polling disabled
@ 2010-02-16 14:44 Samu Onkalo
  2010-02-16 17:50 ` Dmitry Torokhov
  0 siblings, 1 reply; 14+ messages in thread
From: Samu Onkalo @ 2010-02-16 14:44 UTC (permalink / raw)
  To: dmitry.torokhov; +Cc: linux-input, Samu Onkalo

If polling is set to disabled value and polled input device
is opened and closed several times, address to workqueue will probably
change at some point. Since nothing is queued (due to polled disabled
state), content of the work struct contains pointer to the old and non-existent
workqueue. When the device is closed again, cancel_delayed_work_sync
goes crazy due to pointer to nonexisting workqueue.

In case on disabled polling, init work struct to initial value to
clean up the old values.

Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
---
 drivers/input/input-polldev.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/input/input-polldev.c b/drivers/input/input-polldev.c
index aa6713b..f9c8e34 100644
--- a/drivers/input/input-polldev.c
+++ b/drivers/input/input-polldev.c
@@ -88,9 +88,17 @@ static int input_open_polled_device(struct input_dev *input)
 	if (dev->open)
 		dev->open(dev);
 
-	/* Only start polling if polling is enabled */
+	/*
+	 * Only start polling if polling is enabled.
+	 * If polling is not running, clean up work struct since
+	 * pointer to just allocated WQ may have been changed since
+	 * previous use. If polling is not used, canceling of the
+	 * work goes crazy.
+	 */
 	if (dev->poll_interval > 0)
 		queue_delayed_work(polldev_wq, &dev->work, 0);
+	else
+		INIT_DELAYED_WORK(&dev->work, input_polled_device_work);
 
 	return 0;
 }
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-16 14:44 [PATCH] input: polldev can cause crash in case of polling disabled Samu Onkalo
@ 2010-02-16 17:50 ` Dmitry Torokhov
  2010-02-16 18:37   ` samu.p.onkalo
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Torokhov @ 2010-02-16 17:50 UTC (permalink / raw)
  To: Samu Onkalo; +Cc: linux-input

Hi Samu,

On Tue, Feb 16, 2010 at 04:44:41PM +0200, Samu Onkalo wrote:
> If polling is set to disabled value and polled input device
> is opened and closed several times, address to workqueue will probably
> change at some point. Since nothing is queued (due to polled disabled
> state), content of the work struct contains pointer to the old and non-existent
> workqueue.

This I do not quite understand. The work struct as far as I can see does
not reference workqueue at all. There is a list entry but if we do not
poll the device that entry should be always detached from any lists. We
properly initialize WQ entry when we create the device and it shoudl
remain valid until the device is destroyed.

> When the device is closed again, cancel_delayed_work_sync
> goes crazy due to pointer to nonexisting workqueue.
> 

What kind of failure do you see? Is there a stack trace or something?

> In case on disabled polling, init work struct to initial value to
> clean up the old values.
> 

Also, why would not we see the same issue with enabled polling? The
workqueue is being created and destroyed in this case as well.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-16 17:50 ` Dmitry Torokhov
@ 2010-02-16 18:37   ` samu.p.onkalo
  2010-02-16 21:43     ` Dmitry Torokhov
  0 siblings, 1 reply; 14+ messages in thread
From: samu.p.onkalo @ 2010-02-16 18:37 UTC (permalink / raw)
  To: dmitry.torokhov; +Cc: linux-input

>-----Original Message-----
>From: ext Dmitry Torokhov [mailto:dmitry.torokhov@gmail.com]
>Sent: 16 February, 2010 19:51
>To: Onkalo Samu.P (Nokia-D/Tampere)
>Cc: linux-input@vger.kernel.org
>Subject: Re: [PATCH] input: polldev can cause crash in case of polling
>disabled
>
>Hi Samu,
>
>On Tue, Feb 16, 2010 at 04:44:41PM +0200, Samu Onkalo wrote:
>> If polling is set to disabled value and polled input device
>> is opened and closed several times, address to workqueue will probably
>> change at some point. Since nothing is queued (due to polled disabled
>> state), content of the work struct contains pointer to the old and
>non-existent
>> workqueue.
>
>This I do not quite understand. The work struct as far as I can see does
>not reference workqueue at all. There is a list entry but if we do not
>poll the device that entry should be always detached from any lists. We
>properly initialize WQ entry when we create the device and it shoudl
>remain valid until the device is destroyed.

'data' entry contains a pointer to per-cpu-workqueue which in turn contains
the workqueue pointer. This 'data' entry is not ok in case of failure. I can
Collect more data about this.

>
>> When the device is closed again, cancel_delayed_work_sync
>> goes crazy due to pointer to nonexisting workqueue.
>>
>
>What kind of failure do you see? Is there a stack trace or something?

Kernel panic while in workqueue handling (paging fault with some crappy address).

>
>> In case on disabled polling, init work struct to initial value to
>> clean up the old values.
>>
>
>Also, why would not we see the same issue with enabled polling? The
>workqueue is being created and destroyed in this case as well.

Queue_delayed_work updates the work struct. Workqueue itself is ok.

I think that the sequence goes about this way (no other polled devices open):
1. Polled device is opened with polling enabled
2. It first creates workqueue and then queue the first polling. Kernels
Workqueue functions updates current workqueue information to the work-struct
3. polled device is closed
4. workqueue is destroyed

5. polling interval is set to 0
6. device is reopened
7. New workqueue is created
8. polled device is closed without queueing a work
9. work struct for polled device contains pointer to the old (created in 2.) wq
10. cancel_workqueue... can access unallocated memory causing crash.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-16 18:37   ` samu.p.onkalo
@ 2010-02-16 21:43     ` Dmitry Torokhov
  2010-02-17  8:15       ` samu.p.onkalo
  2010-02-17 16:28       ` Oleg Nesterov
  0 siblings, 2 replies; 14+ messages in thread
From: Dmitry Torokhov @ 2010-02-16 21:43 UTC (permalink / raw)
  To: samu.p.onkalo, oleg; +Cc: linux-input

On Tue, Feb 16, 2010 at 07:37:49PM +0100, samu.p.onkalo@nokia.com wrote:
> 
> 
> >-----Original Message-----
> >From: ext Dmitry Torokhov [mailto:dmitry.torokhov@gmail.com]
> >Sent: 16 February, 2010 19:51
> >To: Onkalo Samu.P (Nokia-D/Tampere)
> >Cc: linux-input@vger.kernel.org
> >Subject: Re: [PATCH] input: polldev can cause crash in case of polling
> >disabled
> >
> >Hi Samu,
> >
> >On Tue, Feb 16, 2010 at 04:44:41PM +0200, Samu Onkalo wrote:
> >> If polling is set to disabled value and polled input device
> >> is opened and closed several times, address to workqueue will probably
> >> change at some point. Since nothing is queued (due to polled disabled
> >> state), content of the work struct contains pointer to the old and
> >non-existent
> >> workqueue.
> >
> >This I do not quite understand. The work struct as far as I can see does
> >not reference workqueue at all. There is a list entry but if we do not
> >poll the device that entry should be always detached from any lists. We
> >properly initialize WQ entry when we create the device and it shoudl
> >remain valid until the device is destroyed.
> 
> 'data' entry contains a pointer to per-cpu-workqueue which in turn contains
> the workqueue pointer. This 'data' entry is not ok in case of failure. I can
> Collect more data about this.
> 
> >
> >> When the device is closed again, cancel_delayed_work_sync
> >> goes crazy due to pointer to nonexisting workqueue.
> >>
> >
> >What kind of failure do you see? Is there a stack trace or something?
> 
> Kernel panic while in workqueue handling (paging fault with some crappy address).
> 
> >
> >> In case on disabled polling, init work struct to initial value to
> >> clean up the old values.
> >>
> >
> >Also, why would not we see the same issue with enabled polling? The
> >workqueue is being created and destroyed in this case as well.
> 
> 
> Queue_delayed_work updates the work struct. Workqueue itself is ok.
> 
> I think that the sequence goes about this way (no other polled devices open):
> 1. Polled device is opened with polling enabled
> 2. It first creates workqueue and then queue the first polling. Kernels
> Workqueue functions updates current workqueue information to the work-struct
> 3. polled device is closed
> 4. workqueue is destroyed
> 
> 5. polling interval is set to 0
> 6. device is reopened
> 7. New workqueue is created
> 8. polled device is closed without queueing a work
> 9. work struct for polled device contains pointer to the old (created in 2.) wq
> 10. cancel_workqueue... can access unallocated memory causing crash.
> 

Ah, I see. In this case I think it should be fixed in workqueue code by
clearing work data so it does not point to the [potentially] non-existing
workqueue when we cancel or complete work.

Oleg, do you agree?

-- 
Dmitry

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-16 21:43     ` Dmitry Torokhov
@ 2010-02-17  8:15       ` samu.p.onkalo
  2010-02-17  8:56         ` samu.p.onkalo
  2010-02-17 17:03         ` Oleg Nesterov
  2010-02-17 16:28       ` Oleg Nesterov
  1 sibling, 2 replies; 14+ messages in thread
From: samu.p.onkalo @ 2010-02-17  8:15 UTC (permalink / raw)
  To: dmitry.torokhov, oleg; +Cc: linux-input



>-----Original Message-----
>From: ext Dmitry Torokhov [mailto:dmitry.torokhov@gmail.com]
>Sent: 16 February, 2010 23:43
>To: Onkalo Samu.P (Nokia-D/Tampere); oleg@redhat.com
>Cc: linux-input@vger.kernel.org
>Subject: Re: [PATCH] input: polldev can cause crash in case of polling
>disabled
>
>On Tue, Feb 16, 2010 at 07:37:49PM +0100, samu.p.onkalo@nokia.com wrote:
>>
>>
>> >-----Original Message-----
>> >From: ext Dmitry Torokhov [mailto:dmitry.torokhov@gmail.com]
>> >Sent: 16 February, 2010 19:51
>> >To: Onkalo Samu.P (Nokia-D/Tampere)
>> >Cc: linux-input@vger.kernel.org
>> >Subject: Re: [PATCH] input: polldev can cause crash in case of
>polling
>> >disabled
>> >
>> >Hi Samu,
>> >
>> >On Tue, Feb 16, 2010 at 04:44:41PM +0200, Samu Onkalo wrote:
>> >> If polling is set to disabled value and polled input device
>> >> is opened and closed several times, address to workqueue will
>probably
>> >> change at some point. Since nothing is queued (due to polled
>disabled
>> >> state), content of the work struct contains pointer to the old and
>> >non-existent
>> >> workqueue.
>> >
>> >This I do not quite understand. The work struct as far as I can see
>does
>> >not reference workqueue at all. There is a list entry but if we do
>not
>> >poll the device that entry should be always detached from any lists.
>We
>> >properly initialize WQ entry when we create the device and it shoudl
>> >remain valid until the device is destroyed.
>>
>> 'data' entry contains a pointer to per-cpu-workqueue which in turn
>contains
>> the workqueue pointer. This 'data' entry is not ok in case of failure.
>I can
>> Collect more data about this.
>>
>> >
>> >> When the device is closed again, cancel_delayed_work_sync
>> >> goes crazy due to pointer to nonexisting workqueue.
>> >>
>> >
>> >What kind of failure do you see? Is there a stack trace or something?
>>
>> Kernel panic while in workqueue handling (paging fault with some
>crappy address).

It looks like this (in ARM):
[   88.974884] Unable to handle kernel paging request at virtual address 732e726f
[   88.994445] pgd = c0004000
[   88.997192] [732e726f] *pgd=00000000
[   89.013610] Internal error: Oops: 5 [#1] PREEMPT
[   89.079406] CPU: 0    Not tainted  (2.6.32-09063-g2953cf4 #30)
[   89.085296] PC is at __cancel_work_timer+0xfc/0x1a0
[   89.090179] LR is at __cancel_work_timer+0xec/0x1a0
[   89.095092] pc : [<c006f880>]    lr : [<c006f870>]    psr: a0000153
[   89.095092] sp : cf411c98  ip : cf410000  fp : cf411d1c
[   89.106628] r10: dfba59fc  r9 : dfba59f0  r8 : dfba59e4
[   89.111877] r7 : 00000000  r6 : 00000000  r5 : dfba59e0  r4 : dfba59c0
[   89.118438] r3 : 732e726f  r2 : d34a1b00  r1 : 00000001  r0 : dfba59f0
[   89.125000] Flags: NzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[   89.132263] Control: 10c5387d  Table: 91110019  DAC: 00000015
[   89.144866] Stack: (0xcf411c98 to 0xcf412000)
[   89.149261] 1c80:                                                       00000002 00000000
[   89.157470] 1ca0: c006f824 cf410000 00000000 d34a1b00 dfa3f540 ded44078 cf411cfc cf411cc8
[   89.165710] 1cc0: c00861d8 c0084a20 00000002 00000000 00000000 c02b2428 cf410000 20000153
[   89.173919] 1ce0: cf411cfc c00813a4 cf410000 20000153 cf411d44 dfba59c0 dfb618cc dfb2b864
[   89.182159] 1d00: df67e960 ded44078 dfa3f540 ded44078 cf411d2c cf411d20 c006f938 c006f790
[   89.190368] 1d20: cf411d44 cf411d30 bf10c2fc c006f930 dfb2b0c0 dfb618cc cf411d64 cf411d48
[   89.198608] 1d40: c02b2458 bf10c2e8 dfb618c0 d1ca1080 dfb6193c df67e960 cf411d84 cf411d68
[   89.206817] 1d60: c02b4ab8 c02b240c 00000000 cf408200 00000010 df67e960 cf411dc4 cf411d88
[   89.215057] 1d80: c00ce2e0 c02b4a38 00000000 00000000 cf408200 df67e960 cf411db4 cf408200
[   89.223266] 1da0: 00000000 d3bca000 0000000c d3bca008 00000001 00000000 cf411dd4 cf411dc8
[   89.231506] 1dc0: c00ce438 c00ce1bc cf411df4 cf411dd8 c00caa74 c00ce414 00000041 00000000
[   89.239715] 1de0: d3bca000 0000000c cf411e1c cf411df8 c005f8dc c00caa08 d34a1de0 d34a1b00
[   89.247955] 1e00: d3bca000 00000002 cf411ee0 d349b684 cf411e3c cf411e20 c005f984 c005f85c
[   89.256164] 1e20: 00000002 d34a1b00 cf410000 00000002 cf411e54 cf411e40 c0060fa8 c005f950
[   89.264404] 1e40: cf411e5c cf411e50 cf411e74 cf411e58 c00614ec c0060de8 c002e204 00000009
[   89.272613] 1e60: be95167c cf410000 cf411ec4 cf411e78 c006c218 c0061460 00000400 dfb618f0
[   89.280853] 1e80: dfb618f0 20000153 cf411ea4 cf411fb0 d349b184 d34d1680 cf411ebc c002e204
[   89.289062] 1ea0: cf411fb0 be95167c 00000003 c002e204 cf410000 00000003 cf411fac cf411ec8
[   89.297302] 1ec0: c0030430 c006be94 d1ca1080 cf410000 cf411f3c 00000001 c02b55a8 c0073430
[   89.305511] 1ee0: 00000009 00000000 00000000 00000000 00000000 c00732ec cf411ef8 cf411ef8
[   89.313751] 1f00: c00ccbfc c01e16e0 00000000 dfae9a10 d353cdd4 cf408200 be951644 cf411f70
[   89.321960] 1f20: 00000400 00000400 cf410000 00000003 cf411f6c cf411f40 c00cd770 c02b54c4
[   89.330200] 1f40: 00000000 00000000 cf410000 00000000 00000000 cf408200 00000400 be951644
[   89.338409] 1f60: cf411fa4 cf411f70 c00cd8e0 c00cd6c4 cf411f8c cf411f80 c0082b24 c0082250
[   89.346649] 1f80: 00000000 00012d98 00012818 be95167c 00000003 c002e204 cf410000 00000003
[   89.354858] 1fa0: 00000000 cf411fb0 c002e098 c00303c0 fffffe00 be951644 00000400 00000001
[   89.363098] 1fc0: 00012d98 00012818 be95167c 00000003 be951644 00000003 00000003 be951cd4
[   89.371307] 1fe0: 00000002 be950a70 0000cfe4 400fbfcc 60000050 00000003 00000000 00000000
[   89.379547] Backtrace: 
[   89.382019] [<c006f784>] (__cancel_work_timer+0x0/0x1a0) from [<c006f938>] (cancel_delayed_work_sync+0x14/0x18)
[   89.392181] [<c006f924>] (cancel_delayed_work_sync+0x0/0x18) from [<bf10c2fc>] (input_close_polled_device+0x20/0x74 [input_p)
[   89.404174] [<bf10c2dc>] (input_close_polled_device+0x0/0x74 [input_polldev]) from [<c02b2458>] (input_close_device+0x58/0x7)
[   89.415618]  r5:dfb618cc r4:dfb2b0c0
[   89.419250] [<c02b2400>] (input_close_device+0x0/0x7c) from [<c02b4ab8>] (evdev_release+0x8c/0xa4)
[   89.428253]  r7:df67e960 r6:dfb6193c r5:d1ca1080 r4:dfb618c0
[   89.433990] [<c02b4a2c>] (evdev_release+0x0/0xa4) from [<c00ce2e0>] (__fput+0x130/0x258)
[   89.442108]  r7:df67e960 r6:00000010 r5:cf408200 r4:00000000
[   89.447845] [<c00ce1b0>] (__fput+0x0/0x258) from [<c00ce438>] (fput+0x30/0x34)
[   89.455108] [<c00ce408>] (fput+0x0/0x34) from [<c00caa74>] (filp_close+0x78/0x84)
[   89.462646] [<c00ca9fc>] (filp_close+0x0/0x84) from [<c005f8dc>] (put_files_struct+0x8c/0xf4)
[   89.471221]  r7:0000000c r6:d3bca000 r5:00000000 r4:00000041
[   89.476928] [<c005f850>] (put_files_struct+0x0/0xf4) from [<c005f984>] (exit_files+0x40/0x44)
[   89.485504] [<c005f944>] (exit_files+0x0/0x44) from [<c0060fa8>] (do_exit+0x1cc/0x678)
[   89.493469]  r7:00000002 r6:cf410000 r5:d34a1b00 r4:00000002
[   89.499206] [<c0060ddc>] (do_exit+0x0/0x678) from [<c00614ec>] (do_group_exit+0x98/0xc8)
[   89.507354] [<c0061454>] (do_group_exit+0x0/0xc8) from [<c006c218>] (get_signal_to_deliver+0x390/0x3d8)
[   89.516784]  r7:cf410000 r6:be95167c r5:00000009 r4:c002e204
[   89.522521] [<c006be88>] (get_signal_to_deliver+0x0/0x3d8) from [<c0030430>] (do_notify_resume+0x7c/0x594)
[   89.532226] [<c00303b4>] (do_notify_resume+0x0/0x594) from [<c002e098>] (work_pending+0x1c/0x20)
[   89.541076] Code: e5953000 e3d33003 0a000011 e593304c (e5934000) 
[   90.065185] ---[ end trace d28022cc252c43c3 ]---
[   90.075866] Kernel panic - not syncing: Fatal exception

I added traces to drivers/input/input-polldev.c
static int input_open_polled_device(struct input_dev *input)
{
	struct input_polled_dev *dev = input_get_drvdata(input);
	int error;
	struct cpu_workqueue_struct *cwq;

	printk("INPUT_OPEN_POLLED_DEVICE - enter\n");
	printk("polldev_wq before start_workqueue: %x\n", polldev_wq);
	if (polldev_wq != NULL)
		printk("cpu_wq of wq: %x\n", polldev_wq->cpu_wq);
		

	error = input_polldev_start_workqueue();
	if (error)
		return error;
	printk("polldev_wq after start_workqueue: %x\n", polldev_wq);
	if (polldev_wq != NULL)
		printk("cpu_wq of wq: %x\n", polldev_wq->cpu_wq);

	if (dev->open)
		dev->open(dev);

	printk("poll_interval: %x\n", dev->poll_interval );
	printk("addr of work: %x\n", &dev->work);
	cwq = atomic_long_read(&dev->work.work.data) & WORK_STRUCT_WQ_DATA_MASK;
	printk("data (==cwq) at work (before queueing): %x\n", cwq);
	if (cwq != NULL)
		printk("wq from cwq: %x\n", cwq->wq);

	/* Only start polling if polling is enabled */
	if (dev->poll_interval > 0)
		queue_delayed_work(polldev_wq, &dev->work, 0);

	cwq = atomic_long_read(&dev->work.work.data) & WORK_STRUCT_WQ_DATA_MASK;
	printk("data (==cwq) at work (after queueing): %x\n", cwq);
	if (cwq != NULL)
		printk("wq from cwq: %x\n", cwq->wq);

	printk("INPUT_OPEN_POLLED_DEVICE - done\n");
	return 0;
}

First time device open:
[   29.094879] INPUT_OPEN_POLLED_DEVICE - enter
[   29.099182] polldev_wq before start_workqueue: 0
[   29.154449] polldev_wq after start_workqueue: ded3c580
[   29.159637] cpu_wq of wq: dc4b8480
[   29.255950] poll_interval: 32
[   29.258941] addr of work: dfbe8f20
[   29.262359] data (==cwq) at work (before queueing): 0
[   29.267669] data (==cwq) at work (after queueing): dc4b8480  <------- CPU workqueue same as polldev_wq
[   29.273315] wq from cwq: ded3c580
[   29.276641] INPUT_OPEN_POLLED_DEVICE - done

there were other devices opened and closed between

Second time device open:
[  202.372680] INPUT_OPEN_POLLED_DEVICE - enter
[  202.377258] polldev_wq before start_workqueue: dc57ba00
[  202.382598] cpu_wq of wq: dc4ea900
[  202.386077] polldev_wq after start_workqueue: dc57ba00
[  202.391326] cpu_wq of wq: dc4ea900
[  202.459259] poll_interval: 0  <-------------------------------- no queueing because of this
[  202.462188] addr of work: dfbe8f20
[  202.465637] data (==cwq) at work (before queueing): dc4b8480  <----------- CPU workqueue not updated.
[  202.471435] wq from cwq: 6b6b006f
[  202.474853] data (==cwq) at work (after queueing): dc4b8480 <----- queueing not done -> not updated
[  202.480468] wq from cwq: 6b6b006f <-------------------------- crap
[  202.483886] INPUT_OPEN_POLLED_DEVICE - done

And when cancel_delayed_work_sync is called, kernel crashes due to crap address to per-cpu workqueue.

Actually problem is that "data" field in the work struct points to the non-existing per-cpu workqueue entry.
When this is cancelled at device close, kernel crashes. But what is the root cause? In input-polldev,
workqueue can change from time to time depending on what happens at the userspace. Work struct
can still contain references to the old workqueue. Is that illegal thing to do?
Or should cancellation of the work cleanup the work struct so that it is in initial state.

It is said in the workqueue.c:
* The caller must ensure that workqueue_struct on which this work was last
 * queued can't be destroyed before this function returns.

This is not true here. Workqueue has been destroyed since the work has never queued to the new workqueue.
Either cancel_(delayed)_work_sync should clear the data field instead of setting it to non-pending or
input-polldev must clear the work struct in case of no queueing. Or do you have other proposals?

Br,
Samu





^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-17  8:15       ` samu.p.onkalo
@ 2010-02-17  8:56         ` samu.p.onkalo
  2010-02-17  9:47           ` Dmitry Torokhov
  2010-02-17 17:03         ` Oleg Nesterov
  1 sibling, 1 reply; 14+ messages in thread
From: samu.p.onkalo @ 2010-02-17  8:56 UTC (permalink / raw)
  To: dmitry.torokhov, oleg; +Cc: linux-input



>-----Original Message-----
>From: linux-input-owner@vger.kernel.org [mailto:linux-input-
>owner@vger.kernel.org] On Behalf Of Onkalo Samu.P (Nokia-D/Tampere)
>Sent: 17 February, 2010 10:16
>To: dmitry.torokhov@gmail.com; oleg@redhat.com
>Cc: linux-input@vger.kernel.org
>Subject: RE: [PATCH] input: polldev can cause crash in case of polling
>disabled
>
>
>
>>-----Original Message-----
>>From: ext Dmitry Torokhov [mailto:dmitry.torokhov@gmail.com]
>>Sent: 16 February, 2010 23:43
>>To: Onkalo Samu.P (Nokia-D/Tampere); oleg@redhat.com
>>Cc: linux-input@vger.kernel.org
>>Subject: Re: [PATCH] input: polldev can cause crash in case of polling
>>disabled
>>
>>On Tue, Feb 16, 2010 at 07:37:49PM +0100, samu.p.onkalo@nokia.com
>wrote:
>>>
>>>
>>> >-----Original Message-----
>>> >From: ext Dmitry Torokhov [mailto:dmitry.torokhov@gmail.com]
>>> >Sent: 16 February, 2010 19:51
>>> >To: Onkalo Samu.P (Nokia-D/Tampere)
>>> >Cc: linux-input@vger.kernel.org
>>> >Subject: Re: [PATCH] input: polldev can cause crash in case of
>>polling
>>> >disabled
>>> >
>>> >Hi Samu,
>>> >
>>> >On Tue, Feb 16, 2010 at 04:44:41PM +0200, Samu Onkalo wrote:
>>> >> If polling is set to disabled value and polled input device
>>> >> is opened and closed several times, address to workqueue will
>>probably
>>> >> change at some point. Since nothing is queued (due to polled
>>disabled
>>> >> state), content of the work struct contains pointer to the old and
>>> >non-existent
>>> >> workqueue.
>>> >
>>> >This I do not quite understand. The work struct as far as I can see
>>does
>>> >not reference workqueue at all. There is a list entry but if we do
>>not
>>> >poll the device that entry should be always detached from any lists.
>>We
>>> >properly initialize WQ entry when we create the device and it shoudl
>>> >remain valid until the device is destroyed.
>>>
>>> 'data' entry contains a pointer to per-cpu-workqueue which in turn
>>contains
>>> the workqueue pointer. This 'data' entry is not ok in case of
>failure.
>>I can
>>> Collect more data about this.
>>>
>>> >
>>> >> When the device is closed again, cancel_delayed_work_sync
>>> >> goes crazy due to pointer to nonexisting workqueue.
>>> >>
>>> >
>>> >What kind of failure do you see? Is there a stack trace or
>something?
>>>
>>> Kernel panic while in workqueue handling (paging fault with some
>>crappy address).
>
>It looks like this (in ARM):
>[   88.974884] Unable to handle kernel paging request at virtual address
>732e726f
>[   88.994445] pgd = c0004000
>[   88.997192] [732e726f] *pgd=00000000
>[   89.013610] Internal error: Oops: 5 [#1] PREEMPT
>[   89.079406] CPU: 0    Not tainted  (2.6.32-09063-g2953cf4 #30)
>[   89.085296] PC is at __cancel_work_timer+0xfc/0x1a0
>[   89.090179] LR is at __cancel_work_timer+0xec/0x1a0
>[   89.095092] pc : [<c006f880>]    lr : [<c006f870>]    psr: a0000153
>[   89.095092] sp : cf411c98  ip : cf410000  fp : cf411d1c
>[   89.106628] r10: dfba59fc  r9 : dfba59f0  r8 : dfba59e4
>[   89.111877] r7 : 00000000  r6 : 00000000  r5 : dfba59e0  r4 :
>dfba59c0
>[   89.118438] r3 : 732e726f  r2 : d34a1b00  r1 : 00000001  r0 :
>dfba59f0
>[   89.125000] Flags: NzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM
>Segment user
>[   89.132263] Control: 10c5387d  Table: 91110019  DAC: 00000015
>[   89.144866] Stack: (0xcf411c98 to 0xcf412000)
>[   89.149261] 1c80:
>00000002 00000000
>[   89.157470] 1ca0: c006f824 cf410000 00000000 d34a1b00 dfa3f540
>ded44078 cf411cfc cf411cc8
>[   89.165710] 1cc0: c00861d8 c0084a20 00000002 00000000 00000000
>c02b2428 cf410000 20000153
>[   89.173919] 1ce0: cf411cfc c00813a4 cf410000 20000153 cf411d44
>dfba59c0 dfb618cc dfb2b864
>[   89.182159] 1d00: df67e960 ded44078 dfa3f540 ded44078 cf411d2c
>cf411d20 c006f938 c006f790
>[   89.190368] 1d20: cf411d44 cf411d30 bf10c2fc c006f930 dfb2b0c0
>dfb618cc cf411d64 cf411d48
>[   89.198608] 1d40: c02b2458 bf10c2e8 dfb618c0 d1ca1080 dfb6193c
>df67e960 cf411d84 cf411d68
>[   89.206817] 1d60: c02b4ab8 c02b240c 00000000 cf408200 00000010
>df67e960 cf411dc4 cf411d88
>[   89.215057] 1d80: c00ce2e0 c02b4a38 00000000 00000000 cf408200
>df67e960 cf411db4 cf408200
>[   89.223266] 1da0: 00000000 d3bca000 0000000c d3bca008 00000001
>00000000 cf411dd4 cf411dc8
>[   89.231506] 1dc0: c00ce438 c00ce1bc cf411df4 cf411dd8 c00caa74
>c00ce414 00000041 00000000
>[   89.239715] 1de0: d3bca000 0000000c cf411e1c cf411df8 c005f8dc
>c00caa08 d34a1de0 d34a1b00
>[   89.247955] 1e00: d3bca000 00000002 cf411ee0 d349b684 cf411e3c
>cf411e20 c005f984 c005f85c
>[   89.256164] 1e20: 00000002 d34a1b00 cf410000 00000002 cf411e54
>cf411e40 c0060fa8 c005f950
>[   89.264404] 1e40: cf411e5c cf411e50 cf411e74 cf411e58 c00614ec
>c0060de8 c002e204 00000009
>[   89.272613] 1e60: be95167c cf410000 cf411ec4 cf411e78 c006c218
>c0061460 00000400 dfb618f0
>[   89.280853] 1e80: dfb618f0 20000153 cf411ea4 cf411fb0 d349b184
>d34d1680 cf411ebc c002e204
>[   89.289062] 1ea0: cf411fb0 be95167c 00000003 c002e204 cf410000
>00000003 cf411fac cf411ec8
>[   89.297302] 1ec0: c0030430 c006be94 d1ca1080 cf410000 cf411f3c
>00000001 c02b55a8 c0073430
>[   89.305511] 1ee0: 00000009 00000000 00000000 00000000 00000000
>c00732ec cf411ef8 cf411ef8
>[   89.313751] 1f00: c00ccbfc c01e16e0 00000000 dfae9a10 d353cdd4
>cf408200 be951644 cf411f70
>[   89.321960] 1f20: 00000400 00000400 cf410000 00000003 cf411f6c
>cf411f40 c00cd770 c02b54c4
>[   89.330200] 1f40: 00000000 00000000 cf410000 00000000 00000000
>cf408200 00000400 be951644
>[   89.338409] 1f60: cf411fa4 cf411f70 c00cd8e0 c00cd6c4 cf411f8c
>cf411f80 c0082b24 c0082250
>[   89.346649] 1f80: 00000000 00012d98 00012818 be95167c 00000003
>c002e204 cf410000 00000003
>[   89.354858] 1fa0: 00000000 cf411fb0 c002e098 c00303c0 fffffe00
>be951644 00000400 00000001
>[   89.363098] 1fc0: 00012d98 00012818 be95167c 00000003 be951644
>00000003 00000003 be951cd4
>[   89.371307] 1fe0: 00000002 be950a70 0000cfe4 400fbfcc 60000050
>00000003 00000000 00000000
>[   89.379547] Backtrace:
>[   89.382019] [<c006f784>] (__cancel_work_timer+0x0/0x1a0) from
>[<c006f938>] (cancel_delayed_work_sync+0x14/0x18)
>[   89.392181] [<c006f924>] (cancel_delayed_work_sync+0x0/0x18) from
>[<bf10c2fc>] (input_close_polled_device+0x20/0x74 [input_p)
>[   89.404174] [<bf10c2dc>] (input_close_polled_device+0x0/0x74
>[input_polldev]) from [<c02b2458>] (input_close_device+0x58/0x7)
>[   89.415618]  r5:dfb618cc r4:dfb2b0c0
>[   89.419250] [<c02b2400>] (input_close_device+0x0/0x7c) from
>[<c02b4ab8>] (evdev_release+0x8c/0xa4)
>[   89.428253]  r7:df67e960 r6:dfb6193c r5:d1ca1080 r4:dfb618c0
>[   89.433990] [<c02b4a2c>] (evdev_release+0x0/0xa4) from [<c00ce2e0>]
>(__fput+0x130/0x258)
>[   89.442108]  r7:df67e960 r6:00000010 r5:cf408200 r4:00000000
>[   89.447845] [<c00ce1b0>] (__fput+0x0/0x258) from [<c00ce438>]
>(fput+0x30/0x34)
>[   89.455108] [<c00ce408>] (fput+0x0/0x34) from [<c00caa74>]
>(filp_close+0x78/0x84)
>[   89.462646] [<c00ca9fc>] (filp_close+0x0/0x84) from [<c005f8dc>]
>(put_files_struct+0x8c/0xf4)
>[   89.471221]  r7:0000000c r6:d3bca000 r5:00000000 r4:00000041
>[   89.476928] [<c005f850>] (put_files_struct+0x0/0xf4) from
>[<c005f984>] (exit_files+0x40/0x44)
>[   89.485504] [<c005f944>] (exit_files+0x0/0x44) from [<c0060fa8>]
>(do_exit+0x1cc/0x678)
>[   89.493469]  r7:00000002 r6:cf410000 r5:d34a1b00 r4:00000002
>[   89.499206] [<c0060ddc>] (do_exit+0x0/0x678) from [<c00614ec>]
>(do_group_exit+0x98/0xc8)
>[   89.507354] [<c0061454>] (do_group_exit+0x0/0xc8) from [<c006c218>]
>(get_signal_to_deliver+0x390/0x3d8)
>[   89.516784]  r7:cf410000 r6:be95167c r5:00000009 r4:c002e204
>[   89.522521] [<c006be88>] (get_signal_to_deliver+0x0/0x3d8) from
>[<c0030430>] (do_notify_resume+0x7c/0x594)
>[   89.532226] [<c00303b4>] (do_notify_resume+0x0/0x594) from
>[<c002e098>] (work_pending+0x1c/0x20)
>[   89.541076] Code: e5953000 e3d33003 0a000011 e593304c (e5934000)
>[   90.065185] ---[ end trace d28022cc252c43c3 ]---
>[   90.075866] Kernel panic - not syncing: Fatal exception
>
>I added traces to drivers/input/input-polldev.c
>static int input_open_polled_device(struct input_dev *input)
>{
>	struct input_polled_dev *dev = input_get_drvdata(input);
>	int error;
>	struct cpu_workqueue_struct *cwq;
>
>	printk("INPUT_OPEN_POLLED_DEVICE - enter\n");
>	printk("polldev_wq before start_workqueue: %x\n", polldev_wq);
>	if (polldev_wq != NULL)
>		printk("cpu_wq of wq: %x\n", polldev_wq->cpu_wq);
>
>
>	error = input_polldev_start_workqueue();
>	if (error)
>		return error;
>	printk("polldev_wq after start_workqueue: %x\n", polldev_wq);
>	if (polldev_wq != NULL)
>		printk("cpu_wq of wq: %x\n", polldev_wq->cpu_wq);
>
>	if (dev->open)
>		dev->open(dev);
>
>	printk("poll_interval: %x\n", dev->poll_interval );
>	printk("addr of work: %x\n", &dev->work);
>	cwq = atomic_long_read(&dev->work.work.data) &
>WORK_STRUCT_WQ_DATA_MASK;
>	printk("data (==cwq) at work (before queueing): %x\n", cwq);
>	if (cwq != NULL)
>		printk("wq from cwq: %x\n", cwq->wq);
>
>	/* Only start polling if polling is enabled */
>	if (dev->poll_interval > 0)
>		queue_delayed_work(polldev_wq, &dev->work, 0);
>
>	cwq = atomic_long_read(&dev->work.work.data) &
>WORK_STRUCT_WQ_DATA_MASK;
>	printk("data (==cwq) at work (after queueing): %x\n", cwq);
>	if (cwq != NULL)
>		printk("wq from cwq: %x\n", cwq->wq);
>
>	printk("INPUT_OPEN_POLLED_DEVICE - done\n");
>	return 0;
>}
>
>First time device open:
>[   29.094879] INPUT_OPEN_POLLED_DEVICE - enter
>[   29.099182] polldev_wq before start_workqueue: 0
>[   29.154449] polldev_wq after start_workqueue: ded3c580
>[   29.159637] cpu_wq of wq: dc4b8480
>[   29.255950] poll_interval: 32
>[   29.258941] addr of work: dfbe8f20
>[   29.262359] data (==cwq) at work (before queueing): 0
>[   29.267669] data (==cwq) at work (after queueing): dc4b8480  <-------
>CPU workqueue same as polldev_wq
>[   29.273315] wq from cwq: ded3c580
>[   29.276641] INPUT_OPEN_POLLED_DEVICE - done
>
>there were other devices opened and closed between
>
>Second time device open:
>[  202.372680] INPUT_OPEN_POLLED_DEVICE - enter
>[  202.377258] polldev_wq before start_workqueue: dc57ba00
>[  202.382598] cpu_wq of wq: dc4ea900
>[  202.386077] polldev_wq after start_workqueue: dc57ba00
>[  202.391326] cpu_wq of wq: dc4ea900
>[  202.459259] poll_interval: 0  <-------------------------------- no
>queueing because of this
>[  202.462188] addr of work: dfbe8f20
>[  202.465637] data (==cwq) at work (before queueing): dc4b8480  <------
>----- CPU workqueue not updated.
>[  202.471435] wq from cwq: 6b6b006f
>[  202.474853] data (==cwq) at work (after queueing): dc4b8480 <-----
>queueing not done -> not updated
>[  202.480468] wq from cwq: 6b6b006f <-------------------------- crap
>[  202.483886] INPUT_OPEN_POLLED_DEVICE - done
>
>And when cancel_delayed_work_sync is called, kernel crashes due to crap
>address to per-cpu workqueue.
>
>Actually problem is that "data" field in the work struct points to the
>non-existing per-cpu workqueue entry.
>When this is cancelled at device close, kernel crashes. But what is the
>root cause? In input-polldev,
>workqueue can change from time to time depending on what happens at the
>userspace. Work struct
>can still contain references to the old workqueue. Is that illegal thing
>to do?
>Or should cancellation of the work cleanup the work struct so that it is
>in initial state.
>
>It is said in the workqueue.c:
>* The caller must ensure that workqueue_struct on which this work was
>last
> * queued can't be destroyed before this function returns.
>
>This is not true here. Workqueue has been destroyed since the work has
>never queued to the new workqueue.
>Either cancel_(delayed)_work_sync should clear the data field instead of
>setting it to non-pending or
>input-polldev must clear the work struct in case of no queueing. Or do
>you have other proposals?
>

One solution is not to destroy and recreate workqueue in input-polldev
based on open / close calls. This way references to workqueue stays valid.

-Samu


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-17  8:56         ` samu.p.onkalo
@ 2010-02-17  9:47           ` Dmitry Torokhov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Torokhov @ 2010-02-17  9:47 UTC (permalink / raw)
  To: samu.p.onkalo; +Cc: oleg, linux-input

Hi Samu,

On Wed, Feb 17, 2010 at 09:56:33AM +0100, samu.p.onkalo@nokia.com wrote:
> 
> >
> >Second time device open:
> >[  202.372680] INPUT_OPEN_POLLED_DEVICE - enter
> >[  202.377258] polldev_wq before start_workqueue: dc57ba00
> >[  202.382598] cpu_wq of wq: dc4ea900
> >[  202.386077] polldev_wq after start_workqueue: dc57ba00
> >[  202.391326] cpu_wq of wq: dc4ea900
> >[  202.459259] poll_interval: 0  <-------------------------------- no
> >queueing because of this
> >[  202.462188] addr of work: dfbe8f20
> >[  202.465637] data (==cwq) at work (before queueing): dc4b8480  <------
> >----- CPU workqueue not updated.
> >[  202.471435] wq from cwq: 6b6b006f
> >[  202.474853] data (==cwq) at work (after queueing): dc4b8480 <-----
> >queueing not done -> not updated
> >[  202.480468] wq from cwq: 6b6b006f <-------------------------- crap
> >[  202.483886] INPUT_OPEN_POLLED_DEVICE - done
> >
> >And when cancel_delayed_work_sync is called, kernel crashes due to crap
> >address to per-cpu workqueue.
> >
> >Actually problem is that "data" field in the work struct points to the
> >non-existing per-cpu workqueue entry.
> >When this is cancelled at device close, kernel crashes. But what is the
> >root cause? In input-polldev,
> >workqueue can change from time to time depending on what happens at the
> >userspace. Work struct
> >can still contain references to the old workqueue. Is that illegal thing
> >to do?
> >Or should cancellation of the work cleanup the work struct so that it is
> >in initial state.
> >
> >It is said in the workqueue.c:
> >* The caller must ensure that workqueue_struct on which this work was
> >last
> > * queued can't be destroyed before this function returns.
> >
> >This is not true here. Workqueue has been destroyed since the work has
> >never queued to the new workqueue.
> >Either cancel_(delayed)_work_sync should clear the data field instead of
> >setting it to non-pending or
> >input-polldev must clear the work struct in case of no queueing. Or do
> >you have other proposals?
> >
> 
> One solution is not to destroy and recreate workqueue in input-polldev
> based on open / close calls. This way references to workqueue stays valid.
> 

I would really prefer having this fixed in the workqueue core instead of
working around the issue in a driver.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-16 21:43     ` Dmitry Torokhov
  2010-02-17  8:15       ` samu.p.onkalo
@ 2010-02-17 16:28       ` Oleg Nesterov
  1 sibling, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2010-02-17 16:28 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: samu.p.onkalo, linux-input

On 02/16, Dmitry Torokhov wrote:
>
> On Tue, Feb 16, 2010 at 07:37:49PM +0100, samu.p.onkalo@nokia.com wrote:
> >
> > Queue_delayed_work updates the work struct. Workqueue itself is ok.
> >
> > I think that the sequence goes about this way (no other polled devices open):
> > 1. Polled device is opened with polling enabled
> > 2. It first creates workqueue and then queue the first polling. Kernels
> > Workqueue functions updates current workqueue information to the work-struct
> > 3. polled device is closed
> > 4. workqueue is destroyed
> >
> > 5. polling interval is set to 0
> > 6. device is reopened
> > 7. New workqueue is created
> > 8. polled device is closed without queueing a work
> > 9. work struct for polled device contains pointer to the old (created in 2.) wq
> > 10. cancel_workqueue... can access unallocated memory causing crash.
> >
>
> Ah, I see. In this case I think it should be fixed in workqueue code by
> clearing work data so it does not point to the [potentially] non-existing
> workqueue when we cancel

I think yes, cancel_() can clear ->data. Instead of work_clear_pending()
__cancel_work_timer() should do something like
"work->data &= ~WORK_STRUCT_STATIC", but we should somehow do this
atomically, to avoid the race with test_and_set_bit(PENDING).

> or complete work.

No, run_workqueue() can't do this.

First of all, we shouldn't clear work->data before work->func() returns,
and when it returns we must not use this pointer: the work can be already
freed/reused.

But even if we could do this, it would be wrong. This can break flush
or cancel, the same work can be queued/running again (on the same or
other CPUs).

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-17  8:15       ` samu.p.onkalo
  2010-02-17  8:56         ` samu.p.onkalo
@ 2010-02-17 17:03         ` Oleg Nesterov
  2010-02-17 19:50           ` Dmitry Torokhov
  2010-02-18  6:46           ` samu.p.onkalo
  1 sibling, 2 replies; 14+ messages in thread
From: Oleg Nesterov @ 2010-02-17 17:03 UTC (permalink / raw)
  To: samu.p.onkalo; +Cc: dmitry.torokhov, linux-input

On 02/17, samu.p.onkalo@nokia.com wrote:
>
> First time device open:
> [   29.094879] INPUT_OPEN_POLLED_DEVICE - enter
> [   29.099182] polldev_wq before start_workqueue: 0
> [   29.154449] polldev_wq after start_workqueue: ded3c580
> [   29.159637] cpu_wq of wq: dc4b8480
> [   29.255950] poll_interval: 32
> [   29.258941] addr of work: dfbe8f20
> [   29.262359] data (==cwq) at work (before queueing): 0
> [   29.267669] data (==cwq) at work (after queueing): dc4b8480  <------- CPU workqueue same as polldev_wq
> [   29.273315] wq from cwq: ded3c580
> [   29.276641] INPUT_OPEN_POLLED_DEVICE - done
>
> there were other devices opened and closed between
>
> Second time device open:
> [  202.372680] INPUT_OPEN_POLLED_DEVICE - enter
> [  202.377258] polldev_wq before start_workqueue: dc57ba00
> [  202.382598] cpu_wq of wq: dc4ea900
> [  202.386077] polldev_wq after start_workqueue: dc57ba00
> [  202.391326] cpu_wq of wq: dc4ea900
> [  202.459259] poll_interval: 0  <-------------------------------- no queueing because of this
> [  202.462188] addr of work: dfbe8f20
> [  202.465637] data (==cwq) at work (before queueing): dc4b8480  <----------- CPU workqueue not updated.
> [  202.471435] wq from cwq: 6b6b006f
> [  202.474853] data (==cwq) at work (after queueing): dc4b8480 <----- queueing not done -> not updated
> [  202.480468] wq from cwq: 6b6b006f <-------------------------- crap
> [  202.483886] INPUT_OPEN_POLLED_DEVICE - done
>
> And when cancel_delayed_work_sync is called, kernel crashes due to crap address to per-cpu workqueue.
>
> Actually problem is that "data" field in the work struct points to the non-existing per-cpu workqueue entry.
> When this is cancelled at device close, kernel crashes. But what is the root cause? In input-polldev,
> workqueue can change from time to time depending on what happens at the userspace. Work struct
> can still contain references to the old workqueue. Is that illegal thing to do?

Yes, it is illegal. Well, it is OK to do queue(), but not cancel/flush.

> It is said in the workqueue.c:
> * The caller must ensure that workqueue_struct on which this work was last
>  * queued can't be destroyed before this function returns.
>
> This is not true here. Workqueue has been destroyed since the work has
> never queued to the new workqueue.

Not sure I understand... The comment says that workqueue_struct must
stay valid until cancel() returns, of course this also means it must
be valid when cancel() is called.

> Either cancel_(delayed)_work_sync should clear the data field

I am a bit confused. Yes I think this is possible, but Dmitry thinks
we should clear ->data when the work completes... See another email in
this thread.

> instead of setting it to non-pending or
> input-polldev must clear the work struct in case of no queueing.

Or the caller of cancel can clear ->data?

Of course, I don't understand input-polldev.c, but shouldn't the trivial
patch below fix the problem? Although, most likely I do not really
understand what the problem is ;)

Oleg.

the patch assumes INIT can't race with queue, I don't know if this
is true.

--- drivers/input/input-polldev.c
+++ drivers/input/input-polldev.c
@@ -100,6 +100,7 @@ static void input_close_polled_device(st
 	struct input_polled_dev *dev = input_get_drvdata(input);
 
 	cancel_delayed_work_sync(&dev->work);
+	INIT_DELAYED_WORK(&dev->work, dev->work->func);
 	input_polldev_stop_workqueue();
 
 	if (dev->close)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-17 17:03         ` Oleg Nesterov
@ 2010-02-17 19:50           ` Dmitry Torokhov
  2010-02-17 20:23             ` Oleg Nesterov
  2010-02-18  6:46           ` samu.p.onkalo
  1 sibling, 1 reply; 14+ messages in thread
From: Dmitry Torokhov @ 2010-02-17 19:50 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: samu.p.onkalo, linux-input

On Wed, Feb 17, 2010 at 06:03:15PM +0100, Oleg Nesterov wrote:
> On 02/17, samu.p.onkalo@nokia.com wrote:
> >
> > First time device open:
> > [   29.094879] INPUT_OPEN_POLLED_DEVICE - enter
> > [   29.099182] polldev_wq before start_workqueue: 0
> > [   29.154449] polldev_wq after start_workqueue: ded3c580
> > [   29.159637] cpu_wq of wq: dc4b8480
> > [   29.255950] poll_interval: 32
> > [   29.258941] addr of work: dfbe8f20
> > [   29.262359] data (==cwq) at work (before queueing): 0
> > [   29.267669] data (==cwq) at work (after queueing): dc4b8480  <------- CPU workqueue same as polldev_wq
> > [   29.273315] wq from cwq: ded3c580
> > [   29.276641] INPUT_OPEN_POLLED_DEVICE - done
> >
> > there were other devices opened and closed between
> >
> > Second time device open:
> > [  202.372680] INPUT_OPEN_POLLED_DEVICE - enter
> > [  202.377258] polldev_wq before start_workqueue: dc57ba00
> > [  202.382598] cpu_wq of wq: dc4ea900
> > [  202.386077] polldev_wq after start_workqueue: dc57ba00
> > [  202.391326] cpu_wq of wq: dc4ea900
> > [  202.459259] poll_interval: 0  <-------------------------------- no queueing because of this
> > [  202.462188] addr of work: dfbe8f20
> > [  202.465637] data (==cwq) at work (before queueing): dc4b8480  <----------- CPU workqueue not updated.
> > [  202.471435] wq from cwq: 6b6b006f
> > [  202.474853] data (==cwq) at work (after queueing): dc4b8480 <----- queueing not done -> not updated
> > [  202.480468] wq from cwq: 6b6b006f <-------------------------- crap
> > [  202.483886] INPUT_OPEN_POLLED_DEVICE - done
> >
> > And when cancel_delayed_work_sync is called, kernel crashes due to crap address to per-cpu workqueue.
> >
> > Actually problem is that "data" field in the work struct points to the non-existing per-cpu workqueue entry.
> > When this is cancelled at device close, kernel crashes. But what is the root cause? In input-polldev,
> > workqueue can change from time to time depending on what happens at the userspace. Work struct
> > can still contain references to the old workqueue. Is that illegal thing to do?
> 
> Yes, it is illegal. Well, it is OK to do queue(), but not cancel/flush.
> 
> > It is said in the workqueue.c:
> > * The caller must ensure that workqueue_struct on which this work was last
> >  * queued can't be destroyed before this function returns.
> >
> > This is not true here. Workqueue has been destroyed since the work has
> > never queued to the new workqueue.
> 
> Not sure I understand... The comment says that workqueue_struct must
> stay valid until cancel() returns, of course this also means it must
> be valid when cancel() is called.
>

It apppears that it is allowed to try to cancel work that has never been
queued and I believe that canceled or completed work should be exactly
the same as never been queued work (which is apparently not the case
currently).

Obviously if work has never been queued there is no workqueue struct so
it can't possibly be valid.
 
> > Either cancel_(delayed)_work_sync should clear the data field
> 
> I am a bit confused. Yes I think this is possible, but Dmitry thinks
> we should clear ->data when the work completes... See another email in
> this thread.
> 
> > instead of setting it to non-pending or
> > input-polldev must clear the work struct in case of no queueing.
> 
> Or the caller of cancel can clear ->data?
> 
> Of course, I don't understand input-polldev.c, but shouldn't the trivial
> patch below fix the problem? Although, most likely I do not really
> understand what the problem is ;)
>

Yes, it is certainly possible to work around the issue in every driver
that may happen to shut down and re-create workqueue as needed. The
question is whether it is the right thing to do.

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-17 19:50           ` Dmitry Torokhov
@ 2010-02-17 20:23             ` Oleg Nesterov
  2010-02-17 20:54               ` Dmitry Torokhov
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2010-02-17 20:23 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: samu.p.onkalo, linux-input

On 02/17, Dmitry Torokhov wrote:
>
> It apppears that it is allowed to try to cancel work that has never been
> queued

Yes,

> and I believe that canceled or completed work should be exactly
> the same as never been queued work (which is apparently not the case
> currently).

And yes, currently this is not the case.

As I said, I agree that cancel() could clear ->data. Will this change
help? (in any case this change is not for 2.6.33)

But I don't see how "completed" can do this, please see my previous
email. Note that flush() can't do this too.

> Yes, it is certainly possible to work around the issue in every driver
> that may happen to shut down and re-create workqueue as needed. The
> question is whether it is the right thing to do.

I'd say, the question is whether we can improve this ;) Well, see above.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-17 20:23             ` Oleg Nesterov
@ 2010-02-17 20:54               ` Dmitry Torokhov
  2010-02-19 12:15                 ` Oleg Nesterov
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Torokhov @ 2010-02-17 20:54 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: samu.p.onkalo@nokia.com, linux-input@vger.kernel.org

On Feb 17, 2010, at 12:23 PM, Oleg Nesterov <oleg@redhat.com> wrote:

> On 02/17, Dmitry Torokhov wrote:
>>
>> It apppears that it is allowed to try to cancel work that has never  
>> been
>> queued
>
> Yes,
>
>> and I believe that canceled or completed work should be exactly
>> the same as never been queued work (which is apparently not the case
>> currently).
>
> And yes, currently this is not the case.
>
> As I said, I agree that cancel() could clear ->data. Will this change
> help? (in any case this change is not for 2.6.33)
>
> But I don't see how "completed" can do this, please see my previous
> email. Note that flush() can't do this too.
>
>> Yes, it is certainly possible to work around the issue in every  
>> driver
>> that may happen to shut down and re-create workqueue as needed. The
>> question is whether it is the right thing to do.
>
> I'd say, the question is whether we can improve this ;) Well, see  
> above.
>

I understand. I think if we just change cancel to do the right thing  
that should be enough to avoid surprises.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-17 17:03         ` Oleg Nesterov
  2010-02-17 19:50           ` Dmitry Torokhov
@ 2010-02-18  6:46           ` samu.p.onkalo
  1 sibling, 0 replies; 14+ messages in thread
From: samu.p.onkalo @ 2010-02-18  6:46 UTC (permalink / raw)
  To: oleg; +Cc: dmitry.torokhov, linux-input

>Of course, I don't understand input-polldev.c, but shouldn't the trivial
>patch below fix the problem? Although, most likely I do not really
>understand what the problem is ;)
>
>Oleg.
>
>the patch assumes INIT can't race with queue, I don't know if this
>is true.
>
>--- drivers/input/input-polldev.c
>+++ drivers/input/input-polldev.c
>@@ -100,6 +100,7 @@ static void input_close_polled_device(st
> 	struct input_polled_dev *dev = input_get_drvdata(input);
>
> 	cancel_delayed_work_sync(&dev->work);
>+	INIT_DELAYED_WORK(&dev->work, dev->work->func);
> 	input_polldev_stop_workqueue();
>
> 	if (dev->close)

Yes that should fix the problem. This way references to the old workqueue are cleaned up.
I proposed similar thing but to the opening phase. Both should work. 

I'll test this. 

-Samu


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] input: polldev can cause crash in case of polling disabled
  2010-02-17 20:54               ` Dmitry Torokhov
@ 2010-02-19 12:15                 ` Oleg Nesterov
  0 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2010-02-19 12:15 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: samu.p.onkalo@nokia.com, linux-input@vger.kernel.org

On 02/17, Dmitry Torokhov wrote:
>
> I understand. I think if we just change cancel to do the right thing
> that should be enough to avoid surprises.

OK. I'll send the patch but let me re-check this code first, I already
forgot some details.

Anyway, I think in the short term the patch from Samu (which changed
input_close_polled_device to re-initialize dwork) is the best option
for now.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-02-19 12:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-16 14:44 [PATCH] input: polldev can cause crash in case of polling disabled Samu Onkalo
2010-02-16 17:50 ` Dmitry Torokhov
2010-02-16 18:37   ` samu.p.onkalo
2010-02-16 21:43     ` Dmitry Torokhov
2010-02-17  8:15       ` samu.p.onkalo
2010-02-17  8:56         ` samu.p.onkalo
2010-02-17  9:47           ` Dmitry Torokhov
2010-02-17 17:03         ` Oleg Nesterov
2010-02-17 19:50           ` Dmitry Torokhov
2010-02-17 20:23             ` Oleg Nesterov
2010-02-17 20:54               ` Dmitry Torokhov
2010-02-19 12:15                 ` Oleg Nesterov
2010-02-18  6:46           ` samu.p.onkalo
2010-02-17 16:28       ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).