* Potential data race in flush_to_ldisc
From: Dmitry Vyukov @ 2015-08-28 16:57 UTC
  To: Greg Kroah-Hartman, Jiri Slaby, LKML, Andrey Konovalov,
	Kostya Serebryany, Alexander Potapenko

Hello,

We are working on a dynamic data race detector for the Linux kernel,
KernelThreadSanitizer (ktsan):
https://github.com/google/ktsan/wiki

While booting kernel (upstream revision 21bdb584af8c) we got a report:

ThreadSanitizer: data-race in release_tty

Write of size 8 by thread T325 (K2579):
 [<ffffffff81655c43>] release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
 [<ffffffff816563a8>] tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
 [<ffffffff8126154f>] __fput+0x15f/0x310 fs/file_table.c:207
 [<ffffffff8126176d>] ____fput+0x1d/0x30 fs/file_table.c:243
 [<ffffffff810b9485>] task_work_run+0x115/0x130 kernel/task_work.c:123 (discriminator 1)
 [<     inlined    >] do_notify_resume+0x73/0x80 tracehook_notify_resume include/linux/tracehook.h:190
 [<ffffffff81006da3>] do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
 [<ffffffff81ee25fc>] int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326

Previous read of size 8 by thread T19 (K16):
 [<ffffffff816624d9>] flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
 [<ffffffff810b1fce>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [<ffffffff810b2530>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [<ffffffff810bbbd0>] kthread+0x150/0x170 kernel/kthread.c:207
 [<ffffffff81ee281f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526


flush_to_ldisc accesses port->itty:

static void flush_to_ldisc(struct work_struct *work)
{
   ...
    tty = port->itty;
    if (tty == NULL)
        return;
    disc = tty_ldisc_ref(tty);

while release_tty concurrently sets itty to NULL:

static void release_tty(struct tty_struct *tty, int idx)
{
    ...
    tty->port->itty = NULL;
    if (tty->link)
        tty->link->port->itty = NULL;
    cancel_work_sync(&tty->port->buf.work);
    tty_kref_put(tty->link);
    tty_kref_put(tty);
}

It seems that the read of port->itty needs to be at least a READ_ONCE,
because otherwise flush_to_ldisc can check that itty is not NULL, then
re-read it and crash with a NULL dereference.
I don't know what the ownership and locking story is here. There may be
a larger issue: either a lock is missing, or itty can be deleted
under flush_to_ldisc's feet.
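
To illustrate the READ_ONCE point, here is a minimal userspace sketch in
C. It is not the kernel code: READ_ONCE_SKETCH, tty_sketch, port_sketch
and flush_sketch are hypothetical names made up for this example, and the
macro only mimics a single volatile load (it relies on the GNU
__typeof__ extension available in GCC and Clang).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's READ_ONCE(): the volatile access
 * makes the compiler emit exactly one load of x. Hypothetical macro. */
#define READ_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-ins, not struct tty_struct / struct tty_port. */
struct tty_sketch { int ldisc_refs; };
struct port_sketch { struct tty_sketch *itty; };

/* Returns 1 if a tty was seen and used, 0 if itty was already NULL. */
static int flush_sketch(struct port_sketch *port)
{
    struct tty_sketch *tty;

    assert(port != NULL);

    /* Load the shared pointer once into a local. Every later use goes
     * through 'tty', so a concurrent store of NULL to port->itty cannot
     * turn a passed NULL check into a NULL dereference. */
    tty = READ_ONCE_SKETCH(port->itty);
    if (tty == NULL)
        return 0;

    tty->ldisc_refs++;
    return 1;
}
```

The sketch only addresses the re-read problem. It says nothing about
whether the object behind the pointer is still alive, which is the
separate ownership question.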

Please confirm that this is a real bug. If so, please fix it.

Thank you


* Re: Potential data race in flush_to_ldisc
From: Greg Kroah-Hartman @ 2015-08-28 18:10 UTC
  To: Dmitry Vyukov
  Cc: Jiri Slaby, LKML, Andrey Konovalov, Kostya Serebryany,
	Alexander Potapenko

On Fri, Aug 28, 2015 at 06:57:17PM +0200, Dmitry Vyukov wrote:
> Hello,
> 
> We are working on a dynamic data race detector for the Linux kernel,
> KernelThreadSanitizer (ktsan):
> https://github.com/google/ktsan/wiki
> 
> While booting kernel (upstream revision 21bdb584af8c) we got a report:
> 
> ThreadSanitizer: data-race in release_tty
> 
> Write of size 8 by thread T325 (K2579):
>  [<ffffffff81655c43>] release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>  [<ffffffff816563a8>] tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>  [<ffffffff8126154f>] __fput+0x15f/0x310 fs/file_table.c:207
>  [<ffffffff8126176d>] ____fput+0x1d/0x30 fs/file_table.c:243
>  [<ffffffff810b9485>] task_work_run+0x115/0x130 kernel/task_work.c:123
> (discriminator 1)
>  [<     inlined    >] do_notify_resume+0x73/0x80
> tracehook_notify_resume include/linux/tracehook.h:190
>  [<ffffffff81006da3>] do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>  [<ffffffff81ee25fc>] int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
> 
> Previous read of size 8 by thread T19 (K16):
>  [<ffffffff816624d9>] flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>  [<ffffffff810b1fce>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>  [<ffffffff810b2530>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>  [<ffffffff810bbbd0>] kthread+0x150/0x170 kernel/kthread.c:207
>  [<ffffffff81ee281f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> 
> 
> flush_to_ldisc accesses port->itty:
> 
> static void flush_to_ldisc(struct work_struct *work)
> {
>    ...
>     tty = port->itty;
>     if (tty == NULL)
>         return;
>     disc = tty_ldisc_ref(tty);
> 
> while release_tty concurrently sets itty to NULL:
> 
> static void release_tty(struct tty_struct *tty, int idx)
> {
>     ...
>     tty->port->itty = NULL;
>     if (tty->link)
>         tty->link->port->itty = NULL;
>     cancel_work_sync(&tty->port->buf.work);
>     tty_kref_put(tty->link);
>     tty_kref_put(tty);
> }
> 
> It seems that read of port->itty requires to be at least READ_ONCE,
> because otherwise flush_to_ldisc can check that itty is not NULL, then
> re-read it again and crash with NULL deref.
> I don't know what is ownership and locking story here. There can be
> larger issue here: either a lock is missing, or itty can be deleted
> under flush_to_ldisc feet.
> 
> Please confirm that this is a real bug. If so, please fix it.

Patches are always gladly accepted.  Don't force us to try to determine
if your tool is finding false-positives or not.  That is your
responsibility, not ours :)

thanks,

greg k-h


* Re: Potential data race in flush_to_ldisc
From: Dmitry Vyukov @ 2015-08-28 18:18 UTC
  To: Greg Kroah-Hartman
  Cc: Jiri Slaby, LKML, Andrey Konovalov, Kostya Serebryany,
	Alexander Potapenko

On Fri, Aug 28, 2015 at 8:10 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Fri, Aug 28, 2015 at 06:57:17PM +0200, Dmitry Vyukov wrote:
>> Hello,
>>
>> We are working on a dynamic data race detector for the Linux kernel,
>> KernelThreadSanitizer (ktsan):
>> https://github.com/google/ktsan/wiki
>>
>> While booting kernel (upstream revision 21bdb584af8c) we got a report:
>>
>> ThreadSanitizer: data-race in release_tty
>>
>> Write of size 8 by thread T325 (K2579):
>>  [<ffffffff81655c43>] release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>>  [<ffffffff816563a8>] tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>>  [<ffffffff8126154f>] __fput+0x15f/0x310 fs/file_table.c:207
>>  [<ffffffff8126176d>] ____fput+0x1d/0x30 fs/file_table.c:243
>>  [<ffffffff810b9485>] task_work_run+0x115/0x130 kernel/task_work.c:123
>> (discriminator 1)
>>  [<     inlined    >] do_notify_resume+0x73/0x80
>> tracehook_notify_resume include/linux/tracehook.h:190
>>  [<ffffffff81006da3>] do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>>  [<ffffffff81ee25fc>] int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
>>
>> Previous read of size 8 by thread T19 (K16):
>>  [<ffffffff816624d9>] flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>>  [<ffffffff810b1fce>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>>  [<ffffffff810b2530>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>>  [<ffffffff810bbbd0>] kthread+0x150/0x170 kernel/kthread.c:207
>>  [<ffffffff81ee281f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
>>
>>
>> flush_to_ldisc accesses port->itty:
>>
>> static void flush_to_ldisc(struct work_struct *work)
>> {
>>    ...
>>     tty = port->itty;
>>     if (tty == NULL)
>>         return;
>>     disc = tty_ldisc_ref(tty);
>>
>> while release_tty concurrently sets itty to NULL:
>>
>> static void release_tty(struct tty_struct *tty, int idx)
>> {
>>     ...
>>     tty->port->itty = NULL;
>>     if (tty->link)
>>         tty->link->port->itty = NULL;
>>     cancel_work_sync(&tty->port->buf.work);
>>     tty_kref_put(tty->link);
>>     tty_kref_put(tty);
>> }
>>
>> It seems that read of port->itty requires to be at least READ_ONCE,
>> because otherwise flush_to_ldisc can check that itty is not NULL, then
>> re-read it again and crash with NULL deref.
>> I don't know what is ownership and locking story here. There can be
>> larger issue here: either a lock is missing, or itty can be deleted
>> under flush_to_ldisc feet.
>>
>> Please confirm that this is a real bug. If so, please fix it.
>
> Patches are always gladly accepted.  Don't force us to try to determine
> if your tool is finding false-positives or not.  That is your
> responsibility, not ours :)


Well, I did my homework by eliminating all known false positives from
the tool and by looking at the code to ensure that the report
makes sense. But I have very little experience with kernel code, so I
cannot be 100% sure that this is a real race. So I am asking the
maintainers to confirm.
Regarding a patch, should I just take tty_mutex in flush_to_ldisc? If
so, should it be taken before buf->lock or after?


Thank you


* Re: Potential data race in flush_to_ldisc
From: Peter Hurley @ 2015-08-28 19:24 UTC
  To: Dmitry Vyukov, Greg Kroah-Hartman
  Cc: Jiri Slaby, LKML, Andrey Konovalov, Kostya Serebryany,
	Alexander Potapenko

On 08/28/2015 12:57 PM, Dmitry Vyukov wrote:
> Hello,
> 
> We are working on a dynamic data race detector for the Linux kernel,
> KernelThreadSanitizer (ktsan):
> https://github.com/google/ktsan/wiki
> 
> While booting kernel (upstream revision 21bdb584af8c) we got a report:
> 
> ThreadSanitizer: data-race in release_tty
> 
> Write of size 8 by thread T325 (K2579):
>  [<ffffffff81655c43>] release_tty+0xf3/0x1c0 drivers/tty/tty_io.c:1688
>  [<ffffffff816563a8>] tty_release+0x698/0x7c0 drivers/tty/tty_io.c:1920
>  [<ffffffff8126154f>] __fput+0x15f/0x310 fs/file_table.c:207
>  [<ffffffff8126176d>] ____fput+0x1d/0x30 fs/file_table.c:243
>  [<ffffffff810b9485>] task_work_run+0x115/0x130 kernel/task_work.c:123
> (discriminator 1)
>  [<     inlined    >] do_notify_resume+0x73/0x80
> tracehook_notify_resume include/linux/tracehook.h:190
>  [<ffffffff81006da3>] do_notify_resume+0x73/0x80 arch/x86/kernel/signal.c:757
>  [<ffffffff81ee25fc>] int_signal+0x12/0x17 arch/x86/entry/entry_64.S:326
> 
> Previous read of size 8 by thread T19 (K16):
>  [<ffffffff816624d9>] flush_to_ldisc+0x29/0x300 drivers/tty/tty_buffer.c:472
>  [<ffffffff810b1fce>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
>  [<ffffffff810b2530>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
>  [<ffffffff810bbbd0>] kthread+0x150/0x170 kernel/kthread.c:207
>  [<ffffffff81ee281f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
> 
> 
> flush_to_ldisc accesses port->itty:
> 
> static void flush_to_ldisc(struct work_struct *work)
> {
>    ...
>     tty = port->itty;
>     if (tty == NULL)
>         return;
>     disc = tty_ldisc_ref(tty);
> 
> while release_tty concurrently sets itty to NULL:
> 
> static void release_tty(struct tty_struct *tty, int idx)
> {
>     ...
>     tty->port->itty = NULL;
>     if (tty->link)
>         tty->link->port->itty = NULL;
>     cancel_work_sync(&tty->port->buf.work);
>     tty_kref_put(tty->link);
>     tty_kref_put(tty);
> }
> 
> It seems that read of port->itty requires to be at least READ_ONCE,

Agree; it should be READ_ONCE.

> because otherwise flush_to_ldisc can check that itty is not NULL, then
> re-read it again and crash with NULL deref.
> I don't know what is ownership and locking story here. There can be
> larger issue here: either a lock is missing, or itty can be deleted
> under flush_to_ldisc feet.
> 
> Please confirm that this is a real bug. If so, please fix it.

Not a race.

The cancel_work_sync() waits for flush_to_ldisc() to complete, if already
running. For example,

CPU 0                                   | CPU 1
                                        |
release_tty()                           | flush_to_ldisc()
                                        |   tty = port->itty;
                                        |   tty == NULL? no
                                        |   ...
  port->itty = NULL                     |
  cancel_work_sync()                    |
    sleep here since flush_to_ldisc()   |
         running on CPU1                |
                                        | worker ends
    woken   <===========================| wake waiters

If flush_to_ldisc() was scheduled but not yet running, it will be cancelled
and not run.

Also, if flush_to_ldisc() is scheduled from some other CPU after cancel_work_sync(),
flush_to_ldisc() is guaranteed to 'see' the NULL port->itty.
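
The three cases above can be walked through with a small single-threaded
model in C. This is only a sketch under simplifying assumptions, not
kernel code: every name in it (work_state, tty_model, port_model,
queue_flush, flush_begin, flush_end, cancel_flush_sync, release_model)
is hypothetical, the worker is split into two explicit steps so the
interleaving can be reproduced deterministically, and "waiting" for a
running worker is modelled by simply running it to completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not kernel APIs. */
enum work_state { WORK_IDLE, WORK_PENDING, WORK_RUNNING };

struct tty_model { bool freed; int flushes; };

struct port_model {
    struct tty_model *itty;   /* shared pointer, cleared on release */
    enum work_state state;    /* state of the flush work item */
    struct tty_model *local;  /* the worker's private copy of itty */
};

/* Queue the work item (simplified: only an idle item can be queued). */
static void queue_flush(struct port_model *p)
{
    if (p->state == WORK_IDLE)
        p->state = WORK_PENDING;
}

/* First half of the worker: start running and load itty exactly once. */
static void flush_begin(struct port_model *p)
{
    assert(p->state == WORK_PENDING);
    p->state = WORK_RUNNING;
    p->local = p->itty;
}

/* Second half of the worker: use only the private copy, then finish. */
static void flush_end(struct port_model *p)
{
    assert(p->state == WORK_RUNNING);
    if (p->local != NULL) {
        assert(!p->local->freed);  /* would be a use-after-free */
        p->local->flushes++;
    }
    p->local = NULL;
    p->state = WORK_IDLE;
}

/* Models cancel_work_sync(): drop a pending item, or wait for a running
 * one. Returns true if the work was pending and got cancelled. */
static bool cancel_flush_sync(struct port_model *p)
{
    if (p->state == WORK_PENDING) {
        p->state = WORK_IDLE;
        return true;
    }
    if (p->state == WORK_RUNNING)
        flush_end(p);  /* stands in for sleeping until the worker ends */
    return false;
}

/* Same ordering as release_tty(): clear itty, sync the work, and only
 * then drop the tty. */
static void release_model(struct port_model *p, struct tty_model *tty)
{
    p->itty = NULL;
    cancel_flush_sync(p);
    tty->freed = true;  /* stands in for the final reference drop */
}
```

In this model, calling flush_begin() before release_model() corresponds
to the diagram: the worker keeps using its private copy, and the tty is
marked freed only after the worker has finished. Calling release_model()
on a merely queued item cancels it, and a flush queued afterwards loads
NULL and does nothing.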

Regards,
Peter Hurley

PS - And what Greg said; analyzing what is and is not a race will rapidly
improve your kernel familiarity.

