* race between flush_to_ldisc and pty_cleanup [not found] <CGME20190201133326epcms5p506416bc4ae22f600ee705f146ca1a599@epcms5p5> @ 2019-02-01 13:33 ` Maninder Singh 2019-02-01 14:21 ` gregkh 2019-02-01 14:26 ` gregkh 0 siblings, 2 replies; 5+ messages in thread From: Maninder Singh @ 2019-02-01 13:33 UTC (permalink / raw) To: peter@hurleysoftware.com, gregkh@linuxfoundation.org, jslaby@suse.com, keun-o.park@darkmatter.ae Cc: linux-kernel@vger.kernel.org, AMIT SAHRAWAT, Vaneet Narang, Rohit Thapliyal, Ayush Mittal Hi, There is some race condition between tty_port_put and flush_to_ldisc which lead to use after free case: (Kernel 4.1) [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83 ... ... ... [1403.5132] [<c036a65c>] (ldsem_down_read_trylock) from [<c0367db4>] (tty_ldisc_ref+0x24/0x60) [1403.5132] [<c0367d90>] (tty_ldisc_ref) from [<c03690cc>] (flush_to_ldisc+0x6c/0x21c) [1403.5132] r5:dbcd4a84 r4:00000000 [1403.5132] [<c0369060>] (flush_to_ldisc) from [<c0049f18>] (process_one_work+0x214/0x570) [1403.5132] r10:00000000 r9:ddab0000 r8:e3d6e000 r7:00000000 r6:e453f740 r5:cb37b780 [1403.5132] r4:dbcd4a84 [1403.5132] [<c0049d04>] (process_one_work) from [<c004a2d4>] (worker_thread+0x60/0x580) [1403.5132] r10:e453f740 r9:ddab0000 r8:e453f764 r7:00000088 r6:e453f740 r5:cb37b798 [1403.5132] r4:cb37b780 [1403.5132] [<c004a274>] (worker_thread) from [<c005022c>] (kthread+0xec/0x104) [1403.5132] r10:00000000 r9:00000000 r8:00000000 r7:c004a274 r6:cb37b780 r5:d8a3fc80 [1403.5132] r4:00000000 [1403.5132] [<c0050140>] (kthread) from [<c00107d8>] (ret_from_fork+0x14/0x3c) for checking further we entered some debug prints and added delay in flush_to_ldisc to reproduce and seems there is some issue with workqueue implementation of TTY: bool tty_buffer_cancel_work(struct tty_port *port) { bool ret; ret = cancel_work_sync(&port->buf.work); // Check return value of cancel_work_sync pr_emerg("Work cancelled is 0x%x %pS %d\n", (unsigned int)&port->buf.work, (void *)_RET_IP_, ret); return ret; } static void flush_to_ldisc(struct work_struct *work) { ... mdelay(100); // Added Delay to reproduce race if (flag_work_cancel) { pr_emerg("scheduled work after stopping work %x\n", (unsigned int)work); .... } static void pty_cleanup(struct tty_struct *tty) { ... flag_work_cancel = 1; ... } [1403.4158]Work cancelled is dbcd4a84 tty_port_destroy+0x1c/0x6c 0 // Since return is 0 so no work is pending [1403.5129] scheduled work after stopping work dbcd4a84 // Still same work is scheduled after cancelled [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83 // Kernel OOPs occured because of use after free Using JTAG we fetched free and allocation BT for pty port is: Free Path: pty_cleanup release_one_tty process_one_work worker_thread kthread ret_from_fork Allocation Path: pty_unix98_install tty_init_dev ptmx_open chrdev_open do_dentry_open vfs_open do_last.isra.10 we already applied below patches: https://lore.kernel.org/patchwork/patch/862594/ But seems it is different case. Can you suggest any possible fix for this? Thanks, Maninder Singh ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: race between flush_to_ldisc and pty_cleanup 2019-02-01 13:33 ` race between flush_to_ldisc and pty_cleanup Maninder Singh @ 2019-02-01 14:21 ` gregkh 2019-02-01 14:26 ` gregkh 1 sibling, 0 replies; 5+ messages in thread From: gregkh @ 2019-02-01 14:21 UTC (permalink / raw) To: Maninder Singh Cc: peter@hurleysoftware.com, jslaby@suse.com, keun-o.park@darkmatter.ae, linux-kernel@vger.kernel.org, AMIT SAHRAWAT, Vaneet Narang, Rohit Thapliyal, Ayush Mittal On Fri, Feb 01, 2019 at 07:03:26PM +0530, Maninder Singh wrote: > Hi, > > > There is some race condition between tty_port_put and flush_to_ldisc > which lead to use after free case: > (Kernel 4.1) Wait, 4.1 is _REALLY_ old and obsolete and broken in numerous known places. Can you duplicate this on 4.20, or better yet, 5.0-rc4? Lots of tty and ldisc fixes has gone into the tree in the past 3 1/2 years since that kernel was released. thanks, greg k-h ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: race between flush_to_ldisc and pty_cleanup 2019-02-01 13:33 ` race between flush_to_ldisc and pty_cleanup Maninder Singh 2019-02-01 14:21 ` gregkh @ 2019-02-01 14:26 ` gregkh 2019-02-01 14:37 ` Maninder Singh 1 sibling, 1 reply; 5+ messages in thread From: gregkh @ 2019-02-01 14:26 UTC (permalink / raw) To: Maninder Singh Cc: peter@hurleysoftware.com, jslaby@suse.com, keun-o.park@darkmatter.ae, linux-kernel@vger.kernel.org, AMIT SAHRAWAT, Vaneet Narang, Rohit Thapliyal, Ayush Mittal On Fri, Feb 01, 2019 at 07:03:26PM +0530, Maninder Singh wrote: > Hi, > > > There is some race condition between tty_port_put and flush_to_ldisc > which lead to use after free case: > (Kernel 4.1) > > [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83 > ... > ... > ... > > [1403.5132] [<c036a65c>] (ldsem_down_read_trylock) from [<c0367db4>] (tty_ldisc_ref+0x24/0x60) > [1403.5132] [<c0367d90>] (tty_ldisc_ref) from [<c03690cc>] (flush_to_ldisc+0x6c/0x21c) > [1403.5132] r5:dbcd4a84 r4:00000000 > [1403.5132] [<c0369060>] (flush_to_ldisc) from [<c0049f18>] (process_one_work+0x214/0x570) > [1403.5132] r10:00000000 r9:ddab0000 r8:e3d6e000 r7:00000000 r6:e453f740 r5:cb37b780 > [1403.5132] r4:dbcd4a84 > [1403.5132] [<c0049d04>] (process_one_work) from [<c004a2d4>] (worker_thread+0x60/0x580) > [1403.5132] r10:e453f740 r9:ddab0000 r8:e453f764 r7:00000088 r6:e453f740 r5:cb37b798 > [1403.5132] r4:cb37b780 > [1403.5132] [<c004a274>] (worker_thread) from [<c005022c>] (kthread+0xec/0x104) > [1403.5132] r10:00000000 r9:00000000 r8:00000000 r7:c004a274 r6:cb37b780 r5:d8a3fc80 > [1403.5132] r4:00000000 > [1403.5132] [<c0050140>] (kthread) from [<c00107d8>] (ret_from_fork+0x14/0x3c) > > > for checking further we entered some debug prints and added delay in flush_to_ldisc to reproduce > and seems there is some issue with workqueue implementation of TTY: > > bool tty_buffer_cancel_work(struct tty_port *port) > { > bool ret; > ret = cancel_work_sync(&port->buf.work); // Check return value of cancel_work_sync > pr_emerg("Work cancelled is 0x%x %pS %d\n", (unsigned int)&port->buf.work, (void *)_RET_IP_, ret); > return ret; > } > > static void flush_to_ldisc(struct work_struct *work) > { > ... > mdelay(100); // Added Delay to reproduce race > > if (flag_work_cancel) { > pr_emerg("scheduled work after stopping work %x\n", (unsigned int)work); > > .... > } > > static void pty_cleanup(struct tty_struct *tty) > { > ... > flag_work_cancel = 1; > ... > } > > > [1403.4158]Work cancelled is dbcd4a84 tty_port_destroy+0x1c/0x6c 0 // Since return is 0 so no work is pending > > [1403.5129] scheduled work after stopping work dbcd4a84 // Still same work is scheduled after cancelled > [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83 // Kernel OOPs occured because of use after free Ok, after my initial "use a newer kernel" comment, this really does look strange. There has also been a lot of workqueue fixes and rework since 4.1, and that might be the thing that fixes this issue here. However, are you sure you are not just calling flush_to_ldisc() directly through some codepath somehow? If you look at the stack in the pr_emerg() message, where did it come from? From the same workqueue that you already stopped? Testing on a newer kernel would be great, if possible. thanks, greg k-h ^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: race between flush_to_ldisc and pty_cleanup 2019-02-01 14:26 ` gregkh @ 2019-02-01 14:37 ` Maninder Singh 2019-02-03 5:54 ` Keun-O Park 0 siblings, 1 reply; 5+ messages in thread From: Maninder Singh @ 2019-02-01 14:37 UTC (permalink / raw) To: gregkh@linuxfoundation.org, peter@hurleysoftware.com, jslaby@suse.com, keun-o.park@darkmatter.ae Cc: linux-kernel@vger.kernel.org, AMIT SAHRAWAT, Vaneet Narang, Rohit Thapliyal, Ayush Mittal Hi, >On Fri, Feb 01, 2019 at 07:03:26PM +0530, Maninder Singh wrote: >> Hi, >> >> >> There is some race condition between tty_port_put and flush_to_ldisc >> which lead to use after free case: >> (Kernel 4.1) >> >> [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83 >> ... >> ... >> ... >> >> [1403.5132] [<c036a65c>] (ldsem_down_read_trylock) from [<c0367db4>] (tty_ldisc_ref+0x24/0x60) >> [1403.5132] [<c0367d90>] (tty_ldisc_ref) from [<c03690cc>] (flush_to_ldisc+0x6c/0x21c) >> [1403.5132] r5:dbcd4a84 r4:00000000 >> [1403.5132] [<c0369060>] (flush_to_ldisc) from [<c0049f18>] (process_one_work+0x214/0x570) >> [1403.5132] r10:00000000 r9:ddab0000 r8:e3d6e000 r7:00000000 r6:e453f740 r5:cb37b780 >> [1403.5132] r4:dbcd4a84 >> [1403.5132] [<c0049d04>] (process_one_work) from [<c004a2d4>] (worker_thread+0x60/0x580) >> [1403.5132] r10:e453f740 r9:ddab0000 r8:e453f764 r7:00000088 r6:e453f740 r5:cb37b798 >> [1403.5132] r4:cb37b780 >> [1403.5132] [<c004a274>] (worker_thread) from [<c005022c>] (kthread+0xec/0x104) >> [1403.5132] r10:00000000 r9:00000000 r8:00000000 r7:c004a274 r6:cb37b780 r5:d8a3fc80 >> [1403.5132] r4:00000000 >> [1403.5132] [<c0050140>] (kthread) from [<c00107d8>] (ret_from_fork+0x14/0x3c) >> >> >> for checking further we entered some debug prints and added delay in flush_to_ldisc to reproduce >> and seems there is some issue with workqueue implementation of TTY: >> >> bool tty_buffer_cancel_work(struct tty_port *port) >> { >> bool ret; >> ret = cancel_work_sync(&port->buf.work); // Check return value of cancel_work_sync >> pr_emerg("Work cancelled is 0x%x %pS %d\n", (unsigned int)&port->buf.work, (void *)_RET_IP_, ret); >> return ret; >> } >> >> static void flush_to_ldisc(struct work_struct *work) >> { >> ... >> mdelay(100); // Added Delay to reproduce race >> >> if (flag_work_cancel) { >> pr_emerg("scheduled work after stopping work %x\n", (unsigned int)work); >> >> .... >> } >> >> static void pty_cleanup(struct tty_struct *tty) >> { >> ... >> flag_work_cancel = 1; >> ... >> } >> >> >> [1403.4158]Work cancelled is dbcd4a84 tty_port_destroy+0x1c/0x6c 0 // Since return is 0 so no work is pending >> >> [1403.5129] scheduled work after stopping work dbcd4a84 // Still same work is scheduled after cancelled >> [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83 // Kernel OOPs occured because of use after free > >Ok, after my initial "use a newer kernel" comment, this really does look >strange. There has also been a lot of workqueue fixes and rework since >4.1, and that might be the thing that fixes this issue here. > >However, are you sure you are not just calling flush_to_ldisc() directly >through some codepath somehow? If you look at the stack in the Yes, there is no call path for flush_to_disc directly. It is all aligned with kernel 4.1. >pr_emerg() message, where did it come from? From the same workqueue >that you already stopped? We added debug prints to check "work" in pty_cleanup() & flush_to_ldisc > >Testing on a newer kernel would be great, if possible. We are facing it hard, but currently we have 4.1 and able to reproduce on that. Not really possible to have the latest kernel on the same target and may be reproduce the same race. Tried to track for changes in the other stable branches, but no change looks really relevant for this race. I might be wrong, please help if there is any commit related with this. > >thanks, > >greg k-h Our initial debugging direction was with "tty" but looks some issue in workqueue. Also, the most surprising looks to be the CANCEL and FLUSH to be occuring on the same CORE in sequence. So, if the CANCEL really worked how can the flush_to_ldisc be scheduled. Thanks, Maninder Singh ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: race between flush_to_ldisc and pty_cleanup 2019-02-01 14:37 ` Maninder Singh @ 2019-02-03 5:54 ` Keun-O Park 0 siblings, 0 replies; 5+ messages in thread From: Keun-O Park @ 2019-02-03 5:54 UTC (permalink / raw) To: maninder1.s@samsung.com, gregkh@linuxfoundation.org, peter@hurleysoftware.com, jslaby@suse.com Cc: linux-kernel@vger.kernel.org, AMIT SAHRAWAT, Vaneet Narang, Rohit Thapliyal, Ayush Mittal [-- Attachment #1: Type: text/plain, Size: 5024 bytes --] Hi, 02/01/2019 07:31 PM에 Maninder Singh 이(가) 쓴 글: > Hi, > >> On Fri, Feb 01, 2019 at 07:03:26PM +0530, Maninder Singh wrote: >>> Hi, >>> >>> >>> There is some race condition between tty_port_put and flush_to_ldisc >>> which lead to use after free case: >>> (Kernel 4.1) >>> >>> [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83 >>> ... >>> ... >>> ... >>> >>> [1403.5132] [<c036a65c>] (ldsem_down_read_trylock) from [<c0367db4>] (tty_ldisc_ref+0x24/0x60) >>> [1403.5132] [<c0367d90>] (tty_ldisc_ref) from [<c03690cc>] (flush_to_ldisc+0x6c/0x21c) >>> [1403.5132] r5:dbcd4a84 r4:00000000 >>> [1403.5132] [<c0369060>] (flush_to_ldisc) from [<c0049f18>] (process_one_work+0x214/0x570) >>> [1403.5132] r10:00000000 r9:ddab0000 r8:e3d6e000 r7:00000000 r6:e453f740 r5:cb37b780 >>> [1403.5132] r4:dbcd4a84 >>> [1403.5132] [<c0049d04>] (process_one_work) from [<c004a2d4>] (worker_thread+0x60/0x580) >>> [1403.5132] r10:e453f740 r9:ddab0000 r8:e453f764 r7:00000088 r6:e453f740 r5:cb37b798 >>> [1403.5132] r4:cb37b780 >>> [1403.5132] [<c004a274>] (worker_thread) from [<c005022c>] (kthread+0xec/0x104) >>> [1403.5132] r10:00000000 r9:00000000 r8:00000000 r7:c004a274 r6:cb37b780 r5:d8a3fc80 >>> [1403.5132] r4:00000000 >>> [1403.5132] [<c0050140>] (kthread) from [<c00107d8>] (ret_from_fork+0x14/0x3c) >>> >>> >>> for checking further we entered some debug prints and added delay in flush_to_ldisc to reproduce >>> and seems there is some issue with workqueue implementation of TTY: >>> >>> bool tty_buffer_cancel_work(struct tty_port *port) >>> { >>> bool ret; >>> ret = cancel_work_sync(&port->buf.work); // Check return value of cancel_work_sync >>> pr_emerg("Work cancelled is 0x%x %pS %d\n", (unsigned int)&port->buf.work, (void *)_RET_IP_, ret); >>> return ret; >>> } >>> >>> static void flush_to_ldisc(struct work_struct *work) >>> { >>> ... >>> mdelay(100); // Added Delay to reproduce race >>> >>> if (flag_work_cancel) { >>> pr_emerg("scheduled work after stopping work %x\n", (unsigned int)work); >>> >>> .... >>> } >>> >>> static void pty_cleanup(struct tty_struct *tty) >>> { >>> ... >>> flag_work_cancel = 1; >>> ... >>> } >>> >>> >>> [1403.4158]Work cancelled is dbcd4a84 tty_port_destroy+0x1c/0x6c 0 // Since return is 0 so no work is pending >>> >>> [1403.5129] scheduled work after stopping work dbcd4a84 // Still same work is scheduled after cancelled >>> [1403.5130] Unable to handle kernel paging request at virtual address 6b6b6b83 // Kernel OOPs occured because of use after free >> Ok, after my initial "use a newer kernel" comment, this really does look >> strange. There has also been a lot of workqueue fixes and rework since >> 4.1, and that might be the thing that fixes this issue here. >> >> However, are you sure you are not just calling flush_to_ldisc() directly >> through some codepath somehow? If you look at the stack in the > Yes, there is no call path for flush_to_disc directly. It is all aligned with kernel 4.1. > > >> pr_emerg() message, where did it come from? From the same workqueue >> that you already stopped? > We added debug prints to check "work" in pty_cleanup() & flush_to_ldisc > >> Testing on a newer kernel would be great, if possible. > We are facing it hard, but currently we have 4.1 and able to reproduce on that. > Not really possible to have the latest kernel on the same target and may be reproduce the same race. > Tried to track for changes in the other stable branches, but no change looks really relevant for this race. > I might be wrong, please help if there is any commit related with this. Please find & review the attached patch for this issue. I think the tty which called pty_cleanup may be different from the tty which called flush_to_ldisc. Please check if those two are same or different. I don't think this is a workqueue problem. Thanks. Best Regards, Sahara > >> thanks, >> >> greg k-h > Our initial debugging direction was with "tty" but looks some issue in workqueue. > Also, the most surprising looks to be the CANCEL and FLUSH to be occuring on the same CORE in sequence. > So, if the CANCEL really worked how can the flush_to_ldisc be scheduled. > > Thanks, > Maninder Singh > Keun-O Park | Principal Software Engineer Tel: +971 2 417 1417 | Mob: +971 55 260 7343 Keun-O.Park@darkmatter.ae The information transmitted, including attachments, is intended only for the person(s) or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and destroy any copies of this information. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-tty-pty-Fix-race-condition-between-release_one_tty-a.patch --] [-- Type: text/x-patch; name="0001-tty-pty-Fix-race-condition-between-release_one_tty-a.patch", Size: 2171 bytes --] From 3a65ec01f8849c3e11267794b9d1c70ef32d13e3 Mon Sep 17 00:00:00 2001 From: Sahara <keun-o.park@darkmatter.ae> Date: Sun, 3 Feb 2019 09:27:28 +0400 Subject: [PATCH] tty: pty: Fix race condition between release_one_tty and pty_write Especially when a linked tty is used such as pty, the linked tty port's buf works have not been cancelled while master tty port's buf work has been cancelled. Since release_one_tty and flush_to_ldisc run in workqueue threads separately, when pty_cleanup happens and link tty port is freed, flush_to_ldisc tries to access freed port and port->itty, eventually it causes a panic. This patch utilizes the magic value with holding the tty_mutex to check if the tty->link is valid. Fixes: 2b022ab7542d ("pty: cancel pty slave port buf's work in tty_release") Signed-off-by: Sahara <keun-o.park@darkmatter.ae> --- drivers/tty/pty.c | 7 +++++++ drivers/tty/tty_io.c | 3 +++ 2 files changed, 10 insertions(+) diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c index 00099a84..ef72031 100644 --- a/drivers/tty/pty.c +++ b/drivers/tty/pty.c @@ -116,6 +116,12 @@ static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c) if (tty->stopped) return 0; + mutex_lock(&tty_mutex); + if (to->magic != TTY_MAGIC) { + mutex_unlock(&tty_mutex); + return -EIO; + } + if (c > 0) { spin_lock_irqsave(&to->port->lock, flags); /* Stuff the data into the input queue of the other end */ @@ -125,6 +131,7 @@ static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c) tty_flip_buffer_push(to->port); spin_unlock_irqrestore(&to->port->lock, flags); } + mutex_unlock(&tty_mutex); return c; } diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c index 23c6fd2..6c4a206 100644 --- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -1446,10 +1446,13 @@ static void release_one_tty(struct work_struct *work) struct tty_driver *driver = tty->driver; struct module *owner = driver->owner; + mutex_lock(&tty_mutex); if (tty->ops->cleanup) tty->ops->cleanup(tty); tty->magic = 0; + mutex_unlock(&tty_mutex); + tty_driver_kref_put(driver); module_put(owner); -- 2.7.4 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2019-02-03 6:05 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CGME20190201133326epcms5p506416bc4ae22f600ee705f146ca1a599@epcms5p5>
2019-02-01 13:33 ` race between flush_to_ldisc and pty_cleanup Maninder Singh
2019-02-01 14:21 ` gregkh
2019-02-01 14:26 ` gregkh
2019-02-01 14:37 ` Maninder Singh
2019-02-03 5:54 ` Keun-O Park
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.