From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750972AbcBHLH2 (ORCPT ); Mon, 8 Feb 2016 06:07:28 -0500 Received: from userp1040.oracle.com ([156.151.31.81]:37824 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750788AbcBHLH1 (ORCPT ); Mon, 8 Feb 2016 06:07:27 -0500 Subject: Re: gadgetfs regression (NULL ptr deref) since v4.4-rc7 To: Marek Szyprowski , Ruslan Bilovol References: <56B7D2C9.5070301@oracle.com> <56B86BB9.8050705@samsung.com> Cc: Maxime Ripard , Peter Chen , Felipe Balbi , Greg Kroah-Hartman , LKML , "linux-usb@vger.kernel.org" From: Vegard Nossum Message-ID: <56B876D5.9030805@oracle.com> Date: Mon, 8 Feb 2016 12:07:01 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <56B86BB9.8050705@samsung.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Source-IP: userv0021.oracle.com [156.151.31.71] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (fixed Felipe Balbi e-mail address) On 02/08/2016 11:19 AM, Marek Szyprowski wrote: > Hello, > > On 2016-02-08 10:46, Ruslan Bilovol wrote: >> Hi Vegard, >> >> On Mon, Feb 8, 2016 at 1:27 AM, Vegard Nossum >> wrote: >>> Hi, >>> >>> Using gadgetfs on latest mainline, I get the following NULL pointer >>> dereference: >>> >>> BUG: unable to handle kernel NULL pointer dereference at >>> (null) >>> IP: [] __list_del_entry+0x29/0xc0 >>> PGD 34f067 PUD 355067 PMD 0 >>> Oops: 0000 [#1] DEBUG_PAGEALLOC >>> CPU: 0 PID: 35 Comm: afl-fuzz Not tainted 4.5.0-rc2 #1 >>> task: ffff8800002b1ec0 ti: ffff88000033c000 task.ti: ffff88000033c000 >>> RIP: 0010:[] [] >>> __list_del_entry+0x29/0xc0 >>> RSP: 0018:ffff88000033fe30 EFLAGS: 00010207 >>> RAX: 0000000000000000 RBX: ffffffff8138c108 RCX: dead000000000200 >>> RDX: 0000000000000000 RSI: 0000000000000061 RDI: ffffffff8138c108 >>> RBP: ffff88000033fe30 R08: 0000000000000000 R09: 0000000000000000 >>> R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff8138b928 >>> R13: ffffffff8138c040 R14: ffff88000ec0da20 R15: ffff88000033ff58 >>> FS: 00007ffff7ff2740(0000) GS:ffffffff8135d000(0000) >>> knlGS:0000000000000000 >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> CR2: 0000000000000000 CR3: 0000000000335000 CR4: 00000000001406b0 >>> Stack: >>> ffff88000033fe48 ffffffff81138f2d ffffffff8138bbd0 ffff88000033fe70 >>> ffffffff811c880b ffff8800002f3000 ffff88000ec109a0 ffff880000298aa0 >>> ffff88000033fe88 ffffffff811ce040 ffff88000034cdc0 ffff88000033feb8 >>> Call Trace: >>> [] list_del+0xd/0x30 >>> [] usb_gadget_unregister_driver+0xdb/0xf0 >>> [] dev_release+0x20/0x60 >>> [] __fput+0x4c/0x130 >>> [] ____fput+0x9/0x10 >>> [] task_work_run+0x67/0xa0 >>> [] exit_to_usermode_loop+0x8f/0xa0 >>> [] syscall_return_slowpath+0x3d/0x40 >>> [] int_ret_from_sys_call+0x25/0x8f >>> Code: ff ff 55 48 8b 07 48 b9 00 01 00 00 00 00 ad de 48 8b 57 08 48 >>> 89 e5 >>> 48 39 c8 74 29 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 3a <4c> 8b >>> 02 4c 39 >>> c7 75 52 4c 8b 40 08 4c 39 c7 75 66 48 89 50 08 >>> RIP [] __list_del_entry+0x29/0xc0 >>> RSP >>> CR2: 0000000000000000 >>> ---[ end trace e6cfe1de661dcffe ]--- >>> >>> Reverting this commit fixes the problem for me: >>> >>> commit 855ed04a3758b205e84b269f92d26ab36ed8e2f7 >>> Author: Ruslan Bilovol >>> Date: Mon Nov 23 09:56:38 2015 +0100 >>> >>> usb: gadget: udc-core: independent registration of gadgets and >>> gadget >>> drivers >>> >>> Though I am still seeing some occasional lockups. Thanks, >>> >> Original version of this patch had checking of input parameters >> (see change in usb_gadget_unregister_driver at >> https://lkml.org/lkml/2015/6/22/559) but this approach was >> rejected by Alan Stern many times so final version pushed by >> Marek Szyprowski doesn't have it, and we have this NULL pointer >> dereference. >> But this means there is a bug in GadgetFS that was hidden >> previously: it tries to deregister gadget driver that was not previously >> registered. Previous implementation was returning -ENODEV >> retval, current one just does NULL pointer dereference. >> >> At least need to fix GadgetFS that seems to by buggy in >> gadget driver unregistering, and I still say we need to nave >> input parameters checking in externally visible functions like >> usb_gadget_unregister_driver(). >> >> Meanwhile you can try following patch that returns checking of input >> parameters back and see if it helps. >> >> Best regards, >> Ruslan >> >> -------- >> >> diff --git a/drivers/usb/gadget/udc/udc-core.c >> b/drivers/usb/gadget/udc/udc-core.c >> index fd73a3e..f1d375f 100644 >> --- a/drivers/usb/gadget/udc/udc-core.c >> +++ b/drivers/usb/gadget/udc/udc-core.c >> @@ -597,9 +597,16 @@ int usb_gadget_unregister_driver(struct >> usb_gadget_driver *driver) >> } >> >> if (ret) { >> - list_del(&driver->pending); >> - ret = 0; >> + struct usb_gadget_driver *tmp; >> + >> + list_for_each_entry(tmp, &gadget_driver_pending_list, >> pending) >> + if (tmp == driver) { >> + list_del(&driver->pending); >> + ret = 0; >> + break; >> + } >> } >> + >> mutex_unlock(&udc_lock); >> return ret; >> } > So the patch *seems* to fix the problem I was seeing. However... > Sorry, but this is not the solution. If gadgetfs is broken, then it > should be fixed. There is no point in adding workarounds in core just > because there are drivers that can be broken. Even if it's not a fix per se, why not modify it to throw a warning instead of crashing? If I understand correctly, something like +WARN_ON_ONCE(ret); after the loop in Ruslan's patch should be enough. I added a BUG_ON(ret) (instead of WARN_ON_ONCE) and I got this: ------------[ cut here ]------------ kernel BUG at drivers/usb/gadget/udc/udc-core.c:622! invalid opcode: 0000 [#1] DEBUG_PAGEALLOC KASAN CPU: 0 PID: 36 Comm: afl-fuzz Not tainted 4.5.0-rc2 #1 task: ffff8800003b6900 ti: ffff88000c840000 task.ti: ffff88000c840000 RIP: 0010:[] [] usb_gadget_unregister_driver+0x18b/0x2d0 RSP: 0018:ffff88000c847de8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffffffff81a117d8 RCX: 1ffff10001908fa6 RDX: 0000000000000001 RSI: 0000000000000061 RDI: ffff88000039c418 RBP: ffff88000c847e10 R08: 0000000000000246 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81a118e0 R13: ffffffff81a13f40 R14: dffffc0000000000 R15: ffff88000c83ca00 FS: 00007ffff7ff2740(0000) GS:ffffffff8193f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffff78c53a0 CR3: 000000000c850000 CR4: 00000000001406b0 Stack: ffffffff817f62a0 ffff880000277a40 ffff88000c83ca10 ffff88000c83ca20 ffff88000c83ca18 ffff88000c847e30 ffffffff81533bb4 ffff88000cc25758 ffff88000c83ca10 ffff88000c847e88 ffffffff811f0ccb 00007ffff780f0c0 Call Trace: [] dev_release+0x44/0x110 [] __fput+0x11b/0x490 [] ____fput+0x9/0x10 [] task_work_run+0xf1/0x190 [] ? filp_close+0x8a/0xe0 [] exit_to_usermode_loop+0xec/0x100 [] syscall_return_slowpath+0x91/0xc0 [] int_ret_from_sys_call+0x25/0x8f Code: f8 48 c1 e8 03 42 80 3c 20 00 0f 85 1b 01 00 00 48 8b 83 c8 00 00 00 48 3d a0 18 a1 81 48 8d 98 38 ff ff ff 75 be e8 45 b6 e6 ff <0f> 0b e8 3e b6 e6 ff 48 89 df e8 26 d0 ff ff 48 8d 7b 08 48 b8 RIP [] usb_gadget_unregister_driver+0x18b/0x2d0 RSP ---[ end trace 875139a9f2b43c09 ]--- > I was still not able to reproduce this issue (tried with dwc2, gadgetfs > and ptp gadget daemon). Vegard could you prepare a step-by-step > instruction how to reproduce this problem? It's a bit complicated to reproduce (and not 100% deterministic) since I am running into this using a fuzzer. But I am just using dummy_hcd in the following way: - mount gadgetfs - open dummy_hcd file - read/write some data - close dummy_hcd file - unmount gadgetfs I will write back if I find an easy way to reproduce it. In the meantime I can test patches easily. Vegard