From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-0.9 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SPF_PASS autolearn=unavailable autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 4E7D5C10F0E
	for ; Thu, 18 Apr 2019 06:50:08 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 290722184B
	for ; Thu, 18 Apr 2019 06:50:08 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2388070AbfDRGuF (ORCPT ); Thu, 18 Apr 2019 02:50:05 -0400
Received: from mx2.suse.de ([195.135.220.15]:50574 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1725886AbfDRGuF (ORCPT ); Thu, 18 Apr 2019 02:50:05 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
	by mx1.suse.de (Postfix) with ESMTP id D413EAF96;
	Thu, 18 Apr 2019 06:50:03 +0000 (UTC)
Message-ID: <1555569464.7835.4.camel@suse.com>
Subject: Re: [PATCH] usbnet: fix kernel crash after disconnect
From: Oliver Neukum
To: Kloetzke Jan
Cc: "linux-usb@vger.kernel.org", "netdev@vger.kernel.org"
Date: Thu, 18 Apr 2019 08:37:44 +0200
In-Reply-To: <20190417091849.7475-1-Jan.Kloetzke@preh.de>
References: <20190417091849.7475-1-Jan.Kloetzke@preh.de>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.26.6
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-usb-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-usb@vger.kernel.org
Message-ID: <20190418063744.-hOf6Y6TgJgV1-IDMWvQzwYpvYmxvaNMG6yLQ8_tgmo@z>

On Wed, 2019-04-17 at 09:19 +0000, Kloetzke Jan wrote:
> When disconnecting cdc_ncm the kernel sporadically crashes shortly
> after the disconnect:
> 
>   [   57.868812] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>   ...
>   [   58.006653] PC is at 0x0
>   [   58.009202] LR is at call_timer_fn+0xec/0x1b4
>   [   58.013567] pc : [<0000000000000000>] lr : [<ffffff80080f5130>] pstate: 00000145
>   [   58.020976] sp : ffffff8008003da0
>   [   58.024295] x29: ffffff8008003da0 x28: 0000000000000001
>   [   58.029618] x27: 000000000000000a x26: 0000000000000100
>   [   58.034941] x25: 0000000000000000 x24: ffffff8008003e68
>   [   58.040263] x23: 0000000000000000 x22: 0000000000000000
>   [   58.045587] x21: 0000000000000000 x20: ffffffc68fac1808
>   [   58.050910] x19: 0000000000000100 x18: 0000000000000000
>   [   58.056232] x17: 0000007f885aff8c x16: 0000007f883a9f10
>   [   58.061556] x15: 0000000000000001 x14: 000000000000006e
>   [   58.066878] x13: 0000000000000000 x12: 00000000000000ba
>   [   58.072201] x11: ffffffc69ff1db30 x10: 0000000000000020
>   [   58.077524] x9 : 8000100008001000 x8 : 0000000000000001
>   [   58.082847] x7 : 0000000000000800 x6 : ffffff8008003e70
>   [   58.088169] x5 : ffffffc69ff17a28 x4 : 00000000ffff138b
>   [   58.093492] x3 : 0000000000000000 x2 : 0000000000000000
>   [   58.098814] x1 : 0000000000000000 x0 : 0000000000000000
>   ...
>   [   58.205800] [<          (null)>]           (null)
>   [   58.210521] [<ffffff80080f5298>] expire_timers+0xa0/0x14c
>   [   58.215937] [<ffffff80080f542c>] run_timer_softirq+0xe8/0x128
>   [   58.221702] [<ffffff8008081120>] __do_softirq+0x298/0x348
>   [   58.227118] [<ffffff80080a6304>] irq_exit+0x74/0xbc
>   [   58.232009] [<ffffff80080e17dc>] __handle_domain_irq+0x78/0xac
>   [   58.237857] [<ffffff8008080cf4>] gic_handle_irq+0x80/0xac
>   ...
> 
> The crash happens roughly 125..130ms after the disconnect. This
> correlates with the 'delay' timer that is started on certain USB tx/rx
> errors in the URB completion handler.
> 
> The suspected problem is a race of usbnet_stop() with
> usbnet_start_xmit(). In usbnet_stop() we call usbnet_terminate_urbs()
> to cancel all URBs in flight. This only makes sense if no new URBs are
> submitted concurrently, though. But the usbnet_start_xmit() can run at
> the same time on another CPU which almost unconditionally submits an
> URB. The error callback of the new URB will then schedule the timer
> after it was already stopped.

Hi,

interesting. How sure are you of the details of your analysis?
I am asking because usbnet_stop() does a del_timer_sync().
It is indeed written under the assumption that the upper layer
will have ceased transmission when it stops an interface.

So I am wondering whether the correct fix would not be to make
sure the timer is started.

	Regards
		Oliver
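
The ordering under discussion can be replayed in a small userspace
model. This is only a sketch under stated assumptions, not kernel code
and not a proposed patch: the Dev class, its two flags, and the
check_stopped guard are hypothetical names made up for illustration.
It shows that cancelling a pending timer in the stop path does not
prevent a completion handler that runs later from arming it again.

```python
from dataclasses import dataclass


@dataclass
class Dev:
    """Toy stand-in for the driver state; not the real struct usbnet."""
    stopped: bool = False        # hypothetical "interface is down" flag
    timer_pending: bool = False  # models the 'delay' timer being armed


def stop(dev: Dev) -> None:
    """Models the stop path: mark the device down, cancel the timer."""
    dev.stopped = True
    dev.timer_pending = False    # cancels only what is pending right now


def tx_error_complete(dev: Dev, check_stopped: bool) -> None:
    """Models an URB completion handler that arms the timer on error."""
    if check_stopped and dev.stopped:
        return                   # hypothetical guard: never re-arm after stop
    dev.timer_pending = True     # timer armed, possibly after stop()


def timer_pending_after_late_error(check_stopped: bool) -> bool:
    """Replay the suspected ordering: stop first, then a late tx error."""
    dev = Dev()
    stop(dev)
    tx_error_complete(dev, check_stopped)
    return dev.timer_pending


if __name__ == "__main__":
    print("unguarded:", timer_pending_after_late_error(False))
    print("guarded:  ", timer_pending_after_late_error(True))
```

In the unguarded replay the timer is left armed after the device has
been stopped, which is the state the quoted analysis suspects. Whether
the real driver can reach that ordering is exactly the open question
in this thread.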