From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Torokhov
Subject: Re: Oops/Warning report for the week of March 28th 2008
Date: Fri, 28 Mar 2008 17:16:42 -0400
Message-ID: <20080328171407.ZZRA012@mailhub.coreip.homeip.net>
References: <47ED3F1A.1090101@linux.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Arjan van de Ven, Linux Kernel Mailing List, NetDev
To: Linus Torvalds
Return-path:
Received: from nf-out-0910.google.com ([64.233.182.191]:57163 "EHLO
	nf-out-0910.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751045AbYC1VQt (ORCPT );
	Fri, 28 Mar 2008 17:16:49 -0400
Received: by nf-out-0910.google.com with SMTP id g13so256453nfb.21
	for ; Fri, 28 Mar 2008 14:16:47 -0700 (PDT)
Content-Disposition: inline
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

On Fri, Mar 28, 2008 at 01:51:38PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 28 Mar 2008, Linus Torvalds wrote:
> >
> > Is there something obvious that I'm missing? I'd really like to see the
> > whole posting that the oops came from. Do you save the originals or even
> > just message ID's from the ones you pick from emails?
>
> Hmm. Definitely not from the kernel mailing list. I'm intrigued, where did
> that oops #5814 come from (picked a recent one at random)?
>
> The thing is recent, and oopses on "mutex_lock(dev->mutex)" in
> input_release_device. In particular, the path *seems* to be this one:
>
>	evdev_release ->
>	  evdev_ungrab ->
>	    input_release_device ->
>	      mutex_lock ->
>	        mutex_lock_nested ->
>	          __mutex_lock_common ->
>	            list_add_tail(&waiter.list, &lock->wait_list)
>
> where "lock->wait_list.prev" seems to be 0x6b6b6b6b6b6b6b6b, which is the
> use-after-free poison pattern.
>
> (In fact, I think the access that actually oopses is when the
> debug version of __list_add() does
>
>	if (unlikely(prev->next != next)) {
>
> because that "prev" pointer is crap).
>
> So it seems that when input_release_device() does:
>
>	struct input_dev *dev = handle->dev;
>
>	mutex_lock(&dev->mutex);
>
> the "dev" it uses has already been released. And this only shows up as a
> problem when you have slab debugging turned on (like the Fedora kernels
> do, thank you all Fedora guys).
>
> The odd thing is that I don't think any of this code has really changed
> recently.
>

There is a patch from Pete that works around the problem by not calling
input_release_device() on devices that are gone. But what I don't
understand is why the parent input device is gone at all: sysfs/driver
core should be keeping a reference to it, since it is the parent of
evdev. input_dev should only be released after evdev_free() is called.

-- 
Dmitry
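
To illustrate the lifetime rule I am relying on (a child pins its parent
until the child itself is freed), here is a minimal user-space sketch.
Everything in it is hypothetical: struct node, node_create(), node_put()
and parent_outlives_child() are stand-ins I made up for illustration and
are not the driver core or kobject API. It only models the expected
ordering, not what the kernel actually does in the failing case.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted device; NOT struct device. */
struct node {
	int refcount;
	struct node *parent;	/* a child holds one reference on its parent */
	int *released;		/* set to 1 when this node is released */
};

/* Create a node with one reference owned by the caller. */
static struct node *node_create(struct node *parent, int *released)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->refcount = 1;
	n->parent = parent;
	n->released = released;
	*released = 0;
	if (parent)
		parent->refcount++;	/* the child pins its parent */
	return n;
}

/* Drop one reference; on the last put, release and unpin the parent. */
static void node_put(struct node *n)
{
	while (n && --n->refcount == 0) {
		struct node *parent = n->parent;

		*n->released = 1;
		free(n);
		n = parent;	/* drop the reference the child was holding */
	}
}

/* Returns 1 if the parent stayed alive until the child was released. */
static int parent_outlives_child(void)
{
	int parent_released, child_released, alive_while_child_exists;
	struct node *parent, *child;

	parent = node_create(NULL, &parent_released);
	if (!parent)
		return -1;
	child = node_create(parent, &child_released);
	if (!child) {
		node_put(parent);
		return -1;
	}

	node_put(parent);	/* creator lets go, e.g. device unplugged */
	alive_while_child_exists = !parent_released;

	node_put(child);	/* last user of the child goes away */
	return alive_while_child_exists && child_released && parent_released;
}
```

In this model the parent cannot be released before the child, which is
why the poisoned "dev" in the oops is surprising: something must either
be dropping a reference it does not own or not taking the parent
reference in the first place.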