public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <gregkh@suse.de>
To: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c)
Date: Sat, 20 Dec 2008 00:58:51 -0800	[thread overview]
Message-ID: <20081220085851.GA25095@suse.de> (raw)
In-Reply-To: <19f34abd0812200052g50067413t398eb7d67213371@mail.gmail.com>

On Sat, Dec 20, 2008 at 09:52:10AM +0100, Vegard Nossum wrote:
> On Sat, Dec 20, 2008 at 1:46 AM, Greg KH <gregkh@suse.de> wrote:
> > On Fri, Dec 19, 2008 at 10:35:57PM +0200, Pekka Enberg wrote:
> >> Hi Greg,
> >>
> >> On Fri, Dec 19, 2008 at 06:19:43PM +0100, Vegard Nossum wrote:
> >> > Fixes what?  It might be quite difficult to revert that patch now, as
> >> > the infrastructure is no longer in place to use a private pci device
> >> > list, that code is long gone.
> >>
> >> Vegard forced one oops but got two! The first one is expected and but
> >> the second one shouldn't probably be there:
> >
> > "Second" oopses are known to not be reliable, I wouldn't count it as a
> > real problem unless it happens on its own.
> 
> Yes, because usually it's a process that BUGed and was killed --
> perhaps with locks held or in the middle of some transaction that will
> never complete. But this one happens in the panic code itself...
> 
> >
> >> >> > [    0.040993] EIP: [<c13b41dc>] klist_next+0x10/0x8d SS:ESP 0068:c165dd48
> >>
> >> Looks like the patch Vegard identified breaks something in the oops path?
> >
> > Very wierd, I also don't understand how reverting the specific patch
> > would even make a buildable system.
> 
> It was an unclean revert, here's the relevant resultant hunk:
> 
> diff --cc drivers/pci/probe.c
> index 003a9b3,2db2e4b..0000000
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@@ -32,16 -29,55 +28,11 @@@ LIST_HEAD(pci_devices)
>    */
>   int no_pci_devices(void)
>   {
> -       struct device *dev;
> -       int no_devices;
> -
> -       dev = bus_find_device(&pci_bus_type, NULL, NULL, find_anything);
> -       no_devices = (dev == NULL);
> -       put_device(dev);
> -       return no_devices;
> +       return list_empty(&pci_devices);
>   }
> +
>   EXPORT_SYMBOL(no_pci_devices);
> 
> 
> ...and this builds.
> 
> The problem seems to be that pci_bus_type.p->klist_devices is NULL. Because:
> 
> Oops happens here:
> 
> struct klist_node *klist_next(struct klist_iter *i)
> {
>         void (*put)(struct klist_node *) = i->i_klist->put;
> 
> So either i == NULL or i->i_klist == NULL. But i->i_klist was just
> before set here:
> 
> void klist_iter_init_node(struct klist *k, struct klist_iter *i,
>                           struct klist_node *n)
> {
>         i->i_klist = k;
>         i->i_cur = n;
>         if (n)
>                 kref_get(&n->n_ref);
> }
> 
> so the klist passed must have been NULL, it came from bus_find_device():
> 
>         klist_iter_init_node(&bus->p->klist_devices, &i,
>                              (start ? &start->knode_bus : NULL));
> 
> ...and indeed, printing bus->p here yields 00000000. This function was
> called from no_pci_devices(), so the bus variable was initialized from
> &pci_bus_type. So pci_bus_type.p == NULL.
> 
> This should be initialized in bus_register() called from
> pci_driver_init(). Aha, this never gets called because initcalls did
> not yet run.
> 
> A summary of the bug:
> 
> 1. Sending panic=1 wants to reboot on panic.
> 2. If panic occurs before initcalls ran, pci_bus_type.p is not initialized.
> 3. mach_reboot_fixups() in x86 code calls pci_get_device()
> 4. New oops
> 
> Maybe mach_reboot_fixups() should check to see if PCI bus is
> initialized before calling pci_get_device(), since obviously it can be
> called before it has been initialized too.
> 
> The funny thing is that no_pci_devices() is what _used_ to guard
> against using pci_bus_type too early:
> 
> /*
>  * Some device drivers need know if pci is initiated.
>  * Basically, we think pci is not initiated when there
>  * is no device to be found on the pci_bus_type.
>  */
> int no_pci_devices(void)
> 
> ...and now it uses pci_bus_type itself. That is what makes commit
> 70308923d317f2ad4973c30d90bb48ae38761317 wrong, because there might be
> other users of no_pci_devices() too, which would now almost certainly
> result in an Oops if the pci bus hasn't been initialized.
> 
> Please tell if any of the above is unclear, and I will try to explain
> more. Thanks,

No, that makes perfect sense.

Care to make a patch for no_pci_devices() to work properly in this kind
of situation?

thanks,

greg k-h

  reply	other threads:[~2008-12-20  8:59 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-18 22:06 v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c) Vegard Nossum
2008-12-19 17:19 ` Vegard Nossum
2008-12-19 17:39   ` Greg KH
2008-12-19 20:35     ` Pekka Enberg
2008-12-20  0:46       ` Greg KH
2008-12-20  8:52         ` Vegard Nossum
2008-12-20  8:58           ` Greg KH [this message]
2008-12-20 10:56             ` [PATCH] pci: fix no_pci_devices() Vegard Nossum
2008-12-20 11:14               ` [PATCH] pci: fix no_pci_devices() #2 Vegard Nossum
2008-12-20 23:31                 ` Greg KH
2009-01-05 19:09                 ` Jesse Barnes
2009-01-07  0:42                   ` Greg KH
2009-01-16 18:41                     ` Jesse Barnes
2009-01-16 19:21                       ` Vegard Nossum
2009-01-27 18:41                         ` Jesse Barnes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081220085851.GA25095@suse.de \
    --to=gregkh@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=penberg@cs.helsinki.fi \
    --cc=vegard.nossum@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox