All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <gregkh@suse.de>
To: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c)
Date: Sat, 20 Dec 2008 00:58:51 -0800	[thread overview]
Message-ID: <20081220085851.GA25095@suse.de> (raw)
In-Reply-To: <19f34abd0812200052g50067413t398eb7d67213371@mail.gmail.com>

On Sat, Dec 20, 2008 at 09:52:10AM +0100, Vegard Nossum wrote:
> On Sat, Dec 20, 2008 at 1:46 AM, Greg KH <gregkh@suse.de> wrote:
> > On Fri, Dec 19, 2008 at 10:35:57PM +0200, Pekka Enberg wrote:
> >> Hi Greg,
> >>
> >> On Fri, Dec 19, 2008 at 06:19:43PM +0100, Vegard Nossum wrote:
> >> > Fixes what?  It might be quite difficult to revert that patch now, as
> >> > the infrastructure is no longer in place to use a private pci device
> >> > list, that code is long gone.
> >>
> >> Vegard forced one oops but got two! The first one is expected and but
> >> the second one shouldn't probably be there:
> >
> > "Second" oopses are known to not be reliable, I wouldn't count it as a
> > real problem unless it happens on its own.
> 
> Yes, because usually it's a process that BUGed and was killed --
> perhaps with locks held or in the middle of some transaction that will
> never complete. But this one happens in the panic code itself...
> 
> >
> >> >> > [    0.040993] EIP: [<c13b41dc>] klist_next+0x10/0x8d SS:ESP 0068:c165dd48
> >>
> >> Looks like the patch Vegard identified breaks something in the oops path?
> >
> > Very wierd, I also don't understand how reverting the specific patch
> > would even make a buildable system.
> 
> It was an unclean revert, here's the relevant resultant hunk:
> 
> diff --cc drivers/pci/probe.c
> index 003a9b3,2db2e4b..0000000
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@@ -32,16 -29,55 +28,11 @@@ LIST_HEAD(pci_devices)
>    */
>   int no_pci_devices(void)
>   {
> -       struct device *dev;
> -       int no_devices;
> -
> -       dev = bus_find_device(&pci_bus_type, NULL, NULL, find_anything);
> -       no_devices = (dev == NULL);
> -       put_device(dev);
> -       return no_devices;
> +       return list_empty(&pci_devices);
>   }
> +
>   EXPORT_SYMBOL(no_pci_devices);
> 
> 
> ...and this builds.
> 
> The problem seems to be that pci_bus_type.p->klist_devices is NULL. Because:
> 
> Oops happens here:
> 
> struct klist_node *klist_next(struct klist_iter *i)
> {
>         void (*put)(struct klist_node *) = i->i_klist->put;
> 
> So either i == NULL or i->i_klist == NULL. But i->i_klist was just
> before set here:
> 
> void klist_iter_init_node(struct klist *k, struct klist_iter *i,
>                           struct klist_node *n)
> {
>         i->i_klist = k;
>         i->i_cur = n;
>         if (n)
>                 kref_get(&n->n_ref);
> }
> 
> so the klist passed must have been NULL, it came from bus_find_device():
> 
>         klist_iter_init_node(&bus->p->klist_devices, &i,
>                              (start ? &start->knode_bus : NULL));
> 
> ...and indeed, printing bus->p here yields 00000000. This function was
> called from no_pci_devices(), so the bus variable was initialized from
> &pci_bus_type. So pci_bus_type.p == NULL.
> 
> This should be initialized in bus_register() called from
> pci_driver_init(). Aha, this never gets called because initcalls did
> not yet run.
> 
> A summary of the bug:
> 
> 1. Sending panic=1 wants to reboot on panic.
> 2. If panic occurs before initcalls ran, pci_bus_type.p is not initialized.
> 3. mach_reboot_fixups() in x86 code calls pci_get_device()
> 4. New oops
> 
> Maybe mach_reboot_fixups() should check to see if PCI bus is
> initialized before calling pci_get_device(), since obviously it can be
> called before it has been initialized too.
> 
> The funny thing is that no_pci_devices() is what _used_ to guard
> against using pci_bus_type too early:
> 
> /*
>  * Some device drivers need know if pci is initiated.
>  * Basically, we think pci is not initiated when there
>  * is no device to be found on the pci_bus_type.
>  */
> int no_pci_devices(void)
> 
> ...and now it uses pci_bus_type itself. That is what makes commit
> 70308923d317f2ad4973c30d90bb48ae38761317 wrong, because there might be
> other users of no_pci_devices() too, which would now almost certainly
> result in an Oops if the pci bus hasn't been initialized.
> 
> Please tell if any of the above is unclear, and I will try to explain
> more. Thanks,

No, that makes perfect sense.

Care to make a patch for no_pci_devices() to work properly in this kind
of situation?

thanks,

greg k-h

  reply	other threads:[~2008-12-20  8:59 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-18 22:06 v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c) Vegard Nossum
2008-12-19 17:19 ` Vegard Nossum
2008-12-19 17:39   ` Greg KH
2008-12-19 20:35     ` Pekka Enberg
2008-12-20  0:46       ` Greg KH
2008-12-20  8:52         ` Vegard Nossum
2008-12-20  8:58           ` Greg KH [this message]
2008-12-20 10:56             ` [PATCH] pci: fix no_pci_devices() Vegard Nossum
2008-12-20 11:14               ` [PATCH] pci: fix no_pci_devices() #2 Vegard Nossum
2008-12-20 23:31                 ` Greg KH
2009-01-05 19:09                 ` Jesse Barnes
2009-01-07  0:42                   ` Greg KH
2009-01-16 18:41                     ` Jesse Barnes
2009-01-16 19:21                       ` Vegard Nossum
2009-01-27 18:41                         ` Jesse Barnes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081220085851.GA25095@suse.de \
    --to=gregkh@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=penberg@cs.helsinki.fi \
    --cc=vegard.nossum@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.