From: prarit (prarit@prarit.com) <prarit@verizon.net>
To: linux-hotplug@vger.kernel.org
Subject: Re: Latest bk kernel does not properly free PCI IO & MEM allocations
Date: Sat, 12 Mar 2005 17:04:54 +0000
Message-ID: <0ID900FO80S63RF0@vms046.mailsrvcs.net>
In-Reply-To: <422F42A9.7050009@sgi.com>

Greg,

Thanks for the reply.  I'm at home today and don't have access to my SGI
account.  If you want to fwd this to the mailing list please feel free.
If I don't see it Monday morning, I'll fwd it back to the list then.

>> What I meant to say was pci_free_resources calls release_resource, where
>> release_region calls __release_region.  __release_region is called a 
>> "legacy" function?
 
> What documentation calls it that?

Hrmm ... I distinctly remember seeing that somewhere.  I'll find it again.
I also recall the term "compatibility cruft" used ... in ioport.h?

> I don't like the wording at all, no one will notice it (trust me I've
> tried stuff like this before...)

Yeah ... I know.  I've tried it before too.  I've been working in this
space for a long time and it's clear to me that it needs a compile-time
#define DEBUG printk option.  Issues within the resource allocation code
are impossible to debug without sticking a large number of printk's
throughout the code.
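
As an illustration only, a compile-time debug switch of the kind described above might look like the following userspace sketch. The names RESOURCE_DEBUG, res_dbg and toy_release are hypothetical and do not come from the kernel tree, and fprintf stands in for printk. The disabled variant keeps the call inside `if (0)` so the format string is still type-checked by the compiler.

```c
#include <stdio.h>

/*
 * Hypothetical compile-time switch: build with -DRESOURCE_DEBUG to get
 * trace output.  RESOURCE_DEBUG and res_dbg are made-up names for this
 * sketch; fprintf(stderr, ...) stands in for printk.
 * Note: ##__VA_ARGS__ is a GNU extension (gcc and clang accept it).
 */
#ifdef RESOURCE_DEBUG
#define res_dbg(fmt, ...) \
	fprintf(stderr, "resource: %s: " fmt, __func__, ##__VA_ARGS__)
#else
/* Compiled out, but the arguments are still checked by the compiler. */
#define res_dbg(fmt, ...) \
	do { if (0) fprintf(stderr, fmt, ##__VA_ARGS__); } while (0)
#endif

/* Toy release path showing where a trace line would go. */
static int toy_release(unsigned long start, unsigned long len)
{
	res_dbg("releasing [%#lx-%#lx]\n", start, start + len - 1);
	return 0;
}
```

Calling toy_release(0x1000, 0x100) prints one trace line to stderr when built with -DRESOURCE_DEBUG and prints nothing otherwise.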

> But what's really curious, is why no one has hit this before.  Nothing
> has changed recently in this area of the kernel.  Did this used to work
> before?  Does it work just fine without the patch for other drivers?

I haven't gone back through kernels to determine where this "breakage"
occurred, but I do know that it is in the 2.6.9 kernel as I have been 
developing on a RHEL4 platform.

This is how I stumbled across this:  As you know I'm building and
testing an SGI Altix Hotplug Driver.

While testing I started up a few memory stress and IO stress tests
*that did not involve the card I was targeting for the test* and
approximately 5% of the time I hit a NULL pointer oops.  

At first glance, I thought the issue was within the sysfs/proc
filesystems or pci_free_resources, as that's where the oopses were.
On closer inspection I realized that didn't make sense and wasn't
the case.

A few things have to happen (in this precise order) for someone
to hit this issue.  Note that I have been running on 16, 32, and
64 cpu systems, so it is very likely that #4 below is due to another CPU.

1.  HP slot is disabled via sysfs.
2.  PCI driver must call release_regions
3.  release_regions kfree's resource structures
4.  Context switch/Another CPU: resource area is alloc'd by 
    something else.
5.  pci_free_resources is called

Possible oops right here.
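
A rough userspace model of steps 2-5 above, offered only as a sketch: struct toy_resource, toy_alloc, toy_free and toy_double_release are hypothetical names, not kernel code, and a one-slot pool replaces kfree/kmalloc so that reuse of freed memory is deterministic rather than racy. The "qla" owner string is just an illustrative label.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a resource node; NOT the kernel's struct resource. */
struct toy_resource {
	char owner[16];
	int in_use;
};

/* One-slot pool: whatever is freed is handed out again on the next alloc. */
static struct toy_resource slot;

static struct toy_resource *toy_alloc(const char *owner)
{
	if (slot.in_use)
		return NULL;
	slot.in_use = 1;
	strncpy(slot.owner, owner, sizeof(slot.owner) - 1);
	slot.owner[sizeof(slot.owner) - 1] = '\0';
	return &slot;
}

static void toy_free(struct toy_resource *res)
{
	res->in_use = 0;	/* memory may now be handed to someone else */
}

/*
 * Returns 1 if the second release ends up acting on a node that by then
 * belongs to another owner, 0 if the stale memory was still untouched.
 */
static int toy_double_release(int reuse_between)
{
	struct toy_resource *stale = toy_alloc("pci-bar");
	struct toy_resource *other = NULL;
	int hit;

	toy_free(stale);			/* steps 2-3: first release frees the node */
	if (reuse_between)
		other = toy_alloc("qla");	/* step 4: memory is reused elsewhere */

	hit = (other != NULL && other == stale);
	toy_free(stale);			/* step 5: second release via stale pointer */
	return hit;
}
```

With reuse_between set to 0 the second release goes unnoticed; with it set to 1 the stale pointer aliases the new owner's node, which is the situation in which an oops becomes possible.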

I've also stumbled across the case where pci_free_resources
tried to free nonexistent regions -- dumping the memory address indicated
in one case that I was looking at char data ... I recall that it
looked like the word "qla".  I dumped /proc/iomem
and /proc/ioports and incurred an oops in /proc.

Suppose #4 above doesn't happen.  Step #5 occurs and no one (user or 
system) is the wiser.  The memory is still intact -- no oops.

Additionally, I've seen people using PCI Hotplug in the field. 
Typically when removing a card sysadmins tend to quiesce the system
before removal.  That makes #4 just that much less likely to occur.

:) :)  Want the long explanation? :) :)

P.


