public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH]  Add pci_walk_bus function to PCI core (nonrecursive)
@ 2005-08-18  4:33 Paul Mackerras
  2005-08-18  4:58 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Mackerras @ 2005-08-18  4:33 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-pci, linux-kernel, linas

The PCI error recovery infrastructure needs to be able to contact all
the drivers affected by a PCI error event, which may mean traversing
all the devices under a given PCI-PCI bridge.  This patch adds a
function to the PCI core that traverses all the PCI devices on a PCI
bus and under any PCI-PCI bridges on that bus (and so on), calling a
given function for each device.  This provides a way for the error
recovery code to iterate through all devices that are affected by an
error event.

This version is not implemented as a recursive function.  Instead,
when we reach a PCI-PCI bridge, we set the pointers to start doing the
devices on the bus under the bridge, and when we reach the end of a
bus's devices, we use the bus->self pointer to go back up to the next
higher bus and continue doing its devices.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
diff -urN linux-2.6/drivers/pci/bus.c test-pseries/drivers/pci/bus.c
--- linux-2.6/drivers/pci/bus.c	2005-08-03 10:51:36.000000000 +1000
+++ test-pseries/drivers/pci/bus.c	2005-08-18 14:03:28.000000000 +1000
@@ -150,6 +150,54 @@
 	}
 }
 
+/** pci_walk_bus - walk devices on/under bus, calling callback.
+ *  @top      bus whose devices should be walked
+ *  @cb       callback to be called for each device found
+ *  @userdata arbitrary pointer to be passed to callback.
+ *
+ *  Walk the given bus, including any bridged devices
+ *  on buses under this bus.  Call the provided callback
+ *  on each device found.
+ */
+void pci_walk_bus(struct pci_bus *top, void (*cb)(struct pci_dev *, void *),
+		  void *userdata)
+{
+	struct pci_dev *dev;
+	struct pci_bus *bus;
+	struct list_head *next;
+
+	bus = top;
+	spin_lock(&pci_bus_lock);
+	next = top->devices.next;
+	for (;;) {
+		if (next == &bus->devices) {
+			/* end of this bus, go up or finish */
+			if (bus == top)
+				break;
+			next = bus->self->bus_list.next;
+			bus = bus->self->bus;
+			continue;
+		}
+		dev = list_entry(next, struct pci_dev, bus_list);
+		pci_dev_get(dev);
+		if (dev->subordinate) {
+			/* this is a pci-pci bridge, do its devices next */
+			next = dev->subordinate->devices.next;
+			bus = dev->subordinate;
+		} else
+			next = dev->bus_list.next;
+		spin_unlock(&pci_bus_lock);
+
+		/* Run device routines with the bus unlocked */
+		cb(dev, userdata);
+
+		spin_lock(&pci_bus_lock);
+		pci_dev_put(dev);
+	}
+	spin_unlock(&pci_bus_lock);
+}
+EXPORT_SYMBOL_GPL(pci_walk_bus);
+
 EXPORT_SYMBOL(pci_bus_alloc_resource);
 EXPORT_SYMBOL_GPL(pci_bus_add_device);
 EXPORT_SYMBOL(pci_bus_add_devices);
diff -urN linux-2.6/include/linux/pci.h test-pseries/include/linux/pci.h
--- linux-2.6/include/linux/pci.h	2005-08-18 12:59:18.000000000 +1000
+++ test-pseries/include/linux/pci.h	2005-08-18 13:54:38.000000000 +1000
@@ -865,6 +865,9 @@
 const struct pci_device_id *pci_match_id(const struct pci_device_id *ids, struct pci_dev *dev);
 int pci_scan_bridge(struct pci_bus *bus, struct pci_dev * dev, int max, int pass);
 
+void pci_walk_bus(struct pci_bus *top, void (*cb)(struct pci_dev *, void *),
+		  void *userdata);
+
 /* kmem_cache style wrapper around pci_alloc_consistent() */
 
 #include <linux/dmapool.h>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH]  Add pci_walk_bus function to PCI core (nonrecursive)
  2005-08-18  4:33 [PATCH] Add pci_walk_bus function to PCI core (nonrecursive) Paul Mackerras
@ 2005-08-18  4:58 ` Benjamin Herrenschmidt
  2005-08-18  5:13   ` Greg KH
  2005-08-19 16:30   ` Linas Vepstas
  0 siblings, 2 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2005-08-18  4:58 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Greg KH, linux-pci, linux-kernel, linas

On Thu, 2005-08-18 at 14:33 +1000, Paul Mackerras wrote:
> The PCI error recovery infrastructure needs to be able to contact all
> the drivers affected by a PCI error event, which may mean traversing
> all the devices under a given PCI-PCI bridge.  This patch adds a
> function to the PCI core that traverses all the PCI devices on a PCI
> bus and under any PCI-PCI bridges on that bus (and so on), calling a
> given function for each device.  This provides a way for the error
> recovery code to iterate through all devices that are affected by an
> error event.

 .../...

Note that it's racy vs. removal of devices, but I suspect a good bunch
of the PCI code is. The whole idea that list*_safe routines pay you
anything in that regard need to be shot. Afaik, they are only safe about
the caller removing the current element.

I wonder if it's finally time to implement proper race free list
iterators in the kernel. Not that difficult... A small struct iterator
with a list head and the current elem pointer, and the "interated" list
containing the list itself, a list of iterators and a lock. Iterators
can then be "fixed" up on element removal with a fine grained lock on
list structure access.

Ben.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH]  Add pci_walk_bus function to PCI core (nonrecursive)
  2005-08-18  4:58 ` Benjamin Herrenschmidt
@ 2005-08-18  5:13   ` Greg KH
  2005-08-18  8:02     ` Benjamin Herrenschmidt
  2005-08-19 16:30   ` Linas Vepstas
  1 sibling, 1 reply; 6+ messages in thread
From: Greg KH @ 2005-08-18  5:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Paul Mackerras, linux-pci, linux-kernel, linas, Patrick Mochel

On Thu, Aug 18, 2005 at 02:58:28PM +1000, Benjamin Herrenschmidt wrote:
> I wonder if it's finally time to implement proper race free list
> iterators in the kernel. Not that difficult... A small struct iterator
> with a list head and the current elem pointer, and the "interated" list
> containing the list itself, a list of iterators and a lock. Iterators
> can then be "fixed" up on element removal with a fine grained lock on
> list structure access.

Pat tried to do that with klist, but people seem to think it's still
racy in some corner cases.  Have you looked at them?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH]  Add pci_walk_bus function to PCI core (nonrecursive)
  2005-08-18  5:13   ` Greg KH
@ 2005-08-18  8:02     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2005-08-18  8:02 UTC (permalink / raw)
  To: Greg KH; +Cc: Paul Mackerras, linux-pci, linux-kernel, linas, Patrick Mochel

On Wed, 2005-08-17 at 22:13 -0700, Greg KH wrote:
> On Thu, Aug 18, 2005 at 02:58:28PM +1000, Benjamin Herrenschmidt wrote:
> > I wonder if it's finally time to implement proper race free list
> > iterators in the kernel. Not that difficult... A small struct iterator
> > with a list head and the current elem pointer, and the "interated" list
> > containing the list itself, a list of iterators and a lock. Iterators
> > can then be "fixed" up on element removal with a fine grained lock on
> > list structure access.
> 
> Pat tried to do that with klist, but people seem to think it's still
> racy in some corner cases.  Have you looked at them?

Not specifically... I used that type of construct in a distant past and
I'm pretty sure my stuff back them was quite solid but it's all old
memories. You have to be careful about a few things, like properly
getting() the current pointed object under the lock, and that sort of
thing, but it's not that complicated. I don't know of any other mecanism
to iterate lists in a safe way without taking global locks, which we
want to avoid as that would open us to all sorts of deadlocks...

The case of trees like PCI is fine provided parents aren't removed
before all childs are removed, and thus all iterators properly
invalidated.

Now, maybe I missed something ... but in any case, it seems a lot of our
current iterating constucts are racy.

Ben.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH]  Add pci_walk_bus function to PCI core (nonrecursive)
  2005-08-18  4:58 ` Benjamin Herrenschmidt
  2005-08-18  5:13   ` Greg KH
@ 2005-08-19 16:30   ` Linas Vepstas
  2005-08-20  0:10     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 6+ messages in thread
From: Linas Vepstas @ 2005-08-19 16:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Paul Mackerras, Greg KH, linux-pci, linux-kernel

Hi,

On Thu, Aug 18, 2005 at 02:58:28PM +1000, Benjamin Herrenschmidt was heard to remark:
> On Thu, 2005-08-18 at 14:33 +1000, Paul Mackerras wrote:
> > This patch adds a
> > function to the PCI core that traverses all the PCI devices on a PCI
> > bus and under any PCI-PCI bridges on that bus (and so on), calling a
> > given function for each device.  

Paul, thanks, I'll use this in the next version, which I'm trying to
assemble now.

> Note that it's racy vs. removal of devices, 

...

> The whole idea that list*_safe routines pay you

OK, well, that makes me feel better, as I've stared at that code before, 
and wondered what magic made it work.

> I wonder if it's finally time to implement proper race free list
> iterators in the kernel. Not that difficult... A small struct iterator
> with a list head and the current elem pointer, and the "interated" list
> containing the list itself, a list of iterators and a lock. Iterators
> can then be "fixed" up on element removal with a fine grained lock on
> list structure access.

Wow. A list of iterators to be fixed up ... I get the idea, but it does
add a fair amount of complexity.  

Would it be easier (and simpler to maintain/debug) to "get" all items
on the list first, before iterating on them, and only then iterate?
This should work, as long as removing an item doesn't trash its "next" 
pointer.  The "next" pointer gets whacked only when the use-count goes 
to zero.

The idea is that while traversing the list, its OK if the "next" pointer
is pointing to a node that has removed from the list; in fact, its OK to
traverse to that node; eventually, traversing will eventually lead back
to a valid head.

I know that the above sounds crazy, but I think it could work, and be 
a relatively low-tech but capable solution.  It does presume that elems
have generic get/put routines.

--linas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH]  Add pci_walk_bus function to PCI core (nonrecursive)
  2005-08-19 16:30   ` Linas Vepstas
@ 2005-08-20  0:10     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2005-08-20  0:10 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: Paul Mackerras, Greg KH, linux-pci, linux-kernel


> Wow. A list of iterators to be fixed up ... I get the idea, but it does
> add a fair amount of complexity.  

Not that much, been there done that, it's actually quite simple :)

> Would it be easier (and simpler to maintain/debug) to "get" all items
> on the list first, before iterating on them, and only then iterate?

How do you protect against change of your "next" pointer ? by taking a
global lock, peeking at it, getting it, unlocking ? That would work in
that case I suppose ...

> This should work, as long as removing an item doesn't trash its "next" 
> pointer.  The "next" pointer gets whacked only when the use-count goes 
> to zero.

A fine grained lock used when adding/removing items and when "peeking"
at next might work there..

> The idea is that while traversing the list, its OK if the "next" pointer
> is pointing to a node that has removed from the list; in fact, its OK to
> traverse to that node; eventually, traversing will eventually lead back
> to a valid head.

It's just making the race less likely to happen, but it's still there.
Once the object you are peeking (or your next object, whatever) is
unhooked from the list, that means that the next point it contains is
crap. At any  time while you are playing with it, somebody may free the
"next" object, and since you are unhooked form the list, your own "next"
pointer will not be updated -> kaboom.

> I know that the above sounds crazy, but I think it could work, and be 
> a relatively low-tech but capable solution.  It does presume that elems
> have generic get/put routines.
> 
> --linas


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-08-20  0:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-08-18  4:33 [PATCH] Add pci_walk_bus function to PCI core (nonrecursive) Paul Mackerras
2005-08-18  4:58 ` Benjamin Herrenschmidt
2005-08-18  5:13   ` Greg KH
2005-08-18  8:02     ` Benjamin Herrenschmidt
2005-08-19 16:30   ` Linas Vepstas
2005-08-20  0:10     ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox