All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
       [not found] <eac2ad3017b5f160d24c.1141922822@localhost.localdomain>
@ 2006-03-09 23:20 ` Roland Dreier
  2006-03-09 23:39   ` Bryan O'Sullivan
  2006-03-09 23:24 ` Roland Dreier
  2006-03-09 23:26 ` Roland Dreier
  2 siblings, 1 reply; 31+ messages in thread
From: Roland Dreier @ 2006-03-09 23:20 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

    Bryan> The ipath_sma.c file supports a lightweight userspace
    Bryan> subnet management agent (SMA).  This is used in deployments
    Bryan> (such as HPC clusters) where a full Infiniband protocol
    Bryan> stack is not needed.

I've never understood what forces you to maintain two separate SMAs.
Why can't you pick one of the two SMAs and use that unconditionally?

 - R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
       [not found] <eac2ad3017b5f160d24c.1141922822@localhost.localdomain>
  2006-03-09 23:20 ` [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management Roland Dreier
@ 2006-03-09 23:24 ` Roland Dreier
  2006-03-09 23:49   ` Bryan O'Sullivan
  2006-03-09 23:26 ` Roland Dreier
  2 siblings, 1 reply; 31+ messages in thread
From: Roland Dreier @ 2006-03-09 23:24 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

 > +static int ipath_sma_open(struct inode *in, struct file *fp)
 > +{
 > +	int s;
 > +
 > +	if (ipath_sma_alive) {
 > +		ipath_dbg("SMA already running (pid %u), failing\n",
 > +			  ipath_sma_alive);
 > +		return -EBUSY;
 > +	}
 > +
 > +	for (s = 0; s < atomic_read(&ipath_max); s++) {
 > +		struct ipath_devdata *dd = ipath_lookup(s);
 > +		/* we need at least one infinipath device to be initialized. */
 > +		if (dd && dd->ipath_flags & IPATH_INITTED) {
 > +			ipath_sma_alive = current->pid;

It seems there's a window here where two processes can both pass the
if (ipath_sma_alive) test and then proceed to step on each other.

 - R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
       [not found] <eac2ad3017b5f160d24c.1141922822@localhost.localdomain>
  2006-03-09 23:20 ` [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management Roland Dreier
  2006-03-09 23:24 ` Roland Dreier
@ 2006-03-09 23:26 ` Roland Dreier
  2006-03-09 23:52   ` Bryan O'Sullivan
  2 siblings, 1 reply; 31+ messages in thread
From: Roland Dreier @ 2006-03-09 23:26 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

 > +static int ipath_sma_release(struct inode *in, struct file *fp)
 > +{
 > +	int s;
 > +
 > +	ipath_sma_alive = 0;
 > +	ipath_cdbg(SMA, "Closing SMA device\n");
 > +	for (s = 0; s < atomic_read(&ipath_max); s++) {
 > +		struct ipath_devdata *dd = ipath_lookup(s);
 > +
 > +		if (!dd || !(dd->ipath_flags & IPATH_INITTED))
 > +			continue;
 > +		*dd->ipath_statusp &= ~IPATH_STATUS_SMA;
 > +		if (dd->verbs_layer.l_flags & IPATH_VERBS_KERNEL_SMA)
 > +			*dd->ipath_statusp |= IPATH_STATUS_OIB_SMA;
 > +	}
 > +	return 0;
 > +}

Similarly what protects against another process opening the device
right after the ipath_sma_alive = 0 setting, but before you do all the
cleanup that's after that?

And what protects against a hot unplug of a device after the test of s
against ipath_max?

 - R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-09 23:20 ` [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management Roland Dreier
@ 2006-03-09 23:39   ` Bryan O'Sullivan
  2006-03-09 23:47     ` Roland Dreier
  2006-03-10 15:54     ` Michael S. Tsirkin
  0 siblings, 2 replies; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-09 23:39 UTC (permalink / raw)
  To: Roland Dreier; +Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

On Thu, 2006-03-09 at 15:20 -0800, Roland Dreier wrote:

> I've never understood what forces you to maintain two separate SMAs.
> Why can't you pick one of the two SMAs and use that unconditionally?

Three reasons.

      * OpenSM wasn't usable when we wrote our SMA.  We have customers
        using ours now, so we have to support it.
      * Our SMA does some setup for the layered ethernet emulation
        driver.
      * Our SMA works without an IB stack of any kind present.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-09 23:39   ` Bryan O'Sullivan
@ 2006-03-09 23:47     ` Roland Dreier
  2006-03-09 23:50       ` Bryan O'Sullivan
  2006-03-10 15:54     ` Michael S. Tsirkin
  1 sibling, 1 reply; 31+ messages in thread
From: Roland Dreier @ 2006-03-09 23:47 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

 > Three reasons.
 > 
 >       * OpenSM wasn't usable when we wrote our SMA.  We have customers
 >         using ours now, so we have to support it.

Huh?  What does OpenSM working or not have to do with the SMA?

 >       * Our SMA does some setup for the layered ethernet emulation
 >         driver.
 >       * Our SMA works without an IB stack of any kind present.

That's fine.  So then I guess the question is, why can't you use your
SMA all the time?

And does that mean that the verbs SMA doesn't support ethernet
emulation, so you can't use ethernet emulation and verbs at the same time?

 - R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-09 23:24 ` Roland Dreier
@ 2006-03-09 23:49   ` Bryan O'Sullivan
  2006-03-09 23:51     ` Roland Dreier
  0 siblings, 1 reply; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-09 23:49 UTC (permalink / raw)
  To: Roland Dreier; +Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

On Thu, 2006-03-09 at 15:24 -0800, Roland Dreier wrote:

> It seems there's a window here where two processes can both pass the
> if (ipath_sma_alive) test and then proceed to step on each other.

Yep, this is a real race, albeit incredibly unlikely.  I just turned
ipath_sma_alive into an atomic_t, and wrapped the open/close code in
spinlocks.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-09 23:47     ` Roland Dreier
@ 2006-03-09 23:50       ` Bryan O'Sullivan
  2006-03-09 23:52         ` Roland Dreier
  0 siblings, 1 reply; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-09 23:50 UTC (permalink / raw)
  To: Roland Dreier; +Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

On Thu, 2006-03-09 at 15:47 -0800, Roland Dreier wrote:

> That's fine.  So then I guess the question is, why can't you use your
> SMA all the time?

We do.  It coexists with OpenSM if OpenSM is present.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-09 23:49   ` Bryan O'Sullivan
@ 2006-03-09 23:51     ` Roland Dreier
  0 siblings, 0 replies; 31+ messages in thread
From: Roland Dreier @ 2006-03-09 23:51 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

    Bryan> Yep, this is a real race, albeit incredibly unlikely.  I
    Bryan> just turned ipath_sma_alive into an atomic_t, and wrapped
    Bryan> the open/close code in spinlocks.

How does making it an atomic_t help?  I think you're only going to be
using atomic_set() and atomic_read(), and atomic_t doesn't provide
anything in that case.

 - R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-09 23:50       ` Bryan O'Sullivan
@ 2006-03-09 23:52         ` Roland Dreier
  0 siblings, 0 replies; 31+ messages in thread
From: Roland Dreier @ 2006-03-09 23:52 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

    Roland> That's fine.  So then I guess the question is, why can't
    Roland> you use your SMA all the time?

    Bryan> We do.  It coexists with OpenSM if OpenSM is present.

So can we kill the other SMA?

 - R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-09 23:26 ` Roland Dreier
@ 2006-03-09 23:52   ` Bryan O'Sullivan
  2006-03-10  0:00     ` Roland Dreier
  2006-03-10  0:45     ` Greg KH
  0 siblings, 2 replies; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-09 23:52 UTC (permalink / raw)
  To: Roland Dreier; +Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

On Thu, 2006-03-09 at 15:26 -0800, Roland Dreier wrote:

> Similarly what protects against another process opening the device
> right after the ipath_sma_alive = 0 setting, but before you do all the
> cleanup that's after that?

This is fixed by the stuff I just did in response to your earlier
message.

> And what protects against a hot unplug of a device after the test of s
> against ipath_max?

We don't support hotplugged devices at the moment.  If you're asking
whether an rmmod at the wrong time could cause something bad to happen,
I don't *think* so.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-09 23:52   ` Bryan O'Sullivan
@ 2006-03-10  0:00     ` Roland Dreier
  2006-03-10  0:04       ` Bryan O'Sullivan
  2006-03-10  0:45     ` Greg KH
  1 sibling, 1 reply; 31+ messages in thread
From: Roland Dreier @ 2006-03-10  0:00 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

    Bryan> We don't support hotplugged devices at the moment.  If
    Bryan> you're asking whether an rmmod at the wrong time could
    Bryan> cause something bad to happen, I don't *think* so.

How do you stop someone from hot plugging a PCIe device?

 - R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  0:00     ` Roland Dreier
@ 2006-03-10  0:04       ` Bryan O'Sullivan
  0 siblings, 0 replies; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-10  0:04 UTC (permalink / raw)
  To: Roland Dreier; +Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

On Thu, 2006-03-09 at 16:00 -0800, Roland Dreier wrote:
>     Bryan> We don't support hotplugged devices at the moment.
> 
> How do you stop someone from hot plugging a PCIe device?

You say "we don't support this yet" somewhere in big letters.  The chips
and cards support hotplug electrically and logically.  We just haven't
had time yet to do the driver support work, and won't for a while.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  0:35 [PATCH 0 of 20] [RFC] ipath driver - another round for review Bryan O'Sullivan
@ 2006-03-10  0:35 ` Bryan O'Sullivan
  2006-03-10  0:45   ` Roland Dreier
  0 siblings, 1 reply; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-10  0:35 UTC (permalink / raw)
  To: rolandd, gregkh, akpm, davem; +Cc: linux-kernel, openib-general

The ipath_diag.c file permits userspace diagnostic tools to read and
write a chip's registers.  It is different in purpose from the mmap
interfaces to the /sys/bus/pci resource files.

The ipath_sma.c file supports a lightweight userspace subnet management
agent (SMA).  This is used in deployments (such as HPC clusters) where
a full Infiniband protocol stack is not needed.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>

diff -r 1123028ac13a -r 28bb276205de drivers/infiniband/hw/ipath/ipath_diag.c
--- /dev/null	Thu Jan  1 00:00:00 1970 +0000
+++ b/drivers/infiniband/hw/ipath/ipath_diag.c	Thu Mar  9 16:16:04 2006 -0800
@@ -0,0 +1,377 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * This file contains support for diagnostic functions.  It is accessed by
+ * opening the ipath_diag device, normally minor number 129.  Diagnostic use
+ * of the InfiniPath chip may render the chip or board unusable until the
+ * driver is unloaded, or in some cases, until the system is rebooted.
+ *
+ * Accesses to the chip through this interface are not similar to going
+ * through the /sys/bus/pci resource mmap interface.
+ */
+
+#include <linux/pci.h>
+#include <linux/version.h>
+#include <asm/io.h>
+#include <asm/uaccess.h>
+
+#include "ipath_common.h"
+#include "ipath_kernel.h"
+#include "ips_common.h"
+#include "ipath_layer.h"
+
+__kernel_pid_t ipath_diag_alive;	/* PID of diags, if running */
+static int diag_set_link;
+
+static int ipath_diag_open(struct inode *in, struct file *fp);
+static int ipath_diag_release(struct inode *in, struct file *fp);
+static ssize_t ipath_diag_read(struct file *fp, char __user *data,
+			       size_t count, loff_t *off);
+static ssize_t ipath_diag_write(struct file *fp, const char __user *data,
+				size_t count, loff_t *off);
+
+static struct file_operations diag_file_ops = {
+	.owner = THIS_MODULE,
+	.write = ipath_diag_write,
+	.read = ipath_diag_read,
+	.open = ipath_diag_open,
+	.release = ipath_diag_release
+};
+
+static struct cdev *diag_cdev;
+static struct class_device *diag_class_dev;
+
+int ipath_diag_init(void)
+{
+	return ipath_cdev_init(IPATH_DIAG_MINOR, "ipath_diag",
+			       &diag_file_ops, &diag_cdev, &diag_class_dev);
+}
+
+void ipath_diag_cleanup(void)
+{
+	ipath_cdev_cleanup(&diag_cdev, &diag_class_dev);
+}
+
+/**
+ * ipath_read_umem64 - read a 64-bit quantity from the chip into user space
+ * @dd: the infinipath device
+ * @uaddr: the location to store the data in user memory
+ * @caddr: the source chip address (full pointer, not offset)
+ * @count: number of bytes to copy (multiple of 32 bits)
+ *
+ * This function also localizes all chip memory accesses.
+ * The copy should be written such that we read full cacheline packets
+ * from the chip.  This is usually used for a single qword
+ *
+ * NOTE:  This assumes the chip address is 64-bit aligned.
+ */
+static int ipath_read_umem64(struct ipath_devdata *dd, void __user *uaddr,
+			     const void __iomem *caddr, size_t count)
+{
+	const u64 __iomem *reg_addr = caddr;
+	const u64 __iomem *reg_end = reg_addr + (count / sizeof(u64));
+	int ret;
+
+	/* not very efficient, but it works for now */
+	if (reg_addr < dd->ipath_kregbase ||
+	    reg_end > dd->ipath_kregend) {
+		ret = -EINVAL;
+		goto bail;
+	}
+	while (reg_addr < reg_end) {
+		u64 data = readq(reg_addr);
+		if (copy_to_user(uaddr, &data, sizeof(u64))) {
+			ret = -EFAULT;
+			goto bail;
+		}
+		reg_addr++;
+		uaddr++;
+	}
+	ret = 0;
+bail:
+	return ret;
+}
+
+/**
+ * ipath_write_umem64 - write a 64-bit quantity to the chip from user space
+ * @dd: the infinipath device
+ * @caddr: the destination chip address (full pointer, not offset)
+ * @uaddr: the source of the data in user memory
+ * @count: the number of bytes to copy (multiple of 32 bits)
+ *
+ * This is usually used for a single qword
+ * NOTE:  This assumes the chip address is 64-bit aligned.
+ */
+
+static int ipath_write_umem64(struct ipath_devdata *dd, void __iomem *caddr,
+			      const void __user *uaddr, size_t count)
+{
+	u64 __iomem *reg_addr = caddr;
+	const u64 __iomem *reg_end = reg_addr + (count / sizeof(u64));
+	int ret;
+
+	/* not very efficient, but it works for now */
+	if (reg_addr < dd->ipath_kregbase ||
+	    reg_end > dd->ipath_kregend) {
+		ret = -EINVAL;
+		goto bail;
+	}
+	while (reg_addr < reg_end) {
+		u64 data;
+		if (copy_from_user(&data, uaddr, sizeof(data))) {
+			ret = -EFAULT;
+			goto bail;
+		}
+		writeq(data, reg_addr);
+
+		reg_addr++;
+		uaddr++;
+	}
+	ret = 0;
+bail:
+	return ret;
+}
+
+/**
+ * ipath_read_umem32 - read a 32-bit quantity from the chip into user space
+ * @dd: the infinipath device
+ * @uaddr: the location to store the data in user memory
+ * @caddr: the source chip address (full pointer, not offset)
+ * @count: number of bytes to copy
+ *
+ * read 32 bit values, not 64 bit; for memories that only
+ * support 32 bit reads; usually a single dword.
+ */
+static int ipath_read_umem32(struct ipath_devdata *dd, void __user *uaddr,
+			     const void __iomem *caddr, size_t count)
+{
+	const u32 __iomem *reg_addr = caddr;
+	const u32 __iomem *reg_end = reg_addr + (count / sizeof(u32));
+	int ret;
+
+	if (reg_addr < (u32 __iomem *) dd->ipath_kregbase ||
+	    reg_end > (u32 __iomem *) dd->ipath_kregend) {
+		ret = -EINVAL;
+		goto bail;
+	}
+	/* not very efficient, but it works for now */
+	while (reg_addr < reg_end) {
+		u32 data = readl(reg_addr);
+		if (copy_to_user(uaddr, &data, sizeof(data))) {
+			ret = -EFAULT;
+			goto bail;
+		}
+
+		reg_addr++;
+		uaddr++;
+	}
+	ret = 0;
+bail:
+	return ret;
+}
+
+/**
+ * ipath_write_umem32 - write a 32-bit quantity to the chip from user space
+ * @dd: the infinipath device
+ * @caddr: the destination chip address (full pointer, not offset)
+ * @uaddr: the source of the data in user memory
+ * @count: number of bytes to copy
+ *
+ * write 32 bit values, not 64 bit; for memories that only
+ * support 32 bit write; usually a single dword.
+ */
+
+static int ipath_write_umem32(struct ipath_devdata *dd, void __iomem *caddr,
+			      const void __user *uaddr, size_t count)
+{
+	u32 __iomem *reg_addr = caddr;
+	const u32 __iomem *reg_end = reg_addr + (count / sizeof(u32));
+	int ret;
+
+	if (reg_addr < (u32 __iomem *) dd->ipath_kregbase ||
+	    reg_end > (u32 __iomem *) dd->ipath_kregend) {
+		ret = -EINVAL;
+		return ret;
+	}
+	while (reg_addr < reg_end) {
+		u32 data;
+		if (copy_from_user(&data, uaddr, sizeof(data))) {
+			ret = -EFAULT;
+			goto bail;
+		}
+		writel(data, reg_addr);
+
+		reg_addr++;
+		uaddr++;
+	}
+	ret = 0;
+bail:
+	return ret;
+}
+
+static int ipath_diag_open(struct inode *in, struct file *fp)
+{
+	int i, ret;
+
+	if (ipath_diag_alive) {
+		ipath_dbg("Diags already running (pid %u), failing\n",
+			  ipath_diag_alive);
+		ret = -EBUSY;
+		goto bail;
+	}
+
+	for (i = 0; i < atomic_read(&ipath_max); i++) {
+		struct ipath_devdata *dd = ipath_lookup(i);
+
+		if (!dd || !(dd->ipath_flags & IPATH_PRESENT) ||
+		    !dd->ipath_kregbase)
+			continue;
+
+		/*
+		 * we need at least one infinipath device to be
+		 * present (don't use INITTED, because we want to be
+		 * able to open even if device is in freeze mode,
+		 * which cleared INITTED).  There is s small amount of
+		 * risk to this, which is why we also verify kregbase
+		 * is set.
+		 */
+
+		ipath_diag_alive = current->pid;
+		diag_set_link = 0;
+		ipath_dbg("diag device now open, active as PID %u\n",
+			  ipath_diag_alive);
+
+		/* Only expose a way to reset the device if we
+		   make it into diag mode. */
+		ipath_expose_reset(&dd->pcidev->dev);
+
+		ret = 0;
+		goto bail;
+	}
+
+	ipath_dbg("No hardware yet found and initted, failing diags\n");
+	ret = -ENODEV;
+
+bail:
+	return ret;
+}
+
+static int ipath_diag_release(struct inode *i, struct file *f)
+{
+	ipath_diag_alive = 0;
+	ipath_dbg("Closing DIAG device\n");
+	return 0;
+}
+
+static ssize_t ipath_diag_read(struct file *fp, char __user *data,
+			       size_t count, loff_t *off)
+{
+	int unit = 0; /* XXX this is bogus */
+	struct ipath_devdata *dd;
+	void __iomem *kreg_base;
+	ssize_t ret;
+
+	dd = ipath_lookup(unit);
+	if (!dd) {
+		ret = -ENODEV;
+		goto bail;
+	}
+
+	kreg_base = dd->ipath_kregbase;
+
+	if (count == 0)
+		ret = 0;
+	else if ((count % 4) || (*off % 4))
+		/* address or length is not 32-bit aligned, hence invalid */
+		ret = -EINVAL;
+	else if ((count % 8) || (*off % 8))
+		/* address or length not 64-bit aligned; do 32-bit reads */
+		ret = ipath_read_umem32(dd, data, kreg_base + *off, count);
+	else
+		ret = ipath_read_umem64(dd, data, kreg_base + *off, count);
+
+	if (ret >= 0) {
+		*off += count;
+		ret = count;
+	}
+
+bail:
+	return ret;
+}
+
+static ssize_t ipath_diag_write(struct file *fp, const char __user *data,
+				size_t count, loff_t *off)
+{
+	int unit = 0; /* XXX this is bogus */
+	struct ipath_devdata *dd;
+	void __iomem *kreg_base;
+	ssize_t ret;
+
+	dd = ipath_lookup(unit);
+	if (!dd) {
+		ret = -ENODEV;
+		goto bail;
+	}
+	kreg_base = dd->ipath_kregbase;
+
+	if (count == 0)
+		ret = 0;
+	else if ((count % 4) || (*off % 4))
+		/* address or length is not 32-bit aligned, hence invalid */
+		ret = -EINVAL;
+	else if ((count % 8) || (*off % 8))
+		/* address or length not 64-bit aligned; do 32-bit writes */
+		ret = ipath_write_umem32(dd, kreg_base + *off, data, count);
+	else
+		ret = ipath_write_umem64(dd, kreg_base + *off, data, count);
+
+	if (ret >= 0) {
+		*off += count;
+		ret = count;
+	}
+
+bail:
+	return ret;
+}
+
+void ipath_diag_bringup_link(struct ipath_devdata *dd)
+{
+	if (diag_set_link || (dd->ipath_flags & IPATH_LINKACTIVE))
+		return;
+
+	diag_set_link = 1;
+	ipath_cdbg(VERBOSE, "Trying to set to set link active for "
+		   "diag pkt\n");
+	ipath_layer_set_linkstate(dd, IPATH_IB_LINKARM);
+	ipath_layer_set_linkstate(dd, IPATH_IB_LINKACTIVE);
+}
diff -r 1123028ac13a -r 28bb276205de drivers/infiniband/hw/ipath/ipath_sma.c
--- /dev/null	Thu Jan  1 00:00:00 1970 +0000
+++ b/drivers/infiniband/hw/ipath/ipath_sma.c	Thu Mar  9 16:16:04 2006 -0800
@@ -0,0 +1,385 @@
+/*
+ * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/io.h>
+#include <linux/pci.h>
+#include <linux/cdev.h>
+#include <linux/vmalloc.h>
+#include <asm/uaccess.h>
+
+#include "ipath_kernel.h"
+#include "ips_common.h"
+#include "ipath_layer.h"
+
+/* PID of SMA, if it's running.  Atomic because it's accessed without
+ * a lock outside this file. */
+atomic_t ipath_sma_alive;
+DEFINE_SPINLOCK(ipath_sma_lock);	/* SMA receive */
+
+/* max SM received packets we'll queue; we keep the most recent packets. */
+
+struct _ipath_sma_rpkt ipath_sma_data[IPATH_NUM_SMA_PKTS];
+
+unsigned ipath_sma_first;	/* oldest sma packet index */
+unsigned ipath_sma_next;	/* next sma packet index to use */
+
+/*
+ * ipath_sma_data_bufs has one extra, pointed to by ipath_sma_data_spare,
+ * so we can exchange buffers to do copy_to_user, and not hold the lock
+ * across the copy_to_user().
+ */
+
+u8 ipath_sma_data_bufs[IPATH_NUM_SMA_PKTS + 1][IPATH_SMA_MAX_PKTSZ];
+u8 *ipath_sma_data_spare;
+/* sma waits globally on all units */
+wait_queue_head_t ipath_sma_wait;
+wait_queue_head_t ipath_sma_state_wait;
+
+static int ipath_sma_open(struct inode *in, struct file *fp);
+static int ipath_sma_release(struct inode *in, struct file *fp);
+static ssize_t ipath_sma_read(struct file *fp, char __user *data,
+			      size_t count, loff_t *off);
+static ssize_t ipath_sma_write(struct file *fp, const char __user *data,
+			       size_t count, loff_t *off);
+
+static struct file_operations sma_file_ops = {
+	.owner = THIS_MODULE,
+	.write = ipath_sma_write,
+	.read = ipath_sma_read,
+	.open = ipath_sma_open,
+	.release = ipath_sma_release
+};
+
+static struct cdev *sma_cdev;
+static struct class_device *sma_class_dev;
+
+int ipath_sma_init(void)
+{
+	return ipath_cdev_init(IPATH_SMA_MINOR, "ipath_sma", &sma_file_ops,
+			       &sma_cdev, &sma_class_dev);
+}
+
+void ipath_sma_cleanup(void)
+{
+	ipath_cdev_cleanup(&sma_cdev, &sma_class_dev);
+}
+
+static int ipath_sma_open(struct inode *in, struct file *fp)
+{
+	unsigned long flags;
+	__kernel_pid_t pid;
+	int s, ret;
+
+	spin_lock_irqsave(&ipath_sma_lock, flags);
+
+	pid = atomic_read(&ipath_sma_alive);
+	if (pid) {
+		ipath_dbg("SMA already running (pid %u), failing\n", pid);
+		ret = -EBUSY;
+		goto bail;
+	}
+
+	for (s = 0; s < atomic_read(&ipath_max); s++) {
+		struct ipath_devdata *dd = ipath_lookup(s);
+		/* we need at least one infinipath device to be initialized. */
+		if (dd && dd->ipath_flags & IPATH_INITTED) {
+			pid = current->pid;
+			*dd->ipath_statusp |= IPATH_STATUS_SMA;
+			*dd->ipath_statusp &= ~IPATH_STATUS_OIB_SMA;
+		}
+	}
+	if (pid) {
+		ipath_cdbg(SMA, "SMA device now open, SMA active as PID %u\n",
+			   pid);
+		atomic_set(&ipath_sma_alive, pid);
+		ret = 0;
+	} else {
+		ret = -ENODEV;
+		ipath_dbg("No hardware yet found and initted, failing\n");
+	}
+bail:
+	spin_unlock_irqrestore(&ipath_sma_lock, flags);
+	return ret;
+}
+
+static int ipath_sma_release(struct inode *in, struct file *fp)
+{
+	unsigned long flags;
+	int s;
+
+	spin_lock_irqsave(&ipath_sma_lock, flags);
+
+	atomic_set(&ipath_sma_alive, 0);
+	ipath_cdbg(SMA, "Closing SMA device\n");
+	for (s = 0; s < atomic_read(&ipath_max); s++) {
+		struct ipath_devdata *dd = ipath_lookup(s);
+
+		if (!dd || !(dd->ipath_flags & IPATH_INITTED))
+			continue;
+		*dd->ipath_statusp &= ~IPATH_STATUS_SMA;
+		if (dd->verbs_layer.l_flags & IPATH_VERBS_KERNEL_SMA)
+			*dd->ipath_statusp |= IPATH_STATUS_OIB_SMA;
+	}
+	spin_unlock_irqrestore(&ipath_sma_lock, flags);
+	return 0;
+}
+
+/**
+ * ipath_sma_read - receive an IB packet with QP 0 or 1
+ * @fp: the SMA device file pointer
+ * @data: ipath_sma_pkt structure saying where to place the SMA data
+ * @count: unused by this code
+ * @off: unused by this code
+ *
+ * This receives from all/any of the active chips, and we currently do not
+ * allow specifying just one (we could, by filling in unit in the library
+ * before the syscall, and checking here).
+ */
+static ssize_t ipath_sma_read(struct file *fp, char __user *data,
+			      size_t count, loff_t *off)
+{
+	struct _ipath_sma_rpkt *smpkt;
+	struct ipath_sma_pkt rp;
+	unsigned long flags;
+	ssize_t ret;
+	u32 rp_len;
+	int i, any;
+	u8 *sdata;
+	u32 slen;
+
+	if (copy_from_user(&rp, data, sizeof(rp))) {
+		ret = -EFAULT;
+		goto bail;
+	}
+
+	if (!ipath_sma_data_spare) {
+		ipath_dbg("can't do receive, sma not initialized\n");
+		ret = -ENETDOWN;
+		goto bail;
+	}
+
+	for (any = i = 0; i < atomic_read(&ipath_max); i++) {
+		struct ipath_devdata *dd = ipath_lookup(i);
+		if (dd && (dd->ipath_flags & IPATH_INITTED))
+			any++;
+	}
+
+	if (!any) {		/* no hardware, freeze, etc. */
+		ipath_cdbg(SMA, "Didn't find any initialized and usable chips\n");
+		ret = -ENODEV;
+		goto bail;
+	}
+
+	wait_event_interruptible(ipath_sma_wait,
+				 ipath_sma_data[ipath_sma_first].len);
+
+	spin_lock_irqsave(&ipath_sma_lock, flags);
+	if (ipath_sma_data[ipath_sma_first].len == 0) {
+		/* usually means SMA process received a signal */
+		spin_unlock_irqrestore(&ipath_sma_lock, flags);
+		ret = -EAGAIN;
+		goto bail;
+	}
+
+	smpkt = &ipath_sma_data[ipath_sma_first];
+
+	/*
+	 * we swap out the buffer we are going to use with the spare buffer
+	 * and set spare to that buffer.  This code is the only code that
+	 * ever manipulates spare, other than the initialization code.
+	 * This code should never be entered by more than one process at
+	 * a time, and if it is, the user  code doing so deserves what it gets;
+	 * it won't break anything in the driver by doing so.  We do it this
+	 * way to avoid holding a lock across the copy_to_user, which could
+	 * fault, or delay a long time while paging occurs; ditto for printks
+	 */
+
+	rp_len = rp.len;
+	rp.len = smpkt->len;
+	rp.unit = smpkt->unit;
+
+	slen = smpkt->len;
+	sdata = smpkt->buf;
+	smpkt->buf = ipath_sma_data_spare;
+	ipath_sma_data_spare = sdata;
+	smpkt->len = 0;	/* it's available again */
+	if (++ipath_sma_first >= IPATH_NUM_SMA_PKTS)
+		ipath_sma_first = 0;
+	spin_unlock_irqrestore(&ipath_sma_lock, flags);
+
+	if (copy_to_user((void __user *) (unsigned long) rp.data,
+			 sdata, min(rp_len, slen))) {
+		ret = -EFAULT;
+		goto bail;
+	}
+
+	if (copy_to_user(data, &rp, sizeof(rp))) {
+		ret = -EFAULT;
+		goto bail;
+	}
+
+	ret = sizeof(rp);
+bail:
+	return ret;
+}
+
+/**
+ * ipath_sma_write - receive an IB packet with QP 0 or 1
+ * @fp: the SMA device file pointer
+ * @data: ipath_sma_pkt structure saying where to get the SMA packet
+ * @count: size of data to write
+ * @off: unused by this code
+ *
+ * This routine is no longer on any critical paths; it is used only
+ * for sending SMA packets, and some diagnostic usage.
+ * Because it's currently sma only, there are no checks to see if the
+ * link is up; sma must be able to send in the not fully initialized state
+ */
+static ssize_t ipath_sma_write(struct file *fp, const char __user *data,
+			       size_t count, loff_t *off)
+{
+	u32 __iomem *piobuf;
+	u32 plen, clen, pbufn;
+	struct ipath_sma_pkt sp;
+	u32 *tmpbuf = NULL;
+	struct ipath_devdata *dd;
+	ssize_t ret = 0;
+	u64 val;
+
+	if (count < sizeof(sp)) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	if (copy_from_user(&sp, data, sizeof(sp))) {
+		ret = -EFAULT;
+		goto bail;
+	}
+
+	// send count must be an exact number of dwords
+	if (sp.len & 3) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	clen = sp.len >> 2;
+
+	dd = ipath_lookup(sp.unit);
+	if (!dd || !(dd->ipath_flags & IPATH_PRESENT) || !dd->ipath_kregbase) {
+		ipath_cdbg(SMA, "illegal unit %u for sma send\n", sp.unit);
+		ret = -ENODEV;
+		goto bail;
+	}
+
+	if (ipath_diag_alive)
+		ipath_diag_bringup_link(dd);
+
+	if (!(dd->ipath_flags & IPATH_INITTED)) {
+		/* no hardware, freeze, etc. */
+		ipath_cdbg(SMA, "unit %u not usable\n", dd->ipath_unit);
+		ret = -ENODEV;
+		goto bail;
+	}
+	val = dd->ipath_lastibcstat & IPATH_IBSTATE_MASK;
+	if (val != IPATH_IBSTATE_INIT && val != IPATH_IBSTATE_ARM &&
+	    val != IPATH_IBSTATE_ACTIVE) {
+		ipath_cdbg(SMA, "unit %u not ready (state %llx)\n",
+			   dd->ipath_unit, (unsigned long long) val);
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	/* need total length before first word written */
+	plen = sizeof(u32) + sp.len;	/* +1 word is for the qword padding */
+
+	if ((plen + 4) > dd->ipath_ibmaxlen) {
+		ipath_dbg("Pkt len 0x%x > ibmaxlen %x\n",
+			  plen - 4, dd->ipath_ibmaxlen);
+		ret = -EINVAL;
+		goto bail;	/* before writing pbc */
+	}
+	tmpbuf = vmalloc(plen);
+	if (!tmpbuf) {
+		dev_info(&dd->pcidev->dev, "Unable to allocate tmp buffer, "
+			 "failing\n");
+		ret = -ENOMEM;
+		goto bail;
+	}
+
+	if (copy_from_user(tmpbuf,
+			   (const void __user *) (unsigned long) sp.data,
+			   sp.len)) {
+		ret = -EFAULT;
+		goto bail;
+	}
+
+	piobuf = ipath_getpiobuf(dd, &pbufn);
+	if (!piobuf) {
+		ipath_cdbg(SMA, "No PIO buffers available unit %u %u times\n",
+			   dd->ipath_unit, dd->ipath_nosma_bufs);
+		dd->ipath_nosma_bufs++;
+		ret = -EBUSY;
+		goto bail;
+	}
+	if (dd->ipath_nosma_bufs) {
+		ipath_cdbg(SMA, "Unit %u got SMA send buffer after %u failures, %u seconds\n",
+			   dd->ipath_unit, dd->ipath_nosma_bufs,
+			   dd->ipath_nosma_secs);
+		dd->ipath_nosma_bufs = 0;
+		dd->ipath_nosma_secs = 0;
+	}
+
+	plen >>= 2;		/* in dwords */
+
+	if (ipath_debug & __IPATH_PKTDBG)	// SMA and PKT, both
+		ipath_cdbg(SMA, "unit %u 0x%x+1w pio%d\n",
+			   dd->ipath_unit, plen - 1, pbufn);
+
+	/* we have to flush after the PBC for correctness on some cpus
+	 * or WC buffer can be written out of order */
+	writeq(plen, piobuf);
+	ipath_flush_wc();
+	/* copy all by the trigger word, then flush, so it's written
+	 * to chip before trigger word, then write trigger word, then
+	 * flush again, so packet is sent. */
+	__iowrite32_copy(piobuf + 2, tmpbuf, clen - 1);
+	ipath_flush_wc();
+	__raw_writel(tmpbuf[clen - 1], piobuf + clen + 1);
+	ipath_flush_wc();
+	ipath_stats.sps_sma_spkts++;
+
+	ret = sizeof(sp);
+
+bail:
+	vfree(tmpbuf);
+	return ret;
+}

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-09 23:52   ` Bryan O'Sullivan
  2006-03-10  0:00     ` Roland Dreier
@ 2006-03-10  0:45     ` Greg KH
  2006-03-10  0:48       ` Bryan O'Sullivan
  1 sibling, 1 reply; 31+ messages in thread
From: Greg KH @ 2006-03-10  0:45 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: Roland Dreier, rolandd, akpm, davem, linux-kernel, openib-general

On Thu, Mar 09, 2006 at 03:52:47PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 15:26 -0800, Roland Dreier wrote:
> 
> > Similarly what protects against another process opening the device
> > right after the ipath_sma_alive = 0 setting, but before you do all the
> > cleanup that's after that?
> 
> This is fixed by the stuff I just did in response to your earlier
> message.
> 
> > And what protects against a hot unplug of a device after the test of s
> > against ipath_max?
> 
> We don't support hotplugged devices at the moment.

Why not?  Your cards can't be placed in a machine that supports PCI
Hotplug (or PCI-E hotplug)?  You can't really tell users that (no matter
how often I have wished I could...)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  0:35 ` [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management Bryan O'Sullivan
@ 2006-03-10  0:45   ` Roland Dreier
  2006-03-10  0:47     ` Bryan O'Sullivan
  0 siblings, 1 reply; 31+ messages in thread
From: Roland Dreier @ 2006-03-10  0:45 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

OK, now I can see the change you made:

 > +atomic_t ipath_sma_alive;
 > +DEFINE_SPINLOCK(ipath_sma_lock);	/* SMA receive */

So why is ipath_sma_alive an atomic_t (and why isn't it static)?
You never modify ipath_sma_alive outside of your spinlock, so I don't
see what having it be atomic buys you.

 - R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  0:45   ` Roland Dreier
@ 2006-03-10  0:47     ` Bryan O'Sullivan
  2006-03-10  0:52       ` Roland Dreier
  0 siblings, 1 reply; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-10  0:47 UTC (permalink / raw)
  To: Roland Dreier; +Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

On Thu, 2006-03-09 at 16:45 -0800, Roland Dreier wrote:

> So why is ipath_sma_alive an atomic_t (and why isn't it static)?
> You never modify ipath_sma_alive outside of your spinlock, so I don't
> see what having it be atomic buys you.

It's read outside of this file, without a lock held.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  0:45     ` Greg KH
@ 2006-03-10  0:48       ` Bryan O'Sullivan
  2006-03-10  1:04         ` Greg KH
  0 siblings, 1 reply; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-10  0:48 UTC (permalink / raw)
  To: Greg KH; +Cc: Roland Dreier, rolandd, akpm, davem, linux-kernel, openib-general

On Thu, 2006-03-09 at 16:45 -0800, Greg KH wrote:

> > We don't support hotplugged devices at the moment.
> 
> Why not?  Your cards can't be placed in a machine that supports PCI
> Hotplug (or PCI-E hotplug)?

No, the driver and userspace code doesn't support it yet.  That's all.

> You can't really tell users that (no matter
> how often I have wished I could...)

I don't expect this to be a practical problem.  We're planning to add
hotplug support to the driver once we have some cycles free.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  0:47     ` Bryan O'Sullivan
@ 2006-03-10  0:52       ` Roland Dreier
  0 siblings, 0 replies; 31+ messages in thread
From: Roland Dreier @ 2006-03-10  0:52 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: rolandd, gregkh, akpm, davem, linux-kernel, openib-general

    Bryan> It's read outside of this file, without a lock held.

I missed the other reference in another patch.  But the central point
still stands: if all you do is atomic_set() and atomic_read(), then
using atomic_t doesn't buy you anything.  Just look at what
atomic_read() expands to -- using it isn't protecting you against
anything, so either you have a race, or you were safe without
atomic_t.  The only point to atomic_t is so that you can safely do
read-modify-write things like atomic_inc().

 - R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  0:48       ` Bryan O'Sullivan
@ 2006-03-10  1:04         ` Greg KH
  2006-03-10  4:41           ` Bryan O'Sullivan
  0 siblings, 1 reply; 31+ messages in thread
From: Greg KH @ 2006-03-10  1:04 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: Roland Dreier, rolandd, akpm, davem, linux-kernel, openib-general

On Thu, Mar 09, 2006 at 04:48:45PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 16:45 -0800, Greg KH wrote:
> 
> > > We don't support hotplugged devices at the moment.
> > 
> > Why not?  Your cards can't be placed in a machine that supports PCI
> > Hotplug (or PCI-E hotplug)?
> 
> No, the driver and userspace code doesn't support it yet.  That's all.
> 
> > You can't really tell users that (no matter
> > how often I have wished I could...)
> 
> I don't expect this to be a practical problem.  We're planning to add
> hotplug support to the driver once we have some cycles free.

Ugh, that means it's never going to be there.

All new PCI drivers have the requirement that they work properly in
hotplug systems, as they should follow the PCI core api.  If not, odds
are they will not be accepted into the tree :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  1:04         ` Greg KH
@ 2006-03-10  4:41           ` Bryan O'Sullivan
  2006-03-10  5:48             ` Greg KH
  2006-03-10  5:55             ` Roland Dreier
  0 siblings, 2 replies; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-10  4:41 UTC (permalink / raw)
  To: Greg KH; +Cc: Roland Dreier, rolandd, akpm, davem, linux-kernel, openib-general

On Thu, 2006-03-09 at 17:04 -0800, Greg KH wrote:

> > I don't expect this to be a practical problem.  We're planning to add
> > hotplug support to the driver once we have some cycles free.
> 
> Ugh, that means it's never going to be there.
> 
> All new PCI drivers have the requirement that they work properly in
> hotplug systems, as they should follow the PCI core api.  If not, odds
> are they will not be accepted into the tree :(

Okay, maybe we're talking at cross purposes here.  We do follow the PCI
core API.  We have a __devinit probe and __devexit remove routine, a
MODULE_DEVICE_TABLE, the kernel generates hotplug events when a device
is detected or the driver is unloaded, and so on.

I *assumed* that there was something more that we would need to do in
order to support real hotplug of actual physical cards, but now that I
look more closely, it doesn't appear that there is.  At least, there's
nothing in Documentation/pci.txt or LDD3 that indicates to me that we
ought to be doing more.

Am I missing something?

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  4:41           ` Bryan O'Sullivan
@ 2006-03-10  5:48             ` Greg KH
  2006-03-10 13:40               ` Bryan O'Sullivan
  2006-03-10  5:55             ` Roland Dreier
  1 sibling, 1 reply; 31+ messages in thread
From: Greg KH @ 2006-03-10  5:48 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: Roland Dreier, rolandd, akpm, davem, linux-kernel, openib-general

On Thu, Mar 09, 2006 at 08:41:36PM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 17:04 -0800, Greg KH wrote:
> 
> > > I don't expect this to be a practical problem.  We're planning to add
> > > hotplug support to the driver once we have some cycles free.
> > 
> > Ugh, that means it's never going to be there.
> > 
> > All new PCI drivers have the requirement that they work properly in
> > hotplug systems, as they should follow the PCI core api.  If not, odds
> > are they will not be accepted into the tree :(
> 
> Okay, maybe we're talking at cross purposes here.  We do follow the PCI
> core API.  We have a __devinit probe and __devexit remove routine, a
> MODULE_DEVICE_TABLE, the kernel generates hotplug events when a device
> is detected or the driver is unloaded, and so on.
> 
> I *assumed* that there was something more that we would need to do in
> order to support real hotplug of actual physical cards, but now that I
> look more closely, it doesn't appear that there is.  At least, there's
> nothing in Documentation/pci.txt or LDD3 that indicates to me that we
> ought to be doing more.
> 
> Am I missing something?

Nope, that's all that you need to do.  Your driver will be notified that
the device will be going away by calling the disconnect function.  So
great, nothing needs to be done :)

Oh, and you can test this out if you don't have a pci hotplug system by
using the fakephp driver and disconnecting your device that way.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  4:41           ` Bryan O'Sullivan
  2006-03-10  5:48             ` Greg KH
@ 2006-03-10  5:55             ` Roland Dreier
  2006-03-10 13:43               ` Bryan O'Sullivan
  1 sibling, 1 reply; 31+ messages in thread
From: Roland Dreier @ 2006-03-10  5:55 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: Greg KH, rolandd, akpm, davem, linux-kernel, openib-general

    Bryan> I *assumed* that there was something more that we would
    Bryan> need to do in order to support real hotplug of actual
    Bryan> physical cards, but now that I look more closely, it
    Bryan> doesn't appear that there is.  At least, there's nothing in
    Bryan> Documentation/pci.txt or LDD3 that indicates to me that we
    Bryan> ought to be doing more.

    Bryan> Am I missing something?

No, the only problems are with the way the various pieces of your
drivers refer to devices by index.  There are obvious races such as

 > +int __init ipath_verbs_init(void)
 > +{
 > +	int i;
 > +
 > +	number_of_devices = ipath_layer_get_num_of_dev();
 > +	i = number_of_devices * sizeof(struct ipath_ibdev *);
 > +	ipath_devices = kmalloc(i, GFP_ATOMIC);
 > +	if (ipath_devices == NULL)
 > +		return -ENOMEM;
 > +
 > +	for (i = 0; i < number_of_devices; i++) {
 > +		struct ipath_devdata *dd;
 > +		int ret = ipath_verbs_register(i, ipath_ib_piobufavail,
 > +					       ipath_ib_rcv, ipath_ib_timer,
 > +					       &dd);

suppose number_of_devices gets set to 5 but by the time you call
ipath_verbs_register(5,...), the 5th device has been unplugged?

Also you only do this when the module is loaded, so you won't handle
devices that are hot-plugged later.  And I don't see anything that
would handle hot unplug either.

Pretty much any use of ipath_max is probably broken by hot plug.

 - R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  5:48             ` Greg KH
@ 2006-03-10 13:40               ` Bryan O'Sullivan
  0 siblings, 0 replies; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-10 13:40 UTC (permalink / raw)
  To: Greg KH; +Cc: Roland Dreier, rolandd, akpm, davem, linux-kernel, openib-general

On Thu, 2006-03-09 at 21:48 -0800, Greg KH wrote:

> Oh, and you can test this out if you don't have a pci hotplug system by
> using the fakephp driver and disconnecting your device that way.

That's good to know, thanks.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10  5:55             ` Roland Dreier
@ 2006-03-10 13:43               ` Bryan O'Sullivan
  2006-03-10 16:58                 ` Greg KH
  2006-03-10 17:08                 ` Roland Dreier
  0 siblings, 2 replies; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-10 13:43 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Greg KH, rolandd, akpm, davem, linux-kernel, openib-general

On Thu, 2006-03-09 at 21:55 -0800, Roland Dreier wrote:

> No, the only problems are with the way the various pieces of your
> drivers refer to devices by index.

OK.  What's a safe way to iterate over the devices in the presence of
hotplug, then?  I assume it's list_for_each_mumble; I just don't know
what mumble is :-)

> Also you only do this when the module is loaded, so you won't handle
> devices that are hot-plugged later.

No, ipath_max is updated any time a probe routine is called.

>   And I don't see anything that
> would handle hot unplug either.

What would this anything look like, if I were hoping for an example to
emulate?  There's nothing in LDD3 about this, so I'm kind of in the
dark.

	<b

-- 
Bryan O'Sullivan <bos@pathscale.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-09 23:39   ` Bryan O'Sullivan
  2006-03-09 23:47     ` Roland Dreier
@ 2006-03-10 15:54     ` Michael S. Tsirkin
  2006-03-10 16:05       ` Bryan O'Sullivan
  1 sibling, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2006-03-10 15:54 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: Roland Dreier, akpm, gregkh, linux-kernel, openib-general, davem

Quoting r. Bryan O'Sullivan <bos@pathscale.com>:
> Subject: Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
> 
> On Thu, 2006-03-09 at 15:20 -0800, Roland Dreier wrote:
> 
> > I've never understood what forces you to maintain two separate SMAs.
> > Why can't you pick one of the two SMAs and use that unconditionally?
> 
> Three reasons.
> 
>       * OpenSM wasn't usable when we wrote our SMA.  We have customers
>         using ours now, so we have to support it.

Presumably you mean the ib_mad SMA - OpenSM is not an SMA.
I don't understand this. You don't talk to an SMA directly -
its jobs is to receive and send management packets that the card
gets from a subnet manager. So what do customers care which SMA
implementation is used, as long as it formats the management packets
correctly?

If you want to extend the SMA in non-standard ways, you can
snoop and send management packets by loading ib_umad.

>       * Our SMA does some setup for the layered ethernet emulation
>         driver.

You are doing this from userspace, right? But you can already send/receive MADs
from userspace by loading ib_umad.  What is there that is not sufficient for
your purposes?

By the way, this might also be a clean way for you to get at the port info
node info and port counters atomically like you wanted to in another thread:
post a management packet to the local device, get a packet back.
No need for extra sysfs files.

>       * Our SMA works without an IB stack of any kind present.

The stack is pretty slim, AFAIK it mostly consists of an SMA implementation.
So where's the benefit in avoiding it? Just link against ib_mad, it will
get loaded atomatically.

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10 15:54     ` Michael S. Tsirkin
@ 2006-03-10 16:05       ` Bryan O'Sullivan
  0 siblings, 0 replies; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-10 16:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Roland Dreier, akpm, gregkh, linux-kernel, openib-general, davem

On Fri, 2006-03-10 at 17:54 +0200, Michael S. Tsirkin wrote:

> >       * OpenSM wasn't usable when we wrote our SMA.  We have customers
> >         using ours now, so we have to support it.
> 
> Presumably you mean the ib_mad SMA - OpenSM is not an SMA.

Yes, I already mentioned that I got my terms swapped in another message.

> So what do customers care which SMA
> implementation is used, as long as it formats the management packets
> correctly?

Many, perhaps most right now, of our customers don't have a full IB
stack loaded.  That's why we have this small userspace SMA.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10 13:43               ` Bryan O'Sullivan
@ 2006-03-10 16:58                 ` Greg KH
  2006-03-10 17:05                   ` Bryan O'Sullivan
  2006-03-10 17:08                 ` Roland Dreier
  1 sibling, 1 reply; 31+ messages in thread
From: Greg KH @ 2006-03-10 16:58 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: Roland Dreier, rolandd, akpm, davem, linux-kernel, openib-general

On Fri, Mar 10, 2006 at 05:43:50AM -0800, Bryan O'Sullivan wrote:
> On Thu, 2006-03-09 at 21:55 -0800, Roland Dreier wrote:
> 
> > No, the only problems are with the way the various pieces of your
> > drivers refer to devices by index.
> 
> OK.  What's a safe way to iterate over the devices in the presence of
> hotplug, then?  I assume it's list_for_each_mumble; I just don't know
> what mumble is :-)

You keep an internal list of devices, if you really need to do such a
thing.

> > Also you only do this when the module is loaded, so you won't handle
> > devices that are hot-plugged later.
> 
> No, ipath_max is updated any time a probe routine is called.
> 
> >   And I don't see anything that
> > would handle hot unplug either.
> 
> What would this anything look like, if I were hoping for an example to
> emulate?  There's nothing in LDD3 about this, so I'm kind of in the
> dark.

It's just the "disconnect" PCI function being called, which can happen
at any time.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10 16:58                 ` Greg KH
@ 2006-03-10 17:05                   ` Bryan O'Sullivan
  0 siblings, 0 replies; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-10 17:05 UTC (permalink / raw)
  To: Greg KH; +Cc: Roland Dreier, rolandd, akpm, davem, linux-kernel, openib-general

On Fri, 2006-03-10 at 08:58 -0800, Greg KH wrote:

> You keep an internal list of devices, if you really need to do such a
> thing.

OK.  I do think we need it, because we have a dozen or so cases where we
do "iterate over all of the known devices".

> It's just the "disconnect" PCI function being called, which can happen
> at any time.

Thanks.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10 13:43               ` Bryan O'Sullivan
  2006-03-10 16:58                 ` Greg KH
@ 2006-03-10 17:08                 ` Roland Dreier
  2006-03-10 17:32                   ` Bryan O'Sullivan
  1 sibling, 1 reply; 31+ messages in thread
From: Roland Dreier @ 2006-03-10 17:08 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: Greg KH, rolandd, akpm, davem, linux-kernel, openib-general

    Bryan> OK.  What's a safe way to iterate over the devices in the
    Bryan> presence of hotplug, then?  I assume it's
    Bryan> list_for_each_mumble; I just don't know what mumble is :-)

You need something that takes a reference to each device while you're
looking at it, like for_each_pci_dev().  But in general iterating
through devices is usually the wrong thing to do, because devices can
come and go in the middle of your loop.  It's better to be driven by
the add and remove callbacks.

    Bryan> No, ipath_max is updated any time a probe routine is
    Bryan> called.

Yes, that's true.  (BTW, what does making ipath_max an atomic_t get
you?  The updates are protected by a lock anyway).  But I was talking
about the code in ipath_verbs_init(), which is the only place you call
ipath_verbs_register() that I could find.  You make one pass through
the devices that are present when ipath_verbs_init() is called at
module load time, and any devices that get added later are missed.

Similarly, if a device is unplugged while the verbs module is loaded,
there's no notification from the core driver of that, and you'll go
ahead and do ipath_verbs_unregister() on a device that is long gone
when you get to ipath_verbs_cleanup().

 - R.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10 17:08                 ` Roland Dreier
@ 2006-03-10 17:32                   ` Bryan O'Sullivan
  2006-03-10 22:20                     ` Roland Dreier
  0 siblings, 1 reply; 31+ messages in thread
From: Bryan O'Sullivan @ 2006-03-10 17:32 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Greg KH, rolandd, akpm, davem, linux-kernel, openib-general

On Fri, 2006-03-10 at 09:08 -0800, Roland Dreier wrote:
>     Bryan> OK.  What's a safe way to iterate over the devices in the
>     Bryan> presence of hotplug, then?  I assume it's
>     Bryan> list_for_each_mumble; I just don't know what mumble is :-)
> 
> You need something that takes a reference to each device while you're
> looking at it, like for_each_pci_dev().

OK, thanks.

It seems like the thing to do to be fully safe might be to have
ipath_get() (just rename ipath_lookup) and ipath_put() functions.  Embed
a kref inside ipath_devdata, and have ipath_dev_get increment the ref
count on both the ipath_devdata and the pci_dev.

Is that sane, or am I way off base?

>   But in general iterating
> through devices is usually the wrong thing to do, because devices can
> come and go in the middle of your loop.

I understand that.  In practice, though, I think this is not a good
approach in many of the cases we're dealing with.  Every use of
ipath_max iterates over all devices for a reason.

For example, the diag open routine wants to find at least one device
that's up.  We could maintain a separate list of devices that are in the
state that it needs, so that it can just get the first entry off that
list or fail if the list is empty, but that's more complex than simply
iterating over every device and checking each one.

> (BTW, what does making ipath_max an atomic_t get
> you?  The updates are protected by a lock anyway).

Probably not much.  The motivation was to ensure that if it got
incremented during an iteration, whoever was iterating would see the
update in a timely fashion.

>   But I was talking
> about the code in ipath_verbs_init(), which is the only place you call
> ipath_verbs_register() that I could find.  You make one pass through
> the devices that are present when ipath_verbs_init() is called at
> module load time, and any devices that get added later are missed.

Yes, that code ought to be restructured.

Thanks for the helpful feedback.

	<b


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management
  2006-03-10 17:32                   ` Bryan O'Sullivan
@ 2006-03-10 22:20                     ` Roland Dreier
  0 siblings, 0 replies; 31+ messages in thread
From: Roland Dreier @ 2006-03-10 22:20 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: Greg KH, rolandd, akpm, davem, linux-kernel, openib-general

    Bryan> Probably not much.  The motivation was to ensure that if it
    Bryan> got incremented during an iteration, whoever was iterating
    Bryan> would see the update in a timely fashion.

But as far as I can see, you never do atomic_inc() -- only
atomic_set() under a spinlock.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2006-03-10 22:21 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <eac2ad3017b5f160d24c.1141922822@localhost.localdomain>
2006-03-09 23:20 ` [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management Roland Dreier
2006-03-09 23:39   ` Bryan O'Sullivan
2006-03-09 23:47     ` Roland Dreier
2006-03-09 23:50       ` Bryan O'Sullivan
2006-03-09 23:52         ` Roland Dreier
2006-03-10 15:54     ` Michael S. Tsirkin
2006-03-10 16:05       ` Bryan O'Sullivan
2006-03-09 23:24 ` Roland Dreier
2006-03-09 23:49   ` Bryan O'Sullivan
2006-03-09 23:51     ` Roland Dreier
2006-03-09 23:26 ` Roland Dreier
2006-03-09 23:52   ` Bryan O'Sullivan
2006-03-10  0:00     ` Roland Dreier
2006-03-10  0:04       ` Bryan O'Sullivan
2006-03-10  0:45     ` Greg KH
2006-03-10  0:48       ` Bryan O'Sullivan
2006-03-10  1:04         ` Greg KH
2006-03-10  4:41           ` Bryan O'Sullivan
2006-03-10  5:48             ` Greg KH
2006-03-10 13:40               ` Bryan O'Sullivan
2006-03-10  5:55             ` Roland Dreier
2006-03-10 13:43               ` Bryan O'Sullivan
2006-03-10 16:58                 ` Greg KH
2006-03-10 17:05                   ` Bryan O'Sullivan
2006-03-10 17:08                 ` Roland Dreier
2006-03-10 17:32                   ` Bryan O'Sullivan
2006-03-10 22:20                     ` Roland Dreier
2006-03-10  0:35 [PATCH 0 of 20] [RFC] ipath driver - another round for review Bryan O'Sullivan
2006-03-10  0:35 ` [PATCH 9 of 20] ipath - char devices for diagnostics and lightweight subnet management Bryan O'Sullivan
2006-03-10  0:45   ` Roland Dreier
2006-03-10  0:47     ` Bryan O'Sullivan
2006-03-10  0:52       ` Roland Dreier

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.