* Re: [PATCH 1/8] [I/OAT] DMA memcpy subsystem
From: Chris Leech @ 2006-03-06 19:39 UTC
To: David S. Miller; +Cc: linux-kernel, netdev
In-Reply-To: <20060303.174048.14793187.davem@davemloft.net>
On 3/3/06, David S. Miller <davem@davemloft.net> wrote:
> > +static spinlock_t dma_list_lock;
>
> Please use DEFINE_SPINLOCK().
>
> > +static void dma_chan_free_rcu(struct rcu_head *rcu) {
>
> Newline before the brace please.
>
> > +static void dma_async_device_cleanup(struct kref *kref) {
>
> Newline before the brace please.
>
> > +struct dma_chan_percpu
> > +{
>
> Left brace on the same line as "struct dma_chan_percpu" please.
>
> > +struct dma_chan
> > +{
>
> Similarly.
>
> Otherwise this patch looks mostly ok.
Thanks Dave,
I'll apply these and other feedback and get updated patches generated.
- Chris
* [git patches] net driver fixes
From: Jeff Garzik @ 2006-03-06 19:39 UTC
To: Andrew Morton, Linus Torvalds; +Cc: netdev, linux-kernel
Please pull from 'upstream-fixes' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
to receive the following updates:
 drivers/net/chelsio/espi.c |    4 +---
 drivers/net/s2io.c         |    1 +
 2 files changed, 2 insertions(+), 3 deletions(-)

Eric Sesterhenn:
      chelsio: fix kmalloc failure in t1_espi_create

Jeff Garzik:
      s2io: set_multicast_list bug

diff --git a/drivers/net/chelsio/espi.c b/drivers/net/chelsio/espi.c
index 2306425..e824aca 100644
--- a/drivers/net/chelsio/espi.c
+++ b/drivers/net/chelsio/espi.c
@@ -296,9 +296,7 @@ void t1_espi_destroy(struct peespi *espi
struct peespi *t1_espi_create(adapter_t *adapter)
{
- struct peespi *espi = kmalloc(sizeof(*espi), GFP_KERNEL);
-
- memset(espi, 0, sizeof(*espi));
+ struct peespi *espi = kzalloc(sizeof(*espi), GFP_KERNEL);
if (espi)
espi->adapter = adapter;
diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 49b597c..b7f00d6 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -4092,6 +4092,7 @@ static void s2io_set_multicast(struct ne
i++, mclist = mclist->next) {
memcpy(sp->usr_addrs[i].addr, mclist->dmi_addr,
ETH_ALEN);
+ mac_addr = 0;
for (j = 0; j < ETH_ALEN; j++) {
mac_addr |= mclist->dmi_addr[j];
mac_addr <<= 8;
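The s2io hunk above clears mac_addr at the top of each pass over the multicast list. A minimal userspace sketch of why that reset matters, assuming a 64-bit accumulator declared outside the per-address loop. The names pack_into and pack_all are hypothetical, and the packing order is simplified relative to the driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Pack one 6-byte address into the low 48 bits of *acc, continuing from
 * whatever value *acc already holds. */
static void pack_into(uint64_t *acc, const uint8_t addr[ETH_ALEN])
{
	for (size_t j = 0; j < ETH_ALEN; j++) {
		*acc <<= 8;
		*acc |= addr[j];
	}
}

/* Pack n addresses into out[]. When reset_each is zero the accumulator is
 * never cleared between addresses, which reproduces the bug class the
 * one-line fix addresses: bits of the previous address leak into the next. */
static void pack_all(const uint8_t addrs[][ETH_ALEN], size_t n,
		     int reset_each, uint64_t *out)
{
	uint64_t acc = 0;

	assert(out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (reset_each)
			acc = 0;
		pack_into(&acc, addrs[i]);
		out[i] = acc;
	}
}
```

Without the reset, the second packed value still carries the low bits of the first address above bit 47, so it no longer equals the 48-bit address that was meant to be written.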
* Re: [PATCH 8/8] [I/OAT] TCP recv offload to I/OAT
From: Chris Leech @ 2006-03-06 19:36 UTC
To: Andrew Morton; +Cc: linux-kernel, netdev
In-Reply-To: <20060305004534.1d94b3cf.akpm@osdl.org>
On 3/5/06, Andrew Morton <akpm@osdl.org> wrote:
> Chris Leech <christopher.leech@intel.com> wrote:
> >
> > +#ifdef CONFIG_NET_DMA
> > + tp->ucopy.dma_chan = NULL;
> > + if ((len > sysctl_tcp_dma_copybreak) && !(flags & MSG_PEEK) && !sysctl_tcp_low_latency && __get_cpu_var(softnet_data.net_dma))
> > + dma_lock_iovec_pages(msg->msg_iov, len, &tp->ucopy.locked_list);
> > +#endif
>
> The __get_cpu_var() here will run smp_processor_id() from preemptible
> context. You'll get a big warning if the correct debug options are set.
>
> The reason for this is that preemption could cause this code to hop between
> CPUs.
I've been playing with different models of where to select which DMA
channel to use in order to reduce cache thrash and lock contention in
the driver. It's not a clean per-cpu issue because a single I/O can
involve operations in both the process (syscall) context and the
netrx softirq context.
Right now the code delays selection of a DMA channel until the first
offload copy is ready to go, so the __get_cpu_var() you point out is
just checking to see if any hardware exists for I/OAT at this point
before doing the page pinning. Before anything is done with the
channel the per-cpu pointer is re-read safely with preemption disabled
and a reference count is incremented.
- Chris
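The two-phase approach described in the reply above (an early unprotected check used only as a hint, then an authoritative re-read with a reference taken) can be sketched in plain userspace C. This is a single-threaded model under stated assumptions, not the I/OAT code: percpu_chan stands in for the per-CPU channel pointer, and model_preempt_disable()/model_preempt_enable() are hypothetical stand-ins that only count nesting.

```c
#include <assert.h>
#include <stddef.h>

struct dma_chan_model {
	int refcount;
};

static struct dma_chan_model *percpu_chan;	/* stand-in for the per-CPU pointer */
static int critical_depth;			/* stand-in for the preempt count */

static void model_preempt_disable(void)
{
	critical_depth++;
}

static void model_preempt_enable(void)
{
	assert(critical_depth > 0);
	critical_depth--;
}

/* Phase 1: a hint only. The answer may be stale by the time it is used,
 * so nothing is dereferenced here. */
static int dma_probably_available(void)
{
	return percpu_chan != NULL;
}

/* Phase 2: re-read the pointer inside the critical section and take a
 * reference before the channel is actually used. */
static struct dma_chan_model *dma_chan_get(void)
{
	struct dma_chan_model *chan;

	model_preempt_disable();
	chan = percpu_chan;
	if (chan)
		chan->refcount++;
	model_preempt_enable();
	return chan;
}
```

The point of the split is that a stale phase-1 answer costs at most some wasted page pinning, while phase 2 never hands out a channel that was not present at the moment of the protected read.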
* Re: RE: [PATCH 2/6] IB: match connection requests based on private data
From: Sean Hefty @ 2006-03-06 19:31 UTC
To: Caitlin Bestler; +Cc: netdev, Roland Dreier, openib-general, linux-kernel
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F12FBF19@NT-SJCA-0751.brcm.ad.broadcom.com>
Caitlin Bestler wrote:
> The term "private data" is intended to convey the
> intent that the data is private to the application
> layer and is opaque to middleware and the network.
The private data area is for the use of whatever client resides above the
Infiniband CM only. There is no assumption about whether that client is
middleware or an application.
> By what mechanism does the listening application
> delegate how much of the private data for use by
> the CM for sub-dividing a listen? What does an
> application do if it wishes to retain full ownership
> of the private data?
An application that interfaces directly with the Infiniband CM always retains
full control of any private data. Applications that interface to middleware are
restricted by the limitations of that middleware layer.
- Sean
* Re: [PATCH 8/8] [I/OAT] TCP recv offload to I/OAT
From: Chris Leech @ 2006-03-06 19:28 UTC
To: Greg KH; +Cc: linux-kernel, netdev
In-Reply-To: <20060304231842.GA3103@kroah.com>
> #ifdef is not needed here (try not to put #ifdef in .c files.) I think
> a few of your other usages of #ifdef in this file can also be removed
> with judicious use of inline functions in a .h file.
ACK on all the ifdef comments. I may have gone a little ifdef crazy
making sure I could get to a zero impact state with these patches
applied but CONFIG_NET_DMA turned off. I'll get these cleaned up.
- Chris
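The cleanup suggested above usually takes the form of a header that provides a real inline helper when the option is on and an empty one when it is off, so the call site in the .c file needs no #ifdef. A minimal sketch, assuming a hypothetical helper net_dma_prepare() and a simplified state struct; neither name comes from the I/OAT patches:

```c
#include <assert.h>
#include <stddef.h>

/* ---- would live in a header (hypothetical name: net_dma_helpers.h) ---- */
struct ucopy_state {
	void *dma_chan;	/* stand-in for the channel pointer */
	int pinned;	/* stand-in for "user pages are pinned" */
};

#ifdef CONFIG_NET_DMA
static inline void net_dma_prepare(struct ucopy_state *u, size_t len,
				   size_t copybreak)
{
	u->dma_chan = NULL;
	if (len > copybreak)
		u->pinned = 1;
}
#else
/* Option off: an empty inline that the compiler can discard entirely. */
static inline void net_dma_prepare(struct ucopy_state *u, size_t len,
				   size_t copybreak)
{
	(void)u;
	(void)len;
	(void)copybreak;
}
#endif

/* ---- the .c file: one unconditional call, no #ifdef ---- */
static int recv_path(struct ucopy_state *u, size_t len)
{
	assert(u != NULL);
	net_dma_prepare(u, len, 4096);
	return u->pinned;
}
```

Built as is, recv_path() leaves the state untouched; built with -DCONFIG_NET_DMA, the same call site pins for any length above the 4096-byte threshold used here.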
* [PATCH 6/6] IB: userspace support for RDMA connection manager
From: Sean Hefty @ 2006-03-06 19:21 UTC
To: 'Roland Dreier', netdev, linux-kernel; +Cc: openib-general
In-Reply-To: <adaslpz2l9p.fsf@cisco.com>
Kernel component necessary to support the userspace RDMA connection management
library.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/Makefile
linux-2.6.ib/drivers/infiniband/core/Makefile
--- linux-2.6.git/drivers/infiniband/core/Makefile 2006-01-16 16:58:58.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/Makefile 2006-01-16 16:55:25.000000000 -0800
@@ -1,5 +1,5 @@
obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_sa.o \
- ib_cm.o ib_addr.o rdma_cm.o
+ ib_cm.o ib_addr.o rdma_cm.o rdma_ucm.o
obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o
obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o
@@ -14,6 +14,8 @@ ib_cm-y := cm.o
rdma_cm-y := cma.o
+rdma_ucm-y := ucma.o
+
ib_addr-y := addr.o
ib_umad-y := user_mad.o
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/ucma.c
linux-2.6.ib/drivers/infiniband/core/ucma.c
--- linux-2.6.git/drivers/infiniband/core/ucma.c 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/ucma.c 2006-01-16 16:54:31.000000000 -0800
@@ -0,0 +1,788 @@
+/*
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/poll.h>
+#include <linux/idr.h>
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/miscdevice.h>
+
+#include <rdma/rdma_user_cm.h>
+#include <rdma/ib_marshall.h>
+#include <rdma/rdma_cm.h>
+
+MODULE_AUTHOR("Sean Hefty");
+MODULE_DESCRIPTION("RDMA Userspace Connection Manager Access");
+MODULE_LICENSE("Dual BSD/GPL");
+
+enum {
+ UCMA_MAX_BACKLOG = 128
+};
+
+struct ucma_file {
+ struct semaphore mutex;
+ struct file *filp;
+ struct list_head ctxs;
+ struct list_head events;
+ wait_queue_head_t poll_wait;
+};
+
+struct ucma_context {
+ int id;
+ wait_queue_head_t wait;
+ atomic_t ref;
+ int events_reported;
+ int backlog;
+
+ struct ucma_file *file;
+ struct rdma_cm_id *cm_id;
+ __u64 uid;
+
+ struct list_head events; /* list of pending events. */
+ struct list_head file_list; /* member in file ctx list */
+};
+
+struct ucma_event {
+ struct ucma_context *ctx;
+ struct list_head file_list; /* member in file event list */
+ struct list_head ctx_list; /* member in ctx event list */
+ struct rdma_cm_id *cm_id;
+ struct rdma_ucm_event_resp resp;
+};
+
+static DECLARE_MUTEX(ctx_mutex);
+static DEFINE_IDR(ctx_idr);
+
+static struct ucma_context* ucma_get_ctx(struct ucma_file *file, int id)
+{
+ struct ucma_context *ctx;
+
+ down(&ctx_mutex);
+ ctx = idr_find(&ctx_idr, id);
+ if (!ctx)
+ ctx = ERR_PTR(-ENOENT);
+ else if (ctx->file != file)
+ ctx = ERR_PTR(-EINVAL);
+ else
+ atomic_inc(&ctx->ref);
+ up(&ctx_mutex);
+
+ return ctx;
+}
+
+static void ucma_put_ctx(struct ucma_context *ctx)
+{
+ if (atomic_dec_and_test(&ctx->ref))
+ wake_up(&ctx->wait);
+}
+
+static void ucma_cleanup_events(struct ucma_context *ctx)
+{
+ struct ucma_event *uevent;
+
+ down(&ctx->file->mutex);
+ list_del(&ctx->file_list);
+ while (!list_empty(&ctx->events)) {
+
+ uevent = list_entry(ctx->events.next, struct ucma_event,
+ ctx_list);
+ list_del(&uevent->file_list);
+ list_del(&uevent->ctx_list);
+
+ /* clear incoming connections. */
+ if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST)
+ rdma_destroy_id(uevent->cm_id);
+
+ kfree(uevent);
+ }
+ up(&ctx->file->mutex);
+}
+
+static struct ucma_context* ucma_alloc_ctx(struct ucma_file *file)
+{
+ struct ucma_context *ctx;
+ int ret;
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx)
+ return NULL;
+
+ atomic_set(&ctx->ref, 1);
+ init_waitqueue_head(&ctx->wait);
+ ctx->file = file;
+ INIT_LIST_HEAD(&ctx->events);
+
+ do {
+ ret = idr_pre_get(&ctx_idr, GFP_KERNEL);
+ if (!ret)
+ goto error;
+
+ down(&ctx_mutex);
+ ret = idr_get_new(&ctx_idr, ctx, &ctx->id);
+ up(&ctx_mutex);
+ } while (ret == -EAGAIN);
+
+ if (ret)
+ goto error;
+
+ list_add_tail(&ctx->file_list, &file->ctxs);
+ return ctx;
+
+error:
+ kfree(ctx);
+ return NULL;
+}
+
+static int ucma_event_handler(struct rdma_cm_id *cm_id,
+ struct rdma_cm_event *event)
+{
+ struct ucma_event *uevent;
+ struct ucma_context *ctx = cm_id->context;
+ int ret = 0;
+
+ uevent = kzalloc(sizeof(*uevent), GFP_KERNEL);
+ if (!uevent)
+ return event->event == RDMA_CM_EVENT_CONNECT_REQUEST;
+
+ uevent->ctx = ctx;
+ uevent->cm_id = cm_id;
+ uevent->resp.uid = ctx->uid;
+ uevent->resp.id = ctx->id;
+ uevent->resp.event = event->event;
+ uevent->resp.status = event->status;
+ if ((uevent->resp.private_data_len = event->private_data_len))
+ memcpy(uevent->resp.private_data, event->private_data,
+ event->private_data_len);
+
+ down(&ctx->file->mutex);
+ if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) {
+ if (!ctx->backlog) {
+ ret = -EDQUOT;
+ goto out;
+ }
+ ctx->backlog--;
+ }
+ list_add_tail(&uevent->file_list, &ctx->file->events);
+ list_add_tail(&uevent->ctx_list, &ctx->events);
+ wake_up_interruptible(&ctx->file->poll_wait);
+out:
+ up(&ctx->file->mutex);
+ return ret;
+}
+
+static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct ucma_context *ctx;
+ struct rdma_ucm_get_event cmd;
+ struct ucma_event *uevent;
+ int ret = 0;
+ DEFINE_WAIT(wait);
+
+ if (out_len < sizeof(struct rdma_ucm_event_resp))
+ return -ENOSPC;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ down(&file->mutex);
+ while (list_empty(&file->events)) {
+ if (file->filp->f_flags & O_NONBLOCK) {
+ ret = -EAGAIN;
+ break;
+ }
+
+ if (signal_pending(current)) {
+ ret = -ERESTARTSYS;
+ break;
+ }
+
+ prepare_to_wait(&file->poll_wait, &wait, TASK_INTERRUPTIBLE);
+ up(&file->mutex);
+ schedule();
+ down(&file->mutex);
+ finish_wait(&file->poll_wait, &wait);
+ }
+
+ if (ret)
+ goto done;
+
+ uevent = list_entry(file->events.next, struct ucma_event, file_list);
+
+ if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) {
+ ctx = ucma_alloc_ctx(file);
+ if (!ctx) {
+ ret = -ENOMEM;
+ goto done;
+ }
+ uevent->ctx->backlog++;
+ ctx->cm_id = uevent->cm_id;
+ ctx->cm_id->context = ctx;
+ uevent->resp.id = ctx->id;
+ }
+
+ if (copy_to_user((void __user *)(unsigned long)cmd.response,
+ &uevent->resp, sizeof(uevent->resp))) {
+ ret = -EFAULT;
+ goto done;
+ }
+
+ list_del(&uevent->file_list);
+ list_del(&uevent->ctx_list);
+ uevent->ctx->events_reported++;
+ kfree(uevent);
+done:
+ up(&file->mutex);
+ return ret;
+}
+
+static ssize_t ucma_create_id(struct ucma_file *file,
+ const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_create_id cmd;
+ struct rdma_ucm_create_id_resp resp;
+ struct ucma_context *ctx;
+ int ret;
+
+ if (out_len < sizeof(resp))
+ return -ENOSPC;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ down(&file->mutex);
+ ctx = ucma_alloc_ctx(file);
+ up(&file->mutex);
+ if (!ctx)
+ return -ENOMEM;
+
+ ctx->uid = cmd.uid;
+ ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, RDMA_PS_TCP);
+ if (IS_ERR(ctx->cm_id)) {
+ ret = PTR_ERR(ctx->cm_id);
+ goto err1;
+ }
+
+ resp.id = ctx->id;
+ if (copy_to_user((void __user *)(unsigned long)cmd.response,
+ &resp, sizeof(resp))) {
+ ret = -EFAULT;
+ goto err2;
+ }
+ return 0;
+
+err2:
+ rdma_destroy_id(ctx->cm_id);
+err1:
+ down(&ctx_mutex);
+ idr_remove(&ctx_idr, ctx->id);
+ up(&ctx_mutex);
+ kfree(ctx);
+ return ret;
+}
+
+static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_destroy_id cmd;
+ struct rdma_ucm_destroy_id_resp resp;
+ struct ucma_context *ctx;
+ int ret = 0;
+
+ if (out_len < sizeof(resp))
+ return -ENOSPC;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ down(&ctx_mutex);
+ ctx = idr_find(&ctx_idr, cmd.id);
+ if (!ctx)
+ ctx = ERR_PTR(-ENOENT);
+ else if (ctx->file != file)
+ ctx = ERR_PTR(-EINVAL);
+ else
+ idr_remove(&ctx_idr, ctx->id);
+ up(&ctx_mutex);
+
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ atomic_dec(&ctx->ref);
+ wait_event(ctx->wait, !atomic_read(&ctx->ref));
+
+ /* No new events will be generated after destroying the id. */
+ rdma_destroy_id(ctx->cm_id);
+ /* Cleanup events not yet reported to the user. */
+ ucma_cleanup_events(ctx);
+
+ resp.events_reported = ctx->events_reported;
+ if (copy_to_user((void __user *)(unsigned long)cmd.response,
+ &resp, sizeof(resp)))
+ ret = -EFAULT;
+
+ kfree(ctx);
+ return ret;
+}
+
+static ssize_t ucma_bind_addr(struct ucma_file *file, const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_bind_addr cmd;
+ struct ucma_context *ctx;
+ int ret;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ ctx = ucma_get_ctx(file, cmd.id);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ ret = rdma_bind_addr(ctx->cm_id, (struct sockaddr *) &cmd.addr);
+ ucma_put_ctx(ctx);
+ return ret;
+}
+
+static ssize_t ucma_resolve_addr(struct ucma_file *file,
+ const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_resolve_addr cmd;
+ struct ucma_context *ctx;
+ int ret;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ ctx = ucma_get_ctx(file, cmd.id);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ ret = rdma_resolve_addr(ctx->cm_id, (struct sockaddr *) &cmd.src_addr,
+ (struct sockaddr *) &cmd.dst_addr,
+ cmd.timeout_ms);
+ ucma_put_ctx(ctx);
+ return ret;
+}
+
+static ssize_t ucma_resolve_route(struct ucma_file *file,
+ const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_resolve_route cmd;
+ struct ucma_context *ctx;
+ int ret;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ ctx = ucma_get_ctx(file, cmd.id);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ ret = rdma_resolve_route(ctx->cm_id, cmd.timeout_ms);
+ ucma_put_ctx(ctx);
+ return ret;
+}
+
+static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp,
+ struct rdma_route *route)
+{
+ struct rdma_dev_addr *dev_addr;
+
+ resp->num_paths = route->num_paths;
+ switch (route->num_paths) {
+ case 0:
+ dev_addr = &route->addr.dev_addr;
+ memcpy(&resp->ib_route[0].dgid, ib_addr_get_dgid(dev_addr),
+ sizeof(union ib_gid));
+ memcpy(&resp->ib_route[0].sgid, ib_addr_get_sgid(dev_addr),
+ sizeof(union ib_gid));
+ resp->ib_route[0].pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr));
+ break;
+ case 2:
+ ib_copy_path_rec_to_user(&resp->ib_route[1],
+ &route->path_rec[1]);
+ /* fall through */
+ case 1:
+ ib_copy_path_rec_to_user(&resp->ib_route[0],
+ &route->path_rec[0]);
+ break;
+ default:
+ break;
+ }
+}
+
+static ssize_t ucma_query_route(struct ucma_file *file,
+ const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_query_route cmd;
+ struct rdma_ucm_query_route_resp resp;
+ struct ucma_context *ctx;
+ struct sockaddr *addr;
+ int ret = 0;
+
+ if (out_len < sizeof(resp))
+ return -ENOSPC;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ ctx = ucma_get_ctx(file, cmd.id);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ if (!ctx->cm_id->device) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ addr = &ctx->cm_id->route.addr.src_addr;
+ memcpy(&resp.src_addr, addr, addr->sa_family == AF_INET ?
+ sizeof(struct sockaddr_in) :
+ sizeof(struct sockaddr_in6));
+ addr = &ctx->cm_id->route.addr.dst_addr;
+ memcpy(&resp.dst_addr, addr, addr->sa_family == AF_INET ?
+ sizeof(struct sockaddr_in) :
+ sizeof(struct sockaddr_in6));
+ resp.node_guid = ctx->cm_id->device->node_guid;
+ resp.port_num = ctx->cm_id->port_num;
+ switch (ctx->cm_id->device->node_type) {
+ case IB_NODE_CA:
+ ucma_copy_ib_route(&resp, &ctx->cm_id->route);
+ default:
+ break;
+ }
+
+ if (copy_to_user((void __user *)(unsigned long)cmd.response,
+ &resp, sizeof(resp)))
+ ret = -EFAULT;
+
+out:
+ ucma_put_ctx(ctx);
+ return ret;
+}
+
+static void ucma_copy_conn_param(struct rdma_conn_param *dst_conn,
+ struct rdma_ucm_conn_param *src_conn)
+{
+ dst_conn->private_data = src_conn->private_data;
+ dst_conn->private_data_len = src_conn->private_data_len;
+ dst_conn->responder_resources = src_conn->responder_resources;
+ dst_conn->initiator_depth = src_conn->initiator_depth;
+ dst_conn->flow_control = src_conn->flow_control;
+ dst_conn->retry_count = src_conn->retry_count;
+ dst_conn->rnr_retry_count = src_conn->rnr_retry_count;
+ dst_conn->srq = src_conn->srq;
+ dst_conn->qp_num = src_conn->qp_num;
+ dst_conn->qp_type = src_conn->qp_type;
+}
+
+static ssize_t ucma_connect(struct ucma_file *file, const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_connect cmd;
+ struct rdma_conn_param conn_param;
+ struct ucma_context *ctx;
+ int ret;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ if (!cmd.conn_param.valid)
+ return -EINVAL;
+
+ ctx = ucma_get_ctx(file, cmd.id);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ ucma_copy_conn_param(&conn_param, &cmd.conn_param);
+ ret = rdma_connect(ctx->cm_id, &conn_param);
+ ucma_put_ctx(ctx);
+ return ret;
+}
+
+static ssize_t ucma_listen(struct ucma_file *file, const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_listen cmd;
+ struct ucma_context *ctx;
+ int ret;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ ctx = ucma_get_ctx(file, cmd.id);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ ctx->backlog = cmd.backlog > 0 && cmd.backlog < UCMA_MAX_BACKLOG ?
+ cmd.backlog : UCMA_MAX_BACKLOG;
+ ret = rdma_listen(ctx->cm_id, ctx->backlog);
+ ucma_put_ctx(ctx);
+ return ret;
+}
+
+static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_accept cmd;
+ struct rdma_conn_param conn_param;
+ struct ucma_context *ctx;
+ int ret;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ ctx = ucma_get_ctx(file, cmd.id);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ if (cmd.conn_param.valid) {
+ ctx->uid = cmd.uid;
+ ucma_copy_conn_param(&conn_param, &cmd.conn_param);
+ ret = rdma_accept(ctx->cm_id, &conn_param);
+ } else
+ ret = rdma_accept(ctx->cm_id, NULL);
+
+ ucma_put_ctx(ctx);
+ return ret;
+}
+
+static ssize_t ucma_reject(struct ucma_file *file, const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_reject cmd;
+ struct ucma_context *ctx;
+ int ret;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ ctx = ucma_get_ctx(file, cmd.id);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ ret = rdma_reject(ctx->cm_id, cmd.private_data, cmd.private_data_len);
+ ucma_put_ctx(ctx);
+ return ret;
+}
+
+static ssize_t ucma_disconnect(struct ucma_file *file, const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_disconnect cmd;
+ struct ucma_context *ctx;
+ int ret;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ ctx = ucma_get_ctx(file, cmd.id);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ ret = rdma_disconnect(ctx->cm_id);
+ ucma_put_ctx(ctx);
+ return ret;
+}
+
+static ssize_t ucma_init_qp_attr(struct ucma_file *file,
+ const char __user *inbuf,
+ int in_len, int out_len)
+{
+ struct rdma_ucm_init_qp_attr cmd;
+ struct ib_uverbs_qp_attr resp;
+ struct ucma_context *ctx;
+ struct ib_qp_attr qp_attr;
+ int ret;
+
+ if (out_len < sizeof(resp))
+ return -ENOSPC;
+
+ if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+ return -EFAULT;
+
+ ctx = ucma_get_ctx(file, cmd.id);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ resp.qp_attr_mask = 0;
+ memset(&qp_attr, 0, sizeof qp_attr);
+ qp_attr.qp_state = cmd.qp_state;
+ ret = rdma_init_qp_attr(ctx->cm_id, &qp_attr, &resp.qp_attr_mask);
+ if (ret)
+ goto out;
+
+ ib_copy_qp_attr_to_user(&resp, &qp_attr);
+ if (copy_to_user((void __user *)(unsigned long)cmd.response,
+ &resp, sizeof(resp)))
+ ret = -EFAULT;
+
+out:
+ ucma_put_ctx(ctx);
+ return ret;
+}
+
+static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
+ const char __user *inbuf,
+ int in_len, int out_len) = {
+ [RDMA_USER_CM_CMD_CREATE_ID] = ucma_create_id,
+ [RDMA_USER_CM_CMD_DESTROY_ID] = ucma_destroy_id,
+ [RDMA_USER_CM_CMD_BIND_ADDR] = ucma_bind_addr,
+ [RDMA_USER_CM_CMD_RESOLVE_ADDR] = ucma_resolve_addr,
+ [RDMA_USER_CM_CMD_RESOLVE_ROUTE]= ucma_resolve_route,
+ [RDMA_USER_CM_CMD_QUERY_ROUTE] = ucma_query_route,
+ [RDMA_USER_CM_CMD_CONNECT] = ucma_connect,
+ [RDMA_USER_CM_CMD_LISTEN] = ucma_listen,
+ [RDMA_USER_CM_CMD_ACCEPT] = ucma_accept,
+ [RDMA_USER_CM_CMD_REJECT] = ucma_reject,
+ [RDMA_USER_CM_CMD_DISCONNECT] = ucma_disconnect,
+ [RDMA_USER_CM_CMD_INIT_QP_ATTR] = ucma_init_qp_attr,
+ [RDMA_USER_CM_CMD_GET_EVENT] = ucma_get_event
+};
+
+static ssize_t ucma_write(struct file *filp, const char __user *buf,
+ size_t len, loff_t *pos)
+{
+ struct ucma_file *file = filp->private_data;
+ struct rdma_ucm_cmd_hdr hdr;
+ ssize_t ret;
+
+ if (len < sizeof(hdr))
+ return -EINVAL;
+
+ if (copy_from_user(&hdr, buf, sizeof(hdr)))
+ return -EFAULT;
+
+ if (hdr.cmd < 0 || hdr.cmd >= ARRAY_SIZE(ucma_cmd_table))
+ return -EINVAL;
+
+ if (hdr.in + sizeof(hdr) > len)
+ return -EINVAL;
+
+ ret = ucma_cmd_table[hdr.cmd](file, buf + sizeof(hdr), hdr.in, hdr.out);
+ if (!ret)
+ ret = len;
+
+ return ret;
+}
+
+static unsigned int ucma_poll(struct file *filp, struct poll_table_struct *wait)
+{
+ struct ucma_file *file = filp->private_data;
+ unsigned int mask = 0;
+
+ poll_wait(filp, &file->poll_wait, wait);
+
+ down(&file->mutex);
+ if (!list_empty(&file->events))
+ mask = POLLIN | POLLRDNORM;
+ up(&file->mutex);
+
+ return mask;
+}
+
+static int ucma_open(struct inode *inode, struct file *filp)
+{
+ struct ucma_file *file;
+
+ file = kmalloc(sizeof *file, GFP_KERNEL);
+ if (!file)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&file->events);
+ INIT_LIST_HEAD(&file->ctxs);
+ init_waitqueue_head(&file->poll_wait);
+ init_MUTEX(&file->mutex);
+
+ filp->private_data = file;
+ file->filp = filp;
+ return 0;
+}
+
+static int ucma_close(struct inode *inode, struct file *filp)
+{
+ struct ucma_file *file = filp->private_data;
+ struct ucma_context *ctx;
+
+ down(&file->mutex);
+ while (!list_empty(&file->ctxs)) {
+ ctx = list_entry(file->ctxs.next, struct ucma_context,
+ file_list);
+ up(&file->mutex);
+
+ down(&ctx_mutex);
+ idr_remove(&ctx_idr, ctx->id);
+ up(&ctx_mutex);
+
+ rdma_destroy_id(ctx->cm_id);
+ ucma_cleanup_events(ctx);
+ kfree(ctx);
+
+ down(&file->mutex);
+ }
+ up(&file->mutex);
+ kfree(file);
+ return 0;
+}
+
+static struct file_operations ucma_fops = {
+ .owner = THIS_MODULE,
+ .open = ucma_open,
+ .release = ucma_close,
+ .write = ucma_write,
+ .poll = ucma_poll,
+};
+
+static struct miscdevice ucma_misc = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "rdma_cm",
+ .fops = &ucma_fops,
+};
+
+static int __init ucma_init(void)
+{
+ return misc_register(&ucma_misc);
+}
+
+static void __exit ucma_cleanup(void)
+{
+ misc_deregister(&ucma_misc);
+ idr_destroy(&ctx_idr);
+}
+
+module_init(ucma_init);
+module_exit(ucma_cleanup);
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/include/rdma/rdma_user_cm.h
linux-2.6.ib/include/rdma/rdma_user_cm.h
--- linux-2.6.git/include/rdma/rdma_user_cm.h 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/include/rdma/rdma_user_cm.h 2006-01-16 16:54:55.000000000 -0800
@@ -0,0 +1,186 @@
+/*
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RDMA_USER_CM_H
+#define RDMA_USER_CM_H
+
+#include <linux/types.h>
+#include <linux/in6.h>
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_user_sa.h>
+
+#define RDMA_USER_CM_ABI_VERSION 1
+
+#define RDMA_MAX_PRIVATE_DATA 256
+
+enum {
+ RDMA_USER_CM_CMD_CREATE_ID,
+ RDMA_USER_CM_CMD_DESTROY_ID,
+ RDMA_USER_CM_CMD_BIND_ADDR,
+ RDMA_USER_CM_CMD_RESOLVE_ADDR,
+ RDMA_USER_CM_CMD_RESOLVE_ROUTE,
+ RDMA_USER_CM_CMD_QUERY_ROUTE,
+ RDMA_USER_CM_CMD_CONNECT,
+ RDMA_USER_CM_CMD_LISTEN,
+ RDMA_USER_CM_CMD_ACCEPT,
+ RDMA_USER_CM_CMD_REJECT,
+ RDMA_USER_CM_CMD_DISCONNECT,
+ RDMA_USER_CM_CMD_INIT_QP_ATTR,
+ RDMA_USER_CM_CMD_GET_EVENT
+};
+
+/*
+ * command ABI structures.
+ */
+struct rdma_ucm_cmd_hdr {
+ __u32 cmd;
+ __u16 in;
+ __u16 out;
+};
+
+struct rdma_ucm_create_id {
+ __u64 uid;
+ __u64 response;
+};
+
+struct rdma_ucm_create_id_resp {
+ __u32 id;
+};
+
+struct rdma_ucm_destroy_id {
+ __u64 response;
+ __u32 id;
+ __u32 reserved;
+};
+
+struct rdma_ucm_destroy_id_resp {
+ __u32 events_reported;
+};
+
+struct rdma_ucm_bind_addr {
+ __u64 response;
+ struct sockaddr_in6 addr;
+ __u32 id;
+};
+
+struct rdma_ucm_resolve_addr {
+ struct sockaddr_in6 src_addr;
+ struct sockaddr_in6 dst_addr;
+ __u32 id;
+ __u32 timeout_ms;
+};
+
+struct rdma_ucm_resolve_route {
+ __u32 id;
+ __u32 timeout_ms;
+};
+
+struct rdma_ucm_query_route {
+ __u64 response;
+ __u32 id;
+ __u32 reserved;
+};
+
+struct rdma_ucm_query_route_resp {
+ __u64 node_guid;
+ struct ib_user_path_rec ib_route[2];
+ struct sockaddr_in6 src_addr;
+ struct sockaddr_in6 dst_addr;
+ __u32 num_paths;
+ __u8 port_num;
+ __u8 reserved[3];
+};
+
+struct rdma_ucm_conn_param {
+ __u32 qp_num;
+ __u32 qp_type;
+ __u8 private_data[RDMA_MAX_PRIVATE_DATA];
+ __u8 private_data_len;
+ __u8 srq;
+ __u8 responder_resources;
+ __u8 initiator_depth;
+ __u8 flow_control;
+ __u8 retry_count;
+ __u8 rnr_retry_count;
+ __u8 valid;
+};
+
+struct rdma_ucm_connect {
+ struct rdma_ucm_conn_param conn_param;
+ __u32 id;
+ __u32 reserved;
+};
+
+struct rdma_ucm_listen {
+ __u32 id;
+ __u32 backlog;
+};
+
+struct rdma_ucm_accept {
+ __u64 uid;
+ struct rdma_ucm_conn_param conn_param;
+ __u32 id;
+ __u32 reserved;
+};
+
+struct rdma_ucm_reject {
+ __u32 id;
+ __u8 private_data_len;
+ __u8 reserved[3];
+ __u8 private_data[RDMA_MAX_PRIVATE_DATA];
+};
+
+struct rdma_ucm_disconnect {
+ __u32 id;
+};
+
+struct rdma_ucm_init_qp_attr {
+ __u64 response;
+ __u32 id;
+ __u32 qp_state;
+};
+
+struct rdma_ucm_get_event {
+ __u64 response;
+};
+
+struct rdma_ucm_event_resp {
+ __u64 uid;
+ __u32 id;
+ __u32 event;
+ __u32 status;
+ __u8 private_data_len;
+ __u8 reserved[3];
+ __u8 private_data[RDMA_MAX_PRIVATE_DATA];
+};
+
+#endif /* RDMA_USER_CM_H */
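The command path in the patch above frames every request written to the device as a fixed header followed by the command payload, and ucma_write() rejects a buffer before dispatch if the header is short, the command index is out of range, or the declared input length runs past the buffer. A minimal userspace sketch of that framing and of the same three checks. Here struct cmd_hdr is a local mirror of rdma_ucm_cmd_hdr, and build_cmd() and validate_cmd() are hypothetical helpers, not part of any library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct rdma_ucm_cmd_hdr as posted in the patch. */
struct cmd_hdr {
	uint32_t cmd;
	uint16_t in;	/* bytes of command payload following the header */
	uint16_t out;	/* bytes available for the response */
};

enum { CMD_COUNT = 13 };	/* CREATE_ID through GET_EVENT in the posted enum */

/* Returns 0 if the buffer would pass the checks ucma_write() applies
 * before dispatching, -1 otherwise. */
static int validate_cmd(const void *buf, size_t len)
{
	struct cmd_hdr hdr;

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.cmd >= CMD_COUNT)
		return -1;
	if ((size_t)hdr.in + sizeof(hdr) > len)
		return -1;
	return 0;
}

/* Lay out header plus payload in dst. Returns the total size, or 0 if the
 * result does not fit in cap bytes. */
static size_t build_cmd(void *dst, size_t cap, uint32_t cmd,
			const void *payload, uint16_t in, uint16_t out)
{
	struct cmd_hdr hdr = { cmd, in, out };
	size_t total = sizeof(hdr) + in;

	assert(dst != NULL);
	if (total > cap)
		return 0;
	memcpy(dst, &hdr, sizeof(hdr));
	if (in)
		memcpy((unsigned char *)dst + sizeof(hdr), payload, in);
	return total;
}
```

A caller would pass the buffer returned by build_cmd() to write() on the misc device. The response travels separately, through the user pointer carried in each command's response field.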
* RE: [PATCH 2/6] IB: match connection requests based on private data
From: Caitlin Bestler @ 2006-03-06 19:20 UTC
To: Sean Hefty, Roland Dreier; +Cc: netdev, linux-kernel, openib-general
> -----Original Message-----
> From: netdev-owner@vger.kernel.org
> [mailto:netdev-owner@vger.kernel.org] On Behalf Of Sean Hefty
> Sent: Monday, March 06, 2006 11:04 AM
> To: 'Roland Dreier'
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> openib-general@openib.org
> Subject: [PATCH 2/6] IB: match connection requests based on
> private data
>
> Extend matching connection requests to listens in the
> Infiniband CM to include private data checks.
>
> This allows applications to listen on the same service
> identifier, with private data directing the request to the
> appropriate application.
>
> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
>
The term "private data" is intended to convey the
intent that the data is private to the application
layer and is opaque to middleware and the network.
By what mechanism does the listening application
delegate how much of the private data for use by
the CM for sub-dividing a listen? What does an
application do if it wishes to retain full ownership
of the private data?
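One way a listener could state how much of the private data the CM may inspect is a data/mask pair supplied at listen time. The sketch below assumes that design for illustration only; the thread does not describe the patch's mechanism, and listen_filter, CMP_LEN and listen_matches() are hypothetical names. An all-zero mask means the listener keeps the private data fully opaque.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMP_LEN 8	/* hypothetical: leading private-data bytes the CM may inspect */

struct listen_filter {
	uint64_t service_id;
	uint8_t data[CMP_LEN];	/* expected values */
	uint8_t mask[CMP_LEN];	/* which bits of each byte are compared */
};

/* Returns 1 if a request with this service ID and private data should be
 * delivered to the listener, 0 otherwise. Bytes beyond priv_len are
 * treated as zero. */
static int listen_matches(const struct listen_filter *l, uint64_t service_id,
			  const uint8_t *priv, size_t priv_len)
{
	assert(l != NULL);
	if (l->service_id != service_id)
		return 0;
	for (size_t i = 0; i < CMP_LEN; i++) {
		uint8_t byte = i < priv_len ? priv[i] : 0;

		if ((byte & l->mask[i]) != (l->data[i] & l->mask[i]))
			return 0;
	}
	return 1;
}
```

Under this model, two listeners can share a service ID and be told apart by the first private-data byte, while a listener with an empty mask receives every request for that ID and retains full ownership of the private data.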
* [PATCH 5/6] IB: IP address based RDMA connection manager
From: Sean Hefty @ 2006-03-06 19:18 UTC
To: 'Roland Dreier', netdev, linux-kernel; +Cc: openib-general
In-Reply-To: <adaslpz2l9p.fsf@cisco.com>
Kernel-mode connection management agent over InfiniBand that establishes
connections based on IP addresses. The agent defines a generic RDMA
connection abstraction to support clients that want to connect over
different RDMA devices. The agent also handles RDMA device hotplug events
on behalf of clients.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
---
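Not part of the patch: a small user-space sketch of two encodings used
in cma.c, for anyone who wants to check the bit layout. It assumes the
IP version occupies bits 7:4 of the ip_version byte, and that the
service ID is the port space shifted left by 16 plus the port, with the
port taken in host byte order. The helper names are made up for
illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Store ip_ver in the upper nibble; keep the lower nibble unchanged. */
static uint8_t set_ip_ver(uint8_t field, uint8_t ip_ver)
{
	assert(ip_ver <= 0xF);
	return (uint8_t)((ip_ver << 4) | (field & 0xF));
}

/* Read the IP version back out of the upper nibble. */
static uint8_t get_ip_ver(uint8_t field)
{
	return field >> 4;
}

/*
 * Host-order service ID: port space above bit 16, port in the low
 * 16 bits. The caller converts the result to big endian for the wire.
 */
static uint64_t service_id(uint16_t port_space, uint16_t port_host_order)
{
	return ((uint64_t)port_space << 16) + port_host_order;
}
```

Masking with 0xF in set_ip_ver() is what lets the version be rewritten
without disturbing whatever the low nibble carries.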
diff -uprN -X linux-2.6.git/Documentation/dontdiff linux-2.6.git/drivers/infiniband/core/cma.c linux-2.6.ib/drivers/infiniband/core/cma.c
--- linux-2.6.git/drivers/infiniband/core/cma.c 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/cma.c 2006-01-16 16:17:34.000000000 -0800
@@ -0,0 +1,1639 @@
+/*
+ * Copyright (c) 2005 Voltaire Inc. All rights reserved.
+ * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved.
+ * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved.
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This Software is licensed under one of the following licenses:
+ *
+ * 1) under the terms of the "Common Public License 1.0" a copy of which is
+ * available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/cpl.php.
+ *
+ * 2) under the terms of the "The BSD License" a copy of which is
+ * available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/bsd-license.php.
+ *
+ * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+ * copy of which is available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/gpl-license.php.
+ *
+ * Licensee has the right to choose one of the above licenses.
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice and one of the license notices.
+ *
+ * Redistributions in binary form must reproduce both the above copyright
+ * notice, one of the license notices in the documentation
+ * and/or other materials provided with the distribution.
+ *
+ */
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/random.h>
+#include <rdma/rdma_cm.h>
+#include <rdma/ib_cache.h>
+#include <rdma/ib_cm.h>
+#include <rdma/ib_sa.h>
+
+MODULE_AUTHOR("Guy German");
+MODULE_DESCRIPTION("Generic RDMA CM Agent");
+MODULE_LICENSE("Dual BSD/GPL");
+
+#define CMA_CM_RESPONSE_TIMEOUT 20
+#define CMA_MAX_CM_RETRIES 3
+
+static void cma_add_one(struct ib_device *device);
+static void cma_remove_one(struct ib_device *device);
+
+static struct ib_client cma_client = {
+ .name = "cma",
+ .add = cma_add_one,
+ .remove = cma_remove_one
+};
+
+static LIST_HEAD(dev_list);
+static LIST_HEAD(listen_any_list);
+static DECLARE_MUTEX(mutex);
+
+struct cma_device {
+ struct list_head list;
+ struct ib_device *device;
+ __be64 node_guid;
+ wait_queue_head_t wait;
+ atomic_t refcount;
+ struct list_head id_list;
+};
+
+enum cma_state {
+ CMA_IDLE,
+ CMA_ADDR_QUERY,
+ CMA_ADDR_RESOLVED,
+ CMA_ROUTE_QUERY,
+ CMA_ROUTE_RESOLVED,
+ CMA_CONNECT,
+ CMA_ADDR_BOUND,
+ CMA_LISTEN,
+ CMA_DEVICE_REMOVAL,
+ CMA_DESTROYING
+};
+
+/*
+ * Device removal can occur at any time, so we need extra handling to
+ * serialize notifying the user of device removal with other callbacks.
+ * We do this by disabling removal notification while a callback is in
+ * progress, and reporting it after the callback completes.
+ */
+struct rdma_id_private {
+ struct rdma_cm_id id;
+
+ struct list_head list;
+ struct list_head listen_list;
+ struct cma_device *cma_dev;
+
+ enum cma_state state;
+ spinlock_t lock;
+ wait_queue_head_t wait;
+ atomic_t refcount;
+ wait_queue_head_t wait_remove;
+ atomic_t dev_remove;
+
+ int backlog;
+ int timeout_ms;
+ struct ib_sa_query *query;
+ int query_id;
+ struct ib_cm_id *cm_id;
+
+ u32 seq_num;
+ u32 qp_num;
+ enum ib_qp_type qp_type;
+ u8 srq;
+};
+
+struct cma_work {
+ struct work_struct work;
+ struct rdma_id_private *id;
+};
+
+union cma_ip_addr {
+ struct in6_addr ip6;
+ struct {
+ __u32 pad[3];
+ __u32 addr;
+ } ip4;
+};
+
+struct cma_hdr {
+ u8 cma_version;
+ u8 ip_version; /* IP version: 7:4 */
+ __u16 port;
+ union cma_ip_addr src_addr;
+ union cma_ip_addr dst_addr;
+};
+
+struct sdp_hh {
+ u8 sdp_version;
+ u8 ip_version; /* IP version: 7:4 */
+ u8 sdp_specific1[10];
+ __u16 port;
+ __u16 sdp_specific2;
+ union cma_ip_addr src_addr;
+ union cma_ip_addr dst_addr;
+};
+
+#define CMA_VERSION 0x10
+#define SDP_VERSION 0x22
+
+static int cma_comp(struct rdma_id_private *id_priv, enum cma_state comp)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&id_priv->lock, flags);
+ ret = (id_priv->state == comp);
+ spin_unlock_irqrestore(&id_priv->lock, flags);
+ return ret;
+}
+
+static int cma_comp_exch(struct rdma_id_private *id_priv,
+ enum cma_state comp, enum cma_state exch)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&id_priv->lock, flags);
+ if ((ret = (id_priv->state == comp)))
+ id_priv->state = exch;
+ spin_unlock_irqrestore(&id_priv->lock, flags);
+ return ret;
+}
+
+static enum cma_state cma_exch(struct rdma_id_private *id_priv,
+ enum cma_state exch)
+{
+ unsigned long flags;
+ enum cma_state old;
+
+ spin_lock_irqsave(&id_priv->lock, flags);
+ old = id_priv->state;
+ id_priv->state = exch;
+ spin_unlock_irqrestore(&id_priv->lock, flags);
+ return old;
+}
+
+static inline u8 cma_get_ip_ver(struct cma_hdr *hdr)
+{
+ return hdr->ip_version >> 4;
+}
+
+static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
+{
+ hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
+}
+
+static inline u8 sdp_get_ip_ver(struct sdp_hh *hh)
+{
+ return hh->ip_version >> 4;
+}
+
+static inline void sdp_set_ip_ver(struct sdp_hh *hh, u8 ip_ver)
+{
+ hh->ip_version = (ip_ver << 4) | (hh->ip_version & 0xF);
+}
+
+static void cma_attach_to_dev(struct rdma_id_private *id_priv,
+ struct cma_device *cma_dev)
+{
+ atomic_inc(&cma_dev->refcount);
+ id_priv->cma_dev = cma_dev;
+ id_priv->id.device = cma_dev->device;
+ list_add_tail(&id_priv->list, &cma_dev->id_list);
+}
+
+static void cma_detach_from_dev(struct rdma_id_private *id_priv)
+{
+ list_del(&id_priv->list);
+ if (atomic_dec_and_test(&id_priv->cma_dev->refcount))
+ wake_up(&id_priv->cma_dev->wait);
+ id_priv->cma_dev = NULL;
+}
+
+static int cma_acquire_ib_dev(struct rdma_id_private *id_priv)
+{
+ struct cma_device *cma_dev;
+ union ib_gid *gid;
+ int ret = -ENODEV;
+
+ gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr);
+
+ down(&mutex);
+ list_for_each_entry(cma_dev, &dev_list, list) {
+ ret = ib_find_cached_gid(cma_dev->device, gid,
+ &id_priv->id.port_num, NULL);
+ if (!ret) {
+ cma_attach_to_dev(id_priv, cma_dev);
+ break;
+ }
+ }
+ up(&mutex);
+ return ret;
+}
+
+static int cma_acquire_dev(struct rdma_id_private *id_priv)
+{
+ switch (id_priv->id.route.addr.dev_addr.dev_type) {
+ case IB_NODE_CA:
+ return cma_acquire_ib_dev(id_priv);
+ default:
+ return -ENODEV;
+ }
+}
+
+static void cma_deref_id(struct rdma_id_private *id_priv)
+{
+ if (atomic_dec_and_test(&id_priv->refcount))
+ wake_up(&id_priv->wait);
+}
+
+static void cma_release_remove(struct rdma_id_private *id_priv)
+{
+ if (atomic_dec_and_test(&id_priv->dev_remove))
+ wake_up(&id_priv->wait_remove);
+}
+
+struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
+ void *context, enum rdma_port_space ps)
+{
+ struct rdma_id_private *id_priv;
+
+ id_priv = kzalloc(sizeof *id_priv, GFP_KERNEL);
+ if (!id_priv)
+ return ERR_PTR(-ENOMEM);
+
+ id_priv->state = CMA_IDLE;
+ id_priv->id.context = context;
+ id_priv->id.event_handler = event_handler;
+ id_priv->id.ps = ps;
+ spin_lock_init(&id_priv->lock);
+ init_waitqueue_head(&id_priv->wait);
+ atomic_set(&id_priv->refcount, 1);
+ init_waitqueue_head(&id_priv->wait_remove);
+ atomic_set(&id_priv->dev_remove, 0);
+ INIT_LIST_HEAD(&id_priv->listen_list);
+ get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
+
+ return &id_priv->id;
+}
+EXPORT_SYMBOL(rdma_create_id);
+
+static int cma_init_ib_qp(struct rdma_id_private *id_priv, struct ib_qp *qp)
+{
+ struct ib_qp_attr qp_attr;
+ struct rdma_dev_addr *dev_addr;
+ int ret;
+
+ dev_addr = &id_priv->id.route.addr.dev_addr;
+ ret = ib_find_cached_pkey(id_priv->id.device, id_priv->id.port_num,
+ ib_addr_get_pkey(dev_addr),
+ &qp_attr.pkey_index);
+ if (ret)
+ return ret;
+
+ qp_attr.qp_state = IB_QPS_INIT;
+ qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE;
+ qp_attr.port_num = id_priv->id.port_num;
+ return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS |
+ IB_QP_PKEY_INDEX | IB_QP_PORT);
+}
+
+int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd,
+ struct ib_qp_init_attr *qp_init_attr)
+{
+ struct rdma_id_private *id_priv;
+ struct ib_qp *qp;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ if (id->device != pd->device)
+ return -EINVAL;
+
+ qp = ib_create_qp(pd, qp_init_attr);
+ if (IS_ERR(qp))
+ return PTR_ERR(qp);
+
+ switch (id->device->node_type) {
+ case IB_NODE_CA:
+ ret = cma_init_ib_qp(id_priv, qp);
+ break;
+ default:
+ ret = -ENOSYS;
+ break;
+ }
+
+ if (ret)
+ goto err;
+
+ id->qp = qp;
+ id_priv->qp_num = qp->qp_num;
+ id_priv->qp_type = qp->qp_type;
+ id_priv->srq = (qp->srq != NULL);
+ return 0;
+err:
+ ib_destroy_qp(qp);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_create_qp);
+
+void rdma_destroy_qp(struct rdma_cm_id *id)
+{
+ ib_destroy_qp(id->qp);
+}
+EXPORT_SYMBOL(rdma_destroy_qp);
+
+static int cma_modify_qp_rtr(struct rdma_cm_id *id)
+{
+ struct ib_qp_attr qp_attr;
+ int qp_attr_mask, ret;
+
+ if (!id->qp)
+ return 0;
+
+ /* Need to update QP attributes from default values. */
+ qp_attr.qp_state = IB_QPS_INIT;
+ ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask);
+ if (ret)
+ return ret;
+
+ ret = ib_modify_qp(id->qp, &qp_attr, qp_attr_mask);
+ if (ret)
+ return ret;
+
+ qp_attr.qp_state = IB_QPS_RTR;
+ ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask);
+ if (ret)
+ return ret;
+
+ return ib_modify_qp(id->qp, &qp_attr, qp_attr_mask);
+}
+
+static int cma_modify_qp_rts(struct rdma_cm_id *id)
+{
+ struct ib_qp_attr qp_attr;
+ int qp_attr_mask, ret;
+
+ if (!id->qp)
+ return 0;
+
+ qp_attr.qp_state = IB_QPS_RTS;
+ ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask);
+ if (ret)
+ return ret;
+
+ return ib_modify_qp(id->qp, &qp_attr, qp_attr_mask);
+}
+
+static int cma_modify_qp_err(struct rdma_cm_id *id)
+{
+ struct ib_qp_attr qp_attr;
+
+ if (!id->qp)
+ return 0;
+
+ qp_attr.qp_state = IB_QPS_ERR;
+ return ib_modify_qp(id->qp, &qp_attr, IB_QP_STATE);
+}
+
+int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
+ int *qp_attr_mask)
+{
+ struct rdma_id_private *id_priv;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ switch (id_priv->id.device->node_type) {
+ case IB_NODE_CA:
+ ret = ib_cm_init_qp_attr(id_priv->cm_id, qp_attr,
+ qp_attr_mask);
+ if (qp_attr->qp_state == IB_QPS_RTR)
+ qp_attr->rq_psn = id_priv->seq_num;
+ break;
+ default:
+ ret = -ENOSYS;
+ break;
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL(rdma_init_qp_attr);
+
+static inline int cma_any_addr(struct sockaddr *addr)
+{
+ struct in6_addr *ip6;
+
+ if (addr->sa_family == AF_INET)
+ return ((struct sockaddr_in *) addr)->sin_addr.s_addr ==
+ INADDR_ANY;
+ else {
+ ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr;
+ return (ip6->s6_addr32[0] | ip6->s6_addr32[1] |
+ ip6->s6_addr32[2] | ip6->s6_addr32[3]) == 0;
+ }
+}
+
+static inline int cma_loopback_addr(struct sockaddr *addr)
+{
+ return ((struct sockaddr_in *) addr)->sin_addr.s_addr ==
+ htonl(INADDR_LOOPBACK);
+}
+
+static int cma_get_net_info(void *hdr, enum rdma_port_space ps,
+ u8 *ip_ver, __u16 *port,
+ union cma_ip_addr **src, union cma_ip_addr **dst)
+{
+ switch (ps) {
+ case RDMA_PS_SDP:
+ if (((struct sdp_hh *) hdr)->sdp_version != SDP_VERSION)
+ return -EINVAL;
+
+ *ip_ver = sdp_get_ip_ver(hdr);
+ *port = ((struct sdp_hh *) hdr)->port;
+ *src = &((struct sdp_hh *) hdr)->src_addr;
+ *dst = &((struct sdp_hh *) hdr)->dst_addr;
+ break;
+ default:
+ if (((struct cma_hdr *) hdr)->cma_version != CMA_VERSION)
+ return -EINVAL;
+
+ *ip_ver = cma_get_ip_ver(hdr);
+ *port = ((struct cma_hdr *) hdr)->port;
+ *src = &((struct cma_hdr *) hdr)->src_addr;
+ *dst = &((struct cma_hdr *) hdr)->dst_addr;
+ break;
+ }
+ return 0;
+}
+
+static void cma_save_net_info(struct rdma_addr *addr,
+ struct rdma_addr *listen_addr,
+ u8 ip_ver, __u16 port,
+ union cma_ip_addr *src, union cma_ip_addr *dst)
+{
+ struct sockaddr_in *listen4, *ip4;
+ struct sockaddr_in6 *listen6, *ip6;
+
+ switch (ip_ver) {
+ case 4:
+ listen4 = (struct sockaddr_in *) &listen_addr->src_addr;
+ ip4 = (struct sockaddr_in *) &addr->src_addr;
+ ip4->sin_family = listen4->sin_family;
+ ip4->sin_addr.s_addr = dst->ip4.addr;
+ ip4->sin_port = listen4->sin_port;
+
+ ip4 = (struct sockaddr_in *) &addr->dst_addr;
+ ip4->sin_family = listen4->sin_family;
+ ip4->sin_addr.s_addr = src->ip4.addr;
+ ip4->sin_port = port;
+ break;
+ case 6:
+ listen6 = (struct sockaddr_in6 *) &listen_addr->src_addr;
+ ip6 = (struct sockaddr_in6 *) &addr->src_addr;
+ ip6->sin6_family = listen6->sin6_family;
+ ip6->sin6_addr = dst->ip6;
+ ip6->sin6_port = listen6->sin6_port;
+
+ ip6 = (struct sockaddr_in6 *) &addr->dst_addr;
+ ip6->sin6_family = listen6->sin6_family;
+ ip6->sin6_addr = src->ip6;
+ ip6->sin6_port = port;
+ break;
+ default:
+ break;
+ }
+}
+
+static inline int cma_user_data_offset(enum rdma_port_space ps)
+{
+ switch (ps) {
+ case RDMA_PS_SDP:
+ return 0;
+ default:
+ return sizeof(struct cma_hdr);
+ }
+}
+
+static int cma_notify_user(struct rdma_id_private *id_priv,
+ enum rdma_cm_event_type type, int status,
+ void *data, u8 data_len)
+{
+ struct rdma_cm_event event;
+
+ event.event = type;
+ event.status = status;
+ event.private_data = data;
+ event.private_data_len = data_len;
+
+ return id_priv->id.event_handler(&id_priv->id, &event);
+}
+
+static void cma_cancel_addr(struct rdma_id_private *id_priv)
+{
+ switch (id_priv->id.device->node_type) {
+ case IB_NODE_CA:
+ rdma_addr_cancel(&id_priv->id.route.addr.dev_addr);
+ break;
+ default:
+ break;
+ }
+}
+
+static void cma_cancel_route(struct rdma_id_private *id_priv)
+{
+ switch (id_priv->id.device->node_type) {
+ case IB_NODE_CA:
+ ib_sa_cancel_query(id_priv->query_id, id_priv->query);
+ break;
+ default:
+ break;
+ }
+}
+
+static inline int cma_internal_listen(struct rdma_id_private *id_priv)
+{
+ return (id_priv->state == CMA_LISTEN) && id_priv->cma_dev &&
+ cma_any_addr(&id_priv->id.route.addr.src_addr);
+}
+
+static void cma_destroy_listen(struct rdma_id_private *id_priv)
+{
+ cma_exch(id_priv, CMA_DESTROYING);
+
+ if (id_priv->cm_id && !IS_ERR(id_priv->cm_id))
+ ib_destroy_cm_id(id_priv->cm_id);
+
+ list_del(&id_priv->listen_list);
+ if (id_priv->cma_dev)
+ cma_detach_from_dev(id_priv);
+
+ atomic_dec(&id_priv->refcount);
+ wait_event(id_priv->wait, !atomic_read(&id_priv->refcount));
+
+ kfree(id_priv);
+}
+
+static void cma_cancel_listens(struct rdma_id_private *id_priv)
+{
+ struct rdma_id_private *dev_id_priv;
+
+ down(&mutex);
+ list_del(&id_priv->list);
+
+ while (!list_empty(&id_priv->listen_list)) {
+ dev_id_priv = list_entry(id_priv->listen_list.next,
+ struct rdma_id_private, listen_list);
+ cma_destroy_listen(dev_id_priv);
+ }
+ up(&mutex);
+}
+
+static void cma_cancel_operation(struct rdma_id_private *id_priv,
+ enum cma_state state)
+{
+ switch (state) {
+ case CMA_ADDR_QUERY:
+ cma_cancel_addr(id_priv);
+ break;
+ case CMA_ROUTE_QUERY:
+ cma_cancel_route(id_priv);
+ break;
+ case CMA_LISTEN:
+ if (cma_any_addr(&id_priv->id.route.addr.src_addr) &&
+ !id_priv->cma_dev)
+ cma_cancel_listens(id_priv);
+ break;
+ default:
+ break;
+ }
+}
+
+void rdma_destroy_id(struct rdma_cm_id *id)
+{
+ struct rdma_id_private *id_priv;
+ enum cma_state state;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ state = cma_exch(id_priv, CMA_DESTROYING);
+ cma_cancel_operation(id_priv, state);
+
+ if (id_priv->cm_id && !IS_ERR(id_priv->cm_id))
+ ib_destroy_cm_id(id_priv->cm_id);
+
+ if (id_priv->cma_dev) {
+ down(&mutex);
+ cma_detach_from_dev(id_priv);
+ up(&mutex);
+ }
+
+ atomic_dec(&id_priv->refcount);
+ wait_event(id_priv->wait, !atomic_read(&id_priv->refcount));
+
+ kfree(id_priv->id.route.path_rec);
+ kfree(id_priv);
+}
+EXPORT_SYMBOL(rdma_destroy_id);
+
+static int cma_rep_recv(struct rdma_id_private *id_priv)
+{
+ int ret;
+
+ ret = cma_modify_qp_rtr(&id_priv->id);
+ if (ret)
+ goto reject;
+
+ ret = cma_modify_qp_rts(&id_priv->id);
+ if (ret)
+ goto reject;
+
+ ret = ib_send_cm_rtu(id_priv->cm_id, NULL, 0);
+ if (ret)
+ goto reject;
+
+ return 0;
+reject:
+ cma_modify_qp_err(&id_priv->id);
+ ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED,
+ NULL, 0, NULL, 0);
+ return ret;
+}
+
+static int cma_rtu_recv(struct rdma_id_private *id_priv)
+{
+ int ret;
+
+ ret = cma_modify_qp_rts(&id_priv->id);
+ if (ret)
+ goto reject;
+
+ return 0;
+reject:
+ cma_modify_qp_err(&id_priv->id);
+ ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED,
+ NULL, 0, NULL, 0);
+ return ret;
+}
+
+static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
+{
+ struct rdma_id_private *id_priv = cm_id->context;
+ enum rdma_cm_event_type event;
+ u8 private_data_len = 0;
+ int ret = 0, status = 0;
+
+ if (!cma_comp(id_priv, CMA_CONNECT))
+ return 0;
+
+ atomic_inc(&id_priv->dev_remove);
+ switch (ib_event->event) {
+ case IB_CM_REQ_ERROR:
+ case IB_CM_REP_ERROR:
+ event = RDMA_CM_EVENT_UNREACHABLE;
+ status = -ETIMEDOUT;
+ break;
+ case IB_CM_REP_RECEIVED:
+ if (id_priv->id.qp) {
+ status = cma_rep_recv(id_priv);
+ event = status ? RDMA_CM_EVENT_CONNECT_ERROR :
+ RDMA_CM_EVENT_ESTABLISHED;
+ } else
+ event = RDMA_CM_EVENT_CONNECT_RESPONSE;
+ private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE;
+ break;
+ case IB_CM_RTU_RECEIVED:
+ status = cma_rtu_recv(id_priv);
+ event = status ? RDMA_CM_EVENT_CONNECT_ERROR :
+ RDMA_CM_EVENT_ESTABLISHED;
+ break;
+ case IB_CM_DREQ_ERROR:
+ status = -ETIMEDOUT; /* fall through */
+ case IB_CM_DREQ_RECEIVED:
+ case IB_CM_DREP_RECEIVED:
+ event = RDMA_CM_EVENT_DISCONNECTED;
+ break;
+ case IB_CM_TIMEWAIT_EXIT:
+ case IB_CM_MRA_RECEIVED:
+ /* ignore event */
+ goto out;
+ case IB_CM_REJ_RECEIVED:
+ cma_modify_qp_err(&id_priv->id);
+ status = ib_event->param.rej_rcvd.reason;
+ event = RDMA_CM_EVENT_REJECTED;
+ break;
+ default:
+ printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d\n",
+ ib_event->event);
+ goto out;
+ }
+
+ ret = cma_notify_user(id_priv, event, status, ib_event->private_data,
+ private_data_len);
+ if (ret) {
+ /* Destroy the CM ID by returning a non-zero value. */
+ id_priv->cm_id = NULL;
+ cma_exch(id_priv, CMA_DESTROYING);
+ cma_release_remove(id_priv);
+ rdma_destroy_id(&id_priv->id);
+ return ret;
+ }
+out:
+ cma_release_remove(id_priv);
+ return ret;
+}
+
+static struct rdma_id_private *cma_new_id(struct rdma_cm_id *listen_id,
+ struct ib_cm_event *ib_event)
+{
+ struct rdma_id_private *id_priv;
+ struct rdma_cm_id *id;
+ struct rdma_route *rt;
+ union cma_ip_addr *src, *dst;
+ __u16 port;
+ u8 ip_ver;
+
+ id = rdma_create_id(listen_id->event_handler, listen_id->context,
+ listen_id->ps);
+ if (IS_ERR(id))
+ return NULL;
+
+ rt = &id->route;
+ rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 2 : 1;
+ rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL);
+ if (!rt->path_rec)
+ goto err;
+
+ if (cma_get_net_info(ib_event->private_data, listen_id->ps,
+ &ip_ver, &port, &src, &dst))
+ goto err;
+
+ cma_save_net_info(&id->route.addr, &listen_id->route.addr,
+ ip_ver, port, src, dst);
+ rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path;
+ if (rt->num_paths == 2)
+ rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path;
+
+ ib_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid);
+ ib_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid);
+ ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey));
+ rt->addr.dev_addr.dev_type = IB_NODE_CA;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ id_priv->state = CMA_CONNECT;
+ return id_priv;
+err:
+ rdma_destroy_id(id);
+ return NULL;
+}
+
+static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
+{
+ struct rdma_id_private *listen_id, *conn_id;
+ int offset, ret;
+
+ listen_id = cm_id->context;
+ atomic_inc(&listen_id->dev_remove);
+ if (!cma_comp(listen_id, CMA_LISTEN)) {
+ ret = -ECONNABORTED;
+ goto out;
+ }
+
+ conn_id = cma_new_id(&listen_id->id, ib_event);
+ if (!conn_id) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ atomic_inc(&conn_id->dev_remove);
+ ret = cma_acquire_ib_dev(conn_id);
+ if (ret) {
+ ret = -ENODEV;
+ cma_release_remove(conn_id);
+ rdma_destroy_id(&conn_id->id);
+ goto out;
+ }
+
+ conn_id->cm_id = cm_id;
+ cm_id->context = conn_id;
+ cm_id->cm_handler = cma_ib_handler;
+
+ offset = cma_user_data_offset(listen_id->id.ps);
+ ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0,
+ ib_event->private_data + offset,
+ IB_CM_REQ_PRIVATE_DATA_SIZE - offset);
+ if (ret) {
+ /* Destroy the CM ID by returning a non-zero value. */
+ conn_id->cm_id = NULL;
+ cma_exch(conn_id, CMA_DESTROYING);
+ cma_release_remove(conn_id);
+ rdma_destroy_id(&conn_id->id);
+ }
+out:
+ cma_release_remove(listen_id);
+ return ret;
+}
+
+static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr)
+{
+ return cpu_to_be64(((u64)ps << 16) +
+ be16_to_cpu(((struct sockaddr_in *) addr)->sin_port));
+}
+
+static void cma_set_compare_data(struct sockaddr *addr,
+ struct ib_cm_private_data_compare *compare)
+{
+ struct cma_hdr *data, *mask;
+
+ memset(compare, 0, sizeof *compare);
+ data = (void *) compare->data;
+ mask = (void *) compare->mask;
+
+ switch (addr->sa_family) {
+ case AF_INET:
+ cma_set_ip_ver(data, 4);
+ cma_set_ip_ver(mask, 0xF);
+ data->dst_addr.ip4.addr = ((struct sockaddr_in *) addr)->
+ sin_addr.s_addr;
+ mask->dst_addr.ip4.addr = ~0;
+ break;
+ case AF_INET6:
+ cma_set_ip_ver(data, 6);
+ cma_set_ip_ver(mask, 0xF);
+ data->dst_addr.ip6 = ((struct sockaddr_in6 *) addr)->
+ sin6_addr;
+ memset(&mask->dst_addr.ip6, 0xFF, sizeof mask->dst_addr.ip6);
+ break;
+ default:
+ break;
+ }
+}
+
+static int cma_ib_listen(struct rdma_id_private *id_priv)
+{
+ struct ib_cm_private_data_compare compare_data;
+ struct sockaddr *addr;
+ __be64 svc_id;
+ int ret;
+
+ id_priv->cm_id = ib_create_cm_id(id_priv->id.device, cma_req_handler,
+ id_priv);
+ if (IS_ERR(id_priv->cm_id))
+ return PTR_ERR(id_priv->cm_id);
+
+ addr = &id_priv->id.route.addr.src_addr;
+ svc_id = cma_get_service_id(id_priv->id.ps, addr);
+ if (cma_any_addr(addr))
+ ret = ib_cm_listen(id_priv->cm_id, svc_id, 0, NULL);
+ else {
+ cma_set_compare_data(addr, &compare_data);
+ ret = ib_cm_listen(id_priv->cm_id, svc_id, 0, &compare_data);
+ }
+
+ if (ret) {
+ ib_destroy_cm_id(id_priv->cm_id);
+ id_priv->cm_id = NULL;
+ }
+
+ return ret;
+}
+
+static int cma_duplicate_listen(struct rdma_id_private *id_priv)
+{
+ struct rdma_id_private *cur_id_priv;
+ struct sockaddr_in *cur_addr, *new_addr;
+
+ new_addr = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr;
+ list_for_each_entry(cur_id_priv, &listen_any_list, listen_list) {
+ cur_addr = (struct sockaddr_in *)
+ &cur_id_priv->id.route.addr.src_addr;
+ if (cur_addr->sin_port == new_addr->sin_port)
+ return -EADDRINUSE;
+ }
+ return 0;
+}
+
+static int cma_listen_handler(struct rdma_cm_id *id,
+ struct rdma_cm_event *event)
+{
+ struct rdma_id_private *id_priv = id->context;
+
+ id->context = id_priv->id.context;
+ id->event_handler = id_priv->id.event_handler;
+ return id_priv->id.event_handler(id, event);
+}
+
+static void cma_listen_on_dev(struct rdma_id_private *id_priv,
+ struct cma_device *cma_dev)
+{
+ struct rdma_id_private *dev_id_priv;
+ struct rdma_cm_id *id;
+ int ret;
+
+ id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps);
+ if (IS_ERR(id))
+ return;
+
+ dev_id_priv = container_of(id, struct rdma_id_private, id);
+ ret = rdma_bind_addr(id, &id_priv->id.route.addr.src_addr);
+ if (ret)
+ goto err;
+
+ cma_attach_to_dev(dev_id_priv, cma_dev);
+ list_add_tail(&dev_id_priv->listen_list, &id_priv->listen_list);
+
+ ret = rdma_listen(id, id_priv->backlog);
+ if (ret)
+ goto err;
+
+ return;
+err:
+ cma_destroy_listen(dev_id_priv);
+}
+
+static int cma_listen_on_all(struct rdma_id_private *id_priv)
+{
+ struct cma_device *cma_dev;
+ int ret;
+
+ down(&mutex);
+ ret = cma_duplicate_listen(id_priv);
+ if (ret)
+ goto out;
+
+ list_add_tail(&id_priv->list, &listen_any_list);
+ list_for_each_entry(cma_dev, &dev_list, list)
+ cma_listen_on_dev(id_priv, cma_dev);
+out:
+ up(&mutex);
+ return ret;
+}
+
+int rdma_listen(struct rdma_cm_id *id, int backlog)
+{
+ struct rdma_id_private *id_priv;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN))
+ return -EINVAL;
+
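+ /* Set first: cma_listen_on_all() passes this to per-device listens. */
+ id_priv->backlog = backlog;
+ if (id->device) {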
+ switch (id->device->node_type) {
+ case IB_NODE_CA:
+ ret = cma_ib_listen(id_priv);
+ break;
+ default:
+ ret = -ENOSYS;
+ break;
+ }
+ } else
+ ret = cma_listen_on_all(id_priv);
+
+ if (ret)
+ goto err;
+
+ id_priv->backlog = backlog;
+ return 0;
+err:
+ cma_comp_exch(id_priv, CMA_LISTEN, CMA_ADDR_BOUND);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_listen);
+
+static void cma_query_handler(int status, struct ib_sa_path_rec *path_rec,
+ void *context)
+{
+ struct rdma_id_private *id_priv = context;
+ struct rdma_route *route = &id_priv->id.route;
+ enum rdma_cm_event_type event = RDMA_CM_EVENT_ROUTE_RESOLVED;
+
+ atomic_inc(&id_priv->dev_remove);
+ if (!status) {
+ route->path_rec = kmalloc(sizeof *route->path_rec, GFP_KERNEL);
+ if (route->path_rec) {
+ route->num_paths = 1;
+ *route->path_rec = *path_rec;
+ if (!cma_comp_exch(id_priv, CMA_ROUTE_QUERY,
+ CMA_ROUTE_RESOLVED)) {
+ kfree(route->path_rec);
+ route->path_rec = NULL;
+ goto out;
+ }
+ } else
+ status = -ENOMEM;
+ }
+
+ if (status) {
+ if (!cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ADDR_RESOLVED))
+ goto out;
+ event = RDMA_CM_EVENT_ROUTE_ERROR;
+ }
+
+ if (cma_notify_user(id_priv, event, status, NULL, 0)) {
+ cma_exch(id_priv, CMA_DESTROYING);
+ cma_release_remove(id_priv);
+ cma_deref_id(id_priv);
+ rdma_destroy_id(&id_priv->id);
+ return;
+ }
+out:
+ cma_release_remove(id_priv);
+ cma_deref_id(id_priv);
+}
+
+static int cma_resolve_ib_route(struct rdma_id_private *id_priv, int timeout_ms)
+{
+ struct rdma_dev_addr *addr = &id_priv->id.route.addr.dev_addr;
+ struct ib_sa_path_rec path_rec;
+
+ memset(&path_rec, 0, sizeof path_rec);
+ path_rec.sgid = *ib_addr_get_sgid(addr);
+ path_rec.dgid = *ib_addr_get_dgid(addr);
+ path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
+ path_rec.numb_path = 1;
+
+ id_priv->query_id = ib_sa_path_rec_get(id_priv->id.device,
+ id_priv->id.port_num, &path_rec,
+ IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
+ IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH,
+ timeout_ms, GFP_KERNEL,
+ cma_query_handler, id_priv, &id_priv->query);
+
+ return (id_priv->query_id < 0) ? id_priv->query_id : 0;
+}
+
+int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
+{
+ struct rdma_id_private *id_priv;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ROUTE_QUERY))
+ return -EINVAL;
+
+ atomic_inc(&id_priv->refcount);
+ switch (id->device->node_type) {
+ case IB_NODE_CA:
+ ret = cma_resolve_ib_route(id_priv, timeout_ms);
+ break;
+ default:
+ ret = -ENOSYS;
+ break;
+ }
+ if (ret)
+ goto err;
+
+ return 0;
+err:
+ cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ADDR_RESOLVED);
+ cma_deref_id(id_priv);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_resolve_route);
+
+static int cma_bind_loopback(struct rdma_id_private *id_priv)
+{
+ struct cma_device *cma_dev;
+ union ib_gid *gid;
+ u16 pkey;
+ int ret;
+
+ down(&mutex);
+ if (list_empty(&dev_list)) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ cma_dev = list_entry(dev_list.next, struct cma_device, list);
+ gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr);
+ ret = ib_get_cached_gid(cma_dev->device, 1, 0, gid);
+ if (ret)
+ goto out;
+
+ ret = ib_get_cached_pkey(cma_dev->device, 1, 0, &pkey);
+ if (ret)
+ goto out;
+
+ ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
+ id_priv->id.port_num = 1;
+ cma_attach_to_dev(id_priv, cma_dev);
+out:
+ up(&mutex);
+ return ret;
+}
+
+static void addr_handler(int status, struct sockaddr *src_addr,
+ struct rdma_dev_addr *dev_addr, void *context)
+{
+ struct rdma_id_private *id_priv = context;
+ enum rdma_cm_event_type event;
+ enum cma_state old_state;
+
+ atomic_inc(&id_priv->dev_remove);
+ if (!id_priv->cma_dev) {
+ old_state = CMA_IDLE;
+ if (!status)
+ status = cma_acquire_dev(id_priv);
+ } else
+ old_state = CMA_ADDR_BOUND;
+
+ if (status) {
+ if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, old_state))
+ goto out;
+ event = RDMA_CM_EVENT_ADDR_ERROR;
+ } else {
+ if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED))
+ goto out;
+ memcpy(&id_priv->id.route.addr.src_addr, src_addr,
+ ip_addr_size(src_addr));
+ event = RDMA_CM_EVENT_ADDR_RESOLVED;
+ }
+
+ if (cma_notify_user(id_priv, event, status, NULL, 0)) {
+ cma_exch(id_priv, CMA_DESTROYING);
+ cma_release_remove(id_priv);
+ cma_deref_id(id_priv);
+ rdma_destroy_id(&id_priv->id);
+ return;
+ }
+out:
+ cma_release_remove(id_priv);
+ cma_deref_id(id_priv);
+}
+
+static void loopback_addr_handler(void *data)
+{
+ struct cma_work *work = data;
+ struct rdma_id_private *id_priv = work->id;
+
+ kfree(work);
+ atomic_inc(&id_priv->dev_remove);
+
+ if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED))
+ goto out;
+
+ if (cma_notify_user(id_priv, RDMA_CM_EVENT_ADDR_RESOLVED, 0, NULL, 0)) {
+ cma_exch(id_priv, CMA_DESTROYING);
+ cma_release_remove(id_priv);
+ cma_deref_id(id_priv);
+ rdma_destroy_id(&id_priv->id);
+ return;
+ }
+out:
+ cma_release_remove(id_priv);
+ cma_deref_id(id_priv);
+}
+
+static int cma_resolve_loopback(struct rdma_id_private *id_priv,
+ struct sockaddr *src_addr, enum cma_state state)
+{
+ struct cma_work *work;
+ struct rdma_dev_addr *dev_addr;
+ int ret;
+
+ work = kmalloc(sizeof *work, GFP_KERNEL);
+ if (!work)
+ return -ENOMEM;
+
+ if (state == CMA_IDLE) {
+ ret = cma_bind_loopback(id_priv);
+ if (ret)
+ goto err;
+ dev_addr = &id_priv->id.route.addr.dev_addr;
+ ib_addr_set_dgid(dev_addr, ib_addr_get_sgid(dev_addr));
+ if (!src_addr || cma_any_addr(src_addr))
+ src_addr = &id_priv->id.route.addr.dst_addr;
+ memcpy(&id_priv->id.route.addr.src_addr, src_addr,
+ ip_addr_size(src_addr));
+ }
+
+ work->id = id_priv;
+ INIT_WORK(&work->work, loopback_addr_handler, work);
+ queue_work(rdma_wq, &work->work);
+ return 0;
+err:
+ kfree(work);
+ return ret;
+}
+
+int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
+ struct sockaddr *dst_addr, int timeout_ms)
+{
+ struct rdma_id_private *id_priv;
+ enum cma_state expected_state;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ if (id_priv->cma_dev) {
+ expected_state = CMA_ADDR_BOUND;
+ src_addr = &id->route.addr.src_addr;
+ } else
+ expected_state = CMA_IDLE;
+
+ if (!cma_comp_exch(id_priv, expected_state, CMA_ADDR_QUERY))
+ return -EINVAL;
+
+ atomic_inc(&id_priv->refcount);
+ memcpy(&id->route.addr.dst_addr, dst_addr, ip_addr_size(dst_addr));
+ if (cma_loopback_addr(dst_addr))
+ ret = cma_resolve_loopback(id_priv, src_addr, expected_state);
+ else
+ ret = rdma_resolve_ip(src_addr, dst_addr,
+ &id->route.addr.dev_addr,
+ timeout_ms, addr_handler, id_priv);
+ if (ret)
+ goto err;
+
+ return 0;
+err:
+ cma_comp_exch(id_priv, CMA_ADDR_QUERY, expected_state);
+ cma_deref_id(id_priv);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_resolve_addr);
+
+int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
+{
+ struct rdma_id_private *id_priv;
+ struct rdma_dev_addr *dev_addr;
+ int ret;
+
+ if (addr->sa_family != AF_INET)
+ return -EINVAL;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_BOUND))
+ return -EINVAL;
+
+ if (cma_any_addr(addr)) {
+ ret = 0;
+ } else if (cma_loopback_addr(addr)) {
+ ret = cma_bind_loopback(id_priv);
+ } else {
+ dev_addr = &id->route.addr.dev_addr;
+ ret = rdma_translate_ip(addr, dev_addr);
+ if (!ret)
+ ret = cma_acquire_dev(id_priv);
+ }
+
+ if (ret)
+ goto err;
+
+ memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
+ return 0;
+err:
+ cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_bind_addr);
+
+static void cma_format_hdr(void *hdr, enum rdma_port_space ps,
+ struct rdma_route *route)
+{
+ struct sockaddr_in *src4, *dst4;
+ struct cma_hdr *cma_hdr;
+ struct sdp_hh *sdp_hdr;
+
+ src4 = (struct sockaddr_in *) &route->addr.src_addr;
+ dst4 = (struct sockaddr_in *) &route->addr.dst_addr;
+
+ switch (ps) {
+ case RDMA_PS_SDP:
+ sdp_hdr = hdr;
+ sdp_hdr->sdp_version = SDP_VERSION;
+ sdp_set_ip_ver(sdp_hdr, 4);
+ sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
+ sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
+ sdp_hdr->port = src4->sin_port;
+ break;
+ default:
+ cma_hdr = hdr;
+ cma_hdr->cma_version = CMA_VERSION;
+ cma_set_ip_ver(cma_hdr, 4);
+ cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
+ cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
+ cma_hdr->port = src4->sin_port;
+ break;
+ }
+}
+
+static int cma_connect_ib(struct rdma_id_private *id_priv,
+ struct rdma_conn_param *conn_param)
+{
+ struct ib_cm_req_param req;
+ struct rdma_route *route;
+ void *private_data;
+ int offset, ret;
+
+ memset(&req, 0, sizeof req);
+ offset = cma_user_data_offset(id_priv->id.ps);
+ req.private_data_len = offset + conn_param->private_data_len;
+ private_data = kzalloc(req.private_data_len, GFP_ATOMIC);
+ if (!private_data)
+ return -ENOMEM;
+
+ if (conn_param->private_data && conn_param->private_data_len)
+ memcpy(private_data + offset, conn_param->private_data,
+ conn_param->private_data_len);
+
+ id_priv->cm_id = ib_create_cm_id(id_priv->id.device, cma_ib_handler,
+ id_priv);
+ if (IS_ERR(id_priv->cm_id)) {
+ ret = PTR_ERR(id_priv->cm_id);
+ goto out;
+ }
+
+ route = &id_priv->id.route;
+ cma_format_hdr(private_data, id_priv->id.ps, route);
+ req.private_data = private_data;
+
+ req.primary_path = &route->path_rec[0];
+ if (route->num_paths == 2)
+ req.alternate_path = &route->path_rec[1];
+
+ req.service_id = cma_get_service_id(id_priv->id.ps,
+ &route->addr.dst_addr);
+ req.qp_num = id_priv->qp_num;
+ req.qp_type = id_priv->qp_type;
+ req.starting_psn = id_priv->seq_num;
+ req.responder_resources = conn_param->responder_resources;
+ req.initiator_depth = conn_param->initiator_depth;
+ req.flow_control = conn_param->flow_control;
+ req.retry_count = conn_param->retry_count;
+ req.rnr_retry_count = conn_param->rnr_retry_count;
+ req.remote_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT;
+ req.local_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT;
+ req.max_cm_retries = CMA_MAX_CM_RETRIES;
+ req.srq = id_priv->srq ? 1 : 0;
+
+ ret = ib_send_cm_req(id_priv->cm_id, &req);
+out:
+ kfree(private_data);
+ return ret;
+}
+
+int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
+{
+ struct rdma_id_private *id_priv;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT))
+ return -EINVAL;
+
+ if (!id->qp) {
+ id_priv->qp_num = conn_param->qp_num;
+ id_priv->qp_type = conn_param->qp_type;
+ id_priv->srq = conn_param->srq;
+ }
+
+ switch (id->device->node_type) {
+ case IB_NODE_CA:
+ ret = cma_connect_ib(id_priv, conn_param);
+ break;
+ default:
+ ret = -ENOSYS;
+ break;
+ }
+ if (ret)
+ goto err;
+
+ return 0;
+err:
+ cma_comp_exch(id_priv, CMA_CONNECT, CMA_ROUTE_RESOLVED);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_connect);
+
+static int cma_accept_ib(struct rdma_id_private *id_priv,
+ struct rdma_conn_param *conn_param)
+{
+ struct ib_cm_rep_param rep;
+ int ret;
+
+ ret = cma_modify_qp_rtr(&id_priv->id);
+ if (ret)
+ return ret;
+
+ memset(&rep, 0, sizeof rep);
+ rep.qp_num = id_priv->qp_num;
+ rep.starting_psn = id_priv->seq_num;
+ rep.private_data = conn_param->private_data;
+ rep.private_data_len = conn_param->private_data_len;
+ rep.responder_resources = conn_param->responder_resources;
+ rep.initiator_depth = conn_param->initiator_depth;
+ rep.target_ack_delay = CMA_CM_RESPONSE_TIMEOUT;
+ rep.failover_accepted = 0;
+ rep.flow_control = conn_param->flow_control;
+ rep.rnr_retry_count = conn_param->rnr_retry_count;
+ rep.srq = id_priv->srq ? 1 : 0;
+
+ return ib_send_cm_rep(id_priv->cm_id, &rep);
+}
+
+int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
+{
+ struct rdma_id_private *id_priv;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ if (!cma_comp(id_priv, CMA_CONNECT))
+ return -EINVAL;
+
+ if (!id->qp && conn_param) {
+ id_priv->qp_num = conn_param->qp_num;
+ id_priv->qp_type = conn_param->qp_type;
+ id_priv->srq = conn_param->srq;
+ }
+
+ switch (id->device->node_type) {
+ case IB_NODE_CA:
+ if (conn_param)
+ ret = cma_accept_ib(id_priv, conn_param);
+ else
+ ret = cma_rep_recv(id_priv);
+ break;
+ default:
+ ret = -ENOSYS;
+ break;
+ }
+
+ if (ret)
+ goto reject;
+
+ return 0;
+reject:
+ cma_modify_qp_err(id);
+ rdma_reject(id, NULL, 0);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_accept);
+
+int rdma_reject(struct rdma_cm_id *id, const void *private_data,
+ u8 private_data_len)
+{
+ struct rdma_id_private *id_priv;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ if (!cma_comp(id_priv, CMA_CONNECT))
+ return -EINVAL;
+
+ switch (id->device->node_type) {
+ case IB_NODE_CA:
+ ret = ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED,
+ NULL, 0, private_data, private_data_len);
+ break;
+ default:
+ ret = -ENOSYS;
+ break;
+ }
+ return ret;
+}
+EXPORT_SYMBOL(rdma_reject);
+
+int rdma_disconnect(struct rdma_cm_id *id)
+{
+ struct rdma_id_private *id_priv;
+ int ret;
+
+ id_priv = container_of(id, struct rdma_id_private, id);
+ if (!cma_comp(id_priv, CMA_CONNECT))
+ return -EINVAL;
+
+ ret = cma_modify_qp_err(id);
+ if (ret)
+ goto out;
+
+ switch (id->device->node_type) {
+ case IB_NODE_CA:
+ /* Initiate or respond to a disconnect. */
+ if (ib_send_cm_dreq(id_priv->cm_id, NULL, 0))
+ ib_send_cm_drep(id_priv->cm_id, NULL, 0);
+ break;
+ default:
+ break;
+ }
+out:
+ return ret;
+}
+EXPORT_SYMBOL(rdma_disconnect);
+
+static void cma_add_one(struct ib_device *device)
+{
+ struct cma_device *cma_dev;
+ struct rdma_id_private *id_priv;
+
+ cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL);
+ if (!cma_dev)
+ return;
+
+ cma_dev->device = device;
+ cma_dev->node_guid = device->node_guid;
+ if (!cma_dev->node_guid)
+ goto err;
+
+ init_waitqueue_head(&cma_dev->wait);
+ atomic_set(&cma_dev->refcount, 1);
+ INIT_LIST_HEAD(&cma_dev->id_list);
+ ib_set_client_data(device, &cma_client, cma_dev);
+
+ down(&mutex);
+ list_add_tail(&cma_dev->list, &dev_list);
+ list_for_each_entry(id_priv, &listen_any_list, list)
+ cma_listen_on_dev(id_priv, cma_dev);
+ up(&mutex);
+ return;
+err:
+ kfree(cma_dev);
+}
+
+static int cma_remove_id_dev(struct rdma_id_private *id_priv)
+{
+ enum cma_state state;
+
+ /* Record that we want to remove the device */
+ state = cma_exch(id_priv, CMA_DEVICE_REMOVAL);
+ if (state == CMA_DESTROYING)
+ return 0;
+
+ cma_cancel_operation(id_priv, state);
+ wait_event(id_priv->wait_remove, !atomic_read(&id_priv->dev_remove));
+
+ /* Check for destruction from another callback. */
+ if (!cma_comp(id_priv, CMA_DEVICE_REMOVAL))
+ return 0;
+
+ return cma_notify_user(id_priv, RDMA_CM_EVENT_DEVICE_REMOVAL,
+ 0, NULL, 0);
+}
+
+static void cma_process_remove(struct cma_device *cma_dev)
+{
+ struct list_head remove_list;
+ struct rdma_id_private *id_priv;
+ int ret;
+
+ INIT_LIST_HEAD(&remove_list);
+
+ down(&mutex);
+ while (!list_empty(&cma_dev->id_list)) {
+ id_priv = list_entry(cma_dev->id_list.next,
+ struct rdma_id_private, list);
+
+ if (cma_internal_listen(id_priv)) {
+ cma_destroy_listen(id_priv);
+ continue;
+ }
+
+ list_del(&id_priv->list);
+ list_add_tail(&id_priv->list, &remove_list);
+ atomic_inc(&id_priv->refcount);
+ up(&mutex);
+
+ ret = cma_remove_id_dev(id_priv);
+ cma_deref_id(id_priv);
+ if (ret)
+ rdma_destroy_id(&id_priv->id);
+
+ down(&mutex);
+ }
+ up(&mutex);
+
+ atomic_dec(&cma_dev->refcount);
+ wait_event(cma_dev->wait, !atomic_read(&cma_dev->refcount));
+}
+
+static void cma_remove_one(struct ib_device *device)
+{
+ struct cma_device *cma_dev;
+
+ cma_dev = ib_get_client_data(device, &cma_client);
+ if (!cma_dev)
+ return;
+
+ down(&mutex);
+ list_del(&cma_dev->list);
+ up(&mutex);
+
+ cma_process_remove(cma_dev);
+ kfree(cma_dev);
+}
+
+static int cma_init(void)
+{
+ return ib_register_client(&cma_client);
+}
+
+static void cma_cleanup(void)
+{
+ ib_unregister_client(&cma_client);
+}
+
+module_init(cma_init);
+module_exit(cma_cleanup);
diff -uprN -X linux-2.6.git/Documentation/dontdiff linux-2.6.git/drivers/infiniband/core/Makefile linux-2.6.ib/drivers/infiniband/core/Makefile
--- linux-2.6.git/drivers/infiniband/core/Makefile 2006-01-16 16:16:18.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/Makefile 2006-01-16 16:35:48.000000000 -0800
@@ -1,5 +1,5 @@
obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_sa.o \
- ib_cm.o ib_addr.o
+ ib_cm.o ib_addr.o rdma_cm.o
obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o
obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o
@@ -12,6 +12,8 @@ ib_sa-y := sa_query.o
ib_cm-y := cm.o
+rdma_cm-y := cma.o
+
ib_addr-y := addr.o
ib_umad-y := user_mad.o
diff -uprN -X linux-2.6.git/Documentation/dontdiff linux-2.6.git/include/rdma/rdma_cm.h linux-2.6.ib/include/rdma/rdma_cm.h
--- linux-2.6.git/include/rdma/rdma_cm.h 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/include/rdma/rdma_cm.h 2006-01-16 16:19:12.000000000 -0800
@@ -0,0 +1,255 @@
+/*
+ * Copyright (c) 2005 Voltaire Inc. All rights reserved.
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This Software is licensed under one of the following licenses:
+ *
+ * 1) under the terms of the "Common Public License 1.0" a copy of which is
+ * available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/cpl.php.
+ *
+ * 2) under the terms of the "The BSD License" a copy of which is
+ * available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/bsd-license.php.
+ *
+ * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+ * copy of which is available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/gpl-license.php.
+ *
+ * Licensee has the right to choose one of the above licenses.
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice and one of the license notices.
+ *
+ * Redistributions in binary form must reproduce both the above copyright
+ * notice, one of the license notices in the documentation
+ * and/or other materials provided with the distribution.
+ *
+ */
+
+#if !defined(RDMA_CM_H)
+#define RDMA_CM_H
+
+#include <linux/socket.h>
+#include <linux/in6.h>
+#include <rdma/ib_addr.h>
+#include <rdma/ib_sa.h>
+
+/*
+ * Upon receiving a device removal event, users must destroy the associated
+ * RDMA identifier and release all resources allocated with the device.
+ */
+enum rdma_cm_event_type {
+ RDMA_CM_EVENT_ADDR_RESOLVED,
+ RDMA_CM_EVENT_ADDR_ERROR,
+ RDMA_CM_EVENT_ROUTE_RESOLVED,
+ RDMA_CM_EVENT_ROUTE_ERROR,
+ RDMA_CM_EVENT_CONNECT_REQUEST,
+ RDMA_CM_EVENT_CONNECT_RESPONSE,
+ RDMA_CM_EVENT_CONNECT_ERROR,
+ RDMA_CM_EVENT_UNREACHABLE,
+ RDMA_CM_EVENT_REJECTED,
+ RDMA_CM_EVENT_ESTABLISHED,
+ RDMA_CM_EVENT_DISCONNECTED,
+ RDMA_CM_EVENT_DEVICE_REMOVAL,
+};
+
+enum rdma_port_space {
+ RDMA_PS_SDP = 0x0001,
+ RDMA_PS_TCP = 0x0106,
+ RDMA_PS_UDP = 0x0111,
+ RDMA_PS_SCTP = 0x0183
+};
+
+struct rdma_addr {
+ struct sockaddr src_addr;
+ u8 src_pad[sizeof(struct sockaddr_in6) -
+ sizeof(struct sockaddr)];
+ struct sockaddr dst_addr;
+ u8 dst_pad[sizeof(struct sockaddr_in6) -
+ sizeof(struct sockaddr)];
+ struct rdma_dev_addr dev_addr;
+};
+
+struct rdma_route {
+ struct rdma_addr addr;
+ struct ib_sa_path_rec *path_rec;
+ int num_paths;
+};
+
+struct rdma_cm_event {
+ enum rdma_cm_event_type event;
+ int status;
+ void *private_data;
+ u8 private_data_len;
+};
+
+struct rdma_cm_id;
+
+/**
+ * rdma_cm_event_handler - Callback used to report user events.
+ *
+ * Notes: Users may not call rdma_destroy_id from this callback to destroy
+ * the passed in id, or a corresponding listen id. Returning a
+ * non-zero value from the callback will destroy the corresponding id.
+ */
+typedef int (*rdma_cm_event_handler)(struct rdma_cm_id *id,
+ struct rdma_cm_event *event);
+
+struct rdma_cm_id {
+ struct ib_device *device;
+ void *context;
+ struct ib_qp *qp;
+ rdma_cm_event_handler event_handler;
+ struct rdma_route route;
+ enum rdma_port_space ps;
+ u8 port_num;
+};
+
+/**
+ * rdma_create_id - Create an RDMA identifier.
+ *
+ * @event_handler: User callback invoked to report events associated with the
+ * returned rdma_id.
+ * @context: User specified context associated with the id.
+ * @ps: RDMA port space.
+ */
+struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler event_handler,
+ void *context, enum rdma_port_space ps);
+
+void rdma_destroy_id(struct rdma_cm_id *id);
+
+/**
+ * rdma_bind_addr - Bind an RDMA identifier to a source address and
+ * associated RDMA device, if needed.
+ *
+ * @id: RDMA identifier.
+ * @addr: Local address information. Wildcard values are permitted.
+ *
+ * This associates a source address with the RDMA identifier before calling
+ * rdma_listen. If a specific local address is given, the RDMA identifier will
+ * be bound to a local RDMA device.
+ */
+int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr);
+
+/**
+ * rdma_resolve_addr - Resolve destination and optional source addresses
+ * from IP addresses to an RDMA address. If successful, the specified
+ * rdma_cm_id will be bound to a local device.
+ *
+ * @id: RDMA identifier.
+ * @src_addr: Source address information. This parameter may be NULL.
+ * @dst_addr: Destination address information.
+ * @timeout_ms: Time to wait for resolution to complete.
+ */
+int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
+ struct sockaddr *dst_addr, int timeout_ms);
+
+/**
+ * rdma_resolve_route - Resolve the RDMA address bound to the RDMA identifier
+ * into route information needed to establish a connection.
+ *
+ * This is called on the client side of a connection.
+ * Users must have first called rdma_resolve_addr to resolve a dst_addr
+ * into an RDMA address before calling this routine.
+ */
+int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms);
+
+/**
+ * rdma_create_qp - Allocate a QP and associate it with the specified RDMA
+ * identifier.
+ *
+ * QPs allocated to an rdma_cm_id will automatically be transitioned by the CMA
+ * through their states.
+ */
+int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd,
+ struct ib_qp_init_attr *qp_init_attr);
+
+/**
+ * rdma_destroy_qp - Deallocate the QP associated with the specified RDMA
+ * identifier.
+ *
+ * Users must destroy any QP associated with an RDMA identifier before
+ * destroying the RDMA ID.
+ */
+void rdma_destroy_qp(struct rdma_cm_id *id);
+
+/**
+ * rdma_init_qp_attr - Initializes the QP attributes for use in transitioning
+ * to a specified QP state.
+ * @id: Communication identifier associated with the QP attributes to
+ * initialize.
+ * @qp_attr: On input, specifies the desired QP state. On output, the
+ * mandatory and desired optional attributes will be set in order to
+ * modify the QP to the specified state.
+ * @qp_attr_mask: The QP attribute mask that may be used to transition the
+ * QP to the specified state.
+ *
+ * Users must set the @qp_attr->qp_state to the desired QP state. This call
+ * will set all required attributes for the given transition, along with
+ * known optional attributes. Users may override the attributes returned from
+ * this call before calling ib_modify_qp.
+ *
+ * Users that wish to have their QP automatically transitioned through its
+ * states can associate a QP with the rdma_cm_id by calling rdma_create_qp().
+ */
+int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
+ int *qp_attr_mask);
+
+struct rdma_conn_param {
+ const void *private_data;
+ u8 private_data_len;
+ u8 responder_resources;
+ u8 initiator_depth;
+ u8 flow_control;
+ u8 retry_count; /* ignored when accepting */
+ u8 rnr_retry_count;
+ /* Fields below ignored if a QP is created on the rdma_cm_id. */
+ u8 srq;
+ u32 qp_num;
+ enum ib_qp_type qp_type;
+};
+
+/**
+ * rdma_connect - Initiate an active connection request.
+ *
+ * Users must have resolved a route for the rdma_cm_id to connect with
+ * by having called rdma_resolve_route before calling this routine.
+ */
+int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param);
+
+/**
+ * rdma_listen - This function is called by the passive side to
+ * listen for incoming connection requests.
+ *
+ * Users must have bound the rdma_cm_id to a local address by calling
+ * rdma_bind_addr before calling this routine.
+ */
+int rdma_listen(struct rdma_cm_id *id, int backlog);
+
+/**
+ * rdma_accept - Called to accept a connection request or response.
+ * @id: Connection identifier associated with the request.
+ * @conn_param: Information needed to establish the connection. This must be
+ * provided if accepting a connection request. If accepting a connection
+ * response, this parameter must be NULL.
+ *
+ * Typically, this routine is only called by the listener to accept a connection
+ * request. It must also be called on the active side of a connection if the
+ * user is performing their own QP transitions.
+ */
+int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param);
+
+/**
+ * rdma_reject - Called on the passive side to reject a connection request.
+ */
+int rdma_reject(struct rdma_cm_id *id, const void *private_data,
+ u8 private_data_len);
+
+/**
+ * rdma_disconnect - This function disconnects the associated QP.
+ */
+int rdma_disconnect(struct rdma_cm_id *id);
+
+#endif /* RDMA_CM_H */
+
* Re: de2104x: interrupts before interrupt handler is registered
From: Martin Michlmayr @ 2006-03-06 19:17 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev, linux-kernel
In-Reply-To: <20060306143512.GI23669@deprecation.cyrius.com>
* Martin Michlmayr <tbm@cyrius.com> [2006-03-06 14:35]:
> Thanks a lot for your quick response, Francois. I can confirm that
> this patch fixes the problem for me.
There's another interrupt related bug in the driver, though. I
sometimes get a kernel panic when rsyncing several hundred megabytes of data
across the LAN. A picture showing the call trace can be found at
http://www.cyrius.com/tmp/de2104x_panic.jpg
--
Martin Michlmayr
http://www.cyrius.com/
* [PATCH 4/6] IB: address translation to map IP to IB addresses (GIDs)
From: Sean Hefty @ 2006-03-06 19:10 UTC (permalink / raw)
To: 'Roland Dreier', linux-kernel, netdev; +Cc: openib-general
In-Reply-To: <adaslpz2l9p.fsf@cisco.com>
Add an address translation service that maps IP addresses to InfiniBand
GID addresses using IPoIB.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
---
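For readers unfamiliar with the address layout that the ib_addr.h accessors
in this patch rely on: an IPoIB hardware address is 20 bytes, with 4 bytes of
flags/QPN followed by the 16-byte GID, and the P_Key sits in bytes 8 and 9 of
the broadcast address. Below is a minimal userspace sketch of that layout.
The names hw_addr_get_gid, bcast_get_pkey and bcast_set_pkey are hypothetical
and are not part of the patch; they only mirror what ib_addr_get_sgid and
ib_addr_get_pkey/ib_addr_set_pkey do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPOIB_HW_ADDR_LEN 20	/* 4 bytes of flags/QPN + 16-byte GID */
#define GID_OFFSET 4		/* the GID starts after the flags/QPN bytes */
#define GID_LEN 16

/* Copy the 16-byte GID out of a 20-byte IPoIB hardware address. */
static void hw_addr_get_gid(const unsigned char *hw_addr, unsigned char *gid)
{
	assert(hw_addr && gid);
	memcpy(gid, hw_addr + GID_OFFSET, GID_LEN);
}

/* Read the P_Key stored big-endian in bytes 8 and 9 of the broadcast address. */
static uint16_t bcast_get_pkey(const unsigned char *bcast)
{
	return (uint16_t)((bcast[8] << 8) | bcast[9]);
}

/* Store the P_Key big-endian in bytes 8 and 9 of the broadcast address. */
static void bcast_set_pkey(unsigned char *bcast, uint16_t pkey)
{
	bcast[8] = (unsigned char)(pkey >> 8);
	bcast[9] = (unsigned char)(pkey & 0xff);
}
```

The patch itself returns a pointer into the address buffer instead of copying;
the copy here just keeps the sketch free of alignment and aliasing concerns.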
diff -uprN -X linux-2.6.git/Documentation/dontdiff linux-2.6.git/drivers/infiniband/core/addr.c linux-2.6.ib/drivers/infiniband/core/addr.c
--- linux-2.6.git/drivers/infiniband/core/addr.c 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/addr.c 2006-01-16 16:14:24.000000000 -0800
@@ -0,0 +1,356 @@
+/*
+ * Copyright (c) 2005 Voltaire Inc. All rights reserved.
+ * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved.
+ * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved.
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This Software is licensed under one of the following licenses:
+ *
+ * 1) under the terms of the "Common Public License 1.0" a copy of which is
+ * available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/cpl.php.
+ *
+ * 2) under the terms of the "The BSD License" a copy of which is
+ * available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/bsd-license.php.
+ *
+ * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+ * copy of which is available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/gpl-license.php.
+ *
+ * Licensee has the right to choose one of the above licenses.
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice and one of the license notices.
+ *
+ * Redistributions in binary form must reproduce both the above copyright
+ * notice, one of the license notices in the documentation
+ * and/or other materials provided with the distribution.
+ */
+#include <linux/inetdevice.h>
+#include <linux/workqueue.h>
+#include <net/arp.h>
+#include <net/neighbour.h>
+#include <net/route.h>
+#include <rdma/ib_addr.h>
+
+MODULE_AUTHOR("Sean Hefty");
+MODULE_DESCRIPTION("IB Address Translation");
+MODULE_LICENSE("Dual BSD/GPL");
+
+struct addr_req {
+ struct list_head list;
+ struct sockaddr src_addr;
+ struct sockaddr dst_addr;
+ struct rdma_dev_addr *addr;
+ void *context;
+ void (*callback)(int status, struct sockaddr *src_addr,
+ struct rdma_dev_addr *addr, void *context);
+ unsigned long timeout;
+ int status;
+};
+
+static void process_req(void *data);
+
+static DECLARE_MUTEX(mutex);
+static LIST_HEAD(req_list);
+static DECLARE_WORK(work, process_req, NULL);
+struct workqueue_struct *rdma_wq;
+EXPORT_SYMBOL(rdma_wq);
+
+static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
+ unsigned char *dst_dev_addr)
+{
+ switch (dev->type) {
+ case ARPHRD_INFINIBAND:
+ dev_addr->dev_type = IB_NODE_CA;
+ break;
+ default:
+ return -EADDRNOTAVAIL;
+ }
+
+ memcpy(dev_addr->src_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+ memcpy(dev_addr->broadcast, dev->broadcast, MAX_ADDR_LEN);
+ if (dst_dev_addr)
+ memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
+ return 0;
+}
+
+int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
+{
+ struct net_device *dev;
+ u32 ip = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
+ int ret;
+
+ dev = ip_dev_find(ip);
+ if (!dev)
+ return -EADDRNOTAVAIL;
+
+ ret = copy_addr(dev_addr, dev, NULL);
+ dev_put(dev);
+ return ret;
+}
+EXPORT_SYMBOL(rdma_translate_ip);
+
+static void set_timeout(unsigned long time)
+{
+ unsigned long delay;
+
+ cancel_delayed_work(&work);
+
+ delay = time - jiffies;
+ if ((long)delay <= 0)
+ delay = 1;
+
+ queue_delayed_work(rdma_wq, &work, delay);
+}
+
+static void queue_req(struct addr_req *req)
+{
+ struct addr_req *temp_req;
+
+ down(&mutex);
+ list_for_each_entry_reverse(temp_req, &req_list, list) {
+ if (time_after(req->timeout, temp_req->timeout))
+ break;
+ }
+
+ list_add(&req->list, &temp_req->list);
+
+ if (req_list.next == &req->list)
+ set_timeout(req->timeout);
+ up(&mutex);
+}
+
+static void addr_send_arp(struct sockaddr_in *dst_in)
+{
+ struct rtable *rt;
+ struct flowi fl;
+ u32 dst_ip = dst_in->sin_addr.s_addr;
+
+ memset(&fl, 0, sizeof fl);
+ fl.nl_u.ip4_u.daddr = dst_ip;
+ if (ip_route_output_key(&rt, &fl))
+ return;
+
+ arp_send(ARPOP_REQUEST, ETH_P_ARP, rt->rt_gateway, rt->idev->dev,
+ rt->rt_src, NULL, rt->idev->dev->dev_addr, NULL);
+ ip_rt_put(rt);
+}
+
+static int addr_resolve_remote(struct sockaddr_in *src_in,
+ struct sockaddr_in *dst_in,
+ struct rdma_dev_addr *addr)
+{
+ u32 src_ip = src_in->sin_addr.s_addr;
+ u32 dst_ip = dst_in->sin_addr.s_addr;
+ struct flowi fl;
+ struct rtable *rt;
+ struct neighbour *neigh;
+ int ret;
+
+ memset(&fl, 0, sizeof fl);
+ fl.nl_u.ip4_u.daddr = dst_ip;
+ fl.nl_u.ip4_u.saddr = src_ip;
+ ret = ip_route_output_key(&rt, &fl);
+ if (ret)
+ goto out;
+
+ neigh = neigh_lookup(&arp_tbl, &rt->rt_gateway, rt->idev->dev);
+ if (!neigh) {
+ ret = -ENODATA;
+ goto err1;
+ }
+
+ if (!(neigh->nud_state & NUD_VALID)) {
+ ret = -ENODATA;
+ goto err2;
+ }
+
+ if (!src_ip) {
+ src_in->sin_family = dst_in->sin_family;
+ src_in->sin_addr.s_addr = rt->rt_src;
+ }
+
+ ret = copy_addr(addr, neigh->dev, neigh->ha);
+err2:
+ neigh_release(neigh);
+err1:
+ ip_rt_put(rt);
+out:
+ return ret;
+}
+
+static void process_req(void *data)
+{
+ struct addr_req *req, *temp_req;
+ struct sockaddr_in *src_in, *dst_in;
+ struct list_head done_list;
+
+ INIT_LIST_HEAD(&done_list);
+
+ down(&mutex);
+ list_for_each_entry_safe(req, temp_req, &req_list, list) {
+ if (req->status) {
+ src_in = (struct sockaddr_in *) &req->src_addr;
+ dst_in = (struct sockaddr_in *) &req->dst_addr;
+ req->status = addr_resolve_remote(src_in, dst_in,
+ req->addr);
+ }
+ if (req->status && time_after(jiffies, req->timeout))
+ req->status = -ETIMEDOUT;
+ else if (req->status == -ENODATA)
+ continue;
+
+ list_del(&req->list);
+ list_add_tail(&req->list, &done_list);
+ }
+
+ if (!list_empty(&req_list)) {
+ req = list_entry(req_list.next, struct addr_req, list);
+ set_timeout(req->timeout);
+ }
+ up(&mutex);
+
+ list_for_each_entry_safe(req, temp_req, &done_list, list) {
+ list_del(&req->list);
+ req->callback(req->status, &req->src_addr, req->addr,
+ req->context);
+ kfree(req);
+ }
+}
+
+static int addr_resolve_local(struct sockaddr_in *src_in,
+ struct sockaddr_in *dst_in,
+ struct rdma_dev_addr *addr)
+{
+ struct net_device *dev;
+ u32 src_ip = src_in->sin_addr.s_addr;
+ u32 dst_ip = dst_in->sin_addr.s_addr;
+ int ret;
+
+ dev = ip_dev_find(dst_ip);
+ if (!dev)
+ return -EADDRNOTAVAIL;
+
+ if (!src_ip) {
+ src_in->sin_family = dst_in->sin_family;
+ src_in->sin_addr.s_addr = dst_ip;
+ ret = copy_addr(addr, dev, dev->dev_addr);
+ } else {
+ ret = rdma_translate_ip((struct sockaddr *)src_in, addr);
+ if (!ret)
+ memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+ }
+
+ dev_put(dev);
+ return ret;
+}
+
+int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr,
+ struct rdma_dev_addr *addr, int timeout_ms,
+ void (*callback)(int status, struct sockaddr *src_addr,
+ struct rdma_dev_addr *addr, void *context),
+ void *context)
+{
+ struct sockaddr_in *src_in, *dst_in;
+ struct addr_req *req;
+ int ret = 0;
+
+ req = kmalloc(sizeof *req, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
+ memset(req, 0, sizeof *req);
+
+ if (src_addr)
+ memcpy(&req->src_addr, src_addr, ip_addr_size(src_addr));
+ memcpy(&req->dst_addr, dst_addr, ip_addr_size(dst_addr));
+ req->addr = addr;
+ req->callback = callback;
+ req->context = context;
+
+ src_in = (struct sockaddr_in *) &req->src_addr;
+ dst_in = (struct sockaddr_in *) &req->dst_addr;
+
+ req->status = addr_resolve_local(src_in, dst_in, addr);
+ if (req->status == -EADDRNOTAVAIL)
+ req->status = addr_resolve_remote(src_in, dst_in, addr);
+
+ switch (req->status) {
+ case 0:
+ req->timeout = jiffies;
+ queue_req(req);
+ break;
+ case -ENODATA:
+ req->timeout = msecs_to_jiffies(timeout_ms) + jiffies;
+ queue_req(req);
+ addr_send_arp(dst_in);
+ break;
+ default:
+ ret = req->status;
+ kfree(req);
+ break;
+ }
+ return ret;
+}
+EXPORT_SYMBOL(rdma_resolve_ip);
+
+void rdma_addr_cancel(struct rdma_dev_addr *addr)
+{
+ struct addr_req *req, *temp_req;
+
+ down(&mutex);
+ list_for_each_entry_safe(req, temp_req, &req_list, list) {
+ if (req->addr == addr) {
+ req->status = -ECANCELED;
+ req->timeout = jiffies;
+ list_del(&req->list);
+ list_add(&req->list, &req_list);
+ set_timeout(req->timeout);
+ break;
+ }
+ }
+ up(&mutex);
+}
+EXPORT_SYMBOL(rdma_addr_cancel);
+
+static int addr_arp_recv(struct sk_buff *skb, struct net_device *dev,
+ struct packet_type *pkt, struct net_device *orig_dev)
+{
+ struct arphdr *arp_hdr;
+
+ arp_hdr = (struct arphdr *) skb->nh.raw;
+
+ if (dev->type == ARPHRD_INFINIBAND &&
+ (arp_hdr->ar_op == __constant_htons(ARPOP_REQUEST) ||
+ arp_hdr->ar_op == __constant_htons(ARPOP_REPLY)))
+ set_timeout(jiffies);
+
+ kfree_skb(skb);
+ return 0;
+}
+
+static struct packet_type addr_arp = {
+ .type = __constant_htons(ETH_P_ARP),
+ .func = addr_arp_recv,
+ .af_packet_priv = (void*) 1,
+};
+
+static int addr_init(void)
+{
+ rdma_wq = create_singlethread_workqueue("rdma_wq");
+ if (!rdma_wq)
+ return -ENOMEM;
+
+ dev_add_pack(&addr_arp);
+ return 0;
+}
+
+static void addr_cleanup(void)
+{
+ dev_remove_pack(&addr_arp);
+ destroy_workqueue(rdma_wq);
+}
+
+module_init(addr_init);
+module_exit(addr_cleanup);
diff -uprN -X linux-2.6.git/Documentation/dontdiff linux-2.6.git/drivers/infiniband/core/Makefile linux-2.6.ib/drivers/infiniband/core/Makefile
--- linux-2.6.git/drivers/infiniband/core/Makefile 2006-01-16 16:03:08.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/Makefile 2006-01-16 16:14:24.000000000 -0800
@@ -1,5 +1,5 @@
obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_sa.o \
- ib_cm.o
+ ib_cm.o ib_addr.o
obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o
obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o
@@ -12,6 +12,8 @@ ib_sa-y := sa_query.o
ib_cm-y := cm.o
+ib_addr-y := addr.o
+
ib_umad-y := user_mad.o
ib_ucm-y := ucm.o
diff -uprN -X linux-2.6.git/Documentation/dontdiff linux-2.6.git/include/rdma/ib_addr.h linux-2.6.ib/include/rdma/ib_addr.h
--- linux-2.6.git/include/rdma/ib_addr.h 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_addr.h 2006-01-16 16:14:24.000000000 -0800
@@ -0,0 +1,117 @@
+/*
+ * Copyright (c) 2005 Voltaire Inc. All rights reserved.
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This Software is licensed under one of the following licenses:
+ *
+ * 1) under the terms of the "Common Public License 1.0" a copy of which is
+ * available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/cpl.php.
+ *
+ * 2) under the terms of the "The BSD License" a copy of which is
+ * available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/bsd-license.php.
+ *
+ * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+ * copy of which is available from the Open Source Initiative, see
+ * http://www.opensource.org/licenses/gpl-license.php.
+ *
+ * Licensee has the right to choose one of the above licenses.
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice and one of the license notices.
+ *
+ * Redistributions in binary form must reproduce both the above copyright
+ * notice, one of the license notices in the documentation
+ * and/or other materials provided with the distribution.
+ *
+ */
+
+#if !defined(IB_ADDR_H)
+#define IB_ADDR_H
+
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/netdevice.h>
+#include <linux/socket.h>
+#include <rdma/ib_verbs.h>
+
+extern struct workqueue_struct *rdma_wq;
+
+struct rdma_dev_addr {
+ unsigned char src_dev_addr[MAX_ADDR_LEN];
+ unsigned char dst_dev_addr[MAX_ADDR_LEN];
+ unsigned char broadcast[MAX_ADDR_LEN];
+ enum ib_node_type dev_type;
+};
+
+/**
+ * rdma_translate_ip - Translate a local IP address to an RDMA hardware
+ * address.
+ */
+int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr);
+
+/**
+ * rdma_resolve_ip - Resolve source and destination IP addresses to
+ * RDMA hardware addresses.
+ * @src_addr: An optional source address to use in the resolution. If a
+ * source address is not provided, a usable address will be returned via
+ * the callback.
+ * @dst_addr: The destination address to resolve.
+ * @addr: A reference to a data location that will receive the resolved
+ * addresses. The data location must remain valid until the callback has
+ * been invoked.
+ * @timeout_ms: Amount of time to wait for the address resolution to complete.
+ * @callback: Callback invoked once address resolution has completed, timed out,
+ * or been canceled. A status of 0 indicates success.
+ * @context: User-specified context associated with the call.
+ */
+int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr,
+ struct rdma_dev_addr *addr, int timeout_ms,
+ void (*callback)(int status, struct sockaddr *src_addr,
+ struct rdma_dev_addr *addr, void *context),
+ void *context);
+
+void rdma_addr_cancel(struct rdma_dev_addr *addr);
+
+static inline int ip_addr_size(struct sockaddr *addr)
+{
+ return addr->sa_family == AF_INET6 ?
+ sizeof(struct sockaddr_in6) : sizeof(struct sockaddr_in);
+}
+
+static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
+{
+ return ((u16)dev_addr->broadcast[8] << 8) | (u16)dev_addr->broadcast[9];
+}
+
+static inline void ib_addr_set_pkey(struct rdma_dev_addr *dev_addr, u16 pkey)
+{
+ dev_addr->broadcast[8] = pkey >> 8;
+ dev_addr->broadcast[9] = (unsigned char) pkey;
+}
+
+static inline union ib_gid* ib_addr_get_sgid(struct rdma_dev_addr *dev_addr)
+{
+ return (union ib_gid *) (dev_addr->src_dev_addr + 4);
+}
+
+static inline void ib_addr_set_sgid(struct rdma_dev_addr *dev_addr,
+ union ib_gid *gid)
+{
+ memcpy(dev_addr->src_dev_addr + 4, gid, sizeof *gid);
+}
+
+static inline union ib_gid* ib_addr_get_dgid(struct rdma_dev_addr *dev_addr)
+{
+ return (union ib_gid *) (dev_addr->dst_dev_addr + 4);
+}
+
+static inline void ib_addr_set_dgid(struct rdma_dev_addr *dev_addr,
+ union ib_gid *gid)
+{
+ memcpy(dev_addr->dst_dev_addr + 4, gid, sizeof *gid);
+}
+
+#endif /* IB_ADDR_H */
+
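The pkey and GID accessors in ib_addr.h rely on fixed byte offsets inside the
hardware address arrays. As far as I can tell this follows the IPoIB link-layer
address layout (4 bytes of flags and QPN, then a 16-byte GID), with the P_Key
carried in bytes 8 and 9 of the broadcast address. A minimal userspace sketch of
that layout, using stand-in types (struct dev_addr and union gid16 are
illustrative names, not kernel definitions):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ADDR_LEN 32	/* assumed to match the kernel's value */

/* Stand-in for union ib_gid: 16 raw bytes. */
union gid16 {
	uint8_t raw[16];
};

/* Stand-in for struct rdma_dev_addr (dev_type omitted). */
struct dev_addr {
	unsigned char src_dev_addr[MAX_ADDR_LEN];
	unsigned char dst_dev_addr[MAX_ADDR_LEN];
	unsigned char broadcast[MAX_ADDR_LEN];
};

/* The P_Key is stored big-endian in broadcast[8..9]. */
static uint16_t get_pkey(const struct dev_addr *a)
{
	assert(a);
	return (uint16_t)(((uint16_t)a->broadcast[8] << 8) | a->broadcast[9]);
}

static void set_pkey(struct dev_addr *a, uint16_t pkey)
{
	assert(a);
	a->broadcast[8] = (unsigned char)(pkey >> 8);
	a->broadcast[9] = (unsigned char)pkey;
}

/* The source GID starts 4 bytes into the source hardware address. */
static void set_sgid(struct dev_addr *a, const union gid16 *gid)
{
	assert(a && gid);
	memcpy(a->src_dev_addr + 4, gid, sizeof *gid);
}

/* Copy out instead of returning a cast pointer, to sidestep alignment. */
static void get_sgid(const struct dev_addr *a, union gid16 *gid)
{
	assert(a && gid);
	memcpy(gid, a->src_dev_addr + 4, sizeof *gid);
}
```

The header returns a pointer cast into the array; the sketch copies the GID out
instead, which is a choice of this example rather than what the patch does.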
^ permalink raw reply
* [PATCH 3/6] net/IB: export ip_dev_find
From: Sean Hefty @ 2006-03-06 19:07 UTC (permalink / raw)
To: 'Roland Dreier', linux-kernel, netdev; +Cc: openib-general
In-Reply-To: <adaslpz2l9p.fsf@cisco.com>
Export ip_dev_find to allow locating a net_device given an IP address.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
---
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/net/ipv4/fib_frontend.c
linux-2.6.ib/net/ipv4/fib_frontend.c
--- linux-2.6.git/net/ipv4/fib_frontend.c 2006-01-16 10:28:29.000000000 -0800
+++ linux-2.6.ib/net/ipv4/fib_frontend.c 2006-01-16 16:14:24.000000000 -0800
@@ -666,4 +666,5 @@ void __init ip_fib_init(void)
}
EXPORT_SYMBOL(inet_addr_type);
+EXPORT_SYMBOL(ip_dev_find);
EXPORT_SYMBOL(ip_rt_ioctl);
^ permalink raw reply
* RFC: move SDP from AF_INET_SDP to IPPROTO_SDP
From: Michael S. Tsirkin @ 2006-03-06 19:06 UTC (permalink / raw)
To: Linux Kernel Mailing List, netdev
Hi!
I am working on implementing the Sockets Direct Protocol (SDP) for InfiniBand on
Linux. SDP uses the regular IPv4/IPv6 addresses and utilizes the IPv4/IPv6 layer
on top of InfiniBand for address resolution. However, all data is transferred by
means of an InfiniBand reliable connection.
Some existing SDP implementations posted on the openib.org subversion tree
create a new address family in a free slot for this purpose.
Would it make sense to move SDP from using a separate address family to
a separate protocol under AF_INET and AF_INET6?
Something like IPPROTO_SDP?
The main advantages of this approach are
- IPv6 support will come more naturally and without further extending
to yet another address family
- We could use a protocol number > 255 to avoid conflicting
with any IP-based protocol.
There are many more free protocol numbers than free family numbers
(which only go up to 32 in Linux for now).
- I could reuse more code for creating connections from af_inet.c
I also have a hunch this might make getaddrinfo and friends work better with
SDP, selecting IPv4/IPv6 as appropriate, but I'm not sure.
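To make the comparison concrete, here is a rough sketch of the socket()
arguments under each approach. Both numeric values are made up for
illustration (nothing has been assigned), and struct sock_args and the helper
names are hypothetical. The last helper shows why a number above 255 cannot
collide with a real IP protocol: the IPv4 protocol field and the IPv6
next-header field are only 8 bits wide.

```c
#include <assert.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical values for illustration only; neither is assigned. */
#define AF_INET_SDP_EXAMPLE	27	/* separate address family */
#define IPPROTO_SDP_EXAMPLE	257	/* protocol under AF_INET/AF_INET6 */

struct sock_args {
	int family;
	int type;
	int protocol;
};

/* Separate family: the family no longer says whether this is IPv4 or IPv6. */
static struct sock_args sdp_args_by_family(void)
{
	struct sock_args a = { AF_INET_SDP_EXAMPLE, SOCK_STREAM, 0 };
	return a;
}

/* Separate protocol: the family simply follows the address being used. */
static struct sock_args sdp_args_by_protocol(const struct sockaddr *addr)
{
	struct sock_args a;

	assert(addr->sa_family == AF_INET || addr->sa_family == AF_INET6);
	a.family = addr->sa_family;
	a.type = SOCK_STREAM;
	a.protocol = IPPROTO_SDP_EXAMPLE;
	return a;
}

/* True if the number could appear on the wire in an IP header. */
static int fits_in_ip_header(int protocol)
{
	return protocol >= 0 && protocol <= 255;
}
```

With the second form an application would pass a.family, a.type and a.protocol
straight to socket(), and the IPv4/IPv6 choice falls out of the address it
already has.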
Comments? Are there disadvantages to this approach that someone can see?
Thanks,
--
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies
^ permalink raw reply
* [PATCH 2/6] IB: match connection requests based on private data
From: Sean Hefty @ 2006-03-06 19:04 UTC (permalink / raw)
To: 'Roland Dreier'; +Cc: netdev, linux-kernel, openib-general
In-Reply-To: <adaslpz2l9p.fsf@cisco.com>
Extend the matching of connection requests to listens in the InfiniBand CM to
include private data checks.
This allows multiple applications to listen on the same service identifier,
with the private data directing each request to the appropriate application.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
---
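For illustration only (not part of the patch): a userspace sketch of the
matching rule the diff below implements. A listen stores its compare data
pre-masked; an incoming request matches when its private data, ANDed with the
listen's mask, equals the stored data. Two listens conflict when each one's
data, ANDed with the other's mask, compares equal. The names struct compare,
init_compare, compare_private_data and compare_listens are stand-ins chosen
for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COMPARE_SIZE 64	/* mirrors IB_CM_PRIVATE_DATA_COMPARE_SIZE */

struct compare {
	uint8_t data[COMPARE_SIZE];
	uint8_t mask[COMPARE_SIZE];
};

static void mask_data(uint8_t *dst, const uint8_t *src, const uint8_t *mask)
{
	for (int i = 0; i < COMPARE_SIZE; i++)
		dst[i] = src[i] & mask[i];
}

/* Store the data already masked, as ib_cm_listen() does in the patch. */
static void init_compare(struct compare *c, const uint8_t *data,
			 const uint8_t *mask)
{
	assert(c && data && mask);
	mask_data(c->data, data, mask);
	memcpy(c->mask, mask, COMPARE_SIZE);
}

/* Request vs. listen: 0 means match. No compare data matches anything. */
static int compare_private_data(const uint8_t *private_data,
				const struct compare *listen)
{
	uint8_t masked[COMPARE_SIZE];

	if (!listen)
		return 0;
	mask_data(masked, private_data, listen->mask);
	return memcmp(masked, listen->data, COMPARE_SIZE);
}

/* Listen vs. listen: 0 means the two would accept overlapping requests. */
static int compare_listens(const struct compare *a, const struct compare *b)
{
	uint8_t x[COMPARE_SIZE];
	uint8_t y[COMPARE_SIZE];

	if (!a || !b)
		return 0;
	mask_data(x, a->data, b->mask);
	mask_data(y, b->data, a->mask);
	return memcmp(x, y, COMPARE_SIZE);
}
```

A nonzero result also carries a sign, which the patch uses to order listens
that share a device and service ID within the rb-tree.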
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/cm.c
linux-2.6.ib/drivers/infiniband/core/cm.c
--- linux-2.6.git/drivers/infiniband/core/cm.c 2006-01-16 10:25:26.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/cm.c 2006-01-16 16:03:35.000000000 -0800
@@ -32,7 +32,7 @@
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
- * $Id: cm.c 2821 2005-07-08 17:07:28Z sean.hefty $
+ * $Id: cm.c 4311 2005-12-05 18:42:01Z sean.hefty $
*/
#include <linux/dma-mapping.h>
#include <linux/err.h>
@@ -130,6 +130,7 @@ struct cm_id_private {
/* todo: use alternate port on send failure */
struct cm_av av;
struct cm_av alt_av;
+ struct ib_cm_private_data_compare *compare_data;
void *private_data;
__be64 tid;
@@ -355,6 +356,40 @@ static struct cm_id_private * cm_acquire
return cm_id_priv;
}
+static void cm_mask_compare_data(u8 *dst, u8 *src, u8 *mask)
+{
+ int i;
+
+ for (i = 0; i < IB_CM_PRIVATE_DATA_COMPARE_SIZE; i++)
+ dst[i] = src[i] & mask[i];
+}
+
+static int cm_compare_data(struct ib_cm_private_data_compare *src_data,
+ struct ib_cm_private_data_compare *dst_data)
+{
+ u8 src[IB_CM_PRIVATE_DATA_COMPARE_SIZE];
+ u8 dst[IB_CM_PRIVATE_DATA_COMPARE_SIZE];
+
+ if (!src_data || !dst_data)
+ return 0;
+
+ cm_mask_compare_data(src, src_data->data, dst_data->mask);
+ cm_mask_compare_data(dst, dst_data->data, src_data->mask);
+ return memcmp(src, dst, IB_CM_PRIVATE_DATA_COMPARE_SIZE);
+}
+
+static int cm_compare_private_data(u8 *private_data,
+ struct ib_cm_private_data_compare *dst_data)
+{
+ u8 src[IB_CM_PRIVATE_DATA_COMPARE_SIZE];
+
+ if (!dst_data)
+ return 0;
+
+ cm_mask_compare_data(src, private_data, dst_data->mask);
+ return memcmp(src, dst_data->data, IB_CM_PRIVATE_DATA_COMPARE_SIZE);
+}
+
static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
{
struct rb_node **link = &cm.listen_service_table.rb_node;
@@ -362,14 +397,18 @@ static struct cm_id_private * cm_insert_
struct cm_id_private *cur_cm_id_priv;
__be64 service_id = cm_id_priv->id.service_id;
__be64 service_mask = cm_id_priv->id.service_mask;
+ int data_cmp;
while (*link) {
parent = *link;
cur_cm_id_priv = rb_entry(parent, struct cm_id_private,
service_node);
+ data_cmp = cm_compare_data(cm_id_priv->compare_data,
+ cur_cm_id_priv->compare_data);
if ((cur_cm_id_priv->id.service_mask & service_id) ==
(service_mask & cur_cm_id_priv->id.service_id) &&
- (cm_id_priv->id.device == cur_cm_id_priv->id.device))
+ (cm_id_priv->id.device == cur_cm_id_priv->id.device) &&
+ !data_cmp)
return cur_cm_id_priv;
if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
@@ -378,6 +417,10 @@ static struct cm_id_private * cm_insert_
link = &(*link)->rb_right;
else if (service_id < cur_cm_id_priv->id.service_id)
link = &(*link)->rb_left;
+ else if (service_id > cur_cm_id_priv->id.service_id)
+ link = &(*link)->rb_right;
+ else if (data_cmp < 0)
+ link = &(*link)->rb_left;
else
link = &(*link)->rb_right;
}
@@ -387,16 +430,20 @@ static struct cm_id_private * cm_insert_
}
static struct cm_id_private * cm_find_listen(struct ib_device *device,
- __be64 service_id)
+ __be64 service_id,
+ u8 *private_data)
{
struct rb_node *node = cm.listen_service_table.rb_node;
struct cm_id_private *cm_id_priv;
+ int data_cmp;
while (node) {
cm_id_priv = rb_entry(node, struct cm_id_private, service_node);
+ data_cmp = cm_compare_private_data(private_data,
+ cm_id_priv->compare_data);
if ((cm_id_priv->id.service_mask & service_id) ==
cm_id_priv->id.service_id &&
- (cm_id_priv->id.device == device))
+ (cm_id_priv->id.device == device) && !data_cmp)
return cm_id_priv;
if (device < cm_id_priv->id.device)
@@ -405,6 +452,10 @@ static struct cm_id_private * cm_find_li
node = node->rb_right;
else if (service_id < cm_id_priv->id.service_id)
node = node->rb_left;
+ else if (service_id > cm_id_priv->id.service_id)
+ node = node->rb_right;
+ else if (data_cmp < 0)
+ node = node->rb_left;
else
node = node->rb_right;
}
@@ -728,15 +779,14 @@ retest:
wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount));
while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
cm_free_work(work);
- if (cm_id_priv->private_data && cm_id_priv->private_data_len)
- kfree(cm_id_priv->private_data);
+ kfree(cm_id_priv->compare_data);
+ kfree(cm_id_priv->private_data);
kfree(cm_id_priv);
}
EXPORT_SYMBOL(ib_destroy_cm_id);
-int ib_cm_listen(struct ib_cm_id *cm_id,
- __be64 service_id,
- __be64 service_mask)
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
+ struct ib_cm_private_data_compare *compare_data)
{
struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
unsigned long flags;
@@ -750,7 +800,19 @@ int ib_cm_listen(struct ib_cm_id *cm_id,
return -EINVAL;
cm_id_priv = container_of(cm_id, struct cm_id_private, id);
- BUG_ON(cm_id->state != IB_CM_IDLE);
+ if (cm_id->state != IB_CM_IDLE)
+ return -EINVAL;
+
+ if (compare_data) {
+ cm_id_priv->compare_data = kzalloc(sizeof *compare_data,
+ GFP_KERNEL);
+ if (!cm_id_priv->compare_data)
+ return -ENOMEM;
+ cm_mask_compare_data(cm_id_priv->compare_data->data,
+ compare_data->data, compare_data->mask);
+ memcpy(cm_id_priv->compare_data->mask, compare_data->mask,
+ IB_CM_PRIVATE_DATA_COMPARE_SIZE);
+ }
cm_id->state = IB_CM_LISTEN;
@@ -767,6 +829,8 @@ int ib_cm_listen(struct ib_cm_id *cm_id,
if (cur_cm_id_priv) {
cm_id->state = IB_CM_IDLE;
+ kfree(cm_id_priv->compare_data);
+ cm_id_priv->compare_data = NULL;
ret = -EBUSY;
}
return ret;
@@ -1239,7 +1303,8 @@ static struct cm_id_private * cm_match_r
/* Find matching listen request. */
listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device,
- req_msg->service_id);
+ req_msg->service_id,
+ req_msg->private_data);
if (!listen_cm_id_priv) {
spin_unlock_irqrestore(&cm.lock, flags);
cm_issue_rej(work->port, work->mad_recv_wc,
@@ -2646,7 +2711,8 @@ static int cm_sidr_req_handler(struct cm
goto out; /* Duplicate message. */
}
cur_cm_id_priv = cm_find_listen(cm_id->device,
- sidr_req_msg->service_id);
+ sidr_req_msg->service_id,
+ sidr_req_msg->private_data);
if (!cur_cm_id_priv) {
rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
spin_unlock_irqrestore(&cm.lock, flags);
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/ucm.c
linux-2.6.ib/drivers/infiniband/core/ucm.c
--- linux-2.6.git/drivers/infiniband/core/ucm.c 2006-01-16 16:03:08.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/ucm.c 2006-01-16 16:03:35.000000000 -0800
@@ -646,6 +646,17 @@ out:
return result;
}
+static int ucm_validate_listen(__be64 service_id, __be64 service_mask)
+{
+ service_id &= service_mask;
+
+ if (((service_id & IB_CMA_SERVICE_ID_MASK) == IB_CMA_SERVICE_ID) ||
+ ((service_id & IB_SDP_SERVICE_ID_MASK) == IB_SDP_SERVICE_ID))
+ return -EINVAL;
+
+ return 0;
+}
+
static ssize_t ib_ucm_listen(struct ib_ucm_file *file,
const char __user *inbuf,
int in_len, int out_len)
@@ -661,7 +672,13 @@ static ssize_t ib_ucm_listen(struct ib_u
if (IS_ERR(ctx))
return PTR_ERR(ctx);
- result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask);
+ result = ucm_validate_listen(cmd.service_id, cmd.service_mask);
+ if (result)
+ goto out;
+
+ result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask,
+ NULL);
+out:
ib_ucm_ctx_put(ctx);
return result;
}
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/include/rdma/ib_cm.h
linux-2.6.ib/include/rdma/ib_cm.h
--- linux-2.6.git/include/rdma/ib_cm.h 2006-01-16 10:26:47.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_cm.h 2006-01-16 16:03:35.000000000 -0800
@@ -32,7 +32,7 @@
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
- * $Id: ib_cm.h 2730 2005-06-28 16:43:03Z sean.hefty $
+ * $Id: ib_cm.h 4311 2005-12-05 18:42:01Z sean.hefty $
*/
#if !defined(IB_CM_H)
#define IB_CM_H
@@ -102,7 +102,8 @@ enum ib_cm_data_size {
IB_CM_APR_INFO_LENGTH = 72,
IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE = 216,
IB_CM_SIDR_REP_PRIVATE_DATA_SIZE = 136,
- IB_CM_SIDR_REP_INFO_LENGTH = 72
+ IB_CM_SIDR_REP_INFO_LENGTH = 72,
+ IB_CM_PRIVATE_DATA_COMPARE_SIZE = 64
};
struct ib_cm_id;
@@ -238,7 +239,6 @@ struct ib_cm_sidr_rep_event_param {
u32 qpn;
void *info;
u8 info_len;
-
};
struct ib_cm_event {
@@ -317,6 +317,15 @@ void ib_destroy_cm_id(struct ib_cm_id *c
#define IB_SERVICE_ID_AGN_MASK __constant_cpu_to_be64(0xFF00000000000000ULL)
#define IB_CM_ASSIGN_SERVICE_ID __constant_cpu_to_be64(0x0200000000000000ULL)
+#define IB_CMA_SERVICE_ID __constant_cpu_to_be64(0x0000000001000000ULL)
+#define IB_CMA_SERVICE_ID_MASK __constant_cpu_to_be64(0xFFFFFFFFFF000000ULL)
+#define IB_SDP_SERVICE_ID __constant_cpu_to_be64(0x0000000000010000ULL)
+#define IB_SDP_SERVICE_ID_MASK __constant_cpu_to_be64(0xFFFFFFFFFFFF0000ULL)
+
+struct ib_cm_private_data_compare {
+ u8 data[IB_CM_PRIVATE_DATA_COMPARE_SIZE];
+ u8 mask[IB_CM_PRIVATE_DATA_COMPARE_SIZE];
+};
/**
* ib_cm_listen - Initiates listening on the specified service ID for
@@ -330,10 +339,12 @@ void ib_destroy_cm_id(struct ib_cm_id *c
* range of service IDs. If set to 0, the service ID is matched
* exactly. This parameter is ignored if %service_id is set to
* IB_CM_ASSIGN_SERVICE_ID.
+ * @compare_data: This parameter is optional. It specifies data that must
+ * appear in the private data of a connection request for the specified
+ * listen request.
*/
-int ib_cm_listen(struct ib_cm_id *cm_id,
- __be64 service_id,
- __be64 service_mask);
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
+ struct ib_cm_private_data_compare *compare_data);
struct ib_cm_req_param {
struct ib_sa_path_rec *primary_path;
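For illustration only: a userspace sketch of the range check that
ucm_validate_listen() performs with the service ID constants added above. The
constants are copied in host byte order (the kernel versions are __be64, but
AND and equality behave the same either way), and validate_listen is a
stand-in name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Host-order copies of the constants the patch adds to ib_cm.h. */
#define CMA_SERVICE_ID		0x0000000001000000ULL
#define CMA_SERVICE_ID_MASK	0xFFFFFFFFFF000000ULL
#define SDP_SERVICE_ID		0x0000000000010000ULL
#define SDP_SERVICE_ID_MASK	0xFFFFFFFFFFFF0000ULL

/* Reject userspace listens that fall in the CMA or SDP service ID ranges. */
static int validate_listen(uint64_t service_id, uint64_t service_mask)
{
	service_id &= service_mask;

	if ((service_id & CMA_SERVICE_ID_MASK) == CMA_SERVICE_ID ||
	    (service_id & SDP_SERVICE_ID_MASK) == SDP_SERVICE_ID)
		return -EINVAL;

	return 0;
}

/* A few concrete cases, checked with assert(). */
static void validate_listen_examples(void)
{
	assert(validate_listen(0x0000000001000042ULL, ~0ULL) == -EINVAL);
	assert(validate_listen(0x0000000000010050ULL, ~0ULL) == -EINVAL);
	assert(validate_listen(0x0200000000000001ULL, ~0ULL) == 0);
}
```

One thing the sketch makes visible: with a service_mask of 0 the ID is cleared
before the comparison, so the check passes for any ID. Since the ib_cm_listen
comment says a mask of 0 means an exact match, that case may deserve a second
look.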
^ permalink raw reply
* [PATCH 1/6] IB: common handling for marshalling parameters to/from userspace
From: Sean Hefty @ 2006-03-06 18:59 UTC (permalink / raw)
To: 'Roland Dreier'; +Cc: netdev, linux-kernel, openib-general
In-Reply-To: <adaslpz2l9p.fsf@cisco.com>
Provide common handling for marshalling data between userspace clients
and kernel-mode InfiniBand drivers.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
---
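For illustration only (not part of the patch): a userspace sketch of the
marshalling pattern the new helpers follow, copying a kernel-side structure
field by field into a fixed-layout structure meant for userspace. The types
k_ah_attr and u_ah_attr and the AH_GRH_FLAG value are cut-down stand-ins, not
the real ib_ah_attr or ib_uverbs_ah_attr definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AH_GRH_FLAG 0x1	/* stand-in for IB_AH_GRH */

/* Stand-in for the kernel-side attribute structure. */
struct k_ah_attr {
	uint8_t  dgid[16];
	uint16_t dlid;
	uint8_t  sl;
	uint8_t  ah_flags;
	uint8_t  port_num;
};

/* Stand-in for the userspace ABI structure: fixed-width fields only. */
struct u_ah_attr {
	uint8_t  dgid[16];
	uint16_t dlid;
	uint8_t  sl;
	uint8_t  is_global;
	uint8_t  port_num;
	uint8_t  reserved;
};

static void copy_ah_attr_to_user(struct u_ah_attr *dst,
				 const struct k_ah_attr *src)
{
	assert(dst && src);

	/* Zeroing first is a choice of this sketch, so reserved stays 0. */
	memset(dst, 0, sizeof *dst);
	memcpy(dst->dgid, src->dgid, sizeof dst->dgid);
	dst->dlid = src->dlid;
	dst->sl = src->sl;
	/* Collapse the flag bit to 0/1, as the patch does for is_global. */
	dst->is_global = (src->ah_flags & AH_GRH_FLAG) ? 1 : 0;
	dst->port_num = src->port_num;
}
```

Keeping one copy routine per structure pair means ucm and any later userspace
interface share the same field mapping instead of each carrying its own.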
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/Makefile
linux-2.6.ib/drivers/infiniband/core/Makefile
--- linux-2.6.git/drivers/infiniband/core/Makefile 2006-01-16 10:25:27.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/Makefile 2006-01-16 15:34:15.000000000 -0800
@@ -16,4 +16,5 @@ ib_umad-y := user_mad.o
ib_ucm-y := ucm.o
-ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_mem.o
+ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_mem.o \
+ uverbs_marshall.o
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/ucm.c
linux-2.6.ib/drivers/infiniband/core/ucm.c
--- linux-2.6.git/drivers/infiniband/core/ucm.c 2006-01-16 10:25:26.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/ucm.c 2006-01-16 15:34:15.000000000 -0800
@@ -30,7 +30,7 @@
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
- * $Id: ucm.c 2594 2005-06-13 19:46:02Z libor $
+ * $Id: ucm.c 4311 2005-12-05 18:42:01Z sean.hefty $
*/
#include <linux/init.h>
#include <linux/fs.h>
@@ -48,6 +48,7 @@
#include <rdma/ib_cm.h>
#include <rdma/ib_user_cm.h>
+#include <rdma/ib_marshall.h>
MODULE_AUTHOR("Libor Michalek");
MODULE_DESCRIPTION("InfiniBand userspace Connection Manager access");
@@ -203,36 +204,6 @@ error:
return NULL;
}
-static void ib_ucm_event_path_get(struct ib_ucm_path_rec *upath,
- struct ib_sa_path_rec *kpath)
-{
- if (!kpath || !upath)
- return;
-
- memcpy(upath->dgid, kpath->dgid.raw, sizeof *upath->dgid);
- memcpy(upath->sgid, kpath->sgid.raw, sizeof *upath->sgid);
-
- upath->dlid = kpath->dlid;
- upath->slid = kpath->slid;
- upath->raw_traffic = kpath->raw_traffic;
- upath->flow_label = kpath->flow_label;
- upath->hop_limit = kpath->hop_limit;
- upath->traffic_class = kpath->traffic_class;
- upath->reversible = kpath->reversible;
- upath->numb_path = kpath->numb_path;
- upath->pkey = kpath->pkey;
- upath->sl = kpath->sl;
- upath->mtu_selector = kpath->mtu_selector;
- upath->mtu = kpath->mtu;
- upath->rate_selector = kpath->rate_selector;
- upath->rate = kpath->rate;
- upath->packet_life_time = kpath->packet_life_time;
- upath->preference = kpath->preference;
-
- upath->packet_life_time_selector =
- kpath->packet_life_time_selector;
-}
-
static void ib_ucm_event_req_get(struct ib_ucm_req_event_resp *ureq,
struct ib_cm_req_event_param *kreq)
{
@@ -251,8 +222,10 @@ static void ib_ucm_event_req_get(struct
ureq->srq = kreq->srq;
ureq->port = kreq->port;
- ib_ucm_event_path_get(&ureq->primary_path, kreq->primary_path);
- ib_ucm_event_path_get(&ureq->alternate_path, kreq->alternate_path);
+ ib_copy_path_rec_to_user(&ureq->primary_path, kreq->primary_path);
+ if (kreq->alternate_path)
+ ib_copy_path_rec_to_user(&ureq->alternate_path,
+ kreq->alternate_path);
}
static void ib_ucm_event_rep_get(struct ib_ucm_rep_event_resp *urep,
@@ -322,8 +295,8 @@ static int ib_ucm_event_process(struct i
info = evt->param.rej_rcvd.ari;
break;
case IB_CM_LAP_RECEIVED:
- ib_ucm_event_path_get(&uvt->resp.u.lap_resp.path,
- evt->param.lap_rcvd.alternate_path);
+ ib_copy_path_rec_to_user(&uvt->resp.u.lap_resp.path,
+ evt->param.lap_rcvd.alternate_path);
uvt->data_len = IB_CM_LAP_PRIVATE_DATA_SIZE;
uvt->resp.present = IB_UCM_PRES_ALTERNATE;
break;
@@ -635,65 +608,11 @@ static ssize_t ib_ucm_attr_id(struct ib_
return result;
}
-static void ib_ucm_copy_ah_attr(struct ib_ucm_ah_attr *dest_attr,
- struct ib_ah_attr *src_attr)
-{
- memcpy(dest_attr->grh_dgid, src_attr->grh.dgid.raw,
- sizeof src_attr->grh.dgid);
- dest_attr->grh_flow_label = src_attr->grh.flow_label;
- dest_attr->grh_sgid_index = src_attr->grh.sgid_index;
- dest_attr->grh_hop_limit = src_attr->grh.hop_limit;
- dest_attr->grh_traffic_class = src_attr->grh.traffic_class;
-
- dest_attr->dlid = src_attr->dlid;
- dest_attr->sl = src_attr->sl;
- dest_attr->src_path_bits = src_attr->src_path_bits;
- dest_attr->static_rate = src_attr->static_rate;
- dest_attr->is_global = (src_attr->ah_flags & IB_AH_GRH);
- dest_attr->port_num = src_attr->port_num;
-}
-
-static void ib_ucm_copy_qp_attr(struct ib_ucm_init_qp_attr_resp *dest_attr,
- struct ib_qp_attr *src_attr)
-{
- dest_attr->cur_qp_state = src_attr->cur_qp_state;
- dest_attr->path_mtu = src_attr->path_mtu;
- dest_attr->path_mig_state = src_attr->path_mig_state;
- dest_attr->qkey = src_attr->qkey;
- dest_attr->rq_psn = src_attr->rq_psn;
- dest_attr->sq_psn = src_attr->sq_psn;
- dest_attr->dest_qp_num = src_attr->dest_qp_num;
- dest_attr->qp_access_flags = src_attr->qp_access_flags;
-
- dest_attr->max_send_wr = src_attr->cap.max_send_wr;
- dest_attr->max_recv_wr = src_attr->cap.max_recv_wr;
- dest_attr->max_send_sge = src_attr->cap.max_send_sge;
- dest_attr->max_recv_sge = src_attr->cap.max_recv_sge;
- dest_attr->max_inline_data = src_attr->cap.max_inline_data;
-
- ib_ucm_copy_ah_attr(&dest_attr->ah_attr, &src_attr->ah_attr);
- ib_ucm_copy_ah_attr(&dest_attr->alt_ah_attr, &src_attr->alt_ah_attr);
-
- dest_attr->pkey_index = src_attr->pkey_index;
- dest_attr->alt_pkey_index = src_attr->alt_pkey_index;
- dest_attr->en_sqd_async_notify = src_attr->en_sqd_async_notify;
- dest_attr->sq_draining = src_attr->sq_draining;
- dest_attr->max_rd_atomic = src_attr->max_rd_atomic;
- dest_attr->max_dest_rd_atomic = src_attr->max_dest_rd_atomic;
- dest_attr->min_rnr_timer = src_attr->min_rnr_timer;
- dest_attr->port_num = src_attr->port_num;
- dest_attr->timeout = src_attr->timeout;
- dest_attr->retry_cnt = src_attr->retry_cnt;
- dest_attr->rnr_retry = src_attr->rnr_retry;
- dest_attr->alt_port_num = src_attr->alt_port_num;
- dest_attr->alt_timeout = src_attr->alt_timeout;
-}
-
static ssize_t ib_ucm_init_qp_attr(struct ib_ucm_file *file,
const char __user *inbuf,
int in_len, int out_len)
{
- struct ib_ucm_init_qp_attr_resp resp;
+ struct ib_uverbs_qp_attr resp;
struct ib_ucm_init_qp_attr cmd;
struct ib_ucm_context *ctx;
struct ib_qp_attr qp_attr;
@@ -716,7 +635,7 @@ static ssize_t ib_ucm_init_qp_attr(struc
if (result)
goto out;
- ib_ucm_copy_qp_attr(&resp, &qp_attr);
+ ib_copy_qp_attr_to_user(&resp, &qp_attr);
if (copy_to_user((void __user *)(unsigned long)cmd.response,
&resp, sizeof(resp)))
@@ -791,7 +710,7 @@ static int ib_ucm_alloc_data(const void
static int ib_ucm_path_get(struct ib_sa_path_rec **path, u64 src)
{
- struct ib_ucm_path_rec ucm_path;
+ struct ib_user_path_rec upath;
struct ib_sa_path_rec *sa_path;
*path = NULL;
@@ -803,36 +722,14 @@ static int ib_ucm_path_get(struct ib_sa_
if (!sa_path)
return -ENOMEM;
- if (copy_from_user(&ucm_path, (void __user *)(unsigned long)src,
- sizeof(ucm_path))) {
+ if (copy_from_user(&upath, (void __user *)(unsigned long)src,
+ sizeof(upath))) {
kfree(sa_path);
return -EFAULT;
}
- memcpy(sa_path->dgid.raw, ucm_path.dgid, sizeof sa_path->dgid);
- memcpy(sa_path->sgid.raw, ucm_path.sgid, sizeof sa_path->sgid);
-
- sa_path->dlid = ucm_path.dlid;
- sa_path->slid = ucm_path.slid;
- sa_path->raw_traffic = ucm_path.raw_traffic;
- sa_path->flow_label = ucm_path.flow_label;
- sa_path->hop_limit = ucm_path.hop_limit;
- sa_path->traffic_class = ucm_path.traffic_class;
- sa_path->reversible = ucm_path.reversible;
- sa_path->numb_path = ucm_path.numb_path;
- sa_path->pkey = ucm_path.pkey;
- sa_path->sl = ucm_path.sl;
- sa_path->mtu_selector = ucm_path.mtu_selector;
- sa_path->mtu = ucm_path.mtu;
- sa_path->rate_selector = ucm_path.rate_selector;
- sa_path->rate = ucm_path.rate;
- sa_path->packet_life_time = ucm_path.packet_life_time;
- sa_path->preference = ucm_path.preference;
-
- sa_path->packet_life_time_selector =
- ucm_path.packet_life_time_selector;
-
+ ib_copy_path_rec_from_user(sa_path, &upath);
*path = sa_path;
return 0;
}
@@ -1243,8 +1140,10 @@ static unsigned int ib_ucm_poll(struct f
poll_wait(filp, &file->poll_wait, wait);
+ down(&file->mutex);
if (!list_empty(&file->events))
mask = POLLIN | POLLRDNORM;
+ up(&file->mutex);
return mask;
}
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/drivers/infiniband/core/uverbs_marshall.c
linux-2.6.ib/drivers/infiniband/core/uverbs_marshall.c
--- linux-2.6.git/drivers/infiniband/core/uverbs_marshall.c 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/uverbs_marshall.c 2006-01-16 15:34:15.000000000 -0800
@@ -0,0 +1,138 @@
+/*
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/ib_marshall.h>
+
+static void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst,
+ struct ib_ah_attr *src)
+{
+ memcpy(dst->grh.dgid, src->grh.dgid.raw, sizeof src->grh.dgid);
+ dst->grh.flow_label = src->grh.flow_label;
+ dst->grh.sgid_index = src->grh.sgid_index;
+ dst->grh.hop_limit = src->grh.hop_limit;
+ dst->grh.traffic_class = src->grh.traffic_class;
+ dst->dlid = src->dlid;
+ dst->sl = src->sl;
+ dst->src_path_bits = src->src_path_bits;
+ dst->static_rate = src->static_rate;
+ dst->is_global = src->ah_flags & IB_AH_GRH ? 1 : 0;
+ dst->port_num = src->port_num;
+}
+
+void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
+ struct ib_qp_attr *src)
+{
+ dst->cur_qp_state = src->cur_qp_state;
+ dst->path_mtu = src->path_mtu;
+ dst->path_mig_state = src->path_mig_state;
+ dst->qkey = src->qkey;
+ dst->rq_psn = src->rq_psn;
+ dst->sq_psn = src->sq_psn;
+ dst->dest_qp_num = src->dest_qp_num;
+ dst->qp_access_flags = src->qp_access_flags;
+
+ dst->max_send_wr = src->cap.max_send_wr;
+ dst->max_recv_wr = src->cap.max_recv_wr;
+ dst->max_send_sge = src->cap.max_send_sge;
+ dst->max_recv_sge = src->cap.max_recv_sge;
+ dst->max_inline_data = src->cap.max_inline_data;
+
+ ib_copy_ah_attr_to_user(&dst->ah_attr, &src->ah_attr);
+ ib_copy_ah_attr_to_user(&dst->alt_ah_attr, &src->alt_ah_attr);
+
+ dst->pkey_index = src->pkey_index;
+ dst->alt_pkey_index = src->alt_pkey_index;
+ dst->en_sqd_async_notify = src->en_sqd_async_notify;
+ dst->sq_draining = src->sq_draining;
+ dst->max_rd_atomic = src->max_rd_atomic;
+ dst->max_dest_rd_atomic = src->max_dest_rd_atomic;
+ dst->min_rnr_timer = src->min_rnr_timer;
+ dst->port_num = src->port_num;
+ dst->timeout = src->timeout;
+ dst->retry_cnt = src->retry_cnt;
+ dst->rnr_retry = src->rnr_retry;
+ dst->alt_port_num = src->alt_port_num;
+ dst->alt_timeout = src->alt_timeout;
+}
+EXPORT_SYMBOL(ib_copy_qp_attr_to_user);
+
+void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst,
+ struct ib_sa_path_rec *src)
+{
+ memcpy(dst->dgid, src->dgid.raw, sizeof src->dgid);
+ memcpy(dst->sgid, src->sgid.raw, sizeof src->sgid);
+
+ dst->dlid = src->dlid;
+ dst->slid = src->slid;
+ dst->raw_traffic = src->raw_traffic;
+ dst->flow_label = src->flow_label;
+ dst->hop_limit = src->hop_limit;
+ dst->traffic_class = src->traffic_class;
+ dst->reversible = src->reversible;
+ dst->numb_path = src->numb_path;
+ dst->pkey = src->pkey;
+ dst->sl = src->sl;
+ dst->mtu_selector = src->mtu_selector;
+ dst->mtu = src->mtu;
+ dst->rate_selector = src->rate_selector;
+ dst->rate = src->rate;
+ dst->packet_life_time = src->packet_life_time;
+ dst->preference = src->preference;
+ dst->packet_life_time_selector = src->packet_life_time_selector;
+}
+EXPORT_SYMBOL(ib_copy_path_rec_to_user);
+
+void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
+ struct ib_user_path_rec *src)
+{
+ memcpy(dst->dgid.raw, src->dgid, sizeof dst->dgid);
+ memcpy(dst->sgid.raw, src->sgid, sizeof dst->sgid);
+
+ dst->dlid = src->dlid;
+ dst->slid = src->slid;
+ dst->raw_traffic = src->raw_traffic;
+ dst->flow_label = src->flow_label;
+ dst->hop_limit = src->hop_limit;
+ dst->traffic_class = src->traffic_class;
+ dst->reversible = src->reversible;
+ dst->numb_path = src->numb_path;
+ dst->pkey = src->pkey;
+ dst->sl = src->sl;
+ dst->mtu_selector = src->mtu_selector;
+ dst->mtu = src->mtu;
+ dst->rate_selector = src->rate_selector;
+ dst->rate = src->rate;
+ dst->packet_life_time = src->packet_life_time;
+ dst->preference = src->preference;
+ dst->packet_life_time_selector = src->packet_life_time_selector;
+}
+EXPORT_SYMBOL(ib_copy_path_rec_from_user);
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/include/rdma/ib_marshall.h
linux-2.6.ib/include/rdma/ib_marshall.h
--- linux-2.6.git/include/rdma/ib_marshall.h 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_marshall.h 2006-01-16 15:34:15.000000000 -0800
@@ -0,0 +1,50 @@
+/*
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(IB_USER_MARSHALL_H)
+#define IB_USER_MARSHALL_H
+
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_sa.h>
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_user_sa.h>
+
+void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
+ struct ib_qp_attr *src);
+
+void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst,
+ struct ib_sa_path_rec *src);
+
+void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
+ struct ib_user_path_rec *src);
+
+#endif /* IB_USER_MARSHALL_H */
diff -uprN -X linux-2.6.git/Documentation/dontdiff
linux-2.6.git/include/rdma/ib_user_cm.h
linux-2.6.ib/include/rdma/ib_user_cm.h
--- linux-2.6.git/include/rdma/ib_user_cm.h 2006-01-16 10:26:47.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_user_cm.h 2006-01-16 15:34:15.000000000 -0800
@@ -30,13 +30,13 @@
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
- * $Id: ib_user_cm.h 2576 2005-06-09 17:00:30Z libor $
+ * $Id: ib_user_cm.h 4019 2005-11-11 00:33:09Z sean.hefty $
*/
#ifndef IB_USER_CM_H
#define IB_USER_CM_H
-#include <linux/types.h>
+#include <rdma/ib_user_sa.h>
#define IB_USER_CM_ABI_VERSION 4
@@ -110,58 +110,6 @@ struct ib_ucm_init_qp_attr {
__u32 qp_state;
};
-struct ib_ucm_ah_attr {
- __u8 grh_dgid[16];
- __u32 grh_flow_label;
- __u16 dlid;
- __u16 reserved;
- __u8 grh_sgid_index;
- __u8 grh_hop_limit;
- __u8 grh_traffic_class;
- __u8 sl;
- __u8 src_path_bits;
- __u8 static_rate;
- __u8 is_global;
- __u8 port_num;
-};
-
-struct ib_ucm_init_qp_attr_resp {
- __u32 qp_attr_mask;
- __u32 qp_state;
- __u32 cur_qp_state;
- __u32 path_mtu;
- __u32 path_mig_state;
- __u32 qkey;
- __u32 rq_psn;
- __u32 sq_psn;
- __u32 dest_qp_num;
- __u32 qp_access_flags;
-
- struct ib_ucm_ah_attr ah_attr;
- struct ib_ucm_ah_attr alt_ah_attr;
-
- /* ib_qp_cap */
- __u32 max_send_wr;
- __u32 max_recv_wr;
- __u32 max_send_sge;
- __u32 max_recv_sge;
- __u32 max_inline_data;
-
- __u16 pkey_index;
- __u16 alt_pkey_index;
- __u8 en_sqd_async_notify;
- __u8 sq_draining;
- __u8 max_rd_atomic;
- __u8 max_dest_rd_atomic;
- __u8 min_rnr_timer;
- __u8 port_num;
- __u8 timeout;
- __u8 retry_cnt;
- __u8 rnr_retry;
- __u8 alt_port_num;
- __u8 alt_timeout;
-};
-
struct ib_ucm_listen {
__be64 service_id;
__be64 service_mask;
@@ -180,28 +128,6 @@ struct ib_ucm_private_data {
__u8 reserved[3];
};
-struct ib_ucm_path_rec {
- __u8 dgid[16];
- __u8 sgid[16];
- __be16 dlid;
- __be16 slid;
- __u32 raw_traffic;
- __be32 flow_label;
- __u32 reversible;
- __u32 mtu;
- __be16 pkey;
- __u8 hop_limit;
- __u8 traffic_class;
- __u8 numb_path;
- __u8 sl;
- __u8 mtu_selector;
- __u8 rate_selector;
- __u8 rate;
- __u8 packet_life_time_selector;
- __u8 packet_life_time;
- __u8 preference;
-};
-
struct ib_ucm_req {
__u32 id;
__u32 qpn;
@@ -304,8 +230,8 @@ struct ib_ucm_event_get {
};
struct ib_ucm_req_event_resp {
- struct ib_ucm_path_rec primary_path;
- struct ib_ucm_path_rec alternate_path;
+ struct ib_user_path_rec primary_path;
+ struct ib_user_path_rec alternate_path;
__be64 remote_ca_guid;
__u32 remote_qkey;
__u32 remote_qpn;
@@ -349,7 +275,7 @@ struct ib_ucm_mra_event_resp {
};
struct ib_ucm_lap_event_resp {
- struct ib_ucm_path_rec path;
+ struct ib_user_path_rec path;
};
struct ib_ucm_apr_event_resp {
diff -uprN -X linux-2.6.git/Documentation/dontdiff linux-2.6.git/include/rdma/ib_user_sa.h linux-2.6.ib/include/rdma/ib_user_sa.h
--- linux-2.6.git/include/rdma/ib_user_sa.h 1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_user_sa.h 2006-01-16 15:34:15.000000000 -0800
@@ -0,0 +1,60 @@
+/*
+ * Copyright (c) 2005 Intel Corporation. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IB_USER_SA_H
+#define IB_USER_SA_H
+
+#include <linux/types.h>
+
+struct ib_user_path_rec {
+ __u8 dgid[16];
+ __u8 sgid[16];
+ __be16 dlid;
+ __be16 slid;
+ __u32 raw_traffic;
+ __be32 flow_label;
+ __u32 reversible;
+ __u32 mtu;
+ __be16 pkey;
+ __u8 hop_limit;
+ __u8 traffic_class;
+ __u8 numb_path;
+ __u8 sl;
+ __u8 mtu_selector;
+ __u8 rate_selector;
+ __u8 rate;
+ __u8 packet_life_time_selector;
+ __u8 packet_life_time;
+ __u8 preference;
+};
+
+#endif /* IB_USER_SA_H */
diff -uprN -X linux-2.6.git/Documentation/dontdiff linux-2.6.git/include/rdma/ib_user_verbs.h linux-2.6.ib/include/rdma/ib_user_verbs.h
--- linux-2.6.git/include/rdma/ib_user_verbs.h 2006-01-16 10:26:47.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_user_verbs.h 2006-01-16 15:34:15.000000000 -0800
@@ -31,7 +31,7 @@
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
- * $Id: ib_user_verbs.h 2708 2005-06-24 17:27:21Z roland $
+ * $Id: ib_user_verbs.h 4019 2005-11-11 00:33:09Z sean.hefty $
*/
#ifndef IB_USER_VERBS_H
@@ -311,6 +311,64 @@ struct ib_uverbs_destroy_cq_resp {
__u32 async_events_reported;
};
+struct ib_uverbs_global_route {
+ __u8 dgid[16];
+ __u32 flow_label;
+ __u8 sgid_index;
+ __u8 hop_limit;
+ __u8 traffic_class;
+ __u8 reserved;
+};
+
+struct ib_uverbs_ah_attr {
+ struct ib_uverbs_global_route grh;
+ __u16 dlid;
+ __u8 sl;
+ __u8 src_path_bits;
+ __u8 static_rate;
+ __u8 is_global;
+ __u8 port_num;
+ __u8 reserved;
+};
+
+struct ib_uverbs_qp_attr {
+ __u32 qp_attr_mask;
+ __u32 qp_state;
+ __u32 cur_qp_state;
+ __u32 path_mtu;
+ __u32 path_mig_state;
+ __u32 qkey;
+ __u32 rq_psn;
+ __u32 sq_psn;
+ __u32 dest_qp_num;
+ __u32 qp_access_flags;
+
+ struct ib_uverbs_ah_attr ah_attr;
+ struct ib_uverbs_ah_attr alt_ah_attr;
+
+ /* ib_qp_cap */
+ __u32 max_send_wr;
+ __u32 max_recv_wr;
+ __u32 max_send_sge;
+ __u32 max_recv_sge;
+ __u32 max_inline_data;
+
+ __u16 pkey_index;
+ __u16 alt_pkey_index;
+ __u8 en_sqd_async_notify;
+ __u8 sq_draining;
+ __u8 max_rd_atomic;
+ __u8 max_dest_rd_atomic;
+ __u8 min_rnr_timer;
+ __u8 port_num;
+ __u8 timeout;
+ __u8 retry_cnt;
+ __u8 rnr_retry;
+ __u8 alt_port_num;
+ __u8 alt_timeout;
+ __u8 reserved[5];
+};
+
struct ib_uverbs_create_qp {
__u64 response;
__u64 user_handle;
@@ -487,26 +545,6 @@ struct ib_uverbs_post_srq_recv_resp {
__u32 bad_wr;
};
-struct ib_uverbs_global_route {
- __u8 dgid[16];
- __u32 flow_label;
- __u8 sgid_index;
- __u8 hop_limit;
- __u8 traffic_class;
- __u8 reserved;
-};
-
-struct ib_uverbs_ah_attr {
- struct ib_uverbs_global_route grh;
- __u16 dlid;
- __u8 sl;
- __u8 src_path_bits;
- __u8 static_rate;
- __u8 is_global;
- __u8 port_num;
- __u8 reserved;
-};
-
struct ib_uverbs_create_ah {
__u64 response;
__u64 user_handle;
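
A note on the reserved[5] member that this patch adds at the end of struct ib_uverbs_qp_attr: the patch does not state its purpose, but one plausible reading is that it rounds the structure up to a multiple of 8 bytes, so that the layout is the same wherever the structure ends up next to 64-bit fields. The following is a minimal userspace sketch of that arithmetic, not kernel code. The sketch_* names are hypothetical, and uint8_t/uint16_t/uint32_t stand in for __u8/__u16/__u32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the structures in the patch above.
 * All sketch_* names are hypothetical. */
struct sketch_global_route {
	uint8_t  dgid[16];
	uint32_t flow_label;
	uint8_t  sgid_index;
	uint8_t  hop_limit;
	uint8_t  traffic_class;
	uint8_t  reserved;        /* explicit pad: 16 + 4 + 4 = 24 bytes */
};

struct sketch_ah_attr {
	struct sketch_global_route grh;
	uint16_t dlid;
	uint8_t  sl;
	uint8_t  src_path_bits;
	uint8_t  static_rate;
	uint8_t  is_global;
	uint8_t  port_num;
	uint8_t  reserved;        /* explicit pad: 24 + 2 + 6 = 32 bytes */
};

struct sketch_qp_attr {
	uint32_t qp_attr_mask, qp_state, cur_qp_state, path_mtu;
	uint32_t path_mig_state, qkey, rq_psn, sq_psn;
	uint32_t dest_qp_num, qp_access_flags;          /* 40 bytes so far */

	struct sketch_ah_attr ah_attr;
	struct sketch_ah_attr alt_ah_attr;              /* 104 bytes so far */

	uint32_t max_send_wr, max_recv_wr, max_send_sge;
	uint32_t max_recv_sge, max_inline_data;         /* 124 bytes so far */

	uint16_t pkey_index, alt_pkey_index;            /* 128 bytes so far */
	uint8_t  en_sqd_async_notify, sq_draining, max_rd_atomic;
	uint8_t  max_dest_rd_atomic, min_rnr_timer, port_num;
	uint8_t  timeout, retry_cnt, rnr_retry;
	uint8_t  alt_port_num, alt_timeout;             /* 139 bytes so far */
	uint8_t  reserved[5];                           /* 139 + 5 = 144 */
};

/* Every member sits at its natural alignment, so the compiler inserts
 * no hidden padding and the total is a multiple of 8. */
_Static_assert(sizeof(struct sketch_global_route) == 24, "grh size");
_Static_assert(sizeof(struct sketch_ah_attr) == 32, "ah_attr size");
_Static_assert(sizeof(struct sketch_qp_attr) % 8 == 0, "qp_attr size");
```

Without the trailing reserved[5], the same members would occupy 139 bytes and the compiler would silently round the structure up to 140, which is not a multiple of 8.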
^ permalink raw reply
* Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
From: Ingo Oeser @ 2006-03-06 17:44 UTC (permalink / raw)
To: Evgeniy Polyakov
Cc: David S. Miller, jengelh, christopher.leech, linux-kernel, netdev
In-Reply-To: <20060305014324.GA20026@2ka.mipt.ru>
Evgeniy Polyakov wrote:
> On Sat, Mar 04, 2006 at 01:41:44PM -0800, David S. Miller (davem@davemloft.net) wrote:
> > From: Jan Engelhardt <jengelh@linux01.gwdg.de>
> > Date: Sat, 4 Mar 2006 19:46:22 +0100 (MET)
> >
> > > Does this buy the normal standard desktop user anything?
> >
> > Absolutely, it optimizes end-node performance.
>
> It really depends on how it is used.
> According to investigation made for kevent based FS AIO reading,
> get_user_pages() performance graph looks like sqrt() function
Hmm, so I should resurrect my user page table walker abstraction?
There I would hand each page to a "recording" function, which
can drop the page from the collection or coalesce it in the collector
if your scatter gather implementation allows it.
Regards
Ingo Oeser
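
The "recording" function described above is not shown anywhere in this thread. As a rough illustration of the coalescing idea only, here is a minimal userspace sketch in which a collector merges a page into the previous segment when it is physically contiguous with it. All names (sketch_collector, sketch_record_page, SKETCH_MAX_SEGS) are hypothetical, and pages are represented by bare frame numbers rather than struct page pointers.

```c
#include <stddef.h>

#define SKETCH_MAX_SEGS 16   /* arbitrary limit, chosen for the sketch */

/* One scatter-gather segment: a run of contiguous page frames. */
struct sketch_seg {
	unsigned long first_pfn;
	unsigned long npages;
};

struct sketch_collector {
	struct sketch_seg segs[SKETCH_MAX_SEGS];
	size_t nsegs;
};

/*
 * Hypothetical "recording" callback: called once per page by the walker.
 * Extends the last segment if the new page directly follows it,
 * otherwise opens a new segment. Returns 0 on success and -1 when the
 * collector is full.
 */
int sketch_record_page(struct sketch_collector *c, unsigned long pfn)
{
	if (c->nsegs > 0) {
		struct sketch_seg *last = &c->segs[c->nsegs - 1];

		if (last->first_pfn + last->npages == pfn) {
			last->npages++;   /* coalesce into the previous run */
			return 0;
		}
	}
	if (c->nsegs == SKETCH_MAX_SEGS)
		return -1;

	c->segs[c->nsegs].first_pfn = pfn;
	c->segs[c->nsegs].npages = 1;
	c->nsegs++;
	return 0;
}
```

Feeding frames 10, 11, 12, 20, 21 through sketch_record_page() leaves two segments of three and two pages. A callback that wanted to drop a page would simply return without touching the collector.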
^ permalink raw reply
* Re: [Patch 7/7] Generic netlink interface (delay accounting)
From: Shailabh Nagar @ 2006-03-06 17:00 UTC (permalink / raw)
To: hadi; +Cc: netdev, linux-kernel, lse-tech
In-Reply-To: <1141652556.5185.64.camel@localhost.localdomain>
Jamal,
Pls keep lkml and lse-tech on cc since some of this affects the usage
of delay accounting.
jamal wrote:
>Hi Shailabh,
>Apologies for taking a week to respond ..
>
>On Mon, 2006-27-02 at 15:26 -0500, Shailabh Nagar wrote:
>
>
>>jamal wrote:
>>
>>
>
>
>
>>Yes, the current intent is to allow multiple listeners to receive the
>>responses sent by the kernel.
>>
>>
>
>Responses or events? There is a difference:
>Response implies the program in user space requested (ex a GET) for that
>information and is receiving such info.
>Event implies the program in user space asked to be informed of changes
>in the kernel. Example an exit would be considered an event.
>Events are received by virtue of registering to a multicast group.
>[..]
>
>
My design was to have the listener get both responses (what I call
replies in the code) as well as events (data sent on exit of pid).
>>Since this interface (taskstats) is currently designed for that
>>possibility, having multiple listeners, one for
>>each "component" such as delay accounting, is the model we're using.
>>We expect each component to have a pair of userspace programs, one for
>>sending commands and the other
>>to "listen" to all replies + data generated on task exits.
>>
>>
>
>You need to have a sender of GETs essentially and a listener of events.
>Those are two connections. The replies of a get from user1 will not be
>sent to user2 as well - unless ... thats what you are trying to achieve;
>the question is why?
>
>
Yes, I was trying to have an asymmetric model where the userspace sender
of GETs doesn't receive the reply as a unicast. Rather, the reply is sent
by multicast (along with all the event data).
The reason for this unintuitive design was to make it easier to process
the returned data.
The expected usage of delay accounting is to periodically "sample" the
delays for all tasks (or tgids) in the system, and also to get the delays
from exiting pids (let's forget how tgid exit is handled for now; it's
irrelevant to this discussion).
Using these two pieces of data, userspace can aggregate the "delays" seen
by any grouping of tasks that it chooses to implement.
In this usage scenario, it's more efficient to have one receiver get both
response and event data and process them in a loop.
However, we could switch to the model you suggest and use a multithreaded
send/receive userspace utility.
>>The listener
>>is expected to register/deregister interest through
>>TASKSTATS_CMD_LISTEN and IGNORE.
>>
>>
>>
>
>It is not necessary if you follow the model i described.
>
>
>
>>>How does this correlate to TASKSTATS_CMD_LISTEN/IGNORE?
>>>
>>>
>>>
>>>
>>See above. Its mainly an optimization so that if no listener is present,
>>there's no need to generate the data.
>>
>>
>>
>
>Also not necessary - There is a recent netlink addition to make sure
>that events dont get sent if no listeners exist.
>genetlink needs to be extended. For now assume such a thing exists.
>
>
Ok. Will this addition work for both unicast and multicast modes ?
>
>
>>>>+
>>>>
>>>>
>
>
>
>>Good point. Should check for users sending it as a cmd and treat it as a
>>noop.
>>
>>
>
>More like return an -EINVAL
>
>
Will this be necessary ? Isn't genl_rcv_msg() going to return a -EOPNOTSUPP
automatically for us since we've not registered the command ?
>
>
>>I'm just using
>>this as a placeholder for data thats returned without being requested.
>>
>>
>>
>
>So it is unconditional?
>
>
Yes.
>
>
>>Come to think of it, there's no real reason to have a genlmsghdr for
>>returned data, is there ?
>>
>>
>
>All messages should be consistent whether they are sent from user
>or kernel.
>
>
Ok. will retain genetlink header.
>>Other than to copy the genlmsghdr that was sent so user can identify
>>which command was sent
>>(and I'm doing that through the reply type, perhaps redundantly).
>>
>>
>>
>
>yes, that is a useful trick. Just make sure they are reflected
>correctly.
>
>
>
>>Actually, the next iteration of the code will move to dynamically
>>generated ID. But yes, will need to check for that.
>>
>>
>>
>
>Also if you can provide feedback whether the doc i sent was any use
>and what wasnt clear etc.
>
>
Will do.
>>Thanks for the review.
>>Couple of questions about general netlink:
>>is it intended to remain a size that will always be aligned to the
>>NLMSG_ALIGNTO so that (NLMSG_DATA(nlhdr) + GENL_HDRLEN) can always
>>be used as a pointer to the genlmsghdr ?
>>
>>
>>
>
>I am not sure i followed.
>The whole message (nlhdr, genlhdr, optionalhdr, TLVs) has to be in
>the end 32 bit aligned.
>
>
Ok, so separate padding isn't needed to make the genlhdr, optionalhdr
and TLV parts aligned too.
>>Adding some macros like genlmsg_data(nlh) would be handy (currently I
>>just define and use it locally).
>>
>>
>>
>
>Send a patch.
>
>
will do.
Thanks,
Shailabh
>cheers,
>jamal
>
>
>
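
The genlmsg_data(nlh) helper mentioned above is not shown in this message. The following is only a sketch of the pointer arithmetic under discussion, written as standalone userspace C. The sketch_* structures are simplified copies of the netlink and generic netlink headers, and every sketch_* and SKETCH_* name is hypothetical, not an existing kernel macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified copies of the two headers, for illustration only. */
struct sketch_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

struct sketch_genlmsghdr {
	uint8_t  cmd;
	uint8_t  version;
	uint16_t reserved;
};

/* Round a length up to the 4-byte netlink alignment. */
#define SKETCH_ALIGNTO ((size_t)4)
#define SKETCH_ALIGN(len) \
	(((len) + SKETCH_ALIGNTO - 1) & ~(SKETCH_ALIGNTO - 1))

#define SKETCH_NLMSG_HDRLEN SKETCH_ALIGN(sizeof(struct sketch_nlmsghdr))
#define SKETCH_GENL_HDRLEN  SKETCH_ALIGN(sizeof(struct sketch_genlmsghdr))

/* The generic netlink header starts right after the aligned nlmsghdr. */
struct sketch_genlmsghdr *sketch_genlmsg_hdr(struct sketch_nlmsghdr *nlh)
{
	return (struct sketch_genlmsghdr *)
		((unsigned char *)nlh + SKETCH_NLMSG_HDRLEN);
}

/* The payload (optional header, then TLVs) follows the aligned genl header. */
void *sketch_genlmsg_data(struct sketch_nlmsghdr *nlh)
{
	return (unsigned char *)sketch_genlmsg_hdr(nlh) + SKETCH_GENL_HDRLEN;
}
```

Because both headers are already multiples of 4 bytes (16 and 4), the aligned lengths equal the raw sizes and no padding appears between them, which matches the conclusion reached above.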
^ permalink raw reply
* Re: de2104x: interrupts before interrupt handler is registered
From: Martin Michlmayr @ 2006-03-06 14:35 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev, linux-kernel
In-Reply-To: <20060305185948.GA24765@electric-eye.fr.zoreil.com>
* Francois Romieu <romieu@fr.zoreil.com> [2006-03-05 19:59]:
> > I have a system on which I can reproduce this bug 100%. While I have
> > no idea how to fix the issue, I can provide debugging information and
> > test a fix.
> (not compile-tested)
Thanks a lot for your quick response, Francois. I can confirm that
this patch fixes the problem for me.
> -err_out_hw:
> - spin_lock_irqsave(&de->lock, flags);
> - de_stop_hw(de);
> - spin_unlock_irqrestore(&de->lock, flags);
flags is no longer used now, so we get a compilation warning. Updated
patch below. Francois, can you please submit it with a proper
changelog entry and your Signed-off-by?
From: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
--- a/drivers/net/tulip/de2104x.c
+++ b/drivers/net/tulip/de2104x.c
@@ -1362,7 +1362,6 @@ static int de_open (struct net_device *d
{
struct de_private *de = dev->priv;
int rc;
- unsigned long flags;
if (netif_msg_ifup(de))
printk(KERN_DEBUG "%s: enabling interface\n", dev->name);
@@ -1376,18 +1375,20 @@ static int de_open (struct net_device *d
return rc;
}
- rc = de_init_hw(de);
- if (rc) {
- printk(KERN_ERR "%s: h/w init failure, err=%d\n",
- dev->name, rc);
- goto err_out_free;
- }
+ dw32(IntrMask, 0);
rc = request_irq(dev->irq, de_interrupt, SA_SHIRQ, dev->name, dev);
if (rc) {
printk(KERN_ERR "%s: IRQ %d request failure, err=%d\n",
dev->name, dev->irq, rc);
- goto err_out_hw;
+ goto err_out_free;
+ }
+
+ rc = de_init_hw(de);
+ if (rc) {
+ printk(KERN_ERR "%s: h/w init failure, err=%d\n",
+ dev->name, rc);
+ goto err_out_free_irq;
}
netif_start_queue(dev);
@@ -1395,11 +1396,8 @@ static int de_open (struct net_device *d
return 0;
-err_out_hw:
- spin_lock_irqsave(&de->lock, flags);
- de_stop_hw(de);
- spin_unlock_irqrestore(&de->lock, flags);
-
+err_out_free_irq:
+ free_irq(dev->irq, dev);
err_out_free:
de_free_rings(de);
return rc;
--
Martin Michlmayr
http://www.cyrius.com/
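
The reordering in the patch above can be illustrated with a toy model. This is a userspace sketch, not driver code: it assumes, based on the traces in this thread, that hardware init is what lets the device start raising interrupts, and all sketch_* names are hypothetical stand-ins for de_init_hw(), request_irq() and the IntrMask write.

```c
#include <stdbool.h>

/* Toy model of a device sharing an interrupt line. */
struct sketch_dev {
	unsigned int intr_mask;   /* nonzero: device may raise interrupts */
	bool handler_registered;  /* set once the driver has its handler */
	unsigned int handled;     /* interrupts that reached the handler */
	unsigned int unhandled;   /* interrupts with no handler registered */
};

/* The device signals an event; it is only visible when unmasked. */
void sketch_hw_event(struct sketch_dev *d)
{
	if (!d->intr_mask)
		return;
	if (d->handler_registered)
		d->handled++;
	else
		d->unhandled++;   /* the "nobody cared" case */
}

/* Stand-in for hardware init: unmasks interrupts, device fires at once. */
void sketch_init_hw(struct sketch_dev *d)
{
	d->intr_mask = 1;
	sketch_hw_event(d);
}

/* Stand-in for registering the interrupt handler. */
void sketch_request_irq(struct sketch_dev *d)
{
	d->handler_registered = true;
}

/* Old order: init hardware first, register the handler afterwards. */
void sketch_open_old(struct sketch_dev *d)
{
	sketch_init_hw(d);
	sketch_request_irq(d);
}

/* New order: mask, register the handler, then init hardware. */
void sketch_open_new(struct sketch_dev *d)
{
	d->intr_mask = 0;
	sketch_request_irq(d);
	sketch_init_hw(d);
}
```

In this model sketch_open_old() leaves one unhandled interrupt, while sketch_open_new() delivers the same event to the registered handler. The real driver also has to undo the registration with free_irq() when hardware init fails, which is what the new err_out_free_irq label does.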
^ permalink raw reply
* Re: GigE on PowerMac G5
From: Benjamin Herrenschmidt @ 2006-03-06 13:15 UTC (permalink / raw)
To: Andreas Schwab; +Cc: netdev, linuxppc64-dev
In-Reply-To: <jefylvzwhw.fsf@sykes.suse.de>
On Mon, 2006-03-06 at 11:40 +0100, Andreas Schwab wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>
> > On Sat, 2006-03-04 at 15:53 +0100, Andreas Schwab wrote:
> >> I suppose the NIC in the PowerMac G5 can do GigE, yet when plugged into a
> >> GB switch it is only willing to talk 100MB with it. Any idea why? Kernel
> >> is 2.6.16-rc5-git2.
> >
> > Works for me... Must be a problem with auto-neg and your switch, or the
> > cable.... Can you check how the switch is configured maybe ? You can
> > also try forcing the link speed with ethtool.
>
> It's not the cable, I have swapped it with another system where Gb is
> working fine. Neither it's the switch port, I have swapped it too. I
> can't force the speed with ethtool either. Any other idea what to look
> for?
At this point, all I can say is... does it work in OS X ?
Ben.
^ permalink raw reply
* Re: de2104x: interrupts before interrupt handler is registered
From: linux-os (Dick Johnson) @ 2006-03-06 13:02 UTC (permalink / raw)
To: Martin Michlmayr; +Cc: Jeff Garzik, netdev, linux-kernel
In-Reply-To: <20060305180757.GA22121@deprecation.cyrius.com>
On Sun, 5 Mar 2006, Martin Michlmayr wrote:
> We have three independent reports about problems with de2104x involving
> interrupts. Alan Stern suggested that it "sure looks as though the
> ethernet interface is generating an interrupt request before the
> de2104x driver has registered its interrupt handler".
>
> The three reports are:
> - de2104x does not work (non-fatal oops) when uhci_hcd is loaded
> first. http://lkml.org/lkml/2006/2/3/402 The problem does not
> occur under 2.4 with the tulip module, so this is a regression.
> - fatal de2104x interrupt oops (without uhci_hcd).
> http://lkml.org/lkml/2006/2/5/64
> - "kernel panic after the first transmission attempt times out"
> Regression from 2.4. http://bugs.debian.org/288821
>
> I have a system on which I can reproduce this bug 100%. While I have
> no idea how to fix the issue, I can provide debugging information and
> test a fix. However, I'm (temporarily) leaving the country in three
> weeks and won't have access to this PC for several months, so it would
> be great if someone could look into this soon. Jeff?
>
>
> 1)
> eth0: enabling interface
> eth0: set link 10baseT auto
> eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
> eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
> irq 10: nobody cared (try booting with the "irqpoll" option)
> [<c012f89e>] __report_bad_irq+0x31/0x73
> [<c012f96d>] note_interrupt+0x75/0x98
> [<c012f46a>] __do_IRQ+0x67/0x91
> [<c0104fc1>] do_IRQ+0x19/0x24
> [<c0103afa>] common_interrupt+0x1a/0x20
> [<c0119a1c>] __do_softirq+0x2c/0x7d
> [<c0119a8f>] do_softirq+0x22/0x26
> [<c0104fc6>] do_IRQ+0x1e/0x24
> [<c0103afa>] common_interrupt+0x1a/0x20
> [<c481da07>] de_set_rx_mode+0xf/0x12 [de2104x]
> [<c481e2c1>] de_init_hw+0x6d/0x76 [de2104x]
> [<c481e59e>] de_open+0x64/0xe4 [de2104x]
> [<c0225a5f>] dev_open+0x30/0x66
> [<c0226a9a>] dev_change_flags+0x4d/0xf0
> [<c025d301>] devinet_ioctl+0x224/0x4bd
> [<c0155541>] do_ioctl+0x21/0x50
> [<c0155774>] vfs_ioctl+0x152/0x161
> [<c01557cb>] sys_ioctl+0x48/0x65
> [<c0102a99>] syscall_call+0x7/0xb
> handlers:
> [<c4890d97>] (usb_hcd_irq+0x0/0x56 [usbcore])
> Disabling IRQ #10
>
> 3)
> eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
> eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
> [__report_bad_irq+42/144] __report_bad_irq+0x2a/0x90
> [note_interrupt+108/160] note_interrupt+0x6c/0xa0
> [do_IRQ+289/304] do_IRQ+0x121/0x130
> [common_interrupt+24/32] common_interrupt+0x18/0x20
> [__do_softirq+48/128] __do_softirq+0x30/0x80
> [acpi_irq+0/22] acpi_irq+0x0/0x16
> [do_softirq+38/48] do_softirq+0x26/0x30
> [do_IRQ+253/304] do_IRQ+0xfd/0x130
> [common_interrupt+24/32] common_interrupt+0x18/0x20
> [__crc_do_softirq+25311/208152] de_set_rx_mode+0x26/0x50 [de2104x]
> [__crc_do_softirq+28277/208152] de_init_hw+0x8c/0x90 [de2104x]
> [__crc_do_softirq+29105/208152] de_open+0x68/0x140 [de2104x]
> [profile_hook+45/75] profile_hook+0x2d/0x4b
> [dev_open+203/256] dev_open+0xcb/0x100
> [dev_mc_upload+36/80] dev_mc_upload+0x24/0x50
> [dev_change_flags+81/288] dev_change_flags+0x51/0x120
> [devinet_ioctl+582/1424] devinet_ioctl+0x246/0x590
> [inet_ioctl+94/160] inet_ioctl+0x5e/0xa0
> [sock_ioctl+249/688] sock_ioctl+0xf9/0x2b0
> [sys_ioctl+269/656] sys_ioctl+0x10d/0x290
> [syscall_call+7/11] syscall_call+0x7/0xb
> eth0: link up, media 10baseT auto
>
> --
> Martin Michlmayr
> http://www.cyrius.com/
> -
This started to happen in a lot of PCI drivers once it became
necessary to call pci_enable_device() in order to make the
returned IRQ values valid. This has been reported numerous
times and has not been fixed. Basically, in order to get
the correct value, one needs to disable the board in some
unspecified way so it is not possible for it to generate
an interrupt before enabling the board. With some devices
this may not be possible!
Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.47 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
^ permalink raw reply
* Re: GigE on PowerMac G5
From: Andreas Schwab @ 2006-03-06 10:40 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: netdev, linuxppc64-dev
In-Reply-To: <1141507000.17127.4.camel@localhost.localdomain>
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> On Sat, 2006-03-04 at 15:53 +0100, Andreas Schwab wrote:
>> I suppose the NIC in the PowerMac G5 can do GigE, yet when plugged into a
>> GB switch it is only willing to talk 100MB with it. Any idea why? Kernel
>> is 2.6.16-rc5-git2.
>
> Works for me... Must be a problem with auto-neg and your switch, or the
> cable.... Can you check how the switch is configured maybe ? You can
> also try forcing the link speed with ethtool.
It's not the cable, I have swapped it with another system where Gb is
working fine. Neither it's the switch port, I have swapped it too. I
can't force the speed with ethtool either. Any other idea what to look
for?
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply
* [PATCH] net: drop duplicate assignment in request_sock
From: Norbert Kiesel @ 2006-03-05 23:45 UTC (permalink / raw)
To: acme; +Cc: linux-kernel, netdev
From: Norbert Kiesel <nkiesel@tbdnetworks.com>
Hi,
just noticed that request_sock.[ch] contain a useless assignment of
rskq_accept_head to itself. I assume this is a typo and the 2nd one
was supposed to be _tail. However, setting _tail to NULL is not
needed, so the patch below just drops the 2nd assignment.
Best,
Norbert
Signed-Off-By: Norbert Kiesel <nkiesel@tbdnetworks.com>
---
diff -ru a/include/net/request_sock.h b/include/net/request_sock.h
--- a/include/net/request_sock.h 2005-10-28 15:44:45.000000000 -0700
+++ b/include/net/request_sock.h 2006-03-05 15:22:33.000000000 -0800
@@ -145,7 +145,7 @@
{
struct request_sock *req = queue->rskq_accept_head;
- queue->rskq_accept_head = queue->rskq_accept_head = NULL;
+ queue->rskq_accept_head = NULL;
return req;
}
diff -ru a/net/core/request_sock.c b/net/core/request_sock.c
--- a/net/core/request_sock.c 2006-03-05 14:40:50.000000000 -0800
+++ b/net/core/request_sock.c 2006-03-05 15:23:11.000000000 -0800
@@ -51,7 +51,7 @@
get_random_bytes(&lopt->hash_rnd, sizeof(lopt->hash_rnd));
rwlock_init(&queue->syn_wait_lock);
- queue->rskq_accept_head = queue->rskq_accept_head = NULL;
+ queue->rskq_accept_head = NULL;
lopt->nr_table_entries = nr_table_entries;
write_lock_bh(&queue->syn_wait_lock);
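
Why leaving the tail pointer alone is safe can be shown with a small standalone queue. This is a sketch under the assumption that the tail is only read when the head is non-NULL, as in the usual head/tail singly linked list. The sketch_* names are hypothetical and this is not the request_sock code itself.

```c
#include <stddef.h>

struct sketch_req {
	struct sketch_req *next;
};

struct sketch_queue {
	struct sketch_req *head;
	struct sketch_req *tail;   /* only meaningful while head != NULL */
};

/* Append: the tail is consulted only when the queue is non-empty. */
void sketch_queue_add(struct sketch_queue *q, struct sketch_req *r)
{
	r->next = NULL;
	if (q->head == NULL)
		q->head = r;
	else
		q->tail->next = r;
	q->tail = r;
}

/* Detach the whole list. Only the head is reset; the stale tail is
 * never dereferenced because the next add sees head == NULL. */
struct sketch_req *sketch_queue_yank(struct sketch_queue *q)
{
	struct sketch_req *list = q->head;

	q->head = NULL;
	return list;
}
```

After a yank, the next sketch_queue_add() takes the empty-queue branch and overwrites the tail, so the old value is harmless.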
^ permalink raw reply
* Please pull bcm43xx softmac-upstream and dscape-upstream branches
From: Michael Buesch @ 2006-03-05 20:47 UTC (permalink / raw)
To: John W. Linville
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w
[-- Attachment #1: Type: text/plain, Size: 2826 bytes --]
Hi John,
It took a little longer than promised, but a broken harddisk in
my server delayed my plans. ;)
Please pull branches "softmac-upstream" and "dscape-upstream"
from my repository at:
git://bu3sch.de/wireless-2.6.git
This introduces the following changesets:
Michael Buesch:
bcm43xx-d80211: Partially fix PIO code. Add Kconfig option for PIO or DMA mode (or both).
bcm43xx-d80211: fix typo. Missing D80211.
bcm43xx-d80211: add note that not all devices support PIO.
bcm43xx-d80211: Documentation fix by "Bin Zhang" <yangtze31-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
bcm43xx-d80211: completely disable wireless_handlers, as we must not override d80211 handlers.
bcm43xx-d80211: Move sprom lowlevel reading/writing to its own functions.
bcm43xx-d80211: make bcm43xx_sprom_crc() static.
bcm43xx-d80211: remove all remaining standard WX.
bcm43xx-d80211: split channel helper functions, so that they can be called without a valid running core.
bcm43xx-d80211: remove old unused struct.
bcm43xx-d80211: Completely remove all WX and add a sysfs interface as substitute for the private WX.
bcm43xx-d80211: rewrite and simplify the periodic task handling.
bcm43xx-d80211: Fix Kconfig typo (transfer mode default)
bcm43xx-d80211: use -Dwext for wpa_supplicant.
bcm43xx-d80211: Don't build add_sta, if it already exists.
bcm43xx-d80211: Workaround init_board vs IRQ race.
bcm43xx-d80211: move initialized = 1 to the end of init_board.
bcm43xx-d80211: Remove obsolete FIXME on wireless_handlers.
bcm43xx-d80211: Move TX/RX related functions to its own file. Add basic RTS/CTS code.
bcm43xx-d80211: fix typo in comment.
bcm43xx-d80211: properly register device attributes at the right place.
bcm43xx-d80211: remove magic add_sta.c and add the STA interface through sysfs.
Michael Buesch:
bcm43xx: Move sprom lowlevel reading/writing to its own functions.
bcm43xx: make bcm43xx_sprom_crc() static.
bcm43xx: split the channel helper functions, so that they can be used without a valid running core.
bcm43xx: remove old unused struct.
bcm43xx: Fix Kconfig typo (transfer mode default)
bcm43xx: Workaround init_board vs IRQ race.
bcm43xx: move initialized = 1 to the end of init_board.
bcm43xx: add assert(bcm->initialized) to periodic_tasks_setup().
bcm43xx: Move TX/RX related functions to its own file. Add basic RTS/CTS code.
bcm43xx: Add sysfs attributes for device specific tunables.
If you want to look at the individual patches, please use
my git webinterface, as the patch would be too big to inline here.
http://bu3sch.de/git?p=wireless-2.6.git;a=summary
--
Greetings Michael.
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply
* Re: de2104x: interrupts before interrupt handler is registered
From: Francois Romieu @ 2006-03-05 18:59 UTC (permalink / raw)
To: Martin Michlmayr; +Cc: Jeff Garzik, netdev, linux-kernel
In-Reply-To: <20060305180757.GA22121@deprecation.cyrius.com>
Martin Michlmayr <tbm@cyrius.com> :
[...]
> I have a system on which I can reproduce this bug 100%. While I have
> no idea how to fix the issue, I can provide debugging information and
> test a fix. However, I'm (temporarily) leaving the country in three
> weeks and won't have access to this PC for several months, so it would
> be great if someone could look into this soon. Jeff?
(not compile-tested)
diff --git a/drivers/net/tulip/de2104x.c b/drivers/net/tulip/de2104x.c
index d7fb3ff..d16a5a0 100644
--- a/drivers/net/tulip/de2104x.c
+++ b/drivers/net/tulip/de2104x.c
@@ -1376,18 +1376,20 @@ static int de_open (struct net_device *d
return rc;
}
- rc = de_init_hw(de);
- if (rc) {
- printk(KERN_ERR "%s: h/w init failure, err=%d\n",
- dev->name, rc);
- goto err_out_free;
- }
+ dw32(IntrMask, 0);
rc = request_irq(dev->irq, de_interrupt, SA_SHIRQ, dev->name, dev);
if (rc) {
printk(KERN_ERR "%s: IRQ %d request failure, err=%d\n",
dev->name, dev->irq, rc);
- goto err_out_hw;
+ goto err_out_free;
+ }
+
+ rc = de_init_hw(de);
+ if (rc) {
+ printk(KERN_ERR "%s: h/w init failure, err=%d\n",
+ dev->name, rc);
+ goto err_out_free_irq;
}
netif_start_queue(dev);
@@ -1395,11 +1397,8 @@ static int de_open (struct net_device *d
return 0;
-err_out_hw:
- spin_lock_irqsave(&de->lock, flags);
- de_stop_hw(de);
- spin_unlock_irqrestore(&de->lock, flags);
-
+err_out_free_irq:
+ free_irq(dev->irq, dev);
err_out_free:
de_free_rings(de);
return rc;
^ permalink raw reply related
* de2104x: interrupts before interrupt handler is registered
From: Martin Michlmayr @ 2006-03-05 18:07 UTC (permalink / raw)
To: Jeff Garzik, netdev; +Cc: linux-kernel
We have three independent reports about problems with de2104x involving
interrupts. Alan Stern suggested that it "sure looks as though the
ethernet interface is generating an interrupt request before the
de2104x driver has registered its interrupt handler".
The three reports are:
- de2104x does not work (non-fatal oops) when uhci_hcd is loaded
first. http://lkml.org/lkml/2006/2/3/402 The problem does not
occur under 2.4 with the tulip module, so this is a regression.
- fatal de2104x interrupt oops (without uhci_hcd).
http://lkml.org/lkml/2006/2/5/64
- "kernel panic after the first transmission attempt times out"
Regression from 2.4. http://bugs.debian.org/288821
I have a system on which I can reproduce this bug 100%. While I have
no idea how to fix the issue, I can provide debugging information and
test a fix. However, I'm (temporarily) leaving the country in three
weeks and won't have access to this PC for several months, so it would
be great if someone could look into this soon. Jeff?
1)
eth0: enabling interface
eth0: set link 10baseT auto
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
irq 10: nobody cared (try booting with the "irqpoll" option)
[<c012f89e>] __report_bad_irq+0x31/0x73
[<c012f96d>] note_interrupt+0x75/0x98
[<c012f46a>] __do_IRQ+0x67/0x91
[<c0104fc1>] do_IRQ+0x19/0x24
[<c0103afa>] common_interrupt+0x1a/0x20
[<c0119a1c>] __do_softirq+0x2c/0x7d
[<c0119a8f>] do_softirq+0x22/0x26
[<c0104fc6>] do_IRQ+0x1e/0x24
[<c0103afa>] common_interrupt+0x1a/0x20
[<c481da07>] de_set_rx_mode+0xf/0x12 [de2104x]
[<c481e2c1>] de_init_hw+0x6d/0x76 [de2104x]
[<c481e59e>] de_open+0x64/0xe4 [de2104x]
[<c0225a5f>] dev_open+0x30/0x66
[<c0226a9a>] dev_change_flags+0x4d/0xf0
[<c025d301>] devinet_ioctl+0x224/0x4bd
[<c0155541>] do_ioctl+0x21/0x50
[<c0155774>] vfs_ioctl+0x152/0x161
[<c01557cb>] sys_ioctl+0x48/0x65
[<c0102a99>] syscall_call+0x7/0xb
handlers:
[<c4890d97>] (usb_hcd_irq+0x0/0x56 [usbcore])
Disabling IRQ #10
3)
eth0: mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0: set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
[__report_bad_irq+42/144] __report_bad_irq+0x2a/0x90
[note_interrupt+108/160] note_interrupt+0x6c/0xa0
[do_IRQ+289/304] do_IRQ+0x121/0x130
[common_interrupt+24/32] common_interrupt+0x18/0x20
[__do_softirq+48/128] __do_softirq+0x30/0x80
[acpi_irq+0/22] acpi_irq+0x0/0x16
[do_softirq+38/48] do_softirq+0x26/0x30
[do_IRQ+253/304] do_IRQ+0xfd/0x130
[common_interrupt+24/32] common_interrupt+0x18/0x20
[__crc_do_softirq+25311/208152] de_set_rx_mode+0x26/0x50 [de2104x]
[__crc_do_softirq+28277/208152] de_init_hw+0x8c/0x90 [de2104x]
[__crc_do_softirq+29105/208152] de_open+0x68/0x140 [de2104x]
[profile_hook+45/75] profile_hook+0x2d/0x4b
[dev_open+203/256] dev_open+0xcb/0x100
[dev_mc_upload+36/80] dev_mc_upload+0x24/0x50
[dev_change_flags+81/288] dev_change_flags+0x51/0x120
[devinet_ioctl+582/1424] devinet_ioctl+0x246/0x590
[inet_ioctl+94/160] inet_ioctl+0x5e/0xa0
[sock_ioctl+249/688] sock_ioctl+0xf9/0x2b0
[sys_ioctl+269/656] sys_ioctl+0x10d/0x290
[syscall_call+7/11] syscall_call+0x7/0xb
eth0: link up, media 10baseT auto
--
Martin Michlmayr
http://www.cyrius.com/
^ permalink raw reply
* Re: [PATCH 8/8] [I/OAT] TCP recv offload to I/OAT
From: David S. Miller @ 2006-03-05 10:27 UTC (permalink / raw)
To: akpm; +Cc: christopher.leech, linux-kernel, netdev
In-Reply-To: <20060305004534.1d94b3cf.akpm@osdl.org>
From: Andrew Morton <akpm@osdl.org>
Date: Sun, 5 Mar 2006 00:45:34 -0800
> The __get_cpu_var() here will run smp_processor_id() from preemptible
> context. You'll get a big warning if the correct debug options are set.
>
> The reason for this is that preemption could cause this code to hop between
> CPUs.
>
> Please always test code with all debug options enabled and with full kernel
> preemption.
To be fair that warning doesn't trigger on some platforms, such as
sparc64 where the __get_cpu_var() implementation simply takes the
value from a fixed cpu register and doesn't do the debugging check.
Sparc64 should add the check when debugging options are enabled, for
sure, but the point is that it may not entirely be the tester's fault.
:-)
^ permalink raw reply