Netdev List
 help / color / mirror / Atom feed
* Re: [GIT PULL] bcm43xx: Update B6PHY initialization
From: Danny van Dyk @ 2006-02-01  9:51 UTC (permalink / raw)
  To: John W. Linville
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w
In-Reply-To: <20060201002101.GA8306-oTNwCEtKUwI/11+TDStg7g@public.gmane.org>

John, please

git pull rsync://pitr.amd64.dev.gentoo.org/kugelfang/wireless-2.6.git

which will provide this changeset:

Danny van Dyk:
      [bcm43xx] Sync bcm43xx_phy_initb6() with specs

diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_phy.c 
b/drivers/net/wireless/bcm43xx/bcm43xx_phy.c
index f5e7a6a..d90f207 100644
--- a/drivers/net/wireless/bcm43xx/bcm43xx_phy.c
+++ b/drivers/net/wireless/bcm43xx/bcm43xx_phy.c
@@ -947,7 +947,7 @@ static void bcm43xx_phy_initb6(struct bc
 	bcm43xx_radio_write16(bcm, 0x0050, 0x0020);
 	if ((bcm->current_core->radio->manufact == 0x17F) &&
 	    (bcm->current_core->radio->version == 0x2050) &&
-	    (bcm->current_core->radio->revision == 2)) {
+	    (bcm->current_core->radio->revision <= 2)) {
 		bcm43xx_radio_write16(bcm, 0x0050, 0x0020);
 		bcm43xx_radio_write16(bcm, 0x005A, 0x0070);
 		bcm43xx_radio_write16(bcm, 0x005B, 0x007B);
@@ -984,10 +984,15 @@ static void bcm43xx_phy_initb6(struct bc
 		bcm43xx_write16(bcm, 0x03E4, 0x0009);
 	if (phy->type == BCM43xx_PHYTYPE_B) {
 		bcm43xx_write16(bcm, 0x03E6, 0x8140);
-		bcm43xx_phy_write(bcm, 0x0016, 0x5410);
-		bcm43xx_phy_write(bcm, 0x0017, 0xA820);
-		bcm43xx_phy_write(bcm, 0x0007, 0x0062);
-		TODO();//TODO: calibrate stuff.
+		bcm43xx_phy_write(bcm, 0x0016, 0x0410);
+		bcm43xx_phy_write(bcm, 0x0017, 0x0820);
+		bcm43xx_phy_write(bcm, 0x0062, 0x0007);
+		(void) bcm43xx_radio_calibrationvalue(bcm);
+		bcm43xx_phy_lo_b_measure(bcm);
+		if (bcm->sprom.boardflags & BCM43xx_BFL_RSSI) {
+			bcm43xx_calc_nrssi_slope(bcm);
+			bcm43xx_calc_nrssi_threshold(bcm);
+		}
 		bcm43xx_phy_init_pctl(bcm);
 	} else
 		bcm43xx_write16(bcm, 0x03E6, 0x0);
diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_radio.c 
b/drivers/net/wireless/bcm43xx/bcm43xx_radio.c
index 5ce6ace..3901aa9 100644
--- a/drivers/net/wireless/bcm43xx/bcm43xx_radio.c
+++ b/drivers/net/wireless/bcm43xx/bcm43xx_radio.c
@@ -1184,7 +1184,7 @@ int bcm43xx_radio_set_interference_mitig
 	return 0;
 }
 
-static u16 bcm43xx_radio_calibrationvalue(struct bcm43xx_private *bcm)
+u16 bcm43xx_radio_calibrationvalue(struct bcm43xx_private *bcm)
 {
 	u16 reg, index, ret;
 
diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_radio.h 
b/drivers/net/wireless/bcm43xx/bcm43xx_radio.h
index 89fe292..a5d2e10 100644
--- a/drivers/net/wireless/bcm43xx/bcm43xx_radio.h
+++ b/drivers/net/wireless/bcm43xx/bcm43xx_radio.h
@@ -89,5 +89,6 @@ void bcm43xx_nrssi_hw_update(struct bcm4
 void bcm43xx_nrssi_mem_update(struct bcm43xx_private *bcm);
 
 void bcm43xx_radio_set_tx_iq(struct bcm43xx_private *bcm);
+u16 bcm43xx_radio_calibrationvalue(struct bcm43xx_private *bcm);
 
 #endif /* BCM43xx_RADIO_H_ */
-- 
Danny van Dyk <kugelfang-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
Gentoo/AMD64 Project, Gentoo Scientific Project

-- 
Danny van Dyk <kugelfang-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
Gentoo/AMD64 Project, Gentoo Scientific Project

^ permalink raw reply related

* Re: [PATCH] TNETW1450: successful device initialization
From: Denis Vlasenko @ 2006-02-01  9:55 UTC (permalink / raw)
  To: Jean-Baptiste Note; +Cc: netdev, acx100-devel, andi
In-Reply-To: <1138786708.26656.4.camel@curcuma.ens.fr>

On Wednesday 01 February 2006 11:38, Jean-Baptiste Note wrote:
> Hi Denis,
> 
> > Do you have a "HOWTO sniff USB under Windows" or URL to it?
> > May be helpful for other potential developers.
> > 
> 
> One page which I found very usefull for prism54softmac USB sniffing
> under windows is this one :
> 
> http://benoit.papillault.free.fr/usbsnoop/
> 
> Another very good way to snoop the USB frames is under linux if your
> driver happens to run under ndiswrapper (which is often the case),
> either by savage printk's, or by using the USB snooping facilities from
> Pete Zaitcev.

Added to acx README, thanks!
--
vda


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642

^ permalink raw reply

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
From: Herbert Xu @ 2006-02-01 10:42 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: davem, linux-kernel, netdev, YOSHIFUJI Hideaki
In-Reply-To: <20060131212432.GA18812@elte.hu>

[-- Attachment #1: Type: text/plain, Size: 1072 bytes --]

On Tue, Jan 31, 2006 at 10:24:32PM +0100, Ingo Molnar wrote:
>
>  [<c04de9e8>] _write_lock+0x8/0x10
>  [<c0499015>] inet6_destroy_sock+0x25/0x100
>  [<c04b8672>] tcp_v6_destroy_sock+0x12/0x20
>  [<c046bbda>] inet_csk_destroy_sock+0x4a/0x150
>  [<c047625c>] tcp_rcv_state_process+0xd4c/0xdd0
>  [<c047d8e9>] tcp_v4_do_rcv+0xa9/0x340
>  [<c047eabb>] tcp_v4_rcv+0x8eb/0x9d0

OK this is definitely broken.  We should never touch the dst lock in
softirq context.  Since inet6_destroy_sock may be called from that
context due to the asynchronous nature of sockets, we can't take the
lock there.

In fact this sk_dst_reset is totally redundant since all IPv6 sockets
use inet_sock_destruct as their socket destructor which always cleans
up the dst anyway.  So the solution is to simply remove the call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: inet6-destroy-sock-should-not-reset-dst --]
[-- Type: text/plain, Size: 369 bytes --]

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -369,12 +369,6 @@ int inet6_destroy_sock(struct sock *sk)
 	struct sk_buff *skb;
 	struct ipv6_txoptions *opt;
 
-	/*
-	 *	Release destination entry
-	 */
-
-	sk_dst_reset(sk);
-
 	/* Release rx options */
 
 	if ((skb = xchg(&np->pktoptions, NULL)) != NULL)

^ permalink raw reply

* [RFC] Backward compatibility and WAN netdev configuration
From: Krzysztof Halasa @ 2006-02-01 10:50 UTC (permalink / raw)
  To: netdev, lkml

Hi,

I'm considering some changes/additions to my generic HDLC (WAN) code.

What do you think about:
a) Currently it consists of mid-layer WAN protocols single module (Cisco
   HDLC, FR etc.) + low-level hardware HDLC card driver (C101, N2, PCI200SYN
   etc.). I'm thinking about splitting the protocol module into separate
   modules - it would make them independent, users would be able to
   load, say, FR without PPP or X.25 and underlying syncppp, lapb etc.
   From the technical POV it would be superior to current code but it
   would require sysadmins to change modprobe.conf, add another modprobe
   or something like that. Not a real problem but the upgrade can't be
   automatic.

b) I'm currently using a dedicated "sethdlc" tool for configuring WAN
   devices (both physical parameters like clocking, speeds etc. and
   protocol parameters/selection). It uses ioctl(). I'm thinking about
   switching configuration interface to sysfs. That would render the
   old ioctl interface obsolete.
   It would mean much better flexibility, and (when the HDLC ioctl
   interface is removed in a year or so) would simplify the code.

   I'm not sure about using sysfs for net device configuration, though.
   Of course, it would make sysfs mandatory for generic HDLC users.

I'd aim at making changes to ~ 2.6.18.

Opinions?
-- 
Krzysztof Halasa

^ permalink raw reply

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
From: Ingo Molnar @ 2006-02-01 11:13 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, linux-kernel, netdev, YOSHIFUJI Hideaki
In-Reply-To: <20060201104214.GA9085@gondor.apana.org.au>


* Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Tue, Jan 31, 2006 at 10:24:32PM +0100, Ingo Molnar wrote:
> >
> >  [<c04de9e8>] _write_lock+0x8/0x10
> >  [<c0499015>] inet6_destroy_sock+0x25/0x100
> >  [<c04b8672>] tcp_v6_destroy_sock+0x12/0x20
> >  [<c046bbda>] inet_csk_destroy_sock+0x4a/0x150
> >  [<c047625c>] tcp_rcv_state_process+0xd4c/0xdd0
> >  [<c047d8e9>] tcp_v4_do_rcv+0xa9/0x340
> >  [<c047eabb>] tcp_v4_rcv+0x8eb/0x9d0
> 
> OK this is definitely broken.  We should never touch the dst lock in 
> softirq context.  Since inet6_destroy_sock may be called from that 
> context due to the asynchronous nature of sockets, we can't take the 
> lock there.
> 
> In fact this sk_dst_reset is totally redundant since all IPv6 sockets 
> use inet_sock_destruct as their socket destructor which always cleans 
> up the dst anyway.  So the solution is to simply remove the call.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

cool - i've booted your patch and will have results later today [it's 
looking good so far, after 10 minutes of uptime ;)]

btw., this codepath took some time to trigger, and i'm not sure why: 
maybe because i dont have any true ipv6 traffic? (In fact i dont have 
CONFIG_IPV6 enabled at all in this kernel config - so this codepath must 
be an effect of ipv4/ipv6 unification?) I guess i should run the ipv6 
testsuite to expose all the important codepaths to the validator? [Would 
you have any suggestions for me how to do that quickly & easily?]

another thing: could you add the string 'lock validator' somewhere into 
the changelog, so that we can grep for such things later on? One reason 
for that is to strenghten my future arguments for mainline inclusion 
;-), but there's another argument as well:

the lock validator finds _all_ hidden deadlocks [no matter how obscure, 
interdependent or unlikely they are, as long as the affected codepath is 
triggered at least once], and thus the resulting bug statistics and 
characteristics will be an excellent (and one-time) opportunity to 
objectively judge the absolute code quality (and defect rate) of the 
Linux kernel, for an important category of hard-to-find bugs. (Maybe the 
results can even be extrapolated to other, similarly hard-to-find bug 
categories.)

in other words, the lock validator is building a runtime set of formal 
"locking requirements", and is automatically proving (and maintaining 
the proof) that all locking activity within the kernel meets those 
requirements, mathematically. It thus enables us to achieve a hard 
ceiling of 100% correctness that we can rarely achieve in complex code 
as the kernel. We should minimize the changeset entropy introduced and 
preserve this historic information.

	Ingo

^ permalink raw reply

* Re: [RFC] Poor Network Performance with e1000 on 2.6.14.3
From: Ashutosh Naik @ 2006-02-01 12:19 UTC (permalink / raw)
  To: Ben Greear, linux-kernel, netdev, linux-net, Andrew Morton,
	jgarzik
In-Reply-To: <43E04712.6080108@candelatech.com>

[-- Attachment #1: Type: text/plain, Size: 821 bytes --]

On 2/1/06, Ben Greear <greearb@candelatech.com> wrote:
> Ashutosh Naik wrote:
> > Now, I assume that on Gigabit ethernet, I should be getting Line Rate,
> > which is around 220 MBps. Even the CPU is not getting max-ed out here
> > and I am at a loss to understand this behaviour.
>
> Make sure you are running 64-bit 100+Mhz PCI, otherwise you will not
> get line speed.  ethtool -d [device]
> may get that info for you.

Thanks for that Ben. I now used a 64 bit,133 MHz PCI-X on both
machines and I got around  180 MBps, (86259.48 + 95164.90 ), which is
still a decent way away from Line speed. I think PCI is not a
bottleneck now, although I could be wrong. What could I be missing,
and has anybody seen line speed ( 220 MBps ) with e1000 ?

Regards and Thanks
Ashutosh

ps - Attaching my ettcp log

[-- Attachment #2: log.txt --]
[-- Type: text/plain, Size: 1224 bytes --]

[root@localhost ~]# ettcp -s -t -i 300ttcp-r: accept from 192.168.90.100
User: 17553  Nice: 0  System 8885  Idle:737051  Params:4

ttcp-t: buflen=65536, nbuf=2048, align=16384/0, port=5001  tcp  -> 192.168.90.10
0
nttcp-t: socket
nttcp-t: connect
User: 17555  Nice: 0  System 8892  Idle:737076  Params:4
^[
User: 19818  Nice: 0  System 14716  Idle:747604  Params:4
User_diff: 2265  Nice_diff: 0  System_diff: 5831  Idle_diff:10553 Tot:18649
User: 12.145423  System: 31.267092  Nice 0.000000  Idle:56.587485        
nttcp-r: Buflen:65536 SysLoad:31.27% 29232070656 bytes 299.97secs =95164.90 KB/s
ec
nttcp-r: 699884 I/O calls, msec/call = 0.44, calls/sec = 2333.15
nttcp-r: 0.2user 32.6sys 4:59real 10% 0i+0d 0maxrss 0+30pf 691228+3947csw
User: 19834  Nice: 0  System 14722  Idle:747623  Params:4
User_diff: 2279  Nice_diff: 0  System_diff: 5830  Idle_diff:10547 Tot:18656
User: 12.215909  System: 31.250000  Nice 0.000000  Idle:56.534091        
nttcp-t: Buflen:65536 SysLoad:31.25% 26499153920 bytes 300.00secs =86259.48 KB/s
ec
nttcp-t: 404345 I/O calls, msec/call = 0.76, calls/sec = 1347.80
nttcp-t: 0.2user 44.9sys 5:00real 15% 0i+0d 0maxrss 0+30pf 374765+1306csw
[1]+  Done                    ettcp -s -r -l 65536





^ permalink raw reply

* Re: 2.6.14.3 and page allocation failures..
From: Sai Bathina @ 2006-02-01 13:35 UTC (permalink / raw)
  To: Brandeburg, Jesse; +Cc: e1000-devel, netdev
In-Reply-To: <C925F8B43D79CC49ACD0601FB68FF50C06909DA8@orsmsx408>

Thanks for the post:

Here's the /proc/slabinfo output:
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab>
<pagesperslab> : tunables <limit> <batchcount> <sharedfactor> :
slabdata <activ
e_slabs> <num_slabs> <sharedavail>
ip_conntrack_syn       0      0     64   59    1 : tunables  120   60 
  8 : slabdata      0      0      0
ip_fib_alias           9    113     32  113    1 : tunables  120   60 
  8 : slabdata      1      1      0
ip_fib_hash            9    113     32  113    1 : tunables  120   60 
  8 : slabdata      1      1      0
rpc_buffers            8      8   2048    2    1 : tunables   24   12 
  8 : slabdata      4      4      0
rpc_tasks              8     15    256   15    1 : tunables  120   60 
  8 : slabdata      1      1      0
rpc_inode_cache        0      0    512    7    1 : tunables   54   27 
  8 : slabdata      0      0      0
bridge_fdb_cache       4     59     64   59    1 : tunables  120   60 
  8 : slabdata      1      1      0
UNIX                   7      7    512    7    1 : tunables   54   27 
  8 : slabdata      1      1      0
ipt_hashlimit          0      0     40   92    1 : tunables  120   60 
  8 : slabdata      0      0      0
ip_tproxy              0      0    128   30    1 : tunables  120   60 
  8 : slabdata      0      0      0
ip_conntrack_expect      0      0     96   40    1 : tunables  120  
60    8 : slabdata      0      0      0
ip_conntrack           1     16    248   16    1 : tunables  120   60 
  8 : slabdata      1      1      0
tcp_bind_bucket        1    203     16  203    1 : tunables  120   60 
  8 : slabdata      1      1      0
inet_peer_cache        0      0     64   59    1 : tunables  120   60 
  8 : slabdata      0      0      0
ip_dst_cache           0      0    256   15    1 : tunables  120   60 
  8 : slabdata      0      0      0
arp_cache              0      0    256   15    1 : tunables  120   60 
  8 : slabdata      0      0      0
RAW                    3      7    512    7    1 : tunables   54   27 
  8 : slabdata      1      1      0
UDP                    0      0    512    7    1 : tunables   54   27 
  8 : slabdata      0      0      0
tw_sock_TCP            0      0    128   30    1 : tunables  120   60 
  8 : slabdata      0      0      0
request_sock_TCP       0      0     64   59    1 : tunables  120   60 
  8 : slabdata      0      0      0
TCP                    4      7   1152    7    2 : tunables   24   12 
  8 : slabdata      1      1      0
cfq_ioc_pool           0      0     48   78    1 : tunables  120   60 
  8 : slabdata      0      0      0
cfq_pool               0      0     96   40    1 : tunables  120   60 
  8 : slabdata      0      0      0
crq_pool               0      0     48   78    1 : tunables  120   60 
  8 : slabdata      0      0      0
deadline_drq           0      0     52   72    1 : tunables  120   60 
  8 : slabdata      0      0      0
as_arq                12     59     64   59    1 : tunables  120   60 
  8 : slabdata      1      1      0
mqueue_inode_cache      1      6    640    6    1 : tunables   54   27
   8 : slabdata      1      1      0
nfs_write_data        36     42    512    7    1 : tunables   54   27 
  8 : slabdata      6      6      0
nfs_read_data         32     35    512    7    1 : tunables   54   27 
  8 : slabdata      5      5      0
nfs_inode_cache        0      0    668    6    1 : tunables   54   27 
  8 : slabdata      0      0      0
nfs_page               0      0     64   59    1 : tunables  120   60 
  8 : slabdata      0      0      0
ext2_inode_cache     649    816    492    8    1 : tunables   54   27 
  8 : slabdata    102    102      0
dnotify_cache          0      0     20  169    1 : tunables  120   60 
  8 : slabdata      0      0      0
eventpoll_pwq          0      0     36  101    1 : tunables  120   60 
  8 : slabdata      0      0      0
eventpoll_epi          0      0    128   30    1 : tunables  120   60 
  8 : slabdata      0      0      0
inotify_event_cache      0      0     28  127    1 : tunables  120  
60    8 : slabdata      0      0      0
inotify_watch_cache      0      0     36  101    1 : tunables  120  
60    8 : slabdata      0      0      0
kioctx                 0      0    256   15    1 : tunables  120   60 
  8 : slabdata      0      0      0
kiocb                  0      0    128   30    1 : tunables  120   60 
  8 : slabdata      0      0      0
fasync_cache           0      0     16  203    1 : tunables  120   60 
  8 : slabdata      0      0      0
shmem_inode_cache      3      8    476    8    1 : tunables   54   27 
  8 : slabdata      1      1      0
posix_timers_cache      0      0    104   37    1 : tunables  120   60
   8 : slabdata      0      0      0
uid_cache              0      0     64   59    1 : tunables  120   60 
  8 : slabdata      0      0      0
sgpool-128            32     32   2048    2    1 : tunables   24   12 
  8 : slabdata     16     16      0
sgpool-64             32     32   1024    4    1 : tunables   54   27 
  8 : slabdata      8      8      0
sgpool-32             32     32    512    8    1 : tunables   54   27 
  8 : slabdata      4      4      0
sgpool-16             32     45    256   15    1 : tunables  120   60 
  8 : slabdata      3      3      0
sgpool-8              32     60    128   30    1 : tunables  120   60 
  8 : slabdata      2      2      0
blkdev_ioc            33    127     28  127    1 : tunables  120   60 
  8 : slabdata      1      1      0
blkdev_queue          26     27    412    9    1 : tunables   54   27 
  8 : slabdata      3      3      0
blkdev_requests       24     48    160   24    1 : tunables  120   60 
  8 : slabdata      2      2      0
biovec-(256)         256    256   3072    2    2 : tunables   24   12 
  8 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12 
  8 : slabdata     52     52      0
biovec-64            256    260    768    5    1 : tunables   54   27 
  8 : slabdata     52     52      0
biovec-16            259    270    256   15    1 : tunables  120   60 
  8 : slabdata     18     18      0
biovec-4             256    295     64   59    1 : tunables  120   60 
  8 : slabdata      5      5      0
biovec-1             272    406     16  203    1 : tunables  120   60 
  8 : slabdata      2      2      0
bio                  317   1170    128   30    1 : tunables  120   60 
  8 : slabdata     39     39      0
file_lock_cache       11     40     96   40    1 : tunables  120   60 
  8 : slabdata      1      1      0
sock_inode_cache      21     21    512    7    1 : tunables   54   27 
  8 : slabdata      3      3      0
skbuff_fclone_cache      0      0    384   10    1 : tunables   54  
27    8 : slabdata      0      0      0
skbuff_head_cache  32955  32955    256   15    1 : tunables  120   60 
  8 : slabdata   2197   2197      0
proc_inode_cache      70    170    392   10    1 : tunables   54   27 
  8 : slabdata     17     17      0
sigqueue               8     26    148   26    1 : tunables  120   60 
  8 : slabdata      1      1      0
radix_tree_node      698    728    276   14    1 : tunables   54   27 
  8 : slabdata     52     52      0
bdev_cache             7      7    512    7    1 : tunables   54   27 
  8 : slabdata      1      1      0
sysfs_dir_cache     1202   1288     40   92    1 : tunables  120   60 
  8 : slabdata     14     14      0
mnt_cache             17     30    128   30    1 : tunables  120   60 
  8 : slabdata      1      1      0
inode_cache          340    340    376   10    1 : tunables   54   27 
  8 : slabdata     34     34      0
dentry_cache        1095   1701    144   27    1 : tunables  120   60 
  8 : slabdata     63     63      0
filp                 390    495    256   15    1 : tunables  120   60 
  8 : slabdata     33     33      0
names_cache           12     12   4096    1    1 : tunables   24   12 
  8 : slabdata     12     12      0
idr_layer_cache       64     87    136   29    1 : tunables  120   60 
  8 : slabdata      3      3      0
buffer_head          900   1224     52   72    1 : tunables  120   60 
  8 : slabdata     17     17      0
mm_struct             48     48    640    6    1 : tunables   54   27 
  8 : slabdata      8      8      0
vm_area_struct       576    748     88   44    1 : tunables  120   60 
  8 : slabdata     17     17      0
fs_cache              59     59     64   59    1 : tunables  120   60 
  8 : slabdata      1      1      0
files_cache           56     56    512    7    1 : tunables   54   27 
  8 : slabdata      8      8      0
signal_cache          70     70    384   10    1 : tunables   54   27 
  8 : slabdata      7      7      0
sighand_cache         65     65   1408    5    2 : tunables   24   12 
  8 : slabdata     13     13      0
task_struct           66     66   1280    3    1 : tunables   24   12 
  8 : slabdata     22     22      0
anon_vma             367    406     16  203    1 : tunables  120   60 
  8 : slabdata      2      2      0
pgd                   48     48   4096    1    1 : tunables   24   12 
  8 : slabdata     48     48      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4 
  0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4 
  0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4 
  0 : slabdata      0      0      0
size-65536             3      3  65536    1   16 : tunables    8    4 
  0 : slabdata      3      3      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4 
  0 : slabdata      0      0      0
size-32768             0      0  32768    1    8 : tunables    8    4 
  0 : slabdata      0      0      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4 
  0 : slabdata      0      0      0
size-16384             0      0  16384    1    4 : tunables    8    4 
  0 : slabdata      0      0      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4 
  0 : slabdata      0      0      0
size-8192             58     58   8192    1    2 : tunables    8    4 
  0 : slabdata     58     58      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12 
  8 : slabdata      0      0      0
size-4096             46     46   4096    1    1 : tunables   24   12 
  8 : slabdata     46     46      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12 
  8 : slabdata      0      0      0
size-2048          33000  33000   2048    2    1 : tunables   24   12 
  8 : slabdata  16500  16500     24
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27 
  8 : slabdata      0      0      0
size-1024             80     80   1024    4    1 : tunables   54   27 
  8 : slabdata     20     20      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27 
  8 : slabdata      0      0      0
size-512             184    184    512    8    1 : tunables   54   27 
  8 : slabdata     23     23      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60 
  8 : slabdata      0      0      0
size-256             120    120    256   15    1 : tunables  120   60 
  8 : slabdata      8      8      0
size-128(DMA)          0      0    128   30    1 : tunables  120   60 
  8 : slabdata      0      0      0
size-128             900    900    128   30    1 : tunables  120   60 
  8 : slabdata     30     30      0
size-64(DMA)           0      0     64   59    1 : tunables  120   60 
  8 : slabdata      0      0      0
size-32(DMA)           0      0     32  113    1 : tunables  120   60 
  8 : slabdata      0      0      0
size-64            82658  82718     64   59    1 : tunables  120   60 
  8 : slabdata   1402   1402      0
size-32              851    904     32  113    1 : tunables  120   60 
  8 : slabdata      8      8      0
kmem_cache           120    120    128   30    1 : tunables  120   60 
  8 : slabdata      4      4      0


On 1/31/06, Brandeburg, Jesse <jesse.brandeburg@intel.com> wrote:
> Sorry for the top post.
>
> You may want to post this to netdev@vger.kernel.org.
>
> We haven't heard any reports of e100 itself leaking.  You need to run
> slabtop or send the output of /proc/slabinfo to see what slab entry is
> taking all the memory on your system.
>
> At this point some other component is suspect.
>
> Jesse
>
>
> > -----Original Message-----
> > From: e1000-devel-admin@lists.sourceforge.net [mailto:e1000-devel-
> > admin@lists.sourceforge.net] On Behalf Of Sai Bathina
> > Sent: Monday, January 30, 2006 4:03 PM
> > To: e1000-devel@lists.sourceforge.net
> > Subject: [E1000-devel] 2.6.14.3 and page allocation failures..
> >
> > Hi,
> >      I am seeing page allocation errors and I have a snapshot of the
> > /var/log/messages.
> >
> > My Hardware configs are
> > Dell 650, using e100 driver(e100-3.4.14-k2-NAPI), driver from the
> kernel
> >
> > model name      : Intel(R) Pentium(R) 4 CPU 2.40GHz cache size      :
> > 512 KB
> >
> >   total       used       free     shared    buffers     cached
> > Mem:        255744     188400      67344          0        680
> 11064
> > -/+ buffers/cache:     176656      79088
> > Swap:      2097136     134740    1962396
> >
> >
> > Observed the failures immediately after running network traffic.
> > I am assuming that it has something to do with the e100 driver page
> > allocations here....I would appreciate if anyone can help me in
> > figuring out what is going wrong with this.
> >
> > Jan 30 22:46:59 : swapper: page allocation failure. order:0, mode:0x20
> > Jan 30 22:46:59 :  [__alloc_pages+936/976] __alloc_pages+0x3a8/0x3d0
> > Jan 30 22:46:59 :  [br_nf_forward_ip+264/296]
> br_nf_forward_ip+0x108/0x128
> > Jan 30 22:46:59 :  [kmem_getpages+44/132] kmem_getpages+0x2c/0x84
> > Jan 30 22:46:59 :  [cache_grow+163/316] cache_grow+0xa3/0x13c
> > Jan 30 22:46:59 :  [cache_alloc_refill+415/488]
> > cache_alloc_refill+0x19f/0x1e8
> > Jan 30 22:46:59 :  [skb_checksum+72/628] skb_checksum+0x48/0x274
> > Jan 30 22:46:59 :  [kmem_cache_alloc+63/76] kmem_cache_alloc+0x3f/0x4c
> > Jan 30 22:47:00 :  [ip_conntrack_alloc+133/280]
> > ip_conntrack_alloc+0x85/0x118
> > Jan 30 22:47:00 :  [init_conntrack+48/348] init_conntrack+0x30/0x15c
> > Jan 30 22:47:00 :  [rpc_get_inode+52/148] rpc_get_inode+0x34/0x94
> > Jan 30 22:47:00 :  [ip_conntrack_in+317/824]
> ip_conntrack_in+0x13d/0x338
> > Jan 30 22:47:00 :  [br_nf_pre_routing_finish+0/704]
> > br_nf_pre_routing_finish+0x0/0x2c0
> > Jan 30 22:47:00 :  [nf_iterate+69/112] nf_iterate+0x45/0x70
> > Jan 30 22:47:00 :  [br_nf_pre_routing_finish+0/704]
> > br_nf_pre_routing_finish+0x0/0x2c0
> > Jan 30 22:47:00 :  [nf_hook_slow+100/260] nf_hook_slow+0x64/0x104
> > Jan 30 22:47:00 :  [br_nf_pre_routing_finish+0/704]
> > br_nf_pre_routing_finish+0x0/0x2c0
> > Jan 30 22:47:00 :  [br_handle_frame_finish+0/220]
> > br_handle_frame_finish+0x0/0xdc
> > Jan 30 22:47:00 :  [br_nf_pre_routing+907/948]
> > br_nf_pre_routing+0x38b/0x3b4
> > Jan 30 22:47:00 :  [br_nf_pre_routing_finish+0/704]
> > br_nf_pre_routing_finish+0x0/0x2c0
> > Jan 30 22:47:00 :  [br_handle_frame_finish+0/220]
> > br_handle_frame_finish+0x0/0xdc
> > Jan 30 22:47:00 :  [nf_iterate+69/112] nf_iterate+0x45/0x70
> > Jan 30 22:47:00 :  [br_handle_frame_finish+0/220]
> > br_handle_frame_finish+0x0/0xdc
> > Jan 30 22:47:00 :  [nf_hook_slow+100/260] nf_hook_slow+0x64/0x104
> > Jan 30 22:47:00 :  [br_handle_frame_finish+0/220]
> > br_handle_frame_finish+0x0/0xdc
> > Jan 30 22:47:00 :  [br_handle_frame+385/476]
> br_handle_frame+0x181/0x1dc
> > Jan 30 22:47:00 :  [br_handle_frame_finish+0/220]
> > br_handle_frame_finish+0x0/0xdc
> > Jan 30 22:47:00 :  [netif_receive_skb+322/576]
> > netif_receive_skb+0x142/0x240
> > Jan 30 22:47:00 :  [pg0+273725344/1069114368] e100_poll+0x274/0x6bc
> [e100]
> > Jan 30 22:47:00 :  [net_rx_action+128/296] net_rx_action+0x80/0x128
> > Jan 30 22:47:00 :  [gcc2_compiled.+141/256] __do_softirq+0x8d/0x100
> > Jan 30 22:47:00 :  [do_softirq+47/52] do_softirq+0x2f/0x34
> > Jan 30 22:47:00 :  [irq_exit+52/64] irq_exit+0x34/0x40
> > Jan 30 22:47:00 :  [gcc2_compiled.+32/40] do_IRQ+0x20/0x28
> > Jan 30 22:47:00 :  [common_interrupt+26/32] common_interrupt+0x1a/0x20
> > Jan 30 22:47:00 :  [default_idle+0/44] default_idle+0x0/0x2c
> > Jan 30 22:47:00 :  [default_idle+35/44] default_idle+0x23/0x2c
> > Jan 30 22:47:00 :  [cpu_idle+123/140] cpu_idle+0x7b/0x8c
> > Jan 30 22:47:00 :  [rest_init+45/48] rest_init+0x2d/0x30
> > Jan 30 22:47:00 :  [start_kernel+402/404] start_kernel+0x192/0x194
> > Jan 30 22:47:00 : Mem-info:
> > Jan 30 22:47:00 : DMA per-cpu:
> > Jan 30 22:47:00 : cpu 0 hot: low 2, high 6, batch 1 used:2
> > Jan 30 22:47:00 : cpu 0 cold: low 0, high 2, batch 1 used:0
> > Jan 30 22:47:00 : Normal per-cpu:
> > Jan 30 22:47:00 : cpu 0 hot: low 62, high 186, batch 31 used:92
> > Jan 30 22:47:00 : cpu 0 cold: low 0, high 62, batch 31 used:59
> > Jan 30 22:47:00 : HighMem per-cpu: empty
> > Jan 30 22:47:00 : Free pages:        1724kB (0kB HighMem)
> > Jan 30 22:47:00 : Active:42014 inactive:17287 dirty:18 writeback:0
> > unstable:0 free:431 slab:2213 mapped:57515 pagetables:162
> > Jan 30 22:47:00 : DMA free:1004kB min:124kB low:152kB high:184kB
> > active:5396kB inactive:5872kB present:16384kB pages_scanned:0
> > all_unreclaimable? no
> > Jan 30 22:47:00 : lowmem_reserve[]: 0 239 239
> > Jan 30 22:47:00 : Normal free:720kB min:1916kB low:2392kB high:2872kB
> > active:162660kB inactive:63276kB present:245696kB pages_scanned:0
> > all_unreclaimable? no
> > Jan 30 22:47:00 : lowmem_reserve[]: 0 0 0
> > Jan 30 22:47:00 : HighMem free:0kB min:128kB low:160kB high:192kB
> > active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable?
> > no
> > Jan 30 22:47:00 : lowmem_reserve[]: 0 0 0
> > Jan 30 22:47:00 : DMA: 1*4kB 1*8kB 0*16kB 1*32kB 1*64kB 1*128kB
> > 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1004kB
> > Jan 30 22:47:00 : Normal: 0*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB
> > 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 720kB
> >
> > Jan 30 22:47:00 : HighMem: empty
> > Jan 30 22:47:00 : Swap cache: add 2750, delete 2255, find 199/211,
> race
> > 0+0
> > Jan 30 22:47:00 : Free swap  = 2086600kB
> > Jan 30 22:47:00 : Total swap = 2097136kB
> > Jan 30 22:47:00 : Free swap:       2086600kB
> > Jan 30 22:47:00 : 65520 pages of RAM
> > Jan 30 22:47:00 : 0 pages of HIGHMEM
> > Jan 30 22:47:00 : 2608 reserved pages
> > Jan 30 22:47:00 : 3922 pages shared
> > Jan 30 22:47:00 : 495 pages swap cached
> > Jan 30 22:47:00 : 18 pages dirty
> > Jan 30 22:47:00 : 0 pages writeback
> > Jan 30 22:47:00 : 57515 pages mapped
> > Jan 30 22:47:00 : 2213 pages slab
> > Jan 30 22:47:00 : 162 pages pagetables
> > Jan 30 22:47:02 : snort2: page allocation failure. order:0, mode:0x20
> > Jan 30 22:47:02 :  [__alloc_pages+936/976] __alloc_pages+0x3a8/0x3d0
> > Jan 30 22:47:02 :  [kmem_getpages+44/132] kmem_getpages+0x2c/0x84
> > Jan 30 22:47:02 :  [cache_grow+163/316] cache_grow+0xa3/0x13c
> > Jan 30 22:47:02 :  [cache_alloc_refill+415/488]
> > cache_alloc_refill+0x19f/0x1e8
> > Jan 30 22:47:02 :  [__kmalloc+115/136] __kmalloc+0x73/0x88
> > Jan 30 22:47:02 :  [__alloc_skb+88/292] __alloc_skb+0x58/0x124
> > Jan 30 22:47:02 :  [pg0+273725486/1069114368] e100_poll+0x302/0x6bc
> [e100]
> > Jan 30 22:47:02 :  [net_rx_action+128/296] net_rx_action+0x80/0x128
> > Jan 30 22:47:02 :  [gcc2_compiled.+141/256] __do_softirq+0x8d/0x100
> > Jan 30 22:47:02 :  [do_softirq+47/52] do_softirq+0x2f/0x34
> > Jan 30 22:47:02 :  [irq_exit+52/64] irq_exit+0x34/0x40
> > Jan 30 22:47:02 :  [gcc2_compiled.+32/40] do_IRQ+0x20/0x28
> > Jan 30 22:47:02 :  [common_interrupt+26/32] common_interrupt+0x1a/0x20
> > Jan 30 22:47:02 : Mem-info:
> > Jan 30 22:47:02 : DMA per-cpu:
> > Jan 30 22:47:02 : cpu 0 hot: low 2, high 6, batch 1 used:2
> > Jan 30 22:47:02 : cpu 0 cold: low 0, high 2, batch 1 used:1
> > Jan 30 22:47:02 : Normal per-cpu:
> > Jan 30 22:47:02 : cpu 0 hot: low 62, high 186, batch 31 used:92
> > Jan 30 22:47:02 : cpu 0 cold: low 0, high 62, batch 31 used:44
> > Jan 30 22:47:02 : HighMem per-cpu: empty
> > Jan 30 22:47:02 : Free pages:        1616kB (0kB HighMem)
> > Jan 30 22:47:02 : Active:26538 inactive:14460 dirty:0 writeback:1337
> > unstable:0 free:404 slab:20545 mapped:26431 pagetables:168
> > Jan 30 22:47:02 : DMA free:1004kB min:124kB low:152kB high:184kB
> > active:2768kB inactive:2032kB present:16384kB pages_scanned:0
> > all_unreclaimable? no
> > Jan 30 22:47:02 : lowmem_reserve[]: 0 239 239
> > Jan 30 22:47:02 : Normal free:612kB min:1916kB low:2392kB high:2872kB
> > active:103384kB inactive:55808kB present:245696kB pages_scanned:0
> > all_unreclaimable? no
> > Jan 30 22:47:02 : lowmem_reserve[]: 0 0 0
> > Jan 30 22:47:02 : HighMem free:0kB min:128kB low:160kB high:192kB
> > active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable?
> > no
> > Jan 30 22:47:02 : lowmem_reserve[]: 0 0 0
> > Jan 30 22:47:02 : DMA: 1*4kB 1*8kB 0*16kB 1*32kB 1*64kB 1*128kB
> > 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1004kB
> > Jan 30 22:47:02 : Normal: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB
> > 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 612kB
> > Jan 30 22:47:02 : HighMem: empty
> > Jan 30 22:47:02 : Swap cache: add 34195, delete 20265, find 291/345,
> race
> > 0+0
> > Jan 30 22:47:02 : Free swap  = 1962140kB
> > Jan 30 22:47:02 : Total swap = 2097136kB
> > Jan 30 22:47:02 : Free swap:       1962140kB
> > Jan 30 22:47:02 : 65520 pages of RAM
> > Jan 30 22:47:02 : 0 pages of HIGHMEM
> > Jan 30 22:47:02 : 2608 reserved pages
> > Jan 30 22:47:02 : 3785 pages shared
> > Jan 30 22:47:02 : 13930 pages swap cached
> > Jan 30 22:47:02 : 0 pages dirty
> > Jan 30 22:47:02 : 1337 pages writeback
> > Jan 30 22:47:02 : 26431 pages mapped
> > Jan 30 22:47:02 : 20545 pages slab
> > Jan 30 22:47:02 : 168 pages pagetables
> >
> >
> > -------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc. Do you grep through log
> > files
> > for problems?  Stop!  Download the new AJAX search engine that makes
> > searching your log files as easy as surfing the  web.  DOWNLOAD
> SPLUNK!
> > http://sel.as-us.falkag.net/sel?cmd=k&kid\x103432&bid#0486&dat\x121642
> > _______________________________________________
> > E1000-devel mailing list
> > E1000-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/e1000-devel
>


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid\x103432&bid#0486&dat\x121642

^ permalink raw reply

* Re: Re: [GIT PULL] bcm43xx: Update B6PHY initialization
From: Michael Buesch @ 2006-02-01 13:55 UTC (permalink / raw)
  To: Danny van Dyk
  Cc: bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w, netdev-u79uwXL29TY76Z2rM5mHXA,
	bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w
In-Reply-To: <200602011051.00973.kugelfang-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 3763 bytes --]

On Wednesday 01 February 2006 10:51, Danny van Dyk wrote:
> John, please
> 
> git pull rsync://pitr.amd64.dev.gentoo.org/kugelfang/wireless-2.6.git
> 
> which will provide this changeset:
> 
> Danny van Dyk:
>       [bcm43xx] Sync bcm43xx_phy_initb6() with specs

Danny, _please_ make sure to apply patches to the softmac _and_ the dscape
branch. Otherwise patches will get lost.
It is very important to keep both branches in sync.

I know that this is annoying work, but we must stick with it, until
we have a usable 80211 stack in the kernel.
I also know that 99% of the people are not interrested in the dscape
branch, but it is important to keep it working. The only way to find
weaknesses in d80211 is to maintain drivers which use it.

Besides that, you can also request a pull from me. The git snapshots
are generated from my tree, so this is the fastest way to get your
changes into the snapshots. But that's entirely your decision.

> diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_phy.c 
> b/drivers/net/wireless/bcm43xx/bcm43xx_phy.c
> index f5e7a6a..d90f207 100644
> --- a/drivers/net/wireless/bcm43xx/bcm43xx_phy.c
> +++ b/drivers/net/wireless/bcm43xx/bcm43xx_phy.c
> @@ -947,7 +947,7 @@ static void bcm43xx_phy_initb6(struct bc
>  	bcm43xx_radio_write16(bcm, 0x0050, 0x0020);
>  	if ((bcm->current_core->radio->manufact == 0x17F) &&
>  	    (bcm->current_core->radio->version == 0x2050) &&
> -	    (bcm->current_core->radio->revision == 2)) {
> +	    (bcm->current_core->radio->revision <= 2)) {
>  		bcm43xx_radio_write16(bcm, 0x0050, 0x0020);
>  		bcm43xx_radio_write16(bcm, 0x005A, 0x0070);
>  		bcm43xx_radio_write16(bcm, 0x005B, 0x007B);
> @@ -984,10 +984,15 @@ static void bcm43xx_phy_initb6(struct bc
>  		bcm43xx_write16(bcm, 0x03E4, 0x0009);
>  	if (phy->type == BCM43xx_PHYTYPE_B) {
>  		bcm43xx_write16(bcm, 0x03E6, 0x8140);
> -		bcm43xx_phy_write(bcm, 0x0016, 0x5410);
> -		bcm43xx_phy_write(bcm, 0x0017, 0xA820);
> -		bcm43xx_phy_write(bcm, 0x0007, 0x0062);
> -		TODO();//TODO: calibrate stuff.
> +		bcm43xx_phy_write(bcm, 0x0016, 0x0410);
> +		bcm43xx_phy_write(bcm, 0x0017, 0x0820);
> +		bcm43xx_phy_write(bcm, 0x0062, 0x0007);
> +		(void) bcm43xx_radio_calibrationvalue(bcm);
> +		bcm43xx_phy_lo_b_measure(bcm);
> +		if (bcm->sprom.boardflags & BCM43xx_BFL_RSSI) {
> +			bcm43xx_calc_nrssi_slope(bcm);
> +			bcm43xx_calc_nrssi_threshold(bcm);
> +		}
>  		bcm43xx_phy_init_pctl(bcm);
>  	} else
>  		bcm43xx_write16(bcm, 0x03E6, 0x0);
> diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_radio.c 
> b/drivers/net/wireless/bcm43xx/bcm43xx_radio.c
> index 5ce6ace..3901aa9 100644
> --- a/drivers/net/wireless/bcm43xx/bcm43xx_radio.c
> +++ b/drivers/net/wireless/bcm43xx/bcm43xx_radio.c
> @@ -1184,7 +1184,7 @@ int bcm43xx_radio_set_interference_mitig
>  	return 0;
>  }
>  
> -static u16 bcm43xx_radio_calibrationvalue(struct bcm43xx_private *bcm)
> +u16 bcm43xx_radio_calibrationvalue(struct bcm43xx_private *bcm)
>  {
>  	u16 reg, index, ret;
>  
> diff --git a/drivers/net/wireless/bcm43xx/bcm43xx_radio.h 
> b/drivers/net/wireless/bcm43xx/bcm43xx_radio.h
> index 89fe292..a5d2e10 100644
> --- a/drivers/net/wireless/bcm43xx/bcm43xx_radio.h
> +++ b/drivers/net/wireless/bcm43xx/bcm43xx_radio.h
> @@ -89,5 +89,6 @@ void bcm43xx_nrssi_hw_update(struct bcm4
>  void bcm43xx_nrssi_mem_update(struct bcm43xx_private *bcm);
>  
>  void bcm43xx_radio_set_tx_iq(struct bcm43xx_private *bcm);
> +u16 bcm43xx_radio_calibrationvalue(struct bcm43xx_private *bcm);
>  
>  #endif /* BCM43xx_RADIO_H_ */
> -- 
> Danny van Dyk <kugelfang-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
> Gentoo/AMD64 Project, Gentoo Scientific Project
> 

-- 
Greetings Michael.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: [RFC] Backward compatibility and WAN netdev configuration
From: Marco d'Itri @ 2006-02-01 14:49 UTC (permalink / raw)
  To: Krzysztof Halasa; +Cc: netdev, lkml
In-Reply-To: <m3psm7tksr.fsf@defiant.localdomain>

On Feb 01, Krzysztof Halasa <khc@pm.waw.pl> wrote:

> a) Currently it consists of mid-layer WAN protocols single module (Cisco
>    HDLC, FR etc.) + low-level hardware HDLC card driver (C101, N2, PCI200SYN
>    etc.). I'm thinking about splitting the protocol module into separate
>    modules - it would make them independent, users would be able to
>    load, say, FR without PPP or X.25 and underlying syncppp, lapb etc.
>    From the technical POV it would be superior to current code but it
>    would require sysadmins to change modprobe.conf, add another modprobe
>    or something like that. Not a real problem but the upgrade can't be
>    automatic.
Why you cannot support autoloading the modules when a specific protocol
is needed?

-- 
ciao,
Marco

^ permalink raw reply

* Re: [RFC] Backward compatibility and WAN netdev configuration
From: Krzysztof Halasa @ 2006-02-01 18:33 UTC (permalink / raw)
  To: Marco d'Itri; +Cc: netdev, lkml
In-Reply-To: <20060201144917.GA7644@wonderland.linux.it>

md@Linux.IT (Marco d'Itri) writes:

> Why you cannot support autoloading the modules when a specific protocol
> is needed?

I probably could but it would complicate things a bit - currently only
the protocol module knows about existence of its protocol.

I will look at it, though. Thanks.
-- 
Krzysztof Halasa

^ permalink raw reply

* Re: [RFC] Backward compatibility and WAN netdev configuration
From: Stephen Hemminger @ 2006-02-01 18:39 UTC (permalink / raw)
  To: Krzysztof Halasa; +Cc: Marco d'Itri, netdev, lkml
In-Reply-To: <m3r76n3p4k.fsf@defiant.localdomain>

On Wed, 01 Feb 2006 19:33:47 +0100
Krzysztof Halasa <khc@pm.waw.pl> wrote:

> md@Linux.IT (Marco d'Itri) writes:
> 
> > Why you cannot support autoloading the modules when a specific protocol
> > is needed?
> 
> I probably could but it would complicate things a bit - currently only
> the protocol module knows about existence of its protocol.
> 
> I will look at it, though. Thanks.

The modern way is to not have any entries in modprobe.conf, and do all
the module loading via kernel module_aliases. Modprobe.conf is then
reserved for handling workarounds for special cases
-- 
Stephen Hemminger <shemminger@osdl.org>
OSDL http://developer.osdl.org/~shemminger

^ permalink raw reply

* Re: Re: [GIT PULL] bcm43xx: Update B6PHY initialization
From: Danny van Dyk @ 2006-02-01 18:54 UTC (permalink / raw)
  To: John Linville
  Cc: Michael Buesch, bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <200602011455.34307.mbuesch-KuiJ5kEpwI6ELgA04lAiVw@public.gmane.org>

John,

> On Wednesday 01 February 2006 10:51, Danny van Dyk wrote:
> > John, please
> >
> > git pull rsync://pitr.amd64.dev.gentoo.org/kugelfang/wireless-2.6.git
> >
> > which will provide this changeset:
> >
> > Danny van Dyk:
> >       [bcm43xx] Sync bcm43xx_phy_initb6() with specs
>
> Danny, _please_ make sure to apply patches to the softmac _and_ the dscape
> branch. Otherwise patches will get lost.
> It is very important to keep both branches in sync.

I just added the very same patch to the dscape-all branch in my repo. Please

    git pull rsync://pitr.amd64.dev.gentoo.org/kugelfang/wireless-2.6.git

I don't post the patch again, as it is really the same as the previously sent 
patch

When applying it, I noticed that the dscape branch's bcm43xx is differently 
named than the softmac branch's. Why ? :-)

Danny
-- 
Danny van Dyk <kugelfang-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
Gentoo/AMD64 Project, Gentoo Scientific Project

^ permalink raw reply

* Re: Re: [GIT PULL] bcm43xx: Update B6PHY initialization
From: Michael Buesch @ 2006-02-01 18:56 UTC (permalink / raw)
  To: Danny van Dyk
  Cc: John Linville, bcm43xx-dev-0fE9KPoRgkgATYTw5x5z8w,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <200602011954.52325.kugelfang-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 268 bytes --]

On Wednesday 01 February 2006 19:54, you wrote:
> When applying it, I noticed that the dscape branch's bcm43xx is differently 
> named than the softmac branch's. Why ? :-)

Both branches can co-exist in one tree (see branch "domesday")

-- 
Greetings Michael.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* [PATCH 0/5] Infiniband: connection abstraction
From: Sean Hefty @ 2006-02-01 20:03 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: openib-general

Here's an updated version of these patches based on feedback.   (The license
did not change and continues to match that of the other Infiniband code.)
Please consider for inclusion in 2.6.17.

The following set of patches defines a connection abstraction for Infiniband and
other RDMA devices, and serves several purposes:

* It implements a connection protocol over Infiniband based on IP addressing.
This greatly simplifies clients wishing to establish connections over
Infiniband.

* It defines a connection abstraction that works over multiple RDMA devices.
The submitted implementation targets Infiniband, but has been tested over other
RDMA devices as well.

* It handles RDMA device insertion and removal on behalf of its clients.

The changes have been broken into 5 separate patches.  The basic purpose of each
patch is:

1. Provide common handling for marshalling data between userspace clients and
kernel mode Infiniband drivers.

2. Extend the Infiniband CM to include private data comparisons as part of its
connection request matching process.

3. Provide an address translation service that maps IP addresses to Infiniband
addresses (GIDs).  This patch touches outside of the Infiniband core, so I'm
including the netdev mailing list.

4. Implement the kernel mode RDMA connection management agent.

5. Implement the userspace RDMA connection management agent kernel support
module.

Please copy the openib-general mailing list on any replies.

Thanks,
Sean

^ permalink raw reply

* [PATCH 1/5] Infiniband: connection abstraction
From: Sean Hefty @ 2006-02-01 20:07 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: openib-general
In-Reply-To: <ORSMSX401KZmcK8r6be00000097@orsmsx401.amr.corp.intel.com>

The following patch provides common handling for marshalling data between
Userspace clients and kernel mode Infiniband drivers.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

---

diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/Makefile 
linux-2.6.ib/drivers/infiniband/core/Makefile
--- linux-2.6.git/drivers/infiniband/core/Makefile	2006-01-16 10:25:27.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/Makefile	2006-01-16 15:34:15.000000000 -0800
@@ -16,4 +16,5 @@ ib_umad-y :=			user_mad.o
 
 ib_ucm-y :=			ucm.o
 
-ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_mem.o
+ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_mem.o \
+				uverbs_marshall.o
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/ucm.c 
linux-2.6.ib/drivers/infiniband/core/ucm.c
--- linux-2.6.git/drivers/infiniband/core/ucm.c	2006-01-16 10:25:26.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/ucm.c	2006-01-16 15:34:15.000000000 -0800
@@ -30,7 +30,7 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  *
- * $Id: ucm.c 2594 2005-06-13 19:46:02Z libor $
+ * $Id: ucm.c 4311 2005-12-05 18:42:01Z sean.hefty $
  */
 #include <linux/init.h>
 #include <linux/fs.h>
@@ -48,6 +48,7 @@
 
 #include <rdma/ib_cm.h>
 #include <rdma/ib_user_cm.h>
+#include <rdma/ib_marshall.h>
 
 MODULE_AUTHOR("Libor Michalek");
 MODULE_DESCRIPTION("InfiniBand userspace Connection Manager access");
@@ -203,36 +204,6 @@ error:
 	return NULL;
 }
 
-static void ib_ucm_event_path_get(struct ib_ucm_path_rec *upath,
-				  struct ib_sa_path_rec	 *kpath)
-{
-	if (!kpath || !upath)
-		return;
-
-	memcpy(upath->dgid, kpath->dgid.raw, sizeof *upath->dgid);
-	memcpy(upath->sgid, kpath->sgid.raw, sizeof *upath->sgid);
-
-	upath->dlid             = kpath->dlid;
-	upath->slid             = kpath->slid;
-	upath->raw_traffic      = kpath->raw_traffic;
-	upath->flow_label       = kpath->flow_label;
-	upath->hop_limit        = kpath->hop_limit;
-	upath->traffic_class    = kpath->traffic_class;
-	upath->reversible       = kpath->reversible;
-	upath->numb_path        = kpath->numb_path;
-	upath->pkey             = kpath->pkey;
-	upath->sl	        = kpath->sl;
-	upath->mtu_selector     = kpath->mtu_selector;
-	upath->mtu              = kpath->mtu;
-	upath->rate_selector    = kpath->rate_selector;
-	upath->rate             = kpath->rate;
-	upath->packet_life_time = kpath->packet_life_time;
-	upath->preference       = kpath->preference;
-
-	upath->packet_life_time_selector =
-		kpath->packet_life_time_selector;
-}
-
 static void ib_ucm_event_req_get(struct ib_ucm_req_event_resp *ureq,
 				 struct ib_cm_req_event_param *kreq)
 {
@@ -251,8 +222,10 @@ static void ib_ucm_event_req_get(struct 
 	ureq->srq                        = kreq->srq;
 	ureq->port			 = kreq->port;
 
-	ib_ucm_event_path_get(&ureq->primary_path, kreq->primary_path);
-	ib_ucm_event_path_get(&ureq->alternate_path, kreq->alternate_path);
+	ib_copy_path_rec_to_user(&ureq->primary_path, kreq->primary_path);
+	if (kreq->alternate_path)
+		ib_copy_path_rec_to_user(&ureq->alternate_path,
+					 kreq->alternate_path);
 }
 
 static void ib_ucm_event_rep_get(struct ib_ucm_rep_event_resp *urep,
@@ -322,8 +295,8 @@ static int ib_ucm_event_process(struct i
 		info	      = evt->param.rej_rcvd.ari;
 		break;
 	case IB_CM_LAP_RECEIVED:
-		ib_ucm_event_path_get(&uvt->resp.u.lap_resp.path,
-				      evt->param.lap_rcvd.alternate_path);
+		ib_copy_path_rec_to_user(&uvt->resp.u.lap_resp.path,
+					 evt->param.lap_rcvd.alternate_path);
 		uvt->data_len = IB_CM_LAP_PRIVATE_DATA_SIZE;
 		uvt->resp.present = IB_UCM_PRES_ALTERNATE;
 		break;
@@ -635,65 +608,11 @@ static ssize_t ib_ucm_attr_id(struct ib_
 	return result;
 }
 
-static void ib_ucm_copy_ah_attr(struct ib_ucm_ah_attr *dest_attr,
-				struct ib_ah_attr *src_attr)
-{
-	memcpy(dest_attr->grh_dgid, src_attr->grh.dgid.raw,
-	       sizeof src_attr->grh.dgid);
-	dest_attr->grh_flow_label = src_attr->grh.flow_label;
-	dest_attr->grh_sgid_index = src_attr->grh.sgid_index;
-	dest_attr->grh_hop_limit = src_attr->grh.hop_limit;
-	dest_attr->grh_traffic_class = src_attr->grh.traffic_class;
-
-	dest_attr->dlid = src_attr->dlid;
-	dest_attr->sl = src_attr->sl;
-	dest_attr->src_path_bits = src_attr->src_path_bits;
-	dest_attr->static_rate = src_attr->static_rate;
-	dest_attr->is_global = (src_attr->ah_flags & IB_AH_GRH);
-	dest_attr->port_num = src_attr->port_num;
-}
-
-static void ib_ucm_copy_qp_attr(struct ib_ucm_init_qp_attr_resp *dest_attr,
-				struct ib_qp_attr *src_attr)
-{
-	dest_attr->cur_qp_state = src_attr->cur_qp_state;
-	dest_attr->path_mtu = src_attr->path_mtu;
-	dest_attr->path_mig_state = src_attr->path_mig_state;
-	dest_attr->qkey = src_attr->qkey;
-	dest_attr->rq_psn = src_attr->rq_psn;
-	dest_attr->sq_psn = src_attr->sq_psn;
-	dest_attr->dest_qp_num = src_attr->dest_qp_num;
-	dest_attr->qp_access_flags = src_attr->qp_access_flags;
-
-	dest_attr->max_send_wr = src_attr->cap.max_send_wr;
-	dest_attr->max_recv_wr = src_attr->cap.max_recv_wr;
-	dest_attr->max_send_sge = src_attr->cap.max_send_sge;
-	dest_attr->max_recv_sge = src_attr->cap.max_recv_sge;
-	dest_attr->max_inline_data = src_attr->cap.max_inline_data;
-
-	ib_ucm_copy_ah_attr(&dest_attr->ah_attr, &src_attr->ah_attr);
-	ib_ucm_copy_ah_attr(&dest_attr->alt_ah_attr, &src_attr->alt_ah_attr);
-
-	dest_attr->pkey_index = src_attr->pkey_index;
-	dest_attr->alt_pkey_index = src_attr->alt_pkey_index;
-	dest_attr->en_sqd_async_notify = src_attr->en_sqd_async_notify;
-	dest_attr->sq_draining = src_attr->sq_draining;
-	dest_attr->max_rd_atomic = src_attr->max_rd_atomic;
-	dest_attr->max_dest_rd_atomic = src_attr->max_dest_rd_atomic;
-	dest_attr->min_rnr_timer = src_attr->min_rnr_timer;
-	dest_attr->port_num = src_attr->port_num;
-	dest_attr->timeout = src_attr->timeout;
-	dest_attr->retry_cnt = src_attr->retry_cnt;
-	dest_attr->rnr_retry = src_attr->rnr_retry;
-	dest_attr->alt_port_num = src_attr->alt_port_num;
-	dest_attr->alt_timeout = src_attr->alt_timeout;
-}
-
 static ssize_t ib_ucm_init_qp_attr(struct ib_ucm_file *file,
 				   const char __user *inbuf,
 				   int in_len, int out_len)
 {
-	struct ib_ucm_init_qp_attr_resp resp;
+	struct ib_uverbs_qp_attr resp;
 	struct ib_ucm_init_qp_attr cmd;
 	struct ib_ucm_context *ctx;
 	struct ib_qp_attr qp_attr;
@@ -716,7 +635,7 @@ static ssize_t ib_ucm_init_qp_attr(struc
 	if (result)
 		goto out;
 
-	ib_ucm_copy_qp_attr(&resp, &qp_attr);
+	ib_copy_qp_attr_to_user(&resp, &qp_attr);
 
 	if (copy_to_user((void __user *)(unsigned long)cmd.response,
 			 &resp, sizeof(resp)))
@@ -791,7 +710,7 @@ static int ib_ucm_alloc_data(const void 
 
 static int ib_ucm_path_get(struct ib_sa_path_rec **path, u64 src)
 {
-	struct ib_ucm_path_rec ucm_path;
+	struct ib_user_path_rec upath;
 	struct ib_sa_path_rec  *sa_path;
 
 	*path = NULL;
@@ -803,36 +722,14 @@ static int ib_ucm_path_get(struct ib_sa_
 	if (!sa_path)
 		return -ENOMEM;
 
-	if (copy_from_user(&ucm_path, (void __user *)(unsigned long)src,
-			   sizeof(ucm_path))) {
+	if (copy_from_user(&upath, (void __user *)(unsigned long)src,
+			   sizeof(upath))) {
 
 		kfree(sa_path);
 		return -EFAULT;
 	}
 
-	memcpy(sa_path->dgid.raw, ucm_path.dgid, sizeof sa_path->dgid);
-	memcpy(sa_path->sgid.raw, ucm_path.sgid, sizeof sa_path->sgid);
-
-	sa_path->dlid	          = ucm_path.dlid;
-	sa_path->slid	          = ucm_path.slid;
-	sa_path->raw_traffic      = ucm_path.raw_traffic;
-	sa_path->flow_label       = ucm_path.flow_label;
-	sa_path->hop_limit        = ucm_path.hop_limit;
-	sa_path->traffic_class    = ucm_path.traffic_class;
-	sa_path->reversible       = ucm_path.reversible;
-	sa_path->numb_path        = ucm_path.numb_path;
-	sa_path->pkey             = ucm_path.pkey;
-	sa_path->sl               = ucm_path.sl;
-	sa_path->mtu_selector     = ucm_path.mtu_selector;
-	sa_path->mtu              = ucm_path.mtu;
-	sa_path->rate_selector    = ucm_path.rate_selector;
-	sa_path->rate             = ucm_path.rate;
-	sa_path->packet_life_time = ucm_path.packet_life_time;
-	sa_path->preference       = ucm_path.preference;
-
-	sa_path->packet_life_time_selector =
-		ucm_path.packet_life_time_selector;
-
+	ib_copy_path_rec_from_user(sa_path, &upath);
 	*path = sa_path;
 	return 0;
 }
@@ -1243,8 +1140,10 @@ static unsigned int ib_ucm_poll(struct f
 
 	poll_wait(filp, &file->poll_wait, wait);
 
+	down(&file->mutex);
 	if (!list_empty(&file->events))
 		mask = POLLIN | POLLRDNORM;
+	up(&file->mutex);
 
 	return mask;
 }
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/uverbs_marshall.c 
linux-2.6.ib/drivers/infiniband/core/uverbs_marshall.c
--- linux-2.6.git/drivers/infiniband/core/uverbs_marshall.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/uverbs_marshall.c	2006-01-16 15:34:15.000000000 -0800
@@ -0,0 +1,138 @@
+/*
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <rdma/ib_marshall.h>
+
+static void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst,
+				    struct ib_ah_attr *src)
+{
+	memcpy(dst->grh.dgid, src->grh.dgid.raw, sizeof src->grh.dgid);
+	dst->grh.flow_label        = src->grh.flow_label;
+	dst->grh.sgid_index        = src->grh.sgid_index;
+	dst->grh.hop_limit         = src->grh.hop_limit;
+	dst->grh.traffic_class     = src->grh.traffic_class;
+	dst->dlid 	    	   = src->dlid;
+	dst->sl   	    	   = src->sl;
+	dst->src_path_bits 	   = src->src_path_bits;
+	dst->static_rate   	   = src->static_rate;
+	dst->is_global             = src->ah_flags & IB_AH_GRH ? 1 : 0;
+	dst->port_num 	    	   = src->port_num;
+}
+
+void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
+			     struct ib_qp_attr *src)
+{
+	dst->cur_qp_state	= src->cur_qp_state;
+	dst->path_mtu		= src->path_mtu;
+	dst->path_mig_state	= src->path_mig_state;
+	dst->qkey		= src->qkey;
+	dst->rq_psn		= src->rq_psn;
+	dst->sq_psn		= src->sq_psn;
+	dst->dest_qp_num	= src->dest_qp_num;
+	dst->qp_access_flags	= src->qp_access_flags;
+
+	dst->max_send_wr	= src->cap.max_send_wr;
+	dst->max_recv_wr	= src->cap.max_recv_wr;
+	dst->max_send_sge	= src->cap.max_send_sge;
+	dst->max_recv_sge	= src->cap.max_recv_sge;
+	dst->max_inline_data	= src->cap.max_inline_data;
+
+	ib_copy_ah_attr_to_user(&dst->ah_attr, &src->ah_attr);
+	ib_copy_ah_attr_to_user(&dst->alt_ah_attr, &src->alt_ah_attr);
+
+	dst->pkey_index		= src->pkey_index;
+	dst->alt_pkey_index	= src->alt_pkey_index;
+	dst->en_sqd_async_notify = src->en_sqd_async_notify;
+	dst->sq_draining	= src->sq_draining;
+	dst->max_rd_atomic	= src->max_rd_atomic;
+	dst->max_dest_rd_atomic	= src->max_dest_rd_atomic;
+	dst->min_rnr_timer	= src->min_rnr_timer;
+	dst->port_num		= src->port_num;
+	dst->timeout		= src->timeout;
+	dst->retry_cnt		= src->retry_cnt;
+	dst->rnr_retry		= src->rnr_retry;
+	dst->alt_port_num	= src->alt_port_num;
+	dst->alt_timeout	= src->alt_timeout;
+}
+EXPORT_SYMBOL(ib_copy_qp_attr_to_user);
+
+void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst,
+			      struct ib_sa_path_rec *src)
+{
+	memcpy(dst->dgid, src->dgid.raw, sizeof src->dgid);
+	memcpy(dst->sgid, src->sgid.raw, sizeof src->sgid);
+
+	dst->dlid		= src->dlid;
+	dst->slid		= src->slid;
+	dst->raw_traffic	= src->raw_traffic;
+	dst->flow_label		= src->flow_label;
+	dst->hop_limit		= src->hop_limit;
+	dst->traffic_class	= src->traffic_class;
+	dst->reversible		= src->reversible;
+	dst->numb_path		= src->numb_path;
+	dst->pkey		= src->pkey;
+	dst->sl			= src->sl;
+	dst->mtu_selector	= src->mtu_selector;
+	dst->mtu		= src->mtu;
+	dst->rate_selector	= src->rate_selector;
+	dst->rate		= src->rate;
+	dst->packet_life_time	= src->packet_life_time;
+	dst->preference		= src->preference;
+	dst->packet_life_time_selector = src->packet_life_time_selector;
+}
+EXPORT_SYMBOL(ib_copy_path_rec_to_user);
+
+void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
+				struct ib_user_path_rec *src)
+{
+	memcpy(dst->dgid.raw, src->dgid, sizeof dst->dgid);
+	memcpy(dst->sgid.raw, src->sgid, sizeof dst->sgid);
+
+	dst->dlid		= src->dlid;
+	dst->slid		= src->slid;
+	dst->raw_traffic	= src->raw_traffic;
+	dst->flow_label		= src->flow_label;
+	dst->hop_limit		= src->hop_limit;
+	dst->traffic_class	= src->traffic_class;
+	dst->reversible		= src->reversible;
+	dst->numb_path		= src->numb_path;
+	dst->pkey		= src->pkey;
+	dst->sl			= src->sl;
+	dst->mtu_selector	= src->mtu_selector;
+	dst->mtu		= src->mtu;
+	dst->rate_selector	= src->rate_selector;
+	dst->rate		= src->rate;
+	dst->packet_life_time	= src->packet_life_time;
+	dst->preference		= src->preference;
+	dst->packet_life_time_selector = src->packet_life_time_selector;
+}
+EXPORT_SYMBOL(ib_copy_path_rec_from_user);
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/include/rdma/ib_marshall.h 
linux-2.6.ib/include/rdma/ib_marshall.h
--- linux-2.6.git/include/rdma/ib_marshall.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_marshall.h	2006-01-16 15:34:15.000000000 -0800
@@ -0,0 +1,50 @@
+/*
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(IB_USER_MARSHALL_H)
+#define IB_USER_MARSHALL_H
+
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_sa.h>
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_user_sa.h>
+
+void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
+			     struct ib_qp_attr *src);
+
+void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst,
+			      struct ib_sa_path_rec *src);
+
+void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
+				struct ib_user_path_rec *src);
+
+#endif /* IB_USER_MARSHALL_H */
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/include/rdma/ib_user_cm.h 
linux-2.6.ib/include/rdma/ib_user_cm.h
--- linux-2.6.git/include/rdma/ib_user_cm.h	2006-01-16 10:26:47.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_user_cm.h	2006-01-16 15:34:15.000000000 -0800
@@ -30,13 +30,13 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  *
- * $Id: ib_user_cm.h 2576 2005-06-09 17:00:30Z libor $
+ * $Id: ib_user_cm.h 4019 2005-11-11 00:33:09Z sean.hefty $
  */
 
 #ifndef IB_USER_CM_H
 #define IB_USER_CM_H
 
-#include <linux/types.h>
+#include <rdma/ib_user_sa.h>
 
 #define IB_USER_CM_ABI_VERSION 4
 
@@ -110,58 +110,6 @@ struct ib_ucm_init_qp_attr {
 	__u32 qp_state;
 };
 
-struct ib_ucm_ah_attr {
-	__u8	grh_dgid[16];
-	__u32	grh_flow_label;
-	__u16	dlid;
-	__u16	reserved;
-	__u8	grh_sgid_index;
-	__u8	grh_hop_limit;
-	__u8	grh_traffic_class;
-	__u8	sl;
-	__u8	src_path_bits;
-	__u8	static_rate;
-	__u8	is_global;
-	__u8	port_num;
-};
-
-struct ib_ucm_init_qp_attr_resp {
-	__u32	qp_attr_mask;
-	__u32	qp_state;
-	__u32	cur_qp_state;
-	__u32	path_mtu;
-	__u32	path_mig_state;
-	__u32	qkey;
-	__u32	rq_psn;
-	__u32	sq_psn;
-	__u32	dest_qp_num;
-	__u32	qp_access_flags;
-
-	struct ib_ucm_ah_attr	ah_attr;
-	struct ib_ucm_ah_attr	alt_ah_attr;
-
-	/* ib_qp_cap */
-	__u32	max_send_wr;
-	__u32	max_recv_wr;
-	__u32	max_send_sge;
-	__u32	max_recv_sge;
-	__u32	max_inline_data;
-
-	__u16	pkey_index;
-	__u16	alt_pkey_index;
-	__u8	en_sqd_async_notify;
-	__u8	sq_draining;
-	__u8	max_rd_atomic;
-	__u8	max_dest_rd_atomic;
-	__u8	min_rnr_timer;
-	__u8	port_num;
-	__u8	timeout;
-	__u8	retry_cnt;
-	__u8	rnr_retry;
-	__u8	alt_port_num;
-	__u8	alt_timeout;
-};
-
 struct ib_ucm_listen {
 	__be64 service_id;
 	__be64 service_mask;
@@ -180,28 +128,6 @@ struct ib_ucm_private_data {
 	__u8  reserved[3];
 };
 
-struct ib_ucm_path_rec {
-	__u8  dgid[16];
-	__u8  sgid[16];
-	__be16 dlid;
-	__be16 slid;
-	__u32 raw_traffic;
-	__be32 flow_label;
-	__u32 reversible;
-	__u32 mtu;
-	__be16 pkey;
-	__u8  hop_limit;
-	__u8  traffic_class;
-	__u8  numb_path;
-	__u8  sl;
-	__u8  mtu_selector;
-	__u8  rate_selector;
-	__u8  rate;
-	__u8  packet_life_time_selector;
-	__u8  packet_life_time;
-	__u8  preference;
-};
-
 struct ib_ucm_req {
 	__u32 id;
 	__u32 qpn;
@@ -304,8 +230,8 @@ struct ib_ucm_event_get {
 };
 
 struct ib_ucm_req_event_resp {
-	struct ib_ucm_path_rec primary_path;
-	struct ib_ucm_path_rec alternate_path;
+	struct ib_user_path_rec primary_path;
+	struct ib_user_path_rec alternate_path;
 	__be64                 remote_ca_guid;
 	__u32                  remote_qkey;
 	__u32                  remote_qpn;
@@ -349,7 +275,7 @@ struct ib_ucm_mra_event_resp {
 };
 
 struct ib_ucm_lap_event_resp {
-	struct ib_ucm_path_rec path;
+	struct ib_user_path_rec path;
 };
 
 struct ib_ucm_apr_event_resp {
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/include/rdma/ib_user_sa.h 
linux-2.6.ib/include/rdma/ib_user_sa.h
--- linux-2.6.git/include/rdma/ib_user_sa.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_user_sa.h	2006-01-16 15:34:15.000000000 -0800
@@ -0,0 +1,60 @@
+/*
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef IB_USER_SA_H
+#define IB_USER_SA_H
+
+#include <linux/types.h>
+
+struct ib_user_path_rec {
+	__u8	dgid[16];
+	__u8	sgid[16];
+	__be16	dlid;
+	__be16	slid;
+	__u32	raw_traffic;
+	__be32	flow_label;
+	__u32	reversible;
+	__u32	mtu;
+	__be16	pkey;
+	__u8	hop_limit;
+	__u8	traffic_class;
+	__u8	numb_path;
+	__u8	sl;
+	__u8	mtu_selector;
+	__u8	rate_selector;
+	__u8	rate;
+	__u8	packet_life_time_selector;
+	__u8	packet_life_time;
+	__u8	preference;
+};
+
+#endif /* IB_USER_SA_H */
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/include/rdma/ib_user_verbs.h 
linux-2.6.ib/include/rdma/ib_user_verbs.h
--- linux-2.6.git/include/rdma/ib_user_verbs.h	2006-01-16 10:26:47.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_user_verbs.h	2006-01-16 15:34:15.000000000 -0800
@@ -31,7 +31,7 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  *
- * $Id: ib_user_verbs.h 2708 2005-06-24 17:27:21Z roland $
+ * $Id: ib_user_verbs.h 4019 2005-11-11 00:33:09Z sean.hefty $
  */
 
 #ifndef IB_USER_VERBS_H
@@ -311,6 +311,64 @@ struct ib_uverbs_destroy_cq_resp {
 	__u32 async_events_reported;
 };
 
+struct ib_uverbs_global_route {
+	__u8  dgid[16];
+	__u32 flow_label;    
+	__u8  sgid_index;
+	__u8  hop_limit;
+	__u8  traffic_class;
+	__u8  reserved;
+};
+
+struct ib_uverbs_ah_attr {
+	struct ib_uverbs_global_route grh;
+	__u16 dlid;
+	__u8  sl;
+	__u8  src_path_bits;
+	__u8  static_rate;
+	__u8  is_global;
+	__u8  port_num;
+	__u8  reserved;
+};
+
+struct ib_uverbs_qp_attr {
+	__u32	qp_attr_mask;
+	__u32	qp_state;
+	__u32	cur_qp_state;
+	__u32	path_mtu;
+	__u32	path_mig_state;
+	__u32	qkey;
+	__u32	rq_psn;
+	__u32	sq_psn;
+	__u32	dest_qp_num;
+	__u32	qp_access_flags;
+
+	struct ib_uverbs_ah_attr ah_attr;
+	struct ib_uverbs_ah_attr alt_ah_attr;
+
+	/* ib_qp_cap */
+	__u32	max_send_wr;
+	__u32	max_recv_wr;
+	__u32	max_send_sge;
+	__u32	max_recv_sge;
+	__u32	max_inline_data;
+
+	__u16	pkey_index;
+	__u16	alt_pkey_index;
+	__u8	en_sqd_async_notify;
+	__u8	sq_draining;
+	__u8	max_rd_atomic;
+	__u8	max_dest_rd_atomic;
+	__u8	min_rnr_timer;
+	__u8	port_num;
+	__u8	timeout;
+	__u8	retry_cnt;
+	__u8	rnr_retry;
+	__u8	alt_port_num;
+	__u8	alt_timeout;
+	__u8	reserved[5];
+};
+
 struct ib_uverbs_create_qp {
 	__u64 response;
 	__u64 user_handle;
@@ -487,26 +545,6 @@ struct ib_uverbs_post_srq_recv_resp {
 	__u32 bad_wr;
 };
 
-struct ib_uverbs_global_route {
-	__u8  dgid[16];
-	__u32 flow_label;    
-	__u8  sgid_index;
-	__u8  hop_limit;
-	__u8  traffic_class;
-	__u8  reserved;
-};
-
-struct ib_uverbs_ah_attr {
-	struct ib_uverbs_global_route grh;
-	__u16 dlid;
-	__u8  sl;
-	__u8  src_path_bits;
-	__u8  static_rate;
-	__u8  is_global;
-	__u8  port_num;
-	__u8  reserved;
-};
-
 struct ib_uverbs_create_ah {
 	__u64 response;
 	__u64 user_handle;

^ permalink raw reply

* [PATCH 2/5] Infiniband: connection abstraction
From: Sean Hefty @ 2006-02-01 20:10 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: openib-general
In-Reply-To: <ORSMSX401KZmcK8r6be00000097@orsmsx401.amr.corp.intel.com>

The following patch extends matching connection requests to listens in the
Infiniband CM to include private data.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

---

diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/cm.c 
linux-2.6.ib/drivers/infiniband/core/cm.c
--- linux-2.6.git/drivers/infiniband/core/cm.c	2006-01-16 10:25:26.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/cm.c	2006-01-16 16:03:35.000000000 -0800
@@ -32,7 +32,7 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  *
- * $Id: cm.c 2821 2005-07-08 17:07:28Z sean.hefty $
+ * $Id: cm.c 4311 2005-12-05 18:42:01Z sean.hefty $
  */
 #include <linux/dma-mapping.h>
 #include <linux/err.h>
@@ -130,6 +130,7 @@ struct cm_id_private {
 	/* todo: use alternate port on send failure */
 	struct cm_av av;
 	struct cm_av alt_av;
+	struct ib_cm_compare_data *compare_data;
 
 	void *private_data;
 	__be64 tid;
@@ -355,6 +356,41 @@ static struct cm_id_private * cm_acquire
 	return cm_id_priv;
 }
 
+static void cm_mask_copy(u8 *dst, u8 *src, u8 *mask)
+{
+	int i;
+
+	for (i = 0; i < IB_CM_COMPARE_SIZE / sizeof(unsigned long); i++)
+		((unsigned long *) dst)[i] = ((unsigned long *) src)[i] &
+					     ((unsigned long *) mask)[i];
+}
+
+static int cm_compare_data(struct ib_cm_compare_data *src_data,
+			   struct ib_cm_compare_data *dst_data)
+{
+	u8 src[IB_CM_COMPARE_SIZE];
+	u8 dst[IB_CM_COMPARE_SIZE];
+
+	if (!src_data || !dst_data)
+		return 0;
+	
+	cm_mask_copy(src, src_data->data, dst_data->mask);
+	cm_mask_copy(dst, dst_data->data, src_data->mask);
+	return memcmp(src, dst, IB_CM_COMPARE_SIZE);
+}
+
+static int cm_compare_private_data(u8 *private_data,
+				   struct ib_cm_compare_data *dst_data)
+{
+	u8 src[IB_CM_COMPARE_SIZE];
+
+	if (!dst_data)
+		return 0;
+	
+	cm_mask_copy(src, private_data, dst_data->mask);
+	return memcmp(src, dst_data->data, IB_CM_COMPARE_SIZE);
+}
+
 static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 {
 	struct rb_node **link = &cm.listen_service_table.rb_node;
@@ -362,14 +397,18 @@ static struct cm_id_private * cm_insert_
 	struct cm_id_private *cur_cm_id_priv;
 	__be64 service_id = cm_id_priv->id.service_id;
 	__be64 service_mask = cm_id_priv->id.service_mask;
+	int data_cmp;
 
 	while (*link) {
 		parent = *link;
 		cur_cm_id_priv = rb_entry(parent, struct cm_id_private,
 					  service_node);
+		data_cmp = cm_compare_data(cm_id_priv->compare_data,
+					   cur_cm_id_priv->compare_data);
 		if ((cur_cm_id_priv->id.service_mask & service_id) ==
 		    (service_mask & cur_cm_id_priv->id.service_id) &&
-		    (cm_id_priv->id.device == cur_cm_id_priv->id.device))
+		    (cm_id_priv->id.device == cur_cm_id_priv->id.device) &&
+		    !data_cmp)
 			return cur_cm_id_priv;
 
 		if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
@@ -378,6 +417,10 @@ static struct cm_id_private * cm_insert_
 			link = &(*link)->rb_right;
 		else if (service_id < cur_cm_id_priv->id.service_id)
 			link = &(*link)->rb_left;
+		else if (service_id > cur_cm_id_priv->id.service_id)
+			link = &(*link)->rb_right;
+		else if (data_cmp < 0)
+			link = &(*link)->rb_left;
 		else
 			link = &(*link)->rb_right;
 	}
@@ -387,16 +430,20 @@ static struct cm_id_private * cm_insert_
 }
 
 static struct cm_id_private * cm_find_listen(struct ib_device *device,
-					     __be64 service_id)
+					     __be64 service_id,
+					     u8 *private_data)
 {
 	struct rb_node *node = cm.listen_service_table.rb_node;
 	struct cm_id_private *cm_id_priv;
+	int data_cmp;
 
 	while (node) {
 		cm_id_priv = rb_entry(node, struct cm_id_private, service_node);
+		data_cmp = cm_compare_private_data(private_data,
+						   cm_id_priv->compare_data);
 		if ((cm_id_priv->id.service_mask & service_id) ==
 		     cm_id_priv->id.service_id &&
-		    (cm_id_priv->id.device == device))
+		    (cm_id_priv->id.device == device) && !data_cmp)
 			return cm_id_priv;
 
 		if (device < cm_id_priv->id.device)
@@ -405,6 +452,10 @@ static struct cm_id_private * cm_find_li
 			node = node->rb_right;
 		else if (service_id < cm_id_priv->id.service_id)
 			node = node->rb_left;
+		else if (service_id > cm_id_priv->id.service_id)
+			node = node->rb_right;
+		else if (data_cmp < 0)
+			node = node->rb_left;
 		else
 			node = node->rb_right;
 	}
@@ -728,15 +779,14 @@ retest:
 	wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount));
 	while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
 		cm_free_work(work);
-	if (cm_id_priv->private_data && cm_id_priv->private_data_len)
-		kfree(cm_id_priv->private_data);
+	kfree(cm_id_priv->compare_data);
+	kfree(cm_id_priv->private_data);
 	kfree(cm_id_priv);
 }
 EXPORT_SYMBOL(ib_destroy_cm_id);
 
-int ib_cm_listen(struct ib_cm_id *cm_id,
-		 __be64 service_id,
-		 __be64 service_mask)
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
+		 struct ib_cm_compare_data *compare_data)
 {
 	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
 	unsigned long flags;
@@ -750,7 +800,19 @@ int ib_cm_listen(struct ib_cm_id *cm_id,
 		return -EINVAL;
 
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
-	BUG_ON(cm_id->state != IB_CM_IDLE);
+	if (cm_id->state != IB_CM_IDLE)
+		return -EINVAL;
+
+	if (compare_data) {
+		cm_id_priv->compare_data = kzalloc(sizeof *compare_data,
+						   GFP_KERNEL);
+		if (!cm_id_priv->compare_data)
+			return -ENOMEM;
+		cm_mask_copy(cm_id_priv->compare_data->data,
+			     compare_data->data, compare_data->mask);
+		memcpy(cm_id_priv->compare_data->mask, compare_data->mask,
+		       IB_CM_COMPARE_SIZE);
+	}
 
 	cm_id->state = IB_CM_LISTEN;
 
@@ -767,6 +829,8 @@ int ib_cm_listen(struct ib_cm_id *cm_id,
 
 	if (cur_cm_id_priv) {
 		cm_id->state = IB_CM_IDLE;
+		kfree(cm_id_priv->compare_data);
+		cm_id_priv->compare_data = NULL;
 		ret = -EBUSY;
 	}
 	return ret;
@@ -1239,7 +1303,8 @@ static struct cm_id_private * cm_match_r
 
 	/* Find matching listen request. */
 	listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device,
-					   req_msg->service_id);
+					   req_msg->service_id,
+					   req_msg->private_data);
 	if (!listen_cm_id_priv) {
 		spin_unlock_irqrestore(&cm.lock, flags);
 		cm_issue_rej(work->port, work->mad_recv_wc,
@@ -2646,7 +2711,8 @@ static int cm_sidr_req_handler(struct cm
 		goto out; /* Duplicate message. */
 	}
 	cur_cm_id_priv = cm_find_listen(cm_id->device,
-					sidr_req_msg->service_id);
+					sidr_req_msg->service_id,
+					sidr_req_msg->private_data);
 	if (!cur_cm_id_priv) {
 		rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
 		spin_unlock_irqrestore(&cm.lock, flags);
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/ucm.c 
linux-2.6.ib/drivers/infiniband/core/ucm.c
--- linux-2.6.git/drivers/infiniband/core/ucm.c	2006-01-16 16:03:08.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/ucm.c	2006-01-16 16:03:35.000000000 -0800
@@ -646,6 +646,17 @@ out:
 	return result;
 }
 
+static int ucm_validate_listen(__be64 service_id, __be64 service_mask)
+{
+	service_id &= service_mask;
+
+	if (((service_id & IB_CMA_SERVICE_ID_MASK) == IB_CMA_SERVICE_ID) ||
+	    ((service_id & IB_SDP_SERVICE_ID_MASK) == IB_SDP_SERVICE_ID))
+		return -EINVAL;
+
+	return 0;
+}
+
 static ssize_t ib_ucm_listen(struct ib_ucm_file *file,
 			     const char __user *inbuf,
 			     int in_len, int out_len)
@@ -661,7 +672,13 @@ static ssize_t ib_ucm_listen(struct ib_u
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
-	result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask);
+	result = ucm_validate_listen(cmd.service_id, cmd.service_mask);
+	if (result)
+		goto out;
+
+	result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask,
+			      NULL);
+out:
 	ib_ucm_ctx_put(ctx);
 	return result;
 }
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/include/rdma/ib_cm.h 
linux-2.6.ib/include/rdma/ib_cm.h
--- linux-2.6.git/include/rdma/ib_cm.h	2006-01-16 10:26:47.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_cm.h	2006-01-16 16:03:35.000000000 -0800
@@ -32,7 +32,7 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  *
- * $Id: ib_cm.h 2730 2005-06-28 16:43:03Z sean.hefty $
+ * $Id: ib_cm.h 4311 2005-12-05 18:42:01Z sean.hefty $
  */
 #if !defined(IB_CM_H)
 #define IB_CM_H
@@ -102,7 +102,8 @@ enum ib_cm_data_size {
 	IB_CM_APR_INFO_LENGTH		 = 72,
 	IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE = 216,
 	IB_CM_SIDR_REP_PRIVATE_DATA_SIZE = 136,
-	IB_CM_SIDR_REP_INFO_LENGTH	 = 72
+	IB_CM_SIDR_REP_INFO_LENGTH	 = 72,
+	IB_CM_COMPARE_SIZE		 = 64
 };
 
 struct ib_cm_id;
@@ -238,7 +239,6 @@ struct ib_cm_sidr_rep_event_param {
 	u32			qpn;
 	void			*info;
 	u8			info_len;
-
 };
 
 struct ib_cm_event {
@@ -317,6 +317,15 @@ void ib_destroy_cm_id(struct ib_cm_id *c
 
 #define IB_SERVICE_ID_AGN_MASK	__constant_cpu_to_be64(0xFF00000000000000ULL)
 #define IB_CM_ASSIGN_SERVICE_ID __constant_cpu_to_be64(0x0200000000000000ULL)
+#define IB_CMA_SERVICE_ID	__constant_cpu_to_be64(0x0000000001000000ULL)
+#define IB_CMA_SERVICE_ID_MASK	__constant_cpu_to_be64(0xFFFFFFFFFF000000ULL)
+#define IB_SDP_SERVICE_ID	__constant_cpu_to_be64(0x0000000000010000ULL)
+#define IB_SDP_SERVICE_ID_MASK	__constant_cpu_to_be64(0xFFFFFFFFFFFF0000ULL)
+
+struct ib_cm_compare_data {
+	u8  data[IB_CM_COMPARE_SIZE];
+	u8  mask[IB_CM_COMPARE_SIZE];
+};
 
 /**
  * ib_cm_listen - Initiates listening on the specified service ID for
@@ -330,10 +339,12 @@ void ib_destroy_cm_id(struct ib_cm_id *c
  *   range of service IDs.  If set to 0, the service ID is matched
  *   exactly.  This parameter is ignored if %service_id is set to
  *   IB_CM_ASSIGN_SERVICE_ID.
+ * @compare_data: This parameter is optional.  It specifies data that must
+ *   appear in the private data of a connection request for the specified
+ *   listen request.
  */
-int ib_cm_listen(struct ib_cm_id *cm_id,
-		 __be64 service_id,
-		 __be64 service_mask);
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
+		 struct ib_cm_compare_data *compare_data);
 
 struct ib_cm_req_param {
 	struct ib_sa_path_rec	*primary_path;

^ permalink raw reply

* [PATCH 3/5] Infiniband: connection abstraction
From: Sean Hefty @ 2006-02-01 20:15 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: openib-general
In-Reply-To: <ORSMSX401KZmcK8r6be00000097@orsmsx401.amr.corp.intel.com>

The following provides an address translation service that maps IP addresses
to Infiniband addresses (GIDs) using IPoIB.

This patch exports ip_dev_find() to locate a net_device given an IP address.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

---

diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/addr.c 
linux-2.6.ib/drivers/infiniband/core/addr.c
--- linux-2.6.git/drivers/infiniband/core/addr.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/addr.c	2006-01-16 16:14:24.000000000 -0800
@@ -0,0 +1,356 @@
+/*
+ * Copyright (c) 2005 Voltaire Inc.  All rights reserved.
+ * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved.
+ * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved.
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This Software is licensed under one of the following licenses:
+ *
+ * 1) under the terms of the "Common Public License 1.0" a copy of which is
+ *    available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/cpl.php.
+ *
+ * 2) under the terms of the "The BSD License" a copy of which is
+ *    available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/bsd-license.php.
+ *
+ * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+ *    copy of which is available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/gpl-license.php.
+ *
+ * Licensee has the right to choose one of the above licenses.
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice and one of the license notices.
+ *
+ * Redistributions in binary form must reproduce both the above copyright
+ * notice, one of the license notices in the documentation
+ * and/or other materials provided with the distribution.
+ */
+#include <linux/inetdevice.h>
+#include <linux/workqueue.h>
+#include <net/arp.h>
+#include <net/neighbour.h>
+#include <net/route.h>
+#include <rdma/ib_addr.h>
+
+MODULE_AUTHOR("Sean Hefty");
+MODULE_DESCRIPTION("IB Address Translation");
+MODULE_LICENSE("Dual BSD/GPL");
+
+struct addr_req {
+	struct list_head list;
+	struct sockaddr src_addr;
+	struct sockaddr dst_addr;
+	struct rdma_dev_addr *addr;
+	void *context;
+	void (*callback)(int status, struct sockaddr *src_addr,
+			 struct rdma_dev_addr *addr, void *context);
+	unsigned long timeout;
+	int status;
+};
+
+static void process_req(void *data);
+
+static DEFINE_MUTEX(lock);
+static LIST_HEAD(req_list);
+static DECLARE_WORK(work, process_req, NULL);
+struct workqueue_struct *rdma_wq;
+EXPORT_SYMBOL(rdma_wq);
+
+static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
+		     unsigned char *dst_dev_addr)
+{
+	switch (dev->type) {
+	case ARPHRD_INFINIBAND:
+		dev_addr->dev_type = IB_NODE_CA;
+		break;
+	default:
+		return -EADDRNOTAVAIL;
+	}
+
+	memcpy(dev_addr->src_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+	memcpy(dev_addr->broadcast, dev->broadcast, MAX_ADDR_LEN);
+	if (dst_dev_addr)
+		memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
+	return 0;
+}
+
+int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
+{
+	struct net_device *dev;
+	u32 ip = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
+	int ret;
+
+	dev = ip_dev_find(ip);
+	if (!dev)
+		return -EADDRNOTAVAIL;
+
+	ret = copy_addr(dev_addr, dev, NULL);
+	dev_put(dev);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_translate_ip);
+
+static void set_timeout(unsigned long time)
+{
+	unsigned long delay;
+
+	cancel_delayed_work(&work);
+
+	delay = time - jiffies;
+	if ((long)delay <= 0)
+		delay = 1;
+
+	queue_delayed_work(rdma_wq, &work, delay);
+}
+
+static void queue_req(struct addr_req *req)
+{
+	struct addr_req *temp_req;
+
+	mutex_lock(&lock);
+	list_for_each_entry_reverse(temp_req, &req_list, list) {
+		if (time_after(req->timeout, temp_req->timeout))
+			break;
+	}
+
+	list_add(&req->list, &temp_req->list);
+
+	if (req_list.next == &req->list)
+		set_timeout(req->timeout);
+	mutex_unlock(&lock);
+}
+
+static void addr_send_arp(struct sockaddr_in *dst_in)
+{
+	struct rtable *rt;
+	struct flowi fl;
+	u32 dst_ip = dst_in->sin_addr.s_addr;
+
+	memset(&fl, 0, sizeof fl);
+	fl.nl_u.ip4_u.daddr = dst_ip;
+	if (ip_route_output_key(&rt, &fl))
+		return;
+
+	arp_send(ARPOP_REQUEST, ETH_P_ARP, rt->rt_gateway, rt->idev->dev,
+		 rt->rt_src, NULL, rt->idev->dev->dev_addr, NULL);
+	ip_rt_put(rt);
+}
+
+static int addr_resolve_remote(struct sockaddr_in *src_in,
+			       struct sockaddr_in *dst_in,
+			       struct rdma_dev_addr *addr)
+{
+	u32 src_ip = src_in->sin_addr.s_addr;
+	u32 dst_ip = dst_in->sin_addr.s_addr;
+	struct flowi fl;
+	struct rtable *rt;
+	struct neighbour *neigh;
+	int ret;
+
+	memset(&fl, 0, sizeof fl);
+	fl.nl_u.ip4_u.daddr = dst_ip;
+	fl.nl_u.ip4_u.saddr = src_ip;
+	ret = ip_route_output_key(&rt, &fl);
+	if (ret)
+		goto out;
+
+	neigh = neigh_lookup(&arp_tbl, &rt->rt_gateway, rt->idev->dev);
+	if (!neigh) {
+		ret = -ENODATA;
+		goto err1;
+	}
+
+	if (!(neigh->nud_state & NUD_VALID)) {
+		ret = -ENODATA;
+		goto err2;
+	}
+
+	if (!src_ip) {
+		src_in->sin_family = dst_in->sin_family;
+		src_in->sin_addr.s_addr = rt->rt_src;
+	}
+
+	ret = copy_addr(addr, neigh->dev, neigh->ha);
+err2:
+	neigh_release(neigh);
+err1:
+	ip_rt_put(rt);
+out:
+	return ret;
+}
+
+static void process_req(void *data)
+{
+	struct addr_req *req, *temp_req;
+	struct sockaddr_in *src_in, *dst_in;
+	struct list_head done_list;
+
+	INIT_LIST_HEAD(&done_list);
+
+	mutex_lock(&lock);
+	list_for_each_entry_safe(req, temp_req, &req_list, list) {
+		if (req->status) {
+			src_in = (struct sockaddr_in *) &req->src_addr;
+			dst_in = (struct sockaddr_in *) &req->dst_addr;
+			req->status = addr_resolve_remote(src_in, dst_in,
+							  req->addr);
+		}
+		if (req->status && time_after(jiffies, req->timeout))
+			req->status = -ETIMEDOUT;
+		else if (req->status == -ENODATA)
+			continue;
+
+		list_del(&req->list);
+		list_add_tail(&req->list, &done_list);
+	}
+
+	if (!list_empty(&req_list)) {
+		req = list_entry(req_list.next, struct addr_req, list);
+		set_timeout(req->timeout);
+	}
+	mutex_unlock(&lock);
+
+	list_for_each_entry_safe(req, temp_req, &done_list, list) {
+		list_del(&req->list);
+		req->callback(req->status, &req->src_addr, req->addr,
+			      req->context);
+		kfree(req);
+	}
+}
+
+static int addr_resolve_local(struct sockaddr_in *src_in,
+			      struct sockaddr_in *dst_in,
+			      struct rdma_dev_addr *addr)
+{
+	struct net_device *dev;
+	u32 src_ip = src_in->sin_addr.s_addr;
+	u32 dst_ip = dst_in->sin_addr.s_addr;
+	int ret;
+
+	dev = ip_dev_find(dst_ip);
+	if (!dev)
+		return -EADDRNOTAVAIL;
+
+	if (!src_ip) {
+		src_in->sin_family = dst_in->sin_family;
+		src_in->sin_addr.s_addr = dst_ip;
+		ret = copy_addr(addr, dev, dev->dev_addr);
+	} else {
+		ret = rdma_translate_ip((struct sockaddr *)src_in, addr);
+		if (!ret)
+			memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+	}
+
+	dev_put(dev);
+	return ret;
+}
+
+int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr,
+		    struct rdma_dev_addr *addr, int timeout_ms,
+		    void (*callback)(int status, struct sockaddr *src_addr,
+				     struct rdma_dev_addr *addr, void *context),
+		    void *context)
+{
+	struct sockaddr_in *src_in, *dst_in;
+	struct addr_req *req;
+	int ret = 0;
+
+	req = kmalloc(sizeof *req, GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+	memset(req, 0, sizeof *req);
+
+	if (src_addr)
+		memcpy(&req->src_addr, src_addr, ip_addr_size(src_addr));
+	memcpy(&req->dst_addr, dst_addr, ip_addr_size(dst_addr));
+	req->addr = addr;
+	req->callback = callback;
+	req->context = context;
+
+	src_in = (struct sockaddr_in *) &req->src_addr;
+	dst_in = (struct sockaddr_in *) &req->dst_addr;
+
+	req->status = addr_resolve_local(src_in, dst_in, addr);
+	if (req->status == -EADDRNOTAVAIL)
+		req->status = addr_resolve_remote(src_in, dst_in, addr);
+
+	switch (req->status) {
+	case 0:
+		req->timeout = jiffies;
+		queue_req(req);
+		break;
+	case -ENODATA:
+		req->timeout = msecs_to_jiffies(timeout_ms) + jiffies;
+		queue_req(req);
+		addr_send_arp(dst_in);
+		break;
+	default:
+		ret = req->status;
+		kfree(req);
+		break;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(rdma_resolve_ip);
+
+void rdma_addr_cancel(struct rdma_dev_addr *addr)
+{
+	struct addr_req *req, *temp_req;
+
+	mutex_lock(&lock);
+	list_for_each_entry_safe(req, temp_req, &req_list, list) {
+		if (req->addr == addr) {
+			req->status = -ECANCELED;
+			req->timeout = jiffies;
+			list_del(&req->list);
+			list_add(&req->list, &req_list);
+			set_timeout(req->timeout);
+			break;
+		}
+	}
+	mutex_unlock(&lock);
+}
+EXPORT_SYMBOL(rdma_addr_cancel);
+
+static int addr_arp_recv(struct sk_buff *skb, struct net_device *dev,
+			 struct packet_type *pkt, struct net_device *orig_dev)
+{
+	struct arphdr *arp_hdr;
+
+	arp_hdr = (struct arphdr *) skb->nh.raw;
+
+	if (dev->type == ARPHRD_INFINIBAND &&
+	    (arp_hdr->ar_op == __constant_htons(ARPOP_REQUEST) ||
+	     arp_hdr->ar_op == __constant_htons(ARPOP_REPLY)))
+		set_timeout(jiffies);
+
+	kfree_skb(skb);
+	return 0;
+}
+
+static struct packet_type addr_arp = {
+	.type           = __constant_htons(ETH_P_ARP),
+	.func           = addr_arp_recv,
+	.af_packet_priv = (void*) 1,
+};
+
+static int addr_init(void)
+{
+	rdma_wq = create_singlethread_workqueue("rdma_wq");
+	if (!rdma_wq)
+		return -ENOMEM;
+
+	dev_add_pack(&addr_arp);
+	return 0;
+}
+
+static void addr_cleanup(void)
+{
+	dev_remove_pack(&addr_arp);
+	destroy_workqueue(rdma_wq);
+}
+
+module_init(addr_init);
+module_exit(addr_cleanup);
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/Makefile 
linux-2.6.ib/drivers/infiniband/core/Makefile
--- linux-2.6.git/drivers/infiniband/core/Makefile	2006-01-16 16:03:08.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/Makefile	2006-01-16 16:14:24.000000000 -0800
@@ -1,5 +1,5 @@
 obj-$(CONFIG_INFINIBAND) +=		ib_core.o ib_mad.o ib_sa.o \
-					ib_cm.o
+					ib_cm.o ib_addr.o
 obj-$(CONFIG_INFINIBAND_USER_MAD) +=	ib_umad.o
 obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o
 
@@ -12,6 +12,8 @@ ib_sa-y :=			sa_query.o
 
 ib_cm-y :=			cm.o
 
+ib_addr-y :=			addr.o
+
 ib_umad-y :=			user_mad.o
 
 ib_ucm-y :=			ucm.o
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/include/rdma/ib_addr.h 
linux-2.6.ib/include/rdma/ib_addr.h
--- linux-2.6.git/include/rdma/ib_addr.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/include/rdma/ib_addr.h	2006-01-16 16:14:24.000000000 -0800
@@ -0,0 +1,117 @@
+/*
+ * Copyright (c) 2005 Voltaire Inc.  All rights reserved.
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This Software is licensed under one of the following licenses:
+ *
+ * 1) under the terms of the "Common Public License 1.0" a copy of which is
+ *    available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/cpl.php.
+ *
+ * 2) under the terms of the "The BSD License" a copy of which is
+ *    available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/bsd-license.php.
+ *
+ * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+ *    copy of which is available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/gpl-license.php.
+ *
+ * Licensee has the right to choose one of the above licenses.
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice and one of the license notices.
+ *
+ * Redistributions in binary form must reproduce both the above copyright
+ * notice, one of the license notices in the documentation
+ * and/or other materials provided with the distribution.
+ *
+ */
+
+#if !defined(IB_ADDR_H)
+#define IB_ADDR_H
+
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/netdevice.h>
+#include <linux/socket.h>
+#include <rdma/ib_verbs.h>
+
+extern struct workqueue_struct *rdma_wq;
+
+struct rdma_dev_addr {
+	unsigned char src_dev_addr[MAX_ADDR_LEN];
+	unsigned char dst_dev_addr[MAX_ADDR_LEN];
+	unsigned char broadcast[MAX_ADDR_LEN];
+	enum ib_node_type dev_type;
+};
+
+/**
+ * rdma_translate_ip - Translate a local IP address to an RDMA hardware
+ *   address.
+ */
+int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr);
+
+/**
+ * rdma_resolve_ip - Resolve source and destination IP addresses to
+ *   RDMA hardware addresses.
+ * @src_addr: An optional source address to use in the resolution.  If a
+ *   source address is not provided, a usable address will be returned via
+ *   the callback.
+ * @dst_addr: The destination address to resolve.
+ * @addr: A reference to a data location that will receive the resolved
+ *   addresses.  The data location must remain valid until the callback has
+ *   been invoked.
+ * @timeout_ms: Amount of time to wait for the address resolution to complete.
+ * @callback: Call invoked once address resolution has completed, timed out,
+ *   or been canceled.  A status of 0 indicates success.
+ * @context: User-specified context associated with the call.
+ */
+int rdma_resolve_ip(struct sockaddr *src_addr, struct sockaddr *dst_addr,
+		    struct rdma_dev_addr *addr, int timeout_ms,
+		    void (*callback)(int status, struct sockaddr *src_addr,
+				     struct rdma_dev_addr *addr, void *context),
+		    void *context);
+
+void rdma_addr_cancel(struct rdma_dev_addr *addr);
+
+static inline int ip_addr_size(struct sockaddr *addr)
+{
+	return addr->sa_family == AF_INET6 ?
+	       sizeof(struct sockaddr_in6) : sizeof(struct sockaddr_in);
+}
+
+static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
+{
+	return ((u16)dev_addr->broadcast[8] << 8) | (u16)dev_addr->broadcast[9];
+}
+
+static inline void ib_addr_set_pkey(struct rdma_dev_addr *dev_addr, u16 pkey)
+{
+	dev_addr->broadcast[8] = pkey >> 8;
+	dev_addr->broadcast[9] = (unsigned char) pkey;
+}
+
+static inline union ib_gid* ib_addr_get_sgid(struct rdma_dev_addr *dev_addr)
+{
+	return 	(union ib_gid *) (dev_addr->src_dev_addr + 4);
+}
+
+static inline void ib_addr_set_sgid(struct rdma_dev_addr *dev_addr,
+				    union ib_gid *gid)
+{
+	memcpy(dev_addr->src_dev_addr + 4, gid, sizeof *gid);
+}
+
+static inline union ib_gid* ib_addr_get_dgid(struct rdma_dev_addr *dev_addr)
+{
+	return 	(union ib_gid *) (dev_addr->dst_dev_addr + 4);
+}
+
+static inline void ib_addr_set_dgid(struct rdma_dev_addr *dev_addr,
+				    union ib_gid *gid)
+{
+	memcpy(dev_addr->dst_dev_addr + 4, gid, sizeof *gid);
+}
+
+#endif /* IB_ADDR_H */
+
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/net/ipv4/fib_frontend.c 
linux-2.6.ib/net/ipv4/fib_frontend.c
--- linux-2.6.git/net/ipv4/fib_frontend.c	2006-01-16 10:28:29.000000000 -0800
+++ linux-2.6.ib/net/ipv4/fib_frontend.c	2006-01-16 16:14:24.000000000 -0800
@@ -666,4 +666,5 @@ void __init ip_fib_init(void)
 }
 
 EXPORT_SYMBOL(inet_addr_type);
+EXPORT_SYMBOL(ip_dev_find);
 EXPORT_SYMBOL(ip_rt_ioctl);

^ permalink raw reply

* [PATCH 4/5] Infiniband: connection abstraction
From: Sean Hefty @ 2006-02-01 20:18 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: openib-general
In-Reply-To: <ORSMSX401KZmcK8r6be00000097@orsmsx401.amr.corp.intel.com>

The following patch implements a kernel mode connection management agent
over Infiniband that connects based on IP addresses.

The agent defines a generic RDMA connection abstraction to support clients
wanting to connect over different RDMA devices.

It also handles RDMA device hotplug events on behalf of clients.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

---

diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/cma.c 
linux-2.6.ib/drivers/infiniband/core/cma.c
--- linux-2.6.git/drivers/infiniband/core/cma.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/cma.c	2006-01-16 16:17:34.000000000 -0800
@@ -0,0 +1,1639 @@
+/*
+ * Copyright (c) 2005 Voltaire Inc.  All rights reserved.
+ * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved.
+ * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved.
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This Software is licensed under one of the following licenses:
+ *
+ * 1) under the terms of the "Common Public License 1.0" a copy of which is
+ *    available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/cpl.php.
+ *
+ * 2) under the terms of the "The BSD License" a copy of which is
+ *    available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/bsd-license.php.
+ *
+ * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+ *    copy of which is available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/gpl-license.php.
+ *
+ * Licensee has the right to choose one of the above licenses.
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice and one of the license notices.
+ *
+ * Redistributions in binary form must reproduce both the above copyright
+ * notice, one of the license notices in the documentation
+ * and/or other materials provided with the distribution.
+ *
+ */
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/random.h>
+#include <rdma/rdma_cm.h>
+#include <rdma/ib_cache.h>
+#include <rdma/ib_cm.h>
+#include <rdma/ib_sa.h>
+
+MODULE_AUTHOR("Sean Hefty");
+MODULE_DESCRIPTION("Generic RDMA CM Agent");
+MODULE_LICENSE("Dual BSD/GPL");
+
+#define CMA_CM_RESPONSE_TIMEOUT 20
+#define CMA_MAX_CM_RETRIES 3
+
+static void cma_add_one(struct ib_device *device);
+static void cma_remove_one(struct ib_device *device);
+
+static struct ib_client cma_client = {
+	.name   = "cma",
+	.add    = cma_add_one,
+	.remove = cma_remove_one
+};
+
+static LIST_HEAD(dev_list);
+static LIST_HEAD(listen_any_list);
+static DEFINE_MUTEX(lock);
+
+struct cma_device {
+	struct list_head	list;
+	struct ib_device	*device;
+	__be64			node_guid;
+	wait_queue_head_t	wait;
+	atomic_t		refcount;
+	struct list_head	id_list;
+};
+
+enum cma_state {
+	CMA_IDLE,
+	CMA_ADDR_QUERY,
+	CMA_ADDR_RESOLVED,
+	CMA_ROUTE_QUERY,
+	CMA_ROUTE_RESOLVED,
+	CMA_CONNECT,
+	CMA_ADDR_BOUND,
+	CMA_LISTEN,
+	CMA_DEVICE_REMOVAL,
+	CMA_DESTROYING
+};
+
+/*
+ * Device removal can occur at anytime, so we need extra handling to
+ * serialize notifying the user of device removal with other callbacks.
+ * We do this by disabling removal notification while a callback is in process,
+ * and reporting it after the callback completes.
+ */
+struct rdma_id_private {
+	struct rdma_cm_id	id;
+
+	struct list_head	list;
+	struct list_head	listen_list;
+	struct cma_device	*cma_dev;
+
+	enum cma_state		state;
+	spinlock_t		lock;
+	wait_queue_head_t	wait;
+	atomic_t		refcount;
+	wait_queue_head_t	wait_remove;
+	atomic_t		dev_remove;
+
+	int			backlog;
+	int			timeout_ms;
+	struct ib_sa_query	*query;
+	int			query_id;
+	struct ib_cm_id		*cm_id;
+
+	u32			seq_num;
+	u32			qp_num;
+	enum ib_qp_type		qp_type;
+	u8			srq;
+};
+
+struct cma_work {
+	struct work_struct	work;
+	struct rdma_id_private	*id;
+};
+
+union cma_ip_addr {
+	struct in6_addr ip6;
+	struct {
+		__u32 pad[3];
+		__u32 addr;
+	} ip4;
+};
+
+struct cma_hdr {
+	u8 cma_version;
+	u8 ip_version;	/* IP version: 7:4 */
+	__u16 port;
+	union cma_ip_addr src_addr;
+	union cma_ip_addr dst_addr;
+};
+
+struct sdp_hh {
+	u8 sdp_version;
+	u8 ip_version;	/* IP version: 7:4 */
+	u8 sdp_specific1[10];
+	__u16 port;
+	__u16 sdp_specific2;
+	union cma_ip_addr src_addr;
+	union cma_ip_addr dst_addr;
+};
+
+#define CMA_VERSION 0x00
+#define SDP_VERSION 0x22
+
+static int cma_comp(struct rdma_id_private *id_priv, enum cma_state comp)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&id_priv->lock, flags);
+	ret = (id_priv->state == comp);
+	spin_unlock_irqrestore(&id_priv->lock, flags);
+	return ret;
+}
+
+static int cma_comp_exch(struct rdma_id_private *id_priv,
+			 enum cma_state comp, enum cma_state exch)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&id_priv->lock, flags);
+	if ((ret = (id_priv->state == comp)))
+		id_priv->state = exch;
+	spin_unlock_irqrestore(&id_priv->lock, flags);
+	return ret;
+}
+
+static enum cma_state cma_exch(struct rdma_id_private *id_priv,
+			       enum cma_state exch)
+{
+	unsigned long flags;
+	enum cma_state old;
+
+	spin_lock_irqsave(&id_priv->lock, flags);
+	old = id_priv->state;
+	id_priv->state = exch;
+	spin_unlock_irqrestore(&id_priv->lock, flags);
+	return old;
+}
+
+static inline u8 cma_get_ip_ver(struct cma_hdr *hdr)
+{
+	return hdr->ip_version >> 4;
+}
+
+static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
+{
+	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
+}
+
+static inline u8 sdp_get_ip_ver(struct sdp_hh *hh)
+{
+	return hh->ip_version >> 4;
+}
+
+static inline void sdp_set_ip_ver(struct sdp_hh *hh, u8 ip_ver)
+{
+	hh->ip_version = (ip_ver << 4) | (hh->ip_version & 0xF);
+}
+
+static void cma_attach_to_dev(struct rdma_id_private *id_priv,
+			      struct cma_device *cma_dev)
+{
+	atomic_inc(&cma_dev->refcount);
+	id_priv->cma_dev = cma_dev;
+	id_priv->id.device = cma_dev->device;
+	list_add_tail(&id_priv->list, &cma_dev->id_list);
+}
+
+static void cma_detach_from_dev(struct rdma_id_private *id_priv)
+{
+	list_del(&id_priv->list);
+	if (atomic_dec_and_test(&id_priv->cma_dev->refcount))
+		wake_up(&id_priv->cma_dev->wait);
+	id_priv->cma_dev = NULL;
+}
+
+static int cma_acquire_ib_dev(struct rdma_id_private *id_priv)
+{
+	struct cma_device *cma_dev;
+	union ib_gid *gid;
+	int ret = -ENODEV;
+
+	gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr);
+
+	mutex_lock(&lock);
+	list_for_each_entry(cma_dev, &dev_list, list) {
+		ret = ib_find_cached_gid(cma_dev->device, gid,
+					 &id_priv->id.port_num, NULL);
+		if (!ret) {
+			cma_attach_to_dev(id_priv, cma_dev);
+			break;
+		}
+	}
+	mutex_unlock(&lock);
+	return ret;
+}
+
+static int cma_acquire_dev(struct rdma_id_private *id_priv)
+{
+	switch (id_priv->id.route.addr.dev_addr.dev_type) {
+	case IB_NODE_CA:
+		return cma_acquire_ib_dev(id_priv);
+	default:
+		return -ENODEV;
+	}
+}
+
+static void cma_deref_id(struct rdma_id_private *id_priv)
+{
+	if (atomic_dec_and_test(&id_priv->refcount))
+		wake_up(&id_priv->wait);
+}
+
+static void cma_release_remove(struct rdma_id_private *id_priv)
+{
+	if (atomic_dec_and_test(&id_priv->dev_remove))
+		wake_up(&id_priv->wait_remove);
+}
+
+struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler event_handler,
+				  void *context, enum rdma_port_space ps)
+{
+	struct rdma_id_private *id_priv;
+
+	id_priv = kzalloc(sizeof *id_priv, GFP_KERNEL);
+	if (!id_priv)
+		return ERR_PTR(-ENOMEM);
+
+	id_priv->state = CMA_IDLE;
+	id_priv->id.context = context;
+	id_priv->id.event_handler = event_handler;
+	id_priv->id.ps = ps;
+	spin_lock_init(&id_priv->lock);
+	init_waitqueue_head(&id_priv->wait);
+	atomic_set(&id_priv->refcount, 1);
+	init_waitqueue_head(&id_priv->wait_remove);
+	atomic_set(&id_priv->dev_remove, 0);
+	INIT_LIST_HEAD(&id_priv->listen_list);
+	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
+
+	return &id_priv->id;
+}
+EXPORT_SYMBOL(rdma_create_id);
+
+static int cma_init_ib_qp(struct rdma_id_private *id_priv, struct ib_qp *qp)
+{
+	struct ib_qp_attr qp_attr;
+	struct rdma_dev_addr *dev_addr;
+	int ret;
+
+	dev_addr = &id_priv->id.route.addr.dev_addr;
+	ret = ib_find_cached_pkey(id_priv->id.device, id_priv->id.port_num,
+				  ib_addr_get_pkey(dev_addr),
+				  &qp_attr.pkey_index);
+	if (ret)
+		return ret;
+
+	qp_attr.qp_state = IB_QPS_INIT;
+	qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE;
+	qp_attr.port_num = id_priv->id.port_num;
+	return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS |
+					  IB_QP_PKEY_INDEX | IB_QP_PORT);
+}
+
+int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd,
+		   struct ib_qp_init_attr *qp_init_attr)
+{
+	struct rdma_id_private *id_priv;
+	struct ib_qp *qp;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (id->device != pd->device)
+		return -EINVAL;
+
+	qp = ib_create_qp(pd, qp_init_attr);
+	if (IS_ERR(qp))
+		return PTR_ERR(qp);
+
+	switch (id->device->node_type) {
+	case IB_NODE_CA:
+		ret = cma_init_ib_qp(id_priv, qp);
+		break;
+	default:
+		ret = -ENOSYS;
+		break;
+	}
+
+	if (ret)
+		goto err;
+
+	id->qp = qp;
+	id_priv->qp_num = qp->qp_num;
+	id_priv->qp_type = qp->qp_type;
+	id_priv->srq = (qp->srq != NULL);
+	return 0;
+err:
+	ib_destroy_qp(qp);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_create_qp);
+
+void rdma_destroy_qp(struct rdma_cm_id *id)
+{
+	ib_destroy_qp(id->qp);
+}
+EXPORT_SYMBOL(rdma_destroy_qp);
+
+static int cma_modify_qp_rtr(struct rdma_cm_id *id)
+{
+	struct ib_qp_attr qp_attr;
+	int qp_attr_mask, ret;
+
+	if (!id->qp)
+		return 0;
+
+	/* Need to update QP attributes from default values. */
+	qp_attr.qp_state = IB_QPS_INIT;
+	ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask);
+	if (ret)
+		return ret;
+
+	ret = ib_modify_qp(id->qp, &qp_attr, qp_attr_mask);
+	if (ret)
+		return ret;
+
+	qp_attr.qp_state = IB_QPS_RTR;
+	ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask);
+	if (ret)
+		return ret;
+
+	return ib_modify_qp(id->qp, &qp_attr, qp_attr_mask);
+}
+
+static int cma_modify_qp_rts(struct rdma_cm_id *id)
+{
+	struct ib_qp_attr qp_attr;
+	int qp_attr_mask, ret;
+
+	if (!id->qp)
+		return 0;
+
+	qp_attr.qp_state = IB_QPS_RTS;
+	ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask);
+	if (ret)
+		return ret;
+
+	return ib_modify_qp(id->qp, &qp_attr, qp_attr_mask);
+}
+
+static int cma_modify_qp_err(struct rdma_cm_id *id)
+{
+	struct ib_qp_attr qp_attr;
+
+	if (!id->qp)
+		return 0;
+
+	qp_attr.qp_state = IB_QPS_ERR;
+	return ib_modify_qp(id->qp, &qp_attr, IB_QP_STATE);
+}
+
+int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
+		       int *qp_attr_mask)
+{
+	struct rdma_id_private *id_priv;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	switch (id_priv->id.device->node_type) {
+	case IB_NODE_CA:
+		ret = ib_cm_init_qp_attr(id_priv->cm_id, qp_attr,
+					 qp_attr_mask);
+		if (qp_attr->qp_state == IB_QPS_RTR)
+			qp_attr->rq_psn = id_priv->seq_num;
+		break;
+	default:
+		ret = -ENOSYS;
+		break;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(rdma_init_qp_attr);
+
+static inline int cma_any_addr(struct sockaddr *addr)
+{
+	struct in6_addr *ip6;
+
+	if (addr->sa_family == AF_INET)
+		return ((struct sockaddr_in *) addr)->sin_addr.s_addr ==
+			INADDR_ANY;
+	else {
+		ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr;
+		return (ip6->s6_addr32[0] | ip6->s6_addr32[1] |
+			ip6->s6_addr32[3] | ip6->s6_addr32[4]) == 0;
+	}
+}
+
+static inline int cma_loopback_addr(struct sockaddr *addr)
+{
+	return ((struct sockaddr_in *) addr)->sin_addr.s_addr ==
+		ntohl(INADDR_LOOPBACK);
+}
+
+static int cma_get_net_info(void *hdr, enum rdma_port_space ps,
+			    u8 *ip_ver, __u16 *port,
+			    union cma_ip_addr **src, union cma_ip_addr **dst)
+{
+	switch (ps) {
+	case RDMA_PS_SDP:
+		if (((struct sdp_hh *) hdr)->sdp_version != SDP_VERSION)
+			return -EINVAL;
+
+		*ip_ver	= sdp_get_ip_ver(hdr);
+		*port	= ((struct sdp_hh *) hdr)->port;
+		*src	= &((struct sdp_hh *) hdr)->src_addr;
+		*dst	= &((struct sdp_hh *) hdr)->dst_addr;
+		break;
+	default:
+		if (((struct cma_hdr *) hdr)->cma_version != CMA_VERSION)
+			return -EINVAL;
+
+		*ip_ver	= cma_get_ip_ver(hdr);
+		*port	= ((struct cma_hdr *) hdr)->port;
+		*src	= &((struct cma_hdr *) hdr)->src_addr;
+		*dst	= &((struct cma_hdr *) hdr)->dst_addr;
+		break;
+	}
+	return 0;
+}
+
+static void cma_save_net_info(struct rdma_addr *addr,
+			      struct rdma_addr *listen_addr,
+			      u8 ip_ver, __u16 port,
+			      union cma_ip_addr *src, union cma_ip_addr *dst)
+{
+	struct sockaddr_in *listen4, *ip4;
+	struct sockaddr_in6 *listen6, *ip6;
+
+	switch (ip_ver) {
+	case 4:
+		listen4 = (struct sockaddr_in *) &listen_addr->src_addr;
+		ip4 = (struct sockaddr_in *) &addr->src_addr;
+		ip4->sin_family = listen4->sin_family;
+		ip4->sin_addr.s_addr = dst->ip4.addr;
+		ip4->sin_port = listen4->sin_port;
+
+		ip4 = (struct sockaddr_in *) &addr->dst_addr;
+		ip4->sin_family = listen4->sin_family;
+		ip4->sin_addr.s_addr = src->ip4.addr;
+		ip4->sin_port = port;
+		break;
+	case 6:
+		listen6 = (struct sockaddr_in6 *) &listen_addr->src_addr;
+		ip6 = (struct sockaddr_in6 *) &addr->src_addr;
+		ip6->sin6_family = listen6->sin6_family;
+		ip6->sin6_addr = dst->ip6;
+		ip6->sin6_port = listen6->sin6_port;
+
+		ip6 = (struct sockaddr_in6 *) &addr->dst_addr;
+		ip6->sin6_family = listen6->sin6_family;
+		ip6->sin6_addr = src->ip6;
+		ip6->sin6_port = port;
+		break;
+	default:
+		break;
+	}
+}
+
+static inline int cma_user_data_offset(enum rdma_port_space ps)
+{
+	switch (ps) {
+	case RDMA_PS_SDP:
+		return 0;
+	default:
+		return sizeof(struct cma_hdr);
+	}
+}
+
+static int cma_notify_user(struct rdma_id_private *id_priv,
+			   enum rdma_cm_event_type type, int status,
+			   void *data, u8 data_len)
+{
+	struct rdma_cm_event event;
+
+	event.event = type;
+	event.status = status;
+	event.private_data = data;
+	event.private_data_len = data_len;
+
+	return id_priv->id.event_handler(&id_priv->id, &event);
+}
+
+static void cma_cancel_addr(struct rdma_id_private *id_priv)
+{
+	switch (id_priv->id.device->node_type) {
+	case IB_NODE_CA:
+		rdma_addr_cancel(&id_priv->id.route.addr.dev_addr);
+		break;
+	default:
+		break;
+	}
+}
+
+static void cma_cancel_route(struct rdma_id_private *id_priv)
+{
+	switch (id_priv->id.device->node_type) {
+	case IB_NODE_CA:
+		ib_sa_cancel_query(id_priv->query_id, id_priv->query);
+		break;
+	default:
+		break;
+	}
+}
+
+static inline int cma_internal_listen(struct rdma_id_private *id_priv)
+{
+	return (id_priv->state == CMA_LISTEN) && id_priv->cma_dev &&
+	       cma_any_addr(&id_priv->id.route.addr.src_addr);
+}
+
+static void cma_destroy_listen(struct rdma_id_private *id_priv)
+{
+	cma_exch(id_priv, CMA_DESTROYING);
+
+ 	if (id_priv->cm_id && !IS_ERR(id_priv->cm_id))
+		ib_destroy_cm_id(id_priv->cm_id);
+
+	list_del(&id_priv->listen_list);
+	if (id_priv->cma_dev)
+		cma_detach_from_dev(id_priv);
+
+	atomic_dec(&id_priv->refcount);
+	wait_event(id_priv->wait, !atomic_read(&id_priv->refcount));
+
+	kfree(id_priv);
+}
+
+static void cma_cancel_listens(struct rdma_id_private *id_priv)
+{
+	struct rdma_id_private *dev_id_priv;
+
+	mutex_lock(&lock);
+	list_del(&id_priv->list);
+
+	while (!list_empty(&id_priv->listen_list)) {
+		dev_id_priv = list_entry(id_priv->listen_list.next,
+					 struct rdma_id_private, listen_list);
+		cma_destroy_listen(dev_id_priv);
+	}
+	mutex_unlock(&lock);
+}
+
+static void cma_cancel_operation(struct rdma_id_private *id_priv,
+				 enum cma_state state)
+{
+	switch (state) {
+	case CMA_ADDR_QUERY:
+		cma_cancel_addr(id_priv);
+		break;
+	case CMA_ROUTE_QUERY:
+		cma_cancel_route(id_priv);
+		break;
+	case CMA_LISTEN:
+		if (cma_any_addr(&id_priv->id.route.addr.src_addr) &&
+		    !id_priv->cma_dev)
+			cma_cancel_listens(id_priv);
+		break;
+	default:
+		break;
+	}
+}
+
+void rdma_destroy_id(struct rdma_cm_id *id)
+{
+	struct rdma_id_private *id_priv;
+	enum cma_state state;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	state = cma_exch(id_priv, CMA_DESTROYING);
+	cma_cancel_operation(id_priv, state);
+
+ 	if (id_priv->cm_id && !IS_ERR(id_priv->cm_id))
+		ib_destroy_cm_id(id_priv->cm_id);
+
+	if (id_priv->cma_dev) {
+	  	mutex_lock(&lock);
+		cma_detach_from_dev(id_priv);
+		mutex_unlock(&lock);
+	}
+
+	atomic_dec(&id_priv->refcount);
+	wait_event(id_priv->wait, !atomic_read(&id_priv->refcount));
+
+	kfree(id_priv->id.route.path_rec);
+	kfree(id_priv);
+}
+EXPORT_SYMBOL(rdma_destroy_id);
+
+static int cma_rep_recv(struct rdma_id_private *id_priv)
+{
+	int ret;
+
+	ret = cma_modify_qp_rtr(&id_priv->id);
+	if (ret)
+		goto reject;
+
+	ret = cma_modify_qp_rts(&id_priv->id);
+	if (ret)
+		goto reject;
+
+	ret = ib_send_cm_rtu(id_priv->cm_id, NULL, 0);
+	if (ret)
+		goto reject;
+
+	return 0;
+reject:
+	cma_modify_qp_err(&id_priv->id);
+	ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED,
+		       NULL, 0, NULL, 0);
+	return ret;
+}
+
+static int cma_rtu_recv(struct rdma_id_private *id_priv)
+{
+	int ret;
+
+	ret = cma_modify_qp_rts(&id_priv->id);
+	if (ret)
+		goto reject;
+
+	return 0;
+reject:
+	cma_modify_qp_err(&id_priv->id);
+	ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED,
+		       NULL, 0, NULL, 0);
+	return ret;
+}
+
+static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
+{
+	struct rdma_id_private *id_priv = cm_id->context;
+	enum rdma_cm_event_type event;
+	u8 private_data_len = 0;
+	int ret = 0, status = 0;
+
+	if (!cma_comp(id_priv, CMA_CONNECT))
+		return 0;
+
+	atomic_inc(&id_priv->dev_remove);
+	switch (ib_event->event) {
+	case IB_CM_REQ_ERROR:
+	case IB_CM_REP_ERROR:
+		event = RDMA_CM_EVENT_UNREACHABLE;
+		status = -ETIMEDOUT;
+		break;
+	case IB_CM_REP_RECEIVED:
+		if (id_priv->id.qp) {
+			status = cma_rep_recv(id_priv);
+			event = status ? RDMA_CM_EVENT_CONNECT_ERROR :
+					 RDMA_CM_EVENT_ESTABLISHED;
+		} else
+			event = RDMA_CM_EVENT_CONNECT_RESPONSE;
+		private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE;
+		break;
+	case IB_CM_RTU_RECEIVED:
+		status = cma_rtu_recv(id_priv);
+		event = status ? RDMA_CM_EVENT_CONNECT_ERROR :
+				 RDMA_CM_EVENT_ESTABLISHED;
+		break;
+	case IB_CM_DREQ_ERROR:
+		status = -ETIMEDOUT; /* fall through */
+	case IB_CM_DREQ_RECEIVED:
+	case IB_CM_DREP_RECEIVED:
+		event = RDMA_CM_EVENT_DISCONNECTED;
+		break;
+	case IB_CM_TIMEWAIT_EXIT:
+	case IB_CM_MRA_RECEIVED:
+		/* ignore event */
+		goto out;
+	case IB_CM_REJ_RECEIVED:
+		cma_modify_qp_err(&id_priv->id);
+		status = ib_event->param.rej_rcvd.reason;
+		event = RDMA_CM_EVENT_REJECTED;
+		break;
+	default:
+		printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d",
+		       ib_event->event);
+		goto out;
+	}
+
+	ret = cma_notify_user(id_priv, event, status, ib_event->private_data,
+			      private_data_len);
+	if (ret) {
+		/* Destroy the CM ID by returning a non-zero value. */
+		id_priv->cm_id = NULL;
+		cma_exch(id_priv, CMA_DESTROYING);
+		cma_release_remove(id_priv);
+		rdma_destroy_id(&id_priv->id);
+		return ret;
+	}
+out:
+	cma_release_remove(id_priv);
+	return ret;
+}
+
+static struct rdma_id_private* cma_new_id(struct rdma_cm_id *listen_id,
+					  struct ib_cm_event *ib_event)
+{
+	struct rdma_id_private *id_priv;
+	struct rdma_cm_id *id;
+	struct rdma_route *rt;
+	union cma_ip_addr *src, *dst;
+	__u16 port;
+	u8 ip_ver;
+
+	id = rdma_create_id(listen_id->event_handler, listen_id->context,
+			    listen_id->ps);
+	if (IS_ERR(id))
+		return NULL;
+
+	rt = &id->route;
+	rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 2 : 1;
+	rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL);
+	if (!rt->path_rec)
+		goto err;
+
+	if (cma_get_net_info(ib_event->private_data, listen_id->ps,
+			     &ip_ver, &port, &src, &dst))
+		goto err;
+
+	cma_save_net_info(&id->route.addr, &listen_id->route.addr,
+			  ip_ver, port, src, dst);
+	rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path;
+	if (rt->num_paths == 2)
+		rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path;
+
+	ib_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid);
+	ib_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid);
+	ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey));
+	rt->addr.dev_addr.dev_type = IB_NODE_CA;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	id_priv->state = CMA_CONNECT;
+	return id_priv;
+err:
+	rdma_destroy_id(id);
+	return NULL;
+}
+
+static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
+{
+	struct rdma_id_private *listen_id, *conn_id;
+	int offset, ret;
+
+	listen_id = cm_id->context;
+	atomic_inc(&listen_id->dev_remove);
+	if (!cma_comp(listen_id, CMA_LISTEN)) {
+		ret = -ECONNABORTED;
+		goto out;
+	}
+
+	conn_id = cma_new_id(&listen_id->id, ib_event);
+	if (!conn_id) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	atomic_inc(&conn_id->dev_remove);
+	ret = cma_acquire_ib_dev(conn_id);
+	if (ret) {
+		ret = -ENODEV;
+		cma_release_remove(conn_id);
+		rdma_destroy_id(&conn_id->id);
+		goto out;
+	}
+
+	conn_id->cm_id = cm_id;
+	cm_id->context = conn_id;
+	cm_id->cm_handler = cma_ib_handler;
+
+	offset = cma_user_data_offset(listen_id->id.ps);
+	ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0,
+			      ib_event->private_data + offset,
+			      IB_CM_REQ_PRIVATE_DATA_SIZE - offset);
+	if (ret) {
+		/* Destroy the CM ID by returning a non-zero value. */
+		conn_id->cm_id = NULL;
+		cma_exch(conn_id, CMA_DESTROYING);
+		cma_release_remove(conn_id);
+		rdma_destroy_id(&conn_id->id);
+	}
+out:
+	cma_release_remove(listen_id);
+	return ret;
+}
+
+static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr)
+{
+	return cpu_to_be64(((u64)ps << 16) +
+	       be16_to_cpu(((struct sockaddr_in *) addr)->sin_port));
+}
+
+static void cma_set_compare_data(struct sockaddr *addr,
+				 struct ib_cm_compare_data *compare)
+{
+	struct cma_hdr *data, *mask;
+
+	memset(compare, 0, sizeof *compare);
+	data = (void *) compare->data;
+	mask = (void *) compare->mask;
+
+	switch (addr->sa_family) {
+	case AF_INET:
+		cma_set_ip_ver(data, 4);
+		cma_set_ip_ver(mask, 0xF);
+		data->dst_addr.ip4.addr = ((struct sockaddr_in *) addr)->
+					   sin_addr.s_addr;
+		mask->dst_addr.ip4.addr = ~0;
+		break;
+	case AF_INET6:
+		cma_set_ip_ver(data, 6);
+		cma_set_ip_ver(mask, 0xF);
+		data->dst_addr.ip6 = ((struct sockaddr_in6 *) addr)->
+				      sin6_addr;
+		memset(&mask->dst_addr.ip6, 1, sizeof mask->dst_addr.ip6);
+		break;
+	default:
+		break;
+	}
+}
+
+static int cma_ib_listen(struct rdma_id_private *id_priv)
+{
+	struct ib_cm_compare_data compare_data;
+	struct sockaddr *addr;
+	__be64 svc_id;
+	int ret;
+
+	id_priv->cm_id = ib_create_cm_id(id_priv->id.device, cma_req_handler,
+					 id_priv);
+	if (IS_ERR(id_priv->cm_id))
+		return PTR_ERR(id_priv->cm_id);
+
+	addr = &id_priv->id.route.addr.src_addr;
+	svc_id = cma_get_service_id(id_priv->id.ps, addr);
+	if (cma_any_addr(addr))
+		ret = ib_cm_listen(id_priv->cm_id, svc_id, 0, NULL);
+	else {
+		cma_set_compare_data(addr, &compare_data);
+		ret = ib_cm_listen(id_priv->cm_id, svc_id, 0, &compare_data);
+	}
+
+	if (ret) {
+		ib_destroy_cm_id(id_priv->cm_id);
+		id_priv->cm_id = NULL;
+	}
+
+	return ret;
+}
+
+static int cma_duplicate_listen(struct rdma_id_private *id_priv)
+{
+	struct rdma_id_private *cur_id_priv;
+	struct sockaddr_in *cur_addr, *new_addr;
+
+	new_addr = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr;
+	list_for_each_entry(cur_id_priv, &listen_any_list, listen_list) {
+		cur_addr = (struct sockaddr_in *)
+			    &cur_id_priv->id.route.addr.src_addr;
+		if (cur_addr->sin_port == new_addr->sin_port)
+			return -EADDRINUSE;
+	}
+	return 0;
+}
+
+static int cma_listen_handler(struct rdma_cm_id *id,
+			      struct rdma_cm_event *event)
+{
+	struct rdma_id_private *id_priv = id->context;
+
+	id->context = id_priv->id.context;
+	id->event_handler = id_priv->id.event_handler;
+	return id_priv->id.event_handler(id, event);
+}
+
+static void cma_listen_on_dev(struct rdma_id_private *id_priv,
+			      struct cma_device *cma_dev)
+{
+	struct rdma_id_private *dev_id_priv;
+	struct rdma_cm_id *id;
+	int ret;
+
+	id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps);
+	if (IS_ERR(id))
+		return;
+
+	dev_id_priv = container_of(id, struct rdma_id_private, id);
+	ret = rdma_bind_addr(id, &id_priv->id.route.addr.src_addr);
+	if (ret)
+		goto err;
+
+	cma_attach_to_dev(dev_id_priv, cma_dev);
+	list_add_tail(&dev_id_priv->listen_list, &id_priv->listen_list);
+
+	ret = rdma_listen(id, id_priv->backlog);
+	if (ret)
+		goto err;
+
+	return;
+err:
+	cma_destroy_listen(dev_id_priv);
+}
+
+static int cma_listen_on_all(struct rdma_id_private *id_priv)
+{
+	struct cma_device *cma_dev;
+	int ret;
+
+	mutex_lock(&lock);
+	ret = cma_duplicate_listen(id_priv);
+	if (ret)
+		goto out;
+
+	list_add_tail(&id_priv->list, &listen_any_list);
+	list_for_each_entry(cma_dev, &dev_list, list)
+		cma_listen_on_dev(id_priv, cma_dev);
+out:
+	mutex_unlock(&lock);
+	return ret;
+}
+
+int rdma_listen(struct rdma_cm_id *id, int backlog)
+{
+	struct rdma_id_private *id_priv;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN))
+		return -EINVAL;
+
+	if (id->device) {
+		switch (id->device->node_type) {
+		case IB_NODE_CA:
+			ret = cma_ib_listen(id_priv);
+			break;
+		default:
+			ret = -ENOSYS;
+			break;
+		}
+	} else
+		ret = cma_listen_on_all(id_priv);
+
+	if (ret)
+		goto err;
+
+	id_priv->backlog = backlog;
+	return 0;
+err:
+	cma_comp_exch(id_priv, CMA_LISTEN, CMA_ADDR_BOUND);
+	return ret;
+};
+EXPORT_SYMBOL(rdma_listen);
+
+static void cma_query_handler(int status, struct ib_sa_path_rec *path_rec,
+			      void *context)
+{
+	struct rdma_id_private *id_priv = context;
+	struct rdma_route *route = &id_priv->id.route;
+	enum rdma_cm_event_type event = RDMA_CM_EVENT_ROUTE_RESOLVED;
+
+	atomic_inc(&id_priv->dev_remove);
+	if (!status) {
+		route->path_rec = kmalloc(sizeof *route->path_rec, GFP_KERNEL);
+		if (route->path_rec) {
+			route->num_paths = 1;
+			*route->path_rec = *path_rec;
+			if (!cma_comp_exch(id_priv, CMA_ROUTE_QUERY,
+						    CMA_ROUTE_RESOLVED)) {
+				kfree(route->path_rec);
+				goto out;
+			}
+		} else
+			status = -ENOMEM;
+	}
+
+	if (status) {
+		if (!cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ADDR_RESOLVED))
+			goto out;
+		event = RDMA_CM_EVENT_ROUTE_ERROR;
+	}
+
+	if (cma_notify_user(id_priv, event, status, NULL, 0)) {
+		cma_exch(id_priv, CMA_DESTROYING);
+		cma_release_remove(id_priv);
+		cma_deref_id(id_priv);
+		rdma_destroy_id(&id_priv->id);
+		return;
+	}
+out:
+	cma_release_remove(id_priv);
+	cma_deref_id(id_priv);
+}
+
+static int cma_resolve_ib_route(struct rdma_id_private *id_priv, int timeout_ms)
+{
+	struct rdma_dev_addr *addr = &id_priv->id.route.addr.dev_addr;
+	struct ib_sa_path_rec path_rec;
+
+	memset(&path_rec, 0, sizeof path_rec);
+	path_rec.sgid = *ib_addr_get_sgid(addr);
+	path_rec.dgid = *ib_addr_get_dgid(addr);
+	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
+	path_rec.numb_path = 1;
+
+	id_priv->query_id = ib_sa_path_rec_get(id_priv->id.device,
+				id_priv->id.port_num, &path_rec,
+				IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
+				IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH,
+				timeout_ms, GFP_KERNEL,
+				cma_query_handler, id_priv, &id_priv->query);
+	
+	return (id_priv->query_id < 0) ? id_priv->query_id : 0;
+}
+
+int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
+{
+	struct rdma_id_private *id_priv;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ROUTE_QUERY))
+		return -EINVAL;
+
+	atomic_inc(&id_priv->refcount);
+	switch (id->device->node_type) {
+	case IB_NODE_CA:
+		ret = cma_resolve_ib_route(id_priv, timeout_ms);
+		break;
+	default:
+		ret = -ENOSYS;
+		break;
+	}
+	if (ret)
+		goto err;
+
+	return 0;
+err:
+	cma_comp_exch(id_priv, CMA_ROUTE_QUERY, CMA_ADDR_RESOLVED);
+	cma_deref_id(id_priv);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_resolve_route);
+
+static int cma_bind_loopback(struct rdma_id_private *id_priv)
+{
+	struct cma_device *cma_dev;
+	union ib_gid *gid;
+	u16 pkey;
+	int ret;
+
+	mutex_lock(&lock);
+	if (list_empty(&dev_list)) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	cma_dev = list_entry(dev_list.next, struct cma_device, list);
+	gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr);
+	ret = ib_get_cached_gid(cma_dev->device, 1, 0, gid);
+	if (ret)
+		goto out;
+
+	ret = ib_get_cached_pkey(cma_dev->device, 1, 0, &pkey);
+	if (ret)
+		goto out;
+
+	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
+	id_priv->id.port_num = 1;
+	cma_attach_to_dev(id_priv, cma_dev);
+out:
+	mutex_unlock(&lock);
+	return ret;
+}
+
+static void addr_handler(int status, struct sockaddr *src_addr,
+			 struct rdma_dev_addr *dev_addr, void *context)
+{
+	struct rdma_id_private *id_priv = context;
+	enum rdma_cm_event_type event;
+	enum cma_state old_state;
+
+	atomic_inc(&id_priv->dev_remove);
+	if (!id_priv->cma_dev) {
+		old_state = CMA_IDLE;
+		if (!status)
+			status = cma_acquire_dev(id_priv);
+	} else
+		old_state = CMA_ADDR_BOUND;
+
+	if (status) {
+		if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, old_state))
+			goto out;
+		event = RDMA_CM_EVENT_ADDR_ERROR;
+	} else {
+		if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED))
+			goto out;
+		memcpy(&id_priv->id.route.addr.src_addr, src_addr,
+		       ip_addr_size(src_addr));
+		event = RDMA_CM_EVENT_ADDR_RESOLVED;
+	}
+
+	if (cma_notify_user(id_priv, event, status, NULL, 0)) {
+		cma_exch(id_priv, CMA_DESTROYING);
+		cma_release_remove(id_priv);
+		cma_deref_id(id_priv);
+		rdma_destroy_id(&id_priv->id);
+		return;
+	}
+out:
+	cma_release_remove(id_priv);
+	cma_deref_id(id_priv);
+}
+
+static void loopback_addr_handler(void *data)
+{
+	struct cma_work *work = data;
+	struct rdma_id_private *id_priv = work->id;
+
+	kfree(work);
+	atomic_inc(&id_priv->dev_remove);
+
+	if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED))
+		goto out;
+
+	if (cma_notify_user(id_priv, RDMA_CM_EVENT_ADDR_RESOLVED, 0, NULL, 0)) {
+		cma_exch(id_priv, CMA_DESTROYING);
+		cma_release_remove(id_priv);
+		cma_deref_id(id_priv);
+		rdma_destroy_id(&id_priv->id);
+		return;
+	}
+out:
+	cma_release_remove(id_priv);
+	cma_deref_id(id_priv);
+}
+
+static int cma_resolve_loopback(struct rdma_id_private *id_priv,
+				struct sockaddr *src_addr, enum cma_state state)
+{
+	struct cma_work *work;
+	struct rdma_dev_addr *dev_addr;
+	int ret;
+
+	work = kmalloc(sizeof *work, GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	if (state == CMA_IDLE) {
+		ret = cma_bind_loopback(id_priv);
+		if (ret)
+			goto err;
+		dev_addr = &id_priv->id.route.addr.dev_addr;
+		ib_addr_set_dgid(dev_addr, ib_addr_get_sgid(dev_addr));
+		if (!src_addr || cma_any_addr(src_addr))
+			src_addr = &id_priv->id.route.addr.dst_addr;
+		memcpy(&id_priv->id.route.addr.src_addr, src_addr,
+		       ip_addr_size(src_addr));
+	}
+
+	work->id = id_priv;
+	INIT_WORK(&work->work, loopback_addr_handler, work);
+	queue_work(rdma_wq, &work->work);
+	return 0;
+err:
+	kfree(work);
+	return ret;
+}
+
+int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
+		      struct sockaddr *dst_addr, int timeout_ms)
+{
+	struct rdma_id_private *id_priv;
+	enum cma_state expected_state;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (id_priv->cma_dev) {
+		expected_state = CMA_ADDR_BOUND;
+		src_addr = &id->route.addr.src_addr;
+	} else
+		expected_state = CMA_IDLE;
+
+	if (!cma_comp_exch(id_priv, expected_state, CMA_ADDR_QUERY))
+		return -EINVAL;
+
+	atomic_inc(&id_priv->refcount);
+	memcpy(&id->route.addr.dst_addr, dst_addr, ip_addr_size(dst_addr));
+	if (cma_loopback_addr(dst_addr))
+		ret = cma_resolve_loopback(id_priv, src_addr, expected_state);
+	else
+		ret = rdma_resolve_ip(src_addr, dst_addr,
+				      &id->route.addr.dev_addr,
+				      timeout_ms, addr_handler, id_priv);
+	if (ret)
+		goto err;
+
+	return 0;
+err:
+	cma_comp_exch(id_priv, CMA_ADDR_QUERY, expected_state);
+	cma_deref_id(id_priv);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_resolve_addr);
+
+int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
+{
+	struct rdma_id_private *id_priv;
+	struct rdma_dev_addr *dev_addr;
+	int ret;
+
+	if (addr->sa_family != AF_INET)
+		return -EINVAL;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_BOUND))
+		return -EINVAL;
+
+	if (cma_any_addr(addr)) {
+		ret = 0;
+	} else if (cma_loopback_addr(addr)) {
+		ret = cma_bind_loopback(id_priv);
+	} else {
+		dev_addr = &id->route.addr.dev_addr;
+		ret = rdma_translate_ip(addr, dev_addr);
+		if (!ret)
+			ret = cma_acquire_dev(id_priv);
+	}
+
+	if (ret)
+		goto err;
+
+	memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
+	return 0;
+err:
+	cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_bind_addr);
+
+static void cma_format_hdr(void *hdr, enum rdma_port_space ps,
+			   struct rdma_route *route)
+{
+	struct sockaddr_in *src4, *dst4;
+	struct cma_hdr *cma_hdr;
+	struct sdp_hh *sdp_hdr;
+
+	src4 = (struct sockaddr_in *) &route->addr.src_addr;
+	dst4 = (struct sockaddr_in *) &route->addr.dst_addr;
+
+	switch (ps) {
+	case RDMA_PS_SDP:
+		sdp_hdr = hdr;
+		sdp_hdr->sdp_version = SDP_VERSION;
+		sdp_set_ip_ver(sdp_hdr, 4);
+		sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
+		sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
+		sdp_hdr->port = src4->sin_port;
+		break;
+	default:
+		cma_hdr = hdr;
+		cma_hdr->cma_version = CMA_VERSION;
+		cma_set_ip_ver(cma_hdr, 4);
+		cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
+		cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
+		cma_hdr->port = src4->sin_port;
+		break;
+	}
+}
+
+static int cma_connect_ib(struct rdma_id_private *id_priv,
+			  struct rdma_conn_param *conn_param)
+{
+	struct ib_cm_req_param req;
+	struct rdma_route *route;
+	void *private_data;
+	int offset, ret;
+
+	memset(&req, 0, sizeof req);
+	offset = cma_user_data_offset(id_priv->id.ps);
+	req.private_data_len = offset + conn_param->private_data_len;
+	private_data = kzalloc(req.private_data_len, GFP_ATOMIC);
+	if (!private_data)
+		return -ENOMEM;
+
+	if (conn_param->private_data && conn_param->private_data_len)
+		memcpy(private_data + offset, conn_param->private_data,
+		       conn_param->private_data_len);
+
+	id_priv->cm_id = ib_create_cm_id(id_priv->id.device, cma_ib_handler,
+					 id_priv);
+	if (IS_ERR(id_priv->cm_id)) {
+		ret = PTR_ERR(id_priv->cm_id);
+		goto out;
+	}
+
+	route = &id_priv->id.route;
+	cma_format_hdr(private_data, id_priv->id.ps, route);
+	req.private_data = private_data;
+
+	req.primary_path = &route->path_rec[0];
+	if (route->num_paths == 2)
+		req.alternate_path = &route->path_rec[1];
+
+	req.service_id = cma_get_service_id(id_priv->id.ps,
+					    &route->addr.dst_addr);
+	req.qp_num = id_priv->qp_num;
+	req.qp_type = id_priv->qp_type;
+	req.starting_psn = id_priv->seq_num;
+	req.responder_resources = conn_param->responder_resources;
+	req.initiator_depth = conn_param->initiator_depth;
+	req.flow_control = conn_param->flow_control;
+	req.retry_count = conn_param->retry_count;
+	req.rnr_retry_count = conn_param->rnr_retry_count;
+	req.remote_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT;
+	req.local_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT;
+	req.max_cm_retries = CMA_MAX_CM_RETRIES;
+	req.srq = id_priv->srq ? 1 : 0;
+
+	ret = ib_send_cm_req(id_priv->cm_id, &req);
+out:
+	kfree(private_data);
+	return ret;
+}
+
+int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
+{
+	struct rdma_id_private *id_priv;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (!cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_CONNECT))
+		return -EINVAL;
+
+	if (!id->qp) {
+		id_priv->qp_num = conn_param->qp_num;
+		id_priv->qp_type = conn_param->qp_type;
+		id_priv->srq = conn_param->srq;
+	}
+
+	switch (id->device->node_type) {
+	case IB_NODE_CA:
+		ret = cma_connect_ib(id_priv, conn_param);
+		break;
+	default:
+		ret = -ENOSYS;
+		break;
+	}
+	if (ret)
+		goto err;
+
+	return 0;
+err:
+	cma_comp_exch(id_priv, CMA_CONNECT, CMA_ROUTE_RESOLVED);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_connect);
+
+static int cma_accept_ib(struct rdma_id_private *id_priv,
+			 struct rdma_conn_param *conn_param)
+{
+	struct ib_cm_rep_param rep;
+	int ret;
+
+	ret = cma_modify_qp_rtr(&id_priv->id);
+	if (ret)
+		return ret;
+
+	memset(&rep, 0, sizeof rep);
+	rep.qp_num = id_priv->qp_num;
+	rep.starting_psn = id_priv->seq_num;
+	rep.private_data = conn_param->private_data;
+	rep.private_data_len = conn_param->private_data_len;
+	rep.responder_resources = conn_param->responder_resources;
+	rep.initiator_depth = conn_param->initiator_depth;
+	rep.target_ack_delay = CMA_CM_RESPONSE_TIMEOUT;
+	rep.failover_accepted = 0;
+	rep.flow_control = conn_param->flow_control;
+	rep.rnr_retry_count = conn_param->rnr_retry_count;
+	rep.srq = id_priv->srq ? 1 : 0;
+
+	return ib_send_cm_rep(id_priv->cm_id, &rep);
+}
+
+int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
+{
+	struct rdma_id_private *id_priv;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (!cma_comp(id_priv, CMA_CONNECT))
+		return -EINVAL;
+
+	if (!id->qp && conn_param) {
+		id_priv->qp_num = conn_param->qp_num;
+		id_priv->qp_type = conn_param->qp_type;
+		id_priv->srq = conn_param->srq;
+	}
+
+	switch (id->device->node_type) {
+	case IB_NODE_CA:
+		if (conn_param)
+			ret = cma_accept_ib(id_priv, conn_param);
+		else
+			ret = cma_rep_recv(id_priv);
+		break;
+	default:
+		ret = -ENOSYS;
+		break;
+	}
+
+	if (ret)
+		goto reject;
+
+	return 0;
+reject:
+	cma_modify_qp_err(id);
+	rdma_reject(id, NULL, 0);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_accept);
+
+int rdma_reject(struct rdma_cm_id *id, const void *private_data,
+		u8 private_data_len)
+{
+	struct rdma_id_private *id_priv;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (!cma_comp(id_priv, CMA_CONNECT))
+		return -EINVAL;
+
+	switch (id->device->node_type) {
+	case IB_NODE_CA:
+		ret = ib_send_cm_rej(id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED,
+				     NULL, 0, private_data, private_data_len);
+		break;
+	default:
+		ret = -ENOSYS;
+		break;
+	}
+	return ret;
+};
+EXPORT_SYMBOL(rdma_reject);
+
+int rdma_disconnect(struct rdma_cm_id *id)
+{
+	struct rdma_id_private *id_priv;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (!cma_comp(id_priv, CMA_CONNECT))
+		return -EINVAL;
+
+	ret = cma_modify_qp_err(id);
+	if (ret)
+		goto out;
+
+	switch (id->device->node_type) {
+	case IB_NODE_CA:
+		/* Initiate or respond to a disconnect. */
+		if (ib_send_cm_dreq(id_priv->cm_id, NULL, 0))
+			ib_send_cm_drep(id_priv->cm_id, NULL, 0);
+		break;
+	default:
+		break;
+	}
+out:
+	return ret;
+}
+EXPORT_SYMBOL(rdma_disconnect);
+
+static void cma_add_one(struct ib_device *device)
+{
+	struct cma_device *cma_dev;
+	struct rdma_id_private *id_priv;
+
+	cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL);
+	if (!cma_dev)
+		return;
+
+	cma_dev->device = device;
+	cma_dev->node_guid = device->node_guid;
+	if (!cma_dev->node_guid)
+		goto err;
+
+	init_waitqueue_head(&cma_dev->wait);
+	atomic_set(&cma_dev->refcount, 1);
+	INIT_LIST_HEAD(&cma_dev->id_list);
+	ib_set_client_data(device, &cma_client, cma_dev);
+
+	mutex_lock(&lock);
+	list_add_tail(&cma_dev->list, &dev_list);
+	list_for_each_entry(id_priv, &listen_any_list, list)
+		cma_listen_on_dev(id_priv, cma_dev);
+	mutex_unlock(&lock);
+	return;
+err:
+	kfree(cma_dev);
+}
+
+static int cma_remove_id_dev(struct rdma_id_private *id_priv)
+{
+	enum cma_state state;
+
+	/* Record that we want to remove the device */
+	state = cma_exch(id_priv, CMA_DEVICE_REMOVAL);
+	if (state == CMA_DESTROYING)
+		return 0;
+
+	cma_cancel_operation(id_priv, state);
+	wait_event(id_priv->wait_remove, !atomic_read(&id_priv->dev_remove));
+
+	/* Check for destruction from another callback. */
+	if (!cma_comp(id_priv, CMA_DEVICE_REMOVAL))
+		return 0;
+
+	return cma_notify_user(id_priv, RDMA_CM_EVENT_DEVICE_REMOVAL,
+			       0, NULL, 0);
+}
+
+static void cma_process_remove(struct cma_device *cma_dev)
+{
+	struct list_head remove_list;
+	struct rdma_id_private *id_priv;
+	int ret;
+
+	INIT_LIST_HEAD(&remove_list);
+
+	mutex_lock(&lock);
+	while (!list_empty(&cma_dev->id_list)) {
+		id_priv = list_entry(cma_dev->id_list.next,
+				     struct rdma_id_private, list);
+
+		if (cma_internal_listen(id_priv)) {
+			cma_destroy_listen(id_priv);
+			continue;
+		}
+
+		list_del(&id_priv->list);
+		list_add_tail(&id_priv->list, &remove_list);
+		atomic_inc(&id_priv->refcount);
+		mutex_unlock(&lock);
+
+		ret = cma_remove_id_dev(id_priv);
+		cma_deref_id(id_priv);
+		if (ret)
+			rdma_destroy_id(&id_priv->id);
+
+		mutex_lock(&lock);
+	}
+	mutex_unlock(&lock);
+
+	atomic_dec(&cma_dev->refcount);
+	wait_event(cma_dev->wait, !atomic_read(&cma_dev->refcount));
+}
+
+static void cma_remove_one(struct ib_device *device)
+{
+	struct cma_device *cma_dev;
+
+	cma_dev = ib_get_client_data(device, &cma_client);
+	if (!cma_dev)
+		return;
+
+	mutex_lock(&lock);
+	list_del(&cma_dev->list);
+	mutex_unlock(&lock);
+
+	cma_process_remove(cma_dev);
+	kfree(cma_dev);
+}
+
+static int cma_init(void)
+{
+	return ib_register_client(&cma_client);
+}
+
+static void cma_cleanup(void)
+{
+	ib_unregister_client(&cma_client);
+}
+
+module_init(cma_init);
+module_exit(cma_cleanup);
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/Makefile 
linux-2.6.ib/drivers/infiniband/core/Makefile
--- linux-2.6.git/drivers/infiniband/core/Makefile	2006-01-16 16:16:18.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/Makefile	2006-01-16 16:35:48.000000000 -0800
@@ -1,5 +1,5 @@
 obj-$(CONFIG_INFINIBAND) +=		ib_core.o ib_mad.o ib_sa.o \
-					ib_cm.o ib_addr.o
+					ib_cm.o ib_addr.o rdma_cm.o
 obj-$(CONFIG_INFINIBAND_USER_MAD) +=	ib_umad.o
 obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o
 
@@ -12,6 +12,8 @@ ib_sa-y :=			sa_query.o
 
 ib_cm-y :=			cm.o
 
+rdma_cm-y :=			cma.o
+
 ib_addr-y :=			addr.o
 
 ib_umad-y :=			user_mad.o
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/include/rdma/rdma_cm.h 
linux-2.6.ib/include/rdma/rdma_cm.h
--- linux-2.6.git/include/rdma/rdma_cm.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/include/rdma/rdma_cm.h	2006-01-16 16:19:12.000000000 -0800
@@ -0,0 +1,255 @@
+/*
+ * Copyright (c) 2005 Voltaire Inc.  All rights reserved.
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This Software is licensed under one of the following licenses:
+ *
+ * 1) under the terms of the "Common Public License 1.0" a copy of which is
+ *    available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/cpl.php.
+ *
+ * 2) under the terms of the "The BSD License" a copy of which is
+ *    available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/bsd-license.php.
+ *
+ * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+ *    copy of which is available from the Open Source Initiative, see
+ *    http://www.opensource.org/licenses/gpl-license.php.
+ *
+ * Licensee has the right to choose one of the above licenses.
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice and one of the license notices.
+ *
+ * Redistributions in binary form must reproduce both the above copyright
+ * notice, one of the license notices in the documentation
+ * and/or other materials provided with the distribution.
+ *
+ */
+
+#if !defined(RDMA_CM_H)
+#define RDMA_CM_H
+
+#include <linux/socket.h>
+#include <linux/in6.h>
+#include <rdma/ib_addr.h>
+#include <rdma/ib_sa.h>
+
+/*
+ * Upon receiving a device removal event, users must destroy the associated
+ * RDMA identifier and release all resources allocated with the device.
+ */
+enum rdma_cm_event_type {
+	RDMA_CM_EVENT_ADDR_RESOLVED,
+	RDMA_CM_EVENT_ADDR_ERROR,
+	RDMA_CM_EVENT_ROUTE_RESOLVED,
+	RDMA_CM_EVENT_ROUTE_ERROR,
+	RDMA_CM_EVENT_CONNECT_REQUEST,
+	RDMA_CM_EVENT_CONNECT_RESPONSE,
+	RDMA_CM_EVENT_CONNECT_ERROR,
+	RDMA_CM_EVENT_UNREACHABLE,
+	RDMA_CM_EVENT_REJECTED,
+	RDMA_CM_EVENT_ESTABLISHED,
+	RDMA_CM_EVENT_DISCONNECTED,
+	RDMA_CM_EVENT_DEVICE_REMOVAL,
+};
+
+enum rdma_port_space {
+	RDMA_PS_SDP  = 0x0001,
+	RDMA_PS_TCP  = 0x0106,
+	RDMA_PS_UDP  = 0x0111,
+	RDMA_PS_SCTP = 0x0183
+};
+
+struct rdma_addr {
+	struct sockaddr src_addr;
+	u8		src_pad[sizeof(struct sockaddr_in6) -
+				sizeof(struct sockaddr)];
+	struct sockaddr dst_addr;
+	u8		dst_pad[sizeof(struct sockaddr_in6) -
+				sizeof(struct sockaddr)];
+	struct rdma_dev_addr dev_addr;
+};
+
+struct rdma_route {
+	struct rdma_addr addr;
+	struct ib_sa_path_rec *path_rec;
+	int num_paths;
+};
+
+struct rdma_cm_event {
+	enum rdma_cm_event_type	 event;
+	int			 status;
+	void			*private_data;
+	u8			 private_data_len;
+};
+
+struct rdma_cm_id;
+
+/**
+ * rdma_cm_event_handler - Callback used to report user events.
+ *
+ * Notes: Users may not call rdma_destroy_id from this callback to destroy
+ *   the passed in id, or a corresponding listen id.  Returning a
+ *   non-zero value from the callback will destroy the corresponding id.
+ */
+typedef int (*rdma_cm_event_handler)(struct rdma_cm_id *id,
+				     struct rdma_cm_event *event);
+
+struct rdma_cm_id {
+	struct ib_device	*device;
+	void			*context;
+	struct ib_qp		*qp;
+	rdma_cm_event_handler	 event_handler;
+	struct rdma_route	 route;
+	enum rdma_port_space	 ps;
+	u8			 port_num;
+};
+
+/**
+ * rdma_create_id - Create an RDMA identifier.
+ *
+ * @event_handler: User callback invoked to report events associated with the
+ *   returned rdma_id.
+ * @context: User specified context associated with the id.
+ * @ps: RDMA port space.
+ */
+struct rdma_cm_id* rdma_create_id(rdma_cm_event_handler event_handler,
+				  void *context, enum rdma_port_space ps);
+
+void rdma_destroy_id(struct rdma_cm_id *id);
+
+/**
+ * rdma_bind_addr - Bind an RDMA identifier to a source address and
+ *   associated RDMA device, if needed.
+ *
+ * @id: RDMA identifier.
+ * @addr: Local address information.  Wildcard values are permitted.
+ *
+ * This associates a source address with the RDMA identifier before calling
+ * rdma_listen.  If a specific local address is given, the RDMA identifier will
+ * be bound to a local RDMA device.
+ */
+int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr);
+
+/**
+ * rdma_resolve_addr - Resolve destination and optional source addresses
+ *   from IP addresses to an RDMA address.  If successful, the specified
+ *   rdma_cm_id will be bound to a local device.
+ *
+ * @id: RDMA identifier.
+ * @src_addr: Source address information.  This parameter may be NULL.
+ * @dst_addr: Destination address information.
+ * @timeout_ms: Time to wait for resolution to complete.
+ */
+int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
+		      struct sockaddr *dst_addr, int timeout_ms);
+
+/**
+ * rdma_resolve_route - Resolve the RDMA address bound to the RDMA identifier
+ *   into route information needed to establish a connection.
+ *
+ * This is called on the client side of a connection.
+ * Users must have first called rdma_resolve_addr to resolve a dst_addr
+ * into an RDMA address before calling this routine.
+ */
+int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms);
+
+/**
+ * rdma_create_qp - Allocate a QP and associate it with the specified RDMA
+ * identifier.
+ *
+ * QPs allocated to an rdma_cm_id will automatically be transitioned by the CMA
+ * through their states.
+ */
+int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd,
+		   struct ib_qp_init_attr *qp_init_attr);
+
+/**
+ * rdma_destroy_qp - Deallocate the QP associated with the specified RDMA
+ * identifier.
+ *
+ * Users must destroy any QP associated with an RDMA identifier before
+ * destroying the RDMA ID.
+ */
+void rdma_destroy_qp(struct rdma_cm_id *id);
+
+/**
+ * rdma_init_qp_attr - Initializes the QP attributes for use in transitioning
+ *   to a specified QP state.
+ * @id: Communication identifier associated with the QP attributes to
+ *   initialize.
+ * @qp_attr: On input, specifies the desired QP state.  On output, the
+ *   mandatory and desired optional attributes will be set in order to
+ *   modify the QP to the specified state.
+ * @qp_attr_mask: The QP attribute mask that may be used to transition the
+ *   QP to the specified state.
+ *
+ * Users must set the @qp_attr->qp_state to the desired QP state.  This call
+ * will set all required attributes for the given transition, along with
+ * known optional attributes.  Users may override the attributes returned from
+ * this call before calling ib_modify_qp.
+ *
+ * Users that wish to have their QP automatically transitioned through its
+ * states can associate a QP with the rdma_cm_id by calling rdma_create_qp().
+ */
+int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
+		       int *qp_attr_mask);
+
+struct rdma_conn_param {
+	const void *private_data;
+	u8 private_data_len;
+	u8 responder_resources;
+	u8 initiator_depth;
+	u8 flow_control;
+	u8 retry_count;		/* ignored when accepting */
+	u8 rnr_retry_count;
+	/* Fields below ignored if a QP is created on the rdma_cm_id. */
+	u8 srq;
+	u32 qp_num;
+	enum ib_qp_type qp_type;
+};
+
+/**
+ * rdma_connect - Initiate an active connection request.
+ *
+ * Users must have resolved a route for the rdma_cm_id to connect with
+ * by having called rdma_resolve_route before calling this routine.
+ */
+int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param);
+
+/**
+ * rdma_listen - This function is called by the passive side to
+ *   listen for incoming connection requests.
+ *
+ * Users must have bound the rdma_cm_id to a local address by calling
+ * rdma_bind_addr before calling this routine.
+ */
+int rdma_listen(struct rdma_cm_id *id, int backlog);
+
+/**
+ * rdma_accept - Called to accept a connection request or response.
+ * @id: Connection identifier associated with the request.
+ * @conn_param: Information needed to establish the connection.  This must be
+ *   provided if accepting a connection request.  If accepting a connection
+ *   response, this parameter must be NULL.
+ *
+ * Typically, this routine is only called by the listener to accept a connection
+ * request.  It must also be called on the active side of a connection if the
+ * user is performing their own QP transitions.
+ */
+int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param);
+
+/**
+ * rdma_reject - Called on the passive side to reject a connection request.
+ */
+int rdma_reject(struct rdma_cm_id *id, const void *private_data,
+		u8 private_data_len);
+
+/**
+ * rdma_disconnect - This function disconnects the associated QP.
+ */
+int rdma_disconnect(struct rdma_cm_id *id);
+
+#endif /* RDMA_CM_H */
+

^ permalink raw reply

* [PATCH 5/5] Infiniband: connection abstraction
From: Sean Hefty @ 2006-02-01 20:19 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: openib-general
In-Reply-To: <ORSMSX401KZmcK8r6be00000097@orsmsx401.amr.corp.intel.com>

This patch adds the kernel component to support the userspace Infiniband/RDMA
connection agent library.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

---

diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/Makefile 
linux-2.6.ib/drivers/infiniband/core/Makefile
--- linux-2.6.git/drivers/infiniband/core/Makefile	2006-01-16 16:58:58.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/Makefile	2006-01-16 16:55:25.000000000 -0800
@@ -1,5 +1,5 @@
 obj-$(CONFIG_INFINIBAND) +=		ib_core.o ib_mad.o ib_sa.o \
-					ib_cm.o ib_addr.o rdma_cm.o
+					ib_cm.o ib_addr.o rdma_cm.o rdma_ucm.o
 obj-$(CONFIG_INFINIBAND_USER_MAD) +=	ib_umad.o
 obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o
 
@@ -14,6 +14,8 @@ ib_cm-y :=			cm.o
 
 rdma_cm-y :=			cma.o
 
+rdma_ucm-y :=			ucma.o
+
 ib_addr-y :=			addr.o
 
 ib_umad-y :=			user_mad.o
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/drivers/infiniband/core/ucma.c 
linux-2.6.ib/drivers/infiniband/core/ucma.c
--- linux-2.6.git/drivers/infiniband/core/ucma.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/drivers/infiniband/core/ucma.c	2006-01-16 16:54:31.000000000 -0800
@@ -0,0 +1,788 @@
+/*
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *	copyright notice, this list of conditions and the following
+ *	disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *	copyright notice, this list of conditions and the following
+ *	disclaimer in the documentation and/or other materials
+ *	provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/poll.h>
+#include <linux/idr.h>
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/miscdevice.h>
+
+#include <rdma/rdma_user_cm.h>
+#include <rdma/ib_marshall.h>
+#include <rdma/rdma_cm.h>
+
+MODULE_AUTHOR("Sean Hefty");
+MODULE_DESCRIPTION("RDMA Userspace Connection Manager Access");
+MODULE_LICENSE("Dual BSD/GPL");
+
+enum {
+	UCMA_MAX_BACKLOG	= 128
+};
+
+struct ucma_file {
+	struct mutex		file_mutex;
+	struct file		*filp;
+	struct list_head	ctxs;
+	struct list_head	events;
+	wait_queue_head_t	poll_wait;
+};
+
+struct ucma_context {
+	int			id;
+	wait_queue_head_t	wait;
+	atomic_t		ref;
+	int			events_reported;
+	int			backlog;
+
+	struct ucma_file	*file;
+	struct rdma_cm_id	*cm_id;
+	__u64			uid;
+
+	struct list_head	events;    /* list of pending events. */
+	struct list_head	file_list; /* member in file ctx list */
+};
+
+struct ucma_event {
+	struct ucma_context	*ctx;
+	struct list_head	file_list; /* member in file event list */
+	struct list_head	ctx_list;  /* member in ctx event list */
+	struct rdma_cm_id	*cm_id;
+	struct rdma_ucm_event_resp resp;
+};
+
+static DEFINE_MUTEX(ctx_mutex);
+static DEFINE_IDR(ctx_idr);
+
+static struct ucma_context* ucma_get_ctx(struct ucma_file *file, int id)
+{
+	struct ucma_context *ctx;
+
+	mutex_lock(&ctx_mutex);
+	ctx = idr_find(&ctx_idr, id);
+	if (!ctx)
+		ctx = ERR_PTR(-ENOENT);
+	else if (ctx->file != file)
+		ctx = ERR_PTR(-EINVAL);
+	else
+		atomic_inc(&ctx->ref);
+	mutex_unlock(&ctx_mutex);
+
+	return ctx;
+}
+
+static void ucma_put_ctx(struct ucma_context *ctx)
+{
+	if (atomic_dec_and_test(&ctx->ref))
+		wake_up(&ctx->wait);
+}
+
+static void ucma_cleanup_events(struct ucma_context *ctx)
+{
+	struct ucma_event *uevent;
+
+	mutex_lock(&ctx->file->file_mutex);
+	list_del(&ctx->file_list);
+	while (!list_empty(&ctx->events)) {
+
+		uevent = list_entry(ctx->events.next, struct ucma_event,
+				    ctx_list);
+		list_del(&uevent->file_list);
+		list_del(&uevent->ctx_list);
+
+		/* clear incoming connections. */
+		if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST)
+			rdma_destroy_id(uevent->cm_id);
+
+		kfree(uevent);
+	}
+	mutex_unlock(&ctx->file->file_mutex);
+}
+
+static struct ucma_context* ucma_alloc_ctx(struct ucma_file *file)
+{
+	struct ucma_context *ctx;
+	int ret;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return NULL;
+
+	atomic_set(&ctx->ref, 1);
+	init_waitqueue_head(&ctx->wait);
+	ctx->file = file;
+	INIT_LIST_HEAD(&ctx->events);
+
+	do {
+		ret = idr_pre_get(&ctx_idr, GFP_KERNEL);
+		if (!ret)
+			goto error;
+
+		mutex_lock(&ctx_mutex);
+		ret = idr_get_new(&ctx_idr, ctx, &ctx->id);
+		mutex_unlock(&ctx_mutex);
+	} while (ret == -EAGAIN);
+
+	if (ret)
+		goto error;
+
+	list_add_tail(&ctx->file_list, &file->ctxs);
+	return ctx;
+
+error:
+	kfree(ctx);
+	return NULL;
+}
+
+static int ucma_event_handler(struct rdma_cm_id *cm_id,
+			      struct rdma_cm_event *event)
+{
+	struct ucma_event *uevent;
+	struct ucma_context *ctx = cm_id->context;
+	int ret = 0;
+
+	uevent = kzalloc(sizeof(*uevent), GFP_KERNEL);
+	if (!uevent)
+		return event->event == RDMA_CM_EVENT_CONNECT_REQUEST;
+
+	uevent->ctx = ctx;
+	uevent->cm_id = cm_id;
+	uevent->resp.uid = ctx->uid;
+	uevent->resp.id = ctx->id;
+	uevent->resp.event = event->event;
+	uevent->resp.status = event->status;
+	if ((uevent->resp.private_data_len = event->private_data_len))
+		memcpy(uevent->resp.private_data, event->private_data,
+		       event->private_data_len);
+
+	mutex_lock(&ctx->file->file_mutex);
+	if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) {
+		if (!ctx->backlog) {
+			ret = -EDQUOT;
+			goto out;
+		}
+		ctx->backlog--;
+	}
+	list_add_tail(&uevent->file_list, &ctx->file->events);
+	list_add_tail(&uevent->ctx_list, &ctx->events);
+	wake_up_interruptible(&ctx->file->poll_wait);
+out:
+	mutex_unlock(&ctx->file->file_mutex);
+	return ret;
+}
+
+static ssize_t ucma_get_event(struct ucma_file *file, const char __user *inbuf,
+			      int in_len, int out_len)
+{
+	struct ucma_context *ctx;
+	struct rdma_ucm_get_event cmd;
+	struct ucma_event *uevent;
+	int ret = 0;
+	DEFINE_WAIT(wait);
+
+	if (out_len < sizeof(struct rdma_ucm_event_resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	mutex_lock(&file->file_mutex);
+	while (list_empty(&file->events)) {
+		if (file->filp->f_flags & O_NONBLOCK) {
+			ret = -EAGAIN;
+			break;
+		}
+
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+
+		prepare_to_wait(&file->poll_wait, &wait, TASK_INTERRUPTIBLE);
+		mutex_unlock(&file->file_mutex);
+		schedule();
+		mutex_lock(&file->file_mutex);
+		finish_wait(&file->poll_wait, &wait);
+	}
+
+	if (ret)
+		goto done;
+
+	uevent = list_entry(file->events.next, struct ucma_event, file_list);
+
+	if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST) {
+		ctx = ucma_alloc_ctx(file);
+		if (!ctx) {
+			ret = -ENOMEM;
+			goto done;
+		}
+		uevent->ctx->backlog++;
+		ctx->cm_id = uevent->cm_id;
+		ctx->cm_id->context = ctx;
+		uevent->resp.id = ctx->id;
+	}
+
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &uevent->resp, sizeof(uevent->resp))) {
+		ret = -EFAULT;
+		goto done;
+	}
+
+	list_del(&uevent->file_list);
+	list_del(&uevent->ctx_list);
+	uevent->ctx->events_reported++;
+	kfree(uevent);
+done:
+	mutex_unlock(&file->file_mutex);
+	return ret;
+}
+
+static ssize_t ucma_create_id(struct ucma_file *file,
+				const char __user *inbuf,
+				int in_len, int out_len)
+{
+	struct rdma_ucm_create_id cmd;
+	struct rdma_ucm_create_id_resp resp;
+	struct ucma_context *ctx;
+	int ret;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	mutex_lock(&file->file_mutex);
+	ctx = ucma_alloc_ctx(file);
+	mutex_unlock(&file->file_mutex);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->uid = cmd.uid;
+	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, RDMA_PS_TCP);
+	if (IS_ERR(ctx->cm_id)) {
+		ret = PTR_ERR(ctx->cm_id);
+		goto err1;
+	}
+
+	resp.id = ctx->id;
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &resp, sizeof(resp))) {
+		ret = -EFAULT;
+		goto err2;
+	}
+	return 0;
+
+err2:
+	rdma_destroy_id(ctx->cm_id);
+err1:
+	mutex_lock(&ctx_mutex);
+	idr_remove(&ctx_idr, ctx->id);
+	mutex_unlock(&ctx_mutex);
+	kfree(ctx);
+	return ret;
+}
+
+static ssize_t ucma_destroy_id(struct ucma_file *file, const char __user *inbuf,
+			       int in_len, int out_len)
+{
+	struct rdma_ucm_destroy_id cmd;
+	struct rdma_ucm_destroy_id_resp resp;
+	struct ucma_context *ctx;
+	int ret = 0;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	mutex_lock(&ctx_mutex);
+	ctx = idr_find(&ctx_idr, cmd.id);
+	if (!ctx)
+		ctx = ERR_PTR(-ENOENT);
+	else if (ctx->file != file)
+		ctx = ERR_PTR(-EINVAL);
+	else
+		idr_remove(&ctx_idr, ctx->id);
+	mutex_unlock(&ctx_mutex);
+
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	atomic_dec(&ctx->ref);
+	wait_event(ctx->wait, !atomic_read(&ctx->ref));
+
+	/* No new events will be generated after destroying the id. */
+	rdma_destroy_id(ctx->cm_id);
+	/* Cleanup events not yet reported to the user. */
+	ucma_cleanup_events(ctx);
+
+	resp.events_reported = ctx->events_reported;
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &resp, sizeof(resp)))
+		ret = -EFAULT;
+
+	kfree(ctx);
+	return ret;
+}
+
+static ssize_t ucma_bind_addr(struct ucma_file *file, const char __user *inbuf,
+			      int in_len, int out_len)
+{
+	struct rdma_ucm_bind_addr cmd;
+	struct ucma_context *ctx;
+	int ret;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	ret = rdma_bind_addr(ctx->cm_id, (struct sockaddr *) &cmd.addr);
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_resolve_addr(struct ucma_file *file,
+				 const char __user *inbuf,
+				 int in_len, int out_len)
+{
+	struct rdma_ucm_resolve_addr cmd;
+	struct ucma_context *ctx;
+	int ret;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	ret = rdma_resolve_addr(ctx->cm_id, (struct sockaddr *) &cmd.src_addr,
+				(struct sockaddr *) &cmd.dst_addr,
+				cmd.timeout_ms);
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_resolve_route(struct ucma_file *file,
+				  const char __user *inbuf,
+				  int in_len, int out_len)
+{
+	struct rdma_ucm_resolve_route cmd;
+	struct ucma_context *ctx;
+	int ret;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	ret = rdma_resolve_route(ctx->cm_id, cmd.timeout_ms);
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp,
+			       struct rdma_route *route)
+{
+	struct rdma_dev_addr *dev_addr;
+
+	resp->num_paths = route->num_paths;
+	switch (route->num_paths) {
+	case 0:
+		dev_addr = &route->addr.dev_addr;
+		memcpy(&resp->ib_route[0].dgid, ib_addr_get_dgid(dev_addr),
+		       sizeof(union ib_gid));
+		memcpy(&resp->ib_route[0].sgid, ib_addr_get_sgid(dev_addr),
+		       sizeof(union ib_gid));
+		resp->ib_route[0].pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr));
+		break;
+	case 2:
+		ib_copy_path_rec_to_user(&resp->ib_route[1],
+					 &route->path_rec[1]);
+		/* fall through */
+	case 1:
+		ib_copy_path_rec_to_user(&resp->ib_route[0],
+					 &route->path_rec[0]);
+		break;
+	default:
+		break;
+	}
+}
+
+static ssize_t ucma_query_route(struct ucma_file *file,
+				const char __user *inbuf,
+				int in_len, int out_len)
+{
+	struct rdma_ucm_query_route cmd;
+	struct rdma_ucm_query_route_resp resp;
+	struct ucma_context *ctx;
+	struct sockaddr *addr;
+	int ret = 0;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	if (!ctx->cm_id->device) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	addr = &ctx->cm_id->route.addr.src_addr;
+	memcpy(&resp.src_addr, addr, addr->sa_family == AF_INET ?
+				     sizeof(struct sockaddr_in) : 
+				     sizeof(struct sockaddr_in6));
+	addr = &ctx->cm_id->route.addr.dst_addr;
+	memcpy(&resp.dst_addr, addr, addr->sa_family == AF_INET ?
+				     sizeof(struct sockaddr_in) : 
+				     sizeof(struct sockaddr_in6));
+	resp.node_guid = ctx->cm_id->device->node_guid;
+	resp.port_num = ctx->cm_id->port_num;
+	switch (ctx->cm_id->device->node_type) {
+	case IB_NODE_CA:
+		ucma_copy_ib_route(&resp, &ctx->cm_id->route);
+	default:
+		break;
+	}
+
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &resp, sizeof(resp)))
+		ret = -EFAULT;
+
+out:
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static void ucma_copy_conn_param(struct rdma_conn_param *dst_conn,
+				 struct rdma_ucm_conn_param *src_conn)
+{
+	dst_conn->private_data = src_conn->private_data;
+	dst_conn->private_data_len = src_conn->private_data_len;
+	dst_conn->responder_resources =src_conn->responder_resources;
+	dst_conn->initiator_depth = src_conn->initiator_depth;
+	dst_conn->flow_control = src_conn->flow_control;
+	dst_conn->retry_count = src_conn->retry_count;
+	dst_conn->rnr_retry_count = src_conn->rnr_retry_count;
+	dst_conn->srq = src_conn->srq;
+	dst_conn->qp_num = src_conn->qp_num;
+	dst_conn->qp_type = src_conn->qp_type;
+}
+
+static ssize_t ucma_connect(struct ucma_file *file, const char __user *inbuf,
+			    int in_len, int out_len)
+{
+	struct rdma_ucm_connect cmd;
+	struct rdma_conn_param conn_param;
+	struct ucma_context *ctx;
+	int ret;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	if (!cmd.conn_param.valid)
+		return -EINVAL;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	ucma_copy_conn_param(&conn_param, &cmd.conn_param);
+	ret = rdma_connect(ctx->cm_id, &conn_param);
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_listen(struct ucma_file *file, const char __user *inbuf,
+			   int in_len, int out_len)
+{
+	struct rdma_ucm_listen cmd;
+	struct ucma_context *ctx;
+	int ret;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	ctx->backlog = cmd.backlog > 0 && cmd.backlog < UCMA_MAX_BACKLOG ?
+		       cmd.backlog : UCMA_MAX_BACKLOG;
+	ret = rdma_listen(ctx->cm_id, ctx->backlog);
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf,
+			   int in_len, int out_len)
+{
+	struct rdma_ucm_accept cmd;
+	struct rdma_conn_param conn_param;
+	struct ucma_context *ctx;
+	int ret;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	if (cmd.conn_param.valid) {
+		ctx->uid = cmd.uid;
+		ucma_copy_conn_param(&conn_param, &cmd.conn_param);
+		ret = rdma_accept(ctx->cm_id, &conn_param);
+	} else
+		ret = rdma_accept(ctx->cm_id, NULL);
+
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_reject(struct ucma_file *file, const char __user *inbuf,
+			   int in_len, int out_len)
+{
+	struct rdma_ucm_reject cmd;
+	struct ucma_context *ctx;
+	int ret;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	ret = rdma_reject(ctx->cm_id, cmd.private_data, cmd.private_data_len);
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_disconnect(struct ucma_file *file, const char __user *inbuf,
+			       int in_len, int out_len)
+{
+	struct rdma_ucm_disconnect cmd;
+	struct ucma_context *ctx;
+	int ret;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	ret = rdma_disconnect(ctx->cm_id);
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_init_qp_attr(struct ucma_file *file,
+				 const char __user *inbuf,
+				 int in_len, int out_len)
+{
+	struct rdma_ucm_init_qp_attr cmd;
+	struct ib_uverbs_qp_attr resp;
+	struct ucma_context *ctx;
+	struct ib_qp_attr qp_attr;
+	int ret;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	resp.qp_attr_mask = 0;
+	memset(&qp_attr, 0, sizeof qp_attr);
+	qp_attr.qp_state = cmd.qp_state;
+	ret = rdma_init_qp_attr(ctx->cm_id, &qp_attr, &resp.qp_attr_mask);
+	if (ret)
+		goto out;
+
+	ib_copy_qp_attr_to_user(&resp, &qp_attr);
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &resp, sizeof(resp)))
+		ret = -EFAULT;
+
+out:
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
+				   const char __user *inbuf,
+				   int in_len, int out_len) = {
+	[RDMA_USER_CM_CMD_CREATE_ID]	= ucma_create_id,
+	[RDMA_USER_CM_CMD_DESTROY_ID]	= ucma_destroy_id,
+	[RDMA_USER_CM_CMD_BIND_ADDR]	= ucma_bind_addr,
+	[RDMA_USER_CM_CMD_RESOLVE_ADDR]	= ucma_resolve_addr,
+	[RDMA_USER_CM_CMD_RESOLVE_ROUTE]= ucma_resolve_route,
+	[RDMA_USER_CM_CMD_QUERY_ROUTE]	= ucma_query_route,
+	[RDMA_USER_CM_CMD_CONNECT]	= ucma_connect,
+	[RDMA_USER_CM_CMD_LISTEN]	= ucma_listen,
+	[RDMA_USER_CM_CMD_ACCEPT]	= ucma_accept,
+	[RDMA_USER_CM_CMD_REJECT]	= ucma_reject,
+	[RDMA_USER_CM_CMD_DISCONNECT]	= ucma_disconnect,
+	[RDMA_USER_CM_CMD_INIT_QP_ATTR]	= ucma_init_qp_attr,
+	[RDMA_USER_CM_CMD_GET_EVENT]	= ucma_get_event
+};
+
+static ssize_t ucma_write(struct file *filp, const char __user *buf,
+			  size_t len, loff_t *pos)
+{
+	struct ucma_file *file = filp->private_data;
+	struct rdma_ucm_cmd_hdr hdr;
+	ssize_t ret;
+
+	if (len < sizeof(hdr))
+		return -EINVAL;
+
+	if (copy_from_user(&hdr, buf, sizeof(hdr)))
+		return -EFAULT;
+
+	if (hdr.cmd < 0 || hdr.cmd >= ARRAY_SIZE(ucma_cmd_table))
+		return -EINVAL;
+
+	if (hdr.in + sizeof(hdr) > len)
+		return -EINVAL;
+
+	ret = ucma_cmd_table[hdr.cmd](file, buf + sizeof(hdr), hdr.in, hdr.out);
+	if (!ret)
+		ret = len;
+
+	return ret;
+}
+
+static unsigned int ucma_poll(struct file *filp, struct poll_table_struct *wait)
+{
+	struct ucma_file *file = filp->private_data;
+	unsigned int mask = 0;
+
+	poll_wait(filp, &file->poll_wait, wait);
+
+	mutex_lock(&file->file_mutex);
+	if (!list_empty(&file->events))
+		mask = POLLIN | POLLRDNORM;
+	mutex_unlock(&file->file_mutex);
+
+	return mask;
+}
+
+static int ucma_open(struct inode *inode, struct file *filp)
+{
+	struct ucma_file *file;
+
+	file = kmalloc(sizeof *file, GFP_KERNEL);
+	if (!file)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&file->events);
+	INIT_LIST_HEAD(&file->ctxs);
+	init_waitqueue_head(&file->poll_wait);
+	mutex_init(&file->file_mutex);
+
+	filp->private_data = file;
+	file->filp = filp;
+	return 0;
+}
+
+static int ucma_close(struct inode *inode, struct file *filp)
+{
+	struct ucma_file *file = filp->private_data;
+	struct ucma_context *ctx;
+
+	mutex_lock(&file->file_mutex);
+	while (!list_empty(&file->ctxs)) {
+		ctx = list_entry(file->ctxs.next, struct ucma_context,
+				 file_list);
+		mutex_unlock(&file->file_mutex);
+
+		mutex_lock(&ctx_mutex);
+		idr_remove(&ctx_idr, ctx->id);
+		mutex_unlock(&ctx_mutex);
+
+		rdma_destroy_id(ctx->cm_id);
+		ucma_cleanup_events(ctx);
+		kfree(ctx);
+
+		mutex_lock(&file->file_mutex);
+	}
+	mutex_unlock(&file->file_mutex);
+	kfree(file);
+	return 0;
+}
+
+static struct file_operations ucma_fops = {
+	.owner 	 = THIS_MODULE,
+	.open 	 = ucma_open,
+	.release = ucma_close,
+	.write	 = ucma_write,
+	.poll    = ucma_poll,
+};
+
+static struct miscdevice ucma_misc = {
+	.minor	= MISC_DYNAMIC_MINOR,
+	.name	= "rdma_cm",
+	.fops	= &ucma_fops,
+};
+
+static int __init ucma_init(void)
+{
+	return misc_register(&ucma_misc);
+}
+
+static void __exit ucma_cleanup(void)
+{
+	misc_deregister(&ucma_misc);
+	idr_destroy(&ctx_idr);
+}
+
+module_init(ucma_init);
+module_exit(ucma_cleanup);
diff -uprN -X linux-2.6.git/Documentation/dontdiff 
linux-2.6.git/include/rdma/rdma_user_cm.h 
linux-2.6.ib/include/rdma/rdma_user_cm.h
--- linux-2.6.git/include/rdma/rdma_user_cm.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.ib/include/rdma/rdma_user_cm.h	2006-01-16 16:54:55.000000000 -0800
@@ -0,0 +1,186 @@
+/*
+ * Copyright (c) 2005 Intel Corporation.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RDMA_USER_CM_H
+#define RDMA_USER_CM_H
+
+#include <linux/types.h>
+#include <linux/in6.h>
+#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_user_sa.h>
+
+#define RDMA_USER_CM_ABI_VERSION 1
+
+#define RDMA_MAX_PRIVATE_DATA		256
+
+enum {
+	RDMA_USER_CM_CMD_CREATE_ID,
+	RDMA_USER_CM_CMD_DESTROY_ID,
+	RDMA_USER_CM_CMD_BIND_ADDR,
+	RDMA_USER_CM_CMD_RESOLVE_ADDR,
+	RDMA_USER_CM_CMD_RESOLVE_ROUTE,
+	RDMA_USER_CM_CMD_QUERY_ROUTE,
+	RDMA_USER_CM_CMD_CONNECT,
+	RDMA_USER_CM_CMD_LISTEN,
+	RDMA_USER_CM_CMD_ACCEPT,
+	RDMA_USER_CM_CMD_REJECT,
+	RDMA_USER_CM_CMD_DISCONNECT,
+	RDMA_USER_CM_CMD_INIT_QP_ATTR,
+	RDMA_USER_CM_CMD_GET_EVENT
+};
+
+/*
+ * command ABI structures.
+ */
+struct rdma_ucm_cmd_hdr {
+	__u32 cmd;
+	__u16 in;
+	__u16 out;
+};
+
+struct rdma_ucm_create_id {
+	__u64 uid;
+	__u64 response;
+};
+
+struct rdma_ucm_create_id_resp {
+	__u32 id;
+};
+
+struct rdma_ucm_destroy_id {
+	__u64 response;
+	__u32 id;
+	__u32 reserved;
+};
+
+struct rdma_ucm_destroy_id_resp {
+	__u32 events_reported;
+};
+
+struct rdma_ucm_bind_addr {
+	__u64 response;
+	struct sockaddr_in6 addr;
+	__u32 id;
+};
+
+struct rdma_ucm_resolve_addr {
+	struct sockaddr_in6 src_addr;
+	struct sockaddr_in6 dst_addr;
+	__u32 id;
+	__u32 timeout_ms;
+};
+
+struct rdma_ucm_resolve_route {
+	__u32 id;
+	__u32 timeout_ms;
+};
+
+struct rdma_ucm_query_route {
+	__u64 response;
+	__u32 id;
+	__u32 reserved;
+};
+
+struct rdma_ucm_query_route_resp {
+	__u64 node_guid;
+	struct ib_user_path_rec ib_route[2];
+	struct sockaddr_in6 src_addr;
+	struct sockaddr_in6 dst_addr;
+	__u32 num_paths;
+	__u8 port_num;
+	__u8 reserved[3];
+};
+
+struct rdma_ucm_conn_param {
+	__u32 qp_num;
+	__u32 qp_type;
+	__u8  private_data[RDMA_MAX_PRIVATE_DATA];
+	__u8  private_data_len;
+	__u8  srq;
+	__u8  responder_resources;
+	__u8  initiator_depth;
+	__u8  flow_control;
+	__u8  retry_count;
+	__u8  rnr_retry_count;
+	__u8  valid;
+};
+
+struct rdma_ucm_connect {
+	struct rdma_ucm_conn_param conn_param;
+	__u32 id;
+	__u32 reserved;
+};
+
+struct rdma_ucm_listen {
+	__u32 id;
+	__u32 backlog;
+};
+
+struct rdma_ucm_accept {
+	__u64 uid;
+	struct rdma_ucm_conn_param conn_param;
+	__u32 id;
+	__u32 reserved;
+};
+
+struct rdma_ucm_reject {
+	__u32 id;
+	__u8  private_data_len;
+	__u8  reserved[3];
+	__u8  private_data[RDMA_MAX_PRIVATE_DATA];
+};
+
+struct rdma_ucm_disconnect {
+	__u32 id;
+};
+
+struct rdma_ucm_init_qp_attr {
+	__u64 response;
+	__u32 id;
+	__u32 qp_state;
+};
+
+struct rdma_ucm_get_event {
+	__u64 response;
+};
+
+struct rdma_ucm_event_resp {
+	__u64 uid;
+	__u32 id;
+	__u32 event;
+	__u32 status;
+	__u8  private_data_len;
+	__u8  reserved[3];
+	__u8  private_data[RDMA_MAX_PRIVATE_DATA];
+};
+
+#endif /* RDMA_USER_CM_H */

^ permalink raw reply

* [PATCH] gianfar: Fix sparse warnings
From: Kumar Gala @ 2006-02-01 21:18 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, netdev

Fixed sparse warnings mainly due to lack of __iomem.

---
commit 682307be628635abd497a77bf26aa7ba6b5593e9
tree ac3156a394371416519566a8e4aff345575e8f33
parent 690e2cb75eb8fc41413a9e9f85919b275b4164a9
author Kumar Gala <galak@kernel.crashing.org> Wed, 01 Feb 2006 15:17:45 -0600
committer Kumar Gala <galak@kernel.crashing.org> Wed, 01 Feb 2006 15:17:45 -0600

 drivers/net/gianfar.c         |   24 +++++++++++-------------
 drivers/net/gianfar.h         |    8 ++++----
 drivers/net/gianfar_ethtool.c |    8 ++++----
 drivers/net/gianfar_mii.c     |   17 ++++++++---------
 4 files changed, 27 insertions(+), 30 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 0c18dbd..0e8e3fc 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -199,8 +199,7 @@ static int gfar_probe(struct platform_de
 
 	/* get a pointer to the register memory */
 	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	priv->regs = (struct gfar *)
-		ioremap(r->start, sizeof (struct gfar));
+	priv->regs = ioremap(r->start, sizeof (struct gfar));
 
 	if (NULL == priv->regs) {
 		err = -ENOMEM;
@@ -369,7 +368,7 @@ static int gfar_probe(struct platform_de
 	return 0;
 
 register_fail:
-	iounmap((void *) priv->regs);
+	iounmap(priv->regs);
 regs_fail:
 	free_netdev(dev);
 	return err;
@@ -382,7 +381,7 @@ static int gfar_remove(struct platform_d
 
 	platform_set_drvdata(pdev, NULL);
 
-	iounmap((void *) priv->regs);
+	iounmap(priv->regs);
 	free_netdev(dev);
 
 	return 0;
@@ -454,8 +453,7 @@ static void init_registers(struct net_de
 
 	/* Zero out the rmon mib registers if it has them */
 	if (priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_RMON) {
-		memset((void *) &(priv->regs->rmon), 0,
-		       sizeof (struct rmon_mib));
+		memset_io(&(priv->regs->rmon), 0, sizeof (struct rmon_mib));
 
 		/* Mask off the CAM interrupts */
 		gfar_write(&priv->regs->rmon.cam1, 0xffffffff);
@@ -477,7 +475,7 @@ static void init_registers(struct net_de
 void gfar_halt(struct net_device *dev)
 {
 	struct gfar_private *priv = netdev_priv(dev);
-	struct gfar *regs = priv->regs;
+	struct gfar __iomem *regs = priv->regs;
 	u32 tempval;
 
 	/* Mask all interrupts */
@@ -507,7 +505,7 @@ void gfar_halt(struct net_device *dev)
 void stop_gfar(struct net_device *dev)
 {
 	struct gfar_private *priv = netdev_priv(dev);
-	struct gfar *regs = priv->regs;
+	struct gfar __iomem *regs = priv->regs;
 	unsigned long flags;
 
 	phy_stop(priv->phydev);
@@ -590,7 +588,7 @@ static void free_skb_resources(struct gf
 void gfar_start(struct net_device *dev)
 {
 	struct gfar_private *priv = netdev_priv(dev);
-	struct gfar *regs = priv->regs;
+	struct gfar __iomem *regs = priv->regs;
 	u32 tempval;
 
 	/* Enable Rx and Tx in MACCFG1 */
@@ -624,7 +622,7 @@ int startup_gfar(struct net_device *dev)
 	unsigned long vaddr;
 	int i;
 	struct gfar_private *priv = netdev_priv(dev);
-	struct gfar *regs = priv->regs;
+	struct gfar __iomem *regs = priv->regs;
 	int err = 0;
 	u32 rctrl = 0;
 	u32 attrs = 0;
@@ -1622,7 +1620,7 @@ static irqreturn_t gfar_interrupt(int ir
 static void adjust_link(struct net_device *dev)
 {
 	struct gfar_private *priv = netdev_priv(dev);
-	struct gfar *regs = priv->regs;
+	struct gfar __iomem *regs = priv->regs;
 	unsigned long flags;
 	struct phy_device *phydev = priv->phydev;
 	int new_state = 0;
@@ -1703,7 +1701,7 @@ static void gfar_set_multi(struct net_de
 {
 	struct dev_mc_list *mc_ptr;
 	struct gfar_private *priv = netdev_priv(dev);
-	struct gfar *regs = priv->regs;
+	struct gfar __iomem *regs = priv->regs;
 	u32 tempval;
 
 	if(dev->flags & IFF_PROMISC) {
@@ -1842,7 +1840,7 @@ static void gfar_set_mac_for_addr(struct
 	int idx;
 	char tmpbuf[MAC_ADDR_LEN];
 	u32 tempval;
-	u32 *macptr = &priv->regs->macstnaddr1;
+	u32 __iomem *macptr = &priv->regs->macstnaddr1;
 
 	macptr += num*2;
 
diff --git a/drivers/net/gianfar.h b/drivers/net/gianfar.h
index cb9d66a..d37d540 100644
--- a/drivers/net/gianfar.h
+++ b/drivers/net/gianfar.h
@@ -682,8 +682,8 @@ struct gfar_private {
 	struct rxbd8 *cur_rx;           /* Next free rx ring entry */
 	struct txbd8 *cur_tx;	        /* Next free ring entry */
 	struct txbd8 *dirty_tx;		/* The Ring entry to be freed. */
-	struct gfar *regs;	/* Pointer to the GFAR memory mapped Registers */
-	u32 *hash_regs[16];
+	struct gfar __iomem *regs;	/* Pointer to the GFAR memory mapped Registers */
+	u32 __iomem *hash_regs[16];
 	int hash_width;
 	struct net_device_stats stats; /* linux network statistics */
 	struct gfar_extra_stats extra_stats;
@@ -718,14 +718,14 @@ struct gfar_private {
 	uint32_t msg_enable;
 };
 
-static inline u32 gfar_read(volatile unsigned *addr)
+static inline u32 gfar_read(volatile unsigned __iomem *addr)
 {
 	u32 val;
 	val = in_be32(addr);
 	return val;
 }
 
-static inline void gfar_write(volatile unsigned *addr, u32 val)
+static inline void gfar_write(volatile unsigned __iomem *addr, u32 val)
 {
 	out_be32(addr, val);
 }
diff --git a/drivers/net/gianfar_ethtool.c b/drivers/net/gianfar_ethtool.c
index 765e810..5de7b2e 100644
--- a/drivers/net/gianfar_ethtool.c
+++ b/drivers/net/gianfar_ethtool.c
@@ -144,11 +144,11 @@ static void gfar_fill_stats(struct net_d
 	u64 *extra = (u64 *) & priv->extra_stats;
 
 	if (priv->einfo->device_flags & FSL_GIANFAR_DEV_HAS_RMON) {
-		u32 *rmon = (u32 *) & priv->regs->rmon;
+		u32 __iomem *rmon = (u32 __iomem *) & priv->regs->rmon;
 		struct gfar_stats *stats = (struct gfar_stats *) buf;
 
 		for (i = 0; i < GFAR_RMON_LEN; i++)
-			stats->rmon[i] = (u64) (rmon[i]);
+			stats->rmon[i] = (u64) gfar_read(&rmon[i]);
 
 		for (i = 0; i < GFAR_EXTRA_STATS_LEN; i++)
 			stats->extra[i] = extra[i];
@@ -221,11 +221,11 @@ static void gfar_get_regs(struct net_dev
 {
 	int i;
 	struct gfar_private *priv = netdev_priv(dev);
-	u32 *theregs = (u32 *) priv->regs;
+	u32 __iomem *theregs = (u32 __iomem *) priv->regs;
 	u32 *buf = (u32 *) regbuf;
 
 	for (i = 0; i < sizeof (struct gfar) / sizeof (u32); i++)
-		buf[i] = theregs[i];
+		buf[i] = gfar_read(&theregs[i]);
 }
 
 /* Convert microseconds to ethernet clock ticks, which changes
diff --git a/drivers/net/gianfar_mii.c b/drivers/net/gianfar_mii.c
index 74e52fc..c6b7255 100644
--- a/drivers/net/gianfar_mii.c
+++ b/drivers/net/gianfar_mii.c
@@ -50,7 +50,7 @@
  * All PHY configuration is done through the TSEC1 MIIM regs */
 int gfar_mdio_write(struct mii_bus *bus, int mii_id, int regnum, u16 value)
 {
-	struct gfar_mii *regs = bus->priv;
+	struct gfar_mii __iomem *regs = (void __iomem *)bus->priv;
 
 	/* Set the PHY address and the register address we want to write */
 	gfar_write(&regs->miimadd, (mii_id << 8) | regnum);
@@ -70,7 +70,7 @@ int gfar_mdio_write(struct mii_bus *bus,
  * configuration has to be done through the TSEC1 MIIM regs */
 int gfar_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 {
-	struct gfar_mii *regs = bus->priv;
+	struct gfar_mii __iomem *regs = (void __iomem *)bus->priv;
 	u16 value;
 
 	/* Set the PHY address and the register address we want to read */
@@ -94,7 +94,7 @@ int gfar_mdio_read(struct mii_bus *bus, 
 /* Reset the MIIM registers, and wait for the bus to free */
 int gfar_mdio_reset(struct mii_bus *bus)
 {
-	struct gfar_mii *regs = bus->priv;
+	struct gfar_mii __iomem *regs = (void __iomem *)bus->priv;
 	unsigned int timeout = PHY_INIT_TIMEOUT;
 
 	spin_lock_bh(&bus->mdio_lock);
@@ -126,7 +126,7 @@ int gfar_mdio_probe(struct device *dev)
 {
 	struct platform_device *pdev = to_platform_device(dev);
 	struct gianfar_mdio_data *pdata;
-	struct gfar_mii *regs;
+	struct gfar_mii __iomem *regs;
 	struct mii_bus *new_bus;
 	struct resource *r;
 	int err = 0;
@@ -155,15 +155,14 @@ int gfar_mdio_probe(struct device *dev)
 	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 
 	/* Set the PHY base address */
-	regs = (struct gfar_mii *) ioremap(r->start,
-			sizeof (struct gfar_mii));
+	regs = ioremap(r->start, sizeof (struct gfar_mii));
 
 	if (NULL == regs) {
 		err = -ENOMEM;
 		goto reg_map_fail;
 	}
 
-	new_bus->priv = regs;
+	new_bus->priv = (void __force *)regs;
 
 	new_bus->irq = pdata->irq;
 
@@ -181,7 +180,7 @@ int gfar_mdio_probe(struct device *dev)
 	return 0;
 
 bus_register_fail:
-	iounmap((void *) regs);
+	iounmap(regs);
 reg_map_fail:
 	kfree(new_bus);
 
@@ -197,7 +196,7 @@ int gfar_mdio_remove(struct device *dev)
 
 	dev_set_drvdata(dev, NULL);
 
-	iounmap((void *) (&bus->priv));
+	iounmap((void __iomem *)bus->priv);
 	bus->priv = NULL;
 	kfree(bus);
 

^ permalink raw reply related

* [patch 1/2] natsemi: NAPI and a bugfix
From: Mark Brown @ 2006-02-02  0:00 UTC (permalink / raw)
  To: Jeff Garzik, Tim Hockin; +Cc: netdev, linux-kernel
In-Reply-To: <20060202233909.426660000@lorien.sirena.org.uk>

[-- Attachment #1: natsemi-napi --]
[-- Type: text/plain, Size: 10122 bytes --]

This patch converts the natsemi driver to use NAPI.  It was originally
based on one written by Harald Welte, though it has since been modified
quite a bit, most extensively in order to remove the ability to disable
NAPI since none of the other drivers seem to provide that functionality
any more.

Signed-off-by: Mark Brown <broonie@sirena.org.uk>

Index: linux-2.6.15.2/drivers/net/natsemi.c
===================================================================
--- linux-2.6.15.2.orig/drivers/net/natsemi.c	2006-01-31 06:25:07.000000000 +0000
+++ linux-2.6.15.2/drivers/net/natsemi.c	2006-02-01 22:59:29.000000000 +0000
@@ -3,6 +3,7 @@
 	Written/copyright 1999-2001 by Donald Becker.
 	Portions copyright (c) 2001,2002 Sun Microsystems (thockin@sun.com)
 	Portions copyright 2001,2002 Manfred Spraul (manfred@colorfullife.com)
+	Portions copyright 2004 Harald Welte <laforge@gnumonks.org>
 
 	This software may be used and distributed according to the terms of
 	the GNU General Public License (GPL), incorporated herein by reference.
@@ -135,8 +136,6 @@
 
 	TODO:
 	* big endian support with CFG:BEM instead of cpu_to_le32
-	* support for an external PHY
-	* NAPI
 */
 
 #include <linux/config.h>
@@ -160,6 +159,7 @@
 #include <linux/mii.h>
 #include <linux/crc32.h>
 #include <linux/bitops.h>
+#include <linux/prefetch.h>
 #include <asm/processor.h>	/* Processor type for cache alignment. */
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -183,8 +183,6 @@
 				 NETIF_MSG_TX_ERR)
 static int debug = -1;
 
-/* Maximum events (Rx packets, etc.) to handle at each interrupt. */
-static int max_interrupt_work = 20;
 static int mtu;
 
 /* Maximum number of multicast addresses to filter (vs. rx-all-multicast).
@@ -251,14 +249,11 @@ MODULE_AUTHOR("Donald Becker <becker@scy
 MODULE_DESCRIPTION("National Semiconductor DP8381x series PCI Ethernet driver");
 MODULE_LICENSE("GPL");
 
-module_param(max_interrupt_work, int, 0);
 module_param(mtu, int, 0);
 module_param(debug, int, 0);
 module_param(rx_copybreak, int, 0);
 module_param_array(options, int, NULL, 0);
 module_param_array(full_duplex, int, NULL, 0);
-MODULE_PARM_DESC(max_interrupt_work, 
-	"DP8381x maximum events handled per interrupt");
 MODULE_PARM_DESC(mtu, "DP8381x MTU (all boards)");
 MODULE_PARM_DESC(debug, "DP8381x default debug level");
 MODULE_PARM_DESC(rx_copybreak, 
@@ -691,6 +686,8 @@ struct netdev_private {
 	/* Based on MTU+slack. */
 	unsigned int rx_buf_sz;
 	int oom;
+	/* Interrupt status */
+	u32 intr_status;
 	/* Do not touch the nic registers */
 	int hands_off;
 	/* external phy that is used: only valid if dev->if_port != PORT_TP */
@@ -748,7 +745,8 @@ static void init_registers(struct net_de
 static int start_tx(struct sk_buff *skb, struct net_device *dev);
 static irqreturn_t intr_handler(int irq, void *dev_instance, struct pt_regs *regs);
 static void netdev_error(struct net_device *dev, int intr_status);
-static void netdev_rx(struct net_device *dev);
+static int natsemi_poll(struct net_device *dev, int *budget);
+static void netdev_rx(struct net_device *dev, int *work_done, int work_to_do);
 static void netdev_tx_done(struct net_device *dev);
 static int natsemi_change_mtu(struct net_device *dev, int new_mtu);
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -776,6 +774,18 @@ static inline void __iomem *ns_ioaddr(st
 	return (void __iomem *) dev->base_addr;
 }
 
+static inline void natsemi_irq_enable(struct net_device *dev)
+{
+	writel(1, ns_ioaddr(dev) + IntrEnable);
+	readl(ns_ioaddr(dev) + IntrEnable);
+}
+
+static inline void natsemi_irq_disable(struct net_device *dev)
+{
+	writel(0, ns_ioaddr(dev) + IntrEnable);
+	readl(ns_ioaddr(dev) + IntrEnable);
+}
+
 static void move_int_phy(struct net_device *dev, int addr)
 {
 	struct netdev_private *np = netdev_priv(dev);
@@ -879,6 +889,7 @@ static int __devinit natsemi_probe1 (str
 	spin_lock_init(&np->lock);
 	np->msg_enable = (debug >= 0) ? (1<<debug)-1 : NATSEMI_DEF_MSG;
 	np->hands_off = 0;
+	np->intr_status = 0;
 
 	/* Initial port:
 	 * - If the nic was configured to use an external phy and if find_mii
@@ -932,6 +943,9 @@ static int __devinit natsemi_probe1 (str
 	dev->do_ioctl = &netdev_ioctl;
 	dev->tx_timeout = &tx_timeout;
 	dev->watchdog_timeo = TX_TIMEOUT;
+	dev->poll = natsemi_poll;
+	dev->weight = 64;
+
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	dev->poll_controller = &natsemi_poll_controller;
 #endif
@@ -2158,68 +2172,92 @@ static void netdev_tx_done(struct net_de
 	}
 }
 
-/* The interrupt handler does all of the Rx thread work and cleans up
-   after the Tx thread. */
+/* The interrupt handler doesn't actually handle interrupts itself, it
+ * schedules a NAPI poll if there is anything to do. */
 static irqreturn_t intr_handler(int irq, void *dev_instance, struct pt_regs *rgs)
 {
 	struct net_device *dev = dev_instance;
 	struct netdev_private *np = netdev_priv(dev);
 	void __iomem * ioaddr = ns_ioaddr(dev);
-	int boguscnt = max_interrupt_work;
-	unsigned int handled = 0;
 
 	if (np->hands_off)
 		return IRQ_NONE;
-	do {
-		/* Reading automatically acknowledges all int sources. */
-		u32 intr_status = readl(ioaddr + IntrStatus);
+	
+	/* Reading automatically acknowledges. */
+	np->intr_status = readl(ioaddr + IntrStatus);
 
-		if (netif_msg_intr(np))
-			printk(KERN_DEBUG
-				"%s: Interrupt, status %#08x, mask %#08x.\n",
-				dev->name, intr_status,
-				readl(ioaddr + IntrMask));
+	if (netif_msg_intr(np))
+		printk(KERN_DEBUG
+		       "%s: Interrupt, status %#08x, mask %#08x.\n",
+		       dev->name, np->intr_status,
+		       readl(ioaddr + IntrMask));
 
-		if (intr_status == 0)
-			break;
-		handled = 1;
+	if (!np->intr_status) 
+		return IRQ_NONE;
 
-		if (intr_status &
-		   (IntrRxDone | IntrRxIntr | RxStatusFIFOOver |
-		    IntrRxErr | IntrRxOverrun)) {
-			netdev_rx(dev);
-		}
+	prefetch(&np->rx_skbuff[np->cur_rx % RX_RING_SIZE]);
+
+	if (netif_rx_schedule_prep(dev)) {
+		/* Disable interrupts and register for poll */
+		natsemi_irq_disable(dev);
+		__netif_rx_schedule(dev);
+	}
+	return IRQ_HANDLED;
+}
 
-		if (intr_status &
-		   (IntrTxDone | IntrTxIntr | IntrTxIdle | IntrTxErr)) {
+/* This is the NAPI poll routine.  As well as the standard RX handling
+ * it also handles all other interrupts that the chip might raise.
+ */
+static int natsemi_poll(struct net_device *dev, int *budget)
+{
+	struct netdev_private *np = netdev_priv(dev);
+	void __iomem * ioaddr = ns_ioaddr(dev);
+
+	int work_to_do = min(*budget, dev->quota);
+	int work_done = 0;
+
+	do {
+		if (np->intr_status &
+		    (IntrTxDone | IntrTxIntr | IntrTxIdle | IntrTxErr)) {
 			spin_lock(&np->lock);
 			netdev_tx_done(dev);
 			spin_unlock(&np->lock);
 		}
 
 		/* Abnormal error summary/uncommon events handlers. */
-		if (intr_status & IntrAbnormalSummary)
-			netdev_error(dev, intr_status);
-
-		if (--boguscnt < 0) {
-			if (netif_msg_intr(np))
-				printk(KERN_WARNING
-					"%s: Too much work at interrupt, "
-					"status=%#08x.\n",
-					dev->name, intr_status);
-			break;
+		if (np->intr_status & IntrAbnormalSummary)
+			netdev_error(dev, np->intr_status);
+		
+		if (np->intr_status &
+		    (IntrRxDone | IntrRxIntr | RxStatusFIFOOver |
+		     IntrRxErr | IntrRxOverrun)) {
+			netdev_rx(dev, &work_done, work_to_do);
 		}
-	} while (1);
+		
+		*budget -= work_done;
+		dev->quota -= work_done;
 
-	if (netif_msg_intr(np))
-		printk(KERN_DEBUG "%s: exiting interrupt.\n", dev->name);
+		if (work_done >= work_to_do)
+			return 1;
+
+		np->intr_status = readl(ioaddr + IntrStatus);
+	} while (np->intr_status);
 
-	return IRQ_RETVAL(handled);
+	netif_rx_complete(dev);
+
+	/* Reenable interrupts providing nothing is trying to shut
+	 * the chip down. */
+	spin_lock(&np->lock);
+	if (!np->hands_off && netif_running(dev))
+		natsemi_irq_enable(dev);
+	spin_unlock(&np->lock);
+
+	return 0;
 }
 
 /* This routine is logically part of the interrupt handler, but separated
    for clarity and better register allocation. */
-static void netdev_rx(struct net_device *dev)
+static void netdev_rx(struct net_device *dev, int *work_done, int work_to_do)
 {
 	struct netdev_private *np = netdev_priv(dev);
 	int entry = np->cur_rx % RX_RING_SIZE;
@@ -2237,6 +2275,12 @@ static void netdev_rx(struct net_device 
 				entry, desc_status);
 		if (--boguscnt < 0)
 			break;
+
+		if (*work_done >= work_to_do)
+			break;
+
+		(*work_done)++;
+
 		pkt_len = (desc_status & DescSizeMask) - 4;
 		if ((desc_status&(DescMore|DescPktOK|DescRxLong)) != DescPktOK){
 			if (desc_status & DescMore) {
@@ -2293,7 +2337,7 @@ static void netdev_rx(struct net_device 
 				np->rx_skbuff[entry] = NULL;
 			}
 			skb->protocol = eth_type_trans(skb, dev);
-			netif_rx(skb);
+			netif_receive_skb(skb);
 			dev->last_rx = jiffies;
 			np->stats.rx_packets++;
 			np->stats.rx_bytes += pkt_len;
@@ -3074,9 +3118,7 @@ static int netdev_close(struct net_devic
 	del_timer_sync(&np->timer);
 	disable_irq(dev->irq);
 	spin_lock_irq(&np->lock);
-	/* Disable interrupts, and flush posted writes */
-	writel(0, ioaddr + IntrEnable);
-	readl(ioaddr + IntrEnable);
+	natsemi_irq_disable(dev);
 	np->hands_off = 1;
 	spin_unlock_irq(&np->lock);
 	enable_irq(dev->irq);
@@ -3158,6 +3200,9 @@ static void __devexit natsemi_remove1 (s
  *	* netdev_timer: timer stopped by natsemi_suspend.
  *	* intr_handler: doesn't acquire the spinlock. suspend calls
  *		disable_irq() to enforce synchronization.
+ *      * natsemi_poll: checks before reenabling interrupts.  suspend
+ *              sets hands_off, disables interrupts and then waits with
+ *              netif_poll_disable().
  *
  * Interrupts must be disabled, otherwise hands_off can cause irq storms.
  */
@@ -3183,6 +3228,8 @@ static int natsemi_suspend (struct pci_d
 		spin_unlock_irq(&np->lock);
 		enable_irq(dev->irq);
 
+		netif_poll_disable(dev);
+
 		/* Update the error counts. */
 		__get_stats(dev);
 
@@ -3235,6 +3282,7 @@ static int natsemi_resume (struct pci_de
 		mod_timer(&np->timer, jiffies + 1*HZ);
 	}
 	netif_device_attach(dev);
+	netif_poll_enable(dev);
 out:
 	rtnl_unlock();
 	return 0;

--
"You grabbed my hand and we fell into it, like a daydream - or a fever."

^ permalink raw reply

* [patch 2/2] natsemi: NAPI and a bugfix
From: Mark Brown @ 2006-02-02  0:00 UTC (permalink / raw)
  To: Jeff Garzik, Tim Hockin; +Cc: netdev, linux-kernel
In-Reply-To: <20060202233909.426660000@lorien.sirena.org.uk>

[-- Attachment #1: natsemi-an-1287 --]
[-- Type: text/plain, Size: 2345 bytes --]

As documented in National application note 1287 the RX state machine on
the natsemi chip can lock up under some conditions (mostly related to
heavy load).  When this happens a series of bogus packets are reported
by the chip including some oversized frames prior to the final lockup.

This patch implements the fix from the application note: when an
oversized packet is reported it resets the RX state machine, dropping
any currently pending packets.

Signed-off-by: Mark Brown <broonie@sirena.org.uk>

Index: linux-2.6.15.2/drivers/net/natsemi.c
===================================================================
--- linux-2.6.15.2.orig/drivers/net/natsemi.c	2006-02-01 22:59:29.000000000 +0000
+++ linux-2.6.15.2/drivers/net/natsemi.c	2006-02-02 00:05:23.000000000 +0000
@@ -1498,6 +1498,31 @@ static void natsemi_reset(struct net_dev
 	writel(rfcr, ioaddr + RxFilterAddr);
 }
 
+static void reset_rx(struct net_device *dev)
+{
+	int i;
+	struct netdev_private *np = netdev_priv(dev);
+	void __iomem *ioaddr = ns_ioaddr(dev);
+
+	np->intr_status &= ~RxResetDone;
+
+	writel(RxReset, ioaddr + ChipCmd);
+
+	for (i=0;i<NATSEMI_HW_TIMEOUT;i++) {
+		np->intr_status |= readl(ioaddr + IntrStatus);
+		if (np->intr_status & RxResetDone)
+			break;
+		udelay(15);
+	}
+	if (i==NATSEMI_HW_TIMEOUT) {
+		printk(KERN_WARNING "%s: RX reset did not complete in %d usec.\n",
+		       dev->name, i*15);
+	} else if (netif_msg_hw(np)) {
+		printk(KERN_WARNING "%s: RX reset took %d usec.\n",
+		       dev->name, i*15);
+	}
+}
+
 static void natsemi_reload_eeprom(struct net_device *dev)
 {
 	struct netdev_private *np = netdev_priv(dev);
@@ -2292,6 +2317,23 @@ static void netdev_rx(struct net_device 
 						"status %#08x.\n", dev->name,
 						np->cur_rx, desc_status);
 				np->stats.rx_length_errors++;
+
+				/* The RX state machine has probably
+				 * locked up beneath us.  Follow the
+				 * reset procedure documented in
+				 * AN-1287. */
+
+				spin_lock_irq(&np->lock);
+				reset_rx(dev);
+				reinit_rx(dev);
+				writel(np->ring_dma, ioaddr + RxRingPtr);
+				check_link(dev);
+				spin_unlock_irq(&np->lock);
+
+				/* We'll enable RX on exit from this
+				 * function. */
+				break;
+
 			} else {
 				/* There was an error. */
 				np->stats.rx_errors++;

--
"You grabbed my hand and we fell into it, like a daydream - or a fever."

^ permalink raw reply

* [PATCH RESEND] net: Move destructor from neigh->ops to neigh_params
From: Roland Dreier @ 2006-02-02  1:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, openib-general
In-Reply-To: <adazmlmmy7v.fsf@cisco.com>

Sorry to distract everyone from the VJ channel discussion, but on the
other hand it looks like Dave is back... I'm resending this because
I'd really like to get this problem fixed but I want to make sure
we're doing it the right way.  So either an ACK or a NAK with guidance
would be great...


This is a resend of a patch written by Michael S. Tsirkin
<mst@mellanox.co.il>.  I'd like to get an ACK or NAK of it from Dave
and other networking people, so that we can either merge it upstream
or try a different approach.  There definitely is a problem with
neighbour destructors that IP-over-IB is running into.

It would be good to know what the design was behind putting the
destructor method in neigh->ops in the first place.

Dave, if you want to merge this directly, that's fine.  Or I'm fine
with merging this through the IB tree if you'd prefer (if you want me
to do that, let me know if you think it's 2.6.16 material).

Thanks,
  Roland



struct neigh_ops currently has a destructor field, which no in-kernel
drivers outside of infiniband use.  The infiniband/ulp/ipoib in-tree
driver stashes some info in the neighbour structure (the results of
the second-stage lookup from ARP results to real link-level path), and
it uses neigh->ops->destructor to get a callback so it can clean up
this extra info when a neighbour is freed.  We've run into problems
with this: since the destructor is in an ops field that is shared
between neighbours that may belong to different net devices, there's
no way to set/clear it safely.

The following patch moves this field to neigh_parms where it can be
safely set, together with its twin neigh_setup.  Two additional
patches in the patch series update ipoib to use this new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

---

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 6fa9ae1..b0666d6 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -68,6 +68,7 @@ struct neigh_parms
 	struct net_device *dev;
 	struct neigh_parms *next;
 	int	(*neigh_setup)(struct neighbour *);
+	void	(*neigh_destructor)(struct neighbour *);
 	struct neigh_table *tbl;
 
 	void	*sysctl_table;
@@ -145,7 +146,6 @@ struct neighbour
 struct neigh_ops
 {
 	int			family;
-	void			(*destructor)(struct neighbour *);
 	void			(*solicit)(struct neighbour *, struct sk_buff*);
 	void			(*error_report)(struct neighbour *, struct sk_buff*);
 	int			(*output)(struct sk_buff*);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index e68700f..3489e23 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -586,8 +586,8 @@ void neigh_destroy(struct neighbour *nei
 			kfree(hh);
 	}
 
-	if (neigh->ops && neigh->ops->destructor)
-		(neigh->ops->destructor)(neigh);
+	if (neigh->parms->neigh_destructor)
+		(neigh->parms->neigh_destructor)(neigh);
 
 	skb_queue_purge(&neigh->arp_queue);
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index fd3f5c8..9588124 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -247,7 +247,6 @@ static void path_free(struct net_device 
 		if (neigh->ah)
 			ipoib_put_ah(neigh->ah);
 		*to_ipoib_neigh(neigh->neighbour) = NULL;
-		neigh->neighbour->ops->destructor = NULL;
 		kfree(neigh);
 	}
 
@@ -530,7 +529,6 @@ static void neigh_add_path(struct sk_buf
 err:
 	*to_ipoib_neigh(skb->dst->neighbour) = NULL;
 	list_del(&neigh->list);
-	neigh->neighbour->ops->destructor = NULL;
 	kfree(neigh);
 
 	++priv->stats.tx_dropped;
@@ -769,21 +767,9 @@ static void ipoib_neigh_destructor(struc
 		ipoib_put_ah(ah);
 }
 
-static int ipoib_neigh_setup(struct neighbour *neigh)
-{
-	/*
-	 * Is this kosher?  I can't find anybody in the kernel that
-	 * sets neigh->destructor, so we should be able to set it here
-	 * without trouble.
-	 */
-	neigh->ops->destructor = ipoib_neigh_destructor;
-
-	return 0;
-}
-
 static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms)
 {
-	parms->neigh_setup = ipoib_neigh_setup;
+	parms->neigh_destructor = ipoib_neigh_destructor;
 
 	return 0;
 }

^ permalink raw reply related

* Re: [PATCH RESEND] net: Move destructor from neigh->ops to neigh_params
From: David S. Miller @ 2006-02-02  1:31 UTC (permalink / raw)
  To: rdreier; +Cc: netdev, openib-general
In-Reply-To: <adabqxq1rdh.fsf@cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Wed, 01 Feb 2006 17:28:10 -0800

> Sorry to distract everyone from the VJ channel discussion, but on the
> other hand it looks like Dave is back... I'm resending this because
> I'd really like to get this problem fixed but I want to make sure
> we're doing it the right way.  So either an ACK or a NAK with guidance
> would be great...

It's sitting in my backlog, and will be a net-2.6.17 issue not
a net-2.6.16 one as we're in bug fix mode there.

Sorry if you need this in 2.6.16, but that's not really practical.

^ permalink raw reply

* Re: [PATCH RESEND] net: Move destructor from neigh->ops to neigh_params
From: Roland Dreier @ 2006-02-02  1:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, openib-general
In-Reply-To: <20060201.173137.10844554.davem@davemloft.net>

    David> It's sitting in my backlog, and will be a net-2.6.17 issue
    David> not a net-2.6.16 one as we're in bug fix mode there.

    David> Sorry if you need this in 2.6.16, but that's not really
    David> practical.

No, that's fine... I was just resending in case you were using RED to
manage your backlog.  This is a real issue but I don't think it's
hitting a lot of people.

 - R.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox