From: Roland Dreier
Subject: Advice needed on IP-over-InfiniBand driver
Date: Sat, 18 Sep 2004 21:08:37 -0700
Message-ID: <52fz5esxx6.fsf@topspin.com>
To: netdev@oss.sgi.com

Hi,

I'm looking for guidance on the "right" way to implement an IP-over-InfiniBand (IPoIB) driver for Linux. Right now we have something that works, but we are cleaning it up for upstream submission (along with the rest of the OpenIB code).

IPoIB is a network device driver (in the sense of "struct net_device"), but there are a few complications beyond the usual Ethernet NIC case, which I'll describe below. For full details you can look at the drafts from the IETF ipoib working group:

http://ietf.org/html.charters/ipoib-charter.html

If you want to look at the existing code, the best place to look is my Subversion branch, specifically

https://openib.org/svn/gen2/branches/roland-merge/src/linux-kernel/infiniband/ulp/ipoib

for the IPoIB code.

IPoIB uses the usual ARP protocol for IPv4 (everything is analogous for IPv6 neighbor discovery, but I'll focus on IPv4 for simplicity). A hardware address is 20 bytes: 1 reserved byte, 3 bytes of queue pair number (QPN) and 16 bytes of global identifier (GID). ARP works as specified in RFC 826 (with hardware type 32).

The wrinkle is that while this 20-byte address is enough to uniquely identify a destination, it is not enough to actually send a packet there. Once an ARP reply comes back, the IPoIB driver must send a query to the IB subnet manager (a remote server on the IB fabric) to obtain a path to the destination GID; a path is a 2-byte local identifier (LID) plus a few other pieces of information. Once we have the path, we give it to the IB hardware and get an address handle, which can finally be used to send a packet.
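To make the 20-byte hardware address layout concrete, here is a minimal C sketch that packs and unpacks it. The names (struct ipoib_hw_addr, ipoib_pack_hw_addr(), ipoib_unpack_hw_addr(), IPOIB_QPN_MAX) are hypothetical and not taken from the OpenIB tree, and the sketch assumes the 3-byte QPN is stored in network (big-endian) byte order; the authoritative wire format is whatever the IETF ipoib drafts specify.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1 reserved byte + 3-byte QPN + 16-byte GID = 20 bytes on the wire */
#define IPOIB_HW_ADDR_LEN 20
#define IPOIB_QPN_MAX     0xffffffu   /* QPN is a 24-bit quantity */

_Static_assert(IPOIB_HW_ADDR_LEN == 1 + 3 + 16,
               "reserved + QPN + GID must add up to 20 bytes");

/* Hypothetical in-memory form; not the wire layout (it has padding). */
struct ipoib_hw_addr {
	uint8_t  reserved;  /* wire byte 0 */
	uint32_t qpn;       /* wire bytes 1-3, low 24 bits used */
	uint8_t  gid[16];   /* wire bytes 4-19 */
};

/* Serialize into the 20-byte form; QPN assumed big-endian (an assumption). */
void ipoib_pack_hw_addr(const struct ipoib_hw_addr *a,
			uint8_t out[IPOIB_HW_ADDR_LEN])
{
	assert(a->qpn <= IPOIB_QPN_MAX);  /* must fit in 3 bytes */
	out[0] = a->reserved;
	out[1] = (uint8_t)(a->qpn >> 16);
	out[2] = (uint8_t)(a->qpn >> 8);
	out[3] = (uint8_t)a->qpn;
	memcpy(out + 4, a->gid, sizeof(a->gid));
}

/* Parse the 20-byte form back into its three fields. */
void ipoib_unpack_hw_addr(const uint8_t in[IPOIB_HW_ADDR_LEN],
			  struct ipoib_hw_addr *a)
{
	a->reserved = in[0];
	a->qpn = ((uint32_t)in[1] << 16) | ((uint32_t)in[2] << 8) | in[3];
	memcpy(a->gid, in + 4, sizeof(a->gid));
}
```

For example, packing a QPN of 0x000404 puts the bytes 0x00 0x04 0x04 at offsets 1-3 and the GID at offsets 4-19, and unpacking recovers the same QPN and GID. Note that this address only identifies the destination; as described above, the driver still needs a path record and an address handle before it can transmit.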
This means there are a few things we would like to be able to do in the IPoIB driver. First of all, it would be good to be able to hook into the ARP code so that we add the GID->path lookup after the normal ARP resolution (and have the kernel keep queuing packets until that lookup completes). Also, once the whole process is complete and we have an address handle, the driver doesn't actually care about the 20-byte destination address when it's getting packets to send; it just needs the address handle. So it would be nice to have some way to stash the address handle in struct neighbour and retrieve it in our hard_header method (rather than having to keep our own cache mapping 20-byte addresses back to address handles). It seems that some combination of a clever neigh_setup method and hard_header_cache/header_cache_update methods should be enough to make this work, but I don't know enough about the network stack to see how to do it.

I'd really appreciate guidance on how to implement this, and I'm happy to answer any questions about the IPoIB architecture.

Thanks,
Roland