* [PATCH] SIW: Documentation (initial)
@ 2010-10-05 6:55 Bernard Metzler
From: Bernard Metzler @ 2010-10-05 6:55 UTC
To: netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Bernard Metzler
---
Documentation/networking/siw.txt | 91 ++++++++++++++++++++++++++++++++++++++
1 files changed, 91 insertions(+), 0 deletions(-)
create mode 100644 Documentation/networking/siw.txt
diff --git a/Documentation/networking/siw.txt b/Documentation/networking/siw.txt
new file mode 100644
index 0000000..f051d8b
--- /dev/null
+++ b/Documentation/networking/siw.txt
@@ -0,0 +1,91 @@
+SoftiWARP: Software iWARP kernel driver module.
+
+General
+-------
+SoftiWARP (siw) implements the iWARP protocol suite (MPA/DDP/RDMAP,
+IETF-RFC 5044/5041/5040) completely in software as a Linux kernel module.
+siw runs on top of TCP kernel sockets and exports the Linux kernel ibverbs
+RDMA interface. siw interfaces with the iwcm connection manager.
+
+
+Transmit Path
+-------------
+When a work queue element (WQE) is posted to the send queue (SQ), siw
+tries to send it directly from the application context. If the SQ was
+already non-empty, SQ processing is done asynchronously by a kernel
+worker thread. This thread is scheduled when the TCP socket signals
+that new write space is available. If the socket send space becomes
+exhausted during a send operation, SQ processing is abandoned until
+new socket write space becomes available.
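The transmit policy above can be sketched as a small user-space model. This is not siw code: the socket is reduced to a byte budget (`write_space`), a partially accepted WQE stands in for the socket returning -EAGAIN, and the names `toy_sq_post()`, `toy_sq_process()` and `toy_write_space()` are hypothetical. The write-space callback runs the worker body inline instead of scheduling a kernel thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the SQ transmit policy; all names are hypothetical. */
#define TOY_SQ_MAX 16

struct toy_sock {
	size_t write_space;   /* bytes the socket accepts right now */
	size_t bytes_sent;    /* total bytes handed to the socket */
};

struct toy_sq {
	size_t len[TOY_SQ_MAX];   /* payload length of each pending WQE */
	size_t sent[TOY_SQ_MAX];  /* bytes of each WQE already sent */
	int head;                 /* index of the oldest pending WQE */
	int count;                /* number of pending WQEs */
};

/* Push pending WQEs in order until the SQ is empty or the socket is
 * full. Returns false if processing had to be abandoned (the kernel
 * analogue is the socket returning -EAGAIN). */
static bool toy_sq_process(struct toy_sq *sq, struct toy_sock *sk)
{
	while (sq->count > 0) {
		int i = sq->head;
		size_t left = sq->len[i] - sq->sent[i];
		size_t n = left < sk->write_space ? left : sk->write_space;

		sq->sent[i] += n;
		sk->write_space -= n;
		sk->bytes_sent += n;
		if (sq->sent[i] < sq->len[i])
			return false;   /* wait for the write-space callback */
		sq->head = (sq->head + 1) % TOY_SQ_MAX;
		sq->count--;
	}
	return true;
}

/* Post a WQE: send directly from the caller's context only if the SQ
 * was empty; otherwise the WQE waits for the worker. */
static int toy_sq_post(struct toy_sq *sq, struct toy_sock *sk, size_t len)
{
	bool was_empty = (sq->count == 0);
	int tail;

	if (sq->count == TOY_SQ_MAX)
		return -1;              /* SQ full */
	tail = (sq->head + sq->count) % TOY_SQ_MAX;
	sq->len[tail] = len;
	sq->sent[tail] = 0;
	sq->count++;
	if (was_empty)
		toy_sq_process(sq, sk);
	return 0;
}

/* Socket write-space callback: in siw this schedules the worker thread;
 * here the worker body simply runs inline. */
static void toy_write_space(struct toy_sq *sq, struct toy_sock *sk,
			    size_t freed)
{
	sk->write_space += freed;
	toy_sq_process(sq, sk);
}
```

A WQE posted to an empty SQ is sent from the caller's context; one posted behind a stalled WQE waits until the write-space callback resumes processing in FIFO order.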
+
+
+Receive Path
+------------
+All application data is placed into the target buffers from within the
+softirq socket callback. Application notification is asynchronous.
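A minimal sketch of this receive split, assuming a single posted target buffer and a counter-based completion queue (all `toy_*` names are hypothetical, not siw symbols): the callback places data and queues the completion, while waking the application is deferred to a later context.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model; names are hypothetical, not siw's. */
struct toy_rx_buf {
	char *addr;     /* posted target buffer */
	size_t len;     /* its size */
	size_t placed;  /* bytes placed so far */
};

struct toy_cq {
	int nr_cqe;            /* completions queued */
	bool notify_pending;   /* application wake-up still owed */
};

/* Runs in the (simulated) softirq socket callback for each received
 * chunk: copy straight into the target buffer and queue a completion
 * when the message is complete, but do not call the application here. */
static int toy_rx_data_ready(struct toy_rx_buf *buf, struct toy_cq *cq,
			     const char *data, size_t n, bool last)
{
	if (n > buf->len - buf->placed)
		return -1;              /* would overrun the target buffer */
	memcpy(buf->addr + buf->placed, data, n);
	buf->placed += n;
	if (last) {
		cq->nr_cqe++;
		cq->notify_pending = true;
	}
	return 0;
}

/* Runs later, outside the callback (e.g. from a worker): deliver the
 * deferred notification. Returns 1 if a notification was delivered. */
static int toy_cq_notify(struct toy_cq *cq)
{
	if (!cq->notify_pending)
		return 0;
	cq->notify_pending = false;
	return 1;
}
```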
+
+
+User Interface
+--------------
+All fast path operations, such as posting work requests and reaping
+work completions, currently involve a system call into the siw module
+(via the uverbs write file operation of the infiniband/core layer;
+there is no private interface between libsiw and the siw module).
+Kernel/user-mapped send, receive, and completion queues are not part
+of the current code. Mapped completion queues in particular may improve
+performance, since reaping and re-arming could be done more efficiently.
+
+
+Memory Management
+-----------------
+siw currently uses the kernel's ib_umem_get() function to pin memory for
+later use in data transfer operations. Transmit and receive memory are
+checked for correct access permissions only at the moment of access:
+by the network input path, or just before the data is pushed to the
+socket for transmission. ib_umem_get() also provides DMA mappings for
+the requested address space, which siw does not use.
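The access-time check can be illustrated with a hypothetical region descriptor. The `TOY_ACC_*` flags and `toy_mem_access_ok()` are assumptions for illustration, not the ib_verbs access flags or a siw function; the point is that registration only pins, while bounds and permissions are validated when the rx path places data or the tx path is about to hand data to the socket.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical access flags and region descriptor (not the ib_verbs
 * definitions). Pinning happened at registration time; permissions and
 * bounds are only checked here, when the memory is actually used. */
enum {
	TOY_ACC_LOCAL_WRITE  = 1u << 0,
	TOY_ACC_REMOTE_READ  = 1u << 1,
	TOY_ACC_REMOTE_WRITE = 1u << 2,
};

struct toy_mem {
	uint64_t va;          /* start of the registered range */
	uint64_t len;         /* length of the registered range */
	unsigned int perms;   /* TOY_ACC_* bits granted at registration */
};

/* True if [addr, addr + len) lies inside the region and every permission
 * in 'need' was granted. Written to avoid 64-bit overflow. */
static bool toy_mem_access_ok(const struct toy_mem *m, uint64_t addr,
			      uint64_t len, unsigned int need)
{
	uint64_t off;

	if ((m->perms & need) != need)
		return false;
	if (addr < m->va)
		return false;
	off = addr - m->va;
	if (off > m->len)
		return false;
	return len <= m->len - off;
}
```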
+
+
+Module Parameters
+-----------------
+The following siw module parameters are recognized:
+loopback_enabled:
+ If set, siw also attaches to the loopback device. Evaluated only
+ at module insertion time.
+
+mpa_crc_enabled:
+ If set, the MPA CRC is generated and checked in both the tx and rx
+ paths. Without hardware support, setting this flag severely hurts
+ throughput.
+
+zcopy_tx:
+ If set, the payload of non-signalled work requests (such as
+ non-signalled WRITE or SEND, as well as all READ responses) is
+ transferred using the TCP socket's sendpage interface. This
+ parameter can be switched on and off at runtime
+ (echo 1 > /sys/module/siw/parameters/zcopy_tx to enable,
+ echo 0 > /sys/module/siw/parameters/zcopy_tx to disable).
+ Zero-copy transmission may reduce system load. Zero-copy is
+ not used if mpa_crc_enabled is set.
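To make the cost of mpa_crc_enabled concrete: MPA (RFC 5044) protects each FPDU with a CRC32c (Castagnoli) checksum. The sketch below is a deliberately naive bitwise CRC32c with a hypothetical name, not the routine siw uses; it only shows that software CRC adds per-byte work on every transmitted and received payload byte. Which FPDU fields are covered and the on-wire byte order are not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC32c (Castagnoli, reflected polynomial 0x82F63B78).
 * Eight shift/xor steps per byte: this per-byte CPU cost is why a
 * software MPA CRC is expensive. */
static uint32_t crc32c_sw(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}
```

The standard CRC32c check value, crc32c_sw("123456789", 9) == 0xE3069283, is a quick way to validate any faster replacement.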
+
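For scripted toggling of zcopy_tx, a small C helper can write the sysfs file named in the text. `toy_param_write()`, `toy_param_read()` and `toy_zcopy_effective()` are hypothetical helpers, and writing the real file requires root and a loaded siw module. The read-back accepts both "1"/"0" and "Y"/"N", since the format depends on how the parameter is declared. `toy_zcopy_effective()` only encodes the stated rule that zero-copy is not used while mpa_crc_enabled is set.

```c
#include <stdbool.h>
#include <stdio.h>

/* Write "1" or "0" to a module-parameter file such as
 * /sys/module/siw/parameters/zcopy_tx. Returns 0 on success, -1 on error. */
static int toy_param_write(const char *path, bool on)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(on ? "1\n" : "0\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Read the parameter back: returns 1 (on), 0 (off) or -1 (error).
 * int parameters read back as "1"/"0", bool parameters as "Y"/"N". */
static int toy_param_read(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);
	if (c == '1' || c == 'Y' || c == 'y')
		return 1;
	if (c == '0' || c == 'N' || c == 'n')
		return 0;
	return -1;
}

/* The rule stated above: zero-copy is not used while MPA CRC is on. */
static bool toy_zcopy_effective(bool zcopy_tx, bool mpa_crc_enabled)
{
	return zcopy_tx && !mpa_crc_enabled;
}
```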
+
+Compile Time Flags:
+-DCHECK_DMA_CAPABILITIES
+ Checks whether the device siw wants to attach to provides
+ DMA capabilities. While DMA capabilities are currently not
+ needed (siw works on top of a kernel TCP socket), siw
+ uses ib_umem_get(), which performs an (unused) DMA address
+ translation. A siw-private memory reservation and pinning
+ routine would remove this dependency.
+
+-DSIW_TX_FULLSEGS
+ Experimental, not enabled by default. If set, siw tries not
+ to overrun the socket (i.e., it does not keep sending until
+ -EAGAIN is returned), but stops sending if the current segment
+ would not fit into the socket's estimated tx buffer. With that,
+ wire FPDUs get split by the TCP stack far less often.
+ Since this feature manipulates the sock's SOCK_NOSPACE
+ bit, it violates strict layering and is therefore considered
+ proprietary.
+ Since TCP is a byte stream protocol, there is no guarantee
+ that FPDUs are not fragmented.
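The difference between the default policy and SIW_TX_FULLSEGS can be sketched without touching a real socket. In this hypothetical model, `wspace` stands in for the socket's estimated free send-buffer space and `toy_tx_fpdus()` is an illustrative name; it reports how many FPDUs are handed to TCP whole and how many bytes of a further FPDU are handed over only in part.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical comparison of the two transmit policies. Nothing here
 * touches a real socket or the SOCK_NOSPACE bit. */
struct toy_tx_result {
	int whole;        /* FPDUs handed to TCP completely */
	size_t partial;   /* bytes of an FPDU handed over only in part */
};

static struct toy_tx_result toy_tx_fpdus(const size_t *fpdu, int n,
					 size_t wspace, bool full_segs)
{
	struct toy_tx_result r = { 0, 0 };

	for (int i = 0; i < n; i++) {
		if (fpdu[i] <= wspace) {
			wspace -= fpdu[i];
			r.whole++;
			continue;
		}
		/* The next FPDU does not fit. Default policy: push what
		 * fits until the socket returns -EAGAIN. SIW_TX_FULLSEGS
		 * policy: do not start this FPDU at all. */
		if (!full_segs)
			r.partial = wspace;
		break;
	}
	return r;
}
```

With three 1000-byte FPDUs and 2500 bytes of estimated space, the default policy hands over two whole FPDUs plus 500 bytes of the third, while the FULLSEGS policy stops after two. As the text notes, TCP may still segment an FPDU that was handed over whole.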
--
1.5.4.3
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH] SIW: Documentation (initial)
@ 2010-10-14 22:57 Randy Dunlap
From: Randy Dunlap @ 2010-10-14 22:57 UTC
To: Bernard Metzler
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, 5 Oct 2010 08:55:47 +0200 Bernard Metzler wrote:

> +siw runs on top of TCP kernel sockets and exports the Linux kernel ibvers
                                                                      ^^^^^^
Is that "ibverbs"? (just checking)

> +thread gets scheduled, if the TCP socket signals new write space to

drop the comma.

> +be available. If during send operation the socket send space get

becomes (or "is")

> +All fast path operations such as posting of work requests and
> +reaping of work completions currently involve a system call into
> +the siw module. Kernel/user-mapped send and receive as well as

I didn't find the system call(s). Are they new syscalls or just
(socket) reads/writes? (I was probably looking for new syscalls.)

> +siw currently uses kernels ib_umem_get() function to pin memory for later

the kernel's

> +use in data transfer operations. Transmit and receive memory is checked

are checked (or change "and" to "or")

> + If set, payload of non signalled work requests

non-signalled

> + (such as non signalled WRITE or SEND as well as all READ

non-signalled

> + responses) are transferred using the TCP sockets

socket's

> + for enablement, 0 for disabling). System load may benefits from

benefit

> + -EAGAIN retrun), but stops sending if the current segment

return),

---
~Randy
* Re: [PATCH] SIW: Documentation (initial)
@ 2010-10-19 15:36 Bernard Metzler
From: Bernard Metzler @ 2010-10-19 15:36 UTC
To: Randy Dunlap
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA

Randy,

...back from vacation. Many thanks! I'll take it all over.

Bernard.

Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote on 10/15/2010 12:57:03 AM:

<snip>
> > +
> > +User Interface
> > +--------------
> > +All fast path operations such as posting of work requests and
> > +reaping of work completions currently involve a system call into
> > +the siw module. Kernel/user-mapped send and receive as well as
>
> I didn't find the system call(s). Are they new syscalls or just
> (socket) reads/writes? (I was probably looking for new syscalls.)
>
I will have to clarify. Currently all operations are using the
infiniband/core infrastructure (e.g. via uverbs write file operation).
There is no private interface between libsiw and siw kernel module in
place.

<snip>