* Work in progress SMB-Direct driver for the linux kernel
From: Stefan Metzmacher @ 2018-02-01 8:18 UTC
To: David Disseldorp
Cc: Samba Technical,
linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tom Talpey,
Long Li, Steve French, Ralph Böhme
Hi David,
you were asking about my work in progress of the SMB-Direct driver for
the linux kernel.
I dumped what I have to git://git.samba.org/metze/linux/smbdirect.git
into the smbdirect-work-in-progress branch.
See
https://git.samba.org/?p=metze/linux/smbdirect.git;a=shortlog;h=refs/heads/smbdirect-work-in-progress
From the README:
This is a work-in-progress SMB-Direct driver for the Linux kernel.
It's just a raw dump of what I currently have, to get some help
with debugging kernel freezes. A lot of cleanup is still required,
as are real commits with useful commit messages...
The protocol is specified by Microsoft in
[MS-SMBD] SMB2 Remote Direct Memory Access (RDMA) Transport Protocol
The aim is to later reuse parts of this for other operating systems
like FreeBSD.
The first goal is to provide a socket fd to userspace (or in-kernel
consumers) which provides semantics like a TCP socket used as transport
for SMB3. Basically, frames are submitted with a 4-byte length header.
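To illustrate the framing described above, here is a minimal Python sketch of sending and receiving frames with a 4-byte length header over an ordinary stream socket. It is only a sketch under assumptions: the header is taken to be a 32-bit big-endian length (the README does not spell out the encoding), and the helper names `send_frame`, `recv_exact` and `recv_frame` are made up for this example and are not part of the smbdirect code.

```python
import socket
import struct

# Assumption: the 4-byte header is a big-endian unsigned length.
HEADER = struct.Struct(">I")


def send_frame(sock, payload):
    """Send one frame: 4-byte length header followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket, or raise if the peer closes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf += chunk
    return bytes(buf)


def recv_frame(sock):
    """Receive one frame and return its payload without the header."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    a, b = socket.socketpair()
    send_frame(a, b"first frame")
    send_frame(a, b"second frame")
    print(recv_frame(b))
    print(recv_frame(b))
    a.close()
    b.close()
```

A stream socket may return fewer bytes than requested, which is why the receive side loops until the whole header and the whole payload have arrived.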
The second goal will be to provide RDMA read and write support
via ioctl() calls on the main smb-direct socket fd, but the
API for that is not yet designed. It needs to
avoid data copies as much as possible.
loadchelsio.sh, loadrxe.sh and loadsiw.sh offer some examples
of how to set up the RDMA stack before using 'insmod smbdirect.ko'.
The userspace 'smbdirect-tool' offers some commands for testing.
  "connect <dstaddr> [<dstport> [<srcaddr> [<srcport>]]]"
    "172.31.9.166"
    "172.31.9.166 5445"
    "172.31.9.166 5445 172.31.9.167"
    "fd3a:aaa3:ee87:ff09:f189:7430:7976:6276 5445"
  "listen [<port> [<addr>]]"
    "5445"
    "5445 172.31.9.166"
    "445 fd3a:aaa3:ee87:ff09:f189:7430:7976:6276"
The current state is the following:
- It compiles with v4.{10,11,12,13,14,15}
- The kernel freezes after some time
on an active connection, e.g. when readv() is called
- Also rmmod smbdirect causes a freeze after some time
As you gave a talk about debugging the Linux kernel,
you might be able to find the reasons for the (silent) freezes.
metze
* Re: Work in progress SMB-Direct driver for the linux kernel
From: David Disseldorp @ 2018-02-01 12:12 UTC
To: Stefan Metzmacher via samba-technical
Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Steve French

On Thu, 1 Feb 2018 09:18:10 +0100, Stefan Metzmacher via samba-technical wrote:
> Hi David,
>
> you were asking about my work in progress of the SMB-Direct driver for
> the linux kernel.
>
> I dumped what I have to git://git.samba.org/metze/linux/smbdirect.git
> into the smbdirect-work-in-progress branch.
>
> See
> https://git.samba.org/?p=metze/linux/smbdirect.git;a=shortlog;h=refs/heads/smbdirect-work-in-progress

Thanks for sending this around Metze, will take a look and hopefully
give it a try next week.

Cheers, David
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Work in progress SMB-Direct driver for the linux kernel
From: Jason Gunthorpe @ 2018-02-01 17:52 UTC
To: Stefan Metzmacher
Cc: David Disseldorp, Samba Technical,
linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tom Talpey,
Long Li, Steve French, Ralph Böhme

On Thu, Feb 01, 2018 at 09:18:10AM +0100, Stefan Metzmacher wrote:
> The first goal is to provide a socket fd to userspace (or in kernel
> consumers)
> which provides semantics like a TCP socket which is used as transport
> for SMB3. Basically frames are submitted with a 4 byte length header.

Part of the point of RDMA is that we don't need to make protocol
specific kernel modules like this - is there a specific reason this
needs to be in the kernel like this?

Jason
* Re: Work in progress SMB-Direct driver for the linux kernel
From: Richard Sharpe @ 2018-02-02 3:00 UTC
To: Jason Gunthorpe
Cc: Stefan Metzmacher, David Disseldorp, Samba Technical,
linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tom Talpey,
Long Li, Steve French, Ralph Böhme

On Thu, Feb 1, 2018 at 9:52 AM, Jason Gunthorpe <jgg-uk2M96/98Pc@public.gmane.org> wrote:
> On Thu, Feb 01, 2018 at 09:18:10AM +0100, Stefan Metzmacher wrote:
>
>> The first goal is to provide a socket fd to userspace (or in kernel
>> consumers)
>> which provides semantics like a TCP socket which is used as transport
>> for SMB3. Basically frames are submitted with a 4 byte length header.
>
> Part of the point of RDMA is that we don't need to make protocol
> specific kernel modules like this - is there a specific reason this
> needs to be in the kernel like this?

If I had to guess it would be because Samba currently uses a fork
model ... it might be years before it gets to a completely threaded
model.

--
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
* Re: Work in progress SMB-Direct driver for the linux kernel
From: Stefan Metzmacher @ 2018-02-02 9:25 UTC
To: Richard Sharpe, Jason Gunthorpe
Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Samba Technical,
Steve French, David Disseldorp

Hi Jason,

>>> The first goal is to provide a socket fd to userspace (or in kernel
>>> consumers)
>>> which provides semantics like a TCP socket which is used as transport
>>> for SMB3. Basically frames are submitted with a 4 byte length header.
>>
>> Part of the point of RDMA is that we don't need to make protocol
>> specific kernel modules like this - is there a specific reason this
>> needs to be in the kernel like this?
>
> If I had to guess it would be because Samba currently uses a fork
> model ... it might be years before it gets to a completely threaded
> model.

Yes, and it also means that our client and server code only need
minimal changes in order to work in the same way it would work
over tcp.

Only the RDMA read and writes need some more work, but I have
some ideas where the userspace gives the kernel an fd, offset and length
plus a remote memory descriptor as ioctl on the connection fd. Then the
kernel can get the content from the filesystem and directly pass it to
the rdma adapter, avoiding the copy from kernel to userspace and back.
From userspace we'll just wait in the syscall and don't have to care
about memory registrations and all other complex stuff.

It also happens that smbd sometimes blocks in syscalls like unlink for
a long time. It's good to have the kernel as 2nd entity that takes care
of keepalives.

metze
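
The ioctl idea in the message above (userspace hands the kernel an fd, offset and length plus a remote memory descriptor) could be pictured roughly as follows. This is purely a hypothetical sketch: the API was explicitly not designed at the time, so the class name `RdmaFileTransfer`, the field names and the byte layout are all invented for illustration. The remote descriptor is assumed to carry an offset, a token and a length, which is an assumption about what such a descriptor would need rather than something taken from the smbdirect code.

```python
import struct
from dataclasses import astuple, dataclass

# Hypothetical wire layout, little-endian without padding:
#   i  local file descriptor
#   Q  offset into the local file
#   I  number of bytes to transfer
#   Q  remote buffer offset   (assumed descriptor field)
#   I  remote buffer token    (assumed descriptor field)
#   I  remote buffer length   (assumed descriptor field)
REQUEST = struct.Struct("<iQIQII")


@dataclass
class RdmaFileTransfer:
    fd: int
    file_offset: int
    length: int
    remote_offset: int
    remote_token: int
    remote_length: int

    def pack(self) -> bytes:
        """Serialize the request after a basic sanity check."""
        if self.length > self.remote_length:
            raise ValueError("transfer does not fit into the remote buffer")
        return REQUEST.pack(*astuple(self))

    @classmethod
    def unpack(cls, raw: bytes) -> "RdmaFileTransfer":
        """Rebuild a request from its packed form."""
        return cls(*REQUEST.unpack(raw))


if __name__ == "__main__":
    req = RdmaFileTransfer(fd=7, file_offset=4096, length=65536,
                           remote_offset=0x1000, remote_token=0xABCD,
                           remote_length=65536)
    raw = req.pack()
    print(len(raw), RdmaFileTransfer.unpack(raw))
```

The point of such a request is that the calling process only names the file range and the peer's buffer; the data itself would never have to pass through a userspace buffer.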
* RE: Work in progress SMB-Direct driver for the linux kernel
From: Tom Talpey @ 2018-02-02 14:52 UTC
To: Stefan Metzmacher, Richard Sharpe, Jason Gunthorpe
Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Samba Technical,
Steve French, David Disseldorp

> -----Original Message-----
> From: linux-cifs-owner@vger.kernel.org <linux-cifs-owner@vger.kernel.org> On
> Behalf Of Stefan Metzmacher
> Sent: Friday, February 2, 2018 4:25 AM
> To: Richard Sharpe <realrichardsharpe@gmail.com>; Jason Gunthorpe
> <jgg@ziepe.ca>
> Cc: linux-cifs@vger.kernel.org; linux-rdma@vger.kernel.org; Samba Technical
> <samba-technical@lists.samba.org>; Steve French <smfrench@gmail.com>;
> David Disseldorp <ddiss@samba.org>
> Subject: Re: Work in progress SMB-Direct driver for the linux kernel
>
> Hi Jason,
>
> >>> The first goal is to provide a socket fd to userspace (or in kernel
> >>> consumers)
> >>> which provides semantics like a TCP socket which is used as transport
> >>> for SMB3. Basically frames are submitted with a 4 byte length header.
> >>
> >> Part of the point of RDMA is that we don't need to make protocol
> >> specific kernel modules like this - is there a specific reason this
> >> needs to be in the kernel like this?
> >
> > If I had to guess it would be because Samba currently uses a fork
> > model ... it might be years before it gets to a completely threaded
> > model.
>
> Yes, and it also means that our client and server code only need
> minimal changes in order to work in the same way it would work
> over tcp.
>
> Only the RDMA read and writes need some more work, but I have
> some ideas where the userspace gives the kernel an fd, offset and length
> plus a remote memory descriptor as ioctl on the connection fd. Then the
> kernel can get the content from the filesystem and directly pass it to
> the rdma adapter, avoiding the copy from kernel to userspace and back.
> From userspace we'll just wait in the syscall and don't have to care
> about memory registrations and all other complex stuff.

Doesn't this sort of transport shimming put back all the overhead it was
trying to avoid? Stripping off the 4-byte record marker, rearranging the
read/write data and SMB3_READ operation header to add the channel (memory
registration) handles, and most importantly placing the data in bounce
buffers to accommodate the readv()/writev() calls are quite complex and
expensive. And, just to present a file descriptor?

Experience in early NFS/RDMA and Windows Sockets Direct has taught that
transparency above the RDMA transport interface is generally the enemy of
performance. The shims are forced to perform additional syscalls, RDMA work
requests, and sometimes even network round trips. Do you have performance
results for yours?

> It also happens that smbd sometimes blocks in syscalls like unlink for
> a long time. It's good to have the kernel as 2nd entity that takes care
> of keepalives.

I agree that implementing SMB Direct in your userspace SMB3 daemon may be
problematic. But what of the existing SMB Direct code in the CIFS kernel
client? How will that coexist going forward?

Tom.
* Re: Work in progress SMB-Direct driver for the linux kernel
From: Doug Ledford @ 2018-02-04 16:39 UTC
To: Jason Gunthorpe, Stefan Metzmacher
Cc: David Disseldorp, Samba Technical,
linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tom Talpey,
Long Li, Steve French, Ralph Böhme

On Thu, 2018-02-01 at 10:52 -0700, Jason Gunthorpe wrote:
> On Thu, Feb 01, 2018 at 09:18:10AM +0100, Stefan Metzmacher wrote:
>
> > The first goal is to provide a socket fd to userspace (or in kernel
> > consumers)
> > which provides semantics like a TCP socket which is used as transport
> > for SMB3. Basically frames are submitted with a 4 byte length header.
>
> Part of the point of RDMA is that we don't need to make protocol
> specific kernel modules like this - is there a specific reason this
> needs to be in the kernel like this?

Although your question was partially answered, I think the precise cause
is that to behave like an SMB3 server, the model involves opening a TCP
connection, doing auth over that, then negotiating whether or not RDMA
connections are possible, then opening the RDMA connections. On our
end, once the auth happens, smbd forks and the new user process handles
the connection from that point on (which is integral to smbd's security
model). Because of this, and because the client initiates the RDMA
connection and will only use a single well known port to connect to,
incoming RDMA connections after negotiation would always go to the
master smbd process and not the forked client authed smbd process.
Because we don't have a way of then handing the RDMA connection from one
process to another forked process, the model breaks down. Possible
solutions are:

1) Don't make the master process listen on the well known RDMA port.
Let each client register to listen on the port only after the
negotiation has completed, listen until it finds the proper client
connection, then stop listening on the port. This obviously makes
multiple client connections race for which server is listening. I have
no idea if the Microsoft client would even retry a connection if it went
to the wrong process the first time and was rejected.

2) Create a new listen model where a process could register to listen on
a specific port + remote IP address(es). That way all connection
attempts from the specific remote IP address(es) and the local port
would go to the specific process. This could be contained within the
CMA and need not impact normal sockets processes. I would envision that
the smbd would get the negotiation packet from the client with a list of
the RDMA devices it can connect across, would work out which of those
devices it could accept connections from, would initiate all necessary
listens, and finally would send the reply packet to the client.

3) Make a kernel module as they are doing to handle the incoming
connections and multiplex between processes.

4) Make smbd threaded so that it can listen on the well known port and
then hand a connection off to a specific thread once established.

I'm partial to either 2 or 4 myself.

--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
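
Option 2 in the message above (a listen keyed on local port plus remote IP address) comes down to a lookup like the one sketched here. This is a plain userspace model of the dispatch rule only, not the RDMA CM API: `ListenTable` and its methods are hypothetical names, and the fallback to a port-wide listener is an assumption added to keep the example complete.

```python
import ipaddress


class ListenTable:
    """Toy model: route incoming connections by (local port, remote IP)."""

    def __init__(self):
        self._specific = {}  # (port, remote address) -> owner
        self._wildcard = {}  # port -> owner for any remote address

    def listen(self, port, owner, remote_addrs=None):
        """Register owner for a port, optionally limited to some remote IPs."""
        if remote_addrs is None:
            self._wildcard[port] = owner
            return
        for addr in remote_addrs:
            key = (port, ipaddress.ip_address(addr))
            if key in self._specific:
                raise ValueError(f"{addr} port {port} is already claimed")
            self._specific[key] = owner

    def unlisten(self, port, owner):
        """Drop every remote-specific registration of owner on this port."""
        stale = [k for k, v in self._specific.items()
                 if k[0] == port and v == owner]
        for key in stale:
            del self._specific[key]

    def dispatch(self, port, remote_addr):
        """Return the owner for an incoming connection, or None."""
        key = (port, ipaddress.ip_address(remote_addr))
        if key in self._specific:
            return self._specific[key]
        return self._wildcard.get(port)


if __name__ == "__main__":
    table = ListenTable()
    table.listen(5445, "master smbd")
    table.listen(5445, "child smbd 1234", remote_addrs=["172.31.9.167"])
    print(table.dispatch(5445, "172.31.9.167"))
    print(table.dispatch(5445, "172.31.9.200"))
```

In this model the forked process would register the client's addresses before the negotiation reply is sent, so the follow-up RDMA connection from that client reaches it rather than the master process.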
* Re: Work in progress SMB-Direct driver for the linux kernel
From: Steve French @ 2018-02-05 1:32 UTC
To: Doug Ledford
Cc: Jason Gunthorpe, Stefan Metzmacher, David Disseldorp, Samba Technical,
linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tom Talpey,
Long Li, Ralph Böhme

On Sun, Feb 4, 2018 at 10:39 AM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, 2018-02-01 at 10:52 -0700, Jason Gunthorpe wrote:
>> On Thu, Feb 01, 2018 at 09:18:10AM +0100, Stefan Metzmacher wrote:
>>
>> > The first goal is to provide a socket fd to userspace (or in kernel
>> > consumers)
>> > which provides semantics like a TCP socket which is used as transport
>> > for SMB3. Basically frames are submitted with a 4 byte length header.
>>
>> Part of the point of RDMA is that we don't need to make protocol
>> specific kernel modules like this - is there a specific reason this
>> needs to be in the kernel like this?
>
> Although your question was partially answered, I think the precise cause
> is that to behave like an SMB3 server, the model involves opening a TCP
> connection, doing auth over that, then negotiating whether or not RDMA
> connections are possible, then opening the RDMA connections. On our
> end, once the auth happens, smbd forks and the new user process handles
> the connection from that point on (which is integral to smbd's security
> model). Because of this, and because the client initiates the RDMA
> connection and will only use a single well known port to connect to,
> incoming RDMA connections after negotiation would always go to the
> master smbd process and not the forked client authed smbd process.
> Because we don't have a way of then handing the RDMA connection from one
> process to another forked process, the model breaks down. Possible
> solutions are:

It is interesting to think about the work Long Li did - which shows
that at least on the client side the model doesn't require opening a TCP
connection and doing auth on it before negotiating RDMA. His smb3 kernel
code allows trying SMB3 over RDMA immediately (without first negotiating
whether RDMA is allowed) - obviously finishing multichannel support in
cifs.ko is important, but I was surprised at first that it was possible
to use RDMA for SMB3 without multichannel.

--
Thanks,
Steve