From: Krishnamraju Eraparaju <krishna2@chelsio.com>
To: Bernard Metzler <BMT@zurich.ibm.com>
Cc: faisal.latif@intel.com, shiraz.saleem@intel.com,
mkalderon@marvell.com, aelior@marvell.com, dledford@redhat.com,
jgg@ziepe.ca, linux-rdma@vger.kernel.org, bharat@chelsio.com,
nirranjan@chelsio.com
Subject: Re: Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental e2e negotiation of GSO usage.
Date: Fri, 15 May 2020 19:20:40 +0530
Message-ID: <20200515135038.GA15967@chelsio.com>
In-Reply-To: <OF5AE22DD2.A8A5C20E-ON00258568.004804AF-00258568.00481A8E@notes.na.collabserv.com>
On Thursday, May 05/14/20, 2020 at 13:07:33 +0000, Bernard Metzler wrote:
> -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: -----
>
> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
> >Date: 05/14/2020 01:17PM
> >Cc: faisal.latif@intel.com, shiraz.saleem@intel.com,
> >mkalderon@marvell.com, aelior@marvell.com, dledford@redhat.com,
> >jgg@ziepe.ca, linux-rdma@vger.kernel.org, bharat@chelsio.com,
> >nirranjan@chelsio.com
> >Subject: [EXTERNAL] Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental
> >e2e negotiation of GSO usage.
> >
> >On Wednesday, May 05/13/20, 2020 at 11:25:23 +0000, Bernard Metzler
> >wrote:
> >> -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote: -----
> >>
> >> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >> >From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
> >> >Date: 05/13/2020 05:50AM
> >> >Cc: faisal.latif@intel.com, shiraz.saleem@intel.com,
> >> >mkalderon@marvell.com, aelior@marvell.com, dledford@redhat.com,
> >> >jgg@ziepe.ca, linux-rdma@vger.kernel.org, bharat@chelsio.com,
> >> >nirranjan@chelsio.com
> >> >Subject: [EXTERNAL] Re: Re: Re: [RFC PATCH] RDMA/siw: Experimental
> >> >e2e negotiation of GSO usage.
> >> >
> >> >On Monday, May 05/11/20, 2020 at 15:28:47 +0000, Bernard Metzler
> >> >wrote:
> >> >> -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote:
> >-----
> >> >>
> >> >> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >> >> >From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
> >> >> >Date: 05/07/2020 01:07PM
> >> >> >Cc: faisal.latif@intel.com, shiraz.saleem@intel.com,
> >> >> >mkalderon@marvell.com, aelior@marvell.com, dledford@redhat.com,
> >> >> >jgg@ziepe.ca, linux-rdma@vger.kernel.org, bharat@chelsio.com,
> >> >> >nirranjan@chelsio.com
> >> >> >Subject: [EXTERNAL] Re: Re: [RFC PATCH] RDMA/siw: Experimental
> >e2e
> >> >> >negotiation of GSO usage.
> >> >> >
> >> >> >Hi Bernard,
> >> >> >Thanks for the review comments. Replied in line.
> >> >> >
> >> >> >On Tuesday, May 05/05/20, 2020 at 11:19:46 +0000, Bernard
> >Metzler
> >> >> >wrote:
> >> >> >>
> >> >> >> -----"Krishnamraju Eraparaju" <krishna2@chelsio.com> wrote:
> >> >-----
> >> >> >>
> >> >> >> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >> >> >> >From: "Krishnamraju Eraparaju" <krishna2@chelsio.com>
> >> >> >> >Date: 04/28/2020 10:01PM
> >> >> >> >Cc: faisal.latif@intel.com, shiraz.saleem@intel.com,
> >> >> >> >mkalderon@marvell.com, aelior@marvell.com,
> >dledford@redhat.com,
> >> >> >> >jgg@ziepe.ca, linux-rdma@vger.kernel.org,
> >bharat@chelsio.com,
> >> >> >> >nirranjan@chelsio.com
> >> >> >> >Subject: [EXTERNAL] Re: [RFC PATCH] RDMA/siw: Experimental
> >e2e
> >> >> >> >negotiation of GSO usage.
> >> >> >> >
> >> >> >> >On Wednesday, April 04/15/20, 2020 at 11:59:21 +0000,
> >Bernard
> >> >> >Metzler
> >> >> >> >wrote:
> >> >> >> >Hi Bernard,
> >> >> >> >
> >> >> >> >The attached patches enable the GSO negotiation code in SIW with
> >> >> >> >a few modifications, and also allow hardware iwarp drivers to
> >> >> >> >advertise the max length (in 16/32/64KB granularity) that they
> >> >> >> >can accept. The logic is almost the same as how TCP SYN MSS
> >> >> >> >announcements work during the 3-way handshake.
> >> >> >> >
> >> >> >> >Please see if this approach works better for the softiwarp <=>
> >> >> >> >hardiwarp case.
> >> >> >> >
> >> >> >> >Thanks,
> >> >> >> >Krishna.
> >> >> >> >
> >> >> >> Hi Krishna,
> >> >> >>
> >> >> >> Thanks for providing this. I have a few comments:
> >> >> >>
> >> >> >> It would be good if we can look at patches inlined in the
> >> >> >> email body, as usual.
> >> >> >Sure, will do that henceforth.
> >> >> >>
> >> >> >> Before further discussing a complex solution as suggested
> >> >> >> here, I would like to hear comments from other iWarp HW
> >> >> >> vendors on their capabilities regarding GSO frame acceptance
> >> >> >> and potential preferences.
> >> >> >>
> >> >> >> The extension proposed here goes beyond what I initially sent
> >> >> >> as a proposed patch. From an siw point of view, it is straight
> >> >> >> forward to select using GSO or not, depending on the iWarp peer
> >> >> >> ability to process large frames. What is proposed here is a
> >> >> >> end-to-end negotiation of the actual frame size.
> >> >> >>
> >> >> >> A comment in the patch you sent suggests adding a module
> >> >> >> parameter. Module parameters are deprecated, and I removed any
> >> >> >> of those from siw when it went upstream. I don't think we can
> >> >> >> rely on that mechanism.
> >> >> >>
> >> >> >> siw has a compile time parameter (yes, that was a module
> >> >> >> parameter) which can set the maximum tx frame size (in multiples
> >> >> >> of MTU size). Any static setup of siw <-> Chelsio could make
> >> >> >> use of that as a work around.
> >> >> >>
> >> >> >> I wonder if it would be a better idea to look into an extension
> >> >> >> of the rdma netlink protocol, which would allow setting driver
> >> >> >> specific parameters per port, or even per QP.
> >> >> >> I assume there are more potential use cases for driver private
> >> >> >> extensions of the rdma netlink interface?
> >> >> >
> >> >> >I think the only problem with "configuring FPDU length via rdma
> >> >> >netlink" is that the end user might not feel comfortable finding
> >> >> >out what adapter is installed at the remote endpoint and what
> >> >> >length it supports. Any thoughts on simplifying this?
> >> >>
> >> >> Nope. This would be 'out of band' information.
> >> >>
> >> >> So we seem to have 3 possible solutions to the problem:
> >> >>
> >> >> (1) detect if the peer accepts FPDUs up to current GSO size,
> >> >> this is what I initially proposed. (2) negotiate a max FPDU
> >> >> size with the peer, this is what you are proposing, or (3)
> >> >> explicitly set that max FPDU size per extended user interface.
> >> >>
> >> >> My problem with (2) is the rather significant proprietary
> >> >> extension of MPA, since spare bits code a max value negotiation.
> >> >>
> >> >> I proposed (1) for its simplicity - just a single bit flag,
> >> >> which de-/selects GSO size for FPDUs on TX. Since Chelsio
> >> >> can handle _some_ larger (up to 16k, you said) sizes, (1)
> >> >> might have to be extended to cap at hard coded max size.
> >> >> Again, it would be good to know what other vendors limits
> >> >> are.
> >> >>
> >> >> Does 16k for siw <-> Chelsio already yield a decent
> >> >> performance win?
> >> >Yes, a 3x performance gain with just 16K GSO, compared to the
> >> >GSO-disabled case, where the MTU size is 1500.
> >> >
> >>
> >> That is a lot. On the other hand, I would suggest always
> >> increasing the MTU size to max (9k) for adapters siw attaches to.
> >> With a page size of 4k, anything below 4k MTU size hurts,
> >> while 9k already packs two consecutive pages into one frame,
> >> if aligned.
> >>
> >> Would 16k still gain a significant performance win if we have
> >> set max MTU size for the interface?
Unfortunately, there is no difference in throughput for a 16K FPDU when
the MTU is 9K.
It looks like the TCP stack constructs GSO/TSO buffers in multiples of the
HW MSS (tp->mss_cache). Since a 16K FPDU buffer is not a multiple of 9K,
the TCP stack slices the 16K buffer into 9K and 7K buffers before passing
them to the NIC driver.
Thus there is no difference in performance, as each tx packet handed to the
NIC cannot go beyond 9K when the FPDU length is 16K.
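
To illustrate the slicing described above, here is a minimal sketch of the arithmetic only. It is not the real TCP code path: slice_fpdu() is a hypothetical helper, and it assumes a simplified model in which a buffer is cut at MSS boundaries and a chunk is never topped up with bytes from the next FPDU. The value 9000 stands in for the MSS of a 9K MTU link; the real tp->mss_cache is somewhat smaller because of the headers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model (an assumption, not the kernel's TCP output path):
 * a send buffer of 'len' bytes is cut into chunks of at most 'mss'
 * bytes each. Up to 'max_chunks' chunk sizes are written to 'chunks';
 * the return value is the total number of chunks the buffer needs.
 */
static size_t slice_fpdu(size_t len, size_t mss, size_t *chunks,
			 size_t max_chunks)
{
	size_t n = 0;

	assert(mss > 0);
	while (len > 0) {
		size_t this_len = len < mss ? len : mss;

		if (n < max_chunks)
			chunks[n] = this_len;
		n++;
		len -= this_len;
	}
	return n;
}
```

Calling slice_fpdu(16384, 9000, chunks, 4) under this model gives two chunks, 9000 and 7384 bytes, which matches the "9K & 7K" split observed above: no single buffer handed to the NIC driver exceeds one MSS, so the larger FPDU brings no gain.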
> >>
> >> >Regarding the rdma netlink approach that you are suggesting,
> >> >should it look something like the below?
> >> >
> >> >rdma link set iwp3s0f4/1 max_fpdu_len 102.1.1.6:16384,
> >> >102.5.5.6:32768
> >> >
> >> >
> >> >rdma link show iwp3s0f4/1 max_fpdu_len
> >> > 102.1.1.6:16384
> >> > 102.5.5.6:32768
> >> >
> >> >where "102.1.1.6" is the destination IP address (such that the same
> >> >max fpdu length is used for all connections to this
> >> >address/adapter), and "16384" is the max fpdu length.
> >> >
> >> Yes, that would be one way of doing it. Unfortunately we
> >> would end up with maintaining additional permanent in kernel
> >> state per peer we ever configured.
> >>
> >> So, would it make sense to combine it with the iwpmd,
> >> which then may cache peers, while setting max_fpdu per
> >> new connection? This would probably include extending the
> >> proprietary port mapper protocol, to exchange local
> >> preferences with the peer. Local capabilities might
> >> be queried from the device (extending enum ib_mtu to
> >> more than 4k, and using ibv_query_port()). And the
> >> iw_cm_id to be extended to carry that extra parameter
> >> down to the driver... Sounds complicated.
> >If I understand you right, the client/server advertise their max FPDU
> >len in the Res field of the PMReq/PMAccept frames.
> >typedef struct iwpm_wire_msg {
> > __u8 magic;
> > __u8 pmtime;
> > __be16 reserved;
> >Then, after Portmapper negotiation, the fpdu len is propagated to the
> >SIW qp structure from userspace iwpmd.
> >
> >If we weigh up the pros and cons of using the PortMapper Res field vs
> >the MPA Res field, then it looks like using MPA is less complicated,
> >considering the lines of change and the modules involved. Not sure my
> >analysis is right here?
> >
> One important difference IMHO is that one approach would touch an
> established IETF communication protocol (MPA), the other a
> proprietary application (iwpmd).
OK, I will explore the iwpmd approach further; maybe prototyping it would help.
>
>
> >By the way, it looks like the existing SIW GSO code needs logic to
> >limit "c_tx->tcp_seglen" to 64K-1, as the MPA len is only 16 bits. Say,
> >in future, to best utilize 400G Ethernet, the Linux TCP stack increases
> >GSO_MAX_SIZE to 128K; SIW would then cast an 18-bit value to the 16-bit
> >MPA len.
> >
> Isn't GSO bound to IP fragmentation?
Not sure. But I would say it is better to limit "c_tx->tcp_seglen"
somewhere to 64K-1 to avoid future risks.
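
A minimal sketch of the risk and of the kind of cap I have in mind. This is not the siw code: MPA_MAX_FPDU_LEN, clamp_tcp_seglen() and truncate_to_mpa_len() are hypothetical names used only for illustration, and the sketch assumes nothing beyond the MPA length field being 16 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value a 16-bit MPA length field can carry (hypothetical name). */
#define MPA_MAX_FPDU_LEN 0xffffu

/* Hypothetical helper: cap a segment length so it fits into 16 bits. */
static uint16_t clamp_tcp_seglen(uint32_t seglen)
{
	uint32_t capped = seglen > MPA_MAX_FPDU_LEN ? MPA_MAX_FPDU_LEN : seglen;

	assert(capped <= UINT16_MAX);	/* now safe to store in 16 bits */
	return (uint16_t)capped;
}

/* What an unchecked cast to the 16-bit length does to the same value. */
static uint16_t truncate_to_mpa_len(uint32_t seglen)
{
	return (uint16_t)seglen;	/* keeps only the low 16 bits */
}
```

With a 128K segment length, truncate_to_mpa_len(128 * 1024) yields 0, since only the low 16 bits survive, while clamp_tcp_seglen(128 * 1024) yields 65535. Where exactly such a cap belongs in the tx path is an open question.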
>
> Thanks,
> Bernard
>