* SCTP data chunk bundling when SCTP_NODELAY is set @ 2014-06-17 14:17 David Laight 2014-06-17 15:45 ` Vlad Yasevich 0 siblings, 1 reply; 10+ messages in thread From: David Laight @ 2014-06-17 14:17 UTC (permalink / raw) To: netdev@vger.kernel.org If SCTP_NODELAY is set it is difficult to get SCTP to bundle data chunks into ethernet packets. This leads to very high packet rates which bundling could easily reduce by a factor or 8 or 10. Nagle can't really be enabled because it generates unwanted delays when traffic is light (Nagle only really works for unidirectional bulk data and command-response when the messages are smaller than the mtu). Even if the sending application knows it has more data to send, there isn't much it can do to get the chunks bundled. AFAICT 'corking' the socket even stops full sized packets being sent - so the application will deadlock if the socket write buffer size is reached before the socket is 'uncorked'. This also means that the application can't send back to back full sized packets unless it uncorks the socket at exactly the right places. MSG_MORE isn't supported by SCTP, but I'm not sure it would help. You really need a MSG_NO_MORE flag and to leave Nagle enabled. About the only thing I can think of is to normally have Nagle enabled, and then perform the following sequence to force the buffered data chunks be sent: 1) disable Nagle 2) cork the socket 3) uncork the socket 4) enable Nagle Four socket calls is a little excessive! Any other ideas ? David ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: SCTP data chunk bundling when SCTP_NODELAY is set 2014-06-17 14:17 SCTP data chunk bundling when SCTP_NODELAY is set David Laight @ 2014-06-17 15:45 ` Vlad Yasevich 2014-06-17 16:07 ` David Laight 0 siblings, 1 reply; 10+ messages in thread From: Vlad Yasevich @ 2014-06-17 15:45 UTC (permalink / raw) To: David Laight, netdev@vger.kernel.org On 06/17/2014 10:17 AM, David Laight wrote: > If SCTP_NODELAY is set it is difficult to get SCTP to bundle > data chunks into ethernet packets. > This leads to very high packet rates which bundling could easily > reduce by a factor or 8 or 10. > > Nagle can't really be enabled because it generates unwanted delays > when traffic is light (Nagle only really works for unidirectional bulk > data and command-response when the messages are smaller than the mtu). > > Even if the sending application knows it has more data to send, > there isn't much it can do to get the chunks bundled. > > AFAICT 'corking' the socket even stops full sized packets being > sent - so the application will deadlock if the socket write > buffer size is reached before the socket is 'uncorked'. > This also means that the application can't send back to back > full sized packets unless it uncorks the socket at exactly > the right places. > > MSG_MORE isn't supported by SCTP, but I'm not sure it would help. > You really need a MSG_NO_MORE flag and to leave Nagle enabled. > > About the only thing I can think of is to normally have Nagle > enabled, and then perform the following sequence to force the > buffered data chunks be sent: > 1) disable Nagle > 2) cork the socket > 3) uncork the socket > 4) enable Nagle > Four socket calls is a little excessive! First, how are you corking an SCTP socket? There is no SCTP_CORK and looking at the code, I don't see how an SCTP queue can be cored by user... I suppose we could implement SCTP_CORK to do the right thing. I thought is possibly utilizing something like sendmmsg() and passing an extra flag to let it be know that this is a multi-message send that should be queued up by sctp.. -vlad > > Any other ideas ? > > David > > > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: SCTP data chunk bundling when SCTP_NODELAY is set 2014-06-17 15:45 ` Vlad Yasevich @ 2014-06-17 16:07 ` David Laight 2014-06-17 16:49 ` Vlad Yasevich 0 siblings, 1 reply; 10+ messages in thread From: David Laight @ 2014-06-17 16:07 UTC (permalink / raw) To: 'Vlad Yasevich', netdev@vger.kernel.org From: Vlad Yasevich > On 06/17/2014 10:17 AM, David Laight wrote: > > If SCTP_NODELAY is set it is difficult to get SCTP to bundle > > data chunks into ethernet packets. > > This leads to very high packet rates which bundling could easily > > reduce by a factor or 8 or 10. > > > > Nagle can't really be enabled because it generates unwanted delays > > when traffic is light (Nagle only really works for unidirectional bulk > > data and command-response when the messages are smaller than the mtu). > > > > Even if the sending application knows it has more data to send, > > there isn't much it can do to get the chunks bundled. > > > > AFAICT 'corking' the socket even stops full sized packets being > > sent - so the application will deadlock if the socket write > > buffer size is reached before the socket is 'uncorked'. > > This also means that the application can't send back to back > > full sized packets unless it uncorks the socket at exactly > > the right places. > > > > MSG_MORE isn't supported by SCTP, but I'm not sure it would help. > > You really need a MSG_NO_MORE flag and to leave Nagle enabled. > > > > About the only thing I can think of is to normally have Nagle > > enabled, and then perform the following sequence to force the > > buffered data chunks be sent: > > 1) disable Nagle > > 2) cork the socket > > 3) uncork the socket > > 4) enable Nagle > > Four socket calls is a little excessive! > > First, how are you corking an SCTP socket? There is no SCTP_CORK > and looking at the code, I don't see how an SCTP queue can be > cored by user... I only looked as far as seeing that the code in sm_sideffect.c allows someone else to have corked the socket, and the effect that the 'cork' had. > I suppose we could implement SCTP_CORK to do the right thing. > > I thought is possibly utilizing something like sendmmsg() and passing > an extra flag to let it be know that this is a multi-message send > that should be queued up by sctp.. It would be as easy to expose the extra flag to the 'application' allowing it to use sendmsg() or sendmmsg(). While sendmmsg() saves a system call, it is fairly horrid to use. (and I'm sending from a kernel driver so don't care about the system call cost!) Possibly MSG_MORE with Nagle disabled could invoke the Nagle send delay - but you'd need to know whether any chunks in the queue had MSG_MORE clear. It would also be nice to have a 'send on local timeout', rather than Nagle's 'wait for everything to be acked'. David ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: SCTP data chunk bundling when SCTP_NODELAY is set 2014-06-17 16:07 ` David Laight @ 2014-06-17 16:49 ` Vlad Yasevich 2014-06-18 13:47 ` David Laight 0 siblings, 1 reply; 10+ messages in thread From: Vlad Yasevich @ 2014-06-17 16:49 UTC (permalink / raw) To: David Laight, netdev@vger.kernel.org On 06/17/2014 12:07 PM, David Laight wrote: > From: Vlad Yasevich >> On 06/17/2014 10:17 AM, David Laight wrote: >>> If SCTP_NODELAY is set it is difficult to get SCTP to bundle >>> data chunks into ethernet packets. >>> This leads to very high packet rates which bundling could easily >>> reduce by a factor or 8 or 10. >>> >>> Nagle can't really be enabled because it generates unwanted delays >>> when traffic is light (Nagle only really works for unidirectional bulk >>> data and command-response when the messages are smaller than the mtu). >>> >>> Even if the sending application knows it has more data to send, >>> there isn't much it can do to get the chunks bundled. >>> >>> AFAICT 'corking' the socket even stops full sized packets being >>> sent - so the application will deadlock if the socket write >>> buffer size is reached before the socket is 'uncorked'. >>> This also means that the application can't send back to back >>> full sized packets unless it uncorks the socket at exactly >>> the right places. >>> >>> MSG_MORE isn't supported by SCTP, but I'm not sure it would help. >>> You really need a MSG_NO_MORE flag and to leave Nagle enabled. >>> >>> About the only thing I can think of is to normally have Nagle >>> enabled, and then perform the following sequence to force the >>> buffered data chunks be sent: >>> 1) disable Nagle >>> 2) cork the socket >>> 3) uncork the socket >>> 4) enable Nagle >>> Four socket calls is a little excessive! >> >> First, how are you corking an SCTP socket? There is no SCTP_CORK >> and looking at the code, I don't see how an SCTP queue can be >> cored by user... > > I only looked as far as seeing that the code in sm_sideffect.c > allows someone else to have corked the socket, and the effect > that the 'cork' had. Right. Cork is pretty dumb right now. It should work more like TCP where if you've queued up enough data to bypass nagle checks, it should flush some number for full MTU packets. > >> I suppose we could implement SCTP_CORK to do the right thing. >> >> I thought is possibly utilizing something like sendmmsg() and passing >> an extra flag to let it be know that this is a multi-message send >> that should be queued up by sctp.. > > It would be as easy to expose the extra flag to the 'application' > allowing it to use sendmsg() or sendmmsg(). > While sendmmsg() saves a system call, it is fairly horrid to use. > (and I'm sending from a kernel driver so don't care about the > system call cost!) > > Possibly MSG_MORE with Nagle disabled could invoke the Nagle send > delay - but you'd need to know whether any chunks in the queue > had MSG_MORE clear. That's why doing this with cork would be simpler. The ULP can just queue up a bunch of small data and if we pass nagle checks, it will be flushed. If not, uncork will flush it. I could work up a patch for you if you want. Thanks -vlad > It would also be nice to have a 'send on local timeout', rather than > Nagle's 'wait for everything to be acked'. > > David > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: SCTP data chunk bundling when SCTP_NODELAY is set 2014-06-17 16:49 ` Vlad Yasevich @ 2014-06-18 13:47 ` David Laight 2014-06-18 16:38 ` David Laight 0 siblings, 1 reply; 10+ messages in thread From: David Laight @ 2014-06-18 13:47 UTC (permalink / raw) To: 'Vlad Yasevich', netdev@vger.kernel.org From: Vlad Yasevich ... > >> I suppose we could implement SCTP_CORK to do the right thing. > >> > >> I thought is possibly utilizing something like sendmmsg() and passing > >> an extra flag to let it be know that this is a multi-message send > >> that should be queued up by sctp.. > > > > It would be as easy to expose the extra flag to the 'application' > > allowing it to use sendmsg() or sendmmsg(). > > While sendmmsg() saves a system call, it is fairly horrid to use. > > (and I'm sending from a kernel driver so don't care about the > > system call cost!) > > > > Possibly MSG_MORE with Nagle disabled could invoke the Nagle send > > delay - but you'd need to know whether any chunks in the queue > > had MSG_MORE clear. > > That's why doing this with cork would be simpler. The ULP can just > queue up a bunch of small data and if we pass nagle checks, it will be > flushed. If not, uncork will flush it. I think you need only care about the 'MSG_MORE' flag of the last data chunk. Any earlier data (with MSG_MORE clear) will usually have been sent (unless prevented by Nagle or flow control), so you probably wouldn't be able to send it regardless of the state of MSG_MORE on a chunk being queued. There is also the expectation that another send without MSG_MORE will happen almost immediately. So MSG_MORE could have the same effect as corking the socket. Although you'd need separate bits - but uncork could clear both. What I would like to implement (from M3UA) is to hold data for a maximum of (say) 5ms awaiting M3UA data chunks. To do this properly requires knowledge of the actual ethernet packet boundaries. The problem is there are (at least) three cases: 1) This data should be sent as soon as possible. 2) Send this data some time soonish. 3) I've got another data block I'm going to give you after this one. > I could work up a patch for you if you want. I was thinking I might try to write one. But I've been asked to look at something else for a bit. David ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: SCTP data chunk bundling when SCTP_NODELAY is set 2014-06-18 13:47 ` David Laight @ 2014-06-18 16:38 ` David Laight 2014-06-19 13:27 ` Vlad Yasevich 0 siblings, 1 reply; 10+ messages in thread From: David Laight @ 2014-06-18 16:38 UTC (permalink / raw) To: David Laight, 'Vlad Yasevich', netdev@vger.kernel.org From: David Laight > From: Vlad Yasevich > ... > > >> I suppose we could implement SCTP_CORK to do the right thing. > > >> > > >> I thought is possibly utilizing something like sendmmsg() and passing > > >> an extra flag to let it be know that this is a multi-message send > > >> that should be queued up by sctp.. > > > > > > It would be as easy to expose the extra flag to the 'application' > > > allowing it to use sendmsg() or sendmmsg(). > > > While sendmmsg() saves a system call, it is fairly horrid to use. > > > (and I'm sending from a kernel driver so don't care about the > > > system call cost!) > > > > > > Possibly MSG_MORE with Nagle disabled could invoke the Nagle send > > > delay - but you'd need to know whether any chunks in the queue > > > had MSG_MORE clear. > > > > That's why doing this with cork would be simpler. The ULP can just > > queue up a bunch of small data and if we pass nagle checks, it will be > > flushed. If not, uncork will flush it. > > I think you need only care about the 'MSG_MORE' flag of the last data chunk. > Any earlier data (with MSG_MORE clear) will usually have been sent (unless > prevented by Nagle or flow control), so you probably wouldn't be able to > send it regardless of the state of MSG_MORE on a chunk being queued. > There is also the expectation that another send without MSG_MORE will > happen almost immediately. > > So MSG_MORE could have the same effect as corking the socket. > Although you'd need separate bits - but uncork could clear both. > > What I would like to implement (from M3UA) is to hold data for a maximum > of (say) 5ms awaiting M3UA data chunks. To do this properly requires > knowledge of the actual ethernet packet boundaries. > > The problem is there are (at least) three cases: > 1) This data should be sent as soon as possible. > 2) Send this data some time soonish. > 3) I've got another data block I'm going to give you after this one. > > > I could work up a patch for you if you want. > > I was thinking I might try to write one. Actually this might work for what I'm trying to do. (untested). diff --git a/net/sctp/output.c b/net/sctp/output.c index 0f4d15f..51030bc 100644 --- a/net/sctp/output.c +++ b/net/sctp/output.c @@ -691,7 +691,7 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet, * if any previously transmitted data on the connection remains * unacknowledged. */ - if (!sctp_sk(asoc->base.sk)->nodelay && sctp_packet_empty(packet) && + if (sctp_sk(asoc->base.sk)->nodelay != 1 && sctp_packet_empty(packet) && inflight && sctp_state(asoc, ESTABLISHED)) { unsigned int max = transport->pathmtu - packet->overhead; unsigned int len = chunk->skb->len + q->out_qlen; diff --git a/net/sctp/socket.c b/net/sctp/socket.c index fee06b9..084b957 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -1928,7 +1928,10 @@ static int sctp_sendmsg(struct kiocb *iocb, struct sock *sk, } /* Break the message into multiple chunks of maximum size. */ + if (msg->msg_flags & MSG_MORE) + sp->nodelay |= 2; datamsg = sctp_datamsg_from_user(asoc, sinfo, msg, msg_len); + sp->nodelay &= 1; if (IS_ERR(datamsg)) { err = PTR_ERR(datamsg); goto out_free; Ideally MSG_MORE should delay sending even if 'inflight' is false. But that would require 'flush on timeout'. I'd prefer that, and with a configurable timeout. But I can implement the timeout in the 'application'. Given the way Nagle is implemented in sctp, I could keep flipping it on and off - but that probably has undocumented behaviour (ie it might suddenly change). David ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: SCTP data chunk bundling when SCTP_NODELAY is set 2014-06-18 16:38 ` David Laight @ 2014-06-19 13:27 ` Vlad Yasevich 2014-06-19 13:45 ` David Laight 0 siblings, 1 reply; 10+ messages in thread From: Vlad Yasevich @ 2014-06-19 13:27 UTC (permalink / raw) To: David Laight, netdev@vger.kernel.org On 06/18/2014 12:38 PM, David Laight wrote: > From: David Laight >> From: Vlad Yasevich >> ... >>>>> I suppose we could implement SCTP_CORK to do the right thing. >>>>> >>>>> I thought is possibly utilizing something like sendmmsg() and passing >>>>> an extra flag to let it be know that this is a multi-message send >>>>> that should be queued up by sctp.. >>>> >>>> It would be as easy to expose the extra flag to the 'application' >>>> allowing it to use sendmsg() or sendmmsg(). >>>> While sendmmsg() saves a system call, it is fairly horrid to use. >>>> (and I'm sending from a kernel driver so don't care about the >>>> system call cost!) >>>> >>>> Possibly MSG_MORE with Nagle disabled could invoke the Nagle send >>>> delay - but you'd need to know whether any chunks in the queue >>>> had MSG_MORE clear. >>> >>> That's why doing this with cork would be simpler. The ULP can just >>> queue up a bunch of small data and if we pass nagle checks, it will be >>> flushed. If not, uncork will flush it. >> >> I think you need only care about the 'MSG_MORE' flag of the last data chunk. >> Any earlier data (with MSG_MORE clear) will usually have been sent (unless >> prevented by Nagle or flow control), so you probably wouldn't be able to >> send it regardless of the state of MSG_MORE on a chunk being queued. >> There is also the expectation that another send without MSG_MORE will >> happen almost immediately. >> >> So MSG_MORE could have the same effect as corking the socket. >> Although you'd need separate bits - but uncork could clear both. >> >> What I would like to implement (from M3UA) is to hold data for a maximum >> of (say) 5ms awaiting M3UA data chunks. To do this properly requires >> knowledge of the actual ethernet packet boundaries. >> >> The problem is there are (at least) three cases: >> 1) This data should be sent as soon as possible. >> 2) Send this data some time soonish. >> 3) I've got another data block I'm going to give you after this one. >> >>> I could work up a patch for you if you want. >> >> I was thinking I might try to write one. > > Actually this might work for what I'm trying to do. > (untested). > > diff --git a/net/sctp/output.c b/net/sctp/output.c > index 0f4d15f..51030bc 100644 > --- a/net/sctp/output.c > +++ b/net/sctp/output.c > @@ -691,7 +691,7 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet, > * if any previously transmitted data on the connection remains > * unacknowledged. > */ > - if (!sctp_sk(asoc->base.sk)->nodelay && sctp_packet_empty(packet) && > + if (sctp_sk(asoc->base.sk)->nodelay != 1 && sctp_packet_empty(packet) && > inflight && sctp_state(asoc, ESTABLISHED)) { > unsigned int max = transport->pathmtu - packet->overhead; > unsigned int len = chunk->skb->len + q->out_qlen; > diff --git a/net/sctp/socket.c b/net/sctp/socket.c > index fee06b9..084b957 100644 > --- a/net/sctp/socket.c > +++ b/net/sctp/socket.c > @@ -1928,7 +1928,10 @@ static int sctp_sendmsg(struct kiocb *iocb, struct sock *sk, > } > > /* Break the message into multiple chunks of maximum size. */ > + if (msg->msg_flags & MSG_MORE) > + sp->nodelay |= 2; > datamsg = sctp_datamsg_from_user(asoc, sinfo, msg, msg_len); > + sp->nodelay &= 1; I think you reset it too early. You have to reset after the call to sctp_primitive_SEND(). This way, you queue up the data and go through the state machine with nodelay != 1, thus triggering the updated code on output. > if (IS_ERR(datamsg)) { > err = PTR_ERR(datamsg); > goto out_free; > > Ideally MSG_MORE should delay sending even if 'inflight' is false. > But that would require 'flush on timeout'. You can use a lack of MSG_MORE to be an indication of a flush. Thus MSG_MORE would always queue up data until MSG_MORE is 0, at which point flush should happen. > I'd prefer that, and with a configurable timeout. > But I can implement the timeout in the 'application'. > > Given the way Nagle is implemented in sctp, I could keep flipping > it on and off - but that probably has undocumented behaviour > (ie it might suddenly change). With the above MSG_MORE, I think you can just turn off nagle once and use MSG_MORE and when you drain your application queue, clear MSG_MORE on the last write. -vlad > > David > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: SCTP data chunk bundling when SCTP_NODELAY is set 2014-06-19 13:27 ` Vlad Yasevich @ 2014-06-19 13:45 ` David Laight 2014-06-19 14:05 ` Eric Dumazet 2014-06-19 15:36 ` Vlad Yasevich 0 siblings, 2 replies; 10+ messages in thread From: David Laight @ 2014-06-19 13:45 UTC (permalink / raw) To: 'Vlad Yasevich', netdev@vger.kernel.org From: Vlad Yasevich > On 06/18/2014 12:38 PM, David Laight wrote: > > From: David Laight > >> From: Vlad Yasevich > >> ... > >>>>> I suppose we could implement SCTP_CORK to do the right thing. > >>>>> > >>>>> I thought is possibly utilizing something like sendmmsg() and passing > >>>>> an extra flag to let it be know that this is a multi-message send > >>>>> that should be queued up by sctp.. > >>>> > >>>> It would be as easy to expose the extra flag to the 'application' > >>>> allowing it to use sendmsg() or sendmmsg(). > >>>> While sendmmsg() saves a system call, it is fairly horrid to use. > >>>> (and I'm sending from a kernel driver so don't care about the > >>>> system call cost!) > >>>> > >>>> Possibly MSG_MORE with Nagle disabled could invoke the Nagle send > >>>> delay - but you'd need to know whether any chunks in the queue > >>>> had MSG_MORE clear. > >>> > >>> That's why doing this with cork would be simpler. The ULP can just > >>> queue up a bunch of small data and if we pass nagle checks, it will be > >>> flushed. If not, uncork will flush it. > >> > >> I think you need only care about the 'MSG_MORE' flag of the last data chunk. > >> Any earlier data (with MSG_MORE clear) will usually have been sent (unless > >> prevented by Nagle or flow control), so you probably wouldn't be able to > >> send it regardless of the state of MSG_MORE on a chunk being queued. > >> There is also the expectation that another send without MSG_MORE will > >> happen almost immediately. > >> > >> So MSG_MORE could have the same effect as corking the socket. > >> Although you'd need separate bits - but uncork could clear both. > >> > >> What I would like to implement (from M3UA) is to hold data for a maximum > >> of (say) 5ms awaiting M3UA data chunks. To do this properly requires > >> knowledge of the actual ethernet packet boundaries. > >> > >> The problem is there are (at least) three cases: > >> 1) This data should be sent as soon as possible. > >> 2) Send this data some time soonish. > >> 3) I've got another data block I'm going to give you after this one. > >> > >>> I could work up a patch for you if you want. > >> > >> I was thinking I might try to write one. > > > > Actually this might work for what I'm trying to do. > > (untested). > > > > diff --git a/net/sctp/output.c b/net/sctp/output.c > > index 0f4d15f..51030bc 100644 > > --- a/net/sctp/output.c > > +++ b/net/sctp/output.c > > @@ -691,7 +691,7 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet, > > * if any previously transmitted data on the connection remains > > * unacknowledged. > > */ > > - if (!sctp_sk(asoc->base.sk)->nodelay && sctp_packet_empty(packet) && > > + if (sctp_sk(asoc->base.sk)->nodelay != 1 && sctp_packet_empty(packet) && > > inflight && sctp_state(asoc, ESTABLISHED)) { > > unsigned int max = transport->pathmtu - packet->overhead; > > unsigned int len = chunk->skb->len + q->out_qlen; > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c > > index fee06b9..084b957 100644 > > --- a/net/sctp/socket.c > > +++ b/net/sctp/socket.c > > @@ -1928,7 +1928,10 @@ static int sctp_sendmsg(struct kiocb *iocb, struct sock *sk, > > } > > > > /* Break the message into multiple chunks of maximum size. */ > > + if (msg->msg_flags & MSG_MORE) > > + sp->nodelay |= 2; > > datamsg = sctp_datamsg_from_user(asoc, sinfo, msg, msg_len); > > + sp->nodelay &= 1; > > I think you reset it too early. You have to reset after the call to > sctp_primitive_SEND(). This way, you queue up the data and go through > the state machine with nodelay != 1, thus triggering the updated code > on output. I changed it to clear the flag if MSG_MORE is clear before I tested it. I'll post a patch after net-next opens. > > Ideally MSG_MORE should delay sending even if 'inflight' is false. > > But that would require 'flush on timeout'. > > You can use a lack of MSG_MORE to be an indication of a flush. Thus > MSG_MORE would always queue up data until MSG_MORE is 0, at which point > flush should happen. With Nagle disabled. If MSG_MORE was clear on the previous send then there will normally be nothing queued (would have to be flow control limited). > > > I'd prefer that, and with a configurable timeout. > > But I can implement the timeout in the 'application'. > > > > Given the way Nagle is implemented in sctp, I could keep flipping > > it on and off - but that probably has undocumented behaviour > > (ie it might suddenly change). > > With the above MSG_MORE, I think you can just turn off nagle once and > use MSG_MORE and when you drain your application queue, clear MSG_MORE > on the last write. That is what I've done. When my test application sends 100 messages through M3UA I now see a small number of ethernet packets - rather than 100. The whole thing does rather rely on the lengths of various code paths in order to get multiple messages queued at the point that sendmsg() is finally called. I'm not worried about sending 2 packets at the start of a burst of data. It seems safer than not sending data because the application send a single block with MSG_MORE set. I do need to do some testing with simulated network delays. Someone posted how to set that up earlier today. I noticed that the TCP code is documented to eventually send data after 200ms. It would be better if that interval were settable per-socket. I'd set it to 1ms (or next tick). David ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: SCTP data chunk bundling when SCTP_NODELAY is set 2014-06-19 13:45 ` David Laight @ 2014-06-19 14:05 ` Eric Dumazet 2014-06-19 15:36 ` Vlad Yasevich 1 sibling, 0 replies; 10+ messages in thread From: Eric Dumazet @ 2014-06-19 14:05 UTC (permalink / raw) To: David Laight; +Cc: 'Vlad Yasevich', netdev@vger.kernel.org On Thu, 2014-06-19 at 13:45 +0000, David Laight wrote: > I noticed that the TCP code is documented to eventually send data after 200ms. > It would be better if that interval were settable per-socket. > I'd set it to 1ms (or next tick). This 200ms is really RTO. We do not use an extra timer for this yet. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: SCTP data chunk bundling when SCTP_NODELAY is set 2014-06-19 13:45 ` David Laight 2014-06-19 14:05 ` Eric Dumazet @ 2014-06-19 15:36 ` Vlad Yasevich 1 sibling, 0 replies; 10+ messages in thread From: Vlad Yasevich @ 2014-06-19 15:36 UTC (permalink / raw) To: David Laight, netdev@vger.kernel.org On 06/19/2014 09:45 AM, David Laight wrote: > From: Vlad Yasevich >> On 06/18/2014 12:38 PM, David Laight wrote: >>> From: David Laight >>>> From: Vlad Yasevich >>>> ... >>>>>>> I suppose we could implement SCTP_CORK to do the right thing. >>>>>>> >>>>>>> I thought is possibly utilizing something like sendmmsg() and passing >>>>>>> an extra flag to let it be know that this is a multi-message send >>>>>>> that should be queued up by sctp.. >>>>>> >>>>>> It would be as easy to expose the extra flag to the 'application' >>>>>> allowing it to use sendmsg() or sendmmsg(). >>>>>> While sendmmsg() saves a system call, it is fairly horrid to use. >>>>>> (and I'm sending from a kernel driver so don't care about the >>>>>> system call cost!) >>>>>> >>>>>> Possibly MSG_MORE with Nagle disabled could invoke the Nagle send >>>>>> delay - but you'd need to know whether any chunks in the queue >>>>>> had MSG_MORE clear. >>>>> >>>>> That's why doing this with cork would be simpler. The ULP can just >>>>> queue up a bunch of small data and if we pass nagle checks, it will be >>>>> flushed. If not, uncork will flush it. >>>> >>>> I think you need only care about the 'MSG_MORE' flag of the last data chunk. >>>> Any earlier data (with MSG_MORE clear) will usually have been sent (unless >>>> prevented by Nagle or flow control), so you probably wouldn't be able to >>>> send it regardless of the state of MSG_MORE on a chunk being queued. >>>> There is also the expectation that another send without MSG_MORE will >>>> happen almost immediately. >>>> >>>> So MSG_MORE could have the same effect as corking the socket. >>>> Although you'd need separate bits - but uncork could clear both. >>>> >>>> What I would like to implement (from M3UA) is to hold data for a maximum >>>> of (say) 5ms awaiting M3UA data chunks. To do this properly requires >>>> knowledge of the actual ethernet packet boundaries. >>>> >>>> The problem is there are (at least) three cases: >>>> 1) This data should be sent as soon as possible. >>>> 2) Send this data some time soonish. >>>> 3) I've got another data block I'm going to give you after this one. >>>> >>>>> I could work up a patch for you if you want. >>>> >>>> I was thinking I might try to write one. >>> >>> Actually this might work for what I'm trying to do. >>> (untested). >>> >>> diff --git a/net/sctp/output.c b/net/sctp/output.c >>> index 0f4d15f..51030bc 100644 >>> --- a/net/sctp/output.c >>> +++ b/net/sctp/output.c >>> @@ -691,7 +691,7 @@ static sctp_xmit_t sctp_packet_can_append_data(struct sctp_packet *packet, >>> * if any previously transmitted data on the connection remains >>> * unacknowledged. >>> */ >>> - if (!sctp_sk(asoc->base.sk)->nodelay && sctp_packet_empty(packet) && >>> + if (sctp_sk(asoc->base.sk)->nodelay != 1 && sctp_packet_empty(packet) && >>> inflight && sctp_state(asoc, ESTABLISHED)) { >>> unsigned int max = transport->pathmtu - packet->overhead; >>> unsigned int len = chunk->skb->len + q->out_qlen; >>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c >>> index fee06b9..084b957 100644 >>> --- a/net/sctp/socket.c >>> +++ b/net/sctp/socket.c >>> @@ -1928,7 +1928,10 @@ static int sctp_sendmsg(struct kiocb *iocb, struct sock *sk, >>> } >>> >>> /* Break the message into multiple chunks of maximum size. */ >>> + if (msg->msg_flags & MSG_MORE) >>> + sp->nodelay |= 2; >>> datamsg = sctp_datamsg_from_user(asoc, sinfo, msg, msg_len); >>> + sp->nodelay &= 1; >> >> I think you reset it too early. You have to reset after the call to >> sctp_primitive_SEND(). This way, you queue up the data and go through >> the state machine with nodelay != 1, thus triggering the updated code >> on output. > > I changed it to clear the flag if MSG_MORE is clear before I tested it. > > I'll post a patch after net-next opens. > >>> Ideally MSG_MORE should delay sending even if 'inflight' is false. >>> But that would require 'flush on timeout'. >> >> You can use a lack of MSG_MORE to be an indication of a flush. Thus >> MSG_MORE would always queue up data until MSG_MORE is 0, at which point >> flush should happen. > > With Nagle disabled. If MSG_MORE was clear on the previous send then there > will normally be nothing queued (would have to be flow control limited). > >> >>> I'd prefer that, and with a configurable timeout. >>> But I can implement the timeout in the 'application'. >>> >>> Given the way Nagle is implemented in sctp, I could keep flipping >>> it on and off - but that probably has undocumented behaviour >>> (ie it might suddenly change). >> >> With the above MSG_MORE, I think you can just turn off nagle once and >> use MSG_MORE and when you drain your application queue, clear MSG_MORE >> on the last write. > > That is what I've done. > When my test application sends 100 messages through M3UA I now see a > small number of ethernet packets - rather than 100. > The whole thing does rather rely on the lengths of various code > paths in order to get multiple messages queued at the point that > sendmsg() is finally called. > > I'm not worried about sending 2 packets at the start of a burst of data. > It seems safer than not sending data because the application send > a single block with MSG_MORE set. > > I do need to do some testing with simulated network delays. > Someone posted how to set that up earlier today. http://www.linuxfoundation.org/collaborate/workgroups/networking/netem -vlad > > I noticed that the TCP code is documented to eventually send data after 200ms. > It would be better if that interval were settable per-socket. > I'd set it to 1ms (or next tick). > > David > > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-06-19 15:36 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-06-17 14:17 SCTP data chunk bundling when SCTP_NODELAY is set David Laight 2014-06-17 15:45 ` Vlad Yasevich 2014-06-17 16:07 ` David Laight 2014-06-17 16:49 ` Vlad Yasevich 2014-06-18 13:47 ` David Laight 2014-06-18 16:38 ` David Laight 2014-06-19 13:27 ` Vlad Yasevich 2014-06-19 13:45 ` David Laight 2014-06-19 14:05 ` Eric Dumazet 2014-06-19 15:36 ` Vlad Yasevich
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).