* [PATCH net] sctp: fix a success return may hide an error
From: Xin Long @ 2016-08-11 12:52 UTC
To: network dev, linux-sctp
Cc: davem, Marcelo Ricardo Leitner, Vlad Yasevich, daniel

Now in the end of sctp_outq_flush, sctp calls sctp_packet_transmit
in a loop. The return of current sctp_packet_transmit always covers
the prior one's. If the last call of sctp_packet_transmit return a
success, it may hide the error that returns from the prior call.

This patch is to fix this by keeping the old error until the new
error returns from sctp_packet_transmit. Did TAHI test against this
fix, no regression is found.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/outqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 72e54a4..b97c8ad 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1193,7 +1193,7 @@ sctp_flush_out:
 						  send_ready);
 		packet = &t->packet;
 		if (!sctp_packet_empty(packet))
-			error = sctp_packet_transmit(packet, gfp);
+			error = sctp_packet_transmit(packet, gfp) ? : error;
 
 		/* Clear the burst limited state, if any */
 		sctp_transport_burst_reset(t);
-- 
2.1.0
* Re: [PATCH net] sctp: fix a success return may hide an error
From: Marcelo Ricardo Leitner @ 2016-08-11 13:11 UTC (in reply to Xin Long @ 2016-08-11 12:52)
To: Xin Long
Cc: network dev, linux-sctp, davem, Vlad Yasevich, daniel

On Thu, Aug 11, 2016 at 08:52:58PM +0800, Xin Long wrote:
> Now in the end of sctp_outq_flush, sctp calls sctp_packet_transmit
> in a loop. The return of current sctp_packet_transmit always covers
> the prior one's. If the last call of sctp_packet_transmit return a
> success, it may hide the error that returns from the prior call.
>
> This patch is to fix this by keeping the old error until the new
> error returns from sctp_packet_transmit. Did TAHI test against this
> fix, no regression is found.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

> ---
>  net/sctp/outqueue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 72e54a4..b97c8ad 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1193,7 +1193,7 @@ sctp_flush_out:
>  						  send_ready);
>  		packet = &t->packet;
>  		if (!sctp_packet_empty(packet))
> -			error = sctp_packet_transmit(packet, gfp);
> +			error = sctp_packet_transmit(packet, gfp) ? : error;
>  
>  		/* Clear the burst limited state, if any */
>  		sctp_transport_burst_reset(t);
> -- 
> 2.1.0
>
* Re: [PATCH net] sctp: fix a success return may hide an error
From: Neil Horman @ 2016-08-11 15:36 UTC (in reply to Xin Long @ 2016-08-11 12:52)
To: Xin Long
Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner, Vlad Yasevich, daniel

On Thu, Aug 11, 2016 at 08:52:58PM +0800, Xin Long wrote:
> Now in the end of sctp_outq_flush, sctp calls sctp_packet_transmit
> in a loop. The return of current sctp_packet_transmit always covers
> the prior one's. If the last call of sctp_packet_transmit return a
> success, it may hide the error that returns from the prior call.
>
> This patch is to fix this by keeping the old error until the new
> error returns from sctp_packet_transmit. Did TAHI test against this
> fix, no regression is found.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/sctp/outqueue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 72e54a4..b97c8ad 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1193,7 +1193,7 @@ sctp_flush_out:
>  						  send_ready);
>  		packet = &t->packet;
>  		if (!sctp_packet_empty(packet))
> -			error = sctp_packet_transmit(packet, gfp);
> +			error = sctp_packet_transmit(packet, gfp) ? : error;
>  
>  		/* Clear the burst limited state, if any */
>  		sctp_transport_burst_reset(t);
> -- 
> 2.1.0
>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
* Re: [PATCH net] sctp: fix a success return may hide an error
From: David Miller @ 2016-08-13 4:11 UTC (in reply to Xin Long @ 2016-08-11 12:52)
To: lucien.xin
Cc: netdev, linux-sctp, marcelo.leitner, vyasevich, daniel

From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 11 Aug 2016 20:52:58 +0800

> Now in the end of sctp_outq_flush, sctp calls sctp_packet_transmit
> in a loop. The return of current sctp_packet_transmit always covers
> the prior one's. If the last call of sctp_packet_transmit return a
> success, it may hide the error that returns from the prior call.
>
> This patch is to fix this by keeping the old error until the new
> error returns from sctp_packet_transmit. Did TAHI test against this
> fix, no regression is found.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

This style of error handling is dangerous.  The first error can be
lost.

For example, if sctp_outq_flush_rtx() earlier in this function returns
an error, it will be lost if any invocation of the function
sctp_packet_transmit() at the end of the function signals an error.

I think you should always preserve the first error that is recorded
into 'error'.

I also wonder about why sctp_outq_flush_rtx() errors are completely
ignored and don't influence the control flow here in any way.
* Re: [PATCH net] sctp: fix a success return may hide an error
From: Xin Long @ 2016-08-13 7:47 UTC (in reply to David Miller @ 2016-08-13 4:11)
To: David Miller
Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vladislav Yasevich, daniel

>
> This style of error handling is dangerous.  The first error can be
> lost.
>
> For example, if sctp_outq_flush_rtx() earlier in this function returns
> an error, it will be lost if any invocation of the function
> sctp_packet_transmit() at the end of the function signals an error.
>
> I think you should always preserve the first error that is recorded
> into 'error'.
>
> I also wonder about why sctp_outq_flush_rtx() errors are completely
> ignored and don't influence the control flow here in any way.

Yes, the first error can be lost.
Here we just keep the last error. We don't really have to return the
first error or return it on the first failure.

[1]
Both sctp_outq_flush_rtx and sctp_packet_transmit can ONLY
return one error (-ENOMEM), as sctp_outq_flush_rtx also calls
sctp_packet_transmit.

[2]
It's the original code that doesn't return immediately when
sctp_outq_flush_rtx returns an error. I guess it just doesn't want
to stop flushing out transport_list only because it fails to flush
rtx.
Even sctp_packet_transmit_chunk in sctp_outq_flush just
puts the error into sk->sk_err, instead of returning immediately.

So we cannot return the err at the first failure, as in [2], and the error
here is always -ENOMEM, as in [1].
I think returning the last error here is ok, at least not dangerous,
and it also fixes the issue "a success return may hide an error" with
clear code. :)
* RE: [PATCH net] sctp: fix a success return may hide an error
From: David Laight @ 2016-08-16 9:16 UTC (in reply to Xin Long @ 2016-08-13 7:47)
To: 'Xin Long', David Miller
Cc: network dev, linux-sctp@vger.kernel.org, Marcelo Ricardo Leitner, Vladislav Yasevich, daniel@iogearbox.net

From: Xin Long
> Sent: 13 August 2016 08:48
> >
> > This style of error handling is dangerous.  The first error can be
> > lost.
> >
> > For example, if sctp_outq_flush_rtx() earlier in this function returns
> > an error, it will be lost if any invocation of the function
> > sctp_packet_transmit() at the end of the function signals an error.
> >
> > I think you should always preserve the first error that is recorded
> > into 'error'.
> >
> > I also wonder about why sctp_outq_flush_rtx() errors are completely
> > ignored and don't influence the control flow here in any way.
>
> Yes, the first error can be lost.
> Here we just keep the last error. We don't really have to return the
> first error or return it on the first failure.
>
> [1]
> Both sctp_outq_flush_rtx and sctp_packet_transmit can ONLY
> return one error (-ENOMEM), as sctp_outq_flush_rtx also calls
> sctp_packet_transmit.

What is the effect of the error?
If it is 'just' equivalent to a lost ethernet packet (and the skb (etc)
is freed) then the protocol will recover.
If it is anything else then the error path is probably wrong.

Also after one error is it actually worth trying to send anything else
at all? ISTM that the code should either:
1) wait for resources and retry.
2) discard the entire queue (freeing resource) and hope the protocol
   timers will recover.

> [2]
> It's the original code that doesn't return immediately when
> sctp_outq_flush_rtx returns an error. I guess it just doesn't want
> to stop flushing out transport_list only because it fails to flush
> rtx.
> Even sctp_packet_transmit_chunk in sctp_outq_flush just
> puts the error into sk->sk_err, instead of returning immediately.
>
> So we cannot return the err at the first failure, as in [2], and the error
> here is always -ENOMEM, as in [1].
> I think returning the last error here is ok, at least not dangerous,
> and it also fixes the issue "a success return may hide an error" with
> clear code. :)

Which code looks at sk->sk_err?
It doesn't look right to be setting an error code on the socket due to
a transmit packet discard.

	David
* Re: [PATCH net] sctp: fix a success return may hide an error
From: Xin Long @ 2016-08-16 11:34 UTC (in reply to David Laight @ 2016-08-16 9:16)
To: David Laight
Cc: David Miller, network dev, linux-sctp@vger.kernel.org, Marcelo Ricardo Leitner, Vladislav Yasevich, daniel@iogearbox.net

>>
>> [1]
>> Both sctp_outq_flush_rtx and sctp_packet_transmit can ONLY
>> return one error (-ENOMEM), as sctp_outq_flush_rtx also calls
>> sctp_packet_transmit.
>
> What is the effect of the error?
> If it is 'just' equivalent to a lost ethernet packet (and the skb (etc)
> is freed) then the protocol will recover.
> If it is anything else then the error path is probably wrong.

This err returns back to sctp_sendmsg, there sctp will abort asoc.

in this function, sctp tries to do 3 things:
1. flush rtx queue
2. transmit the packet of current transport
3. flush all the transports.
Now sctp would do them one by one, even if one of them returns err.

>
> Also after one error is it actually worth trying to send anything else
> at all? ISTM that the code should either:

Yeah, that's the problem. The "sctp_flush_out:" code tries to force-clear
all the transports before returning, even if there are errors already.

> 1) wait for resources and retry.
> 2) discard the entire queue (freeing resource) and hope the protocol
>    timers will recover.

It's a different process; I will think about it.

>
>> [2]
>> It's the original code that doesn't return immediately when
>> sctp_outq_flush_rtx returns an error. I guess it just doesn't want
>> to stop flushing out transport_list only because it fails to flush
>> rtx.
>> Even sctp_packet_transmit_chunk in sctp_outq_flush just
>> puts the error into sk->sk_err, instead of returning immediately.
>>
>> So we cannot return the err at the first failure, as in [2], and the error
>> here is always -ENOMEM, as in [1].
>> I think returning the last error here is ok, at least not dangerous,
>> and it also fixes the issue "a success return may hide an error" with
>> clear code. :)
>
> Which code looks at sk->sk_err?
> It doesn't look right to be setting an error code on the socket due to
> a transmit packet discard.

I guess sctp_packet_transmit_chunk's return value is used for 'status'
(like PMTU_FULL, RWND_FULL...), and that's why the err was put into
sk->sk_err.

This err is supposed to be checked in sctp_sendmsg, but there sctp_error
checks sk->sk_err only when err == -EPIPE.

Yes, we need to fix this, thanks.
* RE: [PATCH net] sctp: fix a success return may hide an error
From: David Laight @ 2016-08-16 16:01 UTC (in reply to Xin Long @ 2016-08-16 11:34)
To: 'Xin Long'
Cc: David Miller, network dev, linux-sctp@vger.kernel.org, Marcelo Ricardo Leitner, Vladislav Yasevich, daniel@iogearbox.net

From: Xin Long
> Sent: 16 August 2016 12:34
>
> >> Both sctp_outq_flush_rtx and sctp_packet_transmit can ONLY
> >> return one error (-ENOMEM), as sctp_outq_flush_rtx also calls
> >> sctp_packet_transmit.
> >
> > What is the effect of the error?
> > If it is 'just' equivalent to a lost ethernet packet (and the skb (etc)
> > is freed) then the protocol will recover.
> > If it is anything else then the error path is probably wrong.
>
> This err returns back to sctp_sendmsg, there sctp will abort asoc.

That doesn't seem a good idea.
You don't want to abort the association if there is a transient
memory allocation failure.
You also can't drop data chunks.

> in this function, sctp tries to do 3 things:
> 1. flush rtx queue
> 2. transmit the packet of current transport
> 3. flush all the transports.
> Now sctp would do them one by one, even if one of them returns err.

You probably need to explain what 'flush' means here.
I think it means 'process and send', but it might mean 'discard the
contents of'.

Last time I looked at the sctp code my head exploded.
ISTR it is a mess of timing errors waiting to happen
(and I write comms protocol stack code for a living).

	David
* Re: [PATCH net] sctp: fix a success return may hide an error
From: Marcelo Ricardo Leitner @ 2016-08-16 17:24 UTC (in reply to David Laight @ 2016-08-16 16:01)
To: David Laight
Cc: 'Xin Long', David Miller, network dev, linux-sctp@vger.kernel.org, Vladislav Yasevich, daniel@iogearbox.net

On Tue, Aug 16, 2016 at 04:01:50PM +0000, David Laight wrote:
> From: Xin Long
> > Sent: 16 August 2016 12:34
> >
> > >> Both sctp_outq_flush_rtx and sctp_packet_transmit can ONLY
> > >> return one error (-ENOMEM), as sctp_outq_flush_rtx also calls
> > >> sctp_packet_transmit.
> > >
> > > What is the effect of the error?
> > > If it is 'just' equivalent to a lost ethernet packet (and the skb (etc)
> > > is freed) then the protocol will recover.
> > > If it is anything else then the error path is probably wrong.
> >
> > This err returns back to sctp_sendmsg, there sctp will abort asoc.

That's not right, I think. sctp_sendmsg will only free the asoc if it
was created to send that specific chunk. And in this case, this change
should have no effect, as it can't have sctp_outq_flush() touching
several transports in a row.

I'm basing this on:

out_free:
	if (new_asoc)
		sctp_association_free(asoc);

and sctp_recvmsg will just fetch, return and clear the error via
sctp_skb_recv_datagram, but not free it. Do you see any other place
freeing it?

>
> That doesn't seem a good idea.
> You don't want to abort the association if there is a transient
> memory allocation failure.
> You also can't drop data chunks.

From a system-wise POV, this behavior - to free the new asoc in case of
transient memory allocation failure - doesn't seem bad to me. That's
what will have to happen if any allocation before it failed, and it
also helps the system to reduce the stress a little bit.

I don't see any inconsistency/problems here because we are not
dropping a single random chunk; instead, we are actually refusing to
initialize a new asoc in such conditions.

Nevertheless, I agree that letting the application see ENOMEM errors
when the data actually got queued and is being fully handled (as in,
it will be retransmitted later) is not wise, as the application
probably won't be able to distinguish the ENOMEMs it should retry from
those it should not.

Here I see a problem, yet it's not due to this specific change; perhaps
it just got attention because of it. In this situation, we should
handle ENOMEMs internally if possible, so the application can know that
if it hits an ENOMEM, it's real and it has to retry. Fixing this
inconsistency may very well cause us to let that new asoc live longer,
which works for me too.

>
> > in this function, sctp tries to do 3 things:
> > 1. flush rtx queue
> > 2. transmit the packet of current transport
> > 3. flush all the transports.
> > Now sctp would do them one by one, even if one of them returns err.
>
> You probably need to explain what 'flush' means here.
> I think it means 'process and send', but it might mean 'discard the
> contents of'.

Yes, the first. He probably used the word 'flush' because the function
is called .._flush_..

> Last time I looked at the sctp code my head exploded.
> ISTR it is a mess of timing errors waiting to happen
> (and I write comms protocol stack code for a living).

Well, it may be, but we are trying to improve it. Please continue
discussing the fixes so we can keep improving it. :)

Marcelo
* Re: [PATCH net] sctp: fix a success return may hide an error
From: Xin Long @ 2016-08-16 18:24 UTC
To: Marcelo Ricardo Leitner
Cc: David Laight, David Miller, network dev, linux-sctp@vger.kernel.org, Vladislav Yasevich, daniel@iogearbox.net

>> > This err returns back to sctp_sendmsg, there sctp will abort asoc.
>
> That's not right I think. sctp_sendmsg will only free the asoc if it was
> created to send that specific chunk. And in this case, this change
> should have no effect as it can't have sctp_outq_flush() touching
> several transports in a row.
>
> I'm basing on:
> out_free:
> 	if (new_asoc)
> 		sctp_association_free(asoc);
>
> and sctp_recvmsg will just fetch, return and clear the error via
> sctp_skb_recv_datagram, but not free it.
>
> Do you see any other place freeing it?
Sorry, you are right: it frees the asoc only for new_asoc.
>
>>
>> That doesn't seem a good idea.
>> You don't want to abort the association if there is a transient
>> memory allocation failure.
>> You also can't drop data chunks.
>
> From a system-wise POV, this behavior - to free the new asoc in case of
> transient memory allocation failure - doesn't seem bad to me.
> That's what will have to happen if any allocation before it failed and
> also it helps the system to reduce the stress a little bit. I don't see
> any inconsistency/problems here because we are not dropping a single
> random chunk but instead we are actually refusing to initialize a new
> asoc in such conditions.
>
> Nevertheless, I agree that letting the application see ENOMEM errors when
> the data actually got queued and is being fully handled, as in, it will
> be retransmitted later, is not be wise, as the application probably
> won't be able to distinguish from ENOMEMs that it should retry or not.
> Here I see a problem, yet it's not due to this specific change, perhaps
> it just got attention because of it. In this situation, we should handle
> ENOMEMs internally if possible so the application can know that if it
> hits an ENOMEM, it's real and it has to retry.
If we let the application see ENOMEM errors, sctp has to drop the chunk
instead of retransmitting it. But the chunk that hit ENOMEM may not belong
to the current msg, as sctp flushes the whole queue. And even if users get
an ENOMEM error, they may re-send a chunk that is identical to one that is
still in the retransmit queue.
>
> Fixing this inconsistency may very well cause us to let that new asoc to
> live longer, works for me too.
>
>>
>> > in this function, sctp tries to do 3 things:
>> > 1. flush rtx queue
>> > 2. transmit the packet of current transport
>> > 3. flush all the transports.
>> > Now sctp would do them one by one, even if one of them returns err.
>>
>> You probably need to explain what 'flush' means here.
>> I think it means 'process and send', but it might mean 'discard the
>> contents of'.
>
> Yes, the first. He probably use the work 'flush' because the function is
> called .._flush_..
Yes. :D
>
>> Last time I looked at the sctp code my head exploded.
>> ISTR it is a mess of timing errors waiting to happen
>> (and I write comms protocol stack code for a living).
>
> Well, it may be, but we are trying to improve it. Please continue
> discussing the fixes so we can keep improving it. :)
>
> Marcelo
>
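The point Marcelo makes and Xin confirms here is an ownership rule: on error, sctp_sendmsg() frees the association only when that same call created it (the `out_free: if (new_asoc) sctp_association_free(asoc);` path). A minimal user-space sketch of that rule follows; `struct assoc`, `assoc_xmit_fn` and `send_msg()` are hypothetical and leave out everything else sctp_sendmsg() does (locking, chunk creation, waiting for send space).

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified association object. */
struct assoc {
	int id;
};

/* Hypothetical transmit hook: 0 on success, negative errno on failure. */
typedef int (*assoc_xmit_fn)(struct assoc *asoc);

/*
 * Use the caller's existing association, or create one for this send.
 * On failure, free the association only if this call created it, so an
 * association that already existed survives a failed send.
 */
static int send_msg(struct assoc **slot, assoc_xmit_fn xmit)
{
	struct assoc *asoc = *slot;
	struct assoc *new_asoc = NULL;
	int err;

	if (!asoc) {
		new_asoc = calloc(1, sizeof(*new_asoc));
		if (!new_asoc)
			return -ENOMEM;
		asoc = new_asoc;
		*slot = asoc;
	}

	err = xmit(asoc);
	if (err)
		goto out_free;
	return 0;

out_free:
	if (new_asoc) {
		free(asoc);
		*slot = NULL;	/* do not leave the caller a dangling pointer */
	}
	return err;
}
```

This is why the patch cannot tear down an established association: for an existing asoc, `new_asoc` stays NULL and the error is only returned.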
* Re: [PATCH net] sctp: fix a success return may hide an error
From: Marcelo Ricardo Leitner @ 2016-08-16 18:33 UTC
To: Xin Long
Cc: David Laight, David Miller, network dev, linux-sctp@vger.kernel.org, Vladislav Yasevich, daniel@iogearbox.net

On Wed, Aug 17, 2016 at 02:24:19AM +0800, Xin Long wrote:
> >> > This err returns back to sctp_sendmsg, there sctp will abort asoc.
> >
> > That's not right I think. sctp_sendmsg will only free the asoc if it was
> > created to send that specific chunk. And in this case, this change
> > should have no effect as it can't have sctp_outq_flush() touching
> > several transports in a row.
> >
> > I'm basing on:
> > out_free:
> > 	if (new_asoc)
> > 		sctp_association_free(asoc);
> >
> > and sctp_recvmsg will just fetch, return and clear the error via
> > sctp_skb_recv_datagram, but not free it.
> >
> > Do you see any other place freeing it?
> Sorry, you are right, it free assoc just for new_asoc.
> >
> >>
> >> That doesn't seem a good idea.
> >> You don't want to abort the association if there is a transient
> >> memory allocation failure.
> >> You also can't drop data chunks.
> >
> > From a system-wise POV, this behavior - to free the new asoc in case of
> > transient memory allocation failure - doesn't seem bad to me.
> > That's what will have to happen if any allocation before it failed and
> > also it helps the system to reduce the stress a little bit. I don't see
> > any inconsistency/problems here because we are not dropping a single
> > random chunk but instead we are actually refusing to initialize a new
> > asoc in such conditions.
> >
> > Nevertheless, I agree that letting the application see ENOMEM errors when
> > the data actually got queued and is being fully handled, as in, it will
> > be retransmitted later, is not be wise, as the application probably
> > won't be able to distinguish from ENOMEMs that it should retry or not.
> > Here I see a problem, yet it's not due to this specific change, perhaps
> > it just got attention because of it. In this situation, we should handle
> > ENOMEMs internally if possible so the application can know that if it
> > hits an ENOMEM, it's real and it has to retry.
> If letting the application see ENOMEM errors, and sctp has to drop this
> chunk, instead of retransmiting the ENOMEM chunk, but the ENOMEM
> chunk may not be the chunk from current msg, as it flush all the queue.
> even if users get an ENOMEM error, they may re-send a chunk that is same
> with the one that is still in retransmit queue.

Yep, one more reason to handle those internally when safe.
* Re: [PATCH net] sctp: fix a success return may hide an error
From: Marcelo Ricardo Leitner @ 2016-08-16 18:45 UTC
To: Xin Long
Cc: David Laight, David Miller, network dev, linux-sctp@vger.kernel.org, Vladislav Yasevich, daniel@iogearbox.net

On Tue, Aug 16, 2016 at 03:33:30PM -0300, Marcelo Ricardo Leitner wrote:
> On Wed, Aug 17, 2016 at 02:24:19AM +0800, Xin Long wrote:
> > >> > This err returns back to sctp_sendmsg, there sctp will abort asoc.
> > >
> > > That's not right I think. sctp_sendmsg will only free the asoc if it was
> > > created to send that specific chunk. And in this case, this change
> > > should have no effect as it can't have sctp_outq_flush() touching
> > > several transports in a row.
> > >
> > > I'm basing on:
> > > out_free:
> > > 	if (new_asoc)
> > > 		sctp_association_free(asoc);
> > >
> > > and sctp_recvmsg will just fetch, return and clear the error via
> > > sctp_skb_recv_datagram, but not free it.
> > >
> > > Do you see any other place freeing it?
> > Sorry, you are right, it free assoc just for new_asoc.
> > >
> > >>
> > >> That doesn't seem a good idea.
> > >> You don't want to abort the association if there is a transient
> > >> memory allocation failure.
> > >> You also can't drop data chunks.
> > >
> > > From a system-wise POV, this behavior - to free the new asoc in case of
> > > transient memory allocation failure - doesn't seem bad to me.
> > > That's what will have to happen if any allocation before it failed and
> > > also it helps the system to reduce the stress a little bit. I don't see
> > > any inconsistency/problems here because we are not dropping a single
> > > random chunk but instead we are actually refusing to initialize a new
> > > asoc in such conditions.
> > >
> > > Nevertheless, I agree that letting the application see ENOMEM errors when
> > > the data actually got queued and is being fully handled, as in, it will
> > > be retransmitted later, is not be wise, as the application probably
> > > won't be able to distinguish from ENOMEMs that it should retry or not.
> > > Here I see a problem, yet it's not due to this specific change, perhaps
> > > it just got attention because of it. In this situation, we should handle
> > > ENOMEMs internally if possible so the application can know that if it
> > > hits an ENOMEM, it's real and it has to retry.
> > If letting the application see ENOMEM errors, and sctp has to drop this
> > chunk, instead of retransmiting the ENOMEM chunk, but the ENOMEM
> > chunk may not be the chunk from current msg, as it flush all the queue.
> > even if users get an ENOMEM error, they may re-send a chunk that is same
> > with the one that is still in retransmit queue.
>
> Yep, one more reason to handle those internally when safe.

Xin, maybe you can squash this patch together with the ENOMEM handling?
I'm thinking that handling ENOMEM may lead to similar situations in
other places, so we'd have a common reasoning for all of them.
* Re: [PATCH net] sctp: fix a success return may hide an error
From: Xin Long @ 2016-08-17 11:42 UTC
To: Marcelo Ricardo Leitner
Cc: David Laight, David Miller, network dev, linux-sctp@vger.kernel.org, Vladislav Yasevich, daniel@iogearbox.net

>> > If letting the application see ENOMEM errors, and sctp has to drop this
>> > chunk, instead of retransmiting the ENOMEM chunk, but the ENOMEM
>> > chunk may not be the chunk from current msg, as it flush all the queue.
>> > even if users get an ENOMEM error, they may re-send a chunk that is same
>> > with the one that is still in retransmit queue.
>>
>> Yep, one more reason to handle those internally when safe.
I just checked tcp_sendmsg: it doesn't return any transmit error to the
user, *NOT ONLY* ENOMEM.

You can check __tcp_push_pending_frames and tcp_push; their return type
is even void. It may still get an err from sk->sk_err:
	err = sk_stream_error(sk, flags, err);
but I didn't see anything put an err into sk->sk_err in the main
transmit path.

Yes, tcp_write_xmit has a return value, as do tcp_transmit_skb and
	err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
but all of them are only used internally and are never returned to
userspace. In tcp_write_xmit, it even uses "unlikely":
	if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
		break;

>
> Xin, maybe you can squash this patch and this ENOMEM handling? I'm
> thinking that handling ENOMEM may result in similar situations in other
> places, so we have a common reasoning on them.
>
So this reasoning really does matter, and not only for ENOMEM in the
transmit path.
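The TCP behaviour Xin describes (transmit errors stay inside the stack: tcp_write_xmit() simply stops at a failed tcp_transmit_skb() and the data stays queued for a later attempt) can be sketched as follows. This models only the control flow; `struct txq`, `pkt_xmit_fn` and `push_pending()` are hypothetical, and real TCP keeps sent segments on its write queue until they are acknowledged.

```c
#include <stddef.h>

/* Hypothetical transmit queue: pkts[next..tail) have not been sent yet. */
struct txq {
	int pkts[16];
	size_t next;
	size_t tail;
};

/* Hypothetical per-packet transmit hook: 0 on success, nonzero on failure. */
typedef int (*pkt_xmit_fn)(int pkt);

/*
 * Send pending packets in order. On the first failure, stop and leave that
 * packet and everything after it pending, to be retried later (for example
 * from a retransmission timer). The function returns void, so a transient
 * failure such as -ENOMEM is never reported to the sendmsg() caller.
 */
static void push_pending(struct txq *q, pkt_xmit_fn xmit)
{
	while (q->next < q->tail) {
		if (xmit(q->pkts[q->next]))
			break;		/* still pending; retry on a later push */
		q->next++;		/* sent; a reliable protocol keeps the data until ACKed */
	}
}
```

Because nothing is dropped on failure, the application never needs to resend, which avoids the duplicate-chunk problem Xin raised for sctp.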
* RE: [PATCH net] sctp: fix a success return may hide an error
From: David Laight @ 2016-08-17 9:01 UTC
To: 'Marcelo Ricardo Leitner'
Cc: 'Xin Long', David Miller, network dev, linux-sctp@vger.kernel.org, Vladislav Yasevich, daniel@iogearbox.net

From: Marcelo Ricardo Leitner
> Sent: 16 August 2016 18:25
...
> > That doesn't seem a good idea.
> > You don't want to abort the association if there is a transient
> > memory allocation failure.
> > You also can't drop data chunks.
>
> From a system-wise POV, this behavior - to free the new asoc in case of
> transient memory allocation failure - doesn't seem bad to me.
> That's what will have to happen if any allocation before it failed and
> also it helps the system to reduce the stress a little bit. I don't see
> any inconsistency/problems here because we are not dropping a single
> random chunk but instead we are actually refusing to initialize a new
> asoc in such conditions.

Failing a new association should be ok; whether purists will like
connect() failing with ENOMEM is another matter.

> Nevertheless, I agree that letting the application see ENOMEM errors when
> the data actually got queued and is being fully handled, as in, it will
> be retransmitted later, is not be wise, as the application probably
> won't be able to distinguish from ENOMEMs that it should retry or not.

I think an application would be justified in thinking that an ENOMEM return
meant that the system call had no effect.

For send(), even ENOMEM is really wrong; it should be treated as 'flow control'
and either block or return EAGAIN/EWOULDBLOCK.
Getting POLLOUT set is left as an exercise to the reader :-)

...
> Well, it may be, but we are trying to improve it. Please continue
> discussing the fixes so we can keep improving it. :)

Indeed, we have customers who use sctp (for M3UA).
We don't do anything 'complicated', but we do end up sending a lot of short
data chunks.

	David
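From the application side, David's suggestion means a transient shortage should look like flow control: send() on a non-blocking socket fails with EAGAIN/EWOULDBLOCK and the caller waits for POLLOUT before retrying. A minimal POSIX sketch of such a caller, assuming the kernel reports backpressure that way rather than with ENOMEM; `send_all()` is a hypothetical helper and nothing in it is SCTP-specific.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Write all of buf to a connected socket. EAGAIN/EWOULDBLOCK (reported on
 * a socket opened with O_NONBLOCK) is treated as flow control: wait for
 * POLLOUT and retry. Returns 0 on success, or -1 with errno set on any
 * other error.
 */
static int send_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = send(fd, p, len, 0);

		if (n >= 0) {
			p += n;
			len -= (size_t)n;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno == EAGAIN || errno == EWOULDBLOCK) {
			struct pollfd pfd = { .fd = fd, .events = POLLOUT };

			/* Sleep until the socket has send space again. */
			if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
				return -1;
			continue;
		}
		return -1;	/* genuine error, e.g. EBADF or ECONNRESET */
	}
	return 0;
}
```

With a blocking socket the kernel does this waiting itself, so the EAGAIN branch only matters for O_NONBLOCK sockets. An ENOMEM from send() would fall through to the final `return -1`, which is exactly the ambiguity the thread wants to avoid.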
* Re: [PATCH net] sctp: fix a success return may hide an error
From: 'Marcelo Ricardo Leitner' @ 2016-08-18 17:44 UTC
To: David Laight
Cc: 'Xin Long', David Miller, network dev, linux-sctp@vger.kernel.org, Vladislav Yasevich, daniel@iogearbox.net

On Wed, Aug 17, 2016 at 09:01:38AM +0000, David Laight wrote:
> From: Marcelo Ricardo Leitner
> > Sent: 16 August 2016 18:25
> ...
> > > That doesn't seem a good idea.
> > > You don't want to abort the association if there is a transient
> > > memory allocation failure.
> > > You also can't drop data chunks.
> >
> > From a system-wise POV, this behavior - to free the new asoc in case of
> > transient memory allocation failure - doesn't seem bad to me.
> > That's what will have to happen if any allocation before it failed and
> > also it helps the system to reduce the stress a little bit. I don't see
> > any inconsistency/problems here because we are not dropping a single
> > random chunk but instead we are actually refusing to initialize a new
> > asoc in such conditions.
>
> Failing a new association should be ok, whether purists will like
> connect() failing ENOMEM is another matter.
>

Good point.

> > Nevertheless, I agree that letting the application see ENOMEM errors when
> > the data actually got queued and is being fully handled, as in, it will
> > be retransmitted later, is not be wise, as the application probably
> > won't be able to distinguish from ENOMEMs that it should retry or not.
>
> I think an application would be justified in thinking that an ENOMEM return
> meant that the system call had no effect.
>

Yep

> For send() even ENOMEM is really wrong, it should be treated as 'flow control'
> and either block or return EAGAIN/EWOULDBLOCK.

Agreed.

> Getting POLLOUT set is left as an exercise to the reader :-)
>

:-)

> ...
> > Well, it may be, but we are trying to improve it. Please continue
> > discussing the fixes so we can keep improving it. :)
>
> Indeed, we have customers who use sctp (for M3UA).
> We don't do anything 'complicated', but do end up sending a lot of short
> data chunks.
>
> David
Thread overview: 29+ messages
2016-08-11 12:52 [PATCH net] sctp: fix a success return may hide an error Xin Long
2016-08-11 13:11 ` Marcelo Ricardo Leitner
2016-08-11 15:36 ` Neil Horman
2016-08-13  4:11 ` David Miller
2016-08-13  7:47 ` Xin Long
2016-08-16  9:16 ` David Laight
2016-08-16 11:34 ` Xin Long
2016-08-16 16:01 ` David Laight
2016-08-16 17:24 ` Marcelo Ricardo Leitner
2016-08-16 18:24 ` Xin Long
2016-08-16 18:33 ` Marcelo Ricardo Leitner
2016-08-16 18:45 ` Marcelo Ricardo Leitner
2016-08-17 11:42 ` Xin Long
2016-08-17  9:01 ` David Laight
2016-08-18 17:44 ` 'Marcelo Ricardo Leitner'