* FAILED: patch "[PATCH] tcp: be more careful in tcp_fragment()" failed to apply to 4.14-stable tree
@ 2019-07-26 12:38 gregkh
2019-07-26 12:45 ` Greg KH
0 siblings, 1 reply; 7+ messages in thread
From: gregkh @ 2019-07-26 12:38 UTC (permalink / raw)
To: edumazet, aprout, cpaasch, davem, jonathan.lemon, jtl, mkubecek,
ncardwell, ycheng
Cc: stable
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b617158dc096709d8600c53b6052144d12b89fab Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 19 Jul 2019 11:52:33 -0700
Subject: [PATCH] tcp: be more careful in tcp_fragment()
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.
We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.
It also adds some room, because tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15.
Note for < 4.15 backports :
tcp_rtx_queue_tail() will probably look like :
static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
	struct sk_buff *skb = tcp_send_head(sk);

	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}
Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jonathan Looney <jtl@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f42d300f0cfa..e5cf514ba118 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1709,6 +1709,11 @@ static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk)
 	return skb_rb_first(&sk->tcp_rtx_queue);
 }
 
+static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
+{
+	return skb_rb_last(&sk->tcp_rtx_queue);
+}
+
 static inline struct sk_buff *tcp_write_queue_head(const struct sock *sk)
 {
 	return skb_peek(&sk->sk_write_queue);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4af1f5dae9d3..6e4afc48d7bb 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1288,6 +1288,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *buff;
 	int nsize, old_factor;
+	long limit;
 	int nlen;
 	u8 flags;
 
@@ -1298,8 +1299,16 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
 	if (nsize < 0)
 		nsize = 0;
 
-	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf &&
-		     tcp_queue != TCP_FRAG_IN_WRITE_QUEUE)) {
+	/* tcp_sendmsg() can overshoot sk_wmem_queued by one full size skb.
+	 * We need some allowance to not penalize applications setting small
+	 * SO_SNDBUF values.
+	 * Also allow first and last skb in retransmit queue to be split.
+	 */
+	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
+	if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
+		     tcp_queue != TCP_FRAG_IN_WRITE_QUEUE &&
+		     skb != tcp_rtx_queue_head(sk) &&
+		     skb != tcp_rtx_queue_tail(sk))) {
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
 		return -ENOMEM;
 	}
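
The allowance above is easy to size: GSO_MAX_SIZE is 64KB and
SKB_TRUESIZE() adds the aligned per-skb struct overhead on top of the
payload, so the guard gains a little over 128KB of headroom, the same
order as the literal "+ 0x20000" the older stable backports used. A
minimal userspace sketch of the arithmetic (SKB_OVERHEAD here is a
hypothetical stand-in for the kernel's real truesize overhead):

#include <stdio.h>

#define GSO_MAX_SIZE	65536L
#define SKB_OVERHEAD	768L	/* hypothetical stand-in */
#define SKB_TRUESIZE(X)	((X) + SKB_OVERHEAD)

int main(void)
{
	long sndbuf = 4608;	/* a tiny SO_SNDBUF an application might set */
	long limit  = sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);

	/* headroom stays ~128KB no matter how small sk_sndbuf is */
	printf("limit = %ld bytes (headroom = %ld)\n",
	       limit, limit - sndbuf);
	return 0;
}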
* Re: FAILED: patch "[PATCH] tcp: be more careful in tcp_fragment()" failed to apply to 4.14-stable tree
2019-07-26 12:38 FAILED: patch "[PATCH] tcp: be more careful in tcp_fragment()" failed to apply to 4.14-stable tree gregkh
@ 2019-07-26 12:45 ` Greg KH
2019-07-26 14:39 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2019-07-26 12:45 UTC (permalink / raw)
To: edumazet, aprout, cpaasch, davem, jonathan.lemon, jtl, mkubecek,
ncardwell, ycheng
Cc: stable
On Fri, Jul 26, 2019 at 02:38:14PM +0200, gregkh@linuxfoundation.org wrote:
>
> [...]
>
> Note for < 4.15 backports :
> tcp_rtx_queue_tail() will probably look like :
>
> static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
> {
> 	struct sk_buff *skb = tcp_send_head(sk);
>
> 	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
> }
Note, I tried the above, but still ran into problems, as 4.14 does not
have tcp_rtx_queue_head(), and while I could guess what it would be
(tcp_send_head()?), I figured it would be safer to ask for a backport :)
thanks,
greg k-h
* Re: FAILED: patch "[PATCH] tcp: be more careful in tcp_fragment()" failed to apply to 4.14-stable tree
2019-07-26 12:45 ` Greg KH
@ 2019-07-26 14:39 ` Eric Dumazet
2019-07-26 14:54 ` Greg KH
2019-08-06 15:09 ` [PATCH] tcp: be more careful in tcp_fragment() Matthieu Baerts
0 siblings, 2 replies; 7+ messages in thread
From: Eric Dumazet @ 2019-07-26 14:39 UTC (permalink / raw)
To: Greg KH
Cc: Andrew Prout, Christoph Paasch, David Miller, Jonathan Lemon,
Jonathan Looney, Michal Kubecek, Neal Cardwell, Yuchung Cheng,
stable
On Fri, Jul 26, 2019 at 2:45 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Fri, Jul 26, 2019 at 02:38:14PM +0200, gregkh@linuxfoundation.org wrote:
> >
> > [...]
> >
> > Note for < 4.15 backports :
> > tcp_rtx_queue_tail() will probably look like :
> >
> > static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
> > {
> > 	struct sk_buff *skb = tcp_send_head(sk);
> >
> > 	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
> > }
>
>
> Note, I tried the above, but still ran into problems, as 4.14 does not
> have tcp_rtx_queue_head(), and while I could guess what it would be
> (tcp_send_head()?), I figured it would be safer to ask for a backport :)
>
tcp_rtx_queue_head(sk) would be implemented by:

static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk)
{
	struct sk_buff *skb = tcp_write_queue_head(sk);

	if (skb == tcp_send_head(sk))
		skb = NULL;

	return skb;
}
I can provide the backport, of course.
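
Taken together with the tcp_rtx_queue_tail() sketch quoted above, here
is a picture of why both helpers work (a sketch of the usual pre-4.15
model, where a single sk_write_queue holds sent and unsent skbs and
tcp_send_head() marks the first not-yet-sent one):

/*
 *   sk_write_queue: [rtx head] ... [rtx tail] [send_head] ... [tail]
 *                   \__ sent, may need rtx __/ \___ not yet sent ___/
 *
 * - tcp_rtx_queue_head(): the list head, unless nothing was sent yet
 *   (head == send_head), in which case the rtx queue is empty -> NULL.
 * - tcp_rtx_queue_tail(): the skb just before send_head; if everything
 *   was sent (send_head == NULL), it is simply the list tail.
 */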
* Re: FAILED: patch "[PATCH] tcp: be more careful in tcp_fragment()" failed to apply to 4.14-stable tree
2019-07-26 14:39 ` Eric Dumazet
@ 2019-07-26 14:54 ` Greg KH
2019-08-06 15:09 ` [PATCH] tcp: be more careful in tcp_fragment() Matthieu Baerts
1 sibling, 0 replies; 7+ messages in thread
From: Greg KH @ 2019-07-26 14:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: Andrew Prout, Christoph Paasch, David Miller, Jonathan Lemon,
Jonathan Looney, Michal Kubecek, Neal Cardwell, Yuchung Cheng,
stable
On Fri, Jul 26, 2019 at 04:39:48PM +0200, Eric Dumazet wrote:
> On Fri, Jul 26, 2019 at 2:45 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > [...]
>
> tcp_rtx_queue_head(sk) would be implemented by:
>
> static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk)
> {
> 	struct sk_buff *skb = tcp_write_queue_head(sk);
>
> 	if (skb == tcp_send_head(sk))
> 		skb = NULL;
>
> 	return skb;
> }
>
> I can provide the backport, of course.
A backport would be great to ensure I didn't mess it up :)
thanks,
greg k-h
* [PATCH] tcp: be more careful in tcp_fragment()
2019-07-26 14:39 ` Eric Dumazet
2019-07-26 14:54 ` Greg KH
@ 2019-08-06 15:09 ` Matthieu Baerts
2019-08-06 15:25 ` Eric Dumazet
1 sibling, 1 reply; 7+ messages in thread
From: Matthieu Baerts @ 2019-08-06 15:09 UTC (permalink / raw)
To: stable
Cc: gregkh, Eric Dumazet, Andrew Prout, Jonathan Lemon,
Michal Kubecek, Neal Cardwell, Yuchung Cheng, Christoph Paasch,
Jonathan Looney, David S . Miller, Matthieu Baerts
From: Eric Dumazet <edumazet@google.com>
commit b617158dc096709d8600c53b6052144d12b89fab upstream.
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.
We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.
It also adds some room, because tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15.
Note for < 4.15 backports :
tcp_rtx_queue_tail() will probably look like :
static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
	struct sk_buff *skb = tcp_send_head(sk);

	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}
Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jonathan Looney <jtl@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
---
Notes:
Hello,

Here is the backport for the linux-4.14.y branch, done simply by
implementing the functions Eric wrote in the commit message and in this
email thread. It might be valid for older versions too; I did not check.

In my setup with MPTCP, I hit the same bug others reported, where TCP
flows were stalled. The initial fix b6653b3629e5 ("tcp: refine memory
limit test in tcp_fragment()") from Eric helped, but its backport to
< 4.15 kernels, 1bc13903773b ("tcp: refine memory limit test in
tcp_fragment()"), did not look safe.

I then tested this new fix, and it works fine in my test environment:
no stalled TCP flows after a few hours.

I see in this email thread that Eric will push a patch for v4.14. I
absolutely do not want to add pressure or steal his work, but since I
have the patch here and it is tested, I thought it could be a good idea
to share it with others. Feel free to ignore this patch if needed, but
if it can reduce Eric's workload a tiny bit, I would be very glad it
helped :)

Because in the end it is Eric's work, feel free to change my
"Signed-off-by" to a "Tested-by", or to drop it if you prefer to wait
for Eric's own patch.

Cheers,
Matt
include/net/tcp.h | 17 +++++++++++++++++
net/ipv4/tcp_output.c | 11 ++++++++++-
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0b477a1e1177..7994e569644e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1688,6 +1688,23 @@ static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unli
 		tcp_sk(sk)->highest_sack = NULL;
 }
 
+static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk)
+{
+	struct sk_buff *skb = tcp_write_queue_head(sk);
+
+	if (skb == tcp_send_head(sk))
+		skb = NULL;
+
+	return skb;
+}
+
+static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
+{
+	struct sk_buff *skb = tcp_send_head(sk);
+
+	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
+}
+
 static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
 {
 	__skb_queue_tail(&sk->sk_write_queue, skb);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a5960b9b6741..a99086bf26ea 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1264,6 +1264,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *buff;
 	int nsize, old_factor;
+	long limit;
 	int nlen;
 	u8 flags;
 
@@ -1274,7 +1275,15 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 	if (nsize < 0)
 		nsize = 0;
 
-	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + 0x20000)) {
+	/* tcp_sendmsg() can overshoot sk_wmem_queued by one full size skb.
+	 * We need some allowance to not penalize applications setting small
+	 * SO_SNDBUF values.
+	 * Also allow first and last skb in retransmit queue to be split.
+	 */
+	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
+	if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
+		     skb != tcp_rtx_queue_head(sk) &&
+		     skb != tcp_rtx_queue_tail(sk))) {
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
 		return -ENOMEM;
 	}
--
2.20.1
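
To make the new guard concrete, here is a small userspace model of the
check (a sketch only; the types and values are stand-ins, not the
kernel's): once sk_wmem_queued is far past the limit, only the first
and last skb of the retransmit queue may still be split, which is what
lets a flow with a tiny SO_SNDBUF keep retransmitting.

#include <stdbool.h>
#include <stdio.h>

struct skb { int id; };			/* stand-in for sk_buff */

static struct skb rtx[4] = { {0}, {1}, {2}, {3} };
static struct skb *rtx_head = &rtx[0];	/* tcp_rtx_queue_head() */
static struct skb *rtx_tail = &rtx[3];	/* tcp_rtx_queue_tail() */

/* Mirrors: (sk_wmem_queued >> 1) > limit && skb != head && skb != tail */
static bool fragment_refused(long wmem_queued, long limit,
			     const struct skb *skb)
{
	return (wmem_queued >> 1) > limit &&
	       skb != rtx_head && skb != rtx_tail;
}

int main(void)
{
	long limit = 4608 + 2 * (65536L + 768);	/* sndbuf + ~2 TSO skbs */
	long wmem = 4 * limit;			/* well past the limit */
	int i;

	for (i = 0; i < 4; i++)
		printf("skb %d: %s\n", rtx[i].id,
		       fragment_refused(wmem, limit, &rtx[i]) ?
		       "refused (-ENOMEM)" : "may be split");
	return 0;
}

On a live system, each refusal also shows up as an increment of the
TCPWqueueTooBig counter (LINUX_MIB_TCPWQUEUETOOBIG), so stalls caused
by this check are visible in nstat / netstat -s output.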
* Re: [PATCH] tcp: be more careful in tcp_fragment()
2019-08-06 15:09 ` [PATCH] tcp: be more careful in tcp_fragment() Matthieu Baerts
@ 2019-08-06 15:25 ` Eric Dumazet
2019-08-06 22:10 ` Sasha Levin
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2019-08-06 15:25 UTC (permalink / raw)
To: Matthieu Baerts
Cc: stable, Greg Kroah-Hartman, Andrew Prout, Jonathan Lemon,
Michal Kubecek, Neal Cardwell, Yuchung Cheng, Christoph Paasch,
Jonathan Looney, David S . Miller
On Tue, Aug 6, 2019 at 5:09 PM Matthieu Baerts
<matthieu.baerts@tessares.net> wrote:
>
> [...]
>
This patch is fine, I was simply on vacation last week, and wanted to
truly take full advantage of it ;)

Thanks!
* Re: [PATCH] tcp: be more careful in tcp_fragment()
2019-08-06 15:25 ` Eric Dumazet
@ 2019-08-06 22:10 ` Sasha Levin
0 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2019-08-06 22:10 UTC (permalink / raw)
To: Eric Dumazet
Cc: Matthieu Baerts, stable, Greg Kroah-Hartman, Andrew Prout,
Jonathan Lemon, Michal Kubecek, Neal Cardwell, Yuchung Cheng,
Christoph Paasch, Jonathan Looney, David S . Miller
On Tue, Aug 06, 2019 at 05:25:35PM +0200, Eric Dumazet wrote:
>On Tue, Aug 6, 2019 at 5:09 PM Matthieu Baerts
><matthieu.baerts@tessares.net> wrote:
>> [...]
>
>This patch is fine, I was simply on vacation last week, and wanted to
>truly take full advantage of it ;)
Queued for 4.14, 4.9, and 4.4. Thanks all!
--
Thanks,
Sasha