* [PATCH] Avoid potentially erroneous RST drop.
From: Ali Abdallah @ 2021-04-30 9:36 UTC
To: netfilter-devel
In ignore state, we let a SYN through in the original direction. The
server might respond with RST/ACK, and that RST packet is erroneously
dropped because the flag IP_CT_TCP_FLAG_MAXACK_SET is already set. So we
reset the flag in this case.

Unfortunately that might not be enough: an out-of-order ACK in the
original direction might set the flag again, and we might again end up
dropping a valid RST when the server responds with RST SEQ=0.

The patch therefore also disables the RST check when we are not in
established state and we receive an RST with SEQ=0, which is most likely
a response to a SYN we had let through.

Signed-off-by: Ali Abdallah <aabdallah@suse.de>
---
net/netfilter/nf_conntrack_proto_tcp.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 318b8f723349..e958fde8cf9b 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -949,6 +949,10 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 
 			ct->proto.tcp.last_flags =
 			ct->proto.tcp.last_wscale = 0;
+			/* Reset the max ack flag so in case the server replies
+			 * with RST/ACK it will not be marked as an invalid rst.
+			 */
+			ct->proto.tcp.seen[dir].flags &= ~IP_CT_TCP_FLAG_MAXACK_SET;
 			tcp_options(skb, dataoff, th, &seen);
 			if (seen.flags & IP_CT_TCP_FLAG_WINDOW_SCALE) {
 				ct->proto.tcp.last_flags |=
@@ -1030,6 +1034,13 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 		if (ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_MAXACK_SET) {
 			u32 seq = ntohl(th->seq);
 
+			/* If we are not in established state, and an RST is
+			 * observed with SEQ=0, this is most likely an answer
+			 * to a SYN we had let go through above.
+			 */
+			if (seq == 0 && !nf_conntrack_tcp_established(ct))
+				break;
+
 			if (before(seq, ct->proto.tcp.seen[!dir].td_maxack)) {
 				/* Invalid RST */
 				spin_unlock_bh(&ct->lock);
--
2.26.2
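
For readers who want to experiment with the check this patch touches, here is a minimal user-space sketch of the decision, not the kernel code itself. The names `struct peer_state`, `rst_check()` and the `RST_*` values are hypothetical; `seq_before()` mirrors the wraparound-safe comparison done by the kernel's `before()`; the `established` argument stands in for `nf_conntrack_tcp_established(ct)`. The later checks in the real function (for example the "RST is part of train" test) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" for 32-bit TCP sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical, simplified view of what conntrack tracks for the peer. */
struct peer_state {
	bool maxack_set;	/* models IP_CT_TCP_FLAG_MAXACK_SET */
	uint32_t td_maxack;	/* models seen[!dir].td_maxack */
};

enum rst_verdict {
	RST_PASS,	/* not rejected here; later checks still apply */
	RST_INVALID,	/* would be dropped as "invalid rst" */
};

/* Model of the RST sequence check with the SEQ=0 exemption applied. */
static enum rst_verdict rst_check(const struct peer_state *peer,
				  uint32_t seq, bool established)
{
	assert(peer);

	if (!peer->maxack_set)
		return RST_PASS;	/* nothing to compare against */

	/* Exemption from the patch: SEQ=0 outside established state is
	 * most likely the answer to a SYN that was let through.
	 */
	if (seq == 0 && !established)
		return RST_PASS;

	if (seq_before(seq, peer->td_maxack))
		return RST_INVALID;

	return RST_PASS;
}
```

With `td_maxack` at 1000, an RST with SEQ=0 passes this sketch only while the connection is not established; once established, it is classified as invalid like any other sequence number that lies before `td_maxack`.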
* Re: [PATCH] Avoid potentially erroneous RST drop.
From: Florian Westphal @ 2021-05-05 19:53 UTC
To: Ali Abdallah; +Cc: netfilter-devel
Ali Abdallah <ali.abdallah@suse.com> wrote:
> In ignore state, we let SYN goes in original, the server might respond
> with RST/ACK, and that RST packet is erroneously dropped because of the
> flag IP_CT_TCP_FLAG_MAXACK_SET being already set. So we reset the flag
> in this case.
>
> Unfortunately that might not be enough, an out of order ACK in origin
> might reset it back, and we might end up again dropping a valid RST when
> the server responds with RST SEQ=0.
>
> The patch disables also the RST check when we are not in established
> state and we receive an RST with SEQ=0 that is most likely a response to
> a SYN we had let it go through.

Ali, sorry for coming back to this again and again.

What do you think of this change?

It's an incremental change on top of your patch.

The only real change is that this will skip the RST sequence check if
conntrack thinks the connection is already closing.

In addition, the TCP window check is skipped in that case.

This is supposed to expedite conntrack eviction in case of tuple reuse
by some NAT/PAT middlebox, or by a peer whose timeout before a port is
re-used is lower than conntrack's.

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -834,6 +834,22 @@ static noinline bool tcp_new(struct nf_conn *ct, const struct sk_buff *skb,
 	return true;
 }
 
+static bool tcp_can_early_drop(const struct nf_conn *ct)
+{
+	switch (ct->proto.tcp.state) {
+	case TCP_CONNTRACK_FIN_WAIT:
+	case TCP_CONNTRACK_LAST_ACK:
+	case TCP_CONNTRACK_TIME_WAIT:
+	case TCP_CONNTRACK_CLOSE:
+	case TCP_CONNTRACK_CLOSE_WAIT:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 /* Returns verdict for packet, or -1 for invalid. */
 int nf_conntrack_tcp_packet(struct nf_conn *ct,
 			    struct sk_buff *skb,
@@ -1053,8 +1069,16 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 			/* If we are not in established state, and an RST is
 			 * observed with SEQ=0, this is most likely an answer
 			 * to a SYN we had let go through above.
+			 *
+			 * Also expedite conntrack destruction: If we were already
+			 * closing, peer or NAT/PAT might already have reused tuple.
 			 */
-			if (seq == 0 && !nf_conntrack_tcp_established(ct))
+			if (!nf_conntrack_tcp_established(ct)) {
+				if (seq == 0 || tcp_can_early_drop(ct))
+					goto in_window;
+			}
+
+			if (seq == ct->proto.tcp.seen[!dir].td_maxack)
 				break;
 
 			if (before(seq, ct->proto.tcp.seen[!dir].td_maxack)) {
@@ -1066,10 +1090,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 				return -NF_ACCEPT;
 			}
 
-			if (!nf_conntrack_tcp_established(ct) ||
-			    seq == ct->proto.tcp.seen[!dir].td_maxack)
-				break;
-
 			/* Check if rst is part of train, such as
 			 *   foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
 			 *   foo:80 > bar:4379: R, 235946602:235946602(0) ack 42
@@ -1181,22 +1201,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 	return NF_ACCEPT;
 }
 
-static bool tcp_can_early_drop(const struct nf_conn *ct)
-{
-	switch (ct->proto.tcp.state) {
-	case TCP_CONNTRACK_FIN_WAIT:
-	case TCP_CONNTRACK_LAST_ACK:
-	case TCP_CONNTRACK_TIME_WAIT:
-	case TCP_CONNTRACK_CLOSE:
-	case TCP_CONNTRACK_CLOSE_WAIT:
-		return true;
-	default:
-		break;
-	}
-
-	return false;
-}
-
 #if IS_ENABLED(CONFIG_NF_CT_NETLINK)
 
 #include <linux/netfilter/nfnetlink.h>
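
The control flow of the incremental change can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct rst_ctx`, `classify_rst()`, the `CT_*` states and the `RST_*` outcomes are hypothetical names, the `established` field stands in for `nf_conntrack_tcp_established(ct)`, and the sketch covers only the branch where IP_CT_TCP_FLAG_MAXACK_SET is set for the other direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Conntrack TCP states used here, named after the kernel enum. */
enum ct_state {
	CT_SYN_SENT, CT_SYN_RECV, CT_ESTABLISHED, CT_FIN_WAIT,
	CT_CLOSE_WAIT, CT_LAST_ACK, CT_TIME_WAIT, CT_CLOSE,
};

enum rst_action {
	RST_SKIP_WINDOW,	/* "goto in_window": sequence and window checks skipped */
	RST_BREAK,		/* "break": leave the RST-specific checks */
	RST_INVALID,		/* sequence number before td_maxack: invalid rst */
	RST_CHECK_TRAIN,	/* continue with the "RST is part of train" check */
};

/* Hypothetical bundle of the inputs the decision depends on. */
struct rst_ctx {
	enum ct_state state;
	bool established;	/* models nf_conntrack_tcp_established(ct) */
	uint32_t td_maxack;	/* models seen[!dir].td_maxack */
};

static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Same state list as tcp_can_early_drop() in the patch. */
static bool can_early_drop(enum ct_state s)
{
	switch (s) {
	case CT_FIN_WAIT:
	case CT_LAST_ACK:
	case CT_TIME_WAIT:
	case CT_CLOSE:
	case CT_CLOSE_WAIT:
		return true;
	default:
		return false;
	}
}

static enum rst_action classify_rst(const struct rst_ctx *c, uint32_t seq)
{
	assert(c);

	if (!c->established) {
		if (seq == 0 || can_early_drop(c->state))
			return RST_SKIP_WINDOW;
	}

	if (seq == c->td_maxack)
		return RST_BREAK;

	if (seq_before(seq, c->td_maxack))
		return RST_INVALID;

	return RST_CHECK_TRAIN;
}
```

In this sketch, an RST seen while the entry is in a closing state such as TIME_WAIT is accepted regardless of its sequence number, which is the tuple-reuse case described above.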
* Re: [PATCH] Avoid potentially erroneous RST drop.
From: Ali Abdallah @ 2021-05-06 7:33 UTC
To: Florian Westphal; +Cc: netfilter-devel
On 05.05.2021 21:53, Florian Westphal wrote:
> Ali, sorry for coming back to this again and again.
>
> What do you think of this change?
>
> Its an incremental change on top of your patch.
>
> The only real change is that this will skip window check if
> conntrack thinks connection is closing already.
>
> In addition, tcp window check is skipped in that case.
>
> This is supposed to expedite conntrack eviction in case of tuple reuse
> by some nat/pat middlebox, or a peer that has lower timeouts than
> conntrack before a port is re-used.

Thanks Florian, this looks sane to me. I will give it a try and report
back here.

--
Ali Abdallah | SUSE Linux L3 Engineer
GPG fingerprint: 51A0 F4A0 C8CF C98F 842E A9A8 B945 56F8 1C85 D0D5
* Re: [PATCH] Avoid potentially erroneous RST drop.
From: Ali Abdallah @ 2021-05-19 12:07 UTC
To: Florian Westphal; +Cc: netfilter-devel
On 05.05.2021 21:53, Florian Westphal wrote:
> Ali, sorry for coming back to this again and again.
>
> What do you think of this change?
Hi Florian, I tested your patch and it solved the issue, no more NFS
hangs due to dropped RSTs. Please include it, together with the
following two patches I previously sent:
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210430093601.zibczc4cjnwx3qwn@Fryzen495/
Thanks a lot!
--
Ali Abdallah | SUSE Linux L3 Engineer
GPG fingerprint: 51A0 F4A0 C8CF C98F 842E A9A8 B945 56F8 1C85 D0D5
* Re: [PATCH] Avoid potentially erroneous RST drop.
From: Florian Westphal @ 2021-05-19 12:23 UTC
To: Ali Abdallah; +Cc: Florian Westphal, netfilter-devel
Ali Abdallah <ali.abdallah@suse.com> wrote:
> On 05.05.2021 21:53, Florian Westphal wrote:
> > Ali, sorry for coming back to this again and again.
> >
> > What do you think of this change?
>
> Hi Florian, I tested your patch and it solved the issue, no more NFS
> hangs due to dropped RSTs. Please include it, together with the
> following two patches I previously sent:
>
> https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/
Do we still need this one after this revised patch?
If we do, the help text has to be fixed: after your patch, be-liberal
turns off all sequence number/window checks, but the revised text
implies it only has to do with RSTs.

An alternative would be to add another sysctl, or to turn the existing
sysctl into an integer (0: off, 1: current behaviour (sequence check on
for RST only), 2: off for everything).

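
To make the integer variant concrete, a possible mapping is sketched below. Nothing here is an existing kernel interface: `enum liberal_mode`, `liberal_mode_from_int()` and the two helper predicates are hypothetical names, and treating values above 2 like 2 (and negative values like 0) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-level setting following the (0, 1, 2) proposal. */
enum liberal_mode {
	LIBERAL_OFF = 0,		/* all sequence/window checks enabled */
	LIBERAL_KEEP_RST_CHECK = 1,	/* current behaviour: only the RST check stays on */
	LIBERAL_ALL = 2,		/* sequence/window checks off for everything */
};

/* Map a raw integer setting to a mode (clamping is an assumption). */
static enum liberal_mode liberal_mode_from_int(int v)
{
	if (v <= 0)
		return LIBERAL_OFF;
	if (v == 1)
		return LIBERAL_KEEP_RST_CHECK;
	return LIBERAL_ALL;
}

/* Are the general window checks still performed in this mode? */
static bool window_check_enabled(enum liberal_mode m)
{
	assert(m <= LIBERAL_ALL);
	return m == LIBERAL_OFF;
}

/* Is the RST sequence number check still performed in this mode? */
static bool rst_seq_check_enabled(enum liberal_mode m)
{
	assert(m <= LIBERAL_ALL);
	return m != LIBERAL_ALL;
}
```

Keeping 1 as "RST check still on" means existing setups that set the sysctl to 1 would see no change in behaviour.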
> https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210430093601.zibczc4cjnwx3qwn@Fryzen495/
I will send this patch for inclusion tomorrow or later today.
Pablo, please mark both patches as "Changes Requested".
I will deal with the 2nd patch and will resend it, with the more liberal
handling of RST when the conntrack entry is closing.
Ali, if you still think the first patch is required please submit a new
version with at least a revised help text.
* Re: [PATCH] Avoid potentially erroneous RST drop.
From: Pablo Neira Ayuso @ 2021-05-19 22:16 UTC
To: Florian Westphal; +Cc: Ali Abdallah, netfilter-devel
On Wed, May 19, 2021 at 02:23:32PM +0200, Florian Westphal wrote:
> Ali Abdallah <ali.abdallah@suse.com> wrote:
[...]
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210430093601.zibczc4cjnwx3qwn@Fryzen495/
>
> I will send this patch for inclusion tomorrow or later today.
>
> Pablo, please mark both patches as "Changes Requested".
Done.
* Re: [PATCH] Avoid potentially erroneous RST drop.
From: Ali Abdallah @ 2021-05-21 8:00 UTC
To: Florian Westphal; +Cc: netfilter-devel
On 19.05.2021 14:23, Florian Westphal wrote:
> > Hi Florian, I tested your patch and it solved the issue, no more NFS
> > hangs due to dropped RSTs. Please include it, together with the
> > following two patches I previously sent:
> >
> > https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210428130911.cteglt52r5if7ynp@Fryzen495/
>
> Do we still need this one after this revised patch?
> If we do, the help text has to be fixed, after your patch, be-liberal
> turns off all sequence number/window checks. The revised text implies
> it only has to do with RSTs.
>
> Alternative would be to add another sysctl, or turn the existing sysctl
> into integer (0, off, 1 current behaviour (sequence check on for rst
> only, 2 off for everything).

I would still like to make the RST sequence number check optional. I
think it is a good idea to use 0, 1 and > 1 (off for everything); this
keeps the current behaviour when tcp_be_liberal is set to 1.

I will send another patch, also with a revised help text.

Many thanks.
Ali