* Re: [PATCH net-next] netfilter: conntrack: avoid sending RST to reply out-of-window skb
2024-03-07 15:11 ` Jason Xing
@ 2024-03-07 15:34 ` Jozsef Kadlecsik
2024-03-07 15:59 ` Jason Xing
2024-03-08 8:59 ` Jason Xing
2024-03-08 22:46 ` Florian Westphal
2 siblings, 1 reply; 14+ messages in thread
From: Jozsef Kadlecsik @ 2024-03-07 15:34 UTC (permalink / raw)
To: Jason Xing
Cc: Florian Westphal, edumazet, Pablo Neira Ayuso, kuba, pabeni,
David Miller, netfilter-devel, coreteam, netdev, Jason Xing
[-- Attachment #1: Type: text/plain, Size: 8496 bytes --]
On Thu, 7 Mar 2024, Jason Xing wrote:
> On Thu, Mar 7, 2024 at 10:10 PM Florian Westphal <fw@strlen.de> wrote:
> >
> > Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > On Thu, Mar 7, 2024 at 8:00 PM Florian Westphal <fw@strlen.de> wrote:
> > > >
> > > > Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > > This change disables most of the tcp_in_window() test, this will
> > > > > > pretend everything is fine even though tcp_in_window says otherwise.
> > > > >
> > > > > Thanks for the information. It does make sense.
> > > > >
> > > > > What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl
> > > > > knob which you also pointed out. It also pretends to ignore those
> > > > > out-of-window skbs.
> > > > >
> > > > > >
> > > > > > You could:
> > > > > > - drop invalid tcp packets in input hook
> > > > >
> > > > > How about changing the return value only as below? Only two cases will
> > > > > be handled:
> > > > >
> > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > index ae493599a3ef..c88ce4cd041e 100644
> > > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
> > > > > case NFCT_TCP_INVALID:
> > > > > nf_tcp_handle_invalid(ct, dir, index, skb, state);
> > > > > spin_unlock_bh(&ct->lock);
> > > > > - return -NF_ACCEPT;
> > > > > + return -NF_DROP;
> > > >
> > > > Lets not do this. conntrack should never drop packets and defer to ruleset
> > > > whereever possible.
> > >
> > > Hmm, sorry, it is against my understanding.
> > >
> > > If we cannot return -NF_DROP, why have we already added some 'return
> > > NF_DROP' in the nf_conntrack_handle_packet() function? And why does
> > > this test statement exist?
> >
> > Sure we can drop. But we should only do it if there is no better
> > alternative.
> >
> > > nf_conntrack_in()
> > > -> nf_conntrack_handle_packet()
> > > -> if (ret <= 0) {
> > > if (ret == -NF_DROP) NF_CT_STAT_INC_ATOMIC(state->net, drop);
> >
> > AFAICS this only happens when we receive syn for an existing conntrack
> > that is being removed already so we'd expect next syn to create a new
>
> Sorry, I've double-checked this part and found out there is no chance
> to return '-NF_DROP' for nf_conntrack_handle_packet(). It might return
> 'NF_DROP' (see link [1]) instead. The if-else statements seem like
> dead code.
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/netfilter/nf_conntrack_proto_tcp.c#:~:text=%2DNF_REPEAT%3B-,return%20NF_DROP%3B,-%7D%0A%09%09fallthrough%3B
>
> > connection. Feel free to send patches that replace drop with -accept
> > where possible/where it makes sense, but I don't think the
> > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided.
>
> Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in
> nf_conntrack_dccp_packet()?
>
> There are three points where nf_conntrack_handle_packet() returns NF_DROP:
> 1) one (syn_sent case) exists in nf_conntrack_tcp_packet(). As you
> said, it's not necessary to change.
> 2) another two exist in nf_conntrack_dccp_packet() which should be the
> same as nf_conntrack_tcp_packet() handles.
>
> The patch goes like this:
> diff --git a/net/netfilter/nf_conntrack_proto_dccp.c
> b/net/netfilter/nf_conntrack_proto_dccp.c
> index e2db1f4ec2df..ebc4f733bb2e 100644
> --- a/net/netfilter/nf_conntrack_proto_dccp.c
> +++ b/net/netfilter/nf_conntrack_proto_dccp.c
> @@ -525,7 +525,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct,
> struct sk_buff *skb,
>
> dh = skb_header_pointer(skb, dataoff, sizeof(*dh), &_dh.dh);
> if (!dh)
> - return NF_DROP;
> + return -NF_ACCEPT;
>
> if (dccp_error(dh, skb, dataoff, state))
> return -NF_ACCEPT;
> @@ -533,7 +533,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct,
> struct sk_buff *skb,
> /* pull again, including possible 48 bit sequences and subtype header */
> dh = dccp_header_pointer(skb, dataoff, dh, &_dh);
> if (!dh)
> - return NF_DROP;
> + return -NF_ACCEPT;
>
> type = dh->dccph_type;
> if (!nf_ct_is_confirmed(ct) && !dccp_new(ct, skb, dh, state))
>
> >
> > > My only purpose is not to let the TCP layer sending strange RST to the
> > > right flow.
> >
> > AFAIU tcp layer is correct, no? Out of the blue packet to some listener
> > socket?
>
> Allow me to finish the full sentence: my only purpose is not to let
> the TCP layer send strange RST to the _established_ socket due to
> receiving strange out-of-window skbs.
I don't understand why do you want to modify conntrack at all: conntrack
itself does not send RST packets. And the TCP layer don't send RST packets
to out of window ones either.
The only possibility I see for such packets is an iptables/nftables rule
which rejects packets classified as INVALID by conntrack.
As Florian suggested, why don't you change that rule?
The conntrack states are not fine-grained to express different TCP states
which covered with INVALID. It was never a good idea to reject INVALID
packets or let them through (leaking internal addresses).
Best regards,
Jozsef
> > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl
> > > knob seems odd to me though it can workaround :S
> >
> > I don't see a better alternative, other than -p tcp -m conntrack
> > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see
> > such packets.
> >
> > > I would like to prevent sending such an RST as default behaviour.
> >
> > I don't see a way to make this work out of the box, without possible
> > unwanted side effects.
> >
> > MAYBE we could drop IFF we check that the conntrack entry candidate
> > that fails sequence validation has NAT translation applied to it, and
> > thus the '-NF_ACCEPT' packet won't be translated.
> >
> > Not even compile tested:
> >
> > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> > --- a/net/netfilter/nf_conntrack_proto_tcp.c
> > +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
> > case NFCT_TCP_IGNORE:
> > spin_unlock_bh(&ct->lock);
> > return NF_ACCEPT;
> > - case NFCT_TCP_INVALID:
> > + case NFCT_TCP_INVALID: {
> > + verdict = -NF_ACCEPT;
> > + if (ct->status & IPS_NAT_MASK)
> > + res = NF_DROP; /* skb would miss nat transformation */
>
> Above line, I guess, should be 'verdict = NF_DROP'? Then this skb
> would be dropped in nf_hook_slow() eventually and would not be passed
> to the TCP layer.
>
> > nf_tcp_handle_invalid(ct, dir, index, skb, state);
> > spin_unlock_bh(&ct->lock);
> > - return -NF_ACCEPT;
> > + return verdict;
> > + }
> > case NFCT_TCP_ACCEPT:
> > break;
> > }
>
> Great! I think your draft patch makes sense really, which takes NAT
> into consideration.
>
> >
> > But I don't really see the advantage compared to doing drop decision in
> > iptables/nftables ruleset.
>
> From our views, especially to kernel developers, you're right: we
> could easily turn on that knob or add a drop policy to prevent it
> happening. Actually I did this in production to prevent such a case.
> It surely works.
>
> But from the views of normal users and those who do not understand how
> it works in the kernel, it looks strange: people may ask why we get
> some unknown RSTs in flight?
>
> > I also have a hunch that someone will eventually complain about this
> > change in behavior.
>
> Well, I still think the patch you suggested is proper and don't know
> why people could complain about it.
>
> Thanks for your patience :)
>
> Thanks,
> Jason
>
--
E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
H-1525 Budapest 114, POB. 49, Hungary
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH net-next] netfilter: conntrack: avoid sending RST to reply out-of-window skb
2024-03-07 15:34 ` Jozsef Kadlecsik
@ 2024-03-07 15:59 ` Jason Xing
2024-03-07 19:00 ` Jozsef Kadlecsik
0 siblings, 1 reply; 14+ messages in thread
From: Jason Xing @ 2024-03-07 15:59 UTC (permalink / raw)
To: Jozsef Kadlecsik
Cc: Florian Westphal, edumazet, Pablo Neira Ayuso, kuba, pabeni,
David Miller, netfilter-devel, coreteam, netdev, Jason Xing
[...]
> > Allow me to finish the full sentence: my only purpose is not to let
> > the TCP layer send strange RST to the _established_ socket due to
> > receiving strange out-of-window skbs.
>
> I don't understand why do you want to modify conntrack at all: conntrack
> itself does not send RST packets. And the TCP layer don't send RST packets
> to out of window ones either.
Thanks for your reply.
To normal TCP flow, you're right because the TCP layer doesn't send
RST to out-of-window skbs.
But the DNAT policy on the server should have converted the port of
incoming skb from A_port to B_port as my description in this patch
said.
It actually doesn't happen. The conntrack clears the skb->_nfct value
after returning -NF_ACCEPT in nf_conntrack_tcp_packet() and then DNAT
would not convert the A_port to B_port.
So the TCP layer is not able to look up the correct socket (client_ip,
client_port, server_ip, B_port) because A_port doesn't match B_port,
then an RST would be sent to the client.
>
> The only possibility I see for such packets is an iptables/nftables rule
> which rejects packets classified as INVALID by conntrack.
>
> As Florian suggested, why don't you change that rule?
As I said, just for the workaround method, only turning on that sysctl
knob can solve the issue.
Thanks,
Jason
>
> The conntrack states are not fine-grained to express different TCP states
> which covered with INVALID. It was never a good idea to reject INVALID
> packets or let them through (leaking internal addresses).
>
> Best regards,
> Jozsef
>
> > > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl
> > > > knob seems odd to me though it can workaround :S
> > >
> > > I don't see a better alternative, other than -p tcp -m conntrack
> > > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see
> > > such packets.
> > >
> > > > I would like to prevent sending such an RST as default behaviour.
> > >
> > > I don't see a way to make this work out of the box, without possible
> > > unwanted side effects.
> > >
> > > MAYBE we could drop IFF we check that the conntrack entry candidate
> > > that fails sequence validation has NAT translation applied to it, and
> > > thus the '-NF_ACCEPT' packet won't be translated.
> > >
> > > Not even compile tested:
> > >
> > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> > > --- a/net/netfilter/nf_conntrack_proto_tcp.c
> > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> > > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
> > > case NFCT_TCP_IGNORE:
> > > spin_unlock_bh(&ct->lock);
> > > return NF_ACCEPT;
> > > - case NFCT_TCP_INVALID:
> > > + case NFCT_TCP_INVALID: {
> > > + verdict = -NF_ACCEPT;
> > > + if (ct->status & IPS_NAT_MASK)
> > > + res = NF_DROP; /* skb would miss nat transformation */
> >
> > Above line, I guess, should be 'verdict = NF_DROP'? Then this skb
> > would be dropped in nf_hook_slow() eventually and would not be passed
> > to the TCP layer.
> >
> > > nf_tcp_handle_invalid(ct, dir, index, skb, state);
> > > spin_unlock_bh(&ct->lock);
> > > - return -NF_ACCEPT;
> > > + return verdict;
> > > + }
> > > case NFCT_TCP_ACCEPT:
> > > break;
> > > }
> >
> > Great! I think your draft patch makes sense really, which takes NAT
> > into consideration.
> >
> > >
> > > But I don't really see the advantage compared to doing drop decision in
> > > iptables/nftables ruleset.
> >
> > From our views, especially to kernel developers, you're right: we
> > could easily turn on that knob or add a drop policy to prevent it
> > happening. Actually I did this in production to prevent such a case.
> > It surely works.
> >
> > But from the views of normal users and those who do not understand how
> > it works in the kernel, it looks strange: people may ask why we get
> > some unknown RSTs in flight?
> >
> > > I also have a hunch that someone will eventually complain about this
> > > change in behavior.
> >
> > Well, I still think the patch you suggested is proper and don't know
> > why people could complain about it.
> >
> > Thanks for your patience :)
> >
> > Thanks,
> > Jason
> >
>
> --
> E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
> PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> Address : Wigner Research Centre for Physics
> H-1525 Budapest 114, POB. 49, Hungary
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH net-next] netfilter: conntrack: avoid sending RST to reply out-of-window skb
2024-03-07 15:59 ` Jason Xing
@ 2024-03-07 19:00 ` Jozsef Kadlecsik
2024-03-08 0:42 ` Jason Xing
0 siblings, 1 reply; 14+ messages in thread
From: Jozsef Kadlecsik @ 2024-03-07 19:00 UTC (permalink / raw)
To: Jason Xing
Cc: Florian Westphal, edumazet, Pablo Neira Ayuso, kuba, pabeni,
David Miller, netfilter-devel, coreteam, netdev, Jason Xing
On Thu, 7 Mar 2024, Jason Xing wrote:
> > > Allow me to finish the full sentence: my only purpose is not to let
> > > the TCP layer send strange RST to the _established_ socket due to
> > > receiving strange out-of-window skbs.
> >
> > I don't understand why do you want to modify conntrack at all: conntrack
> > itself does not send RST packets. And the TCP layer don't send RST packets
> > to out of window ones either.
>
> To normal TCP flow, you're right because the TCP layer doesn't send RST
> to out-of-window skbs.
>
> But the DNAT policy on the server should have converted the port of
> incoming skb from A_port to B_port as my description in this patch said.
>
> It actually doesn't happen. The conntrack clears the skb->_nfct value
> after returning -NF_ACCEPT in nf_conntrack_tcp_packet() and then DNAT
> would not convert the A_port to B_port.
The packet is INVALID therefore it's not NATed. I don't think that could
simply be changed in netfilter.
> So the TCP layer is not able to look up the correct socket (client_ip,
> client_port, server_ip, B_port) because A_port doesn't match B_port,
> then an RST would be sent to the client.
Yes, if you let the packet continue to traverse the stack.
> > The only possibility I see for such packets is an iptables/nftables rule
> > which rejects packets classified as INVALID by conntrack.
> >
> > As Florian suggested, why don't you change that rule?
>
> As I said, just for the workaround method, only turning on that sysctl
> knob can solve the issue.
Sorry, but why? If you drop the packet by a rule, then there's no need to
use the sysctl setting and there's no unwanted RST packets.
Best regards,
Jozsef
> > The conntrack states are not fine-grained to express different TCP states
> > which covered with INVALID. It was never a good idea to reject INVALID
> > packets or let them through (leaking internal addresses).
> >
> > Best regards,
> > Jozsef
> >
> > > > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl
> > > > > knob seems odd to me though it can workaround :S
> > > >
> > > > I don't see a better alternative, other than -p tcp -m conntrack
> > > > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see
> > > > such packets.
> > > >
> > > > > I would like to prevent sending such an RST as default behaviour.
> > > >
> > > > I don't see a way to make this work out of the box, without possible
> > > > unwanted side effects.
> > > >
> > > > MAYBE we could drop IFF we check that the conntrack entry candidate
> > > > that fails sequence validation has NAT translation applied to it, and
> > > > thus the '-NF_ACCEPT' packet won't be translated.
> > > >
> > > > Not even compile tested:
> > > >
> > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c
> > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
> > > > case NFCT_TCP_IGNORE:
> > > > spin_unlock_bh(&ct->lock);
> > > > return NF_ACCEPT;
> > > > - case NFCT_TCP_INVALID:
> > > > + case NFCT_TCP_INVALID: {
> > > > + verdict = -NF_ACCEPT;
> > > > + if (ct->status & IPS_NAT_MASK)
> > > > + res = NF_DROP; /* skb would miss nat transformation */
> > >
> > > Above line, I guess, should be 'verdict = NF_DROP'? Then this skb
> > > would be dropped in nf_hook_slow() eventually and would not be passed
> > > to the TCP layer.
> > >
> > > > nf_tcp_handle_invalid(ct, dir, index, skb, state);
> > > > spin_unlock_bh(&ct->lock);
> > > > - return -NF_ACCEPT;
> > > > + return verdict;
> > > > + }
> > > > case NFCT_TCP_ACCEPT:
> > > > break;
> > > > }
> > >
> > > Great! I think your draft patch makes sense really, which takes NAT
> > > into consideration.
> > >
> > > >
> > > > But I don't really see the advantage compared to doing drop decision in
> > > > iptables/nftables ruleset.
> > >
> > > From our views, especially to kernel developers, you're right: we
> > > could easily turn on that knob or add a drop policy to prevent it
> > > happening. Actually I did this in production to prevent such a case.
> > > It surely works.
> > >
> > > But from the views of normal users and those who do not understand how
> > > it works in the kernel, it looks strange: people may ask why we get
> > > some unknown RSTs in flight?
> > >
> > > > I also have a hunch that someone will eventually complain about this
> > > > change in behavior.
> > >
> > > Well, I still think the patch you suggested is proper and don't know
> > > why people could complain about it.
> > >
> > > Thanks for your patience :)
> > >
> > > Thanks,
> > > Jason
> > >
> >
> > --
> > E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
> > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> > Address : Wigner Research Centre for Physics
> > H-1525 Budapest 114, POB. 49, Hungary
>
--
E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
H-1525 Budapest 114, POB. 49, Hungary
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH net-next] netfilter: conntrack: avoid sending RST to reply out-of-window skb
2024-03-07 19:00 ` Jozsef Kadlecsik
@ 2024-03-08 0:42 ` Jason Xing
0 siblings, 0 replies; 14+ messages in thread
From: Jason Xing @ 2024-03-08 0:42 UTC (permalink / raw)
To: Jozsef Kadlecsik
Cc: Florian Westphal, edumazet, Pablo Neira Ayuso, kuba, pabeni,
David Miller, netfilter-devel, coreteam, netdev, Jason Xing
On Fri, Mar 8, 2024 at 3:00 AM Jozsef Kadlecsik
<kadlec@blackhole.kfki.hu> wrote:
>
> On Thu, 7 Mar 2024, Jason Xing wrote:
>
> > > > Allow me to finish the full sentence: my only purpose is not to let
> > > > the TCP layer send strange RST to the _established_ socket due to
> > > > receiving strange out-of-window skbs.
> > >
> > > I don't understand why do you want to modify conntrack at all: conntrack
> > > itself does not send RST packets. And the TCP layer don't send RST packets
> > > to out of window ones either.
> >
> > To normal TCP flow, you're right because the TCP layer doesn't send RST
> > to out-of-window skbs.
> >
> > But the DNAT policy on the server should have converted the port of
> > incoming skb from A_port to B_port as my description in this patch said.
> >
> > It actually doesn't happen. The conntrack clears the skb->_nfct value
> > after returning -NF_ACCEPT in nf_conntrack_tcp_packet() and then DNAT
> > would not convert the A_port to B_port.
>
> The packet is INVALID therefore it's not NATed. I don't think that could
> simply be changed in netfilter.
Well, what would you suggest ? :)
>
> > So the TCP layer is not able to look up the correct socket (client_ip,
> > client_port, server_ip, B_port) because A_port doesn't match B_port,
> > then an RST would be sent to the client.
>
> Yes, if you let the packet continue to traverse the stack.
Yes!
>
> > > The only possibility I see for such packets is an iptables/nftables rule
> > > which rejects packets classified as INVALID by conntrack.
> > >
> > > As Florian suggested, why don't you change that rule?
> >
> > As I said, just for the workaround method, only turning on that sysctl
> > knob can solve the issue.
>
> Sorry, but why? If you drop the packet by a rule, then there's no need to
> use the sysctl setting and there's no unwanted RST packets.
Oh, I was trying to clarify that using some knob can work around which
is not my original intention, but I don't seek a workaround method. I
didn't use --cstate INVALID to drop those INVALID packets in
production, which I feel should work.
For this particular case, I feel it's not that good to ask
users/customers to add more rules or turn on sysctl knob to prevent
seeing such RSTs.
Instead I thought we could naturally stop sending such RSTs as default
without asking other people to change something. People shouldn't see
the RSTs really. That's why I wrote this patch.
I hope I'm right... :S
Thanks,
Jason
>
> Best regards,
> Jozsef
>
> > > The conntrack states are not fine-grained to express different TCP states
> > > which covered with INVALID. It was never a good idea to reject INVALID
> > > packets or let them through (leaking internal addresses).
> > >
> > > Best regards,
> > > Jozsef
> > >
> > > > > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl
> > > > > > knob seems odd to me though it can workaround :S
> > > > >
> > > > > I don't see a better alternative, other than -p tcp -m conntrack
> > > > > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see
> > > > > such packets.
> > > > >
> > > > > > I would like to prevent sending such an RST as default behaviour.
> > > > >
> > > > > I don't see a way to make this work out of the box, without possible
> > > > > unwanted side effects.
> > > > >
> > > > > MAYBE we could drop IFF we check that the conntrack entry candidate
> > > > > that fails sequence validation has NAT translation applied to it, and
> > > > > thus the '-NF_ACCEPT' packet won't be translated.
> > > > >
> > > > > Not even compile tested:
> > > > >
> > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
> > > > > case NFCT_TCP_IGNORE:
> > > > > spin_unlock_bh(&ct->lock);
> > > > > return NF_ACCEPT;
> > > > > - case NFCT_TCP_INVALID:
> > > > > + case NFCT_TCP_INVALID: {
> > > > > + verdict = -NF_ACCEPT;
> > > > > + if (ct->status & IPS_NAT_MASK)
> > > > > + res = NF_DROP; /* skb would miss nat transformation */
> > > >
> > > > Above line, I guess, should be 'verdict = NF_DROP'? Then this skb
> > > > would be dropped in nf_hook_slow() eventually and would not be passed
> > > > to the TCP layer.
> > > >
> > > > > nf_tcp_handle_invalid(ct, dir, index, skb, state);
> > > > > spin_unlock_bh(&ct->lock);
> > > > > - return -NF_ACCEPT;
> > > > > + return verdict;
> > > > > + }
> > > > > case NFCT_TCP_ACCEPT:
> > > > > break;
> > > > > }
> > > >
> > > > Great! I think your draft patch makes sense really, which takes NAT
> > > > into consideration.
> > > >
> > > > >
> > > > > But I don't really see the advantage compared to doing drop decision in
> > > > > iptables/nftables ruleset.
> > > >
> > > > From our views, especially to kernel developers, you're right: we
> > > > could easily turn on that knob or add a drop policy to prevent it
> > > > happening. Actually I did this in production to prevent such a case.
> > > > It surely works.
> > > >
> > > > But from the views of normal users and those who do not understand how
> > > > it works in the kernel, it looks strange: people may ask why we get
> > > > some unknown RSTs in flight?
> > > >
> > > > > I also have a hunch that someone will eventually complain about this
> > > > > change in behavior.
> > > >
> > > > Well, I still think the patch you suggested is proper and don't know
> > > > why people could complain about it.
> > > >
> > > > Thanks for your patience :)
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > >
> > > --
> > > E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
> > > PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> > > Address : Wigner Research Centre for Physics
> > > H-1525 Budapest 114, POB. 49, Hungary
> >
>
> --
> E-mail : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
> PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
> Address : Wigner Research Centre for Physics
> H-1525 Budapest 114, POB. 49, Hungary
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next] netfilter: conntrack: avoid sending RST to reply out-of-window skb
2024-03-07 15:11 ` Jason Xing
2024-03-07 15:34 ` Jozsef Kadlecsik
@ 2024-03-08 8:59 ` Jason Xing
2024-03-08 22:46 ` Florian Westphal
2 siblings, 0 replies; 14+ messages in thread
From: Jason Xing @ 2024-03-08 8:59 UTC (permalink / raw)
To: Florian Westphal
Cc: edumazet, pablo, kadlec, kuba, pabeni, davem, netfilter-devel,
coreteam, netdev, Jason Xing
On Thu, Mar 7, 2024 at 11:11 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Thu, Mar 7, 2024 at 10:10 PM Florian Westphal <fw@strlen.de> wrote:
> >
> > Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > On Thu, Mar 7, 2024 at 8:00 PM Florian Westphal <fw@strlen.de> wrote:
> > > >
> > > > Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > > This change disables most of the tcp_in_window() test, this will
> > > > > > pretend everything is fine even though tcp_in_window says otherwise.
> > > > >
> > > > > Thanks for the information. It does make sense.
> > > > >
> > > > > What I've done is quite similar to nf_conntrack_tcp_be_liberal sysctl
> > > > > knob which you also pointed out. It also pretends to ignore those
> > > > > out-of-window skbs.
> > > > >
> > > > > >
> > > > > > You could:
> > > > > > - drop invalid tcp packets in input hook
> > > > >
> > > > > How about changing the return value only as below? Only two cases will
> > > > > be handled:
> > > > >
> > > > > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > index ae493599a3ef..c88ce4cd041e 100644
> > > > > --- a/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> > > > > @@ -1259,7 +1259,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
> > > > > case NFCT_TCP_INVALID:
> > > > > nf_tcp_handle_invalid(ct, dir, index, skb, state);
> > > > > spin_unlock_bh(&ct->lock);
> > > > > - return -NF_ACCEPT;
> > > > > + return -NF_DROP;
> > > >
> > > > Lets not do this. conntrack should never drop packets and defer to ruleset
> > > > whereever possible.
> > >
> > > Hmm, sorry, it is against my understanding.
> > >
> > > If we cannot return -NF_DROP, why have we already added some 'return
> > > NF_DROP' in the nf_conntrack_handle_packet() function? And why does
> > > this test statement exist?
> >
> > Sure we can drop. But we should only do it if there is no better
> > alternative.
> >
> > > nf_conntrack_in()
> > > -> nf_conntrack_handle_packet()
> > > -> if (ret <= 0) {
> > > if (ret == -NF_DROP) NF_CT_STAT_INC_ATOMIC(state->net, drop);
> >
> > AFAICS this only happens when we receive syn for an existing conntrack
> > that is being removed already so we'd expect next syn to create a new
>
> Sorry, I've double-checked this part and found out there is no chance
> to return '-NF_DROP' for nf_conntrack_handle_packet(). It might return
> 'NF_DROP' (see link [1]) instead. The if-else statements seem like
> dead code.
My bad. My mind went blank last night, maybe it's too late. Well, It's
not dead code because '-NF_DROP' which is still zero makes me
misunderstand :(
I'll post a patch to change 'if (ret == -NF_DROP)' to 'if (ret ==
NF_DROP)' just for easy reading.
Thanks,
Jason
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/netfilter/nf_conntrack_proto_tcp.c#:~:text=%2DNF_REPEAT%3B-,return%20NF_DROP%3B,-%7D%0A%09%09fallthrough%3B
>
> > connection. Feel free to send patches that replace drop with -accept
> > where possible/where it makes sense, but I don't think the
> > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided.
>
> Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in
> nf_conntrack_dccp_packet()?
>
> There are three points where nf_conntrack_handle_packet() returns NF_DROP:
> 1) one (syn_sent case) exists in nf_conntrack_tcp_packet(). As you
> said, it's not necessary to change.
> 2) another two exist in nf_conntrack_dccp_packet() which should be the
> same as nf_conntrack_tcp_packet() handles.
>
> The patch goes like this:
> diff --git a/net/netfilter/nf_conntrack_proto_dccp.c
> b/net/netfilter/nf_conntrack_proto_dccp.c
> index e2db1f4ec2df..ebc4f733bb2e 100644
> --- a/net/netfilter/nf_conntrack_proto_dccp.c
> +++ b/net/netfilter/nf_conntrack_proto_dccp.c
> @@ -525,7 +525,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct,
> struct sk_buff *skb,
>
> dh = skb_header_pointer(skb, dataoff, sizeof(*dh), &_dh.dh);
> if (!dh)
> - return NF_DROP;
> + return -NF_ACCEPT;
>
> if (dccp_error(dh, skb, dataoff, state))
> return -NF_ACCEPT;
> @@ -533,7 +533,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct,
> struct sk_buff *skb,
> /* pull again, including possible 48 bit sequences and subtype header */
> dh = dccp_header_pointer(skb, dataoff, dh, &_dh);
> if (!dh)
> - return NF_DROP;
> + return -NF_ACCEPT;
>
> type = dh->dccph_type;
> if (!nf_ct_is_confirmed(ct) && !dccp_new(ct, skb, dh, state))
>
> >
> > > My only purpose is not to let the TCP layer sending strange RST to the
> > > right flow.
> >
> > AFAIU tcp layer is correct, no? Out of the blue packet to some listener
> > socket?
>
> Allow me to finish the full sentence: my only purpose is not to let
> the TCP layer send strange RST to the _established_ socket due to
> receiving strange out-of-window skbs.
>
> >
> > > Besides, resorting to turning on nf_conntrack_tcp_be_liberal sysctl
> > > knob seems odd to me though it can workaround :S
> >
> > I don't see a better alternative, other than -p tcp -m conntrack
> > --ctstate INVALID -j DROP rule, if you wish for tcp stack to not see
> > such packets.
> >
> > > I would like to prevent sending such an RST as default behaviour.
> >
> > I don't see a way to make this work out of the box, without possible
> > unwanted side effects.
> >
> > MAYBE we could drop IFF we check that the conntrack entry candidate
> > that fails sequence validation has NAT translation applied to it, and
> > thus the '-NF_ACCEPT' packet won't be translated.
> >
> > Not even compile tested:
> >
> > diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> > --- a/net/netfilter/nf_conntrack_proto_tcp.c
> > +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> > @@ -1256,10 +1256,14 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
> > case NFCT_TCP_IGNORE:
> > spin_unlock_bh(&ct->lock);
> > return NF_ACCEPT;
> > - case NFCT_TCP_INVALID:
> > + case NFCT_TCP_INVALID: {
> > + verdict = -NF_ACCEPT;
> > + if (ct->status & IPS_NAT_MASK)
> > + res = NF_DROP; /* skb would miss nat transformation */
>
> Above line, I guess, should be 'verdict = NF_DROP'? Then this skb
> would be dropped in nf_hook_slow() eventually and would not be passed
> to the TCP layer.
>
> > nf_tcp_handle_invalid(ct, dir, index, skb, state);
> > spin_unlock_bh(&ct->lock);
> > - return -NF_ACCEPT;
> > + return verdict;
> > + }
> > case NFCT_TCP_ACCEPT:
> > break;
> > }
>
> Great! I think your draft patch makes sense really, which takes NAT
> into consideration.
>
> >
> > But I don't really see the advantage compared to doing drop decision in
> > iptables/nftables ruleset.
>
> From our views, especially to kernel developers, you're right: we
> could easily turn on that knob or add a drop policy to prevent it
> happening. Actually I did this in production to prevent such a case.
> It surely works.
>
> But from the views of normal users and those who do not understand how
> it works in the kernel, it looks strange: people may ask why we get
> some unknown RSTs in flight?
>
> >
> > I also have a hunch that someone will eventually complain about this
> > change in behavior.
>
> Well, I still think the patch you suggested is proper and don't know
> why people could complain about it.
>
> Thanks for your patience :)
>
> Thanks,
> Jason
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH net-next] netfilter: conntrack: avoid sending RST to reply out-of-window skb
2024-03-07 15:11 ` Jason Xing
2024-03-07 15:34 ` Jozsef Kadlecsik
2024-03-08 8:59 ` Jason Xing
@ 2024-03-08 22:46 ` Florian Westphal
2024-03-09 0:37 ` Jason Xing
2 siblings, 1 reply; 14+ messages in thread
From: Florian Westphal @ 2024-03-08 22:46 UTC (permalink / raw)
To: Jason Xing
Cc: Florian Westphal, edumazet, pablo, kadlec, kuba, pabeni, davem,
netfilter-devel, coreteam, netdev, Jason Xing
Jason Xing <kerneljasonxing@gmail.com> wrote:
> > connection. Feel free to send patches that replace drop with -accept
> > where possible/where it makes sense, but I don't think the
> > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided.
>
> Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in
> nf_conntrack_dccp_packet()?
It would be more consistent with what tcp and sctp trackers are
doing, but this should not matter in practice (the packet is malformed).
> > + case NFCT_TCP_INVALID: {
> > + verdict = -NF_ACCEPT;
> > + if (ct->status & IPS_NAT_MASK)
> > + res = NF_DROP; /* skb would miss nat transformation */
>
> Above line, I guess, should be 'verdict = NF_DROP'?
Yes.
> Great! I think your draft patch makes sense really, which takes NAT
> into consideration.
You could submit this officially and we could give it a try and see if
anyone complains down the road.
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH net-next] netfilter: conntrack: avoid sending RST to reply out-of-window skb
2024-03-08 22:46 ` Florian Westphal
@ 2024-03-09 0:37 ` Jason Xing
0 siblings, 0 replies; 14+ messages in thread
From: Jason Xing @ 2024-03-09 0:37 UTC (permalink / raw)
To: Florian Westphal
Cc: edumazet, pablo, kadlec, kuba, pabeni, davem, netfilter-devel,
coreteam, netdev, Jason Xing
On Sat, Mar 9, 2024 at 6:47 AM Florian Westphal <fw@strlen.de> wrote:
>
> Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > connection. Feel free to send patches that replace drop with -accept
> > > where possible/where it makes sense, but I don't think the
> > > TCP_CONNTRACK_SYN_SENT one can reasonably be avoided.
> >
> > Oh, are you suggesting replacing NF_DROP with -NF_ACCEPT in
> > nf_conntrack_dccp_packet()?
>
> It would be more consistent with what tcp and sctp trackers are
> doing, but this should not matter in practice (the packet is malformed).
Okay, I will take some time to check the sctp part. BTW, just like one
of previous emails said, I noticed there are two points in DCCP part
which is not consistent with TCP part, so I submitted one simple patch
[1] to do it.
[1]: https://lore.kernel.org/all/20240308092915.9751-1-kerneljasonxing@gmail.com/
>
> > > + case NFCT_TCP_INVALID: {
> > > + verdict = -NF_ACCEPT;
> > > + if (ct->status & IPS_NAT_MASK)
> > > + res = NF_DROP; /* skb would miss nat transformation */
> >
> > Above line, I guess, should be 'verdict = NF_DROP'?
>
> Yes.
>
> > Great! I think your draft patch makes sense really, which takes NAT
> > into consideration.
>
> You could submit this officially and we could give it a try and see if
> anyone complains down the road.
Great :)
Thanks,
Jason
^ permalink raw reply [flat|nested] 14+ messages in thread