* 6.1: possible bug with netfilter conntrack? @ 2023-01-12 23:03 Russell King (Oracle) 2023-01-12 23:38 ` Florian Westphal 2023-01-12 23:40 ` Russell King (Oracle) 0 siblings, 2 replies; 8+ messages in thread From: Russell King (Oracle) @ 2023-01-12 23:03 UTC (permalink / raw) To: netdev, netfilter-devel, coreteam Hi, I've noticed that my network at home is rather struggling, and having done some investigation, I find that the router VM is dropping packets due to lots of: nf_conntrack: nf_conntrack: table full, dropping packet I find that there are about 2380 established and assured connections with a destination of my incoming mail server with destination port 25, and 2 packets. In the reverse direction, apparently only one packet was sent according to conntrack. E.g.: tcp 6 340593 ESTABLISHED src=180.173.2.183 dst=78.32.30.218 sport=49694 dport=25 packets=2 bytes=92 src=78.32.30.218 dst=180.173.2.183 sport=25 dport=49694 packets=1 bytes=44 [ASSURED] use=1 However, if I look at the incoming mail server, its kernel believes there are no incoming port 25 connetions, which matches exim. I hadn't noticed any issues prior to upgrading from 5.16 to 6.1 on the router VM, and the firewall rules have been the same for much of 2021/2022. Is this is known issue? Something changed between 5.16 and 6.1 in the way conntrack works? I'm going to be manually clearing the conntrack table so stuff works again without lots of packet loss on my home network... Thanks. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.1: possible bug with netfilter conntrack? 2023-01-12 23:03 6.1: possible bug with netfilter conntrack? Russell King (Oracle) @ 2023-01-12 23:38 ` Florian Westphal 2023-01-13 0:16 ` Russell King (Oracle) 2023-01-12 23:40 ` Russell King (Oracle) 1 sibling, 1 reply; 8+ messages in thread From: Florian Westphal @ 2023-01-12 23:38 UTC (permalink / raw) To: Russell King (Oracle); +Cc: netdev, netfilter-devel, coreteam Russell King (Oracle) <linux@armlinux.org.uk> wrote: > Hi, > > I've noticed that my network at home is rather struggling, and having > done some investigation, I find that the router VM is dropping packets > due to lots of: > > nf_conntrack: nf_conntrack: table full, dropping packet > > I find that there are about 2380 established and assured connections > with a destination of my incoming mail server with destination port 25, > and 2 packets. In the reverse direction, apparently only one packet was > sent according to conntrack. E.g.: > > tcp 6 340593 ESTABLISHED src=180.173.2.183 dst=78.32.30.218 > sport=49694 dport=25 packets=2 bytes=92 src=78.32.30.218 > dst=180.173.2.183 sport=25 dport=49694 packets=1 bytes=44 [ASSURED] > use=1 Non-early-evictable entry that will expire in ~4 days, so not really surprising that this eventually fills the table. I'd suggest to reduce the net.netfilter.nf_conntrack_tcp_timeout_established sysctl to something more sane, e.g. 2 minutes or so unless you need to have longer timeouts. But this did not change, so not the root cause of this problem. > However, if I look at the incoming mail server, its kernel believes > there are no incoming port 25 connetions, which matches exim. > > I hadn't noticed any issues prior to upgrading from 5.16 to 6.1 on the > router VM, and the firewall rules have been the same for much of > 2021/2022. > > Is this is known issue? Something changed between 5.16 and 6.1 in the > way conntrack works? Nothing that should have such an impact. Does 'sysctl net.netfilter.nf_conntrack_tcp_loose=0' avoid the buildup of such entries? I'm wondering if conntrack misses the connection shutdown or if its perhaps triggering the entries because of late packets or similar. If that doesn't help. you could also check if 'sysctl net.netfilter.nf_conntrack_tcp_be_liberal=1' helps -- if it does, its time for more debugging but its too early to start digging atm. This would point at conntrack ignoring/discarding fin/reset packets. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.1: possible bug with netfilter conntrack? 2023-01-12 23:38 ` Florian Westphal @ 2023-01-13 0:16 ` Russell King (Oracle) 0 siblings, 0 replies; 8+ messages in thread From: Russell King (Oracle) @ 2023-01-13 0:16 UTC (permalink / raw) To: Florian Westphal; +Cc: netdev, netfilter-devel, coreteam Hi Florian, Thanks for the quick reply. On Fri, Jan 13, 2023 at 12:38:08AM +0100, Florian Westphal wrote: > Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > Hi, > > > > I've noticed that my network at home is rather struggling, and having > > done some investigation, I find that the router VM is dropping packets > > due to lots of: > > > > nf_conntrack: nf_conntrack: table full, dropping packet > > > > I find that there are about 2380 established and assured connections > > with a destination of my incoming mail server with destination port 25, > > and 2 packets. In the reverse direction, apparently only one packet was > > sent according to conntrack. E.g.: > > > > tcp 6 340593 ESTABLISHED src=180.173.2.183 dst=78.32.30.218 > > sport=49694 dport=25 packets=2 bytes=92 src=78.32.30.218 > > dst=180.173.2.183 sport=25 dport=49694 packets=1 bytes=44 [ASSURED] > > use=1 > > Non-early-evictable entry that will expire in ~4 days, so not really > surprising that this eventually fills the table. > > I'd suggest to reduce the > net.netfilter.nf_conntrack_tcp_timeout_established > sysctl to something more sane, e.g. 2 minutes or so unless you need > to have longer timeouts. > > But this did not change, so not the root cause of this problem. I'll hold off trying that for now - I do tend to have some connections that may be idle... > > However, if I look at the incoming mail server, its kernel believes > > there are no incoming port 25 connetions, which matches exim. > > > > I hadn't noticed any issues prior to upgrading from 5.16 to 6.1 on the > > router VM, and the firewall rules have been the same for much of > > 2021/2022. > > > > Is this is known issue? Something changed between 5.16 and 6.1 in the > > way conntrack works? > > Nothing that should have such an impact. > > Does 'sysctl net.netfilter.nf_conntrack_tcp_loose=0' avoid the buildup > of such entries? I'm wondering if conntrack misses the connection > shutdown or if its perhaps triggering the entries because of late > packets or similar. > > If that doesn't help. you could also check if > > 'sysctl net.netfilter.nf_conntrack_tcp_be_liberal=1' helps -- if it > does, its time for more debugging but its too early to start digging > atm. This would point at conntrack ignoring/discarding fin/reset > packets. I think first I need to work out how the issue arises, since it seems to be behaving normally at the moment. I have for example: $ grep 173.239.196.95 bad-conntrack.log | wc -l 314 and this resolves to 173-239-196-95.azu1ez9l.com. It looks like exim was happy with that, so would have issued its SMTP banner very shortly after the connection was established, but all the entries in the conntrack table show packets=2...packets=1 meaning conntrack only saw the SYN, SYNACK and ACK packets establishing the connection, but not the packet sending the SMTP banner which seems mightily weird. I've just tried this from a machine on the 'net, telneting in to the SMTP port, the conntrack packet counters increase beyond 2/1, and when exim times out the connection, the conntrack entry goes away - so everything seems to work how it should. Digging through the logs, it looks like the first table-full happened twice on Dec 30th, just two and a half days after boot. Then eight times on Jan 10th, and from the 11th at about 11pm, the logs have been sporadically flooded with the conntrack table full messages. I'll try to keep an eye on it and dig out something a bit more useful which may help locate what the issue is, but it seems the trigger mechanism isn't something obvious. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.1: possible bug with netfilter conntrack? 2023-01-12 23:03 6.1: possible bug with netfilter conntrack? Russell King (Oracle) 2023-01-12 23:38 ` Florian Westphal @ 2023-01-12 23:40 ` Russell King (Oracle) 2023-01-12 23:45 ` Florian Westphal 1 sibling, 1 reply; 8+ messages in thread From: Russell King (Oracle) @ 2023-01-12 23:40 UTC (permalink / raw) To: netdev, netfilter-devel, coreteam On Thu, Jan 12, 2023 at 11:03:56PM +0000, Russell King (Oracle) wrote: > Hi, > > I've noticed that my network at home is rather struggling, and having > done some investigation, I find that the router VM is dropping packets > due to lots of: > > nf_conntrack: nf_conntrack: table full, dropping packet > > I find that there are about 2380 established and assured connections > with a destination of my incoming mail server with destination port 25, > and 2 packets. In the reverse direction, apparently only one packet was > sent according to conntrack. E.g.: > > tcp 6 340593 ESTABLISHED src=180.173.2.183 dst=78.32.30.218 > sport=49694 dport=25 packets=2 bytes=92 src=78.32.30.218 > dst=180.173.2.183 sport=25 dport=49694 packets=1 bytes=44 [ASSURED] > use=1 > > However, if I look at the incoming mail server, its kernel believes > there are no incoming port 25 connetions, which matches exim. > > I hadn't noticed any issues prior to upgrading from 5.16 to 6.1 on the > router VM, and the firewall rules have been the same for much of > 2021/2022. > > Is this is known issue? Something changed between 5.16 and 6.1 in the > way conntrack works? > > I'm going to be manually clearing the conntrack table so stuff works > again without lots of packet loss on my home network... > > Thanks. Having cleared out all the dport=25 and dport=587 entries, and observation there after, it looks like conntrack generally works as one would expect - I see connections become established, and then disappear as one expects. All traffic between the 'net and the mail server goes through the router VM in both directions - there is no asymetry there (the mail server is on a DMZ network which is only routed within the router VM to the PPP interfaces also in the router VM for the public network.) So, conntrack will be aware of every packet in both directions. I do see conntrack entries entering TIME_WAIT, LAST_ACK, CLOSE etc. Given the packet counts as per my example above, it looks like conntrack only saw: src=180.173.2.183 dst=78.32.30.218 SYN src=78.32.30.218 dst=180.173.2.183 SYN+ACK src=180.173.2.183 dst=78.32.30.218 ACK and I suspect at that point, the connection went silent - until Exim timed out and closed the connection, as does seem to be the case: 2023-01-11 21:32:04 no host name found for IP address 180.173.2.183 2023-01-11 21:33:05 SMTP command timeout on connection from [180.173.2.183]:64332 I=[78.32.30.218]:25 but if Exim closed the connection, why didn't conntrack pick it up? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.1: possible bug with netfilter conntrack? 2023-01-12 23:40 ` Russell King (Oracle) @ 2023-01-12 23:45 ` Florian Westphal 2023-01-13 11:12 ` Russell King (Oracle) 0 siblings, 1 reply; 8+ messages in thread From: Florian Westphal @ 2023-01-12 23:45 UTC (permalink / raw) To: Russell King (Oracle); +Cc: netdev, netfilter-devel, coreteam Russell King (Oracle) <linux@armlinux.org.uk> wrote: > Given the packet counts as per my example above, it looks like > conntrack only saw: > > src=180.173.2.183 dst=78.32.30.218 SYN > src=78.32.30.218 dst=180.173.2.183 SYN+ACK > src=180.173.2.183 dst=78.32.30.218 ACK > > and I suspect at that point, the connection went silent - until > Exim timed out and closed the connection, as does seem to be the > case: > > 2023-01-11 21:32:04 no host name found for IP address 180.173.2.183 > 2023-01-11 21:33:05 SMTP command timeout on connection from [180.173.2.183]:64332 I=[78.32.30.218]:25 > > but if Exim closed the connection, why didn't conntrack pick it up? Yes, thats the question. Exim closing the connection should have conntrack at least pick up a fin packet from the mail server (which should move the entry to the 2 minute fin timeout). ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.1: possible bug with netfilter conntrack? 2023-01-12 23:45 ` Florian Westphal @ 2023-01-13 11:12 ` Russell King (Oracle) 2023-01-13 12:56 ` Florian Westphal 0 siblings, 1 reply; 8+ messages in thread From: Russell King (Oracle) @ 2023-01-13 11:12 UTC (permalink / raw) To: Florian Westphal; +Cc: netdev, netfilter-devel, coreteam On Fri, Jan 13, 2023 at 12:45:03AM +0100, Florian Westphal wrote: > Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > Given the packet counts as per my example above, it looks like > > conntrack only saw: > > > > src=180.173.2.183 dst=78.32.30.218 SYN > > src=78.32.30.218 dst=180.173.2.183 SYN+ACK > > src=180.173.2.183 dst=78.32.30.218 ACK > > > > and I suspect at that point, the connection went silent - until > > Exim timed out and closed the connection, as does seem to be the > > case: > > > > 2023-01-11 21:32:04 no host name found for IP address 180.173.2.183 > > 2023-01-11 21:33:05 SMTP command timeout on connection from [180.173.2.183]:64332 I=[78.32.30.218]:25 > > > > but if Exim closed the connection, why didn't conntrack pick it up? > > Yes, thats the question. Exim closing the connection should have > conntrack at least pick up a fin packet from the mail server (which > should move the entry to the 2 minute fin timeout). Okay, update this morning. I left tcpdump running overnight having cleared conntrack of all port 25 and 587 connections. This morning, there's a whole bunch of new entries on conntrack. Digging through the tcpdump and logs, it seems what is going on is: public interface dmz interface origin -> mailserver SYN origin -> mailserver SYN mailserver -> origin SYNACK mailserver -> origin SYNACK origin -> mailserver ACK mailserver -> origin RST mailserver -> origin SYNACK mailserver -> origin SYNACK mailserver -> origin SYNACK mailserver -> origin SYNACK mailserver -> origin SYNACK mailserver -> origin SYNACK mailserver -> origin SYNACK mailserver -> origin SYNACK ... Here is an example from the public interface: 09:52:36.599398 IP 103.14.225.112.63461 > 78.32.30.218.587: Flags [SEW], seq 3387227814, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0 09:52:36.599893 IP 78.32.30.218.587 > 103.14.225.112.63461: Flags [S.], seq 816385329, ack 3387227815, win 64240, options [mss 1452,nop,nop,sackOK,nop,wscale 7], length 0 09:52:36.820464 IP 103.14.225.112.63461 > 78.32.30.218.587: Flags [.], ack 1, win 260, length 0 09:52:36.820549 IP 78.32.30.218.587 > 103.14.225.112.63461: Flags [R], seq 816385330, win 0, length 0 09:52:37.637548 IP 78.32.30.218.587 > 103.14.225.112.63461: Flags [S.], seq 816385329, ack 3387227815, win 64240, options [mss 1452,nop,nop,sackOK,nop,wscale 7], length 0 and the corresponding trace on the mailserver: 09:52:36.599729 IP 103.14.225.112.63461 > 78.32.30.218.587: Flags [SEW], seq 3387227814, win 8192, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0 09:52:36.599772 IP 78.32.30.218.587 > 103.14.225.112.63461: Flags [S.], seq 816385329, ack 3387227815, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0 09:52:37.637421 IP 78.32.30.218.587 > 103.14.225.112.63461: Flags [S.], seq 816385329, ack 3387227815, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0 So, my first observation is that conntrack is reacting to the ACK packet on the public interface, and marking the connection established, but a firewall rule is rejecting the connection when that ACK packet is received by sending a TCP reset. It looks like conntrack does not see this packet, and also conntrack does not see the SYNACK retransmissions (which is odd, because it saw the first one.) As to why we're responding with a TCP reset to the ACK packet, it's because iptables is hitting a reject rule as the IP address has been temporarily banned due to preceding known spammer signatures a few seconds before. I probably ought to pick up on the initial SYN rather than the 3rd packet of the connection... but even so, I don't think conntrack should be missing the TCP reset from the reject rule. The rule path that leads to the reject rule is currently: -A TCP -p tcp -m multiport --dports 25,587 -m conntrack --ctstate ESTABLISHED -j TCP-smtp-in -A TCP-smtp-in -p tcp -m set --match-set ip4-banned-smtp src -j TCP-smtp-s -A TCP-smtp-s -j SET --add-set ip4-banned-smtp src --exist --timeout N -A TCP-smtp-s -p tcp -j REJECT --reject-with tcp-reset (I've omitted the timeout.) There definitely seems to be a change in behaviour - looking back to the logs prior to upgrading to 6.1, there were never any conntrack table overflows, and that older kernel had been running for hundreds of days. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.1: possible bug with netfilter conntrack? 2023-01-13 11:12 ` Russell King (Oracle) @ 2023-01-13 12:56 ` Florian Westphal 2023-01-13 13:36 ` Russell King (Oracle) 0 siblings, 1 reply; 8+ messages in thread From: Florian Westphal @ 2023-01-13 12:56 UTC (permalink / raw) To: Russell King (Oracle); +Cc: Florian Westphal, netdev, netfilter-devel, coreteam Russell King (Oracle) <linux@armlinux.org.uk> wrote: [..] > Digging through the tcpdump and logs, it seems what is going on is: > > public interface dmz interface > origin -> mailserver SYN origin -> mailserver SYN > mailserver -> origin SYNACK mailserver -> origin SYNACK > origin -> mailserver ACK > mailserver -> origin RST > mailserver -> origin SYNACK mailserver -> origin SYNACK > mailserver -> origin SYNACK mailserver -> origin SYNACK > mailserver -> origin SYNACK mailserver -> origin SYNACK > mailserver -> origin SYNACK mailserver -> origin SYNACK > ... > > Here is an example from the public interface: > > 09:52:36.599398 IP 103.14.225.112.63461 > 78.32.30.218.587: Flags [SEW], seq 3387227814, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0 > 09:52:36.599893 IP 78.32.30.218.587 > 103.14.225.112.63461: Flags [S.], seq 816385329, ack 3387227815, win 64240, options [mss 1452,nop,nop,sackOK,nop,wscale 7], length 0 > 09:52:36.820464 IP 103.14.225.112.63461 > 78.32.30.218.587: Flags [.], ack 1, win 260, length 0 > 09:52:36.820549 IP 78.32.30.218.587 > 103.14.225.112.63461: Flags [R], seq 816385330, win 0, length 0 > 09:52:37.637548 IP 78.32.30.218.587 > 103.14.225.112.63461: Flags [S.], seq 816385329, ack 3387227815, win 64240, options [mss 1452,nop,nop,sackOK,nop,wscale 7], length 0 > > and the corresponding trace on the mailserver: > 09:52:36.599729 IP 103.14.225.112.63461 > 78.32.30.218.587: Flags [SEW], seq 3387227814, win 8192, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0 > 09:52:36.599772 IP 78.32.30.218.587 > 103.14.225.112.63461: Flags [S.], seq 816385329, ack 3387227815, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0 > 09:52:37.637421 IP 78.32.30.218.587 > 103.14.225.112.63461: Flags [S.], seq 816385329, ack 3387227815, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0 > > So, my first observation is that conntrack is reacting to the ACK > packet on the public interface, and marking the connection established, > but a firewall rule is rejecting the connection when that ACK packet is > received by sending a TCP reset. It looks like conntrack does not see > this packet, Right, this is silly. I'll see about this; the rst packet bypasses conntrack because nf_send_reset attaches the exising entry of the packet its replying to -- tcp conntrack gets skipped for the generated RST. But this is also the case in 5.16, so no idea why this is surfacing now. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.1: possible bug with netfilter conntrack? 2023-01-13 12:56 ` Florian Westphal @ 2023-01-13 13:36 ` Russell King (Oracle) 0 siblings, 0 replies; 8+ messages in thread From: Russell King (Oracle) @ 2023-01-13 13:36 UTC (permalink / raw) To: Florian Westphal; +Cc: netdev, netfilter-devel, coreteam On Fri, Jan 13, 2023 at 01:56:29PM +0100, Florian Westphal wrote: > Right, this is silly. I'll see about this; the rst packet > bypasses conntrack because nf_send_reset attaches the exising > entry of the packet its replying to -- tcp conntrack gets skipped for > the generated RST. > > But this is also the case in 5.16, so no idea why this is surfacing now. Maybe it's just down to a change in the traffic characteristics that now makes it a problem, and it appearing after upgrading to 6.1 is just coincidence? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-01-13 13:44 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-01-12 23:03 6.1: possible bug with netfilter conntrack? Russell King (Oracle) 2023-01-12 23:38 ` Florian Westphal 2023-01-13 0:16 ` Russell King (Oracle) 2023-01-12 23:40 ` Russell King (Oracle) 2023-01-12 23:45 ` Florian Westphal 2023-01-13 11:12 ` Russell King (Oracle) 2023-01-13 12:56 ` Florian Westphal 2023-01-13 13:36 ` Russell King (Oracle)
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).