* troubles caused by conntrack overlimit in init_netns
@ 2022-04-02 10:33 Vasily Averin
2022-04-02 11:11 ` Florian Westphal
2022-04-02 17:12 ` Eric Dumazet
0 siblings, 2 replies; 8+ messages in thread
From: Vasily Averin @ 2022-04-02 10:33 UTC (permalink / raw)
To: Pablo Neira Ayuso, Florian Westphal; +Cc: netfilter-devel, kernel
Pablo, Florian,
There is an old issue with conntrack limit on multi-netns (read container) nodes.
Any connection to containers hosted on the node creates a conntrack in init_netns.
If the number of conntrack in init_netns reaches the limit, the whole node becomes
unavailable.
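A minimal sketch of how the condition above can be observed on a node, assuming the nf_conntrack module is loaded so that the sysctl files nf_conntrack_count and nf_conntrack_max exist under /proc/sys/net/netfilter. The function name conntrack_usage and the CT_SYSCTL_DIR override are illustrative only and not part of any existing tool.

```shell
#!/bin/sh
# Report conntrack table usage for the current network namespace.
# CT_SYSCTL_DIR is a hypothetical override, used only so the sketch
# can be exercised against a directory of sample files.
conntrack_usage() {
    dir="${CT_SYSCTL_DIR:-/proc/sys/net/netfilter}"
    count=$(cat "$dir/nf_conntrack_count") || return 1
    max=$(cat "$dir/nf_conntrack_max") || return 1
    if [ "$max" -eq 0 ]; then
        # A limit of 0 is treated here as "no limit configured".
        echo "$count/unlimited"
    else
        # Integer percentage of the limit currently in use.
        echo "$count/$max ($((count * 100 / max))%)"
    fi
}

# Usage (in init_netns): conntrack_usage
```

Run in init_netns, a value close to 100% would indicate that new connections are about to be refused.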
To avoid it, OpenVz had special patches that disabled conntracks in init_ns on OpenVz nodes,
but this automatically limits the functionality of the host's firewall.
This has been our specific pain for many years; however, containers are now
being used much more widely than before, and the severity of the described problem
keeps growing.
Do you perhaps know of an alternative solution?
Thank you,
Vasily Averin
* Re: troubles caused by conntrack overlimit in init_netns
2022-04-02 10:33 troubles caused by conntrack overlimit in init_netns Vasily Averin
@ 2022-04-02 11:11 ` Florian Westphal
2022-04-02 13:00 ` Nikita Yushchenko
2022-04-04 7:59 ` Vasily Averin
2022-04-02 17:12 ` Eric Dumazet
1 sibling, 2 replies; 8+ messages in thread
From: Florian Westphal @ 2022-04-02 11:11 UTC (permalink / raw)
To: Vasily Averin
Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel, kernel
Vasily Averin <vvs@openvz.org> wrote:
> There is an old issue with conntrack limit on multi-netns (read container) nodes.
>
> Any connection to containers hosted on the node creates a conntrack in init_netns.
> If the number of conntrack in init_netns reaches the limit, the whole node becomes
> unavailable.
Right, from the init_net point of view, connections coming from a container netns are
no different from those coming from a different physical host on the physical network.
> To avoid it, OpenVz had special patches that disabled conntracks in init_ns on OpenVz nodes,
> but this automatically limits the functionality of the host's firewall.
>
> This has been our specific pain for many years, however, containers are now
> being used much more widely than before, and the severity of the described problem
> is growing more and more.
>
> Do you perhaps know of an alternative solution?
If you need conntrack in init_net, then no.
If you don't (or only for connections that won't be rerouted to
container netns) you could -j NOTRACK traffic coming from/going to
container.
But, why do you need conntrack in the container netns?
Normally I'd expect that if packet was already handled in init_net,
why re-run skb through conntrack again?
* Re: troubles caused by conntrack overlimit in init_netns
2022-04-02 11:11 ` Florian Westphal
@ 2022-04-02 13:00 ` Nikita Yushchenko
2022-04-04 7:59 ` Vasily Averin
1 sibling, 0 replies; 8+ messages in thread
From: Nikita Yushchenko @ 2022-04-02 13:00 UTC (permalink / raw)
To: Florian Westphal, Vasily Averin
Cc: Pablo Neira Ayuso, netfilter-devel, kernel
Hi
> But, why do you need conntrack in the container netns?
A container can hold a default installation of a Linux distro, which often enables a firewall and adds rules
that trigger conntrack. But that is inside the container's netns. It is accounted separately and is not
part of the issue being discussed.
The issue is that traffic directed to container(s) eats up the host's conntrack limit and makes the host
inaccessible via the network.
>> Do you perhaps know of an alternative solution?
>
> If you need conntrack in init_net, then no.
I suppose this can be fixed by something like:
- adding some "window" on top of host's conntrack limit,
- if out of conntracks, but still within window, conntrack shall be created but marked,
- mark could be removed by a dedicated netfilter rule,
- attaching more than one skb to marked conntrack shall be blocked (i.e. packet dropped instead),
- if skb pointing to a marked conntrack is about to leave the stack (that is, either being delivered
locally, or queued for transmitting out), it shall be dropped instead, and conntrack removed.
This would give the host's admin a way to explicitly configure a packet path to use as a host management
interface that stays accessible even if containers eat up the conntrack limit.
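The admission step in the first two bullets above could be sketched roughly as follows. This is only an illustration of the proposed decision, not existing kernel behaviour; the function name ct_admit, its arguments, and the result strings are hypothetical, and the marking, unmarking, and drop-on-exit rules from the remaining bullets are not modelled.

```shell
#!/bin/sh
# Decide what to do with a packet that would need a new conntrack entry.
#   $1: current number of entries
#   $2: configured conntrack limit
#   $3: size of the extra "window" on top of the limit
# Prints one of: create, create-marked, drop.
ct_admit() {
    count="$1"; limit="$2"; window="$3"
    if [ "$count" -lt "$limit" ]; then
        echo create            # below the limit: behave as today
    elif [ "$count" -lt $((limit + window)) ]; then
        echo create-marked     # over the limit but still inside the window
    else
        echo drop              # the window is exhausted as well
    fi
}

# Usage: ct_admit 100 100 20
```

A dedicated netfilter rule would then clear the mark only for the management path, so that ordinary container traffic never consumes the window.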
Is that reasonable? Maybe I can try to implement that...
Nikita
* Re: troubles caused by conntrack overlimit in init_netns
2022-04-02 10:33 troubles caused by conntrack overlimit in init_netns Vasily Averin
2022-04-02 11:11 ` Florian Westphal
@ 2022-04-02 17:12 ` Eric Dumazet
2022-04-02 18:32 ` Vasily Averin
1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2022-04-02 17:12 UTC (permalink / raw)
To: Vasily Averin, Pablo Neira Ayuso, Florian Westphal, edumazet
Cc: netfilter-devel, kernel
On 4/2/22 03:33, Vasily Averin wrote:
> Pablo, Florian,
>
> There is an old issue with conntrack limit on multi-netns (read container) nodes.
>
> Any connection to containers hosted on the node creates a conntrack in init_netns.
> If the number of conntrack in init_netns reaches the limit, the whole node becomes
> unavailable.
Can you describe network topology ?
Are you using macvlan, ipvlan, or something else ?
> To avoid it, OpenVz had special patches that disabled conntracks in init_ns on OpenVz nodes,
> but this automatically limits the functionality of the host's firewall.
>
> This has been our specific pain for many years, however, containers are now
> being used much more widely than before, and the severity of the described problem
> is growing more and more.
>
> Do you perhaps know of an alternative solution?
>
> Thank you,
> Vasily Averin
* Re: troubles caused by conntrack overlimit in init_netns
2022-04-02 17:12 ` Eric Dumazet
@ 2022-04-02 18:32 ` Vasily Averin
2022-04-02 18:50 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Vasily Averin @ 2022-04-02 18:32 UTC (permalink / raw)
To: Eric Dumazet, Pablo Neira Ayuso, Florian Westphal, edumazet
Cc: netfilter-devel, kernel
On 4/2/22 20:12, Eric Dumazet wrote:
>
> On 4/2/22 03:33, Vasily Averin wrote:
>> Pablo, Florian,
>>
>> There is an old issue with conntrack limit on multi-netns (read container) nodes.
>>
>> Any connection to containers hosted on the node creates a conntrack in init_netns.
>> If the number of conntrack in init_netns reaches the limit, the whole node becomes
>> unavailable.
>
> Can you describe network topology ?
              += veth1 <=> veth container1
ethX <=> brX =+= veth2 <=> veth container2
              += vethX <=> veth containerX
> Are you using macvlan, ipvlan, or something else ?
No, we did not use it earlier, because it was not available in RHEL7,
but now it looks like a good solution to me.
Thank you for the hint,
Vasily Averin
* Re: troubles caused by conntrack overlimit in init_netns
2022-04-02 18:32 ` Vasily Averin
@ 2022-04-02 18:50 ` Eric Dumazet
2022-04-02 19:52 ` Vasily Averin
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2022-04-02 18:50 UTC (permalink / raw)
To: Vasily Averin
Cc: Eric Dumazet, Pablo Neira Ayuso, Florian Westphal,
netfilter-devel, kernel
On Sat, Apr 2, 2022 at 11:32 AM Vasily Averin <vasily.averin@linux.dev> wrote:
>
> On 4/2/22 20:12, Eric Dumazet wrote:
> >
> > On 4/2/22 03:33, Vasily Averin wrote:
> >> Pablo, Florian,
> >>
> >> There is an old issue with conntrack limit on multi-netns (read container) nodes.
> >>
> >> Any connection to containers hosted on the node creates a conntrack in init_netns.
> >> If the number of conntrack in init_netns reaches the limit, the whole node becomes
> >> unavailable.
> >
> > Can you describe network topology ?
>
>               += veth1 <=> veth container1
> ethX <=> brX =+= veth2 <=> veth container2
>               += vethX <=> veth containerX
>
Could you simply add an iptables rule in init_net to bypass conntrack
for idev=veth* ?
iptables -t raw -I PREROUTING -i veth+ -j NOTRACK
(I have not worked with conntrack in recent years, this might be foolish...)
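A hedged sketch extending the rule above to both directions ("coming from/going to" the containers, as suggested earlier in the thread). It only prints the commands so they can be reviewed before use. The function name notrack_rules and the subnet 192.0.2.0/24 are placeholders and assume the containers sit in a dedicated subnet. In a bridged setup the input interface seen by iptables may be the bridge rather than the veth port, so the interface pattern may need adjusting; this has not been verified here.

```shell
#!/bin/sh
# Print (do not apply) raw-table rules that exempt container traffic
# from connection tracking in init_net.
#   $1: interface pattern for the host-side veth devices (e.g. veth+)
#   $2: container subnet (192.0.2.0/24 in the usage line is a placeholder)
notrack_rules() {
    ifpat="$1"
    subnet="$2"
    # Packets entering the host from a container-facing veth device.
    echo "iptables -t raw -I PREROUTING -i $ifpat -j NOTRACK"
    # Packets arriving from elsewhere and addressed to a container.
    echo "iptables -t raw -I PREROUTING -d $subnet -j NOTRACK"
}

# Review the output first; to apply it, pipe it to a root shell:
#   notrack_rules 'veth+' 192.0.2.0/24 | sh
```

Untracked packets no longer match state-based rules in init_net, so any host firewall rules that rely on conntrack state for container traffic would need to be revisited.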
> > Are you using macvlan, ipvlan, or something else ?
>
> No, we did not use it earlier, because it was not available in RHEL7,
> but now it looks like a good solution to me.
>
> Thank you for the hint,
> Vasily Averin
* Re: troubles caused by conntrack overlimit in init_netns
2022-04-02 18:50 ` Eric Dumazet
@ 2022-04-02 19:52 ` Vasily Averin
0 siblings, 0 replies; 8+ messages in thread
From: Vasily Averin @ 2022-04-02 19:52 UTC (permalink / raw)
To: Eric Dumazet
Cc: Eric Dumazet, Pablo Neira Ayuso, Florian Westphal,
netfilter-devel, kernel
On 4/2/22 21:50, Eric Dumazet wrote:
> On Sat, Apr 2, 2022 at 11:32 AM Vasily Averin <vasily.averin@linux.dev> wrote:
>>
>> On 4/2/22 20:12, Eric Dumazet wrote:
>>>
>>> On 4/2/22 03:33, Vasily Averin wrote:
>>>> Pablo, Florian,
>>>>
>>>> There is an old issue with conntrack limit on multi-netns (read container) nodes.
>>>>
>>>> Any connection to containers hosted on the node creates a conntrack in init_netns.
>>>> If the number of conntrack in init_netns reaches the limit, the whole node becomes
>>>> unavailable.
>>>
>>> Can you describe network topology ?
>>
>>               += veth1 <=> veth container1
>> ethX <=> brX =+= veth2 <=> veth container2
>>               += vethX <=> veth containerX
>>
>
> Could you simply add an iptables rule in init_net to bypass conntrack
> for idev=veth* ?
>
> iptables -t raw -I PREROUTING -i veth+ -j NOTRACK
>
> (I have not worked with conntrack in recent years, this might be foolish...)
Great and simple idea.
Thank you very much, we'll investigate it.
Vasily Averin
* Re: troubles caused by conntrack overlimit in init_netns
2022-04-02 11:11 ` Florian Westphal
2022-04-02 13:00 ` Nikita Yushchenko
@ 2022-04-04 7:59 ` Vasily Averin
1 sibling, 0 replies; 8+ messages in thread
From: Vasily Averin @ 2022-04-04 7:59 UTC (permalink / raw)
To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel, kernel
On 4/2/22 14:11, Florian Westphal wrote:
> But, why do you need conntrack in the container netns?
> Normally I'd expect that if packet was already handled in init_net,
> why re-run skb through conntrack again?
OpenVz and LXC containers are used for hosting:
init_netns is controlled by the hoster's admins for system-wide purposes, while
the container is under the control of the end user, who can configure
any rules for the internal firewall.
Thank you,
Vasily Averin