troubles caused by conntrack overlimit in init_netns

netfilter-devel.vger.kernel.org archive mirror

* troubles caused by conntrack overlimit in init_netns
@ 2022-04-02 10:33 Vasily Averin
  2022-04-02 11:11 ` Florian Westphal
  2022-04-02 17:12 ` Eric Dumazet
  0 siblings, 2 replies; 8+ messages in thread
From: Vasily Averin @ 2022-04-02 10:33 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal; +Cc: netfilter-devel, kernel

Pablo, Florian,

There is an old issue with conntrack limit on multi-netns (read container) nodes.

Any connection to containers hosted on the node creates a conntrack in init_netns.
If the number of conntrack in init_netns reaches the limit, the whole node becomes
unavailable.
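
How close the host is to the limit can be checked with a minimal sketch like
the one below. It assumes the standard nf_conntrack sysctl files under
/proc/sys/net/netfilter; the function name conntrack_usage and its optional
path arguments are hypothetical and exist only to make the sketch testable.

```shell
#!/bin/sh
# Report conntrack table usage for the network namespace the shell runs in.
# The default paths are the usual nf_conntrack sysctls; the function name
# and the optional path arguments are illustrative only.
conntrack_usage() {
    count_file=${1:-/proc/sys/net/netfilter/nf_conntrack_count}
    max_file=${2:-/proc/sys/net/netfilter/nf_conntrack_max}
    count=$(cat "$count_file") || return 1
    max=$(cat "$max_file") || return 1
    # Refuse to divide by zero if the limit reads as 0.
    [ "$max" -gt 0 ] || return 1
    # Integer percentage, rounded down.
    echo "$count/$max ($((count * 100 / max))%)"
}
```

Running conntrack_usage with no arguments on the host reports the numbers for
init_netns; a value close to 100% is the situation described above.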

To avoid this, OpenVz carried special patches that disabled conntrack in init_netns
on OpenVz nodes, but this automatically limits the functionality of the host's firewall.

This has been our specific pain for many years. However, containers are now
being used much more widely than before, and the severity of the described problem
keeps growing.

Do you know perhaps some alternative solution?

Thank you,
	Vasily Averin

* Re: troubles caused by conntrack overlimit in init_netns
  2022-04-02 10:33 troubles caused by conntrack overlimit in init_netns Vasily Averin
@ 2022-04-02 11:11 ` Florian Westphal
  2022-04-02 13:00   ` Nikita Yushchenko
  2022-04-04  7:59   ` Vasily Averin
  2022-04-02 17:12 ` Eric Dumazet
  1 sibling, 2 replies; 8+ messages in thread
From: Florian Westphal @ 2022-04-02 11:11 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel, kernel

Vasily Averin <vvs@openvz.org> wrote:
> There is an old issue with conntrack limit on multi-netns (read container) nodes.
> 
> Any connection to containers hosted on the node creates a conntrack in init_netns.
> If the number of conntrack in init_netns reaches the limit, the whole node becomes
> unavailable.

Right, from the init_net point of view, connections coming from a container
netns are no different from those coming from a different physical host on
the physical network.

> To avoid it OpenVz had special patches disabled conntracks on init_ns on openvz nodes, 
> but this automatically limits the functionality of host's firewall.
> 
> This has been our specific pain for many years, however, containers are now 
> being used much more widely than before, and the severity of the described problem
> is growing more and more.
> 
> Do you know perhaps some alternative solution?

If you need conntrack in init_net, then no.

If you don't (or only for connections that won't be rerouted to
container netns) you could -j NOTRACK traffic coming from/going to
container.
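
A minimal sketch of what such rules could look like, assuming the containers
sit behind one known subnet (10.0.0.0/24 in the usage note is a made-up
example). The helper notrack_rules is hypothetical; it only prints the
commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print raw-table rules that exempt forwarded container traffic from
# connection tracking in the host namespace. SUBNET is an assumption:
# the address range used by the containers.
notrack_rules() {
    subnet=${1:-}
    if [ -z "$subnet" ]; then
        echo "usage: notrack_rules SUBNET" >&2
        return 1
    fi
    # Packets sent by containers.
    echo "iptables -t raw -A PREROUTING -s $subnet -j NOTRACK"
    # Packets addressed to containers.
    echo "iptables -t raw -A PREROUTING -d $subnet -j NOTRACK"
}
```

For example, `notrack_rules 10.0.0.0/24 | sh` would apply both rules. Untracked
packets no longer match stateful rules such as `-m conntrack --ctstate
ESTABLISHED` and cannot be NATed in init_net, which is the trade-off behind
"if you need conntrack in init_net, then no".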

But why do you need conntrack in the container netns?
Normally I'd expect that if a packet was already handled in init_net,
there is no need to re-run the skb through conntrack again.


* Re: troubles caused by conntrack overlimit in init_netns
  2022-04-02 11:11 ` Florian Westphal
@ 2022-04-02 13:00   ` Nikita Yushchenko
  2022-04-04  7:59   ` Vasily Averin
  1 sibling, 0 replies; 8+ messages in thread
From: Nikita Yushchenko @ 2022-04-02 13:00 UTC (permalink / raw)
  To: Florian Westphal, Vasily Averin
  Cc: Pablo Neira Ayuso, netfilter-devel, kernel

Hi

 > But, why do you need conntrack in the container netns?

A container can hold a default installation of a Linux distro, which often enables a firewall and adds
rules that trigger conntrack. But that is inside the container's netns. It is accounted separately and
is not part of the issue being discussed.

The issue is that traffic directed to container(s) eats up the host's conntrack limit and causes the
host to be inaccessible via the network.

 >> Do you know perhaps some alternative solution?
 >
 > If you need conntrack in init_net, then no.

I suppose this can be fixed by something like:
- adding some "window" on top of host's conntrack limit,
- if out of conntracks, but still within window, conntrack shall be created but marked,
- mark could be removed by a dedicated netfilter rule,
- attaching more than one skb to marked conntrack shall be blocked (i.e. packet dropped instead),
- if skb pointing to a marked conntrack is about to leave the stack (that is, either being delivered 
locally, or queued for transmitting out), it shall be dropped instead, and conntrack removed.

This shall give the host's admin a way to explicitly configure a packet path to use as a host management
interface that will stay accessible even if containers eat up the conntrack limit.
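
The decision logic of the proposal can be modelled as a rough sketch. Nothing
like this exists in the kernel; the function names, the verdict strings and
the three numeric parameters are all hypothetical and only restate the list
above.

```shell
#!/bin/sh
# Verdict for a packet that needs a new conntrack entry.
#   $1 = current number of entries
#   $2 = configured limit
#   $3 = size of the extra "window" on top of the limit
admit_conntrack() {
    count=$1 max=$2 window=$3
    if [ "$count" -lt "$max" ]; then
        echo create            # below the limit: normal behaviour
    elif [ "$count" -lt $((max + window)) ]; then
        echo create-marked     # over the limit but inside the window
    else
        echo drop              # window exhausted as well
    fi
}

# Verdict when an skb is about to leave the stack.
#   $1 = "marked" if no rule removed the mark, anything else otherwise
exit_verdict() {
    if [ "$1" = marked ]; then
        echo drop-and-free     # drop the packet and remove the conntrack
    else
        echo accept
    fi
}
```

With a limit of 100 and a window of 20, `admit_conntrack 105 100 20` prints
`create-marked`; the entry survives only if a dedicated rule clears the mark
before `exit_verdict` is consulted.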

Is that reasonable?  Maybe I can try to implement that...

Nikita

* Re: troubles caused by conntrack overlimit in init_netns
  2022-04-02 10:33 troubles caused by conntrack overlimit in init_netns Vasily Averin
  2022-04-02 11:11 ` Florian Westphal
@ 2022-04-02 17:12 ` Eric Dumazet
  2022-04-02 18:32   ` Vasily Averin
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2022-04-02 17:12 UTC (permalink / raw)
  To: Vasily Averin, Pablo Neira Ayuso, Florian Westphal, edumazet
  Cc: netfilter-devel, kernel


On 4/2/22 03:33, Vasily Averin wrote:
> Pablo, Florian,
>
> There is an old issue with conntrack limit on multi-netns (read container) nodes.
>
> Any connection to containers hosted on the node creates a conntrack in init_netns.
> If the number of conntrack in init_netns reaches the limit, the whole node becomes
> unavailable.


Can you describe the network topology?

Are you using macvlan, ipvlan, or something else?


> To avoid it OpenVz had special patches disabled conntracks on init_ns on openvz nodes,
> but this automatically limits the functionality of host's firewall.
>
> This has been our specific pain for many years, however, containers are now
> being used much more widely than before, and the severity of the described problem
> is growing more and more.
>
> Do you know perhaps some alternative solution?
>
> Thank you,
> 	Vasily Averin

* Re: troubles caused by conntrack overlimit in init_netns
  2022-04-02 17:12 ` Eric Dumazet
@ 2022-04-02 18:32   ` Vasily Averin
  2022-04-02 18:50     ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Vasily Averin @ 2022-04-02 18:32 UTC (permalink / raw)
  To: Eric Dumazet, Pablo Neira Ayuso, Florian Westphal, edumazet
  Cc: netfilter-devel, kernel

On 4/2/22 20:12, Eric Dumazet wrote:
> 
> On 4/2/22 03:33, Vasily Averin wrote:
>> Pablo, Florian,
>>
>> There is an old issue with conntrack limit on multi-netns (read container) nodes.
>>
>> Any connection to containers hosted on the node creates a conntrack in init_netns.
>> If the number of conntrack in init_netns reaches the limit, the whole node becomes
>> unavailable.
> 
> Can you describe network topology ?

              += veth1 <=> veth container1
ethX <=> brX =+= veth2 <=> veth container2
              += vethX <=> veth containerX

> Are you using macvlan, ipvlan, or something else ?

No, we did not use it earlier because it was not available in RHEL7,
but now it looks like a good solution to me.
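
A minimal sketch of what the macvlan variant could look like, assuming each
container has a named network namespace. The helper macvlan_cmds and the
device and namespace names in the usage note are hypothetical; the helper only
prints the ip(8) commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print the commands that create a macvlan device on top of a physical
# interface and hand it to a container's network namespace.
#   $1 = parent interface on the host (ethX in the diagram above)
#   $2 = name for the new macvlan device
#   $3 = name of the container's network namespace
macvlan_cmds() {
    parent=${1:-} dev=${2:-} netns=${3:-}
    if [ -z "$parent" ] || [ -z "$dev" ] || [ -z "$netns" ]; then
        echo "usage: macvlan_cmds PARENT DEV NETNS" >&2
        return 1
    fi
    # Create the macvlan device in bridge mode on top of the parent.
    echo "ip link add link $parent name $dev type macvlan mode bridge"
    # Move it into the container's namespace.
    echo "ip link set dev $dev netns $netns"
    # Bring it up from inside that namespace.
    echo "ip -n $netns link set dev $dev up"
}
```

For example, `macvlan_cmds ethX mv1 container1 | sh` would replace the
brX/veth1 leg for container1. The expectation, presumably the point of the
hint, is that such traffic is handed to the container's namespace without
creating conntrack entries in init_netns.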

Thank you for the hint,
	Vasily Averin

* Re: troubles caused by conntrack overlimit in init_netns
  2022-04-02 18:32   ` Vasily Averin
@ 2022-04-02 18:50     ` Eric Dumazet
  2022-04-02 19:52       ` Vasily Averin
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2022-04-02 18:50 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Eric Dumazet, Pablo Neira Ayuso, Florian Westphal,
	netfilter-devel, kernel

On Sat, Apr 2, 2022 at 11:32 AM Vasily Averin <vasily.averin@linux.dev> wrote:
>
> On 4/2/22 20:12, Eric Dumazet wrote:
> >
> > On 4/2/22 03:33, Vasily Averin wrote:
> >> Pablo, Florian,
> >>
> >> There is an old issue with conntrack limit on multi-netns (read container) nodes.
> >>
> >> Any connection to containers hosted on the node creates a conntrack in init_netns.
> >> If the number of conntrack in init_netns reaches the limit, the whole node becomes
> >> unavailable.
> >
> > Can you describe network topology ?
>
>               += veth1 <=> veth container1
> ethX <=> brX =+= veth2 <=> veth container2
>               += vethX <=> veth containerX
>

Could you simply add an iptables rule in init_net to bypass conntrack
for idev=veth* ?

iptables -t raw -I PREROUTING -i veth+ -j NOTRACK

(I have not worked with conntrack in recent years, this might be foolish...)
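
If the rule is applied from a script that may run more than once, a sketch
like the following avoids stacking duplicate copies. It relies on the -C
(check) option of iptables; the function name ensure_veth_notrack and the
IPTABLES override variable are hypothetical.

```shell
#!/bin/sh
# Insert the veth NOTRACK rule only if it is not already present.
# IPTABLES may be set to another binary (for example an nft-based
# iptables build); that variable is an assumption of this sketch.
ensure_veth_notrack() {
    ipt=${IPTABLES:-iptables}
    # -C exits with status 0 when an identical rule already exists.
    if "$ipt" -t raw -C PREROUTING -i veth+ -j NOTRACK 2>/dev/null; then
        echo "already present"
    else
        "$ipt" -t raw -I PREROUTING -i veth+ -j NOTRACK && echo "inserted"
    fi
}
```

Called as root, `ensure_veth_notrack` prints `inserted` the first time and
`already present` on later runs.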

> > Are you using macvlan, ipvlan, or something else ?
>
> No, we did not use it earlier because it was not available in RHEL7,
> but now it looks like a good solution to me.
>
> Thank you for the hint,
>         Vasily Averin

* Re: troubles caused by conntrack overlimit in init_netns
  2022-04-02 18:50     ` Eric Dumazet
@ 2022-04-02 19:52       ` Vasily Averin
  0 siblings, 0 replies; 8+ messages in thread
From: Vasily Averin @ 2022-04-02 19:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, Pablo Neira Ayuso, Florian Westphal,
	netfilter-devel, kernel

On 4/2/22 21:50, Eric Dumazet wrote:
> On Sat, Apr 2, 2022 at 11:32 AM Vasily Averin <vasily.averin@linux.dev> wrote:
>>
>> On 4/2/22 20:12, Eric Dumazet wrote:
>>>
>>> On 4/2/22 03:33, Vasily Averin wrote:
>>>> Pablo, Florian,
>>>>
>>>> There is an old issue with conntrack limit on multi-netns (read container) nodes.
>>>>
>>>> Any connection to containers hosted on the node creates a conntrack in init_netns.
>>>> If the number of conntrack in init_netns reaches the limit, the whole node becomes
>>>> unavailable.
>>>
>>> Can you describe network topology ?
>>
>>               += veth1 <=> veth container1
>> ethX <=> brX =+= veth2 <=> veth container2
>>               += vethX <=> veth containerX
>>
> 
> Could you simply add an iptables rule in init_net to bypass conntrack
> for idev=veth* ?
> 
> iptables -t raw -I PREROUTING -i veth+ -j NOTRACK
> 
> (I have not worked with conntrack in recent years, this might be foolish...)

Great and simple idea.
Thank you very much, we'll investigate it.

	Vasily Averin

* Re: troubles caused by conntrack overlimit in init_netns
  2022-04-02 11:11 ` Florian Westphal
  2022-04-02 13:00   ` Nikita Yushchenko
@ 2022-04-04  7:59   ` Vasily Averin
  1 sibling, 0 replies; 8+ messages in thread
From: Vasily Averin @ 2022-04-04  7:59 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel, kernel

On 4/2/22 14:11, Florian Westphal wrote:
> But, why do you need conntrack in the container netns?
> Normally I'd expect that if packet was already handled in init_net,
> why re-run skb through conntrack again?

OpenVz and LXC containers are used for hosting:
init_netns is controlled by the hoster's admins for system-wide purposes,
while the container is under the control of the end user, who can configure
any rules for its internal firewall.

Thank you,
	Vasily Averin

