netfilter-devel.vger.kernel.org archive mirror
From: Topi Miettinen <toiwoton@gmail.com>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: Support for loading firewall rules with cgroup(v2) expressions early
Date: Thu, 31 Mar 2022 18:10:19 +0300
Message-ID: <dbbe9ff4-4ec8-b979-9a35-7f79b3fbb9cb@gmail.com>
In-Reply-To: <YkTP40PPDCJSObeH@salvia>

On 31.3.2022 0.47, Pablo Neira Ayuso wrote:
> On Wed, Mar 30, 2022 at 07:37:00PM +0300, Topi Miettinen wrote:
>> On 30.3.2022 1.25, Pablo Neira Ayuso wrote:
>>> On Tue, Mar 29, 2022 at 09:20:25PM +0300, Topi Miettinen wrote:
>>>> On 28.3.2022 18.05, Pablo Neira Ayuso wrote:
>>>>> On Mon, Mar 28, 2022 at 05:08:32PM +0300, Topi Miettinen wrote:
>>>>>> On 28.3.2022 0.31, Pablo Neira Ayuso wrote:
>>>>>>> On Sat, Mar 26, 2022 at 12:09:26PM +0200, Topi Miettinen wrote:
>>>>> [...]
>>>>>> But I think that with this approach, depending on system load, there could
>>>>>> be a vulnerable time window where the rules aren't loaded yet but the
>>>>>> process which is supposed to be protected by the rules has already started
>>>>>> running. This isn't desirable for firewalls, so I'd like to have a way for
>>>>>> loading the firewall rules as early as possible.
>>>>>
>>>>> You could define a static ruleset which creates the table, basechain
>>>>> and the cgroupv2 verdict map. Then, systemd updates this map with new
>>>>> entries to match on cgroupsv2 and apply the corresponding policy for
>>>>> this process, and delete it when not needed anymore. You have to
>>>>> define one non-basechain for each cgroupv2 policy.
>>>>
>>>> Actually this seems to work:
>>>>
>>>> table inet filter {
>>>>           set cg {
>>>>                   typeof socket cgroupv2 level 0
>>>>           }
>>>>
>>>>           chain y {
>>>>                   socket cgroupv2 level 2 @cg accept
>>>>                   counter drop
>>>>           }
>>>> }
>>>>
>>>> Simulating systemd adding the cgroup of a service to the set:
>>>> # nft add element inet filter cg "system.slice/systemd-resolved.service"
>>>>
>>>> Cgroup ID (inode number of the cgroup) has been successfully added:
>>>> # nft list set inet filter cg
>>>>           set cg {
>>>>                   typeof socket cgroupv2 level 0
>>>>                   elements = { 6032 }
>>>>           }
>>>> # ls -id /sys/fs/cgroup/system.slice/systemd-resolved.service
>>>> 6032 /sys/fs/cgroup/system.slice/systemd-resolved.service/
>>>
>>> You could define a ruleset that describes the policy following the
>>> cgroupsv2 hierarchy. Something like this:
>>>
>>>    table inet filter {
>>>           map dict_cgroup_level_1 {
>>>                   type cgroupsv2 : verdict;
>>>                   elements = { "system.slice" : jump system_slice }
>>>           }
>>>
>>>           map dict_cgroup_level_2 {
>>>                   type cgroupsv2 : verdict;
>>>                   elements = { "system.slice/systemd-timesyncd.service" : jump systemd_timesyncd }
>>>           }
>>>
>>>           chain systemd_timesyncd {
>>>                   # systemd-timesyncd policy
>>>           }
>>>
>>>           chain system_slice {
>>>                   socket cgroupv2 level 2 vmap @dict_cgroup_level_2
>>>                   # policy for system.slice process
>>>           }
>>>
>>>           chain input {
>>>                   type filter hook input priority filter; policy drop;
>>>                   socket cgroupv2 level 1 vmap @dict_cgroup_level_1
>>>           }
>>>    }
>>>
>>> The per-level dictionaries allow you to mimic the cgroupsv2 tree
>>> hierarchy.
>>>
>>> This allows you to attach a default policy for processes that belong
>>> to "system.slice" (at level 1). This might also be useful when a
>>> process in "system.slice" does not yet have an explicit level 2
>>> policy, so that the level 1 policy applies in that case.
>>>
>>> You might want to apply the level 1 policy before the level 2 policy
>>> (i.e. aggregate policies per level as you move searching for an exact
>>> cgroup match), or instead you might prefer to search for an exact
>>> match at level 2, and otherwise backtrack to the closest matching
>>> cgroupsv2 for this process.
>>
>> Nice ideas, but the rules can't be loaded before the cgroups are realized at
>> early boot:
>>
>> Mar 30 19:14:45 systemd[1]: Starting nftables...
>> Mar 30 19:14:46 nft[1018]: /etc/nftables.conf:305:5-44: Error: cgroupv2 path fails: Permission denied
>> Mar 30 19:14:46 nft[1018]: "system.slice/systemd-timesyncd.service" : jump systemd_timesyncd
>> Mar 30 19:14:46 nft[1018]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> Mar 30 19:14:46 systemd[1]: nftables.service: Main process exited, code=exited, status=1/FAILURE
>> Mar 30 19:14:46 systemd[1]: nftables.service: Failed with result 'exit-code'.
>> Mar 30 19:14:46 systemd[1]: Failed to start nftables.
> 
> I guess this unit file performs nft -f on cgroupsv2 that do not exist
> yet.

Yes, that's the case. Being able to do so, for example with a
"cgroupsv2name" expression, would be nice.

> Could you just load the base policy with empty dictionaries instead,
> then track and register the cgroups into the ruleset as they are being
> created/removed?

That's possible, and I'll probably make a PR for systemd for such a
feature. But I don't think that's the best solution: if the nft rules
are loaded from an initrd where systemd is not running (the initrd is
not built by dracut), the rules won't work, not even for the top-level
"system.slice" and "user.slice". Network connectivity in the initrd
could then be a problem. Also, I don't know whether that model would
scale to unprivileged user services or containers. A userspace daemon
feeding the kernel information that it already knows seems a bit
inelegant.
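
For reference, the empty-dictionary approach suggested above could be
sketched roughly as follows. This is only an illustration under some
assumptions: the table, map and chain names are taken from the ruleset
quoted earlier in this thread, the helper functions (base_ruleset,
register_cgroup, unregister_cgroup) are hypothetical, and the script
only prints nft commands rather than running them.

```shell
#!/bin/sh
# Sketch only: prints nft input instead of applying it.
# To apply, pipe the output to "nft -f -" as root.

# Base policy. The maps start empty, so no cgroup path has to be
# resolved at the time this is loaded.
base_ruleset() {
cat <<'EOF'
table inet filter {
        map dict_cgroup_level_1 {
                type cgroupsv2 : verdict;
        }

        map dict_cgroup_level_2 {
                type cgroupsv2 : verdict;
        }

        chain systemd_timesyncd {
                # systemd-timesyncd policy
        }

        chain system_slice {
                socket cgroupv2 level 2 vmap @dict_cgroup_level_2
                # policy for system.slice processes
        }

        chain input {
                type filter hook input priority filter; policy drop;
                socket cgroupv2 level 1 vmap @dict_cgroup_level_1
        }
}
EOF
}

# Hypothetical helper: emit the command that registers one cgroup.
# usage: register_cgroup MAP CGROUP_PATH CHAIN
register_cgroup() {
        printf 'add element inet filter %s { "%s" : jump %s }\n' "$1" "$2" "$3"
}

# Hypothetical helper: emit the command that removes one cgroup.
# usage: unregister_cgroup MAP CGROUP_PATH
unregister_cgroup() {
        printf 'delete element inet filter %s { "%s" }\n' "$1" "$2"
}
```

At boot one would load the output of base_ruleset, and later, once a
cgroup exists, something like
"register_cgroup dict_cgroup_level_1 system.slice system_slice | nft -f -"
would add the entry. Whoever tracks cgroup creation and removal still
has to call these at the right time, which is the part I find inelegant.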

-Topi

>>> There is also the jump and goto semantics for chains that can be
>>> combined in this chain tree.
>>>
>>> BTW, what nftables version are you using? My listing does not show
>>> i-nodes, instead it shows the path.
>>
>> Debian version: 1.0.2-1. The inode numbers seem to be caused by my SELinux
>> policy. Disabling it shows the paths:
>>
>>          map dict_cgroup_level_2_sys {
>>                  type cgroupsv2 : verdict
>>                  elements = { 5132 : jump systemd_timesyncd }
>>          }
>>
>>          map dict_cgroup_level_1 {
>>                  type cgroupsv2 : verdict
>>                  elements = { "system.slice" : jump system_slice,
>>                               "user.slice" : jump user_slice }
>>          }
>>
>> Above, "system.slice/systemd-timesyncd.service" is shown as a number
>> because the cgroup ID became stale when I restarted the service. I
>> think the policy no longer works at that point.
> 
> Yes, you have to refresh your policy on cgroupsv2 updates.
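
A rough way to detect such a stale entry, based on the observation
earlier in the thread that the cgroup ID is the inode number of the
cgroup directory. This is a sketch under assumptions: cgroup2 is
assumed to be mounted at /sys/fs/cgroup (overridable via the CGROOT
variable), and the function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: cgroup2 is mounted here; override CGROOT otherwise.
CGROOT=${CGROOT:-/sys/fs/cgroup}

# Print the current ID (inode number) of a cgroup, given its path
# relative to CGROOT. Fails if the cgroup does not exist.
cgroup_id() {
        [ -d "$CGROOT/$1" ] || return 1
        ls -id "$CGROOT/$1" | awk '{ print $1 }'
}

# Succeed if the stored ID no longer matches the cgroup at that path,
# or if the cgroup is gone.
# usage: cgroup_is_stale STORED_ID CGROUP_PATH
cgroup_is_stale() {
        current=$(cgroup_id "$2") || return 0
        [ "$current" != "$1" ]
}
```

For the listing above, "cgroup_is_stale 5132
system.slice/systemd-timesyncd.service" would succeed once the service
has been restarted and its cgroup recreated with a new inode, at which
point the old element has to be deleted and the path added again.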


