* nftables.service hardening ideas
@ 2025-10-27  3:36 Christoph Anton Mitterer
  2025-10-28 16:26 ` Florian Westphal
  2025-10-30 23:30 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 7+ messages in thread
From: Christoph Anton Mitterer @ 2025-10-27  3:36 UTC (permalink / raw)
  To: netfilter-devel

Hey.


This would be ideas about further hardening nftables.service, primarily
using the options from systemd.exec(5).


1. The current .service already has:
> ProtectSystem=full

This can be further tightened to:
> ProtectSystem=strict

which not only mounts some but the entire fs hierarchy read-only for
the service's commands.
I guess nft -f should never write anywhere, or does it? At least it
seems to work.

Setting ProtectSystem= already enables file system namespacing in the
same way PrivateMounts= does, so the latter need not be set explicitly.


2. As per ProtectSystem=’s documentation, which refers to
ReadOnlyPaths=’s:
- I assume nft never mounts anything (that wouldn't be propagated back
  to the main namespace).
- ReadOnlyPaths= docs say, either
    CapabilityBoundingSet=~CAP_SYS_ADMIN
  and/or
    SystemCallFilter=~@mount
  shall be set, or a process can undo any ReadOnlyPaths= (and thus also
  ProtectSystem= and others).

The SystemCallFilter= docs extend this recommendation to any use of
PrivateTmp=, PrivateDevices=, ProtectSystem=, ProtectHome=,
ProtectKernelTunables=, ProtectControlGroups=, ProtectKernelLogs=,
ProtectClock=, ReadOnlyPaths=, InaccessiblePaths= and ReadWritePaths=.

We might actually set both:
> CapabilityBoundingSet=~CAP_SYS_ADMIN 
> SystemCallFilter=~@mount

(for CapabilityBoundingSet= even more, see below).


3. When SystemCallFilter= is used, it is in turn recommended to also set:
> SystemCallArchitectures=native

I'd guess nft doesn't use syscalls from other archs?!
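
Taken together, points (1) to (3) could go into a drop-in along the
lines of the sketch below. The file name is hypothetical (any *.conf
under nftables.service.d/ would do), and nothing here is tested beyond
what is said above.

```ini
# /etc/systemd/system/nftables.service.d/10-fs.conf  (hypothetical name)
[Service]
# Mount the whole hierarchy read-only, except the API file systems
# /dev, /proc and /sys.
ProtectSystem=strict
# Keep the service from undoing the read-only mounts.
CapabilityBoundingSet=~CAP_SYS_ADMIN
SystemCallFilter=~@mount
# Do not let a non-native ABI bypass the syscall filter.
SystemCallArchitectures=native
```

After a "systemctl daemon-reload", "systemd-analyze security
nftables.service" should show which of these settings are in effect.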


4. I guess nft needs no capabilities/privileges other than
CAP_NET_ADMIN:
> CapabilityBoundingSet=CAP_NET_ADMIN
> AmbientCapabilities=""
> NoNewPrivileges=yes

CapabilityBoundingSet=CAP_NET_ADMIN would supersede the =~CAP_SYS_ADMIN
from (2) above.

AmbientCapabilities="" disables all ambient capabilities, I'd blindly
guess that nftables doesn't execve(),... but it shouldn't harm either.
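
Written out as a unit fragment, (4) might look like the sketch below.
Two assumptions are made that are not in the text above. First, the
bounding set is reset with an empty assignment before the allow-list
line, because as far as systemd.exec(5) describes it, repeated
CapabilityBoundingSet= lines are merged rather than replaced, so an
earlier =~CAP_SYS_ADMIN line would otherwise leave most capabilities in
the set. Second, the ambient set is cleared with an empty assignment,
which is the form the man page documents; whether "" parses the same
way is not checked here.

```ini
[Service]
# Start from an empty bounding set, then allow only CAP_NET_ADMIN.
CapabilityBoundingSet=
CapabilityBoundingSet=CAP_NET_ADMIN
# Empty assignment: reset the ambient set to the empty capability set.
AmbientCapabilities=
# No privilege gain through setuid bits or file capabilities.
NoNewPrivileges=yes
```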


5. There should be no reason why nft -f needs to access stuff in /tmp
or /var/tmp of anything else, so:
> PrivateTmp=yes

Note, however, that this makes /tmp/ and /var/tmp/ writable again
(despite the ProtectSystem=strict).

Even safer would be:
> PrivateTmp=disconnected

but then we also need (because we have DefaultDependencies=no and some
other conditions fulfilled):
> RequiresMountsFor=/var

or /var/tmp could "leak out".
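
As a sketch, the two variants of (5) might be written as below. It
assumes the tmpfs-backed mode is spelled "disconnected", as in recent
systemd.exec(5), and it is untested. Note that RequiresMountsFor=
belongs in [Unit], not in [Service].

```ini
[Unit]
# Only relevant for the second variant together with
# DefaultDependencies=no.
RequiresMountsFor=/var

[Service]
# Variant 1: private directories, backed by the host's /tmp and /var/tmp.
#PrivateTmp=yes
# Variant 2: private directories on a fresh tmpfs.
PrivateTmp=disconnected
```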


6. I'd guess nft -f never changes the clocks or directly reads/writes
to the kernel log buffer (or does it):
> ProtectClock=yes
> ProtectKernelLogs=yes

This also blocks syslog(2) (but not syslog(3)).


7. AFAICS, nft -f may cause modules to be loaded, but that's done
indirectly (i.e. I guess by the kernel itself?), so we can
> ProtectKernelModules=yes

Also makes /usr/lib/modules inaccessible, should that be used somehow.


8. I guess nft -f doesn't use devices (other than some standard ones
like /dev/null, etc.), IPC, RT nor namespaces and doesn't set
SUID/SGID:
> PrivateDevices=yes
> PrivateIPC=yes
> RestrictNamespaces=yes
> RestrictRealtime=yes
> RestrictSUIDSGID=yes


9. Does nftables use BPF or personalities? If not:
> PrivateBPF=yes
> LockPersonality=yes


10. A bit obscure perhaps...
> OOMScoreAdjust=-1000
or:
> OOMScoreAdjust=-999

The idea would be that nftables.service is security critical and should
rather not be OOMkilled in memory tight situations, also since it's
oneshot it would anyway give resources back soon.

Along with that one might set:
> OOMPolicy=kill

the default action is variable and set in system.conf (where it
defaults to stop).
Better a safe kill than sorry?!


Except perhaps for 10, the above things are IMO not totally
unreasonable.
So let's get a bit more exotic ;-)


11. Restrict executable paths:
> NoExecPaths=/
> ExecPaths=/usr/sbin/nft -/lib -/usr/lib

Not sure whether we'd also need to add locations lib32, lib64, libx32.
The above at least works on my amd64 Debian.


12. Make unneeded pathnames inaccessible:
> InaccessiblePaths=-/boot -/media -/mnt -/opt -/proc -/srv -/sys -/var
> TemporaryFileSystem=/etc:ro
> BindReadOnlyPaths=/etc/protocols /etc/services /etc/passwd /etc/group /etc/nftables /etc/nftables.conf /etc/resolv.conf

- Some of these are already readonly, but why allow even that if not
  needed?!
- /dev, /home, /root and /tmp are already secured via other options
- I consider /bin/, /lib*, /sbin, /usr to never contain anything
  sensitive.
  But perhaps /usr/local could be blocked (might contain private code).
- /proc/ and /sys are seemingly not needed.
- /run is apparently needed, so cannot be blocked
- /etc/ is needed, but not all of it, so I make a tmpfs mount on it,
  and bindmount only the needed stuff.
  - The above uses the Debian config /etc/nftables.conf and what
    upstream nftables.service would use for rules /etc/nftables.
  - /etc/protocols, /etc/services, /etc/passwd and /etc/group are
    needed for resolving proto, service, user and group names.
  - For /etc/resolv.conf see (14) below.

Seems to work, but has of course the disadvantage that it only
blacklists. If a user has /my-secrets it would still be readable.

Better would be something like:
> TemporaryFileSystem=/:ro

and selectively BindReadOnlyPaths= everything actually needed.
If that would be preferred, I could try to work out the required dirs.



Up to here, I've tested the settings with at least a very simple
rules file, and loading that still seemed to work.
The following I haven't tested yet,... thought I'd ask for feedback
first, whether these things would be even wanted.



13. This depends on (12) or better (TODO) below.
> ProtectHostname=yes

“Note that when this option is enabled for a service hostname changes
no longer propagate from the system into the service”... not sure
whether nftables ever uses the hostname/domainname? If so, that would
mean it could be outdated.

“Note that this option does not prevent changing system hostname via
hostnamectl”... not sure whether (12) makes this fully impossible by
having hostnamectl non-executable... the manpage suggests using User=


14. Disallowing unneeded address families.
> RestrictAddressFamilies=AF_NETLINK AF_UNIX AF_INET AF_INET6

- This I've actually checked, too.
- In principle AF_NETLINK suffices.
- AF_INET and AF_INET6 are needed if hostnames are resolved via DNS.
Now in principle, nftables.service is anyway meant to run at a point
where no networking is available yet, so one could say this (and
/etc/resolv.conf) above is pointless.
It may however also be restarted/reloaded when the system is already
up, and then DNS would in principle work (in case the nft rules have
been modified meanwhile to include hostnames/fqdns as addresses).
Whether that should then be allowed, or whether it would be better to
fail already then (and not wait for the next boot), is of course
another question. No strong opinions here from my side.
- Not sure whether AF_UNIX is used by anything in nft (or indirectly).

Because the above doesn't cover all the ways in which sockets may be
obtained, the manpage suggests to also use:
> SystemCallFilter=@system-service

So in our case it would be (order matters, hope it's the right one ^^):
> SystemCallFilter=@system-service
> SystemCallFilter=~@mount

(haven't checked whether this is enough for nft)

Also, SystemCallArchitectures=native (already set in (3)) is again
recommended when this is used.
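
A sketch of (14) as a unit fragment, untested like the rest. It assumes
the allow-list group is "@system-service", which is the name
systemd.exec(5) lists (there is no plain "@service" group there).

```ini
[Service]
# AF_NETLINK talks to the kernel; AF_INET/AF_INET6 only matter if
# hostnames in the ruleset are resolved via DNS; AF_UNIX is kept
# because it is unclear whether nft or its libraries use it.
RestrictAddressFamilies=AF_NETLINK AF_UNIX AF_INET AF_INET6
# The first SystemCallFilter= line decides the mode: here an allow-list.
SystemCallFilter=@system-service
# Later "~" lines remove entries from that allow-list.
SystemCallFilter=~@mount
SystemCallArchitectures=native
```

According to the man page, "@system-service" already leaves out
"@mount", so the second filter line is probably redundant in this
order, but it should do no harm.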


15. Deny writeable memory mappings that are also executable:
> MemoryDenyWriteExecute=yes

To prevent circumvention, the manpage recommends to also set:
> InaccessiblePaths=/dev/shm
> SystemCallFilter=~memfd_create

Again, SystemCallArchitectures=native (already set above) is
recommended, too.
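
Put together, (15) might look like the following sketch. It relies on
InaccessiblePaths= and SystemCallFilter= lines accumulating with those
from the earlier points rather than replacing them; how the
"~memfd_create" line combines with an allow-list depends on the line
order, as in (14).

```ini
[Service]
# Refuse memory mappings that are writable and executable at once.
MemoryDenyWriteExecute=yes
# Close the two bypasses the man page names.
InaccessiblePaths=/dev/shm
SystemCallFilter=~memfd_create
SystemCallArchitectures=native
```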


16. The following, AFAIU, would all depend on running nft as a
non-root user, via User= or rather DynamicUser=, which, I presume, we
could do as long as we give it CAP_NET_ADMIN.
> ProtectProc=invisible
> ProcSubset=pid   (not fully sure whether that requires non-root)
> RemoveIPC=yes
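
A non-root variant might look roughly like the sketch below. It is
untested and rests on two assumptions: the ruleset files stay readable
for the dynamic user, and CAP_NET_ADMIN is handed over through the
ambient set, since a non-root process does not otherwise keep
capabilities across execve(). That would replace the empty
AmbientCapabilities= from (4).

```ini
[Service]
DynamicUser=yes
# A non-root user only retains what the ambient set grants.
AmbientCapabilities=CAP_NET_ADMIN
CapabilityBoundingSet=CAP_NET_ADMIN
ProtectProc=invisible
ProcSubset=pid
RemoveIPC=yes
```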


17. These all imply (18):
> PrivatePIDs=yes
> ProtectKernelTunables=yes   (not sure whether nft would need any of these)
> ProtectControlGroups=strict


18. The first two options from (16) and the ones from (17) also imply
> MountAPIVFS=yes
which in turn implies BindLogSockets=yes .
Not sure whether MountAPIVFS=yes causes any issues. Documentation kinda
implies it might only be effective if RootDirectory=/RootImage= are
used.


19. The following already default to secure values:
> KeyringMode=
> IgnoreSIGPIPE=


20. We could configure timeouts and restarting, like via TimeoutSec=
respectively Restart=, StartLimitIntervalSec= and StartLimitBurst=.
Not sure whether that would make any sense security wise, rather not.



I could/would make patches for all of the ones of your choice.
Of course if you know any cases where nft -f uses some feature which
would be restricted by the above, then please tell. :-)



Thanks,
Chris.


* Re: nftables.service hardening ideas
  2025-10-27  3:36 nftables.service hardening ideas Christoph Anton Mitterer
@ 2025-10-28 16:26 ` Florian Westphal
  2025-10-29  0:55   ` Christoph Anton Mitterer
  2025-10-30 23:30 ` Pablo Neira Ayuso
  1 sibling, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2025-10-28 16:26 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: netfilter-devel

Christoph Anton Mitterer <calestyo@scientia.org> wrote:
> This would be ideas about further hardening nftables.service, primarily
> using the options from systemd.exec(5).

What's the point?  nft will exit anyway.

> which not only mounts some but the entire fs hierarchy read-only for
> the service's commands.
> I guess nft -f should never write anywhere, or does it? At least it
> seems to work.

nft -f should not write anything.

> 4. I guess nft needs no capabilities/privileges other than
> CAP_NET_ADMIN:
> > CapabilityBoundingSet=CAP_NET_ADMIN
> > AmbientCapabilities=""
> > NoNewPrivileges=yes
> 
> CapabilityBoundingSet=CAP_NET_ADMIN would supersede the =~CAP_SYS_ADMIN
> from (2) above.

CAP_NET_ADMIN is mandatory and should work as only capability.

> AmbientCapabilities="" disables all ambient capabilities, I'd blindly
> guess that nftables doesn't execve(),... but it shouldn't harm either.

It doesn't execve.

> 5. There should be no reason why nft -f needs to access stuff in /tmp
> or /var/tmp of anything else, so:
> > PrivateTmp=yes

Makes no sense to me.  nft -f won't write anything.

> 7. AFAICS, nft -f may cause modules to be loaded, but that's done
> indirectly (i.e. I guess by the kernel itself?)

nft relies on kernel module autoloading.

> So let's get a bit more exotic ;-)

Exotic? More like esoteric, this is bad.  The service file should be small
and not rely on obscure and maybe not well-tested systemd code paths.


* Re: nftables.service hardening ideas
  2025-10-28 16:26 ` Florian Westphal
@ 2025-10-29  0:55   ` Christoph Anton Mitterer
  2025-10-30 23:10     ` Florian Westphal
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Anton Mitterer @ 2025-10-29  0:55 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

Hey.

On Tue, 2025-10-28 at 17:26 +0100, Florian Westphal wrote:
> Christoph Anton Mitterer <calestyo@scientia.org> wrote:
> > This would be ideas about further hardening nftables.service,
> > primarily
> > using the options from systemd.exec(5).
> 
> What's the point?  nft will exit anyway.

Uhm... well the point of any sandboxing is always (at least trying to)
prevent any attacks.

Sure, nftables is probably not the most likely program to be abused (in
particular as it usually won't process untrusted input), but still even
nftables can't be 100% sure to never be abused in something like
secretly included malware or so.

As with the first patchset my idea was simply that *if* a .service file
is shared it could as well be proper and use as many sandboxing options
from systemd as possible, serving as an example for e.g. downstream
versions of such .service.


> > I guess nft -f should never write anywhere, or does it? At least it
> > seems to work.
> 
> nft -f should not write anything.

Wasn't 100% sure whether it might e.g. write to some locations like
/proc/sys/net in special situations.


> > 5. There should be no reason why nft -f needs to access stuff in
> > /tmp
> > or /var/tmp of anything else, so:
> > > PrivateTmp=yes
> 
> Makes no sense to me.  nft -f won't write anything.

The idea of PrivateTmp=yes (in addition to ProtectSystem=strict) was
rather to prevent that nftables would be able to read anyone else's
files in /tmp (and /var/tmp), again for sandboxing reasons.



> Exotic? More like esoteric, this is bad.  Service file should be
> small and not rely
> on obscure and maybe not well-tested systemd code paths.

Uhm... do you have any reason to believe that the options below were
less tested? It seems at least some of them are used by systemd's own
units, so these are probably used on basically every system, and most
of what systemd does there is, AFAIU, merely using stuff the kernel
provides via namespaces.
Also *if* something would actually fail, then nft would probably just
terminate with some error or via some signal and it would be quickly
spotted.


Anyway, as said, none of this needs to be done at all... I merely did
the work of going through all these options and checking which could be
used with nft -f, so that others could perhaps benefit from that, too.


Cheers,
Chris.


* Re: nftables.service hardening ideas
  2025-10-29  0:55   ` Christoph Anton Mitterer
@ 2025-10-30 23:10     ` Florian Westphal
  2025-10-30 23:59       ` Christoph Anton Mitterer
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2025-10-30 23:10 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: netfilter-devel

Christoph Anton Mitterer <calestyo@scientia.org> wrote:
> Hey.
> 
> On Tue, 2025-10-28 at 17:26 +0100, Florian Westphal wrote:
> > Christoph Anton Mitterer <calestyo@scientia.org> wrote:
> > > This would be ideas about further hardening nftables.service,
> > > primarily
> > > using the options from systemd.exec(5).
> > 
> > What's the point?  nft will exit anyway.
> 
> Uhm... well the point of any sandboxing is always (at least trying to)
> prevent any attacks.

Sure, but then we're talking about e.g. a bug in a DNS resolver/parser
or something like that.

In general I don't believe Linux is capable of isolating against
abusing userspace, unfortunately.  Especially with CAP_NET_ADMIN
(which is very broad and provides access to many facilities in
 the kernel) or with unprivileged user namespaces enabled (the default,
sigh).

> Sure, nftables is probably not the most likely program to be abused (in
> particular as it usually won't process untrusted input), but still even
> nftables can't be 100% sure to never be abused in something like
> secretly included malware or so.

In that case I think all bets are off.

> As with the first patchset my idea was simply that *if* a .service file
> is shared it could as well be proper and use as many sandboxing options
> from systemd as possible, serving as an example for e.g. downstream
> versions of such .service.

Ok, if you want then feel free to start to send patches.
(and CC Jan).

I think that enabling CAP_NET_ADMIN restriction is fine,
otoh if you think that this should be done then I believe
it's better to patch nft and not rely on systemd for this.


* Re: nftables.service hardening ideas
  2025-10-27  3:36 nftables.service hardening ideas Christoph Anton Mitterer
  2025-10-28 16:26 ` Florian Westphal
@ 2025-10-30 23:30 ` Pablo Neira Ayuso
  2025-10-30 23:59   ` Christoph Anton Mitterer
  1 sibling, 1 reply; 7+ messages in thread
From: Pablo Neira Ayuso @ 2025-10-30 23:30 UTC (permalink / raw)
  To: Christoph Anton Mitterer; +Cc: netfilter-devel

On Mon, Oct 27, 2025 at 04:36:08AM +0100, Christoph Anton Mitterer wrote:
> Hey.
> 
> This would be ideas about further hardening nftables.service, primarily
> using the options from systemd.exec(5).

Thanks, in general I would prefer that the nftables.service in the tree
remains simple.

For more advanced configurations, probably we can provide a list of
extra configurations for the paranoid users that they can apply to
refine in some other form (wiki page?).

Thanks.


* Re: nftables.service hardening ideas
  2025-10-30 23:10     ` Florian Westphal
@ 2025-10-30 23:59       ` Christoph Anton Mitterer
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Anton Mitterer @ 2025-10-30 23:59 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, 2025-10-31 at 00:10 +0100, Florian Westphal wrote:
> Sure, but then we're talking about e.g. bug in dns resolver/parser
> or something like that.
> 
> In general I don't believe Linux is capable of isolating against
> abusing userspace, unfortunately.  Especially with CAP_NET_ADMIN
> (which is very broad and provides access to many facilities in
>  the kernel) or with unprivileged user namespaces enabled (the
> default,
> sigh).

Well, in principle I agree... still we get more and more such
sandboxing stuff (all the bubblewrap, etc.) and even if none of them
may give a 100% safe jail, if they help only in 10% cases it may
already be a win.


> > Sure, nftables is probably not the most likely program to be abused
> > (in
> > particular as it usually won't process untrusted input), but still
> > even
> > nftables can't be 100% sure to never be abused in something like
> > secretly included malware or so.
> 
> In that case I think all bets are off.

Maybe. Though if you think e.g. about XZ, some distros were basically
safe for more or less obscurish reasons.


> 
> Ok, if you want then feel free to start to send patches.
> (and CC Jan).
> 
> I think that enabling CAP_NET_ADMIN restriction is fine,
> otoh if you think that this should be done then I believe
> it's better to patch nft and not rely on systemd for this.

Well, as I said, I have no strong desire to get any of that upstream.

For me personally, the patches from the first series (in particular the
one about not stopping on shutdown/isolation and closing the "pitfall"
of restart) would have been the ones I'd have considered the most
beneficial, but that was already received not so enthusiastically. ;-)


I can of course make patches if you really think some of this makes
sense (like e.g. CAP_NET_ADMIN), but then it'll be helpful[0] if you
could just tell me the numbers of the points from my original mail,
which you think are reasonable to even consider and which are way off.

And whether you rather want each set of related hardening options in
one commit, so you can easier pick (we can still squash shortly before
merging) or from the beginning larger patches.

But it shouldn't be considered as me trying to strongly push any of
that getting merged. If you say "no let's leave things as is", I'll
hold no grudge :-)


Cheers,
Chris.


[0] In that case, please keep in mind that some of the sandboxing
options make not much sense without some others:

E.g. ProtectHostname=yes seems to be safely usable with nftables
(assuming it should never try to change the hostname), but if we don't
set (12) or even better run nft as non-root, it's as per
systemd.exec(5) not really useful.

I tried to mention all these cases in my mail.


* Re: nftables.service hardening ideas
  2025-10-30 23:30 ` Pablo Neira Ayuso
@ 2025-10-30 23:59   ` Christoph Anton Mitterer
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Anton Mitterer @ 2025-10-30 23:59 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

On Fri, 2025-10-31 at 00:30 +0100, Pablo Neira Ayuso wrote:
> For more advanced configurations, probably we can provide a list of
> extra configurations for the paranoid users that they can apply to
> refine in some other form (wiki page?).

Personally, I don't like wikis for documentation (it's rather difficult
to "download" them).

What I've thought about was that one could make the more esoteric stuff
comments in the .service, so that any users of it (like downstream
distros) could decide whether or not they want to enable it.

However, that would still kinda put the burden of maintaining these
comments (and keeping them working) on nftables, which it seems is not
really desired.

For example, my:
> ExecPaths=/usr/sbin/nft -/lib -/usr/lib
(and some more) wasn't tested well enough yet, and would have broken my
own ExecStop= (which tries to execute sh and systemctl).

Keeping all those sandboxing options well tested and up to date is of
course some effort, so from that PoV I can fully understand Florian,
when he doesn't want to have any of this in upstream.


Cheers,
Chris.

