* Re: [PATCH v8 0/8] Add support for memory sealing
From: Andrei Vagin @ 2025-02-07 0:47 UTC
To: Adhemerval Zanella Netto
Cc: Aleksandr Mikhalitsyn, Cristian Rodríguez, Florian Weimer,
libc-alpha, Jeff Xu, H . J . Lu, rppt, 0x7f454c46, criu
On Thu, Feb 06, 2025 at 04:47:32PM -0300, Adhemerval Zanella Netto wrote:
>
>
> On 06/02/25 15:03, Aleksandr Mikhalitsyn wrote:
> > On Thu, Feb 6, 2025 at 3:25 PM Adhemerval Zanella Netto
> > <adhemerval.zanella@linaro.org> wrote:
> >>
> >>
> >>
> >> On 06/02/25 06:15, Andrei Vagin wrote:
> >>> On Mon, Feb 03, 2025 at 11:11:56PM -0300, Cristian Rodríguez wrote:
> >>>> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>>>>
> >>>>> * Adhemerval Zanella Netto:
> >>>>>
> >>>>>>> CRIU needs to be able to unmap everything that was initially loaded by
> >>>>>>> the kernel and glibc. This will stop working if we use mseal for glibc
> >>>>>>> itself.
> >>>>>>
> >>>>>> So in this case the easiest way is to filter out mseal (with seccomp or
> >>>>>> something related) and disable sealing. I don't have an easy solution.
> >>>>>
> >>>>> Please test with CRIU and trace and find a way to make them work again
> >>>>> if they are broken.
> >>>>
> >>>> that is a kernel problem afaik..
> >>>
> >>> Could you please provide more details on why you think that is the
> >>> kernel issue?
> >>>
> >>> btw: this reminds me another discussion about mseal on lkml:
> >>> https://lore.kernel.org/lkml/htdv44tqzi4jl2b7dwutsdwnh4tgrxq6xdvumi5wwu3hnh7sgw@tfwlal74ukx6/
> >>>
> >>>> .why libc has to care about this limitation ?
> >>>
> >>> CRIU has worked with glibc for many years... It's not just about CRIU;
> >>> other projects, such as gVisor and UML, are also likely to be affected.
> >>
> >> The current proposal is an opt-in feature, but one without a way to disable it
> >> (similar to how RELRO is enabled).
> >>
> >> I don't have much experience with how CRIU or gVisor work internally, but if
> >> either needs to change any metadata (munmap, mprotect) of the PT_LOAD ELF
> >> segments after startup, this basically defeats the whole idea of the memory
> >> sealing hardening.
> >>
> >> I don't see a way to support both semantics without some extra kernel support,
> >> where either you can mark some process with extra credentials to do the
> >> required VMA operations (like process_madvise, etc.) or disable sealing during
> >> the snapshot.
> >>
> >> The mseal usage idea was primarily for program loaders, similar to
> >> mimmutable on OpenBSD; but it seems that some programs also intend to
> >> use the syscall directly for some internal hardening (like Chrome). How
> >> would CRIU/gVisor handle such scenarios?
> >
> > Dear friends,
> >
> > I've quickly read a patchset [PATCH v8 0/8] Add support for memory
> > sealing (https://sourceware.org/pipermail/libc-alpha/2025-January/164361.html)
> > and noticed that on
> > https://sourceware.org/pipermail/libc-alpha/2025-January/164368.html
> > it's said:
> >> The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel
> >> supports the mseal syscall and how glibc is configured. On the default
> >> configuration that aims to support older kernel releases, the memory
> >> sealing attribute is taken as a hint. If glibc is configured with a
> >> minimum kernel of 6.10, where mseal is implied to be supported,
> >> sealing is enforced.
> >
> > => if I understand it right, it makes memory sealing to be enabled by
> > default if the kernel supports it even without a linker flag, right?
> >
> > I don't really understand what "glibc is configured with a minimum
> > kernel of 6.10" means from the user perspective.
> > I'm not very familiar with glibc internals, so can somebody put some
> > light on this, please?
>
> glibc has a minimum supported kernel version of 3.2, but some
> architectures override it (either because the ABI was added in newer
> versions, or due to some other reason).
>
> We also have an option where you can build glibc assuming it will
> always run on a specific kernel version (--enable-kernel=x.y.z). In
> previous releases we enforced this by checking the kernel version at load
> time, but currently glibc only uses it to assume that certain syscalls are
> always present (so there is no need to use fallbacks or handle ENOSYS).
>
> So if you build glibc with --enable-kernel=6.10 it means that mseal
> is expected to be always usable, ENOSYS is not possible, and thus any
> syscall failure is expected to be an error (assuming that we are passing
> valid arguments).
>
> If --enable-kernel is not used, it means that glibc can run on a kernel
> without mseal, and thus memory sealing can not be applied (we still might
> enforce it, but I think since we do have a way to enforce with
> --enable-kernel there is no urgent need for it).
>
> In any case, memory sealing will be only applied in the presence
> of GNU_PROPERTY_MEMORY_SEAL.
But this flag is considered for a binary and its libraries separately.
If libc is compiled with GNU_PROPERTY_MEMORY_SEAL, all binaries that
load this libc will have sealed mappings, regardless of whether the
binary itself has the flag or not.
I compiled glibc with the patches and performed a simple experiment:
```
[root@bc2868439161 install]# cat test.c
int main() {
return 0;
}
[root@bc2868439161 install]# gcc -Wl,-dynamic-linker,/mnt/glibc/install/lib/ld-linux-x86-64.so.2 -Wl,-z,nomemory-seal test.c
[root@bc2868439161 install]# strace -e mseal,openat,mmap ./a.out
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
mseal(0x7fda54b59000, 8192, 0) = 0
openat(AT_FDCWD, "/mnt/glibc/install/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
openat(AT_FDCWD, "/mnt/glibc/install/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
mseal(0x7fda5496c000, 12288, 0) = 0
mseal(0x7fda5496f000, 1998928, 0) = 0
mseal(0x7fda54b61000, 163665, 0) = 0
mseal(0x7fda54b89000, 45544, 0) = 0
mseal(0x7fda54b95000, 13096, 0) = 0
+++ exited with 0 +++
```
The test binary was compiled without the GNU_PROPERTY_MEMORY_SEAL flag.
However, we can see that all glibc mappings have been sealed. The
initial mapping is sealed even before libc.so is loaded, likely because
ld.so also has the GNU_PROPERTY_MEMORY_SEAL flag.
For operation, CRIU needs to be able to unmap all its mappings, which is
essential for restoring process address spaces. This means we need to
compile CRIU so that its process doesn't have any sealed mappings.
The same requirement applies to gVisor and UML, which both use stub
processes to manage guest address spaces. Basically, the main process
forks a new process, unmaps all existing mappings in the forked process,
and then populates it with guest mappings.
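To illustrate why inherited seals are a problem for such stub processes, here is a minimal sketch (it is not CRIU, gVisor, or UML code). It assumes Linux 6.10 or later on a 64-bit architecture, calls mseal through syscall(2) because a libc wrapper may be missing, and falls back to syscall number 462 (the x86-64 value) when the headers do not define SYS_mseal. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: headers older than Linux 6.10 lack SYS_mseal; 462 is the
   x86-64 (and generic table) number. */
#ifndef SYS_mseal
# define SYS_mseal 462
#endif

/* Returns 1 if a forked child was refused (EPERM) when unmapping a region
   sealed by the parent, 0 if mseal could not be applied at all (old kernel,
   32-bit, or a sandbox blocking it), and -1 on any other failure. */
int child_unmap_blocked_by_seal(void)
{
  long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  size_t len = (size_t) page;

  void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;

  if (syscall(SYS_mseal, p, len, 0UL) != 0) {
    munmap(p, len);            /* not sealed, so this still works */
    return 0;
  }

  /* From here on the parent cannot unmap p either; it is released at exit. */
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    /* Child: this is what a stub process would try on every mapping. */
    int rc = munmap(p, len);
    _exit(rc == -1 && errno == EPERM ? 1 : 2);
  }

  int status;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status) == 1 ? 1 : -1;
}
```

On a kernel with mseal the child is expected to get EPERM from munmap, since the seal travels with the mapping across fork().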
>
> >
> > I can't see how this can break the CRIU dump for us (I believe it
> > shouldn't but still worth checking), but for CRIU restore it's
> > definitely a problem
> > and reminds me of the rseq()&CRIU story we had a few years ago. My
> > current understanding is:
> >
> > *during CRIU restore*
> > 0. somehow disable mseal for CRIU binary itself, to make sure that
> > when CRIU do clone() we don't get any mappings sealed
> > 1. restore all memory mappings of the restorable process without
> > mseal() applied to them
> > 2. at the later criu restore stage go over them and apply mseal()
> >
> > I have a bad feeling that I still miss something, but even step 0 is a
> > problem right now if we go with the current approach from this
> > patch series, isn't it?
>
> I am not familiar with how CRIU snapshot/restore is done, or who is
> responsible for each step. Is the kernel involved in any dump step,
> meaning that you need either to start the process with some IPC, or is it
> just done in userland (with ptrace or some other way to stop the process
> plus reading /proc/mem)?
It is done in userland. CRIU uses ptrace, proc and even injects a small
binary code in a target process to collect all required information to
be able to restore the process in the same state later.
>
> And on restore, how is this accomplished?
The process is a bit more complicated, but for a basic understanding, it
involves the following steps: fork a new process; restore all mappings;
unmap all CRIU mappings; remap the restored mappings to the correct
addresses; and finally, resume the process.
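The "remap the restored mappings to the correct addresses" step can be illustrated with mremap(2). This is a simplified sketch, not CRIU's implementation: a single anonymous page is staged at a kernel-chosen address and then moved onto a target address that stands in for the one recorded at dump time. The function name and the reserved target page are assumptions made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if a page staged at a temporary address ends up, with its
   contents intact, at the target address; -1 otherwise. */
int remap_staged_page(void)
{
  long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  size_t len = (size_t) page;

  /* Stand-in for the original address of the mapping: reserve a page so
     the example knows the address is valid in this process. */
  void *target = mmap(NULL, len, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (target == MAP_FAILED)
    return -1;

  /* "Restore" the contents at whatever address the kernel picks. */
  char *staged = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (staged == MAP_FAILED) {
    munmap(target, len);
    return -1;
  }
  strcpy(staged, "restored");

  /* Move the staged page onto the target, replacing what was there. */
  char *moved = mremap(staged, len, len,
                       MREMAP_MAYMOVE | MREMAP_FIXED, target);
  if (moved == MAP_FAILED) {
    munmap(staged, len);
    munmap(target, len);
    return -1;
  }

  int ok = (moved == (char *) target) && strcmp(moved, "restored") == 0;
  munmap(moved, len);
  return ok ? 0 : -1;
}
```

MREMAP_FIXED replaces whatever is mapped at the destination, which is one of the operations mseal refuses on a sealed range, so a sealed mapping left over at the target address would make this step fail.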
Thanks,
Andrei
* Re: [PATCH v8 0/8] Add support for memory sealing
From: Adhemerval Zanella Netto @ 2025-02-07 12:10 UTC
To: Andrei Vagin
Cc: Aleksandr Mikhalitsyn, Cristian Rodríguez, Florian Weimer,
libc-alpha, Jeff Xu, H . J . Lu, rppt, 0x7f454c46, criu
On 06/02/25 21:47, Andrei Vagin wrote:
> On Thu, Feb 06, 2025 at 04:47:32PM -0300, Adhemerval Zanella Netto wrote:
>>
>>
>> On 06/02/25 15:03, Aleksandr Mikhalitsyn wrote:
>>> On Thu, Feb 6, 2025 at 3:25 PM Adhemerval Zanella Netto
>>> <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 06/02/25 06:15, Andrei Vagin wrote:
>>>>> On Mon, Feb 03, 2025 at 11:11:56PM -0300, Cristian Rodríguez wrote:
>>>>>> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
>>>>>>>
>>>>>>> * Adhemerval Zanella Netto:
>>>>>>>
>>>>>>>>> CRIU needs to be able to unmap everything that was initially loaded by
>>>>>>>>> the kernel and glibc. This will stop working if we use mseal for glibc
>>>>>>>>> itself.
>>>>>>>>
>>>>>>>> So in this case the easiest way is to filter out mseal (with seccomp or
>>>>>>>> something related) and disable sealing. I don't have an easy solution.
>>>>>>>
>>>>>>> Please test with CRIU and trace and find a way to make them work again
>>>>>>> if they are broken.
>>>>>>
>>>>>> that is a kernel problem afaik..
>>>>>
>>>>> Could you please provide more details on why you think that is the
>>>>> kernel issue?
>>>>>
>>>>> btw: this reminds me another discussion about mseal on lkml:
>>>>> https://lore.kernel.org/lkml/htdv44tqzi4jl2b7dwutsdwnh4tgrxq6xdvumi5wwu3hnh7sgw@tfwlal74ukx6/
>>>>>
>>>>>> .why libc has to care about this limitation ?
>>>>>
>>>>> CRIU has worked with glibc for many years... It's not just about CRIU;
>>>>> other projects, such as gVisor and UML, are also likely to be affected.
>>>>
>>>> The current proposal is an opt-in feature, but one without a way to disable it
>>>> (similar to how RELRO is enabled).
>>>>
>>>> I don't have much experience with how CRIU or gVisor work internally, but if
>>>> either needs to change any metadata (munmap, mprotect) of the PT_LOAD ELF
>>>> segments after startup, this basically defeats the whole idea of the memory
>>>> sealing hardening.
>>>>
>>>> I don't see a way to support both semantics without some extra kernel support,
>>>> where either you can mark some process with extra credentials to do the
>>>> required VMA operations (like process_madvise, etc.) or disable sealing during
>>>> the snapshot.
>>>>
>>>> The mseal usage idea was primarily for program loaders, similar to
>>>> mimmutable on OpenBSD; but it seems that some programs also intend to
>>>> use the syscall directly for some internal hardening (like Chrome). How
>>>> would CRIU/gVisor handle such scenarios?
>>>
>>> Dear friends,
>>>
>>> I've quickly read a patchset [PATCH v8 0/8] Add support for memory
>>> sealing (https://sourceware.org/pipermail/libc-alpha/2025-January/164361.html)
>>> and noticed that on
>>> https://sourceware.org/pipermail/libc-alpha/2025-January/164368.html
>>> it's said:
>>>> The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel
>>>> supports the mseal syscall and how glibc is configured. On the default
>>>> configuration that aims to support older kernel releases, the memory
>>>> sealing attribute is taken as a hint. If glibc is configured with a
>>>> minimum kernel of 6.10, where mseal is implied to be supported,
>>>> sealing is enforced.
>>>
>>> => if I understand it right, it makes memory sealing to be enabled by
>>> default if the kernel supports it even without a linker flag, right?
>>>
>>> I don't really understand what "glibc is configured with a minimum
>>> kernel of 6.10" means from the user perspective.
>>> I'm not very familiar with glibc internals, so can somebody put some
>>> light on this, please?
>>
>> glibc has a minimum supported kernel version of 3.2, but some
>> architectures override it (either because the ABI was added in newer
>> versions, or due to some other reason).
>>
>> We also have an option where you can build glibc assuming it will
>> always run on a specific kernel version (--enable-kernel=x.y.z). In
>> previous releases we enforced this by checking the kernel version at load
>> time, but currently glibc only uses it to assume that certain syscalls are
>> always present (so there is no need to use fallbacks or handle ENOSYS).
>>
>> So if you build glibc with --enable-kernel=6.10 it means that mseal
>> is expected to be always usable, ENOSYS is not possible, and thus any
>> syscall failure is expected to be an error (assuming that we are passing
>> valid arguments).
>>
>> If --enable-kernel is not used, it means that glibc can run on a kernel
>> without mseal, and thus memory sealing can not be applied (we still might
>> enforce it, but I think since we do have a way to enforce with
>> --enable-kernel there is no urgent need for it).
>>
>> In any case, memory sealing will be only applied in the presence
>> of GNU_PROPERTY_MEMORY_SEAL.
>
> But this flag is considered for a binary and its libraries separately.
> If libc is compiled with GNU_PROPERTY_MEMORY_SEAL, all binaries that
> load this libc will have sealed mappings, regardless of whether the
> binary itself has the flag or not.
>
> I compiled glibc with the patches and performed a simple experiment:
>
> ```
> [root@bc2868439161 install]# cat test.c
> int main() {
> return 0;
> }
> [root@bc2868439161 install]# gcc -Wl,-dynamic-linker,/mnt/glibc/install/lib/ld-linux-x86-64.so.2 -Wl,-z,nomemory-seal test.c
> [root@bc2868439161 install]# strace -e mseal,openat,mmap ./a.out
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
> mseal(0x7fda54b59000, 8192, 0) = 0
> openat(AT_FDCWD, "/mnt/glibc/install/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
> openat(AT_FDCWD, "/mnt/glibc/install/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
> mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
> mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
> mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
> mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
> mseal(0x7fda5496c000, 12288, 0) = 0
> mseal(0x7fda5496f000, 1998928, 0) = 0
> mseal(0x7fda54b61000, 163665, 0) = 0
> mseal(0x7fda54b89000, 45544, 0) = 0
> mseal(0x7fda54b95000, 13096, 0) = 0
> +++ exited with 0 +++
> ```
>
> The test binary was compiled without the GNU_PROPERTY_MEMORY_SEAL flag.
> However, we can see that all glibc mappings have been sealed. The
> initial mapping is sealed even before libc.so is loaded, likely because
> ld.so also has the GNU_PROPERTY_MEMORY_SEAL flag.
Yes, this is controlled by a new configure flag [1], which is enabled by
default. With --disable-default-memory-seal you can disable sealing for
glibc itself.
[1] https://patchwork.sourceware.org/project/glibc/patch/20250129172550.1119706-8-adhemerval.zanella@linaro.org/
>
> For operation, CRIU needs to be able to unmap all its mappings, which is
> essential for restoring process address spaces. This means we need to
> compile CRIU so that its process doesn't have any sealed mappings.
>
> The same requirement applies to gVisor and UML, which both use stub
> processes to manage guest address spaces. Basically, the main process
> forks a new process, unmaps all existing mappings in the forked process,
> and then populates it with guest mappings.
The main problem here is that memory sealing is a hardening mechanism meant to
prevent exactly this kind of operation. And this also does not help if the
program uses mseal directly, as Chrome and maybe others intend to do. How do you
intend to work with these scenarios?
In previous iterations of this patch I had a tunable to disable sealing,
where GNU_PROPERTY_MEMORY_SEAL is simply ignored. I removed it because it
is a way to bypass the security hardening, and it also does not help in the
fork case.
I still think we need some kernel help here, where a process can configure
itself (with a prctl or something related) so that a fork()ed process does not
inherit the sealing bit, to properly fix this without making the hardening
an opt-out feature (which I think defeats the whole idea).
Ideally it would require a new clone flag, and most likely a new fork symbol,
to avoid concurrency issues (where multiple threads set a global state).
>
>>
>>>
>>> I can't see how this can break the CRIU dump for us (I believe it
>>> shouldn't but still worth checking), but for CRIU restore it's
>>> definitely a problem
>>> and reminds me of the rseq()&CRIU story we had a few years ago. My
>>> current understanding is:
>>>
>>> *during CRIU restore*
>>> 0. somehow disable mseal for CRIU binary itself, to make sure that
>>> when CRIU do clone() we don't get any mappings sealed
>>> 1. restore all memory mappings of the restorable process without
>>> mseal() applied to them
>>> 2. at the later criu restore stage go over them and apply mseal()
>>>
>>> I have a bad feeling that I still miss something, but even step 0 is a
>>> problem right now if we go with the current approach from this
>>> patch series, isn't it?
>>
>> I am not familiar with how CRIU snapshot/restore is done, or who is
>> responsible for each step. Is the kernel involved in any dump step,
>> meaning that you need either to start the process with some IPC, or is it
>> just done in userland (with ptrace or some other way to stop the process
>> plus reading /proc/mem)?
>
> It is done in userland. CRIU uses ptrace, proc and even injects a small
> binary code in a target process to collect all required information to
> be able to restore the process in the same state later.
>
>>
>> And on restore, how is this accomplished?
>
> The process is a bit more complicated, but for a basic understanding, it
> involves the following steps: fork a new process; restore all mappings;
> unmap all CRIU mappings; remap the restored mappings to the correct
> addresses; and finally, resume the process.
>
> Thanks,
> Andrei
* Re: [PATCH v8 0/8] Add support for memory sealing
From: Adhemerval Zanella Netto @ 2025-02-07 12:17 UTC
To: Andrei Vagin
Cc: Aleksandr Mikhalitsyn, Cristian Rodríguez, Florian Weimer,
libc-alpha, Jeff Xu, H . J . Lu, rppt, 0x7f454c46, criu
On 07/02/25 09:10, Adhemerval Zanella Netto wrote:
>
>
> On 06/02/25 21:47, Andrei Vagin wrote:
>> On Thu, Feb 06, 2025 at 04:47:32PM -0300, Adhemerval Zanella Netto wrote:
>>>
>>>
>>> On 06/02/25 15:03, Aleksandr Mikhalitsyn wrote:
>>>> On Thu, Feb 6, 2025 at 3:25 PM Adhemerval Zanella Netto
>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 06/02/25 06:15, Andrei Vagin wrote:
>>>>>> On Mon, Feb 03, 2025 at 11:11:56PM -0300, Cristian Rodríguez wrote:
>>>>>>> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
>>>>>>>>
>>>>>>>> * Adhemerval Zanella Netto:
>>>>>>>>
>>>>>>>>>> CRIU needs to be able to unmap everything that was initially loaded by
>>>>>>>>>> the kernel and glibc. This will stop working if we use mseal for glibc
>>>>>>>>>> itself.
>>>>>>>>>
>>>>>>>>> So in this case the easiest way is to filter out mseal (with seccomp or
>>>>>>>>> something related) and disable sealing. I don't have an easy solution.
>>>>>>>>
>>>>>>>> Please test with CRIU and trace and find a way to make them work again
>>>>>>>> if they are broken.
>>>>>>>
>>>>>>> that is a kernel problem afaik..
>>>>>>
>>>>>> Could you please provide more details on why you think that is the
>>>>>> kernel issue?
>>>>>>
>>>>>> btw: this reminds me another discussion about mseal on lkml:
>>>>>> https://lore.kernel.org/lkml/htdv44tqzi4jl2b7dwutsdwnh4tgrxq6xdvumi5wwu3hnh7sgw@tfwlal74ukx6/
>>>>>>
>>>>>>> .why libc has to care about this limitation ?
>>>>>>
>>>>>> CRIU has worked with glibc for many years... It's not just about CRIU;
>>>>>> other projects, such as gVisor and UML, are also likely to be affected.
>>>>>
>>>>> The current proposal is an opt-in feature, but one without a way to disable it
>>>>> (similar to how RELRO is enabled).
>>>>>
>>>>> I don't have much experience with how CRIU or gVisor work internally, but if
>>>>> either needs to change any metadata (munmap, mprotect) of the PT_LOAD ELF
>>>>> segments after startup, this basically defeats the whole idea of the memory
>>>>> sealing hardening.
>>>>>
>>>>> I don't see a way to support both semantics without some extra kernel support,
>>>>> where either you can mark some process with extra credentials to do the
>>>>> required VMA operations (like process_madvise, etc.) or disable sealing during
>>>>> the snapshot.
>>>>>
>>>>> The mseal usage idea was primarily for program loaders, similar to
>>>>> mimmutable on OpenBSD; but it seems that some programs also intend to
>>>>> use the syscall directly for some internal hardening (like Chrome). How
>>>>> would CRIU/gVisor handle such scenarios?
>>>>
>>>> Dear friends,
>>>>
>>>> I've quickly read a patchset [PATCH v8 0/8] Add support for memory
>>>> sealing (https://sourceware.org/pipermail/libc-alpha/2025-January/164361.html)
>>>> and noticed that on
>>>> https://sourceware.org/pipermail/libc-alpha/2025-January/164368.html
>>>> it's said:
>>>>> The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel
>>>>> supports the mseal syscall and how glibc is configured. On the default
>>>>> configuration that aims to support older kernel releases, the memory
>>>>> sealing attribute is taken as a hint. If glibc is configured with a
>>>>> minimum kernel of 6.10, where mseal is implied to be supported,
>>>>> sealing is enforced.
>>>>
>>>> => if I understand it right, it makes memory sealing to be enabled by
>>>> default if the kernel supports it even without a linker flag, right?
>>>>
>>>> I don't really understand what "glibc is configured with a minimum
>>>> kernel of 6.10" means from the user perspective.
>>>> I'm not very familiar with glibc internals, so can somebody put some
>>>> light on this, please?
>>>
>>> glibc has a minimum supported kernel version of 3.2, but some
>>> architectures override it (either because the ABI was added in newer
>>> versions, or due to some other reason).
>>>
>>> We also have an option where you can build glibc assuming it will
>>> always run on a specific kernel version (--enable-kernel=x.y.z). In
>>> previous releases we enforced this by checking the kernel version at load
>>> time, but currently glibc only uses it to assume that certain syscalls are
>>> always present (so there is no need to use fallbacks or handle ENOSYS).
>>>
>>> So if you build glibc with --enable-kernel=6.10 it means that mseal
>>> is expected to be always usable, ENOSYS is not possible, and thus any
>>> syscall failure is expected to be an error (assuming that we are passing
>>> valid arguments).
>>>
>>> If --enable-kernel is not used, it means that glibc can run on a kernel
>>> without mseal, and thus memory sealing can not be applied (we still might
>>> enforce it, but I think since we do have a way to enforce with
>>> --enable-kernel there is no urgent need for it).
>>>
>>> In any case, memory sealing will be only applied in the presence
>>> of GNU_PROPERTY_MEMORY_SEAL.
>>
>> But this flag is considered for a binary and its libraries separately.
>> If libc is compiled with GNU_PROPERTY_MEMORY_SEAL, all binaries that
>> load this libc will have sealed mappings, regardless of whether the
>> binary itself has the flag or not.
>>
>> I compiled glibc with the patches and performed a simple experiment:
>>
>> ```
>> [root@bc2868439161 install]# cat test.c
>> int main() {
>> return 0;
>> }
>> [root@bc2868439161 install]# gcc -Wl,-dynamic-linker,/mnt/glibc/install/lib/ld-linux-x86-64.so.2 -Wl,-z,nomemory-seal test.c
>> [root@bc2868439161 install]# strace -e mseal,openat,mmap ./a.out
>> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
>> mseal(0x7fda54b59000, 8192, 0) = 0
>> openat(AT_FDCWD, "/mnt/glibc/install/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
>> mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
>> openat(AT_FDCWD, "/mnt/glibc/install/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
>> mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
>> mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
>> mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
>> mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
>> mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
>> mseal(0x7fda5496c000, 12288, 0) = 0
>> mseal(0x7fda5496f000, 1998928, 0) = 0
>> mseal(0x7fda54b61000, 163665, 0) = 0
>> mseal(0x7fda54b89000, 45544, 0) = 0
>> mseal(0x7fda54b95000, 13096, 0) = 0
>> +++ exited with 0 +++
>> ```
>>
>> The test binary was compiled without the GNU_PROPERTY_MEMORY_SEAL flag.
>> However, we can see that all glibc mappings have been sealed. The
>> initial mapping is sealed even before libc.so is loaded, likely because
>> ld.so also has the GNU_PROPERTY_MEMORY_SEAL flag.
>
> Yes, this is controlled by a new configure flag [1], which is enabled by
> default. With --disable-default-memory-seal you can disable sealing for
> glibc itself.
>
> [1] https://patchwork.sourceware.org/project/glibc/patch/20250129172550.1119706-8-adhemerval.zanella@linaro.org/
>
>>
>> For operation, CRIU needs to be able to unmap all its mappings, which is
>> essential for restoring process address spaces. This means we need to
>> compile CRIU so that its process doesn't have any sealed mappings.
>>
>> The same requirement applies to gVisor and UML, which both use stub
>> processes to manage guest address spaces. Basically, the main process
>> forks a new process, unmaps all existing mappings in the forked process,
>> and then populates it with guest mappings.
>
> The main problem here is that memory sealing is a hardening mechanism meant to
> prevent exactly this kind of operation. And this also does not help if the
> program uses mseal directly, as Chrome and maybe others intend to do. How do you
> intend to work with these scenarios?
>
> In previous iterations of this patch I had a tunable to disable sealing,
> where GNU_PROPERTY_MEMORY_SEAL is simply ignored. I removed it because it
> is a way to bypass the security hardening, and it also does not help in the
> fork case.
>
> I still think we need some kernel help here, where a process can configure
> itself (with a prctl or something related) so that a fork()ed process does not
> inherit the sealing bit, to properly fix this without making the hardening
> an opt-out feature (which I think defeats the whole idea).
>
> Ideally it would require a new clone flag, and most likely a new fork symbol,
> to avoid concurrency issues (where multiple threads set a global state).
I am assuming here that restore can happen at any time, in an API-like manner
(I am not sure I understand how CRIU/UML/gVisor work in all cases).
If the idea is just to have a wrapper binary, linked against glibc, that only
does the restore, then maybe a simple solution like filtering the mseal syscall
(so it acts as a no-op) might work better.
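As a rough sketch of the filtering idea (an illustration under stated assumptions, not something from the patch series or from CRIU): a seccomp BPF filter can return SECCOMP_RET_ERRNO with an errno value of 0 for mseal, so the call reports success without sealing anything. The function names are hypothetical, the fallback number 462 is the x86-64 value, and a real filter should also check seccomp_data.arch. To affect the sealing done by ld.so, the filter would have to be installed before the wrapper binary is exec'ed; here it is only demonstrated inside a forked child.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_mseal
# define SYS_mseal 462   /* assumption: x86-64 / generic syscall table */
#endif

/* Install a filter that skips mseal and makes it return 0. */
int install_mseal_noop_filter(void)
{
  struct sock_filter insns[] = {
    /* Load the syscall number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* If it is mseal, fall through; otherwise skip one instruction. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_mseal, 0, 1),
    /* errno value 0: the syscall is not executed and returns 0. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
  };
  struct sock_fprog prog = {
    .len = sizeof insns / sizeof insns[0],
    .filter = insns,
  };
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
    return -1;
  return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 1 if, under the filter, a region stays unmappable after an
   mseal call that reported success; 0 if the filter could not be
   installed; -1 otherwise. Runs in a child so the caller stays unfiltered. */
int mseal_filtered_to_noop(void)
{
  long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  size_t len = (size_t) page;

  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    if (install_mseal_noop_filter() != 0)
      _exit(0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED || syscall(SYS_mseal, p, len, 0UL) != 0)
      _exit(2);
    _exit(munmap(p, len) == 0 ? 1 : 2);
  }

  int status;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
    return -1;
  int code = WEXITSTATUS(status);
  return code <= 1 ? code : -1;
}
```

Because seccomp filters are inherited across fork() and execve(), a launcher could install such a filter and then exec the restorer. The trade-off is the one raised earlier in the thread: anything that can install the filter first also bypasses the hardening.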