Date: Fri, 7 Feb 2025 00:47:31 +0000
From: Andrei Vagin
To: Adhemerval Zanella Netto
Cc: Aleksandr Mikhalitsyn, Cristian Rodríguez, Florian Weimer,
    libc-alpha@sourceware.org, Jeff Xu, "H . J . Lu", rppt@kernel.org,
    0x7f454c46@gmail.com, criu@lists.linux.dev
Subject: Re: [PATCH v8 0/8] Add support for memory sealing
X-Mailing-List: criu@lists.linux.dev

On Thu, Feb 06, 2025 at 04:47:32PM -0300, Adhemerval Zanella Netto wrote:
>
>
> On 06/02/25 15:03, Aleksandr Mikhalitsyn wrote:
> > On Thu, Feb 6, 2025 at 3:25 PM Adhemerval Zanella Netto
> > wrote:
> >>
> >>
> >>
> >> On 06/02/25 06:15, Andrei Vagin wrote:
> >>> On Mon, Feb 03, 2025 at 11:11:56PM -0300, Cristian Rodríguez wrote:
> >>>> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer wrote:
> >>>>>
> >>>>> * Adhemerval Zanella Netto:
> >>>>>
> >>>>>>> CRIU needs to be able to unmap everything that was initially loaded by
> >>>>>>> the kernel and glibc. This will stop working if we use mseal for glibc
> >>>>>>> itself.
> >>>>>>
> >>>>>> So in this case the easiest way is to filter out mseal (with seccomp or
> >>>>>> something related) and disable sealing. I don't have an easy solution.
> >>>>>
> >>>>> Please test with CRIU and trace and find a way to make them work again
> >>>>> if they are broken.
> >>>>
> >>>> that is a kernel problem afaik..
> >>>
> >>> Could you please provide more details on why you think that is the
> >>> kernel issue?
> >>>
> >>> btw: this reminds me of another discussion about mseal on lkml:
> >>> https://lore.kernel.org/lkml/htdv44tqzi4jl2b7dwutsdwnh4tgrxq6xdvumi5wwu3hnh7sgw@tfwlal74ukx6/
> >>>
> >>>> .why libc has to care about this limitation ?
> >>>
> >>> CRIU has worked with glibc for many years... It's not just about CRIU;
> >>> other projects, such as gVisor and UML, are also likely to be affected.
> >>
> >> The current proposal is an opt-in feature, but also without a way to disable it
> >> (similar to how RELRO is enabled).
> >>
> >> I don't have much experience on how CRIU or gVisor works internally, but if
> >> either needs to change any metadata (munmap, mprotect) of the PT_LOAD ELF
> >> segments after startup this basically defeats the whole idea of the memory
> >> sealing hardening.
> >>
> >> I don't see a way to support both semantics without some extra kernel support,
> >> where either you can mark some process with extra credentials to do the
> >> required VMA operations (like process_madvise, etc.) or disable sealing during
> >> the snapshot.
> >>
> >> The mseal usage idea was primarily for program loaders, similar to
> >> mimmutable on OpenBSD; but it seems that some programs also intend to
> >> use the syscall directly for some internal hardening (like Chrome). How
> >> would CRIU/gVisor handle such scenarios?
> >
> > Dear friends,
> >
> > I've quickly read the patchset [PATCH v8 0/8] Add support for memory
> > sealing (https://sourceware.org/pipermail/libc-alpha/2025-January/164361.html)
> > and noticed that on
> > https://sourceware.org/pipermail/libc-alpha/2025-January/164368.html
> > it's said:
> >> The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel
> >> supports the mseal syscall and how glibc is configured. On the default
> >> configuration that aims to support older kernel releases, the memory
> >> sealing attribute is taken as a hint. If glibc is configured with a
> >> minimum kernel of 6.10, where mseal is implied to be supported,
> >> sealing is enforced.
> >
> > => if I understand it right, it makes memory sealing enabled by
> > default if the kernel supports it even without a linker flag, right?
> >
> > I don't really understand what "glibc is configured with a minimum
> > kernel of 6.10" means from the user perspective.
> > I'm not very familiar with glibc internals, so can somebody shed some
> > light on this, please?
>
> glibc has a minimum supported kernel version of 3.2; but some
> architectures override it (either because the ABI was added in newer
> versions, or due to some other reason).
>
> We also have an option where you can build glibc assuming it will
> always run on a specific kernel version (--enable-kernel=x.y.z). On
> previous releases we enforced it by checking the kernel version at loading
> time, but currently glibc only uses it to assume that certain syscalls are
> always present (so there is no need to use fallbacks or handle ENOSYS).
>
> So if you build glibc with --enable-kernel=6.10 it means that mseal
> is expected to be always usable, ENOSYS is not possible, and thus any
> syscall failure is expected to be an error (assuming that we are passing
> valid arguments).
>
> If --enable-kernel is not used, it means that glibc can run on a kernel
> without mseal, and thus memory sealing cannot be applied (we still might
> enforce it, but I think since we do have a way to enforce with
> --enable-kernel there is no urgent need for it).
>
> In any case, memory sealing will be only applied in the presence
> of GNU_PROPERTY_MEMORY_SEAL.

But this flag is considered for a binary and its libraries separately.
If libc is compiled with GNU_PROPERTY_MEMORY_SEAL, all binaries that
load this libc will have sealed mappings, regardless of whether the
binary itself has the flag or not.

I compiled glibc with the patches and performed a simple experiment:

```
[root@bc2868439161 install]# cat test.c
int main() { return 0; }
[root@bc2868439161 install]# gcc -Wl,-dynamic-linker,/mnt/glibc/install/lib/ld-linux-x86-64.so.2 -Wl,-z,nomemory-seal test.c
[root@bc2868439161 install]# strace -e mseal,openat,mmap ./a.out
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
mseal(0x7fda54b59000, 8192, 0) = 0
openat(AT_FDCWD, "/mnt/glibc/install/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
openat(AT_FDCWD, "/mnt/glibc/install/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
mseal(0x7fda5496c000, 12288, 0) = 0
mseal(0x7fda5496f000, 1998928, 0) = 0
mseal(0x7fda54b61000, 163665, 0) = 0
mseal(0x7fda54b89000, 45544, 0) = 0
mseal(0x7fda54b95000, 13096, 0) = 0
+++ exited with 0 +++
```

The test binary was compiled without the GNU_PROPERTY_MEMORY_SEAL flag.
However, we can see that all glibc mappings have been sealed. The
initial mapping is sealed even before libc.so is loaded, likely because
ld.so also has the GNU_PROPERTY_MEMORY_SEAL flag.

To operate, CRIU needs to be able to unmap all of its own mappings,
which is essential for restoring process address spaces. This means we
need to compile CRIU so that its process doesn't have any sealed
mappings.

The same requirement applies to gVisor and UML, which both use stub
processes to manage guest address spaces. Basically, the main process
forks a new process, unmaps all existing mappings in the forked process,
and then populates it with guest mappings.

>
> >
> > I can't see how this can break the CRIU dump for us (I believe it
> > shouldn't but still worth checking), but for CRIU restore it's
> > definitely a problem
> > and reminds me of the rseq()&CRIU story we had a few years ago. My
> > current understanding is:
> >
> > *during CRIU restore*
> > 0. somehow disable mseal for the CRIU binary itself, to make sure that
> > when CRIU does clone() we don't get any mappings sealed
> > 1. restore all memory mappings of the restorable process without
> > mseal() applied to them
> > 2. at the later criu restore stage go over them and apply mseal()
> >
> > I have a bad feeling that I still miss something, but even step 0 is a
> > problem right now if we go with the current approach from this
> > patch series, isn't it?
>
> I am not familiar with how CRIU snapshot/restore is done, and who is
> responsible for each step. Is the kernel involved in any dump step,
> meaning that you need either to start the process with some IPC, or is it
> just done in userland (with ptrace or another way to stop the process
> plus reading /proc/mem)?

It is done in userland.
CRIU uses ptrace and /proc, and even injects a small piece of binary
code into the target process, to collect all the information required
to restore the process in the same state later.

>
> And on restore, how is this accomplished?

The process is a bit more complicated, but for a basic understanding, it
involves the following steps: fork a new process; restore all mappings;
unmap all CRIU mappings; remap the restored mappings to the correct
addresses; and finally, resume the process.

Thanks,
Andrei
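
The claim that all glibc mappings end up sealed can be checked mechanically against the strace output from the experiment above. The following is only a minimal sketch, not part of CRIU or glibc: the helper names (`parse_trace`, `is_sealed`, `unsealed_mappings`) are made up for illustration, and the trace is the mmap/mseal subset of the output quoted earlier. It treats a mapping as sealed when it lies entirely inside one successfully sealed range.

```python
import re

# mmap and mseal lines copied from the strace output above (openat lines dropped).
TRACE = """\
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
mseal(0x7fda54b59000, 8192, 0) = 0
mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
mseal(0x7fda5496c000, 12288, 0) = 0
mseal(0x7fda5496f000, 1998928, 0) = 0
mseal(0x7fda54b61000, 163665, 0) = 0
mseal(0x7fda54b89000, 45544, 0) = 0
mseal(0x7fda54b95000, 13096, 0) = 0
"""

# Hypothetical patterns for the two strace line shapes that appear above.
MMAP_RE = re.compile(r"^mmap\((?:NULL|0x[0-9a-f]+), (\d+), .*\) = (0x[0-9a-f]+)$")
MSEAL_RE = re.compile(r"^mseal\((0x[0-9a-f]+), (\d+), 0\) = 0$")


def parse_trace(text):
    """Return (mappings, seals), each a list of (start, end) address ranges."""
    mappings, seals = [], []
    for line in text.splitlines():
        m = MMAP_RE.match(line)
        if m:
            start = int(m.group(2), 16)          # address returned by mmap
            mappings.append((start, start + int(m.group(1))))
            continue
        m = MSEAL_RE.match(line)
        if m:
            start = int(m.group(1), 16)          # address passed to mseal
            seals.append((start, start + int(m.group(2))))
    return mappings, seals


def is_sealed(mapping, seals):
    """True if the mapping lies entirely inside a single sealed range."""
    start, end = mapping
    return any(s <= start and end <= e for s, e in seals)


def unsealed_mappings(text):
    """Mappings from the trace that no successful mseal call covers."""
    mappings, seals = parse_trace(text)
    return [m for m in mappings if not is_sealed(m, seals)]


if __name__ == "__main__":
    for start, end in unsealed_mappings(TRACE):
        print(f"not sealed: {start:#x}-{end:#x}")
```

Under these assumptions, the only mmap result left unsealed is the 2001-byte read-only mapping created right after ld.so.cache is opened; the libc.so.6 segments and both anonymous mappings fall inside sealed ranges. Three of the mseal calls cover ranges with no matching mmap in the filtered trace, so this check says nothing about what those ranges hold.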