Re: [PATCH v5] x86/devmem: Drop /dev/mem access for confidential guests

Linux Confidential Computing Development
 help / color / mirror / Atom feed

From: Dan Williams <dan.j.williams@intel.com>
To: Arnd Bergmann <arnd@arndb.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Ingo Molnar <mingo@kernel.org>, Kees Cook <kees@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Naveen N Rao <naveen@kernel.org>,
	Nikolay Borisov <nik.borisov@suse.com>, <stable@vger.kernel.org>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	"Vishal Annapurve" <vannapurve@google.com>, <x86@kernel.org>,
	<linux-coco@lists.linux.dev>
Subject: Re: [PATCH v5] x86/devmem: Drop /dev/mem access for confidential guests
Date: Wed, 30 Apr 2025 17:56:45 -0700	[thread overview]
Message-ID: <6812c6cda0575_1d6a294d7@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <0bdb1876-0cb3-4632-910b-2dc191902e3e@app.fastmail.com>

Arnd Bergmann wrote:
> On Wed, Apr 30, 2025, at 04:46, Dan Williams wrote:
> > While there is an existing mitigation to simulate and redirect access to
> > the BIOS data area with STRICT_DEVMEM=y, it is insufficient.
> > Specifically, STRICT_DEVMEM=y traps read(2) access to the BIOS data
> > area, and returns a zeroed buffer.  However, it turns out the kernel
> > fails to enforce the same via mmap(2), and a direct mapping is
> > established. This is a hole, and unfortunately userspace has learned to
> > exploit it [2].
> 
> As far as I can tell, this was a deliberate design choice in
> commit a4866aa81251 ("mm: Tighten x86 /dev/mem with zeroing reads"),
> which did not try to forbid it completely but mainly avoids triggering
> the hardened usercopy check.

I would say not a "design choice", but rather a known leftover hole that
nobody has had the initiative to close since 2022.

https://lore.kernel.org/all/202204071526.37364B5E3@keescook/

> > The simplest option for now is arrange for /dev/mem to always behave as
> > if lockdown is enabled for confidential guests. Require confidential
> > guest userspace to jettison legacy dependencies on /dev/mem similar to
> > how other legacy mechanisms are jettisoned for confidential operation.
> > Recall that modern methods for BIOS data access are available like
> > /sys/firmware/dmi/tables.
> 
> Restricting /dev/mem further is a good idea, but it would be nice
> if that could be done without adding yet another special case.
> 
> An even more radical approach would be to just disallow CONFIG_DEVMEM
> for any configuration that includes ARCH_HAS_CC_PLATFORM, but that
> may go a little too far.

Right, for example the policy could go as far as to always require
generic LOCKDOWN_KERNEL for confidential guests, but a distro likely
wants to be able to build confidential guests and bare metal host
kernels from the same kernel config. At a minimum it seems difficult to
get away from a runtime "is_confidential_guest()" check.

The other observation is that generic LOCKDOWN_KERNEL is about
protecting against root being able to compromise platform integrity
where confidential computing is full trust within the TEE, including
root to run amok, and no trust outside that.

> The existing rules that I can see are:
> 
> - readl/write is only allowed on actual (lowmem) RAM, not
>   on MMIO registers, enforced by valid_phys_addr_range()
> - with STRICT_DEVMEM, read/write is disallowed on both
>   RAM and MMIO
> - an an exception, x86 additionally allows read/write on the
>   low 1MB MMIO region and 32-bit PCI MMIO BAR space, with
>   a custom xlate_dev_mem_ptr() that calls either memremap()
>   or ioremap() on the physical address.
> - as another exception from that, the low 1MB on x86 behaves
>   like /dev/zero for memory pages when STRICT_DEVMEM
>   is set, and ignores conflicting drivers for MMIO registers
> - The PowerPC sys_rtas syscall has another exception in
>   order to ignore the STRICT_DEVMEM and write to a portion
>   of kernel memory to talk to firmware
> - on the mmap() side, x86 has another special to allow
>   mapping RAM in the first 1MB despite STRICT_DEVMEM
> 
> How about changing x86 to work more like the others and
> removing the special cases for the first 1MB and for the
> 32-bit PCI BAR space? If Xorg, and dmidecode are able to
> do this differently, maybe the hacks can just go away, or
> be guarded by a Kconfig option that is mutually exclusive
> with ARCH_HAS_CC_PLATFORM?

I see the 1MB MMIO special-case in x86::devmem_is_allowed(), but where
is the 32-bit PCI BAR space workaround? Just to make sure I am not
missing a detail here.

Note, this devmem exclusion effort previously went through a phase of
always returning "0" (no access) from x86::devmem_is_allowed() [1]. The
rationale for hacking the special case into open_port() was to maintain
ABI consistency with LOCKDOWN_KERNEL. Right now x86 userspace expects
either LOCKDOWN_KERNEL semantics, or read(2) returns zero and mmap(2) is
unrestricted. If devmem_is_allowed() always says "no" then userspace is
introduced to a new failure mode.

I am open to rip the band-aid off and see what happens, but Robustness
Principle suggested mimicking semantics that LOCKDOWN_KERNEL has already
socialized.

[1]: http://lore.kernel.org/67f8a1a15cc29_7205294d7@dwillia2-xfh.jf.intel.com.notmuch

> > @@ -595,6 +596,15 @@ static int open_port(struct inode *inode, struct 
> > file *filp)
> >  	if (rc)
> >  		return rc;
> > 
> > +	/*
> > +	 * Enforce encrypted mapping consistency and avoid unaccepted
> > +	 * memory conflicts, "lockdown" /dev/mem for confidential
> > +	 * guests.
> > +	 */
> > +	if (IS_ENABLED(CONFIG_STRICT_DEVMEM) &&
> > +	    cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> > +		return -EPERM;
> > +
> 
> The description only talks about /dev/mem, but it looks like this
> blocks /dev/port as well. Blocking /dev/port may also be a good
> idea, but I don't see why that would be conditional on
> CC_ATTR_GUEST_MEM_ENCRYPT.

That is more a side effect of wanting to mimic the
security_locked_down(LOCKDOWN_DEV_MEM) behavior. That hook implies all
of /dev/{mem,kmem,port} follow the same policy.

> When CONFIG_DEVMEM=y and CONFIG_STRICT_DEVMEM=n, doesn't this still
> have the same problem for CC guests?

It does, but that's the point. The CC guests that need the exclusion
have "select STRICT_DEVMEM", and *maybe* some CC guest arch could
tolerate raw devmem access.

However, that is unlikely. The observations here and from Greg point to
security_locked_down() should be providing the answer here. Which
completes the full circle back towards Nikolay's original proposal of
allowing lockdown policy to handle a bitmap of options [2].

Nikolay, part of me is glad to have done the full exploration of the
problem space here. I learned something. At the same time it is humbling
to realize I could have saved everyone's time just supporting your
effort. Please pick up your lockdown bitmap proposal and consider this
thread a long-winded Reviewed-by.

[2]: http://lore.kernel.org/20250321102422.640271-1-nik.borisov@suse.com

BTW, you can avoid the IO_STRICT_DEVMEM complexity by including
LOCKDOWN_PCI_ACCESS in the list of LOCKDOWN bits that CC guests always
enable. If someone wants to enable confidential userspace PCI drivers
they can do the work to switch from LOCKDOWN_PCI_ACCESS to
IO_STRICT_DEVMEM.

next prev parent reply	other threads:[~2025-05-01  0:57 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-30  2:46 [PATCH v5] Restrict devmem for confidential VMs Dan Williams
2025-04-30  2:46 ` [PATCH v5] x86/devmem: Remove duplicate range_is_allowed() definition Dan Williams
2025-04-30  7:19   ` Greg Kroah-Hartman
2025-04-30  2:46 ` [PATCH v5] x86/devmem: Drop /dev/mem access for confidential guests Dan Williams
2025-04-30  7:19   ` Greg Kroah-Hartman
2025-04-30 23:03     ` Dan Williams
2025-04-30 17:31   ` Arnd Bergmann
2025-05-01  0:56     ` Dan Williams [this message]
2025-05-01  8:12       ` Arnd Bergmann
2025-05-01 20:01         ` Arnd Bergmann
2025-05-01 20:18           ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6812c6cda0575_1d6a294d7@dwillia2-xfh.jf.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=arnd@arndb.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kees@kernel.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-coco@lists.linux.dev \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=naveen@kernel.org \
    --cc=nik.borisov@suse.com \
    --cc=stable@vger.kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=vannapurve@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox