linux-coco.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: "Arnd Bergmann" <arnd@arndb.de>
To: "Dan Williams" <dan.j.williams@intel.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Ingo Molnar" <mingo@kernel.org>, "Kees Cook" <kees@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Naveen N Rao" <naveen@kernel.org>,
	"Nikolay Borisov" <nik.borisov@suse.com>,
	stable@vger.kernel.org,
	"Suzuki K Poulose" <suzuki.poulose@arm.com>,
	"Vishal Annapurve" <vannapurve@google.com>,
	x86@kernel.org, linux-coco@lists.linux.dev
Subject: Re: [PATCH v5] x86/devmem: Drop /dev/mem access for confidential guests
Date: Thu, 01 May 2025 10:12:06 +0200	[thread overview]
Message-ID: <b48aac71-5148-4be2-b95f-ec60e4f490bd@app.fastmail.com> (raw)
In-Reply-To: <6812c6cda0575_1d6a294d7@dwillia2-xfh.jf.intel.com.notmuch>

On Thu, May 1, 2025, at 02:56, Dan Williams wrote:
> Arnd Bergmann wrote:
>> On Wed, Apr 30, 2025, at 04:46, Dan Williams wrote:
>> > While there is an existing mitigation to simulate and redirect access to
>> > the BIOS data area with STRICT_DEVMEM=y, it is insufficient.
>> > Specifically, STRICT_DEVMEM=y traps read(2) access to the BIOS data
>> > area, and returns a zeroed buffer.  However, it turns out the kernel
>> > fails to enforce the same via mmap(2), and a direct mapping is
>> > established. This is a hole, and unfortunately userspace has learned to
>> > exploit it [2].
>> 
>> As far as I can tell, this was a deliberate design choice in
>> commit a4866aa81251 ("mm: Tighten x86 /dev/mem with zeroing reads"),
>> which did not try to forbid it completely but mainly avoids triggering
>> the hardened usercopy check.
>
> I would say not a "design choice", but rather a known leftover hole that
> nobody has had the initiative to close since 2022.
>
> https://lore.kernel.org/all/202204071526.37364B5E3@keescook/

Ok, I see.

>> The existing rules that I can see are:
>> 
>> - readl/write is only allowed on actual (lowmem) RAM, not
>>   on MMIO registers, enforced by valid_phys_addr_range()
>> - with STRICT_DEVMEM, read/write is disallowed on both
>>   RAM and MMIO
>> - an an exception, x86 additionally allows read/write on the
>>   low 1MB MMIO region and 32-bit PCI MMIO BAR space, with
>>   a custom xlate_dev_mem_ptr() that calls either memremap()
>>   or ioremap() on the physical address.
>> - as another exception from that, the low 1MB on x86 behaves
>>   like /dev/zero for memory pages when STRICT_DEVMEM
>>   is set, and ignores conflicting drivers for MMIO registers
>> - The PowerPC sys_rtas syscall has another exception in
>>   order to ignore the STRICT_DEVMEM and write to a portion
>>   of kernel memory to talk to firmware
>> - on the mmap() side, x86 has another special to allow
>>   mapping RAM in the first 1MB despite STRICT_DEVMEM
>> 
>> How about changing x86 to work more like the others and
>> removing the special cases for the first 1MB and for the
>> 32-bit PCI BAR space? If Xorg, and dmidecode are able to
>> do this differently, maybe the hacks can just go away, or
>> be guarded by a Kconfig option that is mutually exclusive
>> with ARCH_HAS_CC_PLATFORM?
>
> I see the 1MB MMIO special-case in x86::devmem_is_allowed(), but where
> is the 32-bit PCI BAR space workaround? Just to make sure I am not
> missing a detail here.

The main difference on x86 is the xlate_dev_mem_ptr() function
that does an extra memremap() of the physical address, everything
else just does a phys_to_virt(). The only other architecture
with an xlate_dev_mem_ptr() implementation is s390, which uses
it to work around the first physical page being different per CPU.
ia64 had something similar to x86 but is gone now.

The other bit of the puzzle is that memremap() on x86 silently
falls back to ioremap() for non-RAM pages. This was originally
added in 2008 commit e045fb2a988a ("x86: PAT avoid aliasing in
/dev/mem read/write"). I'm not sure what happened exactly, but
I suspect that the low 1MB was already mapped at the time
through a cached mapping, while the PCI MMIO hole was perhaps
not mapped. On x86-32, the 32-bit PCI BAR area should not
be included here (since it's above high_memory), but the 16MB
hold may be.

The address is first checked by valid_phys_addr_range(), which
is defined in an architecture specific way, so it's possible that
there are additional architectures on which that includes MMIO
ranges that can be accesses through phys_to_virt(), but I could
not find any.

The default valid_phys_addr_range() just checks against
'high_memory' to see whether the address is in the linear
map. This works on all architectures that don't have holes
in the memory map for MMIO (most of them) or that don't
just ioremap() all MMIO space into the hole (most of the rest).

arm64 and loongarch check memblock allow known RAM both in
the linear map and outside of it, while arm32 and sh explicitly
exclude addresses before PHYS_OFFSET on machines where RAM
does not start at address 0.

       Arnd

  reply	other threads:[~2025-05-01  8:13 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-30  2:46 [PATCH v5] Restrict devmem for confidential VMs Dan Williams
2025-04-30  2:46 ` [PATCH v5] x86/devmem: Remove duplicate range_is_allowed() definition Dan Williams
2025-04-30  7:19   ` Greg Kroah-Hartman
2025-04-30  2:46 ` [PATCH v5] x86/devmem: Drop /dev/mem access for confidential guests Dan Williams
2025-04-30  7:19   ` Greg Kroah-Hartman
2025-04-30 23:03     ` Dan Williams
2025-04-30 17:31   ` Arnd Bergmann
2025-05-01  0:56     ` Dan Williams
2025-05-01  8:12       ` Arnd Bergmann [this message]
2025-05-01 20:01         ` Arnd Bergmann
2025-05-01 20:18           ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b48aac71-5148-4be2-b95f-ec60e4f490bd@app.fastmail.com \
    --to=arnd@arndb.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kees@kernel.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-coco@lists.linux.dev \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=naveen@kernel.org \
    --cc=nik.borisov@suse.com \
    --cc=stable@vger.kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=vannapurve@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).