qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] vfio/nvlink: Remove exec permission to avoid SELinux AVCs
@ 2020-05-01  5:23 Leonardo Bras
  0 siblings, 0 replies; 3+ messages in thread
From: Leonardo Bras @ 2020-05-01  5:23 UTC (permalink / raw)
  To: Alex Williamson, Alexey Kardashevskiy; +Cc: Leonardo Bras, qemu-devel

If SELinux is setup without 'execmem' permission for qemu, all mmap
with (PROT_WRITE | PROT_EXEC) will fail and print a warning in
SELinux log.

If "nvlink2-mr" memory allocation fails (fist diff), it will cause
guest NUMA nodes to not be correctly configured (V100 memory will
not be visible for guest, nor its NUMA nodes).

Not having 'execmem' permission is intesting for virtual machines to
avoid buffer-overflow based attacks, and it's adopted in distros
like RHEL.

So, removing the PROT_EXEC flag seems the right thing to do.

Browsing some other code that mmaps memory for usage with
memory_region_init_ram_device_ptr, I could notice it's usual to
not have PROT_EXEC (only PROT_READ | PROT_WRITE), so it should be
no problem around this.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 hw/vfio/pci-quirks.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 2d348f8237..124d4f57e1 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1620,7 +1620,7 @@ int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp)
     }
     cap = (void *) hdr;
 
-    p = mmap(NULL, nv2reg->size, PROT_READ | PROT_WRITE | PROT_EXEC,
+    p = mmap(NULL, nv2reg->size, PROT_READ | PROT_WRITE,
              MAP_SHARED, vdev->vbasedev.fd, nv2reg->offset);
     if (p == MAP_FAILED) {
         ret = -errno;
@@ -1680,7 +1680,7 @@ int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error **errp)
 
     /* Some NVLink bridges may not have assigned ATSD */
     if (atsdreg->size) {
-        p = mmap(NULL, atsdreg->size, PROT_READ | PROT_WRITE | PROT_EXEC,
+        p = mmap(NULL, atsdreg->size, PROT_READ | PROT_WRITE,
                  MAP_SHARED, vdev->vbasedev.fd, atsdreg->offset);
         if (p == MAP_FAILED) {
             ret = -errno;
-- 
2.25.4



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 1/1] vfio/nvlink: Remove exec permission to avoid SELinux AVCs
@ 2020-05-01  5:54 Leonardo Bras
  2020-05-01  6:44 ` Alexey Kardashevskiy
  0 siblings, 1 reply; 3+ messages in thread
From: Leonardo Bras @ 2020-05-01  5:54 UTC (permalink / raw)
  To: Alex Williamson, Alexey Kardashevskiy; +Cc: Leonardo Bras, qemu-devel

If SELinux is setup without 'execmem' permission for qemu, all mmap
with (PROT_WRITE | PROT_EXEC) will fail and print a warning in
SELinux log.

If "nvlink2-mr" memory allocation fails (fist diff), it will cause
guest NUMA nodes to not be correctly configured (V100 memory will
not be visible for guest, nor its NUMA nodes).

Not having 'execmem' permission is intesting for virtual machines to
avoid buffer-overflow based attacks, and it's adopted in distros
like RHEL.

So, removing the PROT_EXEC flag seems the right thing to do.

Browsing some other code that mmaps memory for usage with
memory_region_init_ram_device_ptr, I could notice it's usual to
not have PROT_EXEC (only PROT_READ | PROT_WRITE), so it should be
no problem around this.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 hw/vfio/pci-quirks.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 2d348f8237..124d4f57e1 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1620,7 +1620,7 @@ int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp)
     }
     cap = (void *) hdr;
 
-    p = mmap(NULL, nv2reg->size, PROT_READ | PROT_WRITE | PROT_EXEC,
+    p = mmap(NULL, nv2reg->size, PROT_READ | PROT_WRITE,
              MAP_SHARED, vdev->vbasedev.fd, nv2reg->offset);
     if (p == MAP_FAILED) {
         ret = -errno;
@@ -1680,7 +1680,7 @@ int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error **errp)
 
     /* Some NVLink bridges may not have assigned ATSD */
     if (atsdreg->size) {
-        p = mmap(NULL, atsdreg->size, PROT_READ | PROT_WRITE | PROT_EXEC,
+        p = mmap(NULL, atsdreg->size, PROT_READ | PROT_WRITE,
                  MAP_SHARED, vdev->vbasedev.fd, atsdreg->offset);
         if (p == MAP_FAILED) {
             ret = -errno;
-- 
2.25.4



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/1] vfio/nvlink: Remove exec permission to avoid SELinux AVCs
  2020-05-01  5:54 Leonardo Bras
@ 2020-05-01  6:44 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 3+ messages in thread
From: Alexey Kardashevskiy @ 2020-05-01  6:44 UTC (permalink / raw)
  To: Leonardo Bras, Alex Williamson; +Cc: Paul Mackerras, qemu-devel



On 01/05/2020 15:54, Leonardo Bras wrote:
> If SELinux is setup without 'execmem' permission for qemu, all mmap
> with (PROT_WRITE | PROT_EXEC) will fail and print a warning in
> SELinux log.
> 
> If "nvlink2-mr" memory allocation fails (fist diff), it will cause
> guest NUMA nodes to not be correctly configured (V100 memory will
> not be visible for guest, nor its NUMA nodes).
> 
> Not having 'execmem' permission is intesting for virtual machines to
> avoid buffer-overflow based attacks, and it's adopted in distros
> like RHEL.
> 
> So, removing the PROT_EXEC flag seems the right thing to do.
> 
> Browsing some other code that mmaps memory for usage with
> memory_region_init_ram_device_ptr, I could notice it's usual to
> not have PROT_EXEC (only PROT_READ | PROT_WRITE), so it should be
> no problem around this.

> 
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> ---
>  hw/vfio/pci-quirks.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index 2d348f8237..124d4f57e1 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -1620,7 +1620,7 @@ int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp)
>      }
>      cap = (void *) hdr;
>  
> -    p = mmap(NULL, nv2reg->size, PROT_READ | PROT_WRITE | PROT_EXEC,
> +    p = mmap(NULL, nv2reg->size, PROT_READ | PROT_WRITE,



Unlike other users of memory_region_init_ram_device_ptr(), this memory
is more like a normal RAM, it is cache coherent and you can execute code
from it. I am unaware if this happens in practice but it easily could.

Having said that, I can see QEMU is not setting PROT_EXEC even for RAM
but KVM sets it anyway in a PTE in the partition scope tree (== guest
physical-to-host physical translation table) at [1] which is fine, I
just checked, with the change, the exec bit is still in the partition tree.


[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/kvm/book3s_64_mmu_radix.c?h=v5.7-rc3#n857




>               MAP_SHARED, vdev->vbasedev.fd, nv2reg->offset);
>      if (p == MAP_FAILED) {
>          ret = -errno;
> @@ -1680,7 +1680,7 @@ int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error **errp)
>  
>      /* Some NVLink bridges may not have assigned ATSD */
>      if (atsdreg->size) {
> -        p = mmap(NULL, atsdreg->size, PROT_READ | PROT_WRITE | PROT_EXEC,
> +        p = mmap(NULL, atsdreg->size, PROT_READ | PROT_WRITE,


This chunk is ok as ATSD is a set of registers.



Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



>                   MAP_SHARED, vdev->vbasedev.fd, atsdreg->offset);
>          if (p == MAP_FAILED) {
>              ret = -errno;
> 

-- 
Alexey


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-05-01 13:46 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-05-01  5:23 [PATCH 1/1] vfio/nvlink: Remove exec permission to avoid SELinux AVCs Leonardo Bras
  -- strict thread matches above, loose matches on Subject: below --
2020-05-01  5:54 Leonardo Bras
2020-05-01  6:44 ` Alexey Kardashevskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).