public inbox for qemu-devel@nongnu.org
 help / color / mirror / Atom feed
* [PATCH] linux-user: Fix zero_bss for RX PT_LOAD segments
@ 2026-02-02  0:24 Razvan Ghiorghe
  2026-02-03  4:56 ` Richard Henderson
  2026-03-06 19:53 ` Helge Deller
  0 siblings, 2 replies; 6+ messages in thread
From: Razvan Ghiorghe @ 2026-02-02  0:24 UTC (permalink / raw)
  To: qemu-devel, laurent, razvanghiorghe16

zero_bss() incorrectly assumed that any PT_LOAD containing .bss must be
writable, rejecting valid ELF binaries where .bss overlaps the tail of
an RX file-backed page.

Instead of failing, temporarily enable write access on the overlapping
page to zero the fractional bss range, then restore the original page
permissions once initialization is complete.

To validate the correctness of the modified zero_bss() implementation,
two targeted test cases were constructed, designed to exercise the edge cases where
the .bss segment overlaps a partially filled virtual memory page belonging to a
R_X region. The test binaries were intentionally built without a main() function
and instead defined a custom ELF entry-point via the _start symbol.
This approach bypasses CRT, dynamic loader, libc initialization etc. ensuring that
execution begins immediately after QEMU completes ELF loading and memory mapping.

The first binary defines a minimal _start routine and immediately terminates
via a system call, without ever referencing the .bss symbol. Although a .bss section
is present in the ELF, it is not accessed at runtime, and the resulting PT_LOAD
mapping can be established without triggering any writes to a file-backed RX page.
In this configuration, QEMU successfully loads the binary, and the loader reaches
the zero_bss() path, validating that the fractional .bss region is correctly zeroed
without violating the original segment permissions.

The second binary explicitly reads from the global .bss symbol (x) at program entry.
This forces the linker to materialize the .bss region within the same PT_LOAD
segment as the RX code, yielding a segment with p_filesz < p_memsz and flags R|X.
In this case, QEMU correctly fails during the initial file-backed mmap() of the PT_LOAD
segment, returning EINVAL. This behavior is consistent with the Linux kernel’s ELF
loader semantics, which prohibit mapping a file-backed segment as RX when it (logically)
contains writable memory. Consequently, this failure occurs before zero_bss()
is reached (behaviour expected and correct).

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3179
Signed-off-by: Razvan Ghiorghe <razvanghiorghe16@gmail.com>
---
 linux-user/elfload.c | 38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 35471c0c9a..fa3f7cda69 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -449,18 +449,11 @@ static bool zero_bss(abi_ulong start_bss, abi_ulong end_bss,
 {
     abi_ulong align_bss;
 
-    /* We only expect writable bss; the code segment shouldn't need this. */
-    if (!(prot & PROT_WRITE)) {
-        error_setg(errp, "PT_LOAD with non-writable bss");
-        return false;
-    }
-
     align_bss = TARGET_PAGE_ALIGN(start_bss);
     end_bss = TARGET_PAGE_ALIGN(end_bss);
 
     if (start_bss < align_bss) {
         int flags = page_get_flags(start_bss);
-
         if (!(flags & PAGE_RWX)) {
             /*
              * The whole address space of the executable was reserved
@@ -472,20 +465,35 @@ static bool zero_bss(abi_ulong start_bss, abi_ulong end_bss,
              */
             align_bss -= TARGET_PAGE_SIZE;
         } else {
+            abi_ulong start_page_aligned = start_bss & TARGET_PAGE_MASK;
             /*
-             * The start of the bss shares a page with something.
-             * The only thing that we expect is the data section,
-             * which would already be marked writable.
-             * Overlapping the RX code segment seems malformed.
+             * The logical OR between flags and PAGE_WRITE works because
+             * in include/exec/page-protection.h they are defined as PROT_*
+             * values, matching mprotect().
+             * Temporarily enable write access to zero the fractional bss.
+             * target_mprotect() handles TB invalidation if needed.
              */
             if (!(flags & PAGE_WRITE)) {
-                error_setg(errp, "PT_LOAD with bss overlapping "
-                           "non-writable page");
-                return false;
+                if (target_mprotect(start_page_aligned,
+                                    TARGET_PAGE_SIZE,
+                                    prot | PAGE_WRITE) == -1) {
+                    error_setg_errno(errp, errno,
+                                    "Error enabling write access for bss");
+                    return false;
+                }
             }
 
-            /* The page is already mapped and writable. */
+            /* The page is already mapped and now guaranteed writable. */
             memset(g2h_untagged(start_bss), 0, align_bss - start_bss);
+
+            if (!(flags & PAGE_WRITE)) {
+                if (target_mprotect(start_page_aligned,
+                                    TARGET_PAGE_SIZE, prot) == -1) {
+                    error_setg_errno(errp, errno,
+                                    "Error restoring bss first permissions");
+                    return false;
+                }
+            }
         }
     }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] linux-user: Fix zero_bss for RX PT_LOAD segments
  2026-02-02  0:24 [PATCH] linux-user: Fix zero_bss for RX PT_LOAD segments Razvan Ghiorghe
@ 2026-02-03  4:56 ` Richard Henderson
  2026-03-06 19:53 ` Helge Deller
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2026-02-03  4:56 UTC (permalink / raw)
  To: Razvan Ghiorghe, qemu-devel, laurent

On 2/2/26 10:24, Razvan Ghiorghe wrote:
> To validate the correctness of the modified zero_bss() implementation,
> two targeted test cases were constructed, designed to exercise the edge cases where
> the .bss segment overlaps a partially filled virtual memory page belonging to a
> R_X region. The test binaries were intentionally built without a main() function
> and instead defined a custom ELF entry-point via the _start symbol.
> This approach bypasses CRT, dynamic loader, libc initialization etc. ensuring that
> execution begins immediately after QEMU completes ELF loading and memory mapping.

It would be nice to include those test cases.


r~


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] linux-user: Fix zero_bss for RX PT_LOAD segments
  2026-02-02  0:24 [PATCH] linux-user: Fix zero_bss for RX PT_LOAD segments Razvan Ghiorghe
  2026-02-03  4:56 ` Richard Henderson
@ 2026-03-06 19:53 ` Helge Deller
  2026-03-10 20:49   ` razvan ghiorghe
  1 sibling, 1 reply; 6+ messages in thread
From: Helge Deller @ 2026-03-06 19:53 UTC (permalink / raw)
  To: Razvan Ghiorghe, qemu-devel, laurent

Hello Razvan,

On 2/2/26 01:24, Razvan Ghiorghe wrote:
> zero_bss() incorrectly assumed that any PT_LOAD containing .bss must be
> writable, rejecting valid ELF binaries where .bss overlaps the tail of
> an RX file-backed page.

A .bss section which overlaps with a RX page?
That sounds strange.
Since I wonder, if is this a real-world case or just a theoretical case,
maybe you can explain a littler further why/how you ran into that issue?

Helge
  
> Instead of failing, temporarily enable write access on the overlapping
> page to zero the fractional bss range, then restore the original page
> permissions once initialization is complete.
> 
> To validate the correctness of the modified zero_bss() implementation,
> two targeted test cases were constructed, designed to exercise the edge cases where
> the .bss segment overlaps a partially filled virtual memory page belonging to a
> R_X region. The test binaries were intentionally built without a main() function
> and instead defined a custom ELF entry-point via the _start symbol.
> This approach bypasses CRT, dynamic loader, libc initialization etc. ensuring that
> execution begins immediately after QEMU completes ELF loading and memory mapping.
> 
> The first binary defines a minimal _start routine and immediately terminates
> via a system call, without ever referencing the .bss symbol. Although a .bss section
> is present in the ELF, it is not accessed at runtime, and the resulting PT_LOAD
> mapping can be established without triggering any writes to a file-backed RX page.
> In this configuration, QEMU successfully loads the binary, and the loader reaches
> the zero_bss() path, validating that the fractional .bss region is correctly zeroed
> without violating the original segment permissions.
> 
> The second binary explicitly reads from the global .bss symbol (x) at program entry.
> This forces the linker to materialize the .bss region within the same PT_LOAD
> segment as the RX code, yielding a segment with p_filesz < p_memsz and flags R|X.
> In this case, QEMU correctly fails during the initial file-backed mmap() of the PT_LOAD
> segment, returning EINVAL. This behavior is consistent with the Linux kernel’s ELF
> loader semantics, which prohibit mapping a file-backed segment as RX when it (logically)
> contains writable memory. Consequently, this failure occurs before zero_bss()
> is reached (behaviour expected and correct).
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3179
> Signed-off-by: Razvan Ghiorghe <razvanghiorghe16@gmail.com>
> ---
>   linux-user/elfload.c | 38 +++++++++++++++++++++++---------------
>   1 file changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index 35471c0c9a..fa3f7cda69 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -449,18 +449,11 @@ static bool zero_bss(abi_ulong start_bss, abi_ulong end_bss,
>   {
>       abi_ulong align_bss;
>   
> -    /* We only expect writable bss; the code segment shouldn't need this. */
> -    if (!(prot & PROT_WRITE)) {
> -        error_setg(errp, "PT_LOAD with non-writable bss");
> -        return false;
> -    }
> -
>       align_bss = TARGET_PAGE_ALIGN(start_bss);
>       end_bss = TARGET_PAGE_ALIGN(end_bss);
>   
>       if (start_bss < align_bss) {
>           int flags = page_get_flags(start_bss);
> -
>           if (!(flags & PAGE_RWX)) {
>               /*
>                * The whole address space of the executable was reserved
> @@ -472,20 +465,35 @@ static bool zero_bss(abi_ulong start_bss, abi_ulong end_bss,
>                */
>               align_bss -= TARGET_PAGE_SIZE;
>           } else {
> +            abi_ulong start_page_aligned = start_bss & TARGET_PAGE_MASK;
>               /*
> -             * The start of the bss shares a page with something.
> -             * The only thing that we expect is the data section,
> -             * which would already be marked writable.
> -             * Overlapping the RX code segment seems malformed.
> +             * The logical OR between flags and PAGE_WRITE works because
> +             * in include/exec/page-protection.h they are defined as PROT_*
> +             * values, matching mprotect().
> +             * Temporarily enable write access to zero the fractional bss.
> +             * target_mprotect() handles TB invalidation if needed.
>                */
>               if (!(flags & PAGE_WRITE)) {
> -                error_setg(errp, "PT_LOAD with bss overlapping "
> -                           "non-writable page");
> -                return false;
> +                if (target_mprotect(start_page_aligned,
> +                                    TARGET_PAGE_SIZE,
> +                                    prot | PAGE_WRITE) == -1) {
> +                    error_setg_errno(errp, errno,
> +                                    "Error enabling write access for bss");
> +                    return false;
> +                }
>               }
>   
> -            /* The page is already mapped and writable. */
> +            /* The page is already mapped and now guaranteed writable. */
>               memset(g2h_untagged(start_bss), 0, align_bss - start_bss);
> +
> +            if (!(flags & PAGE_WRITE)) {
> +                if (target_mprotect(start_page_aligned,
> +                                    TARGET_PAGE_SIZE, prot) == -1) {
> +                    error_setg_errno(errp, errno,
> +                                    "Error restoring bss first permissions");
> +                    return false;
> +                }
> +            }
>           }
>       }
>   



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] linux-user: Fix zero_bss for RX PT_LOAD segments
  2026-03-06 19:53 ` Helge Deller
@ 2026-03-10 20:49   ` razvan ghiorghe
  2026-03-10 22:36     ` Helge Deller
  0 siblings, 1 reply; 6+ messages in thread
From: razvan ghiorghe @ 2026-03-10 20:49 UTC (permalink / raw)
  To: Helge Deller; +Cc: qemu-devel, laurent

[-- Attachment #1: Type: text/plain, Size: 7263 bytes --]

Hello Helge,

This is a real-life case reported by user xtex in GitLab issue ( #3179) [1]
who encountered an ELF binary with PT_LOAD + (PROT_READ | PROT_EXEC) that
the Linux kernel loads successfully but QEMU rejects.

In response to Richard's feedback, I sent a follow-up patch on February 3rd
with 2 test cases that reproduce the exact ELF layout:

test-zerobss-1 has a .bss section but never accesses it, validates that the
loader correctly zeros the fractional .bss in an RX segment without
permission violations. test-zerobss-2 reads from multiple .bss variables
and exits with the sum as the exit code ( if .bss was correctly zeroed,
exit code is 0)

Both of them use a custom linker script that forces .text and .bss into the
same PT_LOAD segment with FLAGS(5) = R|X.Note that this reply is to the
first patch and the test cases are in the second patch sent on February 3rd
[2].

Best regards,

Razvan

[1]: https://gitlab.com/qemu-project/qemu/-/issues/3179

[2]:
https://lore.kernel.org/all/20260203073740.19913-2-razvanghiorghe16@gmail.com/

(https://mail.gnu.org/archive/html/qemu-devel/2026-02/msg00662.html)


În vin., 6 mar. 2026 la 21:53, Helge Deller <deller@gmx.de> a scris:

> Hello Razvan,
>
> On 2/2/26 01:24, Razvan Ghiorghe wrote:
> > zero_bss() incorrectly assumed that any PT_LOAD containing .bss must be
> > writable, rejecting valid ELF binaries where .bss overlaps the tail of
> > an RX file-backed page.
>
> A .bss section which overlaps with a RX page?
> That sounds strange.
> Since I wonder, if is this a real-world case or just a theoretical case,
> maybe you can explain a littler further why/how you ran into that issue?
>
> Helge
>
> > Instead of failing, temporarily enable write access on the overlapping
> > page to zero the fractional bss range, then restore the original page
> > permissions once initialization is complete.
> >
> > To validate the correctness of the modified zero_bss() implementation,
> > two targeted test cases were constructed, designed to exercise the edge
> cases where
> > the .bss segment overlaps a partially filled virtual memory page
> belonging to a
> > R_X region. The test binaries were intentionally built without a main()
> function
> > and instead defined a custom ELF entry-point via the _start symbol.
> > This approach bypasses CRT, dynamic loader, libc initialization etc.
> ensuring that
> > execution begins immediately after QEMU completes ELF loading and memory
> mapping.
> >
> > The first binary defines a minimal _start routine and immediately
> terminates
> > via a system call, without ever referencing the .bss symbol. Although a
> .bss section
> > is present in the ELF, it is not accessed at runtime, and the resulting
> PT_LOAD
> > mapping can be established without triggering any writes to a
> file-backed RX page.
> > In this configuration, QEMU successfully loads the binary, and the
> loader reaches
> > the zero_bss() path, validating that the fractional .bss region is
> correctly zeroed
> > without violating the original segment permissions.
> >
> > The second binary explicitly reads from the global .bss symbol (x) at
> program entry.
> > This forces the linker to materialize the .bss region within the same
> PT_LOAD
> > segment as the RX code, yielding a segment with p_filesz < p_memsz and
> flags R|X.
> > In this case, QEMU correctly fails during the initial file-backed mmap()
> of the PT_LOAD
> > segment, returning EINVAL. This behavior is consistent with the Linux
> kernel’s ELF
> > loader semantics, which prohibit mapping a file-backed segment as RX
> when it (logically)
> > contains writable memory. Consequently, this failure occurs before
> zero_bss()
> > is reached (behaviour expected and correct).
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3179
> > Signed-off-by: Razvan Ghiorghe <razvanghiorghe16@gmail.com>
> > ---
> >   linux-user/elfload.c | 38 +++++++++++++++++++++++---------------
> >   1 file changed, 23 insertions(+), 15 deletions(-)
> >
> > diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> > index 35471c0c9a..fa3f7cda69 100644
> > --- a/linux-user/elfload.c
> > +++ b/linux-user/elfload.c
> > @@ -449,18 +449,11 @@ static bool zero_bss(abi_ulong start_bss,
> abi_ulong end_bss,
> >   {
> >       abi_ulong align_bss;
> >
> > -    /* We only expect writable bss; the code segment shouldn't need
> this. */
> > -    if (!(prot & PROT_WRITE)) {
> > -        error_setg(errp, "PT_LOAD with non-writable bss");
> > -        return false;
> > -    }
> > -
> >       align_bss = TARGET_PAGE_ALIGN(start_bss);
> >       end_bss = TARGET_PAGE_ALIGN(end_bss);
> >
> >       if (start_bss < align_bss) {
> >           int flags = page_get_flags(start_bss);
> > -
> >           if (!(flags & PAGE_RWX)) {
> >               /*
> >                * The whole address space of the executable was reserved
> > @@ -472,20 +465,35 @@ static bool zero_bss(abi_ulong start_bss,
> abi_ulong end_bss,
> >                */
> >               align_bss -= TARGET_PAGE_SIZE;
> >           } else {
> > +            abi_ulong start_page_aligned = start_bss & TARGET_PAGE_MASK;
> >               /*
> > -             * The start of the bss shares a page with something.
> > -             * The only thing that we expect is the data section,
> > -             * which would already be marked writable.
> > -             * Overlapping the RX code segment seems malformed.
> > +             * The logical OR between flags and PAGE_WRITE works because
> > +             * in include/exec/page-protection.h they are defined as
> PROT_*
> > +             * values, matching mprotect().
> > +             * Temporarily enable write access to zero the fractional
> bss.
> > +             * target_mprotect() handles TB invalidation if needed.
> >                */
> >               if (!(flags & PAGE_WRITE)) {
> > -                error_setg(errp, "PT_LOAD with bss overlapping "
> > -                           "non-writable page");
> > -                return false;
> > +                if (target_mprotect(start_page_aligned,
> > +                                    TARGET_PAGE_SIZE,
> > +                                    prot | PAGE_WRITE) == -1) {
> > +                    error_setg_errno(errp, errno,
> > +                                    "Error enabling write access for
> bss");
> > +                    return false;
> > +                }
> >               }
> >
> > -            /* The page is already mapped and writable. */
> > +            /* The page is already mapped and now guaranteed writable.
> */
> >               memset(g2h_untagged(start_bss), 0, align_bss - start_bss);
> > +
> > +            if (!(flags & PAGE_WRITE)) {
> > +                if (target_mprotect(start_page_aligned,
> > +                                    TARGET_PAGE_SIZE, prot) == -1) {
> > +                    error_setg_errno(errp, errno,
> > +                                    "Error restoring bss first
> permissions");
> > +                    return false;
> > +                }
> > +            }
> >           }
> >       }
> >
>
>

[-- Attachment #2: Type: text/html, Size: 12445 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] linux-user: Fix zero_bss for RX PT_LOAD segments
  2026-03-10 20:49   ` razvan ghiorghe
@ 2026-03-10 22:36     ` Helge Deller
  2026-03-12 20:09       ` Helge Deller
  0 siblings, 1 reply; 6+ messages in thread
From: Helge Deller @ 2026-03-10 22:36 UTC (permalink / raw)
  To: razvan ghiorghe; +Cc: qemu-devel, laurent

On 3/10/26 21:49, razvan ghiorghe wrote:
> Hello Helge,

Hello Razvan,
  
> This is a real-life case reported by user xtex in GitLab issue
> ( #3179) [1] who encountered an ELF binary with PT_LOAD + (PROT_READ
> | PROT_EXEC) that the Linux kernel loads successfully but QEMU
> rejects.

Well, it's just a report.
I don't see an actual real program which fails to load, just some
hand-crafted executables.

I'm not saying there isn't a problem or that you did not solved it,
but I think it's not ok to include code when there is not a real
problem which normal users would face.

> In response to Richard's feedback, I sent a follow-up patch on February 3rd with 2 test cases that reproduce the exact ELF layout:
> 
> test-zerobss-1 has a .bss section but never accesses it, validates that the loader correctly zeros the fractional .bss in an RX segment without permission violations. test-zerobss-2 reads from multiple .bss variables and exits with the sum as the exit code ( if .bss was correctly zeroed, exit code is 0)
> 
> Both of them use a custom linker script that forces .text and .bss into the same PT_LOAD segment with FLAGS(5) = R|X.Note that this reply is to the first patch and the test cases are in the second patch sent on February 3rd [2].

Interesting!
But again: Will normal users face this?

Helge


> Best regards,
> 
> Razvan
> 
> 
> [1]: https://gitlab.com/qemu-project/qemu/-/issues/3179 <https://gitlab.com/qemu-project/qemu/-/issues/3179>
> 
> [2]: https://lore.kernel.org/all/20260203073740.19913-2-razvanghiorghe16@gmail.com/ <https://lore.kernel.org/all/20260203073740.19913-2-razvanghiorghe16@gmail.com/>
> 
> (https://mail.gnu.org/archive/html/qemu-devel/2026-02/msg00662.html <https://mail.gnu.org/archive/html/qemu-devel/2026-02/msg00662.html>)
> 
> 
> 
> În vin., 6 mar. 2026 la 21:53, Helge Deller <deller@gmx.de <mailto:deller@gmx.de>> a scris:
> 
>     Hello Razvan,
> 
>     On 2/2/26 01:24, Razvan Ghiorghe wrote:
>      > zero_bss() incorrectly assumed that any PT_LOAD containing .bss must be
>      > writable, rejecting valid ELF binaries where .bss overlaps the tail of
>      > an RX file-backed page.
> 
>     A .bss section which overlaps with a RX page?
>     That sounds strange.
>     Since I wonder, if is this a real-world case or just a theoretical case,
>     maybe you can explain a littler further why/how you ran into that issue?
> 
>     Helge
> 
>      > Instead of failing, temporarily enable write access on the overlapping
>      > page to zero the fractional bss range, then restore the original page
>      > permissions once initialization is complete.
>      >
>      > To validate the correctness of the modified zero_bss() implementation,
>      > two targeted test cases were constructed, designed to exercise the edge cases where
>      > the .bss segment overlaps a partially filled virtual memory page belonging to a
>      > R_X region. The test binaries were intentionally built without a main() function
>      > and instead defined a custom ELF entry-point via the _start symbol.
>      > This approach bypasses CRT, dynamic loader, libc initialization etc. ensuring that
>      > execution begins immediately after QEMU completes ELF loading and memory mapping.
>      >
>      > The first binary defines a minimal _start routine and immediately terminates
>      > via a system call, without ever referencing the .bss symbol. Although a .bss section
>      > is present in the ELF, it is not accessed at runtime, and the resulting PT_LOAD
>      > mapping can be established without triggering any writes to a file-backed RX page.
>      > In this configuration, QEMU successfully loads the binary, and the loader reaches
>      > the zero_bss() path, validating that the fractional .bss region is correctly zeroed
>      > without violating the original segment permissions.
>      >
>      > The second binary explicitly reads from the global .bss symbol (x) at program entry.
>      > This forces the linker to materialize the .bss region within the same PT_LOAD
>      > segment as the RX code, yielding a segment with p_filesz < p_memsz and flags R|X.
>      > In this case, QEMU correctly fails during the initial file-backed mmap() of the PT_LOAD
>      > segment, returning EINVAL. This behavior is consistent with the Linux kernel’s ELF
>      > loader semantics, which prohibit mapping a file-backed segment as RX when it (logically)
>      > contains writable memory. Consequently, this failure occurs before zero_bss()
>      > is reached (behaviour expected and correct).
>      >
>      > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3179 <https://gitlab.com/qemu-project/qemu/-/issues/3179>
>      > Signed-off-by: Razvan Ghiorghe <razvanghiorghe16@gmail.com <mailto:razvanghiorghe16@gmail.com>>
>      > ---
>      >   linux-user/elfload.c | 38 +++++++++++++++++++++++---------------
>      >   1 file changed, 23 insertions(+), 15 deletions(-)
>      >
>      > diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>      > index 35471c0c9a..fa3f7cda69 100644
>      > --- a/linux-user/elfload.c
>      > +++ b/linux-user/elfload.c
>      > @@ -449,18 +449,11 @@ static bool zero_bss(abi_ulong start_bss, abi_ulong end_bss,
>      >   {
>      >       abi_ulong align_bss;
>      >
>      > -    /* We only expect writable bss; the code segment shouldn't need this. */
>      > -    if (!(prot & PROT_WRITE)) {
>      > -        error_setg(errp, "PT_LOAD with non-writable bss");
>      > -        return false;
>      > -    }
>      > -
>      >       align_bss = TARGET_PAGE_ALIGN(start_bss);
>      >       end_bss = TARGET_PAGE_ALIGN(end_bss);
>      >
>      >       if (start_bss < align_bss) {
>      >           int flags = page_get_flags(start_bss);
>      > -
>      >           if (!(flags & PAGE_RWX)) {
>      >               /*
>      >                * The whole address space of the executable was reserved
>      > @@ -472,20 +465,35 @@ static bool zero_bss(abi_ulong start_bss, abi_ulong end_bss,
>      >                */
>      >               align_bss -= TARGET_PAGE_SIZE;
>      >           } else {
>      > +            abi_ulong start_page_aligned = start_bss & TARGET_PAGE_MASK;
>      >               /*
>      > -             * The start of the bss shares a page with something.
>      > -             * The only thing that we expect is the data section,
>      > -             * which would already be marked writable.
>      > -             * Overlapping the RX code segment seems malformed.
>      > +             * The logical OR between flags and PAGE_WRITE works because
>      > +             * in include/exec/page-protection.h they are defined as PROT_*
>      > +             * values, matching mprotect().
>      > +             * Temporarily enable write access to zero the fractional bss.
>      > +             * target_mprotect() handles TB invalidation if needed.
>      >                */
>      >               if (!(flags & PAGE_WRITE)) {
>      > -                error_setg(errp, "PT_LOAD with bss overlapping "
>      > -                           "non-writable page");
>      > -                return false;
>      > +                if (target_mprotect(start_page_aligned,
>      > +                                    TARGET_PAGE_SIZE,
>      > +                                    prot | PAGE_WRITE) == -1) {
>      > +                    error_setg_errno(errp, errno,
>      > +                                    "Error enabling write access for bss");
>      > +                    return false;
>      > +                }
>      >               }
>      >
>      > -            /* The page is already mapped and writable. */
>      > +            /* The page is already mapped and now guaranteed writable. */
>      >               memset(g2h_untagged(start_bss), 0, align_bss - start_bss);
>      > +
>      > +            if (!(flags & PAGE_WRITE)) {
>      > +                if (target_mprotect(start_page_aligned,
>      > +                                    TARGET_PAGE_SIZE, prot) == -1) {
>      > +                    error_setg_errno(errp, errno,
>      > +                                    "Error restoring bss first permissions");
>      > +                    return false;
>      > +                }
>      > +            }
>      >           }
>      >       }
>      >
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] linux-user: Fix zero_bss for RX PT_LOAD segments
  2026-03-10 22:36     ` Helge Deller
@ 2026-03-12 20:09       ` Helge Deller
  0 siblings, 0 replies; 6+ messages in thread
From: Helge Deller @ 2026-03-12 20:09 UTC (permalink / raw)
  To: razvan ghiorghe; +Cc: qemu-devel, laurent

On 3/10/26 23:36, Helge Deller wrote:
> On 3/10/26 21:49, razvan ghiorghe wrote:
>>      > Instead of failing, temporarily enable write access on the overlapping
>>      > page to zero the fractional bss range, then restore the original page
>>      > permissions once initialization is complete.
>>      >
>>      > To validate the correctness of the modified zero_bss() implementation,
>>      > two targeted test cases were constructed, designed to exercise the edge cases where
>>      > the .bss segment overlaps a partially filled virtual memory page belonging to a
>>      > R_X region. The test binaries were intentionally built without a main() function
>>      > and instead defined a custom ELF entry-point via the _start symbol.
>>      > This approach bypasses CRT, dynamic loader, libc initialization etc. ensuring that
>>      > execution begins immediately after QEMU completes ELF loading and memory mapping.
>>      >
>>      > The first binary defines a minimal _start routine and immediately terminates
>>      > via a system call, without ever referencing the .bss symbol. Although a .bss section
>>      > is present in the ELF, it is not accessed at runtime, and the resulting PT_LOAD
>>      > mapping can be established without triggering any writes to a file-backed RX page.
>>      > In this configuration, QEMU successfully loads the binary, and the loader reaches
>>      > the zero_bss() path, validating that the fractional .bss region is correctly zeroed
>>      > without violating the original segment permissions.
>>      >
>>      > The second binary explicitly reads from the global .bss symbol (x) at program entry.
>>      > This forces the linker to materialize the .bss region within the same PT_LOAD
>>      > segment as the RX code, yielding a segment with p_filesz < p_memsz and flags R|X.
>>      > In this case, QEMU correctly fails during the initial file-backed mmap() of the PT_LOAD
>>      > segment, returning EINVAL. This behavior is consistent with the Linux kernel’s ELF
>>      > loader semantics, which prohibit mapping a file-backed segment as RX when it (logically)
>>      > contains writable memory. Consequently, this failure occurs before zero_bss()
>>      > is reached (behaviour expected and correct).
>>      >
>>      > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3179 <https://gitlab.com/qemu-project/qemu/-/issues/3179>
>>      > Signed-off-by: Razvan Ghiorghe <razvanghiorghe16@gmail.com <mailto:razvanghiorghe16@gmail.com>>

After some more thinking and looking at the code I think the patch itself is OK.
Allowing this combination on qemu is OK, since it's accepted on normal Linux kernels as well and does not break other normal use cases.

So:
Reviewed-by: Helge Deller <deller@gmx.de>

Helge

>>      > ---
>>      >   linux-user/elfload.c | 38 +++++++++++++++++++++++---------------
>>      >   1 file changed, 23 insertions(+), 15 deletions(-)
>>      >
>>      > diff --git a/linux-user/elfload.c b/linux-user/elfload.c
>>      > index 35471c0c9a..fa3f7cda69 100644
>>      > --- a/linux-user/elfload.c
>>      > +++ b/linux-user/elfload.c
>>      > @@ -449,18 +449,11 @@ static bool zero_bss(abi_ulong start_bss, abi_ulong end_bss,
>>      >   {
>>      >       abi_ulong align_bss;
>>      >
>>      > -    /* We only expect writable bss; the code segment shouldn't need this. */
>>      > -    if (!(prot & PROT_WRITE)) {
>>      > -        error_setg(errp, "PT_LOAD with non-writable bss");
>>      > -        return false;
>>      > -    }
>>      > -
>>      >       align_bss = TARGET_PAGE_ALIGN(start_bss);
>>      >       end_bss = TARGET_PAGE_ALIGN(end_bss);
>>      >
>>      >       if (start_bss < align_bss) {
>>      >           int flags = page_get_flags(start_bss);
>>      > -
>>      >           if (!(flags & PAGE_RWX)) {
>>      >               /*
>>      >                * The whole address space of the executable was reserved
>>      > @@ -472,20 +465,35 @@ static bool zero_bss(abi_ulong start_bss, abi_ulong end_bss,
>>      >                */
>>      >               align_bss -= TARGET_PAGE_SIZE;
>>      >           } else {
>>      > +            abi_ulong start_page_aligned = start_bss & TARGET_PAGE_MASK;
>>      >               /*
>>      > -             * The start of the bss shares a page with something.
>>      > -             * The only thing that we expect is the data section,
>>      > -             * which would already be marked writable.
>>      > -             * Overlapping the RX code segment seems malformed.
>>      > +             * The logical OR between flags and PAGE_WRITE works because
>>      > +             * in include/exec/page-protection.h they are defined as PROT_*
>>      > +             * values, matching mprotect().
>>      > +             * Temporarily enable write access to zero the fractional bss.
>>      > +             * target_mprotect() handles TB invalidation if needed.
>>      >                */
>>      >               if (!(flags & PAGE_WRITE)) {
>>      > -                error_setg(errp, "PT_LOAD with bss overlapping "
>>      > -                           "non-writable page");
>>      > -                return false;
>>      > +                if (target_mprotect(start_page_aligned,
>>      > +                                    TARGET_PAGE_SIZE,
>>      > +                                    prot | PAGE_WRITE) == -1) {
>>      > +                    error_setg_errno(errp, errno,
>>      > +                                    "Error enabling write access for bss");
>>      > +                    return false;
>>      > +                }
>>      >               }
>>      >
>>      > -            /* The page is already mapped and writable. */
>>      > +            /* The page is already mapped and now guaranteed writable. */
>>      >               memset(g2h_untagged(start_bss), 0, align_bss - start_bss);
>>      > +
>>      > +            if (!(flags & PAGE_WRITE)) {
>>      > +                if (target_mprotect(start_page_aligned,
>>      > +                                    TARGET_PAGE_SIZE, prot) == -1) {
>>      > +                    error_setg_errno(errp, errno,
>>      > +                                    "Error restoring bss first permissions");
>>      > +                    return false;
>>      > +                }
>>      > +            }
>>      >           }
>>      >       }
>>      >
>>
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-12 20:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-02  0:24 [PATCH] linux-user: Fix zero_bss for RX PT_LOAD segments Razvan Ghiorghe
2026-02-03  4:56 ` Richard Henderson
2026-03-06 19:53 ` Helge Deller
2026-03-10 20:49   ` razvan ghiorghe
2026-03-10 22:36     ` Helge Deller
2026-03-12 20:09       ` Helge Deller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox