Extending clone_args for clone3()

* Extending clone_args for clone3()
@ 2025-05-19 14:06 Yury Khrustalev
  2025-06-02 14:10 ` Yury Khrustalev
  2025-06-04  8:29 ` Arnd Bergmann
  0 siblings, 2 replies; 4+ messages in thread
From: Yury Khrustalev @ 2025-05-19 14:06 UTC
  To: linux-kernel
  Cc: Christian Brauner, Arnd Bergmann, Mark Brown, Mark Rutland,
	linux-api

Hi,

I'm working on an RFC patch for Glibc to make use of the newly added
shadow_stack_token field in struct clone_args in [1] on arm64 targets.

I encountered the following problem. Glibc might be built with a newer
version of struct clone_args than the running kernel supports. In
this case, we may set a non-zero value in the new field in args (and
pass a size bigger than the kernel expects), and the kernel will
reject the syscall with an E2BIG error.
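
For illustration, here is a minimal userspace model of the size check
described above. It is a sketch, not kernel code: the function name and
the byte-buffer view of the struct are assumptions made for the example,
and it leaves out any other validation the kernel performs.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Userspace model of the check described above (illustrative only).
 * ksize: struct size known to the kernel; usize: size passed by the caller.
 * Returns 0 if the struct would be accepted, -E2BIG if any byte beyond
 * what the kernel knows about is non-zero.
 */
static int model_check_clone_args(const unsigned char *args,
                                  size_t usize, size_t ksize)
{
    if (usize <= ksize)
        return 0;               /* nothing unknown to inspect */

    for (size_t i = ksize; i < usize; i++)
        if (args[i] != 0)
            return -E2BIG;      /* caller asked for something unsupported */

    return 0;                   /* larger struct, but trailing bytes are zero */
}
```

With this model, a larger struct whose extra bytes are all zero is still
accepted; only a non-zero value in the unknown tail triggers E2BIG.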

This seems to be due to a fail-early approach. The unexpected non-
zero values beyond what's supported by the kernel may indicate that
userspace expects something to happen (and may even have allocated
some resources). So it's better to indicate a problem rather than
silently ignore this and have userspace encounter an error later.

However, it creates difficulty with using extended "versions" of
the clone3 syscall. AFAIK, there is no way to ask the kernel about
the supported size of struct clone_args except for making syscalls
with decreasing values of size until we stop getting E2BIG.

This seems fragile and may call for writing cumbersome code. In essence,
we will have to have clone30(), clone31(), clone32()... wrappers, which
probably defeats the purpose of adding clone3 in the first place:

  if (clone32_supported && clone32(...) == -1 && errno == E2BIG)
    {
      clone32_supported = false;
      /* ... */
    }
  else if (clone31_supported && clone31(...) == -1 && errno == E2BIG)
    {
      clone31_supported = false;
      /* ... */
    }
 ...
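
For comparison, the same retry logic can be sketched as one table-driven
loop instead of one wrapper per struct version. Everything named here is
hypothetical: clone3_fn stands in for the raw syscall, fake_old_kernel
only simulates a kernel that rejects the extended struct, and the size 96
for the extended struct is an assumption (64, 80 and 88 are the existing
CLONE_ARGS_SIZE_VER0/1/2 values).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook standing in for the raw clone3 syscall:
 * returns 0 on success or a negative errno value. */
typedef int (*clone3_fn)(void *args, size_t size);

/* Candidate struct sizes, newest first. 96 is assumed for the
 * extended struct; 88, 80, 64 are CLONE_ARGS_SIZE_VER2/1/0. */
static const size_t candidate_sizes[] = { 96, 88, 80, 64 };
#define N_CANDIDATES (sizeof(candidate_sizes) / sizeof(candidate_sizes[0]))

/* Index of the largest size not yet rejected (not thread-safe). */
static size_t first_usable;

/* Try sizes from the largest not-yet-rejected one downwards and
 * remember where the kernel stopped answering E2BIG. */
static int clone3_with_fallback(clone3_fn do_clone3, void *args,
                                size_t *used_size)
{
    for (size_t i = first_usable; i < N_CANDIDATES; i++) {
        int ret = do_clone3(args, candidate_sizes[i]);
        if (ret != -E2BIG) {
            first_usable = i;
            *used_size = candidate_sizes[i];
            return ret;
        }
    }
    return -E2BIG;
}

/* Stand-in for an older kernel, assuming the caller set the new field:
 * any size above 88 is rejected with E2BIG. */
static int fake_old_kernel(void *args, size_t size)
{
    (void)args;
    return size > 88 ? -E2BIG : 0;
}
```

Note that shrinking size silently drops whatever the new field asked
for, so this only makes sense when the caller has a fallback for the
missing feature; the loop removes the duplication but not the probing.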

Is there a neat way to work around this? What was the idea for extending
clone_args in practice?

I suppose we can't rely on the kernel version because support for extended
clone_args can be backported. In any case, we'd have to do a syscall
for this (it would probably be great to have the kernel version in auxv).

I appreciate any advice here.

Thanks,
Yury

[1]: https://lore.kernel.org/all/20250416-clone3-shadow-stack-v16-0-2ffc9ca3917b@kernel.org/

* Re: Extending clone_args for clone3()
  2025-05-19 14:06 Extending clone_args for clone3() Yury Khrustalev
@ 2025-06-02 14:10 ` Yury Khrustalev
  2025-06-04  8:29 ` Arnd Bergmann
  1 sibling, 0 replies; 4+ messages in thread
From: Yury Khrustalev @ 2025-06-02 14:10 UTC
  To: linux-kernel
  Cc: Christian Brauner, Arnd Bergmann, Mark Brown, Mark Rutland,
	linux-api

Hi everyone,

A gentle ping :)

Kind regards,
Yury


* Re: Extending clone_args for clone3()
  2025-05-19 14:06 Extending clone_args for clone3() Yury Khrustalev
  2025-06-02 14:10 ` Yury Khrustalev
@ 2025-06-04  8:29 ` Arnd Bergmann
  2025-06-04 11:05   ` Mark Brown
  1 sibling, 1 reply; 4+ messages in thread
From: Arnd Bergmann @ 2025-06-04  8:29 UTC
  To: Yury Khrustalev, linux-kernel
  Cc: Christian Brauner, Mark Brown, Mark Rutland, linux-api

On Mon, May 19, 2025, at 16:06, Yury Khrustalev wrote:
>
> This seems fragile and may call for writing cumbersome code. In essence,
> we will have to have clone30(), clone31(), clone32()... wrappers which
> probably defeats the point of why clone3 was added:
>
>
>   if (clone32_supported && clone32(...) == -1 && errno == E2BIG)
>     {
>       clone32_supported = false;
>       /* ... */
>     }
>   else if (clone31_supported && clone31(...) == -1 && errno == E2BIG)
>     {
>       clone31_supported = false;
>       /* ... */
>     }
>  ...
>
> Is there a neat way to work around this? What was the idea for extending
> clone_args in practice?
>
> I suppose we can't rely on kernel version because support for extended
> clone_args can be backported. In any case, we'd have to do a syscall
> for this (it would probably be great to have kernel version in auxv).

I don't think there is a generic way to handle extended syscalls
from libc. It really depends on the specific feature it's trying
to use that requires the additional fields to be nonzero: some features
may have a reasonable fallback implementation in libc, while others
still require an error to be passed back to the caller.

As I understand the shadow stack feature, we want this to be enabled
whenever the kernel and hardware support it, completely transparently
to an application, right?

I think ideally we'd check for HWCAP_GCS on arm64 or the equivalent
feature on other architectures and expect clone3 to support the
longer argument whenever that is set, but it looks like that would
break on current kernels that already support HWCAP_GCS but not
the clone3 argument.

Adding one more hwcap flag would be ugly, but that seems to be
the easiest way. That way, glibc can just test for the new hwcap
flag and only use the extra clone3 word if all prerequisites (hardware
support, kernel gcs support, clone3 argument support) are there.
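
A rough sketch of what that test could look like on the libc side.
Both macros below are placeholders invented for the example: the new
flag does not exist, and the bit positions are not the real ones from
<asm/hwcap.h>.

```c
#include <stdbool.h>

/* Placeholder bit positions for illustration only; the real values
 * would come from <asm/hwcap.h>. */
#define EXAMPLE_HWCAP_GCS        (1UL << 0)
/* Hypothetical new flag meaning "clone3 accepts the extra word". */
#define EXAMPLE_HWCAP_CLONE3_SS  (1UL << 1)

/* Use the extra clone3 word only if both the existing GCS flag and
 * the hypothetical new flag are present in the hwcap word. */
static bool can_pass_shadow_stack_token(unsigned long hwcap)
{
    unsigned long need = EXAMPLE_HWCAP_GCS | EXAMPLE_HWCAP_CLONE3_SS;

    return (hwcap & need) == need;
}
```

In practice the argument would come from something like
getauxval(AT_HWCAP) in <sys/auxv.h>; which hwcap word would carry a new
flag is left open here.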

     Arnd

* Re: Extending clone_args for clone3()
  2025-06-04  8:29 ` Arnd Bergmann
@ 2025-06-04 11:05   ` Mark Brown
  0 siblings, 0 replies; 4+ messages in thread
From: Mark Brown @ 2025-06-04 11:05 UTC
  To: Arnd Bergmann
  Cc: Yury Khrustalev, linux-kernel, Christian Brauner, Mark Rutland,
	linux-api

On Wed, Jun 04, 2025 at 10:29:48AM +0200, Arnd Bergmann wrote:

> As I understand the shadow stack feature, we want this to be enabled
> whenever the kernel and hardware supports it, completely transparent
> to an application, right?

Slightly more involved, but roughly.  The application and all libraries
linked into it should be built targeting shadow stacks (most binaries
will already be compatible, but it's possible some wouldn't be, so we
can't just assume that). Then, if everything is compatible, shadow stacks
will be enabled transparently by libc if the system supports them.

> I think ideally we'd check for HWCAP_GCS on arm64 or the equivalent
> feature on other architectures and expect clone3 to support the
> longer argument whenever that is set, but it looks like that would
> break on current kernels that already support HWCAP_GCS but not
> the clone3 argument.

> Adding one more hwcap flag would be ugly, but that seems to be
> the easiest way. That way, glibc can just test for the new hwcap
> flag and only use the extra clone3 word if all prerequisites (hardware
> support, kernel gcs support, clone3 argument support) are there.

We'd also have to add something similar for x86, since that has had the
support even longer, and the RISC-V series looks like it's getting near
to being merged too, so we'll likely have the same problem there, given
that the clone3() series is not progressing super fast.
