Linux userland API discussions


* [5][RFC] fs/ioctl.c: FIBMAP requires CAP_SYS_RAWIO while FIEMAP exposes identical data unprivileged
From: Cyber_black @ 2026-05-18 17:22 UTC (permalink / raw)
  To: luto@amacapital.net
  Cc: hch@infradead.org, linux-fsdevel@vger.kernel.org, tytso@mit.edu,
	linux-api@vger.kernel.org, djwong@kernel.org, mark@fasheh.com,
	moybs027@gmail.com

Thank you for raising this important question, Andy. I've been following the discussion as a "listening guest" and I have a thought.

My idea is this: Instead of forcing FIEMAP to become a root-only interface (breaking existing tools), or leaving it as-is (with information disclosure), what if we design a new, restricted API that is not privileged but also not unprivileged in the traditional sense?

Concretely:

1.  The API would be callable by any user, but it would not expose physical block addresses.

2.  It would answer higher-level questions that tools actually need, such as:

    -   "Are these two file ranges reflinked (shared)?" (for deduplication)

    -   "Is this file range sparse (holes)?" (without leaking physical locations)

    -   "What is the allocation status (delayed, unwritten, etc.)?"

3.  The kernel would maintain a capability or permission that is not root-equivalent (e.g., a new `CAP_BLOCK_MAP_QUERY`), but the API would not require full `CAP_SYS_RAWIO`.

This way:

-   Tools like `filefrag`, `cp`, and deduplication utilities can work without root.

-   Physical block addresses remain hidden from unprivileged users, closing the information leak.

-   We avoid forcing users to run these tools as root, which would open up far more serious risks (e.g., kernel panic, accidental corruption).

In short: we don't need to choose between "unprivileged leak" and "root-only". We can design a purpose‑limited API that answers only the necessary questions, with the minimum privilege required.

Would this be acceptable? I'd be happy to help draft a more detailed proposal or prototype.

This idea was developed together with my friend playerofficial19 (moybs027@gmail.com) through discussion. We hope it's helpful.
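
For the sparse-range question in the list above, an unprivileged interface that does not expose physical addresses already exists in the form of lseek() with SEEK_HOLE. The following is a minimal userspace sketch, not part of the proposal itself; `range_has_hole()` and `path_has_hole()` are hypothetical helper names. On filesystems without real hole tracking the kernel reports the whole file as data with a single implicit hole at EOF.

```c
#define _GNU_SOURCE		/* exposes SEEK_HOLE in <unistd.h> */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing kernel or libc API).
 * Returns 1 if [off, off + len) contains a hole, 0 if it is all data,
 * -1 on error (errno is set by lseek). Moves the file offset of 'fd'.
 */
static int range_has_hole(int fd, off_t off, off_t len)
{
	off_t hole;

	assert(off >= 0 && len > 0);

	/* First hole at or after 'off'; EOF counts as an implicit hole. */
	hole = lseek(fd, off, SEEK_HOLE);
	if (hole == (off_t)-1)
		return -1;

	return hole < off + len;
}

/* Hypothetical wrapper: does the file at 'path' contain any hole before EOF? */
static int path_has_hole(const char *path)
{
	struct stat st;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	/* An empty file has no range to inspect. */
	ret = st.st_size > 0 ? range_has_hole(fd, 0, st.st_size) : 0;
	close(fd);
	return ret;
}
```

Only read access to the file is needed, and nothing about the on-disk location is returned. This covers just the second question in the list; it says nothing about reflink sharing or allocation state.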


* Re: [PATCH 3/6] string: Introduce strtostr() for safe and performance string copies
From: David Laight @ 2026-05-18 18:38 UTC (permalink / raw)
  To: André Almeida
  Cc: Peter Zijlstra, Juri Lelli, Vincent Guittot, Steven Rostedt,
	Christian Brauner, Kees Cook, Shuah Khan, willy,
	mathieu.desnoyers, Linus Torvalds, akpm, Yafang Shao,
	andrii.nakryiko, arnaldo.melo, Petr Mladek, linux-kernel,
	kernel-dev, linux-mm, linux-api
In-Reply-To: <d4d6cf61-568e-478e-88d6-01b769d7eded@igalia.com>

On Mon, 18 May 2026 11:36:49 -0300
André Almeida <andrealmeid@igalia.com> wrote:

> Hi David, thanks for the feedback!
> 
> Em 17/05/2026 18:34, David Laight escreveu:
> > On Sun, 17 May 2026 15:36:13 -0300
> > André Almeida <andrealmeid@igalia.com> wrote:
> >   
> >> Some parts of the kernel uses memcpy() instead of strscpy() because they
> >> are performance sensitive and doesn't care about the return value of
> >> strscpy(). One such common case is to copy current->comm to a different
> >> buffer.
> >>
> >> As the command name is guaranteed to be NUL-terminated in the range of
> >> TASK_COMM_LEN, this is safe enough and doesn't create unterminated
> >> strings. However, in order to expand the size of current->comm, this
> >> expectation will be broken and those memcpy() could create such strings
> >> without trailing NUL byte.
> >>
> >> In order to support a fast and safe string copy, create strtostr(), to copy
> >> a NUL-terminated string to a new string buffer. If the destination buffer
> >> is bigger than the source, no pad is applied, but the string is
> >> NUL-terminated. If the destination buffer is smaller, the string is
> >> truncated. The last byte of the destination is always set to NUL for safety.
> >>
> >> Signed-off-by: André Almeida <andrealmeid@igalia.com>
> >> ---
> [...]
> >> +/**
> >> + * strtostr - Copy NUL-terminanted string to NUL-terminate string
> >> + *
> >> + * @dest: Pointer of destination string
> >> + * @src: Pointer to NUL-terminates string
> >> + *
> >> + * This is a replacement for strcpy() where the caller doesn't care about the
> >> + * return value and if the string is going to be truncated, albeit it needs
> >> + * to mark sure that it will be NUL-terminated. Intended for performance
> >> + * sensitive cases, such as tracing.  
> > 
> > If you care about performance, and the destination isn't smaller (especially
> > if the sizes are the same) then just use memcpy().
> >     
> 
> The problem is that as I'm expanding current->comm, the source buffer 
> might be bigger than destination, and when we truncate the string, it 
> won't have the termination NUL byte. So we need an extra dest[len-1] = 
> \0 after the memcpy.

It depends on other access to the destination.
If it might be being concurrently read it is vital that it is always
terminated.
So you can't even temporarily have a non-zero byte at the end.

> 
> >> + *
> >> + * If the destination is bigger than the source, no padding happens. It it's
> >> + * smaller the strings gets truncated.
> >> + *
> >> + * Both arguments needs to be arrays with lengths discoverable by the compiler.
> >> + */
> >> +#define strtostr(dest, src)	do {					\
> >> +	const size_t _dest_len = __must_be_cstr(dest) +			\
> >> +				 ARRAY_SIZE(dest);			\
> >> +	const size_t _src_len = __must_be_cstr(src) +			\
> >> +				__builtin_object_size(src, 1);		\
> >> +									\
> >> +	BUILD_BUG_ON(!__builtin_constant_p(_dest_len) ||		\
> >> +		     _dest_len == (size_t)-1);				\
> >> +	memcpy(dest, src, strnlen(src, min(_src_len, _dest_len)));	\
> >> +	dest[_dest_len - 1] = '\0';						\
> >> +} while (0)  
> > 
> > That doesn't work (for all sorts of reasons).
> > _dest_len can be the size of a pointer - no array check.
> > You need to use __is_array() and sizeof () for both dest and src.
> > You might have meant to check that _src_len is constant, not _dest_len.
> > You must not leave the destination unterminated.
> > 
> > __builtin_object_size(x->y,1) is also entirely useless!
> > If you have a pointer to a structure that ends in an array then the
> > object size of that array is SIZE_MAX (as if the array continues past
> > the end of the structure).
> > See https://godbolt.org/z/csenjfvxe (which I happened to prepare earlier today).
> > 
> > __builtin_object_size(x->y,0) also seems to always return SIZE_MAX.
> > You do get a sane answer for (x->y,3) on recent clang - but nowhere else.
> >   
> 
> Oops, you are right, thanks for pointing that out. This is how it would 
> look like checking that both args are arrays and using sizeof to get 
> their length, if it sounds good I can apply for the v2:
> 
> #define strtostr(dest, src)	do {				\
> 	const size_t _dest_len = __must_be_array(dest) +	\
> 				 sizeof(dest);			\
> 	const size_t _src_len = __must_be_array(src) +		\
> 				sizeof(src);			\
> 								\
> 	BUILD_BUG_ON(!__builtin_constant_p(_dest_len) ||	\
> 		     _dest_len == (size_t)-1);			\

That test can never fail.

> 	memcpy(dest, src, min(_src_len, _dest_len)));		\
> 	dest[_dest_len - 1] = '\0';				\

You are expanding 'dest' twice.
Were it (p++)->array then the two values would be different and the final
value of 'p' incorrect.
Much better to assign both pointers to local variables.
Here you can use their required types to get type checking (I wouldn't bother
about the extra checks that __must_be_cstr() does).

I'd also create a function that is explicitly for copying process names.
(Or replace the one that is already there - saves a lot of churn.)
Then you know (and can check) the sizes are the expected ones.

It might even be worth making the #define (needed to get the array sizes)
call out to different functions for the different cases.

Thinks more...
On 64bit the 16 byte copy can be 'load; store; load; mask; store' provided
the buffer is aligned (copying u64 on 32bit will work the same).
But that requires that all the buffers be aligned.
So you'd need to check _Alignof(dest) >= _Alignof(u64) as well.
(Probably with a fallback to get things to compile.)

Whether that is best for the longer 64 byte copy is anybody's guess.

I also suspect it would be best to zero fill when copying a 16 byte
name into a 64 byte buffer.
(If you zero fill first then you can just copy 16 bytes over.)

-- David

> } while (0)
> 
> 
> > -- David
> > 
> >   
> 
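
The review points in this message (array check on both arguments, single evaluation of each argument, destination never left unterminated, zero fill of the tail) can be illustrated with a userspace sketch. This is not the patch under review and not a proposed v2: `sketch_strtostr()`, `sketch_copy_name()` and `SKETCH_IS_ARRAY()` are hypothetical names, and the code assumes GNU C extensions (`__typeof__`, `__builtin_types_compatible_p`) as provided by gcc and clang.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True when 'a' is an array rather than a pointer (GNU C builtin). */
#define SKETCH_IS_ARRAY(a) \
	(!__builtin_types_compatible_p(__typeof__(a), __typeof__(&(a)[0])))

/*
 * Copy at most dlen - 1 bytes, stopping at the first NUL or at slen,
 * then zero-fill the tail. Byte dlen - 1 is only ever written with 0.
 */
static void sketch_copy_name(char *d, size_t dlen, const char *s, size_t slen)
{
	size_t max, n = 0;

	assert(dlen > 0);
	max = slen < dlen - 1 ? slen : dlen - 1;
	while (n < max && s[n] != '\0')
		n++;
	memcpy(d, s, n);
	memset(d + n, 0, dlen - n);
}

/*
 * sizeof and __typeof__ do not evaluate their operands (for non-VLA
 * arrays), so 'dest' and 'src' are each evaluated once, when their
 * addresses are taken.
 */
#define sketch_strtostr(dest, src) do {					\
	_Static_assert(SKETCH_IS_ARRAY(dest) && SKETCH_IS_ARRAY(src),	\
		       "both arguments must be arrays");		\
	__typeof__(&(dest)) _d = &(dest);				\
	__typeof__(&(src)) _s = &(src);					\
	sketch_copy_name(*_d, sizeof(*_d), *_s, sizeof(*_s));		\
} while (0)
```

With this shape, `sketch_strtostr((p++)->comm, src)` advances `p` exactly once, and passing a plain `char *` for either argument fails at build time. Whether an out-of-line helper like this is fast enough for the tracing paths is a separate question that the sketch does not answer.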



* Re: [5][RFC] fs/ioctl.c: FIBMAP requires CAP_SYS_RAWIO while FIEMAP exposes identical data unprivileged
From: Andreas Dilger @ 2026-05-18 19:49 UTC (permalink / raw)
  To: Cyber_black
  Cc: luto@amacapital.net, hch@infradead.org,
	linux-fsdevel@vger.kernel.org, tytso@mit.edu,
	linux-api@vger.kernel.org, djwong@kernel.org, mark@fasheh.com,
	moybs027@gmail.com
In-Reply-To: <-nQmUF-iBsNFQ1Iz2j_cVui7DxnmpAO7z3X7qH8Xzpr7CYXE8j5x5YeFQ39U1wcMFNuVnuxu1pJf7ooiwJYK8ZFJDpjEtifFaBuWNJIi0ak=@proton.me>

On May 18, 2026, at 11:22, Cyber_black <Cyberblackk@proton.me> wrote:
> 
> Thank you for raising this important question, Andy. I've been following the discussion as a "listening guest" and I have a thought.
> 
> My idea is this: Instead of forcing FIEMAP to become a root-only interface (breaking existing tools), or leaving it as-is (with information disclosure), what if we design a new, restricted API that is not privileged but also not unprivileged in the traditional sense?

What is the *actual* security risk of showing block numbers to users for their own files?

If an attacker can access the underlying device/image, they could directly use debugfs
or other filesystem tools to get file->block mappings anyway, and could modify the image
arbitrarily.  Restricting FIEMAP to root or obscuring block numbers is security through
obscurity and provides no actual safety.

Cheers, Andreas

> 
> Concretely:
> 
> 1.  The API would be callable by any user, but it would not expose physical block addresses.
> 
> 2.  It would answer higher-level questions that tools actually need, such as:
> 
>    -   "Are these two file ranges reflinked (shared)?" (for deduplication)
> 
>    -   "Is this file range sparse (holes)?" (without leaking physical locations)
> 
>    -   "What is the allocation status (delayed, unwritten, etc.)?"
> 
> 3.  The kernel would maintain a capability or permission that is not root-equivalent (e.g., a new `CAP_BLOCK_MAP_QUERY`), but the API would not require full `CAP_SYS_RAWIO`.
> 
> 
> This way:
> 
> -   Tools like `filefrag`, `cp`, and deduplication utilities can work without root.
> 
> -   Physical block addresses remain hidden from unprivileged users, closing the information leak.
> 
> -   We avoid forcing users to run these tools as root, which would open up far more serious risks (e.g., kernel panic, accidental corruption).
> 
> 
> In short: we don't need to choose between "unprivileged leak" and "root-only". We can design a purpose‑limited API that answers only the necessary questions, with the minimum privilege required.
> 
> Would this be acceptable? I'd be happy to help draft a more detailed proposal or prototype.
> 
> This idea was developed together with my friend playerofficial19 (moybs027@gmail.com) through discussion. We hope it's helpful.
> 



* Re: [RFC] fs/ioctl.c: FIBMAP requires CAP_SYS_RAWIO while FIEMAP exposes identical data unprivileged
From: Theodore Tso @ 2026-05-19  2:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Cyber_black, linux-fsdevel@vger.kernel.org, Mark Fasheh,
	linux-api
In-Reply-To: <agqevS--YYBVW2Oz@infradead.org>

On Sun, May 17, 2026 at 10:08:13PM -0700, Christoph Hellwig wrote:
> On Fri, May 15, 2026 at 05:36:45PM +0000, Cyber_black wrote:
> > Option B) Add a capability check to ioctl_fiemap() to match FIBMAP.
> > This restores the intended restriction, at the cost of breaking
> > unprivileged use of FIEMAP (e.g. filefrag, btrfs tools, e2freefrag).
> > This option is a larger ABI impact and likely undesirable.
> > 
> > The preferred fix is Option A, since FIEMAP has been available
> > unprivileged since 2008 with no reported security issues, and read
> > access to physical block layout is already implicitly available
> > through open() permission on the file.
> 
> No, FIEMAP really should not be available unprivileged.  So I think B is
> the right thing.  Can you send a proper patch with a proper signoff?
> 

I disagree.  As I recall, we discussed whether or not FIEMAP needed to
be unprivileged many years ago, and it was a conscious choice not to
require root privs.  I don't believe it is a security issue to allow
users to see the logical -> physical block mappings for inodes.

Users might misuse it, and we did have that issue many years ago when
cp attempted to use FIEMAP in a way that it wasn't intended to be
used[1].  However, that was over 15 years ago.

[1] https://lwn.net/Articles/429345/

But just because an interface could be misused doesn't mean that we
should restrict it, IMHO.

					- Ted


* Re: [RFC] fs/ioctl.c: FIBMAP requires CAP_SYS_RAWIO while FIEMAP exposes identical data unprivileged
From: Darrick J. Wong @ 2026-05-19  3:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christoph Hellwig, Cyber_black, linux-fsdevel@vger.kernel.org,
	Mark Fasheh, Theodore Ts'o, linux-api
In-Reply-To: <CALCETrUFMFNnJ6FLd9SkzS5E1q3x+cqGvOvo5PzU2V_+moSEJw@mail.gmail.com>

On Mon, May 18, 2026 at 09:22:42AM -0700, Andy Lutomirski wrote:
> On Mon, May 18, 2026 at 9:21 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Sun, May 17, 2026 at 10:08:13PM -0700, Christoph Hellwig wrote:
> > > On Fri, May 15, 2026 at 05:36:45PM +0000, Cyber_black wrote:
> > > > Option B) Add a capability check to ioctl_fiemap() to match FIBMAP.
> > > > This restores the intended restriction, at the cost of breaking
> > > > unprivileged use of FIEMAP (e.g. filefrag, btrfs tools, e2freefrag).
> > > > This option is a larger ABI impact and likely undesirable.
> > > >
> > > > The preferred fix is Option A, since FIEMAP has been available
> > > > unprivileged since 2008 with no reported security issues, and read
> > > > access to physical block layout is already implicitly available
> > > > through open() permission on the file.
> > >
> > > No, FIEMAP really should not be available unprivileged.  So I think B is
> > > the right thing.  Can you send a proper patch with a proper signoff?
> >
> > For anyone who might be relying on FIEMAP output to find sparse regions
> > -- don't.  FIEMAP is a lowlevel fs debugging interface; it won't tell
> > you about dirty pagecache backed by unwritten disk space.  cp was burned
> > by that a decade and a half ago.
> >
> 
> The only way that I'm personally aware of to determine whether ranges
> in two files are reflinked to each other (and the only efficient way
> to find identical blocks to, say, archive a large directory without
> reading all the contents) is FIEMAP.  I wrote some code to do this
> awhile back (not in production use).  Yes, I realize that it might
> have issues with dirty page cache.
> 
> Is there some other way to do this?  Could an API be added that
> efficiently answers the actual question without revealing information
> that shouldn't be revealed?

Well, yes, we *could* make yet another ioctl, but we could also just run
fe_physical through a one-way u64 hash function and set
FIEMAP_EXTENT_UNKNOWN if (say) you don't have CAP_SYS_RAWIO or
something.  Then your comparison function might still work... maybe?

OTOH nobody really wants Linus roaring at them, so we might all just do
absolutely nothing.

Also note that FIEMAP still doesn't report devices, so you're still
playing with fire on multi-device reflink-aware filesystems like XFS.

--D
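
A rough userspace sketch of the masking idea above, under loud assumptions: `struct sketch_extent` is a stand-in for the relevant fields of struct fiemap_extent, `sketch_mask_extent()` is a hypothetical name, and `sketch_mix64()` is a simple keyed mixer used only as a placeholder. The mixer is a bijection and is not cryptographically one-way; a real implementation would need a proper keyed hash.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as FIEMAP_EXTENT_UNKNOWN in the uapi header. */
#define SKETCH_EXTENT_UNKNOWN 0x00000002u

/* Stand-in for the fields of struct fiemap_extent used here. */
struct sketch_extent {
	uint64_t fe_logical;
	uint64_t fe_physical;
	uint64_t fe_length;
	uint32_t fe_flags;
};

/*
 * Placeholder mixer (splitmix64 finalizer). Invertible, so NOT a real
 * one-way hash; it only shows where the hash would be applied.
 */
static uint64_t sketch_mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/*
 * For callers without the privilege, replace fe_physical with a keyed
 * token and flag the extent as having an unknown location. 'key' is a
 * hypothetical per-boot or per-filesystem secret.
 */
static void sketch_mask_extent(struct sketch_extent *e, uint64_t key,
			       bool privileged)
{
	if (privileged)
		return;
	e->fe_physical = sketch_mix64(e->fe_physical ^ key);
	e->fe_flags |= SKETCH_EXTENT_UNKNOWN;
}
```

Equal physical start addresses still compare equal after masking, which is the property a reflink comparison would rely on. Arithmetic on the masked values no longer means anything, so two extents that overlap but start at different blocks would not be detected, and the missing device information mentioned above remains a problem either way.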


* Re: [RFC] fs/ioctl.c: FIBMAP requires CAP_SYS_RAWIO while FIEMAP exposes identical data unprivileged
From: Andreas Dilger @ 2026-05-19  7:53 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andy Lutomirski, Christoph Hellwig, Cyber_black,
	linux-fsdevel@vger.kernel.org, Mark Fasheh, Theodore Ts'o,
	linux-api
In-Reply-To: <20260519033126.GD9531@frogsfrogsfrogs>

On May 18, 2026, at 21:31, Darrick J. Wong <djwong@kernel.org> wrote:
> 
> On Mon, May 18, 2026 at 09:22:42AM -0700, Andy Lutomirski wrote:
>> On Mon, May 18, 2026 at 9:21 AM Darrick J. Wong <djwong@kernel.org> wrote:
>>> 
>>> On Sun, May 17, 2026 at 10:08:13PM -0700, Christoph Hellwig wrote:
>>>> On Fri, May 15, 2026 at 05:36:45PM +0000, Cyber_black wrote:
>>>>> Option B) Add a capability check to ioctl_fiemap() to match FIBMAP.
>>>>> This restores the intended restriction, at the cost of breaking
>>>>> unprivileged use of FIEMAP (e.g. filefrag, btrfs tools, e2freefrag).
>>>>> This option is a larger ABI impact and likely undesirable.
>>>>> 
>>>>> The preferred fix is Option A, since FIEMAP has been available
>>>>> unprivileged since 2008 with no reported security issues, and read
>>>>> access to physical block layout is already implicitly available
>>>>> through open() permission on the file.
>>>> 
>>>> No, FIEMAP really should not be available unprivileged.  So I think B is
>>>> the right thing.  Can you send a proper patch with a proper signoff?
>>> 
>>> For anyone who might be relying on FIEMAP output to find sparse regions
>>> -- don't.  FIEMAP is a lowlevel fs debugging interface; it won't tell
>>> you about dirty pagecache backed by unwritten disk space.  cp was burned
>>> by that a decade and a half ago.
>>> 
>> 
>> The only way that I'm personally aware of to determine whether ranges
>> in two files are reflinked to each other (and the only efficient way
>> to find identical blocks to, say, archive a large directory without
>> reading all the contents) is FIEMAP.  I wrote some code to do this
>> awhile back (not in production use).  Yes, I realize that it might
>> have issues with dirty page cache.
>> 
>> Is there some other way to do this?  Could an API be added that
>> efficiently answers the actual question without revealing information
>> that shouldn't be revealed?
> 
> Well, yes, we *could* make yet another ioctl, but we could also just run
> fe_physical through a one-way u64 hash function and set
> FIEMAP_EXTENT_UNKNOWN if (say) you don't have CAP_SYS_RAWIO or
> something.  Then your comparison function might still work... maybe?
> 
> OTOH nobody really wants Linus roaring at them, so we might all just do
> absolutely nothing.
> 
> Also note that FIEMAP still doesn't report devices, so you're still
> playing with fire on multi-device reflink-aware filesystems like XFS.

I've long had a patch to add device printing to FIEMAP/filefrag, but IIRC
the last time I tried to submit it upstream it was rejected.  Maybe times
have changed and there is a chance to get it included.

Cheers, Andreas







* Re: [RFC] fs/ioctl.c: FIBMAP requires CAP_SYS_RAWIO while FIEMAP exposes identical data unprivileged
From: Christoph Hellwig @ 2026-05-19 11:42 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Christoph Hellwig, Cyber_black, linux-fsdevel@vger.kernel.org,
	Mark Fasheh, linux-api
In-Reply-To: <20260519022327.GA11894@macsyma-wired.lan>

On Mon, May 18, 2026 at 10:23:27PM -0400, Theodore Tso wrote:
> I disagree.  As I recall, we discussed whether or not FIEMAP needed to
> be unprivileged many years ago, and it was a conscious choice not to
> require root privs.  I don't believe it is a security issue to allow
> users to see the logical -> physical block mappings for inodes.

Users have no business even knowing it.  It is a side channel that can
easily leak information to attackers who know allocation policies.
And as the reporter stated, it is also inconsistent with how FIBMAP has
behaved since the dawn of time.



* Re: [RFC] fs/ioctl.c: FIBMAP requires CAP_SYS_RAWIO while FIEMAP exposes identical data unprivileged
From: Christoph Hellwig @ 2026-05-19 11:45 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Andy Lutomirski, Christoph Hellwig, Cyber_black,
	linux-fsdevel@vger.kernel.org, Mark Fasheh, Theodore Ts'o,
	linux-api
In-Reply-To: <20260519033126.GD9531@frogsfrogsfrogs>

On Mon, May 18, 2026 at 08:31:26PM -0700, Darrick J. Wong wrote:
> > The only way that I'm personally aware of to determine whether ranges
> > in two files are reflinked to each other (and the only efficient way
> > to find identical blocks to, say, archive a large directory without
> > reading all the contents) is FIEMAP.  I wrote some code to do this
> > awhile back (not in production use).  Yes, I realize that it might
> > have issues with dirty page cache.
> > 
> > Is there some other way to do this?  Could an API be added that
> > efficiently answers the actual question without revealing information
> > that shouldn't be revealed?
> 
> Well, yes, we *could* make yet another ioctl, but we could also just run
> fe_physical through a one-way u64 hash function and set
> FIEMAP_EXTENT_UNKNOWN if (say) you don't have CAP_SYS_RAWIO or
> something.  Then your comparison function might still work... maybe?

What is the actual use case for that dedup detection?  I.e. what is
considered duplicate?  Does the application already have candidate
ranges or does it scan the output for all files?

For xfs the rmap can directly tell you what is shared, but I can't think
of a good way to expose that, but part of that might be that I don't
understand what question is asked and why.

Note the FIEMAP output can give you the wrong answer, e.g. with XFS
and multiple devices, or for file systems that can do tail packing and
have small amounts of data for multiple files in the same block.

> Also note that FIEMAP still doesn't report devices, so you're still
> playing with fire on multi-device reflink-aware filesystems like XFS.

or even on f2fs despite the lack of reflink support if the caller is
dumb enough.  All that of course depends on what the caller is doing
based on the FIEMAP output.


* Re: [RFC] TID v2.0: kernel module for cache-line zeroization against Flush+Reload (CLFLUSHOPT + LFENCE + REP STOSQ)
From: Jann Horn @ 2026-05-19 16:47 UTC (permalink / raw)
  To: Ahmed Hassan
  Cc: linux-kernel, linux-security-module, linux-hardening,
	kernel-hardening, linux-crypto, linux-mm, linux-api,
	linux-kselftest
In-Reply-To: <F78521DA-08DC-424E-BBE1-231BC900CEE0@gmail.com>

On Mon, May 18, 2026 at 11:47 PM Ahmed Hassan
<ahmaaaaadbntaaaaa@gmail.com> wrote:
>
> Hi kernel developers,
>
> I am sharing TID (The Instant Destroyer) v2.0, a Linux kernel module
> written in C that addresses a specific gap in existing security
> libraries: none of them (libsodium, OpenSSL, glibc memzero_explicit)
> flush CPU cache lines after memory zeroization.
>
>
> == Problem ==
>
> Standard zeroization functions (explicit_bzero, sodium_memzero,
> OPENSSL_cleanse) prevent the compiler from eliding the wipe, but do
> not evict CPU cache lines (L1/L2/L3). This leaves residual key
> material measurable via Flush+Reload (Yarom & Falkner, 2014) after
> data use ends.

The thing you're talking about isn't really related to the
Flush+Reload side channel attack, right? You're just talking about
flushing cache lines.

In what threat model would this be an issue? Normally, the goal of
memory zeroing is to ensure that sensitive data is wiped before an
attacker has a chance to physically pull out the RAM from a machine
and plug it into another device that can reveal RAM contents, or
before an attacker gains physical control of a locked device and can
connect malicious peripherals to it, or such.

So for this to be an actual security problem, the device would have to
keep running in a sufficiently high power state that data caches are
not discarded, and at the same time not perform enough memory accesses
to cause this memory to be discarded...

Assuming that this is an actual problem, why are you using a kernel
module for this? At least on x86, CLFLUSH is unprivileged, so crypto
libraries should be able to just use that directly. (There is the
caveat of what happens when the kernel migrates pages or kills a
process, but that's a larger problem.)
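
As an illustration of the point that this can be done without a kernel module, here is a minimal userspace sketch. `sketch_wipe_and_flush()` is a hypothetical name, the 64-byte line size is an assumption about the target CPU, and the flush step is compiled only when SSE2 intrinsics are available; elsewhere the function only zeroes the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>	/* _mm_clflush(), _mm_mfence() */
#endif

#define SKETCH_CACHE_LINE 64	/* assumed line size, not queried from the CPU */

/* Zero 'len' bytes at 'buf', then evict the covering cache lines. */
static void sketch_wipe_and_flush(void *buf, size_t len)
{
	volatile unsigned char *p = buf;
	size_t i;

	if (len == 0)
		return;

	/* Volatile stores so the compiler cannot elide the wipe. */
	for (i = 0; i < len; i++)
		p[i] = 0;

#ifdef __SSE2__
	{
		uintptr_t line = (uintptr_t)buf &
				 ~(uintptr_t)(SKETCH_CACHE_LINE - 1);
		uintptr_t end = (uintptr_t)buf + len;

		_mm_mfence();	/* order the zeroing stores before the flushes */
		for (; line < end; line += SKETCH_CACHE_LINE)
			_mm_clflush((const void *)line);
		_mm_mfence();	/* wait for the flushes to complete */
	}
#endif
}
```

This does nothing about copies the kernel may have made while migrating pages, which is the caveat noted above.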


* Re: [PATCH 3/6] string: Introduce strtostr() for safe and performance string copies
From: André Almeida @ 2026-05-19 19:47 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Juri Lelli, Vincent Guittot, Steven Rostedt,
	Christian Brauner, Kees Cook, Shuah Khan, willy,
	mathieu.desnoyers, Linus Torvalds, akpm, Yafang Shao,
	andrii.nakryiko, arnaldo.melo, Petr Mladek, linux-kernel,
	kernel-dev, linux-mm, linux-api
In-Reply-To: <20260518193843.7bde8d53@pumpkin>

Em 18/05/2026 15:38, David Laight escreveu:
> On Mon, 18 May 2026 11:36:49 -0300
> André Almeida <andrealmeid@igalia.com> wrote:
> 
>> Hi David, thanks for the feedback!
>>
>> Em 17/05/2026 18:34, David Laight escreveu:
>>> On Sun, 17 May 2026 15:36:13 -0300
>>> André Almeida <andrealmeid@igalia.com> wrote:
>>>    
>>>> Some parts of the kernel uses memcpy() instead of strscpy() because they
>>>> are performance sensitive and doesn't care about the return value of
>>>> strscpy(). One such common case is to copy current->comm to a different
>>>> buffer.
>>>>
>>>> As the command name is guaranteed to be NUL-terminated in the range of
>>>> TASK_COMM_LEN, this is safe enough and doesn't create unterminated
>>>> strings. However, in order to expand the size of current->comm, this
>>>> expectation will be broken and those memcpy() could create such strings
>>>> without trailing NUL byte.
>>>>
>>>> In order to support a fast and safe string copy, create strtostr(), to copy
>>>> a NUL-terminated string to a new string buffer. If the destination buffer
>>>> is bigger than the source, no pad is applied, but the string is
>>>> NUL-terminated. If the destination buffer is smaller, the string is
>>>> truncated. The last byte of the destination is always set to NUL for safety.
>>>>
>>>> Signed-off-by: André Almeida <andrealmeid@igalia.com>
>>>> ---
>> [...]
>>>> +/**
>>>> + * strtostr - Copy NUL-terminanted string to NUL-terminate string
>>>> + *
>>>> + * @dest: Pointer of destination string
>>>> + * @src: Pointer to NUL-terminates string
>>>> + *
>>>> + * This is a replacement for strcpy() where the caller doesn't care about the
>>>> + * return value and if the string is going to be truncated, albeit it needs
>>>> + * to mark sure that it will be NUL-terminated. Intended for performance
>>>> + * sensitive cases, such as tracing.
>>>
>>> If you care about performance, and the destination isn't smaller (especially
>>> if the sizes are the same) then just use memcpy().
>>>      
>>
>> The problem is that as I'm expanding current->comm, the source buffer
>> might be bigger than destination, and when we truncate the string, it
>> won't have the termination NUL byte. So we need an extra dest[len-1] =
>> \0 after the memcpy.
> 
> It depends on other access to the destination.
> If it might be being concurrently read it is vital that it is always
> terminated.
> So you can't even temporarily have a non-zero byte at the end.
> 

I don't think this is the case here; as far as I can tell, all the 
callers of strtostr() wait for the copy to finish before using the result.

>>
>>>> + *
>>>> + * If the destination is bigger than the source, no padding happens. It it's
>>>> + * smaller the strings gets truncated.
>>>> + *
>>>> + * Both arguments needs to be arrays with lengths discoverable by the compiler.
>>>> + */
>>>> +#define strtostr(dest, src)	do {					\
>>>> +	const size_t _dest_len = __must_be_cstr(dest) +			\
>>>> +				 ARRAY_SIZE(dest);			\
>>>> +	const size_t _src_len = __must_be_cstr(src) +			\
>>>> +				__builtin_object_size(src, 1);		\
>>>> +									\
>>>> +	BUILD_BUG_ON(!__builtin_constant_p(_dest_len) ||		\
>>>> +		     _dest_len == (size_t)-1);				\
>>>> +	memcpy(dest, src, strnlen(src, min(_src_len, _dest_len)));	\
>>>> +	dest[_dest_len - 1] = '\0';						\
>>>> +} while (0)
>>>
>>> That doesn't work (for all sorts of reasons).
>>> _dest_len can be the size of a pointer - no array check.
>>> You need to use __is_array() and sizeof () for both dest and src.
>>> You might have meant to check that _src_len is constant, not _dest_len.
>>> You must not leave the destination unterminated.
>>>
>>> __builtin_object_size(x->y,1) is also entirely useless!
>>> If you have a pointer to a structure that ends in an array then the
>>> object size of that array is SIZE_MAX (as if the array continues past
>>> the end of the structure).
>>> See https://godbolt.org/z/csenjfvxe (which I happened to prepare earlier today).
>>>
>>> __builtin_object_size(x->y,0) also seems to always return SIZE_MAX.
>>> You do get a sane answer for (x->y,3) on recent clang - but nowhere else.
>>>    
>>
>> Oops, you are right, thanks for pointing that out. This is how it would
>> look like checking that both args are arrays and using sizeof to get
>> their length, if it sounds good I can apply for the v2:
>>
>> #define strtostr(dest, src)	do {				\
>> 	const size_t _dest_len = __must_be_array(dest) +	\
>> 				 sizeof(dest);			\
>> 	const size_t _src_len = __must_be_array(src) +		\
>> 				sizeof(src);			\
>> 								\
>> 	BUILD_BUG_ON(!__builtin_constant_p(_dest_len) ||	\
>> 		     _dest_len == (size_t)-1);			\
> 
> That test can never fail.
> 
>> 	memcpy(dest, src, min(_src_len, _dest_len)));		\
>> 	dest[_dest_len - 1] = '\0';				\
> 
> You are expending 'dest' twice.
> Where it (p++)->array then the two values would be different and the final
> value of 'p' incorrect.
> Much better to assign both pointers to local variables.
> Here you can use their required types to get type checking (I wouldn't bother
> about the extra checks that _must_be_cstr() does).
> 

Also, all those memcpy() calls that I replaced passed the dest size explicitly. I 
think I could reuse it for strtostr() to simplify things a bit, what do 
you think?

> I'd also create function that is explicitly for copying process names.
> (Or replace the one that is already there - saves a lot of churn.)
> then you know (and can check) the sizes are the expected ones.
> 

I don't have strong feelings about get_task_comm(), but Linus said that 
"I'd rather aim to get rid of get_task_comm() entirely"[1], so for me 
it's fine to add a new function for that.

[1] 
https://lore.kernel.org/all/CAHk-=wi5c=_-FBGo_88CowJd_F-Gi6Ud9d=TALm65ReN7YjrMw@mail.gmail.com/

> It might even be worth making the #define (needed to get the array sizes)
> call out to different functions for the different cases.
> 
> Thinks more...
> On 64bit the 16 byte copy can be 'load; store; load; mask; store' provided
> the buffer is aligned (copying u64 on 32bit will work the same).
> But that requires that all the buffers be aligned.
> So you'd need to check _Alignof(dest) >= _Alignof(u64) as well.
> (Probably with a fallback to get things to compile.)
> 
> Whether that is best for the longer 64 byte copy is anybodies guess.
> 
> I also suspect it would be best to zero fill when copying a 16 byte
> name into a 64 byte buffer.
> (If you zero fill first then you can just copy 16 bytes over.)
> 
> -- David
> 
>> } while (0)
>>
>>
>>> -- David
>>>
>>>    
>>
> 



* Re: [PATCH 3/6] string: Introduce strtostr() for safe and performance string copies
From: Linus Torvalds @ 2026-05-19 20:37 UTC (permalink / raw)
  To: André Almeida
  Cc: David Laight, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Steven Rostedt, Christian Brauner, Kees Cook, Shuah Khan, willy,
	mathieu.desnoyers, akpm, Yafang Shao, andrii.nakryiko,
	arnaldo.melo, Petr Mladek, linux-kernel, kernel-dev, linux-mm,
	linux-api
In-Reply-To: <d4d6cf61-568e-478e-88d6-01b769d7eded@igalia.com>

On Mon, 18 May 2026 at 09:37, André Almeida <andrealmeid@igalia.com> wrote:
>
> The problem is that as I'm expanding current->comm, the source buffer
> might be bigger than destination, and when we truncate the string, it
> won't have the termination NUL byte. So we need an extra dest[len-1] =
> \0 after the memcpy.

What's wrong with just using strscpy() with 'len' being min(srcsize,dstsize)?
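
For illustration, a minimal userspace sketch of that suggestion. The
kernel's strscpy() is not available outside the kernel, so
sketch_strscpy() below is a hypothetical stand-in with roughly the same
contract (always NUL-terminates, reports truncation); SKETCH_MIN() and
sketch_copy_comm() are likewise made-up names, and the macro assumes
both arguments are arrays so that sizeof() gives the buffer sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's strscpy(): copy at most
 * count - 1 bytes, always NUL-terminate, and return the copied length,
 * or -1 on truncation (the kernel returns -E2BIG instead).
 */
static long sketch_strscpy(char *dst, const char *src, size_t count)
{
	const char *nul;
	size_t len;

	assert(count > 0);
	nul = memchr(src, '\0', count);
	if (!nul) {
		/* No NUL within count bytes: truncate and terminate. */
		memcpy(dst, src, count - 1);
		dst[count - 1] = '\0';
		return -1;
	}
	len = (size_t)(nul - src);
	memcpy(dst, src, len + 1);
	return (long)len;
}

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Bound the copy by the smaller of the two array sizes. */
#define sketch_copy_comm(dst, src) \
	sketch_strscpy((dst), (src), SKETCH_MIN(sizeof(dst), sizeof(src)))
```

With a 64-byte source and a 16-byte destination the bound is 16, so the
result is cut to 15 characters plus the terminator; in the opposite
direction the bound is the source size, so the copy never reads past the
end of the source array.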

           Linus


* Re: [PATCH 3/6] string: Introduce strtostr() for safe and performance string copies
From: André Almeida @ 2026-05-19 20:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Laight, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Steven Rostedt, Christian Brauner, Kees Cook, Shuah Khan, willy,
	mathieu.desnoyers, akpm, Yafang Shao, andrii.nakryiko,
	arnaldo.melo, Petr Mladek, linux-kernel, kernel-dev, linux-mm,
	linux-api
In-Reply-To: <CAHk-=wgBdK5iRf1NdOuMT0-+sjxUc8QAU9vr66jBBzY6EFDtUA@mail.gmail.com>

Em 19/05/2026 17:37, Linus Torvalds escreveu:
> On Mon, 18 May 2026 at 09:37, André Almeida <andrealmeid@igalia.com> wrote:
>>
>> The problem is that as I'm expanding current->comm, the source buffer
>> might be bigger than destination, and when we truncate the string, it
>> won't have the termination NUL byte. So we need an extra dest[len-1] =
>> \0 after the memcpy.
> 
> What's wrong with just using strscpy() with 'len' being min(srcsize,dstsize)?
> 
Well, I thought that strscpy() was too expensive for the trace use case, 
but I'm happy to use it in the v2 if it's ok.


* Re: [RFC] fs/ioctl.c: FIBMAP requires CAP_SYS_RAWIO while FIEMAP exposes identical data unprivileged
From: Andy Lutomirski @ 2026-05-19 20:51 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Cyber_black, linux-fsdevel@vger.kernel.org,
	Mark Fasheh, Theodore Ts'o, linux-api
In-Reply-To: <20260519033126.GD9531@frogsfrogsfrogs>

On Mon, May 18, 2026 at 8:31 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Mon, May 18, 2026 at 09:22:42AM -0700, Andy Lutomirski wrote:
> > On Mon, May 18, 2026 at 9:21 AM Darrick J. Wong <djwong@kernel.org> wrote:
> > >
> > > On Sun, May 17, 2026 at 10:08:13PM -0700, Christoph Hellwig wrote:
> > > > On Fri, May 15, 2026 at 05:36:45PM +0000, Cyber_black wrote:
> > > > > Option B) Add a capability check to ioctl_fiemap() to match FIBMAP.
> > > > > This restores the intended restriction, at the cost of breaking
> > > > > unprivileged use of FIEMAP (e.g. filefrag, btrfs tools, e2freefrag).
> > > > > This option is a larger ABI impact and likely undesirable.
> > > > >
> > > > > The preferred fix is Option A, since FIEMAP has been available
> > > > > unprivileged since 2008 with no reported security issues, and read
> > > > > access to physical block layout is already implicitly available
> > > > > through open() permission on the file.
> > > >
> > > > No, FIEMAP really should not be available unprivileged.  So I think B is
> > > > the right thing.  Can you send a proper patch with a proper signoff?
> > >
> > > For anyone who might be relying on FIEMAP output to find sparse regions
> > > -- don't.  FIEMAP is a lowlevel fs debugging interface; it won't tell
> > > you about dirty pagecache backed by unwritten disk space.  cp was burned
> > > by that a decade and a half ago.
> > >
> >
> > The only way that I'm personally aware of to determine whether ranges
> > in two files are reflinked to each other (and the only efficient way
> > to find identical blocks to, say, archive a large directory without
> > reading all the contents) is FIEMAP.  I wrote some code to do this
> > awhile back (not in production use).  Yes, I realize that it might
> > have issues with dirty page cache.
> >
> > Is there some other way to do this?  Could an API be added that
> > efficiently answers the actual question without revealing information
> > that shouldn't be revealed?
>
> Well, yes, we *could* make yet another ioctl, but we could also just run
> fe_physical through a one-way u64 hash function and set
> FIEMAP_EXTENT_UNKNOWN if (say) you don't have CAP_SYS_RAWIO or
> something.  Then your comparison function might still work... maybe?
>
> OTOH nobody really wants Linus roaring at them, so we might all just do
> absolutely nothing.
>
> Also note that FIEMAP still doesn't report devices, so you're still
> playing with fire on multi-device reflink-aware filesystems like XFS.
>

A hash would be fine for me.
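
As a rough userspace sketch of why a hashed fe_physical could still
answer the "are these two ranges the same extent?" question: equal
inputs map to equal outputs, so the comparison keeps working while the
raw address is no longer reported. The names sketch_mix64() and
sketch_obscure_physical() and the per-boot "secret" are assumptions for
illustration only. The mixer used here (a splitmix64-style finalizer) is
a bijection and is not cryptographically one-way, so an in-kernel
version would presumably want a proper keyed hash instead.

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64-style finalizer: a bijective 64-bit mixing function. */
static uint64_t sketch_mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/*
 * Hypothetical helper: hide a physical offset behind a secret-keyed mix.
 * Identical offsets still compare equal; distinct offsets never collide
 * because the mixing function is a bijection.
 */
static uint64_t sketch_obscure_physical(uint64_t fe_physical, uint64_t secret)
{
	return sketch_mix64(fe_physical ^ secret);
}
```

Note that this keeps the multi-device caveat from the quoted text: two
files at the same offset on different backing devices would still hash
to the same value unless a device identifier were mixed in as well.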

But really a nicer interface would translate logical ranges in a file
to some range identifier, where:

- It would be consistent with page cache.  So holes are only reported
if the current logical contents is a hole.
- It would return reliably different identifiers for ranges that do
not have identical contents.
- It would usually return the same identifier for ranges that are
known to the FS to have identical contents.
- It would not return the same identifier for files on different
backing devices that just happen to be backed by the same offset
within their respective backing devices.
- It would not necessarily return values that are consistent across a
remount.  But maybe some kind of mount id would be around to at least
detect this happening.

Fun bonus points: if the range is dirty in page cache, tell me, and if
it's not dirty, then, on supporting filesystems, return a value that
will *change* if someone writes to the file and it gets undirtied
again.  IOW it would be nice to be able to use this to efficiently
scan through a file and see what extents may have been modified since
the last scan.  But this would be complex.

I couldn't care less about the actual location of a file.

Anyway, this is a bit of a pie-in-the-sky thought.


* Re: [RFC] TID v2.0: kernel module for cache-line zeroization against Flush+Reload (CLFLUSHOPT + LFENCE + REP STOSQ)
From: Jann Horn @ 2026-05-19 21:41 UTC (permalink / raw)
  To: Ahmad Hasan
  Cc: linux-kernel, linux-security-module, linux-hardening,
	kernel-hardening, linux-crypto, linux-mm, linux-api,
	linux-kselftest
In-Reply-To: <CAAmtCfMHqdWbYh-Hc5sGbOhXSM-aCA9G0-s64G8FTM+rGEV5RA@mail.gmail.com>

On Tue, May 19, 2026 at 11:31 PM Ahmad Hasan
<ahmaaaaadbntaaaaa@gmail.com> wrote:
> Thank you for your questions. I'll address each one:
>
> == 1. Threat Model ==
>
> The target scenario is a same-machine attacker
> in multi-tenant/cloud environments where two
> processes share physical L3 cache.
>
> Example: a cryptographic service and a malicious
> process running on the same host. The attacker
> uses Flush+Reload to measure cache access timing
> after every encryption operation — no physical
> access required.
>
> This is documented with real measurements:
> - Without TID: 78 cycles (Cache HIT — key pattern visible)
> - With TID v2.0: 286 cycles (Cache MISS — attack defeated)

So you're assuming that the cryptographic code leaks secrets through a
cache-based side channel? That would be a vulnerability in the crypto
code.

> == 2. Why Kernel Module and not userspace? ==
>
> You are correct that CLFLUSHOPT does not require
> Ring 0. However, userspace execution can be
> interrupted by a Context Switch, which expands
> the timing window from 372ns to 36,640ns —
> making the attack significantly easier.

Why does it matter how many hundreds of nanoseconds it takes to wipe
the data from memory? You can also have a context switch directly
before you enter your cache-wiping syscall, or in the middle of a
crypto operation.

> == 3. Why not add this directly to libraries? ==
>
> No major security library implements CLFLUSHOPT
> after wiping — not OpenSSL, not libsodium, not
> glibc, not memzero_explicit. This gap has existed
> since Flush+Reload was published in 2014.

I don't think that's a gap, because the standard approach to
mitigating cache-based side channels such as FLUSH+RELOAD is to not
access memory at secret-dependent indices in the first place.
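
To illustrate that approach, here is a minimal sketch of a table lookup
that touches every entry regardless of the secret index, so the set of
cache lines accessed does not depend on the secret. The function name
sketch_ct_lookup() is made up for this example, and the code is only a
sketch: compilers are free to transform it, so real crypto code relies
on vetted constant-time primitives rather than something like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return table[secret_idx] while reading every table entry, so the memory
 * access pattern is the same for every value of secret_idx.
 * Returns 0 if secret_idx is out of range.
 */
static uint8_t sketch_ct_lookup(const uint8_t *table, size_t n,
				size_t secret_idx)
{
	const unsigned int top_bit = sizeof(size_t) * 8 - 1;
	uint8_t result = 0;
	size_t i;

	assert(table != NULL);
	for (i = 0; i < n; i++) {
		/* diff is zero only when i == secret_idx. */
		size_t diff = i ^ secret_idx;
		/* nonzero is 1 when diff != 0, and 0 when diff == 0. */
		size_t nonzero = (diff | ((size_t)0 - diff)) >> top_bit;
		/* mask is 0xff for the wanted entry, 0x00 otherwise. */
		uint8_t mask = (uint8_t)(nonzero - 1);

		result |= table[i] & mask;
	}
	return result;
}
```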


* Re: [PATCH 3/6] string: Introduce strtostr() for safe and performance string copies
From: David Laight @ 2026-05-20  9:53 UTC (permalink / raw)
  To: André Almeida
  Cc: Peter Zijlstra, Juri Lelli, Vincent Guittot, Steven Rostedt,
	Christian Brauner, Kees Cook, Shuah Khan, willy,
	mathieu.desnoyers, Linus Torvalds, akpm, Yafang Shao,
	andrii.nakryiko, arnaldo.melo, Petr Mladek, linux-kernel,
	kernel-dev, linux-mm, linux-api
In-Reply-To: <471b5b42-974c-441a-9afb-13e1baba5c44@igalia.com>

On Tue, 19 May 2026 16:47:05 -0300
André Almeida <andrealmeid@igalia.com> wrote:

> Em 18/05/2026 15:38, David Laight escreveu:
> > On Mon, 18 May 2026 11:36:49 -0300
> > André Almeida <andrealmeid@igalia.com> wrote:
> >   
> >> Hi David, thanks for the feedback!
> >>
> >> Em 17/05/2026 18:34, David Laight escreveu:  
> >>> On Sun, 17 May 2026 15:36:13 -0300
> >>> André Almeida <andrealmeid@igalia.com> wrote:
> >>>      
> >>>> Some parts of the kernel uses memcpy() instead of strscpy() because they
> >>>> are performance sensitive and doesn't care about the return value of
> >>>> strscpy(). One such common case is to copy current->comm to a different
> >>>> buffer.
> >>>>
> >>>> As the command name is guaranteed to be NUL-terminated in the range of
> >>>> TASK_COMM_LEN, this is safe enough and doesn't create unterminated
> >>>> strings. However, in order to expand the size of current->comm, this
> >>>> expectation will be broken and those memcpy() could create such strings
> >>>> without trailing NUL byte.
> >>>>
> >>>> In order to support a fast and safe string copy, create strtostr(), to copy
> >>>> a NUL-terminated string to a new string buffer. If the destination buffer
> >>>> is bigger than the source, no pad is applied, but the string is
> >>>> NUL-terminated. If the destination buffer is smaller, the string is
> >>>> truncated. The last byte of the destination is always set to NUL for safety.
> >>>>
> >>>> Signed-off-by: André Almeida <andrealmeid@igalia.com>
> >>>> ---
> >> [...]>> +/**
> >>>> + * strtostr - Copy NUL-terminanted string to NUL-terminate string
> >>>> + *
> >>>> + * @dest: Pointer of destination string
> >>>> + * @src: Pointer to NUL-terminates string
> >>>> + *
> >>>> + * This is a replacement for strcpy() where the caller doesn't care about the
> >>>> + * return value and if the string is going to be truncated, albeit it needs
> >>>> + * to mark sure that it will be NUL-terminated. Intended for performance
> >>>> + * sensitive cases, such as tracing.  
> >>>
> >>> If you care about performance, and the destination isn't smaller (especially
> >>> if the sizes are the same) then just use memcpy().
> >>>        
> >>
> >> The problem is that as I'm expanding current->comm, the source buffer
> >> might be bigger than destination, and when we truncate the string, it
> >> won't have the termination NUL byte. So we need an extra dest[len-1] =
> >> \0 after the memcpy.  
> > 
> > It depends on other access to the destination.
> > If it might be being concurrently read it is vital that it is always
> > terminated.
> > So you can't even temporarily have a non-zero byte at the end.
> >   
> 
> I don't think this is the case here, as far as I can tell all the 
> callers of strtostr will wait the end of the copy before using it.

It's not the callers, it is other threads.
The comm[] string in the process structure can be read while it is being
updated.
It doesn't matter if the reader gets a mix of the old and new strings,
but it must see the terminating '\0'.

...
> > I'd also create function that is explicitly for copying process names.
> > (Or replace the one that is already there - saves a lot of churn.)
> > then you know (and can check) the sizes are the expected ones.
> >   
> 
> I don't have strong feeling about get_task_comm(), but Linus said that 
> "I'd rather aim to get rid of get_task_comm() entirely"[1] so for me 
> it's fine to get a new function for that.
> 
> [1] 
> https://lore.kernel.org/all/CAHk-=wi5c=_-FBGo_88CowJd_F-Gi6Ud9d=TALm65ReN7YjrMw@mail.gmail.com/
> 

You could probably justify a rewritten get_task_comm() without all the baggage.

It might end up being a wrapper for strscpy_pad() or some other (to be written)
function.
Maybe the body gets extracted out later for other uses...

The advantage of a wrapper is that you can change the implementation
without having to change all the call sites.

Another (untested) wrapper:
#define copy_task_com(dst, src) do { \
	size_t _dst_len = sizeof(dst) + __must_be_array(dst); \
	size_t _src_len = sizeof(src); \
	const char *_src = src; \
	char *_dst = dst; \
\
	if (__is_array(src) && _src_len <= _dst_len) { \
		/* Source array fits: copy it whole, terminator included. */ \
		memcpy(_dst, _src, _src_len); \
	} else if (_Alignof(dst) < _Alignof(u64) || _Alignof(src) < _Alignof(u64) || \
			!__is_array(src) || _dst_len != 16 || _src_len < 16) { \
		strscpy_pad(_dst, _src, _dst_len); \
	} else { \
		/* Aligned 16-byte destination: two word copies, clearing byte 15. */ \
		((u64 *)_dst)[0] = ((const u64 *)_src)[0]; \
		((u64 *)_dst)[1] = ((const u64 *)_src)[1] & ~(u64)cpu_to_be64(0xff); \
	} \
} while (0)

Although (annoyingly) neither _Alignof() nor alignof() gives the value you want.
I don't think you can fix the alignment of a structure member.
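
For reference, a userspace sketch of the 'load; store; load; mask; store'
idea for the 16-byte case. It is an illustration under stated
assumptions, not the proposed kernel code: sketch_copy_comm16() and
SKETCH_COMM_LEN are made-up names, the source must be at least 16 bytes
long, and memcpy() is used for the word loads and stores so that the
sketch does not depend on buffer alignment (compilers typically turn
these into single loads and stores). The mask is built from bytes so
that it clears the last byte in memory on either endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_COMM_LEN 16	/* hypothetical; the 16-byte case discussed */

/*
 * Copy the first 16 bytes of src into dst as two 64-bit words, masking the
 * second word so that dst[15] is always written as '\0'.
 */
static void sketch_copy_comm16(char dst[SKETCH_COMM_LEN], const char *src)
{
	/* Byte-wise mask: keep bytes 8..14, clear byte 15. */
	static const unsigned char keep[8] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
	};
	uint64_t lo, hi, mask;

	assert(dst != NULL && src != NULL);
	memcpy(&lo, src, sizeof(lo));		/* load */
	memcpy(dst, &lo, sizeof(lo));		/* store */
	memcpy(&hi, src + 8, sizeof(hi));	/* load */
	memcpy(&mask, keep, sizeof(mask));
	hi &= mask;				/* mask */
	memcpy(dst + 8, &hi, sizeof(hi));	/* store */
}
```

Because the second word is masked before it is stored, the copy itself
never writes a non-zero value to the final byte of the destination.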

-- David


> > It might even be worth making the #define (needed to get the array sizes)
> > call out to different functions for the different cases.
> > 
> > Thinks more...
> > On 64bit the 16 byte copy can be 'load; store; load; mask; store' provided
> > the buffer is aligned (copying u64 on 32bit will work the same).
> > But that requires that all the buffers be aligned.
> > So you'd need to check _Alignof(dest) >= _Alignof(u64) as well.
> > (Probably with a fallback to get things to compile.)
> > 
> > Whether that is best for the longer 64 byte copy is anybodies guess.
> > 
> > I also suspect it would be best to zero fill when copying a 16 byte
> > name into a 64 byte buffer.
> > (If you zero fill first then you can just copy 16 bytes over.)
> > 
> > -- David
> >   
> >> } while (0)
> >>
> >>  
> >>> -- David
> >>>
> >>>      
> >>  
> >   
> 
> 



* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Mark Brown @ 2026-05-20 14:31 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Al Viro, Christian Brauner, Jan Kara, linux-fsdevel, linux-ext4,
	linux-xfs, linux-cifs, linux-nfs, linux-api, linux-f2fs-devel,
	hirofumi, linkinjeon, sj1557.seo, yuezhang.mo,
	almaz.alexandrovich, slava, glaubitz, frank.li, tytso,
	adilger.kernel, cem, sfrench, pc, ronniesahlberg, sprasad,
	trondmy, anna, jaegeuk, chao, hansg, senozhatsky, Chuck Lever,
	Roland Mainz
In-Reply-To: <20260507-case-sensitivity-v14-3-e62cc8200435@oracle.com>

On Thu, May 07, 2026 at 04:52:56AM -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Report FAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
> and FS_XFLAG_CASENONPRESERVING flags. FAT filesystems are
> case-insensitive by default.
> 
> MSDOS supports a 'nocase' mount option that enables case-sensitive
> behavior; check this option when reporting case sensitivity.
> 
> VFAT long filename entries preserve case; without VFAT, only
> uppercased 8.3 short names are stored. MSDOS with 'nocase' also
> preserves case since the name-formatting code skips upcasing when
> 'nocase' is set. Check both options when reporting case preservation.

I'm seeing a regression in -next with the LTP statx04 test which bisects
to this commit:

tst_tmpdir.c:316: TINFO: Using /tmp/LTP_sta8hUyB4 as tmpdir (tmpfs filesystem)
tst_device.c:98: TINFO: Found free device 0 '/dev/loop0'
tst_test.c:2047: TINFO: LTP version: 20260130
tst_test.c:2050: TINFO: Tested kernel: 7.1.0-rc4-next-20260520 #1 SMP PREEMPT @1779279361 aarch64

...

tst_test.c:1985: TINFO: === Testing on vfat ===
tst_test.c:1290: TINFO: Formatting /dev/loop0 with vfat opts='' extra opts=''
tst_test.c:1302: TINFO: Mounting /dev/loop0 to /tmp/LTP_sta8hUyB4/mntpoint fstyp=vfat flags=0
statx04.c:121: TFAIL: STATX_ATTR_COMPRESSED not supported
statx04.c:121: TFAIL: STATX_ATTR_APPEND not supported
statx04.c:121: TFAIL: STATX_ATTR_IMMUTABLE not supported
statx04.c:121: TFAIL: STATX_ATTR_NODUMP not supported

Full log:

   https://lava.sirena.org.uk/scheduler/job/2778994#L6373

bisect log, with links to intermediate test results:

# bad: [687da68900cd1a46549f7d9430c7d40346cb86a0] Add linux-next specific files for 20260520
# good: [2b248ec57f3dcb99f2ce423b72eb3b77553e90a0] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
# good: [1c9631527427d35668eeb7236803cc4b18f950a8] Merge branch 'vfs-7.2.procfs' into vfs.all
# good: [3035e4454142327ec5faee2ff57ab7cb1e9fc712] fs: Add case sensitivity flags to file_kattr
git bisect start '687da68900cd1a46549f7d9430c7d40346cb86a0' '2b248ec57f3dcb99f2ce423b72eb3b77553e90a0' '1c9631527427d35668eeb7236803cc4b18f950a8' '3035e4454142327ec5faee2ff57ab7cb1e9fc712'
# test job: [1c9631527427d35668eeb7236803cc4b18f950a8] https://lava.sirena.org.uk/scheduler/job/2774081
# test job: [3035e4454142327ec5faee2ff57ab7cb1e9fc712] https://lava.sirena.org.uk/scheduler/job/2774556
# test job: [687da68900cd1a46549f7d9430c7d40346cb86a0] https://lava.sirena.org.uk/scheduler/job/2778994
# bad: [687da68900cd1a46549f7d9430c7d40346cb86a0] Add linux-next specific files for 20260520
git bisect bad 687da68900cd1a46549f7d9430c7d40346cb86a0
# test job: [8d97e7babd9a9ff8b5be4e4105d24ad3514044ff] https://lava.sirena.org.uk/scheduler/job/2774206
# bad: [8d97e7babd9a9ff8b5be4e4105d24ad3514044ff] Merge branch 'vfs-7.2.casefold' into vfs.all
git bisect bad 8d97e7babd9a9ff8b5be4e4105d24ad3514044ff
# test job: [f9eba293ae7ca289e587985f94d84a390949ea31] https://lava.sirena.org.uk/scheduler/job/2773899
# bad: [f9eba293ae7ca289e587985f94d84a390949ea31] Merge branch 'kernel-7.2.misc' into vfs.all
git bisect bad f9eba293ae7ca289e587985f94d84a390949ea31
# test job: [eeb7b37b9700f0dbb3e6fe7b9e910b466ac190dd] https://lava.sirena.org.uk/scheduler/job/2774402
# bad: [eeb7b37b9700f0dbb3e6fe7b9e910b466ac190dd] ntfs3: Implement fileattr_get for case sensitivity
git bisect bad eeb7b37b9700f0dbb3e6fe7b9e910b466ac190dd
# test job: [c92db2ca726fe61a66580d30ecff8c192a791935] https://lava.sirena.org.uk/scheduler/job/2774955
# bad: [c92db2ca726fe61a66580d30ecff8c192a791935] fat: Implement fileattr_get for case sensitivity
git bisect bad c92db2ca726fe61a66580d30ecff8c192a791935
# first bad commit: [c92db2ca726fe61a66580d30ecff8c192a791935] fat: Implement fileattr_get for case sensitivity
# test job: [b6fe046c30236e37e3f8c500cf5b1297c317c5ee] https://lava.sirena.org.uk/scheduler/job/2776383
# bad: [b6fe046c30236e37e3f8c500cf5b1297c317c5ee] hfs: Implement fileattr_get for case sensitivity
git bisect bad b6fe046c30236e37e3f8c500cf5b1297c317c5ee
# test job: [27e0b573dd4aa927670fbfd84732e569fde72078] https://lava.sirena.org.uk/scheduler/job/2774607
# bad: [27e0b573dd4aa927670fbfd84732e569fde72078] exfat: Implement fileattr_get for case sensitivity
git bisect bad 27e0b573dd4aa927670fbfd84732e569fde72078
# test job: [ef14aa143f1dd8adcba6c9277c3bbed2fe0969b4] https://lava.sirena.org.uk/scheduler/job/2774344
# bad: [ef14aa143f1dd8adcba6c9277c3bbed2fe0969b4] vboxsf: Implement fileattr_get for case sensitivity
git bisect bad ef14aa143f1dd8adcba6c9277c3bbed2fe0969b4
# test job: [b6fe046c30236e37e3f8c500cf5b1297c317c5ee] https://lava.sirena.org.uk/scheduler/job/2776383
# bad: [b6fe046c30236e37e3f8c500cf5b1297c317c5ee] hfs: Implement fileattr_get for case sensitivity
git bisect bad b6fe046c30236e37e3f8c500cf5b1297c317c5ee
# test job: [27e0b573dd4aa927670fbfd84732e569fde72078] https://lava.sirena.org.uk/scheduler/job/2774607
# bad: [27e0b573dd4aa927670fbfd84732e569fde72078] exfat: Implement fileattr_get for case sensitivity
git bisect bad 27e0b573dd4aa927670fbfd84732e569fde72078
# test job: [c92db2ca726fe61a66580d30ecff8c192a791935] https://lava.sirena.org.uk/scheduler/job/2774955
# bad: [c92db2ca726fe61a66580d30ecff8c192a791935] fat: Implement fileattr_get for case sensitivity
git bisect bad c92db2ca726fe61a66580d30ecff8c192a791935
# first bad commit: [c92db2ca726fe61a66580d30ecff8c192a791935] fat: Implement fileattr_get for case sensitivity


* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-20 14:39 UTC (permalink / raw)
  To: Mark Brown
  Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
	Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
	John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
	Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
	Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <dc69224d-9926-4414-8c6e-4c15ae98705b@sirena.org.uk>



On Wed, May 20, 2026, at 10:31 AM, Mark Brown wrote:
> On Thu, May 07, 2026 at 04:52:56AM -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> 
>> Report FAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD
>> and FS_XFLAG_CASENONPRESERVING flags. FAT filesystems are
>> case-insensitive by default.
>> 
>> MSDOS supports a 'nocase' mount option that enables case-sensitive
>> behavior; check this option when reporting case sensitivity.
>> 
>> VFAT long filename entries preserve case; without VFAT, only
>> uppercased 8.3 short names are stored. MSDOS with 'nocase' also
>> preserves case since the name-formatting code skips upcasing when
>> 'nocase' is set. Check both options when reporting case preservation.
>
> I'm seeing a regression in -next with the LTP statx04 test which bisects
> to this commit:
>
> tst_tmpdir.c:316: TINFO: Using /tmp/LTP_sta8hUyB4 as tmpdir (tmpfs 
> filesystem)
> tst_device.c:98: TINFO: Found free device 0 '/dev/loop0'
> tst_test.c:2047: TINFO: LTP version: 20260130
> tst_test.c:2050: TINFO: Tested kernel: 7.1.0-rc4-next-20260520 #1 SMP 
> PREEMPT @1779279361 aarch64
>
> ...
>
> tst_test.c:1985: TINFO: === Testing on vfat ===
> tst_test.c:1290: TINFO: Formatting /dev/loop0 with vfat opts='' extra 
> opts=''
> tst_test.c:1302: TINFO: Mounting /dev/loop0 to 
> /tmp/LTP_sta8hUyB4/mntpoint fstyp=vfat flags=0
> statx04.c:121: TFAIL: STATX_ATTR_COMPRESSED not supported
> statx04.c:121: TFAIL: STATX_ATTR_APPEND not supported
> statx04.c:121: TFAIL: STATX_ATTR_IMMUTABLE not supported
> statx04.c:121: TFAIL: STATX_ATTR_NODUMP not supported

At first blush, that does not seem like a plausible bisect
result. This commit shouldn't affect the behavior of tmpfs
in any way.


-- 
Chuck Lever


* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Mark Brown @ 2026-05-20 14:54 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
	Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
	John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
	Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
	Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <04302551-3628-4036-9a3f-596cb782f5b7@app.fastmail.com>

On Wed, May 20, 2026 at 10:39:16AM -0400, Chuck Lever wrote:
> On Wed, May 20, 2026, at 10:31 AM, Mark Brown wrote:
> > On Thu, May 07, 2026 at 04:52:56AM -0400, Chuck Lever wrote:

> > I'm seeing a regression in -next with the LTP statx04 test which bisects
> > to this commit:

> > tst_tmpdir.c:316: TINFO: Using /tmp/LTP_sta8hUyB4 as tmpdir (tmpfs 
> > filesystem)
> > tst_device.c:98: TINFO: Found free device 0 '/dev/loop0'
> > tst_test.c:2047: TINFO: LTP version: 20260130
> > tst_test.c:2050: TINFO: Tested kernel: 7.1.0-rc4-next-20260520 #1 SMP 
> > PREEMPT @1779279361 aarch64

> > ...

> > tst_test.c:1985: TINFO: === Testing on vfat ===
> > tst_test.c:1290: TINFO: Formatting /dev/loop0 with vfat opts='' extra 
> > opts=''
> > tst_test.c:1302: TINFO: Mounting /dev/loop0 to 
> > /tmp/LTP_sta8hUyB4/mntpoint fstyp=vfat flags=0
> > statx04.c:121: TFAIL: STATX_ATTR_COMPRESSED not supported
> > statx04.c:121: TFAIL: STATX_ATTR_APPEND not supported
> > statx04.c:121: TFAIL: STATX_ATTR_IMMUTABLE not supported
> > statx04.c:121: TFAIL: STATX_ATTR_NODUMP not supported

> At first blush, that does not seem like a plausible bisect
> result. This commit shouldn't affect the behavior of tmpfs
> in any way.

It's not testing tmpfs (well, it does but that passed), as the log above
shows it is making a vfat filesystem on a loop device backed by a file
that happens to be in a tmpfs and then testing that.  There's a bunch of
filesystems covered in this manner:

tst_test.c:1985: TINFO: === Testing on ext2 ===
tst_test.c:1985: TINFO: === Testing on ext3 ===
tst_test.c:1985: TINFO: === Testing on ext4 ===
tst_test.c:1985: TINFO: === Testing on btrfs ===
tst_test.c:1985: TINFO: === Testing on vfat ===
tst_test.c:1985: TINFO: === Testing on tmpfs ===


* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-20 15:12 UTC (permalink / raw)
  To: Mark Brown
  Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
	Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
	John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
	Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
	Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <a366645c-364d-4588-8a15-4cd446f64366@sirena.org.uk>



On Wed, May 20, 2026, at 10:54 AM, Mark Brown wrote:
> On Wed, May 20, 2026 at 10:39:16AM -0400, Chuck Lever wrote:
>> On Wed, May 20, 2026, at 10:31 AM, Mark Brown wrote:
>> > On Thu, May 07, 2026 at 04:52:56AM -0400, Chuck Lever wrote:
>
>> > I'm seeing a regression in -next with the LTP statx04 test which bisects
>> > to this commit:
>
>> > tst_tmpdir.c:316: TINFO: Using /tmp/LTP_sta8hUyB4 as tmpdir (tmpfs 
>> > filesystem)
>> > tst_device.c:98: TINFO: Found free device 0 '/dev/loop0'
>> > tst_test.c:2047: TINFO: LTP version: 20260130
>> > tst_test.c:2050: TINFO: Tested kernel: 7.1.0-rc4-next-20260520 #1 SMP 
>> > PREEMPT @1779279361 aarch64
>
>> > ...
>
>> > tst_test.c:1985: TINFO: === Testing on vfat ===
>> > tst_test.c:1290: TINFO: Formatting /dev/loop0 with vfat opts='' extra 
>> > opts=''
>> > tst_test.c:1302: TINFO: Mounting /dev/loop0 to 
>> > /tmp/LTP_sta8hUyB4/mntpoint fstyp=vfat flags=0
>> > statx04.c:121: TFAIL: STATX_ATTR_COMPRESSED not supported
>> > statx04.c:121: TFAIL: STATX_ATTR_APPEND not supported
>> > statx04.c:121: TFAIL: STATX_ATTR_IMMUTABLE not supported
>> > statx04.c:121: TFAIL: STATX_ATTR_NODUMP not supported
>
>> At first blush, that does not seem like a plausible bisect
>> result. This commit shouldn't affect the behavior of tmpfs
>> in any way.
>
> It's not testing tmpfs (well, it does but that passed), as the log above
> shows it is making a vfat filesystem on a loop device backed by a file
> that happens to be in a tmpfs and then testing that.  There's a bunch of
> filesystems covered in this manner:
>
> tst_test.c:1985: TINFO: === Testing on ext2 ===
> tst_test.c:1985: TINFO: === Testing on ext3 ===
> tst_test.c:1985: TINFO: === Testing on ext4 ===
> tst_test.c:1985: TINFO: === Testing on btrfs ===
> tst_test.c:1985: TINFO: === Testing on vfat ===
> tst_test.c:1985: TINFO: === Testing on tmpfs ===

OK. Is vfat the only failure in LTP statx04 ?

-- 
Chuck Lever


* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Mark Brown @ 2026-05-20 15:19 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
	Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
	John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
	Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
	Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <8b750b3f-4d73-41f3-84fb-6e387fd24168@app.fastmail.com>

On Wed, May 20, 2026 at 11:12:51AM -0400, Chuck Lever wrote:
> On Wed, May 20, 2026, at 10:54 AM, Mark Brown wrote:

> > It's not testing tmpfs (well, it does but that passed), as the log above
> > shows it is making a vfat filesystem on a loop device backed by a file
> > that happens to be in a tmpfs and then testing that.  There's a bunch of
> > filesystems covered in this manner:

> OK. Is vfat the only failure in LTP statx04 ?

Yes, it's the only one showing as failing - there are four failures
corresponding to the four tests done for vfat.  It's only testing a
subset of filesystems (a combination of what the test knows about and
what's available at runtime with the kernel and rootfs).

Like I say there's a full log available at:

   https://lava.sirena.org.uk/scheduler/job/2778994#L6373


* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-20 16:58 UTC (permalink / raw)
  To: Mark Brown
  Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
	Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
	John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
	Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
	Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <3a347b64-f91b-450f-b27d-26ea6810b960@sirena.org.uk>

On Wed, May 20, 2026, at 11:19 AM, Mark Brown wrote:
> On Wed, May 20, 2026 at 11:12:51AM -0400, Chuck Lever wrote:
>> On Wed, May 20, 2026, at 10:54 AM, Mark Brown wrote:
>
>> > It's not testing tmpfs (well, it does but that passed), as the log above
>> > shows it is making a vfat filesystem on a loop device backed by a file
>> > that happens to be in a tmpfs and then testing that.  There's a bunch of
>> > filesystems covered in this manner:
>
>> OK. Is vfat the only failure in LTP statx04 ?
>
> Yes, it's the only one showing as failing - there are four failures
> corresponding to the four tests done for vfat.

03/15 adds .fileattr_get = fat_fileattr_get for both
fat_file_inode_operations and vfat_dir_inode_operations. LTP
opens a directory (SAFE_OPEN(TESTDIR, O_RDONLY|O_DIRECTORY)),
so FS_IOC_GETFLAGS on the dir now succeeds, and statx04
proceeds where it was previously skipped.

AFAICS, 03/15 did not change pre-existing kernel behavior of
stx_attributes_mask on vfat. It merely converted a "skipped"
LTP outcome into an "executed but failed" outcome.

Fix options:

* fat_getattr() could call generic_fill_statx_attr(inode, stat),
  which advertises KSTAT_ATTR_VFS_FLAGS (IMMUTABLE + APPEND).
  That clears 2 of 4 TFAILs but not COMPRESSED/NODUMP, which
  FAT genuinely does not back.

* Set stat->attributes_mask |= KSTAT_ATTR_FS_IOC_FLAGS in
  fat_getattr(). Honest only to the extent that FAT now exposes
  some FS_*_FL bits via fileattr. This would silence the test
  failures, but advertises capabilities (COMPRESSED, NODUMP)
  FAT doesn't track.

* Admit the LTP statx04 test needs to be updated.
  FS_IOC_GETFLAGS succeeding does not logically imply all four
  FS_IOC_FLAGS-mapped STATX_ATTR_* bits are supported. The
  test's gate is too coarse for filesystems that gained a
  narrowly-scoped fileattr_get (just casefold/immutable). The
  test's tag list pins it to filesystems that do support the
  full set, but vfat was tacitly excluded by the prior ENOTTY.

The first option is the narrowest kernel-side change, and
matches what other minimal-fileattr filesystems do.
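
As a rough userspace illustration of what the test's gate conflates,
here is a minimal sketch that reports which of the four attribute bits
a path's stx_attributes_mask does not advertise. The helper names
(missing_attrs, path_missing_attrs) and the WANTED_ATTRS macro are
made up for this example and are not taken from LTP or the kernel; it
assumes glibc 2.28 or later for the statx() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD */
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>   /* statx(), struct statx, STATX_ATTR_* */

/* The four FS_IOC_FLAGS-mapped bits discussed above (illustrative macro). */
#define WANTED_ATTRS (STATX_ATTR_COMPRESSED | STATX_ATTR_IMMUTABLE | \
                      STATX_ATTR_APPEND | STATX_ATTR_NODUMP)

/* Pure helper: which of the four bits are absent from an attributes mask? */
uint64_t missing_attrs(uint64_t attributes_mask)
{
    return WANTED_ATTRS & ~attributes_mask;
}

/*
 * Query a path with statx() and report the bits it does not advertise.
 * Returns 0 and fills *missing on success, -1 with errno set on failure.
 */
int path_missing_attrs(const char *path, uint64_t *missing)
{
    struct statx stx;

    assert(missing != NULL);
    if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) != 0)
        return -1;
    *missing = missing_attrs(stx.stx_attributes_mask);
    return 0;
}
```

With the first option applied, one would expect a vfat path to report
only COMPRESSED and NODUMP as missing, independent of whether
FS_IOC_GETFLAGS succeeds on it.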

-- 
Chuck Lever

^ permalink raw reply

* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Mark Brown @ 2026-05-20 17:11 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
	Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
	John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
	Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
	Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <858d7233-1d9c-48f4-aa4f-c5a9f6e1f5dc@app.fastmail.com>


On Wed, May 20, 2026 at 12:58:22PM -0400, Chuck Lever wrote:
> On Wed, May 20, 2026, at 11:19 AM, Mark Brown wrote:

> > Yes, it's the only one showing as failing - there are four failures
> > corresponding to the four tests done for vfat.

> 03/15 adds .fileattr_get = fat_fileattr_get for both
> fat_file_inode_operations and vfat_dir_inode_operations. LTP
> opens a directory (SAFE_OPEN(TESTDIR, O_RDONLY|O_DIRECTORY)),
> so FS_IOC_GETFLAGS on the dir now succeeds, and statx04
> proceeds where it was previously skipped.

> AFAICS, 03/15 did not change pre-existing kernel behavior of
> stx_attributes_mask on vfat. It merely converted a "skipped"
> LTP outcome into an "executed but failed" outcome.

Ah, that's an interesting issue with the way the test reports.  LTP
could use nested reports a la TAP here so we're not just seeing the top
level failure from the test case in automation.

> Fix options:

> * fat_getattr() could call generic_fill_statx_attr(inode, stat),
>   which advertises KSTAT_ATTR_VFS_FLAGS (IMMUTABLE + APPEND).
>   That clears 2 of 4 TFAILs but not COMPRESSED/NODUMP, which
>   FAT genuinely does not back.

...

> * Admit the LTP statx04 test needs to be updated.
>   FS_IOC_GETFLAGS succeeding does not logically imply all four
>   FS_IOC_FLAGS-mapped STATX_ATTR_* bits are supported. The
>   test's gate is too coarse for filesystems that gained a
>   narrowly-scoped fileattr_get (just casefold/immutable). The
>   test's tag list pins it to filesystems that do support the
>   full set, but vfat was tacitly excluded by the prior ENOTTY.

I think this is needed; it's hardly the first LTP test to make
unwarranted assumptions about the kernel APIs.  I'll try to look into
it.

> The first option is the narrowest kernel-side change, and
> matches what other minimal-fileattr filesystems do.

That sounds like a good idea regardless of what we do with the test?

Thanks for looking into this so quickly and thoroughly.


^ permalink raw reply

* Re: [PATCH v14 03/15] fat: Implement fileattr_get for case sensitivity
From: Chuck Lever @ 2026-05-20 17:30 UTC (permalink / raw)
  To: Mark Brown
  Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	linux-ext4, linux-xfs, linux-cifs, linux-nfs, linux-api,
	linux-f2fs-devel, OGAWA Hirofumi, Namjae Jeon, Sungjong Seo,
	Yuezhang Mo, almaz.alexandrovich, Viacheslav Dubeyko,
	John Paul Adrian Glaubitz, frank.li, Theodore Tso, adilger.kernel,
	Carlos Maiolino, Steve French, Paulo Alcantara, Ronnie Sahlberg,
	Shyam Prasad N, Trond Myklebust, Anna Schumaker, Jaegeuk Kim,
	Chao Yu, Hans de Goede, senozhatsky, Chuck Lever, Roland Mainz
In-Reply-To: <cdeaab82-06bf-47c1-8f6c-4e40dbec2344@sirena.org.uk>


On Wed, May 20, 2026, at 1:11 PM, Mark Brown wrote:
> On Wed, May 20, 2026 at 12:58:22PM -0400, Chuck Lever wrote:
>> The first option is the narrowest kernel-side change, and
>> matches what other minimal-fileattr filesystems do.
>
> That sounds like a good idea regardless of what we do with the test?

Yes, I have no objection to this approach, but it would be great to
hear from the vfat maintainers/contributors on this one before I
dig in.


-- 
Chuck Lever

^ permalink raw reply

* Re: [PATCH v2] f2fs: another way to set large folio by remembering inode number
From: Christoph Hellwig @ 2026-05-21  8:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jaegeuk Kim, Christoph Hellwig, linux-kernel, linux-f2fs-devel,
	Akilesh Kailash, linux-fsdevel, linux-mm, linux-api,
	Christian Brauner
In-Reply-To: <ad_HwhzlNPUEKQi6@casper.infradead.org>

On Wed, Apr 15, 2026 at 06:15:46PM +0100, Matthew Wilcox wrote:
> On Wed, Apr 15, 2026 at 04:44:04PM +0000, Jaegeuk Kim wrote:
> > On 04/14, Christoph Hellwig wrote:
> > > Please add the relevant mailing lists when adding new user interfaces.
> > > 
> > > And I'm not sure hacks working around the proper large folio
> > > implementation are something that should be merged upstream.
> > 
> > Cc'ed linux-api and linux-fsdevel onto the patch thread with a proposal that
> > I'm not sure it's acceptable or not. 
> 
> You haven't sent a proposal.  This is a reply to a reply to a reply of a
> patch.  There's no justification for why f2fs is so special that it
> needs this.  What the hell is going on?  You know this is not the way to
> get code merged into Linux.

None of this was properly answered, and this broken interface has now
landed in linux-next. It is overloading a user.* xattr, which is
free-form user data, with semantics that are weird, to put it very nicely.

All this was done against the advice in the mailing list discussion.

I think at some point we just need to stop taking f2fs updates like
this.

^ permalink raw reply

* Re: [PATCH v6 0/4] OPENAT2_REGULAR flag support for openat2
From: Christian Brauner @ 2026-05-21 10:53 UTC (permalink / raw)
  To: linux-fsdevel, Dorjoy Chowdhury
  Cc: linux-kernel, linux-api, ceph-devel, gfs2, linux-nfs, linux-cifs,
	v9fs, linux-kselftest, viro, jack, jlayton, chuck.lever,
	alex.aring, arnd, adilger, mjguzik, smfrench, richard.henderson,
	mattst88, linmag7, tsbogend, James.Bottomley, deller, davem,
	andreas, idryomov, amarkuze, slava, agruenba, trondmy, anna,
	sfrench, pc, ronniesahlberg, sprasad, tom, bharathsm, shuah,
	miklos, hansg
In-Reply-To: <20260416-abgraben-seeweg-a44ce660957f@brauner>

On Thu, Apr 16, 2026 at 03:07:12PM +0200, Christian Brauner wrote:
> On Sat, 28 Mar 2026 23:22:21 +0600, Dorjoy Chowdhury wrote:
> > I came upon this "Ability to only open regular files" uapi feature suggestion
> > from https://uapi-group.org/kernel-features/#ability-to-only-open-regular-files
> > and thought it would be something I could do as a first patch and get to
> > know the kernel code a bit better.
> > 
> > The following filesystems have been tested by building and booting the kernel
> > x86 bzImage in a Fedora 43 VM in QEMU. I have tested with OPENAT2_REGULAR that
> > regular files can be successfully opened and non-regular files (directory, fifo etc)
> > return -EFTYPE.
> > - btrfs
> > - NFS (loopback)
> > - SMB (loopback)
> > 
> > [...]
> 
> - I've added an explanation why OPENAT2_REGULAR is only needed for some
>   ->atomic_open() implementers but not others. What I don't like is that
>   we need all that custom handling in there, but it's manageable.
> 
> - I dropped the topmost style conversions. They really don't belong
>   there and if we switch to something better we should use (1 << <nr>).
> 
> - I split the EFTYPE errno introduction into a separate patch.

So I've massaged this series a bit in that I moved OPENAT2_REGULAR into
the upper 32 bits and internally use a __O_REGULAR bit. After having
thought about it, it makes a lot more sense to move the openat2()-only
features into the upper 32 bits of the uapi space. I also ported the
selftests to the TEST* framework to fit with Aleksa's recent rework.
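
To make the split concrete, here is a minimal sketch of the idea,
assuming the 64-bit flags field of struct open_how. The bit positions
and the HYP_* names are hypothetical stand-ins; the actual values of
OPENAT2_REGULAR and __O_REGULAR are not given in this mail. The point
is only that a flag living above bit 31 cannot be expressed through
the int flags argument of open()/openat(), and gets folded into an
internal bit on the way in.

```c
#include <stdint.h>

/* Hypothetical values: the real bit numbers are not stated in the thread. */
#define HYP_OPENAT2_REGULAR  (UINT64_C(1) << 32)  /* uapi bit in the upper half */
#define HYP_INTERNAL_REGULAR (UINT32_C(1) << 30)  /* stand-in for __O_REGULAR */

/* Returns nonzero if any openat2()-only (upper 32-bit) flag is set. */
int has_openat2_only_flags(uint64_t how_flags)
{
    return (how_flags >> 32) != 0;
}

/* Fold the 64-bit uapi flags into a 32-bit internal flag word. */
uint32_t to_internal_flags(uint64_t how_flags)
{
    uint32_t internal = (uint32_t)how_flags;  /* keep the lower 32 bits as-is */

    if (how_flags & HYP_OPENAT2_REGULAR)
        internal |= HYP_INTERNAL_REGULAR;
    return internal;
}
```

A real implementation would also have to reject unknown upper bits and
make sure the internal bit cannot be set directly from userspace; both
are omitted here.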

^ permalink raw reply
