* [LTP] [PATCH v1] fanotify22.c: handle multiple asynchronous error events
@ 2026-03-04 13:38 Wei Gao via ltp
2026-03-05 9:36 ` Jan Kara
2026-03-09 7:59 ` [LTP] [PATCH v2] " Wei Gao via ltp
0 siblings, 2 replies; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-04 13:38 UTC (permalink / raw)
To: ltp; +Cc: kernel test robot, Jan Kara
Since the introduction of the asynchronous fserror reporting framework
(kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
due to the non-deterministic nature of event delivery and merging:
1) tcase3 failure: A race condition occurs when the test reads the
notification fd between two events. Adding a short delay
(usleep) ensures all events are dispatched and ready before the
read() call.
2) tcase4 failure: The kernel may deliver errors as independent events
instead of a single merged event. The test logic is updated to
validate the expected error_count by either a single merged event
or the accumulation of multiple independent events in the buffer.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
Signed-off-by: Wei Gao <wegao@suse.com>
---
.../kernel/syscalls/fanotify/fanotify22.c | 32 ++++++++++++++++---
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
index 6578474a7..82eed7ba9 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify22.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
@@ -53,6 +53,8 @@ static struct fanotify_fid_t null_fid;
static struct fanotify_fid_t bad_file_fid;
static struct fanotify_fid_t bad_link_fid;
+static int event_count;
+
static void trigger_fs_abort(void)
{
SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type,
@@ -88,7 +90,6 @@ static void trigger_bad_link_lookup(void)
ret, BAD_LINK, errno, EUCLEAN);
}
-
static void tcase3_trigger(void)
{
trigger_bad_link_lookup();
@@ -176,9 +177,10 @@ static int check_error_event_info_error(struct fanotify_event_info_error *info_e
{
int fail = 0;
- if (info_error->error_count != ex->error_count) {
- tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d)",
- ex->name, info_error->error_count, ex->error_count);
+ if (info_error->error_count != ex->error_count && event_count != ex->error_count) {
+ tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d && %d!=%d)",
+ ex->name, info_error->error_count, ex->error_count,
+ event_count, ex->error_count);
fail++;
}
@@ -255,8 +257,30 @@ static void do_test(unsigned int i)
tcase->trigger_error();
+ usleep(100000);
+
read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
+ struct fanotify_event_metadata *metadata;
+ size_t len = read_len;
+
+ event_count = 0;
+
+ for (metadata = (struct fanotify_event_metadata *)event_buf;
+ FAN_EVENT_OK(metadata, len);
+ metadata = FAN_EVENT_NEXT(metadata, len)) {
+ event_count++;
+ struct fanotify_event_info_error *info_error = get_event_info_error(metadata);
+
+ if (info_error) {
+ tst_res(TINFO, "Event [%d]: errno=%d (expected %d), error_count=%d (expected total %d)",
+ event_count, info_error->error, tcase->error,
+ info_error->error_count, tcase->error_count);
+ } else {
+ tst_res(TINFO, "Event [%d]: No error info record found", event_count);
+ }
+ }
+
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
--
2.52.0
--
Mailing list info: https://lists.linux.it/listinfo/ltp
* Re: [LTP] [PATCH v1] fanotify22.c: handle multiple asynchronous error events
2026-03-04 13:38 [LTP] [PATCH v1] fanotify22.c: handle multiple asynchronous error events Wei Gao via ltp
@ 2026-03-05 9:36 ` Jan Kara
2026-03-05 14:36 ` Wei Gao via ltp
2026-03-09 7:59 ` [LTP] [PATCH v2] " Wei Gao via ltp
1 sibling, 1 reply; 27+ messages in thread
From: Jan Kara @ 2026-03-05 9:36 UTC (permalink / raw)
To: Wei Gao; +Cc: kernel test robot, Jan Kara, ltp
On Wed 04-03-26 13:38:07, Wei Gao wrote:
> Since the introduction of the asynchronous fserror reporting framework
> (kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
> due to the non-deterministic nature of event delivery and merging:
>
> 1) tcase3 failure: A race condition occurs when the test reads the
> notification fd between two events. Adding a short delay
> (usleep) ensures all events are dispatched and ready before the
> read() call.
OK, but please add a comment in the code why this is needed.
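For reference, an alternative to a fixed usleep() would be to keep reading until the notification fd stays quiet for a short timeout. This is only a minimal sketch and not what the patch does: read_until_quiet() and its parameters are hypothetical names, and it assumes the buffer is large enough to hold every queued event.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of LTP): append data from fd to buf until
 * no new data shows up within timeout_ms, the buffer is full, or EOF.
 * Returns the number of bytes collected, or -1 on poll()/read() failure.
 */
static ssize_t read_until_quiet(int fd, char *buf, size_t size, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    size_t used = 0;

    assert(buf != NULL && size > 0);

    while (used < size) {
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready < 0)
            return -1;
        if (ready == 0)
            break;          /* nothing arrived within the timeout */

        ssize_t n = read(fd, buf + used, size - used);

        if (n < 0)
            return -1;
        if (n == 0)
            break;          /* end of file */
        used += (size_t)n;
    }

    return (ssize_t)used;
}
```

A caller would pass the notification fd and the event buffer, for example read_until_quiet(fd_notify, event_buf, BUF_SIZE, 100). The timeout still bounds how long a late event can be waited for, so this narrows the race rather than removing it.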
> 2) tcase4 failure: The kernel may deliver errors as independent events
> instead of a single merged event, The test logic is updated to
> validate the expected error_count by either a single merged event
> or the accumulation of multiple independent events in the buffer.
Did you investigate why the events didn't get merged in the kernel? If they
are against the same filesystem they should get merged AFAICS.
Honza
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
> Signed-off-by: Wei Gao <wegao@suse.com>
> ---
> .../kernel/syscalls/fanotify/fanotify22.c | 32 ++++++++++++++++---
> 1 file changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
> index 6578474a7..82eed7ba9 100644
> --- a/testcases/kernel/syscalls/fanotify/fanotify22.c
> +++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
> @@ -53,6 +53,8 @@ static struct fanotify_fid_t null_fid;
> static struct fanotify_fid_t bad_file_fid;
> static struct fanotify_fid_t bad_link_fid;
>
> +static int event_count;
> +
> static void trigger_fs_abort(void)
> {
> SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type,
> @@ -88,7 +90,6 @@ static void trigger_bad_link_lookup(void)
> ret, BAD_LINK, errno, EUCLEAN);
> }
>
> -
> static void tcase3_trigger(void)
> {
> trigger_bad_link_lookup();
> @@ -176,9 +177,10 @@ static int check_error_event_info_error(struct fanotify_event_info_error *info_e
> {
> int fail = 0;
>
> - if (info_error->error_count != ex->error_count) {
> - tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d)",
> - ex->name, info_error->error_count, ex->error_count);
> + if (info_error->error_count != ex->error_count && event_count != ex->error_count) {
> + tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d && %d!=%d)",
> + ex->name, info_error->error_count, ex->error_count,
> + event_count, ex->error_count);
> fail++;
> }
>
> @@ -255,8 +257,30 @@ static void do_test(unsigned int i)
>
> tcase->trigger_error();
>
> + usleep(100000);
> +
OK, but can you please add a comment why the sleep is here.
> read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
>
> + struct fanotify_event_metadata *metadata;
> + size_t len = read_len;
> +
> + event_count = 0;
> +
> + for (metadata = (struct fanotify_event_metadata *)event_buf;
> + FAN_EVENT_OK(metadata, len);
> + metadata = FAN_EVENT_NEXT(metadata, len)) {
> + event_count++;
> + struct fanotify_event_info_error *info_error = get_event_info_error(metadata);
> +
> + if (info_error) {
> + tst_res(TINFO, "Event [%d]: errno=%d (expected %d), error_count=%d (expected total %d)",
> + event_count, info_error->error, tcase->error,
> + info_error->error_count, tcase->error_count);
> + } else {
> + tst_res(TINFO, "Event [%d]: No error info record found", event_count);
> + }
> + }
> +
This looks too lax to me. I think
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [LTP] [PATCH v1] fanotify22.c: handle multiple asynchronous error events
2026-03-05 9:36 ` Jan Kara
@ 2026-03-05 14:36 ` Wei Gao via ltp
2026-03-05 15:50 ` Jan Kara
0 siblings, 1 reply; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-05 14:36 UTC (permalink / raw)
To: Jan Kara; +Cc: kernel test robot, ltp
On Thu, Mar 05, 2026 at 10:36:04AM +0100, Jan Kara wrote:
> On Wed 04-03-26 13:38:07, Wei Gao wrote:
> > Since the introduction of the asynchronous fserror reporting framework
> > (kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
> > due to the non-deterministic nature of event delivery and merging:
> >
> > 1) tcase3 failure: A race condition occurs when the test reads the
> > notification fd between two events. Adding a short delay
> > (usleep) ensures all events are dispatched and ready before the
> > read() call.
>
> OK, but please add a comment in the code why this is needed.
>
Thanks for your quick feedback. I will add comments in the next version.
> > 2) tcase4 failure: The kernel may deliver errors as independent events
> > instead of a single merged event, The test logic is updated to
> > validate the expected error_count by either a single merged event
> > or the accumulation of multiple independent events in the buffer.
>
> Did you investigate why the events didn't get merged in the kernel? If they
> are against the same filesystem they should get merged AFAICS.
>
> Honza
Sorry, I have no idea why this happens. I just added debug code to the LTP test case and
found that the events are not lost but delivered independently, which leads me to believe that
LTP should handle both scenarios: merged and independent events.
I am also not sure whether my patch is correct, which is also why I
CC'd you on the patch :)
>
> >
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
> > Signed-off-by: Wei Gao <wegao@suse.com>
> > ---
> > .../kernel/syscalls/fanotify/fanotify22.c | 32 ++++++++++++++++---
> > 1 file changed, 28 insertions(+), 4 deletions(-)
> >
> > diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
> > index 6578474a7..82eed7ba9 100644
> > --- a/testcases/kernel/syscalls/fanotify/fanotify22.c
> > +++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
> > @@ -53,6 +53,8 @@ static struct fanotify_fid_t null_fid;
> > static struct fanotify_fid_t bad_file_fid;
> > static struct fanotify_fid_t bad_link_fid;
> >
> > +static int event_count;
> > +
> > static void trigger_fs_abort(void)
> > {
> > SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type,
> > @@ -88,7 +90,6 @@ static void trigger_bad_link_lookup(void)
> > ret, BAD_LINK, errno, EUCLEAN);
> > }
> >
> > -
> > static void tcase3_trigger(void)
> > {
> > trigger_bad_link_lookup();
> > @@ -176,9 +177,10 @@ static int check_error_event_info_error(struct fanotify_event_info_error *info_e
> > {
> > int fail = 0;
> >
> > - if (info_error->error_count != ex->error_count) {
> > - tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d)",
> > - ex->name, info_error->error_count, ex->error_count);
> > + if (info_error->error_count != ex->error_count && event_count != ex->error_count) {
> > + tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d && %d!=%d)",
> > + ex->name, info_error->error_count, ex->error_count,
> > + event_count, ex->error_count);
> > fail++;
> > }
> >
> > @@ -255,8 +257,30 @@ static void do_test(unsigned int i)
> >
> > tcase->trigger_error();
> >
> > + usleep(100000);
> > +
>
> OK, but can you please add a comment why the sleep is here.
>
Sure!
> > read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
> >
> > + struct fanotify_event_metadata *metadata;
> > + size_t len = read_len;
> > +
> > + event_count = 0;
> > +
> > + for (metadata = (struct fanotify_event_metadata *)event_buf;
> > + FAN_EVENT_OK(metadata, len);
> > + metadata = FAN_EVENT_NEXT(metadata, len)) {
> > + event_count++;
> > + struct fanotify_event_info_error *info_error = get_event_info_error(metadata);
> > +
> > + if (info_error) {
> > + tst_res(TINFO, "Event [%d]: errno=%d (expected %d), error_count=%d (expected total %d)",
> > + event_count, info_error->error, tcase->error,
> > + info_error->error_count, tcase->error_count);
> > + } else {
> > + tst_res(TINFO, "Event [%d]: No error info record found", event_count);
> > + }
> > + }
> > +
>
> This looks too lax to me. I think
>
I guess you mean this is a workaround for the issue? Are you suggesting that I should rework the test case
to properly handle independent events rather than relying on the existing logic?
If so, I’d be happy to explore that and implement a more robust solution.
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
* Re: [LTP] [PATCH v1] fanotify22.c: handle multiple asynchronous error events
2026-03-05 14:36 ` Wei Gao via ltp
@ 2026-03-05 15:50 ` Jan Kara
2026-03-06 4:50 ` Wei Gao via ltp
0 siblings, 1 reply; 27+ messages in thread
From: Jan Kara @ 2026-03-05 15:50 UTC (permalink / raw)
To: Wei Gao; +Cc: kernel test robot, Jan Kara, ltp
On Thu 05-03-26 14:36:31, Wei Gao wrote:
> On Thu, Mar 05, 2026 at 10:36:04AM +0100, Jan Kara wrote:
> > > 2) tcase4 failure: The kernel may deliver errors as independent events
> > > instead of a single merged event, The test logic is updated to
> > > validate the expected error_count by either a single merged event
> > > or the accumulation of multiple independent events in the buffer.
> >
> > Did you investigate why the events didn't get merged in the kernel? If they
> > are against the same filesystem they should get merged AFAICS.
> >
> Sorry i have no idea why this happen, I just add debug code into LTP case and
> found the event not lost but deliver independent, this leads me to believe that
> the LTP should handle both scenarios—merged and independent events.
> I also not sure my patch is correct or not, that's also the reason i
> CC the patch to you :)
OK :). How easily is this reproducible? Because in principle event merging
is not *guaranteed* so it isn't wrong for the LTP test to handle split
events but before we complicate the test too much it would be good to
figure out why the kernel behaves in an unexpected way and doesn't merge the
events...
Honza
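To illustrate what handling split events could look like, the sketch below sums error_count over every event that was read and compares the total with the expected value. It is only one possible stricter check under stated assumptions, not what v1 does: struct err_record is a simplified stand-in for the error info record, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the fields of interest in one FAN_FS_ERROR event. */
struct err_record {
    int32_t error;          /* errno value reported by the event */
    uint32_t error_count;   /* number of errors folded into this event */
};

/* Sum error_count over all events read from the notification fd. */
static uint32_t total_error_count(const struct err_record *recs, size_t n)
{
    uint32_t total = 0;
    size_t i;

    assert(recs != NULL || n == 0);

    for (i = 0; i < n; i++)
        total += recs[i].error_count;

    return total;
}

/*
 * Returns 1 when the events, whether merged into one or delivered
 * separately, account for exactly `expected` errors; 0 otherwise.
 */
static int error_count_matches(const struct err_record *recs, size_t n,
                               uint32_t expected)
{
    return n > 0 && total_error_count(recs, n) == expected;
}
```

With this check, one merged event with error_count=2 and two independent events with error_count=1 each both satisfy an expected total of 2. Two events carrying error_count=1 and error_count=3 do not, whereas the v1 condition would accept them because the number of events equals the expected count.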
> > > Reported-by: kernel test robot <oliver.sang@intel.com>
> > > Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
> > > Signed-off-by: Wei Gao <wegao@suse.com>
> > > ---
> > > .../kernel/syscalls/fanotify/fanotify22.c | 32 ++++++++++++++++---
> > > 1 file changed, 28 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
> > > index 6578474a7..82eed7ba9 100644
> > > --- a/testcases/kernel/syscalls/fanotify/fanotify22.c
> > > +++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
> > > @@ -53,6 +53,8 @@ static struct fanotify_fid_t null_fid;
> > > static struct fanotify_fid_t bad_file_fid;
> > > static struct fanotify_fid_t bad_link_fid;
> > >
> > > +static int event_count;
> > > +
> > > static void trigger_fs_abort(void)
> > > {
> > > SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type,
> > > @@ -88,7 +90,6 @@ static void trigger_bad_link_lookup(void)
> > > ret, BAD_LINK, errno, EUCLEAN);
> > > }
> > >
> > > -
> > > static void tcase3_trigger(void)
> > > {
> > > trigger_bad_link_lookup();
> > > @@ -176,9 +177,10 @@ static int check_error_event_info_error(struct fanotify_event_info_error *info_e
> > > {
> > > int fail = 0;
> > >
> > > - if (info_error->error_count != ex->error_count) {
> > > - tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d)",
> > > - ex->name, info_error->error_count, ex->error_count);
> > > + if (info_error->error_count != ex->error_count && event_count != ex->error_count) {
> > > + tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d && %d!=%d)",
> > > + ex->name, info_error->error_count, ex->error_count,
> > > + event_count, ex->error_count);
> > > fail++;
> > > }
> > >
> > > @@ -255,8 +257,30 @@ static void do_test(unsigned int i)
> > >
> > > tcase->trigger_error();
> > >
> > > + usleep(100000);
> > > +
> >
> > OK, but can you please add a comment why the sleep is here.
> >
> Sure!
> > > read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
> > >
> > > + struct fanotify_event_metadata *metadata;
> > > + size_t len = read_len;
> > > +
> > > + event_count = 0;
> > > +
> > > + for (metadata = (struct fanotify_event_metadata *)event_buf;
> > > + FAN_EVENT_OK(metadata, len);
> > > + metadata = FAN_EVENT_NEXT(metadata, len)) {
> > > + event_count++;
> > > + struct fanotify_event_info_error *info_error = get_event_info_error(metadata);
> > > +
> > > + if (info_error) {
> > > + tst_res(TINFO, "Event [%d]: errno=%d (expected %d), error_count=%d (expected total %d)",
> > > + event_count, info_error->error, tcase->error,
> > > + info_error->error_count, tcase->error_count);
> > > + } else {
> > > + tst_res(TINFO, "Event [%d]: No error info record found", event_count);
> > > + }
> > > + }
> > > +
> >
> > This looks too lax to me. I think
> >
> I guess your mean this is workaround for the issue? Are you suggesting that I should reconstruct the test case
> to properly handle independent events rather than relying on the existing logic?
> If so, I’d be happy to explore that and implement a more robust solution.
>
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [LTP] [PATCH v1] fanotify22.c: handle multiple asynchronous error events
2026-03-05 15:50 ` Jan Kara
@ 2026-03-06 4:50 ` Wei Gao via ltp
2026-03-06 12:24 ` Petr Vorel
2026-03-06 15:19 ` Jan Kara
0 siblings, 2 replies; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-06 4:50 UTC (permalink / raw)
To: Jan Kara; +Cc: kernel test robot, ltp
On Thu, Mar 05, 2026 at 04:50:57PM +0100, Jan Kara wrote:
> On Thu 05-03-26 14:36:31, Wei Gao wrote:
> > On Thu, Mar 05, 2026 at 10:36:04AM +0100, Jan Kara wrote:
> > > > 2) tcase4 failure: The kernel may deliver errors as independent events
> > > > instead of a single merged event, The test logic is updated to
> > > > validate the expected error_count by either a single merged event
> > > > or the accumulation of multiple independent events in the buffer.
> > >
> > > Did you investigate why the events didn't get merged in the kernel? If they
> > > are against the same filesystem they should get merged AFAICS.
> > >
> > Sorry i have no idea why this happen, I just add debug code into LTP case and
> > found the event not lost but deliver independent, this leads me to believe that
> > the LTP should handle both scenarios—merged and independent events.
> > I also not sure my patch is correct or not, that's also the reason i
> > CC the patch to you :)
>
> OK :). How easily is this reproducible? Because in principle event merging
> is not *guaranteed* so it isn't wrong for the LTP test to handle split
> events but before we complicate the test too much it would be good to
> figure out why the kernel behaves in unexpected way and doesn't merge the
> events...
>
> Honza
I cannot reproduce this in my local environment, but I can reproduce it in the OpenQA worker environment,
with a failure rate of approximately 30%.
By comparing the logs of successful and failed merge cases, I tried to identify the root cause:
event merging is bypassed when the errors are handled by different kworker threads.
The core of the issue lies in how the event hash is calculated. As seen in the kernel source:
hash ^= hash_long((unsigned long)pid | ondir, FANOTIFY_EVENT_HASH_BITS);
In the failed case, different threads (e.g., T1618 and T1550) processed the two error events.
This places the events into different hash buckets (e.g., bucket 118 vs. 89), so
fanotify_should_merge() is never even invoked.
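The effect of the pid term on the bucket can be modelled in userspace. The sketch below is an approximation under stated assumptions and not kernel code: hash_bits64() imitates a 64-bit multiplicative hash_long(), the 7-bit width matches the bucket numbers below 128 seen in the logs, and event_bucket(), the base hash, and the pid values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed hash width: 7 bits gives 128 buckets, as the logged bucket numbers suggest. */
#define EVENT_HASH_BITS 7u
/* Multiplier assumed for the 64-bit multiplicative hash. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Userspace imitation of a 64-bit hash_long(): keep the top `bits` bits. */
static uint32_t hash_bits64(uint64_t val, unsigned int bits)
{
    assert(bits > 0 && bits <= 32);
    return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}

/*
 * Hypothetical model of bucket selection: the pid term is XORed into the
 * base hash, so the bucket changes whenever the pid term hashes differently.
 */
static uint32_t event_bucket(uint32_t base_hash, uint64_t pid_value,
                             unsigned int ondir)
{
    uint32_t hash = base_hash ^ hash_bits64(pid_value | ondir, EVENT_HASH_BITS);

    return hash & ((1u << EVENT_HASH_BITS) - 1);
}
```

In this model, two events with the same base hash land in the same bucket exactly when their pid terms hash to the same 7-bit value. That fits the logs: the same worker thread gives one bucket and a merge attempt, while different worker threads usually give different buckets.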
Enable the pr_debug output with the following command:
echo 'func fanotify_should_merge +p' > /sys/kernel/debug/dynamic_debug/control
merge failed log:
[ 5005.318867] [ T2675] loop0: detected capacity change from 0 to 614400
[ 5005.324451] [ T2675] /dev/zero: Can't lookup blockdev
[ 5005.399749] [ C0] operation not supported error, dev loop0, sector 614272 op 0x9:(WRITE_ZEROES) flags 0x10800800 phys_seg 0 prio class 2
[ 5005.431946] [ T2675] EXT4-fs (loop0): mounted filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f r/w with ordered data mode. Quota mode: none.
[ 5005.450493] [ T2691] EXT4-fs (loop0): unmounting filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f.
[ 5005.472350] [ T2691] EXT4-fs (loop0): mounted filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f r/w with ordered data mode. Quota mode: none.
[ 5005.476698] [ T2691] EXT4-fs error (device loop0): ext4_lookup:1785: inode #32386: comm fanotify22: iget: bogus i_mode (377)
[ 5005.479819] [ T1618] fanotify_group_event_mask: report_mask=4 mask=8000 data=00000000a5a60545 data_type=6
[ 5005.484575] [ T1618] fanotify_handle_event: group=000000005be5930d mask=8000 report_mask=4
[ 5005.485525] [ T2691] EXT4-fs error (device loop0): __ext4_remount:6804: comm fanotify22: Abort forced by user
[ 5005.486874] [ T1618] fanotify_insert_event: group=000000005be5930d event=00000000f1581cb4 bucket=118 <<<<<<
[ 5005.488983] [ T2691] Aborting journal on device loop0-8.
[ 5005.492336] [ T1550] fanotify_group_event_mask: report_mask=4 mask=8000 data=000000007f826d40 data_type=6
[ 5005.494414] [ T2691] EXT4-fs (loop0): Remounting filesystem read-only
[ 5005.494505] [ T1550] fanotify_handle_event: group=000000005be5930d mask=8000 report_mask=4
[ 5005.494746] [ T2691] EXT4-fs (loop0): re-mounted 66a18a14-c058-4356-bf1d-f34b4139ad3f ro.
[ 5005.495052] [ T1550] fanotify_merge: group=000000005be5930d event=00000000c92d9500 bucket=89
[ 5005.501280] [ T1550] fanotify_insert_event: group=000000005be5930d event=00000000c92d9500 bucket=89 <<<<<<
[ 5006.500649] [ T2691] EXT4-fs (loop0): unmounting filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f.
[ 5006.505916] [ T2691] EXT4-fs (loop0): warning: mounting fs with errors, running e2fsck is recommended
[ 5006.509571] [ T2691] EXT4-fs (loop0): mounted filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f r/w with ordered data mode. Quota mode: none.
[ 5006.512851] [ T2675] EXT4-fs (loop0): unmounting filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f.
merge ok log:
[ 5115.542758] [ T2732] loop0: detected capacity change from 0 to 614400
[ 5115.549082] [ T2732] /dev/zero: Can't lookup blockdev
[ 5115.617432] [ C0] operation not supported error, dev loop0, sector 614272 op 0x9:(WRITE_ZEROES) flags 0x10800800 phys_seg 0 prio class 2
[ 5115.637765] [ T2732] EXT4-fs (loop0): mounted filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a r/w with ordered data mode. Quota mode: none.
[ 5115.661989] [ T2744] EXT4-fs (loop0): unmounting filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a.
[ 5115.690910] [ T2744] EXT4-fs (loop0): mounted filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a r/w with ordered data mode. Quota mode: none.
[ 5115.695573] [ T2744] EXT4-fs error (device loop0): ext4_lookup:1785: inode #32386: comm fanotify22: iget: bogus i_mode (377)
[ 5115.697835] [ T1618] fanotify_group_event_mask: report_mask=4 mask=8000 data=00000000a5a60545 data_type=6
[ 5115.700048] [ T1618] fanotify_handle_event: group=000000004dc01100 mask=8000 report_mask=4
[ 5115.702552] [ T1618] fanotify_insert_event: group=000000004dc01100 event=00000000c92d9500 bucket=48 <<<<<<
[ 5115.704872] [ T2744] EXT4-fs error (device loop0): __ext4_remount:6804: comm fanotify22: Abort forced by user
[ 5115.707250] [ T2744] Aborting journal on device loop0-8.
[ 5115.708638] [ T1618] fanotify_group_event_mask: report_mask=4 mask=8000 data=00000000a5a60545 data_type=6
[ 5115.712418] [ T1618] fanotify_handle_event: group=000000004dc01100 mask=8000 report_mask=4
[ 5115.715217] [ T1618] fanotify_merge: group=000000004dc01100 event=00000000f1581cb4 bucket=48 <<<<<<
[ 5115.716032] [ T2744] EXT4-fs (loop0): Remounting filesystem read-only
[ 5115.717883] [ T1618] fanotify_should_merge: old=00000000c92d9500 new=00000000f1581cb4
[ 5115.718698] [ T2744] EXT4-fs (loop0): re-mounted f2264636-366f-4ba7-a6c1-3f7963ce677a ro.
[ 5116.725817] [ T2744] EXT4-fs (loop0): unmounting filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a.
[ 5116.731447] [ T2744] EXT4-fs (loop0): warning: mounting fs with errors, running e2fsck is recommended
[ 5116.734565] [ T2744] EXT4-fs (loop0): mounted filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a r/w with ordered data mode. Quota mode: none.
[ 5116.738947] [ T2732] EXT4-fs (loop0): unmounting filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a.
>
> > > > Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
> > > > Signed-off-by: Wei Gao <wegao@suse.com>
> > > > ---
> > > > .../kernel/syscalls/fanotify/fanotify22.c | 32 ++++++++++++++++---
> > > > 1 file changed, 28 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
> > > > index 6578474a7..82eed7ba9 100644
> > > > --- a/testcases/kernel/syscalls/fanotify/fanotify22.c
> > > > +++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
> > > > @@ -53,6 +53,8 @@ static struct fanotify_fid_t null_fid;
> > > > static struct fanotify_fid_t bad_file_fid;
> > > > static struct fanotify_fid_t bad_link_fid;
> > > >
> > > > +static int event_count;
> > > > +
> > > > static void trigger_fs_abort(void)
> > > > {
> > > > SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type,
> > > > @@ -88,7 +90,6 @@ static void trigger_bad_link_lookup(void)
> > > > ret, BAD_LINK, errno, EUCLEAN);
> > > > }
> > > >
> > > > -
> > > > static void tcase3_trigger(void)
> > > > {
> > > > trigger_bad_link_lookup();
> > > > @@ -176,9 +177,10 @@ static int check_error_event_info_error(struct fanotify_event_info_error *info_e
> > > > {
> > > > int fail = 0;
> > > >
> > > > - if (info_error->error_count != ex->error_count) {
> > > > - tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d)",
> > > > - ex->name, info_error->error_count, ex->error_count);
> > > > + if (info_error->error_count != ex->error_count && event_count != ex->error_count) {
> > > > + tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d && %d!=%d)",
> > > > + ex->name, info_error->error_count, ex->error_count,
> > > > + event_count, ex->error_count);
> > > > fail++;
> > > > }
> > > >
> > > > @@ -255,8 +257,30 @@ static void do_test(unsigned int i)
> > > >
> > > > tcase->trigger_error();
> > > >
> > > > + usleep(100000);
> > > > +
> > >
> > > OK, but can you please add a comment why the sleep is here.
> > >
> > Sure!
> > > > read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
> > > >
> > > > + struct fanotify_event_metadata *metadata;
> > > > + size_t len = read_len;
> > > > +
> > > > + event_count = 0;
> > > > +
> > > > + for (metadata = (struct fanotify_event_metadata *)event_buf;
> > > > + FAN_EVENT_OK(metadata, len);
> > > > + metadata = FAN_EVENT_NEXT(metadata, len)) {
> > > > + event_count++;
> > > > + struct fanotify_event_info_error *info_error = get_event_info_error(metadata);
> > > > +
> > > > + if (info_error) {
> > > > + tst_res(TINFO, "Event [%d]: errno=%d (expected %d), error_count=%d (expected total %d)",
> > > > + event_count, info_error->error, tcase->error,
> > > > + info_error->error_count, tcase->error_count);
> > > > + } else {
> > > > + tst_res(TINFO, "Event [%d]: No error info record found", event_count);
> > > > + }
> > > > + }
> > > > +
> > >
> > > This looks too lax to me. I think
> > >
> > I guess your mean this is workaround for the issue? Are you suggesting that I should reconstruct the test case
> > to properly handle independent events rather than relying on the existing logic?
> > If so, I’d be happy to explore that and implement a more robust solution.
> >
> > > --
> > > Jan Kara <jack@suse.com>
> > > SUSE Labs, CR
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
* Re: [LTP] [PATCH v1] fanotify22.c: handle multiple asynchronous error events
2026-03-06 4:50 ` Wei Gao via ltp
@ 2026-03-06 12:24 ` Petr Vorel
2026-03-06 15:19 ` Jan Kara
1 sibling, 0 replies; 27+ messages in thread
From: Petr Vorel @ 2026-03-06 12:24 UTC (permalink / raw)
To: Wei Gao; +Cc: kernel test robot, Amir Goldstein, Jan Kara, ltp
Hi all,
[ Cc Amir
https://lore.kernel.org/ltp/20260304133810.24585-1-wegao@suse.com/
https://patchwork.ozlabs.org/project/ltp/patch/20260304133810.24585-1-wegao@suse.com/
]
> On Thu, Mar 05, 2026 at 04:50:57PM +0100, Jan Kara wrote:
> > On Thu 05-03-26 14:36:31, Wei Gao wrote:
> > > On Thu, Mar 05, 2026 at 10:36:04AM +0100, Jan Kara wrote:
> > > > > 2) tcase4 failure: The kernel may deliver errors as independent events
> > > > > instead of a single merged event, The test logic is updated to
> > > > > validate the expected error_count by either a single merged event
> > > > > or the accumulation of multiple independent events in the buffer.
> > > > Did you investigate why the events didn't get merged in the kernel? If they
> > > > are against the same filesystem they should get merged AFAICS.
> > > Sorry i have no idea why this happen, I just add debug code into LTP case and
> > > found the event not lost but deliver independent, this leads me to believe that
> > > the LTP should handle both scenarios—merged and independent events.
> > > I also not sure my patch is correct or not, that's also the reason i
> > > CC the patch to you :)
> > OK :). How easily is this reproducible? Because in principle event merging
> > is not *guaranteed* so it isn't wrong for the LTP test to handle split
> > events but before we complicate the test too much it would be good to
> > figure out why the kernel behaves in unexpected way and doesn't merge the
> > events...
> > Honza
> Can not reproduce in my local env, but i can reproduce this issue in the OpenQA worker environment,
> with a failure rate of approximately 30%.
It'd be interesting to run the qcow image manually on the worker to investigate whether
this is related to any QEMU switch (last time, an openQA-only issue was due to
HPET being enabled, which is off by default). Anyway, more info offline.
Kind regards,
Petr
> By comparing the logs of successful and failed merge cases, I try to identified the root cause:
> Event merging is bypassed when errors are handled by different kworker threads.
> The core of the issue lies in how the event hash is calculated. As seen in the kernel source:
> hash ^= hash_long((unsigned long)pid | ondir, FANOTIFY_EVENT_HASH_BITS);
> In the failed case, different threads (e.g., T1618 and T1550) processed different events errors.
> This places the events into different hash buckets (e.g., Bucket 89 vs. 118), finally
> fanotify_should_merge() is never even invoked.
> Enable pr-debug info with following command:
> echo 'func fanotify_should_merge +p' > /sys/kernel/debug/dynamic_debug/control
> merge failed log:
> [ 5005.318867] [ T2675] loop0: detected capacity change from 0 to 614400
> [ 5005.324451] [ T2675] /dev/zero: Can't lookup blockdev
> [ 5005.399749] [ C0] operation not supported error, dev loop0, sector 614272 op 0x9:(WRITE_ZEROES) flags 0x10800800 phys_seg 0 prio class 2
> [ 5005.431946] [ T2675] EXT4-fs (loop0): mounted filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f r/w with ordered data mode. Quota mode: none.
> [ 5005.450493] [ T2691] EXT4-fs (loop0): unmounting filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f.
> [ 5005.472350] [ T2691] EXT4-fs (loop0): mounted filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f r/w with ordered data mode. Quota mode: none.
> [ 5005.476698] [ T2691] EXT4-fs error (device loop0): ext4_lookup:1785: inode #32386: comm fanotify22: iget: bogus i_mode (377)
> [ 5005.479819] [ T1618] fanotify_group_event_mask: report_mask=4 mask=8000 data=00000000a5a60545 data_type=6
> [ 5005.484575] [ T1618] fanotify_handle_event: group=000000005be5930d mask=8000 report_mask=4
> [ 5005.485525] [ T2691] EXT4-fs error (device loop0): __ext4_remount:6804: comm fanotify22: Abort forced by user
> [ 5005.486874] [ T1618] fanotify_insert_event: group=000000005be5930d event=00000000f1581cb4 bucket=118 <<<<<<
> [ 5005.488983] [ T2691] Aborting journal on device loop0-8.
> [ 5005.492336] [ T1550] fanotify_group_event_mask: report_mask=4 mask=8000 data=000000007f826d40 data_type=6
> [ 5005.494414] [ T2691] EXT4-fs (loop0): Remounting filesystem read-only
> [ 5005.494505] [ T1550] fanotify_handle_event: group=000000005be5930d mask=8000 report_mask=4
> [ 5005.494746] [ T2691] EXT4-fs (loop0): re-mounted 66a18a14-c058-4356-bf1d-f34b4139ad3f ro.
> [ 5005.495052] [ T1550] fanotify_merge: group=000000005be5930d event=00000000c92d9500 bucket=89
> [ 5005.501280] [ T1550] fanotify_insert_event: group=000000005be5930d event=00000000c92d9500 bucket=89 <<<<<<
> [ 5006.500649] [ T2691] EXT4-fs (loop0): unmounting filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f.
> [ 5006.505916] [ T2691] EXT4-fs (loop0): warning: mounting fs with errors, running e2fsck is recommended
> [ 5006.509571] [ T2691] EXT4-fs (loop0): mounted filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f r/w with ordered data mode. Quota mode: none.
> [ 5006.512851] [ T2675] EXT4-fs (loop0): unmounting filesystem 66a18a14-c058-4356-bf1d-f34b4139ad3f.
> merge ok log:
> [ 5115.542758] [ T2732] loop0: detected capacity change from 0 to 614400
> [ 5115.549082] [ T2732] /dev/zero: Can't lookup blockdev
> [ 5115.617432] [ C0] operation not supported error, dev loop0, sector 614272 op 0x9:(WRITE_ZEROES) flags 0x10800800 phys_seg 0 prio class 2
> [ 5115.637765] [ T2732] EXT4-fs (loop0): mounted filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a r/w with ordered data mode. Quota mode: none.
> [ 5115.661989] [ T2744] EXT4-fs (loop0): unmounting filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a.
> [ 5115.690910] [ T2744] EXT4-fs (loop0): mounted filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a r/w with ordered data mode. Quota mode: none.
> [ 5115.695573] [ T2744] EXT4-fs error (device loop0): ext4_lookup:1785: inode #32386: comm fanotify22: iget: bogus i_mode (377)
> [ 5115.697835] [ T1618] fanotify_group_event_mask: report_mask=4 mask=8000 data=00000000a5a60545 data_type=6
> [ 5115.700048] [ T1618] fanotify_handle_event: group=000000004dc01100 mask=8000 report_mask=4
> [ 5115.702552] [ T1618] fanotify_insert_event: group=000000004dc01100 event=00000000c92d9500 bucket=48 <<<<<<
> [ 5115.704872] [ T2744] EXT4-fs error (device loop0): __ext4_remount:6804: comm fanotify22: Abort forced by user
> [ 5115.707250] [ T2744] Aborting journal on device loop0-8.
> [ 5115.708638] [ T1618] fanotify_group_event_mask: report_mask=4 mask=8000 data=00000000a5a60545 data_type=6
> [ 5115.712418] [ T1618] fanotify_handle_event: group=000000004dc01100 mask=8000 report_mask=4
> [ 5115.715217] [ T1618] fanotify_merge: group=000000004dc01100 event=00000000f1581cb4 bucket=48 <<<<<<
> [ 5115.716032] [ T2744] EXT4-fs (loop0): Remounting filesystem read-only
> [ 5115.717883] [ T1618] fanotify_should_merge: old=00000000c92d9500 new=00000000f1581cb4
> [ 5115.718698] [ T2744] EXT4-fs (loop0): re-mounted f2264636-366f-4ba7-a6c1-3f7963ce677a ro.
> [ 5116.725817] [ T2744] EXT4-fs (loop0): unmounting filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a.
> [ 5116.731447] [ T2744] EXT4-fs (loop0): warning: mounting fs with errors, running e2fsck is recommended
> [ 5116.734565] [ T2744] EXT4-fs (loop0): mounted filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a r/w with ordered data mode. Quota mode: none.
> [ 5116.738947] [ T2732] EXT4-fs (loop0): unmounting filesystem f2264636-366f-4ba7-a6c1-3f7963ce677a.
> > > > > Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > > Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
> > > > > Signed-off-by: Wei Gao <wegao@suse.com>
> > > > > ---
> > > > > .../kernel/syscalls/fanotify/fanotify22.c | 32 ++++++++++++++++---
> > > > > 1 file changed, 28 insertions(+), 4 deletions(-)
> > > > > diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
> > > > > index 6578474a7..82eed7ba9 100644
> > > > > --- a/testcases/kernel/syscalls/fanotify/fanotify22.c
> > > > > +++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
> > > > > @@ -53,6 +53,8 @@ static struct fanotify_fid_t null_fid;
> > > > > static struct fanotify_fid_t bad_file_fid;
> > > > > static struct fanotify_fid_t bad_link_fid;
> > > > > +static int event_count;
> > > > > +
> > > > > static void trigger_fs_abort(void)
> > > > > {
> > > > > SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type,
> > > > > @@ -88,7 +90,6 @@ static void trigger_bad_link_lookup(void)
> > > > > ret, BAD_LINK, errno, EUCLEAN);
> > > > > }
> > > > > -
> > > > > static void tcase3_trigger(void)
> > > > > {
> > > > > trigger_bad_link_lookup();
> > > > > @@ -176,9 +177,10 @@ static int check_error_event_info_error(struct fanotify_event_info_error *info_e
> > > > > {
> > > > > int fail = 0;
> > > > > - if (info_error->error_count != ex->error_count) {
> > > > > - tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d)",
> > > > > - ex->name, info_error->error_count, ex->error_count);
> > > > > + if (info_error->error_count != ex->error_count && event_count != ex->error_count) {
> > > > > + tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d && %d!=%d)",
> > > > > + ex->name, info_error->error_count, ex->error_count,
> > > > > + event_count, ex->error_count);
> > > > > fail++;
> > > > > }
> > > > > @@ -255,8 +257,30 @@ static void do_test(unsigned int i)
> > > > > tcase->trigger_error();
> > > > > + usleep(100000);
> > > > > +
> > > > OK, but can you please add a comment why the sleep is here.
> > > Sure!
> > > > > read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
> > > > > + struct fanotify_event_metadata *metadata;
> > > > > + size_t len = read_len;
> > > > > +
> > > > > + event_count = 0;
> > > > > +
> > > > > + for (metadata = (struct fanotify_event_metadata *)event_buf;
> > > > > + FAN_EVENT_OK(metadata, len);
> > > > > + metadata = FAN_EVENT_NEXT(metadata, len)) {
> > > > > + event_count++;
> > > > > + struct fanotify_event_info_error *info_error = get_event_info_error(metadata);
> > > > > +
> > > > > + if (info_error) {
> > > > > + tst_res(TINFO, "Event [%d]: errno=%d (expected %d), error_count=%d (expected total %d)",
> > > > > + event_count, info_error->error, tcase->error,
> > > > > + info_error->error_count, tcase->error_count);
> > > > > + } else {
> > > > > + tst_res(TINFO, "Event [%d]: No error info record found", event_count);
> > > > > + }
> > > > > + }
> > > > > +
> > > > This looks too lax to me. I think
> > > I guess your mean this is workaround for the issue? Are you suggesting that I should reconstruct the test case
> > > to properly handle independent events rather than relying on the existing logic?
> > > If so, I’d be happy to explore that and implement a more robust solution.
> > > > --
> > > > Jan Kara <jack@suse.com>
> > > > SUSE Labs, CR
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
--
Mailing list info: https://lists.linux.it/listinfo/ltp
* Re: [LTP] [PATCH v1] fanotify22.c: handle multiple asynchronous error events
2026-03-06 4:50 ` Wei Gao via ltp
2026-03-06 12:24 ` Petr Vorel
@ 2026-03-06 15:19 ` Jan Kara
1 sibling, 0 replies; 27+ messages in thread
From: Jan Kara @ 2026-03-06 15:19 UTC (permalink / raw)
To: Wei Gao; +Cc: kernel test robot, Jan Kara, ltp
On Fri 06-03-26 04:50:31, Wei Gao wrote:
> On Thu, Mar 05, 2026 at 04:50:57PM +0100, Jan Kara wrote:
> > On Thu 05-03-26 14:36:31, Wei Gao wrote:
> > > On Thu, Mar 05, 2026 at 10:36:04AM +0100, Jan Kara wrote:
> > > > > 2) tcase4 failure: The kernel may deliver errors as independent events
> > > > > instead of a single merged event, The test logic is updated to
> > > > > validate the expected error_count by either a single merged event
> > > > > or the accumulation of multiple independent events in the buffer.
> > > >
> > > > Did you investigate why the events didn't get merged in the kernel? If they
> > > > are against the same filesystem they should get merged AFAICS.
> > > >
> > > Sorry i have no idea why this happen, I just add debug code into LTP case and
> > > found the event not lost but deliver independent, this leads me to believe that
> > > the LTP should handle both scenarios—merged and independent events.
> > > I also not sure my patch is correct or not, that's also the reason i
> > > CC the patch to you :)
> >
> > OK :). How easily is this reproducible? Because in principle event merging
> > is not *guaranteed* so it isn't wrong for the LTP test to handle split
> > events but before we complicate the test too much it would be good to
> > figure out why the kernel behaves in unexpected way and doesn't merge the
> > events...
> >
> > Honza
>
> Can not reproduce in my local env, but i can reproduce this issue in the OpenQA worker environment,
> with a failure rate of approximately 30%.
>
> By comparing the logs of successful and failed merge cases, I try to identified the root cause:
> Event merging is bypassed when errors are handled by different kworker threads.
>
> The core of the issue lies in how the event hash is calculated. As seen in the kernel source:
> hash ^= hash_long((unsigned long)pid | ondir, FANOTIFY_EVENT_HASH_BITS);
>
> In the failed case, different threads (e.g., T1618 and T1550) processed different events errors.
> This places the events into different hash buckets (e.g., Bucket 89 vs. 118), finally
> fanotify_should_merge() is never even invoked.
Aha! That was the bit I was missing. Indeed, with the new fserror code the
errors are generated from work items and, as you say, a different worker
kthread can end up generating each event, so they won't be merged.
So fixing this in the LTP test definitely makes sense! As I wrote in an
earlier email, I'd create a helper function in the test that reads fserror
events from the kernel, merges them, and returns the final array of merged
events.
A side note: for fserror events the pid is mostly useless because it will
be the pid of some random kworker, but I don't think that's serious enough
to try to deal with in the kernel somehow.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
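The helper described above (read the fserror events, merge them, return the
merged array) could look roughly like the sketch below. It is only an
illustration under stated assumptions: struct fs_error_rec and
merge_error_records() are hypothetical names, and the records are simplified
stand-ins rather than real fanotify event buffers. Records for the same
filesystem are folded into the first one seen, which keeps its error code and
accumulates error_count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one FAN_FS_ERROR event. */
struct fs_error_rec {
	unsigned long fsid;        /* which filesystem reported the error */
	int error;                 /* errno-style code of the first error */
	unsigned int error_count;  /* errors folded into this record */
};

/*
 * Merge records for the same filesystem in place. The first record seen
 * for an fsid keeps its error code and absorbs the counts of later ones.
 * Returns the number of records left at the front of the array.
 */
static size_t merge_error_records(struct fs_error_rec *recs, size_t n)
{
	size_t merged = 0;

	assert(recs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		size_t j;

		/* Look for an already-kept record for the same filesystem. */
		for (j = 0; j < merged; j++) {
			if (recs[j].fsid == recs[i].fsid)
				break;
		}

		if (j < merged)
			recs[j].error_count += recs[i].error_count;
		else
			recs[merged++] = recs[i];
	}

	return merged;
}
```

With such a helper the test would compare the merged error_count against the
expected value, whether the kernel delivered one merged event or several
independent ones.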
* [LTP] [PATCH v2] fanotify22.c: handle multiple asynchronous error events
2026-03-04 13:38 [LTP] [PATCH v1] fanotify22.c: handle multiple asynchronous error events Wei Gao via ltp
2026-03-05 9:36 ` Jan Kara
@ 2026-03-09 7:59 ` Wei Gao via ltp
2026-03-09 10:26 ` Andrea Cervesato via ltp
` (2 more replies)
1 sibling, 3 replies; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-09 7:59 UTC (permalink / raw)
To: ltp; +Cc: Jan Kara, kernel test robot
Since the introduction of the asynchronous fserror reporting framework
(kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
due to the non-deterministic nature of event delivery and merging:
1) tcase3 failure: A race condition occurs when the test reads the
notification fd between two events. Adding a short delay
(usleep) ensures all events are dispatched and ready before the
read() call.
2) tcase4 failure: The kernel may deliver errors as independent events
instead of a single merged event, because a different worker kthread can
end up generating each event, so they are not merged. As suggested by
Jan Kara, this patch introduces a consolidate_events() helper. It iterates
through the event buffer, accumulates the error_count from all independent
events, and updates the first event's count in-place.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wei Gao <wegao@suse.com>
---
.../kernel/syscalls/fanotify/fanotify22.c | 37 ++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
index 6578474a7..4c0fd31b1 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify22.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
@@ -88,7 +88,6 @@ static void trigger_bad_link_lookup(void)
ret, BAD_LINK, errno, EUCLEAN);
}
-
static void tcase3_trigger(void)
{
trigger_bad_link_lookup();
@@ -138,6 +137,38 @@ static struct test_case {
}
};
+static size_t consolidate_events(char *buf, size_t len)
+{
+ struct fanotify_event_metadata *metadata, *first = NULL;
+ struct fanotify_event_info_error *first_info = NULL;
+ unsigned int total_count = 0;
+ int event_num = 0;
+
+ for (metadata = (struct fanotify_event_metadata *)buf;
+ FAN_EVENT_OK(metadata, len);
+ metadata = FAN_EVENT_NEXT(metadata, len)) {
+
+ event_num++;
+ struct fanotify_event_info_error *info = get_event_info_error(metadata);
+
+ if (info) {
+ if (!first) {
+ first = metadata;
+ first_info = info;
+ }
+ total_count += info->error_count;
+
+ tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
+ event_num, info->error, info->error_count);
+ }
+ }
+
+ if (first_info)
+ first_info->error_count = total_count;
+
+ return (first) ? first->event_len : 0;
+}
+
static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
const struct test_case *ex)
{
@@ -255,7 +286,11 @@ static void do_test(unsigned int i)
tcase->trigger_error();
+ /* Wait for asynchronous kworker threads to dispatch events */
+ usleep(100000);
+
read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
+ read_len = consolidate_events(event_buf, read_len);
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
--
2.52.0
* Re: [LTP] [PATCH v2] fanotify22.c: handle multiple asynchronous error events
2026-03-09 7:59 ` [LTP] [PATCH v2] " Wei Gao via ltp
@ 2026-03-09 10:26 ` Andrea Cervesato via ltp
2026-03-09 11:29 ` Jan Kara
2026-03-18 6:46 ` [LTP] [PATCH v3] " Wei Gao via ltp
2 siblings, 0 replies; 27+ messages in thread
From: Andrea Cervesato via ltp @ 2026-03-09 10:26 UTC (permalink / raw)
To: Wei Gao via ltp; +Cc: kernel test robot, Jan Kara
Hi!
> + /* Wait for asynchronous kworker threads to dispatch events */
> + usleep(100000);
> +
> read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
> + read_len = consolidate_events(event_buf, read_len);
This doesn't sound correct. Instead of calling usleep() and "guessing"
whether data is coming, we should poll() on fd_notify and collect data
up to BUF_SIZE. If data doesn't arrive, the events were not dispatched
and the test fails.
Remember that (in general) sleep operations hide test faults or bugs, as
explained in the ground rules guide:
https://linux-test-project.readthedocs.io/en/latest/developers/ground_rules.html#why-is-sleep-in-tests-bad-then
Kind regards,
--
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com
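As a rough illustration of the poll()-based approach suggested above, the
sketch below waits for data with a timeout instead of sleeping for a fixed
time. It is a minimal sketch under assumptions: collect_bytes() and its
parameters are hypothetical names, and it counts plain bytes on an arbitrary
fd, whereas a fanotify test would more likely count events or error_count
values.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read from fd until at least `want` bytes have arrived. Each wait is
 * bounded by timeout_ms. Returns the number of bytes collected, or -1
 * on timeout, end of stream, or error.
 */
static ssize_t collect_bytes(int fd, char *buf, size_t size, size_t want,
			     int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	size_t total = 0;

	assert(want <= size);

	while (total < want) {
		int ready = poll(&pfd, 1, timeout_ms);

		if (ready < 0 && errno == EINTR)
			continue;
		if (ready <= 0)
			return -1;	/* timeout or poll() failure */

		ssize_t ret = read(fd, buf + total, size - total);

		if (ret < 0 && errno == EINTR)
			continue;
		if (ret <= 0)
			return -1;	/* read error or end of stream */

		total += (size_t)ret;
	}

	return (ssize_t)total;
}
```

A timeout here becomes a test failure rather than a silent pass, so missing
events are reported instead of hidden.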
* Re: [LTP] [PATCH v2] fanotify22.c: handle multiple asynchronous error events
2026-03-09 7:59 ` [LTP] [PATCH v2] " Wei Gao via ltp
2026-03-09 10:26 ` Andrea Cervesato via ltp
@ 2026-03-09 11:29 ` Jan Kara
2026-03-18 6:46 ` [LTP] [PATCH v3] " Wei Gao via ltp
2 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2026-03-09 11:29 UTC (permalink / raw)
To: Wei Gao; +Cc: Jan Kara, kernel test robot, ltp
On Mon 09-03-26 07:59:42, Wei Gao wrote:
> Since the introduction of the asynchronous fserror reporting framework
> (kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
> due to the non-deterministic nature of event delivery and merging:
>
> 1) tcase3 failure: A race condition occurs when the test reads the
> notification fd between two events. Adding a short delay
> (usleep) ensures all events are dispatched and ready before the
> read() call.
>
> 2) tcase4 failure: The kernel may deliver errors as independent events
> instead of a single merged event, since different worker kthread can
> end up generating each event so they won't be merged. As suggested by
> Jan Kara, this patch introduces a consolidate_events() helper. It iterates
> through the event buffer, accumulates the error_count from all independent
> events, and updates the first event's count in-place.
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wei Gao <wegao@suse.com>
...
> +static size_t consolidate_events(char *buf, size_t len)
> +{
> + struct fanotify_event_metadata *metadata, *first = NULL;
> + struct fanotify_event_info_error *first_info = NULL;
> + unsigned int total_count = 0;
> + int event_num = 0;
> +
> + for (metadata = (struct fanotify_event_metadata *)buf;
> + FAN_EVENT_OK(metadata, len);
> + metadata = FAN_EVENT_NEXT(metadata, len)) {
> +
> + event_num++;
> + struct fanotify_event_info_error *info = get_event_info_error(metadata);
> +
> + if (info) {
> + if (!first) {
> + first = metadata;
> + first_info = info;
> + }
> + total_count += info->error_count;
Please verify that the 'error' field in the info matches before merging the
count, and fail the test if it does not. Also, if we get an event without
error info, I think we should fail the test, as that currently shouldn't
happen for any of the tests.
Honza
> +
> + tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
> + event_num, info->error, info->error_count);
> + }
> + }
> +
> + if (first_info)
> + first_info->error_count = total_count;
> +
> + return (first) ? first->event_len : 0;
> +}
> +
> static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
> const struct test_case *ex)
> {
> @@ -255,7 +286,11 @@ static void do_test(unsigned int i)
>
> tcase->trigger_error();
>
> + /* Wait for asynchronous kworker threads to dispatch events */
> + usleep(100000);
> +
> read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
> + read_len = consolidate_events(event_buf, read_len);
>
> SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
> FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
> --
> 2.52.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
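The two checks requested above (matching 'error' field, no event without
error info) can be sketched as follows. This is an illustration only: struct
err_event and sum_checked_counts() are hypothetical names, and the struct is
a simplified view of an event rather than the real fanotify layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one event read from the fanotify fd. */
struct err_event {
	int has_error_info;        /* 0 if the error info record is missing */
	int error;                 /* errno-style code carried by the event */
	unsigned int error_count;  /* errors reported by this event */
};

/*
 * Sum error_count over all events, but only if every event carries an
 * error record whose code equals expected_error. Returns 0 and stores
 * the sum in *total on success, -1 on the first event that fails a check
 * (in which case *total is left untouched).
 */
static int sum_checked_counts(const struct err_event *ev, size_t n,
			      int expected_error, unsigned int *total)
{
	unsigned int sum = 0;

	assert(total != NULL);

	for (size_t i = 0; i < n; i++) {
		if (!ev[i].has_error_info)
			return -1;	/* event without error info */
		if (ev[i].error != expected_error)
			return -1;	/* unexpected error code */
		sum += ev[i].error_count;
	}

	*total = sum;
	return 0;
}
```

In the test itself a -1 return would map to a TFAIL result, and the summed
count would be compared against the expected error_count.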
* [LTP] [PATCH v3] fanotify22.c: handle multiple asynchronous error events
2026-03-09 7:59 ` [LTP] [PATCH v2] " Wei Gao via ltp
2026-03-09 10:26 ` Andrea Cervesato via ltp
2026-03-09 11:29 ` Jan Kara
@ 2026-03-18 6:46 ` Wei Gao via ltp
2026-03-18 18:18 ` Jan Kara
` (2 more replies)
2 siblings, 3 replies; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-18 6:46 UTC (permalink / raw)
To: ltp; +Cc: Jan Kara, kernel test robot
Since the introduction of the asynchronous fserror reporting framework
(kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
due to the non-deterministic nature of event delivery and merging:
1) tcase3 failure: A race condition occurs when the test reads the
notification fd between two events. The test now uses a poll() and read()
loop to wait until the expected error count has been received.
2) tcase4 failure: The kernel may deliver errors as independent events
instead of a single merged event, because a different worker kthread can
end up generating each event, so they are not merged. As suggested by
Jan Kara, this patch introduces a consolidate_events() helper. It iterates
through the event buffer, accumulates the error_count from all independent
events, and updates the first event's count in-place.
v2->v3:
- Replaced usleep() with poll() loop
- Added secondary error check support (error2)
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wei Gao <wegao@suse.com>
---
.../kernel/syscalls/fanotify/fanotify22.c | 87 ++++++++++++++++++-
1 file changed, 83 insertions(+), 4 deletions(-)
diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
index e8002b160..c921cf2c0 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify22.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
@@ -28,6 +28,7 @@
#include "tst_test.h"
#include <sys/fanotify.h>
#include <sys/types.h>
+#include <poll.h>
#ifdef HAVE_SYS_FANOTIFY_H
#include "fanotify.h"
@@ -88,7 +89,6 @@ static void trigger_bad_link_lookup(void)
ret, BAD_LINK, errno, EUCLEAN);
}
-
static void tcase3_trigger(void)
{
trigger_bad_link_lookup();
@@ -104,6 +104,7 @@ static void tcase4_trigger(void)
static struct test_case {
char *name;
int error;
+ int error2;
unsigned int error_count;
struct fanotify_fid_t *fid;
void (*trigger_error)(void);
@@ -134,10 +135,54 @@ static struct test_case {
.trigger_error = &tcase4_trigger,
.error_count = 2,
.error = EFSCORRUPTED,
+ .error2 = ESHUTDOWN,
.fid = &bad_file_fid,
}
};
+static size_t consolidate_events(char *buf, size_t len, const struct test_case *ex)
+{
+ struct fanotify_event_metadata *metadata, *first = NULL;
+ struct fanotify_event_info_error *first_info = NULL;
+ unsigned int total_count = 0;
+ int event_num = 0;
+
+ for (metadata = (struct fanotify_event_metadata *)buf;
+ FAN_EVENT_OK(metadata, len);
+ metadata = FAN_EVENT_NEXT(metadata, len)) {
+
+ event_num++;
+ struct fanotify_event_info_error *info = get_event_info_error(metadata);
+
+ if (!info) {
+ tst_res(TFAIL, "%s: Event [%d] missing error info",
+ ex->name, event_num);
+ continue;
+ }
+
+ if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
+ tst_res(TFAIL, "%s: Event [%d] unexpected errno (%d)",
+ ex->name, event_num, info->error);
+ }
+
+ if (info) {
+ if (!first) {
+ first = metadata;
+ first_info = info;
+ }
+ total_count += info->error_count;
+
+ tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
+ event_num, info->error, info->error_count);
+ }
+ }
+
+ if (first_info)
+ first_info->error_count = total_count;
+
+ return (first) ? first->event_len : 0;
+}
+
static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
const struct test_case *ex)
{
@@ -248,19 +293,53 @@ static void check_event(char *buf, size_t len, const struct test_case *ex)
static void do_test(unsigned int i)
{
const struct test_case *tcase = &testcases[i];
- size_t read_len;
+ size_t read_len = 0;
+ struct pollfd pfd;
+ unsigned int accumulated_count = 0;
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
tcase->trigger_error();
- read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
+ pfd.fd = fd_notify;
+ pfd.events = POLLIN;
+
+ while (accumulated_count < tcase->error_count) {
+ if (poll(&pfd, 1, 5000) <= 0) {
+ tst_res(TFAIL, "%s: Timeout waiting for events", tcase->name);
+ goto out;
+ }
+
+ char *current_pos = event_buf + read_len;
+ int ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
+
+ if (ret < 0) {
+ tst_res(TFAIL, "%s: read failed: %s", tcase->name, strerror(errno));
+ goto out;
+ }
+
+ struct fanotify_event_metadata *m =
+ (struct fanotify_event_metadata *)current_pos;
+ while (FAN_EVENT_OK(m, ret)) {
+ read_len += ret;
+ struct fanotify_event_info_error *e = get_event_info_error(m);
+
+ if (e)
+ accumulated_count += e->error_count;
+ m = FAN_EVENT_NEXT(m, ret);
+ }
+ }
+
+ read_len = consolidate_events(event_buf, read_len, tcase);
+
+ check_event(event_buf, read_len, tcase);
+
+out:
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
- check_event(event_buf, read_len, tcase);
/* Unmount and mount the filesystem to get it out of the error state */
SAFE_UMOUNT(MOUNT_PATH);
SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type, 0, NULL);
--
2.52.0
--
Mailing list info: https://lists.linux.it/listinfo/ltp
* Re: [LTP] [PATCH v3] fanotify22.c: handle multiple asynchronous error events
2026-03-18 6:46 ` [LTP] [PATCH v3] " Wei Gao via ltp
@ 2026-03-18 18:18 ` Jan Kara
2026-03-24 11:55 ` Andrea Cervesato via ltp
2026-03-25 12:43 ` [LTP] [PATCH v4] " Wei Gao via ltp
2 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2026-03-18 18:18 UTC (permalink / raw)
To: Wei Gao; +Cc: Jan Kara, kernel test robot, ltp
On Wed 18-03-26 06:46:19, Wei Gao wrote:
> Since the introduction of the asynchronous fserror reporting framework
> (kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
> due to the non-deterministic nature of event delivery and merging:
>
> 1) tcase3 failure: A race condition occurs when the test reads the
> notification fd between two events. The test now uses a poll() and read()
> loop to wait until the expected error_count has been accumulated.
>
> 2) tcase4 failure: The kernel may deliver errors as independent events
> instead of a single merged event, since different worker kthreads can
> end up generating each event, in which case they are not merged. As suggested by
> Jan Kara, this patch introduces a consolidate_events() helper. It iterates
> through the event buffer, accumulates the error_count from all independent
> events, and updates the first event's count in-place.
>
> v2->v3:
> - Replaced usleep() with poll() loop
> - Added secondary error check support (error2)
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wei Gao <wegao@suse.com>
Thanks. The patch looks good to me. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> .../kernel/syscalls/fanotify/fanotify22.c | 87 ++++++++++++++++++-
> 1 file changed, 83 insertions(+), 4 deletions(-)
>
> diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
> index e8002b160..c921cf2c0 100644
> --- a/testcases/kernel/syscalls/fanotify/fanotify22.c
> +++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
> @@ -28,6 +28,7 @@
> #include "tst_test.h"
> #include <sys/fanotify.h>
> #include <sys/types.h>
> +#include <poll.h>
>
> #ifdef HAVE_SYS_FANOTIFY_H
> #include "fanotify.h"
> @@ -88,7 +89,6 @@ static void trigger_bad_link_lookup(void)
> ret, BAD_LINK, errno, EUCLEAN);
> }
>
> -
> static void tcase3_trigger(void)
> {
> trigger_bad_link_lookup();
> @@ -104,6 +104,7 @@ static void tcase4_trigger(void)
> static struct test_case {
> char *name;
> int error;
> + int error2;
> unsigned int error_count;
> struct fanotify_fid_t *fid;
> void (*trigger_error)(void);
> @@ -134,10 +135,54 @@ static struct test_case {
> .trigger_error = &tcase4_trigger,
> .error_count = 2,
> .error = EFSCORRUPTED,
> + .error2 = ESHUTDOWN,
> .fid = &bad_file_fid,
> }
> };
>
> +static size_t consolidate_events(char *buf, size_t len, const struct test_case *ex)
> +{
> + struct fanotify_event_metadata *metadata, *first = NULL;
> + struct fanotify_event_info_error *first_info = NULL;
> + unsigned int total_count = 0;
> + int event_num = 0;
> +
> + for (metadata = (struct fanotify_event_metadata *)buf;
> + FAN_EVENT_OK(metadata, len);
> + metadata = FAN_EVENT_NEXT(metadata, len)) {
> +
> + event_num++;
> + struct fanotify_event_info_error *info = get_event_info_error(metadata);
> +
> + if (!info) {
> + tst_res(TFAIL, "%s: Event [%d] missing error info",
> + ex->name, event_num);
> + continue;
> + }
> +
> + if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
> + tst_res(TFAIL, "%s: Event [%d] unexpected errno (%d)",
> + ex->name, event_num, info->error);
> + }
> +
> + if (info) {
> + if (!first) {
> + first = metadata;
> + first_info = info;
> + }
> + total_count += info->error_count;
> +
> + tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
> + event_num, info->error, info->error_count);
> + }
> + }
> +
> + if (first_info)
> + first_info->error_count = total_count;
> +
> + return (first) ? first->event_len : 0;
> +}
> +
> static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
> const struct test_case *ex)
> {
> @@ -248,19 +293,53 @@ static void check_event(char *buf, size_t len, const struct test_case *ex)
> static void do_test(unsigned int i)
> {
> const struct test_case *tcase = &testcases[i];
> - size_t read_len;
> + size_t read_len = 0;
> + struct pollfd pfd;
> + unsigned int accumulated_count = 0;
>
> SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
> FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
>
> tcase->trigger_error();
>
> - read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
> + pfd.fd = fd_notify;
> + pfd.events = POLLIN;
> +
> + while (accumulated_count < tcase->error_count) {
> + if (poll(&pfd, 1, 5000) <= 0) {
> + tst_res(TFAIL, "%s: Timeout waiting for events", tcase->name);
> + goto out;
> + }
> +
> + char *current_pos = event_buf + read_len;
> + int ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
> +
> + if (ret < 0) {
> + tst_res(TFAIL, "%s: read failed: %s", tcase->name, strerror(errno));
> + goto out;
> + }
> +
> + struct fanotify_event_metadata *m =
> + (struct fanotify_event_metadata *)current_pos;
> + while (FAN_EVENT_OK(m, ret)) {
> + read_len += ret;
> + struct fanotify_event_info_error *e = get_event_info_error(m);
> +
> + if (e)
> + accumulated_count += e->error_count;
> + m = FAN_EVENT_NEXT(m, ret);
> + }
> + }
> +
> + read_len = consolidate_events(event_buf, read_len, tcase);
> +
> + check_event(event_buf, read_len, tcase);
> +
> +out:
>
> SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
> FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
>
> - check_event(event_buf, read_len, tcase);
> /* Unmount and mount the filesystem to get it out of the error state */
> SAFE_UMOUNT(MOUNT_PATH);
> SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type, 0, NULL);
> --
> 2.52.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [LTP] [PATCH v3] fanotify22.c: handle multiple asynchronous error events
2026-03-18 6:46 ` [LTP] [PATCH v3] " Wei Gao via ltp
2026-03-18 18:18 ` Jan Kara
@ 2026-03-24 11:55 ` Andrea Cervesato via ltp
2026-03-25 12:43 ` [LTP] [PATCH v4] " Wei Gao via ltp
2 siblings, 0 replies; 27+ messages in thread
From: Andrea Cervesato via ltp @ 2026-03-24 11:55 UTC (permalink / raw)
To: Wei Gao; +Cc: Jan Kara, kernel test robot, ltp
Hi Wei,
there are still a few things to fix before merging. Hopefully the next version
will be merged, since we already have Jan's approval.
> v2->v3:
> - Replaced usleep() with poll() loop
> - Added secondary error check support (error2)
This changelog can't be part of the commit message.
> +
> + if (!info) {
> + tst_res(TFAIL, "%s: Event [%d] missing error info",
> + ex->name, event_num);
> + continue;
> + }
> +
> + if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
> + tst_res(TFAIL, "%s: Event [%d] unexpected errno (%d)",
> + ex->name, event_num, info->error);
> + }
> +
> + if (info) {
We already have if (!info) { ... continue; } above, so this check is not needed.
> + pfd.fd = fd_notify;
> + pfd.events = POLLIN;
> +
> + while (accumulated_count < tcase->error_count) {
> + if (poll(&pfd, 1, 5000) <= 0) {
> + tst_res(TFAIL, "%s: Timeout waiting for events", tcase->name);
> + goto out;
> + }
> +
> + char *current_pos = event_buf + read_len;
> + int ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
> +
> + if (ret < 0) {
> + tst_res(TFAIL, "%s: read failed: %s", tcase->name, strerror(errno));
> + goto out;
> + }
> +
> + struct fanotify_event_metadata *m =
> + (struct fanotify_event_metadata *)current_pos;
> + while (FAN_EVENT_OK(m, ret)) {
> + read_len += ret;
This should be moved out of the while() loop, otherwise read_len grows by
ret once per event whenever a single read() returns multiple events.
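To illustrate the overcounting, here is a minimal sketch. It assumes a simplified, hypothetical record type (struct fake_event) in place of struct fanotify_event_metadata, and a hypothetical helper bytes_accounted(); neither is part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct fanotify_event_metadata. */
struct fake_event {
	uint32_t event_len;	/* total size of this record in bytes */
	uint32_t error_count;
};

/*
 * Walk the records returned by one read() of `ret` bytes and return how
 * many bytes would be added to read_len. With per_event != 0 the v3
 * behaviour is reproduced (ret is added once per record); otherwise ret
 * is added exactly once, after the walk.
 */
static size_t bytes_accounted(const char *buf, size_t ret, int per_event)
{
	size_t off = 0, added = 0;

	assert(buf != NULL);
	while (ret - off >= sizeof(struct fake_event)) {
		struct fake_event ev;

		memcpy(&ev, buf + off, sizeof(ev));
		/* Stop on a truncated or malformed record. */
		if (ev.event_len < sizeof(ev) || ev.event_len > ret - off)
			break;
		if (per_event)
			added += ret;
		off += ev.event_len;
	}
	if (!per_event)
		added = ret;
	return added;
}
```

With two records in a single read(), the per-event variant accounts for twice the bytes actually read, so the next read() would be placed at the wrong offset in event_buf.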
Regards,
--
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com
* [LTP] [PATCH v4] fanotify22.c: handle multiple asynchronous error events
2026-03-18 6:46 ` [LTP] [PATCH v3] " Wei Gao via ltp
2026-03-18 18:18 ` Jan Kara
2026-03-24 11:55 ` Andrea Cervesato via ltp
@ 2026-03-25 12:43 ` Wei Gao via ltp
2026-03-25 15:52 ` Jan Kara
2026-03-26 1:28 ` [LTP] [PATCH v5] " Wei Gao via ltp
2 siblings, 2 replies; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-25 12:43 UTC (permalink / raw)
To: ltp; +Cc: Jan Kara, kernel test robot
Since the introduction of the asynchronous fserror reporting framework
(kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
due to the non-deterministic nature of event delivery and merging:
1) tcase3 failure: A race condition occurs when the test reads the
notification fd between two events. The test now uses a poll() and read()
loop to wait until the expected error_count has been accumulated.
2) tcase4 failure: The kernel may deliver errors as independent events
instead of a single merged event, since different worker kthreads can
end up generating each event, in which case they are not merged. As suggested by
Jan Kara, this patch introduces a consolidate_events() helper. It iterates
through the event buffer, accumulates the error_count from all independent
events, and updates the first event's count in-place.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wei Gao <wegao@suse.com>
---
v3->v4:
1) Fixed read_len accounting by adding m->event_len inside the loop to avoid overcounting.
2) Removed the redundant if (info) check in consolidate_events().
.../kernel/syscalls/fanotify/fanotify22.c | 86 ++++++++++++++++++-
1 file changed, 82 insertions(+), 4 deletions(-)
diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
index e8002b160..e79213b40 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify22.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
@@ -28,6 +28,7 @@
#include "tst_test.h"
#include <sys/fanotify.h>
#include <sys/types.h>
+#include <poll.h>
#ifdef HAVE_SYS_FANOTIFY_H
#include "fanotify.h"
@@ -88,7 +89,6 @@ static void trigger_bad_link_lookup(void)
ret, BAD_LINK, errno, EUCLEAN);
}
-
static void tcase3_trigger(void)
{
trigger_bad_link_lookup();
@@ -104,6 +104,7 @@ static void tcase4_trigger(void)
static struct test_case {
char *name;
int error;
+ int error2;
unsigned int error_count;
struct fanotify_fid_t *fid;
void (*trigger_error)(void);
@@ -134,10 +135,52 @@ static struct test_case {
.trigger_error = &tcase4_trigger,
.error_count = 2,
.error = EFSCORRUPTED,
+ .error2 = ESHUTDOWN,
.fid = &bad_file_fid,
}
};
+static size_t consolidate_events(char *buf, size_t len, const struct test_case *ex)
+{
+ struct fanotify_event_metadata *metadata, *first = NULL;
+ struct fanotify_event_info_error *first_info = NULL;
+ unsigned int total_count = 0;
+ int event_num = 0;
+
+ for (metadata = (struct fanotify_event_metadata *)buf;
+ FAN_EVENT_OK(metadata, len);
+ metadata = FAN_EVENT_NEXT(metadata, len)) {
+
+ event_num++;
+ struct fanotify_event_info_error *info = get_event_info_error(metadata);
+
+ if (!info) {
+ tst_res(TFAIL, "%s: Event [%d] missing error info",
+ ex->name, event_num);
+ continue;
+ }
+
+ if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
+ tst_res(TFAIL, "%s: Event [%d] unexpected errno (%d)",
+ ex->name, event_num, info->error);
+ }
+
+ if (!first) {
+ first = metadata;
+ first_info = info;
+ }
+ total_count += info->error_count;
+
+ tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
+ event_num, info->error, info->error_count);
+ }
+
+ if (first_info)
+ first_info->error_count = total_count;
+
+ return (first) ? first->event_len : 0;
+}
+
static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
const struct test_case *ex)
{
@@ -248,19 +291,54 @@ static void check_event(char *buf, size_t len, const struct test_case *ex)
static void do_test(unsigned int i)
{
const struct test_case *tcase = &testcases[i];
- size_t read_len;
+ size_t read_len = 0;
+ struct pollfd pfd;
+ unsigned int accumulated_count = 0;
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
tcase->trigger_error();
- read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
+ pfd.fd = fd_notify;
+ pfd.events = POLLIN;
+
+ while (accumulated_count < tcase->error_count) {
+ if (poll(&pfd, 1, 5000) <= 0) {
+ tst_res(TFAIL, "%s: Timeout waiting for events", tcase->name);
+ goto out;
+ }
+
+ char *current_pos = event_buf + read_len;
+ int ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
+
+ if (ret < 0) {
+ tst_res(TFAIL, "%s: read failed: %s", tcase->name, strerror(errno));
+ goto out;
+ }
+
+ struct fanotify_event_metadata *m =
+ (struct fanotify_event_metadata *)current_pos;
+ while (FAN_EVENT_OK(m, ret)) {
+ struct fanotify_event_info_error *e = get_event_info_error(m);
+
+ if (e)
+ accumulated_count += e->error_count;
+
+ read_len += m->event_len;
+ m = FAN_EVENT_NEXT(m, ret);
+ }
+ }
+
+ read_len = consolidate_events(event_buf, read_len, tcase);
+
+ check_event(event_buf, read_len, tcase);
+
+out:
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
- check_event(event_buf, read_len, tcase);
/* Unmount and mount the filesystem to get it out of the error state */
SAFE_UMOUNT(MOUNT_PATH);
SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type, 0, NULL);
--
2.52.0
* Re: [LTP] [PATCH v4] fanotify22.c: handle multiple asynchronous error events
2026-03-25 12:43 ` [LTP] [PATCH v4] " Wei Gao via ltp
@ 2026-03-25 15:52 ` Jan Kara
2026-03-26 1:28 ` [LTP] [PATCH v5] " Wei Gao via ltp
1 sibling, 0 replies; 27+ messages in thread
From: Jan Kara @ 2026-03-25 15:52 UTC (permalink / raw)
To: Wei Gao; +Cc: Jan Kara, kernel test robot, ltp
On Wed 25-03-26 12:43:57, Wei Gao wrote:
> Since the introduction of the asynchronous fserror reporting framework
> (kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
> due to the non-deterministic nature of event delivery and merging:
>
> 1) tcase3 failure: A race condition occurs when the test reads the
> notification fd between two events. The test now uses a poll() and read()
> loop to wait until the expected error_count has been accumulated.
>
> 2) tcase4 failure: The kernel may deliver errors as independent events
> instead of a single merged event, since different worker kthreads can
> end up generating each event, in which case they are not merged. As suggested by
> Jan Kara, this patch introduces a consolidate_events() helper. It iterates
> through the event buffer, accumulates the error_count from all independent
> events, and updates the first event's count in-place.
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wei Gao <wegao@suse.com>
...
> +static size_t consolidate_events(char *buf, size_t len, const struct test_case *ex)
> +{
> + struct fanotify_event_metadata *metadata, *first = NULL;
> + struct fanotify_event_info_error *first_info = NULL;
> + unsigned int total_count = 0;
> + int event_num = 0;
> +
> + for (metadata = (struct fanotify_event_metadata *)buf;
> + FAN_EVENT_OK(metadata, len);
> + metadata = FAN_EVENT_NEXT(metadata, len)) {
> +
> + event_num++;
> + struct fanotify_event_info_error *info = get_event_info_error(metadata);
> +
> + if (!info) {
> + tst_res(TFAIL, "%s: Event [%d] missing error info",
> + ex->name, event_num);
> + continue;
> + }
> +
> + if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
> + tst_res(TFAIL, "%s: Event [%d] unexpected errno (%d)",
> + ex->name, event_num, info->error);
Should we add 'continue' here similarly to the failure case above? So that
we skip over the event with invalid error code... Otherwise the test looks
correct to me.
Honza
> + }
> +
> + if (!first) {
> + first = metadata;
> + first_info = info;
> + }
> + total_count += info->error_count;
> +
> + tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
> + event_num, info->error, info->error_count);
> + }
> +
> + if (first_info)
> + first_info->error_count = total_count;
> +
> + return (first) ? first->event_len : 0;
> +}
> +
> static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
> const struct test_case *ex)
> {
> @@ -248,19 +291,54 @@ static void check_event(char *buf, size_t len, const struct test_case *ex)
> static void do_test(unsigned int i)
> {
> const struct test_case *tcase = &testcases[i];
> - size_t read_len;
> + size_t read_len = 0;
> + struct pollfd pfd;
> + unsigned int accumulated_count = 0;
>
> SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
> FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
>
> tcase->trigger_error();
>
> - read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
> + pfd.fd = fd_notify;
> + pfd.events = POLLIN;
> +
> + while (accumulated_count < tcase->error_count) {
> + if (poll(&pfd, 1, 5000) <= 0) {
> + tst_res(TFAIL, "%s: Timeout waiting for events", tcase->name);
> + goto out;
> + }
> +
> + char *current_pos = event_buf + read_len;
> + int ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
> +
> + if (ret < 0) {
> + tst_res(TFAIL, "%s: read failed: %s", tcase->name, strerror(errno));
> + goto out;
> + }
> +
> + struct fanotify_event_metadata *m =
> + (struct fanotify_event_metadata *)current_pos;
> + while (FAN_EVENT_OK(m, ret)) {
> + struct fanotify_event_info_error *e = get_event_info_error(m);
> +
> + if (e)
> + accumulated_count += e->error_count;
> +
> + read_len += m->event_len;
> + m = FAN_EVENT_NEXT(m, ret);
> + }
> + }
> +
> + read_len = consolidate_events(event_buf, read_len, tcase);
> +
> + check_event(event_buf, read_len, tcase);
> +
> +out:
>
> SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
> FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
>
> - check_event(event_buf, read_len, tcase);
> /* Unmount and mount the filesystem to get it out of the error state */
> SAFE_UMOUNT(MOUNT_PATH);
> SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type, 0, NULL);
> --
> 2.52.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* [LTP] [PATCH v5] fanotify22.c: handle multiple asynchronous error events
2026-03-25 12:43 ` [LTP] [PATCH v4] " Wei Gao via ltp
2026-03-25 15:52 ` Jan Kara
@ 2026-03-26 1:28 ` Wei Gao via ltp
2026-03-26 8:57 ` Jan Kara
` (2 more replies)
1 sibling, 3 replies; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-26 1:28 UTC (permalink / raw)
To: ltp; +Cc: Jan Kara, kernel test robot
Since the introduction of the asynchronous fserror reporting framework
(kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
due to the non-deterministic nature of event delivery and merging:
1) tcase3 failure: A race condition occurs when the test reads the
notification fd between two events. The test now uses a poll() and read()
loop to wait until the expected error_count has been accumulated.
2) tcase4 failure: The kernel may deliver errors as independent events
instead of a single merged event, since different worker kthreads can
end up generating each event, in which case they are not merged. As suggested by
Jan Kara, this patch introduces a consolidate_events() helper. It iterates
through the event buffer, accumulates the error_count from all independent
events, and updates the first event's count in-place.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wei Gao <wegao@suse.com>
---
v4->v5:
Added a continue after an unexpected errno is detected, so that event is skipped
.../kernel/syscalls/fanotify/fanotify22.c | 87 ++++++++++++++++++-
1 file changed, 83 insertions(+), 4 deletions(-)
diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
index e8002b160..2e5e6a3fa 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify22.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
@@ -28,6 +28,7 @@
#include "tst_test.h"
#include <sys/fanotify.h>
#include <sys/types.h>
+#include <poll.h>
#ifdef HAVE_SYS_FANOTIFY_H
#include "fanotify.h"
@@ -88,7 +89,6 @@ static void trigger_bad_link_lookup(void)
ret, BAD_LINK, errno, EUCLEAN);
}
-
static void tcase3_trigger(void)
{
trigger_bad_link_lookup();
@@ -104,6 +104,7 @@ static void tcase4_trigger(void)
static struct test_case {
char *name;
int error;
+ int error2;
unsigned int error_count;
struct fanotify_fid_t *fid;
void (*trigger_error)(void);
@@ -134,10 +135,53 @@ static struct test_case {
.trigger_error = &tcase4_trigger,
.error_count = 2,
.error = EFSCORRUPTED,
+ .error2 = ESHUTDOWN,
.fid = &bad_file_fid,
}
};
+static size_t consolidate_events(char *buf, size_t len, const struct test_case *ex)
+{
+ struct fanotify_event_metadata *metadata, *first = NULL;
+ struct fanotify_event_info_error *first_info = NULL;
+ unsigned int total_count = 0;
+ int event_num = 0;
+
+ for (metadata = (struct fanotify_event_metadata *)buf;
+ FAN_EVENT_OK(metadata, len);
+ metadata = FAN_EVENT_NEXT(metadata, len)) {
+
+ event_num++;
+ struct fanotify_event_info_error *info = get_event_info_error(metadata);
+
+ if (!info) {
+ tst_res(TFAIL, "%s: Event [%d] missing error info",
+ ex->name, event_num);
+ continue;
+ }
+
+ if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
+ tst_res(TFAIL, "%s: Event [%d] unexpected errno (%d)",
+ ex->name, event_num, info->error);
+ continue;
+ }
+
+ if (!first) {
+ first = metadata;
+ first_info = info;
+ }
+ total_count += info->error_count;
+
+ tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
+ event_num, info->error, info->error_count);
+ }
+
+ if (first_info)
+ first_info->error_count = total_count;
+
+ return (first) ? first->event_len : 0;
+}
+
static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
const struct test_case *ex)
{
@@ -248,19 +292,54 @@ static void check_event(char *buf, size_t len, const struct test_case *ex)
static void do_test(unsigned int i)
{
const struct test_case *tcase = &testcases[i];
- size_t read_len;
+ size_t read_len = 0;
+ struct pollfd pfd;
+ unsigned int accumulated_count = 0;
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
tcase->trigger_error();
- read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
+ pfd.fd = fd_notify;
+ pfd.events = POLLIN;
+
+ while (accumulated_count < tcase->error_count) {
+ if (poll(&pfd, 1, 5000) <= 0) {
+ tst_res(TFAIL, "%s: Timeout waiting for events", tcase->name);
+ goto out;
+ }
+
+ char *current_pos = event_buf + read_len;
+ int ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
+
+ if (ret < 0) {
+ tst_res(TFAIL, "%s: read failed: %s", tcase->name, strerror(errno));
+ goto out;
+ }
+
+ struct fanotify_event_metadata *m =
+ (struct fanotify_event_metadata *)current_pos;
+ while (FAN_EVENT_OK(m, ret)) {
+ struct fanotify_event_info_error *e = get_event_info_error(m);
+
+ if (e)
+ accumulated_count += e->error_count;
+
+ read_len += m->event_len;
+ m = FAN_EVENT_NEXT(m, ret);
+ }
+ }
+
+ read_len = consolidate_events(event_buf, read_len, tcase);
+
+ check_event(event_buf, read_len, tcase);
+
+out:
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
- check_event(event_buf, read_len, tcase);
/* Unmount and mount the filesystem to get it out of the error state */
SAFE_UMOUNT(MOUNT_PATH);
SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type, 0, NULL);
--
2.52.0
* Re: [LTP] [PATCH v5] fanotify22.c: handle multiple asynchronous error events
2026-03-26 1:28 ` [LTP] [PATCH v5] " Wei Gao via ltp
@ 2026-03-26 8:57 ` Jan Kara
2026-03-26 9:40 ` Andrea Cervesato via ltp
2026-03-27 4:55 ` [LTP] [PATCH v6] " Wei Gao via ltp
2 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2026-03-26 8:57 UTC (permalink / raw)
To: Wei Gao; +Cc: Jan Kara, kernel test robot, ltp
On Thu 26-03-26 01:28:58, Wei Gao wrote:
> Since the introduction of the asynchronous fserror reporting framework
> (kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
> due to the non-deterministic nature of event delivery and merging:
>
> 1) tcase3 failure: A race condition occurs when the test reads the
> notification fd between two events. The test now uses a poll() and read()
> loop to wait until the expected error_count has been accumulated.
>
> 2) tcase4 failure: The kernel may deliver errors as independent events
> instead of a single merged event, since different worker kthreads can
> end up generating each event, in which case they are not merged. As suggested by
> Jan Kara, this patch introduces a consolidate_events() helper. It iterates
> through the event buffer, accumulates the error_count from all independent
> events, and updates the first event's count in-place.
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Wei Gao <wegao@suse.com>
Looks good to me. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> v4->v5:
> Added a continue after an unexpected errno is detected, so that event is skipped
>
> .../kernel/syscalls/fanotify/fanotify22.c | 87 ++++++++++++++++++-
> 1 file changed, 83 insertions(+), 4 deletions(-)
>
> diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
> index e8002b160..2e5e6a3fa 100644
> --- a/testcases/kernel/syscalls/fanotify/fanotify22.c
> +++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
> @@ -28,6 +28,7 @@
> #include "tst_test.h"
> #include <sys/fanotify.h>
> #include <sys/types.h>
> +#include <poll.h>
>
> #ifdef HAVE_SYS_FANOTIFY_H
> #include "fanotify.h"
> @@ -88,7 +89,6 @@ static void trigger_bad_link_lookup(void)
> ret, BAD_LINK, errno, EUCLEAN);
> }
>
> -
> static void tcase3_trigger(void)
> {
> trigger_bad_link_lookup();
> @@ -104,6 +104,7 @@ static void tcase4_trigger(void)
> static struct test_case {
> char *name;
> int error;
> + int error2;
> unsigned int error_count;
> struct fanotify_fid_t *fid;
> void (*trigger_error)(void);
> @@ -134,10 +135,53 @@ static struct test_case {
> .trigger_error = &tcase4_trigger,
> .error_count = 2,
> .error = EFSCORRUPTED,
> + .error2 = ESHUTDOWN,
> .fid = &bad_file_fid,
> }
> };
>
> +static size_t consolidate_events(char *buf, size_t len, const struct test_case *ex)
> +{
> + struct fanotify_event_metadata *metadata, *first = NULL;
> + struct fanotify_event_info_error *first_info = NULL;
> + unsigned int total_count = 0;
> + int event_num = 0;
> +
> + for (metadata = (struct fanotify_event_metadata *)buf;
> + FAN_EVENT_OK(metadata, len);
> + metadata = FAN_EVENT_NEXT(metadata, len)) {
> +
> + event_num++;
> + struct fanotify_event_info_error *info = get_event_info_error(metadata);
> +
> + if (!info) {
> + tst_res(TFAIL, "%s: Event [%d] missing error info",
> + ex->name, event_num);
> + continue;
> + }
> +
> + if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
> + tst_res(TFAIL, "%s: Event [%d] unexpected errno (%d)",
> + ex->name, event_num, info->error);
> + continue;
> + }
> +
> + if (!first) {
> + first = metadata;
> + first_info = info;
> + }
> + total_count += info->error_count;
> +
> + tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
> + event_num, info->error, info->error_count);
> + }
> +
> + if (first_info)
> + first_info->error_count = total_count;
> +
> + return (first) ? first->event_len : 0;
> +}
> +
> static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
> const struct test_case *ex)
> {
> @@ -248,19 +292,54 @@ static void check_event(char *buf, size_t len, const struct test_case *ex)
> static void do_test(unsigned int i)
> {
> const struct test_case *tcase = &testcases[i];
> - size_t read_len;
> + size_t read_len = 0;
> + struct pollfd pfd;
> + unsigned int accumulated_count = 0;
>
> SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
> FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
>
> tcase->trigger_error();
>
> - read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
> + pfd.fd = fd_notify;
> + pfd.events = POLLIN;
> +
> + while (accumulated_count < tcase->error_count) {
> + if (poll(&pfd, 1, 5000) <= 0) {
> + tst_res(TFAIL, "%s: Timeout waiting for events", tcase->name);
> + goto out;
> + }
> +
> + char *current_pos = event_buf + read_len;
> + int ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
> +
> + if (ret < 0) {
> + tst_res(TFAIL, "%s: read failed: %s", tcase->name, strerror(errno));
> + goto out;
> + }
> +
> + struct fanotify_event_metadata *m =
> + (struct fanotify_event_metadata *)current_pos;
> + while (FAN_EVENT_OK(m, ret)) {
> + struct fanotify_event_info_error *e = get_event_info_error(m);
> +
> + if (e)
> + accumulated_count += e->error_count;
> +
> + read_len += m->event_len;
> + m = FAN_EVENT_NEXT(m, ret);
> + }
> + }
> +
> + read_len = consolidate_events(event_buf, read_len, tcase);
> +
> + check_event(event_buf, read_len, tcase);
> +
> +out:
>
> SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
> FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
>
> - check_event(event_buf, read_len, tcase);
> /* Unmount and mount the filesystem to get it out of the error state */
> SAFE_UMOUNT(MOUNT_PATH);
> SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type, 0, NULL);
> --
> 2.52.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [LTP] [PATCH v5] fanotify22.c: handle multiple asynchronous error events
2026-03-26 1:28 ` [LTP] [PATCH v5] " Wei Gao via ltp
2026-03-26 8:57 ` Jan Kara
@ 2026-03-26 9:40 ` Andrea Cervesato via ltp
2026-03-27 4:55 ` [LTP] [PATCH v6] " Wei Gao via ltp
2 siblings, 0 replies; 27+ messages in thread
From: Andrea Cervesato via ltp @ 2026-03-26 9:40 UTC (permalink / raw)
To: Wei Gao; +Cc: Jan Kara, kernel test robot, ltp
Hi Wei,
The overall approach with poll/read loop and consolidate_events() looks
good. A few issues below.
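For reference, the consolidation rule as I read it from the patch can be sketched as below. This is a simplified model, not the patch code: struct err_info, struct expect, errno_matches() and consolidate() are hypothetical names, events are modelled as a flat array rather than a fanotify buffer, and the numeric errno values in the usage note are only placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the error info record and the
 * per-testcase expectations (error2 == 0 means "no alternative errno"). */
struct err_info {
	int error;
	unsigned int error_count;
};

struct expect {
	int error;
	int error2;
};

/* An event is accepted if it carries the primary errno, or the secondary
 * one when a secondary errno is configured. */
static int errno_matches(const struct err_info *info, const struct expect *ex)
{
	return info->error == ex->error ||
	       (ex->error2 != 0 && info->error == ex->error2);
}

/*
 * Sum error_count over all accepted events and store the total in the
 * first accepted event. Returns the index of that event, or -1 if no
 * event was accepted. Events with an unexpected errno are skipped.
 */
static int consolidate(struct err_info *ev, size_t n, const struct expect *ex)
{
	unsigned int total = 0;
	int first = -1;
	size_t i;

	assert(ev != NULL && ex != NULL);
	for (i = 0; i < n; i++) {
		if (!errno_matches(&ev[i], ex))
			continue;
		if (first < 0)
			first = (int)i;
		total += ev[i].error_count;
	}
	if (first >= 0)
		ev[first].error_count = total;
	return first;
}
```

For example, with expectations { 117, 108 } and events carrying errno 117, 5 and 108 (one error each), the middle event is skipped and the first event ends up with error_count 2, which is what the later single-event check then sees.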
> + char *current_pos = event_buf + read_len;
> + int ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
read() returns ssize_t, not int. Should be:
ssize_t ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
Also, both read_len and BUF_SIZE are effectively unsigned here. If
read_len ever reaches or exceeds BUF_SIZE, the subtraction wraps to a
huge value and read() will write past event_buf. A defensive check
before the read would be safer:
if (read_len >= BUF_SIZE)
tst_brk(TBROK, "Event buffer full");
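Outside of the LTP helpers, the two points above could be combined roughly as follows. This is only a sketch: read_append() is a hypothetical helper, and the -2 return value for a full buffer is an arbitrary convention chosen for the example (in the test itself tst_brk() would be used instead).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append the result of one read() to buf.
 * Returns the read() result (ssize_t, so -1 on error is preserved),
 * or -2 if the buffer is already full. The capacity check comes first
 * because cap - *used is computed in size_t and would wrap around to a
 * huge value if *used ever exceeded cap.
 */
static ssize_t read_append(int fd, char *buf, size_t cap, size_t *used)
{
	ssize_t ret;

	assert(buf != NULL && used != NULL);
	if (*used >= cap)
		return -2;

	ret = read(fd, buf + *used, cap - *used);
	if (ret > 0)
		*used += (size_t)ret;
	return ret;
}
```

Keeping ret as ssize_t avoids truncating the return value of read() and keeps the error check (ret < 0) well defined.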
> + if (ret < 0) {
> + tst_res(TFAIL, "%s: read failed: %s", tcase->name, strerror(errno));
> + goto out;
> + }
Two things here:
1) Use TERRNO instead of manual strerror(errno):
tst_res(... | TERRNO, "%s: read failed", tcase->name);
2) A read() failure after poll() confirmed data availability is a test
infrastructure problem, not a test logic failure. This should be
TBROK, not TFAIL:
tst_brk(TBROK | TERRNO, "%s: read failed", tcase->name);
Regards,
--
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com
* [LTP] [PATCH v6] fanotify22.c: handle multiple asynchronous error events
2026-03-26 1:28 ` [LTP] [PATCH v5] " Wei Gao via ltp
2026-03-26 8:57 ` Jan Kara
2026-03-26 9:40 ` Andrea Cervesato via ltp
@ 2026-03-27 4:55 ` Wei Gao via ltp
2026-03-27 9:07 ` Andrea Cervesato via ltp
2026-03-27 12:33 ` [LTP] [PATCH v7] " Wei Gao via ltp
2 siblings, 2 replies; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-27 4:55 UTC (permalink / raw)
To: ltp; +Cc: Jan Kara, kernel test robot
Since the introduction of the asynchronous fserror reporting framework
(kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
due to the non-deterministic nature of event delivery and merging:
1) tcase3 failure: A race condition occurs when the test reads the
notification fd between two events. uses a poll() and read() loop to wait
until the expected.
2) tcase4 failure: The kernel may deliver errors as independent events
instead of a single merged event, because different worker kthreads can
end up generating each event, in which case they are not merged. As
suggested by Jan Kara, this patch introduces a consolidate_events() helper.
It iterates through the event buffer, accumulates the error_count from all
independent events, and updates the first event's count in place.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wei Gao <wegao@suse.com>
---
v5->v6:
- Changed ret from int to ssize_t to match the return type of read().
- Added a defensive check that read_len has not reached or exceeded BUF_SIZE before calling read().
- Used the TERRNO flag instead of manual strerror(errno).
.../kernel/syscalls/fanotify/fanotify22.c | 89 ++++++++++++++++++-
1 file changed, 85 insertions(+), 4 deletions(-)
diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
index e8002b160..931f59bed 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify22.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
@@ -28,6 +28,7 @@
#include "tst_test.h"
#include <sys/fanotify.h>
#include <sys/types.h>
+#include <poll.h>
#ifdef HAVE_SYS_FANOTIFY_H
#include "fanotify.h"
@@ -88,7 +89,6 @@ static void trigger_bad_link_lookup(void)
ret, BAD_LINK, errno, EUCLEAN);
}
-
static void tcase3_trigger(void)
{
trigger_bad_link_lookup();
@@ -104,6 +104,7 @@ static void tcase4_trigger(void)
static struct test_case {
char *name;
int error;
+ int error2;
unsigned int error_count;
struct fanotify_fid_t *fid;
void (*trigger_error)(void);
@@ -134,10 +135,53 @@ static struct test_case {
.trigger_error = &tcase4_trigger,
.error_count = 2,
.error = EFSCORRUPTED,
+ .error2 = ESHUTDOWN,
.fid = &bad_file_fid,
}
};
+static size_t consolidate_events(char *buf, size_t len, const struct test_case *ex)
+{
+ struct fanotify_event_metadata *metadata, *first = NULL;
+ struct fanotify_event_info_error *first_info = NULL;
+ unsigned int total_count = 0;
+ int event_num = 0;
+
+ for (metadata = (struct fanotify_event_metadata *)buf;
+ FAN_EVENT_OK(metadata, len);
+ metadata = FAN_EVENT_NEXT(metadata, len)) {
+
+ event_num++;
+ struct fanotify_event_info_error *info = get_event_info_error(metadata);
+
+ if (!info) {
+ tst_res(TFAIL, "%s: Event [%d] missing error info",
+ ex->name, event_num);
+ continue;
+ }
+
+ if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
+ tst_res(TFAIL, "%s: Event [%d] unexpected errno (%d)",
+ ex->name, event_num, info->error);
+ continue;
+ }
+
+ if (!first) {
+ first = metadata;
+ first_info = info;
+ }
+ total_count += info->error_count;
+
+ tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
+ event_num, info->error, info->error_count);
+ }
+
+ if (first_info)
+ first_info->error_count = total_count;
+
+ return (first) ? first->event_len : 0;
+}
+
static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
const struct test_case *ex)
{
@@ -248,19 +292,56 @@ static void check_event(char *buf, size_t len, const struct test_case *ex)
static void do_test(unsigned int i)
{
const struct test_case *tcase = &testcases[i];
- size_t read_len;
+ size_t read_len = 0;
+ struct pollfd pfd;
+ unsigned int accumulated_count = 0;
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
tcase->trigger_error();
- read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
+ pfd.fd = fd_notify;
+ pfd.events = POLLIN;
+
+ while (accumulated_count < tcase->error_count) {
+ if (poll(&pfd, 1, 5000) <= 0) {
+ tst_res(TFAIL, "%s: Timeout waiting for events", tcase->name);
+ goto out;
+ }
+
+ if (read_len >= BUF_SIZE)
+ tst_brk(TBROK, "Event buffer full");
+
+ char *current_pos = event_buf + read_len;
+ ssize_t ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
+
+ if (ret < 0) {
+ tst_brk(TBROK | TERRNO, "%s: read failed", tcase->name);
+ }
+
+ struct fanotify_event_metadata *m =
+ (struct fanotify_event_metadata *)current_pos;
+ while (FAN_EVENT_OK(m, ret)) {
+ struct fanotify_event_info_error *e = get_event_info_error(m);
+
+ if (e)
+ accumulated_count += e->error_count;
+
+ read_len += m->event_len;
+ m = FAN_EVENT_NEXT(m, ret);
+ }
+ }
+
+ read_len = consolidate_events(event_buf, read_len, tcase);
+
+ check_event(event_buf, read_len, tcase);
+
+out:
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
- check_event(event_buf, read_len, tcase);
/* Unmount and mount the filesystem to get it out of the error state */
SAFE_UMOUNT(MOUNT_PATH);
SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type, 0, NULL);
--
2.52.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [LTP] [PATCH v6] fanotify22.c: handle multiple asynchronous error events
2026-03-27 4:55 ` [LTP] [PATCH v6] " Wei Gao via ltp
@ 2026-03-27 9:07 ` Andrea Cervesato via ltp
2026-03-27 12:33 ` [LTP] [PATCH v7] " Wei Gao via ltp
1 sibling, 0 replies; 27+ messages in thread
From: Andrea Cervesato via ltp @ 2026-03-27 9:07 UTC (permalink / raw)
To: Wei Gao; +Cc: Jan Kara, kernel test robot, ltp
Hi Wei,
> Since the introduction of the asynchronous fserror reporting framework
> (kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
> due to the non-deterministic nature of event delivery and merging:
>
> 1) tcase3 failure: A race condition occurs when the test reads the
> notification fd between two events. uses a poll() and read() loop to wait
> until the expected.
This sentence is truncated at expected. Maybe the meaning was "Use a poll()
and read() loop to wait until the expected event".
> +
> + while (accumulated_count < tcase->error_count) {
> + if (poll(&pfd, 1, 5000) <= 0) {
> + tst_res(TFAIL, "%s: Timeout waiting for events", tcase->name);
> + goto out;
> + }
> +
> + if (read_len >= BUF_SIZE)
> + tst_brk(TBROK, "Event buffer full");
> +
> + char *current_pos = event_buf + read_len;
> + ssize_t ret = read(fd_notify, current_pos, BUF_SIZE - read_len);
> +
> + if (ret < 0) {
> + tst_brk(TBROK | TERRNO, "%s: read failed", tcase->name);
> + }
This is a bit odd. We have SAFE_READ(), which already handles read()
errors, so we should use it. I guess the error is handled manually here
in order to show tcase->name. We don't really need that: we usually have
tst_res(TINFO, "Test case: %s", tcase->name);
at the beginning of the do_test() function, so everything that comes later
is related to it. With that in place, the poll() TFAIL message should be
updated accordingly as well.
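The poll()/read() loop under discussion can be sketched outside LTP with plain POSIX calls. This is only an illustration under stated assumptions: read_until() is a hypothetical helper, it counts bytes rather than fanotify error_count values, and it reports failures through its return value instead of tst_res()/tst_brk().

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep reading from fd into buf until at least
 * `want` bytes have arrived or poll() times out. Returns the number of
 * bytes collected, or -1 on error, end of file, or timeout.
 */
static ssize_t read_until(int fd, char *buf, size_t size, size_t want,
			  int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	size_t total = 0;

	assert(want <= size);

	while (total < want) {
		/* Wait for data instead of sleeping for a fixed time. */
		if (poll(&pfd, 1, timeout_ms) <= 0)
			return -1;

		/* Append after the data that is already in the buffer. */
		ssize_t ret = read(fd, buf + total, size - total);

		if (ret <= 0)
			return -1;

		total += (size_t)ret;
	}

	return (ssize_t)total;
}
```

Because the loop condition guarantees total < want <= size, the length passed to read() is always positive, so no separate overflow check is needed in this simplified form.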
Regards,
--
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com
^ permalink raw reply [flat|nested] 27+ messages in thread
* [LTP] [PATCH v7] fanotify22.c: handle multiple asynchronous error events
2026-03-27 4:55 ` [LTP] [PATCH v6] " Wei Gao via ltp
2026-03-27 9:07 ` Andrea Cervesato via ltp
@ 2026-03-27 12:33 ` Wei Gao via ltp
2026-03-27 14:19 ` Andrea Cervesato via ltp
1 sibling, 1 reply; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-27 12:33 UTC (permalink / raw)
To: ltp; +Cc: Jan Kara, kernel test robot
Since the introduction of the asynchronous fserror reporting framework
(kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
due to the non-deterministic nature of event delivery and merging:
1) tcase3 failure: A race condition occurs when the test reads the
notification fd between two events. Use a poll() and read() loop to wait
until the expected event.
2) tcase4 failure: The kernel may deliver errors as independent events
instead of a single merged event, because different worker kthreads can
end up generating each event, in which case they are not merged. As
suggested by Jan Kara, this patch introduces a consolidate_events() helper.
It iterates through the event buffer, accumulates the error_count from all
independent events, and updates the first event's count in place.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wei Gao <wegao@suse.com>
---
v6 -> v7:
- Replaced read() with SAFE_READ().
- Added TINFO message at the start of do_test().
- Removed redundant test case names from error messages.
- Improved buffer overflow check in do_test().
.../kernel/syscalls/fanotify/fanotify22.c | 110 +++++++++++++++---
1 file changed, 94 insertions(+), 16 deletions(-)
diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
index e8002b160..9a555c247 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify22.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
@@ -28,6 +28,7 @@
#include "tst_test.h"
#include <sys/fanotify.h>
#include <sys/types.h>
+#include <poll.h>
#ifdef HAVE_SYS_FANOTIFY_H
#include "fanotify.h"
@@ -88,7 +89,6 @@ static void trigger_bad_link_lookup(void)
ret, BAD_LINK, errno, EUCLEAN);
}
-
static void tcase3_trigger(void)
{
trigger_bad_link_lookup();
@@ -104,6 +104,7 @@ static void tcase4_trigger(void)
static struct test_case {
char *name;
int error;
+ int error2;
unsigned int error_count;
struct fanotify_fid_t *fid;
void (*trigger_error)(void);
@@ -134,37 +135,79 @@ static struct test_case {
.trigger_error = &tcase4_trigger,
.error_count = 2,
.error = EFSCORRUPTED,
+ .error2 = ESHUTDOWN,
.fid = &bad_file_fid,
}
};
+static size_t consolidate_events(char *buf, size_t len, const struct test_case *ex)
+{
+ struct fanotify_event_metadata *metadata, *first = NULL;
+ struct fanotify_event_info_error *first_info = NULL;
+ unsigned int total_count = 0;
+ int event_num = 0;
+
+ for (metadata = (struct fanotify_event_metadata *)buf;
+ FAN_EVENT_OK(metadata, len);
+ metadata = FAN_EVENT_NEXT(metadata, len)) {
+
+ event_num++;
+ struct fanotify_event_info_error *info = get_event_info_error(metadata);
+
+ if (!info) {
+ tst_res(TFAIL, "Event [%d] missing error info", event_num);
+ continue;
+ }
+
+ if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
+ tst_res(TFAIL, "Event [%d] unexpected errno (%d)",
+ event_num, info->error);
+ continue;
+ }
+
+ if (!first) {
+ first = metadata;
+ first_info = info;
+ }
+ total_count += info->error_count;
+
+ tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
+ event_num, info->error, info->error_count);
+ }
+
+ if (first_info)
+ first_info->error_count = total_count;
+
+ return (first) ? first->event_len : 0;
+}
+
static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
const struct test_case *ex)
{
struct file_handle *fh = (struct file_handle *) &fid->handle;
if (memcmp(&fid->fsid, &ex->fid->fsid, sizeof(fid->fsid))) {
- tst_res(TFAIL, "%s: Received bad FSID type (%x...!=%x...)",
- ex->name, FSID_VAL_MEMBER(fid->fsid, 0),
+ tst_res(TFAIL, "Received bad FSID type (%x...!=%x...)",
+ FSID_VAL_MEMBER(fid->fsid, 0),
ex->fid->fsid.val[0]);
return 1;
}
if (fh->handle_type != ex->fid->handle.handle_type) {
- tst_res(TFAIL, "%s: Received bad file_handle type (%d!=%d)",
- ex->name, fh->handle_type, ex->fid->handle.handle_type);
+ tst_res(TFAIL, "Received bad file_handle type (%d!=%d)",
+ fh->handle_type, ex->fid->handle.handle_type);
return 1;
}
if (fh->handle_bytes != ex->fid->handle.handle_bytes) {
- tst_res(TFAIL, "%s: Received bad file_handle len (%d!=%d)",
- ex->name, fh->handle_bytes, ex->fid->handle.handle_bytes);
+ tst_res(TFAIL, "Received bad file_handle len (%d!=%d)",
+ fh->handle_bytes, ex->fid->handle.handle_bytes);
return 1;
}
if (memcmp(fh->f_handle, ex->fid->handle.f_handle, fh->handle_bytes)) {
- tst_res(TFAIL, "%s: Received wrong handle. "
- "Expected (%x...) got (%x...) ", ex->name,
+ tst_res(TFAIL, "Received wrong handle. "
+ "Expected (%x...) got (%x...) ",
*(int *)ex->fid->handle.f_handle, *(int *)fh->f_handle);
return 1;
}
@@ -177,14 +220,14 @@ static int check_error_event_info_error(struct fanotify_event_info_error *info_e
int fail = 0;
if (info_error->error_count != ex->error_count) {
- tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d)",
- ex->name, info_error->error_count, ex->error_count);
+ tst_res(TFAIL, "Unexpected error_count (%d!=%d)",
+ info_error->error_count, ex->error_count);
fail++;
}
if (info_error->error != ex->error) {
- tst_res(TFAIL, "%s: Unexpected error code value (%d!=%d)",
- ex->name, info_error->error, ex->error);
+ tst_res(TFAIL, "Unexpected error code value (%d!=%d)",
+ info_error->error, ex->error);
fail++;
}
@@ -248,19 +291,54 @@ static void check_event(char *buf, size_t len, const struct test_case *ex)
static void do_test(unsigned int i)
{
const struct test_case *tcase = &testcases[i];
- size_t read_len;
+ size_t read_len = 0;
+ struct pollfd pfd;
+ unsigned int accumulated_count = 0;
+
+ tst_res(TINFO, "Test case: %s", tcase->name);
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
tcase->trigger_error();
- read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
+ pfd.fd = fd_notify;
+ pfd.events = POLLIN;
+
+ while (accumulated_count < tcase->error_count) {
+ if (poll(&pfd, 1, 5000) <= 0) {
+ tst_res(TFAIL, "Timeout waiting for events");
+ goto out;
+ }
+
+ if (BUF_SIZE - read_len < FAN_EVENT_METADATA_LEN)
+ tst_brk(TBROK, "Insufficient buffer space for next event");
+
+ char *current_pos = event_buf + read_len;
+ ssize_t ret = SAFE_READ(0, fd_notify, current_pos, BUF_SIZE - read_len);
+
+ struct fanotify_event_metadata *m =
+ (struct fanotify_event_metadata *)current_pos;
+ while (FAN_EVENT_OK(m, ret)) {
+ struct fanotify_event_info_error *e = get_event_info_error(m);
+
+ if (e)
+ accumulated_count += e->error_count;
+
+ read_len += m->event_len;
+ m = FAN_EVENT_NEXT(m, ret);
+ }
+ }
+
+ read_len = consolidate_events(event_buf, read_len, tcase);
+
+ check_event(event_buf, read_len, tcase);
+
+out:
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
- check_event(event_buf, read_len, tcase);
/* Unmount and mount the filesystem to get it out of the error state */
SAFE_UMOUNT(MOUNT_PATH);
SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type, 0, NULL);
--
2.52.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [LTP] [PATCH v7] fanotify22.c: handle multiple asynchronous error events
2026-03-27 12:33 ` [LTP] [PATCH v7] " Wei Gao via ltp
@ 2026-03-27 14:19 ` Andrea Cervesato via ltp
2026-03-28 0:44 ` Wei Gao via ltp
0 siblings, 1 reply; 27+ messages in thread
From: Andrea Cervesato via ltp @ 2026-03-27 14:19 UTC (permalink / raw)
To: Wei Gao; +Cc: Jan Kara, kernel test robot, ltp
Hi Wei,
we still have a problem with the patch. It assumes deterministic event
ordering, where one event always comes after the other.
consolidate_events() accepts events matching either ex->error or
ex->error2:
> + if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
but the first accepted event becomes the one passed to check_event().
Since event delivery order is non-deterministic, the first event could
have error == ESHUTDOWN (error2). check_error_event_info_error() then
only checks against ex->error:
> if (info_error->error != ex->error) {
> tst_res(TFAIL, "Unexpected error code value (%d!=%d)",
> info_error->error, ex->error);
So it compares ESHUTDOWN against EFSCORRUPTED, doesn't match, and
reports TFAIL.
We need to change that into:
if (info_error->error != ex->error &&
(ex->error2 == 0 || info_error->error != ex->error2)) {
Similar to consolidate_events(), otherwise we will get a TFAIL when
events don't arrive in the same order. Did you try running the patch with a
high `-i` iteration count? It should fail sometimes with the current code.
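To make the ordering concern concrete, here is a small self-contained model. It is a sketch under assumptions, not the test's code: struct err_event is a simplified stand-in for struct fanotify_event_info_error, consolidate() and error_is_expected() are hypothetical names, and EFSCORRUPTED is assumed to alias EUCLEAN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumption: EFSCORRUPTED is an alias for EUCLEAN when not predefined. */
#ifndef EFSCORRUPTED
# define EFSCORRUPTED EUCLEAN
#endif

/* Simplified stand-in for struct fanotify_event_info_error. */
struct err_event {
	int error;
	unsigned int error_count;
};

/* Accept the primary error code, or the secondary one when it is set. */
static int error_is_expected(int got, int error, int error2)
{
	return got == error || (error2 != 0 && got == error2);
}

/*
 * Fold the counts of all accepted events into the first accepted one and
 * return it, or NULL if no event matched. This mirrors the idea behind
 * consolidate_events() on a plain array.
 */
static struct err_event *consolidate(struct err_event *ev, size_t n,
				     int error, int error2)
{
	struct err_event *first = NULL;
	unsigned int total = 0;

	assert(ev != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!error_is_expected(ev[i].error, error, error2))
			continue;
		if (!first)
			first = &ev[i];
		total += ev[i].error_count;
	}

	if (first)
		first->error_count = total;

	return first;
}
```

If the ESHUTDOWN event happens to come first, the consolidated event carries ESHUTDOWN, so a later check against the primary code alone would reject it, while error_is_expected() accepts it.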
Regards,
--
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [LTP] [PATCH v7] fanotify22.c: handle multiple asynchronous error events
2026-03-27 14:19 ` Andrea Cervesato via ltp
@ 2026-03-28 0:44 ` Wei Gao via ltp
2026-03-30 7:17 ` Andrea Cervesato via ltp
2026-03-30 7:36 ` Jan Kara
0 siblings, 2 replies; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-28 0:44 UTC (permalink / raw)
To: Andrea Cervesato; +Cc: Jan Kara, kernel test robot, ltp
On Fri, Mar 27, 2026 at 02:19:28PM +0000, Andrea Cervesato wrote:
> Hi Wei,
>
> we still have a problem with the patch. It assumes deterministic event
> ordering, where one event always comes after the other.
>
> consolidate_events() accepts events matching either ex->error or
> ex->error2:
>
> > + if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
>
> but the first accepted event becomes the one passed to check_event().
> Since event delivery order is non-deterministic, the first event could
> have error == ESHUTDOWN (error2). check_error_event_info_error() then
> only checks against ex->error:
>
> > if (info_error->error != ex->error) {
> > tst_res(TFAIL, "Unexpected error code value (%d!=%d)",
> > info_error->error, ex->error);
>
> So it compares ESHUTDOWN against EFSCORRUPTED, doesn't match, and
> reports TFAIL.
>
> We need to change that into:
>
> if (info_error->error != ex->error &&
> (ex->error2 == 0 || info_error->error != ex->error2)) {
>
> Similar to consolidate_events(), otherwise we will get a TFAIL when
> events don't arrive in the same order. Did you try running the patch with a
> high `-i` iteration count? It should fail sometimes with the current code.
I remember testing an early version (v4 or v5) in openQA with -i1000 without
errors (the main logic is unchanged compared with v6/v7).
BTW: are these the final comments?
>
> Regards,
> --
> Andrea Cervesato
> SUSE QE Automation Engineer Linux
> andrea.cervesato@suse.com
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [LTP] [PATCH v7] fanotify22.c: handle multiple asynchronous error events
2026-03-28 0:44 ` Wei Gao via ltp
@ 2026-03-30 7:17 ` Andrea Cervesato via ltp
2026-03-30 7:36 ` Jan Kara
1 sibling, 0 replies; 27+ messages in thread
From: Andrea Cervesato via ltp @ 2026-03-30 7:17 UTC (permalink / raw)
To: Wei Gao; +Cc: Jan Kara, kernel test robot, ltp
Hi Wei,
> I remember testing an early version (v4 or v5) in openQA with -i1000 without
> errors (the main logic is unchanged compared with v6/v7).
>
I see. Anyway, the logic needs to be fixed, because at the moment the test
does not properly handle both error codes.
> BTW: are these the final comments?
Yes, please send the next patch. I will review it later on.
Regards,
--
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [LTP] [PATCH v7] fanotify22.c: handle multiple asynchronous error events
2026-03-28 0:44 ` Wei Gao via ltp
2026-03-30 7:17 ` Andrea Cervesato via ltp
@ 2026-03-30 7:36 ` Jan Kara
1 sibling, 0 replies; 27+ messages in thread
From: Jan Kara @ 2026-03-30 7:36 UTC (permalink / raw)
To: Wei Gao; +Cc: Jan Kara, kernel test robot, ltp
Hi!
On Sat 28-03-26 00:44:21, Wei Gao wrote:
> On Fri, Mar 27, 2026 at 02:19:28PM +0000, Andrea Cervesato wrote:
> > Hi Wei,
> >
> > we still have a problem with the patch. It assumes deterministic event
> > ordering, where one event always comes after the other.
> >
> > consolidate_events() accepts events matching either ex->error or
> > ex->error2:
> >
> > > + if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
> >
> > but the first accepted event becomes the one passed to check_event().
> > Since event delivery order is non-deterministic, the first event could
> > have error == ESHUTDOWN (error2). check_error_event_info_error() then
> > only checks against ex->error:
> >
> > > if (info_error->error != ex->error) {
> > > tst_res(TFAIL, "Unexpected error code value (%d!=%d)",
> > > info_error->error, ex->error);
> >
> > So it compares ESHUTDOWN against EFSCORRUPTED, doesn't match, and
> > reports TFAIL.
> >
> > We need to change that into:
> >
> > if (info_error->error != ex->error &&
> > (ex->error2 == 0 || info_error->error != ex->error2)) {
> >
> > Similar to consolidate_events(), otherwise we will get a TFAIL when
> > events don't arrive in the same order. Did you try running the patch with a
> > high `-i` iteration count? It should fail sometimes with the current code.
> I remember testing an early version (v4 or v5) in openQA with -i1000 without
> errors (the main logic is unchanged compared with v6/v7).
>
> BTW: are these the final comments?
Andrea is correct that, strictly speaking, the ordering of fanotify
notification events is not guaranteed. In practice, however, error events are
never reordered, so you definitely won't be able to observe test failures
due to that currently. I'm fine with the test not handling reordering of
error events; we can always fix that up if the kernel ever decides it is
unavoidable (we'd have to have a really strong reason for that).
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 27+ messages in thread
* [LTP] [PATCH v6] io_submit04: Add test case for RWF_NOWAIT flag
@ 2026-03-17 11:46 Wei Gao via ltp
2026-03-31 11:18 ` [LTP] [PATCH v7] fanotify22.c: handle multiple asynchronous error events Wei Gao via ltp
0 siblings, 1 reply; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-17 11:46 UTC (permalink / raw)
To: ltp
v5->v6:
- Changed fd checks to != -1 in the cleanup function
- Replaced errno with TST_ERR for LTP macro consistency
- Cleaned up formatting issues in aio_abi.h
Fixes: #467
Signed-off-by: Wei Gao <wegao@suse.com>
---
configure.ac | 1 +
include/lapi/aio_abi.h | 44 +++++++++
runtest/syscalls | 1 +
.../kernel/syscalls/io_submit/.gitignore | 1 +
.../kernel/syscalls/io_submit/io_submit04.c | 99 +++++++++++++++++++
5 files changed, 146 insertions(+)
create mode 100644 include/lapi/aio_abi.h
create mode 100644 testcases/kernel/syscalls/io_submit/io_submit04.c
diff --git a/configure.ac b/configure.ac
index 7fa614dcb..61462d192 100644
--- a/configure.ac
+++ b/configure.ac
@@ -172,6 +172,7 @@ AC_CHECK_FUNCS_ONCE([ \
])
AC_CHECK_FUNCS(mkdtemp,[],AC_MSG_ERROR(mkdtemp() not found!))
+AC_CHECK_MEMBERS([struct iocb.aio_rw_flags],,,[#include <linux/aio_abi.h>])
AC_CHECK_MEMBERS([struct fanotify_event_info_fid.fsid.__val],,,[#include <sys/fanotify.h>])
AC_CHECK_MEMBERS([struct perf_event_mmap_page.aux_head],,,[#include <linux/perf_event.h>])
AC_CHECK_MEMBERS([struct sigaction.sa_sigaction],[],[],[#include <signal.h>])
diff --git a/include/lapi/aio_abi.h b/include/lapi/aio_abi.h
new file mode 100644
index 000000000..ac78e5500
--- /dev/null
+++ b/include/lapi/aio_abi.h
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2025 Wei Gao <wegao@suse.com>
+ */
+
+#ifndef LAPI_AIO_ABI_H__
+#define LAPI_AIO_ABI_H__
+
+#include <endian.h>
+#include <linux/aio_abi.h>
+
+#ifndef RWF_NOWAIT
+# define RWF_NOWAIT 0x00000008
+#endif
+
+struct iocb_fallback {
+ uint64_t aio_data;
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+ uint32_t aio_key;
+ uint32_t aio_rw_flags;
+#elif __BYTE_ORDER == __BIG_ENDIAN
+ uint32_t aio_rw_flags;
+ uint32_t aio_key;
+#else
+#error edit for your odd byteorder.
+#endif
+ uint16_t aio_lio_opcode;
+ int16_t aio_reqprio;
+ uint32_t aio_fildes;
+ uint64_t aio_buf;
+ uint64_t aio_nbytes;
+ int64_t aio_offset;
+ uint64_t aio_reserved2;
+ uint32_t aio_flags;
+ uint32_t aio_resfd;
+};
+
+#ifndef HAVE_STRUCT_IOCB_AIO_RW_FLAGS
+typedef struct iocb_fallback iocb;
+#else
+typedef struct iocb iocb;
+#endif
+
+#endif /* LAPI_AIO_ABI_H__ */
diff --git a/runtest/syscalls b/runtest/syscalls
index 2179e007c..9812b1bfe 100644
--- a/runtest/syscalls
+++ b/runtest/syscalls
@@ -699,6 +699,7 @@ io_setup02 io_setup02
io_submit01 io_submit01
io_submit02 io_submit02
io_submit03 io_submit03
+io_submit04 io_submit04
keyctl01 keyctl01
keyctl02 keyctl02
diff --git a/testcases/kernel/syscalls/io_submit/.gitignore b/testcases/kernel/syscalls/io_submit/.gitignore
index 60b07970a..abe962e1c 100644
--- a/testcases/kernel/syscalls/io_submit/.gitignore
+++ b/testcases/kernel/syscalls/io_submit/.gitignore
@@ -1,3 +1,4 @@
/io_submit01
/io_submit02
/io_submit03
+/io_submit04
diff --git a/testcases/kernel/syscalls/io_submit/io_submit04.c b/testcases/kernel/syscalls/io_submit/io_submit04.c
new file mode 100644
index 000000000..3b8842da0
--- /dev/null
+++ b/testcases/kernel/syscalls/io_submit/io_submit04.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2025 Wei Gao <wegao@suse.com>
+ */
+
+/*\
+ * Test RWF_NOWAIT support in io_submit(), verifying that an
+ * asynchronous read operation on a blocking resource (empty pipe)
+ * will cause -EAGAIN. This is done by checking that io_getevents()
+ * syscall returns immediately and io_event.res is equal to -EAGAIN.
+ */
+
+#include "config.h"
+#include "tst_test.h"
+#include "lapi/syscalls.h"
+#include "lapi/aio_abi.h"
+
+#define BUF_SIZE 100
+
+static int fd[2] = {-1, -1};
+static aio_context_t ctx;
+static char *buf;
+static iocb *cb;
+static iocb **iocbs;
+
+static void setup(void)
+{
+ if (tst_syscall(__NR_io_setup, 1, &ctx))
+ tst_brk(TBROK | TERRNO, "io_setup failed");
+
+ SAFE_PIPE(fd);
+
+ cb->aio_fildes = fd[0];
+ cb->aio_lio_opcode = IOCB_CMD_PREAD;
+ cb->aio_buf = (uint64_t)buf;
+ cb->aio_offset = 0;
+ cb->aio_nbytes = BUF_SIZE;
+ cb->aio_rw_flags = RWF_NOWAIT;
+
+ iocbs[0] = cb;
+}
+
+static void cleanup(void)
+{
+ if (fd[0] != -1)
+ SAFE_CLOSE(fd[0]);
+
+ if (fd[1] != -1)
+ SAFE_CLOSE(fd[1]);
+
+ if (ctx)
+ if (tst_syscall(__NR_io_destroy, ctx))
+ tst_brk(TBROK | TERRNO, "io_destroy() failed");
+}
+
+static void run(void)
+{
+ struct io_event evbuf;
+ struct timespec timeout = { .tv_sec = 1 };
+ long nr = 1;
+
+ TEST(tst_syscall(__NR_io_submit, ctx, nr, iocbs));
+
+ if (TST_RET == -1 && TST_ERR == EOPNOTSUPP) {
+ tst_brk(TCONF, "RWF_NOWAIT not supported by kernel");
+ } else if (TST_RET != nr) {
+ tst_brk(TBROK | TTERRNO, "io_submit() returns %ld, expected %ld",
+ TST_RET, nr);
+ }
+
+ TEST(tst_syscall(__NR_io_getevents, ctx, 1, 1, &evbuf, &timeout));
+
+ if (TST_RET != 1) {
+ tst_res(TFAIL | TTERRNO, "io_getevents() failed to get 1 event");
+ return;
+ }
+
+ if (evbuf.res == -EAGAIN)
+ tst_res(TPASS, "io_getevents() returned EAGAIN on read event");
+ else
+ tst_res(TFAIL, "io_getevents() returned with %s instead of EAGAIN",
+ strerror(-evbuf.res));
+}
+
+static struct tst_test test = {
+ .test_all = run,
+ .setup = setup,
+ .cleanup = cleanup,
+ .needs_kconfigs = (const char *[]) {
+ "CONFIG_AIO=y",
+ NULL
+ },
+ .bufs = (struct tst_buffers []) {
+ {&buf, .size = BUF_SIZE},
+ {&cb, .size = sizeof(iocb)},
+ {&iocbs, .size = sizeof(iocb *)},
+ {},
+ }
+};
--
2.52.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [LTP] [PATCH v7] fanotify22.c: handle multiple asynchronous error events
2026-03-17 11:46 [LTP] [PATCH v6] io_submit04: Add test case for RWF_NOWAIT flag Wei Gao via ltp
@ 2026-03-31 11:18 ` Wei Gao via ltp
0 siblings, 0 replies; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-31 11:18 UTC (permalink / raw)
To: ltp; +Cc: Jan Kara, kernel test robot
Since the introduction of the asynchronous fserror reporting framework
(kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
due to the non-deterministic nature of event delivery and merging:
1) tcase3 failure: A race condition occurs when the test reads the
notification fd between two events. Use a poll() and read() loop to wait
until the expected event.
2) tcase4 failure: The kernel may deliver errors as independent events
instead of a single merged event, because different worker kthreads can
end up generating each event, in which case they are not merged. As
suggested by Jan Kara, this patch introduces a consolidate_events() helper.
It iterates through the event buffer, accumulates the error_count from all
independent events, and updates the first event's count in place.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wei Gao <wegao@suse.com>
---
v6->v7:
- Updated the validation check in check_error_event_info_error() to accept either error or error2
.../kernel/syscalls/fanotify/fanotify22.c | 113 +++++++++++++++---
1 file changed, 96 insertions(+), 17 deletions(-)
diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
index e8002b160..f5d59b99d 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify22.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
@@ -28,6 +28,7 @@
#include "tst_test.h"
#include <sys/fanotify.h>
#include <sys/types.h>
+#include <poll.h>
#ifdef HAVE_SYS_FANOTIFY_H
#include "fanotify.h"
@@ -88,7 +89,6 @@ static void trigger_bad_link_lookup(void)
ret, BAD_LINK, errno, EUCLEAN);
}
-
static void tcase3_trigger(void)
{
trigger_bad_link_lookup();
@@ -104,6 +104,7 @@ static void tcase4_trigger(void)
static struct test_case {
char *name;
int error;
+ int error2;
unsigned int error_count;
struct fanotify_fid_t *fid;
void (*trigger_error)(void);
@@ -134,37 +135,79 @@ static struct test_case {
.trigger_error = &tcase4_trigger,
.error_count = 2,
.error = EFSCORRUPTED,
+ .error2 = ESHUTDOWN,
.fid = &bad_file_fid,
}
};
+static size_t consolidate_events(char *buf, size_t len, const struct test_case *ex)
+{
+ struct fanotify_event_metadata *metadata, *first = NULL;
+ struct fanotify_event_info_error *first_info = NULL;
+ unsigned int total_count = 0;
+ int event_num = 0;
+
+ for (metadata = (struct fanotify_event_metadata *)buf;
+ FAN_EVENT_OK(metadata, len);
+ metadata = FAN_EVENT_NEXT(metadata, len)) {
+
+ event_num++;
+ struct fanotify_event_info_error *info = get_event_info_error(metadata);
+
+ if (!info) {
+ tst_res(TFAIL, "Event [%d] missing error info", event_num);
+ continue;
+ }
+
+ if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
+ tst_res(TFAIL, "Event [%d] unexpected errno (%d)",
+ event_num, info->error);
+ continue;
+ }
+
+ if (!first) {
+ first = metadata;
+ first_info = info;
+ }
+ total_count += info->error_count;
+
+ tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
+ event_num, info->error, info->error_count);
+ }
+
+ if (first_info)
+ first_info->error_count = total_count;
+
+ return (first) ? first->event_len : 0;
+}
+
static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
const struct test_case *ex)
{
struct file_handle *fh = (struct file_handle *) &fid->handle;
if (memcmp(&fid->fsid, &ex->fid->fsid, sizeof(fid->fsid))) {
- tst_res(TFAIL, "%s: Received bad FSID type (%x...!=%x...)",
- ex->name, FSID_VAL_MEMBER(fid->fsid, 0),
+ tst_res(TFAIL, "Received bad FSID type (%x...!=%x...)",
+ FSID_VAL_MEMBER(fid->fsid, 0),
ex->fid->fsid.val[0]);
return 1;
}
if (fh->handle_type != ex->fid->handle.handle_type) {
- tst_res(TFAIL, "%s: Received bad file_handle type (%d!=%d)",
- ex->name, fh->handle_type, ex->fid->handle.handle_type);
+ tst_res(TFAIL, "Received bad file_handle type (%d!=%d)",
+ fh->handle_type, ex->fid->handle.handle_type);
return 1;
}
if (fh->handle_bytes != ex->fid->handle.handle_bytes) {
- tst_res(TFAIL, "%s: Received bad file_handle len (%d!=%d)",
- ex->name, fh->handle_bytes, ex->fid->handle.handle_bytes);
+ tst_res(TFAIL, "Received bad file_handle len (%d!=%d)",
+ fh->handle_bytes, ex->fid->handle.handle_bytes);
return 1;
}
if (memcmp(fh->f_handle, ex->fid->handle.f_handle, fh->handle_bytes)) {
- tst_res(TFAIL, "%s: Received wrong handle. "
- "Expected (%x...) got (%x...) ", ex->name,
+ tst_res(TFAIL, "Received wrong handle. "
+ "Expected (%x...) got (%x...) ",
*(int *)ex->fid->handle.f_handle, *(int *)fh->f_handle);
return 1;
}
@@ -177,14 +220,15 @@ static int check_error_event_info_error(struct fanotify_event_info_error *info_e
int fail = 0;
if (info_error->error_count != ex->error_count) {
- tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d)",
- ex->name, info_error->error_count, ex->error_count);
+ tst_res(TFAIL, "Unexpected error_count (%d!=%d)",
+ info_error->error_count, ex->error_count);
fail++;
}
- if (info_error->error != ex->error) {
- tst_res(TFAIL, "%s: Unexpected error code value (%d!=%d)",
- ex->name, info_error->error, ex->error);
+ if (info_error->error != ex->error &&
+ (ex->error2 == 0 || info_error->error != ex->error2)) {
+ tst_res(TFAIL, "Unexpected error code value (%d!=%d)",
+ info_error->error, ex->error);
fail++;
}
@@ -248,19 +292,54 @@ static void check_event(char *buf, size_t len, const struct test_case *ex)
static void do_test(unsigned int i)
{
const struct test_case *tcase = &testcases[i];
- size_t read_len;
+ size_t read_len = 0;
+ struct pollfd pfd;
+ unsigned int accumulated_count = 0;
+
+ tst_res(TINFO, "Test case: %s", tcase->name);
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
tcase->trigger_error();
- read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
+ pfd.fd = fd_notify;
+ pfd.events = POLLIN;
+
+ while (accumulated_count < tcase->error_count) {
+ if (poll(&pfd, 1, 5000) <= 0) {
+ tst_res(TFAIL, "Timeout waiting for events");
+ goto out;
+ }
+
+ if (BUF_SIZE - read_len < FAN_EVENT_METADATA_LEN)
+ tst_brk(TBROK, "Insufficient buffer space for next event");
+
+ char *current_pos = event_buf + read_len;
+ ssize_t ret = SAFE_READ(0, fd_notify, current_pos, BUF_SIZE - read_len);
+
+ struct fanotify_event_metadata *m =
+ (struct fanotify_event_metadata *)current_pos;
+ while (FAN_EVENT_OK(m, ret)) {
+ struct fanotify_event_info_error *e = get_event_info_error(m);
+
+ if (e)
+ accumulated_count += e->error_count;
+
+ read_len += m->event_len;
+ m = FAN_EVENT_NEXT(m, ret);
+ }
+ }
+
+ read_len = consolidate_events(event_buf, read_len, tcase);
+
+ check_event(event_buf, read_len, tcase);
+
+out:
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
- check_event(event_buf, read_len, tcase);
/* Unmount and mount the filesystem to get it out of the error state */
SAFE_UMOUNT(MOUNT_PATH);
SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type, 0, NULL);
--
2.52.0
* [LTP] [PATCH v7] fanotify22.c: handle multiple asynchronous error events
@ 2026-03-27 12:22 Wei Gao via ltp
0 siblings, 0 replies; 27+ messages in thread
From: Wei Gao via ltp @ 2026-03-27 12:22 UTC (permalink / raw)
To: ltp; +Cc: Jan Kara, kernel test robot
Since the introduction of the asynchronous fserror reporting framework
(kernel commit 81d2e13a57c9), fanotify22 has encountered sporadic failures
due to the non-deterministic nature of event delivery and merging:
1) tcase3 failure: A race condition occurs when the test reads the
notification fd between two events. Use a poll() and read() loop to wait
until the expected error_count has been accumulated.
2) tcase4 failure: The kernel may deliver errors as independent events
instead of a single merged event, because different worker kthreads can
end up generating each event, in which case the events are not merged.
As suggested by Jan Kara, this patch introduces a consolidate_events()
helper. It iterates through the event buffer, accumulates the error_count
from all independent events, and updates the first event's count in place.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202602042124.87bd00e3-lkp@intel.com
Suggested-by: Jan Kara <jack@suse.cz>
Cc: Andrea Cervesato <andrea.cervesato@suse.com>
Signed-off-by: Wei Gao <wegao@suse.com>
---
v6 -> v7:
- Replaced read() with SAFE_READ().
- Added TINFO message at the start of do_test().
- Removed redundant test case names from error messages.
- Improved buffer overflow check in do_test().
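For readers skimming the thread, the consolidation idea described in the commit message can be sketched in isolation. This is a minimal illustration only, not the code from the patch: `struct fake_event` and `consolidate()` are hypothetical stand-ins for the real fanotify metadata records and the patch's consolidate_events() helper, and the record layout is an assumption chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one variable-length event record. */
struct fake_event {
	uint32_t event_len;	/* total size of this record in bytes */
	int32_t error;		/* errno-style code carried by the record */
	uint32_t error_count;	/* number of errors this record represents */
};

/*
 * Sum error_count over every record in buf and store the total in the
 * first record. Returns the length of the first record, or 0 if the
 * buffer holds no complete record.
 */
static size_t consolidate(char *buf, size_t len)
{
	struct fake_event ev, first;
	size_t off = 0;
	uint32_t total = 0;
	int have_first = 0;

	assert(buf != NULL);

	while (len - off >= sizeof(ev)) {
		/* Copy out the record to avoid unaligned access. */
		memcpy(&ev, buf + off, sizeof(ev));

		/* Stop on a truncated or malformed record. */
		if (ev.event_len < sizeof(ev) || ev.event_len > len - off)
			break;

		if (!have_first) {
			first = ev;
			have_first = 1;
		}
		total += ev.error_count;
		off += ev.event_len;
	}

	if (!have_first)
		return 0;

	/* Write the accumulated count back into the first record. */
	first.error_count = total;
	memcpy(buf, &first, sizeof(first));

	return first.event_len;
}
```

With two independent records that each carry error_count = 1, the first record ends up with error_count = 2 and keeps its own error code, so a checker that expects a single merged event can inspect just that first record.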
.../kernel/syscalls/fanotify/fanotify22.c | 110 +++++++++++++++---
1 file changed, 94 insertions(+), 16 deletions(-)
diff --git a/testcases/kernel/syscalls/fanotify/fanotify22.c b/testcases/kernel/syscalls/fanotify/fanotify22.c
index e8002b160..9a555c247 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify22.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify22.c
@@ -28,6 +28,7 @@
#include "tst_test.h"
#include <sys/fanotify.h>
#include <sys/types.h>
+#include <poll.h>
#ifdef HAVE_SYS_FANOTIFY_H
#include "fanotify.h"
@@ -88,7 +89,6 @@ static void trigger_bad_link_lookup(void)
ret, BAD_LINK, errno, EUCLEAN);
}
-
static void tcase3_trigger(void)
{
trigger_bad_link_lookup();
@@ -104,6 +104,7 @@ static void tcase4_trigger(void)
static struct test_case {
char *name;
int error;
+ int error2;
unsigned int error_count;
struct fanotify_fid_t *fid;
void (*trigger_error)(void);
@@ -134,37 +135,79 @@ static struct test_case {
.trigger_error = &tcase4_trigger,
.error_count = 2,
.error = EFSCORRUPTED,
+ .error2 = ESHUTDOWN,
.fid = &bad_file_fid,
}
};
+static size_t consolidate_events(char *buf, size_t len, const struct test_case *ex)
+{
+ struct fanotify_event_metadata *metadata, *first = NULL;
+ struct fanotify_event_info_error *first_info = NULL;
+ unsigned int total_count = 0;
+ int event_num = 0;
+
+ for (metadata = (struct fanotify_event_metadata *)buf;
+ FAN_EVENT_OK(metadata, len);
+ metadata = FAN_EVENT_NEXT(metadata, len)) {
+
+ event_num++;
+ struct fanotify_event_info_error *info = get_event_info_error(metadata);
+
+ if (!info) {
+ tst_res(TFAIL, "Event [%d] missing error info", event_num);
+ continue;
+ }
+
+ if (info->error != ex->error && (ex->error2 == 0 || info->error != ex->error2)) {
+ tst_res(TFAIL, "Event [%d] unexpected errno (%d)",
+ event_num, info->error);
+ continue;
+ }
+
+ if (!first) {
+ first = metadata;
+ first_info = info;
+ }
+ total_count += info->error_count;
+
+ tst_res(TINFO, "Event [%d]: errno=%d, error_count=%d",
+ event_num, info->error, info->error_count);
+ }
+
+ if (first_info)
+ first_info->error_count = total_count;
+
+ return (first) ? first->event_len : 0;
+}
+
static int check_error_event_info_fid(struct fanotify_event_info_fid *fid,
const struct test_case *ex)
{
struct file_handle *fh = (struct file_handle *) &fid->handle;
if (memcmp(&fid->fsid, &ex->fid->fsid, sizeof(fid->fsid))) {
- tst_res(TFAIL, "%s: Received bad FSID type (%x...!=%x...)",
- ex->name, FSID_VAL_MEMBER(fid->fsid, 0),
+ tst_res(TFAIL, "Received bad FSID type (%x...!=%x...)",
+ FSID_VAL_MEMBER(fid->fsid, 0),
ex->fid->fsid.val[0]);
return 1;
}
if (fh->handle_type != ex->fid->handle.handle_type) {
- tst_res(TFAIL, "%s: Received bad file_handle type (%d!=%d)",
- ex->name, fh->handle_type, ex->fid->handle.handle_type);
+ tst_res(TFAIL, "Received bad file_handle type (%d!=%d)",
+ fh->handle_type, ex->fid->handle.handle_type);
return 1;
}
if (fh->handle_bytes != ex->fid->handle.handle_bytes) {
- tst_res(TFAIL, "%s: Received bad file_handle len (%d!=%d)",
- ex->name, fh->handle_bytes, ex->fid->handle.handle_bytes);
+ tst_res(TFAIL, "Received bad file_handle len (%d!=%d)",
+ fh->handle_bytes, ex->fid->handle.handle_bytes);
return 1;
}
if (memcmp(fh->f_handle, ex->fid->handle.f_handle, fh->handle_bytes)) {
- tst_res(TFAIL, "%s: Received wrong handle. "
- "Expected (%x...) got (%x...) ", ex->name,
+ tst_res(TFAIL, "Received wrong handle. "
+ "Expected (%x...) got (%x...) ",
*(int *)ex->fid->handle.f_handle, *(int *)fh->f_handle);
return 1;
}
@@ -177,14 +220,14 @@ static int check_error_event_info_error(struct fanotify_event_info_error *info_e
int fail = 0;
if (info_error->error_count != ex->error_count) {
- tst_res(TFAIL, "%s: Unexpected error_count (%d!=%d)",
- ex->name, info_error->error_count, ex->error_count);
+ tst_res(TFAIL, "Unexpected error_count (%d!=%d)",
+ info_error->error_count, ex->error_count);
fail++;
}
if (info_error->error != ex->error) {
- tst_res(TFAIL, "%s: Unexpected error code value (%d!=%d)",
- ex->name, info_error->error, ex->error);
+ tst_res(TFAIL, "Unexpected error code value (%d!=%d)",
+ info_error->error, ex->error);
fail++;
}
@@ -248,19 +291,54 @@ static void check_event(char *buf, size_t len, const struct test_case *ex)
static void do_test(unsigned int i)
{
const struct test_case *tcase = &testcases[i];
- size_t read_len;
+ size_t read_len = 0;
+ struct pollfd pfd;
+ unsigned int accumulated_count = 0;
+
+ tst_res(TINFO, "Test case: %s", tcase->name);
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_ADD|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
tcase->trigger_error();
- read_len = SAFE_READ(0, fd_notify, event_buf, BUF_SIZE);
+ pfd.fd = fd_notify;
+ pfd.events = POLLIN;
+
+ while (accumulated_count < tcase->error_count) {
+ if (poll(&pfd, 1, 5000) <= 0) {
+ tst_res(TFAIL, "Timeout waiting for events");
+ goto out;
+ }
+
+ if (BUF_SIZE - read_len < FAN_EVENT_METADATA_LEN)
+ tst_brk(TBROK, "Insufficient buffer space for next event");
+
+ char *current_pos = event_buf + read_len;
+ ssize_t ret = SAFE_READ(0, fd_notify, current_pos, BUF_SIZE - read_len);
+
+ struct fanotify_event_metadata *m =
+ (struct fanotify_event_metadata *)current_pos;
+ while (FAN_EVENT_OK(m, ret)) {
+ struct fanotify_event_info_error *e = get_event_info_error(m);
+
+ if (e)
+ accumulated_count += e->error_count;
+
+ read_len += m->event_len;
+ m = FAN_EVENT_NEXT(m, ret);
+ }
+ }
+
+ read_len = consolidate_events(event_buf, read_len, tcase);
+
+ check_event(event_buf, read_len, tcase);
+
+out:
SAFE_FANOTIFY_MARK(fd_notify, FAN_MARK_REMOVE|FAN_MARK_FILESYSTEM,
FAN_FS_ERROR, AT_FDCWD, MOUNT_PATH);
- check_event(event_buf, read_len, tcase);
/* Unmount and mount the filesystem to get it out of the error state */
SAFE_UMOUNT(MOUNT_PATH);
SAFE_MOUNT(tst_device->dev, MOUNT_PATH, tst_device->fs_type, 0, NULL);
--
2.52.0
end of thread, other threads:[~2026-03-31 11:19 UTC | newest]
Thread overview: 27+ messages
2026-03-04 13:38 [LTP] [PATCH v1] fanotify22.c: handle multiple asynchronous error events Wei Gao via ltp
2026-03-05 9:36 ` Jan Kara
2026-03-05 14:36 ` Wei Gao via ltp
2026-03-05 15:50 ` Jan Kara
2026-03-06 4:50 ` Wei Gao via ltp
2026-03-06 12:24 ` Petr Vorel
2026-03-06 15:19 ` Jan Kara
2026-03-09 7:59 ` [LTP] [PATCH v2] " Wei Gao via ltp
2026-03-09 10:26 ` Andrea Cervesato via ltp
2026-03-09 11:29 ` Jan Kara
2026-03-18 6:46 ` [LTP] [PATCH v3] " Wei Gao via ltp
2026-03-18 18:18 ` Jan Kara
2026-03-24 11:55 ` Andrea Cervesato via ltp
2026-03-25 12:43 ` [LTP] [PATCH v4] " Wei Gao via ltp
2026-03-25 15:52 ` Jan Kara
2026-03-26 1:28 ` [LTP] [PATCH v5] " Wei Gao via ltp
2026-03-26 8:57 ` Jan Kara
2026-03-26 9:40 ` Andrea Cervesato via ltp
2026-03-27 4:55 ` [LTP] [PATCH v6] " Wei Gao via ltp
2026-03-27 9:07 ` Andrea Cervesato via ltp
2026-03-27 12:33 ` [LTP] [PATCH v7] " Wei Gao via ltp
2026-03-27 14:19 ` Andrea Cervesato via ltp
2026-03-28 0:44 ` Wei Gao via ltp
2026-03-30 7:17 ` Andrea Cervesato via ltp
2026-03-30 7:36 ` Jan Kara
-- strict thread matches above, loose matches on Subject: below --
2026-03-17 11:46 [LTP] [PATCH v6] io_submit04: Add test case for RWF_NOWAIT flag Wei Gao via ltp
2026-03-31 11:18 ` [LTP] [PATCH v7] fanotify22.c: handle multiple asynchronous error events Wei Gao via ltp
2026-03-27 12:22 Wei Gao via ltp