* fs/9p: regression in 6.8-rc1
@ 2024-01-26 20:21 Eric Van Hensbergen
  2024-01-27  1:53 ` Dominique Martinet
  2024-01-30 15:41 ` Linux regression tracking #adding (Thorsten Leemhuis)
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Van Hensbergen @ 2024-01-26 20:21 UTC (permalink / raw)
  To: dhowells; +Cc: v9fs, linux_oss

I caught a problem in the new netfs code when running 9p in nocache
mode.  A regression sweep is turning up:
 [ 1084.438387] netfs: Zero-sized write [R=1b6da]
when running my ldconfig test (included at the end of this mail);
ldconfig reports:
/sbin/ldconfig.real: Writing of cache extension data failed: Input/output error

I will try to dig into this later today if I have time, but not sure I'll get
to it so I wanted to make other folks aware.  I'm not sure how much other
elements of my test harness are contributing to reproducing the problem.

	-eric

--snip--
# setup chroot

mount -o bind /dev /mnt/9/dev
mount -o bind /sys /mnt/9/sys

# test mmap via ldconfig doesn't produce lots of errors

chroot /mnt/9 ldconfig -C /tmp/testldconfig.cache > "$LOG.log" 2>&1

# any output at all (non-empty log) counts as a failure
if [ -s "$LOG.log" ]; then
    echo "FAIL"
else
    echo "SUCCESS"
fi



* Re: fs/9p: regression in 6.8-rc1
  2024-01-26 20:21 fs/9p: regression in 6.8-rc1 Eric Van Hensbergen
@ 2024-01-27  1:53 ` Dominique Martinet
  2024-01-28 13:06   ` Dominique Martinet
  2024-01-30 15:41 ` Linux regression tracking #adding (Thorsten Leemhuis)
  1 sibling, 1 reply; 6+ messages in thread
From: Dominique Martinet @ 2024-01-27  1:53 UTC (permalink / raw)
  To: Eric Van Hensbergen; +Cc: dhowells, v9fs, linux_oss

Eric Van Hensbergen wrote on Fri, Jan 26, 2024 at 02:21:39PM -0600:
> I caught a problem in the new netfs code when running in 9p when running
> with nocache mode.  A regression sweep is turning up a:
>  [ 1084.438387] netfs: Zero-sized write [R=1b6da]
> when running my ldconfig test (included at the end of this)
> it reports:
> /sbin/ldconfig.real: Writing of cache extension data failed: Input/output error
> 
> I will try to dig into this later today if I have time, but not sure I'll get
> to it so I wanted to make other folks aware.  I'm not sure how much other
> elements of my test harness are contributing to reproducing the problem.


The syzbot report (refcount underflow[1]) is also probably related; I'll
try to find some time to check a bit more this weekend

[1] https://lkml.kernel.org/r/000000000000ee5c6c060fd59890@google.com

-- 
Dominique Martinet | Asmadeus


* Re: fs/9p: regression in 6.8-rc1
  2024-01-27  1:53 ` Dominique Martinet
@ 2024-01-28 13:06   ` Dominique Martinet
  2024-01-29  9:20     ` David Howells
  0 siblings, 1 reply; 6+ messages in thread
From: Dominique Martinet @ 2024-01-28 13:06 UTC (permalink / raw)
  To: dhowells, Eric Van Hensbergen; +Cc: v9fs, linux_oss

Dominique Martinet wrote on Sat, Jan 27, 2024 at 10:53:19AM +0900:
> Eric Van Hensbergen wrote on Fri, Jan 26, 2024 at 02:21:39PM -0600:
> > I caught a problem in the new netfs code when running in 9p when running
> > with nocache mode.  A regression sweep is turning up a:
> >  [ 1084.438387] netfs: Zero-sized write [R=1b6da]
> > when running my ldconfig test (included at the end of this)
> > it reports:
> > /sbin/ldconfig.real: Writing of cache extension data failed: Input/output error
> > 
> > I will try to dig into this later today if I have time, but not sure I'll get
> > to it so I wanted to make other folks aware.  I'm not sure how much other
> > elements of my test harness are contributing to reproducing the problem.

I didn't get much time, but I can confirm that I can reproduce it. It
boils down to a 0-size write:

$ xfs_io -f -c 'pwrite 0 0' foo
(dmesg) netfs: Zero-sized write [R=fb5]
pwrite: Input/output error
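
Where xfs_io is not available, the same zero-length write can probably be issued from a few lines of C. This is only a minimal sketch and was not run as part of this thread: the helper name zero_length_pwrite() is made up for illustration, and it assumes that a plain open() followed by pwrite() with a length of 0 takes the same path as the xfs_io command above.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open (or create) `path` and issue one pwrite()
 * of zero bytes at offset 0.  Returns what pwrite() returned, or -1
 * with errno set if the open or the write failed.
 */
ssize_t zero_length_pwrite(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT, 0644);
	ssize_t ret;
	int saved_errno;

	if (fd < 0)
		return -1;

	/* The buffer is never read because the length is 0. */
	ret = pwrite(fd, "", 0, 0);

	/* Keep pwrite()'s errno even if close() changes it. */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret;
}
```

On a local filesystem the call is expected to return 0. On an affected 9p mount in nocache mode it would presumably return -1 with errno set to EIO, matching the "Input/output error" above.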



I was going to say we probably need to filter it out - but it looks like
that might be netfs' job given the call trace I get:

# retsnoop -T -e vfs_write -a :fs/9p/*.c -a :fs/netfs/*.c
FUNCTION CALL TRACE                 RESULT                  DURATION
---------------------------------   --------------------  ----------
→ vfs_write                                                         
    → netfs_unbuffered_write_iter                                   
        ↔ netfs_start_io_direct     [0]                      0.391us
        → netfs_alloc_request                                       
            ↔ v9fs_init_request     [0]                      0.431us
        ← netfs_alloc_request       [0xffff8b8a5584d600]     1.653us
        ↔ netfs_extract_user_iter   [0]                      0.671us
        → netfs_begin_write                                         
            ↔ v9fs_free_inode       [void]                  33.653us
            ↔ v9fs_free_inode       [void]                   0.511us
            ↔ v9fs_free_inode       [void]                   0.371us
            ↔ v9fs_free_inode       [void]                   0.350us
            ↔ v9fs_free_inode       [void]                   0.391us
            ↔ v9fs_free_inode       [void]                   0.361us
            ↔ v9fs_free_inode       [void]                   0.451us
            ↔ v9fs_free_inode       [void]                   0.391us
            ↔ v9fs_free_inode       [void]                   0.391us
            ↔ v9fs_free_inode       [void]                   0.451us
        ← netfs_begin_write         [-EIO]                1120.811us
        → netfs_free_request                                        
            ↔ v9fs_free_request     [void]                  28.062us
        ← netfs_free_request        [void]                  44.423us
        ↔ netfs_end_io_direct       [void]                   0.421us
    ← netfs_unbuffered_write_iter   [-EIO]                1207.784us
← vfs_write                         [-EIO]                1210.228us


David, where do you think we should catch that?
Can we leave that fix to you?


> The syzbot report (refcount underflow[1]) is also probably related; I'll
> try to find some time to check a bit more this weekend
> 
> [1] https://lkml.kernel.org/r/000000000000ee5c6c060fd59890@google.com

So that one's not directly related to this, but given the timing I'd
still bet something changed around cache... I didn't manage to
reproduce it on a very quick workload but I didn't run all that much
yet, will need to spend a bit more time on that another day...

-- 
Dominique Martinet | Asmadeus


* Re: fs/9p: regression in 6.8-rc1
  2024-01-28 13:06   ` Dominique Martinet
@ 2024-01-29  9:20     ` David Howells
  0 siblings, 0 replies; 6+ messages in thread
From: David Howells @ 2024-01-29  9:20 UTC (permalink / raw)
  To: Dominique Martinet; +Cc: dhowells, Eric Van Hensbergen, v9fs, linux_oss

Dominique Martinet <asmadeus@codewreck.org> wrote:

> I was going to say we probably need to filter it out - but it looks like
> that might be netfs' job given the call trace I get:
> 
> # retsnoop -T -e vfs_write -a :fs/9p/*.c -a :fs/netfs/*.c
> FUNCTION CALL TRACE                 RESULT                  DURATION
> ---------------------------------   --------------------  ----------
> → vfs_write                                                         
>     → netfs_unbuffered_write_iter                                   

Yeah - I should check to see if generic_write_checks() returned 0 and skip if
it did (as does netfs_file_write_iter()).

It might also be worth doing the check before even taking any locks.

David
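
The check described above can be modelled in userspace as follows. This is a rough sketch of the idea only, not the kernel change: write_checks(), backend_write() and guarded_write() are hypothetical stand-ins, with write_checks() playing the role of generic_write_checks() and backend_write() imitating the -EIO that netfs_begin_write returned in the trace.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for generic_write_checks(): returns the number
 * of bytes that may be written, which is 0 for an empty write.
 */
ssize_t write_checks(size_t len)
{
	return (ssize_t)len;
}

/*
 * Hypothetical stand-in for the lower layer, which rejects empty
 * requests the way the trace shows (-EIO on a zero-sized write).
 */
ssize_t backend_write(size_t len)
{
	if (len == 0)
		return -EIO;
	return (ssize_t)len;
}

/*
 * Model of the guarded write path: bail out on errors and on
 * zero-sized writes before any locks would be taken.
 */
ssize_t guarded_write(size_t len)
{
	ssize_t ret = write_checks(len);

	if (ret <= 0)
		return ret;	/* error, or nothing to write */

	/* In a real write path, locking would start only here. */
	return backend_write((size_t)ret);
}
```

With the early return in place, guarded_write(0) reports success with 0 bytes written instead of passing the empty request down to the layer that fails it.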



* Re: fs/9p: regression in 6.8-rc1
  2024-01-26 20:21 fs/9p: regression in 6.8-rc1 Eric Van Hensbergen
  2024-01-27  1:53 ` Dominique Martinet
@ 2024-01-30 15:41 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2024-02-18 10:10   ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 1 reply; 6+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2024-01-30 15:41 UTC (permalink / raw)
  To: Linux kernel regressions list; +Cc: v9fs

On 26.01.24 21:21, Eric Van Hensbergen wrote:
> I caught a problem in the new netfs code when running in 9p when running
> with nocache mode.  A regression sweep is turning up a:
>  [ 1084.438387] netfs: Zero-sized write [R=1b6da]
> when running my ldconfig test (included at the end of this)
> it reports:
> /sbin/ldconfig.real: Writing of cache extension data failed: Input/output error
> 
> I will try to dig into this later today if I have time, but not sure I'll get
> to it so I wanted to make other folks aware.  I'm not sure how much other
> elements of my test harness are contributing to reproducing the problem.

To be sure the issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced v6.7..v6.8-rc1
#regzbot title fs/9p: "netfs: Zero-sized write" failures during ldconfig test
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


* Re: fs/9p: regression in 6.8-rc1
  2024-01-30 15:41 ` Linux regression tracking #adding (Thorsten Leemhuis)
@ 2024-02-18 10:10   ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 6+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2024-02-18 10:10 UTC (permalink / raw)
  To: Linux kernel regressions list; +Cc: v9fs

On 30.01.24 16:41, Linux regression tracking #adding (Thorsten Leemhuis)
wrote:
> On 26.01.24 21:21, Eric Van Hensbergen wrote:
>> I caught a problem in the new netfs code when running in 9p when running
>> with nocache mode.  A regression sweep is turning up a:
>>  [ 1084.438387] netfs: Zero-sized write [R=1b6da]
>> when running my ldconfig test (included at the end of this)
>> it reports:
>> /sbin/ldconfig.real: Writing of cache extension data failed: Input/output error
>>
>> I will try to dig into this later today if I have time, but not sure I'll get
>> to it so I wanted to make other folks aware.  I'm not sure how much other
>> elements of my test harness are contributing to reproducing the problem.
> 
> To be sure the issue doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, the Linux kernel regression tracking bot:
> 
> #regzbot ^introduced v6.7..v6.8-rc1

#regzbot introduced 153a9961b551
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
