How to debug stuck read?

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* How to debug stuck read?
@ 2022-02-02 17:15 Dāvis Mosāns
  2022-02-02 19:13 ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: Dāvis Mosāns @ 2022-02-02 17:15 UTC (permalink / raw)
  To: BTRFS, linux-fsdevel, kernelnewbies

Hi,

I have a corrupted file on BTRFS which has CoW disabled thus no
checksum. Trying to read this file causes the process to get stuck
forever. It doesn't return EIO.

How can I find out why it gets stuck?

$ ddrescue -b 1 currupted_file /tmp/temp
GNU ddrescue 1.26
Press Ctrl-C to interrupt
    ipos:        0 B, non-trimmed:        0 B,  current rate:       0 B/s
    opos:        0 B, non-scraped:        0 B,  average rate:       0 B/s
non-tried:    8388 kB,  bad-sector:        0 B,    error rate:       0 B/s
 rescued:        0 B,   bad areas:        0,        run time:          0s
pct rescued:    0.00%, read errors:        0,  remaining time:         n/a
                             time since last successful read:         n/a
Copying non-tried blocks... Pass 1 (forwards)
^C
// doesn't stop with Ctrl+C nor SIGTERM

$ gdb -q -p 3449
Attaching to process 3449
^C
// same gets stuck

$ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
folio_wait_bit_common (mm/filemap.c:1314)
filemap_get_pages (mm/filemap.c:2622)
filemap_read (mm/filemap.c:2676)
new_sync_read (fs/read_write.c:401 (discriminator 1))
vfs_read (fs/read_write.c:481)
ksys_read (fs/read_write.c:619)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)


I enabled
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_ASSERT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y

but in dmesg only thing that shows up is a lot of
BTRFS error (device sdh): invalid lzo header, lzo len 2937060802
compressed len 4096

If I try to do btrfs send, it gets stuck same way
$ btrfs send -v /mnt/fs > /dev/null

$ cat /proc/4712/stack | ./scripts/decode_stacktrace.sh vmlinux
folio_wait_bit_common (mm/filemap.c:1314)
__filemap_get_folio (mm/filemap.c:1690 ./include/linux/pagemap.h:779
mm/filemap.c:1960)
pagecache_get_page (mm/folio-compat.c:126)
send_extent_data (fs/btrfs/send.c:4980 fs/btrfs/send.c:5048
fs/btrfs/send.c:5235) btrfs
process_extent (fs/btrfs/send.c:5575 fs/btrfs/send.c:5959) btrfs
btrfs_ioctl_send (fs/btrfs/send.c:6770 fs/btrfs/send.c:7368
fs/btrfs/send.c:7688) btrfs
_btrfs_ioctl_send (fs/btrfs/ioctl.c:4963) btrfs
btrfs_ioctl (fs/btrfs/ioctl.c:5072) btrfs
__x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:874 fs/ioctl.c:860 fs/ioctl.c:860)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)

I'm now using 5.17.0-rc2 but it's exactly same with 5.16.5

Thanks!

Best regards,
Dāvis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to debug stuck read?
  2022-02-02 17:15 How to debug stuck read? Dāvis Mosāns
@ 2022-02-02 19:13 ` Matthew Wilcox
  2022-02-02 21:50   ` Dāvis Mosāns
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2022-02-02 19:13 UTC (permalink / raw)
  To: Dāvis Mosāns; +Cc: BTRFS, linux-fsdevel, kernelnewbies

On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote:
> I have a corrupted file on BTRFS which has CoW disabled thus no
> checksum. Trying to read this file causes the process to get stuck
> forever. It doesn't return EIO.
> 
> How can I find out why it gets stuck?

> $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
> folio_wait_bit_common (mm/filemap.c:1314)
> filemap_get_pages (mm/filemap.c:2622)
> filemap_read (mm/filemap.c:2676)
> new_sync_read (fs/read_write.c:401 (discriminator 1))

folio_wait_bit_common() is where it waits for the page to be unlocked.
Probably the problem is that btrfs isn't unlocking the page on
seeing the error, so you don't get the -EIO returned?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to debug stuck read?
  2022-02-02 19:13 ` Matthew Wilcox
@ 2022-02-02 21:50   ` Dāvis Mosāns
  2022-02-06 11:01     ` FMDF
  0 siblings, 1 reply; 11+ messages in thread
From: Dāvis Mosāns @ 2022-02-02 21:50 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: BTRFS, linux-fsdevel, kernelnewbies

trešd., 2022. g. 2. febr., plkst. 21:13 — lietotājs Matthew Wilcox
(<willy@infradead.org>) rakstīja:
>
> On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote:
> > I have a corrupted file on BTRFS which has CoW disabled thus no
> > checksum. Trying to read this file causes the process to get stuck
> > forever. It doesn't return EIO.
> >
> > How can I find out why it gets stuck?
>
> > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
> > folio_wait_bit_common (mm/filemap.c:1314)
> > filemap_get_pages (mm/filemap.c:2622)
> > filemap_read (mm/filemap.c:2676)
> > new_sync_read (fs/read_write.c:401 (discriminator 1))
>
> folio_wait_bit_common() is where it waits for the page to be unlocked.
> Probably the problem is that btrfs isn't unlocking the page on
> seeing the error, so you don't get the -EIO returned?


Yeah, but how to find where that happens.
Anyway by pure luck I found memcpy that wrote outside of allocated
memory and fixing that solved this issue but I still don't know how to
debug this properly.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to debug stuck read?
  2022-02-02 21:50   ` Dāvis Mosāns
@ 2022-02-06 11:01     ` FMDF
  2022-02-06 12:48       ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: FMDF @ 2022-02-06 11:01 UTC (permalink / raw)
  To: Dāvis Mosāns
  Cc: Matthew Wilcox, linux-fsdevel, BTRFS, kernelnewbies

On Wed, Feb 2, 2022 at 10:50 PM Dāvis Mosāns <davispuh@gmail.com> wrote:
>
> trešd., 2022. g. 2. febr., plkst. 21:13 — lietotājs Matthew Wilcox
> (<willy@infradead.org>) rakstīja:
> >
> > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote:
> > > I have a corrupted file on BTRFS which has CoW disabled thus no
> > > checksum. Trying to read this file causes the process to get stuck
> > > forever. It doesn't return EIO.
> > >
> > > How can I find out why it gets stuck?
> >
> > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
> > > folio_wait_bit_common (mm/filemap.c:1314)
> > > filemap_get_pages (mm/filemap.c:2622)
> > > filemap_read (mm/filemap.c:2676)
> > > new_sync_read (fs/read_write.c:401 (discriminator 1))
> >
> > folio_wait_bit_common() is where it waits for the page to be unlocked.
> > Probably the problem is that btrfs isn't unlocking the page on
> > seeing the error, so you don't get the -EIO returned?
>
>
> Yeah, but how to find where that happens.
> Anyway by pure luck I found memcpy that wrote outside of allocated
> memory and fixing that solved this issue but I still don't know how to
> debug this properly.
>
There is no special recipe for debugging "this properly" :)

You wrote that "by pure luck" you found a memcpy() that wrote beyond the
limit of allocated memory. I suppose that you found that faulty memcpy()
somewhere in one of the function listed in the stack trace.

That's the right approach! You read the calls chain and find out where something
looks wrong and then fix it. This is why stack traces are so helpful.

It was not "pure luck". I think that you did what developers usually do after
decoding a stack trace. If not, how did you find that faulty memcpy() buried
somewhere in 40 millions lines of code?

it seems that you've found the right way to figure out the problems in code
that (probably) you had not ever worked on or read before you hit that bug.

Have you sent a patch to the LKML? If not, please do it.

Regards,

Fabio M. De Francesco

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to debug stuck read?
  2022-02-06 11:01     ` FMDF
@ 2022-02-06 12:48       ` Matthew Wilcox
  2022-02-06 21:22         ` FMDF
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2022-02-06 12:48 UTC (permalink / raw)
  To: FMDF; +Cc: Dāvis Mosāns, linux-fsdevel, BTRFS, kernelnewbies

On Sun, Feb 06, 2022 at 12:01:02PM +0100, FMDF wrote:
> On Wed, Feb 2, 2022 at 10:50 PM Dāvis Mosāns <davispuh@gmail.com> wrote:
> >
> > trešd., 2022. g. 2. febr., plkst. 21:13 — lietotājs Matthew Wilcox
> > (<willy@infradead.org>) rakstīja:
> > >
> > > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote:
> > > > I have a corrupted file on BTRFS which has CoW disabled thus no
> > > > checksum. Trying to read this file causes the process to get stuck
> > > > forever. It doesn't return EIO.
> > > >
> > > > How can I find out why it gets stuck?
> > >
> > > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
> > > > folio_wait_bit_common (mm/filemap.c:1314)
> > > > filemap_get_pages (mm/filemap.c:2622)
> > > > filemap_read (mm/filemap.c:2676)
> > > > new_sync_read (fs/read_write.c:401 (discriminator 1))
> > >
> > > folio_wait_bit_common() is where it waits for the page to be unlocked.
> > > Probably the problem is that btrfs isn't unlocking the page on
> > > seeing the error, so you don't get the -EIO returned?
> >
> >
> > Yeah, but how to find where that happens.
> > Anyway by pure luck I found memcpy that wrote outside of allocated
> > memory and fixing that solved this issue but I still don't know how to
> > debug this properly.
> >
> There is no special recipe for debugging "this properly" :)
> 
> You wrote that "by pure luck" you found a memcpy() that wrote beyond the
> limit of allocated memory. I suppose that you found that faulty memcpy()
> somewhere in one of the function listed in the stack trace.

I very much doubt that.  The code flow here is:

userspace calls read() -> VFS -> btrfs -> block layer -> return to btrfs
-> return to VFS, wait for read to complete.  So by the time anyone's
looking at the stack trace, all you can see is the part of the call
chain in the VFS.  There's no way to see where we went in btrfs, nor
in the block layer.  We also can't see from the stack trace what
happened with the interrupt which _should have_ cleared the lock bit
and didn't.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to debug stuck read?
  2022-02-06 12:48       ` Matthew Wilcox
@ 2022-02-06 21:22         ` FMDF
  2022-02-06 23:21           ` Dāvis Mosāns
  0 siblings, 1 reply; 11+ messages in thread
From: FMDF @ 2022-02-06 21:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dāvis Mosāns, linux-fsdevel, BTRFS, kernelnewbies

On Sun, Feb 6, 2022 at 1:48 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Feb 06, 2022 at 12:01:02PM +0100, FMDF wrote:
> > On Wed, Feb 2, 2022 at 10:50 PM Dāvis Mosāns <davispuh@gmail.com> wrote:
> > >
> > > trešd., 2022. g. 2. febr., plkst. 21:13 — lietotājs Matthew Wilcox
> > > (<willy@infradead.org>) rakstīja:
> > > >
> > > > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote:
> > > > > I have a corrupted file on BTRFS which has CoW disabled thus no
> > > > > checksum. Trying to read this file causes the process to get stuck
> > > > > forever. It doesn't return EIO.
> > > > >
> > > > > How can I find out why it gets stuck?
> > > >
> > > > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
> > > > > folio_wait_bit_common (mm/filemap.c:1314)
> > > > > filemap_get_pages (mm/filemap.c:2622)
> > > > > filemap_read (mm/filemap.c:2676)
> > > > > new_sync_read (fs/read_write.c:401 (discriminator 1))
> > > >
> > > > folio_wait_bit_common() is where it waits for the page to be unlocked.
> > > > Probably the problem is that btrfs isn't unlocking the page on
> > > > seeing the error, so you don't get the -EIO returned?
> > >
> > >
> > > Yeah, but how to find where that happens.
> > > Anyway by pure luck I found memcpy that wrote outside of allocated
> > > memory and fixing that solved this issue but I still don't know how to
> > > debug this properly.
> > >
> > There is no special recipe for debugging "this properly" :)
> >
> > You wrote that "by pure luck" you found a memcpy() that wrote beyond the
> > limit of allocated memory. I suppose that you found that faulty memcpy()
> > somewhere in one of the function listed in the stack trace.
>
> I very much doubt that.  The code flow here is:
>
> userspace calls read() -> VFS -> btrfs -> block layer -> return to btrfs
> -> return to VFS, wait for read to complete.  So by the time anyone's
> looking at the stack trace, all you can see is the part of the call
> chain in the VFS.  There's no way to see where we went in btrfs, nor
> in the block layer.  We also can't see from the stack trace what
> happened with the interrupt which _should have_ cleared the lock bit
> and didn't.
>
OK, I agree. This appears to be is one of those special cases where the mere
reading of a stack trace cannot help much... :(

My argument is about a general approach to debugging some unknown code
by just reading the calls chain. Many times I've been able to find out what was
wrong with code I had never seen before by just following the chain of calls
in subsystems that I know nothing of (e.g., a bug in "tty" that was reported by
Syzbot).

In this special case, if the developer doesn't know that "the interrupt [which]
_should have_ cleared the lock bit and didn't." there is nothing that one can
deduce from a stack trace.

Here one need to know how things work, well beyond the functions that are
listed in the trace. So, probably, if one needs a "recipe" for those cases, the
recipe is just know the subsystem(s) at hand and know how the kernel manages
interrupts.

Actually I haven't deepened this issue but, by reading what Matthew writes,
I doubt that a faulty memcpy() can be the culprit... Davis, are you really sure
that you've fixed that bug?

Regards,

Fabio

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to debug stuck read?
  2022-02-06 21:22         ` FMDF
@ 2022-02-06 23:21           ` Dāvis Mosāns
  2022-02-06 23:49             ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: Dāvis Mosāns @ 2022-02-06 23:21 UTC (permalink / raw)
  To: FMDF; +Cc: Matthew Wilcox, linux-fsdevel, BTRFS, kernelnewbies

svētd., 2022. g. 6. febr., plkst. 13:01 — lietotājs FMDF
(<fmdefrancesco@gmail.com>) rakstīja:
[...]
> There is no special recipe for debugging "this properly" :)
>
> You wrote that "by pure luck" you found a memcpy() that wrote beyond the
> limit of allocated memory. I suppose that you found that faulty memcpy()
> somewhere in one of the function listed in the stack trace.
>
> That's the right approach! You read the calls chain and find out where something
> looks wrong and then fix it. This is why stack traces are so helpful.
>
> It was not "pure luck". I think that you did what developers usually do after
> decoding a stack trace. If not, how did you find that faulty memcpy() buried
> somewhere in 40 millions lines of code?
>
> it seems that you've found the right way to figure out the problems in code
> that (probably) you had not ever worked on or read before you hit that bug.
>

I think there should be a way to see which locks (and by who/where)
have been taken for a long time. In this case it would be just this
one as no one should take a lock for 10+ mins and such. Here we have
several complications that make figuring this out difficult. Firstly
you can get crash only once per boot. Killing process and reading file
again you won't get any crash at all but process will always get
stuck. Not even dropping caches changes anything. That's why I
couldn't figure out what is the issue because I tried repeating it
several times and never got any new crash in dmesg. Secondly even if
you see some crash in dmesg, your process didn't crash and it's not
clear if it is related to your issue - how to link kernel side crash
with your stuck process. Finally the way how this bug is it was
possible for it to not cause any crash but I'm guessing process
probably wouldn't have stuck then.

> Have you sent a patch to the LKML? If not, please do it.
>
Yeah I did, see patch titled: "[PATCH 2/2] btrfs: prevent copying too
big compressed lzo segment"

svētd., 2022. g. 6. febr., plkst. 23:22 — lietotājs FMDF
(<fmdefrancesco@gmail.com>) rakstīja:
>
> On Sun, Feb 6, 2022 at 1:48 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Sun, Feb 06, 2022 at 12:01:02PM +0100, FMDF wrote:
[...]
> >
> > I very much doubt that.  The code flow here is:
> >
> > userspace calls read() -> VFS -> btrfs -> block layer -> return to btrfs
> > -> return to VFS, wait for read to complete.  So by the time anyone's
> > looking at the stack trace, all you can see is the part of the call
> > chain in the VFS.  There's no way to see where we went in btrfs, nor
> > in the block layer.  We also can't see from the stack trace what
> > happened with the interrupt which _should have_ cleared the lock bit
> > and didn't.
> >
> OK, I agree. This appears to be is one of those special cases where the mere
> reading of a stack trace cannot help much... :(
>
> My argument is about a general approach to debugging some unknown code
> by just reading the calls chain. Many times I've been able to find out what was
> wrong with code I had never seen before by just following the chain of calls
> in subsystems that I know nothing of (e.g., a bug in "tty" that was reported by
> Syzbot).
>
> In this special case, if the developer doesn't know that "the interrupt [which]
> _should have_ cleared the lock bit and didn't." there is nothing that one can
> deduce from a stack trace.
>
> Here one need to know how things work, well beyond the functions that are
> listed in the trace. So, probably, if one needs a "recipe" for those cases, the
> recipe is just know the subsystem(s) at hand and know how the kernel manages
> interrupts.
>

Yeah, that's why I'm saying it was by luck :P The way how I fixed this
was pretty much accidental byproduct. Basically for this issue all I
saw in dmesg was "invalid lzo header". By looking at that part of code
I didn't see anything suspicious and also issue happened after this so
it wasn't culprit (but did hint it could be something related to
this). Anyway at this point I gave up on it, but I saw that I have one
crash in dmesg so I thought I'll fix that. I didn't thought it's
related to this issue, but once I fixed that this also was solved.

> Actually I haven't deepened this issue but, by reading what Matthew writes,
> I doubt that a faulty memcpy() can be the culprit... Davis, are you really sure
> that you've fixed that bug?
>

Yep, fully sure and tested :P I'm able to reproduce stuck process with
100% reliability. After applying my patch it returns EIO as expected
and doesn't get stuck.
If you look at copy_compressed_segment (in fs/btrfs/lzo.c) you'll see
  kaddr = kmap(cur_page);
  memcpy(dest + *cur_in - orig_in,
                 kaddr + offset_in_page(*cur_in), copy_len);
  kunmap(cur_page);

My guess is that kmap(cur_page) locks that page, then memcpy crashes
so that page never gets unmapped causing anyone that tries to map it
to wait forever. Hence this can be reproduced only once per boot. If I
knew how to find kernel thread that is running this we could check
it's stack and it should be stuck here on kmap.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to debug stuck read?
  2022-02-06 23:21           ` Dāvis Mosāns
@ 2022-02-06 23:49             ` Matthew Wilcox
  2022-02-07  0:07               ` Dāvis Mosāns
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2022-02-06 23:49 UTC (permalink / raw)
  To: Dāvis Mosāns; +Cc: FMDF, linux-fsdevel, BTRFS, kernelnewbies

On Mon, Feb 07, 2022 at 01:21:53AM +0200, Dāvis Mosāns wrote:
> svētd., 2022. g. 6. febr., plkst. 13:01 — lietotājs FMDF
> (<fmdefrancesco@gmail.com>) rakstīja:
> [...]
> > There is no special recipe for debugging "this properly" :)
> >
> > You wrote that "by pure luck" you found a memcpy() that wrote beyond the
> > limit of allocated memory. I suppose that you found that faulty memcpy()
> > somewhere in one of the function listed in the stack trace.
> >
> > That's the right approach! You read the calls chain and find out where something
> > looks wrong and then fix it. This is why stack traces are so helpful.
> >
> > It was not "pure luck". I think that you did what developers usually do after
> > decoding a stack trace. If not, how did you find that faulty memcpy() buried
> > somewhere in 40 millions lines of code?
> >
> > it seems that you've found the right way to figure out the problems in code
> > that (probably) you had not ever worked on or read before you hit that bug.
> >
> 
> I think there should be a way to see which locks (and by who/where)
> have been taken for a long time.

Well ... we do, but the problem is that the page lock is a single bit.
We just don't have the space in struct page for a pointer to a stack
trace.  So the page lock isn't like a spinlock or a mutex where we can
use the LOCKDEP infrastructure to tell us this kind of thing.

Also, in this case, we know exactly where the lock was taken and by whom
-- and it would be no more information than you had from the stack trace.
Something I slightly regret is that you used to get a "task blocked for
more than 120 seconds" message from check_hung_task().  But I made
that not show up in this path because I made the sleep killable and
that's only called for UNINTERRUPTIBLE tasks.

Maybe that should be changed.  Perhaps we should emit those messages
for TASK_KILLABLE too.

> Yep, fully sure and tested :P I'm able to reproduce stuck process with
> 100% reliability. After applying my patch it returns EIO as expected
> and doesn't get stuck.
> If you look at copy_compressed_segment (in fs/btrfs/lzo.c) you'll see
>   kaddr = kmap(cur_page);
>   memcpy(dest + *cur_in - orig_in,
>                  kaddr + offset_in_page(*cur_in), copy_len);
>   kunmap(cur_page);
> 
> My guess is that kmap(cur_page) locks that page, then memcpy crashes
> so that page never gets unmapped causing anyone that tries to map it
> to wait forever. Hence this can be reproduced only once per boot. If I
> knew how to find kernel thread that is running this we could check
> it's stack and it should be stuck here on kmap.

kmap() doesn't lock the page; it's already locked at this point.
But if the memcpy() does crash then you're right, the page will never
be unlocked because it was this thread's job to unlock it when the page
was uptodate.  The thread will be dead, so there's no way to find it.
Do we not dump the thread's stack on its death?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to debug stuck read?
  2022-02-06 23:49             ` Matthew Wilcox
@ 2022-02-07  0:07               ` Dāvis Mosāns
  2022-02-07  1:06                 ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: Dāvis Mosāns @ 2022-02-07  0:07 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: FMDF, linux-fsdevel, BTRFS, kernelnewbies

pirmd., 2022. g. 7. febr., plkst. 01:49 — lietotājs Matthew Wilcox
(<willy@infradead.org>) rakstīja:
>
> On Mon, Feb 07, 2022 at 01:21:53AM +0200, Dāvis Mosāns wrote:
> > svētd., 2022. g. 6. febr., plkst. 13:01 — lietotājs FMDF
> > (<fmdefrancesco@gmail.com>) rakstīja:
> > [...]
> > > There is no special recipe for debugging "this properly" :)
> > >
> > > You wrote that "by pure luck" you found a memcpy() that wrote beyond the
> > > limit of allocated memory. I suppose that you found that faulty memcpy()
> > > somewhere in one of the function listed in the stack trace.
> > >
> > > That's the right approach! You read the calls chain and find out where something
> > > looks wrong and then fix it. This is why stack traces are so helpful.
> > >
> > > It was not "pure luck". I think that you did what developers usually do after
> > > decoding a stack trace. If not, how did you find that faulty memcpy() buried
> > > somewhere in 40 millions lines of code?
> > >
> > > it seems that you've found the right way to figure out the problems in code
> > > that (probably) you had not ever worked on or read before you hit that bug.
> > >
> >
> > I think there should be a way to see which locks (and by who/where)
> > have been taken for a long time.
>
> Well ... we do, but the problem is that the page lock is a single bit.
> We just don't have the space in struct page for a pointer to a stack
> trace.  So the page lock isn't like a spinlock or a mutex where we can
> use the LOCKDEP infrastructure to tell us this kind of thing.
>
> Also, in this case, we know exactly where the lock was taken and by whom
> -- and it would be no more information than you had from the stack trace.

The issue here is that you have a stuck task that doesn't have any
crash/stack trace. The process itself is waiting in
folio_wait_bit_common but I need to find the other side of it.

> Something I slightly regret is that you used to get a "task blocked for
> more than 120 seconds" message from check_hung_task().  But I made
> that not show up in this path because I made the sleep killable and
> that's only called for UNINTERRUPTIBLE tasks.
>
> Maybe that should be changed.  Perhaps we should emit those messages
> for TASK_KILLABLE too.
>

Sounds like that would be very useful.

> > Yep, fully sure and tested :P I'm able to reproduce stuck process with
> > 100% reliability. After applying my patch it returns EIO as expected
> > and doesn't get stuck.
> > If you look at copy_compressed_segment (in fs/btrfs/lzo.c) you'll see
> >   kaddr = kmap(cur_page);
> >   memcpy(dest + *cur_in - orig_in,
> >                  kaddr + offset_in_page(*cur_in), copy_len);
> >   kunmap(cur_page);
> >
> > My guess is that kmap(cur_page) locks that page, then memcpy crashes
> > so that page never gets unmapped causing anyone that tries to map it
> > to wait forever. Hence this can be reproduced only once per boot. If I
> > knew how to find kernel thread that is running this we could check
> > it's stack and it should be stuck here on kmap.
>
> kmap() doesn't lock the page; it's already locked at this point.
> But if the memcpy() does crash then you're right, the page will never
> be unlocked because it was this thread's job to unlock it when the page
> was uptodate.  The thread will be dead, so there's no way to find it.
> Do we not dump the thread's stack on its death?

Yeah there was, but as I said it happens only once per boot. So you
have one (potentially old) crash/stacktrace but many stuck processes
with no clear cause. Eg. you get crash and stuck process, kill
process. Then days later you try reading that file again and it will
get stuck but there won't be stacktrace as it won't reach that memcpy
anymore.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to debug stuck read?
  2022-02-07  0:07               ` Dāvis Mosāns
@ 2022-02-07  1:06                 ` Matthew Wilcox
  2022-02-07  1:22                   ` Dāvis Mosāns
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2022-02-07  1:06 UTC (permalink / raw)
  To: Dāvis Mosāns; +Cc: FMDF, linux-fsdevel, BTRFS, kernelnewbies

On Mon, Feb 07, 2022 at 02:07:47AM +0200, Dāvis Mosāns wrote:
> > > I think there should be a way to see which locks (and by who/where)
> > > have been taken for a long time.
> >
> > Well ... we do, but the problem is that the page lock is a single bit.
> > We just don't have the space in struct page for a pointer to a stack
> > trace.  So the page lock isn't like a spinlock or a mutex where we can
> > use the LOCKDEP infrastructure to tell us this kind of thing.
> >
> > Also, in this case, we know exactly where the lock was taken and by whom
> > -- and it would be no more information than you had from the stack trace.
> 
> The issue here is that you have a stuck task that doesn't have any
> crash/stack trace. The process itself is waiting in
> folio_wait_bit_common but I need to find the other side of it.

Right, but what you're asking for won't help find the other side.
It's just an automated way to find the side you did find.

> > kmap() doesn't lock the page; it's already locked at this point.
> > But if the memcpy() does crash then you're right, the page will never
> > be unlocked because it was this thread's job to unlock it when the page
> > was uptodate.  The thread will be dead, so there's no way to find it.
> > Do we not dump the thread's stack on its death?
> 
> Yeah there was, but as I said it happens only once per boot. So you
> have one (potentially old) crash/stacktrace but many stuck processes
> with no clear cause. Eg. you get crash and stuck process, kill
> process. Then days later you try reading that file again and it will
> get stuck but there won't be stacktrace as it won't reach that memcpy
> anymore.

I can't think of a way to solve that.  We can't know whether a dying task
"was going to" unlock a page.  So we have a locked page in the page cache
that nobody will ever unlock.  We can't remove it, because we don't know
that task died.  We can't start I/O on it again, because it looks like
I/O is already in progress.

I think the only answer is "Don't ignore stack dumps in dmesg".

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How to debug stuck read?
  2022-02-07  1:06                 ` Matthew Wilcox
@ 2022-02-07  1:22                   ` Dāvis Mosāns
  0 siblings, 0 replies; 11+ messages in thread
From: Dāvis Mosāns @ 2022-02-07  1:22 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: FMDF, linux-fsdevel, BTRFS, kernelnewbies

pirmd., 2022. g. 7. febr., plkst. 03:06 — lietotājs Matthew Wilcox
(<willy@infradead.org>) rakstīja:
[...]
>
> I can't think of a way to solve that.  We can't know whether a dying task
> "was going to" unlock a page.  So we have a locked page in the page cache
> that nobody will ever unlock.  We can't remove it, because we don't know
> that task died.  We can't start I/O on it again, because it looks like
> I/O is already in progress.
>
> I think the only answer is "Don't ignore stack dumps in dmesg".

So looks like this is the conclusion of current state.

> We can't remove it, because we don't know that task died.
This seems pretty bad, maybe if there was build time option that
enables logging (eg. PID and lock time) for locked pages it could help
with debugging this and maybe allow even to force unlock. But then
again since there should be stack dump and I guess this is very rare
case of happening maybe it's not worth it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-02-07  1:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-02-02 17:15 How to debug stuck read? Dāvis Mosāns
2022-02-02 19:13 ` Matthew Wilcox
2022-02-02 21:50   ` Dāvis Mosāns
2022-02-06 11:01     ` FMDF
2022-02-06 12:48       ` Matthew Wilcox
2022-02-06 21:22         ` FMDF
2022-02-06 23:21           ` Dāvis Mosāns
2022-02-06 23:49             ` Matthew Wilcox
2022-02-07  0:07               ` Dāvis Mosāns
2022-02-07  1:06                 ` Matthew Wilcox
2022-02-07  1:22                   ` Dāvis Mosāns

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).