* How to debug stuck read? @ 2022-02-02 17:15 Dāvis Mosāns 2022-02-02 19:13 ` Matthew Wilcox 0 siblings, 1 reply; 11+ messages in thread From: Dāvis Mosāns @ 2022-02-02 17:15 UTC (permalink / raw) To: BTRFS, linux-fsdevel, kernelnewbies Hi, I have a corrupted file on BTRFS which has CoW disabled thus no checksum. Trying to read this file causes the process to get stuck forever. It doesn't return EIO. How can I find out why it gets stuck? $ ddrescue -b 1 currupted_file /tmp/temp GNU ddrescue 1.26 Press Ctrl-C to interrupt ipos: 0 B, non-trimmed: 0 B, current rate: 0 B/s opos: 0 B, non-scraped: 0 B, average rate: 0 B/s non-tried: 8388 kB, bad-sector: 0 B, error rate: 0 B/s rescued: 0 B, bad areas: 0, run time: 0s pct rescued: 0.00%, read errors: 0, remaining time: n/a time since last successful read: n/a Copying non-tried blocks... Pass 1 (forwards) ^C // doesn't stop with Ctrl+C nor SIGTERM $ gdb -q -p 3449 Attaching to process 3449 ^C // same gets stuck $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux folio_wait_bit_common (mm/filemap.c:1314) filemap_get_pages (mm/filemap.c:2622) filemap_read (mm/filemap.c:2676) new_sync_read (fs/read_write.c:401 (discriminator 1)) vfs_read (fs/read_write.c:481) ksys_read (fs/read_write.c:619) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) I enabled CONFIG_BTRFS_DEBUG=y CONFIG_BTRFS_ASSERT=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_LOCKDEP=y CONFIG_LOCKUP_DETECTOR=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_LOCK_DEBUGGING_SUPPORT=y CONFIG_PROVE_LOCKING=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_LOCK_ALLOC=y but in dmesg only thing that shows up is a lot of BTRFS error (device sdh): invalid lzo header, lzo len 2937060802 compressed len 4096 If I try to do btrfs send, it gets stuck same way $ btrfs send -v /mnt/fs > /dev/null $ cat /proc/4712/stack | ./scripts/decode_stacktrace.sh vmlinux folio_wait_bit_common (mm/filemap.c:1314) __filemap_get_folio (mm/filemap.c:1690 ./include/linux/pagemap.h:779 mm/filemap.c:1960) pagecache_get_page (mm/folio-compat.c:126) send_extent_data (fs/btrfs/send.c:4980 fs/btrfs/send.c:5048 fs/btrfs/send.c:5235) btrfs process_extent (fs/btrfs/send.c:5575 fs/btrfs/send.c:5959) btrfs btrfs_ioctl_send (fs/btrfs/send.c:6770 fs/btrfs/send.c:7368 fs/btrfs/send.c:7688) btrfs _btrfs_ioctl_send (fs/btrfs/ioctl.c:4963) btrfs btrfs_ioctl (fs/btrfs/ioctl.c:5072) btrfs __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:874 fs/ioctl.c:860 fs/ioctl.c:860) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) I'm now using 5.17.0-rc2 but it's exactly same with 5.16.5 Thanks! Best regards, Dāvis ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to debug stuck read? 2022-02-02 17:15 How to debug stuck read? Dāvis Mosāns @ 2022-02-02 19:13 ` Matthew Wilcox 2022-02-02 21:50 ` Dāvis Mosāns 0 siblings, 1 reply; 11+ messages in thread From: Matthew Wilcox @ 2022-02-02 19:13 UTC (permalink / raw) To: Dāvis Mosāns; +Cc: BTRFS, linux-fsdevel, kernelnewbies On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote: > I have a corrupted file on BTRFS which has CoW disabled thus no > checksum. Trying to read this file causes the process to get stuck > forever. It doesn't return EIO. > > How can I find out why it gets stuck? > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux > folio_wait_bit_common (mm/filemap.c:1314) > filemap_get_pages (mm/filemap.c:2622) > filemap_read (mm/filemap.c:2676) > new_sync_read (fs/read_write.c:401 (discriminator 1)) folio_wait_bit_common() is where it waits for the page to be unlocked. Probably the problem is that btrfs isn't unlocking the page on seeing the error, so you don't get the -EIO returned? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to debug stuck read? 2022-02-02 19:13 ` Matthew Wilcox @ 2022-02-02 21:50 ` Dāvis Mosāns 2022-02-06 11:01 ` FMDF 0 siblings, 1 reply; 11+ messages in thread From: Dāvis Mosāns @ 2022-02-02 21:50 UTC (permalink / raw) To: Matthew Wilcox; +Cc: BTRFS, linux-fsdevel, kernelnewbies trešd., 2022. g. 2. febr., plkst. 21:13 — lietotājs Matthew Wilcox (<willy@infradead.org>) rakstīja: > > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote: > > I have a corrupted file on BTRFS which has CoW disabled thus no > > checksum. Trying to read this file causes the process to get stuck > > forever. It doesn't return EIO. > > > > How can I find out why it gets stuck? > > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux > > folio_wait_bit_common (mm/filemap.c:1314) > > filemap_get_pages (mm/filemap.c:2622) > > filemap_read (mm/filemap.c:2676) > > new_sync_read (fs/read_write.c:401 (discriminator 1)) > > folio_wait_bit_common() is where it waits for the page to be unlocked. > Probably the problem is that btrfs isn't unlocking the page on > seeing the error, so you don't get the -EIO returned? Yeah, but how to find where that happens. Anyway by pure luck I found memcpy that wrote outside of allocated memory and fixing that solved this issue but I still don't know how to debug this properly. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to debug stuck read? 2022-02-02 21:50 ` Dāvis Mosāns @ 2022-02-06 11:01 ` FMDF 2022-02-06 12:48 ` Matthew Wilcox 0 siblings, 1 reply; 11+ messages in thread From: FMDF @ 2022-02-06 11:01 UTC (permalink / raw) To: Dāvis Mosāns Cc: Matthew Wilcox, linux-fsdevel, BTRFS, kernelnewbies On Wed, Feb 2, 2022 at 10:50 PM Dāvis Mosāns <davispuh@gmail.com> wrote: > > trešd., 2022. g. 2. febr., plkst. 21:13 — lietotājs Matthew Wilcox > (<willy@infradead.org>) rakstīja: > > > > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote: > > > I have a corrupted file on BTRFS which has CoW disabled thus no > > > checksum. Trying to read this file causes the process to get stuck > > > forever. It doesn't return EIO. > > > > > > How can I find out why it gets stuck? > > > > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux > > > folio_wait_bit_common (mm/filemap.c:1314) > > > filemap_get_pages (mm/filemap.c:2622) > > > filemap_read (mm/filemap.c:2676) > > > new_sync_read (fs/read_write.c:401 (discriminator 1)) > > > > folio_wait_bit_common() is where it waits for the page to be unlocked. > > Probably the problem is that btrfs isn't unlocking the page on > > seeing the error, so you don't get the -EIO returned? > > > Yeah, but how to find where that happens. > Anyway by pure luck I found memcpy that wrote outside of allocated > memory and fixing that solved this issue but I still don't know how to > debug this properly. > There is no special recipe for debugging "this properly" :) You wrote that "by pure luck" you found a memcpy() that wrote beyond the limit of allocated memory. I suppose that you found that faulty memcpy() somewhere in one of the function listed in the stack trace. That's the right approach! You read the calls chain and find out where something looks wrong and then fix it. This is why stack traces are so helpful. It was not "pure luck". I think that you did what developers usually do after decoding a stack trace. If not, how did you find that faulty memcpy() buried somewhere in 40 millions lines of code? it seems that you've found the right way to figure out the problems in code that (probably) you had not ever worked on or read before you hit that bug. Have you sent a patch to the LKML? If not, please do it. Regards, Fabio M. De Francesco ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to debug stuck read? 2022-02-06 11:01 ` FMDF @ 2022-02-06 12:48 ` Matthew Wilcox 2022-02-06 21:22 ` FMDF 0 siblings, 1 reply; 11+ messages in thread From: Matthew Wilcox @ 2022-02-06 12:48 UTC (permalink / raw) To: FMDF; +Cc: Dāvis Mosāns, linux-fsdevel, BTRFS, kernelnewbies On Sun, Feb 06, 2022 at 12:01:02PM +0100, FMDF wrote: > On Wed, Feb 2, 2022 at 10:50 PM Dāvis Mosāns <davispuh@gmail.com> wrote: > > > > trešd., 2022. g. 2. febr., plkst. 21:13 — lietotājs Matthew Wilcox > > (<willy@infradead.org>) rakstīja: > > > > > > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote: > > > > I have a corrupted file on BTRFS which has CoW disabled thus no > > > > checksum. Trying to read this file causes the process to get stuck > > > > forever. It doesn't return EIO. > > > > > > > > How can I find out why it gets stuck? > > > > > > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux > > > > folio_wait_bit_common (mm/filemap.c:1314) > > > > filemap_get_pages (mm/filemap.c:2622) > > > > filemap_read (mm/filemap.c:2676) > > > > new_sync_read (fs/read_write.c:401 (discriminator 1)) > > > > > > folio_wait_bit_common() is where it waits for the page to be unlocked. > > > Probably the problem is that btrfs isn't unlocking the page on > > > seeing the error, so you don't get the -EIO returned? > > > > > > Yeah, but how to find where that happens. > > Anyway by pure luck I found memcpy that wrote outside of allocated > > memory and fixing that solved this issue but I still don't know how to > > debug this properly. > > > There is no special recipe for debugging "this properly" :) > > You wrote that "by pure luck" you found a memcpy() that wrote beyond the > limit of allocated memory. I suppose that you found that faulty memcpy() > somewhere in one of the function listed in the stack trace. I very much doubt that. The code flow here is: userspace calls read() -> VFS -> btrfs -> block layer -> return to btrfs -> return to VFS, wait for read to complete. So by the time anyone's looking at the stack trace, all you can see is the part of the call chain in the VFS. There's no way to see where we went in btrfs, nor in the block layer. We also can't see from the stack trace what happened with the interrupt which _should have_ cleared the lock bit and didn't. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to debug stuck read? 2022-02-06 12:48 ` Matthew Wilcox @ 2022-02-06 21:22 ` FMDF 2022-02-06 23:21 ` Dāvis Mosāns 0 siblings, 1 reply; 11+ messages in thread From: FMDF @ 2022-02-06 21:22 UTC (permalink / raw) To: Matthew Wilcox Cc: Dāvis Mosāns, linux-fsdevel, BTRFS, kernelnewbies On Sun, Feb 6, 2022 at 1:48 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Sun, Feb 06, 2022 at 12:01:02PM +0100, FMDF wrote: > > On Wed, Feb 2, 2022 at 10:50 PM Dāvis Mosāns <davispuh@gmail.com> wrote: > > > > > > trešd., 2022. g. 2. febr., plkst. 21:13 — lietotājs Matthew Wilcox > > > (<willy@infradead.org>) rakstīja: > > > > > > > > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote: > > > > > I have a corrupted file on BTRFS which has CoW disabled thus no > > > > > checksum. Trying to read this file causes the process to get stuck > > > > > forever. It doesn't return EIO. > > > > > > > > > > How can I find out why it gets stuck? > > > > > > > > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux > > > > > folio_wait_bit_common (mm/filemap.c:1314) > > > > > filemap_get_pages (mm/filemap.c:2622) > > > > > filemap_read (mm/filemap.c:2676) > > > > > new_sync_read (fs/read_write.c:401 (discriminator 1)) > > > > > > > > folio_wait_bit_common() is where it waits for the page to be unlocked. > > > > Probably the problem is that btrfs isn't unlocking the page on > > > > seeing the error, so you don't get the -EIO returned? > > > > > > > > > Yeah, but how to find where that happens. > > > Anyway by pure luck I found memcpy that wrote outside of allocated > > > memory and fixing that solved this issue but I still don't know how to > > > debug this properly. > > > > > There is no special recipe for debugging "this properly" :) > > > > You wrote that "by pure luck" you found a memcpy() that wrote beyond the > > limit of allocated memory. I suppose that you found that faulty memcpy() > > somewhere in one of the function listed in the stack trace. > > I very much doubt that. The code flow here is: > > userspace calls read() -> VFS -> btrfs -> block layer -> return to btrfs > -> return to VFS, wait for read to complete. So by the time anyone's > looking at the stack trace, all you can see is the part of the call > chain in the VFS. There's no way to see where we went in btrfs, nor > in the block layer. We also can't see from the stack trace what > happened with the interrupt which _should have_ cleared the lock bit > and didn't. > OK, I agree. This appears to be is one of those special cases where the mere reading of a stack trace cannot help much... :( My argument is about a general approach to debugging some unknown code by just reading the calls chain. Many times I've been able to find out what was wrong with code I had never seen before by just following the chain of calls in subsystems that I know nothing of (e.g., a bug in "tty" that was reported by Syzbot). In this special case, if the developer doesn't know that "the interrupt [which] _should have_ cleared the lock bit and didn't." there is nothing that one can deduce from a stack trace. Here one need to know how things work, well beyond the functions that are listed in the trace. So, probably, if one needs a "recipe" for those cases, the recipe is just know the subsystem(s) at hand and know how the kernel manages interrupts. Actually I haven't deepened this issue but, by reading what Matthew writes, I doubt that a faulty memcpy() can be the culprit... Davis, are you really sure that you've fixed that bug? Regards, Fabio ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to debug stuck read? 2022-02-06 21:22 ` FMDF @ 2022-02-06 23:21 ` Dāvis Mosāns 2022-02-06 23:49 ` Matthew Wilcox 0 siblings, 1 reply; 11+ messages in thread From: Dāvis Mosāns @ 2022-02-06 23:21 UTC (permalink / raw) To: FMDF; +Cc: Matthew Wilcox, linux-fsdevel, BTRFS, kernelnewbies svētd., 2022. g. 6. febr., plkst. 13:01 — lietotājs FMDF (<fmdefrancesco@gmail.com>) rakstīja: [...] > There is no special recipe for debugging "this properly" :) > > You wrote that "by pure luck" you found a memcpy() that wrote beyond the > limit of allocated memory. I suppose that you found that faulty memcpy() > somewhere in one of the function listed in the stack trace. > > That's the right approach! You read the calls chain and find out where something > looks wrong and then fix it. This is why stack traces are so helpful. > > It was not "pure luck". I think that you did what developers usually do after > decoding a stack trace. If not, how did you find that faulty memcpy() buried > somewhere in 40 millions lines of code? > > it seems that you've found the right way to figure out the problems in code > that (probably) you had not ever worked on or read before you hit that bug. > I think there should be a way to see which locks (and by who/where) have been taken for a long time. In this case it would be just this one as no one should take a lock for 10+ mins and such. Here we have several complications that make figuring this out difficult. Firstly you can get crash only once per boot. Killing process and reading file again you won't get any crash at all but process will always get stuck. Not even dropping caches changes anything. That's why I couldn't figure out what is the issue because I tried repeating it several times and never got any new crash in dmesg. Secondly even if you see some crash in dmesg, your process didn't crash and it's not clear if it is related to your issue - how to link kernel side crash with your stuck process. Finally the way how this bug is it was possible for it to not cause any crash but I'm guessing process probably wouldn't have stuck then. > Have you sent a patch to the LKML? If not, please do it. > Yeah I did, see patch titled: "[PATCH 2/2] btrfs: prevent copying too big compressed lzo segment" svētd., 2022. g. 6. febr., plkst. 23:22 — lietotājs FMDF (<fmdefrancesco@gmail.com>) rakstīja: > > On Sun, Feb 6, 2022 at 1:48 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Sun, Feb 06, 2022 at 12:01:02PM +0100, FMDF wrote: [...] > > > > I very much doubt that. The code flow here is: > > > > userspace calls read() -> VFS -> btrfs -> block layer -> return to btrfs > > -> return to VFS, wait for read to complete. So by the time anyone's > > looking at the stack trace, all you can see is the part of the call > > chain in the VFS. There's no way to see where we went in btrfs, nor > > in the block layer. We also can't see from the stack trace what > > happened with the interrupt which _should have_ cleared the lock bit > > and didn't. > > > OK, I agree. This appears to be is one of those special cases where the mere > reading of a stack trace cannot help much... :( > > My argument is about a general approach to debugging some unknown code > by just reading the calls chain. Many times I've been able to find out what was > wrong with code I had never seen before by just following the chain of calls > in subsystems that I know nothing of (e.g., a bug in "tty" that was reported by > Syzbot). > > In this special case, if the developer doesn't know that "the interrupt [which] > _should have_ cleared the lock bit and didn't." there is nothing that one can > deduce from a stack trace. > > Here one need to know how things work, well beyond the functions that are > listed in the trace. So, probably, if one needs a "recipe" for those cases, the > recipe is just know the subsystem(s) at hand and know how the kernel manages > interrupts. > Yeah, that's why I'm saying it was by luck :P The way how I fixed this was pretty much accidental byproduct. Basically for this issue all I saw in dmesg was "invalid lzo header". By looking at that part of code I didn't see anything suspicious and also issue happened after this so it wasn't culprit (but did hint it could be something related to this). Anyway at this point I gave up on it, but I saw that I have one crash in dmesg so I thought I'll fix that. I didn't thought it's related to this issue, but once I fixed that this also was solved. > Actually I haven't deepened this issue but, by reading what Matthew writes, > I doubt that a faulty memcpy() can be the culprit... Davis, are you really sure > that you've fixed that bug? > Yep, fully sure and tested :P I'm able to reproduce stuck process with 100% reliability. After applying my patch it returns EIO as expected and doesn't get stuck. If you look at copy_compressed_segment (in fs/btrfs/lzo.c) you'll see kaddr = kmap(cur_page); memcpy(dest + *cur_in - orig_in, kaddr + offset_in_page(*cur_in), copy_len); kunmap(cur_page); My guess is that kmap(cur_page) locks that page, then memcpy crashes so that page never gets unmapped causing anyone that tries to map it to wait forever. Hence this can be reproduced only once per boot. If I knew how to find kernel thread that is running this we could check it's stack and it should be stuck here on kmap. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to debug stuck read? 2022-02-06 23:21 ` Dāvis Mosāns @ 2022-02-06 23:49 ` Matthew Wilcox 2022-02-07 0:07 ` Dāvis Mosāns 0 siblings, 1 reply; 11+ messages in thread From: Matthew Wilcox @ 2022-02-06 23:49 UTC (permalink / raw) To: Dāvis Mosāns; +Cc: FMDF, linux-fsdevel, BTRFS, kernelnewbies On Mon, Feb 07, 2022 at 01:21:53AM +0200, Dāvis Mosāns wrote: > svētd., 2022. g. 6. febr., plkst. 13:01 — lietotājs FMDF > (<fmdefrancesco@gmail.com>) rakstīja: > [...] > > There is no special recipe for debugging "this properly" :) > > > > You wrote that "by pure luck" you found a memcpy() that wrote beyond the > > limit of allocated memory. I suppose that you found that faulty memcpy() > > somewhere in one of the function listed in the stack trace. > > > > That's the right approach! You read the calls chain and find out where something > > looks wrong and then fix it. This is why stack traces are so helpful. > > > > It was not "pure luck". I think that you did what developers usually do after > > decoding a stack trace. If not, how did you find that faulty memcpy() buried > > somewhere in 40 millions lines of code? > > > > it seems that you've found the right way to figure out the problems in code > > that (probably) you had not ever worked on or read before you hit that bug. > > > > I think there should be a way to see which locks (and by who/where) > have been taken for a long time. Well ... we do, but the problem is that the page lock is a single bit. We just don't have the space in struct page for a pointer to a stack trace. So the page lock isn't like a spinlock or a mutex where we can use the LOCKDEP infrastructure to tell us this kind of thing. Also, in this case, we know exactly where the lock was taken and by whom -- and it would be no more information than you had from the stack trace. Something I slightly regret is that you used to get a "task blocked for more than 120 seconds" message from check_hung_task(). But I made that not show up in this path because I made the sleep killable and that's only called for UNINTERRUPTIBLE tasks. Maybe that should be changed. Perhaps we should emit those messages for TASK_KILLABLE too. > Yep, fully sure and tested :P I'm able to reproduce stuck process with > 100% reliability. After applying my patch it returns EIO as expected > and doesn't get stuck. > If you look at copy_compressed_segment (in fs/btrfs/lzo.c) you'll see > kaddr = kmap(cur_page); > memcpy(dest + *cur_in - orig_in, > kaddr + offset_in_page(*cur_in), copy_len); > kunmap(cur_page); > > My guess is that kmap(cur_page) locks that page, then memcpy crashes > so that page never gets unmapped causing anyone that tries to map it > to wait forever. Hence this can be reproduced only once per boot. If I > knew how to find kernel thread that is running this we could check > it's stack and it should be stuck here on kmap. kmap() doesn't lock the page; it's already locked at this point. But if the memcpy() does crash then you're right, the page will never be unlocked because it was this thread's job to unlock it when the page was uptodate. The thread will be dead, so there's no way to find it. Do we not dump the thread's stack on its death? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to debug stuck read? 2022-02-06 23:49 ` Matthew Wilcox @ 2022-02-07 0:07 ` Dāvis Mosāns 2022-02-07 1:06 ` Matthew Wilcox 0 siblings, 1 reply; 11+ messages in thread From: Dāvis Mosāns @ 2022-02-07 0:07 UTC (permalink / raw) To: Matthew Wilcox; +Cc: FMDF, linux-fsdevel, BTRFS, kernelnewbies pirmd., 2022. g. 7. febr., plkst. 01:49 — lietotājs Matthew Wilcox (<willy@infradead.org>) rakstīja: > > On Mon, Feb 07, 2022 at 01:21:53AM +0200, Dāvis Mosāns wrote: > > svētd., 2022. g. 6. febr., plkst. 13:01 — lietotājs FMDF > > (<fmdefrancesco@gmail.com>) rakstīja: > > [...] > > > There is no special recipe for debugging "this properly" :) > > > > > > You wrote that "by pure luck" you found a memcpy() that wrote beyond the > > > limit of allocated memory. I suppose that you found that faulty memcpy() > > > somewhere in one of the function listed in the stack trace. > > > > > > That's the right approach! You read the calls chain and find out where something > > > looks wrong and then fix it. This is why stack traces are so helpful. > > > > > > It was not "pure luck". I think that you did what developers usually do after > > > decoding a stack trace. If not, how did you find that faulty memcpy() buried > > > somewhere in 40 millions lines of code? > > > > > > it seems that you've found the right way to figure out the problems in code > > > that (probably) you had not ever worked on or read before you hit that bug. > > > > > > > I think there should be a way to see which locks (and by who/where) > > have been taken for a long time. > > Well ... we do, but the problem is that the page lock is a single bit. > We just don't have the space in struct page for a pointer to a stack > trace. So the page lock isn't like a spinlock or a mutex where we can > use the LOCKDEP infrastructure to tell us this kind of thing. > > Also, in this case, we know exactly where the lock was taken and by whom > -- and it would be no more information than you had from the stack trace. The issue here is that you have a stuck task that doesn't have any crash/stack trace. The process itself is waiting in folio_wait_bit_common but I need to find the other side of it. > Something I slightly regret is that you used to get a "task blocked for > more than 120 seconds" message from check_hung_task(). But I made > that not show up in this path because I made the sleep killable and > that's only called for UNINTERRUPTIBLE tasks. > > Maybe that should be changed. Perhaps we should emit those messages > for TASK_KILLABLE too. > Sounds like that would be very useful. > > Yep, fully sure and tested :P I'm able to reproduce stuck process with > > 100% reliability. After applying my patch it returns EIO as expected > > and doesn't get stuck. > > If you look at copy_compressed_segment (in fs/btrfs/lzo.c) you'll see > > kaddr = kmap(cur_page); > > memcpy(dest + *cur_in - orig_in, > > kaddr + offset_in_page(*cur_in), copy_len); > > kunmap(cur_page); > > > > My guess is that kmap(cur_page) locks that page, then memcpy crashes > > so that page never gets unmapped causing anyone that tries to map it > > to wait forever. Hence this can be reproduced only once per boot. If I > > knew how to find kernel thread that is running this we could check > > it's stack and it should be stuck here on kmap. > > kmap() doesn't lock the page; it's already locked at this point. > But if the memcpy() does crash then you're right, the page will never > be unlocked because it was this thread's job to unlock it when the page > was uptodate. The thread will be dead, so there's no way to find it. > Do we not dump the thread's stack on its death? Yeah there was, but as I said it happens only once per boot. So you have one (potentially old) crash/stacktrace but many stuck processes with no clear cause. Eg. you get crash and stuck process, kill process. Then days later you try reading that file again and it will get stuck but there won't be stacktrace as it won't reach that memcpy anymore. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to debug stuck read? 2022-02-07 0:07 ` Dāvis Mosāns @ 2022-02-07 1:06 ` Matthew Wilcox 2022-02-07 1:22 ` Dāvis Mosāns 0 siblings, 1 reply; 11+ messages in thread From: Matthew Wilcox @ 2022-02-07 1:06 UTC (permalink / raw) To: Dāvis Mosāns; +Cc: FMDF, linux-fsdevel, BTRFS, kernelnewbies On Mon, Feb 07, 2022 at 02:07:47AM +0200, Dāvis Mosāns wrote: > > > I think there should be a way to see which locks (and by who/where) > > > have been taken for a long time. > > > > Well ... we do, but the problem is that the page lock is a single bit. > > We just don't have the space in struct page for a pointer to a stack > > trace. So the page lock isn't like a spinlock or a mutex where we can > > use the LOCKDEP infrastructure to tell us this kind of thing. > > > > Also, in this case, we know exactly where the lock was taken and by whom > > -- and it would be no more information than you had from the stack trace. > > The issue here is that you have a stuck task that doesn't have any > crash/stack trace. The process itself is waiting in > folio_wait_bit_common but I need to find the other side of it. Right, but what you're asking for won't help find the other side. It's just an automated way to find the side you did find. > > kmap() doesn't lock the page; it's already locked at this point. > > But if the memcpy() does crash then you're right, the page will never > > be unlocked because it was this thread's job to unlock it when the page > > was uptodate. The thread will be dead, so there's no way to find it. > > Do we not dump the thread's stack on its death? > > Yeah there was, but as I said it happens only once per boot. So you > have one (potentially old) crash/stacktrace but many stuck processes > with no clear cause. Eg. you get crash and stuck process, kill > process. Then days later you try reading that file again and it will > get stuck but there won't be stacktrace as it won't reach that memcpy > anymore. I can't think of a way to solve that. We can't know whether a dying task "was going to" unlock a page. So we have a locked page in the page cache that nobody will ever unlock. We can't remove it, because we don't know that task died. We can't start I/O on it again, because it looks like I/O is already in progress. I think the only answer is "Don't ignore stack dumps in dmesg". ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to debug stuck read? 2022-02-07 1:06 ` Matthew Wilcox @ 2022-02-07 1:22 ` Dāvis Mosāns 0 siblings, 0 replies; 11+ messages in thread From: Dāvis Mosāns @ 2022-02-07 1:22 UTC (permalink / raw) To: Matthew Wilcox; +Cc: FMDF, linux-fsdevel, BTRFS, kernelnewbies pirmd., 2022. g. 7. febr., plkst. 03:06 — lietotājs Matthew Wilcox (<willy@infradead.org>) rakstīja: [...] > > I can't think of a way to solve that. We can't know whether a dying task > "was going to" unlock a page. So we have a locked page in the page cache > that nobody will ever unlock. We can't remove it, because we don't know > that task died. We can't start I/O on it again, because it looks like > I/O is already in progress. > > I think the only answer is "Don't ignore stack dumps in dmesg". So looks like this is the conclusion of current state. > We can't remove it, because we don't know that task died. This seems pretty bad, maybe if there was build time option that enables logging (eg. PID and lock time) for locked pages it could help with debugging this and maybe allow even to force unlock. But then again since there should be stack dump and I guess this is very rare case of happening maybe it's not worth it. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-02-07 1:22 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-02-02 17:15 How to debug stuck read? Dāvis Mosāns 2022-02-02 19:13 ` Matthew Wilcox 2022-02-02 21:50 ` Dāvis Mosāns 2022-02-06 11:01 ` FMDF 2022-02-06 12:48 ` Matthew Wilcox 2022-02-06 21:22 ` FMDF 2022-02-06 23:21 ` Dāvis Mosāns 2022-02-06 23:49 ` Matthew Wilcox 2022-02-07 0:07 ` Dāvis Mosāns 2022-02-07 1:06 ` Matthew Wilcox 2022-02-07 1:22 ` Dāvis Mosāns
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).