From: FMDF <fmdefrancesco@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Dāvis Mosāns" <davispuh@gmail.com>,
linux-fsdevel@vger.kernel.org,
BTRFS <linux-btrfs@vger.kernel.org>,
kernelnewbies <kernelnewbies@kernelnewbies.org>
Subject: Re: How to debug stuck read?
Date: Sun, 6 Feb 2022 22:22:16 +0100
Message-ID: <CAPj211uFgCyri=RKnOJs2cV7-9FRFjOPLti8Jo0ODZeHEPgGAw@mail.gmail.com>
In-Reply-To: <Yf/DiefrNOkib5mm@casper.infradead.org>
On Sun, Feb 6, 2022 at 1:48 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Feb 06, 2022 at 12:01:02PM +0100, FMDF wrote:
> > On Wed, Feb 2, 2022 at 10:50 PM Dāvis Mosāns <davispuh@gmail.com> wrote:
> > >
> > > On Wed, Feb 2, 2022 at 21:13 Matthew Wilcox
> > > (<willy@infradead.org>) wrote:
> > > >
> > > > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote:
> > > > > I have a corrupted file on BTRFS which has CoW disabled thus no
> > > > > checksum. Trying to read this file causes the process to get stuck
> > > > > forever. It doesn't return EIO.
> > > > >
> > > > > How can I find out why it gets stuck?
> > > >
> > > > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
> > > > > folio_wait_bit_common (mm/filemap.c:1314)
> > > > > filemap_get_pages (mm/filemap.c:2622)
> > > > > filemap_read (mm/filemap.c:2676)
> > > > > new_sync_read (fs/read_write.c:401 (discriminator 1))
> > > >
> > > > folio_wait_bit_common() is where it waits for the page to be unlocked.
> > > > Probably the problem is that btrfs isn't unlocking the page on
> > > > seeing the error, so you don't get the -EIO returned?
> > >
> > >
> > > Yeah, but how do I find where that happens?
> > > Anyway, by pure luck I found a memcpy() that wrote outside of allocated
> > > memory, and fixing that solved this issue, but I still don't know how to
> > > debug this properly.
> > >
> > There is no special recipe for debugging "this properly" :)
> >
> > You wrote that "by pure luck" you found a memcpy() that wrote beyond the
> > limit of allocated memory. I suppose that you found that faulty memcpy()
> > somewhere in one of the functions listed in the stack trace.
>
> I very much doubt that. The code flow here is:
>
> userspace calls read() -> VFS -> btrfs -> block layer -> return to btrfs
> -> return to VFS, wait for read to complete. So by the time anyone's
> looking at the stack trace, all you can see is the part of the call
> chain in the VFS. There's no way to see where we went in btrfs, nor
> in the block layer. We also can't see from the stack trace what
> happened with the interrupt which _should have_ cleared the lock bit
> and didn't.
>
OK, I agree. This appears to be one of those special cases where merely
reading a stack trace cannot help much... :(

My argument was about a general approach to debugging unfamiliar code by
just reading the call chain. Many times I've been able to find out what was
wrong with code I had never seen before just by following the chain of calls
through subsystems I knew nothing about (e.g., a bug in "tty" that was
reported by Syzbot).

In this special case, if the developer doesn't know that the interrupt
"_should have_ cleared the lock bit and didn't", there is nothing one can
deduce from the stack trace alone. Here one needs to know how things work
well beyond the functions listed in the trace. So if one needs a "recipe"
for such cases, it is simply to know the subsystem(s) at hand and how the
kernel handles interrupts.

Actually, I haven't looked deeply into this issue, but from what Matthew
writes, I doubt that a faulty memcpy() can be the culprit... Dāvis, are you
really sure that you've fixed that bug?
Regards,
Fabio