Re: How to debug stuck read?

linux-fsdevel.vger.kernel.org archive mirror

From: FMDF <fmdefrancesco@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Dāvis Mosāns" <davispuh@gmail.com>,
	linux-fsdevel@vger.kernel.org,
	BTRFS <linux-btrfs@vger.kernel.org>,
	kernelnewbies <kernelnewbies@kernelnewbies.org>
Subject: Re: How to debug stuck read?
Date: Sun, 6 Feb 2022 22:22:16 +0100
Message-ID: <CAPj211uFgCyri=RKnOJs2cV7-9FRFjOPLti8Jo0ODZeHEPgGAw@mail.gmail.com>
In-Reply-To: <Yf/DiefrNOkib5mm@casper.infradead.org>

On Sun, Feb 6, 2022 at 1:48 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Feb 06, 2022 at 12:01:02PM +0100, FMDF wrote:
> > On Wed, Feb 2, 2022 at 10:50 PM Dāvis Mosāns <davispuh@gmail.com> wrote:
> > >
> > > On Wed, Feb 2, 2022 at 21:13, Matthew Wilcox
> > > (<willy@infradead.org>) wrote:
> > > >
> > > > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote:
> > > > > I have a corrupted file on BTRFS which has CoW disabled thus no
> > > > > checksum. Trying to read this file causes the process to get stuck
> > > > > forever. It doesn't return EIO.
> > > > >
> > > > > How can I find out why it gets stuck?
> > > >
> > > > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
> > > > > folio_wait_bit_common (mm/filemap.c:1314)
> > > > > filemap_get_pages (mm/filemap.c:2622)
> > > > > filemap_read (mm/filemap.c:2676)
> > > > > new_sync_read (fs/read_write.c:401 (discriminator 1))
> > > >
> > > > folio_wait_bit_common() is where it waits for the page to be unlocked.
> > > > Probably the problem is that btrfs isn't unlocking the page on
> > > > seeing the error, so you don't get the -EIO returned?
> > >
> > >
> > > Yeah, but how to find where that happens.
> > > Anyway by pure luck I found memcpy that wrote outside of allocated
> > > memory and fixing that solved this issue but I still don't know how to
> > > debug this properly.
> > >
> > There is no special recipe for debugging "this properly" :)
> >
> > You wrote that "by pure luck" you found a memcpy() that wrote beyond the
> > limit of allocated memory. I suppose that you found that faulty memcpy()
> > somewhere in one of the functions listed in the stack trace.
>
> I very much doubt that.  The code flow here is:
>
> userspace calls read() -> VFS -> btrfs -> block layer -> return to btrfs
> -> return to VFS, wait for read to complete.  So by the time anyone's
> looking at the stack trace, all you can see is the part of the call
> chain in the VFS.  There's no way to see where we went in btrfs, nor
> in the block layer.  We also can't see from the stack trace what
> happened with the interrupt which _should have_ cleared the lock bit
> and didn't.
>
OK, I agree. This appears to be one of those special cases where merely
reading a stack trace cannot help much... :(

My argument is about a general approach to debugging unfamiliar code by
just reading the call chain. Many times I've been able to find out what was
wrong with code I had never seen before by following the chain of calls
in subsystems that I know nothing about (e.g., a bug in "tty" that was
reported by Syzbot).

In this special case, if the developer doesn't know about "the interrupt
which _should have_ cleared the lock bit and didn't", there is nothing that
one can deduce from the stack trace.
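
To make that concrete, here is a toy userspace model of the situation
Matthew describes. It is only a sketch, not kernel code: Page, end_io()
and read_page() are names made up for illustration, a threading.Event
stands in for the page lock bit, and the "buggy" error path is just the
hypothesis from earlier in this thread, not the actual btrfs bug.

```python
import threading


class Page:
    """Toy stand-in for a locked page (hypothetical, not a kernel type)."""

    def __init__(self):
        # set() plays the role of clearing the lock bit and waking waiters.
        self.unlocked = threading.Event()
        self.error = None


def end_io(page, error, buggy):
    """Plays the completion handler that runs when the I/O finishes."""
    page.error = error
    if error and buggy:
        return  # assumed bug: the error path returns without unlocking
    page.unlocked.set()


def read_page(page, timeout=0.2):
    """Plays the reader, which sleeps until the page is unlocked.

    The real wait has no timeout; one is used here so the demo ends.
    """
    if not page.unlocked.wait(timeout):
        return "stuck"
    return "EIO" if page.error else "ok"


def demo(error, buggy):
    page = Page()
    # The completion runs in another context, so it never appears in
    # the reader's own call stack.
    threading.Thread(target=end_io, args=(page, error, buggy)).start()
    return read_page(page)


if __name__ == "__main__":
    print(demo(error=False, buggy=False))  # ok
    print(demo(error=True, buggy=False))   # EIO
    print(demo(error=True, buggy=True))    # stuck
```

In the third case the only thing a stack dump of the reader can show is
the wait inside read_page(). The code that failed to unlock has long since
returned, in a different context.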

Here one needs to know how things work, well beyond the functions that are
listed in the trace. So, probably, if one needs a "recipe" for these cases,
the recipe is just to know the subsystem(s) at hand and to know how the
kernel manages interrupts.
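
For the mechanical first step, before subsystem knowledge takes over, a
small helper along these lines can list tasks in uninterruptible sleep
together with the symbol they are sleeping in. This is a rough sketch under
a few assumptions: parse_stat() and blocked_tasks() are hypothetical names,
/proc/<pid>/wchan may be empty or "0" depending on kernel configuration and
permissions, and reading /proc/<pid>/stack (as Davis did) usually needs
root.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the first "(" and the last ")".
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small procfs file; return "" if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for every task currently in state D."""
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        line = read_text(os.path.join(proc, name, "stat"))
        if not line:
            continue  # the task exited between listdir() and open()
        pid, comm, state = parse_stat(line)
        if state == "D":
            yield pid, comm, read_text(os.path.join(proc, name, "wchan"))


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(pid, comm, wchan or "?")
```

Like the stack trace, this only tells you where a task is sleeping, not
who was supposed to wake it up.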

Actually, I haven't looked into this issue in depth but, going by what
Matthew writes, I doubt that a faulty memcpy() can be the culprit... Davis,
are you really sure that you've fixed that bug?

Regards,

Fabio
