All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Skripkin <paskripkin@gmail.com>
To: Dongliang Mu <mudongliangabcd@gmail.com>
Cc: alex.aring@gmail.com, "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-wpan@vger.kernel.org, netdev@vger.kernel.org,
	stefan@datenfreihafen.org,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	syzbot+b80c9959009a9325cdff@syzkaller.appspotmail.com,
	Dan Carpenter <dan.carpenter@oracle.com>,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: Suggestions on how to debug kernel crashes where printk and gdb both does not work
Date: Mon, 14 Jun 2021 23:30:11 +0300	[thread overview]
Message-ID: <20210614233011.79ebe38a@gmail.com> (raw)
In-Reply-To: <CAD-N9QXUrv7zjSyUjsJsWO6KZDhGYtkTCK9U_ZuPA7awJ8P3Yw@mail.gmail.com>

On Mon, 14 Jun 2021 23:04:03 +0800
Dongliang Mu <mudongliangabcd@gmail.com> wrote:

> On Mon, Jun 14, 2021 at 10:47 PM Pavel Skripkin
> <paskripkin@gmail.com> wrote:
> >
> > On Mon, 14 Jun 2021 22:40:55 +0800
> > Dongliang Mu <mudongliangabcd@gmail.com> wrote:
> >
> > > On Mon, Jun 14, 2021 at 10:25 PM Pavel Skripkin
> > > <paskripkin@gmail.com> wrote:
> > > >
> > > > On Mon, 14 Jun 2021 22:19:10 +0800
> > > > Dongliang Mu <mudongliangabcd@gmail.com> wrote:
> > > >
> > > > > On Mon, Jun 14, 2021 at 9:34 PM Pavel Skripkin
> > > > > <paskripkin@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, 14 Jun 2021 21:22:43 +0800
> > > > > > Dongliang Mu <mudongliangabcd@gmail.com> wrote:
> > > > > >
> > > > > > > Dear kernel developers,
> > > > > > >
> > > > > > > I was trying to debug the crash - memory leak in
> > > > > > > hwsim_add_one [1] recently. However, I encountered a
> > > > > > > disgusting issue: my breakpoint and printk/pr_alert in the
> > > > > > > functions that will be surely executed do not work. The
> > > > > > > stack trace is in the following. I wrote this email to
> > > > > > > ask for some suggestions on how to debug such cases?
> > > > > > >
> > > > > > > Thanks very much. Looking forward to your reply.
> > > > > > >
> > > > > >
> > > > > > Hi, Dongliang!
> > > > > >
> > > > > > This bug is not similar to others on the dashboard. I spent
> > > > > > some time debugging it a week ago. The main problem here,
> > > > > > that memory allocation happens in the boot time:
> > > > > >
> > > > > > > [<ffffffff84359255>] kernel_init+0xc/0x1a7
> > > > > > > init/main.c:1447
> > > > > >
> > > > >
> > > > > Oh, nice catch. No wonder why my debugging does not work. :(
> > > > >
> > > > > > and reproducer simply tries to
> > > > > > free this data. You can use ftrace to look at it. Smth like
> > > > > > this:
> > > > > >
> > > > > > $ echo 'hwsim_*' > $TRACE_DIR/set_ftrace_filter
> > > > >
> > > > > Thanks for your suggestion.
> > > > >
> > > > > Do you have any conclusions about this case? If you have found
> > > > > out the root cause and start writing patches, I will turn my
> > > > > focus to other cases.
> > > >
> > > > No, I had some busy days and I have nothing about this bug for
> > > > now. I've just traced the reproducer execution and that's all :)
> > > >
> > > > I guess, some error handling paths are broken, but Im not sure
> > >
> > > In the beginning, I agreed with you. However, after I manually
> > > checked functions: hwsim_probe (initialization) and  hwsim_remove
> > > (cleanup), then things may be different. The cleanup looks
> > > correct to me. I would like to debug but stuck with the debugging
> > > process.
> > >
> > > And there is another issue: the cleanup function also does not
> > > output anything or hit the breakpoint. I don't quite understand
> > > it since the cleanup is not at the boot time.
> > >
> > > Any idea?
> > >
> >
> > Output from ftrace (syzkaller repro):
> >
> > root@syzkaller:~# cat /sys/kernel/tracing/trace
> > # tracer: function_graph
> > #
> > # CPU  DURATION                  FUNCTION CALLS
> > # |     |   |                     |   |   |   |
> >  1)               |  hwsim_del_radio_nl() {
> >  1)               |    hwsim_del() {
> >  1)               |      hwsim_edge_unsubscribe_me() {
> >  1) ! 310.041 us  |        hwsim_free_edge();
> >  1) ! 665.221 us  |      }
> >  1) * 52999.05 us |    }
> >  1) * 53035.38 us |  }
> >
> > Cleanup function is not the case, I think :)
> 
> It seems like I spot the incorrect cleanup function (hwsim_remove is
> the right one is in my mind). Let me learn how to use ftrace to log
> the executed functions and then discuss this case with you guys.
> 

Hmmm, I think, there is a mess with lists.

I just want to share my debug results, I have no idea about the fix for
now.

In hwsim_probe() edge for phy->idx = 1 is allocated, then reproduces
sends a request to delete phy with idx == 0, so this check in
hwsim_edge_unsubscribe_me():

	if (e->endpoint->idx == phy->idx) { 
		... clean up code ...
	}

won't be passed and edge won't be freed (because it was allocated for
phy with idx == 1). Allocated edge for phy 1 becomes leaked after
hwsim_del(). I can't really see the code where phy with idx == 1 can
be deleted from list...

Maybe, it's kmemleak bug. Similar strange case was with this one
https://syzkaller.appspot.com/bug?id=3a325b8389fc41c1bc94de0f4ac437ed13cce584.
I find it strange, that I could reach leaked pointers after kmemleak reported a
leak. Im not familiar with kmemleak internals and I might be wrong 


With regards,
Pavel Skripkin

  reply	other threads:[~2021-06-14 20:30 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-14 13:22 Suggestions on how to debug kernel crashes where printk and gdb both does not work Dongliang Mu
2021-06-14 13:34 ` Pavel Skripkin
2021-06-14 14:19   ` Dongliang Mu
2021-06-14 14:25     ` Pavel Skripkin
2021-06-14 14:40       ` Dongliang Mu
2021-06-14 14:47         ` Pavel Skripkin
2021-06-14 15:04           ` Dongliang Mu
2021-06-14 20:30             ` Pavel Skripkin [this message]
2021-06-15 10:37               ` Dongliang Mu
2021-06-15 13:34                 ` Pavel Skripkin
2021-06-16  2:02                   ` Dongliang Mu
2021-06-16  2:11                     ` Dongliang Mu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210614233011.79ebe38a@gmail.com \
    --to=paskripkin@gmail.com \
    --cc=alex.aring@gmail.com \
    --cc=dan.carpenter@oracle.com \
    --cc=davem@davemloft.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wpan@vger.kernel.org \
    --cc=mudongliangabcd@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=stefan@datenfreihafen.org \
    --cc=syzbot+b80c9959009a9325cdff@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.