Re: Crash in crc32_le during kmemleak_scan()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: vigneshr@codeaurora.org
To: vigneshr@codeaurora.org
Cc: vigneshr@codeaurora.org, linux-kernel@vger.kernel.org,
	catalin.marinas@arm.com, bernd.schubert@itwm.fraunhofer.de
Subject: Re: Crash in crc32_le during kmemleak_scan()
Date: Thu, 21 May 2015 05:45:17 -0000	[thread overview]
Message-ID: <cade605e668e2bc5620fac2b7328121d.squirrel@www.codeaurora.org> (raw)
In-Reply-To: <d5b53c2263c3bbb3d2070afb24bb0c38.squirrel@www.codeaurora.org>

> Analysis so far :
>
> I haven't been able to figure out exactly how the dots are connected and
> the relation to the change provided earlier but the following is what i
> could put together :
>
> When i checked the post mortem analysis i see the following in the
> crashing stack :
>
> -018|rb_erase(
>     |    [locdesc] node = ?,
>     |    [X1] root = 0xFFFFFFC0016E05B8)
>     |  [X1] root = 0xFFFFFFC0016E05B8
>     |  [locdesc] node = ?
>     |  [X2] parent = 0xFFFFFFC02EEB6F40
>     |  [X4] node = 0x0
>     |  [X3] sibling = 0x0
> -019|__delete_object(
>     |    [X19] object = 0xFFFFFFC0236B03C0)
>     |  [X21] flags = 0x0140
> -020|delete_object_full(
>     |  ?)
>     |  [X19] object = 0xFFFFFFC0236B03C0
> -021|__kmemleak_do_cleanup()
>     |  [X19] object = 0xFFFFFFC0236B03C0 -> (
>     |    [0xFFFFFFC0236B03C0] lock = ([0xFFFFFFC0236B03C0] rlock =
> ([0xFFFFFFC0236B03C0] raw_lock = ([0xFFFFFFC0236B03C0] owner
>     |    [0xFFFFFFC0236B03D8] flags = 0x0,
>     |    [0xFFFFFFC0236B03E0] object_list = ([NSD:0xFFFFFFC0236B03E0] next
> = 0xFFFFFFC0236B1920, [NSD:0xFFFFFFC0236B03E8] prev = 0x00200200),
>     |    [0xFFFFFFC0236B03F0] gray_list = ([0xFFFFFFC0236B03F0] next =
> 0x00100100, [0xFFFFFFC0236B03F8] prev = 0x00200200),
> 	 [0xFFFFFFC0236B0400] rb_node = (
>       			[0xFFFFFFC0236B0400] __rb_parent_color = 0xFFFFFFC02EEB6F41,
> 			[0xFFFFFFC0236B0408] rb_right = 0x0,
> 			[0xFFFFFFC0236B0410] rb_left = 0x0
>
>
> The node is becoming 0x0 here and is crashing in the following piece of
> code in rb_erase:
>
>
>   309         } else {
>   310             sibling = parent->rb_left;
>   311             if (rb_is_red(sibling)) {
>
> and rb_is_red(sibling) resolves into :
>
> 96 #define rb_is_red(rb)      __rb_is_red((rb)->__rb_parent_color)
>
>
>
> 1. Here the parent ptr refers to Object pointer residing at
> 0xFFFFFFC0236B03C0 and we can see that rb_node->rb_left is becoming 0x0 (
> this is from the values observed in the crashed stack)
>
> 		 [0xFFFFFFC0236B0400] rb_node = (
>       			[0xFFFFFFC0236B0400] __rb_parent_color = 0xFFFFFFC02EEB6F41,
> 			[0xFFFFFFC0236B0408] rb_right = 0x0,
> 			[0xFFFFFFC0236B0410] rb_left = 0x0
>
> 2. Therefore the sibling will take the value 0x0
>
> 3. And then when rb->__rb_parent_color operation takes place, we crash
> since its not able to resolve 0x0 (since rb=0x0 here).
>
>
> So the question is how this variable ended up being this way.
>
> 1. I checked the object_list via list walkthrough  and it did not report
> any corruptions.
>
> 2. However, the object in question from our call stack,
> object->object_list pointers looks a bit amiss.
>
>   object_list = ([NSD:0xFFFFFFC0236B03E0] next = 0xFFFFFFC0236B1920,
> [NSD:0xFFFFFFC0236B03E8] prev = 0x00200200),
>
> But its not clear to me how it ended up being in this state.
>
>
> I will continue to debug what went wrong and share my analysis. Please let
> me know if anything looks obvious to you that i can try out.

I went through the bug again and i think the following is happening :

I just noticed that there could be possible race that's happening between 2
cores causing the aforementioned crash to occur :

Core 0 ____________________________ Core 1

rb_erase()                  |
__delete_object()           |
delete_object_full()        |
                            |
___________________________ | __delete_object()
                            |
___________________________ | delete_object_full()
                            |
___________________________ | kmemleak_free()-Call from some other ctxt.
                            |
                            |
__kmemleak_do_cleanup()     |
kmemleak_do_cleanup()       |
process_one_work()          |
worker_thread()             |
kthread()                   |
ret_from_fork()             |

Just after __delete_object() running on core 1 unlocks the writelock
(kmemleak_lock), from core 0 we take the same object and try to do rb_erase
again. This is causing it to crash in this fashion.

Therefore, together with the warning that i quoted in the earlier mail and
the above race, i think the following can be added to the patch provided
earlier:

---------------------------------- 8< -------------------------------------

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 5ec8b71..4455bb8 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -959,7 +959,7 @@ void __ref kmemleak_free(const void *ptr)
 {
        pr_debug("%s(0x%p)\n", __func__, ptr);

-       if (kmemleak_enabled && ptr && !IS_ERR(ptr))
+       if (kmemleak_enabled && ptr && !IS_ERR(ptr) && !kmemleak_error)
                delete_object_full((unsigned long)ptr);
        else if (kmemleak_early_log)
                log_early(KMEMLEAK_FREE, ptr, 0, 0);
@@ -1019,7 +1019,7 @@ void __ref kmemleak_not_leak(const void *ptr)
 {
        pr_debug("%s(0x%p)\n", __func__, ptr);

-       if (kmemleak_enabled && ptr && !IS_ERR(ptr))
+       if (kmemleak_enabled && ptr && !IS_ERR(ptr) && !kmemleak_error)
                make_gray_object((unsigned long)ptr);
        else if (kmemleak_early_log)
                log_early(KMEMLEAK_NOT_LEAK, ptr, 0, 0);




---------------------------------- 8< -------------------------------------

Please let me know if the above analysis and the addition to the patch
looks good to you or if you a different opinion. I can try out testing
once more to see if it pops back if it looks good to you.

====================
Thanks and regards,
Vignesh Radhakrishnan


QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of the Code Aurora Forum, hosted by The Linux Foundation.
again.

next prev parent reply	other threads:[~2015-05-21  5:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-13 13:15 Crash in crc32_le during kmemleak_scan() vigneshr
2015-05-14 11:21 ` Catalin Marinas
2015-05-15  4:53   ` vigneshr
2015-05-20 13:16 ` vigneshr
2015-05-21  5:45   ` vigneshr [this message]
2015-05-22 16:51     ` Catalin Marinas
2015-06-01  4:57       ` vigneshr
2015-06-19 10:30         ` vigneshr
2015-06-19 13:20           ` Catalin Marinas

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:5ec8b71 dfblob:4455bb8 )
 OR (
bs:"Re: Crash in crc32_le during kmemleak_scan()" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cade605e668e2bc5620fac2b7328121d.squirrel@www.codeaurora.org \
    --to=vigneshr@codeaurora.org \
    --cc=bernd.schubert@itwm.fraunhofer.de \
    --cc=catalin.marinas@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.