kmemleak: illegal RCU use assertion error

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: kmemleak: illegal RCU use assertion error
Date: Mon, 2 Apr 2012 10:09:11 +0300	[thread overview]
Message-ID: <20120402070911.GB3464@swordfish> (raw)

Hello,

commit e5601400081651060a59bd1f45f2821bb8e97f95
Author: Paul E. McKenney <paul.mckenney@linaro.org>
Date:   Sat Jan 7 11:03:57 2012 -0800

    rcu: Simplify offline processing
    
    Move ->qsmaskinit and blkd_tasks[] manipulation to the CPU_DYING
    notifier.  This simplifies the code by eliminating a potential
    deadlock and by reducing the responsibilities of force_quiescent_state().
    Also rename functions to make their connection to the CPU-hotplug
    stages explicit.
    
    Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


introduced WARN_ON_ONCE(cpu_is_offline(smp_processor_id())); to __call_rcu()
function, Paul also added cpu_offline checks to other routines (e.g. callbacks)
in later commits. It happens that kmemleak() triggers one of them.

During cpu core offline, kfree()->kmemleak_free()->put_object()-->__call_rcu() chain
for struct intel_shared_regs * is executed when no struct users left on this core -- 
all CPUs are dead or dying.


[ 4703.342462] CPU 3 is now offline
[ 4705.588116] ------------[ cut here ]------------
[ 4705.588129] WARNING: at kernel/rcutree.c:1823 __call_rcu+0x9d/0x1d2()
[..]
[ 4705.588196] Call Trace:
[ 4705.588207]  [<ffffffff81059a00>] ? synchronize_srcu+0x6/0x17
[ 4705.588215]  [<ffffffff8103364e>] warn_slowpath_common+0x83/0x9c
[ 4705.588223]  [<ffffffff8111e627>] ? get_object+0x31/0x31
[ 4705.588229]  [<ffffffff81033681>] warn_slowpath_null+0x1a/0x1c
[ 4705.588235]  [<ffffffff810af770>] __call_rcu+0x9d/0x1d2
[ 4705.588243]  [<ffffffff81013f52>] ? intel_pmu_cpu_dying+0x3b/0x5d
[ 4705.588249]  [<ffffffff810af8f1>] call_rcu_sched+0x17/0x19
[ 4705.588255]  [<ffffffff8111eb7e>] put_object+0x47/0x4b
[ 4705.588261]  [<ffffffff8111ed8b>] delete_object_full+0x2a/0x2e
[ 4705.588269]  [<ffffffff81491dc8>] kmemleak_free+0x26/0x45
[ 4705.588274]  [<ffffffff8111691f>] kfree+0x130/0x221
[ 4705.588280]  [<ffffffff81013f52>] intel_pmu_cpu_dying+0x3b/0x5d
[ 4705.588287]  [<ffffffff8149cb83>] x86_pmu_notifier+0xaf/0xb9
[ 4705.588296]  [<ffffffff814b0e9d>] notifier_call_chain+0xac/0xd9
[ 4705.588303]  [<ffffffff81059c9e>] __raw_notifier_call_chain+0xe/0x10
[ 4705.588309]  [<ffffffff810354ec>] __cpu_notify+0x20/0x37
[ 4705.588314]  [<ffffffff81035516>] cpu_notify+0x13/0x15
[ 4705.588320]  [<ffffffff81490fab>] take_cpu_down+0x28/0x2e
[ 4705.588326]  [<ffffffff8109ef7f>] stop_machine_cpu_stop+0x96/0xf1
[ 4705.588332]  [<ffffffff8109ece3>] cpu_stopper_thread+0xe3/0x183
[ 4705.588338]  [<ffffffff8109eee9>] ? queue_stop_cpus_work+0xd0/0xd0
[ 4705.588344]  [<ffffffff814ad382>] ? _raw_spin_unlock_irqrestore+0x47/0x65
[ 4705.588353]  [<ffffffff81087d0d>] ? trace_hardirqs_on_caller+0x119/0x175
[ 4705.588358]  [<ffffffff81087d76>] ? trace_hardirqs_on+0xd/0xf
[ 4705.588364]  [<ffffffff8109ec00>] ? cpu_stop_signal_done+0x2c/0x2c
[ 4705.588370]  [<ffffffff810544a9>] kthread+0x8b/0x93
[ 4705.588378]  [<ffffffff814b5f34>] kernel_thread_helper+0x4/0x10
[ 4705.588385]  [<ffffffff814ad7f0>] ? retint_restore_args+0x13/0x13
[ 4705.588391]  [<ffffffff8105441e>] ? __init_kthread_worker+0x5a/0x5a
[ 4705.588397]  [<ffffffff814b5f30>] ? gs_change+0x13/0x13
[ 4705.588400] ---[ end trace 720328982e35a713 ]---
[ 4705.588507] CPU 2 is now offline


My first solution was to return from delete_object() if object deallocation
performed on cpu_is_offline(smp_processor_id()), marking object with special
flag, say OBJECT_ORPHAN. And issue real object_delete() during scan (for example)
when we see OBJECT_ORPHAN object.  
That, however, requires to handle special case when cpu core offlined
for small period of time, leading to object insertion error in
create_object(), which either may be handled in 2 possible ways (assuming
that lookup_object() returned OBJECT_ORPHAN):
#1 delete orphaned object and retry with insertion (*)
#2 re-set existing orphan object


(*) performing delete_object() from within create_object() requires releasing
of held kmemleak and object locks, which is racy with other create_object() and
any possible scan() activities.

Yet I'm not exactly sure that option #2 is the correct one.
(I've kind of a patch [not properly tested, etc.] for #2 option).


	Sergey

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: kmemleak: illegal RCU use assertion error
Date: Mon, 2 Apr 2012 10:09:11 +0300	[thread overview]
Message-ID: <20120402070911.GB3464@swordfish> (raw)

Hello,

commit e5601400081651060a59bd1f45f2821bb8e97f95
Author: Paul E. McKenney <paul.mckenney@linaro.org>
Date:   Sat Jan 7 11:03:57 2012 -0800

    rcu: Simplify offline processing
    
    Move ->qsmaskinit and blkd_tasks[] manipulation to the CPU_DYING
    notifier.  This simplifies the code by eliminating a potential
    deadlock and by reducing the responsibilities of force_quiescent_state().
    Also rename functions to make their connection to the CPU-hotplug
    stages explicit.
    
    Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>


introduced WARN_ON_ONCE(cpu_is_offline(smp_processor_id())); to __call_rcu()
function, Paul also added cpu_offline checks to other routines (e.g. callbacks)
in later commits. It happens that kmemleak() triggers one of them.

During cpu core offline, kfree()->kmemleak_free()->put_object()-->__call_rcu() chain
for struct intel_shared_regs * is executed when no struct users left on this core -- 
all CPUs are dead or dying.


[ 4703.342462] CPU 3 is now offline
[ 4705.588116] ------------[ cut here ]------------
[ 4705.588129] WARNING: at kernel/rcutree.c:1823 __call_rcu+0x9d/0x1d2()
[..]
[ 4705.588196] Call Trace:
[ 4705.588207]  [<ffffffff81059a00>] ? synchronize_srcu+0x6/0x17
[ 4705.588215]  [<ffffffff8103364e>] warn_slowpath_common+0x83/0x9c
[ 4705.588223]  [<ffffffff8111e627>] ? get_object+0x31/0x31
[ 4705.588229]  [<ffffffff81033681>] warn_slowpath_null+0x1a/0x1c
[ 4705.588235]  [<ffffffff810af770>] __call_rcu+0x9d/0x1d2
[ 4705.588243]  [<ffffffff81013f52>] ? intel_pmu_cpu_dying+0x3b/0x5d
[ 4705.588249]  [<ffffffff810af8f1>] call_rcu_sched+0x17/0x19
[ 4705.588255]  [<ffffffff8111eb7e>] put_object+0x47/0x4b
[ 4705.588261]  [<ffffffff8111ed8b>] delete_object_full+0x2a/0x2e
[ 4705.588269]  [<ffffffff81491dc8>] kmemleak_free+0x26/0x45
[ 4705.588274]  [<ffffffff8111691f>] kfree+0x130/0x221
[ 4705.588280]  [<ffffffff81013f52>] intel_pmu_cpu_dying+0x3b/0x5d
[ 4705.588287]  [<ffffffff8149cb83>] x86_pmu_notifier+0xaf/0xb9
[ 4705.588296]  [<ffffffff814b0e9d>] notifier_call_chain+0xac/0xd9
[ 4705.588303]  [<ffffffff81059c9e>] __raw_notifier_call_chain+0xe/0x10
[ 4705.588309]  [<ffffffff810354ec>] __cpu_notify+0x20/0x37
[ 4705.588314]  [<ffffffff81035516>] cpu_notify+0x13/0x15
[ 4705.588320]  [<ffffffff81490fab>] take_cpu_down+0x28/0x2e
[ 4705.588326]  [<ffffffff8109ef7f>] stop_machine_cpu_stop+0x96/0xf1
[ 4705.588332]  [<ffffffff8109ece3>] cpu_stopper_thread+0xe3/0x183
[ 4705.588338]  [<ffffffff8109eee9>] ? queue_stop_cpus_work+0xd0/0xd0
[ 4705.588344]  [<ffffffff814ad382>] ? _raw_spin_unlock_irqrestore+0x47/0x65
[ 4705.588353]  [<ffffffff81087d0d>] ? trace_hardirqs_on_caller+0x119/0x175
[ 4705.588358]  [<ffffffff81087d76>] ? trace_hardirqs_on+0xd/0xf
[ 4705.588364]  [<ffffffff8109ec00>] ? cpu_stop_signal_done+0x2c/0x2c
[ 4705.588370]  [<ffffffff810544a9>] kthread+0x8b/0x93
[ 4705.588378]  [<ffffffff814b5f34>] kernel_thread_helper+0x4/0x10
[ 4705.588385]  [<ffffffff814ad7f0>] ? retint_restore_args+0x13/0x13
[ 4705.588391]  [<ffffffff8105441e>] ? __init_kthread_worker+0x5a/0x5a
[ 4705.588397]  [<ffffffff814b5f30>] ? gs_change+0x13/0x13
[ 4705.588400] ---[ end trace 720328982e35a713 ]---
[ 4705.588507] CPU 2 is now offline


My first solution was to return from delete_object() if object deallocation
performed on cpu_is_offline(smp_processor_id()), marking object with special
flag, say OBJECT_ORPHAN. And issue real object_delete() during scan (for example)
when we see OBJECT_ORPHAN object.  
That, however, requires to handle special case when cpu core offlined
for small period of time, leading to object insertion error in
create_object(), which either may be handled in 2 possible ways (assuming
that lookup_object() returned OBJECT_ORPHAN):
#1 delete orphaned object and retry with insertion (*)
#2 re-set existing orphan object


(*) performing delete_object() from within create_object() requires releasing
of held kmemleak and object locks, which is racy with other create_object() and
any possible scan() activities.

Yet I'm not exactly sure that option #2 is the correct one.
(I've kind of a patch [not properly tested, etc.] for #2 option).


	Sergey

next             reply	other threads:[~2012-04-02  7:09 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-02  7:09 Sergey Senozhatsky [this message]
2012-04-02  7:09 ` kmemleak: illegal RCU use assertion error Sergey Senozhatsky
2012-04-02 13:09 ` Paul E. McKenney
2012-04-02 13:09   ` Paul E. McKenney
2012-04-02 23:10   ` Sergey Senozhatsky
2012-04-03 14:58     ` Paul E. McKenney
2012-04-03 14:58       ` Paul E. McKenney
2012-04-05 21:30       ` Sergey Senozhatsky
2012-04-05 21:30         ` Sergey Senozhatsky
2012-04-05 21:47         ` Paul E. McKenney
2012-04-05 21:47           ` Paul E. McKenney
2012-04-05 22:07           ` Sergey Senozhatsky
2012-04-05 22:07             ` Sergey Senozhatsky
2012-04-16 20:35           ` Paul E. McKenney
2012-04-16 20:35             ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120402070911.GB3464@swordfish \
    --to=sergey.senozhatsky@gmail.com \
    --cc=catalin.marinas@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.