From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: kmemleak: illegal RCU use assertion error
Date: Mon, 2 Apr 2012 10:09:11 +0300
Message-ID: <20120402070911.GB3464@swordfish>

Hello,

commit e5601400081651060a59bd1f45f2821bb8e97f95
Author: Paul E. McKenney <paul.mckenney@linaro.org>
Date:   Sat Jan 7 11:03:57 2012 -0800

    rcu: Simplify offline processing

    Move ->qsmaskinit and blkd_tasks[] manipulation to the CPU_DYING
    notifier.  This simplifies the code by eliminating a potential
    deadlock and by reducing the responsibilities of force_quiescent_state().
    Also rename functions to make their connection to the CPU-hotplug
    stages explicit.

    Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

introduced WARN_ON_ONCE(cpu_is_offline(smp_processor_id())) into the
__call_rcu() function. Paul also added cpu_offline checks to other routines
(e.g. callbacks) in later commits. It turns out that kmemleak triggers one
of them.

During CPU core offlining, the kfree() -> kmemleak_free() -> put_object() ->
__call_rcu() chain is executed for a struct intel_shared_regs * once the
structure has no users left on this core, i.e. when all of the core's CPUs
are dead or dying.

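To make the failure mode concrete, here is a minimal user-space sketch of the chain described above. It is only a toy model, not kernel code: every model_* function and every variable below is a hypothetical stand-in, and it assumes the dying CPU is already marked offline by the time the CPU_DYING notifier frees the structure.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy stand-ins for kernel state; none of this is real kernel code. */
static bool cpu_online_map[NR_CPUS] = { true, true, true, true };
static int current_cpu;       /* models smp_processor_id() */
static int warn_count;        /* number of warnings actually reported */
static int queued_callbacks;  /* number of modelled RCU callbacks queued */

static bool model_cpu_is_offline(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return !cpu_online_map[cpu];
}

/* Models WARN_ON_ONCE(): the condition is reported at most once. */
static void model_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
	}
}

/* Models __call_rcu(): checks the calling CPU, then queues the callback. */
static void model_call_rcu(void)
{
	model_warn_on_once(model_cpu_is_offline(current_cpu));
	queued_callbacks++;
}

/* Models put_object(): the last reference frees the object via RCU. */
static void model_put_object(int *use_count)
{
	if (--*use_count == 0)
		model_call_rcu();
}

/* Models kfree() -> kmemleak_free() dropping the tracking object. */
static void model_kmemleak_free(int *use_count)
{
	model_put_object(use_count);
}

/* Models the CPU_DYING notifier freeing per-core data on the dying CPU. */
static void model_cpu_dying(int cpu, int *use_count)
{
	cpu_online_map[cpu] = false;   /* assumption: CPU is already offline here */
	current_cpu = cpu;
	model_kmemleak_free(use_count);
}
```

Calling model_cpu_dying() for CPU 3 and then CPU 2 reports the warning once while still queueing a callback each time, which matches the single warning in the log below.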
[ 4703.342462] CPU 3 is now offline
[ 4705.588116] ------------[ cut here ]------------
[ 4705.588129] WARNING: at kernel/rcutree.c:1823 __call_rcu+0x9d/0x1d2()
[..]
[ 4705.588196] Call Trace:
[ 4705.588207] [<ffffffff81059a00>] ? synchronize_srcu+0x6/0x17
[ 4705.588215] [<ffffffff8103364e>] warn_slowpath_common+0x83/0x9c
[ 4705.588223] [<ffffffff8111e627>] ? get_object+0x31/0x31
[ 4705.588229] [<ffffffff81033681>] warn_slowpath_null+0x1a/0x1c
[ 4705.588235] [<ffffffff810af770>] __call_rcu+0x9d/0x1d2
[ 4705.588243] [<ffffffff81013f52>] ? intel_pmu_cpu_dying+0x3b/0x5d
[ 4705.588249] [<ffffffff810af8f1>] call_rcu_sched+0x17/0x19
[ 4705.588255] [<ffffffff8111eb7e>] put_object+0x47/0x4b
[ 4705.588261] [<ffffffff8111ed8b>] delete_object_full+0x2a/0x2e
[ 4705.588269] [<ffffffff81491dc8>] kmemleak_free+0x26/0x45
[ 4705.588274] [<ffffffff8111691f>] kfree+0x130/0x221
[ 4705.588280] [<ffffffff81013f52>] intel_pmu_cpu_dying+0x3b/0x5d
[ 4705.588287] [<ffffffff8149cb83>] x86_pmu_notifier+0xaf/0xb9
[ 4705.588296] [<ffffffff814b0e9d>] notifier_call_chain+0xac/0xd9
[ 4705.588303] [<ffffffff81059c9e>] __raw_notifier_call_chain+0xe/0x10
[ 4705.588309] [<ffffffff810354ec>] __cpu_notify+0x20/0x37
[ 4705.588314] [<ffffffff81035516>] cpu_notify+0x13/0x15
[ 4705.588320] [<ffffffff81490fab>] take_cpu_down+0x28/0x2e
[ 4705.588326] [<ffffffff8109ef7f>] stop_machine_cpu_stop+0x96/0xf1
[ 4705.588332] [<ffffffff8109ece3>] cpu_stopper_thread+0xe3/0x183
[ 4705.588338] [<ffffffff8109eee9>] ? queue_stop_cpus_work+0xd0/0xd0
[ 4705.588344] [<ffffffff814ad382>] ? _raw_spin_unlock_irqrestore+0x47/0x65
[ 4705.588353] [<ffffffff81087d0d>] ? trace_hardirqs_on_caller+0x119/0x175
[ 4705.588358] [<ffffffff81087d76>] ? trace_hardirqs_on+0xd/0xf
[ 4705.588364] [<ffffffff8109ec00>] ? cpu_stop_signal_done+0x2c/0x2c
[ 4705.588370] [<ffffffff810544a9>] kthread+0x8b/0x93
[ 4705.588378] [<ffffffff814b5f34>] kernel_thread_helper+0x4/0x10
[ 4705.588385] [<ffffffff814ad7f0>] ? retint_restore_args+0x13/0x13
[ 4705.588391] [<ffffffff8105441e>] ? __init_kthread_worker+0x5a/0x5a
[ 4705.588397] [<ffffffff814b5f30>] ? gs_change+0x13/0x13
[ 4705.588400] ---[ end trace 720328982e35a713 ]---
[ 4705.588507] CPU 2 is now offline

My first solution was to return early from delete_object() if the object
deallocation is performed on an offline CPU (cpu_is_offline(smp_processor_id())),
marking the object with a special flag, say OBJECT_ORPHAN, and to issue the
real delete_object() later, for example during a scan, when we see an
OBJECT_ORPHAN object.
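
The deferral idea can be sketched in user space as follows. This is a toy model under stated assumptions, not a kmemleak patch: the fixed-size table, the model_* helpers, the this_cpu_offline flag and the rcu_frees counter are all hypothetical, and objects are looked up by exact pointer rather than by address range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OBJECT_ALLOCATED (1u << 0)
#define OBJECT_ORPHAN    (1u << 1)  /* hypothetical flag, as proposed in the text */
#define MAX_OBJECTS      8

struct model_object {
	unsigned long ptr;
	unsigned int flags;
	bool in_use;                /* slot currently tracks an object */
};

static struct model_object objects[MAX_OBJECTS];
static bool this_cpu_offline;       /* models cpu_is_offline(smp_processor_id()) */
static int rcu_frees;               /* counts modelled call_rcu() invocations */

static struct model_object *model_lookup_object(unsigned long ptr)
{
	for (size_t i = 0; i < MAX_OBJECTS; i++)
		if (objects[i].in_use && objects[i].ptr == ptr)
			return &objects[i];
	return NULL;
}

/* Fails (returns NULL) if the pointer is already tracked or the table is full. */
static struct model_object *model_create_object(unsigned long ptr)
{
	if (model_lookup_object(ptr))
		return NULL;
	for (size_t i = 0; i < MAX_OBJECTS; i++) {
		if (!objects[i].in_use) {
			objects[i].in_use = true;
			objects[i].ptr = ptr;
			objects[i].flags = OBJECT_ALLOCATED;
			return &objects[i];
		}
	}
	return NULL;
}

/* The real deletion ends in call_rcu(), so it must run on an online CPU. */
static void model_really_delete(struct model_object *obj)
{
	assert(!this_cpu_offline);
	obj->in_use = false;
	obj->flags = 0;
	rcu_frees++;
}

/* On an offline CPU, only mark the object; otherwise delete it right away. */
static void model_delete_object(unsigned long ptr)
{
	struct model_object *obj = model_lookup_object(ptr);

	if (!obj)
		return;
	if (this_cpu_offline) {
		obj->flags &= ~OBJECT_ALLOCATED;
		obj->flags |= OBJECT_ORPHAN;
		return;
	}
	model_really_delete(obj);
}

/* The scan reaps orphans left behind; returns how many were deleted. */
static int model_scan(void)
{
	int reaped = 0;

	for (size_t i = 0; i < MAX_OBJECTS; i++) {
		if (objects[i].in_use && (objects[i].flags & OBJECT_ORPHAN)) {
			model_really_delete(&objects[i]);
			reaped++;
		}
	}
	return reaped;
}
```

In this model an orphan stays in the table until model_scan() runs, so a model_create_object() call for the same pointer in that window fails.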

That, however, requires handling a special case: when a CPU core is offlined
for only a short period of time, the orphaned object may not have been deleted
yet, leading to an object insertion error in create_object(). This can be
handled in two possible ways (assuming that lookup_object() returned an
OBJECT_ORPHAN object):

 #1 delete the orphaned object and retry the insertion (*)
 #2 re-set the existing orphan object

(*) performing delete_object() from within create_object() requires releasing
the held kmemleak and object locks, which is racy with other create_object()
calls and any possible scan() activity.

Yet I'm not exactly sure that option #2 is the correct one.
(I have a sort of patch [not properly tested, etc.] for option #2.)
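
For illustration only, option #2 might look like the following user-space sketch. This is not the patch mentioned above: the table, the model_* helpers and the choice of which fields get reset are assumptions, and objects are matched by exact pointer for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OBJECT_ALLOCATED (1u << 0)
#define OBJECT_ORPHAN    (1u << 1)  /* hypothetical flag, as proposed in the text */
#define MAX_OBJECTS      8

struct model_object {
	unsigned long ptr;
	size_t size;
	unsigned int flags;
	int count;                  /* references found by the last scan */
	bool in_use;                /* slot currently tracks an object */
};

static struct model_object objects[MAX_OBJECTS];

static struct model_object *model_lookup_object(unsigned long ptr)
{
	for (size_t i = 0; i < MAX_OBJECTS; i++)
		if (objects[i].in_use && objects[i].ptr == ptr)
			return &objects[i];
	return NULL;
}

/* Marks a tracked object as orphaned instead of deleting it. */
static void model_orphan_object(unsigned long ptr)
{
	struct model_object *obj = model_lookup_object(ptr);

	assert(obj != NULL);
	obj->flags &= ~OBJECT_ALLOCATED;
	obj->flags |= OBJECT_ORPHAN;
}

/*
 * Option #2: if the lookup hits an orphan, re-initialise it in place
 * instead of deleting it and retrying, so no locks have to be dropped.
 * A hit on a live (non-orphan) object is still an insertion error.
 */
static struct model_object *model_create_object(unsigned long ptr, size_t size)
{
	struct model_object *obj = model_lookup_object(ptr);

	if (obj) {
		if (!(obj->flags & OBJECT_ORPHAN))
			return NULL;        /* genuine duplicate */
	} else {
		for (size_t i = 0; i < MAX_OBJECTS && !obj; i++)
			if (!objects[i].in_use)
				obj = &objects[i];
		if (!obj)
			return NULL;        /* table full */
		obj->in_use = true;
		obj->ptr = ptr;
	}

	/* Fresh or recycled, the object starts from a clean state. */
	obj->size = size;
	obj->flags = OBJECT_ALLOCATED;
	obj->count = 0;
	return obj;
}
```

The sketch sidesteps the locking concern raised for option #1, but it does not show whether every per-object field can safely be reset in place, which is the open question.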
Sergey