From: Stephen Boyd <sboyd@kernel.org>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Guenter Roeck <linux@roeck-us.net>,
stable <stable@vger.kernel.org>,
Kishon Vijay Abraham I <kishon@ti.com>
Cc: <linux-clk@vger.kernel.org>
Subject: Re: Clock related crashes in v5.4.y-queue
Date: Wed, 01 Jan 2020 23:30:57 -0800 [thread overview]
Message-ID: <20200102073058.662A9215A4@mail.kernel.org> (raw)
In-Reply-To: <5869f050-7b3f-b950-bfb6-5601d2b30fbd@roeck-us.net>
(Happy New Year!)
Quoting Guenter Roeck (2020-01-01 19:41:40)
> On 1/1/20 6:44 PM, Guenter Roeck wrote:
> > Hi,
> >
> > I see a number of crashes in the latest v5.4.y-queue; please see below
> > for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
> > leak in clk_unregister()").
> >
> > The context suggests recovery from a failed driver probe, and it appears
> > that the memory is released twice. Interestingly, I don't see the problem
> > in mainline.
> >
>
> The reason for not seeing the crash in mainline is that macb_probe()
> doesn't fail there due to unrelated changes in the driver. If I force
> macb_probe() to fail in mainline, I see exactly the same crash.
>
> Effectively this means that upstream commit 8247470772be is broken;
> it may fix a memory leak in some situations, but it results in UAF
> and crashes otherwise.
>
> Stephen, any comments ? I must admit that I don't understand the clock
> code nor the commit in question; I would have assumed that the call
> to __clk_put() would release the clk data structure, not an explicit
> call to kfree().
The clk that the commit from Kishon is freeing is the first "consumer
handle" that we make when a clk is registered. That is returned to
anyone that calls clk_register(), or if the provider decides to access
clk_hw::clk directly, which is not desired but still exists for
historical reasons. It is also used when drivers call clk_get_parent()
and that API currently fails to reference count or even create a
per-call clk pointer.
The general idea is that each user of clk_get() should get a different
struct clk pointer to use. The problem is we have this semi-internal
struct clk pointer that leaks out of clk_get_parent(), __clk_lookup()
and clk_register().
Maybe someone is calling clk_unregister() twice with the same pointer
they got through different ways? The macb driver does some questionable
things, like calling clk_register() and then putting the returned
pointer into *tx_clk, but only for the sifive implementation. After that
it does even odder things, like calling clk_unregister() on a clk that
probably shouldn't be unregistered, except for on the sifive platforms
that register it. Pretty horrifying that clk_unregister() gives any
consumer the power to destroy a clk from the system!
Can you try this patch? I think by fixing the leak we've discovered more
problems.
----8<----
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 9c767ee252ac..7dce403fd27c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -4069,7 +4069,7 @@ static int fu540_c000_clk_init(struct platform_device *pdev, struct clk **pclk,
mgmt->rate = 0;
mgmt->hw.init = &init;
- *tx_clk = clk_register(NULL, &mgmt->hw);
+ *tx_clk = devm_clk_register(&pdev->dev, &mgmt->hw);
if (IS_ERR(*tx_clk))
return PTR_ERR(*tx_clk);
@@ -4397,7 +4397,6 @@ static int macb_probe(struct platform_device *pdev)
err_disable_clocks:
clk_disable_unprepare(tx_clk);
- clk_unregister(tx_clk);
clk_disable_unprepare(hclk);
clk_disable_unprepare(pclk);
clk_disable_unprepare(rx_clk);
@@ -4427,7 +4426,6 @@ static int macb_remove(struct platform_device *pdev)
pm_runtime_dont_use_autosuspend(&pdev->dev);
if (!pm_runtime_suspended(&pdev->dev)) {
clk_disable_unprepare(bp->tx_clk);
- clk_unregister(bp->tx_clk);
clk_disable_unprepare(bp->hclk);
clk_disable_unprepare(bp->pclk);
clk_disable_unprepare(bp->rx_clk);
>
> Guenter
>
> > I would suggest to drop that patch from the stable queue.
> >
> > Guenter
> >
> > ---
> > First traceback is:
> >
> > [ 19.203547] ------------[ cut here ]------------
> > [ 19.204107] WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:4034 __clk_put+0xfc/0x128
Presumably this is the
WARN_ON_ONCE(IS_ERR(clk))
in __clk_put()? Or is it the exclusive count check that is getting
tricked out because of page poisoning?
I guess we should put that in some sort of text form of warning instead
of a not so helpful line number.
> > [ 19.204275] Modules linked in:
> > [ 19.204634] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.8-rc1-00191-gaf408bc6c96e #1
> > [ 19.204790] Hardware name: Xilinx Zynq Platform
> > [ 19.204994] [<c0313658>] (unwind_backtrace) from [<c030d698>] (show_stack+0x10/0x14)
> > [ 19.205150] [<c030d698>] (show_stack) from [<c1139bdc>] (dump_stack+0xe0/0x10c)
> > [ 19.205278] [<c1139bdc>] (dump_stack) from [<c0349098>] (__warn+0xf4/0x10c)
> > [ 19.205399] [<c0349098>] (__warn) from [<c0349164>] (warn_slowpath_fmt+0xb4/0xbc)
> > [ 19.205522] [<c0349164>] (warn_slowpath_fmt) from [<c0956d14>] (__clk_put+0xfc/0x128)
> > [ 19.205654] [<c0956d14>] (__clk_put) from [<c0b1ea10>] (release_nodes+0x1c4/0x278)
> > [ 19.205780] [<c0b1ea10>] (release_nodes) from [<c0b1a220>] (really_probe+0x108/0x34c)
> > [ 19.205908] [<c0b1a220>] (really_probe) from [<c0b1a5dc>] (driver_probe_device+0x60/0x174)
> > [ 19.206042] [<c0b1a5dc>] (driver_probe_device) from [<c0b1a898>] (device_driver_attach+0x58/0x60)
> > [ 19.206179] [<c0b1a898>] (device_driver_attach) from [<c0b1a924>] (__driver_attach+0x84/0xc0)
> > [ 19.206313] [<c0b1a924>] (__driver_attach) from [<c0b18400>] (bus_for_each_dev+0x78/0xb8)
> > [ 19.206463] [<c0b18400>] (bus_for_each_dev) from [<c0b195e8>] (bus_add_driver+0x164/0x1e8)
> > [ 19.206590] [<c0b195e8>] (bus_add_driver) from [<c0b1b6fc>] (driver_register+0x74/0x108)
> > [ 19.206723] [<c0b1b6fc>] (driver_register) from [<c030315c>] (do_one_initcall+0x8c/0x3bc)
> > [ 19.206857] [<c030315c>] (do_one_initcall) from [<c1a01080>] (kernel_init_freeable+0x14c/0x1e8)
> > [ 19.206992] [<c1a01080>] (kernel_init_freeable) from [<c11547a4>] (kernel_init+0x8/0x118)
> > [ 19.207116] [<c11547a4>] (kernel_init) from [<c03010b4>] (ret_from_fork+0x14/0x20)
> >
> > followed by:
> >
> > [ 19.209792] 8<--- cut here ---
> > [ 19.209926] Unable to handle kernel paging request at virtual address 6b6b6bb3
> > [ 19.210117] pgd = (ptrval)
> > [ 19.210207] [6b6b6bb3] *pgd=00000000
> > [ 19.210626] Internal error: Oops: 5 [#1] SMP ARM
> > [ 19.210807] Modules linked in:
> > [ 19.210956] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.4.8-rc1-00191-gaf408bc6c96e #1
> > [ 19.211090] Hardware name: Xilinx Zynq Platform
> > [ 19.211200] PC is at __clk_put+0x104/0x128
> > [ 19.211274] LR is at __clk_put+0xfc/0x128
> > [ 19.211349] pc : [<c0956d1c>] lr : [<c0956d14>] psr: 60000053
> > [ 19.211446] sp : c7129dd8 ip : 00000000 fp : c59f1680
> > [ 19.211534] r10: c72fb6ac r9 : c0b1dbd0 r8 : 00000008
> > [ 19.211626] r7 : c7129e04 r6 : c72fb410 r5 : c59f0880 r4 : c59f3180
> > [ 19.211727] r3 : 7a538c1d r2 : 6b6b6b6b r1 : 6b6b6b6b r0 : 00000000
> > [ 19.211885] Flags: nZCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
> > [ 19.212022] Control: 10c5387d Table: 00204059 DAC: 00000051
> > [ 19.212152] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
> > [ 19.212270] Stack: (0xc7129dd8 to 0xc712a000)
> > [ 19.212391] 9dc0: c59f1680 c59f0880
> > [ 19.212608] 9de0: c72fb410 c0b1ea10 ffffffed 00000000 c0b1e404 c7128000 c72fb410 a0000053
> > [ 19.212822] 9e00: c72fb68c c59f1c80 c59f1480 7a538c1d 00000001 c241e19c c72fb410 c241e1a0
> > [ 19.213029] 9e20: 00000000 c1d8a1ac 00000000 ffffffed c1b8124c c0b1a220 c72fb410 c1d8a1ac
> > [ 19.213240] 9e40: c1d8a1ac c7128000 c1dc347c 00000007 000001f6 c0b1a5dc c1d8a1ac c1d8a1ac
> > [ 19.213462] 9e60: c7128000 c72fb410 00000000 c1d8a1ac c7128000 c1dc347c 00000007 000001f6
> > [ 19.213683] 9e80: c1b8124c c0b1a898 00000000 c1d8a1ac c72fb410 c0b1a924 00000000 c1d8a1ac
> > [ 19.213899] 9ea0: c0b1a8a0 c0b18400 c70b50d4 c70b50a4 c725d210 7a538c1d c70b50d4 c1d8a1ac
> > [ 19.214115] 9ec0: c59f0280 c1d6dd50 00000000 c0b195e8 c185eb44 c1aab944 00000000 c1d8a1ac
> > [ 19.214343] 9ee0: c1aab944 00000000 c1c08468 c0b1b6fc c1dc46c0 c1aab944 00000000 c030315c
> > [ 19.214555] 9f00: c1959bf0 000001f6 000001f6 c0372600 00000000 c19574b8 c1883c18 00000000
> > [ 19.214783] 9f20: c7128000 c03b3f70 c7128000 c1dd1f00 c1c08468 c1ae7870 c1a00590 00000007
> > [ 19.215001] 9f40: 000001f6 c03d39b8 00000000 7a538c1d c1dc347c c1dd1f00 c1dd1f00 c1ae7850
> > [ 19.215214] 9f60: c1ae7870 c1a00590 00000007 c1a01080 00000006 00000006 00000000 c1a00590
> > [ 19.215429] 9f80: 00000000 00000000 c115479c 00000000 00000000 00000000 00000000 00000000
> > [ 19.215636] 9fa0: 00000000 c11547a4 00000000 c03010b4 00000000 00000000 00000000 00000000
> > [ 19.215843] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [ 19.216068] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> > [ 19.216255] [<c0956d1c>] (__clk_put) from [<c0b1ea10>] (release_nodes+0x1c4/0x278)
> > [ 19.216376] [<c0b1ea10>] (release_nodes) from [<c0b1a220>] (really_probe+0x108/0x34c)
> > [ 19.216494] [<c0b1a220>] (really_probe) from [<c0b1a5dc>] (driver_probe_device+0x60/0x174)
> > [ 19.216617] [<c0b1a5dc>] (driver_probe_device) from [<c0b1a898>] (device_driver_attach+0x58/0x60)
> > [ 19.216745] [<c0b1a898>] (device_driver_attach) from [<c0b1a924>] (__driver_attach+0x84/0xc0)
> > [ 19.216867] [<c0b1a924>] (__driver_attach) from [<c0b18400>] (bus_for_each_dev+0x78/0xb8)
> > [ 19.216993] [<c0b18400>] (bus_for_each_dev) from [<c0b195e8>] (bus_add_driver+0x164/0x1e8)
> > [ 19.217112] [<c0b195e8>] (bus_add_driver) from [<c0b1b6fc>] (driver_register+0x74/0x108)
> > [ 19.217233] [<c0b1b6fc>] (driver_register) from [<c030315c>] (do_one_initcall+0x8c/0x3bc)
> > [ 19.217358] [<c030315c>] (do_one_initcall) from [<c1a01080>] (kernel_init_freeable+0x14c/0x1e8)
> > [ 19.217500] [<c1a01080>] (kernel_init_freeable) from [<c11547a4>] (kernel_init+0x8/0x118)
> > [ 19.217624] [<c11547a4>] (kernel_init) from [<c03010b4>] (ret_from_fork+0x14/0x20)
>
next prev parent reply other threads:[~2020-01-02 7:31 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-02 2:44 Clock related crashes in v5.4.y-queue Guenter Roeck
2020-01-02 3:41 ` Guenter Roeck
2020-01-02 7:30 ` Stephen Boyd [this message]
2020-01-02 14:13 ` Guenter Roeck
2020-01-02 14:19 ` Naresh Kamboju
2020-01-02 14:38 ` Guenter Roeck
2020-01-02 21:01 ` Greg Kroah-Hartman
2020-01-02 21:28 ` Guenter Roeck
2020-01-03 0:40 ` Sasha Levin
2020-01-03 14:50 ` Guenter Roeck
2020-01-05 16:02 ` Sasha Levin
2020-01-05 16:34 ` Guenter Roeck
2020-01-05 19:10 ` Sasha Levin
2020-01-06 13:20 ` Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200102073058.662A9215A4@mail.kernel.org \
--to=sboyd@kernel.org \
--cc=gregkh@linuxfoundation.org \
--cc=kishon@ti.com \
--cc=linux-clk@vger.kernel.org \
--cc=linux@roeck-us.net \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).