All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Boyd <sboyd@kernel.org>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Guenter Roeck <linux@roeck-us.net>,
	stable <stable@vger.kernel.org>,
	Kishon Vijay Abraham I <kishon@ti.com>
Cc: <linux-clk@vger.kernel.org>
Subject: Re: Clock related crashes in v5.4.y-queue
Date: Wed, 01 Jan 2020 23:30:57 -0800	[thread overview]
Message-ID: <20200102073058.662A9215A4@mail.kernel.org> (raw)
In-Reply-To: <5869f050-7b3f-b950-bfb6-5601d2b30fbd@roeck-us.net>

(Happy New Year!)

Quoting Guenter Roeck (2020-01-01 19:41:40)
> On 1/1/20 6:44 PM, Guenter Roeck wrote:
> > Hi,
> > 
> > I see a number of crashes in the latest v5.4.y-queue; please see below
> > for details. The problem bisects to commit 54a311c5d3988d ("clk: Fix memory
> > leak in clk_unregister()").
> > 
> > The context suggests recovery from a failed driver probe, and it appears
> > that the memory is released twice. Interestingly, I don't see the problem
> > in mainline.
> > 
> 
> The reason for not seeing the crash in mainline is that macb_probe()
> doesn't fail there due to unrelated changes in the driver. If I force
> macb_probe() to fail in mainline, I see exactly the same crash.
> 
> Effectively this means that upstream commit 8247470772be is broken;
> it may fix a memory leak in some situations, but it results in UAF
> and crashes otherwise.
> 
> Stephen, any comments ? I must admit that I don't understand the clock
> code nor the commit in question; I would have assumed that the call
> to __clk_put() would release the clk data structure, not an explicit
> call to kfree().

The clk that the commit from Kishon is freeing is the first "consumer
handle" that we make when a clk is registered. That is returned to
anyone that calls clk_register(), or if the provider decides to access
clk_hw::clk directly, which is not desired but still exists for
historical reasons. It is also used when drivers call clk_get_parent()
and that API currently fails to reference count or even create a
per-call clk pointer.

The general idea is that each user of clk_get() should get a different
struct clk pointer to use. The problem is we have this semi-internal
struct clk pointer that leaks out of clk_get_parent(), __clk_lookup()
and clk_register().

Maybe someone is calling clk_unregister() twice with the same pointer
they got through different ways? The macb driver does some questionable
things, like calling clk_register() and then putting the returned
pointer into *tx_clk, but only for the sifive implementation. After that
it does even odder things, like calling clk_unregister() on a clk that
probably shouldn't be unregistered, except for on the sifive platforms
that register it. Pretty horrifying that clk_unregister() gives any
consumer the power to destroy a clk from the system!

Can you try this patch? I think by fixing the leak we've discovered more
problems.

----8<----
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 9c767ee252ac..7dce403fd27c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -4069,7 +4069,7 @@ static int fu540_c000_clk_init(struct platform_device *pdev, struct clk **pclk,
 	mgmt->rate = 0;
 	mgmt->hw.init = &init;
 
-	*tx_clk = clk_register(NULL, &mgmt->hw);
+	*tx_clk = devm_clk_register(&pdev->dev, &mgmt->hw);
 	if (IS_ERR(*tx_clk))
 		return PTR_ERR(*tx_clk);
 
@@ -4397,7 +4397,6 @@ static int macb_probe(struct platform_device *pdev)
 
 err_disable_clocks:
 	clk_disable_unprepare(tx_clk);
-	clk_unregister(tx_clk);
 	clk_disable_unprepare(hclk);
 	clk_disable_unprepare(pclk);
 	clk_disable_unprepare(rx_clk);
@@ -4427,7 +4426,6 @@ static int macb_remove(struct platform_device *pdev)
 		pm_runtime_dont_use_autosuspend(&pdev->dev);
 		if (!pm_runtime_suspended(&pdev->dev)) {
 			clk_disable_unprepare(bp->tx_clk);
-			clk_unregister(bp->tx_clk);
 			clk_disable_unprepare(bp->hclk);
 			clk_disable_unprepare(bp->pclk);
 			clk_disable_unprepare(bp->rx_clk);

> 
> Guenter
> 
> > I would suggest to drop that patch from the stable queue.
> > 
> > Guenter
> > 
> > ---
> > First traceback is:
> > 
> > [   19.203547] ------------[ cut here ]------------
> > [   19.204107] WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:4034 __clk_put+0xfc/0x128

Presumably this is the 

	WARN_ON_ONCE(IS_ERR(clk))

in __clk_put()? Or is it the exclusive count check that is getting
tricked out because of page poisoning?

I guess we should put that in some sort of text form of warning instead
of a not so helpful line number.

> > [   19.204275] Modules linked in:
> > [   19.204634] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.8-rc1-00191-gaf408bc6c96e #1
> > [   19.204790] Hardware name: Xilinx Zynq Platform
> > [   19.204994] [<c0313658>] (unwind_backtrace) from [<c030d698>] (show_stack+0x10/0x14)
> > [   19.205150] [<c030d698>] (show_stack) from [<c1139bdc>] (dump_stack+0xe0/0x10c)
> > [   19.205278] [<c1139bdc>] (dump_stack) from [<c0349098>] (__warn+0xf4/0x10c)
> > [   19.205399] [<c0349098>] (__warn) from [<c0349164>] (warn_slowpath_fmt+0xb4/0xbc)
> > [   19.205522] [<c0349164>] (warn_slowpath_fmt) from [<c0956d14>] (__clk_put+0xfc/0x128)
> > [   19.205654] [<c0956d14>] (__clk_put) from [<c0b1ea10>] (release_nodes+0x1c4/0x278)
> > [   19.205780] [<c0b1ea10>] (release_nodes) from [<c0b1a220>] (really_probe+0x108/0x34c)
> > [   19.205908] [<c0b1a220>] (really_probe) from [<c0b1a5dc>] (driver_probe_device+0x60/0x174)
> > [   19.206042] [<c0b1a5dc>] (driver_probe_device) from [<c0b1a898>] (device_driver_attach+0x58/0x60)
> > [   19.206179] [<c0b1a898>] (device_driver_attach) from [<c0b1a924>] (__driver_attach+0x84/0xc0)
> > [   19.206313] [<c0b1a924>] (__driver_attach) from [<c0b18400>] (bus_for_each_dev+0x78/0xb8)
> > [   19.206463] [<c0b18400>] (bus_for_each_dev) from [<c0b195e8>] (bus_add_driver+0x164/0x1e8)
> > [   19.206590] [<c0b195e8>] (bus_add_driver) from [<c0b1b6fc>] (driver_register+0x74/0x108)
> > [   19.206723] [<c0b1b6fc>] (driver_register) from [<c030315c>] (do_one_initcall+0x8c/0x3bc)
> > [   19.206857] [<c030315c>] (do_one_initcall) from [<c1a01080>] (kernel_init_freeable+0x14c/0x1e8)
> > [   19.206992] [<c1a01080>] (kernel_init_freeable) from [<c11547a4>] (kernel_init+0x8/0x118)
> > [   19.207116] [<c11547a4>] (kernel_init) from [<c03010b4>] (ret_from_fork+0x14/0x20)
> > 
> > followed by:
> > 
> > [   19.209792] 8<--- cut here ---
> > [   19.209926] Unable to handle kernel paging request at virtual address 6b6b6bb3
> > [   19.210117] pgd = (ptrval)
> > [   19.210207] [6b6b6bb3] *pgd=00000000
> > [   19.210626] Internal error: Oops: 5 [#1] SMP ARM
> > [   19.210807] Modules linked in:
> > [   19.210956] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.4.8-rc1-00191-gaf408bc6c96e #1
> > [   19.211090] Hardware name: Xilinx Zynq Platform
> > [   19.211200] PC is at __clk_put+0x104/0x128
> > [   19.211274] LR is at __clk_put+0xfc/0x128
> > [   19.211349] pc : [<c0956d1c>]    lr : [<c0956d14>]    psr: 60000053
> > [   19.211446] sp : c7129dd8  ip : 00000000  fp : c59f1680
> > [   19.211534] r10: c72fb6ac  r9 : c0b1dbd0  r8 : 00000008
> > [   19.211626] r7 : c7129e04  r6 : c72fb410  r5 : c59f0880  r4 : c59f3180
> > [   19.211727] r3 : 7a538c1d  r2 : 6b6b6b6b  r1 : 6b6b6b6b  r0 : 00000000
> > [   19.211885] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
> > [   19.212022] Control: 10c5387d  Table: 00204059  DAC: 00000051
> > [   19.212152] Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
> > [   19.212270] Stack: (0xc7129dd8 to 0xc712a000)
> > [   19.212391] 9dc0:                                                       c59f1680 c59f0880
> > [   19.212608] 9de0: c72fb410 c0b1ea10 ffffffed 00000000 c0b1e404 c7128000 c72fb410 a0000053
> > [   19.212822] 9e00: c72fb68c c59f1c80 c59f1480 7a538c1d 00000001 c241e19c c72fb410 c241e1a0
> > [   19.213029] 9e20: 00000000 c1d8a1ac 00000000 ffffffed c1b8124c c0b1a220 c72fb410 c1d8a1ac
> > [   19.213240] 9e40: c1d8a1ac c7128000 c1dc347c 00000007 000001f6 c0b1a5dc c1d8a1ac c1d8a1ac
> > [   19.213462] 9e60: c7128000 c72fb410 00000000 c1d8a1ac c7128000 c1dc347c 00000007 000001f6
> > [   19.213683] 9e80: c1b8124c c0b1a898 00000000 c1d8a1ac c72fb410 c0b1a924 00000000 c1d8a1ac
> > [   19.213899] 9ea0: c0b1a8a0 c0b18400 c70b50d4 c70b50a4 c725d210 7a538c1d c70b50d4 c1d8a1ac
> > [   19.214115] 9ec0: c59f0280 c1d6dd50 00000000 c0b195e8 c185eb44 c1aab944 00000000 c1d8a1ac
> > [   19.214343] 9ee0: c1aab944 00000000 c1c08468 c0b1b6fc c1dc46c0 c1aab944 00000000 c030315c
> > [   19.214555] 9f00: c1959bf0 000001f6 000001f6 c0372600 00000000 c19574b8 c1883c18 00000000
> > [   19.214783] 9f20: c7128000 c03b3f70 c7128000 c1dd1f00 c1c08468 c1ae7870 c1a00590 00000007
> > [   19.215001] 9f40: 000001f6 c03d39b8 00000000 7a538c1d c1dc347c c1dd1f00 c1dd1f00 c1ae7850
> > [   19.215214] 9f60: c1ae7870 c1a00590 00000007 c1a01080 00000006 00000006 00000000 c1a00590
> > [   19.215429] 9f80: 00000000 00000000 c115479c 00000000 00000000 00000000 00000000 00000000
> > [   19.215636] 9fa0: 00000000 c11547a4 00000000 c03010b4 00000000 00000000 00000000 00000000
> > [   19.215843] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [   19.216068] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> > [   19.216255] [<c0956d1c>] (__clk_put) from [<c0b1ea10>] (release_nodes+0x1c4/0x278)
> > [   19.216376] [<c0b1ea10>] (release_nodes) from [<c0b1a220>] (really_probe+0x108/0x34c)
> > [   19.216494] [<c0b1a220>] (really_probe) from [<c0b1a5dc>] (driver_probe_device+0x60/0x174)
> > [   19.216617] [<c0b1a5dc>] (driver_probe_device) from [<c0b1a898>] (device_driver_attach+0x58/0x60)
> > [   19.216745] [<c0b1a898>] (device_driver_attach) from [<c0b1a924>] (__driver_attach+0x84/0xc0)
> > [   19.216867] [<c0b1a924>] (__driver_attach) from [<c0b18400>] (bus_for_each_dev+0x78/0xb8)
> > [   19.216993] [<c0b18400>] (bus_for_each_dev) from [<c0b195e8>] (bus_add_driver+0x164/0x1e8)
> > [   19.217112] [<c0b195e8>] (bus_add_driver) from [<c0b1b6fc>] (driver_register+0x74/0x108)
> > [   19.217233] [<c0b1b6fc>] (driver_register) from [<c030315c>] (do_one_initcall+0x8c/0x3bc)
> > [   19.217358] [<c030315c>] (do_one_initcall) from [<c1a01080>] (kernel_init_freeable+0x14c/0x1e8)
> > [   19.217500] [<c1a01080>] (kernel_init_freeable) from [<c11547a4>] (kernel_init+0x8/0x118)
> > [   19.217624] [<c11547a4>] (kernel_init) from [<c03010b4>] (ret_from_fork+0x14/0x20)
> 

  reply	other threads:[~2020-01-02  7:31 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-02  2:44 Clock related crashes in v5.4.y-queue Guenter Roeck
2020-01-02  3:41 ` Guenter Roeck
2020-01-02  7:30   ` Stephen Boyd [this message]
2020-01-02 14:13     ` Guenter Roeck
2020-01-02 14:19       ` Naresh Kamboju
2020-01-02 14:38         ` Guenter Roeck
2020-01-02 21:01 ` Greg Kroah-Hartman
2020-01-02 21:28   ` Guenter Roeck
2020-01-03  0:40     ` Sasha Levin
2020-01-03 14:50       ` Guenter Roeck
2020-01-05 16:02         ` Sasha Levin
2020-01-05 16:34           ` Guenter Roeck
2020-01-05 19:10             ` Sasha Levin
2020-01-06 13:20             ` Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200102073058.662A9215A4@mail.kernel.org \
    --to=sboyd@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=kishon@ti.com \
    --cc=linux-clk@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.