public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: Robert Richter <rrichter@marvell.com>
Cc: Borislav Petkov <bp@alien8.de>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	James Morse <james.morse@arm.com>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	wanghuiqiang <wanghuiqiang@huawei.com>,
	Xiaofei Tan <tanxiaofei@huawei.com>,
	Linuxarm <linuxarm@huawei.com>,
	"Huangming (Mark)" <huangming23@huawei.com>
Subject: Re: linuxnext-2019119 edac warns (was Re: edac KASAN warning in experimental arm64 allmodconfig boot)
Date: Thu, 21 Nov 2019 15:23:42 +0000	[thread overview]
Message-ID: <4ff7631f-fbb7-e45f-87dd-9223beca4da7@huawei.com> (raw)
In-Reply-To: <20191121142302.rhvgkgqpiubidhtu@rric.localdomain>

On 21/11/2019 14:23, Robert Richter wrote:
> Hi John,
> 
> thanks for testing and reporting this. See inline.
> 
> On 21.11.19 12:34:22, John Garry wrote:
>> On 14/10/2019 16:18, John Garry wrote:
>> JFYI, I see an issue on linuxnext-2019119, as follows:
>>
>>     21.645388] io scheduler kyber registered
>> [   21.734011] input: Power Button as
>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
>> [   21.743295] ACPI: Power Button [PWRB]
>> [   21.809644] [Firmware Bug]: APEI: Invalid bit width + offset in GAR
>> [0x94110034/64/0/3/0]
>> [   21.821974] EDAC MC0: Giving out device to module ghes_edac.c controller
>> ghes_edac: DEV ghes (INTERRUPT)
>> [   21.831763] ------------[ cut here ]------------
>> [   21.836374] refcount_t: increment on 0; use-after-free.
>> [   21.841620] WARNING: CPU: 36 PID: 1 at lib/refcount.c:156
>> refcount_inc_checked+0x44/0x50
>> [   21.849697] Modules linked in:
>> [   21.852745] CPU: 36 PID: 1 Comm: swapper/0 Not tainted
>> 5.4.0-rc8-next-20191119-00003-g141a9fef5092-dirty #650
>> [   21.862645] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 -
>> V1.16.01 03/15/2019
>> [   21.871157] pstate: 60c00009 (nZCv daif +PAN +UAO)
>> [   21.875936] pc : refcount_inc_checked+0x44/0x50
>> [   21.880455] lr : refcount_inc_checked+0x44/0x50
> 
> This is a warning from the refcount framework. It warns if we
> increment from zero. This is reasonable as typically a kernel object
> is created with a refcount of 1 and thrown away once the refcount is
> zero. Afterwards the object is used-after-free.
> 
> For ghes the refcount is initialized with zero, and that is why we see
> this message. However, we protect the refcount with the ghes_reg_mutex
> and thus there is no use after free. The device is allocated and
> registered if the refcount is zero. So this works fine.
> 
> Enclosed a fix that avoids the warning, please test.
> 
> But see below...
> 
>> [   21.884972] sp : ffff00236ffbf8a0
>> [   21.888274] x29: ffff00236ffbf8a0 x28: 0000000000000002
>> [   21.893576] x27: ffff00236cd07900 x26: ffff002369063010
>> [   21.898876] x25: 0000000000000000 x24: ffff00233c236824
>> [   21.904177] x23: ffffa000137b9000 x22: ffffa00016fbb7c0
>> [   21.909477] x21: ffffa00012dfd000 x20: 1fffe0046dff7f24
>> [   21.914777] x19: ffff00233c236000 x18: 0000000000000000
>> [   21.920077] x17: 0000000000000000 x16: 0000000000000000
>> [   21.925377] x15: 0000000000007700 x14: 64655f7365686720
>> [   21.930677] x13: 72656c6c6f72746e x12: 1ffff40002719618
>> [   21.935977] x11: ffff940002719618 x10: dfffa00000000000
>> [   21.941278] x9 : ffff940002719619 x8 : 0000000000000001
>> [   21.946578] x7 : 0000000000000000 x6 : 0000000000000001
>> [   21.951877] x5 : ffff940002719618 x4 : ffff00236ffb0010
>> [   21.957178] x3 : ffffa000112415e4 x2 : ffff80046dff7ede
>> [   21.962478] x1 : 5aff78756b1cf400 x0 : 0000000000000000
>> [   21.967779] Call trace:
>> [   21.970214]  refcount_inc_checked+0x44/0x50
>> [   21.974389]  ghes_edac_register+0x258/0x388
>> [   21.978562]  ghes_probe+0x28c/0x5f0
>> [   21.982041]  platform_drv_probe+0x70/0xd8
>> [   21.986039]  really_probe+0x174/0x468
>> [   21.989690]  driver_probe_device+0x7c/0x148
>> [   21.993862]  device_driver_attach+0x94/0xa0
>> [   21.998033]  __driver_attach+0xa4/0x110
>> [   22.001857]  bus_for_each_dev+0xe8/0x158
>> [   22.005768]  driver_attach+0x30/0x40
>> [   22.009331]  bus_add_driver+0x234/0x2f0
>> [   22.013156]  driver_register+0xbc/0x1d0
>> [   22.016981]  __platform_driver_register+0x7c/0x88
>> [   22.021675]  ghes_init+0xbc/0x14c
>> [   22.024979]  do_one_initcall+0xb4/0x254
>> [   22.028805]  kernel_init_freeable+0x248/0x2f4
>> [   22.033151]  kernel_init+0x10/0x118
>> [   22.036628]  ret_from_fork+0x10/0x18
>> [   22.040194] ---[ end trace 33655bb65a9835fe ]---
>> [   22.046666] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.046666]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.058311] ghes_edac: Can't register at EDAC core
>> [   22.065402] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.065402]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.077080] ghes_edac: Can't register at EDAC core
>> [   22.084140] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.084140]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.095789] ghes_edac: Can't register at EDAC core
>> [   22.102873] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.102873]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.115442] ghes_edac: Can't register at EDAC core
>> [   22.122536] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.122536]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.134344] ghes_edac: Can't register at EDAC core
>> [   22.141441] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.141441]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.153089] ghes_edac: Can't register at EDAC core
>> [   22.160161] EDAC MC: bug in low-level driver: attempt to assign
>> [   22.160161]     duplicate mc_idx 0 in add_mc_to_global_list()
>> [   22.171810] ghes_edac: Can't register at EDAC core
> 
> What I am more concerned is this here. In total this implies 8 ghes
> users that all try to register a (single-instance) ghes mc device. For
> non-x86 only one instance is allowed (see ghes_edac_register(), idx =
> 0).
> 

[cc some guys about HEST]

> So on your platform, when parsing the HEST table
> (hest_ghes_dev_register()), more than one "GHES" device is parsed,
> allocated and registered. Mind sending me your HEST table
> (/sys/firmware/acpi/tables/HEST), or explain what happens here? 

I think that this should be the same:
https://github.com/tianocore/edk2-platforms/tree/master/Silicon/Hisilicon/Hi1620/Drivers/Apei/Hest


If
> this is a valid use case, we need to change ghes_edac_register() to
> support more than one instance.
> 
> Again, please try the patch below.
> 
> Thanks,
> 
> -Robert
> 
> 
>>From 6962f8af4a7c1051c9e87a5ac60571f70d2b6814 Mon Sep 17 00:00:00 2001
> From: Robert Richter <rrichter@marvell.com>
> Date: Thu, 21 Nov 2019 15:01:28 +0100
> Subject: [PATCH] EDAC/ghes: Do not warn on when increment refcount on 0
> 
> Signed-off-by: Robert Richter <rrichter@marvell.com>
> ---
>   drivers/edac/ghes_edac.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 47f4e7f90ef0..b99080d8a10c 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -556,8 +556,8 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>   	ghes_pvt = pvt;
>   	spin_unlock_irqrestore(&ghes_lock, flags);
>   
> -	/* only increment on success */
> -	refcount_inc(&ghes_refcount);
> +	/* only set on success */
> +	refcount_set(&ghes_refcount, 1);

Yep, that seems to have silenced it all:

[   21.739895] input: Power Button as 
/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[   21.749848] ACPI: Power Button [PWRB]
[   21.816977] [Firmware Bug]: APEI: Invalid bit width + offset in GAR 
[0x94110034/64/0/3/0]
[   21.827880] EDAC MC0: Giving out device to module ghes_edac.c 
controller ghes_edac: DEV ghes (INTERRUPT)
[   21.841313] GHES: APEI firmware first mode is enabled by APEI bit and 
WHEA _OSC.
[   21.849176] EINJ: Error INJection is initialized.
[   21.855135] ACPI GTDT: found 1 SBSA generic Watchdog(s).

Thanks,
John

>   
>   unlock:
>   	mutex_unlock(&ghes_reg_mutex);
> 


Cheers,
John

  reply	other threads:[~2019-11-21 15:24 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-14 15:18 edac KASAN warning in experimental arm64 allmodconfig boot John Garry
2019-10-14 16:09 ` Borislav Petkov
2019-10-14 16:44   ` John Garry
2019-10-14 16:15 ` James Morse
2019-10-14 16:56   ` John Garry
2019-10-14 16:57     ` Borislav Petkov
2019-11-21 12:34 ` linuxnext-2019119 edac warns (was Re: edac KASAN warning in experimental arm64 allmodconfig boot) John Garry
2019-11-21 14:23   ` Robert Richter
2019-11-21 15:23     ` John Garry [this message]
2019-11-21 21:36       ` [PATCH] EDAC/ghes: Do not warn when incrementing refcount on 0 Robert Richter
2019-11-22  9:01         ` Borislav Petkov
2019-11-26  9:57           ` John Garry
2019-11-22 11:28       ` linuxnext-2019119 edac warns (was Re: edac KASAN warning in experimental arm64 allmodconfig boot) Robert Richter
2019-11-26  9:59         ` John Garry
2019-11-27 17:07           ` linuxnext-2019127 " John Garry
2019-11-27 20:54             ` Robert Richter
2019-11-28 11:02               ` linuxnext-20191127 " John Garry
2019-11-28 16:44                 ` Borislav Petkov
2019-11-28 21:12             ` linuxnext-2019127 " Robert Richter
2019-12-02 10:23               ` John Garry
2019-12-02 11:46                 ` Robert Richter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ff7631f-fbb7-e45f-87dd-9223beca4da7@huawei.com \
    --to=john.garry@huawei.com \
    --cc=bp@alien8.de \
    --cc=huangming23@huawei.com \
    --cc=james.morse@arm.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=mchehab@kernel.org \
    --cc=rrichter@marvell.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=wanghuiqiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox