From: Corey Minyard <minyard@acm.org>
To: Xianting Tian <tian.xianting@h3c.com>
Cc: arnd@arndb.de, gregkh@linuxfoundation.org,
openipmi-developer@lists.sourceforge.net,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ipmi: retry to get device id when error
Date: Sun, 13 Sep 2020 07:39:30 -0500 [thread overview]
Message-ID: <20200913123930.GH15602@minyard.net> (raw)
In-Reply-To: <20200913120203.3368-1-tian.xianting@h3c.com>
On Sun, Sep 13, 2020 at 08:02:03PM +0800, Xianting Tian wrote:
> We can't get bmc's device id with low probability when loading ipmi driver,
> it caused bmc device register failed. This issue may caused by bad lpc
> signal quality. When this issue happened, we got below kernel printks:
> [Wed Sep 9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler: device id demangle failed: -22
> [Wed Sep 9 19:52:03 2020] IPMI BT: using default values
> [Wed Sep 9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
> [Wed Sep 9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the device id: -5
> [Wed Sep 9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register device: error -5
>
> When this issue happened, we want to manually unload the driver and try to
> load it again, but it can't be unloaded by 'rmmod' as it is already 'in use'.
I'm not sure this patch is a good idea; it would cause a long boot delay
in situations where there really isn't a BMC out there. Yes, it
happens.
You don't have to reload the driver to add a device, though. You can
hot-add devices using /sys/modules/ipmi_si/parameters/hotmod. Look in
Documentation/driver-api/ipmi.rst for details.
Does that work for you?
-corey
>
> We add below 'printk' in handle_one_recv_msg(), when this issue happened,
> the msg we received is "Recv: 1c 01 d5", which means the data_len is 1,
> data[0] is 0xd5.
> Debug code:
> static int handle_one_recv_msg(struct ipmi_smi *intf,
> struct ipmi_smi_msg *msg) {
> printk("Recv: %*ph\n", msg->rsp_size, msg->rsp);
> ... ...
> }
> Then in ipmi_demangle_device_id(), it returned '-EINVAL' as 'data_len < 7'
> and 'data[0] != 0'.
>
> We used this patch to retry to get device id when error happen, we
> reproduced this issue again and the retry succeed on the first retry, we
> finally got the correct msg and then all is ok:
> Recv: 1c 01 00 01 81 05 84 02 af db 07 00 01 00 b9 00 10 00
>
> So use retry machanism in this patch to give bmc more opportunity to
> correctly response kernel.
>
> Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
> ---
> drivers/char/ipmi/ipmi_msghandler.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index 737c0b6b2..bfb2de77a 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -34,6 +34,7 @@
> #include <linux/uuid.h>
> #include <linux/nospec.h>
> #include <linux/vmalloc.h>
> +#include <linux/delay.h>
>
> #define IPMI_DRIVER_VERSION "39.2"
>
> @@ -60,6 +61,9 @@ enum ipmi_panic_event_op {
> #else
> #define IPMI_PANIC_DEFAULT IPMI_SEND_PANIC_EVENT_NONE
> #endif
> +
> +#define GET_DEVICE_ID_MAX_RETRY 5
> +
> static enum ipmi_panic_event_op ipmi_send_panic_event = IPMI_PANIC_DEFAULT;
>
> static int panic_op_write_handler(const char *val,
> @@ -2426,19 +2430,26 @@ send_get_device_id_cmd(struct ipmi_smi *intf)
> static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)
> {
> int rv;
> -
> - bmc->dyn_id_set = 2;
> + unsigned int retry_count = 0;
>
> intf->null_user_handler = bmc_device_id_handler;
>
> +retry:
> + bmc->dyn_id_set = 2;
> +
> rv = send_get_device_id_cmd(intf);
> if (rv)
> return rv;
>
> wait_event(intf->waitq, bmc->dyn_id_set != 2);
>
> - if (!bmc->dyn_id_set)
> + if (!bmc->dyn_id_set) {
> + msleep(1000);
> + if (++retry_count <= GET_DEVICE_ID_MAX_RETRY)
> + goto retry;
> +
> rv = -EIO; /* Something went wrong in the fetch. */
> + }
>
> /* dyn_id_set makes the id data available. */
> smp_rmb();
> --
> 2.17.1
>
next prev parent reply other threads:[~2020-09-13 12:39 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-13 12:02 [PATCH] ipmi: retry to get device id when error Xianting Tian
2020-09-13 12:39 ` Corey Minyard [this message]
2020-09-13 14:10 ` Tianxianting
2020-09-13 17:26 ` Corey Minyard
2020-09-14 8:25 ` Tianxianting
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200913123930.GH15602@minyard.net \
--to=minyard@acm.org \
--cc=arnd@arndb.de \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=openipmi-developer@lists.sourceforge.net \
--cc=tian.xianting@h3c.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.