public inbox for linux-kernel@vger.kernel.org
From: Corey Minyard <minyard@acm.org>
To: Markus Boehme <markubo@amazon.com>
Cc: openipmi-developer@lists.sourceforge.net,
	Arnd Bergmann <arnd@arndb.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, Stefan Nuernberger <snu@amazon.com>,
	SeongJae Park <sjpark@amazon.com>, Amit Shah <aams@amazon.com>
Subject: Re: [PATCH 3/3] ipmi: Add timeout waiting for channel information
Date: Mon, 7 Sep 2020 19:34:12 -0500
Message-ID: <20200908003412.GD15602@minyard.net>
In-Reply-To: <1599495937-10654-3-git-send-email-markubo@amazon.com>

On Mon, Sep 07, 2020 at 06:25:37PM +0200, Markus Boehme wrote:
> We have observed hosts with misbehaving BMCs that receive a Get Channel
> Info command but don't respond. This leads to an indefinite wait in the
> ipmi_msghandler's __scan_channels function, showing up as hung task
> messages for modprobe.
> 
> Add a timeout waiting for the channel scan to complete. If the scan
> fails to complete within that time, treat that like IPMI 1.0 and only
> assume the presence of the primary IPMB channel at channel number 0.

This patch is a significant rewrite of the function.  This makes me a
little uncomfortable.  It's generally better to fix the bug in a minimal
patch.  It would be easier to read if you had two patches, one to
restructure the code and one to fix the bug.

One comment inline, but it doesn't matter, because...

While thinking about this, I realized an issue with these patches.
There should be timers in the lower layers that detect that the BMC does
not respond and should return an error response.  This is supposed to be
guaranteed by the lower layer, so you shouldn't need timers here.  In
fact, if you abort with a timer here, you should still get a lower layer
response later, causing other issues.

So, this is wrong.  If you are never getting a response, there is a bug
in the lower layer.  If you are not getting the error response as
quickly as you would like, I'm not sure what to do about that.
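
To illustrate the ordering problem, here is a toy userspace model, not
kernel code.  Every name in it (struct toy_intf, toy_give_up(),
toy_deliver_response(), the counters) is made up for this sketch; only
the null_user_handler and channels_ready field names mirror the driver.
It shows only that a response delivered after the upper layer has given
up finds no handler waiting for it.  It makes no claim about what the
real message handler then does with such a response.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_intf;
typedef void (*toy_handler)(struct toy_intf *intf);

/* Toy stand-in for the interface state; not the kernel's struct ipmi_smi. */
struct toy_intf {
	toy_handler null_user_handler;	/* who receives the response */
	bool channels_ready;
	int handled;			/* responses delivered to a handler */
	int unexpected;			/* responses that found no handler */
};

/* Stand-in for channel_handler(): consumes the response. */
static void toy_channel_handler(struct toy_intf *intf)
{
	intf->handled++;
	intf->channels_ready = true;
}

/* Upper layer times out and falls back to a single IPMB channel. */
static void toy_give_up(struct toy_intf *intf)
{
	intf->null_user_handler = NULL;
	intf->channels_ready = true;
}

/* Lower layer eventually delivers a response (real or error). */
static void toy_deliver_response(struct toy_intf *intf)
{
	if (intf->null_user_handler)
		intf->null_user_handler(intf);
	else
		intf->unexpected++;
}

static void toy_late_response_example(void)
{
	struct toy_intf intf = { .null_user_handler = toy_channel_handler };

	toy_give_up(&intf);		/* upper-layer timer fires first */
	toy_deliver_response(&intf);	/* lower layer responds afterwards */

	assert(intf.handled == 0);	/* nobody consumed the response */
	assert(intf.unexpected == 1);	/* it arrived with no handler set */
}
```

If the lower layer's own timeout is the only timer, the response (or
error response) always reaches the handler that is still registered, and
this case does not arise.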

The first patch, of course, is an obvious bug fix.  I'll include that.

-corey

> 
> Signed-off-by: Stefan Nuernberger <snu@amazon.com>
> Signed-off-by: Markus Boehme <markubo@amazon.com>
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 72 ++++++++++++++++++++-----------------
>  1 file changed, 39 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index 2a2e8b2..9de9ba6 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -3315,46 +3315,52 @@ channel_handler(struct ipmi_smi *intf, struct ipmi_recv_msg *msg)
>   */
>  static int __scan_channels(struct ipmi_smi *intf, struct ipmi_device_id *id)
>  {
> -	int rv;
> +	long rv;
> +	unsigned int set;
>  
> -	if (ipmi_version_major(id) > 1
> -			|| (ipmi_version_major(id) == 1
> -			    && ipmi_version_minor(id) >= 5)) {
> -		unsigned int set;
> +	if (ipmi_version_major(id) == 1 && ipmi_version_minor(id) < 5) {

This is incorrect; it will not correctly handle IPMI 0.x BMCs.  Yes,
they exist.
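
To make the difference concrete, here is a minimal standalone sketch.
The two helper names and the plain integer arguments are made up for
illustration; the driver itself calls ipmi_version_major() and
ipmi_version_minor() on the device id.  The first helper mirrors the
condition the existing code uses to decide whether to scan, the second
mirrors the condition in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the existing condition: scan the
 * channels only for IPMI 1.5 and later.  Everything older, including
 * 0.x, gets the single IPMB channel at zero.
 */
static bool wants_channel_scan(unsigned int major, unsigned int minor)
{
	return major > 1 || (major == 1 && minor >= 5);
}

/*
 * Hypothetical helper mirroring the patch: only 1.0 through 1.4 take
 * the single-channel path, so 0.x falls through to the scan.
 */
static bool patch_skips_scan(unsigned int major, unsigned int minor)
{
	return major == 1 && minor < 5;
}

static void version_check_examples(void)
{
	/* IPMI 0.9: the existing code does not scan... */
	assert(!wants_channel_scan(0, 9));
	/* ...but the patch does not skip the scan either. */
	assert(!patch_skips_scan(0, 9));

	/* The two agree for 1.x and later. */
	assert(!wants_channel_scan(1, 0) && patch_skips_scan(1, 0));
	assert(wants_channel_scan(1, 5) && !patch_skips_scan(1, 5));
	assert(wants_channel_scan(2, 0) && !patch_skips_scan(2, 0));
}
```

Negating the existing condition, rather than rewriting it by hand, keeps
0.x on the single-channel path.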

> +		set = intf->curr_working_cset;
> +		goto single_ipmb_channel;
> +	}
>  
> -		/*
> -		 * Start scanning the channels to see what is
> -		 * available.
> -		 */
> -		set = !intf->curr_working_cset;
> -		intf->curr_working_cset = set;
> -		memset(&intf->wchannels[set], 0,
> -		       sizeof(struct ipmi_channel_set));
> -
> -		intf->null_user_handler = channel_handler;
> -		intf->curr_channel = 0;
> -		rv = send_channel_info_cmd(intf, 0);
> -		if (rv) {
> -			dev_warn(intf->si_dev,
> -				 "Error sending channel information for channel 0, %d\n",
> -				 rv);
> -			intf->null_user_handler = NULL;
> -			return -EIO;
> -		}
> +	/*
> +	 * Start scanning the channels to see what is
> +	 * available.
> +	 */
> +	set = !intf->curr_working_cset;
> +	intf->curr_working_cset = set;
> +	memset(&intf->wchannels[set], 0, sizeof(struct ipmi_channel_set));
>  
> -		/* Wait for the channel info to be read. */
> -		wait_event(intf->waitq, intf->channels_ready);
> +	intf->null_user_handler = channel_handler;
> +	intf->curr_channel = 0;
> +	rv = send_channel_info_cmd(intf, 0);
> +	if (rv) {
> +		dev_warn(intf->si_dev,
> +			 "Error sending channel information for channel 0, %ld\n",
> +			 rv);
>  		intf->null_user_handler = NULL;
> -	} else {
> -		unsigned int set = intf->curr_working_cset;
> +		return -EIO;
> +	}
>  
> -		/* Assume a single IPMB channel at zero. */
> -		intf->wchannels[set].c[0].medium = IPMI_CHANNEL_MEDIUM_IPMB;
> -		intf->wchannels[set].c[0].protocol = IPMI_CHANNEL_PROTOCOL_IPMB;
> -		intf->channel_list = intf->wchannels + set;
> -		intf->channels_ready = true;
> +	/* Wait for the channel info to be read. */
> +	rv = wait_event_timeout(intf->waitq, intf->channels_ready, 5 * HZ);
> +	if (rv == 0) {
> +		dev_warn(intf->si_dev,
> +			 "Timed out waiting for channel information. Assuming a single IPMB channel at 0.\n");
> +		goto single_ipmb_channel;
>  	}
>  
> +	goto out;
> +
> +single_ipmb_channel:
> +	/* Assume a single IPMB channel at zero. */
> +	intf->wchannels[set].c[0].medium = IPMI_CHANNEL_MEDIUM_IPMB;
> +	intf->wchannels[set].c[0].protocol = IPMI_CHANNEL_PROTOCOL_IPMB;
> +	intf->channel_list = intf->wchannels + set;
> +	intf->channels_ready = true;
> +
> +out:
> +	intf->null_user_handler = NULL;
>  	return 0;
>  }
>  
> -- 
> 2.7.4
> 

Thread overview: 8+ messages
2020-09-07 16:25 [PATCH 1/3] ipmi: Reset response handler when failing to send the command Markus Boehme
2020-09-07 16:25 ` [PATCH 2/3] ipmi: Add timeout waiting for device GUID Markus Boehme
2020-09-08  0:07   ` Corey Minyard
2020-09-07 16:25 ` [PATCH 3/3] ipmi: Add timeout waiting for channel information Markus Boehme
2020-09-08  0:34   ` Corey Minyard [this message]
2020-09-10 11:08     ` Boehme, Markus
2020-10-07 18:42       ` [Openipmi-developer] " Corey Minyard
2020-09-08  0:03 ` [PATCH 1/3] ipmi: Reset response handler when failing to send the command Corey Minyard
