All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Boehme, Markus" <markubo@amazon.de>
To: "Boehme, Markus" <markubo@amazon.de>,
	"minyard@acm.org" <minyard@acm.org>
Cc: "openipmi-developer@lists.sourceforge.net" 
	<openipmi-developer@lists.sourceforge.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Shah, Amit" <aams@amazon.de>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"arnd@arndb.de" <arnd@arndb.de>,
	"Park, Seongjae" <sjpark@amazon.com>,
	"Nuernberger, Stefan" <snu@amazon.de>
Subject: Re: [PATCH 3/3] ipmi: Add timeout waiting for channel information
Date: Thu, 10 Sep 2020 11:08:40 +0000	[thread overview]
Message-ID: <1599736120.29234.12.camel@amazon.de> (raw)
In-Reply-To: <20200908003412.GD15602@minyard.net>

Hey Corey, thanks for taking a look!

On Mon, 2020-09-07 at 19:34 -0500, Corey Minyard wrote:
> On Mon, Sep 07, 2020 at 06:25:37PM +0200, Markus Boehme wrote:
> > 
> > We have observed hosts with misbehaving BMCs that receive a Get Channel
> > Info command but don't respond. This leads to an indefinite wait in the
> > ipmi_msghandler's __scan_channels function, showing up as hung task
> > messages for modprobe.
> > 
> > Add a timeout waiting for the channel scan to complete. If the scan
> > fails to complete within that time, treat that like IPMI 1.0 and only
> > assume the presence of the primary IPMB channel at channel number 0.
> [...]
> While thinking about this, I realized an issue with these patches.
> There should be timers in the lower layers that detect that the BMC does
> not respond and should return an error response.  This is supposed to be
> guaranteed by the lower layer, you shouldn't need timers here.  In fact,
> if you abort with a timer here, you should get a lower layer reponds
> later, causing other issues.
> 
> So, this is wrong.  If you are never getting a response, there is a bug
> in the lower layer.  If you are not getting the error response as
> quickly as you would like, I'm not sure what to do about that.
> 

I see. I might not be able to get hold of more hosts behaving this way,
but if I do, I'll dig deeper into why the lower layer timeouts didn't
save us here. Thanks for the pointer.



> > Signed-off-by: Stefan Nuernberger <snu@amazon.com>
> > Signed-off-by: Markus Boehme <markubo@amazon.com>
> > ---
> >  drivers/char/ipmi/ipmi_msghandler.c | 72 ++++++++++++++++++++-----------------
> >  1 file changed, 39 insertions(+), 33 deletions(-)
> > 
> > diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> > index 2a2e8b2..9de9ba6 100644
> > --- a/drivers/char/ipmi/ipmi_msghandler.c
> > +++ b/drivers/char/ipmi/ipmi_msghandler.c
> > @@ -3315,46 +3315,52 @@ channel_handler(struct ipmi_smi *intf, struct ipmi_recv_msg *msg)
> >   */
> >  static int __scan_channels(struct ipmi_smi *intf, struct ipmi_device_id *id)
> >  {
> > -     int rv;
> > +     long rv;
> > +     unsigned int set;
> > 
> > -     if (ipmi_version_major(id) > 1
> > -                     || (ipmi_version_major(id) == 1
> > -                         && ipmi_version_minor(id) >= 5)) {
> > -             unsigned int set;
> > +     if (ipmi_version_major(id) == 1 && ipmi_version_minor(id) < 5) {
> This is incorrect, it will not correctly handle IPMI 0.x BMCs.  Yes,
> they exist.

Interesting! I wasn't aware of those. Searching the web doesn't turn up
much and the spec doesn't mention them either. Are these pre-release
implementations of the IPMI 1.0 spec or some kind of "IPMI light"?

Markus



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



  reply	other threads:[~2020-09-10 11:13 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 16:25 [PATCH 1/3] ipmi: Reset response handler when failing to send the command Markus Boehme
2020-09-07 16:25 ` [PATCH 2/3] ipmi: Add timeout waiting for device GUID Markus Boehme
2020-09-08  0:07   ` Corey Minyard
2020-09-07 16:25 ` [PATCH 3/3] ipmi: Add timeout waiting for channel information Markus Boehme
2020-09-08  0:34   ` Corey Minyard
2020-09-10 11:08     ` Boehme, Markus [this message]
2020-10-07 18:42       ` [Openipmi-developer] " Corey Minyard
2020-09-08  0:03 ` [PATCH 1/3] ipmi: Reset response handler when failing to send the command Corey Minyard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1599736120.29234.12.camel@amazon.de \
    --to=markubo@amazon.de \
    --cc=aams@amazon.de \
    --cc=arnd@arndb.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=minyard@acm.org \
    --cc=openipmi-developer@lists.sourceforge.net \
    --cc=sjpark@amazon.com \
    --cc=snu@amazon.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.