All of lore.kernel.org
 help / color / mirror / Atom feed
From: Corey Minyard <minyard@acm.org>
To: "河合英宏 / KAWAI,HIDEHIRO" <hidehiro.kawai.ez@hitachi.com>
Cc: "openipmi-developer@lists.sourceforge.net" 
	<openipmi-developer@lists.sourceforge.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 6/7] ipmi: Handle queued messages more certainly on panic
Date: Sat, 22 Aug 2015 12:39:54 -0500	[thread overview]
Message-ID: <55D8B3EA.5090008@acm.org> (raw)
In-Reply-To: <04EAB7311EE43145B2D3536183D1A84454938028@GSjpTKYDCembx31.service.hitachi.net>

On 08/17/2015 08:59 PM, 河合英宏 / KAWAI,HIDEHIRO wrote:
> Hello Corey,
>
>> -----Original Message-----
>> From: Corey Minyard [mailto:tcminyard@gmail.com] On Behalf Of Corey Minyard
>> Sent: Wednesday, August 12, 2015 1:13 PM
>> To: 河合英宏 / KAWAI,HIDEHIRO
>> Cc: openipmi-developer@lists.sourceforge.net; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH 6/7] ipmi: Handle queued messages more certainly on panic
>>
>> On 07/27/2015 12:55 AM, Hidehiro Kawai wrote:
>>> panic_event() called as a panic notifier tries to flush queued
>>> messages, but it can't handle them if the kernel panic happens
>>> while processing a message.  What happens depends on when the
>>> kernel panics.
>> Sorry this took so long, I've been traveling.
> No problem.
>
>> I have queued the patches before this one.  They all look good and
>> necessary.
> Thank you for reviewing!
>  
>> I'm not so sure about this patch.  It looks like the only thing that is
>> a real issue is #2 below.
>> It's not so important to avoid dropping messages.
> Initially I thought dropping middle of queued messages breaks
> some consistencies if a message depends on the preceding dropped
> message.  However, userland tools normally issue request messages
> in sequential manner, so the above situation wouldn't happen.
> Now, I think dropping a message is OK.
>
>> Can this be simplified somehow to work around the issue at panic time if
>> intf->curr_msg is set and smi_info->waiting_msg is not?
> There are two cases where intf->curr_msg is set and
> smi_info->waiting_msg is not; one is before (2) and the other
> is after (3).  If we decide to drop intf->curr_msg in both cases,
> I can simplify this patch somewhat.

Yes, please do.

-corey

> Regards,
>
> Hidehiro Kawai
> Hitachi, Ltd. Research & Development Group
>
>> Thank you,
>>
>> -corey
>>
>>> Here is the summary of message sending process.
>>>
>>>     smi_send()
>>>      smi_add_send_msg()
>>> (1)   intf->curr_msg = msg
>>>      sender()
>>> (2)   smi_info->waiting_msg = msg
>>>
>>>     <asynchronously called>
>>>     check_start_timer_thread()
>>>      start_next_msg()
>>>       smi_info->curr_msg = smi_info->waiting_msg
>>> (3)   smi_info->waiting_msg = NULL
>>> (4)   smi_info->handlers->start_transaction()
>>>
>>>     <asynchronously called>
>>>     smi_event_handler()
>>> (5)  handle_transaction_done()
>>>       smi_info->curr_msg = NULL
>>>       deliver_recv_msg()
>>>        ipmi_smi_msg_received()
>>>         intf->curr_msg = NULL
>>>
>>> If the kernel panics before (1), the requested message will be
>>> lost.  But it can't be helped.
>>>
>>> If the kernel panics before (2), new message sent by
>>> send_panic_events() is queued to intf->xmit_msgs because
>>> intf->curr_msg is non-NULL.  But the new message will be never
>>> sent because no one sends intf->curr_msg.  As the result, the
>>> kernel hangs up.
>>>
>>> If the kernel panics before (3), intf->curr_msg will be sent by
>>> set_run_to_completion().  It's no problem.
>>>
>>> If the kernel panics before (4), intf->curr_msg will be lost.
>>> However, messages on intf->xmit_msgs will be handled.
>>>
>>> If the kernel panics before (5), we try to continue running the
>>> state machine.  It may successfully complete.
>>>
>>> If the kernel panics after (5), we will miss the response message
>>> handling, but it's not much problem in the panic context.
>>>
>>> This patch tries to handle messages in intf->curr_msg and
>>> intf->xmit_msgs only once without losing them.  To achieve this,
>>> this patch does that:
>>>   - if a message is in intf->curr_msg or intf->xmit_msgs and
>>>     start_transaction() for the message hasn't been done yet,
>>>     resend it
>>>   - if start_transaction() for a message has been called,
>>>     just continue to run the state machine
>>>   - if the transaction has been completed, do nothing
>>>
>>> >From the perspective of implementation, these are done by keeping
>>> smi_info->waiting_msg until start_transaction() is completed and
>>> by keeping new flag IPMI_MSG_RESEND_ON_PANIC just before starting
>>> the state machine.
>>>
>>> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
>>> ---
>>>  drivers/char/ipmi/ipmi_msghandler.c |   36 +++++++++++++++++++++++++++++++++++
>>>  drivers/char/ipmi/ipmi_si_intf.c    |    5 ++++-
>>>  include/linux/ipmi_smi.h            |    5 +++++
>>>  3 files changed, 45 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
>>> index 5a2d9fe..3dcd814 100644
>>> --- a/drivers/char/ipmi/ipmi_msghandler.c
>>> +++ b/drivers/char/ipmi/ipmi_msghandler.c
>>> @@ -1493,6 +1493,8 @@ static struct ipmi_smi_msg *smi_add_send_msg(ipmi_smi_t intf,
>>>  					     struct ipmi_smi_msg *smi_msg,
>>>  					     int priority)
>>>  {
>>> +	smi_msg->flags |= IPMI_MSG_RESEND_ON_PANIC;
>>> +
>>>  	if (intf->curr_msg) {
>>>  		if (priority > 0)
>>>  			list_add_tail(&smi_msg->link, &intf->hp_xmit_msgs);
>>> @@ -4223,6 +4225,7 @@ struct ipmi_smi_msg *ipmi_alloc_smi_msg(void)
>>>  		rv->done = free_smi_msg;
>>>  		rv->user_data = NULL;
>>>  		atomic_inc(&smi_msg_inuse_count);
>>> +		rv->flags = 0;
>>>  	}
>>>  	return rv;
>>>  }
>>> @@ -4531,7 +4534,40 @@ static int panic_event(struct notifier_block *this,
>>>  			spin_unlock(&intf->waiting_rcv_msgs_lock);
>>>
>>>  		intf->run_to_completion = 1;
>>> +restart:
>>>  		intf->handlers->set_run_to_completion(intf->send_info, 1);
>>> +
>>> +		if (intf->curr_msg) {
>>> +			/*
>>> +			 * This can happen if the kernel panics before
>>> +			 * setting msg to smi_info->waiting_msg or while
>>> +			 * processing a response.  For the former case, we
>>> +			 * resend the message by re-queueing it.  For the
>>> +			 * latter case, we simply ignore it because handling
>>> +			 * response is not much meaningful in the panic
>>> +			 * context.
>>> +			 */
>>> +
>>> +			/*
>>> +			 * Since we want to send the current message first,
>>> +			 * re-queue it into the high-prioritized queue.
>>> +			 */
>>> +			if (intf->curr_msg->flags & IPMI_MSG_RESEND_ON_PANIC)
>>> +				list_add(&intf->curr_msg->link,
>>> +					 &intf->hp_xmit_msgs);
>>> +
>>> +			intf->curr_msg = NULL;
>>> +		}
>>> +
>>> +		if (!list_empty(&intf->hp_xmit_msgs) ||
>>> +		    !list_empty(&intf->xmit_msgs)) {
>>> +			/*
>>> +			 * This can happen if the kernel panics while
>>> +			 * processing a response.  Kick the queue and restart.
>>> +			 */
>>> +			smi_recv_tasklet((unsigned long)intf);
>>> +			goto restart;
>>> +		}
>>>  	}
>>>
>>>  #ifdef CONFIG_IPMI_PANIC_EVENT
>>> diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
>>> index 814b7b7..c5c7806 100644
>>> --- a/drivers/char/ipmi/ipmi_si_intf.c
>>> +++ b/drivers/char/ipmi/ipmi_si_intf.c
>>> @@ -383,7 +383,6 @@ static enum si_sm_result start_next_msg(struct smi_info *smi_info)
>>>  		int err;
>>>
>>>  		smi_info->curr_msg = smi_info->waiting_msg;
>>> -		smi_info->waiting_msg = NULL;
>>>  		debug_timestamp("Start2");
>>>  		err = atomic_notifier_call_chain(&xaction_notifier_list,
>>>  				0, smi_info);
>>> @@ -401,6 +400,7 @@ static enum si_sm_result start_next_msg(struct smi_info *smi_info)
>>>  		rv = SI_SM_CALL_WITHOUT_DELAY;
>>>  	}
>>>   out:
>>> +	smi_info->waiting_msg = NULL;
>>>  	return rv;
>>>  }
>>>
>>> @@ -804,6 +804,9 @@ static enum si_sm_result smi_event_handler(struct smi_info *smi_info,
>>>  {
>>>  	enum si_sm_result si_sm_result;
>>>
>>> +	if (smi_info->curr_msg)
>>> +		smi_info->curr_msg->flags &= ~(IPMI_MSG_RESEND_ON_PANIC);
>>> +
>>>   restart:
>>>  	/*
>>>  	 * There used to be a loop here that waited a little while
>>> diff --git a/include/linux/ipmi_smi.h b/include/linux/ipmi_smi.h
>>> index ba57fb1..1200872 100644
>>> --- a/include/linux/ipmi_smi.h
>>> +++ b/include/linux/ipmi_smi.h
>>> @@ -47,6 +47,9 @@
>>>  /* Structure for the low-level drivers. */
>>>  typedef struct ipmi_smi *ipmi_smi_t;
>>>
>>> +/* Flags for flags member of struct ipmi_smi_msg */
>>> +#define IPMI_MSG_RESEND_ON_PANIC	1 /* If set, resend in panic_event() */
>>> +
>>>  /*
>>>   * Messages to/from the lower layer.  The smi interface will take one
>>>   * of these to send. After the send has occurred and a response has
>>> @@ -75,6 +78,8 @@ struct ipmi_smi_msg {
>>>  	/* Will be called when the system is done with the message
>>>  	   (presumably to free it). */
>>>  	void (*done)(struct ipmi_smi_msg *msg);
>>> +
>>> +	int flags;
>>>  };
>>>
>>>  struct ipmi_smi_handlers {
>>>
>>>


      reply	other threads:[~2015-08-23 23:14 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-27  5:55 [PATCH 0/7] ipmi: various fixes for panic notifier robustness Hidehiro Kawai
2015-07-27  5:55 ` [PATCH 5/7] ipmi: Don't call receive handler in the panic context Hidehiro Kawai
2015-07-27  5:55 ` [PATCH 2/7] ipmi: Factor out message flushing procedure Hidehiro Kawai
2015-07-27  5:55 ` [PATCH 4/7] ipmi: Avoid touching possible corrupted lists in the panic context Hidehiro Kawai
2015-07-27  5:55 ` [PATCH 3/7] ipmi: Don't flush messages in sneder() in run-to-completion mode Hidehiro Kawai
2015-07-27  5:55 ` [PATCH 7/7] ipmi/kcs: Don't run the KCS state machine when it is KCS_IDLE Hidehiro Kawai
2015-08-12  4:15   ` Corey Minyard
2015-08-18  2:54     ` 河合英宏 / KAWAI,HIDEHIRO
2015-08-22 17:45       ` Corey Minyard
2015-08-24  1:52         ` 河合英宏 / KAWAI,HIDEHIRO
2015-08-24 16:00           ` Corey Minyard
2015-08-25  3:53             ` 河合英宏 / KAWAI,HIDEHIRO
2015-08-26 20:27               ` Corey Minyard
2015-08-27  1:35                 ` 河合英宏 / KAWAI,HIDEHIRO
2015-07-27  5:55 ` [PATCH 1/7] ipmi: Remove unneeded set_run_to_completion call Hidehiro Kawai
2015-07-27  5:55 ` [PATCH 6/7] ipmi: Handle queued messages more certainly on panic Hidehiro Kawai
2015-08-12  4:13   ` Corey Minyard
2015-08-18  1:59     ` 河合英宏 / KAWAI,HIDEHIRO
2015-08-22 17:39       ` Corey Minyard [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55D8B3EA.5090008@acm.org \
    --to=minyard@acm.org \
    --cc=hidehiro.kawai.ez@hitachi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=openipmi-developer@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.