From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: KY Srinivasan <kys@microsoft.com>
Cc: "devel\@linuxdriverproject.org" <devel@linuxdriverproject.org>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Haiyang Zhang" <haiyangz@microsoft.com>,
	"Alex Ng \(LIS\)" <alexng@microsoft.com>,
	"Radim Krcmar" <rkrcmar@redhat.com>,
	Cathy Avery <cavery@redhat.com>
Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
Date: Mon, 21 Mar 2016 08:51:54 +0100
Message-ID: <874mc02rqd.fsf@vitty.brq.redhat.com>
In-Reply-To: <SN2PR03MB214246D9A9DAFA36E392AE78A08C0@SN2PR03MB2142.namprd03.prod.outlook.com> (KY Srinivasan's message of "Fri, 18 Mar 2016 18:02:53 +0000")

KY Srinivasan <kys@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Friday, March 18, 2016 5:33 AM
>> To: devel@linuxdriverproject.org
>> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
>> Haiyang Zhang <haiyangz@microsoft.com>; Alex Ng (LIS)
>> <alexng@microsoft.com>; Radim Krcmar <rkrcmar@redhat.com>; Cathy
>> Avery <cavery@redhat.com>
>> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>> 
>> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
>> delivered to CPU0 regardless of what CPU we're sending CHANNELMSG_UNLOAD
>> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> we're crashing on some other CPU and CPU0 is still alive and operational
>> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> vmbus_connection.unload_event, our wait on the current CPU will never
>> end.
>
> What was the host you were testing on?
>

I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
by forcing a crash on a secondary CPU, e.g.:

# cat crash.sh
#! /bin/sh
echo c > /proc/sysrq-trigger

# taskset -c 1 ./crash.sh

>> 
>> Do the following:
>> 1) Check for completion_done() in the loop. In case interrupt handler is
>>    still alive we'll get the confirmation we need.
>> 
>> 2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE will be
>>    delivered there. We can race with still-alive interrupt handler doing
>>    the same but we don't care as we're checking completion_done() now.
>> 
>> 3) Cleanup message pages on all CPUs. This is required (at least for the
>>    current CPU as we're clearing CPU0 messages now but we may want to bring
>>    up additional CPUs on crash) as new messages won't be delivered till we
>>    consume what's pending. On boot we'll place message pages somewhere else
>>    and we won't be able to read stale messages.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
>>  1 file changed, 25 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index b10e8f74..5f37057 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
>> 
>>  static void vmbus_wait_for_unload(void)
>>  {
>> -	int cpu = smp_processor_id();
>> -	void *page_addr = hv_context.synic_message_page[cpu];
>> +	int cpu;
>> +	void *page_addr = hv_context.synic_message_page[0];
>>  	struct hv_message *msg = (struct hv_message *)page_addr +
>>  				  VMBUS_MESSAGE_SINT;
>>  	struct vmbus_channel_message_header *hdr;
>>  	bool unloaded = false;
>> 
>> -	while (1) {
>> +	/*
>> +	 * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
>> +	 * crashing on a different CPU let's hope that IRQ handler on CPU0 is
>> +	 * still functional and vmbus_unload_response() will complete
>> +	 * vmbus_connection.unload_event. If not, the last thing we can do is
>> +	 * read message page for CPU0 regardless of what CPU we're on.
>> +	 */
>> +	while (!unloaded) {
>> +		if (completion_done(&vmbus_connection.unload_event)) {
>> +			unloaded = true;
>> +			break;
>> +		}
>> +
>>  		if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
>>  			mdelay(10);
>>  			continue;
>> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
>>  			unloaded = true;
>> 
>>  		vmbus_signal_eom(msg);
>> +	}
>> 
>> -		if (unloaded)
>> -			break;
>> +	/*
>> +	 * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
>> +	 * maybe-pending messages on all CPUs to be able to receive new
>> +	 * messages after we reconnect.
>> +	 */
>> +	for_each_online_cpu(cpu) {
>> +		page_addr = hv_context.synic_message_page[cpu];
>> +		msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
>> +		msg->header.message_type = HVMSG_NONE;
>>  	}
>>  }
>> 
>> --
>> 2.5.0
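
To make the control flow of the reworked loop easier to follow, here is a
rough user-space model of it. This is only a sketch, not kernel code: the
names sketch_state, sketch_wait_for_unload, SKETCH_NCPUS and the MSG_*
values are hypothetical stand-ins, the boolean replaces
completion_done(&vmbus_connection.unload_event), a plain int per CPU
replaces the VMBUS_MESSAGE_SINT slot of each message page, and the
max_polls bound is an assumption added so the model terminates (the patch
itself polls with mdelay(10) and has no such bound).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message kinds; the kernel checks HVMSG_NONE on the message
 * header and CHANNELMSG_UNLOAD_RESPONSE on the channel message header. */
enum { MSG_NONE = 0, MSG_OTHER = 1, MSG_UNLOAD_RESPONSE = 2 };

#define SKETCH_NCPUS 4	/* assumed CPU count for the model */

struct sketch_state {
	bool unload_event_done;		/* models completion_done() */
	int msg_slot[SKETCH_NCPUS];	/* models each CPU's message slot */
};

/*
 * Returns the number of loop iterations entered before the unload was
 * observed, or -1 if max_polls was exhausted (an assumption of this
 * model only).
 */
static int sketch_wait_for_unload(struct sketch_state *s, int max_polls)
{
	bool unloaded = false;
	int polls = 0;
	size_t cpu;

	assert(s != NULL);

	while (!unloaded) {
		if (polls++ >= max_polls)
			return -1;

		/* Step 1: a still-alive IRQ handler may have completed it. */
		if (s->unload_event_done) {
			unloaded = true;
			break;
		}

		/* Step 2: always look at CPU0's slot, whatever CPU we run on. */
		if (s->msg_slot[0] == MSG_NONE)
			continue;	/* the kernel would mdelay(10) here */

		if (s->msg_slot[0] == MSG_UNLOAD_RESPONSE)
			unloaded = true;

		s->msg_slot[0] = MSG_NONE;	/* models vmbus_signal_eom() */
	}

	/* Step 3: drop maybe-pending messages on every CPU. */
	for (cpu = 0; cpu < SKETCH_NCPUS; cpu++)
		s->msg_slot[cpu] = MSG_NONE;

	return polls;
}
```

In this model a crash on a secondary CPU with a working CPU0 handler
corresponds to unload_event_done being set, and a crash with CPU0 dead
corresponds to MSG_UNLOAD_RESPONSE sitting in msg_slot[0]; both paths leave
the loop and then clear every slot.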

-- 
  Vitaly

Thread overview: 10+ messages
2016-03-18 12:33 [PATCH] Drivers: hv: vmbus: handle various crash scenarios Vitaly Kuznetsov
2016-03-18 15:20 ` Radim Krcmar
2016-03-18 15:53   ` Vitaly Kuznetsov
2016-03-18 16:11     ` Radim Krcmar
2016-03-18 18:02 ` KY Srinivasan
2016-03-21  7:51   ` Vitaly Kuznetsov [this message]
2016-03-21 22:44     ` KY Srinivasan
2016-03-22  9:47       ` Vitaly Kuznetsov
2016-03-22 14:00       ` Vitaly Kuznetsov
2016-03-22 14:18         ` KY Srinivasan