From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: KY Srinivasan <kys@microsoft.com>
Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Haiyang Zhang" <haiyangz@microsoft.com>,
"Alex Ng (LIS)" <alexng@microsoft.com>,
"Radim Krcmar" <rkrcmar@redhat.com>,
Cathy Avery <cavery@redhat.com>
Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
Date: Mon, 21 Mar 2016 08:51:54 +0100
Message-ID: <874mc02rqd.fsf@vitty.brq.redhat.com>
In-Reply-To: <SN2PR03MB214246D9A9DAFA36E392AE78A08C0@SN2PR03MB2142.namprd03.prod.outlook.com> (KY Srinivasan's message of "Fri, 18 Mar 2016 18:02:53 +0000")
KY Srinivasan <kys@microsoft.com> writes:
>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Friday, March 18, 2016 5:33 AM
>> To: devel@linuxdriverproject.org
>> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
>> Haiyang Zhang <haiyangz@microsoft.com>; Alex Ng (LIS)
>> <alexng@microsoft.com>; Radim Krcmar <rkrcmar@redhat.com>; Cathy
>> Avery <cavery@redhat.com>
>> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>>
>> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
>> delivered to CPU0 regardless of what CPU we're sending
>> CHANNELMSG_UNLOAD
>> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> we're crashing on some other CPU and CPU0 is still alive and operational
>> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> vmbus_connection.unload_event, our wait on the current CPU will never
>> end.
>
> What was the host you were testing on?
>
I tested on both 2012R2 and 2016TP4 hosts. The bug is easily reproducible
by forcing a crash on a secondary CPU, e.g.:
# cat crash.sh
#! /bin/sh
echo c > /proc/sysrq-trigger
# taskset -c 1 ./crash.sh
>>
>> Do the following:
>> 1) Check for completion_done() in the loop. In case interrupt handler is
>> still alive we'll get the confirmation we need.
>>
>> 2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE
>> will be
>> delivered there. We can race with still-alive interrupt handler doing
>> the same but we don't care as we're checking completion_done() now.
>>
>> 3) Cleanup message pages on all CPUs. This is required (at least for the
>> current CPU as we're clearing CPU0 messages now but we may want to
>> bring
>> up additional CPUs on crash) as new messages won't be delivered till we
>> consume what's pending. On boot we'll place message pages somewhere
>> else
>> and we won't be able to read stale messages.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
>> 1 file changed, 25 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index b10e8f74..5f37057 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel
>> *channel, const uuid_le *type_gui
>>
>> static void vmbus_wait_for_unload(void)
>> {
>> - int cpu = smp_processor_id();
>> - void *page_addr = hv_context.synic_message_page[cpu];
>> + int cpu;
>> + void *page_addr = hv_context.synic_message_page[0];
>> struct hv_message *msg = (struct hv_message *)page_addr +
>> VMBUS_MESSAGE_SINT;
>> struct vmbus_channel_message_header *hdr;
>> bool unloaded = false;
>>
>> - while (1) {
>> + /*
>> + * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0.
>> When we're
>> + * crashing on a different CPU let's hope that IRQ handler on CPU0 is
>> + * still functional and vmbus_unload_response() will complete
>> + * vmbus_connection.unload_event. If not, the last thing we can do
>> is
>> + * read message page for CPU0 regardless of what CPU we're on.
>> + */
>> + while (!unloaded) {
>> + if (completion_done(&vmbus_connection.unload_event)) {
>> + unloaded = true;
>> + break;
>> + }
>> +
>> if (READ_ONCE(msg->header.message_type) ==
>> HVMSG_NONE) {
>> mdelay(10);
>> continue;
>> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
>> unloaded = true;
>>
>> vmbus_signal_eom(msg);
>> + }
>>
>> - if (unloaded)
>> - break;
>> + /*
>> + * We're crashing and already got the UNLOAD_RESPONSE, cleanup
>> all
>> + * maybe-pending messages on all CPUs to be able to receive new
>> + * messages after we reconnect.
>> + */
>> + for_each_online_cpu(cpu) {
>> + page_addr = hv_context.synic_message_page[cpu];
>> + msg = (struct hv_message *)page_addr +
>> VMBUS_MESSAGE_SINT;
>> + msg->header.message_type = HVMSG_NONE;
>> }
>> }
>>
>> --
>> 2.5.0
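
For readers following the thread, the control flow of the patched
vmbus_wait_for_unload() can be modelled in user space. This is only a
minimal sketch under stated assumptions, not the kernel code: the
model_* names, the MSG_*/CHANMSG_* values and NR_CPUS_MODEL are
hypothetical stand-ins, the boolean unload_event_done stands in for
completion_done(&vmbus_connection.unload_event), and the max_polls bound
exists only so the model terminates (the real loop waits with mdelay(10)
and has no such bound).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions; values are arbitrary. */
enum { MSG_NONE = 0, MSG_PENDING = 1 };
enum { CHANMSG_OTHER = 0, CHANMSG_UNLOAD_RESPONSE = 1 };

#define NR_CPUS_MODEL 4

struct model_msg {
	int message_type;     /* MSG_NONE means the slot is empty */
	int channel_msgtype;  /* payload type when the slot is occupied */
};

struct model_state {
	struct model_msg page[NR_CPUS_MODEL]; /* one message slot per CPU */
	bool unload_event_done; /* models completion_done(&unload_event) */
};

/*
 * Returns the number of loop iterations used, or -1 if max_polls
 * iterations pass without seeing the unload response.
 */
int model_wait_for_unload(struct model_state *s, int max_polls)
{
	struct model_msg *msg;
	bool unloaded = false;
	int polls = 0;
	int cpu;

	assert(s != NULL);
	msg = &s->page[0]; /* step 2: always read CPU0's slot */

	while (!unloaded) {
		if (polls++ >= max_polls)
			return -1; /* model-only bound, not in the patch */

		/* Step 1: a still-alive IRQ handler may have completed the event. */
		if (s->unload_event_done) {
			unloaded = true;
			break;
		}

		if (msg->message_type == MSG_NONE)
			continue; /* the kernel loop does mdelay(10) here */

		if (msg->channel_msgtype == CHANMSG_UNLOAD_RESPONSE)
			unloaded = true;

		/* Consume the message so the next one can be delivered. */
		msg->message_type = MSG_NONE;
	}

	/* Step 3: drop maybe-pending messages on every CPU. */
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		s->page[cpu].message_type = MSG_NONE;

	return polls;
}
```

Calling model_wait_for_unload() with unload_event_done set models the
case where CPU0's interrupt handler is still functional; leaving it
clear and placing CHANMSG_UNLOAD_RESPONSE in page[0] models the fallback
of reading CPU0's message page from the crashing CPU.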
--
Vitaly