From: Vitaly Kuznetsov
To: Dexuan Cui
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
    driverdev-devel@linuxdriverproject.org, olaf@aepfle.de,
    apw@canonical.com, jasowang@redhat.com, KY Srinivasan,
    Haiyang Zhang, rolf.neugebauer@docker.com, dave.scott@docker.com,
    ian.campbell@docker.com
Subject: Re: [PATCH v3] Drivers: hv: vmbus: fix the race when querying & updating the percpu list
References: <1463896966-4745-1-git-send-email-decui@microsoft.com> <87twhefb1t.fsf@vitty.brq.redhat.com>
Date: Wed, 01 Jun 2016 10:58:56 +0200
In-Reply-To: (Dexuan Cui's message of "Wed, 1 Jun 2016 06:39:54 +0000")
Message-ID: <87h9ddffov.fsf@vitty.brq.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Dexuan Cui writes:

>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Wednesday, June 1, 2016 0:27
>> To: Dexuan Cui
>> Cc: gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org;
>> driverdev-devel@linuxdriverproject.org; olaf@aepfle.de; apw@canonical.com;
>> jasowang@redhat.com; KY Srinivasan; Haiyang Zhang;
>> rolf.neugebauer@docker.com; dave.scott@docker.com; ian.campbell@docker.com
>> Subject: Re: [PATCH v3] Drivers: hv: vmbus: fix the race when querying &
>> updating the percpu list
>>
>> Dexuan Cui writes:
>>
>> > There is a rare race when we remove an entry from the global list
>> > hv_context.percpu_list[cpu] in hv_process_channel_removal() ->
>> > percpu_channel_deq() -> list_del(): at this time, if vmbus_on_event() ->
>> > process_chn_event() -> pcpu_relid2channel() is trying to query the list,
>> > we can get a kernel fault.
>> >
>> > Similarly, we also have the issue in the code path:
>> > vmbus_process_offer() -> percpu_channel_enq().
>> >
>> > We can resolve the issue by disabling the tasklet when updating the list.
>> >
>> > The patch also moves vmbus_release_relid() to a later place where
>> > the channel has been removed from the per-cpu and the global lists.
>> >
>> > Reported-by: Rolf Neugebauer
>> > Cc: Vitaly Kuznetsov
>> > Signed-off-by: Dexuan Cui
>>
>> Tested 4.7-rc1 with this patch applied and the kernel always crashes on boot
>> (WS2016TP5, 12 CPU SMP guest, Generation 2):
>>
>> [ 5.464251] hv_vmbus: Hyper-V Host Build:14300-10.0-1-0.1006; Vmbus version:4.0
>> [ 5.471666] hv_vmbus: Unknown GUID: f8e65716-3cb3-4a06-9a60-1889c5cccab5
>> [ 5.472143] BUG: unable to handle kernel paging request at 000000079fff5288
>> [ 5.477107] IP: [] vmbus_onoffer+0x311/0x570 [hv_vmbus]
>> ...
>> Vitaly
>
> I can't reproduce the panic somehow, but I did find a bug in
> vmbus_process_offer():
>
> "hv_event_tasklet_disable(channel) and hv_event_tasklet_enable(channel)"
> are buggy: the 'channel' parameter should be 'newchannel'.
>
> This was a copy-and-paste bug... Sorry!
> Can you fix this and see if the panic disappears on your side?

This fixes the issue I'm seeing, thanks!

-- 
Vitaly