From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754730AbcEaQ1C (ORCPT ); Tue, 31 May 2016 12:27:02 -0400
Received: from mx1.redhat.com ([209.132.183.28]:58840 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750863AbcEaQ07 (ORCPT ); Tue, 31 May 2016 12:26:59 -0400
From: Vitaly Kuznetsov
To: Dexuan Cui
Cc: gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
        driverdev-devel@linuxdriverproject.org, olaf@aepfle.de,
        apw@canonical.com, jasowang@redhat.com, kys@microsoft.com,
        haiyangz@microsoft.com, rolf.neugebauer@docker.com,
        dave.scott@docker.com, ian.campbell@docker.com
Subject: Re: [PATCH v3] Drivers: hv: vmbus: fix the race when querying & updating the percpu list
References: <1463896966-4745-1-git-send-email-decui@microsoft.com>
Date: Tue, 31 May 2016 18:26:54 +0200
In-Reply-To: <1463896966-4745-1-git-send-email-decui@microsoft.com> (Dexuan Cui's message of "Sat, 21 May 2016 23:02:46 -0700")
Message-ID: <87twhefb1t.fsf@vitty.brq.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Tue, 31 May 2016 16:26:58 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Dexuan Cui writes:

> There is a rare race when we remove an entry from the global list
> hv_context.percpu_list[cpu] in hv_process_channel_removal() ->
> percpu_channel_deq() -> list_del(): at this time, if vmbus_on_event() ->
> process_chn_event() -> pcpu_relid2channel() is trying to query the list,
> we can get the kernel fault.
>
> Similarly, we also have the issue in the code path: vmbus_process_offer() ->
> percpu_channel_enq().
>
> We can resolve the issue by disabling the tasklet when updating the list.
>
> The patch also moves vmbus_release_relid() to a later place where
> the channel has been removed from the per-cpu and the global lists.
>
> Reported-by: Rolf Neugebauer
> Cc: Vitaly Kuznetsov
> Signed-off-by: Dexuan Cui

Tested 4.7-rc1 with this patch applied and the kernel always crashes on boot
(WS2016TP5, 12 CPU SMP guest, Generation 2):

[    5.464251] hv_vmbus: Hyper-V Host Build:14300-10.0-1-0.1006; Vmbus version:4.0
[    5.471666] hv_vmbus: Unknown GUID: f8e65716-3cb3-4a06-9a60-1889c5cccab5
[    5.472143] BUG: unable to handle kernel paging request at 000000079fff5288
[    5.477107] IP: [] vmbus_onoffer+0x311/0x570 [hv_vmbus]
[    5.477107] PGD 0
[    5.477107] Oops: 0000 [#1] SMP
[    5.477107] Modules linked in: hv_vmbus
[    5.477107] CPU: 11 PID: 189 Comm: kworker/11:1 Not tainted 4.7.0-rc1_dc1_test+ #262
[    5.477107] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[    5.477107] Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
[    5.477107] task: ffff8801796e4480 ti: ffff8801796e8000 task.ti: ffff8801796e8000
[    5.477107] RIP: 0010:[] [] vmbus_onoffer+0x311/0x570 [hv_vmbus]
[    5.477107] RSP: 0018:ffff8801796ebc50 EFLAGS: 00010286
[    5.477107] RAX: 00000000ffff8801 RBX: ffff880032641000 RCX: 0000000000000050
[    5.477107] RDX: 0000000000040000 RSI: 0000000000000000 RDI: ffff880032641000
[    5.477107] RBP: ffff8801796ebd10 R08: 0000000000000001 R09: 0000000000000001
[    5.477107] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000010
[    5.477107] R13: 4a063cb3f8e65716 R14: b5caccc58918609a R15: ffffffffa0008b60
[    5.477107] FS: 0000000000000000(0000) GS:ffff88017c000000(0000) knlGS:0000000000000000
[    5.477107] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.477107] CR2: 000000079fff5288 CR3: 0000000032613000 CR4: 00000000001406e0
[    5.477107] Stack:
[    5.477107] ffff880032641780 ffff88003264102c 0010010000000046 ffffffffa000646e
[    5.477107] ffff8801796e5090 ffff8801796e4480 00000000004f827d 0000000000000001
[    5.477107] 0000000000000000 ffff8801796ebce8 ffffffff810eaebc 00000000796e5058
[    5.477107] Call Trace:
[    5.477107] [] ? __lock_acquire+0x3dc/0x730
[    5.477107] [] vmbus_onmessage+0x33/0xa0 [hv_vmbus]
[    5.477107] [] vmbus_onmessage_work+0x21/0x30 [hv_vmbus]
[    5.653321] [] process_one_work+0x1ff/0x6d0
[    5.653321] [] ? process_one_work+0x181/0x6d0
[    5.653321] [] worker_thread+0x4e/0x490
[    5.653321] [] ? process_one_work+0x6d0/0x6d0
[    5.653321] [] ? process_one_work+0x6d0/0x6d0
[    5.653321] [] kthread+0x101/0x120
[    5.653321] [] ret_from_fork+0x1f/0x40
[    5.653321] [] ? kthread_create_on_node+0x250/0x250
[    5.653321] Code: 74 24 08 48 c7 c7 60 6c 00 a0 e8 0a 9e 1b e1 b8 10 00 00 00 66 89 44 24 16 44 89 e6 48 89 df e8 f6 f9 ff ff 41 8b 87 f4 02 00 00 <48> 8b 14 c5 80 12 03 a0 f0 ff 42 10 48 8b 42 08 a8 02 75 f8 0f
[    5.653321] RIP [] vmbus_onoffer+0x311/0x570 [hv_vmbus]
[    5.653321] RSP
[    5.653321] CR2: 000000079fff5288
[    5.653321] ---[ end trace 62df6070997f1f10 ]---
[    5.653321] Kernel panic - not syncing: Fatal exception
[    5.653321] Kernel Offset: disabled
[    5.653321] ---[ end Kernel panic - not syncing: Fatal exception
[    5.653480] ------------[ cut here ]------------

I can investigate it tomorrow if this doesn't reproduce for you.

--
  Vitaly