Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.

Netdev List
 help / color / mirror / Atom feed

From: Jason Wang <jasowang@redhat.com>
To: David Hill <dhill@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org
Cc: Willem de Bruijn <willemb@google.com>, netdev <netdev@vger.kernel.org>
Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.
Date: Tue, 19 Dec 2017 11:36:19 +0800	[thread overview]
Message-ID: <cd0ea113-34ff-d24a-b798-819dfb536c76@redhat.com> (raw)
In-Reply-To: <c3b3f34e-fe64-8ab3-3617-f98313526f9f@redhat.com>



On 2017年12月12日 11:53, David Hill wrote:
>
>
> On 2017-12-08 01:03 PM, David Hill wrote:
>>
>>
>> On 2017-12-07 12:13 AM, Jason Wang wrote:
>>>
>>>
>>> On 2017年12月07日 12:42, David Hill wrote:
>>>>
>>>>
>>>> On 2017-12-06 11:34 PM, David Hill wrote:
>>>>>
>>>>>
>>>>> On 2017-12-04 02:51 PM, David Hill wrote:
>>>>>>
>>>>>> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017年12月02日 00:38, David Hill wrote:
>>>>>>>>>
>>>>>>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e 
>>>>>>>>> too ... compiling and I'll keep you posted.
>>>>>>>>
>>>>>>>> So I'm still able to reproduce this issue even with reverting 
>>>>>>>> these 3 commits.  Would you have other suspect commits ? 
>>>>>>>
>>>>>>> Thanks for the testing. No, I don't have other suspect commits.
>>>>>>>
>>>>>>> Looks like somebody else it hitting your issue too (see 
>>>>>>> https://www.spinics.net/lists/netdev/msg468319.html)
>>>>>>>
>>>>>>> But he claims the issue were fixed by using qemu 2.10.1.
>>>>>>>
>>>>>>> So you may:
>>>>>>>
>>>>>>> -try to see if qemu 2.10.1 solves your issue
>>>>>> It didn't solve it for him... it's only harder to reproduce. [1]
>>>>>>> -if not, try to see if commit 
>>>>>>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier 
>>>>>>> hooks for devmap bpf map") is the first bad commit
>>>>>> I'll try to see what I can do here
>>>>> I'm looking at that commit and it's been introduced before v4.13 
>>>>> if I'm not mistaken while this issue appeared between v4.13 and 
>>>>> v4.14-rc1 .  Between those two releases, there're 1352 commits.
>>>>> Is there a way to quickly know which commits are touching 
>>>>> vhost-net, zerocopy ?
>>>>>
>>>>>
>>>>> [ 7496.553044]  __schedule+0x2dc/0xbb0
>>>>> [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>>>>> [ 7496.553074]  schedule+0x3d/0x90
>>>>> [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>>>>> [ 7496.553100]  ? finish_wait+0x90/0x90
>>>>> [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>>>>> [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>>>>> [ 7496.553166]  SyS_ioctl+0x79/0x90
>>>>> [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>>
>>>> That vhost_net_ubuf_put_and)wait call has been changed in this 
>>>> commit with the following comment:
>>>>
>>>> commit 0ad8b480d6ee916aa84324f69acf690142aecd0e
>>>> Author: Michael S. Tsirkin <mst@redhat.com>
>>>> Date:   Thu Feb 13 11:42:05 2014 +0200
>>>>
>>>>     vhost: fix ref cnt checking deadlock
>>>>
>>>>     vhost checked the counter within the refcnt before 
>>>> decrementing.  It
>>>>     really wanted to know that it is the one that has the last 
>>>> reference, as
>>>>     a way to batch freeing resources a bit more efficiently.
>>>>
>>>>     Note: we only let refcount go to 0 on device release.
>>>>
>>>>     This works well but we now access the ref counter twice so 
>>>> there's a
>>>>     race: all users might see a high count and decide to defer freeing
>>>>     resources.
>>>>     In the end no one initiates freeing resources until the last 
>>>> reference
>>>>     is gone (which is on VM shotdown so might happen after a 
>>>> looooong time).
>>>>
>>>>     Let's do what we probably should have done straight away:
>>>>     switch from kref to plain atomic, documenting the
>>>>     semantics, return the refcount value atomically after decrement,
>>>>     then use that to avoid the deadlock.
>>>>
>>>>     Reported-by: Qin Chuanyu <qinchuanyu@huawei.com>
>>>>     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>     Acked-by: Jason Wang <jasowang@redhat.com>
>>>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>>>
>>>>
>>>>
>>>> So at this point, are we hitting a deadlock when using 
>>>> experimental_zcopytx ? 
>>>
>>> Yes. But there could be another possibility that it was not caused 
>>> by vhost_net itself but other places that holds a packet.
>>>
>>> Thanks
>>
>> While bisecting, when I reach this commit 
>> 46d4b68f891bee5d83a32508bfbd9778be6b1b63, the system kernel panic 
>> when I run virt-customize :
>>
>> Message from syslogd@zappa at Dec  8 12:52:06 ...
>>  kernel:[  350.016376] Kernel panic - not syncing: Fatal exception in 
>> interrupt
>>
>> I marked that commit as bad again.   Will continue bisecting!
>>
>
> It looks like the first bad commit would be the following:
>
> [jenkins@zappa linux-stable-new]$ sudo bash bisect.sh -g
> 3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit
> commit 3ece782693c4b64d588dd217868558ab9a19bfe7
> Author: Willem de Bruijn <willemb@google.com>
> Date:   Thu Aug 3 16:29:38 2017 -0400
>
>     sock: skb_copy_ubufs support for compound pages
>
>     Refine skb_copy_ubufs to support compound pages. With upcoming TCP
>     zerocopy sendmsg, such fragments may appear.
>
>     The existing code replaces each page one for one. Splitting each
>     compound page into an independent number of regular pages can result
>     in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.
>
>     Instead, fill all destination pages but the last to PAGE_SIZE.
>     Split the existing alloc + copy loop into separate stages:
>     1. compute bytelength and minimum number of pages to store this.
>     2. allocate
>     3. copy, filling each page except the last to PAGE_SIZE bytes
>     4. update skb frag array
>
>     Signed-off-by: Willem de Bruijn <willemb@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> :040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2 
> 6ecf86d9f06a2d98946f531f1e4cf803de071b10 M    include
> :040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb 
> 4fc8384362693e4619fab39b0a945f6f2349226b M    net
>
> Here is the bisect log:

Thanks for the hard bisecting.

Cc netdev and Willem.


>
> [root@zappa linux-stable-new]# git bisect log
> git bisect start
> # bad: [2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e] Linux 4.14-rc1
> git bisect bad 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e
> # good: [e87c13993f16549e77abce9744af844c55154349] Linux 4.13.16
> git bisect good e87c13993f16549e77abce9744af844c55154349
> # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
> git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
> # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
> git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
> # bad: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge 
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect bad aae3dbb4776e7916b6cd442d00159bea27a695c1
> # good: [bf1d6b2c76eda86159519bf5c427b1fa8f51f733] Merge tag 
> 'staging-4.14-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect good bf1d6b2c76eda86159519bf5c427b1fa8f51f733
> # bad: [e833251ad813168253fef9915aaf6a8c883337b0] rxrpc: Add 
> notification of end-of-Tx phase
> git bisect bad e833251ad813168253fef9915aaf6a8c883337b0
> # bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 
> 'wireless-drivers-next-for-davem-2017-08-07' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
> git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63
> # good: [cf6c6ea352faadb15d1373d890bf857080b218a4] iwlwifi: mvm: fix 
> the FIFO numbers in A000 devices
> git bisect good cf6c6ea352faadb15d1373d890bf857080b218a4
> # good: [65205cc465e9b37abbdbb3d595c46081b97e35bc] sctp: remove the 
> typedef sctp_addiphdr_t
> git bisect good 65205cc465e9b37abbdbb3d595c46081b97e35bc
> # bad: [ecbd87b8430419199cc9dd91598d5552a180f558] phylink: add support 
> for MII ioctl access to Clause 45 PHYs
> git bisect bad ecbd87b8430419199cc9dd91598d5552a180f558
> # bad: [52267790ef52d7513879238ca9fac22c1733e0e3] sock: add MSG_ZEROCOPY
> git bisect bad 52267790ef52d7513879238ca9fac22c1733e0e3
> # good: [04b1d4e50e82536c12da00ee04a77510c459c844] net: core: Make the 
> FIB notification chain generic
> git bisect good 04b1d4e50e82536c12da00ee04a77510c459c844
> # good: [9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c] ipv6: Regenerate 
> host route according to node pointer upon loopback up
> git bisect good 9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c
> # good: [0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549] mlxsw: 
> spectrum_router: Add support for route replace
> git bisect good 0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549
> # good: [84b7187ca2338832e3af58eb5123c02bb6921e4e] Merge branch 
> 'mlxsw-Support-for-IPv6-UC-router'
> git bisect good 84b7187ca2338832e3af58eb5123c02bb6921e4e
> # bad: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs 
> support for compound pages
> git bisect bad 3ece782693c4b64d588dd217868558ab9a19bfe7
> # good: [98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f] sock: allocate skbs 
> from optmem
> git bisect good 98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f
> # first bad commit: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: 
> skb_copy_ubufs support for compound pages
>
>

next      parent reply	other threads:[~2017-12-19  3:36 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <efd45fba-5724-0036-8473-0274b5816ae9@redhat.com>
     [not found] ` <10fe2b98-1e26-9539-9f49-0d01f8693e04@redhat.com>
     [not found]   ` <6b41b4e5-6c0c-fce6-21fe-02dd8f550095@redhat.com>
     [not found]     ` <c63ba0d1-c0d2-85c6-ad1c-7f777b59eae8@redhat.com>
     [not found]       ` <634116a6-6338-4249-7d2d-430b654cc99c@redhat.com>
     [not found]         ` <1f789868-7fda-3553-7078-3298873fb355@redhat.com>
     [not found]           ` <918c4152-bcf9-b28c-0f54-f51d07d82bfc@redhat.com>
     [not found]             ` <b8b2238c-30a0-6743-9399-ec441fd6043e@redhat.com>
     [not found]               ` <68b5d4aa-1d48-d9a1-fc47-62ee8d7ad07a@redhat.com>
     [not found]                 ` <623df785-b79c-80d1-899f-6fcc10f70e69@redhat.com>
     [not found]                   ` <61be2e2b-9aeb-1a82-d607-a6af00f8c9c6@redhat.com>
     [not found]                     ` <094aabc6-4e6b-841e-2b7b-177b31e8ed07@redhat.com>
     [not found]                       ` <ce97ed55-75fa-2f5d-d6cb-8f7e356a555a@redhat.com>
     [not found]                         ` <9da15781-b6e0-3688-f6b2-2ef483b39d0d@redhat.com>
     [not found]                           ` <2c153ff8-57cc-715b-6d2f-1758bcb66abb@redhat.com>
     [not found]                             ` <dcaafacd-182c-6bf9-636a-0726299f7ce2@redhat.com>
     [not found]                               ` <4c8c81e6-e582-f292-79ed-f3d62518e2d9@redhat.com>
     [not found]                                 ` <af1452f1-c327-b7f1-11ac-dc01e22bcfb5@redhat.com>
     [not found]                                   ` <c3b3f34e-fe64-8ab3-3617-f98313526f9f@redhat.com>
2017-12-19  3:36                                     ` Jason Wang [this message]
2017-12-19 16:19                                       ` Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cd0ea113-34ff-d24a-b798-819dfb536c76@redhat.com \
    --to=jasowang@redhat.com \
    --cc=dhill@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox