From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>
Cc: marcandre.lureau@redhat.com, qemu-devel@nongnu.org,
Victor Kaplansky <vkaplans@redhat.com>
Subject: Re: [Qemu-devel] Hotplug ram and vhost-user
Date: Thu, 7 Dec 2017 19:33:10 +0100 [thread overview]
Message-ID: <7c88b2f7-45a3-a470-c458-7266c485befa@redhat.com> (raw)
In-Reply-To: <20171207182348.GF2439@work-vm>
On 12/07/2017 07:23 PM, Dr. David Alan Gilbert wrote:
> * Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
>>
>>
>> On 12/07/2017 05:25 PM, Dr. David Alan Gilbert wrote:
>>> * Maxime Coquelin (maxime.coquelin@redhat.com) wrote:
>>>> Hi David,
>>>>
>>>> On 12/05/2017 06:41 PM, Dr. David Alan Gilbert wrote:
>>>>> Hi,
>>>>> Since I'm reworking the memory map update code I've been
>>>>> trying to test it with hot adding RAM; but even on upstream
>>>>> I'm finding that hot adding RAM causes the guest to stop passing
>>>>> packets with vhost-user-bridge; have either of you seen the same
>>>>> thing?
>>>>
>>>> No, I have never tried this.
>>>
>>> Would you know if it works on dpdk?
>>
>> We have a known issue in DPDK: the PMD threads might be accessing the
>> guest memory while the vhost-user protocol thread is unmapping it.
>>
>> We have a similar problem with the dirty logging area, and Victor is
>> working on a patch that will fix both issues.
>>
>> Once ready, I'll have a try and let you know.
>>
>>>>> I'm doing:
>>>>> ./tests/vhost-user-bridge -u /tmp/vubrsrc.sock
>>>>> $QEMU -enable-kvm -m 1G,maxmem=2G,slots=4 -smp 2 -object memory-backend-file,id=mem,size=1G,mem-path=/dev/shm,share=on -numa node,memdev=mem -mem-prealloc -trace events=vhost-trace-file -chardev socket,id=char0,path=/tmp/vubrsrc.sock -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 $IMAGE -net none
>>>>>
>>>>> (with a f27 guest) and then doing:
>>>>> (qemu) object_add memory-backend-file,id=mem1,size=256M,mem-path=/dev/shm
>>>>> (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
>>>>>
>>>>> but then not getting any responses inside the guest.
>>>>>
>>>>> I can see the code sending another set-mem-table with the
>>>>> extra chunk of RAM and fd, and I think I can see the bridge
>>>>> mapping it.
>>>>
>>>> I think there are at least two problems.
>>>> The first one is that vhost-user-bridge does not support the
>>>> vhost-user protocol's reply-ack feature. So when QEMU sends a request,
>>>> it cannot know whether/when it has been handled by the backend.
>>>
>>> Wouldn't you have to be unlucky for that to cause a problem - i.e. the
>>> descriptors would have to get allocated in the new RAM?
>>
>> Yes, you may be right. I think it is worth debugging to understand
>> what is going on.
>>
>>>> It had been fixed by sending a GET_FEATURES request to be sure the
>>>> SET_MEM_TABLE had been handled, as messages are processed in order.
>>>> The problem is that it caused some test failures when using TCG, so
>>>> it got reverted.
>>>>
>>>> The initial fix:
>>>>
>>>> commit 28ed5ef16384f12500abd3647973ee21b03cbe23
>>>> Author: Prerna Saxena <prerna.saxena@nutanix.com>
>>>> Date: Fri Aug 5 03:53:51 2016 -0700
>>>>
>>>> vhost-user: Attempt to fix a race with set_mem_table.
>>>>
>>>> The revert:
>>>>
>>>> commit 94c9cb31c04737f86be29afefbff401cd23bc24d
>>>> Author: Michael S. Tsirkin <mst@redhat.com>
>>>> Date: Mon Aug 15 16:35:24 2016 +0300
>>>>
>>>> Revert "vhost-user: Attempt to fix a race with set_mem_table."
>>>>
>>>
>>> Do we know which tests fail?
>>
>> vhost-user-test, but it should no longer be failing now that it no
>> longer uses TCG.
>>
>> I think we could consider reverting the revert, i.e. send GET_FEATURES
>> in set_mem_table to be sure it has been handled.
>
> How does it fail? Does it fail every time or only some times?
> (The postcopy test in migration-test.c also fails under TCG under
> very heavy load and I've not figured out why yet).
I'm trying to remember the analysis I did one year ago... I don't yet
have the full picture, but I found some notes I took at that time:
"
I have managed to reproduce the hang by adding some debug prints into
vhost_user_get_features().
Doing this the issue is reproducible quite easily.
Another way to reproduce it in one shot is to strace (following forks)
the /vhost-user-test execution.
So, by adding debug prints at vhost_user_get_features() entry and exit,
we can see that we never return from this function when the hang happens.
An strace of the QEMU instance shows that its thread keeps retrying to
receive the GET_FEATURES reply:
write(1, "vhost_user_get_features IN: \n", 29) = 29
sendmsg(11, {msg_name=NULL, msg_namelen=0,
msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}],
msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
nanosleep({0, 100000}, 0x7fff29f8dd70) = 0
...
recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
nanosleep({0, 100000}, 0x7fff29f8dd70) = 0
The reason is that vhost-user-test never replies to QEMU,
because its thread handling the GET_FEATURES command is waiting for
the s->data_mutex lock.
This lock is held by the other vhost-user-test thread, executing
read_guest_mem().
The lock is never released because that thread is blocked in the read
syscall, when read_guest_mem() is doing the readl().
This is because, on the QEMU side, the thread polling the qtest socket is
waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but
that mutex is held by the thread trying to get the GET_FEATURES reply
(the TCG one).
"
It does not explain why it would only fail with TCG; I would need to
spend some time investigating the issue to remember why I claimed this.
Maxime
> Dave
>
>>>> Another problem is that memory mmapped by the previous call does not
>>>> seem to be unmapped, but that should not cause problems other than
>>>> leaking virtual memory.
>>>
>>> Oh, leaks are the least of our problem there!
>>
>> Sure.
>>
>> Maxime
>>> Dave
>>>
>>>> Maxime
>>>>> Dave
>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
Thread overview: 11+ messages
2017-12-05 17:41 [Qemu-devel] Hotplug ram and vhost-user Dr. David Alan Gilbert
2017-12-07 15:52 ` Maxime Coquelin
2017-12-07 15:56 ` Michael S. Tsirkin
2017-12-07 16:35 ` Maxime Coquelin
2017-12-07 16:55 ` Michael S. Tsirkin
2017-12-07 16:56 ` Michael S. Tsirkin
2017-12-07 16:25 ` Dr. David Alan Gilbert
2017-12-07 16:42 ` Maxime Coquelin
2017-12-07 18:23 ` Dr. David Alan Gilbert
2017-12-07 18:33 ` Maxime Coquelin [this message]
2017-12-07 18:57 ` Dr. David Alan Gilbert