From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 23 Nov 2015 08:48:32 -0500 (EST)
From: Marc-André Lureau
Message-ID: <1032710289.15307566.1448286512454.JavaMail.zimbra@redhat.com>
In-Reply-To: <87r3jg26fw.fsf@blackfin.pond.sub.org>
References: <87si401wpf.fsf@blackfin.pond.sub.org>
 <87si40sfzh.fsf@blackfin.pond.sub.org>
 <430569618.12530858.1448043657890.JavaMail.zimbra@redhat.com>
 <874mggo3yc.fsf@blackfin.pond.sub.org>
 <243512039.12566588.1448050691647.JavaMail.zimbra@redhat.com>
 <87oael587p.fsf@blackfin.pond.sub.org>
 <802808837.15238786.1448280956337.JavaMail.zimbra@redhat.com>
 <87r3jg26fw.fsf@blackfin.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] ivshmem property size should be a size, not a string
To: Markus Armbruster
Cc: marcandre lureau, Claudio Fontana, qemu-devel@nongnu.org, Luiz Capitulino

Hi

----- Original Message -----
> > "role" was designed to only migrate the master. Ability to migrate a pool of
> > peer would be a significant new feature. I am not aware of such request.
>
> I see.  But how is this supposed to work?
>
> Before migration: one master and N peers connected to the server on host
> A, N>=0.
>
> After migration: one master and N' of the N peers connected to the
> server on host B, N>=N'>=0, and the remaining N-N' peers still on host A
> with their ivshmem device unplugged.
>
> How would I do this even for N'==0?  I can't see how I'm supposed to
> connect the migrated shared memory to a server on host B.

I am not sure I understand you. You can't migrate the peers. As I said, "ability to migrate a pool of peer would be a significant new feature".

> >> Did you try chardev=...,size=S, where S is larger than what the server
> >> provides?
> >
> > It will fall in check_shm_size().
>
> Yes.  Called from ivshmem_read().  ivshmem_read() will then complain to
> stderr, close the file descriptor we got from the server and leave the
> BAR unmapped.  My question is how guests deal with that state.  Could be
> anything from "detect the device is broken and fence it" to "kernel
> panic".
> Whatever it is, it could easily also happen if the guest wins the race
> with the server and tries to use the device before it successfully got
> its shared memory from the server.

It's nothing bad from what I remember on the qemu side. On the guest side, it depends on how your driver/user is implemented, I suppose. To me, it's not a normal case, and the error should be enough to diagnose it.

> 1. Unless the guest can reliably detect the doorbell feature, the
>    doorbell feature is *broken*.
>
>    As far as I can tell, a device with a doorbell behaves exactly like
>    one without a doorbell until it got its shared memory from the
>    server.  If that's correct, then doorbell detection is inherently
>    racy.

There are many ways you can do synchronization. In test_ivshmem_server(), I trivially wait for the membar with the required signature to be mapped, verify that peers have different ids, and then start using the doorbell. That seems good enough.

>    The only way to fix this in documentation is "broken, do not use".

It works fine in the tests. Feel free to point out races or other issues.
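
To make the synchronization described above concrete, here is a rough sketch of the "wait for the signature, check the ids, then use the doorbell" sequence. It is not the code from test_ivshmem_server(): the names (wait_for_signature, peer_ids_ok, SKETCH_SIGNATURE, the fake BAR) are hypothetical, and the assumption that an unmapped BAR reads back as all ones is only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature value that a peer writes at offset 0 of the shared memory. */
#define SKETCH_SIGNATURE 0xdeadbeefu

/* Callback that reads 32 bits from the memory BAR; supplied by the caller. */
typedef uint32_t (*read32_fn)(void *opaque, size_t offset);

/* Poll the memory BAR until the signature shows up, or give up after max_polls reads. */
static bool wait_for_signature(read32_fn read32, void *opaque, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (read32(opaque, 0) == SKETCH_SIGNATURE) {
            return true;
        }
    }
    return false;
}

/* The doorbell is only used once both peers hold valid, distinct ids. */
static bool peer_ids_ok(int64_t id_a, int64_t id_b)
{
    return id_a >= 0 && id_b >= 0 && id_a != id_b;
}

/* Fake BAR for demonstration: reads all ones until "mapped" (an assumption). */
struct fake_bar {
    unsigned polls_until_mapped;
};

static uint32_t fake_read32(void *opaque, size_t offset)
{
    struct fake_bar *bar = opaque;

    (void)offset;
    if (bar->polls_until_mapped > 0) {
        bar->polls_until_mapped--;
        return UINT32_MAX;
    }
    return SKETCH_SIGNATURE;
}

/* Convenience wrapper: does the signature appear within max_polls reads? */
static bool demo_wait(unsigned ready_after, unsigned max_polls)
{
    struct fake_bar bar = { ready_after };

    return wait_for_signature(fake_read32, &bar, max_polls);
}
```

A guest driver following this pattern would only ring the doorbell after both wait_for_signature() and peer_ids_ok() succeed. Whether that is enough to close the race under discussion is exactly the open question in this thread.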

>    The maximally compatible way to fix this in code is to ensure the
>    guest can't read register IVPosition before we got the shared memory
>    from the server.  We can make realize wait, or the read.  The latter
>    is probably an even worse idea.
>
>    An easier way to fix it in code is splitting up the device, so guests
>    can simply check the PCI device ID to figure out whether they have
>    one with a doorbell.
>
>    An even easier way is dropping the doorbell feature outright.
>
> 2. The UI is crap.
>
>    We can fix this by rejecting nonsensical option combinations.

Yes, I think it's the simplest way for now. I dislike having to break stuff when you can overcome it with a few more checks.

>    However, the result will be more complex than splitting the device in
>    two so that nonsensical option combinations are simply impossible.

I disagree: adding more checks will add a few dozen lines with minimal impact. Splitting things will break stuff and require significant effort to share correctly what can be shared, etc.

>    If we need to split it anyway to fix the doorbell, we can clean up
>    the UI at next to no cost.

I don't think the doorbell is broken.
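
For illustration only, the "few more checks" approach could look roughly like the sketch below. The struct, the function name, and the specific rules are assumptions made for this example, not the actual QEMU code. The sketch assumes that the server connection (chardev) and a plain shared memory object (shm) are mutually exclusive, and that the doorbell-related options only make sense with a server.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the device options (not QEMU's real state struct). */
struct ivshmem_opts_sketch {
    bool has_chardev;   /* chardev=... given: memory comes from the server */
    bool has_shm;       /* shm=... given: memory comes from a shm object */
    bool msi;           /* doorbell-related option */
    bool ioeventfd;     /* doorbell-related option */
};

/* Return NULL if the combination is accepted, else a message describing the problem. */
static const char *check_opts_sketch(const struct ivshmem_opts_sketch *o)
{
    if (o->has_chardev && o->has_shm) {
        return "chardev and shm are mutually exclusive";
    }
    if (!o->has_chardev && !o->has_shm) {
        return "either chardev or shm is required";
    }
    if (!o->has_chardev && (o->msi || o->ioeventfd)) {
        return "msi and ioeventfd require chardev";
    }
    return NULL;
}
```

A realize function could call such a helper first and report the returned message as an error. The exact set of rejected combinations would still need to be agreed on.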