Message-ID: <4A97FA9B.1030401@codemonkey.ws>
Date: Fri, 28 Aug 2009 10:41:15 -0500
From: Anthony Liguori
Subject: Re: [Qemu-devel] [BUG] Regression of exec migration
References: <4A969496.2070305@codemonkey.ws> <076B9FA6-C362-47C1-AA8B-70BF147843A6@irisa.fr> <4A96B353.8070600@codemonkey.ws> <65558604-590A-4FD7-877D-E676CE195C7A@irisa.fr>
In-Reply-To: <65558604-590A-4FD7-877D-E676CE195C7A@irisa.fr>
List-Id: qemu-devel.nongnu.org
To: Pierre Riteau
Cc: Chris Lalancette, qemu-devel@nongnu.org

Pierre Riteau wrote:
> On 27 août 09, at 18:24, Anthony Liguori wrote:
>
>> Pierre Riteau wrote:
>>> On 27 août 09, at 16:13, Anthony Liguori wrote:
>>>
>>>> Pierre Riteau wrote:
>>>>> [Sorry Chris, resending without the giant attachments.]
>>>>>
>>>>> Commit 907500095851230a480b14bc852c4e49d32cb16d makes exec
>>>>> migration much slower than before.
>>>>> I'm running the latest HEAD of qemu, on Debian Lenny 5.0.2.
>>>>>
>>>>> I'm migrating a fully booted Linux VM (also running Lenny) with
>>>>> 128MB of RAM to a file, using the following command: migrate
>>>>> "exec: cat > vmimage". The resulting file has a size of 57MB
>>>>> (because we save only what is allocated from the 128MB).
>>>>> With the current HEAD, it takes from 15 to 40 seconds (it's
>>>>> variable) to perform the migration to the file.
>>>>> With commit 907500095851230a480b14bc852c4e49d32cb16d reverted (or
>>>>> just commenting the "socket_set_nonblock(s->fd);" statement), it
>>>>> takes about 3 seconds.
>>>>
>>>> Without that changeset, it wasn't a live migration. The better way
>>>> to compare would be to issue stop before doing the migrate and
>>>> compare that time with the previous time.
>>>>
>>>> When a migration is live, it's iterative which means there's more
>>>> work to do.
>>>
>>> I tried with stop too, and I get the same results. It's an idle VM
>>> so only a small number of pages are being modified while the
>>> migration is going on.
>>> I agree that the changeset seems good, the code it replaces was
>>> obviously wrong.
>>> But I think there is something wrong somewhere else, unless it is
>>> considered normal that it takes so much time for an exec migration.
>>> To compare, using the same setup with one more machine and a Gigabit
>>> network, a tcp migration capped at 35m (the slowest speed I've
>>> measured from the disk, it can be way faster) takes about the same
>>> time, between 2 and 4 seconds.
>>
>> I don't think the difference between 3 seconds and 15 seconds is
>> significant.
>>
>> Can you try a different workload that will result in a migration that
>> takes much longer (say multiple minutes)? That is, I'd like to know
>> whether there's a fixed greater cost of exec: migration vs. factor of 5.
>>
>> I expect exec: to be slower because there is more copying but not by
>> a factor of 5.
>> I expect that it's going to be a combination of
>> relatively small constant factor + relatively small constant fixed cost.
>>
>> Regards,
>>
>> Anthony Liguori
>
>
> I did more tests, now with a 1024MB VM. Before launching the migration
> I run a simple program that allocates 900MB of memory and fills it
> with random data, then sleeps.
> The results are:
> TCP migration: ~ 30s
> exec migration to hard drive: ~ 4 min 20s
> exec migration to netcat (replicating the setup used with the TCP
> migration): ~ 5 min 40s

I think the fundamental problem is that exec migration shouldn't use
popen. It should create a pipe() and do a proper fork/exec. I don't
think the f* functions support asynchronous IO properly.

Regards,

Anthony Liguori
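
For illustration, the pipe() plus fork/exec approach suggested above could
look roughly like the sketch below. This is not QEMU code: the names
exec_outgoing_start() and write_all() are hypothetical, and the poll() call
only stands in for whatever the main loop would do when the descriptor
becomes writable again. The point is that the parent ends up with a raw,
non-blocking file descriptor instead of a stdio FILE * from popen.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a QEMU function): run `command` under /bin/sh
 * with its stdin connected to a pipe. Returns the non-blocking write end
 * of the pipe, or -1 on error. The child's pid is stored in *pid_out. */
int exec_outgoing_start(const char *command, pid_t *pid_out)
{
    int fds[2];
    pid_t pid;
    int flags;

    if (pipe(fds) < 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: make the read end stdin, then exec the shell command. */
        close(fds[1]);
        if (fds[0] != STDIN_FILENO) {
            if (dup2(fds[0], STDIN_FILENO) < 0) {
                _exit(127);
            }
            close(fds[0]);
        }
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: keep only the write end and switch it to non-blocking mode. */
    close(fds[0]);
    flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[1]);          /* child sees EOF and exits */
        waitpid(pid, NULL, 0);
        return -1;
    }

    *pid_out = pid;
    return fds[1];
}

/* Hypothetical helper: write the whole buffer to a non-blocking fd.
 * On EAGAIN it waits with poll(); a real event loop would instead
 * register a write handler and return to the loop. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n > 0) {
            buf += n;
            len -= (size_t)n;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };

            if (poll(&p, 1, -1) < 0 && errno != EINTR) {
                return -1;
            }
        } else {
            return -1;
        }
    }
    return 0;
}
```

A caller would pass the command given to "exec:" (for example
"cat > vmimage"), write the migration stream to the returned descriptor,
then close() it and waitpid() on the child to collect its exit status.
Because write() is used directly, a full pipe shows up as EAGAIN rather
than being hidden behind stdio buffering.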