From mboxrd@z Thu Jan 1 00:00:00 1970 From: Igor Mitsyanko Subject: Re: [Qemu-devel] KVM call agenda for tuesday 31 Date: Mon, 05 Mar 2012 17:38:08 +0400 Message-ID: <4F54C1C0.6030803@samsung.com> References: <87ehuhrpel.fsf@elfo.elfo> <4F272A92.2010609@suse.de> <4F272D8C.8020608@codemonkey.ws> <4F27E98E.2080501@suse.de> Reply-To: i.mitsyanko@samsung.com Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7BIT Cc: =?UTF-8?B?QW5kcmVhcyBGw6RyYmVy?= , Anthony Liguori , quintela@redhat.com, Developers qemu-devel , KVM devel mailing list , Dmitry Solodkiy To: Peter Maydell Return-path: Received: from mailout1.w1.samsung.com ([210.118.77.11]:38726 "EHLO mailout1.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932217Ab2CENiM (ORCPT ); Mon, 5 Mar 2012 08:38:12 -0500 Received: from euspt2 (mailout1.w1.samsung.com [210.118.77.11]) by mailout1.w1.samsung.com (iPlanet Messaging Server 5.2 Patch 2 (built Jul 14 2004)) with ESMTP id <0M0E00BC1Z7KPQ@mailout1.w1.samsung.com> for kvm@vger.kernel.org; Mon, 05 Mar 2012 13:38:08 +0000 (GMT) Received: from [106.109.8.162] by spt2.w1.samsung.com (iPlanet Messaging Server 5.2 Patch 2 (built Jul 14 2004)) with ESMTPA id <0M0E00LYEZ7LFR@spt2.w1.samsung.com> for kvm@vger.kernel.org; Mon, 05 Mar 2012 13:38:10 +0000 (GMT) In-reply-to: Sender: kvm-owner@vger.kernel.org List-ID: On 02/21/2012 07:33 PM, Peter Maydell wrote: > On 9 February 2012 22:23, Peter Maydell wrote: >> Ping re the VMState and variable sized arrays issue. I don't >> see any consensus in this discussion for a different approach, >> so should we just commit Mitsyanko's patchset? > > From an IRC conversation I just had with Anthony and Juan: > ===begin== > 14:51< pm215> quintela: do you have an opinion on the vmstate variable- > length array stuff (needed for sd card) ? > 14:51< quintela> pm havent looked, email title? > 14:52< pm215> "KVM call agenda for tuesday 31" is the most recent > discussion :-) > 14:53< pm215> http://patchwork.ozlabs.org/patch/137732/ and > http://patchwork.ozlabs.org/patch/137733/ are the relevant patches > 14:54< quintela> pm215: found it, that it is a difficult thing to do (TM) > 14:54< quintela> it should be on the "card" file, or whatever :-( > 14:55< quintela> notice the "should" part. > 14:55< pm215> I'm not sure what you mean, can you elaborate? > 14:57< quintela> pm215: sect is number of sectors, right? > 14:58< pm215> yes > 14:59< quintela> so, a 1GB card would have around 8MB array? > 14:59< quintela> took or left some byties here andthere. > 14:59< quintela> bytes indeed. > 14:59< quintela> I _think_ that we should put that in a save_live > section, but that is just me (TM) > 15:00< quintela> I guess that at some point, people are going to need > bigger SD cards (16GB are already on the wild, right) > 15:00< quintela> and that would make live migration just impossible > 15:00< quintela> or very slow, that is completely equivalent. > 15:01< quintela> it is my understanding that AHCI is using similar code, > or did I missread some of the information? > 15:03< pm215> I think alex would like ahci to use a similar 'variable > length array' thing, but in that case the array is much smaller > 15:03< aliguori> pm215, it's large but of bounded size, right? > 15:03< aliguori> for SD cards? > 15:03< quintela> aliguori: number of sectors * 4 bytes > 15:03< quintela> aliguori: so hugeeeeeeeeeeeeeeeeee > 15:04< quintela> 8MB array for each 1GB of disk. > 15:04< quintela> but or take some bytes. > 15:04< aliguori> quintela, you cannot save that much data via savevm > 15:04< quintela> this is more than all other devices combined in a > normal instalaltion. > 15:04< aliguori> that's too much > 15:04< quintela> aliguori: see my answer, we need a save_live section, > really. > 15:04< aliguori> it will screw up the live migration downtime algorithm > 15:04< aliguori> pm215, ^ > 15:04< aliguori> or just treat it as a ram section > 15:05< aliguori> qemu_ram_alloc() it, and call it a day > 15:05< pm215> the spec isn't very clear, but I think technically this > info should go in the sd card image, except there's no way to tack > additional info into a raw file > 15:05< aliguori> pm215, yeah, qemu_ram_alloc() is the way to go I think, > that makes it effectively volatile on-card RAM > 15:05< quintela> pm215: I fully agree that it should go into the card > image, but ..... no space for it :-( > 15:05< aliguori> which i think makes sense > 15:05< quintela> pm215: another thing, forgetting about migration at all. > 15:06< quintela> how does this work if you stop marchine and restart > it again? > 15:06 * quintela guess that it is stored somewhere? > 15:06< quintela> s/marchine/machine/ > 15:06< pm215> no, we just assume that any fresh sd card image has no > write-protect set up > 15:07< quintela> pm215: what is stored on that image? /me would have > assumed that wearing information > 15:07< quintela> but that is without reading the whole code. > 15:09< quintela> humm, it looks like only 1 bit is used for each sector, > why are we storing 32 bits if we only use 1 bit? > 15:09< pm215> it's write-protect : you can set parts of the sd card to > not be writeable (with a granularity of a "write-protect group size") > 15:09< pm215> we probably don't implement fantastically efficiently > 15:10< quintela> pm215: ok, only 1 bit is needed. > 15:10< quintela> we can move to 1bit/sector (8 times smaller) > 15:10< quintela> but I still think that doing the qemu_ram_alloc() > trick that aliguori pointed is the easiest way to fix it > 15:11< quintela> you can create a ram_save_live section, but it is > going to be more complex for almost no gain > 15:11< pm215> ah, so we qemu_ram_alloc() it and then the contents get > transferred in the same way as main memory ? > 15:12< pm215> ...that is in exec-obsolete.h and marked as "to be > removed soon"... > 15:13< aliguori> pm215, yeah, so you'll need to create it using > whatever the new fancy memory api interface is > 15:13< aliguori> pm215, note that whenever you touch that memory, you > have to set the dirty bits appropriately > 15:13< aliguori> or else live migration won't work > 15:14< quintela> aliguori: if they have to touch the dirty bits, it is > equivalent to do their own save_live section. > 15:14< quintela> but I agree that this is the only "easy" solution. > 15:17< pm215> doesn't sound too hard... > 15:18< quintela> as usual with vmstate, problem is testing (althought > shouldn't be very difficult, either) > ===endit=== > > Short summary: > * switch wp groups to bitfield rather than int array > * convert sd.c to use memory_region_init_ram() to allocate the wp groups > (being careful to use memory_region_set_dirty() when we touch them) > * we don't need variable-length fields for sd.c any more > * rest of the vmstate conversion is straightforward > > -- PMM > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > OK, it turned out to be not so simple, we can't use memory API in sd.c because TARGET_PHYS_ADDR_BITS value (and, consequently, target_phys_addr_t) is not defined for common objects. -- Mitsyanko Igor ASWG, Moscow R&D center, Samsung Electronics email: i.mitsyanko@samsung.com From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:39299) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4Y7I-0004kV-Kk for qemu-devel@nongnu.org; Mon, 05 Mar 2012 08:38:26 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S4Y7C-0003QV-El for qemu-devel@nongnu.org; Mon, 05 Mar 2012 08:38:20 -0500 Received: from mailout4.w1.samsung.com ([210.118.77.14]:61005) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4Y7C-0003Q9-61 for qemu-devel@nongnu.org; Mon, 05 Mar 2012 08:38:14 -0500 MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; charset=UTF-8; format=flowed Received: from euspt2 ([210.118.77.14]) by mailout4.w1.samsung.com (Sun Java(tm) System Messaging Server 6.3-8.04 (built Jul 29 2009; 32bit)) with ESMTP id <0M0E00HKFZ7M6X80@mailout4.w1.samsung.com> for qemu-devel@nongnu.org; Mon, 05 Mar 2012 13:38:10 +0000 (GMT) Received: from [106.109.8.162] by spt2.w1.samsung.com (iPlanet Messaging Server 5.2 Patch 2 (built Jul 14 2004)) with ESMTPA id <0M0E00LYEZ7LFR@spt2.w1.samsung.com> for qemu-devel@nongnu.org; Mon, 05 Mar 2012 13:38:10 +0000 (GMT) Date: Mon, 05 Mar 2012 17:38:08 +0400 From: Igor Mitsyanko In-reply-to: Message-id: <4F54C1C0.6030803@samsung.com> References: <87ehuhrpel.fsf@elfo.elfo> <4F272A92.2010609@suse.de> <4F272D8C.8020608@codemonkey.ws> <4F27E98E.2080501@suse.de> Subject: Re: [Qemu-devel] KVM call agenda for tuesday 31 Reply-To: i.mitsyanko@samsung.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Maydell Cc: KVM devel mailing list , quintela@redhat.com, Developers qemu-devel , Dmitry Solodkiy , Anthony Liguori , =?UTF-8?B?QW5kcmVhcyBGw6RyYmVy?= On 02/21/2012 07:33 PM, Peter Maydell wrote: > On 9 February 2012 22:23, Peter Maydell wrote: >> Ping re the VMState and variable sized arrays issue. I don't >> see any consensus in this discussion for a different approach, >> so should we just commit Mitsyanko's patchset? > > From an IRC conversation I just had with Anthony and Juan: > ===begin== > 14:51< pm215> quintela: do you have an opinion on the vmstate variable- > length array stuff (needed for sd card) ? > 14:51< quintela> pm havent looked, email title? > 14:52< pm215> "KVM call agenda for tuesday 31" is the most recent > discussion :-) > 14:53< pm215> http://patchwork.ozlabs.org/patch/137732/ and > http://patchwork.ozlabs.org/patch/137733/ are the relevant patches > 14:54< quintela> pm215: found it, that it is a difficult thing to do (TM) > 14:54< quintela> it should be on the "card" file, or whatever :-( > 14:55< quintela> notice the "should" part. > 14:55< pm215> I'm not sure what you mean, can you elaborate? > 14:57< quintela> pm215: sect is number of sectors, right? > 14:58< pm215> yes > 14:59< quintela> so, a 1GB card would have around 8MB array? > 14:59< quintela> took or left some byties here andthere. > 14:59< quintela> bytes indeed. > 14:59< quintela> I _think_ that we should put that in a save_live > section, but that is just me (TM) > 15:00< quintela> I guess that at some point, people are going to need > bigger SD cards (16GB are already on the wild, right) > 15:00< quintela> and that would make live migration just impossible > 15:00< quintela> or very slow, that is completely equivalent. > 15:01< quintela> it is my understanding that AHCI is using similar code, > or did I missread some of the information? > 15:03< pm215> I think alex would like ahci to use a similar 'variable > length array' thing, but in that case the array is much smaller > 15:03< aliguori> pm215, it's large but of bounded size, right? > 15:03< aliguori> for SD cards? > 15:03< quintela> aliguori: number of sectors * 4 bytes > 15:03< quintela> aliguori: so hugeeeeeeeeeeeeeeeeee > 15:04< quintela> 8MB array for each 1GB of disk. > 15:04< quintela> but or take some bytes. > 15:04< aliguori> quintela, you cannot save that much data via savevm > 15:04< quintela> this is more than all other devices combined in a > normal instalaltion. > 15:04< aliguori> that's too much > 15:04< quintela> aliguori: see my answer, we need a save_live section, > really. > 15:04< aliguori> it will screw up the live migration downtime algorithm > 15:04< aliguori> pm215, ^ > 15:04< aliguori> or just treat it as a ram section > 15:05< aliguori> qemu_ram_alloc() it, and call it a day > 15:05< pm215> the spec isn't very clear, but I think technically this > info should go in the sd card image, except there's no way to tack > additional info into a raw file > 15:05< aliguori> pm215, yeah, qemu_ram_alloc() is the way to go I think, > that makes it effectively volatile on-card RAM > 15:05< quintela> pm215: I fully agree that it should go into the card > image, but ..... no space for it :-( > 15:05< aliguori> which i think makes sense > 15:05< quintela> pm215: another thing, forgetting about migration at all. > 15:06< quintela> how does this work if you stop marchine and restart > it again? > 15:06 * quintela guess that it is stored somewhere? > 15:06< quintela> s/marchine/machine/ > 15:06< pm215> no, we just assume that any fresh sd card image has no > write-protect set up > 15:07< quintela> pm215: what is stored on that image? /me would have > assumed that wearing information > 15:07< quintela> but that is without reading the whole code. > 15:09< quintela> humm, it looks like only 1 bit is used for each sector, > why are we storing 32 bits if we only use 1 bit? > 15:09< pm215> it's write-protect : you can set parts of the sd card to > not be writeable (with a granularity of a "write-protect group size") > 15:09< pm215> we probably don't implement fantastically efficiently > 15:10< quintela> pm215: ok, only 1 bit is needed. > 15:10< quintela> we can move to 1bit/sector (8 times smaller) > 15:10< quintela> but I still think that doing the qemu_ram_alloc() > trick that aliguori pointed is the easiest way to fix it > 15:11< quintela> you can create a ram_save_live section, but it is > going to be more complex for almost no gain > 15:11< pm215> ah, so we qemu_ram_alloc() it and then the contents get > transferred in the same way as main memory ? > 15:12< pm215> ...that is in exec-obsolete.h and marked as "to be > removed soon"... > 15:13< aliguori> pm215, yeah, so you'll need to create it using > whatever the new fancy memory api interface is > 15:13< aliguori> pm215, note that whenever you touch that memory, you > have to set the dirty bits appropriately > 15:13< aliguori> or else live migration won't work > 15:14< quintela> aliguori: if they have to touch the dirty bits, it is > equivalent to do their own save_live section. > 15:14< quintela> but I agree that this is the only "easy" solution. > 15:17< pm215> doesn't sound too hard... > 15:18< quintela> as usual with vmstate, problem is testing (althought > shouldn't be very difficult, either) > ===endit=== > > Short summary: > * switch wp groups to bitfield rather than int array > * convert sd.c to use memory_region_init_ram() to allocate the wp groups > (being careful to use memory_region_set_dirty() when we touch them) > * we don't need variable-length fields for sd.c any more > * rest of the vmstate conversion is straightforward > > -- PMM > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > OK, it turned out to be not so simple, we can't use memory API in sd.c because TARGET_PHYS_ADDR_BITS value (and, consequently, target_phys_addr_t) is not defined for common objects. -- Mitsyanko Igor ASWG, Moscow R&D center, Samsung Electronics email: i.mitsyanko@samsung.com