[Qemu-devel] Cutting a new QEMU release

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] Cutting a new QEMU release
@ 2009-02-03 20:48 Anthony Liguori
  2009-02-03 20:58 ` Glauber Costa
                   ` (7 more replies)
  0 siblings, 8 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-03 20:48 UTC (permalink / raw)
  To: qemu-devel@nongnu.org

What do people think?  TCG seems to be in a good place.  We've got 
virtio, KVM, live migration, tons of new devices, bsd-user, etc.

We could decide to cut one by the end of the month.  I'm already doing 
some test work in QEMU so I can follow up with some more detailed notes 
about what is working and what isn't working.  That gives us some time 
to decide if there's anything we need to fix before a release.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 20:48 [Qemu-devel] Cutting a new QEMU release Anthony Liguori
@ 2009-02-03 20:58 ` Glauber Costa
  2009-02-03 21:35   ` Laurent Desnogues
  2009-02-03 21:48 ` Rick Vernam
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 82+ messages in thread
From: Glauber Costa @ 2009-02-03 20:58 UTC (permalink / raw)
  To: qemu-devel

On Tue, Feb 3, 2009 at 6:48 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> What do people think?  TCG seems to be in a good place.  We've got virtio,
> KVM, live migration, tons of new devices, bsd-user, etc.
>
> We could decide to cut one by the end of the month.  I'm already doing some
> test work in QEMU so I can follow up with some more detailed notes about
> what is working and what isn't working.  That gives us some time to decide
> if there's anything we need to fix before a release.
>
> Regards,

I'm totally for it.

-- 
Glauber  Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 20:58 ` Glauber Costa
@ 2009-02-03 21:35   ` Laurent Desnogues
  2009-02-03 21:50     ` Anthony Liguori
                       ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: Laurent Desnogues @ 2009-02-03 21:35 UTC (permalink / raw)
  To: qemu-devel

On Tue, Feb 3, 2009 at 9:58 PM, Glauber Costa <glommer@gmail.com> wrote:
> On Tue, Feb 3, 2009 at 6:48 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
>> What do people think?  TCG seems to be in a good place.  We've got virtio,
>> KVM, live migration, tons of new devices, bsd-user, etc.
>>
>> We could decide to cut one by the end of the month.  I'm already doing some
>> test work in QEMU so I can follow up with some more detailed notes about
>> what is working and what isn't working.  That gives us some time to decide
>> if there's anything we need to fix before a release.
>>
>> Regards,
>
> I'm totally for it.

So am I, but who will test user mode and more generally (user and system)
what is the test procedure?

For instance someone (Andzrej?) mentionned  ARM in system mode is half
slower than it was before TCG.  Also the ARM target needs some fixing.

Perhaps doing at least one release candidate to get feedback (and focus on
fixing reported bugs) would be appropriate.

Cheers,

Laurent

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 20:48 [Qemu-devel] Cutting a new QEMU release Anthony Liguori
  2009-02-03 20:58 ` Glauber Costa
@ 2009-02-03 21:48 ` Rick Vernam
  2009-02-03 22:07 ` Daniel P. Berrange
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 82+ messages in thread
From: Rick Vernam @ 2009-02-03 21:48 UTC (permalink / raw)
  To: qemu-devel

On Tuesday 03 February 2009 2:48:22 pm Anthony Liguori wrote:
> What do people think?  TCG seems to be in a good place.  We've got
> virtio, KVM, live migration, tons of new devices, bsd-user, etc.
>
> We could decide to cut one by the end of the month.  I'm already doing
> some test work in QEMU so I can follow up with some more detailed notes
> about what is working and what isn't working.  That gives us some time
> to decide if there's anything we need to fix before a release.
>
> Regards,
>
> Anthony Liguori

-vga vmware doesn't work on either of my wxp guests, nor my w2k guest.

I continue to get triple faults from any Windows XP guest at a particular 
point while booting, when invoked with qemu-system-x86_64.
When invoked with plain qemu, it seems to work just fine.
This is with or without -kqemu or -kernel-kqemu (my athlon 3700+ doesn't do 
kvm).

I've posted about these in the past, and I don't have any additional 
information to add to those (ancient) discussions.  I don't intend to start a 
discussion about them now, this is just a friendly reminder about some 
unresolved issues.

Thanks
-Rick

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 21:35   ` Laurent Desnogues
@ 2009-02-03 21:50     ` Anthony Liguori
  2009-02-03 22:05       ` Laurent Desnogues
  2009-02-04 13:09       ` Ulrich Hecht
  2009-02-04  0:31     ` David Turner
       [not found]     ` <74222928-D24B-4780-BDB0-D537A83C4F68@hotmail.com>
  2 siblings, 2 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-03 21:50 UTC (permalink / raw)
  To: qemu-devel

Laurent Desnogues wrote:
> On Tue, Feb 3, 2009 at 9:58 PM, Glauber Costa <glommer@gmail.com> wrote:
>   
>> I'm totally for it.
>>     
>
> So am I, but who will test user mode and more generally (user and system)
> what is the test procedure?
>   
I'd like to approach this gently.  Historically, there's been no formal 
release process.  I'm not inclined to start out by introducing any sort 
of heavy weight procedure.

I'll poke things as best I can over the next couple weeks.  I encourage 
everyone else to do the same.  I'll keep track of what's working and 
what's broken and make it available publicly.  At some point, we can 
decide as if things are too embarrassing to release or not :-)

> For instance someone (Andzrej?) mentionned  ARM in system mode is half
> slower than it was before TCG.  Also the ARM target needs some fixing.
>
> Perhaps doing at least one release candidate to get feedback (and focus on
> fixing reported bugs) would be appropriate.
>   

A release doesn't have to be perfect to be useful.  I think what matters 
most is whether something is likely to be fixed in the reasonably near 
future.  We're going to have some regressions compared to 0.9.1.  There 
are a number of platforms that are no longer supported (ia64 and s390, 
for instance) but we could wait another year and I doubt these features 
would appear.

Regards,

Anthony Liguori

> Cheers,
>
> Laurent
>
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 21:50     ` Anthony Liguori
@ 2009-02-03 22:05       ` Laurent Desnogues
  2009-02-03 22:47         ` Anthony Liguori
  2009-02-04 13:09       ` Ulrich Hecht
  1 sibling, 1 reply; 82+ messages in thread
From: Laurent Desnogues @ 2009-02-03 22:05 UTC (permalink / raw)
  To: qemu-devel

On Tue, Feb 3, 2009 at 10:50 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> Laurent Desnogues wrote:
[...]
>> So am I, but who will test user mode and more generally (user and system)
>> what is the test procedure?
>>
>
> I'd like to approach this gently.  Historically, there's been no formal
> release process.  I'm not inclined to start out by introducing any sort of
> heavy weight procedure.

Don't take me wrong, I am not for formal processes at all :)  I just want
to be sure user mode won't be forgotten due to the lack of a maintainer.

> I'll poke things as best I can over the next couple weeks.  I encourage
> everyone else to do the same.  I'll keep track of what's working and what's
> broken and make it available publicly.  At some point, we can decide as if
> things are too embarrassing to release or not :-)

I intend on testing various things on my side too.  Be sure I'll let you
know of problems and also will provide patches.

>> For instance someone (Andzrej?) mentionned  ARM in system mode is half
>> slower than it was before TCG.  Also the ARM target needs some fixing.
>>
>> Perhaps doing at least one release candidate to get feedback (and focus on
>> fixing reported bugs) would be appropriate.
>>
>
> A release doesn't have to be perfect to be useful.  I think what matters
> most is whether something is likely to be fixed in the reasonably near
> future.  We're going to have some regressions compared to 0.9.1.  There are
> a number of platforms that are no longer supported (ia64 and s390, for
> instance) but we could wait another year and I doubt these features would
> appear.

I agree we should not care now about targets that are not here anymore.
But things that are important for the community should be taken with
care (and arm linux user mode is certainly a very important target).


Laurent

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 20:48 [Qemu-devel] Cutting a new QEMU release Anthony Liguori
  2009-02-03 20:58 ` Glauber Costa
  2009-02-03 21:48 ` Rick Vernam
@ 2009-02-03 22:07 ` Daniel P. Berrange
  2009-02-04 14:50 ` Aurelien Jarno
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 82+ messages in thread
From: Daniel P. Berrange @ 2009-02-03 22:07 UTC (permalink / raw)
  To: qemu-devel

On Tue, Feb 03, 2009 at 02:48:22PM -0600, Anthony Liguori wrote:
> What do people think?  TCG seems to be in a good place.  We've got 
> virtio, KVM, live migration, tons of new devices, bsd-user, etc.

I'd like to see a new release if at all practical. For Fedora there is a
push to ship KVM and QEMU packages based off the same source tree to make
patching security flaws more pratical. Given that KVM ships off a QEMU
SVN snapshot, having a single source tree would mean shipping our full
multi-arch QEMU package off a SVN snapshot too. I don't find this a
particularly appealing thing - if CVS snapshot is stable enough for it
to be exposed to Fedora users, I'd like to think QEMU developers would
be happy with a official release. If the QEMU dev community considers 
the code too unstable to release, then exposing it to Fedora users seem 
sub-optimal. 

Personally I test & use the i386 and x86_64 system emulator parts of
QEMU, and those seem generally stable enough to base a new release off.
So I'd welcome a new release from that POV. I'll leave others to
comment on quality of the other arch targets. 

> We could decide to cut one by the end of the month.  I'm already doing 
> some test work in QEMU so I can follow up with some more detailed notes 
> about what is working and what isn't working.  That gives us some time 
> to decide if there's anything we need to fix before a release.

A QEMU release by the end of the month would work pretty well for the 
time scale we're working on to get stuff into the Fedora 11 release 
too. 

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 22:05       ` Laurent Desnogues
@ 2009-02-03 22:47         ` Anthony Liguori
  2009-02-03 23:48           ` Glauber Costa
  0 siblings, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2009-02-03 22:47 UTC (permalink / raw)
  To: qemu-devel

Laurent Desnogues wrote:
> On Tue, Feb 3, 2009 at 10:50 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
>   
>> Laurent Desnogues wrote:
>>     
>>> For instance someone (Andzrej?) mentionned  ARM in system mode is half
>>> slower than it was before TCG.  Also the ARM target needs some fixing.
>>>
>>> Perhaps doing at least one release candidate to get feedback (and focus on
>>> fixing reported bugs) would be appropriate.
>>>
>>>       
>> A release doesn't have to be perfect to be useful.  I think what matters
>> most is whether something is likely to be fixed in the reasonably near
>> future.  We're going to have some regressions compared to 0.9.1.  There are
>> a number of platforms that are no longer supported (ia64 and s390, for
>> instance) but we could wait another year and I doubt these features would
>> appear.
>>     
>
> I agree we should not care now about targets that are not here anymore.
> But things that are important for the community should be taken with
> care (and arm linux user mode is certainly a very important target).
>   

If someone is actively fixing it, then I'm perfectly happy to wait.  If 
it's a known issue that noone is resolving, I don't think delaying a 
release helps anyone.  However, documenting all of these things 
somewhere so that they are clearly visible may make it easier for 
someone to fix so the process of going through a release would probably 
be helpful in general.

Regards,

Anthony Liguori

> Laurent
>
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 22:47         ` Anthony Liguori
@ 2009-02-03 23:48           ` Glauber Costa
  0 siblings, 0 replies; 82+ messages in thread
From: Glauber Costa @ 2009-02-03 23:48 UTC (permalink / raw)
  To: qemu-devel

>>
>> I agree we should not care now about targets that are not here anymore.
>> But things that are important for the community should be taken with
>> care (and arm linux user mode is certainly a very important target).
>>
>
> If someone is actively fixing it, then I'm perfectly happy to wait.  If it's
> a known issue that noone is resolving, I don't think delaying a release
> helps anyone.  However, documenting all of these things somewhere so that
> they are clearly visible may make it easier for someone to fix so the
> process of going through a release would probably be helpful in general.
>

We also have to take into account that making a release is likely to increase
the quality of qemu code base as a whole. Just because any user getting into
qemu site today, or getting code from a distro, is likely to be using
something very
old. Old enough that any bug reports will be probably useless to us.

Bug reports against 0.9.1 has happened many times already in the list.


-- 
Glauber  Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 21:35   ` Laurent Desnogues
  2009-02-03 21:50     ` Anthony Liguori
@ 2009-02-04  0:31     ` David Turner
       [not found]     ` <74222928-D24B-4780-BDB0-D537A83C4F68@hotmail.com>
  2 siblings, 0 replies; 82+ messages in thread
From: David Turner @ 2009-02-04  0:31 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1231 bytes --]

On Tue, Feb 3, 2009 at 10:35 PM, Laurent Desnogues <
laurent.desnogues@gmail.com> wrote:

>
> For instance someone (Andzrej?) mentionned  ARM in system mode is half
> slower than it was before TCG.  Also the ARM target needs some fixing.
>

I have integrated the TCG ARM backend in the Android emulator, and my
measurements
show an improvement in performance, when running various Android performance
tests,
between x1.10 and x1.90 compared to the old dyngen based translator. To be
honest,
the improvements are not consistent, there are a few rare tests
that run at x0.89, but they're not critical to me).

Note that the TCG binary is compiled with GCC 4.2, while the old one was
built with
GCC 3.3 (fo rthe usual ugly dyngen reasons).

This is only when comparing the same ARMv5 binaries, but it sounds good
enough for me.

An official release would be very welcomed at this point. The amount of
changes since
the last one has been dramatic. Morever, this will allow everyone to reset
the clock on
their forks and more easily share patches with upstream.

Just my 2 cents

>
> Perhaps doing at least one release candidate to get feedback (and focus on
> fixing reported bugs) would be appropriate.
>
> Cheers,
>
> Laurent
>
>
>

[-- Attachment #2: Type: text/html, Size: 1796 bytes --]

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
       [not found]     ` <74222928-D24B-4780-BDB0-D537A83C4F68@hotmail.com>
@ 2009-02-04  5:08       ` C.W. Betts
  0 siblings, 0 replies; 82+ messages in thread
From: C.W. Betts @ 2009-02-04  5:08 UTC (permalink / raw)
  To: qemu-devel

I am all for releasing a release candidate.  This signals everyone  
that qemu is thinking of releasing a stable version and people will  
try to find bugs.
On Feb 3, 2009, at 2:35 PM, Laurent Desnogues wrote:

> Perhaps doing at least one release candidate to get feedback (and  
> focus on
> fixing reported bugs) would be appropriate.
>
> Cheers,
>
> Laurent
>
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 21:50     ` Anthony Liguori
  2009-02-03 22:05       ` Laurent Desnogues
@ 2009-02-04 13:09       ` Ulrich Hecht
  1 sibling, 0 replies; 82+ messages in thread
From: Ulrich Hecht @ 2009-02-04 13:09 UTC (permalink / raw)
  To: qemu-devel

On Tuesday 03 February 2009, Anthony Liguori wrote:
> There are a number of platforms that are no longer 
> supported (ia64 and s390, for instance) but we could wait another year
> and I doubt these features would appear.

I am working on S/390 host support. Currently, it's good enough to show 
the PC BIOS startup screen (softmmu) and to run an i386 shell 
(linux-user). ATM I cannot give an estimate when it will be good enough 
for a release, though.

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 20:48 [Qemu-devel] Cutting a new QEMU release Anthony Liguori
                   ` (2 preceding siblings ...)
  2009-02-03 22:07 ` Daniel P. Berrange
@ 2009-02-04 14:50 ` Aurelien Jarno
  2009-02-04 15:23   ` Tristan Gingold
                     ` (3 more replies)
  2009-02-04 15:58 ` Glauber Costa
                   ` (3 subsequent siblings)
  7 siblings, 4 replies; 82+ messages in thread
From: Aurelien Jarno @ 2009-02-04 14:50 UTC (permalink / raw)
  To: qemu-devel

On Tue, Feb 03, 2009 at 02:48:22PM -0600, Anthony Liguori wrote:
> What do people think?  TCG seems to be in a good place.  We've got  
> virtio, KVM, live migration, tons of new devices, bsd-user, etc.
>
> We could decide to cut one by the end of the month.  I'm already doing  
> some test work in QEMU so I can follow up with some more detailed notes  
> about what is working and what isn't working.  That gives us some time  
> to decide if there's anything we need to fix before a release.
>

That's a really good idea.

I would like to see the switch of the remaining PowerPC machine from
OpenHackware to OpenBIOS. We don't have the sources of the current
ppc_rom.bin binary, and I don't feel comfortable making a release with
it. We probably have the sources of an older version.

This at least concerned ppc_chrp.c (ppc_prep.c could probably simply
be dropped).

I have no idea about how long it would take.

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-04 14:50 ` Aurelien Jarno
@ 2009-02-04 15:23   ` Tristan Gingold
  2009-02-04 15:43     ` Lennart Sorensen
  2009-02-04 17:39   ` [Qemu-devel] " Blue Swirl
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 82+ messages in thread
From: Tristan Gingold @ 2009-02-04 15:23 UTC (permalink / raw)
  To: qemu-devel


On Feb 4, 2009, at 3:50 PM, Aurelien Jarno wrote:
>
> This at least concerned ppc_chrp.c (ppc_prep.c could probably simply
> be dropped).

Please, don't drop ppc_prep.c.  Even if it is not supported by  
OpenBios I know several uses of prep
with direct images.

Tristan.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-04 15:23   ` Tristan Gingold
@ 2009-02-04 15:43     ` Lennart Sorensen
  2009-02-04 16:01       ` Tristan Gingold
  0 siblings, 1 reply; 82+ messages in thread
From: Lennart Sorensen @ 2009-02-04 15:43 UTC (permalink / raw)
  To: qemu-devel

On Wed, Feb 04, 2009 at 04:23:10PM +0100, Tristan Gingold wrote:
> Please, don't drop ppc_prep.c.  Even if it is not supported by  
> OpenBios I know several uses of prep
> with direct images.

Who still support it?

The linux kernel seems to have thrown away the prep support code as of
2.6.27.

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 20:48 [Qemu-devel] Cutting a new QEMU release Anthony Liguori
                   ` (3 preceding siblings ...)
  2009-02-04 14:50 ` Aurelien Jarno
@ 2009-02-04 15:58 ` Glauber Costa
  2009-02-07 15:29 ` Shin-ichiro KAWASAKI
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 82+ messages in thread
From: Glauber Costa @ 2009-02-04 15:58 UTC (permalink / raw)
  To: qemu-devel

On Tue, Feb 3, 2009 at 6:48 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> What do people think?  TCG seems to be in a good place.  We've got virtio,
> KVM, live migration, tons of new devices, bsd-user, etc.
>
> We could decide to cut one by the end of the month.  I'm already doing some
> test work in QEMU so I can follow up with some more detailed notes about
> what is working and what isn't working.  That gives us some time to decide
> if there's anything we need to fix before a release.

As a curiosity, what would be the preferred version number? 0.9.2?
0.10 ? 1.0 ? "Phoenix"?
"Crisis Release" ?


-- 
Glauber  Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-04 15:43     ` Lennart Sorensen
@ 2009-02-04 16:01       ` Tristan Gingold
  2009-02-04 18:17         ` [Qemu-devel] " Consul
  0 siblings, 1 reply; 82+ messages in thread
From: Tristan Gingold @ 2009-02-04 16:01 UTC (permalink / raw)
  To: qemu-devel


On Feb 4, 2009, at 4:43 PM, Lennart Sorensen wrote:

> On Wed, Feb 04, 2009 at 04:23:10PM +0100, Tristan Gingold wrote:
>> Please, don't drop ppc_prep.c.  Even if it is not supported by
>> OpenBios I know several uses of prep
>> with direct images.
>
> Who still support it?

I know at least one non-free OS that runs on prep.
We also create raw programs that runs on prep (yes we could switch to  
chrp)

> The linux kernel seems to have thrown away the prep support code as of
> 2.6.27.

Yes, but the world is not only linux 2.6.27+ :-)

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-04 14:50 ` Aurelien Jarno
  2009-02-04 15:23   ` Tristan Gingold
@ 2009-02-04 17:39   ` Blue Swirl
  2009-02-04 17:50     ` Jonathan Kalbfeld
  2009-02-04 20:07   ` Blue Swirl
  2009-02-07 14:15   ` Stuart Brady
  3 siblings, 1 reply; 82+ messages in thread
From: Blue Swirl @ 2009-02-04 17:39 UTC (permalink / raw)
  To: qemu-devel

On 2/4/09, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Tue, Feb 03, 2009 at 02:48:22PM -0600, Anthony Liguori wrote:
>
> > What do people think?  TCG seems to be in a good place.  We've got
>  > virtio, KVM, live migration, tons of new devices, bsd-user, etc.
>  >
>  > We could decide to cut one by the end of the month.  I'm already doing
>  > some test work in QEMU so I can follow up with some more detailed notes
>  > about what is working and what isn't working.  That gives us some time
>  > to decide if there's anything we need to fix before a release.
>  >
>
>
> That's a really good idea.
>
>  I would like to see the switch of the remaining PowerPC machine from
>  OpenHackware to OpenBIOS. We don't have the sources of the current
>  ppc_rom.bin binary, and I don't feel comfortable making a release with
>  it. We probably have the sources of an older version.
>
>  This at least concerned ppc_chrp.c (ppc_prep.c could probably simply
>  be dropped).
>
>  I have no idea about how long it would take.

PPC development on OpenBIOS side is taking quick leaps, I'd hate to
rush a release just now when we are very close to a fully working
system.

On Sparc32/64 side, things are moving more slowly. Sparc32 is pretty
much release quality with support for Linux, OpenBSD and NetBSD boot
and a lot of boards. Sparc64 is still unusable, but it's not worth
waiting for.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-04 17:39   ` [Qemu-devel] " Blue Swirl
@ 2009-02-04 17:50     ` Jonathan Kalbfeld
  0 siblings, 0 replies; 82+ messages in thread
From: Jonathan Kalbfeld @ 2009-02-04 17:50 UTC (permalink / raw)
  To: qemu-devel

Are the host details fixed on Solaris/SPARC?

Does anyone want access to my build environment to try and make it work?

I haven't gotten anything since the January 13, 2008 release to work
without SIGSEGV on a sparc.

jonathan

On Wed, Feb 4, 2009 at 9:39 AM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On 2/4/09, Aurelien Jarno <aurelien@aurel32.net> wrote:
>> On Tue, Feb 03, 2009 at 02:48:22PM -0600, Anthony Liguori wrote:
>>
>> > What do people think?  TCG seems to be in a good place.  We've got
>>  > virtio, KVM, live migration, tons of new devices, bsd-user, etc.
>>  >
>>  > We could decide to cut one by the end of the month.  I'm already doing
>>  > some test work in QEMU so I can follow up with some more detailed notes
>>  > about what is working and what isn't working.  That gives us some time
>>  > to decide if there's anything we need to fix before a release.
>>  >
>>
>>
>> That's a really good idea.
>>
>>  I would like to see the switch of the remaining PowerPC machine from
>>  OpenHackware to OpenBIOS. We don't have the sources of the current
>>  ppc_rom.bin binary, and I don't feel comfortable making a release with
>>  it. We probably have the sources of an older version.
>>
>>  This at least concerned ppc_chrp.c (ppc_prep.c could probably simply
>>  be dropped).
>>
>>  I have no idea about how long it would take.
>
> PPC development on OpenBIOS side is taking quick leaps, I'd hate to
> rush a release just now when we are very close to a fully working
> system.
>
> On Sparc32/64 side, things are moving more slowly. Sparc32 is pretty
> much release quality with support for Linux, OpenBSD and NetBSD boot
> and a lot of boards. Sparc64 is still unusable, but it's not worth
> waiting for.
>
>
>



-- 
--
Jonathan Kalbfeld
ThoughtWave Technologies LLC
www.thoughtwave.com

"Yes, we did!"

Learn UNIX For Free at unixlessons.com

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-04 16:01       ` Tristan Gingold
@ 2009-02-04 18:17         ` Consul
  0 siblings, 0 replies; 82+ messages in thread
From: Consul @ 2009-02-04 18:17 UTC (permalink / raw)
  To: qemu-devel

> 
> Yes, but the world is not only linux 2.6.27+ :-)
> 

Of course not. All the world's a VAX!

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-04 14:50 ` Aurelien Jarno
  2009-02-04 15:23   ` Tristan Gingold
  2009-02-04 17:39   ` [Qemu-devel] " Blue Swirl
@ 2009-02-04 20:07   ` Blue Swirl
  2009-02-07 14:15   ` Stuart Brady
  3 siblings, 0 replies; 82+ messages in thread
From: Blue Swirl @ 2009-02-04 20:07 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1190 bytes --]

On 2/4/09, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Tue, Feb 03, 2009 at 02:48:22PM -0600, Anthony Liguori wrote:
>
> > What do people think?  TCG seems to be in a good place.  We've got
>  > virtio, KVM, live migration, tons of new devices, bsd-user, etc.
>  >
>  > We could decide to cut one by the end of the month.  I'm already doing
>  > some test work in QEMU so I can follow up with some more detailed notes
>  > about what is working and what isn't working.  That gives us some time
>  > to decide if there's anything we need to fix before a release.
>  >
>
>
> That's a really good idea.
>
>  I would like to see the switch of the remaining PowerPC machine from
>  OpenHackware to OpenBIOS. We don't have the sources of the current
>  ppc_rom.bin binary, and I don't feel comfortable making a release with
>  it. We probably have the sources of an older version.
>
>  This at least concerned ppc_chrp.c (ppc_prep.c could probably simply
>  be dropped).
>
>  I have no idea about how long it would take.

15 minutes :-), though for Qemu side only. The attached patch switches
CHRP to OpenBIOS. The screen flashes with something, some more stuff
is needed on OpenBIOS side.

[-- Attachment #2: chrp_use_openbios.diff --]
[-- Type: plain/text, Size: 2327 bytes --]

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: Cutting a new QEMU release
@ 2009-02-05  9:13 Steve Fosdick
  2009-02-05 14:26 ` Anthony Liguori
  2009-02-05 14:55 ` Rick Vernam
  0 siblings, 2 replies; 82+ messages in thread
From: Steve Fosdick @ 2009-02-05  9:13 UTC (permalink / raw)
  To: qemu-devel

Given the talk of a new release I though I'd try the latest qemu from
SVN.  At the moment I am being hampered by kqemu-1.4.0pre1 not compiling
though:

  CC [M]  /usr/src/kqemu-1.4.0pre1/kqemu-linux.o
/usr/src/kqemu-1.4.0pre1/kqemu-linux.c: In function
‘kqemu_lock_user_page’:
/usr/src/kqemu-1.4.0pre1/kqemu-linux.c:81: error: dereferencing pointer
to incomplete type
/usr/src/kqemu-1.4.0pre1/kqemu-linux.c: In function ‘kqemu_schedule’:
/usr/src/kqemu-1.4.0pre1/kqemu-linux.c:194: error: implicit declaration
of function ‘need_resched’
/usr/src/kqemu-1.4.0pre1/kqemu-linux.c:195: error: implicit declaration
of function ‘schedule’
/usr/src/kqemu-1.4.0pre1/kqemu-linux.c:197: error: implicit declaration
of function ‘signal_pending’
make[2]: *** [/usr/src/kqemu-1.4.0pre1/kqemu-linux.o] Error 1

This is with kernel 2.6.28.2. kqemu-1.3.0pre11 seems to compile OK with
the kernel.  Any ideas?

Regards,
Steve.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05  9:13 [Qemu-devel] Re: Cutting a new QEMU release Steve Fosdick
@ 2009-02-05 14:26 ` Anthony Liguori
  2009-02-05 15:36   ` Rick Vernam
                     ` (2 more replies)
  2009-02-05 14:55 ` Rick Vernam
  1 sibling, 3 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-05 14:26 UTC (permalink / raw)
  To: qemu-devel

Steve Fosdick wrote:
> Given the talk of a new release I though I'd try the latest qemu from
> SVN.  At the moment I am being hampered by kqemu-1.4.0pre1 not compiling
> though:
>
>   CC [M]  /usr/src/kqemu-1.4.0pre1/kqemu-linux.o
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c: In function
> ‘kqemu_lock_user_page’:
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:81: error: dereferencing pointer
> to incomplete type
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c: In function ‘kqemu_schedule’:
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:194: error: implicit declaration
> of function ‘need_resched’
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:195: error: implicit declaration
> of function ‘schedule’
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:197: error: implicit declaration
> of function ‘signal_pending’
> make[2]: *** [/usr/src/kqemu-1.4.0pre1/kqemu-linux.o] Error 1
>
> This is with kernel 2.6.28.2. kqemu-1.3.0pre11 seems to compile OK with
> the kernel.  Any ideas?
>   

kqemu is unsupported and unmaintained.

Regards,

Anthony Liguori

> Regards,
> Steve.
>
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05  9:13 [Qemu-devel] Re: Cutting a new QEMU release Steve Fosdick
  2009-02-05 14:26 ` Anthony Liguori
@ 2009-02-05 14:55 ` Rick Vernam
  1 sibling, 0 replies; 82+ messages in thread
From: Rick Vernam @ 2009-02-05 14:55 UTC (permalink / raw)
  To: qemu-devel

On Thursday 05 February 2009 3:13:14 am Steve Fosdick wrote:
> Given the talk of a new release I though I'd try the latest qemu from
> SVN.  At the moment I am being hampered by kqemu-1.4.0pre1 not compiling
> though:
>
>   CC [M]  /usr/src/kqemu-1.4.0pre1/kqemu-linux.o
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c: In function
> ‘kqemu_lock_user_page’:
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:81: error: dereferencing pointer
> to incomplete type
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c: In function ‘kqemu_schedule’:
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:194: error: implicit declaration
> of function ‘need_resched’
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:195: error: implicit declaration
> of function ‘schedule’
> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:197: error: implicit declaration
> of function ‘signal_pending’
> make[2]: *** [/usr/src/kqemu-1.4.0pre1/kqemu-linux.o] Error 1
>
> This is with kernel 2.6.28.2. kqemu-1.3.0pre11 seems to compile OK with
> the kernel.  Any ideas?
I, and another, posted about this some time ago.  The solution is a particular 
#include somewhere, which I don't recall off the top of my head.
It's in the list somewhere, if you look hard enough.

>
> Regards,
> Steve.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 14:26 ` Anthony Liguori
@ 2009-02-05 15:36   ` Rick Vernam
  2009-02-05 16:27     ` Paul Brook
  2009-02-05 15:55   ` René Rebe
  2009-02-07 12:01   ` Stefan Weil
  2 siblings, 1 reply; 82+ messages in thread
From: Rick Vernam @ 2009-02-05 15:36 UTC (permalink / raw)
  To: qemu-devel

On Thursday 05 February 2009 8:26:04 am Anthony Liguori wrote:
> kqemu is unsupported and unmaintained.
Interesting.  When did it fall into that status?
The Maintainers file shows Fabrice as the maintainer of kqemu.  I suppose that 
needs to be updated?

I see Fabrice released 1.4.0pre1 on May 30th, 2008, although I never did see 
anything declaring it unsupported (I'm not suggesting it was never declared, 
just that I never saw any such declaration).

Are there any plans to support it in the future?  This really is quite a shock 
to me, actually.  I know qemu has a wide range of uses - but for me and surely 
others, virtualization is a primary use.  To the best of my knowledge, kvm 
requires hardware support - where does this leave the class of users who need 
virtualization & don't have hardware virtualization support?  Are we no longer 
the a target audience of qemu?  If not, fine, but apparently a statement needs 
to be made...

Also, I had considered the web site at http://bellard.org/qemu/ to be 
accurate.  Perhaps something should be done prior to a release so that those 
who browse to the site know that:
1 - the site is not an accurate source of information
or
2 - kqemu is no longer supported or maintained

Thanks
-Rick

>
> Regards,
>
> Anthony Liguori
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 14:26 ` Anthony Liguori
  2009-02-05 15:36   ` Rick Vernam
@ 2009-02-05 15:55   ` René Rebe
  2009-02-07 12:01   ` Stefan Weil
  2 siblings, 0 replies; 82+ messages in thread
From: René Rebe @ 2009-02-05 15:55 UTC (permalink / raw)
  To: qemu-devel

Hi,

Anthony Liguori wrote:
> Steve Fosdick wrote:
>> Given the talk of a new release I though I'd try the latest qemu from
>> SVN.  At the moment I am being hampered by kqemu-1.4.0pre1 not compiling
>> though:
>>
>>   CC [M]  /usr/src/kqemu-1.4.0pre1/kqemu-linux.o
>> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c: In function
>> ‘kqemu_lock_user_page’:
>> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:81: error: dereferencing pointer
>> to incomplete type
>> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c: In function ‘kqemu_schedule’:
>> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:194: error: implicit declaration
>> of function ‘need_resched’
>> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:195: error: implicit declaration
>> of function ‘schedule’
>> /usr/src/kqemu-1.4.0pre1/kqemu-linux.c:197: error: implicit declaration
>> of function ‘signal_pending’
>> make[2]: *** [/usr/src/kqemu-1.4.0pre1/kqemu-linux.o] Error 1
>>
>> This is with kernel 2.6.28.2. kqemu-1.3.0pre11 seems to compile OK with
>> the kernel.  Any ideas?
>>   
>
> kqemu is unsupported and unmaintained.
Ouhm. Why's that? Give that the vast majority of CPUs in use still don't
have hardware virtualization, ...

That said kqemu builds for me and works for me.

-- 
  René Rebe - ExactCODE GmbH - Europe, Germany, Berlin
  http://exactcode.de | http://t2-project.org | http://rene.rebe.name

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 15:36   ` Rick Vernam
@ 2009-02-05 16:27     ` Paul Brook
  2009-02-05 17:15       ` René Rebe
  2009-02-05 17:51       ` Ben Taylor
  0 siblings, 2 replies; 82+ messages in thread
From: Paul Brook @ 2009-02-05 16:27 UTC (permalink / raw)
  To: qemu-devel

On Thursday 05 February 2009, Rick Vernam wrote:
> On Thursday 05 February 2009 8:26:04 am Anthony Liguori wrote:
> > kqemu is unsupported and unmaintained.
>
> Interesting.  When did it fall into that status?

IMHO It's pretty much always been that way.

> The Maintainers file shows Fabrice as the maintainer of kqemu.  I suppose
> that needs to be updated?
>
> I see Fabrice released 1.4.0pre1 on May 30th, 2008, although I never did
> see anything declaring it unsupported (I'm not suggesting it was never
> declared, just that I never saw any such declaration).
>
> Are there any plans to support it in the future?  This really is quite a
> shock to me, actually.  I know qemu has a wide range of uses - but for me
> and surely others, virtualization is a primary use.  To the best of my
> knowledge, kvm requires hardware support - where does this leave the class
> of users who need virtualization & don't have hardware virtualization
> support?  Are we no longer the a target audience of qemu?  If not, fine,
> but apparently a statement needs to be made...

You have the source, you're free to fork and maintain it yourself.

In practice Fabice is pretty much the only person who's ever done significant 
work on kqemu (except maybe some fairly minor host OS porting bits). There's 
never been a public source repository, so you get to use whatever random 
tarballs Fabrice leaves lying around. If those don't work, noone really 
cares.

Paul

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 16:27     ` Paul Brook
@ 2009-02-05 17:15       ` René Rebe
  2009-02-05 17:36         ` Paul Brook
  2009-02-05 17:51       ` Ben Taylor
  1 sibling, 1 reply; 82+ messages in thread
From: René Rebe @ 2009-02-05 17:15 UTC (permalink / raw)
  To: qemu-devel

Paul Brook wrote:
> On Thursday 05 February 2009, Rick Vernam wrote:
>   
>> On Thursday 05 February 2009 8:26:04 am Anthony Liguori wrote:
>>     
>>> kqemu is unsupported and unmaintained.
>>>       
>> Interesting.  When did it fall into that status?
>>     
>
> IMHO It's pretty much always been that way.
>
>   
>> The Maintainers file shows Fabrice as the maintainer of kqemu.  I suppose
>> that needs to be updated?
>>
>> I see Fabrice released 1.4.0pre1 on May 30th, 2008, although I never did
>> see anything declaring it unsupported (I'm not suggesting it was never
>> declared, just that I never saw any such declaration).
>>
>> Are there any plans to support it in the future?  This really is quite a
>> shock to me, actually.  I know qemu has a wide range of uses - but for me
>> and surely others, virtualization is a primary use.  To the best of my
>> knowledge, kvm requires hardware support - where does this leave the class
>> of users who need virtualization & don't have hardware virtualization
>> support?  Are we no longer the a target audience of qemu?  If not, fine,
>> but apparently a statement needs to be made...
>>     
>
> You have the source, you're free to fork and maintain it yourself.
>
> In practice Fabice is pretty much the only person who's ever done significant 
> work on kqemu (except maybe some fairly minor host OS porting bits). There's 
> never been a public source repository, so you get to use whatever random 
> tarballs Fabrice leaves lying around. If those don't work, noone really 
> cares.
>   
I find this rather drastic. So far it appears to work pretty well. And given
the sheer amount of CPU sililcon without VT/SVM it looks to be worth
keeping working. Maybe just to pull it into the Qemu SVN?

Btw. anyone knows what Fabice is doing these days?

-- 
  René Rebe - ExactCODE GmbH - Europe, Germany, Berlin
  http://exactcode.de | http://t2-project.org | http://rene.rebe.name

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 17:15       ` René Rebe
@ 2009-02-05 17:36         ` Paul Brook
  2009-02-05 17:51           ` Daniel P. Berrange
  0 siblings, 1 reply; 82+ messages in thread
From: Paul Brook @ 2009-02-05 17:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: René Rebe

> given the sheer amount of CPU sililcon without VT/SVM it looks to be worth
> keeping [kqemu] working. Maybe just to pull it into the Qemu SVN?

I'd rather not.  What you[1] really need to do is get it merged into upstream 
linux kernels.  There have been several threads about this previously, the 
short version is that it probably involves rewriting to use the kvm API.
You'll find that many developers (including myself) have extremely low 
tolerance for out of tree kernel modules[2].

Paul

[1] Or someone else who actually cares/is paid to care about kqemu.
[2] Obviously there's a bit of chicken and egg here. Upstream submission 
should at least be a fairly near-term goal.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 16:27     ` Paul Brook
  2009-02-05 17:15       ` René Rebe
@ 2009-02-05 17:51       ` Ben Taylor
  2009-02-05 18:39         ` René Rebe
                           ` (2 more replies)
  1 sibling, 3 replies; 82+ messages in thread
From: Ben Taylor @ 2009-02-05 17:51 UTC (permalink / raw)
  To: qemu-devel

On Thu, Feb 5, 2009 at 11:27 AM, Paul Brook <paul@codesourcery.com> wrote:
>
> In practice Fabice is pretty much the only person who's ever done significant
> work on kqemu (except maybe some fairly minor host OS porting bits). There's
> never been a public source repository, so you get to use whatever random
> tarballs Fabrice leaves lying around. If those don't work, noone really
> cares.

I've maintained tarballs for both 1.4.0 and 1.3.0 at the qemu project
on OpenSolaris.org, and just realized that I never put into the SVN repo
the mods I made to the 1.4.0 code.  I had tested it with Solaris SXCE
and Ubuntu 08.04.  If anyone shows some interest in testing, I'll import
the 1.4.0 into the SVN repo.  I believe that I picked up the minor
patches that were posted to the list to fix compilations on linux
with some various kernels.

Ben

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 17:36         ` Paul Brook
@ 2009-02-05 17:51           ` Daniel P. Berrange
  0 siblings, 0 replies; 82+ messages in thread
From: Daniel P. Berrange @ 2009-02-05 17:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: René Rebe

On Thu, Feb 05, 2009 at 05:36:22PM +0000, Paul Brook wrote:
> > given the sheer amount of CPU sililcon without VT/SVM it looks to be worth
> > keeping [kqemu] working. Maybe just to pull it into the Qemu SVN?
> 
> I'd rather not.  What you[1] really need to do is get it merged into upstream 
> linux kernels.  There have been several threads about this previously, the 
> short version is that it probably involves rewriting to use the kvm API.
> You'll find that many developers (including myself) have extremely low 
> tolerance for out of tree kernel modules[2].

More fundamentally, whether in or out of tree, someone needs to step
forward & commit to being an active long term maintainer for the code.
Having it in QEMU SVN without someone maintaining it won't help the
current situation, and nor will dumping it upstream without someone
maintaining it.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 17:51       ` Ben Taylor
@ 2009-02-05 18:39         ` René Rebe
  2009-02-05 19:03         ` Anthony Liguori
  2009-02-15 15:25         ` Andreas Färber
  2 siblings, 0 replies; 82+ messages in thread
From: René Rebe @ 2009-02-05 18:39 UTC (permalink / raw)
  To: qemu-devel

Ben Taylor wrote:
> On Thu, Feb 5, 2009 at 11:27 AM, Paul Brook <paul@codesourcery.com> wrote:
>   
>> In practice Fabice is pretty much the only person who's ever done significant
>> work on kqemu (except maybe some fairly minor host OS porting bits). There's
>> never been a public source repository, so you get to use whatever random
>> tarballs Fabrice leaves lying around. If those don't work, noone really
>> cares.
>>     
>
> I've maintained tarballs for both 1.4.0 and 1.3.0 at the qemu project
> on OpenSolaris.org, and just realized that I never put into the SVN repo
> the mods I made to the 1.4.0 code.  I had tested it with Solaris SXCE
> and Ubuntu 08.04.  If anyone shows some interest in testing, I'll import
> the 1.4.0 into the SVN repo.  I believe that I picked up the minor
> patches that were posted to the list to fix compilations on linux
> with some various kernels.
>   
Hm - kqemu-1.4.0pre1 builds for at least 2.6.28 and 2.6.26 for x86 and 
x86-64
on my side.

Anyway, could you post your modifications, some unsorted drop to my 
privately
is also welcome if you miss the time to sort it out.

Thanks,

-- 
  René Rebe - ExactCODE GmbH - Europe, Germany, Berlin
  http://exactcode.de | http://t2-project.org | http://rene.rebe.name

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 17:51       ` Ben Taylor
  2009-02-05 18:39         ` René Rebe
@ 2009-02-05 19:03         ` Anthony Liguori
  2009-02-06 10:54           ` Steve Fosdick
       [not found]           ` <92CAE88C-36FF-4566-BD1D-ACA58C98CB0F@hotmail.com>
  2009-02-15 15:25         ` Andreas Färber
  2 siblings, 2 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-05 19:03 UTC (permalink / raw)
  To: qemu-devel

Ben Taylor wrote:
> On Thu, Feb 5, 2009 at 11:27 AM, Paul Brook <paul@codesourcery.com> wrote:
>   
>> In practice Fabice is pretty much the only person who's ever done significant
>> work on kqemu (except maybe some fairly minor host OS porting bits). There's
>> never been a public source repository, so you get to use whatever random
>> tarballs Fabrice leaves lying around. If those don't work, noone really
>> cares.
>>     
>
> I've maintained tarballs for both 1.4.0 and 1.3.0 at the qemu project
> on OpenSolaris.org, and just realized that I never put into the SVN repo
> the mods I made to the 1.4.0 code.  I had tested it with Solaris SXCE
> and Ubuntu 08.04.  If anyone shows some interest in testing, I'll import
> the 1.4.0 into the SVN repo.  I believe that I picked up the minor
> patches that were posted to the list to fix compilations on linux
> with some various kernels.
>   

Personally, I'd prefer that it lived outside of the QEMU tree.  It is 
never going to go into upstream Linux and it's not something that I 
think is worth supporting.

Regards,

Anthony Liguori

> Ben
>
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 19:03         ` Anthony Liguori
@ 2009-02-06 10:54           ` Steve Fosdick
  2009-02-06 15:57             ` René Rebe
  2009-02-07 16:39             ` Jamie Lokier
       [not found]           ` <92CAE88C-36FF-4566-BD1D-ACA58C98CB0F@hotmail.com>
  1 sibling, 2 replies; 82+ messages in thread
From: Steve Fosdick @ 2009-02-06 10:54 UTC (permalink / raw)
  To: qemu-devel

On Thu, 2009-02-05 at 13:03 -0600, Anthony Liguori wrote:

> Personally, I'd prefer that it lived outside of the QEMU tree.  It is 
> never going to go into upstream Linux and it's not something that I 
> think is worth supporting.

Does anyone here have any stats on what people are using QEMU for?

I ask this because I suspect a significant use case is running an x86
guest on an x86 host and, at the moment, the only way to get reasonable
performance on a non virtualisation-enhanced CPU seems to be to use
kqmeu.

Now, I can understand the developers of kvm only supporting the
virtualisation-enhanced CPUs because, looking to the future they will be
common.  I suspect at the moment though there are plenty of people
running VMs on older hardware.

I can also see that if it would take major refactoring to get kqemu into
the main kernal tree it is probably not worth the efforts as, by the
time that work is complete the ratio virtualisation-enhanced CPUs to
older, non virtualisation-enhanced CPUs would be higher.

To my mind mind, what would be good right now is if someone (or some
people) understands kqemu well enough that, if kernel changes break it,
it can be fixed, not forever but until more people have
virtualisation-enhanced CPUs and can use KVM instead.

Regards,
Steve.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-06 10:54           ` Steve Fosdick
@ 2009-02-06 15:57             ` René Rebe
  2009-02-06 17:12               ` Anthony Liguori
  2009-02-06 21:53               ` René Rebe
  2009-02-07 16:39             ` Jamie Lokier
  1 sibling, 2 replies; 82+ messages in thread
From: René Rebe @ 2009-02-06 15:57 UTC (permalink / raw)
  To: qemu-devel

Hi,

Steve Fosdick wrote:
> On Thu, 2009-02-05 at 13:03 -0600, Anthony Liguori wrote:
>
>   
>> Personally, I'd prefer that it lived outside of the QEMU tree.  It is 
>> never going to go into upstream Linux and it's not something that I 
>> think is worth supporting.
>>     
>
> Does anyone here have any stats on what people are using QEMU for?
>
> I ask this because I suspect a significant use case is running an x86
> guest on an x86 host and, at the moment, the only way to get reasonable
> performance on a non virtualisation-enhanced CPU seems to be to use
> kqmeu.
>
> Now, I can understand the developers of kvm only supporting the
> virtualisation-enhanced CPUs because, looking to the future they will be
> common.  I suspect at the moment though there are plenty of people
> running VMs on older hardware.
>
> I can also see that if it would take major refactoring to get kqemu into
> the main kernal tree it is probably not worth the efforts as, by the
> time that work is complete the ratio virtualisation-enhanced CPUs to
> older, non virtualisation-enhanced CPUs would be higher.
>
> To my mind mind, what would be good right now is if someone (or some
> people) understands kqemu well enough that, if kernel changes break it,
> it can be fixed, not forever but until more people have
> virtualisation-enhanced CPUs and can use KVM instead.
>   
Indeed. Though I used KVM for the past months to do Linux development
and system testing / integration I had a use case for kqemu (non-VT CPU)
just this week and was surprised to find quite "old" kqemu release just 
build
and work for booth 2.6.26 and 2.6.28. And so far there was no problem with
it.

While I have no problem having it long time ported to the KVM interface,
just declaring some quite useful and functional piece of open source work
obsolete and unsupported quite drastic. This work should be not be lost
so easily.

When kqemu is supposed to be gotten upstream the question remains what
to do with the freebsd, windows, solaris, etc. glue code.

If I would know more of the internals of kqemu I would even volunteer to
maintain it - however, I just took the first look at it yesterday which does
not really qualify to maintain it just yet. Though I would work on getting
it adapted on future kernel changes, and/or even hunt a bug if it starts
crashing in one or another scenario for me (but right now I have to hunt
some crashing with 32bit host KVM for a start).

Yours,

-- 
  René Rebe - ExactCODE GmbH - Europe, Germany, Berlin
  http://exactcode.de | http://t2-project.org | http://rene.rebe.name

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-06 15:57             ` René Rebe
@ 2009-02-06 17:12               ` Anthony Liguori
  2009-02-06 21:47                 ` René Rebe
  2009-02-07 16:49                 ` Jamie Lokier
  2009-02-06 21:53               ` René Rebe
  1 sibling, 2 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-06 17:12 UTC (permalink / raw)
  To: qemu-devel

René Rebe wrote:
>
> Hi, 
>>
> Indeed. Though I used KVM for the past months to do Linux development
> and system testing / integration I had a use case for kqemu (non-VT CPU)
> just this week and was surprised to find quite "old" kqemu release 
> just build
> and work for booth 2.6.26 and 2.6.28. And so far there was no problem 
> with
> it.
>
> While I have no problem having it long time ported to the KVM interface,
> just declaring some quite useful and functional piece of open source work
> obsolete and unsupported quite drastic. This work should be not be lost
> so easily.

I think you misunderstand.  Noone is claiming that kqemu is no longer 
being supported.  Quite rather, we're simply stating it's never been 
supported.

It started as a binary kernel module, impossible to support within the 
QEMU community.  While Fabrice has open sourced kqemu, it's never been 
included in QEMU.  It's not maintained by the current QEMU maintainers 
and not supported by the current QEMU maintainers.

It's essentially a separate project.

Regards,

Anthony Liguori

> When kqemu is supposed to be gotten upstream the question remains what
> to do with the freebsd, windows, solaris, etc. glue code.
>
> If I would know more of the internals of kqemu I would even volunteer to
> maintain it - however, I just took the first look at it yesterday 
> which does
> not really qualify to maintain it just yet. Though I would work on 
> getting
> it adapted on future kernel changes, and/or even hunt a bug if it starts
> crashing in one or another scenario for me (but right now I have to hunt
> some crashing with 32bit host KVM for a start).
>
> Yours,
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-06 17:12               ` Anthony Liguori
@ 2009-02-06 21:47                 ` René Rebe
  2009-02-07 16:49                 ` Jamie Lokier
  1 sibling, 0 replies; 82+ messages in thread
From: René Rebe @ 2009-02-06 21:47 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> René Rebe wrote:
>>
>> Hi,
>>>
>> Indeed. Though I used KVM for the past months to do Linux development
>> and system testing / integration I had a use case for kqemu (non-VT CPU)
>> just this week and was surprised to find quite "old" kqemu release 
>> just build
>> and work for booth 2.6.26 and 2.6.28. And so far there was no problem 
>> with
>> it.
>>
>> While I have no problem having it long time ported to the KVM interface,
>> just declaring some quite useful and functional piece of open source work
>> obsolete and unsupported quite drastic. This work should be not be lost
>> so easily.
> 
> I think you misunderstand.  Noone is claiming that kqemu is no longer 
> being supported.  Quite rather, we're simply stating it's never been 
> supported.
> 
> It started as a binary kernel module, impossible to support within the 
> QEMU community.  While Fabrice has open sourced kqemu, it's never been 
> included in QEMU.  It's not maintained by the current QEMU maintainers 
> and not supported by the current QEMU maintainers.

I know about the history pretty well.

Btw. is Farbrice still actively working on Qemu related code these
days?

> It's essentially a separate project.

Well - depends. The user-space part always was in Qemu, but the
kernel module apparently is a little left aside. However, this
should not stop us from improving the situation instead of letting
it bitrott.

-- 
   René Rebe - ExactCODE GmbH - Europe, Germany, Berlin
   http://exactcode.de | http://t2-project.org | http://rene.rebe.name

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-06 15:57             ` René Rebe
  2009-02-06 17:12               ` Anthony Liguori
@ 2009-02-06 21:53               ` René Rebe
  1 sibling, 0 replies; 82+ messages in thread
From: René Rebe @ 2009-02-06 21:53 UTC (permalink / raw)
  To: qemu-devel

René Rebe wrote:

> If I would know more of the internals of kqemu I would even volunteer to
> maintain it - however, I just took the first look at it yesterday which 
> does
> not really qualify to maintain it just yet. Though I would work on getting
> it adapted on future kernel changes, and/or even hunt a bug if it starts
> crashing in one or another scenario for me (but right now I have to hunt
> some crashing with 32bit host KVM for a start).

Ok, those segfaults where due to an old non-NPTL glibc and the __thread
support being nonfunctional in this combination used in the kvm tree :-)

-- 
   René Rebe - ExactCODE GmbH - Europe, Germany, Berlin
   http://exactcode.de | http://t2-project.org | http://rene.rebe.name

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 14:26 ` Anthony Liguori
  2009-02-05 15:36   ` Rick Vernam
  2009-02-05 15:55   ` René Rebe
@ 2009-02-07 12:01   ` Stefan Weil
  2009-02-07 15:08     ` Anthony Liguori
  2009-02-07 15:36     ` Jamie Lokier
  2 siblings, 2 replies; 82+ messages in thread
From: Stefan Weil @ 2009-02-07 12:01 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori schrieb:
>
> kqemu is unsupported and unmaintained.
>
> Regards,
>
> Anthony Liguori
>

The kvm kernel module could be a good replacement for kqemu
for those running linux on new cpus.

It does not play this role in current linux distributions because
you will need newer versions of kvm which are sometimes
difficult to compile.

It will never play this role for those running "old" cpus.

And it will never play this role on Windows (or is there a kvm
for Windows?). I am surprised that nobody mentions this part
in the discussion.

So even if kqemu is unmaintained, the Qemu developers
should at least maintain the interface.

I'd prefer to have a svn tree with kqemu beside qemu.
Then patches could be sent, and maybe some day there could
be a maintainer, too.

Integration of code from virtualbox could be a way to replace
kqemu, but I don't see this coming in the near future.

Regards
Stefan Weil

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-04 14:50 ` Aurelien Jarno
                     ` (2 preceding siblings ...)
  2009-02-04 20:07   ` Blue Swirl
@ 2009-02-07 14:15   ` Stuart Brady
  3 siblings, 0 replies; 82+ messages in thread
From: Stuart Brady @ 2009-02-07 14:15 UTC (permalink / raw)
  To: qemu-devel

On Wed, Feb 04, 2009 at 03:50:52PM +0100, Aurelien Jarno wrote:
> We don't have the sources of the current ppc_rom.bin binary, and I
> don't feel comfortable making a release with it. We probably have the
> sources of an older version.

Ouch! :(

I noticed that the upstream site for Open Hack'Ware disappeared a while
ago...  Various distros still have the source (and QEMU has a patch
against it, pc-bios/ohw.diff), and I thought that it had not been
modified in quite a while...

I'm glad that OpenBIOS is doing well, but if we really have lost some of
the Open Hack'Ware source, that's slightly disconcerting (even if it's
not really needed any longer.)

Cheers,
-- 
Stuart Brady

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-07 12:01   ` Stefan Weil
@ 2009-02-07 15:08     ` Anthony Liguori
  2009-02-07 15:36     ` Jamie Lokier
  1 sibling, 0 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-07 15:08 UTC (permalink / raw)
  To: qemu-devel

Stefan Weil wrote:
> The kvm kernel module could be a good replacement for kqemu
> for those running linux on new cpus.
>
> It does not play this role in current linux distributions because
> you will need newer versions of kvm which are sometimes
> difficult to compile.
>
> It will never play this role for those running "old" cpus.
>
> And it will never play this role on Windows (or is there a kvm
> for Windows?). I am surprised that nobody mentions this part
> in the discussion.
>
> So even if kqemu is unmaintained, the Qemu developers
> should at least maintain the interface.
>
> I'd prefer to have a svn tree with kqemu beside qemu.
> Then patches could be sent, and maybe some day there could
> be a maintainer, too.
>   

Nothing is stopping anyone from taking kqemu and setting up a SVN repo 
somewhere.  That's the beauty of the GPL.

Regards,

Anthony Liguori

> Integration of code from virtualbox could be a way to replace
> kqemu, but I don't see this coming in the near future.
>
> Regards
> Stefan Weil
>
>
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 20:48 [Qemu-devel] Cutting a new QEMU release Anthony Liguori
                   ` (4 preceding siblings ...)
  2009-02-04 15:58 ` Glauber Costa
@ 2009-02-07 15:29 ` Shin-ichiro KAWASAKI
  2009-02-11 21:49   ` Rob Landley
  2009-02-09 12:43 ` Mark McLoughlin
  2009-02-13  8:40 ` Riku Voipio
  7 siblings, 1 reply; 82+ messages in thread
From: Shin-ichiro KAWASAKI @ 2009-02-07 15:29 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> What do people think?  TCG seems to be in a good place.  We've got 
> virtio, KVM, live migration, tons of new devices, bsd-user, etc.

The development of sh4 is rather slow, you know. Then there is no need to
think about it to decide when to cut the next version. 

>From the point of view from sh4 system emulation, that's a good news. 
It is a good way to provide current features for sh4 developers.
But before release, I hope these two points would handled by anyone.

[1] USB support

Current sh4 system emulation (r2d board) does not support USB host.
Without it, the graphic console does not receive any key input
via USB keyboard emulation.  Then, graphics console is not available
now.  I guess sh4 developers would feel it strange.

Following patch adds usb host.

    http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg01620.html

It does not apply to current svn head, because of line mismatch.
I'm willing to post new version, if the patch is OK.
Could anyone review it?

[2] sh4 kernel & disk image on QEMU's download page

It's a bothering work to make kernel & disk image for sh4.  Not to obstruct
sh4 developers with it, I provide a small (4MB) set at following URL.

http://www.assembla.com/spaces/qemu-sh4/documents/b18oeq850r3AhNab7jnrAJ/download?filename=sh-test-0.1.tar.bz2

Could anyone put the image at QEMU's download page?
  http://bellard.org/qemu/download.html

I think it is the best place to provide it.

Regards,
Shin-ichiro KAWASAKI

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-07 12:01   ` Stefan Weil
  2009-02-07 15:08     ` Anthony Liguori
@ 2009-02-07 15:36     ` Jamie Lokier
  2009-02-07 16:45       ` Jan Kiszka
  1 sibling, 1 reply; 82+ messages in thread
From: Jamie Lokier @ 2009-02-07 15:36 UTC (permalink / raw)
  To: qemu-devel

Stefan Weil wrote:
> The kvm kernel module could be a good replacement for kqemu
> for those running linux on new cpus.

It's not yet, though.  kvm doesn't run 16-bit code properly.
I use kqemu to run older OSes, and kvm to run current ones.

I like the idea of a "kvm-soft", which is basically kqemu with a kvm
interface.  It would need a few extensions on the kvm interface, of
course.

Another potential use for _part_ of kqemu, or kvm-soft, is emulating
other CPUs with host kernel support for the memory map, instead of
full software TLB.  That might be a performance accelerator for
emulation, for some combinations of host and target CPUs where it's
feasible to map the memory in that way.

If kqemu were evolved into an accelerator for cross-CPU emulation in
that way, then its current use as an x86-on-x86 accelerator would just
be a special case of that.

-- Jamie

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-06 10:54           ` Steve Fosdick
  2009-02-06 15:57             ` René Rebe
@ 2009-02-07 16:39             ` Jamie Lokier
  1 sibling, 0 replies; 82+ messages in thread
From: Jamie Lokier @ 2009-02-07 16:39 UTC (permalink / raw)
  To: qemu-devel

Steve Fosdick wrote:
> Now, I can understand the developers of kvm only supporting the
> virtualisation-enhanced CPUs because, looking to the future they will be
> common.  I suspect at the moment though there are plenty of people
> running VMs on older hardware.

In a couple of brief threads before, it was made fairly clear that kvm
developers believe CPUs without the virtualisation feature are
essentially obsolete, not just non-current.

I sympathise with that view, now that my laptop has the feature :-)

But it does seem harsh, a rather sudden cut-off point as it was only a
few years ago that the virtualisation feature was not common, and it's
still not available on all x86s.

-- Jamie

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-07 15:36     ` Jamie Lokier
@ 2009-02-07 16:45       ` Jan Kiszka
  0 siblings, 0 replies; 82+ messages in thread
From: Jan Kiszka @ 2009-02-07 16:45 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2483 bytes --]

Jamie Lokier wrote:
> Stefan Weil wrote:
>> The kvm kernel module could be a good replacement for kqemu
>> for those running linux on new cpus.
> 
> It's not yet, though.  kvm doesn't run 16-bit code properly.

You mean real mode, I guess. I think there are still a few holes in the
emulator that may bite you on certain DOSes or with some fancy boot
loaders. But 16-bit protected mode runs as fine as 32 or 64 bit for
quite a while now.

> I use kqemu to run older OSes, and kvm to run current ones.

I haven't found much code that caused troubles to kvm, but a lot that
broke kqemu - YMMV.

> 
> I like the idea of a "kvm-soft", which is basically kqemu with a kvm
> interface.  It would need a few extensions on the kvm interface, of
> course.
> 
> Another potential use for _part_ of kqemu, or kvm-soft, is emulating
> other CPUs with host kernel support for the memory map, instead of
> full software TLB.  That might be a performance accelerator for
> emulation, for some combinations of host and target CPUs where it's
> feasible to map the memory in that way.
> 
> If kqemu were evolved into an accelerator for cross-CPU emulation in
> that way, then its current use as an x86-on-x86 accelerator would just
> be a special case of that.

Most of kqemu's code base deals with / works around unvirtualizable x86
cruft. Memory management, page table handling is only a small subset.
And that part is focused on running guest code under the control of the
monitor, not in the TCG user space environment. So even if there is
something a kernel part could contribute to accelerate TCG execution,
I'm not sure that there will be a high re-use of kqemu's infrastructure
- or even kvm's.

You also should be aware of the fact the kqemu's x86 virtualization is
fairly fragile, only working for OSes like Linux and Windows, and even
there not always reliably (I've seen Linux kernels crashing). We are
evaluating alternative workarounds for these issues, but they will come
with their own limitations. Either they are too costly to implement
(binary translation) given the remaining lifetime of kqemu, or they
cause problems with self-checking guests (patch in traps & emulate). And
both will need special user space support beyond current kqemu's or
kvm's need. Depending on the outcome (for the picky customer OS), we may
be able to contribute to a properly maintained kqemu tree (or better:
kvm-soft). But this is yet open.

Jan

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-06 17:12               ` Anthony Liguori
  2009-02-06 21:47                 ` René Rebe
@ 2009-02-07 16:49                 ` Jamie Lokier
  2009-02-07 17:06                   ` Laurent Desnogues
  2009-02-07 23:46                   ` Anthony Liguori
  1 sibling, 2 replies; 82+ messages in thread
From: Jamie Lokier @ 2009-02-07 16:49 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> >While I have no problem having it long time ported to the KVM interface,
> >just declaring some quite useful and functional piece of open source work
> >obsolete and unsupported quite drastic. This work should be not be lost
> >so easily.
> 
> I think you misunderstand.  Noone is claiming that kqemu is no longer 
> being supported.  Quite rather, we're simply stating it's never been 
> supported.
> 
> It started as a binary kernel module, impossible to support within the 
> QEMU community.  While Fabrice has open sourced kqemu, it's never been 
> included in QEMU.  It's not maintained by the current QEMU maintainers 
> and not supported by the current QEMU maintainers.
> 
> It's essentially a separate project.

Yes, it's unfortunate how its history worked out.  On the face of it,
it looks like Fabrice was hoping for someone to pay for it.  Maybe
they did.  I remember a vague murmur of an attempt to make an open
source replacement for kqemu when it was still binary-only; that
didn't go anywhere as far as I remember.

Anthony: If one or more maintainers were to step up, perhaps even
begin adapting the kqemu interface to kvm's, would you be interested
in folding it in the main qemu/kvm project as an official feature?

Straw poll: who here's interested in maintaining kqemu?

I have very little time, but plenty of x86 intimate knowledge and
kernel knowledge, and have used kqemu occasionally.  I can offer my
hand as "interested a bit, not by myself".

(Also, perhaps some of the Windows / other kqemu bits might be useful
in porting kvm to Windows.  Now that we have nested kvm, those of us
who never run a native Windows host can think about testing such a thing ;-)

-- Jamie

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-07 16:49                 ` Jamie Lokier
@ 2009-02-07 17:06                   ` Laurent Desnogues
  2009-02-07 23:46                   ` Anthony Liguori
  1 sibling, 0 replies; 82+ messages in thread
From: Laurent Desnogues @ 2009-02-07 17:06 UTC (permalink / raw)
  To: qemu-devel

On Sat, Feb 7, 2009 at 5:49 PM, Jamie Lokier <jamie@shareable.org> wrote:
> I remember a vague murmur of an attempt to make an open
> source replacement for kqemu when it was still binary-only; that
> didn't go anywhere as far as I remember.

I think you're referring to Paul's qvm86.

http://savannah.nongnu.org/projects/qvm86


Laurent

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-07 16:49                 ` Jamie Lokier
  2009-02-07 17:06                   ` Laurent Desnogues
@ 2009-02-07 23:46                   ` Anthony Liguori
  1 sibling, 0 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-07 23:46 UTC (permalink / raw)
  To: qemu-devel

Jamie Lokier wrote:
> Anthony Liguori wrote:
>   
> Yes, it's unfortunate how its history worked out.  On the face of it,
> it looks like Fabrice was hoping for someone to pay for it.  Maybe
> they did.  I remember a vague murmur of an attempt to make an open
> source replacement for kqemu when it was still binary-only; that
> didn't go anywhere as far as I remember.
>
> Anthony: If one or more maintainers were to step up, perhaps even
> begin adapting the kqemu interface to kvm's, would you be interested
> in folding it in the main qemu/kvm project as an official feature?
>   

Actions speak louder than words.  All it takes is for someone to setup a 
tree somewhere with kqemu in it, and start working on it and merging 
patches.  Once that happens, we can discuss the long term future wrt KVM 
and QEMU.  Otherwise, it's just pontificating here.  Merging into Linux 
proper is going to be a lot of work.

I strongly suspect you won't see anyone step up.  From a developer 
perspective, it's a case of diminishing returns.  The more work you put 
into it, the less useful it is to people.  Every day that goes buy, the 
potential audience grows smaller.  Furthermore, the barrier to entry for 
someone to get a better solution is (i.e. KVM) is rather small.  Just 
buy a new CPU.

I think the only way it would prove useful to maintain is if some 
developer either has a deep desire to mess around with this kind of 
stuff or has a large customer base with pre-VT/SVM hardware that they 
wish to support.  So far, no such developer has proven to exist.  
Recall, even when kqemu was the only solution (but closed source), there 
wasn't really anyone interested/willing to maintain qvm86.

N.B. KVM and kqemu are not equal solutions.  Even at it's best, kqemu is 
going to be significantly slower than KVM in most cases.  When dealing 
with more modern CPUs (Barcelonas and Core i7s), the difference is going 
to be extremely high.

Regards,

Anthony Liguori

> Straw poll: who here's interested in maintaining kqemu?
>
> I have very little time, but plenty of x86 intimate knowledge and
> kernel knowledge, and have used kqemu occasionally.  I can offer my
> hand as "interested a bit, not by myself".
>
> (Also, perhaps some of the Windows / other kqemu bits might be useful
> in porting kvm to Windows.  Now that we have nested kvm, those of us
> who never run a native Windows host can think about testing such a thing ;-)
>
> -- Jamie
>
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
       [not found]           ` <92CAE88C-36FF-4566-BD1D-ACA58C98CB0F@hotmail.com>
@ 2009-02-09  5:01             ` C.W. Betts
       [not found]               ` <784D2534-F9CD-4EA5-BBEE-67E9DE196598@hotmail.com>
  0 siblings, 1 reply; 82+ messages in thread
From: C.W. Betts @ 2009-02-09  5:01 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]


On Feb 5, 2009, at 12:03 PM, Anthony Liguori wrote:

> Ben Taylor wrote:
>> On Thu, Feb 5, 2009 at 11:27 AM, Paul Brook <paul@codesourcery.com>  
>> wrote:
>>
>>> In practice Fabice is pretty much the only person who's ever done  
>>> significant
>>> work on kqemu (except maybe some fairly minor host OS porting  
>>> bits). There's
>>> never been a public source repository, so you get to use whatever  
>>> random
>>> tarballs Fabrice leaves lying around. If those don't work, noone  
>>> really
>>> cares.
>>>
>>
>> I've maintained tarballs for both 1.4.0 and 1.3.0 at the qemu project
>> on OpenSolaris.org, and just realized that I never put into the SVN  
>> repo
>> the mods I made to the 1.4.0 code.  I had tested it with Solaris SXCE
>> and Ubuntu 08.04.  If anyone shows some interest in testing, I'll  
>> import
>> the 1.4.0 into the SVN repo.  I believe that I picked up the minor
>> patches that were posted to the list to fix compilations on linux
>> with some various kernels.
>>
>
> Personally, I'd prefer that it lived outside of the QEMU tree.  It  
> is never going to go into upstream Linux and it's not something that  
> I think is worth supporting.
>
The only thing that prevents kvm being used in Windows or Darwin/OS X  
is that it depends too heavily on the Linux Kernel.  Kqemu, on the  
other hand, has been ported to Windows, and someone tried to do a  
Darwin port.

[-- Attachment #2: Type: text/html, Size: 2536 bytes --]

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
       [not found]               ` <784D2534-F9CD-4EA5-BBEE-67E9DE196598@hotmail.com>
@ 2009-02-09  5:42                 ` C.W. Betts
  2009-02-09 10:29                   ` René Rebe
  0 siblings, 1 reply; 82+ messages in thread
From: C.W. Betts @ 2009-02-09  5:42 UTC (permalink / raw)
  To: qemu-devel

Okay, what is keeping Qemu from releasing a new version?  I say we  
release a release candidate, wait for people to find the bugs (two or  
three weeks), then release a new "official" version.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-09  5:42                 ` C.W. Betts
@ 2009-02-09 10:29                   ` René Rebe
  0 siblings, 0 replies; 82+ messages in thread
From: René Rebe @ 2009-02-09 10:29 UTC (permalink / raw)
  To: qemu-devel

C.W. Betts wrote:
> Okay, what is keeping Qemu from releasing a new version?  I say we 
> release a release candidate, wait for people to find the bugs (two or 
> three weeks), then release a new "official" version.

I would prefer a release "schedule" like kvm: every other month - often
and quickly.

-- 
  René Rebe - ExactCODE GmbH - Europe, Germany, Berlin
  http://exactcode.de | http://t2-project.org | http://rene.rebe.name

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 20:48 [Qemu-devel] Cutting a new QEMU release Anthony Liguori
                   ` (5 preceding siblings ...)
  2009-02-07 15:29 ` Shin-ichiro KAWASAKI
@ 2009-02-09 12:43 ` Mark McLoughlin
  2009-02-09 21:36   ` Anthony Liguori
  2009-02-10  0:47   ` Rob Landley
  2009-02-13  8:40 ` Riku Voipio
  7 siblings, 2 replies; 82+ messages in thread
From: Mark McLoughlin @ 2009-02-09 12:43 UTC (permalink / raw)
  To: qemu-devel

On Tue, 2009-02-03 at 14:48 -0600, Anthony Liguori wrote:
> What do people think?  TCG seems to be in a good place.  We've got 
> virtio, KVM, live migration, tons of new devices, bsd-user, etc.
> 
> We could decide to cut one by the end of the month.  I'm already doing 
> some test work in QEMU so I can follow up with some more detailed notes 
> about what is working and what isn't working.  That gives us some time 
> to decide if there's anything we need to fix before a release.

Sounds great to me.

>From a Fedora perspective, qemu-0.9.1 is a year old and upstream has
moved on a lot. As a package maintainer, it's hard to justify caring too
much about bugs reported against 0.9.1, since the bug is likely to have
very little relevance to the latest upstream.

Also, it would be really nice to have a kvm-userspace based off a solid
qemu release ... qemu moving so fast is great, but it means it's hard to
predict the stability of a given kvm-userspace release.

Some questions:

  - Will there be a period before the release when only bug fixes are 
    merged?

  - Will there be a release candidate?

  - Is there any missing features that we might push out the release
    date for?

  - Post-release, is there any interest in maintaining a stable branch 
    until the next release?

  - The plan for the next release is roughly 6 months, yes?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-09 12:43 ` Mark McLoughlin
@ 2009-02-09 21:36   ` Anthony Liguori
  2009-02-10  0:47   ` Rob Landley
  1 sibling, 0 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-09 21:36 UTC (permalink / raw)
  To: Mark McLoughlin, qemu-devel

Mark McLoughlin wrote:
> On Tue, 2009-02-03 at 14:48 -0600, Anthony Liguori wrote:
>   
>> What do people think?  TCG seems to be in a good place.  We've got 
>> virtio, KVM, live migration, tons of new devices, bsd-user, etc.
>>
>> We could decide to cut one by the end of the month.  I'm already doing 
>> some test work in QEMU so I can follow up with some more detailed notes 
>> about what is working and what isn't working.  That gives us some time 
>> to decide if there's anything we need to fix before a release.
>>     
>
> Sounds great to me.
>
> >From a Fedora perspective, qemu-0.9.1 is a year old and upstream has
> moved on a lot. As a package maintainer, it's hard to justify caring too
> much about bugs reported against 0.9.1, since the bug is likely to have
> very little relevance to the latest upstream.
>
> Also, it would be really nice to have a kvm-userspace based off a solid
> qemu release ... qemu moving so fast is great, but it means it's hard to
> predict the stability of a given kvm-userspace release.
>
> Some questions:
>
>   - Will there be a period before the release when only bug fixes are 
>     merged?
>   

It's a good idea, but it may be hard to pull off practically speaking 
for the first release.  Let's see how it works out.

>   - Will there be a release candidate?
>   

Sometime this week, I'll try to post something summarizing our current 
state and anything outstanding.  If there's time to put out an -rc, I'll 
try to make one available.  Things may hiccup a bit.

>   - Is there any missing features that we might push out the release
>     date for?
>   

Personally, I don't think so.  I think openbios was the biggest issue 
because we don't have the code for the current firmware.  It looks like 
that's been almost resolved.  I'm more interested in getting a release 
out in a timely manner than holding up for any particular feature.

If we have lots of features going in, I'd rather do more frequent 
releases than hold up releases.

>   - Post-release, is there any interest in maintaining a stable branch 
>     until the next release?
>   

I am tempted to try it out.  Let's see how it goes.

>   - The plan for the next release is roughly 6 months, yes?
>   

Yup.

Regards,

Anthony Liguori

> Thanks,
> Mark.
>
>
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-09 12:43 ` Mark McLoughlin
  2009-02-09 21:36   ` Anthony Liguori
@ 2009-02-10  0:47   ` Rob Landley
  2009-02-10  7:22     ` M. Warner Losh
  1 sibling, 1 reply; 82+ messages in thread
From: Rob Landley @ 2009-02-10  0:47 UTC (permalink / raw)
  To: qemu-devel, Mark McLoughlin

On Monday 09 February 2009 06:43:34 Mark McLoughlin wrote:
> On Tue, 2009-02-03 at 14:48 -0600, Anthony Liguori wrote:
> > What do people think?  TCG seems to be in a good place.  We've got
> > virtio, KVM, live migration, tons of new devices, bsd-user, etc.
> >
> > We could decide to cut one by the end of the month.  I'm already doing
> > some test work in QEMU so I can follow up with some more detailed notes
> > about what is working and what isn't working.  That gives us some time
> > to decide if there's anything we need to fix before a release.
>
> Sounds great to me.
>
> From a Fedora perspective, qemu-0.9.1 is a year old and upstream has
> moved on a lot. As a package maintainer, it's hard to justify caring too
> much about bugs reported against 0.9.1, since the bug is likely to have
> very little relevance to the latest upstream.
>
> Also, it would be really nice to have a kvm-userspace based off a solid
> qemu release ... qemu moving so fast is great, but it means it's hard to
> predict the stability of a given kvm-userspace release.

I'd like to point out a relevant Google tech talk video:

http://video.google.com/videoplay?docid=-5503858974016723264

April 19, 2007 Release Management in Large Free Software Projects - Martin 
Michlmayr (Debian)

ABSTRACT: Time based releases are made according to a specific time interval, 
instead of making a release when a particular functionality or set of features 
have been implemented. This talk argues that time based release management 
acts as an effective coordination mechanism in large volunteer projects and 
shows examples from seven projects that have moved to time based releases: 
Debian, GCC, GNOME, Linux, OpenOffice, Plone, and X.org.

> Some questions:
>
>   - Will there be a period before the release when only bug fixes are
>     merged?
>
>   - Will there be a release candidate?

Those two answer each other.  If your 0.9.2 release turns out to have bugs, 
you can trivially cut a bugfix-only 0.9.2.1, 0.9.2.2, 0.9.2.3... as needed.  
Weekly even.  So 0.9.2 being bug-free isn't that important.

And it's actually just about impossible for you .0 to be bug-free, because you 
get 20 times as many testers for an actual release as you get for any 
snapshot, so they _will_ find new bugs.  It's just about guaranteed.  Also 
unless your stabilization period is a hard freeze preventing new development 
from going into the repository, then you'll be introducing new bugs while you 
try to fix 'em...

>   - Post-release, is there any interest in maintaining a stable branch
>     until the next release?

That's kind of necessary for the previous two, but as long as it's clearly 
bugfix-only then it should have zero impact on new development, and can be 
done in a completely separate repository by a different maintainer.  (That's 
how the linux kernel does things.)

>   - Is there any missing features that we might push out the release
>     date for?

Defeats the purpose of time based releases: it's ok to bump things from this 
release if the next release is a finite amount of time away.  If you have no 
idea when the next release will be, then getting every last feature into this 
release (and holding up the release for it) is a big deal, and thus you have 
endless delays, feature creep, a rush to merge things that aren't quite ready 
when a release is floated...

>   - The plan for the next release is roughly 6 months, yes?

The general theory of having regular scheduled releases is that bumping stuff 
until the next release is no longer the end of the world, because there _will_ 
be a next release, and this has lots and lots of positive side effects, as 
described in the video.

> Thanks,
> Mark.

Rob

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-10  0:47   ` Rob Landley
@ 2009-02-10  7:22     ` M. Warner Losh
  0 siblings, 0 replies; 82+ messages in thread
From: M. Warner Losh @ 2009-02-10  7:22 UTC (permalink / raw)
  To: qemu-devel, rob; +Cc: markmc

Re Time based releases:

You need to have someone drive the time based releases long term,
otherwise you'll slide back into the feature based release mode.  And
since there's always another feature, that, as has been pointed out,
tends to stretch out things a very long time.

Warner

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-07 15:29 ` Shin-ichiro KAWASAKI
@ 2009-02-11 21:49   ` Rob Landley
  2009-02-12 14:44     ` Shin-ichiro KAWASAKI
  0 siblings, 1 reply; 82+ messages in thread
From: Rob Landley @ 2009-02-11 21:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Shin-ichiro KAWASAKI

On Saturday 07 February 2009 09:29:42 Shin-ichiro KAWASAKI wrote:
> Anthony Liguori wrote:
> > What do people think?  TCG seems to be in a good place.  We've got
> > virtio, KVM, live migration, tons of new devices, bsd-user, etc.
>
> The development of sh4 is rather slow, you know. Then there is no need to
> think about it to decide when to cut the next version.
>
> From the point of view from sh4 system emulation, that's a good news.
> It is a good way to provide current features for sh4 developers.
> But before release, I hope these two points would handled by anyone.
>
> [1] USB support
>
> Current sh4 system emulation (r2d board) does not support USB host.
> Without it, the graphic console does not receive any key input
> via USB keyboard emulation.  Then, graphics console is not available
> now.  I guess sh4 developers would feel it strange.
>
> Following patch adds usb host.
>
>     http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg01620.html
>
> It does not apply to current svn head, because of line mismatch.
> I'm willing to post new version, if the patch is OK.
> Could anyone review it?
>
>
> [2] sh4 kernel & disk image on QEMU's download page
>
> It's a bothering work to make kernel & disk image for sh4.  Not to obstruct
> sh4 developers with it, I provide a small (4MB) set at following URL.
>
> http://www.assembla.com/spaces/qemu-sh4/documents/b18oeq850r3AhNab7jnrAJ/do
>wnload?filename=sh-test-0.1.tar.bz2

I downloaded this and tried it out with an svn snapshot from today (svn 6613), 
built on Ubuntu 8.10 with default "./configure; make; sudo make install".

Your README has a typo, it says "-kernel r2d_zImage" but the one you've 
packaged is just "zImage".  Once that's worked around, it pops up a qemu 
window within which it boots to a login prompt, but I can't type anything.  
This is the USB issue you mentioned, confirmed by the stdout from qemu:

  char device redirected to /dev/pts/14
  Warning: could not add USB device keyboard
  long read to SH7750_WCR1_A7 (0x000000001f800008) ignored
  long read to SH7750_WCR2_A7 (0x000000001f80000c) ignored
  long read to SH7750_WCR3_A7 (0x000000001f800010) ignored
  long read to SH7750_MCR_A7 (0x000000001f800014) ignored
  long read to SH7750_MCR_A7 (0x000000001f800014) ignored

I thought I'd work around that with a serial console (which is what I actually 
use for in my FWL project anyway), but I can't get that to work either.  Your 
command line -appends "console=ttySC0,115200" and "early_printk=serial" but 
ctrl-alt-2 shows nothing, and booting with
-nographic gives no output.

Also, going back to ctrl-alt-1 doesn't redraw the vga screen.  (The screen 
will partially redraw itself if it's still producing output, but it never 
redraws the penguin logo at the top of the frame buffer, and if the console 
has finished producing output it just stays black from then on.)

This is more functionality than I've ever gotten out of sh4, and I really look 
forward to adding sh4 support to http://impactlinux.com/fwl .  If I can get a 
serial console working I'm probably good to go, but right now it's not quite 
working for me yet...

Your README says that you can extract the config.gz from the kernel with 
linux/scripts/extract-ikconfig, but when I tried it it said:

  ERROR: Unable to extract kernel configuration information.
         This kernel image may not have the config info.

Obviously, I can't get it by logging in and catting /proc/config.gz without a 
working keyboard or serial console...

Rob

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-11 21:49   ` Rob Landley
@ 2009-02-12 14:44     ` Shin-ichiro KAWASAKI
  2009-02-12 21:08       ` Rob Landley
  2009-02-12 21:44       ` Rob Landley
  0 siblings, 2 replies; 82+ messages in thread
From: Shin-ichiro KAWASAKI @ 2009-02-12 14:44 UTC (permalink / raw)
  To: Rob Landley; +Cc: qemu-devel

Rob Landley wrote:
> On Saturday 07 February 2009 09:29:42 Shin-ichiro KAWASAKI wrote:
>> Anthony Liguori wrote:
>>> What do people think?  TCG seems to be in a good place.  We've got
>>> virtio, KVM, live migration, tons of new devices, bsd-user, etc.
>> The development of sh4 is rather slow, you know. Then there is no need to
>> think about it to decide when to cut the next version.
>>
>> From the point of view from sh4 system emulation, that's a good news.
>> It is a good way to provide current features for sh4 developers.
>> But before release, I hope these two points would handled by anyone.
>>
>> [1] USB support
>>
>> Current sh4 system emulation (r2d board) does not support USB host.
>> Without it, the graphic console does not receive any key input
>> via USB keyboard emulation.  Then, graphics console is not available
>> now.  I guess sh4 developers would feel it strange.
>>
>> Following patch adds usb host.
>>
>>     http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg01620.html
>>
>> It does not apply to current svn head, because of line mismatch.
>> I'm willing to post new version, if the patch is OK.
>> Could anyone review it?
>>
>>
>> [2] sh4 kernel & disk image on QEMU's download page
>>
>> It's a bothering work to make kernel & disk image for sh4.  Not to obstruct
>> sh4 developers with it, I provide a small (4MB) set at following URL.
>>
>> http://www.assembla.com/spaces/qemu-sh4/documents/b18oeq850r3AhNab7jnrAJ/do
>> wnload?filename=sh-test-0.1.tar.bz2
> 
> I downloaded this and tried it out with an svn snapshot from today (svn 6613), 
> built on Ubuntu 8.10 with default "./configure; make; sudo make install".

Thank you Rob for trying, and sorry for my mistakes in the package.
I thought again about the contents again, and have uploaded it on the same
URL as before.


> Your README has a typo, it says "-kernel r2d_zImage" but the one you've 
> packaged is just "zImage".  Once that's worked around, it pops up a qemu 
> window within which it boots to a login prompt, but I can't type anything.  
> This is the USB issue you mentioned, confirmed by the stdout from qemu:
> 
>   char device redirected to /dev/pts/14
>   Warning: could not add USB device keyboard
>   long read to SH7750_WCR1_A7 (0x000000001f800008) ignored
>   long read to SH7750_WCR2_A7 (0x000000001f80000c) ignored
>   long read to SH7750_WCR3_A7 (0x000000001f800010) ignored
>   long read to SH7750_MCR_A7 (0x000000001f800014) ignored
>   long read to SH7750_MCR_A7 (0x000000001f800014) ignored
> 
> I thought I'd work around that with a serial console (which is what I actually 
> use for in my FWL project anyway), but I can't get that to work either.  Your 
> command line -appends "console=ttySC0,115200" and "early_printk=serial" but 
> ctrl-alt-2 shows nothing, and booting with
> -nographic gives no output.
> 
> Also, going back to ctrl-alt-1 doesn't redraw the vga screen.  (The screen 
> will partially redraw itself if it's still producing output, but it never 
> redraws the penguin logo at the top of the frame buffer, and if the console 
> has finished producing output it just stays black from then on.)

Sorry to say, I completely missed to add '-serial null -serial stdio' in the
command line example.  Could you try following line again?

  % ./qemu-system-sh4 -M r2d -kernel zImage -hda sh-linux-mini.img -serial null -serial stdio -nographic

I hope you'll see the shell prompt.


> This is more functionality than I've ever gotten out of sh4, and I really look 
> forward to adding sh4 support to http://impactlinux.com/fwl .  If I can get a 
> serial console working I'm probably good to go, but right now it's not quite 
> working for me yet...
> 
> Your README says that you can extract the config.gz from the kernel with 
> linux/scripts/extract-ikconfig, but when I tried it it said:
> 
>   ERROR: Unable to extract kernel configuration information.
>          This kernel image may not have the config info.
> 
> Obviously, I can't get it by logging in and catting /proc/config.gz without a 
> working keyboard or serial console...

On my Ubuntu 8.04 env, I can't get config with scripts/extract-ikconfig.
On the other hand, with Ubuntu 8.10 env, I can.  I'm not sure about the reason.
Anyway, I removed the explanation about the way to get config before booting.

Additionally, I modified the default kernel boot options slightly,
and ran fsck.ext2 on the disk image before packing.
I hope the package is appropriate for qemu-sh users, now.

I'm looking forward the fwl system image.  I've been struggling
to get stable userland on which gcc is available.


Regards,
Shin-ichiro KAWASAKI

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-12 14:44     ` Shin-ichiro KAWASAKI
@ 2009-02-12 21:08       ` Rob Landley
  2009-02-12 21:44       ` Rob Landley
  1 sibling, 0 replies; 82+ messages in thread
From: Rob Landley @ 2009-02-12 21:08 UTC (permalink / raw)
  To: Shin-ichiro KAWASAKI; +Cc: qemu-devel

On Thursday 12 February 2009 08:44:48 Shin-ichiro KAWASAKI wrote:
> Sorry to say, I completely missed to add '-serial null -serial stdio' in
> the command line example.  Could you try following line again?
>
>   % ./qemu-system-sh4 -M r2d -kernel zImage -hda sh-linux-mini.img -serial
> null -serial stdio -nographic
>
> I hope you'll see the shell prompt.

Yes I did!  Cool!

Thanks, that's what I needed.

> I'm looking forward the fwl system image.  I've been struggling
> to get stable userland on which gcc is available.

I'll try to make it work this evening.

Thank you very much,

Rob

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-12 14:44     ` Shin-ichiro KAWASAKI
  2009-02-12 21:08       ` Rob Landley
@ 2009-02-12 21:44       ` Rob Landley
  1 sibling, 0 replies; 82+ messages in thread
From: Rob Landley @ 2009-02-12 21:44 UTC (permalink / raw)
  To: Shin-ichiro KAWASAKI; +Cc: qemu-devel

On Thursday 12 February 2009 08:44:48 Shin-ichiro KAWASAKI wrote:
> Rob Landley wrote:
> > On Saturday 07 February 2009 09:29:42 Shin-ichiro KAWASAKI wrote:

>   % ./qemu-system-sh4 -M r2d -kernel zImage -hda sh-linux-mini.img -serial
> null -serial stdio -nographic
>
> I hope you'll see the shell prompt.

FYI, here's something I type from that shell prompt which made qemu-system-sh4 
unhappy:

# reboot
The system is going down NOW!
Sending SIGTERM to all processes
Sending SIGKILL to all processes
Requesting system reboot
Restarting system.
Unauthorized access
qemu: fatal: Trying to execute code outside RAM or ROM at 0xa0000000

pc=0xa0000000 sr=0x700000f0 pr=0x8c03864c fpscr=0x00080000
spc=0x8c0126a6 ssr=0x10000000 gbr=0x2975b450 vbr=0x8c018000
sgr=0x8f989e8c dbr=0x00000000 delayed_pc=0x8c0126a0 fpul=0x00000000
r0=0x00000016 r1=0x80000001 r2=0x10000000 r3=0x0000198e
r4=0x00000000 r5=0x0000198e r6=0xffffffff r7=0xffffffff
r8=0x28121969 r9=0xfee1dead r10=0x000001a0 r11=0x01234567
r12=0x297577b8 r13=0x004c2c10 r14=0x7bea3aa0 r15=0x8f989e8c
r16=0x00000000 r17=0xffffff0f r18=0xffffffff r19=0x40008000
r20=0x8f989e08 r21=0x00000000 r22=0x00000000 r23=0x8f988000


Just FYI. :)

Rob

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-03 20:48 [Qemu-devel] Cutting a new QEMU release Anthony Liguori
                   ` (6 preceding siblings ...)
  2009-02-09 12:43 ` Mark McLoughlin
@ 2009-02-13  8:40 ` Riku Voipio
  2009-02-13  9:59   ` Stefano Stabellini
  2009-02-13 16:30   ` Jamie Lokier
  7 siblings, 2 replies; 82+ messages in thread
From: Riku Voipio @ 2009-02-13  8:40 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 488 bytes --]

On Tue, Feb 03, 2009 at 02:48:22PM -0600, Anthony Liguori wrote:
> We could decide to cut one by the end of the month.

This would indeed be really cool.

> .. to decide if there's anything we need to fix before a release.

At least the OS X (cocoa) host is broken, which is IMHO pretty
bad regression. Apart from that I'm not aware of any major regression
(wearing arm-linux-user, some arm-softmmu and debian hats).

-- 
"rm -rf" only sounds scary if you don't have backups

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-13  8:40 ` Riku Voipio
@ 2009-02-13  9:59   ` Stefano Stabellini
  2009-02-13 16:30   ` Jamie Lokier
  1 sibling, 0 replies; 82+ messages in thread
From: Stefano Stabellini @ 2009-02-13  9:59 UTC (permalink / raw)
  To: qemu-devel@nongnu.org

Riku Voipio wrote:

>> .. to decide if there's anything we need to fix before a release.
> 
> At least the OS X (cocoa) host is broken, which is IMHO pretty
> bad regression. Apart from that I'm not aware of any major regression
> (wearing arm-linux-user, some arm-softmmu and debian hats).
> 



There was a patch laying around few weeks ago that was OK for most cases
but didn't work in some.
If we don't have anything better I think we could accept the patch and
add a few #ifdef in vga.c to make sure the patch does not break.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-13  8:40 ` Riku Voipio
  2009-02-13  9:59   ` Stefano Stabellini
@ 2009-02-13 16:30   ` Jamie Lokier
  2009-02-13 17:00     ` Anthony Liguori
  1 sibling, 1 reply; 82+ messages in thread
From: Jamie Lokier @ 2009-02-13 16:30 UTC (permalink / raw)
  To: qemu-devel

Riku Voipio wrote:
> On Tue, Feb 03, 2009 at 02:48:22PM -0600, Anthony Liguori wrote:
> > We could decide to cut one by the end of the month.
> 
> This would indeed be really cool.
> 
> > .. to decide if there's anything we need to fix before a release.
> 
> At least the OS X (cocoa) host is broken, which is IMHO pretty
> bad regression. Apart from that I'm not aware of any major regression
> (wearing arm-linux-user, some arm-softmmu and debian hats).

I'd say the two qcow2 data corruption bugs are a major regression.
(Both reported in in another thread).

qemu 0.9.1 has the qcow2 code from kvm-72, which doesn't exhibit
either of those corruption bugs.  A new release based on current kvm
userspace would introduce those bugs.  One of the bugs (reported by
Marc) corrupts a qcow2 image so you can't use it even if you revert to
an older qemu/kvm.  It's not clear if the other bug causes permanent
corruption itself, but anything which causes a guest to see the wrong
data can lead to the guest writing corrupt data elsewhere later on.

Simply reverting the qcow2 code appears to fix those problems, so it
needn't hold up cutting a release.  That's what I recommend.

-- Jamie

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Cutting a new QEMU release
  2009-02-13 16:30   ` Jamie Lokier
@ 2009-02-13 17:00     ` Anthony Liguori
  2009-02-13 19:04       ` [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports Jamie Lokier
  0 siblings, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2009-02-13 17:00 UTC (permalink / raw)
  To: qemu-devel

Jamie Lokier wrote:
> Riku Voipio wrote:
>   
>> On Tue, Feb 03, 2009 at 02:48:22PM -0600, Anthony Liguori wrote:
>>     
>>> We could decide to cut one by the end of the month.
>>>       
>> This would indeed be really cool.
>>
>>     
>>> .. to decide if there's anything we need to fix before a release.
>>>       
>> At least the OS X (cocoa) host is broken, which is IMHO pretty
>> bad regression. Apart from that I'm not aware of any major regression
>> (wearing arm-linux-user, some arm-softmmu and debian hats).
>>     
>
> I'd say the two qcow2 data corruption bugs are a major regression.
> (Both reported in in another thread).
>
> qemu 0.9.1 has the qcow2 code from kvm-72, which doesn't exhibit
> either of those corruption bugs.  A new release based on current kvm
> userspace would introduce those bugs.  One of the bugs (reported by
> Marc) corrupts a qcow2 image so you can't use it even if you revert to
> an older qemu/kvm.  It's not clear if the other bug causes permanent
> corruption itself, but anything which causes a guest to see the wrong
> data can lead to the guest writing corrupt data elsewhere later on.
>
> Simply reverting the qcow2 code appears to fix those problems, so it
> needn't hold up cutting a release.  That's what I recommend.
>   

Send some patches.

Regards,

Anthony Liguori

> -- Jamie
>
>
>   
 

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-13 17:00     ` Anthony Liguori
@ 2009-02-13 19:04       ` Jamie Lokier
  2009-02-14 22:23         ` Dor Laor
  2009-02-14 23:13         ` Anthony Liguori
  0 siblings, 2 replies; 82+ messages in thread
From: Jamie Lokier @ 2009-02-13 19:04 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> >Simply reverting the qcow2 code appears to fix those problems, so it
> >needn't hold up cutting a release.  That's what I recommend.
> 
> Send some patches.

I did already.

Here it is again.  This should fix my bug and Marc's bug according to
his report that reverting qcow2.c fixes it.

-- Jamie


Subject: Revert block-qcow2.c to kvm-72 version due to corruption reports

This fixes two kinds of qcow2 corruption observed in kvm-83 (actually
kvm-73 and later), from three bug reports.


Bug 1: Windows 2000 guests complain of corrupt registry.

Many Windows 2000 guests which boot and runs fine in kvm-72, fail with
a blue-screen indicating file corruption errors in kvm-73 through to
kvm-83 (the latest), and succeed if we replace block-qcow2.c with the
version from kvm-72.

The blue screen appears towards the end of the boot sequence, and
shows only briefly before rebooting.  It says:

    STOP: c0000218 (Registry File Failure)
    The registry cannot load the hive (file):
    \SystemRoot\System32\Config\SOFTWARE
    or its log or alternate.
    It is corrupt, absent, or not writable.

    Beginning dump of physical memory
    Physical memory dump complete. Contact your system administrator or
    technical support [...?]

This is narrowed down to the difference in block-qcow2.c between
kvm-72 and kvm-73 (not -83).  From kvm-73 to kvm-83, there have been
more changes block-qcow2.c, but the observed corruption still occurs.

The bug isn't evident when only reading.  When using "qemu-img
convert" to convert a qcow2 file to a raw file, with broken and fixed
versions of block-qcow2.c it produces the same raw file.  Also, when
using "-snapshot" with qemu, the blue screen doesn't occur.

This bug was observed by Jamie Lokier <jamie@shareable.org> and 
confirmed for multiple Windows 2000 guests by
Marc Bevand <m.bevand@gmail.com>.


Bug 2: Windows 2003 guests complain of corrupt registry.

According to
http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599

Windows 2003 32-bit guests randomly spew disk corruption messages
like this:

    Windows – Registry Hive Recovered
    Registry hive (file): SOFTWARE was corrupted and it has
    been recovered. Some data might have been lost.

and

    The system cannot log on due to the following error:
    Unable to complete the requested operation because of
    either a catastrophic media failure or a data structure
    corruption on the disk.

This bug was reported by <gerdwachs@users.sourceforge.net> and
confirmed by Marc Bevand, noting:

    kvm-73+ also causes some of my Windows 2003 guests to exhibit this
    exact registry corruption error.  [...]  This bug is also fixed by
    reverting block-qcow2.c to the version from kvm-72.

Worryingly, gerdwachs' bug report says it's for kvm-70, implying this
patch may not fix all the Windows 2003 guest corruption problems.

At least Marc says his observed problem goes away with kvm-72's qcow2.


Bug 3: Corruption of qcow2 index rendering the file unusable.

Marc Bevand writes:

    I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because
    of the qcow2 performance regression caused by the default writethrough
    caching policy) but it randomly triggers an even worse bug: the moment
    I shut down a guest by typing "quit" in the monitor, it sometimes
    overwrite the first 4kB of the disk image with mostly NUL bytes (!)
    which completely destroys it. I am familiar with the qcow2 format and
    apparently this 4kB block seems to be an L2 table with most entries
    set to zero. I have had to restore at least 6 or 7 disk images from
    backup after occurences of that bug. My intuition tells me this may be
    the qcow2 code trying to allocate a cluster to write a new L2 table,
    but not noticing the allocation failed (represented by a 0 offset),
    and writing the L2 table at that 0 offset, overwriting the qcow2
    header.

    Fortunately this bug is also fixed by running kvm-75 with
    block-qcow2.c reverted to its kvm-72 version.

    Basically qcow2 in kvm-73 or newer is completely unreliable.


Reverting block-qcow2.c to the version in kvm-72 appears to fix the
corruption symptoms reported by Marc and Jamie, although gerdwachs'
related bug is against kvm-70 so it may not fix that.

Unfortunately this reverts some optimisations, but fixing corruption
is more important until the new code is reliable.

This patch reverts block-qcow2.c in kvm-83 to the version in kvm-72,
except the "cache=writeback" default performance tweak is retained and
there's no need to define "offsetof".

Signed-Off-By: Jamie Lokier <jamie@shareable.org>


--- kvm-83-real/qemu/block-qcow2.c	2009-01-13 13:29:42.000000000 +0000
+++ kvm-83/qemu/block-qcow2.c	2009-02-13 18:51:12.000000000 +0000
@@ -52,8 +52,6 @@
 #define QCOW_CRYPT_NONE 0
 #define QCOW_CRYPT_AES  1
 
-#define QCOW_MAX_CRYPT_CLUSTERS 32
-
 /* indicate that the refcount of the referenced cluster is exactly one. */
 #define QCOW_OFLAG_COPIED     (1LL << 63)
 /* indicate that the cluster is compressed (they never have the copied flag) */
@@ -269,8 +267,7 @@
     if (!s->cluster_cache)
         goto fail;
     /* one more sector for decompressed data alignment */
-    s->cluster_data = qemu_malloc(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size
-                                  + 512);
+    s->cluster_data = qemu_malloc(s->cluster_size + 512);
     if (!s->cluster_data)
         goto fail;
     s->cluster_cache_offset = -1;
@@ -437,7 +434,8 @@
     int new_l1_size, new_l1_size2, ret, i;
     uint64_t *new_l1_table;
     uint64_t new_l1_table_offset;
-    uint8_t data[12];
+    uint64_t data64;
+    uint32_t data32;
 
     new_l1_size = s->l1_size;
     if (min_size <= new_l1_size)
@@ -467,10 +465,13 @@
         new_l1_table[i] = be64_to_cpu(new_l1_table[i]);
 
     /* set new table */
-    cpu_to_be32w((uint32_t*)data, new_l1_size);
-    cpu_to_be64w((uint64_t*)(data + 4), new_l1_table_offset);
-    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size), data,
-                sizeof(data)) != sizeof(data))
+    data64 = cpu_to_be64(new_l1_table_offset);
+    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_table_offset),
+                    &data64, sizeof(data64)) != sizeof(data64))
+        goto fail;
+    data32 = cpu_to_be32(new_l1_size);
+    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size),
+                    &data32, sizeof(data32)) != sizeof(data32))
         goto fail;
     qemu_free(s->l1_table);
     free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t));
@@ -483,549 +484,169 @@
     return -EIO;
 }
 
-/*
- * seek_l2_table
+/* 'allocate' is:
  *
- * seek l2_offset in the l2_cache table
- * if not found, return NULL,
- * if found,
- *   increments the l2 cache hit count of the entry,
- *   if counter overflow, divide by two all counters
- *   return the pointer to the l2 cache entry
+ * 0 not to allocate.
  *
- */
-
-static uint64_t *seek_l2_table(BDRVQcowState *s, uint64_t l2_offset)
-{
-    int i, j;
-
-    for(i = 0; i < L2_CACHE_SIZE; i++) {
-        if (l2_offset == s->l2_cache_offsets[i]) {
-            /* increment the hit count */
-            if (++s->l2_cache_counts[i] == 0xffffffff) {
-                for(j = 0; j < L2_CACHE_SIZE; j++) {
-                    s->l2_cache_counts[j] >>= 1;
-                }
-            }
-            return s->l2_cache + (i << s->l2_bits);
-        }
-    }
-    return NULL;
-}
-
-/*
- * l2_load
+ * 1 to allocate a normal cluster (for sector indexes 'n_start' to
+ * 'n_end')
  *
- * Loads a L2 table into memory. If the table is in the cache, the cache
- * is used; otherwise the L2 table is loaded from the image file.
+ * 2 to allocate a compressed cluster of size
+ * 'compressed_size'. 'compressed_size' must be > 0 and <
+ * cluster_size
  *
- * Returns a pointer to the L2 table on success, or NULL if the read from
- * the image file failed.
+ * return 0 if not allocated.
  */
-
-static uint64_t *l2_load(BlockDriverState *bs, uint64_t l2_offset)
-{
-    BDRVQcowState *s = bs->opaque;
-    int min_index;
-    uint64_t *l2_table;
-
-    /* seek if the table for the given offset is in the cache */
-
-    l2_table = seek_l2_table(s, l2_offset);
-    if (l2_table != NULL)
-        return l2_table;
-
-    /* not found: load a new entry in the least used one */
-
-    min_index = l2_cache_new_entry(bs);
-    l2_table = s->l2_cache + (min_index << s->l2_bits);
-    if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
-        s->l2_size * sizeof(uint64_t))
-        return NULL;
-    s->l2_cache_offsets[min_index] = l2_offset;
-    s->l2_cache_counts[min_index] = 1;
-
-    return l2_table;
-}
-
-/*
- * l2_allocate
- *
- * Allocate a new l2 entry in the file. If l1_index points to an already
- * used entry in the L2 table (i.e. we are doing a copy on write for the L2
- * table) copy the contents of the old L2 table into the newly allocated one.
- * Otherwise the new table is initialized with zeros.
- *
- */
-
-static uint64_t *l2_allocate(BlockDriverState *bs, int l1_index)
-{
-    BDRVQcowState *s = bs->opaque;
-    int min_index;
-    uint64_t old_l2_offset, tmp;
-    uint64_t *l2_table, l2_offset;
-
-    old_l2_offset = s->l1_table[l1_index];
-
-    /* allocate a new l2 entry */
-
-    l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
-
-    /* update the L1 entry */
-
-    s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
-
-    tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
-    if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
-                    &tmp, sizeof(tmp)) != sizeof(tmp))
-        return NULL;
-
-    /* allocate a new entry in the l2 cache */
-
-    min_index = l2_cache_new_entry(bs);
-    l2_table = s->l2_cache + (min_index << s->l2_bits);
-
-    if (old_l2_offset == 0) {
-        /* if there was no old l2 table, clear the new table */
-        memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
-    } else {
-        /* if there was an old l2 table, read it from the disk */
-        if (bdrv_pread(s->hd, old_l2_offset,
-                       l2_table, s->l2_size * sizeof(uint64_t)) !=
-            s->l2_size * sizeof(uint64_t))
-            return NULL;
-    }
-    /* write the l2 table to the file */
-    if (bdrv_pwrite(s->hd, l2_offset,
-                    l2_table, s->l2_size * sizeof(uint64_t)) !=
-        s->l2_size * sizeof(uint64_t))
-        return NULL;
-
-    /* update the l2 cache entry */
-
-    s->l2_cache_offsets[min_index] = l2_offset;
-    s->l2_cache_counts[min_index] = 1;
-
-    return l2_table;
-}
-
-static int size_to_clusters(BDRVQcowState *s, int64_t size)
-{
-    return (size + (s->cluster_size - 1)) >> s->cluster_bits;
-}
-
-static int count_contiguous_clusters(uint64_t nb_clusters, int cluster_size,
-        uint64_t *l2_table, uint64_t start, uint64_t mask)
-{
-    int i;
-    uint64_t offset = be64_to_cpu(l2_table[0]) & ~mask;
-
-    if (!offset)
-        return 0;
-
-    for (i = start; i < start + nb_clusters; i++)
-        if (offset + i * cluster_size != (be64_to_cpu(l2_table[i]) & ~mask))
-            break;
-
-	return (i - start);
-}
-
-static int count_contiguous_free_clusters(uint64_t nb_clusters, uint64_t *l2_table)
-{
-    int i = 0;
-
-    while(nb_clusters-- && l2_table[i] == 0)
-        i++;
-
-    return i;
-}
-
-/*
- * get_cluster_offset
- *
- * For a given offset of the disk image, return cluster offset in
- * qcow2 file.
- *
- * on entry, *num is the number of contiguous clusters we'd like to
- * access following offset.
- *
- * on exit, *num is the number of contiguous clusters we can read.
- *
- * Return 1, if the offset is found
- * Return 0, otherwise.
- *
- */
-
 static uint64_t get_cluster_offset(BlockDriverState *bs,
-                                   uint64_t offset, int *num)
-{
-    BDRVQcowState *s = bs->opaque;
-    int l1_index, l2_index;
-    uint64_t l2_offset, *l2_table, cluster_offset;
-    int l1_bits, c;
-    int index_in_cluster, nb_available, nb_needed, nb_clusters;
-
-    index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
-    nb_needed = *num + index_in_cluster;
-
-    l1_bits = s->l2_bits + s->cluster_bits;
-
-    /* compute how many bytes there are between the offset and
-     * the end of the l1 entry
-     */
-
-    nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));
-
-    /* compute the number of available sectors */
-
-    nb_available = (nb_available >> 9) + index_in_cluster;
-
-    cluster_offset = 0;
-
-    /* seek the the l2 offset in the l1 table */
-
-    l1_index = offset >> l1_bits;
-    if (l1_index >= s->l1_size)
-        goto out;
-
-    l2_offset = s->l1_table[l1_index];
-
-    /* seek the l2 table of the given l2 offset */
-
-    if (!l2_offset)
-        goto out;
-
-    /* load the l2 table in memory */
-
-    l2_offset &= ~QCOW_OFLAG_COPIED;
-    l2_table = l2_load(bs, l2_offset);
-    if (l2_table == NULL)
-        return 0;
-
-    /* find the cluster offset for the given disk offset */
-
-    l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
-    cluster_offset = be64_to_cpu(l2_table[l2_index]);
-    nb_clusters = size_to_clusters(s, nb_needed << 9);
-
-    if (!cluster_offset) {
-        /* how many empty clusters ? */
-        c = count_contiguous_free_clusters(nb_clusters, &l2_table[l2_index]);
-    } else {
-        /* how many allocated clusters ? */
-        c = count_contiguous_clusters(nb_clusters, s->cluster_size,
-                &l2_table[l2_index], 0, QCOW_OFLAG_COPIED);
-    }
-
-   nb_available = (c * s->cluster_sectors);
-out:
-    if (nb_available > nb_needed)
-        nb_available = nb_needed;
-
-    *num = nb_available - index_in_cluster;
-
-    return cluster_offset & ~QCOW_OFLAG_COPIED;
-}
-
-/*
- * free_any_clusters
- *
- * free clusters according to its type: compressed or not
- *
- */
-
-static void free_any_clusters(BlockDriverState *bs,
-                              uint64_t cluster_offset, int nb_clusters)
-{
-    BDRVQcowState *s = bs->opaque;
-
-    /* free the cluster */
-
-    if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
-        int nb_csectors;
-        nb_csectors = ((cluster_offset >> s->csize_shift) &
-                       s->csize_mask) + 1;
-        free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
-                      nb_csectors * 512);
-        return;
-    }
-
-    free_clusters(bs, cluster_offset, nb_clusters << s->cluster_bits);
-
-    return;
-}
-
-/*
- * get_cluster_table
- *
- * for a given disk offset, load (and allocate if needed)
- * the l2 table.
- *
- * the l2 table offset in the qcow2 file and the cluster index
- * in the l2 table are given to the caller.
- *
- */
-
-static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
-                             uint64_t **new_l2_table,
-                             uint64_t *new_l2_offset,
-                             int *new_l2_index)
+                                   uint64_t offset, int allocate,
+                                   int compressed_size,
+                                   int n_start, int n_end)
 {
     BDRVQcowState *s = bs->opaque;
-    int l1_index, l2_index, ret;
-    uint64_t l2_offset, *l2_table;
-
-    /* seek the the l2 offset in the l1 table */
+    int min_index, i, j, l1_index, l2_index, ret;
+    uint64_t l2_offset, *l2_table, cluster_offset, tmp, old_l2_offset;
 
     l1_index = offset >> (s->l2_bits + s->cluster_bits);
     if (l1_index >= s->l1_size) {
-        ret = grow_l1_table(bs, l1_index + 1);
-        if (ret < 0)
+        /* outside l1 table is allowed: we grow the table if needed */
+        if (!allocate)
+            return 0;
+        if (grow_l1_table(bs, l1_index + 1) < 0)
             return 0;
     }
     l2_offset = s->l1_table[l1_index];
+    if (!l2_offset) {
+        if (!allocate)
+            return 0;
+    l2_allocate:
+        old_l2_offset = l2_offset;
+        /* allocate a new l2 entry */
+        l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+        /* update the L1 entry */
+        s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
+        tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
+        if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
+                        &tmp, sizeof(tmp)) != sizeof(tmp))
+            return 0;
+        min_index = l2_cache_new_entry(bs);
+        l2_table = s->l2_cache + (min_index << s->l2_bits);
 
-    /* seek the l2 table of the given l2 offset */
-
-    if (l2_offset & QCOW_OFLAG_COPIED) {
-        /* load the l2 table in memory */
-        l2_offset &= ~QCOW_OFLAG_COPIED;
-        l2_table = l2_load(bs, l2_offset);
-        if (l2_table == NULL)
+        if (old_l2_offset == 0) {
+            memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
+        } else {
+            if (bdrv_pread(s->hd, old_l2_offset,
+                           l2_table, s->l2_size * sizeof(uint64_t)) !=
+                s->l2_size * sizeof(uint64_t))
+                return 0;
+        }
+        if (bdrv_pwrite(s->hd, l2_offset,
+                        l2_table, s->l2_size * sizeof(uint64_t)) !=
+            s->l2_size * sizeof(uint64_t))
             return 0;
     } else {
-        if (l2_offset)
-            free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
-        l2_table = l2_allocate(bs, l1_index);
-        if (l2_table == NULL)
+        if (!(l2_offset & QCOW_OFLAG_COPIED)) {
+            if (allocate) {
+                free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
+                goto l2_allocate;
+            }
+        } else {
+            l2_offset &= ~QCOW_OFLAG_COPIED;
+        }
+        for(i = 0; i < L2_CACHE_SIZE; i++) {
+            if (l2_offset == s->l2_cache_offsets[i]) {
+                /* increment the hit count */
+                if (++s->l2_cache_counts[i] == 0xffffffff) {
+                    for(j = 0; j < L2_CACHE_SIZE; j++) {
+                        s->l2_cache_counts[j] >>= 1;
+                    }
+                }
+                l2_table = s->l2_cache + (i << s->l2_bits);
+                goto found;
+            }
+        }
+        /* not found: load a new entry in the least used one */
+        min_index = l2_cache_new_entry(bs);
+        l2_table = s->l2_cache + (min_index << s->l2_bits);
+        if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
+            s->l2_size * sizeof(uint64_t))
             return 0;
-        l2_offset = s->l1_table[l1_index] & ~QCOW_OFLAG_COPIED;
     }
-
-    /* find the cluster offset for the given disk offset */
-
+    s->l2_cache_offsets[min_index] = l2_offset;
+    s->l2_cache_counts[min_index] = 1;
+ found:
     l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
-
-    *new_l2_table = l2_table;
-    *new_l2_offset = l2_offset;
-    *new_l2_index = l2_index;
-
-    return 1;
-}
-
-/*
- * alloc_compressed_cluster_offset
- *
- * For a given offset of the disk image, return cluster offset in
- * qcow2 file.
- *
- * If the offset is not found, allocate a new compressed cluster.
- *
- * Return the cluster offset if successful,
- * Return 0, otherwise.
- *
- */
-
-static uint64_t alloc_compressed_cluster_offset(BlockDriverState *bs,
-                                                uint64_t offset,
-                                                int compressed_size)
-{
-    BDRVQcowState *s = bs->opaque;
-    int l2_index, ret;
-    uint64_t l2_offset, *l2_table, cluster_offset;
-    int nb_csectors;
-
-    ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index);
-    if (ret == 0)
-        return 0;
-
     cluster_offset = be64_to_cpu(l2_table[l2_index]);
-    if (cluster_offset & QCOW_OFLAG_COPIED)
-        return cluster_offset & ~QCOW_OFLAG_COPIED;
-
-    if (cluster_offset)
-        free_any_clusters(bs, cluster_offset, 1);
-
-    cluster_offset = alloc_bytes(bs, compressed_size);
-    nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
-                  (cluster_offset >> 9);
-
-    cluster_offset |= QCOW_OFLAG_COMPRESSED |
-                      ((uint64_t)nb_csectors << s->csize_shift);
-
-    /* update L2 table */
-
-    /* compressed clusters never have the copied flag */
-
-    l2_table[l2_index] = cpu_to_be64(cluster_offset);
-    if (bdrv_pwrite(s->hd,
-                    l2_offset + l2_index * sizeof(uint64_t),
-                    l2_table + l2_index,
-                    sizeof(uint64_t)) != sizeof(uint64_t))
-        return 0;
-
-    return cluster_offset;
-}
-
-typedef struct QCowL2Meta
-{
-    uint64_t offset;
-    int n_start;
-    int nb_available;
-    int nb_clusters;
-} QCowL2Meta;
-
-static int alloc_cluster_link_l2(BlockDriverState *bs, uint64_t cluster_offset,
-        QCowL2Meta *m)
-{
-    BDRVQcowState *s = bs->opaque;
-    int i, j = 0, l2_index, ret;
-    uint64_t *old_cluster, start_sect, l2_offset, *l2_table;
-
-    if (m->nb_clusters == 0)
-        return 0;
-
-    if (!(old_cluster = qemu_malloc(m->nb_clusters * sizeof(uint64_t))))
-        return -ENOMEM;
-
-    /* copy content of unmodified sectors */
-    start_sect = (m->offset & ~(s->cluster_size - 1)) >> 9;
-    if (m->n_start) {
-        ret = copy_sectors(bs, start_sect, cluster_offset, 0, m->n_start);
-        if (ret < 0)
-            goto err;
+    if (!cluster_offset) {
+        if (!allocate)
+            return cluster_offset;
+    } else if (!(cluster_offset & QCOW_OFLAG_COPIED)) {
+        if (!allocate)
+            return cluster_offset;
+        /* free the cluster */
+        if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
+            int nb_csectors;
+            nb_csectors = ((cluster_offset >> s->csize_shift) &
+                           s->csize_mask) + 1;
+            free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
+                          nb_csectors * 512);
+        } else {
+            free_clusters(bs, cluster_offset, s->cluster_size);
+        }
+    } else {
+        cluster_offset &= ~QCOW_OFLAG_COPIED;
+        return cluster_offset;
     }
-
-    if (m->nb_available & (s->cluster_sectors - 1)) {
-        uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - 1);
-        ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << 9),
-                m->nb_available - end, s->cluster_sectors);
-        if (ret < 0)
-            goto err;
+    if (allocate == 1) {
+        /* allocate a new cluster */
+        cluster_offset = alloc_clusters(bs, s->cluster_size);
+
+        /* we must initialize the cluster content which won't be
+           written */
+        if ((n_end - n_start) < s->cluster_sectors) {
+            uint64_t start_sect;
+
+            start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
+            ret = copy_sectors(bs, start_sect,
+                               cluster_offset, 0, n_start);
+            if (ret < 0)
+                return 0;
+            ret = copy_sectors(bs, start_sect,
+                               cluster_offset, n_end, s->cluster_sectors);
+            if (ret < 0)
+                return 0;
+        }
+        tmp = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
+    } else {
+        int nb_csectors;
+        cluster_offset = alloc_bytes(bs, compressed_size);
+        nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
+            (cluster_offset >> 9);
+        cluster_offset |= QCOW_OFLAG_COMPRESSED |
+            ((uint64_t)nb_csectors << s->csize_shift);
+        /* compressed clusters never have the copied flag */
+        tmp = cpu_to_be64(cluster_offset);
     }
-
-    ret = -EIO;
     /* update L2 table */
-    if (!get_cluster_table(bs, m->offset, &l2_table, &l2_offset, &l2_index))
-        goto err;
-
-    for (i = 0; i < m->nb_clusters; i++) {
-        if(l2_table[l2_index + i] != 0)
-            old_cluster[j++] = l2_table[l2_index + i];
-
-        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
-     }
-
-    if (bdrv_pwrite(s->hd, l2_offset + l2_index * sizeof(uint64_t),
-                l2_table + l2_index, m->nb_clusters * sizeof(uint64_t)) !=
-            m->nb_clusters * sizeof(uint64_t))
-        goto err;
-
-    for (i = 0; i < j; i++)
-        free_any_clusters(bs, old_cluster[i], 1);
-
-    ret = 0;
-err:
-    qemu_free(old_cluster);
-    return ret;
- }
-
-/*
- * alloc_cluster_offset
- *
- * For a given offset of the disk image, return cluster offset in
- * qcow2 file.
- *
- * If the offset is not found, allocate a new cluster.
- *
- * Return the cluster offset if successful,
- * Return 0, otherwise.
- *
- */
-
-static uint64_t alloc_cluster_offset(BlockDriverState *bs,
-                                     uint64_t offset,
-                                     int n_start, int n_end,
-                                     int *num, QCowL2Meta *m)
-{
-    BDRVQcowState *s = bs->opaque;
-    int l2_index, ret;
-    uint64_t l2_offset, *l2_table, cluster_offset;
-    int nb_clusters, i = 0;
-
-    ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index);
-    if (ret == 0)
+    l2_table[l2_index] = tmp;
+    if (bdrv_pwrite(s->hd,
+                    l2_offset + l2_index * sizeof(tmp), &tmp, sizeof(tmp)) != sizeof(tmp))
         return 0;
-
-    nb_clusters = size_to_clusters(s, n_end << 9);
-
-    nb_clusters = MIN(nb_clusters, s->l2_size - l2_index);
-
-    cluster_offset = be64_to_cpu(l2_table[l2_index]);
-
-    /* We keep all QCOW_OFLAG_COPIED clusters */
-
-    if (cluster_offset & QCOW_OFLAG_COPIED) {
-        nb_clusters = count_contiguous_clusters(nb_clusters, s->cluster_size,
-                &l2_table[l2_index], 0, 0);
-
-        cluster_offset &= ~QCOW_OFLAG_COPIED;
-        m->nb_clusters = 0;
-
-        goto out;
-    }
-
-    /* for the moment, multiple compressed clusters are not managed */
-
-    if (cluster_offset & QCOW_OFLAG_COMPRESSED)
-        nb_clusters = 1;
-
-    /* how many available clusters ? */
-
-    while (i < nb_clusters) {
-        i += count_contiguous_clusters(nb_clusters - i, s->cluster_size,
-                &l2_table[l2_index], i, 0);
-
-        if(be64_to_cpu(l2_table[l2_index + i]))
-            break;
-
-        i += count_contiguous_free_clusters(nb_clusters - i,
-                &l2_table[l2_index + i]);
-
-        cluster_offset = be64_to_cpu(l2_table[l2_index + i]);
-
-        if ((cluster_offset & QCOW_OFLAG_COPIED) ||
-                (cluster_offset & QCOW_OFLAG_COMPRESSED))
-            break;
-    }
-    nb_clusters = i;
-
-    /* allocate a new cluster */
-
-    cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size);
-
-    /* save info needed for meta data update */
-    m->offset = offset;
-    m->n_start = n_start;
-    m->nb_clusters = nb_clusters;
-
-out:
-    m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end);
-
-    *num = m->nb_available - n_start;
-
     return cluster_offset;
 }
 
 static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
                              int nb_sectors, int *pnum)
 {
+    BDRVQcowState *s = bs->opaque;
+    int index_in_cluster, n;
     uint64_t cluster_offset;
 
-    *pnum = nb_sectors;
-    cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum);
-
+    cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
+    index_in_cluster = sector_num & (s->cluster_sectors - 1);
+    n = s->cluster_sectors - index_in_cluster;
+    if (n > nb_sectors)
+        n = nb_sectors;
+    *pnum = n;
     return (cluster_offset != 0);
 }
 
@@ -1102,9 +723,11 @@
     uint64_t cluster_offset;
 
     while (nb_sectors > 0) {
-        n = nb_sectors;
-        cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);
+        cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
+        n = s->cluster_sectors - index_in_cluster;
+        if (n > nb_sectors)
+            n = nb_sectors;
         if (!cluster_offset) {
             if (bs->backing_hd) {
                 /* read from the base image */
@@ -1143,18 +766,15 @@
     BDRVQcowState *s = bs->opaque;
     int ret, index_in_cluster, n;
     uint64_t cluster_offset;
-    int n_end;
-    QCowL2Meta l2meta;
 
     while (nb_sectors > 0) {
         index_in_cluster = sector_num & (s->cluster_sectors - 1);
-        n_end = index_in_cluster + nb_sectors;
-        if (s->crypt_method &&
-            n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
-            n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
-        cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
-                                              index_in_cluster,
-                                              n_end, &n, &l2meta);
+        n = s->cluster_sectors - index_in_cluster;
+        if (n > nb_sectors)
+            n = nb_sectors;
+        cluster_offset = get_cluster_offset(bs, sector_num << 9, 1, 0,
+                                            index_in_cluster,
+                                            index_in_cluster + n);
         if (!cluster_offset)
             return -1;
         if (s->crypt_method) {
@@ -1165,10 +785,8 @@
         } else {
             ret = bdrv_pwrite(s->hd, cluster_offset + index_in_cluster * 512, buf, n * 512);
         }
-        if (ret != n * 512 || alloc_cluster_link_l2(bs, cluster_offset, &l2meta) < 0) {
-            free_any_clusters(bs, cluster_offset, l2meta.nb_clusters);
+        if (ret != n * 512)
             return -1;
-        }
         nb_sectors -= n;
         sector_num += n;
         buf += n * 512;
@@ -1186,33 +804,8 @@
     uint64_t cluster_offset;
     uint8_t *cluster_data;
     BlockDriverAIOCB *hd_aiocb;
-    QEMUBH *bh;
-    QCowL2Meta l2meta;
 } QCowAIOCB;
 
-static void qcow_aio_read_cb(void *opaque, int ret);
-static void qcow_aio_read_bh(void *opaque)
-{
-    QCowAIOCB *acb = opaque;
-    qemu_bh_delete(acb->bh);
-    acb->bh = NULL;
-    qcow_aio_read_cb(opaque, 0);
-}
-
-static int qcow_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
-{
-    if (acb->bh)
-        return -EIO;
-
-    acb->bh = qemu_bh_new(cb, acb);
-    if (!acb->bh)
-        return -EIO;
-
-    qemu_bh_schedule(acb->bh);
-
-    return 0;
-}
-
 static void qcow_aio_read_cb(void *opaque, int ret)
 {
     QCowAIOCB *acb = opaque;
@@ -1222,12 +815,13 @@
 
     acb->hd_aiocb = NULL;
     if (ret < 0) {
-fail:
+    fail:
         acb->common.cb(acb->common.opaque, ret);
         qemu_aio_release(acb);
         return;
     }
 
+ redo:
     /* post process the read buffer */
     if (!acb->cluster_offset) {
         /* nothing to do */
@@ -1253,9 +847,12 @@
     }
 
     /* prepare next AIO request */
-    acb->n = acb->nb_sectors;
-    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, &acb->n);
+    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9,
+                                             0, 0, 0, 0);
     index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
+    acb->n = s->cluster_sectors - index_in_cluster;
+    if (acb->n > acb->nb_sectors)
+        acb->n = acb->nb_sectors;
 
     if (!acb->cluster_offset) {
         if (bs->backing_hd) {
@@ -1268,16 +865,12 @@
                 if (acb->hd_aiocb == NULL)
                     goto fail;
             } else {
-                ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
-                if (ret < 0)
-                    goto fail;
+                goto redo;
             }
         } else {
             /* Note: in this case, no need to wait */
             memset(acb->buf, 0, 512 * acb->n);
-            ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
-            if (ret < 0)
-                goto fail;
+            goto redo;
         }
     } else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) {
         /* add AIO support for compressed blocks ? */
@@ -1285,9 +878,7 @@
             goto fail;
         memcpy(acb->buf,
                s->cluster_cache + index_in_cluster * 512, 512 * acb->n);
-        ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
-        if (ret < 0)
-            goto fail;
+        goto redo;
     } else {
         if ((acb->cluster_offset & 511) != 0) {
             ret = -EIO;
@@ -1316,7 +907,6 @@
     acb->nb_sectors = nb_sectors;
     acb->n = 0;
     acb->cluster_offset = 0;
-    acb->l2meta.nb_clusters = 0;
     return acb;
 }
 
@@ -1340,8 +930,8 @@
     BlockDriverState *bs = acb->common.bs;
     BDRVQcowState *s = bs->opaque;
     int index_in_cluster;
+    uint64_t cluster_offset;
     const uint8_t *src_buf;
-    int n_end;
 
     acb->hd_aiocb = NULL;
 
@@ -1352,11 +942,6 @@
         return;
     }
 
-    if (alloc_cluster_link_l2(bs, acb->cluster_offset, &acb->l2meta) < 0) {
-        free_any_clusters(bs, acb->cluster_offset, acb->l2meta.nb_clusters);
-        goto fail;
-    }
-
     acb->nb_sectors -= acb->n;
     acb->sector_num += acb->n;
     acb->buf += acb->n * 512;
@@ -1369,22 +954,19 @@
     }
 
     index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
-    n_end = index_in_cluster + acb->nb_sectors;
-    if (s->crypt_method &&
-        n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
-        n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
-
-    acb->cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
-                                          index_in_cluster,
-                                          n_end, &acb->n, &acb->l2meta);
-    if (!acb->cluster_offset || (acb->cluster_offset & 511) != 0) {
+    acb->n = s->cluster_sectors - index_in_cluster;
+    if (acb->n > acb->nb_sectors)
+        acb->n = acb->nb_sectors;
+    cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, 1, 0,
+                                        index_in_cluster,
+                                        index_in_cluster + acb->n);
+    if (!cluster_offset || (cluster_offset & 511) != 0) {
         ret = -EIO;
         goto fail;
     }
     if (s->crypt_method) {
         if (!acb->cluster_data) {
-            acb->cluster_data = qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS *
-                                             s->cluster_size);
+            acb->cluster_data = qemu_mallocz(s->cluster_size);
             if (!acb->cluster_data) {
                 ret = -ENOMEM;
                 goto fail;
@@ -1397,7 +979,7 @@
         src_buf = acb->buf;
     }
     acb->hd_aiocb = bdrv_aio_write(s->hd,
-                                   (acb->cluster_offset >> 9) + index_in_cluster,
+                                   (cluster_offset >> 9) + index_in_cluster,
                                    src_buf, acb->n,
                                    qcow_aio_write_cb, acb);
     if (acb->hd_aiocb == NULL)
@@ -1571,7 +1153,7 @@
 
     memset(s->l1_table, 0, l1_length);
     if (bdrv_pwrite(s->hd, s->l1_table_offset, s->l1_table, l1_length) < 0)
-        return -1;
+	return -1;
     ret = bdrv_truncate(s->hd, s->l1_table_offset + l1_length);
     if (ret < 0)
         return ret;
@@ -1637,10 +1219,8 @@
         /* could not compress: write normal cluster */
         qcow_write(bs, sector_num, buf, s->cluster_sectors);
     } else {
-        cluster_offset = alloc_compressed_cluster_offset(bs, sector_num << 9,
-                                              out_len);
-        if (!cluster_offset)
-            return -1;
+        cluster_offset = get_cluster_offset(bs, sector_num << 9, 2,
+                                            out_len, 0, 0);
         cluster_offset &= s->cluster_offset_mask;
         if (bdrv_pwrite(s->hd, cluster_offset, out_buf, out_len) != out_len) {
             qemu_free(out_buf);
@@ -2225,19 +1805,26 @@
     BDRVQcowState *s = bs->opaque;
     int i, nb_clusters;
 
-    nb_clusters = size_to_clusters(s, size);
-retry:
-    for(i = 0; i < nb_clusters; i++) {
-        int64_t i = s->free_cluster_index++;
-        if (get_refcount(bs, i) != 0)
-            goto retry;
-    }
+    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
+    for(;;) {
+        if (get_refcount(bs, s->free_cluster_index) == 0) {
+            s->free_cluster_index++;
+            for(i = 1; i < nb_clusters; i++) {
+                if (get_refcount(bs, s->free_cluster_index) != 0)
+                    goto not_found;
+                s->free_cluster_index++;
+            }
 #ifdef DEBUG_ALLOC2
-    printf("alloc_clusters: size=%lld -> %lld\n",
-            size,
-            (s->free_cluster_index - nb_clusters) << s->cluster_bits);
+            printf("alloc_clusters: size=%lld -> %lld\n",
+                   size,
+                   (s->free_cluster_index - nb_clusters) << s->cluster_bits);
 #endif
-    return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
+            return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
+        } else {
+        not_found:
+            s->free_cluster_index++;
+        }
+    }
 }
 
 static int64_t alloc_clusters(BlockDriverState *bs, int64_t size)
@@ -2301,7 +1888,8 @@
     int new_table_size, new_table_size2, refcount_table_clusters, i, ret;
     uint64_t *new_table;
     int64_t table_offset;
-    uint8_t data[12];
+    uint64_t data64;
+    uint32_t data32;
     int old_table_size;
     int64_t old_table_offset;
 
@@ -2340,10 +1928,13 @@
     for(i = 0; i < s->refcount_table_size; i++)
         be64_to_cpus(&new_table[i]);
 
-    cpu_to_be64w((uint64_t*)data, table_offset);
-    cpu_to_be32w((uint32_t*)(data + 8), refcount_table_clusters);
+    data64 = cpu_to_be64(table_offset);
     if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_offset),
-                    data, sizeof(data)) != sizeof(data))
+                    &data64, sizeof(data64)) != sizeof(data64))
+        goto fail;
+    data32 = cpu_to_be32(refcount_table_clusters);
+    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_clusters),
+                    &data32, sizeof(data32)) != sizeof(data32))
         goto fail;
     qemu_free(s->refcount_table);
     old_table_offset = s->refcount_table_offset;
@@ -2572,7 +2163,7 @@
     uint16_t *refcount_table;
 
     size = bdrv_getlength(s->hd);
-    nb_clusters = size_to_clusters(s, size);
+    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
     refcount_table = qemu_mallocz(nb_clusters * sizeof(uint16_t));
 
     /* header */
@@ -2624,7 +2215,7 @@
     int refcount;
 
     size = bdrv_getlength(s->hd);
-    nb_clusters = size_to_clusters(s, size);
+    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
     for(k = 0; k < nb_clusters;) {
         k1 = k;
         refcount = get_refcount(bs, k);

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-13 19:04       ` [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports Jamie Lokier
@ 2009-02-14 22:23         ` Dor Laor
  2009-02-15  2:20           ` Jamie Lokier
  2009-02-14 23:13         ` Anthony Liguori
  1 sibling, 1 reply; 82+ messages in thread
From: Dor Laor @ 2009-02-14 22:23 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 42670 bytes --]

Jamie Lokier wrote:
> Anthony Liguori wrote:
>   
>>> Simply reverting the qcow2 code appears to fix those problems, so it
>>> needn't hold up cutting a release.  That's what I recommend.
>>>       
>> Send some patches.
>>     
>
> I did already.
>
> Here it is again.  This should fix my bug and Marc's bug according to
> his report that reverting qcow2.c fixes it.
>   
Going back to kvm-72 is not good also.
First, there were qcow2 corruptions before it, they were very rare but 
still exist. Not long
ago we did not know even that qcow2 is the faulty.
In addition, Gleb fixed some qcow2 meta data ordering writes. We need to 
keep them in.
The solution is to find the real cause to the corruption.
> -- Jamie
>
>
> Subject: Revert block-qcow2.c to kvm-72 version due to corruption reports
>
> This fixes two kinds of qcow2 corruption observed in kvm-83 (actually
> kvm-73 and later), from three bug reports.
>
>
> Bug 1: Windows 2000 guests complain of corrupt registry.
>
> Many Windows 2000 guests which boot and runs fine in kvm-72, fail with
> a blue-screen indicating file corruption errors in kvm-73 through to
> kvm-83 (the latest), and succeed if we replace block-qcow2.c with the
> version from kvm-72.
>
> The blue screen appears towards the end of the boot sequence, and
> shows only briefly before rebooting.  It says:
>
>     STOP: c0000218 (Registry File Failure)
>     The registry cannot load the hive (file):
>     \SystemRoot\System32\Config\SOFTWARE
>     or its log or alternate.
>     It is corrupt, absent, or not writable.
>
>     Beginning dump of physical memory
>     Physical memory dump complete. Contact your system administrator or
>     technical support [...?]
>
> This is narrowed down to the difference in block-qcow2.c between
> kvm-72 and kvm-73 (not -83).  From kvm-73 to kvm-83, there have been
> more changes block-qcow2.c, but the observed corruption still occurs.
>
> The bug isn't evident when only reading.  When using "qemu-img
> convert" to convert a qcow2 file to a raw file, with broken and fixed
> versions of block-qcow2.c it produces the same raw file.  Also, when
> using "-snapshot" with qemu, the blue screen doesn't occur.
>
> This bug was observed by Jamie Lokier <jamie@shareable.org> and 
> confirmed for multiple Windows 2000 guests by
> Marc Bevand <m.bevand@gmail.com>.
>
>
> Bug 2: Windows 2003 guests complain of corrupt registry.
>
> According to
> http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599
>
> Windows 2003 32-bit guests randomly spew disk corruption messages
> like this:
>
>     Windows – Registry Hive Recovered
>     Registry hive (file): SOFTWARE was corrupted and it has
>     been recovered. Some data might have been lost.
>
> and
>
>     The system cannot log on due to the following error:
>     Unable to complete the requested operation because of
>     either a catastrophic media failure or a data structure
>     corruption on the disk.
>
> This bug was reported by <gerdwachs@users.sourceforge.net> and
> confirmed by Marc Bevand, noting:
>
>     kvm-73+ also causes some of my Windows 2003 guests to exhibit this
>     exact registry corruption error.  [...]  This bug is also fixed by
>     reverting block-qcow2.c to the version from kvm-72.
>
> Worryingly, gerdwachs' bug report says it's for kvm-70, implying this
> patch may not fix all the Windows 2003 guest corruption problems.
>
> At least Marc says his observed problem goes away with kvm-72's qcow2.
>
>
> Bug 3: Corruption of qcow2 index rendering the file unusable.
>
> Marc Bevand writes:
>
>     I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because
>     of the qcow2 performance regression caused by the default writethrough
>     caching policy) but it randomly triggers an even worse bug: the moment
>     I shut down a guest by typing "quit" in the monitor, it sometimes
>     overwrite the first 4kB of the disk image with mostly NUL bytes (!)
>     which completely destroys it. I am familiar with the qcow2 format and
>     apparently this 4kB block seems to be an L2 table with most entries
>     set to zero. I have had to restore at least 6 or 7 disk images from
>     backup after occurences of that bug. My intuition tells me this may be
>     the qcow2 code trying to allocate a cluster to write a new L2 table,
>     but not noticing the allocation failed (represented by a 0 offset),
>     and writing the L2 table at that 0 offset, overwriting the qcow2
>     header.
>
>     Fortunately this bug is also fixed by running kvm-75 with
>     block-qcow2.c reverted to its kvm-72 version.
>
>     Basically qcow2 in kvm-73 or newer is completely unreliable.
>
>
> Reverting block-qcow2.c to the version in kvm-72 appears to fix the
> corruption symptoms reported by Marc and Jamie, although gerdwachs'
> related bug is against kvm-70 so it may not fix that.
>
> Unfortunately this reverts some optimisations, but fixing corruption
> is more important until the new code is reliable.
>
> This patch reverts block-qcow2.c in kvm-83 to the version in kvm-72,
> except the "cache=writeback" default performance tweak is retained and
> there's no need to define "offsetof".
>
> Signed-Off-By: Jamie Lokier <jamie@shareable.org>
>
>
> --- kvm-83-real/qemu/block-qcow2.c	2009-01-13 13:29:42.000000000 +0000
> +++ kvm-83/qemu/block-qcow2.c	2009-02-13 18:51:12.000000000 +0000
> @@ -52,8 +52,6 @@
>  #define QCOW_CRYPT_NONE 0
>  #define QCOW_CRYPT_AES  1
>  
> -#define QCOW_MAX_CRYPT_CLUSTERS 32
> -
>  /* indicate that the refcount of the referenced cluster is exactly one. */
>  #define QCOW_OFLAG_COPIED     (1LL << 63)
>  /* indicate that the cluster is compressed (they never have the copied flag) */
> @@ -269,8 +267,7 @@
>      if (!s->cluster_cache)
>          goto fail;
>      /* one more sector for decompressed data alignment */
> -    s->cluster_data = qemu_malloc(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size
> -                                  + 512);
> +    s->cluster_data = qemu_malloc(s->cluster_size + 512);
>      if (!s->cluster_data)
>          goto fail;
>      s->cluster_cache_offset = -1;
> @@ -437,7 +434,8 @@
>      int new_l1_size, new_l1_size2, ret, i;
>      uint64_t *new_l1_table;
>      uint64_t new_l1_table_offset;
> -    uint8_t data[12];
> +    uint64_t data64;
> +    uint32_t data32;
>  
>      new_l1_size = s->l1_size;
>      if (min_size <= new_l1_size)
> @@ -467,10 +465,13 @@
>          new_l1_table[i] = be64_to_cpu(new_l1_table[i]);
>  
>      /* set new table */
> -    cpu_to_be32w((uint32_t*)data, new_l1_size);
> -    cpu_to_be64w((uint64_t*)(data + 4), new_l1_table_offset);
> -    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size), data,
> -                sizeof(data)) != sizeof(data))
> +    data64 = cpu_to_be64(new_l1_table_offset);
> +    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_table_offset),
> +                    &data64, sizeof(data64)) != sizeof(data64))
> +        goto fail;
> +    data32 = cpu_to_be32(new_l1_size);
> +    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size),
> +                    &data32, sizeof(data32)) != sizeof(data32))
>          goto fail;
>      qemu_free(s->l1_table);
>      free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t));
> @@ -483,549 +484,169 @@
>      return -EIO;
>  }
>  
> -/*
> - * seek_l2_table
> +/* 'allocate' is:
>   *
> - * seek l2_offset in the l2_cache table
> - * if not found, return NULL,
> - * if found,
> - *   increments the l2 cache hit count of the entry,
> - *   if counter overflow, divide by two all counters
> - *   return the pointer to the l2 cache entry
> + * 0 not to allocate.
>   *
> - */
> -
> -static uint64_t *seek_l2_table(BDRVQcowState *s, uint64_t l2_offset)
> -{
> -    int i, j;
> -
> -    for(i = 0; i < L2_CACHE_SIZE; i++) {
> -        if (l2_offset == s->l2_cache_offsets[i]) {
> -            /* increment the hit count */
> -            if (++s->l2_cache_counts[i] == 0xffffffff) {
> -                for(j = 0; j < L2_CACHE_SIZE; j++) {
> -                    s->l2_cache_counts[j] >>= 1;
> -                }
> -            }
> -            return s->l2_cache + (i << s->l2_bits);
> -        }
> -    }
> -    return NULL;
> -}
> -
> -/*
> - * l2_load
> + * 1 to allocate a normal cluster (for sector indexes 'n_start' to
> + * 'n_end')
>   *
> - * Loads a L2 table into memory. If the table is in the cache, the cache
> - * is used; otherwise the L2 table is loaded from the image file.
> + * 2 to allocate a compressed cluster of size
> + * 'compressed_size'. 'compressed_size' must be > 0 and <
> + * cluster_size
>   *
> - * Returns a pointer to the L2 table on success, or NULL if the read from
> - * the image file failed.
> + * return 0 if not allocated.
>   */
> -
> -static uint64_t *l2_load(BlockDriverState *bs, uint64_t l2_offset)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int min_index;
> -    uint64_t *l2_table;
> -
> -    /* seek if the table for the given offset is in the cache */
> -
> -    l2_table = seek_l2_table(s, l2_offset);
> -    if (l2_table != NULL)
> -        return l2_table;
> -
> -    /* not found: load a new entry in the least used one */
> -
> -    min_index = l2_cache_new_entry(bs);
> -    l2_table = s->l2_cache + (min_index << s->l2_bits);
> -    if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
> -        s->l2_size * sizeof(uint64_t))
> -        return NULL;
> -    s->l2_cache_offsets[min_index] = l2_offset;
> -    s->l2_cache_counts[min_index] = 1;
> -
> -    return l2_table;
> -}
> -
> -/*
> - * l2_allocate
> - *
> - * Allocate a new l2 entry in the file. If l1_index points to an already
> - * used entry in the L2 table (i.e. we are doing a copy on write for the L2
> - * table) copy the contents of the old L2 table into the newly allocated one.
> - * Otherwise the new table is initialized with zeros.
> - *
> - */
> -
> -static uint64_t *l2_allocate(BlockDriverState *bs, int l1_index)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int min_index;
> -    uint64_t old_l2_offset, tmp;
> -    uint64_t *l2_table, l2_offset;
> -
> -    old_l2_offset = s->l1_table[l1_index];
> -
> -    /* allocate a new l2 entry */
> -
> -    l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
> -
> -    /* update the L1 entry */
> -
> -    s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
> -
> -    tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
> -    if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
> -                    &tmp, sizeof(tmp)) != sizeof(tmp))
> -        return NULL;
> -
> -    /* allocate a new entry in the l2 cache */
> -
> -    min_index = l2_cache_new_entry(bs);
> -    l2_table = s->l2_cache + (min_index << s->l2_bits);
> -
> -    if (old_l2_offset == 0) {
> -        /* if there was no old l2 table, clear the new table */
> -        memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
> -    } else {
> -        /* if there was an old l2 table, read it from the disk */
> -        if (bdrv_pread(s->hd, old_l2_offset,
> -                       l2_table, s->l2_size * sizeof(uint64_t)) !=
> -            s->l2_size * sizeof(uint64_t))
> -            return NULL;
> -    }
> -    /* write the l2 table to the file */
> -    if (bdrv_pwrite(s->hd, l2_offset,
> -                    l2_table, s->l2_size * sizeof(uint64_t)) !=
> -        s->l2_size * sizeof(uint64_t))
> -        return NULL;
> -
> -    /* update the l2 cache entry */
> -
> -    s->l2_cache_offsets[min_index] = l2_offset;
> -    s->l2_cache_counts[min_index] = 1;
> -
> -    return l2_table;
> -}
> -
> -static int size_to_clusters(BDRVQcowState *s, int64_t size)
> -{
> -    return (size + (s->cluster_size - 1)) >> s->cluster_bits;
> -}
> -
> -static int count_contiguous_clusters(uint64_t nb_clusters, int cluster_size,
> -        uint64_t *l2_table, uint64_t start, uint64_t mask)
> -{
> -    int i;
> -    uint64_t offset = be64_to_cpu(l2_table[0]) & ~mask;
> -
> -    if (!offset)
> -        return 0;
> -
> -    for (i = start; i < start + nb_clusters; i++)
> -        if (offset + i * cluster_size != (be64_to_cpu(l2_table[i]) & ~mask))
> -            break;
> -
> -	return (i - start);
> -}
> -
> -static int count_contiguous_free_clusters(uint64_t nb_clusters, uint64_t *l2_table)
> -{
> -    int i = 0;
> -
> -    while(nb_clusters-- && l2_table[i] == 0)
> -        i++;
> -
> -    return i;
> -}
> -
> -/*
> - * get_cluster_offset
> - *
> - * For a given offset of the disk image, return cluster offset in
> - * qcow2 file.
> - *
> - * on entry, *num is the number of contiguous clusters we'd like to
> - * access following offset.
> - *
> - * on exit, *num is the number of contiguous clusters we can read.
> - *
> - * Return 1, if the offset is found
> - * Return 0, otherwise.
> - *
> - */
> -
>  static uint64_t get_cluster_offset(BlockDriverState *bs,
> -                                   uint64_t offset, int *num)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int l1_index, l2_index;
> -    uint64_t l2_offset, *l2_table, cluster_offset;
> -    int l1_bits, c;
> -    int index_in_cluster, nb_available, nb_needed, nb_clusters;
> -
> -    index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
> -    nb_needed = *num + index_in_cluster;
> -
> -    l1_bits = s->l2_bits + s->cluster_bits;
> -
> -    /* compute how many bytes there are between the offset and
> -     * the end of the l1 entry
> -     */
> -
> -    nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));
> -
> -    /* compute the number of available sectors */
> -
> -    nb_available = (nb_available >> 9) + index_in_cluster;
> -
> -    cluster_offset = 0;
> -
> -    /* seek the the l2 offset in the l1 table */
> -
> -    l1_index = offset >> l1_bits;
> -    if (l1_index >= s->l1_size)
> -        goto out;
> -
> -    l2_offset = s->l1_table[l1_index];
> -
> -    /* seek the l2 table of the given l2 offset */
> -
> -    if (!l2_offset)
> -        goto out;
> -
> -    /* load the l2 table in memory */
> -
> -    l2_offset &= ~QCOW_OFLAG_COPIED;
> -    l2_table = l2_load(bs, l2_offset);
> -    if (l2_table == NULL)
> -        return 0;
> -
> -    /* find the cluster offset for the given disk offset */
> -
> -    l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> -    cluster_offset = be64_to_cpu(l2_table[l2_index]);
> -    nb_clusters = size_to_clusters(s, nb_needed << 9);
> -
> -    if (!cluster_offset) {
> -        /* how many empty clusters ? */
> -        c = count_contiguous_free_clusters(nb_clusters, &l2_table[l2_index]);
> -    } else {
> -        /* how many allocated clusters ? */
> -        c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> -                &l2_table[l2_index], 0, QCOW_OFLAG_COPIED);
> -    }
> -
> -   nb_available = (c * s->cluster_sectors);
> -out:
> -    if (nb_available > nb_needed)
> -        nb_available = nb_needed;
> -
> -    *num = nb_available - index_in_cluster;
> -
> -    return cluster_offset & ~QCOW_OFLAG_COPIED;
> -}
> -
> -/*
> - * free_any_clusters
> - *
> - * free clusters according to its type: compressed or not
> - *
> - */
> -
> -static void free_any_clusters(BlockDriverState *bs,
> -                              uint64_t cluster_offset, int nb_clusters)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -
> -    /* free the cluster */
> -
> -    if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> -        int nb_csectors;
> -        nb_csectors = ((cluster_offset >> s->csize_shift) &
> -                       s->csize_mask) + 1;
> -        free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> -                      nb_csectors * 512);
> -        return;
> -    }
> -
> -    free_clusters(bs, cluster_offset, nb_clusters << s->cluster_bits);
> -
> -    return;
> -}
> -
> -/*
> - * get_cluster_table
> - *
> - * for a given disk offset, load (and allocate if needed)
> - * the l2 table.
> - *
> - * the l2 table offset in the qcow2 file and the cluster index
> - * in the l2 table are given to the caller.
> - *
> - */
> -
> -static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
> -                             uint64_t **new_l2_table,
> -                             uint64_t *new_l2_offset,
> -                             int *new_l2_index)
> +                                   uint64_t offset, int allocate,
> +                                   int compressed_size,
> +                                   int n_start, int n_end)
>  {
>      BDRVQcowState *s = bs->opaque;
> -    int l1_index, l2_index, ret;
> -    uint64_t l2_offset, *l2_table;
> -
> -    /* seek the the l2 offset in the l1 table */
> +    int min_index, i, j, l1_index, l2_index, ret;
> +    uint64_t l2_offset, *l2_table, cluster_offset, tmp, old_l2_offset;
>  
>      l1_index = offset >> (s->l2_bits + s->cluster_bits);
>      if (l1_index >= s->l1_size) {
> -        ret = grow_l1_table(bs, l1_index + 1);
> -        if (ret < 0)
> +        /* outside l1 table is allowed: we grow the table if needed */
> +        if (!allocate)
> +            return 0;
> +        if (grow_l1_table(bs, l1_index + 1) < 0)
>              return 0;
>      }
>      l2_offset = s->l1_table[l1_index];
> +    if (!l2_offset) {
> +        if (!allocate)
> +            return 0;
> +    l2_allocate:
> +        old_l2_offset = l2_offset;
> +        /* allocate a new l2 entry */
> +        l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
> +        /* update the L1 entry */
> +        s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
> +        tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
> +        if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
> +                        &tmp, sizeof(tmp)) != sizeof(tmp))
> +            return 0;
> +        min_index = l2_cache_new_entry(bs);
> +        l2_table = s->l2_cache + (min_index << s->l2_bits);
>  
> -    /* seek the l2 table of the given l2 offset */
> -
> -    if (l2_offset & QCOW_OFLAG_COPIED) {
> -        /* load the l2 table in memory */
> -        l2_offset &= ~QCOW_OFLAG_COPIED;
> -        l2_table = l2_load(bs, l2_offset);
> -        if (l2_table == NULL)
> +        if (old_l2_offset == 0) {
> +            memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
> +        } else {
> +            if (bdrv_pread(s->hd, old_l2_offset,
> +                           l2_table, s->l2_size * sizeof(uint64_t)) !=
> +                s->l2_size * sizeof(uint64_t))
> +                return 0;
> +        }
> +        if (bdrv_pwrite(s->hd, l2_offset,
> +                        l2_table, s->l2_size * sizeof(uint64_t)) !=
> +            s->l2_size * sizeof(uint64_t))
>              return 0;
>      } else {
> -        if (l2_offset)
> -            free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
> -        l2_table = l2_allocate(bs, l1_index);
> -        if (l2_table == NULL)
> +        if (!(l2_offset & QCOW_OFLAG_COPIED)) {
> +            if (allocate) {
> +                free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
> +                goto l2_allocate;
> +            }
> +        } else {
> +            l2_offset &= ~QCOW_OFLAG_COPIED;
> +        }
> +        for(i = 0; i < L2_CACHE_SIZE; i++) {
> +            if (l2_offset == s->l2_cache_offsets[i]) {
> +                /* increment the hit count */
> +                if (++s->l2_cache_counts[i] == 0xffffffff) {
> +                    for(j = 0; j < L2_CACHE_SIZE; j++) {
> +                        s->l2_cache_counts[j] >>= 1;
> +                    }
> +                }
> +                l2_table = s->l2_cache + (i << s->l2_bits);
> +                goto found;
> +            }
> +        }
> +        /* not found: load a new entry in the least used one */
> +        min_index = l2_cache_new_entry(bs);
> +        l2_table = s->l2_cache + (min_index << s->l2_bits);
> +        if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
> +            s->l2_size * sizeof(uint64_t))
>              return 0;
> -        l2_offset = s->l1_table[l1_index] & ~QCOW_OFLAG_COPIED;
>      }
> -
> -    /* find the cluster offset for the given disk offset */
> -
> +    s->l2_cache_offsets[min_index] = l2_offset;
> +    s->l2_cache_counts[min_index] = 1;
> + found:
>      l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> -
> -    *new_l2_table = l2_table;
> -    *new_l2_offset = l2_offset;
> -    *new_l2_index = l2_index;
> -
> -    return 1;
> -}
> -
> -/*
> - * alloc_compressed_cluster_offset
> - *
> - * For a given offset of the disk image, return cluster offset in
> - * qcow2 file.
> - *
> - * If the offset is not found, allocate a new compressed cluster.
> - *
> - * Return the cluster offset if successful,
> - * Return 0, otherwise.
> - *
> - */
> -
> -static uint64_t alloc_compressed_cluster_offset(BlockDriverState *bs,
> -                                                uint64_t offset,
> -                                                int compressed_size)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int l2_index, ret;
> -    uint64_t l2_offset, *l2_table, cluster_offset;
> -    int nb_csectors;
> -
> -    ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index);
> -    if (ret == 0)
> -        return 0;
> -
>      cluster_offset = be64_to_cpu(l2_table[l2_index]);
> -    if (cluster_offset & QCOW_OFLAG_COPIED)
> -        return cluster_offset & ~QCOW_OFLAG_COPIED;
> -
> -    if (cluster_offset)
> -        free_any_clusters(bs, cluster_offset, 1);
> -
> -    cluster_offset = alloc_bytes(bs, compressed_size);
> -    nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
> -                  (cluster_offset >> 9);
> -
> -    cluster_offset |= QCOW_OFLAG_COMPRESSED |
> -                      ((uint64_t)nb_csectors << s->csize_shift);
> -
> -    /* update L2 table */
> -
> -    /* compressed clusters never have the copied flag */
> -
> -    l2_table[l2_index] = cpu_to_be64(cluster_offset);
> -    if (bdrv_pwrite(s->hd,
> -                    l2_offset + l2_index * sizeof(uint64_t),
> -                    l2_table + l2_index,
> -                    sizeof(uint64_t)) != sizeof(uint64_t))
> -        return 0;
> -
> -    return cluster_offset;
> -}
> -
> -typedef struct QCowL2Meta
> -{
> -    uint64_t offset;
> -    int n_start;
> -    int nb_available;
> -    int nb_clusters;
> -} QCowL2Meta;
> -
> -static int alloc_cluster_link_l2(BlockDriverState *bs, uint64_t cluster_offset,
> -        QCowL2Meta *m)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int i, j = 0, l2_index, ret;
> -    uint64_t *old_cluster, start_sect, l2_offset, *l2_table;
> -
> -    if (m->nb_clusters == 0)
> -        return 0;
> -
> -    if (!(old_cluster = qemu_malloc(m->nb_clusters * sizeof(uint64_t))))
> -        return -ENOMEM;
> -
> -    /* copy content of unmodified sectors */
> -    start_sect = (m->offset & ~(s->cluster_size - 1)) >> 9;
> -    if (m->n_start) {
> -        ret = copy_sectors(bs, start_sect, cluster_offset, 0, m->n_start);
> -        if (ret < 0)
> -            goto err;
> +    if (!cluster_offset) {
> +        if (!allocate)
> +            return cluster_offset;
> +    } else if (!(cluster_offset & QCOW_OFLAG_COPIED)) {
> +        if (!allocate)
> +            return cluster_offset;
> +        /* free the cluster */
> +        if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> +            int nb_csectors;
> +            nb_csectors = ((cluster_offset >> s->csize_shift) &
> +                           s->csize_mask) + 1;
> +            free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> +                          nb_csectors * 512);
> +        } else {
> +            free_clusters(bs, cluster_offset, s->cluster_size);
> +        }
> +    } else {
> +        cluster_offset &= ~QCOW_OFLAG_COPIED;
> +        return cluster_offset;
>      }
> -
> -    if (m->nb_available & (s->cluster_sectors - 1)) {
> -        uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - 1);
> -        ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << 9),
> -                m->nb_available - end, s->cluster_sectors);
> -        if (ret < 0)
> -            goto err;
> +    if (allocate == 1) {
> +        /* allocate a new cluster */
> +        cluster_offset = alloc_clusters(bs, s->cluster_size);
> +
> +        /* we must initialize the cluster content which won't be
> +           written */
> +        if ((n_end - n_start) < s->cluster_sectors) {
> +            uint64_t start_sect;
> +
> +            start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> +            ret = copy_sectors(bs, start_sect,
> +                               cluster_offset, 0, n_start);
> +            if (ret < 0)
> +                return 0;
> +            ret = copy_sectors(bs, start_sect,
> +                               cluster_offset, n_end, s->cluster_sectors);
> +            if (ret < 0)
> +                return 0;
> +        }
> +        tmp = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
> +    } else {
> +        int nb_csectors;
> +        cluster_offset = alloc_bytes(bs, compressed_size);
> +        nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
> +            (cluster_offset >> 9);
> +        cluster_offset |= QCOW_OFLAG_COMPRESSED |
> +            ((uint64_t)nb_csectors << s->csize_shift);
> +        /* compressed clusters never have the copied flag */
> +        tmp = cpu_to_be64(cluster_offset);
>      }
> -
> -    ret = -EIO;
>      /* update L2 table */
> -    if (!get_cluster_table(bs, m->offset, &l2_table, &l2_offset, &l2_index))
> -        goto err;
> -
> -    for (i = 0; i < m->nb_clusters; i++) {
> -        if(l2_table[l2_index + i] != 0)
> -            old_cluster[j++] = l2_table[l2_index + i];
> -
> -        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
> -                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
> -     }
> -
> -    if (bdrv_pwrite(s->hd, l2_offset + l2_index * sizeof(uint64_t),
> -                l2_table + l2_index, m->nb_clusters * sizeof(uint64_t)) !=
> -            m->nb_clusters * sizeof(uint64_t))
> -        goto err;
> -
> -    for (i = 0; i < j; i++)
> -        free_any_clusters(bs, old_cluster[i], 1);
> -
> -    ret = 0;
> -err:
> -    qemu_free(old_cluster);
> -    return ret;
> - }
> -
> -/*
> - * alloc_cluster_offset
> - *
> - * For a given offset of the disk image, return cluster offset in
> - * qcow2 file.
> - *
> - * If the offset is not found, allocate a new cluster.
> - *
> - * Return the cluster offset if successful,
> - * Return 0, otherwise.
> - *
> - */
> -
> -static uint64_t alloc_cluster_offset(BlockDriverState *bs,
> -                                     uint64_t offset,
> -                                     int n_start, int n_end,
> -                                     int *num, QCowL2Meta *m)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int l2_index, ret;
> -    uint64_t l2_offset, *l2_table, cluster_offset;
> -    int nb_clusters, i = 0;
> -
> -    ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index);
> -    if (ret == 0)
> +    l2_table[l2_index] = tmp;
> +    if (bdrv_pwrite(s->hd,
> +                    l2_offset + l2_index * sizeof(tmp), &tmp, sizeof(tmp)) != sizeof(tmp))
>          return 0;
> -
> -    nb_clusters = size_to_clusters(s, n_end << 9);
> -
> -    nb_clusters = MIN(nb_clusters, s->l2_size - l2_index);
> -
> -    cluster_offset = be64_to_cpu(l2_table[l2_index]);
> -
> -    /* We keep all QCOW_OFLAG_COPIED clusters */
> -
> -    if (cluster_offset & QCOW_OFLAG_COPIED) {
> -        nb_clusters = count_contiguous_clusters(nb_clusters, s->cluster_size,
> -                &l2_table[l2_index], 0, 0);
> -
> -        cluster_offset &= ~QCOW_OFLAG_COPIED;
> -        m->nb_clusters = 0;
> -
> -        goto out;
> -    }
> -
> -    /* for the moment, multiple compressed clusters are not managed */
> -
> -    if (cluster_offset & QCOW_OFLAG_COMPRESSED)
> -        nb_clusters = 1;
> -
> -    /* how many available clusters ? */
> -
> -    while (i < nb_clusters) {
> -        i += count_contiguous_clusters(nb_clusters - i, s->cluster_size,
> -                &l2_table[l2_index], i, 0);
> -
> -        if(be64_to_cpu(l2_table[l2_index + i]))
> -            break;
> -
> -        i += count_contiguous_free_clusters(nb_clusters - i,
> -                &l2_table[l2_index + i]);
> -
> -        cluster_offset = be64_to_cpu(l2_table[l2_index + i]);
> -
> -        if ((cluster_offset & QCOW_OFLAG_COPIED) ||
> -                (cluster_offset & QCOW_OFLAG_COMPRESSED))
> -            break;
> -    }
> -    nb_clusters = i;
> -
> -    /* allocate a new cluster */
> -
> -    cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size);
> -
> -    /* save info needed for meta data update */
> -    m->offset = offset;
> -    m->n_start = n_start;
> -    m->nb_clusters = nb_clusters;
> -
> -out:
> -    m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end);
> -
> -    *num = m->nb_available - n_start;
> -
>      return cluster_offset;
>  }
>  
>  static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
>                               int nb_sectors, int *pnum)
>  {
> +    BDRVQcowState *s = bs->opaque;
> +    int index_in_cluster, n;
>      uint64_t cluster_offset;
>  
> -    *pnum = nb_sectors;
> -    cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum);
> -
> +    cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
> +    index_in_cluster = sector_num & (s->cluster_sectors - 1);
> +    n = s->cluster_sectors - index_in_cluster;
> +    if (n > nb_sectors)
> +        n = nb_sectors;
> +    *pnum = n;
>      return (cluster_offset != 0);
>  }
>  
> @@ -1102,9 +723,11 @@
>      uint64_t cluster_offset;
>  
>      while (nb_sectors > 0) {
> -        n = nb_sectors;
> -        cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);
> +        cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
>          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> +        n = s->cluster_sectors - index_in_cluster;
> +        if (n > nb_sectors)
> +            n = nb_sectors;
>          if (!cluster_offset) {
>              if (bs->backing_hd) {
>                  /* read from the base image */
> @@ -1143,18 +766,15 @@
>      BDRVQcowState *s = bs->opaque;
>      int ret, index_in_cluster, n;
>      uint64_t cluster_offset;
> -    int n_end;
> -    QCowL2Meta l2meta;
>  
>      while (nb_sectors > 0) {
>          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> -        n_end = index_in_cluster + nb_sectors;
> -        if (s->crypt_method &&
> -            n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
> -            n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
> -        cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
> -                                              index_in_cluster,
> -                                              n_end, &n, &l2meta);
> +        n = s->cluster_sectors - index_in_cluster;
> +        if (n > nb_sectors)
> +            n = nb_sectors;
> +        cluster_offset = get_cluster_offset(bs, sector_num << 9, 1, 0,
> +                                            index_in_cluster,
> +                                            index_in_cluster + n);
>          if (!cluster_offset)
>              return -1;
>          if (s->crypt_method) {
> @@ -1165,10 +785,8 @@
>          } else {
>              ret = bdrv_pwrite(s->hd, cluster_offset + index_in_cluster * 512, buf, n * 512);
>          }
> -        if (ret != n * 512 || alloc_cluster_link_l2(bs, cluster_offset, &l2meta) < 0) {
> -            free_any_clusters(bs, cluster_offset, l2meta.nb_clusters);
> +        if (ret != n * 512)
>              return -1;
> -        }
>          nb_sectors -= n;
>          sector_num += n;
>          buf += n * 512;
> @@ -1186,33 +804,8 @@
>      uint64_t cluster_offset;
>      uint8_t *cluster_data;
>      BlockDriverAIOCB *hd_aiocb;
> -    QEMUBH *bh;
> -    QCowL2Meta l2meta;
>  } QCowAIOCB;
>  
> -static void qcow_aio_read_cb(void *opaque, int ret);
> -static void qcow_aio_read_bh(void *opaque)
> -{
> -    QCowAIOCB *acb = opaque;
> -    qemu_bh_delete(acb->bh);
> -    acb->bh = NULL;
> -    qcow_aio_read_cb(opaque, 0);
> -}
> -
> -static int qcow_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
> -{
> -    if (acb->bh)
> -        return -EIO;
> -
> -    acb->bh = qemu_bh_new(cb, acb);
> -    if (!acb->bh)
> -        return -EIO;
> -
> -    qemu_bh_schedule(acb->bh);
> -
> -    return 0;
> -}
> -
>  static void qcow_aio_read_cb(void *opaque, int ret)
>  {
>      QCowAIOCB *acb = opaque;
> @@ -1222,12 +815,13 @@
>  
>      acb->hd_aiocb = NULL;
>      if (ret < 0) {
> -fail:
> +    fail:
>          acb->common.cb(acb->common.opaque, ret);
>          qemu_aio_release(acb);
>          return;
>      }
>  
> + redo:
>      /* post process the read buffer */
>      if (!acb->cluster_offset) {
>          /* nothing to do */
> @@ -1253,9 +847,12 @@
>      }
>  
>      /* prepare next AIO request */
> -    acb->n = acb->nb_sectors;
> -    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, &acb->n);
> +    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9,
> +                                             0, 0, 0, 0);
>      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> +    acb->n = s->cluster_sectors - index_in_cluster;
> +    if (acb->n > acb->nb_sectors)
> +        acb->n = acb->nb_sectors;
>  
>      if (!acb->cluster_offset) {
>          if (bs->backing_hd) {
> @@ -1268,16 +865,12 @@
>                  if (acb->hd_aiocb == NULL)
>                      goto fail;
>              } else {
> -                ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
> -                if (ret < 0)
> -                    goto fail;
> +                goto redo;
>              }
>          } else {
>              /* Note: in this case, no need to wait */
>              memset(acb->buf, 0, 512 * acb->n);
> -            ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
> -            if (ret < 0)
> -                goto fail;
> +            goto redo;
>          }
>      } else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) {
>          /* add AIO support for compressed blocks ? */
> @@ -1285,9 +878,7 @@
>              goto fail;
>          memcpy(acb->buf,
>                 s->cluster_cache + index_in_cluster * 512, 512 * acb->n);
> -        ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
> -        if (ret < 0)
> -            goto fail;
> +        goto redo;
>      } else {
>          if ((acb->cluster_offset & 511) != 0) {
>              ret = -EIO;
> @@ -1316,7 +907,6 @@
>      acb->nb_sectors = nb_sectors;
>      acb->n = 0;
>      acb->cluster_offset = 0;
> -    acb->l2meta.nb_clusters = 0;
>      return acb;
>  }
>  
> @@ -1340,8 +930,8 @@
>      BlockDriverState *bs = acb->common.bs;
>      BDRVQcowState *s = bs->opaque;
>      int index_in_cluster;
> +    uint64_t cluster_offset;
>      const uint8_t *src_buf;
> -    int n_end;
>  
>      acb->hd_aiocb = NULL;
>  
> @@ -1352,11 +942,6 @@
>          return;
>      }
>  
> -    if (alloc_cluster_link_l2(bs, acb->cluster_offset, &acb->l2meta) < 0) {
> -        free_any_clusters(bs, acb->cluster_offset, acb->l2meta.nb_clusters);
> -        goto fail;
> -    }
> -
>      acb->nb_sectors -= acb->n;
>      acb->sector_num += acb->n;
>      acb->buf += acb->n * 512;
> @@ -1369,22 +954,19 @@
>      }
>  
>      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> -    n_end = index_in_cluster + acb->nb_sectors;
> -    if (s->crypt_method &&
> -        n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
> -        n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
> -
> -    acb->cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
> -                                          index_in_cluster,
> -                                          n_end, &acb->n, &acb->l2meta);
> -    if (!acb->cluster_offset || (acb->cluster_offset & 511) != 0) {
> +    acb->n = s->cluster_sectors - index_in_cluster;
> +    if (acb->n > acb->nb_sectors)
> +        acb->n = acb->nb_sectors;
> +    cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, 1, 0,
> +                                        index_in_cluster,
> +                                        index_in_cluster + acb->n);
> +    if (!cluster_offset || (cluster_offset & 511) != 0) {
>          ret = -EIO;
>          goto fail;
>      }
>      if (s->crypt_method) {
>          if (!acb->cluster_data) {
> -            acb->cluster_data = qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS *
> -                                             s->cluster_size);
> +            acb->cluster_data = qemu_mallocz(s->cluster_size);
>              if (!acb->cluster_data) {
>                  ret = -ENOMEM;
>                  goto fail;
> @@ -1397,7 +979,7 @@
>          src_buf = acb->buf;
>      }
>      acb->hd_aiocb = bdrv_aio_write(s->hd,
> -                                   (acb->cluster_offset >> 9) + index_in_cluster,
> +                                   (cluster_offset >> 9) + index_in_cluster,
>                                     src_buf, acb->n,
>                                     qcow_aio_write_cb, acb);
>      if (acb->hd_aiocb == NULL)
> @@ -1571,7 +1153,7 @@
>  
>      memset(s->l1_table, 0, l1_length);
>      if (bdrv_pwrite(s->hd, s->l1_table_offset, s->l1_table, l1_length) < 0)
> -        return -1;
> +	return -1;
>      ret = bdrv_truncate(s->hd, s->l1_table_offset + l1_length);
>      if (ret < 0)
>          return ret;
> @@ -1637,10 +1219,8 @@
>          /* could not compress: write normal cluster */
>          qcow_write(bs, sector_num, buf, s->cluster_sectors);
>      } else {
> -        cluster_offset = alloc_compressed_cluster_offset(bs, sector_num << 9,
> -                                              out_len);
> -        if (!cluster_offset)
> -            return -1;
> +        cluster_offset = get_cluster_offset(bs, sector_num << 9, 2,
> +                                            out_len, 0, 0);
>          cluster_offset &= s->cluster_offset_mask;
>          if (bdrv_pwrite(s->hd, cluster_offset, out_buf, out_len) != out_len) {
>              qemu_free(out_buf);
> @@ -2225,19 +1805,26 @@
>      BDRVQcowState *s = bs->opaque;
>      int i, nb_clusters;
>  
> -    nb_clusters = size_to_clusters(s, size);
> -retry:
> -    for(i = 0; i < nb_clusters; i++) {
> -        int64_t i = s->free_cluster_index++;
> -        if (get_refcount(bs, i) != 0)
> -            goto retry;
> -    }
> +    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
> +    for(;;) {
> +        if (get_refcount(bs, s->free_cluster_index) == 0) {
> +            s->free_cluster_index++;
> +            for(i = 1; i < nb_clusters; i++) {
> +                if (get_refcount(bs, s->free_cluster_index) != 0)
> +                    goto not_found;
> +                s->free_cluster_index++;
> +            }
>  #ifdef DEBUG_ALLOC2
> -    printf("alloc_clusters: size=%lld -> %lld\n",
> -            size,
> -            (s->free_cluster_index - nb_clusters) << s->cluster_bits);
> +            printf("alloc_clusters: size=%lld -> %lld\n",
> +                   size,
> +                   (s->free_cluster_index - nb_clusters) << s->cluster_bits);
>  #endif
> -    return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
> +            return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
> +        } else {
> +        not_found:
> +            s->free_cluster_index++;
> +        }
> +    }
>  }
>  
>  static int64_t alloc_clusters(BlockDriverState *bs, int64_t size)
> @@ -2301,7 +1888,8 @@
>      int new_table_size, new_table_size2, refcount_table_clusters, i, ret;
>      uint64_t *new_table;
>      int64_t table_offset;
> -    uint8_t data[12];
> +    uint64_t data64;
> +    uint32_t data32;
>      int old_table_size;
>      int64_t old_table_offset;
>  
> @@ -2340,10 +1928,13 @@
>      for(i = 0; i < s->refcount_table_size; i++)
>          be64_to_cpus(&new_table[i]);
>  
> -    cpu_to_be64w((uint64_t*)data, table_offset);
> -    cpu_to_be32w((uint32_t*)(data + 8), refcount_table_clusters);
> +    data64 = cpu_to_be64(table_offset);
>      if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_offset),
> -                    data, sizeof(data)) != sizeof(data))
> +                    &data64, sizeof(data64)) != sizeof(data64))
> +        goto fail;
> +    data32 = cpu_to_be32(refcount_table_clusters);
> +    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_clusters),
> +                    &data32, sizeof(data32)) != sizeof(data32))
>          goto fail;
>      qemu_free(s->refcount_table);
>      old_table_offset = s->refcount_table_offset;
> @@ -2572,7 +2163,7 @@
>      uint16_t *refcount_table;
>  
>      size = bdrv_getlength(s->hd);
> -    nb_clusters = size_to_clusters(s, size);
> +    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
>      refcount_table = qemu_mallocz(nb_clusters * sizeof(uint16_t));
>  
>      /* header */
> @@ -2624,7 +2215,7 @@
>      int refcount;
>  
>      size = bdrv_getlength(s->hd);
> -    nb_clusters = size_to_clusters(s, size);
> +    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
>      for(k = 0; k < nb_clusters;) {
>          k1 = k;
>          refcount = get_refcount(bs, k);
>
>
>   


[-- Attachment #2: Type: text/html, Size: 43068 bytes --]

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-13 19:04       ` [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports Jamie Lokier
  2009-02-14 22:23         ` Dor Laor
@ 2009-02-14 23:13         ` Anthony Liguori
  2009-02-15  2:01           ` Jamie Lokier
  1 sibling, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2009-02-14 23:13 UTC (permalink / raw)
  To: qemu-devel

Jamie Lokier wrote:
> Anthony Liguori wrote:
>   
>>> Simply reverting the qcow2 code appears to fix those problems, so it
>>> needn't hold up cutting a release.  That's what I recommend.
>>>       
>> Send some patches.
>>     
>
> I did already.
>
> Here it is again.  This should fix my bug and Marc's bug according to
> his report that reverting qcow2.c fixes it.
>   

Well such a large reversion is a bad idea.  Can you git bisect to the 
actual changeset that introduced the bug you see?

You're effectively reverting a very large number of changes whereas only 
one is likely causing your problem

Regards,

Anthony Liguori

> -- Jamie
>
>
> Subject: Revert block-qcow2.c to kvm-72 version due to corruption reports
>
> This fixes two kinds of qcow2 corruption observed in kvm-83 (actually
> kvm-73 and later), from three bug reports.
>
>
> Bug 1: Windows 2000 guests complain of corrupt registry.
>
> Many Windows 2000 guests which boot and runs fine in kvm-72, fail with
> a blue-screen indicating file corruption errors in kvm-73 through to
> kvm-83 (the latest), and succeed if we replace block-qcow2.c with the
> version from kvm-72.
>
> The blue screen appears towards the end of the boot sequence, and
> shows only briefly before rebooting.  It says:
>
>     STOP: c0000218 (Registry File Failure)
>     The registry cannot load the hive (file):
>     \SystemRoot\System32\Config\SOFTWARE
>     or its log or alternate.
>     It is corrupt, absent, or not writable.
>
>     Beginning dump of physical memory
>     Physical memory dump complete. Contact your system administrator or
>     technical support [...?]
>
> This is narrowed down to the difference in block-qcow2.c between
> kvm-72 and kvm-73 (not -83).  From kvm-73 to kvm-83, there have been
> more changes block-qcow2.c, but the observed corruption still occurs.
>
> The bug isn't evident when only reading.  When using "qemu-img
> convert" to convert a qcow2 file to a raw file, with broken and fixed
> versions of block-qcow2.c it produces the same raw file.  Also, when
> using "-snapshot" with qemu, the blue screen doesn't occur.
>
> This bug was observed by Jamie Lokier <jamie@shareable.org> and 
> confirmed for multiple Windows 2000 guests by
> Marc Bevand <m.bevand@gmail.com>.
>
>
> Bug 2: Windows 2003 guests complain of corrupt registry.
>
> According to
> http://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599
>
> Windows 2003 32-bit guests randomly spew disk corruption messages
> like this:
>
>     Windows – Registry Hive Recovered
>     Registry hive (file): SOFTWARE was corrupted and it has
>     been recovered. Some data might have been lost.
>
> and
>
>     The system cannot log on due to the following error:
>     Unable to complete the requested operation because of
>     either a catastrophic media failure or a data structure
>     corruption on the disk.
>
> This bug was reported by <gerdwachs@users.sourceforge.net> and
> confirmed by Marc Bevand, noting:
>
>     kvm-73+ also causes some of my Windows 2003 guests to exhibit this
>     exact registry corruption error.  [...]  This bug is also fixed by
>     reverting block-qcow2.c to the version from kvm-72.
>
> Worryingly, gerdwachs' bug report says it's for kvm-70, implying this
> patch may not fix all the Windows 2003 guest corruption problems.
>
> At least Marc says his observed problem goes away with kvm-72's qcow2.
>
>
> Bug 3: Corruption of qcow2 index rendering the file unusable.
>
> Marc Bevand writes:
>
>     I tested kvm-81 and kvm-83 as well (can't test kvm-80 or older because
>     of the qcow2 performance regression caused by the default writethrough
>     caching policy) but it randomly triggers an even worse bug: the moment
>     I shut down a guest by typing "quit" in the monitor, it sometimes
>     overwrite the first 4kB of the disk image with mostly NUL bytes (!)
>     which completely destroys it. I am familiar with the qcow2 format and
>     apparently this 4kB block seems to be an L2 table with most entries
>     set to zero. I have had to restore at least 6 or 7 disk images from
>     backup after occurences of that bug. My intuition tells me this may be
>     the qcow2 code trying to allocate a cluster to write a new L2 table,
>     but not noticing the allocation failed (represented by a 0 offset),
>     and writing the L2 table at that 0 offset, overwriting the qcow2
>     header.
>
>     Fortunately this bug is also fixed by running kvm-75 with
>     block-qcow2.c reverted to its kvm-72 version.
>
>     Basically qcow2 in kvm-73 or newer is completely unreliable.
>
>
> Reverting block-qcow2.c to the version in kvm-72 appears to fix the
> corruption symptoms reported by Marc and Jamie, although gerdwachs'
> related bug is against kvm-70 so it may not fix that.
>
> Unfortunately this reverts some optimisations, but fixing corruption
> is more important until the new code is reliable.
>
> This patch reverts block-qcow2.c in kvm-83 to the version in kvm-72,
> except the "cache=writeback" default performance tweak is retained and
> there's no need to define "offsetof".
>
> Signed-Off-By: Jamie Lokier <jamie@shareable.org>
>
>
> --- kvm-83-real/qemu/block-qcow2.c	2009-01-13 13:29:42.000000000 +0000
> +++ kvm-83/qemu/block-qcow2.c	2009-02-13 18:51:12.000000000 +0000
> @@ -52,8 +52,6 @@
>  #define QCOW_CRYPT_NONE 0
>  #define QCOW_CRYPT_AES  1
>  
> -#define QCOW_MAX_CRYPT_CLUSTERS 32
> -
>  /* indicate that the refcount of the referenced cluster is exactly one. */
>  #define QCOW_OFLAG_COPIED     (1LL << 63)
>  /* indicate that the cluster is compressed (they never have the copied flag) */
> @@ -269,8 +267,7 @@
>      if (!s->cluster_cache)
>          goto fail;
>      /* one more sector for decompressed data alignment */
> -    s->cluster_data = qemu_malloc(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size
> -                                  + 512);
> +    s->cluster_data = qemu_malloc(s->cluster_size + 512);
>      if (!s->cluster_data)
>          goto fail;
>      s->cluster_cache_offset = -1;
> @@ -437,7 +434,8 @@
>      int new_l1_size, new_l1_size2, ret, i;
>      uint64_t *new_l1_table;
>      uint64_t new_l1_table_offset;
> -    uint8_t data[12];
> +    uint64_t data64;
> +    uint32_t data32;
>  
>      new_l1_size = s->l1_size;
>      if (min_size <= new_l1_size)
> @@ -467,10 +465,13 @@
>          new_l1_table[i] = be64_to_cpu(new_l1_table[i]);
>  
>      /* set new table */
> -    cpu_to_be32w((uint32_t*)data, new_l1_size);
> -    cpu_to_be64w((uint64_t*)(data + 4), new_l1_table_offset);
> -    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size), data,
> -                sizeof(data)) != sizeof(data))
> +    data64 = cpu_to_be64(new_l1_table_offset);
> +    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_table_offset),
> +                    &data64, sizeof(data64)) != sizeof(data64))
> +        goto fail;
> +    data32 = cpu_to_be32(new_l1_size);
> +    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, l1_size),
> +                    &data32, sizeof(data32)) != sizeof(data32))
>          goto fail;
>      qemu_free(s->l1_table);
>      free_clusters(bs, s->l1_table_offset, s->l1_size * sizeof(uint64_t));
> @@ -483,549 +484,169 @@
>      return -EIO;
>  }
>  
> -/*
> - * seek_l2_table
> +/* 'allocate' is:
>   *
> - * seek l2_offset in the l2_cache table
> - * if not found, return NULL,
> - * if found,
> - *   increments the l2 cache hit count of the entry,
> - *   if counter overflow, divide by two all counters
> - *   return the pointer to the l2 cache entry
> + * 0 not to allocate.
>   *
> - */
> -
> -static uint64_t *seek_l2_table(BDRVQcowState *s, uint64_t l2_offset)
> -{
> -    int i, j;
> -
> -    for(i = 0; i < L2_CACHE_SIZE; i++) {
> -        if (l2_offset == s->l2_cache_offsets[i]) {
> -            /* increment the hit count */
> -            if (++s->l2_cache_counts[i] == 0xffffffff) {
> -                for(j = 0; j < L2_CACHE_SIZE; j++) {
> -                    s->l2_cache_counts[j] >>= 1;
> -                }
> -            }
> -            return s->l2_cache + (i << s->l2_bits);
> -        }
> -    }
> -    return NULL;
> -}
> -
> -/*
> - * l2_load
> + * 1 to allocate a normal cluster (for sector indexes 'n_start' to
> + * 'n_end')
>   *
> - * Loads a L2 table into memory. If the table is in the cache, the cache
> - * is used; otherwise the L2 table is loaded from the image file.
> + * 2 to allocate a compressed cluster of size
> + * 'compressed_size'. 'compressed_size' must be > 0 and <
> + * cluster_size
>   *
> - * Returns a pointer to the L2 table on success, or NULL if the read from
> - * the image file failed.
> + * return 0 if not allocated.
>   */
> -
> -static uint64_t *l2_load(BlockDriverState *bs, uint64_t l2_offset)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int min_index;
> -    uint64_t *l2_table;
> -
> -    /* seek if the table for the given offset is in the cache */
> -
> -    l2_table = seek_l2_table(s, l2_offset);
> -    if (l2_table != NULL)
> -        return l2_table;
> -
> -    /* not found: load a new entry in the least used one */
> -
> -    min_index = l2_cache_new_entry(bs);
> -    l2_table = s->l2_cache + (min_index << s->l2_bits);
> -    if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
> -        s->l2_size * sizeof(uint64_t))
> -        return NULL;
> -    s->l2_cache_offsets[min_index] = l2_offset;
> -    s->l2_cache_counts[min_index] = 1;
> -
> -    return l2_table;
> -}
> -
> -/*
> - * l2_allocate
> - *
> - * Allocate a new l2 entry in the file. If l1_index points to an already
> - * used entry in the L2 table (i.e. we are doing a copy on write for the L2
> - * table) copy the contents of the old L2 table into the newly allocated one.
> - * Otherwise the new table is initialized with zeros.
> - *
> - */
> -
> -static uint64_t *l2_allocate(BlockDriverState *bs, int l1_index)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int min_index;
> -    uint64_t old_l2_offset, tmp;
> -    uint64_t *l2_table, l2_offset;
> -
> -    old_l2_offset = s->l1_table[l1_index];
> -
> -    /* allocate a new l2 entry */
> -
> -    l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
> -
> -    /* update the L1 entry */
> -
> -    s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
> -
> -    tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
> -    if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
> -                    &tmp, sizeof(tmp)) != sizeof(tmp))
> -        return NULL;
> -
> -    /* allocate a new entry in the l2 cache */
> -
> -    min_index = l2_cache_new_entry(bs);
> -    l2_table = s->l2_cache + (min_index << s->l2_bits);
> -
> -    if (old_l2_offset == 0) {
> -        /* if there was no old l2 table, clear the new table */
> -        memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
> -    } else {
> -        /* if there was an old l2 table, read it from the disk */
> -        if (bdrv_pread(s->hd, old_l2_offset,
> -                       l2_table, s->l2_size * sizeof(uint64_t)) !=
> -            s->l2_size * sizeof(uint64_t))
> -            return NULL;
> -    }
> -    /* write the l2 table to the file */
> -    if (bdrv_pwrite(s->hd, l2_offset,
> -                    l2_table, s->l2_size * sizeof(uint64_t)) !=
> -        s->l2_size * sizeof(uint64_t))
> -        return NULL;
> -
> -    /* update the l2 cache entry */
> -
> -    s->l2_cache_offsets[min_index] = l2_offset;
> -    s->l2_cache_counts[min_index] = 1;
> -
> -    return l2_table;
> -}
> -
> -static int size_to_clusters(BDRVQcowState *s, int64_t size)
> -{
> -    return (size + (s->cluster_size - 1)) >> s->cluster_bits;
> -}
> -
> -static int count_contiguous_clusters(uint64_t nb_clusters, int cluster_size,
> -        uint64_t *l2_table, uint64_t start, uint64_t mask)
> -{
> -    int i;
> -    uint64_t offset = be64_to_cpu(l2_table[0]) & ~mask;
> -
> -    if (!offset)
> -        return 0;
> -
> -    for (i = start; i < start + nb_clusters; i++)
> -        if (offset + i * cluster_size != (be64_to_cpu(l2_table[i]) & ~mask))
> -            break;
> -
> -	return (i - start);
> -}
> -
> -static int count_contiguous_free_clusters(uint64_t nb_clusters, uint64_t *l2_table)
> -{
> -    int i = 0;
> -
> -    while(nb_clusters-- && l2_table[i] == 0)
> -        i++;
> -
> -    return i;
> -}
> -
> -/*
> - * get_cluster_offset
> - *
> - * For a given offset of the disk image, return cluster offset in
> - * qcow2 file.
> - *
> - * on entry, *num is the number of contiguous clusters we'd like to
> - * access following offset.
> - *
> - * on exit, *num is the number of contiguous clusters we can read.
> - *
> - * Return 1, if the offset is found
> - * Return 0, otherwise.
> - *
> - */
> -
>  static uint64_t get_cluster_offset(BlockDriverState *bs,
> -                                   uint64_t offset, int *num)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int l1_index, l2_index;
> -    uint64_t l2_offset, *l2_table, cluster_offset;
> -    int l1_bits, c;
> -    int index_in_cluster, nb_available, nb_needed, nb_clusters;
> -
> -    index_in_cluster = (offset >> 9) & (s->cluster_sectors - 1);
> -    nb_needed = *num + index_in_cluster;
> -
> -    l1_bits = s->l2_bits + s->cluster_bits;
> -
> -    /* compute how many bytes there are between the offset and
> -     * the end of the l1 entry
> -     */
> -
> -    nb_available = (1 << l1_bits) - (offset & ((1 << l1_bits) - 1));
> -
> -    /* compute the number of available sectors */
> -
> -    nb_available = (nb_available >> 9) + index_in_cluster;
> -
> -    cluster_offset = 0;
> -
> -    /* seek the the l2 offset in the l1 table */
> -
> -    l1_index = offset >> l1_bits;
> -    if (l1_index >= s->l1_size)
> -        goto out;
> -
> -    l2_offset = s->l1_table[l1_index];
> -
> -    /* seek the l2 table of the given l2 offset */
> -
> -    if (!l2_offset)
> -        goto out;
> -
> -    /* load the l2 table in memory */
> -
> -    l2_offset &= ~QCOW_OFLAG_COPIED;
> -    l2_table = l2_load(bs, l2_offset);
> -    if (l2_table == NULL)
> -        return 0;
> -
> -    /* find the cluster offset for the given disk offset */
> -
> -    l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> -    cluster_offset = be64_to_cpu(l2_table[l2_index]);
> -    nb_clusters = size_to_clusters(s, nb_needed << 9);
> -
> -    if (!cluster_offset) {
> -        /* how many empty clusters ? */
> -        c = count_contiguous_free_clusters(nb_clusters, &l2_table[l2_index]);
> -    } else {
> -        /* how many allocated clusters ? */
> -        c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> -                &l2_table[l2_index], 0, QCOW_OFLAG_COPIED);
> -    }
> -
> -   nb_available = (c * s->cluster_sectors);
> -out:
> -    if (nb_available > nb_needed)
> -        nb_available = nb_needed;
> -
> -    *num = nb_available - index_in_cluster;
> -
> -    return cluster_offset & ~QCOW_OFLAG_COPIED;
> -}
> -
> -/*
> - * free_any_clusters
> - *
> - * free clusters according to its type: compressed or not
> - *
> - */
> -
> -static void free_any_clusters(BlockDriverState *bs,
> -                              uint64_t cluster_offset, int nb_clusters)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -
> -    /* free the cluster */
> -
> -    if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> -        int nb_csectors;
> -        nb_csectors = ((cluster_offset >> s->csize_shift) &
> -                       s->csize_mask) + 1;
> -        free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> -                      nb_csectors * 512);
> -        return;
> -    }
> -
> -    free_clusters(bs, cluster_offset, nb_clusters << s->cluster_bits);
> -
> -    return;
> -}
> -
> -/*
> - * get_cluster_table
> - *
> - * for a given disk offset, load (and allocate if needed)
> - * the l2 table.
> - *
> - * the l2 table offset in the qcow2 file and the cluster index
> - * in the l2 table are given to the caller.
> - *
> - */
> -
> -static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
> -                             uint64_t **new_l2_table,
> -                             uint64_t *new_l2_offset,
> -                             int *new_l2_index)
> +                                   uint64_t offset, int allocate,
> +                                   int compressed_size,
> +                                   int n_start, int n_end)
>  {
>      BDRVQcowState *s = bs->opaque;
> -    int l1_index, l2_index, ret;
> -    uint64_t l2_offset, *l2_table;
> -
> -    /* seek the the l2 offset in the l1 table */
> +    int min_index, i, j, l1_index, l2_index, ret;
> +    uint64_t l2_offset, *l2_table, cluster_offset, tmp, old_l2_offset;
>  
>      l1_index = offset >> (s->l2_bits + s->cluster_bits);
>      if (l1_index >= s->l1_size) {
> -        ret = grow_l1_table(bs, l1_index + 1);
> -        if (ret < 0)
> +        /* outside l1 table is allowed: we grow the table if needed */
> +        if (!allocate)
> +            return 0;
> +        if (grow_l1_table(bs, l1_index + 1) < 0)
>              return 0;
>      }
>      l2_offset = s->l1_table[l1_index];
> +    if (!l2_offset) {
> +        if (!allocate)
> +            return 0;
> +    l2_allocate:
> +        old_l2_offset = l2_offset;
> +        /* allocate a new l2 entry */
> +        l2_offset = alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
> +        /* update the L1 entry */
> +        s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
> +        tmp = cpu_to_be64(l2_offset | QCOW_OFLAG_COPIED);
> +        if (bdrv_pwrite(s->hd, s->l1_table_offset + l1_index * sizeof(tmp),
> +                        &tmp, sizeof(tmp)) != sizeof(tmp))
> +            return 0;
> +        min_index = l2_cache_new_entry(bs);
> +        l2_table = s->l2_cache + (min_index << s->l2_bits);
>  
> -    /* seek the l2 table of the given l2 offset */
> -
> -    if (l2_offset & QCOW_OFLAG_COPIED) {
> -        /* load the l2 table in memory */
> -        l2_offset &= ~QCOW_OFLAG_COPIED;
> -        l2_table = l2_load(bs, l2_offset);
> -        if (l2_table == NULL)
> +        if (old_l2_offset == 0) {
> +            memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
> +        } else {
> +            if (bdrv_pread(s->hd, old_l2_offset,
> +                           l2_table, s->l2_size * sizeof(uint64_t)) !=
> +                s->l2_size * sizeof(uint64_t))
> +                return 0;
> +        }
> +        if (bdrv_pwrite(s->hd, l2_offset,
> +                        l2_table, s->l2_size * sizeof(uint64_t)) !=
> +            s->l2_size * sizeof(uint64_t))
>              return 0;
>      } else {
> -        if (l2_offset)
> -            free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
> -        l2_table = l2_allocate(bs, l1_index);
> -        if (l2_table == NULL)
> +        if (!(l2_offset & QCOW_OFLAG_COPIED)) {
> +            if (allocate) {
> +                free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t));
> +                goto l2_allocate;
> +            }
> +        } else {
> +            l2_offset &= ~QCOW_OFLAG_COPIED;
> +        }
> +        for(i = 0; i < L2_CACHE_SIZE; i++) {
> +            if (l2_offset == s->l2_cache_offsets[i]) {
> +                /* increment the hit count */
> +                if (++s->l2_cache_counts[i] == 0xffffffff) {
> +                    for(j = 0; j < L2_CACHE_SIZE; j++) {
> +                        s->l2_cache_counts[j] >>= 1;
> +                    }
> +                }
> +                l2_table = s->l2_cache + (i << s->l2_bits);
> +                goto found;
> +            }
> +        }
> +        /* not found: load a new entry in the least used one */
> +        min_index = l2_cache_new_entry(bs);
> +        l2_table = s->l2_cache + (min_index << s->l2_bits);
> +        if (bdrv_pread(s->hd, l2_offset, l2_table, s->l2_size * sizeof(uint64_t)) !=
> +            s->l2_size * sizeof(uint64_t))
>              return 0;
> -        l2_offset = s->l1_table[l1_index] & ~QCOW_OFLAG_COPIED;
>      }
> -
> -    /* find the cluster offset for the given disk offset */
> -
> +    s->l2_cache_offsets[min_index] = l2_offset;
> +    s->l2_cache_counts[min_index] = 1;
> + found:
>      l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
> -
> -    *new_l2_table = l2_table;
> -    *new_l2_offset = l2_offset;
> -    *new_l2_index = l2_index;
> -
> -    return 1;
> -}
> -
> -/*
> - * alloc_compressed_cluster_offset
> - *
> - * For a given offset of the disk image, return cluster offset in
> - * qcow2 file.
> - *
> - * If the offset is not found, allocate a new compressed cluster.
> - *
> - * Return the cluster offset if successful,
> - * Return 0, otherwise.
> - *
> - */
> -
> -static uint64_t alloc_compressed_cluster_offset(BlockDriverState *bs,
> -                                                uint64_t offset,
> -                                                int compressed_size)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int l2_index, ret;
> -    uint64_t l2_offset, *l2_table, cluster_offset;
> -    int nb_csectors;
> -
> -    ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index);
> -    if (ret == 0)
> -        return 0;
> -
>      cluster_offset = be64_to_cpu(l2_table[l2_index]);
> -    if (cluster_offset & QCOW_OFLAG_COPIED)
> -        return cluster_offset & ~QCOW_OFLAG_COPIED;
> -
> -    if (cluster_offset)
> -        free_any_clusters(bs, cluster_offset, 1);
> -
> -    cluster_offset = alloc_bytes(bs, compressed_size);
> -    nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
> -                  (cluster_offset >> 9);
> -
> -    cluster_offset |= QCOW_OFLAG_COMPRESSED |
> -                      ((uint64_t)nb_csectors << s->csize_shift);
> -
> -    /* update L2 table */
> -
> -    /* compressed clusters never have the copied flag */
> -
> -    l2_table[l2_index] = cpu_to_be64(cluster_offset);
> -    if (bdrv_pwrite(s->hd,
> -                    l2_offset + l2_index * sizeof(uint64_t),
> -                    l2_table + l2_index,
> -                    sizeof(uint64_t)) != sizeof(uint64_t))
> -        return 0;
> -
> -    return cluster_offset;
> -}
> -
> -typedef struct QCowL2Meta
> -{
> -    uint64_t offset;
> -    int n_start;
> -    int nb_available;
> -    int nb_clusters;
> -} QCowL2Meta;
> -
> -static int alloc_cluster_link_l2(BlockDriverState *bs, uint64_t cluster_offset,
> -        QCowL2Meta *m)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int i, j = 0, l2_index, ret;
> -    uint64_t *old_cluster, start_sect, l2_offset, *l2_table;
> -
> -    if (m->nb_clusters == 0)
> -        return 0;
> -
> -    if (!(old_cluster = qemu_malloc(m->nb_clusters * sizeof(uint64_t))))
> -        return -ENOMEM;
> -
> -    /* copy content of unmodified sectors */
> -    start_sect = (m->offset & ~(s->cluster_size - 1)) >> 9;
> -    if (m->n_start) {
> -        ret = copy_sectors(bs, start_sect, cluster_offset, 0, m->n_start);
> -        if (ret < 0)
> -            goto err;
> +    if (!cluster_offset) {
> +        if (!allocate)
> +            return cluster_offset;
> +    } else if (!(cluster_offset & QCOW_OFLAG_COPIED)) {
> +        if (!allocate)
> +            return cluster_offset;
> +        /* free the cluster */
> +        if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
> +            int nb_csectors;
> +            nb_csectors = ((cluster_offset >> s->csize_shift) &
> +                           s->csize_mask) + 1;
> +            free_clusters(bs, (cluster_offset & s->cluster_offset_mask) & ~511,
> +                          nb_csectors * 512);
> +        } else {
> +            free_clusters(bs, cluster_offset, s->cluster_size);
> +        }
> +    } else {
> +        cluster_offset &= ~QCOW_OFLAG_COPIED;
> +        return cluster_offset;
>      }
> -
> -    if (m->nb_available & (s->cluster_sectors - 1)) {
> -        uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - 1);
> -        ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << 9),
> -                m->nb_available - end, s->cluster_sectors);
> -        if (ret < 0)
> -            goto err;
> +    if (allocate == 1) {
> +        /* allocate a new cluster */
> +        cluster_offset = alloc_clusters(bs, s->cluster_size);
> +
> +        /* we must initialize the cluster content which won't be
> +           written */
> +        if ((n_end - n_start) < s->cluster_sectors) {
> +            uint64_t start_sect;
> +
> +            start_sect = (offset & ~(s->cluster_size - 1)) >> 9;
> +            ret = copy_sectors(bs, start_sect,
> +                               cluster_offset, 0, n_start);
> +            if (ret < 0)
> +                return 0;
> +            ret = copy_sectors(bs, start_sect,
> +                               cluster_offset, n_end, s->cluster_sectors);
> +            if (ret < 0)
> +                return 0;
> +        }
> +        tmp = cpu_to_be64(cluster_offset | QCOW_OFLAG_COPIED);
> +    } else {
> +        int nb_csectors;
> +        cluster_offset = alloc_bytes(bs, compressed_size);
> +        nb_csectors = ((cluster_offset + compressed_size - 1) >> 9) -
> +            (cluster_offset >> 9);
> +        cluster_offset |= QCOW_OFLAG_COMPRESSED |
> +            ((uint64_t)nb_csectors << s->csize_shift);
> +        /* compressed clusters never have the copied flag */
> +        tmp = cpu_to_be64(cluster_offset);
>      }
> -
> -    ret = -EIO;
>      /* update L2 table */
> -    if (!get_cluster_table(bs, m->offset, &l2_table, &l2_offset, &l2_index))
> -        goto err;
> -
> -    for (i = 0; i < m->nb_clusters; i++) {
> -        if(l2_table[l2_index + i] != 0)
> -            old_cluster[j++] = l2_table[l2_index + i];
> -
> -        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
> -                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
> -     }
> -
> -    if (bdrv_pwrite(s->hd, l2_offset + l2_index * sizeof(uint64_t),
> -                l2_table + l2_index, m->nb_clusters * sizeof(uint64_t)) !=
> -            m->nb_clusters * sizeof(uint64_t))
> -        goto err;
> -
> -    for (i = 0; i < j; i++)
> -        free_any_clusters(bs, old_cluster[i], 1);
> -
> -    ret = 0;
> -err:
> -    qemu_free(old_cluster);
> -    return ret;
> - }
> -
> -/*
> - * alloc_cluster_offset
> - *
> - * For a given offset of the disk image, return cluster offset in
> - * qcow2 file.
> - *
> - * If the offset is not found, allocate a new cluster.
> - *
> - * Return the cluster offset if successful,
> - * Return 0, otherwise.
> - *
> - */
> -
> -static uint64_t alloc_cluster_offset(BlockDriverState *bs,
> -                                     uint64_t offset,
> -                                     int n_start, int n_end,
> -                                     int *num, QCowL2Meta *m)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int l2_index, ret;
> -    uint64_t l2_offset, *l2_table, cluster_offset;
> -    int nb_clusters, i = 0;
> -
> -    ret = get_cluster_table(bs, offset, &l2_table, &l2_offset, &l2_index);
> -    if (ret == 0)
> +    l2_table[l2_index] = tmp;
> +    if (bdrv_pwrite(s->hd,
> +                    l2_offset + l2_index * sizeof(tmp), &tmp, sizeof(tmp)) != sizeof(tmp))
>          return 0;
> -
> -    nb_clusters = size_to_clusters(s, n_end << 9);
> -
> -    nb_clusters = MIN(nb_clusters, s->l2_size - l2_index);
> -
> -    cluster_offset = be64_to_cpu(l2_table[l2_index]);
> -
> -    /* We keep all QCOW_OFLAG_COPIED clusters */
> -
> -    if (cluster_offset & QCOW_OFLAG_COPIED) {
> -        nb_clusters = count_contiguous_clusters(nb_clusters, s->cluster_size,
> -                &l2_table[l2_index], 0, 0);
> -
> -        cluster_offset &= ~QCOW_OFLAG_COPIED;
> -        m->nb_clusters = 0;
> -
> -        goto out;
> -    }
> -
> -    /* for the moment, multiple compressed clusters are not managed */
> -
> -    if (cluster_offset & QCOW_OFLAG_COMPRESSED)
> -        nb_clusters = 1;
> -
> -    /* how many available clusters ? */
> -
> -    while (i < nb_clusters) {
> -        i += count_contiguous_clusters(nb_clusters - i, s->cluster_size,
> -                &l2_table[l2_index], i, 0);
> -
> -        if(be64_to_cpu(l2_table[l2_index + i]))
> -            break;
> -
> -        i += count_contiguous_free_clusters(nb_clusters - i,
> -                &l2_table[l2_index + i]);
> -
> -        cluster_offset = be64_to_cpu(l2_table[l2_index + i]);
> -
> -        if ((cluster_offset & QCOW_OFLAG_COPIED) ||
> -                (cluster_offset & QCOW_OFLAG_COMPRESSED))
> -            break;
> -    }
> -    nb_clusters = i;
> -
> -    /* allocate a new cluster */
> -
> -    cluster_offset = alloc_clusters(bs, nb_clusters * s->cluster_size);
> -
> -    /* save info needed for meta data update */
> -    m->offset = offset;
> -    m->n_start = n_start;
> -    m->nb_clusters = nb_clusters;
> -
> -out:
> -    m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end);
> -
> -    *num = m->nb_available - n_start;
> -
>      return cluster_offset;
>  }
>  
>  static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
>                               int nb_sectors, int *pnum)
>  {
> +    BDRVQcowState *s = bs->opaque;
> +    int index_in_cluster, n;
>      uint64_t cluster_offset;
>  
> -    *pnum = nb_sectors;
> -    cluster_offset = get_cluster_offset(bs, sector_num << 9, pnum);
> -
> +    cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
> +    index_in_cluster = sector_num & (s->cluster_sectors - 1);
> +    n = s->cluster_sectors - index_in_cluster;
> +    if (n > nb_sectors)
> +        n = nb_sectors;
> +    *pnum = n;
>      return (cluster_offset != 0);
>  }
>  
> @@ -1102,9 +723,11 @@
>      uint64_t cluster_offset;
>  
>      while (nb_sectors > 0) {
> -        n = nb_sectors;
> -        cluster_offset = get_cluster_offset(bs, sector_num << 9, &n);
> +        cluster_offset = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0);
>          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> +        n = s->cluster_sectors - index_in_cluster;
> +        if (n > nb_sectors)
> +            n = nb_sectors;
>          if (!cluster_offset) {
>              if (bs->backing_hd) {
>                  /* read from the base image */
> @@ -1143,18 +766,15 @@
>      BDRVQcowState *s = bs->opaque;
>      int ret, index_in_cluster, n;
>      uint64_t cluster_offset;
> -    int n_end;
> -    QCowL2Meta l2meta;
>  
>      while (nb_sectors > 0) {
>          index_in_cluster = sector_num & (s->cluster_sectors - 1);
> -        n_end = index_in_cluster + nb_sectors;
> -        if (s->crypt_method &&
> -            n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
> -            n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
> -        cluster_offset = alloc_cluster_offset(bs, sector_num << 9,
> -                                              index_in_cluster,
> -                                              n_end, &n, &l2meta);
> +        n = s->cluster_sectors - index_in_cluster;
> +        if (n > nb_sectors)
> +            n = nb_sectors;
> +        cluster_offset = get_cluster_offset(bs, sector_num << 9, 1, 0,
> +                                            index_in_cluster,
> +                                            index_in_cluster + n);
>          if (!cluster_offset)
>              return -1;
>          if (s->crypt_method) {
> @@ -1165,10 +785,8 @@
>          } else {
>              ret = bdrv_pwrite(s->hd, cluster_offset + index_in_cluster * 512, buf, n * 512);
>          }
> -        if (ret != n * 512 || alloc_cluster_link_l2(bs, cluster_offset, &l2meta) < 0) {
> -            free_any_clusters(bs, cluster_offset, l2meta.nb_clusters);
> +        if (ret != n * 512)
>              return -1;
> -        }
>          nb_sectors -= n;
>          sector_num += n;
>          buf += n * 512;
> @@ -1186,33 +804,8 @@
>      uint64_t cluster_offset;
>      uint8_t *cluster_data;
>      BlockDriverAIOCB *hd_aiocb;
> -    QEMUBH *bh;
> -    QCowL2Meta l2meta;
>  } QCowAIOCB;
>  
> -static void qcow_aio_read_cb(void *opaque, int ret);
> -static void qcow_aio_read_bh(void *opaque)
> -{
> -    QCowAIOCB *acb = opaque;
> -    qemu_bh_delete(acb->bh);
> -    acb->bh = NULL;
> -    qcow_aio_read_cb(opaque, 0);
> -}
> -
> -static int qcow_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
> -{
> -    if (acb->bh)
> -        return -EIO;
> -
> -    acb->bh = qemu_bh_new(cb, acb);
> -    if (!acb->bh)
> -        return -EIO;
> -
> -    qemu_bh_schedule(acb->bh);
> -
> -    return 0;
> -}
> -
>  static void qcow_aio_read_cb(void *opaque, int ret)
>  {
>      QCowAIOCB *acb = opaque;
> @@ -1222,12 +815,13 @@
>  
>      acb->hd_aiocb = NULL;
>      if (ret < 0) {
> -fail:
> +    fail:
>          acb->common.cb(acb->common.opaque, ret);
>          qemu_aio_release(acb);
>          return;
>      }
>  
> + redo:
>      /* post process the read buffer */
>      if (!acb->cluster_offset) {
>          /* nothing to do */
> @@ -1253,9 +847,12 @@
>      }
>  
>      /* prepare next AIO request */
> -    acb->n = acb->nb_sectors;
> -    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, &acb->n);
> +    acb->cluster_offset = get_cluster_offset(bs, acb->sector_num << 9,
> +                                             0, 0, 0, 0);
>      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> +    acb->n = s->cluster_sectors - index_in_cluster;
> +    if (acb->n > acb->nb_sectors)
> +        acb->n = acb->nb_sectors;
>  
>      if (!acb->cluster_offset) {
>          if (bs->backing_hd) {
> @@ -1268,16 +865,12 @@
>                  if (acb->hd_aiocb == NULL)
>                      goto fail;
>              } else {
> -                ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
> -                if (ret < 0)
> -                    goto fail;
> +                goto redo;
>              }
>          } else {
>              /* Note: in this case, no need to wait */
>              memset(acb->buf, 0, 512 * acb->n);
> -            ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
> -            if (ret < 0)
> -                goto fail;
> +            goto redo;
>          }
>      } else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) {
>          /* add AIO support for compressed blocks ? */
> @@ -1285,9 +878,7 @@
>              goto fail;
>          memcpy(acb->buf,
>                 s->cluster_cache + index_in_cluster * 512, 512 * acb->n);
> -        ret = qcow_schedule_bh(qcow_aio_read_bh, acb);
> -        if (ret < 0)
> -            goto fail;
> +        goto redo;
>      } else {
>          if ((acb->cluster_offset & 511) != 0) {
>              ret = -EIO;
> @@ -1316,7 +907,6 @@
>      acb->nb_sectors = nb_sectors;
>      acb->n = 0;
>      acb->cluster_offset = 0;
> -    acb->l2meta.nb_clusters = 0;
>      return acb;
>  }
>  
> @@ -1340,8 +930,8 @@
>      BlockDriverState *bs = acb->common.bs;
>      BDRVQcowState *s = bs->opaque;
>      int index_in_cluster;
> +    uint64_t cluster_offset;
>      const uint8_t *src_buf;
> -    int n_end;
>  
>      acb->hd_aiocb = NULL;
>  
> @@ -1352,11 +942,6 @@
>          return;
>      }
>  
> -    if (alloc_cluster_link_l2(bs, acb->cluster_offset, &acb->l2meta) < 0) {
> -        free_any_clusters(bs, acb->cluster_offset, acb->l2meta.nb_clusters);
> -        goto fail;
> -    }
> -
>      acb->nb_sectors -= acb->n;
>      acb->sector_num += acb->n;
>      acb->buf += acb->n * 512;
> @@ -1369,22 +954,19 @@
>      }
>  
>      index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
> -    n_end = index_in_cluster + acb->nb_sectors;
> -    if (s->crypt_method &&
> -        n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
> -        n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
> -
> -    acb->cluster_offset = alloc_cluster_offset(bs, acb->sector_num << 9,
> -                                          index_in_cluster,
> -                                          n_end, &acb->n, &acb->l2meta);
> -    if (!acb->cluster_offset || (acb->cluster_offset & 511) != 0) {
> +    acb->n = s->cluster_sectors - index_in_cluster;
> +    if (acb->n > acb->nb_sectors)
> +        acb->n = acb->nb_sectors;
> +    cluster_offset = get_cluster_offset(bs, acb->sector_num << 9, 1, 0,
> +                                        index_in_cluster,
> +                                        index_in_cluster + acb->n);
> +    if (!cluster_offset || (cluster_offset & 511) != 0) {
>          ret = -EIO;
>          goto fail;
>      }
>      if (s->crypt_method) {
>          if (!acb->cluster_data) {
> -            acb->cluster_data = qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS *
> -                                             s->cluster_size);
> +            acb->cluster_data = qemu_mallocz(s->cluster_size);
>              if (!acb->cluster_data) {
>                  ret = -ENOMEM;
>                  goto fail;
> @@ -1397,7 +979,7 @@
>          src_buf = acb->buf;
>      }
>      acb->hd_aiocb = bdrv_aio_write(s->hd,
> -                                   (acb->cluster_offset >> 9) + index_in_cluster,
> +                                   (cluster_offset >> 9) + index_in_cluster,
>                                     src_buf, acb->n,
>                                     qcow_aio_write_cb, acb);
>      if (acb->hd_aiocb == NULL)
> @@ -1571,7 +1153,7 @@
>  
>      memset(s->l1_table, 0, l1_length);
>      if (bdrv_pwrite(s->hd, s->l1_table_offset, s->l1_table, l1_length) < 0)
> -        return -1;
> +	return -1;
>      ret = bdrv_truncate(s->hd, s->l1_table_offset + l1_length);
>      if (ret < 0)
>          return ret;
> @@ -1637,10 +1219,8 @@
>          /* could not compress: write normal cluster */
>          qcow_write(bs, sector_num, buf, s->cluster_sectors);
>      } else {
> -        cluster_offset = alloc_compressed_cluster_offset(bs, sector_num << 9,
> -                                              out_len);
> -        if (!cluster_offset)
> -            return -1;
> +        cluster_offset = get_cluster_offset(bs, sector_num << 9, 2,
> +                                            out_len, 0, 0);
>          cluster_offset &= s->cluster_offset_mask;
>          if (bdrv_pwrite(s->hd, cluster_offset, out_buf, out_len) != out_len) {
>              qemu_free(out_buf);
> @@ -2225,19 +1805,26 @@
>      BDRVQcowState *s = bs->opaque;
>      int i, nb_clusters;
>  
> -    nb_clusters = size_to_clusters(s, size);
> -retry:
> -    for(i = 0; i < nb_clusters; i++) {
> -        int64_t i = s->free_cluster_index++;
> -        if (get_refcount(bs, i) != 0)
> -            goto retry;
> -    }
> +    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
> +    for(;;) {
> +        if (get_refcount(bs, s->free_cluster_index) == 0) {
> +            s->free_cluster_index++;
> +            for(i = 1; i < nb_clusters; i++) {
> +                if (get_refcount(bs, s->free_cluster_index) != 0)
> +                    goto not_found;
> +                s->free_cluster_index++;
> +            }
>  #ifdef DEBUG_ALLOC2
> -    printf("alloc_clusters: size=%lld -> %lld\n",
> -            size,
> -            (s->free_cluster_index - nb_clusters) << s->cluster_bits);
> +            printf("alloc_clusters: size=%lld -> %lld\n",
> +                   size,
> +                   (s->free_cluster_index - nb_clusters) << s->cluster_bits);
>  #endif
> -    return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
> +            return (s->free_cluster_index - nb_clusters) << s->cluster_bits;
> +        } else {
> +        not_found:
> +            s->free_cluster_index++;
> +        }
> +    }
>  }
>  
>  static int64_t alloc_clusters(BlockDriverState *bs, int64_t size)
> @@ -2301,7 +1888,8 @@
>      int new_table_size, new_table_size2, refcount_table_clusters, i, ret;
>      uint64_t *new_table;
>      int64_t table_offset;
> -    uint8_t data[12];
> +    uint64_t data64;
> +    uint32_t data32;
>      int old_table_size;
>      int64_t old_table_offset;
>  
> @@ -2340,10 +1928,13 @@
>      for(i = 0; i < s->refcount_table_size; i++)
>          be64_to_cpus(&new_table[i]);
>  
> -    cpu_to_be64w((uint64_t*)data, table_offset);
> -    cpu_to_be32w((uint32_t*)(data + 8), refcount_table_clusters);
> +    data64 = cpu_to_be64(table_offset);
>      if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_offset),
> -                    data, sizeof(data)) != sizeof(data))
> +                    &data64, sizeof(data64)) != sizeof(data64))
> +        goto fail;
> +    data32 = cpu_to_be32(refcount_table_clusters);
> +    if (bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_clusters),
> +                    &data32, sizeof(data32)) != sizeof(data32))
>          goto fail;
>      qemu_free(s->refcount_table);
>      old_table_offset = s->refcount_table_offset;
> @@ -2572,7 +2163,7 @@
>      uint16_t *refcount_table;
>  
>      size = bdrv_getlength(s->hd);
> -    nb_clusters = size_to_clusters(s, size);
> +    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
>      refcount_table = qemu_mallocz(nb_clusters * sizeof(uint16_t));
>  
>      /* header */
> @@ -2624,7 +2215,7 @@
>      int refcount;
>  
>      size = bdrv_getlength(s->hd);
> -    nb_clusters = size_to_clusters(s, size);
> +    nb_clusters = (size + s->cluster_size - 1) >> s->cluster_bits;
>      for(k = 0; k < nb_clusters;) {
>          k1 = k;
>          refcount = get_refcount(bs, k);
>
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-14 23:13         ` Anthony Liguori
@ 2009-02-15  2:01           ` Jamie Lokier
  2009-02-15  4:09             ` Anthony Liguori
  0 siblings, 1 reply; 82+ messages in thread
From: Jamie Lokier @ 2009-02-15  2:01 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> Well such a large reversion is a bad idea.  Can you git bisect to the 
> actual changeset that introduced the bug you see?

Have done, did you read the other thread?

Message-ID: <20090211114126.GC31997@shareable.org>
Subject: Re: qcow2 corruption observed, fixed by reverting old change

Jamie Lokier wrote:
> Kevin Wolf wrote:
> > Jamie Lokier schrieb:
> > > Although there are many ways to make Windows blue screen in KVM, in
> > > this case I've narrowed it down to the difference in
> > > qemu/block-qcow2.c between kvm-72 and kvm-73 (not -83).
> >
> > This must be one of SVN revisions 5003 to 5008 in upstream qemu. Can you
> > narrow it down to one of these? I certainly don't feel like reviewing
> > all of them once again.
> 
> It's QEMU SVN delta 5005-5006, copied below.

I don't have time to disentangle the different optimisations done to
qcow2 around that changeset, nor fix the changeset itself, but I can
test proposed patches on my guest VM image, which I've copied aside
because it's consistent about failing or not.

If nobody else has time either, then I think an imminent new QEMU
release, which may get rolled into distros and so on, is better off
with the the changes reverted than corrupting guest images.

I'm not proposing throwing away all the good work done on qcow2, only
that fixing observed corruption is important especially for a major
release, and reverting later changes can be temporary until the bug is
found and fixed.

-- Jamie

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-14 22:23         ` Dor Laor
@ 2009-02-15  2:20           ` Jamie Lokier
  0 siblings, 0 replies; 82+ messages in thread
From: Jamie Lokier @ 2009-02-15  2:20 UTC (permalink / raw)
  To: dlaor, qemu-devel

Dor Laor wrote:
> The solution is to find the real cause to the corruption.

I agree, if someone is able to do that, great, but if not and
practical reality results in these choices:

    1. Ship the current code which results in corruption on Windows
       2000 and 2003 guests (and who knows what else), and by the way
       is unlikely to have anything to do with device emulation.

    2. Revert to (nearly) kvm-72 code which appears to fix the
       majority of those corruption cases, although there is still
       something rare, which may be a different bug.

Which is the best choice?

>From a QA POV, I would revert the known bug until someone has a fix,
then reinstate everything after it which is thought to be good.

> Jamie Lokier wrote:
>      Anthony Liguori wrote:
>                Simply reverting the qcow2 code appears to fix
>                those problems, so it
>                needn't hold up cutting a release.  That's what I
>                recommend.
>           Send some patches.
>      I did already.
> 
>      Here it is again.  This should fix my bug and Marc's bug according to
>      his report that reverting qcow2.c fixes it.
> Going back to kvm-72 is not good also.

> First, there were qcow2 corruptions before it, they were very rare but still
> exist.

That's true.  But they were noticably rarer - to the point that people
clearly are using kvm-72 with qcow2 and not reporting many problems.

Ubuntu 8.10 shipped kvm-72, and that coincided with their announcement
that they're supporting KVM as their official virtualisation solution.
I imagine kvm-72 is getting a fair bit of usage because of that.

Of course they could be having rare problems and think it's a bug in
the guest or its applications :-)

> Not long ago we did not know even that qcow2 is the faulty.

Worrying, isn't it.  Does qcow2 get any rigorous testing?  Should that
be added - a blockdev test suite?

There hasn't been a complete lack of bug reports about qcow2, but
maybe they aren't getting to the right places, and maybe they're too
difficult to reproduce and easy to workaround ("my guest occasionally
shows random corruption", "don't use KVM for that guest", "I switch to
raw and it went away")

I very luckily discovered it prevented one of my VMs from booting, as
soon as I upgraded from kvm-72 (shipped with Ubuntu) to something
newer.  If it hadn't prevented it from booting, just occasional rare
corruption, I might not have realised it was qcow2 at all.  Guest
corruption can occur for many reasons, and -win2k-hack implies that
the IDE emulation is not quite right in some way.

-- Jamie

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-15  2:01           ` Jamie Lokier
@ 2009-02-15  4:09             ` Anthony Liguori
  2009-02-15 15:42               ` Jamie Lokier
  0 siblings, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2009-02-15  4:09 UTC (permalink / raw)
  To: qemu-devel

On Sat, Feb 14, 2009 at 8:01 PM, Jamie Lokier <jamie@shareable.org> wrote:
> Have done, did you read the other thread?

Yes, but your patch confused me (which is admittedly not hard).

>> It's QEMU SVN delta 5005-5006, copied below.

So why such an aggressive revert?  Why not just revert the problematic
changesets?

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-05 17:51       ` Ben Taylor
  2009-02-05 18:39         ` René Rebe
  2009-02-05 19:03         ` Anthony Liguori
@ 2009-02-15 15:25         ` Andreas Färber
  2009-02-15 15:44           ` Jamie Lokier
  2009-02-15 18:17           ` Anthony Liguori
  2 siblings, 2 replies; 82+ messages in thread
From: Andreas Färber @ 2009-02-15 15:25 UTC (permalink / raw)
  To: qemu-devel

Am 05.02.2009 um 18:51 schrieb Ben Taylor:

> On Thu, Feb 5, 2009 at 11:27 AM, Paul Brook <paul@codesourcery.com>  
> wrote:
>>
>> In practice Fabice is pretty much the only person who's ever done  
>> significant
>> work on kqemu (except maybe some fairly minor host OS porting  
>> bits). There's
>> never been a public source repository, so you get to use whatever  
>> random
>> tarballs Fabrice leaves lying around. If those don't work, noone  
>> really
>> cares.
>
> I've maintained tarballs for both 1.4.0 and 1.3.0 at the qemu project
> on OpenSolaris.org, and just realized that I never put into the SVN  
> repo
> the mods I made to the 1.4.0 code.  I had tested it with Solaris SXCE
> and Ubuntu 08.04.  If anyone shows some interest in testing, I'll  
> import
> the 1.4.0 into the SVN repo.  I believe that I picked up the minor
> patches that were posted to the list to fix compilations on linux
> with some various kernels.

I have happily used kqemu 1.4 on OpenSolaris for several months  
without problems, running Linux in sparc-softmmu and Haiku/BeOS in  
i386-softmmu.

I did have to tweak the Makefile a little for kqemu to link on  
OpenSolaris/amd64, I believe. Possibly by replacing ld with path/to/ 
amd64/ld.

There has been no rumor of any KVM port to Solaris. Linux kernel  
integration cannot be the only criteria.
It used to work in early December - could we set up a Git repo for  
Fabrice's official tarball? Then we could apply the OpenSolaris.org  
changes on a branch and play with our own Git forks to keep it working  
as long as there is no alternative. Asking for maintainers of  
unversioned software seems doomed to fail.

Andreas

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-15  4:09             ` Anthony Liguori
@ 2009-02-15 15:42               ` Jamie Lokier
  2009-02-15 18:19                 ` Anthony Liguori
  0 siblings, 1 reply; 82+ messages in thread
From: Jamie Lokier @ 2009-02-15 15:42 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> On Sat, Feb 14, 2009 at 8:01 PM, Jamie Lokier <jamie@shareable.org> wrote:
> > Have done, did you read the other thread?
> 
> Yes, but your patch confused me (which is admittedly not hard).
> 
> >> It's QEMU SVN delta 5005-5006, copied below.
> 
> So why such an aggressive revert?  Why not just revert the problematic
> changesets?

Because most of the following changes look too dependent on it.

I did keep a couple of changes which are trivially independent since
that one - default to "cache=writeback" and eliminating #define
offsetof.

You have a point that QEMU SVN deltas up to 5005 don't need to be
reverted.  Reason for that: I simply don't have time to trim the patch
down to its bare essentials quickly, and being a corruption bug, it
should be dealt with quickly.  This one seems to work; feel free to
improve it by reverting less, or waiting a long time for me to do so :-)

-- Jamie

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-15 15:25         ` Andreas Färber
@ 2009-02-15 15:44           ` Jamie Lokier
  2009-02-15 19:14             ` Andreas Färber
  2009-02-15 18:17           ` Anthony Liguori
  1 sibling, 1 reply; 82+ messages in thread
From: Jamie Lokier @ 2009-02-15 15:44 UTC (permalink / raw)
  To: qemu-devel

Andreas Färber wrote:
> There has been no rumor of any KVM port to Solaris. Linux kernel  
> integration cannot be the only criteria.

Does Solaris not have their own equivalent to KVM, for running VMs?

-- Jamie

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-15 15:25         ` Andreas Färber
  2009-02-15 15:44           ` Jamie Lokier
@ 2009-02-15 18:17           ` Anthony Liguori
  2009-02-15 20:31             ` Andreas Färber
  1 sibling, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2009-02-15 18:17 UTC (permalink / raw)
  To: qemu-devel

Andreas Färber wrote:
>
> Am 05.02.2009 um 18:51 schrieb Ben Taylor:
>
>> On Thu, Feb 5, 2009 at 11:27 AM, Paul Brook <paul@codesourcery.com> 
>> wrote:
>>>
>>> In practice Fabice is pretty much the only person who's ever done 
>>> significant
>>> work on kqemu (except maybe some fairly minor host OS porting bits). 
>>> There's
>>> never been a public source repository, so you get to use whatever 
>>> random
>>> tarballs Fabrice leaves lying around. If those don't work, noone really
>>> cares.
>>
>> I've maintained tarballs for both 1.4.0 and 1.3.0 at the qemu project
>> on OpenSolaris.org, and just realized that I never put into the SVN repo
>> the mods I made to the 1.4.0 code.  I had tested it with Solaris SXCE
>> and Ubuntu 08.04.  If anyone shows some interest in testing, I'll import
>> the 1.4.0 into the SVN repo.  I believe that I picked up the minor
>> patches that were posted to the list to fix compilations on linux
>> with some various kernels.
>
> I have happily used kqemu 1.4 on OpenSolaris for several months 
> without problems, running Linux in sparc-softmmu and Haiku/BeOS in 
> i386-softmmu.
>
> I did have to tweak the Makefile a little for kqemu to link on 
> OpenSolaris/amd64, I believe. Possibly by replacing ld with 
> path/to/amd64/ld.
>
> There has been no rumor of any KVM port to Solaris. Linux kernel 
> integration cannot be the only criteria.
> It used to work in early December - could we set up a Git repo for 
> Fabrice's official tarball? Then we could apply the OpenSolaris.org 
> changes on a branch and play with our own Git forks to keep it working 
> as long as there is no alternative. Asking for maintainers of 
> unversioned software seems doomed to fail.

Set up a repository somewhere.  You don't need anyone's permission for that.

Savannah isn't a great place for hosting.  You can only have one git 
repo per project.  I'd suggest something like github or repo.or.cz.

Regards,

Anthony Liguori

>
> Andreas
>
>
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-15 15:42               ` Jamie Lokier
@ 2009-02-15 18:19                 ` Anthony Liguori
  2009-02-15 18:34                   ` Johannes Schindelin
  2009-02-17  1:01                   ` Jamie Lokier
  0 siblings, 2 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-15 18:19 UTC (permalink / raw)
  To: qemu-devel

Jamie Lokier wrote:
> Anthony Liguori wrote:
>   
>> On Sat, Feb 14, 2009 at 8:01 PM, Jamie Lokier <jamie@shareable.org> wrote:
>>     
>>> Have done, did you read the other thread?
>>>       
>> Yes, but your patch confused me (which is admittedly not hard).
>>
>>     
>>>> It's QEMU SVN delta 5005-5006, copied below.
>>>>         
>> So why such an aggressive revert?  Why not just revert the problematic
>> changesets?
>>     
>
> Because most of the following changes look too dependent on it.
>   

Too dependent on the introduced functionality or too dependent to make 
porting trivial?  My impression upon looking was that it's the later, 
not the former.  If that is the case, then someone needs to do the work 
of properly reverting.

> I did keep a couple of changes which are trivially independent since
> that one - default to "cache=writeback" and eliminating #define
> offsetof.
>
> You have a point that QEMU SVN deltas up to 5005 don't need to be
> reverted.  Reason for that: I simply don't have time to trim the patch
> down to its bare essentials quickly, and being a corruption bug, it
> should be dealt with quickly.  This one seems to work; feel free to
> improve it by reverting less, or waiting a long time for me to do so :-)
>   

But many of the changes since 5005 were also corruption fixes.  And 
let's be clear, your data is *not* safe with qcow2.  So I don't consider 
this to be a show stopping issue.

Regards,

Anthony Liguori

> -- Jamie
>
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-15 18:19                 ` Anthony Liguori
@ 2009-02-15 18:34                   ` Johannes Schindelin
  2009-02-16  1:01                     ` Anthony Liguori
  2009-02-16  1:19                     ` Anthony Liguori
  2009-02-17  1:01                   ` Jamie Lokier
  1 sibling, 2 replies; 82+ messages in thread
From: Johannes Schindelin @ 2009-02-15 18:34 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

Hi,

On Sun, 15 Feb 2009, Anthony Liguori wrote:

> And let's be clear, your data is *not* safe with qcow2.  So I don't 
> consider this to be a show stopping issue.

I beg your pardon?  The one format that was recommended for quite a long 
time now is considered unsafe?

That would not have happened with Fabrice in charge,
Dscho

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-15 15:44           ` Jamie Lokier
@ 2009-02-15 19:14             ` Andreas Färber
  0 siblings, 0 replies; 82+ messages in thread
From: Andreas Färber @ 2009-02-15 19:14 UTC (permalink / raw)
  To: qemu-devel

Am 15.02.2009 um 16:44 schrieb Jamie Lokier:

> Andreas Färber wrote:
>> There has been no rumor of any KVM port to Solaris. Linux kernel
>> integration cannot be the only criteria.
>
> Does Solaris not have their own equivalent to KVM, for running VMs?

Sun has xVM (based on Xen) with virt-manager UI. But it didn't run,  
e.g., Haiku and doesn't help with non-native (sparc-/ppc-softmmu)  
emulation either.
I'm not looking for a virtualization technology on Solaris but for a  
platform suited for my uses of QEMU emulation (and that box was pretty  
fast :).

However unsupported, QEMU+kqemu on OpenSolaris/amd64 is much faster  
than unaccelerated QEMU on OSX/ppc!

And trying to set up any KVM guest in Fedora was a pain. Haven't tried  
the new KVM integration in QEMU trunk yet. Maybe kqemu really has a  
bad kernel interface, but it's simple to set up and fits the needs of  
my use cases:

- booting existing hard disk images without one-time booting from an  
ISO image first (virt-manager seems to require the latter)
- storing the image files anywhere I like, including my home dir  
(SELinux messes with that on Fedora 10)
- starting the VM from an unpriviledged user, preferably from a shell  
script

Andreas

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] Re: Cutting a new QEMU release
  2009-02-15 18:17           ` Anthony Liguori
@ 2009-02-15 20:31             ` Andreas Färber
  0 siblings, 0 replies; 82+ messages in thread
From: Andreas Färber @ 2009-02-15 20:31 UTC (permalink / raw)
  To: qemu-devel


Am 15.02.2009 um 19:17 schrieb Anthony Liguori:

> Andreas Färber wrote:
>> [kqemu] used to work in early December - could we set up a Git repo  
>> for Fabrice's official tarball? Then we could apply the  
>> OpenSolaris.org changes on a branch and play with our own Git forks  
>> to keep it working as long as there is no alternative. Asking for  
>> maintainers of unversioned software seems doomed to fail.
>
> Set up a repository somewhere.  You don't need anyone's permission  
> for that.

I just did: http://repo.or.cz/w/kqemu.git

Andreas

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-15 18:34                   ` Johannes Schindelin
@ 2009-02-16  1:01                     ` Anthony Liguori
  2009-02-17  0:52                       ` Jamie Lokier
  2009-02-16  1:19                     ` Anthony Liguori
  1 sibling, 1 reply; 82+ messages in thread
From: Anthony Liguori @ 2009-02-16  1:01 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: qemu-devel

Johannes Schindelin wrote:
> Hi,
>
> On Sun, 15 Feb 2009, Anthony Liguori wrote:
>
>   
>> And let's be clear, your data is *not* safe with qcow2.  So I don't 
>> consider this to be a show stopping issue.
>>     
>
> I beg your pardon?  The one format that was recommended for quite a long 
> time now is considered unsafe?
>   

It's always been that way.  It's unsafe for a number of reasons that 
have been discussed at great length.

Regards,

Anthony LIguori

> That would not have happened with Fabrice in charge,
> Dscho
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-15 18:34                   ` Johannes Schindelin
  2009-02-16  1:01                     ` Anthony Liguori
@ 2009-02-16  1:19                     ` Anthony Liguori
  1 sibling, 0 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-16  1:19 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: qemu-devel

Johannes Schindelin wrote:
> Hi,
>
> On Sun, 15 Feb 2009, Anthony Liguori wrote:
>
>   
>> And let's be clear, your data is *not* safe with qcow2.  So I don't 
>> consider this to be a show stopping issue.
>>     
>
> I beg your pardon?  The one format that was recommended for quite a long 
> time now is considered unsafe?
>   

Let me be abundantly clear here.  qcow2 has never been the "recommended" 
format IMHO.  When it was introduced, it was introduced as 
experimental.  This is why block-qcow2.c was forked instead of qcow2 
support being integrated into block-qcow.c.  There's a ton of duplicate 
code between the two.  The plan was to eventually merge the two once 
qcow2 stabilized.  That's not happened.  The code was never improved 
much since it's introduction until recently.

I have concerns about the fundamental design of qcow2.  I believe it 
will be difficult to make it safe while having acceptable performance.  
There has been some work recently by Gleb Natapov to reduce the 
corruption window in qcow2 but it's still there.

I can only recommend qcow2 to casual users.  It should *not* be used in 
production environments.  It pains me to say that because we don't have 
a good alternative but there's no way I could recommend its use.

In particular, Jamie's patch reverts one set of patches that reduces the 
corruption window to "fix" a corruption bug that is now being 
experience.  However, since we don't know exactly what the cause of the 
new bug is, it's not necessarily true that the revert fixes the bug.  It 
may just make it more difficult to expose.

There's really no winning scenario here except finding the root cause of 
the new bug and fixing it.  That still won't make qcow2 safe for 
production data though.  So as far as I'm concerned, until qcow2 is made 
completely safe, it's still an experimental feature.

Regards,

Anthony Liguori

> That would not have happened with Fabrice in charge,
> Dscho
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-16  1:01                     ` Anthony Liguori
@ 2009-02-17  0:52                       ` Jamie Lokier
  2009-02-17  2:55                         ` Anthony Liguori
  0 siblings, 1 reply; 82+ messages in thread
From: Jamie Lokier @ 2009-02-17  0:52 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> >>And let's be clear, your data is *not* safe with qcow2.  So I don't 
> >>consider this to be a show stopping issue.
> >
> >I beg your pardon?  The one format that was recommended for quite a long 
> >time now is considered unsafe?
> 
> It's always been that way.  It's unsafe for a number of reasons that 
> have been discussed at great length.

It sure isn't mentioned in the documentation.
If it was, I would never have used it, and I imagine I'm not alone.

QEMU might be an emulator project where people expect quirks, but KVM
and Xen are professional virtualisation platforms competing with
VMware.

It is really not very professional that the documentation places "your
data is not safe" formats on an equal footing with safe formats -
without saying anything about it - and doesn't even recommend one or
the other.

That said, maybe Microsoft is doing the same thing - their
documentation happily recommends their VHD format if you're not
concerned about running out of disk space, and it's maybe VHD has
similar corruption windows.

-- Jamie

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-15 18:19                 ` Anthony Liguori
  2009-02-15 18:34                   ` Johannes Schindelin
@ 2009-02-17  1:01                   ` Jamie Lokier
  1 sibling, 0 replies; 82+ messages in thread
From: Jamie Lokier @ 2009-02-17  1:01 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> >>>>It's QEMU SVN delta 5005-5006, copied below.
> >>>>        
> >>So why such an aggressive revert?  Why not just revert the problematic
> >>changesets?
> >
> >Because most of the following changes look too dependent on it.
> >  
> 
> Too dependent on the introduced functionality or too dependent to make 
> porting trivial?  My impression upon looking was that it's the later, 

My impression is the former, in that it seems necessary to understand
the changes in 5006 to understand how to rewrite subsequent patches
which use the changed functions.

But I didn't spend a long time on it, as I can't.  Of course all such
things reduce to trivial porting if you have enough time.

> But many of the changes since 5005 were also corruption fixes.  And 
> let's be clear, your data is *not* safe with qcow2.  So I don't consider 
> this to be a show stopping issue.

There's a HUGE difference between "not safe if the host/QEMU crashes"
and "corrupts silently during normal operation with no errors".

The former is a rare event we hope.

Marc's report, based apparently on a big farm of VMs, is that he
observes this corruption a lot with Windows guests.

The scary thing is it looks like it doesn't have anything (directly)
to do with the device emulation, which is more sensitive to guest OS
type.  I wonder if people using kvm >= 73 have silent corruption in
their Linux guests without noticing yet.

-- Jamie

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports
  2009-02-17  0:52                       ` Jamie Lokier
@ 2009-02-17  2:55                         ` Anthony Liguori
  0 siblings, 0 replies; 82+ messages in thread
From: Anthony Liguori @ 2009-02-17  2:55 UTC (permalink / raw)
  To: qemu-devel

Jamie Lokier wrote:
> Anthony Liguori wrote:
>   
>>>> And let's be clear, your data is *not* safe with qcow2.  So I don't 
>>>> consider this to be a show stopping issue.
>>>>         
>>> I beg your pardon?  The one format that was recommended for quite a long 
>>> time now is considered unsafe?
>>>       
>> It's always been that way.  It's unsafe for a number of reasons that 
>> have been discussed at great length.
>>     
>
> It sure isn't mentioned in the documentation.
> If it was, I would never have used it, and I imagine I'm not alone.
>
> QEMU might be an emulator project where people expect quirks, but KVM
> and Xen are professional virtualisation platforms competing with
> VMware.
>
> It is really not very professional that the documentation places "your
> data is not safe" formats on an equal footing with safe formats -
> without saying anything about it - and doesn't even recommend one or
> the other.
>   

Please submit patches.  I don't disagree with you and that is why I'm 
trying to make this clear now.

> That said, maybe Microsoft is doing the same thing - their
> documentation happily recommends their VHD format if you're not
> concerned about running out of disk space, and it's maybe VHD has
> similar corruption windows.
>   

Yeah, it's hard to make a truly reliable format that isn't raw.  It 
basically is the same problem file systems solve and requires either a 
journal or an fsck step.  I'm thinking that this is a problem for other 
software too.

Regards,

Anthony Liguori

> -- Jamie
>
>
>   

^ permalink raw reply	[flat|nested] 82+ messages in thread

end of thread, other threads:[~2009-02-17  2:55 UTC | newest]

Thread overview: 82+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-03 20:48 [Qemu-devel] Cutting a new QEMU release Anthony Liguori
2009-02-03 20:58 ` Glauber Costa
2009-02-03 21:35   ` Laurent Desnogues
2009-02-03 21:50     ` Anthony Liguori
2009-02-03 22:05       ` Laurent Desnogues
2009-02-03 22:47         ` Anthony Liguori
2009-02-03 23:48           ` Glauber Costa
2009-02-04 13:09       ` Ulrich Hecht
2009-02-04  0:31     ` David Turner
     [not found]     ` <74222928-D24B-4780-BDB0-D537A83C4F68@hotmail.com>
2009-02-04  5:08       ` C.W. Betts
2009-02-03 21:48 ` Rick Vernam
2009-02-03 22:07 ` Daniel P. Berrange
2009-02-04 14:50 ` Aurelien Jarno
2009-02-04 15:23   ` Tristan Gingold
2009-02-04 15:43     ` Lennart Sorensen
2009-02-04 16:01       ` Tristan Gingold
2009-02-04 18:17         ` [Qemu-devel] " Consul
2009-02-04 17:39   ` [Qemu-devel] " Blue Swirl
2009-02-04 17:50     ` Jonathan Kalbfeld
2009-02-04 20:07   ` Blue Swirl
2009-02-07 14:15   ` Stuart Brady
2009-02-04 15:58 ` Glauber Costa
2009-02-07 15:29 ` Shin-ichiro KAWASAKI
2009-02-11 21:49   ` Rob Landley
2009-02-12 14:44     ` Shin-ichiro KAWASAKI
2009-02-12 21:08       ` Rob Landley
2009-02-12 21:44       ` Rob Landley
2009-02-09 12:43 ` Mark McLoughlin
2009-02-09 21:36   ` Anthony Liguori
2009-02-10  0:47   ` Rob Landley
2009-02-10  7:22     ` M. Warner Losh
2009-02-13  8:40 ` Riku Voipio
2009-02-13  9:59   ` Stefano Stabellini
2009-02-13 16:30   ` Jamie Lokier
2009-02-13 17:00     ` Anthony Liguori
2009-02-13 19:04       ` [Qemu-devel] [PATCH] Revert block-qcow2.c to kvm-72 version due to corruption reports Jamie Lokier
2009-02-14 22:23         ` Dor Laor
2009-02-15  2:20           ` Jamie Lokier
2009-02-14 23:13         ` Anthony Liguori
2009-02-15  2:01           ` Jamie Lokier
2009-02-15  4:09             ` Anthony Liguori
2009-02-15 15:42               ` Jamie Lokier
2009-02-15 18:19                 ` Anthony Liguori
2009-02-15 18:34                   ` Johannes Schindelin
2009-02-16  1:01                     ` Anthony Liguori
2009-02-17  0:52                       ` Jamie Lokier
2009-02-17  2:55                         ` Anthony Liguori
2009-02-16  1:19                     ` Anthony Liguori
2009-02-17  1:01                   ` Jamie Lokier
  -- strict thread matches above, loose matches on Subject: below --
2009-02-05  9:13 [Qemu-devel] Re: Cutting a new QEMU release Steve Fosdick
2009-02-05 14:26 ` Anthony Liguori
2009-02-05 15:36   ` Rick Vernam
2009-02-05 16:27     ` Paul Brook
2009-02-05 17:15       ` René Rebe
2009-02-05 17:36         ` Paul Brook
2009-02-05 17:51           ` Daniel P. Berrange
2009-02-05 17:51       ` Ben Taylor
2009-02-05 18:39         ` René Rebe
2009-02-05 19:03         ` Anthony Liguori
2009-02-06 10:54           ` Steve Fosdick
2009-02-06 15:57             ` René Rebe
2009-02-06 17:12               ` Anthony Liguori
2009-02-06 21:47                 ` René Rebe
2009-02-07 16:49                 ` Jamie Lokier
2009-02-07 17:06                   ` Laurent Desnogues
2009-02-07 23:46                   ` Anthony Liguori
2009-02-06 21:53               ` René Rebe
2009-02-07 16:39             ` Jamie Lokier
     [not found]           ` <92CAE88C-36FF-4566-BD1D-ACA58C98CB0F@hotmail.com>
2009-02-09  5:01             ` C.W. Betts
     [not found]               ` <784D2534-F9CD-4EA5-BBEE-67E9DE196598@hotmail.com>
2009-02-09  5:42                 ` C.W. Betts
2009-02-09 10:29                   ` René Rebe
2009-02-15 15:25         ` Andreas Färber
2009-02-15 15:44           ` Jamie Lokier
2009-02-15 19:14             ` Andreas Färber
2009-02-15 18:17           ` Anthony Liguori
2009-02-15 20:31             ` Andreas Färber
2009-02-05 15:55   ` René Rebe
2009-02-07 12:01   ` Stefan Weil
2009-02-07 15:08     ` Anthony Liguori
2009-02-07 15:36     ` Jamie Lokier
2009-02-07 16:45       ` Jan Kiszka
2009-02-05 14:55 ` Rick Vernam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).