* [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors
@ 2008-08-05 11:55 Samuel Thibault
2008-08-06 2:26 ` Anthony Liguori
2008-08-06 16:21 ` [Qemu-devel] " Samuel Thibault
0 siblings, 2 replies; 9+ messages in thread
From: Samuel Thibault @ 2008-08-05 11:55 UTC (permalink / raw)
To: qemu-devel
report read/write errors to IDE guest driver as ECC errors
so that the guest knows that e.g. writes on read-only backends have
failed.
Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
Index: hw/ide.c
===================================================================
--- hw/ide.c (révision 4985)
+++ hw/ide.c (copie de travail)
@@ -891,7 +891,6 @@
return 1;
}
-/* XXX: handle errors */
static void ide_read_dma_cb(void *opaque, int ret)
{
BMDMAState *bm = opaque;
@@ -899,6 +898,14 @@
int n;
int64_t sector_num;
+ if (ret) {
+ s->status = READY_STAT | ERR_STAT;
+ s->error = ABRT_ERR | ECC_ERR;
+ s->nsector = 0;
+ ide_set_irq(s);
+ goto eot;
+ }
+
n = s->io_buffer_size >> 9;
sector_num = ide_get_sector(s);
if (n > 0) {
@@ -992,7 +999,6 @@
}
}
-/* XXX: handle errors */
static void ide_write_dma_cb(void *opaque, int ret)
{
BMDMAState *bm = opaque;
@@ -1000,6 +1006,14 @@
int n;
int64_t sector_num;
+ if (ret) {
+ s->status = READY_STAT | ERR_STAT;
+ s->error = ABRT_ERR | ECC_ERR;
+ s->nsector = 0;
+ ide_set_irq(s);
+ goto eot;
+ }
+
n = s->io_buffer_size >> 9;
sector_num = ide_get_sector(s);
if (n > 0) {
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors
2008-08-05 11:55 [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors Samuel Thibault
@ 2008-08-06 2:26 ` Anthony Liguori
2008-08-06 9:28 ` Daniel P. Berrange
2008-08-06 16:21 ` [Qemu-devel] " Samuel Thibault
1 sibling, 1 reply; 9+ messages in thread
From: Anthony Liguori @ 2008-08-06 2:26 UTC (permalink / raw)
To: qemu-devel
Samuel Thibault wrote:
> report read/write errors to IDE guest driver as ECC errors
> so that the guest knows that e.g. writes on read-only backends have
> failed.
>
> Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
>
I'm glad you sent this patch as I think it's something we really need to
improve in QEMU.
A patch like this has appeared on the list a number of times, and each
time it meets with a fair bit of criticism. The most cited issue is
that indiscriminately passing IO errors to the guest is not really
correct. If you're passing through a drive, then EIO errors are pretty
reasonable to pass through as an ECC error. If you're dealing with a
file, generating an ECC error on ENOSPC is not really accurate. The
guest cannot really do anything about that either and is likely to just
further corrupt itself.
I think a patch like this is good in theory but it needs to do a better
job only handling the case where an error occurs that ECC really makes
sense.
For things like ENOSPC, there are better error handling strategies. For
instance, vm_stop()'ing the guest and printing out an error message.
This would allow the admin to free up some space, and then resume the
guest. Even if you just stubbed these things out with FIXMEs, it would
be better than pretending that these cases didn't exist :-)
Regards,
Anthony Liguori
> Index: hw/ide.c
> ===================================================================
> --- hw/ide.c (révision 4985)
> +++ hw/ide.c (copie de travail)
> @@ -891,7 +891,6 @@
> return 1;
> }
>
> -/* XXX: handle errors */
> static void ide_read_dma_cb(void *opaque, int ret)
> {
> BMDMAState *bm = opaque;
> @@ -899,6 +898,14 @@
> int n;
> int64_t sector_num;
>
> + if (ret) {
> + s->status = READY_STAT | ERR_STAT;
> + s->error = ABRT_ERR | ECC_ERR;
> + s->nsector = 0;
> + ide_set_irq(s);
> + goto eot;
> + }
> +
> n = s->io_buffer_size >> 9;
> sector_num = ide_get_sector(s);
> if (n > 0) {
> @@ -992,7 +999,6 @@
> }
> }
>
> -/* XXX: handle errors */
> static void ide_write_dma_cb(void *opaque, int ret)
> {
> BMDMAState *bm = opaque;
> @@ -1000,6 +1006,14 @@
> int n;
> int64_t sector_num;
>
> + if (ret) {
> + s->status = READY_STAT | ERR_STAT;
> + s->error = ABRT_ERR | ECC_ERR;
> + s->nsector = 0;
> + ide_set_irq(s);
> + goto eot;
> + }
> +
> n = s->io_buffer_size >> 9;
> sector_num = ide_get_sector(s);
> if (n > 0) {
>
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors
2008-08-06 2:26 ` Anthony Liguori
@ 2008-08-06 9:28 ` Daniel P. Berrange
2008-08-06 12:22 ` Jamie Lokier
2008-08-11 16:47 ` Ian Jackson
0 siblings, 2 replies; 9+ messages in thread
From: Daniel P. Berrange @ 2008-08-06 9:28 UTC (permalink / raw)
To: qemu-devel
On Tue, Aug 05, 2008 at 09:26:14PM -0500, Anthony Liguori wrote:
> Samuel Thibault wrote:
> >report read/write errors to IDE guest driver as ECC errors
> >so that the guest knows that e.g. writes on read-only backends have
> >failed.
> >
> >Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
> >
>
> I'm glad you sent this patch as I think it's something we really need to
> improve in QEMU.
>
> A patch like this has appeared on the list a number of times, and each
> time it meets with a fair bit of criticism. The most cited issue is
> that indiscriminately passing IO errors to the guest is not really
> correct. If you're passing through a drive, then EIO errors are pretty
> reasonable to pass through as an ECC error. If you're dealing with a
> file, generating an ECC error on ENOSPC is not really accurate. The
> guest cannot really do anything about that either and is likely to just
> further corrupt itself.
If you have a journalling filesystem, the worst that'll happen in the
ENOSPC scenario is that you'll loose data from the open application files
that aren't flusshed to disk - no different to pulling the power plug.
The filesystem itself will not corrupt itself - it'll happily recover
the journal & carry on after rebootint.
> I think a patch like this is good in theory but it needs to do a better
> job only handling the case where an error occurs that ECC really makes
> sense.
>
> For things like ENOSPC, there are better error handling strategies. For
> instance, vm_stop()'ing the guest and printing out an error message.
> This would allow the admin to free up some space, and then resume the
> guest. Even if you just stubbed these things out with FIXMEs, it would
> be better than pretending that these cases didn't exist :-)
Unless someone wants to implement the ENOSPC handling right now, I'd
like to see this patch just committed as is, so we at least get
incremental benefit over current behaviour, which definitely *does*
corrupt guest filesystems by silently pretending the write succeeed.
Special ENOSPC handling can be added on top.
I agree that pausing the guest is probably best option in that scenario,
the interesting question being how to inform management tools/API that
the VM has just paused itself. In libvirt we handle pause/resume by doing
'stop'/'cont' in the QEMU monitor, and since we're triggering it ourselves
we can track the state change from running to paused. If the VM pauses
itself though we nee to figure out a way to detect this state change.
The monitor doesn't have any asynchronous notification capability as it
stands.
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors
2008-08-06 9:28 ` Daniel P. Berrange
@ 2008-08-06 12:22 ` Jamie Lokier
2008-08-11 16:47 ` Ian Jackson
1 sibling, 0 replies; 9+ messages in thread
From: Jamie Lokier @ 2008-08-06 12:22 UTC (permalink / raw)
To: Daniel P. Berrange, qemu-devel
Daniel P. Berrange wrote:
> If you have a journalling filesystem, the worst that'll happen in the
> ENOSPC scenario is that you'll loose data from the open application files
> that aren't flusshed to disk - no different to pulling the power plug.
> The filesystem itself will not corrupt itself - it'll happily recover
> the journal & carry on after rebootint.
So the filesystem's ok but the application's files are corrupt -
doesn't sound too good :-)
Journalling filesystems are supposed to be robust against sudden
reboot/power failure (despite this basic expectation, Linux ext3 is
not robust against power failure by default).
Journalled filesystems should also be robust against I/O errors, but
in fact that would require an IOP sequence like WRITE, BARRIER, WRITE
to abort the second WRITE if the first one fails with an I/O error.
Linux does not abort the second WRITE - and therefore an isolated
write I/O error can result in filesystem corruption on all its
journalled filesystems. When TCQ/NCQ are used, all commands my be in
flight concurrently, I'm not sure if it's even possible to auto-abort
the second WRITE when the first errors, in any guest.
(There are also weaknesses in Linux's handling of I/O errors in the
VM, discussed recently with a "sweep it under the carpet, handling I/O
errors properly in the VM is too hard" conclusion.)
I wouldn't be surprised if other guests have similar weaknesses.
Solaris ZFS may be an exception, as they claim to have thoroughly
tested it with simulated I/O errors.
Therefore, at least, when QEMU reports a write I/O error due to ENOSPC
(and perhaps due to EIO), it should set a sticky flag so that all
subsequent writes error without trying to write.
> Unless someone wants to implement the ENOSPC handling right now, I'd
> like to see this patch just committed as is, so we at least get
> incremental benefit over current behaviour, which definitely *does*
> corrupt guest filesystems by silently pretending the write succeeed.
> Special ENOSPC handling can be added on top.
I suggest adding a sticky flag: Once hit ENOSPC (due to extending
qcow2), all further writes should fail even if they don't need to
extend the file. This will prevent some kinds of guest journalled
filesystem corruption.
> I agree that pausing the guest is probably best option in that scenario,
> the interesting question being how to inform management tools/API that
> the VM has just paused itself. In libvirt we handle pause/resume by doing
> 'stop'/'cont' in the QEMU monitor, and since we're triggering it ourselves
> we can track the state change from running to paused. If the VM pauses
> itself though we nee to figure out a way to detect this state change.
> The monitor doesn't have any asynchronous notification capability as it
> stands.
It does have the log file, I suppose, or it could poll the CPU state
every so often. Not the prettiest mechanisms.
-- Jamie
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Qemu-devel] Re: [PATCH] report read/write errors to IDE guest driver as ECC errors
2008-08-05 11:55 [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors Samuel Thibault
2008-08-06 2:26 ` Anthony Liguori
@ 2008-08-06 16:21 ` Samuel Thibault
1 sibling, 0 replies; 9+ messages in thread
From: Samuel Thibault @ 2008-08-06 16:21 UTC (permalink / raw)
To: qemu-devel
Samuel Thibault, le Tue 05 Aug 2008 12:55:06 +0100, a écrit :
> report read/write errors to IDE guest driver as ECC errors
> so that the guest knows that e.g. writes on read-only backends have
> failed.
Ah, wait, Ian actually has a more complete patch.
Samuel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors
2008-08-06 9:28 ` Daniel P. Berrange
2008-08-06 12:22 ` Jamie Lokier
@ 2008-08-11 16:47 ` Ian Jackson
2008-08-11 17:01 ` [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errorsj Samuel Thibault
2008-08-11 17:05 ` [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors Anthony Liguori
1 sibling, 2 replies; 9+ messages in thread
From: Ian Jackson @ 2008-08-11 16:47 UTC (permalink / raw)
To: Daniel P. Berrange, qemu-devel
Daniel P. Berrange writes ("Re: [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors"):
> Unless someone wants to implement the ENOSPC handling right now, I'd
> like to see this patch just committed as is, so we at least get
> incremental benefit over current behaviour, which definitely *does*
> corrupt guest filesystems by silently pretending the write succeeed.
Absolutely.
> Special ENOSPC handling can be added on top.
> I agree that pausing the guest is probably best option in that scenario,
I disagree. Most reasonable guests will have special handling for
write failures on their disks. For example, Linux will (in the
default setup) remount the fs readonly precisely to prevent
corruption.
Pausing the guest denies the guest the ability to take whatever action
it really wants to.
Ian.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errorsj
2008-08-11 16:47 ` Ian Jackson
@ 2008-08-11 17:01 ` Samuel Thibault
2008-08-11 17:56 ` Jamie Lokier
2008-08-11 17:05 ` [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors Anthony Liguori
1 sibling, 1 reply; 9+ messages in thread
From: Samuel Thibault @ 2008-08-11 17:01 UTC (permalink / raw)
To: qemu-devel
Ian Jackson, le Mon 11 Aug 2008 17:47:20 +0100, a écrit :
> > Special ENOSPC handling can be added on top.
> > I agree that pausing the guest is probably best option in that scenario,
>
> I disagree. Most reasonable guests will have special handling for
> write failures on their disks. For example, Linux will (in the
> default setup) remount the fs readonly precisely to prevent
> corruption.
Actually no, see the thread
"ext3 seems to ignore ECC errors"
http://marc.info/?l=linux-kernel&m=121786230719154&w=2
> Pausing the guest denies the guest the ability to take whatever action
> it really wants to.
Sure, but I'm afraid of e.g. guests that would handle it by thinking
"bad sector, let's mark it as such and try another one".
Samuel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors
2008-08-11 16:47 ` Ian Jackson
2008-08-11 17:01 ` [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errorsj Samuel Thibault
@ 2008-08-11 17:05 ` Anthony Liguori
1 sibling, 0 replies; 9+ messages in thread
From: Anthony Liguori @ 2008-08-11 17:05 UTC (permalink / raw)
To: qemu-devel
Ian Jackson wrote:
> Daniel P. Berrange writes ("Re: [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors"):
>
>> Unless someone wants to implement the ENOSPC handling right now, I'd
>> like to see this patch just committed as is, so we at least get
>> incremental benefit over current behaviour, which definitely *does*
>> corrupt guest filesystems by silently pretending the write succeeed.
>>
>
> Absolutely.
>
>
>> Special ENOSPC handling can be added on top.
>> I agree that pausing the guest is probably best option in that scenario,
>>
>
> I disagree. Most reasonable guests will have special handling for
> write failures on their disks. For example, Linux will (in the
> default setup) remount the fs readonly precisely to prevent
> corruption.
>
But corruption isn't the issue. The issue is that you're out of disk
space. If using qcow, an attempt to remount the fs may result in even
more errors if the qcow file has to expand (due to CoW).
Consider the case of guest RAID. Once the ECC errors are detected, the
guest will start trying to rebuild the failed drive and since we're
already out of space, this is going to be catastrophic. This is why
ENOSPC should be handled differently than EIO.
> Pausing the guest denies the guest the ability to take whatever action
> it really wants to.
>
I agree for EIO, but a guest cannot take any actions when ENOSPC occurs
because it has no concept that such a thing can occur.
Regards,
Anthony Liguori
> Ian.
>
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errorsj
2008-08-11 17:01 ` [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errorsj Samuel Thibault
@ 2008-08-11 17:56 ` Jamie Lokier
0 siblings, 0 replies; 9+ messages in thread
From: Jamie Lokier @ 2008-08-11 17:56 UTC (permalink / raw)
To: qemu-devel
Samuel Thibault wrote:
> Actually no, see the thread
> "ext3 seems to ignore ECC errors"
> http://marc.info/?l=linux-kernel&m=121786230719154&w=2
>
> > Pausing the guest denies the guest the ability to take whatever action
> > it really wants to.
>
> Sure, but I'm afraid of e.g. guests that would handle it by thinking
> "bad sector, let's mark it as such and try another one".
Yes, there are guests which queue writes which must be ordered, too,
and if an early one fails, they still submit the later ones.
It would be good if QEMU, having failed one write due to its metadata
issues, could fail _all_ subsequent writes to that disk until told
it's ok. No point having a few scattershot sectors written.
-- Jamie
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-08-11 17:56 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-05 11:55 [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors Samuel Thibault
2008-08-06 2:26 ` Anthony Liguori
2008-08-06 9:28 ` Daniel P. Berrange
2008-08-06 12:22 ` Jamie Lokier
2008-08-11 16:47 ` Ian Jackson
2008-08-11 17:01 ` [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errorsj Samuel Thibault
2008-08-11 17:56 ` Jamie Lokier
2008-08-11 17:05 ` [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors Anthony Liguori
2008-08-06 16:21 ` [Qemu-devel] " Samuel Thibault
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).