qemu-devel.nongnu.org archive mirror
* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
       [not found] <CAMrG31z=oy-53Lfya4svhNniD_7Q1YETuHeZsotHj8U5xJNYmw@mail.gmail.com>
@ 2014-03-27  6:41 ` Michael S. Tsirkin
  2014-03-27  7:36   ` Markus Armbruster
  0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2014-03-27  6:41 UTC (permalink / raw)
  To: Alejandro Comisario
  Cc: kvm, ghammer, Stefan Hajnoczi, Jason Wang, linux-kernel,
	qemu-devel

On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> Hi List!
> Hope some one can help me, we had a big issue in our cloud the other
> day, a couple of our openstack regions ( +2000 kvm guests with qcow2 )
> went read only filesystem from the guest side because the backing
> files directory (the openstack _base directory) was compromised and
> the data was lost, when we realized the data was lost, it took us 5
> mins to restore the backup of the backing files, but by that time all
> the kvm guests received some kind of IO error from the hypervisor
> layer, and went read only on root filesystem.
> 
> My question would be, is there a way to hold the IO operations against
> the backing files ( i thought that would be 99% READ operations ) for
> a little longer ( im asking this because i dont quite understand what
> is the process and when it raises the error ) in a case the backing
> files are missing (no IO possible) but is recoverable within minutes ?
> 
> Any tip  on how to achieve this if possible, or information about how
> backing files works on kvm, will be amazing.
> Waiting for feedback!
> 
> kindest regards.
> Alejandro Comisario


I'm guessing this is what happened: guests timed out meanwhile.
You can increase the timeout within the guest:
echo 600 > /sys/block/sda/device/timeout
to timeout after 10 minutes.
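
A minimal sketch of applying that setting to every disk at once, to be
run inside the guest as root. It assumes nothing beyond the sysfs path
shown above; SYSFS_ROOT is a hypothetical override that exists only so
the function can be tried against a fake directory tree. Disks that do
not expose a "timeout" attribute are skipped.

```shell
#!/bin/sh
# Raise the I/O timeout on every block device that exposes one.
# SYSFS_ROOT is a hypothetical override for experiments; it defaults to /sys.
set_disk_timeouts() {
    seconds=$1
    root=${SYSFS_ROOT:-/sys}
    for f in "$root"/block/*/device/timeout; do
        # The glob stays unexpanded when nothing matches; skip that case.
        [ -e "$f" ] || continue
        if echo "$seconds" > "$f"; then
            echo "set $f to $seconds"
        else
            echo "could not write $f" >&2
        fi
    done
}

# Example: set_disk_timeouts 600
```

The value is not persistent across reboots, so it would have to be
reapplied at boot (for example from an init script or a udev rule).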

If you have installed qemu guest agent on your system, you can do this
from the host. Unfortunately, by default its memory can be pushed out to swap,
and then, on a disk error, accessing it might fail :(
Maybe we should consider mlock on all its memory at least as an option.

You could pause your guests, restart them after the issue is resolved,
and we could I guess add functionality to pause VM on disk errors
automatically.
Stefan?
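
The manual pause-and-resume approach could look roughly like this on a
libvirt-managed host. This is only a sketch: it assumes the guests are
managed through virsh (the thread does not say so at this point), and
the function names are made up for illustration.

```shell
#!/bin/sh
# Run one virsh action on every guest that is currently in a given state.
for_each_guest() {
    action=$1   # "suspend" or "resume"
    state=$2    # "--state-running" or "--state-paused"
    virsh list --name "$state" | while read -r guest; do
        # "virsh list --name" ends with an empty line; ignore it.
        [ -n "$guest" ] || continue
        virsh "$action" "$guest"
    done
}

pause_all_guests()  { for_each_guest suspend --state-running; }
resume_all_guests() { for_each_guest resume --state-paused; }
```

Note that resume_all_guests resumes every paused guest, including any
that were paused for unrelated reasons.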


-- 
MST


* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
  2014-03-27  6:41 ` [Qemu-devel] Massive read only kvm guests when backing file was missing Michael S. Tsirkin
@ 2014-03-27  7:36   ` Markus Armbruster
  2014-03-27  8:10     ` Michael S. Tsirkin
  0 siblings, 1 reply; 11+ messages in thread
From: Markus Armbruster @ 2014-03-27  7:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, ghammer, Stefan Hajnoczi, Jason Wang, linux-kernel,
	qemu-devel, Alejandro Comisario

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
>> Hi List!
>> Hope some one can help me, we had a big issue in our cloud the other
>> day, a couple of our openstack regions ( +2000 kvm guests with qcow2 )
>> went read only filesystem from the guest side because the backing
>> files directory (the openstack _base directory) was compromised and
>> the data was lost, when we realized the data was lost, it took us 5
>> mins to restore the backup of the backing files, but by that time all
>> the kvm guests received some kind of IO error from the hypervisor
>> layer, and went read only on root filesystem.
>> 
>> My question would be, is there a way to hold the IO operations against
>> the backing files ( i thought that would be 99% READ operations ) for
>> a little longer ( im asking this because i dont quite understand what
>> is the process and when it raises the error ) in a case the backing
>> files are missing (no IO possible) but is recoverable within minutes ?
>> 
>> Any tip  on how to achieve this if possible, or information about how
>> backing files works on kvm, will be amazing.
>> Waiting for feedback!
>> 
>> kindest regards.
>> Alejandro Comisario
>
>
> I'm guessing this is what happened: guests timed out meanwhile.
> You can increase the timeout within the guest:
> echo 600 > /sys/block/sda/device/timeout
> to timeout after 10 minutes.
>
> If you have installed qemu guest agent on your system, you can do this
> from the host. Unfortunately, by default its memory can be pushed out to swap,
> and then, on a disk error, accessing it might fail :(
> Maybe we should consider mlock on all its memory at least as an option.
>
> You could pause your guests, restart them after the issue is resolved,
> and we could I guess add functionality to pause VM on disk errors
> automatically.
> Stefan?

Would -drive rerror=stop do?
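
For reference, a sketch of what such a drive definition might look
like. Only rerror=stop comes from the question above; werror=stop, the
image path, if=virtio and format=qcow2 are assumptions added for
illustration, and drive_opts is a made-up helper name.

```shell
#!/bin/sh
# Build a -drive option string that pauses the VM on read and write errors
# instead of reporting them to the guest.
# if=virtio and format=qcow2 are placeholders, not taken from the thread.
drive_opts() {
    image=$1
    printf '%s' "file=$image,if=virtio,format=qcow2,rerror=stop,werror=stop"
}

# Example invocation (not executed here):
#   qemu-system-x86_64 -enable-kvm -m 1024 \
#       -drive "$(drive_opts /path/to/guest.qcow2)"
```

With this policy the VM is paused when a request fails; once the
storage is back it can be continued with "cont" in the QEMU monitor.
Under libvirt, the corresponding settings should be the error_policy
and rerror_policy attributes of the disk's driver element.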


* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
  2014-03-27  7:36   ` Markus Armbruster
@ 2014-03-27  8:10     ` Michael S. Tsirkin
  2014-03-27  8:53       ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2014-03-27  8:10 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: kvm, ghammer, Stefan Hajnoczi, Jason Wang, linux-kernel,
	qemu-devel, Alejandro Comisario

On Thu, Mar 27, 2014 at 08:36:57AM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> >> Hi List!
> >> Hope some one can help me, we had a big issue in our cloud the other
> >> day, a couple of our openstack regions ( +2000 kvm guests with qcow2 )
> >> went read only filesystem from the guest side because the backing
> >> files directory (the openstack _base directory) was compromised and
> >> the data was lost, when we realized the data was lost, it took us 5
> >> mins to restore the backup of the backing files, but by that time all
> >> the kvm guests received some kind of IO error from the hypervisor
> >> layer, and went read only on root filesystem.
> >> 
> >> My question would be, is there a way to hold the IO operations against
> >> the backing files ( i thought that would be 99% READ operations ) for
> >> a little longer ( im asking this because i dont quite understand what
> >> is the process and when it raises the error ) in a case the backing
> >> files are missing (no IO possible) but is recoverable within minutes ?
> >> 
> >> Any tip  on how to achieve this if possible, or information about how
> >> backing files works on kvm, will be amazing.
> >> Waiting for feedback!
> >> 
> >> kindest regards.
> >> Alejandro Comisario
> >
> >
> > I'm guessing this is what happened: guests timed out meanwhile.
> > You can increase the timeout within the guest:
> > echo 600 > /sys/block/sda/device/timeout
> > to timeout after 10 minutes.
> >
> > If you have installed qemu guest agent on your system, you can do this
> > from the host. Unfortunately, by default its memory can be pushed out to swap,
> > and then, on a disk error, accessing it might fail :(
> > Maybe we should consider mlock on all its memory at least as an option.
> >
> > You could pause your guests, restart them after the issue is resolved,
> > and we could I guess add functionality to pause VM on disk errors
> > automatically.
> > Stefan?
> 
> Would -drive rerror=stop do?

I think it will. It's a pity it doesn't appear in --help output -
would make it easier to find.

-- 
MST


* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
  2014-03-27  8:10     ` Michael S. Tsirkin
@ 2014-03-27  8:53       ` Stefan Hajnoczi
  2014-03-27 16:13         ` Alejandro Comisario
  2014-03-27 16:14         ` Alejandro Comisario
  0 siblings, 2 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-03-27  8:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, kvm, ghammer, Jason Wang, linux-kernel,
	Markus Armbruster, Alejandro Comisario

On Thu, Mar 27, 2014 at 10:10:40AM +0200, Michael S. Tsirkin wrote:
> On Thu, Mar 27, 2014 at 08:36:57AM +0100, Markus Armbruster wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > 
> > > On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> > >> Hi List!
> > >> Hope some one can help me, we had a big issue in our cloud the other
> > >> day, a couple of our openstack regions ( +2000 kvm guests with qcow2 )
> > >> went read only filesystem from the guest side because the backing
> > >> files directory (the openstack _base directory) was compromised and
> > >> the data was lost, when we realized the data was lost, it took us 5
> > >> mins to restore the backup of the backing files, but by that time all
> > >> the kvm guests received some kind of IO error from the hypervisor
> > >> layer, and went read only on root filesystem.
> > >> 
> > >> My question would be, is there a way to hold the IO operations against
> > >> the backing files ( i thought that would be 99% READ operations ) for
> > >> a little longer ( im asking this because i dont quite understand what
> > >> is the process and when it raises the error ) in a case the backing
> > >> files are missing (no IO possible) but is recoverable within minutes ?
> > >> 
> > >> Any tip  on how to achieve this if possible, or information about how
> > >> backing files works on kvm, will be amazing.
> > >> Waiting for feedback!
> > >> 
> > >> kindest regards.
> > >> Alejandro Comisario
> > >
> > >
> > > I'm guessing this is what happened: guests timed out meanwhile.
> > > You can increase the timeout within the guest:
> > > echo 600 > /sys/block/sda/device/timeout
> > > to timeout after 10 minutes.
> > >
> > > If you have installed qemu guest agent on your system, you can do this
> > > from the host. Unfortunately, by default its memory can be pushed out to swap,
> > > and then, on a disk error, accessing it might fail :(
> > > Maybe we should consider mlock on all its memory at least as an option.
> > >
> > > You could pause your guests, restart them after the issue is resolved,
> > > and we could I guess add functionality to pause VM on disk errors
> > > automatically.
> > > Stefan?
> > 
> > Would -drive rerror=stop do?
> 
> I think it will. It's a pity it doesn't appear in --help output -
> would make it easier to find.

It is documented on the man page.  I'll send a patch to document it in
the --help output too.

But there's still a problem because the guest can have a shorter timeout
or the image may be NFS mounted on the host.  In that case the guest may
give up on the request before the host.  Then there is nothing QEMU can
do to avoid an error being returned to the application or the guest file
system going into read-only mode.

So make sure the timeout inside the guest is high.

Stefan


* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
  2014-03-27  8:53       ` Stefan Hajnoczi
@ 2014-03-27 16:13         ` Alejandro Comisario
  2014-03-27 16:14         ` Alejandro Comisario
  1 sibling, 0 replies; 11+ messages in thread
From: Alejandro Comisario @ 2014-03-27 16:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, kvm, Michael S. Tsirkin, ghammer, Jason Wang,
	linux-kernel, Markus Armbruster


Seems like virtio (kvm 1.0) doesn't expose a timeout on the guest side (Ubuntu
12.04 on host and guest).
So, how can I adjust the timeout on the guest?

This solution is the most logical one, but i cannot apply it!
thanks for all the responses!

regards



Alejandro Comisario
*MercadoLibre Cloud Services*
Arias 3751, Piso 7 (C1430CRG)
Ciudad de Buenos Aires - Argentina
Cel: +549(11) 15-3770-1857
Tel : +54(11) 4640-8443


On Thu, Mar 27, 2014 at 5:53 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Thu, Mar 27, 2014 at 10:10:40AM +0200, Michael S. Tsirkin wrote:
> > On Thu, Mar 27, 2014 at 08:36:57AM +0100, Markus Armbruster wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > >
> > > > On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> > > >> Hi List!
> > > >> Hope some one can help me, we had a big issue in our cloud the other
> > > >> day, a couple of our openstack regions ( +2000 kvm guests with qcow2 )
> > > >> went read only filesystem from the guest side because the backing
> > > >> files directory (the openstack _base directory) was compromised and
> > > >> the data was lost, when we realized the data was lost, it took us 5
> > > >> mins to restore the backup of the backing files, but by that time all
> > > >> the kvm guests received some kind of IO error from the hypervisor
> > > >> layer, and went read only on root filesystem.
> > > >>
> > > >> My question would be, is there a way to hold the IO operations against
> > > >> the backing files ( i thought that would be 99% READ operations ) for
> > > >> a little longer ( im asking this because i dont quite understand what
> > > >> is the process and when it raises the error ) in a case the backing
> > > >> files are missing (no IO possible) but is recoverable within minutes ?
> > > >>
> > > >> Any tip  on how to achieve this if possible, or information about how
> > > >> backing files works on kvm, will be amazing.
> > > >> Waiting for feedback!
> > > >>
> > > >> kindest regards.
> > > >> Alejandro Comisario
> > > >
> > > >
> > > > I'm guessing this is what happened: guests timed out meanwhile.
> > > > You can increase the timeout within the guest:
> > > > echo 600 > /sys/block/sda/device/timeout
> > > > to timeout after 10 minutes.
> > > >
> > > > If you have installed qemu guest agent on your system, you can do this
> > > > from the host. Unfortunately, by default its memory can be pushed out to swap,
> > > > and then, on a disk error, accessing it might fail :(
> > > > Maybe we should consider mlock on all its memory at least as an option.
> > > >
> > > > You could pause your guests, restart them after the issue is resolved,
> > > > and we could I guess add functionality to pause VM on disk errors
> > > > automatically.
> > > > Stefan?
> > >
> > > Would -drive rerror=stop do?
> >
> > I think it will. It's a pity it doesn't appear in --help output -
> > would make it easier to find.
>
> It is documented on the man page.  I'll send a patch to document it in
> the --help output too.
>
> But there's still a problem because the guest can have a shorter timeout
> or the image may be NFS mounted on the host.  In that case the guest may
> give up on the request before the host.  Then there is nothing QEMU can
> do to avoid an error being returned to the application or the guest file
> system going into read-only mode.
>
> So make sure the timeout inside the guest is high.
>
> Stefan
>



* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
  2014-03-27  8:53       ` Stefan Hajnoczi
  2014-03-27 16:13         ` Alejandro Comisario
@ 2014-03-27 16:14         ` Alejandro Comisario
  2014-03-28  7:01           ` Michael Tokarev
  1 sibling, 1 reply; 11+ messages in thread
From: Alejandro Comisario @ 2014-03-27 16:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, kvm, Michael S. Tsirkin, ghammer, Jason Wang,
	linux-kernel, Markus Armbruster

Seems like virtio (kvm 1.0) doesn't expose a timeout on the guest side
(Ubuntu 12.04 on host and guest).
So, how can I adjust the timeout on the guest?

This solution is the most logical one, but i cannot apply it!
thanks for all the responses!

regards


Alejandro Comisario
MercadoLibre Cloud Services
Arias 3751, Piso 7 (C1430CRG)
Ciudad de Buenos Aires - Argentina
Cel: +549(11) 15-3770-1857
Tel : +54(11) 4640-8443


On Thu, Mar 27, 2014 at 5:53 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Mar 27, 2014 at 10:10:40AM +0200, Michael S. Tsirkin wrote:
>> On Thu, Mar 27, 2014 at 08:36:57AM +0100, Markus Armbruster wrote:
>> > "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >
>> > > On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
>> > >> Hi List!
>> > >> Hope some one can help me, we had a big issue in our cloud the other
>> > >> day, a couple of our openstack regions ( +2000 kvm guests with qcow2 )
>> > >> went read only filesystem from the guest side because the backing
>> > >> files directory (the openstack _base directory) was compromised and
>> > >> the data was lost, when we realized the data was lost, it took us 5
>> > >> mins to restore the backup of the backing files, but by that time all
>> > >> the kvm guests received some kind of IO error from the hypervisor
>> > >> layer, and went read only on root filesystem.
>> > >>
>> > >> My question would be, is there a way to hold the IO operations against
>> > >> the backing files ( i thought that would be 99% READ operations ) for
>> > >> a little longer ( im asking this because i dont quite understand what
>> > >> is the process and when it raises the error ) in a case the backing
>> > >> files are missing (no IO possible) but is recoverable within minutes ?
>> > >>
>> > >> Any tip  on how to achieve this if possible, or information about how
>> > >> backing files works on kvm, will be amazing.
>> > >> Waiting for feedback!
>> > >>
>> > >> kindest regards.
>> > >> Alejandro Comisario
>> > >
>> > >
>> > > I'm guessing this is what happened: guests timed out meanwhile.
>> > > You can increase the timeout within the guest:
>> > > echo 600 > /sys/block/sda/device/timeout
>> > > to timeout after 10 minutes.
>> > >
>> > > If you have installed qemu guest agent on your system, you can do this
>> > > from the host. Unfortunately, by default its memory can be pushed out to swap,
>> > > and then, on a disk error, accessing it might fail :(
>> > > Maybe we should consider mlock on all its memory at least as an option.
>> > >
>> > > You could pause your guests, restart them after the issue is resolved,
>> > > and we could I guess add functionality to pause VM on disk errors
>> > > automatically.
>> > > Stefan?
>> >
>> > Would -drive rerror=stop do?
>>
>> I think it will. It's a pity it doesn't appear in --help output -
>> would make it easier to find.
>
> It is documented on the man page.  I'll send a patch to document it in
> the --help output too.
>
> But there's still a problem because the guest can have a shorter timeout
> or the image may be NFS mounted on the host.  In that case the guest may
> give up on the request before the host.  Then there is nothing QEMU can
> do to avoid an error being returned to the application or the guest file
> system going into read-only mode.
>
> So make sure the timeout inside the guest is high.
>
> Stefan


* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
  2014-03-27 16:14         ` Alejandro Comisario
@ 2014-03-28  7:01           ` Michael Tokarev
  2014-03-28  8:47             ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Tokarev @ 2014-03-28  7:01 UTC (permalink / raw)
  To: Alejandro Comisario
  Cc: kvm, Michael S. Tsirkin, ghammer, Stefan Hajnoczi, Jason Wang,
	qemu-devel, linux-kernel, Markus Armbruster

27.03.2014 20:14, Alejandro Comisario wrote:
> Seems like virtio (kvm 1.0) doesn't expose a timeout on the guest side
> (Ubuntu 12.04 on host and guest).
> So, how can I adjust the timeout on the guest?

After some more discussion on IRC yesterday, it turned out that the situation
is _much_ more "interesting" than originally described.  The OP claims to
have 10500 guests running off an NFS server, and that after NFS server
downtime, the "backing files" had disappeared (whatever that means), so
they had to restore those files.  Moreover, the OP didn't even bother to look
at the guest's dmesg, being busy rebooting all 10500 guests.

> This solution is the most logical one, but i cannot apply it!
> thanks for all the responses!

I suggested that the OP describe the _real_ situation, instead of
giving random half-pictures, and take a look at the actual problem
as reported in various places (most importantly the guest kernel log), and
report _those_ hints to the list.  I also mentioned that, at least for some
NFS servers, if a client has a file open on the server, and this file is
deleted, the server will report an error to the client when the client tries
to access that file, and this has nothing at all to do with timeouts of any
kind.
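
As a rough sketch of the kind of log check meant here: a filter that
pulls the lines which typically accompany a disk failure out of a
kernel log. The patterns are illustrative assumptions, not an
exhaustive or authoritative list, and disk_error_lines is a made-up
name.

```shell
#!/bin/sh
# Read a kernel log on stdin and keep only lines that hint at disk trouble:
# block-layer I/O errors, a read-only remount, or a stale NFS file handle.
disk_error_lines() {
    grep -iE 'I/O error|remount.*read-only|stale (NFS )?file handle'
}

# Inside a guest:          dmesg | disk_error_lines
# On the hypervisor host:  dmesg | disk_error_lines
```

Running it both in a guest and on the host that mounts the NFS export
would show whether the guest saw plain I/O errors and whether the host
saw a stale file handle rather than a timeout.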

Thanks,

/mjt


* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
  2014-03-28  7:01           ` Michael Tokarev
@ 2014-03-28  8:47             ` Stefan Hajnoczi
  2014-04-01  0:51               ` Alejandro Comisario
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-03-28  8:47 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: kvm, Michael S. Tsirkin, ghammer, Markus Armbruster, Jason Wang,
	qemu-devel, linux-kernel, Alejandro Comisario

On Fri, Mar 28, 2014 at 11:01:00AM +0400, Michael Tokarev wrote:
> 27.03.2014 20:14, Alejandro Comisario wrote:
> > Seems like virtio (kvm 1.0) doesn't expose a timeout on the guest side
> > (Ubuntu 12.04 on host and guest).
> > So, how can I adjust the timeout on the guest?
> 
> After some more discussion on IRC yesterday, it turned out that the situation
> is _much_ more "interesting" than originally described.  The OP claims to
> have 10500 guests running off an NFS server, and that after NFS server
> downtime, the "backing files" had disappeared (whatever that means), so
> they had to restore those files.  Moreover, the OP didn't even bother to look
> at the guest's dmesg, being busy rebooting all 10500 guests.
> 
> > This solution is the most logical one, but i cannot apply it!
> > thanks for all the responses!
> 
> I suggested that the OP describe the _real_ situation, instead of
> giving random half-pictures, and take a look at the actual problem
> as reported in various places (most importantly the guest kernel log), and
> report _those_ hints to the list.  I also mentioned that, at least for some
> NFS servers, if a client has a file open on the server, and this file is
> deleted, the server will report an error to the client when the client tries
> to access that file, and this has nothing at all to do with timeouts of any
> kind.

Thanks for the update and for taking time to help on IRC.  I feel you're
being harsh on Alejandro though.

Improving the quality of bug reports is important but it shouldn't be at
the expense of quality of communication.  We can't assume that everyone
is an expert in troubleshooting KVM or Linux.  Therefore we can't blame
them, which will only drive people away and detract from the community.

TL;DR post logs and error messages +1, berate him -1

Stefan


* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
  2014-03-28  8:47             ` Stefan Hajnoczi
@ 2014-04-01  0:51               ` Alejandro Comisario
  2014-04-01 13:52                 ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Alejandro Comisario @ 2014-04-01  0:51 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kvm, Michael S. Tsirkin, ghammer, Markus Armbruster, Jason Wang,
	Michael Tokarev, qemu-devel, linux-kernel

Thanks Stefan and thanks Michael also.

That situation on IRC was very particular: I didn't want to tell
Michael "hey, everyone on the mailing list got it and I'm here chatting
with you and you didn't", so I assumed IRC was 9999999999999 times more
pro than the mailing list, and I decided to keep my head down and
assume the communication error was on my side.

Still, IMHO, I really believe that if you are a user willing to give
KVM enough of a chance to ask a question on IRC, you might feel you
are not geek enough to be there, and I don't mean being there on IRC,
but trying to use the community to support you while you try KVM.

In my case, while it was very important to understand what my options
were regarding this issue, I knew I would find my answer no matter
what, because I was determined to find it. I could go mad with 10.5K
guests on my back; yes, my experience was more from the "virsh stop;
virsh start" side, but still I felt I needed you guys to try to
figure this out.

Again, thanks to everyone.

best.
Alejandro Comisario


On Fri, Mar 28, 2014 at 5:47 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Fri, Mar 28, 2014 at 11:01:00AM +0400, Michael Tokarev wrote:
>> 27.03.2014 20:14, Alejandro Comisario wrote:
>> > Seems like virtio (kvm 1.0) doesn't expose a timeout on the guest side
>> > (Ubuntu 12.04 on host and guest).
>> > So, how can I adjust the timeout on the guest?
>>
>> After some more discussion on IRC yesterday, it turned out that the situation
>> is _much_ more "interesting" than originally described.  The OP claims to
>> have 10500 guests running off an NFS server, and that after NFS server
>> downtime, the "backing files" had disappeared (whatever that means), so
>> they had to restore those files.  Moreover, the OP didn't even bother to look
>> at the guest's dmesg, being busy rebooting all 10500 guests.
>>
>> > This solution is the most logical one, but i cannot apply it!
>> > thanks for all the responses!
>>
>> I suggested that the OP describe the _real_ situation, instead of
>> giving random half-pictures, and take a look at the actual problem
>> as reported in various places (most importantly the guest kernel log), and
>> report _those_ hints to the list.  I also mentioned that, at least for some
>> NFS servers, if a client has a file open on the server, and this file is
>> deleted, the server will report an error to the client when the client tries
>> to access that file, and this has nothing at all to do with timeouts of any
>> kind.
>
> Thanks for the update and for taking time to help on IRC.  I feel you're
> being harsh on Alejandro though.
>
> Improving the quality of bug reports is important but it shouldn't be at
> the expense of quality of communication.  We can't assume that everyone
> is an expert in troubleshooting KVM or Linux.  Therefore we can't blame
> them, which will only drive people away and detract from the community.
>
> TL;DR post logs and error messages +1, berate him -1
>
> Stefan


* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
  2014-04-01  0:51               ` Alejandro Comisario
@ 2014-04-01 13:52                 ` Stefan Hajnoczi
  2014-04-01 14:09                   ` Alejandro Comisario
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-04-01 13:52 UTC (permalink / raw)
  To: Alejandro Comisario
  Cc: kvm, Michael S. Tsirkin, ghammer, Markus Armbruster, Jason Wang,
	Michael Tokarev, qemu-devel, linux-kernel

On Mon, Mar 31, 2014 at 09:51:23PM -0300, Alejandro Comisario wrote:
> Again, thanks to everyone.

Did you reach a conclusion or is there still a problem that might be a
bug in KVM?

Stefan


* Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
  2014-04-01 13:52                 ` Stefan Hajnoczi
@ 2014-04-01 14:09                   ` Alejandro Comisario
  0 siblings, 0 replies; 11+ messages in thread
From: Alejandro Comisario @ 2014-04-01 14:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kvm, Michael S. Tsirkin, ghammer, Markus Armbruster, Jason Wang,
	Michael Tokarev, qemu-devel, linux-kernel

The conclusion is that the backing file stored on NFS, which is shared
by all 950 hosts / 10500 guests, was deleted, and that immediately
caused a read-only filesystem in the guests; it seems there's no way to
avoid that.

We developed a script to recover from that scenario if the same happens.
Basically doing:

* virsh stop
* qemu-nbd connect
* fsck
* qemu-nbd disconnect
* virsh start
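
The steps above can be sketched as a small shell function. This is not
the script mentioned in this message, only an illustration built on
several assumptions: "virsh destroy" stands in for the stop step (virsh
has no "stop" subcommand), qemu-nbd is taken to be the tool the list
refers to, /dev/nbd0 is the NBD device, partition 1 holds the root
filesystem, and DRY_RUN is a hypothetical variable that prints the
commands instead of running them.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Recover one guest: stop it, fsck its root partition via NBD, start it.
recover_guest() {
    guest=$1
    image=$2
    nbd=${3:-/dev/nbd0}

    run virsh destroy "$guest"        # hard stop of the read-only guest
    run modprobe nbd max_part=8       # expose partitions as nbd0pN
    run qemu-nbd --connect="$nbd" "$image" || return 1
    run fsck -y "${nbd}p1"            # exit status 1 means errors were corrected
    run qemu-nbd --disconnect "$nbd"
    run virsh start "$guest"
}

# Example: DRY_RUN=1 recover_guest guest-name /path/to/guest-disk.qcow2
```

Running it with DRY_RUN=1 first shows the exact commands for a guest
before anything is touched.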

best regards.


Alejandro Comisario
MercadoLibre Cloud Services
Arias 3751, Piso 7 (C1430CRG)
Ciudad de Buenos Aires - Argentina
Cel: +549(11) 15-3770-1857
Tel : +54(11) 4640-8443


On Tue, Apr 1, 2014 at 10:52 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Mar 31, 2014 at 09:51:23PM -0300, Alejandro Comisario wrote:
>> Again, thanks to everyone.
>
> Did you reach a conclusion or is there still a problem that might be a
> bug in KVM?
>
> Stefan



Thread overview: 11+ messages
     [not found] <CAMrG31z=oy-53Lfya4svhNniD_7Q1YETuHeZsotHj8U5xJNYmw@mail.gmail.com>
2014-03-27  6:41 ` [Qemu-devel] Massive read only kvm guests when backing file was missing Michael S. Tsirkin
2014-03-27  7:36   ` Markus Armbruster
2014-03-27  8:10     ` Michael S. Tsirkin
2014-03-27  8:53       ` Stefan Hajnoczi
2014-03-27 16:13         ` Alejandro Comisario
2014-03-27 16:14         ` Alejandro Comisario
2014-03-28  7:01           ` Michael Tokarev
2014-03-28  8:47             ` Stefan Hajnoczi
2014-04-01  0:51               ` Alejandro Comisario
2014-04-01 13:52                 ` Stefan Hajnoczi
2014-04-01 14:09                   ` Alejandro Comisario
