Random blocks when accessing rbd images

* Random blocks when accessing rbd images
From: Guido Winkelmann @ 2011-12-15 15:07 UTC
  To: ceph-devel

Hi,

I've got a small ceph cluster with one mon, one mds and two osds (all on the 
same machine, for now), that I want to use as a block- and file storage backend 
for qemu machine virtualisation.

I found that read access to some of the rbd images, or to parts of some of
them, sometimes blocks indefinitely, usually after the image has been sitting
around untouched for a while, for example overnight. As a result, virtual
machines that try to access their disks, as well as rbd commands like
"rbd cp", just hang indefinitely.

I found that these hangs can usually be "fixed" by restarting one of the
osds.

The last time this happened, ceph -s reported one of the osds to be in state 
"active+clean+scrubbing". (I'm afraid I don't have the complete output from 
ceph -s anymore.)

Does anybody have any idea what could be going wrong here?

	Guido

* Re: Random blocks when accessing rbd images
From: Wido den Hollander @ 2011-12-15 15:13 UTC
  To: Guido Winkelmann; +Cc: ceph-devel

Hi,

On 12/15/2011 04:07 PM, Guido Winkelmann wrote:
> Hi,
>
> I've got a small ceph cluster with one mon, one mds and two osds (all on the
> same machine, for now), that I want to use as a block- and file storage backend
> for qemu machine virtualisation.
>
> I found that read access to some of the rbd images, or parts of some of them
> sometimes blocks indefinitely, usually after the image has been sitting around
> untouched for a while, for example over night. This has the effect that virtual
> machines that try to access their disks as well as rbd commands like "rbd cp"
> will just hang indefinitely.
>
>   I found that these blocks can usually be "fixed" by restarting one of the
> osds.
>
> The last time this happened, ceph -s reported one of the osds to be in state
> "active+clean+scrubbing". (I'm afraid I don't have the complete output from
> ceph -s anymore.)

I've been seeing the exact same behaviour, but I haven't yet been able to
dig into it more deeply.

As far as I know, when a PG gets scrubbed it becomes unavailable for a
short period, but since this scrub blocks/loops, the PG never becomes
available again, which blocks the virtual machine.

I saw this behaviour with v0.37 and 0.38; I'm upgrading to 0.39 to see if
it still exists.

Wido

* Re: Random blocks when accessing rbd images
From: Stratos Psomadakis @ 2011-12-15 15:32 UTC
  To: Guido Winkelmann; +Cc: ceph-devel

On 12/15/2011 05:07 PM, Guido Winkelmann wrote:
> Hi,
>
> I've got a small ceph cluster with one mon, one mds and two osds (all on the 
> same machine, for now), that I want to use as a block- and file storage backend 
> for qemu machine virtualisation.
>
> I found that read access to some of the rbd images, or parts of some of them 
> sometimes blocks indefinitely, usually after the image has been sitting around 
> untouched for a while, for example over night. This has the effect that virtual 
> machines that try to access their disks as well as rbd commands like "rbd cp" 
> will just hang indefinitely.
>
>  I found that these blocks can usually be "fixed" by restarting one of the 
> osds.
>
> The last time this happened, ceph -s reported one of the osds to be in state 
> "active+clean+scrubbing". (I'm afraid I don't have the complete output from 
> ceph -s anymore.)
>
> Does anybody have any idea what could be going wrong here?
I think it's fixed in v0.39.

-- 
Stratos Psomadakis
<psomas@grnet.gr>

* Re: Random blocks when accessing rbd images
From: Guido Winkelmann @ 2011-12-15 15:45 UTC
  To: ceph-devel

On Thursday, 15 December 2011, 17:32:25, you wrote:
> On 12/15/2011 05:07 PM, Guido Winkelmann wrote:
> > Hi,
> > 
> > I've got a small ceph cluster with one mon, one mds and two osds (all on
> > the same machine, for now), that I want to use as a block- and file
> > storage backend for qemu machine virtualisation.
> > 
> > I found that read access to some of the rbd images, or parts of some of
> > them sometimes blocks indefinitely, usually after the image has been
> > sitting around untouched for a while, for example over night. This has
> > the effect that virtual machines that try to access their disks as well
> > as rbd commands like "rbd cp" will just hang indefinitely.
> > 
> > I found that these blocks can usually be "fixed" by restarting one of
> > the osds.
> > 
> > The last time this happened, ceph -s reported one of the osds to be in
> > state "active+clean+scrubbing". (I'm afraid I don't have the complete
> > output from ceph -s anymore.)
> > 
> > Does anybody have any idea what could be going wrong here?
> 
> I think it's fixed in v0.39

I'm already using 0.39, so, no. (Should have mentioned that to start with...)

        Guido

* Re: Random blocks when accessing rbd images
From: Samuel Just @ 2011-12-15 16:30 UTC
  To: ceph-devel

'ceph pg dump' will tell you the status (active/clean/scrubbing/etc)
for each pg.  Does the same pg remain in state active+clean+scrubbing
for more than 10 minutes?
-Sam
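Samuel's check can be automated by polling `ceph pg dump` and remembering how long each PG has been scrubbing. A minimal sketch, assuming the plain-text layout seen later in this thread (a header line beginning with `pg_stat` that names a `state` column, then one whitespace-separated row per PG); the function and class names and the 600-second threshold are illustrative, not part of Ceph, and capturing the command output is left out.

```python
import time
from typing import Dict, List, Optional

STUCK_AFTER_SECONDS = 600  # the "more than 10 minutes" rule of thumb from this thread


def scrubbing_pgs(dump_text: str) -> List[str]:
    """Return the ids of PGs whose state includes 'scrubbing'.

    Assumes a header line starting with 'pg_stat' that names a 'state'
    column, followed by whitespace-separated rows whose first field is
    the PG id. Lines before the header are ignored.
    """
    state_col: Optional[int] = None
    found: List[str] = []
    for line in dump_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "pg_stat":
            state_col = fields.index("state")
            continue
        if state_col is not None and len(fields) > state_col:
            if "scrubbing" in fields[state_col].split("+"):
                found.append(fields[0])
    return found


class StuckScrubDetector:
    """Tracks when each PG was first seen scrubbing across successive dumps."""

    def __init__(self, threshold: float = STUCK_AFTER_SECONDS) -> None:
        self.threshold = threshold
        self.first_seen: Dict[str, float] = {}

    def update(self, dump_text: str, now: Optional[float] = None) -> List[str]:
        """Record one dump; return PGs scrubbing for longer than the threshold."""
        now = time.time() if now is None else now
        current = set(scrubbing_pgs(dump_text))
        # Forget PGs that are no longer scrubbing, then note newly scrubbing ones.
        self.first_seen = {pg: t for pg, t in self.first_seen.items() if pg in current}
        for pg in current:
            self.first_seen.setdefault(pg, now)
        return sorted(pg for pg, t in self.first_seen.items() if now - t > self.threshold)


if __name__ == "__main__":
    dump = (
        "pg_stat objects mip degr unf kb bytes log disklog state v reported up acting last_scrub\n"
        "0.5 0 0 0 0 0 0 0 0 active+clean 0'0 1'1 [2,6] [2,6] 0'0\n"  # made-up row
        "0.6 0 0 0 0 0 0 0 0 active+clean+scrubbing 0'0 60'156 [6,2] [6,2] 0'0\n"
    )
    detector = StuckScrubDetector()
    print(detector.update(dump, now=0))    # [] on the first sighting
    print(detector.update(dump, now=700))  # ['0.6'] after more than 600 s
```

Keying on the first sighting rather than on a timestamp inside the dump keeps the sketch independent of which columns a given Ceph version prints.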

* Re: Random blocks when accessing rbd images
From: Martin Mailand @ 2011-12-15 16:31 UTC
  To: Guido Winkelmann; +Cc: ceph-devel

Hi Guido,
I am running ceph version 0.39-37-g54758ab 
(commit:54758abccf429122c1bc3bce6d01bc33f1cfe238) on my cluster and I do 
not see this problem. Do you use the qemu rbd block driver or the kernel 
mount?
How did you install ceph, via the packages?

-martin


* Re: Random blocks when accessing rbd images
From: Wido den Hollander @ 2011-12-15 16:33 UTC
  To: Samuel Just; +Cc: ceph-devel

On 12/15/2011 05:30 PM, Samuel Just wrote:
> 'ceph pg dump' will tell you the status (active/clean/scrubbing/etc)
> for each pg.  Does the same pg remain in state active+clean+scrubbing
> for more than 10 minutes?

Yes, from what I've seen it will block indefinitely until you restart
one of the OSDs that are members of the PG.

Wido

* Re: Random blocks when accessing rbd images
From: Martin Mailand @ 2011-12-15 16:38 UTC
  To: Wido den Hollander; +Cc: Samuel Just, ceph-devel

Hi Wido,
but wasn't that fixed a few weeks ago?

-martin

On 15.12.2011 17:33, Wido den Hollander wrote:
> Yes, from what I've seen it will block indefinitely until you restart
> one of the OSDs who are member of the PG.
>
> Wido


* Re: Random blocks when accessing rbd images
From: Martin Mailand @ 2011-12-15 16:44 UTC
  To: Wido den Hollander; +Cc: Samuel Just, ceph-devel

Hi,
at least there is a patch that should have fixed it.

http://marc.info/?l=ceph-devel&m=131955913203561&w=2

On 15.12.2011 17:38, Martin Mailand wrote:
> Hi Wido,
> but wasn't that fixed a few weeks ago?
>
> -martin
>
> On 15.12.2011 17:33, Wido den Hollander wrote:
>> Yes, from what I've seen it will block indefinitely until you restart
>> one of the OSDs who are member of the PG.
>>
>> Wido
>

* Re: Random blocks when accessing rbd images
From: Guido Winkelmann @ 2011-12-15 16:44 UTC
  To: ceph-devel

On Thursday, 15 December 2011, 08:30:26, you wrote:
> 'ceph pg dump' will tell you the status (active/clean/scrubbing/etc)
> for each pg.  Does the same pg remain in state active+clean+scrubbing
> for more than 10 minutes?

Well, I used ceph -s, which only gave me a summary, but there definitely was a 
PG that was in active+clean+scrubbing for a long time (a lot longer than 10 
minutes), and remained so until I restarted one of the osds.

Unfortunately I don't know how to reliably reproduce the problem, so I can't 
check now...

	Guido

* Re: Random blocks when accessing rbd images
From: Wido den Hollander @ 2011-12-15 16:45 UTC
  To: Martin Mailand; +Cc: ceph-devel

On 12/15/2011 05:38 PM, Martin Mailand wrote:
> Hi Wido,
> but wasn't that fixed a few weeks ago?

I'm not sure. I think I saw it not so long ago.

My cluster just got an upgrade to 0.39, so everything was restarted. I'll
keep an eye out for blocking scrubs.

Wido

* Re: Random blocks when accessing rbd images
From: Guido Winkelmann @ 2011-12-15 16:51 UTC
  To: Martin Mailand; +Cc: ceph-devel

On Thursday, 15 December 2011, 17:31:22, Martin Mailand wrote:
> Hi Guido,
> I am running ceph version 0.39-37-g54758ab
> (commit:54758abccf429122c1bc3bce6d01bc33f1cfe238) on my cluster and I do
> not see this problem. Do you use the qemu rbd block driver or the kernel
> mount?
> How did you install ceph, via the packages?

I downloaded the tarball from http://ceph.newdream.net/download/ and
untarred it, then did the usual ./configure; make; make install. If I were
to do it again, I would use rpmbuild instead.

The host system is CentOS 6, with the kernel upgraded to 3.1.1, btw.

	Guido

* Re: Random blocks when accessing rbd images
From: Stratos Psomadakis @ 2011-12-15 17:24 UTC
  To: Guido Winkelmann; +Cc: ceph-devel

On 12/15/2011 06:44 PM, Guido Winkelmann wrote:
> Am Donnerstag, 15. Dezember 2011, 08:30:26 schrieben Sie:
>> 'ceph pg dump' will tell you the status (active/clean/scrubbing/etc)
>> for each pg.  Does the same pg remain in state active+clean+scrubbing
>> for more than 10 minutes?
> Well, I used ceph -s, which only gave me a summary, but there definitely was a 
> PG that was in active+clean+scrubbing for a long time (a lot longer than 10 
> minutes), and remained so until I restarted one of the osds.
>
> Unfortunately I don't know how to reliably reproduce the problem, so I can't 
> check now...
When I hit that bug, I was able to trigger it more easily by setting:
    osd scrub max interval = 120
in the [osd] section of ceph.conf, which makes the cluster scrub PGs
more often.

If you then stress the cluster a bit (some heavy I/O), combined with
single OSD restarts, I think you should be able to trigger it.
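To make that reproduction recipe easy to apply and revert, the setting can be written into ceph.conf programmatically. A minimal sketch using Python's standard `configparser`, assuming ceph.conf parses as a plain INI file; `enable_frequent_scrubs` is a hypothetical helper. Note that configparser drops comments when it rewrites a file, and it does not know that Ceph treats spaces and underscores in option names as equivalent, so an existing `osd_scrub_max_interval` line would not be replaced.

```python
import configparser
import io


def enable_frequent_scrubs(conf_text: str, interval_seconds: int = 120) -> str:
    """Return conf_text with 'osd scrub max interval' set in the [osd] section.

    Hypothetical helper for the reproduction recipe above: a short maximum
    scrub interval makes every PG come up for scrubbing more often.
    """
    # interpolation=None so a '%' in existing values is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("osd"):
        parser.add_section("osd")
    parser.set("osd", "osd scrub max interval", str(interval_seconds))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(enable_frequent_scrubs("[global]\nauth supported = none\n"))
```

Remember to restore the original interval after testing; 120 is simply the value used in this message.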

Btw, I was using the rbd in-kernel driver.

From the debugging I did, I think that at some point after
finalizing_scrub is set to true, last_update_applied != info.last_update,
but for some reason op_applied never requeues the scrub operation, so the
PG stays stuck in the scrubbing state.
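The sequence described above can be illustrated with a deliberately simplified model. This is not Ceph's implementation: the `ToyPG` class, its work queue and the `requeue_on_apply` switch are hypothetical, and only the roles of `finalizing_scrub`, `last_update_applied`, `info.last_update` and `op_applied` come from the message. Finalizing the scrub has to wait until every logged write has been applied; the sketch shows that if the apply path does not requeue the waiting scrub, nothing else wakes it and the PG stays in the scrubbing state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPG:
    """Hypothetical, heavily simplified model of a PG finishing a scrub.

    Not Ceph code: only the idea of waiting for last_update_applied to
    catch up with info.last_update is taken from the thread.
    """
    last_update: int = 0            # plays the role of info.last_update
    last_update_applied: int = 0    # newest version applied to the store
    finalizing_scrub: bool = False
    scrubbing: bool = False
    requeue_on_apply: bool = True   # False models the suspected bug
    work_queue: List[str] = field(default_factory=list)

    def submit_write(self) -> None:
        self.last_update += 1       # logged, but not yet applied

    def start_finalizing_scrub(self) -> None:
        self.scrubbing = True
        self.finalizing_scrub = True
        self.work_queue.append("scrub")

    def run_queue(self) -> None:
        while self.work_queue:
            self.work_queue.pop(0)
            self.try_finish_scrub()

    def try_finish_scrub(self) -> None:
        if not self.finalizing_scrub:
            return
        if self.last_update_applied != self.last_update:
            return                  # must wait; relies on op_applied to requeue
        self.finalizing_scrub = False
        self.scrubbing = False

    def op_applied(self) -> None:
        self.last_update_applied += 1
        caught_up = self.last_update_applied == self.last_update
        if self.requeue_on_apply and self.finalizing_scrub and caught_up:
            self.work_queue.append("scrub")


def simulate(requeue_on_apply: bool) -> bool:
    """Return True if the PG is left in the scrubbing state."""
    pg = ToyPG(requeue_on_apply=requeue_on_apply)
    pg.submit_write()            # a write is in flight...
    pg.start_finalizing_scrub()  # ...when the scrub tries to finish
    pg.run_queue()               # bails out: last_update_applied != last_update
    pg.op_applied()              # the write lands
    pg.run_queue()               # only finishes if op_applied requeued it
    return pg.scrubbing


if __name__ == "__main__":
    print(simulate(requeue_on_apply=True))   # False: scrub completes
    print(simulate(requeue_on_apply=False))  # True: stuck in scrubbing
```

In the toy model nothing ever clears the stuck state; on a real cluster the reported workaround is restarting one of the PG's OSDs.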

-- 
Stratos Psomadakis
<psomas@grnet.gr>

* Re: Random blocks when accessing rbd images
From: Guido Winkelmann @ 2011-12-15 17:32 UTC
  To: ceph-devel

On Thursday, 15 December 2011, 17:31:22, Martin Mailand wrote:
> Hi Guido,
> I am running ceph version 0.39-37-g54758ab
> (commit:54758abccf429122c1bc3bce6d01bc33f1cfe238) on my cluster and I do
> not see this problem. Do you use the qemu rbd block driver or the kernel
> mount?

Both: the kernel mount for CephFS, and the qemu rbd driver for the actual
qemu volumes.

	Guido

* Re: Random blocks when accessing rbd images
From: Samuel Just @ 2011-12-15 21:28 UTC
  To: ceph-devel

This is likely the problem. I'll try to reproduce it today. (Meant to
post this to the list the first time.)
-Sam

* Re: Random blocks when accessing rbd images
From: Wido den Hollander @ 2011-12-16 16:17 UTC
  To: Martin Mailand; +Cc: ceph-devel

Hi,

On 12/15/2011 05:44 PM, Martin Mailand wrote:
> Hi,
> at least there is a patch that should have fixed it.
>
> http://marc.info/?l=ceph-devel&m=131955913203561&w=2
>

I'm still seeing this one:

2011-12-16 17:14:53.638722    pg v1170309: 7808 pgs: 7807 active+clean, 1 active+clean+scrubbing; 15279 MB data, 47262 MB used, 73838 GB / 74520 GB avail

In this case PG "2.688" is in scrubbing state and is staying that way.
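A summary line like the one above is already enough to spot a PG lingering in a scrubbing state. A minimal sketch that extracts the per-state PG counts, assuming the `pg vN: TOTAL pgs: COUNT STATE, ...;` layout of this v0.39 output (other versions format the summary differently); `pg_state_counts` is a hypothetical name, and it raises if the per-state counts do not add up to the total, as a sanity check on the parse.

```python
import re
from typing import Dict

# Matches "pg vNNN: TOTAL pgs: <count state, ...>;" as in the line above.
PGMAP_RE = re.compile(r"pg v\d+: (\d+) pgs: ([^;]*);")


def pg_state_counts(summary: str) -> Dict[str, int]:
    """Map each PG state in a one-line pgmap summary to its count."""
    match = PGMAP_RE.search(summary)
    if match is None:
        raise ValueError("no pgmap summary found")
    total = int(match.group(1))
    counts: Dict[str, int] = {}
    for part in match.group(2).split(","):
        count, state = part.strip().split(" ", 1)
        counts[state] = counts.get(state, 0) + int(count)
    if sum(counts.values()) != total:
        raise ValueError("per-state counts do not add up to the PG total")
    return counts


if __name__ == "__main__":
    line = ("2011-12-16 17:14:53.638722    pg v1170309: 7808 pgs: 7807 active+clean, "
            "1 active+clean+scrubbing; 15279 MB data, 47262 MB used, "
            "73838 GB / 74520 GB avail")
    counts = pg_state_counts(line)
    print({s: n for s, n in counts.items() if "scrubbing" in s.split("+")})
    # {'active+clean+scrubbing': 1}
```

The summary only gives counts; `ceph pg dump` is still needed to learn which PG (here 2.688) is the one stuck.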

I'm running v0.39, not the latest master.

Any suggestions on how to track this one down?

Wido

* Re: Random blocks when accessing rbd images
From: Samuel Just @ 2011-12-16 21:17 UTC
  To: ceph-devel

In master, 061e7619aacf60a828e0ce84a108d5a0bea247c6 may fix the
problem.  If not, 5274e88d2cb8c0449a4ecd1ff0cf8bb0af2cfc97 includes
some asserts that may give us a clue as to how this is happening.
-Sam

* Re: Random blocks when accessing rbd images
From: Wido den Hollander @ 2011-12-18 14:26 UTC
  To: Samuel Just; +Cc: ceph-devel

On 12/16/2011 10:17 PM, Samuel Just wrote:
> In master, 061e7619aacf60a828e0ce84a108d5a0bea247c6 may fix the
> problem.  If not, 5274e88d2cb8c0449a4ecd1ff0cf8bb0af2cfc97 includes
> some asserts that may give us a clue as to how this is happening.

I've been running with bfbde5b18525406fc3b678751459e989ea5d4977 for over
24 hours now, and everything is still active+clean.

If it comes back I'll update this thread.

Thanks,

Wido

* Re: Random blocks when accessing rbd images
From: Martin Mailand @ 2011-12-22 13:59 UTC
  To: Samuel Just; +Cc: ceph-devel

Hi Samuel,
I think I am seeing it now.

root@s-brick-003:~# ceph pg dump|grep -i scrub
pg_stat objects mip     degr    unf     kb      bytes   log     disklog state   v       reported        up      acting  last_scrub
0.6     0       0       0       0       0       0       0       0       active+clean+scrubbing  0'0     60'156  [6,2]   [6,2]   0'0     2011-12-20 14:44:55.787529
root@s-brick-003:~# ceph -v
ceph version 0.39-171-gdcedda8 (commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)
root@s-brick-003:~#
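To pull the interesting fields out of a row like the one above, a sketch that pairs each value with its column name. Assumptions: header and row are whitespace-separated as shown; the row carries two more tokens than the header has names (a trailing date and time, which look like a scrub timestamp but are not labelled in this output), so they are kept aside under a made-up `extra` key rather than guessed at; `parse_pg_row` and `acting_osds` are hypothetical helpers. The acting set `[6,2]` lists the OSDs serving the PG, primary first, which identifies the candidates for the OSD-restart workaround mentioned earlier in the thread.

```python
from typing import Dict, List


def parse_pg_row(header: str, row: str) -> Dict[str, str]:
    """Pair each value in a 'ceph pg dump' row with its header name.

    Fields beyond the named columns (in the output above, a date and a
    time) are kept verbatim under the made-up key 'extra'.
    """
    names = header.split()
    values = row.split()
    if len(values) < len(names):
        raise ValueError("row has fewer fields than the header")
    parsed = dict(zip(names, values))
    parsed["extra"] = " ".join(values[len(names):])
    return parsed


def acting_osds(parsed: Dict[str, str]) -> List[int]:
    """Turn an acting set such as '[6,2]' into a list of OSD ids."""
    return [int(x) for x in parsed["acting"].strip("[]").split(",") if x]


if __name__ == "__main__":
    header = ("pg_stat objects mip degr unf kb bytes log disklog "
              "state v reported up acting last_scrub")
    row = ("0.6 0 0 0 0 0 0 0 0 active+clean+scrubbing 0'0 60'156 "
           "[6,2] [6,2] 0'0 2011-12-20 14:44:55.787529")
    pg = parse_pg_row(header, row)
    print(pg["state"], acting_osds(pg), pg["extra"])
    # active+clean+scrubbing [6, 2] 2011-12-20 14:44:55.787529
```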


I also had an OSD crash that hit this assertion
(./messages/MOSDRepScrub.h: 64: FAILED assert(v == 0)); see my other
email for more information.

-martin



On 16.12.2011 22:17, Samuel Just wrote:
> In master, 061e7619aacf60a828e0ce84a108d5a0bea247c6 may fix the
> problem.  If not, 5274e88d2cb8c0449a4ecd1ff0cf8bb0af2cfc97 includes
> some asserts that may give us a clue as to how this is happening.
> -Sam


Thread overview: 19 messages
2011-12-15 15:07 Random blocks when accessing rbd images Guido Winkelmann
2011-12-15 15:13 ` Wido den Hollander
2011-12-15 15:32 ` Stratos Psomadakis
2011-12-15 15:45   ` Guido Winkelmann
2011-12-15 16:30     ` Samuel Just
2011-12-15 16:33       ` Wido den Hollander
2011-12-15 16:38         ` Martin Mailand
2011-12-15 16:44           ` Martin Mailand
2011-12-16 16:17             ` Wido den Hollander
2011-12-16 21:17               ` Samuel Just
2011-12-18 14:26                 ` Wido den Hollander
2011-12-22 13:59                 ` Martin Mailand
2011-12-15 16:45           ` Wido den Hollander
2011-12-15 16:44       ` Guido Winkelmann
2011-12-15 17:24         ` Stratos Psomadakis
2011-12-15 21:28           ` Samuel Just
2011-12-15 16:31     ` Martin Mailand
2011-12-15 16:51       ` Guido Winkelmann
2011-12-15 17:32       ` Guido Winkelmann
