* q. about rbd-header
From: Oliver Francke @ 2012-03-14 15:05 UTC
  To: ceph-devel

Hey,

can anybody explain the structure of an rbd header? After the last crash
we have about 10 images that fail with:
   2012-03-14 15:22:47.998790 7f45a61e3760 librbd: Error reading header: 2 No such file or directory
error opening image vm-266-disk-1.rbd: 2 No such file or directory
I understand the "rb.x.y" prefix and the block size given as 2 ^ 16 (hex),
but the size/count encoding is not intuitive ;)

For one image I "created" a header, put it back into the pool via
"rados put", and got some files back. But many of the other images with
lost headers have different sizes.

We had bad luck again: too many crashed VMs, too much data loss...

Comments welcome ;)

Oliver.


* Re: q. about rbd-header
From: Oliver Francke @ 2012-03-14 20:49 UTC
  To: ceph-devel

Well,

is nobody able to shed some light on this?
I did some math and found out how to fill in the size bytes.

But one question never got answered:
    - Why is the first block so frequently affected on busy VMs,
      resulting in damaged GRUB loaders, partition tables and filesystems?
      Is this some NULL/zero-pointer thing in case of a ceph failure?

If you need some broken images to investigate, we have many of them,
unfortunately.

Maybe this sounds a bit harsh, but after the 5th night shift spent trying
to repair images and keep customers calm, I think that is forgivable.

Oliver.


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: q. about rbd-header
From: Josh Durgin @ 2012-03-14 20:54 UTC
  To: Oliver Francke; +Cc: ceph-devel

On 03/14/2012 08:05 AM, Oliver Francke wrote:
> Hey,
>
> anybody out there who could explain the structure of a rbd-header? After
> last crash we have about 10 images with a:
>     2012-03-14 15:22:47.998790 7f45a61e3760 librbd: Error reading
> header: 2 No such file or directory
> error opening image vm-266-disk-1.rbd: 2 No such file or directory
> ... error?
> I understand the "rb.x.y"-prefix, the 2 ^ 16hex as block-size. But
> the size/count encoding is not intuitive ;)

The data structure is rbd_obj_header_ondisk, defined in 
src/include/rbd_types.h.

The size of the objects is stored as a shift value in the 'order' field. 
That is, object size is (1 << order) bytes, and the default of 4MB is 
order 22. The total size (image_size) is just a number of bytes.
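
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here and are not part of librbd; the only things taken from the explanation are the meaning of 'order' and that image_size is a byte count.

```python
def object_size(order: int) -> int:
    """Size in bytes of one rbd data object: 1 << order."""
    return 1 << order


def object_count(image_size: int, order: int) -> int:
    """Number of objects needed to cover image_size bytes, rounded up."""
    size = object_size(order)
    return (image_size + size - 1) // size


if __name__ == "__main__":
    print(object_size(22))                 # default order 22, i.e. 4 MB objects
    print(object_count(10 * 1024**3, 22))  # objects covering a 10 GB image
```

Going the other way, the highest-numbered data object still present in the pool gives a lower bound on the image size.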

The encoding of snapshots is a bit more painful, since you'd need to 
look up the right snapshot ids for each image by looking at its existing 
objects. If you don't mind losing the snapshots, you can just zero out 
the fields after image_size. Extra zero bytes shouldn't matter after 
snap_names_len is 0.
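
A rough Python sketch of building such a snapshot-free header follows. Everything about the byte layout here is an assumption written down from memory (field order, field widths, and the magic strings); it must be checked against rbd_obj_header_ondisk in src/include/rbd_types.h, or against a healthy header fetched from the pool, before it is used on real data. The helper name build_header and the block name in the usage note are hypothetical.

```python
import struct

# ASSUMPTION: field order and widths are recalled from memory, not copied
# from rbd_types.h. Assumed packed, little-endian layout:
#   text[40], block_name[24], signature[4], version[8],
#   order, crypt_type, comp_type, unused (1 byte each),
#   image_size (u64), snap_seq (u64), snap_count (u32), reserved (u32),
#   snap_names_len (u64)
HEADER_FMT = "<40s24s4s8sBBBBQQIIQ"

# ASSUMPTION: magic strings as remembered; compare with a healthy header.
HEADER_TEXT = b"<<< Rados Block Device Image >>>\n"
SIGNATURE = b"RBD"
VERSION = b"001.005"


def build_header(block_name: bytes, order: int, image_size: int) -> bytes:
    """Pack a header with no snapshots: all fields after image_size are 0."""
    return struct.pack(
        HEADER_FMT,
        HEADER_TEXT, block_name, SIGNATURE, VERSION,  # short strings are NUL-padded
        order, 0, 0, 0,   # options: order, crypt_type, comp_type, unused
        image_size,       # total image size in bytes
        0, 0, 0, 0,       # snap_seq, snap_count, reserved, snap_names_len
    )
```

For example, build_header(b"rb.0.0", 22, 10 * 1024**3) would describe a 10 GB image with 4 MB objects whose data objects use the (made-up) prefix rb.0.0. The resulting bytes could be written to a file and uploaded with "rados put", as described earlier in the thread.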

> Besides one file, where I "created" a header and putted it via "rados
> put" back into the pool, and got some files
> back, many of the other images with lost headers have different sizes.

If you don't know the correct size, setting it too high won't use any 
more space unless the fs using it is expanded.

> We got bad luck again, too many crashed VM's, too much data-loss...
>
> Comments welcome ;)
>
> Oliver.

* Re: q. about rbd-header
From: Sage Weil @ 2012-03-14 21:22 UTC
  To: Oliver Francke; +Cc: ceph-devel

On Wed, 14 Mar 2012, Oliver Francke wrote:
> Well,
> 
> nobody able to sched some light in?
> Did some math and found out how to fill the size bytes.
> 
> But, one question never got answered:
>     - why is - with busy VMs - frequently the first block affected,
>       with the result of damaged grub-loaders/partition-tables/filesystems?
>       Is this some NULL/zero pointer thingy in case of ceph-failure?
> 
> If you demand some broken images… we have many of them to investigate,
> unfortunately.

We are definitely interested in this failure, _especially_ if it's 
something you can reproduce.  Even some broken images are worth looking 
at, though, to see what the corruptions look like.  Can you share the 
first block of one of these images with us (without worrying about 
customer data)?

I've opened #2178 to track this.  We can either attach everything to the 
bug, or share data out of band if it needs to stay private.

Thanks!
sage

* Re: q. about rbd-header
From: Josh Durgin @ 2012-03-14 21:59 UTC
  To: Oliver Francke; +Cc: ceph-devel

On 03/14/2012 01:49 PM, Oliver Francke wrote:
> Well,
>
> nobody able to sched some light in?
> Did some math and found out how to fill the size bytes.

Sorry I didn't respond faster.

> But, one question never got answered:
>      - why is - with busy VMs - frequently the first block affected,
>        with the result of damaged grub-loaders/partition-tables/filesystems?
>        Is this some NULL/zero pointer thingy in case of ceph-failure?

My guess is that this is not the first object affected, but it's where 
the loss of an object is most easily noticeable - if an object doesn't 
exist, it's treated as being full of zeros, which might go undetected 
for a long time if it's e.g. some temp or log file that's not reread and 
verified.
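
The "missing object reads as zeros" behaviour can be illustrated with a toy model in Python. This is not librbd code: the dict-based image and the function name read_image are invented for the example, and a tiny object size is used so the result is easy to inspect.

```python
def read_image(objects: dict[int, bytes], order: int, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` from a toy image.

    `objects` maps object number -> object data. A missing object, or the
    unwritten tail of a short object, reads back as zero bytes.
    """
    obj_size = 1 << order
    out = bytearray()
    end = offset + length
    while offset < end:
        obj_no, obj_off = divmod(offset, obj_size)
        n = min(obj_size - obj_off, end - offset)   # bytes wanted from this object
        data = objects.get(obj_no, b"")             # lost object -> empty
        chunk = data[obj_off:obj_off + n]
        out += chunk + b"\x00" * (n - len(chunk))   # pad the gap with zeros
        offset += n
    return bytes(out)


if __name__ == "__main__":
    # Object 0 (where a boot sector would live) is lost; object 1 survives.
    image = {1: b"B" * 16}
    print(read_image(image, 4, 0, 32))
```

In this model the read succeeds without any error, so a lost object 0 shows up immediately as a wiped boot sector or partition table, while a lost object holding rarely read data may go unnoticed.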

> If you demand some broken images… we have many of them to investigate,
> unfortunately.

We'd really like to find the root cause of the problem. One possibility 
is some bad interaction between osds running different versions. This 
caused one issue with recovery stxShadow saw yesterday, for example 
(http://tracker.newdream.net/issues/2132). Had you been doing rolling 
upgrades of osds before these problems appeared? If so, do you know 
which versions you had running concurrently?

Are your osds often restarting?

What we'd need to diagnose this are osd logs during recovery with:

debug osd = 20
debug ms = 1

Once you detect the problem, a log from each replica storing the pg the 
bad/missing object is in should be enough.

And just to make sure, you aren't writing to these rbd images from 
multiple places, right? This wouldn't cause the missing header objects, 
but is likely to cause corruption of the image data. This could happen, 
for example, by rolling an image back to a snapshot while a vm is 
running on it.

Josh

* Re: q. about rbd-header
From: Oliver Francke @ 2012-03-15 10:21 UTC
  To: Josh Durgin; +Cc: ceph-devel

Hi Josh,

On 03/14/2012 10:59 PM, Josh Durgin wrote:
> On 03/14/2012 01:49 PM, Oliver Francke wrote:
>> Well,
>>
>> nobody able to sched some light in?
>> Did some math and found out how to fill the size bytes.
>
> Sorry I didn't respond faster.
>
>> But, one question never got answered:
>>      - why is - with busy VMs - frequently the first block affected,
>>        with the result of damaged 
>> grub-loaders/partition-tables/filesystems?
>>        Is this some NULL/zero pointer thingy in case of ceph-failure?
>
> My guess is that this is not the first object affected, but it's where 
> the loss of an object is most easily noticeable - if an object doesn't 
> exist, it's treated as being full of zeros, which might go undetected 
> for a long time if it's e.g. some temp or log file that's not reread 
> and verified.

Well, I replied to Sage with some more information from one of the images
where the header is missing. I did not want to bother the list ;)

>
>> If you demand some broken images… we have many of them to investigate,
>> unfortunately.
>
> We'd really like to find the root cause of the problem. One 
> possibility is some bad interaction between osds running different 
> versions. This caused one issue with recovery stxShadow saw yesterday, 
> for example (http://tracker.newdream.net/issues/2132). Had you been 
> doing rolling upgrades of osds before these problems appeared? If so, 
> do you know which versions you had running concurrently?
>
> Are your osds often restarting?
>
> What we'd need to diagnose this are osd logs during recovery with:
>
> debug osd = 20
> debug ms = 1
>
> Once you detect the problem, a log from each replica storing the pg 
> the bad/missing object is in should be enough.
>
> And just to make sure, you aren't writing to these rbd images from 
> multiple places, right? This wouldn't cause the missing header 
> objects, but is likely to cause corruption of the image data. This 
> could happen, for example, by rolling an image back to a snapshot 
> while a vm is running on it.

Currently we don't use snapshots, and of course we ensure that each VM is
running only once at a time ;-) We did do a "rolling upgrade", but that
was _after_ the trouble/crashes occurred.

Oliver.

-- 

Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Managing directors: S.Grewing | J.Rehpöhler | C.Kunz

Follow us on Twitter: http://twitter.com/filoogmbh
