Understanding Ceph


* Understanding Ceph
@ 2011-12-18  6:41 Bill Hastings
  2011-12-18 12:17 ` Christian Brunner
  0 siblings, 1 reply; 53+ messages in thread
From: Bill Hastings @ 2011-12-18  6:41 UTC (permalink / raw)
  To: ceph-devel

Hi All

I am trying to get my feet wet with Ceph and RADOS. My aim is to use
it as a block device for KVM instances. My understanding is that
virtual disks get striped at 1 MB boundaries by default. Does that
mean that there are going to be 1MB files on disks? Let's say I want
to update a particular vdisk with 16 bytes of data at offset 4096.
This would mean I want to update the first 1MB chunk. Let us assume I
have 3 way replication and the replicas are A, B and C. The write may
succeed at A and B and fail at C. Is there any state kept in the
metadata indicating at which replicas the write succeeded?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2011-12-18  6:41 Bill Hastings
@ 2011-12-18 12:17 ` Christian Brunner
  2011-12-18 16:43   ` Bill Hastings
  0 siblings, 1 reply; 53+ messages in thread
From: Christian Brunner @ 2011-12-18 12:17 UTC (permalink / raw)
  To: Bill Hastings; +Cc: ceph-devel

Hi Bill,

2011/12/18 Bill Hastings <bllhastings@gmail.com>:

> I am trying to get my feet wet with Ceph and RADOS. My aim is to use
> it as a block device for KVM instances. My understanding is that
> virtual disks get striped at 1 MB boundaries by default. Does that
> mean that there are going to be 1MB files on disks?

Yes, the virtual disk is striped over multiple objects. By default
they have a size of 4MB (not 1MB). Ceph is storing objects, but in the
end they will be written as files on the different object stores.
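
A minimal sketch of how a byte range on the virtual disk could map to
objects, assuming the 4MB default object size mentioned above and a
plain layout where object N holds bytes [N * 4MB, (N + 1) * 4MB). The
function name `locate` is hypothetical and is not part of any Ceph API.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB, the default object size discussed above


def locate(offset, length, object_size=OBJECT_SIZE):
    """Split a (offset, length) range into per-object extents.

    Returns a list of (object_index, offset_in_object, extent_length).
    Assumes a simple layout with no extra striping parameters.
    """
    extents = []
    end = offset + length
    while offset < end:
        index = offset // object_size
        within = offset % object_size
        # An extent never crosses an object boundary.
        chunk = min(end - offset, object_size - within)
        extents.append((index, within, chunk))
        offset += chunk
    return extents


# The 16-byte update at offset 4096 touches only the first object.
print(locate(4096, 16))
# A write that straddles the first boundary is split into two extents.
print(locate(OBJECT_SIZE - 8, 16))
```

Under these assumptions the 16-byte write at offset 4096 lands entirely
in object 0, while a write crossing the 4MB boundary becomes two
separate object writes.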

> Let's say I want
> to update a particular vdisk with 16 bytes of data at offset 4096.
> This would mean I want to update the first 1MB chunk.

Yes, but you don't need to write the whole chunk again. You can update
the 16 bytes without rewriting everything. (In fact rbd is using
sparse objects by default - "thin provisioning").
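
A hedged sketch of such a small in-place update through the Python
bindings for librados and librbd. The pool name, image name, and config
path are assumptions for illustration, the image must already exist,
and the function only works on a host with the Ceph Python bindings
installed and a reachable cluster.

```python
def write_small_update(pool="rbd", image_name="vdisk0",
                       conffile="/etc/ceph/ceph.conf"):
    """Write 16 bytes at offset 4096 of an existing RBD image.

    pool, image_name and conffile are placeholder values.
    """
    # Imported lazily so this file can be loaded without the bindings.
    import rados
    import rbd

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            image = rbd.Image(ioctx, image_name)
            try:
                # Only the 16 bytes are sent; the rest of the object is untouched.
                return image.write(b"\x00" * 16, 4096)
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

The nested try/finally blocks are there so the image, the I/O context,
and the cluster handle are released even when the write raises.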

> Let us assume I
> have 3 way replication and the replicas are A, B and C. The write may
> succeed at A and B and fail at C. Is there any state kept in the
> metadata indicating at which replicas the write succeeded?

Objects are grouped into placement groups (PGs). The ceph monitor is
tracking the state of the PGs. With this information, the clients will
be directed to the working replicas. When an object store fails, Ceph
will start rebuilding the missing objects on other object stores.
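
A toy sketch of the grouping idea, under loud assumptions: the hash
(zlib.crc32), the PG count, and the static PG-to-store table are
stand-ins chosen for illustration. Ceph uses its own hash function and
the CRUSH algorithm for placement, which this does not reproduce.

```python
import zlib

PG_NUM = 8  # hypothetical number of placement groups

# Hypothetical static table: every PG is replicated on stores A, B and C.
PG_TO_STORES = {pg: ["A", "B", "C"] for pg in range(PG_NUM)}


def pg_for(object_name, pg_num=PG_NUM):
    """Map an object name to a placement group (stand-in hash)."""
    return zlib.crc32(object_name.encode("utf-8")) % pg_num


def working_replicas(pg, up_stores):
    """Return the stores of a PG that are currently marked up."""
    return [store for store in PG_TO_STORES[pg] if store in up_stores]


pg = pg_for("vdisk0.chunk0")
print(pg, working_replicas(pg, {"A", "B"}))
```

The point is only that state is tracked per PG rather than per object:
once C is known to be down, every object in the affected PGs is served
by A and B.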

Regards,
Christian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2011-12-18 12:17 ` Christian Brunner
@ 2011-12-18 16:43   ` Bill Hastings
  2011-12-18 17:17     ` Yehuda Sadeh Weinraub
  0 siblings, 1 reply; 53+ messages in thread
From: Bill Hastings @ 2011-12-18 16:43 UTC (permalink / raw)
  To: chb; +Cc: ceph-devel

Thanks for the response. What if a write of 16 bytes was successful at
nodes A and B and failed at C, perhaps because C was momentarily
unreachable via the network? How is the Ceph client prevented from
performing the next read at C? Also, what if the writes to the OSDs
were successful but the metadata update fails? How is this managed, if
at all? How are writes that straddle chunk boundaries handled from a
transactional perspective? I am just in the process of investigating,
so please forgive me if the questions are very naive.



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2011-12-18 16:43   ` Bill Hastings
@ 2011-12-18 17:17     ` Yehuda Sadeh Weinraub
  2011-12-18 17:37       ` Bill Hastings
  0 siblings, 1 reply; 53+ messages in thread
From: Yehuda Sadeh Weinraub @ 2011-12-18 17:17 UTC (permalink / raw)
  To: Bill Hastings; +Cc: chb, ceph-devel

On Sun, Dec 18, 2011 at 8:43 AM, Bill Hastings <bllhastings@gmail.com> wrote:
> Thanks for the response. What if a write of 16 bytes was successful at
> nodes A and B and failed at C, perhaps because C was momentarily unreachable
> via the network? How is the Ceph client prevented from performing the
> next read at C? Also what if the writes to OSDs were successful but

In that case the client wouldn't have gotten a successful response in
the first place. The client sends the writes to the primary osd
handling that pg, and will get the following responses from it:
 - ack message when the request is in the page/buffer cache on all replicas
 - commit message when the request is on stable storage on all replicas

(depending on setup, in some cases it'll just get a commit message
which implies ack anyway)

That osd is responsible for making sure the data is written to all
replicas, and the client won't get the commit response until then. For
rbd, clients wait for the commit message as the acknowledgment of
write completion.
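
The ack/commit sequence described above can be sketched as a small
in-memory simulation. The `Replica` class and `primary_write` function
are hypothetical names for illustration and do not mirror the OSD code;
the sketch only models "ack once all replicas have the data in cache,
commit once all replicas have it on stable storage".

```python
class Replica:
    """Toy replica with a cache and a 'disk'; not a real OSD."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.cache = {}
        self.disk = {}

    def apply(self, key, data):
        if not self.reachable:
            return False
        self.cache[key] = data
        return True

    def flush(self, key):
        if not self.reachable or key not in self.cache:
            return False
        self.disk[key] = self.cache[key]
        return True


def primary_write(replicas, key, data):
    """Return the messages the client would receive for one write."""
    messages = []
    # Send the write to every replica before deciding anything.
    applied = [r.apply(key, data) for r in replicas]
    if not all(applied):
        return messages  # no ack: the client never sees success
    messages.append("ack")
    flushed = [r.flush(key) for r in replicas]
    if all(flushed):
        messages.append("commit")
    return messages


healthy = [Replica("A"), Replica("B"), Replica("C")]
print(primary_write(healthy, "obj0", b"x" * 16))

degraded = [Replica("A"), Replica("B"), Replica("C", reachable=False)]
print(primary_write(degraded, "obj0", b"x" * 16))
```

In the degraded run the client gets no message at all, which matches
the statement that it "wouldn't have gotten a successful response in
the first place".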

> the metadata update fails? How is this managed if at all? How are

What kind of metadata are you referring to? For rbd there is no metadata update.

> writes that straddle chunk boundaries handled from a transactional
> perspective? I am just in the process of investigation so please
> forgive me if the questions are very naive.

That depends on which client we're talking about. The short answer is
that the client will only get a response after all chunks were written
and acknowledged.

However, there are currently two different implementations; one is in
the linux kernel and the other one is based on librbd. In the linux
kernel, acknowledging the write is being done in byte order of the
request. That is, only after the first chunk was acked, the second one
would be acked, even if the osds responded in different order.
In librbd there's another complexity since we can cache the requests
and respond with an early ack. Ignoring that, the client will only get
a response after all chunks were applied and acked.
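
A minimal sketch of the in-order acknowledgment described for the
kernel client, assuming chunks are numbered 0..N-1 in byte order of the
request. The function name is hypothetical; this models only the
ordering rule, not the kernel implementation.

```python
def acks_per_completion(completion_order):
    """For each osd completion, list the chunks that become acked.

    A chunk is acked only once every earlier chunk has been acked,
    even if the osds responded in a different order.
    """
    done = set()
    next_to_ack = 0
    steps = []
    for chunk in completion_order:
        done.add(chunk)
        released = []
        # Release the longest contiguous run starting at next_to_ack.
        while next_to_ack in done:
            released.append(next_to_ack)
            next_to_ack += 1
        steps.append(released)
    return steps


# The osds finish chunk 2 first, then chunk 0, then chunk 1.
print(acks_per_completion([2, 0, 1]))  # [[], [0], [1, 2]]
```

Chunk 2 finishes first but is held back until chunks 0 and 1 have
completed.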

Yehuda

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2011-12-18 17:17     ` Yehuda Sadeh Weinraub
@ 2011-12-18 17:37       ` Bill Hastings
  0 siblings, 0 replies; 53+ messages in thread
From: Bill Hastings @ 2011-12-18 17:37 UTC (permalink / raw)
  To: Yehuda Sadeh Weinraub; +Cc: chb, ceph-devel

These are perhaps very inane questions but I am trying to wrap my head
around this whole thing. So basically the primary OSD handling a
particular PG will make sure that the writes happen at all replicas. I
am assuming the client would time out in case it doesn't get an
ack/commit within some time period. Is this true? If this is the case,
the write is deemed a failure, but a subsequent read could return the
value of the failed write because there don't seem to be any rollback
semantics. Is my understanding correct?

Are these replicas fixed for a PG? If for a PG the replicas are A, B
and C and C happens to be down during the write will the write be
deemed a failure till C is back up or is another replica chosen in
place of C?

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Understanding Ceph
@ 2013-01-19 15:50 Peter Smith
  2013-01-19 16:26 ` Dimitri Maziuk
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Smith @ 2013-01-19 15:50 UTC (permalink / raw)
  To: ceph-devel

Hi,

I am considering deploying Ceph as the volume backend for our
Openstack cloud service. After reviewing the documents available on
the Internet, I am still confused about several things.

1. Architecture/Implementation questions: What exactly are the
functions of kernel-rbd, the kernel client, and the kernel object? How
do the different parts of Ceph interact with each other, e.g. what is
the data path of librados/librbd requests going into the OSD daemon?
2. QEMU performance: The documentation says that QEMU uses librbd to
avoid the overhead of the kernel object. What does this mean? With the
answer to question 1, I can probably understand this one. Do you have
any data about the performance difference between Ceph and Sheepdog?
3. OS recommendation: The OS recommendation page:
http://ceph.com/docs/master/install/os-recommendations/#bobtail-0-56
says CentOS 6.3 has a default kernel with an old kernel client. CentOS
6.3 is our production environment. If we only make use of the Ceph
block storage feature, does this old kernel client affect the
stability of production? Do you suggest we upgrade from the default
kernel of CentOS 6.3? I am concerned that this will hurt the stability
of CentOS.

Thank you very much for answering my questions. I really appreciate it.

Regards,
Peter

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 15:50 Understanding Ceph Peter Smith
@ 2013-01-19 16:26 ` Dimitri Maziuk
  2013-01-19 16:51   ` Denis Fondras
  2013-01-19 17:13   ` Sage Weil
  0 siblings, 2 replies; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-19 16:26 UTC (permalink / raw)
  To: Peter Smith; +Cc: ceph-devel

On 1/19/2013 9:50 AM, Peter Smith wrote:

> 3. OS recommendation: The OS recommendation page:
> http://ceph.com/docs/master/install/os-recommendations/#bobtail-0-56
> says CentOS 6.3 has a default kernel with old kernel client. CentOS
> 6.3 is our production environment.

I was unable to get ceph to run on centos 6.3 following the "5 minute 
quick start" document. I did get one machine to "unclean" cluster state 
using elrepo's kernel 3.7, but that kernel doesn't boot on most of our 
boxen (boot-time oops in pata-acpi).

It looks to me that unless inktank backports rbd.ko to 2.6.32, centos/sl 
users won't be able to test ceph for a couple of years: until centos 7.1 
comes out and we get the chance to reimage our hardware.

Dima

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 16:26 ` Dimitri Maziuk
@ 2013-01-19 16:51   ` Denis Fondras
  2013-01-19 17:15     ` Wenhao Xu
  2013-01-19 17:13   ` Sage Weil
  1 sibling, 1 reply; 53+ messages in thread
From: Denis Fondras @ 2013-01-19 16:51 UTC (permalink / raw)
  To: ceph-devel

Hello,

>
> I was unable to get ceph to run on centos 6.3 following the "5 minute

Same here... I was only able to build the ceph-fuse client.

Denis

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 16:26 ` Dimitri Maziuk
  2013-01-19 16:51   ` Denis Fondras
@ 2013-01-19 17:13   ` Sage Weil
  2013-01-19 17:25     ` Peter Smith
  2013-01-20 16:32     ` Dimitri Maziuk
  1 sibling, 2 replies; 53+ messages in thread
From: Sage Weil @ 2013-01-19 17:13 UTC (permalink / raw)
  To: Dimitri Maziuk; +Cc: Peter Smith, ceph-devel

On Sat, 19 Jan 2013, Dimitri Maziuk wrote:
> On 1/19/2013 9:50 AM, Peter Smith wrote:
> 
> > 3. OS recommendation: The OS recommendation page:
> > http://ceph.com/docs/master/install/os-recommendations/#bobtail-0-56
> > says CentOS 6.3 has a default kernel with old kernel client. CentOS
> > 6.3 is our production environment.
> 
> I was unable to get ceph to run on centos 6.3 following the "5 minute quick
> start" document. I did get one machine to "unclean" cluster state using
> elrepo's kernel 3.7, but that kernel doesn't boot on most of our boxen
> (boot-time oops in pata-acpi).
> 
> It looks to me that unless inktank backports rbd.ko to 2.6.32, centos/sl users
> won't be able to test ceph for a couple of years: until centos 7.1 comes out
> and we get the chance to reimage our hardware.

If you want to use the kernel client(s), that is true: there are no plans 
to backport the client code to the ancient RHEL kernels.  Nothing prevents 
you from running the server side, though, or the userland clients 
(ceph-fuse, librbd, qemu/KVM, radosgw, etc.)

sage

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 16:51   ` Denis Fondras
@ 2013-01-19 17:15     ` Wenhao Xu
  0 siblings, 0 replies; 53+ messages in thread
From: Wenhao Xu @ 2013-01-19 17:15 UTC (permalink / raw)
  To: Denis Fondras; +Cc: ceph-devel

Oops, this does not seem to be good news for CentOS users. Is kvm's rbd
image/volume use case well supported? Or will Inktank consider
supporting CentOS in the near future?

CephFS does not seem to be an urgent requirement. But rbd and the
object store are the core requirements for cloud service providers who
are running CentOS. If they are not supported, I think sheepdog should
be a good alternative for the volume backend. And users can keep using
swift as the object store on CentOS.


--
~_~

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 17:13   ` Sage Weil
@ 2013-01-19 17:25     ` Peter Smith
  2013-01-19 17:38       ` Sage Weil
  2013-01-20 16:32     ` Dimitri Maziuk
  1 sibling, 1 reply; 53+ messages in thread
From: Peter Smith @ 2013-01-19 17:25 UTC (permalink / raw)
  To: Sage Weil; +Cc: Dimitri Maziuk, ceph-devel

Thanks for the reply, Sage and everyone.

Sage, so I can expect Ceph-rbd works well on Centos 6.3 if I only use
it as the Cinder volume backend because the librbd in QEMU doesn't
make use of kernel client, right?

Could you explain a bit more about what the functions of the kernel
client are? Will it influence daily operations, such as listing
volumes and devices by using ceph commands?







^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 17:25     ` Peter Smith
@ 2013-01-19 17:38       ` Sage Weil
  2013-01-19 18:08         ` Jeff Mitchell
  2013-01-19 18:16         ` Peter Smith
  0 siblings, 2 replies; 53+ messages in thread
From: Sage Weil @ 2013-01-19 17:38 UTC (permalink / raw)
  To: Peter Smith; +Cc: Dimitri Maziuk, ceph-devel

On Sun, 20 Jan 2013, Peter Smith wrote:
> Thanks for the reply, Sage and everyone.
> 
> Sage, so I can expect Ceph-rbd works well on Centos 6.3 if I only use
> it as the Cinder volume backend because the librbd in QEMU doesn't
> make use of kernel client, right?

Then the dependency is on the qemu version.  I don't remember that off the 
top of my head, or know what version rhel6 ships.  Most people deploying 
openstack and rbd are using a more modern distro (like ubuntu 12.04).

Josh would know more...

> Could you explain a bit more about what are the functions of kernel
> client? Will it influence the daily operations, such as listing
> volumes, devices by using ceph commands?

There is no kernel dependency at all for the radosgw and qemu+rbd
use cases.

sage



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 17:38       ` Sage Weil
@ 2013-01-19 18:08         ` Jeff Mitchell
  2013-01-19 18:16           ` Sage Weil
  2013-01-19 18:16         ` Peter Smith
  1 sibling, 1 reply; 53+ messages in thread
From: Jeff Mitchell @ 2013-01-19 18:08 UTC (permalink / raw)
  To: Sage Weil; +Cc: Peter Smith, Dimitri Maziuk, ceph-devel

Sage Weil wrote:
> On Sun, 20 Jan 2013, Peter Smith wrote:
>> Thanks for the reply, Sage and everyone.
>>
>> Sage, so I can expect Ceph-rbd works well on Centos 6.3 if I only use
>> it as the Cinder volume backend because the librbd in QEMU doesn't
>> make use of kernel client, right?
>
> Then the dependency is on the qemu version.  I don't remember that off the
> top of my head, or know what version rhel6 ships.  Most people deploying
> openstack and rbd are using a more modern distro (like ubuntu 12.04).

This discussion has made me curious: I'm using Ganeti to manage VMs, 
which manages the storage using the kernel client and passes in the dev 
device to qemu.

Can you comment on any known performance differences between the two 
methods -- native qemu+librbd creating a block device vs. the kernel 
client creating a block device?

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 18:08         ` Jeff Mitchell
@ 2013-01-19 18:16           ` Sage Weil
  2013-01-19 18:25             ` Peter Smith
  2013-01-20 16:39             ` Dimitri Maziuk
  0 siblings, 2 replies; 53+ messages in thread
From: Sage Weil @ 2013-01-19 18:16 UTC (permalink / raw)
  To: Jeff Mitchell; +Cc: Peter Smith, Dimitri Maziuk, ceph-devel

On Sat, 19 Jan 2013, Jeff Mitchell wrote:
> Sage Weil wrote:
> > On Sun, 20 Jan 2013, Peter Smith wrote:
> > > Thanks for the reply, Sage and everyone.
> > > 
> > > Sage, so I can expect Ceph-rbd works well on Centos 6.3 if I only use
> > > it as the Cinder volume backend because the librbd in QEMU doesn't
> > > make use of kernel client, right?
> > 
> > Then the dependency is on the qemu version.  I don't remember that off the
> > top of my head, or know what version rhel6 ships.  Most people deploying
> > openstack and rbd are using a more modern distro (like ubuntu 12.04).
> 
> This discussion has made me curious: I'm using Ganeti to manage VMs, which
> manages the storage using the kernel client and passes in the dev device to
> qemu.
> 
> Can you comment on any known performance differences between the two methods
> -- native qemu+librbd creating a block device vs. the kernel client creating a
> block device?

librbd is faster-paced and has more features, including client-side 
caching (analogous to the cache in a hard drive), discard, and support for 
image cloning.  It tends to perform better.

The kernel client can be combined with FlashCache or something similar, 
although that isn't something we've tested.

We generally recommend the KVM+librbd route, as it is easier to manage the 
dependencies, and is well integrated with libvirt.  FWIW this is what 
OpenStack and CloudStack normally use.

sage

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 17:38       ` Sage Weil
  2013-01-19 18:08         ` Jeff Mitchell
@ 2013-01-19 18:16         ` Peter Smith
  2013-01-19 21:10           ` Josh Durgin
  1 sibling, 1 reply; 53+ messages in thread
From: Peter Smith @ 2013-01-19 18:16 UTC (permalink / raw)
  To: Sage Weil; +Cc: Dimitri Maziuk, ceph-devel@vger.kernel.org




On Jan 20, 2013, at 1:38 AM, Sage Weil <sage@inktank.com> wrote:

> On Sun, 20 Jan 2013, Peter Smith wrote:
>> Thanks for the reply, Sage and everyone.
>> 
>> Sage, so I can expect Ceph-rbd works well on Centos 6.3 if I only use
>> it as the Cinder volume backend because the librbd in QEMU doesn't
>> make use of kernel client, right?
> 
> Then the dependency is on the qemu version.  I don't remember that off the 
> top of my head, or know what version rhel6 ships.  Most people deploying 
> openstack and rbd are using a more modern distro (like ubuntu 12.04).

That is fine. What qemu version do you recommend? I can build one myself.

CentOS is a hard requirement for our IT. Ubuntu was known to be not very stable before. We are not sure how it is now.

> 
> Josh would know more...
> 
>> Could you explain a bit more about what are the functions of kernel
>> client? Will it influence the daily operations, such as listing
>> volumes, devices by using ceph commands?
> 
> There is no kernel dependency at all for the radosgw and qemu+rbd use- 
> cases.
Thanks for the clarification.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 18:16           ` Sage Weil
@ 2013-01-19 18:25             ` Peter Smith
  2013-01-20 16:39             ` Dimitri Maziuk
  1 sibling, 0 replies; 53+ messages in thread
From: Peter Smith @ 2013-01-19 18:25 UTC (permalink / raw)
  To: Sage Weil; +Cc: Jeff Mitchell, Dimitri Maziuk, ceph-devel@vger.kernel.org

I am curious: what do the kernel parts of ceph do, and what do the user parts do? Is there a web page describing this in detail?

From what you described, in the librbd case, user parts do not need the kernel parts at all, right? This sounds very good to me.



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 18:16         ` Peter Smith
@ 2013-01-19 21:10           ` Josh Durgin
  2013-01-20  0:41             ` Jeff Mitchell
  0 siblings, 1 reply; 53+ messages in thread
From: Josh Durgin @ 2013-01-19 21:10 UTC (permalink / raw)
  To: Peter Smith; +Cc: Sage Weil, Dimitri Maziuk, ceph-devel@vger.kernel.org

On 01/19/2013 10:16 AM, Peter Smith wrote:
> That is fine. What qemu version do u recommend? I can build one myself.

I'd recommend qemu 1.2+. You'll probably need a newer libvirt than 
Centos 6 has as well. libvirt 0.10+ is ideal. Ubuntu has an older
version, but with important fixes backported.

Josh

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 21:10           ` Josh Durgin
@ 2013-01-20  0:41             ` Jeff Mitchell
  2013-01-20  3:24               ` Peter Smith
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff Mitchell @ 2013-01-20  0:41 UTC (permalink / raw)
  To: Josh Durgin
  Cc: Peter Smith, Sage Weil, Dimitri Maziuk,
	ceph-devel@vger.kernel.org

> I'd recommend qemu 1.2+. You'll probably need a newer libvirt than Centos 6
> has as well. libvirt 0.10+ is ideal. Ubuntu has an older
> version, but with important fixes backported.

Your guys' most supported/tested platform is Precise, according to
http://ceph.com/docs/master/install/os-recommendations/ -- so is it
more ideal to run Precise (with your packages) with the older +
patched versions of libvirt and qemu, or better to run e.g. Quantal
with newer libvirt + qemu but a less tested platform?

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-20  0:41             ` Jeff Mitchell
@ 2013-01-20  3:24               ` Peter Smith
  2013-01-20  3:56                 ` Sage Weil
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Smith @ 2013-01-20  3:24 UTC (permalink / raw)
  To: Jeff Mitchell
  Cc: Josh Durgin, Sage Weil, Dimitri Maziuk,
	ceph-devel@vger.kernel.org

Or upgrade to 3.7.3 kernel on Precise? Does Inktank test on Ubuntu
12.04 with old kernel or 3.7.3 kernel?



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-20  3:24               ` Peter Smith
@ 2013-01-20  3:56                 ` Sage Weil
  0 siblings, 0 replies; 53+ messages in thread
From: Sage Weil @ 2013-01-20  3:56 UTC (permalink / raw)
  To: Peter Smith
  Cc: Jeff Mitchell, Josh Durgin, Dimitri Maziuk,
	ceph-devel@vger.kernel.org

On Sun, 20 Jan 2013, Peter Smith wrote:
> Or upgrade to 3.7.3 kernel on Precise? Does Inktank test on Ubuntu
> 12.04 with old kernel or 3.7.3 kernel?

We test mostly mainline development kernels in the course of testing our 
own work and ensuring there aren't upstream regressions.

sage

> 
> 
> On Sun, Jan 20, 2013 at 8:41 AM, Jeff Mitchell
> <jeffrey.mitchell@gmail.com> wrote:
> >> I'd recommend qemu 1.2+. You'll probably need a newer libvirt than Centos 6
> >> has as well. libvirt 0.10+ is ideal. Ubuntu has an older
> >> version, but with important fixes backported.
> >
> > Your guys' most supported/tested platform is Precise, according to
> > http://ceph.com/docs/master/install/os-recommendations/ -- so is it
> > more ideal to run Precise (with your packages) with the older +
> > patched versions of libvirt and qemu, or better to run e.g. Quantal
> > with newer libvirt + qemu but a less tested platform?
> >
> > Thanks,
> > Jeff
> 
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 17:13   ` Sage Weil
  2013-01-19 17:25     ` Peter Smith
@ 2013-01-20 16:32     ` Dimitri Maziuk
  2013-01-24 18:16       ` Dan Mick
  1 sibling, 1 reply; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-20 16:32 UTC (permalink / raw)
  To: ceph-devel

On 1/19/2013 11:13 AM, Sage Weil wrote:

> If you want to use the kernel client(s), that is true: there are no plans
> to backport the client code to the ancient RHEL kernels.  Nothing prevents
> you from running the server side, though, or the userland clients
> (ceph-fuse, librbd, qemu/KVM, radosgw, etc.)

mkcephfs form 5-minute start fails without rbd.ko. I already reported that.

Dima



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-19 18:16           ` Sage Weil
  2013-01-19 18:25             ` Peter Smith
@ 2013-01-20 16:39             ` Dimitri Maziuk
  2013-01-23 15:13               ` Sam Lang
  1 sibling, 1 reply; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-20 16:39 UTC (permalink / raw)
  To: ceph-devel

On 1/19/2013 12:16 PM, Sage Weil wrote:

> We generally recommend the KVM+librbd route, as it is easier to manage the
> dependencies, and is well integrated with libvirt.  FWIW this is what
> OpenStack and CloudStack normally use.

OK, so is there a quick start document for that configuration?

(Oh, and "form" in my other message is supposed to be "from": tyop)

Dima



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-20 16:39             ` Dimitri Maziuk
@ 2013-01-23 15:13               ` Sam Lang
  2013-01-23 16:19                 ` Patrick McGarry
  0 siblings, 1 reply; 53+ messages in thread
From: Sam Lang @ 2013-01-23 15:13 UTC (permalink / raw)
  To: dmaziuk; +Cc: ceph-devel@vger.kernel.org

On Sun, Jan 20, 2013 at 10:39 AM, Dimitri Maziuk <dmaziuk@bmrb.wisc.edu> wrote:
> On 1/19/2013 12:16 PM, Sage Weil wrote:
>
>> We generally recommend the KVM+librbd route, as it is easier to manage the
>> dependencies, and is well integrated with libvirt.  FWIW this is what
>> OpenStack and CloudStack normally use.
>
>
> OK, so is there a quick stat document for that configuration?

http://ceph.com/docs/master/rbd/rbd-openstack/
-sam

>
> (Oh, and "form" in my other message is supposed to be "from": tyop)
>
> Dima
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-23 15:13               ` Sam Lang
@ 2013-01-23 16:19                 ` Patrick McGarry
  2013-01-24  0:10                   ` Dimitri Maziuk
  0 siblings, 1 reply; 53+ messages in thread
From: Patrick McGarry @ 2013-01-23 16:19 UTC (permalink / raw)
  To: Sam Lang; +Cc: dmaziuk, ceph-devel@vger.kernel.org

Dimitri,

For what it's worth I also stepped through the process of spinning up
Ceph and OpenStack on a single EC2 node in a recent blog entry:

http://ceph.com/howto/building-a-public-ami-with-ceph-and-openstack/

It has some shortcuts (read: not meant to be production) but it may
help give you a quicker quickstart.  Feel free to shout if you have
questions, either here or poke scuttlemonkey on #ceph or twitter.
Good luck.  Thanks.


Best Regards,

Patrick

On Wed, Jan 23, 2013 at 10:13 AM, Sam Lang <sam.lang@inktank.com> wrote:
> On Sun, Jan 20, 2013 at 10:39 AM, Dimitri Maziuk <dmaziuk@bmrb.wisc.edu> wrote:
>> On 1/19/2013 12:16 PM, Sage Weil wrote:
>>
>>> We generally recommend the KVM+librbd route, as it is easier to manage the
>>> dependencies, and is well integrated with libvirt.  FWIW this is what
>>> OpenStack and CloudStack normally use.
>>
>>
>> OK, so is there a quick stat document for that configuration?
>
> http://ceph.com/docs/master/rbd/rbd-openstack/
> -sam
>
>>
>> (Oh, and "form" in my other message is supposed to be "from": tyop)
>>
>> Dima
>>
>>
>>



-- 
Patrick McGarry
Director, Community
Inktank

@scuttlemonkey @inktank @ceph

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-23 16:19                 ` Patrick McGarry
@ 2013-01-24  0:10                   ` Dimitri Maziuk
  2013-01-24  0:17                     ` John Nielsen
  2013-01-24  8:49                     ` Gandalf Corvotempesta
  0 siblings, 2 replies; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24  0:10 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1091 bytes --]

On 01/23/2013 10:19 AM, Patrick McGarry wrote:

> http://ceph.com/howto/building-a-public-ami-with-ceph-and-openstack/

> On Wed, Jan 23, 2013 at 10:13 AM, Sam Lang <sam.lang@inktank.com> wrote:

>> http://ceph.com/docs/master/rbd/rbd-openstack/

These are both great, I'm sure, but Patrick's page says "I chose to
follow the 5 minute quickstart guide" and the rbd-openstack page says
"Important ... you must have a running Ceph cluster."

My problem is I can't find a "5 minute quickstart guide" for RHEL 6, and
I didn't get a "running ceph cluster" by trying to follow the existing
(Ubuntu) guide and adjusting it for CentOS 6.3.

So I'm stuck at a point way before those guides become relevant: once I
had one OSD/MDS/MON box up, I got "HEALTH_WARN 384 pgs degraded; 384 pgs
stuck unclean; recovery 21/42 degraded (50.000%)" (384 appears to be the
number of placement groups created by default).

What does that mean? That I only have one OSD? Or is it genuinely unhealthy?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24  0:10                   ` Dimitri Maziuk
@ 2013-01-24  0:17                     ` John Nielsen
  2013-01-24  2:36                       ` Dimitri Maziuk
  2013-01-24 15:45                       ` Dimitri Maziuk
  2013-01-24  8:49                     ` Gandalf Corvotempesta
  1 sibling, 2 replies; 53+ messages in thread
From: John Nielsen @ 2013-01-24  0:17 UTC (permalink / raw)
  To: Dimitri Maziuk; +Cc: ceph-devel@vger.kernel.org

On Jan 23, 2013, at 5:10 PM, Dimitri Maziuk <dmaziuk@bmrb.wisc.edu> wrote:

> On 01/23/2013 10:19 AM, Patrick McGarry wrote:
> 
>> http://ceph.com/howto/building-a-public-ami-with-ceph-and-openstack/
> 
>> On Wed, Jan 23, 2013 at 10:13 AM, Sam Lang <sam.lang@inktank.com> wrote:
> 
>>> http://ceph.com/docs/master/rbd/rbd-openstack/
> 
> These are both great, I'm sure, but Patrick's page says "I chose to
> follow the 5 minute quickstart guide" and the rbd-openstack page says
> "Important ... you must have a running Ceph cluster."
> 
> My problem is I can;t find a "5 minute quickstart guide" for RHEL 6. and
> I didn't get a "running ceph cluster" by trying to follow the existing
> (ubuntu) guide and adjust for centos 6.3.

http://ceph.com/docs/master/install/rpm/
http://ceph.com/docs/master/start/quick-start/

Between those two links my own quick-start on CentOS 6.3 was maybe 6 minutes. YMMV.

After learning that qemu uses librbd (and thus doesn't rely on the rbd kernel module) I was happy to stick with the stock CentOS kernel for my servers (with updated qemu and libvirt builds).

> So I'm stuck at a point way before those guides become relevant: once I
> had one OSD/MDS/MON box up, I got "HEALTH_WARN 384 pgs degraded; 384 pgs
> stuck unclean; recovery 21/42 degraded (50.000%)" (384 appears be the
> number of placement groups created by default).
> 
> What does that mean? That I only have one OSD? Or is it genuinely unhealthy?

Assuming you have more than one host, be sure that iptables or another firewall isn't preventing communication between the ceph daemons.

JN


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24  0:17                     ` John Nielsen
@ 2013-01-24  2:36                       ` Dimitri Maziuk
  2013-01-24 15:45                       ` Dimitri Maziuk
  1 sibling, 0 replies; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24  2:36 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 893 bytes --]

On 01/23/2013 06:17 PM, John Nielsen wrote:
...
> http://ceph.com/docs/master/install/rpm/
> http://ceph.com/docs/master/start/quick-start/
> 
> Between those two links my own quick-start on CentOS 6.3 was maybe 6 minutes. YMMV.

It does, obviously, since

"Deploy the configuration
...
2. Execute the following on the Ceph server host
cd /etc/ceph
sudo mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring
"

was failing here until I booted an elrepo 3.7 kernel with rbd.ko.

>> HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%)

>> What does that mean? That I only have one OSD? Or is it genuinely unhealthy?

> Assuming you have more than one host ...

I just said I have one host. So is that expected when I only have one host?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24  0:10                   ` Dimitri Maziuk
  2013-01-24  0:17                     ` John Nielsen
@ 2013-01-24  8:49                     ` Gandalf Corvotempesta
  2013-01-24 13:55                       ` Dimitri Maziuk
  1 sibling, 1 reply; 53+ messages in thread
From: Gandalf Corvotempesta @ 2013-01-24  8:49 UTC (permalink / raw)
  To: Dimitri Maziuk; +Cc: ceph-devel@vger.kernel.org

2013/1/24 Dimitri Maziuk <dmaziuk@bmrb.wisc.edu>:
> So I'm stuck at a point way before those guides become relevant: once I
> had one OSD/MDS/MON box up, I got "HEALTH_WARN 384 pgs degraded; 384 pgs
> stuck unclean; recovery 21/42 degraded (50.000%)" (384 appears be the
> number of placement groups created by default).
>
> What does that mean? That I only have one OSD? Or is it genuinely unhealthy?

ceph is building its cluster. You should wait for it.
In my case, it needed 5-10 minutes.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24  8:49                     ` Gandalf Corvotempesta
@ 2013-01-24 13:55                       ` Dimitri Maziuk
       [not found]                         ` <CAKMAVE9HMo4x3seuG7ppeafSRJmBwjUXrLv0GUA-z5kDXyhoQA@mail.gmail.com>
  0 siblings, 1 reply; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 13:55 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: ceph-devel@vger.kernel.org

On 1/24/2013 2:49 AM, Gandalf Corvotempesta wrote:
> 2013/1/24 Dimitri Maziuk <dmaziuk@bmrb.wisc.edu>:
>> So I'm stuck at a point way before those guides become relevant: once I
>> had one OSD/MDS/MON box up, I got "HEALTH_WARN 384 pgs degraded; 384 pgs
>> stuck unclean; recovery 21/42 degraded (50.000%)" (384 appears be the
>> number of placement groups created by default).
>>
>> What does that mean? That I only have one OSD? Or is it genuinely unhealthy?
>
> ceph is building it's cluster. You should wait for it.
> In my case, it needed 5-10 minutes.

No, that's not it: it was stuck in that state for 40 minutes or so.

Dima


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
       [not found]                         ` <CAKMAVE9HMo4x3seuG7ppeafSRJmBwjUXrLv0GUA-z5kDXyhoQA@mail.gmail.com>
@ 2013-01-24 15:28                           ` Dimitri Maziuk
  2013-01-24 16:12                             ` Sam Lang
                                               ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 15:28 UTC (permalink / raw)
  To: Sam Lang; +Cc: ceph-devel@vger.kernel.org

On 1/24/2013 8:20 AM, Sam Lang wrote:

> Yep it means that you only have one OSD with replication level of 2.
> If you had a rep level of 3, you would see degraded (66.667%).  If you
> just want to make the message go away (for testing purposes), you can
> set the rep level to 1
> (http://ceph.com/w/index.php?title=Adjusting_replication_level&redirect=no).
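
The percentages quoted above follow from simple arithmetic. Here is a sketch under a simplified model in which every object is short by the same number of copies. The function name `degraded_pct` is made up for illustration and is not a Ceph tool.

```shell
#!/bin/sh
# degraded_pct REP OSDS: percentage of replica copies that cannot be placed
# when a pool wants REP copies of each object but only OSDS OSDs exist.
# Simplified model: every object is missing the same number of copies.
degraded_pct() {
    awk -v rep="$1" -v osds="$2" 'BEGIN {
        have = (osds < rep) ? osds : rep   # copies that can actually exist
        printf "%.3f\n", (rep - have) / rep * 100
    }'
}

degraded_pct 2 1   # one OSD, rep level 2 -> 50.000
degraded_pct 3 1   # one OSD, rep level 3 -> 66.667
degraded_pct 2 2   # a second OSD added   -> 0.000
```

The "21/42" in the health line is consistent with this reading: 42 wanted copies, of which 21 have nowhere to go.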

OK, thanks Sam and Dino -- I kinda suspected that but didn't find any docs.

This looks like it's not adjustable via ceph.conf, I can only do it at 
runtime, correct?

Dima


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24  0:17                     ` John Nielsen
  2013-01-24  2:36                       ` Dimitri Maziuk
@ 2013-01-24 15:45                       ` Dimitri Maziuk
  2013-01-24 15:53                         ` Jens Kristian Søgaard
  2013-01-24 16:22                         ` Sam Lang
  1 sibling, 2 replies; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 15:45 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

One other question I have left (so far) is: I read and tried to follow 
http://ceph.com/docs/master/install/rpm/ and 
http://ceph.com/docs/master/start/quick-start/ on centos 6.3.

mkcephfs step fails without rbd kernel module.

I just tried to find "libvirt", "kernel", "module", and "qemu" on those 
pages: "kernel" occurs in "add ceph packages" section and "module" 
occurs in the header, footer, and the side menu. 0 hits for the others.

So when I read "after learning that qemu uses librbd (and thus doesn't 
rely on the rbd kernel module) I was happy to stick with the stock 
CentOS kernel for my servers (with updated qemu and libvirt builds)" -- 
forgive me for being dense, but I have no context for this. Where in 
ceph.conf do I tell it to use qemu and librbd instead of kernel module? 
Or does it mean I'm to set up my OSDs in virtual machines? Seems I'm 
missing an important piece of information here (possibly because it's 
blatantly obvious and is staring me in the face -- wouldn't be the first 
time).

So what is it that I'm missing?

TIA
Dima

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24 15:45                       ` Dimitri Maziuk
@ 2013-01-24 15:53                         ` Jens Kristian Søgaard
  2013-01-24 15:58                           ` Wido den Hollander
  2013-01-24 16:22                         ` Sam Lang
  1 sibling, 1 reply; 53+ messages in thread
From: Jens Kristian Søgaard @ 2013-01-24 15:53 UTC (permalink / raw)
  To: dmaziuk; +Cc: ceph-devel@vger.kernel.org

Hi Dimitri,

> Where in ceph.conf do I tell it to use qemu and librbd instead of
> kernel module?

You do not need to specify that in ceph.conf.

When you run qemu, specify the disk like this, for example:

  -drive format=rbd,file=rbd:/pool/imagename,if=virtio,index=0,boot=on

Where you replace "pool" and "imagename" with whatever you have called them.

You can create new images with the qemu-img utility.

-- 
Jens Kristian Søgaard, Mermaid Consulting ApS,
jens@mermaidconsulting.dk,
http://wwww.mermaidconsulting.com/


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24 15:53                         ` Jens Kristian Søgaard
@ 2013-01-24 15:58                           ` Wido den Hollander
  2013-01-24 16:14                             ` Dimitri Maziuk
  0 siblings, 1 reply; 53+ messages in thread
From: Wido den Hollander @ 2013-01-24 15:58 UTC (permalink / raw)
  To: Jens Kristian Søgaard; +Cc: dmaziuk, ceph-devel@vger.kernel.org

On 01/24/2013 04:53 PM, Jens Kristian Søgaard wrote:
> Hi Dimitri,
>
>> Where in ceph.conf do I tell it to use qemu and librbd instead of
>> kernel module?
>
> You do not need to specify that in ceph.conf.
>
> When you run qemu then specify the disk for example like this:
>
>   -drive format=rbd,file=rbd:/pool/imagename,if=virtio,index=0,boot=on
>

Small typo :) It has to be:

  -drive format=rbd,file=rbd:pool/imagename,if=virtio,index=0,boot=on
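
As a sketch, the corrected option can be built from a pool and image name with a small helper. The function name `rbd_drive_arg` and the names "rbd" and "vm1" are placeholders for illustration. The option string itself is copied from the line above.

```shell
#!/bin/sh
# rbd_drive_arg POOL IMAGE: print the -drive option for an RBD-backed disk,
# using the corrected rbd:pool/image form (no slash right after "rbd:").
rbd_drive_arg() {
    printf '%s\n' "-drive format=rbd,file=rbd:$1/$2,if=virtio,index=0,boot=on"
}

# "rbd" and "vm1" are placeholder pool and image names.
rbd_drive_arg rbd vm1
```

The printed option goes on the qemu command line. Nothing in ceph.conf has to change for it.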

Wido

> Where you replace "pool" and "imagename" with whatever you have called
> them.
>
> You can create new images with the qemu-img utility.
>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24 15:28                           ` Dimitri Maziuk
@ 2013-01-24 16:12                             ` Sam Lang
  2013-01-24 18:15                             ` Dan Mick
       [not found]                             ` <CAM2gkg6m2S0DtgapSOg16GTmGQHsj7fz=a3XzH0ZsvcCWHcBtg@mail.gmail.com>
  2 siblings, 0 replies; 53+ messages in thread
From: Sam Lang @ 2013-01-24 16:12 UTC (permalink / raw)
  To: Dimitri Maziuk; +Cc: ceph-devel@vger.kernel.org

On Thu, Jan 24, 2013 at 9:28 AM, Dimitri Maziuk <dmaziuk@bmrb.wisc.edu> wrote:
> On 1/24/2013 8:20 AM, Sam Lang wrote:
>
>> Yep it means that you only have one OSD with replication level of 2.
>> If you had a rep level of 3, you would see degraded (66.667%).  If you
>> just want to make the message go away (for testing purposes), you can
>> set the rep level to 1
>>
>> (http://ceph.com/w/index.php?title=Adjusting_replication_level&redirect=no).
>
>
> OK, thanks Sam and Dino -- I kinda suspected that but didn't find any docs.
>
> This looks like it's not adjustable via ceph.conf, I can only do it at
> runtime, correct?

Correct.
-sam
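
A sketch of the runtime adjustment for a single-OSD test box. The pool names are an assumption (the default pools of that era) and `set_rep_level` is a made-up wrapper around `ceph osd pool set`. The script only prints the commands unless CEPH_CMD is overridden.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set CEPH_CMD=ceph in the environment to run them against a live cluster.
CEPH_CMD=${CEPH_CMD:-"echo ceph"}

# set_rep_level SIZE POOL...: set the replica count on each named pool.
set_rep_level() {
    size=$1
    shift
    for pool in "$@"; do
        $CEPH_CMD osd pool set "$pool" size "$size"
    done
}

# Pool names are assumptions; list the real ones with "ceph osd lspools".
set_rep_level 1 data metadata rbd
```

Rep level 1 keeps a single copy of everything, so this is for testing only. Raise the size again as OSDs are added.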

>
> Dima
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24 15:58                           ` Wido den Hollander
@ 2013-01-24 16:14                             ` Dimitri Maziuk
  2013-01-24 16:18                               ` Jens Kristian Søgaard
  0 siblings, 1 reply; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 16:14 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

On 1/24/2013 9:58 AM, Wido den Hollander wrote:
> On 01/24/2013 04:53 PM, Jens Kristian Søgaard wrote:
>> Hi Dimitri,
>>
>>> Where in ceph.conf do I tell it to use qemu and librbd instead of
>>> kernel module?
>>
>> You do not need to specify that in ceph.conf.
>>
>> When you run qemu then specify the disk for example like this:
>>
>>   -drive format=rbd,file=rbd:/pool/imagename,if=virtio,index=0,boot=on
>>
>
> Small typo :) It has to be:
>
>   -drive format=rbd,file=rbd:pool/imagename,if=virtio,index=0,boot=on

Thanks but I'm still missing the context. I'm following this document:
  http://ceph.com/docs/master/start/quick-start/
to set up an osd/mds/mon *server*.

The step that's failing without the kernel module is "Deploy the 
configuration #2":
  mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring

Are you saying I'm to run "qemu -drive ..." instead of mkcephfs?

Dima (I'm assuming either you aren't or qemu has changed a lot since I 
last looked)


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24 16:14                             ` Dimitri Maziuk
@ 2013-01-24 16:18                               ` Jens Kristian Søgaard
  0 siblings, 0 replies; 53+ messages in thread
From: Jens Kristian Søgaard @ 2013-01-24 16:18 UTC (permalink / raw)
  To: dmaziuk; +Cc: ceph-devel@vger.kernel.org

Hi Dimitri,

> The step that's failing without the kernel module is "Deploy the
> configuration #2":
>   mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring

Could you elaborate on how it fails?

Do you get an error message?

> Are you saying I'm to run "qemu -drive ..." instead of mkcephfs?

No, not at all.

-- 
Jens Kristian Søgaard, Mermaid Consulting ApS,
jens@mermaidconsulting.dk,
http://wwww.mermaidconsulting.com/



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24 15:45                       ` Dimitri Maziuk
  2013-01-24 15:53                         ` Jens Kristian Søgaard
@ 2013-01-24 16:22                         ` Sam Lang
  2013-01-24 17:09                           ` Dimitri Maziuk
  1 sibling, 1 reply; 53+ messages in thread
From: Sam Lang @ 2013-01-24 16:22 UTC (permalink / raw)
  To: Dimitri Maziuk; +Cc: ceph-devel@vger.kernel.org

On Thu, Jan 24, 2013 at 9:45 AM, Dimitri Maziuk <dmaziuk@bmrb.wisc.edu> wrote:
>
> One other question I have left (so far) is: I read and tried to follow
> http://ceph.com/docs/master/install/rpm/ and
> http://ceph.com/docs/master/start/quick-start/ on centos 6.3.
>
> mkcephfs step fails without rbd kernel module.
>
> I just tried to find "libvirt", "kernel", "module", and "qemu" on those
> pages: "kernel" occurs in "add ceph packages" section and "module" occurs in
> the header, footer, and the side menu. 0 hits for the others.
>
> So when I read "after learning that qemu uses librbd (and thus doesn't rely
> on the rbd kernel module) I was happy to stick with the stock CentOS kernel
> for my servers (with updated qemu and libvirt builds)" -- forgive me for
> being dense, but I have no context for this. Where in ceph.conf do I tell it
> to use qemu and librbd instead of kernel module? Or does it mean I'm to set
> up my OSDs in virtual machines? Seems I'm missing an important piece of
> information here (possibly because it's blatantly obvious and is staring me
> in the face -- woudn't be the first time).
>
> So what is it that I'm missing?

Setting up ceph is separate from setting up qemu to use librbd.
That's why the instructions at
http://ceph.com/docs/master/rbd/rbd-openstack/ say you need to have
your ceph cluster already running.  You can just setup ceph without
any specific config parameters for qemu or librbd, and then follow the
instructions there to get librbd working with openstack.  Because
librbd runs in userspace, you don't need the rbd kernel module, or a
modified kernel, so the stock centos kernel (or another distro's
kernel) should work for you.  Does that make sense?

-sam

>
> TIA
> Dima
>
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24 16:22                         ` Sam Lang
@ 2013-01-24 17:09                           ` Dimitri Maziuk
  2013-01-24 17:16                             ` John Nielsen
  0 siblings, 1 reply; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 17:09 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

On 1/24/2013 10:22 AM, Sam Lang wrote:

...  Does that make sense?

Yes, but when I'm trying to set up a ceph server using the quick start 
guide, mkcephfs is failing with an error message I didn't write down, 
but the complaint was along the lines of missing rbd.ko. Booting a 3.7 
kernel made it go away.

This is the part where everyone says "server stuff should run on the 
stock centos kernel" but in my reality it doesn't. (So I'm trying to 
figure out why my reality is different from everyone else's ;)

I'll see if I can reproduce it and post the exact error message.

Dima

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24 17:09                           ` Dimitri Maziuk
@ 2013-01-24 17:16                             ` John Nielsen
  0 siblings, 0 replies; 53+ messages in thread
From: John Nielsen @ 2013-01-24 17:16 UTC (permalink / raw)
  To: dmaziuk; +Cc: ceph-devel@vger.kernel.org

On Jan 24, 2013, at 10:09 AM, Dimitri Maziuk <dmaziuk@bmrb.wisc.edu> wrote:

> Yes, but when I'm trying to set up a ceph server using the quick start guide, mkcephfs is failing with an error message I didn't write down, but the complaint was along the lines of missing rbd.ko. Booting a 3.7 kernel made it go away.
> 
> This is the part where everyone says "server stuff should run on the stock centos kernel" but in my reality it doesn't. (So I'm trying to figure out why my reality is different from everyone else's ;)
> 
> I'll see if I can reproduce it and post the exact error message.

That would be helpful.

RBD is totally optional (it's just another client, really) and should not be needed on a ceph server to have a functional cluster. My comment about Qemu was intended to illustrate that; you certainly don't need qemu.


On Jan 23, 2013, at 7:41 PM, Dimitri Maziuk <dmaziuk@bmrb.wisc.edu> wrote:

> On 01/23/2013 06:17 PM, John Nielsen wrote:
> 
>> http://ceph.com/docs/master/install/rpm/
>> http://ceph.com/docs/master/start/quick-start/
>> 
>> Between those two links my own quick-start on CentOS 6.3 was maybe 6 minutes. YMMV.
> 
> I guess the other obvious question is did you use bobtail rpms or
> "development release"? - I installed bobtail.

I used the bobtail RPMs without problem.

JN


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24 15:28                           ` Dimitri Maziuk
  2013-01-24 16:12                             ` Sam Lang
@ 2013-01-24 18:15                             ` Dan Mick
  2013-01-24 18:58                               ` Dimitri Maziuk
       [not found]                             ` <CAM2gkg6m2S0DtgapSOg16GTmGQHsj7fz=a3XzH0ZsvcCWHcBtg@mail.gmail.com>
  2 siblings, 1 reply; 53+ messages in thread
From: Dan Mick @ 2013-01-24 18:15 UTC (permalink / raw)
  To: dmaziuk; +Cc: Sam Lang, ceph-devel@vger.kernel.org

On 01/24/2013 07:28 AM, Dimitri Maziuk wrote:
> On 1/24/2013 8:20 AM, Sam Lang wrote:
>
>> Yep it means that you only have one OSD with replication level of 2.
>> If you had a rep level of 3, you would see degraded (66.667%). If you
>> just want to make the message go away (for testing purposes), you can
>> set the rep level to 1
>> (http://ceph.com/w/index.php?title=Adjusting_replication_level&redirect=no). 
>>
>
> OK, thanks Sam and Dino -- I kinda suspected that but didn't find any 
> docs.
>
> This looks like it's not adjustable via ceph.conf, I can only do it at 
> runtime, correct?

or you could just add another OSD.



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-20 16:32     ` Dimitri Maziuk
@ 2013-01-24 18:16       ` Dan Mick
  2013-01-24 20:14         ` Dimitri Maziuk
  0 siblings, 1 reply; 53+ messages in thread
From: Dan Mick @ 2013-01-24 18:16 UTC (permalink / raw)
  To: dmaziuk; +Cc: ceph-devel

On 01/20/2013 08:32 AM, Dimitri Maziuk wrote:
> On 1/19/2013 11:13 AM, Sage Weil wrote:
>
>> If you want to use the kernel client(s), that is true: there are no 
>> plans
>> to backport the client code to the ancient RHEL kernels. Nothing 
>> prevents
>> you from running the server side, though, or the userland clients
>> (ceph-fuse, librbd, qemu/KVM, radosgw, etc.)
>
> mkcephfs form 5-minute start fails without rbd.ko. I already reported 
> that.
>
> Dima

This is an apparently-unique problem, and we'd love to see details.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
  2013-01-24 18:15                             ` Dan Mick
@ 2013-01-24 18:58                               ` Dimitri Maziuk
  2013-01-24 21:07                                 ` Dan Mick
  0 siblings, 1 reply; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 18:58 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 990 bytes --]

On 01/24/2013 12:15 PM, Dan Mick wrote:
> On 01/24/2013 07:28 AM, Dimitri Maziuk wrote:
>> On 1/24/2013 8:20 AM, Sam Lang wrote:
>>
>>> Yep it means that you only have one OSD with replication level of 2.
>>> If you had a rep level of 3, you would see degraded (66.667%). If you
>>> just want to make the message go away (for testing purposes), you can
>>> set the rep level to 1
>>> (http://ceph.com/w/index.php?title=Adjusting_replication_level&redirect=no).
>>>
>>
>> OK, thanks Sam and Dino -- I kinda suspected that but didn't find any
>> docs.
>>
>> This looks like it's not adjustable via ceph.conf, I can only do it at
>> runtime, correct?
> 
> or you could just add another OSD.

Obviously. You'd think that only one [osd] section in ceph.conf implies
nrep = 1, though. (And then you can go on adding OSDs and changing nrep
accordingly -- that was my plan.)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Understanding Ceph
       [not found]                               ` <CAM2gkg6m2S0DtgapSOg16GTmGQHsj7fz=a3XzH0ZsvcCWHcBtg@mail.gmail.com>
@ 2013-01-24 19:52                                 ` Dimitri Maziuk
  2013-01-24 20:53                                   ` John Wilkins
  0 siblings, 1 reply; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 19:52 UTC (permalink / raw)
  To: John Wilkins; +Cc: Sam Lang, ceph-devel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 955 bytes --]

On 01/24/2013 12:38 PM, John Wilkins wrote:
> Dima,
> 
> I'm working on a new monitoring and troubleshooting guide now that will
> answer most of the questions related to OSD and placement group states. I
> hope to have it done this week. I have not actually tested the quick starts
> on centos or rhel distributions, but it's on our radar. The intention of
> the quick starts is to get you up and running quickly. It doesn't cover
> deeper issues like how to monitor and troubleshoot. I'm working on adding a
> lot more substantive content there now.

A couple of things in the quick start:

- there should be no space between "rw," and "noatime" in
osd mount options {fs-type} = {mount options} # default mount option is
"rw, noatime"

- for ext4, you need to specify "user_xattr" there or mkcephfs will fail
(with --mkfs at least).

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
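
The two quick-start corrections above can be sketched as a small helper that builds the ceph.conf line. The setting name `osd mount options {fs-type}` is taken from the message. The helper names `osd_mount_options` and `conf_line` are hypothetical, and the assumption that only ext4 needs `user_xattr` appended comes from this report alone.

```python
def osd_mount_options(fs_type: str, options: str = "rw, noatime") -> str:
    """Return a comma-separated mount option string with no embedded spaces."""
    parts = [opt.strip() for opt in options.split(",") if opt.strip()]
    # Per the report above, mkcephfs fails on ext4 unless user_xattr is set.
    if fs_type == "ext4" and "user_xattr" not in parts:
        parts.append("user_xattr")
    return ",".join(parts)


def conf_line(fs_type: str, options: str = "rw, noatime") -> str:
    """Format the ceph.conf setting for the given file system type."""
    return f"osd mount options {fs_type} = {osd_mount_options(fs_type, options)}"


print(conf_line("ext4"))
print(conf_line("xfs"))
```

The first call prints `osd mount options ext4 = rw,noatime,user_xattr`; the second leaves the option list at `rw,noatime`.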


* Re: Understanding Ceph
  2013-01-24 18:16       ` Dan Mick
@ 2013-01-24 20:14         ` Dimitri Maziuk
  0 siblings, 0 replies; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 20:14 UTC (permalink / raw)
  To: Dan Mick; +Cc: ceph-devel

On 01/24/2013 12:16 PM, Dan Mick wrote:

> This is an apparently-unique problem, and we'd love to see details.

I hate it when it makes a liar out of me; this time around it worked on
2.6.23 -- FSVO "worked": I did get it to the "384 pgs stuck unclean" stage.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


* Re: Understanding Ceph
  2013-01-24 19:52                                 ` Dimitri Maziuk
@ 2013-01-24 20:53                                   ` John Wilkins
  2013-01-24 22:15                                     ` Dimitri Maziuk
  0 siblings, 1 reply; 53+ messages in thread
From: John Wilkins @ 2013-01-24 20:53 UTC (permalink / raw)
  To: Dimitri Maziuk; +Cc: Sam Lang, ceph-devel@vger.kernel.org

Dima,

I went ahead and updated the quick-start conf with an example. I
appreciate the feedback.

John

On Thu, Jan 24, 2013 at 11:52 AM, Dimitri Maziuk <dmaziuk@bmrb.wisc.edu> wrote:
>
> On 01/24/2013 12:38 PM, John Wilkins wrote:
> > Dima,
> >
> > I'm working on a new monitoring and troubleshooting guide now that will
> > answer most of the questions related to OSD and placement group states. I
> > hope to have it done this week. I have not actually tested the quick starts
> > on centos or rhel distributions, but it's on our radar. The intention of
> > the quick starts is to get you up and running quickly. It doesn't cover
> > deeper issues like how to monitor and troubleshoot. I'm working on adding a
> > lot more substantive content there now.
>
> A couple of things in the quick start:
>
> - there should be no space between "rw," and "noatime" in
> osd mount options {fs-type} = {mount options} # default mount option is
> "rw, noatime"
>
> - for ext4, you need to specify "user_xattr" there or mkcephfs will fail
> (with --mkfs at least).
>
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
>



--
John Wilkins
Senior Technical Writer
Inktank
john.wilkins@inktank.com
(415) 425-9599

* Re: Understanding Ceph
  2013-01-24 18:58                               ` Dimitri Maziuk
@ 2013-01-24 21:07                                 ` Dan Mick
  2013-01-24 21:45                                   ` Dimitri Maziuk
  0 siblings, 1 reply; 53+ messages in thread
From: Dan Mick @ 2013-01-24 21:07 UTC (permalink / raw)
  To: Dimitri Maziuk; +Cc: ceph-devel@vger.kernel.org

> You'd think that only one [osd] section in ceph.conf implies
> nrep = 1, though. (And then you can go on adding OSDs and changing nrep
> accordingly -- that was my plan.)
>

Yeah; it's probably mostly just that one-OSD configurations are so 
uncommon that we never special-cased that small user set.  Also, you can 
run with a cluster in that state forever (well, until that one OSD dies 
at least); I do that regularly with the default vstart.sh local test cluster

* Re: Understanding Ceph
  2013-01-24 21:07                                 ` Dan Mick
@ 2013-01-24 21:45                                   ` Dimitri Maziuk
  2013-01-24 21:48                                     ` Sage Weil
  0 siblings, 1 reply; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 21:45 UTC (permalink / raw)
  To: Dan Mick; +Cc: ceph-devel@vger.kernel.org, john.wilkins, sam.lang

On 01/24/2013 03:07 PM, Dan Mick wrote:
...
> Yeah; it's probably mostly just that one-OSD configurations are so
> uncommon that we never special-cased that small user set.  Also, you can
> run with a cluster in that state forever (well, until that one OSD dies
> at least); I do that regularly with the default vstart.sh local test
> cluster

Well, this goes back to the quick start guide: to me a more natural way
to start is with one host, then add another. That's what I was trying to
do; however, the quick start page ends with

"When your cluster echoes back HEALTH_OK, you may begin using Ceph."

and that doesn't happen with one host: you get "384 pgs stuck unclean"
instead of "HEALTH_OK". To me that means I may *not* begin using ceph.

I did run "ceph osd pool set ... size 1" on each of the 3 default pools,
verified that it took with "ceph osd dump | grep 'rep size'", and gave
it a good half hour to settle. I still got "384 pgs stuck unclean" from
"ceph health".

So I redid it with 2 OSDs and got the expected HEALTH_OK right from
the start.

John,

a) a note saying "if you have only one OSD you won't get HEALTH_OK until
you add another one; you can start using the cluster" may be a useful
addition to the quick start,

b) more importantly, if there are any plans to write more quickstart
pages, I'd love to see the "add another OSD (MDS, MON) to an existing
pool in 5 minutes".

Thanks all,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
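
The procedure described in this message (set each pool to size 1, check the result, then poll health) can be written down as the command lines it implies. This is only a sketch: it builds the commands without running them, the helper names are hypothetical, and the pool names `data`, `metadata` and `rbd` are an assumption about which three default pools are meant.

```python
import shlex

# Assumed names of the three default pools mentioned in the message.
DEFAULT_POOLS = ("data", "metadata", "rbd")


def set_size_commands(pools, size):
    """Build one 'ceph osd pool set <pool> size <n>' argv list per pool."""
    return [["ceph", "osd", "pool", "set", pool, "size", str(size)]
            for pool in pools]


def is_healthy(health_output: str) -> bool:
    """True when the text returned by 'ceph health' reports HEALTH_OK."""
    return health_output.strip().startswith("HEALTH_OK")


for argv in set_size_commands(DEFAULT_POOLS, 1):
    print(shlex.join(argv))

# Checks quoted in the message, to be run by hand afterwards.
print("ceph osd dump | grep 'rep size'")
print("ceph health")
```

As the message reports, lowering the size this way did not clear the "stuck unclean" state on a single OSD, so treat `is_healthy` as a status check, not as proof that the resize took effect.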

* Re: Understanding Ceph
  2013-01-24 21:45                                   ` Dimitri Maziuk
@ 2013-01-24 21:48                                     ` Sage Weil
  2013-01-24 21:51                                       ` Dimitri Maziuk
  0 siblings, 1 reply; 53+ messages in thread
From: Sage Weil @ 2013-01-24 21:48 UTC (permalink / raw)
  To: Dimitri Maziuk
  Cc: Dan Mick, ceph-devel@vger.kernel.org, john.wilkins, sam.lang

On Thu, 24 Jan 2013, Dimitri Maziuk wrote:
> On 01/24/2013 03:07 PM, Dan Mick wrote:
> ...
> > Yeah; it's probably mostly just that one-OSD configurations are so
> > uncommon that we never special-cased that small user set.  Also, you can
> > run with a cluster in that state forever (well, until that one OSD dies
> > at least); I do that regularly with the default vstart.sh local test
> > cluster
> 
> Well, this goes back to the quick start guide: to me a more natural way
> to start is with one host, then add another. That's what I was trying to
> do; however, the quick start page ends with
> 
> "When your cluster echoes back HEALTH_OK, you may begin using Ceph."
> 
> and that doesn't happen with one host: you get "384 pgs stuck unclean"
> instead of "HEALTH_OK". To me that means I may *not* begin using ceph.
> 
> I did run "ceph osd pool set ... size 1" on each of the 3 default pools,
> verified that it took with "ceph osd dump | grep 'rep size'", and gave
> it a good half hour to settle. I still got "384 pgs stuck unclean" from
> "ceph health".
> 
> So I redid it with 2 OSDs and got the expected HEALTH_OK right from
> the start.
> 
> John,
> 
> a) a note saying "if you have only one OSD you won't get HEALTH_OK until
> you add another one; you can start using the cluster" may be a useful
> addition to the quick start,
> 
> b) more importantly, if there are any plans to write more quickstart
> pages, I'd love to see the "add another OSD (MDS, MON) to an existing
> pool in 5 minutes".

There may be a related issue at work here: the default crush rules now 
replicate across hosts instead of across osds, so single-host configs may 
have similar problems (depending on whether you used mkcephfs to create 
the cluster or not).

sage
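
The difference between replicating across hosts and across OSDs can be illustrated with a toy model. It is not the CRUSH algorithm: it only counts how many distinct failure domains exist and compares that with the requested replica count. All names below (`placeable_replicas`, `can_go_clean`, the host labels) are hypothetical.

```python
from typing import Mapping, Sequence


def placeable_replicas(osds_by_host: Mapping[str, Sequence[int]],
                       rep_size: int,
                       failure_domain: str) -> int:
    """Number of replicas that can land in distinct failure domains."""
    if failure_domain == "host":
        # At most one replica per host that has at least one OSD.
        domains = sum(1 for osds in osds_by_host.values() if osds)
    elif failure_domain == "osd":
        # At most one replica per OSD, regardless of host.
        domains = sum(len(osds) for osds in osds_by_host.values())
    else:
        raise ValueError(f"unknown failure domain: {failure_domain!r}")
    return min(rep_size, domains)


def can_go_clean(osds_by_host, rep_size, failure_domain) -> bool:
    """True when every requested replica has a distinct domain to live in."""
    return placeable_replicas(osds_by_host, rep_size, failure_domain) == rep_size


one_host = {"host-a": [0, 1]}
two_hosts = {"host-a": [0], "host-b": [1]}

print(can_go_clean(one_host, 2, "osd"))    # replicas split across the two OSDs
print(can_go_clean(one_host, 2, "host"))   # only one host, second replica has no home
print(can_go_clean(two_hosts, 2, "host"))  # one replica per host
```

In this model two OSDs on one host satisfy a rep size of 2 only when the rule separates replicas by OSD; with host-level separation a second host is needed.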

* Re: Understanding Ceph
  2013-01-24 21:48                                     ` Sage Weil
@ 2013-01-24 21:51                                       ` Dimitri Maziuk
  0 siblings, 0 replies; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 21:51 UTC (permalink / raw)
  To: Sage Weil; +Cc: Dan Mick, ceph-devel@vger.kernel.org, john.wilkins, sam.lang

On 01/24/2013 03:48 PM, Sage Weil wrote:
> On Thu, 24 Jan 2013, Dimitri Maziuk wrote:

>> So I redid it with 2 OSDs and got the expected HEALTH_OK right from
>> the start.

> There may be a related issue at work here: the default crush rules now 
> replicate across hosts instead of across osds, so single-host configs may 
> have similar problems (depending on whether you used mkcephfs to create 
> the cluster or not).

Right, that's with 2nd osd on another host, not with 2 osds on the same
host.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


* Re: Understanding Ceph
  2013-01-24 20:53                                   ` John Wilkins
@ 2013-01-24 22:15                                     ` Dimitri Maziuk
  2013-01-24 22:52                                       ` Josh Durgin
  0 siblings, 1 reply; 53+ messages in thread
From: Dimitri Maziuk @ 2013-01-24 22:15 UTC (permalink / raw)
  To: John Wilkins; +Cc: Sam Lang, ceph-devel@vger.kernel.org

John,

in block device quick start (http://ceph.com/docs/master/start/quick-rbd/)

"sudo rbd map foo --pool rbd --name client.admin"

maps the image to /dev/rbd0 here (centos 6.3/bobtail) so the subsequent

"4. Use the block device. In the following example, create a file system.

sudo mkfs.ext4 -m0 /dev/rbd/rbd/foo"

should end with "/dev/rbd0" instead of "/dev/rbd/rbd/foo".

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


* Re: Understanding Ceph
  2013-01-24 22:15                                     ` Dimitri Maziuk
@ 2013-01-24 22:52                                       ` Josh Durgin
  2013-01-24 23:36                                         ` John Wilkins
  0 siblings, 1 reply; 53+ messages in thread
From: Josh Durgin @ 2013-01-24 22:52 UTC (permalink / raw)
  To: Dimitri Maziuk; +Cc: John Wilkins, Sam Lang, ceph-devel@vger.kernel.org

On 01/24/2013 02:15 PM, Dimitri Maziuk wrote:
> John,
>
> in block device quick start (http://ceph.com/docs/master/start/quick-rbd/)
>
> "sudo rbd map foo --pool rbd --name client.admin"
>
> maps the image to /dev/rbd0 here (centos 6.3/bobtail) so the subsequent
>
> "4. Use the block device. In the following example, create a file system.
>
> sudo mkfs.ext4 -m0 /dev/rbd/rbd/foo"
>
> should end with "/dev/rbd0" instead of "/dev/rbd/rbd/foo".
>

That's what happens when the udev rule for rbd isn't installed.

The rpm should include it, but it doesn't seem to be mentioned in the
spec file. I filed http://www.tracker.newdream.net/issues/3930 to track
that.

Josh
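
Until the packages ship the udev rule, a script that wants to work on both kinds of systems could fall back from the symlink path to the kernel device name. The sketch below assumes the `/dev/rbd/<pool>/<image>` layout implied by the quick start and takes the kernel device (for example `/dev/rbd0`) as an argument. The function name `rbd_device_path` is hypothetical.

```python
import os


def rbd_device_path(pool: str, image: str,
                    kernel_device: str = "/dev/rbd0",
                    exists=os.path.exists) -> str:
    """Prefer the udev-created symlink, else fall back to the kernel device.

    `exists` is injectable so the choice can be exercised without a real
    mapped image.
    """
    udev_path = f"/dev/rbd/{pool}/{image}"
    return udev_path if exists(udev_path) else kernel_device


# Path to hand to mkfs.ext4 for the image "foo" in pool "rbd".
print(rbd_device_path("rbd", "foo"))
```

The default of `/dev/rbd0` only holds for the first mapped image; the caller would need to pass the actual device name otherwise.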

* Re: Understanding Ceph
  2013-01-24 22:52                                       ` Josh Durgin
@ 2013-01-24 23:36                                         ` John Wilkins
  2013-01-24 23:38                                           ` Josh Durgin
  0 siblings, 1 reply; 53+ messages in thread
From: John Wilkins @ 2013-01-24 23:36 UTC (permalink / raw)
  To: Josh Durgin; +Cc: Dimitri Maziuk, Sam Lang, ceph-devel@vger.kernel.org

Do I need to update the doc for Dima's comment then, or will the bug
fix take care of it?

On Thu, Jan 24, 2013 at 2:52 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
> On 01/24/2013 02:15 PM, Dimitri Maziuk wrote:
>>
>> John,
>>
>> in block device quick start (http://ceph.com/docs/master/start/quick-rbd/)
>>
>> "sudo rbd map foo --pool rbd --name client.admin"
>>
>> maps the image to /dev/rbd0 here (centos 6.3/bobtail) so the subsequent
>>
>> "4. Use the block device. In the following example, create a file system.
>>
>> sudo mkfs.ext4 -m0 /dev/rbd/rbd/foo"
>>
>> should end with "/dev/rbd0" instead of "/dev/rbd/rbd/foo".
>>
>
> That's what happens when the udev rule for rbd isn't installed.
>
> The rpm should include it, but it doesn't seem to be mentioned in the
> spec file. I filed http://www.tracker.newdream.net/issues/3930 to track
> that.
>
> Josh



-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilkins@inktank.com
(415) 425-9599

* Re: Understanding Ceph
  2013-01-24 23:36                                         ` John Wilkins
@ 2013-01-24 23:38                                           ` Josh Durgin
  0 siblings, 0 replies; 53+ messages in thread
From: Josh Durgin @ 2013-01-24 23:38 UTC (permalink / raw)
  To: John Wilkins; +Cc: Dimitri Maziuk, Sam Lang, ceph-devel@vger.kernel.org

On 01/24/2013 03:36 PM, John Wilkins wrote:
> Do I need to update the doc for Dima's comment then, or will the bug
> fix take care of it?

Fixing the packages will take care of it.

> On Thu, Jan 24, 2013 at 2:52 PM, Josh Durgin <josh.durgin@inktank.com> wrote:
>> On 01/24/2013 02:15 PM, Dimitri Maziuk wrote:
>>>
>>> John,
>>>
>>> in block device quick start (http://ceph.com/docs/master/start/quick-rbd/)
>>>
>>> "sudo rbd map foo --pool rbd --name client.admin"
>>>
>>> maps the image to /dev/rbd0 here (centos 6.3/bobtail) so the subsequent
>>>
>>> "4. Use the block device. In the following example, create a file system.
>>>
>>> sudo mkfs.ext4 -m0 /dev/rbd/rbd/foo"
>>>
>>> should end with "/dev/rbd0" instead of "/dev/rbd/rbd/foo".
>>>
>>
>> That's what happens when the udev rule for rbd isn't installed.
>>
>> The rpm should include it, but it doesn't seem to be mentioned in the
>> spec file. I filed http://www.tracker.newdream.net/issues/3930 to track
>> that.
>>
>> Josh
>
>
>


Thread overview: 53+ messages
2013-01-19 15:50 Understanding Ceph Peter Smith
2013-01-19 16:26 ` Dimitri Maziuk
2013-01-19 16:51   ` Denis Fondras
2013-01-19 17:15     ` Wenhao Xu
2013-01-19 17:13   ` Sage Weil
2013-01-19 17:25     ` Peter Smith
2013-01-19 17:38       ` Sage Weil
2013-01-19 18:08         ` Jeff Mitchell
2013-01-19 18:16           ` Sage Weil
2013-01-19 18:25             ` Peter Smith
2013-01-20 16:39             ` Dimitri Maziuk
2013-01-23 15:13               ` Sam Lang
2013-01-23 16:19                 ` Patrick McGarry
2013-01-24  0:10                   ` Dimitri Maziuk
2013-01-24  0:17                     ` John Nielsen
2013-01-24  2:36                       ` Dimitri Maziuk
2013-01-24 15:45                       ` Dimitri Maziuk
2013-01-24 15:53                         ` Jens Kristian Søgaard
2013-01-24 15:58                           ` Wido den Hollander
2013-01-24 16:14                             ` Dimitri Maziuk
2013-01-24 16:18                               ` Jens Kristian Søgaard
2013-01-24 16:22                         ` Sam Lang
2013-01-24 17:09                           ` Dimitri Maziuk
2013-01-24 17:16                             ` John Nielsen
2013-01-24  8:49                     ` Gandalf Corvotempesta
2013-01-24 13:55                       ` Dimitri Maziuk
     [not found]                         ` <CAKMAVE9HMo4x3seuG7ppeafSRJmBwjUXrLv0GUA-z5kDXyhoQA@mail.gmail.com>
2013-01-24 15:28                           ` Dimitri Maziuk
2013-01-24 16:12                             ` Sam Lang
2013-01-24 18:15                             ` Dan Mick
2013-01-24 18:58                               ` Dimitri Maziuk
2013-01-24 21:07                                 ` Dan Mick
2013-01-24 21:45                                   ` Dimitri Maziuk
2013-01-24 21:48                                     ` Sage Weil
2013-01-24 21:51                                       ` Dimitri Maziuk
     [not found]                             ` <CAM2gkg6m2S0DtgapSOg16GTmGQHsj7fz=a3XzH0ZsvcCWHcBtg@mail.gmail.com>
     [not found]                               ` <CAM2gkg6m2S0DtgapSOg16GTmGQHsj7fz=a3XzH0ZsvcCWHcBtg@mail.gmail.com>
2013-01-24 19:52                                 ` Dimitri Maziuk
2013-01-24 20:53                                   ` John Wilkins
2013-01-24 22:15                                     ` Dimitri Maziuk
2013-01-24 22:52                                       ` Josh Durgin
2013-01-24 23:36                                         ` John Wilkins
2013-01-24 23:38                                           ` Josh Durgin
2013-01-19 18:16         ` Peter Smith
2013-01-19 21:10           ` Josh Durgin
2013-01-20  0:41             ` Jeff Mitchell
2013-01-20  3:24               ` Peter Smith
2013-01-20  3:56                 ` Sage Weil
2013-01-20 16:32     ` Dimitri Maziuk
2013-01-24 18:16       ` Dan Mick
2013-01-24 20:14         ` Dimitri Maziuk
  -- strict thread matches above, loose matches on Subject: below --
2011-12-18  6:41 Bill Hastings
2011-12-18 12:17 ` Christian Brunner
2011-12-18 16:43   ` Bill Hastings
2011-12-18 17:17     ` Yehuda Sadeh Weinraub
2011-12-18 17:37       ` Bill Hastings
