High-availability testing of ceph

* High-availability testing of ceph
@ 2012-07-31  2:46 Eric_YH_Chen
  2012-07-31  5:55 ` Josh Durgin
  0 siblings, 1 reply; 4+ messages in thread
From: Eric_YH_Chen @ 2012-07-31  2:46 UTC (permalink / raw)
  To: ceph-devel; +Cc: Chris_YT_Huang, Victor_CY_Chang

Hi, all:

I am testing the high availability of Ceph.

Environment:  two servers with 12 hard disks each. Version: Ceph 0.48
             Kernel: 3.2.0-27

We created a Ceph cluster with 24 OSDs.
osd.0 ~ osd.11 are on server1
osd.12 ~ osd.23 are on server2

The CRUSH rule is the default rule:
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1536 pgp_num 1536 last_change 1172 owner 0

Test case 1:
1. Create an rbd device and read/write to it
2. Randomly turn off one OSD on server1  (service ceph stop osd.0)
3. Check the read/write of the rbd device

Test case 2:
1. Create an rbd device and read/write to it
2. Randomly turn off one OSD on server1  (service ceph stop osd.0)
3. Randomly turn off one OSD on server2  (service ceph stop osd.12)
4. Check the read/write of the rbd device

In test case 1, we can access the rbd device as normal. But in test case 2, I/O hangs with no response.
Is this the expected behavior?

I assumed that we could turn off any two OSDs when replication is set to 2,
because besides the primary copy there would be two other copies on two different OSDs.
Even with two OSDs turned off, we could still find the data on a third OSD.
Have I misunderstood something? Thanks!


* Re: High-availability testing of ceph
  2012-07-31  2:46 High-availability testing of ceph Eric_YH_Chen
@ 2012-07-31  5:55 ` Josh Durgin
  2012-07-31  7:31   ` Eric_YH_Chen
  0 siblings, 1 reply; 4+ messages in thread
From: Josh Durgin @ 2012-07-31  5:55 UTC (permalink / raw)
  To: Eric_YH_Chen; +Cc: ceph-devel, Chris_YT_Huang, Victor_CY_Chang

On 07/30/2012 07:46 PM, Eric_YH_Chen@wiwynn.com wrote:
> Hi, all:
>
> I am testing high-availability of ceph.
>
> Environment:  two servers, and 12 hard-disk on each server. Version: Ceph 0.48
>               Kernel: 3.2.0-27
>
> We create a ceph cluster with 24 osd.
> Osd.0 ~ osd.11 is on server1
> Osd.12 ~ osd.23 is on server2
>
> The crush rule is using default rule.
> rule rbd {
>          ruleset 2
>          type replicated
>          min_size 1
>          max_size 10
>          step take default
>          step chooseleaf firstn 0 type host
>          step emit
> }
>
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1536 pgp_num 1536 last_change 1172 owner 0
>
> Test case 1:
> 1. Create a rbd device and read/write to it
> 2. Random turn off one osd on server1  (service ceph stop osd.0)
> 3. check the read/write of rbd device
>
> Test case 2:
> 1. Create a rbd device and read/write to it
> 2. Random turn off one osd on server1  (service ceph stop osd.0)
> 2. Random turn off one osd on server2  (service ceph stop osd.12)
> 3. check the read/write of rbd device
>
> About test case 1, we can access the rbd device as normal. But about test case 2, we would hang there and no response.
> Is it a correct scenario ?
>
> I imagine that we can turn off any two osd when we set the replication as 2.
> Because without the master data, we have two other copies on two different osd.
> Even when we turn off two osd, we can find the data on third osd.
> Any misunderstanding? Thanks!

rep size is the total number of copies, so stopping two osds with rep
size 2 may cause you to lose access to some objects.

Josh
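
To make the point above concrete, here is a rough sketch (not real CRUSH) that counts how many replica pairs become unreachable. It assumes the layout from the original post (osd.0-11 on server1, osd.12-23 on server2), rep size 2 with one copy per host, and, as a simplifying assumption, that placement groups are spread uniformly over all (server1 OSD, server2 OSD) pairs. The function name fraction_unavailable is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Layout from the original post: osd.0-11 on server1, osd.12-23 on server2.
SERVER1 = range(0, 12)
SERVER2 = range(12, 24)
PG_NUM = 1536  # pg_num of the 'rbd' pool in the original post


def fraction_unavailable(down):
    """Fraction of (server1 OSD, server2 OSD) pairs with both members down.

    Assumption: with rep size 2 and one replica per host, every PG maps to
    exactly one such pair, and all pairs are equally likely.
    """
    down = set(down)
    pairs = list(product(SERVER1, SERVER2))
    lost = [p for p in pairs if p[0] in down and p[1] in down]
    return Fraction(len(lost), len(pairs))


for down in ([0], [0, 12]):
    frac = fraction_unavailable(down)
    # Expected number of PGs with no reachable copy under the uniform assumption.
    print(down, frac, float(frac * PG_NUM))
```

Under these assumptions, stopping only osd.0 leaves every pair with one live copy, while stopping osd.0 and osd.12 takes out 1 pair in 144, or roughly 11 of the 1536 PGs. Any I/O that touches an object in one of those PGs would block, which would be consistent with the hang seen in test case 2.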



* RE: High-availability testing of ceph
  2012-07-31  5:55 ` Josh Durgin
@ 2012-07-31  7:31   ` Eric_YH_Chen
  2012-07-31 18:12     ` Tommi Virtanen
  0 siblings, 1 reply; 4+ messages in thread
From: Eric_YH_Chen @ 2012-07-31  7:31 UTC (permalink / raw)
  To: josh.durgin; +Cc: ceph-devel, Chris_YT_Huang, Victor_CY_Chang

Hi, Josh:

Thanks for your reply. However, I asked a question about the replica setting before:
http://www.spinics.net/lists/ceph-devel/msg07346.html

I had thought the total number of copies was 3, so if the rbd device
delivers n MB/s under replica=2, the total I/O throughput on the hard
disks would be over 3 * n MB/s.

That now seems to be incorrect: the total number of copies is only 2.
The total I/O throughput on the disks should be 2 * n MB/s. Right?


* Re: High-availability testing of ceph
  2012-07-31  7:31   ` Eric_YH_Chen
@ 2012-07-31 18:12     ` Tommi Virtanen
  0 siblings, 0 replies; 4+ messages in thread
From: Tommi Virtanen @ 2012-07-31 18:12 UTC (permalink / raw)
  To: Eric_YH_Chen; +Cc: josh.durgin, ceph-devel, Chris_YT_Huang, Victor_CY_Chang

On Tue, Jul 31, 2012 at 12:31 AM,  <Eric_YH_Chen@wiwynn.com> wrote:
> If the performance of rbd device is n MB/s under replica=2,
> then that means the total io throughputs on hard disk is over 3 * n MB/s.
> Because I think the total number of copies is 3 in original.
>
> So, it seems not correct now, the total number of copies is only 2.
> The total io through puts on disk should be 2 * n MB/s. Right?

Yes, each replica needs to independently write the data to disk. On
top of that, there are journal writes, and filesystems have overhead
too. If you create a 1 GB object in a pool replicated 3 times, you
should expect about 3*1 GB writes in total to your osd data disks, and
at least 3*1 GB writes in total to your osd journal disks.
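
The arithmetic above can be written out as a small sketch. The function name expected_disk_writes is invented for illustration, and the journal figure is only the lower bound stated above; filesystem overhead is not modelled.

```python
def expected_disk_writes(client_mb, replicas):
    """Return (data_mb, journal_mb_min) written across all OSDs.

    Each replica writes the data once to its data disk and at least once
    to its journal, so both totals scale with the replica count.
    """
    data_mb = replicas * client_mb
    journal_mb_min = replicas * client_mb  # lower bound only
    return data_mb, journal_mb_min


# 1 GB object in a pool replicated 3 times, as in the example above.
print(expected_disk_writes(1024, 3))

# A client writing 100 MB with rep size 2.
print(expected_disk_writes(100, 2))
```

For a client writing n MB/s with rep size 2, this gives about 2 * n MB/s to the data disks plus at least 2 * n MB/s to the journals.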

In normal use, you have many servers, and use CRUSH rules to ensure
the different replicas are not stored on the same server.
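
As a toy illustration of that placement constraint (this is not the CRUSH algorithm, only a sketch of the "one replica per host" idea behind a rule like "step chooseleaf firstn 0 type host"), the following picks distinct hosts first and then one OSD under each. The function place_replicas and the server3 host are hypothetical.

```python
import random


def place_replicas(pg_id, hosts, replicas):
    """Toy placement: pick `replicas` distinct hosts, then one OSD per host.

    Not CRUSH; a seeded PRNG stands in for the real pseudo-random mapping
    so that the same pg_id always yields the same placement.
    """
    if replicas > len(hosts):
        raise ValueError("not enough hosts for the requested replica count")
    rng = random.Random(pg_id)
    chosen_hosts = rng.sample(sorted(hosts), replicas)
    return [(host, rng.choice(hosts[host])) for host in chosen_hosts]


# Hypothetical three-server cluster with 12 OSDs per server.
hosts = {
    "server1": list(range(0, 12)),
    "server2": list(range(12, 24)),
    "server3": list(range(24, 36)),
}

for pg_id in range(3):
    print(pg_id, place_replicas(pg_id, hosts, 2))
```

Because hosts are chosen before OSDs, no two replicas of a PG can land on the same server, so losing a whole server still leaves a copy elsewhere.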

