why my cluster become unavailable

* why my cluster become unavailable
@ 2015-11-19 15:26 Libin Wu
  2015-11-21 17:43 ` Haomai Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Libin Wu @ 2015-11-19 15:26 UTC (permalink / raw)
  To: ceph-devel

Hi, cephers

I have a cluster of 6 OSD servers; every server has 8 OSDs.

I marked 4 OSDs out on every server, and then my client I/O blocked.

I rebooted my client and then created a new rbd device, but the new
device can't complete writes either.

Yes, I understand that some data may be lost because all three replicas
of some objects were lost, but why did the whole cluster become unavailable?

There are 80 incomplete PGs and 4 down+incomplete PGs.

Is there any way I can solve the problem?

Thanks!
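
A rough back-of-the-envelope check of how many PGs a change like this could affect. It is only a sketch and assumes several things the message does not state: a replicated pool with 3 copies, one copy per host, uniform placement across the 8 OSDs of each host, and that the OSDs marked out were also stopped before their data could be re-replicated elsewhere.

```python
from fractions import Fraction
from math import comb

REPLICAS = 3            # assumed pool size (not stated in the message)
OSDS_PER_HOST = 8       # from the message
REMOVED_PER_HOST = 4    # from the message

# Chance that a single replica sat on one of the removed OSDs of its host.
p_lost = Fraction(REMOVED_PER_HOST, OSDS_PER_HOST)


def prob_replicas_lost(k: int) -> Fraction:
    """Probability that exactly k of the REPLICAS copies of a PG were lost."""
    return comb(REPLICAS, k) * p_lost**k * (1 - p_lost) ** (REPLICAS - k)


# PGs that lost every copy.
all_lost = prob_replicas_lost(3)
# PGs left with fewer than two copies (would block I/O if min_size is 2).
fewer_than_two_left = prob_replicas_lost(2) + prob_replicas_lost(3)

print(f"all copies lost:        {all_lost}")
print(f"fewer than 2 copies up: {fewer_than_two_left}")
```

Under these assumptions about one PG in eight loses every copy, and about half are left with fewer than two copies, which is enough to stall almost any client whose data is spread over many PGs.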


* Re: why my cluster become unavailable
  2015-11-19 15:26 why my cluster become unavailable Libin Wu
@ 2015-11-21 17:43 ` Haomai Wang
  2015-11-21 17:49   ` Sage Weil
  0 siblings, 1 reply; 9+ messages in thread
From: Haomai Wang @ 2015-11-21 17:43 UTC (permalink / raw)
  To: Libin Wu; +Cc: ceph-devel

Yes. If you don't have a special crushmap to control the data
placement policy, the PGs will lack the metadata they need to boot. You
need to either re-add the OSDs that were marked out or force-remove the
PGs that are incomplete (I hope it's just a test).

-- 
Best Regards,

Wheat
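
The first option in the reply above (re-adding the outed OSDs) could look roughly like the sketch below. It only builds and prints the command lines, it does not run them. `ceph pg dump_stuck inactive` and `ceph osd in <id>` are standard ceph CLI commands; the OSD ids are hypothetical placeholders, since the thread does not say which OSDs were marked out.

```python
import shlex
from typing import List


def stuck_pgs_cmd() -> List[str]:
    """Command that lists PGs stuck in an inactive state."""
    return ["ceph", "pg", "dump_stuck", "inactive"]


def mark_in_cmd(osd_id: int) -> List[str]:
    """Command that marks a previously outed OSD back in."""
    return ["ceph", "osd", "in", str(osd_id)]


# Hypothetical ids; replace with the OSDs that were actually marked out.
outed_osds = [4, 5, 6, 7]

commands = [stuck_pgs_cmd()] + [mark_in_cmd(i) for i in outed_osds]
for argv in commands:
    print(shlex.join(argv))
```

Marking an OSD in only helps if its daemon is running and its data is still intact. The second option, force-removing incomplete PGs, discards data and is deliberately left out of the sketch.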


* Re: why my cluster become unavailable
  2015-11-21 17:43 ` Haomai Wang
@ 2015-11-21 17:49   ` Sage Weil
  2015-11-23  1:00     ` hzwulibin
  0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2015-11-21 17:49 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Libin Wu, ceph-devel

Is min_size 2 or 1?  Reducing it to 1 will generally clear some of the 
incomplete pgs.  Just remember to raise it back to 2 after the cluster 
recovers.

sage
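
The check, lower, and restore sequence suggested above can be sketched as follows. The code only assembles the command lines and prints them. `ceph osd pool get <pool> min_size` and `ceph osd pool set <pool> min_size <n>` are standard ceph CLI commands; the pool name `rbd` is an assumption, because the thread never names the pool.

```python
import shlex
from typing import List


def get_min_size_cmd(pool: str) -> List[str]:
    """Command that shows the current min_size of a pool."""
    return ["ceph", "osd", "pool", "get", pool, "min_size"]


def set_min_size_cmd(pool: str, value: int) -> List[str]:
    """Command that sets min_size on a pool; value must be at least 1."""
    if value < 1:
        raise ValueError("min_size must be at least 1")
    return ["ceph", "osd", "pool", "set", pool, "min_size", str(value)]


POOL = "rbd"  # assumed pool name, not given in the thread

steps = [
    get_min_size_cmd(POOL),     # 1. check whether it is 2 or 1
    set_min_size_cmd(POOL, 1),  # 2. lower it so PGs with one copy can go active
    set_min_size_cmd(POOL, 2),  # 3. raise it back once the cluster has recovered
]
for argv in steps:
    print(shlex.join(argv))
```

Step 3 should wait until `ceph health` reports that recovery has finished; running it earlier would block the affected PGs again.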


* Re: why my cluster become unavailable
  2015-11-21 17:49   ` Sage Weil
@ 2015-11-23  1:00     ` hzwulibin
  2015-11-26  7:54       ` why my cluster become unavailable (min_size of pool) hzwulibin
  0 siblings, 1 reply; 9+ messages in thread
From: hzwulibin @ 2015-11-23  1:00 UTC (permalink / raw)
  To: Sage Weil, Haomai Wang; +Cc: ceph-devel

Hi, Sage

Thanks! I will try it in my next test.

------------------				 
hzwulibin
2015-11-23


* Re: why my cluster become unavailable (min_size of pool)
  2015-11-23  1:00     ` hzwulibin
@ 2015-11-26  7:54       ` hzwulibin
  2015-11-26  8:00         ` Haomai Wang
  2015-11-26 13:30         ` Sage Weil
  0 siblings, 2 replies; 9+ messages in thread
From: hzwulibin @ 2015-11-26  7:54 UTC (permalink / raw)
  To: Sage Weil, Haomai Wang; +Cc: ceph-devel

Hi, Sage

I have a question about the min_size of a pool.

The default value of min_size is 2, but with this setting, when two OSDs are down (meaning two replicas are lost) at the same time, I/O is blocked.
We want to set min_size to 1 in our production environment because we think two OSDs being down at the same time (on different hosts, of course) is a normal case.

So is there any potential problem with this setting?

We use version 0.80.10.

Thanks!


------------------				 
hzwulibin
2015-11-26


* Re: why my cluster become unavailable (min_size of pool)
  2015-11-26  7:54       ` why my cluster become unavailable (min_size of pool) hzwulibin
@ 2015-11-26  8:00         ` Haomai Wang
  2015-11-26  8:04           ` hzwulibin
  2015-11-26 13:30         ` Sage Weil
  1 sibling, 1 reply; 9+ messages in thread
From: Haomai Wang @ 2015-11-26  8:00 UTC (permalink / raw)
  To: hzwulibin; +Cc: Sage Weil, ceph-devel

A min_size of 2 means that each object in this pool must have at least
two copies available. It mainly reduces the risk that permanent storage
media corruption causes actual data loss. That means that if min_size is
1 and a PG is degraded to a single copy, the permanent failure of one
more OSD will cause data loss. With min_size set to 2, at least 2 OSDs
must be up for the PG to accept I/O.

-- 
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
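
The min_size rule explained in this reply can be illustrated with a toy model. This is a simplified sketch, not Ceph's actual peering logic: a PG accepts I/O only while the number of replicas that are up is at least min_size. The `Pool` class and both helper functions are illustrative names, not part of any Ceph API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    size: int      # number of replicas the pool tries to keep
    min_size: int  # replicas that must be up before I/O is accepted


def accepts_io(pool: Pool, replicas_up: int) -> bool:
    """A PG serves I/O only if enough replicas are up."""
    return replicas_up >= pool.min_size


def single_copy_writes(pool: Pool, replicas_up: int) -> bool:
    """True when new writes would land on exactly one OSD."""
    return accepts_io(pool, replicas_up) and replicas_up == 1


for pool in (Pool(size=3, min_size=2), Pool(size=3, min_size=1)):
    for up in (3, 2, 1, 0):
        print(
            f"min_size={pool.min_size} replicas_up={up} "
            f"io={accepts_io(pool, up)} "
            f"single_copy={single_copy_writes(pool, up)}"
        )
```

With min_size 2 the pool blocks I/O as soon as only one replica is left, so no write ever exists in a single copy. With min_size 1 the pool stays available in that state, at the price that one more permanent failure loses those writes.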


* Re:  Re: why my cluster become unavailable (min_size of pool)
  2015-11-26  8:00         ` Haomai Wang
@ 2015-11-26  8:04           ` hzwulibin
  0 siblings, 0 replies; 9+ messages in thread
From: hzwulibin @ 2015-11-26  8:04 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Sage Weil, ceph-devel

Hi, haomai

Thanks for the quick reply; your explanation makes sense to me.

Thanks!

------------------				 
hzwulibin
2015-11-26


* Re: why my cluster become unavailable (min_size of pool)
  2015-11-26  7:54       ` why my cluster become unavailable (min_size of pool) hzwulibin
  2015-11-26  8:00         ` Haomai Wang
@ 2015-11-26 13:30         ` Sage Weil
  2015-12-02 11:22           ` Libin Wu
  1 sibling, 1 reply; 9+ messages in thread
From: Sage Weil @ 2015-11-26 13:30 UTC (permalink / raw)
  To: hzwulibin; +Cc: Haomai Wang, ceph-devel

min_size = 1 is okay, but be aware that it will increase the risk of a
pg history like

 epoch 10: osd.0, osd.1, osd.2
 epoch 11: osd.0   (1 and 2 down)
 epoch 12: - (osd.0 fails hard)
 epoch 13: osd.1 osd.2

i.e., a pg is serviced by a single osd for some period (possibly very
short) and then that osd fails permanently, so any writes made during that
period are *only* stored on that osd.  It'll require some manual recovery
to get past it (mark that osd as lost, and accept that you may have lost
some recent writes to the data).

sage
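
The epoch history above can be replayed with a small simulation. It is a deliberately simplified sketch, not Ceph's peering algorithm: one write is attempted per epoch, it is accepted only if at least min_size OSDs are up, and it is stored on exactly the OSDs that are up at that moment. The function names and the `write@N` labels are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# (epoch, OSDs that are up), taken from the history in the message.
HISTORY: List[Tuple[int, Set[int]]] = [
    (10, {0, 1, 2}),
    (11, {0}),        # osd.1 and osd.2 are down
    (12, set()),      # osd.0 fails hard
    (13, {1, 2}),
]


def replay(history: List[Tuple[int, Set[int]]], min_size: int) -> Dict[str, Set[int]]:
    """Return which OSDs hold each accepted write."""
    stored: Dict[str, Set[int]] = {}
    for epoch, up in history:
        if len(up) >= min_size:          # otherwise I/O blocks in this epoch
            stored[f"write@{epoch}"] = set(up)
    return stored


def stranded(stored: Dict[str, Set[int]], failed: Set[int]) -> List[str]:
    """Writes whose every copy lives on a permanently failed OSD."""
    return [w for w, holders in stored.items() if holders <= failed]


failed_for_good = {0}
for min_size in (1, 2):
    writes = replay(HISTORY, min_size)
    print(f"min_size={min_size}: stranded={stranded(writes, failed_for_good)}")
```

With min_size 1 the write from epoch 11 exists only on osd.0 and is gone once that OSD fails for good. With min_size 2 that write is never accepted, so nothing is stranded, but I/O was blocked during epoch 11. In a real cluster the manual recovery mentioned above typically goes through `ceph osd lost <id> --yes-i-really-mean-it`.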



 


* Re: why my cluster become unavailable (min_size of pool)
  2015-11-26 13:30         ` Sage Weil
@ 2015-12-02 11:22           ` Libin Wu
  0 siblings, 0 replies; 9+ messages in thread
From: Libin Wu @ 2015-12-02 11:22 UTC (permalink / raw)
  To: Sage Weil; +Cc: Haomai Wang, ceph-devel

Sage, thanks!

I missed your email until I saw it on GMANE today.

Thanks again!


