* Feature request regarding size and min_size on pools
From: Svein-Erik Lund @ 2013-09-10 12:21 UTC (permalink / raw)
To: ceph-devel
Hello,
We are implementing Ceph as the storage backend for some systems.
Unfortunately, we have to use a POSIX filesystem for storing the data.
To accomplish this, we have implemented a solution quite similar to what Sebastien Han has described on his blog: http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/
Now to our problem. We want to be sure that a write is replicated before we get an ack. Therefore we have set the pool size to 2 and min_size to 2, as we have seen that a sudden removal of one OSD can lead to data loss with min_size set to 1.
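For reference, the settings described above map onto two `ceph osd pool set` commands. The following is a minimal sketch that only builds the command strings and does not run them; the helper name `pool_replication_commands` and the pool name `rbd` are assumptions for illustration, not something stated in this thread.

```python
import shlex
from typing import List


def pool_replication_commands(pool: str, size: int, min_size: int) -> List[str]:
    """Build the `ceph osd pool set` commands for a pool's size and min_size.

    Hypothetical helper: it returns the command lines as strings and
    never executes anything against a cluster.
    """
    if not 1 <= min_size <= size:
        raise ValueError("min_size must be between 1 and size")
    settings = (("size", size), ("min_size", min_size))
    return [
        shlex.join(["ceph", "osd", "pool", "set", pool, key, str(value)])
        for key, value in settings
    ]


if __name__ == "__main__":
    # "rbd" is an assumed pool name; substitute the pool actually in use.
    for command in pool_replication_commands("rbd", size=2, min_size=2):
        print(command)
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without a cluster.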
The problem now is that if one OSD goes down, some PGs will end up incomplete, and no I/O operations will be allowed to the RBD image.
This problem could be solved in a couple of ways:
1) An option could be added so that a write is always made to as many replicas as size specifies before it is acknowledged.
2) If a PG ends up in an incomplete state, Ceph could try to resolve the situation by recovering the PGs in question.
For us, adding a third replica isn't a feasible solution: 1) we have our data in two locations, and 2) the cost would be too high.
* Re: Feature request regarding size and min_size on pools
From: Sylvain Munaut @ 2013-09-10 13:51 UTC (permalink / raw)
To: Svein-Erik Lund; +Cc: ceph-devel@vger.kernel.org
Hi,
> Now to our problem. We want to be sure that a write is replicated before we get an ack.
That should be the case, AFAIU.
There was, however, a recently fixed bug that made RBD ack too early.
> one OSD can lead to data loss with min_size set to 1.
It definitely shouldn't. Unless of course the only remaining OSD fails as well.
> The problem now is that if one OSD goes down, some PGs will end up incomplete, and no I/O operations will be allowed to the RBD image.
Well yeah, that's exactly the semantics of the min_size option ...
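As a rough illustration of that rule, here is a toy model (not Ceph code) of the trade-off being discussed: a PG keeps serving I/O only while the number of available replicas is at least min_size. The function name `pg_serves_io` and the idea of counting failed replicas directly are simplifying assumptions; real peering and recovery are considerably more involved.

```python
def pg_serves_io(size: int, min_size: int, failed_replicas: int) -> bool:
    """Toy model: True if a PG with `size` replicas still serves I/O.

    Assumes I/O is allowed exactly when the number of surviving
    replicas is at least `min_size`. This ignores peering, recovery,
    and remapping to other OSDs.
    """
    if not 1 <= min_size <= size:
        raise ValueError("min_size must be between 1 and size")
    if not 0 <= failed_replicas <= size:
        raise ValueError("failed_replicas must be between 0 and size")
    return size - failed_replicas >= min_size


if __name__ == "__main__":
    size = 2
    for min_size in (1, 2):
        for failed in (0, 1, 2):
            state = "serves I/O" if pg_serves_io(size, min_size, failed) else "blocks I/O"
            print(f"size={size} min_size={min_size} failed={failed}: {state}")
```

Under this model, size=2 with min_size=2 blocks I/O as soon as one replica is lost, while min_size=1 keeps serving I/O from the single remaining copy.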
Cheers,
Sylvain