"rbd rm image" slow with big images ?

* "rbd rm image" slow with big images ?
From: Alexandre DERUMIER @ 2012-05-31  7:12 UTC
  To: ceph-devel

Hi,

I'm trying to delete some rbd images with rbd rm,
and it seems to be "slow" with big images.

I'm testing it by just creating a new image (1TB):

# time rbd -p pool1 create --size 1000000 image2

real    0m0.031s
user    0m0.015s
sys     0m0.010s

Then I just delete it, without having written anything to the image:

# time rbd -p pool1 rm image2
Removing image: 100% complete...done.

real    1m45.558s
user    0m14.683s
sys     0m17.363s

Same test with 100GB:

# time rbd -p pool1 create --size 100000 image2

real    0m0.032s
user    0m0.016s
sys     0m0.007s

# time rbd -p pool1 rm image2
Removing image: 100% complete...done.

real    0m10.499s
user    0m1.488s
sys     0m1.720s

I'm using the journal in tmpfs, 3 servers, 15 OSDs, each with one 15K RPM disk (XFS).
Network bandwidth, disk I/O and CPU usage are low.

Is this normal behaviour? Maybe some XFS tuning could help?

* Re: "rbd rm image" slow with big images ?
From: Wido den Hollander @ 2012-05-31 18:15 UTC
  To: Alexandre DERUMIER; +Cc: ceph-devel

Hi,

On 05/31/2012 09:12 AM, Alexandre DERUMIER wrote:
> Is this normal behaviour? Maybe some XFS tuning could help?

It's in the nature of RBD.

An RBD image consists of multiple RADOS objects, 4MB each by default.

Let's say you have a 40GB disk; that will consist of 10,000 4MB RADOS 
objects. You can list the objects that actually exist with: rados -p rbd ls

Now, when you create a new image, only the header is written; no data 
objects are written.

When you start writing to an RBD image, you will be writing to one of the 
4MB objects. If that object doesn't exist yet, it will be created.

So when you install your VM, it will create objects, but not all of them.

RBD knows which RADOS objects to access from three parameters:

* Image name
* Image size
* Stripe size (4MB)

So when your VM accesses bytes Y through Z on the disk, RBD calculates 
which objects hold them.
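A minimal sketch of that calculation, assuming a fixed 4MB object size and no extra striping: the object index is simply the byte offset divided by the object size, and the image size only bounds the valid range. The `prefix` argument stands in for the per-image part of the object names; the name format (prefix plus a zero-padded hex index) is a hypothetical illustration, not necessarily RBD's exact naming. The toy `SparseImage` also shows the lazy creation described above: a write only creates the objects it touches.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # default RBD object size (4MB)


def object_name(prefix: str, index: int) -> str:
    """Name of the index-th data object; the naming scheme is hypothetical."""
    return f"{prefix}.{index:012x}"


def objects_for_range(first_byte: int, last_byte: int, image_size: int) -> list:
    """Indices of the objects holding bytes first_byte..last_byte (inclusive)."""
    if not 0 <= first_byte <= last_byte < image_size:
        raise ValueError("byte range is outside the image")
    return list(range(first_byte // OBJECT_SIZE, last_byte // OBJECT_SIZE + 1))


class SparseImage:
    """Toy image: an object only exists once something has been written to it."""

    def __init__(self, prefix: str, size: int):
        self.prefix = prefix
        self.size = size
        self.objects = {}  # object name -> bytearray, created on first write

    def write(self, offset: int, data: bytes) -> None:
        if not data:
            return
        end = offset + len(data)  # one past the last byte written
        for index in objects_for_range(offset, end - 1, self.size):
            obj_start = index * OBJECT_SIZE
            # Portion of the write that falls inside this object.
            lo = max(offset, obj_start)
            hi = min(end, obj_start + OBJECT_SIZE)
            name = object_name(self.prefix, index)
            obj = self.objects.get(name)
            if obj is None:  # first write to this object: create it now
                obj = self.objects[name] = bytearray(OBJECT_SIZE)
            obj[lo - obj_start:hi - obj_start] = data[lo - offset:hi - offset]


if __name__ == "__main__":
    image = SparseImage("rb.0.example", 40 * 1024**3)  # 40GB, nothing written yet
    image.write(OBJECT_SIZE - 2, b"abcd")  # straddles objects 0 and 1
    print(sorted(image.objects))
    # ['rb.0.example.000000000000', 'rb.0.example.000000000001']
```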

Now, when you remove the image, there is no way of knowing which objects 
exist and which don't, so RBD tries to remove all of them.

For a fresh 40GB image, this results in 10,000 RADOS remove operations 
for non-existent objects, and that is slow.
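To put numbers on this: the `--size` values in the original report are in megabytes (1000000 for 1TB, 100000 for 100GB), so with the default 4MB objects each image has the number of possible objects computed below. Dividing the measured `real` times from the report by those counts is back-of-the-envelope arithmetic on this thread's numbers, not a benchmark.

```python
OBJECT_SIZE_MB = 4  # default RBD object size


def possible_objects(image_size_mb: int) -> int:
    """How many objects an image of this size can have; each one gets a remove."""
    return -(-image_size_mb // OBJECT_SIZE_MB)  # integer ceiling division


# label -> (value passed to `rbd create --size`, measured `rbd rm` real time in s)
measurements = {
    "1TB image": (1_000_000, 105.558),  # real 1m45.558s
    "100GB image": (100_000, 10.499),   # real 0m10.499s
}

for label, (size_mb, seconds) in measurements.items():
    n = possible_objects(size_mb)
    print(f"{label}: {n} remove ops, {seconds / n * 1000:.2f} ms per op")
# 1TB image: 250000 remove ops, 0.42 ms per op
# 100GB image: 25000 remove ops, 0.42 ms per op
```

Both images come out at roughly 0.42 ms per remove, so the delete time grows linearly with the image's nominal size, regardless of how much data was actually written.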

Wido

* Re: "rbd rm image" slow with big images ?
From: Stefan Priebe @ 2012-05-31 18:16 UTC
  To: Wido den Hollander; +Cc: Alexandre DERUMIER, ceph-devel

One note:
he wrote:
"Then I just delete it, without having written anything to the image"


* Re: "rbd rm image" slow with big images ?
From: Sage Weil @ 2012-05-31 18:19 UTC
  To: Wido den Hollander; +Cc: Alexandre DERUMIER, ceph-devel

On Thu, 31 May 2012, Wido den Hollander wrote:
> Hi,
> > Is this normal behaviour? Maybe some XFS tuning could help?
> 
> It's in the nature of RBD.

Yes.

That said, the current implementation is also stupid: it's doing a single 
io at a time.  #2256 (next sprint) will parallelize this to make it go 
much faster (probably an order of magnitude?).
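A minimal sketch of the general idea behind parallelizing the removal: instead of waiting for each remove to finish before sending the next, keep a bounded number of removals in flight. The `remove_object` callable, the `max_in_flight` default and the `FakeRadosPool` class are hypothetical stand-ins for real RADOS calls; this illustrates the pattern, not the actual change tracked in #2256.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def remove_all_objects(num_objects: int,
                       remove_object: Callable[[int], bool],
                       max_in_flight: int = 16) -> int:
    """Issue a remove for every possible object index, max_in_flight at a time.

    remove_object(index) returns True if the object existed and False if it was
    already absent (still a success: the goal is only that it is gone).
    Returns how many objects actually existed.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as executor:
        results = executor.map(remove_object, range(num_objects))
        return sum(1 for existed in results if existed)


class FakeRadosPool:
    """In-memory stand-in for a pool, with a fixed simulated latency per op."""

    def __init__(self, existing: Iterable[int], latency: float = 0.001):
        self.objects = set(existing)
        self.latency = latency
        self.lock = threading.Lock()

    def remove(self, index: int) -> bool:
        time.sleep(self.latency)  # simulated network round trip
        with self.lock:
            if index in self.objects:
                self.objects.remove(index)
                return True
            return False


if __name__ == "__main__":
    for window in (1, 16):  # 1 == one I/O at a time, as described above
        fake = FakeRadosPool(existing={0, 1, 7})
        start = time.perf_counter()
        existed = remove_all_objects(200, fake.remove, max_in_flight=window)
        elapsed = time.perf_counter() - start
        print(f"window={window}: {existed} existed, {elapsed:.3f}s")
```

With latency-bound operations like these, the wall-clock time drops roughly in proportion to the window size until the cluster itself becomes the bottleneck.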

sage

* Re: "rbd rm image" slow with big images ?
From: Wido den Hollander @ 2012-05-31 19:39 UTC
  To: Stefan Priebe; +Cc: ceph-devel

On 05/31/2012 08:16 PM, Stefan Priebe wrote:
> One note:
> he wrote:
> "Then I just delete it, without having written anything to the image"

That is true, but RBD doesn't know that.

There is no record of which objects were created and which weren't, so the 
removal process has to issue a removal for every RBD object that might exist.

That is the nature of RBD; it keeps things simple and reliable.
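The removal described here can be sketched as a plain loop over every possible object index, treating "not found" as success. `ObjectNotFound` and `remove_object` are hypothetical stand-ins for a RADOS remove call and its "no such object" error, and the 4MB object size is the default mentioned earlier. In powers of two, a 40GB image has 10,240 possible objects; the 10,000 above is the round figure.

```python
from typing import Callable, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # default RBD object size (4MB)


class ObjectNotFound(Exception):
    """Stand-in for the "no such object" error a RADOS remove can report."""


def remove_image_data(image_size: int,
                      remove_object: Callable[[int], None]) -> Tuple[int, int]:
    """Remove every object the image *could* have, one call at a time.

    Nothing records which objects were ever written, so every possible index
    gets a remove and "not found" counts as success.
    Returns (remove calls issued, objects that actually existed).
    """
    calls = existed = 0
    num_possible = (image_size + OBJECT_SIZE - 1) // OBJECT_SIZE  # ceiling
    for index in range(num_possible):
        calls += 1
        try:
            remove_object(index)
            existed += 1
        except ObjectNotFound:
            pass  # never written, so there is nothing to delete
    return calls, existed


if __name__ == "__main__":
    written = {3}  # pretend only object 3 was ever written

    def fake_remove(index: int) -> None:
        if index not in written:
            raise ObjectNotFound(index)
        written.discard(index)

    print(remove_image_data(40 * 1024**3, fake_remove))  # (10240, 1)
```

Even for a never-written image, the number of calls depends only on the image size, which matches the behaviour reported at the start of the thread.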

Wido

* Re: "rbd rm image" slow with big images ?
From: Alexandre DERUMIER @ 2012-06-01  4:38 UTC
  To: Sage Weil, Wido den Hollander; +Cc: ceph-devel

>>That said, the current implementation is also stupid: it's doing a single 
>>io at a time. #2256 (next sprint) will parallelize this to make it go 
>>much faster (probably an order of magnitude?). 

Ah, OK, that's why I see low I/O and network usage during the delete.

Thanks, Sage and Wido, for the explanations; that's very clear!

----- Original Message -----

From: "Sage Weil" <sage@inktank.com>
To: "Wido den Hollander" <wido@widodh.nl>
Cc: "Alexandre DERUMIER" <aderumier@odiso.com>, ceph-devel@vger.kernel.org
Sent: Thursday, 31 May 2012 20:19:44
Subject: Re: "rbd rm image" slow with big images ?

On Thu, 31 May 2012, Wido den Hollander wrote: 
> Hi, 
> > Is this normal behaviour? Maybe some XFS tuning could help?
> 
> It's in the nature of RBD. 

Yes. 

That said, the current implementation is also stupid: it's doing a single 
io at a time. #2256 (next sprint) will parallelize this to make it go 
much faster (probably an order of magnitude?). 

sage 

--

Alexandre Derumier
Systems Engineer
Phone: 03 20 68 88 90
Fax: 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France

* Re: "rbd rm image" slow with big images ?
From: Guido Winkelmann @ 2012-06-01 13:51 UTC
  To: Sage Weil; +Cc: ceph-devel

On Thursday, 31 May 2012, 11:19:44, you wrote:
> On Thu, 31 May 2012, Wido den Hollander wrote:
> > Hi,
> > 
> > > Is this normal behaviour? Maybe some XFS tuning could help?
> > 
> > It's in the nature of RBD.
> 
> Yes.
> 
> That said, the current implementation is also stupid: it's doing a single
> io at a time.  #2256 (next sprint) will parallelize this to make it go
> much faster (probably an order of magnitude?).

Will it speed up copy operations as well? Those are a lot more important in 
practice... A delete operation I can usually just fire off and leave running 
in the background, but if I'm running a copy operation, there's usually 
something else waiting (like starting a virtual server that's waiting for its 
disk) that cannot proceed until the copy is actually finished.

On another note, it looks to me (correct me if I'm wrong) like rbd copy 
operations always involve copying all the data objects from the source volume 
to the machine on which the rbd command is running, and then back to the 
cluster, even if that machine isn't even part of the cluster. Are there any 
plans to streamline this?

Regards,
	Guido

* Re: "rbd rm image" slow with big images ?
From: Wido den Hollander @ 2012-06-01 20:33 UTC
  To: Guido Winkelmann; +Cc: ceph-devel

Hi,

On 06/01/2012 03:51 PM, Guido Winkelmann wrote:
> On Thursday, 31 May 2012, 11:19:44, you wrote:
>> On Thu, 31 May 2012, Wido den Hollander wrote:
>>> Hi,
>>>
>>>> Is this normal behaviour? Maybe some XFS tuning could help?
>>>
>>> It's in the nature of RBD.
>>
>> Yes.
>>
>> That said, the current implementation is also stupid: it's doing a single
>> io at a time.  #2256 (next sprint) will parallelize this to make it go
>> much faster (probably an order of magnitude?).
>
> Will it speed up copy operations as well? Those are a lot more important in
> practice... A delete operation I can usually just fire off and leave running
> in the background, but if I'm running a copy operation, there's usually
> something else waiting (like starting a virtual server that's waiting for its
> disk) that cannot proceed until the copy is actually finished.
>

#2256 is only about parallelizing deletions: 
http://tracker.newdream.net/issues/2256

I don't see a feature request in the tracker for parallelizing a copy, 
but we can always create that one :)

> On another note, it looks to me (correct me if I'm wrong) like rbd copy
> operations always involve copying all the data objects from the source volume
> to the machine on which the rbd command is running, and then back to the
> cluster, even if that machine isn't even part of the cluster. Are there any
> plans to streamline this?
>

You are running the rbd command on that client, so that client will read 
the objects and write them back as new RADOS objects.
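A toy sketch of why a client-side copy moves the data over the client's link twice: each source object is read into the client and then written back to the cluster under a new name. The dict-based "cluster", the `copy_image` helper and the object-name format are hypothetical; skipping never-written objects is just how this sketch behaves, not a claim about what `rbd cp` does.

```python
from typing import Dict, Tuple

Cluster = Dict[str, bytes]  # toy stand-in: object name -> contents


def copy_image(cluster: Cluster, src_prefix: str, dst_prefix: str,
               num_objects: int) -> Tuple[int, int]:
    """Copy objects src_prefix.N to dst_prefix.N by way of the client.

    Returns (bytes received by the client, bytes sent by the client).
    """
    received = sent = 0
    for index in range(num_objects):
        data = cluster.get(f"{src_prefix}.{index:012x}")  # read: cluster -> client
        if data is None:
            continue  # in this sketch, never-written objects are skipped
        received += len(data)
        cluster[f"{dst_prefix}.{index:012x}"] = data  # write: client -> cluster
        sent += len(data)
    return received, sent


if __name__ == "__main__":
    cluster = {"src.000000000000": b"x" * 4096, "src.000000000002": b"y" * 4096}
    print(copy_image(cluster, "src", "dst", 4))  # (8192, 8192)
```

Every byte that is copied crosses the client's network link once in each direction, which is what makes a cluster-side clone attractive.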

What you are asking is a "cluster-side" clone of a volume, correct?

There is ongoing work on layering, where you have one "golden 
image" with multiple children. With that you can achieve what you want, 
but it's not desirable in every situation.

There has been talk about promoting a child to a fresh, independent volume; 
that would be the same as the cloning you are talking about. I don't know 
the status of that.
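A minimal copy-on-write sketch of the layering idea: a child starts with no objects of its own, reads fall through to the parent "golden image" for any object the child has not written, and "promoting" the child copies the remaining parent objects into it so it no longer depends on the parent. The `Image` class and its methods are hypothetical; the model assumes whole-object writes and ignores snapshots, object sizes and concurrency, so it is not the actual RBD layering design.

```python
from typing import Dict, Optional


class Image:
    """Toy image: object index -> bytes, plus an optional parent image."""

    def __init__(self, parent: Optional["Image"] = None):
        self.objects: Dict[int, bytes] = {}
        self.parent = parent

    def read(self, index: int) -> Optional[bytes]:
        # Prefer this image's own copy; otherwise fall back to the parent chain.
        if index in self.objects:
            return self.objects[index]
        if self.parent is not None:
            return self.parent.read(index)
        return None  # never written anywhere

    def write(self, index: int, data: bytes) -> None:
        # Writes always land in this image; the parent is never modified.
        self.objects[index] = data

    def promote(self, num_objects: int) -> int:
        """Copy every object still inherited from the parent, then detach."""
        copied = 0
        if self.parent is not None:
            for index in range(num_objects):
                if index not in self.objects:
                    data = self.parent.read(index)
                    if data is not None:
                        self.objects[index] = data
                        copied += 1
            self.parent = None
        return copied


if __name__ == "__main__":
    golden = Image()
    golden.write(0, b"boot")
    golden.write(1, b"root")
    child = Image(parent=golden)
    child.write(1, b"mine")
    print(child.read(0), child.read(1))  # b'boot' b'mine'
    print(child.promote(num_objects=4), child.parent)  # 1 None
```

Until promotion, a child only stores the objects it has changed; promotion pays the full copy cost once, but the copying can stay inside the cluster instead of passing through a client.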

Wido
