* Object size
@ 2011-04-28 18:51 Zenon Panoussis
  2011-04-28 20:02 ` Gregory Farnum
  0 siblings, 1 reply; 6+ messages in thread
From: Zenon Panoussis @ 2011-04-28 18:51 UTC (permalink / raw)
  To: ceph-devel


Hi

I'm trying to understand the relation between data size and actual
disk usage. With replication x2 I am seeing a 1:4.4 ratio according
to ceph (12222 MB data, 53579 MB used) and even more according to
'du -m' on the ext3 source of the ceph data (9791 MB instead of
12222 MB).

What is the current default object size? A paper from 2004 by Sage
et al speaks of 1 MB, a later one of 8 MB and
http://diaspora.gen.nz/~rodgerd/archives/1219-Ceph.html says 4 MB.
Is there a way to configure it? And is there any point in configuring
it (I am using ceph to store millions of small files) or would it
make no difference?

Z




* Re: Object size
  2011-04-28 18:51 Object size Zenon Panoussis
@ 2011-04-28 20:02 ` Gregory Farnum
  2011-04-28 23:55   ` Zenon Panoussis
  0 siblings, 1 reply; 6+ messages in thread
From: Gregory Farnum @ 2011-04-28 20:02 UTC (permalink / raw)
  To: Zenon Panoussis; +Cc: ceph-devel

On Thursday, April 28, 2011 at 11:51 AM, Zenon Panoussis wrote:
> What is the current default object size? A paper from 2004 by Sage
> et al speaks of 1 MB, a later one of 8 MB and
> http://diaspora.gen.nz/~rodgerd/archives/1219-Ceph.html says 4 MB.
> Is there a way to configure it? And is there any point in configuring
> it (I am using ceph to store millions of small files) or would it
> make no difference?

The default at this point is 4 MB objects, but it's configurable during mkcephfs, and you can change it for new subtrees and files by using the cephfs tool. However, an object only occupies as much disk space as the data actually written to it -- if you've got a 2 KB file, it will take up 2 KB on disk, not 4 MB.
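
To make the small-file point concrete, here is a minimal Python sketch. It assumes a simplified model in which a file is cut into chunks of at most one object size and each object stores only the bytes written to it; the function name and the model are illustrative assumptions, not Ceph's actual layout code, and backing-filesystem overhead is ignored.

```python
import math

DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024  # the 4 MB default mentioned above


def object_footprint(file_size, object_size=DEFAULT_OBJECT_SIZE):
    """Return (object_count, bytes_stored) for a single file.

    Simplified model (an assumption): one object per object_size chunk,
    and an object occupies only the bytes actually written to it.
    An empty file counts as zero data objects in this model.
    """
    if file_size < 0 or object_size <= 0:
        raise ValueError("file_size must be >= 0 and object_size > 0")
    object_count = math.ceil(file_size / object_size)
    return object_count, file_size


small = object_footprint(2 * 1024)          # a 2 KB file
large = object_footprint(10 * 1024 * 1024)  # a 10 MB file
print(small, large)
```

Under this model a smaller object size would not shrink the footprint of files that are already far below 4 MB; it only changes how larger files are split.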

> I'm trying to understand the relation between data size and actual
> disk usage. With replication x2 I am seeing a 1:4.4 ratio according
> to ceph (12222 MB data, 53579 MB used) and even more according to
> 'du -m' on the ext3 source of the ceph data (9791 MB instead of
> 12222 MB).

The relation between these reports and your data can be a bit fuzzy, though. When looking at the disk space used, the OSD is just relying on a df for the mount it's on -- if it's sharing that mount with anything else (e.g., the node OS) then it doesn't distinguish between OSD data and other data on the disk. Something like that must be going on if you've got a 4.4x ratio. (An example is below. [1]) Based on what you're giving us here:
1) You have 9791 MB of data in the filesystem.
2) You have (12222 MB - 9791 MB =) 2431 MB of metadata maintaining the Ceph tree.
3) RADOS is using (2 x 12222 MB =) 24444 MB of disk space amongst all your OSDs to store this.
4) Your nodes have other stuff installed to the tune of (53579 MB - 24444 MB =) 29135 MB in total, i.e. (29135 MB / 2 =) 14567 MB or (29135 MB / 3 =) 9711 MB per OSD.
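
The arithmetic behind these four points can be reproduced with a short Python sketch. The input figures are the ones quoted in this thread; the function and variable names are made up for illustration, and reading the data/du difference as metadata is the interpretation given in point 2, not something the script can verify.

```python
def usage_breakdown(data_mb, du_mb, used_mb, replicas):
    """Split the reported 'used' figure into the parts discussed above."""
    metadata_mb = data_mb - du_mb          # point 2: data figure minus du figure
    rados_mb = data_mb * replicas          # point 3: every byte stored `replicas` times
    unexplained_mb = used_mb - rados_mb    # point 4: what replication does not account for
    return {
        "metadata_mb": metadata_mb,
        "rados_mb": rados_mb,
        "unexplained_mb": unexplained_mb,
        "ratio": used_mb / data_mb,
    }


# Figures from the original report: 12222 MB data, 9791 MB per du, 53579 MB used, 2 replicas.
result = usage_breakdown(data_mb=12222, du_mb=9791, used_mb=53579, replicas=2)
print(result)

# Per-OSD share of the unexplained space for 2 or 3 OSDs (rounded down).
per_osd = {n: result["unexplained_mb"] // n for n in (2, 3)}
print(per_osd)
```

The ratio comes out at roughly 4.4, of which a factor of 2 is replication; the rest is the part that still needs explaining.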

-Greg

[1]: If I use the vstart script to start up a 1-mon, 1-MDS, 1-OSD cluster on my dev machine, ceph -s on a clean tree gives me the following output:
2011-04-28 13:24:29.669534 pg v5: 18 pgs: 18 active+clean+degraded; 43 KB data, 12149 MB used, 919 GB / 931 GB avail; 37/74 degraded (50.000%))
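
For comparing the two figures in a status line like this one, a rough Python sketch follows. The regular expression is derived only from the sample line above and may not match other Ceph versions; the helper name and the assumption that units are binary multiples (1 MB = 1024 KB) are mine.

```python
import re

# Pattern derived only from the sample line; other versions may differ.
SUMMARY_RE = re.compile(
    r"(?P<data>\d+(?:\.\d+)?) (?P<data_unit>KB|MB|GB) data, "
    r"(?P<used>\d+(?:\.\d+)?) (?P<used_unit>KB|MB|GB) used"
)

# Assumption: binary multiples.
TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def data_and_used_mb(status_line):
    """Extract the 'data' and 'used' figures from a status line, in MB."""
    match = SUMMARY_RE.search(status_line)
    if match is None:
        raise ValueError("no 'data ... used' summary found")
    data_mb = float(match["data"]) * TO_MB[match["data_unit"]]
    used_mb = float(match["used"]) * TO_MB[match["used_unit"]]
    return data_mb, used_mb


line = ("2011-04-28 13:24:29.669534 pg v5: 18 pgs: 18 active+clean+degraded; "
        "43 KB data, 12149 MB used, 919 GB / 931 GB avail; 37/74 degraded (50.000%))")
data_mb, used_mb = data_and_used_mb(line)
print(f"data: {data_mb:.3f} MB, used: {used_mb:.0f} MB")
```

On the sample line this shows about 12 GB "used" against a few KB of data, which is the shared-mount effect described above.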






* Re: Object size
  2011-04-28 20:02 ` Gregory Farnum
@ 2011-04-28 23:55   ` Zenon Panoussis
  2011-04-30  0:04     ` Gregory Farnum
  2011-05-02 16:09     ` Gregory Farnum
  0 siblings, 2 replies; 6+ messages in thread
From: Zenon Panoussis @ 2011-04-28 23:55 UTC (permalink / raw)
  To: ceph-devel


On 04/28/2011 10:02 PM, Gregory Farnum wrote:

[various explanations]

Thanks Greg, that's very helpful towards grasping ceph's workings. I'll
put it in the wiki.

> The relation between these reports and your data can be a bit fuzzy, 
> though. When looking at the disk space used the OSD is just relying on 
> a df for the mount it's on -- if it's sharing that mount with anything 
> else (eg, the node OS) then it's not distinguishing between OSD data, 
> and data on the disk. Something like that must be going on if you've 
> got a 4.4x ratio. (An example is below. [1]) Based on what you're giving 
> us here:

> 1) You have 9791 MB of data in the filesystem.
> 2) You have (12222MB - 9791 MB=) 2431MB of metadata maintaining the Ceph tree.
> 3) RADOS is using 24444MB of disk space amongst all your OSDs to store this.
> 4) Your nodes have other stuff installed to the tune of (29135MB/2=)14567MB or (29135/3=)9711MB per OSD.

1 and 3 are correct. 2 is presumably correct; it makes perfect sense and
there's no reason to question it. 4 is not correct:

# df -m
[...]
/dev/mapper/sda6        232003     26913    191832  13% /mnt/osd

# grep /mnt/osd /etc/ceph/ceph.conf
	osd data = /mnt/osd

# ls -a /mnt/osd/
.  ..  ceph_fsid  current  fsid  lost+found  magic  whoami

So the OSD lives in its own exclusive partition and nothing else uses that
partition. The other node is set up the same way. The "53579 MB used" reported
by ceph matches the aggregated "Used" output of df -m on both nodes. And
I checked, lost+found is empty on both. Something here is trying to be elusive
(and is succeeding).
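
The cross-check can be written down as a small Python sketch. It takes the Used column from the df -m line shown above; the second node's df output is not included in this mail, so its figure is only implied by subtracting from the ceph total, and the helper name is made up.

```python
def df_used_mb(df_line):
    """Return the Used column of one 'df -m' output line.

    Expected columns: filesystem, 1M-blocks, used, available, use%, mount point.
    """
    fields = df_line.split()
    if len(fields) < 6:
        raise ValueError("not a complete df line")
    return int(fields[2])


this_node_mb = df_used_mb(
    "/dev/mapper/sda6        232003     26913    191832  13% /mnt/osd"
)
ceph_used_mb = 53579   # "used" as reported by ceph
ceph_data_mb = 12222   # "data" as reported by ceph

# Implied, not measured: the other node's df output is not shown here.
other_node_implied_mb = ceph_used_mb - this_node_mb

# With x2 replication on two OSDs, each OSD would hold about one copy of the data.
excess_on_this_node_mb = this_node_mb - ceph_data_mb

print(this_node_mb, other_node_implied_mb, excess_on_this_node_mb)
```

So each OSD partition holds roughly 14 GB more than one full copy of the data, on a partition used by nothing else.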

Z



* Re: Object size
  2011-04-28 23:55   ` Zenon Panoussis
@ 2011-04-30  0:04     ` Gregory Farnum
  2011-04-30 12:00       ` Zenon Panoussis
  2011-05-02 16:09     ` Gregory Farnum
  1 sibling, 1 reply; 6+ messages in thread
From: Gregory Farnum @ 2011-04-30  0:04 UTC (permalink / raw)
  To: Zenon Panoussis; +Cc: ceph-devel

On Thu, Apr 28, 2011 at 4:55 PM, Zenon Panoussis <oracle@provocation.net> wrote:
> On 04/28/2011 10:02 PM, Gregory Farnum wrote:
>> 1) You have 9791 MB of data in the filesystem.
>> 2) You have (12222MB - 9791 MB=) 2431MB of metadata maintaining the Ceph tree.
>> 3) RADOS is using 24444MB of disk space amongst all your OSDs to store this.
>> 4) Your nodes have other stuff installed to the tune of (29135MB/2=)14567MB or (29135/3=)9711MB per OSD.
>
> 1 and 3 are correct. 2 is presumably correct; it makes perfect sense and
> there's no reason to question it. 4 is not correct:
>
> # df -m
> [...]
> /dev/mapper/sda6        232003     26913    191832  13% /mnt/osd
>
> # grep /mnt/osd /etc/ceph/ceph.conf
>        osd data = /mnt/osd
>
> # ls -a /mnt/osd/
> .  ..  ceph_fsid  current  fsid  lost+found  magic  whoami
>
> So the OSD lives in its own exclusive partition and nothing else uses that
> partition. The other node is done the same way. The "53579 MB used" reported
> by ceph matches the aggregated "Used" output of df -m on both nodes. And
> I checked, lost+found is empty on both. Something here is trying to be elusive
> (and is succeeding).

Hmmm, that's unexpected. Is there anything in the lost+found dir?
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Object size
  2011-04-30  0:04     ` Gregory Farnum
@ 2011-04-30 12:00       ` Zenon Panoussis
  0 siblings, 0 replies; 6+ messages in thread
From: Zenon Panoussis @ 2011-04-30 12:00 UTC (permalink / raw)
  To: ceph-devel



On 04/30/2011 02:04 AM, Gregory Farnum wrote:

>> by ceph matches the aggregated "Used" output of df -m on both nodes. And
>> I checked, lost+found is empty on both. Something here is trying to be elusive
>> (and is succeeding).

> Hmmm, that's unexpected. Is there anything in the lost+found dir?

Erhm, no, it's empty ;)

Z



* Re: Object size
  2011-04-28 23:55   ` Zenon Panoussis
  2011-04-30  0:04     ` Gregory Farnum
@ 2011-05-02 16:09     ` Gregory Farnum
  1 sibling, 0 replies; 6+ messages in thread
From: Gregory Farnum @ 2011-05-02 16:09 UTC (permalink / raw)
  To: Zenon Panoussis; +Cc: ceph-devel

On Thu, Apr 28, 2011 at 4:55 PM, Zenon Panoussis <oracle@provocation.net> wrote:
>
> On 04/28/2011 10:02 PM, Gregory Farnum wrote:
>
> [various explanations]
>
> Thanks Greg, that's very helpful towards grasping ceph's workings. I'll
> put it in the wiki.
>
>> The relation between these reports and your data can be a bit fuzzy,
>> though. When looking at the disk space used the OSD is just relying on
>> a df for the mount it's on -- if it's sharing that mount with anything
>> else (eg, the node OS) then it's not distinguishing between OSD data,
>> and data on the disk. Something like that must be going on if you've
>> got a 4.4x ratio. (An example is below. [1]) Based on what you're giving
>> us here:
>
>> 1) You have 9791 MB of data in the filesystem.
>> 2) You have (12222MB - 9791 MB=) 2431MB of metadata maintaining the Ceph tree.
>> 3) RADOS is using 24444MB of disk space amongst all your OSDs to store this.
>> 4) Your nodes have other stuff installed to the tune of (29135MB/2=)14567MB or (29135/3=)9711MB per OSD.
>
> 1 and 3 are correct. 2 is presumably correct; it makes perfect sense and
> there's no reason to question it. 4 is not correct:
>
> # df -m
> [...]
> /dev/mapper/sda6        232003     26913    191832  13% /mnt/osd
>
> # grep /mnt/osd /etc/ceph/ceph.conf
>        osd data = /mnt/osd
>
> # ls -a /mnt/osd/
> .  ..  ceph_fsid  current  fsid  lost+found  magic  whoami
>
> So the OSD lives in its own exclusive partition and nothing else uses that
> partition. The other node is done the same way. The "53579 MB used" reported
> by ceph matches the aggregated "Used" output of df -m on both nodes. And
> I checked, lost+found is empty on both. Something here is trying to be elusive
> (and is succeeding).

All right, a few other things.
1) Are you using snapshots? And what's the backing filesystem?
2) Can you run 'ceph pg dump -o -' and give us the output? That's
where the numbers are collated from, so hopefully we can see something
useful in there.
-Greg