* Object size
From: Zenon Panoussis @ 2011-04-28 18:51 UTC
To: ceph-devel

Hi

I'm trying to understand the relation between data size and actual disk
usage. With replication x2 I am seeing a 1:4.4 ratio according to ceph
(12222 MB data, 53579 MB used) and even more according to 'du -m' on the
ext3 source of the ceph data (9791 MB instead of 12222 MB).

What is the current default object size? A paper from 2004 by Sage et al
speaks of 1 MB, a later one of 8 MB and
http://diaspora.gen.nz/~rodgerd/archives/1219-Ceph.html says 4 MB. Is
there a way to configure it? And is there any point in configuring it (I
am using ceph to store millions of small files) or would it make no
difference?

Z
* Re: Object size
From: Gregory Farnum @ 2011-04-28 20:02 UTC
To: Zenon Panoussis; +Cc: ceph-devel

On Thursday, April 28, 2011 at 11:51 AM, Zenon Panoussis wrote:
> What is the current default object size? A paper from 2004 by Sage
> et al speaks of 1 MB, a later one of 8 MB and
> http://diaspora.gen.nz/~rodgerd/archives/1219-Ceph.html says 4 MB.
> Is there a way to configure it? And is there any point in configuring
> it (I am using ceph to store millions of small files) or would it
> make no difference?

The default at this point is 4MB objects, but it's configurable during
mkcephfs, and you can change it on new subtrees and files by using the
cephfs tool. However, an object only takes up as much disk space as the
data actually written to it -- if you've got a 2KB file, it will take up
2 KB on disk.

> I'm trying to understand the relation between data size and actual
> disk usage. With replication x2 I am seeing a 1:4.4 ratio according
> to ceph (12222 MB data, 53579 MB used) and even more according to
> 'du -m' on the ext3 source of the ceph data (9791 MB instead of
> 12222 MB).

The relation between these reports and your data can be a bit fuzzy,
though. When looking at the disk space used the OSD is just relying on
a df for the mount it's on -- if it's sharing that mount with anything
else (eg, the node OS) then it's not distinguishing between OSD data,
and data on the disk. Something like that must be going on if you've
got a 4.4x ratio. (An example is below. [1]) Based on what you're giving
us here:
1) You have 9791 MB of data in the filesystem.
2) You have (12222MB - 9791 MB=) 2431MB of metadata maintaining the Ceph tree.
3) RADOS is using 24444MB of disk space amongst all your OSDs to store this.
4) Your nodes have other stuff installed to the tune of (29135MB/2=)14567MB or (29135/3=)9711MB per OSD.

-Greg

[1]: If I use the vstart script to start up a 1-mon, 1-MDS, 1-OSD cluster
on my dev machine, ceph -s on a clean tree gives me the following output:
2011-04-28 13:24:29.669534    pg v5: 18 pgs: 18 active+clean+degraded; 43 KB data, 12149 MB used, 919 GB / 931 GB avail; 37/74 degraded (50.000%))
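Greg's point that an object only occupies the space actually written to it can be illustrated with a minimal Python sketch. It assumes the simplest layout, where a file is cut into consecutive objects of at most `object_size` bytes (4 MB by default, per the reply) with no striping across several objects; `object_sizes` is a hypothetical helper, not part of Ceph or the cephfs tool.

```python
MB = 1024 * 1024


def object_sizes(file_size: int, object_size: int = 4 * MB) -> list[int]:
    """Hypothetical helper: sizes of the objects backing a file.

    Assumes a plain layout (one object after another, no striping across
    objects). Every object is full except the last, which holds only the
    remainder, so a small file maps to one object no larger than itself.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    full, rest = divmod(file_size, object_size)
    sizes = [object_size] * full
    if rest:
        sizes.append(rest)  # partial final object, sized to the leftover bytes
    return sizes


if __name__ == "__main__":
    print(object_sizes(2 * 1024))       # a 2 KB file -> [2048]
    print(len(object_sizes(10 * MB)))   # 10 MB -> 3 objects (4 + 4 + 2 MB)
```

Under this assumption a file smaller than the object size always maps to a single object of its own size, so for millions of small files the configured object size would not change their logical footprint. The sketch counts logical bytes only: if each object is stored as a file on the backing filesystem (ext3 here), its on-disk usage is rounded up to whole filesystem blocks.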
* Re: Object size
From: Zenon Panoussis @ 2011-04-28 23:55 UTC
To: ceph-devel

On 04/28/2011 10:02 PM, Gregory Farnum wrote:

[various explanations]

Thanks Greg, that's very helpful towards grasping ceph's workings. I'll
put it in the wiki.

> The relation between these reports and your data can be a bit fuzzy,
> though. When looking at the disk space used the OSD is just relying on
> a df for the mount it's on -- if it's sharing that mount with anything
> else (eg, the node OS) then it's not distinguishing between OSD data,
> and data on the disk. Something like that must be going on if you've
> got a 4.4x ratio. (An example is below. [1]) Based on what you're giving
> us here:
> 1) You have 9791 MB of data in the filesystem.
> 2) You have (12222MB - 9791 MB=) 2431MB of metadata maintaining the Ceph tree.
> 3) RADOS is using 24444MB of disk space amongst all your OSDs to store this.
> 4) Your nodes have other stuff installed to the tune of (29135MB/2=)14567MB or (29135/3=)9711MB per OSD.

1 and 3 are correct. 2 is presumably correct; it makes perfect sense and
there's no reason to question it. 4 is not correct:

# df -m
[...]
/dev/mapper/sda6 232003 26913 191832 13% /mnt/osd

# grep /mnt/osd /etc/ceph/ceph.conf
osd data = /mnt/osd

# ls -a /mnt/osd/
. .. ceph_fsid current fsid lost+found magic whoami

So the OSD lives in its own exclusive partition and nothing else uses that
partition. The other node is done the same way. The "53579 MB used" reported
by ceph matches the aggregated "Used" output of df -m on both nodes. And
I checked, lost+found is empty on both. Something here is trying to be elusive
(and is succeeding).

Z
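The numbers in Greg's four items, and the 4.4 ratio from the original question, are plain arithmetic on the figures reported in this thread. A minimal Python sketch that reproduces them follows. `Usage` and `breakdown` are hypothetical names; the per-OSD split just divides the leftover space by an assumed OSD count (2 or 3, as in the reply), and item 2 follows Greg's reading that the gap between ceph's "data" figure and du is Ceph metadata.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Space figures in MB, as quoted in this thread."""
    fs_data: int    # file data in the filesystem (du -m on the source tree)
    ceph_data: int  # "data" figure from ceph -s
    ceph_used: int  # "used" figure from ceph -s (df of the OSD mounts, summed)
    replicas: int   # replication factor


def breakdown(u: Usage, num_osds: int) -> dict[str, int]:
    """Split ceph_used into the four items of Greg's reply (integer MB)."""
    rados = u.ceph_data * u.replicas          # item 3: every replica of every object
    other = u.ceph_used - rados               # item 4: space the RADOS objects don't explain
    return {
        "fs_data": u.fs_data,                 # item 1
        "metadata": u.ceph_data - u.fs_data,  # item 2: Greg's reading of the du/ceph gap
        "rados": rados,
        "other_total": other,
        "other_per_osd": other // num_osds,   # truncated, as in the reply
    }


if __name__ == "__main__":
    u = Usage(fs_data=9791, ceph_data=12222, ceph_used=53579, replicas=2)
    print(f"used/data ratio: {u.ceph_used / u.ceph_data:.2f}")  # 4.38
    for n in (2, 3):  # the reply leaves the OSD count open
        print(n, breakdown(u, n))
    # Assumes, as Zenon reports, that ceph's "used" equals the sum of the
    # df -m "Used" values of the two OSD partitions; one of them shows 26913.
    print("second node used:", u.ceph_used - 26913)
```

With the reported figures this yields the 2431 MB, 24444 MB and 29135 MB values from the reply, and an implied 26666 MB used on the second node, close to the 26913 MB shown on the first.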
* Re: Object size
From: Gregory Farnum @ 2011-04-30 0:04 UTC
To: Zenon Panoussis; +Cc: ceph-devel

On Thu, Apr 28, 2011 at 4:55 PM, Zenon Panoussis <oracle@provocation.net> wrote:
> [...]
> So the OSD lives in its own exclusive partition and nothing else uses that
> partition. The other node is done the same way. The "53579 MB used" reported
> by ceph matches the aggregated "Used" output of df -m on both nodes. And
> I checked, lost+found is empty on both. Something here is trying to be elusive
> (and is succeeding).

Hmmm, that's unexpected. Is there anything in the lost+found dir?
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
* Re: Object size
From: Zenon Panoussis @ 2011-04-30 12:00 UTC
To: ceph-devel

On 04/30/2011 02:04 AM, Gregory Farnum wrote:
>> by ceph matches the aggregated "Used" output of df -m on both nodes. And
>> I checked, lost+found is empty on both. Something here is trying to be elusive
>> (and is succeeding).
> Hmmm, that's unexpected. Is there anything in the lost+found dir?

Erhm, no, it's empty ;)

Z
* Re: Object size
From: Gregory Farnum @ 2011-05-02 16:09 UTC
To: Zenon Panoussis; +Cc: ceph-devel

On Thu, Apr 28, 2011 at 4:55 PM, Zenon Panoussis <oracle@provocation.net> wrote:
> [...]
> So the OSD lives in its own exclusive partition and nothing else uses that
> partition. The other node is done the same way. The "53579 MB used" reported
> by ceph matches the aggregated "Used" output of df -m on both nodes. And
> I checked, lost+found is empty on both. Something here is trying to be elusive
> (and is succeeding).

All right, a few other things.
1) Are you using snapshots? And what's the backing filesystem?
2) Can you run 'ceph pg dump -o -' and give us the output? That's where
the numbers are collated from, so hopefully we can see something useful
in there.
-Greg