* Ext4 speedup by storing metadata and data on separate devices
@ 2012-11-20 11:04 Ivan Zahariev
2012-11-20 20:56 ` Andreas Dilger
0 siblings, 1 reply; 2+ messages in thread
From: Ivan Zahariev @ 2012-11-20 11:04 UTC
To: linux-ext4
Hello all,
This suggestion is not about storing the journal on a separate device.
Many of the tasks on an Ext4 file-system require a full or massive scan
of the metadata. A few examples:
- backup: you need to get a list of all files whose "mtime" or "size"
has changed since the last backup
- reporting: you need to get a list of all files owned by a particular
"group" ID
- delete: deleting the "/home/$user" of someone with lots of data and files
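
To make the backup case concrete, here is a minimal sketch of such a scan in Python. The function name `changed_since` and the `demo` helper are hypothetical and only illustrate the access pattern: the walk reads directory entries and inodes, never file contents, which is why the placement of metadata dominates its cost.

```python
import os
import tempfile
import time


def changed_since(root, cutoff):
    """Yield paths of regular files under root with mtime newer than cutoff."""
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    # stat() reads the inode only; no file data is touched.
                    if entry.stat(follow_symlinks=False).st_mtime > cutoff:
                        yield entry.path


def demo():
    """Build a tiny tree and return the basenames changed since the cutoff."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        old = os.path.join(root, "old.txt")
        new = os.path.join(root, "sub", "new.txt")
        for path in (old, new):
            with open(path, "w") as handle:
                handle.write("x")
        cutoff = time.time() - 3600
        # Push one file's mtime to before the cutoff.
        os.utime(old, (cutoff - 60, cutoff - 60))
        return sorted(os.path.basename(p) for p in changed_since(root, cutoff))
```

The group-owner report follows the same pattern, comparing `st_gid` instead of `st_mtime`.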
I know many efforts have been made to keep (meta)data operations
"local"; this speeds up operations on spinning disks a lot, and on
SSDs too. However, keeping all of the metadata on an SSD (or a RAID1 of
two SSDs) could speed up many common tasks considerably, and the
hardware cost of doing so is quite affordable now.
I see two possible implementations:
1. Rework the Ext4 metadata operations (those that work with inodes,
etc.) to read from and write to a separate block device.
or
2. Add an option to the "data locality" algorithm to force it to store
all metadata only at the beginning of the device (we can pre-allocate
enough space). We can then use device mapper (DM) to transparently map
those blocks to a separate, faster block device, keeping the changes to
Ext4 minimal.
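
As a rough illustration of the mapping in option 2, the sketch below builds a two-segment device-mapper table using the `linear` target, whose table lines have the form `<start> <length> linear <device> <offset>` in 512-byte sectors. The device paths and sizes are hypothetical, and the sketch assumes each backing device is used from offset 0.

```python
SECTOR = 512  # device-mapper tables are expressed in 512-byte sectors


def dm_linear_table(fast_dev, slow_dev, fast_bytes, total_bytes):
    """Return dm table lines: the first fast_bytes on fast_dev, the rest on slow_dev."""
    fast_sectors = fast_bytes // SECTOR
    slow_sectors = (total_bytes - fast_bytes) // SECTOR
    return [
        # Metadata region at the start of the virtual device.
        f"0 {fast_sectors} linear {fast_dev} 0",
        # Everything after it lands on the slower device.
        f"{fast_sectors} {slow_sectors} linear {slow_dev} 0",
    ]


GIB = 1024 ** 3
# Hypothetical devices: 1 GiB of SSD in front of 9 GiB of HDD.
table = dm_linear_table("/dev/ssd0", "/dev/hdd0", 1 * GIB, 10 * GIB)
```

The resulting lines could be passed to `dmsetup create` as the table for a new mapped device.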
Does all this make sense, or am I missing something obvious?
Thank you for your time.
--
Best regards.
Ivan Zahariev | System Administrator | ICDSoft Ltd.
* Re: Ext4 speedup by storing metadata and data on separate devices
From: Andreas Dilger @ 2012-11-20 20:56 UTC
To: Ivan Zahariev; +Cc: linux-ext4@vger.kernel.org
On 2012-11-20, at 4:04, Ivan Zahariev <famzah@icdsoft.com> wrote:
>
> This suggestion is not about storing the journal on a separate device.
>
> Many of the tasks on an Ext4 file-system require a full or massive scan of the metadata. A few examples:
> - backup: you need to get a list of all files whose "mtime" or "size" has changed since the last backup
> - reporting: you need to get a list of all files owned by a particular "group" ID
> - delete: deleting the "/home/$user" of someone with lots of data and files
>
> I know many efforts have been made to keep (meta)data operations "local"; this speeds up operations on spinning disks a lot, and on SSDs too. However, keeping all of the metadata on an SSD (or a RAID1 of two SSDs) could speed up many common tasks considerably, and the hardware cost of doing so is quite affordable now.
>
> I see two possible implementations:
>
> 1. Rework the Ext4 metadata operations (those that work with inodes, etc.) to read from and write to a separate block device.
>
> or
>
> 2. Add an option to the "data locality" algorithm to force it to store all metadata only at the beginning of the device (we can pre-allocate enough space). We can then use device mapper (DM) to transparently map those blocks to a separate, faster block device, keeping the changes to Ext4 minimal.
>
> Does all this make sense, or am I missing something obvious?
We have implemented option #2 using LVM, with a script that maps the first 128MB of the logical volume to SSD (RAID-1) and the next 255 * 128MB to HDD (usually RAID-6). The pattern repeats for as long as HDD and SSD space remains. This is easy to do by calling lvextend in a script.
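
A minimal sketch of what such a script might generate, written in Python for illustration. The volume group, logical volume and physical volume names are hypothetical, and creating the first segment with `lvcreate` is an assumption (the description above only mentions lvextend). The function only builds the command strings; it does not run them.

```python
def lv_plan(vg, lv, ssd_pv, hdd_pv, flex_groups, group_mb=128, groups_per_flex=256):
    """Return LVM commands that alternate one SSD segment with one HDD segment."""
    ssd_mb = group_mb                          # first block group of each flex group
    hdd_mb = group_mb * (groups_per_flex - 1)  # remaining 255 block groups
    commands = []
    for index in range(flex_groups):
        if index == 0:
            # Assumption: the LV is created with its first 128MB on the SSD PV.
            commands.append(f"lvcreate -L {ssd_mb}M -n {lv} {vg} {ssd_pv}")
        else:
            commands.append(f"lvextend -L +{ssd_mb}M {vg}/{lv} {ssd_pv}")
        commands.append(f"lvextend -L +{hdd_mb}M {vg}/{lv} {hdd_pv}")
    return commands


# Hypothetical names: vg0/data, SSD RAID-1 on /dev/md0, HDD RAID-6 on /dev/md1.
plan = lv_plan("vg0", "data", "/dev/md0", "/dev/md1", flex_groups=2)
```

Each iteration covers one 32GB flex group, so a real script would loop until either physical volume runs out of free extents.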
For mke2fs, specifying a flex_bg factor of "-G 256" and limiting the inode ratio ("-i 69905", for an average file size just over 64kB) allows all of the block bitmaps and inode tables to fit into the first 128MB of the flex group with some space to spare. This means all of the static metadata is allocated on SSD, and the directory allocations are also biased toward the remaining space in the first flex_bg group.
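
The "fits with some space to spare" claim can be checked with a little arithmetic. The sketch below assumes the usual ext4 defaults, which are not stated above: 4096-byte blocks, 256-byte inodes, and 32768 blocks (128MB) per block group. It counts only the inode tables and the block and inode bitmaps, ignoring superblock backups and group descriptors.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 4096                           # assumed default block size
INODE_SIZE = 256                            # assumed default inode size
GROUP_BYTES = 8 * BLOCK_SIZE * BLOCK_SIZE   # 32768 blocks per group = 128 MiB
FLEX_FACTOR = 256                           # mke2fs -G 256
BYTES_PER_INODE = 69905                     # mke2fs inode ratio from the text


def flex_metadata_bytes():
    """Return (inode table bytes, bitmap bytes) for one flex group."""
    flex_bytes = FLEX_FACTOR * GROUP_BYTES            # 32 GiB per flex group
    inodes = flex_bytes // BYTES_PER_INODE            # one inode per 69905 bytes
    inode_tables = inodes * INODE_SIZE
    bitmaps = FLEX_FACTOR * 2 * BLOCK_SIZE            # block + inode bitmap per group
    return inode_tables, bitmaps


inode_tables, bitmaps = flex_metadata_bytes()
total_mib = (inode_tables + bitmaps) / MIB
```

Under these assumptions the inode tables take 120 MiB and the bitmaps 2 MiB, leaving about 6 MiB of the first 128MB free.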
It isn't elegant, but it works with minimal complexity.
There was also a discussion about implementing option #1, having ext4 access separate devices for data and metadata, but nobody has actually started to implement it.
Cheers, Andreas