From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Andreas Dilger <adilger@dilger.ca>, Bobi Jam <bobijam@hotmail.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH v3] ext4: optimize metadata allocation for hybrid LUNs
Date: Wed, 20 Sep 2023 10:53:14 +0530
Message-ID: <877col770d.fsf@doe.com>
In-Reply-To: <9470959E-7683-4834-A4F5-34093A600F37@dilger.ca>

Andreas Dilger <adilger@dilger.ca> writes:

> On Sep 12, 2023, at 12:59 AM, Bobi Jam <bobijam@hotmail.com> wrote:
>>
>> With LVM it is possible to create an LV with SSD storage at the
>> beginning of the LV and HDD storage at the end of the LV, and use that
>> to separate ext4 metadata allocations (that need small random IOs)
>> from data allocations (that are better suited for large sequential
>> IOs) depending on the type of underlying storage. Between 0.5-1.0% of
>> the filesystem capacity would need to be high-IOPS storage in order to
>> hold all of the internal metadata.
>>
>> This would improve performance for inode and other metadata access,
>> such as ls, find, e2fsck, and in general improve file access latency,
>> modification, truncate, unlink, transaction commit, etc.
>>
>> This patch splits the largest free order group lists and average
>> fragment size lists into two additional lists for IOPS/fast storage
>> groups, and performs cr 0 / cr 1 group scanning for metadata block
>> allocation in the following order:
>>
>>   if (allocate metadata blocks)
>>       if (cr == 0)
>>           try to find group in largest free order IOPS group list
>>       if (cr == 1)
>>           try to find group in fragment size IOPS group list
>>       if (above two find failed)
>>           fall through normal group lists as before
>>   if (allocate data blocks)
>>       try to find group in normal group lists as before
>>       if (failed to find group in normal group && mb_enable_iops_data)
>>           try to find group in IOPS groups
>>
>> Non-metadata block allocation does not allocate from the IOPS groups
>> if non-IOPS groups are not used up.
>
> Hi Ritesh,
> I believe this updated version of the patch addresses your original
> request that it is possible to allocate blocks from the IOPS block
> groups if the non-IOPS groups are full. This is currently disabled
> by default, because in cases where the IOPS groups make up only a
> small fraction of the space (e.g. < 1% of total capacity) having data
> blocks allocated from this space would not make a big improvement
> to the end-user usage of the filesystem, but would semi-permanently
> hurt the ability to allocate metadata into the IOPS groups.
>
> We discussed on the ext4 concall various options to make this more
> useful (e.g. allowing the root user to allocate from the IOPS groups
> if the filesystem is out of space, having a heuristic to balance IOPS
> vs. non-IOPS allocations for small files, having a BPF rule that can
> specify which UID/GID/procname/filename/etc. can access this space),
> but everyone was reluctant to put any complex policy into the kernel
> to make any decision, since this eventually is wrong for some usage.
>
> For now, there is just a simple on/off switch, and if this is enabled
> the IOPS groups are only used when all of the non-IOPS groups are full.
> Any more complex policy can be deferred to a separate patch, I think.

I think having an on/off switch that lets the user enable or disable
allocation of data blocks from the IOPS groups is good enough for now.
At least users with a larger IOPS disk space won't hit ENOSPC while the
IOPS groups still have free space, if they enable this feature.

So, thanks for addressing it. I am going through the series and will
provide my review comments shortly.
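
For reference, the group scanning order quoted from the patch description
above can be sketched as follows. This is only a minimal illustration in
Python, not the kernel code: the function name, the list arguments, and the
`fits` predicate are hypothetical, and only the `mb_enable_iops_data` switch
and the cr 0 / cr 1 ordering come from the description.

```python
from typing import Callable, Optional, Sequence


def choose_group(
    is_metadata: bool,
    cr: int,
    iops_order_list: Sequence[int],   # hypothetical: largest-free-order IOPS list
    iops_frag_list: Sequence[int],    # hypothetical: fragment-size IOPS list
    normal_groups: Sequence[int],     # hypothetical: the pre-existing group lists
    iops_groups: Sequence[int],       # hypothetical: all IOPS groups
    fits: Callable[[int], bool],      # hypothetical: "can this group satisfy the request?"
    mb_enable_iops_data: bool = False,
) -> Optional[int]:
    """Return the first suitable group number, or None if nothing fits."""

    def first_fit(groups: Sequence[int]) -> Optional[int]:
        return next((g for g in groups if fits(g)), None)

    if is_metadata:
        group = None
        if cr == 0:
            group = first_fit(iops_order_list)
        elif cr == 1:
            group = first_fit(iops_frag_list)
        if group is not None:
            return group
        # Both IOPS lookups failed: fall through to the normal lists.
        return first_fit(normal_groups)

    # Data blocks: normal groups first, IOPS groups only if the switch is on.
    group = first_fit(normal_groups)
    if group is None and mb_enable_iops_data:
        group = first_fit(iops_groups)
    return group
```

With the switch left off, a data allocation returns None once the normal
groups are exhausted, which is the ENOSPC case discussed above.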

Meanwhile, here is my understanding of your use case. Can you please
correct it or add your inputs?

1. You would like to create a FS with a combination of a high-IOPS storage
   disk and a non-high-IOPS disk, with the high-IOPS disk space being
   around 1% of the total disk capacity. (Well, this is obvious, as it is
   stated in the patch description itself.)

2. Since ext4 currently does not support multiple drives, we will set this
   up using LVM and place the high-IOPS disk at the front.

3. Then, at FS creation time, we will use a command like this:

     mkfs.ext4 -O sparse_super2 -E packed_meta_blocks,iops=0-1024G /path/to/lvm

Is this understanding right?
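
To put a number on the `iops=0-1024G` range in the example command, here is
a rough back-of-the-envelope sketch in Python. It assumes a 4 KiB block size
and the usual ext4 geometry of 8 * block_size blocks per group (128 MiB per
group); the helper name is hypothetical, and treating the range end as
exclusive is my assumption, not something taken from the patch.

```python
GIB = 1 << 30
BLOCK_SIZE = 4096  # assumed block size in bytes


def iops_group_range(start_bytes: int, end_bytes: int,
                     block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Return (first_group, last_group) covered by [start_bytes, end_bytes)."""
    # One block bitmap block describes 8 * block_size blocks, so a group
    # spans 8 * block_size * block_size bytes (128 MiB for 4 KiB blocks).
    group_bytes = 8 * block_size * block_size
    first = start_bytes // group_bytes
    last = (end_bytes - 1) // group_bytes
    return first, last


first, last = iops_group_range(0, 1024 * GIB)
print(first, last, last - first + 1)  # prints "0 8191 8192"
```

Under these assumptions the first 8192 block groups would be IOPS groups,
and for 1024G to be about 1% of capacity the LV would be on the order of
100 TiB.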

I have a few follow-up queries as well:

1. What about thin-provisioned LVM? IIUC, the space in that is
   pre-allocated, but allocation happens at the time of write, so it might
   happen that both data and metadata allocations end up in IOPS/non-IOPS
   groups at random?

2. Even in the case of traditional LVM, the mapping of the physical blocks
   can be changed during an overwrite or discard sort of use case, right?
   So do we have a guarantee that the metadata always sits on the
   high-IOPS groups as the filesystem ages?

3. With these mkfs options to utilize this feature, we do lose the ability
   to resize, right? I am guessing resize will be disabled with
   sparse_super2 and/or packed_meta_blocks itself?

Thanks!
-ritesh