From: Jan Kara <jack@suse.cz>
To: Zhang Yi <yi.zhang@huaweicloud.com>
Cc: Jan Kara <jack@suse.cz>,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, tytso@mit.edu,
adilger.kernel@dilger.ca, ritesh.list@gmail.com,
yi.zhang@huawei.com, chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: Re: [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()
Date: Thu, 8 Aug 2024 20:36:19 +0200
Message-ID: <20240808183619.vmxttspcs5ngm6g3@quack3>
In-Reply-To: <a23023f6-93cc-584d-c55a-9f8395e360ae@huaweicloud.com>
On Thu 08-08-24 19:18:30, Zhang Yi wrote:
> On 2024/8/8 1:41, Jan Kara wrote:
> > On Fri 02-08-24 19:51:16, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> Now that we update data reserved space for delalloc after allocating
> >> new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
> >> enabled, we also need to query the extents_status tree to calculate the
> >> exact reserved clusters. This is complicated now and it appears that
> >> it's better to do this job in ext4_es_insert_extent(), because
> >> __es_remove_extent() have already count delalloc blocks when removing
> >> delalloc extents and __revise_pending() return new adding pending count,
> >> we could update the reserved blocks easily in ext4_es_insert_extent().
> >>
> >> Thers is one special case needs to concern is the quota claiming, when
> >> bigalloc is enabled, if the delayed cluster allocation has been raced
> >> by another no-delayed allocation(e.g. from fallocate) which doesn't
> >> cover the delayed blocks:
> >>
> >> |< one cluster >|
> >> hhhhhhhhhhhhhhhhhhhdddddddddd
> >> ^ ^
> >> |< >| < fallocate this range, don't claim quota again
> >>
> >> We can't claim quota as usual because the fallocate has already claimed
> >> it in ext4_mb_new_blocks(), we could notice this case through the
> >> removed delalloc blocks count.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > ...
> >> @@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
> >> __free_pending(pr);
> >> pr = NULL;
> >> }
> >> + pending = err3;
> >> }
> >> error:
> >> write_unlock(&EXT4_I(inode)->i_es_lock);
> >> + /*
> >> + * Reduce the reserved cluster count to reflect successful deferred
> >> + * allocation of delayed allocated clusters or direct allocation of
> >> + * clusters discovered to be delayed allocated. Once allocated, a
> >> + * cluster is not included in the reserved count.
> >> + *
> >> + * When bigalloc is enabled, allocating non-delayed allocated blocks
> >> + * which belong to delayed allocated clusters (from fallocate, filemap,
> >> + * DIO, or clusters allocated when delalloc has been disabled by
> >> + * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(),
> >> + * so release the quota reservations made for any previously delayed
> >> + * allocated clusters.
> >> + */
> >> + resv_used = rinfo.delonly_cluster + pending;
> >> + if (resv_used)
> >> + ext4_da_update_reserve_space(inode, resv_used,
> >> + rinfo.delonly_block);
> >
> > I'm not sure I understand here. We are inserting extent into extent status
> > tree. We are replacing resv_used clusters worth of space with delayed
> > allocation reservation with normally allocated clusters so we need to
> > release the reservation (mballoc already reduced freeclusters counter).
> > That I understand. In normal case we should also claim quota because we are
> > converting from reserved into allocated state. Now if we allocated blocks
> > under this range (e.g. from fallocate()) without
> > EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here
> > instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is
> > related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating
> > blocks for this extent or not.
>
> Oh, this is really complicated due to the bigalloc feature, please let me
> explain it more clearly by listing all related situations.
>
> There are 2 types of paths of allocating delayed/reserved cluster:
> 1. Normal case, normally allocate delayed clusters from the write back path.
> 2. Special case, allocate blocks under this delayed range, e.g. from
> fallocate().
>
> There are 4 situations below:
>
> A. bigalloc is disabled. This case is simple, after path 2, we don't need
> to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we
> set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is
> detected. If the flag is set, we must be replacing a delayed extent and
> rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal
> to set EXT4_GET_BLOCKS_DELALLOC_RESERVE.

Right. So fallocate() will call ext4_map_blocks() and
ext4_es_lookup_extent() will find the delayed extent and set EXT4_MAP_DELAYED,
which you (due to patch 2 of this series) transform into
EXT4_GET_BLOCKS_DELALLOC_RESERVE. We used to update the delalloc
accounting in ext4_ext_map_blocks() but this patch moved the update
to ext4_es_insert_extent(). But there is one corner case even here AFAICT:
Suppose fallocate is called for range 0..16k and we have a delalloc extent at
8k..16k. In this case ext4_map_blocks() at block 0 will not find the
delalloc extent, so ext4_ext_map_blocks() will allocate 16k from mballoc
without using the delalloc reservation, but then ext4_es_insert_extent() will
still have rinfo.delonly_block > 0, so we claim the quota reservation
instead of releasing it?
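
To make the scenario above concrete, here is a toy model in plain C. It is
only a sketch of the question being asked, not kernel code: the per-block
array, struct toy_result and toy_map_and_insert() are hypothetical names, and
the model simply assumes that the reserve flag is derived from the state of
the first block of the request while the removed delayed-block count covers
the whole inserted range.

```c
#include <stdbool.h>

/* Toy model only: one state per 4k block, not the real extent status tree. */
enum blk_state { HOLE, DELAYED, ALLOCATED };

struct toy_result {
	bool delalloc_reserve;      /* stands in for EXT4_GET_BLOCKS_DELALLOC_RESERVE */
	unsigned int delonly_block; /* stands in for rinfo.delonly_block */
};

/*
 * Assumption of this sketch: the flag is decided by looking only at the
 * first block of the request, while the count of removed delayed blocks
 * is taken over the whole range that ends up being inserted.
 */
static struct toy_result toy_map_and_insert(enum blk_state *blocks,
					    unsigned int start,
					    unsigned int len)
{
	struct toy_result res = { .delalloc_reserve = false, .delonly_block = 0 };
	unsigned int i;

	res.delalloc_reserve = (blocks[start] == DELAYED);
	for (i = start; i < start + len; i++) {
		if (blocks[i] == DELAYED)
			res.delonly_block++;
		blocks[i] = ALLOCATED;
	}
	return res;
}
```

With blocks = { HOLE, HOLE, DELAYED, DELAYED } and a request for blocks 0..3
(the 0..16k range with 4k blocks), this model returns the flag cleared but
delonly_block == 2, which is the mismatch asked about above.
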
> B. bigalloc is enabled, there a 3 sub-cases of allocating a delayed
> cluster:
> B0.Allocating a whole delayed cluster, this case is the same to A.
>
> |< one cluster >|
> ddddddd+ddddddd+ddddddd+ddddddd
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ allocating the whole range
I agree. In this case there's no difference.
> B1.Allocating delayed blocks in a reserved cluster, this case is the same
> to A, too.
>
> |< one cluster >|
> hhhhhhh+hhhhhhh+ddddddd+ddddddd
> ^^^^^^^
> allocating this range
Yes, if the allocation starts within the delalloc range, we will have
EXT4_GET_BLOCKS_DELALLOC_RESERVE set and rinfo.delonly_block will always
be > 0.
> B2.Allocating blocks which doesn't cover the delayed blocks in one reserved
> cluster,
>
> |< one cluster >|
> hhhhhhh+hhhhhhh+hhhhhhh+ddddddd
> ^^^^^^^
> fallocating this range
>
> This case must from path 2, which means allocating blocks without
> EXT4_GET_BLOCKS_DELALLOC_RESERVE. In this case, rinfo.delonly_block must
> be 0 since we are not replacing any delayed extents, so
> rinfo.delonly_block == 0 means allocate blocks without EXT4_MAP_DELAYED
> detected, which further means that EXT4_GET_BLOCKS_DELALLOC_RESERVE is
> not set. So I think we could use rinfo.delonly_block to identify this
> case.

Well, this is similar to the non-bigalloc case I was asking about above.
Why can't the allocated unwritten extent extend past the start of the
delalloc extent? I didn't find anything that would disallow that...

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR