linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wang Jianchao <jianchao.wan9@gmail.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ext4: get discard out of jbd2 commit kthread
Date: Thu, 20 May 2021 09:20:49 +0800	[thread overview]
Message-ID: <1e29444c-cbd4-8c5e-d4c3-deaa61592b19@gmail.com> (raw)
In-Reply-To: <YKUp5BpAwaBREe6d@mit.edu>



On 2021/5/19 11:08 PM, Theodore Y. Ts'o wrote:
> On Wed, May 19, 2021 at 09:27:56AM +0800, Wang Jianchao wrote:
>>
>> We're running ext4 with discard on a nbd device whose backend is storage
>> cluster. The discard can help to free the unused space to storage pool.
>>
>> And sometimes application delete a lot of data and discard is flooding. 
>> Then we see the jbd2 commit kthread is blocked for a long time. Even
>> move the discard out of jbd2, we still see the write IO of jbd2 log
>> could be blocked. blk-wbt could help to relieve this. Finally the delay
>> is shift to allocation path. But this is better than blocking the page
>> fault path which holds the read mm->mmap_sem.
> 
> I'm assuming that the problem is when the application deletes a lot of
> data, the discard flood is causing performance problems on your nbd
> server.  Is that the high level problem that you are trying to solve?
> 

Yes, not only the discard sometimes could be very slow, but also it
would degrade the performance of normal write IO

> So if that's the case, I'd suggest a different approach.  First, move
> kmem_cache_free(ext4_free_data_cachep, entry) out of
> ext4_free_data_in_buddy() to its caller, ext4_process_data.  Then if
> discard is enabled, after calling ext4_free_data_in_buddy(), the
> ext4_free_data struct will be detached from rbtree rooted in
> ext4_group_info.bb_free_root, and then we can attach it to a new
> rbtree rooted in ext4_group_info.bb_discard_root.
> 
> This allows the block to be reused as soon the commit is finished
> (allowing for potentially more efficient block allocations), but we
> can now keep track of which blocks would be useful for discarding and
> decouple that from when we release the blocks to be reused.  We can
> now use the pre-existing fstrim kernel thread infrastructure to lock a
> block group, and we can now iterate over the rbtree, and take into
> account which blocks have since become allocated --- since if a block
> has been allocated, there's no need to send a discard for it.
> 
> I think this will be more efficient, and will allow us to share more
> of the code for fstrim and the discard-at-runtime model used by "mount
> -o discard".  We can also fine-tune how quickly we issue discards; it
> might be that if user has executed "rm -rf" it might actually better
> to wait until the deletes have completed, even if it takes several
> commit intervals, since it might allow us to combine discards if the
> blocks 100-199 and 400-500 are released in one commit, and blocks
> 200-399 are released two or three commits later.


Yes, this is more efficient and fair. I will cook the next version path
based on the suggestion above.

> 
> Something else I'd urge you to consider is whether it's possible to
> enhance the nbd protocol to add some kind of back-channel notification
> when the shared storage is getting low on space.  In that case, when
> the nbd client code a request from the nbd server indicating, "please
> issue discards if possible", it could either trigger an upcall to
> userspace, which could then issue the fstrim ioctl, which in the case
> where "mount -o discard" is enabled, would accelerate when discards
> took place.
> 
> We could then make the fstrim thread normally work on a much slower
> pace, but when there is a signal from the shared storage that space is
> needed, clients could accelerate when they issue discards to free up
> shared space.

This sounds great !!!
I will share this with my colleagues to see how to implement it.

> 
> Cheers,
> 
> 						- Ted
> 
> P.S.  One other potential thought; if we have established a new
> bb_discard_root rbtree, it *might* actually be beneficial to consider
> using that information in the block allocator.  One of the best way to
> tell an SSD that block is no longer needed is to simply overwrite that
> block.  If we do that, we don't need to send a discard to that block
> any more.

This seems also true for the storage server. If the nbd client reuse the
blocks that's just freed, the server needn't to do new allocation.

> 
> Of course, we still want to keep blocks contiguous since even though
> seeks are free for SSD's, we want to keep large reads contiguous as
> much as possible, and we want to keep the extent tree as compact as
> possible.  But if we have just released a 12k file, and we are writing
> a new 12k file, and don't really care *where* in the block group we
> are writing that file, reusing blocks that had just been freed might
> actually be a good strategy.
> 
> That's not something you need to implement in this patch series, but
> it might be an interesting optimization.
> 

And thanks a million for your suggestions

Best Regards
Jianchao 

      reply	other threads:[~2021-05-20  1:20 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-17  3:57 [PATCH] ext4: get discard out of jbd2 commit kthread Wang Jianchao
2021-05-17 20:52 ` Theodore Y. Ts'o
2021-05-18  1:19   ` Wang Jianchao
2021-05-18 14:57     ` Theodore Y. Ts'o
2021-05-19  1:27       ` Wang Jianchao
2021-05-19 15:08         ` Theodore Y. Ts'o
2021-05-20  1:20           ` Wang Jianchao [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1e29444c-cbd4-8c5e-d4c3-deaa61592b19@gmail.com \
    --to=jianchao.wan9@gmail.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).