public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Josef Bacik <josef@toxicpanda.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 2/3] btrfs: add a comment describing delalloc space reservation
Date: Tue, 4 Feb 2020 14:27:14 +0200	[thread overview]
Message-ID: <55055cf9-2c36-8e3e-d1b8-b3fb53cc03c1@suse.com> (raw)
In-Reply-To: <7000f8a2-4d78-d9a1-2e3f-143b88ace1eb@gmx.com>



On 4.02.20 г. 11:48 ч., Qu Wenruo wrote:
>


<snip>

> 
>> + *
>> + *   Once this space is reserved, it is added to space_info->bytes_may_use.  The
>> + *   caller must keep track of this reservation and free it up if it is never
>> + *   used.  With the buffered IO case this is handled via the EXTENT_DELALLOC
>> + *   bit's on the inode's io_tree.  For direct IO it's more straightforward, we
>> + *   take the reservation at the start of the operation, and if we write less
>> + *   than we reserved we free the excess.
> 
> This part involves the lifespan and state machine of data.
> I guess more explanation on the state machine would help a lot.
> 
> Like:
> Page clean
> |
> +- btrfs_buffered_write()
> |  Reserve data space for data, metadata space for csum/file
> |  extents/inodes.
> |
> Page dirty
> |
> +- run_delalloc_range()
> |  Allocate data extents, submit ordered extents to do csum calculation
> |  and bio submission
> Page write back
> |
> +- finish_oredred_io()
> |  Insert csum and file extents
> |
> Page clean
> 
> Although I'm not sure if such lifespan should belong to delalloc-space.c.

This omits a lot of critical details. FOr example it should be noted
that in btrfs_buffered_write we reserve as much space as is requested by
the user. Then in run_delalloc_range it must be mentioned that in case
of compressed extents it can be called to allocate an extent which is
less than the space reserved in btrfs_buffered_write => that's where the
possible space savings in case of compressed come from.

As a matter of fact running ordered io doesn't really clean any space
apart from a bit of metadata space (unless we do overwrites, as per our
discussion with josef in the slack channel).

<snip>

>> + *
>> + *   1) Updating the inode item.  We hold a reservation for this inode as long
>> + *   as there are dirty bytes outstanding for this inode.  This is because we
>> + *   may update the inode multiple times throughout an operation, and there is
>> + *   no telling when we may have to do a full cow back to that inode item.  Thus
>> + *   we must always hold a reservation.
>> + *
>> + *   2) Adding an extent item.  This is trickier, so a few sub points
>> + *
>> + *     a) We keep track of how many extents an inode may need to create in
>> + *     inode->outstanding_extents.  This is how many items we will have reserved
>> + *     for the extents for this inode.
>> + *
>> + *     b) count_max_extents() is used to figure out how many extent items we
>> + *     will need based on the contiguous area we have dirtied.  Thus if we are
>> + *     writing 4k extents but they coalesce into a very large extent, we will

I THe way you have worded this is a bit confusing because first you
mention that count_max_extents calcs how many extent items we'll need
for a contiguous area. Then you mention that if we make a bunch of 4k
writes that coalesce to a single extent i.e create 1 large contiguous
(that's what coalescing implies in this context) we'll have to split it
them. This is counter-intuitive.

I guess what you meant here is physically contiguous as opposed to
logically contiguous?

>> + *     break this into smaller extents which means we'll need a reservation for
>> + *     each of those extents.
>> + *
>> + *     c) When we set EXTENT_DELALLOC on the inode io_tree we will figure out
>> + *     the nummber of extents needed for the contiguous area we just created,

nit: s/nummber/number

>> + *     and add that to inode->outstanding_extents.

<snip>

>> + *
>> + *   3) Adding csums for the range.  This is more straightforward than the
>> + *   extent items, as we just want to hold the number of bytes we'll need for
>> + *   checksums until the ordered extent is removed.  If there is an error it is
>> + *   cleared via the EXTENT_CLEAR_META_RESV bit when clearning EXTENT_DELALLOC

nit: s/clearning/clearing

>> + *   on the inode io_tree.
>> + */
>> +
>>  int btrfs_alloc_data_chunk_ondemand(struct btrfs_inode *inode, u64 bytes)
>>  {
>>  	struct btrfs_root *root = inode->root;
>>
> 

  reply	other threads:[~2020-02-04 12:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-03 20:44 [PATCH 0/3] Add comments describing how space reservation works Josef Bacik
2020-02-03 20:44 ` [PATCH 1/3] btrfs: add a comment describing block-rsvs Josef Bacik
2020-02-04  9:30   ` Qu Wenruo
2020-02-04 10:32     ` Nikolay Borisov
2020-02-03 20:44 ` [PATCH 2/3] btrfs: add a comment describing delalloc space reservation Josef Bacik
2020-02-04  9:48   ` Qu Wenruo
2020-02-04 12:27     ` Nikolay Borisov [this message]
2020-02-04 12:39       ` Qu Wenruo
2020-02-05 13:44         ` David Sterba
2020-02-03 20:44 ` [PATCH 3/3] btrfs: describe the space reservation system in general Josef Bacik
2020-02-04 10:14   ` Qu Wenruo
  -- strict thread matches above, loose matches on Subject: below --
2020-02-04 18:18 [PATCH 0/3][v2] Add comments describing how space reservation works Josef Bacik
2020-02-04 18:18 ` [PATCH 2/3] btrfs: add a comment describing delalloc space reservation Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55055cf9-2c36-8e3e-d1b8-b3fb53cc03c1@suse.com \
    --to=nborisov@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox