public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Sidong Yang <realwakka@gmail.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC] btrfs-progs: cmds: Add subcommand that dumps file extents
Date: Sun, 11 Jul 2021 16:04:53 +0000	[thread overview]
Message-ID: <20210711160453.GA68357@realwakka> (raw)
In-Reply-To: <98e30ba5-9309-47d2-2ea7-37cfb2cddc3c@gmx.com>

On Sun, Jul 11, 2021 at 06:58:49PM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/7/11 下午5:02, Sidong Yang wrote:
> > On Sun, Jul 11, 2021 at 09:33:43AM +0800, Qu Wenruo wrote:
> > 
> > Hi, I really thank you for review.
> > > 
> > > 
> > > On 2021/7/11 上午6:38, Qu Wenruo wrote:
> > > > 
> > > > 
> > > > On 2021/7/10 下午10:41, Sidong Yang wrote:
> > > > > This patch adds an subcommand in inspect-internal. It dumps file
> > > > > extents of
> > > > > the file that user provided. It helps to show the internal information
> > > > > about file extents comprise the file.
> > > > 
> > > > So this is going to be the combined command of filemap + btrfs-map-logical.
> > > > 
> > > > But how do you handle dirty pages which hasn't yet been flushed to disk?
> > 
> > I have no idea about this. Is there any references to get dirty pages
> > information?
> 
> For dirty pages there is no way to know their file extent info.
> Until we write them back.
> 
> This is delayed allocation utilized by all modern filesystem, and is one
> of the core concept to improve filesystem performance.
> 
> If you want to dig into the idea deeper, I would recommend to see how
> btrfs_buffered_write() works.
> It just does space reservation and copy the data to page cache, then
> call it all day.
> 
> The real writeback happen when extent_writepages() happens, which then
> allocate real on-disk file extents and emit the IO.
> 
> 
> For the dirty page writeback part, you can check the common
> fiemap_prep() function in fs/ioctl.c.
> 
> If we have FIEMAP_FLAG_SYNC, then it will inform the filesystem to
> writeback the inode before calling the fs specific fiemap code.
> 
> With that FIEMAP_FLAG_SYNC flag, then it's pretty much the same behavior
> of your draft, it just cares what's already in the subvolume tree,
> ignore the dirty pages completely.
> 
> And in btrfs, extent_fiemap() function in fs/btrfs/extent_io.c, is doing
> the main work.
> 
> It's mostly the enhanced version of your draft, but convert the output
> to fiemap ioctl format, with extra work like merging extents if possible.
> 
> It would be a pretty educational experience to read the code, as it's
> not that complex.

Yeah, I'm reading the code you said. It helps me to understand
filesystem. Thanks for detailed guide!

> 
> [...]
> > > 
> > > 
> > > OK, so you're doing on-disk file extent search.
> > > 
> > > This is fine, but under most case fiemap ioctl would be enough, not to
> > > mention fiemap can also handle pages not yet written to disk (by writing
> > > them back though).
> > 
> > It would be better that this patch also can do it.
> 
> But since the fiemap ioctl currently can't report the real size of
> compressed data, it can be sometimes pretty confusing to read the fiemap
> result.
> 
> E.g.:
> 
> # mount /dev/test/test  -o compress /mnt/btrfs/
> # xfs_io -f -c "pwrite 0 1M" /mnt/btrfs/file
> # sync
> # xfs_io  -f -c "fiemap -v" /mnt/btrfs/file
> /mnt/btrfs/file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>    0: [0..255]:        26624..26879       256   0x8
>    1: [256..511]:      26632..26887       256   0x8
>    2: [512..767]:      26640..26895       256   0x8
>    3: [768..1023]:     26648..26903       256   0x8
>    4: [1024..1279]:    26656..26911       256   0x8
>    5: [1280..1535]:    26664..26919       256   0x8
>    6: [1536..1791]:    26672..26927       256   0x8
>    7: [1792..2047]:    26680..26935       256   0x9
> 
> Note the 0x8 flag, which indicates FIEMAP_EXTENT_ENCODED, and in btrfs
> it means it's compressed.
> But it can't show any btrfs specific info, the compression algorithm,
> nor the compressed size.
> 
> And the final one, has one extra bit 0x1, which means FIEMAP_FLAG_LAST,
> means it's the last extent of the file.
> 
> 
> But if you check the result more carefully, you will find that the
> ranges are overlapping.
> This is the limitation of current fiemap ioctl, it always assume every
> extent has the same size on-disk, thus causing above overlapping layout.

I understood. I checked disk_bytenr and disk_num_bytes on my patch. And
the output wasn't overlapped.

> 
> That's what your draft can do better, but personally speaking, I would
> prefer to enhance the fiemap ioctl other than creating a btrfs specific
> interface.

I agree. Maybe many filesystem has a problem about this.
> 
> > 
> > > 
> > > Although this manual search provides much better info for compressed
> > > extent, but unfortunately here you didn't do any extra handling for them.
> > 
> > It also good to show compression information.
> > 
> > > 
> > > Thus so far this is no better than fiemap to get the logical bytenr.
> > > 
> > > And I can't be more wrong, by thinking you're also doing the chunk
> > > lookup, which you didn't.
> > Sorry, I'm confused. Is it needed to do the chunk lookup?
> 
> No, it's me getting confused by the "physical" words.
> 
> You don't need to do that at all.
> 
> > > 
> > > So I don't see any benefit compared to regular fiemap.
> > > 
> > > In fact, fiemap can provide more info than your initial draft.
> > > Fiemap can already show if the map is compressed (but can't show the
> > > compressed size yet).
> > > 
> > > Your draft can be improved to:
> > > 
> > > - Show the compression algorithm
> > >    Which fiemap can't
> > > 
> > > - Show the compressed size
> > >    Which fiemap can't either.
> > > 
> > I understand that the point is that this command is nothing better than
> > fiemap ioctl now. but It can be improved by showing more information
> > that fiemap doesn't provide like compression algorithm.
> 
> Yes, but as mentioned, it would be better to enhance fiemap ioctl to do
> more work.
> 
> It may be acceptable as a debug tool for a while, but a generic
> interface would be the ultimate solution.
> 
> > > 
> > > > > +                case BTRFS_FILE_EXTENT_PREALLOC:
> > > > > +                    len = le64_to_cpu(extent_item->num_bytes);
> > > > > +                    physical = le64_to_cpu(extent_item->disk_bytenr);
> > > > > +                    physical_len =
> > > > > le64_to_cpu(extent_item->disk_num_bytes);
> > > > > +                    offset = le64_to_cpu(extent_item->offset);
> > > > > +                    printf("end = 0x%llx, physical = 0x%llx,
> > > > > physical_len = 0x%llx\n",
> > > 
> > > Currently we only use %llx for flags, for bytenr we always %llu.
> > > You could refer to print-tree.c to see more examples.
> > Thanks! I'll fix this.
> > > 
> > > And I don't really like the word "physical" here.
> > > 
> > > In fact the bytenr are all btrfs logical bytenr, which makes no direct
> > > sense for its physical location on disk.
> > Yeah, the word "physical" is not good. would It be better to write as
> > "disk_bytenr"?
> 
> That would be really much better, then it completely follows the naming
> in file extent item.

Thanks for kind comments. I'll write the next version.

Thanks,
Sidong

> 
> Thanks,
> Qu
> 
> > 
> > Thanks,
> > Sidong
> > > 
> > > For real physical bytenr, we need something like (device id, physical
> > > offset), and needs to do a chunk map lookup.
> > > 
> > > Thanks,
> > > Qu
> > > 
> > > > > +                           begin + len, physical, physical_len);
> > > > > +                    break;
> > > > > +                }
> > > 
> > > > > +
> > > > > +            }
> > > > > +            buf_off += sizeof(*header) +
> > > > > btrfs_search_header_len(header);
> > > > > +            pos += len;
> > > > > +        }
> > > > > +
> > > > > +    }
> > > > > +    ret = 0;
> > > > > +out:
> > > > > +    close(fd);
> > > > > +    return ret;
> > > > > +}
> > > > > +DEFINE_SIMPLE_COMMAND(inspect_dump_file_extents, "dump-file-extents");
> > > > > diff --git a/cmds/inspect.c b/cmds/inspect.c
> > > > > index 2ef5f4b6..dfb0e27b 100644
> > > > > --- a/cmds/inspect.c
> > > > > +++ b/cmds/inspect.c
> > > > > @@ -696,6 +696,7 @@ static const struct cmd_group inspect_cmd_group = {
> > > > >            &cmd_struct_inspect_dump_tree,
> > > > >            &cmd_struct_inspect_dump_super,
> > > > >            &cmd_struct_inspect_tree_stats,
> > > > > +        &cmd_struct_inspect_dump_file_extents,
> > > > >            NULL
> > > > >        }
> > > > >    };
> > > > > 

      reply	other threads:[~2021-07-11 16:05 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-10 14:41 [RFC] btrfs-progs: cmds: Add subcommand that dumps file extents Sidong Yang
2021-07-10 22:38 ` Qu Wenruo
2021-07-11  1:33   ` Qu Wenruo
2021-07-11  9:02     ` Sidong Yang
2021-07-11 10:58       ` Qu Wenruo
2021-07-11 16:04         ` Sidong Yang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210711160453.GA68357@realwakka \
    --to=realwakka@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox