From: Sidong Yang <realwakka@gmail.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC] btrfs-progs: cmds: Add subcommand that dumps file extents
Date: Sun, 11 Jul 2021 16:04:53 +0000 [thread overview]
Message-ID: <20210711160453.GA68357@realwakka> (raw)
In-Reply-To: <98e30ba5-9309-47d2-2ea7-37cfb2cddc3c@gmx.com>
On Sun, Jul 11, 2021 at 06:58:49PM +0800, Qu Wenruo wrote:
>
>
> On 2021/7/11 下午5:02, Sidong Yang wrote:
> > On Sun, Jul 11, 2021 at 09:33:43AM +0800, Qu Wenruo wrote:
> >
> > Hi, I really thank you for review.
> > >
> > >
> > > On 2021/7/11 上午6:38, Qu Wenruo wrote:
> > > >
> > > >
> > > > On 2021/7/10 下午10:41, Sidong Yang wrote:
> > > > > This patch adds an subcommand in inspect-internal. It dumps file
> > > > > extents of
> > > > > the file that user provided. It helps to show the internal information
> > > > > about file extents comprise the file.
> > > >
> > > > So this is going to be the combined command of filemap + btrfs-map-logical.
> > > >
> > > > But how do you handle dirty pages which hasn't yet been flushed to disk?
> >
> > I have no idea about this. Is there any references to get dirty pages
> > information?
>
> For dirty pages there is no way to know their file extent info.
> Until we write them back.
>
> This is delayed allocation utilized by all modern filesystem, and is one
> of the core concept to improve filesystem performance.
>
> If you want to dig into the idea deeper, I would recommend to see how
> btrfs_buffered_write() works.
> It just does space reservation and copy the data to page cache, then
> call it all day.
>
> The real writeback happen when extent_writepages() happens, which then
> allocate real on-disk file extents and emit the IO.
>
>
> For the dirty page writeback part, you can check the common
> fiemap_prep() function in fs/ioctl.c.
>
> If we have FIEMAP_FLAG_SYNC, then it will inform the filesystem to
> writeback the inode before calling the fs specific fiemap code.
>
> With that FIEMAP_FLAG_SYNC flag, then it's pretty much the same behavior
> of your draft, it just cares what's already in the subvolume tree,
> ignore the dirty pages completely.
>
> And in btrfs, extent_fiemap() function in fs/btrfs/extent_io.c, is doing
> the main work.
>
> It's mostly the enhanced version of your draft, but convert the output
> to fiemap ioctl format, with extra work like merging extents if possible.
>
> It would be a pretty educational experience to read the code, as it's
> not that complex.
Yeah, I'm reading the code you said. It helps me to understand
filesystem. Thanks for detailed guide!
>
> [...]
> > >
> > >
> > > OK, so you're doing on-disk file extent search.
> > >
> > > This is fine, but under most case fiemap ioctl would be enough, not to
> > > mention fiemap can also handle pages not yet written to disk (by writing
> > > them back though).
> >
> > It would be better that this patch also can do it.
>
> But since the fiemap ioctl currently can't report the real size of
> compressed data, it can be sometimes pretty confusing to read the fiemap
> result.
>
> E.g.:
>
> # mount /dev/test/test -o compress /mnt/btrfs/
> # xfs_io -f -c "pwrite 0 1M" /mnt/btrfs/file
> # sync
> # xfs_io -f -c "fiemap -v" /mnt/btrfs/file
> /mnt/btrfs/file:
> EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
> 0: [0..255]: 26624..26879 256 0x8
> 1: [256..511]: 26632..26887 256 0x8
> 2: [512..767]: 26640..26895 256 0x8
> 3: [768..1023]: 26648..26903 256 0x8
> 4: [1024..1279]: 26656..26911 256 0x8
> 5: [1280..1535]: 26664..26919 256 0x8
> 6: [1536..1791]: 26672..26927 256 0x8
> 7: [1792..2047]: 26680..26935 256 0x9
>
> Note the 0x8 flag, which indicates FIEMAP_EXTENT_ENCODED, and in btrfs
> it means it's compressed.
> But it can't show any btrfs specific info, the compression algorithm,
> nor the compressed size.
>
> And the final one, has one extra bit 0x1, which means FIEMAP_FLAG_LAST,
> means it's the last extent of the file.
>
>
> But if you check the result more carefully, you will find that the
> ranges are overlapping.
> This is the limitation of current fiemap ioctl, it always assume every
> extent has the same size on-disk, thus causing above overlapping layout.
I understood. I checked disk_bytenr and disk_num_bytes on my patch. And
the output wasn't overlapped.
>
> That's what your draft can do better, but personally speaking, I would
> prefer to enhance the fiemap ioctl other than creating a btrfs specific
> interface.
I agree. Maybe many filesystem has a problem about this.
>
> >
> > >
> > > Although this manual search provides much better info for compressed
> > > extent, but unfortunately here you didn't do any extra handling for them.
> >
> > It also good to show compression information.
> >
> > >
> > > Thus so far this is no better than fiemap to get the logical bytenr.
> > >
> > > And I can't be more wrong, by thinking you're also doing the chunk
> > > lookup, which you didn't.
> > Sorry, I'm confused. Is it needed to do the chunk lookup?
>
> No, it's me getting confused by the "physical" words.
>
> You don't need to do that at all.
>
> > >
> > > So I don't see any benefit compared to regular fiemap.
> > >
> > > In fact, fiemap can provide more info than your initial draft.
> > > Fiemap can already show if the map is compressed (but can't show the
> > > compressed size yet).
> > >
> > > Your draft can be improved to:
> > >
> > > - Show the compression algorithm
> > > Which fiemap can't
> > >
> > > - Show the compressed size
> > > Which fiemap can't either.
> > >
> > I understand that the point is that this command is nothing better than
> > fiemap ioctl now. but It can be improved by showing more information
> > that fiemap doesn't provide like compression algorithm.
>
> Yes, but as mentioned, it would be better to enhance fiemap ioctl to do
> more work.
>
> It may be acceptable as a debug tool for a while, but a generic
> interface would be the ultimate solution.
>
> > >
> > > > > + case BTRFS_FILE_EXTENT_PREALLOC:
> > > > > + len = le64_to_cpu(extent_item->num_bytes);
> > > > > + physical = le64_to_cpu(extent_item->disk_bytenr);
> > > > > + physical_len =
> > > > > le64_to_cpu(extent_item->disk_num_bytes);
> > > > > + offset = le64_to_cpu(extent_item->offset);
> > > > > + printf("end = 0x%llx, physical = 0x%llx,
> > > > > physical_len = 0x%llx\n",
> > >
> > > Currently we only use %llx for flags, for bytenr we always %llu.
> > > You could refer to print-tree.c to see more examples.
> > Thanks! I'll fix this.
> > >
> > > And I don't really like the word "physical" here.
> > >
> > > In fact the bytenr are all btrfs logical bytenr, which makes no direct
> > > sense for its physical location on disk.
> > Yeah, the word "physical" is not good. would It be better to write as
> > "disk_bytenr"?
>
> That would be really much better, then it completely follows the naming
> in file extent item.
Thanks for kind comments. I'll write the next version.
Thanks,
Sidong
>
> Thanks,
> Qu
>
> >
> > Thanks,
> > Sidong
> > >
> > > For real physical bytenr, we need something like (device id, physical
> > > offset), and needs to do a chunk map lookup.
> > >
> > > Thanks,
> > > Qu
> > >
> > > > > + begin + len, physical, physical_len);
> > > > > + break;
> > > > > + }
> > >
> > > > > +
> > > > > + }
> > > > > + buf_off += sizeof(*header) +
> > > > > btrfs_search_header_len(header);
> > > > > + pos += len;
> > > > > + }
> > > > > +
> > > > > + }
> > > > > + ret = 0;
> > > > > +out:
> > > > > + close(fd);
> > > > > + return ret;
> > > > > +}
> > > > > +DEFINE_SIMPLE_COMMAND(inspect_dump_file_extents, "dump-file-extents");
> > > > > diff --git a/cmds/inspect.c b/cmds/inspect.c
> > > > > index 2ef5f4b6..dfb0e27b 100644
> > > > > --- a/cmds/inspect.c
> > > > > +++ b/cmds/inspect.c
> > > > > @@ -696,6 +696,7 @@ static const struct cmd_group inspect_cmd_group = {
> > > > > &cmd_struct_inspect_dump_tree,
> > > > > &cmd_struct_inspect_dump_super,
> > > > > &cmd_struct_inspect_tree_stats,
> > > > > + &cmd_struct_inspect_dump_file_extents,
> > > > > NULL
> > > > > }
> > > > > };
> > > > >
prev parent reply other threads:[~2021-07-11 16:05 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-10 14:41 [RFC] btrfs-progs: cmds: Add subcommand that dumps file extents Sidong Yang
2021-07-10 22:38 ` Qu Wenruo
2021-07-11 1:33 ` Qu Wenruo
2021-07-11 9:02 ` Sidong Yang
2021-07-11 10:58 ` Qu Wenruo
2021-07-11 16:04 ` Sidong Yang [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210711160453.GA68357@realwakka \
--to=realwakka@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=quwenruo.btrfs@gmx.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox