From: Amjad Alsharafi <amjadsharafi10@gmail.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
"open list:vvfat" <qemu-block@nongnu.org>
Subject: Re: [PATCH v4 2/4] vvfat: Fix usage of `info.file.offset`
Date: Wed, 12 Jun 2024 00:22:46 +0800 [thread overview]
Message-ID: <Zmh51luVeRY6H-Qp@amjad-pc> (raw)
In-Reply-To: <ZmhfnSIe30da03uN@redhat.com>
On Tue, Jun 11, 2024 at 04:30:53PM +0200, Kevin Wolf wrote:
> Am 11.06.2024 um 14:31 hat Amjad Alsharafi geschrieben:
> > On Mon, Jun 10, 2024 at 06:49:43PM +0200, Kevin Wolf wrote:
> > > Am 05.06.2024 um 02:58 hat Amjad Alsharafi geschrieben:
> > > > The field is marked as "the offset in the file (in clusters)", but it
> > > > was being used like this
> > > > `cluster_size*(nums)+mapping->info.file.offset`, which is incorrect.
> > > >
> > > > Additionally, removed the `abort` when `first_mapping_index` does not
> > > > match, as this matches the case when adding new clusters for files, and
> > > > its inevitable that we reach this condition when doing that if the
> > > > clusters are not after one another, so there is no reason to `abort`
> > > > here, execution continues and the new clusters are written to disk
> > > > correctly.
> > > >
> > > > Signed-off-by: Amjad Alsharafi <amjadsharafi10@gmail.com>
> > >
> > > Can you help me understand how first_mapping_index really works?
> > >
> > > It seems to me that you get a chain of mappings for each file on the FAT
> > > filesystem, which are just the contiguous areas in it, and
> > > first_mapping_index refers to the mapping at the start of the file. But
> > > for much of the time, it actually doesn't seem to be set at all, so you
> > > have mapping->first_mapping_index == -1. Do you understand the rules
> > > around when it's set and when it isn't?
> >
> > Yeah. So `first_mapping_index` is the index of the first mapping, each
> > mapping is a group of clusters that are contiguous in the file.
> > Its mostly `-1` because the first mapping will have the value set as
> > `-1` and not its own index, this value will only be set when the file
> > contain more than one mapping, and this will only happen when you add
> > clusters to a file that are not contiguous with the existing clusters.
>
> Ah, that makes some sense. Not sure if it's optimal, but it's a rule I
> can work with. So just to confirm, this is the invariant that we think
> should always hold true, right?
>
> assert((mapping->mode & MODE_DIRECTORY) ||
> !mapping->info.file.offset ||
> mapping->first_mapping_index > 0);
>
Yes.
We can add this into `get_cluster_count_for_direntry` loop.
I'm thinking of also converting those `abort` into `assert`, since
the line `copy_it = 1;` was confusing me, since it was after the `abort`.
> > And actually, thanks to that I noticed another bug not fixed in PATCH 3,
> > We are doing this check
> > `s->current_mapping->first_mapping_index != mapping->first_mapping_index`
> > to know if we should switch to the new mapping or not.
> > If we were reading from the first mapping (`first_mapping_index == -1`)
> > and we jumped to the second mapping (`first_mapping_index == n`), we
> > will catch this condition and switch to the new mapping.
> >
> > But if the file has more than 2 mappings, and we jumped to the 3rd
> > mapping, we will not catch this since (`first_mapping_index == n`) for
> > both of them haha. I think a better check is to check the `mapping`
> > pointer directly. (I'll add it also in the next series together with a
> > test for it.)
>
> This comparison is exactly what confused me. I didn't realise that the
> first mapping in the chain has a different value here, so I thought this
> must mean that we're looking at a different file now - but of course I
> couldn't see a reason for that because we're iterating through a single
> file in this function.
>
> But even now that I know that the condition triggers when switching from
> the first to the second mapping, it doesn't make sense to me. We don't
> have to copy things around just because a file is non-contiguous.
>
> What we want to catch is if the order of mappings has changed compared
> to the old state. Do we need a linked list, maybe a prev_mapping_index,
> instead of first_mapping_index so that we can compare if it is still the
> same as before?
I think this would be the better design (tbh, that's what I thought
`first_mapping_index` would do), though not sure if other components
depend so much into the current design that it would be hard to change.
I'll try to implement this `prev_mapping_index` and see how it goes.
>
> Or actually, I suppose that's the first block with an abort() in the
> code, just that it doesn't compare mappings, but their offsets.
I think, I'm still confused on the whole logic there, the function
`get_cluster_count_for_direntry` is a mess, and it doesn't just
*get* the cluster count, it also schedule writeouts and may
copy clusters around.
>
> > >
> > > > block/vvfat.c | 12 +++++++-----
> > > > 1 file changed, 7 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/block/vvfat.c b/block/vvfat.c
> > > > index 19da009a5b..f0642ac3e4 100644
> > > > --- a/block/vvfat.c
> > > > +++ b/block/vvfat.c
> > > > @@ -1408,7 +1408,9 @@ read_cluster_directory:
> > > >
> > > > assert(s->current_fd);
> > > >
> > > > - offset=s->cluster_size*(cluster_num-s->current_mapping->begin)+s->current_mapping->info.file.offset;
> > > > + offset = s->cluster_size *
> > > > + ((cluster_num - s->current_mapping->begin)
> > > > + + s->current_mapping->info.file.offset);
> > > > if(lseek(s->current_fd, offset, SEEK_SET)!=offset)
> > > > return -3;
> > > > s->cluster=s->cluster_buffer;
> > > > @@ -1929,8 +1931,9 @@ get_cluster_count_for_direntry(BDRVVVFATState* s, direntry_t* direntry, const ch
> > > > (mapping->mode & MODE_DIRECTORY) == 0) {
> > > >
> > > > /* was modified in qcow */
> > > > - if (offset != mapping->info.file.offset + s->cluster_size
> > > > - * (cluster_num - mapping->begin)) {
> > > > + if (offset != s->cluster_size
> > > > + * ((cluster_num - mapping->begin)
> > > > + + mapping->info.file.offset)) {
> > > > /* offset of this cluster in file chain has changed */
> > > > abort();
> > > > copy_it = 1;
> > > > @@ -1944,7 +1947,6 @@ get_cluster_count_for_direntry(BDRVVVFATState* s, direntry_t* direntry, const ch
> > > >
> > > > if (mapping->first_mapping_index != first_mapping_index
> > > > && mapping->info.file.offset > 0) {
> > > > - abort();
> > > > copy_it = 1;
> > > > }
> > >
> > > I'm unsure which case this represents. If first_mapping_index refers to
> > > the mapping of the first cluster in the file, does this mean we got a
> > > mapping for a different file here? Or is the comparison between -1 and a
> > > real value?
> >
> > Now that I think more about it, I think this `abort` is actually
> > correct, the issue though is that the handling around this code is not.
> >
> > What this `abort` actually does is that it checks.
> > - if the `mapping->first_mapping_index` is not the same as
> > `first_mapping_index`, which **should** happen only in one case, when
> > we are handling the first mapping, in that case
> > `mapping->first_mapping_index == -1`, in all other cases, the other
> > mappings after the first should have the condition true.
> > - From above, we know that this is the first mapping, so if the offset
> > is not `0`, then abort, since this is an invalid state.
>
> Yes, make sense.
>
> > This is all good, the issue is that `first_mapping_index` is not set if
> > we are checking from the middle, the variable `first_mapping_index` is
> > only set if we passed through the check `cluster_was_modified` with the
> > first mapping, and in the same function call we checked the other
> > mappings.
>
> I think I noticed the same yesterday, but when I tried to write a quick
> patch that I could show you and that would update first_mapping_index in
> each iteration, I broke something. So I decided I'd first ask you what
> all of this even means. :-)
>
> > From what I have seen, that doesn't happen since even if you write the
> > whole file in one go, you are still writing it cluster by cluster, and
> > the checks happen at that time.
>
> Well, we do trigger the condition, but I suppose updating
> first_mapping_index in each loop iteration is really the way to go if
> you think the same.
Indeed, I did a quick change, modifying the loop to always go through
and set the `first_mapping_index` for the first mapping fixes the issue
and we can put the `abort` back in place.
I'll also modify the check to instead be
`mapping->first_mapping_index < 0 && mapping->info.file.offset > 0`
This will make it clear that this applies only to the first mapping.
>
> Kevin
>
next prev parent reply other threads:[~2024-06-11 16:23 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-05 0:58 [PATCH v4 0/4] vvfat: Fix write bugs for large files and add iotests Amjad Alsharafi
2024-06-05 0:58 ` [PATCH v4 1/4] vvfat: Fix bug in writing to middle of file Amjad Alsharafi
2024-06-05 0:58 ` [PATCH v4 2/4] vvfat: Fix usage of `info.file.offset` Amjad Alsharafi
2024-06-10 16:49 ` Kevin Wolf
2024-06-11 12:31 ` Amjad Alsharafi
2024-06-11 14:30 ` Kevin Wolf
2024-06-11 16:22 ` Amjad Alsharafi [this message]
2024-06-11 18:11 ` Kevin Wolf
2024-06-05 0:58 ` [PATCH v4 3/4] vvfat: Fix reading files with non-continuous clusters Amjad Alsharafi
2024-06-05 0:58 ` [PATCH v4 4/4] iotests: Add `vvfat` tests Amjad Alsharafi
2024-06-10 12:01 ` Kevin Wolf
2024-06-10 14:11 ` Amjad Alsharafi
2024-06-10 16:50 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zmh51luVeRY6H-Qp@amjad-pc \
--to=amjadsharafi10@gmail.com \
--cc=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.