git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Elijah Newren <newren@gmail.com>
Cc: git@vger.kernel.org, "Eric W. Biederman" <ebiederm@gmail.com>,
	Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack`
Date: Fri, 10 Nov 2023 20:24:44 -0500	[thread overview]
Message-ID: <ZU7X3N/rqCK/Y9cj@nand.local> (raw)
In-Reply-To: <CABPp-BEfy9VOvimP9==ry_rZXu=metOQ8s=_-XiG_Pdx9c06Ww@mail.gmail.com>

On Fri, Nov 10, 2023 at 03:51:18PM -0800, Elijah Newren wrote:
> > @@ -2108,10 +2109,19 @@ static int handle_content_merge(struct merge_options *opt,
> >                 if ((merge_status < 0) || !result_buf.ptr)
> >                         ret = error(_("failed to execute internal merge"));
> >
> > -               if (!ret &&
> > -                   write_object_file(result_buf.ptr, result_buf.size,
> > -                                     OBJ_BLOB, &result->oid))
> > -                       ret = error(_("unable to add %s to database"), path);
> > +               if (!ret) {
> > +                       ret = opt->write_pack
> > +                               ? index_blob_bulk_checkin_incore(&result->oid,
> > +                                                                result_buf.ptr,
> > +                                                                result_buf.size,
> > +                                                                path, 1)
> > +                               : write_object_file(result_buf.ptr,
> > +                                                   result_buf.size,
> > +                                                   OBJ_BLOB, &result->oid);
> > +                       if (ret)
> > +                               ret = error(_("unable to add %s to database"),
> > +                                           path);
> > +               }
> >
> >                 free(result_buf.ptr);
> >                 if (ret)
>
> This is unsafe; the object may need to be read later within the same
> merge.  Perhaps the simplest example related to your testcase is
> modifying the middle section of that testcase (I'll highlight where
> when I comment on the testcase) to read:
>
>     test_commit -C repo base file "$(test_seq 3 5)" &&
>     git -C repo branch -M main &&
>     git -C repo checkout -b side main &&
>     test_commit -C repo side-file file "$(test_seq 1 5)" &&
>     test_commit -C repo side-file2 file2 "$(test_seq 1 6)" &&
>     git -C repo checkout -b other main &&
>     test_commit -C repo other-file file "$(test_seq 3 7)" &&
>     git -C repo mv file file2 &&
>     git -C repo commit -m other-file2 &&
>
>     find repo/$packdir -type f -name "pack-*.idx" >packs.before &&
>     git -C repo merge-tree --write-pack side other &&
>
> In words, what I'm doing here is having both sides modify "file" (the
> same as you did) but also having one side rename "file"->"file2".  The
> side that doesn't rename "file" also introduces a new "file2".  ort
> needs to merge the three versions of "file" to get a new blob object,
> and then merge that with the content from the brand new "file2".  More
> complicated cases are possible, of course.  Anyway, with the modified
> testcase above, merge-tree will fail with:
>
>     fatal: unable to read blob object 06e567b11dfdafeaf7d3edcc89864149383aeab6
>
> I think (untested) that you could fix this by creating two packs
> instead of just one.  In particular, add a call to
> flush_odb_transcation() after the "redo_after_renames" block in
> merge_ort_nonrecursive_internal().  (It's probably easier to see that
> you could place the flush_odb_transaction() call inside
> detect_and_process_renames() just after the process_renames() call,
> but when redo_after_renames is true you'd end up with three packs
> instead of just two).

Great analysis, thanks for catching this error. I tested your approach,
and indeed a flush_odb_transaction() call after the process_renames()
call in detect_and_process_renames() does do the trick.

> What happens with the odb transaction stuff if no new objects are
> written before the call to flush_odb_transaction?  Will that be a
> problem?

I think that the bulk-checkin code is flexible enough to understand that
we shouldn't do anything when there aren't any objects to pack.

> (Since any tree written will not be re-read within the same merge, the
> other write_object_file() call you changed does not have the same
> problem as the one here.)
>
> >@@ -4332,9 +4349,13 @@ static int process_entries(struct merge_options *opt,
> >                fflush(stdout);
> >                BUG("dir_metadata accounting completely off; shouldn't happen");
> >        }
> >-       if (write_tree(result_oid, &dir_metadata.versions, 0,
> >+       if (write_tree(opt, result_oid, &dir_metadata.versions, 0,
> >                       opt->repo->hash_algo->rawsz) < 0)
> >                ret = -1;
> >
> > +
> > +       if (opt->write_pack)
> > +               end_odb_transaction();
> > +
>
> Is the end_odb_transaction() here going to fail with an "Unbalanced
> ODB transaction nesting" when faced with a recursive merge?

I think so, and we should have a test-case demonstrating that. In the
remaining three patches that I posted to extend this approach to 'git
replay', I moved this call elsewhere in such a way that I think

> Perhaps flushing here, and then calling end_odb_transaction() in
> merge_finalize(), much as you do in your replay-and-write-pack series,
> should actually be moved to this commit and included here?

Yep, exactly.

> This does mean that for a recursive merge, that you'll get up to 2*N
> packfiles, where N is the depth of the recursive merge.

We definitely want to avoid that ;-). I think there are a couple of
potential directions forward here, but the most promising one I think is
due to Johannes who suggests that we write loose objects into a
temporary directory with a replace_tmp_objdir() call, and then repack
that side directory before migrating a single pack back into the main
object store.

Thanks,
eaylor

  parent reply	other threads:[~2023-11-11  1:24 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-19 17:28 [PATCH v4 0/7] merge-ort: implement support for packing objects together Taylor Blau
2023-10-19 17:28 ` [PATCH v4 1/7] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-20  7:35   ` Jeff King
2023-10-20 16:55     ` Junio C Hamano
2023-10-19 17:28 ` [PATCH v4 2/7] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-19 17:28 ` [PATCH v4 3/7] bulk-checkin: refactor deflate routine to accept a `bulk_checkin_source` Taylor Blau
2023-10-19 17:28 ` [PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` Taylor Blau
2023-10-23  9:19   ` Patrick Steinhardt
2023-10-23 18:58     ` Jeff King
2023-10-24  6:34       ` Patrick Steinhardt
2023-10-24 17:08         ` Junio C Hamano
2023-10-19 17:28 ` [PATCH v4 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-19 17:28 ` [PATCH v4 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-19 17:29 ` [PATCH v4 7/7] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-19 21:47 ` [PATCH v4 0/7] merge-ort: implement support for packing objects together Junio C Hamano
2023-10-20  7:29 ` Jeff King
2023-10-20 16:53   ` Junio C Hamano
2023-10-23  9:19 ` Patrick Steinhardt
2023-10-23 22:44 ` [PATCH v5 0/5] " Taylor Blau
2023-10-23 22:44   ` [PATCH v5 1/5] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-25  7:37     ` Jeff King
2023-10-25 15:39       ` Taylor Blau
2023-10-27 23:12       ` Junio C Hamano
2023-10-23 22:44   ` [PATCH v5 2/5] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-23 22:45   ` [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-25  7:58     ` Patrick Steinhardt
2023-10-25 15:44       ` Taylor Blau
2023-10-25 17:21         ` Eric Sunshine
2023-10-26  8:16           ` Patrick Steinhardt
2023-11-11  0:17           ` Elijah Newren
2023-10-23 22:45   ` [PATCH v5 4/5] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-23 22:45   ` [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-25  7:58     ` Patrick Steinhardt
2023-10-25 15:46       ` Taylor Blau
2023-11-10 23:51     ` Elijah Newren
2023-11-11  0:27       ` Junio C Hamano
2023-11-11  1:34         ` Taylor Blau
2023-11-11  1:24       ` Taylor Blau [this message]
2023-11-13 22:05         ` Jeff King
2023-11-14  1:40           ` Junio C Hamano
2023-11-14  2:54             ` Elijah Newren
2023-11-14 21:55             ` Jeff King
2023-11-14  3:08           ` Elijah Newren
2023-11-13 22:02       ` Jeff King
2023-11-13 22:34         ` Taylor Blau
2023-11-14  2:50           ` Elijah Newren
2023-11-14 21:53             ` Jeff King
2023-11-14 22:04           ` Jeff King
2023-10-23 23:31   ` [PATCH v5 0/5] merge-ort: implement support for packing objects together Junio C Hamano
2023-11-06 15:46     ` Johannes Schindelin
2023-11-06 23:19       ` Junio C Hamano
2023-11-07  3:42       ` Jeff King
2023-11-07 15:58       ` Taylor Blau
2023-11-07 18:22         ` [RFC PATCH 0/3] replay: implement support for writing new objects to a pack Taylor Blau
2023-11-07 18:22           ` [RFC PATCH 1/3] merge-ort.c: finalize ODB transactions after each step Taylor Blau
2023-11-11  3:45             ` Elijah Newren
2023-11-07 18:22           ` [RFC PATCH 2/3] tmp-objdir: introduce `tmp_objdir_repack()` Taylor Blau
2023-11-08  7:05             ` Patrick Steinhardt
2023-11-09 19:26               ` Taylor Blau
2023-11-07 18:23           ` [RFC PATCH 3/3] builtin/replay.c: introduce `--write-pack` Taylor Blau
2023-11-11  3:42           ` [RFC PATCH 0/3] replay: implement support for writing new objects to a pack Elijah Newren
2023-11-11  4:04           ` Elijah Newren
2023-10-25  7:58   ` [PATCH v5 0/5] merge-ort: implement support for packing objects together Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZU7X3N/rqCK/Y9cj@nand.local \
    --to=me@ttaylorr.com \
    --cc=ebiederm@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).