git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Ronnie Sahlberg <sahlberg@google.com>
Cc: Stefan Beller <sbeller@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v3 00/14] ref-transactions-reflog
Date: Tue, 18 Nov 2014 20:46:36 +0100	[thread overview]
Message-ID: <546BA21C.9030803@alum.mit.edu> (raw)
In-Reply-To: <CAL=YDWn1x9TMGOWrmT5KMpQ_iBR0AQ5Ej1yr1pBb4==k0-vchw@mail.gmail.com>

On 11/18/2014 07:36 PM, Ronnie Sahlberg wrote:
> On Tue, Nov 18, 2014 at 3:26 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> On 11/18/2014 02:35 AM, Stefan Beller wrote:
>>> The following patch series updates the reflog handling to use transactions.
>>> This patch series has previously been sent to the list[1].
>>> [...]
>>
>> I was reviewing this patch series (I left some comments in Gerrit about
>> the first few patches) when I realized that I'm having trouble
>> understanding the big picture of where you want to go with this. I have
>> the feeling that the operations that you are implementing are at too low
>> a level of abstraction.
>>
>> What are the elementary write operations that are needed for a reflog?
>> Off the top of my head,
>>
>> 1. Add a reflog entry when a reference is updated in a transaction.
>> 2. Rename a reflog file when the corresponding reference is renamed.
>> 3. Delete the reflog when the corresponding reference is deleted [1].
>> 4. Configure a reference to be reflogged.
>> 5. Configure a reference to not be reflogged anymore and delete any
>>    existing reflog.
>> 6. Selectively expire old reflog entries, e.g., based on their age.
>>
>> Have I forgotten any?
>>
>> The first three should be side-effects of the corresponding reference
>> updates. Aside from the fact that renames are not yet done within a
>> transaction, I think this is already the case.
>>
>> Number 4, I think, currently only happens in conjunction with adding a
>> line to the reflog. So it could be implemented, say, as a
>> FORCE_CREATE_REFLOG flag on a ref_update within a transaction.
>>
>> Number 5 is not very interesting, I think. For example, it could be a
>> separate API function, disconnected from any transactions.
>>
>> Number 6 is more interesting, and from my quick reading, it looks like a
>> lot of the work of this patch series is to allow number 6 to be
>> implemented in builtin/reflog.c:expire_reflog(). But it seems to me that
>> you are building API calls at the wrong level of abstraction. Expiring a
>> reflog should be a single API call to the refs API, and ultimately it
>> should be left up to the refs backend to decide how to implement it. For
>> a filesystem-based backend, it would do what it does now. But (for
>> example) a SQL-based backend might implement this as a single SELECT
>> statement.
> 
> I agree in principle. But things are more difficult since
> expire_reflog() has very complex semantics.
> To keep things simple for the reviews at this stage the logic is the
> same as the original code:
>   loop over all entries:
>      use very complex conditionals to decide which entries to keep/remove
>      optionally modify the sha1 values for the records we keep
>      write records we keep back to the file one record at a time
> 
> So that as far as possible, we keep the same rules and behavior but we
> use a different API for the actual
> "write entry to new reflog".
> 
> 
> We could wrap this inside a new specific transaction_expire_reflog()
> function so that other types of backends, for example an SQL backend,
> could optimize, but I think that should be in a separate later patch
> because expire_reflog is almost impossibly complex.
> It will not be a simple SELECT unfortunately.
> 
> The current expire logic is something like :
>   1, expire all entries older than timestamp
>   2, optionally, also expire all entries that refer to unreachable
> objects using a different timestamp
>       This involves actually reading the objects that the sha1 points
> to and parsing them!
>   3, optionally, if the sha1 objects can not be referenced, they are
> not commit objects or if they don't exist, then expire them too.
>       This also involves reading the objects behind the sha1.
>   4, optionally, delete reflog entry #foo
>   5, optionally, if any log entries were discarded due to 2,3,4 then
> we might also re-write and modify some of the reflog entries we keep.
> or any combination thereof
> 
>   (6, if --dry-run is specified, just print what we would have expired)
> 
> 
> 2 and 3 requires that we need to read the objects for the entry
> 4 allows us to delete a specific entry
> 5 means that even for entries we keep we will need to mutate them.

Thanks for the explanation. I now understand that it might be more than
a single SELECT statement.

Regarding the complicated rules for expiring reflogs (1, 2, 3, 4): For
now I think it would be fine for the new expire_reflog() API function to
take a callback function as an argument.

Regarding the stitching together of the survivors (5), it seems like the
API function would be the right place to handle that.

Regarding 6, it sounds like you could run the reflog entries through
your callback and report what it *would* have expired.

>> I also don't have the feeling that reflog expiration has to be done
>> within a ref_transaction. For example, is there ever a reason to combine
>> expiration with other reference updates in a single atomic transaction?
> 
> --updateref
> In expire_reflog() we not only prune the reflog. When --updateref is
> used we update the actual ref itself.
> I think we want to have both the ref update and also the reflog update
> both be part of a single atomic transaction.

ISTM that --updateref is another aspect of stitching together the
surviving reflog entries and could properly be done by the API
expire_reflog() function. Maybe the implementation would use an
*internal* transaction. But I still don't see a need for the caller to
be able to combine *arbitrary* reflog changes with *arbitrary* reference
updates in a single transaction, and the unneeded flexibility seems to
require the API to become more complicated than necessary.

>> I think not.
>>
>> So it seems to me that it would be more practical to have a separate API
>> function that is called to expire selected entries from a reflog [2],
>> unconnected with any transaction.
> 
> I think it makes the API cleaner if we have a
> 'you can only update a ref/reflog/<other things added in the future>/
> from within a transaction.'
> 
> Since we need to do reflog changes within a transaction for the expire
> reflog case as well as the rename ref case
> I think it makes sense to enforce that reflog changes must be done
> within a transaction to just make it consistent.

I'm still not convinced. For me, "reflog_expire()" is an unusual outlier
operation, much like "git gc" or "git pack-refs" or "git fsck". None of
these are part of the beautiful Git data model; they are messy
maintenance operations. Forcing reference transactions to be general
enough to allow reflog expiration to be implemented *outside* the refs
API sacrificies their simplicity for lots of infrastructure that will
probably only be used to implement this single operation. Better to
implement reflog expiration *inside* the refs API.

That's my take on it, anyway.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

  reply	other threads:[~2014-11-18 19:46 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-18  1:35 [PATCH v3 00/14] ref-transactions-reflog Stefan Beller
2014-11-18  1:35 ` [PATCH v3 01/14] refs.c: make ref_transaction_create a wrapper for ref_transaction_update Stefan Beller
2014-11-18  1:35 ` [PATCH v3 02/14] refs.c: make ref_transaction_delete " Stefan Beller
2014-11-18  1:35 ` [PATCH v3 03/14] refs.c: rename the transaction functions Stefan Beller
2014-11-18  1:35 ` [PATCH v3 04/14] refs.c: add a function to append a reflog entry to a fd Stefan Beller
2014-11-18  1:35 ` [PATCH v3 05/14] refs.c: add a new update_type field to ref_update Stefan Beller
2014-11-18  1:35 ` [PATCH v3 06/14] refs.c: add a transaction function to append a reflog entry Stefan Beller
2014-11-18  1:35 ` [PATCH v3 07/14] refs.c: add a flag to allow reflog updates to truncate the log Stefan Beller
2014-11-18  1:35 ` [PATCH v3 08/14] refs.c: only write reflog update if msg is non-NULL Stefan Beller
2014-11-18  1:35 ` [PATCH v3 09/14] refs.c: allow multiple reflog updates during a single transaction Stefan Beller
2014-11-18  1:35 ` [PATCH v3 10/14] reflog.c: use a reflog transaction when writing during expire Stefan Beller
2014-11-18  1:35 ` [PATCH v3 11/14] refs.c: rename log_ref_setup to create_reflog Stefan Beller
2014-11-18  1:35 ` [PATCH v3 12/14] refs.c: Remove unlock_ref/close_ref/commit_ref from the refs api Stefan Beller
2014-11-18  1:35 ` [PATCH v3 13/14] refs.c: remove lock_any_ref_for_update Stefan Beller
2014-11-18  1:35 ` [PATCH v3 14/14] refs.c: allow deleting refs with a broken sha1 Stefan Beller
2014-11-18 11:26 ` [PATCH v3 00/14] ref-transactions-reflog Michael Haggerty
2014-11-18 18:36   ` Ronnie Sahlberg
2014-11-18 19:46     ` Michael Haggerty [this message]
2014-11-18 20:30       ` Junio C Hamano
2014-11-18 21:16         ` Michael Haggerty
2014-11-18 21:28           ` Junio C Hamano
2014-11-19 23:22             ` Stefan Beller
2014-11-20  3:24               ` Jonathan Nieder
2014-11-20 17:34                 ` Junio C Hamano
2014-11-20 10:56               ` Michael Haggerty
2014-11-20 18:17                 ` Jonathan Nieder
2014-11-27  5:34                   ` [PATCH 0/4] Using transactions for the reflog Stefan Beller
2014-11-27  5:34                     ` [PATCH 1/4] refs.c: rename the transaction functions Stefan Beller
2014-11-27  5:34                     ` [PATCH 2/4] refs.c: add a new update_type field to ref_update Stefan Beller
2014-11-27  5:34                     ` [PATCH 3/4] refs.c: add a transaction function to append a reflog entry Stefan Beller
2014-11-27  5:34                     ` [PATCH 4/4] reflog.c: use a reflog transaction when writing during expire Stefan Beller
  -- strict thread matches above, loose matches on Subject: below --
2014-06-18 17:08 [PATCH v3 00/14] ref-transactions-reflog Ronnie Sahlberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=546BA21C.9030803@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sahlberg@google.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).