* [PATCH 00/15] refs: introduce `--auto` to pack refs as needed
@ 2024-03-18 10:52 Patrick Steinhardt
2024-03-20 19:30 ` Karthik Nayak
0 siblings, 1 reply; 5+ messages in thread
From: Patrick Steinhardt @ 2024-03-18 10:52 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 4220 bytes --]
Hi,
right now, it is left to the caller of `refs_pack_refs()` to decide
whether or not to repack refs and how exactly to do it. This is
inefficient at times given that the caller typically has no idea whether
or not the refdb is close to optimal already. So even if the refdb is
already in a close-to-optimal state we would end up repacking all of it.
This patch series aims to address this shortcoming. It introduces a new
flag `PACK_REFS_AUTO` that tells the ref backend to repack refs as
needed. For the "files" backend we don't honor this flag yet and thus
end up behaving the exact same as if that flag wasn't set. But for the
"reftable" backend we will use the same auto-compaction algorithm that
is already used during writes to the refdb. Thus, in most of the cases
it wouldn't actually do anything except when the refdb is suboptimally
packed.
Eventually we'll probably also want to wire up heuristics for the
"files" backend to honor the `PACK_REFS_AUTO` flag. Using something like
a ratio of packed-refs size/number of loose refs might be viable. I
punted on that though as I feared that it might lead to bikeshedding and
thus distract from the main goal of this topic, which is to prepare the
ref code and relevant commands to perform optimizations as required. I'm
happy to add such a patch to this series though in case anybody feels
strongly about this.
The `PACK_REFS_AUTO` flag is exposed via a new `git pack-refs --auto`
flag. It is wired up in both `git gc --auto` and `git maintenance run
--auto`.
The series is structured as follows:
- Patches 1 - 5: Bugfixes and improvements for the reftable-specific
compaction code. These are issues that I've found while working on
this series.
- Patches 6 - 8: Refactorings to drop the `PACK_REFS_ALL` flag,
which isn't actually used by the ref backends anymore and confused
me multiple times.
- Patches 9 - 15: Handling of `PACK_REFS_ALL` in git-pack-refs(1),
git-gc(1) and git-maintenance(1).
The patch series is built on top of 2953d95d40 (The eighth batch,
2024-03-15) with "ps/reftable-stack-tempfile" at 60c4c42515
(reftable/stack: register compacted tables as tempfiles, 2024-03-07)
merged into it due to a merge conflict in "reftable/stack.c".
Patrick
PS: I polished this patch series while traveling and am still out of
office until Thursday. I'll thus only get to respond to threads that
are waiting for my input at the end of this week.
Patrick Steinhardt (15):
reftable/stack: fix error handling in `reftable_stack_init_addition()`
reftable/error: discern locked/outdated errors
reftable/stack: use error codes when locking fails during compaction
reftable/stack: gracefully handle failed auto-compaction due to locks
refs/reftable: print errors on compaction failure
t/helper: drop pack-refs wrapper
refs: move `struct pack_refs_opts` to where it's used
refs: remove `PACK_REFS_ALL` flag
refs/reftable: expose auto compaction via new flag
builtin/pack-refs: release allocated memory
builtin/pack-refs: introduce new "--auto" flag
builtin/gc: move `struct maintenance_run_opts`
t6500: extract objects with "17" prefix
builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs
builtin/gc: pack refs when using `git maintenance run --auto`
Documentation/git-pack-refs.txt | 15 +++++-
builtin/gc.c | 86 +++++++++++++++++++--------------
builtin/pack-refs.c | 31 +++++++-----
refs.h | 20 ++++----
refs/reftable-backend.c | 11 ++++-
reftable/error.c | 4 +-
reftable/reftable-error.h | 5 +-
reftable/stack.c | 34 +++++++------
reftable/stack_test.c | 2 +-
t/helper/test-ref-store.c | 20 --------
t/oid-info/hash-info | 12 +++++
t/t0601-reffiles-pack-refs.sh | 30 ++++++++++--
t/t0610-reftable-basics.sh | 79 ++++++++++++++++++++++++++++++
t/t6500-gc.sh | 30 +++---------
14 files changed, 255 insertions(+), 124 deletions(-)
base-commit: f94fee98b456ef39a5f85fff5802f4e538a4dc7b
--
2.44.0
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 00/15] refs: introduce `--auto` to pack refs as needed
@ 2024-03-18 22:14 Han-Wen Nienhuys
2024-03-19 16:10 ` Patrick Steinhardt
0 siblings, 1 reply; 5+ messages in thread
From: Han-Wen Nienhuys @ 2024-03-18 22:14 UTC (permalink / raw)
To: Patrick Steinhardt, stolee@gmail.com; +Cc: git
I had a quick look over the reftable bits of this series. It looks OK,
but here are some comments. Nothing blocking.
* reftable/error: discern locked/outdated errors
It is not obvious to me why you need two different codes. Is it so you
can print the offending lock file (so people can delete them
manually?). FWIW, this was based on JGit, which has
/**
* The ref could not be locked for update/delete.
* <p>
* This is generally a transient failure and is
usually caused by
* another process trying to access the ref at the
same time as this
* process was trying to update it. It is possible a
future operation
* will be successful.
*/
* reftable/stack: gracefully handle failed auto-compaction due to locks
It's a bit unsatisfying that you have to use details of the locking
protocol to test it, but I couldn't think of a way to unittest this
using only the API. Maybe it's worth considering removing the
automatic compaction from the reftable-stack.h API, and have the
caller (eg. in refs/reftable-backend.c) call it explicitly?
--
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 00/15] refs: introduce `--auto` to pack refs as needed
2024-03-18 22:14 [PATCH 00/15] refs: introduce `--auto` to pack refs as needed Han-Wen Nienhuys
@ 2024-03-19 16:10 ` Patrick Steinhardt
0 siblings, 0 replies; 5+ messages in thread
From: Patrick Steinhardt @ 2024-03-19 16:10 UTC (permalink / raw)
To: Han-Wen Nienhuys; +Cc: stolee@gmail.com, git, Justin Tobler
On Mon, Mar 18, 2024 at 11:14:58PM +0100, Han-Wen Nienhuys wrote:
> I had a quick look over the reftable bits of this series. It looks OK,
> but here are some comments. Nothing blocking.
>
> * reftable/error: discern locked/outdated errors
>
> It is not obvious to me why you need two different codes. Is it so you
> can print the offending lock file (so people can delete them
> manually?). FWIW, this was based on JGit, which has
>
> /**
> * The ref could not be locked for update/delete.
> * <p>
> * This is generally a transient failure and is
> usually caused by
> * another process trying to access the ref at the
> same time as this
> * process was trying to update it. It is possible a
> future operation
> * will be successful.
> */
I just think that these are somewhat different failure modes. Not being
able to lock a file because it already is locked is totally different
than realizing that files have been concurrently modified and are thus
out of date now.
So the main motivator why I split up the error codes is that the error
message will end up being shown to the user. Before we would say "data
is outdated" for both cases. Now we either say "data is locked", or we
say "data concurrently modified", which I think is a lot more helpful to
the user.
> * reftable/stack: gracefully handle failed auto-compaction due to locks
>
> It's a bit unsatisfying that you have to use details of the locking
> protocol to test it, but I couldn't think of a way to unittest this
> using only the API. Maybe it's worth considering removing the
> automatic compaction from the reftable-stack.h API, and have the
> caller (eg. in refs/reftable-backend.c) call it explicitly?
I think it's not all that bad. Yes, we now need to know about error
codes returned by the auto-compaction code. But it's still contained in
the reftable library. If we were to remove it from the API itself then
this knowledge would need to exist in all callers of the auto compaction
code which live _outside_ of the library, which I think would be even
worse.
We probably could add a unit test that basically does the same as the
added test in t0610. That is, we lock all tables in the stack and then
check that `reftable_addition_commit()` fails gracefully when trying to
compact the already-locked tables.
But that actually highlights another issue that we will tackle soonish:
I think that the compaction code should handle already-locked files more
gracefully. E.g. when tables 0..N are locked by a concurrent process,
then the reftable library should realize that and only try to compact
tables N+1..M. Justin Tobler is currently working on some refactorings
of the auto-compaction logic in [1], so we will tackle this issue once
a revised version of his patch has landed.
[1]: https://lore.kernel.org/git/pull.1683.git.1709669025722.gitgitgadget@gmail.com/
Patrick
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 00/15] refs: introduce `--auto` to pack refs as needed
2024-03-18 10:52 Patrick Steinhardt
@ 2024-03-20 19:30 ` Karthik Nayak
2024-03-25 9:10 ` Patrick Steinhardt
0 siblings, 1 reply; 5+ messages in thread
From: Karthik Nayak @ 2024-03-20 19:30 UTC (permalink / raw)
To: Patrick Steinhardt, git; +Cc: Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 821 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> The `PACK_REFS_AUTO` flag is exposed via a new `git pack-refs --auto`
> flag. It is wired up in both `git gc --auto` and `git maintenance run
> --auto`.
>
Wondering if this is also exposed as a config option.
> The series is structured as follows:
>
> - Patches 1 - 5: Bugfixes and improvements for the reftable-specific
> compaction code. These are issues that I've found while working on
> this series.
>
> - Patches 6 - 8: Refactorings to drop the `PACK_REFS_ALL` flag,
> which isn't actually used by the ref backends anymore and confused
> me multiple times.
>
> - Patches 9 - 15: Handling of `PACK_REFS_ALL` in git-pack-refs(1),
> git-gc(1) and git-maintenance(1).
>
I'm assuming this means `PACK_REFS_AUTO` in the last sentence.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 00/15] refs: introduce `--auto` to pack refs as needed
2024-03-20 19:30 ` Karthik Nayak
@ 2024-03-25 9:10 ` Patrick Steinhardt
0 siblings, 0 replies; 5+ messages in thread
From: Patrick Steinhardt @ 2024-03-25 9:10 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, Derrick Stolee
[-- Attachment #1: Type: text/plain, Size: 1882 bytes --]
On Wed, Mar 20, 2024 at 12:30:58PM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > The `PACK_REFS_AUTO` flag is exposed via a new `git pack-refs --auto`
> > flag. It is wired up in both `git gc --auto` and `git maintenance run
> > --auto`.
> >
>
> Wondering if this is also exposed as a config option.
It's not, no, and I don't quite think it would make sense. We don't
expose the current `git gc --auto` flag as config option, either. While
we have "gc.auto", it doesn't configure whether or not the `--auto` flag
is set, but rather configures the boundary at which objects would be
repacked.
In hindsight that config option is too narrowly focussed on objects
while pretending to be quite general. It would've been great if this was
called "gc.autoLooseObjectLimit" or something similar instead to not
cause confusion and to be more easily extendable in the future.
I can certainly see that we might eventually want to have ref backend
specific configuration here. For the "files" backend this could be be
the number of loose refs that are allowed to exist before repacking,
whereas for the "reftable" backend this could be the geometric factor.
But I'll leave that for a future series.
> > The series is structured as follows:
> >
> > - Patches 1 - 5: Bugfixes and improvements for the reftable-specific
> > compaction code. These are issues that I've found while working on
> > this series.
> >
> > - Patches 6 - 8: Refactorings to drop the `PACK_REFS_ALL` flag,
> > which isn't actually used by the ref backends anymore and confused
> > me multiple times.
> >
> > - Patches 9 - 15: Handling of `PACK_REFS_ALL` in git-pack-refs(1),
> > git-gc(1) and git-maintenance(1).
> >
>
> I'm assuming this means `PACK_REFS_AUTO` in the last sentence.
Ah, yes.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-03-25 9:10 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-18 22:14 [PATCH 00/15] refs: introduce `--auto` to pack refs as needed Han-Wen Nienhuys
2024-03-19 16:10 ` Patrick Steinhardt
-- strict thread matches above, loose matches on Subject: below --
2024-03-18 10:52 Patrick Steinhardt
2024-03-20 19:30 ` Karthik Nayak
2024-03-25 9:10 ` Patrick Steinhardt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).