public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Phillip Wood <phillip.wood123@gmail.com>,
	phillip.wood@dunelm.org.uk, James Liu <james@jamesliu.io>,
	Derrick Stolee <stolee@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v3 7/7] run-command: fix detaching when running auto maintenance
Date: Mon, 19 Aug 2024 08:17:22 +0200	[thread overview]
Message-ID: <ZsLjcjhgI8Wk2tIV@tanuki> (raw)
In-Reply-To: <20240817121424.GA2439299@coredump.intra.peff.net>

On Sat, Aug 17, 2024 at 08:14:24AM -0400, Jeff King wrote:
> On Fri, Aug 16, 2024 at 12:45:17PM +0200, Patrick Steinhardt wrote:
> 
> > Fix this bug by asking git-gc(1) to not detach when it is being invoked
> > via git-maintenance(1). Instead, git-maintenance(1) now respects a new
> > config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and
> > detaches itself into the background when running as part of our auto
> > maintenance. This should continue to behave the same for all users which
> > use the git-gc(1) task, only. For others though, it means that we now
> > properly perform all tasks in the background. The default behaviour of
> > git-maintenance(1) when executed by the user does not change, it will
> > remain in the foreground unless they pass the `--detach` option.
> 
> This patch seems to cause segfaults in t5616 when combined with the
> reftable backend. Try this:
> 
>   GIT_TEST_DEFAULT_REF_FORMAT=reftable ./t5616-partial-clone.sh --run=1-16 --stress
> 
> which fails for me within a few runs. Bisecting leads to 98077d06b2
> (run-command: fix detaching when running auto maintenance, 2024-08-16).
> It doesn't trigger with the files ref backend.
> 
> Compiling with ASan gets me a stack trace like this:
> 
>   + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
>   AddressSanitizer:DEADLYSIGNAL
>   =================================================================
>   ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
>   ==657994==The signal is caused by a READ memory access.
>       #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
>       #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
>       #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
>       #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
>       #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
>       #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
>       #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
>       #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
>       #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
>       #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
>       #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
>       #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
>       #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
>       #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
>       #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
>       #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
>       #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
>       #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
>       #18 0x55f23dba7764 in run_builtin git.c:484
>       #19 0x55f23dba7764 in handle_builtin git.c:741
>       #20 0x55f23dbab61e in run_argv git.c:805
>       #21 0x55f23dbab61e in cmd_main git.c:1000
>       #22 0x55f23dba4781 in main common-main.c:64
>       #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>       #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
>       #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)
> 
> My guess based on what I'm seeing and what the patch does is that now
> maintenance from a previous command is running in the background while
> our foreground git-fetch runs, and it somehow confuses things (perhaps
> by trying to compact reftables or something?). So I think there are two
> problems:
> 
>   1. The reftable code needs to be more robust against whatever race is
>      happening. I didn't dig further, but I'm sure it would be possible
>      to produce a coredump.

Yes, it certainly has to be robust against this. It's also where the
recent reftable compaction fixes [1] came from. In theory, it should
work alright with a concurrent process compacting the stack at the same
time where another process is reading. In practice the backend is still
in its infancy, so I'm not entirely surprised that there are concurrency
bugs.

I will investigate this issue, thanks a lot for the backtrace.

[1]: https://lore.kernel.org/git/cover.1722435214.git.ps@pks.im/

>   2. Having racy background maintenance doesn't seem great for test
>      robustness. At the very least, it might subject us to the "rm"
>      problems mentioned elsewhere, where we fail to clean up. Annotating
>      individual "git gc" or "git maintenance" calls with an extra
>      descriptor isn't too bad, but in this case it's all happening under
>      the hood via fetch. Is it a potential problem for every script,
>      then? If so, should we disable background detaching for all test
>      repos, and then let the few that want to test it turn it back on?

Might be a good idea to set `maintenance.autoDetach=false` globally,
yes. The only downside is of course that it wouldn't cause us to detect
failures like the above, where the concurrency itself causes failure.

Anyway, for now I'll:

  - Send a patch to fix the race in t7900.

  - Investigate the reftable concurrency issue.

  - _Not_ send a patch that sets `maintenance.autoDetach=false`.

The last one requires a bit more discussion first, and we have been
running with `gc.autoDetach=true` implicitly in the past. Thinking a bit
more about it, the reason why the above bug triggers now is that
git-gc(1) itself runs git-pack-refs(1), but does that _synchronously_
before detaching itself. Now we detach at a higher level in the
hierarchy, which means that the previously-synchronous part now runs
asynchronously, as well.

I cannot think of a reason why we shouldn't do this, as the ref backends
should handle this gracefully. The fact that the reftable backend
doesn't is a separate, preexisting bug.

Patrick

  reply	other threads:[~2024-08-19  6:17 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-13  7:17 [PATCH 0/7] builtin/maintenance: fix auto-detach with non-standard tasks Patrick Steinhardt
2024-08-13  7:17 ` [PATCH 1/7] config: fix constness of out parameter for `git_config_get_expiry()` Patrick Steinhardt
2024-08-13  7:17 ` [PATCH 2/7] builtin/gc: refactor to read config into structure Patrick Steinhardt
2024-08-15  5:24   ` James Liu
2024-08-15  8:18     ` Patrick Steinhardt
2024-08-15 13:46   ` Derrick Stolee
2024-08-13  7:17 ` [PATCH 3/7] builtin/gc: fix leaking config values Patrick Steinhardt
2024-08-15  5:22   ` James Liu
2024-08-15  8:18     ` Patrick Steinhardt
2024-08-15 13:50   ` Derrick Stolee
2024-08-13  7:17 ` [PATCH 4/7] builtin/gc: stop processing log file on signal Patrick Steinhardt
2024-08-15  6:01   ` James Liu
2024-08-13  7:17 ` [PATCH 5/7] builtin/gc: add a `--detach` flag Patrick Steinhardt
2024-08-13  7:17 ` [PATCH 6/7] builtin/maintenance: " Patrick Steinhardt
2024-08-13  7:18 ` [PATCH 7/7] builtin/maintenance: fix auto-detach with non-standard tasks Patrick Steinhardt
2024-08-13 11:29   ` Phillip Wood
2024-08-13 11:59     ` Patrick Steinhardt
2024-08-13 13:19       ` Phillip Wood
2024-08-14  4:15         ` Patrick Steinhardt
2024-08-14 15:13           ` Phillip Wood
2024-08-15  5:30             ` Patrick Steinhardt
2024-08-15  6:40   ` James Liu
2024-08-15  8:17     ` Patrick Steinhardt
2024-08-15 14:00   ` Derrick Stolee
2024-08-15  6:42 ` [PATCH 0/7] " James Liu
2024-08-15  9:12 ` [PATCH v2 " Patrick Steinhardt
2024-08-15  9:12   ` [PATCH v2 1/7] config: fix constness of out parameter for `git_config_get_expiry()` Patrick Steinhardt
2024-08-15  9:12   ` [PATCH v2 2/7] builtin/gc: refactor to read config into structure Patrick Steinhardt
2024-08-15  9:12   ` [PATCH v2 3/7] builtin/gc: fix leaking config values Patrick Steinhardt
2024-08-15  9:12   ` [PATCH v2 4/7] builtin/gc: stop processing log file on signal Patrick Steinhardt
2024-08-15  9:12   ` [PATCH v2 5/7] builtin/gc: add a `--detach` flag Patrick Steinhardt
2024-08-15 19:11     ` Junio C Hamano
2024-08-15 22:29       ` Junio C Hamano
2024-08-16  8:06         ` Patrick Steinhardt
2024-08-15  9:12   ` [PATCH v2 6/7] builtin/maintenance: " Patrick Steinhardt
2024-08-15  9:12   ` [PATCH v2 7/7] run-command: fix detaching when running auto maintenance Patrick Steinhardt
2024-08-15 16:13     ` Junio C Hamano
2024-08-16  8:06       ` Patrick Steinhardt
2024-08-15 14:04 ` [PATCH 0/7] builtin/maintenance: fix auto-detach with non-standard tasks Derrick Stolee
2024-08-15 15:37   ` Junio C Hamano
2024-08-16  8:06   ` Patrick Steinhardt
2024-08-16 10:44 ` [PATCH v3 " Patrick Steinhardt
2024-08-16 10:44   ` [PATCH v3 1/7] config: fix constness of out parameter for `git_config_get_expiry()` Patrick Steinhardt
2024-08-16 10:45   ` [PATCH v3 2/7] builtin/gc: refactor to read config into structure Patrick Steinhardt
2024-08-16 10:45   ` [PATCH v3 3/7] builtin/gc: fix leaking config values Patrick Steinhardt
2024-08-16 10:45   ` [PATCH v3 4/7] builtin/gc: stop processing log file on signal Patrick Steinhardt
2024-08-16 10:45   ` [PATCH v3 5/7] builtin/gc: add a `--detach` flag Patrick Steinhardt
2024-08-16 10:45   ` [PATCH v3 6/7] builtin/maintenance: " Patrick Steinhardt
2024-08-17  7:09     ` Jeff King
2024-08-17  7:14       ` Jeff King
2024-08-19  6:17       ` Patrick Steinhardt
2024-08-16 10:45   ` [PATCH v3 7/7] run-command: fix detaching when running auto maintenance Patrick Steinhardt
2024-08-17 12:14     ` Jeff King
2024-08-19  6:17       ` Patrick Steinhardt [this message]
2024-08-19  7:47         ` [PATCH 0/3] Fixups for git-maintenance(1) tests Patrick Steinhardt
2024-08-19  7:47           ` [PATCH 1/3] t7900: fix flaky test due to leaking background job Patrick Steinhardt
2024-08-19  8:49             ` Jeff King
2024-08-19  8:55               ` Patrick Steinhardt
2024-08-19  9:12                 ` Jeff King
2024-08-19  9:17                   ` Patrick Steinhardt
2024-08-19  7:48           ` [PATCH 2/3] t7900: exercise detaching via trace2 regions Patrick Steinhardt
2024-08-19  8:51             ` Jeff King
2024-08-19  8:56               ` Patrick Steinhardt
2024-08-21 18:38                 ` Junio C Hamano
2024-08-22  5:41                   ` Patrick Steinhardt
2024-08-22 17:22                     ` Junio C Hamano
2024-08-19  7:48           ` [PATCH 3/3] builtin/maintenance: fix loose objects task emitting pack hash Patrick Steinhardt
2024-08-19  8:55             ` Jeff King
2024-08-19  9:07               ` Patrick Steinhardt
2024-08-19  9:17                 ` Jeff King
2024-08-19  9:26                   ` Patrick Steinhardt
2024-08-19 10:26                     ` Jeff King
2024-08-20  7:39                       ` Patrick Steinhardt
2024-08-20 15:58                         ` Junio C Hamano
2024-08-19 17:05                   ` Junio C Hamano
2024-08-19  8:46         ` [PATCH v3 7/7] run-command: fix detaching when running auto maintenance Jeff King
2024-08-19  9:04           ` Patrick Steinhardt
2024-08-19 10:49       ` Patrick Steinhardt
2024-08-19 15:41         ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZsLjcjhgI8Wk2tIV@tanuki \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=james@jamesliu.io \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    --cc=phillip.wood@dunelm.org.uk \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox