* [PATCH 0/2] sequencer: truncate lockfile and ref to NAME_MAX
From: Mark Ruvald Pedersen via GitGitGadget @ 2023-08-10 16:34 UTC
To: git; +Cc: Mark Ruvald Pedersen

Some commits may have unusually long subject lines, which can cause git to
error out. Currently the sequencer and the lockfile code assume these to be
shorter than NAME_MAX, which is the maximum length of a filename (on Linux).

When reproduced, one is met by the error message:

    $ git rebase --continue
    error: cannot lock ref 'refs/rewritten/SANITIZED-SUBJECT': Unable to create '.git/refs/rewritten/SANITIZED-SUBJECT.lock': File name too long
     * where SANITIZED-SUBJECT is very long

Affected repos can only be salvaged through filter-branch etc.

Johannes Schindelin (1):
  rebase: allow overriding the maximal length of the generated labels

Mark Ruvald Pedersen (1):
  sequencer: truncate labels to accommodate loose refs

 Documentation/config/rebase.txt |  6 +++++
 git-compat-util.h               |  4 +++
 sequencer.c                     | 47 ++++++++++++++++++++++++++++-----
 t/t3430-rebase-merges.sh        | 11 ++++++++
 4 files changed, 62 insertions(+), 6 deletions(-)


base-commit: a82fb66fed250e16d3010c75404503bea3f0ab61
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1562%2Fmped-oticon%2Fmped_bugfix_lockfile_maxname-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1562/mped-oticon/mped_bugfix_lockfile_maxname-v1
Pull-Request: https://github.com/git/git/pull/1562
-- 
gitgitgadget

* [PATCH 1/2] sequencer: truncate labels to accommodate loose refs
From: Mark Ruvald Pedersen via GitGitGadget @ 2023-08-10 16:34 UTC
To: git; +Cc: Mark Ruvald Pedersen, Mark Ruvald Pedersen

From: Mark Ruvald Pedersen <mped@demant.com>

Some commits may have unusually long subject lines. When those subject
lines are used as labels in the `--rebase-merges` mode of `git rebase`,
they can cause errors when writing the corresponding loose refs because
most file systems have a maximal file name length of 255 (`NAME_MAX`).

The symptom looks like this:

    $ git rebase --continue
    error: cannot lock ref 'refs/rewritten/SANITIZED-SUBJECT': Unable to create '.git/refs/rewritten/SANITIZED-SUBJECT.lock': File name too long

- where SANITIZED-SUBJECT is very long

Let's accommodate this situation by truncating the labels. Care must be
taken in case the subject line contains multi-byte characters so as not
to truncate in the middle of a character.

Signed-off-by: Mark Ruvald Pedersen <mped@demant.com>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
 git-compat-util.h |  4 ++++
 sequencer.c       | 41 ++++++++++++++++++++++++++++++++++++-----
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index d32aa754ae1..3e7a59b5ff1 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -422,6 +422,10 @@ char *gitdirname(char *);
 #define PATH_MAX 4096
 #endif
 
+#ifndef NAME_MAX
+#define NAME_MAX 255
+#endif
+
 typedef uintmax_t timestamp_t;
 #define PRItime PRIuMAX
 #define parse_timestamp strtoumax
diff --git a/sequencer.c b/sequencer.c
index adc9cfb4df3..be837bd2948 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -51,6 +51,15 @@
 
 #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION"
 
+/*
+ * To accommodate common filesystem limitations, where the loose refs' file
+ * names must not exceed `NAME_MAX`, the labels generated by `git rebase
+ * --rebase-merges` need to be truncated if the corresponding commit subjects
+ * are too long.
+ * Add some margin to stay clear from reaching `NAME_MAX`.
+ */
+#define GIT_MAX_LABEL_LENGTH ((NAME_MAX) - (LOCK_SUFFIX_LEN) - 16)
+
 static const char sign_off_header[] = "Signed-off-by: ";
 static const char cherry_picked_prefix[] = "(cherry picked from commit ";
 
@@ -5396,6 +5405,8 @@ static const char *label_oid(struct object_id *oid, const char *label,
 		}
 	} else {
 		struct strbuf *buf = &state->buf;
+		int label_is_utf8 = 1; /* start with this assumption */
+		size_t max_len = buf->len + GIT_MAX_LABEL_LENGTH;
 
 		/*
 		 * Sanitize labels by replacing non-alpha-numeric characters
@@ -5404,14 +5415,34 @@ static const char *label_oid(struct object_id *oid, const char *label,
 		 *
 		 * Note that we retain non-ASCII UTF-8 characters (identified
 		 * via the most significant bit). They should be all acceptable
-		 * in file names. We do not validate the UTF-8 here, that's not
-		 * the job of this function.
+		 * in file names.
+		 *
+		 * As we will use the labels as names of (loose) refs, it is
+		 * vital that the name not be longer than the maximum component
+		 * size of the file system (`NAME_MAX`). We are careful to
+		 * truncate the label accordingly, allowing for the `.lock`
+		 * suffix and for the label to be UTF-8 encoded (i.e. we avoid
+		 * truncating in the middle of a character).
 		 */
-		for (; *label; label++)
-			if ((*label & 0x80) || isalnum(*label))
+		for (; *label && buf->len + 1 < max_len; label++)
+			if (isalnum(*label) ||
+			    (!label_is_utf8 && (*label & 0x80)))
 				strbuf_addch(buf, *label);
+			else if (*label & 0x80) {
+				const char *p = label;
+
+				utf8_width(&p, NULL);
+				if (p) {
+					if (buf->len + (p - label) > max_len)
+						break;
+					strbuf_add(buf, label, p - label);
+					label = p - 1;
+				} else {
+					label_is_utf8 = 0;
+					strbuf_addch(buf, *label);
+				}
 			/* avoid leading dash and double-dashes */
-			else if (buf->len && buf->buf[buf->len - 1] != '-')
+			} else if (buf->len && buf->buf[buf->len - 1] != '-')
 				strbuf_addch(buf, '-');
 		if (!buf->len) {
 			strbuf_addstr(buf, "rev-");
-- 
gitgitgadget
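
The length budget behind `GIT_MAX_LABEL_LENGTH` can be checked with a little arithmetic. The following is only a sketch under stated assumptions: the `SKETCH_*` names and `worst_case_name_len()` are hypothetical, `NAME_MAX` is taken to be 255 as in the fallback definition above, and the lock suffix is taken to be the 5-byte string `.lock`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values, restated locally for illustration only. */
#define SKETCH_NAME_MAX 255          /* fallback value from the patch */
#define SKETCH_LOCK_SUFFIX ".lock"   /* assumed lock file suffix */
#define SKETCH_LOCK_SUFFIX_LEN 5     /* strlen(".lock") */
#define SKETCH_MARGIN 16             /* slop reserved by the patch */

#define SKETCH_MAX_LABEL_LENGTH \
	((SKETCH_NAME_MAX) - (SKETCH_LOCK_SUFFIX_LEN) - (SKETCH_MARGIN))

/*
 * Hypothetical helper: length of the lock file name for a label of
 * `label_len` bytes once a uniquifying suffix such as "-2" and the
 * lock suffix have been appended.
 */
static size_t worst_case_name_len(size_t label_len, const char *uniq_suffix)
{
	assert(uniq_suffix != NULL);
	return label_len + strlen(uniq_suffix) + strlen(SKETCH_LOCK_SUFFIX);
}
```

Under these assumptions the label budget is 255 - 5 - 16 = 234 bytes, and even an 11-byte uniquifying suffix such as `-2147483647` keeps the lock file name at 250 bytes, below the assumed limit.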

* Re: [PATCH 1/2] sequencer: truncate labels to accommodate loose refs
From: Junio C Hamano @ 2023-08-10 17:12 UTC
To: Mark Ruvald Pedersen via GitGitGadget
Cc: git, Mark Ruvald Pedersen, Mark Ruvald Pedersen

"Mark Ruvald Pedersen via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +/*
> + * To accommodate common filesystem limitations, where the loose refs' file
> + * names must not exceed `NAME_MAX`, the labels generated by `git rebase
> + * --rebase-merges` need to be truncated if the corresponding commit subjects
> + * are too long.
> + * Add some margin to stay clear from reaching `NAME_MAX`.
> + */
> +#define GIT_MAX_LABEL_LENGTH ((NAME_MAX) - (LOCK_SUFFIX_LEN) - 16)

OK.  Hopefully no systems define NAME_MAX shorter than 20 bytes ;-).

We may suffix "-%d" to make it unique after this truncation, so
there definitely is a need for some slop, and 16 bytes should be
sufficiently long.

> @@ -5404,14 +5415,34 @@ static const char *label_oid(struct object_id *oid, const char *label,
>  		 *
>  		 * Note that we retain non-ASCII UTF-8 characters (identified
>  		 * via the most significant bit). They should be all acceptable
> -		 * in file names. We do not validate the UTF-8 here, that's not
> -		 * the job of this function.
> +		 * in file names.
> +		 *
> +		 * As we will use the labels as names of (loose) refs, it is
> +		 * vital that the name not be longer than the maximum component
> +		 * size of the file system (`NAME_MAX`). We are careful to
> +		 * truncate the label accordingly, allowing for the `.lock`
> +		 * suffix and for the label to be UTF-8 encoded (i.e. we avoid
> +		 * truncating in the middle of a character).
>  		 */
> -		for (; *label; label++)
> -			if ((*label & 0x80) || isalnum(*label))
> +		for (; *label && buf->len + 1 < max_len; label++)
> +			if (isalnum(*label) ||
> +			    (!label_is_utf8 && (*label & 0x80)))
>  				strbuf_addch(buf, *label);
> +			else if (*label & 0x80) {
> +				const char *p = label;
> +
> +				utf8_width(&p, NULL);
> +				if (p) {
> +					if (buf->len + (p - label) > max_len)
> +						break;
> +					strbuf_add(buf, label, p - label);
> +					label = p - 1;
> +				} else {
> +					label_is_utf8 = 0;
> +					strbuf_addch(buf, *label);
> +				}

utf8_width() does let you advance one Unicode character at a time as
its side effect, but it may be a bit overkill, as its primary
function is to compute the display width of that character.

We could take advantage of the fact that the first byte of a multi-byte
UTF-8 character has two high-bits set (i.e. 11xxxxxx) while the second
and subsequent bytes have only the top-bit set and the second highest
bit clear (i.e. 10xxxxxx) to simplify/optimize it.  If this were in
a performance sensitive codepath, that is.

I'll queue it as-is for now, as we are in "regression fix only"
phase of the cycle, and have enough time to polish it.

Thanks.

>  			/* avoid leading dash and double-dashes */
> -			else if (buf->len && buf->buf[buf->len - 1] != '-')
> +			} else if (buf->len && buf->buf[buf->len - 1] != '-')
>  				strbuf_addch(buf, '-');
>  		if (!buf->len) {
>  			strbuf_addstr(buf, "rev-");

* Re: [PATCH 1/2] sequencer: truncate labels to accommodate loose refs
From: Johannes Schindelin @ 2023-08-16 8:36 UTC
To: Junio C Hamano
Cc: Mark Ruvald Pedersen via GitGitGadget, git, Mark Ruvald Pedersen, Mark Ruvald Pedersen

Hi,

On Thu, 10 Aug 2023, Junio C Hamano wrote:

> "Mark Ruvald Pedersen via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > +/*
> > + * To accommodate common filesystem limitations, where the loose refs' file
> > + * names must not exceed `NAME_MAX`, the labels generated by `git rebase
> > + * --rebase-merges` need to be truncated if the corresponding commit subjects
> > + * are too long.
> > + * Add some margin to stay clear from reaching `NAME_MAX`.
> > + */
> > +#define GIT_MAX_LABEL_LENGTH ((NAME_MAX) - (LOCK_SUFFIX_LEN) - 16)
>
> OK.  Hopefully no systems define NAME_MAX shorter than 20 bytes ;-).

If there are, we already have problems with the following paths:

	#CHARS  git_path
	---------------------------------
	    20  BISECT_ANCESTORS_OK
	    20  BISECT_EXPECTED_REV
	    20  BISECT_FIRST_PARENT
	    22  fsmonitor--daemon.ipc
	    23  drop_redundant_commits
	    23  git-rebase-todo.backup
	    23  keep_redundant_commits
	    23  reschedule-failed-exec
	    24  allow_rerere_autoupdate
	    26  no-reschedule-failed-exec

So I think we're good ;-)

> We may suffix "-%d" to make it unique after this truncation, so
> there definitely is a need for some slop, and 16 bytes should be
> sufficiently long.
>
> > @@ -5404,14 +5415,34 @@ static const char *label_oid(struct object_id *oid, const char *label,
> >  		 *
> >  		 * Note that we retain non-ASCII UTF-8 characters (identified
> >  		 * via the most significant bit). They should be all acceptable
> > -		 * in file names. We do not validate the UTF-8 here, that's not
> > -		 * the job of this function.
> > +		 * in file names.
> > +		 *
> > +		 * As we will use the labels as names of (loose) refs, it is
> > +		 * vital that the name not be longer than the maximum component
> > +		 * size of the file system (`NAME_MAX`). We are careful to
> > +		 * truncate the label accordingly, allowing for the `.lock`
> > +		 * suffix and for the label to be UTF-8 encoded (i.e. we avoid
> > +		 * truncating in the middle of a character).
> >  		 */
> > -		for (; *label; label++)
> > -			if ((*label & 0x80) || isalnum(*label))
> > +		for (; *label && buf->len + 1 < max_len; label++)
> > +			if (isalnum(*label) ||
> > +			    (!label_is_utf8 && (*label & 0x80)))
> >  				strbuf_addch(buf, *label);
> > +			else if (*label & 0x80) {
> > +				const char *p = label;
> > +
> > +				utf8_width(&p, NULL);
> > +				if (p) {
> > +					if (buf->len + (p - label) > max_len)
> > +						break;
> > +					strbuf_add(buf, label, p - label);
> > +					label = p - 1;
> > +				} else {
> > +					label_is_utf8 = 0;
> > +					strbuf_addch(buf, *label);
> > +				}
>
> utf8_width() does let you advance one Unicode character at a time as
> its side effect, but it may be a bit overkill, as its primary
> function is to compute the display width of that character.
>
> We could take advantage of the fact that the first byte of a multi-byte
> UTF-8 character has two high-bits set (i.e. 11xxxxxx) while the second
> and subsequent bytes have only the top-bit set and the second highest
> bit clear (i.e. 10xxxxxx) to simplify/optimize it.  If this were in
> a performance sensitive codepath, that is.

It is not a performance-critical code path, so I erred on the side of
simplicity (although I have to admit that the post image of the diff is
not exactly for the faint of heart).

Could we maybe form the plan to keep in the back of our heads that we
already have UTF-8-truncating functionality in the sequencer, and in case
another user should turn up, implement that optimized function in
`utf8.[ch]`?

> I'll queue it as-is for now, as we are in "regression fix only"
> phase of the cycle, and have enough time to polish it.

Thanks,
Johannes
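
The lead-byte/continuation-byte observation quoted above can be sketched as a small helper. This is an illustration only, not code from the patch or from `utf8.[ch]`: the names `is_utf8_continuation()` and `utf8_truncate_len()` are hypothetical, and the sketch assumes the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Returns non-zero if `c` is a UTF-8 continuation byte (10xxxxxx). */
static int is_utf8_continuation(unsigned char c)
{
	return (c & 0xc0) == 0x80;
}

/*
 * Hypothetical helper: returns the largest length not exceeding
 * `max_bytes` at which the first `len` bytes of `s` can be cut without
 * splitting a multi-byte character. Assumes `s` is valid UTF-8.
 */
static size_t utf8_truncate_len(const char *s, size_t len, size_t max_bytes)
{
	size_t cut = len < max_bytes ? len : max_bytes;

	assert(s != NULL);
	if (cut == len)
		return cut; /* nothing to truncate */

	/*
	 * s[cut] is the first byte that would be dropped. If it is a
	 * continuation byte, the cut falls inside a character, so back
	 * up until the cut lands on a lead byte or an ASCII byte.
	 */
	while (cut > 0 && is_utf8_continuation((unsigned char)s[cut]))
		cut--;
	return cut;
}
```

For the 14-byte string `0123456789-我`, a limit of 13 or 12 bytes backs up to 11, dropping the whole three-byte character, while a limit of 14 keeps the string intact.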

* Re: [PATCH 1/2] sequencer: truncate labels to accommodate loose refs
From: Junio C Hamano @ 2023-08-16 16:28 UTC
To: Johannes Schindelin
Cc: Mark Ruvald Pedersen via GitGitGadget, git, Mark Ruvald Pedersen, Mark Ruvald Pedersen

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> It is not a performance-critical code path, so I erred on the side of
> simplicity (although I have to admit that the post image of the diff is
> not exactly for the faint of heart).
>
> Could we maybe form the plan to keep in the back of our heads that we
> already have UTF-8-truncating functionality in the sequencer, and in case
> another user should turn up, implement that optimized function in
> `utf8.[ch]`?

Yup, that is a good idea.  Even though this one only cares about the
bytecount, we'd eventually benefit from two variants, truncate by
bytecount and truncate by display width.

Both variants should return an error when given a bytestring that
does not make a valid UTF-8 sequence, and leave it to the caller to
truncate at byte boundary as a fallback, which is trivial.  The
alternative would be to do the truncation in the callee, but then the
caller cannot tell whether the returned result is a fallback result
that the end user may need to be warned about or a known-valid UTF-8
substring, so that route would be suboptimal.

Thanks.
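
The "truncate by bytecount" variant described above, which reports invalid input instead of silently falling back, might look roughly like this. It is a sketch only: `utf8_seq_len()` and `utf8_truncate_by_bytes()` are hypothetical names, no such functions are claimed to exist in `utf8.[ch]`, the check is deliberately simplified (it does not reject overlong three- and four-byte forms or surrogates), and only the bytes up to the cut point are examined.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length in bytes of the UTF-8 sequence starting at s[0], or 0 if the
 * sequence is malformed or runs past `avail` bytes.
 * Simplified: overlong 3/4-byte forms and surrogates are not rejected.
 */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
	size_t n, i;

	if (!avail)
		return 0;
	if (s[0] < 0x80)
		n = 1;
	else if (s[0] >= 0xc2 && s[0] <= 0xdf)
		n = 2;
	else if (s[0] >= 0xe0 && s[0] <= 0xef)
		n = 3;
	else if (s[0] >= 0xf0 && s[0] <= 0xf4)
		n = 4;
	else
		return 0; /* stray continuation byte or invalid lead byte */
	if (n > avail)
		return 0;
	for (i = 1; i < n; i++)
		if ((s[i] & 0xc0) != 0x80)
			return 0;
	return n;
}

/*
 * Hypothetical helper: stores in *out_len the longest prefix length of
 * `s` that fits in `max_bytes` and ends on a character boundary.
 * Returns 0 on success and -1 if invalid UTF-8 is met before the cut,
 * leaving any byte-boundary fallback to the caller.
 */
static int utf8_truncate_by_bytes(const char *s, size_t len,
				  size_t max_bytes, size_t *out_len)
{
	size_t pos = 0;

	assert(s != NULL && out_len != NULL);
	while (pos < len && pos < max_bytes) {
		size_t n = utf8_seq_len((const unsigned char *)s + pos,
					len - pos);

		if (!n)
			return -1;
		if (pos + n > max_bytes)
			break; /* next character does not fit */
		pos += n;
	}
	*out_len = pos;
	return 0;
}
```

Because the error is surfaced through the return value, a caller can distinguish a known-valid truncated prefix from the case where it had to fall back to cutting at a plain byte boundary.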

* [PATCH 2/2] rebase: allow overriding the maximal length of the generated labels
From: Johannes Schindelin via GitGitGadget @ 2023-08-10 16:35 UTC
To: git; +Cc: Mark Ruvald Pedersen, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

With this change, users can override the compiled-in default for the
maximal length of the label names generated by `git rebase
--rebase-merges`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Mark Ruvald Pedersen <mped@demant.com>
---
 Documentation/config/rebase.txt |  6 ++++++
 sequencer.c                     |  8 ++++++--
 t/t3430-rebase-merges.sh        | 11 +++++++++++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/rebase.txt b/Documentation/config/rebase.txt
index afaf6dad99b..9c248accec2 100644
--- a/Documentation/config/rebase.txt
+++ b/Documentation/config/rebase.txt
@@ -77,3 +77,9 @@ rebase.rebaseMerges::
 	equivalent to `--no-rebase-merges`. Passing `--rebase-merges` on the
 	command line, with or without an argument, overrides any
 	`rebase.rebaseMerges` configuration.
+
+rebase.maxLabelLength::
+	When generating label names from commit subjects, truncate the names to
+	this length. By default, the names are truncated to a little less than
+	`NAME_MAX` (to allow e.g. `.lock` files to be written for the
+	corresponding loose refs).
diff --git a/sequencer.c b/sequencer.c
index be837bd2948..b1fc44f0321 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -5349,6 +5349,7 @@ struct label_state {
 	struct oidmap commit2label;
 	struct hashmap labels;
 	struct strbuf buf;
+	int max_label_length;
 };
 
 static const char *label_oid(struct object_id *oid, const char *label,
@@ -5406,7 +5407,7 @@ static const char *label_oid(struct object_id *oid, const char *label,
 	} else {
 		struct strbuf *buf = &state->buf;
 		int label_is_utf8 = 1; /* start with this assumption */
-		size_t max_len = buf->len + GIT_MAX_LABEL_LENGTH;
+		size_t max_len = buf->len + state->max_label_length;
 
 		/*
 		 * Sanitize labels by replacing non-alpha-numeric characters
@@ -5504,7 +5505,8 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 	struct string_entry *entry;
 	struct oidset interesting = OIDSET_INIT, child_seen = OIDSET_INIT,
 		shown = OIDSET_INIT;
-	struct label_state state = { OIDMAP_INIT, { NULL }, STRBUF_INIT };
+	struct label_state state =
+		{ OIDMAP_INIT, { NULL }, STRBUF_INIT, GIT_MAX_LABEL_LENGTH };
 
 	int abbr = flags & TODO_LIST_ABBREVIATE_CMDS;
 	const char *cmd_pick = abbr ? "p" : "pick",
@@ -5512,6 +5514,8 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 		*cmd_reset = abbr ? "t" : "reset",
 		*cmd_merge = abbr ? "m" : "merge";
 
+	git_config_get_int("rebase.maxlabellength", &state.max_label_length);
+
 	oidmap_init(&commit2todo, 0);
 	oidmap_init(&state.commit2label, 0);
 	hashmap_init(&state.labels, labels_cmp, NULL, 0);
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index 96ae0edf1e1..ac5c390652f 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -586,4 +586,15 @@ test_expect_success 'progress shows the correct total' '
 	test_line_count = 14 progress
 '
 
+test_expect_success 'truncate label names' '
+	commit=$(git commit-tree -p HEAD^ -p HEAD -m "0123456789 我 123" HEAD^{tree}) &&
+	git merge --ff-only $commit &&
+
+	done="$(git rev-parse --git-path rebase-merge/done)" &&
+	git -c rebase.maxLabelLength=14 rebase --rebase-merges -x "cp \"$done\" out" --root &&
+	grep "label 0123456789-我$" out &&
+	git -c rebase.maxLabelLength=13 rebase --rebase-merges -x "cp \"$done\" out" --root &&
+	grep "label 0123456789-$" out
+'
+
 test_done
-- 
gitgitgadget

* Re: [PATCH 2/2] rebase: allow overriding the maximal length of the generated labels
From: Junio C Hamano @ 2023-08-10 17:15 UTC
To: Johannes Schindelin via GitGitGadget
Cc: git, Mark Ruvald Pedersen, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> With this change, users can override the compiled-in default for the
> maximal length of the label names generated by `git rebase
> --rebase-merges`.
> ...
> +rebase.maxLabelLength::
> +	When generating label names from commit subjects, truncate the names to
> +	this length. By default, the names are truncated to a little less than
> +	`NAME_MAX` (to allow e.g. `.lock` files to be written for the
> +	corresponding loose refs).

OK.

> @@ -5512,6 +5514,8 @@ static int make_script_with_merges(struct pretty_print_context *pp,
>  		*cmd_reset = abbr ? "t" : "reset",
>  		*cmd_merge = abbr ? "m" : "merge";
>  
> +	git_config_get_int("rebase.maxlabellength", &state.max_label_length);
> +

And it makes sense that the code does not do any check against the
NAME_MAX; presumably the primary mission of this configuration
variable is to help users who know better than their system headers,
and they may need to bust the NAME_MAX limit that is artificially
set too low.

Will queue.  Thanks.
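
To see why the new test expects `0123456789-我` for a limit of 14 and `0123456789-` for a limit of 13, the sanitizing loop from patch 1 can be mirrored in a standalone sketch. The assumptions are substantial: `sanitize_label()` and `utf8_char_len()` are hypothetical stand-ins (the patch itself uses `strbuf` and `utf8_width()`), a caller-supplied buffer replaces the `strbuf`, and the later steps of `label_oid()` such as the `rev-` prefix and the uniquifying suffix are not modeled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for advancing past one multi-byte UTF-8 character: returns
 * its length in bytes, or 0 if the bytes at `s` are malformed. The
 * continuation check also stops at the NUL terminator.
 */
static size_t utf8_char_len(const char *s)
{
	const unsigned char *u = (const unsigned char *)s;
	size_t n, i;

	if (u[0] >= 0xc2 && u[0] <= 0xdf)
		n = 2;
	else if (u[0] >= 0xe0 && u[0] <= 0xef)
		n = 3;
	else if (u[0] >= 0xf0 && u[0] <= 0xf4)
		n = 4;
	else
		return 0;
	for (i = 1; i < n; i++)
		if ((u[i] & 0xc0) != 0x80)
			return 0;
	return n;
}

/*
 * Hypothetical mirror of the patched loop: writes the sanitized,
 * truncated label into `out` (NUL-terminated) and returns its length.
 */
static size_t sanitize_label(const char *label, size_t max_len,
			     char *out, size_t out_size)
{
	size_t len = 0;
	int label_is_utf8 = 1; /* start with this assumption */

	assert(label != NULL && out != NULL && out_size > 0);
	if (max_len > out_size - 1)
		max_len = out_size - 1; /* leave room for the NUL */

	for (; *label && len + 1 < max_len; label++) {
		unsigned char c = (unsigned char)*label;

		if ((c < 0x80 && isalnum(c)) ||
		    (!label_is_utf8 && (c & 0x80))) {
			out[len++] = (char)c;
		} else if (c & 0x80) {
			size_t n = utf8_char_len(label);

			if (n) {
				if (len + n > max_len)
					break; /* whole character must fit */
				memcpy(out + len, label, n);
				len += n;
				label += n - 1;
			} else {
				label_is_utf8 = 0;
				out[len++] = (char)c;
			}
		} else if (len && out[len - 1] != '-') {
			/* avoid leading dash and double-dashes */
			out[len++] = '-';
		}
	}
	out[len] = '\0';
	return len;
}
```

With the subject `0123456789 我 123`, ten digits plus one dash give 11 bytes; the three-byte character brings the total to exactly 14, so it is kept at a limit of 14 and dropped as a whole at a limit of 13.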