From: Junio C Hamano <gitster@pobox.com>
To: Phillip Wood <phillip.wood123@gmail.com>
Cc: Jeff King <peff@peff.net>,
	 git@vger.kernel.org,  Patrick Steinhardt <ps@pks.im>,
	 correctmost <cmlists@sent.com>,  Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH v2 4/9] cache-tree: avoid strtol() on non-string buffer
Date: Sun, 23 Nov 2025 10:06:50 -0800
Message-ID: <xmqqh5ukzkqt.fsf@gitster.g>
In-Reply-To: <633f4d92-c258-45a8-9d32-116c94838e68@gmail.com> (Phillip Wood's message of "Sun, 23 Nov 2025 15:51:57 +0000")

Phillip Wood <phillip.wood123@gmail.com> writes:

> All we need to do to accept a single minus sign is s/while/if/
> ...
> If we limit ourselves to accepting a single minus sign then this can become
> 	if (s == *ptr + (sign == -1))
>
> so we need very little in the way of extra code.
> ...
> A generic helper to replace strtol() that takes a length rather than 
> assuming the input is NUL terminated could be useful elsewhere but I'm 
> not sure we need something that complicated here. I do like the fact 
> that overflow does not cause undefined behavior though. Changing ret for 
> "int" to "unsigned" in peff's patch should fix that.
> 
> Thanks

Perhaps.
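
To make the shape of the idea quoted above concrete, here is a minimal
sketch of a length-bounded decimal parser that accepts at most one
leading minus sign and accumulates into an unsigned value so that
overflow never causes undefined behavior.  The name sketch_parse_int()
and its signature are made up for illustration; this is not the code
from the patch under discussion.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): parse a decimal integer
 * from the bytes in [*ptr, end) without requiring NUL termination.
 * On success, store the value in *out, advance *ptr past the digits
 * and return true.  On failure, leave *ptr and *out untouched.
 */
static bool sketch_parse_int(const char **ptr, const char *end, int *out)
{
	const char *s = *ptr;
	int sign = 1;
	unsigned int val = 0;

	/* "if", not "while": accept at most one minus sign */
	if (s < end && *s == '-') {
		sign = -1;
		s++;
	}

	while (s < end && *s >= '0' && *s <= '9') {
		unsigned int digit = (unsigned int)(*s - '0');

		/* would val * 10 + digit exceed UINT_MAX? */
		if (val > (UINT_MAX - digit) / 10)
			return false;
		val = val * 10 + digit;
		s++;
	}

	/* no digits were consumed after the optional sign */
	if (s == *ptr + (sign == -1))
		return false;

	if (sign == 1) {
		if (val > (unsigned int)INT_MAX)
			return false;
		*out = (int)val;
	} else {
		if (val > (unsigned int)INT_MAX + 1u)
			return false;
		/* INT_MIN has no positive counterpart; handle it apart */
		*out = (val == (unsigned int)INT_MAX + 1u) ? INT_MIN : -(int)val;
	}
	*ptr = s;
	return true;
}
```

The caller passes the end of the buffer instead of relying on a
terminating NUL, which is the property strtol() lacks.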

By the way, an interesting tangent is this.

The only reason why these fields under discussion are stored in
textual decimal is pretty much the same as the reason why the object
header expresses the byte-length of the payload in textual decimal,
i.e., to be independent of the platform's natural implementation of
the "int" type (e.g., endianness and width).  But unlike object files,
the index is a local matter (we are prepared for the same directory
being accessed over NFS from two platforms with different endianness,
but we do not recommend network access to a repository in the first
place).  And much more importantly, the total number of index
entries contained within an index file is capped at 2^32-1 (the
header has a 32-bit count in network byte order).  The total
number of subdirectories within a directory, or the total number of
entries for a level of directory hierarchy that would form a tree
object from a slice of the index, cannot exceed that number anyway.

And thanks to the design that made cache-tree an optional index
extension, we can make cache-tree version 2 where the in-core
representation is exactly the same as the current one, but only uses
different serialization when writing to and reading from the index
file.  The new serialization can use 32-bit network byte order
integers, or use our own varint.{c,h,rs}, to record these numbers.
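
As a rough illustration of the first option, a fixed-width count in
network byte order could be written and read along these lines.  The
helper names sketch_store_be32() and sketch_load_be32() are made up
for this example, and nothing here defines an actual on-disk format;
it only shows that the reader needs a length check and no text
parsing at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical writer: emit v as 4 bytes, most significant first. */
static void sketch_store_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/*
 * Hypothetical reader: take 4 bytes from [*ptr, end) and advance
 * *ptr.  Returns false, leaving *ptr alone, if fewer than 4 bytes
 * remain.  The result does not depend on the host's endianness.
 */
static bool sketch_load_be32(const unsigned char **ptr,
			     const unsigned char *end, uint32_t *out)
{
	const unsigned char *p = *ptr;

	if ((size_t)(end - p) < 4)
		return false;
	*out = ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) |
	       (uint32_t)p[3];
	*ptr = p + 4;
	return true;
}
```

A varint encoding would trade the fixed 4 bytes for a shorter
representation of small counts, at the cost of a slightly more
involved reader.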

A version of Git that knows about that extension could be taught to
read from the current cache-tree and convert to a new version, but
better yet, it can simply ignore the current cache-tree data in the
file, and write the new version when we do need to write the index
out with a cache-tree.  When such a transparent auto conversion
happens, one single invocation of write_index_as_tree() would become
more expensive than usual: the last invocation of the current Git
left cache-tree data in the index, and usually the next invocation
of Git would take advantage of it when it writes a tree, but a new
version of Git that uses the v2 format would behave as if there is
no cache-tree data in the index and build the tree from scratch.
After that happens, the cache-tree data in the new format will be
reused and things will continue to work.  You could use an older
version of Git on such an index file and the same transparent auto
conversion will take care of the transition.
