From: Jon Forrest <nobozo@gmail.com>
To: git@vger.kernel.org
Subject: Question About Sorting the Index
Date: Fri, 16 May 2025 16:43:37 -0700
Message-ID: <1008ijb$6j0$1@ciao.gmane.io>

I've learned that entries in the index file "are
sorted in ascending order on the name field".

Am I right in thinking that this means that
every time a file is added to the index by
running "git add" the whole index file must
be re-sorted? If so, this seems like a lot of
work, especially since not all the entries
are the same size.

Has any thought been given to improving this,
such as perhaps having an "index index"? This
would be a separate file that contains the name
field of each entry, the location of where the entry
starts in the index, and the length of the entry.
I'll call this a partial index entry.
The "index index" would also be sorted by the name field.

With this approach, running "git add" would simply
append a full index entry to the index, and
append the partial entry to the "index index", which
would then be sorted. The full index would not be
sorted. I'm guessing this is the common path.
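
A minimal sketch of this add path, in Python and purely in memory. The
class and method names are hypothetical, names are plain strings, and
nothing here reflects Git's actual code or on-disk format; it only
illustrates "append the full entry, append the partial entry, sort the
partial entries":

```python
import bisect


class AppendOnlyIndex:
    """Toy model of the proposal; not Git's real index format."""

    def __init__(self):
        self.full = bytearray()   # unsorted "full index", append-only
        self.index_index = []     # partial entries: (name, offset, length)

    def add(self, name, entry):
        """Append the full entry, then record and sort its partial entry."""
        offset = len(self.full)
        self.full += entry                        # no sorting of the full index
        self.index_index.append((name, offset, len(entry)))
        self.index_index.sort(key=lambda p: p[0])  # only the small records move

    def names(self):
        """Names in sorted order, as read from the "index index"."""
        return [p[0] for p in self.index_index]

    def read(self, name):
        """Binary-search the "index index", then slice the full index."""
        names = self.names()
        pos = bisect.bisect_left(names, name)
        if pos == len(names) or names[pos] != name:
            return None
        _, offset, length = self.index_index[pos]
        return bytes(self.full[offset:offset + length])


idx = AppendOnlyIndex()
idx.add("b.txt", b"BBBB")
idx.add("a.txt", b"AA")
print(idx.names())        # sorted by name
print(bytes(idx.full))    # still in append order
```

The sketch ignores re-adding a path that is already present; a real
design would have to decide what happens to the stale full entry.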

To delete a file from the index, I'd propose adding a
"deleted" bit to the full index entry. When "git rm --cached"
is run, 2 things would happen:

1) The "deleted" bit would be turned on in the full index
entry for the file. The index itself will not be sorted.
Every so often, perhaps when "git fsck" is run, these
entries could be deleted. The full index won't have
to be re-sorted when this happens because it won't be
assumed to be in sorted order any longer.

2) The "index index" would be modified by removing the
partial entry for the file. This could be done by
writing the partial entries up to the entry being
deleted, and then the entries following. No sort would
be necessary because the "index index" is already sorted.
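
The two steps above could be sketched as follows, again in Python with
hypothetical names and an in-memory model (a list of dicts for the full
index, a sorted list of (name, position) pairs for the "index index").
The compact() helper stands in for the occasional cleanup; in this toy
model positions shift when deleted entries are dropped, so it rebuilds
the partial entries as well:

```python
import bisect


def remove_cached(full_index, index_index, name):
    """Mark the full entry deleted and drop its partial entry.

    full_index:  list of dicts in append order, with 'name' and 'deleted'.
    index_index: list of (name, position) pairs, sorted by name.
    Returns True if the name was found.
    """
    names = [n for n, _ in index_index]
    i = bisect.bisect_left(names, name)
    if i == len(names) or names[i] != name:
        return False
    _, pos = index_index[i]
    full_index[pos]["deleted"] = True                       # step 1
    # Step 2: keep the entries before and after; order is already sorted.
    index_index[:] = index_index[:i] + index_index[i + 1:]
    return True


def compact(full_index):
    """Drop deleted entries without sorting; rebuild the partial entries."""
    live = [e for e in full_index if not e["deleted"]]
    index_index = sorted((e["name"], pos) for pos, e in enumerate(live))
    return live, index_index


full = [{"name": n, "deleted": False} for n in ("b", "a", "c")]
partial = sorted((e["name"], pos) for pos, e in enumerate(full))
remove_cached(full, partial, "b")
print(partial)            # "b" is gone, order unchanged
full, partial = compact(full)
print([e["name"] for e in full])
```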

One drawback of this approach would be that since the "index index"
entries also won't be the same length, sorting it will still require
extra work. However, this wouldn't be any harder than sorting the full
index, and a lot less data would have to be moved around.

All this is so simple that I suspect that it's been considered before.
Am I missing something?

Cordially,
Jon Forrest

P.S. I'm trying to read the Git source code to get a better handle
on what actually goes on in the index but this is taking some time.