* Re: Git status extremely slow if any file is a multiple of 8GBi
@ 2022-05-04 17:47 Jason Hatton
From: Jason Hatton @ 2022-05-04 17:47 UTC (permalink / raw)
  To: git@vger.kernel.org; +Cc: Junio C Hamano

Junio C Hamano <gitster@pobox.com> writes:

> > The condition sd_size==0 is used as a signal for "no, we really need
> > to compare the contents", and causes the contents to be hashed, and
> > if the contents match the object name recorded in the index, the
> > on-disk size is stored in sd_size and the entry is marked as
> > CE_UPTODATE.  Alas, if the truncated st_size is 0, the resulting
> > entry would have sd_size==0 again, so a workaround like what you
> > outlined is needed.
>
> This is of secondary importance, but the fact that Jason observed
> 8GBi files gets hashed over and over unnecessarily means that we
> would do the same for an empty file, opening, reading 0-bytes,
> hashing, and closing, without taking advantage of the fact that
> CE_UPTODATE bit says the file contents should be up-to-date with
> respect to the cached object name, doesn't it?
>
> Or do we have "if st_size == 0 and sd_size == 0 then we know what it
> hashes to (i.e. EMPTY_BLOB_SHA*) and there is no need to do the
> usual open-read-hash-close dance" logic (I didn't check)?

As best as I can tell, git does recheck zero-sized files on every run. My Linux
box can run git ls in 0.006 seconds with 1000 zero-sized files in the repo, so
that cost is hard to notice. Rehashing every file whose size is a multiple of
2^32 bytes with every "git ls", on the other hand...
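
To illustrate why sizes that are a multiple of 2^32 bytes collide with the
sd_size==0 signal described above, here is a minimal sketch. It assumes the
size is kept in a 32-bit field; the helper names truncate_size() and
forces_rehash() are made up for this example and are not taken from git's
source.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not git's code): mimics storing a 64-bit
 * st_size in a 32-bit size field by keeping only the low 32 bits.
 */
static uint32_t truncate_size(uint64_t st_size)
{
	return (uint32_t)st_size;
}

/*
 * Hypothetical helper: returns 1 when a non-empty file would be
 * recorded with a size of 0, which is indistinguishable from the
 * "contents must be compared" signal.
 */
static int forces_rehash(uint64_t st_size)
{
	return st_size != 0 && truncate_size(st_size) == 0;
}
```

With these definitions, an 8GBi file (8ULL << 30 bytes) truncates to 0 and so
looks like it needs rehashing every time, while a file one byte larger does
not.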

I managed to compile git with the proposed changes. They seem to correct the
problem, and "make test" passes. After upgrading to the patched version of git,
git will rehash the 8GBi files once and then work normally. After downgrading to
an unpatched version, git will perceive the 8GBi files as changed. This needs
to be corrected with "git add" or "git checkout". If people are interested, I
may be able to find a way to send a patch to the list or put it on GitHub.

Thanks
Jason D. Hatton

