From: Steven Grimm <koreth@midwinter.com>
To: Junio C Hamano <junkio@cox.net>
Cc: Daniel Barkalow <barkalow@iabervon.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] Add --no-reuse-delta option to git-gc
Date: Wed, 09 May 2007 02:02:28 -0700
Message-ID: <46418E24.9020309@midwinter.com>
In-Reply-To: <7v3b26xvjo.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> I think that sounds saner and more user friendly than specific
> knob to tune "window", "depth" and friends which are too
> technical.  It has an added attraction that we can redefine what
> exactly "hard" means later.
>   

On that note, has any thought been given to looking at other compression 
algorithms? Gzip is a great high-speed compressor, but there are others 
out there (some a bit slower, some much slower at both compression and 
decompression) that produce substantially smaller output.

One could even, if one were in a particularly twisted state of mind, 
envision using CPU-intensive compression for less frequently accessed 
objects and using gzip for active ones, on the theory that the best 
time/space tradeoff is not uniform across all the objects in a git 
repository. Presumably most of us never actually unpack the vast 
majority of objects in a git repository of reasonable age, so the fact 
that it'd take a little longer if we *did* want to unpack them isn't 
much of a downside compared to the upside of reclaiming disk space. That 
would mitigate the impact of using an algorithm that's slow at 
decompression.
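
As a rough sketch of the tiering idea above (not how git actually stores objects), the Python below tags each blob with the codec that produced it, so rarely read objects can go through a slower, denser compressor while active ones stay on zlib. The one-byte tags, the `pack`/`unpack` names, and the choice of LZMA as the "CPU-intensive" codec are all assumptions made for illustration.

```python
import lzma
import zlib

# Hypothetical one-byte tags recording which codec produced a blob.
HOT_TAG = b"Z"   # zlib (deflate): fast to compress and decompress
COLD_TAG = b"X"  # LZMA: slower, usually denser on large inputs


def pack(data: bytes, cold: bool) -> bytes:
    """Compress data, choosing the codec by how often the object is read."""
    if cold:
        return COLD_TAG + lzma.compress(data, preset=9)
    return HOT_TAG + zlib.compress(data, 6)


def unpack(blob: bytes) -> bytes:
    """Decompress a blob written by pack(), dispatching on its tag."""
    tag, payload = blob[:1], blob[1:]
    if tag == COLD_TAG:
        return lzma.decompress(payload)
    if tag == HOT_TAG:
        return zlib.decompress(payload)
    raise ValueError(f"unknown codec tag: {tag!r}")
```

Because the tag travels with the blob, a reader pays the slower decompression cost only when it actually touches a cold object.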

I think it'd be kind of neat to have my .git directory shrink by another 
20+%. That's conservative; on maximumcompression.com's test of a mix of 
different file types including images, gzip reduces the size by 64% and 
the best-scoring compressor by 80%. On English text gzip manages 71% and 
the top scorer 89%. Most of the top-tier compressors are proprietary, but 
there are some open-source ones that do pretty well.
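
To make the arithmetic behind those figures explicit, here is a small sketch. It assumes the percentages are size reductions relative to the uncompressed input, and it ignores git's delta compression, so it says nothing about what a real .git directory would gain. The function name `extra_shrink` is made up for this example.

```python
def extra_shrink(old_reduction: float, new_reduction: float) -> float:
    """Fraction by which output shrinks when switching compressors.

    Both arguments are size reductions relative to the raw input,
    e.g. 0.64 means the compressed output is 36% of the original.
    """
    old_size = 1.0 - old_reduction
    new_size = 1.0 - new_reduction
    return 1.0 - new_size / old_size


# Mixed file types: 64% -> 80% reduction, output goes from 36% to 20% of raw.
print(f"{extra_shrink(0.64, 0.80):.0%}")  # about 44% smaller
# English text: 71% -> 89% reduction, output goes from 29% to 11% of raw.
print(f"{extra_shrink(0.71, 0.89):.0%}")  # about 62% smaller
```

Under those assumptions the best-case savings sit well above the 20% quoted, which leaves room for open-source compressors that fall short of the top scorers.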

Maybe not worth the added complexity, but I thought I'd toss it out 
there. It probably makes more sense (if it makes any at all) after 
Linus's suggestion to not unpack after cloning is in place. Once the 
upstream has gone to the trouble of CPU-intensive compressing, you 
certainly don't want to force clones to have to spend the time repeating 
the same work.

-Steve (who suspects this is a "yes, we talked this over early in git's 
history" question, but what the heck)
