git.vger.kernel.org archive mirror
From: Theodore Tso <tytso@mit.edu>
To: Steven Grimm <koreth@midwinter.com>
Cc: git@vger.kernel.org
Subject: Re: Pruning objects from history?
Date: Sat, 31 Mar 2007 09:11:34 -0400
Message-ID: <20070331131134.GC25539@thunk.org>
In-Reply-To: <460DC0F7.1070607@midwinter.com>

On Fri, Mar 30, 2007 at 07:01:27PM -0700, Steven Grimm wrote:
> I've imported the full history of a large project from Subversion using 
> the latest git-svn. The resulting repo is huge, and I believe it's due 
> in large part to a series of big tar.gz files that got checked into the 
> Subversion repository by mistake early in the project's history. They 
> were subsequently removed from svn, but of course git-svn grabs them and 
> puts them in my local history.
> 
> Is there any way to excise those files? They are of no interest to us 
> now -- they were data files for a third-party application we ended up 
> not using -- and they're making git look bad in the disk usage department.
> 
> I believe this has been asked before in the context of removing 
> copyrighted content from public repositories. However, I have a twist 
> that may make it easier: nobody else has cloned this repository yet. I 
> am free to rewrite history with no risk of messing up any downstream 
> repositories, and I don't have to worry about propagating the deletions 
> out to anyone. I just don't know how to do it (assuming it's doable at all.)

It's painful to rewrite history: each commit records its parent's
commit ID, so once you change one commit you end up needing to rewrite
every single commit after it to fix up the parent commit IDs.
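
As a rough illustration of why the change cascades, here is a minimal
sketch in Python. It assumes a simplified commit body (only tree,
parent, and message; real commits also carry author and committer
lines), and the tree names are placeholders rather than real tree IDs.
The "commit <size>\0" header followed by SHA-1 is how git names
objects; build_history and the sample data are hypothetical.

```python
import hashlib

def commit_id(tree, parent, message):
    """Return a git-style object ID for a simplified commit body."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    # Git hashes "<type> <size>\0" followed by the object body.
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()

def build_history(trees):
    """Chain one commit per tree; each commit names the previous one as parent."""
    ids = []
    parent = None
    for n, tree in enumerate(trees):
        parent = commit_id(tree, parent, f"commit {n}")
        ids.append(parent)
    return ids

# Placeholder tree names; only the second commit's content differs.
original = build_history(["t0", "t1", "t2", "t3"])
rewritten = build_history(["t0", "t1-without-tarballs", "t2", "t3"])

# True where the commit ID changed between the two histories.
changed = [a != b for a, b in zip(original, rewritten)]
print(changed)  # [False, True, True, True]
```

Only the second commit's tree was altered, yet every later commit gets
a new ID because its parent line changed.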

Are you planning on doing a one-shot import, or are you hoping to be
able to do bidirectional gatewaying between svn and git?  If you want
the latter, keeping the gateway working after a history rewrite is
going to be very painful.

If you just want to do a one-way import, it's probably going to be
much easier to modify whatever importer you use to not import the big
files in the first place.
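
A minimal sketch of what such a filter might look like, in Python. It
is not taken from git-svn or any real importer: SKIP_PATTERNS,
keep_path, filter_revision, and the sample paths are all hypothetical,
and it assumes the importer can hand you the list of paths touched by
each revision before it writes anything.

```python
import fnmatch

# Hypothetical: glob patterns for files the import should leave out.
SKIP_PATTERNS = ["*.tar.gz"]

def keep_path(path, patterns=SKIP_PATTERNS):
    """Return True if the path matches none of the skip patterns."""
    return not any(fnmatch.fnmatch(path, p) for p in patterns)

def filter_revision(paths):
    """Drop unwanted paths from one revision's list of changed files."""
    return [p for p in paths if keep_path(p)]

# Hypothetical revision contents.
revision = ["src/main.c", "data/vendor-dump.tar.gz", "README"]
print(filter_revision(revision))  # ['src/main.c', 'README']
```

Filtering at import time means the big files never become objects in
the repository, so there is nothing to rewrite afterwards.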

						- Ted
