git.vger.kernel.org archive mirror
* Pruning objects from history?
@ 2007-03-31  2:01 Steven Grimm
  2007-03-31  2:08 ` Shawn O. Pearce
  2007-03-31 13:11 ` Theodore Tso
  0 siblings, 2 replies; 4+ messages in thread
From: Steven Grimm @ 2007-03-31  2:01 UTC
  To: git

I've imported the full history of a large project from Subversion using 
the latest git-svn. The resulting repo is huge, and I believe it's due 
in large part to a series of big tar.gz files that got checked into the 
Subversion repository by mistake early in the project's history. They 
were subsequently removed from svn, but of course git-svn grabs them and 
puts them in my local history.

Is there any way to excise those files? They are of no interest to us 
now -- they were data files for a third-party application we ended up 
not using -- and they're making git look bad in the disk usage department.

I believe this has been asked before in the context of removing 
copyrighted content from public repositories. However, I have a twist 
that may make it easier: nobody else has cloned this repository yet. I 
am free to rewrite history with no risk of messing up any downstream 
repositories, and I don't have to worry about propagating the deletions 
out to anyone. I just don't know how to do it (assuming it's doable at all.)

Thanks!

-Steve


* Re: Pruning objects from history?
  2007-03-31  2:01 Pruning objects from history? Steven Grimm
@ 2007-03-31  2:08 ` Shawn O. Pearce
  2007-03-31 13:11 ` Theodore Tso
  1 sibling, 0 replies; 4+ messages in thread
From: Shawn O. Pearce @ 2007-03-31  2:08 UTC
  To: Steven Grimm; +Cc: git

Steven Grimm <koreth@midwinter.com> wrote:
> Is there any way to excise those files? They are of no interest to us 
> now -- they were data files for a third-party application we ended up 
> not using -- and they're making git look bad in the disk usage department.

Try cg-admin-rewritehist (in Cogito)?

-- 
Shawn.


* Re: Pruning objects from history?
  2007-03-31  2:01 Pruning objects from history? Steven Grimm
  2007-03-31  2:08 ` Shawn O. Pearce
@ 2007-03-31 13:11 ` Theodore Tso
  2007-03-31 16:18   ` Linus Torvalds
  1 sibling, 1 reply; 4+ messages in thread
From: Theodore Tso @ 2007-03-31 13:11 UTC
  To: Steven Grimm; +Cc: git

On Fri, Mar 30, 2007 at 07:01:27PM -0700, Steven Grimm wrote:
> I've imported the full history of a large project from Subversion using 
> the latest git-svn. The resulting repo is huge, and I believe it's due 
> in large part to a series of big tar.gz files that got checked into the 
> Subversion repository by mistake early in the project's history. They 
> were subsequently removed from svn, but of course git-svn grabs them and 
> puts them in my local history.
> 
> Is there any way to excise those files? They are of no interest to us 
> now -- they were data files for a third-party application we ended up 
> not using -- and they're making git look bad in the disk usage department.
> 
> I believe this has been asked before in the context of removing 
> copyrighted content from public repositories. However, I have a twist 
> that may make it easier: nobody else has cloned this repository yet. I 
> am free to rewrite history with no risk of messing up any downstream 
> repositories, and I don't have to worry about propagating the deletions 
> out to anyone. I just don't know how to do it (assuming it's doable at all.)

It's painful to rewrite history, since you end up needing to rewrite
every single commit after the point where you've tampered with time to
fix up the parent commit ID.
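
To illustrate why every later commit has to be rewritten, here is a
minimal Python sketch. It is a toy model, not git's real object
format: commit_id() and build_chain() are hypothetical helpers, and
the tree names are made-up strings. The only property it borrows from
git is that a commit's ID is a hash over content that includes its
parent's ID.

```python
import hashlib

def commit_id(tree, parent, message):
    """Hash a toy commit; the parent ID is part of the hashed content."""
    body = f"tree {tree}\nparent {parent}\n\n{message}\n".encode()
    return hashlib.sha1(body).hexdigest()

def build_chain(trees):
    """Return the commit IDs of a linear history, oldest first."""
    ids, parent = [], ""
    for n, tree in enumerate(trees):
        parent = commit_id(tree, parent, f"commit {n}")
        ids.append(parent)
    return ids

# Same history twice; only the second commit's tree differs.
original = build_chain(["t0", "t1-with-tarball", "t2", "t3"])
rewritten = build_chain(["t0", "t1-clean", "t2", "t3"])

# For each position, did the commit ID change?
changed = [a != b for a, b in zip(original, rewritten)]
print(changed)
```

The first commit keeps its ID, while the edited commit and every
commit after it get new IDs, even though the later trees are
untouched.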

Are you planning on doing a one-shot import, or are you hoping to be
able to do bidirectional gatewaying between svn and git?  If you want
to do the latter, rewriting history is going to be very painful if you
want the bidirectional gateway to work afterwards.

If you just want to do a one-way import, it's probably going to be
much easier to modify whatever importer you use to not import the big
files in the first place.

						- Ted


* Re: Pruning objects from history?
  2007-03-31 13:11 ` Theodore Tso
@ 2007-03-31 16:18   ` Linus Torvalds
  0 siblings, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2007-03-31 16:18 UTC
  To: Theodore Tso; +Cc: Steven Grimm, git



On Sat, 31 Mar 2007, Theodore Tso wrote:
> 
> It's painful to rewrite history, since you end up needing to rewrite
> every single commit after the point where you've tampered with time to
> fix up the parent commit ID.

Well, if you don't ever need to actually rewrite blob objects, and many of 
the trees end up being the same, history rewriting is actually pretty 
cheap. If it mostly just has to rewrite the commits and a few trees, you 
end up having (even for big projects) just a few hundred thousand easy 
objects to rewrite. It's going to take a minute or two at most.

"git-convert-objects" has most of the logic, so some trivial added code to 
"convert" tree objects by removing certain entries should just do it. But 
yeah, it would involve some real changes. cg-admin-rewritehist should be 
able to do it already, although I suspect it would be slower (but for a 
one-shot thing, nobody probably cares, and changing convert-objects is 
probably going to take more time than just running the rewritehist 
scripts).
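
A rough sketch of the idea of "converting" trees by dropping certain
entries, in Python. This is a toy content-addressed store, not
git-convert-objects or cg-admin-rewritehist; put(), prune_tree() and
rewrite() are hypothetical names, objects are JSON rather than git's
format, and entries are matched by name only. It is meant to show
that blobs are never rewritten and that unchanged trees keep their
IDs, so only commits (and the affected trees) get new objects.

```python
import hashlib
import json

store = {}  # object ID -> (kind, payload)

def put(kind, payload):
    """Store an object under the hash of its content and return its ID."""
    data = json.dumps([kind, payload], sort_keys=True).encode()
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = (kind, payload)
    return oid

def prune_tree(tree_id, unwanted, memo):
    """Return the ID of this tree with unwanted entry names removed."""
    if tree_id not in memo:
        entries = store[tree_id][1]
        kept = {}
        for name, (etype, oid) in entries.items():
            if name in unwanted:
                continue
            if etype == "tree":
                oid = prune_tree(oid, unwanted, memo)
            kept[name] = [etype, oid]
        # Identical content hashes to the identical ID, so an
        # unchanged tree is reused rather than duplicated.
        memo[tree_id] = put("tree", kept)
    return memo[tree_id]

def rewrite(commit_ids, unwanted):
    """Rewrite a linear history (oldest first); return new commit IDs."""
    memo, new_ids, parent = {}, [], None
    for cid in commit_ids:
        c = store[cid][1]
        tree = prune_tree(c["tree"], unwanted, memo)
        parent = put("commit", {"tree": tree, "parent": parent, "msg": c["msg"]})
        new_ids.append(parent)
    return new_ids

# Tiny history: a tarball is added in c1 and removed again in c2.
src = put("blob", "int main(){}")
big = put("blob", "pretend this is a huge tarball")
sub = put("tree", {"main.c": ["blob", src]})
t0 = put("tree", {"src": ["tree", sub]})
t1 = put("tree", {"src": ["tree", sub], "data.tar.gz": ["blob", big]})
c0 = put("commit", {"tree": t0, "parent": None, "msg": "init"})
c1 = put("commit", {"tree": t1, "parent": c0, "msg": "add data"})
c2 = put("commit", {"tree": t0, "parent": c1, "msg": "remove data"})

before = set(store)
new = rewrite([c0, c1, c2], {"data.tar.gz"})
kinds = sorted(store[o][0] for o in set(store) - before)
print(kinds)
```

In this toy history the rewrite writes two new commits and nothing
else: the pruned tree for c1 turns out to be identical to t0, and the
first commit is unchanged. The tarball blob is no longer reachable
from the new tip; actually reclaiming the space is a separate step
that this sketch does not model.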

> If you just want to do a one-way import, it's probably going to be
> much easier to modify whatever importer you use to not import the big
> files in the first place.

I'd actually like cg-admin-rewritehist to be merged into git. I think it's 
one of the few things that cogito does that native git doesn't do.

		Linus

