From: "Shawn O. Pearce" <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/4] External 'filter' attributes and drivers
Date: Sun, 22 Apr 2007 01:20:08 -0400
Message-ID: <20070422052008.GH17480@spearce.org>
In-Reply-To: <11771520591529-git-send-email-junkio@cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> I know this is controversial, but here is a small four patch
> series to let you insert arbitrary external filter in checkin
> and checkout codepath.

This series is some pretty nice work.

But I really don't think we want filters. I'm very against them,
and I'm also very against the CRLF work that has already been
added. Since the CRLF ship has sailed I won't try to call it back
to port. But I don't want to see the filter stuff weigh anchor...

I've only really seen a few arguments for the filters:

 1) Better compress structured content (e.g. ODF) by storing the
    ZIP as a tree, allowing normal deltification within packfiles
    to apply to the contained files.

 2) Use a custom diff function on special files (e.g. ODF) as they
    are otherwise unreadable with the internal xdiff based engine.

 3) Mutate content prior to extracting from the tree, e.g. printing.

Let me try to address these points.

#1: There are a limited number of content formats that we could
reasonably filter into the repository such that the standard
deltification routines will have good space/performance benefits.
Most of them today are ZIP archives (e.g. ODF, JAR).

Why don't we just teach the packfile format how to better compress
these types of streams? Let read_sha1_file() and pack-objects do all
of the heavy translation work, just as they do today for text files.
Explode them into a "tree-like" thing that allows deltification
against any other content (even cross ZIP streams) just like we
do with trees, but always expose them to the working directory
level of the system as blobs.

This way we never get into the mess that David Lang pointed out
where we have many optimizations that reuse working tree files when
stat data matches; nor do we have to worry about major structural
differences between the working tree (1 file) and the repository
format (exploded ZIP as 10,000 files).
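
To make the "explode into a tree-like thing" idea concrete, here is a
rough Python sketch. It is only an illustration under stated
assumptions, not how git or any packfile code works: the helper names
(build_zip, explode) and the sample member names are made up, and a
plain SHA-1 of the uncompressed bytes stands in for a real object id.
It shows that two archives which differ as byte streams can still
share identical members once they are unpacked, which is what would
give a delta or dedup pass something to work with.

```python
import hashlib
import io
import zipfile


def build_zip(members):
    """Build an in-memory ZIP from {name: bytes}; stands in for an ODF or JAR."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(members.items()):
            zf.writestr(name, data)
    return buf.getvalue()


def explode(zip_bytes):
    """Map each member name to the SHA-1 of its *uncompressed* content.

    Plain SHA-1 is a stand-in here; it is not git's blob id.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: hashlib.sha1(zf.read(info)).hexdigest()
            for info in zf.infolist()
        }


# Two hypothetical document revisions that share one large member.
styles = b"<styles>shared boilerplate</styles>" * 50
v1 = build_zip({"content.xml": b"<doc>first draft</doc>", "styles.xml": styles})
v2 = build_zip({"content.xml": b"<doc>second draft</doc>", "styles.xml": styles})

tree1, tree2 = explode(v1), explode(v2)

# Members whose uncompressed content is identical in both archives.
unchanged = sorted(n for n in tree1 if tree2.get(n) == tree1[n])
print(unchanged)
```

The working tree would still see only v1 and v2 as single files; the
per-member view exists purely on the storage side in this sketch.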

#2: We already support using any diff tool you want: set the
GIT_EXTERNAL_DIFF environment variable before running a program that
generates a diff. As Junio pointed out on #git tonight, that could
be any shell script that decides how to produce the diff based on
its own logic. Though we could also use the new attribute stuff
to select diff programs, much like we do now for merge conflict
resolution in merge-recursive.
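
A driver of that kind might look like the sketch below, written in
Python rather than shell. The seven-argument calling convention (path
old-file old-hex old-mode new-file new-hex new-mode) is the one
documented for GIT_EXTERNAL_DIFF in git(1); everything else is an
assumption for illustration, in particular the tool names
"odt-textdiff" and "jar-listdiff", which are placeholders and not
programs git ships or knows about.

```python
import os
import subprocess
import sys

# Hypothetical extension-to-tool table; these tool names are placeholders.
TOOLS = {
    ".odt": ["odt-textdiff"],
    ".jar": ["jar-listdiff"],
}
DEFAULT = ["diff", "-u"]


def pick_command(path, old_file, new_file):
    """Return the argv to run for this path, chosen by file extension."""
    ext = os.path.splitext(path)[1].lower()
    return TOOLS.get(ext, DEFAULT) + [old_file, new_file]


def main(argv):
    # git passes: path old-file old-hex old-mode new-file new-hex new-mode
    if len(argv) != 7:
        # Other argument counts (e.g. a lone path for an unmerged entry)
        # are simply skipped in this sketch.
        return 0
    path, old_file, _old_hex, _old_mode, new_file, _new_hex, _new_mode = argv
    # diff-like tools exit non-zero when files differ; that is not an error here.
    subprocess.call(pick_command(path, old_file, new_file))
    return 0


# Only act when arguments were actually supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Assuming the script is saved somewhere executable, it would be used
along the lines of `GIT_EXTERNAL_DIFF=/path/to/driver.py git diff`,
where the path is whatever location you chose.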

#3: This has already been discussed at length on the list.
Letting the build system perform this sort of work is better than
making the VCS do it; especially when you want the VCS to do its
sole job well (track the state of the working directory) and the
build system do its sole job well (produce files suitable for use
outside of the repository).

So despite the fact that I tried to make 4/4 shorter, I really
don't think we should be doing this...

--
Shawn.