* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Junio C Hamano @ 2006-06-14 20:27 UTC
To: Martin Langhoff; +Cc: Florian Forster, git
In-Reply-To: <46a038f90606111502g607be3cfnf83ce81764a5f909@mail.gmail.com>
"Martin Langhoff" <martin.langhoff@gmail.com> writes:
> Florian,
>
> Looks good! git-blame/git-annotate are quite expensive to run. Do you
> think it would make sense making it conditional on a git-repo-config
> option (gitweb.blame=1)?
>
> kernel.org is the flagship user for gitweb, so expensive options
> should default to off :-/
Seconded. Thanks Florian and Martin.
* Re: [PATCH] auto-detect changed $prefix in Makefile and properly rebuild to avoid broken install
From: Junio C Hamano @ 2006-06-14 20:04 UTC
To: Yakov Lerner; +Cc: git
In-Reply-To: <0J0V00LDT7B9BU00@mxout2.netvision.net.il>
Yakov Lerner <iler.ml@gmail.com> writes:
> Many times, I mistakenly ran 'make prefix=... install' with a prefix value
> different from the one used during the build. This resulted in a broken
> install. This patch adds auto-detection of $prefix changes to the Makefile,
> so the install is correct whenever the prefix is changed.
>
> Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
I do not mind this per se, and probably even agree that this is
an improvement compared to the current state of affairs, but a few
points:
- please make sure you clean that state file in "make clean";
- we may want to make the state file a bit more visible (IOW, I
somewhat do mind the name being dot-git-dot-prefix).
- we might want to later (or at the same time as this patch)
do "consistent set of compilation flags" (e.g. run early
part of compilation with openssl SHA-1 implementation,
interrupt it and build and link the rest with mozilla SHA-1
implementation -- then you will get a nonsense binary without
linker errors). It might make sense to prepare this
mechanism so we could reuse it for that purpose.
* Re: oprofile on svn import
From: Jakub Narebski @ 2006-06-14 19:38 UTC
To: git
In-Reply-To: <9e4733910606141225n11b406fte6229ea9993825dd@mail.gmail.com>
Jon Smirl wrote:
> Stats after 18 hours into git-svnimport. Process is now stuck in the
> kernel 64% of the time. All of the kernel time is in page management.
> Perl svnimport process is 290MB now.
>
> My top candidates for causing the problem are the fork in the perl
> code or the execing of a million tiny git processes.
>
> The key low level git functions could be made into a library to avoid
> the need to exec them continuously. The svn functions are libraries
> and they hardly show up.
There is an ongoing effort to convert git commands into builtins.
Still, you would need to translate the git-svnimport Perl code into C,
or somehow access a git library from Perl.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* [PATCH] auto-detect changed $prefix in Makefile and properly rebuild to avoid broken install
From: Yakov Lerner @ 2006-06-14 19:26 UTC
To: git; +Cc: iler.ml
Many times, I mistakenly ran 'make prefix=... install' with a prefix value
different from the one used during the build. This resulted in a broken
install. This patch adds auto-detection of $prefix changes to the Makefile,
so the install is correct whenever the prefix is changed.
Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
---
Makefile | 29 ++++++++++++++++++++++-------
1 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/Makefile b/Makefile
index 2a1e639..015c9b2 100644
--- a/Makefile
+++ b/Makefile
@@ -464,6 +464,7 @@ DESTDIR_SQ = $(subst ','\'',$(DESTDIR))
bindir_SQ = $(subst ','\'',$(bindir))
gitexecdir_SQ = $(subst ','\'',$(gitexecdir))
template_dir_SQ = $(subst ','\'',$(template_dir))
+prefix_SQ = $(subst ','\'',$(prefix))
SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
@@ -484,7 +485,7 @@ all:
strip: $(PROGRAMS) git$X
$(STRIP) $(STRIP_OPTS) $(PROGRAMS) git$X
-git$X: git.c common-cmds.h $(BUILTIN_OBJS) $(GITLIBS)
+git$X: git.c common-cmds.h $(BUILTIN_OBJS) $(GITLIBS) .git.prefix
$(CC) -DGIT_VERSION='"$(GIT_VERSION)"' \
$(ALL_CFLAGS) -o $@ $(filter %.c,$^) \
$(BUILTIN_OBJS) $(ALL_LDFLAGS) $(LIBS)
@@ -516,7 +517,7 @@ common-cmds.h: Documentation/git-*.txt
chmod +x $@+
mv $@+ $@
-$(patsubst %.py,%,$(SCRIPT_PYTHON)) : % : %.py
+$(patsubst %.py,%,$(SCRIPT_PYTHON)) : % : %.py .git.prefix
rm -f $@ $@+
sed -e '1s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
-e 's|@@GIT_PYTHON_PATH@@|$(GIT_PYTHON_DIR_SQ)|g' \
@@ -540,19 +541,19 @@ git$X git.spec \
$(patsubst %.py,%,$(SCRIPT_PYTHON)) \
: GIT-VERSION-FILE
-%.o: %.c
+%.o: %.c .git.prefix
$(CC) -o $*.o -c $(ALL_CFLAGS) $<
%.o: %.S
$(CC) -o $*.o -c $(ALL_CFLAGS) $<
-exec_cmd.o: exec_cmd.c
+exec_cmd.o: exec_cmd.c .git.prefix
$(CC) -o $*.o -c $(ALL_CFLAGS) '-DGIT_EXEC_PATH="$(gitexecdir_SQ)"' $<
-http.o: http.c
+http.o: http.c .git.prefix
$(CC) -o $*.o -c $(ALL_CFLAGS) -DGIT_USER_AGENT='"git/$(GIT_VERSION)"' $<
ifdef NO_EXPAT
-http-fetch.o: http-fetch.c http.h
+http-fetch.o: http-fetch.c http.h .git.prefix
$(CC) -o $*.o -c $(ALL_CFLAGS) -DNO_EXPAT $<
endif
@@ -609,6 +610,14 @@ tags:
rm -f tags
find . -name '*.[hcS]' -print | xargs ctags -a
+### Detect prefix changes
+.git.prefix: .FORCE-git.prefix
+ @PREFIXES='$(bindir_SQ):$(gitexecdir_SQ):$(template_dir_SQ):$(prefix_SQ)';\
+ if test x"$$PREFIXES" != x"`cat .git.prefix 2>/dev/null`" ; then \
+ echo 1>&2 " * prefix changed"; \
+ echo "$$PREFIXES" >.git.prefix; \
+ fi
+
### Testing rules
# GNU make supports exporting all variables by "export" without parameters.
@@ -632,6 +641,12 @@ test-dump-cache-tree$X: dump-cache-tree.
check:
for i in *.c; do sparse $(ALL_CFLAGS) $(SPARSE_FLAGS) $$i || exit; done
+test-prefix-change:
+ mkdir -p "`pwd`/tmp1" "`pwd`/tmp2"
+ $(MAKE) clean install prefix="`pwd`/tmp1"
+ $(MAKE) install prefix="`pwd`/tmp2"
+ @grep -r "`pwd`/tmp1" "`pwd`/tmp2" >/dev/null; if test $$? = 0 ; then\
+ echo Error, test failed; exit 1; else echo Ok, test passed; fi
### Installation rules
@@ -714,7 +729,7 @@ clean:
rm -f GIT-VERSION-FILE
.PHONY: all install clean strip
-.PHONY: .FORCE-GIT-VERSION-FILE TAGS tags
+.PHONY: .FORCE-GIT-VERSION-FILE TAGS tags .FORCE-git.prefix
### Check documentation
#
--
1.4.0
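The .git.prefix recipe in this patch is a compare-and-rewrite stamp: the file is rewritten, and its mtime bumped, only when the recorded string changes, so make rebuilds the objects that depend on it only after a real prefix change. Below is a minimal C sketch of that logic, assuming a hypothetical helper named update_stamp (it is not part of git). The same stamp could just as well record compilation flags, as Junio suggests in his reply.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the .git.prefix recipe: compare
 * `current` with the stamp file's contents and rewrite the file only
 * when they differ (or when it does not exist yet), so its mtime moves
 * only on a real change. Returns 1 if rewritten, 0 if unchanged,
 * -1 on I/O error.
 */
static int update_stamp(const char *path, const char *current)
{
	char old[4096] = "";
	int existed = 0;
	FILE *f = fopen(path, "r");

	if (f) {
		size_t n = fread(old, 1, sizeof(old) - 1, f);

		old[n] = '\0';
		fclose(f);
		existed = 1;
		/* drop the trailing newline, as `cat` inside backticks would */
		n = strlen(old);
		while (n > 0 && old[n - 1] == '\n')
			old[--n] = '\0';
	}
	if (existed && strcmp(old, current) == 0)
		return 0;	/* unchanged: leave the mtime alone */

	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s\n", current);
	return fclose(f) == 0 ? 1 : -1;
}
```

A caller would build the same colon-separated bindir:gitexecdir:template_dir:prefix string the recipe uses and treat a return value of 1 as "everything depending on the stamp is stale".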
* Re: oprofile on svn import
From: Jon Smirl @ 2006-06-14 19:25 UTC
To: git
In-Reply-To: <9e4733910606131932w362c6ddcx5bf36ea5591feba1@mail.gmail.com>
Stats after 18 hours into git-svnimport. Process is now stuck in the
kernel 64% of the time. All of the kernel time is in page management.
Perl svnimport process is 290MB now.
My top candidates for causing the problem are the fork in the perl
code or the execing of a million tiny git processes.
The key low level git functions could be made into a library to avoid
the need to exec them continuously. The svn functions are libraries
and they hardly show up.
606218 2.4143 /usr/local/bin/git-update-index
127170 0.5065 /usr/local/bin/git-write-tree
81153 0.3232 /usr/local/bin/git-read-tree
13065 0.0520 /usr/local/bin/git-ls-files
2624 0.0105 /usr/local/bin/git-hash-object
754 0.0030 /usr/local/bin/git-commit-tree
462 0.0018 /usr/local/bin/git-ls-tree
398 0.0016 /usr/local/bin/git-rev-parse
versus
102784 0.3641 /usr/lib/libsvn_subr-1.so.0.0.0
70235 0.2488 /usr/lib/libsvn_fs_fs-1.so.0.0.0
67081 0.2376 /usr/lib/libsvn_delta-1.so.0.0.0
848 0.0030 /usr/lib/libsvn_swig_perl-1.so.0.0.0
512 0.0018 /usr/lib/libsvn_ra_local-1.so.0.0.0
350 0.0012 /usr/lib/libsvn_fs-1.so.0.0.0
222 7.9e-04 /usr/lib/libsvn_repos-1.so.0.0.0
124 4.4e-04 /usr/lib/libsvn_ra-1.so.0.0.0
------------------------------------------------------------------------------------------------------------
4093890 64.3711 /home/good/vmlinux
906014 14.2459 /lib/libcrypto.so.0.9.8a
435744 6.8515 /lib/libc-2.4.so
158325 2.4895 /usr/lib/libz.so.1.2.3
139995 2.2012 /usr/local/bin/git-update-index
75322 1.1843 /nvidia
64349 1.0118 /usr/bin/oprofiled
52825 0.8306 /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
51930 0.8165 /usr/lib/libapr-1.so.0.2.2
42771 0.6725 /usr/local/bin/git-read-tree
37774 0.5939 /lib/ld-2.4.so
34761 0.5466 /usr/local/bin/git-write-tree
29560 0.4648 /usr/lib/libsvn_subr-1.so.0.0.0
28210 0.4436 /usr/lib/libaprutil-1.so.0.2.2
-----------------------------------------------------------------------------------------------------------------
2471826 32.8741 copy_page_range
375260 18.2903 unmap_vmas
574208 7.6367 release_pages
572189 7.6098 page_remove_rmap
233367 3.1037 free_pages_and_swap_cache
191051 2.5409 get_page_from_freelist
169058 2.2484 unlock_page
162027 2.1549 vm_normal_page
155691 2.0706 swap_info_get
136324 1.8130 swap_duplicate
119227 1.5857 page_fault
99729 1.3263 page_waitqueue
49288 0.6555 remove_exclusive_swap_page
39611 0.5268 do_wp_page
39142 0.5206 __wake_up_bit
34384 0.4573 __copy_from_user_ll
31111 0.4138 __handle_mm_fault
29990 0.3989 find_get_page
29682 0.3948 do_page_fault
--
Jon Smirl
jonsmirl@gmail.com
* Re: Repacking many disconnected blobs
From: Nicolas Pitre @ 2006-06-14 19:25 UTC
To: Keith Packard; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <1150311567.30681.28.camel@neko.keithp.com>
On Wed, 14 Jun 2006, Keith Packard wrote:
> On Wed, 2006-06-14 at 11:18 -0700, Linus Torvalds wrote:
>
> > You don't _need_ to shuffle. As mentioned, it will only affect the
> > location of the data in the pack-file, which in turn will mostly matter
> > as an IO pattern thing, not anything really fundamental. If the pack-file
> > ends up caching well, the IO patterns obviously will never matter.
>
> Ok, sounds like shuffling isn't necessary; the only benefit packing
> gains me is to reduce the size of each directory in the object store;
> the process I follow is to construct blobs for every revision, then just
> use the sha1 values to construct an index for each commit. I never
> actually look at the blobs myself, so IO access patterns aren't
> relevant.
>
> Repacking after the import is completed should undo whatever horror show
> I've created in any case.
The only advantage of feeding object names from latest to oldest has to
do with the delta direction. In doing so the deltas are backward, such
that objects with deeper delta chains are further back in history, and
this is what you want in the final pack for faster access to the latest
revision.
Of course the final repack will do that automatically, but only if you
use -a -f with git-repack. When -f is not provided, already deltified
objects from other packs are copied as-is without any delta computation,
which makes the repack process much faster. In that case it might be
preferable that the reused deltified data already consist of backward
deltas, which is why you might consider feeding objects in the preferred
order up front.
Nicolas
* Re: Repacking many disconnected blobs
From: Linus Torvalds @ 2006-06-14 19:18 UTC
To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <1150311567.30681.28.camel@neko.keithp.com>
On Wed, 14 Jun 2006, Keith Packard wrote:
>
> Ok, sounds like shuffling isn't necessary; the only benefit packing
> gains me is to reduce the size of each directory in the object store;
There's actually a secondary benefit to packing that turned out to be much
bigger from a performance standpoint: the size benefit coupled with the
fact that it's all in one file ends up meaning that accessing packed
objects is _much_ faster than accessing individual files.
The Linux system call overhead is one of the lowest ones out there, but
it's still much bigger than just a function call, and doing a full
pathname walk and open/close is bigger yet. In contrast, if you access
lots of objects and they are all in a pack, you only end up doing one mmap
and a page fault for each 4kB entry, and that's it.
So packing has a large performance benefit outside of the actual disk use
one, and to some degree that performance benefit is then further magnified
by good locality (ie you get more effective objects per page fault), but
in your case that locality issue is secondary.
I assume that you never actually end up looking at the _contents_ of the
objects any more ever afterwards, because in a very real sense you're
really interested in the SHA1 names, right? All the latter phases of
parsecvs will just use the SHA1 names directly, and never actually even
open the data (packed or not).
So in that sense, you only care about the disksize and a much improved
directory walk from fewer files (until the repository has actually been
fully created, at which point a repack will do the right thing).
Linus
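To make the contrast concrete, here is a minimal POSIX C sketch of the two access patterns: one open/read/close per loose object, versus a single mmap of a pack after which every object access is just pointer arithmetic and the kernel faults pages in as they are touched. The struct pack_map type and its helpers are hypothetical illustrations; a real pack-file also has a header, a separate index, and zlib-compressed (possibly deltified) entries, all of which this sketch ignores.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Loose-object style: a pathname walk plus open/read/close per object.
 * A single read() is enough for the small files in this illustration. */
static ssize_t read_loose(const char *path, char *buf, size_t cap)
{
	int fd = open(path, O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = read(fd, buf, cap);
	close(fd);
	return n;
}

/* Pack style: map the whole file once... */
struct pack_map {
	const unsigned char *base;
	size_t len;
};

static int pack_map_open(struct pack_map *pm, const char *path)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {	/* mmap rejects length 0 */
		close(fd);
		return -1;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after close */
	if (p == MAP_FAILED)
		return -1;
	pm->base = p;
	pm->len = (size_t)st.st_size;
	return 0;
}

/* ...then each "object" is a bounds-checked pointer into the mapping. */
static const unsigned char *pack_map_at(const struct pack_map *pm,
					size_t off, size_t n)
{
	if (off > pm->len || n > pm->len - off)
		return NULL;
	return pm->base + off;
}

static void pack_map_close(struct pack_map *pm)
{
	munmap((void *)pm->base, pm->len);
	pm->base = NULL;
	pm->len = 0;
}
```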
* Re: Repacking many disconnected blobs
From: Keith Packard @ 2006-06-14 18:59 UTC
To: Linus Torvalds; +Cc: keithp, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606141113130.5498@g5.osdl.org>
On Wed, 2006-06-14 at 11:18 -0700, Linus Torvalds wrote:
> You don't _need_ to shuffle. As mentioned, it will only affect the
> location of the data in the pack-file, which in turn will mostly matter
> as an IO pattern thing, not anything really fundamental. If the pack-file
> ends up caching well, the IO patterns obviously will never matter.
Ok, sounds like shuffling isn't necessary; the only benefit packing
gains me is to reduce the size of each directory in the object store;
the process I follow is to construct blobs for every revision, then just
use the sha1 values to construct an index for each commit. I never
actually look at the blobs myself, so IO access patterns aren't
relevant.
Repacking after the import is completed should undo whatever horror show
I've created in any case.
--
keith.packard@intel.com
* Re: Repacking many disconnected blobs
From: Linus Torvalds @ 2006-06-14 18:52 UTC
To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606141113130.5498@g5.osdl.org>
On Wed, 14 Jun 2006, Linus Torvalds wrote:
>
> You don't _need_ to shuffle. As mentioned, it will only affect the
> location of the data in the pack-file, which in turn will mostly matter
> as an IO pattern thing, not anything really fundamental. If the pack-file
> ends up caching well, the IO patterns obviously will never matter.
Actually, thinking about it more, the way you do things, shuffling
probably won't even help.
Why? Because you'll obviously have multiple files, and even if each file
were to be sorted "correctly", the access patterns from any global
standpoint won't really matter, because you'd probably bounce back and
forth in the pack-file anyway.
So if anything, I would say
- just dump them into the packfile in whatever order is most convenient
- if you know that later phases will go through the objects and actually
use them (as opposed to just building trees out of their SHA1 values)
in some particular order, _that_ might be the ordering to use.
- in many ways, getting good delta chains is _much_ more important, since
"git repack -a -d" will re-use good deltas from a previous pack, but
will _not_ care about any ordering in the old pack. As well as
obviously improving the size of the temporary pack-files anyway.
I'll pontificate more if I can think of any other cases that might matter.
Linus
* Re: Repacking many disconnected blobs
From: Linus Torvalds @ 2006-06-14 18:18 UTC
To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <1150307715.20536.166.camel@neko.keithp.com>
On Wed, 14 Jun 2006, Keith Packard wrote:
> On Wed, 2006-06-14 at 08:53 -0700, Linus Torvalds wrote:
>
> > - You can list the objects with "most important first" order first, if
> > you can. That will improve locality later (the packing will try to
> > generate the pack so that the order you gave the objects in will be a
> > rough order of the result - the first objects will be together at the
> > beginning, the last objects will be at the end)
>
> I take every ,v file and construct blobs for every revision. If I
> understand this correctly, I should be shuffling the revisions so I send
> the latest revision of every file first, then the next-latest revision.
> It would be somewhat easier to just send the whole list of revisions for
> the first file and then move to the next file, but if shuffling is what
> I want, I'll do that.
You don't _need_ to shuffle. As mentioned, it will only affect the
location of the data in the pack-file, which in turn will mostly matter
as an IO pattern thing, not anything really fundamental. If the pack-file
ends up caching well, the IO patterns obviously will never matter.
Eventually, after the whole import has finished, and you do the final
repack, that one will do things in "recency order" (or "global
reachability order" if you prefer), which means that all the objects in
the final pack will be sorted by how "close" they are to the top-of-tree.
And that will happen regardless of what the intermediate ordering has
been.
So if shuffling is inconvenient, just don't do it.
On the other hand, if you know that you generated the blobs "oldest to
newest", just print them in the reverse order when you end up repacking,
and you're all done (if you just save the info into some array before you
repack, just walk the array backwards).
Linus
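Walking the saved array backwards, as suggested here, is a one-line loop. A minimal C sketch, assuming a hypothetical struct gen_obj array that was filled oldest-to-newest during blob generation, and emitting the "<sha1> <filename>" lines that git-pack-objects takes on its input:

```c
#include <stdio.h>

/* Hypothetical record of one generated blob: the 40-hex object name
 * plus the path (or any per-,v-file key) used as the delta hint. */
struct gen_obj {
	char sha1[41];
	const char *path;
};

/* objs[] was filled oldest-to-newest; emit it newest-first. */
static void emit_newest_first(FILE *out, const struct gen_obj *objs, size_t nr)
{
	size_t i = nr;

	while (i-- > 0)
		fprintf(out, "%s %s\n", objs[i].sha1, objs[i].path);
}
```

The "i-- > 0" form avoids the unsigned wrap-around that a "for (i = nr - 1; i >= 0; i--)" loop would hit with size_t.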
* Re: Repacking many disconnected blobs
From: Keith Packard @ 2006-06-14 17:55 UTC
To: Linus Torvalds; +Cc: keithp, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606140826200.5498@g5.osdl.org>
On Wed, 2006-06-14 at 08:53 -0700, Linus Torvalds wrote:
> - You can list the objects with "most important first" order first, if
> you can. That will improve locality later (the packing will try to
> generate the pack so that the order you gave the objects in will be a
> rough order of the result - the first objects will be together at the
> beginning, the last objects will be at the end)
I take every ,v file and construct blobs for every revision. If I
understand this correctly, I should be shuffling the revisions so I send
the latest revision of every file first, then the next-latest revision.
It would be somewhat easier to just send the whole list of revisions for
the first file and then move to the next file, but if shuffling is what
I want, I'll do that.
> The corollary to this is that it's better to generate the pack-file
> from a list of every version of a few files than it is to generate it
> from a few versions of every file. Ie, if you process things one file
> at a time, and create every object for that file, that is actually good
> for packing, since there will be the optimal delta opportunity.
I assumed that was the case. Fortunately, I process each file
separately, so this matches my needs exactly. I should be able to report
on this shortly.
--
keith.packard@intel.com
* Re: Repacking many disconnected blobs
From: Linus Torvalds @ 2006-06-14 15:53 UTC
To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <1150269478.20536.150.camel@neko.keithp.com>
On Wed, 14 Jun 2006, Keith Packard wrote:
>
> parsecvs scans every ,v file and creates a blob for every revision of
> every file right up front. Once these are created, it discards the
> actual file contents and deals solely with the hash values.
>
> The problem is that while this is going on, the repository consists
> solely of disconnected objects, and I can't make git-repack put those
> into pack objects.
Ok. That's actually _easily_ rectifiable, because it turns out that your
behaviour is something that re-packing is actually really good at
handling.
The thing is, "git repack" (the wrapper script) is all about finding all
the heads of a repository, and then telling the _real_ packing logic which
objects to pack.
In other words, it literally boils down to basically
git-rev-list --all --objects $rev_list |
git-pack-objects --non-empty $pack_objects .tmp-pack
where "$rev_list" and "$pack_objects" are just extra flags to the two
phases that you don't really care about.
But the important point to recognize is that the pack generation itself
doesn't care about reachability or anything else AT ALL. The pack is just
a jumble of objects, nothing more. Which is exactly what you want.
> I'm assuming that if I could get these disconnected blobs all neatly
> tucked into a pack object, things might go a bit faster.
Absolutely. And it's even easy.
What you should do is to just generate a list of objects every once in a
while, and pass that list off to "git-pack-objects", which will create a
pack-file for you. Then you just move the generated pack-file (and index
file) into the .git/objects/pack directory, and then you can run the
normal "git-prune-packed", and you're done.
There are just two small, subtle points to look out for:
- You can list the objects in "most important first" order, if
you can. That will improve locality later (the packing will try to
generate the pack so that the order you gave the objects in will be a
rough order of the result - the first objects will be together at the
beginning, the last objects will be at the end).
This is not a huge deal. If you don't have a good order, give them in
any order, and then after you're done (and you do have branches and
tag-heads), the final repack (with a regular "git repack") will fix it
all up.
You'll still get all of the size/access advantage of packfiles without
this, it just won't have the additional "nice IO patterns within the
packfile" behaviour (which mainly matters for the cold-cache case, so
you may well not care).
- append the filename the object is associated with to the object name on
the list, if at all possible. This is what git-pack-objects will use as
part of the heuristic for finding the deltas, so this is actually a big
deal. If you forget (or mess up) the filename, packing will still
_work_ - it's just a heuristic, after all, and there are a few others
too - but the pack-file will have inferior delta chains.
(The name doesn't have to be the "real name", it really only needs to
be something unique per *,v file, but real name is probably best)
The corollary to this is that it's better to generate the pack-file
from a list of every version of a few files than it is to generate it
from a few versions of every file. Ie, if you process things one file
at a time, and create every object for that file, that is actually good
for packing, since there will be the optimal delta opportunity.
In other words, you should just feed git-pack-objects a list of objects in
the form "<sha1><space><filename>\n", and git-pack-objects will do the rest.
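The corollary above (every version of a few files beats a few versions of every file) amounts to keeping each file's revisions adjacent in the list. If objects were collected in some other order, one way to get there is to sort by path, breaking ties by a generation counter so each file's revisions come newest first. A minimal C sketch, assuming a hypothetical struct obj_ref with a seq field; this only prepares the input list and is not what git-pack-objects does internally.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct obj_ref {
	char sha1[41];
	const char *path;	/* real name of the ,v file, or any per-file key */
	unsigned long seq;	/* generation order: higher means newer */
};

/* Same path together; within a path, newest (highest seq) first. */
static int by_path_then_newest(const void *a, const void *b)
{
	const struct obj_ref *x = a, *y = b;
	int c = strcmp(x->path, y->path);

	if (c)
		return c;
	return (x->seq < y->seq) - (x->seq > y->seq);
}

/* Sort in place, then print "<sha1> <filename>" lines. */
static void emit_grouped(FILE *out, struct obj_ref *objs, size_t nr)
{
	size_t i;

	qsort(objs, nr, sizeof(*objs), by_path_then_newest);
	for (i = 0; i < nr; i++)
		fprintf(out, "%s %s\n", objs[i].sha1, objs[i].path);
}
```

Because seq values are distinct, the comparator defines a total order, so qsort's lack of stability does not matter here.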
Just as a stupid example, if you were to want to pack just the _tree_ that
is the current version of a git archive, you'd do
git-rev-list --objects HEAD^{tree} |
git-pack-objects --non-empty .tmp-pack
which you can try on the current git tree just to see (the first line will
generate a list of all objects reachable from the current _tree_, with no
history at all; the second line will create two files under the name of
".tmp-pack-<sha1-of-object-list>.{pack|idx}").
The reason I suggest doing this for the current tree of the git archive is
simply that you can look at the git-rev-list output with "less", and see
for yourself what it actually does (and there are just a few hundred
objects there: a few tree objects, and the blob objects for every file in
the current HEAD).
So the git pack-format is actually _optimal_ for your particular case,
exactly because the pack-files don't actually care about any high-level
semantics: all they contain is a list of objects.
So in phase 1, when you generate all the objects, the simplest thing to do
is to literally just remember the last five thousand objects or so as you
generate them, and when that array of objects fills up, you just start the
"git-pack-objects" thing, and feed it the list of objects, move the
pack-file into .git/objects/pack/pack-... and do a "git prune-packed".
Then you just continue.
So this should all fit the parsecvs approach very well indeed.
Linus
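The phase-1 batching described above might be sketched in C as follows. This is a hypothetical helper, not parsecvs code: it buffers up to cap objects and, on each flush, hands them to a callback as "<sha1> <filename>" lines. pipe_to_pack_objects assumes a git binary on PATH and runs "git pack-objects --non-empty .tmp-pack" through popen(). Moving the resulting .tmp-pack-*.pack/.idx into .git/objects/pack and running git prune-packed are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct pack_entry {
	char sha1[41];
	char *name;	/* per-,v-file name hint for delta selection */
};

/* Called with a full (or final partial) batch. */
typedef int (*flush_fn)(const struct pack_entry *e, size_t n, void *ctx);

struct pack_batch {
	struct pack_entry *e;
	size_t n, cap;
	flush_fn flush;
	void *ctx;
};

/* Write "<sha1> <name>\n" lines to the FILE * passed as ctx. */
static int write_entries(const struct pack_entry *e, size_t n, void *ctx)
{
	FILE *out = ctx;
	size_t i;

	for (i = 0; i < n; i++)
		if (fprintf(out, "%s %s\n", e[i].sha1, e[i].name) < 0)
			return -1;
	return 0;
}

/* Assumes git on PATH; pack-objects writes .tmp-pack-<sha1>.{pack,idx}. */
static int pipe_to_pack_objects(const struct pack_entry *e, size_t n, void *ctx)
{
	FILE *p = popen("git pack-objects --non-empty .tmp-pack", "w");
	int rc;

	(void)ctx;
	if (!p)
		return -1;
	rc = write_entries(e, n, p);
	return (pclose(p) == 0 && rc == 0) ? 0 : -1;
}

static int batch_init(struct pack_batch *b, size_t cap, flush_fn fn, void *ctx)
{
	b->e = calloc(cap, sizeof(*b->e));
	b->n = 0;
	b->cap = cap;
	b->flush = fn;
	b->ctx = ctx;
	return b->e ? 0 : -1;
}

static int batch_flush(struct pack_batch *b)
{
	size_t i;
	int rc = b->n ? b->flush(b->e, b->n, b->ctx) : 0;

	for (i = 0; i < b->n; i++)
		free(b->e[i].name);
	b->n = 0;
	return rc;
}

static int batch_add(struct pack_batch *b, const char *sha1, const char *name)
{
	struct pack_entry *e = &b->e[b->n];
	size_t len = strlen(name) + 1;

	snprintf(e->sha1, sizeof(e->sha1), "%s", sha1);
	e->name = malloc(len);
	if (!e->name)
		return -1;
	memcpy(e->name, name, len);
	if (++b->n == b->cap)	/* array full: hand it off and start over */
		return batch_flush(b);
	return 0;
}

static int batch_finish(struct pack_batch *b)
{
	int rc = batch_flush(b);

	free(b->e);
	b->e = NULL;
	return rc;
}
```

With the "five thousand objects or so" from the text, a caller would use batch_init(&b, 5000, pipe_to_pack_objects, NULL), call batch_add once per generated blob, and call batch_finish at the end. write_entries is useful for dry runs against an ordinary FILE *.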
* [PATCH/RFC] Teach diff about -b and -w flags
From: Johannes Schindelin @ 2006-06-14 15:40 UTC
To: davidel, git, junkio
This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to
diff. The main part of the patch is teaching libxdiff about it.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
Note that -b will not treat DOS and Unix line endings as equal,
although it would be trivial. Is this desired?
Another question: instead of checking the flags all the time,
xdl_line_match (and possibly xdl_hash_record) could be split into
three different functions, and a function pointer could be set
at init time. This would be faster, but less elegant, no?
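For reference, the intended -b/-w semantics can be sketched as a standalone C pair: a comparison and a hash that normalize whitespace the same way, so that lines the comparison treats as equal also land in the same hash bucket. This is why the patch passes the flags to xdl_hash_record as well as to the comparisons. The names line_match, line_hash, WS_IGNORE_ALL and WS_IGNORE_CHANGE are hypothetical, not xdiff API. The sketch keeps a single return convention (1 means the lines match), which is how the callers in this patch use xdl_line_match. It also follows GNU diff in ignoring trailing whitespace under -b.

```c
#include <ctype.h>
#include <string.h>

#define WS_IGNORE_ALL    (1 << 0)	/* like -w */
#define WS_IGNORE_CHANGE (1 << 1)	/* like -b */
#define WS_ANY           (WS_IGNORE_ALL | WS_IGNORE_CHANGE)

/* Advance *i past any whitespace in s[0..n). */
static void skip_ws(const char *s, long n, long *i)
{
	while (*i < n && isspace((unsigned char)s[*i]))
		(*i)++;
}

/* Return 1 if the two records match under `flags`, 0 otherwise. */
static int line_match(const char *l1, long s1, const char *l2, long s2, int flags)
{
	long i1 = 0, i2 = 0;

	if (!(flags & WS_ANY))
		return s1 == s2 && memcmp(l1, l2, (size_t)s1) == 0;

	while (i1 < s1 && i2 < s2) {
		int w1 = isspace((unsigned char)l1[i1]);
		int w2 = isspace((unsigned char)l2[i2]);

		if (w1 || w2) {
			/* -b alone: a run may change length, not appear or vanish */
			if (!(flags & WS_IGNORE_ALL) && !(w1 && w2))
				return 0;
			skip_ws(l1, s1, &i1);
			skip_ws(l2, s2, &i2);
			continue;
		}
		if (l1[i1++] != l2[i2++])
			return 0;
	}
	/* Anything left over on either side must be whitespace only. */
	skip_ws(l1, s1, &i1);
	skip_ws(l2, s2, &i2);
	return i1 == s1 && i2 == s2;
}

/* djb2-style hash that normalizes whitespace exactly as line_match does. */
static unsigned long line_hash(const char *s, long n, int flags)
{
	unsigned long ha = 5381;
	long i = 0;

	while (i < n) {
		if ((flags & WS_ANY) && isspace((unsigned char)s[i])) {
			skip_ws(s, n, &i);
			/* -b: an inner run hashes as one space; -w drops it */
			if (!(flags & WS_IGNORE_ALL) && i < n)
				ha = ((ha << 5) + ha) ^ ' ';
			continue;
		}
		ha = ((ha << 5) + ha) ^ (unsigned char)s[i++];
	}
	return ha;
}
```

When both flags are given, -w wins in both functions, matching the order in which the patch tests XDF_IGNORE_WHITESPACE first.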
diff.c | 13 +++++++++----
diff.h | 1 +
xdiff/xdiff.h | 3 +++
xdiff/xdiffi.c | 12 ++++++------
xdiff/xdiffi.h | 1 -
xdiff/xmacros.h | 1 -
xdiff/xprepare.c | 16 ++++++++++------
xdiff/xutils.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
xdiff/xutils.h | 3 ++-
9 files changed, 81 insertions(+), 20 deletions(-)
diff --git a/diff.c b/diff.c
index bc32a4a..5b34f73 100644
--- a/diff.c
+++ b/diff.c
@@ -661,7 +661,7 @@ static void builtin_diff(const char *nam
memset(&ecbdata, 0, sizeof(ecbdata));
ecbdata.label_path = lbl;
ecbdata.color_diff = o->color_diff;
- xpp.flags = XDF_NEED_MINIMAL;
+ xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
xecfg.ctxlen = o->context;
xecfg.flags = XDL_EMIT_FUNCNAMES;
if (!diffopts)
@@ -686,6 +686,7 @@ static void builtin_diffstat(const char
struct diff_filespec *one,
struct diff_filespec *two,
struct diffstat_t *diffstat,
+ struct diff_options *o,
int complete_rewrite)
{
mmfile_t mf1, mf2;
@@ -715,7 +716,7 @@ static void builtin_diffstat(const char
xdemitconf_t xecfg;
xdemitcb_t ecb;
- xpp.flags = XDF_NEED_MINIMAL;
+ xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
xecfg.ctxlen = 0;
xecfg.flags = 0;
ecb.outf = xdiff_outf;
@@ -1300,7 +1301,7 @@ static void run_diffstat(struct diff_fil
if (DIFF_PAIR_UNMERGED(p)) {
/* unmerged */
- builtin_diffstat(p->one->path, NULL, NULL, NULL, diffstat, 0);
+ builtin_diffstat(p->one->path, NULL, NULL, NULL, diffstat, o, 0);
return;
}
@@ -1312,7 +1313,7 @@ static void run_diffstat(struct diff_fil
if (p->status == DIFF_STATUS_MODIFIED && p->score)
complete_rewrite = 1;
- builtin_diffstat(name, other, p->one, p->two, diffstat, complete_rewrite);
+ builtin_diffstat(name, other, p->one, p->two, diffstat, o, complete_rewrite);
}
static void run_checkdiff(struct diff_filepair *p, struct diff_options *o)
@@ -1517,6 +1518,10 @@ int diff_opt_parse(struct diff_options *
}
else if (!strcmp(arg, "--color"))
options->color_diff = 1;
+ else if (!strcmp(arg, "-w") || !strcmp(arg, "--ignore-all-space"))
+ options->xdl_opts |= XDF_IGNORE_WHITESPACE;
+ else if (!strcmp(arg, "-b") || !strcmp(arg, "--ignore-space-change"))
+ options->xdl_opts |= XDF_IGNORE_WHITESPACE_CHANGE;
else
return 0;
return 1;
diff --git a/diff.h b/diff.h
index 2b821df..7d7b6cd 100644
--- a/diff.h
+++ b/diff.h
@@ -46,6 +46,7 @@ struct diff_options {
int setup;
int abbrev;
const char *stat_sep;
+ long xdl_opts;
int nr_paths;
const char **paths;
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 2540e8a..2ce10b4 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -29,6 +29,9 @@ #endif /* #ifdef __cplusplus */
#define XDF_NEED_MINIMAL (1 << 1)
+#define XDF_IGNORE_WHITESPACE (1 << 2)
+#define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
+#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE)
#define XDL_PATCH_NORMAL '-'
#define XDL_PATCH_REVERSE '+'
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index b95ade2..5d09a16 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -45,7 +45,7 @@ static long xdl_split(unsigned long cons
long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
xdalgoenv_t *xenv);
static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, long chg2);
-static int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo);
+static int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags);
@@ -397,7 +397,7 @@ static xdchange_t *xdl_add_change(xdchan
}
-static int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo) {
+static int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
xrecord_t **recs = xdf->recs;
@@ -440,7 +440,7 @@ static int xdl_change_compact(xdfile_t *
* the group.
*/
while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&
- XDL_RECMATCH(recs[ixs - 1], recs[ix - 1])) {
+ xdl_line_match(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
@@ -468,7 +468,7 @@ static int xdl_change_compact(xdfile_t *
* the group.
*/
while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
- XDL_RECMATCH(recs[ixs], recs[ix])) {
+ xdl_line_match(recs[ixs]->ptr, recs[ixs]->size, recs[ix]->ptr, recs[ix]->size, flags)) {
rchg[ixs++] = 0;
rchg[ix++] = 1;
@@ -546,8 +546,8 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf
return -1;
}
- if (xdl_change_compact(&xe.xdf1, &xe.xdf2) < 0 ||
- xdl_change_compact(&xe.xdf2, &xe.xdf1) < 0 ||
+ if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
+ xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
xdl_build_script(&xe, &xscr) < 0) {
xdl_free_env(&xe);
diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h
index dd8f3c9..d3b7271 100644
--- a/xdiff/xdiffi.h
+++ b/xdiff/xdiffi.h
@@ -55,6 +55,5 @@ void xdl_free_script(xdchange_t *xscr);
int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
xdemitconf_t const *xecfg);
-
#endif /* #if !defined(XDIFFI_H) */
diff --git a/xdiff/xmacros.h b/xdiff/xmacros.h
index 78f0260..4c2fde8 100644
--- a/xdiff/xmacros.h
+++ b/xdiff/xmacros.h
@@ -33,7 +33,6 @@ #define XDL_ABS(v) ((v) >= 0 ? (v): -(v)
#define XDL_ISDIGIT(c) ((c) >= '0' && (c) <= '9')
#define XDL_HASHLONG(v, b) (((unsigned long)(v) * GR_PRIME) >> ((CHAR_BIT * sizeof(unsigned long)) - (b)))
#define XDL_PTRFREE(p) do { if (p) { xdl_free(p); (p) = NULL; } } while (0)
-#define XDL_RECMATCH(r1, r2) ((r1)->size == (r2)->size && memcmp((r1)->ptr, (r2)->ptr, (r1)->size) == 0)
#define XDL_LE32_PUT(p, v) \
do { \
unsigned char *__p = (unsigned char *) (p); \
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index add5a75..f2a12ae 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -43,12 +43,13 @@ typedef struct s_xdlclassifier {
xdlclass_t **rchash;
chastore_t ncha;
long count;
+ long flags;
} xdlclassifier_t;
-static int xdl_init_classifier(xdlclassifier_t *cf, long size);
+static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags);
static void xdl_free_classifier(xdlclassifier_t *cf);
static int xdl_classify_record(xdlclassifier_t *cf, xrecord_t **rhash, unsigned int hbits,
xrecord_t *rec);
@@ -63,9 +64,11 @@ static int xdl_optimize_ctxs(xdfile_t *x
-static int xdl_init_classifier(xdlclassifier_t *cf, long size) {
+static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags) {
long i;
+ cf->flags = flags;
+
cf->hbits = xdl_hashbits((unsigned int) size);
cf->hsize = 1 << cf->hbits;
@@ -103,8 +106,9 @@ static int xdl_classify_record(xdlclassi
line = rec->ptr;
hi = (long) XDL_HASHLONG(rec->ha, cf->hbits);
for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
- if (rcrec->ha == rec->ha && rcrec->size == rec->size &&
- !memcmp(line, rcrec->line, rec->size))
+ if (rcrec->ha == rec->ha &&
+ xdl_line_match(rcrec->line, rcrec->size,
+ rec->ptr, rec->size, cf->flags))
break;
if (!rcrec) {
@@ -173,7 +177,7 @@ static int xdl_prepare_ctx(mmfile_t *mf,
top = blk + bsize;
}
prev = cur;
- hav = xdl_hash_record(&cur, top);
+ hav = xdl_hash_record(&cur, top, xpp->flags);
if (nrec >= narec) {
narec *= 2;
if (!(rrecs = (xrecord_t **) xdl_realloc(recs, narec * sizeof(xrecord_t *)))) {
@@ -268,7 +272,7 @@ int xdl_prepare_env(mmfile_t *mf1, mmfil
enl1 = xdl_guess_lines(mf1) + 1;
enl2 = xdl_guess_lines(mf2) + 1;
- if (xdl_init_classifier(&cf, enl1 + enl2 + 1) < 0) {
+ if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) {
return -1;
}
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 21ab8e7..3dd5fe1 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -189,12 +189,61 @@ long xdl_guess_lines(mmfile_t *mf) {
return nl + 1;
}
+int xdl_line_match(const char *l1, long s1, const char *l2, long s2, long flags)
+{
+ int i1, i2;
+
+ if (flags & XDF_IGNORE_WHITESPACE) {
+ for (i1 = i2 = 0; i1 < s1 && i2 < s2; i1++, i2++) {
+ if (isspace(l1[i1]))
+ while (isspace(l1[i1]) && i1 < s1)
+ i1++;
+ else if (isspace(l2[i2]))
+ while (isspace(l2[i2]) && i2 < s2)
+ i2++;
+ else if (l1[i1] != l2[i2])
+ return l2[i2] - l1[i1];
+ }
+ if (i1 >= s1)
+ return 1;
+ else if (i2 >= s2)
+ return -1;
+ } else if (flags & XDF_IGNORE_WHITESPACE_CHANGE) {
+ for (i1 = i2 = 0; i1 < s1 && i2 < s2; i1++, i2++) {
+ if (isspace(l1[i1])) {
+ if (!isspace(l2[i2]))
+ return -1;
+ while (isspace(l1[i1]) && i1 < s1)
+ i1++;
+ while (isspace(l2[i2]) && i2 < s2)
+ i2++;
+ } else if (l1[i1] != l2[i2])
+ return l2[i2] - l1[i1];
+ }
+ if (i1 >= s1)
+ return 1;
+ else if (i2 >= s2)
+ return -1;
+ } else
+ return s1 == s2 && !memcmp(l1, l2, s1);
+
+ return 0;
+}
-unsigned long xdl_hash_record(char const **data, char const *top) {
+unsigned long xdl_hash_record(char const **data, char const *top, long flags) {
unsigned long ha = 5381;
char const *ptr = *data;
for (; ptr < top && *ptr != '\n'; ptr++) {
+ if (isspace(*ptr) && (flags & XDF_WHITESPACE_FLAGS)) {
+ while (ptr < top && isspace(*ptr) && ptr[1] != '\n')
+ ptr++;
+ if (flags & XDF_IGNORE_WHITESPACE_CHANGE) {
+ ha += (ha << 5);
+ ha ^= (unsigned long) ' ';
+ }
+ continue;
+ }
ha += (ha << 5);
ha ^= (unsigned long) *ptr;
}
diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index ea38ee9..2701eea 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -33,7 +33,8 @@ void *xdl_cha_alloc(chastore_t *cha);
void *xdl_cha_first(chastore_t *cha);
void *xdl_cha_next(chastore_t *cha);
long xdl_guess_lines(mmfile_t *mf);
-unsigned long xdl_hash_record(char const **data, char const *top);
+int xdl_line_match(const char *l1, long s1, const char *l2, long s2, long flags);
+unsigned long xdl_hash_record(char const **data, char const *top, long flags);
unsigned int xdl_hashbits(unsigned int size);
int xdl_num_out(char *out, long val);
long xdl_atol(char const *str, char const **next);
--
1.4.0.ga9a96-dirty
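In this patch, xdl_classify_record compares hashes before it calls xdl_line_match, so any two lines the matcher treats as equal must also hash to the same value. Below is a minimal sketch of one way to guarantee that. It assumes both functions read the line through a single canonicalizing iterator. In ignore-all mode every whitespace run is dropped. In ignore-change mode every run is folded into one space. The names ws_mode, ws_iter, ws_next, ws_line_match and ws_hash are hypothetical and are not part of xdiff. The sketch returns 1 for "equal", which is how xdl_classify_record above treats the result of xdl_line_match.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical modes for this sketch; they mirror the intent of
 * XDF_IGNORE_WHITESPACE and XDF_IGNORE_WHITESPACE_CHANGE but are not
 * the xdiff constants. */
enum ws_mode { WS_EXACT, WS_IGNORE_ALL, WS_IGNORE_CHANGE };

/* Walks one line and yields its canonical bytes. */
struct ws_iter {
	const char *p;
	const char *end;
	enum ws_mode mode;
};

/* Return the next canonical byte, or -1 at the end of the line.
 * A whitespace run is dropped (WS_IGNORE_ALL) or yields a single ' '
 * (WS_IGNORE_CHANGE); WS_EXACT passes every byte through. */
static int ws_next(struct ws_iter *it)
{
	while (it->p < it->end) {
		unsigned char c = (unsigned char) *it->p;

		if (it->mode == WS_EXACT || !isspace(c)) {
			it->p++;
			return c;
		}
		while (it->p < it->end && isspace((unsigned char) *it->p))
			it->p++;
		if (it->mode == WS_IGNORE_CHANGE)
			return ' ';
		/* WS_IGNORE_ALL: the run is dropped, keep scanning */
	}
	return -1;
}

/* 1 if the two lines are equal after canonicalization, 0 otherwise. */
static int ws_line_match(const char *l1, long s1, const char *l2, long s2,
			 enum ws_mode mode)
{
	struct ws_iter a = { l1, l1 + s1, mode };
	struct ws_iter b = { l2, l2 + s2, mode };
	int c1, c2;

	assert(s1 >= 0 && s2 >= 0);
	do {
		c1 = ws_next(&a);
		c2 = ws_next(&b);
		if (c1 != c2)
			return 0;
	} while (c1 != -1);
	return 1;
}

/* The same "times 33, xor" step xdl_hash_record uses, fed with the
 * canonical bytes, so lines ws_line_match calls equal hash equal. */
static unsigned long ws_hash(const char *l, long s, enum ws_mode mode)
{
	struct ws_iter it = { l, l + s, mode };
	unsigned long ha = 5381;
	int c;

	while ((c = ws_next(&it)) != -1) {
		ha += (ha << 5);
		ha ^= (unsigned long) c;
	}
	return ha;
}
```

Because the matcher and the hash consume the same byte stream, the two cannot drift apart when a new whitespace mode is added.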
^ permalink raw reply related
* Re: [PATCH] fix git alias
From: Johannes Schindelin @ 2006-06-14 13:38 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0606141507420.16802@wbgn013.biozentrum.uni-wuerzburg.de>
Hi,
On Wed, 14 Jun 2006, Johannes Schindelin wrote:
> > There is another more grave problem I seem to be hitting but
> > haven't figured out (and will probably not figure out while
> > away); I'd appreciate if you can track it down. With
> > "alias.wh = whatchanged --patch-with-stat", "git wh HEAD --
> > mailinfo.c" segfaults at fclose() in git_config_from_file()
> > when it reads the configuration for the second time (the
> > first time being getting the alias). The second call comes
> > via init_revisions() calling setup_git_directory(). Oddly
> > I do not seem to be able to reproduce this segfault on amd64.
>
> I will do that.
I cannot reproduce, sorry. Valgrind says some objects are not released,
but I cannot find another error. That's with 'next'.
Ciao,
Dscho
^ permalink raw reply
* Re: Porcelain specific metadata under .git?
From: Junio C Hamano @ 2006-06-14 13:30 UTC (permalink / raw)
To: git
In-Reply-To: <44900A2F.7050704@op5.se>
Andreas Ericsson <ae@op5.se> writes:
> Yes, but I understood him to mean "it's a tree-sha" instead of a
> branch/head thing, which would mean it doesn't fit the .git/refs
> definition of ref.
I am not sure what you meant by "it's a tree-sha", but if you
have an impression that .git/refs define "ref" as committish,
you are mistaken. Linus has .git/refs/tags/v2.6.11-tree which
tags a tree object. I even have a .git/refs/tags/junio-gpg-pub
which tags a blob (blobish ;-> ?).
^ permalink raw reply
* Re: [PATCH] fix git alias
From: Johannes Schindelin @ 2006-06-14 13:14 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vu06nevse.fsf@assigned-by-dhcp.cox.net>
Hi,
On Wed, 14 Jun 2006, Junio C Hamano wrote:
> * This would make "git l -n 4" work when you have "alias.l =
> log -M" in your configuration. The original code generated
> an equivalent of "git log -M l -n 4".
Of course, I tested it only with links... (ln git git-l). Thanks.
> There is another more grave problem I seem to be hitting but
> haven't figured out (and will probably not figure out while
> away); I'd appreciate if you can track it down. With
> "alias.wh = whatchanged --patch-with-stat", "git wh HEAD --
> mailinfo.c" segfaults at fclose() in git_config_from_file()
> when it reads the configuration for the second time (the
> first time being getting the alias). The second call comes
> via init_revisions() calling setup_git_directory(). Oddly
> I do not seem to be able to reproduce this segfault on amd64.
I will do that.
Note that I have a mmap()ed version in the pipeline. I just wanted to wait
with that until I manage to implement your cool idea about config
rewriting. Obviously, this mmap()ed version does not have this problem.
Ciao,
Dscho
^ permalink raw reply
* Re: Porcelain specific metadata under .git?
From: Andreas Ericsson @ 2006-06-14 13:07 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
In-Reply-To: <e6os3v$r5g$1@sea.gmane.org>
Jakub Narebski wrote:
> Andreas Ericsson wrote:
>
>
>>Shawn Pearce wrote:
>>
>>>I already assume/know that refs/heads and refs/tags are completely
>>>off-limits as they are for user refs only.
>>>
>>>I also think the core GIT tools already assume that anything
>>>directly under .git which is strictly a file and which is named
>>>entirely with uppercase letters (aside from "HEAD") is strictly a
>>>temporary/short-lived state type item (e.g. COMMIT_MSG) used by a
>>>Porcelain.
>>>
>>>But is it safe to say that ".git/refs/eclipse-workspaces" can probably
>>>be used for this purpose? :-)
>>>
>>
>>.git/eclipse/whatever-you-like
>>
>>would probably be better. Heads can be stored directly under .git/refs
>>too. Most likely, nothing will ever be stored under .git/eclipse by
>>either core git or the current (other) porcelains though.
>
>
> I think if it is a ref, which one wants to be visible to git-fsck (and
> git-prune), it should be under .git/refs.
>
Yes, but I understood him to mean "it's a tree-sha" instead of a
branch/head thing, which would mean it doesn't fit the .git/refs
definition of ref.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
^ permalink raw reply
* [PATCH] fix git alias
From: Junio C Hamano @ 2006-06-14 13:01 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
When extra command line arguments are given to a command that
was alias-expanded, the code generated a wrong argument list,
leaving the original alias in the result, and forgetting to
terminate the new argv list.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
* This would make "git l -n 4" work when you have "alias.l =
log -M" in your configuration. The original code generated
an equivalent of "git log -M l -n 4".
There is another more grave problem I seem to be hitting but
haven't figured out (and will probably not figure out while
away); I'd appreciate if you can track it down. With
"alias.wh = whatchanged --patch-with-stat", "git wh HEAD --
mailinfo.c" segfaults at fclose() in git_config_from_file()
when it reads the configuration for the second time (the
first time being getting the alias). The second call comes
via init_revisions() calling setup_git_directory(). Oddly
I do not seem to be able to reproduce this segfault on amd64.
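The argument splice this patch corrects can be shown on its own. The sketch below rests on two assumptions. First, the alias has already been split into alias_argv[0..alias_argc). Second, the original command line is argv[0..argc), where argv[0] is the alias name. The result keeps the alias words, appends argv[1..argc), and ends with a NULL terminator. splice_alias_args is a hypothetical helper, not the function in git.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a NULL-terminated argument vector: the expanded alias words
 * followed by the user's extra arguments argv[1..argc), leaving out the
 * alias name in argv[0]. Only the pointer array is allocated; the
 * strings are shared with the inputs. Returns NULL if malloc fails.
 */
static const char **splice_alias_args(const char **alias_argv, int alias_argc,
				      int argc, const char **argv)
{
	int extra = argc > 1 ? argc - 1 : 0;
	const char **out;

	assert(alias_argc >= 0 && argc >= 0);
	out = malloc(sizeof(*out) * (alias_argc + extra + 1));
	if (!out)
		return NULL;
	memcpy(out, alias_argv, sizeof(*out) * alias_argc);
	/* skip argv[0]: it is the alias name itself */
	if (extra)
		memcpy(out + alias_argc, argv + 1, sizeof(*out) * extra);
	out[alias_argc + extra] = NULL;	/* the terminator the old code forgot */
	return out;
}
```

Take "alias.l = log -M" and the command line "git l -n 4". The sketch produces "log -M -n 4", not the "log -M l -n 4" the old code built.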
diff --git a/git.c b/git.c
index 9469d44..329ebec 100644
--- a/git.c
+++ b/git.c
@@ -122,9 +122,9 @@ static int handle_alias(int *argcp, cons
/* insert after command name */
if (*argcp > 1) {
new_argv = realloc(new_argv, sizeof(char*) *
- (count + *argcp - 1));
- memcpy(new_argv + count, *argv, sizeof(char*) *
- (*argcp - 1));
+ (count + *argcp));
+ memcpy(new_argv + count, *argv + 1,
+ sizeof(char*) * *argcp);
}
*argv = new_argv;
^ permalink raw reply related
* Re: Repacking many disconnected blobs
From: Junio C Hamano @ 2006-06-14 12:33 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0606141104050.15578@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Alternatively, you could construct fake trees like this:
>
> README/1.1.1.1
> README/1.2
> README/1.3
> ...
>
> i.e. every file becomes a directory -- containing all the versions of that
> file -- in the (virtual) tree, which you can point to by a temporary ref.
That would not play well with the packing heuristics, I suspect.
If you reverse it to use rev/file-id, then the same files from
different revs would sort closer, though.
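A simplified stand-in for the pack-objects heuristic shows why the layout matters. The sketch assumes that delta candidates are grouped by a hash weighted toward the end of the path, here reduced to the last path component. git's real name hash differs in detail. basename_hash is hypothetical. With file/rev naming, README/1.2 and Makefile/1.2 share a key. With rev/file naming, 1.2/README and 1.3/README share a key instead.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical, simplified grouping key: hash only the final path
 * component. Paths with the same basename get the same key, so they
 * sort next to each other and are tried as delta bases for each other.
 */
static unsigned basename_hash(const char *path)
{
	const char *base;
	unsigned hash = 0;

	assert(path != NULL);
	base = strrchr(path, '/');
	base = base ? base + 1 : path;
	for (; *base; base++)
		hash = hash * 11 + (unsigned char) *base;
	return hash;
}
```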
^ permalink raw reply
* Re: Porcelain specific metadata under .git?
From: Jakub Narebski @ 2006-06-14 11:32 UTC (permalink / raw)
To: git
In-Reply-To: <448FEED7.30701@op5.se>
Andreas Ericsson wrote:
> Shawn Pearce wrote:
>>
>> I already assume/know that refs/heads and refs/tags are completely
>> off-limits as they are for user refs only.
>>
>> I also think the core GIT tools already assume that anything
>> directly under .git which is strictly a file and which is named
>> entirely with uppercase letters (aside from "HEAD") is strictly a
>> temporary/short-lived state type item (e.g. COMMIT_MSG) used by a
>> Porcelain.
>>
>> But is it safe to say that ".git/refs/eclipse-workspaces" can probably
>> be used for this purpose? :-)
>>
>
> .git/eclipse/whatever-you-like
>
> would probably be better. Heads can be stored directly under .git/refs
> too. Most likely, nothing will ever be stored under .git/eclipse by
> either core git or the current (other) porcelains though.
I think if it is a ref, which one wants to be visible to git-fsck (and
git-prune), it should be under .git/refs.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
^ permalink raw reply
* Re: Porcelain specific metadata under .git?
From: Andreas Ericsson @ 2006-06-14 11:11 UTC (permalink / raw)
To: Shawn Pearce; +Cc: git
In-Reply-To: <20060614062240.GA13886@spearce.org>
Shawn Pearce wrote:
>
> I already assume/know that refs/heads and refs/tags are completely
> off-limits as they are for user refs only.
>
> I also think the core GIT tools already assume that anything
> directly under .git which is strictly a file and which is named
> entirely with uppercase letters (aside from "HEAD") is strictly a
> temporary/short-lived state type item (e.g. COMMIT_MSG) used by a
> Porcelain.
>
> But is it safe to say that ".git/refs/eclipse-workspaces" can probably
> be used for this purpose? :-)
>
.git/eclipse/whatever-you-like
would probably be better. Heads can be stored directly under .git/refs
too. Most likely, nothing will ever be stored under .git/eclipse by
either core git or the current (other) porcelains though.
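Shawn's stated assumption about transient files directly under .git can be written as a predicate. This sketch encodes that assumption only. It is not a rule git documents. The predicate also allows '_', because his own example, COMMIT_MSG, contains one. looks_like_transient_state is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical predicate for the convention discussed in this thread:
 * a name directly under .git made only of uppercase ASCII letters and
 * underscores, other than HEAD, is taken to be short-lived porcelain
 * state. Names containing '/' are rejected; whether the entry is a
 * regular file must be checked separately.
 */
static int looks_like_transient_state(const char *name)
{
	const char *p;

	assert(name != NULL);
	if (!*name || !strcmp(name, "HEAD"))
		return 0;
	for (p = name; *p; p++)
		if (!((*p >= 'A' && *p <= 'Z') || *p == '_'))
			return 0;
	return 1;
}
```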
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
^ permalink raw reply
* Re: 'sparse' clone idea
From: Jakub Narebski @ 2006-06-14 9:44 UTC (permalink / raw)
To: git
In-Reply-To: <Pine.LNX.4.63.0606141110001.15673@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin wrote:
> On Wed, 14 Jun 2006, Jakub Narebski wrote:
>
>> I wonder if the 'sparse clone' idea described below would avoid the most
>> difficult part of 'shallow clone' idea, namely the [sometimes] need to
>> un-cauterize history. See: (<7vac8lidwi.fsf@assigned-by-dhcp.cox.net>).
>
> I do not think that is the hardest problem. The hardest thing is to tell
> the server in an efficient manner which objects we have.
>
> Example:
>
> A - B - C - D
>     ^ cutoff
>         ^ current HEAD
>
> Suppose B is your fake root, C is your HEAD, you want to fetch D. Now,
> make it a difficult example: both A and D contain a certain blob Z, but
> neither B nor C do. You have to tell the server _in an efficient manner_
> to send Z also.
>
> And by efficient manner I mean: you may not bring the server down just
> because 5 people with shallow clones decide to fetch from it.
Nah, I think that is solved. Check the post by Junio C Hamano mentioned above,
in the "Re: Figured out how to get Mozilla into git" thread:
http://permalink.gmane.org/gmane.comp.version-control.git/21603
(although it would need an extension to the git protocol). Client and server
do a graft exchange both ways, limiting the commit ancestry graph that both
ends walk to the intersection of the fake views of the ancestry graph the two
ends have. Then the server uses those virtual grafts to calculate which objects
to send.
The rest is done (or should be done) by history grafting code.
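A toy model shows what the graft exchange buys in Dscho's A - B - C - D example. The model rests on two assumptions. First, each commit's tree is a small set of blob ids. Second, "objects to send" means the blobs reachable from the wanted commit minus the blobs reachable from what the client has. All names are hypothetical, and this is not the git protocol.

Suppose the server walks its real history. It then believes the client already has Z, because Z is in A, and it sends nothing. Now suppose both sides walk the view in which B is a root. In that case Z is sent.

```c
#include <assert.h>

/* Toy model: commits A - B - C - D; each tree is a bitmask of blob ids. */
enum { COMMIT_A, COMMIT_B, COMMIT_C, COMMIT_D, NCOMMITS };
enum { BLOB_X = 1 << 0, BLOB_Y = 1 << 1, BLOB_Z = 1 << 2 };

/* Z appears in A and D but in neither B nor C, as in the example. */
static const unsigned tree_of[NCOMMITS] = {
	[COMMIT_A] = BLOB_Z | BLOB_X,
	[COMMIT_B] = BLOB_X | BLOB_Y,
	[COMMIT_C] = BLOB_Y,
	[COMMIT_D] = BLOB_Y | BLOB_Z,
};

/* parent[i] is commit i's parent, or -1 for a root. */
static const int real_parents[NCOMMITS] = {
	[COMMIT_A] = -1, [COMMIT_B] = COMMIT_A,
	[COMMIT_C] = COMMIT_B, [COMMIT_D] = COMMIT_C,
};
/* The shallow client's view: B is grafted as a root. */
static const int grafted_parents[NCOMMITS] = {
	[COMMIT_A] = -1, [COMMIT_B] = -1,
	[COMMIT_C] = COMMIT_B, [COMMIT_D] = COMMIT_C,
};

/* Union of the blobs in commit c and all its ancestors under parent[]. */
static unsigned reachable_blobs(int c, const int parent[NCOMMITS])
{
	unsigned set = 0;

	for (; c >= 0; c = parent[c]) {
		assert(c < NCOMMITS);
		set |= tree_of[c];
	}
	return set;
}

/* Blobs to send for "want" when the client claims "have", with both
 * ends walking history through the same parent[] table. */
static unsigned blobs_to_send(int want, int have, const int parent[NCOMMITS])
{
	return reachable_blobs(want, parent) & ~reachable_blobs(have, parent);
}
```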
>> * merge bases for all commits in full, and in the sparse part,
>> _including_ merge bases themselves
>
> Hmmm. You cannot know _all_ merge bases beforehand, because you do not
> decide where other people fork off.
By "all merge bases" I mean: merge bases of all commits in the full part;
merge bases of commits in the full part and commits pointed to by tags in
the sparse part; merge bases of all of those together with the merge bases
already found in the sparse part; and so on, recursively.
>> * all roots
>
> Why?
Just in case, as the ultimate merge bases.
> P.S.: I think the problems of a lazy clone are much easier to solve...
I still think that the correct idea for the lazy clone is to have soft
grafts, so you have to solve at least part of the shallow clone/sparse clone
problems first.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
^ permalink raw reply
* Re: git-cvsimport doesn't quite work, wrt branches
From: sf @ 2006-06-14 9:37 UTC (permalink / raw)
To: git
In-Reply-To: <46a038f90606131555m7b1fa744g9770140c87598b7b@mail.gmail.com>
Martin Langhoff wrote:
...
> Yes, cvsps is relying on the wrong things. I am looking at parsecvs
> and the cvs2svn tool and wondering where to from here.
...
> I am starting to look at what I can do with cvs2svn to get the import
> into git. It seems to get very good patchsets, and it yields an easily
> readable DB. I'll either learn Python, or read the DB from Perl
> (probably from git-cvsimport).
SVN has a portable format called "dumpfile" (see
http://svn.collab.net/repos/svn/trunk/notes/fs_dumprestore.txt) which is
produced by "svnadmin dump ..." and "cvs2svn --dump-only ...".
Why not use it as input for importing into git?
Pros:
- "svnadmin dump" should be fast
- svn repositories can be tracked with "svnadmin dump" (just remember
the last imported revision and restart from there)
- cvs2svn seems to be very good at its job
- only one tool needed
Cons:
- Both svnadmin and cvs2svn only work on local repositories
- cvs2svn cannot be used for tracking
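The "remember the last imported revision" idea depends on the dumpfile record layout. Each record is a block of "Header: value" lines, then a blank line, then a body whose size is given by Content-length. The minimal sketch below assumes the whole dump is in memory as a NUL-terminated string. It finds the last Revision-number header, so an importer could resume after that revision. last_dump_revision is a hypothetical helper and ignores the rest of the format. It skips each body by length, so file contents that happen to contain "Revision-number:" are never read as headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the last "Revision-number:" in the record headers of an
 * in-memory dump, or -1 if there is none. */
static long last_dump_revision(const char *dump)
{
	const char *p = dump, *end;
	long last = -1;

	assert(dump != NULL);
	end = dump + strlen(dump);
	while (p < end) {
		size_t body = 0;

		/* blank lines separate records */
		while (p < end && *p == '\n')
			p++;
		/* header lines, up to the empty line that ends the block */
		while (p < end && *p != '\n') {
			const char *eol = memchr(p, '\n', (size_t)(end - p));
			size_t n = eol ? (size_t)(eol - p) : (size_t)(end - p);

			if (n > 17 && !strncmp(p, "Revision-number: ", 17))
				last = strtol(p + 17, NULL, 10);
			else if (n > 16 && !strncmp(p, "Content-length: ", 16))
				body = strtoul(p + 16, NULL, 10);
			p += eol ? n + 1 : n;
		}
		if (p < end)
			p++;	/* the empty line itself */
		/* skip the body so its text is never parsed as headers */
		p += body < (size_t)(end - p) ? body : (size_t)(end - p);
	}
	return last;
}
```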
Regards
Stephan
^ permalink raw reply
* Re: Repacking many disconnected blobs
From: Sergey Vlasov @ 2006-06-14 9:37 UTC (permalink / raw)
To: Keith Packard; +Cc: git
In-Reply-To: <1150269478.20536.150.camel@neko.keithp.com>
On Wed, 14 Jun 2006 00:17:58 -0700 Keith Packard wrote:
> parsecvs scans every ,v file and creates a blob for every revision of
> every file right up front. Once these are created, it discards the
> actual file contents and deals solely with the hash values.
>
> The problem is that while this is going on, the repository consists
> solely of disconnected objects, and I can't make git-repack put those
> into pack objects. This leaves the directories bloated, and operations
> within the tree quite sluggish. I'm importing a project with 30000 files
> and 30000 revisions (the CVS repository is about 700MB), and after
> scanning the files, and constructing (in memory) a complete revision
> history, the actual construction of the commits is happening at about 2
> per second, and about 70% of that time is in the kernel, presumably
> playing around in the repository.
>
> I'm assuming that if I could get these disconnected blobs all neatly
> tucked into a pack object, things might go a bit faster.
git-repack.sh basically does:
git-rev-list --objects --all | git-pack-objects .tmp-pack
When you have only disconnected blobs, obviously the first part does
not work - git-rev-list cannot find these blobs. However, you can do
that part manually - e.g., when you add a blob, do:
fprintf(list_file, "%s %s\n", sha1, path);
(path should be a relative path in the repo without ",v" or "Attic" -
it is used for delta packing optimization, so getting it wrong will
not cause any corruption, but the pack may become significantly
larger). You may output some duplicate sha1 values, but
git-pack-objects should handle duplicates correctly.
Then just invoke "git-pack-objects --non-empty .tmp_pack <list_file";
it will output the resulting pack sha1 to stdout. Then you need to
move the pack into place and call git-prune-packed (which does not
use object lists, so it should work even with unreachable objects).
You may even want to repack more than once during the import;
probably the simplest way to do it is to truncate list_file after
each repack and use "git-pack-objects --incremental".
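The list-file bookkeeping described here can be sketched in C. The sketch assumes the importer knows each blob's RCS file path relative to the CVS module root. rcs_to_repo_path (hypothetical) drops the ",v" suffix and an "Attic" directory directly above the file. write_list_entry (hypothetical) appends one "sha1 path" line in the format given above. The path only guides delta selection, so an imperfect conversion costs pack size, not correctness.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Turn an RCS path such as "src/Attic/foo.c,v" into the repository path
 * "src/foo.c". Writes at most outsz bytes including the NUL; returns 0
 * on success, -1 if the result does not fit.
 */
static int rcs_to_repo_path(const char *rcs, char *out, size_t outsz)
{
	size_t len, dirlen, baselen;
	const char *base;

	assert(rcs != NULL && out != NULL);
	len = strlen(rcs);
	if (len >= 2 && !strcmp(rcs + len - 2, ",v"))
		len -= 2;
	/* base: start of the last path component */
	base = rcs + len;
	while (base > rcs && base[-1] != '/')
		base--;
	dirlen = (size_t)(base - rcs);	/* includes the trailing '/' */
	baselen = len - dirlen;
	/* drop a whole "Attic/" component right above the file */
	if (dirlen >= 6 && !strncmp(base - 6, "Attic/", 6) &&
	    (dirlen == 6 || base[-7] == '/'))
		dirlen -= 6;
	if (dirlen + baselen + 1 > outsz)
		return -1;
	memcpy(out, rcs, dirlen);
	memcpy(out + dirlen, base, baselen);
	out[dirlen + baselen] = '\0';
	return 0;
}

/* Append one "sha1 path" line to the object list. */
static int write_list_entry(FILE *list_file, const char *sha1_hex,
			    const char *rcs)
{
	char path[4096];

	if (rcs_to_repo_path(rcs, path, sizeof(path)) < 0)
		return -1;
	return fprintf(list_file, "%s %s\n", sha1_hex, path) < 0 ? -1 : 0;
}
```

The list file is then fed to "git-pack-objects --non-empty" on standard input, as described above.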
^ permalink raw reply
* Re: 'sparse' clone idea
From: Johannes Schindelin @ 2006-06-14 9:20 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
In-Reply-To: <e6oh2g$ngh$1@sea.gmane.org>
Hi,
On Wed, 14 Jun 2006, Jakub Narebski wrote:
> I wonder if the 'sparse clone' idea described below would avoid the most
> difficult part of 'shallow clone' idea, namely the [sometimes] need to
> un-cauterize history. See: (<7vac8lidwi.fsf@assigned-by-dhcp.cox.net>).
I do not think that is the hardest problem. The hardest thing is to tell
the server in an efficient manner which objects we have.
Example:
A - B - C - D
    ^ cutoff
        ^ current HEAD
Suppose B is your fake root, C is your HEAD, you want to fetch D. Now,
make it a difficult example: both A and D contain a certain blob Z, but
neither B nor C do. You have to tell the server _in an efficient manner_
to send Z also.
And by efficient manner I mean: you may not bring the server down just
because 5 people with shallow clones decide to fetch from it.
> 'sparse clone' begins like 'shallow clone': full history is copied down to
> a specified point of history (cut-off or cauterization point for shallow
> clone), but instead of cauterizing the history from that point downwards,
> the history is simplified using grafts.
>
> In the sparse part we need:
> * all commits pointed to by tags (if we clone/copy tags)
> and other refs (if we clone/copy those tags)
> * merge bases for all commits in full, and in the sparse part,
> _including_ merge bases themselves
Hmmm. You cannot know _all_ merge bases beforehand, because you do not
decide where other people fork off.
> * all roots
Why?
> Commits in sparse part would be connected like in original history, only
> skipping "uninteresting" commits.
Interesting idea, though I do not think it solves the most pressing
problems we have with shallow clones.
Ciao,
Dscho
P.S.: I think the problems of a lazy clone are much easier to solve...
^ permalink raw reply