* Re: [PATCH] Transitively read alternatives
From: Jakub Narebski @ 2006-05-07 18:28 UTC (permalink / raw)
To: git
In-Reply-To: <20060507181920.GC23738@admingilde.org>
Martin Waitz wrote:
> When adding an alternate object store then add entries from its
> info/alternates files, too.
> Relative entries are only allowed in the current repository.
> Loops and duplicate alternates through multiple repositories are ignored.
> Just to be sure that nothing breaks it is not allow to build deep
> nesting levels using info/alternates.
> + if (depth > 5) {
> + error("%s: ignoring alternate object stores, nesting too deep.",
> + relative_base);
> + return;
> + }
Please, no magic numbers. Use preprocesor "constants" for that.
--
Jakub Narebski
Warsaw, Poland
^ permalink raw reply
* [PATCH] clone: don't clone the info/alternates file
From: Martin Waitz @ 2006-05-07 18:19 UTC (permalink / raw)
To: git
Now that the cloned alternates file is parsed, too we don't need to
copy it into our new repository, we just reference it.
Signed-off-by: Martin Waitz <tali@admingilde.org>
---
git-clone.sh | 6 +-----
1 files changed, 1 insertions(+), 5 deletions(-)
4dc26b7a15c0778459f2ccf85aad1c03d1b3a3cc
diff --git a/git-clone.sh b/git-clone.sh
index b785247..227245c 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -261,11 +261,7 @@ yes,yes)
;;
yes)
mkdir -p "$GIT_DIR/objects/info"
- {
- test -f "$repo/objects/info/alternates" &&
- cat "$repo/objects/info/alternates";
- echo "$repo/objects"
- } >>"$GIT_DIR/objects/info/alternates"
+ echo "$repo/objects" >> "$GIT_DIR/objects/info/alternates"
;;
esac
git-ls-remote "$repo" >"$GIT_DIR/CLONE_HEAD"
--
1.3.1.g6ef7
^ permalink raw reply related
* [PATCH] test case for transitive info/alternates
From: Martin Waitz @ 2006-05-07 18:19 UTC (permalink / raw)
To: git
Signed-off-by: Martin Waitz <tali@admingilde.org>
---
t/t5710-info-alternate.sh | 105 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 105 insertions(+), 0 deletions(-)
create mode 100755 t/t5710-info-alternate.sh
47d59f8b0c26226a40344673ecd9e6255a576b98
diff --git a/t/t5710-info-alternate.sh b/t/t5710-info-alternate.sh
new file mode 100755
index 0000000..097d037
--- /dev/null
+++ b/t/t5710-info-alternate.sh
@@ -0,0 +1,105 @@
+#!/bin/sh
+#
+# Copyright (C) 2006 Martin Waitz <tali@admingilde.org>
+#
+
+test_description='test transitive info/alternate entries'
+. ./test-lib.sh
+
+# test that a file is not reachable in the current repository
+# but that it is after creating a info/alternate entry
+reachable_via() {
+ alternate="$1"
+ file="$2"
+ if git cat-file -e "HEAD:$file"; then return 1; fi
+ echo "$alternate" >> .git/objects/info/alternate
+ git cat-file -e "HEAD:$file"
+}
+
+test_valid_repo() {
+ git fsck-objects --full > fsck.log &&
+ test `wc -l < fsck.log` = 0
+}
+
+base_dir=`pwd`
+
+test_expect_success 'preparing first repository' \
+'test_create_repo A && cd A &&
+echo "Hello World" > file1 &&
+git add file1 &&
+git commit -m "Initial commit" file1 &&
+git repack -a -d &&
+git prune'
+
+cd "$base_dir"
+
+test_expect_success 'preparing second repository' \
+'git clone -l -s A B && cd B &&
+echo "foo bar" > file2 &&
+git add file2 &&
+git commit -m "next commit" file2 &&
+git repack -a -d -l &&
+git prune'
+
+cd "$base_dir"
+
+test_expect_success 'preparing third repository' \
+'git clone -l -s B C && cd C &&
+echo "Goodbye, cruel world" > file3 &&
+git add file3 &&
+git commit -m "one more" file3 &&
+git repack -a -d -l &&
+git prune'
+
+cd "$base_dir"
+
+test_expect_failure 'creating too deep nesting' \
+'git clone -l -s C D &&
+git clone -l -s D E &&
+git clone -l -s E F &&
+git clone -l -s F G &&
+test_valid_repo'
+
+cd "$base_dir"
+
+test_expect_success 'validity of third repository' \
+'cd C &&
+test_valid_repo'
+
+cd "$base_dir"
+
+test_expect_success 'validity of fourth repository' \
+'cd D &&
+test_valid_repo'
+
+cd "$base_dir"
+
+test_expect_success 'breaking of loops' \
+"echo '$base_dir/B/.git/objects' >> '$base_dir'/A/.git/objects/info/alternates&&
+cd C &&
+test_valid_repo"
+
+cd "$base_dir"
+
+test_expect_failure 'that info/alternates is neccessary' \
+'cd C &&
+rm .git/objects/info/alternates &&
+test_valid_repo'
+
+cd "$base_dir"
+
+test_expect_success 'that relative alternate is possible for current dir' \
+'cd C &&
+echo "../../../B/.git/objects" > .git/objects/info/alternates &&
+test_valid_repo'
+
+cd "$base_dir"
+
+test_expect_failure 'that relative alternate is only possible for current dir' \
+'cd D &&
+test_valid_repo'
+
+cd "$base_dir"
+
+test_done
+
--
1.3.1.g6ef7
^ permalink raw reply related
* [PATCH] Transitively read alternatives
From: Martin Waitz @ 2006-05-07 18:19 UTC (permalink / raw)
To: git
When adding an alternate object store then add entries from its
info/alternates files, too.
Relative entries are only allowed in the current repository.
Loops and duplicate alternates through multiple repositories are ignored.
Just to be sure that nothing breaks it is not allow to build deep
nesting levels using info/alternates.
Signed-off-by: Martin Waitz <tali@admingilde.org>
---
sha1_file.c | 178 +++++++++++++++++++++++++++++++++++------------------------
1 files changed, 106 insertions(+), 72 deletions(-)
34981f5467a86bb09120cc3c9637b98048fada04
diff --git a/sha1_file.c b/sha1_file.c
index 5464828..b62d0e3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -217,6 +217,8 @@ char *sha1_pack_index_name(const unsigne
struct alternate_object_database *alt_odb_list;
static struct alternate_object_database **alt_odb_tail;
+static void read_info_alternates(const char * alternates, int depth);
+
/*
* Prepare alternate object database registry.
*
@@ -232,14 +234,85 @@ static struct alternate_object_database
* SHA1, an extra slash for the first level indirection, and the
* terminating NUL.
*/
-static void link_alt_odb_entries(const char *alt, const char *ep, int sep,
- const char *relative_base)
+static int link_alt_odb_entry(const char * entry, int len, const char * relative_base, int depth)
{
- const char *cp, *last;
- struct alternate_object_database *ent;
+ struct stat st;
const char *objdir = get_object_directory();
+ struct alternate_object_database *ent;
+ struct alternate_object_database *alt;
+ /* 43 = 40-byte + 2 '/' + terminating NUL */
+ int pfxlen = len;
+ int entlen = pfxlen + 43;
int base_len = -1;
+ if (*entry != '/' && relative_base) {
+ /* Relative alt-odb */
+ if (base_len < 0)
+ base_len = strlen(relative_base) + 1;
+ entlen += base_len;
+ pfxlen += base_len;
+ }
+ ent = xmalloc(sizeof(*ent) + entlen);
+
+ if (*entry != '/' && relative_base) {
+ memcpy(ent->base, relative_base, base_len - 1);
+ ent->base[base_len - 1] = '/';
+ memcpy(ent->base + base_len, entry, len);
+ }
+ else
+ memcpy(ent->base, entry, pfxlen);
+
+ ent->name = ent->base + pfxlen + 1;
+ ent->base[pfxlen + 3] = '/';
+ ent->base[pfxlen] = ent->base[entlen-1] = 0;
+
+ /* Detect cases where alternate disappeared */
+ if (stat(ent->base, &st) || !S_ISDIR(st.st_mode)) {
+ error("object directory %s does not exist; "
+ "check .git/objects/info/alternates.",
+ ent->base);
+ free(ent);
+ return -1;
+ }
+
+ /* Prevent the common mistake of listing the same
+ * thing twice, or object directory itself.
+ */
+ for (alt = alt_odb_list; alt; alt = alt->next) {
+ if (!memcmp(ent->base, alt->base, pfxlen)) {
+ free(ent);
+ return -1;
+ }
+ }
+ if (!memcmp(ent->base, objdir, pfxlen)) {
+ free(ent);
+ return -1;
+ }
+
+ /* add the alternate entry */
+ *alt_odb_tail = ent;
+ alt_odb_tail = &(ent->next);
+ ent->next = NULL;
+
+ /* recursively add alternates */
+ read_info_alternates(ent->base, depth + 1);
+
+ ent->base[pfxlen] = '/';
+
+ return 0;
+}
+
+static void link_alt_odb_entries(const char *alt, const char *ep, int sep,
+ const char *relative_base, int depth)
+{
+ const char *cp, *last;
+
+ if (depth > 5) {
+ error("%s: ignoring alternate object stores, nesting too deep.",
+ relative_base);
+ return;
+ }
+
last = alt;
while (last < ep) {
cp = last;
@@ -249,60 +322,15 @@ static void link_alt_odb_entries(const c
last = cp + 1;
continue;
}
- for ( ; cp < ep && *cp != sep; cp++)
- ;
+ while (cp < ep && *cp != sep)
+ cp++;
if (last != cp) {
- struct stat st;
- struct alternate_object_database *alt;
- /* 43 = 40-byte + 2 '/' + terminating NUL */
- int pfxlen = cp - last;
- int entlen = pfxlen + 43;
-
- if (*last != '/' && relative_base) {
- /* Relative alt-odb */
- if (base_len < 0)
- base_len = strlen(relative_base) + 1;
- entlen += base_len;
- pfxlen += base_len;
- }
- ent = xmalloc(sizeof(*ent) + entlen);
-
- if (*last != '/' && relative_base) {
- memcpy(ent->base, relative_base, base_len - 1);
- ent->base[base_len - 1] = '/';
- memcpy(ent->base + base_len,
- last, cp - last);
- }
- else
- memcpy(ent->base, last, pfxlen);
-
- ent->name = ent->base + pfxlen + 1;
- ent->base[pfxlen + 3] = '/';
- ent->base[pfxlen] = ent->base[entlen-1] = 0;
-
- /* Detect cases where alternate disappeared */
- if (stat(ent->base, &st) || !S_ISDIR(st.st_mode)) {
- error("object directory %s does not exist; "
- "check .git/objects/info/alternates.",
- ent->base);
- goto bad;
- }
- ent->base[pfxlen] = '/';
-
- /* Prevent the common mistake of listing the same
- * thing twice, or object directory itself.
- */
- for (alt = alt_odb_list; alt; alt = alt->next)
- if (!memcmp(ent->base, alt->base, pfxlen))
- goto bad;
- if (!memcmp(ent->base, objdir, pfxlen)) {
- bad:
- free(ent);
- }
- else {
- *alt_odb_tail = ent;
- alt_odb_tail = &(ent->next);
- ent->next = NULL;
+ if ((*last != '/') && depth) {
+ error("%s: ignoring relative alternate object store %s",
+ relative_base, last);
+ } else {
+ link_alt_odb_entry(last, cp - last,
+ relative_base, depth);
}
}
while (cp < ep && *cp == sep)
@@ -311,23 +339,14 @@ static void link_alt_odb_entries(const c
}
}
-void prepare_alt_odb(void)
+static void read_info_alternates(const char * relative_base, int depth)
{
- char path[PATH_MAX];
char *map;
- int fd;
struct stat st;
- char *alt;
-
- alt = getenv(ALTERNATE_DB_ENVIRONMENT);
- if (!alt) alt = "";
-
- if (alt_odb_tail)
- return;
- alt_odb_tail = &alt_odb_list;
- link_alt_odb_entries(alt, alt + strlen(alt), ':', NULL);
+ char path[PATH_MAX];
+ int fd;
- sprintf(path, "%s/info/alternates", get_object_directory());
+ sprintf(path, "%s/info/alternates", relative_base);
fd = open(path, O_RDONLY);
if (fd < 0)
return;
@@ -340,11 +359,26 @@ void prepare_alt_odb(void)
if (map == MAP_FAILED)
return;
- link_alt_odb_entries(map, map + st.st_size, '\n',
- get_object_directory());
+ link_alt_odb_entries(map, map + st.st_size, '\n', relative_base, depth);
+
munmap(map, st.st_size);
}
+void prepare_alt_odb(void)
+{
+ char *alt;
+
+ alt = getenv(ALTERNATE_DB_ENVIRONMENT);
+ if (!alt) alt = "";
+
+ if (alt_odb_tail)
+ return;
+ alt_odb_tail = &alt_odb_list;
+ link_alt_odb_entries(alt, alt + strlen(alt), ':', NULL, 0);
+
+ read_info_alternates(get_object_directory(), 0);
+}
+
static char *find_sha1_file(const unsigned char *sha1, struct stat *st)
{
char *name = sha1_file_name(sha1);
--
1.3.1.g6ef7
^ permalink raw reply related
* [PATCH] clone: keep --reference even with -l -s
From: Martin Waitz @ 2006-05-07 18:19 UTC (permalink / raw)
To: git
Both -l -s and --reference update objects/info/alternates and used
to write over each other.
Signed-off-by: Martin Waitz <tali@admingilde.org>
---
git-clone.sh | 2 +
t/t5700-clone-reference.sh | 78 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+), 1 deletions(-)
create mode 100755 t/t5700-clone-reference.sh
3410837e34357d43a38a170691ca45f8f3a82221
diff --git a/git-clone.sh b/git-clone.sh
index 0805168..b785247 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -265,7 +265,7 @@ yes,yes)
test -f "$repo/objects/info/alternates" &&
cat "$repo/objects/info/alternates";
echo "$repo/objects"
- } >"$GIT_DIR/objects/info/alternates"
+ } >>"$GIT_DIR/objects/info/alternates"
;;
esac
git-ls-remote "$repo" >"$GIT_DIR/CLONE_HEAD"
diff --git a/t/t5700-clone-reference.sh b/t/t5700-clone-reference.sh
new file mode 100755
index 0000000..916ee15
--- /dev/null
+++ b/t/t5700-clone-reference.sh
@@ -0,0 +1,78 @@
+#!/bin/sh
+#
+# Copyright (C) 2006 Martin Waitz <tali@admingilde.org>
+#
+
+test_description='test clone --reference'
+. ./test-lib.sh
+
+base_dir=`pwd`
+
+test_expect_success 'preparing first repository' \
+'test_create_repo A && cd A &&
+echo first > file1 &&
+git add file1 &&
+git commit -m initial'
+
+cd "$base_dir"
+
+test_expect_success 'preparing second repository' \
+'git clone A B && cd B &&
+echo second > file2 &&
+git add file2 &&
+git commit -m addition &&
+git repack -a -d &&
+git prune'
+
+cd "$base_dir"
+
+test_expect_success 'cloning with reference' \
+'git clone -l -s --reference B A C'
+
+cd "$base_dir"
+
+test_expect_success 'existance of info/alternates' \
+'test `wc -l <C/.git/objects/info/alternates` = 2'
+
+cd "$base_dir"
+
+test_expect_success 'pulling from reference' \
+'cd C &&
+git pull ../B'
+
+cd "$base_dir"
+
+test_expect_success 'that reference gets used' \
+'cd C &&
+echo "0 objects, 0 kilobytes" > expected &&
+git count-objects > current &&
+diff expected current'
+
+cd "$base_dir"
+
+test_expect_success 'updating origin' \
+'cd A &&
+echo third > file3 &&
+git add file3 &&
+git commit -m update &&
+git repack -a -d &&
+git prune'
+
+cd "$base_dir"
+
+test_expect_success 'pulling changes from origin' \
+'cd C &&
+git pull origin'
+
+cd "$base_dir"
+
+# the 2 local objects are commit and tree from the merge
+test_expect_success 'that alternate to origin gets used' \
+'cd C &&
+echo "2 objects" > expected &&
+git count-objects | cut -d, -f1 > current &&
+diff expected current'
+
+cd "$base_dir"
+
+test_done
--
1.3.1.g6ef7
^ permalink raw reply related
* [PATCH] repack: honor -d even when no new pack was created
From: Martin Waitz @ 2006-05-07 18:18 UTC (permalink / raw)
To: git
If all objects are reachable via an alternate object store then we
still have to remove all obsolete local packs.
Signed-off-by: Martin Waitz <tali@admingilde.org>
---
git-repack.sh | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
c37df94d2b1633ce329bbe079805073b40a48548
diff --git a/git-repack.sh b/git-repack.sh
index e0c9f32..4fb3f26 100755
--- a/git-repack.sh
+++ b/git-repack.sh
@@ -48,15 +48,15 @@ name=$(git-rev-list --objects --all $rev
exit 1
if [ -z "$name" ]; then
echo Nothing new to pack.
- exit 0
-fi
-echo "Pack pack-$name created."
+else
+ echo "Pack pack-$name created."
-mkdir -p "$PACKDIR" || exit
+ mkdir -p "$PACKDIR" || exit
-mv .tmp-pack-$name.pack "$PACKDIR/pack-$name.pack" &&
-mv .tmp-pack-$name.idx "$PACKDIR/pack-$name.idx" ||
-exit
+ mv .tmp-pack-$name.pack "$PACKDIR/pack-$name.pack" &&
+ mv .tmp-pack-$name.idx "$PACKDIR/pack-$name.idx" ||
+ exit
+fi
if test "$remove_redundant" = t
then
--
1.3.1.g6ef7
^ permalink raw reply related
* Re: symlinks
From: Linus Torvalds @ 2006-05-07 18:07 UTC (permalink / raw)
To: Yakov Lerner; +Cc: git
In-Reply-To: <f36b08ee0605071047h32ccef4bk76ac360ada1331a@mail.gmail.com>
On Sun, 7 May 2006, Yakov Lerner wrote:
>
> I have a project that makes heavy use of symlinks in the source tree.
> I added it to git, then cloned the repository, and all symlinks were
> converted to plain files. What am I missing to preserve symlinks ?
You're not missing anything, it sounds like a bug. What did you use to
clone, and what version? It definitely doesn't happen for me:
mkdir symlink
cd symlink/
git-init-db
ln -s unknown new-link
ls -l
git add new-link
git commit
cd
git clone symlink symlink2
cd symlink2/
ls -l
shows that the symlink was preserved.
But maybe there's a bug in some older version, or in some other clone
protocol...
Linus
^ permalink raw reply
* symlinks
From: Yakov Lerner @ 2006-05-07 17:47 UTC (permalink / raw)
To: git
Hello,
I have a project that makes heavy use of symlinks in the source tree.
I added it to git, then cloned the repository, and all symlinks were
converted to plain files. What am I missing to preserve symlinks ?
Yakov
^ permalink raw reply
* [PATCH] core-tutorial.txt: escape asterisk
From: Matthias Lederhofer @ 2006-05-07 17:32 UTC (permalink / raw)
To: git
---
I don't now exactly why, but this asterisk has to be escaped to show
up correctly in the html format.
Documentation/core-tutorial.txt | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
7e3a9fbafb8b6aa4ce460221a982feda06549215
diff --git a/Documentation/core-tutorial.txt b/Documentation/core-tutorial.txt
index 4211c81..d1360ec 100644
--- a/Documentation/core-tutorial.txt
+++ b/Documentation/core-tutorial.txt
@@ -971,7 +971,7 @@ environment, is `git show-branch`.
The first two lines indicate that it is showing the two branches
and the first line of the commit log message from their
top-of-the-tree commits, you are currently on `master` branch
-(notice the asterisk `*` character), and the first column for
+(notice the asterisk `\*` character), and the first column for
the later output lines is used to show commits contained in the
`master` branch, and the second column for the `mybranch`
branch. Three commits are shown along with their log messages.
--
1.3.2
^ permalink raw reply related
* [PATCH] Fix crash when reading the empty tree
From: Johannes Schindelin @ 2006-05-07 15:42 UTC (permalink / raw)
To: git, junkio
cvsimport needs to call git-read-tree without arguments to create an empty
tree.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
read-tree.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/read-tree.c b/read-tree.c
index 067fb95..a060a97 100644
--- a/read-tree.c
+++ b/read-tree.c
@@ -881,7 +881,7 @@ int main(int argc, char **argv)
* valid cache-tree because the index must match exactly
* what came from the tree.
*/
- if (trees->item && (!merge || (stage == 2))) {
+ if (trees && trees->item && (!merge || (stage == 2))) {
cache_tree_free(&active_cache_tree);
prime_cache_tree();
}
--
1.3.2.g5131-dirty
^ permalink raw reply related
* Re: Unresolved issues #2 (shallow clone again)
From: Linus Torvalds @ 2006-05-07 15:27 UTC (permalink / raw)
To: Jeff King; +Cc: git
In-Reply-To: <20060507075631.GA24423@coredump.intra.peff.net>
On Sun, 7 May 2006, Jeff King wrote:
>
> - Total savings by going shallow: 10.7%
>
> So basically, trees and commits DON'T compress as well as historical
> blobs (potentially because git-pack-objects isn't currently optimized
> for this -- I haven't checked). As a result, we're saving only 10% by
> going shallow instead of a potential 50%.
The biggest size savers from packing is (in rough order of relevance, if
I recall the rough statistics I did):
- avoiding block boundaries.
- delta packing of blobs
- delta packing of trees
- regular compression
The block boundaries are huge, we have tons of small objects, and that was
one of the primary reasons for packing. I'd suspect that this is a 3:1
factor for a lot of things for many "common" filesystem setups. You
probably didn't even account for the size of inodes in your "du" setup.
And blobs with history generally delta very well (_much_ better than
regular compression).
Trees should _delta_ very well, but they basically don't compress,
especially after deltaing. The SHA1's are totally incompressible (in a
tree they aren't even ASCII), and as a deta, the names won't compress much
either because they are short.
Commits are fairly small, shouldn't delta all that much, and they don't
even compress _that_ well either (they're normal text and often have some
redundancy with the committer and author being the same, but they are
short and have some fairly incompressible elements, so..)
The thing with trees in particular is that they are very common for the
kernel (and probably not so much for many other projects). A single commit
ends up quite commonly being just one commit object, one blob (that deltas
really well), and three or four trees. Merges often have no new blobs at
all, just several new trees and the commit object.
So a huge amount of the wins from packing come from the file _history_,
the part that a shallow clone (on purpose) leaves behind.
The regular compression will pick up a fair amount of slack with the
blobs, but it's a much smaller factor than the delta compression for
something that has a long history.
It's somewhat interesting to note that over the year that we've used git,
the kernel pack-size hasn't even increased all that much. I forget exactly
what it was when we started packing, but it was on the order of ~75M. It
is now 115M for me. And the old linux-history thing (full BK history over
three years) is 177M - not much more than twice the size of just a few
kernel versions - with some higher packing ratios..
Exactly because blobs delta so incredibly well.
Linus
^ permalink raw reply
* [PATCH] Sparse fix for builtin-diff
From: Peter Hagervall @ 2006-05-07 14:50 UTC (permalink / raw)
To: junkio; +Cc: git
You gotta love sparse:
builtin-diff.c:88:4: error: Just how const do you want this type to be?
Signed-off-by: Peter Hagervall <hager@cs.umu.se>
---
builtin-diff.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/builtin-diff.c b/builtin-diff.c
index 636edbf..d3ac581 100644
--- a/builtin-diff.c
+++ b/builtin-diff.c
@@ -84,8 +84,7 @@ static void stuff_change(struct diff_opt
if (opt->reverse_diff) {
unsigned tmp;
- const
- const unsigned char *tmp_u;
+ const unsigned char *tmp_u;
const char *tmp_c;
tmp = old_mode; old_mode = new_mode; new_mode = tmp;
tmp_u = old_sha1; old_sha1 = new_sha1; new_sha1 = tmp_u;
^ permalink raw reply related
* Re: Unresolved issues #2 (shallow clone again)
From: Jakub Narebski @ 2006-05-07 13:30 UTC (permalink / raw)
To: git
In-Reply-To: <7v1wv92u7o.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano wrote:
> The vocabulary we would want from the requestor side is probably
> (at least):
>
> I WANT to have these
> I HAVE these
> I'm MISSING these
> Don't bother with these this time around (--since, ^v2.6.16, ...)
Wouldn't it be easier (sorry, no code yet) to have the following:
I WANT to have these
I HAVE these
These are GRAFT PARENTLESS
with the target side sending list of all parentless commits in the
info/grafts file. The source side will then do the grafting 'in memory' and
send the packs like normal, only with those cauterizing grafts in place.
Now I'm waiting for someone to say that it is too simple and cannot be done,
or that shallow clone/shallow fetch uses this method...
--
Jakub Narebski
Warsaw, Poland
^ permalink raw reply
* Re: [PATCH] config: mmap() the contents of the config file
From: Johannes Schindelin @ 2006-05-07 11:40 UTC (permalink / raw)
To: git, junkio
In-Reply-To: <Pine.LNX.4.63.0605070103270.26358@wbgn013.biozentrum.uni-wuerzburg.de>
Hi,
On Sun, 7 May 2006, Johannes Schindelin wrote:
> This makes it possible to rewrite the config without accessing the config
> file twice.
Since commit c2b9e699, this is needed, too:
diff --git a/config.c b/config.c
index 59006fa..30effe3 100644
--- a/config.c
+++ b/config.c
@@ -362,7 +362,7 @@ static int store_aux(const char* key, co
} else if (strrchr(key, '.') - key == store.baselen &&
!strncmp(key, store.key, store.baselen)) {
store.state = SECTION_SEEN;
- store.offset[store.seen] = ftell(config_file);
+ store.offset[store.seen] = config_offset;
}
}
return 0;
--
1.3.2.g4e38b-dirty
^ permalink raw reply related
* Re: Unresolved issues #2
From: Jakub Narebski @ 2006-05-07 11:38 UTC (permalink / raw)
To: git
In-Reply-To: <7vy7xekwbs.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano wrote:
> char *value; /* "existence" bool may have NULL,
> * otherwise probably a string "= value"
> */
Probably " = value" to preserve whitespace (e.g. justify on equal sign in
hand crafted config file).
--
Jakub Narebski
Warsaw, Poland
^ permalink raw reply
* Re: Unresolved issues #2
From: Johannes Schindelin @ 2006-05-07 11:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vy7xekwbs.fsf@assigned-by-dhcp.cox.net>
Hi,
On Sun, 7 May 2006, Junio C Hamano wrote:
> [...] Probably a reasonable convention would be to define the config
> file format to be something like this:
>
> <comment that applies to the section>
> [section]
> <comment that applies to the variable stands on
> its own before the variable>
> variable [= value] <comment that applies to the
> fact the variable is set to
> this particular value starts
> on the same line as the
> "variable = value" thing>
>
> - when a variable is reset to another value, remove the
> "value comment";
> - when a variable disappears, remove "variable comment";
> - when a section disappears, remove "section comment";
> - otherwise leave the comment intact.
>
> Then we could tell the user the rule is like above, and tell
> them to structure the file with comments that way, if they ever
> want to edit the file by hand.
>
> Now if we wanted to do something like the above, I suspect that
> it would be easier and less error prone to first scan the config
> file, note what appears where, and do the processing in-core,
> and then write the results out, perhaps using data structures
> like these:
>
> struct config_section {
> char *pre_comment;
> char *name; /* e.g. "core" */
> struct config_section *next; /* next section */
> struct config_var *vars; /* pointer to the first one */
> };
> struct config_var {
> char *pre_comment;
> char *name;
> char *value; /* "existence" bool may have NULL,
> * otherwise probably a string "= value"
> */
> char *value_comment;
> struct config_var *next; /* pointer to the next one
> * in the section
> */
> };
>
> Obviously, data structures like these would make it even easier
> if we decide we do _not_ care about comments (we would just lose
> x_comment fields, parse the thing and write the resulting list
> out).
Sounds very reasonable.
Ciao,
Dscho
^ permalink raw reply
* Re: Unresolved issues #2
From: Junio C Hamano @ 2006-05-07 9:42 UTC (permalink / raw)
To: git
In-Reply-To: <7vy7xekwbs.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano <junkio@cox.net> writes:
> But what about prefersymlinkrefs one? When setting the variable
> to such a non-standard value, it is unreasonable for people to
> want to justify why with a comment like the above.
Obviously I was not reading what I was typing. It is very
reasonable for people to want to do that. Sorry.
^ permalink raw reply
* Re: Unresolved issues #2
From: Junio C Hamano @ 2006-05-07 9:39 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0605062332420.6423@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> It was done because the very syntax of the config suggests it be a
> user-editable file. I do not want to mess with the comments more than
> necessary.
I personally feel that is a lost cause _unless_ you come up with
a reasonable convention for where to put comments, stress that
rule to the user in the documentation, _and_ make repo-config to
follow that rule as well.
We _do_ want to treat config file as hand editable and cat
reviewable file, not an unreadable gunk like xml, so trying to
preserve user comments is important and I am not opposed to that
you did (at least some of) it. But as the code currently
stands, what it does is at best half baked, at worst somewhat
confusing.
A demonstration. What is wrong with this picture?
$ cat .git/config
[core]
repositoryformatversion = 0
; are the mode bits trustworthy?
filemode = true ; yes, on ext3
; We want symlinked HEAD because we will bisect
; recent kernel history.
prefersymlinkrefs = true
$ git repo-config core.prefersymlinkrefs false
$ git repo-config core.filemode false
$ cat .git/config
[core]
repositoryformatversion = 0
; are the mode bits trustworthy?
filemode = false
; We want symlinked HEAD because we will bisect
; recent kernel history.
prefersymlinkrefs = false
$ exit
The comment given to "filemode" is "reasonable" in that it
describes what the value that is set to the variable does, and
losing the original comment given to its "true" when we set it
to false is better than keeping it, so that part happens to be
doing the right thing -- only because I knew what repo-config
would do to the comments and arranged original comments in the
file that way.
But what about prefersymlinkrefs one? When setting the variable
to such a non-standard value, it is unreasonable for people to
want to justify why with a comment like the above. But after
resetting the value the comment becomes stale.
It gets worse:
$ git repo-config --unset core.filemode
$ cat .git/config
[core]
repositoryformatversion = 0
; please please use symlinks please
prefersymlinkrefs = false
; are the mode bits trustworthy?
$ exit
There now is a confusing trailing comment left that does not
comment anything. Removing core.filemode is not so common, but
this can happen whenever you remove any variable, so we can use
any other variable as an example.
Now, enough being negative and pointing out problems. Time to
become constructive. Probably a reasonable convention would be
to define the config file format to be something like this:
<comment that applies to the section>
[section]
<comment that applies to the variable stands on
its own before the variable>
variable [= value] <comment that applies to the
fact the variable is set to
this particular value starts
on the same line as the
"variable = value" thing>
- when a variable is reset to another value, remove the
"value comment";
- when a variable disappears, remove "variable comment";
- when a section disappears, remove "section comment";
- otherwise leave the comment intact.
Then we could tell the user the rule is like above, and tell
them to structure the file with comments that way, if they ever
want to edit the file by hand.
Now if we wanted to do something like the above, I suspect that
it would be easier and less error prone to first scan the config
file, note what appears where, and do the processing in-core,
and then write the results out, perhaps using data structures
like these:
struct config_section {
char *pre_comment;
char *name; /* e.g. "core" */
struct config_section *next; /* next section */
struct config_var *vars; /* pointer to the first one */
};
struct config_var {
char *pre_comment;
char *name;
char *value; /* "existence" bool may have NULL,
* otherwise probably a string "= value"
*/
char *value_comment;
struct config_var *next; /* pointer to the next one
* in the section
*/
};
Obviously, data structures like these would make it even easier
if we decide we do _not_ care about comments (we would just lose
x_comment fields, parse the thing and write the resulting list
out).
^ permalink raw reply
* Re: [PATCH] config: if mtime (or size) of the config file changed since last read, reread it
From: Junio C Hamano @ 2006-05-07 8:42 UTC (permalink / raw)
To: Jan-Benedict Glaw; +Cc: git
In-Reply-To: <20060507073052.GC17031@lug-owl.de>
Jan-Benedict Glaw <jbglaw@lug-owl.de> writes:
>> + if (in_fd < 0 && ENOENT != errno )
>
> I admit that I don't like the (constant -operator- variable) notation,
> but mixing both in one line..?
Lol.
I would have written it as (in_fd < 0 && errno != ENOENT) BTW.
^ permalink raw reply
* Re: Unresolved issues #2 (shallow clone again)
From: Sergey Vlasov @ 2006-05-07 8:01 UTC (permalink / raw)
To: Martin Langhoff; +Cc: Junio C Hamano, git
In-Reply-To: <46a038f90605062308x53995076k7bf45f0aebcae0c6@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1915 bytes --]
On Sun, 7 May 2006 18:08:03 +1200 Martin Langhoff wrote:
> On 5/6/06, Junio C Hamano <junkio@cox.net> wrote:
> > "Martin Langhoff" <martin.langhoff@gmail.com> writes:
> > >
> > > It means that for a merge or checkout involving stuff we "don't have",
> > > it's trivial to know we are missing, and so we can attempt a fetch of
> > > the missing objects or tell the user how to request them them before
> > > retrying.
> > >
> > > And in any case commits and trees are lightweight and compress well...
> >
> > Commit maybe, but is this based on a hard fact?
>
> No hard facts here :( but I think it's reasonable to assume that the
> trees delta/compress reasonably well, as a given commit will change
> just a few entries in each tree.
>
> I might try and hack a shallow local clone of the kernel and pack it
> tightly to see what it yields.
For linux v2.6.16:
7,3M commits-b41b04a36afebdba3b70b74f419fc7d97249bd7f.pack
24M commits_trees-8397f1c2a885527acd07e2caa8c95df626451493.pack
97M full-c7b2747a674ff55cb4a59dabebe419f191e360df.pack
For comparizon, a single version in packed form:
51M v2.6.12-rc2-4f3526b6815eb63da6c43ed85be1494bb776e2c5.pack
Made with
git-rev-list v2.6.16 | git-pack-objects commits
git-rev-list --objects --no-blobs v2.6.16 | git-pack-objects commits_trees
git-rev-list --objects v2.6.16 | git-pack-objects full
and this hack to git-rev-list:
diff --git a/revision.c b/revision.c
index f2a9f25..b5a929e 100644
--- a/revision.c
+++ b/revision.c
@@ -636,6 +636,10 @@ int setup_revisions(int argc, const char
revs->blob_objects = 1;
continue;
}
+ if (!strcmp(arg, "--no-blobs")) {
+ revs->blob_objects = 0;
+ continue;
+ }
if (!strcmp(arg, "--objects-edge")) {
revs->tag_objects = 1;
revs->tree_objects = 1;
So trees are definitely not lightweight, and commits are rather large
too.
[-- Attachment #2: Type: application/pgp-signature, Size: 190 bytes --]
^ permalink raw reply related
* Re: Unresolved issues #2 (shallow clone again)
From: Jeff King @ 2006-05-07 7:56 UTC (permalink / raw)
To: git
In-Reply-To: <46a038f90605062308x53995076k7bf45f0aebcae0c6@mail.gmail.com>
On Sun, May 07, 2006 at 06:08:03PM +1200, Martin Langhoff wrote:
> >> And in any case commits and trees are lightweight and compress well...
> >Commit maybe, but is this based on a hard fact?
> No hard facts here :( but I think it's reasonable to assume that the
> trees delta/compress reasonably well, as a given commit will change
> just a few entries in each tree.
A few hard facts (using Linus' linux-2.6 tree):
- original packsize: 120996 kilobytes
- unpacked: 233338 objects, 1417476 kilobytes
This is an 11.7:1 compression ratio (of course, much of this is
wasted space from the 4k block size in the filesystem)
- There were 87915 total blob objects, of which 19321 were in the
current tree. I removed all non-current blobs to produce a "shallow"
tree.
- The shallow tree unpacked: 164744 objects, 761960 kilobytes
IOW, about half of the unpacked disk usage was old blobs.
- Shallow commit/tree/tag objects packed (using 1.3.1
git-pack-objects):
Total 164744, written 164744 (delta 92322), reused 0 (delta 0)
size: 108088
The compression ratio here is only 7.0:1
- Total savings by going shallow: 10.7%
So basically, trees and commits DON'T compress as well as historical
blobs (potentially because git-pack-objects isn't currently optimized
for this -- I haven't checked). As a result, we're saving only 10% by
going shallow instead of a potential 50%.
-Peff
^ permalink raw reply
* Re: [PATCH] config: if mtime (or size) of the config file changed since last read, reread it
From: Jan-Benedict Glaw @ 2006-05-07 7:30 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0605070144530.7578@wbgn013.biozentrum.uni-wuerzburg.de>
[-- Attachment #1: Type: text/plain, Size: 820 bytes --]
On Sun, 2006-05-07 01:45:22 +0200, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> diff --git a/config.c b/config.c
> index 6765186..452b587 100644
> --- a/config.c
> +++ b/config.c
> @@ -261,6 +261,10 @@ int git_config_from_file(config_fn_t fn,
> config_offset = 0;
>
> in_fd = open(filename, O_RDONLY);
> + if (in_fd < 0 && ENOENT != errno )
I admit that I don't like the (constant -operator- variable) notation,
but mixing both in one line..?
MfG, JBG
--
Jan-Benedict Glaw jbglaw@lug-owl.de . +49-172-7608481 _ O _
"Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg _ _ O
für einen Freien Staat voll Freier Bürger" | im Internet! | im Irak! O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply
* Re: Unresolved issues #2 (shallow clone again)
From: Martin Langhoff @ 2006-05-07 6:08 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vbqubvdbr.fsf@assigned-by-dhcp.cox.net>
On 5/6/06, Junio C Hamano <junkio@cox.net> wrote:
> "Martin Langhoff" <martin.langhoff@gmail.com> writes:
> >
> > It means that for a merge or checkout involving stuff we "don't have",
> > it's trivial to know we are missing, and so we can attempt a fetch of
> > the missing objects or tell the user how to request them them before
> > retrying.
> >
> > And in any case commits and trees are lightweight and compress well...
>
> Commit maybe, but is this based on a hard fact?
No hard facts here :( but I think it's reasonable to assume that the
trees delta/compress reasonably well, as a given commit will change
just a few entries in each tree.
I might try and hack a shallow local clone of the kernel and pack it
tightly to see what it yields.
cheers,
martin
^ permalink raw reply
* Re: [RFC] get_pathspec(): free() old buffer if rewriting
From: Junio C Hamano @ 2006-05-07 4:50 UTC (permalink / raw)
To: git
In-Reply-To: <7viroipijf.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano <junkio@cox.net> writes:
> int match_pathspec(const char *path, int len,
> struct pathspec **pathspec);
>
> See if the path (with length) matches the given spec.
> path[len-1] == '/' signals that the caller is traversing
> a tree and checking if it is worth descending into the
> tree.
> ...
> and when traversing the tree for A and B, upon seeing "Documentation"
> entry in the topleve tree object buffers, call:
>
> path_match("Documentation", 14, 1, spec)
Oops, match_pathspec("Documentation/", 15, spec) is what I meant
here. Likewise in the later examples.
^ permalink raw reply
* Re: [RFC] get_pathspec(): free() old buffer if rewriting
From: Junio C Hamano @ 2006-05-07 4:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0605061532190.16343@g5.osdl.org>
Linus Torvalds <torvalds@osdl.org> writes:
> On Sun, 7 May 2006, Johannes Schindelin wrote:
>>
>> This might be the wrong way to do it, but as it is without this patch,
>> get_pathspec() is leaking memory.
>
> I'm not at all convinced we want to do somethng like this.
>
> get_pathspec() is a one-time event. It doesn't "leak" memory. To me,
> "leaking" is when you continually lose a bit of memory, and you eventually
> run out. I don't see that happening here.
I agree with that, except that blame does use it for each
commit and loses memory with deep history.
I have been contemplating to revamp pathspec stuff to deal with
not just an array pointers to strings. tree-diff already uses a
parallel array of ints (pathlens) to optimize away the repeated
use of strlen() for each pathspec elements, and I think we would
be better off using something like that everywhere. The API I
have in mind goes like this:
struct pathspec; /* opaque */
A type opaque to the caller.
struct pathspec **get_pathspec(const char *prefix,
const char **pathspec,
int wildcard);
The caller gives the prefix (return value from
setup_git_directory(), the user supplied pathspec list,
and if it wants to use ls-files style wildcard or
tree-diff style directory prefix bahaviour. A newly
allocated array is returned and the caller can free() it
when done.
int match_pathspec(const char *path, int len,
struct pathspec **pathspec);
See if the path (with length) matches the given spec.
path[len-1] == '/' signals that the caller is traversing
a tree and checking if it is worth descending into the
tree. In that case, original spec string "foo/bar/baz"
matches path = "foo/" (with len = 4). Otherwise the full
path is checked so that original spec string would match
path = "foo/bar/baz" (with len = 11), but not "foo"
(with len = 3). If the get_pathspec() was called with
wildcard support, spec string "foo/bar*" matches these:
"foo/" (i.e. should descend into it),
"foo/barboz/" (ditto)
"foo/bar.txt" (matches)
but not these:
"fob/" (no point descending into it)
"foo/bax" (does not match)
A wildcard aware diff-tree, when invoked like this:
cd Documentation
git-diff-tree -r A B -- 'ho*'
might do:
struct pathspec **spec;
const char *(paths[2]);
paths[0] = "how*"; paths[1] = NULL;
spec = get_pathspec("Documentation/", argv, 1);
and when traversing the tree for A and B, upon seeing "Documentation"
entry in the topleve tree object buffers, call:
path_match("Documentation", 14, 1, spec)
which would return true (worth descending into the directory).
The recursive call would, upon seeing "hooks.txt" entry, call:
path_match("Documentation/hooks.txt", 23, 0, spec)
which would say true and compares the entry from two trees.
Also, the same recursive invocation would call
path_match("Documentation/howto", 19, 1, spec)
which would return true to cause it further recurse into it.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox