* Re: [RFC] Support projects including other projects
From: James Purser @ 2005-05-12 5:37 UTC
To: Daniel Barkalow; +Cc: Junio C Hamano, git, Petr Baudis, Linus Torvalds
In-Reply-To: <Pine.LNX.4.21.0505120057250.30848-100000@iabervon.org>
On Thu, 2005-05-12 at 15:19, Daniel Barkalow wrote:
> If you think about it as git and cogito being entirely separate projects,
> where users would be expected to have the right version of git most of the
> time (or ever), this is true. But I think that cogito is as closely tied
> to git as the kernel is to kbuild or kconfig; the difference is that git
> is not solely available with cogito, like kbuild is solely available with
> the kernel.
I tend to disagree with you on this point. Cogito and Git share
a relationship more akin to that of xorg and gnome, and I think
Linus intended this so that it would be very easy to build a layer on
top of the git toolset. Cogito is great and it fills a need, but give
it time and other implementations and tool sets will come along that
may supersede it.
--
James Purser
http://ksit.dynalias.com
* Re: [RFC] Support projects including other projects
From: Daniel Barkalow @ 2005-05-12 5:19 UTC
To: Junio C Hamano; +Cc: git, Petr Baudis, Linus Torvalds
In-Reply-To: <7vk6m5kpue.fsf@assigned-by-dhcp.cox.net>
On Wed, 11 May 2005, Junio C Hamano wrote:
> I think that the core of your idea of recording "required
> version" of the depended project (core GIT) in the depending
> project (Cogito) is a very sound one. GNU Arch folks do
> something similar in their "package-framework" stuff.
>
> I however do not think that belongs to the core GIT nor even to
> Cogito for that matter. To me, it feels like this is a pure
> build infrastructure issue.
If you think about it as git and cogito being entirely separate projects,
where users would be expected to have the right version of git most of the
time (or ever), this is true. But I think that cogito is as closely tied
to git as the kernel is to kbuild or kconfig; the difference is that git
is not solely available with cogito, like kbuild is solely available with
the kernel.
> I think you could arrange something like that with today's core
> GIT tools, like this:
>
> - Tweak Cogito Makefile so that pure Cogito and core GIT are
> housed in separate subdirectories;
>
> - Add "required-git-pb" file to Cogito source as a tracked
> source file, and record the required version of git-pb there;
>
> - Arrange Cogito Makefile to make sure the subtree that has the
> core GIT side meets "required-git-pb" constraints. The
> constraints could be "at least contains this one", "exactly
> this one". The policy would differ from one depending
> project to another. What happens if the requirements are not
> met is also up to the policy of that depending project.
When a particular cogito commit is made, it is impossible to tell whether
the next git-pb will work with it; the current set of patches could be
rejected in mainline git, and different support for the same functionality
added which requires something different from cogito.
This also means that Petr can't really test changes to git before
committing them (and a new cogito with the constraint changed), because the
cogito build system would then require him to use a version he's not
testing.
Also, either the user has to keep track of two projects without any system
support in the same directory structure and figure out how to follow the
instructions from the build system in getting the right version checked
out in the right place, or the build system is tied to a particular
wrapper layer.
I think your idea is theoretically possible, but that it is just too
impractical for anyone to ever actually use it. It's something that people
could do with CVS (and it would actually work better, due to CVS's
limitations making the issues simpler), but people don't.
-Daniel
*This .sig left intentionally blank*
* Re: [RFC] Support projects including other projects
From: Junio C Hamano @ 2005-05-12 4:52 UTC
To: Daniel Barkalow; +Cc: git, Petr Baudis, Linus Torvalds
In-Reply-To: <Pine.LNX.4.21.0505112350420.30848-100000@iabervon.org>
I think that the core of your idea of recording "required
version" of the depended project (core GIT) in the depending
project (Cogito) is a very sound one. GNU Arch folks do
something similar in their "package-framework" stuff.
I however do not think that belongs to the core GIT nor even to
Cogito for that matter. To me, it feels like this is a pure
build infrastructure issue.
I think you could arrange something like that with today's core
GIT tools, like this:
- Tweak Cogito Makefile so that pure Cogito and core GIT are
housed in separate subdirectories;
- Add "required-git-pb" file to Cogito source as a tracked
source file, and record the required version of git-pb there;
- Arrange Cogito Makefile to make sure the subtree that has the
core GIT side meets "required-git-pb" constraints. The
constraints could be "at least contains this one", "exactly
this one". The policy would differ from one depending
project to another. What happens if the requirements are not
met is also up to the policy of that depending project.
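The two example constraints above ("exactly this one" and "at least
contains this one") can be sketched in C. This is only an illustration
under stated assumptions: commits are modelled as a small table of
parent indices rather than real SHA-1 names, and struct toy_commit,
contains() and requirement_met() are hypothetical names, not part of
core GIT or Cogito.

```c
#include <assert.h>
#include <stddef.h>

/* Toy commit: up to two parents, given as indices into a table;
 * -1 means "no parent". Real git names commits by SHA-1 instead. */
struct toy_commit {
	int parent[2];
};

enum policy { POLICY_EXACT, POLICY_AT_LEAST };

/* Return 1 if commit `want` is reachable from `head` (head included). */
static int contains(const struct toy_commit *tab, int head, int want)
{
	int i;

	if (head < 0)
		return 0;
	if (head == want)
		return 1;
	for (i = 0; i < 2; i++)
		if (contains(tab, tab[head].parent[i], want))
			return 1;
	return 0;
}

/* Return 1 if the checked-out core head satisfies the requirement. */
int requirement_met(const struct toy_commit *tab, int core_head,
		    int required, enum policy pol)
{
	assert(tab != NULL);
	if (pol == POLICY_EXACT)
		return core_head == required;
	return contains(tab, core_head, required);
}
```

A build system would read the required commit from the tracked
"required-git-pb" file, call something like requirement_met() on the
core GIT subtree's head, and then apply whatever policy the depending
project has chosen when the check fails.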
* Re: [PATCH] improved delta support for git
From: Junio C Hamano @ 2005-05-12 4:36 UTC
To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.62.0505112309480.5426@localhost.localdomain>
The changes to the sha1_file interface seem to be confined to
read_sha1_file() only, which is a very good sign. You have
already expressed that you are aware that fsck-cache needs to be
taught about the delta objects, so I'd trust that would be what
you will be tackling next.
I started wondering how the delta chains would affect pull.c,
the engine that decides which files under GIT_OBJECT_DIRECTORY
need to be pulled from the remote side in order to construct the
set of objects needed by the given commit ID, under various
combinations of cut-off criteria given with -c, -t, and -a
options.
It appears to me that changes to the make_sure_we_have_it()
routine along the following lines (completely untested) would
suffice. Instead of just returning success after the fetch, we
read the named object to see whether it is really the object we
asked for or just a delta, and if it is a delta, the routine
calls itself again on the underlying object that the delta
depends upon.
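The recursion described above can be tried out on a toy model. This is
a minimal sketch, assuming an in-memory table stands in for both the
remote side and GIT_OBJECT_DIRECTORY; struct toy_object, toy_fetch()
and ensure_object() are hypothetical names that exist only for this
illustration. It also stops as soon as a fetch fails, which is a
choice made for the sketch.

```c
#include <assert.h>

/* Toy object: `base` is the index of the object a delta depends on,
 * or -1 for a full (non-delta) object. */
struct toy_object {
	int base;
	int remote_has;	/* available on the remote side */
	int local_has;	/* already in the local object directory */
};

/* Pretend transfer: 0 on success, -1 if the remote lacks the object. */
static int toy_fetch(struct toy_object *objs, int id)
{
	if (!objs[id].remote_has)
		return -1;
	objs[id].local_has = 1;
	return 0;
}

/* Make sure object `id` is local; if it turns out to be a delta,
 * repeat for the object it depends on, and so on down the chain. */
int ensure_object(struct toy_object *objs, int id)
{
	assert(id >= 0);
	if (objs[id].local_has)
		return 0;
	if (toy_fetch(objs, id))
		return -1;
	if (objs[id].base >= 0)
		return ensure_object(objs, objs[id].base);
	return 0;
}
```

Note that a failure part-way down a chain leaves the already fetched
deltas in place, so a caller would still need to report the missing
base object.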
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
# - git-pb: Fixed a leak in read-tree
# + (working tree)
--- a/pull.c
+++ b/pull.c
@@ -32,11 +32,24 @@ static void report_missing(const char *w
static int make_sure_we_have_it(const char *what, unsigned char *sha1)
{
int status;
+ unsigned long mapsize, size;
+ char type[20];
+ void *map, *buf;
+
if (has_sha1_file(sha1))
return 0;
status = fetch(sha1);
if (status && what)
report_missing(what, sha1);
+
+ map = map_sha1_file(sha1, &mapsize);
+ if (map) {
+ buf = unpack_sha1_file(map, mapsize, type, &size);
+ munmap(map, mapsize);
+ if (buf && !strcmp(type, "delta"))
+ status = make_sure_we_have_it(what, buf);
+ free(buf);
+ }
return status;
}
* [RFC] Support projects including other projects
From: Daniel Barkalow @ 2005-05-12 4:23 UTC
To: git; +Cc: Junio C Hamano, Petr Baudis, Linus Torvalds
I've come up with a way to handle projects like cogito which are based on
other projects. I think that it actually solves the real problem with such
projects, and it is actually very simple.
The problem that such projects run into, especially while both the core
and the non-core projects are in a state of substantial flux and when the
non-core developer(s) contribute needed changes to the core, is that the
two projects not only have to be tracked, they have to be kept in
sync. That is, a particular version of cogito requires a particular
version of git. There is a bit of convenience to having the tools
magically do the right thing when you check out the child project, but the
thing that really requires tool support is that you need to be able to
find the version of git-pb which matches the version of cogito you're
trying to build (and you might be searching the history for where a bug
was introduced, so you may not be able to use the latest of either).
The solution is to add a header to commits: "include {hash}", which simply
says that the given hash, which is from the core project, is the commit
needed to build this commit of the non-core project. This comes from an
argument to commit-tree ("-I", perhaps), and the parsing code needs to
identify the reference so that fsck-cache stays happy.
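To make the parsing side concrete, here is a minimal sketch in C of
picking an "include" line out of a commit buffer. It assumes the header
is spelled "include " followed by 40 hex digits and sits among the
other commit headers before the first blank line; find_include() is a
hypothetical helper written for this illustration, not code from the
proposed patch.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Look for an "include <40-hex>" line among the commit headers (the
 * lines before the first blank line). On success copy the hex name
 * into `out` (41 bytes, NUL-terminated) and return 0; otherwise -1.
 */
int find_include(const char *commit, char out[41])
{
	const char *line = commit;

	assert(commit && out);
	while (*line && *line != '\n') {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len == 8 + 40 && !strncmp(line, "include ", 8)) {
			size_t i;

			for (i = 0; i < 40; i++)
				if (!isxdigit((unsigned char)line[8 + i]))
					return -1;
			memcpy(out, line + 8, 40);
			out[40] = '\0';
			return 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

Stopping at the first blank line keeps text in the commit message from
being mistaken for a header, which matters if fsck-cache is to follow
the reference.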
Git doesn't do anything more; wrapping layers would be able to take care
of the rest. When the wrapping layer determines that you are checking out
a commit with an include header, it also checks out the included commit,
using a different index file. The core treats everything as if you had a
bunch of non-tracked files in the directory (those being the things in the
other project). When you commit, it first commits any includes (if
needed), identifies the resulting core head, and passes that to the
include for the final result.
It seems to me like this should work perfectly. The one weakness is that
it's quite annoying to do by hand, since you have to simultaneously track
two index files and remember to pass the argument to commit-tree each
time. (Also, it means that you'd ideally pull git-pb from the cogito
repository with a client that ignores things not reachable from your head,
although Petr could still just copy and prune to match the current
situation).
I've written up the git changes needed, if people are interested in the
patch.
-Daniel
*This .sig left intentionally blank*
* [PATCH] improved delta support for git
From: Nicolas Pitre @ 2005-05-12 3:51 UTC
To: git
OK, here's some improved support for delta objects in a git repository.
This patch adds the ability to create and restore delta objects with the
git-mkdelta command. A list of objects is provided and the
corresponding delta chain is created. The maximum depth of a delta
chain can be specified with the -d argument. If a max depth of 0 is
provided then all given objects are undeltafied and replaced by their
original version. With the -v argument a lot of lovely details are
printed out.
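The effect of the depth limit can be sketched in a few lines of C. This
is a simplified model, assuming every object in the list is deltafied
against its predecessor and that a delta is always smaller than the
original (the real command also skips deltas that do not save space);
plan_chain() is a hypothetical name used only here.

```c
#include <assert.h>

/*
 * Fill depth[i] with the chain depth object i would get when each
 * object is deltafied against its predecessor, restarting with a full
 * copy whenever the chain would exceed max_depth. Depth 0 means the
 * object is stored in full. Returns the number of deltas created.
 */
int plan_chain(int nobj, unsigned int max_depth, unsigned int *depth)
{
	int i, ndelta = 0;

	assert(nobj >= 0 && depth);
	for (i = 0; i < nobj; i++) {
		if (i == 0 || depth[i - 1] + 1 > max_depth) {
			depth[i] = 0;
		} else {
			depth[i] = depth[i - 1] + 1;
			ndelta++;
		}
	}
	return ndelta;
}
```

With five objects and a maximum depth of 2 this model gives depths
0, 1, 2, 0, 1; with a maximum depth of 0 every object stays in full,
which matches the "undeltafy everything" behaviour described above.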
Also included is a script to deltafy an entire repository. Simply
execute git-deltafy-script to create deltas of objects corresponding to
successive previous versions of every file. Running
'git-deltafy-script -d 0' will revert everything to non-deltafied form.
I've yet to add support to fsck-cache to understand delta objects. It is
advised to undeltafy your repository before running it; otherwise you'll
see lots of reported errors. Once undeltafied you should have good
output from fsck-cache again.
Please back up your repository before playing with this for now... just
in case.
If you happen to have the whole kernel history in your repository I'd be
interested to know what the space figure is and how it performs. So far
I tested a tar of the .git/objects directory from git's git repository.
This is to estimate the real data size without the filesystem block
round up. The undeltafied repository created a 1708 KB tar file while the
deltafied repository created a 1173 KB tar file, about 31% smaller. The
chunking storage code should be considered for real life usage of course.
There are probably things to experiment with in order to save further
space, such as deltafying tree objects and, in the context of Linux,
deltafying files with lots of similarities across the different
include/asm-* subdirectories.
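For readers who want to see what a delta looks like at the byte level,
here is a minimal sketch in C of applying the copy/insert command
stream that diff-delta.c in this patch emits. Assumptions: the two
size headers that precede the commands and the 0x40 flag handled by
patch_delta() are left out, bounds checks are added, and apply_cmds()
is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Apply a command stream to `src`, writing at most `cap` bytes to
 * `dst`. Returns the number of bytes produced, or -1 on bad input.
 * A command byte with bit 0x80 set copies from the source: bits
 * 0x01..0x08 say which offset bytes follow, bits 0x10..0x20 which
 * size bytes follow (size 0 means 0x10000). Any other command byte
 * is a count of literal bytes to insert.
 */
long apply_cmds(const unsigned char *src, unsigned long src_size,
		const unsigned char *cmds, unsigned long cmds_size,
		unsigned char *dst, unsigned long cap)
{
	const unsigned char *p = cmds, *end = cmds + cmds_size;
	unsigned long out = 0;

	assert(src && cmds && dst);
	while (p < end) {
		unsigned char cmd = *p++;

		if (cmd & 0x80) {
			unsigned long off = 0, len = 0;
			int i;

			for (i = 0; i < 4; i++)
				if (cmd & (1 << i)) {
					if (p == end)
						return -1;
					off |= (unsigned long)*p++ << (8 * i);
				}
			for (i = 0; i < 2; i++)
				if (cmd & (0x10 << i)) {
					if (p == end)
						return -1;
					len |= (unsigned long)*p++ << (8 * i);
				}
			if (!len)
				len = 0x10000;
			if (off > src_size || len > src_size - off ||
			    len > cap - out)
				return -1;
			memcpy(dst + out, src + off, len);
			out += len;
		} else {
			if (cmd > end - p || cmd > cap - out)
				return -1;
			memcpy(dst + out, p, cmd);
			p += cmd;
			out += cmd;
		}
	}
	return (long)out;
}
```

For example, with the source "hello world" the eight command bytes
0x91 6 5, 0x02 ',' ' ', 0x90 5 copy "world", insert ", " and copy
"hello", so twelve output bytes are described by eight bytes of delta.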
Signed-off-by: Nicolas Pitre <nico@cam.org>
Index: git/diff-delta.c
===================================================================
--- /dev/null
+++ git/diff-delta.c
@@ -0,0 +1,330 @@
+/*
+ * diff-delta.c: generate a delta between two buffers
+ *
+ * Many parts of this file have been lifted from LibXDiff version 0.10.
+ * http://www.xmailserver.org/xdiff-lib.html
+ *
+ * LibXDiff was written by Davide Libenzi <davidel@xmailserver.org>
+ * Copyright (C) 2003 Davide Libenzi
+ *
+ * Many mods for GIT usage by Nicolas Pitre <nico@cam.org>, (C) 2005.
+ *
+ * This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ */
+
+#include <stdlib.h>
+#include "delta.h"
+
+
+/* block size: min = 16, max = 64k, power of 2 */
+#define BLK_SIZE 16
+
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+
+#define GR_PRIME 0x9e370001
+#define HASH(v, b) (((unsigned int)(v) * GR_PRIME) >> (32 - (b)))
+
+/* largest prime smaller than 65536 */
+#define BASE 65521
+
+/* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */
+#define NMAX 5552
+
+#define DO1(buf, i) { s1 += buf[i]; s2 += s1; }
+#define DO2(buf, i) DO1(buf, i); DO1(buf, i + 1);
+#define DO4(buf, i) DO2(buf, i); DO2(buf, i + 2);
+#define DO8(buf, i) DO4(buf, i); DO4(buf, i + 4);
+#define DO16(buf) DO8(buf, 0); DO8(buf, 8);
+
+static unsigned int adler32(unsigned int adler, const unsigned char *buf, int len)
+{
+ int k;
+ unsigned int s1 = adler & 0xffff;
+ unsigned int s2 = adler >> 16;
+
+ while (len > 0) {
+ k = MIN(len, NMAX);
+ len -= k;
+ while (k >= 16) {
+ DO16(buf);
+ buf += 16;
+ k -= 16;
+ }
+ if (k != 0)
+ do {
+ s1 += *buf++;
+ s2 += s1;
+ } while (--k);
+ s1 %= BASE;
+ s2 %= BASE;
+ }
+
+ return (s2 << 16) | s1;
+}
+
+static unsigned int hashbits(unsigned int size)
+{
+ unsigned int val = 1, bits = 0;
+ while (val < size && bits < 32) {
+ val <<= 1;
+ bits++;
+ }
+ return bits ? bits: 1;
+}
+
+typedef struct s_chanode {
+ struct s_chanode *next;
+ int icurr;
+} chanode_t;
+
+typedef struct s_chastore {
+ chanode_t *head, *tail;
+ int isize, nsize;
+ chanode_t *ancur;
+ chanode_t *sncur;
+ int scurr;
+} chastore_t;
+
+static void cha_init(chastore_t *cha, int isize, int icount)
+{
+ cha->head = cha->tail = NULL;
+ cha->isize = isize;
+ cha->nsize = icount * isize;
+ cha->ancur = cha->sncur = NULL;
+ cha->scurr = 0;
+}
+
+static void *cha_alloc(chastore_t *cha)
+{
+ chanode_t *ancur;
+ void *data;
+
+ ancur = cha->ancur;
+ if (!ancur || ancur->icurr == cha->nsize) {
+ ancur = malloc(sizeof(chanode_t) + cha->nsize);
+ if (!ancur)
+ return NULL;
+ ancur->icurr = 0;
+ ancur->next = NULL;
+ if (cha->tail)
+ cha->tail->next = ancur;
+ if (!cha->head)
+ cha->head = ancur;
+ cha->tail = ancur;
+ cha->ancur = ancur;
+ }
+
+ data = (void *)ancur + sizeof(chanode_t) + ancur->icurr;
+ ancur->icurr += cha->isize;
+ return data;
+}
+
+static void cha_free(chastore_t *cha)
+{
+ chanode_t *cur = cha->head;
+ while (cur) {
+ chanode_t *tmp = cur;
+ cur = cur->next;
+ free(tmp);
+ }
+}
+
+typedef struct s_bdrecord {
+ struct s_bdrecord *next;
+ unsigned int fp;
+ const unsigned char *ptr;
+} bdrecord_t;
+
+typedef struct s_bdfile {
+ const unsigned char *data, *top;
+ chastore_t cha;
+ unsigned int fphbits;
+ bdrecord_t **fphash;
+} bdfile_t;
+
+static int delta_prepare(const unsigned char *buf, int bufsize, bdfile_t *bdf)
+{
+ unsigned int fphbits;
+ int i, hsize;
+ const unsigned char *base, *data, *top;
+ bdrecord_t *brec;
+ bdrecord_t **fphash;
+
+ fphbits = hashbits(bufsize / BLK_SIZE + 1);
+ hsize = 1 << fphbits;
+ fphash = malloc(hsize * sizeof(bdrecord_t *));
+ if (!fphash)
+ return -1;
+ for (i = 0; i < hsize; i++)
+ fphash[i] = NULL;
+ cha_init(&bdf->cha, sizeof(bdrecord_t), hsize / 4 + 1);
+
+ bdf->data = data = base = buf;
+ bdf->top = top = buf + bufsize;
+ data += (bufsize / BLK_SIZE) * BLK_SIZE;
+ if (data == top)
+ data -= BLK_SIZE;
+
+ for ( ; data >= base; data -= BLK_SIZE) {
+ brec = cha_alloc(&bdf->cha);
+ if (!brec) {
+ cha_free(&bdf->cha);
+ free(fphash);
+ return -1;
+ }
+ brec->fp = adler32(0, data, MIN(BLK_SIZE, top - data));
+ brec->ptr = data;
+ i = HASH(brec->fp, fphbits);
+ brec->next = fphash[i];
+ fphash[i] = brec;
+ }
+
+ bdf->fphbits = fphbits;
+ bdf->fphash = fphash;
+
+ return 0;
+}
+
+static void delta_cleanup(bdfile_t *bdf)
+{
+ free(bdf->fphash);
+ cha_free(&bdf->cha);
+}
+
+#define COPYOP_SIZE(o, s) \
+ (!!(o & 0xff) + !!(o & 0xff00) + !!(o & 0xff0000) + !!(o & 0xff000000) + \
+ !!(s & 0xff) + !!(s & 0xff00) + 1)
+
+void *diff_delta(void *from_buf, unsigned long from_size,
+ void *to_buf, unsigned long to_size,
+ unsigned long *delta_size)
+{
+ int i, outpos, outsize, inscnt, csize, msize, moff;
+ unsigned int fp;
+ const unsigned char *data, *top, *ptr1, *ptr2;
+ unsigned char *out, *orig;
+ bdrecord_t *brec;
+ bdfile_t bdf;
+
+ if (!from_size || !to_size || delta_prepare(from_buf, from_size, &bdf))
+ return NULL;
+
+ outpos = 0;
+ outsize = 8192;
+ out = malloc(outsize);
+ if (!out) {
+ delta_cleanup(&bdf);
+ return NULL;
+ }
+
+ data = to_buf;
+ top = to_buf + to_size;
+
+ /* store reference buffer size */
+ orig = out + outpos++;
+ *orig = i = 0;
+ do {
+ if (from_size & 0xff) {
+ *orig |= (1 << i);
+ out[outpos++] = from_size;
+ }
+ i++;
+ from_size >>= 8;
+ } while (from_size);
+
+ /* store target buffer size */
+ orig = out + outpos++;
+ *orig = i = 0;
+ do {
+ if (to_size & 0xff) {
+ *orig |= (1 << i);
+ out[outpos++] = to_size;
+ }
+ i++;
+ to_size >>= 8;
+ } while (to_size);
+
+ inscnt = 0;
+ moff = 0;
+ while (data < top) {
+ msize = 0;
+ fp = adler32(0, data, MIN(top - data, BLK_SIZE));
+ i = HASH(fp, bdf.fphbits);
+ for (brec = bdf.fphash[i]; brec; brec = brec->next) {
+ if (brec->fp == fp) {
+ csize = bdf.top - brec->ptr;
+ if (csize > top - data)
+ csize = top - data;
+ for (ptr1 = brec->ptr, ptr2 = data;
+ csize && *ptr1 == *ptr2;
+ csize--, ptr1++, ptr2++);
+
+ csize = ptr1 - brec->ptr;
+ if (csize > msize) {
+ moff = brec->ptr - bdf.data;
+ msize = csize;
+ if (msize >= 0x10000) {
+ msize = 0x10000;
+ break;
+ }
+ }
+ }
+ }
+
+ if (!msize || msize < COPYOP_SIZE(moff, msize)) {
+ if (!inscnt)
+ outpos++;
+ out[outpos++] = *data++;
+ inscnt++;
+ if (inscnt == 0x7f) {
+ out[outpos - inscnt - 1] = inscnt;
+ inscnt = 0;
+ }
+ } else {
+ if (inscnt) {
+ out[outpos - inscnt - 1] = inscnt;
+ inscnt = 0;
+ }
+
+ data += msize;
+ orig = out + outpos++;
+ i = 0x80;
+
+ if (moff & 0xff) { out[outpos++] = moff; i |= 0x01; }
+ moff >>= 8;
+ if (moff & 0xff) { out[outpos++] = moff; i |= 0x02; }
+ moff >>= 8;
+ if (moff & 0xff) { out[outpos++] = moff; i |= 0x04; }
+ moff >>= 8;
+ if (moff & 0xff) { out[outpos++] = moff; i |= 0x08; }
+
+ if (msize & 0xff) { out[outpos++] = msize; i |= 0x10; }
+ msize >>= 8;
+ if (msize & 0xff) { out[outpos++] = msize; i |= 0x20; }
+
+ *orig = i;
+ }
+
+ /* next time around the largest possible output is 1 + 4 + 3 */
+ if (outpos > outsize - 8) {
+ void *tmp = out;
+ outsize = outsize * 3 / 2;
+ out = realloc(out, outsize);
+ if (!out) {
+ free(tmp);
+ delta_cleanup(&bdf);
+ return NULL;
+ }
+ }
+ }
+
+ if (inscnt)
+ out[outpos - inscnt - 1] = inscnt;
+
+ delta_cleanup(&bdf);
+ *delta_size = outpos;
+ return out;
+}
Index: git/delta.h
===================================================================
--- /dev/null
+++ git/delta.h
@@ -0,0 +1,6 @@
+extern void *diff_delta(void *from_buf, unsigned long from_size,
+ void *to_buf, unsigned long to_size,
+ unsigned long *delta_size);
+extern void *patch_delta(void *src_buf, unsigned long src_size,
+ void *delta_buf, unsigned long delta_size,
+ unsigned long *dst_size);
Index: git/Makefile
===================================================================
--- git.orig/Makefile
+++ git/Makefile
@@ -29,7 +29,7 @@
install $(PROG) $(SCRIPTS) $(HOME)/bin/
LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o \
- tag.o date.o
+ tag.o date.o diff-delta.o patch-delta.o
LIB_FILE=libgit.a
LIB_H=cache.h object.h blob.h tree.h commit.h tag.h
@@ -63,6 +63,9 @@
test-date: test-date.c date.o
$(CC) $(CFLAGS) -o $@ test-date.c date.o
+test-delta: test-delta.c diff-delta.o patch-delta.o
+ $(CC) $(CFLAGS) -o $@ $^
+
git-%: %.c $(LIB_FILE)
$(CC) $(CFLAGS) -o $@ $(filter %.c,$^) $(LIBS)
Index: git/patch-delta.c
===================================================================
--- /dev/null
+++ git/patch-delta.c
@@ -0,0 +1,88 @@
+/*
+ * patch-delta.c:
+ * recreate a buffer from a source and the delta produced by diff-delta.c
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+#include "delta.h"
+
+void *patch_delta(void *src_buf, unsigned long src_size,
+ void *delta_buf, unsigned long delta_size,
+ unsigned long *dst_size)
+{
+ const unsigned char *data, *top;
+ unsigned char *dst_buf, *out, cmd;
+ unsigned long size;
+ int i;
+
+ /* the smallest delta size possible is 6 bytes */
+ if (delta_size < 6)
+ return NULL;
+
+ data = delta_buf;
+ top = delta_buf + delta_size;
+
+ /* make sure the orig file size matches what we expect */
+ size = i = 0;
+ cmd = *data++;
+ while (cmd) {
+ if (cmd & 1)
+ size |= *data++ << i;
+ i += 8;
+ cmd >>= 1;
+ }
+ if (size != src_size)
+ return NULL;
+
+ /* now the result size */
+ size = i = 0;
+ cmd = *data++;
+ while (cmd) {
+ if (cmd & 1)
+ size |= *data++ << i;
+ i += 8;
+ cmd >>= 1;
+ }
+ dst_buf = malloc(size);
+ if (!dst_buf)
+ return NULL;
+
+ out = dst_buf;
+ while (data < top) {
+ cmd = *data++;
+ if (cmd & 0x80) {
+ unsigned long cp_off = 0, cp_size = 0;
+ const unsigned char *buf;
+ if (cmd & 0x01) cp_off = *data++;
+ if (cmd & 0x02) cp_off |= (*data++ << 8);
+ if (cmd & 0x04) cp_off |= (*data++ << 16);
+ if (cmd & 0x08) cp_off |= (*data++ << 24);
+ if (cmd & 0x10) cp_size = *data++;
+ if (cmd & 0x20) cp_size |= (*data++ << 8);
+ if (cp_size == 0) cp_size = 0x10000;
+ buf = (cmd & 0x40) ? dst_buf : src_buf;
+ memcpy(out, buf + cp_off, cp_size);
+ out += cp_size;
+ } else {
+ memcpy(out, data, cmd);
+ out += cmd;
+ data += cmd;
+ }
+ }
+
+ /* sanity check */
+ if (data != top || out - dst_buf != size) {
+ free(dst_buf);
+ return NULL;
+ }
+
+ *dst_size = size;
+ return dst_buf;
+}
Index: git/test-delta.c
===================================================================
--- /dev/null
+++ git/test-delta.c
@@ -0,0 +1,79 @@
+/*
+ * test-delta.c: test code to exercise diff-delta.c and patch-delta.c
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include "delta.h"
+
+static const char *usage =
+ "test-delta (-d|-p) <from_file> <data_file> <out_file>";
+
+int main(int argc, char *argv[])
+{
+ int fd;
+ struct stat st;
+ void *from_buf, *data_buf, *out_buf;
+ unsigned long from_size, data_size, out_size;
+
+ if (argc != 5 || (strcmp(argv[1], "-d") && strcmp(argv[1], "-p"))) {
+ fprintf(stderr, "Usage: %s\n", usage);
+ return 1;
+ }
+
+ fd = open(argv[2], O_RDONLY);
+ if (fd < 0 || fstat(fd, &st)) {
+ perror(argv[2]);
+ return 1;
+ }
+ from_size = st.st_size;
+ from_buf = mmap(NULL, from_size, PROT_READ, MAP_PRIVATE, fd, 0);
+ if (from_buf == MAP_FAILED) {
+ perror(argv[2]);
+ return 1;
+ }
+ close(fd);
+
+ fd = open(argv[3], O_RDONLY);
+ if (fd < 0 || fstat(fd, &st)) {
+ perror(argv[3]);
+ return 1;
+ }
+ data_size = st.st_size;
+ data_buf = mmap(NULL, data_size, PROT_READ, MAP_PRIVATE, fd, 0);
+ if (data_buf == MAP_FAILED) {
+ perror(argv[3]);
+ return 1;
+ }
+ close(fd);
+
+ if (argv[1][1] == 'd')
+ out_buf = diff_delta(from_buf, from_size,
+ data_buf, data_size, &out_size);
+ else
+ out_buf = patch_delta(from_buf, from_size,
+ data_buf, data_size, &out_size);
+ if (!out_buf) {
+ fprintf(stderr, "delta operation failed (returned NULL)\n");
+ return 1;
+ }
+
+ fd = open (argv[4], O_WRONLY|O_CREAT|O_TRUNC, 0666);
+ if (fd < 0 || write(fd, out_buf, out_size) != out_size) {
+ perror(argv[4]);
+ return 1;
+ }
+
+ return 0;
+}
Index: git/sha1_file.c
===================================================================
--- git.orig/sha1_file.c
+++ git/sha1_file.c
@@ -8,6 +8,7 @@
*/
#include <stdarg.h>
#include "cache.h"
+#include "delta.h"
#ifndef O_NOATIME
#if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -224,6 +225,19 @@
if (map) {
buf = unpack_sha1_file(map, mapsize, type, size);
munmap(map, mapsize);
+ if (buf && !strcmp(type, "delta")) {
+ void *ref = NULL, *delta = buf;
+ unsigned long ref_size, delta_size = *size;
+ buf = NULL;
+ if (delta_size > 20)
+ ref = read_sha1_file(delta, type, &ref_size);
+ if (ref)
+ buf = patch_delta(ref, ref_size,
+ delta+20, delta_size-20,
+ size);
+ free(delta);
+ free(ref);
+ }
return buf;
}
return NULL;
Index: git/Makefile
===================================================================
--- git.orig/Makefile
+++ git/Makefile
@@ -13,7 +13,7 @@
AR=ar
SCRIPTS=git-apply-patch-script git-merge-one-file-script git-prune-script \
- git-pull-script git-tag-script git-resolve-script
+ git-pull-script git-tag-script git-resolve-script git-deltafy-script
PROG= git-update-cache git-diff-files git-init-db git-write-tree \
git-read-tree git-commit-tree git-cat-file git-fsck-cache \
@@ -21,7 +21,8 @@
git-check-files git-ls-tree git-merge-base git-merge-cache \
git-unpack-file git-export git-diff-cache git-convert-cache \
git-http-pull git-rpush git-rpull git-rev-list git-mktag \
- git-diff-tree-helper git-tar-tree git-local-pull git-write-blob
+ git-diff-tree-helper git-tar-tree git-local-pull git-write-blob \
+ git-mkdelta
all: $(PROG)
@@ -95,6 +96,7 @@
git-rpull: rsh.c pull.c
git-rev-list: rev-list.c
git-mktag: mktag.c
+git-mkdelta: mkdelta.c
git-diff-tree-helper: diff-tree-helper.c
git-tar-tree: tar-tree.c
git-write-blob: write-blob.c
Index: git/mkdelta.c
===================================================================
--- /dev/null
+++ git/mkdelta.c
@@ -0,0 +1,283 @@
+/*
+ * Deltafication of a GIT database.
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "cache.h"
+#include "delta.h"
+
+static int replace_object(char *buf, unsigned long len, unsigned char *sha1,
+ char *hdr, int hdrlen)
+{
+ char tmpfile[PATH_MAX];
+ int size;
+ char *compressed;
+ z_stream stream;
+ int fd;
+
+ snprintf(tmpfile, sizeof(tmpfile), "%s/obj_XXXXXX", get_object_directory());
+ fd = mkstemp(tmpfile);
+ if (fd < 0)
+ return error("%s: %s\n", tmpfile, strerror(errno));
+
+ /* Set it up */
+ memset(&stream, 0, sizeof(stream));
+ deflateInit(&stream, Z_BEST_COMPRESSION);
+ size = deflateBound(&stream, len+hdrlen);
+ compressed = xmalloc(size);
+
+ /* Compress it */
+ stream.next_out = compressed;
+ stream.avail_out = size;
+
+ /* First header.. */
+ stream.next_in = hdr;
+ stream.avail_in = hdrlen;
+ while (deflate(&stream, 0) == Z_OK)
+ /* nothing */;
+
+ /* Then the data itself.. */
+ stream.next_in = buf;
+ stream.avail_in = len;
+ while (deflate(&stream, Z_FINISH) == Z_OK)
+ /* nothing */;
+ deflateEnd(&stream);
+ size = stream.total_out;
+
+ if (write(fd, compressed, size) != size) {
+ perror("unable to write file");
+ close(fd);
+ unlink(tmpfile);
+ return -1;
+ }
+ fchmod(fd, 0444);
+ close(fd);
+
+ if (rename(tmpfile, sha1_file_name(sha1))) {
+ perror("unable to replace original object");
+ unlink(tmpfile);
+ return -1;
+ }
+ return 0;
+}
+
+static int write_delta_file(char *buf, unsigned long len,
+ unsigned char *sha1_ref, unsigned char *sha1_trg)
+{
+ char hdr[50];
+ int hdrlen;
+
+ /* Generate the header + sha1 of reference for delta */
+ hdrlen = sprintf(hdr, "delta %lu", len+20)+1;
+ memcpy(hdr + hdrlen, sha1_ref, 20);
+ hdrlen += 20;
+
+ return replace_object(buf, len, sha1_trg, hdr, hdrlen);
+}
+
+static int replace_sha1_file(char *buf, unsigned long len,
+ char *type, unsigned char *sha1)
+{
+ char hdr[50];
+ int hdrlen;
+
+ hdrlen = sprintf(hdr, "%s %lu", type, len)+1;
+ return replace_object(buf, len, sha1, hdr, hdrlen);
+}
+
+static void *get_buffer(unsigned char *sha1, char *type, unsigned long *size)
+{
+ unsigned long mapsize;
+ void *map = map_sha1_file(sha1, &mapsize);
+ if (map) {
+ void *buffer = unpack_sha1_file(map, mapsize, type, size);
+ munmap(map, mapsize);
+ if (buffer)
+ return buffer;
+ }
+ error("unable to get object %s", sha1_to_hex(sha1));
+ return NULL;
+}
+
+static void *expand_delta(void *delta, unsigned long delta_size, char *type,
+ unsigned long *size, unsigned int *depth, char *head)
+{
+ void *buf = NULL;
+ (*depth)++;
+ if (delta_size < 20) {
+ error("delta object is bad");
+ free(delta);
+ } else {
+ unsigned long ref_size;
+ void *ref = get_buffer(delta, type, &ref_size);
+ if (ref && !strcmp(type, "delta"))
+ ref = expand_delta(ref, ref_size, type, &ref_size,
+ depth, head);
+ else
+ memcpy(head, delta, 20);
+ if (ref)
+ buf = patch_delta(ref, ref_size, delta+20,
+ delta_size-20, size);
+ free(ref);
+ free(delta);
+ }
+ return buf;
+}
+
+static char *mkdelta_usage =
+"mkdelta [ --max-depth=N ] <reference_sha1> <target_sha1> [ <next_sha1> ... ]";
+
+int main(int argc, char **argv)
+{
+ unsigned char sha1_ref[20], sha1_trg[20], head_ref[20], head_trg[20];
+ char type_ref[20], type_trg[20];
+ void *buf_ref, *buf_trg, *buf_delta;
+ unsigned long size_ref, size_trg, size_orig, size_delta;
+ unsigned int depth_ref, depth_trg, depth_max = -1;
+ int i, verbose = 0;
+
+ for (i = 1; i < argc; i++) {
+ if (!strcmp(argv[i], "-v")) {
+ verbose = 1;
+ } else if (!strcmp(argv[i], "-d") && i+1 < argc) {
+ depth_max = atoi(argv[++i]);
+ } else if (!strncmp(argv[i], "--max-depth=", 12)) {
+ depth_max = atoi(argv[i]+12);
+ } else
+ break;
+ }
+
+ if (i + (depth_max != 0) >= argc)
+ usage(mkdelta_usage);
+
+ if (get_sha1(argv[i], sha1_ref))
+ die("bad sha1 %s", argv[i]);
+ depth_ref = 0;
+ buf_ref = get_buffer(sha1_ref, type_ref, &size_ref);
+ if (buf_ref && !strcmp(type_ref, "delta"))
+ buf_ref = expand_delta(buf_ref, size_ref, type_ref,
+ &size_ref, &depth_ref, head_ref);
+ else
+ memcpy(head_ref, sha1_ref, 20);
+ if (!buf_ref)
+ die("unable to obtain initial object %s", argv[i]);
+
+ if (depth_ref > depth_max) {
+ if (replace_sha1_file(buf_ref, size_ref, type_ref, sha1_ref))
+ die("unable to restore %s", argv[i]);
+ if (verbose)
+ printf("undelta %s (depth was %d)\n", argv[i], depth_ref);
+ depth_ref = 0;
+ }
+
+ while (++i < argc) {
+ if (get_sha1(argv[i], sha1_trg))
+ die("bad sha1 %s", argv[i]);
+ depth_trg = 0;
+ buf_trg = get_buffer(sha1_trg, type_trg, &size_trg);
+ if (buf_trg && !size_trg) {
+ if (verbose)
+ printf("skip %s (object is empty)\n", argv[i]);
+ continue;
+ }
+ size_orig = size_trg;
+ if (buf_trg && !strcmp(type_trg, "delta")) {
+ if (!memcmp(buf_trg, sha1_ref, 20)) {
+ /* delta already in place */
+ depth_ref++;
+ memcpy(sha1_ref, sha1_trg, 20);
+ buf_ref = patch_delta(buf_ref, size_ref,
+ buf_trg+20, size_trg-20,
+ &size_ref);
+ if (!buf_ref)
+ die("unable to apply delta %s", argv[i]);
+ if (depth_ref > depth_max) {
+ if (replace_sha1_file(buf_ref, size_ref,
+ type_ref, sha1_ref))
+ die("unable to restore %s", argv[i]);
+ if (verbose)
+ printf("undelta %s (depth was %d)\n", argv[i], depth_ref);
+ depth_ref = 0;
+ continue;
+ }
+ if (verbose)
+ printf("skip %s (delta already in place)\n", argv[i]);
+ continue;
+ }
+ buf_trg = expand_delta(buf_trg, size_trg, type_trg,
+ &size_trg, &depth_trg, head_trg);
+ } else
+ memcpy(head_trg, sha1_trg, 20);
+ if (!buf_trg)
+ die("unable to read target object %s", argv[i]);
+
+ if (depth_trg > depth_max) {
+ if (replace_sha1_file(buf_trg, size_trg, type_trg, sha1_trg))
+ die("unable to restore %s", argv[i]);
+ if (verbose)
+ printf("undelta %s (depth was %d)\n", argv[i], depth_trg);
+ depth_trg = 0;
+ size_orig = size_trg;
+ }
+
+ if (depth_max == 0)
+ goto skip;
+
+ if (strcmp(type_ref, type_trg))
+ die("type mismatch for object %s", argv[i]);
+
+ if (!size_ref) {
+ if (verbose)
+ printf("skip %s (initial object is empty)\n", argv[i]);
+ goto skip;
+ }
+
+ depth_ref++;
+ if (depth_ref > depth_max) {
+ if (verbose)
+ printf("skip %s (exceeding max link depth)\n", argv[i]);
+ goto skip;
+ }
+
+ if (!memcmp(head_ref, sha1_trg, 20)) {
+ if (verbose)
+ printf("skip %s (would create a loop)\n", argv[i]);
+ goto skip;
+ }
+
+ buf_delta = diff_delta(buf_ref, size_ref, buf_trg, size_trg, &size_delta);
+ if (!buf_delta)
+ die("out of memory");
+
+ if (size_delta+20 < size_orig) {
+ if (write_delta_file(buf_delta, size_delta,
+ sha1_ref, sha1_trg))
+ die("unable to write delta for %s", argv[i]);
+ free(buf_delta);
+ if (verbose)
+ printf("delta %s (size=%ld.%02ld%%, depth=%d)\n",
+ argv[i], (size_delta+20)*100 / size_trg,
+ ((size_delta+20)*10000 / size_trg)%100,
+ depth_ref);
+ } else {
+ free(buf_delta);
+ if (verbose)
+ printf("skip %s (original is smaller)\n", argv[i]);
+ skip:
+ depth_ref = depth_trg;
+ memcpy(head_ref, head_trg, 20);
+ }
+
+ free(buf_ref);
+ buf_ref = buf_trg;
+ size_ref = size_trg;
+ memcpy(sha1_ref, sha1_trg, 20);
+ }
+
+ return 0;
+}
Index: git/mktag.c
===================================================================
--- git.orig/mktag.c
+++ git/mktag.c
@@ -25,20 +25,14 @@
static int verify_object(unsigned char *sha1, const char *expected_type)
{
int ret = -1;
- unsigned long mapsize;
- void *map = map_sha1_file(sha1, &mapsize);
+ char type[100];
+ unsigned long size;
+ void *buffer = read_sha1_file(sha1, type, &size);
- if (map) {
- char type[100];
- unsigned long size;
- void *buffer = unpack_sha1_file(map, mapsize, type, &size);
-
- if (buffer) {
- if (!strcmp(type, expected_type))
- ret = check_sha1_signature(sha1, buffer, size, type);
- free(buffer);
- }
- munmap(map, mapsize);
+ if (buffer) {
+ if (!strcmp(type, expected_type))
+ ret = check_sha1_signature(sha1, buffer, size, type);
+ free(buffer);
}
return ret;
}
Index: git/git-deltafy-script
===================================================================
--- /dev/null
+++ git/git-deltafy-script
@@ -0,0 +1,33 @@
+#!/bin/bash
+
+# Script to deltafy an entire GIT repository based on the commit list.
+# The most recent version of a file is the reference and the previous version
+# is changed into a delta from that most recent version. And so on for
+# successive versions going back in time.
+#
+# The -d argument sets a limit on the delta chain depth.
+# If 0 is passed then everything is undeltafied.
+
+set -e
+
+depth=
+[ "$1" == "-d" ] && depth="--max-depth=$2" && shift 2
+
+curr_file=""
+
+git-rev-list HEAD |
+git-diff-tree -r --stdin |
+sed -n '/^\*/ s/^.*->\(.\{41\}\)\(.*\)$/\2 \1/p' | sort | uniq |
+while read file sha1; do
+ if [ "$file" == "$curr_file" ]; then
+ list="$list $sha1"
+ else
+ if [ "$list" ]; then
+ echo "Processing $curr_file"
+ echo "$head $list" | xargs git-mkdelta $depth -v
+ fi
+ curr_file="$file"
+ list=""
+ head="$sha1"
+ fi
+done
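The grouping step of git-deltafy-script can be sketched in isolation. This is a minimal illustration and not part of the patch: `group_by_file` is a hypothetical helper, it assumes "file sha1" lines already sorted by file name, and it assumes file names contain no whitespace. Unlike the loop above, which only acts on a group once the next file name arrives, the sketch also emits the final group when the input ends.

```shell
#!/bin/sh
# group_by_file (hypothetical name): read "file sha1" lines, sorted by
# file, on stdin and print one line per file: "file: sha1 sha1 ...".
group_by_file () {
    curr_file=""
    list=""
    while read -r file sha1
    do
        if [ "$file" = "$curr_file" ]
        then
            # same file as the previous line: extend the current group
            list="$list $sha1"
        else
            # a new file starts: emit the finished group, if there is one
            [ -n "$curr_file" ] && echo "$curr_file: $list"
            curr_file="$file"
            list="$sha1"
        fi
    done
    # emit the last group, which no following line would trigger
    [ -n "$curr_file" ] && echo "$curr_file: $list"
    return 0
}
```

In the real script each emitted group would be handed to git-mkdelta instead of being printed.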
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Joel Becker @ 2005-05-12 3:30 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: tglx, Dmitry Torokhov, git
In-Reply-To: <4282ADC9.2010900@zytor.com>
On Wed, May 11, 2005 at 06:13:45PM -0700, H. Peter Anvin wrote:
> What I meant with that is I think .git/repoid is the right thing, if the
> file doesn't exist a new ID file is generated.
Count me in the "what does repoid help?" camp. If we create a
new UUID on each clone, imagine this typical usage:
linux-2.6.git has repoid AAAAAA.
I clone it locally, local-2.6-clean, repoid BBBBBB
I clone the local one, local-2.6-working, repoid CCCCCC
I work in the local one and commit my change. commit abcd,
repoid CCCCCC.
I then rsync, copy, or clone that working repository to some
place that Linus can pull from.
I then throw away the copy with repoid CCCCCC, because I'm done
with that temporary work area.
lather, rinse, repeat.
IOW, each of my changes, if I work like this, has a different
repoid. And when a problem arises, the repoid tells us diddly. I
thought one of the tenets of bk/git/codeville/whatever development is
that clone is the way to do any temporary area. You work in a clone or
10, and then clean up for submission. Which of the 10 clones is the
associated repoid seems, well, unimportant.
Joel
--
Life's Little Instruction Book #99
"Think big thoughts, but relish small pleasures."
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
^ permalink raw reply
* Re: [PATCH] Stop git-rev-list at sha1 match
From: Junio C Hamano @ 2005-05-12 2:11 UTC (permalink / raw)
To: Petr Baudis; +Cc: tglx, git
In-Reply-To: <7v4qd9mcp1.fsf@assigned-by-dhcp.cox.net>
>>>>> "JCH" == Junio C Hamano <junkio@cox.net> writes:
>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
>>> --- a/checkout-cache.c
>>> +++ b/checkout-cache.c
PB> I assume this is irrelevant here?
JCH> Sorry for sending a dirty patch in. Will fix it up.
------------
Introduce "rev-list --stop-at=<commit>".
Additional option, --stop-at=<commit>, is introduced. The
git-rev-list output stops just before showing the named commit.
This is based on Thomas Gleixner's patch but slightly reworked,
with documentation updates.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
Documentation/git-rev-list.txt | 18 +++++++++++++++++-
rev-list.c | 20 ++++++++++++++++----
2 files changed, 33 insertions(+), 5 deletions(-)
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -9,7 +9,10 @@
SYNOPSIS
--------
-'git-rev-list' <commit>
+'git-rev-list' [--max-count=<number>]
+ [--max-age=<unixtime>]
+ [--min-age=<unixtime>]
+ [--stop-at=<commit>] <commit>
DESCRIPTION
-----------
@@ -17,6 +20,19 @@
given commit, taking ancestry relationship into account. This is
useful to produce human-readable log output.
+OPTIONS
+-------
+--max-count=<number>::
+ Stop after showing <number> commits.
+
+--max-age=<unixtime>::
+ Stop after showing commit made before <unixtime>.
+
+--min-age=<unixtime>::
+ Skip until commit made before <unixtime>.
+
+--stop-at=<commit>::
+ Stop just before showing <commit>.
Author
------
--- a/rev-list.c
+++ b/rev-list.c
@@ -1,12 +1,21 @@
#include "cache.h"
#include "commit.h"
+static const char *rev_list_usage =
+"usage: rev-list [OPTION] commit-id\n"
+" --max-count=nr\n"
+" --max-age=epoch\n"
+" --min-age=epoch\n"
+" --stop-at=commit\n";
+
int main(int argc, char **argv)
{
unsigned char sha1[20];
struct commit_list *list = NULL;
struct commit *commit;
char *commit_arg = NULL;
+ unsigned char stop_at[20];
+ int has_stop_at = 0;
int i;
unsigned long max_age = -1;
unsigned long min_age = -1;
@@ -21,16 +30,17 @@
max_age = atoi(arg + 10);
} else if (!strncmp(arg, "--min-age=", 10)) {
min_age = atoi(arg + 10);
+ } else if (!strncmp(arg, "--stop-at=", 10)) {
+ if (get_sha1(arg + 10, stop_at))
+ usage(rev_list_usage);
+ has_stop_at = 1;
} else {
commit_arg = arg;
}
}
if (!commit_arg || get_sha1(commit_arg, sha1))
- usage("usage: rev-list [OPTION] commit-id\n"
- " --max-count=nr\n"
- " --max-age=epoch\n"
- " --min-age=epoch\n");
+ usage(rev_list_usage);
commit = lookup_commit(sha1);
if (!commit || parse_commit(commit) < 0)
@@ -46,6 +56,8 @@
break;
if (max_count != -1 && !max_count--)
break;
+ if (has_stop_at && !memcmp(stop_at, commit->object.sha1, 20))
+ break;
printf("%s\n", sha1_to_hex(commit->object.sha1));
} while (list);
return 0;
------------------------------------------------
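The effect of --stop-at can be pictured as a filter over the commit list: print each commit until the named one is reached, and do not print that one. The sketch below is only an illustration of that behaviour on plain text lines; `stop_at` is a hypothetical shell function, not something the patch adds.

```shell
#!/bin/sh
# stop_at (hypothetical name): copy commit ids from stdin to stdout,
# stopping just before the id given as the first argument.
stop_at () {
    stop=$1
    while read -r commit
    do
        # same order as the patch: check for the stop commit, then print
        [ "$commit" = "$stop" ] && break
        echo "$commit"
    done
}
```

If the stop commit never appears in the input, everything is printed, which matches a loop that only breaks on an exact match.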
^ permalink raw reply
* Re: [PATCH] Stop git-rev-list at sha1 match
From: Junio C Hamano @ 2005-05-12 1:54 UTC (permalink / raw)
To: Petr Baudis; +Cc: tglx, git
In-Reply-To: <20050511221719.GH22686@pasky.ji.cz>
>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
PB> it will show the merged revisions properly, but for
PB> o
PB> | \
PB> o |
PB> ------
PB> | o
PB> | o
PB> o /
PB> o
PB> it won't show the full merge. Whilst when you do
PB> *-log --since foo
PB> I think you mean it to show everything going into the tree since foo -
PB> that would include the whole branch you cut off now.
I use "rev-tree HEAD ^$(git-merge-base HEAD foo)" for this
kind of thing, so rev-list does not really matter.
>> --- a/checkout-cache.c
>> +++ b/checkout-cache.c
PB> I assume this is irrelevant here?
Sorry for sending a dirty patch in. Will fix it up.
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Junio C Hamano @ 2005-05-12 1:46 UTC (permalink / raw)
To: tglx; +Cc: H. Peter Anvin, git
In-Reply-To: <1115858022.22180.256.camel@tglx>
>>>>> "TG" == Thomas Gleixner <tglx@linutronix.de> writes:
TG> So what alternatives do we have ?
How about doing nothing of this sort, introducing repo-id? I do
not understand what problem repo-id is solving.
Earlier in your response to Sean <seanlkml@sympatico.ca>, you
gave a QA department example.
TG> You have to track down a problem in bugfix and the source of it.
TG> It does not matter whether the maintainer of "bugfix" pulled it from
TG> devel or from stable. It's his fault anyway.
TG>
TG> But we are not talking about faults and guiltiness. We want
TG> to identify the location and the context _where_ and _why_
TG> this change was created.
Here is my understanding of the scenario you are describing.
Are these correct?
- There is a problem in the source.
- You know which lines of which file are causing the problem.
But you cannot tell how the file got into that state and why
by just looking at the problem revision.
- You have the complete history (commit chain) leading to the
revision.
- You want to get some context to help you understand why those
offending lines are there.
Assuming I am with you so far, I would like to know what kind of
information you are looking for ("some context to help you
understand"). Is a specific commit object (rather, one pair of
commits that is parent-child) that made those lines into the
current shape enough?
My understanding of Sean's argument is that finding such a
commit (or a commit-pair) is a good enough place to start
understanding why that change was introduced and finding who to
ask for help, and it does not matter in which repository the
change was introduced. I tend to agree with him if that is what
is being discussed.
If the owner has multiple repositories and he needs to know in
which of his repositories the change was introduced, I assume he
would be able to run the same procedure the QA department ran
to find the problem commit on each of his repositories to find
such a commit, and commits around it (its ancestors and
descendants). So a maintainer having more than one repository
does not seem to be an issue, either.
So I am having a hard time understanding what problem repo-id
solves.
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: H. Peter Anvin @ 2005-05-12 1:13 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: tglx, Dmitry Torokhov, git
In-Reply-To: <4282ACD3.50009@zytor.com>
H. Peter Anvin wrote:
>>
>> Yes, as long as you make sure that rsync does _NOT_ pollute/populate it
>>
>
> You shouldn't be rsyncing the .git directory, only .git/objects anyway.
> Some people seem to have merely copied Linus' entire tree, and that's
> what's causing problems.
>
> That one you can't win.
>
What I meant with that is I think .git/repoid is the right thing, if the
file doesn't exist a new ID file is generated.
If people are copying their repoid file explicitly it's up to them to
know what they're doing.
-hpa
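The behaviour described here (use .git/repoid if it exists, otherwise generate a new one, with /dev/urandom as the source suggested elsewhere in this thread) might look like the sketch below. `ensure_repoid` is a hypothetical helper and not an existing git command, and it stores 128 random bits as plain hex digits rather than a formatted UUID.

```shell
#!/bin/sh
# ensure_repoid (hypothetical name): print the repository id stored in
# <git-dir>/repoid, creating it from /dev/urandom if it is missing.
ensure_repoid () {
    git_dir=${1:-.git}
    id_file="$git_dir/repoid"
    if [ ! -s "$id_file" ]
    then
        # 16 random bytes rendered as 32 hex digits, then a newline
        od -An -N16 -tx1 /dev/urandom | tr -d ' \n' >"$id_file"
        echo >>"$id_file"
    fi
    cat "$id_file"
}
```

Calling it twice on the same directory returns the same id, while a copy made without the repoid file would get a fresh one on first use.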
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: H. Peter Anvin @ 2005-05-12 1:09 UTC (permalink / raw)
To: tglx; +Cc: Dmitry Torokhov, git
In-Reply-To: <1115858670.22180.259.camel@tglx>
Thomas Gleixner wrote:
> On Wed, 2005-05-11 at 19:41 -0500, Dmitry Torokhov wrote:
>
>>>Which is completely error prone due to rsync. Some of the repositories on
>>>kernel.org keep identical copies of .git/description already. Why should
>>>they preserve a unique .git/repoid ?
>>
>>I think that a unique repoid should be created automatically every time
>>you clone. It is ok for it to go away when you discard a tree, it will just
>>identify a line (set) of changes originating from some place.
>
>
> Yes, as long as you make sure that rsync does _NOT_ pollute/populate it
>
You shouldn't be rsyncing the .git directory, only .git/objects anyway.
Some people seem to have merely copied Linus' entire tree, and that's
what's causing problems.
That one you can't win.
-hpa
^ permalink raw reply
* New version of gitk
From: Paul Mackerras @ 2005-05-12 1:00 UTC (permalink / raw)
To: git
I have just put a new version of gitk at:
http://ozlabs.org/~paulus/gitk-0.9
I'm pretty happy with the display side of it now. When you select a
commit it displays the full diff below the commit comments in the
bottom-left pane, with the diff displayed nicely with red and green
backgrounds for the removed and added lines. There is still plenty to
do in the areas of user preferences, menus, find facility, etc.
Paul.
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Sean @ 2005-05-12 0:58 UTC (permalink / raw)
To: tglx; +Cc: git
In-Reply-To: <1115859372.22180.266.camel@tglx>
On Wed, May 11, 2005 8:56 pm, Thomas Gleixner said:
> Try to find out the history of kernel.org/.../dwmw2/audit-2.6 in correct
> order, using the available tools.
>
> Come back to me when you are done.
Ask me any question that matters and I'll answer it with available tools.
> I was not aware, that omitting irrelevant information is creating a
> blind spot.
Sorry, your assessment that it is irrelevant is incorrect and overlooks
that there is information loss.
> Period. End of thread.
Fair enough.
Sean
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Thomas Gleixner @ 2005-05-12 0:56 UTC (permalink / raw)
To: Sean; +Cc: git
In-Reply-To: <3185.10.10.10.24.1115858739.squirrel@linux1>
On Wed, 2005-05-11 at 20:45 -0400, Sean wrote:
> Can we please not _invent_ problems where there are none? Can you show a
> specific case today where repoid would make one ounce of difference in the
> life of anyone?
Try to find out the history of kernel.org/.../dwmw2/audit-2.6 in correct
order, using the available tools.
Come back to me when you are done.
> No, you seem to want it both ways. Sometimes it's important to you to
> know where an object came from and how it got there, and sometimes it's
> not. Interesting blind spot.
He ?
I was not aware, that omitting irrelevant information is creating a
blind spot.
Period. End of thread.
tglx
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Sean @ 2005-05-12 0:45 UTC (permalink / raw)
To: tglx; +Cc: git
In-Reply-To: <1115857838.22180.250.camel@tglx>
On Wed, May 11, 2005 8:30 pm, Thomas Gleixner said:
> On Wed, 2005-05-11 at 19:44 -0400, Sean wrote:
>> What problem are you trying to solve?
>
> The problem to explain the obvious facts to an agnostic
No the problem is you're seeing dragons.
> Aarg. Did you ever get in contact with QA departments ?
Can we please not _invent_ problems where there are none? Can you show a
specific case today where repoid would make one ounce of difference in the
life of anyone?
> Assume you have: bugfix - stable - devel repositories.
Why does this imaginary QA department use the same committer and author
for all of them? And why is it you switch from imaginary problems of
dave, greg and russell to imaginary problems of a fictitious QA
department?
> You have to track down a problem in bugfix and the source of it.
> It does not matter whether the maintainer of "bugfix" pulled it from
> devel or from stable. It's his fault anyway.
>
> But we are not talking about faults and guiltiness. We want to identify
> the location and the context _where_ and _why_ this change was created.
>
> The current solution of git makes it impossible to retrieve this
> information in a consistent way.
Wrong. When a commit is pulled from a repository, all the surrounding
context of every commit that came before it and after it on that branch is
pulled right along with it.
> So you have no quick solution to figure out what happened. Quite
> contrary, you have to dissect inconsistent information.
>
> See also the thread about "Stop git-rev-list at sha1 match".
Sorry, this one is entertaining enough <g>
>> The chain of command might be good to know in the same way that an
>> accurate signed-off-by chain is good to know.
>
> This sentence makes me guess that you actually are working in a QA
> department and therefore trying to maximize the amount of irrelevant
> information.
No, you seem to want it both ways. Sometimes it's important to you to
know where an object came from and how it got there, and sometimes it's
not. Interesting blind spot.
Sean
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Thomas Gleixner @ 2005-05-12 0:44 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: git, H. Peter Anvin
In-Reply-To: <200505111941.04104.dtor_core@ameritech.net>
On Wed, 2005-05-11 at 19:41 -0500, Dmitry Torokhov wrote:
> >
> > Which is completely error prone due to rsync. Some of the repositories on
> > kernel.org keep identical copies of .git/description already. Why should
> > they preserve a unique .git/repoid ?
>
> I think that a unique repoid should be created automatically every time
> you clone. It is ok for it to go away when you discard a tree, it will just
> identify a line (set) of changes originating from some place.
Yes, as long as you make sure that rsync does _NOT_ pollute/populate it
tglx
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Dmitry Torokhov @ 2005-05-12 0:41 UTC (permalink / raw)
To: git, tglx; +Cc: H. Peter Anvin
In-Reply-To: <1115854733.22180.202.camel@tglx>
On Wednesday 11 May 2005 18:38, Thomas Gleixner wrote:
> On Wed, 2005-05-11 at 16:14 -0700, H. Peter Anvin wrote:
> > I would like to suggest a few limiters are set on the repoid. In
> > particular, I'd like to suggest that a repoid is a UUID, that a file is
> > used to track it (.git/repoid), and that if it doesn't exist, a new one
> > is created from /dev/urandom.
>
> Which is completely error prone due to rsync. Some of the repositories on
> kernel.org keep identical copies of .git/description already. Why should
> they preserve a unique .git/repoid ?
I think that a unique repoid should be created automatically every time
you clone. It is ok for it to go away when you discard a tree, it will just
identify a line (set) of changes originating from some place.
--
Dmitry
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Thomas Gleixner @ 2005-05-12 0:33 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: git
In-Reply-To: <428297DB.8030905@zytor.com>
On Wed, 2005-05-11 at 16:40 -0700, H. Peter Anvin wrote:
> > I expect neither of those two things to happen, but a complete working
> > directory path is better than nothing to make educated guesses.
> > Committer names (maintainers) can be the same over repositories, but it's
> > unlikely that somebody who manages more than one subsystem uses the
> > same working directory for them.
> >
>
> I can tell you what would happen in at least my case: you'll see each
> "repository" with about 23 different IDs.
You won. :)
So what alternatives do we have ?
- commit history per repository
  (.git/head-history)               rsync and user error prone
- .git/repoid                       rsync error prone
- GIT_REPO_ID=xyz                   user error prone
- directory name based guessing     hpa error prone
What's your preferred error scenario ?
tglx
^ permalink raw reply
* Re: [PATCH] Stop git-rev-list at sha1 match
From: Thomas Gleixner @ 2005-05-12 0:31 UTC (permalink / raw)
To: Petr Baudis; +Cc: Junio C Hamano, git
In-Reply-To: <20050511234455.GL22686@pasky.ji.cz>
On Thu, 2005-05-12 at 01:44 +0200, Petr Baudis wrote:
> for extensive discussion on how (it is impossible or very hard) to do
> better.
:)
> So how would you order the list of commits?
Rn
merged Mn
merged Mn-1
Rn-1
....
That's the relevant information in repository R. Looking at it from
repository M after M updated to Rn
(Mn+1) == Rn ; Mn+1 is not created due to head forward
merged Rn
..
merged Rn-3
Mn
Mn-1
That's the historically correct ordering from a repository point of view.
That's the only relevant information IMNSHO.
The dates of author and committer are retrievable in each repository,
but the order of commits is not.
tglx
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Thomas Gleixner @ 2005-05-12 0:30 UTC (permalink / raw)
To: Sean; +Cc: git
In-Reply-To: <2997.10.10.10.24.1115855049.squirrel@linux1>
On Wed, 2005-05-11 at 19:44 -0400, Sean wrote:
> What problem are you trying to solve?
The problem to explain the obvious facts to an agnostic
> Has dave or russell or anybody with
> multiple repositories given you reason to think they have a problem
> tracking their personal repositories? I doubt it very much.
Aarg. Did you ever get in contact with QA departments ?
Assume you have: bugfix - stable - devel repositories.
You have to track down a problem in bugfix and the source of it.
It does not matter whether the maintainer of "bugfix" pulled it from
devel or from stable. It's his fault anyway.
But we are not talking about faults and guiltiness. We want to identify
the location and the context _where_ and _why_ this change was created.
The current solution of git makes it impossible to retrieve this
information in a consistent way.
So you have no quick solution to figure out what happened. Quite
contrary, you have to dissect inconsistent information.
See also the thread about "Stop git-rev-list at sha1 match".
> The chain of command might be good to know in the same way that an
> accurate signed-off-by chain is good to know.
This sentence makes me guess that you actually are working in a QA
department and therefore trying to maximize the amount of irrelevant
information.
tglx
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: Sean @ 2005-05-12 0:20 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: tglx, git
In-Reply-To: <42829D9F.3010403@zytor.com>
On Wed, May 11, 2005 8:04 pm, H. Peter Anvin said:
> Sean wrote:
>>
>> Amongst other issues and complexity this will introduce. This is
>> really a solution in search of a problem anyway.
>>
> You mean repoid?
Hey Peter,
Yes, it will create just as many problems as it sets out to solve.
Actually, I still don't know what problem is being addressed by the
current proposal.
Sean
^ permalink raw reply
* Re: [PATCH] [RFD] Add repoid identifier to commit
From: H. Peter Anvin @ 2005-05-12 0:04 UTC (permalink / raw)
To: Sean; +Cc: tglx, git
In-Reply-To: <3004.10.10.10.24.1115855130.squirrel@linux1>
Sean wrote:
>
> Amongst other issues and complexity this will introduce. This is really
> a solution in search of a problem anyway.
>
You mean repoid?
-hpa
^ permalink raw reply
* [PATCH] checkout-cache fix
From: Junio C Hamano @ 2005-05-12 0:02 UTC (permalink / raw)
To: Petr Baudis; +Cc: git
In-Reply-To: <20050511224044.GI22686@pasky.ji.cz>
Commit cc01b05f0a3dfdf5ed114e429a7bec1ad549ab1c
Author Junio C Hamano <junkio@cox.net>, Wed May 11 17:00:16 2005 -0700
Committer Junio C Hamano <junkio@cox.net>, Wed May 11 17:00:16 2005 -0700
Fix checkout-cache when existing work tree interferes with the checkout.
This is essentially the same one as the last one I sent to the
GIT list, except that the patch is rebased to the current tip of
the git-pb tree, and an unnecessary call to create_directories()
removed.
The checkout-cache command gets confused when checking out a
file in a subdirectory and the work tree has a symlink to the
subdirectory. Also it fails to check things out when there is a
non-directory in the work tree when cache expects a directory
there, and vice versa. This patch fixes the first problem by
making sure all the leading paths in the file being checked out
are indeed directories, and also fixes directory vs
non-directory conflicts when '-f' is specified by removing the
offending paths.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
--- a/checkout-cache.c
+++ b/checkout-cache.c
@@ -32,6 +32,8 @@
* of "-a" causing problems (not possible in the above example,
* but get used to it in scripting!).
*/
+#include <sys/types.h>
+#include <dirent.h>
#include "cache.h"
static int force = 0, quiet = 0, not_new = 0;
@@ -46,20 +48,61 @@ static void create_directories(const cha
len = slash - path;
memcpy(buf, path, len);
buf[len] = 0;
- mkdir(buf, 0755);
+ if (mkdir(buf, 0755)) {
+ if (errno == EEXIST) {
+ struct stat st;
+ if (!lstat(buf, &st) && S_ISDIR(st.st_mode))
+ continue; /* ok */
+ if (force && !unlink(buf) && !mkdir(buf, 0755))
+ continue;
+ }
+ die("cannot create directory at %s", buf);
+ }
}
free(buf);
}
+static void remove_subtree(const char *path)
+{
+ DIR *dir = opendir(path);
+ struct dirent *de;
+ char pathbuf[PATH_MAX];
+ char *name;
+
+ if (!dir)
+ die("cannot opendir %s", path);
+ strcpy(pathbuf, path);
+ name = pathbuf + strlen(path);
+ *name++ = '/';
+ while ((de = readdir(dir)) != NULL) {
+ struct stat st;
+ if ((de->d_name[0] == '.') &&
+ ((de->d_name[1] == 0) ||
+ ((de->d_name[1] == '.') && de->d_name[2] == 0)))
+ continue;
+ strcpy(name, de->d_name);
+ if (lstat(pathbuf, &st))
+ die("cannot lstat %s", pathbuf);
+ if (S_ISDIR(st.st_mode))
+ remove_subtree(pathbuf);
+ else if (unlink(pathbuf))
+ die("cannot unlink %s", pathbuf);
+ }
+ closedir(dir);
+ if (rmdir(path))
+ die("cannot rmdir %s", path);
+}
+
static int create_file(const char *path, unsigned int mode)
{
int fd;
mode = (mode & 0100) ? 0777 : 0666;
+ create_directories(path);
fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode);
if (fd < 0) {
- if (errno == ENOENT) {
- create_directories(path);
+ if (errno == EISDIR && force) {
+ remove_subtree(path);
fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode);
}
}
------------------------------------------------
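The first half of the fix, making sure every leading path component is a real directory and not a symlink or a file, can be sketched in shell. This is an illustration of the idea under stated assumptions and not a translation of the C code: `ensure_leading_dirs` is a hypothetical helper, and passing the literal word `force` as its second argument stands in for the `-f` flag.

```shell
#!/bin/sh
# ensure_leading_dirs PATH [force] (hypothetical name): make sure every
# leading component of PATH is a real directory.  A symlink or file in
# the way is an error unless "force" is given, in which case it is
# removed and replaced by a directory.
ensure_leading_dirs () {
    rest=$1
    force=${2:-}
    dir=""
    while :
    do
        case "$rest" in
        */*) ;;
        *) return 0 ;;          # only the file name is left
        esac
        comp=${rest%%/*}        # next component
        rest=${rest#*/}         # what follows it
        dir="$dir$comp"
        if [ -n "$comp" ]
        then
            if [ -d "$dir" ] && [ ! -h "$dir" ]
            then
                :               # already a real directory
            elif [ -e "$dir" ] || [ -h "$dir" ]
            then
                # a non-directory (or symlink) occupies the path
                [ "$force" = force ] || return 1
                rm -f "$dir" && mkdir "$dir" || return 1
            else
                mkdir "$dir" || return 1
            fi
        fi
        dir="$dir/"
    done
}
```

The symlink check matters because a plain directory test follows symlinks, which is how a file could end up written through `path1 -> path0` into the wrong directory.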
^ permalink raw reply
* [PATCH] Test suite
From: Junio C Hamano @ 2005-05-12 0:01 UTC (permalink / raw)
To: Petr Baudis; +Cc: git
In-Reply-To: <20050511224044.GI22686@pasky.ji.cz>
Commit 1da683e1247046796a094c4917bc0c4591530272
Author Junio C Hamano <junkio@cox.net>, Wed May 11 16:59:35 2005 -0700
Committer Junio C Hamano <junkio@cox.net>, Wed May 11 16:59:35 2005 -0700
Test suite: infrastructure and examples.
This adds the test suite infrastructure with two example tests.
The current git-checkout-cache fails these example tests; it
will be corrected in a separate patch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
Created: t/t1000-checkout-cache.sh (mode:100755)
--- /dev/null
+++ b/t/t1000-checkout-cache.sh
@@ -0,0 +1,54 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+. ./test-lib.sh
+test_description "$@" 'git-checkout-cache test.
+
+This test registers the following filesystem structure in the
+cache:
+
+ path0 - a file
+ path1/file1 - a file in a directory
+
+And then tries to checkout in a work tree that has the following:
+
+ path0/file0 - a file in a directory
+ path1 - a file
+
+The git-checkout-cache command should fail when attempting to checkout
+path0, finding it is occupied by a directory, and path1/file1, finding
+path1 is occupied by a non-directory. With "-f" flag, it should remove
+the conflicting paths and succeed.
+'
+
+date >path0
+mkdir path1
+date >path1/file1
+git-update-cache --add path0 path1/file1
+test_debug 'git-ls-files --stage'
+
+rm -fr path0 path1
+mkdir path0
+date >path0/file0
+date >path1
+test_debug 'git-ls-files --stage'
+test_debug 'find path*'
+
+test_expect_failure 'git-checkout-cache -a'
+test_debug 'find path*'
+
+test_expect_success 'git-checkout-cache -f -a'
+test_debug 'find path*'
+
+if test -f path0 && test -d path1 && test -f path1/file1
+then
+ test_ok "checkout successful"
+else
+ test_failure "checkout failed"
+fi
+
+test_done
+
+
Created: t/t1001-checkout-cache.sh (mode:100755)
--- /dev/null
+++ b/t/t1001-checkout-cache.sh
@@ -0,0 +1,76 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+. ./test-lib.sh
+test_description "$@" 'git-checkout-cache test.
+
+This test registers the following filesystem structure in the cache:
+
+ path0/file0 - a file in a directory
+ path1/file1 - a file in a directory
+
+and attempts to check it out when the work tree has:
+
+ path0/file0 - a file in a directory
+ path1 - a symlink pointing at "path0"
+
+Checkout cache should fail to extract path1/file1 because the leading
+path path1 is occupied by a non-directory. With "-f" it should remove
+the symlink path1 and create directory path1 and file path1/file1.
+'
+
+show_files() {
+ # show filesystem files, just [-dl] for type and name
+ find path? -ls |
+ sed -e 's/^[0-9]* * [0-9]* * \([-bcdl]\)[^ ]* *[0-9]* *[^ ]* *[^ ]* *[0-9]* [A-Z][a-z][a-z] [0-9][0-9] [^ ]* /fs: \1 /'
+ # what's in the cache, just mode and name
+ git-ls-files --stage |
+ sed -e 's/^\([0-9]*\) [0-9a-f]* [0-3] /ca: \1 /'
+ # what's in the tree, just mode and name.
+ git-ls-tree -r "$1" |
+ sed -e 's/^\([0-9]*\) [^ ]* [0-9a-f]* /tr: \1 /'
+}
+
+mkdir path0
+date >path0/file0
+git-update-cache --add path0/file0
+tree1=$(git-write-tree)
+test_debug 'show_files $tree1'
+
+mkdir path1
+date >path1/file1
+git-update-cache --add path1/file1
+tree2=$(git-write-tree)
+test_debug 'show_files $tree2'
+
+rm -fr path1
+git-read-tree -m $tree1
+git-checkout-cache -f -a
+test_debug 'show_files $tree1'
+
+ln -s path0 path1
+git-update-cache --add path1
+tree3=$(git-write-tree)
+test_debug 'show_files $tree3'
+
+# Morten says "Got that?" here.
+# Test begins.
+
+git-read-tree $tree2
+test_expect_success 'git-checkout-cache -f -a'
+test_debug show_files $tree2
+
+if test ! -h path0 && test -d path0 &&
+ test ! -h path1 && test -d path1 &&
+ test ! -h path0/file0 && test -f path0/file0 &&
+ test ! -h path1/file1 && test -f path1/file1
+then
+ test_ok "checked out correctly."
+else
+ test_failure "did not check out correctly."
+fi
+
+test_done
+
Created: t/test-lib.sh (mode:100755)
--- /dev/null
+++ b/t/test-lib.sh
@@ -0,0 +1,106 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+# For repeatability, reset the environment to known value.
+LANG=C TZ=UTC
+export LANG TZ
+unset AUTHOR_DATE
+unset AUTHOR_EMAIL
+unset AUTHOR_NAME
+unset COMMIT_AUTHOR_EMAIL
+unset COMMIT_AUTHOR_NAME
+unset GIT_ALTERNATE_OBJECT_DIRECTORIES
+unset GIT_AUTHOR_DATE
+unset GIT_AUTHOR_EMAIL
+unset GIT_AUTHOR_NAME
+unset GIT_COMMITTER_EMAIL
+unset GIT_COMMITTER_NAME
+unset GIT_DIFF_OPTS
+unset GIT_DIR
+unset GIT_EXTERNAL_DIFF
+unset GIT_INDEX_FILE
+unset GIT_OBJECT_DIRECTORY
+unset SHA1_FILE_DIRECTORIES
+unset SHA1_FILE_DIRECTORY
+
+# Each test should start with something like this, after copyright notices:
+#
+# . ./test-lib.sh
+# test_description "$@" 'Description of this test...
+# This test checks if command xyzzy does the right thing...
+# '
+#
+
+test_description () {
+ while case "$#" in 0) break;; esac
+ do
+ case "$1" in
+ -d|--d|--de|--deb|--debu|--debug)
+ debug=t; shift ;;
+ -h|--h|--he|--hel|--help)
+ eval echo '"$'$#'"'
+ exit 0
+ ;;
+ *)
+ break ;;
+ esac
+ done
+ test_failure=0
+}
+
+say () {
+ echo "* $*"
+}
+
+test_debug () {
+ case "$debug" in '') ;; ?*) eval "$*" ;; esac
+}
+
+test_ok () {
+ echo "* $*";
+}
+
+test_failure () {
+ echo "* $*";
+ test_failure=1;
+}
+
+test_expect_failure () {
+ say "expecting failure: $1"
+ eval "$1"
+ case $? in
+ 0) test_failure "did not fail as expected" ;;
+ *) test_ok "failed as expected" ;;
+ esac
+}
+
+test_expect_success () {
+ say "expecting success: $1"
+ eval "$1"
+ case $? in
+ 0) test_ok "succeeded as expected" ;;
+ *) test_failure "did not succeed as expected" ;;
+ esac
+}
+
+test_done () {
+ case "$test_failure" in
+ 0) exit 0 ;;
+ '') echo "*** test script did not start with test_description";
+ exit 2 ;;
+ *) exit 1 ;;
+ esac
+}
+
+# Test the binaries we have just built. The tests are kept in
+# t/ subdirectory and are run in test-repo subdirectory.
+PATH=$(pwd)/..:$PATH
+
+# Test repository
+test=test-repo
+rm -fr "$test"
+mkdir "$test"
+cd "$test"
+git-init-db 2>/dev/null || { echo >&2 "cannot run git-init-db"; exit 1; }
------------------------------------------------
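The core of the test library is the expect-success / expect-failure pair: evaluate a command string, compare its exit status with what was expected, and remember whether anything went wrong. A stand-alone reduction of that pattern is sketched below. The names `expect_success`, `expect_failure`, `report` and `failures` are hypothetical, chosen so they do not clash with test-lib.sh, and the sketch counts failures instead of setting a flag.

```shell
#!/bin/sh
# Stand-alone reduction of the expect-success / expect-failure pattern.
failures=0

report () {
    echo "* $*"
}

# Run the command string in $1 and count it as a failure if it fails.
expect_success () {
    if eval "$1"
    then
        report "ok: $1"
    else
        report "FAIL (expected success): $1"
        failures=$((failures + 1))
    fi
}

# Run the command string in $1 and count it as a failure if it succeeds.
expect_failure () {
    if eval "$1"
    then
        report "FAIL (expected failure): $1"
        failures=$((failures + 1))
    else
        report "ok, failed as expected: $1"
    fi
}
```

A script built on this would finish with something like `test "$failures" -eq 0`, in the same spirit as `test_done` turning the recorded state into an exit code.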
^ permalink raw reply