Git development
 help / color / mirror / Atom feed
* Re: [PATCH] Make git tag a builtin.
From: Johannes Schindelin @ 2007-07-20 10:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Carlos Rica, git
In-Reply-To: <7v1wf3zbhj.fsf@assigned-by-dhcp.cox.net>

Hi,

On Fri, 20 Jul 2007, Junio C Hamano wrote:

> Carlos Rica <jasampler@gmail.com> writes:
> 
> > This replaces the script "git-tag.sh" with "builtin-tag.c".
> 
> Thanks.  Will queue in 'next', and perhaps with a few nit fixups
> to merge as the first thing after 1.5.3.

Nice!

> launch-editor() would need adjustment for GIT_EDITOR and core.editor 
> which should be straightforward.

Yes.  As a separate commit?  After a couple of revisions, I lost track of 
what I looked at, and what not...  So it would make it easier for me to 
have these changes after the big builtin-tag commit.

> > +		sp = buf = read_sha1_file(sha1, &type, &size);
> > +		if (!buf || !size)
> > +			return 0;
> 
> Theoretically, I can create millions of lightweight tags, all of
> them pointing at a zero-length blob object (or an empty tree
> object) and kill you with memory leak here ;-).

Yes ;-)  Would we really want to say

	if (!buf)
		return 0;
	if (!size) {
		free(buf);
		return 0;
	}

here?  IMHO that is overkill.

> > +		/* skip header */
> > +		while (sp + 1 < buf + size &&
> > +				!(sp[0] == '\n' && sp[1] == '\n'))
> > +			sp++;
> > +		/* only take up to "lines" lines, and strip the signature */
> > +		for (i = 0, sp += 2; i < filter->lines && sp < buf + size &&
> > +				prefixcmp(sp, PGP_SIGNATURE "\n");
> > +				i++) {
> 
> Minor nit; I would have split this to four physical lines, like:
> 
> 		for (i = 0, sp += 2;
> 			i < filter->lines && sp < buf + size &&
> 			prefixcmp(sp, PGP_SIGNATURE "\n");
> 			i++) {
> 
> The places that semicolons appear are more significant gaps when
> reading the code.

I am to blame for that.  That code was outlined by me.

> > +int cmd_tag(int argc, const char **argv, const char *prefix)
> > +{
> > ...
> > +		if (!strcmp(arg, "-F")) {
> > +			unsigned long len;
> > +			int fd;
> > +
> > +			annotate = 1;
> > +			i++;
> > +			if (i == argc)
> > +				die("option -F needs an argument.");
> > +
> > +			if (!strcmp(argv[i], "-"))
> > +				fd = 0;
> > +			else {
> > +				fd = open(argv[i], O_RDONLY);
> > +				if (fd < 0)
> > +					die("could not open '%s': %s",
> > +						argv[i], strerror(errno));
> > +			}
> > +			len = 1024;
> > +			message = xmalloc(len);
> 
> You cannot anticipate how many bytes the user will type (or
> pipe-in), but when you opened the file you could fstat() to see
> how many bytes you would need to hold the contents of that
> file.  Even in stdin case fstat(fd) could tell you the size, but
> I am not sure how to tell if the st_size is reliable.  But for
> the purposes of "git tag", 1k buffer that grows on demand is
> probably cheaper than a fstat() syscall.

For a one-shot program as git-tag, I agree, the current patch is 
sufficient.

Ciao,
Dscho

^ permalink raw reply

* Re: [RFC PATCH] Re: Empty directories...
From: David Kastrup @ 2007-07-20 10:11 UTC (permalink / raw)
  To: git
In-Reply-To: <46A08006.4020500@fs.ei.tum.de>

Simon 'corecode' Schubert <corecode@fs.ei.tum.de> writes:

> David Kastrup wrote:
>>> Otherwise you could have two very different trees that encode the
>>> same *content* (just with different ways of getting there -
>>> depending on whether you have a history with empty trees or not),
>>> and that's very much against the philosophy of git, and breaks some
>>> fundamental rules (like the fact that "same content == same SHA1").
>>
>> No, the content is _different_.  One tree contains a tracked
>> directory, the other does not.  That means that the trees behave
>> _differently_ when you manipulate them, and that means that they are
>> _not_ the same tree.
>
> You are mistaking things.

No, I am redefining them, or rather the view on them.  Subtle
difference.

> Like the executable bit on a file is not content, the fact that a
> directory should be kept despite being empty is also an *attribute*
> of the directory.  This is meta-data, not actual data (content).

We need to track it, anyway.  So there is little point in not using
the existing infrastructure for handling named entities.

> So no matter how elegant tracking the "." entry might be (and I
> think it is, because it covers a lot of corner cases already), it
> puts the information at the wrong place.

I don't see that the place is wrong: after all, that is where Unix
places "." too, and for good reason.  I was arguing for _separating_
the concept of "directory" and "tree" in the repository.  The tree is
a container entity defined exclusively by its contents (which
determine its hash).  That is how git already does things.  There is
_no_ connection with the physical existence of a directory: in the
work directory, git creates and deletes directories as a _side-effect_
of storing and removing trees.  But git itself does not track
directories as a physical entity at _all_.  If you had a flat
filesystem allowing slashes in filenames, git would get along better
than it does now, without ever creating or removing a directory.
Trees are just a convenient selection and pattern matching mechanism
for files as far as git is concerned.  The correspondence to physical
directories in the work directory is a nuisance rather than an asset
as far as git is concerned.

In a recent thread here, tags with slashes were supported by
essentially doing

    mkdir -p "`dirname $TAG`"
    touch $TAG

where directory creation is just a side effect of supporting slashes.
And that, if you look closely, is git's current relation with
directories altogether.  The directories in the work file system are
created by git just as a side effect for representing slashes, which
in turn facilitate a certain manner of pattern matching.

And "." seems perfectly well suited to bring across the point that
there actually is _physical_ existence associated with a directory,
existence that remains when the rest of the tree is gone and _makes_ a
difference to what the tree is, because it has a _different_
representation in the work file system.

Storing it as an _attribute_ of the tree is a bad idea, since then the
simple rule "a tree without contents is empty" needs an exception.
And a tree stops becoming just a container of its contents and all
sort of new exceptions creep up.

There are some systems where the difference between directory as a
file and directory as a structuring method are more apparent than
under Unix (some utilities like rsync differentiate between A/B and
A/B/ to bring across that difference).

Here is an example for some Emacs function concerned with the concept:

    directory-file-name is a built-in function in `C source code'.
    (directory-file-name DIRECTORY)

    Returns the file name of the directory named DIRECTORY.
    This is the name of the file that holds the data for the directory DIRECTORY.
    This operation exists because a directory is also a file, but its name as
    a directory is different from its name as a file.
    In Unix-syntax, this function just removes the final slash.
    On VMS, given a VMS-syntax directory name such as "[X.Y]",
    it returns a file name such as "[X]Y.DIR.1".

    [back]

> That's sad, because otherwise it would be really elegant.

If something is not elegant because of the angle of view, change the
view.  And it is not like the different angle has no predecessors or
no consistency.

-- 
David Kastrup

^ permalink raw reply

* Re: [RFC PATCH] Re: Empty directories...
From: Simon 'corecode' Schubert @ 2007-07-20  9:27 UTC (permalink / raw)
  To: David Kastrup; +Cc: git
In-Reply-To: <85644fvdrn.fsf@lola.goethe.zz>

David Kastrup wrote:
>> Otherwise you could have two very different trees that encode the
>> same *content* (just with different ways of getting there -
>> depending on whether you have a history with empty trees or not),
>> and that's very much against the philosophy of git, and breaks some
>> fundamental rules (like the fact that "same content == same SHA1").
> 
> No, the content is _different_.  One tree contains a tracked
> directory, the other does not.  That means that the trees behave
> _differently_ when you manipulate them, and that means that they are
> _not_ the same tree.

You are mistaking things.  Like the executable bit on a file is not content, the fact that a directory should be kept despite being empty is also an *attribute* of the directory.  This is meta-data, not actual data (content).  So no matter how elegant tracking the "." entry might be (and I think it is, because it covers a lot of corner cases already), it puts the information at the wrong place.

That's sad, because otherwise it would be really elegant.

cheers
  simon

-- 
Serve - BSD     +++  RENT this banner advert  +++    ASCII Ribbon   /"\
Work - Mac      +++  space for low €€€ NOW!1  +++      Campaign     \ /
Party Enjoy Relax   |   http://dragonflybsd.org      Against  HTML   \
Dude 2c 2 the max   !   http://golden-apple.biz       Mail + News   / \

^ permalink raw reply

* Re: [PATCH] Make git tag a builtin.
From: Junio C Hamano @ 2007-07-20  9:10 UTC (permalink / raw)
  To: Carlos Rica; +Cc: git, Johannes Schindelin
In-Reply-To: <469FF6E4.9010501@gmail.com>

Carlos Rica <jasampler@gmail.com> writes:

> This replaces the script "git-tag.sh" with "builtin-tag.c".

Thanks.  Will queue in 'next', and perhaps with a few nit fixups
to merge as the first thing after 1.5.3.

> diff --git a/builtin-tag.c b/builtin-tag.c
> new file mode 100644
> index 0000000..507f510
> --- /dev/null
> +++ b/builtin-tag.c
> @@ -0,0 +1,450 @@
> +/*
> + * Builtin "git tag"
> + *
> + * Copyright (c) 2007 Kristian H??gsberg <krh@redhat.com>,

I've fixed up the name here in my copy.

> +static void launch_editor(const char *path, char **buffer, unsigned long *len)
> +{
> +	const char *editor, *terminal;
> +	struct child_process child;
> +	const char *args[3];
> +	int fd;
> +
> +	editor = getenv("VISUAL");
> +	if (!editor)
> +		editor = getenv("EDITOR");
> +
> +	terminal = getenv("TERM");
> +	if (!editor && (!terminal || !strcmp(terminal, "dumb"))) {
> +		fprintf(stderr,
> +		"Terminal is dumb but no VISUAL nor EDITOR defined.\n"
> +		"Please supply the message using either -m or -F option.\n");
> +		exit(1);
> +	}
> +
> +	if (!editor)
> +		editor = "vi";

launch-editor() would need adjustment for GIT_EDITOR and
core.editor which should be straightforward.

> +struct tag_filter {
> +	const char *pattern;
> +	int lines;
> +};
> +
> +#define PGP_SIGNATURE "-----BEGIN PGP SIGNATURE-----"
> +
> +static int show_reference(const char *refname, const unsigned char *sha1,
> +			  int flag, void *cb_data)
> +{
> +	struct tag_filter *filter = cb_data;
> +
> +	if (!fnmatch(filter->pattern, refname, 0)) {
> +		int i;
> +		unsigned long size;
> +		enum object_type type;
> +		char *buf, *sp, *eol;
> +		size_t len;
> +
> +		if (!filter->lines) {
> +			printf("%s\n", refname);
> +			return 0;
> +		}
> +		printf("%-15s ", refname);
> +
> +		sp = buf = read_sha1_file(sha1, &type, &size);
> +		if (!buf || !size)
> +			return 0;

Theoretically, I can create millions of lightweight tags, all of
them pointing at a zero-length blob object (or an empty tree
object) and kill you with memory leak here ;-).

> +		/* skip header */
> +		while (sp + 1 < buf + size &&
> +				!(sp[0] == '\n' && sp[1] == '\n'))
> +			sp++;
> +		/* only take up to "lines" lines, and strip the signature */
> +		for (i = 0, sp += 2; i < filter->lines && sp < buf + size &&
> +				prefixcmp(sp, PGP_SIGNATURE "\n");
> +				i++) {

Minor nit; I would have split this to four physical lines, like:

		for (i = 0, sp += 2;
			i < filter->lines && sp < buf + size &&
			prefixcmp(sp, PGP_SIGNATURE "\n");
			i++) {

The places that semicolons appear are more significant gaps when
reading the code.

> +int cmd_tag(int argc, const char **argv, const char *prefix)
> +{
> ...
> +		if (!strcmp(arg, "-F")) {
> +			unsigned long len;
> +			int fd;
> +
> +			annotate = 1;
> +			i++;
> +			if (i == argc)
> +				die("option -F needs an argument.");
> +
> +			if (!strcmp(argv[i], "-"))
> +				fd = 0;
> +			else {
> +				fd = open(argv[i], O_RDONLY);
> +				if (fd < 0)
> +					die("could not open '%s': %s",
> +						argv[i], strerror(errno));
> +			}
> +			len = 1024;
> +			message = xmalloc(len);

You cannot anticipate how many bytes the user will type (or
pipe-in), but when you opened the file you could fstat() to see
how many bytes you would need to hold the contents of that
file.  Even in stdin case fstat(fd) could tell you the size, but
I am not sure how to tell if the st_size is reliable.  But for
the purposes of "git tag", 1k buffer that grows on demand is
probably cheaper than a fstat() syscall.

> +			if (read_fd(fd, &message, &len)) {
> +				free(message);
> +				die("cannot read %s", argv[i]);
> +			}
> +			continue;

We seem to leak fd here, but the user is doing something insane
if he gives more than one -F anyway, so it probably is Ok, as
long as we have some sanity checks.  Should we barf on "git tag
-m foo -F bar"?  What should we do with "git tag -m one -m two"?

Does the test suite check any of these conditions?

> +	if (snprintf(ref, sizeof(ref), "refs/tags/%s", tag) >= sizeof(ref))
> +		die("tag name too long: %.*s...", 50, tag);

Cute and considerate ;-).

> diff --git a/git-tag.sh b/contrib/examples/git-tag.sh
> similarity index 100%
> rename from git-tag.sh
> rename to contrib/examples/git-tag.sh

This actually caught my attention for completely different
reason.  If this were removal of git-tag.sh, this would have
conflicted when applied on top of Adam's GIT_EDITOR/core.editor
patch.  With a rename, however, there is no preimage/context
shown, so it just cleanly applied.

Also Johannes's final minor nits.

^ permalink raw reply

* Re: [PATCH] Make git tag a builtin.
From: Johannes Schindelin @ 2007-07-20  8:53 UTC (permalink / raw)
  To: Carlos Rica; +Cc: git, Junio C Hamano
In-Reply-To: <469FF6E4.9010501@gmail.com>

Hi,

On Fri, 20 Jul 2007, Carlos Rica wrote:

> diff --git a/builtin-tag.c b/builtin-tag.c
> new file mode 100644
> index 0000000..507f510
> --- /dev/null
> +++ b/builtin-tag.c
> @@ -0,0 +1,450 @@
> +/*
> + * Builtin "git tag"
> + *
> + * Copyright (c) 2007 Kristian Hv/gsberg <krh@redhat.com>,

In case of encoding problems, I suggest uploading it to repo.or.cz, so it 
can be pulled.  Uploading has more advantages, such as visibility and 
a free backup, too...

> +typedef int (*func_tag)(const char *name, const char *ref,
> +				const unsigned char *sha1);
> +
> +static int do_tag_names(const char **argv, func_tag fn)

Neat!  Just a minor style nit here, though: in refs.h, we have something 
similar, and I would have expected to read "each_tag_name_fn" and 
"for_each_tag_name" instead of "func_tag" and "do_tag_names", 
respectively.  Hmm?

> +static ssize_t do_sign(char *buffer, size_t size, size_t max)
> +{
> +	struct child_process gpg;
> +	const char *args[4];
> +	char *bracket;
> +	int len;
> +
> +	if (!*signingkey) {
> +		if (strlcpy(signingkey, git_committer_info(1),
> +				sizeof(signingkey)) >= sizeof(signingkey))

sizeof(signingkey) - 1?

> +	len = read_in_full(gpg.out, buffer + size, max - size);
> +
> +	finish_command(&gpg);
> +
> +	if (len == max - size)
> +		return error("could not read the entire signature from gpg.");

Maybe at a later stage, and if this turns out to be not sufficient to 
begin with, we might consider not using a buffer, but writing 
incrementally into an object right away.  Think "git hash-object -w 
--stdin".

But at the moment, this is not a problem, so let's keep it as-is.

> +static int git_tag_config(const char *var, const char *value)
> +{
> +	if (!strcmp(var, "user.signingkey")) {
> +		if (!value)
> +			die("user.signingkey without value");
> +		if (strlcpy(signingkey, value, sizeof(signingkey))
> +						>= sizeof(signingkey))

sizeof(signingkey) - 1?

> +static void create_tag(const unsigned char *object, const char *tag,
> +		       char *message, int sign, unsigned char *result)
> +{
> +	enum object_type type;
> +	char header_buf[1024], *buffer;
> +	int header_len, max_size;
> +	unsigned long size;
> +
> +	type = sha1_object_info(object, NULL);
> +	if (type <= 0)

Maybe use OBJ_NONE instead of 0?  Dunno.

> +	    die("bad object type.");
> +
> +	header_len = snprintf(header_buf, sizeof(header_buf),
> +			  "object %s\n"
> +			  "type %s\n"
> +			  "tag %s\n"
> +			  "tagger %s\n\n",
> +			  sha1_to_hex(object),
> +			  typename(type),
> +			  tag,
> +			  git_committer_info(1));
> +
> +	if (header_len >= sizeof(header_buf))

sizeof(header_buf) - 1?

> +	memmove(buffer + header_len, buffer, size);

Just two ideas for the future (it is not really important now, so we 
should not do it now):

- Instead of memmove()ing, we could teach read_fd() to take an offset, 
  too.

- launch_editor() could learn to write the buffer itself (IMHO it is more 
  common to write a buffer to the file before editing it, than editing an 
  existing file).

But as I said, this is just something to keep in mind, not to change now.

> +		if (!strcmp(arg, "-F")) {
> +			unsigned long len;
> +			int fd;
> +
> +			annotate = 1;
> +			i++;
> +			if (i == argc)
> +				die("option -F needs an argument.");
> +
> +			if (!strcmp(argv[i], "-"))
> +				fd = 0;
> +			else {
> +				fd = open(argv[i], O_RDONLY);
> +				if (fd < 0)
> +					die("could not open '%s': %s",
> +						argv[i], strerror(errno));
> +			}
> +			len = 1024;
> +			message = xmalloc(len);
> +			if (read_fd(fd, &message, &len)) {

With your recent sanification of read_fd(), you can initialise len and 
message to 0 and NULL, respectively.

> +		if (!strcmp(arg, "-u")) {
> +			annotate = 1;
> +			sign = 1;
> +			i++;
> +			if (i == argc)
> +				die("option -u needs an argument.");
> +			if (strlcpy(signingkey, argv[i], sizeof(signingkey))
> +							>= sizeof(signingkey))

sizeof(signingkey) - 1?

> +				die("argument to option -u too long");
> +			continue;
> +		}
> +		if (!strcmp(arg, "-l")) {
> +			return list_tags(argv[i + 1], lines);
> +		}
> +		if (!strcmp(arg, "-d")) {
> +			return do_tag_names(argv + i + 1, delete_tag);
> +		}
> +		if (!strcmp(arg, "-v")) {
> +			return do_tag_names(argv + i + 1, verify_tag);
> +		}

Please lose the "{..}" around single lines.

> +	if (get_sha1(object_ref, object))
> +		die("Failed to resolve '%s' as a valid ref.", object_ref);
> +
> +	if (snprintf(ref, sizeof(ref), "refs/tags/%s", tag) >= sizeof(ref))

sizeof(ref) - 1?

> diff --git a/git-tag.sh b/contrib/examples/git-tag.sh
> similarity index 100%
> rename from git-tag.sh
> rename to contrib/examples/git-tag.sh

That should make Nico and Junio happy.

These are my last minor nits.  If nobody else has objections, let's put 
this ship to sea.

Ciao,
Dscho

^ permalink raw reply

* Re: CVS -> SVN -> Git
From: Markus Schiltknecht @ 2007-07-20  8:45 UTC (permalink / raw)
  To: Simon 'corecode' Schubert
  Cc: Martin Langhoff, esr, Michael Haggerty, Julian Phillips, git, dev
In-Reply-To: <469FB80A.8000001@fs.ei.tum.de>

Simon 'corecode' Schubert wrote:
> I do not think Eric is right here.  You will allways lose information 
> when converting CVS to svn, and if it is just the uncertainty, the 
> non-atomicity.  This is also information (hidden one, though).

Full ACK.

> Yes.  I've already done what people want, it is not called cvs2xxx, but 
> fromcvs [1].  I don't think it is necessary to define an output format.  
> Of course, that's possible, but limiting yourself to a file format means 
> you're losing flexibility, which is needed for efficient, correct and 
> fast repository conversion.

Hm.. interesting. I'll have a close look.

Regards

Markus

^ permalink raw reply

* Re: Empty directories...
From: David Kastrup @ 2007-07-20  8:41 UTC (permalink / raw)
  To: git
In-Reply-To: <200707201029.10358.johan@herland.net>

Johan Herland <johan@herland.net> writes:

> On Wednesday 18 July 2007, Junio C Hamano wrote:
>> Didn't I say I do not have an objection for somebody who wants
>> to track empty directories, already?  I probably would not do
>> that myself but I do not see a reason to forbid it, either.
>> 
>> The right approach to take probably would be to allow entries of
>> mode 040000 in the index.  Traditionally, we allowed only 100644
>> (blobs as regular files) and 120000 (blobs as symlinks).  We
>> recently added 160000 (commit from outer space, aka subproject).
>> 
>> And we do that for all directories, not just empty ones.  So if
>> you have fileA, empty/, sub/fileB tracked, your index would
>> probably have these four entries, immediately after read-tree
>> of an existing tree object:
>
> Sorry for jumping in late...

It could have given you a chance to read up on what has already been
discussed.

> Why do you want to add _all_ directories, and not just the ones we
> want to explicitly track (independent of whether they're empty or
> not).

Because the problematic cases are more often than not the _implicit_
cases.  Do you check a directory tree for empty directories before you
archive it?  In order to archive every empty directory explicitly?

If you did that, you could equally maintain a script that manually
does mkdir/rmdir.

> Basically, add a "--dir" flag to git-add, git-rm and friends, to
> tell them you're acting on the directory itself (rather than its
> (recursive) contents). "git-add --dir foo" will add the "040000
> 123abc... 0 foo" to the index/tree whether or not foo is an empty
> directory. "git-rm --dir foo" will remove that entry (or fail if it
> doesn't exist), but _not_ the contents of foo.

There is nothing wrong with implementing something like this in
_addition_ to treating directory entries implicitly.  For example, ls
has an option -d which does just that, and even git-ls-files has an
option --directory.  Heck, I even have

rm --help
Usage: rm [OPTION]... FILE...
Remove (unlink) the FILE(s).

  -d, --directory       unlink FILE, even if it is a non-empty directory
                          (super-user only; this works only if your system
                           supports `unlink' for nonempty directories)
[...]

which works on just the directory and not on the contents.

So a --directory option for appropriate commands would be natural for
_explicit_ manipulation of such entries.

But the important, the _really_ important thing are the implicit
behaviors.  If I have to hassle with every directory myself, I don't
need a content tracking system.

The --directory stuff, in contrast, are things nice to have when the
framework is in place (and may be even necessary for some direct
manual maintenance tasks), but they don't really concern the
framework.

-- 
David Kastrup

^ permalink raw reply

* Re: GSoC status report
From: Shawn O. Pearce @ 2007-07-20  8:34 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <200707201024.35605.jnareb@gmail.com>

Jakub Narebski <jnareb@gmail.com> wrote:
> How it goes with Google Summer of Code git projects:
> builtinification,

Carlos is making good progress, he's had a few versions of git-tag
posted, and incorporated feedback into it.  Hopefully we'll see that
finialize and merge into Junio's tree in the not-too-distant future.

> libification

Luiz and I have have been working mostly off-list on this, as we
wanted to get something solid that the list can actually look at
and comment on, rather than just speculate.  Luiz has something he's
just about ready to share with us, and may post about it next week.
I don't want to steal his thunder, so I'm staying quiet on the
topic for now.  ;-)

> and GitTorrent?

Unfortuantely this isn't a project under GSoC anymore.  The student
involved had some issues with being able to get his paperwork
completed for Google and was not able to participate this summer.
I have not heard much more than that.
 
> I can see that builtinification (and perhaps libification) goes quite 
> well; what about GitTorrent project? Does anybody know how Mono SoC 
> project: "Managed C# git implementation" goes?

I haven't heard anything on it, or looked into it since the start.
I did reach out to both the student and Miguel, and got responses
back from them at that time, but have not heard anything since.
A Google search turned up the Mono GSoC status update mailing list,
which has weekly posts from a bunch of other Mono related projects,
but none about the C# Git implementation.  I'm suspecting maybe it
is dead.

-- 
Shawn.

^ permalink raw reply

* Re: Empty directories...
From: Johan Herland @ 2007-07-20  8:29 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, David Kastrup
In-Reply-To: <7vhco28aoq.fsf@assigned-by-dhcp.cox.net>

On Wednesday 18 July 2007, Junio C Hamano wrote:
> Didn't I say I do not have an objection for somebody who wants
> to track empty directories, already?  I probably would not do
> that myself but I do not see a reason to forbid it, either.
> 
> The right approach to take probably would be to allow entries of
> mode 040000 in the index.  Traditionally, we allowed only 100644
> (blobs as regular files) and 120000 (blobs as symlinks).  We
> recently added 160000 (commit from outer space, aka subproject).
> 
> And we do that for all directories, not just empty ones.  So if
> you have fileA, empty/, sub/fileB tracked, your index would
> probably have these four entries, immediately after read-tree
> of an existing tree object:

Sorry for jumping in late...

Why do you want to add _all_ directories, and not just the ones we want to 
explicitly track (independent of whether they're empty or not).

Basically, add a "--dir" flag to git-add, git-rm and friends, to tell them 
you're acting on the directory itself (rather than its (recursive) 
contents). "git-add --dir foo" will add the "040000 123abc... 0 foo" to the 
index/tree whether or not foo is an empty directory. "git-rm --dir foo" will 
remove that entry (or fail if it doesn't exist), but _not_ the contents of 
foo.

Since we're making directory tracking _explicit_, this should all be trivially 
backward-compatible.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply

* GSoC status report
From: Jakub Narebski @ 2007-07-20  8:24 UTC (permalink / raw)
  To: git

How it goes with Google Summer of Code git projects:
builtinification, libification and GitTorrent?

I can see that builtinification (and perhaps libification) goes quite 
well; what about GitTorrent project? Does anybody know how Mono SoC 
project: "Managed C# git implementation" goes?

-- 
Jakub Narebski
Poland

^ permalink raw reply

* Re: Idea for git-fast-import
From: Shawn O. Pearce @ 2007-07-20  7:28 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git
In-Reply-To: <46A05D4B.1050208@alum.mit.edu>

Michael Haggerty <mhagger@alum.mit.edu> wrote:
> I'm working on a git backend for cvs2svn and had an idea for
> git-fast-import that would make life a tiny bit easier:

Cool!
 
> Currently, git-fast-import marks are positive integers.  But they are
> used for two things: marking single-file blobs, and marking commits.
> 
> This is a tiny bit awkward, because cvs2svn assigns small integer IDs to
> these things too, but uses distinct (overlapping) integer series for the
> two concepts.  If it would be trivial to split the marks into two
> "namespaces" (one for single-file blobs and one for commits), that would
> make things a little bit more natural.  I don't think commit marks can
> be used interchangeably with blob marks anyway, so it wouldn't be a
> backwards incompatibility.

That's true, they aren't interchangeable.  fast-import pukes
and dies if you try to use the wrong type at the wrong location.
It has been requested before that the two namespaces be split,
and I just have been too lazy to do it.

> Without this feature, I will have to assign a new "mark" integer series
> that is unrelated to cvs2svn's IDs, which is no big deal at all but will
> make debugging a little bit harder.  So only add this feature if it is
> really easy for you.

Its not that much code reorg, but there is some reorg required to
make it work.  Maybe only a few hundred line diff, so probably well
within reason.  I'll look into it later.
 
> Also, is there a big cost to using "not-quite-consecutive" integers as
> marks?  cvs2svn's CVSRevision IDs are intermingled with IDs for
> CVSBranches and CVSTags, so the CVSRevisions alone probably only pack
> the ID space 5%-50% full.

Marks cost exactly 1 pointer (4 or 8 bytes) as they are actually just
a pointer to the already-in-memory object metadata that fast-import
uses for bookkeeping related to packfile generation.  Gaps in the
marks sequence also cost exactly 1 pointer, as they are just NULL.

But the marks table is actually a sparse array, using 1024 entries
per block.  So if you assign a mark at :5, then another at say
:1047000 you have only allocated 3 blocks and 12 KiB of memory
(a root directory block at 4 KiB, two leafs at 4 KiB each).  A far
cry from 4 MiB.

Its not a binary tree, its a sparse digital index.  So going
really far out in the namespace with huge gaps will cost you some
index nodes.  Staying reasonably dense is actually quite efficient,
with pretty low directory overheads.

> In fact, if there is a big cost to "not-quite-consecutive" integers,
> then I withdraw my request for separate mark namespaces, since I would
> have to reallocate mark numbers anyway :-)

See above.  5% full is really bad, because you are probably going to
allocate nearly every block in the directory, and only fill each leaf
block at 5% full.  50% full is actually reasonable, as it means marks
are only costing you about 2 pointers on average (8 or 16 bytes).

I went with the sparse array/digital index approach because it is
fairly compact code, quick store and lookup operations, and I figured
most frontends could get at least 50% full on their mark allocation.
On really dense allocations (>60%) the very low overhead per mark
makes it insanely efficient, even for a very large number of marks.

Jon Smirl was dumping marks sequentially from his hacked cvs2svn,
thereby getting the marks table at 100% full.  Other recent import
attempts with fast-import have also managed to keep their mark
allocations pretty close (if not dead on) at 100% full.

I can see how it might be convenient to have a very sparsely filled
mark namespace.  Its also convenient to have a mark namespace that
uses arbitrary strings.  Unfortunately I chose not to support
those very well (or at all!) for the sake of trying to keep the
fast-import code more compact internally, and to simplify its
internal memory management.  You might be able to talk me into
improving on that however.  ;-)
 
> Another thing that might help with debugging would be a "comment"
> command, which git-fast-import should ignore.  One could put text about
> the source of a chunk of git-fast-import stream to relate it back to the
> front-end concepts when debugging the stream contents by hand.

This is an awesome idea, especially when combined with having a
buffer of the last few commands that fast-import saw right before
it crashed.  I'll see what I can do.
 
-- 
Shawn.

^ permalink raw reply

* Re: Another question about importing SVN with fast-import
From: David Frech @ 2007-07-20  7:16 UTC (permalink / raw)
  To: Julian Phillips; +Cc: Shawn O. Pearce, git
In-Reply-To: <Pine.LNX.4.64.0707200759090.20056@beast.quantumfyre.co.uk>

On 7/20/07, Julian Phillips <julian@quantumfyre.co.uk> wrote:
> On Thu, 19 Jul 2007, David Frech wrote:
> > Since we'll be referring to past commits via marks (with start with
> > ":") how about this:
> >
> > 'C' SP srcpath:mark SP dstpath
>
> Why only via marks?  That may be what you happen to want, but no other
> command restricts you to _only_ using marks ...

I was using the royal "we". ;-)

Yes, you're right of course. My bad.

- David

> --
> Julian
>
>   ---
> What say you, Mr. American Ambassador?
>
> Fuck Canada!
>


-- 
If I have not seen farther, it is because I have stood in the
footsteps of giants.

^ permalink raw reply

* Re: [PATCH] Force listingblocks to be monospaced in manpages
From: Julian Phillips @ 2007-07-20  7:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Fredrik Tolf, git
In-Reply-To: <7v3azj1rva.fsf@assigned-by-dhcp.cox.net>

On Fri, 20 Jul 2007, Junio C Hamano wrote:

> Julian Phillips <julian@quantumfyre.co.uk> writes:
>
>> In order for these roff commands to get through to the manpage they
>> have to be element encoded to prevent quoting.  In particular with
>> docbook xsl 1.72.0 and newer we have to use U+2302 instead of . to
>> prevent the roff command being escaped.  We also add a small perl
>> script for docbook < 1.72.0.
>
> This does not work at all for docbook 1.71.  I get "^<TAB>ft C"
> as output from xmlto.

Oh, well ... that's handy? :S

I've just checked, and I do have a machine with docbook < 1.72, so I'll 
see if I can get something working on both.  Probably early next week 
since I'm away this weekend.

-- 
Julian

  ---
Words are the voice of the heart.

^ permalink raw reply

* Re: Another question about importing SVN with fast-import
From: Julian Phillips @ 2007-07-20  7:01 UTC (permalink / raw)
  To: David Frech; +Cc: Shawn O. Pearce, git
In-Reply-To: <7154c5c60707192354k7db677a6m4f8cbd474747ca92@mail.gmail.com>

On Thu, 19 Jul 2007, David Frech wrote:

> On 7/19/07, Julian Phillips <julian@quantumfyre.co.uk> wrote:
>>  On Fri, 20 Jul 2007, Shawn O. Pearce wrote:
>> 
>> >  It is possible.  I'm just not sure what the syntax for it should be.
>> >  Suggestions?  I really want to stay backwards compatible with the
>> >  current "C" command, so:
>> > 
>> >        'C' SP commit SP path SP path
>> > 
>> >  is out because its ambiguous with the current meaning where the
>> >  second (destination) path can contain SP without being quoted by
>> >  the frontend.
>>
>>  You could always make it part of the 'M' command?
>>
>>  'M' sp mode sp 'copy' sp path_str lf (ref_str | hexsha1 | sha1exp_str |
>>  idnum) SP path_str;
>>
>>  Or just make it a new command, O (for other) or E (for existing) maybe? :S
>
> Since we'll be referring to past commits via marks (with start with
> ":") how about this:
>
> 'C' SP srcpath:mark SP dstpath

Why only via marks?  That may be what you happen to want, but no other 
command restricts you to _only_ using marks ...

>
> In the case of making it a new command I can't think of any really
> compelling one letter names. ;-)
>
> - David
>
>>
>>  --
>>  Julian
>>
>>    ---
>>  It is easier to change the specification to fit the program than vice
>>  versa.
>> 
>
>

-- 
Julian

  ---
What say you, Mr. American Ambassador?

Fuck Canada!

^ permalink raw reply

* Re: [PATCH] Force listingblocks to be monospaced in manpages
From: Junio C Hamano @ 2007-07-20  7:00 UTC (permalink / raw)
  To: Julian Phillips; +Cc: Fredrik Tolf, git
In-Reply-To: <20070719014852.4573.65336.julian@quantumfyre.co.uk>

Julian Phillips <julian@quantumfyre.co.uk> writes:

> In order for these roff commands to get through to the manpage they
> have to be element encoded to prevent quoting.  In particular with
> docbook xsl 1.72.0 and newer we have to use U+2302 instead of . to
> prevent the roff command being escaped.  We also add a small perl
> script for docbook < 1.72.0.

This does not work at all for docbook 1.71.  I get "^<TAB>ft C"
as output from xmlto.

^ permalink raw reply

* Idea for git-fast-import
From: Michael Haggerty @ 2007-07-20  6:59 UTC (permalink / raw)
  To: git, spearce

I'm working on a git backend for cvs2svn and had an idea for
git-fast-import that would make life a tiny bit easier:

Currently, git-fast-import marks are positive integers.  But they are
used for two things: marking single-file blobs, and marking commits.

This is a tiny bit awkward, because cvs2svn assigns small integer IDs to
these things too, but uses distinct (overlapping) integer series for the
two concepts.  If it would be trivial to split the marks into two
"namespaces" (one for single-file blobs and one for commits), that would
make things a little bit more natural.  I don't think commit marks can
be used interchangeably with blob marks anyway, so it wouldn't be a
backwards incompatibility.

Without this feature, I will have to assign a new "mark" integer series
that is unrelated to cvs2svn's IDs, which is no big deal at all but will
make debugging a little bit harder.  So only add this feature if it is
really easy for you.

Also, is there a big cost to using "not-quite-consecutive" integers as
marks?  cvs2svn's CVSRevision IDs are intermingled with IDs for
CVSBranches and CVSTags, so the CVSRevisions alone probably only pack
the ID space 5%-50% full.

In fact, if there is a big cost to "not-quite-consecutive" integers,
then I withdraw my request for separate mark namespaces, since I would
have to reallocate mark numbers anyway :-)

Another thing that might help with debugging would be a "comment"
command, which git-fast-import should ignore.  One could put text about
the source of a chunk of git-fast-import stream to relate it back to the
front-end concepts when debugging the stream contents by hand.

[I will be out of town until Monday, so don't be surprised that I don't
respond right away :-) ]

Thanks,
Michael

^ permalink raw reply

* Re: Another question about importing SVN with fast-import
From: David Frech @ 2007-07-20  6:54 UTC (permalink / raw)
  To: Julian Phillips; +Cc: Shawn O. Pearce, git
In-Reply-To: <Pine.LNX.4.64.0707200646400.18125@beast.quantumfyre.co.uk>

On 7/19/07, Julian Phillips <julian@quantumfyre.co.uk> wrote:
> On Fri, 20 Jul 2007, Shawn O. Pearce wrote:
>
> > It is possible.  I'm just not sure what the syntax for it should be.
> > Suggestions?  I really want to stay backwards compatible with the
> > current "C" command, so:
> >
> >       'C' SP commit SP path SP path
> >
> > is out because its ambiguous with the current meaning where the
> > second (destination) path can contain SP without being quoted by
> > the frontend.
>
> You could always make it part of the 'M' command?
>
> 'M' sp mode sp 'copy' sp path_str lf (ref_str | hexsha1 | sha1exp_str |
> idnum) SP path_str;
>
> Or just make it a new command, O (for other) or E (for existing) maybe? :S

Since we'll be referring to past commits via marks (with start with
":") how about this:

'C' SP srcpath:mark SP dstpath

In the case of making it a new command I can't think of any really
compelling one letter names. ;-)

- David

>
> --
> Julian
>
>   ---
> It is easier to change the specification to fit the program than vice versa.
>

-- 
If I have not seen farther, it is because I have stood in the
footsteps of giants.

^ permalink raw reply

* Re: [RFC PATCH] Re: Empty directories...
From: David Kastrup @ 2007-07-20  5:55 UTC (permalink / raw)
  To: git
In-Reply-To: <alpine.LFD.0.999.0707191930030.27249@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 19 Jul 2007, Junio C Hamano wrote:
>> 
>> But you _could_ treat that "should-be-kept-even-when-empty"-ness
>> just like we treat executable bit on blobs, I think.
>
> True. Or you could make it a path attribute and/or a per-repository
> decision, so that while the data wouldn't necessarily be in the
> database itself, the user could specify the behaviour he wanted.

No, one can't.  Once can decide per repository whether one wants to
permit this kind of information in.  But if one does, the information
needs to there for _every_ tree.  And a "." entry is a natural and
intuitive way to do that.  "." has been used as a directory entry for
decades in Unix.

>> This will involve a lot of changes, so I would not recommend
>> anybody doing so, though.
>
> Agreed. The upside just isn't there.

It is a good thing that you did not design the Unix file systems.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply

* Re: [RFC PATCH] Re: Empty directories...
From: David Kastrup @ 2007-07-20  5:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Brian Gernhardt, Shawn O.Pearce, Matthieu Moy,
	Johannes Schindelin, Git Mailing List
In-Reply-To: <7vir8f24o2.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> So the whole notion of "remembering" whether a directory was added 
>> explicitly as an empty directory or not is just not a sensible concept in 
>> git. 
>
> That is true if it is implemented as David suggested, to have a
> phony "." entry in the tree object itself.

Unix file systems contain a phony "." entry in the directory itself,
and have survived in spite of this.

> The object name of such a tree (when it contains blobs and trees
> underneath) will be different from a tree that contains the same set
> of blobs and trees.  It would destroy the fundamental concepts of
> git.

Like "." destroyed the fundamental concepts of Unix filesystems.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply

* Re: CVS -> SVN -> Git
From: Julian Phillips @ 2007-07-20  5:58 UTC (permalink / raw)
  To: Simon 'corecode' Schubert; +Cc: git
In-Reply-To: <469FB84B.2010909@fs.ei.tum.de>

On Thu, 19 Jul 2007, Simon 'corecode' Schubert wrote:

> Julian Phillips wrote:
>>  Has anyone managed to succssfully import a Subversion repository that was
>>  initially imported from CVS using cvs2svn using fast-import?
>>
>>  It looks like cvs2svn has created a rather big mess.   It has created
>>  single commits that change files in more than one branch and/or tag. It
>>  also creates tags using more than one commit.  Now I come to try and
>>  import the Subversion history into git and I'm having trouble creating a
>>  sensible stream to feed into fast-import.
>
> Did you try first converting the old CVS repo to git and then adding the svn 
> changes?  That might give you much better results.

I thought about it, but there are over 20000 commits sitting on top of the 
converted history referring back to it - I'm not convinced that I could 
stitch things back together properly, the svn history now really does 
rely on the import done by cvs2svn.  (btw, I blame CVS for the mess not 
cvs2svn, we should have switched _before_ we started using branches 
heavily ...)

The problem really is that we use branching like it's going out of 
fashion.  We have thousands now, and had at least 10s if not 100s by the 
time we gave up on CVS.  Similarly with tags.

I think I've managed to get things sorted now with fast-import ... just 
need to be able to copy blobs from other commits and I think I'll be done. 
It really is a nice tool.

-- 
Julian

  ---
There is nothing so easy but that it becomes difficult when you do it
reluctantly.
 		-- Publius Terentius Afer (Terence)

^ permalink raw reply

* Re: [RFC PATCH] Re: Empty directories...
From: David Kastrup @ 2007-07-20  5:53 UTC (permalink / raw)
  To: git
In-Reply-To: <7vir8f24o2.fsf@assigned -by-dhcp.cox.net>

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> So the whole notion of "remembering" whether a directory was added 
>> explicitly as an empty directory or not is just not a sensible concept in 
>> git. 
>
> That is true if it is implemented as David suggested, to have a
> phony "." entry in the tree object itself.  The object name of such
> a tree (when it contains blobs and trees underneath) will be
> different from a tree that contains the same set of blobs and trees.
> It would destroy the fundamental concepts of git.

How so?

> But you _could_ treat that "should-be-kept-even-when-empty"-ness
> just like we treat executable bit on blobs, I think.
>
> When blobs with the same contents but of different type (REG vs LNK)
> and regular file with or without executable bit are entered in git,
> they all get the same SHA-1 but we can still tell them apart because
> the index and the tree entry have mode bits.  So hypothetically, you
> could introduce "sticky" directory in tree entries to mark "this
> will not go away when emptied".

A tree containing files with and without executable bits will show
different SHA-1 sums.  There is no reason that this should be
different for a tree containing the conceptual "." or not.  I won't
fight for a specific implementation but if I am going to implement
this (and the current lack of enthusiasm points to that) I will not go
and duplicate the entire ignore/add/rm/index/repository machinery in
order to have a bit rather than an actual "." directory entry.

Most Unix file systems have an honest, physical, down-to-Earth
directory entry "." even on disk because it _simplifies_ matters, even
though one could special-case "." all throughout and make do without a
physical entry in theory.

And, as I explained, "." lends itself perfectly to the gitignore
machinery in order to policy projects to track or not track
directories.

> In a 'tree' object, they might appear as:
>
>         40000 ordinary-directory '\0' 20-byte SHA-1
>         41000 directory-dontremove-even-if-empty '\0' 20-byte SHA-1
>
> In 'index', as your "I'm soft" patch, we do not have to add
> nonsticky kind of tree nodes,

It does not work, since then you can't distinguish

mkdir A
touch B
git-add A/B

from

mkdir A
touch B
git-add A

It is very clear that git-rm A/B _mustn't_ leave an empty directory in
the first case, and _must_ leave an empty directory in the second case
_if_ and only if one tracks directories.

> Obviously, this "sticky" bit will cascade up and make your otherwise
> equivalent parent tree's different,

No, it must not "cascade up".  After

mkdir -p A/B
touch A/B/C
git-add A/B
git-rm A/B

there must be nothing tracked by git.  The "sticky" bit does not
"cascade up".  Its upward effect is only changing the SHA-1 of the
tree, like any change below does.

> This will involve a lot of changes, so I would not recommend anybody
> doing so, though.

Neither would I.  Why people want to complicate the code base
everywhere by avoiding to treat "." like a legitimate entry (as Unix
file systems do for a _reason_) is simply a miracle to me.

The framework is pretty much _there_.  There is no point in not making
use of it and duplicating the whole machinery because we want a "bit
set" implementation instead of a file name.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply

* Re: Another question about importing SVN with fast-import
From: Julian Phillips @ 2007-07-20  5:50 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: David Frech, git
In-Reply-To: <20070720051142.GO32566@spearce.org>

On Fri, 20 Jul 2007, Shawn O. Pearce wrote:

> David Frech <david@nimblemachines.com> wrote:
>> On 7/16/07, Julian Phillips <julian@quantumfyre.co.uk> wrote:
>>> Now the question.  Shawn recently added C and R operations - almost as
>>> soon as they were asked for too.  However, how do you copy a file from a
>>> particular revision?
>>
>> I have to second Julian's question.
>
> I'm getting to it.  Maybe this weekend.  Right now I have git-gui
> work to finish, and that work is more important to me this week than
> fast-import is.  Not that fast-import isn't important to me either,
> its just the way things are this week.

No complaints from me - git-gui is one of the reasons I want to be able to 
use fast-import ;)

>> The comment/question is: how different is this, really, from being
>> able to  specify a "from" line in a commit? In both cases I'm asking
>> fast-import to reach into its memory (or the repo) and pull out a
>> tree, and to add (some or all of it) to my current branch. Isn't the
>> kind of generic C command that Julian and I are asking for the same
>> thing, only instead of taking the whole tree (from the specified
>> commit) it takes a single file or directory?
>
> It is possible.  I'm just not sure what the syntax for it should be.
> Suggestions?  I really want to stay backwards compatible with the
> current "C" command, so:
>
> 	'C' SP commit SP path SP path
>
> is out because its ambiguous with the current meaning where the
> second (destination) path can contain SP without being quoted by
> the frontend.

You could always make it part of the 'M' command?

'M' sp mode sp 'copy' sp path_str lf (ref_str | hexsha1 | sha1exp_str | 
idnum) SP path_str;

Or just make it a new command, O (for other) or E (for existing) maybe? :S

-- 
Julian

  ---
It is easier to change the specification to fit the program than vice versa.

^ permalink raw reply

* Re: [RFC PATCH] Re: Empty directories...
From: David Kastrup @ 2007-07-20  5:35 UTC (permalink / raw)
  To: git
In-Reply-To: <alpine.LFD.0.999.0707191726510.27249@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 19 Jul 2007, Linus Torvalds wrote:
>> 
>> That's physically impossible with the git data-structures (since
>> there is no way of saving "this directory was added empty" in the
>> tree structures, nor any point to it), so I think it's just insane
>> rambling.
>
> Of course, it's physically *possible* to have a tree that contains
> two entries for the same name: first the "empty tree" and then the
> "real tree", and yeah, in theory you could track things that way.
>
> So I guess the "physically impossible" was a bit strong. You'd have
> to have a totally insane format, and you'd have to violate deeply
> seated rules about what trees look like (and the index too, for that
> matter: we'd have to do the same for the index, and keep the S_IFDIR
> entry alive despite having other entries that are children of it),
> but it's *possible*.

Excuse me?  You don't need a "totally insane format".  You need an
entry "." of a new type "directory" that can be part of the current
concept of a "tree".  This new type does _not_ have children.  It is
not a container for files.  It would be the thing that would carry
permissions or other properties if git were to store them.  It can be
put into .gitignore files like other files.

One drawback is that adding and removing it alone is not supported
with the current git-add and git-remove commands: they would require
an additional argument "-d" like "ls" does.

All of this is a straightforward extension fitting very well the
current paradigms and also existing file systems and their usage.

> It's just a really bad idea.

> So to be sane, when you add files, the empty directory entry has to
> go away.

You really have not followed the discussion at all.  This is not
possible since otherwise you could not distinguish the cases

mkdir A
touch A/B
git-add A
git-rm A/B

where A was added and not removed and should stay and

mkdir A
touch A/B
git-add A/B
git-rm A/B

where a single file was added and removed and nothing should stay.

> Otherwise you could have two very different trees that encode the
> same *content* (just with different ways of getting there -
> depending on whether you have a history with empty trees or not),
> and that's very much against the philosophy of git, and breaks some
> fundamental rules (like the fact that "same content == same SHA1").

No, the content is _different_.  One tree contains a tracked
directory, the other does not.  That means that the trees behave
_differently_ when you manipulate them, and that means that they are
_not_ the same tree.

> In fact, that may be the best way to explain why it's *not* an
> option to have "empty trees remain empty trees if we remove the last
> file from them": git fundamnetally tracks "content snapshots", and
> anything that implies the content containing any history is against
> the rules.
>
> So the whole notion of "remembering" whether a directory was added
> explicitly as an empty directory or not is just not a sensible
> concept in git.

Certainly.  That is why we instead remember whether or not a directory
entry "." was added or not.  It will be added (unless the defaults and
gitignore settings ask "." to be non-tracked) when git adds the
corresponding tree or subtree, and it will get removed when git
removes the corresponding tree or subtree.  Emptiness is not a special
case, and it can't be.  Currently, the main information associated
with "." is "stay around even if tree becomes empty".

Now you can do

    unlink .

in Solaris and have the name "." vanish while the directory still
works as a container by other names.

I don't propose that git be able to track this difference, though, and
I doubt that most file archivers would.

But git can or cannot ignore files, and in a similar way it can or
cannot ignore what a directory has more than being an abstract
container.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply

* Re: Another question about importing SVN with fast-import
From: Shawn O. Pearce @ 2007-07-20  5:11 UTC (permalink / raw)
  To: David Frech; +Cc: Julian Phillips, git
In-Reply-To: <7154c5c60707190009r6d460debs71158d4db9a028d4@mail.gmail.com>

David Frech <david@nimblemachines.com> wrote:
> On 7/16/07, Julian Phillips <julian@quantumfyre.co.uk> wrote:
> >Now the question.  Shawn recently added C and R operations - almost as
> >soon as they were asked for too.  However, how do you copy a file from a
> >particular revision?
> 
> I have to second Julian's question.

I'm getting to it.  Maybe this weekend.  Right now I have git-gui
work to finish, and that work is more important to me this week than
fast-import is.  Not that fast-import isn't important to me either,
its just the way things are this week.
 
> This is kind of a request and a comment/question. The request is:
> there is no way to do *reasonably* in the front end what fast-import
> can do somewhat reasonably: namely, copy a <path> (file or directory!)
> from an arbitrary previously committed revision/mark to the current
> branch.

I agree.
 
> The comment/question is: how different is this, really, from being
> able to  specify a "from" line in a commit? In both cases I'm asking
> fast-import to reach into its memory (or the repo) and pull out a
> tree, and to add (some or all of it) to my current branch. Isn't the
> kind of generic C command that Julian and I are asking for the same
> thing, only instead of taking the whole tree (from the specified
> commit) it takes a single file or directory?

It is possible.  I'm just not sure what the syntax for it should be.
Suggestions?  I really want to stay backwards compatible with the
current "C" command, so:

	'C' SP commit SP path SP path

is out because its ambiguous with the current meaning where the
second (destination) path can contain SP without being quoted by
the frontend.

> Lastly, do we really need "R"? With this generic copy - and I think
> there should be *only* a generic version, not a "streamlined local
> copy" version and a "reach into history arbitrarily" version - we can,
> as an earlier poster pointed out, do R by doing a C and then a D. This
> is, in fact, how svn dump files represent file and directory renames.

The code for "R" is so short that I just don't see a need to remove it.
Its also already out in the wild, as it has been in Junio's master for
a little while now.
 
-- 
Shawn.

^ permalink raw reply

* [PATCH] Add GIT_EDITOR environment variable and core.editor configuration variable
From: Adam Roben @ 2007-07-20  5:09 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Adam Roben
In-Reply-To: <11848913032579-git-send-email-aroben@apple.com>

These variables let you specify an editor that will be launched in preference to
the EDITOR and VISUAL environment variables. The order of preference is
GIT_EDITOR, core.editor, EDITOR, VISUAL.

Signed-off-by: Adam Roben <aroben@apple.com>
---
This patch obsoletes all the previous ones I've sent for this change.

After discussing a bit with Junio and Shawn, this patch introduces
git-sh-setup::git_editor() to handle invoking the editor. It also respects the
GIT_EDITOR environment variable for overriding the choice of editor.

Hopefully this is the last version of this patch :-)

 Documentation/git-commit.txt     |   10 ++++++----
 Documentation/git-send-email.txt |    4 ++--
 git-am.sh                        |    2 +-
 git-commit.sh                    |   11 +----------
 git-rebase--interactive.sh       |    2 +-
 git-send-email.perl              |    7 +++----
 git-sh-setup.sh                  |   15 +++++++++++++++
 git-tag.sh                       |    2 +-
 8 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index f96142f..8e0e7e2 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -244,10 +244,12 @@ on the Subject: line and the rest of the commit in the body.
 
 include::i18n.txt[]
 
-ENVIRONMENT VARIABLES
----------------------
-The command specified by either the VISUAL or EDITOR environment
-variables is used to edit the commit log message.
+ENVIRONMENT AND CONFIGURATION VARIABLES
+---------------------------------------
+The editor used to edit the commit log message will be chosen from the
+GIT_EDITOR environment variable, the core.editor configuration variable, the
+VISUAL environment variable, or the EDITOR environment variable (in that
+order).
 
 HOOKS
 -----
diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 293686c..d243ed1 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -44,8 +44,8 @@ The --cc option must be repeated for each user you want on the cc list.
 	value; if that is unspecified, default to --chain-reply-to.
 
 --compose::
-	Use $EDITOR to edit an introductory message for the
-	patch series.
+	Use $GIT_EDITOR, core.editor, $VISUAL, or $EDITOR to edit an
+	introductory message for the patch series.
 
 --from::
 	Specify the sender of the emails.  This will default to
diff --git a/git-am.sh b/git-am.sh
index e5e6f2c..bfd65dc 100755
--- a/git-am.sh
+++ b/git-am.sh
@@ -364,7 +364,7 @@ do
 		[yY]*) action=yes ;;
 		[aA]*) action=yes interactive= ;;
 		[nN]*) action=skip ;;
-		[eE]*) "${VISUAL:-${EDITOR:-vi}}" "$dotest/final-commit"
+		[eE]*) git_editor "$dotest/final-commit"
 		       action=again ;;
 		[vV]*) action=again
 		       LESS=-S ${PAGER:-less} "$dotest/patch" ;;
diff --git a/git-commit.sh b/git-commit.sh
index 3f3de17..92749df 100755
--- a/git-commit.sh
+++ b/git-commit.sh
@@ -544,18 +544,9 @@ fi
 
 case "$no_edit" in
 '')
-	case "${VISUAL:-$EDITOR},$TERM" in
-	,dumb)
-		echo >&2 "Terminal is dumb but no VISUAL nor EDITOR defined."
-		echo >&2 "Please supply the commit log message using either"
-		echo >&2 "-m or -F option.  A boilerplate log message has"
-		echo >&2 "been prepared in $GIT_DIR/COMMIT_EDITMSG"
-		exit 1
-		;;
-	esac
 	git-var GIT_AUTHOR_IDENT > /dev/null  || die
 	git-var GIT_COMMITTER_IDENT > /dev/null  || die
-	${VISUAL:-${EDITOR:-vi}} "$GIT_DIR/COMMIT_EDITMSG"
+	git_editor "$GIT_DIR/COMMIT_EDITMSG"
 	;;
 esac
 
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index f395076..a2d4d09 100755
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -414,7 +414,7 @@ EOF
 			die_abort "Nothing to do"
 
 		cp "$TODO" "$TODO".backup
-		${VISUAL:-${EDITOR:-vi}} "$TODO" ||
+		git_editor "$TODO" ||
 			die "Could not execute editor"
 
 		test -z "$(grep -ve '^$' -e '^#' < $TODO)" &&
diff --git a/git-send-email.perl b/git-send-email.perl
index 7552cac..a09b1c9 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -49,8 +49,8 @@ Options:
    --bcc          Specify a list of email addresses that should be Bcc:
 		  on all the emails.
 
-   --compose      Use \$EDITOR to edit an introductory message for the
-                  patch series.
+   --compose      Use \$GIT_EDITOR, core.editor, \$EDITOR, or \$VISUAL to edit
+		  an introductory message for the patch series.
 
    --subject      Specify the initial "Subject:" line.
                   Only necessary if --compose is also set.  If --compose
@@ -341,8 +341,7 @@ GIT: for the patch you are writing.
 EOT
 	close(C);
 
-	my $editor = $ENV{EDITOR};
-	$editor = 'vi' unless defined $editor;
+	my $editor = $ENV{GIT_EDITOR} || $repo->config("core.editor") || $ENV{VISUAL} || $ENV{EDITOR} || "vi";
 	system($editor, $compose_filename);
 
 	open(C2,">",$compose_filename . ".final")
diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index 4ed07e9..c51985e 100755
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -28,6 +28,21 @@ set_reflog_action() {
 	fi
 }
 
+git_editor() {
+	GIT_EDITOR=${GIT_EDITOR:-$(git config core.editor || echo ${VISUAL:-${EDITOR}})}
+	case "$GIT_EDITOR,$TERM" in
+	,dumb)
+		echo >&2 "No editor specified in GIT_EDITOR, core.editor, VISUAL,"
+		echo >&2 "or EDITOR. Tried to fall back to vi but terminal is dumb."
+		echo >&2 "Please set one of these variables to an appropriate"
+		echo >&2 "editor or run $0 with options that will not cause an"
+		echo >&2 "editor to be invoked (e.g., -m or -F for git-commit)."
+		exit 1
+		;;
+	esac
+	"${GIT_EDITOR:-vi}" "$1"
+}
+
 is_bare_repository () {
 	git rev-parse --is-bare-repository
 }
diff --git a/git-tag.sh b/git-tag.sh
index 1c25d88..5ee3f50 100755
--- a/git-tag.sh
+++ b/git-tag.sh
@@ -177,7 +177,7 @@ if [ "$annotate" ]; then
         ( echo "#"
           echo "# Write a tag message"
           echo "#" ) > "$GIT_DIR"/TAG_EDITMSG
-        ${VISUAL:-${EDITOR:-vi}} "$GIT_DIR"/TAG_EDITMSG || exit
+        git_editor "$GIT_DIR"/TAG_EDITMSG || exit
     else
         printf '%s\n' "$message" >"$GIT_DIR"/TAG_EDITMSG
     fi
-- 
1.5.3.rc2.20.g2e098-dirty

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox