Git development
 help / color / mirror / Atom feed
* Re: [HELP] Adding git awareness to the darning patch management system.
From: Peter Williams @ 2011-11-30 23:47 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: git
In-Reply-To: <CALUzUxrcWB62jARtgifTOocGL4gEGXDFMiO=ppHXzFyUmb+3ww@mail.gmail.com>

On 30/11/11 19:04, Tay Ray Chuan wrote:
> On Wed, Nov 30, 2011 at 10:17 AM, Peter Williams
> <peter_ono@users.sourceforge.net>  wrote:
>> I'm the author of the darning patch management system
>> <http://darning.sourceforge.net/>  and would like some help adding git
>> awareness to the system.  At this stage of the development, "awareness" is
>> fairly simple concept with two broad aims:
>
> That link says it "combines the strengths of quilt and mq and
> eliminates their weaknesses", but I don't see any info on why this is
> the case with your program;

Documenting this is on my todo list (but not at the top).

Quilt's weaknesses are documented on their website and my issues with MQ 
is that it (potentially) compromises the Mercurial repository and others 
to which it may push.  (MQ also has the problem that its current 
maintainers don't understand the workflow for which it was designed and 
are changing it to suit a different workflow.)

The repository compromise issue also applies to stgit.

Other things darning adds are:

1. tracking of copies and renames (not in quilt),
2. tracking changes to files' modes (not in quilt),
3. binary diffs (using git binary diff format for import/export), and
4. help managing addition of trailing whitespace by patches.

> it would be great it if you could provide
> a quick start guide (probably easier to show this with some commands
> in the CLI rather than GUI screenshots).

At this stage, the CLI is really only there to allow me to do script 
based testing of darning's internals and is not a fully capable 
interface.  However, if you look in the directory test-cli for files 
with a ".test" suffix you will find examples of their use.

>
> Have a look at StGit's tutorial
> http://www.procode.org/stgit/doc/tutorial.html (very similar to quilt
> and mq too) to see what I mean.

OK

Peter

^ permalink raw reply

* [PATCH/RFC 2/2] merge, pull: Document the --no-ff-only merge option
From: Samuel Bronson @ 2011-12-01  1:38 UTC (permalink / raw)
  To: git; +Cc: Samuel Bronson
In-Reply-To: <1322703537-3914-1-git-send-email-naesten@gmail.com>

Now that --no-ff-only will work with pull and not just with merge, we
should probably tell the users about it.

Signed-off-by: Samuel Bronson <naesten@gmail.com>
---
 Documentation/merge-options.txt |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/Documentation/merge-options.txt b/Documentation/merge-options.txt
index 6bd0b04..2adc4f5 100644
--- a/Documentation/merge-options.txt
+++ b/Documentation/merge-options.txt
@@ -56,9 +56,13 @@ With --no-squash perform the merge and commit the result. This
 option can be used to override --squash.
 
 --ff-only::
+--no-ff-only::
 	Refuse to merge and exit with a non-zero status unless the
 	current `HEAD` is already up-to-date or the merge can be
 	resolved as a fast-forward.
++
+With --no-ff-only, don't refuse. This is can be used to override
+--ff-only, or the "`only`" value for the `merge.ff` configuration option.
 
 -s <strategy>::
 --strategy=<strategy>::
-- 
1.7.7.3

^ permalink raw reply related

* [PATCH/RFC 1/2] pull: pass the --no-ff-only flag through to merge, not fetch
From: Samuel Bronson @ 2011-12-01  1:38 UTC (permalink / raw)
  To: git; +Cc: Samuel Bronson

Without this, pull becomes unusable for merging branches when the config
option `merge.ff` is set to `only`.

Signed-off-by: Samuel Bronson <naesten@gmail.com>
---
 git-pull.sh |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/git-pull.sh b/git-pull.sh
index 9868a0b..a09fcbc 100755
--- a/git-pull.sh
+++ b/git-pull.sh
@@ -76,6 +76,8 @@ do
 		no_ff=--no-ff ;;
 	--ff-only)
 		ff_only=--ff-only ;;
+	--no-ff-only)
+		ff_only=--no-ff-only ;;
 	-s=*|--s=*|--st=*|--str=*|--stra=*|--strat=*|--strate=*|\
 		--strateg=*|--strategy=*|\
 	-s|--s|--st|--str|--stra|--strat|--strate|--strateg|--strategy)
-- 
1.7.7.3

^ permalink raw reply related

* Re: [HELP] Adding git awareness to the darning patch management system.
From: Peter Williams @ 2011-12-01  0:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20111130072248.GG5317@sigill.intra.peff.net>

On 30/11/11 17:22, Jeff King wrote:
> On Wed, Nov 30, 2011 at 12:17:22PM +1000, Peter Williams wrote:
>
>> 1. presenting the file tree of the sources being patched in a way
>> that makes sense to the user including the current status of files
>> from the point of view of the underlying SCM (in this case, git), and
>
> I'm not exactly sure what this means.

If you look at the screenshots at sourceforge (which were produced on 
top of a Mercurial repo) you'll notice that file names in the left most 
tree have letters in front of them and appear in different foreground 
colours.  These letters are the same as those returned by Mercurial's 
status command and, hence, give a Mercurial user an easy to understand 
snapshot of the status of the files in the playground.  The colour 
coding is (relatively) arbitrary (and chosen by me) and is intended to 
make it easier to detect the different file statuses.

My main problem is that I can't find a git file status command (and 
there are a lot of them to choose from) that gives a snapshot of the 
statuses of all files in a directory (including those not tracked or 
ignored).  A secondary problem is that, if I could cobble together 
statuses from various commands, mapping git statuses to the Mercurial 
ones for display would not be a good solution as they would not 
necessarily make sense to a git user.  (It's fairly clear to me from my 
inability to make sense of git's CLI that git users think differently to 
me, a Mercurial user, and it's unlikely that I can, without help, make a 
file tree display that makes sense to a git user.)

>
>> 2. detecting files with uncommitted changes (from the SCM's point of
>> view) when the user adds them to a patch (or pushes a patch that
>> contains them) so that they may be alerted to the fact and offered
>> the choice of absorbing the uncommitted changes into the patch (or
>> not).
>
> For this, you probably want "git diff-files --name-only", which will
> show files with differences in the working tree. Keep in mind that git
> has an "index" or "staging area", which means that you have three states
> of content for a given path:
>
>    1. the state of the prior commit (i.e., HEAD)
>
>    2. the state that is marked to be committed when "git commit" is run
>       (i.e., the index)
>
>    3. the state in the working tree

This is a prime example of the different mindset of the git user to the 
hg user.

>
> You can compare the first two with "git diff-index", and the latter two
> with "git diff-files". You can also use "git status --porcelain" to get
> a machine-readable output that shows how the three states match up, with
> one line per file.

This is an example of why I'm confused.  There are too many ways to do 
(similar) things and it's hard to know which to use.

>
>> I've already implemented this interface for Mercurial (with which I
>> am familiar) and looked at doing the same with git but had difficulty
>> discovering the definitive mechanisms for obtaining the necessary
>> data.  So I'm soliciting your help in overcoming these problems.
>
> I hope the above helps you some. If not, just ask. It might be easier to
> understand what you are looking for if you can give concrete examples.

Maybe an example of why I think the feature is useful might help.  Say 
that you start editing a file and then decide that you want to put this 
change into a patch rather than committing it.  If you were using quilt 
you would have to do this manually by any of a number or ways such as:

$ <git diff command> file > temp.patch
$ <git revert command> file
$ quilt new one.patch
$ quilt add file
$ patch -p1 file < temp.patch
$ rm temp.patch

In darning, you just do:

$ darn new one.patch
$ darn add --absorb file

If you're using the GUI (the primary interface), it will report that the 
file has uncommitted changes and offer the choice of absorbing the 
changes into the new patch, forcing the new patch to consider the 
current file state as its starting point or (of course) cancel the 
addition.  The CLI command will fail if an attempt to add a file which 
has uncommitted changes is made unless either the --absorb or --force 
(which uses the file's current content as the starting point from the 
patches point of view) options are used.  (So, whichever interface is in 
use, you have to explicitly state how you want uncommitted changes to be 
treated.

The interface to the SCM to support this is two functions:

1: get_files_with_uncommitted_changes() which called with no arguments 
returns a list of the paths of all files with uncommitted changes or 
when given a list of file paths (the more common case) returns the 
subset of that list which have uncommitted changes; and

2. copy_clean_version_to(filepath, target_path) which makes a copy of 
the file as recorded in the prior commit and places it at the 
target_path (usually where darning stores the "original" for reference 
when creating diffs).

A similar mechanism is in place for the case where a file is added to a 
patch and the file is in an underlying patch with unrefreshed changes 
but this requires no help from the underlying SCM.

Both of these mechanisms also come into play when a patch is 
pushed/applied so that the user has (relatively painless) control over 
which changes end up in which patch.

With MQ and the above example, the file would be automatically added to 
the current patch (or a new patch if you created one) absorbing the 
changes whether you wanted it to or not.  I.e. there is no way of 
creating MQ patches that don't automatically absorb all uncommitted changes.

Thanks for your reply,
Peter
PS Darning can be used on top of git repository without "git awareness" 
but is not as useful as it would be with it.

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Bill Zaumen @ 2011-12-01  0:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, pclouds, spearce, torvalds
In-Reply-To: <20111130062512.GA5317@sigill.intra.peff.net>

On Wed, 2011-11-30 at 01:25 -0500, Jeff King wrote:
> On Tue, Nov 29, 2011 at 01:56:28PM -0800, Bill Zaumen wrote:
> But I
> think the important attacks bypass your CRC anyway. Consider this attack
> scenario:
> 
>   1. Linus signs a tag (or a commit) and pushes it to kernel.org.
> 
>   2. kernel.org gets hacked, and the attacker replaces an object with
>      an evil colliding version[1].
> 
>   3. I clone from kernel.org, and run "git tag --verify". Git says it's
>      OK, because the signature checks out, but I have a bogus object.
> 
> How does your CRC help? If I understand your scheme correctly,
> kernel.org will have told me the CRC of all of the objects during the
> clone. But that isn't part of what Linus signed, so the attacker in step
> 2 could just as easily have overwritten kernel.org's crc file, and the
> signature will remain valid.

First, there is a misconception - the server will not tell you the CRC.
The CRC will be computed locally by the client instead.  

Aside from that, suppose the attacker does what you suggests (providing
a valid CRC so that git commands like verify-pack don't have an error
to detect).  You can't tell that something is wrong, but Linus can - 
the next time he does a fetch.  If he fetches, the server sends
some SHA-1 hashes and the client responds with 'have' or 'want' in a
reply.  If the client wants it, the client doesn't have a CRC, if the
client sends 'have', the CRCs are available so those get sent.  The
server then checks, notices a mismatch (probably in the CRC of the CRCs
of all the blobs in the commit's tree), and generates an error.

It's also possible to write some additional commands to (for example)
fetch the SHA-1 hashes and CRCs from all remote repositories you use
and compare these to make sure they are all consistent, something that
can be run ocassionally.


> > An efficient algorithm to do both simultaneously does not yet exist.
> > So, if we could generate a SHA-1 collision in one second, it would
> > presumably take billions of seconds (many decades of continuous
> > computation) to generate a SHA-1 hash with the same CRC, and well
> > before a year has elapsed, the original object should have been in all
> > the repositories, preventing a forged object from being inserted. Of
> > course, eventually you might need a real message digest.
> 
> This is wrong, for two reasons.
> 
>   1. The method for generating an object that collides in both sha-1 and
>      CRC is not necessarily to generate a colliding sha-1 and then do a
>      pre-image attack on the CRC. It is to do a birthday attack on the
>      sha-1 and the CRC together. Which halves the bit-strength of the
>      CRC to 16 bits (just as we can generally find collisions in 160-bit
>      sha1s in 2^80). 

That result for birthday attack assumes the goal is to find two files
that will have the same SHA-1 value (or SHA-1 + CRC).  The case I was
talking about (apologies if I did not state this clearly) is that you
have an existing object and an existing CRC and you want to generate
a second object with the same SHA-1 and same CRC as the first.  A
birthday attack doesn't work so well in that case - the number of tries
is much higher than half the number of bits in the digest.

http://en.wikipedia.org/wiki/Birthday_attack#Digital_signature_susceptibility
has a discussion regarding digital signatures.  The trick is for a
person to create a large number of variations of a "fair" and "unfair"
contract, and use a birthday attack to find a pair that have the same
hash.  The variations are typically inconsequential changes (extra
spacing, commas, etc.)  In the case I was discussing, a developer
creates some code,  commits it and pushes it to a shared repository -
the developer is not given code by the attacker.  The attacker can,
however, see the code by fetching it.  An attack then consists of
generating a collision, change the object in the attacker's local
repository, and then push the original developer's commit (with the
modified object) to another shared repository before someone else
puts the correct objects into that repository.  A birthday attack
does not work in this case.

There one issue that this suggests however - it is not clear if the
2^57 number given for the best SHA-1 attacks were attempts to generate
a new file with the same SHA-1 hash as an existing file or a pair of
files that have the same SHA-1 hash.  If it is the latter, then an
attack is significantly harder than I assumed as a worst case, but
still possibly much, much better than brute force.

http://en.wikipedia.org/wiki/Birthday_problem has an analysis of the
birthday problem (the basis of a birthday attack) and clearly notes that
this is different than the "same birthday as you" variation - you don't
do nearly so well in that case.

> Anyway, all of that is just reiterating that CRC should not be used
>      as a security function. It can easily be replaced in your scheme by
>      sha-256, which does have the desired properties.

Oh, I'm perfectly happy with sha-256 (and indicated that in the
documentation that is in the patch) - there's a tradeoff between error
detection and speed, and I simply guessed that the community was more
concerned with speed.

>   2. Your attack seems to be "find the sha-1 collision, publish one of
>      your colliding objects (i.e., the innocent-looking half), then try
>      to break the CRC". And then you claim that by the time you find the
>      CRC, everybody will already have the object.
> 
>      But wouldn't a smarter attack be to first find the collision, including
>      the CRC, and only _then_ start the attack? Then nobody will have
>      the object.

My assumption was that legitimate developer wrote some code, put it in
one of the remote repositories, and than an attacker downloaded that
code and tried to get a modified version into a different remote
repository.

> 
>      Moreover, it's not true that after a year everyone will have the
>      object. People still run "git clone" against kernel.org. Those
>      repos do not have the object.

This isn't the problem - a clone is going to be an exact copy of an
existing repository.  I was referring to the case where a commit made it
into one remote repository but not another - to get into the second one,
someone else would have had to review the code, creating a window where
an attacker with write access to the second repository could put the
modified object there.

> Right, but we are assuming that sha1 is broken. That's the whole
> security problem. So the existing digest is not worth much.

The assumption was more that "J Random Hacker's" code would be
trusted far less than code submitted by Linus, so if "J Random Hacker"
can generate create a replacement file with the same SHA-1 hash as
one in one of Linus' commits, others will initially assume that Linus
wrote that code. But, Git uses a "first in wins" rule so the bad guy
has to generate the replacement file and get it inserted before Linus'
commit reaches all the shared repositories for the project.  A SHA-1
collision is harmless if computed too late to get in.

> > Second, any value of the CRC that is stored permanently (baring bugs,
> > in my implementation, of course) is computed locally 

> But if I don't already have the object, then I have nothing to compare
> against. So when I get it from kernel.org, I have to simply accept that
> the object I'm getting is good, and write it into my object db.

The value is computed locally but can be compared remotely.

You (specifically) won't find anything while talking to kernel.org in
this example, but when you try to fetch from a different repository,
instead of just noting that you have the object, you'll get a
notification that there was a collision, so you'll find the problem
earlier than you would otherwise.  Also, anyone who had the right object
from kernel.org (the object before the repository was 'hacked' would get
a warning when trying to fetch from kernel.org after the change.

> Yes, the header has to go at the end of the existing headers. But I
> don't see any reason that would be a problem for the scheme I described.

Thanks for mentioning it - I tried putting the header in multiple
locations and probably missed the one spot where it would work.

> Current git should ignore headers that it doesn't understand. I haven't
> tested this, but Junio recently has been experimenting with
> gpg-signature lines in commits, and I'm pretty sure he checked that
> older gits properly ignore them.

Possibly there was another bug in the test I tried as well - something
was preventing the directory cleanup in one of the 'notes' test.

Anyway, thanks for the comments.

Bill

^ permalink raw reply

* Re: [PATCHv2 0/4] git-p4: small fixes to branches and labels; tests
From: Vitor Antunes @ 2011-12-01  0:37 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: Luke Diamand, git
In-Reply-To: <20111130230007.GA11598@arf.padd.com>

On Wed, Nov 30, 2011 at 11:00 PM, Pete Wyckoff <pw@padd.com> wrote:
> And avoids collision with some Vitor code that will get
> added eventually.

I'm starting to doubt I will ever be able to overcome the fast-import
limitation on not allowing branch delesetion. Sure, the code I wrote was
garbage! But they seem to be very relunctant on the concept of deleting
branches on the fly.
Did you ever take a look at the patch I sent? Maybe you could help me
shape it up a bit.

-- 
Vitor Antunes

^ permalink raw reply

* Re: [PATCHv2 0/4] git-p4: small fixes to branches and labels; tests
From: Vitor Antunes @ 2011-12-01  0:33 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: Luke Diamand, git
In-Reply-To: <20111130225813.GA11544@arf.padd.com>

On Wed, Nov 30, 2011 at 10:58 PM, Pete Wyckoff <pw@padd.com> wrote:
> This is another fundamental disconnect between p4 and git.
> Reading
>
> http://www.perforce.com/perforce/doc.current/manuals/p4guide/07_labels.html
>
> it is clear that labels are supposed to be used exactly where
> tags cannot:  to specify a collection of files as they existed
> at _different_ points in the commit history.

Check the "Use Tag Fixup Branches" section in fast-import manual, it
might help on this. The basic concept is to create a special branch
that puts all files in the same state the P4 label would put them and
then tag it in git.

Tried to use this for my branch stuff, but with no success.

> Thus I think supporting labels is kind of pointless.  But in the
> restricted use case that perforce docs tell us not to do, namely
> using labels to identify change numbers, git can reflect that
> with tags.

I still use labels as simple tags. Telling that we should use
changelists instead of labels is the same as saying that we should use
IP addresses instead of host names. It works, but I doubt you will
ever remember it unless you write it down somewhere.

-- 
Vitor Antunes

^ permalink raw reply

* [PATCH v2 5/5] bulk-checkin: replace fast-import based implementation
From: Junio C Hamano @ 2011-12-01  0:27 UTC (permalink / raw)
  To: git
In-Reply-To: <1322699263-14475-1-git-send-email-gitster@pobox.com>

This extends the earlier approach to stream a large file directly from the
filesystem to its own packfile, and allows "git add" to send large files
directly into a single pack. Older code used to spawn fast-import, but the
new bulk-checkin API replaces it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Makefile               |    2 +
 builtin/add.c          |    5 +
 builtin/pack-objects.c |    6 +-
 bulk-checkin.c         |  275 ++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h         |   16 +++
 cache.h                |    2 +
 config.c               |    4 +
 environment.c          |    1 +
 sha1_file.c            |   67 +-----------
 t/t1050-large.sh       |   94 +++++++++++++++--
 zlib.c                 |    9 ++-
 11 files changed, 403 insertions(+), 78 deletions(-)
 create mode 100644 bulk-checkin.c
 create mode 100644 bulk-checkin.h

diff --git a/Makefile b/Makefile
index 3139c19..418dd2e 100644
--- a/Makefile
+++ b/Makefile
@@ -505,6 +505,7 @@ LIB_H += argv-array.h
 LIB_H += attr.h
 LIB_H += blob.h
 LIB_H += builtin.h
+LIB_H += bulk-checkin.h
 LIB_H += cache.h
 LIB_H += cache-tree.h
 LIB_H += color.h
@@ -591,6 +592,7 @@ LIB_OBJS += base85.o
 LIB_OBJS += bisect.o
 LIB_OBJS += blob.o
 LIB_OBJS += branch.o
+LIB_OBJS += bulk-checkin.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += color.o
diff --git a/builtin/add.c b/builtin/add.c
index c59b0c9..1c42900 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -13,6 +13,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "revision.h"
+#include "bulk-checkin.h"
 
 static const char * const builtin_add_usage[] = {
 	"git add [options] [--] <filepattern>...",
@@ -458,11 +459,15 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		free(seen);
 	}
 
+	plug_bulk_checkin();
+
 	exit_status |= add_files_to_cache(prefix, pathspec, flags);
 
 	if (add_new_files)
 		exit_status |= add_files(&dir, flags);
 
+	unplug_bulk_checkin();
+
  finish:
 	if (active_cache_changed) {
 		if (write_cache(newfd, active_cache, active_nr) ||
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b458b6d..dde913e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -76,7 +76,7 @@ static struct pack_idx_option pack_idx_opts;
 static const char *base_name;
 static int progress = 1;
 static int window = 10;
-static unsigned long pack_size_limit, pack_size_limit_cfg;
+static unsigned long pack_size_limit;
 static int depth = 50;
 static int delta_search_threads;
 static int pack_to_stdout;
@@ -2009,10 +2009,6 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 			    pack_idx_opts.version);
 		return 0;
 	}
-	if (!strcmp(k, "pack.packsizelimit")) {
-		pack_size_limit_cfg = git_config_ulong(k, v);
-		return 0;
-	}
 	return git_default_config(k, v, cb);
 }
 
diff --git a/bulk-checkin.c b/bulk-checkin.c
new file mode 100644
index 0000000..807463e
--- /dev/null
+++ b/bulk-checkin.c
@@ -0,0 +1,275 @@
+/*
+ * Copyright (c) 2011, Google Inc.
+ */
+#include "bulk-checkin.h"
+#include "csum-file.h"
+#include "pack.h"
+
+static int pack_compression_level = Z_DEFAULT_COMPRESSION;
+
+static struct bulk_checkin_state {
+	unsigned plugged:1;
+
+	char *pack_tmp_name;
+	struct sha1file *f;
+	off_t offset;
+	struct pack_idx_option pack_idx_opts;
+
+	struct pack_idx_entry **written;
+	uint32_t alloc_written;
+	uint32_t nr_written;
+} state;
+
+static void finish_bulk_checkin(struct bulk_checkin_state *state)
+{
+	unsigned char sha1[20];
+	char packname[PATH_MAX];
+	int i;
+
+	if (!state->f)
+		return;
+
+	if (state->nr_written == 0) {
+		close(state->f->fd);
+		unlink(state->pack_tmp_name);
+		goto clear_exit;
+	} else if (state->nr_written == 1) {
+		sha1close(state->f, sha1, CSUM_FSYNC);
+	} else {
+		int fd = sha1close(state->f, sha1, 0);
+		fixup_pack_header_footer(fd, sha1, state->pack_tmp_name,
+					 state->nr_written, sha1,
+					 state->offset);
+		close(fd);
+	}
+
+	sprintf(packname, "%s/pack/pack-", get_object_directory());
+	finish_tmp_packfile(packname, state->pack_tmp_name,
+			    state->written, state->nr_written,
+			    &state->pack_idx_opts, sha1);
+	for (i = 0; i < state->nr_written; i++)
+		free(state->written[i]);
+
+clear_exit:
+	free(state->written);
+	memset(state, 0, sizeof(*state));
+
+	/* Make objects we just wrote available to ourselves */
+	reprepare_packed_git();
+}
+
+static int already_written(struct bulk_checkin_state *state, unsigned char sha1[])
+{
+	int i;
+
+	/* The object may already exist in the repository */
+	if (has_sha1_file(sha1))
+		return 1;
+
+	/* Might want to keep the list sorted */
+	for (i = 0; i < state->nr_written; i++)
+		if (!hashcmp(state->written[i]->sha1, sha1))
+			return 1;
+
+	/* This is a new object we need to keep */
+	return 0;
+}
+
+/*
+ * Read the contents from fd for size bytes, streaming it to the
+ * packfile in state while updating the hash in ctx. Signal a failure
+ * by returning a negative value when the resulting pack would exceed
+ * the pack size limit and this is not the first object in the pack,
+ * so that the caller can discard what we wrote from the current pack
+ * by truncating it and opening a new one. The caller will then call
+ * us again after rewinding the input fd.
+ *
+ * The already_hashed_to pointer is kept untouched by the caller to
+ * make sure we do not hash the same byte when we are called
+ * again. This way, the caller does not have to checkpoint its hash
+ * status before calling us just in case we ask it to call us again
+ * with a new pack.
+ */
+static int stream_to_pack(struct bulk_checkin_state *state,
+			  git_SHA_CTX *ctx, off_t *already_hashed_to,
+			  int fd, size_t size, enum object_type type,
+			  const char *path, unsigned flags)
+{
+	git_zstream s;
+	unsigned char obuf[16384];
+	unsigned hdrlen;
+	int status = Z_OK;
+	int write_object = (flags & HASH_WRITE_OBJECT);
+	off_t offset = 0;
+
+	memset(&s, 0, sizeof(s));
+	git_deflate_init(&s, pack_compression_level);
+
+	hdrlen = encode_in_pack_object_header(type, size, obuf);
+	s.next_out = obuf + hdrlen;
+	s.avail_out = sizeof(obuf) - hdrlen;
+
+	while (status != Z_STREAM_END) {
+		unsigned char ibuf[16384];
+
+		if (size && !s.avail_in) {
+			ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf);
+			if (xread(fd, ibuf, rsize) != rsize)
+				die("failed to read %d bytes from '%s'",
+				    (int)rsize, path);
+			offset += rsize;
+			if (*already_hashed_to < offset) {
+				size_t hsize = offset - *already_hashed_to;
+				if (rsize < hsize)
+					hsize = rsize;
+				if (hsize)
+					git_SHA1_Update(ctx, ibuf, hsize);
+				*already_hashed_to = offset;
+			}
+			s.next_in = ibuf;
+			s.avail_in = rsize;
+			size -= rsize;
+		}
+
+		status = git_deflate(&s, size ? 0 : Z_FINISH);
+
+		if (!s.avail_out || status == Z_STREAM_END) {
+			if (write_object) {
+				size_t written = s.next_out - obuf;
+
+				/* would we bust the size limit? */
+				if (state->nr_written &&
+				    pack_size_limit_cfg &&
+				    pack_size_limit_cfg < state->offset + written) {
+					git_deflate_abort(&s);
+					return -1;
+				}
+
+				sha1write(state->f, obuf, written);
+				state->offset += written;
+			}
+			s.next_out = obuf;
+			s.avail_out = sizeof(obuf);
+		}
+
+		switch (status) {
+		case Z_OK:
+		case Z_BUF_ERROR:
+		case Z_STREAM_END:
+			continue;
+		default:
+			die("unexpected deflate failure: %d", status);
+		}
+	}
+	git_deflate_end(&s);
+	return 0;
+}
+
+/* Lazily create backing packfile for the state */
+static void prepare_to_stream(struct bulk_checkin_state *state,
+			      unsigned flags)
+{
+	if (!(flags & HASH_WRITE_OBJECT) || state->f)
+		return;
+
+	state->f = create_tmp_packfile(&state->pack_tmp_name);
+	reset_pack_idx_option(&state->pack_idx_opts);
+
+	/* Pretend we are going to write only one object */
+	state->offset = write_pack_header(state->f, 1);
+	if (!state->offset)
+		die_errno("unable to write pack header");
+}
+
+static int deflate_to_pack(struct bulk_checkin_state *state,
+			   unsigned char result_sha1[],
+			   int fd, size_t size,
+			   enum object_type type, const char *path,
+			   unsigned flags)
+{
+	off_t seekback, already_hashed_to;
+	git_SHA_CTX ctx;
+	unsigned char obuf[16384];
+	unsigned header_len;
+	struct sha1file_checkpoint checkpoint;
+	struct pack_idx_entry *idx = NULL;
+
+	seekback = lseek(fd, 0, SEEK_CUR);
+	if (seekback == (off_t) -1)
+		return error("cannot find the current offset");
+
+	header_len = sprintf((char *)obuf, "%s %" PRIuMAX,
+			     typename(type), (uintmax_t)size) + 1;
+	git_SHA1_Init(&ctx);
+	git_SHA1_Update(&ctx, obuf, header_len);
+
+	/* Note: idx is non-NULL when we are writing */
+	if ((flags & HASH_WRITE_OBJECT) != 0)
+		idx = xcalloc(1, sizeof(*idx));
+
+	already_hashed_to = seekback;
+
+	while (1) {
+		prepare_to_stream(state, flags);
+		if (idx) {
+			sha1file_checkpoint(state->f, &checkpoint);
+			idx->offset = state->offset;
+			crc32_begin(state->f);
+		}
+		if (!stream_to_pack(state, &ctx, &already_hashed_to,
+				    fd, size, type, path, flags))
+			break;
+		/*
+		 * Writing this object to the current pack will make
+		 * it too big; we need to truncate it, start a new
+		 * pack, and write into it.
+		 */
+		if (!idx)
+			die("BUG: should not happen");
+		sha1file_truncate(state->f, &checkpoint);
+		state->offset = checkpoint.offset;
+		finish_bulk_checkin(state);
+		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
+			return error("cannot seek back");
+	}
+	git_SHA1_Final(result_sha1, &ctx);
+	if (!idx)
+		return 0;
+
+	if (already_written(state, result_sha1)) {
+		sha1file_truncate(state->f, &checkpoint);
+		state->offset = checkpoint.offset;
+		free(idx);
+	} else {
+		idx->crc32 = crc32_end(state->f);
+		hashcpy(idx->sha1, result_sha1);
+		ALLOC_GROW(state->written,
+			   state->nr_written + 1,
+			   state->alloc_written);
+		state->written[state->nr_written++] = idx;
+	}
+	return 0;
+}
+
+int index_bulk_checkin(unsigned char *sha1,
+		       int fd, size_t size, enum object_type type,
+		       const char *path, unsigned flags)
+{
+	int status = deflate_to_pack(&state, sha1, fd, size, type,
+				     path, flags);
+	if (!state.plugged)
+		finish_bulk_checkin(&state);
+	return status;
+}
+
+void plug_bulk_checkin(void)
+{
+	state.plugged = 1;
+}
+
+void unplug_bulk_checkin(void)
+{
+	state.plugged = 0;
+	if (state.f)
+		finish_bulk_checkin(&state);
+}
diff --git a/bulk-checkin.h b/bulk-checkin.h
new file mode 100644
index 0000000..4f599f8
--- /dev/null
+++ b/bulk-checkin.h
@@ -0,0 +1,16 @@
+/*
+ * Copyright (c) 2011, Google Inc.
+ */
+#ifndef BULK_CHECKIN_H
+#define BULK_CHECKIN_H
+
+#include "cache.h"
+
+extern int index_bulk_checkin(unsigned char sha1[],
+			      int fd, size_t size, enum object_type type,
+			      const char *path, unsigned flags);
+
+extern void plug_bulk_checkin(void);
+extern void unplug_bulk_checkin(void);
+
+#endif
diff --git a/cache.h b/cache.h
index 2e6ad36..4f20861 100644
--- a/cache.h
+++ b/cache.h
@@ -35,6 +35,7 @@ int git_inflate(git_zstream *, int flush);
 void git_deflate_init(git_zstream *, int level);
 void git_deflate_init_gzip(git_zstream *, int level);
 void git_deflate_end(git_zstream *);
+int git_deflate_abort(git_zstream *);
 int git_deflate_end_gently(git_zstream *);
 int git_deflate(git_zstream *, int flush);
 unsigned long git_deflate_bound(git_zstream *, unsigned long);
@@ -598,6 +599,7 @@ extern size_t packed_git_window_size;
 extern size_t packed_git_limit;
 extern size_t delta_base_cache_limit;
 extern unsigned long big_file_threshold;
+extern unsigned long pack_size_limit_cfg;
 extern int read_replace_refs;
 extern int fsync_object_files;
 extern int core_preload_index;
diff --git a/config.c b/config.c
index edf9914..c736802 100644
--- a/config.c
+++ b/config.c
@@ -797,6 +797,10 @@ int git_default_config(const char *var, const char *value, void *dummy)
 		return 0;
 	}
 
+	if (!strcmp(var, "pack.packsizelimit")) {
+		pack_size_limit_cfg = git_config_ulong(var, value);
+		return 0;
+	}
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index 0bee6a7..31e4284 100644
--- a/environment.c
+++ b/environment.c
@@ -60,6 +60,7 @@ char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_apply_sparse_checkout;
 struct startup_info *startup_info;
+unsigned long pack_size_limit_cfg;
 
 /* Parallel index stat data preload? */
 int core_preload_index = 0;
diff --git a/sha1_file.c b/sha1_file.c
index 27f3b9b..c96e366 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -18,6 +18,7 @@
 #include "refs.h"
 #include "pack-revindex.h"
 #include "sha1-lookup.h"
+#include "bulk-checkin.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -2679,10 +2680,8 @@ static int index_core(unsigned char *sha1, int fd, size_t size,
 }
 
 /*
- * This creates one packfile per large blob, because the caller
- * immediately wants the result sha1, and fast-import can report the
- * object name via marks mechanism only by closing the created
- * packfile.
+ * This creates one packfile per large blob unless bulk-checkin
+ * machinery is "plugged".
  *
  * This also bypasses the usual "convert-to-git" dance, and that is on
  * purpose. We could write a streaming version of the converting
@@ -2696,65 +2695,7 @@ static int index_stream(unsigned char *sha1, int fd, size_t size,
 			enum object_type type, const char *path,
 			unsigned flags)
 {
-	struct child_process fast_import;
-	char export_marks[512];
-	const char *argv[] = { "fast-import", "--quiet", export_marks, NULL };
-	char tmpfile[512];
-	char fast_import_cmd[512];
-	char buf[512];
-	int len, tmpfd;
-
-	strcpy(tmpfile, git_path("hashstream_XXXXXX"));
-	tmpfd = git_mkstemp_mode(tmpfile, 0600);
-	if (tmpfd < 0)
-		die_errno("cannot create tempfile: %s", tmpfile);
-	if (close(tmpfd))
-		die_errno("cannot close tempfile: %s", tmpfile);
-	sprintf(export_marks, "--export-marks=%s", tmpfile);
-
-	memset(&fast_import, 0, sizeof(fast_import));
-	fast_import.in = -1;
-	fast_import.argv = argv;
-	fast_import.git_cmd = 1;
-	if (start_command(&fast_import))
-		die_errno("index-stream: git fast-import failed");
-
-	len = sprintf(fast_import_cmd, "blob\nmark :1\ndata %lu\n",
-		      (unsigned long) size);
-	write_or_whine(fast_import.in, fast_import_cmd, len,
-		       "index-stream: feeding fast-import");
-	while (size) {
-		char buf[10240];
-		size_t sz = size < sizeof(buf) ? size : sizeof(buf);
-		ssize_t actual;
-
-		actual = read_in_full(fd, buf, sz);
-		if (actual < 0)
-			die_errno("index-stream: reading input");
-		if (write_in_full(fast_import.in, buf, actual) != actual)
-			die_errno("index-stream: feeding fast-import");
-		size -= actual;
-	}
-	if (close(fast_import.in))
-		die_errno("index-stream: closing fast-import");
-	if (finish_command(&fast_import))
-		die_errno("index-stream: finishing fast-import");
-
-	tmpfd = open(tmpfile, O_RDONLY);
-	if (tmpfd < 0)
-		die_errno("index-stream: cannot open fast-import mark");
-	len = read(tmpfd, buf, sizeof(buf));
-	if (len < 0)
-		die_errno("index-stream: reading fast-import mark");
-	if (close(tmpfd) < 0)
-		die_errno("index-stream: closing fast-import mark");
-	if (unlink(tmpfile))
-		die_errno("index-stream: unlinking fast-import mark");
-	if (len != 44 ||
-	    memcmp(":1 ", buf, 3) ||
-	    get_sha1_hex(buf + 3, sha1))
-		die_errno("index-stream: unexpected fast-import mark: <%s>", buf);
-	return 0;
+	return index_bulk_checkin(sha1, fd, size, type, path, flags);
 }
 
 int index_fd(unsigned char *sha1, int fd, struct stat *st,
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index deba111..29d6024 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -7,21 +7,97 @@ test_description='adding and checking out large blobs'
 
 test_expect_success setup '
 	git config core.bigfilethreshold 200k &&
-	echo X | dd of=large bs=1k seek=2000
+	echo X | dd of=large1 bs=1k seek=2000 &&
+	echo X | dd of=large2 bs=1k seek=2000 &&
+	echo X | dd of=large3 bs=1k seek=2000 &&
+	echo Y | dd of=huge bs=1k seek=2500
 '
 
-test_expect_success 'add a large file' '
-	git add large &&
-	# make sure we got a packfile and no loose objects
-	test -f .git/objects/pack/pack-*.pack &&
-	test ! -f .git/objects/??/??????????????????????????????????????
+test_expect_success 'add a large file or two' '
+	git add large1 huge large2 &&
+	# make sure we got a single packfile and no loose objects
+	bad= count=0 idx= &&
+	for p in .git/objects/pack/pack-*.pack
+	do
+		count=$(( $count + 1 ))
+		if test -f "$p" && idx=${p%.pack}.idx && test -f "$idx"
+		then
+			continue
+		fi
+		bad=t
+	done &&
+	test -z "$bad" &&
+	test $count = 1 &&
+	cnt=$(git show-index <"$idx" | wc -l) &&
+	test $cnt = 2 &&
+	for l in .git/objects/??/??????????????????????????????????????
+	do
+		test -f "$l" || continue
+		bad=t
+	done &&
+	test -z "$bad" &&
+
+	# attempt to add another copy of the same
+	git add large3 &&
+	bad= count=0 &&
+	for p in .git/objects/pack/pack-*.pack
+	do
+		count=$(( $count + 1 ))
+		if test -f "$p" && idx=${p%.pack}.idx && test -f "$idx"
+		then
+			continue
+		fi
+		bad=t
+	done &&
+	test -z "$bad" &&
+	test $count = 1
 '
 
 test_expect_success 'checkout a large file' '
-	large=$(git rev-parse :large) &&
-	git update-index --add --cacheinfo 100644 $large another &&
+	large1=$(git rev-parse :large1) &&
+	git update-index --add --cacheinfo 100644 $large1 another &&
 	git checkout another &&
-	cmp large another ;# this must not be test_cmp
+	cmp large1 another ;# this must not be test_cmp
+'
+
+test_expect_success 'packsize limit' '
+	test_create_repo mid &&
+	(
+		cd mid &&
+		git config core.bigfilethreshold 64k &&
+		git config pack.packsizelimit 256k &&
+
+		# mid1 and mid2 will fit within 256k limit but
+		# appending mid3 will bust the limit and will
+		# result in a separate packfile.
+		test-genrandom "a" $(( 66 * 1024 )) >mid1 &&
+		test-genrandom "b" $(( 80 * 1024 )) >mid2 &&
+		test-genrandom "c" $(( 128 * 1024 )) >mid3 &&
+		git add mid1 mid2 mid3 &&
+
+		count=0
+		for pi in .git/objects/pack/pack-*.idx
+		do
+			test -f "$pi" && count=$(( $count + 1 ))
+		done &&
+		test $count = 2 &&
+
+		(
+			git hash-object --stdin <mid1
+			git hash-object --stdin <mid2
+			git hash-object --stdin <mid3
+		) |
+		sort >expect &&
+
+		for pi in .git/objects/pack/pack-*.idx
+		do
+			git show-index <"$pi"
+		done |
+		sed -e "s/^[0-9]* \([0-9a-f]*\) .*/\1/" |
+		sort >actual &&
+
+		test_cmp expect actual
+	)
 '
 
 test_done
diff --git a/zlib.c b/zlib.c
index 3c63d48..2b2c0c7 100644
--- a/zlib.c
+++ b/zlib.c
@@ -188,13 +188,20 @@ void git_deflate_init_gzip(git_zstream *strm, int level)
 	    strm->z.msg ? strm->z.msg : "no message");
 }
 
-void git_deflate_end(git_zstream *strm)
+int git_deflate_abort(git_zstream *strm)
 {
 	int status;
 
 	zlib_pre_call(strm);
 	status = deflateEnd(&strm->z);
 	zlib_post_call(strm);
+	return status;
+}
+
+void git_deflate_end(git_zstream *strm)
+{
+	int status = git_deflate_abort(strm);
+
 	if (status == Z_OK)
 		return;
 	error("deflateEnd: %s (%s)", zerr_to_string(status),
-- 
1.7.8.rc4.177.g4d64

^ permalink raw reply related

* [PATCH v2 4/5] csum-file: introduce sha1file_checkpoint
From: Junio C Hamano @ 2011-12-01  0:27 UTC (permalink / raw)
  To: git
In-Reply-To: <1322699263-14475-1-git-send-email-gitster@pobox.com>

It is useful to be able to rewind a check-summed file to a certain
previous state after writing data into it using sha1write() API. The
fast-import command does this after streaming a blob data to the packfile
being generated and then noticing that the same blob has already been
written, and it does this with a private code truncate_pack() that is
commented as "Yes, this is a layering violation".

Introduce two API functions, sha1file_checkpoint(), that allows the caller
to save a state of a sha1file, and then later revert it to the saved state.
Use it to reimplement truncate_pack().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 csum-file.c   |   20 ++++++++++++++++++++
 csum-file.h   |    9 +++++++++
 fast-import.c |   25 ++++++++-----------------
 3 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index fc97d6e..53f5375 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -158,6 +158,26 @@ struct sha1file *sha1fd_throughput(int fd, const char *name, struct progress *tp
 	return f;
 }
 
+void sha1file_checkpoint(struct sha1file *f, struct sha1file_checkpoint *checkpoint)
+{
+	sha1flush(f);
+	checkpoint->offset = f->total;
+	checkpoint->ctx = f->ctx;
+}
+
+int sha1file_truncate(struct sha1file *f, struct sha1file_checkpoint *checkpoint)
+{
+	off_t offset = checkpoint->offset;
+
+	if (ftruncate(f->fd, offset) ||
+	    lseek(f->fd, offset, SEEK_SET) != offset)
+		return -1;
+	f->total = offset;
+	f->ctx = checkpoint->ctx;
+	f->offset = 0; /* sha1flush() was called in checkpoint */
+	return 0;
+}
+
 void crc32_begin(struct sha1file *f)
 {
 	f->crc32 = crc32(0, NULL, 0);
diff --git a/csum-file.h b/csum-file.h
index 6a7967c..3b540bd 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -17,6 +17,15 @@ struct sha1file {
 	unsigned char buffer[8192];
 };
 
+/* Checkpoint */
+struct sha1file_checkpoint {
+	off_t offset;
+	git_SHA_CTX ctx;
+};
+
+extern void sha1file_checkpoint(struct sha1file *, struct sha1file_checkpoint *);
+extern int sha1file_truncate(struct sha1file *, struct sha1file_checkpoint *);
+
 /* sha1close flags */
 #define CSUM_CLOSE	1
 #define CSUM_FSYNC	2
diff --git a/fast-import.c b/fast-import.c
index 8d8ea3c..a8db41b 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1143,17 +1143,11 @@ static int store_object(
 	return 0;
 }
 
-static void truncate_pack(off_t to, git_SHA_CTX *ctx)
+static void truncate_pack(struct sha1file_checkpoint *checkpoint)
 {
-	if (ftruncate(pack_data->pack_fd, to)
-	 || lseek(pack_data->pack_fd, to, SEEK_SET) != to)
+	if (sha1file_truncate(pack_file, checkpoint))
 		die_errno("cannot truncate pack to skip duplicate");
-	pack_size = to;
-
-	/* yes this is a layering violation */
-	pack_file->total = to;
-	pack_file->offset = 0;
-	pack_file->ctx = *ctx;
+	pack_size = checkpoint->offset;
 }
 
 static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
@@ -1166,8 +1160,8 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 	unsigned long hdrlen;
 	off_t offset;
 	git_SHA_CTX c;
-	git_SHA_CTX pack_file_ctx;
 	git_zstream s;
+	struct sha1file_checkpoint checkpoint;
 	int status = Z_OK;
 
 	/* Determine if we should auto-checkpoint. */
@@ -1175,11 +1169,8 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 		|| (pack_size + 60 + len) < pack_size)
 		cycle_packfile();
 
-	offset = pack_size;
-
-	/* preserve the pack_file SHA1 ctx in case we have to truncate later */
-	sha1flush(pack_file);
-	pack_file_ctx = pack_file->ctx;
+	sha1file_checkpoint(pack_file, &checkpoint);
+	offset = checkpoint.offset;
 
 	hdrlen = snprintf((char *)out_buf, out_sz, "blob %" PRIuMAX, len) + 1;
 	if (out_sz <= hdrlen)
@@ -1245,14 +1236,14 @@ static void stream_blob(uintmax_t len, unsigned char *sha1out, uintmax_t mark)
 
 	if (e->idx.offset) {
 		duplicate_count_by_type[OBJ_BLOB]++;
-		truncate_pack(offset, &pack_file_ctx);
+		truncate_pack(&checkpoint);
 
 	} else if (find_sha1_pack(sha1, packed_git)) {
 		e->type = OBJ_BLOB;
 		e->pack_id = MAX_PACK_ID;
 		e->idx.offset = 1; /* just not zero! */
 		duplicate_count_by_type[OBJ_BLOB]++;
-		truncate_pack(offset, &pack_file_ctx);
+		truncate_pack(&checkpoint);
 
 	} else {
 		e->depth = 0;
-- 
1.7.8.rc4.177.g4d64

^ permalink raw reply related

* [PATCH v2 3/5] finish_tmp_packfile(): a helper function
From: Junio C Hamano @ 2011-12-01  0:27 UTC (permalink / raw)
  To: git
In-Reply-To: <1322699263-14475-1-git-send-email-gitster@pobox.com>

Factor out a small logic out of the private write_pack_file() function
in builtin/pack-objects.c.

This changes the order of finishing multi-pack generation slightly. The
code used to

 - adjust shared perm of temporary packfile
 - rename temporary packfile to the final name
 - update mtime of the packfile under the final name
 - adjust shared perm of temporary idxfile
 - rename temporary idxfile to the final name

but because the helper does not want to do the mtime thing, the updated
code does that step first and then all the rest.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |   33 ++++++++++-----------------------
 pack-write.c           |   31 +++++++++++++++++++++++++++++++
 pack.h                 |    1 +
 3 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3258fa9..b458b6d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -617,20 +617,8 @@ static void write_pack_file(void)
 
 		if (!pack_to_stdout) {
 			struct stat st;
-			const char *idx_tmp_name;
 			char tmpname[PATH_MAX];
 
-			idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
-						      &pack_idx_opts, sha1);
-
-			snprintf(tmpname, sizeof(tmpname), "%s-%s.pack",
-				 base_name, sha1_to_hex(sha1));
-			free_pack_by_name(tmpname);
-			if (adjust_shared_perm(pack_tmp_name))
-				die_errno("unable to make temporary pack file readable");
-			if (rename(pack_tmp_name, tmpname))
-				die_errno("unable to rename temporary pack file");
-
 			/*
 			 * Packs are runtime accessed in their mtime
 			 * order since newer packs are more likely to contain
@@ -638,28 +626,27 @@ static void write_pack_file(void)
 			 * packs then we should modify the mtime of later ones
 			 * to preserve this property.
 			 */
-			if (stat(tmpname, &st) < 0) {
+			if (stat(pack_tmp_name, &st) < 0) {
 				warning("failed to stat %s: %s",
-					tmpname, strerror(errno));
+					pack_tmp_name, strerror(errno));
 			} else if (!last_mtime) {
 				last_mtime = st.st_mtime;
 			} else {
 				struct utimbuf utb;
 				utb.actime = st.st_atime;
 				utb.modtime = --last_mtime;
-				if (utime(tmpname, &utb) < 0)
+				if (utime(pack_tmp_name, &utb) < 0)
 					warning("failed utime() on %s: %s",
 						tmpname, strerror(errno));
 			}
 
-			snprintf(tmpname, sizeof(tmpname), "%s-%s.idx",
-				 base_name, sha1_to_hex(sha1));
-			if (adjust_shared_perm(idx_tmp_name))
-				die_errno("unable to make temporary index file readable");
-			if (rename(idx_tmp_name, tmpname))
-				die_errno("unable to rename temporary index file");
-
-			free((void *) idx_tmp_name);
+			/* Enough space for "-<sha-1>.pack"? */
+			if (sizeof(tmpname) <= strlen(base_name) + 50)
+				die("pack base name '%s' too long", base_name);
+			snprintf(tmpname, sizeof(tmpname), "%s-", base_name);
+			finish_tmp_packfile(tmpname, pack_tmp_name,
+					    written_list, nr_written,
+					    &pack_idx_opts, sha1);
 			free(pack_tmp_name);
 			puts(sha1_to_hex(sha1));
 		}
diff --git a/pack-write.c b/pack-write.c
index 863cce8..cadc3e1 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -338,3 +338,34 @@ struct sha1file *create_tmp_packfile(char **pack_tmp_name)
 	*pack_tmp_name = xstrdup(tmpname);
 	return sha1fd(fd, *pack_tmp_name);
 }
+
+void finish_tmp_packfile(char *name_buffer,
+			 const char *pack_tmp_name,
+			 struct pack_idx_entry **written_list,
+			 uint32_t nr_written,
+			 struct pack_idx_option *pack_idx_opts,
+			 unsigned char sha1[])
+{
+	const char *idx_tmp_name;
+	char *end_of_name_prefix = strrchr(name_buffer, 0);
+
+	if (adjust_shared_perm(pack_tmp_name))
+		die_errno("unable to make temporary pack file readable");
+
+	idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
+				      pack_idx_opts, sha1);
+	if (adjust_shared_perm(idx_tmp_name))
+		die_errno("unable to make temporary index file readable");
+
+	sprintf(end_of_name_prefix, "%s.pack", sha1_to_hex(sha1));
+	free_pack_by_name(name_buffer);
+
+	if (rename(pack_tmp_name, name_buffer))
+		die_errno("unable to rename temporary pack file");
+
+	sprintf(end_of_name_prefix, "%s.idx", sha1_to_hex(sha1));
+	if (rename(idx_tmp_name, name_buffer))
+		die_errno("unable to rename temporary index file");
+
+	free((void *)idx_tmp_name);
+}
diff --git a/pack.h b/pack.h
index 0027ac6..cfb0f69 100644
--- a/pack.h
+++ b/pack.h
@@ -86,5 +86,6 @@ extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned ch
 extern int read_pack_header(int fd, struct pack_header *);
 
 extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
+extern void finish_tmp_packfile(char *name_buffer, const char *pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, struct pack_idx_option *pack_idx_opts, unsigned char sha1[]);
 
 #endif
-- 
1.7.8.rc4.177.g4d64

^ permalink raw reply related

* [PATCH v2 2/5] create_tmp_packfile(): a helper function
From: Junio C Hamano @ 2011-12-01  0:27 UTC (permalink / raw)
  To: git
In-Reply-To: <1322699263-14475-1-git-send-email-gitster@pobox.com>

Factor out a small logic out of the private write_pack_file() function
in builtin/pack-objects.c

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |   12 +++---------
 pack-write.c           |   10 ++++++++++
 pack.h                 |    3 +++
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6643c16..3258fa9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -584,16 +584,10 @@ static void write_pack_file(void)
 		unsigned char sha1[20];
 		char *pack_tmp_name = NULL;
 
-		if (pack_to_stdout) {
+		if (pack_to_stdout)
 			f = sha1fd_throughput(1, "<stdout>", progress_state);
-		} else {
-			char tmpname[PATH_MAX];
-			int fd;
-			fd = odb_mkstemp(tmpname, sizeof(tmpname),
-					 "pack/tmp_pack_XXXXXX");
-			pack_tmp_name = xstrdup(tmpname);
-			f = sha1fd(fd, pack_tmp_name);
-		}
+		else
+			f = create_tmp_packfile(&pack_tmp_name);
 
 		offset = write_pack_header(f, nr_remaining);
 		if (!offset)
diff --git a/pack-write.c b/pack-write.c
index 46f3f84..863cce8 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -328,3 +328,13 @@ int encode_in_pack_object_header(enum object_type type, uintmax_t size, unsigned
 	*hdr = c;
 	return n;
 }
+
+struct sha1file *create_tmp_packfile(char **pack_tmp_name)
+{
+	char tmpname[PATH_MAX];
+	int fd;
+
+	fd = odb_mkstemp(tmpname, sizeof(tmpname), "pack/tmp_pack_XXXXXX");
+	*pack_tmp_name = xstrdup(tmpname);
+	return sha1fd(fd, *pack_tmp_name);
+}
diff --git a/pack.h b/pack.h
index d429d8a..0027ac6 100644
--- a/pack.h
+++ b/pack.h
@@ -84,4 +84,7 @@ extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned ch
 #define PH_ERROR_PACK_SIGNATURE	(-2)
 #define PH_ERROR_PROTOCOL	(-3)
 extern int read_pack_header(int fd, struct pack_header *);
+
+extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
+
 #endif
-- 
1.7.8.rc4.177.g4d64

^ permalink raw reply related

* [PATCH v2 1/5] write_pack_header(): a helper function
From: Junio C Hamano @ 2011-12-01  0:27 UTC (permalink / raw)
  To: git
In-Reply-To: <1322699263-14475-1-git-send-email-gitster@pobox.com>

Factor out a small logic out of the private write_pack_file() function
in builtin/pack-objects.c

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |    9 +++------
 pack-write.c           |   12 ++++++++++++
 pack.h                 |    2 ++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ba3705d..6643c16 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -571,7 +571,6 @@ static void write_pack_file(void)
 	uint32_t i = 0, j;
 	struct sha1file *f;
 	off_t offset;
-	struct pack_header hdr;
 	uint32_t nr_remaining = nr_result;
 	time_t last_mtime = 0;
 	struct object_entry **write_order;
@@ -596,11 +595,9 @@ static void write_pack_file(void)
 			f = sha1fd(fd, pack_tmp_name);
 		}
 
-		hdr.hdr_signature = htonl(PACK_SIGNATURE);
-		hdr.hdr_version = htonl(PACK_VERSION);
-		hdr.hdr_entries = htonl(nr_remaining);
-		sha1write(f, &hdr, sizeof(hdr));
-		offset = sizeof(hdr);
+		offset = write_pack_header(f, nr_remaining);
+		if (!offset)
+			die_errno("unable to write pack header");
 		nr_written = 0;
 		for (; i < nr_objects; i++) {
 			struct object_entry *e = write_order[i];
diff --git a/pack-write.c b/pack-write.c
index 9cd3bfb..46f3f84 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -178,6 +178,18 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	return index_name;
 }
 
+off_t write_pack_header(struct sha1file *f, uint32_t nr_entries)
+{
+	struct pack_header hdr;
+
+	hdr.hdr_signature = htonl(PACK_SIGNATURE);
+	hdr.hdr_version = htonl(PACK_VERSION);
+	hdr.hdr_entries = htonl(nr_entries);
+	if (sha1write(f, &hdr, sizeof(hdr)))
+		return 0;
+	return sizeof(hdr);
+}
+
 /*
  * Update pack header with object_count and compute new SHA1 for pack data
  * associated to pack_fd, and write that SHA1 at the end.  That new SHA1
diff --git a/pack.h b/pack.h
index 722a54e..d429d8a 100644
--- a/pack.h
+++ b/pack.h
@@ -2,6 +2,7 @@
 #define PACK_H
 
 #include "object.h"
+#include "csum-file.h"
 
 /*
  * Packed object header
@@ -74,6 +75,7 @@ extern const char *write_idx_file(const char *index_name, struct pack_idx_entry
 extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
 extern int verify_pack(struct packed_git *);
+extern off_t write_pack_header(struct sha1file *f, uint32_t);
 extern void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
 extern char *index_pack_lockfile(int fd);
 extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned char *);
-- 
1.7.8.rc4.177.g4d64

^ permalink raw reply related

* [PATCH v2 0/5] Bulk Check-in
From: Junio C Hamano @ 2011-12-01  0:27 UTC (permalink / raw)
  To: git

This is a re-roll of the earlier bulk-check-in series. The only change is
that the last one is a bit re-structured to pay attention to the packsize
limit from the get-go and also not write objects that already exist in the
repository, instead of "oops, I forgot to do that, and here is a fix".

Junio C Hamano (5):
  write_pack_header(): a helper function
  create_tmp_packfile(): a helper function
  finish_tmp_packfile(): a helper function
  csum-file: introduce sha1file_checkpoint
  bulk-checkin: replace fast-import based implementation

 Makefile               |    2 +
 builtin/add.c          |    5 +
 builtin/pack-objects.c |   62 +++--------
 bulk-checkin.c         |  275 ++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h         |   16 +++
 cache.h                |    2 +
 config.c               |    4 +
 csum-file.c            |   20 ++++
 csum-file.h            |    9 ++
 environment.c          |    1 +
 fast-import.c          |   25 ++---
 pack-write.c           |   53 +++++++++
 pack.h                 |    6 +
 sha1_file.c            |   67 +-----------
 t/t1050-large.sh       |   94 +++++++++++++++--
 zlib.c                 |    9 ++-
 16 files changed, 516 insertions(+), 134 deletions(-)
 create mode 100644 bulk-checkin.c
 create mode 100644 bulk-checkin.h

-- 
1.7.8.rc4.177.g4d64

^ permalink raw reply

* mini-wierdness with "git am"
From: Aghiles @ 2011-11-30 23:12 UTC (permalink / raw)
  To: git list

Hello,

I am using : git version 1.7.0.3, in Mac OS X.

A slightly strange behavior occurs with "git am --abort". Here
is the sequence:

> git am 000*
error: ... : patch doesn not apply
(oops, forgot to reset)
> git reset --hard HEAD~1
> git am 000*
previous rebase directory ... still exists but mbox given.
(oops, forgot to abort)
> git am --abort

At this point, my repository is back to ORIG_HEAD!
Meaning that it reverted my preivous "git reset --hard HEAD~1".

Hope this is clear.

As usual, thanks for the great product.

-- aghiles

^ permalink raw reply

* Re: [PATCHv2 0/4] git-p4: small fixes to branches and labels; tests
From: Pete Wyckoff @ 2011-11-30 23:00 UTC (permalink / raw)
  To: Luke Diamand; +Cc: Vitor Antunes, git
In-Reply-To: <20111130225813.GA11544@arf.padd.com>

P.S.  Since you're respinning anyway to change the label code,
can you stick the 'branch with shell char' test from t9801 in
with t9803?  It feels more appropriate there than with the branch
tests.  And avoids collision with some Vitor code that will get
added eventually.

		-- Pete

^ permalink raw reply

* Re: [PATCHv2 0/4] git-p4: small fixes to branches and labels; tests
From: Pete Wyckoff @ 2011-11-30 22:58 UTC (permalink / raw)
  To: Luke Diamand; +Cc: Vitor Antunes, git
In-Reply-To: <4ED6809A.9020703@diamand.org>

luke@diamand.org wrote on Wed, 30 Nov 2011 19:14 +0000:
> On 30/11/11 14:55, Vitor Antunes wrote:
> >Luke Diamand<luke<at>  diamand.org>  writes:
> >
> >>In adding the test case for labels I also found and fixed a few other
> >>small bugs in the label handling:
> >>
> >>  - labels missing a description or "EOT" in their text cause problems;
> >>  - labels without an owner cause problems.
> >>
> >>I also noticed, but did not fix, that you can't have more than one label
> >>per commit (the others are silently dropped) and the documentation for
> >>branch import could be improved. I've added a (failing) test case for
> >>the multiple label problem.

I was hanging onto your v1, and made a comment on the v1's 3/4
that perhaps you missed.  Also acked the entire thing.  I can
resend if my mailer ate it.

Don't expect Junio to pick it up until after 1.7.8 goes out.

> >Hi Luke,
> >
> >Seeing that you have some experience using labels, could I kindly ask you to
> >include some description of it in git-p4.txt?
> 
> OK, if you can help me understand what's going on...
> 
> The label-detection bug that I've described, on further
> investigation, looks to be a fundamental limitation.
> 
> With perforce, I can check out the head revision, and then tag just
> a single file. If I then check out on that tag, I get just that one
> file.
> 
> I think I can't do that with git; certainly fast-import can't do it.

This is another fundamental disconnect between p4 and git.
Reading

http://www.perforce.com/perforce/doc.current/manuals/p4guide/07_labels.html

it is clear that labels are supposed to be used exactly where
tags cannot:  to specify a collection of files as they existed
at _different_ points in the commit history.

Thus I think supporting labels is kind of pointless.  But in the
restricted use case that perforce docs tell us not to do, namely
using labels to identify change numbers, git can reflect that
with tags.

> So the code in git-p4 that is checking the file vs label counts
> (git-p4 around line 1496) is actually trying to say "this label
> can't be imported into git".
> 
> If my understanding is correct, I can then fix my test and update
> the docs and the code to explain this.

Yeah.  It's just a big restriction on how labels get imported.
A better error message and some docs would be useful.

> As an aside, git-p4.txt currently has quite good information on the
> config variables, but nothing on the command line variables.
> Possibly that should be fixed.

Recently, I wrote an asciidoc-style document for git-p4, and
tried to find all the options on all the commands.  There's a lot
more than I ever knew about.  :)  I'll take another pass
through it then send it out for review.  Maybe we can get rid
of the old git-p4.txt then and work on improving a more
structured document.

		-- Pete

^ permalink raw reply

* Re: [PATCH 5/3] revert: introduce --abort to cancel a failed cherry-pick
From: Junio C Hamano @ 2011-11-30 22:52 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ramkumar Ramachandra, git, Christian Couder,
	Martin von Zweigbergk, Phil Hord, Jay Soffian
In-Reply-To: <20111122112001.GF7399@elie.hsd1.il.comcast.net>

Jonathan Nieder <jrnieder@gmail.com> writes:

> After running some ill-advised command like "git cherry-pick
> HEAD..linux-next", the bewildered novice may want to return to more
> familiar territory.  Introduce a "git cherry-pick --abort" command
> that rolls back the entire cherry-pick sequence and places the
> repository back on solid ground.

This is confusing; if you have many commits in the range, and a handful of
them replayed without conflicts and then you hit a conflict, where should
(I am not asking "where does ... with your patch") abort take us? The
state after the random commit that happened to have replayed successfully?
The state before the entire cherry-pick sequence started?  "back on solid
ground" does not tell us which one you meant.

I am assuming that it is the latter.

> diff --git a/Documentation/sequencer.txt b/Documentation/sequencer.txt
> index 75f8e869..5747f442 100644
> --- a/Documentation/sequencer.txt
> +++ b/Documentation/sequencer.txt
> @@ -7,3 +7,6 @@
>  	Forget about the current operation in progress.  Can be used
>  	to clear the sequencer state after a failed cherry-pick or
>  	revert.
> +
> +--abort::
> +	Cancel the operation and return to the pre-sequence state.

Ok, it is the latter.

> +static int reset_for_rollback(const unsigned char *sha1)
> +{
> +	const char *argv[4];	/* reset --merge <arg> + NULL */
> +	argv[0] = "reset";
> +	argv[1] = "--merge";
> +	argv[2] = sha1_to_hex(sha1);
> +	argv[3] = NULL;
> +	return run_command_v_opt(argv, RUN_GIT_CMD);
> +}

So you give the value of the HEAD before the sequence started to this
function and all should go well. Where do you read that value from?

> +static int rollback_single_pick(void)
> +{
> +	unsigned char head_sha1[20];
> +
> +	if (!file_exists(git_path("CHERRY_PICK_HEAD")) &&
> +	    !file_exists(git_path("REVERT_HEAD")))
> +		return error(_("no cherry-pick or revert in progress"));
> +	if (!resolve_ref("HEAD", head_sha1, 0, NULL))
> +		return error(_("cannot resolve HEAD"));
> +	if (is_null_sha1(head_sha1))
> +		return error(_("cannot abort from a branch yet to be born"));

Ok, this is for single-pick so HEAD is where we came from. Good.

> +	return reset_for_rollback(head_sha1);
> +}
> +
> +static int sequencer_rollback(struct replay_opts *opts)
> +{
> +	const char *filename;
> +	FILE *f;
> +	unsigned char sha1[20];
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	filename = git_path(SEQ_HEAD_FILE);
> +	f = fopen(filename, "r");
> +	if (!f && errno == ENOENT) {
> +		/*
> +		 * There is no multiple-cherry-pick in progress.
> +		 * If CHERRY_PICK_HEAD or REVERT_HEAD indicates
> +		 * a single-cherry-pick in progress, abort that.
> +		 */
> +		return rollback_single_pick();
> +	}
> +	if (!f)
> +		return error(_("cannot open %s: %s"), filename,
> +						strerror(errno));
> +	if (strbuf_getline(&buf, f, '\n')) {
> +		error(_("cannot read %s: %s"), filename, ferror(f) ?
> +			strerror(errno) : _("unexpected end of file"));
> +		goto fail;
> +	}

And when we are in multi-pick, SEQ_HEAD_FILE has it.

Looks good from a cursory review. Thanks.

^ permalink raw reply

* Re: BUG: "--work-tree blah" does not imply "--git-dir blah/.git" or fix misleading error message
From: Junio C Hamano @ 2011-11-30 20:45 UTC (permalink / raw)
  To: Carlos Martín Nieto; +Cc: John Twilley, git
In-Reply-To: <20111130194233.GD12096@centaur.lab.cmartin.tk>

Carlos Martín Nieto <carlos@cmartin.tk> writes:

> On Wed, Nov 30, 2011 at 11:21:29AM -0800, Junio C Hamano wrote:
>> subdirectory "blah/.git"), but I do not think this is likely to change, as
>> I suspect that people and scripts are relying on the current behaviour to
>> be able to do something like this:
>> 
>>     cd /pub/scm/git/git.git ;# this is a bare repository
>>     mkdir /var/tmp/git
>>     git --work-tree=/var/tmp/git checkout
>
> This is in fact the way that many (or from what I can see the most
> popular) tutorials for abusing git as a deployment system tell you to
> run it (though more often than not setting GIT_WORKTREE in the
> environment).

Heh, *ab*using is a good description if they really mean to use it as a
deployment system. For one thing it won't do anything a proper deployment
should do if the target directory is not empty ;-)

^ permalink raw reply

* Re: [PATCHv2 0/4] git-p4: small fixes to branches and labels; tests
From: Vitor Antunes @ 2011-11-30 19:44 UTC (permalink / raw)
  To: Luke Diamand; +Cc: git, Pete Wyckoff
In-Reply-To: <4ED6809A.9020703@diamand.org>

On Wed, Nov 30, 2011 at 7:14 PM, Luke Diamand <luke@diamand.org> wrote:
> OK, if you can help me understand what's going on...
>
> The label-detection bug that I've described, on further investigation, looks
> to be a fundamental limitation.
>
> With perforce, I can check out the head revision, and then tag just a single
> file. If I then check out on that tag, I get just that one file.
>
> I think I can't do that with git; certainly fast-import can't do it.
>
> So the code in git-p4 that is checking the file vs label counts (git-p4
> around line 1496) is actually trying to say "this label can't be imported
> into git".
>
> If my understanding is correct, I can then fix my test and update the docs
> and the code to explain this.
>
> As an aside, git-p4.txt currently has quite good information on the config
> variables, but nothing on the command line variables. Possibly that should
> be fixed.

Hey Luke,

I did not have any success in the few times I tried to import labels.
Since you were updating some of that code I was hoping you could
extend git-p4.txt such that I would be able to understand it. This to
say: I can't help you much with my current expertise level on how
labels are implemented. Will need to look into that later.

-- 
Vitor Antunes

^ permalink raw reply

* Re: BUG: "--work-tree blah" does not imply "--git-dir blah/.git" or fix misleading error message
From: Carlos Martín Nieto @ 2011-11-30 19:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: John Twilley, git
In-Reply-To: <7vaa7dquva.fsf@alter.siamese.dyndns.org>

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

On Wed, Nov 30, 2011 at 11:21:29AM -0800, Junio C Hamano wrote:
> subdirectory "blah/.git"), but I do not think this is likely to change, as
> I suspect that people and scripts are relying on the current behaviour to
> be able to do something like this:
> 
>     cd /pub/scm/git/git.git ;# this is a bare repository
>     mkdir /var/tmp/git
>     git --work-tree=/var/tmp/git checkout
> 

This is in fact the way that many (or from what I can see the most
popular) tutorials for abusing git as a deployment system tell you to
run it (though more often than not setting GIT_WORKTREE in the
environment).

   cmn


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: BUG: "--work-tree blah" does not imply "--git-dir blah/.git" or fix misleading error message
From: Junio C Hamano @ 2011-11-30 19:21 UTC (permalink / raw)
  To: John Twilley; +Cc: git
In-Reply-To: <CAEUMa-cA8qPjJuPBREE1RqhgwmcZG7x1MjBYkxa3i+ZSAnMPOA@mail.gmail.com>

John Twilley <mathuin@gmail.com> writes:

> Today someone asked me if there was a way to run git against a
> directory other than the current directory.  I looked at the output of
> --help and ran this:
>
> $ git --work-tree blah status
>
> I got the following output:
>
> fatal: Not a git repository (or any parent up to mount parent /home)

Yeah, that is a "this a use case that we didn't even intend to support,
and as a consequence we do a random thing" bug.

Originally, when GIT_DIR is set (from the environment, and then later we
added "git --git-dir=..." as another way to do so), Git always used the
current directory as the top of the working tree. There was no mechanism
for the user to say "No, I am not at the top level, but I am in a
subdirectory of the working tree. The top of working tree is there".  That
was the use case GIT_WORK_TREE (from the environment, and then later we
added "git --work-tree=..." as another way to do so) was introduced
for.

So in that sense, it is an unsupported mode of operation and it is not
surprising at all if Git did any random and meaningless things if you used
GIT_WORK_TREE without specifying GIT_DIR at all. In the same sense,
strictly speaking, setting GIT_WORK_TREE to somewhere that is not a parent
directory of the current directory (even if you set GIT_DIR) is also an
unsupported mode of operation.

When GIT_DIR is not set, I think we still run the normal GIT_DIR discovery
starting from the current working directory, and when we do not find one,
we would error out, as you saw. I am sympathetic that your particular case
might have resulted in a more pleasant user experience if the GIT_DIR
discovery started from the directory specified by GIT_WORK_TREE (i.e. the
subdirectory "blah/.git"), but I do not think this is likely to change, as
I suspect that people and scripts are relying on the current behaviour to
be able to do something like this:

    cd /pub/scm/git/git.git ;# this is a bare repository
    mkdir /var/tmp/git
    git --work-tree=/var/tmp/git checkout

to have a temporary checkout, and changing the GIT_DIR discovery logic
will break them, i.e. they now have to do:

    cd /pub/scm/git/git.git ;# this is a bare repository
    mkdir /var/tmp/git
    git --work-tree=/var/tmp/git --git-dir=$(pwd) checkout

or something. Instead, what you wanted to do is already supported by:

    (cd blah && git status)

so nothing is lost.

We could reword this:

> fatal: Not a git repository (or any parent up to mount parent /home)

to "fatal: /home/bar/baz (or any parent ...) is not a git repository" to
mention the current directory /home/bar/baz, but I am having a hard time
convincing myself that such a change is particularly good, because almost
always you know where you are (many people have it in their shell prompt).
Such a change makes the message longer to fit on a line without adding
much value.

^ permalink raw reply

* Re: [PATCHv2 0/4] git-p4: small fixes to branches and labels; tests
From: Luke Diamand @ 2011-11-30 19:14 UTC (permalink / raw)
  To: Vitor Antunes; +Cc: git, Pete Wyckoff
In-Reply-To: <loom.20111130T155409-599@post.gmane.org>

On 30/11/11 14:55, Vitor Antunes wrote:
> Luke Diamand<luke<at>  diamand.org>  writes:
>
>> In adding the test case for labels I also found and fixed a few other
>> small bugs in the label handling:
>>
>>   - labels missing a description or "EOT" in their text cause problems;
>>   - labels without an owner cause problems.
>>
>> I also noticed, but did not fix, that you can't have more than one label
>> per commit (the others are silently dropped) and the documentation for
>> branch import could be improved. I've added a (failing) test case for
>> the multiple label problem.
>
> Hi Luke,
>
> Seeing that you have some experience using labels, could I kindly ask you to
> include some description of it in git-p4.txt?

OK, if you can help me understand what's going on...

The label-detection bug that I've described, on further investigation, 
looks to be a fundamental limitation.

With perforce, I can check out the head revision, and then tag just a 
single file. If I then check out on that tag, I get just that one file.

I think I can't do that with git; certainly fast-import can't do it.

So the code in git-p4 that is checking the file vs label counts (git-p4 
around line 1496) is actually trying to say "this label can't be 
imported into git".

If my understanding is correct, I can then fix my test and update the 
docs and the code to explain this.

As an aside, git-p4.txt currently has quite good information on the 
config variables, but nothing on the command line variables. Possibly 
that should be fixed.

Cheers!
Luke

^ permalink raw reply

* Re: BUG: "--work-tree blah" does not imply "--git-dir blah/.git" or fix misleading error message
From: John Twilley @ 2011-11-30 19:13 UTC (permalink / raw)
  To: Carlos Martín Nieto, git
In-Reply-To: <20111130182230.GC12096@centaur.lab.cmartin.tk>

On Wed, Nov 30, 2011 at 10:22, Carlos Martín Nieto <cmn@elego.de> wrote:
> On Wed, Nov 30, 2011 at 09:43:08AM -0800, John Twilley wrote:
>> Today someone asked me if there was a way to run git against a
>> directory other than the current directory.  I looked at the output of
>> --help and ran this:
>>
>> $ git --work-tree blah status
>>
>> I got the following output:
>>
>> fatal: Not a git repository (or any parent up to mount parent /home)
>> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
>>
>> I mistakenly thought the error message meant that blah was not a git
>> repository.  What it meant was that there was no .git in the current
>> directory or any parent directory up to /home.
>
> Yes, git looks at the current directory and .git/ to see if there's a
> git repository there. This is what happens unless you tell git to look
> for it somewhere else.

This makes perfect sense, because nearly every time I run git
commands, I am somewhere within the working tree.  The point of my
post was that I was using --work-tree to tell git to look for the git
repository somewhere else (the root of the specified working tree)
which is not what git expected.

>> This command worked as expected:
>>
>> $ git --work-tree blah --git-dir blah/.git status
>>
>> The documentation is somewhat fuzzy about what constitutes a git
>> repository.  The gittutorial describes the git repository as .git when
>> talking about "git init" while the Git User's Manual describes the git
>> repository as the working tree and the special top-level directory
>> named .git when talking about "git clone".
>
> A git repository is what's under .git/ (or in foo.git/ for bare
> repos). Non-bare repositories have a working tree associated with
> them, which by default lives just above the repository, because it'd
> be silly to have it somewhere else by default. Often you can think of
> both as the repository, as they're both. When you tell git to look for
> the worktree somewhere else, you're only overriding that particular
> variable, as git expects to be run from the repository (or just above,
> in the worktree).

And it's exactly this issue -- that sometimes the repository is just
the git directory, and sometimes the repository is the working tree
which contains the git directory at its root -- that caused the
confusion and unexpected behavior.  If I were to use a bare repository
directly (something I've never done), I guess I might use --git-dir
since the repository may not be named .git but instead something like
proj.git.  When I accessed a repository from outside its working tree,
I expected --work-tree to cover the whole shebang.  Obviously this
discussion is exposing my relatively limited experience with git, but
these assumptions do not seem unreasonable on their face.

>> It's clear (to me at least) that --work-tree should be used to
>> identify the root of the working tree when not inside the working
>> tree.  I expected that the git directory would be automatically set to
>> .git in the root of the working tree, as that would match the
>> documentation.  Instead, the current directory and its parents were
>> checked -- which could provide dangerously misleading information to
>> the user.
>
> What part of the documentation exactly? --work-tree tells git to look
> for the working tree somewhere else. This option exists in order to
> support a multiple-workdir workflow.

What you mention above was what I was thinking about when I mentioned
the possibility of this being a critical and significant feature.  If
it is important to support a workflow with one git directory and
multiple working trees, and that case is more common/important than
the one I experienced, then changing the behavior of --git-dir is
obviously not the right thing to do.

>> I think that one of two things should be done:  either the --git-dir
>> default should be changed when the --work-tree option is set, or the
>> error message cited above should be changed to explicitly identify the
>> directory being tested as a potential git repository.  I personally
>
> Git does tell you explicitly, but only when you specify a gitdir (via
> GIT_DIR or --git-dir), otherwise it looks at the current directory.

This is misleading if you don't know that the specification of a
working tree does not also implicitly specify a git directory.
Whether that lack of knowledge is the user's problem or the
software/documentation's problem is a separate question.

>> believe the first option is superior because it fulfills the
>> expectations of average users (folks who read git's documentation
>> instead of its source code) while permitting flexibility to those who
>
> It's not likely that it will get changed because that would break
> backwards-compatability in a very big way. If your concern is for
> "average user", she shouldn't be using that option, but changing to
> that directory instead. If you want your working tree to be ./foo/ and
> your gitdir to be ./foo/.git, why don't you just cd to ./foo/?

From that perspective, why have --work-tree at all?  Without that
option, either the git directory is in the root of your current
working tree, or it's not -- in which case --git-dir is all you need.
If you're going to keep the option, it's helpful to provide the
diagnostic output.  My suggestion would be more compelling if I could
provide a valid use case, but all I can come up with off the top of my
head are scripts and something like "(cd $worktree && git status)"
would probably work fine.

>> wish to refer to the current directory or some other directory for
>> their --git-dir value.  If the current behavior is somehow not a bug
>> but instead a critical and significant feature which if changed would
>> cause more harm than good, please consider the second option.
>
> You get two different messages depending on how git is looking for the
> repository. The message you mentioned gets printed when git tries to
> find it automatically. A "fatal: Not a git repository: '/tmp'" gets
> printed if you've told git to look for it in a specific place. The
> information is already there, though I guess you do have to know about
> the difference. Adding the current directory to the "default" message
> probably wouldn't hurt, as it's unlikely that a script is parsing
> that, and might be useful.

Fewer scripts would be broken if the additional output is only
displayed when --work-tree is used, but that might be too special-case
for this situation.

>   cmn

Jack.
--
mathuin at gmail dot com

^ permalink raw reply

* Re: [PATCH] Implement fast hash-collision detection
From: Bill Zaumen @ 2011-11-30 19:00 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, git, gitster, spearce, torvalds
In-Reply-To: <CACsJy8A6kGmn0h0xdxfTC4krXgc8hzO1fHTdqfk0YnASGN5K0w@mail.gmail.com>

[Will send a reply to Jeff's comment from last night with some 
clarifications and explanations later].

> What I'm thinking is whether it's possible to decouple two sha-1 roles
> in git, as object identifier and digest, separately. Each sha-1
> identifies an object and an extra set of digests on the "same" object.
> Object database is extended to store all these new digests and mapping
> between sha-1 and them. When we need to verify an object, given an
> sha-1, we rehash that object and check the result digest with the ones
> linked to the sha-1.

The patch I created (at least, a reasonable chunk of the code) kind of
does that:  it is very easy to change the CRC to whatever message digest
one wants.  I used a CRC primarily because I had the impression that
people were very concerned about speed, but it is easy to change that to
the message digest of your choice.  In any case, it might be a good
starting point if you want to try something in a different direction.

Basically, when you create a loose object, in addition to getting a
SHA-1 ID, you get a message digest that gets stored as well (in a
separate file). When you index a pack file, you get an IDX file
containing the SHA-1 ID  plus a corresponding MDS file containing the
message digest. Index-pack calculates the SHA-1 value from the object
stored in the pack file, and the (additional) message digest is computed
at the same time using the same data.  Commands like verify-pack check
both the IDX file and the MDS file for consistency with the matching
pack file.  The new message digest (the CRC in the patch) is used only
in cases where a repository is being altered (e.g., a loose object or
pack file is being created or a fetch, push, or pull operation) or some
explicit verification operation is running (e.g., git verify-pack).

Adding an additional header to the commit message is a good idea (I had
actually tried that, but something went wrong, although one of you
suggested what the problem might have been --- I can try again if there
is some interest in pursuing that).

It might be worth pointing out that you can use the SHA-1 hash of the
contents of objects (e.g., without the Git object header) as an
additional digest:  I tried a test using two 128-byte files with the
same MD5 hash, differing past the 20th byte, and deleted the first
four bytes of each.  With those bytes deleted, the hash collision
went away. I doubt if there is a known efficient algorithm that can
generate a hash collision for two files and for two other files that
differ from the first set by deleting N bytes from both.

^ permalink raw reply

* Re: log: option "--follow" not the default for a single file?
From: Ralf Thielow @ 2011-11-30 18:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20111130063743.GB5317@sigill.intra.peff.net>

>     pathspec. That may happen to match a single file in the current>     revision, but to git it is actually a prefix-limiting pattern, and
Is it possible to detect the case of a single file in the current
revisionand use "--follow" by default for exactly that?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox