Git development


* [PATCH] Display change history as a diff between two dirs
From: Roland Kaufmann @ 2011-10-29 20:51 UTC (permalink / raw)
  To: gitster; +Cc: git

When reviewing patches one at a time, it can be difficult to get an
overview of how a pervasive change is distributed throughout different
modules. Thus:

Extract snapshots of the files that have changed between two revisions
into temporary directories and launch a graphical tool to show the diff
between them.

Use existing functionality in git-diff to figure out which files have
changed, and to get the files themselves.
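
A minimal sketch of the snapshot step, assuming the documented
GIT_EXTERNAL_DIFF calling convention (path old-file old-hex old-mode
new-file new-hex new-mode). The name snapshot_pair and the explicit
destination argument are illustrative only and are not part of the patch:

```shell
#!/bin/sh
# snapshot_pair DEST PATH OLD-FILE OLD-HEX OLD-MODE NEW-FILE NEW-HEX NEW-MODE
# (hypothetical helper) Copy both sides of one changed file into
# DEST/old and DEST/new. The body runs in a subshell so that the
# variables do not leak into the caller.
snapshot_pair() (
	dest=$1 path=$2 old_file=$3 new_file=$6
	# added files have /dev/null as the old side, removed files as the new side
	if [ "$old_file" != /dev/null ]; then
		mkdir -p "$dest/old/$(dirname "$path")" &&
		cp "$old_file" "$dest/old/$path" || exit 1
	fi
	if [ "$new_file" != /dev/null ]; then
		mkdir -p "$dest/new/$(dirname "$path")" &&
		cp "$new_file" "$dest/new/$path" || exit 1
	fi
)
```

Quoting every expansion keeps paths containing spaces intact, which the
unquoted form would split.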

Based on a script called 'git-diffc' by Nitin Gupta.

Signed-off-by: Roland Kaufmann <rlndkfmn+git@gmail.com>
---

Requests for such scripts surface occasionally, so I believe it could
be useful to have one in the distribution itself.

 Documentation/git-dirdiff.txt |   55 +++++++++++++++++++++++++++++++++++++++++
 Makefile                      |    2 +
 git-dirdiff--helper.sh        |   28 +++++++++++++++++++++
 git-dirdiff.sh                |   34 +++++++++++++++++++++++++
 4 files changed, 119 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-dirdiff.txt
 create mode 100755 git-dirdiff--helper.sh
 create mode 100755 git-dirdiff.sh

diff --git a/Documentation/git-dirdiff.txt b/Documentation/git-dirdiff.txt
new file mode 100644
index 0000000..bdd2581
--- /dev/null
+++ b/Documentation/git-dirdiff.txt
@@ -0,0 +1,55 @@
+git-dirdiff(1)
+==============
+
+NAME
+----
+git-dirdiff - Show changes using directory compare
+
+SYNOPSIS
+--------
+[verse]
+'git dirdiff' [<options>] [<commit> [<commit>]] [--] [<path>...]
+
+DESCRIPTION
+-----------
+'git dirdiff' is a git command that allows you to compare revisions
+as a difference between two directories. 'git dirdiff' is a frontend
+to linkgit:git-diff[1].
+
+OPTIONS
+-------
+See linkgit:git-diff[1] for the list of supported options.
+
+CONFIG VARIABLES
+----------------
+'git dirdiff' uses the same config variables as linkgit:git-difftool[1]
+to determine which difftool should be used.
+
+TEMPORARY FILES
+---------------
+'git dirdiff' creates a directory with 'mktemp' to hold snapshots of the
+files which are different in the two revisions. This directory is removed
+when the diff viewer terminates.
+
+NOTES
+-----
+The diff viewer must support being passed directories instead of files
+as its arguments.
++
+Files that are not put under version control are not included when
+viewing the difference between a revision and the working directory.
+
+SEE ALSO
+--------
+linkgit:git-diff[1]::
+	 Show changes between commits, commit and working tree, etc
+
+linkgit:git-difftool[1]::
+	Show changes using common diff tools
+
+linkgit:git-config[1]::
+	 Get and set repository or global options
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 3139c19..03771cf 100644
--- a/Makefile
+++ b/Makefile
@@ -365,6 +365,8 @@ unexport CDPATH
 SCRIPT_SH += git-am.sh
 SCRIPT_SH += git-bisect.sh
 SCRIPT_SH += git-difftool--helper.sh
+SCRIPT_SH += git-dirdiff.sh
+SCRIPT_SH += git-dirdiff--helper.sh
 SCRIPT_SH += git-filter-branch.sh
 SCRIPT_SH += git-lost-found.sh
 SCRIPT_SH += git-merge-octopus.sh
diff --git a/git-dirdiff--helper.sh b/git-dirdiff--helper.sh
new file mode 100755
index 0000000..bc0b49d
--- /dev/null
+++ b/git-dirdiff--helper.sh
@@ -0,0 +1,28 @@
+#!/bin/sh
+#
+# Accumulate files in a changeset into a pre-defined directory.
+#
+# Copyright (C) 2011 Roland Kaufmann
+# Based on a script called git-diffc by Nitin Gupta
+#
+# This file is licensed under the GPL v2, or a later version
+# at the discretion of the official Git maintainer.
+
+# bail out if there are any problems copying
+set -e
+
+# check that we are called by git-dirdiff
+if [ -z "$__GIT_DIFF_DIR" ]; then
+  echo "Error: Do not call $(basename "$0") directly" 1>&2
+  exit 1
+fi
+
+# don't attempt to copy new or removed files
+if [ "$2" != "/dev/null" ]; then
+  mkdir -p "$__GIT_DIFF_DIR/old/$(dirname "$1")"
+  cp "$2" "$__GIT_DIFF_DIR/old/$1"
+fi
+if [ "$5" != "/dev/null" ]; then
+  mkdir -p "$__GIT_DIFF_DIR/new/$(dirname "$1")"
+  cp "$5" "$__GIT_DIFF_DIR/new/$1"
+fi
diff --git a/git-dirdiff.sh b/git-dirdiff.sh
new file mode 100755
index 0000000..4e75eda
--- /dev/null
+++ b/git-dirdiff.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+#
+# Display differences between two commits with a directory comparison.
+#
+# Copyright (C) 2011 Roland Kaufmann
+# Based on a script called git-diffc by Nitin Gupta
+#
+# This file is licensed under the GPL v2, or a later version
+# at the discretion of the official Git maintainer.
+
+# bail out if there are any problems in getting a diff
+set -e
+
+# create a temporary directory to hold snapshots of changed files
+__GIT_DIFF_DIR=$(mktemp --tmpdir -d git-dirdiff.XXXXXX)
+export __GIT_DIFF_DIR
+
+# cleanup after we're done
+trap 'rm -rf "$__GIT_DIFF_DIR"' 0
+
+# list all files that have changed. store this list in a separate
+# file so that we can test the exit status of this command. (if we had
+# bash we could use pipefail, or if we had Posix we could use mkfifo)
+git diff --raw "$@" > "$__GIT_DIFF_DIR/toc"
+
+# let the helper script accumulate them into the temporary directory
+cut -f 2- -s "$__GIT_DIFF_DIR/toc" | while IFS= read -r f; do
+  GIT_EXTERNAL_DIFF=git-dirdiff--helper git --no-pager diff "$@" "$f"
+done
+
+# run original diff program, reckoning it will understand directories.
+# modes and shas do not apply to the root directories, so submit dummy
+# values for those, hoping that the diff tool does not use them.
+git-difftool--helper - "$__GIT_DIFF_DIR/old" deadbeef 0755 "$__GIT_DIFF_DIR/new" babeface 0755
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Daniel Stenberg @ 2011-10-29 20:33 UTC (permalink / raw)
  To: Mika Fischer; +Cc: git
In-Reply-To: <1319901621-482-1-git-send-email-mika.fischer@zoopnet.de>

On Sat, 29 Oct 2011, Mika Fischer wrote:

> Previously, when nothing could be read from the connections curl had open, 
> git would just sleep unconditionally for 50ms. This patch changes this 
> behavior and instead obtains the recommended timeout and the actual file 
> descriptors from curl. This should eliminate time spent sleeping when data 
> could actually be read/written on the socket.

It looks fine to me, from a libcurl perspective. I only have one comment about 
this:

> +			curl_multi_fdset(curlm, &readfds, &writefds, &excfds, &max_fd);
> +
> +			select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);

At times, curl_multi_fdset() might return -1 in max_fd, as when there's no 
internal socket around to provide to the application to wait for.

Calling select() with max_fd+1 (== 0) will then not be appreciated by all 
implementations of select() so that case should probably also be covered by 
the 50ms sleep approach...

-- 

  / daniel.haxx.se

^ permalink raw reply

* Re: What's cooking in git.git (Oct 2011, #11; Fri, 28)
From: Erik Faye-Lund @ 2011-10-29 15:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, msysGit
In-Reply-To: <7vzkglrnmc.fsf@alter.siamese.dyndns.org>

Cc'ing the msysgit list.

On Fri, Oct 28, 2011 at 8:12 PM, Junio C Hamano <gitster@pobox.com> wrote:
> * ef/mingw-upload-archive (2011-10-26) 3 commits
>  - upload-archive: use start_command instead of fork
>  - compat/win32/poll.c: upgrade from upstream
>  - mingw: move poll out of sys-folder
>
> Are msysgit folks OK with this series (I didn't see msysgit list Cc'ed on
> these patches)? If so let's move this forward, as the changes to the core
> part seem solid.
>

The msysgit list not being Cc'ed on the patches was a slip-up on my
part. I believe the changes are relatively uncontroversial from an
msysgit point of view, though. However, an ack/nack would be
appreciated ;)

Or would people prefer that I re-send the series with the msysgit list Cc'ed?

^ permalink raw reply

* [PATCH] http.c: Use curl_multi_fdset to select on curl fds instead of just sleeping
From: Mika Fischer @ 2011-10-29 15:20 UTC (permalink / raw)
  To: git; +Cc: Mika Fischer

Previously, when nothing could be read from the connections curl had
open, git would just sleep unconditionally for 50ms. This patch changes
this behavior and instead obtains the recommended timeout and the actual
file descriptors from curl. This should eliminate time spent sleeping when
data could actually be read/written on the socket.

Signed-off-by: Mika Fischer <mika.fischer@zoopnet.de>
---
 http.c |   21 ++++++++++++++++-----
 1 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/http.c b/http.c
index a4bc770..12180f3 100644
--- a/http.c
+++ b/http.c
@@ -649,6 +649,7 @@ void run_active_slot(struct active_request_slot *slot)
 	fd_set excfds;
 	int max_fd;
 	struct timeval select_timeout;
+	long int curl_timeout;
 	int finished = 0;
 
 	slot->finished = &finished;
@@ -664,14 +665,24 @@ void run_active_slot(struct active_request_slot *slot)
 		}
 
 		if (slot->in_use && !data_received) {
-			max_fd = 0;
+			curl_multi_timeout(curlm, &curl_timeout);
+			if (curl_timeout == 0) {
+				continue;
+			} else if (curl_timeout == -1) {
+				select_timeout.tv_sec  = 0;
+				select_timeout.tv_usec = 50000;
+			} else {
+				select_timeout.tv_sec  =  curl_timeout / 1000;
+				select_timeout.tv_usec = (curl_timeout % 1000) * 1000;
+			}
+
+			max_fd = -1;
 			FD_ZERO(&readfds);
 			FD_ZERO(&writefds);
 			FD_ZERO(&excfds);
-			select_timeout.tv_sec = 0;
-			select_timeout.tv_usec = 50000;
-			select(max_fd, &readfds, &writefds,
-			       &excfds, &select_timeout);
+			curl_multi_fdset(curlm, &readfds, &writefds, &excfds, &max_fd);
+
+			select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout);
 		}
 	}
 #else
-- 
1.7.7.1.489.g1fee

^ permalink raw reply related

* Re: git slow over https
From: Mika Fischer @ 2011-10-29 15:15 UTC (permalink / raw)
  To: Daniel Stenberg; +Cc: Git Mailing List
In-Reply-To: <alpine.DEB.2.00.1110282019510.28338@tvnag.unkk.fr>

Thanks for the pointer. Doing it this way fixes things for me. I'll
send a patch soon. I'd appreciate it if you could check it quickly.

Best,
 Mika

On Fri, Oct 28, 2011 at 20:28, Daniel Stenberg <daniel@haxx.se> wrote:
> On Fri, 28 Oct 2011, Mika Fischer wrote:
>
>> 1) What's the purpose of the select in http.c:673? Can it be removed?
>> 2) If it serves a useful purpose, what can be the reason that it hurts
>> performance so much in my case?
>
> The purpose must be to avoid busy-looping in case there's nothing to read.
>
> It should probably use curl_multi_fdset [1] to get a decent set to wait for
> instead so that it'll return fast if there is pending data. The timeout for
> select can in fact also get extended with the use of curl_multi_timeout [2].
>
> 1 = http://curl.haxx.se/libcurl/c/curl_multi_fdset.html
> 2 = http://curl.haxx.se/libcurl/c/curl_multi_timeout.html
>
> --
>
>  / daniel.haxx.se
>
>

^ permalink raw reply

* Re: [PATCHv2 3/3] completion: match ctags symbol names in grep patterns
From: SZEDER Gábor @ 2011-10-29 12:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20111028060517.GA3993@sigill.intra.peff.net>

Hi,


On Thu, Oct 27, 2011 at 11:05:20PM -0700, Jeff King wrote:
> On Sun, Oct 23, 2011 at 11:29:28PM +0200, SZEDER Gábor wrote:
> 
> > On Fri, Oct 21, 2011 at 01:30:21PM -0400, Jeff King wrote:
> > > This incorporates the suggestions from Gábor's review, with one
> > > exception: it still looks only in the current directory for the "tags"
> > > files. I think that might have some performance implications, so I'd
> > > rather add it separately, if at all.
> > 
> > I agree that scanning through a whole working tree for tags files
> > would cost too much.  But I think that a tags file at the top of the
> > working tree is common enough to be supported, and checking its
> > existence is fairly cheap.
> 
> Actually, it's not too expensive. Asking git for the top of the working
> tree means it has to traverse up to there anyway. So the trick is just
> doing our search without invoking too many external tools which would
> cause unnecessary forks.
> 
> The patch is below, but I'm still not sure it's a good idea.
> 
> Grep only looks in the current subdirectory for matches.

Unless the user explicitly specifies the path(s)...  But that comes at
the end of the command line, so the completion script could have no
idea about it at the time of 'git grep <TAB>'.

> > So how about something like this for the case arm? (I haven't actually
> > tested it.)
> > 
> > 		local tagsfile
> > 		if test -r tags; then
> > 			tagsfile=tags
> > 		else
> > 			local dir="$(__gitdir)"
> 
> You don't want __gitdir here, but rather "git rev-parse --show-cdup".

Oh, yes, indeed.

But there was a point in using __gitdir() here: it respects the
--git-dir= option.  Which brings up the question: what
should 'git --git-dir=/some/where/.git grep <TAB>' offer?

So in the end I agree that it's not a good idea.

> > Btw, there is a bug in the case statement: 'git --no-pager grep <TAB>'
> > offers refs instead of symbols, because $cword is not 2 and $prev
> > doesn't start with a dash.  But it's not worse than the current
> > behavior, so I don't think this bug is a show-stopper for the patch.
> 
> Yeah. The intent of the "$cword is 2" thing is to know that we are the
> first non-option argument. Arguably, _get_comp_words_by_ref should
> somehow tell us which position we are completing relative to the actual
> command name.

_get_comp_words_by_ref() is a general completion function, the purpose
of which is to provide a bash-version-independent equivalent of
$COMP_WORDS and $COMP_CWORD by working around the different word
splitting rules.  It doesn't know about git and its commands at all.

But there is the while loop in _git() that looks for the git command
(among other things) on the command line, which could store the index
of the command name in $words in a variable.  This variable could then
be used in that case statement (and probably in a couple of other
places, too).
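
As a rough illustration of that idea (this is not the actual loop in
_git(); find_command_index is a hypothetical name, and a real version
would also have to skip the values of options that take a separate
argument), the position of the command word could be found like this:

```shell
# find_command_index WORD...
# (hypothetical helper) Print the index, counting "git" itself as 0, of
# the first word that does not start with a dash. Return 1 if every
# word after "git" is an option.
find_command_index() {
	local i=0 word
	for word in "$@"; do
		if [ "$i" -gt 0 ]; then
			case "$word" in
			-*) ;;
			*) echo "$i"; return 0 ;;
			esac
		fi
		i=$((i + 1))
	done
	return 1
}
```

With such an index available, the case arm could compare $cword against
"index + 1" instead of the literal 2, so that
'git --no-pager grep <TAB>' is treated like 'git grep <TAB>'.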


Best,
Gábor


> ---
> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index af283cb..b0ed657 100755
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1429,6 +1429,39 @@ _git_gitk ()
>  	_gitk
>  }
>  
> +__git_cdup_dirs() {
> +	local prefix=$(git rev-parse --show-cdup 2>/dev/null)
> +	local oldifs=$IFS
> +	local dots
> +	local i
> +	IFS=/
> +	for i in $prefix; do
> +		dots=$dots../
> +		echo "$dots"
> +	done
> +	IFS=$oldifs
> +}

No need for $oldifs here; do a local IFS=/ instead, and then it just
goes out of scope when returning from the function.
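
A sketch of that variant, assuming a shell with 'local' (as the
completion script already requires). To keep it self-contained, the
prefix is passed in as $1 instead of being read from
"git rev-parse --show-cdup", and the name cdup_dirs is illustrative:

```shell
# cdup_dirs PREFIX
# (illustrative) Print "../", "../../", ... for each component of
# PREFIX. IFS is declared local, so the caller's value is untouched
# once the function returns and no save/restore is needed.
cdup_dirs() {
	local prefix="$1" dots= i
	local IFS=/
	for i in $prefix; do
		dots=$dots../
		echo "$dots"
	done
}
```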

> +
> +__git_find_in_cdup_one() {
> +	local dir=$1; shift
> +	for i in "$@"; do
> +		if test -r "$dir$i"; then
> +			echo "$dir$i"
> +			return 0
> +		fi
> +	done
> +	return 1
> +}
> +
> +__git_find_in_cdup() {
> +	__git_find_in_cdup_one "" "$@" && return
> +
> +	local dir
> +	for dir in $(__git_cdup_dirs); do
> +		__git_find_in_cdup_one "$dir" "$@" && return
> +	done
> +}
> +
>  __git_match_ctag() {
>  	awk "/^${1////\\/}/ { print \$1 }" "$2"
>  }
> @@ -1457,8 +1490,9 @@ _git_grep ()
>  
>  	case "$cword,$prev" in
>  	2,*|*,-*)
> -		if test -r tags; then
> -			__gitcomp "$(__git_match_ctag "$cur" tags)"
> +		local tags=$(__git_find_in_cdup tags)
> +		if test -n "$tags"; then
> +			__gitcomp "$(__git_match_ctag "$cur" "$tags")"
>  			return
>  		fi
>  		;;

^ permalink raw reply

* Re: Git is exploding
From: Ramkumar Ramachandra @ 2011-10-29  9:12 UTC (permalink / raw)
  To: Tay Ray Chuan; +Cc: Øyvind A. Holm, git
In-Reply-To: <CALUzUxqHNByaV+TL2p4wBcwaLNpiaATw14Jgkb1YwcfXxNkMrg@mail.gmail.com>

Hi Tay,

Tay Ray Chuan writes:
> How were the numbers gathered? I looked around the page but gave up.

See: http://popcon.debian.org/FAQ

-- Ram

^ permalink raw reply

* Re: Git is exploding
From: Tay Ray Chuan @ 2011-10-29  8:36 UTC (permalink / raw)
  To: Øyvind A. Holm; +Cc: git
In-Reply-To: <CAA787r=jeBv9moineaJVY=urYzEX+d7n23ED-txAGhLS+OPbmg@mail.gmail.com>

On Sat, Oct 29, 2011 at 8:39 AM, Øyvind A. Holm <sunny@sunbase.org> wrote:
> Found an interesting "Popularity Contest" graph on debian.org (via
> Thomas Bassetto on G+):
>
> http://bit.ly/rNxVN0
>
> Very cool indeed. Maybe it's the rise of GitHub, or simply that the
> user interface is mature enough that also "regular" users feel
> comfortable with it.

How were the numbers gathered? I looked around the page but gave up.

-- 
Cheers,
Ray Chuan

^ permalink raw reply

* Re: Git is exploding
From: Miles Bader @ 2011-10-29  8:28 UTC (permalink / raw)
  To: Øyvind A. Holm; +Cc: git
In-Reply-To: <8762j8jje9.fsf@catnip.gol.com>

2011/10/29 Miles Bader <miles@gnu.org>:
> The sharpness of that graph is pretty amazing though; what
> happened in 2010Q1?

Actually, now I realize what happened:  that's the date the Debian
"git-core" package was renamed "git" (the "git" package used to be
"gnu interactive tools")!!

-Miles

-- 
Cat is power.  Cat is peace.

^ permalink raw reply

* Re: sparse checkout using exclusions
From: Ramkumar Ramachandra @ 2011-10-29  5:46 UTC (permalink / raw)
  To: Eric Raible; +Cc: git@vger.kernel.org
In-Reply-To: <4EAB4632.5080101@nextest.com>

Hi Eric,

Eric Raible writes:
> Might it make sense for the example in git-read-tree.html to be
> updated to include the leading slash?

This issue was fixed in 5e821231 (git-read-tree.txt: update sparse
checkout examples, 2011-09-26).

Cheers.

-- Ram

^ permalink raw reply

* Bitbucket now has git
From: Alec Taylor @ 2011-10-29  3:36 UTC (permalink / raw)
  To: git

Please update http://git-scm.com/tools

^ permalink raw reply

* Fork freedesktop project to bitbucket, make changes, generate patch back to freedesktop?
From: Alec Taylor @ 2011-10-29  3:35 UTC (permalink / raw)
  To: git

Good afternoon,

I've forked a [git] freedesktop project to [git] bitbucket.

I am working with a team extending the functionality of this project.

After many MANY adds, commits and pushes back and forth on the
bitbucket project, we then want to send this freedesktop project a
PATCH with the changes we've made.

Can you tell me the command I need to do this?

Thanks for all suggestions,

Alec Taylor

^ permalink raw reply

* Git is exploding
From: Øyvind A. Holm @ 2011-10-29  0:39 UTC (permalink / raw)
  To: git

Found an interesting "Popularity Contest" graph on debian.org (via
Thomas Bassetto on G+):

http://bit.ly/rNxVN0

Very cool indeed. Maybe it's the rise of GitHub, or simply that the
user interface is mature enough that also "regular" users feel
comfortable with it.

Regards,
Øyvind

^ permalink raw reply

* sparse checkout using exclusions
From: Eric Raible @ 2011-10-29  0:17 UTC (permalink / raw)
  To: git@vger.kernel.org

Hi all.

I was just about to send a long message about using exclusions
in sparse-checkout, when I did one last search and saw that all
of my problems were fixed by using '/*' instead of '*' as the
first line in .git/info/sparse-checkout.

Might it make sense for the example in git-read-tree.html to be
updated to include the leading slash?

    /*
    !unwanted
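
For reference, a minimal sketch of setting this up from the shell. The
helper name write_sparse_exclusions is made up for illustration; the
surrounding git commands are the ones described in git-read-tree(1):

```shell
# write_sparse_exclusions GIT-DIR PATH...
# (hypothetical helper) Write a pattern file that first includes
# everything at the top level and then excludes each PATH given.
write_sparse_exclusions() (
	git_dir=$1
	shift
	mkdir -p "$git_dir/info" || exit 1
	{
		echo '/*'
		for p in "$@"; do
			printf '!%s\n' "$p"
		done
	} >"$git_dir/info/sparse-checkout"
)

# Typical use inside a work tree:
#   git config core.sparseCheckout true
#   write_sparse_exclusions .git unwanted
#   git read-tree -m -u HEAD
```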

- Eric

^ permalink raw reply

* [PATCH 4/4] Bulk check-in
From: Junio C Hamano @ 2011-10-28 23:54 UTC (permalink / raw)
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

This extends the earlier approach to stream a large file directly from the
filesystem to its own packfile, and allows "git add" to send large files
directly into a single pack. Older code used to spawn fast-import, but
the new bulk_checkin API replaces it.
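
To check the effect by hand, something like the following could be
used. count_packs is a hypothetical helper and not part of this series;
the threshold setting mirrors the one used in t1050:

```shell
# count_packs GIT-DIR
# (hypothetical helper) Print the number of packfiles under
# GIT-DIR/objects/pack.
count_packs() (
	n=0
	for p in "$1"/objects/pack/pack-*.pack; do
		# an unmatched glob is left as the literal pattern, so test it
		if [ -f "$p" ]; then
			n=$((n + 1))
		fi
	done
	echo "$n"
)

# Possible use in a scratch repository:
#   git config core.bigFileThreshold 200k
#   git add large huge
#   count_packs .git    # the updated test expects a single pack here
```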

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Makefile         |    2 +
 builtin/add.c    |    5 ++
 bulk-checkin.c   |  159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h   |   16 ++++++
 sha1_file.c      |   67 ++---------------------
 t/t1050-large.sh |   26 +++++++--
 6 files changed, 206 insertions(+), 69 deletions(-)
 create mode 100644 bulk-checkin.c
 create mode 100644 bulk-checkin.h

diff --git a/Makefile b/Makefile
index 3139c19..418dd2e 100644
--- a/Makefile
+++ b/Makefile
@@ -505,6 +505,7 @@ LIB_H += argv-array.h
 LIB_H += attr.h
 LIB_H += blob.h
 LIB_H += builtin.h
+LIB_H += bulk-checkin.h
 LIB_H += cache.h
 LIB_H += cache-tree.h
 LIB_H += color.h
@@ -591,6 +592,7 @@ LIB_OBJS += base85.o
 LIB_OBJS += bisect.o
 LIB_OBJS += blob.o
 LIB_OBJS += branch.o
+LIB_OBJS += bulk-checkin.o
 LIB_OBJS += bundle.o
 LIB_OBJS += cache-tree.o
 LIB_OBJS += color.o
diff --git a/builtin/add.c b/builtin/add.c
index c59b0c9..1c42900 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -13,6 +13,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "revision.h"
+#include "bulk-checkin.h"
 
 static const char * const builtin_add_usage[] = {
 	"git add [options] [--] <filepattern>...",
@@ -458,11 +459,15 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		free(seen);
 	}
 
+	plug_bulk_checkin();
+
 	exit_status |= add_files_to_cache(prefix, pathspec, flags);
 
 	if (add_new_files)
 		exit_status |= add_files(&dir, flags);
 
+	unplug_bulk_checkin();
+
  finish:
 	if (active_cache_changed) {
 		if (write_cache(newfd, active_cache, active_nr) ||
diff --git a/bulk-checkin.c b/bulk-checkin.c
new file mode 100644
index 0000000..cad7a0b
--- /dev/null
+++ b/bulk-checkin.c
@@ -0,0 +1,159 @@
+/*
+ * Copyright (c) 2011, Google Inc.
+ */
+#include "bulk-checkin.h"
+#include "csum-file.h"
+#include "pack.h"
+
+static int pack_compression_level = Z_DEFAULT_COMPRESSION;
+
+static struct bulk_checkin_state {
+	unsigned plugged:1;
+
+	char *pack_tmp_name;
+	struct sha1file *f;
+	off_t offset;
+	struct pack_idx_option pack_idx_opts;
+
+	struct pack_idx_entry **written;
+	uint32_t alloc_written;
+	uint32_t nr_written;
+} state;
+
+static void finish_bulk_checkin(struct bulk_checkin_state *state)
+{
+	unsigned char sha1[20];
+	char packname[PATH_MAX];
+	int i;
+
+	if (!state->f)
+		return;
+
+	if (state->nr_written == 1) {
+		sha1close(state->f, sha1, CSUM_FSYNC);
+	} else {
+		int fd = sha1close(state->f, sha1, 0);
+		fixup_pack_header_footer(fd, sha1, state->pack_tmp_name,
+					 state->nr_written, sha1,
+					 state->offset);
+		close(fd);
+	}
+
+	sprintf(packname, "%s/pack/pack-", get_object_directory());
+	finish_tmp_packfile(packname, state->pack_tmp_name,
+			    state->written, state->nr_written,
+			    &state->pack_idx_opts, sha1);
+	for (i = 0; i < state->nr_written; i++)
+		free(state->written[i]);
+	free(state->written);
+	memset(state, 0, sizeof(*state));
+
+	/* Make objects we just wrote available to ourselves */
+	reprepare_packed_git();
+}
+
+static void deflate_to_pack(struct bulk_checkin_state *state,
+			    unsigned char sha1[],
+			    int fd, size_t size, enum object_type type,
+			    const char *path, unsigned flags)
+{
+	unsigned char obuf[16384];
+	unsigned hdrlen;
+	git_zstream s;
+	git_SHA_CTX ctx;
+	int write_object = (flags & HASH_WRITE_OBJECT);
+	int status = Z_OK;
+	struct pack_idx_entry *idx = NULL;
+
+	hdrlen = sprintf((char *)obuf, "%s %" PRIuMAX, typename(type), (uintmax_t)size) + 1;
+	git_SHA1_Init(&ctx);
+	git_SHA1_Update(&ctx, obuf, hdrlen);
+
+	if (write_object) {
+		idx = xcalloc(1, sizeof(*idx));
+		idx->offset = state->offset;
+		crc32_begin(state->f);
+	}
+	memset(&s, 0, sizeof(s));
+	git_deflate_init(&s, pack_compression_level);
+
+	hdrlen = encode_in_pack_object_header(type, size, obuf);
+	s.next_out = obuf + hdrlen;
+	s.avail_out = sizeof(obuf) - hdrlen;
+
+	while (status != Z_STREAM_END) {
+		unsigned char ibuf[16384];
+
+		if (size && !s.avail_in) {
+			ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf);
+			if (xread(fd, ibuf, rsize) != rsize)
+				die("failed to read %d bytes from '%s'",
+				    (int)rsize, path);
+			git_SHA1_Update(&ctx, ibuf, rsize);
+			s.next_in = ibuf;
+			s.avail_in = rsize;
+			size -= rsize;
+		}
+
+		status = git_deflate(&s, size ? 0 : Z_FINISH);
+
+		if (!s.avail_out || status == Z_STREAM_END) {
+			size_t written = s.next_out - obuf;
+			if (write_object) {
+				sha1write(state->f, obuf, written);
+				state->offset += written;
+			}
+			s.next_out = obuf;
+			s.avail_out = sizeof(obuf);
+		}
+
+		switch (status) {
+		case Z_OK:
+		case Z_BUF_ERROR:
+		case Z_STREAM_END:
+			continue;
+		default:
+			die("unexpected deflate failure: %d", status);
+		}
+	}
+	git_deflate_end(&s);
+	git_SHA1_Final(sha1, &ctx);
+	if (write_object) {
+		idx->crc32 = crc32_end(state->f);
+		hashcpy(idx->sha1, sha1);
+		ALLOC_GROW(state->written,
+			   state->nr_written + 1, state->alloc_written);
+		state->written[state->nr_written++] = idx;
+	}
+}
+
+int index_bulk_checkin(unsigned char *sha1,
+		       int fd, size_t size, enum object_type type,
+		       const char *path, unsigned flags)
+{
+	if (!state.f && (flags & HASH_WRITE_OBJECT)) {
+		state.f = create_tmp_packfile(&state.pack_tmp_name);
+		reset_pack_idx_option(&state.pack_idx_opts);
+		/* Pretend we are going to write only one object */
+		state.offset = write_pack_header(state.f, 1);
+		if (!state.offset)
+			die_errno("unable to write pack header");
+	}
+
+	deflate_to_pack(&state, sha1, fd, size, type, path, flags);
+	if (!state.plugged)
+		finish_bulk_checkin(&state);
+	return 0;
+}
+
+void plug_bulk_checkin(void)
+{
+	state.plugged = 1;
+}
+
+void unplug_bulk_checkin(void)
+{
+	state.plugged = 0;
+	if (state.f)
+		finish_bulk_checkin(&state);
+}
diff --git a/bulk-checkin.h b/bulk-checkin.h
new file mode 100644
index 0000000..4f599f8
--- /dev/null
+++ b/bulk-checkin.h
@@ -0,0 +1,16 @@
+/*
+ * Copyright (c) 2011, Google Inc.
+ */
+#ifndef BULK_CHECKIN_H
+#define BULK_CHECKIN_H
+
+#include "cache.h"
+
+extern int index_bulk_checkin(unsigned char sha1[],
+			      int fd, size_t size, enum object_type type,
+			      const char *path, unsigned flags);
+
+extern void plug_bulk_checkin(void);
+extern void unplug_bulk_checkin(void);
+
+#endif
diff --git a/sha1_file.c b/sha1_file.c
index 27f3b9b..c96e366 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -18,6 +18,7 @@
 #include "refs.h"
 #include "pack-revindex.h"
 #include "sha1-lookup.h"
+#include "bulk-checkin.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -2679,10 +2680,8 @@ static int index_core(unsigned char *sha1, int fd, size_t size,
 }
 
 /*
- * This creates one packfile per large blob, because the caller
- * immediately wants the result sha1, and fast-import can report the
- * object name via marks mechanism only by closing the created
- * packfile.
+ * This creates one packfile per large blob unless bulk-checkin
+ * machinery is "plugged".
  *
  * This also bypasses the usual "convert-to-git" dance, and that is on
  * purpose. We could write a streaming version of the converting
@@ -2696,65 +2695,7 @@ static int index_stream(unsigned char *sha1, int fd, size_t size,
 			enum object_type type, const char *path,
 			unsigned flags)
 {
-	struct child_process fast_import;
-	char export_marks[512];
-	const char *argv[] = { "fast-import", "--quiet", export_marks, NULL };
-	char tmpfile[512];
-	char fast_import_cmd[512];
-	char buf[512];
-	int len, tmpfd;
-
-	strcpy(tmpfile, git_path("hashstream_XXXXXX"));
-	tmpfd = git_mkstemp_mode(tmpfile, 0600);
-	if (tmpfd < 0)
-		die_errno("cannot create tempfile: %s", tmpfile);
-	if (close(tmpfd))
-		die_errno("cannot close tempfile: %s", tmpfile);
-	sprintf(export_marks, "--export-marks=%s", tmpfile);
-
-	memset(&fast_import, 0, sizeof(fast_import));
-	fast_import.in = -1;
-	fast_import.argv = argv;
-	fast_import.git_cmd = 1;
-	if (start_command(&fast_import))
-		die_errno("index-stream: git fast-import failed");
-
-	len = sprintf(fast_import_cmd, "blob\nmark :1\ndata %lu\n",
-		      (unsigned long) size);
-	write_or_whine(fast_import.in, fast_import_cmd, len,
-		       "index-stream: feeding fast-import");
-	while (size) {
-		char buf[10240];
-		size_t sz = size < sizeof(buf) ? size : sizeof(buf);
-		ssize_t actual;
-
-		actual = read_in_full(fd, buf, sz);
-		if (actual < 0)
-			die_errno("index-stream: reading input");
-		if (write_in_full(fast_import.in, buf, actual) != actual)
-			die_errno("index-stream: feeding fast-import");
-		size -= actual;
-	}
-	if (close(fast_import.in))
-		die_errno("index-stream: closing fast-import");
-	if (finish_command(&fast_import))
-		die_errno("index-stream: finishing fast-import");
-
-	tmpfd = open(tmpfile, O_RDONLY);
-	if (tmpfd < 0)
-		die_errno("index-stream: cannot open fast-import mark");
-	len = read(tmpfd, buf, sizeof(buf));
-	if (len < 0)
-		die_errno("index-stream: reading fast-import mark");
-	if (close(tmpfd) < 0)
-		die_errno("index-stream: closing fast-import mark");
-	if (unlink(tmpfile))
-		die_errno("index-stream: unlinking fast-import mark");
-	if (len != 44 ||
-	    memcmp(":1 ", buf, 3) ||
-	    get_sha1_hex(buf + 3, sha1))
-		die_errno("index-stream: unexpected fast-import mark: <%s>", buf);
-	return 0;
+	return index_bulk_checkin(sha1, fd, size, type, path, flags);
 }
 
 int index_fd(unsigned char *sha1, int fd, struct stat *st,
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index deba111..36def25 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -7,14 +7,28 @@ test_description='adding and checking out large blobs'
 
 test_expect_success setup '
 	git config core.bigfilethreshold 200k &&
-	echo X | dd of=large bs=1k seek=2000
+	echo X | dd of=large bs=1k seek=2000 &&
+	echo Y | dd of=huge bs=1k seek=2500
 '
 
-test_expect_success 'add a large file' '
-	git add large &&
-	# make sure we got a packfile and no loose objects
-	test -f .git/objects/pack/pack-*.pack &&
-	test ! -f .git/objects/??/??????????????????????????????????????
+test_expect_success 'add a large file or two' '
+	git add large huge &&
+	# make sure we got a single packfile and no loose objects
+	bad= count=0 &&
+	for p in .git/objects/pack/pack-*.pack
+	do
+		count=$(( $count + 1 ))
+		test -f "$p" && continue
+		bad=t
+	done &&
+	test -z "$bad" &&
+	test $count = 1 &&
+	for l in .git/objects/??/??????????????????????????????????????
+	do
+		test -f "$l" || continue
+		bad=t
+	done &&
+	test -z "$bad"
 '
 
 test_expect_success 'checkout a large file' '
-- 
1.7.7.1.573.ga40d2

^ permalink raw reply related

* [PATCH 3/4] finish_tmp_packfile(): a helper function
From: Junio C Hamano @ 2011-10-28 23:54 UTC (permalink / raw)
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

Factor a small piece of logic out of the private write_pack_file()
function in builtin/pack-objects.c.

This changes the order of finishing multi-pack generation slightly. The
code used to

 - adjust shared perm of temporary packfile
 - rename temporary packfile to the final name
 - update mtime of the packfile under the final name
 - adjust shared perm of temporary idxfile
 - rename temporary idxfile to the final name

but because the helper does not want to do the mtime thing, the updated
code does that step first and then all the rest.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |   33 ++++++++++-----------------------
 pack-write.c           |   31 +++++++++++++++++++++++++++++++
 pack.h                 |    1 +
 3 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3258fa9..b458b6d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -617,20 +617,8 @@ static void write_pack_file(void)
 
 		if (!pack_to_stdout) {
 			struct stat st;
-			const char *idx_tmp_name;
 			char tmpname[PATH_MAX];
 
-			idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
-						      &pack_idx_opts, sha1);
-
-			snprintf(tmpname, sizeof(tmpname), "%s-%s.pack",
-				 base_name, sha1_to_hex(sha1));
-			free_pack_by_name(tmpname);
-			if (adjust_shared_perm(pack_tmp_name))
-				die_errno("unable to make temporary pack file readable");
-			if (rename(pack_tmp_name, tmpname))
-				die_errno("unable to rename temporary pack file");
-
 			/*
 			 * Packs are runtime accessed in their mtime
 			 * order since newer packs are more likely to contain
@@ -638,28 +626,27 @@ static void write_pack_file(void)
 			 * packs then we should modify the mtime of later ones
 			 * to preserve this property.
 			 */
-			if (stat(tmpname, &st) < 0) {
+			if (stat(pack_tmp_name, &st) < 0) {
 				warning("failed to stat %s: %s",
-					tmpname, strerror(errno));
+					pack_tmp_name, strerror(errno));
 			} else if (!last_mtime) {
 				last_mtime = st.st_mtime;
 			} else {
 				struct utimbuf utb;
 				utb.actime = st.st_atime;
 				utb.modtime = --last_mtime;
-				if (utime(tmpname, &utb) < 0)
+				if (utime(pack_tmp_name, &utb) < 0)
 					warning("failed utime() on %s: %s",
 						tmpname, strerror(errno));
 			}
 
-			snprintf(tmpname, sizeof(tmpname), "%s-%s.idx",
-				 base_name, sha1_to_hex(sha1));
-			if (adjust_shared_perm(idx_tmp_name))
-				die_errno("unable to make temporary index file readable");
-			if (rename(idx_tmp_name, tmpname))
-				die_errno("unable to rename temporary index file");
-
-			free((void *) idx_tmp_name);
+			/* Enough space for "-<sha-1>.pack"? */
+			if (sizeof(tmpname) <= strlen(base_name) + 50)
+				die("pack base name '%s' too long", base_name);
+			snprintf(tmpname, sizeof(tmpname), "%s-", base_name);
+			finish_tmp_packfile(tmpname, pack_tmp_name,
+					    written_list, nr_written,
+					    &pack_idx_opts, sha1);
 			free(pack_tmp_name);
 			puts(sha1_to_hex(sha1));
 		}
diff --git a/pack-write.c b/pack-write.c
index 863cce8..cadc3e1 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -338,3 +338,34 @@ struct sha1file *create_tmp_packfile(char **pack_tmp_name)
 	*pack_tmp_name = xstrdup(tmpname);
 	return sha1fd(fd, *pack_tmp_name);
 }
+
+void finish_tmp_packfile(char *name_buffer,
+			 const char *pack_tmp_name,
+			 struct pack_idx_entry **written_list,
+			 uint32_t nr_written,
+			 struct pack_idx_option *pack_idx_opts,
+			 unsigned char sha1[])
+{
+	const char *idx_tmp_name;
+	char *end_of_name_prefix = strrchr(name_buffer, 0);
+
+	if (adjust_shared_perm(pack_tmp_name))
+		die_errno("unable to make temporary pack file readable");
+
+	idx_tmp_name = write_idx_file(NULL, written_list, nr_written,
+				      pack_idx_opts, sha1);
+	if (adjust_shared_perm(idx_tmp_name))
+		die_errno("unable to make temporary index file readable");
+
+	sprintf(end_of_name_prefix, "%s.pack", sha1_to_hex(sha1));
+	free_pack_by_name(name_buffer);
+
+	if (rename(pack_tmp_name, name_buffer))
+		die_errno("unable to rename temporary pack file");
+
+	sprintf(end_of_name_prefix, "%s.idx", sha1_to_hex(sha1));
+	if (rename(idx_tmp_name, name_buffer))
+		die_errno("unable to rename temporary index file");
+
+	free((void *)idx_tmp_name);
+}
diff --git a/pack.h b/pack.h
index 0027ac6..cfb0f69 100644
--- a/pack.h
+++ b/pack.h
@@ -86,5 +86,6 @@ extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned ch
 extern int read_pack_header(int fd, struct pack_header *);
 
 extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
+extern void finish_tmp_packfile(char *name_buffer, const char *pack_tmp_name, struct pack_idx_entry **written_list, uint32_t nr_written, struct pack_idx_option *pack_idx_opts, unsigned char sha1[]);
 
 #endif
-- 
1.7.7.1.573.ga40d2

^ permalink raw reply related

* [PATCH 2/4] create_tmp_packfile(): a helper function
From: Junio C Hamano @ 2011-10-28 23:54 UTC (permalink / raw)
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

Factor a small piece of logic out of the private write_pack_file()
function in builtin/pack-objects.c.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |   12 +++---------
 pack-write.c           |   10 ++++++++++
 pack.h                 |    3 +++
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6643c16..3258fa9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -584,16 +584,10 @@ static void write_pack_file(void)
 		unsigned char sha1[20];
 		char *pack_tmp_name = NULL;
 
-		if (pack_to_stdout) {
+		if (pack_to_stdout)
 			f = sha1fd_throughput(1, "<stdout>", progress_state);
-		} else {
-			char tmpname[PATH_MAX];
-			int fd;
-			fd = odb_mkstemp(tmpname, sizeof(tmpname),
-					 "pack/tmp_pack_XXXXXX");
-			pack_tmp_name = xstrdup(tmpname);
-			f = sha1fd(fd, pack_tmp_name);
-		}
+		else
+			f = create_tmp_packfile(&pack_tmp_name);
 
 		offset = write_pack_header(f, nr_remaining);
 		if (!offset)
diff --git a/pack-write.c b/pack-write.c
index 46f3f84..863cce8 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -328,3 +328,13 @@ int encode_in_pack_object_header(enum object_type type, uintmax_t size, unsigned
 	*hdr = c;
 	return n;
 }
+
+struct sha1file *create_tmp_packfile(char **pack_tmp_name)
+{
+	char tmpname[PATH_MAX];
+	int fd;
+
+	fd = odb_mkstemp(tmpname, sizeof(tmpname), "pack/tmp_pack_XXXXXX");
+	*pack_tmp_name = xstrdup(tmpname);
+	return sha1fd(fd, *pack_tmp_name);
+}
diff --git a/pack.h b/pack.h
index d429d8a..0027ac6 100644
--- a/pack.h
+++ b/pack.h
@@ -84,4 +84,7 @@ extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned ch
 #define PH_ERROR_PACK_SIGNATURE	(-2)
 #define PH_ERROR_PROTOCOL	(-3)
 extern int read_pack_header(int fd, struct pack_header *);
+
+extern struct sha1file *create_tmp_packfile(char **pack_tmp_name);
+
 #endif
-- 
1.7.7.1.573.ga40d2

^ permalink raw reply related

* [PATCH 1/4] write_pack_header(): a helper function
From: Junio C Hamano @ 2011-10-28 23:54 UTC (permalink / raw)
  To: git
In-Reply-To: <1319846051-462-1-git-send-email-gitster@pobox.com>

Factor a small piece of logic out of the private write_pack_file()
function in builtin/pack-objects.c.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-objects.c |    9 +++------
 pack-write.c           |   12 ++++++++++++
 pack.h                 |    2 ++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ba3705d..6643c16 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -571,7 +571,6 @@ static void write_pack_file(void)
 	uint32_t i = 0, j;
 	struct sha1file *f;
 	off_t offset;
-	struct pack_header hdr;
 	uint32_t nr_remaining = nr_result;
 	time_t last_mtime = 0;
 	struct object_entry **write_order;
@@ -596,11 +595,9 @@ static void write_pack_file(void)
 			f = sha1fd(fd, pack_tmp_name);
 		}
 
-		hdr.hdr_signature = htonl(PACK_SIGNATURE);
-		hdr.hdr_version = htonl(PACK_VERSION);
-		hdr.hdr_entries = htonl(nr_remaining);
-		sha1write(f, &hdr, sizeof(hdr));
-		offset = sizeof(hdr);
+		offset = write_pack_header(f, nr_remaining);
+		if (!offset)
+			die_errno("unable to write pack header");
 		nr_written = 0;
 		for (; i < nr_objects; i++) {
 			struct object_entry *e = write_order[i];
diff --git a/pack-write.c b/pack-write.c
index 9cd3bfb..46f3f84 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -178,6 +178,18 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	return index_name;
 }
 
+off_t write_pack_header(struct sha1file *f, uint32_t nr_entries)
+{
+	struct pack_header hdr;
+
+	hdr.hdr_signature = htonl(PACK_SIGNATURE);
+	hdr.hdr_version = htonl(PACK_VERSION);
+	hdr.hdr_entries = htonl(nr_entries);
+	if (sha1write(f, &hdr, sizeof(hdr)))
+		return 0;
+	return sizeof(hdr);
+}
+
 /*
  * Update pack header with object_count and compute new SHA1 for pack data
  * associated to pack_fd, and write that SHA1 at the end.  That new SHA1
diff --git a/pack.h b/pack.h
index 722a54e..d429d8a 100644
--- a/pack.h
+++ b/pack.h
@@ -2,6 +2,7 @@
 #define PACK_H
 
 #include "object.h"
+#include "csum-file.h"
 
 /*
  * Packed object header
@@ -74,6 +75,7 @@ extern const char *write_idx_file(const char *index_name, struct pack_idx_entry
 extern int check_pack_crc(struct packed_git *p, struct pack_window **w_curs, off_t offset, off_t len, unsigned int nr);
 extern int verify_pack_index(struct packed_git *);
 extern int verify_pack(struct packed_git *);
+extern off_t write_pack_header(struct sha1file *f, uint32_t);
 extern void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
 extern char *index_pack_lockfile(int fd);
 extern int encode_in_pack_object_header(enum object_type, uintmax_t, unsigned char *);
-- 
1.7.7.1.573.ga40d2

^ permalink raw reply related

* [PATCH 0/4] Bulk check-in
From: Junio C Hamano @ 2011-10-28 23:54 UTC (permalink / raw)
  To: git

This miniseries is a continuation of the "large file" topic from the
1.7.6 development cycle.

The first three are moving existing code around for better reuse.  The
last one serves two purposes: to lift the one-pack-per-one-large-blob
constraint by introducing the concept of "plugging/unplugging" (i.e. you
plug the drain and throw many large blobs at index_fd(), and they appear in
a single pack when you unplug it), and to stop using fast-import in this
codepath.
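
To make the "plugging/unplugging" idea concrete, here is a toy model of
it. It is a sketch only, not the bulk-checkin.c in this series: the
names bulk_state, plug(), unplug() and deposit_blob() are made up for
illustration, and "writing a pack" is reduced to bumping a counter.
While plugged, deposited blobs accumulate in one open pack; unplugging
finishes that pack. Without a plug, each blob gets its own pack, which
is the constraint being lifted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for the sketch. */
struct bulk_state {
	int plugged;          /* nonzero while the drain is plugged */
	size_t pending;       /* blobs sitting in the currently open pack */
	size_t packs_written; /* packs finished so far */
};

/* Close the open pack, if it holds anything. */
static void finish_pack(struct bulk_state *s)
{
	if (!s->pending)
		return;
	s->packs_written++;
	s->pending = 0;
}

void plug(struct bulk_state *s)
{
	s->plugged = 1;
}

void unplug(struct bulk_state *s)
{
	assert(s->plugged); /* unplug without plug is a caller bug */
	s->plugged = 0;
	finish_pack(s);
}

void deposit_blob(struct bulk_state *s)
{
	s->pending++;
	if (!s->plugged)
		finish_pack(s); /* one pack per blob when not plugged */
}
```

A caller that adds many paths would plug once before the loop and
unplug once after it, so all large blobs from that run share a pack.
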

Only very lightly tested.

Junio C Hamano (4):
  write_pack_header(): a helper function
  create_tmp_packfile(): a helper function
  finish_tmp_packfile(): a helper function
  Bulk check-in

 Makefile               |    2 +
 builtin/add.c          |    5 ++
 builtin/pack-objects.c |   56 +++++------------
 bulk-checkin.c         |  159 ++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h         |   16 +++++
 pack-write.c           |   53 ++++++++++++++++
 pack.h                 |    6 ++
 sha1_file.c            |   67 +-------------------
 t/t1050-large.sh       |   26 ++++++--
 9 files changed, 282 insertions(+), 108 deletions(-)
 create mode 100644 bulk-checkin.c
 create mode 100644 bulk-checkin.h

-- 
1.7.7.1.573.ga40d2

^ permalink raw reply

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Nicolas Pitre @ 2011-10-28 23:30 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <CAJo=hJsEzkFV9k8N+GAwWddmEZH8pQeJZrg_MXD72stbAW0ceQ@mail.gmail.com>

On Fri, 28 Oct 2011, Shawn Pearce wrote:

> On Fri, Oct 28, 2011 at 15:48, Nicolas Pitre <nico@fluxnic.net> wrote:
> > On Fri, 28 Oct 2011, Shawn Pearce wrote:
> >> - The immediate next byte encodes the extended type. This type is
> >> stored using the OFS_DELTA offset varint encoding, and thus may be
> >> larger than 256 if we ever need it to be.
> >
> > I'd say it is just a byte.  No encoding needed.  Let's not be silly
> > about it.  If we really have more than 255 object types one day (and I
> > really hope this will never happen) then the value 0 in that byte could
> > indicate yet another extended object type encoding.  But I truly hope
> > we'll have pack v9 or v10 by then and that we'll have obsoleted the
> > current 3-bit encoding completely at that point anyway.
> 
> Yes. I probably wouldn't code the parser to use a varint here. I would
> say the extended types stored in this byte must be >= 8, and must be
> <= 127. Any values out of this range are unsupported and should be
> rejected. We can later reserve the right to set the high bit and
> switch to the OFS_DELTA varint encoding if we need that many more
> types, and we explicitly define codes 0-7 as illegal if detected here
> in the extended byte field.

I wouldn't go as far as rejecting codes 1-7 as illegal though, but I 
otherwise agree with what you say.


Nicolas

^ permalink raw reply

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Shawn Pearce @ 2011-10-28 23:07 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <alpine.LFD.2.02.1110290031540.30467@xanadu.home>

On Fri, Oct 28, 2011 at 15:48, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Fri, 28 Oct 2011, Shawn Pearce wrote:
>> - The immediate next byte encodes the extended type. This type is
>> stored using the OFS_DELTA offset varint encoding, and thus may be
>> larger than 256 if we ever need it to be.
>
> I'd say it is just a byte.  No encoding needed.  Let's not be silly
> about it.  If we really have more than 255 object types one day (and I
> really hope this will never happen) then the value 0 in that byte could
> indicate yet another extended object type encoding.  But I truly hope
> we'll have pack v9 or v10 by then and that we'll have obsoleted the
> current 3-bit encoding completely at that point anyway.

Yes. I probably wouldn't code the parser to use a varint here. I would
say the extended types stored in this byte must be >= 8, and must be
<= 127. Any values out of this range are unsupported and should be
rejected. We can later reserve the right to set the high bit and
switch to the OFS_DELTA varint encoding if we need that many more
types, and we explicitly define codes 0-7 as illegal if detected here
in the extended byte field.
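
A rough sketch of a parser for the layout proposed in this message,
for illustration only. The first part follows the existing in-pack
object header (bit 7 is the continuation flag, bits 4-6 the type, bits
0-3 the low size bits, then 7 more size bits per following byte). The
extended part is just the proposal above: type 0 means one more byte
follows, and that byte must be in the range 8..127. The struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct obj_header {
	unsigned type;  /* 1..7 for 3-bit types, 8..127 for extended ones */
	uint64_t size;  /* decoded object size */
	size_t used;    /* number of header bytes consumed */
};

/* Returns 0 on success, -1 on truncated or unsupported input. */
int parse_obj_header(const unsigned char *buf, size_t len,
		     struct obj_header *out)
{
	size_t pos = 0;
	unsigned shift = 4;
	unsigned char c;

	assert(buf && out);
	if (len < 1)
		return -1;

	c = buf[pos++];
	out->type = (c >> 4) & 7;
	out->size = c & 15;

	/* Remaining size bytes, 7 bits each, least significant first. */
	while (c & 0x80) {
		if (pos >= len || shift > 57)
			return -1;
		c = buf[pos++];
		out->size |= (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	}

	/* Proposed escape: type 0 means "read the extended type byte". */
	if (out->type == 0) {
		if (pos >= len)
			return -1;
		c = buf[pos++];
		if (c < 8 || c > 127)
			return -1; /* 0-7 and high-bit values rejected */
		out->type = c;
	}

	out->used = pos;
	return 0;
}
```

Rejecting values with the high bit set keeps that bit free, so a later
format revision could still switch the byte to a varint.
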

^ permalink raw reply

* Re: [RFC/PATCH] define the way new representation types are encoded in the pack
From: Nicolas Pitre @ 2011-10-28 22:48 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <CAJo=hJt-YZcdxw+D=1S4haPmY-8-LLjXD=MvDGeWbdJ88_VOGw@mail.gmail.com>

On Fri, 28 Oct 2011, Shawn Pearce wrote:

> On Thu, Oct 27, 2011 at 23:04, Junio C Hamano <gitster@pobox.com> wrote:
> > In addition to four basic types (commit, tree, blob and tag), the pack
> > stream can encode a few other "representation" types, such as REF_DELTA
> > and OFS_DELTA. As we allocate 3 bits in the first byte for this purpose,
> > we do not have much room to add new representation types in place, but we
> > do have one value reserved for future expansion.
> 
> We have 2 values reserved, 0 and 5.
> 
> > When bit 4-6 encodes type 5, the first byte is used this way:
> >
> >  - Bit 0-3 denotes the real "extended" representation type. Because types
> >   0-7 can already be encoded without using the extended format, we can
> >   offset the type by 8 (i.e. if bit 0-3 says 3, it means representation
> >   type 11 = 3 + 8);
> >
> >  - Bit 4-6 has the value "5";
> >
> >  - Bit 7 is used to signal if the _third_ byte needs to be read for larger
> >   size that cannot be represented with 8-bit.
> 
> This is very complicated. We don't need more complex logic in the pack
> encoding. We _especially_ do not need yet another variant of how to
> store a variable length integer in the pack file. I'm sorry, but we
> already have two different variants and this just adds a third. It is
> beyond crazy.
> 
> Last time (this is now years ago but whatever) Nico and I discussed
> adding a new type to packs it was for the alternate tree encoding in
> "pack v4". Trees happen so often that type code 5 is a good value to
> use for these. Later you talked about using the extended type to store
> a cattree blob thing, which would not appear nearly as often as a
> normal directory listing type tree that was encoded using the pack v4
> style encoding... I think saving type 5 for a small frequently
> occurring type is a good thing.
> 
> > As it is unlikely for us to pack things that do not need to record any
> > size, the second byte is always used in full to encode the low 8-bit of
> > the size.
> >
> > I haven't started using type=8 and upwards for anything yet, but because
> > we have only one "future expansion" value left, I want us to be extremely
> > careful in order to avoid painting us into a corner that we cannot get out
> > of, so I am sending this out early for a preliminary review.
> 
> I would have said something more like:
> 
> When bit 4-6 encodes "0", then:
> 
> - Bit 0-3 and bit 7 are used normally to encode a variable length
> "size" integer. These may be 0 indicating no size information.
> 
> - 2nd-nth bytes store remaining "size" information, if bit 7 was set.
> 
> - The immediate next byte encodes the extended type. This type is
> stored using the OFS_DELTA offset varint encoding, and thus may be
> larger than 256 if we ever need it to be.

I'd say it is just a byte.  No encoding needed.  Let's not be silly 
about it.  If we really have more than 255 object types one day (and I 
really hope this will never happen) then the value 0 in that byte could 
indicate yet another extended object type encoding.  But I truly hope 
we'll have pack v9 or v10 by then and that we'll have obsoleted the 
current 3-bit encoding completely at that point anyway.

For the record, I spent around 20 hours working on pack v4 while in the 
Caribbean for a week last winter as I said I would.  Maybe I'll repeat 
the operation this year.


Nicolas

^ permalink raw reply

* Re: imap-send badly handles commit bodies beginning with "From <"
From: Jeff King @ 2011-10-28 21:37 UTC (permalink / raw)
  To: Andrew Eikum; +Cc: git
In-Reply-To: <20111028212122.GB3966@foghorn.codeweavers.com>

On Fri, Oct 28, 2011 at 04:21:22PM -0500, Andrew Eikum wrote:

> Since we have a program called "mailsplit," wouldn't it make more
> sense to have imap-send use its implementation to split mail instead
> of sharing just the From line detection?

Potentially, yeah. I was thinking of just pulling over the from line
detection (which is the real black magic bit), but it looks like
imap-send's mbox handling could use some general attention (maybe it
would be possible to not read the entire mbox into memory, for example).
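
To illustrate the kind of check I mean, here is a minimal sketch. It
is deliberately simpler than mailsplit's is_from_line() and is not a
copy of it; the function names are made up. It only accepts a line as
an mbox separator if it starts with "From " and also contains
something shaped like an HH:MM:SS timestamp, which is already enough to
stop a commit body line such as "From <somebody> ..." from splitting
the message.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Does the string contain a run shaped like "dd:dd:dd"? */
static int has_clock_time(const char *s)
{
	for (; *s; s++) {
		/* Short-circuiting stops at the NUL, so no overread. */
		if (isdigit((unsigned char)s[0]) &&
		    isdigit((unsigned char)s[1]) &&
		    s[2] == ':' &&
		    isdigit((unsigned char)s[3]) &&
		    isdigit((unsigned char)s[4]) &&
		    s[5] == ':' &&
		    isdigit((unsigned char)s[6]) &&
		    isdigit((unsigned char)s[7]))
			return 1;
	}
	return 0;
}

/* Hypothetical helper: 1 if the line looks like an mbox separator. */
int looks_like_mbox_from(const char *line)
{
	assert(line != NULL);
	if (strncmp(line, "From ", 5))
		return 0;
	return has_clock_time(line + 5);
}
```

A real implementation would probably also want to sanity-check the
year, but even this much avoids the false positive from the report.
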

> I was hoping it'd be a quick matter of pulling mailsplit's
> implementation out of builtin and into the top level, but I see it's
> got some global variables that are tangled enough that I actually have
> to understand the code before I can pull it apart :)
>
> If no one beats me to it, I'll work on this next week. It's late on
> Friday and I'm moving house this weekend.

No rush. Let us know if you have questions.

> Quick question, since I'm not intimately familiar with Git's code: I
> was thinking of creating a new compilation unit at the top level,
> mailutils.{c,h}, and referencing it from both imap-send.c and
> builtin/mailsplit.c. Does that seem like the right approach? Is there
> an existing compilation unit I should be placing mailsplit's guts into
> instead?

Yes, I think a new file makes sense here. Make sure to update LIB_H and
LIB_OBJS in the Makefile.

-Peff

^ permalink raw reply

* Re: imap-send badly handles commit bodies beginning with "From <"
From: Andrew Eikum @ 2011-10-28 21:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Andrew Eikum, git
In-Reply-To: <20111028203256.GA15082@sigill.intra.peff.net>

On Fri, Oct 28, 2011 at 01:32:57PM -0700, Jeff King wrote:
> Mbox does have this problem, but I think in this case it is a
> particularly crappy implementation of mbox in imap-send. Look at
> imap-send.c:split_msg; it just looks for "From ".
> 
> It should at least check for something that looks like a timestamp, like
> git-mailsplit does. Maybe mailsplit's is_from_line should be factored
> out so that it can be reused in imap-send.

Since we have a program called "mailsplit," wouldn't it make more
sense to have imap-send use its implementation to split mail instead
of sharing just the From line detection?

> Want to work on a patch?

I was hoping it'd be a quick matter of pulling mailsplit's
implementation out of builtin and into the top level, but I see it's
got some global variables that are tangled enough that I actually have
to understand the code before I can pull it apart :)

If no one beats me to it, I'll work on this next week. It's late on
Friday and I'm moving house this weekend.

Quick question, since I'm not intimately familiar with Git's code: I
was thinking of creating a new compilation unit at the top level,
mailutils.{c,h}, and referencing it from both imap-send.c and
builtin/mailsplit.c. Does that seem like the right approach? Is there
an existing compilation unit I should be placing mailsplit's guts into
instead?

Andrew

^ permalink raw reply

* Re: [PATCH/WIP 01/11] Introduce "check-attr --excluded" as a replacement for "add --ignore-missing"
From: Nguyen Thai Ngoc Duy @ 2011-10-28 20:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vfwiez4s5.fsf@alter.siamese.dyndns.org>

2011/10/28 Junio C Hamano <gitster@pobox.com>:
> Perhaps ls-files is a more suitable home for the feature?

ls-files sounds good. It does all kinds of file selection already.
I'll see if I can add -I (aka "show ignored files only") to it.
-- 
Duy

^ permalink raw reply
