Git development

Git development
 help / color / mirror / Atom feed

* Re: git pull opinion
From: Linus Torvalds @ 2007-11-10  0:36 UTC (permalink / raw)
  To: Aghiles; +Cc: Bill Lear, Junio C Hamano, git
In-Reply-To: <3abd05a90711071325y397434efq7d4e50cb7a1cf07e@mail.gmail.com>

On Wed, 7 Nov 2007, Aghiles wrote:

> > [...]
> > Now, I do think that we could relax the rule so that "files that are
> > modified must be clean in the working tree" could instead become "files
> > that actually don't merge _trivially_ must be clean in the working tree".
> > But basically, if it's not a trivial merge, then since it's done in the
> > working tree, the working tree has to be clean (or the merge would
> > overwrite it).
> >[...]
> 
> I really think this is a good idea. It seems to me that the first "bad"
> surprise a svn/cvs/bk user will have is the result of a "git pull" command
> on a dirty tree. With the proposed change, and if I understand correctly:
>   - users that are used to commit often and fetch into clean trees
> will never be bothered by this change.
>   - users that are used to "update" often are expecting to resolve
> conflicts in their working copy anyway.
> 
> In both cases git does not get in your way and everyone is happy.

Well, there will still be cases where people won't be happy.

That said, all fast-forward cases (which is, I guess, a fairly common way 
of operating for anybody who has ever just uses anoncvs to track others) 
would be handled by the "three-way-merge dirty data for trivial merges". 

So even if it would only handle that special case (and it handles a *lot* 
of other cases too!) it probably would be useful to some people.

That said, I still don't think I have the energy to actually try to do it. 
I do suspect it's not that hard, and I outlined where it would go, but 
it's really quite core and important code... IOW, this needs *lots* of 
deep thought and care.

			Linus

^ permalink raw reply

* Re: git submodules and diffing
From: Sven Herzberg @ 2007-11-10  0:46 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <1194649057-19676-1-git-send-email-jnareb@gmail.com>

Hey Jakub,

Am 09.11.2007 um 23:57 schrieb Jakub Narebski:

> Sven Herzberg wrote:
>
>> When I started working with git submodules, I realized that git-diff
>> only lists the revision ids of a submodule if it has changed. I have
>> created a repository which includes a diff command for git-submodule,
>> so you can use it like "gut submodule diff <modules...>"
>>
>> I pushed my git tree at git://git.imendio.com/sven/git.git

>   http://git.imendio.com/?p=sven/git.git
>
>> Feel free to look into the changes and request improvements or merge
>> it into your tree.
>
> Although having "git submodule diff" is quite nice, I'd rather have
> "git diff --recurse-submodules" (or something like that) if I want to
> get diff of submodules.

I think it's pretty nice if git-submodule is the only command that  
knows about submodules.

> From browsing commitdiff
>   http://git.imendio.com/?p=sven/ 
> git.git;a=commitdiff;h=7fa1d4911d1ac2590ab1eccd84a7f235aca7878e
> I'd like to mention that instead of
>
>   (unset GIT_DIR && cd "$path" && git diff $flag "$sha1..HEAD")
>
> you can simply use
>
>   git --git-dir="$path" diff $flag "$sha1..HEAD"

It wasn't exactly that simple (I has to add `pwd` and "/.git") but  
thats for the hint. See the updated version in my repository.

>> PS: Please CC me, I'm not on this list
>
> You can always read list using NNTP / news / Usenet interface at
>   nntp://news.gmane.org/gmane.comp.version-control.git
> or one of the mailing list archives, see
>   http://git.or.cz/gitwiki/GitCommunity

I know, but being CCed is a simple task and makes it even easier for  
me to contribute in irregular intervals.

Thank you,
   Sven

^ permalink raw reply

* Re: [PATCH 2/2] --pretty=format: on-demand format expansion
From: Jeff King @ 2007-11-10  0:46 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Paul Mackerras, Git Mailing List, Pierre Habouzit,
	Johannes Schindelin
In-Reply-To: <4734EA4E.8070405@lsrfire.ath.cx>

On Sat, Nov 10, 2007 at 12:16:30AM +0100, René Scharfe wrote:

> > A partial patch on top of yours is below (it caches commit and tree 
> > abbreviations; parent abbreviations and person-parsing are probably 
> > worth doing). Some timings:
> 
> ... but I object to the choice of items to cache.  Are there real-world
> formats containing the same placeholder twice or even more often?

My choice of items was more "here is what I am talking about" and not
"this is the best set of items."

As for what real-world workloads are like, part of the _point_ of
--pretty=format: is for one-off formats that users use in their
workflow. So yes, I have used formats that repeat specifiers, but they
are probably not common.

The point of my timings was to show not only that we sped up that
uncommon case, but that there was negligible cost to the common case.
And since we don't know what formats users will provide, it makes sense
not to have lousy performance on the uncommon.

> There is probably more to gain from the interdependencies of different
> placeholders.  The patch below attempts to avoid parsing the commit
> twice, by saving pointers to the different parts.

Looks sane, although I don't see any reason this couldn't just go on top
of my patch, and then we can speed up both cases.

> (next)
> $ time git log --pretty=format:"* %cd %cn%n%n%s%n%b" >/dev/null
> real    0m0.631s
[...]
> (next+patch)
> $ time git log --pretty=format:"* %cd %cn%n%n%s%n%b" >/dev/null
> real    0m0.570s

Great, you have sped up the uncommon case. But what is the cost to "git
log --pretty=format:" and "git log --pretty=format:%cd"?

-Peff

^ permalink raw reply

* Re: [PATCH 2/2] --pretty=format: on-demand format expansion
From: Jeff King @ 2007-11-10  0:49 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: René Scharfe, Junio C Hamano, Paul Mackerras,
	Git Mailing List, Pierre Habouzit
In-Reply-To: <Pine.LNX.4.64.0711100030090.4362@racer.site>

On Sat, Nov 10, 2007 at 12:31:41AM +0000, Johannes Schindelin wrote:

> If you keep going like that, "git log" will be slower than "git log 
> --pretty=format:bla" soon.

Maybe I am dreaming, but the code might be much simpler and more
readable if --pretty=* were implemented in terms of --pretty=format,
assuming the speeds are equivalent.

-Peff

^ permalink raw reply

* Re: [PATCH 2/2] --pretty=format: on-demand format expansion
From: Jeff King @ 2007-11-10  0:51 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Paul Mackerras, Git Mailing List, Pierre Habouzit,
	Johannes Schindelin
In-Reply-To: <4734EB36.2030408@lsrfire.ath.cx>

On Sat, Nov 10, 2007 at 12:20:22AM +0100, René Scharfe wrote:

> > This void cast is pointless, since all pointers types convert implicitly
> > to void pointers anyway. At best, it does nothing, and at worst, it
> > hides an actual type error if the function signature or the type of
> > 'commit' change.
> 
> When commit (of type const struct commit*) is implicitly converted to
> void *, gcc complains because the "const" qualifier is silently dropped.

Bleh. You're right, of course. I don't like casting away constness, but
it is stupid for strbuf_expand to take a "const void *" since it has no
idea how it will be used (although I note your caching patch also neatly
does away with this, anyway).

-Peff

^ permalink raw reply

* Re: `git-send-email' doesn't specify `Content-Type'
From: Johannes Schindelin @ 2007-11-10  0:52 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: git
In-Reply-To: <87ode3klc7.fsf@chbouib.org>

Hi,

On Sat, 10 Nov 2007, Ludovic Court?s wrote:

> Apparently, `git-send-email' doesn't specify the email's `Content-Type',
> notably its charset, while it should really add something like:
> 
>   Content-Type: text/plain; charset=UTF-8
> 
> Or did I miss an option or something?

Apparently.  There was a thread some days ago, about that very issue.  
Please find and read it.

Ciao,
Dscho

^ permalink raw reply

* Re: Reducing the memory footprint
From: Brian Downing @ 2007-11-10  0:53 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Git Mailing List
In-Reply-To: <9e4733910711091538h260fbcd2s5783f01e7db4b19a@mail.gmail.com>

On Fri, Nov 09, 2007 at 06:38:00PM -0500, Jon Smirl wrote:
> I'm using this config file:
> 
> [pack]
>         windowMemory = 1M
>         deltaCacheSize = 1M
> 
> And I have NO_MMAP compiled in.
> 
> git is still using over 200MB of memory or address space, my process
> gets killed either way.

I'm assuming it's dying on repacking since you included the pack
parameters.

How big is your biggest object?  Even with pack.windowMemory, it still
keeps the last object around to try and delta against (in other words,
the window only shrinks to size 1), which means you have to have room
for it and its delta index.

-bcd

^ permalink raw reply

* Re: Reducing the memory footprint
From: Jon Smirl @ 2007-11-10  1:05 UTC (permalink / raw)
  To: Brian Downing; +Cc: Git Mailing List
In-Reply-To: <20071110005327.GH6212@lavos.net>

On 11/9/07, Brian Downing <bdowning@lavos.net> wrote:
> On Fri, Nov 09, 2007 at 06:38:00PM -0500, Jon Smirl wrote:
> > I'm using this config file:
> >
> > [pack]
> >         windowMemory = 1M
> >         deltaCacheSize = 1M
> >
> > And I have NO_MMAP compiled in.
> >
> > git is still using over 200MB of memory or address space, my process
> > gets killed either way.
>
> I'm assuming it's dying on repacking since you included the pack
> parameters.
>
> How big is your biggest object?  Even with pack.windowMemory, it still
> keeps the last object around to try and delta against (in other words,
> the window only shrinks to size 1), which means you have to have room
> for it and its delta index.

It's a Linux kernel repository. Git receive-pack is going over 200MB
and getting zapped.  I don't understand why the process is so large. I
am compiled with -DNO_MMAP.

I think I have a achieved a work around. I rsync'd in my last several
weeks of changes. Now I can 'git push' small amounts of changes
without getting killed.

I'm begging dreamhost to simply install git. Installed commands don't
get zapped.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* gitweb, updating 'last changed' column on the project page
From: Jon Smirl @ 2007-11-10  1:09 UTC (permalink / raw)
  To: Git Mailing List

At http://git.digispeaker.com/ the 'last change' column is not getting updated.

mpc5200b.git
	DigiSpeaker for Freescale MPC5200B.
	Jon Smirl
	5 weeks ago
	summary | shortlog | log | tree

It still says 5 weeks ago, but if I click on the project last change is today.

What controls this? I tried running update-server-info

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: *[PATCH] builtin-commit: Refresh cache after adding files.
From: Junio C Hamano @ 2007-11-10  1:40 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Kristian Høgsberg, git
In-Reply-To: <7vhcjvtgz5.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> This discussion exposes a problem with add_files_to_cache()
> function.
> ...
> And add_file_to_cache(), which calls add_file_to_index() on
> the_index, does call the fill_stat_cache_info() to sync the stat
> information by itself, as it should be.  I am still puzzled with
> this.

Blech.  I think it is 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe
(add_file_to_index: skip rehashing if the cached stat already
matches) that broke "git add".  I get the same problem not just
with "git add -u" but also with an explicit "git add Makefile".

In add_file_to_index(), we check if ie_modified() says if the
path is modified, but that is actually a very wrong check in
that function.  ie_modified() knows the racy git condition and
digs deeper to find if the file in the work tree is really
different.  We end up saying "ah, Ok, if that is the same, then
we do not add it to the index again", which is why we do not
update the stat info in this case.

Instead, we should ask ie_match_stat() which answers "does not
match" for a entry that _could_ be racily clean, so that we make
sure we re-add such an entry to clear the stat info.

This applies to maint and all the way up.

I also suspect that run_diff_files() that is called from
builtin-add.c:update() needs to tell ie_match_stat() to assume
that a racily clean path is unclean (that is what bit (1<<1) of
the third parameter to ie_match_stat() is about), so that "add
-u" will call the add_file_to_index() function, but that would
be a bigger patch.

---

 read-cache.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 928e8fa..75e2d46 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -420,7 +420,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	pos = index_name_pos(istate, ce->name, namelen);
 	if (0 <= pos &&
 	    !ce_stage(istate->cache[pos]) &&
-	    !ie_modified(istate, istate->cache[pos], &st, 1)) {
+	    !ie_match_stat(istate, istate->cache[pos], &st, 3)) {
 		/* Nothing changed, really */
 		free(ce);
 		return 0;

^ permalink raw reply related

* Re: gitweb, updating 'last changed' column on the project page
From: Jakub Narebski @ 2007-11-10  1:58 UTC (permalink / raw)
  To: git
In-Reply-To: <9e4733910711091709k173bf23flf2824673f82de9bb@mail.gmail.com>

Jon Smirl wrote:

> At http://git.digispeaker.com/ the 'last change' column is not getting updated.
> 
> mpc5200b.git
>       DigiSpeaker for Freescale MPC5200B.
>       Jon Smirl
>       5 weeks ago
>       summary | shortlog | log | tree
> 
> It still says 5 weeks ago, but if I click on the project last change is today.
> 
> What controls this? I tried running update-server-info

What does

  git for-each-ref --format="%(refname):%09%(committer)" --sort=-committerdate
      refs/heads

return? Does adding --count select proper branch, with proper update
date?

Which gitweb version is this?

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply

* Re: *[PATCH] builtin-commit: Refresh cache after adding files.
From: Johannes Schindelin @ 2007-11-10  2:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kristian Høgsberg, git
In-Reply-To: <7v640ari76.fsf@gitster.siamese.dyndns.org>

Hi,

On Fri, 9 Nov 2007, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > This discussion exposes a problem with add_files_to_cache()
> > function.
> > ...
> > And add_file_to_cache(), which calls add_file_to_index() on
> > the_index, does call the fill_stat_cache_info() to sync the stat
> > information by itself, as it should be.  I am still puzzled with
> > this.
> 
> Blech.  I think it is 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe 
> (add_file_to_index: skip rehashing if the cached stat already matches) 
> that broke "git add".  I get the same problem not just with "git add -u" 
> but also with an explicit "git add Makefile".

But I think that we still need to refresh the index.  It outputs which 
files names were new, and which ones were removed.  Otherwise git commit 
would be eerily silent on those.

Ciao,
Dscho

^ permalink raw reply

* [PATCH] git-add: make the entry stat-clean after re-adding the same contents
From: Junio C Hamano @ 2007-11-10  2:22 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Kristian Høgsberg, git
In-Reply-To: <7vhcjvtgz5.fsf@gitster.siamese.dyndns.org>

Earlier in commit 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe
(add_file_to_index: skip rehashing if the cached stat already
matches), add_file_to_index() were taught not to re-add the path
if it already matches the index.

The change meant well, but was not executed quite right.  It
used ie_modified() to see if the file on the work tree is really
different from the index, and skipped adding the contents if the
function says "not modified".

This was wrong.  There are three possible comparison results
between the index and the file in the work tree:

 - with lstat(2) we _know_ they are different.  E.g. if the
   length or the owner in the cached stat information is
   different from the length we just obtained from lstat(2), we
   can tell the file is modified without looking at the actual
   contents.

 - with lstat(2) we _know_ they are the same.  The same length,
   the same owner, the same everything (but this has a twist, as
   described below).

 - we cannot tell from lstat(2) information alone and need to go
   to the filesystem to actually compare.

The last case arises from what we call 'racy git' situation,
that can be caused with this sequence:

    $ echo hello >file
    $ git add file
    $ echo aeiou >file ;# the same length

If the second "echo" is done within the same filesystem
timestamp granularity as the first "echo", then the timestamp
recorded by "git add" and the timestamp we get from lstat(2)
will be the same, and we can mistakenly say the file is not
modified.  The path is called 'racily clean'.  We need to
reliably detect racily clean paths are in fact modified.

To solve this problem, when we write out the index, we mark the
index entry that has the same timestamp as the index file itself
(that is the time from the point of view of the filesystem) to
tell any later code that does the lstat(2) comparison not to
trust the cached stat info, and ie_modified() then actually goes
to the filesystem to compare the contents for such a path.

That's all good, but it should not be used for this "git add"
optimization, as the goal of "git add" is to actually update the
path in the index and make it stat-clean.  With the false
optimization, we did _not_ cause any data loss (after all, what
we failed to do was only to update the cached stat information),
but it made the following sequence leave the file stat dirty:

    $ echo hello >file
    $ git add file
    $ echo hello >file ;# the same contents
    $ git add file

The solution is not to use ie_modified() which goes to the
filesystem to see if it is really clean, but instead use
ie_match_stat() with "assume racily clean paths are dirty"
option, to force re-adding of such a path.

There was another problem with "git add -u".  The codepath
shares the same issue when adding the paths that are found to be
modified, but in addition, it asked "git diff-files" machinery
run_diff_files() function (which is "git diff-files") to list
the paths that are modified.  But "git diff-files" machinery
uses the same ie_modified() call so that it does not report
racily clean _and_ actually clean paths as modified, which is
not what we want.

The patch allows the callers of run_diff_files() to pass the
same "assume racily clean paths are dirty" option, and makes
"git-add -u" codepath to use that option, to discover and re-add
racily clean _and_ actually clean paths.

We could further optimize on top of this patch to differentiate
the case where the path really needs re-adding (i.e. the content
of the racily clean entry was indeed different) and the case
where only the cached stat information needs to be refreshed
(i.e. the racily clean entry was actually clean), but I do not
think it is worth it.

This patch applies to maint and all the way up.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Blech.  This took longer than I thought.  It's only cached
   stat info and does not cause real data breakage, but I think
   it is a good thing to fix nevertheless.

 builtin-add.c |    2 +-
 diff-lib.c    |    6 ++++--
 diff.h        |    2 +-
 read-cache.c  |    2 +-
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 373f87f..b6d6bbe 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -123,7 +123,7 @@ static void update(int verbose, const char *prefix, const char **files)
 	rev.diffopt.format_callback_data = &verbose;
 	if (read_cache() < 0)
 		die("index file corrupt");
-	run_diff_files(&rev, 0);
+	run_diff_files(&rev, 2);
 }

 static void refresh(int verbose, const char **pathspec)
diff --git a/diff-lib.c b/diff-lib.c
index da55713..b83f650 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -332,10 +332,12 @@ int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv)
 	return run_diff_files(revs, silent_on_removed);
 }

-int run_diff_files(struct rev_info *revs, int silent_on_removed)
+int run_diff_files(struct rev_info *revs, int option)
 {
 	int entries, i;
 	int diff_unmerged_stage = revs->max_count;
+	int silent_on_removed = option & 01;
+	int assume_racy_is_modified = option & 02;

 	if (diff_unmerged_stage < 0)
 		diff_unmerged_stage = 2;
@@ -441,7 +443,7 @@ int run_diff_files(struct rev_info *revs, int silent_on_removed)
 				       ce->sha1, ce->name, NULL);
 			continue;
 		}
-		changed = ce_match_stat(ce, &st, 0);
+		changed = ce_match_stat(ce, &st, assume_racy_is_modified);
 		if (!changed && !revs->diffopt.find_copies_harder)
 			continue;
 		oldmode = ntohl(ce->ce_mode);
diff --git a/diff.h b/diff.h
index 4546aad..040e18c 100644
--- a/diff.h
+++ b/diff.h
@@ -224,7 +224,7 @@ extern void diff_flush(struct diff_options*);

 extern const char *diff_unique_abbrev(const unsigned char *, int);

-extern int run_diff_files(struct rev_info *revs, int silent_on_removed);
+extern int run_diff_files(struct rev_info *revs, int option);
 extern int setup_diff_no_index(struct rev_info *revs,
 		int argc, const char ** argv, int nongit, const char *prefix);
 extern int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv);
diff --git a/read-cache.c b/read-cache.c
index 928e8fa..75e2d46 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -420,7 +420,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	pos = index_name_pos(istate, ce->name, namelen);
 	if (0 <= pos &&
 	    !ce_stage(istate->cache[pos]) &&
-	    !ie_modified(istate, istate->cache[pos], &st, 1)) {
+	    !ie_match_stat(istate, istate->cache[pos], &st, 3)) {
 		/* Nothing changed, really */
 		free(ce);
 		return 0;
-- 
1.5.3.5.1651.g30bf0

^ permalink raw reply related

* Re: *[PATCH] builtin-commit: Refresh cache after adding files.
From: Junio C Hamano @ 2007-11-10  2:26 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Kristian Høgsberg, git
In-Reply-To: <Pine.LNX.4.64.0711100201310.4362@racer.site>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> But I think that we still need to refresh the index.

You are correct.  See

    http://article.gmane.org/gmane.comp.version-control.git/64254

^ permalink raw reply

* Re: [PATCH] git-add: make the entry stat-clean after re-adding the same contents
From: Linus Torvalds @ 2007-11-10  2:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Kristian Høgsberg, git
In-Reply-To: <7vzlxmq1oj.fsf_-_@gitster.siamese.dyndns.org>



On Fri, 9 Nov 2007, Junio C Hamano wrote:
>  
> -int run_diff_files(struct rev_info *revs, int silent_on_removed)
> +int run_diff_files(struct rev_info *revs, int option)

Wouldn't it be much better to now 
 - make it "unsigned int flags"
 - create a few enums or #define's to make the usage be more readable?

Because this:

>-       run_diff_files(&rev, 0);
>+       run_diff_files(&rev, 2);
> -	    !ie_modified(istate, istate->cache[pos], &st, 1)) {
> +	    !ie_match_stat(istate, istate->cache[pos], &st, 3)) {

just went from subtle to "incredibly non-obvious".

I realize that 3 is "silent + assume_racy" and 2 is just "assume_racy", 
but I may realize that now, and in a month? I'd need to look it up.

		Linus

^ permalink raw reply

* git packs
From: bob @ 2007-11-10  4:47 UTC (permalink / raw)
  To: git

When a repository is packed such as for a clone or fetch, is there  
just one pack file created that is used for the transfer?

^ permalink raw reply

* Re: git packs
From: Nicolas Pitre @ 2007-11-10  5:13 UTC (permalink / raw)
  To: bob; +Cc: git
In-Reply-To: <F6DD8DCD-416B-4DDF-B384-7213C9ED5565@mac.com>

On Fri, 9 Nov 2007, bob wrote:

> When a repository is packed such as for a clone or fetch, is there just one
> pack file created that is used for the transfer?

Yes.

And modern Git is able to handle packs larger than 4GB too, assuming it 
is compiled using a toolchain with large file support.

Nicolas

^ permalink raw reply

* [PATCH] builtin-commit: fix --signoff
From: Johannes Schindelin @ 2007-11-10  5:49 UTC (permalink / raw)
  To: git, krh, gitster


The Signed-off-by: line contained a spurious timestamp.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin-commit.c |   31 ++++++++++++++++++++-----------
 1 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/builtin-commit.c b/builtin-commit.c
index e8bc4c4..f79ad48 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -181,21 +181,30 @@ static int prepare_log_message(const char *index_file, const char *prefix)
 		die("could not open %s\n", git_path(commit_editmsg));
 
 	stripspace(&sb, 0);
-	if (fwrite(sb.buf, 1, sb.len, fp) < sb.len)
-		die("could not write commit template: %s\n",
-		    strerror(errno));
 
 	if (signoff) {
-		const char *info, *bol;
-
-		info = git_committer_info(1);
-		strbuf_addch(&sb, '\0');
-		bol = strrchr(sb.buf + sb.len - 1, '\n');
-		if (!bol || prefixcmp(bol, sign_off_header))
-			fprintf(fp, "\n");
-		fprintf(fp, "%s%s\n", sign_off_header, git_committer_info(1));
+		struct strbuf sob;
+		const char *p;
+		int i;
+
+		strbuf_init(&sob, 0);
+		strbuf_addstr(&sob, sign_off_header);
+		strbuf_addstr(&sob, git_committer_info(1));
+		p = strrchr(sob.buf, '>');
+		if (p)
+			strbuf_setlen(&sob, p + 1 - sob.buf);
+		strbuf_addch(&sob, '\n');
+
+		for (i = sb.len - 1; i > 0 && sb.buf[i - 1] != '\n'; i--)
+			; /* do nothing */
+		if (prefixcmp(sb.buf + i, sob.buf))
+			strbuf_addbuf(&sb, &sob);
 	}
 
+	if (fwrite(sb.buf, 1, sb.len, fp) < sb.len)
+		die("could not write commit template: %s\n",
+		    strerror(errno));
+
 	strbuf_release(&sb);
 
 	if (in_merge && !no_edit)
-- 
1.5.3.5.1674.g6e7f7

^ permalink raw reply related

* Re: git packs
From: bob @ 2007-11-10  6:00 UTC (permalink / raw)
  To: git
In-Reply-To: <alpine.LFD.0.9999.0711100011150.21255@xanadu.home>

When you say toolchain, are you referring to the compiler
and associated libraries or are you referring to OS programs
such as ls, md5, cat, etc or both?

The reason that I ask is that I have been playing different
scenarios using git 1.5.3.5 under MacOSX 10.4.10 mostly
all day and every time that

A) 	a file approaches or exceeds 2gig on an 'add', it
	results in:

	fatal: Out of memory? mmap failed: Cannot allocate memory

B) 	the repository size less the .git subdirectory approaches
	4gig on a 'fetch' it results in:

	Resolving 3356 deltas...
	fatal: serious inflate inconsistency: -3 (unknown compression method)
	fatal: index-pack died with error code 128
	fatal: Fetch failure: ../rmwHtmlOld

	Under B, building the initial repository works fine.

(I added a patch the Linus Torvalds gave out when a previous inflate  
problem
was being researched.)  Also, I have been looking in the source
in particular in builtin-add.c builtin-pack-objects.c and associated  
headers
and see int and unsigned long being used a lot, but not any unsigned  
long
longs.  I have been testing on my laptop which has a 32-bit Intel  
Core Duo.
Also, I have run the same tests on a dual quad-core Intel processor
which is 64 bit, (but not sure that Apple uses the 64 bits in  
10.4.10).  I
get the same results as above.

The zlib is at the latest revision of 1.2.3 and gcc is at 4.0.1
which from what I can tell supports large files, because 'off_t' is 8  
bytes
which is the size used for a 'stat' file size.

I am just wondering if these size limitations exist for MacOSX
or maybe I am doing something wrong (which is probably
the case).

On Nov 10, 2007, at 12:13 AM, Nicolas Pitre wrote:

> On Fri, 9 Nov 2007, bob wrote:
>
>> When a repository is packed such as for a clone or fetch, is there  
>> just one
>> pack file created that is used for the transfer?
>
> Yes.
>
> And modern Git is able to handle packs larger than 4GB too,  
> assuming it
> is compiled using a toolchain with large file support.
>
>
> Nicolas
> -

^ permalink raw reply

* Re: gitweb, updating 'last changed' column on the project page
From: Jon Smirl @ 2007-11-10  6:01 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <fh337a$ggp$1@ger.gmane.org>

On 11/9/07, Jakub Narebski <jnareb@gmail.com> wrote:
> Jon Smirl wrote:
>
> > At http://git.digispeaker.com/ the 'last change' column is not getting updated.
> >
> > mpc5200b.git
> >       DigiSpeaker for Freescale MPC5200B.
> >       Jon Smirl
> >       5 weeks ago
> >       summary | shortlog | log | tree
> >
> > It still says 5 weeks ago, but if I click on the project last change is today.
> >
> > What controls this? I tried running update-server-info
>
> What does
>
>   git for-each-ref --format="%(refname):%09%(committer)" --sort=-committerdate
>       refs/heads

[daedalus]$ git for-each-ref --format="%(refname):%09%(committer)"
--sort=-committerdate refs/heads
refs/heads/m24: Jon Smirl <jonsmirl@gmail.com> 1191362799 -0400
refs/heads/m25: Jon Smirl <jonsmirl@gmail.com> 1191472422 -0400
refs/heads/m26: Jon Smirl <jonsmirl@gmail.com> 1194382038 -0500
refs/heads/m28: Jon Smirl <jonsmirl@gmail.com> 1194385071 -0500
refs/heads/m29: Jon Smirl <jonsmirl@gmail.com> 1194585780 -0500

>
> return? Does adding --count select proper branch, with proper update
> date?

Is it looking for master, and just picking the first branch instead?

>
> Which gitweb version is this?

<!-- git web interface version 1.5.3.5.605.g79fa-dirty, (C) 2005-2006,
Kay Sievers <kay.sievers@vrfy.org>, Christian Gierke -->
<!-- git core binaries version 1.5.3.5.605.g79fa-dirty -->

>
> --
> Jakub Narebski
> Warsaw, Poland
> ShadeHawk on #git
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: Reducing the memory footprint
From: Jon Smirl @ 2007-11-10  6:07 UTC (permalink / raw)
  To: Brian Downing; +Cc: Git Mailing List
In-Reply-To: <9e4733910711091705i6f77d05uc5ba04f668796a73@mail.gmail.com>

On 11/9/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 11/9/07, Brian Downing <bdowning@lavos.net> wrote:
> > On Fri, Nov 09, 2007 at 06:38:00PM -0500, Jon Smirl wrote:
> > > I'm using this config file:
> > >
> > > [pack]
> > >         windowMemory = 1M
> > >         deltaCacheSize = 1M
> > >
> > > And I have NO_MMAP compiled in.
> > >
> > > git is still using over 200MB of memory or address space, my process
> > > gets killed either way.
> >
> > I'm assuming it's dying on repacking since you included the pack
> > parameters.
> >
> > How big is your biggest object?  Even with pack.windowMemory, it still
> > keeps the last object around to try and delta against (in other words,
> > the window only shrinks to size 1), which means you have to have room
> > for it and its delta index.
>
> It's a Linux kernel repository. Git receive-pack is going over 200MB
> and getting zapped.  I don't understand why the process is so large. I
> am compiled with -DNO_MMAP.

I believe I must not have installed everything correctly with my
NO_MMAP build. After debugging for a while and fixing things I'm able
to do a push now in about 80MB of memory.

> I think I have a achieved a work around. I rsync'd in my last several
> weeks of changes. Now I can 'git push' small amounts of changes
> without getting killed.
>
> I'm begging dreamhost to simply install git. Installed commands don't
> get zapped.
>
> --
> Jon Smirl
> jonsmirl@gmail.com
>


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: [PATCH] Make builtin-tag.c use parse_options.
From: Junio C Hamano @ 2007-11-10  6:07 UTC (permalink / raw)
  To: Carlos Rica; +Cc: git
In-Reply-To: <473463E0.7000406@gmail.com>

Carlos Rica <jasampler@gmail.com> writes:

> Also, this removes those tests ensuring that repeated
> -m options don't allocate memory more than once, because now
> this is done after parsing options, using the last one
> when more are given. The same for -F.

The reason for this change is...?  Is this because it is
cumbersome to detect and refuse multiple -m options using the
parseopt API?  If so, the API may be what needs to be fixed.
Taking the last one and discarding earlier ones feels to me an
arbitrary choice.

While I freely admit that I do not particularly find the "One -m
introduces one new line, concatenated to form the final
paragraph" handling of multiple -m options done by git-commit
nice nor useful, I suspect that it would make more sense to make
git-tag and git-commit handle multiple -m option consistently,
if you are going to change the existing semantics.  Since some
people really seem to like multiple -m handling of git-commit,
the avenue of the least resistance for better consistency would
be to accept and concatenate (with LF in between) multiple -m
options.

With multiple -F, I think erroring out would be the sensible
thing to do, but some people might prefer concatenation.  I do
not care either way as long as commit and tag behave
consistently.

^ permalink raw reply

* Re: gitweb, updating 'last changed' column on the project page
From: Jon Smirl @ 2007-11-10  6:10 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <9e4733910711092201n5aaeeb7cvfd0e76e43170d481@mail.gmail.com>

On 11/10/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 11/9/07, Jakub Narebski <jnareb@gmail.com> wrote:
> > Jon Smirl wrote:
> >
> > > At http://git.digispeaker.com/ the 'last change' column is not getting updated.
> > >
> > > mpc5200b.git
> > >       DigiSpeaker for Freescale MPC5200B.
> > >       Jon Smirl
> > >       5 weeks ago
> > >       summary | shortlog | log | tree
> > >
> > > It still says 5 weeks ago, but if I click on the project last change is today.
> > >
> > > What controls this? I tried running update-server-info
> >
> > What does
> >
> >   git for-each-ref --format="%(refname):%09%(committer)" --sort=-committerdate
> >       refs/heads
>
> [daedalus]$ git for-each-ref --format="%(refname):%09%(committer)"
> --sort=-committerdate refs/heads
> refs/heads/m24: Jon Smirl <jonsmirl@gmail.com> 1191362799 -0400
> refs/heads/m25: Jon Smirl <jonsmirl@gmail.com> 1191472422 -0400
> refs/heads/m26: Jon Smirl <jonsmirl@gmail.com> 1194382038 -0500
> refs/heads/m28: Jon Smirl <jonsmirl@gmail.com> 1194385071 -0500
> refs/heads/m29: Jon Smirl <jonsmirl@gmail.com> 1194585780 -0500


It appears to be using the first head instead of the most recent date.


>
> >
> > return? Does adding --count select proper branch, with proper update
> > date?
>
> Is it looking for master, and just picking the first branch instead?
>
> >
> > Which gitweb version is this?
>
> <!-- git web interface version 1.5.3.5.605.g79fa-dirty, (C) 2005-2006,
> Kay Sievers <kay.sievers@vrfy.org>, Christian Gierke -->
> <!-- git core binaries version 1.5.3.5.605.g79fa-dirty -->
>
> >
> > --
> > Jakub Narebski
> > Warsaw, Poland
> > ShadeHawk on #git
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe git" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
>
>
> --
> Jon Smirl
> jonsmirl@gmail.com
>


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: git packs
From: Luke Lu @ 2007-11-10  6:36 UTC (permalink / raw)
  To: bob; +Cc: git
In-Reply-To: <FC175E4F-D9BE-42CC-B0BB-561B2EDCD941@mac.com>

On Nov 9, 2007, at 10:00 PM, bob wrote:
> When you say toolchain, are you referring to the compiler
> and associated libraries or are you referring to OS programs
> such as ls, md5, cat, etc or both?
>
> The reason that I ask is that I have been playing different
> scenarios using git 1.5.3.5 under MacOSX 10.4.10 mostly
> all day and every time that
>
> A) 	a file approaches or exceeds 2gig on an 'add', it
> 	results in:
> 	
> 	fatal: Out of memory? mmap failed: Cannot allocate memory
>
>
>
> B) 	the repository size less the .git subdirectory approaches
> 	4gig on a 'fetch' it results in:
>
> 	Resolving 3356 deltas...
> 	fatal: serious inflate inconsistency: -3 (unknown compression method)
> 	fatal: index-pack died with error code 128
> 	fatal: Fetch failure: ../rmwHtmlOld
>
> 	Under B, building the initial repository works fine.
>
> (I added a patch the Linus Torvalds gave out when a previous  
> inflate problem
> was being researched.)  Also, I have been looking in the source
> in particular in builtin-add.c builtin-pack-objects.c and  
> associated headers
> and see int and unsigned long being used a lot, but not any  
> unsigned long
> longs.  I have been testing on my laptop which has a 32-bit Intel  
> Core Duo.
> Also, I have run the same tests on a dual quad-core Intel processor
> which is 64 bit, (but not sure that Apple uses the 64 bits in  
> 10.4.10).  I
> get the same results as above.
>
> The zlib is at the latest revision of 1.2.3 and gcc is at 4.0.1
> which from what I can tell supports large files, because 'off_t' is  
> 8 bytes
> which is the size used for a 'stat' file size.

mmap(2), which git uses by default, is subject to vm limits  
(typically <2GB), regardless of large file support. file `which git`  
will probably tell you that it's a Mach-O executable i386 instead of  
x86_64. In order to get 64 bit binaries on Mactel boxes, you'll need  
the -m64 flag for gcc. I suspect that compiling with NO_MMAP option  
work as well.

__Luke

^ permalink raw reply

* Re: git packs
From: Linus Torvalds @ 2007-11-10  6:38 UTC (permalink / raw)
  To: bob; +Cc: git
In-Reply-To: <FC175E4F-D9BE-42CC-B0BB-561B2EDCD941@mac.com>

On Sat, 10 Nov 2007, bob wrote:
> 
> The reason that I ask is that I have been playing different
> scenarios using git 1.5.3.5 under MacOSX 10.4.10 mostly
> all day and every time that
> 
> A) 	a file approaches or exceeds 2gig on an 'add', it
> 	results in:
> 		fatal: Out of memory? mmap failed: Cannot allocate memory

Git wants to handle single files as one single entity, so single big files 
really do end up being very painful. The costs of compressing them and 
generating deltas would probably get prohibitively high *anyway*, but it 
does mean that if you have gigabyte files, you do want a 64-bit VM.

I thought OS X could do 64 bits these days. Maybe not.

Anyway, that explains the "cannot allocate memory". Git simply wants to 
mmap the whole file. You don't have enough VM space for it.

(And if you seriously want to work with multi-gigabyte files, git probbaly 
isn't going to perform wonderfully well, even if it *should* work fine if 
you just have a full 64-bit environment that allows the mmap).

> B) 	the repository size less the .git subdirectory approaches
> 	4gig on a 'fetch' it results in:
> 
> 	Resolving 3356 deltas...
> 	fatal: serious inflate inconsistency: -3 (unknown compression method)

That sounds really broken. I'm not seeing what would cause that, apart 
from some really bad data corruption and/or broken zlib implementation. 
But if the pack-file really is 2GB+ in size, I could imagine some sign 
issues cropping up.

git will generally use "unsigned long" (which is probably just 32-bit on 
your setup), but since git in those circumstances would be limited by the 
size of the VM _anyway_, that's not really much of a limitation (although 
probably broken on the crazy Windows "LLP64" model). But maybe we have 
some place where we use a signed thing, or zlib does, and I could see that 
causing breakage.

But that code-sequence really should never even come *close* to the 31-bit 
limit, as long as the individual objects themselves aren't bigger than the 
available VM space (and git currently assumes "unsigned long" is 
sufficiently big to cover the VM space, which is not technically correct, 
but should be fine on OS X too).

That said, we should use "off_t" in that function. I suspect we have a 
number of people (read: me) who have grown too used to living in a 64-bit 
world..

> I have been testing on my laptop which has a 32-bit Intel Core Duo.

Ok, so you're 32-bit limited even if there is were to be some 64-bit 
support for OS X.

> Also, I have run the same tests on a dual quad-core Intel processor
> which is 64 bit, (but not sure that Apple uses the 64 bits in 10.4.10).  I
> get the same results as above.

I'm pretty sure OS X defaults to a 32-bit environment, but has at least 
*some* 64-bit support. It would definitely need to be enabled explicitly 
(since they made the *insane* decision to move over to Intel laptop chips 
six months before they got 64-bit support! Somebody at Apple is a total 
idiot, and should get fired).

So it would be interesting to hear if a 64-bit build would make a 
difference.

> The zlib is at the latest revision of 1.2.3 and gcc is at 4.0.1
> which from what I can tell supports large files, because 'off_t' is 8 bytes
> which is the size used for a 'stat' file size.

See above: single files are size-limited, but with a large off_t like 
yours, you should be fine. Except we may have screwed up.

> I am just wondering if these size limitations exist for MacOSX
> or maybe I am doing something wrong (which is probably
> the case).

We *have* had issues with broken implementations of "pread()" on some 
systems.  

You could try setting NO_PREAD in the Makefile and compiling with the 
compatibility function.. That's the only thing that comes to mind as being 
worth trying in that area.

And if you have some script to generate the repository (ie you aren't 
using "live data", but are testing the limits of the system), if you can 
make that available, so that people with non-OSX environments can test, 
that would be interesting..

I certainly have some 32-bit environments too (old linux boxes), but I'm 
too lazy to write a test-case, so I was hoping you'd be using some simple 
scripts that I could just test and see if I can see the behaviour you 
describe myself.

That said, I have worked with a 3GB pack-file (one of the KDE trial 
repos). That worked fine. But git does tend to want a *lot* of memory for 
really big repositories, so I suspect that if you actually work with 2GB+ 
pack-files, you'll be wanting a 64-bit environment just because you'll be 
wanting more than 2GB of physical RAM in order to be able to access it 
efficiently.

			Linus

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox