Git development

* Re: How to check new commit availability without full fetch?
From: Leo Razoumov @ 2010-01-11 17:35 UTC (permalink / raw)
  To: git
In-Reply-To: <alpine.LFD.2.00.1001111149150.10143@xanadu.home>

On 2010-01-11, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Mon, 11 Jan 2010, Leo Razoumov wrote:
>
>  > On 2010-01-10, Nicolas Pitre <nico@fluxnic.net> wrote:
>  > >
>  > > You still don't answer my question though.  Again, _why_ do you need to
>  > >  know about remote commit availability without fetching them?
>  > >
>  >
>  > I use git to track almost all my data (code and otherwise) and spread
>  > it between several computers. I end up with several local repos having
>  > the same local branches. It happens once in a while that I fetch into
>  > a given remote/foo from several local foo branches from different
>  > machines and the operation fails. It happens because the commits have
>  > not yet been consistently distributed among the repos. To do the
>  > forensics and figure out who should update whom first I need a quick
>  > and non-destructive way to fetch dry-run.
>
>
> There is probably something awkward about your setup then.
>
>  Normally you should have a remote description for any of the remote
>  repositories you fetch from.  So if you have, say, remote machine_a with
>  repo foo, machine_b with repo bar, and machine_c with repo baz, then
>  fetching any of those will _only_ mirror locally the state of those
>  remote repositories.  There is no ordering required as there can't be
>  any conflicts in the mere fact of mirroring what the other guys have.
>  That's what remote tracking branches are for: they follow the state of a
>  remote repository and are never altered by local changes.  And you can
>  have as many of those as you wish and they will never conflict with each
>  other as each remote description is independent. And this is true
>  whether or not the remote repository lives on the same machine (that
>  would be a remote directory in that case).
>

Setup might be, indeed, awkward but it handles very diverse tasks.
As I said in my earlier emails different repos fetch into the *same* remote/foo.
So there could be conflicts and using fetch -f could cause loss of data.

Before switching to git I used mercurial for the same purpose and it
has commands that are equivalent to fetch --dry-run.
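
For comparison, a rough way to ask "has that branch moved?" on the git
side without transferring any objects is to compare the ref advertisement
from git ls-remote with the locally recorded tracking ref. This is only a
minimal sketch: the function name remote_tip_differs and its exit-status
convention are made up for illustration, and it only says whether the tips
differ, not whether a later fetch would be a fast-forward.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): report whether <branch> on <remote>
# points somewhere other than what <tracking_ref> records locally.
# Exit status: 0 = tips differ, 1 = identical, 2 = branch not found remotely.
remote_tip_differs () {
	remote=$1 branch=$2 tracking_ref=$3

	# ls-remote only reads the other side's ref advertisement;
	# no objects are transferred and no local refs are changed.
	theirs=$(git ls-remote "$remote" "refs/heads/$branch" | cut -f1)
	[ -n "$theirs" ] || return 2

	# What we recorded at the last fetch (empty if we never fetched).
	ours=$(git rev-parse --verify --quiet "$tracking_ref") || ours=

	[ "$theirs" != "$ours" ]
}
```

Called as, say, remote_tip_differs machine_a foo refs/remotes/foo, it
returns success when there is something new (or different) on the other
side, so it can be run against each machine before deciding who should
update whom.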

--Leo--

^ permalink raw reply

* Re: [PATCH/RFC] filter-branch: Fix to allow replacing submodules with another content
From: Johannes Schindelin @ 2010-01-11 18:02 UTC (permalink / raw)
  To: Michal Sojka; +Cc: git
In-Reply-To: <1263227634-11259-1-git-send-email-sojkam1@fel.cvut.cz>

Hi,

On Mon, 11 Jan 2010, Michal Sojka wrote:

> When git filter-branch is used to replace a submodule with another
> content, it always fails on the first commit. Consider a repository with
> directory submodule containing a submodule. If I want to remove the
> submodule and replace it with a file, the following command fails.
> 
> git filter-branch --tree-filter 'rm -rf submodule &&
> 				 git rm -q submodule &&
> 				 mkdir submodule &&
> 				 touch submodule/file'
> 
> The error message is:
> error: submodule: is a directory - add files inside instead
> 
> The reason is that git diff-index, which generates a part of the list of
> files to update-index, emits also the removed submodule even if it was
> replaced by a real directory.
> 
> Adding --ignore-submodules solves the problem for me and the
> tests in t7003-filter-branch.sh pass correctly.

Have you tested replacing one revision of a submodule with another?

Ciao,
Dscho

^ permalink raw reply

* Re: How to check new commit availability without full fetch?
From: Nicolas Pitre @ 2010-01-11 17:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin Rosenberg, SLONIK.AZ, Git Mailing List
In-Reply-To: <7vljg5ukol.fsf@alter.siamese.dyndns.org>

On Mon, 11 Jan 2010, Junio C Hamano wrote:

> Robin Rosenberg <robin.rosenberg@dewire.com> writes:
> 
> > On Sunday, 10 January 2010 at 12:12:09, Leo Razoumov wrote:
> >> Hi List,
> >> I am trying to find a way to check availability of new commits
> >> *before* doing fetch or pull. Unfortunately, neither fetch nor pull
> >> take "--dry-run" option (unlike push)
> >
> > Fetch has --dry-run. It's a fairly new option. The drawback is that it
> > still does the fetch, but it does not update the refs. If you re-run it
> > again it'll be quicker.
> 
> Doesn't that worry us if it really is quicker?
> 
> If --dry-run doesn't update the refs, why do the objects that were
> transferred by them not get asked the next time?  There must be a bug
> somewhere, but it is getting late already, so I'll leave it to experts in
> the transfer area to figure it out...

What about builtin-fetch.c:quickfetch() ?


Nicolas

^ permalink raw reply

* Problem creating commits/trees with commit-tree/mktree
From: Gavin Beatty @ 2010-01-11 18:14 UTC (permalink / raw)
  To: git

Hello,

I want to write commits to a branch without touching the index or
having a checkout (for a git subcommand I'm writing).

I can create new blobs and trees but can't figure out how to commit a
new tree/blob _with_ the old tree.

Currently, I do something a lot like:

objsha=$(echo 'contents' | git hash-object -w --stdin)
objtreesha=$(printf "100644 blob $objsha\tfile.txt\000" | git mktree -z)
newtreesha=$(printf "040000 tree $objtreesha\ttreefileisin\000" | git mktree -z)
echo 'commit msg' | git commit-tree $newtreesha -p $(git rev-parse refs/heads/new)

I get a commit with treefileisin/file.txt. I haven't included the
other trees/files so they are gone in this commit. How do I include
them? Is commit-tree the wrong tool?

Is there some way to use git ls-tree that I don't know about?

Gavin

-- 
Gavin Beatty

SEMPER UBI SUB UBI

^ permalink raw reply

* Re: Problem creating commits/trees with commit-tree/mktree
From: Shawn O. Pearce @ 2010-01-11 18:17 UTC (permalink / raw)
  To: Gavin Beatty; +Cc: git
In-Reply-To: <f6d77fed1001111014g73a06923na05cd14d37968b04@mail.gmail.com>

Gavin Beatty <gavinbeatty@gmail.com> wrote:
> I want to write commits to a branch without touching the index or
> having a checkout (for a git subcommand I'm writing).
> 
> I can create new blobs and trees but can't figure out how to commit a
> new tree/blob _with_ the old tree.
> 
> Currently, I do something a lot like:
> 
> objsha=$(echo 'contents' | git hash-object -w --stdin)
> objtreesha=$(printf "100644 blob $objsha\tfile.txt\000" | git mktree -z)
> newtreesha=$(printf "040000 tree $objtreesha\ttreefileisin\000" | git mktree -z)

You aren't feeding in the old tree contents as part of this command.

If you are really doing this via a script, you should look at
git-fast-import.  It's faster, and its language better supports
this notion of editing an existing tree.
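
As a rough illustration of that suggestion, the stream below adds one file
on top of an existing branch. It is a sketch under assumptions: the helper
name fi_add_file is made up, the committer identity is a placeholder, and
the timestamp comes from date +%s, which is widely available but not
POSIX. The "from refs/heads/<branch>^0" line is what makes the new commit
inherit the branch's current tree, so only the changed path is listed.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): add or replace one file on an
# existing branch through git fast-import, keeping the rest of its tree.
# usage: fi_add_file <branch> <path> <contents> <message>
fi_add_file () {
	branch=$1 path=$2 contents=$3 message=$4

	# "data <count>" announces an exact byte count, so measure both payloads.
	msg_len=$(printf '%s' "$message" | wc -c | tr -d ' ')
	body_len=$(printf '%s' "$contents" | wc -c | tr -d ' ')

	{
		printf 'commit refs/heads/%s\n' "$branch"
		# Placeholder identity; the timestamp is "now" in UTC.
		printf 'committer Example <example@example.com> %s +0000\n' "$(date +%s)"
		printf 'data %s\n%s\n' "$msg_len" "$message"
		# Start from the current branch tip so the old tree is carried over.
		printf 'from refs/heads/%s^0\n' "$branch"
		# Modify (or create) a single path with inline contents.
		printf 'M 100644 inline %s\n' "$path"
		printf 'data %s\n%s\n' "$body_len" "$contents"
	} | git fast-import --quiet
}
```

For the case in the original mail this would be something like
fi_add_file new treefileisin/file.txt contents 'commit msg'; neither the
index nor the working tree is involved.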

-- 
Shawn.

^ permalink raw reply

* Re: Problem creating commits/trees with commit-tree/mktree
From: Avery Pennarun @ 2010-01-11 18:38 UTC (permalink / raw)
  To: Gavin Beatty; +Cc: git
In-Reply-To: <f6d77fed1001111014g73a06923na05cd14d37968b04@mail.gmail.com>

On Mon, Jan 11, 2010 at 1:14 PM, Gavin Beatty <gavinbeatty@gmail.com> wrote:
> I can create new blobs and trees but can't figure out how to commit a
> new tree/blob _with_ the old tree.
[...]
> I get a commit with treefileisin/file.txt. I haven't included the
> other trees/files so they are gone in this commit. How do I include
> them? Is commit-tree the wrong tool?

When I'm doing similar things, I often prefer just using a temporary
git index file to keep track of my intermediate trees.  Just set
GIT_INDEX_FILE to point at a temporary file; then you can use
git-read-tree to read in an old tree, and git-update-index
(particularly with the --stdin flag) to update it.  Then you can use
git-write-tree to convert the temporary index into a real tree object.
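
The steps above could be strung together roughly as follows. This is a
minimal sketch, not a definitive recipe: the function name
tmpindex_add_file is made up, it uses update-index --cacheinfo for a
single path rather than --stdin, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): commit one new file on top of a
# branch using a scratch index, leaving the real index and checkout alone.
# usage: tmpindex_add_file <branch> <path> <contents> <message>
tmpindex_add_file () {
	branch=$1 path=$2 contents=$3 message=$4
	parent=$(git rev-parse --verify "refs/heads/$branch") || return 1

	# The scratch index lives in a private temporary directory.
	tmpdir=$(mktemp -d) || return 1
	(
		GIT_INDEX_FILE=$tmpdir/index
		export GIT_INDEX_FILE
		# Load the old tree so existing files survive in the new commit.
		git read-tree "$parent" &&
		blob=$(printf '%s\n' "$contents" | git hash-object -w --stdin) &&
		git update-index --add --cacheinfo 100644 "$blob" "$path" &&
		# Turn the scratch index into a tree, then into a commit.
		tree=$(git write-tree) &&
		commit=$(printf '%s\n' "$message" | git commit-tree "$tree" -p "$parent") &&
		# Move the branch only if it still points at the parent we read.
		git update-ref "refs/heads/$branch" "$commit" "$parent"
	)
	status=$?
	rm -rf "$tmpdir"
	return $status
}
```

Because GIT_INDEX_FILE is only exported inside the subshell, nothing
outside the function sees the temporary index.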

Have fun,

Avery

^ permalink raw reply

* default behaviour for `git merge` (no arguments)
From: Gareth Adams @ 2010-01-11 18:49 UTC (permalink / raw)
  To: git

Hi there, long time user; first time caller here.

I wanted to suggest an improvement to git-merge which will either save some
typing or save some network resources. It won't save huge amounts of either but
every little helps!

Currently, some of my colleagues frequently end up typing:

    git pull; ...; git checkout otherbranch; git pull

Now, we have quite a low commit rate, so it's unlikely (albeit vaguely possible)
that two people are working on the branch at the same time. This means the
second pull is doing a fetch which is effectively pointless.

Now of course this is a tiny amount of wastage, and while I could argue that it
would be an issue under poor network conditions that's not my point. As a coder
I'd want to get rid of the redundant fetch if I know it's redundant.

Unfortunately my other option is:

    git pull; ...; git checkout otherbranch; git merge myremote/otherbranch

which is annoying extra typing. Even with tab completion, it's redundant extra
typing because in these cases I'm trying to merge with the branch being tracked.

My suggestion is that `git merge` defaults to the same merge that a `git pull`
would perform, and there are 2 extra factors that make me think it's a workable
idea:

1) At the moment, `git merge` does nothing. Except mock me for not giving it a
command in a format it recognises. This change wouldn't have any effect that
would cause anyone a problem

2) When I checkout a branch which has unmerged changes in the tracking branch,
git *tells me* what branch I will be taking action with "Your branch is behind
the tracked remote branch '...' by 4 commits, and can be fast-forwarded" - but
then makes me type it out explicitly anyway!

I appreciate that there are many workflows where there is an advantage in
performing a second pull in case there are additional changes since the first
pull, but I still think there is a strong case for git merge having a more
sensible default, as git pull does.

What do you think?

Thanks,
Gareth

^ permalink raw reply

* Git bin permissions
From: kp @ 2010-01-11 18:54 UTC (permalink / raw)
  To: git
In-Reply-To: <1b727e9e1001111052t5cb6f604tb7936bd627c5f9f7@mail.gmail.com>

Hello All,

It looks like my IT came in and destroyed permissions on my
git/gitosis install. Now I'm getting a permission denied error while
trying to run a git push:
git/bin/gitosis-run-hook: Permission denied

I need to know the correct permissions for gitosis-init,
gitosis-run-hook, gitosis-serve. Can someone please give me an ls -la
output?

Thanks,
Alex

^ permalink raw reply

* Re: How to check new commit availability without full fetch?
From: Junio C Hamano @ 2010-01-11 19:20 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Robin Rosenberg, SLONIK.AZ, Git Mailing List
In-Reply-To: <alpine.LFD.2.00.1001111257300.10143@xanadu.home>

Nicolas Pitre <nico@fluxnic.net> writes:

> On Mon, 11 Jan 2010, Junio C Hamano wrote:
>
>> Robin Rosenberg <robin.rosenberg@dewire.com> writes:
>> 
>> > On Sunday, 10 January 2010 at 12:12:09, Leo Razoumov wrote:
>> >> Hi List,
>> >> I am trying to find a way to check availability of new commits
>> >> *before* doing fetch or pull. Unfortunately, neither fetch nor pull
>> >> take "--dry-run" option (unlike push)
>> >
>> > Fetch has --dry-run. It's a fairly new option. The drawback is that it
>> > still does the fetch, but it does not update the refs. If you re-run it
>> > again it'll be quicker.
>> 
>> Doesn't that worry us if it really is quicker?
>> 
>> If --dry-run doesn't update the refs, why do the objects that were
>> transferred by them not get asked the next time?  There must be a bug
>> somewhere, but it is getting late already, so I'll leave it to experts in
>> the transfer area to figure it out...
>
> What about builtin-fetch.c:quickfetch() ?

Ahh, you are right.  It walks from objects the remote side told us are at
the tip, and stops at what we know are complete (i.e. reachable from our
tip of objects); immediately after --dry-run slurped objects, the next
fetch will prove everything is locally available and complete before going
over the network.

But either I am very confused or the use of fields from "struct ref" is
unintuitive in this codepath.

Why does it feed ref->old_sha1?  We are feeding _their_ tip commits to:

    rev-list --objects --stdin --not --all

and expecting it to report failure when some of their tip commits lead to
what we don't have yet.  The reason why we have old_sha1[] vs new_sha1[]
is because we want to report what changed from what, and also to protect
us from simultaneous updates by doing compare-and-swap using the value we
read from our refs when we started in old_sha1[], so I would have expected
that ref_map elements would have _their_ commits on the new_sha1[] side,
but apparently that is not what is happening, and it has been this way for
a long time.  The use of old_sha1[] came from 4191c35 (git-fetch: avoid
local fetching from alternate (again), 2007-11-11), so it is a lot more
likely that I am confused than the code is wrong and nobody noticed so
far.

What am I missing?
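
The completeness check described at the top of this message can be tried
by hand with the same rev-list invocation. The wrapper below is only a
sketch (the name have_all_objects_for is made up): it reads candidate tip
object names on stdin and succeeds when rev-list can walk from them down
to what the local refs already reach without hitting a missing object.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of git): succeed if every tip named on
# stdin, and everything reachable from it, is already available locally.
have_all_objects_for () {
	# Walk from the given tips, stopping at anything reachable from our
	# own refs; a missing object makes rev-list exit non-zero.
	git rev-list --objects --stdin --not --all >/dev/null 2>&1
}
```

Feeding it the object names printed by git ls-remote shows whether the
objects are already present locally, which is the situation right after a
--dry-run fetch has transferred them.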

^ permalink raw reply

* Re: [PATCH 9/9] rerere forget path: forget recorded resolution
From: Johannes Sixt @ 2010-01-11 19:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4omte72j.fsf@alter.siamese.dyndns.org>

On Monday, 11 January 2010, Junio C Hamano wrote:
> I ended up doing this myself.  As we are dropping the postimage and adding
> a new MERGE_RR entry, I also think that it is safer to update the preimage
> with the conflict we got for this round, so I added that as well.

Thank you, it appears to work as expected. It is actually very important to 
update the preimage as well, otherwise, the new postimage can contain 
unrelated additional changes.

> However, I think there is a room for improvement in preimage handling.
>
> Currently, the rerere database is indexed with the conflict hash and for
> each conflict hash you can record a single preimage-postimage pair to
> replay.  But you can have conflicts with the same conflict hash, but with
> slightly different contexts outside the conflicted region, and the right
> resolution can be different depending on the outside context.

I did encounter a case where the same resolution would apply to all conflicts 
that have the same conflict hash, so it's not quite what you talk about. But 
not all conflicts were automatically resolved. I haven't yet analyzed what 
happened - it could just be that the xdl_merge call fails due to the 
differences in the text immediately outside the conflict markers.

> In the traditional implementation, I punted this issue by noticing
> conflicts in the three-way merge between pre, post and this images.  If
> preimage is too different from the conflicted contents we got during this
> merge, then the previous resolution should not apply.
>
> But I think the right solution would be to have more than one preimage and
> postimage pairs (preimage.0 vs postimage.0,... etc.) and try to use each
> of them in handle_path() until it finds one that can be used to cleanly
> merge with the conflict we got in thisimage during this round.

The situation happens rarely, so I don't know if we should care. OTOH, *when* 
the situation arises, and a recorded resolution is applied incorrectly, it 
may be quite annoying. Dunno.

-- Hannes

^ permalink raw reply

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Fredrik Kuivinen @ 2010-01-11 19:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <7vvdf9402f.fsf@alter.siamese.dyndns.org>

[Resending as the first copy didn't reach the list.]

On Mon, Jan 11, 2010 at 07:39, Junio C Hamano <gitster@pobox.com> wrote:
>
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> > It doesn't matter. Since we do the line-by-line thing, the input is always
> > so short that DFA vs NFA vs BM vs other-clever-search doesn't matter.
> > There is no scaling - the grep buffer tends to be too small for the
> > algorithm to matter.
> >
> > And the reason we do things line-by-line is that we need to then output
> > things line-per-line.
>
> Here is an experimental patch; first, some numbers (hot cache best of 5 runs).

I get some very unexpected results with this patch. (best of 5 runs):

before:

time git grep --no-ext-grep qwerty
drivers/char/keyboard.c:        "qwertyuiop[]\r\000as" /* 0x10 - 0x1f */

real 0m3.531s
user 0m3.056s
sys 0m0.468s

after:

time git grep --no-ext-grep qwerty
drivers/char/keyboard.c:        "qwertyuiop[]\r\000as" /* 0x10 - 0x1f */

real    0m4.794s
user    0m4.380s
sys    0m0.404s

with external grep:
time git grep qwerty
drivers/char/keyboard.c:        "qwertyuiop[]\r\000as" /* 0x10 - 0x1f */

real    0m1.236s
user    0m0.668s
sys    0m0.544s


So, if I haven't messed up the benchmark, it seems that Junio's patch
makes things go _slower_. I don't understand at all why I get these
results...

This is on a laptop with an Intel T2080 processor running Ubuntu 9.10.

Any ideas on how this can be explained?

- Fredrik

^ permalink raw reply

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Linus Torvalds @ 2010-01-11 19:29 UTC (permalink / raw)
  To: Fredrik Kuivinen
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <4c8ef71001111119p253170f8q37bcd3708d894a62@mail.gmail.com>

On Mon, 11 Jan 2010, Fredrik Kuivinen wrote:
> 
> Any ideas on how this can be explained?

Could it be a bad 'strstr()' implementation? 

Try a complex pattern ("qwerty.*as" finds the same line), and see if that 
too is slower than before. If that is faster than it used to be (with 
--no-ext-grep, of course), then it's strstr() that is badly implemented.

For me, on x86-64 (Fedora-12), strstr() seems to do pretty well. But it's 
easy to do a stupid implementation of strstr that does a 'strlen()' first, 
for example, and thus always traverses all data _twice_ etc. Depending on 
cache sizes etc, that can end up killing performance (or not mattering 
much at all..)

		Linus

^ permalink raw reply

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Fredrik Kuivinen @ 2010-01-11 19:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <alpine.LFD.2.00.1001111124480.17145@localhost.localdomain>

On Mon, Jan 11, 2010 at 20:29, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Mon, 11 Jan 2010, Fredrik Kuivinen wrote:
>>
>> Any ideas on how this can be explained?
>
> Could it be a bad 'strstr()' implementation?
>
> Try a complex pattern ("qwerty.*as" finds the same line), and see if that
> too is slower than before. If that is faster than it used to be (with
> --no-ext-grep, of course), then it's strstr() that is badly implemented.

Ah, yes, that's it. With the pattern "qwerty.*as" I get 2.5s with the
patch and 6s without.

Thanks.

- Fredrik

^ permalink raw reply

* Re: default behaviour for `git merge` (no arguments)
From: Junio C Hamano @ 2010-01-11 19:43 UTC (permalink / raw)
  To: Gareth Adams; +Cc: git
In-Reply-To: <loom.20100111T185144-655@post.gmane.org>

Gareth Adams <gareth.adams@gmail.com> writes:

> Unfortunately my other option is:
>
>     git pull; ...; git checkout otherbranch; git merge myremote/otherbranch
>
> which is annoying extra typing.

Replace 'pull' with 'fetch' and a bit more regular pattern would emerge.

The code indeed knows (as you can see "git pull" can figure it out) what
other ref the current branch is configured to merge with by default.
There is even a plumbing to do this for script writers.

    $ git for-each-ref --format='%(upstream)' $(git symbolic-ref HEAD)

We can teach this short-hand to "git merge", perhaps:

    $ git merge --default

But "no argument" cannot be the short-hand, because...

> 1) At the moment, `git merge` does nothing. Except mock me for not giving it a
> command in a format it recognises. This change wouldn't have any effect that
> would cause anyone a problem

... except for people who use a script that does

    commits=
    while some condition
    do
    	commit=$(find some other commit that should be merged)
        commits="$commits$commit "
    done
    git merge $commits

and expect the last step will fail without doing any damage when the loop
finds no new developments.  "no argument means --default" will break their
scripts.
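
In the meantime, the plumbing shown above can be wrapped in a small shell
function. This is only a sketch of the idea, not a proposed
implementation: the name merge_upstream is made up, and it simply refuses
to do anything on a detached HEAD or when no upstream is configured.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of git): merge whatever the current branch
# is configured to merge with by default, without running a fetch first.
merge_upstream () {
	head=$(git symbolic-ref -q HEAD) || {
		echo "merge_upstream: detached HEAD" >&2
		return 1
	}
	# Ask the plumbing which ref this branch tracks.
	upstream=$(git for-each-ref --format='%(upstream)' "$head")
	if [ -z "$upstream" ]; then
		echo "merge_upstream: ${head#refs/heads/} has no upstream configured" >&2
		return 1
	fi
	git merge "$upstream"
}
```

With that, the sequence from the original mail becomes
"git fetch; ...; git checkout otherbranch; merge_upstream", and a plain
"git merge" with no arguments keeps failing harmlessly for scripts like
the one above.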

^ permalink raw reply

* Re: [PATCH 9/9] rerere forget path: forget recorded resolution
From: Junio C Hamano @ 2010-01-11 20:05 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git
In-Reply-To: <201001112022.31257.j6t@kdbg.org>

Johannes Sixt <j6t@kdbg.org> writes:

> I did encounter a case where the same resolution would apply to all
> conflicts that have the same conflict hash, so it's not quite what you
> talk about. But not all conflicts were automatically resolved. I haven't
> yet analyzed what happened - it could just be that the xdl_merge call
> fails due to the differences in the text immediately outside the
> conflict markers.

Actually it is _very_ easy to fool rerere to do something totally
unexpected, and I have been thinking about using the similarity comparison
algorithm on the region outside the conflicted area between preimage and
thisimage and reject use of rerere.

Try this in an empty directory.

-- >8 --

#!/bin/sh

git init

create_numbers () {
	for n in 0 1 2 3 4 "$1" 5 6 7 8 9
	do
		echo $n
	done >numbers.txt
}

create_letters () {
	for l in a b c d e "$1" f g h i j
	do
		echo $l
	done >letters.txt
}

create_files () {
	create_numbers "$1"
	create_letters "$1"
}

create_files ""
git add numbers.txt letters.txt
git commit -m initial
git branch side

create_files "+"
git commit -a -m master

git checkout side
create_files "-"
git commit -a -m side

mkdir -p .git/rr-cache

# On this history we changed an empty line to +; merge
# with another history that changed it to -
git checkout master^0
git merge side

# The above should have conflicted.  The resolution is to '='

create_numbers "="
git rerere

git rerere status
git rerere diff
cat numbers.txt
cat letters.txt

-- 8< --

Now, immediately after this sequence, rerere will give you a disaster.

^ permalink raw reply

* Re: How to check new commit availability without full fetch?
From: Andreas Schwab @ 2010-01-11 20:06 UTC (permalink / raw)
  To: SLONIK.AZ; +Cc: Git Mailing List
In-Reply-To: <ee2a733e1001100312j786108fct1b4c8abd0acc5afc@mail.gmail.com>

Leo Razoumov <slonik.az@gmail.com> writes:

> Hi List,
> I am trying to find a way to check availability of new commits
> *before* doing fetch or pull. Unfortunately, neither fetch nor pull
> take "--dry-run" option (unlike push)

Try git remote show origin.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Linus Torvalds @ 2010-01-11 20:07 UTC (permalink / raw)
  To: Fredrik Kuivinen
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <4c8ef71001111140j7e7d0081o7718d956104a2451@mail.gmail.com>

On Mon, 11 Jan 2010, Fredrik Kuivinen wrote:
> >
> > Try a complex pattern ("qwerty.*as" finds the same line), and see if that
> > too is slower than before. If that is faster than it used to be (with
> > --no-ext-grep, of course), then it's strstr() that is badly implemented.
> 
> Ah, yes, that's it. With the pattern "qwerty.*as" I get 2.5s with the
> patch and 6s without.

Ok, so on your machine, regcomp() is basically twice as fast as strstr().

Which is not entirely unexpected: I was actually surprised by strstr() 
being apparently so good on my machine. I do not generally expect things 
like that to be at all optimized for bigger working sets. Most common uses 
of strstr() are in short strings - not "strings" that are many kilobytes 
in size (the whole file).

In fact, I suspect it works so well for me because in my version of glibc 
it's not just SSE-optimized: judging by the naming it's SSE4.2 optimized - 
so the case I see on my machine will _only_ happen on Nehalem-based cores 
(ie the new "Core i[357]" cpu's).

It is entirely possible that strstr in general is a disaster.

		Linus

^ permalink raw reply

* Re: How to check new commit availability without full fetch?
From: Nicolas Pitre @ 2010-01-11 20:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robin Rosenberg, SLONIK.AZ, Git Mailing List
In-Reply-To: <7vmy0kjvms.fsf@alter.siamese.dyndns.org>

On Mon, 11 Jan 2010, Junio C Hamano wrote:

> Ahh, you are right.  It walks from objects the remote side told us are at
> the tip, and stops at what we know are complete (i.e. reachable from our
> tip of objects); immediately after --dry-run slurped objects, the next
> fetch will prove everything is locally available and complete before going
> over the network.
> 
> But either I am very confused or the use of fields from "struct ref" is
> unintuitive in this codepath.
> 
> Why does it feed ref->old_sha1?  We are feeding _their_ tip commits to:
> 
>     rev-list --objects --stdin --not --all
> 
> and expecting it to report failure when some of their tip commits lead to
> what we don't have yet.  The reason why we have old_sha1[] vs new_sha1[]
> is because we want to report what changed from what, and also to protect
> us from simultaneous updates by doing compare-and-swap using the value we
> read from our refs when we started in old_sha1[], so I would have expected
> that ref_map elements would have _their_ commits on the new_sha1[] side,
> but apparently that is not what is happening, and it has been this way for
> a long time.  The use of old_sha1[] came from 4191c35 (git-fetch: avoid
> local fetching from alternate (again), 2007-11-11), so it is a lot more
> likely that I am confused than the code is wrong and nobody noticed so
> far.

Very confusing indeed.  I first came across quickfetch() myself 
when I fixed shallow clone, which led to commit 86386829.

If old_sha1[] was our refs then quickfetch() would always succeed and 
we'd never fetch anything.

> What am I missing?

Digging a bit, it looks like get_remote_heads() is storing the remote's 
heads into old_sha1.  The same is done in get_refs_from_bundle(), and 
in insert_packed_refs() from get_refs_via_rsync(), etc.

Looking at the struct ref definition, we can see:

struct ref {
        struct ref *next;
        unsigned char old_sha1[20];
        unsigned char new_sha1[20];
        [...]
        struct ref *peer_ref; /* when renaming */
        [...]
};

And apparently store_updated_refs() ends up using that peer_ref like 
this:

        for (rm = ref_map; rm; rm = rm->next) {
                struct ref *ref = NULL;

                if (rm->peer_ref) {
                        ref = xcalloc(1, sizeof(*ref) + strlen(rm->peer_ref->name) + 1);
                        strcpy(ref->name, rm->peer_ref->name);
                        hashcpy(ref->old_sha1, rm->peer_ref->old_sha1);
                        hashcpy(ref->new_sha1, rm->old_sha1);
                        ref->force = rm->peer_ref->force;
                }

So.... Doesn't this all look like a total mess of needless (and even 
leaked in this case) allocations and duplications, besides being 
completely unintuitive?  Both hashcpy() calls above certainly throw my 
sense of logic aside...

Nicolas

^ permalink raw reply

* Re: [PATCH 9/9] rerere forget path: forget recorded resolution
From: Johannes Sixt @ 2010-01-11 21:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v3a2cif04.fsf@alter.siamese.dyndns.org>

On Monday, 11 January 2010, Junio C Hamano wrote:
> Actually it is _very_ easy to fool rerere to do something totally
> unexpected, and I have been thinking about using the similarity comparison
> algorithm on the region outside the conflicted area between preimage and
> thisimage and reject use of rerere.
>
> Try this in an empty directory.
>[snip]
> Now, immediately after this sequence, rerere will give you a disaster.

Indeed. The problem here is that two entirely different resolutions are 
recorded for the same conflict hash *in one run of rerere*. The damage can be 
avoided if conflict hashes are not reused in do_plain_rerere (in the first 
loop). (Though, I'm currently not in the mood to look into this in more 
depth.)

Of course, this does not mean that *both* conflicts can be resolved 
automatically when the merge is repeated. In my use-case this would have been 
desirable (and even your example would suggest it is, but that is not 
generally true).

-- Hannes

^ permalink raw reply

* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Fredrik Kuivinen @ 2010-01-11 21:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <alpine.LFD.2.00.1001111159270.17145@localhost.localdomain>

On Mon, Jan 11, 2010 at 21:07, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Mon, 11 Jan 2010, Fredrik Kuivinen wrote:
>> >
>> > Try a complex pattern ("qwerty.*as" finds the same line), and see if that
>> > too is slower than before. If that is faster than it used to be (with
>> > --no-ext-grep, of course), then it's strstr() that is badly implemented.
>>
>> Ah, yes, that's it. With the pattern "qwerty.*as" I get 2.5s with the
>> patch and 6s without.
>
> Ok, so on your machine, regcomp() is basically twice as fast as strstr().

Yes.

> Which is not entirely unexpected: I was actually surprised by strstr()
> being apparently so good on my machine. I do not generally expect things
> like that to be at all optimized for bigger working sets. Most common uses
> of strstr() are in short strings - not "strings" that are many kilobytes
> in size (the whole file).
>
> In fact, I suspect it works so well for me because in my version of glibc
> it's not just SSE-optimized: judging by the naming it's SSE4.2 optimized -
> so the case I see on my machine will _only_ happen on Nehalem-based cores
> (ie the new "Core i[357]" cpu's).
>
> It is entirely possible that strstr in general is a disaster.

Another option is to use memmem instead. As we know the length of the
buffer already it should be a slight improvement over strstr for
everyone. memmem may cause some portability problems though as it is a
GNU extension.

I get these results: (git-grep --no-ext-grep qwerty, best of five)

Junio's patch: 0:04.84
memmem (attached patch on top of Junio's): 0:02.91
regcomp/regexec (I changed is_fixed to always return 0, also on top of
Junio's): 0:02.02

- Fredrik

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 1236 bytes --]

diff --git a/grep.c b/grep.c
index 940e200..d34247f 100644
--- a/grep.c
+++ b/grep.c
@@ -264,13 +264,14 @@ static void show_name(struct grep_opt *opt, const char *name)
 }
 
 
-static int fixmatch(const char *pattern, char *line, int ignore_case, regmatch_t *match)
+static int fixmatch(const char *pattern, char *line, char *eol,
+		    int ignore_case, regmatch_t *match)
 {
 	char *hit;
 	if (ignore_case)
 		hit = strcasestr(line, pattern);
 	else
-		hit = strstr(line, pattern);
+		hit = memmem(line, eol - line, pattern, strlen(pattern));
 
 	if (!hit) {
 		match->rm_so = match->rm_eo = -1;
@@ -333,7 +334,7 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
 
  again:
 	if (p->fixed)
-		hit = !fixmatch(p->pattern, bol, p->ignore_case, pmatch);
+		hit = !fixmatch(p->pattern, bol, eol, p->ignore_case, pmatch);
 	else
 		hit = !regexec(&p->regexp, bol, 1, pmatch, eflags);
 
@@ -646,7 +647,7 @@ static int look_ahead(struct grep_opt *opt,
 		regmatch_t m;
 
 		if (p->fixed)
-			hit = !fixmatch(p->pattern, bol, p->ignore_case, &m);
+			hit = !fixmatch(p->pattern, bol, bol + *left_p, p->ignore_case, &m);
 		else
 			hit = !regexec(&p->regexp, bol, 1, &m, 0);
 		if (!hit || m.rm_so < 0 || m.rm_eo < 0)


* [PATCH] push: spell 'Note about fast-forwards' section name correctly in error message.
From: Matthieu Moy @ 2010-01-11 21:09 UTC (permalink / raw)
  To: git, gitster; +Cc: Matthieu Moy

The error message in case of non-fast forward points to 'git push
--help', but used to talk about a section 'non-fast-forward', while the
actual section name is 'Note about fast-forwards'.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
My bad, I'm the original author of the patch introducing this ;-).

 builtin-push.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin-push.c b/builtin-push.c
index f7661d2..28a26e7 100644
--- a/builtin-push.c
+++ b/builtin-push.c
@@ -125,8 +125,8 @@ static int push_with_options(struct transport *transport, int flags)
 
 	if (nonfastforward && advice_push_nonfastforward) {
 		printf("To prevent you from losing history, non-fast-forward updates were rejected\n"
-		       "Merge the remote changes before pushing again.  See the 'non-fast-forward'\n"
-		       "section of 'git push --help' for details.\n");
+		       "Merge the remote changes before pushing again.  See the 'Note about\n"
+		       "fast-forwards' section of 'git push --help' for details.\n");
 	}
 
 	return 1;
-- 
1.6.6.198.gbaea2


* Re: [PATCH] grep: do not do external grep on skip-worktree entries
From: Linus Torvalds @ 2010-01-11 21:24 UTC (permalink / raw)
  To: Fredrik Kuivinen
  Cc: Junio C Hamano, Miles Bader, Jeff King, Nguyen Thai Ngoc Duy, git
In-Reply-To: <4c8ef71001111307q6679039ajbef22f2e1748df56@mail.gmail.com>

On Mon, 11 Jan 2010, Fredrik Kuivinen wrote:
> 
> Another option is to use memmem instead. As we know the length of the
> buffer already it should be a slight improvement over strstr for
> everyone. memmem may cause some portability problems though as it is a
> GNU extension.

I'd almost prefer to just drop the strstr entirely.

It's not actually all *that* big a win, even on my machine. I get

 - strstr:

        real    0m0.309s
        user    0m0.168s
        sys     0m0.136s

 - regexec:

	real	0m0.410s
	user	0m0.220s
	sys	0m0.116s

so yeah, it's slower, but not by a huge degree. With strstr, "git grep" 
actually beats the external grep for me, but I don't really care. It's 
already way better than it used to be - and clearly strstr has a lot of 
potential problems.

Sure memmem() might be better for you than strstr, but on the other hand, 
it might easily be worse than strstr for others - and not just from a 
portability standpoint. Is memmem() optimized to take advantage of SSE4.2? 
I suspect it is not, exactly _because_ it's a GNU extension, so Intel 
hasn't published optimized sample code for people to use.

So I would argue against even bothering to try memmem. Especially since in 
your case, regexec() is apparently faster than memmem _anyway_. I expect 
that it is for me too, but I'm too lazy to check.

			Linus


* [PATCH] string-list: remove print_string_list, since it is not used anymore.
From: Thiago Farina @ 2010-01-11 21:29 UTC (permalink / raw)
  To: git

Signed-off-by: Thiago Farina <tfransosi@gmail.com>
---
 string-list.c |   10 ----------
 string-list.h |    1 -
 2 files changed, 0 insertions(+), 11 deletions(-)

diff --git a/string-list.c b/string-list.c
index 1ac536e..e929745 100644
--- a/string-list.c
+++ b/string-list.c
@@ -138,16 +138,6 @@ void string_list_clear_func(struct string_list *list, string_list_clear_func_t c
 	list->nr = list->alloc = 0;
 }
 
-
-void print_string_list(const char *text, const struct string_list *p)
-{
-	int i;
-	if ( text )
-		printf("%s\n", text);
-	for (i = 0; i < p->nr; i++)
-		printf("%s:%p\n", p->items[i].string, p->items[i].util);
-}
-
 struct string_list_item *string_list_append(const char *string, struct string_list *list)
 {
 	ALLOC_GROW(list->items, list->nr + 1, list->alloc);
diff --git a/string-list.h b/string-list.h
index 6569cf6..8598257 100644
--- a/string-list.h
+++ b/string-list.h
@@ -12,7 +12,6 @@ struct string_list
 	unsigned int strdup_strings:1;
 };
 
-void print_string_list(const char *text, const struct string_list *p);
 void string_list_clear(struct string_list *list, int free_util);
 
 /* Use this function to call a custom clear function on each util pointer */
-- 
1.6.6.103.g699d2


* Re: [PATCH] string-list: remove print_string_list, since it is not used anymore.
From: Johannes Schindelin @ 2010-01-11 21:52 UTC (permalink / raw)
  To: Thiago Farina; +Cc: git
In-Reply-To: <1263245389-1558-1-git-send-email-tfransosi@gmail.com>

Hi.

On Mon, 11 Jan 2010, Thiago Farina wrote:

> Signed-off-by: Thiago Farina <tfransosi@gmail.com>
> ---

It was never used, except for debugging.  Does it hurt you really all that 
much?

Ciao,
Dscho


* [RFC PATCH (WIP)] Show a dirty working tree and a detached HEAD in status for submodule
From: Jens Lehmann @ 2010-01-11 22:05 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Junio C Hamano, Johannes Schindelin, Shawn O. Pearce, Heiko Voigt,
	Lars Hjemli

Until now a submodule only showed up as changed in the superproject when
the last commit in the submodule differed from the one in the index or
the last commit of the superproject. A dirty working tree or a detached
HEAD in a submodule were just ignored when looking at it from the
superproject.

This patch shows these changes when using git status or one of the diff
commands which compare against the working tree in the superproject.

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
---


This is the first version of a patch letting git status and the git
diff family show dirty working directories and a detached HEAD in
submodules. It is not intended to be merged in its present form but
to be used as a starting point for discussion of whether this is going
in the right direction.


What the patch does:

* It makes git show submodules as modified in the superproject when
  one or more of these conditions are met:

    a) The submodule contains untracked files
    b) The submodule contains modified files
    c) The submodule's HEAD is not on a local or remote branch

  That can be seen when using either "git status", "git diff[-files]"
  & "git diff[-index] HEAD" (and with "git gui" & gitk).


What the patch doesn't do (yet):

* It still breaks tests t7400-submodule-basic.sh &
  t7407-submodule-foreach.sh.

* It doesn't give detailed output when doing a "git diff* -p" with or
  without the --submodule option. It should show something like

    diff --git a/sub b/sub
    index 5431f52..3f35670 160000
    --- a/sub
    +++ b/sub
    @@ -1 +1 @@
    -Subproject commit 5431f529197f3831cdfbba1354a819a79f948f6f
    +Subproject commit 3f356705649b5d566d97ff843cf193359229a453-dirty

  for "git diff* -p" (notice the "-dirty" in the last line) or

      Submodule sub contains untracked files
      Submodule sub contains modified files
      Submodule sub contains a HEAD not on any branch
      Submodule sub 5431f52..3f35670:
      > commit message 1

  when using the --submodule option of the diff family.

* This behavior is not configurable but activated by default. A config
  option is needed here.

* It doesn't give optimal performance:

  - Apart from the fact that checking submodules this way will always
    be slower than ignoring their changes as git does until now, doing
    two run_command() calls for each submodule is not going to help at
    all (especially when running on Windows).

  - AFAICS the check for a detached HEAD would be faster for the most
    probable case if it checked against remotes/origin/master first.
    It could also stop when the first branch is found instead of
    continuing to look for others too, as "git branch --contains" does.

  - Similar for the test for a dirty working directory, no need to have
    the full list of new and modified files, it could stop at the first
    one it finds.

  - If no detailed output is wanted the examination of HEAD and the
    working directory is not necessary when the HEAD and the commit in
    the index of the superproject already don't match.
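
To illustrate the early-exit idea from the list above: the dirty check
only needs to know whether the child printed anything at all, so a
single byte is enough. A minimal sketch, assuming the child's output is
available as a stdio stream (the patch itself reads a file descriptor
with strbuf_read); stream_has_output is a hypothetical name, not an
existing git function.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return 1 if the stream yields at least one
 * byte, 0 if it is empty. It stops after the first byte instead of
 * reading the whole output into a buffer.
 */
int stream_has_output(FILE *in)
{
	return fgetc(in) != EOF;
}
```

The caller would still have to close the stream and reap the child
afterwards; the sketch only covers the "stop at the first one" part.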


What do you think?




 diff-lib.c                  |    7 +++-
 submodule.c                 |  103 +++++++++++++++++++++++++++++++++++++++++++
 submodule.h                 |    1 +
 t/t7506-status-submodule.sh |   45 ++++++++++++++++++-
 4 files changed, 154 insertions(+), 2 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 1c7e652..323305a 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -159,7 +159,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 				continue;
 		}

-		if (ce_uptodate(ce) || ce_skip_worktree(ce))
+		if ((ce_uptodate(ce) && !S_ISGITLINK(ce->ce_mode)) || ce_skip_worktree(ce))
 			continue;

 		/* If CE_VALID is set, don't look at workdir for file removal */
@@ -176,6 +176,11 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			continue;
 		}
 		changed = ce_match_stat(ce, &st, ce_option);
+		if (S_ISGITLINK(ce->ce_mode)) {
+			/* TODO: This should not be executed when the submodule is changed
+			 * and only short output is wanted for performance reasons. */
+			changed |= is_submodule_modified(ce->name);
+		}
 		if (!changed) {
 			ce_mark_uptodate(ce);
 			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
diff --git a/submodule.c b/submodule.c
index 86aad65..b35f1b3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -4,6 +4,7 @@
 #include "diff.h"
 #include "commit.h"
 #include "revision.h"
+#include "run-command.h"

 int add_submodule_odb(const char *path)
 {
@@ -112,3 +113,105 @@ void show_submodule_summary(FILE *f, const char *path,
 	}
 	strbuf_release(&sb);
 }
+
+static int is_submodule_head_detached(const char *path)
+{
+	int retval, len;
+	struct child_process branch;
+	const char *argv[] = {
+		"branch",
+		"-a",
+		"--contains",
+		"HEAD",
+		NULL,
+	};
+	char *env[3];
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addf(&buf, "GIT_WORK_TREE=%s", path);
+	env[0] = strbuf_detach(&buf, NULL);
+	strbuf_addf(&buf, "GIT_DIR=%s/.git", path);
+	env[1] = strbuf_detach(&buf, NULL);
+	env[2] = NULL;
+
+	memset(&branch, 0, sizeof(branch));
+	branch.argv = argv;
+	branch.env = (const char *const *)env;
+	branch.git_cmd = 1;
+	branch.no_stdin = 1;
+	branch.out = -1;
+	if (start_command(&branch))
+		die("Could not run git branch -a --contains HEAD");
+
+	len = strbuf_read(&buf, branch.out, 1024);
+	close(branch.out);
+
+	if (finish_command(&branch))
+		die("git branch -a --contains HEAD failed");
+
+	retval = (strncmp(buf.buf, "* (no branch)", 13) == 0);
+
+	free(env[0]);
+	free(env[1]);
+	strbuf_release(&buf);
+	return retval;
+}
+
+static int is_submodule_working_directory_dirty(const char *path)
+{
+	int len;
+	struct child_process branch;
+	const char *argv[] = {
+		"status",
+		"--porcelain",
+		NULL,
+	};
+	char *env[3];
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addf(&buf, "GIT_WORK_TREE=%s", path);
+	env[0] = strbuf_detach(&buf, NULL);
+	strbuf_addf(&buf, "GIT_DIR=%s/.git", path);
+	env[1] = strbuf_detach(&buf, NULL);
+	env[2] = NULL;
+
+	memset(&branch, 0, sizeof(branch));
+	branch.argv = argv;
+	branch.env = (const char *const *)env;
+	branch.git_cmd = 1;
+	branch.no_stdin = 1;
+	branch.out = -1;
+	if (start_command(&branch))
+		die("Could not run git status --porcelain");
+
+	len = strbuf_read(&buf, branch.out, 1024);
+	close(branch.out);
+
+	if (finish_command(&branch))
+		die("git status --porcelain failed");
+
+	free(env[0]);
+	free(env[1]);
+	strbuf_release(&buf);
+	return len != 0;
+}
+
+int is_submodule_modified(const char *path)
+{
+	struct strbuf buffer = STRBUF_INIT;
+
+	strbuf_addf(&buffer, "%s/.git/", path);
+	if (!is_directory(buffer.buf)) {
+		strbuf_release(&buffer);
+		return 0;
+	}
+	strbuf_release(&buffer);
+
+	if (is_submodule_head_detached(path))
+		return 1;
+
+	if (is_submodule_working_directory_dirty(path))
+		return 1;
+
+	return 0;
+}
diff --git a/submodule.h b/submodule.h
index 4c0269d..0773121 100644
--- a/submodule.h
+++ b/submodule.h
@@ -4,5 +4,6 @@
 void show_submodule_summary(FILE *f, const char *path,
 		unsigned char one[20], unsigned char two[20],
 		const char *del, const char *add, const char *reset);
+int is_submodule_modified(const char *path);

 #endif
diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
index 3ca17ab..509754a 100755
--- a/t/t7506-status-submodule.sh
+++ b/t/t7506-status-submodule.sh
@@ -10,8 +10,12 @@ test_expect_success 'setup' '
 	: >bar &&
 	git add bar &&
 	git commit -m " Add bar" &&
+	: >foo &&
+	git add foo &&
+	git commit -m " Add foo" &&
 	cd .. &&
-	git add sub &&
+	echo output > .gitignore &&
+	git add sub .gitignore &&
 	git commit -m "Add submodule sub"
 '

@@ -23,6 +27,45 @@ test_expect_success 'commit --dry-run -a clean' '
 	git commit --dry-run -a |
 	grep "nothing to commit"
 '
+
+echo "changed" > sub/foo
+test_expect_success 'status with modified file in submodule' '
+	git status | grep "modified:   sub"
+'
+test_expect_success 'status with modified file in submodule (porcelain)' '
+	git status --porcelain >output &&
+	diff output - <<-EOF
+ M sub
+EOF
+'
+(cd sub && git checkout foo)
+
+echo "content" > sub/new-file
+test_expect_success 'status with untracked file in submodule' '
+	git status | grep "modified:   sub"
+'
+test_expect_success 'status with untracked file in submodule (porcelain)' '
+	git status --porcelain >output &&
+	diff output - <<-EOF
+ M sub
+EOF
+'
+rm sub/new-file
+
+(cd sub && 2>/dev/null
+old_head=$(cat .git/refs/heads/master) &&
+git reset --hard HEAD^ &&
+git checkout $old_head 2>/dev/null)
+test_expect_success 'status with detached HEAD in submodule' '
+	git status | grep "modified:   sub"
+'
+test_expect_success 'status with detached HEAD in submodule (porcelain)' '
+	git status --porcelain >output &&
+	diff output - <<-EOF
+ M sub
+EOF
+'
+
 test_expect_success 'rm submodule contents' '
 	rm -rf sub/* sub/.git
 '
-- 
1.6.6.203.g6b27d.dirty

