Git development

* Re: Cleaning up git user-interface warts
From: Shawn Pearce @ 2006-11-20 19:46 UTC (permalink / raw)
  To: Horst H. von Brand; +Cc: Junio C Hamano, hanwen, git
In-Reply-To: <200611201944.kAKJiCAw014973@laptop13.inf.utfsm.cl>

"Horst H. von Brand" <vonbrand@inf.utfsm.cl> wrote:
> If you make pushing into an empty repository work also, you fix the case of
> "create an empty repo for somebody, let them fill it up remotely later".

This seems to work just fine now.  I do it all of the time.

-- 

^ permalink raw reply

* Re: Cleaning up git user-interface warts
From: Horst H. von Brand @ 2006-11-20 19:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: hanwen, git
In-Reply-To: <7vslgjaa0c.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Junio C Hamano <junkio@cox.net> writes:
> > Han-Wen Nienhuys <hanwen@xs4all.nl> writes:
> >
> >> [hanwen@haring y]$ git pull ../x
> >> fatal: Needed a single revision
> >> Pulling into a black hole?
> 
> Having said all that, I happen to think that this particular
> case of pulling into void could deserve to be special cased to
> pretend it is a fast forward (after all, nothingness is an
> ancestor of anything), if only to make new people's first
> experience more pleasant.

If you make pushing into an empty repository work also, you fix the case of
"create an empty repo for somebody, let them fill it up remotely later".

[...]

> So please consider that this is classified as a bug.

Thanks!
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                    Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria             +56 32 2654239

* Re: [WISH] Store also tag dereferences in packed-refs
From: Junio C Hamano @ 2006-11-20 19:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0611200817330.3692@woody.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

>> For this particular one, there is no need for version 2.
>
> I don't think you understand.

That is true.  I was not thinking about optimizing the
lightweight tag case -- we would want to be able to tell that
they are not tag objects and skip peel_ref altogether, and in
order to do that we do need a way to tell us that we can trust
the absence of a peeled representation in the packed-refs file.

It is a day-job day for me, so I do not expect I can continue until
later today, though...

* Re: Feature request: git-pull -e/--edit
From: Eran Tromer @ 2006-11-20 19:10 UTC (permalink / raw)
  To: Horst H. von Brand; +Cc: Junio C Hamano, git
In-Reply-To: <200611201709.kAKH9or1012062@laptop13.inf.utfsm.cl>

On 2006-11-20 19:09, Horst H. von Brand wrote:
>>
>>   A------------F master
>>    \          /
>>     B--C--D--E
>>
>> Yes, E and F have identical trees. But it's actually *very useful*, if
>> the commit message at F says "merged branch foo containing experimental
>> bar from quux". And it shows up nicely when looking at gitk.
> 
> I don't see the usefulness of this. 

Just look up this thread for the most recent example: recording the text
of the "pull foo to get" and "[00/05] Fix quux" messages.

> And if quux merges back, she gets the same plus a new merge node, and...
> Linus told everybody (quite forcefully, I might add) that this is not
> acceptable for distributed development.

I've addressed this. Sure, it breaks down completely if done by default
when nothing new happens; but not when done judiciously. For real
problems to show up you'd need two people who both insist on always
using --force-commit when pulling each other. Inevitably, before long
they will realize the folly of their ways and stop doing that; problem
solved.

I expect common usage to be that --force-commit is only used by
maintainers, when pulling/applying non-trivial branches from downstream.
But this is a social convention that can be decided per project, and can
be ignored by anyone who decides to fork off. And if Linus doesn't like
it he can just avoid using it in his projects.

>> There are the obvious bad consequences if you make this the default,
>> but how about adding a --force-commit option to merge and pull?
> 
> Fast forward is fast forward. Merge is when /independent/ changes are
> integrated into one.

I was under the impression that git-merge is what (indirectly)
determines if joining multiple commits is a fast-forward or a real
merge. If it's in some other piece of git, please substitute that.

>> You'd need to educate users on how to use this responsibly
> 
> Looks like you've never met real users ;-)

No, it's really easy in this case: if someone asks you to pull a rotten
branch with too many forced merges, just refuse until he stops abusing
that option. It's not the default, right? There are plenty of much worse
non-default ways to damage history.

>> And to answer Linus: yes, it's expected that only non-leaf developers
>> will use --force-commit on regular basis, but that's not because
>> maintainers are technically special in any way. It's just because
>> maintainers have something useful to say ("someone's private topic
>> branch, starting at A and ending at E, has just been accepted into my
>> all-important public repo and here's why"). Anyone else can do the same
>> if he feels likewise.
> 
> But the individual changes will presumably reflect said someone's
> authorship. If they are interleaved with stuff by others or not doesn't
> make much (development) sense. Yes, it might be interesting for a software
> historian, but that's not git's main audience in the first place.

If the only thing you care about is the tree of the top commit, then
sure, those redundant commits are worthless. But then, why do you bother
with (for example) commit messages, or tag objects? Oh, you want to know
more about what happened and why? Then great, those "pull foo to get"
and "[00/05]" messages are probably the best place to start, if only we
had somewhere to save them.

We are all "software historians" when we look at some project's public
branches and try to grok what's going on recently, who's doing what and
along what workflow, and why it got there. This is useful information
that is not easily tracked by any other means; there's a reason this
comes up repeatedly in various guises, you know. I can't see why some
people are so eager to discard this information and tell others to use
munged-up shortlogs, instead of looking for ways to record as much as
possible with the least negative impact.

Now, --force-commit with appropriate usage conventions seems like a
reasonable tradeoff.

BTW, in principle another (better?) way to do it is by leaving the
commit DAG alone, and annotating it with tag objects where extra
information such as "[00/05]" is available. The problem is that git
doesn't have any scalable mechanism for adding such annotations. It's a
hard problem; nothing in the commit DAG points to tag objects, so you
have to scan some external store and that gets more expensive as the
repo grows. It also gets nasty in fetches.

  Eran

* Re: Feature request: git-pull -e/--edit
From: Petr Baudis @ 2006-11-20 18:11 UTC (permalink / raw)
  To: Horst H. von Brand; +Cc: Eran Tromer, Junio C Hamano, git
In-Reply-To: <200611201709.kAKH9or1012062@laptop13.inf.utfsm.cl>

On Mon, Nov 20, 2006 at 06:09:50PM CET, Horst H. von Brand wrote:
> Eran Tromer <git2eran@tromer.org> wrote:
> >   A------------F master
> >    \          /
> >     B--C--D--E
..snip..
> And if quux merges back, she gets the same plus a new merge node, and...
> Linus told everybody (quite forcefully, I might add) that this is not
> acceptable for distributed development.

Wrong: if quux merges back and does not do the same "force commit"
fast-forward (why would it, anyway - OP clearly said it's only if you
_want_ to make it explicit), quux won't get another merge but end up
with F as well. It all converges back nicely.

I can see how it could be useful.

> > You'd need to educate users on how to use this responsibly
> 
> Looks like you've never met real users ;-)

Yes, that is a real problem. ;-) But not adding features because users
could use them irresponsibly doesn't get you too far.

> > And to answer Linus: yes, it's expected that only non-leaf developers
> > will use --force-commit on regular basis, but that's not because
> > maintainers are technically special in any way. It's just because
> > maintainers have something useful to say ("someone's private topic
> > branch, starting at A and ending at E, has just been accepted into my
> > all-important public repo and here's why"). Anyone else can do the same
> > if he feels likewise.
> 
> But the individual changes will presumably reflect said someone's
> authorship.

You are personifying too much. Git setups where multiple people have
commit access are very common, and there's no reason to play them down
just because Git makes other setups easy.

> If they are interleaved with stuff by others or not doesn't make much
> (development) sense. Yes, it might be interesting for a software
> historian, but that's not git's main audience in the first place.

Tell that to Junio, our pickaxe guy. :^)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
The meaning of Stonehenge in Traflamadorian, when viewed from above, is:
"Replacement part being rushed with all possible speed."

* static linking lib order problem
From: Bennett Todd @ 2006-11-20 17:32 UTC (permalink / raw)
  To: git

I build git for Bent Linux, with

	make prefix=/usr NEEDS_LIBICONV=YesPlease

It generates compile and link lines that look like:

gcc -g -O2 -Wall  -DSHA1_HEADER='<openssl/sha.h>' -DNO_STRLCPY -o git-http-fetch   fetch.o http.o http-fetch.o \
                libgit.a xdiff/lib.a -lz  -liconv  -lcrypto -lcurl -lexpat

which produce vast numbers of errors, which look like

/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.1/../../../libcurl.a(ssluse.o)(.text.rand_enough+0x4): In function `rand_enough':
: undefined reference to `RAND_status'
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.1/../../../libcurl.a(ssluse.o)(.text.ossl_seed+0x2b): In function `ossl_seed':
: undefined reference to `RAND_load_file'
[...]
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.1/../../../libcurl.a(http_ntlm.o)(.text.Curl_output_ntlm+0x370): In function `Curl_output_ntlm':
: undefined reference to `MD5_Update'
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.1/../../../libcurl.a(http_ntlm.o)(.text.Curl_output_ntlm+0x37d): In function `Curl_output_ntlm':
: undefined reference to `MD5_Final'
collect2: ld returned 1 exit status
make: *** [git-http-fetch] Error 1

I've been kludging around it with this patch:

diff -ru git-1.4.4.orig/Makefile git-1.4.4/Makefile
--- git-1.4.4.orig/Makefile	2006-11-15 07:22:27.000000000 +0000
+++ git-1.4.4/Makefile	2006-11-15 20:49:26.000000000 +0000
@@ -439,7 +439,7 @@
 		BASIC_CFLAGS += -I$(CURLDIR)/include
 		CURL_LIBCURL = -L$(CURLDIR)/lib -R$(CURLDIR)/lib -lcurl
 	else
-		CURL_LIBCURL = -lcurl
+		CURL_LIBCURL = -lcurl -lssl -lcrypto
 	endif
 	PROGRAMS += git-http-fetch$X
 	curl_check := $(shell (echo 070908; curl-config --vernum) | sort -r | sed -ne 2p)

just because I didn't take the time to understand the git build
process's library conf system.

-Bennett

* tracking many cvs/svn/git remote archives
From: Randal L. Schwartz @ 2006-11-20 17:18 UTC (permalink / raw)
  To: git

Now that git-cvsimport and git-svn are mature, I'll share my script which I
call "get.cvs" to track a number of remote archives.  It's not extremely
general, but maybe it'll inspire someone else to generalize it.

I have ~/MIRROR/foo-GITSVN tracking a remote archive using git-svn, so
the name of the directory reflects the tracking mechanism.  There's also
*-CVS, *-SVN, *-GIT, and *-GITCVS.

For *-GITCVS, I have to keep the args for git-cvsimport around, so I
store that in the repository under getcvs.gitcvsargs.

For *-GITSVN, I have to force the head/origin to softlink to the proper remote
svn reference.

The *-GIT* merges are safe, because they won't pull over any uncommitted
entries, but they *will* merge into whatever the current branch is.  This
keeps any checked out tree trivially up to date, which is mostly what I'm
watching anyway.

Setting up *-GIT* generally requires checking out a master branch to really
track the files... I think I did this with "git-checkout -b master origin".

#!/bin/sh

cd && cd MIRROR || exit 1

case $# in
    0) set -- '*';;
esac

eval set -- "$@"

trap ':' 2
for i in "$@"
do (
        trap - 2
        cd "$i" || exit
        echo "== $i =="
        case $i in
            *-CVS) cvs -q update;;
            *-SVN) svn update;;
            *-GIT*)
                ## first, update "origin":
                case $i in
                    *-GIT)
                        git-fetch
                        ;;
                    *-GITCVS)
                        git-cvsimport -k -i $(git-repo-config getcvs.gitcvsargs)
                        ;;
                    *-GITSVN)
                        ## be sure to have origin "ref: refs/remotes/git-svn"
                        git-svn multi-fetch
                        ;;
                esac
                if git-status | grep -v 'nothing to commit'
                then echo UPDATE SKIPPED
                else
                    if git-pull . origin | egrep -v 'up-to-date'
                    then
                        git log --no-merges ORIG_HEAD.. | git shortlog
                    fi
                fi
                ;;
            *)
                echo "[ignoring]";;
        esac
        )
done

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.

* Re: Feature request: git-pull -e/--edit
From: Horst H. von Brand @ 2006-11-20 17:09 UTC (permalink / raw)
  To: Eran Tromer; +Cc: Junio C Hamano, git
In-Reply-To: <4561B0B5.1020305@tromer.org>

Eran Tromer <git2eran@tromer.org> wrote:

[...]

> What about fast forwards? Do you get to record the explanation for the
> series only if the guy you pulled from didn't bother to do a rebase?
> That's broken.
> 
> Let's face it, the merge commits generated when pulling have two
> completely independent uses:
> 1. They're technically necessary for joining DAG nodes that don't all
>    lie on one path.
> 2. They're useful as a record of workflow and a place to put comments.
> 
> The two uses are nearly independent.
> Consider the following silly DAG.
> 
>   A------------F master
>    \          /
>     B--C--D--E
> 
> Yes, E and F have identical trees. But it's actually *very useful*, if
> the commit message at F says "merged branch foo containing experimental
> bar from quux". And it shows up nicely when looking at gitk.

I don't see the usefulness of this. 

> Of course, you could just fast-forward instead:
> 
>   A--B--C--D--E master

Yep.

> but then you lose a meaningful and useful part of the historical record.

And if quux merges back, she gets the same plus a new merge node, and...
Linus told everybody (quite forcefully, I might add) that this is not
acceptable for distributed development.

> There are the obvious bad consequences if you make this the default,
> but how about adding a --force-commit option to merge and pull?

Fast forward is fast forward. Merge is when /independent/ changes are
integrated into one.

> You'd need to educate users on how to use this responsibly

Looks like you've never met real users ;-)

>                                                            to avoid
> noise, but that's not any different from existing stuff like rebase and
> revert. Most users won't even know it exists.

> And to answer Linus: yes, it's expected that only non-leaf developers
> will use --force-commit on regular basis, but that's not because
> maintainers are technically special in any way. It's just because
> maintainers have something useful to say ("someone's private topic
> branch, starting at A and ending at E, has just been accepted into my
> all-important public repo and here's why"). Anyone else can do the same
> if he feels likewise.

But the individual changes will presumably reflect said someone's
authorship. If they are interleaved with stuff by others or not doesn't
make much (development) sense. Yes, it might be interesting for a software
historian, but that's not git's main audience in the first place.

* Re: [PATCH] git-merge: make it usable as the first class UI
From: Horst H. von Brand @ 2006-11-20 17:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, linux
In-Reply-To: <7vu00u4e2d.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:

[...]

> -- >8 --
> [PATCH] git-merge: make it usable as the first class UI
> 
> This teaches the oft-requested syntax
> 
> 	git merge $commit
> 
> to implement merging the named commit to the current branch.
> This hopefully would make "git merge" usable as the first class
> UI instead of being a mere backend for "git pull".
> 
> Most notably, $commit above can be any committish, so you can
> say for example:
> 
> 	git merge js/shortlog~2
> 
> to merge early part of a topic branch without merging the rest
> of it.

"Early part", i.e., branch js/shortlog up to js/shortlog~2 or just that one
commit?

> A custom merge message can be given with the new --message=<msg>
> parameter.  The message is prepended in front of the usual
> "Merge ..." message autogenerated with fmt-merge-message.

Why not -m too (consistency!)?

* Re: git-diff opens too many files?
From: Linus Torvalds @ 2006-11-20 17:00 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy, Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <fcaeb9bf0611200212s6ddb0518k24f85223acfed08b@mail.gmail.com>

On Mon, 20 Nov 2006, Nguyen Thai Ngoc Duy wrote:
>
> I got this error in a quite big (in files) repository:
> error: open("vnexpress.net/Suc-khoe/2001/04/3B9AF976"): Too many open
> files in system

Ok, "too many open files in system" is ENFILE - you haven't run out of 
file descriptors in _one_ process, but you've exceeded the total number of 
file descriptors in the whole system.

Which is not because we forget to close() something, but because we're 
keeping file descriptors busy another way.
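
For illustration only, a minimal C sketch of the EMFILE/ENFILE distinction
described above. The helper names `open_limit_hint` and `open_with_hint`
are hypothetical (they are not git functions), and the
/proc/sys/fs/file-max hint assumes Linux.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical helper: translate an open(2) errno into a hint about
 * which descriptor limit was exhausted, or NULL for other errors. */
const char *open_limit_hint(int err)
{
    switch (err) {
    case EMFILE: /* this process has no free descriptor slots */
        return "per-process limit reached (see ulimit -n)";
    case ENFILE: /* the kernel's system-wide file table is full */
        return "system-wide limit reached (see /proc/sys/fs/file-max on Linux)";
    default:
        return NULL;
    }
}

/* Hypothetical wrapper: open read-only and, on failure, store a hint.
 * errno is read immediately after open() so nothing can clobber it. */
int open_with_hint(const char *path, const char **hint)
{
    int fd;

    assert(path != NULL && hint != NULL);
    fd = open(path, O_RDONLY);
    *hint = fd < 0 ? open_limit_hint(errno) : NULL;
    return fd;
}
```

An ENFILE result says the whole machine is out of file slots, so closing
a few of your own descriptors is not guaranteed to help.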

> fatal: cannot hash vnexpress.net/Suc-khoe/2001/04/3B9AF976

Hmm. We keep files mmap'ed in "git diff" for possibly too long. What 
happens is that we mmap a file that we want to diff when we start the 
whole thing, and keep it mapped over the whole diff session, because we're 
potentially going to need to compare it against other files (ie rename 
detection etc). And then we unmap it only at the end (in "diff_flush()" -> 
"diff_free_filepair()" -> "diff_free_filespec_data()").

And that's normally great, and means that we don't need to worry about the 
file data (we map it once, and can keep it in memory), but yeah, if you 
have thousands of files changed, you'll have thousands of mappings. And 
each one will have a pointer to a "struct file" inside the kernel. 
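
A small C sketch, assuming a POSIX system, of why closing the descriptor
does not help here: the mapping returned by the hypothetical `map_file()`
helper stays usable after `close()`, because the kernel keeps its own
reference to the open file until `munmap()`. This is not git's code, only
an illustration of the behaviour described above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and drop the
 * descriptor right away. Returns NULL on error or for empty files. */
void *map_file(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);  /* mmap() rejects zero-length mappings */
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the descriptor is released here ... */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;   /* ... but the kernel file object lives until munmap() */
}
```

So a process can hold thousands of such mappings with almost no open
descriptors, and still fill the system-wide file table.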

What OS/distro is this? Normally, you shouldn't have that low a limit on the
number of open files, but we do end up potentially opening thousands.

For example, under Linux, you can do this:

	# in one terminal window, do:
	while : ; do cat /proc/sys/fs/file-nr ; sleep 1; done

	# in another one:
	cd linux-repo
	git ls-files '*.c' | xargs touch
	git diff

and if it looks anything like mine, it could be:

	2464    0       349662
	2464    0       349662
	2464    0       349662
*	5920    0       349662
**	7616    0       349662
***	9024    0       349662
****	10944   0       349662
	2464    0       349662
	2464    0       349662

(see how the number of active files grows by thousands).
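
To read those columns programmatically, here is a hedged C sketch
(Linux-only; `struct file_nr`, `parse_file_nr` and `read_file_nr` are
made-up names). The three fields of /proc/sys/fs/file-nr are the number
of allocated file handles, the number of allocated but unused handles,
and the system-wide maximum; the first column is the one that grows in
the samples above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the three /proc/sys/fs/file-nr fields. */
struct file_nr {
    unsigned long allocated;  /* file handles currently allocated */
    unsigned long unused;     /* allocated but unused handles */
    unsigned long max;        /* system-wide maximum (file-max) */
};

/* Parse one line such as "5920    0       349662"; 0 on success. */
int parse_file_nr(const char *line, struct file_nr *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%lu %lu %lu",
                  &out->allocated, &out->unused, &out->max) == 3 ? 0 : -1;
}

/* Read the live values (Linux only); -1 if the file is unavailable. */
int read_file_nr(struct file_nr *out)
{
    char buf[128];
    FILE *f = fopen("/proc/sys/fs/file-nr", "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = fgets(buf, sizeof(buf), f) != NULL;
    fclose(f);
    return ok ? parse_file_nr(buf, out) : -1;
}
```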

Anyway, there are two possible solutions:

 - simply make sure that you can have that many open files. 

   If it's a Linux system, just increase the value of the file
   /proc/sys/fs/file-max, and you're done. Of course, if you're not the 
   admin of the box, you may need to ask somebody else to do it for you..

 - we could try to make git not keep them mmap'ed for the whole time. 

Junio? This is your speciality, I'm not sure how painful it would be to 
unmap and remap on demand.. (or switch it to some kind of "keep the last 
<n> mmaps active" kind of thing to avoid having thousands and thousands of 
mmaps active).

One simple thing that might be worth it is to simply _not_ use mmap() at 
all for small files. If a file is less than 1kB, it might be better to do 
a malloc() and a read() - partly because it avoids having tons of file 
descriptors, but partly because it's also more efficient from a virtual 
memory usage perspective (not that you're probably very likely to ever 
really hit that problem in practice).

Nguyen - that "use malloc+read" thing might be a quick workaround, but 
only if you have tons of _small_ files (and if you can't easily just 
increase file-max). 
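
A rough C sketch of the "malloc+read for small files" idea, assuming
POSIX open/read/mmap. The 1024-byte cutoff, `struct file_data`,
`load_file()` and `release_file()` are hypothetical; this is not how
git's diff code is organized, only an illustration of the trade-off.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical cutoff taken from the "less than 1kB" suggestion. */
#define SMALL_FILE_LIMIT 1024

struct file_data {
    void *buf;
    size_t len;
    int is_mmapped;  /* 1: release with munmap(), 0: release with free() */
};

/* Load a file with malloc()+read() when it is small, mmap() otherwise.
 * The descriptor is closed before returning in both cases.
 * Returns 0 on success, -1 on error. */
int load_file(const char *path, struct file_data *out)
{
    struct stat st;
    size_t done = 0;
    int fd;

    assert(path != NULL && out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0)
        goto fail;
    out->len = (size_t)st.st_size;

    if (out->len < SMALL_FILE_LIMIT) {
        out->is_mmapped = 0;
        out->buf = malloc(out->len ? out->len : 1);
        if (out->buf == NULL)
            goto fail;
        while (done < out->len) {
            ssize_t n = read(fd, (char *)out->buf + done, out->len - done);
            if (n < 0 && errno == EINTR)
                continue;          /* interrupted: retry */
            if (n <= 0) {          /* error or unexpected EOF */
                free(out->buf);
                goto fail;
            }
            done += (size_t)n;
        }
    } else {
        out->is_mmapped = 1;
        out->buf = mmap(NULL, out->len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (out->buf == MAP_FAILED)
            goto fail;
    }
    close(fd);
    return 0;

fail:
    close(fd);
    return -1;
}

/* Release whatever load_file() allocated. */
void release_file(struct file_data *f)
{
    if (f->is_mmapped)
        munmap(f->buf, f->len);
    else
        free(f->buf);
    f->buf = NULL;
}
```

A malloc'ed copy holds no kernel file reference once the descriptor is
closed, while each live mapping still does; that is the whole point of
treating small files differently.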

* Re: [WISH] Store also tag dereferences in packed-refs
From: Jakub Narebski @ 2006-11-20 11:33 UTC (permalink / raw)
  To: Junio C Hamano
In-Reply-To: <7vu00u2wln.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> ... I mean you trust (use) reference
>> info from packed-refs, but don't trust lack of dereference in
>> packed-refs.
> 
> That is exactly what the code does (at least that was the intent;
> there could be bugs since I am not Linus ;-).

The question is: is it more common to have a very large number
of heavyweight tags (tag objects), or a very large number of
lightweight tags (refs to commit objects)?

In the latter case, not trusting the lack of a dereference
means no gain in performance (although for the core, checking the type
of an object is faster (much faster?) than peeling a tag, so the gain
wouldn't be large), although that solution is probably safer.

Still, the decision (do not trust the lack of a dereference in
packed-refs, or mark packed-refs as having dereferences and trust
the lack of one) is fairly orthogonal to the format used for peeled
entries in packed-refs.

P.S. I have just noticed that you have taken the discussion
off-list...
-- 
Jakub Narebski

* Re: [WISH] Store also tag dereferences in packed-refs
From: Linus Torvalds @ 2006-11-20 16:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vr6vy7smi.fsf@assigned-by-dhcp.cox.net>

On Sun, 19 Nov 2006, Junio C Hamano wrote:

> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > So I'd suggest adding - at the very top of the ref-pack file - a line line
> >
> > 	# Ref-pack version 2
> >
> > which will be ignored by the current ref-pack reader (again, because it's 
> > not a valid ref line), but we can use it in the future to specify further 
> > extensions if we want to.
> >
> > Now somebody would just need to implement that ;)
> 
> For this particular one, there is no need for version 2.

I don't think you understand.

> My current wip does:
> 
> 	SHA-1 SP name LF
> 	SHA-1 SP SP name^{} LF

I think that's ugly and redundant (if "name" is ever different from the 
line above it, that would be a bug), but that's not the real problem.

The real problem is (go back to the mail that you answered, and snipped 
the explanation from) this:

 - you have a thousand tags

 - NONE of them are "tag objects".

 - as a result your ref-pack file doesn't have a _single_ one of the ^{} lines

Think about it. How do you know whether you should look up the tag objects 
for "-d" or not?

The answer is: you don't. You can't tell a "version 1" and "version 2" 
file apart. It might be an old "version 1" file that simply doesn't _have_ 
dereference information. Or it might be a "version 2" file that _does_ 
have dereference information, but nothing to dereference.

So you either have to:

 - look up each object again to see if it's a tag that should be 
   dereferenced

OR:

 - add a "# ref-pack version 2" flag at the top of the file.

So it's not about "parsing" the new file structure. I realize that parsing 
it is trivial. It's simply about knowing whether the new information 
_could_ be there or not.

And once you have that flag, your _future_ extensions can add their own 
version, which is an added bonus. But that means that "version 2" parsing 
should _also_ ignore lines that it cannot match, so you'd better have an 
escape from the new format. I personally think that using

	^<sha1><lf>

instead of "<sha1><space><space><name>^{}<lf>" is better partly for that 
reason: it's not only denser, it is "stricter" in the sense that there's 
less room for some future extended version that could be mistaken for a 
"version 2 unpeeling" line. 

(But you can do the same thing with your version too. You should:
 - check that there is just _one_ extra space
 - verify that the name matches the previous one
 - verify that it ends exactly with "^{}", so that any future extension 
   could add their own flags at the end.)
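
A hedged C sketch of the three checks listed above, applied to the
"SHA-1 SP SP name^{} LF" line format. `is_peeled_line()` is a
hypothetical helper; it assumes the LF has already been stripped and
that the caller remembers the name from the preceding ref line.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define HEXSZ 40  /* length of a hex SHA-1 */

/* 1 if the first HEXSZ characters of s are hex digits. Stops at the
 * first non-hex character, so it never reads past a shorter string. */
static int is_hex_sha1(const char *s)
{
    for (int i = 0; i < HEXSZ; i++)
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Check a line against "<sha1> SP SP <name>^{}": exactly one extra
 * space, the name matches the preceding ref line, and the line ends
 * with exactly "^{}" (the exact-length test enforces all three). */
int is_peeled_line(const char *line, const char *prev_name)
{
    size_t len, nlen;

    assert(line != NULL && prev_name != NULL);
    len = strlen(line);
    nlen = strlen(prev_name);
    if (len != HEXSZ + 2 + nlen + 3)
        return 0;
    return is_hex_sha1(line) &&
           line[HEXSZ] == ' ' && line[HEXSZ + 1] == ' ' &&
           memcmp(line + HEXSZ + 2, prev_name, nlen) == 0 &&
           strcmp(line + HEXSZ + 2 + nlen, "^{}") == 0;
}
```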

But regardless of which format chosen, you need the flag of "this format 
is in use", exactly because the extended unpeeling information might not 
_exist_.

Oh, and regardless of which format chosen, you'd need to verify that the 
unpeeled object in the pack wasn't overridden, of course. 
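
A minimal C sketch, under stated assumptions, of a reader for the
alternative proposed above: a "# Ref-pack version 2" header on the first
line, "<sha1> <name>" ref lines, and "^<sha1>" lines carrying the peeled
value of the preceding ref. All names (`parse_packed_refs`,
`needs_object_lookup`, the fixed-size arrays) are hypothetical, and the
sketch skips the last check mentioned above, i.e. whether a loose ref
overrides the packed one.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define HEXSZ 40
#define MAX_NAME 256
#define MAX_REFS 64

/* Header line proposed in the mail; its presence means "peeled
 * entries were written for every tag object". */
static const char PACK_HEADER[] = "# Ref-pack version 2";

struct packed_ref {
    char sha1[HEXSZ + 1];
    char peeled[HEXSZ + 1];
    int has_peeled;
    char name[MAX_NAME];
};

struct packed_refs {
    int peel_complete;  /* 1 if the header was seen on the first line */
    int nr;
    struct packed_ref refs[MAX_REFS];
};

static int is_hex_sha1(const char *s)
{
    for (int i = 0; i < HEXSZ; i++)
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Parse a NUL-terminated packed-refs buffer. Lines that match none of
 * the known shapes are ignored, so later extensions stay harmless. */
void parse_packed_refs(struct packed_refs *pr, const char *buf)
{
    int lineno = 0;

    assert(pr != NULL && buf != NULL);
    memset(pr, 0, sizeof(*pr));  /* also NUL-terminates every field */
    while (*buf) {
        const char *eol = strchr(buf, '\n');
        size_t len = eol ? (size_t)(eol - buf) : strlen(buf);

        if (lineno == 0 && len == sizeof(PACK_HEADER) - 1 &&
            memcmp(buf, PACK_HEADER, len) == 0) {
            pr->peel_complete = 1;
        } else if (len == HEXSZ + 1 && buf[0] == '^' &&
                   is_hex_sha1(buf + 1) && pr->nr > 0) {
            /* "^<sha1>": peeled value of the ref on the previous line */
            struct packed_ref *last = &pr->refs[pr->nr - 1];
            memcpy(last->peeled, buf + 1, HEXSZ);
            last->has_peeled = 1;
        } else if (len > HEXSZ + 1 && len - HEXSZ - 1 < MAX_NAME &&
                   pr->nr < MAX_REFS && is_hex_sha1(buf) &&
                   buf[HEXSZ] == ' ') {
            /* "<sha1> <name>": an ordinary ref line */
            struct packed_ref *r = &pr->refs[pr->nr++];
            memcpy(r->sha1, buf, HEXSZ);
            memcpy(r->name, buf + HEXSZ + 1, len - HEXSZ - 1);
        }
        lineno++;
        buf += len;
        if (*buf == '\n')
            buf++;
    }
}

/* Must we read the object itself to learn whether r points to a tag?
 * Only when no peeled line was seen AND the file does not promise
 * that peeled lines are complete (an old "version 1" file). */
int needs_object_lookup(const struct packed_refs *pr,
                        const struct packed_ref *r)
{
    return !r->has_peeled && !pr->peel_complete;
}
```

Without the header, a ref with no "^" line could still be an unpeeled tag
object, so only the header makes the absence of a peeled line trustworthy.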

		Linus

* Re: git-diff opens too many files?
From: Johannes Schindelin @ 2006-11-20 15:48 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git
In-Reply-To: <fcaeb9bf0611200732y777868c5lb9d9061a4522de97@mail.gmail.com>

Hi,

On Mon, 20 Nov 2006, Nguyen Thai Ngoc Duy wrote:

> git diff |LANG=C grep '^-'|LANG=C grep -v '^--'|LANG=C sort|LANG=C uniq -c

I looked in the code, but nothing jumps out at me. There is a place
where files are mmap()ed, but AFAICT they are munmap()ed just after
diffing. Could you run it with strace and find unclosed open calls?

Ciao,
Dscho

* Re: git-diff opens too many files?
From: Nguyen Thai Ngoc Duy @ 2006-11-20 15:32 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0611201620070.13772@wbgn013.biozentrum.uni-wuerzburg.de>

On 11/20/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Mon, 20 Nov 2006, Nguyen Thai Ngoc Duy wrote:
>
> > I got this error in a quite big (in files) repository:
> > error: open("vnexpress.net/Suc-khoe/2001/04/3B9AF976"): Too many open
> > files in system
> > fatal: cannot hash vnexpress.net/Suc-khoe/2001/04/3B9AF976
> >
> > The repository contained about 67.000 files and probably all were modified.
> > git version 1.4.4.rc1.g2bba
>
> What was the command line? "git diff" really is a wrapper around different
> programs...

"git diff" from working directory with no argument. The real command is

git diff |LANG=C grep '^-'|LANG=C grep -v '^--'|LANG=C sort|LANG=C uniq -c

(I squeezed blank lines, so I wanted to check that it just did that,
nothing else)

> Ciao,
> Dscho
>
>


-- 

* Re: git-diff opens too many files?
From: Johannes Schindelin @ 2006-11-20 15:20 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git
In-Reply-To: <fcaeb9bf0611200212s6ddb0518k24f85223acfed08b@mail.gmail.com>

Hi,

On Mon, 20 Nov 2006, Nguyen Thai Ngoc Duy wrote:

> I got this error in a quite big (in files) repository:
> error: open("vnexpress.net/Suc-khoe/2001/04/3B9AF976"): Too many open
> files in system
> fatal: cannot hash vnexpress.net/Suc-khoe/2001/04/3B9AF976
> 
> The repository contained about 67.000 files and probably all were modified.
> git version 1.4.4.rc1.g2bba

What was the command line? "git diff" really is a wrapper around different 
programs...

Ciao,
Dscho

* amendment to "remove merge-recursive-old"
From: Johannes Schindelin @ 2006-11-20 15:13 UTC (permalink / raw)
  To: junkio, git

Hi,

if git-merge-recursive-old.py is removed, gitMergeCommon.py should be 
removed, too.

Ciao,
Dscho

* Re: [PATCH 2/2] gitweb: Refactor feed generation, make output prettier, add Atom feed
From: Jakub Narebski @ 2006-11-20 14:45 UTC (permalink / raw)
  To: git
In-Reply-To: <11639451253906-git-send-email-jnareb@gmail.com>

Jakub Narebski wrote:

> Allow for feed generation for branches other than current (HEAD)
> branch, and for generation of feeds for file or directory history.

By the way, which feed format should branch feeds use? Atom or RSS?
Or perhaps make this configurable? But which way:
  our $default_feed_format = 'atom';
or a 'feedformat' feature?
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: Feature request: git-pull -e/--edit
From: Eran Tromer @ 2006-11-20 13:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v8xi67qhq.fsf@assigned-by-dhcp.cox.net>

On 2006-11-20 05:21, Junio C Hamano wrote:
> linux@horizon.com writes:

>> (Indeed, it might be nice to come up with a way of including a piece of
>> the "please pull" e-mail, similar to the way that git-applypatch works.)
> 
> That is a lot more relevant example.  For example, I could
> imagine that Linus coming up with a wrapper that is fed a series
> of e-mails and:
> 
[snip]
>    - otherwise annotate the commit message with the explanation
>      of the series taken from the pull request message.
[snip]
>  - People can say "git pull -m 'I am doing this merge for such
>    and such reason' $URL $branch" to _include_ that message in
>    the resulting merge commit;
> 
>  - The same can be said about "git merge -m 'comment' $branch".

What about fast forwards? Do you get to record the explanation for the
series only if the guy you pulled from didn't bother to do a rebase?
That's broken.

Let's face it, the merge commits generated when pulling have two
completely independent uses:
1. They're technically necessary for joining DAG nodes that don't all
   lie on one path.
2. They're useful as a record of workflow and a place to put comments.

The two uses are nearly independent.
Consider the following silly DAG.

  A------------F master
   \          /
    B--C--D--E

Yes, E and F have identical trees. But it's actually *very useful*, if
the commit message at F says "merged branch foo containing experimental
bar from quux". And it shows up nicely when looking at gitk.

Of course, you could just fast-forward instead:

  A--B--C--D--E master

but then you lose a meaningful and useful part of the historical record.

There are the obvious bad consequences if you make this the default,
but how about adding a --force-commit option to merge and pull?
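
To make the proposal concrete, a toy C sketch of the decision a pull would
make, assuming a hand-built commit graph like the one drawn above.
`struct commit`, `is_ancestor()` and `decide_pull()` are hypothetical, and
`force_commit` only models the proposed --force-commit option; it is not
an existing git flag.

```c
#include <assert.h>
#include <stddef.h>

/* Toy commit: a name and up to two parents (NULL when absent). */
struct commit {
    const char *name;
    const struct commit *parent[2];
};

/* 1 if `anc` is `c` or reachable from `c` through parent links.
 * Plain recursion without a visited set: fine for a toy graph, but it
 * can revisit shared history many times in a real repository. */
int is_ancestor(const struct commit *anc, const struct commit *c)
{
    if (c == NULL)
        return 0;
    if (c == anc)
        return 1;
    return is_ancestor(anc, c->parent[0]) || is_ancestor(anc, c->parent[1]);
}

enum pull_result { UP_TO_DATE, FAST_FORWARD, MERGE_COMMIT };

/* Decide what pulling `theirs` into `head` should do. force_commit
 * models the proposed option: record a merge commit even when a
 * fast-forward would be possible. */
enum pull_result decide_pull(const struct commit *head,
                             const struct commit *theirs,
                             int force_commit)
{
    assert(head != NULL && theirs != NULL);
    if (is_ancestor(theirs, head))
        return UP_TO_DATE;
    if (is_ancestor(head, theirs) && !force_commit)
        return FAST_FORWARD;
    return MERGE_COMMIT;
}
```

With the A..F graph, pulling E into A fast-forwards unless force_commit is
set, in which case a merge commit like F is recorded; a repository still
at E that later pulls F simply fast-forwards to F.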

You'd need to educate users on how to use this responsibly to avoid
noise, but that's not any different from existing stuff like rebase and
revert. Most users won't even know it exists.

And to answer Linus: yes, it's expected that only non-leaf developers
will use --force-commit on regular basis, but that's not because
maintainers are technically special in any way. It's just because
maintainers have something useful to say ("someone's private topic
branch, starting at A and ending at E, has just been accepted into my
all-important public repo and here's why"). Anyone else can do the same
if he feels likewise.

* Re: Patch to tutorial.txt
From: Petr Baudis @ 2006-11-20 13:13 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Paolo Ciarrocchi, git
In-Reply-To: <200611200949.32722.jnareb@gmail.com>

On Mon, Nov 20, 2006 at 09:49:31AM CET, Jakub Narebski wrote:
> On Mon, 20 Nov 2006, Paolo Ciarrocchi wrote:
> > On 11/19/06, Jakub Narebski <jnareb@gmail.com> wrote:
> >> Paolo Ciarrocchi wrote:
> >> [...]
> >>>  ------------------------------------------------
> >>>
> >>>  at this point the two branches have diverged, with different changes
> >>> -made in each.  To merge the changes made in the two branches, run
> >>> +made in each.  To merge the changes made in experimental into master run
> >>
> >> I would rather say:
> >>   To merge the changes made in the two branches into master, run
> > 
> > Why Jakub? There are only two branches, master and experimental.
> > While sitting in master and doing git pull . experimental I would
> > expect to merge what I did in experimental into master. Changes made in
> > master are already merged in master. Am I wrong?
> 
> For me, "merge" in "to merge the changes" phrase is merge in common-sense
> meaning of the word, not the SCM jargon. Merge the changes == join the
> changes, so you have to give both sides, both changes you join.
> 
> Merge the changes == take changes in branch 'experimental' since forking,
> take changes in branch 'master' since forking, join those changes
> together (merge), and put the result of this joining (this merge) into
> branch 'master'.
> 
> On the contrary, in "merge branch 'experimental' into 'master'" phrase
> "merge" is in the SCM meaning of this word.

I personally find the SCM meaning much less confusing, but I can't tell
how much I've been contaminated already: "merge in the two branches
into master" really strongly suggests to me that it's about some _other_
two branches.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
The meaning of Stonehenge in Traflamadorian, when viewed from above, is:
"Replacement part being rushed with all possible speed."

^ permalink raw reply

* Re: [WISH] Store also tag dereferences in packed-refs
From: Marco Costalba @ 2006-11-20 12:56 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <ejrt37$fg9$1@sea.gmane.org>

> >
> > For this particular one, there is no need for version 2.
>
> Actually, I think it is both true and untrue. True, because we need some
> indicator that we trust packed-refs file to provide tag dereferences to
> distinguish between the case when there are no tag objects at all, so there
> are no tag dereferences in packed-refs, and the situation where we use
> packed-refs generated by older git, and there are no tag dereferences in
> packed-refs because git didn't save it.
>

We should _always_ be able to handle ambiguous/malformed lines correctly
and gracefully, without trusting a version number, so that we are not
vulnerable to attacks with a maliciously malformed file.

Anyway, versioning the file format is a common and sensible practice.
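A minimal Python sketch of the "validate every line, trust nothing" parsing argued for here. It assumes the line layout git uses for packed-refs (a 40-hex object name, a space and a ref name; an optional "^<40-hex>" line right after a ref giving the peeled tag target; "#" comment/header lines). The function name and the choice to raise ValueError on malformed input are illustrative assumptions, not git's code:

```python
import re

SHA1 = re.compile(r"[0-9a-f]{40}")


def parse_packed_refs(text):
    """Parse packed-refs content into {refname: (object_id, peeled_id_or_None)}.

    Every line is validated on its own; a malformed line raises ValueError
    instead of being accepted because of a header or version marker.
    """
    refs = {}
    last_ref = None  # the ref that a following "^" (peeled) line applies to
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue  # tolerate blank lines; header/comment lines carry no refs
        if line.startswith("^"):
            peeled = line[1:]
            if last_ref is None or not SHA1.fullmatch(peeled):
                raise ValueError(f"line {lineno}: malformed or orphaned peel line")
            oid, _ = refs[last_ref]
            refs[last_ref] = (oid, peeled)
            last_ref = None  # at most one peel line per ref
            continue
        oid, sep, name = line.partition(" ")
        if not sep or not SHA1.fullmatch(oid) or not name.startswith("refs/"):
            raise ValueError(f"line {lineno}: malformed ref line")
        refs[name] = (oid, None)
        last_ref = name
    return refs
```

Note that the sketch still cannot tell "this ref is not an annotated tag" from "an older writer never recorded the peeled value"; both come back as None. That is exactly the ambiguity Jakub describes, and why some indicator in the file remains useful even when every line is checked.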



^ permalink raw reply

* Re: Rename detection at git log
From: Andy Parkins @ 2006-11-20 12:16 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano
In-Reply-To: <7vfyce2w7d.fsf@assigned-by-dhcp.cox.net>

On Monday 2006 November 20 11:28, Junio C Hamano wrote:

> If people are well disciplined, code refactoring (which can
> trigger rename/copy detection) tends to affect both source and
> destination files at the same time, so many times -C finds what
> you want without --find-copies-harder.

That's true; however, I don't think that refactoring is the common operation.  
Usually it's (as Jakub says) copy-and-modify-the-copy.  In that case the 
original is untouched.

> Having said all that, I think the rename/copy as a wholesale
> operation on one file is an uninteresting special case.  The
> generic case that happens far more often in practice is the
> lines moving around across files, and the new "git blame" gives
> you a better picture to answer "where the heck did this come from"
> question.

To help the version control system underneath, I have always followed the
discipline of never copying/moving and modifying a file in the same commit.  git has the 
potential to remove this necessity, but I'd still like all my old commits to 
have the copies detected correctly.

As an example: I've got a colleague who works on a project where each new 
version begins as a copy of the old one (it's not the way I'd work, but I 
think git is flexible enough to cope with anything).  So, project1/ exists 
and is copied to project2/ to begin work.  I suppose this is effectively 
branching using the filesystem rather than the version control system.  I 
noticed (and was surprised) that git didn't detect this as a copy.  No files 
were changed in the copy, so I thought git would easily spot this.

The problem is that the next project can be a copy of either project1/ or 
project2/.  All this has already gone on for a few years.  I've recently 
imported this into git and was examining the history.  I wanted to know for a 
particular subdirectory (of many) which of the others it was based on.  I 
was in qgit, and found that the commit didn't show as a copy; it showed as a 
creation, and hence I couldn't tell which was the parent project.  It's a shame 
because all the mechanisms are there to show the operation, it just isn't 
shown (without --find-copies-harder).

git-blame is obviously of huge use for these detailed analyses of individual 
line history.  However, in the simple case of a commit being a 100% copy of 
another file, git lets me down.  In fact, in the case described above, it 
wouldn't necessarily help me.  What if it went like this:

project1/ copied to project2/
project2/ copied to project3/

git-blame on a file in project3/ will show that its contents came from a 
project1 commit, whereas I want to know its direct parent.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE

^ permalink raw reply

* Re: Rename detection at git log
From: Andy Parkins @ 2006-11-20 11:59 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebski
In-Reply-To: <ejs2lp$2r4$1@sea.gmane.org>

On Monday 2006 November 20 11:15, Jakub Narebski wrote:

> I'm not sure about this. You usually both do pure renames (to reorganize
> files, to give file a better name) and renames with modification, but
> I don't think that copy without modification is very common. Usually you
> copy a file because you take one file as template for the other, or you
> split file, or you join files into one file.

Exactly - unfortunately it's the /source/ that has to be modified to be 
included in the potential list.  Who copies a file then modifies the 
original?  The copy is by definition already one of the modified files.

"For performance reasons, by default, -C option finds copies only if the 
original file of the copy was modified in the same changeset. This flag makes"

Your points about copy-and-change accepted.  Hash comparison is not 
sufficient.

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIEE

^ permalink raw reply

* Re: Rename detection at git log
From: Alexander Litvinov @ 2006-11-20 11:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vejry5t4g.fsf@assigned-by-dhcp.cox.net>

> What it means is that "git log" will look at path that matches
> b/a (that means b/a/c and b/a/d are looked at, if b/a were a
> directory).  Since path "a" which is what the file was
> originally at is not something the pattern b/a matches, there is
> no way b/a is noticed as a rename from a.

I have found that git blame shows the correct commits for this case. But I
still have trouble when examining the file's history. I have found I can use

	git show -C -M commit-sha1

on the commit where the file was created to see whether it was a rename :-)
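Junio's explanation comes down to path limiting filtering the changes before rename detection ever sees them. A simplified Python sketch, assuming plain directory-prefix pathspecs (no wildcards) and a hypothetical list of (status, path) changes; the function name is illustrative, not git's API:

```python
def pathspec_matches(pathspec, path):
    """True if `path` is `pathspec` itself or lies underneath it as a directory.

    Simplified: plain directory prefixes only, matched on whole path components.
    """
    spec = pathspec.rstrip("/")
    return path == spec or path.startswith(spec + "/")


# Hypothetical change list for the commit that moved file "a" to "b/a":
# ("D", path) is a deletion, ("A", path) an addition, ("M", path) a modification.
changes = [("D", "a"), ("A", "b/a"), ("M", "other/file")]

# Path limiting keeps only the changes under "b/a" ...
visible = [change for change in changes if pathspec_matches("b/a", change[1])]
print(visible)  # [('A', 'b/a')]

# ... so the deletion of "a" is gone, and there is nothing left to pair the
# addition of "b/a" with as a rename.
```

Matching on whole components is why "b/ab" would not match the pathspec "b/a", while "b/a/c" would.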

^ permalink raw reply

* Re: Rename detection at git log
From: Junio C Hamano @ 2006-11-20 11:32 UTC (permalink / raw)
  To: jnareb; +Cc: git
In-Reply-To: <ejs2lp$2r4$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> That said, it should be fairly easy (if not that useful in true projects
> as I understand it, as stated above) to add to copy detection detection of
> pure copies by comparing hashes.

That is already done as a performance measure (notice the double
loop controlled with "contents_too" in diffcore_rename()).
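Detecting pure copies is cheap because identical contents have identical object names: a git blob's name is the SHA-1 of the header "blob <size>\0" followed by the contents. A Python sketch of an exact-match pass; the dict-of-bytes inputs, the project1/project2 paths and the function names are illustrative assumptions, not diffcore_rename()'s data structures (git already has the blob names from the trees and does not need to rehash):

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object name git gives a blob: SHA-1 over b"blob <size>\\0" + content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def find_exact_copies(sources, destinations):
    """Map each destination path to a source path with byte-identical content.

    `sources` and `destinations` are {path: content_bytes}. Matching is a
    dictionary lookup on blob ids, so no pairwise content comparison is needed.
    """
    by_id = {}
    for path, content in sources.items():
        by_id.setdefault(blob_id(content), path)  # keep the first source seen
    found = {}
    for path, content in destinations.items():
        src = by_id.get(blob_id(content))
        if src is not None:
            found[path] = src
    return found


old = {"project1/main.c": b"int main(void) { return 0; }\n"}
new = {
    "project2/main.c": b"int main(void) { return 0; }\n",
    "project2/extra.c": b"/* new file */\n",
}
print(find_exact_copies(old, new))  # {'project2/main.c': 'project1/main.c'}
```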

^ permalink raw reply

* Re: Rename detection at git log
From: Junio C Hamano @ 2006-11-20 11:28 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git
In-Reply-To: <7virha4cnm.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> There are a few things we need to be careful about rename/copy.
>...
>  - Copies are only picked up from files that were changed in the
>    same change (i.e. splitting major part of original file and
>    moving it to somewhere else, while leaving a skelton in the
>    original file).  "harder" is needed if the copy original was
>    untouched, as you found out.
>
> The last one is a compromise between performance and thoroughness,
> and the "harder" is one knob to tweak its behaviour.

If people are well disciplined, code refactoring (which can
trigger rename/copy detection) tends to affect both source and
destination files at the same time, so many times -C finds what
you want without --find-copies-harder.

But sometimes the source stays the same and you literally have a
duplicate (possibly with some modifications) in the new
destination.  Finding an exact copy is cheap (diffcore-rename has a
double loop that first finds exact copies without similarity
estimation, which is very cheap, and then goes on to open blobs
and does its similarity magic for destinations whose origin is
still unknown), but copy/rename with edits is not, and the "harder"
variant feeds _everything_ from the older tree in as a candidate
copy source, so it is very expensive for huge projects.
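The candidate-selection difference between -C and --find-copies-harder, and why the latter costs more, can be sketched in Python. This is a model under stated assumptions: difflib's ratio() is swapped in for git's own similarity estimate, and the file names, contents and helper names are hypothetical:

```python
import difflib


def copy_candidates(old_tree, changed_paths, find_copies_harder):
    """Paths in the old tree that may serve as copy sources.

    Modelled on the behaviour described above: plain -C only looks at files
    changed in the same commit; --find-copies-harder offers every old file.
    """
    if find_copies_harder:
        return set(old_tree)
    return set(changed_paths) & set(old_tree)


def best_source(dest_content, old_tree, candidates, threshold=0.5):
    """Return (path, score) for the most similar candidate, or None.

    difflib's ratio() is only a stand-in for git's similarity estimate;
    0.5 echoes git's default 50% similarity threshold.
    """
    best = None
    for path in sorted(candidates):
        score = difflib.SequenceMatcher(None, old_tree[path], dest_content).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (path, score)
    return best


old_tree = {
    "project1/a.c": "line1\nline2\nline3\n",
    "project1/b.c": "something completely different\n",
}
new_file = "line1\nline2\nline3 changed\n"  # copy of project1/a.c, then edited

# The source file was not touched in this commit, so plain -C has no candidates.
print(best_source(new_file, old_tree, copy_candidates(old_tree, [], False)))  # None
print(best_source(new_file, old_tree, copy_candidates(old_tree, [], True))[0])  # project1/a.c
```

The cost shows up in the loop: with --find-copies-harder every new file is scored against every file in the old tree, which is what makes it expensive on huge projects.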

> In the kernel archive, 
>
> 	git show -C ad2f931d
>
> tells us that:
>
>  - drivers/i2c/chips/Kconfig lost major part of it and only
>    skeletal part of the original remains in it;
>
>  - major part of it went to drivers/hwmon/Kconfig;
>
> The story is similar to the Makefile next door.

Having said all that, I think the rename/copy as a wholesale
operation on one file is an uninteresting special case.  The
generic case that happens far more often in practice is the
lines moving around across files, and the new "git blame" gives
you a better picture to answer "where the heck did this come from"
question.

For example,

	git blame -f -n -C 'ad2f931d^!' -- drivers/hwmon/Kconfig

on the same commit would show that many of its lines came from
i2c/chips/Kconfig but not all of them.

There are quite a few other things I should probably mention for
new people on the list about rename/copy/break heuristics, but it
is getting late, so I'll defer them to another time.

^ permalink raw reply
