Git development
* Re: Google Summer of Code 2009
From: Shawn O. Pearce @ 2009-01-07 23:14 UTC
  To: Alex Riesen; +Cc: git
In-Reply-To: <81b0412b0901071512k64a7d5e2u2c602b903f5233d3@mail.gmail.com>

Alex Riesen <raa.lkml@gmail.com> wrote:
> 2009/1/7 Shawn O. Pearce <spearce@spearce.org>:
> >
> >  Organization ideas page:
> >    http://git.or.cz/gitwiki/SoC2009Ideas
> 
> BTW, what happened to GitTorrent?

I got lazy and didn't copy everything over.  ;-)

GitTorrent and restartable clone both should probably be on the 2009
idea list, though GitTorrent already has a code base from the failed
2008 project that someone might be able to start and pick up from...

-- 
Shawn.


* [PATCH] gitweb: support the rel=vcs-* microformat
From: Joey Hess @ 2009-01-07 23:24 UTC
  To: git
In-Reply-To: <20090107190238.GA3909@gnu.kitenet.net>

The rel=vcs-* microformat allows a web page to indicate the locations of
repositories related to it in a machine-parseable manner.
(See http://kitenet.net/~joey/rfc/rel-vcs/)

Make gitweb use the microformat if it has been configured with project url
information in any of the usual ways. On the project summary page, the
repository URL display is simply marked up using the microformat. On the
project list page and forks list page, the microformat is embedded in the
header, since the URLs do not appear on the page.

The microformat could be included on other pages too, but I've skipped
doing so for now, since it would mean reading another file for every page
displayed.

There is a small overhead in including the microformat on project list
and forks list pages, but getting the project descriptions for those pages
already incurs a similar overhead, and the ability to get every repo url
in one place seems worthwhile.

This changes git_get_project_url_list() to not check wantarray, and only
return in list context -- the only way it is used AFAICS. It memoizes
both that function and git_get_project_description(), to avoid redundant
file reads.

Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
---
 gitweb/gitweb.perl |   78 +++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 62 insertions(+), 16 deletions(-)

This incorporates Giuseppe Bilotta's feedback, and uses new features
of the microformat. You can see this version running at
http://git.ikiwiki.info/

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 99f71b4..c238717 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2020,9 +2020,14 @@ sub git_get_path_by_hash {
 ## ......................................................................
 ## git utility functions, directly accessing git repository
 
+{
+my %project_descriptions; # cache
+
 sub git_get_project_description {
 	my $path = shift;
 
+	return $project_descriptions{$path} if exists $project_descriptions{$path};
+
 	$git_dir = "$projectroot/$path";
 	open my $fd, "$git_dir/description"
 		or return git_get_project_config('description');
@@ -2031,7 +2036,9 @@ sub git_get_project_description {
 	if (defined $descr) {
 		chomp $descr;
 	}
-	return $descr;
+	return $project_descriptions{$path}=$descr;
+}
+
 }
 
 sub git_get_project_ctags {
@@ -2099,18 +2106,30 @@ sub git_show_project_tagcloud {
 	}
 }
 
+{
+my %project_url_lists; # cache
+
 sub git_get_project_url_list {
+	# use per project git URL list in $projectroot/$path/cloneurl
+	# or make project git URL from git base URL and project name
 	my $path = shift;
 
+	return @{$project_url_lists{$path}} if exists $project_url_lists{$path};
+
+	my @ret;
 	$git_dir = "$projectroot/$path";
-	open my $fd, "$git_dir/cloneurl"
-		or return wantarray ?
-		@{ config_to_multi(git_get_project_config('url')) } :
-		   config_to_multi(git_get_project_config('url'));
-	my @git_project_url_list = map { chomp; $_ } <$fd>;
-	close $fd;
+	if (open my $fd, "$git_dir/cloneurl") {
+		@ret = map { chomp; $_ } <$fd>;
+		close $fd;
+	} else {
+		@ret = @{ config_to_multi(git_get_project_config('url')) };
+	}
+	@ret = map { "$_/$path" } @git_base_url_list if ! @ret;
+
+	$project_url_lists{$path}=\@ret;
+	return @ret;
+}
 
-	return wantarray ? @git_project_url_list : \@git_project_url_list;
 }
 
 sub git_get_projects_list {
@@ -2856,6 +2875,7 @@ sub blob_contenttype {
 sub git_header_html {
 	my $status = shift || "200 OK";
 	my $expires = shift;
+	my $extraheader = shift;
 
 	my $title = "$site_name";
 	if (defined $project) {
@@ -2953,6 +2973,8 @@ EOF
 		print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
 	}
 
+	print $extraheader if defined $extraheader;
+
 	print "</head>\n" .
 	      "<body>\n";
 
@@ -4365,6 +4387,26 @@ sub git_search_grep_body {
 	print "</table>\n";
 }
 
+sub git_link_title {
+	my $project=shift;
+	
+	my $description=git_get_project_description($project);
+	return $project.(length $description ? " - $description" : "");
+}
+
+# generates header with links to the specified projects
+sub git_links_header {
+	my $ret='';
+	foreach my $project (@_) {
+		# rel=vcs-* microformat
+		my $title=git_link_title($project);
+		foreach my $url (git_get_project_url_list($project)) {
+			$ret.=qq{<link rel="vcs-git" href="$url" title="$title"/>\n}
+		}
+	}
+	return $ret;
+}
+
 ## ======================================================================
 ## ======================================================================
 ## actions
@@ -4380,7 +4422,9 @@ sub git_project_list {
 		die_error(404, "No projects found");
 	}
 
-	git_header_html();
+	my $extraheader=git_links_header(map { $_->{path} } @list);
+
+	git_header_html(undef, undef, $extraheader);
 	if (-f $home_text) {
 		print "<div class=\"index_include\">\n";
 		insert_file($home_text);
@@ -4405,8 +4449,10 @@ sub git_forks {
 	if (!@list) {
 		die_error(404, "No forks found");
 	}
+	
+	my $extraheader=git_links_header(map { $_->{path} } @list);
 
-	git_header_html();
+	git_header_html(undef, undef, $extraheader);
 	git_print_page_nav('','');
 	git_print_header_div('summary', "$project forks");
 	git_project_list_body(\@list, $order);
@@ -4468,14 +4514,14 @@ sub git_summary {
 		print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
 	}
 
-	# use per project git URL list in $projectroot/$project/cloneurl
-	# or make project git URL from git base URL and project name
 	my $url_tag = "URL";
-	my @url_list = git_get_project_url_list($project);
-	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
-	foreach my $git_url (@url_list) {
+	my $title=git_link_title($project);
+	foreach my $git_url (git_get_project_url_list($project)) {
 		next unless $git_url;
-		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
+		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>".
+		      # rel=vcs-* microformat
+		      "<a rel=\"vcs-git\" href=\"$git_url\" title=\"$title\">$git_url</a>".
+		      "</td></tr>\n";
 		$url_tag = "";
 	}
 
-- 
1.5.6.5



-- 
see shy jo


* Re: Google Summer of Code 2009
From: Johannes Schindelin @ 2009-01-07 23:30 UTC
  To: Shawn O. Pearce; +Cc: Alex Riesen, git
In-Reply-To: <20090107231431.GC10790@spearce.org>

Hi,

On Wed, 7 Jan 2009, Shawn O. Pearce wrote:

> Alex Riesen <raa.lkml@gmail.com> wrote:
> > 2009/1/7 Shawn O. Pearce <spearce@spearce.org>:
> > >
> > >  Organization ideas page:
> > >    http://git.or.cz/gitwiki/SoC2009Ideas
> > 
> > BTW, what happened to GitTorrent?
> 
> I got lazy and didn't copy everything over.  ;-)

Actually, that would have been lazy. :-)

> GitTorrent and restartable clone both should probably be on the 2009 
> idea list, though GitTorrent already has a code base from the failed 
> 2008 project that someone might be able to start and pick up from...

According to

	http://repo.or.cz/w/VCS-Git-Torrent.git

Joshua is still working on it (albeit slowly).

However, from what Sam said at the GitTogether, it might be a much better 
idea to look at the existing code as a fact-finding experiment, scrap it 
(excluding the experience), and start modifying git-daemon.

AFAICT Sam has a pretty clear idea how to go about it, and staying with C 
should make it much easier for other people to comment.

Note that there has been a flurry of emails on the gittorrent list a few 
weeks back, where somebody challenged the approach Sam wants to take, 
saying that BitTorrent has some very nice features that are absolutely 
necessary, such as its pretty awkward custom encoding.

But AFAICT Sam did a pretty good job at dispelling all of the objections.

Ciao,
Dscho


* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-07 23:29 UTC
  To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <1231368935.8870.584.camel@starfruit>



On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
>
> >    git process - ie a wild pointer, or perhaps a race condition (but we 
> >    don't really use threading in 1.6.0.4 unless you ask for it, and even 
> >    then just for pack-file generation)
> 
> I have a feeling it's something like this, one of our operations guys
> did some research while I was looking at code and he came across this:
> 
>         On Wed, 2009-01-07 at 14:17 -0800, Ken Brownfield wrote:
>         > git-merge is using too much RAM, and failing to malloc() but NOT
>         > reporting it.  This is all sorts of bad:
>         > 
>         >   A) using an unscalable amount of RAM
>         >   B) failing to detect malloc() failure
>         >   C) reporting file corruption instead

Well, I don't think that's exactly it. git internally doesn't really use 
malloc at all, and uses xmalloc() instead which will die() if the malloc 
fails. So there's almost certainly no "failing to detect failures".

Yes, there's a few places that don't use the wrapper, but they should be 
safe (eg either they SIGSEGV, or they are like create_delta_index() and 
just create a sub-optimal pack with a warning).

HOWEVER:

>         > I was able to reproduce this.
>         >
>         > limit ~1.5GB -> corrupt file
>         > limit ~3GB -> magically no longer corrupt.

That is interesting, although I also worry that there might be other 
issues going on (ie since you've reported things magically fixing 
themselves, maybe the ulimit tests just _happened_ to show that, even if 
it wasn't the core reason).

BUT! This is definitely worth looking at.

For example, we do have some cases where we try to do "mmap()", and if it 
fails, we try to free some memory and try again. In particular, in 
xmmap(), if an mmap() fails - which may be due to running out of virtual 
address space - we'll actually try to release some pack-file memory and 
try again. Maybe there's a bug there - and it would be one that seldom 
triggers for others.

> I think you're correct insofar that our major site-specific alteration
> has come up on the mailing list before (okay maybe two site-specific
> things). 
> 	* Our Git repo is ~7.1GB
> 	* ulimit -v is set to ~1.5G

It is certainly possible. It's too bad that it's private, because it makes 
it _much_ harder to try to pinpoint this.

				Linus


* Re: Google Summer of Code 2009
From: Alex Riesen @ 2009-01-07 23:40 UTC
  To: Johannes Schindelin; +Cc: Shawn O. Pearce, git
In-Reply-To: <alpine.DEB.1.00.0901080024170.7496@intel-tinevez-2-302>

2009/1/8 Johannes Schindelin <Johannes.Schindelin@gmx.de>:
>> GitTorrent and restartable clone both should probably be on the 2009
>> idea list, though GitTorrent already has a code base from the failed
>> 2008 project that someone might be able to start and pick up from...
>
> According to
>
>        http://repo.or.cz/w/VCS-Git-Torrent.git
>
> Joshua is still working on it (albeit slowly).
>
> However, from what Sam said at the GitTogether, it might be a much better
> idea to look at the existing code as a fact-finding experiment, scrap it
> (excluding the experience), and start modifying git-daemon.

Takes courage, saying things like that :)


* Re: Problems with large compressed binaries when converting from svn
From: Alex Riesen @ 2009-01-07 23:55 UTC
  To: Øyvind Harboe; +Cc: git
In-Reply-To: <c09652430901060455l5179888ep3c51ff4e3dd5a6ef@mail.gmail.com>

2009/1/6 Øyvind Harboe <oyvind.harboe@zylin.com>:
> I'm converting from svn and I've run into a
> problem with tar.gz and tar.bz2 compressed files.
>
> (This is separate from, but slightly related to, my previous post.)
>
> In subversion we committed large tar.bz2/gz files. These files would
> change relatively rarely, but only very slightly.  The trouble with the tar.bz2
> format is that if the first byte changes, then the rest of the file will also
> be different. .zip does not have this problem, but .zip isn't a very friendly
> format for our purposes.
>
> Later on the tar.bz2/gz files started to change fairly often, but harddrives
> get bigger much more quickly than the .svn repository grows so we just
> kept doing things the same way rather than reeducate and reengineer
> the procedures.
>
> With .git we need to handle this differently somehow.
>
> Does git have some capability to store diffs of compressed files efficiently?

No, but you can unpack the tarballs and include the toolchains as submodules
(aka subprojects) in the projects which need them.

See the git submodule man page, the "submodule" section of user-manual.txt,
and gitmodules.txt (submodule configuration formats and conventions).


* fetch branch blacklist
From: jidanni @ 2009-01-08  0:07 UTC
  To: git

If one wants to always fetch all except one remote branch, one cannot
just blacklist it, but must instead whitelist all the rest.
$ git branch -rd origin/man origin/html
Deleted remote branch origin/man.
Deleted remote branch origin/html.
Plus I edited them out of FETCH_HEAD. Nonetheless, back from the dead:
$ git pull
From git://git.kernel.org/pub/scm/git/git
 * [new branch]      html       -> origin/html
 * [new branch]      man        -> origin/man
The only solution is to change .git/config:
[remote "origin"]
	url = git://git.kernel.org/pub/scm/git/git.git
#	fetch = +refs/heads/*:refs/remotes/origin/*
	fetch = +refs/heads/maint:refs/remotes/origin/maint
	fetch = +refs/heads/master:refs/remotes/origin/master
	fetch = +refs/heads/next:refs/remotes/origin/next
	fetch = +refs/heads/pu:refs/remotes/origin/pu
	fetch = +refs/heads/todo:refs/remotes/origin/todo
(Such explicit whitelisting will also sacrifice automatic addition or
even notification, if desired, of future new branches too.)
There is a remote.<name>.skipDefaultUpdate variable, but it probably
isn't fine grained enough.
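
A rough sketch of generating that whitelist instead of typing it, assuming
a POSIX shell. The helper name whitelist_refspecs is made up for this
illustration and is not a git command. It reads `git ls-remote --heads`
output on stdin and prints one fetch line per branch, skipping the branch
names given as arguments. The result still has to be pasted into
.git/config and regenerated whenever the remote gains a branch.

```shell
# whitelist_refspecs REMOTE [SKIPPED_BRANCH...]
# stdin: lines of "<sha><TAB>refs/heads/<name>", as printed by
#        `git ls-remote --heads REMOTE`
# stdout: one "fetch = ..." config line per branch that is not skipped
whitelist_refspecs() {
    remote=$1; shift
    skip=" $* "                       # space-padded list for exact matching
    while read -r _sha ref; do
        name=${ref#refs/heads/}       # strip the refs/heads/ prefix
        case "$skip" in
            *" $name "*) continue ;;  # blacklisted branch: emit nothing
        esac
        printf '\tfetch = +refs/heads/%s:refs/remotes/%s/%s\n' \
            "$name" "$remote" "$name"
    done
}

# Possible use, inside a clone:
#   git ls-remote --heads origin | whitelist_refspecs origin man html
```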


* collapsing commits with rebase
From: Geoff Russell @ 2009-01-08  0:08 UTC
  To: git

Dear gits,

I have a series of commits:

    A---B---C---D---E---F

I want to collapse B---C---D into one single commit. git rebase -i A will
allow me to do this, but I'm looking for a non-interactive incantation.

Cheers,
Geoff Russell


P.S. The context is a program that performs a single high level operation
on a repository as a series of commits but then wants to turn it back into
a single commit without user intervention so it subsequently looks like a
single op in the history.
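
One possible non-interactive route, sketched under the assumption of a
reasonably recent git (the -m option of commit-tree postdates this
thread) and a clean working tree. The function name squash_range and its
arguments are invented for the example. It builds a single commit that has
D's tree and A as its parent, then replays everything after D on top of it.

```shell
# squash_range BASE LAST MESSAGE
#   BASE: parent of the first commit to collapse (A in the picture)
#   LAST: last commit to collapse (D in the picture)
# Must be run on the branch whose tip is F, with nothing uncommitted.
squash_range() {
    base=$1; last=$2; msg=$3
    # One new commit: same content as LAST, but parented directly on BASE.
    squashed=$(git commit-tree -p "$base" -m "$msg" "$last^{tree}") || return 1
    # Replay LAST..HEAD (E and F) onto the new commit. Its tree equals
    # LAST's tree, so the replay should apply without conflicts.
    git rebase --onto "$squashed" "$last"
}

# Possible use:
#   squash_range A D "one high-level operation"
```

The result is A---BCD'---E'---F', with the final tree unchanged.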


* Re: Comments on Presentation Notes Request.
From: Daniel Barkalow @ 2009-01-08  0:14 UTC
  To: Tim Visher; +Cc: git
In-Reply-To: <c115fd3c0901061433i78bf3b26v77e5981aada6728e@mail.gmail.com>

On Tue, 6 Jan 2009, Tim Visher wrote:

> Hello Everyone,
> 
> I'm putting together a little 15 minute presentation for my company
> regarding SCMSes in an attempt to convince them to at the very least
> use a Distributed SCMS and at best to use git.  I put together all my
> notes, although I didn't put together the actual presentation yet.  I
> figured I'd post them here and maybe get some feedback about it.  Let
> me know what you think.
> 
> Thanks in advance!
> 
> Notes
> ---------
> 
> SCM: Distributed, Centralized, and Everything in Between.
> 
> * SCM Best Practices
> 
> ** Allow and Encourage Customer Participation
> 
> Most shops seem to attempt to funnel customer participation through
> the developers.  This is a cache miss for many operations such as
> developing the user manual by a design team external to the
> development team.  Basic operations such as commit and update are
> fairly simple to grasp and can even be simplified further through
> scripts and other such tools that non-developers can quickly be taught
> to use.
> 
> Of note is the Tortoise family of tools which integrate directly into
> Windows Explorer.  This makes it fairly easy for anyone who is
> familiar with Windows Explorer to get into using any of the tools that
> there is a Tortoise implementation for.

I still want an office software package with "commit" instead of "save" 
(when in a repository), and a mail program with "push" instead of "attach" 
and "fetch" instead of "open". (See below)

I think that the sales department should be using distributed version 
control, neatly packaged up.

> * The Centralized Model
> 
> ** We Know About This One
> 
> This is traditional, plain vanilla, ubiquitous SCM.
> 
> The great majority of the SCMSes out there are centralized.
> 
> Closely resembles the Client/Server system model.
> 
> ** Work Flow
> 
> <http://whygitisbetterthanx.com/#any-workflow>
> 
> *** 2 basic models: 'Lock, Modify, Unlock' and 'Copy, Modify, Merge'.
> 
> Older systems were primarily Lock, Modify, Unlock implementations.
> You would checkout a file that you intended to work on, and no one
> else would be able to check it out until you unlocked it, signaling
> that you were done editing it.  This is inherently inefficient as on a
> team of developers, the chances that two are working on the exact same
> part of a system without knowing it and coordinating are fairly low.
> Also, any disparate features that still touch the same files in the
> system cannot be worked on simultaneously.
> 
> The answer to this is Copy, Modify, Merge.  In this system, every
> developer gets a complete copy of the HEAD.  Everyone changes the HEAD
> concurrently.  When commits happen, the system attempts to
> intelligently merge them.  If it fails (usually doesn't happen unless
> there is bad coordination), then it asks you to merge them.  This has
> been proven to work well.

Git is almost unique in that, at the point where the user is asked to do a 
merge, the user's work is already preserved.

That is, most systems are: Copy, Modify, Merge, Commit. Git is: Copy, 
Modify, Commit, Merge.

> * The Distributed Model
> 
> ** This One's New
> 
> At least new as in unfamiliar.  The concept is over a decade old.

In some fundamental ways, this actually resembles the "broadcast email" 
collaboration method. That is, a group is writing a document. Someone 
writes a skeleton, and emails it to everybody else. They make changes to 
different sections. When each person has changed something, they email the 
full document to everybody else. Before people send out their 
versions, they check their email and (painfully) merge the changes into 
what they've done.

This evolved into having a certain location to avoid the painful merge, 
and then to version control. Distributed systems go back to this model, 
except without the "(painfully)" and with all the other benefits of 
version control.

> There are a few different popular distributed SCMSes (Git, Mercurial
> (hg), Bazaar (bzr), Bitkeeper)
> 
> Very closely resembles a peer-to-peer network and the organic
> relationships that evolve in that space.
> 
> In a distributed system, there is no one point where all development
> comes together for any reason other than policy.  Everyone who is
> working on a system intrinsically has their own copy of the entire
> repository.  All of the history, all of the source code, all of the
> public branches, all of the public tags, etc.  Because of this,
> developers can also have private branches, private tags, private
> commits, private history.  The distinction between public and private
> is very important in this context.  This has several distinct features
> which I'll go into now.
> 
> ** Work Flow (Pick Your Poison)
> 
> <http://whygitisbetterthanx.com/#any-workflow>
> 
> ** Key Properties
> 
> *** Private/Public Concept
> 
> Distributed SCMSes' Private/Public ontology is __much__ richer.
> Whereas in a central system, private means only what you have yet to
> commit or what you are leaving untracked, in a distributed system,
> private means anything that you have not yet _chosen_ to make public.
> In other words, you can have private branches, private tags, private
> committed changes to your copy of the head, etc.  Anything that you do
> not specifically publish to a location that others can access is
> intrinsically private.
> 
> In other words, you can finally SCM your sandbox!  You can commit as
> many broken things as you want to a private repository, giving you the
> ability to have a nearly infinite set of undoable and recoverable
> changes, without breaking anyone else's build.  Or, you can just as
> easily ignore TDD, never commit anything for 3 weeks and then do a
> big, massive commit and as long as your final product is tested and
> merges with the rest of the tree, you're good to go and no one cares.

Although you'll be really sad if you accidentally wipe out your work after 
2 1/2 weeks...

> Because you have a rich ontology for private/public data, you can also
> do crazy things like rewriting your local history before anyone else
> sees it.  Because your repository is the only one that has to know
> about the history as long as you're dealing with private data, this is
> a completely safe (although policy debatable) operation.  Of course,
> once data has been published, you really shouldn't mess with its
> history anymore.

You can also see this as writing a new history. If you knew starting out 
everything that you knew when you finished, you might do things 
differently, and the results would likely be more useful. Writing a new 
history lets you start over from where you started, while being able to 
refer to the final working state that you came up with.

> *** Must Learn New Work Flows.
> 
> In order to fully experience the advantages of distributed systems,
> new work flows must be learned.  In other words, it's possible to use
> distributed systems nearly the exact same way as you use a centralized
> system (you just need to learn new commands), but you don't get many
> of the benefits except the speed improvements.  The real game change
> happens when you realize that you can keep things private until they're
> finished.  Once you realize that, new branching patterns emerge, new
> work flows happen, you commit more often, and have the ability to
> become much looser and freer in your development process.

My experience bringing git to a small company is that people don't need to 
learn new workflows. They can go on with their old workflows and develop 
new ones as they streamline their work. The one exception is really that 
they have to be told that, in git, you commit before merging instead of 
merging before committing.

> *** Impossible To Completely Enforce A Single, Canonical
> Representation of the Code Base.
> 
> By nature, a distributed system cannot enforce a single canonical
> representation of the code base except by policy, and policies can
> always be broken.  Also, any intentionally private data is not backed
> up because it is not shared.  However, backup becomes much simpler
> because you know that no one else is committing to your repository.
> 
> This bears some explanation.  Within a distributed system, you can
> have a single official release point that everyone has blessed (or the
> company has blessed, or the original developer has blessed, or
> whatever).  However, you cannot _stop_ someone else from making a
> release point because their repository is just as valid as yours.  You
> cannot _stop_ developers from sharing code between themselves without
> going out to the official central location.  All you can do is ask
> them not to.

And you might not want to ask them not to. It's really nice to be able to 
reassign a developer to a different task and pass that developer's 
incomplete and not-ready-for-prime-time work to somebody else.

> * Why Git is the Best Choice
> 
> ** (Un)Staged Changes
> 
> Git employs the concept of the Index or Cache or Commit Stage.  This
> is also unique to Git, and it's pretty strange for developers coming
> from a system without it.
> 
> Basically, There are 4 states that any content can be in under Git.
> 
> 1. Untracked: This is content that Git is completely unaware of.
> 2. Tracked but Unstaged: This is content that has changed that Git is
> aware of but will not commit on the next commit command.
> 3. Tracked and Staged: This is the same as unstaged except that this
> content will be committed on the next commit.
> 4. Tracked and Committed:  This is content that has not changed since
> the previous commit that Git is aware of.

1, 4, and something in between are normal; the only extra is 
distinguishing 2 and 3.

> This is very powerful yet somewhat awkward to grasp.  Basically, the
> upshot of this feature is that you can manually build commits if you
> want to.  Say you were working on feature foo and then made some other
> changes because you came across feature bar and thought it would be
> quick to do.  In any other system, the only way you could commit parts
> of what you'd changed is if you were lucky enough for the disparate
> changes to be in different files.  In that case, you could commit only
> the files that you wanted to change for the different features.
> However, if you made disparate changes to the same file, you were
> stuck.  In Git, you can stage only parts of the files to an extreme
> degree.  This allows you to create as many commits as you want out of
> a single change set until the whole change set is committed.

It's pretty common for a system to support:

$ (sys) commit <filenames...>

At its core, the index just lets you tell git about those files on 
multiple command lines instead of just one. And it lets you make 
unincluded changes after you give it a file but before you commit. And it 
lets you fabricate the contents that you're putting in. But really, it's 
about being able to list the things to include one-by-one. (Well, really, 
it's about being able to make 100 commits of a 30000-file project in under 
a second, but that's just the original inspiration.)
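
As a small illustration of the "list the things one-by-one" view, assuming
a POSIX shell and a git repository as the current directory (the helper
name commit_listed_files is invented here):

```shell
# commit_listed_files MESSAGE FILE...
# Adds each FILE to the index with its own `git add` call, then commits
# whatever the index holds. Files that were never added stay out of the
# commit, even if they changed.
commit_listed_files() {
    msg=$1; shift
    for f in "$@"; do
        git add -- "$f" || return 1   # one path per command line
    done
    git commit -q -m "$msg"           # the commit is built from the index
}

# Possible use:
#   commit_listed_files "feature foo" foo.c foo.h
```

Apart from the message handling, this should give roughly what the
`(sys) commit <filenames...>` form gives when the index starts out matching
HEAD; the difference is that the list can be built up over time.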

> I've found this to be particularly useful when working with an
> existing code base that was not properly formatted.  Often, I'll come
> to a file that has a bunch of wonky white space choices and improperly
> indented logical constructs and I'll just quickly run through it
> correcting that stuff before continuing with the feature I was working
> on.  Afterwards, I'll stage the formatting and commit it, and then
> stage the feature I was working on and commit that.  You may not want
> that kind of control (and if you don't, you don't need to use it), but
> I like it.
> 
> ** Cryptographically Guarantees Content
> 
> One of the most surprising things I learned as I was researching this
> was that most SCMSes do not guarantee that your content does not get
> corrupted.  In other words, if the repository's disk doesn't fail but
> instead just gets corrupted, you'll never know unless you actually
> notice the corruption in the files.  If you have memory corruption
> locally and commit your changes, you just won't know.
>
> Git guarantees absolutely that if corruption happens, you will know
> about it.  It does this by creating SHA-1 hashes of your content and
> then checking to make sure that the SHA-1 hash does not change for an
> object.  The details of this aren't as important as the fact that Git
> is one of the very few systems that do this and it's obviously
> desirable.

You can still get a situation where the content gets corrupted before it 
gets into git, and git happily tracks your corrupt content. But that's 
pretty obvious.
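
For the curious, the name git gives a file's content can be recomputed by
hand. This is a sketch assuming GNU sha1sum is installed; the function name
git_blob_sha1 is made up for the example. It hashes a "blob <size>" header,
a NUL byte, and then the content, which should match what
`git hash-object <file>` prints. Any change to the bytes changes the name,
which is how corruption after the fact gets noticed.

```shell
# git_blob_sha1 FILE
# Prints the SHA-1 that git would use as the object name for FILE's content.
git_blob_sha1() {
    size=$(wc -c < "$1")
    size=$((size))                    # normalise any padding from wc
    {
        printf 'blob %s\000' "$size"  # object header, terminated by NUL
        cat "$1"                      # followed by the raw content
    } | sha1sum | cut -d' ' -f1
}

# Possible use:
#   git_blob_sha1 README
#   git hash-object README    # should print the same value
```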


* Re: Comments on Presentation Notes Request.
From: Daniel Barkalow @ 2009-01-08  0:28 UTC
  To: Boyd Stephen Smith Jr.; +Cc: git
In-Reply-To: <200901071640.06288.bss@iguanasuicide.net>

On Wed, 7 Jan 2009, Boyd Stephen Smith Jr. wrote:

> On Wednesday 2009 January 07 16:30:04 Daniel Barkalow wrote:
> > Git is clever about finding [...]
> > the common ancestor of commits that don't have a common ancestor.
> 
> *confused*
> 
> Please elaborate.

I meant to say "a *unique* closest common ancestor". The clever trick is 
that, if there are multiple common ancestors which aren't closer than each 
other, you can merge those ancestors (based, recursively, on their common 
ancestors) to generate a new commit with merge conflicts in it. You then 
pretend that this commit is the unique common ancestor for 3-way merge. 
This works because the merge conflicts in the commit all seem to have been 
replaced in each branch, and the conflict region is some arbitrary chunk 
of text in between other context, and the 3-way merge output doesn't show 
the original text (which would be weird junk in this case: a merge 
conflict that didn't really happen in the middle of other merge 
conflicts), but only the text from the two sides being merged, so it's not 
necessary to resolve the old merge that didn't happen.

I think all of the other systems, if you have crossing history such that 
there isn't a unique common ancestor, do one of: (a) give up, (b) generate 
conflicts between your change as it stayed in your branch and the same 
change as it went out and came back, or (c) mishandle some cases involving 
reverts.
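
To see whether two commits are in this situation, `git merge-base --all`
lists every closest common ancestor. A minimal sketch, assuming a POSIX
shell inside a repository (the helper name count_merge_bases is invented
for the example):

```shell
# count_merge_bases COMMIT1 COMMIT2
# Prints how many closest common ancestors the two commits have.
# 1 means an ordinary 3-way merge; more than 1 means crossing history,
# where the ancestors themselves are merged first as described above.
count_merge_bases() {
    git merge-base --all "$1" "$2" | wc -l | tr -d ' '
}

# Possible use:
#   count_merge_bases master topic
```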

	-Daniel
*This .sig left intentionally blank*


* Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: R. Tyler Ballance @ 2009-01-08  0:28 UTC
  To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
In-Reply-To: <alpine.LFD.2.00.0901071520330.3057@localhost.localdomain>


On Wed, 2009-01-07 at 15:29 -0800, Linus Torvalds wrote:
> It is certainly possible. It's too bad that it's private, because it makes 
> it _much_ harder to try to pinpoint this.

My most esteemed colleague (Ken aka kb) who pointed out the memory issue
was on the right path (I think), and I have a reproduction case you can
try with your very own Linux kernel tree!

WOO!

I set ulimit -v really low (150M), and the operations I made got an
mmap(2) fatal error, but there is a sweet spot that I found, see the
transcript below. I basically chose an arbitrary revision from a couple
of weeks ago, and rolled the repository back to that point, then I tried
with iterations of ulimit -v 150, 250, 450, and then back down to 350.

        tyler@grapefruit:~/source/git/linux-2.6> limit
        cputime         unlimited
        filesize        unlimited
        datasize        unlimited
        stacksize       8MB
        coredumpsize    0kB
        memoryuse       2561MB
        maxproc         24564
        descriptors     1024
        memorylocked    64kB
        addressspace    unlimited
        maxfilelocks    unlimited
        sigpending      24564
        msgqueue        819200
        nice            0
        rt_priority     0
        tyler@grapefruit:~/source/git/linux-2.6> export START=56d18e9932ebf4e8eca42d2ce509450e6c9c1666
        tyler@grapefruit:~/source/git/linux-2.6> git reset --hard $START
        HEAD is now at 56d18e9 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
        tyler@grapefruit:~/source/git/linux-2.6> ulimit -v `echo "350 * 1024" | bc -l`
        tyler@grapefruit:~/source/git/linux-2.6> git pull
        error: failed to read object be1b87c70af69acfadb8a27a7a76dfb61de92643 at offset 1850923 from .git/objects/pack/pack-dbe154052997a05499eb6b4fd90b924da68e799a.pack
        fatal: object be1b87c70af69acfadb8a27a7a76dfb61de92643 is corrupted
        tyler@grapefruit:~/source/git/linux-2.6>
        
I've tried this a couple of times, and it does seem to be reproducible.
Let me know if you have any issues reproducing it locally and I'll try
to dig into it more with valgrind or something a bit more pinpointing
than "ulimit -v && try, try again".
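
In case it helps anyone else hunt for the sweet spot, the try-and-retry
loop can be scripted. This is only a rough sketch: mb_to_kb and
with_vm_limit_mb are names made up here, and it assumes a shell whose
"ulimit -v" takes kilobytes (bash does).

```sh
# Convert megabytes to the kilobyte units that `ulimit -v` expects.
mb_to_kb() {
	echo $(( $1 * 1024 ))
}

# Run a command with a virtual memory limit of $1 MB.
# The limit is set inside a subshell so it does not stick to the
# calling shell once the command has finished.
with_vm_limit_mb() {
	limit_kb=$(mb_to_kb "$1")
	shift
	( ulimit -v "$limit_kb" && "$@" )
}

# Example sweep, run from inside the repository (START as in the
# transcript above). The reset happens without a limit so that every
# attempt starts from the same state:
#
#   for mb in 150 250 350 450; do
#       git reset --hard "$START"
#       with_vm_limit_mb "$mb" git pull || echo "failed at ${mb}MB"
#   done
```

The sweep itself is left as a comment because it rewrites the working
tree; the two helpers are harmless on their own.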


Cheers
-- 
-R. Tyler Ballance
Slide, Inc.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-08  0:37 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <alpine.LFD.2.00.0901071520330.3057@localhost.localdomain>



On Wed, 7 Jan 2009, Linus Torvalds wrote:
>
> >         > limit ~1.5GB -> corrupt file
> >         > limit ~3GB -> magically no longer corrupt.
> 
> That is interesting, although I also worry that there might be other 
> issues going on (ie since you've reported things magically fixing 
> themselves, maybe the ulimit tests just _happened_ to show that, even if 
> it wasn't the core reason).
> 
> BUT! This is definitely worth looking at.
> 
> For example, we do have some cases where we try to do "mmap()", and if it 
> fails, we try to free some memory and try again. In particular, in 
> xmmap(), if an mmap() fails - which may be due to running out of virtual 
> address space - we'll actually try to release some pack-file memory and 
> try again. Maybe there's a bug there - and it would be one that seldom 
> triggers for others.

Ho humm. We really do have some interesting things there. 

Is this a 64-bit machine? I didn't think OS X did that, but if there is 
some limited 64-bit support there, maybe "sizeof(void *)" is 8, and then we 
default the git pack window to a pretty healthy 1GB.

I could easily see that if you have a virtual memory size limit of 1.5GB, 
and the pack window size is 1GB, we might have trouble. Because we could 
only keep one such pack window in memory at a time.

I have _not_ looked at the code, though. I'd have expected a SIGSEGV if we 
really had issues with the window handling.

Anyway, _if_ your system has 64-bit pointers, then _maybe_ the 
default 1GB pack window causes problems.

If so, then adding a

	[core]
		packedgitwindowsize = 64M

might make a difference. It would certainly be very interesting to hear if 
there's any impact.

		Linus

^ permalink raw reply

* Re: collapsing commits with rebase
From: Johannes Schindelin @ 2009-01-08  0:45 UTC (permalink / raw)
  To: Geoff Russell; +Cc: git
In-Reply-To: <93c3eada0901071608r190a723bma502b68c4ab81a08@mail.gmail.com>

Hi,

On Thu, 8 Jan 2009, Geoff Russell wrote:

> Dear gits,
> 
> I have a series of commits:
> 
>     A---B---C---D---E---F
> 
> I want to collapse B---C---D into one single commit. git rebase -i B 
> will allow me to do this, but I'm looking for a non-interactive 
> incantation.

You set GIT_EDITOR to a script ;-)

Alternatively, something like this should work for you:

	$ git checkout A
	$ git read-tree -u -m D
	$ git commit -m "My message"
	$ git cherry-pick E
	$ git cherry-pick F

Hth,
Dscho

^ permalink raw reply

* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-08  0:48 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
In-Reply-To: <1231374514.8870.621.camel@starfruit>



On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
>
> My most esteemed colleague (Ken aka kb) who pointed out the memory issue
> was on the right path (I think), and I have a reproduction case you can
> try with your very own Linux kernel tree!
> 
> WOO!
> 
> I set ulimit -v really low (150M), and the operations I made got an
> mmap(2) fatal error, but there is a sweet spot that I found, see the
> transcript below.

This is indeed the packfile mapping. The sweet spot you found depends on 
how big the biggest two pack-files are, I do believe.

And if you do that

	[core]
		packedgitwindowsize = 64M

I think you'll find that it works. Of course, with a _really_ low ulimit, 
you'd need to make it even smaller, but at some point you start hitting 
other problems than the pack-file limits, ie just the simple fact that git 
wants and expects you to have a certain amount of memory available ;)

Can you confirm that your "reproducible" case starts working with that 
addition to your ~/.gitconfig? If so, the solution is pretty simple: we 
should just lower the default pack windowsize.
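
For reference, the same setting can be made from the command line
instead of editing the file by hand. A small sketch: set_pack_window
and get_pack_window_bytes are illustrative wrappers invented here, and
they operate on whatever config file you hand them rather than assuming
~/.gitconfig.

```sh
# Set core.packedGitWindowSize in the given config file.
# $1 = config file path, $2 = size (git accepts k/m/g suffixes)
set_pack_window() {
	git config --file "$1" core.packedGitWindowSize "$2"
}

# Read the value back as a plain byte count; --int makes git expand
# the k/m/g suffix.
get_pack_window_bytes() {
	git config --file "$1" --int core.packedGitWindowSize
}
```

With that, "set_pack_window ~/.gitconfig 64m" followed by
"get_pack_window_bytes ~/.gitconfig" should report 67108864.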

		Linus

^ permalink raw reply

* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: R. Tyler Ballance @ 2009-01-08  0:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <alpine.LFD.2.00.0901071621340.3283@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 2976 bytes --]

On Wed, 2009-01-07 at 16:37 -0800, Linus Torvalds wrote:
> 
> On Wed, 7 Jan 2009, Linus Torvalds wrote:
> >
> > >         > limit ~1.5GB -> corrupt file
> > >         > limit ~3GB -> magically no longer corrupt.
> > 
> > That is interesting, although I also worry that there might be other 
> > issues going on (ie since you've reported things magically fixing 
> > themselves, maybe the ulimit tests just _happened_ to show that, even if 
> > it wasn't the core reason).
> > 
> > BUT! This is definitely worth looking at.
> > 
> > For example, we do have some cases where we try to do "mmap()", and if it 
> > fails, we try to free some memory and try again. In particular, in 
> > xmmap(), if an mmap() fails - which may be due to running out of virtual 
> > address space - we'll actually try to release some pack-file memory and 
> > try again. Maybe there's a bug there - and it would be one that seldom 
> > triggers for others.
> 
> Ho humm. We really do have some interesting things there. 

Always enjoyable when these mail threads get this deep ;)

> 
> Is this a 64-bit machine? I didn't think OS X did that, but if there is 
> some limited 64-bit support there, maybe "sizeof(void *)" is 8, and then we 
> default the git pack window to a pretty healthy 1GB.

I was only mentioning OS X with regards to the Samba/NFS red herring,
the rest of our operations are on 64-bit Linux machines.

The machine I reproduced this on ("Public repro case!") is the following:
        tyler@grapefruit:~> uname -a
        Linux grapefruit.corp.slide.com 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux
        tyler@grapefruit:~> cat /etc/issue
        Welcome to openSUSE 11.1   - Kernel \r (\l).
        
The machines we're experiencing this issue on "in the wild" are:
        xdev3 (master)% uname -a 
        Linux xdev3 2.6.24-22-server #1 SMP Mon Nov 24 20:06:28 UTC 2008 x86_64 GNU/Linux
        xdev3 (master)% cat /etc/issue
        Ubuntu 8.04.1 \n \l
> 
> I could easily see that if you have a virtual memory size limit of 1.5GB, 
> and the pack window size is 1GB, we might have trouble. Because we could 
> only keep one such pack window in memory at a time.

The DEFAULT_PACKED_GIT_WINDOW_SIZE in our local builds is 256M, FWIW

> 
> I have _not_ looked at the code, though. I'd have expected a SIGSEGV if we 
> really had issues with the window handling.
> 
> Anyway, _if_ your system has 64-bit pointers, then _maybe_ the 
> default 1GB pack window causes problems.
> 
> If so, then adding a
> 
> 	[core]
> 		packedgitwindowsize = 64M
> 
> might make a difference. It would certainly be very interesting to hear if 
> there's any impact.

I can try this still if you'd like, but it doesn't seem like that'd be
the issue since we're already lowering the window size system-wide



Cheers
-- 
-R. Tyler Ballance
Slide, Inc.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* [PATCH v3] parse-opt: migrate builtin-ls-files.
From: Miklos Vajna @ 2009-01-08  0:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Pierre Habouzit, git
In-Reply-To: <20090107144640.GD831@artemis.corp>

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
---

On Wed, Jan 07, 2009 at 03:46:40PM +0100, Pierre Habouzit <madcoder@debian.org> wrote:
> > +   if (unset)
> > +           dir->show_ignored = 0;
> > +   else
> > +           dir->show_ignored = 1;
>
> dir->show_ignored = !unset ?

True, cleaned up all 3 occurrences.

Interdiff: git diff b2a38d9..ee34fcc in my repo.

 builtin-ls-files.c |  294 ++++++++++++++++++++++++++++------------------------
 1 files changed, 159 insertions(+), 135 deletions(-)

diff --git a/builtin-ls-files.c b/builtin-ls-files.c
index f72eb85..ffa8210 100644
--- a/builtin-ls-files.c
+++ b/builtin-ls-files.c
@@ -10,6 +10,7 @@
 #include "dir.h"
 #include "builtin.h"
 #include "tree.h"
+#include "parse-options.h"
 
 static int abbrev;
 static int show_deleted;
@@ -28,6 +29,7 @@ static const char **pathspec;
 static int error_unmatch;
 static char *ps_matched;
 static const char *with_tree;
+static int exc_given;
 
 static const char *tag_cached = "";
 static const char *tag_unmerged = "";
@@ -395,156 +397,178 @@ int report_path_error(const char *ps_matched, const char **pathspec, int prefix_
 	return errors;
 }
 
-static const char ls_files_usage[] =
-	"git ls-files [-z] [-t] [-v] (--[cached|deleted|others|stage|unmerged|killed|modified])* "
-	"[ --ignored ] [--exclude=<pattern>] [--exclude-from=<file>] "
-	"[ --exclude-per-directory=<filename> ] [--exclude-standard] "
-	"[--full-name] [--abbrev] [--] [<file>]*";
+static const char * const ls_files_usage[] = {
+	"git ls-files [options] [<file>]*",
+	NULL
+};
+
+static int option_parse_z(const struct option *opt,
+			  const char *arg, int unset)
+{
+	if (unset)
+		line_terminator = '\n';
+	else
+		line_terminator = 0;
+	return 0;
+}
+
+static int option_parse_exclude(const struct option *opt,
+				const char *arg, int unset)
+{
+	struct dir_struct *dir = opt->value;
+
+	exc_given = 1;
+	add_exclude(arg, "", 0, &dir->exclude_list[EXC_CMDL]);
+
+	return 0;
+}
+
+static int option_parse_exclude_from(const struct option *opt,
+				     const char *arg, int unset)
+{
+	struct dir_struct *dir = opt->value;
+
+	exc_given = 1;
+	add_excludes_from_file(dir, arg);
+
+	return 0;
+}
+
+static int option_parse_exclude_standard(const struct option *opt,
+					 const char *arg, int unset)
+{
+	struct dir_struct *dir = opt->value;
+
+	exc_given = 1;
+	setup_standard_excludes(dir);
+
+	return 0;
+}
+
+static int option_parse_ignored(const struct option *opt,
+				const char *arg, int unset)
+{
+	struct dir_struct *dir = opt->value;
+
+	dir->show_ignored = !unset;
+
+	return 0;
+}
+
+static int option_parse_directory(const struct option *opt,
+				  const char *arg, int unset)
+{
+	struct dir_struct *dir = opt->value;
+
+	dir->show_other_directories = !unset;
+
+	return 0;
+}
+
+static int option_parse_empty(const struct option *opt,
+				 const char *arg, int unset)
+{
+	struct dir_struct *dir = opt->value;
+
+	dir->hide_empty_directories = unset;
+
+	return 0;
+}
+
+static int option_parse_full_name(const struct option *opt,
+				  const char *arg, int unset)
+{
+	prefix_offset = 0;
+
+	return 0;
+}
 
 int cmd_ls_files(int argc, const char **argv, const char *prefix)
 {
-	int i;
-	int exc_given = 0, require_work_tree = 0;
+	int require_work_tree = 0, show_tag = 0;
 	struct dir_struct dir;
+	struct option builtin_ls_files_options[] = {
+		{ OPTION_CALLBACK, 'z', NULL, NULL, NULL,
+			"paths are separated with NUL character",
+			PARSE_OPT_NOARG, option_parse_z },
+		OPT_BOOLEAN('t', NULL, &show_tag,
+			"identify the file status with tags"),
+		OPT_BOOLEAN('v', NULL, &show_valid_bit,
+			"use lowercase letters for 'assume unchanged' files"),
+		OPT_BOOLEAN('c', "cached", &show_cached,
+				"show cached files in the output (default)"),
+		OPT_BOOLEAN('d', "deleted", &show_deleted,
+				"show deleted files in the output"),
+		OPT_BOOLEAN('m', "modified", &show_modified,
+				"show modified files in the output"),
+		OPT_BOOLEAN('o', "others", &show_others,
+				"show other files in the output"),
+		{ OPTION_CALLBACK, 'i', "ignored", &dir, NULL,
+			"show ignored files in the output",
+			PARSE_OPT_NOARG, option_parse_ignored },
+		OPT_BOOLEAN('s', "stage", &show_stage,
+			"show staged contents' object name in the output"),
+		OPT_BOOLEAN('k', "killed", &show_killed,
+			"show files on the filesystem that need to be removed"),
+		{ OPTION_CALLBACK, 0, "directory", &dir, NULL,
+			"show 'other' directories' name only",
+			PARSE_OPT_NOARG, option_parse_directory },
+		{ OPTION_CALLBACK, 0, "empty-directory", &dir, NULL,
+			"list empty directories",
+			PARSE_OPT_NOARG, option_parse_empty },
+		OPT_BOOLEAN('u', "unmerged", &show_unmerged,
+			"show unmerged files in the output"),
+		{ OPTION_CALLBACK, 'x', "exclude", &dir, "pattern",
+			"skip files matching pattern",
+			0, option_parse_exclude },
+		{ OPTION_CALLBACK, 'X', "exclude-from", &dir, "file",
+			"exclude patterns are read from <file>",
+			0, option_parse_exclude_from },
+		OPT_STRING(0, "exclude-per-directory", &dir.exclude_per_dir, "file",
+			"read additional per-directory exclude patterns in <file>"),
+		{ OPTION_CALLBACK, 0, "exclude-standard", &dir, NULL,
+			"add the standard git exclusions",
+			PARSE_OPT_NOARG, option_parse_exclude_standard },
+		{ OPTION_CALLBACK, 0, "full-name", NULL, NULL,
+			"make the output relative to the project top directory",
+			PARSE_OPT_NOARG, option_parse_full_name },
+		OPT_BOOLEAN(0, "error-unmatch", &error_unmatch,
+			"if any <file> is not in the index, treat this as an error"),
+		OPT_STRING(0, "with-tree", &with_tree, "tree-ish",
+			"pretend that paths removed since <tree-ish> are still present"),
+		OPT__ABBREV(&abbrev),
+		OPT_END()
+	};
 
 	memset(&dir, 0, sizeof(dir));
 	if (prefix)
 		prefix_offset = strlen(prefix);
 	git_config(git_default_config, NULL);
 
-	for (i = 1; i < argc; i++) {
-		const char *arg = argv[i];
-
-		if (!strcmp(arg, "--")) {
-			i++;
-			break;
-		}
-		if (!strcmp(arg, "-z")) {
-			line_terminator = 0;
-			continue;
-		}
-		if (!strcmp(arg, "-t") || !strcmp(arg, "-v")) {
-			tag_cached = "H ";
-			tag_unmerged = "M ";
-			tag_removed = "R ";
-			tag_modified = "C ";
-			tag_other = "? ";
-			tag_killed = "K ";
-			if (arg[1] == 'v')
-				show_valid_bit = 1;
-			continue;
-		}
-		if (!strcmp(arg, "-c") || !strcmp(arg, "--cached")) {
-			show_cached = 1;
-			continue;
-		}
-		if (!strcmp(arg, "-d") || !strcmp(arg, "--deleted")) {
-			show_deleted = 1;
-			continue;
-		}
-		if (!strcmp(arg, "-m") || !strcmp(arg, "--modified")) {
-			show_modified = 1;
-			require_work_tree = 1;
-			continue;
-		}
-		if (!strcmp(arg, "-o") || !strcmp(arg, "--others")) {
-			show_others = 1;
-			require_work_tree = 1;
-			continue;
-		}
-		if (!strcmp(arg, "-i") || !strcmp(arg, "--ignored")) {
-			dir.show_ignored = 1;
-			require_work_tree = 1;
-			continue;
-		}
-		if (!strcmp(arg, "-s") || !strcmp(arg, "--stage")) {
-			show_stage = 1;
-			continue;
-		}
-		if (!strcmp(arg, "-k") || !strcmp(arg, "--killed")) {
-			show_killed = 1;
-			require_work_tree = 1;
-			continue;
-		}
-		if (!strcmp(arg, "--directory")) {
-			dir.show_other_directories = 1;
-			continue;
-		}
-		if (!strcmp(arg, "--no-empty-directory")) {
-			dir.hide_empty_directories = 1;
-			continue;
-		}
-		if (!strcmp(arg, "-u") || !strcmp(arg, "--unmerged")) {
-			/* There's no point in showing unmerged unless
-			 * you also show the stage information.
-			 */
-			show_stage = 1;
-			show_unmerged = 1;
-			continue;
-		}
-		if (!strcmp(arg, "-x") && i+1 < argc) {
-			exc_given = 1;
-			add_exclude(argv[++i], "", 0, &dir.exclude_list[EXC_CMDL]);
-			continue;
-		}
-		if (!prefixcmp(arg, "--exclude=")) {
-			exc_given = 1;
-			add_exclude(arg+10, "", 0, &dir.exclude_list[EXC_CMDL]);
-			continue;
-		}
-		if (!strcmp(arg, "-X") && i+1 < argc) {
-			exc_given = 1;
-			add_excludes_from_file(&dir, argv[++i]);
-			continue;
-		}
-		if (!prefixcmp(arg, "--exclude-from=")) {
-			exc_given = 1;
-			add_excludes_from_file(&dir, arg+15);
-			continue;
-		}
-		if (!prefixcmp(arg, "--exclude-per-directory=")) {
-			exc_given = 1;
-			dir.exclude_per_dir = arg + 24;
-			continue;
-		}
-		if (!strcmp(arg, "--exclude-standard")) {
-			exc_given = 1;
-			setup_standard_excludes(&dir);
-			continue;
-		}
-		if (!strcmp(arg, "--full-name")) {
-			prefix_offset = 0;
-			continue;
-		}
-		if (!strcmp(arg, "--error-unmatch")) {
-			error_unmatch = 1;
-			continue;
-		}
-		if (!prefixcmp(arg, "--with-tree=")) {
-			with_tree = arg + 12;
-			continue;
-		}
-		if (!prefixcmp(arg, "--abbrev=")) {
-			abbrev = strtoul(arg+9, NULL, 10);
-			if (abbrev && abbrev < MINIMUM_ABBREV)
-				abbrev = MINIMUM_ABBREV;
-			else if (abbrev > 40)
-				abbrev = 40;
-			continue;
-		}
-		if (!strcmp(arg, "--abbrev")) {
-			abbrev = DEFAULT_ABBREV;
-			continue;
-		}
-		if (*arg == '-')
-			usage(ls_files_usage);
-		break;
+	argc = parse_options(argc, argv, builtin_ls_files_options,
+			ls_files_usage, 0);
+	if (show_tag || show_valid_bit) {
+		tag_cached = "H ";
+		tag_unmerged = "M ";
+		tag_removed = "R ";
+		tag_modified = "C ";
+		tag_other = "? ";
+		tag_killed = "K ";
 	}
+	if (show_modified || show_others || dir.show_ignored || show_killed)
+		require_work_tree = 1;
+	if (show_unmerged)
+		/* There's no point in showing unmerged unless
+		 * you also show the stage information.
+		 */
+		show_stage = 1;
+	if (dir.exclude_per_dir)
+		exc_given = 1;
 
 	if (require_work_tree && !is_inside_work_tree())
 		setup_work_tree();
 
-	pathspec = get_pathspec(prefix, argv + i);
+	pathspec = get_pathspec(prefix, argv);
 
 	/* Verify that the pathspec matches the prefix */
 	if (pathspec)
-- 
1.6.1

^ permalink raw reply related

* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: R. Tyler Ballance @ 2009-01-08  0:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
In-Reply-To: <alpine.LFD.2.00.0901071644330.3283@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 1584 bytes --]

On Wed, 2009-01-07 at 16:48 -0800, Linus Torvalds wrote:
> 
> On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
> >
> > My most esteemed colleague (Ken aka kb) who pointed out the memory issue
> > was on the right path (I think), and I have a reproduction case you can
> > try with your very own Linux kernel tree!
> > 
> > WOO!
> > 
> > I set ulimit -v really low (150M), and the operations I made got an
> > mmap(2) fatal error, but there is a sweet spot that I found, see the
> > transcript below.
> 
> This is indeed the packfile mapping. The sweet spot you found depends on 
> how big the biggest two pack-files are, I do believe.
> 
> And if you do that
> 
> 	[core]
> 		packedgitwindowsize = 64M
> 
> I think you'll find that it works. Of course, with a _really_ low ulimit, 
> you'd need to make it even smaller, but at some point you start hitting 
> other problems than the pack-file limits, ie just the simple fact that git 
> wants and expects you to have a certain amount of memory available ;)
> 
> Can you confirm that your "reproducible" case starts working with that 
> addition to your ~/.gitconfig? If so, the solution is pretty simple: we 
> should just lower the default pack windowsize.

This certainly corrected the issue. Is there some magic
packedgitwindowsize that I should be looking at for my own repository (our
internal one) in order to prevent the issue from occurring? 

Looking into .git/objects/pack, I think the two biggest pack files are
3.5G and 177MB respectively :-!


Cheers
-- 
-R. Tyler Ballance
Slide, Inc.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-08  1:01 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <1231375780.8870.629.camel@starfruit>



On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
> 
> I was only mentioning OS X with regards to the Samba/NFS red herring,
> the rest of our operations are on 64-bit Linux machines.

Ahh, ok. Good. 

> > I could easily see that if you have a virtual memory size limit of 1.5GB, 
> > and the pack window size is 1GB, we might have trouble. Because we could 
> > only keep one such pack window in memory at a time.
> 
> The DEFAULT_PACKED_GIT_WINDOW_SIZE in our local builds is 256M, FWIW

Interesting. So you already had to lower it. However, now that you mention 
it, and now that I search for your emails about it on the mailing list (I 
don't normally read the mailing list except very occasionally), I see your 
patch that does

	#define DYNAMIC_WINDOW_SIZE_PERCENTAGE 0.85
	...
	packed_git_window_size = (unsigned int)(as->rlim_cur * DYNAMIC_WINDOW_SIZE_PERCENTAGE);

which is actually very bad.

It's bad for several reasons:

 - 85% of the virtual address space is actually pessimal.

   You need space for AT LEAST two full-sized windows, so you need less 
   than 50%.

 - the way that variable is used, it _has_ to be a multiple of the page 
   size. In fact, it needs to be a multiple of _twice_ the page size. So 
   just doing a random fraction of the rlimit is not correct.

Setting it in the .gitconfig does it right, though.
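
To make those two constraints concrete, here is a hedged sketch of how
a window size could be derived from an address-space limit.
suggest_window_size is a name invented for this example, and capping
the window at a quarter of the limit is an assumption chosen here (it
keeps two windows within half the address space); it is not what git
itself does.

```sh
# Suggest a pack window size (in bytes) for an address-space limit.
# $1 = limit in bytes, $2 = page size in bytes (assumed a power of two)
# Starts at 2 * page size and keeps doubling while the result stays at
# or below a quarter of the limit, so the answer is always a multiple
# of 2 * page size.
suggest_window_size() {
	limit=$1
	size=$(( 2 * $2 ))
	while [ $(( size * 2 )) -le $(( limit / 4 )) ]; do
		size=$(( size * 2 ))
	done
	echo "$size"
}
```

For a 350MB limit and 4096-byte pages this comes out at 67108864 (64M),
and for a 1.5GB limit at 268435456 (256M). The page size can be taken
from "getconf PAGESIZE"; note that "ulimit -v" may print "unlimited",
which would need handling before feeding it to the function.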

> > If so, then adding a
> > 
> > 	[core]
> > 		packedgitwindowsize = 64M
> > 
> > might make a difference. It would certainly be very interesting to hear if 
> > there's any impact.
> 
> I can try this still if you'd like, but it doesn't seem like that'd be
> the issue since we're already lowering the window size system-wide

Please do try, at least if your local git changes still match that patch I 
found, because that patch generates problems.

		Linus

^ permalink raw reply

* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: R. Tyler Ballance @ 2009-01-08  1:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML
In-Reply-To: <alpine.LFD.2.00.0901071652490.3283@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 2255 bytes --]

On Wed, 2009-01-07 at 17:01 -0800, Linus Torvalds wrote:
> 
> On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
> > 
> > I was only mentioning OS X with regards to the Samba/NFS red herring,
> > the rest of our operations are on 64-bit Linux machines.
> 
> Ahh, ok. Good. 
> 
> > > I could easily see that if you have a virtual memory size limit of 1.5GB, 
> > > and the pack window size is 1GB, we might have trouble. Because we could 
> > > only keep one such pack window in memory at a time.
> > 
> > The DEFAULT_PACKED_GIT_WINDOW_SIZE in our local builds is 256M, FWIW
> 
> Interesting. So you already had to lower it. However, now that you mention 
> it, and now that I search for your emails about it on the mailing list (I 
> don't normally read the mailing list except very occasionally), I see your 
> patch that does
> 
> 	#define DYNAMIC_WINDOW_SIZE_PERCENTAGE 0.85
> 	...
> 	packed_git_window_size = (unsigned int)(as->rlim_cur * DYNAMIC_WINDOW_SIZE_PERCENTAGE);
> 
> which is actually very bad.
> 
> It's bad for several reasons:
> 
>  - 85% of the virtual address space is actually pessimal.
> 
>    You need space for AT LEAST two full-sized windows, so you need less 
>    than 50%.
> 
>  - the way that variable is used, it _has_ to be a multiple of the page 
>    size. In fact, it needs to be a multiple of _twice_ the page size. So 
>    just doing a random fraction of the rlimit is not correct.

This patch never made it into any of our Git builds because my flight
landed and it wasn't stable enough (and as you pointed out, it sucks ;))



> 
> Setting it in the .gitconfig does it right, though.
> 
> > > If so, then adding a
> > > 
> > > 	[core]
> > > 		packedgitwindowsize = 64M
> > > 
> > > might make a difference. It would certainly be very interesting to hear if 
> > > there's any impact.
> > 
> > I can try this still if you'd like, but it doesn't seem like that'd be
> > the issue since we're already lowering the window size system-wide
> 
> Please do try, at least if your local git changes still match that patch I 
> found, because that patch generates problems.

See my prior reply in "Public repro case!" sent at 4:57PST

-- 
-R. Tyler Ballance
Slide, Inc.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-08  1:08 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
In-Reply-To: <1231376259.8870.633.camel@starfruit>



On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
> > 
> > Can you confirm that your "reproducible" case starts working with that 
> > addition to your ~/.gitconfig? If so, the solution is pretty simple: we 
> > should just lower the default pack windowsize.
> 
> This certainly corrected the issue. Is there some magic
> packedgitwindowsize that I should be looking at for my own repository (our
> internal one) in order to prevent the issue from occurring? 
> 
> Looking into .git/objects/pack, I think the two biggest pack files are
> 3.5G and 177MB respectively :-!

So there's a few rules to packedgitwindowsize:

 - we need to be able to have at least two windows open at a time, in 
   addition to all the "normal" memory git needs just for objects, of 
   course. And quite frankly, you'd be better off with a few more windows, 
   even if that obviously implies smaller windows.

 - the window size really wants to be a round power-of-two number, and at 
   _least_ it wants to be a nice multiple of 2*page size.

So if you have a virtual memory limit of 1.5GB, I'd hesitate to make the 
pack window size larger than 512M, and 256M is probably better. That way, 
I'd expect you to be able to always have at least four windows open 
(assuming a reasonably generous half a gigabyte for "other stuff").

And quite frankly, there's not a huge downside to making them smaller. At 
"just" 32MB, you'll still fit plenty of data in one pack window, and while 
it will cost you a few mmap/unmap's to switch windows around, most 
operations simply will not likely ever notice. At least not under Linux, 
where mmap/munmap is pretty cheap.
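
The arithmetic behind those numbers, as a quick sketch.
windows_that_fit is a made-up helper, and the model (limit minus a
fixed reserve, divided by the window size) is a simplification assumed
for illustration.

```sh
# How many pack windows fit under a virtual memory limit?
# $1 = limit in MB
# $2 = MB set aside for everything else git needs
# $3 = pack window size in MB
windows_that_fit() {
	echo $(( ($1 - $2) / $3 ))
}
```

With a 1536MB limit and 512MB held back, 256MB windows give four
windows, 512MB windows give only two, and 32MB windows give 32.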

		Linus

^ permalink raw reply

* Re: [PATCH v3] parse-opt: migrate builtin-ls-files.
From: Johannes Schindelin @ 2009-01-08  1:10 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Junio C Hamano, Pierre Habouzit, git
In-Reply-To: <1231376145-32331-1-git-send-email-vmiklos@frugalware.org>

Hi,

On Thu, 8 Jan 2009, Miklos Vajna wrote:

> Interdiff: git diff b2a38d9..ee34fcc in my repo.

... or on repo.or.cz:

http://repo.or.cz/w/git/vmiklos.git?a=treediff;hpb=b2a38d9;hb=ee34fcc

Ciao,
Dscho

^ permalink raw reply

* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Linus Torvalds @ 2009-01-08  1:29 UTC (permalink / raw)
  To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
In-Reply-To: <alpine.LFD.2.00.0901071702190.3283@localhost.localdomain>



On Wed, 7 Jan 2009, Linus Torvalds wrote:
> 
> So there's a few rules to packedgitwindowsize:
> 
>  - we need to be able to have at least two windows open at a time, in 
>    addition to all the "normal" memory git needs just for objects, of 
>    course. And quite frankly, you'd be better off with a few more windows, 
>    even if that obviously implies smaller windows.

Btw, I'm not 100% certain of this. Somebody should double-check me. Maybe 
there are cases where we want more than two windows alive. And maybe we 
don't even need two, and we can always make do with just one.

So I will _not_ guarantee that "at least two pack windows" is necessarily 
the right answer. The windowing code was mostly other people doing it. I 
think Shawn and Nico.

			Linus

^ permalink raw reply

* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
From: Shawn O. Pearce @ 2009-01-08  1:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb
In-Reply-To: <alpine.LFD.2.00.0901071726020.3283@localhost.localdomain>

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, 7 Jan 2009, Linus Torvalds wrote:
> > 
> > So there's a few rules to packedgitwindowsize:
> > 
> >  - we need to be able to have at least two windows open at a time, in 
> >    addition to all the "normal" memory git needs just for objects, of 
> >    course. And quite frankly, you'd be better off with a few more windows, 
> >    even if that obviously implies smaller windows.
> 
> Btw, I'm not 100% certain of this. Somebody should double-check me. Maybe 
> there are cases where we want more than two windows alive. And maybe we 
> don't even need two, and we can always make do with just one.
> 
> So I will _not_ guarantee that "at least two pack windows" is necessarily 
> the right answer. The windowing code was mostly other people doing it. I 
> think Shawn and Nico.

I was fairly certain we needed at least two windows open at once,
but reviewing the code in sha1_file.c I don't see a reason for that
restriction anymore.

I think it used to have to do with the delta reconstruction; to
unpack a delta we would read the delta header from one window,
but we may need base data from another.  The delta unpack code
kept the delta window pinned in use, so we couldn't replace it
to access base data from elsewhere, hence we needed two windows.
Thinking about it now I don't recall how we handled the recursion
on a delta chain longer than 2.  ;-)

But looking at the code we have long since refactored it so this
isn't an issue anymore.  We release the window between reading
the delta header and reading the base, so the delta window can be
replaced if necessary.  I think the "2 window minimum" is just a
performance suggestion, not a requirement.

-- 
Shawn.

^ permalink raw reply

* Re: [PATCH v3] parse-opt: migrate builtin-ls-files.
From: Miklos Vajna @ 2009-01-08  1:54 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Pierre Habouzit, git
In-Reply-To: <alpine.DEB.1.00.0901080207520.30769@pacific.mpi-cbg.de>

[-- Attachment #1: Type: text/plain, Size: 407 bytes --]

On Thu, Jan 08, 2009 at 02:10:15AM +0100, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > Interdiff: git diff b2a38d9..ee34fcc in my repo.
> 
> ... or on repo.or.cz:
> 
> http://repo.or.cz/w/git/vmiklos.git?a=treediff;hpb=b2a38d9;hb=ee34fcc

That *is* what I meant by 'my repo'. ;-)

BTW thanks for the link, I didn't know about it - AFAIK stock gitweb
does not have a treediff view.

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: collapsing commits with rebase
From: Geoff Russell @ 2009-01-08  1:59 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <alpine.DEB.1.00.0901080144270.30769@pacific.mpi-cbg.de>

On Thu, Jan 8, 2009 at 11:15 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Thu, 8 Jan 2009, Geoff Russell wrote:
>
>> Dear gits,
>>
>> I have a series of commits:
>>
>>     A---B---C---D---E---F
>>
>> I want to collapse B---C---D into one single commit. git rebase -i B
>> will allow me to do this, but I'm looking for a non-interactive
>> incantation.
>
> You set GIT_EDITOR to a script ;-)

This is plan B.

>
> Alternatively, something like this should work for you:
>
>        $ git checkout A
>        $ git read-tree -u -m D
>        $ git commit -m "My message"
>        $ git cherry-pick E
>        $ git cherry-pick F

Plan B is looking good, because I'd generally like the commit message to be the
concatenation of the messages for B,C and D.
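
For the archives, a rough sketch of what such an editor script could
look like. make_squash_editor and squash_after are names invented here.
It assumes the todo file is called git-rebase-todo, that the commits to
collapse are the first lines of the todo list and each starts with
"pick", that no separate sequence editor is configured, and that the
count is at least 2.

```sh
# Write a throwaway editor script for git to call during `rebase -i`.
# When handed the todo list, it turns lines 2..N from "pick" into
# "squash"; any other file (the combined commit message) is left
# untouched, so the messages end up concatenated.
# $1 = where to write the script, $2 = number of commits to collapse
make_squash_editor() {
	cat >"$1" <<EOF
case "\$1" in
*git-rebase-todo)
	sed '2,$2s/^pick /squash /' "\$1" >"\$1.new" && mv "\$1.new" "\$1"
	;;
esac
EOF
}

# Collapse the first $2 commits after commit $1 into a single commit.
squash_after() {
	editor=$(mktemp) || return 1
	make_squash_editor "$editor" "$2"
	GIT_EDITOR="sh $editor" git rebase -i "$1"
	status=$?
	rm -f "$editor"
	return "$status"
}
```

For the history A---B---C---D---E---F that would be "squash_after A 3",
leaving A, one commit carrying the messages of B, C and D, then E and F.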

Many thanks.

Geoff.

^ permalink raw reply

