Git development
* Re: Problems with large compressed binaries when converting from svn
From: Øyvind Harboe @ 2009-01-08  7:33 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git
In-Reply-To: <81b0412b0901071555t62c1da3ar2b2cfd14222b502e@mail.gmail.com>

>> Does git have some capability to store diffs of compressed files efficiently?
>
> No, but you can unpack the tarballs and include the toolchains as submodules
> (aka subprojects) in the projects which need them.
>
> See man page to git submodule, the user-manual.txt on "submodule" and
> gitmodules.txt (submodule configuration formats and conventions).

I'll need the submodule stuff for sure, but in this particular case I was
trying to see if there was a way to carry our svn abuse patterns over
to git without a lot of retraining.



-- 
Øyvind Harboe
http://www.zylin.com/zy1000.html
ARM7 ARM9 XScale Cortex
JTAG debugger and flash programmer


* Re: Google Summer of Code 2009
From: Sam Vilain @ 2009-01-08  7:55 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Shawn O. Pearce, Alex Riesen, git
In-Reply-To: <alpine.DEB.1.00.0901080024170.7496@intel-tinevez-2-302>

On Thu, 2009-01-08 at 00:30 +0100, Johannes Schindelin wrote:
> However, from what Sam said at the GitTogether, it might be a much better 
> idea to look at the existing code as a fact-finding experiment, scrap it 
> (excluding the experience), and start modifying git-daemon.
> 
> AFAICT Sam has a pretty clear idea how to go about it, and staying with C 
> should make it much easier for other people to comment.
> 
> Note that there has been a flurry of emails on the gittorrent list a few 
> weeks back, where somebody challenged the approach Sam wants to take, 
> saying that BitTorrent has some very nice features that are absolutely 
> necessary, such as its pretty awkward custom encoding.
> 
> But AFAICT Sam did a pretty good job at dispelling all of the objections.

Yes, this is accurate as far as I know.  I've renamed and reworded the
heading under the SoC2009Ideas page to point to the most current design.
It's all at a "just add JFDI" point right now I think ;-).

Sam.


* Re: [PATCH] gitweb: support the rel=vcs-* microformat
From: Giuseppe Bilotta @ 2009-01-08  7:56 UTC (permalink / raw)
  To: git
In-Reply-To: <20090107232427.GA18958@gnu.kitenet.net>

Hello Joey,

On Thursday 08 January 2009 00:24, Joey Hess wrote:

> The rel=vcs-* microformat allows a web page to indicate the locations of
> repositories related to it in a machine-parseable manner.
> (See http://kitenet.net/~joey/rfc/rel-vcs/)

Have you considered submitting the microformat to microformats.org?
That would make the microformat more official and would be a good
first step towards wider coverage of it, and additional reviews.

> Make gitweb use the microformat if it has been configured with project url
> information in any of the usual ways. On the project summary page, the
> repository URL display is simply marked up using the microformat. On the
> project list page and forks list page, the microformat is embedded in the
> header, since the URLs do not appear on the page.
> 
> The microformat could be included on other pages too, but I've skipped
> doing so for now, since it would mean reading another file for every page
> displayed.
> 
> There is a small overhead in including the microformat on project list
> and forks list pages, but getting the project descriptions for those pages
> already incurs a similar overhead, and the ability to get every repo url
> in one place seems worthwhile.

I agree with this, although people with very large project lists may
differ ... do we have timings on these?
 
> This changes git_get_project_url_list() to not check wantarray, and only
> return in list context -- the only way it is used AFAICS. It memoizes
> both that function and git_get_project_description(), to avoid redundant
> file reads.

You may want to consider splitting the patch into three: memoizing
of git_get_project_description(), reworking of
git_get_project_url_list(), and the actual rel=vcs-* insertions.

> Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
> ---
>  gitweb/gitweb.perl |   78 +++++++++++++++++++++++++++++++++++++++++----------
>  1 files changed, 62 insertions(+), 16 deletions(-)
> 
> This incorporates Giuseppe Bilotta's feedback, and uses new features
> of the microformat. You can see this version running at
> http://git.ikiwiki.info/

Oh, and do consider cc'ing jnareb and paski when submitting patches
for gitweb, as they are the (unofficial?) maintainers. I usually cc
gitster (Junio C Hamano) too.

[ Also cc'ing me for this round would have been a nice idea too,
since we had the review going on ;-) ]

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99f71b4..c238717 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2020,9 +2020,14 @@ sub git_get_path_by_hash {
>  ## ......................................................................
>  ## git utility functions, directly accessing git repository
>  
> +{
> +my %project_descriptions; # cache
> +

Out of curiosity, why the grouping? I would have had

our %project_descriptions;

up above with all the global variables.

>  sub git_get_project_description {
>       my $path = shift;
>  
> +     return $project_descriptions{$path} if exists $project_descriptions{$path};
> +

This line is bordering on 80 characters, so you may want to
consider moving 'my $descr' here, with something such as

my $descr = $project_descriptions{$path};
return $descr if defined $descr;

I'm no perl guru, but as far as I can tell exists only applies to
hash or array elements, so the copied scalar has to be tested with
defined (at the cost of not caching an undefined description).

>       $git_dir = "$projectroot/$path";
>       open my $fd, "$git_dir/description"
>               or return git_get_project_config('description');
> @@ -2031,7 +2036,9 @@ sub git_get_project_description {
>       if (defined $descr) {
>               chomp $descr;
>       }
> -     return $descr;
> +     return $project_descriptions{$path}=$descr;
> +}
> +
>  }

[This is where I would end the first patch]

>  
>  sub git_get_project_ctags {
> @@ -2099,18 +2106,30 @@ sub git_show_project_tagcloud {
>       }
>  }
>  
> +{
> +my %project_url_lists; # cache
> +

Ditto for this: why not our %project_url_lists; without scoping?

>  sub git_get_project_url_list {
> +     # use per project git URL list in $projectroot/$path/cloneurl
> +     # or make project git URL from git base URL and project name
>       my $path = shift;
>  
> +     return @{$project_url_lists{$path}} if exists $project_url_lists{$path};
> +
> +     my @ret;
>       $git_dir = "$projectroot/$path";
> -     open my $fd, "$git_dir/cloneurl"
> -             or return wantarray ?
> -             @{ config_to_multi(git_get_project_config('url')) } :
> -                config_to_multi(git_get_project_config('url'));
> -     my @git_project_url_list = map { chomp; $_ } <$fd>;
> -     close $fd;
> +     if (open my $fd, "$git_dir/cloneurl") {
> +             @ret = map { chomp; $_ } <$fd>;
> +             close $fd;
> +     } else {
> +            @ret = @{ config_to_multi(git_get_project_config('url')) };
> +     }
> +     @ret=map { "$_/$project" } @git_base_url_list if ! @ret;
> +
> +     $project_url_lists{$path}=\@ret;
> +     return @ret;
> +}
>  
> -     return wantarray ? @git_project_url_list : \@git_project_url_list;
>  }

[This is where I would end the second patch]

>  
>  sub git_get_projects_list {
> @@ -2856,6 +2875,7 @@ sub blob_contenttype {
>  sub git_header_html {
>       my $status = shift || "200 OK";
>       my $expires = shift;
> +     my $extraheader = shift;
>  
>       my $title = "$site_name";
>       if (defined $project) {
> @@ -2953,6 +2973,8 @@ EOF
>               print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
>       }
>  
> +     print $extraheader if defined $extraheader;
> +
>       print "</head>\n" .
>             "<body>\n";
>  
> @@ -4365,6 +4387,26 @@ sub git_search_grep_body {
>       print "</table>\n";
>  }
>  
> +sub git_link_title {
> +     my $project=shift;
> +     
> +     my $description=git_get_project_description($project);
> +     return $project.(length $description ? " - $description" : "");
> +}

Nice.

> +
> +# generates header with links to the specified projects
> +sub git_links_header {
> +     my $ret='';
> +     foreach my $project (@_) {
> +             # rel=vcs-* microformat
> +             my $title=git_link_title($project);
> +             foreach my $url (git_get_project_url_list($project)) {
> +                     $ret.=qq{<link rel="vcs-git" href="$url" title="$title"/>\n}
> +             }
> +     }
> +     return $ret;
> +}
> +
>  ## ======================================================================
>  ## ======================================================================
>  ## actions
> @@ -4380,7 +4422,9 @@ sub git_project_list {
>               die_error(404, "No projects found");
>       }
>  
> -     git_header_html();
> +     my $extraheader=git_links_header(map { $_->{path} } @list);
> +
> +     git_header_html(undef, undef, $extraheader);
>       if (-f $home_text) {
>               print "<div class=\"index_include\">\n";
>               insert_file($home_text);
> @@ -4405,8 +4449,10 @@ sub git_forks {
>       if (!@list) {
>               die_error(404, "No forks found");
>       }
> +     
> +     my $extraheader=git_links_header(map { $_->{path} } @list);
>  
> -     git_header_html();
> +     git_header_html(undef, undef, $extraheader);

This makes me wonder if it would be worth it to turn git_header_html
into -param => value style, but I'm not really sure it's worth it.

>       git_print_page_nav('','');
>       git_print_header_div('summary', "$project forks");
>       git_project_list_body(\@list, $order);
> @@ -4468,14 +4514,14 @@ sub git_summary {
>               print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
>       }
>  
> -     # use per project git URL list in $projectroot/$project/cloneurl
> -     # or make project git URL from git base URL and project name
>       my $url_tag = "URL";
> -     my @url_list = git_get_project_url_list($project);
> -     @url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
> -     foreach my $git_url (@url_list) {
> +     my $title=git_link_title($project);
> +     foreach my $git_url (git_get_project_url_list($project)) {
>               next unless $git_url;
> -             print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
> +             print "<tr class=\"metadata_url\"><td>$url_tag</td><td>".
> +                   # rel=vcs-* microformat
> +                   "<a rel=\"vcs-git\" href=\"$git_url\" title=\"$title\">$git_url</a>".
> +                   "</td></tr>\n";
>               $url_tag = "";
>       }

Good. Of course the comment removal (which is really a move of the
comment into git_get_project_url_list) would go in the appropriate
patch if you split them 8-)

-- 
Giuseppe "Oblomov" Bilotta


* Re: [BUG PATCH RFC] mailinfo: correctly handle multiline 'Subject:' header
From: Junio C Hamano @ 2009-01-08  8:13 UTC (permalink / raw)
  To: Kirill Smelkov; +Cc: git
In-Reply-To: <20090107224342.GB4946@roro3>

Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:

> On Fri, Dec 26, 2008 at 09:38:41PM +0300, Kirill Smelkov wrote:
>> When native language (RU) is in use, subject header usually contains several
>> parts, e.g.
> ...
> Junio, All,
>
> What about this patch?

What's most interesting is that I do not recall seeing this patch before.
Neither gmane (which is my back-up interface to the mailing list) nor my
mailbox seems to have a copy, and from the look of quoted parts (namely,
some Russian strings in the message), it is not implausible that my spam
filter (either on my receiving end or at the ISP) may have eaten it.

> It at least exposes a bug in git-mailinfo wrt handling of multiline
> subjects, documents it in detail and adds a test for it.
>
> ..., but may I try to attract git
> community attention one more time?

It is very appreciated.

> P.S. original post with patch:
>
> http://marc.info/?l=git&m=123031899307286&w=2

I have not had a chance to look at your patch at marc yet, but from the look
of your problem description, I presume you could trigger this with any
utf-8 b-encoded loooooong subject line?


* Re: [BUG PATCH RFC] mailinfo: correctly handle multiline 'Subject:' header
From: Junio C Hamano @ 2009-01-08  8:35 UTC (permalink / raw)
  To: Kirill Smelkov; +Cc: git
In-Reply-To: <7vy6xm5i6h.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:
> ...
>> http://marc.info/?l=git&m=123031899307286&w=2
>
> I have not had a chance to look at your patch at marc yet, but from the look
> of your problem description, I presume you could trigger this with any
> utf-8 b-encoded loooooong subject line?

Ok, I took a look at it after downloading from the marc archive.

> diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
> index e890f7a..d138bc3 100644
> --- a/builtin-mailinfo.c
> +++ b/builtin-mailinfo.c
> @@ -436,6 +436,14 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg)
>  			 * for now we just trust the data.
>  			 */
>  			c = 0;
> +
> +			/* XXX: the following is needed not to output NUL in
> +			 * the resulting string
> +			 *
> +			 * This seems to be ok, but I'm not 100% sure -- that's
> +			 * why this is an RFC.
> +			 */
> +			continue;
>  		}
>  		else
>  			continue; /* garbage */

B encoding (RFC 2045) encodes an octet stream into a sequence of groups of
4 letters from a 64-char alphabet, each of which encodes 6 bits, plus zero or
more padding chars '=' to make the result a multiple of 4.

 * If the length of the payload is a multiple of 3 octets, there is no
   special handling.  Padding char '=' is not produced;

 * If it is a multiple of 3 octets plus one, the remaining one octet is
   encoded with two letters, and two padding chars '=' are added;

 * If it is a multiple of 3 octets plus two, the remaining two octets are
   encoded with three letters, and one padding char '=' is added.

Hence, a "correct" implementation should decode the input as if '=' were
the same as 'A' (which encodes 6 bits of 0) until the end, making sure that
the padding char '=' appears only at the end of the input, that no char
outside the Base64 encoding alphabet appears in the input, and that the
length of the entire encoded string is a multiple of 4.  Finally it would
discard either one or two octets (depending on the number of padding chars
it saw) from the end of the output.
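
The three padding cases above can be checked with a little arithmetic.
This is only a sketch in POSIX shell, not code from git; the function
names b64_pad and b64_len are made up for illustration.

```shell
#!/bin/sh
# Sketch: how many '=' padding chars and how many encoded chars a
# payload of N octets produces under Base64 (B encoding).
# b64_pad and b64_len are hypothetical helper names.

# Number of '=' characters for a payload of $1 octets.
b64_pad() {
    echo $(( (3 - $1 % 3) % 3 ))
}

# Total encoded length including padding, always a multiple of 4.
b64_len() {
    echo $(( ($1 + 2) / 3 * 4 ))
}

for n in 3 4 5; do
    echo "$n octets: $(b64_len "$n") chars, $(b64_pad "$n") padding"
done
```

With a coreutils-style base64 tool, "printf abcd | base64" prints
YWJjZA==, which has the two padding chars predicted for 4 octets.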

Our decode_b_segment() however emits each octet as it completes, without
waiting for the 24-bit group that contains it to complete.  When decoding
a correctly encoded input, by the time we see a padding '=', all the real
payload octets are complete and we would not have any real information
still kept in the variable "acc" (accumulator), so ignoring '=' (you do
not even need to assign c = 0) like your patch did would work just fine.
An alternative would be to count the number of padding chars at the end and
drop the NULs from the output as necessary after the loop but that does not
add any value to the current code.

Ideally we should validate the encoded string a bit more carefully (see
the "correct" implementation above), and warn if a malformed input is
found (but probably not reject outright).  But as a low-impact fix for the
maintenance branches, I think your fix is very good.

	Side note: I suspect that the existing code was Ok before strbuf
	conversion as we assumed NUL terminated output buffer.

> @@ -513,7 +521,15 @@ static int decode_header_bq(struct strbuf *it)
>  		strbuf_reset(&piecebuf);
>  		rfc2047 = 1;
>  
> -		if (in != ep) {
> +		/* XXX: the following is needed not to output '\n' on every
> +		 * multi-line segment in Subject.
> +		 *
> +		 * I suspect this is not 100% correct, but I'm not a MIME guy
> +		 * -- that's why this is an RFC.
> +		 */
> +
> +		/* if in does not end with '=?=', we emit it as is */
> +		if (in <= (ep-2) && !(ep[-1]=='\n' && ep[-2]=='=')) {
>  			strbuf_add(&outbuf, in, ep - in);
>  			in = ep;
> 
>  		}

I am not a MIME guy either (and mailinfo has a big comment that says we do
not really do MIME --- we just pretend to do), but let me give it a try.

RFC2047 specifies that an encoded-word ("=?charset?encoding?...?=") may
not be more than 75 characters long, and multiple encoded-words, separated
by CRLF SPACE can be used to encode more text if needed.

It further specifies that an encoded-word can appear next to ordinary text
or another encoded-word but it must be separated by linear white space,
and says that such linear white space is to be ignored when displaying.

Which means that we should be eating the CRLF SPACE we see if we have seen
an encoded-word immediately before and we are about to process another
encoded-word.

Based on the above discussion, here is what I came up with.  It passes
your test, but I ran out of energy to try breaking it seriously in any
other way than just running the existing test suite.  

We might want to steal some test cases from the "8. Examples" section of
RFC2047 and add them to t5100.

Thanks.

 builtin-mailinfo.c |   27 +++++++++++++++++++--------
 1 files changed, 19 insertions(+), 8 deletions(-)

diff --git c/builtin-mailinfo.c w/builtin-mailinfo.c
index e890f7a..fcb32c9 100644
--- c/builtin-mailinfo.c
+++ w/builtin-mailinfo.c
@@ -430,13 +430,6 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg)
 			c -= 'a' - 26;
 		else if ('0' <= c && c <= '9')
 			c -= '0' - 52;
-		else if (c == '=') {
-			/* padding is almost like (c == 0), except we do
-			 * not output NUL resulting only from it;
-			 * for now we just trust the data.
-			 */
-			c = 0;
-		}
 		else
 			continue; /* garbage */
 		switch (pos++) {
@@ -514,7 +507,25 @@ static int decode_header_bq(struct strbuf *it)
 		rfc2047 = 1;
 
 		if (in != ep) {
-			strbuf_add(&outbuf, in, ep - in);
+			/*
+			 * We are about to process an encoded-word
+			 * that begins at ep, but there is something
+			 * before the encoded word.
+			 */
+			char *scan;
+			for (scan = in; scan < ep; scan++)
+				if (!isspace(*scan))
+					break;
+
+			if (scan != ep || in == it->buf) {
+				/*
+				 * We should not lose that "something",
+				 * unless we have just processed an
+				 * encoded-word, and there is only LWS
+				 * before the one we are about to process.
+				 */
+				strbuf_add(&outbuf, in, ep - in);
+			}
 			in = ep;
 		}
 		/* E.g.


* Re: Can I prevent someone clone my git repository?
From: Junio C Hamano @ 2009-01-08  8:36 UTC (permalink / raw)
  To: Emily Ren; +Cc: git
In-Reply-To: <856bfe0e0901072303i4fcd3bf6u99790ab9f4170937@mail.gmail.com>

"Emily Ren" <lingyan.ren@gmail.com> writes:

> I want some people to be able to clone my git repository, while others
> can't. Is that possible? How do I do it?

It depends on what transport these people come from.

On the local filesystem transport (either same host or network-mounted
filesystem), you do it the same way as you solve "how do I show these
files of mine on the local computer to some but not others".  Typically,
you place these group members in the same UNIX group, make the toplevel
directory of the hierarchy owned by the group, and "chmod g+rx,o=" it (and
make everything underneath group readable).  Setting core.sharedrepository
configuration variable would help maintain the group readability.
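
A rough sketch of those steps in shell follows. The function name
restrict_repo, the group name and the repository path are placeholders,
not anything git provides.

```shell
#!/bin/sh
# Sketch: limit a repository on a local or network-mounted filesystem
# to members of one UNIX group. restrict_repo is a hypothetical helper.
restrict_repo() {
    repo=$1     # toplevel directory of the repository
    group=$2    # UNIX group whose members may read (and so clone) it

    chgrp -R "$group" "$repo" &&   # hand the whole hierarchy to the group
    chmod -R g+rX "$repo" &&       # group-readable; X adds search on directories
    chmod g+rx,o= "$repo"          # toplevel: group may enter, others get nothing
}

# Example call (placeholder names):
#   restrict_repo /srv/git/project.git devteam
```

Running "git config core.sharedRepository group" inside the repository
afterwards is the setting mentioned above for keeping newly written
files group-accessible.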

If they come over the http transport, you would solve it the same way as
you solve "how do I allow access to these files on my webserver to only
selected few?"  Probably .htaccess file in the toplevel directory will be
involved.

You can set up gitosis and have it serve your repository, and register
group members' SSH keys to gitosis.  It allows you to categorize these
users into different groups, and assign read-only or read-write access to
repositories.  When this is done, these people will be coming over the
"git over ssh" transport, i.e. git@your-host:/path/to/repository.git/
or its synonym ssh://git@your-host/path/to/repository.git/

The git-daemon transport deliberately omits authentication, and you cannot
restrict access when they come over the git native transport using a URL like
git://your-host/repository.git

-jc


* [PATCH] Support ref logs for refs/*
From: Neil Macneale @ 2009-01-08  8:28 UTC (permalink / raw)
  To: git

The documentation for git update-ref seems to imply that logging of ref
updates should be done for anything in refs/, though the code looks like it
restricts changes to heads and remotes. Any reason not to support arbitrary
refs?

I don't see much point in logging for tags, so the patch ignores refs/tags.

Thanks,
Neil

Signed-off-by: Neil Macneale <mac4-git@theory.org>
---
 refs.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 33ced65..cfff22b 100644
--- a/refs.c
+++ b/refs.c
@@ -1154,9 +1154,9 @@ static int log_ref_write(const char *ref_name, const unsigned char *old_sha1,
        git_snpath(log_file, sizeof(log_file), "logs/%s", ref_name);
 
        if (log_all_ref_updates &&
-           (!prefixcmp(ref_name, "refs/heads/") ||
-            !prefixcmp(ref_name, "refs/remotes/") ||
-            !strcmp(ref_name, "HEAD"))) {
+           (!prefixcmp(ref_name, "refs/") ||
+            !strcmp(ref_name, "HEAD")) &&
+           prefixcmp(ref_name, "refs/tags/")) {
                if (safe_create_leading_directories(log_file) < 0)
                        return error("unable to create directory for %s",
                                     log_file);
-- 
1.6.1.141.gfe98e.dirty
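
Read as a truth table, the condition in this patch says: with
core.logAllRefUpdates enabled, create a reflog for HEAD and for anything
under refs/ except refs/tags/. Below is a shell restatement of just that
condition, with a made-up function name and no claim to mirror refs.c
beyond it.

```shell
#!/bin/sh
# Sketch of the patched condition only, assuming log_all_ref_updates
# is already true. wants_reflog is a hypothetical name.
wants_reflog() {
    case "$1" in
        refs/tags/*) return 1 ;;   # tags stay excluded
        refs/*|HEAD) return 0 ;;   # any other ref, plus HEAD
        *)           return 1 ;;   # everything else is left alone
    esac
}

for ref in HEAD refs/heads/master refs/notes/commits refs/tags/v1.0; do
    if wants_reflog "$ref"; then echo "$ref: log"; else echo "$ref: skip"; fi
done
```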


* Re: Can I prevent someone clone my git repository?
From: Johannes Sixt @ 2009-01-08  8:59 UTC (permalink / raw)
  To: Emily Ren; +Cc: Junio C Hamano, git
In-Reply-To: <7vr63e42ke.fsf@gitster.siamese.dyndns.org>

Junio C Hamano schrieb:
> The git-daemon transport deliberately omits authentication, and you cannot
> restrict when they come over the git native transport using a URL like
> git://your-host/repository.git

But you can wrap git daemon by tcpd and configure hosts.allow and
hosts.deny (with all its caveats), if this suits your needs.

-- Hannes


* Git for Product Line Engineering
From: Michail Anastasopoulos @ 2009-01-08  9:01 UTC (permalink / raw)
  To: git

Hello,
I was wondering if any of you has applied git to manage the evolution of a
product line. In such a context, management of software reuse and permanent
variation becomes necessary.

I think that the distributed character of git as well as the easier handling of
branches could be very beneficial in such a context.

Yet I was wondering how the relations between reusable and reused things could
be managed?

If for example I am the maintainer of a library in a product line context I want
to know who pulls from me and whether my library had to undergo any
product-specific changes in any of the other repositories that belong to my
product line.

Regards,
Michalis


* Re: [PATCH] Support ref logs for refs/*
From: Nanako Shiraishi @ 2009-01-08  9:08 UTC (permalink / raw)
  To: Neil Macneale; +Cc: git
In-Reply-To: <20090108082827.GA6177@tesla.theory.org>

Quoting Neil Macneale <mac4-git@theory.org>:

> The documentation for git update-ref seems to imply that logging of ref
> updates should be done for anything in refs/...

The documentation for git-update-ref is part of git, and git does not use anything outside of refs/{heads,tags,remotes}/ for its normal operation.

I think it is generally assumed that there is nothing of interest outside of these areas that deserves the automated creation of reflogs, and the code you are touching is about that. Once you have a reflog for any ref you are interested in outside of these areas, your actions will be logged regardless.

Most notably, refs/stash itself is exempt from this code path; git stash makes sure that the reflog exists without relying on the log_all_ref_updates configuration.

Also the documentation for the configuration variable explicitly says it is about the branch heads.

core.logAllRefUpdates::
	Enable the reflog. Updates to a ref <ref> is logged to the file
	"$GIT_DIR/logs/<ref>", by appending the new and old
	SHA1, the date/time and the reason of the update, but
	only when the file exists.  If this configuration
	variable is set to true, missing "$GIT_DIR/logs/<ref>"
	file is automatically created for branch heads.

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/


* Re: [PATCH (topgit) 1/2] Implement setup_pager just like in git
From: Kirill Smelkov @ 2009-01-08  9:23 UTC (permalink / raw)
  To: martin f krafft
  Cc: Thomas Rast, Bert Wesarg, Pierre Habouzit, Petr Baudis, git
In-Reply-To: <20090108020650.GC7345@lapse.rw.madduck.net>

On Thu, Jan 08, 2009 at 03:06:50PM +1300, martin f krafft wrote:
> also sprach Kirill Smelkov <kirr@landau.phys.spbu.ru> [2009.01.08.1100 +1300]:
> > > So I suppose you could use
> > > 
> > >   ${GIT_PAGER-${PAGER-less}}
> > > 
> > > or similar.
> > 
> > Good eyes, thanks!
> > 
> > I'll rework it.
> 
> I am not 100% on this, but I think nested {}'s are a bashism.

It seems to be ok:

kirr@roro3:~$ dash 
$ unset GIT_PAGER
$ unset PAGER
$ echo ${GIT_PAGER-${PAGER-less}}
less
$ PAGER=more
$ echo ${GIT_PAGER-${PAGER-less}}
more
$ GIT_PAGER=''
$ echo ${GIT_PAGER-${PAGER-less}}

$ GIT_PAGER=/bin/cat
$ echo ${GIT_PAGER-${PAGER-less}}
/bin/cat
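
The empty line in the session above is the one case worth noting:
${var-default} falls back only when the variable is unset, whereas
${var:-default} would also fall back when it is set but empty. A small
sketch (pick_pager is a made-up name, not topgit's actual function):

```shell
#!/bin/sh
# Sketch: choose a pager the way the nested expansion above does.
# pick_pager is a hypothetical helper, not part of topgit or git.
pick_pager() {
    # No colon: an empty GIT_PAGER is respected and yields an empty result.
    printf '%s\n' "${GIT_PAGER-${PAGER-less}}"
}

unset GIT_PAGER PAGER
pick_pager                          # less
(PAGER=more; pick_pager)            # more
(GIT_PAGER=''; pick_pager)          # (empty line)
(GIT_PAGER=/bin/cat; pick_pager)    # /bin/cat
```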


> > On Wed, Jan 07, 2009 at 03:24:02PM +0100, Bert Wesarg wrote:
> > > On Wed, Jan 7, 2009 at 12:27, Kirill Smelkov <kirr@landau.phys.spbu.ru> wrote:
> > > > Martin, thanks for your review.
> > > > +       # atexit(close(1); wait pager)
> > > > +       trap "exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait" EXIT
> > > I think you need to escape the double quotes.
> > 
> > Good eyes -- corrected and thanks!
> 
> You could also just use single quotes inside the double quotes.

Thanks for the tip - I'll keep it in mind. Or is it the preferred way?
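
For comparison, here is what the shell makes of both spellings of the
trap string. The paths are invented and contain a space on purpose, and
the strings are assigned to variables rather than passed to trap so they
can be printed; this is a sketch, not the code from the patch.

```shell
#!/bin/sh
# Sketch: nested double quotes vs single quotes inside a double-quoted
# string. The fifo paths are hypothetical.
_pager_fifo_dir="/tmp/tg pager.1234"
_pager_fifo="$_pager_fifo_dir/0"

# Inner double quotes end and reopen the string, so they vanish and the
# paths end up unquoted in the resulting command text.
broken="exec >&-; rm "$_pager_fifo"; rmdir "$_pager_fifo_dir"; wait"

# Single quotes survive inside double quotes, while the variables are
# still expanded when the string is built.
fixed="exec >&-; rm '$_pager_fifo'; rmdir '$_pager_fifo_dir'; wait"

echo "$broken"
echo "$fixed"
```

The single-quoted form still breaks if a path itself contains a single
quote. Putting the whole trap body in single quotes, so that the
variables are expanded only when the trap runs, avoids that as long as
the variables are still set at exit.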


Thanks,
Kirill


* Re: [PATCH] tutorial.txt renamed
From: Brian Foster @ 2009-01-08  9:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Brian Gernhardt, Christian Couder, Joey Hess
In-Reply-To: <7vljtnbpha.fsf@gitster.siamese.dyndns.org>

On Wednesday 07 January 2009 07:27:13 Junio C Hamano wrote:
> Brian Gernhardt <benji@silverinsanity.com> writes:
> > This is the README file for the project, so it should advise looking  
> > at the Documentation directory as neither the man pages nor git command  
> > are likely installed at this point.
> 
> I think that is a sane suggestion.  It is better to keep the number of
> prerequisites to the minimum for the user in order to follow README (and
> INSTALL, of course).

 It is indeed a sane suggestion.  However, there is no (obvious?)
 harm in *also* mentioning that ‘git help tutorial’ should
 display the tutorial.  Something like “If git has been correctly
 installed, then this tutorial can also be read with the command
 ‘git help tutorial’.”

cheers!
	-blf-
-- 
“How many surrealists does it take to   | Brian Foster
 change a lightbulb? Three. One calms   | somewhere in south of France
 the warthog, and two fill the bathtub  |   Stop E$$o (ExxonMobil)!
 with brightly-coloured machine tools.” |      http://www.stopesso.com


* Re: Can I prevent someone clone my git repository?
From: Emily Ren @ 2009-01-08  9:33 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git
In-Reply-To: <4965C07D.705@viscovery.net>

Hannes,
Could you give me detailed steps on how to wrap git daemon by tcpd?

Junio,
I think gitosis can control read-only or writable access, but it can't control
whether the repository can be cloned. Am I right?

Thanks,
Emily

On Thu, Jan 8, 2009 at 4:59 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Junio C Hamano schrieb:
>> The git-daemon transport deliberately omits authentication, and you cannot
>> restrict when they come over the git native transport using a URL like
>> git://your-host/repository.git
>
> But you can wrap git daemon by tcpd and configure hosts.allow and
> hosts.deny (with all its caveats), if this suits your needs.
>
> -- Hannes
>


* Re: Can I prevent someone clone my git repository?
From: Johannes Sixt @ 2009-01-08  9:41 UTC (permalink / raw)
  To: Emily Ren; +Cc: Junio C Hamano, git
In-Reply-To: <856bfe0e0901080133q68d0008ao1abf9d235e70279e@mail.gmail.com>

Emily Ren schrieb:
> Could you give me a detailed steps on how to wrap git daemon by tcpd?

Sorry, no, I haven't done that myself. I would look in /etc/xinetd.d/* to see
how tcpd is used with other protocols and merge that information with the
examples in the man page of git daemon.

-- Hannes


* Re: collapsing commits with rebase
From: Geoff Russell @ 2009-01-08  9:49 UTC (permalink / raw)
  To: Boyd Stephen Smith Jr.; +Cc: Miklos Vajna, git
In-Reply-To: <200901072039.12631.bss@iguanasuicide.net>

On 1/8/09, Boyd Stephen Smith Jr. <bss@iguanasuicide.net> wrote:
> On Wednesday 2009 January 07 20:32:24 Miklos Vajna wrote:
>  >On Wed, Jan 07, 2009 at 08:11:32PM -0600, "Boyd Stephen Smith Jr."
>  <bss@iguanasuicide.net> wrote:
>  >> git merge -s sha(D)
>  >
>  >You probably mean --squash here, -s stands for --strategy - and I *hope*
>  >you don't have git-sha(D) in your PATH, as a custom merge strategy. ;-)
>

Many thanks, now I have plenty of approaches to think about!

Cheers,
Geoff.


* Re: Comments on Presentation Notes Request.
From: Jeff King @ 2009-01-08  9:56 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Tim Visher, git
In-Reply-To: <alpine.LNX.1.00.0901071654530.19665@iabervon.org>

On Wed, Jan 07, 2009 at 05:30:04PM -0500, Daniel Barkalow wrote:

> > So yes, you are much more likely to salvage useful (if not all) data
> > from developer repositories in the event of a crash. But I still think
> > it's crazy not to have a backup strategy for your DVCS repo.
> 
> I think it's very important to have a backup strategy, but it's nice that 
> the developers can get work done while the server is still down.

I think everything you said in your email was correct, and I agree with
it, but I just wanted to clarify one thing about what I said.

I really _do_ think you are better off in a disaster or backup situation
with a DVCS. Both this past year and 2007, Junio dropped off the face of
the git planet for a few weeks, and everyone seamlessly switched to
Shawn as maintainer. So I think of the DVCS model almost more as "high
availability": even if you model your workflow around a central server,
it's easy to route around the failure.

It's just that I don't think these features totally _replace_ backups as
a concept. And I feel like that notion creeps up now and again in the
centralized versus distributed holy wars.

So I think we agree; I just wasn't sure if I gave the wrong impression
from my first email.

-Peff


* Re: [PATCH RFC] mailinfo: correctly handle multiline 'Subject:' header
From: Alexander Potashev @ 2009-01-08 10:08 UTC (permalink / raw)
  To: Kirill Smelkov; +Cc: Junio C Hamano, git
In-Reply-To: <1230316721-14339-1-git-send-email-kirr@mns.spb.ru>

On 21:38 Fri 26 Dec     , Kirill Smelkov wrote:
> When native language (RU) is in use, subject header usually contains several
> parts, e.g.
> 
> Subject: [Navy-patches] [PATCH]
> 	=?utf-8?b?0JjQt9C80LXQvdGR0L0g0YHQv9C40YHQvtC6INC/0LA=?=
> 	=?utf-8?b?0LrQtdGC0L7QsiDQvdC10L7QsdGF0L7QtNC40LzRi9GFINC00LvRjyA=?=
> 	=?utf-8?b?0YHQsdC+0YDQutC4?=
> 

>  t/t5100/info0012    |    5 ++++
>  t/t5100/msg0012     |    7 ++++++
>  t/t5100/patch0012   |   30 +++++++++++++++++++++++++++++
>  t/t5100/sample.mbox |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 112 insertions(+), 2 deletions(-)

The testcases are too long; a minimal mbox with an encoded "Subject:" would
be enough to test the mailinfo parser, and that's all you need to test
here.

^ permalink raw reply

* Re: Problems with large compressed binaries when converting from svn
From: Johan Herland @ 2009-01-08 10:01 UTC (permalink / raw)
  To: Øyvind Harboe; +Cc: git
In-Reply-To: <c09652430901060455l5179888ep3c51ff4e3dd5a6ef@mail.gmail.com>

On Tuesday 06 January 2009, Øyvind Harboe wrote:
> I'm converting from svn and I've run into a
> problem with tar.gz and tar.bz2 compressed files.
>
> (This is separate from, but slightly related to, my previous post.)
>
> In subversion we committed large tar.bz2/gz files. These files would
> change relatively rarely, but only very slightly.  The trouble with the
> tar.bz2 format is that if the first byte changes, then the rest of the
> file will also be different. .zip does not have this problem, but .zip
> isn't a very friendly format for our purposes.
>
> Later on the tar.bz2/gz files started to change fairly often, but
> harddrives get bigger much more quickly than the .svn repository grows so
> we just kept doing things the same way rather than reeducate and
> reengineer the procedures.
>
> With .git we need to handle this differently somehow.
>
> Does git have some capability to store diffs of compressed files
> efficiently?
>
> The only other alternative I can think of is to commit uncompressed
> .tar files which is a bit of a bump in the road, but I suppose could be
> made to work.

Git can automate this for you. Take a look at the gitattributes(5) man page, 
specifically the "filter" attribute. You should be able to set up filter 
drivers for .tar.gz files that use "clean=gunzip" and "smudge=gzip" (and a 
similar filter driver for .tar.bz2 files).

If I've understood this right (I haven't used this myself) your checkouts 
should now have .tar.gz and .tar.bz2 files, even though Git only 
stores .tar files internally (thus improving compression across versions 
dramatically).


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply

* Re: collapsing commits with rebase
From: Johannes Schindelin @ 2009-01-08 11:07 UTC (permalink / raw)
  To: Geoff Russell; +Cc: git
In-Reply-To: <93c3eada0901071759u2496835dy134d92613bf4244b@mail.gmail.com>

Hi,

On Thu, 8 Jan 2009, Geoff Russell wrote:

> On Thu, Jan 8, 2009 at 11:15 AM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > Alternatively, something like this should work for you:
> >
> >        $ git checkout A
> >        $ git read-tree -u -m D
> >        $ git commit -m "My message"
> >        $ git cherry-pick E
> >        $ git cherry-pick F
> 
> Plan B is looking good, because I'd generally like the commit message to 
> be the concatenation of the messages for B,C and D.

Replace the commit call by this:

	$ for commit in B C D
	  do
		git cat-file commit $commit | sed '1,/^$/d'
		# possibly add an empty line between the commit messages,
		# git commit will strip away empty lines at the end.
	  done |
	  git commit -F -

Hth,
Dscho

^ permalink raw reply

* Re: Can I prevent someone clone my git repository?
From: Johannes Schindelin @ 2009-01-08 11:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Emily Ren, git
In-Reply-To: <7vr63e42ke.fsf@gitster.siamese.dyndns.org>

Hi,

On Thu, 8 Jan 2009, Junio C Hamano wrote:

> The git-daemon transport deliberately omits authentication, and you 
> cannot restrict when they come over the git native transport using a URL 
> like git://your-host/repository.git

If the people are on different IPs, a hook can restrict who may clone, 
since commit v1.6.1-rc1~109.

Ciao,
Dscho

^ permalink raw reply

* [PATCH/RFC v4 0/2] git checkout: optimise away lots of lstat() calls
From: Kjetil Barvik @ 2009-01-08 11:32 UTC (permalink / raw)
  To: git; +Cc: Kjetil Barvik

I have just started to clone some interesting Linux git trees to watch
the development more closely, and therefore also started to use git. I
noticed that 'git checkout' takes some time, and especially that the
'git checkout' command does lots and lots of lstat() calls.

After some more investigation and thinking, I have made 2 patches and
been able to optimise away over 42% of all lstat() calls in some cases
for the 'git checkout' command.  I have not tested other git porcelain
commands for reduced lstat() calls, but I would guess that the more
effective 'lstat_cache()' compared to 'has_leading_symlink_cache()',
should also give better numbers in other cases.

Both patches are against git master, and the git 'make test' test suite
still passes after each patch.

To document the improvement, below are some numbers comparing
before and after the 2 patches. To reproduce the numbers:

- git clone the Linux git tree to be able to get the Linux tags
  'v2.6.25' and 'v2.6.27'.
- git checkout -b my-v2.6.27 v2.6.27
- git checkout -b my-v2.6.25 v2.6.25

Then, when the current branch is 'my-v2.6.25' do:

  strace -o strace_to27 -T git checkout -q my-v2.6.27

Then pretty-print and collect stats from the 'strace_to27' file.  If
someone wants a copy of the strace_stat.pl script, which I wrote to do
the pretty-printing, let me know.

Below are the stats/numbers from the current git version (before the 2
patches).  Notice that we do an lstat() call on the "arch" directory
over 6000 times!

TOTAL      185151 100.000% OK:165544 NOT: 19607  11.136001 sec   60 usec/call
lstat64    120954  65.327% OK:107013 NOT: 13941   5.388727 sec   45 usec/call
  strings  120954 tot  30163 uniq   4.010 /uniq   5.388727 sec   45 usec/call
  files     61491 tot  28712 uniq   2.142 /uniq   2.740520 sec   45 usec/call
  dirs      45522 tot   1436 uniq  31.701 /uniq   1.994448 sec   44 usec/call
  errors    13941 tot   5189 uniq   2.687 /uniq   0.653759 sec   47 usec/call
             6297   5.206% OK:  6297 NOT:     0  "arch"
             4544   3.757% OK:  4544 NOT:     0  "drivers"
             1816   1.501% OK:  1816 NOT:     0  "arch/arm"
             1499   1.239% OK:  1499 NOT:     0  "include"
              912   0.754% OK:   912 NOT:     0  "arch/powerpc"
              764   0.632% OK:   764 NOT:     0  "fs"
              746   0.617% OK:   746 NOT:     0  "drivers/net"
              662   0.547% OK:   662 NOT:     0  "net"
              652   0.539% OK:   325 NOT:   327  "arch/sparc/include"
              636   0.526% OK:   636 NOT:     0  "drivers/media"
              606   0.501% OK:   606 NOT:     0  "include/linux"
              533   0.441% OK:   533 NOT:     0  "arch/sh"
              522   0.432% OK:   260 NOT:   262  "arch/powerpc/include"
              488   0.403% OK:   243 NOT:   245  "arch/sh/include"
              413   0.341% OK:   413 NOT:     0  "arch/sparc"
              390   0.322% OK:   390 NOT:     0  "arch/x86"
              383   0.317% OK:   383 NOT:     0  "Documentation"
              370   0.306% OK:   184 NOT:   186  "arch/ia64/include"
              366   0.303% OK:   366 NOT:     0  "drivers/media/video"
              348   0.288% OK:   173 NOT:   175  "arch/arm/include"

Here are the stats/numbers after applying the 2 patches.  Notice how
nice the top 20 entries list now looks!

TOTAL      133655 100.000% OK:121615 NOT: 12040  10.429999 sec   78 usec/call
lstat64     69603  52.077% OK: 63218 NOT:  6385   3.419920 sec   49 usec/call
  strings   69603 tot  30163 uniq   2.308 /uniq   3.419920 sec   49 usec/call
  files     61491 tot  28712 uniq   2.142 /uniq   3.034869 sec   49 usec/call
  dirs       1727 tot   1164 uniq   1.484 /uniq   0.075681 sec   44 usec/call
  errors     6385 tot   5189 uniq   1.230 /uniq   0.309370 sec   48 usec/call
                4   0.006% OK:     4 NOT:     0  ".gitignore"
                4   0.006% OK:     4 NOT:     0  ".mailmap"
                4   0.006% OK:     4 NOT:     0  "CREDITS"
                4   0.006% OK:     4 NOT:     0  "Documentation/00-INDEX"
                4   0.006% OK:     4 NOT:     0  "Documentation/ABI/testing/sysfs-block"
                4   0.006% OK:     4 NOT:     0  "Documentation/ABI/testing/sysfs-firmware-acpi"
                4   0.006% OK:     4 NOT:     0  "Documentation/CodingStyle"
                4   0.006% OK:     4 NOT:     0  "Documentation/DMA-API.txt"
                4   0.006% OK:     4 NOT:     0  "Documentation/DMA-mapping.txt"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/Makefile"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/gadget.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/kernel-api.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/kernel-locking.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/procfs-guide.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/procfs_example.c"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/rapidio.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/s390-drivers.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/uio-howto.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/videobook.tmpl"
                4   0.006% OK:     4 NOT:     0  "Documentation/DocBook/writing_usb_driver.tmpl"

Note that the overall system time, as recorded by 'strace -T', does not
drop as much as the reduced lstat() time would suggest for _this_
particular test run.  This is because each unlink() call now takes much
more time, at least for me on a slow IDE disk (using ext3) on a laptop.

A simple test gives me an overall improvement of 2.937 seconds: real
time drops from 28.195s (best of 5 runs with 'time git ...'), to
25.381s (best of 5 runs).

I have also noticed that inside the unlink_entry() function in file
unpack-trees.c, one could often end up calling rmdir() lots and lots
of times on non-empty directories.  Maybe one should schedule each
directory for removal by an appropriate function, and then at the end
call a new function to clean all the directories at once?

Comments?

----
  Changes since v3:
  = inside patch 1/2
    - readability: rename 'greatest_common_cache_path_prefix()' to
      'greatest_match_lstat_cache()' and corresponding line edits
  = inside patch 2/2
    - readability: renames of several variables and one function
    - reuse original names for 'buf' and 'len' (makes the patch touch
      less lines)
    - simplified the update of the cache
    - no need to clear the cache inside remove_subtree()
    - small cleanups of comments and commit message

  Changes since v2:
  = inside patch 1/2
      [[The following are updates after comments from Linus Torvalds - Thanks!]]
    - simplified the interface: introduce 2 static inline functions
      has_symlink_leading_path() and has_symlink_or_noent_leading_path()
    - similarly, introduce 2 defines: clear_symlink_cache() and
      clear_symlink_or_noent_cache()
    - reorganise the patches: previous patches 2/4 and 4/4 are folded
      into this one
    - update the commit message accordingly
    - keep the symlinks.c file
  = inside patch 2/2
    - was patch 3/4 in v2
    - always null terminate the dirs_path array
    - update the patch with some of the comments regarding patch 1/4
      from Junio C Hamano

  Changes since v1:
  = inside patch 1/4
    - always null terminate the cache_path array
    - added a paragraph to the commit message for this patch
    - small cleanup on 2 comments, and a small line indentation change
      [[The following are updates after comments from Junio C Hamano - Thanks!]]
    - removed the 'static inline update_path_cache()' function
    - replaced the else-part of the above inline function with a call
      to the 'clear_lstat_cache()' function.
    - deleted the '|| errno == ENOTDIR' part inside the big while-loop
      inside check_lstat_cache(), and updated the named BIT-fields
      accordingly
  = inside patch 2/4
    - moved a paragraph out from the commit message for this patch and
      into this cover-letter
      [[The following are updates after comments from Junio C Hamano - Thanks!]]
    - Removed the '|LSTAT_NOTDIR' part from the call to lstat_cache()
      inside function 'check_removed()' inside file diff-lib.c


Kjetil Barvik (2):
  Optimised, faster, more effective symlink/directory detection
  entry.c: create_directories(): only create/check each directory once!

 builtin-add.c          |    1 +
 builtin-apply.c        |    1 +
 builtin-update-index.c |    1 +
 cache.h                |   23 +++++++++-
 diff-lib.c             |    1 +
 entry.c                |   57 ++++++++++++++++++----
 symlinks.c             |  120 ++++++++++++++++++++++++++++++-----------------
 unpack-trees.c         |    6 ++-
 8 files changed, 152 insertions(+), 58 deletions(-)

^ permalink raw reply

* [PATCH/RFC v4 1/2] Optimised, faster, more effective symlink/directory detection
From: Kjetil Barvik @ 2009-01-08 11:32 UTC (permalink / raw)
  To: git; +Cc: Kjetil Barvik
In-Reply-To: <1231414356-6982-1-git-send-email-barvik@broadpark.no>

Changes include the following:

- The cache functionality is more effective.  Previously when A/B/C/D
  was in the cache and A/B/C/E/file.c was called for, there was no
  match at all from the cache.  Now we use the fact that the paths
  "A", "A/B" and "A/B/C" are already tested, and we only need to do an
  lstat() call on "A/B/C/E".

- We only cache/store the last path regardless of its type.  Since the
  cache functionality is always used with alphabetically sorted names
  (at least it seems so to me), there is no need to store both the
  last symlink-leading path and the last real-directory path.  Note
  that if the cache is not called with (mostly) alphabetically sorted
  names, neither the old, nor this new one, would be very effective.

- We also can cache the fact that a directory does not exist.
  Previously we could end up doing lots of lstat() calls for a removed
  directory which previously contained lots of files.  Since we
  already have simplified the cache functionality and only store the
  last path (see above), this new functionality was easy to add.

- Previously, when symlink A/B/C/S was cached/stored in the
  symlink-leading path, and A/B/C/file.c was called for, it was not
  easy to use the fact that we already know that the paths "A", "A/B"
  and "A/B/C" are real directories.  Since we now only store one single
  path (the last one), we also get similar logic for free regarding
  the new "non-existing-directory-cache".

- Avoid copying the first path components of the name a zillion times
  when we test new path components.  Since we always cache/store the
  last path, we can copy each component directly into the cache as we
  test it.  Previously we ended up doing a memcpy() for the full
  path/name right before each lstat() call, and again when updating the
  cache each time we had tested a new path component.

- We also use less memory, that is PATH_MAX bytes less memory on the
  stack and PATH_MAX bytes less memory on the heap.

- Introduce a 3rd argument, 'unsigned int track_flags', to the
  cache-test function, check_lstat_cache().  This new argument can be
  used to tell the cache functionality which types of directories
  should be cached.

- Also introduce a 'void clear_lstat_cache(void)' function, which
  should be used to clean the cache before usage.  If for instance,
  you have changed the types of directories which should be cached,
  the cache could contain a path which was not wanted.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 719de8b... 870961e... M	builtin-add.c
:100644 100644 a8f75ed... d3d001a... M	builtin-apply.c
:100644 100644 5604977... 8907219... M	builtin-update-index.c
:100644 100644 231c06d... 768ba38... M	cache.h
:100644 100644 ae96c64... c9caa0e... M	diff-lib.c
:100644 100644 5a5e781... a68b11e... M	symlinks.c
:100644 100644 54f301d... 28e2759... M	unpack-trees.c
 builtin-add.c          |    1 +
 builtin-apply.c        |    1 +
 builtin-update-index.c |    1 +
 cache.h                |   22 ++++++++-
 diff-lib.c             |    1 +
 symlinks.c             |  120 ++++++++++++++++++++++++++++++-----------------
 unpack-trees.c         |    5 +-
 7 files changed, 104 insertions(+), 47 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 719de8b0f2d2d831f326d948aa18700e5c474950..870961e8ca4e3d6f9333020083d0a232bccd542c 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -225,6 +225,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	argc = parse_options(argc, argv, builtin_add_options,
 			  builtin_add_usage, 0);
+	clear_symlink_cache();
 	if (patch_interactive)
 		add_interactive = 1;
 	if (add_interactive)
diff --git a/builtin-apply.c b/builtin-apply.c
index a8f75ed3ed411d8cf7a3ec9dfefef7407c50f447..d3d001a96be6e502d6338af4467f7c313370d78e 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -3154,6 +3154,7 @@ int cmd_apply(int argc, const char **argv, const char *unused_prefix)
 	if (apply_default_whitespace)
 		parse_whitespace_option(apply_default_whitespace);
 
+	clear_symlink_cache();
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
 		char *end;
diff --git a/builtin-update-index.c b/builtin-update-index.c
index 560497750586ec61be4e34de6dedd9c307129817..8907219fb9cb438113e29ee17854edb5dd4baa4d 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -581,6 +581,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	if (entries < 0)
 		die("cache corrupted");
 
+	clear_symlink_cache();
 	for (i = 1 ; i < argc; i++) {
 		const char *path = argv[i];
 		const char *p;
diff --git a/cache.h b/cache.h
index 231c06d7726b575f6e522d5b0c0fe43557e8c651..768ba3825f3015828381490b0c387177a4f71578 100644
--- a/cache.h
+++ b/cache.h
@@ -719,7 +719,27 @@ struct checkout {
 };
 
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
-extern int has_symlink_leading_path(int len, const char *name);
+
+#define LSTAT_DIR       (1u << 0)
+#define LSTAT_NOENT     (1u << 1)
+#define LSTAT_SYMLINK   (1u << 2)
+#define LSTAT_LSTATERR  (1u << 3)
+#define LSTAT_ERR       (1u << 4)
+extern unsigned int check_lstat_cache(int len, const char *name,
+				      unsigned int track_flags);
+extern void clear_lstat_cache(void);
+static inline unsigned int has_symlink_leading_path(int len, const char *name)
+{
+	return check_lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_DIR) &
+		LSTAT_SYMLINK;
+}
+#define clear_symlink_cache() clear_lstat_cache()
+static inline unsigned int has_symlink_or_noent_leading_path(int len, const char *name)
+{
+	return check_lstat_cache(len, name, LSTAT_SYMLINK|LSTAT_NOENT|LSTAT_DIR) &
+		(LSTAT_SYMLINK|LSTAT_NOENT);
+}
+#define clear_symlink_or_noent_cache() clear_lstat_cache()
 
 extern struct alternate_object_database {
 	struct alternate_object_database *next;
diff --git a/diff-lib.c b/diff-lib.c
index ae96c64ca209f4df9008198e8a04b160bed618c7..c9caa0e6ef0f4a8ee8b850869ef6d0f52b712385 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -69,6 +69,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		diff_unmerged_stage = 2;
 	entries = active_nr;
 	symcache[0] = '\0';
+	clear_symlink_cache();
 	for (i = 0; i < entries; i++) {
 		struct stat st;
 		unsigned int oldmode, newmode;
diff --git a/symlinks.c b/symlinks.c
index 5a5e781a15d7d9cb60797958433eca896b31ec85..a68b11e2dbd875bc26b4fe0b87490dd64305cdd0 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -1,64 +1,96 @@
 #include "cache.h"
 
-struct pathname {
-	int len;
-	char path[PATH_MAX];
-};
+static char         cache_path[PATH_MAX];
+static int          cache_len   = 0;
+static unsigned int cache_flags = 0;
 
-/* Return matching pathname prefix length, or zero if not matching */
-static inline int match_pathname(int len, const char *name, struct pathname *match)
+static inline int greatest_match_lstat_cache(int len, const char *name)
 {
-	int match_len = match->len;
-	return (len > match_len &&
-		name[match_len] == '/' &&
-		!memcmp(name, match->path, match_len)) ? match_len : 0;
-}
+	int max_len, match_len = 0, i = 0;
 
-static inline void set_pathname(int len, const char *name, struct pathname *match)
-{
-	if (len < PATH_MAX) {
-		match->len = len;
-		memcpy(match->path, name, len);
-		match->path[len] = 0;
+	max_len = len < cache_len ? len : cache_len;
+	while (i < max_len && name[i] == cache_path[i]) {
+		if (name[i] == '/') match_len = i;
+		i++;
 	}
+	if (i == cache_len && len > cache_len && name[cache_len] == '/')
+		match_len = cache_len;
+	return match_len;
 }
 
-int has_symlink_leading_path(int len, const char *name)
+/*
+ * Check if name 'name' of length 'len' has a symlink leading
+ * component, or if the directory exists and is real, or not.
+ *
+ * To speed up the check, some information is allowed to be cached.
+ * This is indicated by the 'track_flags' argument.
+ */
+unsigned int
+check_lstat_cache(int len, const char *name, unsigned int track_flags)
 {
-	static struct pathname link, nonlink;
-	char path[PATH_MAX];
+	int match_len, last_slash, max_len;
+	unsigned int match_flags, ret_flags, save_flags;
 	struct stat st;
-	char *sp;
-	int known_dir;
 
-	/*
-	 * See if the last known symlink cache matches.
+	/* Check if match from the cache for 2 "excluding" path types.
 	 */
-	if (match_pathname(len, name, &link))
-		return 1;
+	match_len = last_slash = greatest_match_lstat_cache(len, name);
+	match_flags = cache_flags & track_flags & (LSTAT_NOENT|LSTAT_SYMLINK);
+	if (match_flags && match_len == cache_len)
+		return match_flags;
 
-	/*
-	 * Get rid of the last known directory part
+	/* Okay, no match from the cache so far, so now we have to
+	 * check the rest of the path components.
 	 */
-	known_dir = match_pathname(len, name, &nonlink);
-
-	while ((sp = strchr(name + known_dir + 1, '/')) != NULL) {
-		int thislen = sp - name ;
-		memcpy(path, name, thislen);
-		path[thislen] = 0;
+	ret_flags = LSTAT_DIR;
+	max_len = len < PATH_MAX ? len : PATH_MAX;
+	while (match_len < max_len) {
+		do {
+			cache_path[match_len] = name[match_len];
+			match_len++;
+		} while (match_len < max_len && name[match_len] != '/');
+		if (match_len >= max_len)
+			break;
+		last_slash = match_len;
+		cache_path[last_slash] = '\0';
 
-		if (lstat(path, &st))
-			return 0;
-		if (S_ISDIR(st.st_mode)) {
-			set_pathname(thislen, path, &nonlink);
-			known_dir = thislen;
+		if (lstat(cache_path, &st)) {
+			ret_flags = LSTAT_LSTATERR;
+			if (errno == ENOENT)
+				ret_flags |= LSTAT_NOENT;
+		} else if (S_ISDIR(st.st_mode)) {
 			continue;
-		}
-		if (S_ISLNK(st.st_mode)) {
-			set_pathname(thislen, path, &link);
-			return 1;
+		} else if (S_ISLNK(st.st_mode)) {
+			ret_flags = LSTAT_SYMLINK;
+		} else {
+			ret_flags = LSTAT_ERR;
 		}
 		break;
 	}
-	return 0;
+
+	/* At the end update the cache.  Note that max 3 different
+	 * path types can be cached for the moment!
+	 */
+	save_flags = ret_flags & track_flags &
+		(LSTAT_NOENT|LSTAT_SYMLINK|LSTAT_DIR);
+	if (save_flags && last_slash > 0 && last_slash < PATH_MAX) {
+		cache_path[last_slash] = '\0';
+		cache_len   = last_slash;
+		cache_flags = save_flags;
+	} else {
+		clear_lstat_cache();
+	}
+	return ret_flags;
+}
+
+/*
+ * Before usage of the check_lstat_cache() function one should call
+ * clear_lstat_cache() (at an appropriate place) to make sure that the
+ * cache is clean.
+ */
+void clear_lstat_cache(void)
+{
+	cache_path[0] = '\0';
+	cache_len     = 0;
+	cache_flags   = 0;
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index 54f301da67be879c80426bc21776427fdd38c02e..28e275981a21b033459ef9c7e420cce4bf7e5513 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -61,7 +61,7 @@ static void unlink_entry(struct cache_entry *ce)
 	char *cp, *prev;
 	char *name = ce->name;
 
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
 		return;
 	if (unlink(name))
 		return;
@@ -105,6 +105,7 @@ static int check_updates(struct unpack_trees_options *o)
 		cnt = 0;
 	}
 
+	clear_symlink_or_noent_cache();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -584,7 +585,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 	if (o->index_only || o->reset || !o->update)
 		return 0;
 
-	if (has_symlink_leading_path(ce_namelen(ce), ce->name))
+	if (has_symlink_or_noent_leading_path(ce_namelen(ce), ce->name))
 		return 0;
 
 	if (!lstat(ce->name, &st)) {
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* [PATCH/RFC v4 2/2] entry.c: create_directories(): only create/check each directory once!
From: Kjetil Barvik @ 2009-01-08 11:32 UTC (permalink / raw)
  To: git; +Cc: Kjetil Barvik
In-Reply-To: <1231414356-6982-1-git-send-email-barvik@broadpark.no>

When we do a 'git checkout', we eventually end up in the
'checkout_entry()' function inside entry.c, and from there we call the
'create_directories()' function to make sure that all the directories
exist for the possible new file or entry.

The 'create_directories()' function checked that every path component
exists on each call.  This resulted in tons and tons of calls to
lstat() or stat() when we check out files nested deep inside a
directory.

We try to avoid this by remembering the last created directory.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
---
:100644 100644 768ba38... ec1297f... M	cache.h
:100644 100644 aa2ee46... 1d5fc85... M	entry.c
:100644 100644 28e2759... 0a03e65... M	unpack-trees.c
 cache.h        |    1 +
 entry.c        |   57 +++++++++++++++++++++++++++++++++++++++++++++----------
 unpack-trees.c |    1 +
 3 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 768ba3825f3015828381490b0c387177a4f71578..ec1297ff5621cc9eb7fce51cc025f18a030ac9ea 100644
--- a/cache.h
+++ b/cache.h
@@ -718,6 +718,7 @@ struct checkout {
 		 refresh_cache:1;
 };
 
+extern void clear_created_dirs_cache(void);
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
 
 #define LSTAT_DIR       (1u << 0)
diff --git a/entry.c b/entry.c
index aa2ee46a84033585d8e07a585610c5a697af82c2..1d5fc85b5f4b02bcdb862745777f7bc086b70c63 100644
--- a/entry.c
+++ b/entry.c
@@ -1,18 +1,50 @@
 #include "cache.h"
 #include "blob.h"
 
-static void create_directories(const char *path, const struct checkout *state)
+static char buf[PATH_MAX];
+static int  buf_len = 0;
+
+static inline int
+greatest_match_last_created_dir(int len, const char *path)
 {
-	int len = strlen(path);
-	char *buf = xmalloc(len + 1);
-	const char *slash = path;
+	int max_len, match_len = 0, i = 0;
 
-	while ((slash = strchr(slash+1, '/')) != NULL) {
-		struct stat st;
-		int stat_status;
+	max_len = len < buf_len ? len : buf_len;
+	while (i < max_len && path[i] == buf[i]) {
+		if (path[i] == '/') match_len = i;
+		i++;
+	}
+	if (i == buf_len && len > buf_len && path[buf_len] == '/')
+		match_len = buf_len;
+	return match_len;
+}
+
+void clear_created_dirs_cache(void)
+{
+	buf[0]  = 0;
+	buf_len = 0;
+}
+
+static void
+create_directories(int path_len, const char *path, const struct checkout *state)
+{
+	int path_len_max, buf_i, len, stat_status;
+	struct stat st;
 
-		len = slash - path;
-		memcpy(buf, path, len);
+	/* Check the cache for previously created directories (and
+	 * components) within this function.  There is no need to
+	 * re-create directory components more than once!
+	 */
+	path_len_max = path_len < PATH_MAX ? path_len : PATH_MAX;
+	buf_i = len = greatest_match_last_created_dir(path_len_max, path);
+	while (buf_i < path_len_max) {
+		do {
+			buf[buf_i] = path[buf_i];
+			buf_i++;
+		} while (buf_i < path_len_max && path[buf_i] != '/');
+		if (buf_i >= path_len_max)
+			break;
+		len = buf_i;
 		buf[len] = 0;
 
 		if (len <= state->base_dir_len)
@@ -45,7 +77,9 @@ static void create_directories(const char *path, const struct checkout *state)
 			die("cannot create directory at %s", buf);
 		}
 	}
-	free(buf);
+	/* Update the cache of already created directories */
+	buf[len] = 0;
+	buf_len  = len;
 }
 
 static void remove_subtree(const char *path)
@@ -201,6 +235,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
 
 	memcpy(path, state->base_dir, len);
 	strcpy(path + len, ce->name);
+	len += ce_namelen(ce);
 
 	if (!lstat(path, &st)) {
 		unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID);
@@ -229,6 +264,6 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
 			return error("unable to unlink old '%s' (%s)", path, strerror(errno));
 	} else if (state->not_new)
 		return 0;
-	create_directories(path, state);
+	create_directories(len, path, state);
 	return write_entry(ce, path, state, 0);
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index 28e275981a21b033459ef9c7e420cce4bf7e5513..0a03e65f9c9d869ab2d8b3c337f032ff2b8e7b2f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -119,6 +119,7 @@ static int check_updates(struct unpack_trees_options *o)
 		}
 	}
 
+	clear_created_dirs_cache();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
-- 
1.6.1.rc1.49.g7f705

^ permalink raw reply related

* [PATCH] allow 8bit data in email body sent by send-email
From: Andre Przywara @ 2009-01-08 13:50 UTC (permalink / raw)
  To: git; +Cc: Andre Przywara

Hi,
when sending patch files via git send-email, the perl script assumes
7bit characters only. If there are other bytes in the body (foreign language
characters in names or translations), some servers (like vger.kernel.org)
reject the mail because of thät. This patch always adds an 8bit header line
to each mail.
If someone thinks this has any side-effects, tell me, I am open to suggestions.

Regards,
André.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>

Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
****to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

---
 git-send-email.perl |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 77ca8fe..68a462c 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -793,6 +793,7 @@ To: $to${ccline}
 Subject: $subject
 Date: $date
 Message-Id: $message_id
+Content-Transfer-Encoding: 8bit
 X-Mailer: git-send-email $gitversion
 ";
 	if ($thread && $reply_to) {
-- 
1.5.5

^ permalink raw reply related

