Git development

* git filter-branch doesn't dereference annotated tags
From: Grégory Pakosz @ 2012-12-31 16:24 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2067 bytes --]

Please disregard the previous email that contains an incorrect fix
suggestion. I wish my first contribution had been flawless.

Here is what's happening.
git-filter-branch lets git-update-ref -d verify that the value for $ref
matches $sha1.
However, when $ref points to an annotated tag that is being deleted,
that verification fails because $sha1 is the commit underneath.

I think there are two possible fixes:
  1) either make git-filter-branch dereference annotated tags and do
the verification itself, then use the two-argument version of git
update-ref
  2) or, in the case of an annotated tag, pass another <old value> to
git update-ref

Please find below a patch that implements solution 1). Please note the
patch doesn't contain a unit test for this situation as I wasn't sure
how to provide one. Yet I tested it on the repository I'm working on.
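
To make the mismatch concrete, here is a minimal Python sketch of the check
that solution 1) performs. The function names and the injectable `run`
parameter are illustrative assumptions and not part of git-filter-branch;
only the two git invocations (`git rev-parse --verify <ref>^{commit}` and
`git update-ref -m <reason> -d <ref>`) are real commands.

```python
import subprocess


def peel_to_commit(ref, run=subprocess.check_output):
    """Return the commit a ref ultimately points at.

    For an annotated tag, `git rev-parse <ref>` names the tag object,
    while `<ref>^{commit}` peels it down to the tagged commit.
    """
    out = run(["git", "rev-parse", "--verify", ref + "^{commit}"])
    return out.decode().strip()


def delete_ref_if_commit_matches(ref, sha1, run=subprocess.check_output):
    """Delete ref only if it peels to sha1 (hypothetical helper)."""
    if peel_to_commit(ref, run) != sha1:
        return False
    # No <oldvalue> is passed: the comparison was already done above.
    run(["git", "update-ref", "-m", "filter-branch: delete", "-d", ref])
    return True
```

Note that, unlike passing <oldvalue> to git update-ref, this sketch performs
the comparison and the deletion as two separate steps.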

Gregory

>From 9d21960088a61bfbac1ffdb4b13e3038f88ab4d6 Mon Sep 17 00:00:00 2001
From: Gregory Pakosz <gpakosz@visionobjects.com>
Date: Mon, 31 Dec 2012 15:30:36 +0100
Subject: [PATCH] git-filter-branch: support annotated tags deletion

git-filter-branch lets git-update-ref -d verify that the value for $ref matches
$sha1. However, when $ref is an annotated tag being deleted, that verification
fails because $sha1 corresponds to a commit object.

Instead of asking git-update-ref to verify values actually match, dereference
$ref ourselves and test against $sha1 first. Then invoke git-update-ref with two
arguments.

Signed-off-by: Gregory Pakosz <gpakosz@visionobjects.com>
---
 git-filter-branch.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5314249..bbee6d0 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -383,7 +383,7 @@ do
 	case "$rewritten" in
 	'')
 		echo "Ref '$ref' was deleted"
-		git update-ref -m "filter-branch: delete" -d "$ref" $sha1 ||
+		test $(git rev-parse --verify "$ref^{commit}") = $sha1 && git update-ref -m "filter-branch: delete" -d "$ref" ||
 			die "Could not delete $ref"
 	;;
 	$_x40)
-- 
1.8.0.1

^ permalink raw reply related

* git filter-branch doesn't dereference annotated tags
From: Grégory Pakosz @ 2012-12-31 14:36 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 351 bytes --]

Hello,

I noticed git-filter-branch doesn't dereference annotated tags prior
to invoking git update-ref -d.

Please find a patch attached that changes the call to git update-ref:

-git update-ref -m "filter-branch: delete" -d "$ref" $sha1
+git update-ref -m "filter-branch: delete" -d $(git rev-parse --verify "$ref^{commit}") $sha1

Regards,
Gregory

[-- Attachment #2: 0001-git-filter-branch-Dereference-annotated-tags-upon-de.patch --]
[-- Type: application/octet-stream, Size: 869 bytes --]

From cee5462f26bbb280f471ba1220398924bfd4bfd7 Mon Sep 17 00:00:00 2001
From: Gregory Pakosz <gpakosz@visionobjects.com>
Date: Mon, 31 Dec 2012 15:30:36 +0100
Subject: [PATCH] git-filter-branch: Dereference annotated tags upon deletion

git-filter-branch didn't dereference annotated tags upon deletion which made
git-update-ref -d unhappy.
---
 git-filter-branch.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5314249..773a91b 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -383,7 +383,7 @@ do
 	case "$rewritten" in
 	'')
 		echo "Ref '$ref' was deleted"
-		git update-ref -m "filter-branch: delete" -d "$ref" $sha1 ||
+		git update-ref -m "filter-branch: delete" -d $(git rev-parse --verify "$ref^{commit}") $sha1 ||
 			die "Could not delete $ref"
 	;;
 	$_x40)
-- 
1.8.0.1


^ permalink raw reply related

* Re: Lockless Refs?  (Was [PATCH] refs: do not use cached refs in repack_without_ref)
From: Martin Fick @ 2012-12-31 10:30 UTC (permalink / raw)
  To: Michael Haggerty, Shawn Pearce; +Cc: Jeff King, git, Junio C Hamano
In-Reply-To: <201212271611.52203.mfick@codeaurora.org>

On Thursday, December 27, 2012 04:11:51 pm Martin Fick wrote:
> It concerns me that git uses any locking at all, even for
> refs since it has the potential to leave around stale
> locks.
> ...
> [a previous not so great attempt to fix this]
> ...

I may have finally figured out a working loose ref update 
mechanism which I think can avoid stale locks.  Unfortunately 
it requires atomic directory renames and universally unique 
identifiers (uuids).  These may be no-go criteria?  But I 
figure it is worth at least exploring this idea because of the 
potential benefits?

The general approach is to set up a transaction and either
commit or abort it.  A transaction can be set up by renaming
an appropriately prepared directory to the "ref.lock" name.  If
the rename succeeds, the transaction is begun.  Any actor can 
abort the transaction (up until it is committed) by simply 
deleting the "ref.lock" directory, so it is not at risk of 
going stale.  However, once the actor who sets up the 
transaction commits it, deleting the "ref.lock" directory 
simply aids in cleaning it up for the next transaction 
(instead of aborting it).

One important piece of the transaction is the use of uuids.
The uuids provide a mechanism to tie the atomic commit pieces
to the transactions and thus to prevent long-sleeping processes
from inadvertently performing actions which could be out of
date when they finally wake up.  In each case, the atomic
commit piece is the renaming of a file.  For the create and
update pieces, a file is renamed from the "ref.lock" dir to
the "ref" file, resulting in an update to the sha for the ref.
However, in the delete case, the "ref" file is instead renamed
to end up in the "ref.lock" directory, resulting in a delete
of the ref.  This scheme does not affect the way refs are read
today.

To prepare for a transaction, an actor first generates a uuid
(an exercise I will delay for now).  Next, a tmp directory
named after the uuid is generated in the parent directory for
the ref to be updated, perhaps something like:  ".lock_uuid".
In this directory is placed either a file or a directory named
after the uuid, something like: ".lock_uuid/uuid".  In the
case of a create or an update, the new sha is written to this
file.  In the case of a delete, it is a directory.

Once the tmp directory is set up, the initiating actor
attempts to start the transaction by renaming the tmp 
directory to "ref.lock".  If the rename fails, the update 
fails. If the rename succeeds, the actor can then attempt to 
commit the transaction (before another actor aborts it). 
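
The preparation and begin steps described so far could be sketched roughly
as follows. This is only an illustration under stated assumptions: the
function name `begin` is made up, refs are modelled as plain files in a
scratch directory, and it relies on rename() refusing to replace an existing
non-empty directory (or a plain lock file), which is the behaviour the
proposal depends on.

```python
import os
import shutil
import uuid


def begin(ref_path, new_sha=None):
    """Try to start a transaction on ref_path; return its uuid or None.

    new_sha=None prepares a delete (the uuid entry is a directory);
    otherwise the uuid entry is a file holding the new value.
    """
    txn = uuid.uuid4().hex
    tmp = os.path.join(os.path.dirname(ref_path), ".lock_" + txn)
    os.mkdir(tmp)
    entry = os.path.join(tmp, txn)
    if new_sha is None:
        os.mkdir(entry)
    else:
        with open(entry, "w") as f:
            f.write(new_sha + "\n")
    try:
        # Begin: succeeds only if no "ref.lock" is currently in the way.
        os.rename(tmp, ref_path + ".lock")
    except OSError:
        shutil.rmtree(tmp)  # lost the race; discard our preparation
        return None
    return txn
```

A second actor calling begin() on the same ref gets None until somebody
removes the "ref.lock" directory, which any actor is allowed to do.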

In the case of a create, the actor verifies that "ref" does 
not currently exist, and then renames the now named 
"ref.lock/uuid" file to "ref". On success, the ref was 
created.

In the case of an update, the actor verifies that "ref" 
currently contains the old sha, and then also renames the now 
named "ref.lock/uuid" file to "ref". On success, the ref was 
updated.

In the case of a delete, the actor may verify that "ref" 
currently contains the sha to "prune" if it needs to, and 
then renames the "ref" file to "ref.lock/uuid/delete". On 
success, the ref was deleted.

Whether successful or not, the actor may now simply delete 
the "ref.lock" directory, clearing the way for a new 
transaction.  Any other actor may delete this directory at 
any time also, likely either on conflict (if they are 
attempting to initiate a transaction), or after a grace
period just to clean up the FS.  Any actor may also safely
clean up the tmp directories, preferably also after a grace
period.
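
Putting the create, update, delete and cleanup steps together, a minimal
sketch might look like the code below. All function names are hypothetical,
refs are modelled as plain files, and the sketch assumes rename() is atomic
and fails when the source no longer exists, which is what makes a stale
uuid harmless.

```python
import os
import shutil
import uuid


def start_txn(ref_path, new_sha=None):
    """Begin a transaction; return its uuid, or None if ref.lock is taken."""
    txn = uuid.uuid4().hex
    tmp = os.path.join(os.path.dirname(ref_path), ".lock_" + txn)
    os.mkdir(tmp)
    entry = os.path.join(tmp, txn)
    if new_sha is None:
        os.mkdir(entry)                 # delete: the uuid entry is a directory
    else:
        with open(entry, "w") as f:     # create/update: it holds the new sha
            f.write(new_sha + "\n")
    try:
        os.rename(tmp, ref_path + ".lock")
    except OSError:
        shutil.rmtree(tmp)
        return None
    return txn


def read_ref(ref_path):
    """Return the sha stored in a loose ref file, or None if it is absent."""
    try:
        with open(ref_path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


def commit_txn(ref_path, txn, old_sha):
    """Commit a create (old_sha=None) or an update, then clean up."""
    lock = ref_path + ".lock"
    try:
        if read_ref(ref_path) != old_sha:
            return False
        # Atomic commit point; fails if our ref.lock/<uuid> was removed.
        os.replace(os.path.join(lock, txn), ref_path)
        return True
    except OSError:
        return False
    finally:
        shutil.rmtree(lock, ignore_errors=True)


def commit_delete(ref_path, txn, expected_sha=None):
    """Commit a delete by moving the ref into ref.lock/<uuid>/delete."""
    lock = ref_path + ".lock"
    try:
        if expected_sha is not None and read_ref(ref_path) != expected_sha:
            return False
        os.rename(ref_path, os.path.join(lock, txn, "delete"))
        return True
    except OSError:
        return False
    finally:
        shutil.rmtree(lock, ignore_errors=True)
```

If another actor aborted the transaction in the meantime, the entry named
after the old uuid is gone, so the rename in commit_txn() or commit_delete()
fails and the ref is left untouched.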

One neat part about this scheme is that I believe it would be 
backwards compatible with the current locking mechanism since 
the transaction directory will simply appear to be a lock to 
older clients.  And the old lock file should continue to lock 
out these newer transactions.

Due to this backwards compatibility, I believe that this 
could be incrementally employed today without affecting very 
much.  It could be deployed in place of any updates which 
only hold ref.locks to update the loose ref.  So for example 
I think it could replace step 4a below from Michael 
Haggerty's description of today's loose ref pruning during 
ref packing:

> * Pack references:
...
> 4. prune_refs(): for each ref in the ref_to_prune list,
> call  prune_ref():
>
>     a. Lock the reference using lock_ref_sha1(), 
>     verifying that the recorded SHA1 is still valid.  If it
>     is, unlink the loose reference file then free the lock;
>     otherwise leave the loose reference file untouched.

I think it would also therefore be able to replace the loose 
ref locking in Michael's new ref-packing scheme as well as 
the locking in Michael's new ref deletion scheme (again steps 
4):

> * Delete reference foo:
...
>   4. Delete loose ref for "foo":
> 
>      a. Acquire the lock $GIT_DIR/refs/heads/foo.lock
> 
>      b. Unlink $GIT_DIR/refs/heads/foo if it is unchanged.
>  If it is changed, leave it untouched.  If it is deleted,
> that is OK too.
> 
>      c. Release lock $GIT_DIR/refs/heads/foo.lock

...
> * Pack references:
...
>   4. prune_refs(): for each ref in the ref_to_prune list,
> call prune_ref():
> 
>      a. Lock the loose reference using lock_ref_sha1(),
> verifying that the recorded SHA1 is still valid
> 
>      b. If it is, unlink the loose reference file
> (otherwise, leave it untouched)
> 
>      c. Release the lock on the loose reference
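
For comparison, the lock-file steps quoted above can be approximated in a
few lines. This is a simplified sketch and not git's actual C code: the
function name is made up, and the only detail taken as given is that the
lock is acquired by exclusively creating "<ref>.lock".

```python
import os


def prune_loose_ref(ref_path, expected_sha):
    """Unlink a loose ref under a lock file if it still holds expected_sha."""
    lock_path = ref_path + ".lock"
    try:
        # a. Acquire the lock: exclusive create fails if the lock exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o666)
    except FileExistsError:
        return False
    try:
        try:
            with open(ref_path) as f:
                current = f.read().strip()
        except FileNotFoundError:
            return True          # already deleted, which is OK too
        if current != expected_sha:
            return False         # changed: leave it untouched
        os.unlink(ref_path)      # b. unchanged: remove the loose ref
        return True
    finally:
        os.close(fd)
        os.unlink(lock_path)     # c. release the lock
```

A process that dies between acquiring the lock and the final unlink leaves
"<ref>.lock" behind, and every later caller then fails at step a. That is
the stale-lock situation the transaction directory is meant to avoid.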

To be honest, I suspect I missed something obvious because 
this seems almost too simple to work.  I am ashamed that it 
took me so long to come up with (of course, I will be even 
more ashamed :( when it is shown to be flawed!)  This scheme
also feels extensible.  If there are no obvious flaws in it, I
will try to post solutions for ref packing and for multiple
repository/ref transactions soon as well.

I welcome any comments/criticisms,

-Martin

^ permalink raw reply

* Re: [PATCH 0/2] Add MAINTAINERS file and clarify gui workflows
From: Thomas Ackermann @ 2012-12-31  9:40 UTC (permalink / raw)
  To: git
In-Reply-To: <7va9svffr4.fsf@alter.siamese.dyndns.org>

Junio C Hamano <gitster <at> pobox.com> writes:

> 
> Thanks; I just realized that nothing in Documentation/ hierarchy
> mentions these; they are only mentioned in "A Note from the
> Maintainer" I send out every once in a while (kept in MaintNotes of
> 'todo' branch):
> 

Wouldn't it be a good idea to put MaintNotes somewhere below ./Documentation?

---
Thomas

^ permalink raw reply

* Aw: Re: Aw: Re: [PATCH 0/3] Move CodingGuidelines and SubmittingPatches to ./Documentation/technical
From: Thomas Ackermann @ 2012-12-31  9:33 UTC (permalink / raw)
  To: gitster, th.acker; +Cc: artagnon, git
In-Reply-To: <7v1ue7fcbh.fsf@alter.siamese.dyndns.org>

 
> 
> Implementation details are part of API; CG and SP are social not
> technical.
> 
This depends on your definition of "social" ;-)
> 
> Also CG and SP are in the part of the documents that are not
> installed for end-users and that is their right place.  They matter
> only to the people who grab our source code.
> 
But isn't that true for all files in ./technical? CG and SP currently
are in ./Documentation which contains *only* files which are installed
for end-users with CG and SP the only exception ...


---
Thomas

^ permalink raw reply

* Re: [RFC/PATCH] gitk: Visualize a merge commit with a right-click in gitk
From: Paul Mackerras @ 2012-12-31  4:27 UTC (permalink / raw)
  To: Jason Holden; +Cc: git
In-Reply-To: <1356826576-24334-1-git-send-email-jason.k.holden.swdev@gmail.com>

On Sat, Dec 29, 2012 at 07:16:16PM -0500, Jason Holden wrote:

> When first doing a merge in git-gui, the "Visualize Merge" button is
> quite helpful to visualize the changes due to a merge.
> But once the merge is complete, there's not a similarly convenient
> way to recreate that merge view in gitk.
> 
> This commit adds to gitk the ability to right-click on a merge commit and
> bring up a new gitk window displaying only those commits involved in
> the merge.
> 
> When right-clicking on a non-merge commit, this option is grayed out.  This
> patch also supports correct visualization of octopus merges

Thanks for the patch.  I have a couple of comments about it.  First,
the exec command waits for the process to complete, which means that
the initial gitk GUI will be unresponsive until the user quits the
gitk window showing the merge, which could be quite confusing for the
user.

Secondly, gitk already has support for showing multiple views of a
repository, that is, different subsets of the commits.  Wouldn't it be
much better to have your new menu item simply create a new view
showing the merge, rather than creating a whole new window?

Paul.

^ permalink raw reply

* cvsps import failure
From: Eric S. Raymond @ 2012-12-31  2:28 UTC (permalink / raw)
  To: Chris Rorvick; +Cc: git

Chris Rorvick <chris@rorvick.com>:
> I tried the new version and found I'm unable to import via pserver:

And now I know why.  One of the cvsps fix patches I merged from Yann
Dirson's collection changed the --root option parsing in an
incompatible way.  As soon as I figure out what it's doing I'll
either revert it or document the new behavior.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

The price of liberty is, always has been, and always will be blood.  The person
who is not willing to die for his liberty has already lost it to the first
scoundrel who is willing to risk dying to violate that person's liberty.  Are
you free? 	-- Andrew Ford

^ permalink raw reply

* Re: Heads up, an emergency fix for git-cvsimport is coming shortly
From: Eric S. Raymond @ 2012-12-31  1:23 UTC (permalink / raw)
  To: Chris Rorvick; +Cc: git
In-Reply-To: <CAEUsAPZ7kzc4qYSvD-YCk9sqQOuW219gOWyxpGqfkxmF2VC-PQ@mail.gmail.com>

Chris Rorvick <chris@rorvick.com>:
> I tried the new version and found I'm unable to import via pserver:
> 
>   $ ./cvsps --root :pserver:me@localhost:/cvsroot module
>   cvsps: connect error: Connection refused
>   cvsps: can't get CVS log data: Connection refused
> 
> Running 2.2b1 (the version packaged w/ Fedora 17) with the same
> arguments with the addition of --cvs-direct connects OK.  I haven't
> taken much time to look into this, so I might be doing something dumb.
>  Thought I'd find out if this is a known issue before delving into it.

Your problem does reproduce here. This paragraph from the output of 
'aptitude show cvs' may be relevant:

 This package contains a CVS binary which can act as both client and server,
 although there is no CVS dæmon; to access remote repositories, please use
 :extssh: not :pserver: any more.

It's therefore possible there's something slightly busted about the pserver 
method at the CVS end, and the 3.[23] code trips over it even though the old
code did not.  Note that new cvsps uses cvs-direct mode all the time; the old
support for fetching logs through local CVS commands is gone.

I use 

      cvsps --root :local:$PWD/repo module

for my testing, and that works. I'm up to my ears in finishing up the
test suite and tracking bugs in the repo-analysis code; if you want to
speed the process up, try running a :pserver: fetch with -v on under
both old and new code to see how the protocol transactions differ.

> Also, I'm curious what impact removing the caching from cvsps will
> have on incremental imports.  Is there any?

Not that I know of.  The caching was a performance hack for human viewing
of changesets.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply

* Re: Heads up, an emergency fix for git-cvsimport is coming shortly
From: Chris Rorvick @ 2012-12-31  0:15 UTC (permalink / raw)
  To: Eric S. Raymond; +Cc: git
In-Reply-To: <20121230192116.C2A2444143@snark.thyrsus.com>

On Sun, Dec 30, 2012 at 1:21 PM, Eric S. Raymond <esr@thyrsus.com> wrote:
> Bad news: the combination of cvsps and the existing git-cvsimport
> script is seriously broken in both places.  This morning I fixed a
> nasty bug in cvsps's branch detection and shipped 3.3. This is a
> different bug from the broken (and now removed) ancestry-branch
> tracking.

I tried the new version and found I'm unable to import via pserver:

  $ ./cvsps --root :pserver:me@localhost:/cvsroot module
  cvsps: connect error: Connection refused
  cvsps: can't get CVS log data: Connection refused

Running 2.2b1 (the version packaged w/ Fedora 17) with the same
arguments with the addition of --cvs-direct connects OK.  I haven't
taken much time to look into this, so I might be doing something dumb.
 Thought I'd find out if this is a known issue before delving into it.

Also, I'm curious what impact removing the caching from cvsps will
have on incremental imports.  Is there any?

Thanks,

Chris Rorvick

^ permalink raw reply

* Re: Aw: Re: [PATCH 0/3] Move CodingGuidelines and SubmittingPatches to ./Documentation/technical
From: Junio C Hamano @ 2012-12-30 21:40 UTC (permalink / raw)
  To: Thomas Ackermann; +Cc: artagnon, git
In-Reply-To: <1965427282.405137.1356879393533.JavaMail.ngmail@webmail18.arcor-online.net>

Thomas Ackermann <th.acker@arcor.de> writes:

>  
> ./Documentation/technical contains not only API documentation but also
> several other documents describing Git implementation topics and thus
> is the place someone wanting to join Git development should look at.

Implementation details are part of API; CG and SP are social not
technical.

Also CG and SP are in the part of the documents that are not
installed for end-users and that is their right place.  They matter
only to the people who grab our source code.

^ permalink raw reply

* Re: [PATCH 1/2] dir.c: Make git-status --ignored more consistent
From: Junio C Hamano @ 2012-12-30 21:36 UTC (permalink / raw)
  To: Antoine Pelisse; +Cc: git, Jeff King, Adam Spiers
In-Reply-To: <CALWbr2w=CWkpbJhC5sjd9HnErmWj9JQnD6UUiDM91ovJ_-16vA@mail.gmail.com>

Antoine Pelisse <apelisse@gmail.com> writes:

> By the way, that merges without conflicts with Adam's series, but it
> will not compile as he renamed functions that I'm now using
> (path_excluded() -> is_path_excluded() that is).
>
> By the way, Junio, how do you handle this situation as a maintainer ?
> Do you keep a note to manually make the change every time you remerge
> the series together ? That is the kind of use-case you can't handle
> with git-rerere, and I've been trying to find a solution to it.

I'll finish the write-up on jc/doc-maintainer topic not in a very
distant future, but not today.

In the meantime, the hint is in the use of refs/merge-fix/ hierarchy
in the Reintegrate script found on my 'todo' branch (which I have a
separate clone/checkout of in "Meta/" directory in my main working
tree).

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-30 21:31 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <CACsJy8C4UttGKcw11do1POcHZJM7iZ2r7F3ESOqEnWL8kdz+dQ@mail.gmail.com>

On Sun, Dec 30, 2012 at 07:53:48PM +0700, Nguyen Thai Ngoc Duy wrote:

> >   $ cd objects/pack && ls
> >   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.commits
> >   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.idx
> >   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.pack
> >   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.parents
> >   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.timestamps
> >   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.trees
> >
> > Each file describes the objects in the matching pack. If a new pack is
> > generated, you'd throw away the old cache files along with the old pack,
> > and generate new ones. Or not. These are totally optional, and an older
> > version of git will just ignore them. A newer version will use them if
> > they're available, and otherwise fallback to the existing code (i.e.,
> > reading the whole object from the pack). So you can generate them at
> 
> You have probably thought about this (and I don't have the source to
> check first), but we may need to version these extra files so we can
> change the format later if needed. Git versions that do not recognize
> new versions simply ignore the cache.

Agreed. The current code has a 4-byte magic, followed by a 4-byte
version number, followed by a 4-byte record size[1]. Then the data,
followed by the pack sha1, followed by a sha1 of all of the preceding
data.  So you can verify the validity of any cache file (both its
checksum, and that it matches the right packfile), just as you can with
a ".idx" file.

[1] Probably the magic and version should be per-file-type, and the
    record size should be implicit from that; right now I make
    assumptions about what is in the files based on their names, but
    that is not part of the checksum.
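
As a rough illustration of the layout described above (magic, version,
record size, data, pack sha1, then a sha1 over everything before it), here
is a small Python sketch. The magic value, the big-endian byte order and
the function names are assumptions made for the example; the thread does
not specify them.

```python
import hashlib
import struct

MAGIC = b"META"                  # placeholder; the real magic is not given
HEADER = struct.Struct(">4sII")  # magic, version, record size (order assumed)


def build_cache(version, record_size, records, pack_sha1):
    """Serialize fixed-length records followed by the two trailing sha1s."""
    if any(len(r) != record_size for r in records):
        raise ValueError("all records must have the declared size")
    body = HEADER.pack(MAGIC, version, record_size)
    body += b"".join(records) + pack_sha1
    return body + hashlib.sha1(body).digest()


def parse_cache(blob, pack_sha1):
    """Validate a cache file against its checksum and its pack."""
    if len(blob) < HEADER.size + 40:
        raise ValueError("file too short")
    body, checksum = blob[:-20], blob[-20:]
    if hashlib.sha1(body).digest() != checksum:
        raise ValueError("checksum mismatch")
    magic, version, record_size = HEADER.unpack_from(body)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if body[-20:] != pack_sha1:
        raise ValueError("cache belongs to a different pack")
    data = body[HEADER.size:-20]
    if record_size == 0 or len(data) % record_size:
        raise ValueError("truncated record")
    records = [data[i:i + record_size]
               for i in range(0, len(data), record_size)]
    return version, records
```

A reader that does not recognize the returned version would simply ignore
the file and fall back to reading objects from the pack.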

> > repack time, later on, or not at all. For now I have a separate command
> > that generates them based on the pack index; if this turns out to be a
> > good idea, it would probably get called as part of "repack".
> 
> I'd like to make it part of index-pack, where we have nearly
> everything in memory. But let's leave it as a separate command first.

Yeah, in the long run that may work. The steps I figured were:

  1. Optional, external command. Let people experiment.

  2. Once it has proven itself, run the command from index-pack by
     default (or with a config option).

  3. If it turns out too slow, move the generation directly into the
     index-pack process.

The current iteration does not seem all that slow, but that is because I
am mostly picking static data out of the commits. So I have to load the
commits, and that's it. But something like reachability might be more
expensive (OTOH, it will always be more expensive, whether we have the
objects in memory or not).

> > Each file is a set of fixed-length records. The "commits" file contains
> > the sha1 of every commit in the pack (sorted). A binary search of the
> > mmap'd file gives the position of a particular commit within the list,
> 
> I think we could avoid storing sha-1 in the cache with Shawn's idea
> [1]. But now I read it again I fail to see it :(
> 
> [1] http://article.gmane.org/gmane.comp.version-control.git/206485

Right. My implementation is very similar to what Shawn said there. I.e.,
the timestamps file is literally 4 bytes times the number of commits.
The parents file is 40 bytes per commit (2 parents, with a marker to
indicate "more or less than 2"), though a lot of it is zero bytes.
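
The lookup this implies can be sketched as below: binary-search the sorted
"commits" data for a position, then index the other files at that same
position. The big-endian timestamp encoding, the use of an all-zero sha1
for a missing parent, and the function names are assumptions; the marker
for commits with more than two parents is not modelled.

```python
import struct

SHA1_LEN = 20
NULL_SHA1 = b"\0" * SHA1_LEN


def find_commit(commits, sha1):
    """Binary-search sorted 20-byte records; return the index or None."""
    count = len(commits) // SHA1_LEN
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if commits[mid * SHA1_LEN:(mid + 1) * SHA1_LEN] < sha1:
            lo = mid + 1
        else:
            hi = mid
    if lo < count and commits[lo * SHA1_LEN:(lo + 1) * SHA1_LEN] == sha1:
        return lo
    return None


def commit_timestamp(timestamps, pos):
    """Read the 4-byte timestamp stored at the commit's position."""
    return struct.unpack_from(">I", timestamps, pos * 4)[0]


def commit_parents(parents, pos):
    """Read the 40-byte parent record, skipping null (absent) parents."""
    rec = parents[pos * 40:(pos + 1) * 40]
    halves = (rec[:SHA1_LEN], rec[SHA1_LEN:])
    return [p for p in halves if p != NULL_SHA1]
```

The same functions work on a bytes object or on an mmap of the file, since
both support slicing.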

Some alternatives I'm thinking about are:

  1. Using non-fixed-size records, which would allow trivial compression
     of entries like null sha1s. This would mean adding a separate
     lookup table, though, mapping sha1s to offsets. Still, even a
     32-bit offset is only 4 bytes per commit. If it meant dropping 40
     bytes of zeroes from the 2nd parent field out of half of all
     commits, that would be a win space-wise. It would be a
     double-indirect lookup, but it's constant effort, and only two page
     hits (which would be warm after the first lookup anyway).

  2. Storing offsets to objects in the packfile rather than their sha1s.
     This would save a lot of space, but would mean we couldn't refer to
     parents outside of the pack, but that may be OK. This is an
     optimization, and the case we want to target is a fully (or mostly)
     packed repo. It's OK to have the lookup fail and fallback to
     accessing the object.

  3. Dropping the "commits" file and just using the pack-*.idx as the
     index. The problem is that it is sparse in the commit space. So
     just naively storing 40 bytes per entry is going to waste a lot of
     space. If we had a separate index as in (1) above, that could be
     dropped to (say) 4 bytes of offset per object. But still, right now
     the commits file for linux-2.6 is about 7.2M (20 bytes times ~376K
     commits). There are almost 3 million total objects, so even storing
     4 bytes per object is going to be worse.

  4. Making a new index version that stores the sha1s separated by type.
     This means we can piggy-back on the regular index to get a packed
     list of just commits. But it also means that regular sha1 lookups
     of the objects have to look in several places (unless the caller
     annotates the call to read_sha1_object with "I am expecting this
     sha1 to be a commit"). And of course it means bumping the index
     version, which is a pain. The external index means it can be
     completely optional on top of the current index/pack.
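
The space argument in alternative 3 is simple arithmetic; the snippet below
just restates it with the rounded figures quoted above (about 376K commits,
almost 3 million objects). The variable names are illustrative only.

```python
SHA1_BYTES = 20
OFFSET_BYTES = 4

commits = 376_000      # "~376K commits"
objects = 3_000_000    # "almost 3 million total objects", rounded up

# Separate sorted list holding one sha1 per commit.
commits_file = SHA1_BYTES * commits
# One 4-byte slot for every entry of the pack index, commit or not.
per_object_slots = OFFSET_BYTES * objects

print(commits_file, per_object_slots)
```

With these rounded inputs the dedicated commits list comes to roughly 7.5
million bytes, while a 4-byte slot per indexed object needs about 12
million, which is why the sparse pack-*.idx approach comes out worse.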

> Depending on the use case, we could just generate packv4-like cache
> for recently-used trees only. I'm not sure how tree cache impact a
> merge operation on a very large worktree (iow, a lot of trees
> referenced from HEAD to be inflated). This is something a cache can
> do, but a new pack version cannot.

I do not care too much about the cost of running merge on a large
working tree. Of course it's better to make our optimizations as
generally applicable as possible, but there is a lot of other work going
on in a merge. The really painful, noticeable, repetitive bits right now
are:

  1. Running git-prune.

  2. Creating a pack from git-upload-pack.

Which are both just reachability problems. Something like "git log --
<pathspec>" would also be helped by packv4-ish tree access patterns,
but not by reachability bitmaps. And that may be something
worth caring about.

> Yes. And if narrow clone ever comes, which needs --objects limited by
> pathspec, we could just produce extra bitmaps for frequently-used
> pathspecs and only allow narrow clone with those pathspecs.

I hadn't thought about that. But yeah, because of the optional, external
nature, there's no reason you couldn't have extra bitmap sets for
specialized situations.

-Peff

^ permalink raw reply

* Re: [PATCH 0/2] Add MAINTAINERS file and clarify gui workflows
From: Junio C Hamano @ 2012-12-30 20:26 UTC (permalink / raw)
  To: Jason Holden; +Cc: git, paulus, patthoyts
In-Reply-To: <1356891535-5647-1-git-send-email-jason.k.holden.swdev@gmail.com>

Jason Holden <jason.k.holden.swdev@gmail.com> writes:

> I spent a good amount of time yesterday figuring out the correct workflow
> to submit a change to gitk.

Thanks; I just realized that nothing in Documentation/ hierarchy
mentions these; they are only mentioned in "A Note from the
Maintainer" I send out every once in a while (kept in MaintNotes of
'todo' branch):

    * Other people's trees, trusted lieutenants and credits.

    Documentation/SubmittingPatches outlines to whom your proposed changes
    should be sent.  As described in contrib/README, I would delegate fixes
    and enhancements in contrib/ area to the primary contributors of them.

    Although the following are included in git.git repository, they have their
    own authoritative repository and maintainers:

     - git-gui/ comes from git-gui project, maintained by Pat Thoyts:

            git://repo.or.cz/git-gui.git

     - gitk-git/ comes from Paul Mackerras's gitk project:

            git://ozlabs.org/~paulus/gitk

     - po/ comes from the localization coordinator, Jiang Xin:

            https://github.com/git-l10n/git-po/

Perhaps the update should mention po/ as well?

^ permalink raw reply

* Heads up, an emergency fix for git-cvsimport is coming shortly
From: Eric S. Raymond @ 2012-12-30 19:21 UTC (permalink / raw)
  To: git

Bad news: the combination of cvsps and the existing git-cvsimport
script is seriously broken in both places.  This morning I fixed a
nasty bug in cvsps's branch detection and shipped 3.3. This is a
different bug from the broken (and now removed) ancestry-branch
tracking.

Good news: I have fixed all the urgent bugs (and now you know how I
spent my holidays).  Somewhat to my surprise, half the problems listed
on the git-cvsimport manual page turned out to be problems in
git-cvsimport itself, not more cvsps lossage. Those bugs are dead.

cvsps is now much better about warning when it cannot translate a tag
or sees a dubious branch structure.  I've also enhanced git-cvsimport
to have an engine switch so it can optionally use cvs2git as its 
conversion engine. If and when I can get parsecvs back into working
shape, I will add it to the set of supported engines.

I have a test suite that proves fixes for all the urgent problems, but
that needs a bit more work before I'm willing to call it done.

In a few days I will ship a patch that replaces git-cvsimport with a
working version and removes the t960[123] tests from the git tree.
Those are not actually tests of git-cvsimport itself but of the
underlying conversion engine, and now form about half of cvsps's own
regression-test suite.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

It is proper to take alarm at the first experiment on our
liberties. We hold this prudent jealousy to be the first duty of
citizens and one of the noblest characteristics of the late
Revolution. The freemen of America did not wait till usurped power had
strengthened itself by exercise and entangled the question in
precedents. They saw all the consequences in the principle, and they
avoided the consequences by denying the principle. We revere this
lesson too much ... to forget it	-- James Madison.

^ permalink raw reply

* [PATCH 0/2] Add MAINTAINERS file and clarify gui workflows
From: Jason Holden @ 2012-12-30 18:18 UTC (permalink / raw)
  To: git; +Cc: gitster, paulus, patthoyts, Jason Holden

I spent a good amount of time yesterday figuring out the correct workflow
to submit a change to gitk.  As I understand it, gitk (and I think git-gui)
are maintained upstream of git, and patches should be sent to the git email
list against the upstream repo.  I think a top-level MAINTAINERS file would
help new contributors like me get oriented, especially in the case of these
upstream projects that require a somewhat non-standard workflow.

I also added some clarifications to SubmittingPatches that describe
the additional steps required to submit patches against the GUIs.

Please double-check that I've got the correct email addresses and canonical
repositories.

I'm guessing there are additional maintainers who should be added to the
MAINTAINERS file; I just haven't followed the email list closely enough to
know all the formal/informal workflows that should be observed.

Jason Holden (2):
  Add top-level maintainers file with email/canonical repository
    information
  Provide better guidance for submitting patches against git-gui, gitk

 Documentation/SubmittingPatches | 11 +++++++++++
 MAINTAINERS                     | 17 +++++++++++++++++
 2 files changed, 28 insertions(+)
 create mode 100644 MAINTAINERS

-- 
1.8.1.rc3.28.g0ab5d1f


* [PATCH 1/2] Add top-level maintainers file with email/canonical repository information
From: Jason Holden @ 2012-12-30 18:18 UTC (permalink / raw)
  To: git; +Cc: gitster, paulus, patthoyts, Jason Holden
In-Reply-To: <1356891535-5647-1-git-send-email-jason.k.holden.swdev@gmail.com>

Certain parts of git have a semi-formalized workflow for
incoming patches.  This file documents the maintainers, their area of
specialization, their email address, and their canonical repository against
which patches should be submitted.

Signed-off-by: Jason Holden <jason.k.holden.swdev@gmail.com>
---
 MAINTAINERS | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100644 MAINTAINERS

diff --git a/MAINTAINERS b/MAINTAINERS
new file mode 100644
index 0000000..ed23b21
--- /dev/null
+++ b/MAINTAINERS
@@ -0,0 +1,17 @@
+Core Git/Overall Maintainer:
+ Junio C Hamano <gitster@pobox.com>
+ git://git.kernel.org/pub/scm/git/git.git
+
+
+The GUIs packaged with git (git-gui and gitk) are maintained
+upstream of the core git repository.  Their contact information
+and canonical repositories are below.  Patches to improve these utilities
+should be made against the trees referenced below.
+
+gitk:
+ Paul Mackerras <paulus@samba.org>
+ git://ozlabs.org/~paulus/gitk
+
+git-gui:
+ Pat Thoyts <patthoyts@users.sourceforge.net>
+ git://repo.or.cz/git-gui
-- 
1.8.1.rc3.28.g0ab5d1f


* [PATCH 2/2] Provide better guidance for submitting patches against git-gui, gitk
From: Jason Holden @ 2012-12-30 18:18 UTC (permalink / raw)
  To: git; +Cc: gitster, paulus, patthoyts, Jason Holden
In-Reply-To: <1356891535-5647-1-git-send-email-jason.k.holden.swdev@gmail.com>

git-gui and gitk are maintained upstream of git.  Document this, and the
procedure for submitting patches to these tools.

Signed-off-by: Jason Holden <jason.k.holden.swdev@gmail.com>
---
 Documentation/SubmittingPatches | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 75935d5..b82d426 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -58,6 +58,17 @@ Checklist (and a short version for the impatient):
 	  please test it first by sending email to yourself.
 	- see below for instructions specific to your mailer
 
+	Improving the GUIs
+	- gitk and git-gui are maintained upstream of Git despite being
+	  included in Git's git repository
+	- Patches should be made against the upstream gui repository,
+	  and not against the version in Git's git repository
+	- The resulting patch should still be emailed for review
+	  to the git mailing list (git@vger.kernel.org), cc'ing the
+	  applicable gui maintainer
+	- Please see the MAINTAINERS file for the gui maintainer's contact
+	  information and canonical repository location
+
 Long version:
 
 I started reading over the SubmittingPatches document for Linux
-- 
1.8.1.rc3.28.g0ab5d1f


* Re: Lockless Refs?  (Was [PATCH] refs: do not use cached refs in repack_without_ref)
From: Martin Fick @ 2012-12-30 17:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, git, Junio C Hamano
In-Reply-To: <029f9379-a284-40e6-b4b9-529bd82d6e3e@email.android.com>

On Saturday, December 29, 2012 03:18:49 pm Martin Fick wrote:
> Jeff King <peff@peff.net> wrote:
> >On Thu, Dec 27, 2012 at 04:11:51PM -0700, Martin Fick wrote:
> >> My idea is based on using filenames to store sha1s
> >> instead of file contents.  To do this, the sha1 of
> >> a ref would be stored in a file in a directory named
> >> after the loose ref.  I believe this would then make
> >> it possible to have lockless atomic ref updates by
> >> renaming the file.
> >> 
> >> To more fully illustrate the idea, imagine that any
> >> file (except for the null file) in the directory will
> >> represent the value of the ref with its name, then the
> >> following transitions can represent atomic state
> >> changes to a ref's value and existence:
> >
> >Hmm. So basically you are relying on atomic rename() to
> >move the value around within a directory, rather than
> >using write to move it around within a file. Atomic
> >rename is usually something we have on local filesystems
> >(and I think we rely on it elsewhere). Though I would
> >not be surprised if it is not atomic on all networked
> >filesystems (though it is on NFS, at least).
> 
> Yes.  I assume this is OK because doesn't git already rely
> on atomic renames?  For example to rename the new
> packed-refs file to unlock it?
> 
> ...
> 
> >> 3) To create a ref, it must be renamed from the null
> >> file (sha 0000...) to the new value just as if it were
> >> being updated from any other value, but there is one
> >> extra condition: before renaming the null file, a full
> >> directory scan must be done to ensure that the null
> >> file is the only file in the directory (this condition
> >> exists because creating the directory and null file
> >> cannot be atomic unless the filesystem supports atomic
> >> directory renames, an expectation git does not
> >> currently make).  I am not sure how this compares to
> >> today's approach, but including the setup costs
> >> (described below), I suspect it is slower.
> >
> >Hmm. mkdir is atomic. So wouldn't it be sufficient to
> >just mkdir and create the correct sha1 file?
> 
> But then a process could mkdir and die leaving a stale
> empty dir with no reliable recovery mechanism.
> 
> 
> Unfortunately, I think I see another flaw though! :( I
> should have known that I cannot separate an important
> check from its state transitioning action.  The following
> could happen:
> 
>  A does mkdir
>  A creates null file
>  A checks dir -> no other files
>  B checks dir -> no other files
>  A renames null file to abcd
>  C creates second null file
>  B renames second null file to defg
> 
> One way to fix this is to rely on directory renames, but I
> believe this is something git does not want to require of
> every FS? If we did, we could Change #3 to be:
> 
> 3) To create a ref, it must be renamed from the null file
> (sha 0000...) to the new value just as if it were being
> updated from any other value. (No more scan)
> 
> Then, with reliable directory renames, a process could do
> what you suggested to a temporary directory, mkdir +
> create null file, then rename the temporary dir to
> refname.  This would prevent duplicate null files.  With
> a grace period, the temporary dirs could be cleaned up in
> case a process dies before the rename.  This is your
> approach with reliable recovery.

The null file can go away entirely if we use directory renames.
Make #3:

3) To create a ref, create a temporary directory containing a 
file named after the sha1 of the ref to be created and rename 
the directory to the name of the ref to create.  If the 
rename fails, the create fails.  If the rename succeeds, the 
create succeeds.
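
A minimal sketch of the rename-based create above, together with the
rename-based update from the original proposal, may make the idea concrete.
It is a toy model, not git code: the helper names create_ref and update_ref,
the scratch directory, and the dummy sha1 values are all made up for
illustration, and it assumes GNU mv, whose -T option renames onto the target
path instead of moving into an existing directory.

```shell
#!/bin/sh
# Toy model only: a ref is a directory containing one file named after its sha1.
refs=$(mktemp -d)	# scratch stand-in for the refs hierarchy

create_ref () {
	# $1 = ref name (hypothetical flat name), $2 = sha1
	tmp=$(mktemp -d "$refs/.tmp.XXXXXX") || return 1
	: >"$tmp/$2"
	# GNU "mv -T" renames the directory itself; the rename fails if
	# "$refs/$1" already exists and is not empty, i.e. the ref exists.
	if mv -T "$tmp" "$refs/$1" 2>/dev/null
	then
		return 0
	fi
	rm -rf "$tmp"	# we lost: remove our temporary directory
	return 1
}

update_ref () {
	# $1 = ref name, $2 = expected old sha1, $3 = new sha1
	# Renaming the value file only succeeds if the old value is still there.
	mv "$refs/$1/$2" "$refs/$1/$3" 2>/dev/null
}

a=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
b=bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
c=cccccccccccccccccccccccccccccccccccccccc

create_ref master "$a" && echo "created master"
create_ref master "$b" || echo "second create failed, ref already exists"
update_ref master "$a" "$c" && echo "updated master"
update_ref master "$a" "$b" || echo "stale update failed, old value is gone"
ls "$refs/master"
```

The sketch leaves out the grace-period cleanup of temporary directories
abandoned by a process that dies before the rename.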

With a grace period, the temporary dirs could be cleaned up
in case a process dies before the rename.

-Martin


* Re: [PATCH 1/2] dir.c: Make git-status --ignored more consistent
From: Adam Spiers @ 2012-12-30 15:01 UTC (permalink / raw)
  To: Antoine Pelisse; +Cc: Junio C Hamano, git, Jeff King
In-Reply-To: <CALWbr2w=CWkpbJhC5sjd9HnErmWj9JQnD6UUiDM91ovJ_-16vA@mail.gmail.com>

On Sun, Dec 30, 2012 at 2:54 PM, Antoine Pelisse <apelisse@gmail.com> wrote:
> By the way, that merges without conflicts with Adam's series, but it
> will not compile as he renamed functions that I'm now using
> (path_excluded() -> is_path_excluded() that is).

Ah, renames!  I forgot about those.

> By the way, Junio, how do you handle this situation as a maintainer?
> Do you keep a note to manually make the change every time you remerge
> the series together? That is the kind of use-case you can't handle
> with git-rerere, and I've been trying to find a solution to it.

Not sure if it helps to note that I am already basing my patch series
on top of Junio's nd/attr-match-optim-more branch.  Nguyen created
that branch which conflicted with mine, but then resolved the conflicts,
so I am basing mine on his to avoid having to continually resolve the
same conflicts.

So you could take the same approach and rebase yours on top of mine,
e.g.

    git remote add junio git://github.com/gitster/git.git
    git fetch junio
    git rebase junio/as/check-ignore


* Re: [PATCH 0/3] Move CodingGuidelines and SubmittingPatches to ./Documentation/technical
From: Thomas Ackermann @ 2012-12-30 14:56 UTC (permalink / raw)
  To: artagnon, th.acker; +Cc: git

./Documentation/technical contains not only API documentation but also
several other documents describing Git implementation topics, so it is
where someone wanting to join Git development should look.
So IMHO CodingGuidelines and SubmittingPatches should also be there.
(One could even consider renaming ./technical to ./internal to stress this
point and get rid of the rather generic "technical" ...)

In contrast, ./howto suggests documents a Git *user* might need to
solve some tricky problems. To that end, maintain-git.txt and
new-command.txt should also be moved to ./technical (sorry for
being the guy who just moved ./technical/api-command to
./howto/new-command.txt ;-)).

./Documentation itself should only contain the command manpages
and tutorials.

----- Original Message -----
From:    Ramkumar Ramachandra <artagnon@gmail.com>
To:      Thomas Ackermann <th.acker@arcor.de>
Date:    30.12.2012 12:52
Subject: Re: [PATCH 0/3] Move CodingGuidelines and SubmittingPatches to ./Documentation/technical

> Thomas Ackermann wrote:
> > CodingGuidelines and SubmittingPatches are IMHO a little bit hidden in
> > ./Documentation and with respect to their content should be better
> > placed in ./Documentation/technical.
> 
> I don't think SubmittingPatches and CodingGuidelines belong to
> Documentation/technical; that location is mostly reserved for API
> documentation.  Also, being prominent documents, they're probably
> linked to by many places on the internet.  I wouldn't want to
> unnecessarily break those links.
> 
> Ram
> 

---
Thomas


* Re: [PATCH 1/2] dir.c: Make git-status --ignored more consistent
From: Antoine Pelisse @ 2012-12-30 14:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Adam Spiers
In-Reply-To: <1356878341-12942-1-git-send-email-apelisse@gmail.com>

By the way, that merges without conflicts with Adam's series, but it
will not compile as he renamed functions that I'm now using
(path_excluded() -> is_path_excluded() that is).

By the way, Junio, how do you handle this situation as a maintainer?
Do you keep a note to manually make the change every time you remerge
the series together? That is the kind of use-case you can't handle
with git-rerere, and I've been trying to find a solution to it.


* [PATCH 2/2] git-status: Test --ignored behavior
From: Antoine Pelisse @ 2012-12-30 14:39 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Antoine Pelisse, git
In-Reply-To: <20121229072249.GB15408@sigill.intra.peff.net>

Test all possible use-cases of git-status --ignored with
--untracked-files set to normal and all:

 - An untracked directory is listed as untracked if it has a mix of
 untracked and ignored files in it.
 With -uall, ignored/untracked files are listed as
 ignored/untracked.

 - An untracked directory with only ignored files is listed as ignored.
 With -uall, all files in the directory are listed.

 - An ignored directory is listed as ignored. With -uall, all files in
 the directory are listed as ignored.

 - An ignored and committed directory is listed as ignored if it has
 untracked files.
 With -uall, all untracked files in the directory are listed as
 ignored.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
---
 t/t7061-wtstatus-ignore.sh |  146 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100755 t/t7061-wtstatus-ignore.sh

diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
new file mode 100755
index 0000000..0da1214
--- /dev/null
+++ b/t/t7061-wtstatus-ignore.sh
@@ -0,0 +1,146 @@
+#!/bin/sh
+
+test_description='git-status ignored files'
+
+. ./test-lib.sh
+
+cat >expected <<\EOF
+?? .gitignore
+?? actual
+?? expected
+?? untracked/
+EOF
+
+test_expect_success 'status untracked directory with --ignored' '
+	echo "ignored" >.gitignore &&
+	mkdir untracked &&
+	: >untracked/ignored &&
+	: >untracked/uncommitted &&
+	git status --porcelain --ignored >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+?? .gitignore
+?? actual
+?? expected
+?? untracked/uncommitted
+!! untracked/ignored
+EOF
+
+test_expect_success 'status untracked directory with --ignored -u' '
+	git status --porcelain --ignored -u >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+?? .gitignore
+?? actual
+?? expected
+!! ignored/
+EOF
+
+test_expect_success 'status ignored directory with --ignore' '
+	rm -rf untracked &&
+	mkdir ignored &&
+	: >ignored/uncommitted &&
+	git status --porcelain --ignored >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+?? .gitignore
+?? actual
+?? expected
+!! ignored/uncommitted
+EOF
+
+test_expect_success 'status ignored directory with --ignore -u' '
+	git status --porcelain --ignored -u >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+?? .gitignore
+?? actual
+?? expected
+!! untracked-ignored/
+EOF
+
+test_expect_success 'status untracked directory with ignored files with --ignore' '
+	rm -rf ignored &&
+	mkdir untracked-ignored &&
+	mkdir untracked-ignored/test &&
+	: >untracked-ignored/ignored &&
+	: >untracked-ignored/test/ignored &&
+	git status --porcelain --ignored >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+?? .gitignore
+?? actual
+?? expected
+!! untracked-ignored/ignored
+!! untracked-ignored/test/ignored
+EOF
+
+test_expect_success 'status untracked directory with ignored files with --ignore -u' '
+	git status --porcelain --ignored -u >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+?? .gitignore
+?? actual
+?? expected
+EOF
+
+test_expect_success 'status ignored tracked directory with --ignore' '
+	rm -rf untracked-ignored &&
+	mkdir tracked &&
+	: >tracked/committed &&
+	git add tracked/committed &&
+	git commit -m. &&
+	echo "tracked" >.gitignore &&
+	git status --porcelain --ignored >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+?? .gitignore
+?? actual
+?? expected
+EOF
+
+test_expect_success 'status ignored tracked directory with --ignore -u' '
+	git status --porcelain --ignored -u >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+?? .gitignore
+?? actual
+?? expected
+!! tracked/
+EOF
+
+test_expect_success 'status ignored tracked directory and uncommitted file with --ignore' '
+	: >tracked/uncommitted &&
+	git status --porcelain --ignored >actual &&
+	test_cmp expected actual
+'
+
+cat >expected <<\EOF
+?? .gitignore
+?? actual
+?? expected
+!! tracked/uncommitted
+EOF
+
+test_expect_success 'status ignored tracked directory and uncommitted file with --ignore -u' '
+	git status --porcelain --ignored -u >actual &&
+	test_cmp expected actual
+'
+
+test_done
-- 
1.7.9.5


* [PATCH 1/2] dir.c: Make git-status --ignored more consistent
From: Antoine Pelisse @ 2012-12-30 14:39 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Antoine Pelisse, git
In-Reply-To: <20121229072249.GB15408@sigill.intra.peff.net>

The current behavior of git-status is inconsistent and
misleading, especially when used with the --untracked-files=all option:

 - ignored files in untracked directories are missing from the status
 output.
 - untracked files in committed yet ignored directories are also
 missing.
 - with --untracked-files=normal, untracked directories that contain
 only ignored files are dropped too.

Make the behavior more consistent across all possible use cases:

 - "--ignored --untracked-files=normal" does not show individual
 files, only the top directory.
 It shows untracked directories that contain only ignored files, and
 ignored tracked directories with untracked files.
 - "--ignored --untracked-files=all" shows all ignored files, whether
 a file is in an ignored directory (tracked or untracked) or
 is explicitly ignored itself.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
---
 dir.c       |   98 +++++++++++++++++++++++++++++++++++++++++++++++------------
 wt-status.c |    4 ++-
 2 files changed, 81 insertions(+), 21 deletions(-)

diff --git a/dir.c b/dir.c
index 5a83aa7..d0c92dc 100644
--- a/dir.c
+++ b/dir.c
@@ -834,8 +834,9 @@ static enum exist_status directory_exists_in_index(const char *dirname, int len)
  * traversal routine.
  *
  * Case 1: If we *already* have entries in the index under that
- * directory name, we always recurse into the directory to see
- * all the files.
+ * directory name, we recurse into the directory to see all the files,
+ * unless the directory is excluded and we want to show ignored
+ * directories
  *
  * Case 2: If we *already* have that directory name as a gitlink,
  * we always continue to see it as a gitlink, regardless of whether
@@ -849,6 +850,9 @@ static enum exist_status directory_exists_in_index(const char *dirname, int len)
  *      just a directory, unless "hide_empty_directories" is
  *      also true and the directory is empty, in which case
  *      we just ignore it entirely.
+ *      if we are looking for ignored directories, look if it
+ *      contains only ignored files to decide if it must be shown as
+ *      ignored or not.
  *  (b) if it looks like a git directory, and we don't have
  *      'no_gitlinks' set we treat it as a gitlink, and show it
  *      as a directory.
@@ -861,12 +865,15 @@ enum directory_treatment {
 };
 
 static enum directory_treatment treat_directory(struct dir_struct *dir,
-	const char *dirname, int len,
+	const char *dirname, int len, int exclude,
 	const struct path_simplify *simplify)
 {
 	/* The "len-1" is to strip the final '/' */
 	switch (directory_exists_in_index(dirname, len-1)) {
 	case index_directory:
+		if ((dir->flags & DIR_SHOW_OTHER_DIRECTORIES) && exclude)
+			break;
+
 		return recurse_into_directory;
 
 	case index_gitdir:
@@ -886,7 +893,23 @@ static enum directory_treatment treat_directory(struct dir_struct *dir,
 	}
 
 	/* This is the "show_other_directories" case */
-	if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
+
+	/*
+	 * We are looking for ignored files and our directory is not ignored,
+	 * check if it contains only ignored files
+	 */
+	if ((dir->flags & DIR_SHOW_IGNORED) && !exclude) {
+		int ignored;
+		dir->flags &= ~DIR_SHOW_IGNORED;
+		dir->flags |= DIR_HIDE_EMPTY_DIRECTORIES;
+		ignored = read_directory_recursive(dir, dirname, len, 1, simplify);
+		dir->flags &= ~DIR_HIDE_EMPTY_DIRECTORIES;
+		dir->flags |= DIR_SHOW_IGNORED;
+
+		return ignored ? ignore_directory : show_directory;
+	}
+	if (!(dir->flags & DIR_SHOW_IGNORED) &&
+	    !(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
 		return show_directory;
 	if (!read_directory_recursive(dir, dirname, len, 1, simplify))
 		return ignore_directory;
@@ -894,6 +917,49 @@ static enum directory_treatment treat_directory(struct dir_struct *dir,
 }
 
 /*
+ * Decide what to do when we find a file while traversing the
+ * filesystem. Mostly two cases:
+ *
+ *  1. We are looking for ignored files
+ *   (a) File is ignored, include it
+ *   (b) File is in ignored path, include it
+ *   (c) File is not ignored, exclude it
+ *
+ *  2. Other scenarios, include the file if not excluded
+ *
+ * Return 1 for exclude, 0 for include.
+ */
+static int treat_file(struct dir_struct *dir, struct strbuf *path, int exclude, int *dtype)
+{
+	struct path_exclude_check check;
+	int exclude_file = 0;
+
+	if (exclude)
+		exclude_file = !(dir->flags & DIR_SHOW_IGNORED);
+	else if (dir->flags & DIR_SHOW_IGNORED) {
+		/*
+		 * Optimization:
+		 * Don't spend time on indexed files, they won't be
+		 * added to the list anyway
+		 */
+		struct cache_entry *ce = index_name_exists(&the_index,
+		    path->buf, path->len, ignore_case);
+
+		if (ce)
+			return 1;
+
+		path_exclude_check_init(&check, dir);
+
+		if (!path_excluded(&check, path->buf, path->len, dtype))
+			exclude_file = 1;
+
+		path_exclude_check_clear(&check);
+	}
+
+	return exclude_file;
+}
+
+/*
  * This is an inexact early pruning of any recursive directory
  * reading - if the path cannot possibly be in the pathspec,
  * return true, and we'll skip it early.
@@ -1031,27 +1097,14 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 	if (dtype == DT_UNKNOWN)
 		dtype = get_dtype(de, path->buf, path->len);
 
-	/*
-	 * Do we want to see just the ignored files?
-	 * We still need to recurse into directories,
-	 * even if we don't ignore them, since the
-	 * directory may contain files that we do..
-	 */
-	if (!exclude && (dir->flags & DIR_SHOW_IGNORED)) {
-		if (dtype != DT_DIR)
-			return path_ignored;
-	}
-
 	switch (dtype) {
 	default:
 		return path_ignored;
 	case DT_DIR:
 		strbuf_addch(path, '/');
-		switch (treat_directory(dir, path->buf, path->len, simplify)) {
+
+		switch (treat_directory(dir, path->buf, path->len, exclude, simplify)) {
 		case show_directory:
-			if (exclude != !!(dir->flags
-					  & DIR_SHOW_IGNORED))
-				return path_ignored;
 			break;
 		case recurse_into_directory:
 			return path_recurse;
@@ -1061,7 +1114,12 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
 		break;
 	case DT_REG:
 	case DT_LNK:
-		break;
+		switch(treat_file(dir, path, exclude, &dtype)) {
+		case 1:
+			return path_ignored;
+		default:
+			break;
+		}
 	}
 	return path_handled;
 }
diff --git a/wt-status.c b/wt-status.c
index 2a9658b..d7cfe8f 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -516,7 +516,9 @@ static void wt_status_collect_untracked(struct wt_status *s)
 
 	if (s->show_ignored_files) {
 		dir.nr = 0;
-		dir.flags = DIR_SHOW_IGNORED | DIR_SHOW_OTHER_DIRECTORIES;
+		dir.flags = DIR_SHOW_IGNORED;
+		if (s->show_untracked_files != SHOW_ALL_UNTRACKED_FILES)
+			dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 		fill_directory(&dir, s->pathspec);
 		for (i = 0; i < dir.nr; i++) {
 			struct dir_entry *ent = dir.entries[i];
-- 
1.7.9.5
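
To compare the two modes by hand, a throwaway repository like the one set up
in t7061 can be inspected directly. This is only a sketch: the scratch
directory and file names are arbitrary, and the output of the default mode
depends on the git version in use, so no expected output is given here.

```shell
#!/bin/sh
# Scratch repository; all names are arbitrary and only for illustration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo "ignored" >.gitignore
mkdir untracked
: >untracked/ignored
: >untracked/uncommitted

# Default mode (--untracked-files=normal): directories are summarized.
normal=$(git status --porcelain --ignored)
echo "$normal"

# --untracked-files=all: untracked and ignored files are listed one by one.
all=$(git status --porcelain --ignored --untracked-files=all)
echo "$all"
```

In the porcelain format, "??" marks untracked paths and "!!" marks ignored
ones, as in the expected files of the test script.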


* Re: [RFC] pack-objects: compression level for non-blobs
From: Nguyen Thai Ngoc Duy @ 2012-12-30 12:53 UTC (permalink / raw)
  To: Jeff King; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121230120542.GA10820@sigill.intra.peff.net>

On Sun, Dec 30, 2012 at 7:05 PM, Jeff King <peff@peff.net> wrote:
> So I was thinking about this, which led to some coding, which led to
> some benchmarking.

I like your way of thinking! May I suggest you take a new year break
first, then "think" about reachability bitmaps ;-) 2013 will be an
exciting year.

> I want to clean up a few things in the code before I post it, but the
> general idea is to have arbitrary per-pack cache files in the
> objects/pack directory. Like this:
>
>   $ cd objects/pack && ls
>   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.commits
>   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.idx
>   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.pack
>   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.parents
>   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.timestamps
>   pack-a3e262f40d95fc0cc97d92797ff9988551367b75.trees
>
> Each file describes the objects in the matching pack. If a new pack is
> generated, you'd throw away the old cache files along with the old pack,
> and generate new ones. Or not. These are totally optional, and an older
> version of git will just ignore them. A newer version will use them if
> they're available, and otherwise fallback to the existing code (i.e.,
> reading the whole object from the pack). So you can generate them at

You have probably thought about this (and I don't have the source to
check first), but we may need to version these extra files so we can
change the format later if needed. Git versions that do not recognize
a newer format version would simply ignore the cache.

> repack time, later on, or not at all. For now I have a separate command
> that generates them based on the pack index; if this turns out to be a
> good idea, it would probably get called as part of "repack".

I'd like to make it part of index-pack, where we have nearly
everything in memory. But let's leave it as a separate command first.

> Each file is a set of fixed-length records. The "commits" file contains
> the sha1 of every commit in the pack (sorted). A binary search of the
> mmap'd file gives the position of a particular commit within the list,

I think we could avoid storing sha-1 in the cache with Shawn's idea
[1]. But now I read it again I fail to see it :(

[1] http://article.gmane.org/gmane.comp.version-control.git/206485

> Of course, it does very little for the full --objects listing, where we
> spend most of our time inflating trees. We could couple this with
> uncompressed trees (which are not all that much bigger, since the sha1s
> do not compress anyway). Or we could have an external tree cache, but
> I'm not sure exactly what it would look like (this is basically
> reinventing bits of packv4, but doing so in a way that is redundant with
> the existing packfile, rather than replacing it).

Depending on the use case, we could just generate a packv4-like cache
for recently-used trees only. I'm not sure how a tree cache would impact a
merge operation on a very large worktree (iow, a lot of trees
referenced from HEAD to be inflated). This is something a cache can
do, but a new pack version cannot.

> Or since the point of
> --objects is usually reachability, it may make more sense to pursue the
> bitmap, which should be even faster still.

Yes. And if narrow clone ever comes, which needs --objects limited by
pathspec, we could just produce extra bitmaps for frequently-used
pathspecs and only allow narrow clone with those pathspecs.
-- 
Duy


* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-30 12:05 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121229052747.GA14928@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 12:27:47AM -0500, Jeff King wrote:

> > If reachability bitmap is implemented, we'll have per-pack cache
> > infrastructure ready, so less work there for commit cache.
> 
> True. I don't want to dissuade you from doing any commit cache work. I
> only wanted to point out that this alternative may have merit because of
> its simplicity (so we can use it until a caching solution exists, or
> even after, if managing the cache has downsides).

So I was thinking about this, which led to some coding, which led to
some benchmarking.

I want to clean up a few things in the code before I post it, but the
general idea is to have arbitrary per-pack cache files in the
objects/pack directory. Like this:

  $ cd objects/pack && ls
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.commits
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.idx
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.pack
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.parents
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.timestamps
  pack-a3e262f40d95fc0cc97d92797ff9988551367b75.trees

Each file describes the objects in the matching pack. If a new pack is
generated, you'd throw away the old cache files along with the old pack,
and generate new ones. Or not. These are totally optional, and an older
version of git will just ignore them. A newer version will use them if
they're available, and otherwise fall back to the existing code (i.e.,
reading the whole object from the pack). So you can generate them at
repack time, later on, or not at all. For now I have a separate command
that generates them based on the pack index; if this turns out to be a
good idea, it would probably get called as part of "repack".

Each file is a set of fixed-length records. The "commits" file contains
the sha1 of every commit in the pack (sorted). A binary search of the
mmap'd file gives the position of a particular commit within the list,
and that position is used to index the parents, timestamps, and trees
files (obviously if it is missing, then the other files are useless, but
we already have to be able to fall back to just reading the objects
anyway).
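
The lookup described above can be sketched in shell under heavy
simplifications. This is a toy model, not the code being benchmarked: the
records here are fixed-length text lines rather than binary sha1s, dd stands
in for indexing into an mmap'd file, and the file contents, record sizes (41
and 11 bytes), and helper names record and commit_pos are assumptions made
for illustration.

```shell
#!/bin/sh
# Toy model: fixed-length *text* records stand in for the binary records.
# "commits" holds sorted 40-hex ids (41 bytes per record with the newline);
# "timestamps" holds one 10-digit epoch per commit, in the same order.
cache=$(mktemp -d)
printf '%s\n' \
	1111111111111111111111111111111111111111 \
	5555555555555555555555555555555555555555 \
	9999999999999999999999999999999999999999 >"$cache/commits"
printf '%s\n' 1356800000 1356850000 1356900000 >"$cache/timestamps"

record () {
	# $1 = file, $2 = record length in bytes, $3 = zero-based position
	dd if="$1" bs="$2" skip="$3" count=1 2>/dev/null | tr -d '\n'
}

commit_pos () {
	# Binary search for id $1 in the sorted commits file; prints its position.
	lo=0
	hi=$(( $(wc -c <"$cache/commits") / 41 ))
	while [ "$lo" -lt "$hi" ]
	do
		mid=$(( (lo + hi) / 2 ))
		cur=$(record "$cache/commits" 41 "$mid")
		if [ "$cur" = "$1" ]
		then
			echo "$mid"
			return 0
		elif expr "x$cur" \< "x$1" >/dev/null	# "x" forces a string compare
		then
			lo=$(( mid + 1 ))
		else
			hi=$mid
		fi
	done
	return 1	# not in the cache: the caller would fall back to the pack
}

# The position found in "commits" indexes the parallel "timestamps" file.
pos=$(commit_pos 5555555555555555555555555555555555555555) &&
ts=$(record "$cache/timestamps" 11 "$pos")
echo "position $pos, timestamp $ts"
```

Because every file uses the same ordering, a reader that only needs
timestamps never has to open the parents or trees files.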

I split it out into multiple files because you can actually operate with
a subset (though in my initial attempt, I transparently plug in at the
parse_commit layer, which means we need all items to consider the commit
"parsed", whether the caller actually cares or not. But in theory a
reader could only want to ask for one item).  Making a "generation"
cache file is an obvious next step (and because we already have
"commits", it is only 4 bytes per commit on top of it). Reachability
bitmaps would be another one (though due to the compression, I am not
sure they will work with a fixed-size record design, so this may need
some modification).

Anyway, here are the numbers I came up with (appended to my earlier
compression numbers):

git.git:
 Pack  | Size          |  Cold Revs  |  Warm Revs  | Cold Objects | Warm Objects
-------+---------------+-------------+-------------+--------------+--------------
  none |  56.72        | 0.68        | 0.33        |  2.45        |  1.94       
commit |  64.61 (+13%) | 0.50 (-26%) | 0.09 (-74%) |  2.42  (-1%) |  1.69 (-13%)
  tree |  60.68  (+6%) | 0.79 (+16%) | 0.33   (0%) |  2.23  (-8%) |  1.75  (-9%)
  both |  68.54 (+20%) | 0.48 (-29%) | 0.08 (-75%) |  2.24  (-8%) |  1.48 (-23%)
 cache |  59.29  (+4%) | 0.57 (-16%) | 0.05 (-84%) |  2.23  (-8%) |  1.66 (-14%)

linux.git:
 Pack  | Size          |  Cold Revs  |  Warm Revs  | Cold Objects | Warm Objects
-------+---------------+-------------+-------------+--------------+--------------
  none | 864.61        | 8.66        | 4.07        | 42.76        | 36.32       
commit | 970.46 (+12%) | 8.87  (+2%) | 1.02 (-74%) | 42.94   (0%) | 33.43  (-7%)
  tree | 895.37  (+3%) | 9.08  (+4%) | 4.07   (0%) | 36.01 (-15%) | 29.62 (-18%)
  both |1001.25 (+15%) | 8.90  (+2%) | 1.03 (-74%) | 35.57 (-16%) | 26.25 (-27%)
 cache | 894.78  (+3%) | 4.88 (-43%) | 0.69 (-83%) | 38.80  (-9%) | 32.79  (-9%)

webkit.git:
 Pack  | Size          |  Cold Revs  |  Warm Revs  | Cold Objects | Warm Objects
-------+---------------+-------------+-------------+--------------+--------------
  none |   3.46        | 1.61        | 1.38        | 20.46        | 18.72       
commit |   3.54  (+2%) | 1.42 (-11%) | 0.34 (-75%) | 20.42   (0%) | 17.57  (-6%)
  tree |   3.59  (+3%) | 1.61   (0%) | 1.39   (0%) | 16.01 (-21%) | 14.00 (-25%)
  both |   3.67  (+6%) | 1.45 (-10%) | 0.34 (-75%) | 15.94 (-22%) | 12.91 (-31%)
 cache |   3.47   (0%) | 0.49 (-69%) | 0.14 (-90%) | 19.53  (-4%) | 17.86  (-4%)

So you can see that it performs even better than no-compression on the
warm-revs case. Which makes sense, since we do not even have to touch
the object data at all, and can do the whole traversal straight out of
the cache. So we do not even have to memcpy the bytes around. And it
takes up even less space (3-4% versus 12-13% on the first two repos).
Which makes sense, because even though we are duplicating some
information that is in the packfile, we are leaving all of the commit
message bodies compressed.

The other interesting thing is that the cold cache performance also
improves by a lot. Again, this makes sense; we are doing the traversal
completely out of cache, and our data is even more tightly packed in the
cache than it is in the packfile.

Of course, it does very little for the full --objects listing, where we
spend most of our time inflating trees. We could couple this with
uncompressed trees (which are not all that much bigger, since the sha1s
do not compress anyway). Or we could have an external tree cache, but
I'm not sure exactly what it would look like (this is basically
reinventing bits of packv4, but doing so in a way that is redundant with
the existing packfile, rather than replacing it). Or since the point of
--objects is usually reachability, it may make more sense to pursue the
bitmap, which should be even faster still.

-Peff
