From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Cc: Petr Baudis <pasky@ucw.cz>, Josh Triplett <josh@joshtriplett.org>
Subject: [PATCH 0/3] Optimizing "ls-files -k"
Date: Thu, 15 Aug 2013 14:28:07 -0700
Message-ID: <1376602090-19142-1-git-send-email-gitster@pobox.com>
In-Reply-To: <7v8v02rb2g.fsf@alter.siamese.dyndns.org>

"ls-files -o" and "ls-files -k" both traverse the working tree down
to find either all untracked paths or those that will be "killed"
(removed from the working tree to make room) when the paths recorded
in the index are checked out.

It is necessary to traverse the working tree fully when enumerating
all the "other" paths, but when we are only interested in "killed"
paths, we can take advantage of the fact that paths that do not
overlap with entries in the index can never be killed.
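
To make the distinction concrete, here is a minimal sketch in a
scratch repository.  The path names (path0, file0, unrelated) are
made up for illustration and are not taken from the patches.

```shell
# Scratch repository; every path name below is hypothetical.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .

# Record "path0" in the index as a regular file.
>path0
git update-index --add path0

# In the working tree, turn "path0" into a directory and add an
# untracked file that overlaps with nothing in the index.
rm path0
mkdir path0
>path0/file0
>unrelated

# "-o" has to find every untracked path; "-k" only cares about the
# ones that stand in the way of checking out an index entry.
killed=$(git ls-files -k)
others=$(git ls-files -o)
printf 'killed:\n%s\nothers:\n%s\n' "$killed" "$others"
```

Here "unrelated" should appear only in the "-o" listing, as nothing
in the index would need it out of the way.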

The first one is an independent clean-up.  No public API in the
working tree traversal takes an alternate in-core index, so there is
no reason to explicitly use the_index and the index_* functions from
the in-core index API.

The second one is rerolled from the "something like this" patch I
sent earlier, but corrects the "we see a directory, it is not in the
index, but a file in it is" case.

And the third one adds a testcase that illustrates why the earlier
"something like this" patch is not sufficient.
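
The problematic shape can be reproduced by hand roughly like this.
It is only a sketch of the situation described above; the path
names (pathx, ju, nk) are illustrative assumptions.

```shell
# Scratch repository; every path name below is hypothetical.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .

# The index records "pathx/ju" as a regular file.  Note that the
# directory "pathx" itself never appears as an index entry.
mkdir pathx
>pathx/ju
git update-index --add pathx/ju

# In the working tree, "pathx/ju" becomes a directory with a file in it.
rm pathx/ju
mkdir pathx/ju
>pathx/ju/nk

# A traversal that refuses to descend into "pathx" merely because
# "pathx" is not in the index as a non-directory would miss this.
killed=$(git ls-files -k)
printf '%s\n' "$killed"
```

Checking out "pathx/ju" as a file needs the directory of the same
name out of the way, so the file under it has to be reported.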

These are designed to apply on top of v1.8.3, and need a bit of
conflict resolution for the upcoming v1.8.4 codebase; I'll queue
them in 'pu' for now.

Note that t3010, especially after it is merged to 'pu', will use
many different ways to create a test file.  Some redirect "date" into
it, some redirect ":" into it, some "touch" it, and some just
redirect with no command.

	date >file1
	: >file2
	touch file3
	>file4

We should consolidate them all to just do ">file4" after making sure
the contents do not matter (we kind of know it already, as "date"
outputs a string that is not repeatable).  Use of "touch" for
anything other than updating the timestamp is especially bad, as it
is misleading.
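
A quick way to see what each of the four forms leaves behind is
sketched below.  It assumes a POSIX-style shell such as sh or bash
(a bare redirection with no command may behave differently in other
shells) and runs in a throwaway directory.

```shell
# Work in a throwaway directory so nothing existing is clobbered.
tmp=$(mktemp -d) && cd "$tmp"

date >file1	# contents differ from run to run
: >file2	# empty file
touch file3	# empty file, but reads as "update the timestamp"
>file4		# empty file, and says so plainly

for f in file1 file2 file3 file4
do
	if test -s "$f"
	then
		echo "$f: non-empty"
	else
		echo "$f: empty"
	fi
done
```

Only the "date" form leaves any contents behind, so tests that pass
with the other three forms cannot be depending on them.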

Junio C Hamano (3):
  dir.c: use the cache_* macro to access the current index
  ls-files -k: a directory only can be killed if the index has a non-directory
  t3010: update to demonstrate "ls-files -k" optimization pitfalls

 builtin/ls-files.c                  |  2 ++
 dir.c                               | 40 +++++++++++++++++++++++++++++--------
 dir.h                               |  3 ++-
 t/t3010-ls-files-killed-modified.sh | 12 ++++++++---
 4 files changed, 45 insertions(+), 12 deletions(-)

-- 
1.8.4-rc3-232-ga8053f8

Thread overview: 15+ messages
2013-08-10 21:44 git stash takes excessively long when many untracked files present Josh Triplett
2013-08-13 10:11 ` Anders Darander
2013-08-13 17:07   ` Junio C Hamano
2013-08-13 17:36     ` Anders Darander
2013-08-13 17:52       ` Junio C Hamano
2013-08-13 21:47         ` Junio C Hamano
2013-08-15 17:52           ` Junio C Hamano
2013-08-15 18:07             ` Josh Triplett
2013-08-15 18:58               ` Junio C Hamano
2013-08-15 19:47             ` Junio C Hamano
2013-08-15 21:28               ` Junio C Hamano [this message]
2013-08-15 21:28                 ` [PATCH 1/3] dir.c: use the cache_* macro to access the current index Junio C Hamano
2013-08-15 21:28                 ` [PATCH 2/3] ls-files -k: a directory only can be killed if the index has a non-directory Junio C Hamano
2013-08-15 21:28                 ` [PATCH 3/3] t3010: update to demonstrate "ls-files -k" optimization pitfalls Junio C Hamano
2013-08-15 23:30                 ` [PATCH 4/3] git stash: avoid data loss when "git stash save" kills a directory Junio C Hamano
