git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 03/10] unpack-trees: add function to update ce_flags based on sparse patterns
Date: Mon, 15 Nov 2010 14:19:45 -0600	[thread overview]
Message-ID: <20101115201945.GF16385@burratino> (raw)
In-Reply-To: <1289817410-32470-4-git-send-email-pclouds@gmail.com>

Nguyễn Thái Ngọc Duy wrote:

> The function will reconstruct directory structure from index and feed
> excluded_from_list() with proper dtype.

Okay, so I guess what this means is:

 Matching index entries against an excludes file currently has two
 problems.

 First, there's no function to do it.  Code paths (like sparse checkout)
 that wanted to try it would iterate over index entries and for each
 index entry pass that path to excluded_from_list().  But that is not
 how excluded_from_list() works; one is supposed to feed in each
 ancester of a path before a given path to find out if it was excluded
 because of some parent or grandparent matching a

	bigsubdirectory/

 pattern despite the path not matching any .gitignore pattern directly.

 Second, it's inefficient.  The excludes mechanism is supposed to let
 us block off vast swaths of the filesystem as uninteresting; separately
 checking every index entry doesn't fit that model.

 Introduce a new function to take care of both these problems.  This
 traverses the index in depth-first order (well, that's what order the
 index is in) to mark un-excluded entries.

 Maybe some day the in-core index format will be restructured to make
 this sort of operation easier.  Or maybe we will want to try some
 binary search based thing.  The interface is simple enough to allow
 all those things.  Example:

	clear_ce_flags(the_index.cache, the_index.cache_nr,
			CE_CANDIDATE, CE_CLEARME, exclude_list);

 would clear the CE_CLEARME flag on all index entries with
 CE_CANDIDATE flag and not matched by exclude_list.

> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -834,6 +834,94 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>  	return mask;
>  }
>  
> +/*
> + * traverse the index, find every entry that matches according to
> + * o->el. Do "ce_flags &= ~clear_mask" on those entries. Return the
> + * number of traversed entries.
> + *
> + * If select_mask is non-zero, only entries whose ce_flags has on of
> + * those bits enabled are traversed.
> + */
> +static int clear_ce_flags_1(struct cache_entry **cache, int nr,
> +			    int prefix_len, int match_all,
> +			    int select_mask, int clear_mask,
> +			    struct unpack_trees_options *o)

o is only used for its excludes list.  Why not cut out the middle man
and take o->el as argument?

State (struct-worthy?): 
	cache     	pointer to an index entry
	prefix_len	an offset into its path
	match_all 	a boolean telling whether we have some skipping to do

The current path (called the prefix), including trailing '/', is

	cache[0]->name[0] .. cache[0]->name[prefix_len - 1]

> +{
> +	const char *name, *slash;
> +	struct cache_entry *ce = cache[0];
> +	const char *prefix = ce->name;
> +	int processed, original_nr = nr;
> +
> +	if (prefix_len && !match_all) {

Top level gets special handling.  Let's skip to it (see [*] for where we
were).

> +	while (nr) {

For each cache entry...

Might be nice to phrase this as a for loop.

	const char *prefix = cache[0]->name;
	struct cache_entry ** const cache_end = cache + nr;
	struct cache_entry **cep;
	for (cep = cache; cep != cache_end; ) {
		struct cache_entry *ce = *cep;

> +		if (select_mask && !(ce->ce_flags & select_mask)) {
> +			cache++;
> +			ce = *cache;
> +			nr--;
> +			continue;
> +		}

Some index entries are disqualified.

		if (select_mask)
			if (!(ce->ce_cflags & select_mask)) {
				cep++;
				continue;
			}

> +		if (prefix_len && strncmp(ce->name, prefix, prefix_len))
> +			break;

Entries not under prefix are handled by the caller.

		if (prefix_len)
			if (strncmp(ce->name, prefix, prefix_len))
				return cep - cache;	/* number processed */

> +		name = ce->name + prefix_len;
> +		slash = strchr(name, '/');
> +
> +		/* no slash, this is a file */
> +		if (!slash) {

Leaf!

			/* File or directory?  Check if it's a submodule. */
> +			int dtype = ce_to_dtype(ce);
> +			if (match_all ||
> +			    excluded_from_list(ce->name, ce_namelen(ce), name, &dtype, o->el) > 0)
> +				ce->ce_flags &= ~clear_mask;
> +			cache++;
> +			ce = *cache;
> +			nr--;
> +			continue;
> +		}

			if (excluded_block ||
			    excluded_from_list(...		/* Excluded? */
				ce->ce_flags &= ~clear_mask;
			cep++;
			continue;
		}

Putting the loop body in its own function could allow some nice depth
reduction.

> +
> +		/* has slash, this is a directory */
> +		processed = clear_ce_flags_1(cache, nr,
> +					     slash + 1 - ce->name, match_all,
> +					     select_mask, clear_mask, o);
> +		cache += processed;
> +		ce = *cache;
> +		nr -= processed;
> +	}
> +	return original_nr - nr;

		/* Non-leaf.  Recurse. */
		cep += clear_ce_ ...
	}

How about the other case, mentioned before (see [*] above)?

	assert(prefix_len);	/* Not toplevel but a subdirectory. */
	assert(!match_all);	/* Not wholesale excluded. */

> +		int dtype = DT_DIR;
> +		static char path[PATH_MAX];
> +		int pathlen = prefix_len - 1;
> +		const char *basename;
> +
> +		memcpy(path, prefix, pathlen);
> +		path[pathlen] = '\0';
> +		basename = strrchr(path, '/');

Use memrchr?

> +		basename = basename ? basename+1 : path;
> +		switch (excluded_from_list(path, pathlen, basename, &dtype, o->el)) {
> +		case 0:
> +			while (nr && !strncmp(ce->name, prefix, prefix_len)) {
> +				cache++;
> +				ce = *cache;
> +				nr--;
> +			}
> +			return original_nr - nr;

		case 0:	/* Included.  Great, no flag to clear. */
			for (; cep != cache_end; cep++)
				if (strncmp((*cep)->name, prefix, prefix_len))
					break;
			return cep - cache;

> +		case 1:
> +			match_all = 1;
> +			break;

		case 1:	/* Excluded.  Great, clear them wholesale. */
			match_all = 1;
			continue;

> +		default:	/* case -1, undecided */
> +			break;
> +		}

		case -1:	/* Undecided.
			continue;
		}
> +	}
> +	return original_nr - nr;
> +}

	}
	/* Ran off the end. */
	return cep - cache;
 }

That last bit could even longjmp() if expecting deeply nested
directory hierarchies.  (Sorry, joking, you should not do such an
awful thing unless timing it proves it useful.)

That was clean, thanks.

Hope some of my musings are useful anyway.

  parent reply	other threads:[~2010-11-15 20:20 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-15 10:36 [PATCH 00/10] Sparse checkout fixes and improvements Nguyễn Thái Ngọc Duy
2010-11-15 10:36 ` [PATCH 01/10] add: do not rely on dtype being NULL behavior Nguyễn Thái Ngọc Duy
2010-11-15 12:14   ` Jonathan Nieder
2010-11-16  2:18     ` Nguyen Thai Ngoc Duy
2010-11-16  2:42       ` Jonathan Nieder
2010-11-16 18:58       ` Junio C Hamano
2010-11-17  6:38         ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 02/10] unpack-trees: move all skip-worktree check back to unpack_trees() Nguyễn Thái Ngọc Duy
2010-11-15 12:34   ` Thiago Farina
2010-11-16  2:19     ` Nguyen Thai Ngoc Duy
2010-11-15 16:01   ` Jonathan Nieder
2010-11-16  2:39     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 03/10] unpack-trees: add function to update ce_flags based on sparse patterns Nguyễn Thái Ngọc Duy
2010-11-15 18:30   ` Jonathan Nieder
2010-11-15 20:19   ` Jonathan Nieder [this message]
2010-11-15 10:36 ` [PATCH 04/10] unpack-trees: fix sparse checkout's "unable to match directories" fault Nguyễn Thái Ngọc Duy
2010-11-15 19:10   ` Jonathan Nieder
2010-11-16  2:43     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 05/10] unpack-trees: optimize full checkout case Nguyễn Thái Ngọc Duy
2010-11-15 20:41   ` Jonathan Nieder
2010-11-15 10:36 ` [PATCH 06/10] templates: add info/sparse-checkout Nguyễn Thái Ngọc Duy
2010-11-15 10:36 ` [PATCH 07/10] checkout: add -S to update sparse checkout Nguyễn Thái Ngọc Duy
2010-11-15 21:16   ` Jonathan Nieder
2010-11-15 21:52     ` Miles Bader
2010-11-17 15:02       ` Nguyen Thai Ngoc Duy
2010-11-16  3:08     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 08/10] checkout: add --full to fully populate working directory Nguyễn Thái Ngọc Duy
2010-11-15 21:23   ` Jonathan Nieder
2010-11-16  2:50     ` Nguyen Thai Ngoc Duy
2010-11-15 10:36 ` [PATCH 09/10] git-checkout.txt: mention of sparse checkout Nguyễn Thái Ngọc Duy
2010-11-15 10:36 ` [PATCH 10/10] clean: support cleaning sparse checkout with -S Nguyễn Thái Ngọc Duy
2010-11-15 21:30   ` Jonathan Nieder
2010-11-16  2:53     ` Nguyen Thai Ngoc Duy
2010-11-16  3:07       ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101115201945.GF16385@burratino \
    --to=jrnieder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).