git.vger.kernel.org archive mirror
From: Jeff Hostetler <git@jeffhostetler.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, peff@peff.net, jonathantanmy@google.com,
	jrnieder@gmail.com, Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH 1/3] list-objects: add filter_blob to traverse_commit_list
Date: Wed, 28 Jun 2017 13:13:22 -0400
Message-ID: <e2216ab8-5af7-4edd-16aa-f84a45e0cbd7@jeffhostetler.com>
In-Reply-To: <xmqqy3scw06y.fsf@gitster.mtv.corp.google.com>



On 6/28/2017 12:23 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> diff --git a/list-objects.c b/list-objects.c
>> index f3ca6aa..c9ca81c 100644
>> --- a/list-objects.c
>> +++ b/list-objects.c
>> @@ -24,11 +25,28 @@ static void process_blob(struct rev_info *revs,
>>   		die("bad blob object");
>>   	if (obj->flags & (UNINTERESTING | SEEN))
>>   		return;
>> -	obj->flags |= SEEN;
>>   
>>   	pathlen = path->len;
>>   	strbuf_addstr(path, name);
>> -	show(obj, path->buf, cb_data);
>> +	if (!filter_blob) {
>> +		/*
>> +		 * Normal processing is to immediately dedup blobs
>> +		 * during commit traversal, regardless of how many
>> +		 * times it appears in a single or multiple commits,
>> +		 * so we always set SEEN.
>> +		 */
>> +		obj->flags |= SEEN;
>> +		show(obj, path->buf, cb_data);
>> +	} else {
>> +		/*
>> +		 * Use the filter-proc to decide whether to show
>> +		 * the blob.  We only set SEEN if requested.  For
>> +		 * example, this could be used to omit a specific
>> +		 * blob until it appears with a ".git*" entryname.
>> +		 */
>> +		if (filter_blob(obj, path->buf, &path->buf[pathlen], cb_data))
>> +			obj->flags |= SEEN;
>> +	}
> 
> This somehow looks a bit surprising organization and division of
> responsibility.  I would have expected
> 
> 	if (!filter_blob ||
> 	    filter_blob(obj, path->buf, &path->buf[pathlen], cb_data)) {
> 		obj->flags |= SEEN;
> 		show(obj, path->buf, cb_data);
> 	}
> 
> i.e. making the filter function responsible for only making a
> decision to include or exclude, not giving it a chance to decide to
> "show" anything different.

Yes, my logic was a little confusing there.  Jonathan Tan said
something similar the other day.  I have a new version that I'm
working on now that looks like this:

	list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
	...
	if (filter)
		r = filter(obj, path->buf, ...
	if (r & LOFR_MARK_SEEN)
		obj->flags |= SEEN;
	if (r & LOFR_SHOW)
		show(obj, path->buf, cb_data);

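I'm generalizing it a little to let the filter return two flags:
  * LOFR_MARK_SEEN to indicate that the filter doesn't want to see
    the object again
  * LOFR_SHOW to include the object in the result
These let filters do "hard" and "provisional" omits: a hard omit
would mark the object SEEN without showing it, while a provisional
omit would do neither, so the object can be reconsidered if it turns
up again under another path.  (This will make more sense later when
I get my patch cleaned up.)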
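
To make the two-flag idea concrete, here is a rough, self-contained
sketch.  It is not the code from the upcoming patch: the flag values,
the stub `struct object`, and the names `gitfile_filter`,
`count_shown`, and `process_blob_sketch` are all hypothetical
stand-ins, and only LOFR_MARK_SEEN, LOFR_SHOW, and SEEN come from the
discussion above.  The filter provisionally omits a blob until it is
reached under a ".git*" entry name.

```c
#include <string.h>

/* Hypothetical stand-ins for Git's internals; the values are assumptions. */
#define SEEN (1u << 0)

struct object {
	unsigned flags;
};

typedef unsigned list_objects_filter_result;
#define LOFR_ZERO      0u         /* provisional omit: neither flag set */
#define LOFR_MARK_SEEN (1u << 0)  /* filter does not want to see it again */
#define LOFR_SHOW      (1u << 1)  /* include the object in the result */

typedef list_objects_filter_result (*filter_fn)(struct object *obj,
						const char *pathname,
						const char *filename,
						void *cb_data);
typedef void (*show_fn)(struct object *obj, const char *pathname,
			void *cb_data);

/* Example filter: only show a blob when its entry name starts with ".git". */
static list_objects_filter_result gitfile_filter(struct object *obj,
						 const char *pathname,
						 const char *filename,
						 void *cb_data)
{
	(void)obj;
	(void)pathname;
	(void)cb_data;
	if (!strncmp(filename, ".git", 4))
		return LOFR_MARK_SEEN | LOFR_SHOW;
	/* Not marked SEEN, so the blob is offered again at another path. */
	return LOFR_ZERO;
}

/* Example show callback: counts how many objects were shown. */
static void count_shown(struct object *obj, const char *pathname,
			void *cb_data)
{
	(void)obj;
	(void)pathname;
	++*(int *)cb_data;
}

/* The flag handling from the snippet above, reduced to its core. */
static void process_blob_sketch(struct object *obj, const char *pathname,
				const char *filename, filter_fn filter,
				show_fn show, void *cb_data)
{
	/* Without a filter, behave as before: dedup and show. */
	list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;

	if (obj->flags & SEEN)
		return;
	if (filter)
		r = filter(obj, pathname, filename, cb_data);
	if (r & LOFR_MARK_SEEN)
		obj->flags |= SEEN;
	if (r & LOFR_SHOW)
		show(obj, pathname, cb_data);
}
```

With this shape, a filter that wanted a hard omit would return
LOFR_MARK_SEEN alone, and passing a NULL filter keeps the existing
behavior.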


>> @@ -67,6 +85,7 @@ static void process_gitlink(struct rev_info *revs,
>>   static void process_tree(struct rev_info *revs,
>>   			 struct tree *tree,
>>   			 show_object_fn show,
>> +			 filter_blob_fn filter_blob,
>>   			 struct strbuf *base,
>>   			 const char *name,
>>   			 void *cb_data)
>> @@ -111,7 +130,7 @@ static void process_tree(struct rev_info *revs,
>>   		if (S_ISDIR(entry.mode))
>>   			process_tree(revs,
>>   				     lookup_tree(entry.oid->hash),
>> -				     show, base, entry.path,
>> +				     show, filter_blob, base, entry.path,
>>   				     cb_data);
>>   		else if (S_ISGITLINK(entry.mode))
>>   			process_gitlink(revs, entry.oid->hash,
> 
> I wonder if we'll need filter_tree_fn in the future in this
> codepath.  When somebody wants to do a "narrow fetch/clone", would
> the approach taken by this series, i.e. decide not to show certain
> objects during the "rev-list --objects" traversal, be a good precedent
> to follow?  Would this approach be a good foundation to build on
> such a future?

Yes, I'm including similar logic inside process_tree() to allow that
and let the filter know about entering and leaving each tree.  So we
only need one filter-proc to handle a particular strategy and it will
handle both tree and blob objects.

I want to be able to use this mechanism to do narrow clone/fetch
using such a filter-proc and a sparse-checkout-like spec.
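
As a rough illustration of what a single filter-proc for both trees
and blobs might look like, here is a sketch under stated assumptions.
The event names (EV_BEGIN_TREE, EV_END_TREE, EV_BLOB), the
`narrow_state` struct, the `narrow_filter` function, and the use of a
single directory prefix as the "spec" are all hypothetical; a real
sparse-checkout-like spec would be richer than one prefix.  Tree
pathnames are assumed to carry a trailing slash, with "" for the root.

```c
#include <string.h>

/* Hypothetical result flags, mirroring the two-flag idea; values assumed. */
#define LOFR_ZERO      0u
#define LOFR_MARK_SEEN (1u << 0)
#define LOFR_SHOW      (1u << 1)

/* Hypothetical events: one filter-proc hears about trees and blobs. */
enum filter_event {
	EV_BEGIN_TREE,	/* about to descend into a tree */
	EV_END_TREE,	/* finished with a tree */
	EV_BLOB		/* visiting a blob */
};

struct narrow_state {
	const char *prefix;	/* directory to keep, e.g. "Documentation/" */
	int depth;		/* number of trees currently entered */
};

static unsigned narrow_filter(enum filter_event ev, const char *pathname,
			      void *cb_data)
{
	struct narrow_state *st = cb_data;
	size_t plen = strlen(st->prefix);
	size_t len = strlen(pathname);

	switch (ev) {
	case EV_BEGIN_TREE:
		st->depth++;
		/*
		 * Show trees that lead to the prefix or lie inside it:
		 * compare over the shorter of the two strings.
		 */
		if (!strncmp(pathname, st->prefix, len < plen ? len : plen))
			return LOFR_SHOW;
		return LOFR_ZERO;
	case EV_END_TREE:
		st->depth--;
		return LOFR_ZERO;
	case EV_BLOB:
		if (!strncmp(pathname, st->prefix, plen))
			return LOFR_MARK_SEEN | LOFR_SHOW;
		/* Provisional omit: the blob may reappear inside the prefix. */
		return LOFR_ZERO;
	}
	return LOFR_ZERO;
}
```

Blobs outside the prefix are omitted provisionally rather than marked
SEEN, so the same blob can still be shown if the traversal later
reaches it at a path inside the prefix.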

Thanks,
Jeff



Thread overview:
2017-06-22 20:36 [PATCH 0/3] WIP list-objects and pack-objects for partial clone Jeff Hostetler
2017-06-22 20:36 ` [PATCH 1/3] list-objects: add filter_blob to traverse_commit_list Jeff Hostetler
2017-06-22 21:45   ` Jonathan Tan
2017-06-22 22:10     ` Jonathan Tan
2017-06-23 17:16       ` Jeff Hostetler
2017-06-28 16:23   ` Junio C Hamano
2017-06-28 17:13     ` Jeff Hostetler [this message]
2017-06-28 17:54       ` Junio C Hamano
2017-06-22 20:36 ` [PATCH 2/3] pack-objects: WIP add max-blob-size filtering Jeff Hostetler
2017-06-22 21:54   ` Jonathan Tan
2017-06-22 22:14     ` Junio C Hamano
2017-06-22 20:36 ` [PATCH 3/3] pack-objects: add t5317 to test max-blob-size Jeff Hostetler