git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <ttaylorr@github.com>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com, peff@peff.net,
	tboegi@web.de, e@80x24.org
Subject: Re: [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol
Date: Wed, 12 Apr 2017 13:46:10 -0400	[thread overview]
Message-ID: <20170412174610.GB49694@Ida> (raw)
In-Reply-To: <20170409191107.20547-5-larsxschneider@gmail.com>

I think this is a great approach and one that I'd be happy to implement in LFS.
The additional capability isn't too complex, so I think other similar filters to
LFS shouldn't have a hard time implementing it either.

I left a few comments, mostly expressing approval to the documentation changes.
I'll leave the C review to someone more expert than me.

+1 from me on the protocol changes.

> +Delay
> +^^^^^
> +
> +If the filter supports the "delay" capability, then Git can send the
> +flag "delay-able" after the filter command and pathname.

Nit: I think either way is fine, but `can_delay` will save us 1 byte per each
new checkout entry.

> +"delay-id", a number that identifies the blob, and a flush packet. The
> +filter acknowledges this number with a "success" status and a flush
> +packet.

I mentioned this in another thread, but I'd prefer, if possible, that we use the
pathname as a unique identifier for referring back to a particular checkout
entry. I think adding an additional identifier adds unnecessary complication to
the protocol and introduces a forced mapping on the filter side from id to
path.

Both Git and the filter are going to have to keep these paths in memory
somewhere, be that in-process, or on disk. That being said, I can see potential
troubles with a large number of long paths that exceed the memory available to
Git or the filter when stored in a hashmap/set.

On Git's side, I think trading that for some CPU time might make sense. If Git
were to SHA1 each path and store that in a hashmap, it would consume more CPU
time, but less memory to store each path. Git and the filter could then exchange
path names, and Git would simply SHA1 the pathname each time it needed to refer
back to memory associated with that entry in a hashmap.

> +by a "success" status that is also terminated with a flush packet. If
> +no blobs for the delayed paths are available, yet, then the filter is
> +expected to block the response until at least one blob becomes
> +available. The filter can tell Git that it has no more delayed blobs
> +by sending an empty list.

I think the blocking nature of the `list_available_blobs` command is a great
synchronization mechanism for telling the filter when it can and can't send new
blob information to Git. I was tenatively thinking of suggesting to remove this
command and instead allow the filter to send readied blobs in sequence after all
unique checkout entries had been sent from Git to the filter. But I think this
allows approach allows us more flexibility, and isn't that much extra
complication or bytes across the pipe.


--
Thanks,
Taylor Blau

  parent reply	other threads:[~2017-04-12 17:47 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-09 19:11 [PATCH v3 0/4] convert: add "status=delayed" to filter process protocol Lars Schneider
2017-04-09 19:11 ` [PATCH v3 1/4] t0021: keep filter log files on comparison Lars Schneider
2017-04-09 19:11 ` [PATCH v3 2/4] t0021: make debug log file name configurable Lars Schneider
2017-04-09 19:11 ` [PATCH v3 3/4] t0021: write "OUT" only on success Lars Schneider
2017-04-09 19:11 ` [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol Lars Schneider
2017-04-10 10:00   ` Lars Schneider
2017-04-10 14:28     ` Eric Wong
2017-04-10 14:52       ` Lars Schneider
2017-04-10 20:54   ` Torsten Bögershausen
2017-04-11 19:50     ` Lars Schneider
2017-04-12  4:37       ` Torsten Bögershausen
2017-04-18  8:53         ` Lars Schneider
2017-04-19 18:55           ` Torsten Bögershausen
2017-05-21 20:25             ` Lars Schneider
2017-04-12 17:34       ` Taylor Blau
2017-04-12 17:46   ` Taylor Blau [this message]
2017-04-18 16:14     ` Lars Schneider
2017-04-18 17:42       ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170412174610.GB49694@Ida \
    --to=ttaylorr@github.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).