From: Ben Peart <peartben@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Cc: gitster@pobox.com, christian.couder@gmail.com
Subject: Re: [PATCH v2 5/5] sha1_file: support loading lazy objects
Date: Tue, 8 Aug 2017 16:20:40 -0400 [thread overview]
Message-ID: <0c2a1339-0645-bb0c-216f-e093edf85994@gmail.com> (raw)
In-Reply-To: <eadce97b6a1e80345a2621e71ce187e9e6bc05bf.1501532294.git.jonathantanmy@google.com>
On 7/31/2017 5:02 PM, Jonathan Tan wrote:
> Teach sha1_file to invoke the command configured in
> extensions.lazyObject whenever an object is requested and unavailable.
>
> The usage of the hook can be suppressed through a flag when invoking
> has_object_file_with_flags() and other similar functions.
>
> This is meant as a temporary measure to ensure that all Git commands
> work in such a situation. Future patches will update some commands to
> either tolerate missing objects (without invoking the command) or be
> more efficient in invoking this command.
To prevent fetch from downloading all missing objects, you will also
need to add logic in check_connected. The simplest model is to simply
return 0 if repository_format_lazy_object is set.
/*
* Running a with lazy_objects there will be objects that are
* missing locally and we don't want to download a bunch of
* commits, trees, and blobs just to make sure everything is
* reachable locally so this option will skip reachablility
* checks below that use rev-list. This will stop the check
* before uploadpack runs to determine if there is anything to
* fetch. Returning zero for the first check will also prevent the
* uploadpack from happening. It will also skip the check after
* the fetch is finished to make sure all the objects where
* downloaded in the pack file. This will allow the fetch to
* run and get all the latest tip commit ids for all the branches
* in the fetch but not pull down commits, trees, or blobs via
* upload pack.
*/
if (repository_format_lazy_object)
return 0;
[...]
> diff --git a/sha1_file.c b/sha1_file.c
> index b60ae15f7..1785c61d8 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -28,6 +28,11 @@
> #include "list.h"
> #include "mergesort.h"
> #include "quote.h"
> +#include "iterator.h"
> +#include "dir-iterator.h"
> +#include "sha1-lookup.h"
> +#include "lazy-object.h"
> +#include "sha1-array.h"
>
> #define SZ_FMT PRIuMAX
> static inline uintmax_t sz_fmt(size_t s) { return s; }
> @@ -2984,6 +2989,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
> const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
> lookup_replace_object(sha1) :
> sha1;
> + int already_retried = 0;
>
> if (!oi)
> oi = &blank_oi;
> @@ -3008,30 +3014,38 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
> }
> }
>
> - if (!find_pack_entry(real, &e)) {
> - /* Most likely it's a loose object. */
> - if (!sha1_loose_object_info(real, oi, flags)) {
> - oi->whence = OI_LOOSE;
> - return 0;
> - }
> +retry:
> + if (find_pack_entry(real, &e))
> + goto found_packed;
>
> - /* Not a loose object; someone else may have just packed it. */
> - if (flags & OBJECT_INFO_QUICK) {
> - return -1;
> - } else {
> - reprepare_packed_git();
> - if (!find_pack_entry(real, &e))
> - return -1;
> - }
> + /* Most likely it's a loose object. */
> + if (!sha1_loose_object_info(real, oi, flags)) {
> + oi->whence = OI_LOOSE;
> + return 0;
> }
>
> + /* Not a loose object; someone else may have just packed it. */
> + reprepare_packed_git();
> + if (find_pack_entry(real, &e))
> + goto found_packed;
Same feedback as before. I like to avoid using goto's as flow control
other than in error handling.
Also, this patch looses the OBJECT_INFO_QUICK logic which could be restored.
[...]
> diff --git a/t/t0410/lazy-object b/t/t0410/lazy-object
> new file mode 100755
> index 000000000..4f4a9c38a
> --- /dev/null
> +++ b/t/t0410/lazy-object
> @@ -0,0 +1,102 @@
> +#!/usr/bin/perl
> +#
> +# Example implementation for the Git lazyObject protocol version 1. See
> +# the documentation for extensions.lazyObject in
> +# Documentation/technical/repository-version.txt
> +#
> +# Allows you to test the ability for blobs to be pulled from a host git repo
> +# "on demand." Called when git needs a blob it couldn't find locally due to
> +# a lazy clone that only cloned the commits and trees.
> +#
> +# Please note, this sample is a minimal skeleton. No proper error handling
> +# was implemented.
> +
> +use strict;
> +use warnings;
> +
> +#
> +# Point $DIR to the folder where your host git repo is located so we can pull
> +# missing objects from it
> +#
> +my $DIR = $ARGV[0];
> +
At some point, this should be based on the refactored pkt_* functions
currently contained in the ObjectDB patch series.
> +sub packet_bin_read {
> + my $buffer;
> + my $bytes_read = read STDIN, $buffer, 4;
> + if ( $bytes_read == 0 ) {
> +
> + # EOF - Git stopped talking to us!
> + exit();
> + }
> + elsif ( $bytes_read != 4 ) {
> + die "invalid packet: '$buffer'";
> + }
> + my $pkt_size = hex($buffer);
> + if ( $pkt_size == 0 ) {
> + return ( 1, "" );
> + }
> + elsif ( $pkt_size > 4 ) {
> + my $content_size = $pkt_size - 4;
> + $bytes_read = read STDIN, $buffer, $content_size;
> + if ( $bytes_read != $content_size ) {
> + die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
> + }
> + return ( 0, $buffer );
> + }
> + else {
> + die "invalid packet size: $pkt_size";
> + }
> +}
> +
> +sub packet_txt_read {
> + my ( $res, $buf ) = packet_bin_read();
> + unless ( $buf =~ s/\n$// ) {
> + die "A non-binary line MUST be terminated by an LF.";
> + }
> + return ( $res, $buf );
> +}
> +
> +sub packet_bin_write {
> + my $buf = shift;
> + print STDOUT sprintf( "%04x", length($buf) + 4 );
> + print STDOUT $buf;
> + STDOUT->flush();
> +}
> +
> +sub packet_txt_write {
> + packet_bin_write( $_[0] . "\n" );
> +}
> +
> +sub packet_flush {
> + print STDOUT sprintf( "%04x", 0 );
> + STDOUT->flush();
> +}
> +
> +( packet_txt_read() eq ( 0, "git-lazy-object-client" ) ) || die "bad initialize";
> +( packet_txt_read() eq ( 0, "version=1" ) ) || die "bad version";
> +( packet_bin_read() eq ( 1, "" ) ) || die "bad version end";
> +
> +packet_txt_write("git-lazy-object-server");
> +packet_txt_write("version=1");
> +packet_flush();
> +
> +( packet_txt_read() eq ( 0, "capability=get" ) ) || die "bad capability";
> +( packet_bin_read() eq ( 1, "" ) ) || die "bad capability end";
> +
> +packet_txt_write("capability=get");
> +packet_flush();
> +
> +while (1) {
> + my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
> +
> + if ( $command eq "get" ) {
> + my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
> + packet_bin_read();
> +
> + system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | git -c extensions.lazyobject=false hash-object -w --stdin >/dev/null 2>&1');
> + packet_txt_write(($?) ? "status=error" : "status=success");
> + packet_flush();
> + } else {
> + die "bad command '$command'";
> + }
> +}
>
next prev parent reply other threads:[~2017-08-08 20:20 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-26 23:29 [RFC PATCH 0/4] Some patches for fsck for missing objects Jonathan Tan
2017-07-26 23:29 ` [RFC PATCH 1/4] environment, fsck: introduce lazyobject extension Jonathan Tan
2017-07-27 18:55 ` Junio C Hamano
2017-07-28 13:20 ` Ben Peart
2017-07-28 23:50 ` Jonathan Tan
2017-07-29 0:21 ` Junio C Hamano
2017-07-26 23:30 ` [RFC PATCH 2/4] fsck: support refs pointing to lazy objects Jonathan Tan
2017-07-27 18:59 ` Junio C Hamano
2017-07-27 23:50 ` Jonathan Tan
2017-07-28 13:29 ` Ben Peart
2017-07-28 20:08 ` [PATCH] tests: ensure fsck fails on corrupt packfiles Jonathan Tan
2017-07-26 23:30 ` [RFC PATCH 3/4] fsck: support referenced lazy objects Jonathan Tan
2017-07-27 19:17 ` Junio C Hamano
2017-07-27 23:50 ` Jonathan Tan
2017-07-29 16:04 ` Junio C Hamano
2017-07-26 23:30 ` [RFC PATCH 4/4] fsck: support lazy objects as CLI argument Jonathan Tan
2017-07-26 23:42 ` [RFC PATCH 0/4] Some patches for fsck for missing objects brian m. carlson
2017-07-27 0:24 ` Stefan Beller
2017-07-27 17:25 ` Jonathan Tan
2017-07-28 13:40 ` Ben Peart
2017-07-31 21:02 ` [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 1/5] environment, fsck: introduce lazyobject extension Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 2/5] fsck: support refs pointing to lazy objects Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 3/5] fsck: support referenced " Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 4/5] fsck: support lazy objects as CLI argument Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 5/5] sha1_file: support loading lazy objects Jonathan Tan
2017-07-31 21:29 ` Junio C Hamano
2017-08-08 20:20 ` Ben Peart [this message]
2017-07-31 21:21 ` [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader Junio C Hamano
2017-07-31 23:05 ` Jonathan Tan
2017-08-01 17:11 ` Junio C Hamano
2017-08-01 17:45 ` Jonathan Nieder
2017-08-01 20:15 ` Junio C Hamano
2017-08-02 0:19 ` Jonathan Tan
2017-08-02 16:20 ` Junio C Hamano
2017-08-02 17:38 ` Jonathan Nieder
2017-08-02 20:51 ` Junio C Hamano
2017-08-02 22:13 ` Jonathan Nieder
2017-08-03 19:08 ` Jonathan Tan
2017-08-08 17:13 ` Ben Peart
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0c2a1339-0645-bb0c-216f-e093edf85994@gmail.com \
--to=peartben@gmail.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.