git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Peart <peartben@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Cc: gitster@pobox.com, christian.couder@gmail.com
Subject: Re: [PATCH v2 5/5] sha1_file: support loading lazy objects
Date: Tue, 8 Aug 2017 16:20:40 -0400	[thread overview]
Message-ID: <0c2a1339-0645-bb0c-216f-e093edf85994@gmail.com> (raw)
In-Reply-To: <eadce97b6a1e80345a2621e71ce187e9e6bc05bf.1501532294.git.jonathantanmy@google.com>



On 7/31/2017 5:02 PM, Jonathan Tan wrote:
> Teach sha1_file to invoke the command configured in
> extensions.lazyObject whenever an object is requested and unavailable.
> 
> The usage of the hook can be suppressed through a flag when invoking
> has_object_file_with_flags() and other similar functions.
> 
> This is meant as a temporary measure to ensure that all Git commands
> work in such a situation. Future patches will update some commands to
> either tolerate missing objects (without invoking the command) or be
> more efficient in invoking this command.

To prevent fetch from downloading all missing objects, you will also 
need to add logic in check_connected.  The simplest model is to simply 
return 0 if repository_format_lazy_object is set.

/*
  * Running a with lazy_objects there will be objects that are
  * missing locally and we don't want to download a bunch of
  * commits, trees, and blobs just to make sure everything is
  * reachable locally so this option will skip reachablility
  * checks below that use rev-list.  This will stop the check
  * before uploadpack runs to determine if there is anything to
  * fetch.  Returning zero for the first check will also prevent the
  * uploadpack from happening.  It will also skip the check after
  * the fetch is finished to make sure all the objects where
  * downloaded in the pack file.  This will allow the fetch to
  * run and get all the latest tip commit ids for all the branches
  * in the fetch but not pull down commits, trees, or blobs via
  * upload pack.
  */
if (repository_format_lazy_object)
	return 0;

[...]
> diff --git a/sha1_file.c b/sha1_file.c
> index b60ae15f7..1785c61d8 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -28,6 +28,11 @@
>   #include "list.h"
>   #include "mergesort.h"
>   #include "quote.h"
> +#include "iterator.h"
> +#include "dir-iterator.h"
> +#include "sha1-lookup.h"
> +#include "lazy-object.h"
> +#include "sha1-array.h"
>   
>   #define SZ_FMT PRIuMAX
>   static inline uintmax_t sz_fmt(size_t s) { return s; }
> @@ -2984,6 +2989,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
>   	const unsigned char *real = (flags & OBJECT_INFO_LOOKUP_REPLACE) ?
>   				    lookup_replace_object(sha1) :
>   				    sha1;
> +	int already_retried = 0;
>   
>   	if (!oi)
>   		oi = &blank_oi;
> @@ -3008,30 +3014,38 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
>   		}
>   	}
>   
> -	if (!find_pack_entry(real, &e)) {
> -		/* Most likely it's a loose object. */
> -		if (!sha1_loose_object_info(real, oi, flags)) {
> -			oi->whence = OI_LOOSE;
> -			return 0;
> -		}
> +retry:
> +	if (find_pack_entry(real, &e))
> +		goto found_packed;
>   
> -		/* Not a loose object; someone else may have just packed it. */
> -		if (flags & OBJECT_INFO_QUICK) {
> -			return -1;
> -		} else {
> -			reprepare_packed_git();
> -			if (!find_pack_entry(real, &e))
> -				return -1;
> -		}
> +	/* Most likely it's a loose object. */
> +	if (!sha1_loose_object_info(real, oi, flags)) {
> +		oi->whence = OI_LOOSE;
> +		return 0;
>   	}
>   
> +	/* Not a loose object; someone else may have just packed it. */
> +	reprepare_packed_git();
> +	if (find_pack_entry(real, &e))
> +		goto found_packed;

Same feedback as before.  I like to avoid using goto's as flow control 
other than in error handling.

Also, this patch looses the OBJECT_INFO_QUICK logic which could be restored.

[...]

> diff --git a/t/t0410/lazy-object b/t/t0410/lazy-object
> new file mode 100755
> index 000000000..4f4a9c38a
> --- /dev/null
> +++ b/t/t0410/lazy-object
> @@ -0,0 +1,102 @@
> +#!/usr/bin/perl
> +#
> +# Example implementation for the Git lazyObject protocol version 1. See
> +# the documentation for extensions.lazyObject in
> +# Documentation/technical/repository-version.txt
> +#
> +# Allows you to test the ability for blobs to be pulled from a host git repo
> +# "on demand."  Called when git needs a blob it couldn't find locally due to
> +# a lazy clone that only cloned the commits and trees.
> +#
> +# Please note, this sample is a minimal skeleton. No proper error handling
> +# was implemented.
> +
> +use strict;
> +use warnings;
> +
> +#
> +# Point $DIR to the folder where your host git repo is located so we can pull
> +# missing objects from it
> +#
> +my $DIR = $ARGV[0];
> +

At some point, this should be based on the refactored pkt_* functions 
currently contained in the ObjectDB patch series.

> +sub packet_bin_read {
> +	my $buffer;
> +	my $bytes_read = read STDIN, $buffer, 4;
> +	if ( $bytes_read == 0 ) {
> +
> +		# EOF - Git stopped talking to us!
> +		exit();
> +	}
> +	elsif ( $bytes_read != 4 ) {
> +		die "invalid packet: '$buffer'";
> +	}
> +	my $pkt_size = hex($buffer);
> +	if ( $pkt_size == 0 ) {
> +		return ( 1, "" );
> +	}
> +	elsif ( $pkt_size > 4 ) {
> +		my $content_size = $pkt_size - 4;
> +		$bytes_read = read STDIN, $buffer, $content_size;
> +		if ( $bytes_read != $content_size ) {
> +			die "invalid packet ($content_size bytes expected; $bytes_read bytes read)";
> +		}
> +		return ( 0, $buffer );
> +	}
> +	else {
> +		die "invalid packet size: $pkt_size";
> +	}
> +}
> +
> +sub packet_txt_read {
> +	my ( $res, $buf ) = packet_bin_read();
> +	unless ( $buf =~ s/\n$// ) {
> +		die "A non-binary line MUST be terminated by an LF.";
> +	}
> +	return ( $res, $buf );
> +}
> +
> +sub packet_bin_write {
> +	my $buf = shift;
> +	print STDOUT sprintf( "%04x", length($buf) + 4 );
> +	print STDOUT $buf;
> +	STDOUT->flush();
> +}
> +
> +sub packet_txt_write {
> +	packet_bin_write( $_[0] . "\n" );
> +}
> +
> +sub packet_flush {
> +	print STDOUT sprintf( "%04x", 0 );
> +	STDOUT->flush();
> +}
> +
> +( packet_txt_read() eq ( 0, "git-lazy-object-client" ) ) || die "bad initialize";
> +( packet_txt_read() eq ( 0, "version=1" ) )		 || die "bad version";
> +( packet_bin_read() eq ( 1, "" ) )                       || die "bad version end";
> +
> +packet_txt_write("git-lazy-object-server");
> +packet_txt_write("version=1");
> +packet_flush();
> +
> +( packet_txt_read() eq ( 0, "capability=get" ) )    || die "bad capability";
> +( packet_bin_read() eq ( 1, "" ) )                  || die "bad capability end";
> +
> +packet_txt_write("capability=get");
> +packet_flush();
> +
> +while (1) {
> +	my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
> +
> +	if ( $command eq "get" ) {
> +		my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
> +		packet_bin_read();
> +
> +		system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . ' | git -c extensions.lazyobject=false hash-object -w --stdin >/dev/null 2>&1');
> +		packet_txt_write(($?) ? "status=error" : "status=success");
> +		packet_flush();
> +	} else {
> +		die "bad command '$command'";
> +	}
> +}
> 

      parent reply	other threads:[~2017-08-08 20:20 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-26 23:29 [RFC PATCH 0/4] Some patches for fsck for missing objects Jonathan Tan
2017-07-26 23:29 ` [RFC PATCH 1/4] environment, fsck: introduce lazyobject extension Jonathan Tan
2017-07-27 18:55   ` Junio C Hamano
2017-07-28 13:20     ` Ben Peart
2017-07-28 23:50     ` Jonathan Tan
2017-07-29  0:21       ` Junio C Hamano
2017-07-26 23:30 ` [RFC PATCH 2/4] fsck: support refs pointing to lazy objects Jonathan Tan
2017-07-27 18:59   ` Junio C Hamano
2017-07-27 23:50     ` Jonathan Tan
2017-07-28 13:29       ` Ben Peart
2017-07-28 20:08         ` [PATCH] tests: ensure fsck fails on corrupt packfiles Jonathan Tan
2017-07-26 23:30 ` [RFC PATCH 3/4] fsck: support referenced lazy objects Jonathan Tan
2017-07-27 19:17   ` Junio C Hamano
2017-07-27 23:50     ` Jonathan Tan
2017-07-29 16:04   ` Junio C Hamano
2017-07-26 23:30 ` [RFC PATCH 4/4] fsck: support lazy objects as CLI argument Jonathan Tan
2017-07-26 23:42 ` [RFC PATCH 0/4] Some patches for fsck for missing objects brian m. carlson
2017-07-27  0:24   ` Stefan Beller
2017-07-27 17:25   ` Jonathan Tan
2017-07-28 13:40     ` Ben Peart
2017-07-31 21:02 ` [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader Jonathan Tan
2017-07-31 21:21   ` Junio C Hamano
2017-07-31 23:05     ` Jonathan Tan
2017-08-01 17:11       ` Junio C Hamano
2017-08-01 17:45         ` Jonathan Nieder
2017-08-01 20:15           ` Junio C Hamano
2017-08-02  0:19         ` Jonathan Tan
2017-08-02 16:20           ` Junio C Hamano
2017-08-02 17:38             ` Jonathan Nieder
2017-08-02 20:51               ` Junio C Hamano
2017-08-02 22:13                 ` Jonathan Nieder
2017-08-03 19:08                 ` Jonathan Tan
2017-08-08 17:13   ` Ben Peart
2017-07-31 21:02 ` [PATCH v2 1/5] environment, fsck: introduce lazyobject extension Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 2/5] fsck: support refs pointing to lazy objects Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 3/5] fsck: support referenced " Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 4/5] fsck: support lazy objects as CLI argument Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 5/5] sha1_file: support loading lazy objects Jonathan Tan
2017-07-31 21:29   ` Junio C Hamano
2017-08-08 20:20   ` Ben Peart [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0c2a1339-0645-bb0c-216f-e093edf85994@gmail.com \
    --to=peartben@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).