git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lea Wiemann <lewiemann@gmail.com>
To: Petr Baudis <pasky@suse.cz>
Cc: git@vger.kernel.org, John Hawley <warthog19@eaglescrag.net>,
	Jakub Narebski <jnareb@gmail.com>
Subject: Re: [PATCH 2/3] add new Git::Repo API
Date: Tue, 15 Jul 2008 00:19:31 +0200	[thread overview]
Message-ID: <487BD0F3.2060508@gmail.com> (raw)
In-Reply-To: <20080714014051.GK10151@machine.or.cz>

Petr Baudis wrote:
> Maybe I sound too mentoring at places; [OTOH] if something general gets into
> the Git tree, I'd like to make sure it's something we can live happily
> with for long time, not just a hack tailored for gitweb caching.

Yup, I agree (and you don't sound too mentoring ^^).  Thanks for the
review and feedback.

> I really miss some more detailed writeup on the envisioned design here. [...]
> Most importantly, how is Git::Repo interacting with working copies,

Git::Repo is not interacting with working copies at all right now.  Is
there anything you think should be considered for its design?

Here's a write-up about the design (I'll probably move this into
Git::Repo's man page):

----------
Git::Repo aims to provide low-level access to Git repositories.  For
instance, you can resolve object names (like "HEAD~2") to SHA1s, and
inspect objects.  It does not attempt to be a wrapper around the git
plumbing or porcelain commands.

Error handling is simple: On a consistent repository, the Perl interface
will never die.  You can use the get_sha1 method to resolve arbitrary
object names or check the existence of SHA1 hashes; get_sha1 will return
undef if the object does not exist in the repository.  Any SHA1 that is
returned by get_sha1 can be safely passed to the other Git::Repo methods.
----------

> First, I don't think it's good idea at all to put the pipe-related stuff
> to Git::Repo - this is botched up API just like the current one.

Well, they're more like helper methods.  Since they don't fit into the
design goals of the Git::Repo API at all, I'd suggest we just
underscore-prefix them and take them out of the man page.  (The only
reason why I hadn't done this is that gitweb uses $repo->cmd_output
extensively, so it'd end up with a lot of underscore calls.  But I
suppose we can either alias _cmd_output to cmd_output in gitweb's
CachedRepo subclass, or live with $repo->_cmd_output calls.)  Does
underscore-prefixing sound good to you?

If someone wants to come up with a consistent nice interface for calling
git commands, sure.  I wasn't actually trying to do that.

> Here is an idea: Introduce Git::Command object [and a Git::CommandFactory]
> 
> 	my $c = Git::Command->new(['git', '--git-dir=.', 'cat-file', \
> 		'-p', 'bla'], {pipe_out=>1})    [...]
> 	my $cf = Git::CommandFactory->new('git', '--git-dir=.');
> 
> Or am I overdoing it?

Yes, I think so. ;-)  All we're talking about here is a wrapper around
"open '-|'" calls (plus some workarounds for Windows I suppose).

I don't have much of a notion of a 'command' as an object in my head;
your (snipped) example makes it look like you're trying to create a
IO::Handle-compatible interface, which I think is way too much effort
(and error-prone) for simple pipes.  Also, a CommandFactory class just
to catenate lists together seems like overkill to me.

Something like a command interface *may* make sense if it's tied to
repositories or working copies, in which case it could automatically set
--git-dir or --work-tree, but it's beyond the scope of what I'm trying
to create here, and I don't think it's even overly useful.

> Instead, I believe the best course is to gradually translate all the
> Git.pm functionality to the new OO model, leaving Git.pm as a
> compatibility wrapper. Now, if you believe this is a non-trivial task,
> please tell us why.

Well, as I said, the fact that there are untested parts in Git.pm
doesn't exactly make it trivial to refactor.  Also, only very few parts
can be cleanly moved to Git::Repo.  In "grep '^sub [^_]' Git.pm" I find
only cat_blob and perhaps hash_object to be eligible to be moved (though
hash_object should probably live in a working-copy/non-bare-repo class,
with an optional insert => 1 parameter).  And even cat_blob is
non-trivial to move unless you want the whole blob to be read into memory.

That's a lot of non-trivialness for very little gain.  I doubt I'd even
have enough time till the end of GSoC (minus vacation) to do this.

> It should be actually very easy to start with moving all the pipe
> functionality to Git::Command.

Creating a new (Git::Command) API is very much non-trivial, apart from
the fact that I'm not convinced that we need Git::Command, and that a
clean command interface neither falls out of Git.pm nor Git::Repo.

>>   I'm working on caching for gitweb, not on implementing the
>>   next great Perl API for Git.  (And Git::Repo isn't great, FTR.)
> 
> Wait, I can't make sense out of this paragraph. If Git.pm sucks, we can
> work on new API. But we better _make_ it great. Or someone else comes by
> next year and says "oh, but it's buggy and needs refactoring, let's
> throw it away and redesign it!"

Sorry, I wasn't clear with my parenthesed remark: I actually think that
Git::Repo is pretty good in terms of code and interface quality.  It's
just not *complete*, even in its limited scope, and I'm not attempting
to make it complete.

I do think that someone who wants to extend Git::Repo (like Jakub with
Git::Config) won't have much trouble doing so with the existing design.

>> +use constant _MESSAGE => 'M';
>> +use constant _ENCODING => 'E';   [snip]
> 
> if you are going to use hashes
> anyway, why not actually key by sensible name directly?

Embarrassingly premature optimization here. ^_^  I'll fix it.

>> $commit = Git::Commit->new($repo, $sha1)
> 
> Are we sure we don't want hash-based arguments instead? This is badly
> extensible and inconsistent with the rest of the API.

*ponders*  Every commit needs a repo and a SHA1, so those will never get
optional.  We can always add hash-based options after the two mandatory
arguments, but I don't even see any such possible options at the moment.
 (And if I turn out to be completely wrong, we can even move to a
hash-only argument list by checking the type of the first parameter.)
Really, I wouldn't worry.

>> [Git::Commit->new, Git::Tag->new:]
>> +Calls to this method are free, since it does not check whether $sha1
>> +exists and has the right type.  However, accessing any of the commit
>> +object's properties will fail if $sha1 is not a valid commit object.
> 
> This is nice idea, but I'd also provide a well-defined way for the user
> to verify the object's validity at a good moment; basically, make load()
> a public method. The user can deal with errors then and rely on
> error-free behavior later.

No, you should never pass in an invalid SHA1 in the first place.  The
above piece of documentation is just a warning that bugs will show up
delayed.  IOW, this is not the right place to have your error handling.

If you're getting a SHA1 through the user-interface, check its existence
with get_sha1 before passing it to the constructor.

>> +Note that $sha1 must be the SHA1 of a commit object; tag objects are
>> +not dereferenced.
> 
> Why not?

Because the SHA1 might resolve to an object of the wrong type, which
means you have to do error handling in Git::Object objects; that's the
wrong place.

If tag-resolving is really needed, we can add an optional $type
parameter to get_sha1, which will cause get_sha1 to resolve the object
until a $type object is found, or return undef if the object is or
resolves to an object of the wrong type.

I have resolving code in gitweb's git_get_sha1_or_die (which I didn't
implement in Git::Repo since it uses some customized error reporting).
The resolving code could conceivably be extracted and moved to get_sha1.
 I think there are a few things to ponder and maybe discuss, so I'd do
that in a separate patch (if I get around it before the end of the project).

>> +=item $commit->authors
> 
> author

Fixed.

>> +Objects are loaded lazily, and hence instantiation is free.  Objects
>> +stringify to their SHA1s.
> 
> Maybe use the term 'Git database objects'? This way, it seems as if we
> are talking about all Git/*.pm objects.

I've replaced it with: "Objects are loaded lazily, and hence
instantiation is free.  Git::Object instances stringify to their SHA1s."

>> +sub sha1 {
>> +sub stringify {
> 
> Why not overload "" directly to sha1()?

Done (and removed stringify).

>> +sub assert_opts {
>> +sub assert_sha1 {
> 
> Pretend names with underscores, since they are internal?

Done, and removed them from @EXPORT_OK.

>> 'directory': The directory of the repository (mandatory).
> 
> I don't think making this mandatory is reasonable, since all the git
> commands can automatically figure this out by themselves too; so can
> Git::Repo easily by calling git rev-parse --git-dir.

Sure, it can be made non-mandatory if it's needed, but there are so many
possibilities for the exact time and place at which the repo directory
should be resolved using rev-parse (if at all) that I'd rather leave
this to the person who has an actual use-case for it.  I'm not a fan of
designing APIs before they are needed.

>> [Snipped a lot of quoting --LW]
>> +=item $repo->repo_dir
>> +=item $repo->git_binary
>> +=item $repo->version
>> +sub _get_git_cmd {
> 
> This definitely does not belong to a Git::Repo object.

Which of those methods are you referring to?  I think $repo->version
might reasonably be removed (and the code re-added to gitweb); I'll do
so unless you object.  _get_git_cmd is already underscored, and repo_dir
and git_binary only access attributes passed in through the constructor,
so I think those three should stay.

>> +=item $repo->cat_file($sha1)
> 
> I don't think this is good combination of semantic and name. Since we
> don't do the same thing as plain git cat-file, we might as well call it
> cat_object() or even better get_object().

Yup; I like get_object (I think I was planning to rename it and then
didn't remember doing so before sending off the patch).  Will rename it.

>> +=item $repo->get_path($tree_sha1, $file_sha1)
> 
> Now we are quickly getting messy again. This should definitely live in
> Git::Tree.

Yup, that's true.  I'll move it into gitweb until we have Git::Tree
(with a comment that it can be moved to Git::Tree once it exists).

>> +=item $repo->get_refs
>> +=item $repo->get_refs($pattern)
> 
> Again, the refs should be properly integrated into the object structure.

Really?  I think it's generally fine for get_refs to exist and to live
in Git::Repo.

Its return value (currently an an arrayref of [$sha1, $object_type,
$ref_name] arrayrefs) might need improvement though, and I find the
$pattern parameter pretty suspect (in that it smells like a for-each-ref
wrapper).  Since get_refs is unused at the moment (gitweb ended up
needing the slightly different show-ref), I'll remove it for now.  (Same
thing about me not being a fan of premature API design applies.)

I keep patches of everything I remove so other people will be able to
use them as starting points; I'll probably post them as "FYI"-patches to
the list at the end of my project, to preserve them for posterity.

  reply	other threads:[~2008-07-14 22:20 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-11  1:06 [PATCH 0/3] Git::Repo API and gitweb caching Lea Wiemann
2008-07-11  1:10 ` [PATCH 1/3 v9] gitweb: add test suite with Test::WWW::Mechanize::CGI Lea Wiemann
2008-07-11  1:11 ` [PATCH 2/3] add new Git::Repo API Lea Wiemann
2008-07-13 21:38   ` Junio C Hamano
2008-07-14  1:04     ` Lea Wiemann
2008-07-13 23:28   ` Jakub Narebski
2008-07-14  2:29     ` Lea Wiemann
2008-07-14  1:40   ` Petr Baudis
2008-07-14 22:19     ` Lea Wiemann [this message]
2008-07-18 16:48       ` Petr Baudis
2008-07-18 17:05         ` Jakub Narebski
2008-07-18 17:17           ` Petr Baudis
2008-07-18 18:09         ` Lea Wiemann
2008-07-18 18:19           ` Petr Baudis
2008-07-18 18:23           ` Johannes Schindelin
2008-07-19 20:54         ` Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API) Jakub Narebski
2008-07-19 21:14           ` Petr Baudis
2008-07-20  0:16             ` Jakub Narebski
2008-07-20 21:38               ` Petr Baudis
2008-07-20 10:38           ` Johannes Schindelin
2008-07-20 10:49             ` Petr Baudis
2008-07-20 12:33               ` Johannes Schindelin
2008-07-20 12:58                 ` Petr Baudis
2008-07-20 13:21                   ` Johannes Schindelin
2008-07-14 23:41     ` [PATCH 2/3] add new Git::Repo API Jakub Narebski
2008-07-15  0:11       ` Lea Wiemann
2008-07-18 16:54       ` Petr Baudis
2008-07-19  0:03         ` Jakub Narebski
2008-07-19 19:07         ` Jakub Narebski
2008-07-20 21:36           ` Petr Baudis
2008-07-20 21:50             ` Jakub Narebski
2008-07-16 18:21   ` Jakub Narebski
2008-07-16 20:32     ` Lea Wiemann
2008-07-17 23:49       ` Jakub Narebski
2008-07-18 13:40         ` Lea Wiemann
2008-07-18 15:35           ` Jakub Narebski
2008-07-18 16:51             ` Lea Wiemann
2008-07-11  1:11 ` [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching Lea Wiemann
2008-07-14 21:23   ` Jakub Narebski
2008-07-14 23:03     ` Lea Wiemann
2008-07-14 23:14       ` Jakub Narebski
2008-07-14 23:56         ` Lea Wiemann
2008-07-15  0:52           ` Jakub Narebski
2008-07-15  1:16             ` Lea Wiemann
2008-07-15  1:28               ` Johannes Schindelin
2008-07-15  1:44                 ` J.H.
2008-07-15  1:50                 ` Lea Wiemann
2008-07-15  2:03                   ` J.H.
2008-07-11  1:21 ` [PATCH 0/3] Git::Repo API and gitweb caching Johannes Schindelin
2008-07-11  9:33 ` Jakub Narebski
2008-07-11 14:07   ` Lea Wiemann
2008-07-11 16:27     ` Abhijit Menon-Sen
2008-07-12 15:08       ` Jakub Narebski
2008-07-19  5:35 ` Lea Wiemann
2008-08-18 19:34 ` Lea Wiemann
2008-08-18 19:39   ` [PATCH 1/3 v10] gitweb: add test suite with Test::WWW::Mechanize::CGI Lea Wiemann
2008-08-19  1:17     ` Junio C Hamano
2008-08-19 14:37       ` Lea Wiemann
2008-08-18 19:39   ` [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot Lea Wiemann
2008-08-19  1:32     ` Junio C Hamano
2008-08-19 15:06       ` Lea Wiemann
2008-08-19 13:51     ` Lea Wiemann
2008-08-18 19:39   ` [PATCH 3/3 v2] gitweb: use new Git::Repo API, and add optional caching Lea Wiemann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=487BD0F3.2060508@gmail.com \
    --to=lewiemann@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=pasky@suse.cz \
    --cc=warthog19@eaglescrag.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).