git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Stefan Beller <sbeller@google.com>
Cc: Dennis Kaarsemaker <dennis@kaarsemaker.net>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Dan Wang <dwwang@google.com>
Subject: Re: [RFC PATCHv1 0/4] Push options in C Git
Date: Fri, 1 Jul 2016 16:01:23 -0400	[thread overview]
Message-ID: <20160701200123.GB23141@sigill.intra.peff.net> (raw)
In-Reply-To: <CAGZ79kZx6=LGjQjzEEjMjZrNbXcVto-+djJbxj9xUCtLMsaL5w@mail.gmail.com>

On Fri, Jul 01, 2016 at 11:25:40AM -0700, Stefan Beller wrote:

> > So a litmus test for me would be something like:
> >
> >   Can you get the same effect with:
> >
> >     git push $remote my-branch &&
> >     curl https://$remote/api/do-the-thing?q=my-branch
> >
> >   as you would with:
> >
> >     git push --push-data=do-the-thing $remote mybranch
> 
> People are happy when they can use one tool instead of two,
> e.g. "Deploy with Git" is a thing[1] and maybe they want to add a
> switch "once pushed skip the tests, I know they are broken".
> 
> [1] https://devcenter.heroku.com/articles/git

Yes, but people are also happy when they can use a flexible and
standardized tool to do a thing. I'd be more frustrated when I found out
that Git's data-pushing protocol has arbitrary limitations (like, say, I
can't push a data item larger than a single 64K pkt-line), which would
easily just work with something like HTTP POSTs.

I'm not saying my point trumps yours; just that there are a lot of
tradeoffs in a design like this.

> > With Gerrit, how do you refer to the commits you just pushed, when
> > making a separate action?  My impression is that Gerrit assigns
> > change-ids centrally when you push to refs/for/master;
> 
> actually you install a commit hook to do it decentralized.

Ah, OK. So you could do roughly the same thing:

  git push ... &&
  id=$(
    git cat-file commit HEAD |
    perl -lne '/^Change-Id: (.*)/ and print $1'
  ) &&
  curl https://$remote/api/do-something?id=$id

(Yes, I know Gerrit doesn't necessarily have an API like that; I was
mostly just wondering how it compared to the GitHub PR example I had
laid out).

> I am used to read high quality emails from you and asked myself what I can
> learn by reading this email. "Trying to figure out your point", nothing more.

Heh. Sorry, they can't all be winners. ;)

> So you suggest that we only allow key=value pairs and not "key" only?
> I am not sure if that makes it easier, though (what part is easier?) compared
> to completely free form.

I had just assumed they would always have a key and a value. If you have
no key, then you are stuck parsing positionally to know what each value
means. If you have no value, you can treat each element like a boolean
true. But it's not that much worse to turn "foo" into "foo=true".

> > but they do complicate the server implementation (how do we
> > communicate them to the hooks? Should there be limits for safety and
> > DoS protection on the server?)
> 
> Oh good point! How do you handle DoS protection for pushing large
> blobs, or weird history (that may blow up the before-behind
> implementation as referenced in the "topological index field for
> commit objects" thread)

At GitHub we use a combination of push-time size heuristics and CPU
run-time limits.

So at push time, we reject objects larger than 100MB (uncompressed) in
index-pack, and we'll refuse to make a single allocation larger than a
gigabyte or so (to prevent index-pack blowing up while trying to figure
out how big the object is). We reject any single pack larger than 2GB.
Rejected objects never even make it to the repo, and we clean up the tmp
packs immediately. So you can't just stream data at us endlessly and
chew up disk or RAM (you can chew up plenty of CPU, though). 

That prevents the most egregious DoS problems, and protects us from crap
like "whoops, this tree entry has a 2GB name, and it wraps an integer
somewhere deep in git that probably should have been a size_t".

But git is complicated enough that you can still get plenty of malicious
crap through. So we aggressively limit completion times for most
operations, and we keep track of the runtime of every single git
operation and correlate it with the repo, user, and source IP, and apply
quotas when there's excessive use. That's not nice for a user who has a
legitimately big request, but it prevents them from impacting other
users (so it's a baseline, and then we try to optimize things that don't
fit into the timeouts, or that cause users to have quotas applied).

That was a bit of a tangent, but back to this topic: I'd probably want
some kind of hard limit on the number of data items and the size of each
(though if they are single pkt-lines, that puts a hard limit already).

> Shouldn't that be server specific as well?

Potentially, yes.

> Mind that I proposed a "receive.advertisePushOptions" config option.
> Maybe that can be fine tuned in a later patch "receive.PushOptionSizeLimit",
> "receive.maxPushOptions" or such?

Yep, I think that is a good approach (presumably with some sane safe-ish
defaults).

> Talking about authentication, any tool that we additionally have "to
> just trigger this one thing remotely" would need to learn about proper
> auth, too. so it will not be just a small helper script, but blows up
> into a large thing. I think we could avoid that duplicated effort,
> too.

I think the right tool there is credential helpers. And I don't think
it's a new problem. You already can log into Gerrit via the web browser,
but you have to have separate auth for running git. Hopefully there are
tools to make that less painful (either pulling the browser cookie, or
getting an opaque application token from the web app to feed into git).

-Peff

      reply	other threads:[~2016-07-01 20:11 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-30  0:59 [RFC PATCHv1 0/4] Push options in C Git Stefan Beller
2016-06-30  0:59 ` [PATCH 1/4] push options: {pre,post}-receive hook learns about push options Stefan Beller
2016-07-01  7:14   ` Jeff King
2016-07-01 17:20     ` Stefan Beller
2016-07-01 17:59       ` Jeff King
2016-07-01 18:03         ` Junio C Hamano
2016-07-01 18:11           ` Jeff King
2016-07-01 19:25             ` Junio C Hamano
2016-07-01 19:31               ` Stefan Beller
2016-07-01 19:39               ` Jeff King
2016-07-01 19:50                 ` Stefan Beller
2016-07-01 18:40         ` Stefan Beller
2016-06-30  0:59 ` [PATCH 2/4] receive-pack: implement advertising and receiving " Stefan Beller
2016-07-01 17:11   ` Junio C Hamano
2016-07-01 17:24     ` Stefan Beller
2016-06-30  0:59 ` [PATCH 3/4] push: accept " Stefan Beller
2016-06-30  0:59 ` [PATCH 4/4] add a test for " Stefan Beller
2016-07-01  7:09 ` [RFC PATCHv1 0/4] Push options in C Git Jeff King
2016-07-01 17:07   ` Stefan Beller
2016-07-01 17:55     ` Jeff King
2016-07-01 18:25       ` Stefan Beller
2016-07-01 20:01         ` Jeff King [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160701200123.GB23141@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=dennis@kaarsemaker.net \
    --cc=dwwang@google.com \
    --cc=git@vger.kernel.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).