* Monitoring a repository for changes
@ 2017-06-21 14:27 Tim Hutt
  2017-06-21 15:04 ` Ævar Arnfjörð Bjarmason
  2017-06-21 21:19 ` Jonathan Nieder
  0 siblings, 2 replies; 9+ messages in thread
From: Tim Hutt @ 2017-06-21 14:27 UTC
  To: git

Hi,

Currently if you want to monitor a repository for changes there are
three options:

* Polling - run a script to check for updates every 60 seconds.
* Server side hooks
* Web hooks (on Github, Bitbucket etc.)

Unfortunately for many (most?) cases server-side hooks and web hooks
are not suitable. They require you to both have admin access to the
repo and have a public server available to push updates to. That is a
huge faff when all I want to do is run some local code when a repo is
updated (e.g. play a sound).

Currently people resort to polling
(https://stackoverflow.com/a/5199111/265521) which is just ugly. I
would like to propose that there should be a fourth option that uses a
persistent connection to monitor the repo. It would be used something
like this:

    git watch https://github.com/git/git.git

or

    git watch git@github.com:git/git.git

It would then print simple messages to stdout. The complexity of what
it prints is up for debate - it could be something as simple as
"PUSH\n", or it could include more information, e.g. JSON-encoded
information about the commits. I'd be happy with just "PUSH\n" though.

In terms of implementation, the HTTP transport could use Server-Sent
Events, and the SSH transport can pretty much do whatever so that
should be easy.

Thoughts?

Tim
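For comparison, the polling option can already be made to print the
proposed "PUSH" lines. The following is only a rough sketch, not the
proposed "git watch": the function names refs_digest and
watch_by_polling are made up for illustration, and the 60 second
default comes from the polling bullet above. It relies on
"git ls-remote", which lists the remote refs without fetching objects.

```shell
#!/bin/sh
# refs_digest: read a ref listing ("<sha1>\t<refname>" per line) on stdin
# and print one checksum line that changes whenever any ref changes.
refs_digest() {
	sort | cksum
}

# watch_by_polling URL [INTERVAL]
# Hypothetical stand-in for the proposed "git watch": prints "PUSH" each
# time the remote's refs differ from the previous successful poll.
watch_by_polling() {
	watch_url=$1
	watch_interval=${2:-60}
	watch_last=
	while true
	do
		# git ls-remote lists remote refs without fetching any objects
		if watch_refs=$(git ls-remote "$watch_url" 2>/dev/null)
		then
			watch_cur=$(printf '%s\n' "$watch_refs" | refs_digest)
			# stay silent on the first poll: nothing to compare with yet
			if test -n "$watch_last" && test "$watch_cur" != "$watch_last"
			then
				echo PUSH
			fi
			watch_last=$watch_cur
		fi
		sleep "$watch_interval"
	done
}
```

It would be started with something like
"watch_by_polling https://github.com/git/git.git 60". Every poll still
costs a full connection and ref advertisement, which is the overhead a
persistent connection is meant to avoid.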
* Re: Monitoring a repository for changes
  2017-06-21 14:27 Monitoring a repository for changes Tim Hutt
@ 2017-06-21 15:04 ` Ævar Arnfjörð Bjarmason
  2017-06-21 19:44   ` Jeff King
  2017-06-21 19:52   ` Eric Wong
  2017-06-21 21:19 ` Jonathan Nieder
  1 sibling, 2 replies; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-21 15:04 UTC
  To: Tim Hutt; +Cc: git

On Wed, Jun 21 2017, Tim Hutt jotted:

> Hi,
>
> Currently if you want to monitor a repository for changes there are
> three options:
>
> * Polling - run a script to check for updates every 60 seconds.
> * Server side hooks
> * Web hooks (on Github, Bitbucket etc.)
>
> Unfortunately for many (most?) cases server-side hooks and web hooks
> are not suitable. They require you to both have admin access to the
> repo and have a public server available to push updates to. That is a
> huge faff when all I want to do is run some local code when a repo is
> updated (e.g. play a sound).
>
> Currently people resort to polling
> (https://stackoverflow.com/a/5199111/265521) which is just ugly. I
> would like to propose that there should be a fourth option that uses a
> persistent connection to monitor the repo. It would be used something
> like this:
>
>     git watch https://github.com/git/git.git
>
> or
>
>     git watch git@github.com:git/git.git
>
> It would then print simple messages to stdout. The complexity of what
> it prints is up for debate - it could be something as simple as
> "PUSH\n", or it could include more information, e.g. JSON-encoded
> information about the commits. I'd be happy with just "PUSH\n" though.

Insofar as this could be implemented in some standard way in Git it's
likely to have a large overlap with the "protocol v2" that keeps coming
up here on-list. You might want to search for past threads discussing
that.

> In terms of implementation, the HTTP transport could use Server-Sent
> Events, and the SSH transport can pretty much do whatever so that
> should be easy.

In case you didn't know, any of the non-trivially sized git hosting
providers (e.g. github, gitlab) provide you access over ssh, but you
can't just run any arbitrary command, it's a tiny set of whitelisted
commands. See the "git-shell" manual page (github doesn't use that exact
software, but something similar).

But overall, it would be nice to have some rationale for this approach
other than that you think polling is ugly. There's a lot of advantages
to polling for something you don't need near-instantly, e.g. imagine how
many active connections a site like GitHub would need to handle if
something like this became widely used, that's in a lot of ways harder
to scale and load balance than just having clients that poll something
that's trivially cached as static content.
* Re: Monitoring a repository for changes
  2017-06-21 15:04 ` Ævar Arnfjörð Bjarmason
@ 2017-06-21 19:44   ` Jeff King
  2017-06-21 19:55     ` Stefan Beller
  2017-06-21 19:52   ` Eric Wong
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2017-06-21 19:44 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: Tim Hutt, git

On Wed, Jun 21, 2017 at 05:04:12PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > In terms of implementation, the HTTP transport could use Server-Sent
> > Events, and the SSH transport can pretty much do whatever so that
> > should be easy.
>
> In case you didn't know, any of the non-trivially sized git hosting
> providers (e.g. github, gitlab) provide you access over ssh, but you
> can't just run any arbitrary command, it's a tiny set of whitelisted
> commands. See the "git-shell" manual page (github doesn't use that exact
> software, but something similar).

These days you don't even hit the actual fileservers with ssh at all. We
terminate all of the protocols (http, git://, and ssh) at a proxy layer
that kicks off git commands in the actual repositories using a separate
protocol. The ssh handshakes were a huge performance bottleneck, so by
doing it that way we can scale out the front-end tier independently of
the repository storage (and of course it also provides a convenient
layer for mapping user visible repository names into sharded paths).

Not to take away from your point. Just a little bit of trivia.

> But overall, it would be nice to have some rationale for this approach
> other than that you think polling is ugly. There's a lot of advantages
> to polling for something you don't need near-instantly, e.g. imagine how
> many active connections a site like GitHub would need to handle if
> something like this became widely used, that's in a lot of ways harder
> to scale and load balance than just having clients that poll something
> that's trivially cached as static content.

Yeah. The naive way to implement this would be to have the client
connect and receive the ref advertisement. And then when it's a noop
(nothing to fetch), instead of saying "I want these objects", say
"Please pause until one or more refs change". But I don't think we'd
want to leave actual upload-pack processes sitting paused on the server.
Their memory usage is too high.

For this kind of "long polling" we have a separate front-end tier with a
daemon that keeps the per-client cost very low. We could possibly wedge
that into our proxy layer, but the system would be a lot simpler and
more flexible if this were done separately from the actual git protocol.
E.g., if an HTTP endpoint were defined that paused and returned data
only when a particular repository's refs were updated.

Another option is to keep polling, but just make noop fetches a lot
cheaper. The ref advertisement on some repositories can get into the
megabytes. I'd love to see protocol extensions for:

  1. The client asking only for bits of the ref namespace they care
     about. I have some preliminary patches for this, but I really
     need to polish them.

  2. Something ETag-ish where the client can say "I already saw state X,
     do you have updates?" Even just handling "no, no updates" (like an
     ETag) would be a big benefit. Bonus points if it can say "since
     state X, these are the changes; you are now at state Y".

The sticking point on both is that the client needs to speak before the
ref advertisement begins, which is why we have to deal with the protocol
v2 headache.

-Peff
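To make the "since state X, these are the changes" idea concrete, here
is a client-side sketch only. It is not the protocol extension
described above and saves the server nothing, since the full ref
advertisement is still transferred before anything is compared. The
function name ref_changes and the snapshot files are assumptions made
for illustration. Each snapshot is saved "git ls-remote" output, one
"<sha1> <refname>" pair per line.

```shell
#!/bin/sh
# ref_changes OLD NEW
# Compare two saved ref listings and print one line per difference:
#   created <ref> <new-sha1>
#   updated <ref> <old-sha1> <new-sha1>
#   deleted <ref> <old-sha1>
# Prints nothing when the two states are identical ("no, no updates").
ref_changes() {
	awk '
		# lines from the first file: remember where each ref pointed
		FILENAME == ARGV[1] { old[$2] = $1; next }
		# lines from the second file: report new or moved refs
		{
			seen[$2] = 1
			if (!($2 in old))
				print "created", $2, $1
			else if (old[$2] != $1)
				print "updated", $2, old[$2], $1
		}
		# anything only in the first file has been deleted
		END {
			for (ref in old)
				if (!(ref in seen))
					print "deleted", ref, old[ref]
		}
	' "$1" "$2" | sort
}
```

A poller could save "git ls-remote <url> 'refs/heads/*'" into a file on
each run and call ref_changes on the previous and current files. The
pattern narrows what the client looks at, which imitates item 1, but
asking the server for less or skipping an unchanged advertisement
entirely still needs the protocol changes discussed in this message.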
* Re: Monitoring a repository for changes
  2017-06-21 19:44   ` Jeff King
@ 2017-06-21 19:55     ` Stefan Beller
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Beller @ 2017-06-21 19:55 UTC
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Tim Hutt, git@vger.kernel.org

On Wed, Jun 21, 2017 at 12:44 PM, Jeff King <peff@peff.net> wrote:
>
> Yeah. The naive way to implement this would be to have the client
> connect and receive the ref advertisement. And then when it's a noop
> (nothing to fetch), instead of saying "I want these objects", say
> "Please pause until one or more refs change". But I don't think we'd
> want to leave actual upload-pack processes sitting paused on the server.
> Their memory usage is too high.

https://git.eclipse.org/r/#/c/6587/

JGit has had its experiments with some standing connection and then
having some sort of Pub/Sub system. AFAICT it did not go anywhere
because of the number of connections (even if you optimize for the
serverside, such that each connection is just the cost of a java thread
and a file descriptor).

>
> The sticking point on both is that the client needs to speak before the
> ref advertisement begins, which is why we have to deal with the protocol
> v2 headache.

I would not call it a headache, but a large project that is not to be
tackled by one person alone. ;)
* Re: Monitoring a repository for changes
  2017-06-21 15:04 ` Ævar Arnfjörð Bjarmason
  2017-06-21 19:44   ` Jeff King
@ 2017-06-21 19:52   ` Eric Wong
  2017-06-21 21:56     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Wong @ 2017-06-21 19:52 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: Tim Hutt, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Wed, Jun 21 2017, Tim Hutt jotted:
>
> > Hi,
> >
> > Currently if you want to monitor a repository for changes there are
> > three options:
> >
> > * Polling - run a script to check for updates every 60 seconds.
> > * Server side hooks
> > * Web hooks (on Github, Bitbucket etc.)
> >
> > Unfortunately for many (most?) cases server-side hooks and web hooks
> > are not suitable. They require you to both have admin access to the
> > repo and have a public server available to push updates to. That is a
> > huge faff when all I want to do is run some local code when a repo is
> > updated (e.g. play a sound).

Yeah, it kinda sucks that way.

Currently, for one of my public-inbox mirrors which has ssh
access to the primary server on public-inbox.org, I have:

	#!/bin/sh
	while true
	do
		# GNU tail(1) uses inotify to avoid polling on Linux
		ssh public-inbox.org tail -F /path/to/git-vger.git/info/refs | \
		while read sha1 ref
		do
			for GIT_DIR in git-vger.git
			do
				export GIT_DIR
				git fetch || continue
				git update-server-info
				public-inbox-index # update Xapian index
			done
		done
	done

It's not perfect as it requires multiple processes on the
server, but it's better than polling for my limited use.

> > Currently people resort to polling
> > (https://stackoverflow.com/a/5199111/265521) which is just ugly. I
> > would like to propose that there should be a fourth option that uses a
> > persistent connection to monitor the repo. It would be used something
> > like this:
> >
> >     git watch https://github.com/git/git.git
> >
> > or
> >
> >     git watch git@github.com:git/git.git
> >
> > It would then print simple messages to stdout. The complexity of what
> > it prints is up for debate - it could be something as simple as
> > "PUSH\n", or it could include more information, e.g. JSON-encoded
> > information about the commits. I'd be happy with just "PUSH\n" though.
>
> Insofar as this could be implemented in some standard way in Git it's
> likely to have a large overlap with the "protocol v2" that keeps coming
> up here on-list. You might want to search for past threads discussing
> that.

Yeah, it hasn't been a priority for me, either...

> > In terms of implementation, the HTTP transport could use Server-Sent
> > Events, and the SSH transport can pretty much do whatever so that
> > should be easy.
>
> In case you didn't know, any of the non-trivially sized git hosting
> providers (e.g. github, gitlab) provide you access over ssh, but you
> can't just run any arbitrary command, it's a tiny set of whitelisted
> commands. See the "git-shell" manual page (github doesn't use that exact
> software, but something similar).
>
> But overall, it would be nice to have some rationale for this approach
> other than that you think polling is ugly. There's a lot of advantages
> to polling for something you don't need near-instantly, e.g. imagine how
> many active connections a site like GitHub would need to handle if
> something like this became widely used, that's in a lot of ways harder
> to scale and load balance than just having clients that poll something
> that's trivially cached as static content.

Polling becomes more expensive with TLS and high-latency
connections, and also increases power consumption if done
frequently for redundancy purposes.

I've long wanted to do something better to allow others to keep
public-inbox mirrors up-to-date. Having only 64-128 bytes of
overhead per userspace per-connection should be totally doable
based on my experience working on cmogstored; at which point
port exhaustion will become the limiting factor (or TLS overhead
for HTTPS).

But perhaps a cheaper option might be the traditional email/IRC
notification and having a client-side process watch for that
before fetching.
* Re: Monitoring a repository for changes
  2017-06-21 19:52   ` Eric Wong
@ 2017-06-21 21:56     ` Ævar Arnfjörð Bjarmason
  2017-06-21 22:20       ` Eric Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-21 21:56 UTC
  To: Eric Wong; +Cc: Tim Hutt, git

On Wed, Jun 21 2017, Eric Wong jotted:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> On Wed, Jun 21 2017, Tim Hutt jotted:
>>
>> > Hi,
>> >
>> > Currently if you want to monitor a repository for changes there are
>> > three options:
>> >
>> > * Polling - run a script to check for updates every 60 seconds.
>> > * Server side hooks
>> > * Web hooks (on Github, Bitbucket etc.)
>> >
>> > Unfortunately for many (most?) cases server-side hooks and web hooks
>> > are not suitable. They require you to both have admin access to the
>> > repo and have a public server available to push updates to. That is a
>> > huge faff when all I want to do is run some local code when a repo is
>> > updated (e.g. play a sound).
>
> Yeah, it kinda sucks that way.
>
> Currently, for one of my public-inbox mirrors which has ssh
> access to the primary server on public-inbox.org, I have:
>
> 	#!/bin/sh
> 	while true
> 	do
> 		# GNU tail(1) uses inotify to avoid polling on Linux
> 		ssh public-inbox.org tail -F /path/to/git-vger.git/info/refs | \
> 		while read sha1 ref
> 		do
> 			for GIT_DIR in git-vger.git
> 			do
> 				export GIT_DIR
> 				git fetch || continue
> 				git update-server-info
> 				public-inbox-index # update Xapian index
> 			done
> 		done
> 	done
>
> It's not perfect as it requires multiple processes on the
> server, but it's better than polling for my limited use.
>
>> > Currently people resort to polling
>> > (https://stackoverflow.com/a/5199111/265521) which is just ugly. I
>> > would like to propose that there should be a fourth option that uses a
>> > persistent connection to monitor the repo. It would be used something
>> > like this:
>> >
>> >     git watch https://github.com/git/git.git
>> >
>> > or
>> >
>> >     git watch git@github.com:git/git.git
>> >
>> > It would then print simple messages to stdout. The complexity of what
>> > it prints is up for debate - it could be something as simple as
>> > "PUSH\n", or it could include more information, e.g. JSON-encoded
>> > information about the commits. I'd be happy with just "PUSH\n" though.
>>
>> Insofar as this could be implemented in some standard way in Git it's
>> likely to have a large overlap with the "protocol v2" that keeps coming
>> up here on-list. You might want to search for past threads discussing
>> that.
>
> Yeah, it hasn't been a priority for me, either...
>
>> > In terms of implementation, the HTTP transport could use Server-Sent
>> > Events, and the SSH transport can pretty much do whatever so that
>> > should be easy.
>>
>> In case you didn't know, any of the non-trivially sized git hosting
>> providers (e.g. github, gitlab) provide you access over ssh, but you
>> can't just run any arbitrary command, it's a tiny set of whitelisted
>> commands. See the "git-shell" manual page (github doesn't use that exact
>> software, but something similar).
>>
>> But overall, it would be nice to have some rationale for this approach
>> other than that you think polling is ugly. There's a lot of advantages
>> to polling for something you don't need near-instantly, e.g. imagine how
>> many active connections a site like GitHub would need to handle if
>> something like this became widely used, that's in a lot of ways harder
>> to scale and load balance than just having clients that poll something
>> that's trivially cached as static content.
>
> Polling becomes more expensive with TLS and high-latency
> connections, and also increases power consumption if done
> frequently for redundancy purposes.
>
> I've long wanted to do something better to allow others to keep
> public-inbox mirrors up-to-date. Having only 64-128 bytes of
> overhead per userspace per-connection should be totally doable
> based on my experience working on cmogstored; at which point
> port exhaustion will become the limiting factor (or TLS overhead
> for HTTPS).

Come to think of it I should probably have asked you about this, but I
have a one-liner running that polls every 5 minutes, but will stop if I
haven't changed my git.git in a day:

    while true; do if test $(find ~/g/git -type f -mmin -1440 | wc -l) -gt 0; then git pull; else echo too old; fi ; date ; sleep 300; done

> But perhaps a cheaper option might be the traditional email/IRC
> notification and having a client-side process watch for that
> before fetching.

If there was an IRC channel with this info I could/would use that,
getting it via E-Mail would just get me into the same problem
public-inbox is currently solving for me, i.e. I might as well keep the
git ML up-to-date on that machine if I'm going to otherwise need to
subscribe to a "hey there's a new message on the git ML" list :)
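Unrolled for readability, the one-liner above might look like the
following. This is an illustrative rewrite, not the original author's
script: the function names recently_modified and poll_while_active are
invented here, and unlike the one-liner it runs the pull with
"git -C <dir>" rather than in the current directory. The defaults of
300 seconds and 1440 minutes are the values from the one-liner.

```shell
#!/bin/sh
# recently_modified DIR MINUTES
# Succeeds if any regular file under DIR changed within the last MINUTES.
# Relies on find's -mmin, which GNU and BSD find provide but POSIX does not.
recently_modified() {
	find "$1" -type f -mmin "-$2" -print 2>/dev/null | grep -q .
}

# poll_while_active DIR [INTERVAL] [MAX_AGE_MINUTES]
# Pull every INTERVAL seconds, but only while DIR has seen recent changes.
# Like the one-liner, it keeps looping and reports "too old" otherwise.
poll_while_active() {
	poll_dir=$1
	poll_interval=${2:-300}
	poll_max_age=${3:-1440}
	while true
	do
		if recently_modified "$poll_dir" "$poll_max_age"
		then
			# run the pull inside the repository, whatever the cwd is
			git -C "$poll_dir" pull
		else
			echo "too old"
		fi
		date
		sleep "$poll_interval"
	done
}
```

It would be started with something like
"poll_while_active ~/g/git 300 1440".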
* Re: Monitoring a repository for changes
  2017-06-21 21:56     ` Ævar Arnfjörð Bjarmason
@ 2017-06-21 22:20       ` Eric Wong
  2017-06-21 22:36         ` Eric Wong
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Wong @ 2017-06-21 22:20 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: Tim Hutt, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Wed, Jun 21 2017, Eric Wong jotted:
> > I've long wanted to do something better to allow others to keep
> > public-inbox mirrors up-to-date. Having only 64-128 bytes of
> > overhead per userspace per-connection should be totally doable
> > based on my experience working on cmogstored; at which point
> > port exhaustion will become the limiting factor (or TLS overhead
> > for HTTPS).
>
> Come to think of it I should probably have asked you about this, but I
> have a one-liner running that polls every 5 minutes, but will stop if I
> haven't changed my git.git in a day:
>
>     while true; do if test $(find ~/g/git -type f -mmin -1440 | wc -l) -gt 0; then git pull; else echo too old; fi ; date ; sleep 300; done

Polling https://public-inbox.org/git ? no need to stop it,
every 5 seconds is fine if you're not worried about power
consumption on your end :)

> > But perhaps a cheaper option might be the traditional email/IRC
> > notification and having a client-side process watch for that
> > before fetching.
>
> If there was an IRC channel with this info I could/would use that,
> getting it via E-Mail would just get me into the same problem
> public-inbox is currently solving for me, i.e. I might as well keep the
> git ML up-to-date on that machine if I'm going to otherwise need to
> subscribe to a "hey there's a new message on the git ML" list :)

The IRC server would have the same scalability problems faced by
maintaining persistent connections to git-daemon or HTTP
servers, however. And, yes, email does seem redundant, and
modern header sizes (with DKIM, etc) are gigantic; but
connection lifetime and concurrency is manageable to the server
even if not instantaneous.

I also considered having clients set up a listener of some sort,
(possibly using UDP) but that would have all the problems with
git:// + firewalls.
* Re: Monitoring a repository for changes
  2017-06-21 22:20       ` Eric Wong
@ 2017-06-21 22:36         ` Eric Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2017-06-21 22:36 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: Tim Hutt, git

Eric Wong <e@80x24.org> wrote:
> And, yes, email does seem redundant, and
> modern header sizes (with DKIM, etc) are gigantic; but
> connection lifetime and concurrency is manageable to the server
> even if not instantaneous.

I should add that any email notification message should be
significantly shorter than a normal message going to the list;
possibly just a parsable subject line and empty body.
* Re: Monitoring a repository for changes
  2017-06-21 14:27 Monitoring a repository for changes Tim Hutt
  2017-06-21 15:04 ` Ævar Arnfjörð Bjarmason
@ 2017-06-21 21:19 ` Jonathan Nieder
  1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Nieder @ 2017-06-21 21:19 UTC
  To: Tim Hutt; +Cc: git

Hi,

Tim Hutt wrote:

> Currently if you want to monitor a repository for changes there are
> three options:
>
> * Polling - run a script to check for updates every 60 seconds.
> * Server side hooks
> * Web hooks (on Github, Bitbucket etc.)
>
> Unfortunately for many (most?) cases server-side hooks and web hooks
> are not suitable. They require you to both have admin access to the
> repo and have a public server available to push updates to. That is a
> huge faff when all I want to do is run some local code when a repo is
> updated (e.g. play a sound).

On the polling side, it is possible to improve things a little:

  https://www.kernel.org/mirroring-kernelorg-repositories.html
  https://github.com/mricon/grokmirror

A hanging GET or websocket is more client-friendly but more expensive
server-side. That doesn't rule out making it happen on some servers
if someone does the work. If I understand correctly then this
architecture tends to lead to centralization --- a small number of
services providing notifications pushed from multiple sources, as with
https://developers.google.com/web/fundamentals/engage-and-retain/push-notifications/how-push-works

If someone wants to try adding something like grokmirror (which
describes the state of multiple repositories, amortizing the
per-request costs) to git, especially if it supports something
etag-like as Jeff King suggested, then I would be interested.

Thanks and hope that helps,
Jonathan