git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Jeremie Nikaes <jeremie.nikaes@ensimag.imag.fr>
Cc: git@vger.kernel.org,
	"Arnaud Lacurie" <arnaud.lacurie@ensimag.imag.fr>,
	"Claire Fousse" <claire.fousse@ensimag.imag.fr>,
	"David Amouyal" <david.amouyal@ensimag.imag.fr>,
	"Matthieu Moy" <matthieu.moy@grenoble-inp.fr>,
	"Sylvain Boulmé" <sylvain.boulme@imag.fr>
Subject: Re: [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled
Date: Thu, 9 Jun 2011 18:44:25 -0400	[thread overview]
Message-ID: <20110609224424.GA5364@sigill.intra.peff.net> (raw)
In-Reply-To: <1307625360-10973-1-git-send-email-jeremie.nikaes@ensimag.imag.fr>

On Thu, Jun 09, 2011 at 03:15:59PM +0200, Jeremie Nikaes wrote:

> For now, the whole wiki is cloned, but it will be possible to clone only
> some pages: the clone is based on a list of pages which is now all
> pages.

This is not really true anymore, is it? Later you say:

> Partial cloning is supported using the following syntax :
> "git clone mediawiki::http://wikiurl##A_Page##Another_Page"
> As always, this url is kept in .git/config, helping to always keep
> track of these specific pages

so I think it is not "will be" possible any more.

> +sub get_pages{
> +	my $mediawiki = MediaWiki::API->new;
> +	$mediawiki->{config}->{api_url} = "$url/api.php";
> [...]
> +	} else {
> +		#the list of titles should follow the pattern 'page1|page2|...'
> +		my $titles = "";
> +		foreach my $title (@pages_titles){
> +			$titles.="$title|";
> +		}
> +		#supress the last | that is add in the foreach
> +		chop($titles);

This is usually spelled:

  my $titles = join('|', @pages_titles);

> +		$pages = $mediawiki->api({
> +			action => 'query',
> +			titles => $titles,
> +		});
> +		if (!defined($pages)) {
> +			print STDERR "fatal: None of the pages exist \n";
> +			exit 1;
> +		}

That's not an accurate error message. If the pages don't exist, we will
actually get back a set of pages with negative ids (so you can tell
which ones exist and which ones don't). If $pages is undefined, it's
actually not a valid mediawiki repo.

Also, according to the mediawiki API, we can send only 51 titles at a
time. So we need to break this into pieces.
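
For the batching, something like this (an untested sketch; the 51-title
figure mirrors the limit mentioned below, and the real per-query limit
for your wiki should be checked against the API documentation) might do:

```perl
use strict;
use warnings;

# Split the page titles into batches small enough for one API query.
# BATCH_SIZE is an assumption here; check the wiki's actual per-query
# title limit in the MediaWiki API docs.
use constant BATCH_SIZE => 51;

sub batch_titles {
	my @titles = @_;
	my @batches;
	while (@titles) {
		# Take up to BATCH_SIZE titles and join them as 'a|b|c'.
		my @slice = splice(@titles, 0, BATCH_SIZE);
		push @batches, join('|', @slice);
	}
	return @batches;
}
```

Each batch string can then go into the `titles` parameter of its own
query.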

However, I wonder if this code path is needed at all. We are mapping
titles to page ids, so that we can later ask mediawiki for revisions by
page id. But why not just ask for revisions by title, and skip this
extra round trip to the server?
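
To illustrate, a query built like this (a sketch, untested against a
real wiki; the parameter names are the standard action=query /
prop=revisions ones, and the exact rvprop fields wanted here are my
assumption) would fetch revisions keyed by title directly:

```perl
use strict;
use warnings;

# Build the parameters for fetching revisions directly by title,
# skipping the title -> pageid lookup. Parameter names follow the
# standard MediaWiki query API; the rvprop fields are a guess at what
# the helper actually needs.
sub revisions_by_title_query {
	my ($titles) = @_;  # 'page1|page2|...', already batched
	return {
		action => 'query',
		prop   => 'revisions',
		titles => $titles,
		rvprop => 'ids|timestamp',
	};
}
```

and then `$mediawiki->api(revisions_by_title_query($titles))` as
before, with no separate id-lookup round trip.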

Speaking of round trips, I did have an idea for reducing round-trips in
the "mostly up to date" case. We can ask for the revisions for multiple
titles at once (apparently up to 51, or 501 if you have special bot
privileges), but you will only get the latest revision for each. Which
isn't sufficient for us to do anything except tell whether or not there
are any revisions to fetch.

So without the optimization, with N pages we will make N requests for
new revisions. But with it, we will make N/51 requests for the latest
revisions, and then M (where M <= N) for every page that actually has
new content. In other words, it is a good optimization as long as fewer
than 50/51 of the pages have changed, on average (N/51 + M < N exactly
when M < 50N/51). So it's bad for "clone", but very good for a
subsequent "fetch". The best case is a fetch where nothing has changed,
which needs only 1/51 as many round-trips to determine that this is
the case.
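
As a sanity check on that arithmetic, here is a toy model of the two
strategies (the 51-per-request figure is again taken from the
discussion above, not from the API docs):

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Round trips without batching: one revision request per page.
sub trips_plain { my ($n) = @_; return $n; }

# Round trips with batching: ceil(n/51) "latest revision" checks,
# plus one full revision request per page that actually changed.
sub trips_batched {
	my ($n, $changed) = @_;
	return ceil($n / 51) + $changed;
}
```

For a no-op fetch of 51 pages this gives 1 round trip instead of 51;
for a clone (every page changed) it costs ceil(n/51) extra requests on
top of the n you would make anyway.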

-Peff


Thread overview: 10+ messages
2011-06-09 13:15 [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled Jeremie Nikaes
2011-06-09 13:16 ` [RFC/PATCH 2/2] Git-remote-mediawiki: Add push support Jeremie Nikaes
2011-06-09 17:15   ` Junio C Hamano
2011-06-09 14:03 ` [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled Sverre Rabbelier
2011-06-09 14:30   ` Jérémie NIKAES
2011-06-09 14:32     ` Sverre Rabbelier
2011-06-09 22:44 ` Jeff King [this message]
2011-06-10  0:21 ` Jeff King
2011-06-10  6:31   ` Arnaud Lacurie
2011-06-10  7:22     ` Jeff King
