From: "Simon.Cathebras" <Simon.Cathebras@ensimag.imag.fr>
To: Pavel Volek <Pavel.Volek@ensimag.imag.fr>
Cc: git@vger.kernel.org, Volek Pavel <me@pavelvolek.cz>,
NGUYEN Kim Thuat <Kim-Thuat.Nguyen@ensimag.imag.fr>,
ROUCHER IGLESIAS Javier <roucherj@ensimag.imag.fr>,
Matthieu Moy <Matthieu.Moy@imag.fr>
Subject: Re: [PATCHv1] git-remote-mediawiki: import "File:" attachments
Date: Fri, 08 Jun 2012 18:20:59 +0200
Message-ID: <4FD2266B.3040706@ensimag.imag.fr>
In-Reply-To: <1339165376-20267-1-git-send-email-Pavel.Volek@ensimag.imag.fr>
On 08/06/2012 16:22, Pavel Volek wrote:
> From: Volek Pavel<me@pavelvolek.cz>
>
> The current version of the git-remote-mediawiki supports only import and export
> of the pages, doesn't support import and export of file attachements which are
> also exposed by MediaWiki API. This patch adds the functionality to import the
> last versions of the files and all versions of description pages for these
> files.
>
> Signed-off-by: Pavel Volek<Pavel.Volek@ensimag.imag.fr>
> Signed-off-by: NGUYEN Kim Thuat<Kim-Thuat.Nguyen@ensimag.imag.fr>
> Signed-off-by: ROUCHER IGLESIAS Javier<roucherj@ensimag.imag.fr>
> Signed-off-by: Matthieu Moy<Matthieu.Moy@imag.fr>
> ---
> contrib/mw-to-git/git-remote-mediawiki | 290 +++++++++++++++++++++++++++------
> 1 file changed, 244 insertions(+), 46 deletions(-)
I am wondering why you are showing removals for a v1 patch?
>
> diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
> index c18bfa1..9f21217 100755
> --- a/contrib/mw-to-git/git-remote-mediawiki
> +++ b/contrib/mw-to-git/git-remote-mediawiki
> @@ -212,59 +212,230 @@ sub get_mw_pages {
> my $user_defined;
> if (@tracked_pages) {
> $user_defined = 1;
> - # The user provided a list of pages titles, but we
> - # still need to query the API to get the page IDs.
> -
> - my @some_pages = @tracked_pages;
> - while (@some_pages) {
> - my $last = 50;
> - if ($#some_pages< $last) {
> - $last = $#some_pages;
> - }
> - my @slice = @some_pages[0..$last];
> - get_mw_first_pages(\@slice, \%pages);
> - @some_pages = @some_pages[51..$#some_pages];
> - }
> + get_mw_tracked_pages(\%pages);
> }
> if (@tracked_categories) {
> $user_defined = 1;
> - foreach my $category (@tracked_categories) {
> - if (index($category, ':')< 0) {
> - # Mediawiki requires the Category
> - # prefix, but let's not force the user
> - # to specify it.
> - $category = "Category:" . $category;
> - }
> - my $mw_pages = $mediawiki->list( {
> - action => 'query',
> - list => 'categorymembers',
> - cmtitle => $category,
> - cmlimit => 'max' } )
> - || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details};
> - foreach my $page (@{$mw_pages}) {
> - $pages{$page->{title}} = $page;
> - }
> - }
> + get_mw_tracked_categories(\%pages);
> }
> if (!$user_defined) {
> - # No user-provided list, get the list of pages from
> - # the API.
> - my $mw_pages = $mediawiki->list({
> - action => 'query',
> - list => 'allpages',
> - aplimit => 500,
> - });
> - if (!defined($mw_pages)) {
> - print STDERR "fatal: could not get the list of wiki pages.\n";
> - print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
> - print STDERR "fatal: make sure '$url/api.php' is a valid page.\n";
> - exit 1;
> + get_mw_all_pages(\%pages);
> + }
> + return values(%pages);
> +}
> +
> +sub get_mw_all_pages {
> + my $pages = shift;
> + # No user-provided list, get the list of pages from the API.
> + my $mw_pages = $mediawiki->list({
> + action => 'query',
> + list => 'allpages',
> + aplimit => 500,
> + });
> + if (!defined($mw_pages)) {
> + print STDERR "fatal: could not get the list of wiki pages.\n";
> + print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
> + print STDERR "fatal: make sure '$url/api.php' is a valid page.\n";
> + exit 1;
> + }
> + foreach my $page (@{$mw_pages}) {
> + $pages->{$page->{title}} = $page;
> + }
> +
> + # Attach list of all pages for meadia files from the API,
> + # they are in a different namespace, only one namespace
> + # can be queried at the same moment
> + my $mw_mediapages = $mediawiki->list({
> + action => 'query',
> + list => 'allpages',
> + apnamespace => get_mw_namespace_id("File"),
> + aplimit => 500,
> + });
> + if (!defined($mw_mediapages)) {
> + print STDERR "fatal: could not get the list of media file pages.\n";
> + print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
> + print STDERR "fatal: make sure '$url/api.php' is a valid page.\n";
> + exit 1;
> + }
> + foreach my $page (@{$mw_mediapages}) {
> + $pages->{$page->{title}} = $page;
> + }
> +}
> +
> +sub get_mw_tracked_pages {
> + my $pages = shift;
> + # The user provided a list of pages titles, but we
> + # still need to query the API to get the page IDs.
> + my @some_pages = @tracked_pages;
> + while (@some_pages) {
> + my $last = 50;
> + if ($#some_pages< $last) {
> + $last = $#some_pages;
> + }
> + my @slice = @some_pages[0..$last];
> + get_mw_first_pages(\@slice, \%{$pages});
> + @some_pages = @some_pages[51..$#some_pages];
> + }
> +
> + # Get pages of related media files.
> + get_mw_linked_mediapages(\@tracked_pages, \%{$pages});
> +}
> +
> +sub get_mw_tracked_categories {
> + my $pages = shift;
> + foreach my $category (@tracked_categories) {
> + if (index($category, ':')< 0) {
> + # Mediawiki requires the Category
> + # prefix, but let's not force the user
> + # to specify it.
> + $category = "Category:" . $category;
> }
> + my $mw_pages = $mediawiki->list( {
> + action => 'query',
> + list => 'categorymembers',
> + cmtitle => $category,
> + cmlimit => 'max' } )
> + || die $mediawiki->{error}->{code} . ': '
> + . $mediawiki->{error}->{details};
> foreach my $page (@{$mw_pages}) {
> - $pages{$page->{title}} = $page;
> + $pages->{$page->{title}} = $page;
> + }
> +
> + my @titles = map $_->{title}, @{$mw_pages};
> + # Get pages of related media files.
> + get_mw_linked_mediapages(\@titles, \%{$pages});
> + }
> +}
> +
> +sub get_mw_linked_mediapages {
> + my $titles = shift;
> + my @titles = @{$titles};
> + my $pages = shift;
> +
> + # pattern 'page1|page2|...' required by the API
> + my $mw_titles = join('|', @titles);
> +
> + # Media files could be included or linked from
> + # a page, get all related
> + my $query = {
> + action => 'query',
> + prop => 'links|images',
> + titles => $mw_titles,
> + plnamespace => get_mw_namespace_id("File"),
> + pllimit => 500,
> + };
Why is there a comma after 500?
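For context, here is a minimal sketch of my own (not taken from the patch). As far as I know, Perl accepts a trailing comma after the last element of a list or hash constructor, so this is a style question rather than a bug. The hash below is only a stand-in with the same shape as the query in the patch.

```perl
use strict;
use warnings;

# Stand-in for the query hash in the patch (values are illustrative).
my $query = {
    action  => 'query',
    prop    => 'links|images',
    pllimit => 500,    # trailing comma after the last pair
};

# The trailing comma does not add an extra element.
my $count = scalar(keys %{$query});
print "number of keys: $count\n";
```

Keeping the trailing comma means that adding a parameter later only touches one line of the diff.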
> + my $result = $mediawiki->api($query);
What happens if the titles in the query contain special characters
that MediaWiki does not allow in filenames, such as { or [?
Maybe you should write a test for it and, if it doesn't work, try the
functions called:
mediawiki_clean/smudge_filename
in the file git-remote-mediawiki.
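To make the concern concrete, here is a rough sketch of the kind of check I have in mind. It is only an illustration: split_titles is a hypothetical helper that does not exist in git-remote-mediawiki, and the character set # < > [ ] { } | is my assumption of what MediaWiki rejects in titles. Note that a '|' inside a title would also break the 'page1|page2|...' pattern built with join().

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of git-remote-mediawiki.
# Assumption: MediaWiki rejects # < > [ ] { } | in page titles.
sub split_titles {
    my @titles = @_;
    my (@ok, @rejected);
    foreach my $title (@titles) {
        if ($title =~ /[#<>\[\]{}|]/) {
            push(@rejected, $title);
        } else {
            push(@ok, $title);
        }
    }
    return (\@ok, \@rejected);
}

my ($ok, $rejected) =
    split_titles('Main Page', 'Foo{bar}', 'A|B', 'File:Logo.png');

# Only the accepted titles go into the 'page1|page2|...' pattern.
my $mw_titles = join('|', @{$ok});
print "query titles: $mw_titles\n";
print "skipped: @{$rejected}\n";
```

A test could feed such titles to the import and check that the rejected ones are reported instead of silently corrupting the query.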
> +
> + while (my ($id, $page) = each(%{$result->{query}->{pages}})) {
> + my @titles;
> + if (defined($page->{links})) {
> + my @link_titles = map $_->{title}, @{$page->{links}};
> + push(@titles, @link_titles);
> + }
> + if (defined($page->{images})) {
> + my @image_titles = map $_->{title}, @{$page->{images}};
> + push(@titles, @image_titles);
> + }
> + if (@titles) {
> + get_mw_first_pages(\@titles, \%{$pages});
> }
> }
> - return values(%pages);
> +}
> +
> +sub get_mw_medafile_for_mediapage_revision {
> + # Name of the file on Wiki, with the prefix.
> + my $mw_filename = shift;
> + my $timestamp = shift;
> + my %mediafile;
> +
> + # Search if on MediaWiki exists a media file with given
> + # timestamp and in that case download the file.
> + my $query = {
> + action => 'query',
> + prop => 'imageinfo',
> + titles => $mw_filename,
> + iistart => $timestamp,
> + iiend => $timestamp,
> + iiprop => 'timestamp|archivename',
> + iilimit => 1,
> + };
Why a comma after iilimit ? (end of list of parameter here I think...)
> + my $result = $mediawiki->api($query);
> +
> + my ($fileid, $file) = each ( %{$result->{query}->{pages}} );
> + if (defined($file->{imageinfo})) {
> + my $fileinfo = pop(@{$file->{imageinfo}});
> + if (defined($fileinfo->{archivename})) {
> + return; # now we are not able to download files from archive
> + }
> +
> + my $filename; # real filename without prefix
> + if (index($mw_filename, 'File:') == 0) {
> + $filename = substr $mw_filename, 5;
> + } else {
> + $filename = substr $mw_filename, 6;
> + }
> +
> + $mediafile{title} = $filename;
> + $mediafile{content} = download_mw_mediafile($mw_filename);
> + }
> + return %mediafile;
> +}
> +
> +# Returns MediaWiki id for a canonical namespace name.
> +# Ex.: "File", "Project".
> +# Looks for the namespace id in the local configuration
> +# variables, if it is not found asks MW API.
> +sub get_mw_namespace_id {
> + mw_connect_maybe();
> +
> + my $name = shift;
> +
> + # Look at configuration file, if the record
> + # for that namespace is already stored.
> + my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.". $remotename .".namespaces"));
Is the indentation broken here, or is the line just too long?
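One possible way to wrap it, as a sketch only: build the command string in its own variable, then split the result. To keep the example runnable on its own, the call to run_git() is replaced by a sample string, and $remotename is given a made-up value. The "name=id" entry format is the one the patch writes with "config --add".

```perl
use strict;
use warnings;

# Made-up value; in the script this comes from the remote configuration.
my $remotename = 'origin';

# Build the command separately so that no line gets too long.
my $cmd = "config --get-all remote." . $remotename . ".namespaces";

# Sample output standing in for run_git($cmd), which is not available here.
my $output = "File=6\nProject=4";
my @tracked_namespaces = split(/[ \n]/, $output);

# Turn the "name=id" entries into a lookup table.
my %ns_id;
foreach my $entry (@tracked_namespaces) {
    my ($ns, $id) = split(/=/, $entry, 2);
    $ns_id{$ns} = $id;
}
print "File => $ns_id{File}\n";
```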
> +
> + # NS not found => get namespace id from MW and store it in
> + # configuration file.
> + my $query = {
> + action => 'query',
> + meta => 'siteinfo',
> + siprop => 'namespaces',
> + };
Same remark here about the trailing comma.
> + my $result = $mediawiki->api($query);
> +
> + while (my ($id, $ns) = each(%{$result->{query}->{namespaces}})) {
> + if (defined($ns->{canonical})&& ($ns->{canonical} eq $name)) {
> + run_git("config --add remote.". $remotename .".namespaces ". $name ."=". $ns->{id});
> + return $ns->{id};
> + }
> + }
> + die "Namespace $name was not found on MediaWiki.";
> +}
> +
> +sub download_mw_mediafile {
> + my $filename = shift;
> +
> + $mediawiki->{config}->{files_url} = $url;
> +
> + my $file = $mediawiki->download( { title => $filename } );
Just wondering: what happens if $filename contains characters that are
forbidden in wiki filenames, such as '{' or '|'?
I am worried about it because I ran into similar issues in my own
work on tests for git-remote-mediawiki.
Hope this helps :).
Simon
--
CATHEBRAS Simon
2A-ENSIMAG
Filière Ingéniérie des Systèmes d'Information
Membre Bug-Buster