* [PATCHv1] git-remote-mediawiki: import "File:" attachments
@ 2012-06-08 14:22 Pavel Volek
2012-06-08 14:42 ` Matthieu Moy
2012-06-08 16:20 ` Simon.Cathebras
0 siblings, 2 replies; 5+ messages in thread
From: Pavel Volek @ 2012-06-08 14:22 UTC (permalink / raw)
To: git
Cc: Volek Pavel, Pavel Volek, NGUYEN Kim Thuat,
ROUCHER IGLESIAS Javier, Matthieu Moy
From: Volek Pavel <me@pavelvolek.cz>
The current version of git-remote-mediawiki supports only the import and
export of pages; it does not support the import and export of file
attachments, which are also exposed by the MediaWiki API. This patch adds
the functionality to import the latest versions of the files and all
versions of the description pages for these files.
Signed-off-by: Pavel Volek <Pavel.Volek@ensimag.imag.fr>
Signed-off-by: NGUYEN Kim Thuat <Kim-Thuat.Nguyen@ensimag.imag.fr>
Signed-off-by: ROUCHER IGLESIAS Javier <roucherj@ensimag.imag.fr>
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
contrib/mw-to-git/git-remote-mediawiki | 290 +++++++++++++++++++++++++++------
1 file changed, 244 insertions(+), 46 deletions(-)
diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
index c18bfa1..9f21217 100755
--- a/contrib/mw-to-git/git-remote-mediawiki
+++ b/contrib/mw-to-git/git-remote-mediawiki
@@ -212,59 +212,230 @@ sub get_mw_pages {
my $user_defined;
if (@tracked_pages) {
$user_defined = 1;
- # The user provided a list of pages titles, but we
- # still need to query the API to get the page IDs.
-
- my @some_pages = @tracked_pages;
- while (@some_pages) {
- my $last = 50;
- if ($#some_pages < $last) {
- $last = $#some_pages;
- }
- my @slice = @some_pages[0..$last];
- get_mw_first_pages(\@slice, \%pages);
- @some_pages = @some_pages[51..$#some_pages];
- }
+ get_mw_tracked_pages(\%pages);
}
if (@tracked_categories) {
$user_defined = 1;
- foreach my $category (@tracked_categories) {
- if (index($category, ':') < 0) {
- # Mediawiki requires the Category
- # prefix, but let's not force the user
- # to specify it.
- $category = "Category:" . $category;
- }
- my $mw_pages = $mediawiki->list( {
- action => 'query',
- list => 'categorymembers',
- cmtitle => $category,
- cmlimit => 'max' } )
- || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details};
- foreach my $page (@{$mw_pages}) {
- $pages{$page->{title}} = $page;
- }
- }
+ get_mw_tracked_categories(\%pages);
}
if (!$user_defined) {
- # No user-provided list, get the list of pages from
- # the API.
- my $mw_pages = $mediawiki->list({
- action => 'query',
- list => 'allpages',
- aplimit => 500,
- });
- if (!defined($mw_pages)) {
- print STDERR "fatal: could not get the list of wiki pages.\n";
- print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
- print STDERR "fatal: make sure '$url/api.php' is a valid page.\n";
- exit 1;
+ get_mw_all_pages(\%pages);
+ }
+ return values(%pages);
+}
+
+sub get_mw_all_pages {
+ my $pages = shift;
+ # No user-provided list, get the list of pages from the API.
+ my $mw_pages = $mediawiki->list({
+ action => 'query',
+ list => 'allpages',
+ aplimit => 500,
+ });
+ if (!defined($mw_pages)) {
+ print STDERR "fatal: could not get the list of wiki pages.\n";
+ print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
+ print STDERR "fatal: make sure '$url/api.php' is a valid page.\n";
+ exit 1;
+ }
+ foreach my $page (@{$mw_pages}) {
+ $pages->{$page->{title}} = $page;
+ }
+
+	# Also attach the list of pages for media files from the API;
+	# they are in a different namespace, and only one namespace
+	# can be queried at a time.
+ my $mw_mediapages = $mediawiki->list({
+ action => 'query',
+ list => 'allpages',
+ apnamespace => get_mw_namespace_id("File"),
+ aplimit => 500,
+ });
+ if (!defined($mw_mediapages)) {
+ print STDERR "fatal: could not get the list of media file pages.\n";
+ print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
+ print STDERR "fatal: make sure '$url/api.php' is a valid page.\n";
+ exit 1;
+ }
+ foreach my $page (@{$mw_mediapages}) {
+ $pages->{$page->{title}} = $page;
+ }
+}
+
+sub get_mw_tracked_pages {
+ my $pages = shift;
+ # The user provided a list of pages titles, but we
+ # still need to query the API to get the page IDs.
+ my @some_pages = @tracked_pages;
+ while (@some_pages) {
+ my $last = 50;
+ if ($#some_pages < $last) {
+ $last = $#some_pages;
+ }
+ my @slice = @some_pages[0..$last];
+ get_mw_first_pages(\@slice, \%{$pages});
+ @some_pages = @some_pages[51..$#some_pages];
+ }
+
+ # Get pages of related media files.
+ get_mw_linked_mediapages(\@tracked_pages, \%{$pages});
+}
+
+sub get_mw_tracked_categories {
+ my $pages = shift;
+ foreach my $category (@tracked_categories) {
+ if (index($category, ':') < 0) {
+ # Mediawiki requires the Category
+ # prefix, but let's not force the user
+ # to specify it.
+ $category = "Category:" . $category;
}
+ my $mw_pages = $mediawiki->list( {
+ action => 'query',
+ list => 'categorymembers',
+ cmtitle => $category,
+ cmlimit => 'max' } )
+ || die $mediawiki->{error}->{code} . ': '
+ . $mediawiki->{error}->{details};
foreach my $page (@{$mw_pages}) {
- $pages{$page->{title}} = $page;
+ $pages->{$page->{title}} = $page;
+ }
+
+ my @titles = map $_->{title}, @{$mw_pages};
+ # Get pages of related media files.
+ get_mw_linked_mediapages(\@titles, \%{$pages});
+ }
+}
+
+sub get_mw_linked_mediapages {
+ my $titles = shift;
+ my @titles = @{$titles};
+ my $pages = shift;
+
+ # pattern 'page1|page2|...' required by the API
+ my $mw_titles = join('|', @titles);
+
+	# Media files can be included in or linked from
+	# a page; get all related ones.
+ my $query = {
+ action => 'query',
+ prop => 'links|images',
+ titles => $mw_titles,
+ plnamespace => get_mw_namespace_id("File"),
+ pllimit => 500,
+ };
+ my $result = $mediawiki->api($query);
+
+ while (my ($id, $page) = each(%{$result->{query}->{pages}})) {
+ my @titles;
+ if (defined($page->{links})) {
+ my @link_titles = map $_->{title}, @{$page->{links}};
+ push(@titles, @link_titles);
+ }
+ if (defined($page->{images})) {
+ my @image_titles = map $_->{title}, @{$page->{images}};
+ push(@titles, @image_titles);
+ }
+ if (@titles) {
+ get_mw_first_pages(\@titles, \%{$pages});
}
}
- return values(%pages);
+}
+
+sub get_mw_medafile_for_mediapage_revision {
+ # Name of the file on Wiki, with the prefix.
+ my $mw_filename = shift;
+ my $timestamp = shift;
+ my %mediafile;
+
+	# Check whether a media file with the given timestamp
+	# exists on MediaWiki and, if so, download it.
+ my $query = {
+ action => 'query',
+ prop => 'imageinfo',
+ titles => $mw_filename,
+ iistart => $timestamp,
+ iiend => $timestamp,
+ iiprop => 'timestamp|archivename',
+ iilimit => 1,
+ };
+ my $result = $mediawiki->api($query);
+
+ my ($fileid, $file) = each ( %{$result->{query}->{pages}} );
+ if (defined($file->{imageinfo})) {
+ my $fileinfo = pop(@{$file->{imageinfo}});
+ if (defined($fileinfo->{archivename})) {
+	return; # we are not yet able to download archived versions of files
+ }
+
+ my $filename; # real filename without prefix
+ if (index($mw_filename, 'File:') == 0) {
+ $filename = substr $mw_filename, 5;
+ } else {
+ $filename = substr $mw_filename, 6;
+ }
+
+ $mediafile{title} = $filename;
+ $mediafile{content} = download_mw_mediafile($mw_filename);
+ }
+ return %mediafile;
+}
+
+# Returns the MediaWiki id for a canonical namespace name
+# (e.g. "File", "Project").
+# Looks for the namespace id in the local configuration
+# variables; if it is not found there, asks the MW API.
+sub get_mw_namespace_id {
+ mw_connect_maybe();
+
+ my $name = shift;
+
+	# Check the configuration file to see whether the record
+	# for that namespace is already stored.
+ my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.". $remotename .".namespaces"));
+ chomp(@tracked_namespaces);
+ if (@tracked_namespaces) {
+ foreach my $ns (@tracked_namespaces) {
+ my @ns_split = split(/=/, $ns);
+ if ($ns_split[0] eq $name) {
+ return $ns_split[1];
+ }
+ }
+ }
+
+ # NS not found => get namespace id from MW and store it in
+ # configuration file.
+ my $query = {
+ action => 'query',
+ meta => 'siteinfo',
+ siprop => 'namespaces',
+ };
+ my $result = $mediawiki->api($query);
+
+ while (my ($id, $ns) = each(%{$result->{query}->{namespaces}})) {
+ if (defined($ns->{canonical}) && ($ns->{canonical} eq $name)) {
+ run_git("config --add remote.". $remotename .".namespaces ". $name ."=". $ns->{id});
+ return $ns->{id};
+ }
+ }
+ die "Namespace $name was not found on MediaWiki.";
+}
+
+sub download_mw_mediafile {
+ my $filename = shift;
+
+ $mediawiki->{config}->{files_url} = $url;
+
+ my $file = $mediawiki->download( { title => $filename } );
+ if (!defined($file)) {
+ print STDERR "\tFile \'$filename\' could not be downloaded.\n";
+ exit 1;
+ } elsif ($file eq "") {
+ print STDERR "\tFile \'$filename\' does not exist on the wiki.\n";
+ exit 1;
+ } else {
+ return $file;
+ }
}
sub run_git {
@@ -466,6 +637,13 @@ sub import_file_revision {
my %commit = %{$commit};
my $full_import = shift;
my $n = shift;
+ my $mediafile_import = shift;
+ my $mediafile;
+ my %mediafile;
+ if ($mediafile_import) {
+ $mediafile = shift;
+ %mediafile = %{$mediafile};
+ }
my $title = $commit{title};
my $comment = $commit{comment};
@@ -485,6 +663,10 @@ sub import_file_revision {
if ($content ne DELETED_CONTENT) {
print STDOUT "M 644 inline $title.mw\n";
literal_data($content);
+ if ($mediafile_import) {
+ print STDOUT "M 644 inline $mediafile{title}\n";
+ literal_data($mediafile{content});
+ }
print STDOUT "\n\n";
} else {
print STDOUT "D $title.mw\n";
@@ -580,6 +762,7 @@ sub mw_import_ref {
$n++;
+ my $page_title = $result->{query}->{pages}->{$pagerevid->{pageid}}->{title};
my %commit;
$commit{author} = $rev->{user} || 'Anonymous';
$commit{comment} = $rev->{comment} || '*Empty MediaWiki Message*';
@@ -596,9 +779,24 @@ sub mw_import_ref {
}
$commit{date} = DateTime::Format::ISO8601->parse_datetime($last_timestamp);
- print STDERR "$n/", scalar(@revisions), ": Revision #$pagerevid->{revid} of $commit{title}\n";
+	# Differentiate classic pages from media pages.
+ my @prefix = split (":", $page_title);
- import_file_revision(\%commit, ($fetch_from == 1), $n);
+ my %mediafile;
+ if ($prefix[0] eq "File" || $prefix[0] eq "Image") {
+ # The name of the file is the same as the media page.
+ my $filename = $page_title;
+ %mediafile = get_mw_medafile_for_mediapage_revision($filename, $rev->{timestamp});
+ }
+	# If this is a revision of the media page for a new version
+	# of a file, do one common commit for both the file and the
+	# media page. Otherwise, commit only that page.
+ print STDERR "$n/", scalar(@revisions), ": Revision #$pagerevid->{revid} of $commit{title}\n";
+ if (%mediafile) {
+ import_file_revision(\%commit, ($fetch_from == 1), $n, 1, \%mediafile);
+ } else {
+ import_file_revision(\%commit, ($fetch_from == 1), $n, 0);
+ }
}
if ($fetch_from == 1 && $n == 0) {
--
1.7.10.2.552.gaa3bb87
* Re: [PATCHv1] git-remote-mediawiki: import "File:" attachments
2012-06-08 14:22 [PATCHv1] git-remote-mediawiki: import "File:" attachments Pavel Volek
@ 2012-06-08 14:42 ` Matthieu Moy
2012-06-08 16:20 ` Simon.Cathebras
1 sibling, 0 replies; 5+ messages in thread
From: Matthieu Moy @ 2012-06-08 14:42 UTC (permalink / raw)
To: Pavel Volek; +Cc: git, Volek Pavel, NGUYEN Kim Thuat, ROUCHER IGLESIAS Javier
Pavel Volek <Pavel.Volek@ensimag.imag.fr> writes:
> --- a/contrib/mw-to-git/git-remote-mediawiki
> +++ b/contrib/mw-to-git/git-remote-mediawiki
If the patch adds support for [[File:...]], then it should remove/adapt
the comment at the top of the file :
# Known limitations:
#
# - Only wiki pages are managed, no support for [[File:...]]
# attachments.
> @@ -212,59 +212,230 @@ sub get_mw_pages {
> my $user_defined;
> if (@tracked_pages) {
> $user_defined = 1;
> - # The user provided a list of pages titles, but we
> - # still need to query the API to get the page IDs.
> -
> - my @some_pages = @tracked_pages;
> - while (@some_pages) {
> - my $last = 50;
> - if ($#some_pages < $last) {
> - $last = $#some_pages;
> - }
> - my @slice = @some_pages[0..$last];
> - get_mw_first_pages(\@slice, \%pages);
> - @some_pages = @some_pages[51..$#some_pages];
> - }
> + get_mw_tracked_pages(\%pages);
> }
> if (@tracked_categories) {
> $user_defined = 1;
> - foreach my $category (@tracked_categories) {
> - if (index($category, ':') < 0) {
> - # Mediawiki requires the Category
> - # prefix, but let's not force the user
> - # to specify it.
> - $category = "Category:" . $category;
> - }
> - my $mw_pages = $mediawiki->list( {
> - action => 'query',
> - list => 'categorymembers',
> - cmtitle => $category,
> - cmlimit => 'max' } )
> - || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details};
> - foreach my $page (@{$mw_pages}) {
> - $pages{$page->{title}} = $page;
> - }
> - }
> + get_mw_tracked_categories(\%pages);
> }
> if (!$user_defined) {
> - # No user-provided list, get the list of pages from
> - # the API.
> - my $mw_pages = $mediawiki->list({
> - action => 'query',
> - list => 'allpages',
> - aplimit => 500,
> - });
> - if (!defined($mw_pages)) {
> - print STDERR "fatal: could not get the list of wiki pages.\n";
> - print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
> - print STDERR "fatal: make sure '$url/api.php' is a valid page.\n";
> - exit 1;
> + get_mw_all_pages(\%pages);
> + }
> + return values(%pages);
> +}
The refactoring is welcome, but it would have been better to make it
a separate patch. The patch as you made it is long and hard to review,
because it combines several new features with refactoring.
> +sub get_mw_tracked_pages {
> + my $pages = shift;
> + # The user provided a list of pages titles, but we
> + # still need to query the API to get the page IDs.
> + my @some_pages = @tracked_pages;
> + while (@some_pages) {
> + my $last = 50;
> + if ($#some_pages < $last) {
> + $last = $#some_pages;
> + }
> + my @slice = @some_pages[0..$last];
> + get_mw_first_pages(\@slice, \%{$pages});
> + @some_pages = @some_pages[51..$#some_pages];
> + }
> +
> + # Get pages of related media files.
> + get_mw_linked_mediapages(\@tracked_pages, \%{$pages});
[...]
> +sub get_mw_linked_mediapages {
This is a nice feature, but I think it deserves to be configurable (if
the user explicitly specified one page, it actually seems strange to
import all the files it links to by default). Also, it should be
mentioned in the commit message.
Shouldn't the function be named get_mw_linked_mediafiles instead? In
general, the wording "media page" is used in many places in the code;
I prefer "media file", which is unambiguous.
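Something like the following could gate it (an illustrative Python
sketch rather than the script's Perl; the remote.<name>.mediaimport key
is a hypothetical name, not something the patch or git-remote-mediawiki
defines):

```python
from typing import Optional

# Hypothetical decision helper: import media files linked from tracked
# pages only when allowed. The config key name is illustrative only.
def should_import_linked_media(user_listed_pages: bool,
                               mediaimport_flag: Optional[str]) -> bool:
    # An explicit remote.<name>.mediaimport setting always wins.
    if mediaimport_flag is not None:
        return mediaimport_flag == "true"
    # Default: fetch linked media only when the page list came from the
    # wiki itself, not from an explicit user-provided tracked list.
    return not user_listed_pages

print(should_import_linked_media(True, None))    # explicit list, no flag -> False
print(should_import_linked_media(True, "true"))  # user opted in -> True
```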
> +sub get_mw_medafile_for_mediapage_revision {
medafile -> mediafile ?
--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
* Re: [PATCHv1] git-remote-mediawiki: import "File:" attachments
2012-06-08 14:22 [PATCHv1] git-remote-mediawiki: import "File:" attachments Pavel Volek
2012-06-08 14:42 ` Matthieu Moy
@ 2012-06-08 16:20 ` Simon.Cathebras
2012-06-08 17:03 ` konglu
1 sibling, 1 reply; 5+ messages in thread
From: Simon.Cathebras @ 2012-06-08 16:20 UTC (permalink / raw)
To: Pavel Volek
Cc: git, Volek Pavel, NGUYEN Kim Thuat, ROUCHER IGLESIAS Javier,
Matthieu Moy
On 08/06/2012 16:22, Pavel Volek wrote:
> From: Volek Pavel <me@pavelvolek.cz>
>
> The current version of the git-remote-mediawiki supports only import and export
> of the pages, doesn't support import and export of file attachements which are
> also exposed by MediaWiki API. This patch adds the functionality to import the
> last versions of the files and all versions of description pages for these
> files.
>
> Signed-off-by: Pavel Volek <Pavel.Volek@ensimag.imag.fr>
> Signed-off-by: NGUYEN Kim Thuat <Kim-Thuat.Nguyen@ensimag.imag.fr>
> Signed-off-by: ROUCHER IGLESIAS Javier <roucherj@ensimag.imag.fr>
> Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
> ---
> contrib/mw-to-git/git-remote-mediawiki | 290 +++++++++++++++++++++++++++------
> 1 file changed, 244 insertions(+), 46 deletions(-)
I am wondering why you are showing the removals for a v1 patch?
>
> diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
> index c18bfa1..9f21217 100755
> --- a/contrib/mw-to-git/git-remote-mediawiki
> +++ b/contrib/mw-to-git/git-remote-mediawiki
> @@ -212,59 +212,230 @@ sub get_mw_pages {
> my $user_defined;
> if (@tracked_pages) {
> $user_defined = 1;
> - # The user provided a list of pages titles, but we
> - # still need to query the API to get the page IDs.
> -
> - my @some_pages = @tracked_pages;
> - while (@some_pages) {
> - my $last = 50;
> -		if ($#some_pages < $last) {
> - $last = $#some_pages;
> - }
> - my @slice = @some_pages[0..$last];
> - get_mw_first_pages(\@slice, \%pages);
> - @some_pages = @some_pages[51..$#some_pages];
> - }
> + get_mw_tracked_pages(\%pages);
> }
> if (@tracked_categories) {
> $user_defined = 1;
> - foreach my $category (@tracked_categories) {
> -		if (index($category, ':') < 0) {
> - # Mediawiki requires the Category
> - # prefix, but let's not force the user
> - # to specify it.
> - $category = "Category:" . $category;
> - }
> - my $mw_pages = $mediawiki->list( {
> - action => 'query',
> - list => 'categorymembers',
> - cmtitle => $category,
> - cmlimit => 'max' } )
> - || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details};
> - foreach my $page (@{$mw_pages}) {
> - $pages{$page->{title}} = $page;
> - }
> - }
> + get_mw_tracked_categories(\%pages);
> }
> if (!$user_defined) {
> - # No user-provided list, get the list of pages from
> - # the API.
> - my $mw_pages = $mediawiki->list({
> - action => 'query',
> - list => 'allpages',
> - aplimit => 500,
> - });
> - if (!defined($mw_pages)) {
> - print STDERR "fatal: could not get the list of wiki pages.\n";
> - print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
> - print STDERR "fatal: make sure '$url/api.php' is a valid page.\n";
> - exit 1;
> + get_mw_all_pages(\%pages);
> + }
> + return values(%pages);
> +}
> +
> +sub get_mw_all_pages {
> + my $pages = shift;
> + # No user-provided list, get the list of pages from the API.
> + my $mw_pages = $mediawiki->list({
> + action => 'query',
> + list => 'allpages',
> + aplimit => 500,
> + });
> + if (!defined($mw_pages)) {
> + print STDERR "fatal: could not get the list of wiki pages.\n";
> + print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
> + print STDERR "fatal: make sure '$url/api.php' is a valid page.\n";
> + exit 1;
> + }
> + foreach my $page (@{$mw_pages}) {
> + $pages->{$page->{title}} = $page;
> + }
> +
> + # Attach list of all pages for meadia files from the API,
> + # they are in a different namespace, only one namespace
> + # can be queried at the same moment
> + my $mw_mediapages = $mediawiki->list({
> + action => 'query',
> + list => 'allpages',
> + apnamespace => get_mw_namespace_id("File"),
> + aplimit => 500,
> + });
> + if (!defined($mw_mediapages)) {
> + print STDERR "fatal: could not get the list of media file pages.\n";
> + print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
> + print STDERR "fatal: make sure '$url/api.php' is a valid page.\n";
> + exit 1;
> + }
> + foreach my $page (@{$mw_mediapages}) {
> + $pages->{$page->{title}} = $page;
> + }
> +}
> +
> +sub get_mw_tracked_pages {
> + my $pages = shift;
> + # The user provided a list of pages titles, but we
> + # still need to query the API to get the page IDs.
> + my @some_pages = @tracked_pages;
> + while (@some_pages) {
> + my $last = 50;
> +		if ($#some_pages < $last) {
> + $last = $#some_pages;
> + }
> + my @slice = @some_pages[0..$last];
> + get_mw_first_pages(\@slice, \%{$pages});
> + @some_pages = @some_pages[51..$#some_pages];
> + }
> +
> + # Get pages of related media files.
> + get_mw_linked_mediapages(\@tracked_pages, \%{$pages});
> +}
> +
> +sub get_mw_tracked_categories {
> + my $pages = shift;
> + foreach my $category (@tracked_categories) {
> +		if (index($category, ':') < 0) {
> + # Mediawiki requires the Category
> + # prefix, but let's not force the user
> + # to specify it.
> + $category = "Category:" . $category;
> }
> + my $mw_pages = $mediawiki->list( {
> + action => 'query',
> + list => 'categorymembers',
> + cmtitle => $category,
> + cmlimit => 'max' } )
> + || die $mediawiki->{error}->{code} . ': '
> + . $mediawiki->{error}->{details};
> foreach my $page (@{$mw_pages}) {
> - $pages{$page->{title}} = $page;
> + $pages->{$page->{title}} = $page;
> + }
> +
> + my @titles = map $_->{title}, @{$mw_pages};
> + # Get pages of related media files.
> + get_mw_linked_mediapages(\@titles, \%{$pages});
> + }
> +}
> +
> +sub get_mw_linked_mediapages {
> + my $titles = shift;
> + my @titles = @{$titles};
> + my $pages = shift;
> +
> + # pattern 'page1|page2|...' required by the API
> + my $mw_titles = join('|', @titles);
> +
> + # Media files could be included or linked from
> + # a page, get all related
> + my $query = {
> + action => 'query',
> + prop => 'links|images',
> + titles => $mw_titles,
> + plnamespace => get_mw_namespace_id("File"),
> + pllimit => 500,
> + };
Why a comma after 500?
> + my $result = $mediawiki->api($query);
What happens if a title in the query contains special characters
which are not allowed by MediaWiki in filenames, like { or [?
Maybe you should build a test for it, and if it doesn't work, try out
the mediawiki_clean_filename / mediawiki_smudge_filename functions
in git-remote-mediawiki.
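For instance, something along these lines (an illustrative Python
sketch; the FORBIDDEN set is an assumption to check against the wiki's
$wgLegalTitleChars, and it only loosely mirrors what
mediawiki_clean_filename does):

```python
# Illustrative sketch: escape characters MediaWiki rejects in titles
# before building the 'page1|page2|...' string for the API.
# FORBIDDEN is an assumed set; verify it against $wgLegalTitleChars.
FORBIDDEN = '{}[]|#<>'

def clean_title(title):
    # Percent-encode each forbidden character, loosely mirroring the
    # mediawiki_clean_filename helper in git-remote-mediawiki.
    return ''.join('%%%02x' % ord(c) if c in FORBIDDEN else c
                   for c in title)

def join_titles(titles):
    # '|' separates titles in the query, so it must be escaped inside
    # individual titles first.
    return '|'.join(clean_title(t) for t in titles)

print(join_titles(['File:a{b.png', 'File:plain.png']))
# prints File:a%7bb.png|File:plain.png
```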
> +
> + while (my ($id, $page) = each(%{$result->{query}->{pages}})) {
> + my @titles;
> + if (defined($page->{links})) {
> + my @link_titles = map $_->{title}, @{$page->{links}};
> + push(@titles, @link_titles);
> + }
> + if (defined($page->{images})) {
> + my @image_titles = map $_->{title}, @{$page->{images}};
> + push(@titles, @image_titles);
> + }
> + if (@titles) {
> + get_mw_first_pages(\@titles, \%{$pages});
> }
> }
> - return values(%pages);
> +}
> +
> +sub get_mw_medafile_for_mediapage_revision {
> + # Name of the file on Wiki, with the prefix.
> + my $mw_filename = shift;
> + my $timestamp = shift;
> + my %mediafile;
> +
> + # Search if on MediaWiki exists a media file with given
> + # timestamp and in that case download the file.
> + my $query = {
> + action => 'query',
> + prop => 'imageinfo',
> + titles => $mw_filename,
> + iistart => $timestamp,
> + iiend => $timestamp,
> + iiprop => 'timestamp|archivename',
> + iilimit => 1,
> + };
Why a comma after iilimit? (end of the parameter list here, I think...)
> + my $result = $mediawiki->api($query);
> +
> + my ($fileid, $file) = each ( %{$result->{query}->{pages}} );
> + if (defined($file->{imageinfo})) {
> + my $fileinfo = pop(@{$file->{imageinfo}});
> + if (defined($fileinfo->{archivename})) {
> + return; # now we are not able to download files from archive
> + }
> +
> + my $filename; # real filename without prefix
> + if (index($mw_filename, 'File:') == 0) {
> + $filename = substr $mw_filename, 5;
> + } else {
> + $filename = substr $mw_filename, 6;
> + }
> +
> + $mediafile{title} = $filename;
> + $mediafile{content} = download_mw_mediafile($mw_filename);
> + }
> + return %mediafile;
> +}
> +
> +# Returns MediaWiki id for a canonical namespace name.
> +# Ex.: "File", "Project".
> +# Looks for the namespace id in the local configuration
> +# variables, if it is not found asks MW API.
> +sub get_mw_namespace_id {
> + mw_connect_maybe();
> +
> + my $name = shift;
> +
> + # Look at configuration file, if the record
> + # for that namespace is already stored.
> + my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.". $remotename .".namespaces"));
Broken indentation / line too long?
> +
> + # NS not found => get namespace id from MW and store it in
> + # configuration file.
> + my $query = {
> + action => 'query',
> + meta => 'siteinfo',
> + siprop => 'namespaces',
> + };
Same here concerning the comma.
> + my $result = $mediawiki->api($query);
> +
> + while (my ($id, $ns) = each(%{$result->{query}->{namespaces}})) {
> +		if (defined($ns->{canonical}) && ($ns->{canonical} eq $name)) {
> + run_git("config --add remote.". $remotename .".namespaces ". $name ."=". $ns->{id});
> + return $ns->{id};
> + }
> + }
> + die "Namespace $name was not found on MediaWiki.";
> +}
> +
> +sub download_mw_mediafile {
> + my $filename = shift;
> +
> + $mediawiki->{config}->{files_url} = $url;
> +
> + my $file = $mediawiki->download( { title => $filename } );
Just wondering: what happens if $filename contains characters that
are forbidden in the wiki's filenames, such as '{' or '|'?
I am worried about it because I've run into similar issues in my own
work on tests for git-remote-mediawiki.
Hope I helped :).
Simon
--
CATHEBRAS Simon
2A-ENSIMAG
Filière Ingéniérie des Systèmes d'Information
Membre Bug-Buster
* Re: [PATCHv1] git-remote-mediawiki: import "File:" attachments
2012-06-08 16:20 ` Simon.Cathebras
@ 2012-06-08 17:03 ` konglu
2012-06-08 23:24 ` Simon.Cathebras
0 siblings, 1 reply; 5+ messages in thread
From: konglu @ 2012-06-08 17:03 UTC (permalink / raw)
To: Simon.Cathebras
Cc: git, Volek Pavel, NGUYEN Kim Thuat, ROUCHER IGLESIAS Javier,
Matthieu Moy
"Simon.Cathebras" <Simon.Cathebras@ensimag.imag.fr> wrote:
> On 08/06/2012 16:22, Pavel Volek wrote:
>> From: Volek Pavel <me@pavelvolek.cz>
>>
>> The current version of the git-remote-mediawiki supports only
>> import and export
>> of the pages, doesn't support import and export of file
>> attachements which are
>> also exposed by MediaWiki API. This patch adds the functionality to
>> import the
>> last versions of the files and all versions of description pages for these
>> files.
>>
>> Signed-off-by: Pavel Volek <Pavel.Volek@ensimag.imag.fr>
>> Signed-off-by: NGUYEN Kim Thuat <Kim-Thuat.Nguyen@ensimag.imag.fr>
>> Signed-off-by: ROUCHER IGLESIAS Javier <roucherj@ensimag.imag.fr>
>> Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
>> ---
>
>> contrib/mw-to-git/git-remote-mediawiki | 290
>> +++++++++++++++++++++++++++------
>> 1 file changed, 244 insertions(+), 46 deletions(-)
>
> I am wondering why you are showing the removals for a v1 patch?
Why not? The file already exists on branch master and they are
working on it. Anyway, the patch applies correctly on master.
BTW, are you implying that only v2+ patches can have deletions?
(A patch is not meant to be applied on the previous version.)
Lucien Kong
* Re: [PATCHv1] git-remote-mediawiki: import "File:" attachments
2012-06-08 17:03 ` konglu
@ 2012-06-08 23:24 ` Simon.Cathebras
0 siblings, 0 replies; 5+ messages in thread
From: Simon.Cathebras @ 2012-06-08 23:24 UTC (permalink / raw)
To: konglu
Cc: git, Volek Pavel, NGUYEN Kim Thuat, ROUCHER IGLESIAS Javier,
Matthieu Moy
On 08/06/2012 19:03, konglu@minatec.inpg.fr wrote:
>
> "Simon.Cathebras" <Simon.Cathebras@ensimag.imag.fr> wrote:
>
>> On 08/06/2012 16:22, Pavel Volek wrote:
>>> From: Volek Pavel <me@pavelvolek.cz>
>>>
>>> The current version of the git-remote-mediawiki supports only import
>>> and export
>>> of the pages, doesn't support import and export of file attachements
>>> which are
>>> also exposed by MediaWiki API. This patch adds the functionality to
>>> import the
>>> last versions of the files and all versions of description pages for
>>> these
>>> files.
>>>
>>> Signed-off-by: Pavel Volek <Pavel.Volek@ensimag.imag.fr>
>>> Signed-off-by: NGUYEN Kim Thuat <Kim-Thuat.Nguyen@ensimag.imag.fr>
>>> Signed-off-by: ROUCHER IGLESIAS Javier <roucherj@ensimag.imag.fr>
>>> Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
>>> ---
>>
>>> contrib/mw-to-git/git-remote-mediawiki | 290
>>> +++++++++++++++++++++++++++------
>>> 1 file changed, 244 insertions(+), 46 deletions(-)
>>
>> I am wondering why you are showing the removals for a v1 patch?
>
> Why not ? The file already exists on branch master and they are
> working on it.
Makes sense... I didn't notice the deletions were on master, my bad.
> Anyway, the patch applies correctly on master.
> BTW, are you implying that only v2+ patch could have deletions ?
> (a patch is not meant to be applied on the previous version).
Actually, I was just saying that showing corrections to a patch's
code during the development of that patch isn't really necessary.
But if it concerns a modification of code from a previous version,
I agree, it is absolutely useful ;).
Simon
--
CATHEBRAS Simon
2A-ENSIMAG
Filière Ingéniérie des Systèmes d'Information
Membre Bug-Buster