From: Jakub Narebski <jnareb@gmail.com>
To: Jonathan Fine <jfine@pytex.org>
Cc: python-list@python.org, git@vger.kernel.org
Subject: Re: A Python script to put CTAN into git (from DVDs)
Date: Sun, 06 Nov 2011 08:42:23 -0800 (PST)
Message-ID: <m37h3d430e.fsf@localhost.localdomain>
In-Reply-To: <4EB6A522.3020909@pytex.org>

Jonathan Fine <jfine@pytex.org> writes:

> Hi
> 
> This is to let you know that I'm writing (in Python) a script that
> places the content of CTAN into a git repository.
>      https://bitbucket.org/jfine/python-ctantools

I hope that you meant "repositories" (plural) here, one per tool,
rather than putting all of CTAN into a single Git repository.
 
> I'm working from the TeX Collection DVDs that are published each year
> by the TeX user groups, which contain a snapshot of CTAN (about
> 100,000 files occupying 4Gb), which means I have to unzip folders and
> do a few other things.

There is 'contrib/fast-import/import-zips.py' in the git.git repository.
If you are not using it, or its equivalent, it might be worth checking
out.
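
To make the idea concrete, here is a minimal sketch of turning one zip
snapshot into a commit in a git fast-import stream.  It is not the
contents of import-zips.py; the function name, the default branch and
the committer identity are placeholders made up for illustration, and
every file is imported as a regular non-executable file.

```python
import zipfile


def write_snapshot_commit(out, zip_source, message, timestamp,
                          branch="refs/heads/master",
                          committer="CTAN Import <ctan-import@example.invalid>"):
    """Write one fast-import 'commit' whose tree is the contents of a zip.

    out        -- binary file object receiving the stream
    zip_source -- path or binary file object of the zip archive
    timestamp  -- commit time in seconds since the epoch (UTC)
    The committer identity and branch are illustrative placeholders.
    """
    msg = message.encode("utf-8")
    out.write(f"commit {branch}\n".encode("utf-8"))
    out.write(f"committer {committer} {timestamp} +0000\n".encode("utf-8"))
    out.write(b"data %d\n%s\n" % (len(msg), msg))
    # Start from an empty tree so files missing from this snapshot
    # are recorded as deleted.
    out.write(b"deleteall\n")
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            content = archive.read(info)
            # Assumption: paths contain no newline and do not start with
            # a double quote; such paths would need fast-import quoting.
            out.write(f"M 100644 inline {info.filename}\n".encode("utf-8"))
            out.write(b"data %d\n%s\n" % (len(content), content))
    out.write(b"\n")
```

Calling this once per yearly archive, oldest first, with
sys.stdout.buffer as 'out', and piping the result into 'git fast-import'
inside a freshly initialized repository should give one commit per
snapshot, since later commits on the same branch get the previous one
as their parent.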
 
> CTAN is the Comprehensive TeX Archive Network.  CTAN keeps only the
> latest version of each file, but old CTAN snapshots will provide many
> earlier versions.

There was a similar effort to put CPAN (Comprehensive _Perl_
Archive Network) into Git, hosting the repositories on GitHub[1], under
the name gitPAN; see e.g.:

  "The gitPAN Import is Complete"
  http://perlisalive.com/articles/36
 
[1]: https://github.com/gitpan

> I'm working on putting old CTAN files into modern version
> control. Martin Scharrer is working in the other direction.  He's
> putting new files added to CTAN into Mercurial.
>      http://ctanhg.scharrer-online.de/

N.B. Thanks to tools such as git-hg and fast-import / fast-export,
we have quite good interoperability and convertibility between
Git and Mercurial.

P.S. I'd point to the reposurgeon tool, which can be used to do fixups
after import, but it probably won't work on such a large repository
(or set of repositories).

P.P.S. Can you forward it to comp.text.tex?
-- 
Jakub Narębski
