* [PATCH 1/4] t3900: add missing UTF-16.txt and mark the test successful @ 2012-02-21 14:24 Nguyễn Thái Ngọc Duy 2012-02-21 14:24 ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyễn Thái Ngọc Duy ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 UTC (permalink / raw) To: git; +Cc: Nguyễn Thái Ngọc Duy Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- t/t3900-i18n-commit.sh | 4 ++-- t/t3900/UTF-16.txt | Bin 0 -> 18 bytes 2 files changed, 2 insertions(+), 2 deletions(-) create mode 100644 t/t3900/UTF-16.txt diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh index d48a7c0..a9e5662 100755 --- a/t/t3900-i18n-commit.sh +++ b/t/t3900-i18n-commit.sh @@ -34,9 +34,9 @@ test_expect_success 'no encoding header for base case' ' test z = "z$E" ' -test_expect_failure 'UTF-16 refused because of NULs' ' +test_expect_success 'UTF-16 refused because of NULs' ' echo UTF-16 >F && - git commit -a -F "$TEST_DIRECTORY"/t3900/UTF-16.txt + test_must_fail git commit -a -F "$TEST_DIRECTORY"/t3900/UTF-16.txt ' diff --git a/t/t3900/UTF-16.txt b/t/t3900/UTF-16.txt new file mode 100644 index 0000000000000000000000000000000000000000..8d0945b8e0a734ced8948da29ed9f8c65e3ec775 GIT binary patch literal 18 VcmezW&xIi$409P$8B!Ry7yv#b1kV5f literal 0 HcmV?d00001 -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings 2012-02-21 14:24 [PATCH 1/4] t3900: add missing UTF-16.txt and mark the test successful Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 ` Nguyễn Thái Ngọc Duy 2012-02-21 14:53 ` Nguyen Thai Ngoc Duy 2012-02-21 18:21 ` Jeff King 2012-02-21 14:24 ` [PATCH 3/4] utf8: die if failed to re-encoding Nguyễn Thái Ngọc Duy 2012-02-21 14:24 ` [PATCH 4/4] Only re-encode certain parts in commit object, not the whole Nguyễn Thái Ngọc Duy 2 siblings, 2 replies; 12+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 UTC (permalink / raw) To: git; +Cc: Nguyễn Thái Ngọc Duy We rely on ASCII everywhere. We print "\n" directly without conversion for example. The end result would be a mix of some encoding and ASCII if they are incompatible. Do not do that. In theory we could convert everything to utf-8 as intermediate medium, process process process, then convert final output to the desired encoding. But that's a lot of work (unless we have a pager-like converter) with little real use. Users can just pipe everything to iconv instead. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- It seems half of the encodings "iconv -l" list does not pass ascii_superset_encoding() test. I just assume they are either exotic or duplicate names. 
pretty.c | 7 +++++++ utf8.c | 15 +++++++++++++++ utf8.h | 1 + 3 files changed, 23 insertions(+), 0 deletions(-) diff --git a/pretty.c b/pretty.c index 8688b8f..5c433a2 100644 --- a/pretty.c +++ b/pretty.c @@ -493,12 +493,19 @@ char *logmsg_reencode(const struct commit *commit, const char *output_encoding) { static const char *utf8 = "UTF-8"; + static const char *last_output_encoding = NULL; const char *use_encoding; char *encoding; char *out; if (!*output_encoding) return NULL; + if (last_output_encoding != output_encoding) { + if (!ascii_superset_encoding(output_encoding)) + die("encoding %s is not a superset of ASCII.", + output_encoding); + last_output_encoding = output_encoding; + } encoding = get_header(commit, "encoding"); use_encoding = encoding ? encoding : utf8; if (!strcmp(use_encoding, output_encoding)) diff --git a/utf8.c b/utf8.c index 8acbc66..def93ee 100644 --- a/utf8.c +++ b/utf8.c @@ -482,3 +482,18 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e return out; } #endif + +int ascii_superset_encoding(const char *encoding) +{ + const char *sample = " !\"#$%&'()*+,-./0123456789:;<=>?@" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`" + "abcdefghijklmnopqrstuvwxyz{|}~\n"; + char *output; + int ret; + if (!encoding) + return 1; + output = reencode_string(sample, encoding, "US-ASCII"); + ret = !output || !strcmp(sample, output); + free(output); + return ret; +} diff --git a/utf8.h b/utf8.h index 81f2c82..75bc128 100644 --- a/utf8.h +++ b/utf8.h @@ -12,6 +12,7 @@ int strbuf_add_wrapped_text(struct strbuf *buf, const char *text, int indent, int indent2, int width); int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len, int indent, int indent2, int width); +int ascii_superset_encoding(const char *encoding); #ifndef NO_ICONV char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding); -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings 2012-02-21 14:24 ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyễn Thái Ngọc Duy @ 2012-02-21 14:53 ` Nguyen Thai Ngoc Duy 2012-02-21 18:21 ` Jeff King 1 sibling, 0 replies; 12+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-02-21 14:53 UTC (permalink / raw) To: git 2012/2/21 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>: > @@ -482,3 +482,18 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e > return out; > } > #endif > + > +int ascii_superset_encoding(const char *encoding) > +{ > + const char *sample = " !\"#$%&'()*+,-./0123456789:;<=>?@" > + "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`" > + "abcdefghijklmnopqrstuvwxyz{|}~\n"; > + char *output; > + int ret; > + if (!encoding) > + return 1; > + output = reencode_string(sample, encoding, "US-ASCII"); > + ret = !output || !strcmp(sample, output); > + free(output); > + return ret; > +} Side note about this function, which was written to ban all ascii-incompatible charsets from entering commit objects. The idea of mixing charsets in the same buffer without clear boundary does not sound healthy. Plus, ident.c will silently drop '\n', '<' and '>' in author/committer. If a hypothetical charset happens to place a letter in those, um.. code points?, the letter will be dropped. But meh.. -- Duy ^ permalink raw reply [flat|nested] 12+ messages in thread
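The side note above says that ident.c silently drops '\n', '<' and '>' from author/committer names. A minimal sketch of that kind of filtering, to make the consequence concrete, could look like the following. The helper name copy_without_ident_delims is hypothetical and this is not the code in ident.c; it only illustrates that any charset which reuses those three byte values for letters would lose them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not git's ident.c): copy src into dst while
 * skipping the bytes '\n', '<' and '>'. dst must have room for
 * strlen(src) + 1 bytes. Returns the number of bytes written, not
 * counting the terminating NUL.
 */
size_t copy_without_ident_delims(char *dst, const char *src)
{
	size_t n = 0;

	for (; *src; src++) {
		/* These bytes delimit fields in the commit header. */
		if (*src == '\n' || *src == '<' || *src == '>')
			continue;
		dst[n++] = *src;
	}
	dst[n] = '\0';
	return n;
}
```

The filter works on raw bytes, so it cannot tell a real '<' from a letter that some non-ASCII-compatible charset happens to encode as 0x3C.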
* Re: [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings 2012-02-21 14:24 ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyễn Thái Ngọc Duy 2012-02-21 14:53 ` Nguyen Thai Ngoc Duy @ 2012-02-21 18:21 ` Jeff King 2012-02-22 2:17 ` Nguyen Thai Ngoc Duy 2012-02-23 11:25 ` Peter Krefting 1 sibling, 2 replies; 12+ messages in thread From: Jeff King @ 2012-02-21 18:21 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git On Tue, Feb 21, 2012 at 09:24:50PM +0700, Nguyen Thai Ngoc Duy wrote: > We rely on ASCII everywhere. We print "\n" directly without conversion > for example. The end result would be a mix of some encoding and ASCII > if they are incompatible. Do not do that. > > In theory we could convert everything to utf-8 as intermediate medium, > process process process, then convert final output to the desired > encoding. But that's a lot of work (unless we have a pager-like > converter) with little real use. Users can just pipe everything to > iconv instead. I'm not sure why we bother checking this. Using non-ASCII-superset encodings is broken, yes, but are people actually doing that? I assume that the common one is utf-16, and anybody using it will experience severe breakage immediately. So are people actually doing this? Are there actually encodings that will cause subtle breakage that we want to catch? -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings 2012-02-21 18:21 ` Jeff King @ 2012-02-22 2:17 ` Nguyen Thai Ngoc Duy 2012-02-23 11:25 ` Peter Krefting 1 sibling, 0 replies; 12+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-02-22 2:17 UTC (permalink / raw) To: Jeff King; +Cc: git 2012/2/22 Jeff King <peff@peff.net>: > On Tue, Feb 21, 2012 at 09:24:50PM +0700, Nguyen Thai Ngoc Duy wrote: > >> We rely on ASCII everywhere. We print "\n" directly without conversion >> for example. The end result would be a mix of some encoding and ASCII >> if they are incompatible. Do not do that. >> >> In theory we could convert everything to utf-8 as intermediate medium, >> process process process, then convert final output to the desired >> encoding. But that's a lot of work (unless we have a pager-like >> converter) with little real use. Users can just pipe everything to >> iconv instead. > > I'm not sure why we bother checking this. Using non-ASCII-superset > encodings is broken, yes, but are people actually doing that? I assume > that the common one is utf-16, and anybody using it will experience > severe breakage immediately. So are people actually doing this? Are > there actually encodings that will cause subtle breakage that we want to > catch? I did :-) once, actually. But that's a good point: using an unsuitable encoding leads to garbage output, but there is no subtle breakage. It'd be nicer to say "your encoding is not supported" than to throw garbage, but again probably no one did it but me, and I don't feel like doing it again. -- Duy ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings 2012-02-21 18:21 ` Jeff King 2012-02-22 2:17 ` Nguyen Thai Ngoc Duy @ 2012-02-23 11:25 ` Peter Krefting 1 sibling, 0 replies; 12+ messages in thread From: Peter Krefting @ 2012-02-23 11:25 UTC (permalink / raw) To: Git Mailing List; +Cc: Jeff King, Nguyễn Thái Ngọc Duy Jeff King: > I'm not sure why we bother checking this. Using non-ASCII-superset > encodings is broken, yes, but are people actually doing that? [...] > Are there actually encodings that will cause subtle breakage that we want > to catch? Shift-JIS could be a problem; if implemented to the letter it would convert 0x5C to a Yen character and 0x7E to an overline. Otherwise I expect it to be a problem only with ISO 646 encodings, especially the ones that replace "@" with something else [1]. Also any ISO 2022 seven-bit encoding (ISO-2022-{CN,JP,KR}) could cause problems, especially if there is any preprocessing done on the string that does not respect its state-shifting (most 0x21--0x7E characters can be lead and trail bytes in their multi-byte modes). -- \\// Peter - http://www.softwolves.pp.se/ [1] Trying to send Internet e-mail from a system using the extended Swedish seven-bit encoding, where 0x40 mapped to "É", could sometimes be a challenge. ^ permalink raw reply [flat|nested] 12+ messages in thread
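To find out which encodings on a given system behave like the ones listed above, one could probe them with iconv directly. The following is a rough sketch under a few assumptions: the function name ascii_survives is made up for illustration, it is not the ascii_superset_encoding() from the patch, and the result depends entirely on the local iconv implementation and its tables. Unlike a strcmp()-based comparison, it compares explicit lengths, so output containing NUL bytes is handled.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical probe: return 1 if printable ASCII plus '\n' comes out
 * byte for byte unchanged when converted from US-ASCII to `encoding`,
 * 0 if the bytes differ, and -1 if iconv cannot perform the conversion.
 */
int ascii_survives(const char *encoding)
{
	static const char sample[] =
		" !\"#$%&'()*+,-./0123456789:;<=>?@"
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
		"abcdefghijklmnopqrstuvwxyz{|}~\n";
	char in[sizeof(sample)];
	char out[8 * sizeof(sample)];	/* generous room for wide encodings */
	char *inp = in, *outp = out;
	size_t inleft = sizeof(sample) - 1;	/* exclude the terminating NUL */
	size_t outleft = sizeof(out);
	size_t outlen, rc;
	iconv_t cd;

	cd = iconv_open(encoding, "US-ASCII");
	if (cd == (iconv_t)-1)
		return -1;
	memcpy(in, sample, sizeof(sample));
	rc = iconv(cd, &inp, &inleft, &outp, &outleft);
	if (rc != (size_t)-1)
		/* Flush any pending shift sequence of stateful encodings. */
		rc = iconv(cd, NULL, NULL, &outp, &outleft);
	iconv_close(cd);
	if (rc == (size_t)-1)
		return -1;
	outlen = sizeof(out) - outleft;
	return outlen == sizeof(sample) - 1 && !memcmp(out, sample, outlen);
}
```

Calling ascii_survives() on each name printed by "iconv -l" would give a local list of suspect encodings. Whether Shift-JIS or an ISO 646 variant is flagged depends on how the installed iconv maps 0x5C, 0x7E and 0x40.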
* [PATCH 3/4] utf8: die if failed to re-encoding 2012-02-21 14:24 [PATCH 1/4] t3900: add missing UTF-16.txt and mark the test successful Nguyễn Thái Ngọc Duy 2012-02-21 14:24 ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 ` Nguyễn Thái Ngọc Duy 2012-02-21 17:36 ` Junio C Hamano 2012-02-21 14:24 ` [PATCH 4/4] Only re-encode certain parts in commit object, not the whole Nguyễn Thái Ngọc Duy 2 siblings, 1 reply; 12+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 UTC (permalink / raw) To: git; +Cc: Nguyễn Thái Ngọc Duy Return value NULL in this case means "no conversion needed", which is not quite true when conv == -1. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- t/t4201-shortlog.sh | 2 +- utf8.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh index 6872ba1..d445665 100755 --- a/t/t4201-shortlog.sh +++ b/t/t4201-shortlog.sh @@ -27,7 +27,7 @@ test_expect_success 'setup' ' tr 1234 "\360\235\204\236")" a1 && # now fsck up the utf8 - git config i18n.commitencoding non-utf-8 && + git config i18n.commitencoding viscii && echo 4 >a1 && git commit --quiet -m "$( echo "This is a very, very long first line for the commit message to see if it is wrapped correctly" | diff --git a/utf8.c b/utf8.c index def93ee..f918e9e 100644 --- a/utf8.c +++ b/utf8.c @@ -444,7 +444,7 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e return NULL; conv = iconv_open(out_encoding, in_encoding); if (conv == (iconv_t) -1) - return NULL; + die("failed to convert from %s to %s", in_encoding, out_encoding); insz = strlen(in); outsz = insz; outalloc = outsz + 1; /* for terminating NUL */ -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 3/4] utf8: die if failed to re-encoding 2012-02-21 14:24 ` [PATCH 3/4] utf8: die if failed to re-encoding Nguyễn Thái Ngọc Duy @ 2012-02-21 17:36 ` Junio C Hamano 0 siblings, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2012-02-21 17:36 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > Return value NULL in this case means "no conversion needed", which is > not quite true when conv == -1. Doing this only when producing new commits to avoid spreading damage might be a good idea. But utf8.c::reencode_string() is sufficiently deep in the call-chains to make me suspect that the codepaths this change affects are not limited to creation ones. If this also forbids readers from resurrecting salvageable bits while reading (imagine your commit had "encoding vscii" but your log message was in English, except only your name had letters outside ASCII that I cannot locally convert to utf-8 for viewing), I do not think it is an acceptable change. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 4/4] Only re-encode certain parts in commit object, not the whole 2012-02-21 14:24 [PATCH 1/4] t3900: add missing UTF-16.txt and mark the test successful Nguyễn Thái Ngọc Duy 2012-02-21 14:24 ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyễn Thái Ngọc Duy 2012-02-21 14:24 ` [PATCH 3/4] utf8: die if failed to re-encoding Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 ` Nguyễn Thái Ngọc Duy 2012-02-21 18:25 ` Jeff King 2 siblings, 1 reply; 12+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2012-02-21 14:24 UTC (permalink / raw) To: git; +Cc: Nguyễn Thái Ngọc Duy Commit object has its own format, which happens to be in ascii, but not really subject to re-encoding. There are only four areas that may be re-encoded: author line, committer line, mergetag lines and commit body. Encoding of tags embedded in mergetag lines is not decided by commit encoding, so leave it out and consider it binary. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- pretty.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 57 insertions(+), 1 deletions(-) diff --git a/pretty.c b/pretty.c index 5c433a2..6ccc091 100644 --- a/pretty.c +++ b/pretty.c @@ -489,6 +489,62 @@ static char *replace_encoding_header(char *buf, const char *encoding) return strbuf_detach(&tmp, NULL); } +/* + * Re-encode author, committer and commit body only, leaving the rest + * in ascii (or whatever the encoding it is in mergetag lines) + * regardless output encoding. We assume the commit is good, so no + * validation. 
+ */ +static char *reencode_commit(const char *buffer, + const char *out_enc, const char *in_enc) +{ + struct strbuf out = STRBUF_INIT; + struct strbuf buf = STRBUF_INIT; + char *reencoded, *s, *e; + + strbuf_addstr(&buf, buffer); + + s = strstr(buf.buf, "\nauthor "); + assert(s != NULL); + s += 8; /* "\nauthor " */ + strbuf_add(&out, buf.buf, s - buf.buf); + e = strchr(s, '\n'); + *e = '\0'; + reencoded = reencode_string(s, out_enc, in_enc); + if (reencoded && strchr(reencoded, '\n')) + die("your chosen encoding produces \\n out of nowhere?"); + strbuf_addstr(&out, reencoded ? reencoded : s); + free(reencoded); + + strbuf_addstr(&out, "\ncommitter "); + assert(!strncmp(e + 1, "committer ", 10)); + s = e + 11; /* "\ncommitter " */ + e = strchr(s, '\n'); + *e = '\0'; + reencoded = reencode_string(s, out_enc, in_enc); + if (reencoded && strchr(reencoded, '\n')) + die("your chosen encoding produces \\n out of nowhere?"); + strbuf_addstr(&out, reencoded ? reencoded : s); + free(reencoded); + *e = '\n'; + + s = e; + e = strstr(s, "\n\n"); + if (e) { + e += 2; /* "\n\n" */ + strbuf_add(&out, s, e - s); + + s = e; + reencoded = reencode_string(s, out_enc, in_enc); + strbuf_addstr(&out, reencoded ? reencoded : s); + free(reencoded); + } else + strbuf_addstr(&out, s); + + strbuf_release(&buf); + return strbuf_detach(&out, NULL); +} + char *logmsg_reencode(const struct commit *commit, const char *output_encoding) { @@ -514,7 +570,7 @@ char *logmsg_reencode(const struct commit *commit, else return NULL; /* nothing to do */ else - out = reencode_string(commit->buffer, + out = reencode_commit(commit->buffer, output_encoding, use_encoding); if (out) out = replace_encoding_header(out, output_encoding); -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 4/4] Only re-encode certain parts in commit object, not the whole 2012-02-21 14:24 ` [PATCH 4/4] Only re-encode certain parts in commit object, not the whole Nguyễn Thái Ngọc Duy @ 2012-02-21 18:25 ` Jeff King 2012-02-22 2:01 ` Nguyen Thai Ngoc Duy 0 siblings, 1 reply; 12+ messages in thread From: Jeff King @ 2012-02-21 18:25 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git On Tue, Feb 21, 2012 at 09:24:52PM +0700, Nguyen Thai Ngoc Duy wrote: > Commit object has its own format, which happens to be in ascii, but > not really subject to re-encoding. > > There are only four areas that may be re-encoded: author line, > committer line, mergetag lines and commit body. Encoding of tags > embedded in mergetag lines is not decided by commit encoding, so leave > it out and consider it binary. Is this worth the effort? Yes, re-encoding the ASCII bits of the commit object is unnecessary. But do we actually handle encodings that are not ASCII supersets? IOW, I could see the point if this is making it possible to hold utf-16 names and messages in your commits (though why you would want to do so is beyond me...). But my understanding is that this is horribly broken anyway by other parts of the code. And even looking at your code below: > +static char *reencode_commit(const char *buffer, > + const char *out_enc, const char *in_enc) > +{ > + struct strbuf out = STRBUF_INIT; > + struct strbuf buf = STRBUF_INIT; > + char *reencoded, *s, *e; > + > + strbuf_addstr(&buf, buffer); > + > + s = strstr(buf.buf, "\nauthor "); > + assert(s != NULL); Wouldn't this assert trigger in the presence of encodings which contain ASCII NUL (e.g., wide encodings like utf-16)? Is there an encoding you have in mind which would be helped by this? -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
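The question about the assert can be illustrated with a few lines of C. This is only a sketch of the concern raised above, assuming a buffer whose header text were itself stored in UTF-16LE; the names find_author and utf16le_header are invented for the example. Because every other byte of ASCII text in UTF-16LE is NUL, the C string functions see a one-byte string and strstr() never finds the header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "\nauthor " as UTF-16LE bytes: every other byte is NUL. */
static const char utf16le_header[] = {
	'\n', 0, 'a', 0, 'u', 0, 't', 0,
	'h', 0, 'o', 0, 'r', 0, ' ', 0, 0, 0
};

/*
 * Hypothetical stand-in for the lookup in the patch: search a
 * NUL-terminated commit buffer for the author header.
 */
const char *find_author(const char *commit_buf)
{
	return strstr(commit_buf, "\nauthor ");
}
```

With an ASCII header find_author() returns a pointer into the buffer. With utf16le_header it returns NULL, which is the condition the assert(s != NULL) in the patch would trip on.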
* Re: [PATCH 4/4] Only re-encode certain parts in commit object, not the whole 2012-02-21 18:25 ` Jeff King @ 2012-02-22 2:01 ` Nguyen Thai Ngoc Duy 2012-02-22 3:14 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-02-22 2:01 UTC (permalink / raw) To: Jeff King; +Cc: git 2012/2/22 Jeff King <peff@peff.net>: > On Tue, Feb 21, 2012 at 09:24:52PM +0700, Nguyen Thai Ngoc Duy wrote: > >> Commit object has its own format, which happens to be in ascii, but >> not really subject to re-encoding. >> >> There are only four areas that may be re-encoded: author line, >> committer line, mergetag lines and commit body. Encoding of tags >> embedded in mergetag lines is not decided by commit encoding, so leave >> it out and consider it binary. > > Is this worth the effort? Yes, re-encoding the ASCII bits of the commit > object is unnecessary. But do we actually handle encodings that are not > ASCII supersets? IOW, I could see the point if this is making it > possible to hold utf-16 names and messages in your commits (though why > you would want to do so is beyond me...). But my understanding is that > this is horribly broken anyway by other parts of the code. And even > looking at your code below: No, utf-16 and friends are out of the question. 617/1168 supported encodings in iconv translate chars 10,32-126 to something else; some of them do not generate NUL. I suppose none of these are actually used nowadays. Looking again, some don't even successfully translate the given input. No, it's probably not worth the effort. -- Duy ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 4/4] Only re-encode certain parts in commit object, not the whole 2012-02-22 2:01 ` Nguyen Thai Ngoc Duy @ 2012-02-22 3:14 ` Junio C Hamano 0 siblings, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2012-02-22 3:14 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, git By the way, zj/term-columns topic has already graduated to 'master', so if you are still interested in your earlier nd/columns topic, it would be a good time to re-roll it. No hurries, but pointing it out just in case you forgot. ^ permalink raw reply [flat|nested] 12+ messages in thread