* [mlmmj] List archive howto?
@ 2010-08-18 1:33 Morgan Gangwere
2010-08-18 6:38 ` Andreas Schneider
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Morgan Gangwere @ 2010-08-18 1:33 UTC (permalink / raw)
To: mlmmj
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Howdy y'all
I'm kinda curious as to what is used for the archiving of the mailing list?
I've got my own list set up and I'm poring over the documentation
looking for what actually generates the html. Any help?
- --
Morgan Gangwere
>> Why?
> Because it breaks the logical flow of conversation, plus makes
> messages unreadable.
>>> Top-Posting is evil.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)
iQIcBAEBAgAGBQJMaziCAAoJEEURiCSotvJDQmwP/jHrz4H2siLqZdsg4ZXvIz2A
xC9H2iaynQ5hLGHwbsS8ontRKoodD69Z33NtWRdtLtmoAEQ4LoT2goZtDwmj9/VB
cS1QpjSdkAaqoqeqL0iXx6yY6SvMZwigD1d/GCn3/VousqvOublR5F0VzNozAU5+
lPn6ty1EaP+wK1p9sktMWhN5DgF+KUNzDbc/v9Y6A75njp5SJgVfD3j20KBdkfw2
6B1ipnnAZu0kZLS1lEOj5CZYwQ4WaOQaL0F4lPYWUyOxRcE4zm+qi+XLiinacm2p
nTZS3RBP3XguRKYTbyKcxvRVHxUCMQlH5Lhduowdeecsgj6csaDawKuD68viDfsE
qIH+sjhn+eVwlMcEl1id1Mu5NDzH7S7cDDPDv2Kiz40NIQn0g2PqxPr7ibjsvWvg
7HJmQ4zfbEE/Wyfl45MgpsYhiq4wlVKFJnGgZTHyGaI33KVWIpSfkXOZvBo2iNzi
bGv7bAbUyfLUwcoLI/r7m/2iCuN9obXinP+Px7aYJ/34EXIzYdrHjr/BiQYWwEhU
wCU39EvN173Dv6FIiUUPySnGgOmubNzVXvcCVcovfw7tHv6VwR8WS06rVwDKW3Gk
Rr29D7ksrjwN9i4lmQwuM2mB0xJrr9Yp6AXYIq0pjNhnwzy+m1/zW66Znf4p8q7G
B6Ki/clGUYP+7OMtP5Tu
=PB0G
-----END PGP SIGNATURE-----
* Re: [mlmmj] List archive howto?
2010-08-18 1:33 [mlmmj] List archive howto? Morgan Gangwere
@ 2010-08-18 6:38 ` Andreas Schneider
2010-08-18 12:06 ` Wolf Bergenheim
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Andreas Schneider @ 2010-08-18 6:38 UTC (permalink / raw)
To: mlmmj
On Wednesday 18 August 2010 03:33:54 Morgan Gangwere wrote:
> Howdy y'all
Hi,
> I'm kinda curious as to what is used for the archiving of the mailing list?
>
> I've got my own list set up and I'm poring over the documentation
> looking for what actually generates the html. Any help?
I've created mlmmj-webarchiver; you can get it here:
http://git.cynapses.org/projects/mlmmj-webarchiver.git/
Maybe someone wants to link it on the website. The output looks like this:
http://www.libssh.org/archive/
-- andreas
* Re: [mlmmj] List archive howto?
2010-08-18 1:33 [mlmmj] List archive howto? Morgan Gangwere
2010-08-18 6:38 ` Andreas Schneider
@ 2010-08-18 12:06 ` Wolf Bergenheim
2010-10-03 13:35 ` Ben Schmidt
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Wolf Bergenheim @ 2010-08-18 12:06 UTC (permalink / raw)
To: mlmmj
[-- Attachment #1: Type: text/plain, Size: 716 bytes --]
On Wed, Aug 18, 2010 at 04:33, Morgan Gangwere <0.fractalus@gmail.com> wrote:
> Howdy y'all
>
> I'm kinda curious as to what is used for the archiving of the mailing list?
We're using hypermail to create the webarchive. I have also made my
own script to control the use of hypermail.
> I've got my own list set up and I'm poring over the documentation
> looking for what actually generates the html. Any help?
>
I've attached my script; it is a CLI PHP script.
I also have a wrapper script that runs it and then cleans up after hypermail:
#!/bin/bash
/usr/local/bin/wwwmmm
find /var/www/mlmmj.org/archive -type d -exec chmod ug=rwx,o=x {} \;
find /var/www/mlmmj.org/archive -type f -exec chmod ug=rw,o=r {} \;
--Wolf
[-- Attachment #2: wwwmmm --]
[-- Type: application/octet-stream, Size: 1989 bytes --]
#!/usr/bin/php
<?php
// -*- php -*-
// Feed new messages from each mlmmj list's archive directory to hypermail.

$prefix = '/var/spool/mlmmj/';
$hm = '/usr/bin/hypermail';

putenv('LC_TIME=en_GB');

$lists = array();
$domains = array();

echo "looking for domains in ".$prefix."\n";
// every non-hidden, non-symlinked directory directly under $prefix is a domain
foreach (scandir($prefix) as $t) {
    if (is_dir($prefix.$t) && !is_link($prefix.$t) && (substr_compare($t, '.', 0, 1) != 0)) {
        echo "adding ".$t." as a domain\n";
        $domains[] = $t;
    }
}

// all lists with the webarchive tunable will be processed
foreach ($domains as $domain) {
    echo "looking for lists for ".$domain."\n";
    foreach (scandir($prefix.$domain) as $list) {
        if (file_exists($prefix.$domain.'/'.$list.'/control/webarchive')) {
            echo "adding ".$list." as a list\n";
            $lists[$domain][] = $list;
        }
    }
}

// separate name for the inner array so that $lists is not overwritten mid-loop
foreach ($lists as $domain => $domainlists) {
    foreach ($domainlists as $list) {
        $listpath = $prefix.$domain.'/'.$list;
        $indexfile = $listpath.'/control/webarchiveindex';

        // number of the last message already archived; 0 means none yet,
        // so that the first run starts with message 1
        if (file_exists($indexfile)) {
            $lastindex = (int) trim(file_get_contents($indexfile));
        }
        else {
            $lastindex = 0;
        }

        // highest message number present in the list's archive directory;
        // non-numeric entries such as '.' and '..' are ignored
        $lastmessage = 0;
        foreach (scandir($listpath.'/archive') as $t) {
            if (ctype_digit($t) && ((int) $t > $lastmessage)) {
                $lastmessage = (int) $t;
            }
        }

        if ($lastindex >= $lastmessage) {
            print("\tNo new messages to process. ($lastindex/$lastmessage)\n");
            // go on to the next list; 'break' would skip the rest of the domain
            continue;
        }

        print("\tProcessing messages ".$lastindex.' - '.$lastmessage.' for '.$list.'@'.$domain."\n");
        for ($msg = $lastindex + 1; $msg <= $lastmessage; $msg++) {
            $msgfile = $listpath.'/archive/'.$msg;
            if (file_exists($msgfile)) {
                print("\t\tArchiving message ".$msg.' of '.$list.'@'.$domain."\n");
                // quote every argument so unusual list names or paths cannot break the command
                $command = $hm.' -u -M -l '.escapeshellarg($list).' -L en -c '
                    .escapeshellarg($listpath.'/control/webarchive').' -i <'.escapeshellarg($msgfile);
                exec($command);
                $lastindex = $msg;
            }
            else {
                print("\t\tNo message ".$msg.' in archive of '.$list.'@'.$domain."\n");
            }
        }
        // remember where we stopped so the next run only handles newer messages
        file_put_contents($indexfile, $lastindex);
    }
}
?>
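
[Editorial sketch, not part of the original attachment.] The bookkeeping the
script relies on (remember the last archived message number in
control/webarchiveindex, then process every higher-numbered file in archive/)
can be tried out separately in shell. The function names last_message and
pending_messages are hypothetical, and the sketch assumes that a missing index
file means "nothing archived yet" (index 0) and that message files are named
with plain decimal numbers.

```shell
#!/bin/sh
# Hypothetical helpers; they only list work to do and never call hypermail.

# Print the highest purely numeric file name in a directory, or 0 if none.
last_message() {
    dir=$1
    max=0
    for f in "$dir"/*; do
        n=${f##*/}
        case "$n" in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a number
        esac
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    echo "$max"
}

# Print the numbers of all messages newer than the stored index, one per line.
pending_messages() {
    listdir=$1
    last=0
    if [ -f "$listdir/control/webarchiveindex" ]; then
        last=$(tr -d '[:space:]' < "$listdir/control/webarchiveindex")
    fi
    last=${last:-0}                    # assumption: empty or missing index means 0
    newest=$(last_message "$listdir/archive")
    msg=$((last + 1))
    while [ "$msg" -le "$newest" ]; do
        # gaps in the numbering are skipped rather than treated as errors
        if [ -f "$listdir/archive/$msg" ]; then
            echo "$msg"
        fi
        msg=$((msg + 1))
    done
    return 0
}
```

Running pending_messages on a list directory before enabling the real script
is a cheap way to see which messages a first run would hand to hypermail.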
* Re: [mlmmj] List archive howto?
2010-08-18 1:33 [mlmmj] List archive howto? Morgan Gangwere
2010-08-18 6:38 ` Andreas Schneider
2010-08-18 12:06 ` Wolf Bergenheim
@ 2010-10-03 13:35 ` Ben Schmidt
2010-10-03 21:37 ` Robin H. Johnson
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Ben Schmidt @ 2010-10-03 13:35 UTC (permalink / raw)
To: mlmmj
I've drafted some documentation about this. It is here:
http://mlmmj.org/docs/readme-archives/
Comments?
Wolf, perhaps at some stage you'd like to tidy up your scripts, and
document their dependencies and installation (e.g. with cron?) in a
README, and I could whack them in contrib?
You're welcome to do the same if you like, Andreas, though if you prefer
to keep your stuff in its existing web home, I reckon that's got its
advantages, too.
Thanks to both of you for contributing to Mlmmj in this way.
Smiles,
Ben.
* Re: [mlmmj] List archive howto?
2010-08-18 1:33 [mlmmj] List archive howto? Morgan Gangwere
` (2 preceding siblings ...)
2010-10-03 13:35 ` Ben Schmidt
@ 2010-10-03 21:37 ` Robin H. Johnson
2010-10-06 8:03 ` Ben Schmidt
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Robin H. Johnson @ 2010-10-03 21:37 UTC (permalink / raw)
To: mlmmj
[-- Attachment #1: Type: text/plain, Size: 1826 bytes --]
On Tue, Aug 17, 2010 at 07:33:54PM -0600, Morgan Gangwere wrote:
> Howdy y'all
>
> I'm kinda curious as to what is used for the archiving of the mailing list?
>
> I've got my own list set up and I'm poring over the documentation
> looking for what actually generates the html. Any help?
For Gentoo, we use mhonarc, with some minor modifications.
One of the key modifications was ultimately caused by the message
numbering issue in threads.
Specifically: what URL should a message get if it arrives out of sequence
due to a delay? Some archive tools renumber the URLs; others keep an
index based on some part of the mail. Both approaches caused us some
problems, so we had to come up with an alternative.
Specifically, when the message is being received, we add two headers:
X-Archives-Salt: ${UUID}
then we hash the entire message file, including the new header, and
generate:
X-Archives-Hash: ${SHA1}
This gets used in the filename for the individual message files we
write to disk:
http://archives.gentoo.org/gentoo-scm/msg_c0f2f8f123f85bb8b664827b4a1dcb09.xml
It's consistent regardless of how many times we have to rebuild a
message archive.
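
[Editorial sketch, not Gentoo's actual code.] The two-header scheme described
above could look roughly like this in shell. Only the header names
X-Archives-Salt and X-Archives-Hash come from the message; the function names,
the use of uuidgen and sha1sum, prepending the headers at the top of the file,
and the msg_<hash>.html naming are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers illustrating salted, hash-based archive file names.

# Print the message in file $1 with a salt header and a hash header prepended.
add_archive_headers() {
    msg=$1
    # uuidgen is not installed everywhere; fall back to the Linux kernel's UUID source
    if command -v uuidgen >/dev/null 2>&1; then
        salt=$(uuidgen)
    else
        salt=$(cat /proc/sys/kernel/random/uuid)
    fi
    tmp=$(mktemp) || return 1
    # Step 1: add the salt header, then hash the whole salted message.
    { printf 'X-Archives-Salt: %s\n' "$salt"; cat "$msg"; } > "$tmp"
    hash=$(sha1sum "$tmp" | cut -d' ' -f1)
    # Step 2: emit the hash header followed by the salted message.
    printf 'X-Archives-Hash: %s\n' "$hash"
    cat "$tmp"
    rm -f "$tmp"
}

# Print an archive file name taken from the stored hash header of file $1.
archive_name() {
    # only look at the header block, which ends at the first empty line
    hash=$(sed -n -e 's/^X-Archives-Hash: //p' -e '/^$/q' "$1" | head -n 1)
    if [ -n "$hash" ]; then
        printf 'msg_%s.html\n' "$hash"
    else
        return 1
    fi
}
```

Because archive_name reads the hash back from the stored header instead of
recomputing anything, a rebuild of the archive yields the same file name no
matter in which order the messages are processed.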
We've got nearly 588k messages in the archive, and now the next problem
is starting to be the storage/access scalability of some of the larger
lists. Also need to work on better parallelization for building archives
for each list.
Largest lists, by mails:
72808 gentoo-dev
97619 gentoo-user
283036 gentoo-commits
Footnote:
Yes, we generate XML, that's to integrate with our templating, but you
could generate HTML even more easily.
--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
[-- Attachment #2: Type: application/pgp-signature, Size: 330 bytes --]
* Re: [mlmmj] List archive howto?
2010-08-18 1:33 [mlmmj] List archive howto? Morgan Gangwere
` (3 preceding siblings ...)
2010-10-03 21:37 ` Robin H. Johnson
@ 2010-10-06 8:03 ` Ben Schmidt
2010-10-06 19:57 ` Robin H. Johnson
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Ben Schmidt @ 2010-10-06 8:03 UTC (permalink / raw)
To: mlmmj
I'm afraid you've completely lost me there, Robin, possibly because I
know basically nothing about mhonarc. The concepts sound great, though.
A question, though.... Is there anything specific and useful in your
reply that I should be working into the documentation about
archive solutions? Or is this a complex solution, or not detailed
enough, and best left out and revealed if someone finds the other
solutions aren't adequate for them?
Cheers,
Ben.
On 4/10/10 8:37 AM, Robin H. Johnson wrote:
> On Tue, Aug 17, 2010 at 07:33:54PM -0600, Morgan Gangwere wrote:
>> Howdy y'all
>>
>> I'm kinda curious as to what is used for the archiving of the mailing list?
>>
>> I've got my own list set up and I'm poring over the documentation
>> looking for what actually generates the html. Any help?
> For Gentoo, we use mhonarc, with some minor modifications.
>
> One of the key modifications was ultimately caused by the message
> numbering issue in threads.
>
> Specifically, what URL should exist if a message arrives out of sequence
> due to a delay. Some archive tools renumber the urls, others keep an
> index based on some part of the mail. Both of these caused some degree
> of problems for us, and we had to come up with an alternative.
>
> Specifically, when the message is being received, we add two headers:
> X-Archives-Salt: ${UUID}
> then we hash the entire message file, including the new header, and
> generate:
> X-Archives-Hash: ${SHA1}
>
> This gets used in the filename for the individual message files we
> write to disk:
> http://archives.gentoo.org/gentoo-scm/msg_c0f2f8f123f85bb8b664827b4a1dcb09.xml
> It's consistent regardless of how many times we have to rebuild a
> message archive.
>
> We've got nearly 588k messages in the archive, and now the next problem
> is starting to be the storage/access scalability of some of the larger
> lists. Also need to work on better parallelization for building archives
> for each list.
>
> Largest lists, by mails:
> 72808 gentoo-dev
> 97619 gentoo-user
> 283036 gentoo-commits
>
> Footnote:
> Yes, we generate XML, that's to integrate with our templating, but you
> could generate HTML even more easily.
>
* Re: [mlmmj] List archive howto?
2010-08-18 1:33 [mlmmj] List archive howto? Morgan Gangwere
` (4 preceding siblings ...)
2010-10-06 8:03 ` Ben Schmidt
@ 2010-10-06 19:57 ` Robin H. Johnson
2010-10-06 22:19 ` Ben Schmidt
2010-10-11 10:41 ` Florian Effenberger
7 siblings, 0 replies; 9+ messages in thread
From: Robin H. Johnson @ 2010-10-06 19:57 UTC (permalink / raw)
To: mlmmj
[-- Attachment #1: Type: text/plain, Size: 852 bytes --]
On Wed, Oct 06, 2010 at 07:03:31PM +1100, Ben Schmidt wrote:
> I'm afraid you've completely lost me there, Robin, possibly because I
> know basically nothing about mhonarc. The concepts sound great, though.
>
> A question, though.... Is there anything specific and useful in your
> reply that I should be working into the documentation about
> archive solutions? Or is this a complex solution, or not detailed
> enough, and best left out and revealed if someone finds the other
> solutions aren't adequate for them?
Mhonarc itself is fairly simple to set up.
The SHA1-based URLs only help with larger lists, in case you have to
rebuild the archive.
--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
[-- Attachment #2: Type: application/pgp-signature, Size: 330 bytes --]
* Re: [mlmmj] List archive howto?
2010-08-18 1:33 [mlmmj] List archive howto? Morgan Gangwere
` (5 preceding siblings ...)
2010-10-06 19:57 ` Robin H. Johnson
@ 2010-10-06 22:19 ` Ben Schmidt
2010-10-11 10:41 ` Florian Effenberger
7 siblings, 0 replies; 9+ messages in thread
From: Ben Schmidt @ 2010-10-06 22:19 UTC (permalink / raw)
To: mlmmj
On 7/10/10 6:57 AM, Robin H. Johnson wrote:
> On Wed, Oct 06, 2010 at 07:03:31PM +1100, Ben Schmidt wrote:
>> I'm afraid you've completely lost me there, Robin, possibly because I
>> know basically nothing about mhonarc. The concepts sound great, though.
>>
>> A question, though.... Is there anything specific and useful in your
>> reply that I should be working into the documentation about
>> archive solutions? Or is this a complex solution, or not detailed
>> enough, and best left out and revealed if someone finds the other
>> solutions aren't adequate for them?
> Mhonarc itself is fairly simple to set up.
>
> The URLs based on SHA1 help with larger lists in case you have to
> rebuild the archive only.
If I mention this in the docs, would a mhonarc user know what to do to
make it happen? If not, are you able to flesh it out a bit for me so I
can include it?
Cheers,
Ben.
* Re: [mlmmj] List archive howto?
2010-08-18 1:33 [mlmmj] List archive howto? Morgan Gangwere
` (6 preceding siblings ...)
2010-10-06 22:19 ` Ben Schmidt
@ 2010-10-11 10:41 ` Florian Effenberger
7 siblings, 0 replies; 9+ messages in thread
From: Florian Effenberger @ 2010-10-11 10:41 UTC (permalink / raw)
To: mlmmj
Hi Robin,
2010/10/3 Robin H. Johnson <robbat2@gentoo.org>:
> On Tue, Aug 17, 2010 at 07:33:54PM -0600, Morgan Gangwere wrote:
> For Gentoo, we use mhonarc, with some minor modifications.
>
> One of the key modifications was ultimately caused by the message
> numbering issue in threads.
>
> Specifically, what URL should exist if a message arrives out of sequence
> due to a delay. Some archive tools renumber the urls, others keep an
> index based on some part of the mail. Both of these caused some degree
> of problems for us, and we had to come up with an alternative.
Can you share how you did that? I would be interested in deploying
something similar on our machines as well. ;)
Thanks,
Florian