* Re: [mlmmj] archive options
2015-07-22 18:13 [mlmmj] archive options A. Schulze
@ 2015-07-22 23:04 ` Chris Knadle
2015-07-23 7:23 ` Morten Shearman Kirkegaard
2015-07-23 13:36 ` Piotr Auksztulewicz
2 siblings, 0 replies; 4+ messages in thread
From: Chris Knadle @ 2015-07-22 23:04 UTC
To: mlmmj
On 07/22/2015 02:13 PM, A. Schulze wrote:
>
> Hello,
>
> just noticed mlmmj archives a list as one message per file in ~list/archive/
> Today I have a list with 300 messages. And I have a directory with 300
> files.
> I assume I'll get problems on 10k messages...
Not really. I ran a mail server that had a problem back in 2003 where
the ext3 filesystem was filled with message bounces and rebounces.
Because there were > 64k files the wildcards (?, *, etc) couldn't be
parsed by the system anymore, but the filesystem other than that didn't
have an issue with the number of files that were in a single directory.
And the issue I had at the time was that I was trying to parse which
files were the bounced messages vs "real" ones that hadn't been
sent/received yet.
In the case of MLMMJ the filenames would be in order, so it would be
clear which files were "old" and could be moved to another directory, if
needed. And I'm not sure what the limit on the number of files in a
directory is anymore -- it's probably filesystem dependent, and I think
the wildcards work past 64k files now too, IIRC.
> Are there options to distribute the archived content over multiple
> sub-directories?
I was thinking of running a monthly cronjob to move archived mailing
list mails into a directory with the YYYY-MM format, but haven't done
this because I believe it would remove the ability of mailing list users
to retrieve old messages via email commands to MLMMJ.
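For illustration, a minimal sketch of what such a cron job could run. The function name archive_by_month is hypothetical, the script assumes GNU date ("date -r FILE") and a find that supports -maxdepth, and it groups files by modification time rather than by anything mlmmj itself records. As noted above, moving the files may break retrieval of old messages by email command, so treat it as a sketch only.

```shell
#!/bin/sh
# Hypothetical helper: move every regular file directly inside an archive
# directory into a YYYY-MM subdirectory chosen from the file's mtime.
# Assumes GNU date (for "date -r FILE") and find with -maxdepth.
archive_by_month() {
    archive_dir=$1
    find "$archive_dir" -maxdepth 1 -type f | while IFS= read -r f; do
        # Modification month of the file, e.g. 2015-07
        month=$(date -r "$f" +%Y-%m) || continue
        mkdir -p "$archive_dir/$month"
        mv -- "$f" "$archive_dir/$month/"
    done
}
```

It would be called as "archive_by_month /path/to/list/archive" (path hypothetical). Adding a test such as "-mtime +30" to the find invocation would leave recent messages where they are.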
So instead what I'd suggest would be to create a web archive of mailing
list mail, using something like MHonArc or Lurker. That way users can
still get to archived mailing list mail, and you can move the archive
files as you see fit.
At least that's what I've been thinking about, as a user of MLMMJ.
-- Chris
--
Chris Knadle
Chris.Knadle@coredump.us
* Re: [mlmmj] archive options
From: Morten Shearman Kirkegaard @ 2015-07-23 7:23 UTC
To: mlmmj
On Wed, Jul 22, 2015 at 07:04:43PM -0400, Chris Knadle wrote:
> Because there were > 64k files the wildcards (?, *, etc) couldn't be
> parsed by the system anymore, but the filesystem other than that didn't
> have an issue
The wildcards are handled by your shell, and will be expanded to a list
of arguments to a program.
> And I'm not sure what the limit on the number of files in a directory
> is anymore -- it's probably filesystem dependent,
It is.
> and I think the wildcards work past 64k files now too, IIRC.
Probably, but there is still a limit. A small trick is to use the "find"
command, instead of having your shell expand the filenames, e.g.:
$ find list/archive/ -type f -name qwe\* -exec grep blah '{}' \;
This would probably work on most modern systems, and be a bit faster as
it doesn't execute grep once per file:
$ find list/archive/ -type f -name qwe\* -exec grep blah '{}' +
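The difference between the two forms can be made visible by counting how often the command is started. This is only a sketch: the function names are hypothetical, and "sh -c 'echo run'" stands in for grep so that each invocation prints exactly one line.

```shell
#!/bin/sh
# Hypothetical helpers: count how many times find starts the -exec command.

# One invocation per file ("\;" terminator).
count_invocations_each() {
    find "$1" -type f -exec sh -c 'echo run' sh {} \; | wc -l
}

# File names batched into as few invocations as possible ("+" terminator).
count_invocations_batched() {
    find "$1" -type f -exec sh -c 'echo run' sh {} + | wc -l
}
```

On a directory with a handful of files the first function reports one run per file, while the second normally reports a single run.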
> > Are there options to distribute the archived content over multiple
> > sub-directories?
That is not a bad idea. I am sure a patch would be welcome :)
Kind regards,
Morten
* Re: [mlmmj] archive options
From: Piotr Auksztulewicz @ 2015-07-23 13:36 UTC
To: mlmmj
On Thu, Jul 23, 2015 at 09:23:58AM +0200, Morten Shearman Kirkegaard wrote:
> On Wed, Jul 22, 2015 at 07:04:43PM -0400, Chris Knadle wrote:
> > Because there were > 64k files the wildcards (?, *, etc) couldn't be
> > parsed by the system anymore, but the filesystem other than that didn't
> > have an issue
>
> The wildcards are handled by your shell, and will be expanded to a list
> of arguments to a program.
Exactly, and the problem is not the number of files, but the command line
length limit of your shell.
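The ceiling can be looked up rather than guessed. A small sketch, assuming a POSIX getconf is installed; the function name arg_max is hypothetical. On most systems the value reported is the operating system's limit on the combined size of the arguments and environment handed to a newly started program, which is what a very long wildcard expansion runs into.

```shell
#!/bin/sh
# Hypothetical helper: print the system's limit, in bytes, on the combined
# size of the argument list and environment passed to a new program.
arg_max() {
    getconf ARG_MAX
}
```

Running "arg_max" prints a single number; the value differs between systems.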
>
> > And I'm not sure what the limit on the number of files in a directory
> > is anymore -- it's probably filesystem dependent,
>
> It is.
There was never a limit on the number of files in a single directory in
unix-like filesystems. Early ones were just inefficient when scanning
large directories, because the structure was an unordered linear list,
originally with fixed-size records (e.g. 16 bytes: 2 bytes for the inode
number, 14 bytes for the null-terminated filename; of the filesystems
currently in use, Minix still has such a structure), later with
variable-sized records, allowing for longer file names.
Modern filesystems (including ext3 and ext4) use hashed or tree-like
structures for directories, and modern kernels apply extensive caching of
directory entries (and modern systems have much more memory for cache use),
so it is no longer a big problem.
> A small trick is to use the "find"
> command, instead of having your shell expand the filenames, e.g.:
>
> $ find list/archive/ -type f -name qwe\* -exec grep blah '{}' \;
>
> This would probably work on most modern systems, and be a bit faster as
> it doesn't execute grep once per file:
>
> $ find list/archive/ -type f -name qwe\* -exec grep blah '{}' +
>
The '+' feature works in GNU find. A compatible way to run a command with
multiple file names from a long list (too long to fit into one command line)
is to use xargs:
$ find <dir> <options> | xargs <command>
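As a concrete sketch of that pipeline: the function name grep_archive is hypothetical, and the -print0 / -0 pair is an assumption about the tools at hand (GNU and BSD find and xargs support it, older systems may not). It is used here so that file names containing spaces survive the pipe.

```shell
#!/bin/sh
# Hypothetical helper: list the files under a directory that contain a
# pattern, letting xargs split the file list into as many grep runs as
# needed. Assumes find -print0 and xargs -0 are available.
grep_archive() {
    pattern=$1
    dir=$2
    find "$dir" -type f -print0 | xargs -0 grep -l -- "$pattern"
}
```

For example, "grep_archive blah list/archive" prints the names of the archived messages that contain "blah".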
--
Piotr "Malgond" Auksztulewicz firstname@lastname.net