From: Eric Wong <e@80x24.org>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jasper Spaans <j@jasper.es>,
kernelnewbies@kernelnewbies.org,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
linux-kernel@vger.kernel.org,
Eric Biederman <ebiederm@xmission.com>,
Joey Pabalinas <joeypabalinas@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Subject: Re: [RFC] LKML Archive in Maildir Format
Date: Tue, 5 Mar 2019 23:26:00 +0000
Message-ID: <20190305232600.GA12110@dcvr>
In-Reply-To: <CAErSpo5a2oO=5byEuA5AouS=kBmj7ihw2EVYAvJcdti29Tf1HQ@mail.gmail.com>
Bjorn Helgaas <bhelgaas@google.com> wrote:
> OK, so I understand how to clone archives from lore.kernel.org and how
> to convert a git archive to a maildir (thanks, Konstantin!)
>
> What I *don't* understand is how to effectively read this locally.
> Ideally I'd like to run mutt, possibly with notmuch for indexing. But
> a maildir with 3M files seems impractical. I did actually try it
> (without notmuch), but it takes mutt about 5 minutes to start up. And
> the maildir is about 23G, compared with 7.5G for the git archive.
Right, relying on Maildir for long-term storage of giant archives
is not a usable solution with any general purpose FSes I know about.
git itself had the same problem with loose object scalability in
the old days and packs were invented as a result.
> Any pointers? I guess there's no mutt backend that can read a
> public-inbox archive directly?
There are mutt patches to support reading over NNTP, so that
works:
    mutt -f news://$INBOX_HOST/$INBOX_NEWSGROUP
I don't think mutt handles mboxrd 100% correctly, but it's close
enough that you can download the gzipped mboxrd of a search
query and open it via "mutt -f /path/to/downloaded/mbox.gz":
    curl -XPOST -OJ "$INBOX_URL/?q=$SEARCH_QUERY&x=m"
POST is required(*), and -OJ lets curl use the
Content-Disposition: header for a meaningful server-generated
file name, but you can also redirect the result to wherever you want.
For all messages since March 1, you could use:
    SEARCH_QUERY=d:20190301..
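As a rough sketch of the download step, the two helpers below build the search URL and save the result under a file name of your choosing rather than the server-suggested one. The names search_url and fetch_mbox are made up for illustration and are not part of public-inbox or curl; the inbox URL is a placeholder, and the query is assumed to be URI-escaped already.

```shell
#!/bin/sh
# Hypothetical helper: print the mboxrd download URL for a search.
#   $1 = inbox URL (placeholder, e.g. the value of $INBOX_URL)
#   $2 = search query, assumed to be URI-escaped already
search_url() {
    # ${1%/} drops one trailing slash so the result never contains "//?q="
    printf '%s/?q=%s&x=m\n' "${1%/}" "$2"
}

# Hypothetical helper: POST the search and write the gzipped mboxrd to $3.
# -o picks the output file ourselves (instead of -OJ, which lets the
# server name it); -f makes curl exit non-zero on an HTTP error status.
fetch_mbox() {
    curl -f -XPOST -o "$3" "$(search_url "$1" "$2")"
}
```

Usage might look like: fetch_mbox "$INBOX_URL" 'd:20190301..' since-march.mbox.gz, followed by mutt -f since-march.mbox.gz.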
All the supported search queries are documented in
$INBOX_URL/_/text/help/ and the search prefixes (e.g. "d:",
"s:", "b:") are modeled after what's in mairix. You'll need to
escape the queries for URIs (e.g. " " => "+", and so on).
Xapian requires date ranges to be denoted with ".." whereas
mairix uses "-" for ranges.
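The two query-preparation points above (range syntax and URI escaping) can be sketched in plain sh. Both function names are hypothetical and not part of public-inbox or mairix. The range helper assumes a mairix-style term of exactly the form d:START-END with numeric dates, and the escaping helper assumes ASCII-only input.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a mairix-style date range (d:START-END)
# into the ".." form. Any term that does not match is printed unchanged.
mairix_to_xapian_range() {
    printf '%s\n' "$1" | sed 's/^d:\([0-9]*\)-\([0-9]*\)$/d:\1..\2/'
}

# Hypothetical helper: escape a query for use in a URL query string.
# Spaces become "+", unreserved characters pass through, and every
# other character becomes %XX. Assumes ASCII-only input.
# The body runs in a subshell, so LC_ALL and the scratch variables
# do not leak into the caller.
urlencode_query() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}         # everything after the first character
        c=${s%"$rest"}      # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            ' ') out=$out+ ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)
```

For instance, urlencode_query 's:maildir d:20190301..' prints s%3Amaildir+d%3A20190301.., which can then be substituted for $SEARCH_QUERY in the curl command.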
The main thing public-inbox search misses from mairix is support
for "-t" which grabs non-matching messages from the same thread.
I would like to support that someday, but don't have enough time
(or funding) to make it happen at the moment.
(*) to reliably avoid wasting resources from spiders/prefetchers