* Git for redundant mail servers
@ 2005-04-23  6:42 David Woodhouse
  2005-04-23  8:24 ` Jon Seymour
From: David Woodhouse @ 2005-04-23  6:42 UTC
  To: git; +Cc: David Gibson, jgarzik

Random alternative use for git... we could use it to provide a cluster
of redundant mail delivery/storage servers. 

The principle is simple; you use something like a set of Maildir
folders, stored in a git repository. Any action on the mail storage is
done as a commit -- that includes delivery of new mail, or user actions
from the IMAP server such as changing flags, deleting or moving mail.
These actions are actually fairly efficient when Maildir folders are
stored in a git repository -- the IMAP model is that mails are
immutable, and flag changes are done as renames.
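
A minimal sketch of the "flag change is a rename" point, assuming the
usual Maildir naming scheme of `unique:2,FLAGS` for files in `cur/`. The
helper name `set_flags` is made up for illustration and is not part of
any existing tool.

```python
import os

INFO_SEP = ":2,"  # Maildir separator between the unique name and the flags


def set_flags(path, flags):
    """Rename a Maildir message so that its info suffix carries `flags`.

    Only the directory entry changes; the message bytes are never rewritten.
    """
    directory, name = os.path.split(path)
    unique = name.split(INFO_SEP, 1)[0]
    # Maildir expects flag letters in ASCII order, each at most once.
    new_name = unique + INFO_SEP + "".join(sorted(set(flags)))
    new_path = os.path.join(directory, new_name)
    os.rename(path, new_path)
    return new_path
```

Because the file content is untouched, a commit recording this change
would only need a new tree; the stored mail itself can be reused.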

In the normal case where all the servers are online, each commit is
immediately pushed to each remote server. When a server is offline or
separated somehow from the rest of the group, it's going to have to do a
merge when it reconnects -- we'd implement a Maildir-specific merge
algorithm, which really isn't that hard to do.
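
One possible shape for such a merge, as a sketch only. It assumes each
mailbox state is modelled as a mapping from a message's unique Maildir
name to its set of flags, and it picks these (hypothetical) rules: a
deletion on either side wins, new mail from either side is kept, and
flag additions and removals from both sides are applied relative to the
common ancestor. The function name `merge_mailbox` is invented here.

```python
def merge_mailbox(base, ours, theirs):
    """Three-way merge of {unique_name: frozenset(flags)} mailbox states."""
    merged = {}
    for name in set(ours) | set(theirs):
        # A message known to the ancestor but missing on one side was deleted.
        if name in base and (name not in ours or name not in theirs):
            continue
        base_flags = base.get(name, frozenset())
        our_flags = ours.get(name, base_flags)
        their_flags = theirs.get(name, base_flags)
        # Combine each side's flag changes relative to the ancestor.
        added = (our_flags - base_flags) | (their_flags - base_flags)
        removed = (base_flags - our_flags) | (base_flags - their_flags)
        merged[name] = frozenset((base_flags | added) - removed)
    return merged
```

Other policies are equally plausible (for example, letting a flag change
resurrect a message deleted on the other side); the point is only that
the merge can work on names and flags without looking at mail content.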

In this case we'd probably want to make active use of the feature of git
which allows you to prune history. You don't need to keep any history
further back than the commit which will be the common ancestor when a
currently-absent member of the cluster eventually comes back. In the
common case, that will actually be no history at all, since all members
will be present.

You can then have multiple members of a cluster, each running an SMTP
server and allowing for delivery of email, and each running an IMAP
server. Clients can connect to any of the machines and receive IMAP
service, and email will continue to flow inward, as long as at least one
machine in the cluster remains alive. 

-- 
dwmw2



* Re: Git for redundant mail servers
  2005-04-23  6:42 Git for redundant mail servers David Woodhouse
@ 2005-04-23  8:24 ` Jon Seymour
  2005-04-24  5:12   ` David Lang
From: Jon Seymour @ 2005-04-23  8:24 UTC
  To: David Woodhouse; +Cc: git

On 4/23/05, David Woodhouse <dwmw2@infradead.org> wrote:
> Random alternative use for git... we could use it to provide a cluster
> of redundant mail delivery/storage servers.
> 
> The principle is simple; you use something like a set of Maildir
> folders, stored in a git repository. Any action on the mail storage is
> done as a commit -- that includes delivery of new mail, or user actions
> from the IMAP server such as changing flags, deleting or moving mail.
> These actions are actually fairly efficient when Maildir folders are
> stored in a git repository -- the IMAP model is that mails are
> immutable, and flag changes are done as renames.
> 
> In the normal case where all the servers are online, each commit is
> immediately pushed to each remote server. When a server is offline or
> separated somehow from the rest of the group, it's going to have to do a
> merge when it reconnects -- we'd implement a Maildir-specific merge
> algorithm, which really isn't that hard to do.
> 

This is a cool idea. When the concept is rendered this way, it sounds
a lot like some of the core principles in the architecture of the
Lotus Notes replication engine. I've always thought it would be cool
to have an open engine that provided similar functionality to the
Lotus Notes replication engine without the naff programming
environment that sits on top. I can see how the git concepts and code
could provide the basis of such a solution. Very cool.

jon.


* Re: Git for redundant mail servers
  2005-04-23  8:24 ` Jon Seymour
@ 2005-04-24  5:12   ` David Lang
  2005-04-24  5:54     ` David Woodhouse
From: David Lang @ 2005-04-24  5:12 UTC
  To: jon; +Cc: David Woodhouse, git

On Sat, 23 Apr 2005, Jon Seymour wrote:

> On 4/23/05, David Woodhouse <dwmw2@infradead.org> wrote:
>> Random alternative use for git... we could use it to provide a cluster
>> of redundant mail delivery/storage servers.
>>
>> The principle is simple; you use something like a set of Maildir
>> folders, stored in a git repository. Any action on the mail storage is
>> done as a commit -- that includes delivery of new mail, or user actions
>> from the IMAP server such as changing flags, deleting or moving mail.
>> These actions are actually fairly efficient when Maildir folders are
>> stored in a git repository -- the IMAP model is that mails are
>> immutable, and flag changes are done as renames.
>>
>> In the normal case where all the servers are online, each commit is
>> immediately pushed to each remote server. When a server is offline or
>> separated somehow from the rest of the group, it's going to have to do a
>> merge when it reconnects -- we'd implement a Maildir-specific merge
>> algorithm, which really isn't that hard to do.
>>
>
> This is a cool idea. When the concept is rendered this way, it sounds
> a lot like some of the core principles in the architecture of the
> Lotus Notes replication engine. I've always thought it would be cool
> to have an open engine that provided similar functionality to the
> Lotus Notes replication engine without the naff programming
> environment that sits on top. I can see how the git concepts and code
> could provide the basis of such a solution. Very cool.

Having been in several discussions on the cyrus mailing list about 
replication let me point out a couple basic problems that you have to work 
around.

1. when a new message arrives it gets given a numeric messageid, this 
message id is not supposed to change without fairly drastic things 
happening (the server telling all clients to forget everything they know 
about the status of the mailbox). this requires synchronization between 
servers if both are receiving messages.

2. git effectively stores snapshots of things and you deduce the changes by 
comparing the snapshots. for things like flags changing this is a 
relatively inefficient way to replicate changes (although if one server is 
offline for a while it could be a fairly efficient way to do the merge)

and now a couple of starting points

Cyrus already implements single-instance store so the concept of the same 
message living in multiple places doesn't have to be grafted in. it keeps 
the message flags separate from the messages themselves so the messages 
could be replicated separately from the state.

personally I'm not seeing git being a huge advantage for this, but I do 
see some advantages and it's very possible I'm missing some others.

go for it.

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare


* Re: Git for redundant mail servers
  2005-04-24  5:12   ` David Lang
@ 2005-04-24  5:54     ` David Woodhouse
  2005-04-24  7:45       ` David Lang
From: David Woodhouse @ 2005-04-24  5:54 UTC
  To: David Lang; +Cc: jon, git

On Sat, 2005-04-23 at 22:12 -0700, David Lang wrote:
> 1. when a new message arrives it gets given a numeric messageid, this 
> message id is not supposed to change without fairly drastic things 
> happening (the server telling all clients to forget everything they know 
> about the status of the mailbox). this requires synchronization between 
> servers if both are receiving messages.

Yeah, that's the most interesting part. One option would be to require
quorum before a server is allowed to add to a mailbox -- but that would
render the thing unsuitable for _intentional_ offline use, where you
want to be able to move mails from one folder to another on your laptop
while it's disconnected.

Since it should be relatively rare for 'competing' commits to occur
during periods of disconnection, I suspect that the solution doesn't
have to be particularly efficient. I'm not sure I'd really want to
change UIDVALIDITY if it happened, but perhaps we could simply remove
_all_ the affected UIDs, and assign new UIDs to the same mails.

In practice, it's far more important for us to ensure that an
existing UID _never_ refers to a different mail, than it is to make sure
that a given mail always keeps the same UID.
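
The "retire the affected UIDs and hand out new ones" idea could look
roughly like the sketch below. It assumes each server's view is a plain
mapping of UID to the mail's unique name, that both sides started from
the same `base` mapping, and that new UIDs must be higher than any UID
used so far (IMAP assigns UIDs in ascending order). The function name
`reconcile_uids` is hypothetical, and deletions are ignored for brevity.

```python
def reconcile_uids(base, ours, theirs):
    """Merge {uid: unique_name} maps without ever reusing a UID.

    A UID that the two sides gave to different mails is retired, and both
    mails receive fresh UIDs above everything seen so far.
    """
    merged = dict(base)
    next_uid = max([0, *base, *ours, *theirs]) + 1
    conflicted = []
    for uid in sorted(set(ours) | set(theirs)):
        if uid in base:
            continue  # assigned before the split, identical on both sides
        mine, other = ours.get(uid), theirs.get(uid)
        if mine is not None and other is not None and mine != other:
            conflicted.extend([mine, other])  # same UID, different mails
        else:
            merged[uid] = mine if mine is not None else other
    for name in conflicted:
        merged[next_uid] = name
        next_uid += 1
    return merged
```

With this policy a client may see a mail disappear and reappear under a
new UID, but a UID it has already cached never points at a different
mail, which is the property identified above as the important one.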

> 2. git effectively stores snapshots of things and you deduce the changes by 
> comparing the snapshots. for things like flags changing this is a 
> relatively inefficient way to replicate changes (although if one server is 
> offline for a while it could be a fairly efficient way to do the merge)

We don't have to stick _precisely_ to Maildir -- but flag changes are
just a rename in Maildir, leaving the mail object entirely intact while
changing only the tree. That isn't _so_ bad; but yes, it could probably
be done a little better than just "Maildir in git".
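
To illustrate why a rename leaves the mail object intact, here is a
small sketch. It assumes the blob ID is computed the way current Git
does it (SHA-1 over a `blob <size>` header, a NUL byte, and the
content), and it models a tree as a simple filename-to-blob-ID mapping;
`tree_after_rename` is a made-up helper, not a Git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob: SHA-1 over a NUL-terminated header plus content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def tree_after_rename(tree, old_name, new_name):
    """Return a copy of a {filename: blob_id} tree with one entry renamed."""
    renamed = dict(tree)
    renamed[new_name] = renamed.pop(old_name)
    return renamed


mail = b"Subject: hello\n\nbody\n"
before = {"1114238520.M1P2.host:2,": git_blob_id(mail)}
after = tree_after_rename(
    before, "1114238520.M1P2.host:2,", "1114238520.M1P2.host:2,S"
)
# The filename is not part of the hash, so the blob ID is the same in both.
```

The cost of a flag change is therefore one new tree object (and a
commit) per change, which is the overhead a less literal "Maildir in
git" layout might try to reduce.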

-- 
dwmw2



* Re: Git for redundant mail servers
  2005-04-24  5:54     ` David Woodhouse
@ 2005-04-24  7:45       ` David Lang
From: David Lang @ 2005-04-24  7:45 UTC
  To: David Woodhouse; +Cc: jon, git

On Sun, 24 Apr 2005, David Woodhouse wrote:

> Date: Sun, 24 Apr 2005 15:54:30 +1000
> From: David Woodhouse <dwmw2@infradead.org>
> To: David Lang <david.lang@digitalinsight.com>
> Cc: jon@zeta.org.au, git@vger.kernel.org
> Subject: Re: Git for redundant mail servers
> 
> On Sat, 2005-04-23 at 22:12 -0700, David Lang wrote:
>> 1. when a new message arrives it gets given a numeric messageid, this
>> message id is not supposed to change without fairly drastic things
>> happening (the server telling all clients to forget everything they know
>> about the status of the mailbox). this requires synchronization between
>> servers if both are receiving messages.
>
> Yeah, that's the most interesting part. One option would be to require
> quorum before a server is allowed to add to a mailbox -- but that would
> render the thing unsuitable for _intentional_ offline use, where you
> want to be able to move mails from one folder to another on your laptop
> while it's disconnected.

IMAP defines an offline mode; I haven't looked at it, but it would have to 
deal with this in some way.

> Since it should be relatively rare for 'competing' commits to occur
> during periods of disconnection, I suspect that the solution doesn't
> have to be particularly efficient. I'm not sure I'd really want to
> change UIDVALIDITY if it happened, but perhaps we could simply remove
> _all_ the affected UIDs, and assign new UIDs to the same mails.
>
> In practice, it's far more important for us to ensure that an
> existing UID _never_ refers to a different mail, than it is to make sure
> that a given mail always keeps the same UID.

good point.

there are two things that will cause competing commits of full mail 
messages.

1. new mail arriving from the Net (probably via SMTP/LMTP)

2. Client actions

2a. direct posting of messages (most common for FCC folders)

2b. copying of messages between folders

2c. flag changes

2d. expunging messages

>> 2. git effectively stores snapshots of things and you deduce the changes by
>> comparing the snapshots. for things like flags changing this is a
>> relatively inefficient way to replicate changes (although if one server is
>> offline for a while it could be a fairly efficient way to do the merge)
>
> We don't have to stick _precisely_ to Maildir -- but flag changes are
> just a rename in Maildir, leaving the mail object entirely intact while
> changing only the tree. That isn't _so_ bad; but yes, it could probably
> be done a little better than just "Maildir in git".

I'm familiar with Cyrus which has a similar concept of mail storage, but 
I'm only vaguely familiar with Maildir (I don't know all the details of how 
it does things)

the key question to answer is: are you trying to just replicate Maildir 
underneath the normal programs that use it, or are you trying to have a 
replicated mailserver and are willing to modify the software as well as 
fiddle with the storage?

if you are after the first then you have to do everything at the 
filesystem level. if you are after the second it's a much easier job, but 
you need to think carefully up front to decide what capabilities you need 
the software to have and pick the right software to start modifying.

David Lang



