netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [0/9] pohmelfs for drivers/staging
@ 2009-02-09 14:02 Evgeniy Polyakov
  2009-02-09 14:02 ` [1/9] pohmelfs: documentation Evgeniy Polyakov
  2009-02-12 19:41 ` [0/9] pohmelfs for drivers/staging Greg KH
  0 siblings, 2 replies; 18+ messages in thread
From: Evgeniy Polyakov @ 2009-02-09 14:02 UTC (permalink / raw)
  To: GregKH; +Cc: linux-kernel, linux-fsdevel, netdev, Evgeniy Polyakov

POHMELFS stands for Parallel Optimized Host Message Exchange Layered File System.

This is a high performance network filesystem with local coherent cache of
data and metadata. Its main goal is distributed parallel processing of data.

POHMELFS is a kernel client for the developed distributed parallel internet
filesystem. As it exists today, it is a high-performance parallel network
filesystem with the ability to balance reading among multiple hosts and
simultaneously write data to multiple servers for redundancy.

The main design goal of this filesystem is to implement a very fast and
scalable network filesystem with local writeback cache of data and metadata,
which greatly speeds up every IO operation compared to traditional writethrough
based network filesystems. One can check recent POHMELFS, NFS and DST benchmarks
for comparison [5].

Read balancing and writing to the multiple hosts features can be used to
improve parallel multithreaded read-mostly data processing workload and
organize fault-tolerant systems. POHMELFS as a network client does not
support data synchrnonization between the nodes (after they were brought
up after the shutdown), so this task should be implemented in servers.
POHMELFS and multiple-server-write can be used as backup solution for
the physically distributed network servers.

Currently development is concentrated on the distributed object-based
server development implemeneted with distributed hash table design approach
in mind, which main goals it completely transparent from client point of
view node management, full absence of any controlling central servers
(points of failure), transaction/history based object storage [6]

POHMELFS utilizes writeback cache, which is built on top of MO(E)SI-like
coherency protocol. It uses scalable cached read/write locking. No additional
requests are performed if lock is granted to the filesystem. The same protocol
is used by the server for the on-demand flushing of the client's cache (for
example when server wants to update local data or send some new content
into the clients caches).

POHMELFS is able to encrypt data channel or perform strong data checksumming.
Algorithms used by the filesystems are autoconfigured during the startup and
mount may fail (depending on options) if server does not support requested
algorithms.

Autoconfiguration also involves sending information about size of the exported
directory specified by the server, permission, statistics about amount of inodes,
used space and so on.

POHMELFS utilizes transaction model for all its operations.
Each transction is an object, which may embed multiple commands completed
atomically. When server fails the whole transaction will be replied against it
(or different server) later. This approach allows to maintain high data integrity
and do not desynchronize filesystem state in case of network or server failures.

Basic POHMELFS features:

    * Local coherent cache for data and metadata. (Byte-range) locking. Locks were
    	prepared to be byte-range, but since all Linux filesystems lock the whole
	inode, it was decided to lock the whole object during writing. Actual
	messages being sent for locking/cache coherency protocol are byte-range,
	but because the whole inode is locked, lock is cached, so range actually
	is equal to the inode size. One can simultaneously write into the same page
	via different offsets from different client, and every time file will be
	coherent on all clients which do it and on the server itself.
    * Completely async processing of all events (hard and symlinks are the only
    	exceptions) including object creation and data reading and writing.
    * Flexible object architecture optimized for network processing. Ability to
    	create long pathes to object and remove arbitrary huge directories in
	single network command.
    * High performance is one of the main design goals.
    * Very fast and scalable multithreaded userspace server. Being in userspace it
    	works with any underlying filesystem and still is much faster than async
	in-kernel NFS one.
    * Transactions support. Full failover for all operations. Resending
    	transactions to different servers on timeout or error.
    * Client is able to switch between different servers (if one goes down, client
    	automatically reconnects to second and so on).
    * Client parallel extensions: ability to write to multiple servers and balance
    	reading between them.
    * Client dynamic server reconfiguration: ability to add/remove servers from
    	working set in run-time.
    * Strong authentification and possible data encryption in network channel.
    * Extended attributes support.
    * Read-only mounts, ability to limit maximum size of the exported directory.

This release fixes fair number of bugs introduced in the recent switch from own path
cache to dentry one. Also several long-standing coherency issues were successfully
resolved.

Since POHMELFS and its merge requests do not attract core kernel developers comments,
I ask for the drivers/staging merge and pushing it upstream in the next releases.
Current TODO entry is effectively empty (modulo bug fixes if any), but may include some
additional features requested by the users.

Development is concentrated on the distributed hash table server, which is
implemented as a userspace library suitable for the POHMELFS server usage and
external applications [6].

Patch is against vanilla tree.

One can grab sources from archive or GIT tree (web interface).

1. POHMELFS homepage.
http://www.ioremap.net/projects/pohmelfs

2. POHMELFS archive.
http://www.ioremap.net/archive/pohmelfs/

3. POHMELFS git tree.
http://www.ioremap.net/cgi-bin/gitweb.cgi

4. POHMELFS maillist.
pohmelfs@ioremap.net

5. Recent POHMELFS, NFS and DST benchmarks.
http://www.ioremap.net/node/156
http://www.ioremap.net/node/134
(the fun one which shows the power of the local metadata cache, but please note
that it was mostly done in RAM):
http://www.ioremap.net/node/153

6. The elliptics network - distributed hash table cloud
http://www.ioremap.net/taxonomy/term/17

Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>

 .../filesystems/pohmelfs/design_notes.txt          |   70 +
 Documentation/filesystems/pohmelfs/info.txt        |   86 +
 .../filesystems/pohmelfs/network_protocol.txt      |  227 +++
 drivers/staging/Kconfig                                         |    2 +
 drivers/staging/Makefile                                        |    1 +
 drivers/staging/pohmelfs/Kconfig                                |   23 +
 drivers/staging/pohmelfs/Makefile                               |    3 +
 drivers/staging/pohmelfs/config.c                               |  478 +++++
 drivers/staging/pohmelfs/crypto.c                               |  880 +++++++++
 drivers/staging/pohmelfs/dir.c                                  | 1093 +++++++++++
 drivers/staging/pohmelfs/inode.c                                | 1976 ++++++++++++++++++++
 drivers/staging/pohmelfs/lock.c                                 |  182 ++
 drivers/staging/pohmelfs/mcache.c                               |  171 ++
 drivers/staging/pohmelfs/net.c                                  | 1246 ++++++++++++
 drivers/staging/pohmelfs/netfs.h                                |  932 +++++++++
 drivers/staging/pohmelfs/path_entry.c                           |  114 ++
 drivers/staging/pohmelfs/trans.c                                |  715 +++++++
 mm/filemap.c                                       |    2 +
 18 files changed, 8201 insertions(+), 0 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [0/9] pohmelfs: The Great Southern Trendkill release.
@ 2008-12-26 14:18 Evgeniy Polyakov
  2008-12-26 14:18 ` [1/9] pohmelfs: documentation Evgeniy Polyakov
  0 siblings, 1 reply; 18+ messages in thread
From: Evgeniy Polyakov @ 2008-12-26 14:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, pohmelfs, netdev, linux-fsdevel, Evgeniy Polyakov

POHMELFS stands for Parallel Optimized Host Message Exchange Layered File System.

This is a high performance network filesystem with local coherent cache of
data and metadata. Its main goal is distributed parallel processing of data.

POHMELFS is a kernel client for the developed distributed parallel internet
filesystem. As it exists today, it is a high-performance parallel network
filesystem with ability to balance reading from multiple hosts and simultaneously
write data to multiple hosts.

Main design goal of this filesystem is to implement very fast and scalable
network filesystem with local writeback cache of data and metadata, which
greatly speeds up every IO operation compared to traditional writethrough
based network filesystems.

Read balancing and writing to multiple hosts features can be used to improve
parallel multithreaded read-mostly data processing workload and organize
fault-tolerant systems. POHMELFS as a network client does not support data
synchrnonization between the nodes, so this task should be implemented in
servers. POHMELFS and multiple-server-write can be used as backup solution
for the physically distributed network servers.

Currently development is concentrated on the distributed object-based
server development implemeneted with distributed hash table design approach
in mind, which main goals it completely transparent from client point of
view node management, full absence of any controlling central servers
(points of failure), transaction/history based object storage.

POHMELFS utilizes writeback cache, which is built on top of MO(E)SI-like
coherency protocol. It uses scalable cached read/write locking. No additional
requests are performed if lock is granted to the filesystem. The same protocol
is used by the server to on-demand flushing of the client's cache (for
example when server wants to update local data or send some new content
into the clients caches).

POHMELFS is able to encrypt data channel or perform strong data checksumming.
Algorithms used by the filesystems are autoconfigured during startup and
mount may fail (depending on options) if server does not support requested
algorithms.

Autoconfiguration also involve sending information about size of the exported
directory specified by the server, permission, statistics about amount of inodes,
used space and so on.

POHMELFS utilizes transaction model for all its operations.
Each transction is an object, which may embed multiple commands completed
atomically. When server fails the whole transaction will be replied against it
(or different server) later. This approach allows to maintain high data integrity
and do not desynchronize filesystem state in case of network or server failures.

Basic POHMELFS features:

    * Local coherent cache for data and metadata. (Byte-range) locking. Locks were
    	prepared to be byte-range, but since all Linux filesystems lock the whole
	inode, it was decided to lock the whole object during writing. Actual
	messages being sent for locking/cache coherency protocol are byte-range,
	but because the whole inode is locked, lock is cached, so range actually
	is equal to the inode size. One can simultaneously write into the same page
	via different offsets from different client, and every time file will be
	coherent on all clients which do it and on the server itself.
    * Completely async processing of all events (hard and symlinks are the only
    	exceptions) including object creation and data reading and writing.
    * Flexible object architecture optimized for network processing. Ability to
    	create long pathes to object and remove arbitrary huge directories in
	single network command.
    * High performance is one of the main design goals.
    * Very fast and scalable multithreaded userspace server. Being in userspace it
    	works with any underlying filesystem and still is much faster than async
	in-kernel NFS one.
    * Transactions support. Full failover for all operations. Resending
    	transactions to different servers on timeout or error.
    * Client is able to switch between different servers (if one goes down, client
    	automatically reconnects to second and so on).
    * Client parallel extensions: ability to write to multiple servers and balance
    	reading between them.
    * Client dynamic server reconfiguration: ability to add/remove servers from
    	working set in run-time.
    * Strong authentification and possible data encryption in network channel.
    * Extended attributes support.
    * Read-only mounts, ability to limit maximum size of the exported directory.

One can grab sources from archive or GIT tree (web interface).

1. POHMELFS homepage.
http://www.ioremap.net/projects/pohmelfs

2. POHMELFS archive.
http://www.ioremap.net/archive/pohmelfs/

3. POHMELFS git tree.
http://www.ioremap.net/cgi-bin/gitweb.cgi

4. POHMELFS maillist.
pohmelfs @ ioremap.net

Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>

.../filesystems/pohmelfs/design_notes.txt          |   70 +
 Documentation/filesystems/pohmelfs/info.txt        |   86 +
 .../filesystems/pohmelfs/network_protocol.txt      |  227 +++
 fs/Kconfig                                         |    2 +
 fs/Makefile                                        |    1 +
 fs/pohmelfs/Kconfig                                |   23 +
 fs/pohmelfs/Makefile                               |    3 +
 fs/pohmelfs/config.c                               |  478 +++++
 fs/pohmelfs/crypto.c                               |  880 +++++++++
 fs/pohmelfs/dir.c                                  | 1139 +++++++++++
 fs/pohmelfs/inode.c                                | 2085 ++++++++++++++++++++
 fs/pohmelfs/lock.c                                 |  186 ++
 fs/pohmelfs/mcache.c                               |  171 ++
 fs/pohmelfs/net.c                                  | 1180 +++++++++++
 fs/pohmelfs/netfs.h                                |  934 +++++++++
 fs/pohmelfs/path_entry.c                           |  356 ++++
 fs/pohmelfs/trans.c                                |  704 +++++++
 mm/filemap.c                                       |    2 +
 18 files changed, 8527 insertions(+), 0 deletions(-)



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2009-02-12 20:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-09 14:02 [0/9] pohmelfs for drivers/staging Evgeniy Polyakov
2009-02-09 14:02 ` [1/9] pohmelfs: documentation Evgeniy Polyakov
2009-02-09 14:02   ` [2/9] pohmelfs: configuration interface Evgeniy Polyakov
2009-02-09 14:02     ` [3/9] pohmelfs: crypto processing Evgeniy Polyakov
2009-02-09 14:02       ` [4/9] pohmelfs: directory operations Evgeniy Polyakov
2009-02-09 14:02         ` [5/9] pohmelfs: inode operations Evgeniy Polyakov
2009-02-09 14:02           ` [6/9] pohmelfs: distributed locking and cache coherency protocol Evgeniy Polyakov
2009-02-09 14:02             ` [7/9] pohmelfs: network operations Evgeniy Polyakov
2009-02-09 14:02               ` [8/9] pohmelfs: transaction layer Evgeniy Polyakov
2009-02-09 14:02                 ` [9/9] pohmelfs: kconfig/makefile and vfs changes Evgeniy Polyakov
2009-02-09 21:30                   ` [10/9] pohmelfs: select crypto modules from the config Evgeniy Polyakov
2009-02-12 19:41 ` [0/9] pohmelfs for drivers/staging Greg KH
2009-02-12 20:58   ` Evgeniy Polyakov
  -- strict thread matches above, loose matches on Subject: below --
2008-12-26 14:18 [0/9] pohmelfs: The Great Southern Trendkill release Evgeniy Polyakov
2008-12-26 14:18 ` [1/9] pohmelfs: documentation Evgeniy Polyakov
2008-12-26 19:22   ` Pavel Machek
2008-12-28 13:41     ` Evgeniy Polyakov
2008-12-28 22:19       ` Paolo Ciarrocchi
2008-12-30 21:23         ` Evgeniy Polyakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).