* Re: [4/9] pohmelfs: directory operations. @ 2009-01-14 0:43 Yehuda Sadeh Weinraub 2009-01-14 1:07 ` Evgeniy Polyakov 0 siblings, 1 reply; 6+ messages in thread From: Yehuda Sadeh Weinraub @ 2009-01-14 0:43 UTC (permalink / raw) To: linux-fsdevel Hi Evgeniy, I've played a bit with pohmelfs and it looks very nice. Specifically I was very happy with the easy configuration on both the server and the client. I understand that you manage the metadata operations asynchronously. How do you handle multiple clients in pohmelfs? Any specific locking? I did have some problems. I was trying to run dbench and it failed. Also I understand that you haven't implemented all vfs interface yet (chmod, chown, etc.). I was looking at the following code (more of at the comments), and it explained some strange stuff that I had seen: > +/* > + * Create new inode for given parameters (name, inode info, parent). > + * This does not create object on the server, it will be synced there > during writeback. > + */ > +struct pohmelfs_inode *pohmelfs_new_inode(struct pohmelfs_sb *psb, > + struct pohmelfs_inode *parent, struct qstr *str, > + struct netfs_inode_info *info, int link) ... > +/* > + * Create new object in local cache. Object will be synced to server > + * during writeback for given inode. > + */ > +struct pohmelfs_inode *pohmelfs_create_entry_local(struct pohmelfs_sb *psb, > + struct pohmelfs_inode *parent, struct qstr *str, u64 start, int mode) > +{ ... > +/* > + * Create local object and bind it to dentry. > + */ > +static int pohmelfs_create_entry(struct inode *dir, struct dentry > *dentry, u64 start, int mode) > +{ > + struct pohmelfs_sb *psb = POHMELFS_SB(dir->i_sb); > + struct pohmelfs_inode *npi, *parent; > + struct qstr str = dentry->d_name; > + int err; > + > + parent = POHMELFS_I(dir); > + > + err = pohmelfs_data_lock(parent, 0, ~0, POHMELFS_WRITE_LOCK); > + if (err) > + return err; > + > + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); > + > + npi = pohmelfs_create_entry_local(psb, parent, &str, start, mode); > + if (IS_ERR(npi)) > + return PTR_ERR(npi); > + > + d_instantiate(dentry, &npi->vfs_inode); > + > + dprintk("%s: parent: %llu, inode: %llu, name: '%s', parent_nlink: > %d, nlink: %d.\n", > + __func__, parent->ino, npi->ino, dentry->d_name.name, > + (signed)dir->i_nlink, (signed)npi->vfs_inode.i_nlink); > + > + return 0; > +} > + > +/* > + * VFS create and mkdir callbacks. > + */ > +static int pohmelfs_create(struct inode *dir, struct dentry *dentry, int mode, > + struct nameidata *nd) > +{ > + return pohmelfs_create_entry(dir, dentry, 0, mode); > +} > + > +static int pohmelfs_mkdir(struct inode *dir, struct dentry *dentry, int mode) > +{ > + int err; > + > + inode_inc_link_count(dir); > + err = pohmelfs_create_entry(dir, dentry, 0, mode | S_IFDIR); > + if (err) > + inode_dec_link_count(dir); > + > + return err; > +} I understand that you handle metadata operations locally, and for example when doing mkdir or creating a file, the file/directory is created locally, the inode is set dirty and when a writeback occurs everything is syncronized to the server. Isn't there some problem in cases where there are more than one client operating on the same directory with the same filename then? Moreover, it is possible that client A creates a file and client B creates a directory with the same name. Am I missing something, or is that the intended design? Thanks, Yehuda ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [4/9] pohmelfs: directory operations. 2009-01-14 0:43 [4/9] pohmelfs: directory operations Yehuda Sadeh Weinraub @ 2009-01-14 1:07 ` Evgeniy Polyakov 2009-01-14 1:28 ` Yehuda Sadeh Weinraub 0 siblings, 1 reply; 6+ messages in thread From: Evgeniy Polyakov @ 2009-01-14 1:07 UTC (permalink / raw) To: Yehuda Sadeh Weinraub; +Cc: linux-fsdevel Hi. On Tue, Jan 13, 2009 at 04:43:30PM -0800, Yehuda Sadeh Weinraub (yehudasa@gmail.com) wrote: > I've played a bit with pohmelfs and it looks very nice. Specifically > I was very happy with the easy configuration on both the server and > the client. Thanks a lot for giving it a try! > I understand that you manage the metadata operations asynchronously. > How do you handle multiple clients in pohmelfs? Any specific locking? There is a cache coherency protocol which is managed by the server, so effectively every access requests a lock and if another client influences the object, server will ask the first one to flush its changes to the server which will then be sent to the second client after it gets the lock. Locks are cached and not requested again when new operation happens, it will be requested only if server requested to drop the lock. > I did have some problems. I was trying to run dbench and it failed. > Also I understand that you haven't implemented all vfs interface yet > (chmod, chown, etc.). I was looking at the following code (more of at > the comments), and it explained some strange stuff that I had seen: They should be implemented already in the change attribute code. Attribute changes are flushed at writeback time or on server's demand. I will try to reproduce the problem locally, thanks for the report. Can you provide the output of the command, its parameters and the dmesg? > I understand that you handle metadata operations locally, and for > example when doing mkdir or creating a file, the file/directory is > created locally, the inode is set dirty and when a writeback occurs > everything is syncronized to the server. Isn't there some problem in > cases where there are more than one client operating on the same > directory with the same filename then? Moreover, it is possible that > client A creates a file and client B creates a directory with the same > name. Am I missing something, or is that the intended design? There are distributed locks used in the POHMELFS, and appropriate protocol maintains coherent cache on all the clients. In the example you showed, the first client will grab the parent directory lock, create a file there and then flush it to the server when the second client starts an operation. It will then detect that there is a file with given name and file the processing (or will overwrite its content, depending on the application of course). -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [4/9] pohmelfs: directory operations. 2009-01-14 1:07 ` Evgeniy Polyakov @ 2009-01-14 1:28 ` Yehuda Sadeh Weinraub 2009-01-14 1:31 ` Evgeniy Polyakov 0 siblings, 1 reply; 6+ messages in thread From: Yehuda Sadeh Weinraub @ 2009-01-14 1:28 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: linux-fsdevel >> I did have some problems. I was trying to run dbench and it failed. >> Also I understand that you haven't implemented all vfs interface yet >> (chmod, chown, etc.). I was looking at the following code (more of at >> the comments), and it explained some strange stuff that I had seen: > > They should be implemented already in the change attribute code. > Attribute changes are flushed at writeback time or on server's demand. > ok, might be that the following is the problem: int pohmelfs_setattr(struct dentry *dentry, struct iattr *attr) { ... if (!(psb->state_flags & POHMELFS_FLAGS_XATTR)) return -EOPNOTSUPP; Probably this test should be only on setxattr? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [4/9] pohmelfs: directory operations. 2009-01-14 1:28 ` Yehuda Sadeh Weinraub @ 2009-01-14 1:31 ` Evgeniy Polyakov 0 siblings, 0 replies; 6+ messages in thread From: Evgeniy Polyakov @ 2009-01-14 1:31 UTC (permalink / raw) To: Yehuda Sadeh Weinraub; +Cc: linux-fsdevel On Tue, Jan 13, 2009 at 05:28:29PM -0800, Yehuda Sadeh Weinraub (yehudasa@gmail.com) wrote: > > They should be implemented already in the change attribute code. > > Attribute changes are flushed at writeback time or on server's demand. > > > ok, might be that the following is the problem: > > int pohmelfs_setattr(struct dentry *dentry, struct iattr *attr) > { > ... > if (!(psb->state_flags & POHMELFS_FLAGS_XATTR)) > return -EOPNOTSUPP; > > > Probably this test should be only on setxattr? Definitely :) It should be moved there. Thanks for spotting this. -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 6+ messages in thread
* [0/9] pohmelfs for drivers/staging
@ 2009-02-09 14:02 Evgeniy Polyakov
2009-02-09 14:02 ` [1/9] pohmelfs: documentation Evgeniy Polyakov
0 siblings, 1 reply; 6+ messages in thread
From: Evgeniy Polyakov @ 2009-02-09 14:02 UTC (permalink / raw)
To: GregKH; +Cc: linux-kernel, linux-fsdevel, netdev, Evgeniy Polyakov
POHMELFS stands for Parallel Optimized Host Message Exchange Layered File System.
This is a high performance network filesystem with local coherent cache of
data and metadata. Its main goal is distributed parallel processing of data.
POHMELFS is a kernel client for the developed distributed parallel internet
filesystem. As it exists today, it is a high-performance parallel network
filesystem with the ability to balance reading among multiple hosts and
simultaneously write data to multiple servers for redundancy.
The main design goal of this filesystem is to implement a very fast and
scalable network filesystem with local writeback cache of data and metadata,
which greatly speeds up every IO operation compared to traditional writethrough
based network filesystems. One can check recent POHMELFS, NFS and DST benchmarks
for comparison [5].
Read balancing and writing to the multiple hosts features can be used to
improve parallel multithreaded read-mostly data processing workload and
organize fault-tolerant systems. POHMELFS as a network client does not
support data synchrnonization between the nodes (after they were brought
up after the shutdown), so this task should be implemented in servers.
POHMELFS and multiple-server-write can be used as backup solution for
the physically distributed network servers.
Currently development is concentrated on the distributed object-based
server development implemeneted with distributed hash table design approach
in mind, which main goals it completely transparent from client point of
view node management, full absence of any controlling central servers
(points of failure), transaction/history based object storage [6]
POHMELFS utilizes writeback cache, which is built on top of MO(E)SI-like
coherency protocol. It uses scalable cached read/write locking. No additional
requests are performed if lock is granted to the filesystem. The same protocol
is used by the server for the on-demand flushing of the client's cache (for
example when server wants to update local data or send some new content
into the clients caches).
POHMELFS is able to encrypt data channel or perform strong data checksumming.
Algorithms used by the filesystems are autoconfigured during the startup and
mount may fail (depending on options) if server does not support requested
algorithms.
Autoconfiguration also involves sending information about size of the exported
directory specified by the server, permission, statistics about amount of inodes,
used space and so on.
POHMELFS utilizes transaction model for all its operations.
Each transction is an object, which may embed multiple commands completed
atomically. When server fails the whole transaction will be replied against it
(or different server) later. This approach allows to maintain high data integrity
and do not desynchronize filesystem state in case of network or server failures.
Basic POHMELFS features:
* Local coherent cache for data and metadata. (Byte-range) locking. Locks were
prepared to be byte-range, but since all Linux filesystems lock the whole
inode, it was decided to lock the whole object during writing. Actual
messages being sent for locking/cache coherency protocol are byte-range,
but because the whole inode is locked, lock is cached, so range actually
is equal to the inode size. One can simultaneously write into the same page
via different offsets from different client, and every time file will be
coherent on all clients which do it and on the server itself.
* Completely async processing of all events (hard and symlinks are the only
exceptions) including object creation and data reading and writing.
* Flexible object architecture optimized for network processing. Ability to
create long pathes to object and remove arbitrary huge directories in
single network command.
* High performance is one of the main design goals.
* Very fast and scalable multithreaded userspace server. Being in userspace it
works with any underlying filesystem and still is much faster than async
in-kernel NFS one.
* Transactions support. Full failover for all operations. Resending
transactions to different servers on timeout or error.
* Client is able to switch between different servers (if one goes down, client
automatically reconnects to second and so on).
* Client parallel extensions: ability to write to multiple servers and balance
reading between them.
* Client dynamic server reconfiguration: ability to add/remove servers from
working set in run-time.
* Strong authentification and possible data encryption in network channel.
* Extended attributes support.
* Read-only mounts, ability to limit maximum size of the exported directory.
This release fixes fair number of bugs introduced in the recent switch from own path
cache to dentry one. Also several long-standing coherency issues were successfully
resolved.
Since POHMELFS and its merge requests do not attract core kernel developers comments,
I ask for the drivers/staging merge and pushing it upstream in the next releases.
Current TODO entry is effectively empty (modulo bug fixes if any), but may include some
additional features requested by the users.
Development is concentrated on the distributed hash table server, which is
implemented as a userspace library suitable for the POHMELFS server usage and
external applications [6].
Patch is against vanilla tree.
One can grab sources from archive or GIT tree (web interface).
1. POHMELFS homepage.
http://www.ioremap.net/projects/pohmelfs
2. POHMELFS archive.
http://www.ioremap.net/archive/pohmelfs/
3. POHMELFS git tree.
http://www.ioremap.net/cgi-bin/gitweb.cgi
4. POHMELFS maillist.
pohmelfs@ioremap.net
5. Recent POHMELFS, NFS and DST benchmarks.
http://www.ioremap.net/node/156
http://www.ioremap.net/node/134
(the fun one which shows the power of the local metadata cache, but please note
that it was mostly done in RAM):
http://www.ioremap.net/node/153
6. The elliptics network - distributed hash table cloud
http://www.ioremap.net/taxonomy/term/17
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
.../filesystems/pohmelfs/design_notes.txt | 70 +
Documentation/filesystems/pohmelfs/info.txt | 86 +
.../filesystems/pohmelfs/network_protocol.txt | 227 +++
drivers/staging/Kconfig | 2 +
drivers/staging/Makefile | 1 +
drivers/staging/pohmelfs/Kconfig | 23 +
drivers/staging/pohmelfs/Makefile | 3 +
drivers/staging/pohmelfs/config.c | 478 +++++
drivers/staging/pohmelfs/crypto.c | 880 +++++++++
drivers/staging/pohmelfs/dir.c | 1093 +++++++++++
drivers/staging/pohmelfs/inode.c | 1976 ++++++++++++++++++++
drivers/staging/pohmelfs/lock.c | 182 ++
drivers/staging/pohmelfs/mcache.c | 171 ++
drivers/staging/pohmelfs/net.c | 1246 ++++++++++++
drivers/staging/pohmelfs/netfs.h | 932 +++++++++
drivers/staging/pohmelfs/path_entry.c | 114 ++
drivers/staging/pohmelfs/trans.c | 715 +++++++
mm/filemap.c | 2 +
18 files changed, 8201 insertions(+), 0 deletions(-)
^ permalink raw reply [flat|nested] 6+ messages in thread* [1/9] pohmelfs: documentation. 2009-02-09 14:02 [0/9] pohmelfs for drivers/staging Evgeniy Polyakov @ 2009-02-09 14:02 ` Evgeniy Polyakov 2009-02-09 14:02 ` [2/9] pohmelfs: configuration interface Evgeniy Polyakov 0 siblings, 1 reply; 6+ messages in thread From: Evgeniy Polyakov @ 2009-02-09 14:02 UTC (permalink / raw) To: GregKH; +Cc: linux-kernel, linux-fsdevel, netdev, Eveniy Polyakov This patch includes POHMELFS design and implementation description. Separate file includes mount options, default parameters and usage examples. Signed-off-by: Eveniy Polyakov <zbr@ioremap.net> diff --git a/Documentation/filesystems/pohmelfs/design_notes.txt b/Documentation/filesystems/pohmelfs/design_notes.txt new file mode 100644 index 0000000..6d6db60 --- /dev/null +++ b/Documentation/filesystems/pohmelfs/design_notes.txt @@ -0,0 +1,70 @@ +POHMELFS: Parallel Optimized Host Message Exchange Layered File System. + + Evgeniy Polyakov <zbr@ioremap.net> + +Homepage: http://www.ioremap.net/projects/pohmelfs + +POHMELFS first began as a network filesystem with coherent local data and +metadata caches but is now evolving into a parallel distributed filesystem. + +Main features of this FS include: + * Locally coherent cache for data and metadata with (potentially) byte-range locks. + Since all Linux filesystems lock the whole inode during writing, algorithm + is very simple and does not use byte-ranges, although they are sent in + locking messages. + * Completely async processing of all events except creation of hard and symbolic + links, and rename events. + Object creation and data reading and writing are processed asynchronously. + * Flexible object architecture optimized for network processing. + Ability to create long paths to objects and remove arbitrarily huge + directories with a single network command. + (like removing the whole kernel tree via a single network command). + * Very high performance. + * Fast and scalable multithreaded userspace server. Being in userspace it works + with any underlying filesystem and still is much faster than async in-kernel NFS one. + * Client is able to switch between different servers (if one goes down, client + automatically reconnects to second and so on). + * Transactions support. Full failover for all operations. + Resending transactions to different servers on timeout or error. + * Read request (data read, directory listing, lookup requests) balancing between multiple servers. + * Write requests are replicated to multiple servers and completed only when all of them are acked. + * Ability to add and/or remove servers from the working set at run-time. + * Strong authentification and possible data encryption in network channel. + * Extended attributes support. + +POHMELFS is based on transactions, which are potentially long-standing objects that live +in the client's memory. Each transaction contains all the information needed to process a given +command (or set of commands, which is frequently used during data writing: single transactions +can contain creation and data writing commands). Transactions are committed by all the servers +to which they are sent and, in case of failures, are eventually resent or dropped with an error. +For example, reading will return an error if no servers are available. + +POHMELFS uses a asynchronous approach to data processing. Courtesy of transactions, it is +possible to detach replies from requests and, if the command requires data to be received, the +caller sleeps waiting for it. Thus, it is possible to issue multiple read commands to different +servers and async threads will pick up replies in parallel, find appropriate transactions in the +system and put the data where it belongs (like the page or inode cache). + +The main feature of POHMELFS is writeback data and the metadata cache. +Only a few non-performance critical operations use the write-through cache and +are synchronous: hard and symbolic link creation, and object rename. Creation, +removal of objects and data writing are asynchronous and are sent to +the server during system writeback. Only one writer at a time is allowed for any +given inode, which is guarded by an appropriate locking protocol. +Because of this feature, POHMELFS is extremely fast at metadata intensive +workloads and can fully utilize the bandwidth to the servers when doing bulk +data transfers. + +POHMELFS clients operate with a working set of servers and are capable of balancing read-only +operations (like lookups or directory listings) between them. +Administrators can add or remove servers from the set at run-time via special commands (described +in Documentation/pohmelfs/info.txt file). Writes are replicated to all servers. + +POHMELFS is capable of full data channel encryption and/or strong crypto hashing. +One can select any kernel supported cipher, encryption mode, hash type and operation mode +(hmac or digest). It is also possible to use both or neither (default). Crypto configuration +is checked during mount time and, if the server does not support it, appropriate capabilities +will be disabled or mount will fail (if 'crypto_fail_unsupported' mount option is specified). +Crypto performance heavily depends on the number of crypto threads, which asynchronously perform +crypto operations and send the resulting data to server or submit it up the stack. This number +can be controlled via a mount option. diff --git a/Documentation/filesystems/pohmelfs/info.txt b/Documentation/filesystems/pohmelfs/info.txt new file mode 100644 index 0000000..4e3d501 --- /dev/null +++ b/Documentation/filesystems/pohmelfs/info.txt @@ -0,0 +1,86 @@ +POHMELFS usage information. + +Mount options: +idx=%u + Each mountpoint is associated with a special index via this option. + Administrator can add or remove servers from the given index, so all mounts, + which were attached to it, are updated. + Default it is 0. + +trans_scan_timeout=%u + This timeout, expressed in milliseconds, specifies time to scan transaction + trees looking for stale requests, which have to be resent, or if number of + retries exceed specified limit, dropped with error. + Default is 5 seconds. + +drop_scan_timeout=%u + Internal timeout, expressed in milliseconds, which specifies how frequently + inodes marked to be dropped are freed. It also specifies how frequently + the system checks that servers have to be added or removed from current working set. + Default is 1 second. + +wait_on_page_timeout=%u + Number of milliseconds to wait for reply from remote server for data reading command. + If this timeout is exceeded, reading returns an error. + Default is 5 seconds. + +trans_retries=%u + This is the number of times that a transaction will be resent to a server that did + not answer for the last @trans_scan_timeout milliseconds. + When the number of resends exceeds this limit, the transaction is completed with error. + Default is 5 resends. + +crypto_thread_num=%u + Number of crypto processing threads. Threads are used both for RX and TX traffic. + Default is 2, or no threads if crypto operations are not supported. + +trans_max_pages=%u + Maximum number of pages in a single transaction. This parameter also controls + the number of pages, allocated for crypto processing (each crypto thread has + pool of pages, the number of which is equal to 'trans_max_pages'. + Default is 100 pages. + +crypto_fail_unsupported + If specified, mount will fail if the server does not support requested crypto operations. + By default mount will disable non-matching crypto operations. + +mcache_timeout=%u + Maximum number of milliseconds to wait for the mcache objects to be processed. + Mcache includes locks (given lock should be granted by server), attributes (they should be + fully received in the given timeframe). + Default is 5 seconds. + +Usage examples. + +Add (or remove if it already exists) server server1.net:1025 into the working set with index $idx +with appropriate hash algorithm and key file and cipher algorithm, mode and key file: +$cfg -a server1.net -p 1025 -i $idx -K $hash_key -k $cipher_key + +Mount filesystem with given index $idx to /mnt mountpoint. +Client will connect to all servers specified in the working set via previous command: +mount -t pohmel -o idx=$idx q /mnt + +One can add or remove servers from working set after mounting too. + + +Server installation. + +Creating a server, which listens at port 1025 and 0.0.0.0 address. +Working root directory (note, that server chroots there, so you have to have appropriate permissions) +is set to /mnt, server will negotiate hash/cipher with client, in case client requested it, there +are appropriate key files. +Number of working threads is set to 10. + +# ./fserver -a 0.0.0.0 -p 1025 -r /mnt -w 10 -K hash_key -k cipher_key + + -A 6 - listen on ipv6 address. Default: Disabled. + -r root - path to root directory. Default: /tmp. + -a addr - listen address. Default: 0.0.0.0. + -p port - listen port. Default: 1025. + -w workers - number of workers per connected client. Default: 1. + -K file - hash key size. Default: none. + -k file - cipher key size. Default: none. + -h - this help. + +Number of worker threads specifies how many workers will be created for each client. +Bulk single-client transafers usually are better handled with smaller number (like 1-3). diff --git a/Documentation/filesystems/pohmelfs/network_protocol.txt b/Documentation/filesystems/pohmelfs/network_protocol.txt new file mode 100644 index 0000000..40ea6c2 --- /dev/null +++ b/Documentation/filesystems/pohmelfs/network_protocol.txt @@ -0,0 +1,227 @@ +POHMELFS network protocol. + +Basic structure used in network communication is following command: + +struct netfs_cmd +{ + __u16 cmd; /* Command number */ + __u16 csize; /* Attached crypto information size */ + __u16 cpad; /* Attached padding size */ + __u16 ext; /* External flags */ + __u32 size; /* Size of the attached data */ + __u32 trans; /* Transaction id */ + __u64 id; /* Object ID to operate on. Used for feedback.*/ + __u64 start; /* Start of the object. */ + __u64 iv; /* IV sequence */ + __u8 data[0]; +}; + +Commands can be embedded into transaction command (which in turn has own command), +so one can extend protocol as needed without breaking backward compatibility as long +as old commands are supported. All string lengths include tail 0 byte. + +All commans are transfered over the network in big-endian. CPU endianess is used at the end peers. + +@cmd - command number, which specifies command to be processed. Following + commands are used currently: + + NETFS_READDIR = 1, /* Read directory for given inode number */ + NETFS_READ_PAGE, /* Read data page from the server */ + NETFS_WRITE_PAGE, /* Write data page to the server */ + NETFS_CREATE, /* Create directory entry */ + NETFS_REMOVE, /* Remove directory entry */ + NETFS_LOOKUP, /* Lookup single object */ + NETFS_LINK, /* Create a link */ + NETFS_TRANS, /* Transaction */ + NETFS_OPEN, /* Open intent */ + NETFS_INODE_INFO, /* Metadata cache coherency synchronization message */ + NETFS_PAGE_CACHE, /* Page cache invalidation message */ + NETFS_READ_PAGES, /* Read multiple contiguous pages in one go */ + NETFS_RENAME, /* Rename object */ + NETFS_CAPABILITIES, /* Capabilities of the client, for example supported crypto */ + NETFS_LOCK, /* Distributed lock message */ + NETFS_XATTR_SET, /* Set extended attribute */ + NETFS_XATTR_GET, /* Get extended attribute */ + +@ext - external flags. Used by different commands to specify some extra arguments + like partial size of the embedded objects or creation flags. + +@size - size of the attached data. For NETFS_READ_PAGE and NETFS_READ_PAGES no data is attached, + but size of the requested data is incorporated here. It does not include size of the command + header (struct netfs_cmd) itself. + +@id - id of the object this command operates on. Each command can use it for own purpose. + +@start - start of the object this command operates on. Each command can use it for own purpose. + +@csize, @cpad - size and padding size of the (attached if needed) crypto information. + +Command specifications. + +@NETFS_READDIR +This command is used to sync content of the remote dir to the client. + +@ext - length of the path to object. +@size - the same. +@id - local inode number of the directory to read. +@start - zero. + + +@NETFS_READ_PAGE +This command is used to read data from remote server. +Data size does not exceed local page cache size. + +@id - inode number. +@start - first byte offset. +@size - number of bytes to read plus length of the path to object. +@ext - object path length. + + +@NETFS_CREATE +Used to create object. +It does not require that all directories on top of the object were +already created, it will create them automatically. Each object has +associated @netfs_path_entry data structure, which contains creation +mode (permissions and type) and length of the name as long as name itself. + +@start - 0 +@size - size of the all data structures needed to create a path +@id - local inode number +@ext - 0 + + +@NETFS_REMOVE +Used to remove object. + +@ext - length of the path to object. +@size - the same. +@id - local inode number. +@start - zero. + + +@NETFS_LOOKUP +Lookup information about object on server. + +@ext - length of the path to object. +@size - the same. +@id - local inode number of the directory to look object in. +@start - local inode number of the object to look at. + + +@NETFS_LINK +Create hard of symlink. +Command is sent as "object_path|target_path". + +@size - size of the above string. +@id - parent local inode number. +@start - 1 for symlink, 0 for hardlink. +@ext - size of the "object_path" above. + + +@NETFS_TRANS +Transaction header. + +@size - incorporates all embedded command sizes including theirs header sizes. +@start - transaction generation number - unique id used to find transaction. +@ext - transaction flags. Unused at the moment. +@id - 0. + + +@NETFS_OPEN +Open intent for given transaction. + +@id - local inode number. +@start - 0. +@size - path length to the object. +@ext - open flags (O_RDWR and so on). + + +@NETFS_INODE_INFO +Metadata update command. +It is sent to servers when attributes of the object are changed and received +when data or metadata were updated. It operates with the following structure: + +struct netfs_inode_info +{ + unsigned int mode; + unsigned int nlink; + unsigned int uid; + unsigned int gid; + unsigned int blocksize; + unsigned int padding; + __u64 ino; + __u64 blocks; + __u64 rdev; + __u64 size; + __u64 version; +}; + +It effectively mirrors stat(2) returned data. + + +@ext - path length to the object. +@size - the same plus size of the netfs_inode_info structure. +@id - local inode number. +@start - 0. + + +@NETFS_PAGE_CACHE +Command is only received by clients. It contains information about +page to be marked as not up-to-date. + +@id - client's inode number. +@start - last byte of the page to be invalidated. If it is not equal to + current inode size, it will be vmtruncated(). +@size - 0 +@ext - 0 + + +@NETFS_READ_PAGES +Used to read multiple contiguous pages in one go. + +@start - first byte of the contiguous region to read. +@size - contains of two fields: lower 8 bits are used to represent page cache shift + used by client, another 3 bytes are used to get number of pages. +@id - local inode number. +@ext - path length to the object. + + +@NETFS_RENAME +Used to rename object. +Attached data is formed into following string: "old_path|new_path". + +@id - local inode number. +@start - parent inode number. +@size - length of the above string. +@ext - length of the old path part. + + +@NETFS_CAPABILITIES +Used to exchange crypto capabilities with server. +If crypto capabilities are not supported by server, then client will disable it +or fail (if 'crypto_fail_unsupported' mount options was specified). + +@id - superblock index. Used to specify crypto information for group of servers. +@size - size of the attached capabilities structure. +@start - 0. +@size - 0. +@scsize - 0. + +@NETFS_LOCK +Used to send lock request/release messages. Although it sends byte range request +and is capable of flushing pages based on that, it is not used, since all Linux +filesystems lock the whole inode. + +@id - lock generation number. +@start - start of the locked range. +@size - size of the locked range. +@ext - lock type: read/write. Not used actually. 15'th bit is used to determine, + if it is lock request (1) or release (0). + +@NETFS_XATTR_SET +@NETFS_XATTR_GET +Used to set/get extended attributes for given inode. +@id - attribute generation number or xattr setting type +@start - size of the attribute (request or attached) +@size - name length, path len and data size for given attribute +@ext - path length for given object ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [2/9] pohmelfs: configuration interface. 2009-02-09 14:02 ` [1/9] pohmelfs: documentation Evgeniy Polyakov @ 2009-02-09 14:02 ` Evgeniy Polyakov 2009-02-09 14:02 ` [3/9] pohmelfs: crypto processing Evgeniy Polyakov 0 siblings, 1 reply; 6+ messages in thread From: Evgeniy Polyakov @ 2009-02-09 14:02 UTC (permalink / raw) To: GregKH; +Cc: linux-kernel, linux-fsdevel, netdev, Evgeniy Polyakov This patch includes POHMELFS configuration interface based on the netlink kernel connector. This interface allows to create configuration groups in the kerenel indexed by mount ID option. Each configuration group can include multiple servers to work with and various operation parameters. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> diff --git a/drivers/staging/pohmelfs/config.c b/drivers/staging/pohmelfs/config.c new file mode 100644 index 0000000..941ab02 --- /dev/null +++ b/drivers/staging/pohmelfs/config.c @@ -0,0 +1,478 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/kernel.h> +#include <linux/connector.h> +#include <linux/crypto.h> +#include <linux/list.h> +#include <linux/mutex.h> +#include <linux/string.h> +#include <linux/in.h> + +#include "netfs.h" + +/* + * Global configuration list. + * Each client can be asked to get one of them. + * + * Allows to provide remote server address (ipv4/v6/whatever), port + * and so on via kernel connector. + */ + +static struct cb_id pohmelfs_cn_id = {.idx = POHMELFS_CN_IDX, .val = POHMELFS_CN_VAL}; +static LIST_HEAD(pohmelfs_config_list); +static DEFINE_MUTEX(pohmelfs_config_lock); + +static inline int pohmelfs_config_eql(struct pohmelfs_ctl *sc, struct pohmelfs_ctl *ctl) +{ + if (sc->idx == ctl->idx && sc->type == ctl->type && + sc->proto == ctl->proto && + sc->addrlen == ctl->addrlen && + !memcmp(&sc->addr, &ctl->addr, ctl->addrlen)) + return 1; + + return 0; +} + +static struct pohmelfs_config_group *pohmelfs_find_config_group(unsigned int idx) +{ + struct pohmelfs_config_group *g, *group = NULL; + + list_for_each_entry(g, &pohmelfs_config_list, group_entry) { + if (g->idx == idx) { + group = g; + break; + } + } + + return group; +} + +static struct pohmelfs_config_group *pohmelfs_find_create_config_group(unsigned int idx) +{ + struct pohmelfs_config_group *g; + + g = pohmelfs_find_config_group(idx); + if (g) + return g; + + g = kzalloc(sizeof(struct pohmelfs_config_group), GFP_KERNEL); + if (!g) + return NULL; + + INIT_LIST_HEAD(&g->config_list); + g->idx = idx; + g->num_entry = 0; + + list_add_tail(&g->group_entry, &pohmelfs_config_list); + + return g; +} + +int pohmelfs_copy_config(struct pohmelfs_sb *psb) +{ + struct pohmelfs_config_group *g; + struct pohmelfs_config *c, *dst; + int err = -ENODEV; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_config_group(psb->idx); + if (!g) + goto out_unlock; + + /* + * Run over all entries in given config group and try to crate and + * initialize those, which do not exist in superblock list. + * Skip all existing entries. + */ + + list_for_each_entry(c, &g->config_list, config_entry) { + err = 0; + list_for_each_entry(dst, &psb->state_list, config_entry) { + if (pohmelfs_config_eql(&dst->state.ctl, &c->state.ctl)) { + err = -EEXIST; + break; + } + } + + if (err) + continue; + + dst = kzalloc(sizeof(struct pohmelfs_config), GFP_KERNEL); + if (!dst) { + err = -ENOMEM; + break; + } + + memcpy(&dst->state.ctl, &c->state.ctl, sizeof(struct pohmelfs_ctl)); + + list_add_tail(&dst->config_entry, &psb->state_list); + + err = pohmelfs_state_init_one(psb, dst); + if (err) { + list_del(&dst->config_entry); + kfree(dst); + } + + err = 0; + } + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + + return err; +} + +int pohmelfs_copy_crypto(struct pohmelfs_sb *psb) +{ + struct pohmelfs_config_group *g; + int err = -ENOENT; + + mutex_lock(&pohmelfs_config_lock); + g = pohmelfs_find_config_group(psb->idx); + if (!g) + goto err_out_exit; + + if (g->hash_string) { + err = -ENOMEM; + psb->hash_string = kstrdup(g->hash_string, GFP_KERNEL); + if (!psb->hash_string) + goto err_out_exit; + psb->hash_strlen = g->hash_strlen; + } + + if (g->cipher_string) { + psb->cipher_string = kstrdup(g->cipher_string, GFP_KERNEL); + if (!psb->cipher_string) + goto err_out_free_hash_string; + psb->cipher_strlen = g->cipher_strlen; + } + + if (g->hash_keysize) { + psb->hash_key = kmalloc(g->hash_keysize, GFP_KERNEL); + if (!psb->hash_key) + goto err_out_free_cipher_string; + memcpy(psb->hash_key, g->hash_key, g->hash_keysize); + psb->hash_keysize = g->hash_keysize; + } + + if (g->cipher_keysize) { + psb->cipher_key = kmalloc(g->cipher_keysize, GFP_KERNEL); + if (!psb->cipher_key) + goto err_out_free_hash; + memcpy(psb->cipher_key, g->cipher_key, g->cipher_keysize); + psb->cipher_keysize = g->cipher_keysize; + } + + mutex_unlock(&pohmelfs_config_lock); + + return 0; + +err_out_free_hash: + kfree(psb->hash_key); +err_out_free_cipher_string: + kfree(psb->cipher_string); +err_out_free_hash_string: + kfree(psb->hash_string); +err_out_exit: + mutex_unlock(&pohmelfs_config_lock); + return err; +} + +static int pohmelfs_send_reply(int err, int msg_num, int action, struct cn_msg *msg, struct pohmelfs_ctl *ctl) +{ + struct pohmelfs_cn_ack *ack; + + ack = kmalloc(sizeof(struct pohmelfs_cn_ack), GFP_KERNEL); + if (!ack) + return -ENOMEM; + + memset(ack, 0, sizeof(struct pohmelfs_cn_ack)); + memcpy(&ack->msg, msg, sizeof(struct cn_msg)); + + if (action == POHMELFS_CTLINFO_ACK) + memcpy(&ack->ctl, ctl, sizeof(struct pohmelfs_ctl)); + + ack->msg.len = sizeof(struct pohmelfs_cn_ack) - sizeof(struct cn_msg); + ack->msg.ack = msg->ack + 1; + ack->error = err; + ack->msg_num = msg_num; + + cn_netlink_send(&ack->msg, 0, GFP_KERNEL); + kfree(ack); + return 0; +} + +static int pohmelfs_cn_disp(struct cn_msg *msg) +{ + struct pohmelfs_config_group *g; + struct pohmelfs_ctl *ctl = (struct pohmelfs_ctl *)msg->data; + struct pohmelfs_config *c, *tmp; + int err = 0, i = 1; + + if (msg->len != sizeof(struct pohmelfs_ctl)) + return -EBADMSG; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_config_group(ctl->idx); + if (!g) { + pohmelfs_send_reply(err, 0, POHMELFS_NOINFO_ACK, msg, NULL); + goto out_unlock; + } + + list_for_each_entry_safe(c, tmp, &g->config_list, config_entry) { + struct pohmelfs_ctl *sc = &c->state.ctl; + if (pohmelfs_send_reply(err, g->num_entry - i, POHMELFS_CTLINFO_ACK, msg, sc)) { + err = -ENOMEM; + goto out_unlock; + } + i += 1; + } + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + return err; +} + +static int pohmelfs_cn_ctl(struct cn_msg *msg, int action) +{ + struct pohmelfs_config_group *g; + struct pohmelfs_ctl *ctl = (struct pohmelfs_ctl *)msg->data; + struct pohmelfs_config *c, *tmp; + int err = 0; + + if (msg->len != sizeof(struct pohmelfs_ctl)) + return -EBADMSG; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_create_config_group(ctl->idx); + if (!g) { + err = -ENOMEM; + goto out_unlock; + } + + list_for_each_entry_safe(c, tmp, &g->config_list, config_entry) { + struct pohmelfs_ctl *sc = &c->state.ctl; + + if (pohmelfs_config_eql(sc, ctl)) { + if (action == POHMELFS_FLAGS_ADD) { + err = -EEXIST; + goto out_unlock; + } else if (action == POHMELFS_FLAGS_DEL) { + list_del(&c->config_entry); + g->num_entry--; + kfree(c); + goto out_unlock; + } else { + err = -EEXIST; + goto out_unlock; + } + } + } + if (action == POHMELFS_FLAGS_DEL) { + err = -EBADMSG; + goto out_unlock; + } + + c = kzalloc(sizeof(struct pohmelfs_config), GFP_KERNEL); + if (!c) { + err = -ENOMEM; + goto out_unlock; + } + memcpy(&c->state.ctl, ctl, sizeof(struct pohmelfs_ctl)); + g->num_entry++; + list_add_tail(&c->config_entry, &g->config_list); + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + if (pohmelfs_send_reply(err, 0, POHMELFS_NOINFO_ACK, msg, NULL)) + err = -ENOMEM; + + return err; +} + +static int pohmelfs_crypto_hash_init(struct pohmelfs_config_group *g, struct pohmelfs_crypto *c) +{ + char *algo = (char *)c->data; + u8 *key = (u8 *)(algo + c->strlen); + + if (g->hash_string) + return -EEXIST; + + g->hash_string = kstrdup(algo, GFP_KERNEL); + if (!g->hash_string) + return -ENOMEM; + g->hash_strlen = c->strlen; + g->hash_keysize = c->keysize; + + g->hash_key = kmalloc(c->keysize, GFP_KERNEL); + if (!g->hash_key) { + kfree(g->hash_string); + return -ENOMEM; + } + + memcpy(g->hash_key, key, c->keysize); + + return 0; +} + +static int pohmelfs_crypto_cipher_init(struct pohmelfs_config_group *g, struct pohmelfs_crypto *c) +{ + char *algo = (char *)c->data; + u8 *key = (u8 *)(algo + c->strlen); + + if (g->cipher_string) + return -EEXIST; + + g->cipher_string = kstrdup(algo, GFP_KERNEL); + if (!g->cipher_string) + return -ENOMEM; + g->cipher_strlen = c->strlen; + g->cipher_keysize = c->keysize; + + g->cipher_key = kmalloc(c->keysize, GFP_KERNEL); + if (!g->cipher_key) { + kfree(g->cipher_string); + return -ENOMEM; + } + + memcpy(g->cipher_key, key, c->keysize); + + return 0; +} + + +static int pohmelfs_cn_crypto(struct cn_msg *msg) +{ + struct pohmelfs_crypto *crypto = (struct pohmelfs_crypto *)msg->data; + struct pohmelfs_config_group *g; + int err = 0; + + dprintk("%s: idx: %u, strlen: %u, type: %u, keysize: %u, algo: %s.\n", + __func__, crypto->idx, crypto->strlen, crypto->type, + crypto->keysize, (char *)crypto->data); + + mutex_lock(&pohmelfs_config_lock); + g = pohmelfs_find_create_config_group(crypto->idx); + if (!g) { + err = -ENOMEM; + goto out_unlock; + } + + switch (crypto->type) { + case POHMELFS_CRYPTO_HASH: + err = pohmelfs_crypto_hash_init(g, crypto); + break; + case POHMELFS_CRYPTO_CIPHER: + err = pohmelfs_crypto_cipher_init(g, crypto); + break; + default: + err = -ENOTSUPP; + break; + } + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + if (pohmelfs_send_reply(err, 0, POHMELFS_NOINFO_ACK, msg, NULL)) + err = -ENOMEM; + + return err; +} + +static void pohmelfs_cn_callback(void *data) +{ + struct cn_msg *msg = data; + int err; + + switch (msg->flags) { + case POHMELFS_FLAGS_ADD: + err = pohmelfs_cn_ctl(msg, POHMELFS_FLAGS_ADD); + break; + case POHMELFS_FLAGS_DEL: + err = pohmelfs_cn_ctl(msg, POHMELFS_FLAGS_DEL); + break; + case POHMELFS_FLAGS_SHOW: + err = pohmelfs_cn_disp(msg); + break; + case POHMELFS_FLAGS_CRYPTO: + err = pohmelfs_cn_crypto(msg); + break; + default: + err = -ENOSYS; + break; + } +} + +int pohmelfs_config_check(struct pohmelfs_config *config, int idx) +{ + struct pohmelfs_ctl *ctl = &config->state.ctl; + struct pohmelfs_config *tmp; + int err = -ENOENT; + struct pohmelfs_ctl *sc; + struct pohmelfs_config_group *g; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_config_group(ctl->idx); + if (g) { + list_for_each_entry(tmp, &g->config_list, config_entry) { + sc = &tmp->state.ctl; + + if (pohmelfs_config_eql(sc, ctl)) { + err = 0; + break; + } + } + } + + mutex_unlock(&pohmelfs_config_lock); + + return err; +} + +int __init pohmelfs_config_init(void) +{ + return cn_add_callback(&pohmelfs_cn_id, "pohmelfs", pohmelfs_cn_callback); +} + +void pohmelfs_config_exit(void) +{ + struct pohmelfs_config *c, *tmp; + struct pohmelfs_config_group *g, *gtmp; + + cn_del_callback(&pohmelfs_cn_id); + + mutex_lock(&pohmelfs_config_lock); + list_for_each_entry_safe(g, gtmp, &pohmelfs_config_list, group_entry) { + list_for_each_entry_safe(c, tmp, &g->config_list, config_entry) { + list_del(&c->config_entry); + kfree(c); + } + + list_del(&g->group_entry); + + if (g->hash_string) + kfree(g->hash_string); + + if (g->cipher_string) + kfree(g->cipher_string); + + kfree(g); + } + mutex_unlock(&pohmelfs_config_lock); +} ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [3/9] pohmelfs: crypto processing. 2009-02-09 14:02 ` [2/9] pohmelfs: configuration interface Evgeniy Polyakov @ 2009-02-09 14:02 ` Evgeniy Polyakov 2009-02-09 14:02 ` [4/9] pohmelfs: directory operations Evgeniy Polyakov 0 siblings, 1 reply; 6+ messages in thread From: Evgeniy Polyakov @ 2009-02-09 14:02 UTC (permalink / raw) To: GregKH; +Cc: linux-kernel, linux-fsdevel, netdev, Evgeniy Polyakov POHMELFS is able to encrypt the whole network channel or attach the strong checksum to own packets to catch faulty media. This patch implements crypto initialization, its autoconfiguration and sync with the server. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> diff --git a/drivers/staging/pohmelfs/crypto.c b/drivers/staging/pohmelfs/crypto.c new file mode 100644 index 0000000..a3abd61 --- /dev/null +++ b/drivers/staging/pohmelfs/crypto.c @@ -0,0 +1,880 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/crypto.h> +#include <linux/highmem.h> +#include <linux/kthread.h> +#include <linux/pagemap.h> +#include <linux/slab.h> + +#include "netfs.h" + +static struct crypto_hash *pohmelfs_init_hash(struct pohmelfs_sb *psb) +{ + int err; + struct crypto_hash *hash; + + hash = crypto_alloc_hash(psb->hash_string, 0, CRYPTO_ALG_ASYNC); + if (IS_ERR(hash)) { + err = PTR_ERR(hash); + dprintk("%s: idx: %u: failed to allocate hash '%s', err: %d.\n", + __func__, psb->idx, psb->hash_string, err); + goto err_out_exit; + } + + psb->crypto_attached_size = crypto_hash_digestsize(hash); + + if (!psb->hash_keysize) + return hash; + + err = crypto_hash_setkey(hash, psb->hash_key, psb->hash_keysize); + if (err) { + dprintk("%s: idx: %u: failed to set key for hash '%s', err: %d.\n", + __func__, psb->idx, psb->hash_string, err); + goto err_out_free; + } + + return hash; + +err_out_free: + crypto_free_hash(hash); +err_out_exit: + return ERR_PTR(err); +} + +static struct crypto_ablkcipher *pohmelfs_init_cipher(struct pohmelfs_sb *psb) +{ + int err = -EINVAL; + struct crypto_ablkcipher *cipher; + + if (!psb->cipher_keysize) + goto err_out_exit; + + cipher = crypto_alloc_ablkcipher(psb->cipher_string, 0, 0); + if (IS_ERR(cipher)) { + err = PTR_ERR(cipher); + dprintk("%s: idx: %u: failed to allocate cipher '%s', err: %d.\n", + __func__, psb->idx, psb->cipher_string, err); + goto err_out_exit; + } + + crypto_ablkcipher_clear_flags(cipher, ~0); + + err = crypto_ablkcipher_setkey(cipher, psb->cipher_key, psb->cipher_keysize); + if (err) { + dprintk("%s: idx: %u: failed to set key for cipher '%s', err: %d.\n", + __func__, psb->idx, psb->cipher_string, err); + goto err_out_free; + } + + return cipher; + +err_out_free: + crypto_free_ablkcipher(cipher); +err_out_exit: + return ERR_PTR(err); +} + +int pohmelfs_crypto_engine_init(struct pohmelfs_crypto_engine *e, struct pohmelfs_sb *psb) +{ + int err; + + e->page_num = 0; + + e->size = PAGE_SIZE; + e->data = kmalloc(e->size, GFP_KERNEL); + if (!e->data) { + err = -ENOMEM; + goto err_out_exit; + } + + if (psb->hash_string) { + e->hash = pohmelfs_init_hash(psb); + if (IS_ERR(e->hash)) { + err = PTR_ERR(e->hash); + e->hash = NULL; + goto err_out_free; + } + } + + if (psb->cipher_string) { + e->cipher = pohmelfs_init_cipher(psb); + if (IS_ERR(e->cipher)) { + err = PTR_ERR(e->cipher); + e->cipher = NULL; + goto err_out_free_hash; + } + } + + return 0; + +err_out_free_hash: + crypto_free_hash(e->hash); +err_out_free: + kfree(e->data); +err_out_exit: + return err; +} + +void pohmelfs_crypto_engine_exit(struct pohmelfs_crypto_engine *e) +{ + if (e->hash) + crypto_free_hash(e->hash); + if (e->cipher) + crypto_free_ablkcipher(e->cipher); + kfree(e->data); +} + +static void pohmelfs_crypto_complete(struct crypto_async_request *req, int err) +{ + struct pohmelfs_crypto_completion *c = req->data; + + if (err == -EINPROGRESS) + return; + + dprintk("%s: req: %p, err: %d.\n", __func__, req, err); + c->error = err; + complete(&c->complete); +} + +static int pohmelfs_crypto_process(struct ablkcipher_request *req, + struct scatterlist *sg_dst, struct scatterlist *sg_src, + void *iv, int enc, unsigned long timeout) +{ + struct pohmelfs_crypto_completion complete; + int err; + + init_completion(&complete.complete); + complete.error = -EINPROGRESS; + + ablkcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG, + pohmelfs_crypto_complete, &complete); + + ablkcipher_request_set_crypt(req, sg_src, sg_dst, sg_src->length, iv); + + if (enc) + err = crypto_ablkcipher_encrypt(req); + else + err = crypto_ablkcipher_decrypt(req); + + switch (err) { + case -EINPROGRESS: + case -EBUSY: + err = wait_for_completion_interruptible_timeout(&complete.complete, + timeout); + if (!err) + err = -ETIMEDOUT; + else + err = complete.error; + break; + default: + break; + } + + return err; +} + +int pohmelfs_crypto_process_input_data(struct pohmelfs_crypto_engine *e, u64 cmd_iv, + void *data, struct page *page, unsigned int size) +{ + int err; + struct scatterlist sg; + + if (!e->cipher && !e->hash) + return 0; + + dprintk("%s: eng: %p, iv: %llx, data: %p, page: %p/%lu, size: %u.\n", + __func__, e, cmd_iv, data, page, (page)?page->index:0, size); + + if (data) { + sg_init_one(&sg, data, size); + } else { + sg_init_table(&sg, 1); + sg_set_page(&sg, page, size, 0); + } + + if (e->cipher) { + struct ablkcipher_request *req = e->data + crypto_hash_digestsize(e->hash); + u8 iv[32]; + + memset(iv, 0, sizeof(iv)); + memcpy(iv, &cmd_iv, sizeof(cmd_iv)); + + ablkcipher_request_set_tfm(req, e->cipher); + + err = pohmelfs_crypto_process(req, &sg, &sg, iv, 0, e->timeout); + if (err) + goto err_out_exit; + } + + if (e->hash) { + struct hash_desc desc; + void *dst = e->data + e->size/2; + + desc.tfm = e->hash; + desc.flags = 0; + + err = crypto_hash_init(&desc); + if (err) + goto err_out_exit; + + err = crypto_hash_update(&desc, &sg, size); + if (err) + goto err_out_exit; + + err = crypto_hash_final(&desc, dst); + if (err) + goto err_out_exit; + + err = !!memcmp(dst, e->data, crypto_hash_digestsize(e->hash)); + + if (err) { +#ifdef CONFIG_POHMELFS_DEBUG + unsigned int i; + unsigned char *recv = e->data, *calc = dst; + + dprintk("%s: eng: %p, hash: %p, cipher: %p: iv : %llx, hash mismatch (recv/calc): ", + __func__, e, e->hash, e->cipher, cmd_iv); + for (i=0; i<crypto_hash_digestsize(e->hash); ++i) { +#if 0 + dprintka("%02x ", recv[i]); + if (recv[i] != calc[i]) { + dprintka("| calc byte: %02x.\n", calc[i]); + break; + } +#else + dprintka("%02x/%02x ", recv[i], calc[i]); +#endif + } + dprintk("\n"); +#endif + goto err_out_exit; + } else { + dprintk("%s: eng: %p, hash: %p, cipher: %p: hashes matched.\n", + __func__, e, e->hash, e->cipher); + } + } + + dprintk("%s: eng: %p, size: %u, hash: %p, cipher: %p: completed.\n", + __func__, e, e->size, e->hash, e->cipher); + + return 0; + +err_out_exit: + dprintk("%s: eng: %p, hash: %p, cipher: %p: err: %d.\n", + __func__, e, e->hash, e->cipher, err); + return err; +} + +static int pohmelfs_trans_iter(struct netfs_trans *t, struct pohmelfs_crypto_engine *e, + int (* iterator) (struct pohmelfs_crypto_engine *e, + struct scatterlist *dst, + struct scatterlist *src)) +{ + void *data = t->iovec.iov_base + sizeof(struct netfs_cmd) + t->psb->crypto_attached_size; + unsigned int size = t->iovec.iov_len - sizeof(struct netfs_cmd) - t->psb->crypto_attached_size; + struct netfs_cmd *cmd = data; + unsigned int sz, pages = t->attached_pages, i, csize, cmd_cmd, dpage_idx; + struct scatterlist sg_src, sg_dst; + int err; + + while (size) { + cmd = data; + cmd_cmd = __be16_to_cpu(cmd->cmd); + csize = __be32_to_cpu(cmd->size); + cmd->iv = __cpu_to_be64(e->iv); + + if (cmd_cmd == NETFS_READ_PAGES || cmd_cmd == NETFS_READ_PAGE) + csize = __be16_to_cpu(cmd->ext); + + sz = csize + __be16_to_cpu(cmd->cpad) + sizeof(struct netfs_cmd); + + dprintk("%s: size: %u, sz: %u, cmd_size: %u, cmd_cpad: %u.\n", + __func__, size, sz, __be32_to_cpu(cmd->size), __be16_to_cpu(cmd->cpad)); + + data += sz; + size -= sz; + + sg_init_one(&sg_src, cmd->data, sz - sizeof(struct netfs_cmd)); + sg_init_one(&sg_dst, cmd->data, sz - sizeof(struct netfs_cmd)); + + err = iterator(e, &sg_dst, &sg_src); + if (err) + return err; + } + + if (!pages) + return 0; + + dpage_idx = 0; + for (i=0; i<t->page_num; ++i) { + struct page *page = t->pages[i]; + struct page *dpage = e->pages[dpage_idx]; + + if (!page) + continue; + + sg_init_table(&sg_src, 1); + sg_init_table(&sg_dst, 1); + sg_set_page(&sg_src, page, page_private(page), 0); + sg_set_page(&sg_dst, dpage, page_private(page), 0); + + err = iterator(e, &sg_dst, &sg_src); + if (err) + return err; + + pages--; + if (!pages) + break; + dpage_idx++; + } + + return 0; +} + +static int pohmelfs_encrypt_iterator(struct pohmelfs_crypto_engine *e, + struct scatterlist *sg_dst, struct scatterlist *sg_src) +{ + struct ablkcipher_request *req = e->data; + u8 iv[32]; + + memset(iv, 0, sizeof(iv)); + + memcpy(iv, &e->iv, sizeof(e->iv)); + + return pohmelfs_crypto_process(req, sg_dst, sg_src, iv, 1, e->timeout); +} + +static int pohmelfs_encrypt(struct pohmelfs_crypto_thread *tc) +{ + struct netfs_trans *t = tc->trans; + struct pohmelfs_crypto_engine *e = &tc->eng; + struct ablkcipher_request *req = e->data; + + memset(req, 0, sizeof(struct ablkcipher_request)); + ablkcipher_request_set_tfm(req, e->cipher); + + e->iv = pohmelfs_gen_iv(t); + + return pohmelfs_trans_iter(t, e, pohmelfs_encrypt_iterator); +} + +static int pohmelfs_hash_iterator(struct pohmelfs_crypto_engine *e, + struct scatterlist *sg_dst, struct scatterlist *sg_src) +{ + return crypto_hash_update(e->data, sg_src, sg_src->length); +} + +static int pohmelfs_hash(struct pohmelfs_crypto_thread *tc) +{ + struct pohmelfs_crypto_engine *e = &tc->eng; + struct hash_desc *desc = e->data; + unsigned char *dst = tc->trans->iovec.iov_base + sizeof(struct netfs_cmd); + int err; + + desc->tfm = e->hash; + desc->flags = 0; + + err = crypto_hash_init(desc); + if (err) + return err; + + err = pohmelfs_trans_iter(tc->trans, e, pohmelfs_hash_iterator); + if (err) + return err; + + err = crypto_hash_final(desc, dst); + if (err) + return err; + + { + unsigned int i; + dprintk("%s: ", __func__); + for (i=0; i<tc->psb->crypto_attached_size; ++i) + dprintka("%02x ", dst[i]); + dprintka("\n"); + } + + return 0; +} + +static void pohmelfs_crypto_pages_free(struct pohmelfs_crypto_engine *e) +{ + unsigned int i; + + for (i=0; i<e->page_num; ++i) + __free_page(e->pages[i]); + kfree(e->pages); +} + +static int pohmelfs_crypto_pages_alloc(struct pohmelfs_crypto_engine *e, struct pohmelfs_sb *psb) +{ + unsigned int i; + + e->pages = kmalloc(psb->trans_max_pages * sizeof(struct page *), GFP_KERNEL); + if (!e->pages) + return -ENOMEM; + + for (i=0; i<psb->trans_max_pages; ++i) { + e->pages[i] = alloc_page(GFP_KERNEL); + if (!e->pages[i]) + break; + } + + e->page_num = i; + if (!e->page_num) + goto err_out_free; + + return 0; + +err_out_free: + kfree(e->pages); + return -ENOMEM; +} + +static void pohmelfs_sys_crypto_exit_one(struct pohmelfs_crypto_thread *t) +{ + struct pohmelfs_sb *psb = t->psb; + + if (t->thread) + kthread_stop(t->thread); + + mutex_lock(&psb->crypto_thread_lock); + list_del(&t->thread_entry); + psb->crypto_thread_num--; + mutex_unlock(&psb->crypto_thread_lock); + + pohmelfs_crypto_engine_exit(&t->eng); + pohmelfs_crypto_pages_free(&t->eng); + kfree(t); +} + +static int pohmelfs_crypto_finish(struct netfs_trans *t, struct pohmelfs_sb *psb, int err) +{ + struct netfs_cmd *cmd = t->iovec.iov_base; + netfs_convert_cmd(cmd); + + if (likely(!err)) + err = netfs_trans_finish_send(t, psb); + + t->result = err; + netfs_trans_put(t); + + return err; +} + +void pohmelfs_crypto_thread_make_ready(struct pohmelfs_crypto_thread *th) +{ + struct pohmelfs_sb *psb = th->psb; + + th->page = NULL; + th->trans = NULL; + + mutex_lock(&psb->crypto_thread_lock); + list_move_tail(&th->thread_entry, &psb->crypto_ready_list); + mutex_unlock(&psb->crypto_thread_lock); + wake_up(&psb->wait); +} + +static int pohmelfs_crypto_thread_trans(struct pohmelfs_crypto_thread *t) +{ + struct netfs_trans *trans; + int err = 0; + + trans = t->trans; + trans->eng = NULL; + + if (t->eng.hash) { + err = pohmelfs_hash(t); + if (err) + goto out_complete; + } + + if (t->eng.cipher) { + err = pohmelfs_encrypt(t); + if (err) + goto out_complete; + trans->eng = &t->eng; + } + +out_complete: + t->page = NULL; + t->trans = NULL; + + if (!trans->eng) + pohmelfs_crypto_thread_make_ready(t); + + pohmelfs_crypto_finish(trans, t->psb, err); + return err; +} + +static int pohmelfs_crypto_thread_page(struct pohmelfs_crypto_thread *t) +{ + struct pohmelfs_crypto_engine *e = &t->eng; + struct page *page = t->page; + int err; + + WARN_ON(!PageChecked(page)); + + err = pohmelfs_crypto_process_input_data(e, e->iv, NULL, page, t->size); + if (!err) + SetPageUptodate(page); + else + SetPageError(page); + unlock_page(page); + page_cache_release(page); + + pohmelfs_crypto_thread_make_ready(t); + + return err; +} + +static int pohmelfs_crypto_thread_func(void *data) +{ + struct pohmelfs_crypto_thread *t = data; + + while (!kthread_should_stop()) { + wait_event_interruptible(t->wait, kthread_should_stop() || + t->trans || t->page); + + if (kthread_should_stop()) + break; + + if (!t->trans && !t->page) + continue; + + dprintk("%s: thread: %p, trans: %p, page: %p.\n", + __func__, t, t->trans, t->page); + + if (t->trans) + pohmelfs_crypto_thread_trans(t); + else if (t->page) + pohmelfs_crypto_thread_page(t); + } + + return 0; +} + +static void pohmelfs_crypto_flush(struct pohmelfs_sb *psb, struct list_head *head) +{ + while (!list_empty(head)) { + struct pohmelfs_crypto_thread *t = NULL; + + mutex_lock(&psb->crypto_thread_lock); + if (!list_empty(head)) { + t = list_first_entry(head, struct pohmelfs_crypto_thread, thread_entry); + list_del_init(&t->thread_entry); + } + mutex_unlock(&psb->crypto_thread_lock); + + if (t) + pohmelfs_sys_crypto_exit_one(t); + } +} + +static void pohmelfs_sys_crypto_exit(struct pohmelfs_sb *psb) +{ + while (!list_empty(&psb->crypto_active_list) || !list_empty(&psb->crypto_ready_list)) { + dprintk("%s: crypto_thread_num: %u.\n", __func__, psb->crypto_thread_num); + pohmelfs_crypto_flush(psb, &psb->crypto_active_list); + pohmelfs_crypto_flush(psb, &psb->crypto_ready_list); + } +} + +static int pohmelfs_sys_crypto_init(struct pohmelfs_sb *psb) +{ + unsigned int i; + struct pohmelfs_crypto_thread *t; + struct pohmelfs_config *c; + struct netfs_state *st; + int err; + + list_for_each_entry(c, &psb->state_list, config_entry) { + st = &c->state; + + err = pohmelfs_crypto_engine_init(&st->eng, psb); + if (err) + goto err_out_exit; + + dprintk("%s: st: %p, eng: %p, hash: %p, cipher: %p.\n", + __func__, st, &st->eng, &st->eng.hash, &st->eng.cipher); + } + + for (i=0; i<psb->crypto_thread_num; ++i) { + err = -ENOMEM; + t = kzalloc(sizeof(struct pohmelfs_crypto_thread), GFP_KERNEL); + if (!t) + goto err_out_free_state_engines; + + init_waitqueue_head(&t->wait); + + t->psb = psb; + t->trans = NULL; + t->eng.thread = t; + + err = pohmelfs_crypto_engine_init(&t->eng, psb); + if (err) + goto err_out_free_state_engines; + + err = pohmelfs_crypto_pages_alloc(&t->eng, psb); + if (err) + goto err_out_free; + + t->thread = kthread_run(pohmelfs_crypto_thread_func, t, + "pohmelfs-crypto-%d-%d", psb->idx, i); + if (IS_ERR(t->thread)) { + err = PTR_ERR(t->thread); + t->thread = NULL; + goto err_out_free; + } + + if (t->eng.cipher) + psb->crypto_align_size = crypto_ablkcipher_blocksize(t->eng.cipher); + + mutex_lock(&psb->crypto_thread_lock); + list_add_tail(&t->thread_entry, &psb->crypto_ready_list); + mutex_unlock(&psb->crypto_thread_lock); + } + + psb->crypto_thread_num = i; + return 0; + +err_out_free: + pohmelfs_sys_crypto_exit_one(t); +err_out_free_state_engines: + list_for_each_entry(c, &psb->state_list, config_entry) { + st = &c->state; + pohmelfs_crypto_engine_exit(&st->eng); + } +err_out_exit: + pohmelfs_sys_crypto_exit(psb); + return err; +} + +void pohmelfs_crypto_exit(struct pohmelfs_sb *psb) +{ + pohmelfs_sys_crypto_exit(psb); + + kfree(psb->hash_string); + kfree(psb->cipher_string); +} + +static int pohmelfs_crypt_init_complete(struct page **pages, unsigned int page_num, + void *private, int err) +{ + struct pohmelfs_sb *psb = private; + + psb->flags = -err; + dprintk("%s: err: %d.\n", __func__, err); + + wake_up(&psb->wait); + + return err; +} + +static int pohmelfs_crypto_init_handshake(struct pohmelfs_sb *psb) +{ + struct netfs_trans *t; + struct netfs_crypto_capabilities *cap; + struct netfs_cmd *cmd; + char *str; + int err = -ENOMEM, size; + + size = sizeof(struct netfs_crypto_capabilities) + + psb->cipher_strlen + psb->hash_strlen + 2; /* 0 bytes */ + + t = netfs_trans_alloc(psb, size, 0, 0); + if (!t) + goto err_out_exit; + + t->complete = pohmelfs_crypt_init_complete; + t->private = psb; + + cmd = netfs_trans_current(t); + cap = (struct netfs_crypto_capabilities *)(cmd + 1); + str = (char *)(cap + 1); + + cmd->cmd = NETFS_CAPABILITIES; + cmd->id = POHMELFS_CRYPTO_CAPABILITIES; + cmd->size = size; + cmd->start = 0; + cmd->ext = 0; + cmd->csize = 0; + + netfs_convert_cmd(cmd); + netfs_trans_update(cmd, t, size); + + cap->hash_strlen = psb->hash_strlen; + if (cap->hash_strlen) { + sprintf(str, "%s", psb->hash_string); + str += cap->hash_strlen; + } + + cap->cipher_strlen = psb->cipher_strlen; + cap->cipher_keysize = psb->cipher_keysize; + if (cap->cipher_strlen) + sprintf(str, "%s", psb->cipher_string); + + netfs_convert_crypto_capabilities(cap); + + psb->flags = ~0; + err = netfs_trans_finish(t, psb); + if (err) + goto err_out_exit; + + err = wait_event_interruptible_timeout(psb->wait, (psb->flags != ~0), + psb->wait_on_page_timeout); + if (!err) + err = -ETIMEDOUT; + else + err = -psb->flags; + + if (!err) + psb->perform_crypto = 1; + psb->flags = 0; + + /* + * At this point NETFS_CAPABILITIES response command + * should setup superblock in a way, which is acceptible + * for both client and server, so if server refuses connection, + * it will send error in transaction response. + */ + + if (err) + goto err_out_exit; + + return 0; + +err_out_exit: + return err; +} + +int pohmelfs_crypto_init(struct pohmelfs_sb *psb) +{ + int err; + + if (!psb->cipher_string && !psb->hash_string) + return 0; + + err = pohmelfs_crypto_init_handshake(psb); + if (err) + return err; + + err = pohmelfs_sys_crypto_init(psb); + if (err) + return err; + + return 0; +} + +static int pohmelfs_crypto_thread_get(struct pohmelfs_sb *psb, + int (* action)(struct pohmelfs_crypto_thread *t, void *data), void *data) +{ + struct pohmelfs_crypto_thread *t = NULL; + int err; + + while (!t) { + err = wait_event_interruptible_timeout(psb->wait, + !list_empty(&psb->crypto_ready_list), + psb->wait_on_page_timeout); + + t = NULL; + err = 0; + mutex_lock(&psb->crypto_thread_lock); + if (!list_empty(&psb->crypto_ready_list)) { + t = list_entry(psb->crypto_ready_list.prev, + struct pohmelfs_crypto_thread, + thread_entry); + + list_move_tail(&t->thread_entry, + &psb->crypto_active_list); + + action(t, data); + wake_up(&t->wait); + + } + mutex_unlock(&psb->crypto_thread_lock); + } + + return err; +} + +static int pohmelfs_trans_crypt_action(struct pohmelfs_crypto_thread *t, void *data) +{ + struct netfs_trans *trans = data; + + netfs_trans_get(trans); + t->trans = trans; + + dprintk("%s: t: %p, gen: %u, thread: %p.\n", __func__, trans, trans->gen, t); + return 0; +} + +int pohmelfs_trans_crypt(struct netfs_trans *trans, struct pohmelfs_sb *psb) +{ + if ((!psb->hash_string && !psb->cipher_string) || !psb->perform_crypto) { + netfs_trans_get(trans); + return pohmelfs_crypto_finish(trans, psb, 0); + } + + return pohmelfs_crypto_thread_get(psb, pohmelfs_trans_crypt_action, trans); +} + +struct pohmelfs_crypto_input_action_data +{ + struct page *page; + struct pohmelfs_crypto_engine *e; + u64 iv; + unsigned int size; +}; + +static int pohmelfs_crypt_input_page_action(struct pohmelfs_crypto_thread *t, void *data) +{ + struct pohmelfs_crypto_input_action_data *act = data; + + memcpy(t->eng.data, act->e->data, t->psb->crypto_attached_size); + + t->size = act->size; + t->eng.iv = act->iv; + + t->page = act->page; + return 0; +} + +int pohmelfs_crypto_process_input_page(struct pohmelfs_crypto_engine *e, + struct page *page, unsigned int size, u64 iv) +{ + struct inode *inode = page->mapping->host; + struct pohmelfs_crypto_input_action_data act; + int err = -ENOENT; + + act.page = page; + act.e = e; + act.size = size; + act.iv = iv; + + err = pohmelfs_crypto_thread_get(POHMELFS_SB(inode->i_sb), + pohmelfs_crypt_input_page_action, &act); + if (err) + goto err_out_exit; + + return 0; + +err_out_exit: + SetPageUptodate(page); + page_cache_release(page); + + return err; +} ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [4/9] pohmelfs: directory operations. 2009-02-09 14:02 ` [3/9] pohmelfs: crypto processing Evgeniy Polyakov @ 2009-02-09 14:02 ` Evgeniy Polyakov 0 siblings, 0 replies; 6+ messages in thread From: Evgeniy Polyakov @ 2009-02-09 14:02 UTC (permalink / raw) To: GregKH; +Cc: linux-kernel, linux-fsdevel, netdev, Evgeniy Polyakov This patch implementes all supported directory operations like directory reading, object lookup, creation, removal and so on. Currently object removal is not optimized at all. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> diff --git a/drivers/staging/pohmelfs/dir.c b/drivers/staging/pohmelfs/dir.c new file mode 100644 index 0000000..3b795c9 --- /dev/null +++ b/drivers/staging/pohmelfs/dir.c @@ -0,0 +1,1093 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/kernel.h> +#include <linux/fs.h> +#include <linux/jhash.h> +#include <linux/namei.h> +#include <linux/pagemap.h> + +#include "netfs.h" + +static int pohmelfs_cmp_hash(struct pohmelfs_name *n, u32 hash) +{ + if (n->hash > hash) + return -1; + if (n->hash < hash) + return 1; + + return 0; +} + +static struct pohmelfs_name *pohmelfs_search_hash_unprecise(struct pohmelfs_inode *pi, u32 hash) +{ + struct rb_node *n = pi->hash_root.rb_node; + struct pohmelfs_name *tmp = NULL; + int cmp; + + while (n) { + tmp = rb_entry(n, struct pohmelfs_name, hash_node); + + cmp = pohmelfs_cmp_hash(tmp, hash); + if (cmp < 0) + n = n->rb_left; + else if (cmp > 0) + n = n->rb_right; + else + break; + + } + + return tmp; +} + +struct pohmelfs_name *pohmelfs_search_hash(struct pohmelfs_inode *pi, u32 hash) +{ + struct pohmelfs_name *tmp; + + tmp = pohmelfs_search_hash_unprecise(pi, hash); + if (tmp && (tmp->hash == hash)) + return tmp; + + return NULL; +} + +static void __pohmelfs_name_del(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + rb_erase(&node->hash_node, &parent->hash_root); +} + +/* + * Remove name cache entry from its caches and free it. + */ +static void pohmelfs_name_free(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + __pohmelfs_name_del(parent, node); + list_del(&node->sync_create_entry); + kfree(node); +} + +static struct pohmelfs_name *pohmelfs_insert_hash(struct pohmelfs_inode *pi, + struct pohmelfs_name *new) +{ + struct rb_node **n = &pi->hash_root.rb_node, *parent = NULL; + struct pohmelfs_name *ret = NULL, *tmp; + int cmp; + + while (*n) { + parent = *n; + + tmp = rb_entry(parent, struct pohmelfs_name, hash_node); + + cmp = pohmelfs_cmp_hash(tmp, new->hash); + if (cmp < 0) + n = &parent->rb_left; + else if (cmp > 0) + n = &parent->rb_right; + else { + ret = tmp; + break; + } + } + + if (ret) { + printk("%s: exist: parent: %llu, ino: %llu, hash: %x, len: %u, data: '%s', " + "new: ino: %llu, hash: %x, len: %u, data: '%s'.\n", + __func__, pi->ino, + ret->ino, ret->hash, ret->len, ret->data, + new->ino, new->hash, new->len, new->data); + ret->ino = new->ino; + return ret; + } + + rb_link_node(&new->hash_node, parent, n); + rb_insert_color(&new->hash_node, &pi->hash_root); + + return NULL; +} + +/* + * Free name cache for given inode. + */ +void pohmelfs_free_names(struct pohmelfs_inode *parent) +{ + struct rb_node *rb_node; + struct pohmelfs_name *n; + + for (rb_node = rb_first(&parent->hash_root); rb_node;) { + n = rb_entry(rb_node, struct pohmelfs_name, hash_node); + rb_node = rb_next(rb_node); + + pohmelfs_name_free(parent, n); + } +} + +static void pohmelfs_fix_offset(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + parent->total_len -= node->len; +} + +/* + * Free name cache entry helper. + */ +void pohmelfs_name_del(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + pohmelfs_fix_offset(parent, node); + pohmelfs_name_free(parent, node); +} + +/* + * Insert new name cache entry into all hash cache. + */ +static int pohmelfs_insert_name(struct pohmelfs_inode *parent, struct pohmelfs_name *n) +{ + struct pohmelfs_name *name; + + name = pohmelfs_insert_hash(parent, n); + if (name) + return -EEXIST; + + parent->total_len += n->len; + list_add_tail(&n->sync_create_entry, &parent->sync_create_list); + + return 0; +} + +/* + * Allocate new name cache entry. + */ +static struct pohmelfs_name *pohmelfs_name_alloc(unsigned int len) +{ + struct pohmelfs_name *n; + + n = kzalloc(sizeof(struct pohmelfs_name) + len, GFP_KERNEL); + if (!n) + return NULL; + + INIT_LIST_HEAD(&n->sync_create_entry); + + n->data = (char *)(n+1); + + return n; +} + +/* + * Add new name entry into directory's cache. + */ +static int pohmelfs_add_dir(struct pohmelfs_sb *psb, struct pohmelfs_inode *parent, + struct pohmelfs_inode *npi, struct qstr *str, unsigned int mode, int link) +{ + int err = -ENOMEM; + struct pohmelfs_name *n; + + n = pohmelfs_name_alloc(str->len + 1); + if (!n) + goto err_out_exit; + + n->ino = npi->ino; + n->mode = mode; + n->len = str->len; + n->hash = str->hash; + sprintf(n->data, "%s", str->name); + + mutex_lock(&parent->offset_lock); + err = pohmelfs_insert_name(parent, n); + mutex_unlock(&parent->offset_lock); + + if (err) { + if (err != -EEXIST) + goto err_out_free; + kfree(n); + } + + return 0; + +err_out_free: + kfree(n); +err_out_exit: + return err; +} + +/* + * Create new inode for given parameters (name, inode info, parent). + * This does not create object on the server, it will be synced there during writeback. + */ +struct pohmelfs_inode *pohmelfs_new_inode(struct pohmelfs_sb *psb, + struct pohmelfs_inode *parent, struct qstr *str, + struct netfs_inode_info *info, int link) +{ + struct inode *new = NULL; + struct pohmelfs_inode *npi; + int err = -EEXIST; + + dprintk("%s: creating inode: parent: %llu, ino: %llu, str: %p.\n", + __func__, (parent)?parent->ino:0, info->ino, str); + + err = -ENOMEM; + new = iget_locked(psb->sb, info->ino); + if (!new) + goto err_out_exit; + + npi = POHMELFS_I(new); + npi->ino = info->ino; + err = 0; + + if (new->i_state & I_NEW) { + dprintk("%s: filling VFS inode: %lu/%llu.\n", + __func__, new->i_ino, info->ino); + pohmelfs_fill_inode(new, info); + + if (S_ISDIR(info->mode)) { + struct qstr s; + + s.name = "."; + s.len = 1; + s.hash = jhash(s.name, s.len, 0); + + err = pohmelfs_add_dir(psb, npi, npi, &s, info->mode, 0); + if (err) + goto err_out_put; + + s.name = ".."; + s.len = 2; + s.hash = jhash(s.name, s.len, 0); + + err = pohmelfs_add_dir(psb, npi, (parent)?parent:npi, &s, + (parent)?parent->vfs_inode.i_mode:npi->vfs_inode.i_mode, 0); + if (err) + goto err_out_put; + } + } + + if (str) { + if (parent) { + err = pohmelfs_add_dir(psb, parent, npi, str, info->mode, link); + + dprintk("%s: %s inserted name: '%s', new_offset: %llu, ino: %llu, parent: %llu.\n", + __func__, (err)?"unsuccessfully":"successfully", + str->name, parent->total_len, info->ino, parent->ino); + + if (err && err != -EEXIST) + goto err_out_put; + } + } + + if (new->i_state & I_NEW) { + if (parent) + mark_inode_dirty(&parent->vfs_inode); + mark_inode_dirty(new); + } + + set_bit(NETFS_INODE_OWNED, &npi->state); + npi->lock_type = POHMELFS_WRITE_LOCK; + unlock_new_inode(new); + + return npi; + +err_out_put: + printk("%s: putting inode: %p, npi: %p, error: %d.\n", __func__, new, npi, err); + iput(new); +err_out_exit: + return ERR_PTR(err); +} + +static int pohmelfs_remote_sync_complete(struct page **pages, unsigned int page_num, + void *private, int err) +{ + struct pohmelfs_inode *pi = private; + struct pohmelfs_sb *psb = POHMELFS_SB(pi->vfs_inode.i_sb); + + dprintk("%s: ino: %llu, err: %d.\n", __func__, pi->ino, err); + + if (err) + pi->error = err; + wake_up(&psb->wait); + pohmelfs_put_inode(pi); + + return err; +} + +/* + * Receive directory content from the server. + * This should be only done for objects, which were not created locally, + * and which were not synced previously. + */ +static int pohmelfs_sync_remote_dir(struct pohmelfs_inode *pi) +{ + struct inode *inode = &pi->vfs_inode; + struct pohmelfs_sb *psb = POHMELFS_SB(inode->i_sb); + long ret = msecs_to_jiffies(25000); + int err; + + dprintk("%s: dir: %llu, state: %lx: remote_synced: %d.\n", + __func__, pi->ino, pi->state, test_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state)); + + if (test_bit(NETFS_INODE_REMOTE_DIR_SYNCED, &pi->state)) + return 0; + + if (!igrab(inode)) { + err = -ENOENT; + goto err_out_exit; + } + + err = pohmelfs_meta_command(pi, NETFS_READDIR, NETFS_TRANS_SINGLE_DST, + pohmelfs_remote_sync_complete, pi, 0); + if (err) + goto err_out_exit; + + pi->error = 0; + ret = wait_event_interruptible_timeout(psb->wait, + test_bit(NETFS_INODE_REMOTE_DIR_SYNCED, &pi->state) || pi->error, ret); + dprintk("%s: awake dir: %llu, ret: %ld, err: %d.\n", __func__, pi->ino, ret, pi->error); + if (ret <= 0) { + err = -ETIMEDOUT; + goto err_out_exit; + } + + if (pi->error) + return pi->error; + + return 0; + +err_out_exit: + clear_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state); + + return err; +} + +static int pohmelfs_dir_open(struct inode *inode, struct file *file) +{ + file->private_data = NULL; + return 0; +} + +/* + * VFS readdir callback. Syncs directory content from server if needed, + * and provides direntry info to the userspace. + */ +static int pohmelfs_readdir(struct file *file, void *dirent, filldir_t filldir) +{ + struct inode *inode = file->f_path.dentry->d_inode; + struct pohmelfs_inode *pi = POHMELFS_I(inode); + struct pohmelfs_name *n; + struct rb_node *rb_node; + int err = 0, mode; + u64 len; + + dprintk("%s: parent: %llu, fpos: %llu, hash: %08lx.\n", + __func__, pi->ino, (u64)file->f_pos, + (unsigned long)file->private_data); + + err = pohmelfs_data_lock(pi, 0, ~0, POHMELFS_READ_LOCK); + if (err) + return err; + + err = pohmelfs_sync_remote_dir(pi); + if (err) + return err; + + if (file->private_data && (file->private_data == (void *)(unsigned long)file->f_pos)) + return 0; + + mutex_lock(&pi->offset_lock); + n = pohmelfs_search_hash_unprecise(pi, (unsigned long)file->private_data); + + while (n) { + mode = (n->mode >> 12) & 15; + + dprintk("%s: offset: %llu, parent ino: %llu, name: '%s', len: %u, ino: %llu, " + "mode: %o/%o, fpos: %llu, hash: %08x.\n", + __func__, file->f_pos, pi->ino, n->data, n->len, + n->ino, n->mode, mode, file->f_pos, n->hash); + + file->private_data = (void *)n->hash; + + len = n->len; + err = filldir(dirent, n->data, n->len, file->f_pos, n->ino, mode); + + if (err < 0) { + dprintk("%s: err: %d.\n", __func__, err); + err = 0; + break; + } + + file->f_pos += len; + + rb_node = rb_next(&n->hash_node); + + if (!rb_node || (rb_node == &n->hash_node)) { + file->private_data = (void *)(unsigned long)file->f_pos; + break; + } + + n = rb_entry(rb_node, struct pohmelfs_name, hash_node); + } + mutex_unlock(&pi->offset_lock); + + return err; +} + +static loff_t pohmelfs_dir_lseek(struct file *file, loff_t offset, int origin) +{ + file->f_pos = offset; + file->private_data = NULL; + return offset; +} + +const struct file_operations pohmelfs_dir_fops = { + .open = pohmelfs_dir_open, + .read = generic_read_dir, + .llseek = pohmelfs_dir_lseek, + .readdir = pohmelfs_readdir, +}; + +/* + * Lookup single object on server. + */ +static int pohmelfs_lookup_single(struct pohmelfs_inode *parent, + struct qstr *str, u64 ino) +{ + struct pohmelfs_sb *psb = POHMELFS_SB(parent->vfs_inode.i_sb); + long ret = msecs_to_jiffies(5000); + int err; + + set_bit(NETFS_COMMAND_PENDING, &parent->state); + err = pohmelfs_meta_command_data(parent, parent->ino, NETFS_LOOKUP, + (char *)str->name, NETFS_TRANS_SINGLE_DST, NULL, NULL, ino); + if (err) + goto err_out_exit; + + err = 0; + ret = wait_event_interruptible_timeout(psb->wait, + !test_bit(NETFS_COMMAND_PENDING, &parent->state), ret); + if (ret == 0) + err = -ETIMEDOUT; + else if (signal_pending(current)) + err = -EINTR; + + if (err) + goto err_out_exit; + + return 0; + +err_out_exit: + clear_bit(NETFS_COMMAND_PENDING, &parent->state); + + printk("%s: failed: parent: %llu, ino: %llu, name: '%s', err: %d.\n", + __func__, parent->ino, ino, str->name, err); + + return err; +} + +/* + * VFS lookup callback. + * We first try to get inode number from local name cache, if we have one, + * then inode can be found in inode cache. If there is no inode or no object in + * local cache, try to lookup it on server. This only should be done for directories, + * which were not created locally, otherwise remote server does not know about dir at all, + * so no need to try to know that. + */ +struct dentry *pohmelfs_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd) +{ + struct pohmelfs_inode *parent = POHMELFS_I(dir); + struct pohmelfs_name *n; + struct inode *inode = NULL; + unsigned long ino = 0; + int err, lock_type = POHMELFS_READ_LOCK, need_lock; + struct qstr str = dentry->d_name; + + if ((nd->intent.open.flags & O_ACCMODE) > 1) + lock_type = POHMELFS_WRITE_LOCK; + + need_lock = pohmelfs_need_lock(parent, lock_type); + + err = pohmelfs_data_lock(parent, 0, ~0, lock_type); + if (err) + goto out; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + mutex_lock(&parent->offset_lock); + n = pohmelfs_search_hash(parent, str.hash); + if (n) + ino = n->ino; + mutex_unlock(&parent->offset_lock); + + dprintk("%s: 1 ino: %lu, inode: %p, name: '%s', hash: %x, parent_state: %lx.\n", + __func__, ino, inode, str.name, str.hash, parent->state); + + if (ino) { + inode = ilookup(dir->i_sb, ino); + if (inode) + goto out; + } + + dprintk("%s: dir: %p, dir_ino: %llu, name: '%s', len: %u, dir_state: %lx, ino: %lu.\n", + __func__, dir, parent->ino, + str.name, str.len, parent->state, ino); + + if (!ino) { + if (!need_lock) + goto out; + } + + err = pohmelfs_lookup_single(parent, &str, ino); + if (err) + goto out; + + if (!ino) { + mutex_lock(&parent->offset_lock); + n = pohmelfs_search_hash(parent, str.hash); + if (n) + ino = n->ino; + mutex_unlock(&parent->offset_lock); + } + + if (ino) { + inode = ilookup(dir->i_sb, ino); + printk("%s: second lookup ino: %lu, inode: %p, name: '%s', hash: %x.\n", + __func__, ino, inode, str.name, str.hash); + if (!inode) { + printk("%s: No inode for ino: %lu, name: '%s', hash: %x.\n", + __func__, ino, str.name, str.hash); + //return NULL; + return ERR_PTR(-EACCES); + } + } else { + printk("%s: No inode number : name: '%s', hash: %x.\n", + __func__, str.name, str.hash); + } +out: + return d_splice_alias(inode, dentry); +} + +/* + * Create new object in local cache. Object will be synced to server + * during writeback for given inode. + */ +struct pohmelfs_inode *pohmelfs_create_entry_local(struct pohmelfs_sb *psb, + struct pohmelfs_inode *parent, struct qstr *str, u64 start, int mode) +{ + struct pohmelfs_inode *npi; + int err = -ENOMEM; + struct netfs_inode_info info; + + dprintk("%s: name: '%s', mode: %o, start: %llu.\n", + __func__, str->name, mode, start); + + info.mode = mode; + info.ino = start; + + if (!start) + info.ino = pohmelfs_new_ino(psb); + + info.nlink = S_ISDIR(mode)?2:1; + info.uid = current_fsuid(); + info.gid = current_fsgid(); + info.size = 0; + info.blocksize = 512; + info.blocks = 0; + info.rdev = 0; + info.version = 0; + + npi = pohmelfs_new_inode(psb, parent, str, &info, !!start); + if (IS_ERR(npi)) { + err = PTR_ERR(npi); + goto err_out_unlock; + } + + return npi; + +err_out_unlock: + dprintk("%s: err: %d.\n", __func__, err); + return ERR_PTR(err); +} + +/* + * Create local object and bind it to dentry. + */ +static int pohmelfs_create_entry(struct inode *dir, struct dentry *dentry, u64 start, int mode) +{ + struct pohmelfs_sb *psb = POHMELFS_SB(dir->i_sb); + struct pohmelfs_inode *npi, *parent; + struct qstr str = dentry->d_name; + int err; + + parent = POHMELFS_I(dir); + + err = pohmelfs_data_lock(parent, 0, ~0, POHMELFS_WRITE_LOCK); + if (err) + return err; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + npi = pohmelfs_create_entry_local(psb, parent, &str, start, mode); + if (IS_ERR(npi)) + return PTR_ERR(npi); + + d_instantiate(dentry, &npi->vfs_inode); + + dprintk("%s: parent: %llu, inode: %llu, name: '%s', parent_nlink: %d, nlink: %d.\n", + __func__, parent->ino, npi->ino, dentry->d_name.name, + (signed)dir->i_nlink, (signed)npi->vfs_inode.i_nlink); + + return 0; +} + +/* + * VFS create and mkdir callbacks. + */ +static int pohmelfs_create(struct inode *dir, struct dentry *dentry, int mode, + struct nameidata *nd) +{ + return pohmelfs_create_entry(dir, dentry, 0, mode); +} + +static int pohmelfs_mkdir(struct inode *dir, struct dentry *dentry, int mode) +{ + int err; + + inode_inc_link_count(dir); + err = pohmelfs_create_entry(dir, dentry, 0, mode | S_IFDIR); + if (err) + inode_dec_link_count(dir); + + return err; +} + +static int pohmelfs_remove_entry(struct inode *dir, struct dentry *dentry) +{ + struct pohmelfs_sb *psb = POHMELFS_SB(dir->i_sb); + struct inode *inode = dentry->d_inode; + struct pohmelfs_inode *parent = POHMELFS_I(dir), *pi = POHMELFS_I(inode); + struct pohmelfs_name *n; + int err = -ENOENT; + struct qstr str = dentry->d_name; + + err = pohmelfs_data_lock(parent, 0, ~0, POHMELFS_WRITE_LOCK); + if (err) + return err; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + dprintk("%s: dir_ino: %llu, inode: %llu, name: '%s', nlink: %d.\n", + __func__, parent->ino, pi->ino, + str.name, (signed)inode->i_nlink); + + BUG_ON(!inode); + + mutex_lock(&parent->offset_lock); + n = pohmelfs_search_hash(parent, str.hash); + if (n) { + pohmelfs_fix_offset(parent, n); + if (test_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state)) { + pohmelfs_remove_child(pi, n); + } + pohmelfs_name_free(parent, n); + err = 0; + } + mutex_unlock(&parent->offset_lock); + + if (!err) { + psb->avail_size += inode->i_size; + + pohmelfs_inode_del_inode(psb, pi); + + mark_inode_dirty(dir); + + inode->i_ctime = dir->i_ctime; + if (inode->i_nlink) + inode_dec_link_count(inode); + } + dprintk("%s: inode: %p, lock: %ld, unhashed: %d.\n", + __func__, pi, inode->i_state & I_LOCK, hlist_unhashed(&inode->i_hash)); + + return err; +} + +/* + * Unlink and rmdir VFS callbacks. + */ +static int pohmelfs_unlink(struct inode *dir, struct dentry *dentry) +{ + return pohmelfs_remove_entry(dir, dentry); +} + +static int pohmelfs_rmdir(struct inode *dir, struct dentry *dentry) +{ + int err; + struct inode *inode = dentry->d_inode; + + dprintk("%s: parent: %llu, inode: %llu, name: '%s', parent_nlink: %d, nlink: %d.\n", + __func__, POHMELFS_I(dir)->ino, POHMELFS_I(inode)->ino, + dentry->d_name.name, (signed)dir->i_nlink, (signed)inode->i_nlink); + + err = pohmelfs_remove_entry(dir, dentry); + if (!err) { + inode_dec_link_count(dir); + inode_dec_link_count(inode); + } + + return err; +} + +/* + * Link creation is synchronous. + * I'm lazy. + * Earth is somewhat round. + */ +static int pohmelfs_create_link(struct pohmelfs_inode *parent, struct qstr *obj, + struct pohmelfs_inode *target, struct qstr *tstr) +{ + struct super_block *sb = parent->vfs_inode.i_sb; + struct pohmelfs_sb *psb = POHMELFS_SB(sb); + struct netfs_cmd *cmd; + struct netfs_trans *t; + void *data; + int err, parent_len, target_len = 0, cur_len, path_size = 0; + + err = pohmelfs_data_lock(parent, 0, ~0, POHMELFS_WRITE_LOCK); + if (err) + return err; + + err = sb->s_op->write_inode(&parent->vfs_inode, 0); + if (err) + goto err_out_exit; + + if (tstr) + target_len = tstr->len; + + parent_len = pohmelfs_path_length(parent); + if (target) + target_len += pohmelfs_path_length(target); + + if (parent_len < 0) { + err = parent_len; + goto err_out_exit; + } + + if (target_len < 0) { + err = target_len; + goto err_out_exit; + } + + t = netfs_trans_alloc(psb, parent_len + target_len + obj->len + 2, 0, 0); + if (!t) { + err = -ENOMEM; + goto err_out_exit; + } + cur_len = netfs_trans_cur_len(t); + + cmd = netfs_trans_current(t); + if (IS_ERR(cmd)) { + err = PTR_ERR(cmd); + goto err_out_free; + } + + data = (void *)(cmd + 1); + cur_len -= sizeof(struct netfs_cmd); + + err = pohmelfs_construct_path_string(parent, data, parent_len); + if (err > 0) { + /* Do not place null-byte before the slash */ + path_size = err - 1; + cur_len -= path_size; + + err = snprintf(data + path_size, cur_len, "/%s|", obj->name); + + path_size += err; + cur_len -= err; + + cmd->ext = path_size - 1; /* No | symbol */ + + if (target) { + err = pohmelfs_construct_path_string(target, data + path_size, target_len); + if (err > 0) { + path_size += err; + cur_len -= err; + } + } + } + + if (err < 0) + goto err_out_free; + + cmd->start = 0; + + if (!target && tstr) { + if (tstr->len > cur_len - 1) { + err = -ENAMETOOLONG; + goto err_out_free; + } + + err = snprintf(data + path_size, cur_len, "%s", tstr->name) + 1; /* 0-byte */ + path_size += err; + cur_len -= err; + cmd->start = 1; + } + + dprintk("%s: parent: %llu, obj: '%s', target_inode: %llu, target_str: '%s', full: '%s'.\n", + __func__, parent->ino, obj->name, (target)?target->ino:0, (tstr)?tstr->name:NULL, + (char *)data); + + cmd->cmd = NETFS_LINK; + cmd->size = path_size; + cmd->id = parent->ino; + + netfs_convert_cmd(cmd); + + netfs_trans_update(cmd, t, path_size); + + err = netfs_trans_finish(t, psb); + if (err) + goto err_out_exit; + + return 0; + +err_out_free: + t->result = err; + netfs_trans_put(t); +err_out_exit: + return err; +} + +/* + * VFS hard and soft link callbacks. + */ +static int pohmelfs_link(struct dentry *old_dentry, struct inode *dir, + struct dentry *dentry) +{ + struct inode *inode = old_dentry->d_inode; + struct pohmelfs_inode *pi = POHMELFS_I(inode); + int err; + struct qstr str = dentry->d_name; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + err = inode->i_sb->s_op->write_inode(inode, 0); + if (err) + return err; + + err = pohmelfs_create_link(POHMELFS_I(dir), &str, pi, NULL); + if (err) + return err; + + return pohmelfs_create_entry(dir, dentry, pi->ino, inode->i_mode); +} + +static int pohmelfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname) +{ + struct qstr sym_str; + struct qstr str = dentry->d_name; + struct inode *inode; + int err; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + sym_str.name = symname; + sym_str.len = strlen(symname); + + err = pohmelfs_create_link(POHMELFS_I(dir), &str, NULL, &sym_str); + if (err) + goto err_out_exit; + + err = pohmelfs_create_entry(dir, dentry, 0, S_IFLNK | S_IRWXU | S_IRWXG | S_IRWXO); + if (err) + goto err_out_exit; + + inode = dentry->d_inode; + + err = page_symlink(inode, symname, sym_str.len + 1); + if (err) + goto err_out_put; + + return 0; + +err_out_put: + iput(inode); +err_out_exit: + return err; +} + +static int pohmelfs_send_rename(struct pohmelfs_inode *pi, struct pohmelfs_inode *parent, + struct qstr *str) +{ + int path_len, err, total_len = 0, inode_len, parent_len; + char *path; + struct netfs_trans *t; + struct netfs_cmd *cmd; + struct pohmelfs_sb *psb = POHMELFS_SB(pi->vfs_inode.i_sb); + + parent_len = pohmelfs_path_length(parent); + inode_len = pohmelfs_path_length(pi); + + if (parent_len < 0 || inode_len < 0) + return -EINVAL; + + path_len = parent_len + inode_len + str->len + 3; + + t = netfs_trans_alloc(psb, path_len, 0, 0); + if (!t) + return -ENOMEM; + + cmd = netfs_trans_current(t); + path = (char *)(cmd + 1); + + err = pohmelfs_construct_path_string(pi, path, inode_len); + if (err < 0) + goto err_out_unlock; + + cmd->ext = err; + + path += err; + total_len += err; + path_len -= err; + + *path = '|'; + path++; + total_len++; + path_len--; + + err = pohmelfs_construct_path_string(parent, path, parent_len); + if (err < 0) + goto err_out_unlock; + + /* + * Do not place a null-byte before the final slash and the name. + */ + err--; + path += err; + total_len += err; + path_len -= err; + + err = snprintf(path, path_len - 1, "/%s", str->name); + + total_len += err + 1; /* 0 symbol */ + path_len -= err + 1; + + cmd->cmd = NETFS_RENAME; + cmd->id = pi->ino; + cmd->start = parent->ino; + cmd->size = total_len; + + netfs_convert_cmd(cmd); + + netfs_trans_update(cmd, t, total_len); + + return netfs_trans_finish(t, psb); + +err_out_unlock: + netfs_trans_free(t); + return err; +} + +static int pohmelfs_rename(struct inode *old_dir, struct dentry *old_dentry, + struct inode *new_dir, struct dentry *new_dentry) +{ + struct inode *inode = old_dentry->d_inode; + struct pohmelfs_inode *old_parent, *pi, *new_parent; + struct qstr str = new_dentry->d_name; + struct pohmelfs_name *n; + unsigned int old_hash; + int err = -ENOENT; + + pi = POHMELFS_I(inode); + old_parent = POHMELFS_I(old_dir); + + if (new_dir) { + new_dir->i_sb->s_op->write_inode(new_dir, 0); + } + + old_hash = jhash(old_dentry->d_name.name, old_dentry->d_name.len, 0); + str.hash = jhash(new_dentry->d_name.name, new_dentry->d_name.len, 0); + + str.len = new_dentry->d_name.len; + str.name = new_dentry->d_name.name; + str.hash = jhash(new_dentry->d_name.name, new_dentry->d_name.len, 0); + + if (new_dir) { + new_parent = POHMELFS_I(new_dir); + err = -ENOTEMPTY; + + if (S_ISDIR(inode->i_mode) && + new_parent->total_len <= 3) + goto err_out_exit; + } else { + new_parent = old_parent; + } + + dprintk("%s: ino: %llu, parent: %llu, name: '%s' -> parent: %llu, name: '%s', i_size: %llu.\n", + __func__, pi->ino, old_parent->ino, old_dentry->d_name.name, + new_parent->ino, new_dentry->d_name.name, inode->i_size); + + if (test_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state) && + test_bit(NETFS_INODE_OWNED, &pi->state)) { + err = pohmelfs_send_rename(pi, new_parent, &str); + if (err) + goto err_out_exit; + } + + n = pohmelfs_name_alloc(str.len + 1); + if (!n) + goto err_out_exit; + + mutex_lock(&new_parent->offset_lock); + n->ino = pi->ino; + n->mode = inode->i_mode; + n->len = str.len; + n->hash = str.hash; + sprintf(n->data, "%s", str.name); + + err = pohmelfs_insert_name(new_parent, n); + mutex_unlock(&new_parent->offset_lock); + + if (err) + goto err_out_exit; + + mutex_lock(&old_parent->offset_lock); + n = pohmelfs_search_hash(old_parent, old_hash); + if (n) + pohmelfs_name_del(old_parent, n); + mutex_unlock(&old_parent->offset_lock); + + mark_inode_dirty(inode); + mark_inode_dirty(&new_parent->vfs_inode); + + WARN_ON_ONCE(list_empty(&inode->i_dentry)); + + return 0; + +err_out_exit: + + clear_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state); + + mutex_unlock(&inode->i_mutex); + return err; +} + +/* + * POHMELFS directory inode operations. + */ +const struct inode_operations pohmelfs_dir_inode_ops = { + .link = pohmelfs_link, + .symlink = pohmelfs_symlink, + .unlink = pohmelfs_unlink, + .mkdir = pohmelfs_mkdir, + .rmdir = pohmelfs_rmdir, + .create = pohmelfs_create, + .lookup = pohmelfs_lookup, + .setattr = pohmelfs_setattr, + .rename = pohmelfs_rename, +}; diff --git a/drivers/staging/pohmelfs/path_entry.c b/drivers/staging/pohmelfs/path_entry.c new file mode 100644 index 0000000..b62374b --- /dev/null +++ b/drivers/staging/pohmelfs/path_entry.c @@ -0,0 +1,114 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/fs.h> +#include <linux/ktime.h> +#include <linux/fs.h> +#include <linux/pagemap.h> +#include <linux/writeback.h> +#include <linux/mount.h> +#include <linux/mm.h> + +#include "netfs.h" + +#define UNHASHED_OBSCURE_STRING_SIZE sizeof(" (deleted)") + +/* + * Create path from root for given inode. + * Path is formed as set of stuctures, containing name of the object + * and its inode data (mode, permissions and so on). + */ +int pohmelfs_construct_path_string(struct pohmelfs_inode *pi, void *data, int len) +{ + struct path path; + struct dentry *d; + char *ptr; + int err = 0, strlen, reduce = 0; + + d = d_find_alias(&pi->vfs_inode); + if (!d) { + printk("%s: no alias, list_empty: %d.\n", __func__, list_empty(&pi->vfs_inode.i_dentry)); + return -ENOENT; + } + + read_lock(¤t->fs->lock); + path.mnt = mntget(current->fs->root.mnt); + read_unlock(¤t->fs->lock); + + path.dentry = d; + + if (!IS_ROOT(d) && d_unhashed(d)) + reduce = 1; + + ptr = d_path(&path, data, len); + if (IS_ERR(ptr)) { + err = PTR_ERR(ptr); + goto out; + } + + if (reduce && len >= UNHASHED_OBSCURE_STRING_SIZE) { + char *end = data + len - UNHASHED_OBSCURE_STRING_SIZE; + *end = '\0'; + } + + strlen = len - (ptr - (char *)data); + memmove(data, ptr, strlen); + ptr = data; + + err = strlen; + + dprintk("%s: dname: '%s', len: %u, maxlen: %u, name: '%s', strlen: %d.\n", + __func__, d->d_name.name, d->d_name.len, len, ptr, strlen); + +out: + dput(d); + mntput(path.mnt); + + return err; +} + +int pohmelfs_path_length(struct pohmelfs_inode *pi) +{ + struct dentry *d, *root, *first; + int len = 1; /* Root slash */ + + first = d = d_find_alias(&pi->vfs_inode); + if (!d) { + dprintk("%s: ino: %llu, mode: %o.\n", __func__, pi->ino, pi->vfs_inode.i_mode); + return -ENOENT; + } + + read_lock(¤t->fs->lock); + root = dget(current->fs->root.dentry); + read_unlock(¤t->fs->lock); + + spin_lock(&dcache_lock); + + if (!IS_ROOT(d) && d_unhashed(d)) + len += UNHASHED_OBSCURE_STRING_SIZE; /* Obscure " (deleted)" string */ + + while (d && d != root && !IS_ROOT(d)) { + len += d->d_name.len + 1; /* Plus slash */ + d = d->d_parent; + } + spin_unlock(&dcache_lock); + + dput(root); + dput(first); + + return len + 1; /* Including zero-byte */ +} ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [0/9] pohmelfs: The Great Southern Trendkill release.
@ 2008-12-26 14:18 Evgeniy Polyakov
2008-12-26 14:18 ` [1/9] pohmelfs: documentation Evgeniy Polyakov
0 siblings, 1 reply; 6+ messages in thread
From: Evgeniy Polyakov @ 2008-12-26 14:18 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-kernel, pohmelfs, netdev, linux-fsdevel, Evgeniy Polyakov
POHMELFS stands for Parallel Optimized Host Message Exchange Layered File System.
This is a high performance network filesystem with local coherent cache of
data and metadata. Its main goal is distributed parallel processing of data.
POHMELFS is a kernel client for the developed distributed parallel internet
filesystem. As it exists today, it is a high-performance parallel network
filesystem with ability to balance reading from multiple hosts and simultaneously
write data to multiple hosts.
Main design goal of this filesystem is to implement very fast and scalable
network filesystem with local writeback cache of data and metadata, which
greatly speeds up every IO operation compared to traditional writethrough
based network filesystems.
Read balancing and writing to multiple hosts features can be used to improve
parallel multithreaded read-mostly data processing workload and organize
fault-tolerant systems. POHMELFS as a network client does not support data
synchrnonization between the nodes, so this task should be implemented in
servers. POHMELFS and multiple-server-write can be used as backup solution
for the physically distributed network servers.
Currently development is concentrated on the distributed object-based
server development implemeneted with distributed hash table design approach
in mind, which main goals it completely transparent from client point of
view node management, full absence of any controlling central servers
(points of failure), transaction/history based object storage.
POHMELFS utilizes writeback cache, which is built on top of MO(E)SI-like
coherency protocol. It uses scalable cached read/write locking. No additional
requests are performed if lock is granted to the filesystem. The same protocol
is used by the server to on-demand flushing of the client's cache (for
example when server wants to update local data or send some new content
into the clients caches).
POHMELFS is able to encrypt data channel or perform strong data checksumming.
Algorithms used by the filesystems are autoconfigured during startup and
mount may fail (depending on options) if server does not support requested
algorithms.
Autoconfiguration also involve sending information about size of the exported
directory specified by the server, permission, statistics about amount of inodes,
used space and so on.
POHMELFS utilizes transaction model for all its operations.
Each transction is an object, which may embed multiple commands completed
atomically. When server fails the whole transaction will be replied against it
(or different server) later. This approach allows to maintain high data integrity
and do not desynchronize filesystem state in case of network or server failures.
Basic POHMELFS features:
* Local coherent cache for data and metadata. (Byte-range) locking. Locks were
prepared to be byte-range, but since all Linux filesystems lock the whole
inode, it was decided to lock the whole object during writing. Actual
messages being sent for locking/cache coherency protocol are byte-range,
but because the whole inode is locked, lock is cached, so range actually
is equal to the inode size. One can simultaneously write into the same page
via different offsets from different client, and every time file will be
coherent on all clients which do it and on the server itself.
* Completely async processing of all events (hard and symlinks are the only
exceptions) including object creation and data reading and writing.
* Flexible object architecture optimized for network processing. Ability to
create long pathes to object and remove arbitrary huge directories in
single network command.
* High performance is one of the main design goals.
* Very fast and scalable multithreaded userspace server. Being in userspace it
works with any underlying filesystem and still is much faster than async
in-kernel NFS one.
* Transactions support. Full failover for all operations. Resending
transactions to different servers on timeout or error.
* Client is able to switch between different servers (if one goes down, client
automatically reconnects to second and so on).
* Client parallel extensions: ability to write to multiple servers and balance
reading between them.
* Client dynamic server reconfiguration: ability to add/remove servers from
working set in run-time.
* Strong authentification and possible data encryption in network channel.
* Extended attributes support.
* Read-only mounts, ability to limit maximum size of the exported directory.
One can grab sources from archive or GIT tree (web interface).
1. POHMELFS homepage.
http://www.ioremap.net/projects/pohmelfs
2. POHMELFS archive.
http://www.ioremap.net/archive/pohmelfs/
3. POHMELFS git tree.
http://www.ioremap.net/cgi-bin/gitweb.cgi
4. POHMELFS maillist.
pohmelfs @ ioremap.net
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
.../filesystems/pohmelfs/design_notes.txt | 70 +
Documentation/filesystems/pohmelfs/info.txt | 86 +
.../filesystems/pohmelfs/network_protocol.txt | 227 +++
fs/Kconfig | 2 +
fs/Makefile | 1 +
fs/pohmelfs/Kconfig | 23 +
fs/pohmelfs/Makefile | 3 +
fs/pohmelfs/config.c | 478 +++++
fs/pohmelfs/crypto.c | 880 +++++++++
fs/pohmelfs/dir.c | 1139 +++++++++++
fs/pohmelfs/inode.c | 2085 ++++++++++++++++++++
fs/pohmelfs/lock.c | 186 ++
fs/pohmelfs/mcache.c | 171 ++
fs/pohmelfs/net.c | 1180 +++++++++++
fs/pohmelfs/netfs.h | 934 +++++++++
fs/pohmelfs/path_entry.c | 356 ++++
fs/pohmelfs/trans.c | 704 +++++++
mm/filemap.c | 2 +
18 files changed, 8527 insertions(+), 0 deletions(-)
^ permalink raw reply [flat|nested] 6+ messages in thread* [1/9] pohmelfs: documentation. 2008-12-26 14:18 [0/9] pohmelfs: The Great Southern Trendkill release Evgeniy Polyakov @ 2008-12-26 14:18 ` Evgeniy Polyakov 2008-12-26 14:18 ` [2/9] pohmelfs: configuration interface Evgeniy Polyakov 0 siblings, 1 reply; 6+ messages in thread From: Evgeniy Polyakov @ 2008-12-26 14:18 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, pohmelfs, netdev, linux-fsdevel, Eveniy Polyakov This patch includes POHMELFS design and implementation description. Separate file includes mount options, default parameters and usage examples. Signed-off-by: Eveniy Polyakov <zbr@ioremap.net> diff --git a/Documentation/filesystems/pohmelfs/design_notes.txt b/Documentation/filesystems/pohmelfs/design_notes.txt new file mode 100644 index 0000000..6d6db60 --- /dev/null +++ b/Documentation/filesystems/pohmelfs/design_notes.txt @@ -0,0 +1,70 @@ +POHMELFS: Parallel Optimized Host Message Exchange Layered File System. + + Evgeniy Polyakov <zbr@ioremap.net> + +Homepage: http://www.ioremap.net/projects/pohmelfs + +POHMELFS first began as a network filesystem with coherent local data and +metadata caches but is now evolving into a parallel distributed filesystem. + +Main features of this FS include: + * Locally coherent cache for data and metadata with (potentially) byte-range locks. + Since all Linux filesystems lock the whole inode during writing, algorithm + is very simple and does not use byte-ranges, although they are sent in + locking messages. + * Completely async processing of all events except creation of hard and symbolic + links, and rename events. + Object creation and data reading and writing are processed asynchronously. + * Flexible object architecture optimized for network processing. + Ability to create long paths to objects and remove arbitrarily huge + directories with a single network command. + (like removing the whole kernel tree via a single network command). + * Very high performance. + * Fast and scalable multithreaded userspace server. Being in userspace it works + with any underlying filesystem and still is much faster than async in-kernel NFS one. + * Client is able to switch between different servers (if one goes down, client + automatically reconnects to second and so on). + * Transactions support. Full failover for all operations. + Resending transactions to different servers on timeout or error. + * Read request (data read, directory listing, lookup requests) balancing between multiple servers. + * Write requests are replicated to multiple servers and completed only when all of them are acked. + * Ability to add and/or remove servers from the working set at run-time. + * Strong authentification and possible data encryption in network channel. + * Extended attributes support. + +POHMELFS is based on transactions, which are potentially long-standing objects that live +in the client's memory. Each transaction contains all the information needed to process a given +command (or set of commands, which is frequently used during data writing: single transactions +can contain creation and data writing commands). Transactions are committed by all the servers +to which they are sent and, in case of failures, are eventually resent or dropped with an error. +For example, reading will return an error if no servers are available. + +POHMELFS uses a asynchronous approach to data processing. Courtesy of transactions, it is +possible to detach replies from requests and, if the command requires data to be received, the +caller sleeps waiting for it. Thus, it is possible to issue multiple read commands to different +servers and async threads will pick up replies in parallel, find appropriate transactions in the +system and put the data where it belongs (like the page or inode cache). + +The main feature of POHMELFS is writeback data and the metadata cache. +Only a few non-performance critical operations use the write-through cache and +are synchronous: hard and symbolic link creation, and object rename. Creation, +removal of objects and data writing are asynchronous and are sent to +the server during system writeback. Only one writer at a time is allowed for any +given inode, which is guarded by an appropriate locking protocol. +Because of this feature, POHMELFS is extremely fast at metadata intensive +workloads and can fully utilize the bandwidth to the servers when doing bulk +data transfers. + +POHMELFS clients operate with a working set of servers and are capable of balancing read-only +operations (like lookups or directory listings) between them. +Administrators can add or remove servers from the set at run-time via special commands (described +in Documentation/pohmelfs/info.txt file). Writes are replicated to all servers. + +POHMELFS is capable of full data channel encryption and/or strong crypto hashing. +One can select any kernel supported cipher, encryption mode, hash type and operation mode +(hmac or digest). It is also possible to use both or neither (default). Crypto configuration +is checked during mount time and, if the server does not support it, appropriate capabilities +will be disabled or mount will fail (if 'crypto_fail_unsupported' mount option is specified). +Crypto performance heavily depends on the number of crypto threads, which asynchronously perform +crypto operations and send the resulting data to server or submit it up the stack. This number +can be controlled via a mount option. diff --git a/Documentation/filesystems/pohmelfs/info.txt b/Documentation/filesystems/pohmelfs/info.txt new file mode 100644 index 0000000..4e3d501 --- /dev/null +++ b/Documentation/filesystems/pohmelfs/info.txt @@ -0,0 +1,86 @@ +POHMELFS usage information. + +Mount options: +idx=%u + Each mountpoint is associated with a special index via this option. + Administrator can add or remove servers from the given index, so all mounts, + which were attached to it, are updated. + Default it is 0. + +trans_scan_timeout=%u + This timeout, expressed in milliseconds, specifies time to scan transaction + trees looking for stale requests, which have to be resent, or if number of + retries exceed specified limit, dropped with error. + Default is 5 seconds. + +drop_scan_timeout=%u + Internal timeout, expressed in milliseconds, which specifies how frequently + inodes marked to be dropped are freed. It also specifies how frequently + the system checks that servers have to be added or removed from current working set. + Default is 1 second. + +wait_on_page_timeout=%u + Number of milliseconds to wait for reply from remote server for data reading command. + If this timeout is exceeded, reading returns an error. + Default is 5 seconds. + +trans_retries=%u + This is the number of times that a transaction will be resent to a server that did + not answer for the last @trans_scan_timeout milliseconds. + When the number of resends exceeds this limit, the transaction is completed with error. + Default is 5 resends. + +crypto_thread_num=%u + Number of crypto processing threads. Threads are used both for RX and TX traffic. + Default is 2, or no threads if crypto operations are not supported. + +trans_max_pages=%u + Maximum number of pages in a single transaction. This parameter also controls + the number of pages, allocated for crypto processing (each crypto thread has + pool of pages, the number of which is equal to 'trans_max_pages'. + Default is 100 pages. + +crypto_fail_unsupported + If specified, mount will fail if the server does not support requested crypto operations. + By default mount will disable non-matching crypto operations. + +mcache_timeout=%u + Maximum number of milliseconds to wait for the mcache objects to be processed. + Mcache includes locks (given lock should be granted by server), attributes (they should be + fully received in the given timeframe). + Default is 5 seconds. + +Usage examples. + +Add (or remove if it already exists) server server1.net:1025 into the working set with index $idx +with appropriate hash algorithm and key file and cipher algorithm, mode and key file: +$cfg -a server1.net -p 1025 -i $idx -K $hash_key -k $cipher_key + +Mount filesystem with given index $idx to /mnt mountpoint. +Client will connect to all servers specified in the working set via previous command: +mount -t pohmel -o idx=$idx q /mnt + +One can add or remove servers from working set after mounting too. + + +Server installation. + +Creating a server, which listens at port 1025 and 0.0.0.0 address. +Working root directory (note, that server chroots there, so you have to have appropriate permissions) +is set to /mnt, server will negotiate hash/cipher with client, in case client requested it, there +are appropriate key files. +Number of working threads is set to 10. + +# ./fserver -a 0.0.0.0 -p 1025 -r /mnt -w 10 -K hash_key -k cipher_key + + -A 6 - listen on ipv6 address. Default: Disabled. + -r root - path to root directory. Default: /tmp. + -a addr - listen address. Default: 0.0.0.0. + -p port - listen port. Default: 1025. + -w workers - number of workers per connected client. Default: 1. + -K file - hash key size. Default: none. + -k file - cipher key size. Default: none. + -h - this help. + +Number of worker threads specifies how many workers will be created for each client. +Bulk single-client transafers usually are better handled with smaller number (like 1-3). diff --git a/Documentation/filesystems/pohmelfs/network_protocol.txt b/Documentation/filesystems/pohmelfs/network_protocol.txt new file mode 100644 index 0000000..40ea6c2 --- /dev/null +++ b/Documentation/filesystems/pohmelfs/network_protocol.txt @@ -0,0 +1,227 @@ +POHMELFS network protocol. + +Basic structure used in network communication is following command: + +struct netfs_cmd +{ + __u16 cmd; /* Command number */ + __u16 csize; /* Attached crypto information size */ + __u16 cpad; /* Attached padding size */ + __u16 ext; /* External flags */ + __u32 size; /* Size of the attached data */ + __u32 trans; /* Transaction id */ + __u64 id; /* Object ID to operate on. Used for feedback.*/ + __u64 start; /* Start of the object. */ + __u64 iv; /* IV sequence */ + __u8 data[0]; +}; + +Commands can be embedded into transaction command (which in turn has own command), +so one can extend protocol as needed without breaking backward compatibility as long +as old commands are supported. All string lengths include tail 0 byte. + +All commans are transfered over the network in big-endian. CPU endianess is used at the end peers. + +@cmd - command number, which specifies command to be processed. Following + commands are used currently: + + NETFS_READDIR = 1, /* Read directory for given inode number */ + NETFS_READ_PAGE, /* Read data page from the server */ + NETFS_WRITE_PAGE, /* Write data page to the server */ + NETFS_CREATE, /* Create directory entry */ + NETFS_REMOVE, /* Remove directory entry */ + NETFS_LOOKUP, /* Lookup single object */ + NETFS_LINK, /* Create a link */ + NETFS_TRANS, /* Transaction */ + NETFS_OPEN, /* Open intent */ + NETFS_INODE_INFO, /* Metadata cache coherency synchronization message */ + NETFS_PAGE_CACHE, /* Page cache invalidation message */ + NETFS_READ_PAGES, /* Read multiple contiguous pages in one go */ + NETFS_RENAME, /* Rename object */ + NETFS_CAPABILITIES, /* Capabilities of the client, for example supported crypto */ + NETFS_LOCK, /* Distributed lock message */ + NETFS_XATTR_SET, /* Set extended attribute */ + NETFS_XATTR_GET, /* Get extended attribute */ + +@ext - external flags. Used by different commands to specify some extra arguments + like partial size of the embedded objects or creation flags. + +@size - size of the attached data. For NETFS_READ_PAGE and NETFS_READ_PAGES no data is attached, + but size of the requested data is incorporated here. It does not include size of the command + header (struct netfs_cmd) itself. + +@id - id of the object this command operates on. Each command can use it for own purpose. + +@start - start of the object this command operates on. Each command can use it for own purpose. + +@csize, @cpad - size and padding size of the (attached if needed) crypto information. + +Command specifications. + +@NETFS_READDIR +This command is used to sync content of the remote dir to the client. + +@ext - length of the path to object. +@size - the same. +@id - local inode number of the directory to read. +@start - zero. + + +@NETFS_READ_PAGE +This command is used to read data from remote server. +Data size does not exceed local page cache size. + +@id - inode number. +@start - first byte offset. +@size - number of bytes to read plus length of the path to object. +@ext - object path length. + + +@NETFS_CREATE +Used to create object. +It does not require that all directories on top of the object were +already created, it will create them automatically. Each object has +associated @netfs_path_entry data structure, which contains creation +mode (permissions and type) and length of the name as long as name itself. + +@start - 0 +@size - size of the all data structures needed to create a path +@id - local inode number +@ext - 0 + + +@NETFS_REMOVE +Used to remove object. + +@ext - length of the path to object. +@size - the same. +@id - local inode number. +@start - zero. + + +@NETFS_LOOKUP +Lookup information about object on server. + +@ext - length of the path to object. +@size - the same. +@id - local inode number of the directory to look object in. +@start - local inode number of the object to look at. + + +@NETFS_LINK +Create hard of symlink. +Command is sent as "object_path|target_path". + +@size - size of the above string. +@id - parent local inode number. +@start - 1 for symlink, 0 for hardlink. +@ext - size of the "object_path" above. + + +@NETFS_TRANS +Transaction header. + +@size - incorporates all embedded command sizes including theirs header sizes. +@start - transaction generation number - unique id used to find transaction. +@ext - transaction flags. Unused at the moment. +@id - 0. + + +@NETFS_OPEN +Open intent for given transaction. + +@id - local inode number. +@start - 0. +@size - path length to the object. +@ext - open flags (O_RDWR and so on). + + +@NETFS_INODE_INFO +Metadata update command. +It is sent to servers when attributes of the object are changed and received +when data or metadata were updated. It operates with the following structure: + +struct netfs_inode_info +{ + unsigned int mode; + unsigned int nlink; + unsigned int uid; + unsigned int gid; + unsigned int blocksize; + unsigned int padding; + __u64 ino; + __u64 blocks; + __u64 rdev; + __u64 size; + __u64 version; +}; + +It effectively mirrors stat(2) returned data. + + +@ext - path length to the object. +@size - the same plus size of the netfs_inode_info structure. +@id - local inode number. +@start - 0. + + +@NETFS_PAGE_CACHE +Command is only received by clients. It contains information about +page to be marked as not up-to-date. + +@id - client's inode number. +@start - last byte of the page to be invalidated. If it is not equal to + current inode size, it will be vmtruncated(). +@size - 0 +@ext - 0 + + +@NETFS_READ_PAGES +Used to read multiple contiguous pages in one go. + +@start - first byte of the contiguous region to read. +@size - contains of two fields: lower 8 bits are used to represent page cache shift + used by client, another 3 bytes are used to get number of pages. +@id - local inode number. +@ext - path length to the object. + + +@NETFS_RENAME +Used to rename object. +Attached data is formed into following string: "old_path|new_path". + +@id - local inode number. +@start - parent inode number. +@size - length of the above string. +@ext - length of the old path part. + + +@NETFS_CAPABILITIES +Used to exchange crypto capabilities with server. +If crypto capabilities are not supported by server, then client will disable it +or fail (if 'crypto_fail_unsupported' mount options was specified). + +@id - superblock index. Used to specify crypto information for group of servers. +@size - size of the attached capabilities structure. +@start - 0. +@size - 0. +@scsize - 0. + +@NETFS_LOCK +Used to send lock request/release messages. Although it sends byte range request +and is capable of flushing pages based on that, it is not used, since all Linux +filesystems lock the whole inode. + +@id - lock generation number. +@start - start of the locked range. +@size - size of the locked range. +@ext - lock type: read/write. Not used actually. 15'th bit is used to determine, + if it is lock request (1) or release (0). + +@NETFS_XATTR_SET +@NETFS_XATTR_GET +Used to set/get extended attributes for given inode. +@id - attribute generation number or xattr setting type +@start - size of the attribute (request or attached) +@size - name length, path len and data size for given attribute +@ext - path length for given object ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [2/9] pohmelfs: configuration interface. 2008-12-26 14:18 ` [1/9] pohmelfs: documentation Evgeniy Polyakov @ 2008-12-26 14:18 ` Evgeniy Polyakov 2008-12-26 14:18 ` [3/9] pohmelfs: crypto processing Evgeniy Polyakov 0 siblings, 1 reply; 6+ messages in thread From: Evgeniy Polyakov @ 2008-12-26 14:18 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, pohmelfs, netdev, linux-fsdevel, Evgeniy Polyakov This patch includes POHMELFS configuration interface based on netlink kernel connector. This interface allows to create configuration groups in the kerenel indexed by mount ID option. Each configuration group can include multiple servers to work with and various operation parameters. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> diff --git a/fs/pohmelfs/config.c b/fs/pohmelfs/config.c new file mode 100644 index 0000000..941ab02 --- /dev/null +++ b/fs/pohmelfs/config.c @@ -0,0 +1,478 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/kernel.h> +#include <linux/connector.h> +#include <linux/crypto.h> +#include <linux/list.h> +#include <linux/mutex.h> +#include <linux/string.h> +#include <linux/in.h> + +#include "netfs.h" + +/* + * Global configuration list. + * Each client can be asked to get one of them. + * + * Allows to provide remote server address (ipv4/v6/whatever), port + * and so on via kernel connector. + */ + +static struct cb_id pohmelfs_cn_id = {.idx = POHMELFS_CN_IDX, .val = POHMELFS_CN_VAL}; +static LIST_HEAD(pohmelfs_config_list); +static DEFINE_MUTEX(pohmelfs_config_lock); + +static inline int pohmelfs_config_eql(struct pohmelfs_ctl *sc, struct pohmelfs_ctl *ctl) +{ + if (sc->idx == ctl->idx && sc->type == ctl->type && + sc->proto == ctl->proto && + sc->addrlen == ctl->addrlen && + !memcmp(&sc->addr, &ctl->addr, ctl->addrlen)) + return 1; + + return 0; +} + +static struct pohmelfs_config_group *pohmelfs_find_config_group(unsigned int idx) +{ + struct pohmelfs_config_group *g, *group = NULL; + + list_for_each_entry(g, &pohmelfs_config_list, group_entry) { + if (g->idx == idx) { + group = g; + break; + } + } + + return group; +} + +static struct pohmelfs_config_group *pohmelfs_find_create_config_group(unsigned int idx) +{ + struct pohmelfs_config_group *g; + + g = pohmelfs_find_config_group(idx); + if (g) + return g; + + g = kzalloc(sizeof(struct pohmelfs_config_group), GFP_KERNEL); + if (!g) + return NULL; + + INIT_LIST_HEAD(&g->config_list); + g->idx = idx; + g->num_entry = 0; + + list_add_tail(&g->group_entry, &pohmelfs_config_list); + + return g; +} + +int pohmelfs_copy_config(struct pohmelfs_sb *psb) +{ + struct pohmelfs_config_group *g; + struct pohmelfs_config *c, *dst; + int err = -ENODEV; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_config_group(psb->idx); + if (!g) + goto out_unlock; + + /* + * Run over all entries in given config group and try to crate and + * initialize those, which do not exist in superblock list. + * Skip all existing entries. + */ + + list_for_each_entry(c, &g->config_list, config_entry) { + err = 0; + list_for_each_entry(dst, &psb->state_list, config_entry) { + if (pohmelfs_config_eql(&dst->state.ctl, &c->state.ctl)) { + err = -EEXIST; + break; + } + } + + if (err) + continue; + + dst = kzalloc(sizeof(struct pohmelfs_config), GFP_KERNEL); + if (!dst) { + err = -ENOMEM; + break; + } + + memcpy(&dst->state.ctl, &c->state.ctl, sizeof(struct pohmelfs_ctl)); + + list_add_tail(&dst->config_entry, &psb->state_list); + + err = pohmelfs_state_init_one(psb, dst); + if (err) { + list_del(&dst->config_entry); + kfree(dst); + } + + err = 0; + } + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + + return err; +} + +int pohmelfs_copy_crypto(struct pohmelfs_sb *psb) +{ + struct pohmelfs_config_group *g; + int err = -ENOENT; + + mutex_lock(&pohmelfs_config_lock); + g = pohmelfs_find_config_group(psb->idx); + if (!g) + goto err_out_exit; + + if (g->hash_string) { + err = -ENOMEM; + psb->hash_string = kstrdup(g->hash_string, GFP_KERNEL); + if (!psb->hash_string) + goto err_out_exit; + psb->hash_strlen = g->hash_strlen; + } + + if (g->cipher_string) { + psb->cipher_string = kstrdup(g->cipher_string, GFP_KERNEL); + if (!psb->cipher_string) + goto err_out_free_hash_string; + psb->cipher_strlen = g->cipher_strlen; + } + + if (g->hash_keysize) { + psb->hash_key = kmalloc(g->hash_keysize, GFP_KERNEL); + if (!psb->hash_key) + goto err_out_free_cipher_string; + memcpy(psb->hash_key, g->hash_key, g->hash_keysize); + psb->hash_keysize = g->hash_keysize; + } + + if (g->cipher_keysize) { + psb->cipher_key = kmalloc(g->cipher_keysize, GFP_KERNEL); + if (!psb->cipher_key) + goto err_out_free_hash; + memcpy(psb->cipher_key, g->cipher_key, g->cipher_keysize); + psb->cipher_keysize = g->cipher_keysize; + } + + mutex_unlock(&pohmelfs_config_lock); + + return 0; + +err_out_free_hash: + kfree(psb->hash_key); +err_out_free_cipher_string: + kfree(psb->cipher_string); +err_out_free_hash_string: + kfree(psb->hash_string); +err_out_exit: + mutex_unlock(&pohmelfs_config_lock); + return err; +} + +static int pohmelfs_send_reply(int err, int msg_num, int action, struct cn_msg *msg, struct pohmelfs_ctl *ctl) +{ + struct pohmelfs_cn_ack *ack; + + ack = kmalloc(sizeof(struct pohmelfs_cn_ack), GFP_KERNEL); + if (!ack) + return -ENOMEM; + + memset(ack, 0, sizeof(struct pohmelfs_cn_ack)); + memcpy(&ack->msg, msg, sizeof(struct cn_msg)); + + if (action == POHMELFS_CTLINFO_ACK) + memcpy(&ack->ctl, ctl, sizeof(struct pohmelfs_ctl)); + + ack->msg.len = sizeof(struct pohmelfs_cn_ack) - sizeof(struct cn_msg); + ack->msg.ack = msg->ack + 1; + ack->error = err; + ack->msg_num = msg_num; + + cn_netlink_send(&ack->msg, 0, GFP_KERNEL); + kfree(ack); + return 0; +} + +static int pohmelfs_cn_disp(struct cn_msg *msg) +{ + struct pohmelfs_config_group *g; + struct pohmelfs_ctl *ctl = (struct pohmelfs_ctl *)msg->data; + struct pohmelfs_config *c, *tmp; + int err = 0, i = 1; + + if (msg->len != sizeof(struct pohmelfs_ctl)) + return -EBADMSG; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_config_group(ctl->idx); + if (!g) { + pohmelfs_send_reply(err, 0, POHMELFS_NOINFO_ACK, msg, NULL); + goto out_unlock; + } + + list_for_each_entry_safe(c, tmp, &g->config_list, config_entry) { + struct pohmelfs_ctl *sc = &c->state.ctl; + if (pohmelfs_send_reply(err, g->num_entry - i, POHMELFS_CTLINFO_ACK, msg, sc)) { + err = -ENOMEM; + goto out_unlock; + } + i += 1; + } + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + return err; +} + +static int pohmelfs_cn_ctl(struct cn_msg *msg, int action) +{ + struct pohmelfs_config_group *g; + struct pohmelfs_ctl *ctl = (struct pohmelfs_ctl *)msg->data; + struct pohmelfs_config *c, *tmp; + int err = 0; + + if (msg->len != sizeof(struct pohmelfs_ctl)) + return -EBADMSG; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_create_config_group(ctl->idx); + if (!g) { + err = -ENOMEM; + goto out_unlock; + } + + list_for_each_entry_safe(c, tmp, &g->config_list, config_entry) { + struct pohmelfs_ctl *sc = &c->state.ctl; + + if (pohmelfs_config_eql(sc, ctl)) { + if (action == POHMELFS_FLAGS_ADD) { + err = -EEXIST; + goto out_unlock; + } else if (action == POHMELFS_FLAGS_DEL) { + list_del(&c->config_entry); + g->num_entry--; + kfree(c); + goto out_unlock; + } else { + err = -EEXIST; + goto out_unlock; + } + } + } + if (action == POHMELFS_FLAGS_DEL) { + err = -EBADMSG; + goto out_unlock; + } + + c = kzalloc(sizeof(struct pohmelfs_config), GFP_KERNEL); + if (!c) { + err = -ENOMEM; + goto out_unlock; + } + memcpy(&c->state.ctl, ctl, sizeof(struct pohmelfs_ctl)); + g->num_entry++; + list_add_tail(&c->config_entry, &g->config_list); + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + if (pohmelfs_send_reply(err, 0, POHMELFS_NOINFO_ACK, msg, NULL)) + err = -ENOMEM; + + return err; +} + +static int pohmelfs_crypto_hash_init(struct pohmelfs_config_group *g, struct pohmelfs_crypto *c) +{ + char *algo = (char *)c->data; + u8 *key = (u8 *)(algo + c->strlen); + + if (g->hash_string) + return -EEXIST; + + g->hash_string = kstrdup(algo, GFP_KERNEL); + if (!g->hash_string) + return -ENOMEM; + g->hash_strlen = c->strlen; + g->hash_keysize = c->keysize; + + g->hash_key = kmalloc(c->keysize, GFP_KERNEL); + if (!g->hash_key) { + kfree(g->hash_string); + return -ENOMEM; + } + + memcpy(g->hash_key, key, c->keysize); + + return 0; +} + +static int pohmelfs_crypto_cipher_init(struct pohmelfs_config_group *g, struct pohmelfs_crypto *c) +{ + char *algo = (char *)c->data; + u8 *key = (u8 *)(algo + c->strlen); + + if (g->cipher_string) + return -EEXIST; + + g->cipher_string = kstrdup(algo, GFP_KERNEL); + if (!g->cipher_string) + return -ENOMEM; + g->cipher_strlen = c->strlen; + g->cipher_keysize = c->keysize; + + g->cipher_key = kmalloc(c->keysize, GFP_KERNEL); + if (!g->cipher_key) { + kfree(g->cipher_string); + return -ENOMEM; + } + + memcpy(g->cipher_key, key, c->keysize); + + return 0; +} + + +static int pohmelfs_cn_crypto(struct cn_msg *msg) +{ + struct pohmelfs_crypto *crypto = (struct pohmelfs_crypto *)msg->data; + struct pohmelfs_config_group *g; + int err = 0; + + dprintk("%s: idx: %u, strlen: %u, type: %u, keysize: %u, algo: %s.\n", + __func__, crypto->idx, crypto->strlen, crypto->type, + crypto->keysize, (char *)crypto->data); + + mutex_lock(&pohmelfs_config_lock); + g = pohmelfs_find_create_config_group(crypto->idx); + if (!g) { + err = -ENOMEM; + goto out_unlock; + } + + switch (crypto->type) { + case POHMELFS_CRYPTO_HASH: + err = pohmelfs_crypto_hash_init(g, crypto); + break; + case POHMELFS_CRYPTO_CIPHER: + err = pohmelfs_crypto_cipher_init(g, crypto); + break; + default: + err = -ENOTSUPP; + break; + } + +out_unlock: + mutex_unlock(&pohmelfs_config_lock); + if (pohmelfs_send_reply(err, 0, POHMELFS_NOINFO_ACK, msg, NULL)) + err = -ENOMEM; + + return err; +} + +static void pohmelfs_cn_callback(void *data) +{ + struct cn_msg *msg = data; + int err; + + switch (msg->flags) { + case POHMELFS_FLAGS_ADD: + err = pohmelfs_cn_ctl(msg, POHMELFS_FLAGS_ADD); + break; + case POHMELFS_FLAGS_DEL: + err = pohmelfs_cn_ctl(msg, POHMELFS_FLAGS_DEL); + break; + case POHMELFS_FLAGS_SHOW: + err = pohmelfs_cn_disp(msg); + break; + case POHMELFS_FLAGS_CRYPTO: + err = pohmelfs_cn_crypto(msg); + break; + default: + err = -ENOSYS; + break; + } +} + +int pohmelfs_config_check(struct pohmelfs_config *config, int idx) +{ + struct pohmelfs_ctl *ctl = &config->state.ctl; + struct pohmelfs_config *tmp; + int err = -ENOENT; + struct pohmelfs_ctl *sc; + struct pohmelfs_config_group *g; + + mutex_lock(&pohmelfs_config_lock); + + g = pohmelfs_find_config_group(ctl->idx); + if (g) { + list_for_each_entry(tmp, &g->config_list, config_entry) { + sc = &tmp->state.ctl; + + if (pohmelfs_config_eql(sc, ctl)) { + err = 0; + break; + } + } + } + + mutex_unlock(&pohmelfs_config_lock); + + return err; +} + +int __init pohmelfs_config_init(void) +{ + return cn_add_callback(&pohmelfs_cn_id, "pohmelfs", pohmelfs_cn_callback); +} + +void pohmelfs_config_exit(void) +{ + struct pohmelfs_config *c, *tmp; + struct pohmelfs_config_group *g, *gtmp; + + cn_del_callback(&pohmelfs_cn_id); + + mutex_lock(&pohmelfs_config_lock); + list_for_each_entry_safe(g, gtmp, &pohmelfs_config_list, group_entry) { + list_for_each_entry_safe(c, tmp, &g->config_list, config_entry) { + list_del(&c->config_entry); + kfree(c); + } + + list_del(&g->group_entry); + + if (g->hash_string) + kfree(g->hash_string); + + if (g->cipher_string) + kfree(g->cipher_string); + + kfree(g); + } + mutex_unlock(&pohmelfs_config_lock); +} ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [3/9] pohmelfs: crypto processing. 2008-12-26 14:18 ` [2/9] pohmelfs: configuration interface Evgeniy Polyakov @ 2008-12-26 14:18 ` Evgeniy Polyakov 2008-12-26 14:18 ` [4/9] pohmelfs: directory operations Evgeniy Polyakov 0 siblings, 1 reply; 6+ messages in thread From: Evgeniy Polyakov @ 2008-12-26 14:18 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, pohmelfs, netdev, linux-fsdevel, Evgeniy Polyakov POHMELFS is able to encrypt used network channel or attach strong checksum to own packets to catch faulty media. This patch implements crypto initialization, its autoconfiguration and sync with the server. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> diff --git a/fs/pohmelfs/crypto.c b/fs/pohmelfs/crypto.c new file mode 100644 index 0000000..f8e8bde --- /dev/null +++ b/fs/pohmelfs/crypto.c @@ -0,0 +1,880 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/crypto.h> +#include <linux/highmem.h> +#include <linux/kthread.h> +#include <linux/pagemap.h> +#include <linux/slab.h> + +#include "netfs.h" + +static struct crypto_hash *pohmelfs_init_hash(struct pohmelfs_sb *psb) +{ + int err; + struct crypto_hash *hash; + + hash = crypto_alloc_hash(psb->hash_string, 0, CRYPTO_ALG_ASYNC); + if (IS_ERR(hash)) { + err = PTR_ERR(hash); + dprintk("%s: idx: %u: failed to allocate hash '%s', err: %d.\n", + __func__, psb->idx, psb->hash_string, err); + goto err_out_exit; + } + + psb->crypto_attached_size = crypto_hash_digestsize(hash); + + if (!psb->hash_keysize) + return hash; + + err = crypto_hash_setkey(hash, psb->hash_key, psb->hash_keysize); + if (err) { + dprintk("%s: idx: %u: failed to set key for hash '%s', err: %d.\n", + __func__, psb->idx, psb->hash_string, err); + goto err_out_free; + } + + return hash; + +err_out_free: + crypto_free_hash(hash); +err_out_exit: + return ERR_PTR(err); +} + +static struct crypto_ablkcipher *pohmelfs_init_cipher(struct pohmelfs_sb *psb) +{ + int err = -EINVAL; + struct crypto_ablkcipher *cipher; + + if (!psb->cipher_keysize) + goto err_out_exit; + + cipher = crypto_alloc_ablkcipher(psb->cipher_string, 0, 0); + if (IS_ERR(cipher)) { + err = PTR_ERR(cipher); + dprintk("%s: idx: %u: failed to allocate cipher '%s', err: %d.\n", + __func__, psb->idx, psb->cipher_string, err); + goto err_out_exit; + } + + crypto_ablkcipher_clear_flags(cipher, ~0); + + err = crypto_ablkcipher_setkey(cipher, psb->cipher_key, psb->cipher_keysize); + if (err) { + dprintk("%s: idx: %u: failed to set key for cipher '%s', err: %d.\n", + __func__, psb->idx, psb->cipher_string, err); + goto err_out_free; + } + + return cipher; + +err_out_free: + crypto_free_ablkcipher(cipher); +err_out_exit: + return ERR_PTR(err); +} + +int pohmelfs_crypto_engine_init(struct pohmelfs_crypto_engine *e, struct pohmelfs_sb *psb) +{ + int err; + + e->page_num = 0; + + e->size = PAGE_SIZE; + e->data = kmalloc(e->size, GFP_KERNEL); + if (!e->data) { + err = -ENOMEM; + goto err_out_exit; + } + + if (psb->hash_string) { + e->hash = pohmelfs_init_hash(psb); + if (IS_ERR(e->hash)) { + err = PTR_ERR(e->hash); + e->hash = NULL; + goto err_out_free; + } + } + + if (psb->cipher_string) { + e->cipher = pohmelfs_init_cipher(psb); + if (IS_ERR(e->cipher)) { + err = PTR_ERR(e->cipher); + e->cipher = NULL; + goto err_out_free_hash; + } + } + + return 0; + +err_out_free_hash: + crypto_free_hash(e->hash); +err_out_free: + kfree(e->data); +err_out_exit: + return err; +} + +void pohmelfs_crypto_engine_exit(struct pohmelfs_crypto_engine *e) +{ + if (e->hash) + crypto_free_hash(e->hash); + if (e->cipher) + crypto_free_ablkcipher(e->cipher); + kfree(e->data); +} + +static void pohmelfs_crypto_complete(struct crypto_async_request *req, int err) +{ + struct pohmelfs_crypto_completion *c = req->data; + + if (err == -EINPROGRESS) + return; + + dprintk("%s: req: %p, err: %d.\n", __func__, req, err); + c->error = err; + complete(&c->complete); +} + +static int pohmelfs_crypto_process(struct ablkcipher_request *req, + struct scatterlist *sg_dst, struct scatterlist *sg_src, + void *iv, int enc, unsigned long timeout) +{ + struct pohmelfs_crypto_completion complete; + int err; + + init_completion(&complete.complete); + complete.error = -EINPROGRESS; + + ablkcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG, + pohmelfs_crypto_complete, &complete); + + ablkcipher_request_set_crypt(req, sg_src, sg_dst, sg_src->length, iv); + + if (enc) + err = crypto_ablkcipher_encrypt(req); + else + err = crypto_ablkcipher_decrypt(req); + + switch (err) { + case -EINPROGRESS: + case -EBUSY: + err = wait_for_completion_interruptible_timeout(&complete.complete, + timeout); + if (!err) + err = -ETIMEDOUT; + else + err = complete.error; + break; + default: + break; + } + + return err; +} + +int pohmelfs_crypto_process_input_data(struct pohmelfs_crypto_engine *e, u64 cmd_iv, + void *data, struct page *page, unsigned int size) +{ + int err; + struct scatterlist sg; + + if (!e->cipher && !e->hash) + return 0; + + dprintk("%s: eng: %p, iv: %llx, data: %p, page: %p/%lu, size: %u.\n", + __func__, e, cmd_iv, data, page, (page)?page->index:0, size); + + if (data) { + sg_init_one(&sg, data, size); + } else { + sg_init_table(&sg, 1); + sg_set_page(&sg, page, size, 0); + } + + if (e->cipher) { + struct ablkcipher_request *req = e->data + crypto_hash_digestsize(e->hash); + u8 iv[32]; + + memset(iv, 0, sizeof(iv)); + memcpy(iv, &cmd_iv, sizeof(cmd_iv)); + + ablkcipher_request_set_tfm(req, e->cipher); + + err = pohmelfs_crypto_process(req, &sg, &sg, iv, 0, e->timeout); + if (err) + goto err_out_exit; + } + + if (e->hash) { + struct hash_desc desc; + void *dst = e->data + e->size/2; + + desc.tfm = e->hash; + desc.flags = 0; + + err = crypto_hash_init(&desc); + if (err) + goto err_out_exit; + + err = crypto_hash_update(&desc, &sg, size); + if (err) + goto err_out_exit; + + err = crypto_hash_final(&desc, dst); + if (err) + goto err_out_exit; + + err = !!memcmp(dst, e->data, crypto_hash_digestsize(e->hash)); + + if (err) { +#ifdef CONFIG_POHMELFS_DEBUG + unsigned int i; + unsigned char *recv = e->data, *calc = dst; + + dprintk("%s: eng: %p, hash: %p, cipher: %p: iv : %llx, hash mismatch (recv/calc): ", + __func__, e, e->hash, e->cipher, cmd_iv); + for (i=0; i<crypto_hash_digestsize(e->hash); ++i) { +#if 0 + dprintka("%02x ", recv[i]); + if (recv[i] != calc[i]) { + dprintka("| calc byte: %02x.\n", calc[i]); + break; + } +#else + dprintka("%02x/%02x ", recv[i], calc[i]); +#endif + } + dprintk("\n"); +#endif + goto err_out_exit; + } else { + dprintk("%s: eng: %p, hash: %p, cipher: %p: hashes matched.\n", + __func__, e, e->hash, e->cipher); + } + } + + dprintk("%s: eng: %p, size: %u, hash: %p, cipher: %p: completed.\n", + __func__, e, e->size, e->hash, e->cipher); + + return 0; + +err_out_exit: + dprintk("%s: eng: %p, hash: %p, cipher: %p: err: %d.\n", + __func__, e, e->hash, e->cipher, err); + return err; +} + +static int pohmelfs_trans_iter(struct netfs_trans *t, struct pohmelfs_crypto_engine *e, + int (* iterator) (struct pohmelfs_crypto_engine *e, + struct scatterlist *dst, + struct scatterlist *src)) +{ + void *data = t->iovec.iov_base + sizeof(struct netfs_cmd) + t->psb->crypto_attached_size; + unsigned int size = t->iovec.iov_len - sizeof(struct netfs_cmd) - t->psb->crypto_attached_size; + struct netfs_cmd *cmd = data; + unsigned int sz, pages = t->attached_pages, i, csize, cmd_cmd, dpage_idx; + struct scatterlist sg_src, sg_dst; + int err; + + while (size) { + cmd = data; + cmd_cmd = __be16_to_cpu(cmd->cmd); + csize = __be32_to_cpu(cmd->size); + cmd->iv = __cpu_to_be64(e->iv); + + if (cmd_cmd == NETFS_READ_PAGES || cmd_cmd == NETFS_READ_PAGE) + csize = __be16_to_cpu(cmd->ext); + + sz = csize + __be16_to_cpu(cmd->cpad) + sizeof(struct netfs_cmd); + + dprintk("%s: size: %u, sz: %u, cmd_size: %u, cmd_cpad: %u.\n", + __func__, size, sz, __be32_to_cpu(cmd->size), __be16_to_cpu(cmd->cpad)); + + data += sz; + size -= sz; + + sg_init_one(&sg_src, cmd->data, sz - sizeof(struct netfs_cmd)); + sg_init_one(&sg_dst, cmd->data, sz - sizeof(struct netfs_cmd)); + + err = iterator(e, &sg_dst, &sg_src); + if (err) + return err; + } + + if (!pages) + return 0; + + dpage_idx = 0; + for (i=0; i<t->page_num; ++i) { + struct page *page = t->pages[i]; + struct page *dpage = e->pages[dpage_idx]; + + if (!page) + continue; + + sg_init_table(&sg_src, 1); + sg_init_table(&sg_dst, 1); + sg_set_page(&sg_src, page, page_private(page), 0); + sg_set_page(&sg_dst, dpage, page_private(page), 0); + + err = iterator(e, &sg_dst, &sg_src); + if (err) + return err; + + pages--; + if (!pages) + break; + dpage_idx++; + } + + return 0; +} + +static int pohmelfs_encrypt_iterator(struct pohmelfs_crypto_engine *e, + struct scatterlist *sg_dst, struct scatterlist *sg_src) +{ + struct ablkcipher_request *req = e->data; + u8 iv[32]; + + memset(iv, 0, sizeof(iv)); + + memcpy(iv, &e->iv, sizeof(e->iv)); + + return pohmelfs_crypto_process(req, sg_dst, sg_src, iv, 1, e->timeout); +} + +static int pohmelfs_encrypt(struct pohmelfs_crypto_thread *tc) +{ + struct netfs_trans *t = tc->trans; + struct pohmelfs_crypto_engine *e = &tc->eng; + struct ablkcipher_request *req = e->data; + + memset(req, 0, sizeof(struct ablkcipher_request)); + ablkcipher_request_set_tfm(req, e->cipher); + + e->iv = pohmelfs_gen_iv(t); + + return pohmelfs_trans_iter(t, e, pohmelfs_encrypt_iterator); +} + +static int pohmelfs_hash_iterator(struct pohmelfs_crypto_engine *e, + struct scatterlist *sg_dst, struct scatterlist *sg_src) +{ + return crypto_hash_update(e->data, sg_src, sg_src->length); +} + +static int pohmelfs_hash(struct pohmelfs_crypto_thread *tc) +{ + struct pohmelfs_crypto_engine *e = &tc->eng; + struct hash_desc *desc = e->data; + unsigned char *dst = tc->trans->iovec.iov_base + sizeof(struct netfs_cmd); + int err; + + desc->tfm = e->hash; + desc->flags = 0; + + err = crypto_hash_init(desc); + if (err) + return err; + + err = pohmelfs_trans_iter(tc->trans, e, pohmelfs_hash_iterator); + if (err) + return err; + + err = crypto_hash_final(desc, dst); + if (err) + return err; + + { + unsigned int i; + dprintk("%s: ", __func__); + for (i=0; i<tc->psb->crypto_attached_size; ++i) + dprintka("%02x ", dst[i]); + dprintka("\n"); + } + + return 0; +} + +static void pohmelfs_crypto_pages_free(struct pohmelfs_crypto_engine *e) +{ + unsigned int i; + + for (i=0; i<e->page_num; ++i) + __free_page(e->pages[i]); + kfree(e->pages); +} + +static int pohmelfs_crypto_pages_alloc(struct pohmelfs_crypto_engine *e, struct pohmelfs_sb *psb) +{ + unsigned int i; + + e->pages = kmalloc(psb->trans_max_pages * sizeof(struct page *), GFP_KERNEL); + if (!e->pages) + return -ENOMEM; + + for (i=0; i<psb->trans_max_pages; ++i) { + e->pages[i] = alloc_page(GFP_KERNEL); + if (!e->pages[i]) + break; + } + + e->page_num = i; + if (!e->page_num) + goto err_out_free; + + return 0; + +err_out_free: + kfree(e->pages); + return -ENOMEM; +} + +static void pohmelfs_sys_crypto_exit_one(struct pohmelfs_crypto_thread *t) +{ + struct pohmelfs_sb *psb = t->psb; + + if (t->thread) + kthread_stop(t->thread); + + mutex_lock(&psb->crypto_thread_lock); + list_del(&t->thread_entry); + psb->crypto_thread_num--; + mutex_unlock(&psb->crypto_thread_lock); + + pohmelfs_crypto_engine_exit(&t->eng); + pohmelfs_crypto_pages_free(&t->eng); + kfree(t); +} + +static int pohmelfs_crypto_finish(struct netfs_trans *t, struct pohmelfs_sb *psb, int err) +{ + if (likely(!err)) { + struct netfs_cmd *cmd = t->iovec.iov_base; + netfs_convert_cmd(cmd); + + err = netfs_trans_finish_send(t, psb); + } + t->result = err; + netfs_trans_put(t); + + return err; +} + +void pohmelfs_crypto_thread_make_ready(struct pohmelfs_crypto_thread *th) +{ + struct pohmelfs_sb *psb = th->psb; + + th->page = NULL; + th->trans = NULL; + + mutex_lock(&psb->crypto_thread_lock); + list_move_tail(&th->thread_entry, &psb->crypto_ready_list); + mutex_unlock(&psb->crypto_thread_lock); + wake_up(&psb->wait); +} + +static int pohmelfs_crypto_thread_trans(struct pohmelfs_crypto_thread *t) +{ + struct netfs_trans *trans; + int err = 0; + + trans = t->trans; + trans->eng = NULL; + + if (t->eng.hash) { + err = pohmelfs_hash(t); + if (err) + goto out_complete; + } + + if (t->eng.cipher) { + err = pohmelfs_encrypt(t); + if (err) + goto out_complete; + trans->eng = &t->eng; + } + +out_complete: + t->page = NULL; + t->trans = NULL; + + if (!trans->eng) + pohmelfs_crypto_thread_make_ready(t); + + pohmelfs_crypto_finish(trans, t->psb, err); + return err; +} + +static int pohmelfs_crypto_thread_page(struct pohmelfs_crypto_thread *t) +{ + struct pohmelfs_crypto_engine *e = &t->eng; + struct page *page = t->page; + int err; + + WARN_ON(!PageChecked(page)); + + err = pohmelfs_crypto_process_input_data(e, e->iv, NULL, page, t->size); + if (!err) + SetPageUptodate(page); + else + SetPageError(page); + unlock_page(page); + page_cache_release(page); + + pohmelfs_crypto_thread_make_ready(t); + + return err; +} + +static int pohmelfs_crypto_thread_func(void *data) +{ + struct pohmelfs_crypto_thread *t = data; + + while (!kthread_should_stop()) { + wait_event_interruptible(t->wait, kthread_should_stop() || + t->trans || t->page); + + if (kthread_should_stop()) + break; + + if (!t->trans && !t->page) + continue; + + dprintk("%s: thread: %p, trans: %p, page: %p.\n", + __func__, t, t->trans, t->page); + + if (t->trans) + pohmelfs_crypto_thread_trans(t); + else if (t->page) + pohmelfs_crypto_thread_page(t); + } + + return 0; +} + +static void pohmelfs_crypto_flush(struct pohmelfs_sb *psb, struct list_head *head) +{ + while (!list_empty(head)) { + struct pohmelfs_crypto_thread *t = NULL; + + mutex_lock(&psb->crypto_thread_lock); + if (!list_empty(head)) { + t = list_first_entry(head, struct pohmelfs_crypto_thread, thread_entry); + list_del_init(&t->thread_entry); + } + mutex_unlock(&psb->crypto_thread_lock); + + if (t) + pohmelfs_sys_crypto_exit_one(t); + } +} + +static void pohmelfs_sys_crypto_exit(struct pohmelfs_sb *psb) +{ + while (!list_empty(&psb->crypto_active_list) || !list_empty(&psb->crypto_ready_list)) { + dprintk("%s: crypto_thread_num: %u.\n", __func__, psb->crypto_thread_num); + pohmelfs_crypto_flush(psb, &psb->crypto_active_list); + pohmelfs_crypto_flush(psb, &psb->crypto_ready_list); + } +} + +static int pohmelfs_sys_crypto_init(struct pohmelfs_sb *psb) +{ + unsigned int i; + struct pohmelfs_crypto_thread *t; + struct pohmelfs_config *c; + struct netfs_state *st; + int err; + + list_for_each_entry(c, &psb->state_list, config_entry) { + st = &c->state; + + err = pohmelfs_crypto_engine_init(&st->eng, psb); + if (err) + goto err_out_exit; + + dprintk("%s: st: %p, eng: %p, hash: %p, cipher: %p.\n", + __func__, st, &st->eng, &st->eng.hash, &st->eng.cipher); + } + + for (i=0; i<psb->crypto_thread_num; ++i) { + err = -ENOMEM; + t = kzalloc(sizeof(struct pohmelfs_crypto_thread), GFP_KERNEL); + if (!t) + goto err_out_free_state_engines; + + init_waitqueue_head(&t->wait); + + t->psb = psb; + t->trans = NULL; + t->eng.thread = t; + + err = pohmelfs_crypto_engine_init(&t->eng, psb); + if (err) + goto err_out_free_state_engines; + + err = pohmelfs_crypto_pages_alloc(&t->eng, psb); + if (err) + goto err_out_free; + + t->thread = kthread_run(pohmelfs_crypto_thread_func, t, + "pohmelfs-crypto-%d-%d", psb->idx, i); + if (IS_ERR(t->thread)) { + err = PTR_ERR(t->thread); + t->thread = NULL; + goto err_out_free; + } + + if (t->eng.cipher) + psb->crypto_align_size = crypto_ablkcipher_blocksize(t->eng.cipher); + + mutex_lock(&psb->crypto_thread_lock); + list_add_tail(&t->thread_entry, &psb->crypto_ready_list); + mutex_unlock(&psb->crypto_thread_lock); + } + + psb->crypto_thread_num = i; + return 0; + +err_out_free: + pohmelfs_sys_crypto_exit_one(t); +err_out_free_state_engines: + list_for_each_entry(c, &psb->state_list, config_entry) { + st = &c->state; + pohmelfs_crypto_engine_exit(&st->eng); + } +err_out_exit: + pohmelfs_sys_crypto_exit(psb); + return err; +} + +void pohmelfs_crypto_exit(struct pohmelfs_sb *psb) +{ + pohmelfs_sys_crypto_exit(psb); + + kfree(psb->hash_string); + kfree(psb->cipher_string); +} + +static int pohmelfs_crypt_init_complete(struct page **pages, unsigned int page_num, + void *private, int err) +{ + struct pohmelfs_sb *psb = private; + + psb->flags = -err; + dprintk("%s: err: %d.\n", __func__, err); + + wake_up(&psb->wait); + + return err; +} + +static int pohmelfs_crypto_init_handshake(struct pohmelfs_sb *psb) +{ + struct netfs_trans *t; + struct netfs_crypto_capabilities *cap; + struct netfs_cmd *cmd; + char *str; + int err = -ENOMEM, size; + + size = sizeof(struct netfs_crypto_capabilities) + + psb->cipher_strlen + psb->hash_strlen + 2; /* 0 bytes */ + + t = netfs_trans_alloc(psb, size, 0, 0); + if (!t) + goto err_out_exit; + + t->complete = pohmelfs_crypt_init_complete; + t->private = psb; + + cmd = netfs_trans_current(t); + cap = (struct netfs_crypto_capabilities *)(cmd + 1); + str = (char *)(cap + 1); + + cmd->cmd = NETFS_CAPABILITIES; + cmd->id = POHMELFS_CRYPTO_CAPABILITIES; + cmd->size = size; + cmd->start = 0; + cmd->ext = 0; + cmd->csize = 0; + + netfs_convert_cmd(cmd); + netfs_trans_update(cmd, t, size); + + cap->hash_strlen = psb->hash_strlen; + if (cap->hash_strlen) { + sprintf(str, "%s", psb->hash_string); + str += cap->hash_strlen; + } + + cap->cipher_strlen = psb->cipher_strlen; + cap->cipher_keysize = psb->cipher_keysize; + if (cap->cipher_strlen) + sprintf(str, "%s", psb->cipher_string); + + netfs_convert_crypto_capabilities(cap); + + psb->flags = ~0; + err = netfs_trans_finish(t, psb); + if (err) + goto err_out_exit; + + err = wait_event_interruptible_timeout(psb->wait, (psb->flags != ~0), + psb->wait_on_page_timeout); + if (!err) + err = -ETIMEDOUT; + else + err = -psb->flags; + + if (!err) + psb->perform_crypto = 1; + psb->flags = 0; + + /* + * At this point NETFS_CAPABILITIES response command + * should setup superblock in a way, which is acceptible + * for both client and server, so if server refuses connection, + * it will send error in transaction response. + */ + + if (err) + goto err_out_exit; + + return 0; + +err_out_exit: + return err; +} + +int pohmelfs_crypto_init(struct pohmelfs_sb *psb) +{ + int err; + + if (!psb->cipher_string && !psb->hash_string) + return 0; + + err = pohmelfs_crypto_init_handshake(psb); + if (err) + return err; + + err = pohmelfs_sys_crypto_init(psb); + if (err) + return err; + + return 0; +} + +static int pohmelfs_crypto_thread_get(struct pohmelfs_sb *psb, + int (* action)(struct pohmelfs_crypto_thread *t, void *data), void *data) +{ + struct pohmelfs_crypto_thread *t = NULL; + int err; + + while (!t) { + err = wait_event_interruptible_timeout(psb->wait, + !list_empty(&psb->crypto_ready_list), + psb->wait_on_page_timeout); + + t = NULL; + err = 0; + mutex_lock(&psb->crypto_thread_lock); + if (!list_empty(&psb->crypto_ready_list)) { + t = list_entry(psb->crypto_ready_list.prev, + struct pohmelfs_crypto_thread, + thread_entry); + + list_move_tail(&t->thread_entry, + &psb->crypto_active_list); + + action(t, data); + wake_up(&t->wait); + + } + mutex_unlock(&psb->crypto_thread_lock); + } + + return err; +} + +static int pohmelfs_trans_crypt_action(struct pohmelfs_crypto_thread *t, void *data) +{ + struct netfs_trans *trans = data; + + netfs_trans_get(trans); + t->trans = trans; + + dprintk("%s: t: %p, gen: %u, thread: %p.\n", __func__, trans, trans->gen, t); + return 0; +} + +int pohmelfs_trans_crypt(struct netfs_trans *trans, struct pohmelfs_sb *psb) +{ + if ((!psb->hash_string && !psb->cipher_string) || !psb->perform_crypto) { + netfs_trans_get(trans); + return pohmelfs_crypto_finish(trans, psb, 0); + } + + return pohmelfs_crypto_thread_get(psb, pohmelfs_trans_crypt_action, trans); +} + +struct pohmelfs_crypto_input_action_data +{ + struct page *page; + struct pohmelfs_crypto_engine *e; + u64 iv; + unsigned int size; +}; + +static int pohmelfs_crypt_input_page_action(struct pohmelfs_crypto_thread *t, void *data) +{ + struct pohmelfs_crypto_input_action_data *act = data; + + memcpy(t->eng.data, act->e->data, t->psb->crypto_attached_size); + + t->size = act->size; + t->eng.iv = act->iv; + + t->page = act->page; + return 0; +} + +int pohmelfs_crypto_process_input_page(struct pohmelfs_crypto_engine *e, + struct page *page, unsigned int size, u64 iv) +{ + struct inode *inode = page->mapping->host; + struct pohmelfs_crypto_input_action_data act; + int err = -ENOENT; + + act.page = page; + act.e = e; + act.size = size; + act.iv = iv; + + err = pohmelfs_crypto_thread_get(POHMELFS_SB(inode->i_sb), + pohmelfs_crypt_input_page_action, &act); + if (err) + goto err_out_exit; + + return 0; + +err_out_exit: + SetPageUptodate(page); + page_cache_release(page); + + return err; +} ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [4/9] pohmelfs: directory operations. 2008-12-26 14:18 ` [3/9] pohmelfs: crypto processing Evgeniy Polyakov @ 2008-12-26 14:18 ` Evgeniy Polyakov 0 siblings, 0 replies; 6+ messages in thread From: Evgeniy Polyakov @ 2008-12-26 14:18 UTC (permalink / raw) To: Andrew Morton Cc: linux-kernel, pohmelfs, netdev, linux-fsdevel, Evgeniy Polyakov This patch implementes all supported directory operations like directory reading, object lookup, creation, removal and so on. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> diff --git a/fs/pohmelfs/dir.c b/fs/pohmelfs/dir.c new file mode 100644 index 0000000..29eda12 --- /dev/null +++ b/fs/pohmelfs/dir.c @@ -0,0 +1,1139 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/kernel.h> +#include <linux/fs.h> +#include <linux/jhash.h> +#include <linux/namei.h> +#include <linux/pagemap.h> + +#include "netfs.h" + +static int pohmelfs_cmp_hash(struct pohmelfs_name *n, u32 hash) +{ + if (n->hash > hash) + return -1; + if (n->hash < hash) + return 1; + + return 0; +} + +static struct pohmelfs_name *pohmelfs_search_hash_unprecise(struct pohmelfs_inode *pi, u32 hash) +{ + struct rb_node *n = pi->hash_root.rb_node; + struct pohmelfs_name *tmp = NULL; + int cmp; + + while (n) { + tmp = rb_entry(n, struct pohmelfs_name, hash_node); + + cmp = pohmelfs_cmp_hash(tmp, hash); + if (cmp < 0) + n = n->rb_left; + else if (cmp > 0) + n = n->rb_right; + else + break; + + } + + return tmp; +} + +struct pohmelfs_name *pohmelfs_search_hash(struct pohmelfs_inode *pi, u32 hash) +{ + struct pohmelfs_name *tmp; + + tmp = pohmelfs_search_hash_unprecise(pi, hash); + if (tmp && (tmp->hash == hash)) + return tmp; + + return NULL; +} + +static void __pohmelfs_name_del(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + rb_erase(&node->hash_node, &parent->hash_root); +} + +/* + * Remove name cache entry from its caches and free it. + */ +static void pohmelfs_name_free(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + __pohmelfs_name_del(parent, node); + list_del(&node->sync_del_entry); + list_del(&node->sync_create_entry); + kfree(node); +} + +static struct pohmelfs_name *pohmelfs_insert_hash(struct pohmelfs_inode *pi, + struct pohmelfs_name *new) +{ + struct rb_node **n = &pi->hash_root.rb_node, *parent = NULL; + struct pohmelfs_name *ret = NULL, *tmp; + int cmp; + + while (*n) { + parent = *n; + + tmp = rb_entry(parent, struct pohmelfs_name, hash_node); + + cmp = pohmelfs_cmp_hash(tmp, new->hash); + if (cmp < 0) + n = &parent->rb_left; + else if (cmp > 0) + n = &parent->rb_right; + else { + ret = tmp; + break; + } + } + + if (ret) { + printk("%s: exist: parent: %llu, ino: %llu, hash: %x, len: %u, data: '%s', " + "new: ino: %llu, hash: %x, len: %u, data: '%s'.\n", + __func__, pi->ino, + ret->ino, ret->hash, ret->len, ret->data, + new->ino, new->hash, new->len, new->data); + ret->ino = new->ino; + return ret; + } + + rb_link_node(&new->hash_node, parent, n); + rb_insert_color(&new->hash_node, &pi->hash_root); + + return NULL; +} + +/* + * Free name cache for given inode. + */ +void pohmelfs_free_names(struct pohmelfs_inode *parent) +{ + struct rb_node *rb_node; + struct pohmelfs_name *n; + + for (rb_node = rb_first(&parent->hash_root); rb_node;) { + n = rb_entry(rb_node, struct pohmelfs_name, hash_node); + rb_node = rb_next(rb_node); + + pohmelfs_name_free(parent, n); + } +} + +static void pohmelfs_fix_offset(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + parent->total_len -= node->len; +} + +/* + * Free name cache entry helper. + */ +void pohmelfs_name_del(struct pohmelfs_inode *parent, struct pohmelfs_name *node) +{ + pohmelfs_fix_offset(parent, node); + pohmelfs_name_free(parent, node); +} + +/* + * Insert new name cache entry into all hash cache. + */ +static int pohmelfs_insert_name(struct pohmelfs_inode *parent, struct pohmelfs_name *n) +{ + struct pohmelfs_name *name; + + name = pohmelfs_insert_hash(parent, n); + if (name) + return -EEXIST; + + parent->total_len += n->len; + list_add_tail(&n->sync_create_entry, &parent->sync_create_list); + + return 0; +} + +/* + * Allocate new name cache entry. + */ +static struct pohmelfs_name *pohmelfs_name_alloc(unsigned int len) +{ + struct pohmelfs_name *n; + + n = kzalloc(sizeof(struct pohmelfs_name) + len, GFP_KERNEL); + if (!n) + return NULL; + + INIT_LIST_HEAD(&n->sync_create_entry); + INIT_LIST_HEAD(&n->sync_del_entry); + + n->data = (char *)(n+1); + + return n; +} + +/* + * Add new name entry into directory's cache. + */ +static int pohmelfs_add_dir(struct pohmelfs_sb *psb, struct pohmelfs_inode *parent, + struct pohmelfs_inode *npi, struct qstr *str, unsigned int mode, int link) +{ + int err = -ENOMEM; + struct pohmelfs_name *n; + struct pohmelfs_path_entry *e = NULL; + + n = pohmelfs_name_alloc(str->len + 1); + if (!n) + goto err_out_exit; + + n->ino = npi->ino; + n->mode = mode; + n->len = str->len; + n->hash = str->hash; + sprintf(n->data, "%s", str->name); + + if (!(str->len == 1 && str->name[0] == '.') && + !(str->len == 2 && str->name[0] == '.' && str->name[1] == '.')) { + mutex_lock(&psb->path_lock); + e = pohmelfs_add_path_entry(psb, parent->ino, npi->ino, str, link, mode); + mutex_unlock(&psb->path_lock); + if (IS_ERR(e)) { + err = PTR_ERR(e); + goto err_out_free; + } + } + + mutex_lock(&parent->offset_lock); + err = pohmelfs_insert_name(parent, n); + mutex_unlock(&parent->offset_lock); + + if (err) { + if (err != -EEXIST) + goto err_out_remove; + kfree(n); + } + + return 0; + +err_out_remove: + if (e) { + mutex_lock(&psb->path_lock); + pohmelfs_remove_path_entry(psb, e); + mutex_unlock(&psb->path_lock); + } +err_out_free: + kfree(n); +err_out_exit: + return err; +} + +/* + * Create new inode for given parameters (name, inode info, parent). + * This does not create object on the server, it will be synced there during writeback. + */ +struct pohmelfs_inode *pohmelfs_new_inode(struct pohmelfs_sb *psb, + struct pohmelfs_inode *parent, struct qstr *str, + struct netfs_inode_info *info, int link) +{ + struct inode *new = NULL; + struct pohmelfs_inode *npi; + int err = -EEXIST; + + dprintk("%s: creating inode: parent: %llu, ino: %llu, str: %p.\n", + __func__, (parent)?parent->ino:0, info->ino, str); + + err = -ENOMEM; + new = iget_locked(psb->sb, info->ino); + if (!new) + goto err_out_exit; + + npi = POHMELFS_I(new); + npi->ino = info->ino; + err = 0; + + if (new->i_state & I_NEW) { + dprintk("%s: filling VFS inode: %lu/%llu.\n", + __func__, new->i_ino, info->ino); + pohmelfs_fill_inode(new, info); + + if (S_ISDIR(info->mode)) { + struct qstr s; + + s.name = "."; + s.len = 1; + s.hash = jhash(s.name, s.len, 0); + + err = pohmelfs_add_dir(psb, npi, npi, &s, info->mode, 0); + if (err) + goto err_out_put; + + s.name = ".."; + s.len = 2; + s.hash = jhash(s.name, s.len, 0); + + err = pohmelfs_add_dir(psb, npi, (parent)?parent:npi, &s, + (parent)?parent->vfs_inode.i_mode:npi->vfs_inode.i_mode, 0); + if (err) + goto err_out_put; + } + } + + if (str) { + if (parent) { + err = pohmelfs_add_dir(psb, parent, npi, str, info->mode, link); + + dprintk("%s: %s inserted name: '%s', new_offset: %llu, ino: %llu, parent: %llu.\n", + __func__, (err)?"unsuccessfully":"successfully", + str->name, parent->total_len, info->ino, parent->ino); + + if (err && err != -EEXIST) + goto err_out_put; + } else { + mutex_lock(&psb->path_lock); + pohmelfs_add_path_entry(psb, npi->ino, npi->ino, str, link, info->mode); + mutex_unlock(&psb->path_lock); + } + } + + if (new->i_state & I_NEW) { + if (parent) + mark_inode_dirty(&parent->vfs_inode); + mark_inode_dirty(new); + } + unlock_new_inode(new); + + return npi; + +err_out_put: + printk("%s: putting inode: %p, npi: %p, error: %d.\n", __func__, new, npi, err); + iput(new); +err_out_exit: + return ERR_PTR(err); +} + +static int pohmelfs_remote_sync_complete(struct page **pages, unsigned int page_num, + void *private, int err) +{ + struct pohmelfs_inode *pi = private; + struct pohmelfs_sb *psb = POHMELFS_SB(pi->vfs_inode.i_sb); + + dprintk("%s: ino: %llu, err: %d.\n", __func__, pi->ino, err); + + if (err) + pi->error = err; + wake_up(&psb->wait); + pohmelfs_put_inode(pi); + + return err; +} + +/* + * Receive directory content from the server. + * This should be only done for objects, which were not created locally, + * and which were not synced previously. + */ +static int pohmelfs_sync_remote_dir(struct pohmelfs_inode *pi) +{ + struct inode *inode = &pi->vfs_inode; + struct pohmelfs_sb *psb = POHMELFS_SB(inode->i_sb); + long ret = msecs_to_jiffies(25000); + int err; + + dprintk("%s: dir: %llu, state: %lx: created: %d, remote_synced: %d.\n", + __func__, pi->ino, pi->state, test_bit(NETFS_INODE_CREATED, &pi->state), + test_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state)); + + if (!test_bit(NETFS_INODE_CREATED, &pi->state)) + return 0; + + if (test_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state)) + return 0; + + if (!igrab(inode)) { + err = -ENOENT; + goto err_out_exit; + } + + err = pohmelfs_meta_command(pi, NETFS_READDIR, NETFS_TRANS_SINGLE_DST, + pohmelfs_remote_sync_complete, pi, 0); + if (err) + goto err_out_exit; + + pi->error = 0; + ret = wait_event_interruptible_timeout(psb->wait, + test_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state) || pi->error, ret); + dprintk("%s: awake dir: %llu, ret: %ld, err: %d.\n", __func__, pi->ino, ret, pi->error); + if (ret <= 0) { + err = -ETIMEDOUT; + goto err_out_exit; + } + + if (pi->error) + return pi->error; + + return 0; + +err_out_exit: + clear_bit(NETFS_INODE_REMOTE_SYNCED, &pi->state); + + return err; +} + +static int pohmelfs_dir_open(struct inode *inode, struct file *file) +{ + file->private_data = NULL; + return 0; +} + +/* + * VFS readdir callback. Syncs directory content from server if needed, + * and provides direntry info to the userspace. + */ +static int pohmelfs_readdir(struct file *file, void *dirent, filldir_t filldir) +{ + struct inode *inode = file->f_path.dentry->d_inode; + struct pohmelfs_inode *pi = POHMELFS_I(inode); + struct pohmelfs_name *n; + struct rb_node *rb_node; + int err = 0, mode; + u64 len; + + dprintk("%s: parent: %llu, fpos: %llu, hash: %08lx.\n", + __func__, pi->ino, (u64)file->f_pos, + (unsigned long)file->private_data); + + err = pohmelfs_data_lock(pi, 0, ~0, POHMELFS_READ_LOCK); + if (err) + return err; + + err = pohmelfs_sync_remote_dir(pi); + if (err) + return err; + + if (file->private_data && (file->private_data == (void *)(unsigned long)file->f_pos)) + return 0; + + mutex_lock(&pi->offset_lock); + n = pohmelfs_search_hash_unprecise(pi, (unsigned long)file->private_data); + + while (n) { + mode = (n->mode >> 12) & 15; + + dprintk("%s: offset: %llu, parent ino: %llu, name: '%s', len: %u, ino: %llu, " + "mode: %o/%o, fpos: %llu, hash: %08x.\n", + __func__, file->f_pos, pi->ino, n->data, n->len, + n->ino, n->mode, mode, file->f_pos, n->hash); + + file->private_data = (void *)n->hash; + + len = n->len; + err = filldir(dirent, n->data, n->len, file->f_pos, n->ino, mode); + + if (err < 0) { + dprintk("%s: err: %d.\n", __func__, err); + err = 0; + break; + } + + file->f_pos += len; + + rb_node = rb_next(&n->hash_node); + + if (!rb_node || (rb_node == &n->hash_node)) { + file->private_data = (void *)(unsigned long)file->f_pos; + break; + } + + n = rb_entry(rb_node, struct pohmelfs_name, hash_node); + } + mutex_unlock(&pi->offset_lock); + + return err; +} + +static loff_t pohmelfs_dir_lseek(struct file *file, loff_t offset, int origin) +{ + file->f_pos = offset; + file->private_data = NULL; + return offset; +} + +const struct file_operations pohmelfs_dir_fops = { + .open = pohmelfs_dir_open, + .read = generic_read_dir, + .llseek = pohmelfs_dir_lseek, + .readdir = pohmelfs_readdir, +}; + +/* + * Lookup single object on server. + */ +static int pohmelfs_lookup_single(struct pohmelfs_inode *parent, + struct qstr *str, u64 ino) +{ + struct pohmelfs_sb *psb = POHMELFS_SB(parent->vfs_inode.i_sb); + long ret = msecs_to_jiffies(5000); + int err; + + set_bit(NETFS_COMMAND_PENDING, &parent->state); + err = pohmelfs_meta_command_data(parent, parent->ino, NETFS_LOOKUP, + (char *)str->name, NETFS_TRANS_SINGLE_DST, NULL, NULL, ino); + if (err) + goto err_out_exit; + + err = 0; + ret = wait_event_interruptible_timeout(psb->wait, + !test_bit(NETFS_COMMAND_PENDING, &parent->state), ret); + if (ret == 0) + err = -ETIMEDOUT; + else if (signal_pending(current)) + err = -EINTR; + + if (err) + goto err_out_exit; + + return 0; + +err_out_exit: + clear_bit(NETFS_COMMAND_PENDING, &parent->state); + + printk("%s: failed: parent: %llu, ino: %llu, name: '%s', err: %d.\n", + __func__, parent->ino, ino, str->name, err); + + return err; +} + +/* + * VFS lookup callback. + * We first try to get inode number from local name cache, if we have one, + * then inode can be found in inode cache. If there is no inode or no object in + * local cache, try to lookup it on server. This only should be done for directories, + * which were not created locally, otherwise remote server does not know about dir at all, + * so no need to try to know that. + */ +struct dentry *pohmelfs_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd) +{ + struct pohmelfs_inode *parent = POHMELFS_I(dir); + struct pohmelfs_name *n; + struct inode *inode = NULL; + unsigned long ino = 0; + int err, lock_type = POHMELFS_READ_LOCK; + struct qstr str = dentry->d_name; + + if ((nd->intent.open.flags & O_ACCMODE) > 1) + lock_type = POHMELFS_WRITE_LOCK; + + err = pohmelfs_data_lock(parent, 0, ~0, lock_type); + if (err) + goto out; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + mutex_lock(&parent->offset_lock); + n = pohmelfs_search_hash(parent, str.hash); + if (n) + ino = n->ino; + mutex_unlock(&parent->offset_lock); + + dprintk("%s: 1 ino: %lu, inode: %p, name: '%s', hash: %x, parent_state: %lx.\n", + __func__, ino, inode, str.name, str.hash, parent->state); + + if (ino) { + inode = ilookup(dir->i_sb, ino); + if (inode) + goto out; + } + + dprintk("%s: dir: %p, dir_ino: %llu, name: '%s', len: %u, dir_state: %lx, ino: %lu.\n", + __func__, dir, parent->ino, + str.name, str.len, parent->state, ino); + + if (!ino) { + if (!test_bit(NETFS_INODE_CREATED, &parent->state)) + goto out; + + if (test_bit(NETFS_INODE_REMOTE_SYNCED, &parent->state)) + goto out; + } + + err = pohmelfs_lookup_single(parent, &str, ino); + if (err) + goto out; + + if (!ino) { + mutex_lock(&parent->offset_lock); + n = pohmelfs_search_hash(parent, str.hash); + if (n) + ino = n->ino; + mutex_unlock(&parent->offset_lock); + } + + if (ino) { + inode = ilookup(dir->i_sb, ino); + printk("%s: second lookup ino: %lu, inode: %p, name: '%s', hash: %x.\n", + __func__, ino, inode, str.name, str.hash); + if (!inode) { + printk("%s: No inode for ino: %lu, name: '%s', hash: %x.\n", + __func__, ino, str.name, str.hash); + //return NULL; + return ERR_PTR(-EACCES); + } + } else { + printk("%s: No inode number : name: '%s', hash: %x.\n", + __func__, str.name, str.hash); + } +out: + return d_splice_alias(inode, dentry); +} + +/* + * Create new object in local cache. Object will be synced to server + * during writeback for given inode. + */ +struct pohmelfs_inode *pohmelfs_create_entry_local(struct pohmelfs_sb *psb, + struct pohmelfs_inode *parent, struct qstr *str, u64 start, int mode) +{ + struct pohmelfs_inode *npi; + int err = -ENOMEM; + struct netfs_inode_info info; + + dprintk("%s: name: '%s', mode: %o, start: %llu.\n", + __func__, str->name, mode, start); + + info.mode = mode; + info.ino = start; + + if (!start) + info.ino = pohmelfs_new_ino(psb); + + info.nlink = S_ISDIR(mode)?2:1; + info.uid = current->uid; + info.gid = current->gid; + info.size = 0; + info.blocksize = 512; + info.blocks = 0; + info.rdev = 0; + info.version = 0; + + npi = pohmelfs_new_inode(psb, parent, str, &info, !!start); + if (IS_ERR(npi)) { + err = PTR_ERR(npi); + goto err_out_unlock; + } + + set_bit(NETFS_INODE_REMOTE_SYNCED, &npi->state); + + return npi; + +err_out_unlock: + dprintk("%s: err: %d.\n", __func__, err); + return ERR_PTR(err); +} + +/* + * Create local object and bind it to dentry. + */ +static int pohmelfs_create_entry(struct inode *dir, struct dentry *dentry, u64 start, int mode) +{ + struct pohmelfs_sb *psb = POHMELFS_SB(dir->i_sb); + struct pohmelfs_inode *npi, *parent; + struct qstr str = dentry->d_name; + int err; + + parent = POHMELFS_I(dir); + + err = pohmelfs_data_lock(parent, 0, ~0, POHMELFS_WRITE_LOCK); + if (err) + return err; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + npi = pohmelfs_create_entry_local(psb, parent, &str, start, mode); + if (IS_ERR(npi)) + return PTR_ERR(npi); + + d_instantiate(dentry, &npi->vfs_inode); + + dprintk("%s: parent: %llu, inode: %llu, name: '%s', parent_nlink: %d, nlink: %d.\n", + __func__, parent->ino, npi->ino, dentry->d_name.name, + (signed)dir->i_nlink, (signed)npi->vfs_inode.i_nlink); + + return 0; +} + +/* + * VFS create and mkdir callbacks. + */ +static int pohmelfs_create(struct inode *dir, struct dentry *dentry, int mode, + struct nameidata *nd) +{ + return pohmelfs_create_entry(dir, dentry, 0, mode); +} + +static int pohmelfs_mkdir(struct inode *dir, struct dentry *dentry, int mode) +{ + int err; + + inode_inc_link_count(dir); + err = pohmelfs_create_entry(dir, dentry, 0, mode | S_IFDIR); + if (err) + inode_dec_link_count(dir); + + return err; +} + +/* + * Remove entry from local cache. + * Object will not be removed from server, instead it will be queued into parent + * to-be-removed queue, which will be processed during parent writeback (parent + * also marked as dirty). Writeback will send remove request to server. + * Such approach allows to remove vey huge directories (like 2.6.24 kernel tree) + * with only single network command. + */ +static int pohmelfs_remove_entry(struct inode *dir, struct dentry *dentry) +{ + struct pohmelfs_sb *psb = POHMELFS_SB(dir->i_sb); + struct inode *inode = dentry->d_inode; + struct pohmelfs_inode *parent = POHMELFS_I(dir), *pi = POHMELFS_I(inode); + struct pohmelfs_name *n; + int err = -ENOENT; + struct qstr str = dentry->d_name; + + err = pohmelfs_data_lock(parent, 0, ~0, POHMELFS_WRITE_LOCK); + if (err) + return err; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + dprintk("%s: dir_ino: %llu, inode: %llu, name: '%s', nlink: %d.\n", + __func__, parent->ino, pi->ino, + str.name, (signed)inode->i_nlink); + + mutex_lock(&parent->offset_lock); + n = pohmelfs_search_hash(parent, str.hash); + if (n) { + pohmelfs_fix_offset(parent, n); + if (test_bit(NETFS_INODE_CREATED, &pi->state)) { + __pohmelfs_name_del(parent, n); + list_add_tail(&n->sync_del_entry, &parent->sync_del_list); + } else + pohmelfs_name_free(parent, n); + err = 0; + } + mutex_unlock(&parent->offset_lock); + + if (!err) { + mutex_lock(&psb->path_lock); + psb->avail_size += inode->i_size; + pohmelfs_remove_path_entry_by_ino(psb, pi->ino); + mutex_unlock(&psb->path_lock); + + pohmelfs_inode_del_inode(psb, pi); + + mark_inode_dirty(dir); + + inode->i_ctime = dir->i_ctime; + if (inode->i_nlink) + inode_dec_link_count(inode); + } + dprintk("%s: inode: %p, lock: %ld, unhashed: %d.\n", + __func__, pi, inode->i_state & I_LOCK, hlist_unhashed(&inode->i_hash)); + + return err; +} + +/* + * Unlink and rmdir VFS callbacks. + */ +static int pohmelfs_unlink(struct inode *dir, struct dentry *dentry) +{ + return pohmelfs_remove_entry(dir, dentry); +} + +static int pohmelfs_rmdir(struct inode *dir, struct dentry *dentry) +{ + int err; + struct inode *inode = dentry->d_inode; + + dprintk("%s: parent: %llu, inode: %llu, name: '%s', parent_nlink: %d, nlink: %d.\n", + __func__, POHMELFS_I(dir)->ino, POHMELFS_I(inode)->ino, + dentry->d_name.name, (signed)dir->i_nlink, (signed)inode->i_nlink); + + err = pohmelfs_remove_entry(dir, dentry); + if (!err) { + inode_dec_link_count(dir); + inode_dec_link_count(inode); + } + + return err; +} + +/* + * Link creation is synchronous. + * I'm lazy. + * Earth is somewhat round. + */ +static int pohmelfs_create_link(struct pohmelfs_inode *parent, struct qstr *obj, + struct pohmelfs_inode *target, struct qstr *tstr) +{ + struct super_block *sb = parent->vfs_inode.i_sb; + struct pohmelfs_sb *psb = POHMELFS_SB(sb); + struct netfs_cmd *cmd; + struct netfs_trans *t; + void *data; + int err, parent_len, target_len = 0, cur_len, path_size = 0; + + err = pohmelfs_data_lock(parent, 0, ~0, POHMELFS_WRITE_LOCK); + if (err) + return err; + + err = sb->s_op->write_inode(&parent->vfs_inode, 0); + if (err) + goto err_out_exit; + + if (tstr) + target_len = tstr->len; + + mutex_lock(&psb->path_lock); + parent_len = pohmelfs_path_length(parent); + if (target) + target_len += pohmelfs_path_length(target); + mutex_unlock(&psb->path_lock); + + if (parent_len < 0) { + err = parent_len; + goto err_out_exit; + } + + if (target_len < 0) { + err = target_len; + goto err_out_exit; + } + + t = netfs_trans_alloc(psb, parent_len + target_len + obj->len + 2, 0, 0); + if (!t) { + err = -ENOMEM; + goto err_out_exit; + } + cur_len = netfs_trans_cur_len(t); + + cmd = netfs_trans_current(t); + if (IS_ERR(cmd)) { + err = PTR_ERR(cmd); + goto err_out_free; + } + + data = (void *)(cmd + 1); + cur_len -= sizeof(struct netfs_cmd); + + mutex_lock(&psb->path_lock); + err = pohmelfs_construct_path_string(parent, data, parent_len); + if (err > 0) { + path_size = err; + cur_len -= path_size; + + err = snprintf(data + path_size, cur_len, "/%s|", obj->name); + + path_size += err; + cur_len -= err; + + cmd->ext = path_size - 1; /* No | symbol */ + + if (target) { + err = pohmelfs_construct_path_string(target, data + path_size, target_len); + if (err > 0) { + path_size += err + 1; + cur_len -= err + 1; + } + } + } + mutex_unlock(&psb->path_lock); + + if (err < 0) + goto err_out_free; + + cmd->start = 0; + + if (!target && tstr) { + if (tstr->len > cur_len - 1) { + err = -ENAMETOOLONG; + goto err_out_free; + } + + err = snprintf(data + path_size, cur_len, "%s", tstr->name) + 1 /* 0-byte */; + path_size += err; + cur_len -= err; + cmd->start = 1; + } + + dprintk("%s: parent: %llu, obj: '%s', target_inode: %llu, target_str: '%s', full: '%s'.\n", + __func__, parent->ino, obj->name, (target)?target->ino:0, (tstr)?tstr->name:NULL, + (char *)data); + + cmd->cmd = NETFS_LINK; + cmd->size = path_size; + cmd->id = parent->ino; + + netfs_convert_cmd(cmd); + + netfs_trans_update(cmd, t, path_size); + + err = netfs_trans_finish(t, psb); + if (err) + goto err_out_exit; + + return 0; + +err_out_free: + t->result = err; + netfs_trans_put(t); +err_out_exit: + return err; +} + +/* + * VFS hard and soft link callbacks. + */ +static int pohmelfs_link(struct dentry *old_dentry, struct inode *dir, + struct dentry *dentry) +{ + struct inode *inode = old_dentry->d_inode; + struct pohmelfs_inode *pi = POHMELFS_I(inode); + int err; + struct qstr str = dentry->d_name; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + err = inode->i_sb->s_op->write_inode(inode, 0); + if (err) + return err; + + err = pohmelfs_create_link(POHMELFS_I(dir), &str, pi, NULL); + if (err) + return err; + + return pohmelfs_create_entry(dir, dentry, pi->ino, inode->i_mode); +} + +static int pohmelfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname) +{ + struct qstr sym_str; + struct qstr str = dentry->d_name; + struct inode *inode; + int err; + + str.hash = jhash(dentry->d_name.name, dentry->d_name.len, 0); + + sym_str.name = symname; + sym_str.len = strlen(symname); + + err = pohmelfs_create_link(POHMELFS_I(dir), &str, NULL, &sym_str); + if (err) + goto err_out_exit; + + err = pohmelfs_create_entry(dir, dentry, 0, S_IFLNK | S_IRWXU | S_IRWXG | S_IRWXO); + if (err) + goto err_out_exit; + + inode = dentry->d_inode; + + err = page_symlink(inode, symname, sym_str.len + 1); + if (err) + goto err_out_put; + + return 0; + +err_out_put: + iput(inode); +err_out_exit: + return err; +} + +static int pohmelfs_send_rename(struct pohmelfs_inode *pi, struct pohmelfs_inode *parent, + struct qstr *str) +{ + int path_len, err, total_len = 0, inode_len, parent_len; + char *path; + struct netfs_trans *t; + struct netfs_cmd *cmd; + struct pohmelfs_sb *psb = POHMELFS_SB(pi->vfs_inode.i_sb); + + mutex_lock(&psb->path_lock); + parent_len = pohmelfs_path_length(parent); + inode_len = pohmelfs_path_length(pi); + mutex_unlock(&psb->path_lock); + + if (parent_len < 0 || inode_len < 0) + return -EINVAL; + + path_len = parent_len + inode_len + str->len + 3; + + t = netfs_trans_alloc(psb, path_len, 0, 0); + if (!t) + return -ENOMEM; + + cmd = netfs_trans_current(t); + path = (char *)(cmd + 1); + + mutex_lock(&psb->path_lock); + err = pohmelfs_construct_path_string(pi, path, inode_len + 1); + if (err < 0) + goto err_out_unlock; + + cmd->ext = err; + + path += err; + total_len += err; + path_len -= err; + + *path = '|'; + path++; + total_len++; + path_len--; + + err = pohmelfs_construct_path_string(parent, path, parent_len + 1); + if (err < 0) + goto err_out_unlock; + mutex_unlock(&psb->path_lock); + + path += err; + total_len += err; + path_len -= err; + + err = snprintf(path, path_len - 1, "/%s", str->name); + + total_len += err + 1; /* 0 symbol */ + path_len -= err + 1; + + cmd->cmd = NETFS_RENAME; + cmd->id = pi->ino; + cmd->start = parent->ino; + cmd->size = total_len; + + netfs_convert_cmd(cmd); + + netfs_trans_update(cmd, t, total_len); + + return netfs_trans_finish(t, psb); + +err_out_unlock: + mutex_unlock(&psb->path_lock); + netfs_trans_free(t); + return err; +} + +static int pohmelfs_rename(struct inode *old_dir, struct dentry *old_dentry, + struct inode *new_dir, struct dentry *new_dentry) +{ + struct pohmelfs_sb *psb = POHMELFS_SB(old_dir->i_sb); + struct inode *old_inode = old_dentry->d_inode; + struct pohmelfs_inode *old_parent, *old, *new_parent; + struct qstr str = new_dentry->d_name; + struct pohmelfs_name *n; + unsigned int old_hash; + int err = -ENOENT; + + if (new_dir) { + err = new_dir->i_sb->s_op->write_inode(new_dir, 0); + if (err) + return err; + } + + err = old_inode->i_sb->s_op->write_inode(old_inode, 0); + if (err) + return err; + + old_hash = jhash(old_dentry->d_name.name, old_dentry->d_name.len, 0); + str.hash = jhash(new_dentry->d_name.name, new_dentry->d_name.len, 0); + + old = POHMELFS_I(old_inode); + old_parent = POHMELFS_I(old_dir); + + str.len = new_dentry->d_name.len; + str.name = new_dentry->d_name.name; + str.hash = jhash(new_dentry->d_name.name, new_dentry->d_name.len, 0); + + if (new_dir) { + new_parent = POHMELFS_I(new_dir); + err = -ENOTEMPTY; + + if (S_ISDIR(old_inode->i_mode) && + new_parent->total_len <= 3) + goto err_out_exit; + } else { + new_parent = old_parent; + } + + dprintk("%s: ino: %llu, parent: %llu, name: '%s' -> parent: %llu, name: '%s'.\n", + __func__, old->ino, old_parent->ino, old_dentry->d_name.name, + new_parent->ino, new_dentry->d_name.name); + + err = pohmelfs_send_rename(old, new_parent, &str); + if (err) + goto err_out_exit; + + mutex_lock(&psb->path_lock); + err = pohmelfs_rename_path_entry(psb, old->ino, new_parent->ino, &str); + mutex_unlock(&psb->path_lock); + if (err) + goto err_out_exit; + + n = pohmelfs_name_alloc(str.len + 1); + if (!n) + goto err_out_exit; + + mutex_lock(&new_parent->offset_lock); + n->ino = old->ino; + n->mode = old_inode->i_mode; + n->len = str.len; + n->hash = str.hash; + sprintf(n->data, "%s", str.name); + + err = pohmelfs_insert_name(new_parent, n); + mutex_unlock(&new_parent->offset_lock); + + if (err) + goto err_out_exit; + + mutex_lock(&old_parent->offset_lock); + n = pohmelfs_search_hash(old_parent, old_hash); + if (n) + pohmelfs_name_del(old_parent, n); + mutex_unlock(&old_parent->offset_lock); + + mark_inode_dirty(old_inode); + mark_inode_dirty(&new_parent->vfs_inode); + + return 0; + +err_out_exit: + return err; +} + +/* + * POHMELFS directory inode operations. + */ +const struct inode_operations pohmelfs_dir_inode_ops = { + .link = pohmelfs_link, + .symlink = pohmelfs_symlink, + .unlink = pohmelfs_unlink, + .mkdir = pohmelfs_mkdir, + .rmdir = pohmelfs_rmdir, + .create = pohmelfs_create, + .lookup = pohmelfs_lookup, + .setattr = pohmelfs_setattr, + .rename = pohmelfs_rename, +}; diff --git a/fs/pohmelfs/path_entry.c b/fs/pohmelfs/path_entry.c new file mode 100644 index 0000000..2dcbce7 --- /dev/null +++ b/fs/pohmelfs/path_entry.c @@ -0,0 +1,356 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru> + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/fs.h> +#include <linux/ktime.h> +#include <linux/fs.h> +#include <linux/pagemap.h> +#include <linux/writeback.h> +#include <linux/mm.h> + +#include "netfs.h" + +/* + * Path cache. + * + * Used to create pathes to root, strings (or structures, + * containing name, mode, permissions and so on) used by userspace + * server to process data. + * + * Cache is local for client, and its inode numbers are never synced + * with anyone else, server operates on names and pathes, not some obscure ids. + */ + +static void pohmelfs_free_path_entry(struct pohmelfs_path_entry *e) +{ + kfree(e); +} + +static struct pohmelfs_path_entry *pohmelfs_alloc_path_entry(unsigned int len) +{ + struct pohmelfs_path_entry *e; + + e = kzalloc(len + 1 + sizeof(struct pohmelfs_path_entry), GFP_KERNEL); + if (!e) + return NULL; + + e->name = (char *)((struct pohmelfs_path_entry *)(e + 1)); + e->len = len; + atomic_set(&e->refcnt, 1); + + return e; +} + +static inline int pohmelfs_cmp_path_entry(u64 path_ino, u64 new_ino) +{ + if (path_ino > new_ino) + return -1; + if (path_ino < new_ino) + return 1; + return 0; +} + +static struct pohmelfs_path_entry *pohmelfs_search_path_entry(struct rb_root *root, u64 ino) +{ + struct rb_node *n = root->rb_node; + struct pohmelfs_path_entry *tmp; + int cmp; + + while (n) { + tmp = rb_entry(n, struct pohmelfs_path_entry, path_entry); + + cmp = pohmelfs_cmp_path_entry(tmp->ino, ino); + if (cmp < 0) + n = n->rb_left; + else if (cmp > 0) + n = n->rb_right; + else + return tmp; + } + + dprintk("%s: Failed to find path entry for ino: %llu.\n", __func__, ino); + return NULL; +} + +static struct pohmelfs_path_entry *pohmelfs_insert_path_entry(struct rb_root *root, + struct pohmelfs_path_entry *new) +{ + struct rb_node **n = &root->rb_node, *parent = NULL; + struct pohmelfs_path_entry *ret = NULL, *tmp; + int cmp; + + while (*n) { + parent = *n; + + tmp = rb_entry(parent, struct pohmelfs_path_entry, path_entry); + + cmp = pohmelfs_cmp_path_entry(tmp->ino, new->ino); + if (cmp < 0) + n = &parent->rb_left; + else if (cmp > 0) + n = &parent->rb_right; + else { + ret = tmp; + break; + } + } + + if (ret) { + printk("%s: exist: ino: %llu, data: '%s', new: ino: %llu, data: '%s'.\n", + __func__, ret->ino, ret->name, new->ino, new->name); + return ret; + } + + rb_link_node(&new->path_entry, parent, n); + rb_insert_color(&new->path_entry, root); + + dprintk("%s: inserted: ino: %llu, data: '%s', parent: ino: %llu, data: '%s'.\n", + __func__, new->ino, new->name, new->parent->ino, new->parent->name); + + return new; +} + +void pohmelfs_remove_path_entry(struct pohmelfs_sb *psb, struct pohmelfs_path_entry *e) +{ + if (atomic_dec_and_test(&e->refcnt)) { + rb_erase(&e->path_entry, &psb->path_root); + + if (e->parent != e) + pohmelfs_remove_path_entry(psb, e->parent); + pohmelfs_free_path_entry(e); + } +} + +void pohmelfs_remove_path_entry_by_ino(struct pohmelfs_sb *psb, u64 ino) +{ + struct pohmelfs_path_entry *e; + + e = pohmelfs_search_path_entry(&psb->path_root, ino); + if (e) + pohmelfs_remove_path_entry(psb, e); +} + +int pohmelfs_change_path_entry(struct pohmelfs_sb *psb, u64 ino, unsigned int mode) +{ + struct pohmelfs_path_entry *e; + + e = pohmelfs_search_path_entry(&psb->path_root, ino); + if (!e) + return -ENOENT; + + e->mode = mode; + return 0; +} + +int pohmelfs_rename_path_entry(struct pohmelfs_sb *psb, u64 ino, u64 parent_ino, struct qstr *str) +{ + struct pohmelfs_path_entry *e; + unsigned int mode, link; + + e = pohmelfs_search_path_entry(&psb->path_root, ino); + if (!e) + return -ENOENT; + + if ((e->len >= str->len) && (parent_ino == e->parent->ino)) { + sprintf(e->name, "%s", str->name); + e->len = str->len; + e->hash = str->hash; + + return 0; + } + + mode = e->mode; + link = e->link; + + pohmelfs_remove_path_entry(psb, e); + + e = pohmelfs_add_path_entry(psb, parent_ino, ino, str, link, mode); + if (IS_ERR(e)) + return PTR_ERR(e); + + return 0; +} + +struct pohmelfs_path_entry * pohmelfs_add_path_entry(struct pohmelfs_sb *psb, + u64 parent_ino, u64 ino, struct qstr *str, int link, unsigned int mode) +{ + struct pohmelfs_path_entry *e, *ret, *parent; + + parent = pohmelfs_search_path_entry(&psb->path_root, parent_ino); + + e = pohmelfs_alloc_path_entry(str->len); + if (!e) + return ERR_PTR(-ENOMEM); + + e->parent = e; + if (parent) { + e->parent = parent; + atomic_inc(&parent->refcnt); + } + + e->ino = ino; + e->hash = str->hash; + e->link = link; + e->mode = mode; + + sprintf(e->name, "%s", str->name); + + ret = pohmelfs_insert_path_entry(&psb->path_root, e); + if (ret != e) { + pohmelfs_free_path_entry(e); + e = ret; + } + + dprintk("%s: parent: %llu, ino: %llu, name: '%s', len: %u.\n", + __func__, parent_ino, ino, e->name, e->len); + + return e; +} + +static int pohmelfs_prepare_path(struct pohmelfs_inode *pi, struct list_head *list, int len, int create) +{ + struct pohmelfs_path_entry *e; + struct pohmelfs_sb *psb = POHMELFS_SB(pi->vfs_inode.i_sb); + + e = pohmelfs_search_path_entry(&psb->path_root, pi->ino); + if (!e) + return -ENOENT; + + while (e && e->parent != e) { + if (len < e->len + create) + return -ETOOSMALL; + + len -= e->len + create; + + list_add(&e->entry, list); + e = e->parent; + } + + return 0; +} + +/* + * Create path from root for given inode. + * Path is formed as set of stuctures, containing name of the object + * and its inode data (mode, permissions and so on). + */ +int pohmelfs_construct_path(struct pohmelfs_inode *pi, void *data, int len) +{ + struct pohmelfs_path_entry *e; + struct netfs_path_entry *ne = data; + int used = 0, err; + LIST_HEAD(list); + + err = pohmelfs_prepare_path(pi, &list, len, sizeof(struct netfs_path_entry)); + if (err) + return err; + + list_for_each_entry(e, &list, entry) { + ne = data; + ne->mode = e->mode; + ne->len = e->len; + + used += sizeof(struct netfs_path_entry); + data += sizeof(struct netfs_path_entry); + + if (ne->len <= sizeof(ne->unused)) { + memcpy(ne->unused, e->name, ne->len); + } else { + memcpy(data, e->name, ne->len); + data += ne->len; + used += ne->len; + } + + dprintk("%s: ino: %llu, mode: %o, is_link: %d, name: '%s', used: %d, ne_len: %u.\n", + __func__, e->ino, ne->mode, e->link, e->name, used, ne->len); + + netfs_convert_path_entry(ne); + } + + return used; +} + +/* + * Create path from root for given inode. + */ +int pohmelfs_construct_path_string(struct pohmelfs_inode *pi, void *data, int len) +{ + struct pohmelfs_path_entry *e; + int used = 0, err; + char *ptr = data; + LIST_HEAD(list); + + err = pohmelfs_prepare_path(pi, &list, len, 0); + if (err) + return err; + + if (list_empty(&list)) { + err = sprintf(ptr, "/"); + ptr += err; + used += err; + } else { + list_for_each_entry(e, &list, entry) { + err = sprintf(ptr, "/%s", e->name); + + BUG_ON(!e->name); + + ptr += err; + used += err; + } + } + + dprintk("%s: inode: %llu, full path: '%s', used: %d.\n", + __func__, pi->ino, (char *)data, used); + + return used; +} + +static int pohmelfs_get_path_length(struct pohmelfs_inode *pi, int create) +{ + struct pohmelfs_path_entry *e; + struct pohmelfs_sb *psb = POHMELFS_SB(pi->vfs_inode.i_sb); + int len = 1 + create; + + e = pohmelfs_search_path_entry(&psb->path_root, pi->ino); + + /* + * This should never happen actually. + */ + if (!e) + return -ENOENT; + + while (e && e->parent != e) { + len += e->len + create + 1; + e = e->parent; + } + + return len; +} + +int pohmelfs_path_length(struct pohmelfs_inode *pi) +{ + int len = pohmelfs_get_path_length(pi, 0); + + if (len < 0) + return len; + return len; +} + +int pohmelfs_path_length_create(struct pohmelfs_inode *pi) +{ + return pohmelfs_get_path_length(pi, sizeof(struct netfs_path_entry)); +} ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-02-09 14:02 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-01-14 0:43 [4/9] pohmelfs: directory operations Yehuda Sadeh Weinraub 2009-01-14 1:07 ` Evgeniy Polyakov 2009-01-14 1:28 ` Yehuda Sadeh Weinraub 2009-01-14 1:31 ` Evgeniy Polyakov -- strict thread matches above, loose matches on Subject: below -- 2009-02-09 14:02 [0/9] pohmelfs for drivers/staging Evgeniy Polyakov 2009-02-09 14:02 ` [1/9] pohmelfs: documentation Evgeniy Polyakov 2009-02-09 14:02 ` [2/9] pohmelfs: configuration interface Evgeniy Polyakov 2009-02-09 14:02 ` [3/9] pohmelfs: crypto processing Evgeniy Polyakov 2009-02-09 14:02 ` [4/9] pohmelfs: directory operations Evgeniy Polyakov 2008-12-26 14:18 [0/9] pohmelfs: The Great Southern Trendkill release Evgeniy Polyakov 2008-12-26 14:18 ` [1/9] pohmelfs: documentation Evgeniy Polyakov 2008-12-26 14:18 ` [2/9] pohmelfs: configuration interface Evgeniy Polyakov 2008-12-26 14:18 ` [3/9] pohmelfs: crypto processing Evgeniy Polyakov 2008-12-26 14:18 ` [4/9] pohmelfs: directory operations Evgeniy Polyakov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).