linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 1/1] vfs: Filemash fs
@ 2013-04-03  9:16 Ram Pai
  0 siblings, 0 replies; 6+ messages in thread
From: Ram Pai @ 2013-04-03  9:16 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linuxram

The following patch implements a filesystem driver which provides the ability
to mashup exisiting files in creative ways in-order to create new files.

Think of it as a way to union files; not filesystems.

Its a prototype idea with a prototype implementation.  Tested and working on
3.0.9-rc1.  I have included Documentation file which details the idea with
examples and possible applications.

Any suggestions/ideas to make this useful and generally applicable is very
much appreciated!

Signed-off-by: Ram Pai <linuxram@us.ibm.com>

Documentation/filesystems/filemash.txt |  422 ++++++++++++++++++
fs/Kconfig                             |    1 
fs/Makefile                            |    1 
fs/filemashfs/Kconfig                  |    6 
fs/filemashfs/Makefile                 |    5 
fs/filemashfs/super.c                  |  743 +++++++++++++++++++++++++++++++++
6 files changed, 1178 insertions(+)

diff --git a/Documentation/filesystems/filemash.txt b/Documentation/filesystems/filemash.txt
new file mode 100644
index 0000000..9ff93b9
--- /dev/null
+++ b/Documentation/filesystems/filemash.txt
@@ -0,0 +1,422 @@
+File Mash Filesystem
+---------------
+Contents:
+	1) Overview
+ 	2) How to use it?
+	3) Features
+	4) Format of mount option
+	5) Practical examples
+	6) Nice to have capabilities
+	7) FAQ
+
+
+1) Overview
+-----------
+
+Consider the following situation:
+
+a) You have a large file that needs to be expanded but you are running out of
+space on that filesystem. For muliple reasons you do not have the capability
+to expand the filesystem. You either have to move the file to a bigger
+filesystem or cleanup space on the current filesystem. What if you had the
+capability to expand the file to another filesystem?
+
+b) You have a large file whose contents change every day by a few bytes. An
+incremental backup of the filesystem; everyday using rsync, will end up making
+copies of the large file everytime. Instead; if your large file was mashedup
+using multiple smaller files, an incremental backup of the filesystem will
+just backup the smaller files that have changed; thus saving lots of backup
+space.
+
+c) Large files; imagine virtual disk images, that have some sections that
+hardly change, some sections that have significant i/o activity, and some
+sections that need redundancy. This need can be met if there is a way
+to mashup the file using smaller files residing on filesystem appropriate
+to the need.
+
+d) Imagine multiple files containing data which is mostly common. In such case
+storage space is unnecessary wasted.  What if there was a way to separate out
+the common data into a file and use that file to mashup each of the files?
+Data can be deduplicated saving lots of storage space.
+
+e) Imagine a script file which for whatever reason cannot be modified, but
+needs to be fixed to make it usable. What if there was way to temporarily
+patch that script without actually modifying the script?
+
+f) <fill in more applications>
+
+FileMash filesystem is an attempt to provide the ability to mash-up a new
+file using multiple existing files in creative ways.
+
+It is currently implemented as a filesystem driver that stacks on top of other
+filesystem drivers. It provides the capability to create the illusion of a new
+file by using the space provided by the sub-ordinate files residing on other
+filesystems.
+
+In other words, it provides unioning capability -- the capability to union
+files, not filesystems.
+
+
+2) How to use it?
+------------------
+
+Apply the patch.  This patch has been prototyped/tested on 3.0.9-rc1.
+
+$ make fs/filemashfs/filemashfs.ko
+$ modprobe  filemashfs.ko
+
+$ cat > file_even <<-!
+0000
+2222
+4444
+6666
+8888
+!
+
+$ cat > file_odd <<-!
+1111
+3333
+5555
+7777
+9999
+!
+
+# Example 1
+# create a new file named file_evenodd using contents of file 'file_odd' followed by
+# contents of file 'file_even'. Use the content in a striped fashion.
+$ touch file_evenodd
+$ mount -t filemashfs -ofile=file_even,file=file_odd,layout=stripe:5 s file_evenodd
+
+$ cat file_evenodd
+0000
+1111
+2222
+3333
+4444
+5555
+6666
+7777
+8888
+9999
+
+# Example 2
+# create a new file named file_evenodd using contents of file 'file_odd' followed by
+# contents of  file 'file_even'. Use the content in a striped fashion.
+$ touch file_oddeven
+$ mount -t filemashfs -ofile=file_odd,file=file_even,layout=stripe:5 s file_oddeven
+
+$ cat file_oddeven
+1111
+0000
+3333
+2222
+5555
+4444
+7777
+6666
+9999
+8888
+
+# Example 3:
+# create a new file named file_concat using entire contents of file 'file_odd' followed by
+# entire contents of  file 'file_even'.
+$ touch file_concat
+$ mount -t filemashfs -ofile=file_odd,file=file_even,layout=concat s file_concat
+$ cat file_concat
+1111
+3333
+5555
+7777
+9999
+0000
+2222
+4444
+6666
+8888
+
+# Example 4:
+# Here is a example demonstating how contents get fanned out to subordinate file when data is
+# written to the mashedup file.
+
+$ touch firstfile
+$ touch secondfile
+$ touch mashfile
+
+$ mount -t filemashfs -ofile=firstfile,file=secondfile,layout=stripe:10 s mashfile
+$ cat > mashfile <<!
+> 000000000
+> 111111111
+> 222222222
+> 333333333
+> 444444444
+> 555555555
+> 666666666
+> 777777777
+> 888888888
+> 999999999
+> !
+
+$ cat firstfile
+000000000
+222222222
+444444444
+666666666
+888888888
+
+$ cat secondfile 
+111111111
+333333333
+555555555
+777777777
+999999999
+
+
+
+2) Features
+-----------
+
+FileMash filesystem provides two features currently.
+
+a) Striping
+b) Concatenation
+
+If the data needs to striped evenly across all the subordinate files, striping
+option helps do that.  In the above example layout=stripe:5 is used. It tells
+the filesystem to use first 5 bytes from the first file, the second 5 bytes
+from the second file, the third 5 bytes from the first file, and so on an so
+forth till the first and the second file have no more data to offer.
+
+If the data needs to concatenated using all the subordinate files, concat
+option is your choice. In the above example layout=concat is used. It tells
+the filesystem to use all the available bytes in the first file and then
+proceed to use all the available bytes in the second file.
+
+The area of the subordinate file to be used can be specified on the command
+line. This will ensure that only the specified area is used by the mashed file
+and thus cannot cross the boundaries of that sub-ordinate file.
+
+$ mount -t filemashfs -ofile=file_odd:10:20,file=file_even,layout=concat s file_concat
+
+This command specifies that file_odd from byte 10 to the next 20bytes must
+be used. The sub-ordinate file cannot not be expanded by writing additional
+data to the mashedup file.
+
+
+3) Format of the mount option
+--------------------------
+
+ The format of the mount option is as follows
+
+ -o (file=filename:[offset]:[length],)*,layout=<stripe:size|concat>,.....
+ 
+  The file keyword specifies the name of the sub-ordinate file to be used
+  for mashup. The 'offset' specifies the location from which the file can
+  be used for read/write. The 'length' specifies the size of the file, starting
+  from the 'offset' location, to be used for read/write. If 'offset' is not
+  specified the file is used from its zeroth byte. if 'length' is not
+  specified the file is assumed to have infinite size, limited only by 
+  the size of the filesystem on which it resides.
+
+  Multile files can be specified. The order in which files are specified is
+  significant, since it dictates the order in which the mashed-up file will
+  map its contents to the underlying sub-ordinate files.
+
+  The layout keyword specifies the file mashup policy. Currently it supports
+  stripe or concat policy. The layout defaults to concat policy. The 'stripe'
+  keyword takes an addition option which specifies the length of the stripe.
+
+
+$ mount -t filemashfs -ofile=file_even,file=file_odd,layout=stripe:5  \
+                 s file_evenodd
+
+          file_even                           file_evenodd
+          ,----,                                ,----,
+          |0000| <----------------------------- |0000|
+          |2222| <--,                ,--------  |1111|
+          |4444| <-, `--------------/---------  |2222|
+          |6666|<-, \              / ,--------  |3333|
+          |8888|<~ \ '---------- -/-/---------  |4444|
+          '----'  \ \            / / ,--------  |5555|
+                   \ '-------- -/-/-/---------  |6666|
+                    \          / / /  ,------   |7777|
+          file_odd   '--------/-/-/--/ ------   |8888|
+          ,----,             / / /  /  /-----   |9999|
+          |1111|  <---------' / /  /  /         '----'
+          |3333|  <----------' /  /  /
+          |5555|  <-----------'  /  /
+          |7777|  <-------------'  /
+          |9999|  <---------------'
+          '----' 
+        
+$ mount -t filemashfs -ofile=file_even:5,file=file_odd,layout=stripe:5  \
+               s file_evenodd
+
+          file_even                           file_evenodd
+          ,----,                                ,----,
+          |0000|    ,-------------------------- |2222|
+          |2222| <-'                 ,--------  |1111|
+          |4444| <------------------/---------  |4444|
+          |6666| <--,              / ,--------  |3333|
+          |8888| <-, '---------- -/-/---------  |6666|
+          '----'    \            / / ,--------  |5555|
+                     '-------- -/-/-/---------  |8888|
+                               / / /  ,------   |7777|
+          file_odd            / / /  /  ,-----  |9999|
+          ,----,             / / /  /  /        '----'
+          |1111|  <---------' / /  /  /         
+          |3333|  <----------' /  /  /
+          |5555|  <-----------'  /  /
+          |7777|  <-------------'  /
+          |9999|  <---------------'
+          '----' 
+
+$ mount -t filemashfs -ofile=file_even:5:10,file=file_odd,layout=stripe:5  \
+                s file_evenodd
+
+          file_even                           file_evenodd
+          ,----,                                ,----,
+          |0000|    ,-------------------------- |2222|
+          |2222| <-'                 ,--------  |1111|
+          |4444| <------------------/---------  |4444|
+          |6666|                   / ,--------  |3333|
+          |8888|                  / / ,-------  |5555|
+          '----'                 / / /  ,-----  |7777|
+                                / / /  /  ,---  |9999|
+                               / / /  /  /      '----' 
+          file_odd            / / /  /  /
+          ,----,             / / /  /  /
+          |1111|  <---------' / /  /  /         
+          |3333|  <----------' /  /  /
+          |5555|  <-----------'  /  /
+          |7777|  <-------------'  /
+          |9999|  <---------------'
+          '----' 
+
+
+$ mount -t filemashfs -ofile=file_even:10:20,file=file_odd:5:,layout=concat \
+                s file_concat
+
+          file_even                           file_concat
+          ,----,                                ,----,
+          |0000|     ,------------------------- |4444|
+          |2222|    /                ,--------  |6666|
+          |4444| <-`                / ,-------  |3333|
+          |6666| <---------------- / / ,------  |5555|
+          |8888|                    / /  ,----  |7777|
+          '----'                   / /  /  ,--  |9999|
+                                  / /  /  /     '----'
+                                 / /  /  /
+          file_odd              / /  /  /
+          ,----,               / /  /  /
+          |1111|              / /  /  /         
+          |3333|  <----------' /  /  /
+          |5555|  <-----------'  /  /
+          |7777|  <-------------'  /
+          |9999|  <---------------'
+          '----' 
+        
+
+4) Practical examples
+----------------------
+
+a) how to create a large file spanning multiple filesystems?
+
+assuming that we have 5 differen filesystems mounted at /F1, /F2, /F3, /F4, /F5
+
+$ touch /F1/f1
+$ touch /F2/f2
+$ touch /F3/f3
+$ touch /F4/f4
+$ touch /F5/f5
+
+$ touch /mybigfile
+$ mount -t filemashfs -ofile=/F1/f1,file=/F2/f2,file=/F3/f3,file=/F4/f4,file=F5/f5,layout=stripe:10000 s /mybigfile
+
+start writing data to /mybigfile and it will continue to grow that file till
+all the space on all the filesystem is exhausted! I have not tested it yet.
+But that should work by design. Implementation; I promise, is currently buggy :).
+
+
+
+b) how do I patch a file?
+
+Assume you have a script named hello.sh
+
+$ cat hello.sh
+#/bin/bash
+echo Spanish:  hola?
+echo Mandarin: ni hao ma?
+echo English:  I don't know :(
+echo Hindi:    aap kaise hai?
+echo Kannada:  Nee Hege Iddiya?
+
+	we know that "I don't know :(" has to be "How are you?".
+	So create a file named "fix" with the correct string in it.
+
+#echo 'How are you?' > fix
+
+	And mashup a new "hello.sh" file using parts of "hello.sh" and parts of
+	"fix".
+
+#mount -t filemashfs -ofile=hello.sh:0:73,file=fix,file=hello.sh:89,layout=concat s hello.sh
+
+# cat hello.sh
+#/bin/bash
+echo Spanish:  hola?
+echo Mandarin: ni hao ma?
+echo English:  How are you?
+echo Hindi:    aap kaise hai?
+echo Kannada:  Nee Hege Iddiya?
+
+<hello.sh is now fixed!>
+
+
+c) How do I deduplicate files?
+
+	I leave this as an exercise to you :)
+
+
+6. Nice to have capabilities
+-------------------------
+
+a) Ability to add or remove a file dynamically from a mashed up file.
+
+b) ??
+
+
+7. FAQ
+-------
+
+a) Why is this not implemented using FUSE?
+
+	I do not have numbers to prove this. But I think it might be less
+	performant, given that reads/writes have to make a few trips
+	between userspace and kernel.
+
+
+b) btrfs has a feature which allows a filesystem to extended using files from
+	other filesystem using loop device. Why is this different?
+
+	That feature lets you extend filesystem using files. The proposed
+	feature lets us extend files using filesystems. The features are
+	complementary in nature.
+
+
+c) One of the application; you mentioned above, mentions deduplication. Does it
+	deduplicate automatically?
+
+	No. The current prototype code does support any
+	deduplication. It just provides the mechanism to deduplicate
+	data and consolidate filesystem space. The user has to
+	identify the files that have
+	common data, move that data into a single file and mashup
+	all the other files using this new file.
+	A userspace tool can be written to do so.
+
+
+
+Thanks for your interest till now!
+------------------------------------------------------------------------
+version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)
+version 0.2  (added nice to have capabilities and FAQ, based on inputs from
+			Chandra and Malahal)
diff --git a/fs/Kconfig b/fs/Kconfig
index 780725a..4ea2cfd 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -67,6 +67,7 @@ source "fs/quota/Kconfig"
 
 source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
+source "fs/filemashfs/Kconfig"
 
 config GENERIC_ACL
 	bool
diff --git a/fs/Makefile b/fs/Makefile
index 9d53192..296f3b2 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -127,3 +127,4 @@ obj-$(CONFIG_F2FS_FS)		+= f2fs/
 obj-y				+= exofs/ # Multiple modules
 obj-$(CONFIG_CEPH_FS)		+= ceph/
 obj-$(CONFIG_PSTORE)		+= pstore/
+obj-$(CONFIG_FILEMASH_FS)	+= filemashfs/
diff --git a/fs/filemashfs/Kconfig b/fs/filemashfs/Kconfig
new file mode 100644
index 0000000..eaf2ee7
--- /dev/null
+++ b/fs/filemashfs/Kconfig
@@ -0,0 +1,6 @@
+config FILEMASH_FS
+	tristate "FileMash file system (EXPERIMENTAL)"
+	help
+	  Add support for FileMash filesystem.
+
+	  If unsure, say N.
diff --git a/fs/filemashfs/Makefile b/fs/filemashfs/Makefile
new file mode 100644
index 0000000..73b82d7
--- /dev/null
+++ b/fs/filemashfs/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for FileMash filesystem
+#
+obj-$(CONFIG_FILEMASH_FS) += filemashfs.o
+filemashfs-objs	:= super.o
diff --git a/fs/filemashfs/super.c b/fs/filemashfs/super.c
new file mode 100644
index 0000000..3b7cd00
--- /dev/null
+++ b/fs/filemashfs/super.c
@@ -0,0 +1,743 @@
+/*
+ *  linux/fs/filemashfs/super.c
+ *
+ * (C) Copyright IBM Corporation 2013.
+ *	Released under GPL v2.
+ *	Author : Ram Pai (linuxram@us.ibm.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/xattr.h>
+#include <linux/security.h>
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/parser.h>
+#include <linux/module.h>
+#include <linux/cred.h>
+#include <linux/sched.h>
+#include <linux/gfp.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
+#include <linux/pagemap.h>
+
+MODULE_AUTHOR("Ram Pai <linuxram@us.ibm.com>");
+MODULE_DESCRIPTION("FileMash filesystem");
+MODULE_LICENSE("GPL");
+
+#define my_div(numerator, denominator)   ((numerator)/(denominator))
+#define my_mod(numerator, denominator)   ((numerator)%(denominator))
+
+#define FM_READ  1
+#define FM_WRITE 2
+
+#define MAX_FILE 16
+
+struct fm_info  {
+	char	*f_file;
+	loff_t	f_offset;
+	size_t	f_len;
+};
+
+struct fm_fs {
+	size_t		  fs_total;
+	struct path	 *fs_path;
+	struct fm_info	 *fs_info;
+	int		 *fs_sort;
+};
+
+struct fm_f {
+	struct file   **f_file;
+	struct fm_fs   *f_fs;
+};
+
+
+enum {
+	opt_file,
+	opt_layout,
+	opt_err,
+};
+
+
+static int filemash_file_release(struct inode *inode, struct file *filp)
+{
+	struct fm_f *fm_f = (struct fm_f *)filp->private_data;
+	struct dentry *dentry = filp->f_path.dentry;
+	struct super_block *sb = dentry->d_sb;
+	struct fm_fs *fm_fs = (struct fm_fs *)sb->s_fs_info;
+	int    total_files = fm_fs->fs_total;
+	int    i;
+
+	for (i = 0; i < total_files; i++)
+		fm_f->f_file[i]->f_op->release(
+			fm_fs->fs_path[i].dentry->d_inode, fm_f->f_file[i]);
+
+	kfree(fm_f->f_file);
+	kfree(fm_f);
+	return 0;
+}
+
+static int filemash_file_open(struct inode *inode, struct file *filp)
+{
+	struct dentry *dentry = filp->f_path.dentry;
+	struct super_block *sb = dentry->d_sb;
+	struct fm_fs *fm_fs = (struct fm_fs *)sb->s_fs_info;
+	int    total_files = fm_fs->fs_total;
+	struct fm_f *fm_f;
+	int i, err = -ENOMEM;
+
+	fm_f = kmalloc(sizeof(struct fm_f), GFP_KERNEL);
+	if (!fm_f)
+		goto out;
+
+	fm_f->f_file = kmalloc(total_files*sizeof(struct file *), GFP_KERNEL);
+	if (!fm_f->f_file)
+		goto out1;
+
+	fm_f->f_fs = fm_fs;
+
+	filp->private_data = (void *)fm_f;
+
+	for (i = 0; i < total_files; i++) {
+		fm_f->f_file[i] = dentry_open(&fm_fs->fs_path[i],
+						filp->f_flags, current_cred());
+		fm_f->f_file[i]->f_op->open(fm_fs->fs_path[i].dentry->d_inode,
+						fm_f->f_file[i]);
+	}
+
+	return 0;
+
+out1:	kfree(fm_f);
+out:	return err;
+}
+
+static void wait_on_retry_sync_kiocb(struct kiocb *iocb)
+{
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	if (!kiocbIsKicked(iocb))
+		schedule();
+	else
+		kiocbClearKicked(iocb);
+	__set_current_state(TASK_RUNNING);
+}
+
+static int copy_zero_bytes(char __user *buf,  size_t len)
+{
+	struct page *page = ZERO_PAGE(0);
+	char *kaddr;
+
+	len = min(PAGE_SIZE, len);
+
+	kaddr = kmap(page);
+	len = __copy_to_user(buf, kaddr, len);
+	kunmap(page);
+	return len;
+}
+
+
+/*
+ * return the file which holds the 'stripe_n'th stripe.
+ *
+ * @stripe_n : the index of the stripe. starts from one; not zero.
+ * @sort     : array pointing indexes to the file array whose lengths are
+ *		sorted.
+ * @fm_f     : all the files, their lengths, offsets. sort array.
+ *		open file descriptors
+ * @total_files : total number of files
+ * @which_stripe: the stripe number within the returned file
+ */
+static struct file *where_is_stripe_n(int stripe_n,
+				const struct fm_f *fm_f,
+				int *which_stripe)
+{
+	const struct fm_info *fm_info =  fm_f->f_fs->fs_info;
+	const int *sort =  fm_f->f_fs->fs_sort;
+	int total_files =  fm_f->f_fs->fs_total;
+	struct file  **f_file = fm_f->f_file;
+	int prev_min = 0;
+	int total = 0, accumulated_stripes = 0;
+	int remaining_files = total_files;
+	int i, rem_stripe, n;
+	int stripe_len = fm_info[0].f_len;
+
+	for (i = 0; i < total_files; i++) {
+		int min, tmp;
+
+		if (fm_info[sort[i]].f_len) {
+			int index = sort[i];
+			int file_len = fm_info[index].f_len -
+				fm_info[index].f_offset;
+			min = my_div(file_len, stripe_len);
+		} else
+			min = INT_MAX/remaining_files;
+
+		tmp = (min-prev_min)*remaining_files;
+
+		if ((total+tmp) > stripe_n) {
+			accumulated_stripes +=
+				(stripe_n-1-total)/remaining_files;
+			break;
+		}
+
+		total += tmp;
+		accumulated_stripes += (min-prev_min);
+		remaining_files--;
+		prev_min = min;
+	}
+
+	if (i == total_files)
+		return NULL;
+
+	rem_stripe = stripe_n  - total;
+	n = my_mod(rem_stripe-1, remaining_files)+1;
+
+	for (i = 1; i < total_files+1; i++) {
+		int tmp;
+		if (fm_info[i].f_len) {
+			int file_len = fm_info[i].f_len-fm_info[i].f_offset;
+			tmp = my_div(file_len, stripe_len);
+		} else
+			tmp = INT_MAX;
+
+		if (tmp > accumulated_stripes && !--n)
+			break;
+	}
+
+	if (n)
+		return NULL;
+
+	if (i < 1 || i > total_files)
+		return NULL;
+
+	*which_stripe = accumulated_stripes;
+	return f_file[i-1];
+}
+
+
+/*
+ * offset has reached or exceeded end-of-file only if it has exceeded
+ * the size of all the subordinate files
+ */
+static int end_of_file(struct file **f_file, int total_files, int offset)
+{
+	int i, k = total_files;
+	for (i = 0 ; i < total_files ; i++) {
+		struct inode *inode = f_file[i]->f_dentry->d_inode;
+		size_t i_size;
+
+		mutex_lock(&inode->i_mutex);
+		i_size = i_size_read(inode);
+		mutex_unlock(&inode->i_mutex);
+
+		if (offset > i_size && !--k)
+			return 1;
+	}
+	return 0;
+}
+
+/*
+ * return the subordinate file that holds the *ppos location in the
+ * mashed file.
+ */
+static struct file *find_stripe_filp(const struct fm_f *fm_f,
+					loff_t *ppos,
+					struct kiocb *kiocb,
+					int dir)
+{
+	struct file  **f_file = fm_f->f_file,  *filep = NULL;
+	const struct fm_info *fm_info = (struct fm_info *)fm_f->f_fs->fs_info;
+	int total_files = fm_f->f_fs->fs_total;
+	int stripe_len = fm_info[0].f_len;
+	int filep_stripe = 0;
+
+	int stripe_n = my_div(*ppos, stripe_len)+1;
+
+
+	filep = where_is_stripe_n(stripe_n, fm_f, &filep_stripe);
+
+	if (!filep)
+		return NULL;
+
+	if (dir == FM_READ && end_of_file(f_file, total_files,
+				filep_stripe*stripe_len))
+		return NULL;
+
+	init_sync_kiocb(kiocb, filep);
+	kiocb->ki_pos = *ppos - (stripe_len * (stripe_n-1)) +
+				(stripe_len * filep_stripe);
+	kiocb->ki_left = stripe_len*(filep_stripe+1) - kiocb->ki_pos;
+	kiocb->ki_nbytes = kiocb->ki_left;
+	return filep;
+}
+
+/*
+ * return the subordinate file that holds the *ppos location in the mashed file.
+ */
+static struct file *find_concat_filp(const struct fm_f *fm_f,
+					loff_t *ppos,
+					struct kiocb *kiocb)
+{
+	int i;
+	loff_t i_size = 0, pre_i_size = 0, tmp_i_size;
+	struct file  **f_file = fm_f->f_file,  *filep = NULL;
+	const struct fm_info *fm_info = (struct fm_info *)fm_f->f_fs->fs_info;
+	int total_files = fm_f->f_fs->fs_total;
+
+
+	for (i = 0; i < total_files; i++) {
+		struct inode *inode = f_file[i]->f_dentry->d_inode;
+		mutex_lock(&inode->i_mutex);
+		tmp_i_size = i_size_read(inode);
+		mutex_unlock(&inode->i_mutex);
+
+		tmp_i_size -= fm_info[i+1].f_offset;
+		if (tmp_i_size <= 0)
+			continue;
+		if (fm_info[i+1].f_len && fm_info[i+1].f_len < tmp_i_size)
+			tmp_i_size = fm_info[i+1].f_len;
+
+		i_size += tmp_i_size;
+
+		if (*ppos < i_size) {
+			init_sync_kiocb(kiocb, f_file[i]);
+			kiocb->ki_pos = *ppos - pre_i_size +
+				fm_info[i+1].f_offset;
+			filep = f_file[i];
+			break;
+		}
+		pre_i_size = i_size;
+	}
+	kiocb->ki_left = tmp_i_size;
+	kiocb->ki_nbytes = tmp_i_size;
+	return filep;
+}
+
+static ssize_t
+filemash_file_io(struct file *filp, char __user *buf, size_t len,
+			loff_t *ppos, int dir)
+{
+	struct iovec iov = { .iov_base = buf, .iov_len = len };
+	struct kiocb kiocb;
+	ssize_t ret;
+	const struct fm_f *fm_f = (struct fm_f *)filp->private_data;
+	struct fm_info *fm_info = fm_f->f_fs->fs_info;
+	struct file  *filep;
+	loff_t pre_pos;
+	size_t left;
+
+
+	if (!strcmp(fm_info[0].f_file, "concat"))
+		filep = find_concat_filp(fm_f, ppos, &kiocb);
+	else
+		filep = find_stripe_filp(fm_f, ppos, &kiocb, dir);
+
+	if (!filep)
+		return ((dir == FM_READ) ? 0 : -ENOSPC);
+
+	left = iov.iov_len = kiocb.ki_nbytes;
+	pre_pos = kiocb.ki_pos;
+	for (;;) {
+		if (dir == FM_READ) {
+			ret = filep->f_op->aio_read(&kiocb, &iov, 1,
+					kiocb.ki_pos);
+			if (!ret && kiocb.ki_nbytes) {
+				left = min(len, left);
+				len = copy_zero_bytes(buf, left);
+				ret = left - len;
+				kiocb.ki_pos += ret;
+			}
+		} else
+			ret = filep->f_op->aio_write(&kiocb, &iov, 1,
+					kiocb.ki_pos);
+		if (ret != -EIOCBRETRY)
+			break;
+		wait_on_retry_sync_kiocb(&kiocb);
+	}
+
+	if (-EIOCBQUEUED == ret)
+		ret = wait_on_sync_kiocb(&kiocb);
+
+	*ppos += kiocb.ki_pos - pre_pos;
+
+	return ret;
+}
+
+static ssize_t
+filemash_file_read(struct file *filp, char __user *buf, size_t len,
+				loff_t *ppos)
+{
+	return filemash_file_io(filp, buf, len, ppos, FM_READ);
+}
+
+
+static ssize_t
+filemash_file_write(struct file *filp, const char __user *buf,
+		size_t len, loff_t *ppos)
+{
+	return filemash_file_io(filp, (char __user *)buf, len, ppos, FM_WRITE);
+}
+
+static const struct file_operations filemash_file_operations = {
+	.open	 = filemash_file_open,
+	.read	 = filemash_file_read,
+	.release = filemash_file_release,
+	.write	= filemash_file_write,
+	/*.aio_read = generic_file_aio_read */
+};
+
+
+static int filemash_getattr(struct vfsmount *mnt, struct dentry *dentry,
+				struct kstat *stat)
+{
+	struct super_block *sb	  = dentry->d_sb;
+	struct kstat sstat;
+	int    ret, i;
+	struct fm_fs *fm_fs	  = (struct fm_fs *)sb->s_fs_info;
+	struct path *fs_path	  = (struct path *)fm_fs->fs_path;
+	struct fm_info *fm_info	  = (struct fm_info *)fm_fs->fs_info;
+	int total_files		  = fm_fs->fs_total;
+
+	stat->size = 0;
+	for (i = 0; i < total_files; i++) {
+		ret = vfs_getattr(fs_path+i, &sstat);
+		if (ret)
+			return ret;
+
+		sstat.size -= fm_info[i+1].f_offset;
+		if (sstat.size <= 0)
+			continue;
+		if (fm_info[i+1].f_len && sstat.size > fm_info[i+1].f_len)
+			sstat.size = fm_info[i+1].f_len;
+
+		if (i == 0)
+			*stat = sstat;
+		else
+			stat->size += sstat.size;
+	}
+	stat->ino   = dentry->d_inode->i_ino;
+	return 0;
+}
+
+
+static const struct inode_operations filemash_file_inode_operations = {
+	.getattr        = filemash_getattr,
+};
+
+
+struct inode *filemash_new_inode(struct super_block *sb, umode_t mode)
+{
+	struct inode *inode;
+
+	inode = new_inode(sb);
+	if (!inode)
+		return NULL;
+
+	mode &= S_IFMT;
+
+	inode->i_ino = get_next_ino();
+	inode->i_mode = mode;
+	inode->i_flags |= S_NOATIME | S_NOCMTIME;
+
+	inode->i_op =  &filemash_file_inode_operations;
+	inode->i_fop = &filemash_file_operations;
+
+	return inode;
+}
+
+static void filemash_put_super(struct super_block *sb)
+{
+	struct fm_fs *fm_fs = (struct fm_fs *)sb->s_fs_info;
+	int total_files = fm_fs->fs_total;
+	struct fm_info *fm_info = (struct fm_info *)fm_fs->fs_info;
+	int i;
+
+	kfree(fm_fs->fs_sort);
+	for (i = 0; i < total_files; i++)
+		path_put(&fm_fs->fs_path[i]);
+	kfree(fm_fs->fs_path);
+
+	/* yes i=1 is correct. it starts at 1 */
+	for (i = 1; i < total_files+1; i++)
+		kfree(fm_info[i].f_file);
+	kfree(fm_info);
+
+	return;
+}
+
+static int filemash_remount_fs(struct super_block *sb, int *flagsp, char *data)
+{
+	return 0;
+}
+
+static const match_table_t filemash_tokens = {
+	{opt_file,		"file=%s"},
+	{opt_layout,		"layout=%s"},
+	{opt_err,		NULL}
+};
+
+/*
+ *
+ * format of the input is
+ * file=file:[offset]:[len],layout=<stripe:size|concat>,.....
+ *
+ * offset; if is not specified, defaults to zero
+ * len; if is not specified, defaults to infinity
+ *
+ * file= and layout= tokens can be used in any order.
+ *
+ * the order of the files determines the order in which the files are mashed-up
+ */
+static struct fm_info *filemash_parse_opt(char *opt, int *total_files)
+{
+	char *p, *q, *r;
+	int i, total = 0;
+	struct fm_info *fm_info = (struct fm_info *)
+		kzalloc((MAX_FILE+1)*sizeof(struct fm_info), GFP_KERNEL);
+
+	if (!fm_info)
+		goto fail;
+
+	while ((p = strsep(&opt, ",")) != NULL) {
+		int token;
+		substring_t args[MAX_OPT_ARGS];
+
+		if (total >= MAX_FILE)
+			goto fail;
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, filemash_tokens, args);
+		switch (token) {
+		case opt_file:
+			total++;
+
+			q = match_strdup(&args[0]);
+			if (!q)
+				goto fail;
+
+			r = strsep(&q, ":");
+			if (!r)
+				goto fail;
+			fm_info[total].f_file = r;
+
+			r = strsep(&q, ":");
+			if (!r || kstrtol(r, 10,
+					(long *)&fm_info[total].f_offset)) {
+				fm_info[total].f_offset = 0;
+				fm_info[total].f_len = 0;
+				break;
+			}
+
+			r = strsep(&q, ":");
+			if (!r || kstrtol(r, 10,
+					(long *)&fm_info[total].f_len)) {
+				fm_info[total].f_len = 0;
+				break;
+			}
+			break;
+
+		case opt_layout:
+			q = match_strdup(&args[0]);
+			if (!q)
+				goto fail;
+
+			r = strsep(&q, ":");
+			if (!r)
+				goto fail;
+			fm_info[0].f_file = r;
+
+			if (strncmp(r, "stripe", 6)) {
+				fm_info[0].f_len = 0;
+				break;
+			}
+			r = strsep(&q, ":");
+			if (!r || kstrtol(r, 10, (long *)&fm_info[0].f_len)) {
+				fm_info[0].f_len = 0;
+				break;
+			}
+			break;
+
+		default:
+			return NULL;
+		}
+	}
+
+	if (!fm_info[0].f_file) {
+		fm_info[0].f_file = "concat";
+		fm_info[0].f_len  = 0;
+	} else if (!strcmp(fm_info[0].f_file, "stripe") && !fm_info[0].f_len) {
+		goto fail;
+	} else if (!strcmp(fm_info[0].f_file, "concat"))  {
+		fm_info[0].f_len  = 0;
+	}
+
+	*total_files = total;
+	return fm_info;
+
+fail:
+	for (i = 0; i < total; i++)
+		kfree(fm_info[i].f_file);
+	kfree(fm_info);
+	*total_files = 0;
+	return NULL;
+}
+
+static const struct super_operations filemash_super_operations = {
+	.put_super	= filemash_put_super,
+	.remount_fs	= filemash_remount_fs,
+};
+
+
+/*
+ * not the best sort function in the world. implement heapsort or
+ * some such thing. Currently it is roughly O(n^2)
+ */
+static int *sort_info(struct fm_info *array, int total, int stripe_len)
+{
+	int i, j, index;
+	int no_of_stripes, cur_min, last_min;
+	int *sort = kmalloc(total*sizeof(int), GFP_KERNEL);
+	if (!sort)
+		return NULL;
+
+	j = 0;
+	cur_min = 0;
+	while (j < total) {
+		last_min = cur_min;
+		cur_min = -1;
+		index = j;
+		for (i = 1; i < total+1; i++) {
+			if (!array[i].f_len)
+				no_of_stripes = INT_MAX;
+			else
+				no_of_stripes = my_div(array[i].f_len,
+							stripe_len);
+
+			if (no_of_stripes <= last_min)
+				continue;
+
+			if (cur_min == -1) {
+				cur_min = no_of_stripes;
+				sort[index++] = i;
+			} else if (no_of_stripes < cur_min) {
+				index = j;
+				sort[index++] = i;
+			} else if (no_of_stripes == cur_min) {
+				sort[index++] = i;
+			}
+		}
+		BUG_ON(j == index);
+		j = index;
+	}
+
+	return sort;
+}
+
+static int filemash_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct inode *root_inode;
+	struct dentry *root_dentry;
+	struct fm_fs *fm_fs;
+	int i, j, total_files = 0;
+	int err = -ENOMEM;
+	struct fm_info *fm_info = (struct fm_info *)
+			filemash_parse_opt((char *) data, &total_files);
+
+	if (!fm_info)
+		goto out;
+
+	fm_fs = kmalloc(sizeof(struct fm_fs), GFP_KERNEL);
+	if (!fm_fs)
+		goto out;
+
+	fm_fs->fs_path = kmalloc(total_files*sizeof(struct path), GFP_KERNEL);
+	if (!fm_fs->fs_path)
+		goto out1;
+
+	fm_fs->fs_total = total_files;
+	fm_fs->fs_info  = fm_info;
+	fm_fs->fs_sort  = NULL;
+	if (!strcmp(fm_info[0].f_file, "stripe")) {
+		fm_fs->fs_sort = sort_info(fm_info, total_files,
+						fm_info[0].f_len);
+		if (!fm_fs->fs_sort)
+			goto out1;
+	}
+
+	for (i = 0; i < total_files; i++) {
+		/*
+		 * the first entry of fm_info contains the layout.
+		 * (i+1) is intentional
+		 */
+		err = kern_path(fm_info[i+1].f_file, LOOKUP_FOLLOW,
+				&fm_fs->fs_path[i]);
+		if (err)
+			goto out_free_filemash_path;
+	}
+
+	root_inode = filemash_new_inode(sb, S_IFREG);
+	if (!root_inode)
+		goto out_free_filemash_path;
+
+
+	root_dentry = d_make_root(root_inode);
+	if (!root_dentry)
+		goto out_release_root;
+
+	root_dentry->d_fsdata = NULL;
+	root_dentry->d_op = NULL;
+
+	sb->s_op = &filemash_super_operations;
+	sb->s_root = root_dentry;
+	sb->s_fs_info = (void *)fm_fs;
+
+	return 0;
+
+out_release_root:
+	iput(root_inode);
+
+out_free_filemash_path:
+	kfree(fm_fs->fs_sort);
+	for (j = 0; j < i; j++)
+		path_put(&fm_fs->fs_path[j]);
+	kfree(fm_fs->fs_path);
+
+
+out1:
+	kfree(fm_fs);
+out:
+	kfree(fm_info);
+	return err;
+}
+
+static struct dentry *filemash_mount(struct file_system_type *fs_type,
+					int flags,
+					const char *dev_name,
+					void *raw_data)
+{
+	return mount_nodev(fs_type, flags, raw_data, filemash_fill_super);
+}
+
+static struct file_system_type filemash_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "filemashfs",
+	.mount		= filemash_mount,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init filemash_init(void)
+{
+	return register_filesystem(&filemash_fs_type);
+}
+
+static void __exit filemash_exit(void)
+{
+	unregister_filesystem(&filemash_fs_type);
+}
+
+module_init(filemash_init);
+module_exit(filemash_exit);


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH 1/1] vfs: Filemash fs
@ 2013-04-03 10:23 Ram Pai
  2013-04-07 11:03 ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 6+ messages in thread
From: Ram Pai @ 2013-04-03 10:23 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linuxram

The following patch implements a filesystem driver which provides the ability
to mashup exisiting files in creative ways in-order to create new files.
    
Think of it as a way to union files; not filesystems.
    
Its a prototype idea with a prototype implementation.  Tested and working on
3.0.9-rc1.  I have included Documentation file which details the idea with
examples and possible applications.
    
Any suggestions/ideas to make this useful and generally applicable is very much
appreciated!
    
Signed-off-by: Ram Pai <linuxram@us.ibm.com>

Documentation/filesystems/filemash.txt |  422 ++++++++++++++++++
fs/Kconfig                             |    1 
fs/Makefile                            |    1 
fs/filemashfs/Kconfig                  |    6 
fs/filemashfs/Makefile                 |    5 
fs/filemashfs/super.c                  |  743 +++++++++++++++++++++++++++++++++
6 files changed, 1178 insertions(+)

diff --git a/Documentation/filesystems/filemash.txt b/Documentation/filesystems/filemash.txt
new file mode 100644
index 0000000..9623d5b
--- /dev/null
+++ b/Documentation/filesystems/filemash.txt
@@ -0,0 +1,422 @@
+File Mash Filesystem
+---------------
+Contents:
+	1) Overview
+ 	2) How to use it?
+	3) Features
+	4) Format of mount option
+	5) Practical examples
+	6) Nice to have capabilities
+	7) FAQ
+
+
+1) Overview
+-----------
+
+Consider the following situation:
+
+a) You have a large file that needs to be expanded but you are running out of
+space on that filesystem. For muliple reasons you do not have the capability
+to expand the filesystem. You either have to move the file to a bigger
+filesystem or cleanup space on the current filesystem. What if you had the
+capability to expand the file to another filesystem?
+
+b) You have a large file whose contents change every day by a few bytes. An
+incremental backup of the filesystem; everyday using rsync, will end up making
+copies of the large file everytime. Instead; if your large file was mashedup
+using multiple smaller files, an incremental backup of the filesystem will
+just backup the smaller files that have changed; thus saving lots of backup
+space.
+
+c) Large files; imagine virtual disk images, that have some sections that
+hardly change, some sections that have significant i/o activity, and some
+sections that need redundancy. This need can be met if there is a way
+to mashup the file using smaller files residing on filesystem appropriate
+to the need.
+
+d) Imagine multiple files containing data which is mostly common. In such case
+storage space is unnecessary wasted.  What if there was a way to separate out
+the common data into a file and use that file to mashup each of the files?
+Data can be deduplicated saving lots of storage space.
+
+e) Imagine a script file which for whatever reason cannot be modified, but
+needs to be fixed to make it usable. What if there was way to temporarily
+patch that script without actually modifying the script?
+
+f) <fill in more applications>
+
+FileMash filesystem is an attempt to provide the ability to mash-up a new
+file using multiple existing files in creative ways.
+
+It is currently implemented as a filesystem driver that stacks on top of other
+filesystem drivers. It provides the capability to create the illusion of a new
+file by using the space provided by the sub-ordinate files residing on other
+filesystems.
+
+In other words, it provides unioning capability -- the capability to union
+files, not filesystems.
+
+
+2) How to use it?
+------------------
+
+Apply the patch.  This patch has been prototyped/tested on 3.0.9-rc1.
+
+$ make fs/filemashfs/filemashfs.ko
+$ modprobe  filemashfs.ko
+
+$ cat > file_even <<-!
+0000
+2222
+4444
+6666
+8888
+!
+
+$ cat > file_odd <<-!
+1111
+3333
+5555
+7777
+9999
+!
+
+# Example 1
+# create a new file named file_evenodd using contents of file 'file_even' followed by
+# contents of file 'file_odd'. Use the content in a striped fashion.
+$ touch file_evenodd
+$ mount -t filemashfs -ofile=file_even,file=file_odd,layout=stripe:5 s file_evenodd
+
+$ cat file_evenodd
+0000
+1111
+2222
+3333
+4444
+5555
+6666
+7777
+8888
+9999
+
+# Example 2
+# create a new file named file_evenodd using contents of file 'file_odd' followed by
+# contents of  file 'file_even'. Use the content in a striped fashion.
+$ touch file_oddeven
+$ mount -t filemashfs -ofile=file_odd,file=file_even,layout=stripe:5 s file_oddeven
+
+$ cat file_oddeven
+1111
+0000
+3333
+2222
+5555
+4444
+7777
+6666
+9999
+8888
+
+# Example 3:
+# create a new file named file_concat using entire contents of file 'file_odd' followed by
+# entire contents of  file 'file_even'.
+$ touch file_concat
+$ mount -t filemashfs -ofile=file_odd,file=file_even,layout=concat s file_concat
+$ cat file_concat
+1111
+3333
+5555
+7777
+9999
+0000
+2222
+4444
+6666
+8888
+
+# Example 4:
+# Here is a example demonstating how contents get fanned out to sub-ordinate files when data is
+# written to the mashed file.
+
+$ touch firstfile
+$ touch secondfile
+$ touch mashfile
+
+$ mount -t filemashfs -ofile=firstfile,file=secondfile,layout=stripe:10 s mashfile
+$ cat > mashfile <<!
+> 000000000
+> 111111111
+> 222222222
+> 333333333
+> 444444444
+> 555555555
+> 666666666
+> 777777777
+> 888888888
+> 999999999
+> !
+
+$ cat firstfile
+000000000
+222222222
+444444444
+666666666
+888888888
+
+$ cat secondfile 
+111111111
+333333333
+555555555
+777777777
+999999999
+
+
+
+2) Features
+-----------
+
+FileMash filesystem provides two features currently.
+
+a) Striping
+b) Concatenation
+
+If the data needs to striped evenly across all the subordinate files, striping
+option helps do that.  In the above example layout=stripe:5 is used. It tells
+the filesystem to use first 5 bytes from the first file, the second 5 bytes
+from the second file, the third 5 bytes from the first file, and so on an so
+forth till the first and the second file have no more data to offer.
+
+If the data needs to concatenated using all the subordinate files, concat
+option is your choice. In the above example layout=concat is used. It tells
+the filesystem to use all the available bytes in the first file and then
+proceed to use all the available bytes in the second file.
+
+The area of the subordinate file to be used can be specified on the command
+line. This will ensure that only the specified area is used by the mashed file
+and thus cannot cross the boundaries of that sub-ordinate file.
+
+$ mount -t filemashfs -ofile=file_odd:10:20,file=file_even,layout=concat s file_concat
+
+This command specifies that file_odd from byte 10 to the next 20bytes must
+be used. The sub-ordinate file cannot not be expanded by writing additional
+data to the mashedup file.
+
+
+3) Format of the mount option
+--------------------------
+
+ The format of the mount option is as follows
+
+ -o (file=filename:[offset]:[length],)*,layout=<stripe:size|concat>,.....
+ 
+  The file keyword specifies the name of the sub-ordinate file to be used
+  for mashup. The 'offset' specifies the location from which the file can
+  be used for read/write. The 'length' specifies the size of the file, starting
+  from the 'offset' location, to be used for read/write. If 'offset' is not
+  specified the file is used from its zeroth byte. if 'length' is not
+  specified the file is assumed to have infinite size, limited only by 
+  the size of the filesystem on which it resides.
+
+  Multile files can be specified. The order in which files are specified is
+  significant, since it dictates the order in which the mashed-up file will
+  map its contents to the underlying sub-ordinate files.
+
+  The layout keyword specifies the file mashup policy. Currently it supports
+  stripe or concat policy. The layout defaults to concat policy. The 'stripe'
+  keyword takes an addition option which specifies the length of the stripe.
+
+
+$ mount -t filemashfs -ofile=file_even,file=file_odd,layout=stripe:5  \
+                 s file_evenodd
+
+          file_even                           file_evenodd
+          ,----,                                ,----,
+          |0000| <----------------------------- |0000|
+          |2222| <--,                ,--------  |1111|
+          |4444| <-, `--------------/---------  |2222|
+          |6666|<-, \              / ,--------  |3333|
+          |8888|<~ \ '---------- -/-/---------  |4444|
+          '----'  \ \            / / ,--------  |5555|
+                   \ '-------- -/-/-/---------  |6666|
+                    \          / / /  ,------   |7777|
+          file_odd   '--------/-/-/--/ ------   |8888|
+          ,----,             / / /  /  /-----   |9999|
+          |1111|  <---------' / /  /  /         '----'
+          |3333|  <----------' /  /  /
+          |5555|  <-----------'  /  /
+          |7777|  <-------------'  /
+          |9999|  <---------------'
+          '----' 
+        
+$ mount -t filemashfs -ofile=file_even:5,file=file_odd,layout=stripe:5  \
+               s file_evenodd
+
+          file_even                           file_evenodd
+          ,----,                                ,----,
+          |0000|    ,-------------------------- |2222|
+          |2222| <-'                 ,--------  |1111|
+          |4444| <------------------/---------  |4444|
+          |6666| <--,              / ,--------  |3333|
+          |8888| <-, '---------- -/-/---------  |6666|
+          '----'    \            / / ,--------  |5555|
+                     '-------- -/-/-/---------  |8888|
+                               / / /  ,------   |7777|
+          file_odd            / / /  /  ,-----  |9999|
+          ,----,             / / /  /  /        '----'
+          |1111|  <---------' / /  /  /         
+          |3333|  <----------' /  /  /
+          |5555|  <-----------'  /  /
+          |7777|  <-------------'  /
+          |9999|  <---------------'
+          '----' 
+
+$ mount -t filemashfs -ofile=file_even:5:10,file=file_odd,layout=stripe:5  \
+                s file_evenodd
+
+          file_even                           file_evenodd
+          ,----,                                ,----,
+          |0000|    ,-------------------------- |2222|
+          |2222| <-'                 ,--------  |1111|
+          |4444| <------------------/---------  |4444|
+          |6666|                   / ,--------  |3333|
+          |8888|                  / / ,-------  |5555|
+          '----'                 / / /  ,-----  |7777|
+                                / / /  /  ,---  |9999|
+                               / / /  /  /      '----' 
+          file_odd            / / /  /  /
+          ,----,             / / /  /  /
+          |1111|  <---------' / /  /  /         
+          |3333|  <----------' /  /  /
+          |5555|  <-----------'  /  /
+          |7777|  <-------------'  /
+          |9999|  <---------------'
+          '----' 
+
+
+$ mount -t filemashfs -ofile=file_even:10:20,file=file_odd:5:,layout=concat \
+                s file_concat
+
+          file_even                           file_concat
+          ,----,                                ,----,
+          |0000|     ,------------------------- |4444|
+          |2222|    /                ,--------  |6666|
+          |4444| <-`                / ,-------  |3333|
+          |6666| <---------------- / / ,------  |5555|
+          |8888|                    / /  ,----  |7777|
+          '----'                   / /  /  ,--  |9999|
+                                  / /  /  /     '----'
+                                 / /  /  /
+          file_odd              / /  /  /
+          ,----,               / /  /  /
+          |1111|              / /  /  /         
+          |3333|  <----------' /  /  /
+          |5555|  <-----------'  /  /
+          |7777|  <-------------'  /
+          |9999|  <---------------'
+          '----' 
+        
+
+4) Practical examples
+----------------------
+
+a) how to create a large file spanning multiple filesystems?
+
+assuming that we have 5 differen filesystems mounted at /F1, /F2, /F3, /F4, /F5
+
+$ touch /F1/f1
+$ touch /F2/f2
+$ touch /F3/f3
+$ touch /F4/f4
+$ touch /F5/f5
+
+$ touch /mybigfile
+$ mount -t filemashfs -ofile=/F1/f1,file=/F2/f2,file=/F3/f3,file=/F4/f4,file=F5/f5,layout=stripe:10000 s /mybigfile
+
+start writing data to /mybigfile and it will continue to grow that file till
+all the space on all the filesystem is exhausted! I have not tested it yet.
+But that should work by design. Implementation; I promise, is currently buggy :).
+
+
+
+b) how do I patch a file?
+
+Assume you have a script named hello.sh
+
+$ cat hello.sh
+#/bin/bash
+echo Spanish:  hola?
+echo Mandarin: ni hao ma?
+echo English:  I don't know :(
+echo Hindi:    aap kaise hai?
+echo Kannada:  Nee Hege Iddiya?
+
+	we know that "I don't know :(" has to be "How are you?".
+	So create a file named "fix" with the correct string in it.
+
+#echo 'How are you?' > fix
+
+	And mashup a new "hello.sh" file using parts of "hello.sh" and parts of
+	"fix".
+
+#mount -t filemashfs -ofile=hello.sh:0:73,file=fix,file=hello.sh:89,layout=concat s hello.sh
+
+# cat hello.sh
+#/bin/bash
+echo Spanish:  hola?
+echo Mandarin: ni hao ma?
+echo English:  How are you?
+echo Hindi:    aap kaise hai?
+echo Kannada:  Nee Hege Iddiya?
+
+<hello.sh is now fixed!>
+
+
+c) How do I deduplicate files?
+
+	I leave this as an exercise to you :)
+
+
+6. Nice to have capabilities
+-------------------------
+
+a) Ability to add or remove a file dynamically from a mashed up file.
+
+b) ??
+
+
+7. FAQ
+-------
+
+a) Why is this not implemented using FUSE?
+
+	I do not have numbers to prove this. But I think it might be less
+	performant, given that reads/writes have to make a few trips
+	between userspace and kernel.
+
+
+b) btrfs has a feature which allows a filesystem to extended using files from
+	other filesystem using loop device. Why is this different?
+
+	That feature lets you extend filesystem using files. The proposed
+	feature lets us extend files using filesystems. The features are
+	complementary in nature.
+
+
+c) One of the application; you mentioned above, mentions deduplication. Does it
+	deduplicate automatically?
+
+	No. The current prototype code does support any
+	deduplication. It just provides the mechanism to deduplicate
+	data and consolidate filesystem space. The user has to
+	identify the files that have
+	common data, move that data into a single file and mashup
+	all the other files using this new file.
+	A userspace tool can be written to do so.
+
+
+
+Thanks for your interest till now!
+------------------------------------------------------------------------
+version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)
+version 0.2  (added nice to have capabilities and FAQ, based on inputs from
+			Chandra and Malahal)
diff --git a/fs/Kconfig b/fs/Kconfig
index 780725a..4ea2cfd 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -67,6 +67,7 @@ source "fs/quota/Kconfig"
 
 source "fs/autofs4/Kconfig"
 source "fs/fuse/Kconfig"
+source "fs/filemashfs/Kconfig"
 
 config GENERIC_ACL
 	bool
diff --git a/fs/Makefile b/fs/Makefile
index 9d53192..296f3b2 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -127,3 +127,4 @@ obj-$(CONFIG_F2FS_FS)		+= f2fs/
 obj-y				+= exofs/ # Multiple modules
 obj-$(CONFIG_CEPH_FS)		+= ceph/
 obj-$(CONFIG_PSTORE)		+= pstore/
+obj-$(CONFIG_FILEMASH_FS)	+= filemashfs/
diff --git a/fs/filemashfs/Kconfig b/fs/filemashfs/Kconfig
new file mode 100644
index 0000000..eaf2ee7
--- /dev/null
+++ b/fs/filemashfs/Kconfig
@@ -0,0 +1,6 @@
+config FILEMASH_FS
+	tristate "FileMash file system (EXPERIMENTAL)"
+	help
+	  Add support for FileMash filesystem.
+
+	  If unsure, say N.
diff --git a/fs/filemashfs/Makefile b/fs/filemashfs/Makefile
new file mode 100644
index 0000000..73b82d7
--- /dev/null
+++ b/fs/filemashfs/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for FileMash filesystem
+#
+obj-$(CONFIG_FILEMASH_FS) += filemashfs.o
+filemashfs-objs	:= super.o
diff --git a/fs/filemashfs/super.c b/fs/filemashfs/super.c
new file mode 100644
index 0000000..3b7cd00
--- /dev/null
+++ b/fs/filemashfs/super.c
@@ -0,0 +1,743 @@
+/*
+ *  linux/fs/filemashfs/super.c
+ *
+ * (C) Copyright IBM Corporation 2013.
+ *	Released under GPL v2.
+ *	Author : Ram Pai (linuxram@us.ibm.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/xattr.h>
+#include <linux/security.h>
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/parser.h>
+#include <linux/module.h>
+#include <linux/cred.h>
+#include <linux/sched.h>
+#include <linux/gfp.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
+#include <linux/pagemap.h>
+
+MODULE_AUTHOR("Ram Pai <linuxram@us.ibm.com>");
+MODULE_DESCRIPTION("FileMash filesystem");
+MODULE_LICENSE("GPL");
+
+#define my_div(numerator, denominator)   ((numerator)/(denominator))
+#define my_mod(numerator, denominator)   ((numerator)%(denominator))
+
+#define FM_READ  1
+#define FM_WRITE 2
+
+#define MAX_FILE 16
+
+struct fm_info  {
+	char	*f_file;
+	loff_t	f_offset;
+	size_t	f_len;
+};
+
+struct fm_fs {
+	size_t		  fs_total;
+	struct path	 *fs_path;
+	struct fm_info	 *fs_info;
+	int		 *fs_sort;
+};
+
+struct fm_f {
+	struct file   **f_file;
+	struct fm_fs   *f_fs;
+};
+
+
+enum {
+	opt_file,
+	opt_layout,
+	opt_err,
+};
+
+
+static int filemash_file_release(struct inode *inode, struct file *filp)
+{
+	struct fm_f *fm_f = (struct fm_f *)filp->private_data;
+	struct dentry *dentry = filp->f_path.dentry;
+	struct super_block *sb = dentry->d_sb;
+	struct fm_fs *fm_fs = (struct fm_fs *)sb->s_fs_info;
+	int    total_files = fm_fs->fs_total;
+	int    i;
+
+	for (i = 0; i < total_files; i++)
+		fm_f->f_file[i]->f_op->release(
+			fm_fs->fs_path[i].dentry->d_inode, fm_f->f_file[i]);
+
+	kfree(fm_f->f_file);
+	kfree(fm_f);
+	return 0;
+}
+
+static int filemash_file_open(struct inode *inode, struct file *filp)
+{
+	struct dentry *dentry = filp->f_path.dentry;
+	struct super_block *sb = dentry->d_sb;
+	struct fm_fs *fm_fs = (struct fm_fs *)sb->s_fs_info;
+	int    total_files = fm_fs->fs_total;
+	struct fm_f *fm_f;
+	int i, err = -ENOMEM;
+
+	fm_f = kmalloc(sizeof(struct fm_f), GFP_KERNEL);
+	if (!fm_f)
+		goto out;
+
+	fm_f->f_file = kmalloc(total_files*sizeof(struct file *), GFP_KERNEL);
+	if (!fm_f->f_file)
+		goto out1;
+
+	fm_f->f_fs = fm_fs;
+
+	filp->private_data = (void *)fm_f;
+
+	for (i = 0; i < total_files; i++) {
+		fm_f->f_file[i] = dentry_open(&fm_fs->fs_path[i],
+						filp->f_flags, current_cred());
+		fm_f->f_file[i]->f_op->open(fm_fs->fs_path[i].dentry->d_inode,
+						fm_f->f_file[i]);
+	}
+
+	return 0;
+
+out1:	kfree(fm_f);
+out:	return err;
+}
+
+static void wait_on_retry_sync_kiocb(struct kiocb *iocb)
+{
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	if (!kiocbIsKicked(iocb))
+		schedule();
+	else
+		kiocbClearKicked(iocb);
+	__set_current_state(TASK_RUNNING);
+}
+
+static int copy_zero_bytes(char __user *buf,  size_t len)
+{
+	struct page *page = ZERO_PAGE(0);
+	char *kaddr;
+
+	len = min(PAGE_SIZE, len);
+
+	kaddr = kmap(page);
+	len = __copy_to_user(buf, kaddr, len);
+	kunmap(page);
+	return len;
+}
+
+
+/*
+ * return the file which holds the 'stripe_n'th stripe.
+ *
+ * @stripe_n : the index of the stripe. starts from one; not zero.
+ * @sort     : array pointing indexes to the file array whose lengths are
+ *		sorted.
+ * @fm_f     : all the files, their lengths, offsets. sort array.
+ *		open file descriptors
+ * @total_files : total number of files
+ * @which_stripe: the stripe number within the returned file
+ */
+static struct file *where_is_stripe_n(int stripe_n,
+				const struct fm_f *fm_f,
+				int *which_stripe)
+{
+	const struct fm_info *fm_info =  fm_f->f_fs->fs_info;
+	const int *sort =  fm_f->f_fs->fs_sort;
+	int total_files =  fm_f->f_fs->fs_total;
+	struct file  **f_file = fm_f->f_file;
+	int prev_min = 0;
+	int total = 0, accumulated_stripes = 0;
+	int remaining_files = total_files;
+	int i, rem_stripe, n;
+	int stripe_len = fm_info[0].f_len;
+
+	for (i = 0; i < total_files; i++) {
+		int min, tmp;
+
+		if (fm_info[sort[i]].f_len) {
+			int index = sort[i];
+			int file_len = fm_info[index].f_len -
+				fm_info[index].f_offset;
+			min = my_div(file_len, stripe_len);
+		} else
+			min = INT_MAX/remaining_files;
+
+		tmp = (min-prev_min)*remaining_files;
+
+		if ((total+tmp) > stripe_n) {
+			accumulated_stripes +=
+				(stripe_n-1-total)/remaining_files;
+			break;
+		}
+
+		total += tmp;
+		accumulated_stripes += (min-prev_min);
+		remaining_files--;
+		prev_min = min;
+	}
+
+	if (i == total_files)
+		return NULL;
+
+	rem_stripe = stripe_n  - total;
+	n = my_mod(rem_stripe-1, remaining_files)+1;
+
+	for (i = 1; i < total_files+1; i++) {
+		int tmp;
+		if (fm_info[i].f_len) {
+			int file_len = fm_info[i].f_len-fm_info[i].f_offset;
+			tmp = my_div(file_len, stripe_len);
+		} else
+			tmp = INT_MAX;
+
+		if (tmp > accumulated_stripes && !--n)
+			break;
+	}
+
+	if (n)
+		return NULL;
+
+	if (i < 1 || i > total_files)
+		return NULL;
+
+	*which_stripe = accumulated_stripes;
+	return f_file[i-1];
+}
+
+
+/*
+ * offset has reached or exceeded end-of-file only if it has exceeded
+ * the size of all the subordinate files
+ */
+static int end_of_file(struct file **f_file, int total_files, int offset)
+{
+	int i, k = total_files;
+	for (i = 0 ; i < total_files ; i++) {
+		struct inode *inode = f_file[i]->f_dentry->d_inode;
+		size_t i_size;
+
+		mutex_lock(&inode->i_mutex);
+		i_size = i_size_read(inode);
+		mutex_unlock(&inode->i_mutex);
+
+		if (offset > i_size && !--k)
+			return 1;
+	}
+	return 0;
+}
+
+/*
+ * return the subordinate file that holds the *ppos location in the
+ * mashed file.
+ */
+static struct file *find_stripe_filp(const struct fm_f *fm_f,
+					loff_t *ppos,
+					struct kiocb *kiocb,
+					int dir)
+{
+	struct file  **f_file = fm_f->f_file,  *filep = NULL;
+	const struct fm_info *fm_info = (struct fm_info *)fm_f->f_fs->fs_info;
+	int total_files = fm_f->f_fs->fs_total;
+	int stripe_len = fm_info[0].f_len;
+	int filep_stripe = 0;
+
+	int stripe_n = my_div(*ppos, stripe_len)+1;
+
+
+	filep = where_is_stripe_n(stripe_n, fm_f, &filep_stripe);
+
+	if (!filep)
+		return NULL;
+
+	if (dir == FM_READ && end_of_file(f_file, total_files,
+				filep_stripe*stripe_len))
+		return NULL;
+
+	init_sync_kiocb(kiocb, filep);
+	kiocb->ki_pos = *ppos - (stripe_len * (stripe_n-1)) +
+				(stripe_len * filep_stripe);
+	kiocb->ki_left = stripe_len*(filep_stripe+1) - kiocb->ki_pos;
+	kiocb->ki_nbytes = kiocb->ki_left;
+	return filep;
+}
+
+/*
+ * return the subordinate file that holds the *ppos location in the mashed file.
+ */
+static struct file *find_concat_filp(const struct fm_f *fm_f,
+					loff_t *ppos,
+					struct kiocb *kiocb)
+{
+	int i;
+	loff_t i_size = 0, pre_i_size = 0, tmp_i_size;
+	struct file  **f_file = fm_f->f_file,  *filep = NULL;
+	const struct fm_info *fm_info = (struct fm_info *)fm_f->f_fs->fs_info;
+	int total_files = fm_f->f_fs->fs_total;
+
+
+	for (i = 0; i < total_files; i++) {
+		struct inode *inode = f_file[i]->f_dentry->d_inode;
+		mutex_lock(&inode->i_mutex);
+		tmp_i_size = i_size_read(inode);
+		mutex_unlock(&inode->i_mutex);
+
+		tmp_i_size -= fm_info[i+1].f_offset;
+		if (tmp_i_size <= 0)
+			continue;
+		if (fm_info[i+1].f_len && fm_info[i+1].f_len < tmp_i_size)
+			tmp_i_size = fm_info[i+1].f_len;
+
+		i_size += tmp_i_size;
+
+		if (*ppos < i_size) {
+			init_sync_kiocb(kiocb, f_file[i]);
+			kiocb->ki_pos = *ppos - pre_i_size +
+				fm_info[i+1].f_offset;
+			filep = f_file[i];
+			break;
+		}
+		pre_i_size = i_size;
+	}
+	kiocb->ki_left = tmp_i_size;
+	kiocb->ki_nbytes = tmp_i_size;
+	return filep;
+}
+
+static ssize_t
+filemash_file_io(struct file *filp, char __user *buf, size_t len,
+			loff_t *ppos, int dir)
+{
+	struct iovec iov = { .iov_base = buf, .iov_len = len };
+	struct kiocb kiocb;
+	ssize_t ret;
+	const struct fm_f *fm_f = (struct fm_f *)filp->private_data;
+	struct fm_info *fm_info = fm_f->f_fs->fs_info;
+	struct file  *filep;
+	loff_t pre_pos;
+	size_t left;
+
+
+	if (!strcmp(fm_info[0].f_file, "concat"))
+		filep = find_concat_filp(fm_f, ppos, &kiocb);
+	else
+		filep = find_stripe_filp(fm_f, ppos, &kiocb, dir);
+
+	if (!filep)
+		return ((dir == FM_READ) ? 0 : -ENOSPC);
+
+	left = iov.iov_len = kiocb.ki_nbytes;
+	pre_pos = kiocb.ki_pos;
+	for (;;) {
+		if (dir == FM_READ) {
+			ret = filep->f_op->aio_read(&kiocb, &iov, 1,
+					kiocb.ki_pos);
+			if (!ret && kiocb.ki_nbytes) {
+				left = min(len, left);
+				len = copy_zero_bytes(buf, left);
+				ret = left - len;
+				kiocb.ki_pos += ret;
+			}
+		} else
+			ret = filep->f_op->aio_write(&kiocb, &iov, 1,
+					kiocb.ki_pos);
+		if (ret != -EIOCBRETRY)
+			break;
+		wait_on_retry_sync_kiocb(&kiocb);
+	}
+
+	if (-EIOCBQUEUED == ret)
+		ret = wait_on_sync_kiocb(&kiocb);
+
+	*ppos += kiocb.ki_pos - pre_pos;
+
+	return ret;
+}
+
+static ssize_t
+filemash_file_read(struct file *filp, char __user *buf, size_t len,
+				loff_t *ppos)
+{
+	return filemash_file_io(filp, buf, len, ppos, FM_READ);
+}
+
+
+static ssize_t
+filemash_file_write(struct file *filp, const char __user *buf,
+		size_t len, loff_t *ppos)
+{
+	return filemash_file_io(filp, (char __user *)buf, len, ppos, FM_WRITE);
+}
+
+static const struct file_operations filemash_file_operations = {
+	.open	 = filemash_file_open,
+	.read	 = filemash_file_read,
+	.release = filemash_file_release,
+	.write	= filemash_file_write,
+	/*.aio_read = generic_file_aio_read */
+};
+
+
+static int filemash_getattr(struct vfsmount *mnt, struct dentry *dentry,
+				struct kstat *stat)
+{
+	struct super_block *sb	  = dentry->d_sb;
+	struct kstat sstat;
+	int    ret, i;
+	struct fm_fs *fm_fs	  = (struct fm_fs *)sb->s_fs_info;
+	struct path *fs_path	  = (struct path *)fm_fs->fs_path;
+	struct fm_info *fm_info	  = (struct fm_info *)fm_fs->fs_info;
+	int total_files		  = fm_fs->fs_total;
+
+	stat->size = 0;
+	for (i = 0; i < total_files; i++) {
+		ret = vfs_getattr(fs_path+i, &sstat);
+		if (ret)
+			return ret;
+
+		sstat.size -= fm_info[i+1].f_offset;
+		if (sstat.size <= 0)
+			continue;
+		if (fm_info[i+1].f_len && sstat.size > fm_info[i+1].f_len)
+			sstat.size = fm_info[i+1].f_len;
+
+		if (i == 0)
+			*stat = sstat;
+		else
+			stat->size += sstat.size;
+	}
+	stat->ino   = dentry->d_inode->i_ino;
+	return 0;
+}
+
+
+static const struct inode_operations filemash_file_inode_operations = {
+	.getattr        = filemash_getattr,
+};
+
+
+struct inode *filemash_new_inode(struct super_block *sb, umode_t mode)
+{
+	struct inode *inode;
+
+	inode = new_inode(sb);
+	if (!inode)
+		return NULL;
+
+	mode &= S_IFMT;
+
+	inode->i_ino = get_next_ino();
+	inode->i_mode = mode;
+	inode->i_flags |= S_NOATIME | S_NOCMTIME;
+
+	inode->i_op =  &filemash_file_inode_operations;
+	inode->i_fop = &filemash_file_operations;
+
+	return inode;
+}
+
+static void filemash_put_super(struct super_block *sb)
+{
+	struct fm_fs *fm_fs = (struct fm_fs *)sb->s_fs_info;
+	int total_files = fm_fs->fs_total;
+	struct fm_info *fm_info = (struct fm_info *)fm_fs->fs_info;
+	int i;
+
+	kfree(fm_fs->fs_sort);
+	for (i = 0; i < total_files; i++)
+		path_put(&fm_fs->fs_path[i]);
+	kfree(fm_fs->fs_path);
+
+	/* yes i=1 is correct. it starts at 1 */
+	for (i = 1; i < total_files+1; i++)
+		kfree(fm_info[i].f_file);
+	kfree(fm_info);
+
+	return;
+}
+
+static int filemash_remount_fs(struct super_block *sb, int *flagsp, char *data)
+{
+	return 0;
+}
+
+static const match_table_t filemash_tokens = {
+	{opt_file,		"file=%s"},
+	{opt_layout,		"layout=%s"},
+	{opt_err,		NULL}
+};
+
+/*
+ *
+ * format of the input is
+ * file=file:[offset]:[len],layout=<stripe:size|concat>,.....
+ *
+ * offset; if is not specified, defaults to zero
+ * len; if is not specified, defaults to infinity
+ *
+ * file= and layout= tokens can be used in any order.
+ *
+ * the order of the files determines the order in which the files are mashed-up
+ */
+static struct fm_info *filemash_parse_opt(char *opt, int *total_files)
+{
+	char *p, *q, *r;
+	int i, total = 0;
+	struct fm_info *fm_info = (struct fm_info *)
+		kzalloc((MAX_FILE+1)*sizeof(struct fm_info), GFP_KERNEL);
+
+	if (!fm_info)
+		goto fail;
+
+	while ((p = strsep(&opt, ",")) != NULL) {
+		int token;
+		substring_t args[MAX_OPT_ARGS];
+
+		if (total >= MAX_FILE)
+			goto fail;
+
+		if (!*p)
+			continue;
+
+		token = match_token(p, filemash_tokens, args);
+		switch (token) {
+		case opt_file:
+			total++;
+
+			q = match_strdup(&args[0]);
+			if (!q)
+				goto fail;
+
+			r = strsep(&q, ":");
+			if (!r)
+				goto fail;
+			fm_info[total].f_file = r;
+
+			r = strsep(&q, ":");
+			if (!r || kstrtol(r, 10,
+					(long *)&fm_info[total].f_offset)) {
+				fm_info[total].f_offset = 0;
+				fm_info[total].f_len = 0;
+				break;
+			}
+
+			r = strsep(&q, ":");
+			if (!r || kstrtol(r, 10,
+					(long *)&fm_info[total].f_len)) {
+				fm_info[total].f_len = 0;
+				break;
+			}
+			break;
+
+		case opt_layout:
+			q = match_strdup(&args[0]);
+			if (!q)
+				goto fail;
+
+			r = strsep(&q, ":");
+			if (!r)
+				goto fail;
+			fm_info[0].f_file = r;
+
+			if (strncmp(r, "stripe", 6)) {
+				fm_info[0].f_len = 0;
+				break;
+			}
+			r = strsep(&q, ":");
+			if (!r || kstrtol(r, 10, (long *)&fm_info[0].f_len)) {
+				fm_info[0].f_len = 0;
+				break;
+			}
+			break;
+
+		default:
+			return NULL;
+		}
+	}
+
+	if (!fm_info[0].f_file) {
+		fm_info[0].f_file = "concat";
+		fm_info[0].f_len  = 0;
+	} else if (!strcmp(fm_info[0].f_file, "stripe") && !fm_info[0].f_len) {
+		goto fail;
+	} else if (!strcmp(fm_info[0].f_file, "concat"))  {
+		fm_info[0].f_len  = 0;
+	}
+
+	*total_files = total;
+	return fm_info;
+
+fail:
+	for (i = 0; i < total; i++)
+		kfree(fm_info[i].f_file);
+	kfree(fm_info);
+	*total_files = 0;
+	return NULL;
+}
+
+static const struct super_operations filemash_super_operations = {
+	.put_super	= filemash_put_super,
+	.remount_fs	= filemash_remount_fs,
+};
+
+
+/*
+ * not the best sort function in the world. implement heapsort or
+ * some such thing. Currently it is roughly O(n^2)
+ */
+static int *sort_info(struct fm_info *array, int total, int stripe_len)
+{
+	int i, j, index;
+	int no_of_stripes, cur_min, last_min;
+	int *sort = kmalloc(total*sizeof(int), GFP_KERNEL);
+	if (!sort)
+		return NULL;
+
+	j = 0;
+	cur_min = 0;
+	while (j < total) {
+		last_min = cur_min;
+		cur_min = -1;
+		index = j;
+		for (i = 1; i < total+1; i++) {
+			if (!array[i].f_len)
+				no_of_stripes = INT_MAX;
+			else
+				no_of_stripes = my_div(array[i].f_len,
+							stripe_len);
+
+			if (no_of_stripes <= last_min)
+				continue;
+
+			if (cur_min == -1) {
+				cur_min = no_of_stripes;
+				sort[index++] = i;
+			} else if (no_of_stripes < cur_min) {
+				index = j;
+				sort[index++] = i;
+			} else if (no_of_stripes == cur_min) {
+				sort[index++] = i;
+			}
+		}
+		BUG_ON(j == index);
+		j = index;
+	}
+
+	return sort;
+}
+
+static int filemash_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct inode *root_inode;
+	struct dentry *root_dentry;
+	struct fm_fs *fm_fs;
+	int i, j, total_files = 0;
+	int err = -ENOMEM;
+	struct fm_info *fm_info = (struct fm_info *)
+			filemash_parse_opt((char *) data, &total_files);
+
+	if (!fm_info)
+		goto out;
+
+	fm_fs = kmalloc(sizeof(struct fm_fs), GFP_KERNEL);
+	if (!fm_fs)
+		goto out;
+
+	fm_fs->fs_path = kmalloc(total_files*sizeof(struct path), GFP_KERNEL);
+	if (!fm_fs->fs_path)
+		goto out1;
+
+	fm_fs->fs_total = total_files;
+	fm_fs->fs_info  = fm_info;
+	fm_fs->fs_sort  = NULL;
+	if (!strcmp(fm_info[0].f_file, "stripe")) {
+		fm_fs->fs_sort = sort_info(fm_info, total_files,
+						fm_info[0].f_len);
+		if (!fm_fs->fs_sort)
+			goto out1;
+	}
+
+	for (i = 0; i < total_files; i++) {
+		/*
+		 * the first entry of fm_info contains the layout.
+		 * (i+1) is intentional
+		 */
+		err = kern_path(fm_info[i+1].f_file, LOOKUP_FOLLOW,
+				&fm_fs->fs_path[i]);
+		if (err)
+			goto out_free_filemash_path;
+	}
+
+	root_inode = filemash_new_inode(sb, S_IFREG);
+	if (!root_inode)
+		goto out_free_filemash_path;
+
+
+	root_dentry = d_make_root(root_inode);
+	if (!root_dentry)
+		goto out_release_root;
+
+	root_dentry->d_fsdata = NULL;
+	root_dentry->d_op = NULL;
+
+	sb->s_op = &filemash_super_operations;
+	sb->s_root = root_dentry;
+	sb->s_fs_info = (void *)fm_fs;
+
+	return 0;
+
+out_release_root:
+	iput(root_inode);
+
+out_free_filemash_path:
+	kfree(fm_fs->fs_sort);
+	for (j = 0; j < i; j++)
+		path_put(&fm_fs->fs_path[j]);
+	kfree(fm_fs->fs_path);
+
+
+out1:
+	kfree(fm_fs);
+out:
+	kfree(fm_info);
+	return err;
+}
+
+static struct dentry *filemash_mount(struct file_system_type *fs_type,
+					int flags,
+					const char *dev_name,
+					void *raw_data)
+{
+	return mount_nodev(fs_type, flags, raw_data, filemash_fill_super);
+}
+
+static struct file_system_type filemash_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "filemashfs",
+	.mount		= filemash_mount,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init filemash_init(void)
+{
+	return register_filesystem(&filemash_fs_type);
+}
+
+static void __exit filemash_exit(void)
+{
+	unregister_filesystem(&filemash_fs_type);
+}
+
+module_init(filemash_init);
+module_exit(filemash_exit);


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/1] vfs: Filemash fs
  2013-04-03 10:23 Ram Pai
@ 2013-04-07 11:03 ` Vyacheslav Dubeyko
  2013-04-08  3:47   ` Ram Pai
  0 siblings, 1 reply; 6+ messages in thread
From: Vyacheslav Dubeyko @ 2013-04-07 11:03 UTC (permalink / raw)
  To: Ram Pai; +Cc: linux-fsdevel

Hi Ram,

On Apr 3, 2013, at 2:23 PM, Ram Pai wrote:

> The following patch implements a filesystem driver which provides the ability
> to mashup exisiting files in creative ways in-order to create new files.
> 
> Think of it as a way to union files; not filesystems.
> 
> Its a prototype idea with a prototype implementation.  Tested and working on
> 3.0.9-rc1.  I have included Documentation file which details the idea with
> examples and possible applications.
> 
> Any suggestions/ideas to make this useful and generally applicable is very much
> appreciated!
> 

Why do you think that your solution is better than LVM or RAID technologies?

I think that using mount options in your solution is weird way. Let's imagine a file that it will include a hundreds parts.

Moreover, I think that in your approach the using of remount option is very necessary. It needs for adding file's parts and change mounted parts.

With the best regards,
Vyacheslav Dubeyko.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/1] vfs: Filemash fs
  2013-04-07 11:03 ` Vyacheslav Dubeyko
@ 2013-04-08  3:47   ` Ram Pai
  2013-04-11  0:44     ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Ram Pai @ 2013-04-08  3:47 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-fsdevel

On Sun, Apr 07, 2013 at 03:03:29PM +0400, Vyacheslav Dubeyko wrote:
> Hi Ram,
> 
> On Apr 3, 2013, at 2:23 PM, Ram Pai wrote:
> 
> > The following patch implements a filesystem driver which provides the ability
> > to mashup exisiting files in creative ways in-order to create new files.
> > 
> > Think of it as a way to union files; not filesystems.
> > 
> > Its a prototype idea with a prototype implementation.  Tested and working on
> > 3.0.9-rc1.  I have included Documentation file which details the idea with
> > examples and possible applications.
> > 
> > Any suggestions/ideas to make this useful and generally applicable is very much
> > appreciated!
> > 
> 
> Why do you think that your solution is better than LVM or RAID technologies?

Each has its own place. LVM/RAID lets you build block devices in
creative ways. Filemashup lets you build files in creative ways.
So both solutions have their own application which are not necessarily
the same. Hence I can't say one is better than the other.

> 
> I think that using mount options in your solution is weird way. Let's imagine a file that it will include a hundreds parts.

Well, I tried to mimic the same kind of approach used by overlayfs to
union-mount directories. 

Yes. you are right. if you want to mashup hundreds of files, then you
will have to provide all those hundred files on the command line, or you
can put the options in /etc/fstab.  Can you think of a better approach?


> 
> Moreover, I think that in your approach the using of remount option is very necessary. It needs for adding file's parts and change mounted parts.

Or use an ioctl or sysfs mechanism to dynamically change the layout of
the files.

Thanks for your comments, I certainly want to know if this idea is worth
pursuing further. I certainly see applications of this idea for my own
personal use, the best being, the ablilty to extend my files across
filesystems.

RP


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/1] vfs: Filemash fs
  2013-04-08  3:47   ` Ram Pai
@ 2013-04-11  0:44     ` Dave Chinner
  2013-04-24  8:03       ` Ram Pai
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2013-04-11  0:44 UTC (permalink / raw)
  To: Ram Pai; +Cc: Vyacheslav Dubeyko, linux-fsdevel

On Mon, Apr 08, 2013 at 11:47:09AM +0800, Ram Pai wrote:
> On Sun, Apr 07, 2013 at 03:03:29PM +0400, Vyacheslav Dubeyko wrote:
> > Hi Ram,
> > 
> > On Apr 3, 2013, at 2:23 PM, Ram Pai wrote:
> > 
> > > The following patch implements a filesystem driver which provides the ability
> > > to mashup exisiting files in creative ways in-order to create new files.
> > > 
> > > Think of it as a way to union files; not filesystems.
> > > 
> > > Its a prototype idea with a prototype implementation.  Tested and working on
> > > 3.0.9-rc1.  I have included Documentation file which details the idea with
> > > examples and possible applications.
> > > 
> > > Any suggestions/ideas to make this useful and generally applicable is very much
> > > appreciated!
> > > 
> > 
> > Why do you think that your solution is better than LVM or RAID technologies?
> 
> Each has its own place. LVM/RAID lets you build block devices in
> creative ways. Filemashup lets you build files in creative ways.
> So both solutions have their own application which are not necessarily
> the same. Hence I can't say one is better than the other.
> 
> > 
> > I think that using mount options in your solution is weird way. Let's imagine a file that it will include a hundreds parts.
> 
> Well, I tried to mimic the same kind of approach used by overlayfs to
> union-mount directories. 
> 
> Yes. you are right. if you want to mashup hundreds of files, then you
> will have to provide all those hundred files on the command line, or you
> can put the options in /etc/fstab.  Can you think of a better approach?

Overlay-style filesystem that keeps the mashup information in
xattrs in the underlying files.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/1] vfs: Filemash fs
  2013-04-11  0:44     ` Dave Chinner
@ 2013-04-24  8:03       ` Ram Pai
  0 siblings, 0 replies; 6+ messages in thread
From: Ram Pai @ 2013-04-24  8:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Vyacheslav Dubeyko, linux-fsdevel

On Thu, Apr 11, 2013 at 10:44:32AM +1000, Dave Chinner wrote:
> On Mon, Apr 08, 2013 at 11:47:09AM +0800, Ram Pai wrote:
> > On Sun, Apr 07, 2013 at 03:03:29PM +0400, Vyacheslav Dubeyko wrote:
> > > Hi Ram,
> > > 
> > > On Apr 3, 2013, at 2:23 PM, Ram Pai wrote:
> > > 
> > > > The following patch implements a filesystem driver which provides the ability
> > > > to mashup exisiting files in creative ways in-order to create new files.
> > > > 
> > > > Think of it as a way to union files; not filesystems.
> > > > 
> > > > Its a prototype idea with a prototype implementation.  Tested and working on
> > > > 3.0.9-rc1.  I have included Documentation file which details the idea with
> > > > examples and possible applications.
> > > > 
> > > > Any suggestions/ideas to make this useful and generally applicable is very much
> > > > appreciated!
> > > > 
> > > 
> > > Why do you think that your solution is better than LVM or RAID technologies?
> > 
> > Each has its own place. LVM/RAID lets you build block devices in
> > creative ways. Filemashup lets you build files in creative ways.
> > So both solutions have their own application which are not necessarily
> > the same. Hence I can't say one is better than the other.
> > 
> > > 
> > > I think that using mount options in your solution is weird way. Let's imagine a file that it will include a hundreds parts.
> > 
> > Well, I tried to mimic the same kind of approach used by overlayfs to
> > union-mount directories. 
> > 
> > Yes. you are right. if you want to mashup hundreds of files, then you
> > will have to provide all those hundred files on the command line, or you
> > can put the options in /etc/fstab.  Can you think of a better approach?
> 
> Overlay-style filesystem that keeps the mashup information in
> xattrs in the underlying files.

Thanks Dave for the suggestion. Will explore that.
RP


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-04-24  8:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-03  9:16 [RFC PATCH 1/1] vfs: Filemash fs Ram Pai
  -- strict thread matches above, loose matches on Subject: below --
2013-04-03 10:23 Ram Pai
2013-04-07 11:03 ` Vyacheslav Dubeyko
2013-04-08  3:47   ` Ram Pai
2013-04-11  0:44     ` Dave Chinner
2013-04-24  8:03       ` Ram Pai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).