[RFC 0/9] osdfs


* [RFC 0/9] osdfs
From: Boaz Harrosh @ 2008-10-30 14:26 UTC
  To: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd

Please review osdfs, an OSD-based file system.

Given that our OSD initiator library is accepted into the Kernel, we would
also like to submit osdfs. This is the first iteration of this file system.

The next stage is to make it exportable by the pNFS-over-objects server.
osdfs is one of the building blocks of a full, end-to-end, open source
reference implementation of a pNFS-over-objects server and client that we
want to have available in Linux. The other parts are the generic pNFS
client project with the objects-layout-driver, and the generic pNFS
server plus osdfs, once it is adapted to be exportable.
(See all about pNFS in Linux at:
http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design)

osdfs was originally developed by Avishay Traeger <avishay@gmail.com>
from IBM. A very old version of it is hosted on SourceForge as the osdfs
project. It was originally developed for the 2.6.10 Kernel, on top of IBM's
old osd-initiator Linux driver.

Since then it was picked up by us, open-osd, forward-ported to the
current Kernel, and converted to run over our osd Kernel library.
The conversion effort, if anyone is interested, is also available as a
patchset here:
  git-clone git://git-open-osd.org/open-osd.git osdfs-devel
or on the web at:
  http://git.open-osd.org/gitweb.cgi?p=open-osd.git;a=shortlog;h=refs/heads/osdfs-devel

The original code is based on the Kernel's ext2 code of that time.
Further reading is available in the osdfs.txt file in the last patch.

I have mechanically divided the code into parts, each introducing a
group of VFS function vectors, all tied together at the end into a full
file system. Each patch compiles, but the file system will only run once
the whole series is applied. This was done in the hope of easier reviewing.

Here is the list of patches:
[RFC 1/9] osdfs: osd Swiss army knife
[RFC 2/9] osdfs: file and file_inode operations
[RFC 3/9] osdfs: symlink_inode and fast_symlink_inode operations
[RFC 4/9] osdfs: address_space_operations
[RFC 5/9] osdfs: dir_inode and directory operations
[RFC 6/9] osdfs: super_operations and file_system_type
[RFC 7/9] osdfs: mkosdfs
[RFC 8/9] osdfs: Documentation
  (Patches 1-8 above are the ones to be submitted.)

[PATCH 9/9] [out-of-tree] open-osd: Global Makefile and do-osdfs test script
  This patch will not be submitted. It is only needed when compiling
  out-of-tree. One more patch missing from this patchset is the one for
  the Kernel's fs/Makefile, fs/Kconfig and Documentation/filesystems/00-INDEX.

This patchset is also available on:
  git-clone git://git-open-osd.org/open-osd.git osdfs
or on the web at:
  http://git.open-osd.org/gitweb.cgi?p=open-osd.git;a=shortlog;h=refs/heads/osdfs

If anyone wants to actually run and test this code, please start here:
http://open-osd.org
The osdfs.txt file in the last patch should also help.

Thanks in advance
Boaz

* [RFC 1/9] osdfs: osd Swiss army knife
From: Boaz Harrosh @ 2008-10-30 14:30 UTC
  To: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd

This patch contains all the osd infrastructure that will be used later
by the file system, the declarations of constants, on-disk structures
and prototypes, and the Kbuild+Kconfig files needed to build the osdfs
module.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/osdfs/Kbuild   |   24 ++++
 fs/osdfs/Kconfig  |   13 ++
 fs/osdfs/common.h |  154 ++++++++++++++++++++++++++
 fs/osdfs/osd.c    |  316 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/osdfs/osdfs.h  |  180 ++++++++++++++++++++++++++++++
 5 files changed, 687 insertions(+), 0 deletions(-)
 create mode 100644 fs/osdfs/Kbuild
 create mode 100644 fs/osdfs/Kconfig
 create mode 100644 fs/osdfs/common.h
 create mode 100644 fs/osdfs/osd.c
 create mode 100644 fs/osdfs/osdfs.h

diff --git a/fs/osdfs/Kbuild b/fs/osdfs/Kbuild
new file mode 100644
index 0000000..19d709e
--- /dev/null
+++ b/fs/osdfs/Kbuild
@@ -0,0 +1,24 @@
+#
+# Kbuild for the OSDFS module
+#
+# Copyright (C) 2008 Panasas Inc.  All rights reserved.
+#
+# Authors:
+#   Boaz Harrosh <bharrosh@panasas.com>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2
+#
+# Kbuild - Gets included from the Kernel's Makefile and build system
+#
+
+ifneq ($(OSD_INC),)
+# We are built out-of-tree, so configure everything as if enabled in Kconfig
+CONFIG_OSDFS_FS=m
+EXTRA_CFLAGS += -DCONFIG_OSDFS_FS -DCONFIG_OSDFS_FS_MODULE
+EXTRA_CFLAGS += -I$(OSD_INC)
+# EXTRA_CFLAGS += -DCONFIG_OSDFS_DEBUG
+endif
+
+osdfs-objs := osd.o
+obj-$(CONFIG_OSDFS_FS) += osdfs.o
diff --git a/fs/osdfs/Kconfig b/fs/osdfs/Kconfig
new file mode 100644
index 0000000..843bf82
--- /dev/null
+++ b/fs/osdfs/Kconfig
@@ -0,0 +1,13 @@
+config OSDFS_FS
+	tristate "OSD based file system support"
+	depends on SCSI_OSD_ULD
+	help
+	  OSDFS is a file system that uses an OSD storage device
+	  as its backing storage.
+
+# Debugging-related stuff
+config OSDFS_DEBUG
+	bool "Enable debugging"
+	depends on OSDFS_FS
+	help
+	  This option enables OSDFS debug prints.
diff --git a/fs/osdfs/common.h b/fs/osdfs/common.h
new file mode 100644
index 0000000..37886f7
--- /dev/null
+++ b/fs/osdfs/common.h
@@ -0,0 +1,154 @@
+/*
+ * Copyright (C) 2005, 2006
+ * Avishay Traeger (avishay@gmail.com) (avishay@il.ibm.com)
+ * Copyright (C) 2005, 2006
+ * International Business Machines
+ *
+ * Copyrights for code taken from ext2:
+ *     Copyright (C) 1992, 1993, 1994, 1995
+ *     Remy Card (card@masi.ibp.fr)
+ *     Laboratoire MASI - Institut Blaise Pascal
+ *     Universite Pierre et Marie Curie (Paris VI)
+ *     from
+ *     linux/fs/minix/inode.c
+ *     Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ * This file is part of osdfs.
+ *
+ * osdfs is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.  Since it is based on ext2, and the only
+ * valid version of GPL for the Linux kernel is version 2, the only valid
+ * version of GPL for osdfs is version 2.
+ *
+ * osdfs is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with osdfs; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#ifndef __OSDFS_COM_H__
+#define __OSDFS_COM_H__
+
+#include <linux/types.h>
+#include <linux/timex.h>
+
+#include <scsi/osd_attributes.h>
+#include <scsi/osd_initiator.h>
+#include <scsi/osd_sec.h>
+
+/****************************************************************************
+ * Object ID related defines
+ * NOTE: inode# = object ID - OSDFS_OBJ_OFF
+ ****************************************************************************/
+#define OSDFS_OBJ_OFF	0x10000	/* offset for objects */
+#define OSDFS_SUPER_ID	0x10000	/* object ID for on-disk superblock */
+#define OSDFS_BM_ID	0x10001	/* object ID for ID bitmap */
+#define OSDFS_ROOT_ID	0x10002	/* object ID for root directory */
+#define OSDFS_TEST_ID	0x10003	/* object ID for test object */
+
+/* osdfs Application specific page/attribute */
+#ifndef OSD_PAGE_NUM_IBM_UOBJ_FS_DATA
+# define OSD_PAGE_NUM_IBM_UOBJ_FS_DATA	   (OSD_APAGE_APP_DEFINED_FIRST + 3)
+# define OSD_ATTR_NUM_IBM_UOBJ_FS_DATA_INODE 1
+#endif
+
+/*
+ * The maximum number of files we can have is limited by the size of the
+ * inode number.  This is the largest object ID that the file system supports.
+ * Inode numbers 0, 1, and 2 are always in use (see above defines).
+ */
+enum {
+	OSDFS_UINT64_MAX = (~0LL),
+	OSDFS_MAX_INO_ID = (sizeof(ino_t) * 8 == 64) ? OSDFS_UINT64_MAX :
+					(1LL << (sizeof(ino_t) * 8 - 1)),
+	OSDFS_MAX_ID	 = (OSDFS_MAX_INO_ID - 1 - OSDFS_OBJ_OFF),
+};
+
+/****************************************************************************
+ * Misc.
+ ****************************************************************************/
+#define OSDFS_BLKSHIFT	12
+#define OSDFS_BLKSIZE	(1UL << OSDFS_BLKSHIFT)
+
+/****************************************************************************
+ * superblock-related things
+ ****************************************************************************/
+#define OSDFS_SUPER_MAGIC	0x5DF5
+
+/*
+ * The file system control block - stored in an object's data (namely, the one
+ * with ID OSDFS_SUPER_ID).  This is where the in-memory superblock is stored
+ * on disk.  Right now it just has a magic value, which is basically a sanity
+ * check on our ability to communicate with the object store.
+ */
+struct osdfs_fscb {
+	uint32_t  s_nextid;	/* Highest object ID used */
+	uint32_t  s_numfiles;	/* Number of files on fs */
+	uint16_t  s_magic;	/* Magic signature */
+	uint16_t  s_newfs;	/* Non-zero if this is a new fs */
+};
+
+/****************************************************************************
+ * inode-related things
+ ****************************************************************************/
+#define OSDFS_INO_ATTR_SIZE	64
+#define OSDFS_IDATA		15
+
+/*
+ * The file control block - stored in an object's attributes.  This is where
+ * the in-memory inode is stored on disk.
+ */
+struct osdfs_fcb {
+	uint16_t  i_mode;         	/* File mode */
+	uint16_t  i_links_count;  	/* Links count */
+	uint32_t  i_uid;          	/* Owner Uid */
+	uint32_t  i_gid;          	/* Group Id */
+	uint32_t  i_atime;        	/* Access time */
+	uint32_t  i_ctime;        	/* Inode change time */
+	uint32_t  i_mtime;        	/* Modification time */
+	uint32_t  i_flags;        	/* File flags */
+	uint32_t  i_version;      	/* File version */
+	uint32_t  i_generation;   	/* File version (for NFS) */
+	uint64_t  i_size;		/* Size of the file */
+	uint64_t  i_objs;         	/* Other objects for file - not used */
+	uint32_t  i_data[OSDFS_IDATA];	/* Short symlink names and device #s */
+};
+
+/****************************************************************************
+ * dentry-related things
+ ****************************************************************************/
+#define OSDFS_NAME_LEN	255
+
+/*
+ * The on-disk directory entry
+ */
+struct osdfs_dir_entry {
+	uint32_t	inode;			/* inode number           */
+	uint16_t	rec_len;		/* directory entry length */
+	uint8_t		name_len;		/* name length            */
+	uint8_t		file_type;		/* file type              */
+	char		name[OSDFS_NAME_LEN];	/* file name              */
+};
+
+enum {
+	OSDFS_FT_UNKNOWN,
+	OSDFS_FT_REG_FILE,
+	OSDFS_FT_DIR,
+	OSDFS_FT_CHRDEV,
+	OSDFS_FT_BLKDEV,
+	OSDFS_FT_FIFO,
+	OSDFS_FT_SOCK,
+	OSDFS_FT_SYMLINK,
+	OSDFS_FT_MAX
+};
+
+#define OSDFS_DIR_PAD			4
+#define OSDFS_DIR_ROUND			(OSDFS_DIR_PAD - 1)
+#define OSDFS_DIR_REC_LEN(name_len)	(((name_len) + 8 + OSDFS_DIR_ROUND) & \
+					 ~OSDFS_DIR_ROUND)
+#endif
diff --git a/fs/osdfs/osd.c b/fs/osdfs/osd.c
new file mode 100644
index 0000000..18574b4
--- /dev/null
+++ b/fs/osdfs/osd.c
@@ -0,0 +1,316 @@
+/*
+ * Copyright (C) 2005, 2006
+ * Avishay Traeger (avishay@gmail.com) (avishay@il.ibm.com)
+ * Copyright (C) 2005, 2006
+ * International Business Machines
+ *
+ * This file is part of osdfs.
+ *
+ * osdfs is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.  Since it is based on ext2, and the only
+ * valid version of GPL for the Linux kernel is version 2, the only valid
+ * version of GPL for osdfs is version 2.
+ *
+ * osdfs is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with osdfs; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <scsi/scsi_device.h>
+
+#include "osdfs.h"
+
+int check_ok(struct osd_request *req)
+{
+	return req->request->errors;
+}
+
+void make_credential(uint8_t cred_a[OSD_CAP_LEN], uint64_t pid, uint64_t oid)
+{
+	struct osd_obj_id obj = {
+		.partition = pid,
+		.id = oid
+	};
+
+	osd_sec_init_nosec_doall_caps(cred_a, &obj, false, true);
+}
+
+/*
+ * Perform a synchronous OSD operation.
+ */
+int osdfs_sync_op(struct osd_request *req, int timeout, uint8_t *credential)
+{
+	int ret;
+
+	req->timeout = timeout;
+	ret = osd_finalize_request(req, 0, credential, NULL);
+	if (ret) {
+		OSDFS_DBGMSG("Failed to osd_finalize_request() => %d\n", ret);
+		return ret;
+	}
+
+	ret = osd_execute_request(req);
+
+	if (ret)
+		OSDFS_DBGMSG("osd_execute_request() => %d\n", ret);
+	/* osd_req_decode_sense(or, ret); */
+	return ret;
+}
+
+/*
+ * Perform an asynchronous OSD operation.
+ */
+int osdfs_async_op(struct osd_request *req, osd_req_done_fn *async_done,
+		   void *caller_context, char *credential)
+{
+	int ret;
+
+	ret = osd_finalize_request(req, 0, credential, NULL);
+	if (ret) {
+		OSDFS_DBGMSG("Failed to osd_finalize_request() => %d\n", ret);
+		return ret;
+	}
+
+	ret = osd_execute_request_async(req, async_done, caller_context);
+
+	if (ret)
+		OSDFS_DBGMSG("osd_execute_request_async() => %d\n", ret);
+	return ret;
+}
+
+int prepare_get_attr_list_add_entry(struct osd_request *req,
+				    uint32_t page_num,
+				    uint32_t attr_num,
+				    uint32_t attr_len)
+{
+	struct osd_attr attr = {
+		.page = page_num,
+		.attr_id = attr_num,
+		.len = attr_len,
+	};
+
+	return osd_req_add_get_attr_list(req, &attr, 1);
+}
+
+int prepare_set_attr_list_add_entry(struct osd_request *req,
+				    uint32_t page_num,
+				    uint32_t attr_num,
+				    uint16_t attr_len,
+				    const unsigned char *attr_val)
+{
+	struct osd_attr attr = {
+		.page = page_num,
+		.attr_id = attr_num,
+		.len = attr_len,
+		.val_ptr = (u8 *)attr_val,
+	};
+
+	return osd_req_add_set_attr_list(req, &attr, 1);
+}
+
+int extract_next_attr_from_req(struct osd_request *req,
+	uint32_t *page_num, uint32_t *attr_num,
+	uint16_t *attr_len, uint8_t **attr_val)
+{
+	struct osd_attr attr = {.page = 0}; /* start with zeros */
+	void *iter = NULL;
+	int nelem;
+
+	do {
+		nelem = 1;
+		osd_req_decode_get_attr_list(req, &attr, &nelem, &iter);
+		if ((attr.page == *page_num) && (attr.attr_id == *attr_num)) {
+			*attr_len = attr.len;
+			*attr_val = attr.val_ptr;
+			return 0;
+		}
+	} while (iter);
+
+	return -EIO;
+}
+
+struct osd_request *prepare_osd_format_lun(struct osd_dev *dev,
+					   uint64_t formatted_capacity)
+{
+	struct osd_request *or = osd_start_request(dev, GFP_KERNEL);
+
+	if (!or)
+		return NULL;
+
+	osd_req_format(or, formatted_capacity);
+
+	return or;
+}
+
+struct osd_request *prepare_osd_create_partition(struct osd_dev *dev,
+						 uint64_t requested_id)
+{
+	struct osd_request *or = osd_start_request(dev, GFP_KERNEL);
+
+	if (!or)
+		return NULL;
+
+	osd_req_create_partition(or, requested_id);
+
+	return or;
+}
+
+struct osd_request *prepare_osd_remove_partition(struct osd_dev *dev,
+						 uint64_t requested_id)
+{
+	struct osd_request *or = osd_start_request(dev, GFP_KERNEL);
+
+	if (!or)
+		return NULL;
+
+	osd_req_remove_partition(or, requested_id);
+
+	return or;
+}
+
+struct osd_request *prepare_osd_create(struct osd_dev *dev,
+				     uint64_t part_id,
+				     uint64_t requested_id)
+{
+	struct osd_obj_id obj = {
+		.partition = part_id,
+		.id = requested_id
+	};
+	struct osd_request *or = osd_start_request(dev, GFP_KERNEL);
+
+	if (!or)
+		return NULL;
+
+	osd_req_create_object(or, &obj);
+
+	return or;
+}
+
+struct osd_request *prepare_osd_remove(struct osd_dev *dev,
+				     uint64_t part_id,
+				     uint64_t obj_id)
+{
+	struct osd_obj_id obj = {
+		.partition = part_id,
+		.id = obj_id
+	};
+	struct osd_request *or = osd_start_request(dev, GFP_KERNEL);
+
+	if (!or)
+		return NULL;
+
+	osd_req_remove_object(or, &obj);
+
+	return or;
+}
+
+struct osd_request *prepare_osd_set_attr(struct osd_dev *dev,
+				       uint64_t part_id,
+				       uint64_t obj_id)
+{
+	struct osd_obj_id obj = {
+		.partition = part_id,
+		.id = obj_id
+	};
+	struct osd_request *or = osd_start_request(dev, GFP_KERNEL);
+
+	if (!or)
+		return NULL;
+
+	osd_req_set_attributes(or, &obj);
+
+	return or;
+}
+
+struct osd_request *prepare_osd_get_attr(struct osd_dev *dev,
+				       uint64_t part_id,
+				       uint64_t obj_id)
+{
+	struct osd_obj_id obj = {
+		.partition = part_id,
+		.id = obj_id
+	};
+	struct osd_request *or = osd_start_request(dev, GFP_KERNEL);
+
+	if (!or)
+		return NULL;
+
+	osd_req_get_attributes(or, &obj);
+
+	return or;
+}
+
+struct osd_request *prepare_osd_read(struct osd_dev *dev,
+				   uint64_t part_id,
+				   uint64_t obj_id,
+				   uint64_t length,
+				   uint64_t offset,
+				   int cmd_data_use_sg,
+				   unsigned char *cmd_data)
+{
+	struct osd_obj_id obj = {
+		.partition = part_id,
+		.id = obj_id
+	};
+	struct osd_request *or = osd_start_request(dev, GFP_KERNEL);
+	struct request_queue *req_q = dev->scsi_dev->request_queue;
+	struct bio *bio;
+
+	if (!or)
+		return NULL;
+
+	BUG_ON(cmd_data_use_sg);
+	bio = bio_map_kern(req_q, cmd_data, length, or->alloc_flags);
+	if (!bio) {
+		osd_end_request(or);
+		return NULL;
+	}
+
+	osd_req_read(or, &obj, bio, offset);
+	OSDFS_DBGMSG("osd_req_read(p=%llX, ob=%llX, l=%llu, of=%llu)\n",
+		part_id, obj_id, length, offset);
+	return or;
+}
+
+struct osd_request *prepare_osd_write(struct osd_dev *dev,
+				    uint64_t part_id,
+				    uint64_t obj_id,
+				    uint64_t length,
+				    uint64_t offset,
+				    int cmd_data_use_sg,
+				    const unsigned char *cmd_data)
+{
+	struct osd_obj_id obj = {
+		.partition = part_id,
+		.id = obj_id
+	};
+	struct osd_request *or = osd_start_request(dev, GFP_KERNEL);
+	struct request_queue *req_q = dev->scsi_dev->request_queue;
+	struct bio *bio;
+
+	if (!or)
+		return NULL;
+
+	BUG_ON(cmd_data_use_sg);
+	bio = bio_map_kern(req_q, (u8 *)cmd_data, length, or->alloc_flags);
+	if (!bio) {
+		osd_end_request(or);
+		return NULL;
+	}
+
+	osd_req_write(or, &obj, bio, offset);
+	OSDFS_DBGMSG("osd_req_write(p=%llX, ob=%llX, l=%llu, of=%llu)\n",
+		part_id, obj_id, length, offset);
+	return or;
+}
+
+void free_osd_req(struct osd_request *req)
+{
+	osd_end_request(req);
+}
diff --git a/fs/osdfs/osdfs.h b/fs/osdfs/osdfs.h
new file mode 100644
index 0000000..30472b7
--- /dev/null
+++ b/fs/osdfs/osdfs.h
@@ -0,0 +1,180 @@
+/*
+ * Copyright (C) 2005, 2006
+ * Avishay Traeger (avishay@gmail.com) (avishay@il.ibm.com)
+ * Copyright (C) 2005, 2006
+ * International Business Machines
+ *
+ * Copyrights for code taken from ext2:
+ *     Copyright (C) 1992, 1993, 1994, 1995
+ *     Remy Card (card@masi.ibp.fr)
+ *     Laboratoire MASI - Institut Blaise Pascal
+ *     Universite Pierre et Marie Curie (Paris VI)
+ *     from
+ *     linux/fs/minix/inode.c
+ *     Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ * This file is part of osdfs.
+ *
+ * osdfs is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.  Since it is based on ext2, and the only
+ * valid version of GPL for the Linux kernel is version 2, the only valid
+ * version of GPL for osdfs is version 2.
+ *
+ * osdfs is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with osdfs; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#ifndef __OSDFS_H__
+#define __OSDFS_H__
+
+#include <linux/fs.h>
+#include <linux/time.h>
+#include "common.h"
+
+#define OSDFS_ERR(fmt, a...) printk(KERN_ERR "osdfs: " fmt, ##a)
+
+#ifdef CONFIG_OSDFS_DEBUG
+#define OSDFS_DBGMSG(fmt, a...) \
+	printk(KERN_NOTICE "osdfs @%s:%d: " fmt, __func__, __LINE__, ##a)
+#else
+#define OSDFS_DBGMSG(fmt, a...) \
+	do {} while (0)
+#endif
+
+/*
+ * our extension to the in-memory superblock
+ */
+struct osdfs_sb_info {
+	struct osd_dev	*s_dev;			/* returned by get_osd_dev    */
+	uint64_t	s_pid;			/* partition ID of file system*/
+	int		s_timeout;		/* timeout for OSD operations */
+	uint32_t	s_nextid;		/* highest object ID used     */
+	uint32_t	s_numfiles;		/* number of files on fs      */
+	spinlock_t	s_next_gen_lock;	/* spinlock for gen # update  */
+	u32		s_next_generation;	/* next gen # to use          */
+	atomic_t	s_curr_pending;		/* number of pending commands */
+	uint8_t		s_cred[OSD_CAP_LEN];	/* all-powerful credential    */
+};
+
+/*
+ * our inode flags
+ */
+#ifdef ARCH_HAS_ATOMIC_UNSIGNED
+typedef unsigned osdfs_iflags_t;
+#else
+typedef unsigned long osdfs_iflags_t;
+#endif
+
+#define OBJ_2BCREATED	0	/* object will be created soon*/
+#define OBJ_CREATED	1	/* object has been created on the osd*/
+
+#define Obj2BCreated(oi) \
+	test_bit(OBJ_2BCREATED, &(oi->i_flags))
+#define SetObj2BCreated(oi) \
+	set_bit(OBJ_2BCREATED, &(oi->i_flags))
+
+#define ObjCreated(oi) \
+	test_bit(OBJ_CREATED, &(oi->i_flags))
+#define SetObjCreated(oi) \
+	set_bit(OBJ_CREATED, &(oi->i_flags))
+
+/*
+ * our extension to the in-memory inode
+ */
+struct osdfs_i_info {
+	osdfs_iflags_t i_flags;            /* various atomic flags            */
+	__le32	       i_data[OSDFS_IDATA];/*short symlink names and device #s*/
+	uint32_t       i_dir_start_lookup; /* which page to start lookup      */
+	wait_queue_head_t i_wq;            /* wait queue for inode            */
+	uint64_t       i_objs;             /*other objs for file (not used)   */
+	uint8_t        i_cred[OSD_CAP_LEN];/* all-powerful credential         */
+	struct inode   vfs_inode;          /* normal in-memory inode          */
+};
+
+/*
+ * get to our inode from the vfs inode
+ */
+static inline struct osdfs_i_info *OSDFS_I(struct inode *inode)
+{
+	return container_of(inode, struct osdfs_i_info, vfs_inode);
+}
+
+/*************************
+ * function declarations *
+ *************************/
+/* osd.c                 */
+void make_credential(uint8_t[], uint64_t, uint64_t);
+int check_ok(struct osd_request *);
+int osdfs_sync_op(struct osd_request *, int, uint8_t *);
+int osdfs_async_op(struct osd_request *, osd_req_done_fn *, void *, char *);
+
+int prepare_get_attr_list_add_entry(struct osd_request *req,
+				    uint32_t page_num,
+				    uint32_t attr_num,
+				    uint32_t attr_len);
+int prepare_set_attr_list_add_entry(struct osd_request *req,
+				    uint32_t page_num,
+				    uint32_t attr_num,
+				    uint16_t attr_len,
+				    const unsigned char *attr_val);
+int extract_next_attr_from_req(struct osd_request *req,
+			       uint32_t *page_num, uint32_t *attr_num,
+			       uint16_t *attr_len, uint8_t **attr_val);
+struct osd_request *prepare_osd_format_lun(struct osd_dev *dev,
+					   uint64_t formatted_capacity);
+struct osd_request *prepare_osd_create_partition(struct osd_dev *dev,
+						 uint64_t requested_id);
+struct osd_request *prepare_osd_remove_partition(struct osd_dev *dev,
+						 uint64_t requested_id);
+struct osd_request *prepare_osd_create(struct osd_dev *dev,
+				       uint64_t part_id,
+				       uint64_t requested_id);
+struct osd_request *prepare_osd_remove(struct osd_dev *dev,
+				       uint64_t part_id,
+				       uint64_t obj_id);
+struct osd_request *prepare_osd_set_attr(struct osd_dev *dev,
+					 uint64_t part_id,
+					 uint64_t obj_id);
+struct osd_request *prepare_osd_get_attr(struct osd_dev *dev,
+					 uint64_t part_id,
+					 uint64_t obj_id);
+struct osd_request *prepare_osd_read(struct osd_dev *dev,
+				     uint64_t part_id,
+				     uint64_t obj_id,
+				     uint64_t length,
+				     uint64_t offset,
+				     int cmd_data_use_sg,
+				     unsigned char *cmd_data);
+struct osd_request *prepare_osd_write(struct osd_dev *dev,
+				      uint64_t part_id,
+				      uint64_t obj_id,
+				      uint64_t length,
+				      uint64_t offset,
+				      int cmd_data_use_sg,
+				      const unsigned char *cmd_data);
+struct osd_request *prepare_osd_list(struct osd_dev *dev,
+				     uint64_t part_id,
+				     uint32_t list_id,
+				     uint64_t alloc_len,
+				     uint64_t initial_obj_id,
+				     int use_sg,
+				     void *data);
+int extract_list_from_req(struct osd_request *req,
+			  uint64_t *total_matches_p,
+			  uint64_t *num_ids_retrieved_p,
+			  uint64_t *list_of_ids_p[],
+			  int      *is_list_of_partitions_p,
+			  int      *list_isnt_up_to_date_p,
+			  uint64_t *continuation_tag_p,
+			  uint32_t *list_id_for_more_p);
+
+void free_osd_req(struct osd_request *req);
+
+#endif
-- 
1.6.0.1


* [RFC 2/9] osdfs: file and file_inode operations
From: Boaz Harrosh @ 2008-10-30 14:31 UTC
  To: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd

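Implementation of the file_operations and inode_operations for
regular data files.

All file_operations are generic VFS implementations, except for the
empty osdfs_release_file stub. Of the inode_operations, osdfs_truncate
is osdfs specific, while osdfs_setattr just calls the generic helpers.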

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/osdfs/Kbuild  |    2 +-
 fs/osdfs/file.c  |   58 ++++++++++++++++++++++
 fs/osdfs/inode.c |  140 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/osdfs/osdfs.h |   11 ++++
 4 files changed, 210 insertions(+), 1 deletions(-)
 create mode 100644 fs/osdfs/file.c
 create mode 100644 fs/osdfs/inode.c

diff --git a/fs/osdfs/Kbuild b/fs/osdfs/Kbuild
index 19d709e..c8ca4ce 100644
--- a/fs/osdfs/Kbuild
+++ b/fs/osdfs/Kbuild
@@ -20,5 +20,5 @@ EXTRA_CFLAGS += -I$(OSD_INC)
 # EXTRA_CFLAGS += -DCONFIG_OSDFS_DEBUG
 endif
 
-osdfs-objs := osd.o
+osdfs-objs := osd.o inode.o file.o
 obj-$(CONFIG_OSDFS_FS) += osdfs.o
diff --git a/fs/osdfs/file.c b/fs/osdfs/file.c
new file mode 100644
index 0000000..3442979
--- /dev/null
+++ b/fs/osdfs/file.c
@@ -0,0 +1,58 @@
+/*
+ * Copyright (C) 2005, 2006
+ * Avishay Traeger (avishay@gmail.com) (avishay@il.ibm.com)
+ * Copyright (C) 2005, 2006
+ * International Business Machines
+ *
+ * Copyrights for code taken from ext2:
+ *     Copyright (C) 1992, 1993, 1994, 1995
+ *     Remy Card (card@masi.ibp.fr)
+ *     Laboratoire MASI - Institut Blaise Pascal
+ *     Universite Pierre et Marie Curie (Paris VI)
+ *     from
+ *     linux/fs/minix/inode.c
+ *     Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ * This file is part of osdfs.
+ *
+ * osdfs is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.  Since it is based on ext2, and the only
+ * valid version of GPL for the Linux kernel is version 2, the only valid
+ * version of GPL for osdfs is version 2.
+ *
+ * osdfs is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with osdfs; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <linux/buffer_head.h>
+
+#include "osdfs.h"
+
+static int osdfs_release_file(struct inode *inode, struct file *filp)
+{
+	return 0;
+}
+
+struct file_operations osdfs_file_operations = {
+	.llseek		= generic_file_llseek,
+	.read		= do_sync_read,
+	.write		= do_sync_write,
+	.aio_read	= generic_file_aio_read,
+	.aio_write	= generic_file_aio_write,
+	.mmap		= generic_file_mmap,
+	.open		= generic_file_open,
+	.release	= osdfs_release_file,
+	.fsync		= file_fsync,
+};
+
+struct inode_operations osdfs_file_inode_operations = {
+	.truncate	= osdfs_truncate,
+	.setattr	= osdfs_setattr,
+};
diff --git a/fs/osdfs/inode.c b/fs/osdfs/inode.c
new file mode 100644
index 0000000..e009eb0
--- /dev/null
+++ b/fs/osdfs/inode.c
@@ -0,0 +1,140 @@
+/*
+ * Copyright (C) 2005, 2006
+ * Avishay Traeger (avishay@gmail.com) (avishay@il.ibm.com)
+ * Copyright (C) 2005, 2006
+ * International Business Machines
+ *
+ * Copyrights for code taken from ext2:
+ *     Copyright (C) 1992, 1993, 1994, 1995
+ *     Remy Card (card@masi.ibp.fr)
+ *     Laboratoire MASI - Institut Blaise Pascal
+ *     Universite Pierre et Marie Curie (Paris VI)
+ *     from
+ *     linux/fs/minix/inode.c
+ *     Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ * This file is part of osdfs.
+ *
+ * osdfs is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.  Since it is based on ext2, and the only
+ * valid version of GPL for the Linux kernel is version 2, the only valid
+ * version of GPL for osdfs is version 2.
+ *
+ * osdfs is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with osdfs; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <linux/writeback.h>
+#include <linux/buffer_head.h>
+
+#include "osdfs.h"
+
+/*
+ * Test whether an inode is a fast symlink.
+ */
+static inline int osdfs_inode_is_fast_symlink(struct inode *inode)
+{
+	struct osdfs_i_info *oi = OSDFS_I(inode);
+
+	return S_ISLNK(inode->i_mode) && (oi->i_data[0] != 0);
+}
+
+/*
+ * get_block_t - Fill in a buffer_head
+ * An OSD takes care of block allocation so we just fake an allocation by
+ * putting the requested block number (@iblock) in the buffer_head.
+ * TODO: What about the case of create==0 and @iblock does not exist in the
+ * object?
+ */
+int osdfs_get_block(struct inode *inode, sector_t iblock,
+		    struct buffer_head *bh_result, int create)
+{
+	map_bh(bh_result, inode->i_sb, iblock);
+	return 0;
+}
+
+/******************************************************************************
+ * INODE OPERATIONS
+ *****************************************************************************/
+
+/*
+ * Truncate a file to the specified size - all we have to do is set the size
+ * attribute.  We make sure the object exists first.
+ */
+void osdfs_truncate(struct inode *inode)
+{
+	struct osdfs_sb_info *sbi = inode->i_sb->s_fs_info;
+	struct osdfs_i_info *oi = OSDFS_I(inode);
+	struct osd_request *req = NULL;
+	loff_t isize = i_size_read(inode);
+	uint64_t newsize;
+	int ret;
+
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)
+	     || S_ISLNK(inode->i_mode)))
+		return;
+	if (osdfs_inode_is_fast_symlink(inode))
+		return;
+	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+		return;
+	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+
+	nobh_truncate_page(inode->i_mapping, isize, osdfs_get_block);
+
+	req = prepare_osd_set_attr(sbi->s_dev, sbi->s_pid,
+				 inode->i_ino + OSDFS_OBJ_OFF);
+	if (!req) {
+		printk(KERN_ERR "ERROR: prepare set_attr failed.\n");
+		goto fail;
+	}
+
+	newsize = cpu_to_be64((uint64_t) isize);
+	prepare_set_attr_list_add_entry(req, OSD_APAGE_OBJECT_INFORMATION,
+					OSD_ATTR_OI_LOGICAL_LENGTH, 8,
+					(unsigned char *)(&newsize));
+
+	/* if we are about to truncate an object, and it hasn't been
+	 * created yet, wait
+	 */
+	if (!ObjCreated(oi)) {
+		if (!Obj2BCreated(oi))
+			BUG();
+		else
+			wait_event(oi->i_wq, ObjCreated(oi));
+	}
+
+	ret = osdfs_sync_op(req, sbi->s_timeout, oi->i_cred);
+	free_osd_req(req);
+	if (ret)
+		goto fail;
+
+out:
+	mark_inode_dirty(inode);
+	return;
+fail:
+	make_bad_inode(inode);
+	goto out;
+}
+
+/*
+ * Set inode attributes - just call generic functions.
+ */
+int osdfs_setattr(struct dentry *dentry, struct iattr *iattr)
+{
+	struct inode *inode = dentry->d_inode;
+	int error;
+
+	error = inode_change_ok(inode, iattr);
+	if (error)
+		return error;
+
+	error = inode_setattr(inode, iattr);
+	return error;
+}
diff --git a/fs/osdfs/osdfs.h b/fs/osdfs/osdfs.h
index 30472b7..ea20411 100644
--- a/fs/osdfs/osdfs.h
+++ b/fs/osdfs/osdfs.h
@@ -177,4 +177,15 @@ int extract_list_from_req(struct osd_request *req,
 
 void free_osd_req(struct osd_request *req);
 
+/* inode.c               */
+void osdfs_truncate(struct inode *inode);
+int osdfs_setattr(struct dentry *, struct iattr *);
+
+/*********************
+ * operation vectors *
+ *********************/
+/* file.c            */
+extern struct inode_operations osdfs_file_inode_operations;
+extern struct file_operations osdfs_file_operations;
+
 #endif
-- 
1.6.0.1
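osdfs_truncate() above reports the new size to the OSD by setting the OSD_ATTR_OI_LOGICAL_LENGTH attribute to an 8-byte value produced by cpu_to_be64(). A minimal user-space sketch of that encoding follows. It assumes only what the patch implies, namely that the attribute value is a big-endian 64-bit integer; encode_logical_length() and decode_logical_length() are hypothetical helpers, not part of the osd library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: serialize a file size into the 8-byte,
 * big-endian form that osdfs_truncate() builds with cpu_to_be64(). */
static void encode_logical_length(uint64_t size, unsigned char out[8])
{
	int i;

	for (i = 7; i >= 0; i--) {
		out[i] = (unsigned char)(size & 0xff);	/* least significant byte last */
		size >>= 8;
	}
}

/* Hypothetical inverse, e.g. for reading the attribute back. */
static uint64_t decode_logical_length(const unsigned char in[8])
{
	uint64_t size = 0;
	int i;

	for (i = 0; i < 8; i++)
		size = (size << 8) | in[i];
	return size;
}
```

Building the value byte by byte is independent of host endianness; in the kernel, cpu_to_be64() reaches the same result with a byte swap on little-endian hosts and a no-op on big-endian ones.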



* [RFC 3/9] osdfs: symlink_inode and fast_symlink_inode operations
  2008-10-30 14:26 [RFC 0/9] osdfs Boaz Harrosh
  2008-10-30 14:30 ` [RFC 1/9] osdfs: osd Swiss army knife Boaz Harrosh
  2008-10-30 14:31 ` [RFC 2/9] osdfs: file and file_inode operations Boaz Harrosh
@ 2008-10-30 14:32 ` Boaz Harrosh
  2008-10-30 14:33 ` [RFC 4/9] osdfs: address_space_operations Boaz Harrosh
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Boaz Harrosh @ 2008-10-30 14:32 UTC
  To: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd

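Generic implementation of symlink ops. Fast symlinks keep the link
target inline in the inode's i_data; slow symlinks keep it in the
object's data and use the generic page-cache helpers.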
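The two operation vectors differ only in where the target lives: osdfs_follow_link() hands nd_set_link() the inline oi->i_data, while the slow variant reads the target through the page cache, and osdfs_inode_is_fast_symlink() tells the two apart by checking i_data[0] != 0. The user-space sketch below models that decision. The struct, the 60-byte inline capacity (borrowed from ext2, which uses 15 32-bit words) and the helper names are assumptions for illustration, not osdfs definitions, and the S_ISLNK() mode check is omitted.

```c
#include <assert.h>
#include <string.h>

/* Assumed inline capacity; the real size of the osdfs i_data array is
 * not shown in this patch. */
#define SKETCH_INLINE_BYTES 60

/* Hypothetical stand-in for the per-inode info used by symlinks. */
struct sketch_symlink {
	char i_data[SKETCH_INLINE_BYTES];
};

/* Store a short target inline (fast symlink), NUL-terminated.
 * Returns 1 for a fast symlink, 0 if the target would instead have to
 * be written to the object's data (slow symlink). */
static int sketch_set_target(struct sketch_symlink *s, const char *target)
{
	size_t len = strlen(target);

	memset(s->i_data, 0, sizeof(s->i_data));
	if (len == 0 || len >= sizeof(s->i_data))
		return 0;
	memcpy(s->i_data, target, len + 1);
	return 1;
}

/* Mirrors osdfs_inode_is_fast_symlink(): non-empty inline data. */
static int sketch_is_fast(const struct sketch_symlink *s)
{
	return s->i_data[0] != 0;
}
```

Leaving i_data zeroed for a slow symlink is what lets a single byte test distinguish the two cases.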

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/osdfs/Kbuild    |    2 +-
 fs/osdfs/osdfs.h   |    4 +++
 fs/osdfs/symlink.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 1 deletions(-)
 create mode 100644 fs/osdfs/symlink.c

diff --git a/fs/osdfs/Kbuild b/fs/osdfs/Kbuild
index c8ca4ce..eddba6a 100644
--- a/fs/osdfs/Kbuild
+++ b/fs/osdfs/Kbuild
@@ -20,5 +20,5 @@ EXTRA_CFLAGS += -I$(OSD_INC)
 # EXTRA_CFLAGS += -DCONFIG_OSDFS_DEBUG
 endif
 
-osdfs-objs := osd.o inode.o file.o
+osdfs-objs := osd.o inode.o file.o symlink.o
 obj-$(CONFIG_OSDFS_FS) += osdfs.o
diff --git a/fs/osdfs/osdfs.h b/fs/osdfs/osdfs.h
index ea20411..7610ce3 100644
--- a/fs/osdfs/osdfs.h
+++ b/fs/osdfs/osdfs.h
@@ -188,4 +188,8 @@ int osdfs_setattr(struct dentry *, struct iattr *);
 extern struct inode_operations osdfs_file_inode_operations;
 extern struct file_operations osdfs_file_operations;
 
+/* symlink.c         */
+extern struct inode_operations osdfs_symlink_inode_operations;
+extern struct inode_operations osdfs_fast_symlink_inode_operations;
+
 #endif
diff --git a/fs/osdfs/symlink.c b/fs/osdfs/symlink.c
new file mode 100644
index 0000000..7af5388
--- /dev/null
+++ b/fs/osdfs/symlink.c
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2005, 2006
+ * Avishay Traeger (avishay@gmail.com) (avishay@il.ibm.com)
+ * Copyright (C) 2005, 2006
+ * International Business Machines
+ *
+ * Copyrights for code taken from ext2:
+ *     Copyright (C) 1992, 1993, 1994, 1995
+ *     Remy Card (card@masi.ibp.fr)
+ *     Laboratoire MASI - Institut Blaise Pascal
+ *     Universite Pierre et Marie Curie (Paris VI)
+ *     from
+ *     linux/fs/minix/inode.c
+ *     Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ * This file is part of osdfs.
+ *
+ * osdfs is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.  Since it is based on ext2, and the only
+ * valid version of GPL for the Linux kernel is version 2, the only valid
+ * version of GPL for osdfs is version 2.
+ *
+ * osdfs is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with osdfs; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <linux/namei.h>
+
+#include "osdfs.h"
+
+static void *osdfs_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+	struct osdfs_i_info *oi = OSDFS_I(dentry->d_inode);
+	nd_set_link(nd, (char *)oi->i_data);
+	return NULL;
+}
+
+struct inode_operations osdfs_symlink_inode_operations = {
+	.readlink	= generic_readlink,
+	.follow_link	= page_follow_link_light,
+	.put_link	= page_put_link,
+};
+
+struct inode_operations osdfs_fast_symlink_inode_operations = {
+	.readlink	= generic_readlink,
+	.follow_link	= osdfs_follow_link,
+};
-- 
1.6.0.1



* [RFC 4/9] osdfs: address_space_operations
  2008-10-30 14:26 [RFC 0/9] osdfs Boaz Harrosh
                   ` (2 preceding siblings ...)
  2008-10-30 14:32 ` [RFC 3/9] osdfs: symlink_inode and fast_symlink_inode operations Boaz Harrosh
@ 2008-10-30 14:33 ` Boaz Harrosh
  2008-10-30 14:34 ` [RFC 5/9] osdfs: dir_inode and directory operations Boaz Harrosh
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Boaz Harrosh @ 2008-10-30 14:33 UTC
  To: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd

OK, now we start reading from and writing to osd-objects, page by page.
The page index, shifted by PAGE_CACHE_SHIFT, is the byte offset within
the object.
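Both osdfs_writepage() and readpage_filler() derive the I/O range from page->index and i_size: a full page before end_index, a partial length of i_size & (PAGE_CACHE_SIZE - 1) for the page that straddles i_size, and nothing beyond it. A user-space sketch of that arithmetic follows, assuming 4096-byte pages; SKETCH_PAGE_SHIFT and page_io_len() are hypothetical names. Note the widening cast before the shift, which keeps offsets above 4 GiB correct where unsigned long is 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12			/* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE (1UL << SKETCH_PAGE_SHIFT)

/* Compute the object byte offset and I/O length for page @index of a
 * file of @i_size bytes. Returns 0 when the page lies wholly beyond
 * i_size. If @page is non-NULL and the page straddles i_size, the bytes
 * past end-of-file are zeroed, as osdfs_writepage() does. */
static uint64_t page_io_len(unsigned long index, uint64_t i_size,
			    uint64_t *offset, unsigned char *page)
{
	unsigned long end_index = (unsigned long)(i_size >> SKETCH_PAGE_SHIFT);
	uint64_t len;

	/* Widen before shifting so large offsets do not overflow. */
	*offset = (uint64_t)index << SKETCH_PAGE_SHIFT;

	if (index < end_index)
		return SKETCH_PAGE_SIZE;	/* fully inside the file */

	len = i_size & (SKETCH_PAGE_SIZE - 1);
	if (index > end_index || len == 0)
		return 0;			/* beyond end-of-file */

	if (page)				/* straddles i_size */
		memset(page + len, 0, SKETCH_PAGE_SIZE - len);
	return len;
}
```

For a 10000-byte file, page 2 starts at offset 8192 and carries 1808 bytes, while page 3 carries none.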

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/osdfs/inode.c |  284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/osdfs/osdfs.h |    3 +
 2 files changed, 287 insertions(+), 0 deletions(-)

diff --git a/fs/osdfs/inode.c b/fs/osdfs/inode.c
index e009eb0..bfd82b1 100644
--- a/fs/osdfs/inode.c
+++ b/fs/osdfs/inode.c
@@ -60,6 +60,290 @@ int osdfs_get_block(struct inode *inode, sector_t iblock,
 	return 0;
 }
 
+/*
+ * Callback function when writepage finishes.  Check for errors, unlock, clean
+ * up, etc.
+ */
+void writepage_done(struct osd_request *req, void *p)
+{
+	int ret;
+	struct page *page = (struct page *)p;
+	struct inode *inode = page->mapping->host;
+	struct osdfs_sb_info *sbi = inode->i_sb->s_fs_info;
+
+	ret = check_ok(req);
+	free_osd_req(req);
+	atomic_dec(&sbi->s_curr_pending);
+
+	if (ret) {
+		if (ret == -ENOSPC)
+			set_bit(AS_ENOSPC, &page->mapping->flags);
+		else
+			set_bit(AS_EIO, &page->mapping->flags);
+
+		SetPageError(page);
+	}
+
+	end_page_writeback(page);
+	unlock_page(page);
+}
+
+/*
+ * Write a page to disk.  page->index gives us the page number.  The page is
+ * locked before this function is called.  We write asynchronously and then the
+ * callback function (writepage_done) is called.  We signify that the operation
+ * has completed by unlocking the page and calling end_page_writeback().
+ */
+static int osdfs_writepage(struct page *page, struct writeback_control *wbc)
+{
+	struct inode *inode = page->mapping->host;
+	struct osdfs_i_info *oi = OSDFS_I(inode);
+	loff_t i_size = i_size_read(inode);
+	unsigned long end_index = i_size >> PAGE_CACHE_SHIFT;
+	unsigned offset = 0;
+	struct osd_request *req = NULL;
+	struct osdfs_sb_info *sbi;
+	uint64_t start;
+	uint64_t len = PAGE_CACHE_SIZE;
+	unsigned char *kaddr;
+	int ret = 0;
+
+	if (!PageLocked(page))
+		BUG();
+
+	/* If the object has not been created yet and we are not in sync mode,
+	 * redirty the page and return; otherwise, wait for the creation. */
+	if (!ObjCreated(oi)) {
+		if (!Obj2BCreated(oi))
+			BUG();
+
+		if (wbc->sync_mode == WB_SYNC_NONE) {
+			redirty_page_for_writepage(wbc, page);
+			unlock_page(page);
+			ret = 0;
+			goto out;
+		} else {
+			wait_event(oi->i_wq, ObjCreated(oi));
+		}
+	}
+
+	/* in this case, the page is within the limits of the file */
+	if (page->index < end_index)
+		goto do_it;
+
+	offset = i_size & (PAGE_CACHE_SIZE - 1);
+	len = offset;
+
+	/*in this case, the page is outside the limits (truncate in progress)*/
+	if (page->index >= end_index + 1 || !offset) {
+		unlock_page(page);
+		goto out;
+	}
+
+	/* otherwise, the page straddles i_size.  It must be zeroed out. */
+	kaddr = kmap_atomic(page, KM_USER0);
+	memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
+	flush_dcache_page(page);
+	kunmap_atomic(page, KM_USER0);
+
+do_it:
+	BUG_ON(PageWriteback(page));
+	set_page_writeback(page);
+	start = (uint64_t)page->index << PAGE_CACHE_SHIFT;
+	sbi = inode->i_sb->s_fs_info;
+
+	kaddr = page_address(page);
+
+	req = prepare_osd_write(sbi->s_dev, sbi->s_pid,
+			      inode->i_ino + OSDFS_OBJ_OFF, len, start, 0,
+			      kaddr);
+	if (!req) {
+		printk(KERN_ERR "ERROR: writepage failed.\n");
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	ret = osdfs_async_op(req, writepage_done, (void *)page, oi->i_cred);
+	if (ret) {
+		free_osd_req(req);
+		goto fail;
+	}
+	atomic_inc(&sbi->s_curr_pending);
+out:
+	return ret;
+fail:
+	set_bit(AS_EIO, &page->mapping->flags);
+	end_page_writeback(page);
+	unlock_page(page);
+	goto out;
+}
+
+/*
+ * Callback for readpage
+ */
+void readpage_done(struct osd_request *req, void *p)
+{
+	struct page *page = (struct page *)p;
+	struct inode *inode = page->mapping->host;
+	struct osdfs_sb_info *sbi = inode->i_sb->s_fs_info;
+	char *kaddr;
+	int ret;
+
+	ret = check_ok(req);
+	free_osd_req(req);
+	atomic_dec(&sbi->s_curr_pending);
+
+	if (ret == -EFAULT) {
+
+		/* In this case we were trying to read something that wasn't on
+		 * disk yet - return a page full of zeroes.  This should be OK,
+		 * because the object should be empty (if there was a write
+		 * before this read, the read would be waiting with the page
+		 * locked). */
+		kaddr = page_address(page);
+		memset(kaddr, 0, PAGE_CACHE_SIZE);
+
+		SetPageUptodate(page);
+		if (PageError(page))
+			ClearPageError(page);
+	} else if (ret == 0) {
+
+		/* Everything is OK */
+		SetPageUptodate(page);
+		if (PageError(page))
+			ClearPageError(page);
+	} else {
+
+		/* Error */
+		SetPageError(page);
+	}
+
+	unlock_page(page);
+}
+
+/*
+ * Read a page from the OSD
+ */
+static int readpage_filler(struct page *page)
+{
+	struct osd_request *req = NULL;
+	struct inode *inode = page->mapping->host;
+	struct osdfs_i_info *oi = OSDFS_I(inode);
+	ino_t ino = inode->i_ino;
+	loff_t i_size = i_size_read(inode);
+	unsigned long end_index = i_size >> PAGE_CACHE_SHIFT;
+	struct super_block *sb = inode->i_sb;
+	struct osdfs_sb_info *sbi = sb->s_fs_info;
+	uint64_t amount;
+	unsigned char *kaddr;
+	int ret = 0;
+
+	if (!PageLocked(page))
+		BUG();
+
+	if (PageUptodate(page)) {
+		unlock_page(page);
+		goto out;
+	}
+
+	/* we are before the last page */
+	if (page->index < end_index) {
+		amount = PAGE_CACHE_SIZE;
+		goto do_it;
+	}
+
+	amount = i_size & (PAGE_CACHE_SIZE - 1);
+
+	/* this will be out of bounds, or doesn't exist yet */
+	if ((page->index >= end_index + 1 || !amount) || (!ObjCreated(oi))) {
+		kaddr = kmap_atomic(page, KM_USER0);
+		memset(kaddr, 0, PAGE_CACHE_SIZE);
+		flush_dcache_page(page);
+		kunmap_atomic(page, KM_USER0);
+		SetPageUptodate(page);
+		if (PageError(page))
+			ClearPageError(page);
+		unlock_page(page);
+		goto out;
+	}
+
+do_it:
+	kaddr = page_address(page);
+
+	req = prepare_osd_read(sbi->s_dev, sbi->s_pid, ino + OSDFS_OBJ_OFF,
+			amount, (uint64_t)page->index << PAGE_CACHE_SHIFT, 0,
+			kaddr);
+	if (!req) {
+		printk(KERN_ERR "ERROR: readpage failed.\n");
+		ret = -ENOMEM;
+		unlock_page(page);
+		goto out;
+	}
+
+	ret = osdfs_async_op(req, readpage_done, (void *)page, oi->i_cred);
+	if (ret) {
+		free_osd_req(req);
+		unlock_page(page);
+		goto out;
+	}
+	atomic_inc(&sbi->s_curr_pending);
+out:
+	return ret;
+}
+
+/*
+ * The struct file argument is unused; just fill the page.
+ */
+static int osdfs_readpage(struct file *file, struct page *page)
+{
+	return readpage_filler(page);
+}
+
+/*
+ * The read_cache_pages() data argument is unused; just fill the page.
+ */
+static int readpage_strip(void *data, struct page *page)
+{
+	return readpage_filler(page);
+}
+
+/*
+ * read a bunch of pages - usually for readahead
+ */
+static int osdfs_readpages(struct file *file, struct address_space *mapping,
+			   struct list_head *pages, unsigned nr_pages)
+{
+	return read_cache_pages(mapping, pages, readpage_strip, NULL);
+}
+
+/* This was borrowed from fs/libfs.c, where it used to be exported but no
+ * longer is. FIXME: Is this at all right?
+ */
+static int osdfs_simple_commit_write(struct file *file, struct page *page,
+			       unsigned from, unsigned to)
+{
+	struct inode *inode = page->mapping->host;
+	loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+
+	if (!PageUptodate(page))
+		SetPageUptodate(page);
+	/*
+	 * No need to use i_size_read() here, the i_size
+	 * cannot change under us because we hold the i_mutex.
+	 */
+	if (pos > inode->i_size)
+		i_size_write(inode, pos);
+	set_page_dirty(page);
+	return 0;
+}
+
+struct address_space_operations osdfs_aops = {
+	.readpage		= osdfs_readpage,
+	.readpages		= osdfs_readpages,
+	.writepage		= osdfs_writepage,
+	.prepare_write		= simple_prepare_write,
+	.commit_write		= osdfs_simple_commit_write,
+	.writepages		= generic_writepages,
+};
+
 /******************************************************************************
  * INODE OPERATIONS
  *****************************************************************************/
diff --git a/fs/osdfs/osdfs.h b/fs/osdfs/osdfs.h
index 7610ce3..29e7d7b 100644
--- a/fs/osdfs/osdfs.h
+++ b/fs/osdfs/osdfs.h
@@ -188,6 +188,9 @@ int osdfs_setattr(struct dentry *, struct iattr *);
 extern struct inode_operations osdfs_file_inode_operations;
 extern struct file_operations osdfs_file_operations;
 
+/* inode.c           */
+extern struct address_space_operations osdfs_aops;
+
 /* symlink.c         */
 extern struct inode_operations osdfs_symlink_inode_operations;
 extern struct inode_operations osdfs_fast_symlink_inode_operations;
-- 
1.6.0.1
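In osdfs_writepage() and readpage_filler() the s_curr_pending counter is incremented only after osdfs_async_op() has issued the request, while writepage_done() and readpage_done() decrement it. If a completion could run before that increment (the patch does not show whether osdfs_async_op() rules this out), the counter would briefly go negative. The sketch below shows the increment-before-submit ordering with C11 atomics; struct sketch_sbi, sketch_async_op() and the other names are hypothetical, and the fake async op completes immediately to model the worst case.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the superblock's in-flight request counter. */
struct sketch_sbi {
	atomic_int s_curr_pending;
	int min_seen;			/* lowest value the callback observed */
};

typedef void (*sketch_done_fn)(struct sketch_sbi *sbi);

/* Completion callback: one request fewer in flight. */
static void sketch_done(struct sketch_sbi *sbi)
{
	int now = atomic_fetch_sub(&sbi->s_curr_pending, 1) - 1;

	if (now < sbi->min_seen)
		sbi->min_seen = now;
}

/* Stand-in for an async submit whose completion fires before it
 * returns, the worst case for a counter bumped after submission. */
static int sketch_async_op(struct sketch_sbi *sbi, sketch_done_fn done)
{
	done(sbi);
	return 0;
}

/* Count the request before submitting it, so the callback can never
 * see the counter drop below zero. */
static int sketch_submit(struct sketch_sbi *sbi)
{
	int ret;

	atomic_fetch_add(&sbi->s_curr_pending, 1);
	ret = sketch_async_op(sbi, sketch_done);
	if (ret)	/* submission failed: the callback will not run */
		atomic_fetch_sub(&sbi->s_curr_pending, 1);
	return ret;
}
```

Swapping the order in sketch_submit() (submit first, then increment) would make min_seen drop to -1 in this model.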



* [RFC 5/9] osdfs: dir_inode and directory operations
  2008-10-30 14:26 [RFC 0/9] osdfs Boaz Harrosh
                   ` (3 preceding siblings ...)
  2008-10-30 14:33 ` [RFC 4/9] osdfs: address_space_operations Boaz Harrosh
@ 2008-10-30 14:34 ` Boaz Harrosh
  2008-10-30 14:35 ` [RFC 6/9] osdfs: super_operations and file_system_type Boaz Harrosh
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Boaz Harrosh @ 2008-10-30 14:34 UTC
  To: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd

Implementation of directory and inode operations.

* A directory is treated as a file, and essentially contains a list
  of <file name, inode #> pairs for files that are found in that
  directory. The object IDs correspond to the files' inode numbers
  (offset by OSDFS_OBJ_OFF) and are allocated using a 64-bit
  incrementing global counter.
* Each file's control block (AKA on-disk inode) is stored in its
  object's attributes. This applies to both regular files and other
  types (directories, device files, symlinks, etc.).
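The directory format follows ext2: each page is split into chunks of s_blocksize bytes, entries are variable-length records chained by rec_len, a deleted entry has inode == 0, and osdfs_check_page() rejects records that are too short, not 4-byte aligned, too small for their name, or crossing a chunk boundary. The sketch below builds and walks one such chunk in user space. The on-disk layout of struct osdfs_dir_entry and the OSDFS_DIR_REC_LEN() formula are not shown in these patches, so the ext2-style 8-byte header and 4-byte rounding used here are assumptions, and every name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed ext2-style record header; the real osdfs_dir_entry may differ. */
struct sketch_dirent {
	uint32_t inode;		/* 0 marks an unused/deleted record */
	uint16_t rec_len;	/* distance to the next record */
	uint8_t name_len;
	uint8_t file_type;
	char name[];		/* not NUL-terminated */
};

/* Assumed record length: 8-byte header plus name, rounded up to 4. */
#define SKETCH_REC_LEN(n) (((n) + 8 + 3) & ~3u)

/* The same per-record checks as osdfs_check_page(). */
static int sketch_rec_ok(const struct sketch_dirent *de, unsigned offs,
			 unsigned chunk_size)
{
	unsigned rec_len = de->rec_len;

	if (rec_len < SKETCH_REC_LEN(1) || (rec_len & 3))
		return 0;
	if (rec_len < SKETCH_REC_LEN(de->name_len))
		return 0;
	/* First and last byte must lie in the same chunk. */
	if (((offs + rec_len - 1) ^ offs) & ~(chunk_size - 1))
		return 0;
	return 1;
}

/* Write one record at @offs. */
static void sketch_put(unsigned char *chunk, unsigned offs, uint32_t ino,
		       const char *name, uint16_t rec_len)
{
	struct sketch_dirent *de = (struct sketch_dirent *)(chunk + offs);

	de->inode = ino;
	de->rec_len = rec_len;
	de->name_len = (uint8_t)strlen(name);
	de->file_type = 0;
	memcpy(de->name, name, de->name_len);
}

/* Build a chunk holding "." and "..", the last record spanning the
 * rest of the chunk, as osdfs_make_empty() does. Caller frees. */
static unsigned char *sketch_make_empty(unsigned chunk_size,
					uint32_t self, uint32_t parent)
{
	unsigned char *chunk = calloc(1, chunk_size);

	if (!chunk)
		return NULL;
	sketch_put(chunk, 0, self, ".", SKETCH_REC_LEN(1));
	sketch_put(chunk, SKETCH_REC_LEN(1), parent, "..",
		   chunk_size - SKETCH_REC_LEN(1));
	return chunk;
}

/* Look up @name in one chunk; returns its inode number or 0. */
static uint32_t sketch_lookup(unsigned char *chunk, unsigned chunk_size,
			      const char *name)
{
	size_t len = strlen(name);
	unsigned offs = 0;

	while (offs + SKETCH_REC_LEN(1) <= chunk_size) {
		struct sketch_dirent *de = (struct sketch_dirent *)(chunk + offs);

		if (!sketch_rec_ok(de, offs, chunk_size))
			return 0;	/* corrupt record (incl. rec_len 0): stop */
		if (de->inode && de->name_len == len &&
		    !memcmp(de->name, name, len))
			return de->inode;
		offs += de->rec_len;
	}
	return 0;
}
```

Adding an entry, as osdfs_add_link() does, amounts to shrinking the record whose rec_len has slack and writing the new record into the freed tail.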

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/osdfs/Kbuild  |    2 +-
 fs/osdfs/dir.c   |  629 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/osdfs/inode.c |  267 +++++++++++++++++++++++
 fs/osdfs/namei.c |  348 ++++++++++++++++++++++++++++++
 fs/osdfs/osdfs.h |   26 +++
 5 files changed, 1271 insertions(+), 1 deletions(-)
 create mode 100644 fs/osdfs/dir.c
 create mode 100644 fs/osdfs/namei.c

diff --git a/fs/osdfs/Kbuild b/fs/osdfs/Kbuild
index eddba6a..d6ac8d6 100644
--- a/fs/osdfs/Kbuild
+++ b/fs/osdfs/Kbuild
@@ -20,5 +20,5 @@ EXTRA_CFLAGS += -I$(OSD_INC)
 # EXTRA_CFLAGS += -DCONFIG_OSDFS_DEBUG
 endif
 
-osdfs-objs := osd.o inode.o file.o symlink.o
+osdfs-objs := osd.o inode.o file.o symlink.o namei.o dir.o
 obj-$(CONFIG_OSDFS_FS) += osdfs.o
diff --git a/fs/osdfs/dir.c b/fs/osdfs/dir.c
new file mode 100644
index 0000000..ba28fd6
--- /dev/null
+++ b/fs/osdfs/dir.c
@@ -0,0 +1,629 @@
+/*
+ * Copyright (C) 2005, 2006
+ * Avishay Traeger (avishay@gmail.com) (avishay@il.ibm.com)
+ * Copyright (C) 2005, 2006
+ * International Business Machines
+ *
+ * Copyrights for code taken from ext2:
+ *     Copyright (C) 1992, 1993, 1994, 1995
+ *     Remy Card (card@masi.ibp.fr)
+ *     Laboratoire MASI - Institut Blaise Pascal
+ *     Universite Pierre et Marie Curie (Paris VI)
+ *     from
+ *     linux/fs/minix/inode.c
+ *     Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ * This file is part of osdfs.
+ *
+ * osdfs is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.  Since it is based on ext2, and the only
+ * valid version of GPL for the Linux kernel is version 2, the only valid
+ * version of GPL for osdfs is version 2.
+ *
+ * osdfs is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with osdfs; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <linux/pagemap.h>
+#include <linux/smp_lock.h>
+#include "osdfs.h"
+
+static inline unsigned osdfs_chunk_size(struct inode *inode)
+{
+	return inode->i_sb->s_blocksize;
+}
+
+static inline void osdfs_put_page(struct page *page)
+{
+	kunmap(page);
+	page_cache_release(page);
+}
+
+static inline unsigned long dir_pages(struct inode *inode)
+{
+	return (inode->i_size+PAGE_CACHE_SIZE-1)>>PAGE_CACHE_SHIFT;
+}
+
+static unsigned osdfs_last_byte(struct inode *inode, unsigned long page_nr)
+{
+	unsigned last_byte = inode->i_size;
+
+	last_byte -= page_nr << PAGE_CACHE_SHIFT;
+	if (last_byte > PAGE_CACHE_SIZE)
+		last_byte = PAGE_CACHE_SIZE;
+	return last_byte;
+}
+
+static int osdfs_commit_chunk(struct page *page, unsigned from, unsigned to)
+{
+	struct inode *dir = page->mapping->host;
+	int err = 0;
+	dir->i_version++;
+	page->mapping->a_ops->commit_write(NULL, page, from, to);
+	if (IS_DIRSYNC(dir))
+		err = write_one_page(page, 1);
+	else
+		unlock_page(page);
+	return err;
+}
+
+static void osdfs_check_page(struct page *page)
+{
+	struct inode *dir = page->mapping->host;
+	unsigned chunk_size = osdfs_chunk_size(dir);
+	char *kaddr = page_address(page);
+	unsigned offs, rec_len;
+	unsigned limit = PAGE_CACHE_SIZE;
+	struct osdfs_dir_entry *p;
+	char *error;
+
+	/* if the page is the last one in the directory */
+	if ((dir->i_size >> PAGE_CACHE_SHIFT) == page->index) {
+		limit = dir->i_size & ~PAGE_CACHE_MASK;
+		if (limit & (chunk_size - 1))
+			goto Ebadsize;
+		if (!limit)
+			goto out;
+	}
+	for (offs = 0; offs <= limit - OSDFS_DIR_REC_LEN(1); offs += rec_len) {
+		p = (struct osdfs_dir_entry *)(kaddr + offs);
+		rec_len = p->rec_len;
+
+		if (rec_len < OSDFS_DIR_REC_LEN(1))
+			goto Eshort;
+		if (rec_len & 3)
+			goto Ealign;
+		if (rec_len < OSDFS_DIR_REC_LEN(p->name_len))
+			goto Enamelen;
+		if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1))
+			goto Espan;
+	}
+	if (offs != limit)
+		goto Eend;
+out:
+	SetPageChecked(page);
+	return;
+
+Ebadsize:
+	printk(KERN_ERR "ERROR [osdfs_check_page]: "
+		"size of directory #%lu is not a multiple of chunk size\n",
+		dir->i_ino
+	);
+	goto fail;
+Eshort:
+	error = "rec_len is smaller than minimal";
+	goto bad_entry;
+Ealign:
+	error = "unaligned directory entry";
+	goto bad_entry;
+Enamelen:
+	error = "rec_len is too small for name_len";
+	goto bad_entry;
+Espan:
+	error = "directory entry across blocks";
+	goto bad_entry;
+bad_entry:
+	printk(KERN_ERR
+		"ERROR [osdfs_check_page]: bad entry in directory #%lu: %s - "
+		"offset=%lu, inode=%lu, rec_len=%d, name_len=%d\n",
+		dir->i_ino, error, (page->index<<PAGE_CACHE_SHIFT)+offs,
+		(unsigned long) le32_to_cpu(p->inode),
+		rec_len, p->name_len);
+	goto fail;
+Eend:
+	p = (struct osdfs_dir_entry *)(kaddr + offs);
+	printk(KERN_ERR "ERROR [osdfs_check_page]: "
+		"entry in directory #%lu spans the page boundary - "
+		"offset=%lu, inode=%lu\n",
+		dir->i_ino, (page->index<<PAGE_CACHE_SHIFT)+offs,
+		(unsigned long) le32_to_cpu(p->inode));
+fail:
+	SetPageChecked(page);
+	SetPageError(page);
+}
+
+static struct page *osdfs_get_page(struct inode *dir, unsigned long n)
+{
+	struct address_space *mapping = dir->i_mapping;
+	struct page *page = read_cache_page(mapping, n,
+				(filler_t *)mapping->a_ops->readpage, NULL);
+	if (!IS_ERR(page)) {
+		wait_on_page_locked(page);
+		kmap(page);
+		if (!PageUptodate(page))
+			goto fail;
+		if (!PageChecked(page))
+			osdfs_check_page(page);
+		if (PageError(page))
+			goto fail;
+	}
+	return page;
+
+fail:
+	osdfs_put_page(page);
+	return ERR_PTR(-EIO);
+}
+
+static inline int osdfs_match(int len, const unsigned char *name,
+					struct osdfs_dir_entry *de)
+{
+	if (len != de->name_len)
+		return 0;
+	if (!de->inode)
+		return 0;
+	return !memcmp(name, de->name, len);
+}
+
+static inline
+struct osdfs_dir_entry *osdfs_next_entry(struct osdfs_dir_entry *p)
+{
+	return (struct osdfs_dir_entry *)((char *)p + p->rec_len);
+}
+
+static inline unsigned
+osdfs_validate_entry(char *base, unsigned offset, unsigned mask)
+{
+	struct osdfs_dir_entry *de = (struct osdfs_dir_entry *)(base + offset);
+	struct osdfs_dir_entry *p =
+			(struct osdfs_dir_entry *)(base + (offset&mask));
+	while ((char *)p < (char *)de) {
+		if (p->rec_len == 0)
+			break;
+		p = osdfs_next_entry(p);
+	}
+	return (char *)p - base;
+}
+
+static unsigned char osdfs_filetype_table[OSDFS_FT_MAX] = {
+	[OSDFS_FT_UNKNOWN]	= DT_UNKNOWN,
+	[OSDFS_FT_REG_FILE]	= DT_REG,
+	[OSDFS_FT_DIR]		= DT_DIR,
+	[OSDFS_FT_CHRDEV]	= DT_CHR,
+	[OSDFS_FT_BLKDEV]	= DT_BLK,
+	[OSDFS_FT_FIFO]		= DT_FIFO,
+	[OSDFS_FT_SOCK]		= DT_SOCK,
+	[OSDFS_FT_SYMLINK]	= DT_LNK,
+};
+
+#define S_SHIFT 12
+static unsigned char osdfs_type_by_mode[S_IFMT >> S_SHIFT] = {
+	[S_IFREG >> S_SHIFT]	= OSDFS_FT_REG_FILE,
+	[S_IFDIR >> S_SHIFT]	= OSDFS_FT_DIR,
+	[S_IFCHR >> S_SHIFT]	= OSDFS_FT_CHRDEV,
+	[S_IFBLK >> S_SHIFT]	= OSDFS_FT_BLKDEV,
+	[S_IFIFO >> S_SHIFT]	= OSDFS_FT_FIFO,
+	[S_IFSOCK >> S_SHIFT]	= OSDFS_FT_SOCK,
+	[S_IFLNK >> S_SHIFT]	= OSDFS_FT_SYMLINK,
+};
+
+static inline
+void osdfs_set_de_type(struct osdfs_dir_entry *de, struct inode *inode)
+{
+	mode_t mode = inode->i_mode;
+	de->file_type = osdfs_type_by_mode[(mode & S_IFMT)>>S_SHIFT];
+}
+
+static int
+osdfs_readdir(struct file *filp, void *dirent, filldir_t filldir)
+{
+	loff_t pos = filp->f_pos;
+	struct inode *inode = filp->f_dentry->d_inode;
+	unsigned int offset = pos & ~PAGE_CACHE_MASK;
+	unsigned long n = pos >> PAGE_CACHE_SHIFT;
+	unsigned long npages = dir_pages(inode);
+	unsigned chunk_mask = ~(osdfs_chunk_size(inode)-1);
+	unsigned char *types = NULL;
+	int need_revalidate = (filp->f_version != inode->i_version);
+	int ret;
+
+	if (pos > inode->i_size - OSDFS_DIR_REC_LEN(1))
+		goto success;
+
+	types = osdfs_filetype_table;
+
+	for ( ; n < npages; n++, offset = 0) {
+		char *kaddr, *limit;
+		struct osdfs_dir_entry *de;
+		struct page *page = osdfs_get_page(inode, n);
+
+		if (IS_ERR(page)) {
+			printk(KERN_ERR "ERROR: "
+				   "bad page in #%lu\n",
+				   inode->i_ino);
+			filp->f_pos += PAGE_CACHE_SIZE - offset;
+			ret = -EIO;
+			goto done;
+		}
+		kaddr = page_address(page);
+		if (need_revalidate) {
+			offset = osdfs_validate_entry(kaddr, offset, chunk_mask);
+			need_revalidate = 0;
+		}
+		de = (struct osdfs_dir_entry *)(kaddr+offset);
+		limit = kaddr + osdfs_last_byte(inode, n) - OSDFS_DIR_REC_LEN(1);
+		for (; (char *)de <= limit; de = osdfs_next_entry(de)) {
+			if (de->rec_len == 0) {
+				printk(KERN_ERR "ERROR: "
+					"zero-length directory entry\n");
+				ret = -EIO;
+				osdfs_put_page(page);
+				goto done;
+			}
+			if (de->inode) {
+				int over;
+				unsigned char d_type = DT_UNKNOWN;
+
+				if (types && de->file_type < OSDFS_FT_MAX)
+					d_type = types[de->file_type];
+
+				offset = (char *)de - kaddr;
+				over = filldir(dirent, de->name, de->name_len,
+						(n<<PAGE_CACHE_SHIFT) | offset,
+						de->inode, d_type);
+				if (over) {
+					osdfs_put_page(page);
+					goto success;
+				}
+			}
+			filp->f_pos += de->rec_len;
+		}
+		osdfs_put_page(page);
+	}
+
+success:
+	ret = 0;
+done:
+	filp->f_version = inode->i_version;
+	return ret;
+}
+
+struct osdfs_dir_entry *osdfs_find_entry(struct inode *dir,
+			struct dentry *dentry, struct page **res_page)
+{
+	const unsigned char *name = dentry->d_name.name;
+	int namelen = dentry->d_name.len;
+	unsigned reclen = OSDFS_DIR_REC_LEN(namelen);
+	unsigned long start, n;
+	unsigned long npages = dir_pages(dir);
+	struct page *page = NULL;
+	struct osdfs_i_info *oi = OSDFS_I(dir);
+	struct osdfs_dir_entry *de;
+
+	if (npages == 0)
+		goto out;
+
+	*res_page = NULL;
+
+	start = oi->i_dir_start_lookup;
+	if (start >= npages)
+		start = 0;
+	n = start;
+	do {
+		char *kaddr;
+		page = osdfs_get_page(dir, n);
+		if (!IS_ERR(page)) {
+			kaddr = page_address(page);
+			de = (struct osdfs_dir_entry *) kaddr;
+			kaddr += osdfs_last_byte(dir, n) - reclen;
+			while ((char *) de <= kaddr) {
+				if (de->rec_len == 0) {
+					printk(KERN_ERR
+						"ERROR: osdfs_find_entry: "
+						"zero-length directory entry\n");
+					osdfs_put_page(page);
+					goto out;
+				}
+				if (osdfs_match(namelen, name, de))
+					goto found;
+				de = osdfs_next_entry(de);
+			}
+			osdfs_put_page(page);
+		}
+		if (++n >= npages)
+			n = 0;
+	} while (n != start);
+out:
+	return NULL;
+
+found:
+	*res_page = page;
+	oi->i_dir_start_lookup = n;
+	return de;
+}
+
+struct osdfs_dir_entry *osdfs_dotdot(struct inode *dir, struct page **p)
+{
+	struct page *page = osdfs_get_page(dir, 0);
+	struct osdfs_dir_entry *de = NULL;
+
+	if (!IS_ERR(page)) {
+		de = osdfs_next_entry(
+				(struct osdfs_dir_entry *)page_address(page));
+		*p = page;
+	}
+	return de;
+}
+
+ino_t osdfs_inode_by_name(struct inode *dir, struct dentry *dentry)
+{
+	ino_t res = 0;
+	struct osdfs_dir_entry *de;
+	struct page *page;
+
+	de = osdfs_find_entry(dir, dentry, &page);
+	if (de) {
+		res = de->inode;
+		kunmap(page);
+		page_cache_release(page);
+	}
+	return res;
+}
+
+void osdfs_set_link(struct inode *dir, struct osdfs_dir_entry *de,
+			struct page *page, struct inode *inode)
+{
+	unsigned from = (char *) de - (char *) page_address(page);
+	unsigned to = from + de->rec_len;
+	int err;
+
+	lock_page(page);
+	err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+	if (err)
+		BUG();
+	de->inode = inode->i_ino;
+	osdfs_set_de_type(de, inode);
+	err = osdfs_commit_chunk(page, from, to);
+	osdfs_put_page(page);
+	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+	mark_inode_dirty(dir);
+}
+
+int osdfs_add_link(struct dentry *dentry, struct inode *inode)
+{
+	struct inode *dir = dentry->d_parent->d_inode;
+	const unsigned char *name = dentry->d_name.name;
+	int namelen = dentry->d_name.len;
+	unsigned chunk_size = osdfs_chunk_size(dir);
+	unsigned reclen = OSDFS_DIR_REC_LEN(namelen);
+	unsigned short rec_len, name_len;
+	struct page *page = NULL;
+	struct osdfs_sb_info *sbi = inode->i_sb->s_fs_info;
+	struct osdfs_dir_entry *de;
+	unsigned long npages = dir_pages(dir);
+	unsigned long n;
+	char *kaddr;
+	unsigned from, to;
+	int err;
+
+	for (n = 0; n <= npages; n++) {
+		char *dir_end;
+
+		page = osdfs_get_page(dir, n);
+		err = PTR_ERR(page);
+		if (IS_ERR(page))
+			goto out;
+		lock_page(page);
+		kaddr = page_address(page);
+		dir_end = kaddr + osdfs_last_byte(dir, n);
+		de = (struct osdfs_dir_entry *)kaddr;
+		kaddr += PAGE_CACHE_SIZE - reclen;
+		while ((char *)de <= kaddr) {
+			if ((char *)de == dir_end) {
+				name_len = 0;
+				rec_len = chunk_size;
+				de->rec_len = chunk_size;
+				de->inode = 0;
+				goto got_it;
+			}
+			if (de->rec_len == 0) {
+				printk(KERN_ERR "ERROR: osdfs_add_link: "
+					"zero-length directory entry\n");
+				err = -EIO;
+				goto out_unlock;
+			}
+			err = -EEXIST;
+			if (osdfs_match(namelen, name, de))
+				goto out_unlock;
+			name_len = OSDFS_DIR_REC_LEN(de->name_len);
+			rec_len = de->rec_len;
+			if (!de->inode && rec_len >= reclen)
+				goto got_it;
+			if (rec_len >= name_len + reclen)
+				goto got_it;
+			de = (struct osdfs_dir_entry *) ((char *) de + rec_len);
+		}
+		unlock_page(page);
+		osdfs_put_page(page);
+	}
+	BUG();
+	return -EINVAL;
+
+got_it:
+	from = (char *)de - (char *)page_address(page);
+	to = from + rec_len;
+	err = page->mapping->a_ops->prepare_write(NULL, page, from, to);
+	if (err)
+		goto out_unlock;
+	if (de->inode) {
+		struct osdfs_dir_entry *de1 =
+			(struct osdfs_dir_entry *)((char *)de + name_len);
+		de1->rec_len = rec_len - name_len;
+		de->rec_len = name_len;
+		de = de1;
+	}
+	de->name_len = namelen;
+	memcpy(de->name, name, namelen);
+	de->inode = inode->i_ino;
+	osdfs_set_de_type(de, inode);
+	err = osdfs_commit_chunk(page, from, to);
+	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
+	mark_inode_dirty(dir);
+	sbi->s_numfiles++;
+
+out_put:
+	osdfs_put_page(page);
+out:
+	return err;
+out_unlock:
+	unlock_page(page);
+	goto out_put;
+}
+
+int osdfs_delete_entry(struct osdfs_dir_entry *dir, struct page *page)
+{
+	struct address_space *mapping = page->mapping;
+	struct inode *inode = mapping->host;
+	struct osdfs_sb_info *sbi = inode->i_sb->s_fs_info;
+	char *kaddr = page_address(page);
+	unsigned from = ((char *)dir - kaddr) & ~(osdfs_chunk_size(inode)-1);
+	unsigned to = ((char *)dir - kaddr) + dir->rec_len;
+	struct osdfs_dir_entry *pde = NULL;
+	struct osdfs_dir_entry *de = (struct osdfs_dir_entry *) (kaddr + from);
+	int err;
+
+	while ((char *)de < (char *)dir) {
+		if (de->rec_len == 0) {
+			printk(KERN_ERR "ERROR: osdfs_delete_entry: "
+				"zero-length directory entry\n");
+			err = -EIO;
+			goto out;
+		}
+		pde = de;
+		de = osdfs_next_entry(de);
+	}
+	if (pde)
+		from = (char *)pde - (char *)page_address(page);
+	lock_page(page);
+	err = mapping->a_ops->prepare_write(NULL, page, from, to);
+	if (err)
+		BUG();
+	if (pde)
+		pde->rec_len = cpu_to_le16(to-from);
+	dir->inode = 0;
+	err = osdfs_commit_chunk(page, from, to);
+	inode->i_ctime = inode->i_mtime = CURRENT_TIME;
+	mark_inode_dirty(inode);
+	sbi->s_numfiles--;
+out:
+	osdfs_put_page(page);
+	return err;
+}
+
+int osdfs_make_empty(struct inode *inode, struct inode *parent)
+{
+	struct address_space *mapping = inode->i_mapping;
+	struct page *page = grab_cache_page(mapping, 0);
+	unsigned chunk_size = osdfs_chunk_size(inode);
+	struct osdfs_dir_entry *de;
+	int err;
+	void *kaddr;
+
+	if (!page)
+		return -ENOMEM;
+	err = mapping->a_ops->prepare_write(NULL, page, 0, chunk_size);
+	if (err) {
+		unlock_page(page);
+		goto fail;
+	}
+
+	kaddr = kmap_atomic(page, KM_USER0);
+	de = (struct osdfs_dir_entry *)kaddr;
+	de->name_len = 1;
+	de->rec_len = OSDFS_DIR_REC_LEN(1);
+	memcpy(de->name, ".\0\0", 4);
+	de->inode = inode->i_ino;
+	osdfs_set_de_type(de, inode);
+
+	de = (struct osdfs_dir_entry *)(kaddr + OSDFS_DIR_REC_LEN(1));
+	de->name_len = 2;
+	de->rec_len = chunk_size - OSDFS_DIR_REC_LEN(1);
+	de->inode = parent->i_ino;
+	memcpy(de->name, "..\0", 4);
+	osdfs_set_de_type(de, inode);
+	kunmap_atomic(kaddr, KM_USER0);
+	err = osdfs_commit_chunk(page, 0, chunk_size);
+fail:
+	page_cache_release(page);
+	return err;
+}
+
+int osdfs_empty_dir(struct inode *inode)
+{
+	struct page *page = NULL;
+	unsigned long i, npages = dir_pages(inode);
+
+	for (i = 0; i < npages; i++) {
+		char *kaddr;
+		struct osdfs_dir_entry *de;
+		page = osdfs_get_page(inode, i);
+
+		if (IS_ERR(page))
+			continue;
+
+		kaddr = page_address(page);
+		de = (struct osdfs_dir_entry *)kaddr;
+		kaddr += osdfs_last_byte(inode, i) - OSDFS_DIR_REC_LEN(1);
+
+		while ((char *)de <= kaddr) {
+			if (de->rec_len == 0) {
+				printk(KERN_ERR "ERROR: osdfs_empty_dir: "
+					"zero-length directory entry\n");
+				printk(KERN_ERR "kaddr=%p, de=%p\n", kaddr, de);
+				goto not_empty;
+			}
+			if (de->inode != 0) {
+				/* check for . and .. */
+				if (de->name[0] != '.')
+					goto not_empty;
+				if (de->name_len > 2)
+					goto not_empty;
+				if (de->name_len < 2) {
+					if (de->inode !=
+					    inode->i_ino)
+						goto not_empty;
+				} else if (de->name[1] != '.')
+					goto not_empty;
+			}
+			de = osdfs_next_entry(de);
+		}
+		osdfs_put_page(page);
+	}
+	return 1;
+
+not_empty:
+	osdfs_put_page(page);
+	return 0;
+}
+
+struct file_operations osdfs_dir_operations = {
+	.llseek		= generic_file_llseek,
+	.read		= generic_read_dir,
+	.readdir	= osdfs_readdir,
+};
diff --git a/fs/osdfs/inode.c b/fs/osdfs/inode.c
index bfd82b1..478805e 100644
--- a/fs/osdfs/inode.c
+++ b/fs/osdfs/inode.c
@@ -408,6 +408,178 @@ fail:
 }
 
 /*
+ * Read an inode from the OSD, and return it as is.  We also return the size
+ * attribute in the 'sanity' argument if we got compiled with debugging turned
+ * on.
+ */
+int osdfs_get_inode(struct super_block *sb, struct osdfs_i_info *oi,
+		    struct osdfs_fcb *inode, uint64_t *sanity)
+{
+	struct osdfs_sb_info *sbi = sb->s_fs_info;
+	struct osd_request *req = NULL;
+	uint32_t page;
+	uint32_t attr;
+	uint16_t expected;
+	uint8_t *buf;
+	uint64_t o_id;
+	int ret;
+
+	o_id = oi->vfs_inode.i_ino + OSDFS_OBJ_OFF;
+
+	make_credential(oi->i_cred, sbi->s_pid, o_id);
+
+	req = prepare_osd_get_attr(sbi->s_dev, sbi->s_pid, o_id);
+	if (!req) {
+		printk(KERN_ERR "ERROR: prepare get_attr failed.\n");
+		return -ENOMEM;
+	}
+
+	/* we need the inode attribute */
+	prepare_get_attr_list_add_entry(req,
+					OSD_PAGE_NUM_IBM_UOBJ_FS_DATA,
+					OSD_ATTR_NUM_IBM_UOBJ_FS_DATA_INODE,
+					OSDFS_INO_ATTR_SIZE);
+
+#ifdef OSDFS_DEBUG
+	/* we get the size attributes to do a sanity check */
+	prepare_get_attr_list_add_entry(req,
+					OSD_APAGE_OBJECT_INFORMATION,
+					OSD_ATTR_OI_LOGICAL_LENGTH, 8);
+#endif
+
+	ret = osdfs_sync_op(req, sbi->s_timeout, oi->i_cred);
+	if (ret)
+		goto out;
+
+	page = OSD_PAGE_NUM_IBM_UOBJ_FS_DATA;
+	attr = OSD_ATTR_NUM_IBM_UOBJ_FS_DATA_INODE;
+	expected = OSDFS_INO_ATTR_SIZE;
+	ret = extract_next_attr_from_req(req, &page, &attr, &expected, &buf);
+	if (ret) {
+		printk(KERN_ERR "ERROR: extract attr from req failed\n");
+		goto out;
+	}
+	memcpy(inode, buf, sizeof(struct osdfs_fcb));
+
+#ifdef OSDFS_DEBUG
+	page = OSD_APAGE_OBJECT_INFORMATION;
+	attr = OSD_ATTR_OI_LOGICAL_LENGTH;
+	expected = 8;
+	ret = extract_next_attr_from_req(req, &page, &attr, &expected, &buf);
+	if (ret) {
+		printk(KERN_ERR "ERROR: extract attr from req failed\n");
+		goto out;
+	}
+	*sanity = be64_to_cpu(*((uint64_t *) buf));
+#endif
+
+out:
+	free_osd_req(req);
+	return ret;
+}
+
+/*
+ * Fill in an inode read from the OSD and set it up for use
+ */
+struct inode *osdfs_iget(struct super_block *sb, unsigned long ino)
+{
+	struct osdfs_i_info *oi;
+	struct osdfs_fcb fcb;
+	struct inode *inode;
+	uint64_t sanity;
+	int ret;
+	int n;
+
+	inode = iget_locked(sb, ino);
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+	if (!(inode->i_state & I_NEW))
+		return inode;
+	oi = OSDFS_I(inode);
+
+	/* read the inode from the osd */
+	ret = osdfs_get_inode(sb, oi, &fcb, &sanity);
+	if (ret)
+		goto bad_inode;
+
+	init_waitqueue_head(&oi->i_wq);
+	SetObjCreated(oi);
+
+	/* copy stuff from on-disk struct to in-memory struct */
+	inode->i_mode = be16_to_cpu(fcb.i_mode);
+	inode->i_uid = be32_to_cpu(fcb.i_uid);
+	inode->i_gid = be32_to_cpu(fcb.i_gid);
+	inode->i_nlink = be16_to_cpu(fcb.i_links_count);
+	inode->i_ctime.tv_sec = be32_to_cpu(fcb.i_ctime);
+	inode->i_atime.tv_sec = be32_to_cpu(fcb.i_atime);
+	inode->i_mtime.tv_sec = be32_to_cpu(fcb.i_mtime);
+	inode->i_atime.tv_nsec = inode->i_mtime.tv_nsec =
+	    inode->i_ctime.tv_nsec = 0;
+	i_size_write(inode, be64_to_cpu(fcb.i_size));
+	inode->i_blkbits = OSDFS_BLKSHIFT;
+	inode->i_generation = be32_to_cpu(fcb.i_generation);
+
+#ifdef OSDFS_DEBUG
+	if ((inode->i_size != sanity) &&
+		(!osdfs_inode_is_fast_symlink(inode))) {
+		printk(KERN_WARNING
+		       "WARNING: Size of object from inode and "
+		       "attributes differ (%lld != %llu)\n",
+		       inode->i_size, sanity);
+	}
+#endif
+
+	oi->i_objs = fcb.i_objs;
+	oi->i_dir_start_lookup = 0;
+
+	if ((inode->i_nlink == 0) && (inode->i_mode == 0)) {
+		ret = -ESTALE;
+		goto bad_inode;
+	}
+
+	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
+		if (fcb.i_data[0])
+			inode->i_rdev = old_decode_dev(fcb.i_data[0]);
+		else
+			inode->i_rdev = new_decode_dev(fcb.i_data[1]);
+	} else
+		for (n = 0; n < OSDFS_IDATA; n++)
+			oi->i_data[n] = fcb.i_data[n];
+
+	if (S_ISREG(inode->i_mode)) {
+		inode->i_op = &osdfs_file_inode_operations;
+		inode->i_fop = &osdfs_file_operations;
+		inode->i_mapping->a_ops = &osdfs_aops;
+	} else if (S_ISDIR(inode->i_mode)) {
+		inode->i_op = &osdfs_dir_inode_operations;
+		inode->i_fop = &osdfs_dir_operations;
+		inode->i_mapping->a_ops = &osdfs_aops;
+	} else if (S_ISLNK(inode->i_mode)) {
+		if (osdfs_inode_is_fast_symlink(inode))
+			inode->i_op = &osdfs_fast_symlink_inode_operations;
+		else {
+			inode->i_op = &osdfs_symlink_inode_operations;
+			inode->i_mapping->a_ops = &osdfs_aops;
+		}
+	} else {
+		inode->i_op = &osdfs_special_inode_operations;
+		if (fcb.i_data[0])
+			init_special_inode(inode, inode->i_mode,
+			   old_decode_dev(le32_to_cpu(fcb.i_data[0])));
+		else
+			init_special_inode(inode, inode->i_mode,
+			   new_decode_dev(le32_to_cpu(fcb.i_data[1])));
+	}
+
+	unlock_new_inode(inode);
+	return inode;
+
+bad_inode:
+	iget_failed(inode);
+	return ERR_PTR(ret);
+}
+
+/*
  * Set inode attributes - just call generic functions.
  */
 int osdfs_setattr(struct dentry *dentry, struct iattr *iattr)
@@ -422,3 +594,98 @@ int osdfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	error = inode_setattr(inode, iattr);
 	return error;
 }
+
+/*
+ * Callback function from osdfs_new_inode().  The important thing is that we
+ * set the ObjCreated flag so that other methods know that the object exists on
+ * the OSD.
+ */
+void create_done(struct osd_request *req, void *p)
+{
+	struct inode *inode = (struct inode *)p;
+	struct osdfs_i_info *oi = OSDFS_I(inode);
+	struct osdfs_sb_info *sbi = inode->i_sb->s_fs_info;
+	int ret;
+
+	ret = check_ok(req);
+	free_osd_req(req);
+	atomic_dec(&sbi->s_curr_pending);
+
+	if (ret)
+		make_bad_inode(inode);
+	else
+		SetObjCreated(oi);
+
+	atomic_dec(&inode->i_count);
+}
+
+/*
+ * Set up a new inode and create an object for it on the OSD
+ */
+struct inode *osdfs_new_inode(struct inode *dir, int mode)
+{
+	struct super_block *sb;
+	struct inode *inode;
+	struct osdfs_i_info *oi;
+	struct osdfs_sb_info *sbi;
+	struct osd_request *req = NULL;
+	int ret;
+
+	sb = dir->i_sb;
+	inode = new_inode(sb);
+	if (!inode)
+		return ERR_PTR(-ENOMEM);
+
+	oi = OSDFS_I(inode);
+
+	init_waitqueue_head(&oi->i_wq);
+	SetObj2BCreated(oi);
+
+	sbi = sb->s_fs_info;
+
+	sb->s_dirt = 1;
+	inode->i_uid = current->fsuid;
+	if (dir->i_mode & S_ISGID) {
+		inode->i_gid = dir->i_gid;
+		if (S_ISDIR(mode))
+			mode |= S_ISGID;
+	} else
+		inode->i_gid = current->fsgid;
+	inode->i_mode = mode;
+
+	inode->i_ino = sbi->s_nextid++;
+	inode->i_blkbits = OSDFS_BLKSHIFT;
+	inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
+	inode->i_size = 0;
+	spin_lock(&sbi->s_next_gen_lock);
+	inode->i_generation = sbi->s_next_generation++;
+	spin_unlock(&sbi->s_next_gen_lock);
+	insert_inode_hash(inode);
+
+	mark_inode_dirty(inode);
+
+	req = prepare_osd_create(sbi->s_dev, sbi->s_pid,
+			       inode->i_ino + OSDFS_OBJ_OFF);
+	if (!req) {
+		printk(KERN_ERR "ERROR: prepare_osd_create failed\n");
+		return ERR_PTR(-EIO);
+	}
+
+	make_credential(oi->i_cred, sbi->s_pid, inode->i_ino + OSDFS_OBJ_OFF);
+
+	/* increment the refcount so that the inode will still be around when we
+	 * reach the callback
+	 */
+	atomic_inc(&inode->i_count);
+
+	ret = osdfs_async_op(req, create_done, (void *)inode, oi->i_cred);
+	if (ret) {
+		atomic_dec(&inode->i_count);
+		free_osd_req(req);
+		return ERR_PTR(-EIO);
+	}
+	atomic_inc(&sbi->s_curr_pending);
+
+	return inode;
+}
+
diff --git a/fs/osdfs/namei.c b/fs/osdfs/namei.c
new file mode 100644
index 0000000..b747e90
--- /dev/null
+++ b/fs/osdfs/namei.c
@@ -0,0 +1,348 @@
+/*
+ * Copyright (C) 2005, 2006
+ * Avishay Traeger (avishay@gmail.com) (avishay@il.ibm.com)
+ * Copyright (C) 2005, 2006
+ * International Business Machines
+ *
+ * Copyrights for code taken from ext2:
+ *     Copyright (C) 1992, 1993, 1994, 1995
+ *     Remy Card (card@masi.ibp.fr)
+ *     Laboratoire MASI - Institut Blaise Pascal
+ *     Universite Pierre et Marie Curie (Paris VI)
+ *     from
+ *     linux/fs/minix/inode.c
+ *     Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ * This file is part of osdfs.
+ *
+ * osdfs is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.  Since it is based on ext2, and the only
+ * valid version of GPL for the Linux kernel is version 2, the only valid
+ * version of GPL for osdfs is version 2.
+ *
+ * osdfs is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with osdfs; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "osdfs.h"
+
+static inline void osdfs_inc_count(struct inode *inode)
+{
+	inode->i_nlink++;
+	mark_inode_dirty(inode);
+}
+
+static inline void osdfs_dec_count(struct inode *inode)
+{
+	inode->i_nlink--;
+	mark_inode_dirty(inode);
+}
+
+static inline int osdfs_add_nondir(struct dentry *dentry, struct inode *inode)
+{
+	int err = osdfs_add_link(dentry, inode);
+	if (!err) {
+		d_instantiate(dentry, inode);
+		return 0;
+	}
+	osdfs_dec_count(inode);
+	iput(inode);
+	return err;
+}
+
+static struct dentry *osdfs_lookup(struct inode *dir, struct dentry *dentry,
+				   struct nameidata *nd)
+{
+	struct inode *inode;
+	ino_t ino;
+
+	if (dentry->d_name.len > OSDFS_NAME_LEN)
+		return ERR_PTR(-ENAMETOOLONG);
+
+	ino = osdfs_inode_by_name(dir, dentry);
+	inode = NULL;
+	if (ino) {
+		inode = osdfs_iget(dir->i_sb, ino);
+		if (IS_ERR(inode))
+			return ERR_CAST(inode);
+	}
+	if (inode)
+		return d_splice_alias(inode, dentry);
+	d_add(dentry, inode);
+	return NULL;
+}
+
+static int osdfs_create(struct inode *dir, struct dentry *dentry, int mode,
+			 struct nameidata *nd)
+{
+	struct inode *inode = osdfs_new_inode(dir, mode);
+	int err = PTR_ERR(inode);
+	if (!IS_ERR(inode)) {
+		inode->i_op = &osdfs_file_inode_operations;
+		inode->i_fop = &osdfs_file_operations;
+		inode->i_mapping->a_ops = &osdfs_aops;
+		mark_inode_dirty(inode);
+		err = osdfs_add_nondir(dentry, inode);
+	}
+	return err;
+}
+
+static int osdfs_mknod(struct inode *dir, struct dentry *dentry, int mode,
+		       dev_t rdev)
+{
+	struct inode *inode;
+	int err;
+
+	if (!new_valid_dev(rdev))
+		return -EINVAL;
+
+	inode = osdfs_new_inode(dir, mode);
+	err = PTR_ERR(inode);
+	if (!IS_ERR(inode)) {
+		init_special_inode(inode, inode->i_mode, rdev);
+		mark_inode_dirty(inode);
+		err = osdfs_add_nondir(dentry, inode);
+	}
+	return err;
+}
+
+static int osdfs_symlink(struct inode *dir, struct dentry *dentry,
+			  const char *symname)
+{
+	struct super_block *sb = dir->i_sb;
+	int err = -ENAMETOOLONG;
+	unsigned l = strlen(symname)+1;
+	struct inode *inode;
+
+	if (l > sb->s_blocksize)
+		goto out;
+
+	inode = osdfs_new_inode(dir, S_IFLNK | S_IRWXUGO);
+	err = PTR_ERR(inode);
+	if (IS_ERR(inode))
+		goto out;
+
+	if (l > sizeof(OSDFS_I(inode)->i_data)) {
+		/* slow symlink */
+		inode->i_op = &osdfs_symlink_inode_operations;
+		inode->i_mapping->a_ops = &osdfs_aops;
+		err = page_symlink(inode, symname, l);
+		memset((char *)(OSDFS_I(inode)->i_data), 0, OSDFS_IDATA);
+		if (err)
+			goto out_fail;
+	} else {
+		/* fast symlink */
+		inode->i_op = &osdfs_fast_symlink_inode_operations;
+		memcpy((char *)(OSDFS_I(inode)->i_data), symname, l);
+		inode->i_size = l-1;
+	}
+	mark_inode_dirty(inode);
+
+	err = osdfs_add_nondir(dentry, inode);
+out:
+	return err;
+
+out_fail:
+	osdfs_dec_count(inode);
+	iput(inode);
+	goto out;
+}
+
+static int osdfs_link(struct dentry *old_dentry, struct inode *dir,
+		struct dentry *dentry)
+{
+	struct inode *inode = old_dentry->d_inode;
+
+	if (inode->i_nlink >= OSDFS_LINK_MAX)
+		return -EMLINK;
+
+	inode->i_ctime = CURRENT_TIME;
+	osdfs_inc_count(inode);
+	atomic_inc(&inode->i_count);
+
+	return osdfs_add_nondir(dentry, inode);
+}
+
+static int osdfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+	struct inode *inode;
+	int err = -EMLINK;
+
+	if (dir->i_nlink >= OSDFS_LINK_MAX)
+		goto out;
+
+	osdfs_inc_count(dir);
+
+	inode = osdfs_new_inode(dir, S_IFDIR | mode);
+	err = PTR_ERR(inode);
+	if (IS_ERR(inode))
+		goto out_dir;
+
+	inode->i_op = &osdfs_dir_inode_operations;
+	inode->i_fop = &osdfs_dir_operations;
+	inode->i_mapping->a_ops = &osdfs_aops;
+
+	osdfs_inc_count(inode);
+
+	err = osdfs_make_empty(inode, dir);
+	if (err)
+		goto out_fail;
+
+	err = osdfs_add_link(dentry, inode);
+	if (err)
+		goto out_fail;
+
+	d_instantiate(dentry, inode);
+out:
+	return err;
+
+out_fail:
+	osdfs_dec_count(inode);
+	osdfs_dec_count(inode);
+	iput(inode);
+out_dir:
+	osdfs_dec_count(dir);
+	goto out;
+}
+
+static int osdfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+	struct inode *inode = dentry->d_inode;
+	struct osdfs_dir_entry *de;
+	struct page *page;
+	int err = -ENOENT;
+
+	de = osdfs_find_entry(dir, dentry, &page);
+	if (!de)
+		goto out;
+
+	err = osdfs_delete_entry(de, page);
+	if (err)
+		goto out;
+
+	inode->i_ctime = dir->i_ctime;
+	osdfs_dec_count(inode);
+	err = 0;
+out:
+	return err;
+}
+
+static int osdfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+	struct inode *inode = dentry->d_inode;
+	int err = -ENOTEMPTY;
+
+	if (osdfs_empty_dir(inode)) {
+		err = osdfs_unlink(dir, dentry);
+		if (!err) {
+			inode->i_size = 0;
+			osdfs_dec_count(inode);
+			osdfs_dec_count(dir);
+		}
+	}
+	return err;
+}
+
+static int osdfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+		struct inode *new_dir, struct dentry *new_dentry)
+{
+	struct inode *old_inode = old_dentry->d_inode;
+	struct inode *new_inode = new_dentry->d_inode;
+	struct page *dir_page = NULL;
+	struct osdfs_dir_entry *dir_de = NULL;
+	struct page *old_page;
+	struct osdfs_dir_entry *old_de;
+	int err = -ENOENT;
+
+	old_de = osdfs_find_entry(old_dir, old_dentry, &old_page);
+	if (!old_de)
+		goto out;
+
+	if (S_ISDIR(old_inode->i_mode)) {
+		err = -EIO;
+		dir_de = osdfs_dotdot(old_inode, &dir_page);
+		if (!dir_de)
+			goto out_old;
+	}
+
+	if (new_inode) {
+		struct page *new_page;
+		struct osdfs_dir_entry *new_de;
+
+		err = -ENOTEMPTY;
+		if (dir_de && !osdfs_empty_dir(new_inode))
+			goto out_dir;
+
+		err = -ENOENT;
+		new_de = osdfs_find_entry(new_dir, new_dentry, &new_page);
+		if (!new_de)
+			goto out_dir;
+		osdfs_inc_count(old_inode);
+		osdfs_set_link(new_dir, new_de, new_page, old_inode);
+		new_inode->i_ctime = CURRENT_TIME;
+		if (dir_de)
+			new_inode->i_nlink--;
+		osdfs_dec_count(new_inode);
+	} else {
+		if (dir_de) {
+			err = -EMLINK;
+			if (new_dir->i_nlink >= OSDFS_LINK_MAX)
+				goto out_dir;
+		}
+		osdfs_inc_count(old_inode);
+		err = osdfs_add_link(new_dentry, old_inode);
+		if (err) {
+			osdfs_dec_count(old_inode);
+			goto out_dir;
+		}
+		if (dir_de)
+			osdfs_inc_count(new_dir);
+	}
+
+	old_inode->i_ctime = CURRENT_TIME;
+
+	osdfs_delete_entry(old_de, old_page);
+	osdfs_dec_count(old_inode);
+
+	if (dir_de) {
+		osdfs_set_link(old_inode, dir_de, dir_page, new_dir);
+		osdfs_dec_count(old_dir);
+	}
+	return 0;
+
+
+out_dir:
+	if (dir_de) {
+		kunmap(dir_page);
+		page_cache_release(dir_page);
+	}
+out_old:
+	kunmap(old_page);
+	page_cache_release(old_page);
+out:
+	return err;
+}
+
+struct inode_operations osdfs_dir_inode_operations = {
+	.create 	= osdfs_create,
+	.lookup 	= osdfs_lookup,
+	.link   	= osdfs_link,
+	.unlink 	= osdfs_unlink,
+	.symlink	= osdfs_symlink,
+	.mkdir  	= osdfs_mkdir,
+	.rmdir  	= osdfs_rmdir,
+	.mknod  	= osdfs_mknod,
+	.rename 	= osdfs_rename,
+	.setattr	= osdfs_setattr,
+};
+
+struct inode_operations osdfs_special_inode_operations = {
+	.setattr	= osdfs_setattr,
+};
diff --git a/fs/osdfs/osdfs.h b/fs/osdfs/osdfs.h
index 29e7d7b..00c89f7 100644
--- a/fs/osdfs/osdfs.h
+++ b/fs/osdfs/osdfs.h
@@ -106,6 +106,11 @@ static inline struct osdfs_i_info *OSDFS_I(struct inode *inode)
 	return container_of(inode, struct osdfs_i_info, vfs_inode);
 }
 
+/*
+ * Maximum count of links to a file
+ */
+#define OSDFS_LINK_MAX           32000
+
 /*************************
  * function declarations *
  *************************/
@@ -179,11 +184,28 @@ void free_osd_req(struct osd_request *req);
 
 /* inode.c               */
 void osdfs_truncate(struct inode *inode);
+extern struct inode *osdfs_iget(struct super_block *, unsigned long);
+struct inode *osdfs_new_inode(struct inode *, int);
 int osdfs_setattr(struct dentry *, struct iattr *);
 
+/* dir.c:                */
+int osdfs_add_link(struct dentry *, struct inode *);
+ino_t osdfs_inode_by_name(struct inode *, struct dentry *);
+int osdfs_delete_entry(struct osdfs_dir_entry *, struct page *);
+int osdfs_make_empty(struct inode *, struct inode *);
+struct osdfs_dir_entry *osdfs_find_entry(struct inode *, struct dentry *,
+					 struct page **);
+int osdfs_empty_dir(struct inode *);
+struct osdfs_dir_entry *osdfs_dotdot(struct inode *, struct page **);
+void osdfs_set_link(struct inode *, struct osdfs_dir_entry *, struct page *,
+		    struct inode *);
+
 /*********************
  * operation vectors *
  *********************/
+/* dir.c:            */
+extern struct file_operations osdfs_dir_operations;
+
 /* file.c            */
 extern struct inode_operations osdfs_file_inode_operations;
 extern struct file_operations osdfs_file_operations;
@@ -191,6 +213,10 @@ extern struct file_operations osdfs_file_operations;
 /* inode.c           */
 extern struct address_space_operations osdfs_aops;
 
+/* namei.c           */
+extern struct inode_operations osdfs_dir_inode_operations;
+extern struct inode_operations osdfs_special_inode_operations;
+
 /* symlink.c         */
 extern struct inode_operations osdfs_symlink_inode_operations;
 extern struct inode_operations osdfs_fast_symlink_inode_operations;
-- 
1.6.0.1

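A note on osdfs_delete_entry() above. Like its ext2 ancestor, it never compacts a directory page. Instead it:

- walks from the start of the victim's chunk to find the preceding record;
- stretches that record's rec_len over the victim;
- clears the victim's inode number.

A record at the start of a chunk has no predecessor, so only its inode number is cleared.

The userland sketch below models just that offset arithmetic. The record header (struct rec with a 32-bit inode and a 16-bit rec_len, in host byte order) and the helper names are hypothetical and do not match the real struct osdfs_dir_entry. chunk_size must be a power of two, as the mask in the original assumes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; NOT the real struct osdfs_dir_entry. */
struct rec {
	uint32_t inode;    /* 0 marks an unused record */
	uint16_t rec_len;  /* bytes from this record to the next one */
};

/* Copy-based accessors so the byte buffer needs no particular alignment. */
static struct rec rec_at(const unsigned char *buf, unsigned off)
{
	struct rec r;

	memcpy(&r, buf + off, sizeof(r));
	return r;
}

static void rec_put(unsigned char *buf, unsigned off, struct rec r)
{
	memcpy(buf + off, &r, sizeof(r));
}

/*
 * Remove the record at byte offset 'victim'.  As in osdfs_delete_entry(),
 * walk from the start of the victim's chunk to the preceding record and,
 * if there is one, stretch its rec_len over the victim.  The victim's
 * inode number is cleared either way.  Returns 0, or -1 on a zero rec_len.
 */
static int delete_rec(unsigned char *buf, unsigned chunk_size, unsigned victim)
{
	unsigned from, to, off, prev = 0;
	int have_prev = 0;
	struct rec r;

	assert((chunk_size & (chunk_size - 1)) == 0); /* power of two */

	from = victim & ~(chunk_size - 1);            /* start of the chunk */
	to = victim + rec_at(buf, victim).rec_len;    /* end of the victim */

	for (off = from; off < victim; off += r.rec_len) {
		r = rec_at(buf, off);
		if (r.rec_len == 0)
			return -1;                    /* corrupt chain */
		prev = off;
		have_prev = 1;
	}
	if (have_prev) {
		r = rec_at(buf, prev);
		r.rec_len = (uint16_t)(to - prev);    /* swallow the victim */
		rec_put(buf, prev, r);
	}
	r = rec_at(buf, victim);
	r.inode = 0;
	rec_put(buf, victim, r);
	return 0;
}
```

For example, with records at offsets 0, 16 and 32 in a 64-byte chunk, deleting the one at 16 turns the first record's rec_len into 32. A later readdir then steps straight from offset 0 to offset 32.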

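Similarly, osdfs_empty_dir() treats a directory as empty when every in-use entry is either "." pointing back at the directory itself, or "..". Below is a minimal sketch of that decision rule over already-parsed entries, with these assumptions:

- struct dent and dir_is_empty() are hypothetical names.
- Page walking and the zero rec_len check are left out.
- The name_len == 0 guard is an addition of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-parsed directory entry (not the on-disk format). */
struct dent {
	uint32_t inode;      /* 0 means the slot is unused */
	uint8_t name_len;
	const char *name;    /* points at least name_len bytes */
};

/*
 * Mirrors the checks in osdfs_empty_dir(): every in-use entry must be "."
 * referring to the directory itself, or "..".  Anything else means the
 * directory still has real children.
 */
static bool dir_is_empty(const struct dent *d, size_t n, uint32_t self_ino)
{
	assert(d != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (d[i].inode == 0)
			continue;                  /* deleted/unused slot */
		if (d[i].name_len == 0 || d[i].name[0] != '.')
			return false;
		if (d[i].name_len > 2)
			return false;              /* ".abc" etc. */
		if (d[i].name_len == 1) {
			if (d[i].inode != self_ino)
				return false;      /* "." must be self */
		} else if (d[i].name[1] != '.') {
			return false;              /* ".x" is a real name */
		}
	}
	return true;
}
```

Deleted entries (inode 0) are skipped. That is why an rmdir can succeed on a directory whose pages still contain stale names left behind by osdfs_delete_entry().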
^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 6/9] osdfs: super_operations and file_system_type
  2008-10-30 14:26 [RFC 0/9] osdfs Boaz Harrosh
                   ` (4 preceding siblings ...)
  2008-10-30 14:34 ` [RFC 5/9] osdfs: dir_inode and directory operations Boaz Harrosh
@ 2008-10-30 14:35 ` Boaz Harrosh
  2008-10-30 14:36 ` [RFC 7/9] osdfs: mkosdfs Boaz Harrosh
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Boaz Harrosh @ 2008-10-30 14:35 UTC
  To: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd

This patch ties all operation vectors into a file system superblock
and registers the osdfs file_system_type at module load time.

* The file system control block (AKA on-disk superblock) resides in
  an object with a special ID (defined in common.h).
  Information in the file system control block is used to fill the
  in-memory superblock structure at mount time. This object is
  created by mkosdfs.c before the file system is first used. It
  contains information such as:
	- The file system's magic number
	- The next inode number to be allocated
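
As an illustration only: the real control block layout is defined in
common.h, which is not part of this patch. The sketch below makes two
assumptions:

- The control block is a hypothetical two-field record.
- It is stored big-endian. This is by analogy with
  osdfs_update_inode(), which stores inode fields with cpu_to_be*().

fscb_alloc_ino() mimics how osdfs_new_inode() hands out numbers from the
in-memory sbi->s_nextid. All names and field widths here are guesses,
not the on-disk format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the control block; real layout is in common.h. */
struct fscb_sketch {
	uint64_t s_nextid;  /* next inode number to be allocated */
	uint16_t s_magic;   /* file system magic number */
};

#define FSCB_SKETCH_BYTES 10  /* 8 + 2 bytes, packed, big-endian (assumed) */

/* Store the low n bytes of v, most significant byte first. */
static void put_be(unsigned char *p, uint64_t v, int n)
{
	assert(n >= 1 && n <= 8);
	for (int i = n - 1; i >= 0; i--) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Load n bytes, most significant byte first. */
static uint64_t get_be(const unsigned char *p, int n)
{
	uint64_t v = 0;

	assert(n >= 1 && n <= 8);
	for (int i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* What mkfs would write into the control block object. */
static void fscb_encode(const struct fscb_sketch *s,
			unsigned char out[FSCB_SKETCH_BYTES])
{
	put_be(out, s->s_nextid, 8);
	put_be(out + 8, s->s_magic, 2);
}

/* What mount would read back to fill the in-memory superblock. */
static void fscb_decode(const unsigned char in[FSCB_SKETCH_BYTES],
			struct fscb_sketch *s)
{
	s->s_nextid = get_be(in, 8);
	s->s_magic = (uint16_t)get_be(in + 8, 2);
}

/* Post-increment allocation, as in "inode->i_ino = sbi->s_nextid++". */
static uint64_t fscb_alloc_ino(struct fscb_sketch *s)
{
	return s->s_nextid++;
}
```

A fixed byte order keeps the object readable regardless of the
endianness of the host that mounts it.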

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/osdfs/Kbuild  |    2 +-
 fs/osdfs/inode.c |  197 +++++++++++++++++++++-
 fs/osdfs/osdfs.h |   30 ++++
 fs/osdfs/super.c |  502 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 729 insertions(+), 2 deletions(-)
 create mode 100644 fs/osdfs/super.c

diff --git a/fs/osdfs/Kbuild b/fs/osdfs/Kbuild
index d6ac8d6..a005c04 100644
--- a/fs/osdfs/Kbuild
+++ b/fs/osdfs/Kbuild
@@ -20,5 +20,5 @@ EXTRA_CFLAGS += -I$(OSD_INC)
 # EXTRA_CFLAGS += -DCONFIG_OSDFS_DEBUG
 endif
 
-osdfs-objs := osd.o inode.o file.o symlink.o namei.o dir.o
+osdfs-objs := osd.o inode.o file.o symlink.o namei.o dir.o super.o
 obj-$(CONFIG_OSDFS_FS) += osdfs.o
diff --git a/fs/osdfs/inode.c b/fs/osdfs/inode.c
index 478805e..b140690 100644
--- a/fs/osdfs/inode.c
+++ b/fs/osdfs/inode.c
@@ -36,6 +36,8 @@
 
 #include "osdfs.h"
 
+static int osdfs_update_inode(struct inode *inode, int do_sync);
+
 /*
  * Test whether an inode is a fast symlink.
  */
@@ -47,6 +49,18 @@ static inline int osdfs_inode_is_fast_symlink(struct inode *inode)
 }
 
 /*
+ * Callback function from osdfs_delete_inode() - don't have much cleaning up to
+ * do.
+ */
+void delete_done(struct osd_request *req, void *p)
+{
+	struct osdfs_sb_info *sbi;
+	free_osd_req(req);
+	sbi = (struct osdfs_sb_info *)p;
+	atomic_dec(&sbi->s_curr_pending);
+}
+
+/*
  * get_block_t - Fill in a buffer_head
  * An OSD takes care of block allocation so we just fake an allocation by
  * putting in the inode's sector_t in the buffer_head.
@@ -61,6 +75,62 @@ int osdfs_get_block(struct inode *inode, sector_t iblock,
 }
 
 /*
+ * Called when the last reference to an inode with no remaining links is
+ * dropped.  We remove the object from the OSD here, making sure the object
+ * was created before we try to delete it.
+ */
+void osdfs_delete_inode(struct inode *inode)
+{
+	struct osdfs_i_info *oi = OSDFS_I(inode);
+	struct osd_request *req = NULL;
+	struct super_block *sb = inode->i_sb;
+	struct osdfs_sb_info *sbi = sb->s_fs_info;
+	int ret;
+
+	truncate_inode_pages(&inode->i_data, 0);
+
+	if (is_bad_inode(inode))
+		goto no_delete;
+	mark_inode_dirty(inode);
+	osdfs_update_inode(inode, inode_needs_sync(inode));
+
+	inode->i_size = 0;
+	if (inode->i_blocks)
+		osdfs_truncate(inode);
+
+	clear_inode(inode);
+
+	req = prepare_osd_remove(sbi->s_dev, sbi->s_pid,
+				 inode->i_ino + OSDFS_OBJ_OFF);
+	if (!req) {
+		printk(KERN_ERR "ERROR: prepare_osd_remove failed\n");
+		return;
+	}
+
+	/* if we are deleting an obj that hasn't been created yet, wait */
+	if (!ObjCreated(oi)) {
+		if (!Obj2BCreated(oi))
+			BUG();
+		else
+			wait_event(oi->i_wq, ObjCreated(oi));
+	}
+
+	ret = osdfs_async_op(req, delete_done, sbi, oi->i_cred);
+	if (ret) {
+		printk(KERN_ERR
+		       "ERROR: @osdfs_delete_inode osdfs_async_op failed\n");
+		free_osd_req(req);
+		return;
+	}
+	atomic_inc(&sbi->s_curr_pending);
+
+	return;
+
+no_delete:
+	clear_inode(inode);
+}
+
+/*
  * Callback function when writepage finishes.  Check for errors, unlock, clean
  * up, etc.
  */
@@ -580,6 +650,132 @@ bad_inode:
 }
 
 /*
+ * Callback function from osdfs_update_inode().
+ */
+void updatei_done(struct osd_request *req, void *p)
+{
+	struct updatei_args *args = (struct updatei_args *)p;
+
+	free_osd_req(req);
+
+	atomic_dec(&args->sbi->s_curr_pending);
+
+	kfree(args->fcb);
+	kfree(args);
+	args = NULL;
+}
+
+/*
+ * Write the inode to the OSD.  Just fill up the struct, and set the attribute
+ * synchronously or asynchronously depending on the do_sync flag.
+ */
+static int osdfs_update_inode(struct inode *inode, int do_sync)
+{
+	struct osdfs_i_info *oi = OSDFS_I(inode);
+	struct super_block *sb = inode->i_sb;
+	struct osdfs_sb_info *sbi = sb->s_fs_info;
+	struct osd_request *req = NULL;
+	struct osdfs_fcb *fcb = NULL;
+	int ret;
+	int n;
+
+	fcb = kmalloc(sizeof(struct osdfs_fcb), GFP_KERNEL);
+	if (!fcb) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	fcb->i_mode = cpu_to_be16(inode->i_mode);
+	fcb->i_uid = cpu_to_be32(inode->i_uid);
+	fcb->i_gid = cpu_to_be32(inode->i_gid);
+	fcb->i_links_count = cpu_to_be16(inode->i_nlink);
+	fcb->i_ctime = cpu_to_be32(inode->i_ctime.tv_sec);
+	fcb->i_atime = cpu_to_be32(inode->i_atime.tv_sec);
+	fcb->i_mtime = cpu_to_be32(inode->i_mtime.tv_sec);
+	fcb->i_size = cpu_to_be64(i_size_read(inode));
+	fcb->i_generation = cpu_to_be32(inode->i_generation);
+	fcb->i_objs = cpu_to_be64(oi->i_objs);
+
+	if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode)) {
+		if (old_valid_dev(inode->i_rdev)) {
+			fcb->i_data[0] = old_encode_dev(inode->i_rdev);
+			fcb->i_data[1] = 0;
+		} else {
+			fcb->i_data[0] = 0;
+			fcb->i_data[1] = new_encode_dev(inode->i_rdev);
+			fcb->i_data[2] = 0;
+		}
+	} else
+		for (n = 0; n < OSDFS_IDATA; n++)
+			fcb->i_data[n] = oi->i_data[n];
+
+	req = prepare_osd_set_attr(sbi->s_dev, sbi->s_pid,
+				 (uint64_t) (inode->i_ino + OSDFS_OBJ_OFF));
+	if (!req) {
+		printk(KERN_ERR "ERROR: prepare set_attr failed.\n");
+		kfree(fcb);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	prepare_set_attr_list_add_entry(req,
+					OSD_PAGE_NUM_IBM_UOBJ_FS_DATA,
+					OSD_ATTR_NUM_IBM_UOBJ_FS_DATA_INODE,
+					OSDFS_INO_ATTR_SIZE,
+					(unsigned char *)fcb);
+
+	if (!ObjCreated(oi)) {
+		if (!Obj2BCreated(oi))
+			BUG();
+		else
+			wait_event(oi->i_wq, ObjCreated(oi));
+	}
+
+	if (do_sync) {
+		ret = osdfs_sync_op(req, sbi->s_timeout, oi->i_cred);
+		free_osd_req(req);
+		kfree(fcb);
+	} else {
+		struct updatei_args *args = NULL;
+
+		args = kmalloc(sizeof(struct updatei_args), GFP_KERNEL);
+		if (!args) {
+			kfree(fcb);
+			ret = -ENOMEM;
+			goto out;
+		}
+		args->sbi = sbi;
+		args->fcb = fcb;
+
+		ret = osdfs_async_op(req, updatei_done, args, oi->i_cred);
+		if (ret) {
+			free_osd_req(req);
+			kfree(fcb);
+			kfree(args);
+			goto out;
+		}
+		atomic_inc(&sbi->s_curr_pending);
+	}
+out:
+	return ret;
+}
+
+int osdfs_write_inode(struct inode *inode, int wait)
+{
+	return osdfs_update_inode(inode, wait);
+}
+
+int osdfs_sync_inode(struct inode *inode)
+{
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_ALL,
+		.nr_to_write = 0,	/* sys_fsync did this */
+	};
+
+	return sync_inode(inode, &wbc);
+}
+
+/*
  * Set inode attributes - just call generic functions.
  */
 int osdfs_setattr(struct dentry *dentry, struct iattr *iattr)
@@ -594,7 +790,6 @@ int osdfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	error = inode_setattr(inode, iattr);
 	return error;
 }
-
 /*
  * Callback function from osdfs_new_inode().  The important thing is that we
  * set the ObjCreated flag so that other methods know that the object exists on
diff --git a/fs/osdfs/osdfs.h b/fs/osdfs/osdfs.h
index 00c89f7..af8b998 100644
--- a/fs/osdfs/osdfs.h
+++ b/fs/osdfs/osdfs.h
@@ -49,6 +49,17 @@
 #endif
 
 /*
+ * struct to hold what we get from mount options
+ */
+struct osdfs_mountopt {
+	const char *dev_name;
+	uint64_t pid;
+	int timeout;
+	bool mkfs;
+	int format; /*in Mbyte*/
+};
+
+/*
  * our extension to the in-memory superblock
  */
 struct osdfs_sb_info {
@@ -107,6 +118,14 @@ static inline struct osdfs_i_info *OSDFS_I(struct inode *inode)
 }
 
 /*
+ * ugly struct so that we can pass two arguments to update_inode's callback
+ */
+struct updatei_args {
+	struct osdfs_sb_info	*sbi;
+	struct osdfs_fcb	*fcb;
+};
+
+/*
  * Maximum count of links to a file
  */
 #define OSDFS_LINK_MAX           32000
@@ -185,9 +204,17 @@ void free_osd_req(struct osd_request *req);
 /* inode.c               */
 void osdfs_truncate(struct inode *inode);
 extern struct inode *osdfs_iget(struct super_block *, unsigned long);
+extern int osdfs_write_inode(struct inode *, int);
+extern void osdfs_delete_inode(struct inode *);
 struct inode *osdfs_new_inode(struct inode *, int);
 int osdfs_setattr(struct dentry *, struct iattr *);
 
+/* super.c:              */
+#ifdef OSDFS_DEBUG
+void osdfs_dprint_internal(char *str, ...);
+#endif
+extern void osdfs_write_super(struct super_block *);
+
 /* dir.c:                */
 int osdfs_add_link(struct dentry *, struct inode *);
 ino_t osdfs_inode_by_name(struct inode *, struct dentry *);
@@ -217,6 +244,9 @@ extern struct address_space_operations osdfs_aops;
 extern struct inode_operations osdfs_dir_inode_operations;
 extern struct inode_operations osdfs_special_inode_operations;
 
+/* super.c           */
+extern struct super_operations osdfs_sops;
+
 /* symlink.c         */
 extern struct inode_operations osdfs_symlink_inode_operations;
 extern struct inode_operations osdfs_fast_symlink_inode_operations;
diff --git a/fs/osdfs/super.c b/fs/osdfs/super.c
new file mode 100644
index 0000000..095b960
--- /dev/null
+++ b/fs/osdfs/super.c
@@ -0,0 +1,502 @@
+/*
+ * Copyright (C) 2005, 2006
+ * Avishay Traeger (avishay@gmail.com) (avishay@il.ibm.com)
+ * Copyright (C) 2005, 2006
+ * International Business Machines
+ *
+ * Copyrights for code taken from ext2:
+ *     Copyright (C) 1992, 1993, 1994, 1995
+ *     Remy Card (card@masi.ibp.fr)
+ *     Laboratoire MASI - Institut Blaise Pascal
+ *     Universite Pierre et Marie Curie (Paris VI)
+ *     from
+ *     linux/fs/minix/inode.c
+ *     Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ * This file is part of osdfs.
+ *
+ * osdfs is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.  Since it is based on ext2, and the only
+ * valid version of GPL for the Linux kernel is version 2, the only valid
+ * version of GPL for osdfs is version 2.
+ *
+ * osdfs is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with osdfs; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include <linux/string.h>
+#include <linux/parser.h>
+#include <linux/vfs.h>
+#include <linux/random.h>
+
+#include "osdfs.h"
+
+/******************************************************************************
+ * MOUNT OPTIONS
+ *****************************************************************************/
+
+/*
+ * osdfs-specific mount-time options.
+ */
+enum { Opt_lun, Opt_tid, Opt_pid, Opt_to, Opt_mkfs, Opt_format, Opt_err };
+
+/*
+ * Our mount-time options.  These should ideally be 64-bit unsigned, but the
+ * kernel's parsing functions do not currently support that.  32-bit should be
+ * sufficient for most applications now.
+ */
+static match_table_t tokens = {
+	{Opt_pid, "pid=%u"},
+	{Opt_to, "to=%u"},
+	{Opt_err, NULL}
+};
+
+/*
+ * The main option parsing method.  Also makes sure that all of the mandatory
+ * mount options were set.
+ */
+static int parse_options(char *options, struct osdfs_mountopt *opts)
+{
+	char *p;
+	substring_t args[MAX_OPT_ARGS];
+	int option;
+	int s_pid = 0;
+
+	OSDFS_DBGMSG("parse_options %s\n", options);
+	/* defaults */
+	memset(opts, 0, sizeof(*opts));
+	opts->timeout = 20;
+
+	while ((p = strsep(&options, ",")) != NULL) {
+		int token;
+		if (!*p)
+			continue;
+
+		token = match_token(p, tokens, args);
+		switch (token) {
+		case Opt_pid:
+			if (match_int(&args[0], &option))
+				return -EINVAL;
+			if (option < 65536) {
+				printk(KERN_ERR "Partition ID must be >= 65536\n");
+				return -EINVAL;
+			}
+			opts->pid = option;
+			s_pid = 1;
+			break;
+		case Opt_to:
+			if (match_int(&args[0], &option))
+				return -EINVAL;
+			if (option <= 0) {
+				printk(KERN_ERR "Timeout must be > 0\n");
+				return -EINVAL;
+			}
+			opts->timeout = option;
+			break;
+		}
+	}
+
+	if (!s_pid) {
+		printk(KERN_ERR "Missing mandatory mount option:\n");
+		printk(KERN_ERR "-o pid=Z (Z >= 65536)\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/******************************************************************************
+ * INODE CACHE
+ *****************************************************************************/
+
+/*
+ * Our inode cache.  Isn't it pretty?
+ */
+static struct kmem_cache *osdfs_inode_cachep;
+
+/*
+ * Allocate an inode in the cache
+ */
+static struct inode *osdfs_alloc_inode(struct super_block *sb)
+{
+	struct osdfs_i_info *oi;
+
+	oi = kmem_cache_alloc(osdfs_inode_cachep, GFP_KERNEL);
+	if (!oi)
+		return NULL;
+
+	oi->vfs_inode.i_version = 1;
+	return &oi->vfs_inode;
+}
+
+/*
+ * Remove an inode from the cache
+ */
+static void osdfs_destroy_inode(struct inode *inode)
+{
+	kmem_cache_free(osdfs_inode_cachep, OSDFS_I(inode));
+}
+
+/*
+ * Initialize the inode
+ */
+static void osdfs_init_once(void *foo)
+{
+	struct osdfs_i_info *oi = foo;
+
+	inode_init_once(&oi->vfs_inode);
+}
+
+/*
+ * Create and initialize the inode cache
+ */
+static int init_inodecache(void)
+{
+	osdfs_inode_cachep = kmem_cache_create("osdfs_inode_cache",
+					       sizeof(struct osdfs_i_info),
+					       0, SLAB_RECLAIM_ACCOUNT,
+					       osdfs_init_once);
+	if (osdfs_inode_cachep == NULL)
+		return -ENOMEM;
+	return 0;
+}
+
+/*
+ * Destroy the inode cache
+ */
+static void destroy_inodecache(void)
+{
+	kmem_cache_destroy(osdfs_inode_cachep);
+}
+
+/******************************************************************************
+ * SUPERBLOCK FUNCTIONS
+ *****************************************************************************/
+
+/*
+ * Write the superblock to the OSD
+ */
+void osdfs_write_super(struct super_block *sb)
+{
+	struct osdfs_sb_info *sbi;
+	struct osdfs_fscb *fscb = NULL;
+	struct osd_request *req = NULL;
+
+	fscb = kzalloc(sizeof(struct osdfs_fscb), GFP_KERNEL);
+	if (!fscb)
+		return;
+
+	lock_kernel();
+	sbi = sb->s_fs_info;
+	fscb->s_nextid = sbi->s_nextid;
+	fscb->s_magic = sb->s_magic;
+	fscb->s_numfiles = sbi->s_numfiles;
+	fscb->s_newfs = 0;
+
+	req = prepare_osd_write(sbi->s_dev, sbi->s_pid, OSDFS_SUPER_ID,
+				sizeof(struct osdfs_fscb), 0, 0,
+				(unsigned char *)(fscb));
+	if (!req) {
+		printk(KERN_ERR "ERROR: write super failed.\n");
+		goto out;
+	}
+
+	osdfs_sync_op(req, sbi->s_timeout, sbi->s_cred);
+	free_osd_req(req);
+	sb->s_dirt = 0;
+out:
+	unlock_kernel();
+	kfree(fscb);
+}
+
+/*
+ * This function is called when the vfs is freeing the superblock.  We just
+ * need to free our own part.
+ */
+static void osdfs_put_super(struct super_block *sb)
+{
+	int num_pend;
+	struct osdfs_sb_info *sbi = sb->s_fs_info;
+
+	/* make sure there are no pending commands */
+	for (num_pend = atomic_read(&sbi->s_curr_pending); num_pend > 0;
+	     num_pend = atomic_read(&sbi->s_curr_pending)) {
+		wait_queue_head_t wq;
+		init_waitqueue_head(&wq);
+		wait_event_timeout(wq,
+				  (atomic_read(&sbi->s_curr_pending) == 0),
+				  msecs_to_jiffies(100));
+	}
+
+	osduld_put_device(sbi->s_dev);
+	kfree(sb->s_fs_info);
+	sb->s_fs_info = NULL;
+}
+
+/*
+ * Read the superblock from the OSD and fill in the fields
+ */
+static int osdfs_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct inode *root;
+	struct osdfs_mountopt *opts = data;
+	struct osdfs_sb_info *sbi = NULL;    /*extended info                  */
+	struct osdfs_fscb fscb;		     /*on-disk superblock info        */
+	struct osd_request *req = NULL;
+	int ret;
+
+	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
+	if (!sbi)
+		return -ENOMEM;
+	sb->s_fs_info = sbi;
+
+	/* use mount options to fill superblock */
+	sbi->s_dev = osduld_path_lookup(opts->dev_name);
+	if (IS_ERR(sbi->s_dev)) {
+		ret = PTR_ERR(sbi->s_dev);
+		sbi->s_dev = NULL;
+		goto free_sbi;
+	}
+
+	sbi->s_pid = opts->pid;
+	sbi->s_timeout = opts->timeout;
+
+	/* fill in some other data by hand */
+	memset(sb->s_id, 0, sizeof(sb->s_id));
+	strcpy(sb->s_id, "osdfs");
+	sb->s_blocksize = OSDFS_BLKSIZE;
+	sb->s_blocksize_bits = OSDFS_BLKSHIFT;
+	atomic_set(&sbi->s_curr_pending, 0);
+	sb->s_bdev = NULL;
+	sb->s_dev = 0;
+
+	/* read data from on-disk superblock object */
+	make_credential(sbi->s_cred, sbi->s_pid, OSDFS_SUPER_ID);
+
+	req = prepare_osd_read(sbi->s_dev, sbi->s_pid, OSDFS_SUPER_ID,
+			       sizeof(struct osdfs_fscb), 0, 0,
+			       (unsigned char *)(&fscb));
+	if (!req) {
+		if (!silent)
+			printk(KERN_ERR
+			       "ERROR: could not prepare read request.\n");
+		ret = -ENOMEM;
+		goto free_sbi;
+	}
+
+	ret = osdfs_sync_op(req, sbi->s_timeout, sbi->s_cred);
+	if (ret != 0) {
+		if (!silent)
+			printk(KERN_ERR "ERROR: read super failed.\n");
+		ret = -EIO;
+		goto free_sbi;
+	}
+
+	sb->s_magic = fscb.s_magic;
+	sbi->s_nextid = fscb.s_nextid;
+	sbi->s_numfiles = fscb.s_numfiles;
+
+	/* make sure what we read from the object store is correct */
+	if (sb->s_magic != OSDFS_SUPER_MAGIC) {
+		if (!silent)
+			printk(KERN_ERR "ERROR: Bad magic value\n");
+		ret = -EINVAL;
+		goto free_sbi;
+	}
+
+	/* start generation numbers from a random point */
+	get_random_bytes(&sbi->s_next_generation, sizeof(u32));
+	spin_lock_init(&sbi->s_next_gen_lock);
+
+	/* set up operation vectors */
+	sb->s_op = &osdfs_sops;
+	root = osdfs_iget(sb, OSDFS_ROOT_ID - OSDFS_OBJ_OFF);
+	if (IS_ERR(root)) {
+		ret = PTR_ERR(root);
+		goto free_sbi;
+	}
+	sb->s_root = d_alloc_root(root);
+	if (!sb->s_root) {
+		iput(root);
+		printk(KERN_ERR "ERROR: get root inode failed\n");
+		ret = -ENOMEM;
+		goto free_sbi;
+	}
+
+	if (!S_ISDIR(root->i_mode)) {
+		dput(sb->s_root);
+		sb->s_root = NULL;
+		printk(KERN_ERR "ERROR: corrupt root inode (mode = 0%o)\n",
+		       root->i_mode);
+		ret = -EINVAL;
+		goto free_sbi;
+	}
+
+	ret = 0;
+out:
+	if (req)
+		free_osd_req(req);
+	return ret;
+
+free_sbi:
+	osduld_put_device(sbi->s_dev); /* NULL safe */
+	kfree(sbi);
+	goto out;
+}
+
+/*
+ * Set up the superblock (calls osdfs_fill_super eventually)
+ */
+static int osdfs_get_sb(struct file_system_type *type,
+			  int flags, const char *dev_name,
+			  void *data, struct vfsmount *mnt)
+{
+	struct osdfs_mountopt opts;
+	int ret;
+
+	ret = parse_options((char *) data, &opts);
+	if (ret)
+		return ret;
+
+	opts.dev_name = dev_name;
+	return get_sb_nodev(type, flags, &opts, osdfs_fill_super, mnt);
+}
+
+/*
+ * Return information about the file system state in the buffer.  This is used
+ * by the 'df' command, for example.
+ */
+static int osdfs_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+	struct super_block *sb = dentry->d_sb;
+	struct osdfs_sb_info *sbi = sb->s_fs_info;
+	uint8_t cred_a[OSD_CAP_LEN];
+	struct osd_request *req = NULL;
+	uint32_t page;
+	uint32_t attr;
+	uint16_t expected;
+	uint64_t capacity;
+	uint64_t used;
+	uint8_t *data;
+	int ret;
+
+	/* get used/capacity attributes */
+	make_credential(cred_a, sbi->s_pid, 0);
+
+	req = prepare_osd_get_attr(sbi->s_dev, sbi->s_pid, 0);
+	if (!req) {
+		printk(KERN_ERR "ERROR: prepare get_attr failed.\n");
+		return -ENOMEM;
+	}
+
+	prepare_get_attr_list_add_entry(req,
+			OSD_APAGE_PARTITION_QUOTAS,
+			OSD_ATTR_PQ_CAPACITY_QUOTA,
+			8);
+
+	prepare_get_attr_list_add_entry(req,
+			OSD_APAGE_PARTITION_INFORMATION,
+			OSD_ATTR_PI_USED_CAPACITY,
+			8);
+
+	ret = osdfs_sync_op(req, sbi->s_timeout, cred_a);
+	if (ret)
+		goto out;
+
+	page = OSD_APAGE_PARTITION_QUOTAS;
+	attr = OSD_ATTR_PQ_CAPACITY_QUOTA;
+	expected = 8;
+	ret = extract_next_attr_from_req(req, &page, &attr, &expected, &data);
+	if (ret) {
+		printk(KERN_ERR "ERROR: extract attr from req failed\n");
+		goto out;
+	}
+	capacity = be64_to_cpu(*((uint64_t *)data));
+
+	page = OSD_APAGE_PARTITION_INFORMATION;
+	attr = OSD_ATTR_PI_USED_CAPACITY;
+	expected = 8;
+	ret = extract_next_attr_from_req(req, &page, &attr, &expected, &data);
+	if (ret) {
+		printk(KERN_ERR "ERROR: extract attr from req failed\n");
+		goto out;
+	}
+	used = be64_to_cpu(*((uint64_t *)data));
+
+	/* fill in the stats buffer */
+	buf->f_type = OSDFS_SUPER_MAGIC;
+	buf->f_bsize = OSDFS_BLKSIZE;
+	buf->f_blocks = (capacity >> OSDFS_BLKSHIFT);
+	buf->f_bfree = ((capacity - used) >> OSDFS_BLKSHIFT);
+	buf->f_bavail = buf->f_bfree;
+	buf->f_files = sbi->s_numfiles;
+	buf->f_ffree = OSDFS_MAX_ID - sbi->s_numfiles;
+	buf->f_namelen = OSDFS_NAME_LEN;
+out:
+	free_osd_req(req);
+
+	return ret;
+}
+
+struct super_operations osdfs_sops = {
+	.alloc_inode    = osdfs_alloc_inode,
+	.destroy_inode  = osdfs_destroy_inode,
+	.write_inode    = osdfs_write_inode,
+	.delete_inode   = osdfs_delete_inode,
+	.put_super      = osdfs_put_super,
+	.write_super    = osdfs_write_super,
+	.statfs         = osdfs_statfs,
+};
+
+/******************************************************************************
+ * INSMOD/RMMOD
+ *****************************************************************************/
+
+/*
+ * struct that describes this file system
+ */
+static struct file_system_type osdfs_type = {
+	.owner          = THIS_MODULE,
+	.name           = "osdfs",
+	.get_sb         = osdfs_get_sb,
+	.kill_sb        = generic_shutdown_super,
+};
+
+static int __init init_osdfs(void)
+{
+	int err;
+
+	err = init_inodecache();
+	if (err)
+		goto out;
+
+	err = register_filesystem(&osdfs_type);
+	if (err)
+		goto out_d;
+
+	return 0;
+out_d:
+	destroy_inodecache();
+out:
+	return err;
+}
+
+static void __exit exit_osdfs(void)
+{
+	unregister_filesystem(&osdfs_type);
+	destroy_inodecache();
+}
+
+MODULE_AUTHOR("Avishay Traeger <avishay@gmail.com>");
+MODULE_DESCRIPTION("osdfs");
+MODULE_LICENSE("GPL");
+
+module_init(init_osdfs)
+module_exit(exit_osdfs)
-- 
1.6.0.1

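The following is a minimal user-space sketch of the rules that parse_options() above enforces:
- "to" defaults to 20.
- "pid" is mandatory and must be at least 65536.
- "to" must be positive.
- Unknown options are silently ignored, because the kernel switch statement has no default case.

The struct and function names are hypothetical. strtok() stands in for strsep() plus match_token(). Only decimal values are accepted, which is a simplification of the kernel's match_int().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct osdfs_mountopt. */
struct mountopt_sketch {
	long pid;
	long timeout;
};

/* Parse a non-empty decimal string; returns 0 and sets *out, or -1. */
static int parse_num(const char *s, long *out)
{
	char *end;

	if (*s == '\0')
		return -1;
	*out = strtol(s, &end, 10);
	return *end == '\0' ? 0 : -1;
}

/*
 * Same rules as parse_options(): timeout defaults to 20, pid is mandatory
 * and must be >= 65536, "to" must be > 0, unknown options are ignored.
 * Like strsep(), strtok() modifies 'options' in place; it also skips
 * empty tokens, matching the "if (!*p) continue;" in the patch.
 */
static int parse_options_sketch(char *options, struct mountopt_sketch *o)
{
	int have_pid = 0;
	char *p;
	long v;

	assert(options != NULL && o != NULL);
	memset(o, 0, sizeof(*o));
	o->timeout = 20;

	for (p = strtok(options, ","); p; p = strtok(NULL, ",")) {
		if (strncmp(p, "pid=", 4) == 0) {
			if (parse_num(p + 4, &v) || v < 65536)
				return -1;
			o->pid = v;
			have_pid = 1;
		} else if (strncmp(p, "to=", 3) == 0) {
			if (parse_num(p + 3, &v) || v <= 0)
				return -1;
			o->timeout = v;
		}
	}
	return have_pid ? 0 : -1;
}
```

For example, "pid=65540,to=30" parses cleanly, while "pid=100" or a string without any pid= option is rejected.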


* [RFC 7/9] osdfs: mkosdfs
  2008-10-30 14:26 [RFC 0/9] osdfs Boaz Harrosh
                   ` (5 preceding siblings ...)
  2008-10-30 14:35 ` [RFC 6/9] osdfs: super_operations and file_system_type Boaz Harrosh
@ 2008-10-30 14:36 ` Boaz Harrosh
  2008-10-30 15:03 ` [RFC 8/9] osdfs: Documentation Boaz Harrosh
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 14+ messages in thread
From: Boaz Harrosh @ 2008-10-30 14:36 UTC (permalink / raw)
  To: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd

We need a mechanism to prepare the file system (mkfs).
I chose to implement it by means of a couple of mount
options, because there is no user-mode API for committing
OSD commands, and because all of this is highly internal to
the file system itself.

- Added two mount options, mkfs=0/1 and format=capacity_in_meg, so that
  mkfs/format can be executed by kernel code just before mount. An mkosdfs
  utility can now be implemented as a script that mounts and unmounts the
  file system with the proper options.
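Such a utility could also be a tiny C program instead of a script. The following is a minimal sketch, assuming the Linux mount(2) and umount(2) calls from <sys/mount.h>. It builds the option string, then mounts once with mkfs=1 and unmounts.

A few caveats apply:
- build_mkfs_opts() and mkosdfs_sketch() are hypothetical names and are not part of this patch set.
- Mounting requires root and a /dev/osdX device that is already logged in to the target.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: build the option string that makes osdfs run
 * osdfs_mkfs() before mounting. format_meg == 0 leaves out "format=",
 * so no OSD FORMAT is issued. Returns the string length, or -1 if the
 * buffer is too small.
 */
static int build_mkfs_opts(char *buf, size_t len, unsigned int pid,
			   unsigned int format_meg)
{
	int n;

	assert(buf != NULL && len > 0);
	if (format_meg)
		n = snprintf(buf, len, "pid=%u,mkfs=1,format=%u",
			     pid, format_meg);
	else
		n = snprintf(buf, len, "pid=%u,mkfs=1", pid);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Mount once with mkfs=1 (which creates the file system), then unmount. */
int mkosdfs_sketch(const char *dev, const char *dir, unsigned int pid,
		   unsigned int format_meg)
{
	char opts[64];

	if (build_mkfs_opts(opts, sizeof(opts), pid, format_meg) < 0)
		return -1;
	if (mount(dev, dir, "osdfs", 0, opts) != 0)
		return -1;
	return umount(dir);
}
```

For example, mkosdfs_sketch("/dev/osd0", "/mnt/osdfs", 65540, 1024) would pass "pid=65540,mkfs=1,format=1024" to the kernel. Because mkfs=1 recreates the partition, it should only be run once per file system.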

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/osdfs/Kbuild    |    2 +-
 fs/osdfs/mkosdfs.c |  605 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/osdfs/osdfs.h   |    3 +
 fs/osdfs/super.c   |   18 ++
 4 files changed, 627 insertions(+), 1 deletions(-)
 create mode 100644 fs/osdfs/mkosdfs.c

diff --git a/fs/osdfs/Kbuild b/fs/osdfs/Kbuild
index a005c04..34c718b 100644
--- a/fs/osdfs/Kbuild
+++ b/fs/osdfs/Kbuild
@@ -20,5 +20,5 @@ EXTRA_CFLAGS += -I$(OSD_INC)
 # EXTRA_CFLAGS += -DCONFIG_OSDFS_DEBUG
 endif
 
-osdfs-objs := osd.o inode.o file.o symlink.o namei.o dir.o super.o
+osdfs-objs := osd.o inode.o file.o symlink.o namei.o dir.o super.o mkosdfs.o
 obj-$(CONFIG_OSDFS_FS) += osdfs.o
diff --git a/fs/osdfs/mkosdfs.c b/fs/osdfs/mkosdfs.c
new file mode 100644
index 0000000..663fe6f
--- /dev/null
+++ b/fs/osdfs/mkosdfs.c
@@ -0,0 +1,605 @@
+/*
+ * mkosdfs.c - make an osdfs file system.
+ *
+ * Copyright (C) 2005, 2006
+ * Avishay Traeger (avishay@gmail.com) (avishay@il.ibm.com)
+ * Copyright (C) 2005, 2006
+ * International Business Machines
+ *
+ * Copyrights from mke2fs.c:
+ *     Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
+ *     2003, 2004, 2005 by Theodore Ts'o.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include "osdfs.h"
+#include <linux/random.h>
+
+/* #define __MKOSDFS_DEBUG_CHECKS 1 */
+
+static int kick_it(struct osd_request *req, int timeout, uint8_t *cred_a,
+		   const char *op)
+{
+	return osdfs_sync_op(req, timeout, cred_a);
+}
+
+/* Format the LUN to the specified size */
+static int format(uint64_t lun_capacity, struct osd_dev *dev, int timeout)
+{
+	struct osd_request *req = prepare_osd_format_lun(dev, lun_capacity);
+	uint8_t cred_a[OSD_CAP_LEN];
+	int ret;
+
+	make_credential(cred_a, 0, 0);
+
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		return -ENOMEM;
+	}
+
+	ret = kick_it(req, timeout, cred_a, "format");
+
+	free_osd_req(req);
+
+	return ret;
+}
+
+static int create_partition(struct osd_dev *dev, uint64_t p_id, int timeout)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	bool try_remove = false;
+	int ret;
+
+	make_credential(cred_a, p_id, 0);
+
+create_part:
+	req = prepare_osd_create_partition(dev, p_id);
+	if (!req)
+		return -ENOMEM;
+	ret = kick_it(req, timeout, cred_a, "create partition");
+	free_osd_req(req);
+
+	if (ret && !try_remove) {
+		try_remove = true;
+		req = prepare_osd_remove_partition(dev, p_id);
+		if (!req)
+			return -ENOMEM;
+		ret = kick_it(req, timeout, cred_a, "remove partition");
+		free_osd_req(req);
+		if (!ret) /* Try again now */
+			goto create_part;
+	}
+
+	return ret;
+}
+
+#ifdef __MKOSDFS_DEBUG_CHECKS
+static int list(struct osd_dev *dev, uint64_t p_id, int timeout)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	unsigned char *buf = NULL;
+	int ret;
+	uint64_t total_matches;
+	uint64_t total_ret;
+	uint64_t *id_list;
+	int is_part, is_utd;
+	uint64_t cont;
+	uint32_t more;
+	int i;
+
+	buf = kzalloc(1024, GFP_KERNEL);
+	if (!buf) {
+		OSDFS_ERR("ERROR: Failed to allocate memory.\n");
+		return -ENOMEM;
+	}
+
+	make_credential(cred_a, p_id, 0);
+	req = prepare_osd_list(dev, p_id, 0, 1024, 0, 0, buf);
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		kfree(buf);
+		return -ENOMEM;
+	}
+
+	ret = kick_it(req, timeout, cred_a, "list");
+	if (ret != 0)
+		goto out;
+
+	ret = extract_list_from_req(req, &total_matches, &total_ret, &id_list,
+				    &is_part, &is_utd, &cont, &more);
+
+	OSDFS_DBGMSG("created %llu objects:\n", total_ret);
+	for (i = 0 ; i < total_ret ; i++)
+		OSDFS_DBGMSG("%llu\n", id_list[i]);
+
+out:
+	free_osd_req(req);
+	kfree(buf);
+
+	return ret;
+}
+#endif
+
+static int create(struct osd_dev *dev, uint64_t p_id, uint64_t o_id,
+		  int timeout)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	int ret;
+
+	make_credential(cred_a, p_id, o_id);
+	req = prepare_osd_create(dev, p_id, o_id);
+
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		return -ENOMEM;
+	}
+
+	ret = kick_it(req, timeout, cred_a, "create");
+
+	free_osd_req(req);
+
+	return ret;
+}
+
+static int write_super(struct osd_dev *dev, uint64_t p_id, int timeout,
+		       int newfile)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	struct osdfs_fscb data;
+	int ret;
+
+	make_credential(cred_a, p_id, OSDFS_SUPER_ID);
+
+	data.s_nextid = 4;
+	data.s_magic = OSDFS_SUPER_MAGIC;
+	data.s_newfs = 1;
+	if (newfile)
+		data.s_numfiles = 1;
+	else
+		data.s_numfiles = 0;
+
+	req = prepare_osd_write(dev, p_id, OSDFS_SUPER_ID,
+				sizeof(struct osdfs_fscb), 0, 0,
+				(unsigned char *)(&data));
+
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		return -ENOMEM;
+	}
+
+	ret = kick_it(req, timeout, cred_a, "write super");
+
+	free_osd_req(req);
+
+	return ret;
+}
+
+#ifdef __MKOSDFS_DEBUG_CHECKS
+static int read_super(struct osd_dev *dev, uint64_t p_id, int timeout)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	struct osdfs_fscb data;
+	int ret;
+
+	make_credential(cred_a, p_id, OSDFS_SUPER_ID);
+
+	req = prepare_osd_read(dev, p_id, OSDFS_SUPER_ID,
+				sizeof(struct osdfs_fscb), 0, 0,
+				(unsigned char *)(&data));
+
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		return -ENOMEM;
+	}
+
+	ret = kick_it(req, timeout, cred_a, "read super");
+	if (ret)
+		goto out;
+
+	OSDFS_DBGMSG("nextid:\t%u\n", data.s_nextid);
+	OSDFS_DBGMSG("magic:\t%u\n", data.s_magic);
+	OSDFS_DBGMSG("numfiles:\t%u\n", data.s_numfiles);
+out:
+	free_osd_req(req);
+
+	return ret;
+}
+#endif
+
+static int write_bitmap(struct osd_dev *dev, uint64_t p_id, int timeout)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	uint64_t off = 0;
+	unsigned int id = 3;
+	int ret;
+
+	/* XXX: For now just use counter - later make bitmap */
+	make_credential(cred_a, p_id, OSDFS_BM_ID);
+
+	req = prepare_osd_write(dev, p_id, OSDFS_BM_ID, sizeof(unsigned int),
+				off, 0, (unsigned char *)&id);
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		return -ENOMEM;
+	}
+
+	ret = kick_it(req, timeout, cred_a, "write bitmap");
+
+	free_osd_req(req);
+
+	return ret;
+}
+
+#ifdef __MKOSDFS_DEBUG_CHECKS
+static int write_testfile(struct osd_dev *dev, uint64_t p_id, int timeout)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	uint64_t off = 0;
+	unsigned char buf[64];
+	int ret;
+
+	strcpy((char *)buf, "This file is a test, it is only a test.");
+	make_credential(cred_a, p_id, OSDFS_TEST_ID);
+
+	req = prepare_osd_write(dev, p_id, OSDFS_TEST_ID, 64, off, 0, buf);
+
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		return -ENOMEM;
+	}
+
+	ret = kick_it(req, timeout, cred_a, "write test file");
+
+	free_osd_req(req);
+
+	return ret;
+}
+
+static int read_testfile(struct osd_dev *dev, uint64_t p_id, int timeout)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	unsigned char data[64];
+	int ret;
+
+	make_credential(cred_a, p_id, OSDFS_TEST_ID);
+
+	req = prepare_osd_read(dev, p_id, OSDFS_TEST_ID, 64, 0, 0, data);
+
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		return -ENOMEM;
+	}
+
+	ret = kick_it(req, timeout, cred_a, "read test file");
+	if (ret)
+		goto out;
+
+	OSDFS_DBGMSG("test file: %s\n", data);
+
+out:
+	free_osd_req(req);
+
+	return ret;
+}
+#endif
+
+static int write_rootdir(struct osd_dev *dev, uint64_t p_id, int timeout,
+			 int newfile)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	struct osdfs_dir_entry *dir;
+	uint64_t off = 0;
+	unsigned char *buf = NULL;
+	int filetype = OSDFS_FT_DIR << 8;
+	int filetype2 = OSDFS_FT_REG_FILE << 8;
+	int rec_len;
+	int done;
+	int ret;
+
+	buf = kzalloc(OSDFS_BLKSIZE, GFP_KERNEL);
+	if (!buf) {
+		OSDFS_ERR("ERROR: Failed to allocate memory.\n");
+		return -ENOMEM;
+	}
+	dir = (struct osdfs_dir_entry *)buf;
+
+	/* create entry for '.' */
+	dir->name[0] = '.';
+	dir->name_len = 1 | filetype;
+	dir->inode = OSDFS_ROOT_ID - OSDFS_OBJ_OFF;
+	dir->rec_len = OSDFS_DIR_REC_LEN(1);
+	rec_len = OSDFS_BLKSIZE - OSDFS_DIR_REC_LEN(1);
+
+	/* create entry for '..' */
+	dir = (struct osdfs_dir_entry *) (buf + dir->rec_len);
+	dir->name[0] = '.';
+	dir->name[1] = '.';
+	dir->name_len = 2 | filetype;
+	dir->inode = OSDFS_ROOT_ID - OSDFS_OBJ_OFF;
+	if (newfile) {
+		rec_len -= OSDFS_DIR_REC_LEN(2);
+		dir->rec_len = OSDFS_DIR_REC_LEN(2);
+	} else
+		dir->rec_len = rec_len;
+	done = OSDFS_DIR_REC_LEN(1) + dir->rec_len;
+
+	/* create entry for 'test', if specified */
+	if (newfile) {
+		dir = (struct osdfs_dir_entry *) (buf + done);
+		dir->inode = OSDFS_TEST_ID - OSDFS_OBJ_OFF;
+		dir->name_len = 4 | filetype2;
+		dir->name[0] = 't';
+		dir->name[1] = 'e';
+		dir->name[2] = 's';
+		dir->name[3] = 't';
+		dir->rec_len = rec_len;
+	}
+
+	make_credential(cred_a, p_id, OSDFS_ROOT_ID);
+
+	req = prepare_osd_write(dev, p_id, OSDFS_ROOT_ID, OSDFS_BLKSIZE, off,
+				0, buf);
+
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		kfree(buf);
+		return -ENOMEM;
+	}
+
+	ret = kick_it(req, timeout, cred_a, "write rootdir");
+	free_osd_req(req);
+	kfree(buf);
+	return ret;
+}
+
+static int set_inode(struct osd_dev *dev, uint64_t p_id, int timeout,
+		     uint64_t o_id, uint16_t mode)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	struct osdfs_fcb in = {0};
+	struct osdfs_fcb *inode = &in;
+	uint32_t i_generation;
+	int ret;
+
+	inode->i_mode = cpu_to_be16(mode);
+	inode->i_uid = inode->i_gid = 0;
+	inode->i_links_count = cpu_to_be16(2);
+	inode->i_ctime = inode->i_atime = inode->i_mtime =
+					cpu_to_be32(CURRENT_TIME.tv_sec);
+	inode->i_size = cpu_to_be64(OSDFS_BLKSIZE);
+	if (o_id != OSDFS_ROOT_ID)
+		inode->i_size = cpu_to_be64(64);
+
+	get_random_bytes(&i_generation, sizeof(i_generation));
+	inode->i_generation = cpu_to_be32(i_generation);
+
+	make_credential(cred_a, p_id, o_id);
+
+	req = prepare_osd_set_attr(dev, p_id, o_id);
+
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		return -ENOMEM;
+	}
+
+	prepare_set_attr_list_add_entry(req,
+			OSD_PAGE_NUM_IBM_UOBJ_FS_DATA,
+			OSD_ATTR_NUM_IBM_UOBJ_FS_DATA_INODE,
+			OSDFS_INO_ATTR_SIZE,
+			(unsigned char *)inode);
+
+	ret = kick_it(req, timeout, cred_a, "set inode");
+
+	free_osd_req(req);
+
+	return ret;
+}
+
+#ifdef __MKOSDFS_DEBUG_CHECKS
+static int get_root_attr(struct osd_dev *dev, uint64_t p_id, int timeout)
+{
+	struct osd_request *req;
+	uint8_t cred_a[OSD_CAP_LEN];
+	struct osdfs_fcb in = {0};
+	struct osdfs_fcb *inode = &in;
+	uint32_t page = OSD_PAGE_NUM_IBM_UOBJ_FS_DATA;
+	uint32_t attr = OSD_ATTR_NUM_IBM_UOBJ_FS_DATA_INODE;
+	uint16_t expected = OSDFS_INO_ATTR_SIZE;
+	uint8_t *buf;
+	int ret;
+
+	make_credential(cred_a, p_id, OSDFS_ROOT_ID);
+
+	req = prepare_osd_get_attr(dev, p_id, OSDFS_ROOT_ID);
+	if (req == NULL) {
+		OSDFS_ERR("ERROR: Failed to allocate request.\n");
+		return -ENOMEM;
+	}
+
+	prepare_get_attr_list_add_entry(req,
+			OSD_PAGE_NUM_IBM_UOBJ_FS_DATA,
+			OSD_ATTR_NUM_IBM_UOBJ_FS_DATA_INODE,
+			OSDFS_INO_ATTR_SIZE);
+
+	ret = kick_it(req, timeout, cred_a, "get root inode");
+	if (ret)
+		goto out;
+
+	ret = extract_next_attr_from_req(req, &page, &attr, &expected, &buf);
+	if (ret) {
+		OSDFS_ERR("ERROR: extract attr from req failed\n");
+		goto out;
+	}
+
+	memcpy(inode, buf, sizeof(struct osdfs_fcb));
+
+	OSDFS_DBGMSG("mode: %u\n", be16_to_cpu(inode->i_mode));
+	OSDFS_DBGMSG("uid: %u\n", be32_to_cpu(inode->i_uid));
+	OSDFS_DBGMSG("gid: %u\n", be32_to_cpu(inode->i_gid));
+	OSDFS_DBGMSG("links: %u\n", be16_to_cpu(inode->i_links_count));
+	OSDFS_DBGMSG("ctime: %u\n", be32_to_cpu(inode->i_ctime));
+	OSDFS_DBGMSG("atime: %u\n", be32_to_cpu(inode->i_atime));
+	OSDFS_DBGMSG("mtime: %u\n", be32_to_cpu(inode->i_mtime));
+	OSDFS_DBGMSG("gen: %u\n", be32_to_cpu(inode->i_generation));
+	OSDFS_DBGMSG("size: %llu\n", be64_to_cpu(inode->i_size));
+
+out:
+	free_osd_req(req);
+
+	return ret;
+}
+#endif
+
+/*
+ * This function creates an osdfs file system on the specified OSD partition.
+ */
+int osdfs_mkfs(struct osd_dev *dev, uint64_t p_id, uint64_t format_size_meg)
+{
+	int err;
+	const int to_format = (4 * 60 * HZ);
+	const int to_gen = (60 * HZ);
+	bool newfile = false;
+
+	/* Get a handle */
+	OSDFS_DBGMSG("setting up osdfs on partition %llu:\n", p_id);
+
+	/* Format LUN if requested */
+	if (format_size_meg > 0) {
+		OSDFS_DBGMSG("formatting %llu MB...\n", format_size_meg);
+		err = format(format_size_meg * 1024 * 1024, dev, to_format);
+		if (err)
+			goto out;
+		OSDFS_DBGMSG(" OK\n");
+	}
+
+	/* Create partition */
+	OSDFS_DBGMSG("creating partition...\n");
+	err = create_partition(dev, p_id, to_gen);
+	if (err)
+		goto out;
+	OSDFS_DBGMSG(" OK\n");
+
+	/* Create object with known ID for superblock info */
+	OSDFS_DBGMSG("creating superblock...\n");
+	err = create(dev, p_id, OSDFS_SUPER_ID, to_gen);
+	if (err)
+		goto out;
+	OSDFS_DBGMSG(" OK\n");
+
+	/* Create root directory object */
+	OSDFS_DBGMSG("creating root directory...\n");
+	err = create(dev, p_id, OSDFS_ROOT_ID, to_gen);
+	if (err)
+		goto out;
+	OSDFS_DBGMSG(" OK\n");
+
+	/* Create bitmap object */
+	OSDFS_DBGMSG("creating free ID bitmap...\n");
+	err = create(dev, p_id, OSDFS_BM_ID, to_gen);
+	if (err)
+		goto out;
+	OSDFS_DBGMSG(" OK\n");
+
+#ifdef __MKOSDFS_DEBUG_CHECKS
+	/* Create a test file, if specified by options */
+	if (newfile) {
+		OSDFS_DBGMSG("creating test file...\n");
+		err = create(dev, p_id, OSDFS_TEST_ID, to_gen);
+		if (err)
+			goto out;
+		OSDFS_DBGMSG(" OK\n");
+	}
+#endif
+
+	/* Write superblock */
+	OSDFS_DBGMSG("writing superblock...\n");
+	err = write_super(dev, p_id, to_gen, newfile);
+	if (err)
+		goto out;
+	OSDFS_DBGMSG(" OK\n");
+
+	/* Write root directory */
+	OSDFS_DBGMSG("writing root directory...\n");
+	err = write_rootdir(dev, p_id, to_gen, newfile);
+	if (err)
+		goto out;
+	OSDFS_DBGMSG(" OK\n");
+
+	/* Set root partition inode attribute */
+	OSDFS_DBGMSG("writing root inode...\n");
+	err = set_inode(dev, p_id, to_gen, OSDFS_ROOT_ID,
+			0040000 | (0777 & ~022));
+	if (err)
+		goto out;
+	OSDFS_DBGMSG(" OK\n");
+
+#ifdef __MKOSDFS_DEBUG_CHECKS
+	/* Set test file inode attribute */
+	if (newfile) {
+		OSDFS_DBGMSG("writing test inode...\n");
+		err = set_inode(dev, p_id, to_gen, OSDFS_TEST_ID,
+				0100000 | (0777 & ~022));
+		if (err)
+			goto out;
+		OSDFS_DBGMSG(" OK\n");
+	}
+#endif
+	/* Write bitmap */
+	OSDFS_DBGMSG("writing free ID bitmap...\n");
+	err = write_bitmap(dev, p_id, to_gen);
+	if (err)
+		goto out;
+	OSDFS_DBGMSG(" OK\n");
+
+#ifdef __MKOSDFS_DEBUG_CHECKS
+	/* Write test file */
+	if (newfile) {
+		OSDFS_DBGMSG("writing test file...\n");
+		err = write_testfile(dev, p_id, to_gen);
+		if (err)
+			goto out;
+		OSDFS_DBGMSG(" OK\n");
+	}
+
+	/* some debug info */
+	{
+		OSDFS_DBGMSG("listing:\n");
+		list(dev, p_id, to_gen);
+		OSDFS_DBGMSG("contents of superblock:\n");
+		read_super(dev, p_id, to_gen);
+		OSDFS_DBGMSG("contents of root inode:\n");
+		get_root_attr(dev, p_id, to_gen);
+		if (newfile) {
+			OSDFS_DBGMSG("contents of test file:\n");
+			read_testfile(dev, p_id, to_gen);
+		}
+	}
+#endif
+	OSDFS_DBGMSG("\nsetup complete: enjoy your shiny new osdfs!\n");
+
+out:
+	return err;
+}
diff --git a/fs/osdfs/osdfs.h b/fs/osdfs/osdfs.h
index af8b998..5dd36c7 100644
--- a/fs/osdfs/osdfs.h
+++ b/fs/osdfs/osdfs.h
@@ -201,6 +201,9 @@ int extract_list_from_req(struct osd_request *req,
 
 void free_osd_req(struct osd_request *req);
 
+/* mkosdfs.c             */
+int osdfs_mkfs(struct osd_dev *dev, uint64_t p_id, uint64_t format_size_meg);
+
 /* inode.c               */
 void osdfs_truncate(struct inode *inode);
 extern struct inode *osdfs_iget(struct super_block *, unsigned long);
diff --git a/fs/osdfs/super.c b/fs/osdfs/super.c
index 095b960..3f70e78 100644
--- a/fs/osdfs/super.c
+++ b/fs/osdfs/super.c
@@ -55,6 +55,8 @@ enum { Opt_lun, Opt_tid, Opt_pid, Opt_to, Opt_mkfs, Opt_format, Opt_err };
 static match_table_t tokens = {
 	{Opt_pid, "pid=%u"},
 	{Opt_to, "to=%u"},
+	{Opt_mkfs, "mkfs=%u"},
+	{Opt_format, "format=%u"},
 	{Opt_err, NULL}
 };
 
@@ -100,6 +102,16 @@ static int parse_options(char *options, struct osdfs_mountopt *opts)
 			}
 			opts->timeout = option;
 			break;
+		case Opt_mkfs:
+			if (match_int(&args[0], &option))
+				return -EINVAL;
+			opts->mkfs = option != 0;
+			break;
+		case Opt_format:
+			if (match_int(&args[0], &option))
+				return -EINVAL;
+			opts->format = option;
+			break;
 		}
 	}
 
@@ -277,6 +289,12 @@ static int osdfs_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_bdev = NULL;
 	sb->s_dev = 0;
 
+	/* see if we need to make the file system on the OSD */
+	if (opts->mkfs) {
+		OSDFS_DBGMSG("osdfs_mkfs %p\n", sbi->s_dev);
+		osdfs_mkfs(sbi->s_dev, sbi->s_pid, opts->format);
+	}
+
 	/* read data from on-disk superblock object */
 	make_credential(sbi->s_cred, sbi->s_pid, OSDFS_SUPER_ID);
 
-- 
1.6.0.1

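The root directory block that write_rootdir() builds follows the ext2 layout that osdfs inherits. Each entry's rec_len covers that entry, and the last entry's rec_len runs to the end of the block. name_len carries the name length in its low byte and the file type in its high byte, which is what "1 | filetype" does.

The sketch below reproduces that arithmetic for the default case, in which no test file is written. Since osdfs.h is not shown here, three values are assumptions borrowed from ext2:
- the 4096-byte block size;
- the 8-byte entry header with 4-byte rounding;
- the directory type code 2.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values (osdfs.h is not shown): 4096-byte blocks, an ext2-style
 * 8-byte entry header with 4-byte rounding, and ext2's FT_DIR code of 2.
 */
#define SK_BLKSIZE		4096u
#define SK_FT_DIR		2u
#define SK_DIR_REC_LEN(n)	(((n) + 8u + 3u) & ~3u)

/* Low byte: name length; high byte: file type (as "1 | filetype" does). */
static uint16_t pack_name_len(unsigned int len, unsigned int ftype)
{
	assert(len < 256);
	return (uint16_t)(len | (ftype << 8));
}

/*
 * rec_len of '.' and '..' when no test file is written: '..' absorbs the
 * rest of the block, so the two entries exactly tile it.
 */
static void root_rec_lens(unsigned int *dot, unsigned int *dotdot)
{
	*dot = SK_DIR_REC_LEN(1u);
	*dotdot = SK_BLKSIZE - SK_DIR_REC_LEN(1u);
}
```

Under these assumptions '.' takes 12 bytes and '..' takes the remaining 4084, so a directory reader that walks rec_len lands exactly on the block boundary.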


* [RFC 8/9] osdfs: Documentation
  2008-10-30 14:26 [RFC 0/9] osdfs Boaz Harrosh
                   ` (6 preceding siblings ...)
  2008-10-30 14:36 ` [RFC 7/9] osdfs: mkosdfs Boaz Harrosh
@ 2008-10-30 15:03 ` Boaz Harrosh
  2008-10-30 15:04 ` [RFC 9/9] [out-of-tree] open-osd: Global Makefile and do-osdfs test script Boaz Harrosh
  2008-11-03 21:07 ` [RFC 0/9] osdfs Jeff Garzik
  9 siblings, 0 replies; 14+ messages in thread
From: Boaz Harrosh @ 2008-10-30 15:03 UTC (permalink / raw)
  To: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd

Added some documentation in osdfs.txt, as well as a BUGS file.

For further reading, operation instructions, example scripts
and up-to-date information and code, please see:
http://open-osd.org

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 Documentation/filesystems/osdfs.txt |  173 +++++++++++++++++++++++++++++++++++
 fs/osdfs/BUGS                       |    6 +
 2 files changed, 179 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/osdfs.txt
 create mode 100644 fs/osdfs/BUGS

diff --git a/Documentation/filesystems/osdfs.txt b/Documentation/filesystems/osdfs.txt
new file mode 100644
index 0000000..2579b3c
--- /dev/null
+++ b/Documentation/filesystems/osdfs.txt
@@ -0,0 +1,173 @@
+===============================================================================
+WHAT IS OSDFS?
+===============================================================================
+
+osdfs is a file system that uses an OSD and exports the API of a normal Linux
+file system. Users access osdfs like any other local file system, and osdfs
+will in turn issue commands to the local initiator.
+
+===============================================================================
+ENVIRONMENT
+===============================================================================
+
+To use this file system, you need to have an object store to run it on.  You
+may download a target from:
+http://open-osd.org
+
+See drivers/scsi/osd/README for how to set up a working OSD environment.
+
+===============================================================================
+USAGE
+===============================================================================
+
+1. Download and compile osdfs and open-osd initiator:
+  You need an external Kernel source tree or kernel headers from your
+  distribution. (anything based on 2.6.26 or later).
+
+  a. download open-osd including osdfs source using:
+     [parent-directory]$ git clone git://git.open-osd.org/open-osd.git
+
+  b. Build the library module like this:
+     [parent-directory]$ make -C KSRC=$(KER_DIR) open-osd
+
+     This will build both the open-osd initiator and the osdfs kernel
+     module. Use whatever parameters you compiled your kernel with, and set
+     $(KER_DIR) above to point at the kernel you compile against. See the file
+     open-osd/top-level-Makefile for an example.
+
+2. Get the OSD initiator and target set up properly, and log in to the target.
+  See drivers/scsi/osd/README for further instructions. Also see ./do-osd-test
+  for an example script that performs all these steps.
+
+3. Insmod the osdfs.ko module:
+   [osdfs]$ insmod osdfs.ko
+
+4. Make sure the directory where you want to mount exists. If not, create it.
+   (For example, /mnt/osdfs)
+
+5. On first use, run the mkosdfs.c routine by mounting with mkfs=1.
+
+   For example, this creates the file system on /dev/osd0, partition ID
+   65540, with a maximum capacity of 1024 MB:
+
+   mount -t osdfs -o pid=65540,mkfs=1,format=1024 /dev/osd0 /mnt/osdfs/
+
+   The format=1024 option is optional. If it is not specified, no OSD_FORMAT
+   is performed, and a clean file system is created in the specified pid,
+   using the available space of the target.
+   If the pid already exists, it is deleted and a new one is created in its
+   place. Be careful.
+
+6. Mount the file system. The command above leaves the file system mounted;
+   on subsequent mounts, omit mkfs=1.
+
+   For example, to mount /dev/osd0, partition ID 65540 on /mnt/osdfs:
+
+	mount -t osdfs -o pid=65540 /dev/osd0 /mnt/osdfs/
+
+7. For reference (under fs/osdfs/):
+	do-osdfs start - an example of how to perform the above steps.
+	do-osdfs stop -  an example of how to unmount the file system.
+
+8. Extra compilation flags (uncomment in fs/osdfs/Kbuild):
+	OSDFS_DEBUG - for debug messages and extra checks.
+
+===============================================================================
+osdfs mount options
+===============================================================================
+Similar to any mount command:
+	mount -t osdfs -o osdfs_options /dev/osdX mount_osdfs_directory
+
+Where:
+    -t osdfs: specifies the osdfs file system
+
+    /dev/osdX: X is a decimal number. /dev/osdX is created after a successful
+               login to an OSD target.
+
+    mount_osdfs_directory: The directory to mount the file system on
+
+    osdfs_options: Options are separated by commas (,)
+		pid=<integer> - The partition number to mount/create as the
+                                container of the file system.
+                                This option is mandatory.
+		mkfs=<1/0>    - If mkfs=1, make a new file system before mounting.
+                                Default is 0 (don't make one). If mkfs=0, pid
+                                must exist and must have been created by a
+                                previous mount with mkfs=1.
+		format=<integer>- If mkfs=1 is specified, the format= parameter
+                                also invokes an OSD_FORMAT command prior to
+                                creating the file system partition (mkfs).
+                                The integer specified is in megabytes. If it
+                                is not specified or set to 0, no format is
+                                executed, and a partition is created in the
+                                available space.
+                                If mkfs=0, this option is ignored.
+		to=<integer>  - Timeout in ticks for a single command.
+                                Default is (60 * HZ). [For debugging only]
+
+===============================================================================
+DESIGN
+===============================================================================
+
+* The file system control block (AKA on-disk superblock) resides in an object
+  with a special ID (defined in common.h).
+  Information in the file system control block is used to fill the in-memory
+  superblock structure at mount time. This object is created by mkosdfs.c
+  before the file system is first used. It contains information such as:
+	- The file system's magic number
+	- The next inode number to be allocated
+
+* Each file resides in its own object, which contains the file's data. (It
+  should eventually be possible to extend a file over multiple objects, but
+  this has not been implemented yet.)
+
+* A directory is treated as a file, and essentially contains a list of <file
+  name, inode #> pairs for the files found in that directory. The object IDs
+  correspond to the files' inode numbers. They are currently allocated using a
+  counter; the plan is to allocate them according to a bitmap (stored in a
+  separate object).
+
+* Each file's control block (AKA on-disk inode) is stored in its object's
+  attributes. This applies to both regular files and other types (directories,
+  device files, symlinks, etc.).
+
+* Credentials are generated per object (inode and superblock) when it is
+  created in memory (read off disk or created). The credential works for all
+  operations and is used as long as the object remains in memory.
+
+* Async OSD operations are used whenever possible, but the target may execute
+  them out of order. The operations that concern us are create, delete,
+  readpage, writepage, update_inode, and truncate. The following pairs of
+  operations should execute in the order written, and we need to prevent them
+  from executing in reverse order:
+	- The following are handled with the OBJ_CREATED and OBJ_2BCREATED
+	  flags. OBJ_CREATED is set when we know the object exists on the OSD -
+	  in create's callback function, and when we successfully do a read_inode.
+	  OBJ_2BCREATED is set at the beginning of the create function, so we
+	  know that we should wait.
+		- create/delete: delete should wait until the object is created
+		  on the OSD.
+		- create/readpage: readpage should be able to return a page
+		  full of zeroes in this case. If there was a write already
+		  en route (i.e. create, writepage, readpage), then the page
+		  would be locked, and so it would really be the same as
+		  create/writepage.
+		- create/writepage: if writepage is called for a sync write, it
+		  should wait until the object is created on the OSD.
+		  Otherwise, it should just return.
+		- create/truncate: truncate should wait until the object is
+		  created on the OSD.
+		- create/update_inode: update_inode should wait until the
+		  object is created on the OSD.
+	- Handled by VFS locks:
+		- readpage/delete: shouldn't happen because of page lock.
+		- writepage/delete: shouldn't happen because of page lock.
+		- readpage/writepage: shouldn't happen because of page lock.
+
+===============================================================================
+LICENSE/COPYRIGHT
+===============================================================================
+The osdfs file system is based on ext2 v0.5b (distributed with the Linux kernel
+version 2.6.10).  All files include the original copyrights, and the license
+is GPL version 2 (only version 2, as is true for the Linux kernel).  The
+Linux kernel can be downloaded from www.kernel.org.
diff --git a/fs/osdfs/BUGS b/fs/osdfs/BUGS
new file mode 100644
index 0000000..6d6e1f9
--- /dev/null
+++ b/fs/osdfs/BUGS
@@ -0,0 +1,6 @@
+- Some mount time options should have been 64-bit, but are declared as 32-bit
+  because that's what the kernel's parsing methods support at this time.
+
+- Out-of-space may cause a severe problem if the object (and directory entry)
+  were written, but writing the inode attributes failed. If the file system is
+  then unmounted and remounted, the kernel can get into an endless readdir loop.
-- 
1.6.0.1


* [RFC 9/9] [out-of-tree] open-osd: Global Makefile and do-osdfs test script
  2008-10-30 14:26 [RFC 0/9] osdfs Boaz Harrosh
                   ` (7 preceding siblings ...)
  2008-10-30 15:03 ` [RFC 8/9] osdfs: Documentation Boaz Harrosh
@ 2008-10-30 15:04 ` Boaz Harrosh
  2008-11-03 21:07 ` [RFC 0/9] osdfs Jeff Garzik
  9 siblings, 0 replies; 14+ messages in thread
From: Boaz Harrosh @ 2008-10-30 15:04 UTC (permalink / raw)
  To: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd

Added a global out-of-tree Makefile that includes both fs/osdfs
and drivers/scsi/osd. This way, symbols imported from libosd are
accounted for by the kernel's build system.

Added a small interactive script to test-drive osdfs mount/unmount
and perform some tests.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 Makefile |   20 ++++++++++++
 do-osdfs |  106 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+), 0 deletions(-)
 create mode 100644 Makefile
 create mode 100755 do-osdfs

diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..caa26f6
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,20 @@
+OSD_INC=`pwd`/include
+LIBOSD=drivers/scsi/osd
+OSDFS=fs/osdfs
+
+# Kbuild part (embedded in this Makefile)
+obj-m := $(LIBOSD)/ $(OSDFS)/
+
+# Makefile for out-of-tree builds
+KSRC ?= /lib/modules/$(shell uname -r)/build
+KBUILD_OUTPUT ?=
+ARCH ?=
+
+# this is the basic Kbuild out-of-tree invocation, with the M= option
+KBUILD_BASE = +$(MAKE) -C $(KSRC) M=`pwd` KBUILD_OUTPUT=$(KBUILD_OUTPUT) ARCH=$(ARCH)
+
+all: ;
+	$(KBUILD_BASE) OSD_INC=$(OSD_INC) modules
+
+clean: ;
+	$(KBUILD_BASE) clean
diff --git a/do-osdfs b/do-osdfs
new file mode 100755
index 0000000..495a1f7
--- /dev/null
+++ b/do-osdfs
@@ -0,0 +1,106 @@
+#!/bin/sh
+#
+
+MOUNTDIR=`dirname $0`/mnt
+DEV_OSD=/dev/osd0
+PID=65536
+
+do_cmd()
+{
+	$* 2>&1 | logger -t `basename $1` &
+}
+
+prompt()
+{
+	read -p "$* >>> "
+}
+
+start_osdf()
+{
+	insmod osdfs.ko
+# 	add-symbol-file ../../osd-initiator/so_mod.ko
+	add-symbol-file osdfs.ko
+}
+
+stop_osdf()
+{
+	rmmod osdfs
+}
+
+start_mount()
+{
+	OPT="pid=$PID,to=50"
+	mount -t osdfs -o $OPT $DEV_OSD $MOUNTDIR
+}
+
+stop_mount()
+{
+	umount $MOUNTDIR
+}
+
+mkosdfs_format()
+{
+	OPT="pid=$PID,to=50,mkfs=1,format=10000"
+	mount -t osdfs -o $OPT $DEV_OSD $MOUNTDIR
+}
+
+osdfs_hello_world()
+{
+	echo hello > $MOUNTDIR/world
+	cat $MOUNTDIR/world
+}
+
+case $1 in
+stop)
+	echo $0 Stopping | logger
+
+	prompt stop_mount
+	stop_mount
+
+	prompt stop_osdf
+	stop_osdf
+
+	echo $0 Stopped | logger
+	;;
+
+test)
+
+# TODO: Write lots of tests here
+# osdfs_hello_world && rm $MOUNTDIR/world;
+# cp a kernel-git-tree, edit, git-diff
+# unmount, mount, diff-with-original-tree
+# compile kernel
+# ...
+	;;
+
+format)
+	prompt mkosdfs_format
+	mkosdfs_format
+	;;
+
+format_start)
+	echo $0 Starting | logger
+
+	prompt start_osdf
+	start_osdf
+
+	prompt mkosdfs_format
+	mkosdfs_format
+
+	prompt osdfs_hello_world
+	osdfs_hello_world
+
+	echo $0 Initialized | logger
+	;;
+*)
+	echo $0 Starting | logger
+
+	prompt start_osdf
+	start_osdf
+
+	prompt start_mount
+	start_mount
+
+	echo $0 Initialized | logger
+	;;
+esac
-- 
1.6.0.1


* Re: [RFC 0/9] osdfs
  2008-10-30 14:26 [RFC 0/9] osdfs Boaz Harrosh
                   ` (8 preceding siblings ...)
  2008-10-30 15:04 ` [RFC 9/9] [out-of-tree] open-osd: Global Makefile and do-osdfs test script Boaz Harrosh
@ 2008-11-03 21:07 ` Jeff Garzik
  2008-11-04  8:04   ` [osd-dev] " Benny Halevy
  9 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2008-11-03 21:07 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Avishay Traeger, linux-scsi, linux-fsdevel, open-osd, LKML

Boaz Harrosh wrote:
> Please review an OSD based file system. 
> 
> Given that our OSD initiator library is accepted into Kernel, we would
> like to also submit an osdfs. This is the first iteration of this file system.
> 
> The next stage is to make it exportable by the pNFS-over-objects Server.
> osdfs is one of the building blocks for a full, end-to-end open source
> reference implementation of a Server/Client pNFS-over-objects we
> want to have available in Linux. Other parts are the Generic pNFS
> client project with the objects-layout-driver, and the generic pNFS
> server plus osdfs once it is adapted to be exportable.
> (See all about pNFS in Linux at:
> http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design)
> 
> osdfs was originally developed by Avishay Traeger <avishay@gmail.com>
> from IBM. A very old version of it is hosted on sourceforge as the osdfs
> project. It was originally developed for the 2.6.10 Kernel over the old
> IBM's osd-initiator Linux driver.
> 
> Since then it was picked by us, open-osd, and was both forward ported to
> current Kernel, as well as converted to run over our osd Kernel Library.
> The conversion effort, if anyone is interested, is also available as a
> patchset here:
>   git-clone git://git-open-osd.org/open-osd.git osdfs-devel
> or on the web at:
>   http://git.open-osd.org/gitweb.cgi?p=open-osd.git;a=shortlog;h=refs/heads/osdfs-devel
> 
> The Original code is based on ext2 code from the Kernel at the time.
> Further reading is available at the last patch in the osdfs.txt file.
> 
> I have mechanically divided the code in parts, each introducing a
> group of vfs function vectors, all tied at the end into a full filesystem.
> Each patch can be compiled but it will only run at the very end.
> This was done for the hope of easier reviewing.
> 
> Here is the list of patches
> [RFC 1/9] osdfs: osd Swiss army knife
> [RFC 2/9] osdfs: file and file_inode operations
> [RFC 3/9] osdfs: symlink_inode and fast_symlink_inode operations
> [RFC 4/9] osdfs: address_space_operations
> [RFC 5/9] osdfs: dir_inode and directory operations
> [RFC 6/9] osdfs: super_operations and file_system_type
> [RFC 7/9] osdfs: mkosdfs
> [RFC 8/9] osdfs: Documentation

Pretty cool stuff.

I've been wondering when we would start seeing OSD filesystems make 
their appearance.

Random, unordered comments:

* This is important stuff.  Should have been posted to LKML.  Please CC 
LKML in the future.

* As discussed at the filesystem summit, OSD use implies a need for an 
MD-like layer for OSD objects.  Has anyone even started the design work 
for this?

* I tend to think there is room for more than one OSD filesystem in the 
Linux kernel.  Assuming all OSDs will use the same Linux filesystem 
driver will lead to bloat, and you potentially "code yourself into a 
corner."  Let's not rule out multiple filesystems.

As such, "osdfs" seems like too-generic a name. How about boazfs?  :)

* Get this into the kernel ASAP!  OSD stuff languishes outside the 
kernel for _far_ too long.  OSD is a key storage technology that needs 
to be developed in the full light of the Linux community, not off in a 
dark corner somewhere, where few see progress or discussions.

Object-based storage, and its SCSI incarnation OSD, is a MAJOR revision 
of the block storage API, moving away from LBA-addressed linear APIs. 
That's a big deal, and should be discussed on LKML, IMO...

	Jeff

* Re: [osd-dev] [RFC 0/9] osdfs
  2008-11-03 21:07 ` [RFC 0/9] osdfs Jeff Garzik
@ 2008-11-04  8:04   ` Benny Halevy
  2008-11-04 10:11     ` Boaz Harrosh
  0 siblings, 1 reply; 14+ messages in thread
From: Benny Halevy @ 2008-11-04  8:04 UTC (permalink / raw)
  Cc: open-osd development, Boaz Harrosh, linux-fsdevel,
	Avishay Traeger, linux-scsi, LKML, Jeff Garzik

On Nov. 03, 2008, 23:07 +0200, Jeff Garzik <jeff@garzik.org> wrote:
> Boaz Harrosh wrote:
>> Please review an OSD based file system. 
>>
>> Given that our OSD initiator library is accepted into Kernel, we would
>> like to also submit an osdfs. This is the first iteration of this file system.
>>
>> The next stage is to make it exportable by the pNFS-over-objects Server.
>> osdfs is one of the building blocks for a full, end-to-end open source
>> reference implementation of a Server/Client pNFS-over-objects we
>> want to have available in Linux. Other parts are the Generic pNFS
>> client project with the objects-layout-driver, and the generic pNFS
>> server plus osdfs once it is adapted to be exportable.
>> (See all about pNFS in Linux at:
>> http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design)
>>
>> osdfs was originally developed by Avishay Traeger <avishay@gmail.com>
>> from IBM. A very old version of it is hosted on sourceforge as the osdfs
>> project. It was originally developed for the 2.6.10 Kernel over the old
>> IBM's osd-initiator Linux driver.
>>
>> Since then it was picked by us, open-osd, and was both forward ported to
>> current Kernel, as well as converted to run over our osd Kernel Library.
>> The conversion effort, if anyone is interested, is also available as a
>> patchset here:
>>   git-clone git://git-open-osd.org/open-osd.git osdfs-devel
>> or on the web at:
>>   http://git.open-osd.org/gitweb.cgi?p=open-osd.git;a=shortlog;h=refs/heads/osdfs-devel
>>
>> The Original code is based on ext2 code from the Kernel at the time.
>> Further reading is available at the last patch in the osdfs.txt file.
>>
>> I have mechanically divided the code in parts, each introducing a
>> group of vfs function vectors, all tied at the end into a full filesystem.
>> Each patch can be compiled but it will only run at the very end.
>> This was done for the hope of easier reviewing.
>>
>> Here is the list of patches
>> [RFC 1/9] osdfs: osd Swiss army knife
>> [RFC 2/9] osdfs: file and file_inode operations
>> [RFC 3/9] osdfs: symlink_inode and fast_symlink_inode operations
>> [RFC 4/9] osdfs: address_space_operations
>> [RFC 5/9] osdfs: dir_inode and directory operations
>> [RFC 6/9] osdfs: super_operations and file_system_type
>> [RFC 7/9] osdfs: mkosdfs
>> [RFC 8/9] osdfs: Documentation
> 
> Pretty cool stuff.
> 
> I've been wondering when we would start seeing OSD filesystems make 
> their appearance.
> 
> Random, unordered comments:
> 
> * This is important stuff.  Should have been posted to LKML.  Please CC 
> LKML in the future.
> 
> * As discussed at the filesystem summit, OSD use implies a need for an 
> MD-like layer for OSD objects.  Has anyone even started the design work 
> for this?

Yes.  I have.
I'm coding a prototype to be used by both this file system and by
the pnfs objects layout driver.
Initially it will do striping and mirroring, and RAID-5 parity as a
stretched goal for the initial release.

> 
> * I tend to think there is room for more than one OSD filesystem in the 
> Linux kernel.  Assuming all OSDs will use the same Linux filesystem 
> driver will lead to bloat, and you potentially "code yourself into a 
> corner."  Let's not rule out multiple filesystems.
> 
> As such, "osdfs" seems like too-generic a name. How about boazfs?  :)

I agree.  osdfs is the name given by Avishay and IBM and we just adopted it.
I think that obfs (Object-based File System) would better represent
what it is (although it's still generic compared to boazfs :-)

> 
> * Get this into the kernel ASAP!  OSD stuff languishes outside the 
> kernel for _far_ too long.  OSD is a key storage technology that needs 
> to be developed in the full light of the Linux community, not off in a 
> dark corner somewhere, where few see progress or discussions.

I completely agree.  We've missed the merge window for 2.6.28 but
if we can get it into 2.6.29 that would be great!

> 
> Object-based storage, and its SCSI incarnation OSD, is a MAJOR revision 
> of the block storage API, moving away from LBA-addressed linear APIs. 
> That's a big deal, and should be discussed on LKML, IMO...

Absolutely.
Thanks for your comments!

Benny

> 
> 	Jeff
> 

* Re: [osd-dev] [RFC 0/9] osdfs
  2008-11-04  8:04   ` [osd-dev] " Benny Halevy
@ 2008-11-04 10:11     ` Boaz Harrosh
  2008-11-04 10:28       ` Avishay Traeger
  0 siblings, 1 reply; 14+ messages in thread
From: Boaz Harrosh @ 2008-11-04 10:11 UTC (permalink / raw)
  To: Benny Halevy, Jeff Garzik
  Cc: open-osd development, linux-fsdevel, Avishay Traeger, linux-scsi,
	LKML, Sami.Iren

Benny Halevy wrote:
> On Nov. 03, 2008, 23:07 +0200, Jeff Garzik <jeff@garzik.org> wrote:
>> Boaz Harrosh wrote:
>>> Please review an OSD based file system. 
>>>
>>> Given that our OSD initiator library is accepted into Kernel, we would
>>> like to also submit an osdfs. This is the first iteration of this file system.
>>>
>>> The next stage is to make it exportable by the pNFS-over-objects Server.
>>> osdfs is one of the building blocks for a full, end-to-end open source
>>> reference implementation of a Server/Client pNFS-over-objects we
>>> want to have available in Linux. Other parts are the Generic pNFS
>>> client project with the objects-layout-driver, and the generic pNFS
>>> server plus osdfs once it is adapted to be exportable.
>>> (See all about pNFS in Linux at:
>>> http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design)
>>>
>>> osdfs was originally developed by Avishay Traeger <avishay@gmail.com>
>>> from IBM. A very old version of it is hosted on sourceforge as the osdfs
>>> project. It was originally developed for the 2.6.10 Kernel over the old
>>> IBM's osd-initiator Linux driver.
>>>
>>> Since then it was picked by us, open-osd, and was both forward ported to
>>> current Kernel, as well as converted to run over our osd Kernel Library.
>>> The conversion effort, if anyone is interested, is also available as a
>>> patchset here:
>>>   git-clone git://git-open-osd.org/open-osd.git osdfs-devel
>>> or on the web at:
>>>   http://git.open-osd.org/gitweb.cgi?p=open-osd.git;a=shortlog;h=refs/heads/osdfs-devel
>>>
>>> The Original code is based on ext2 code from the Kernel at the time.
>>> Further reading is available at the last patch in the osdfs.txt file.
>>>
>>> I have mechanically divided the code in parts, each introducing a
>>> group of vfs function vectors, all tied at the end into a full filesystem.
>>> Each patch can be compiled but it will only run at the very end.
>>> This was done for the hope of easier reviewing.
>>>
>>> Here is the list of patches
>>> [RFC 1/9] osdfs: osd Swiss army knife
>>> [RFC 2/9] osdfs: file and file_inode operations
>>> [RFC 3/9] osdfs: symlink_inode and fast_symlink_inode operations
>>> [RFC 4/9] osdfs: address_space_operations
>>> [RFC 5/9] osdfs: dir_inode and directory operations
>>> [RFC 6/9] osdfs: super_operations and file_system_type
>>> [RFC 7/9] osdfs: mkosdfs
>>> [RFC 8/9] osdfs: Documentation
>> Pretty cool stuff.
>>
>> I've been wondering when we would start seeing OSD filesystems make 
>> their appearance.
>>
>> Random, unordered comments:
>>
>> * This is important stuff.  Should have been posted to LKML.  Please CC 
>> LKML in the future.
>>
>> * As discussed at the filesystem summit, OSD use implies a need for an 
>> MD-like layer for OSD objects.  Has anyone even started the design work 
>> for this?
> 
> Yes.  I have.
> I'm coding a prototype to be used by both this file system and by
> the pnfs objects layout driver.
> Initially it will do striping and mirroring, and RAID-5 parity as a
> stretched goal for the initial release.
> 

Thanks Benny, it would be nice to connect all these things together.

>> * I tend to think there is room for more than one OSD filesystem in the 
>> Linux kernel.  Assuming all OSDs will use the same Linux filesystem 
>> driver will lead to bloat, and you potentially "code yourself into a 
>> corner."  Let's not rule out multiple filesystems.
>>
>> As such, "osdfs" seems like too-generic a name. How about boazfs?  :)
> 
> I agree.  osdfs is the name given by Avishay and IBM and we just adopted it.
> I think that obfs (Object-based File System) would better represent
> what it is (although it's still generic compared to boazfs :-)
> 

If at all then it's avishayfs, but unless I write this filesystem from
scratch I don't think I can do anything about the name. The code is
copyrighted to Avishay Traeger, and that is the name he chose. Also
he has a sourceforge.net project of that name for a long time.

Avishay would you be willing to change the name? See above, people
think it is too generic a name. Like if someone would do a scsifs
or blocksfs.

Personally for me it's just a name, I don't mind either way.

>> * Get this into the kernel ASAP!  OSD stuff languishes outside the 
>> kernel for _far_ too long.  OSD is a key storage technology that needs 
>> to be developed in the full light of the Linux community, not off in a 
>> dark corner somewhere, where few see progress or discussions.
> 
> I completely agree.  We've missed the merge window for 2.6.28 but
> if we can get it into 2.6.29 that would be great!
> 
>> Object-based storage, and its SCSI incarnation OSD, is a MAJOR revision 
>> of the block storage API, moving away from LBA-addressed linear APIs. 
>> That's a big deal, and should be discussed on LKML, IMO...
> 
> Absolutely.
> Thanks for your comments!
> 

I've been working on this OSD stuff for a while now, and I'm very excited
about it; it feels very powerful yet simple. I was able to reach
stable results (hopefully) in a relatively short time, and the code size of
both the initiator and osdfs is pretty small.

I forgot to mention in my introduction that I was able to clone
a git tree over an osdfs mount and compile a kernel, which actually
runs. I made changes, ran git-diff and committed them. Then I
unmounted, remounted and diffed against the original tree, with success.
So it is functional. With Benny's stuff it might even get fast with the right setup.

> Benny
> 
>> 	Jeff
>>

Boaz

* Re: [osd-dev] [RFC 0/9] osdfs
  2008-11-04 10:11     ` Boaz Harrosh
@ 2008-11-04 10:28       ` Avishay Traeger
  0 siblings, 0 replies; 14+ messages in thread
From: Avishay Traeger @ 2008-11-04 10:28 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: linux-fsdevel, linux-scsi, LKML

On Tue, Nov 4, 2008 at 12:11 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:

<snip>

>>> As such, "osdfs" seems like too-generic a name. How about boazfs?  :)
>>
>> I agree.  osdfs is the name given by Avishay and IBM and we just adopted it.
>> I think that obfs (Object-based File System) would better represent
>> what it is (although it's still generic compared to boazfs :-)
>>
>
> If at all then it's avishayfs, but unless I write this filesystem from
> scratch I don't think I can do anything about the name. The code is
> copyrighted to Avishay Traeger, and that is the name he chose. Also
> he has a sourceforge.net project of that name for a long time.
>
> Avishay would you be willing to change the name? See above, people
> think it is too generic a name. Like if someone would do a scsifs
> or blocksfs.
>
> Personally for me it's just a name, I don't mind either way.

I don't care if we change the name.  I suppose we can discuss
alternatives outside of the mailing lists, but I'd rather not do
boazfs or avishayfs :-)

By the way, just to clarify: I wrote this file system at IBM, but they
were nice enough to give me full ownership of the code.  I then
proceeded to put the code up on sourceforge.  Thanks to Boaz and Benny
for reviving the code and submitting it for inclusion in the kernel.

Avishay

Thread overview: 14+ messages
2008-10-30 14:26 [RFC 0/9] osdfs Boaz Harrosh
2008-10-30 14:30 ` [RFC 1/9] osdfs: osd Swiss army knife Boaz Harrosh
2008-10-30 14:31 ` [RFC 2/9] osdfs: file and file_inode operations Boaz Harrosh
2008-10-30 14:32 ` [RFC 3/9] osdfs: symlink_inode and fast_symlink_inode operations Boaz Harrosh
2008-10-30 14:33 ` [RFC 4/9] osdfs: address_space_operations Boaz Harrosh
2008-10-30 14:34 ` [RFC 5/9] osdfs: dir_inode and directory operations Boaz Harrosh
2008-10-30 14:35 ` [RFC 6/9] osdfs: super_operations and file_system_type Boaz Harrosh
2008-10-30 14:36 ` [RFC 7/9] osdfs: mkosdfs Boaz Harrosh
2008-10-30 15:03 ` [RFC 8/9] osdfs: Documentation Boaz Harrosh
2008-10-30 15:04 ` [RFC 9/9] [out-of-tree] open-osd: Global Makefile and do-osdfs test script Boaz Harrosh
2008-11-03 21:07 ` [RFC 0/9] osdfs Jeff Garzik
2008-11-04  8:04   ` [osd-dev] " Benny Halevy
2008-11-04 10:11     ` Boaz Harrosh
2008-11-04 10:28       ` Avishay Traeger
