Re: AZFS file system proposal

From: Dmitri Vorobiev <dmitri.vorobiev@movial.fi>
To: Maxim Shchetynin <maxim@linux.vnet.ibm.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: AZFS file system proposal
Date: Tue, 17 Jun 2008 18:02:41 +0300
Message-ID: <4857D211.1030308@movial.fi>
In-Reply-To: <20080609104650.4f220492@mercedes-benz.boeblingen.de.ibm.com>

Maxim Shchetynin wrote:
> Hello,
> 
> there are some users which have interest on such kind of file system like azfs. Please, have a look at this version of a diff file which introduces a first version of azfs to 2.6.26. This file system may be useful for example on IBM CellBlades where user can mount DDR2 memory of Axon controller as a disk and to be able to access it directly without any caching mechanism in between.
> 
> Subject: azfs: initial submit of azfs, a non-buffered filesystem
> 
> From: Maxim Shchetynin <maxim@de.ibm.com>
> 
> Non-buffered filesystem for block devices with a gendisk and
> with direct_access() method in gendisk->fops.
> AZFS does not buffer outgoing traffic and is doing no read ahead.
> It supports mount options (given with -o) bs=x,uid=x,gid=x.
> If block-size (bs) is not specified AZFS uses block-size used
> by block device. Though mmap() method is available only if
> block-size equals to or is greater than the system page size.
> 
> Signed-off-by: Maxim Shchetynin <maxim@de.ibm.com>
> 
> diff -Nuar linux-2.6.26-rc5/arch/powerpc/configs/cell_defconfig linux-2.6.26-rc5-azfs/arch/powerpc/configs/cell_defconfig
> --- linux-2.6.26-rc5/arch/powerpc/configs/cell_defconfig	2008-06-05 05:10:44.000000000 +0200
> +++ linux-2.6.26-rc5-azfs/arch/powerpc/configs/cell_defconfig	2008-06-06 11:53:34.000000000 +0200
> @@ -240,6 +240,7 @@
>  # CPU Frequency drivers
>  #
>  CONFIG_AXON_RAM=m
> +CONFIG_AZ_FS=m
>  # CONFIG_FSL_ULI1575 is not set
>  
>  #
> diff -Nuar linux-2.6.26-rc5/fs/Kconfig linux-2.6.26-rc5-azfs/fs/Kconfig
> --- linux-2.6.26-rc5/fs/Kconfig	2008-06-05 05:10:44.000000000 +0200
> +++ linux-2.6.26-rc5-azfs/fs/Kconfig	2008-06-06 16:55:11.616419992 +0200
> @@ -360,6 +360,17 @@
>  	  If you are not using a security module that requires using
>  	  extended attributes for file security labels, say N.
>  
> +config AZ_FS
> +	tristate "AZFS filesystem support"
> +	help
> +	  Non-buffered filesystem for block devices with a gendisk and
> +	  with direct_access() method in gendisk->fops.
> +	  AZFS does not buffer outgoing traffic and is doing no read ahead.
> +	  It supports mount options (given with -o) bs=x,uid=x,gid=x.
> +	  If block-size (bs) is not specified AZFS uses block-size used
> +	  by block device. Though mmap() method is available only if
> +	  block-size equals to or is greater than system page size.
> +
>  config JFS_FS
>  	tristate "JFS filesystem support"
>  	select NLS
> diff -Nuar linux-2.6.26-rc5/fs/Makefile linux-2.6.26-rc5-azfs/fs/Makefile
> --- linux-2.6.26-rc5/fs/Makefile	2008-06-05 05:10:44.000000000 +0200
> +++ linux-2.6.26-rc5-azfs/fs/Makefile	2008-06-06 11:53:34.000000000 +0200
> @@ -119,3 +119,4 @@
>  obj-$(CONFIG_DEBUG_FS)		+= debugfs/
>  obj-$(CONFIG_OCFS2_FS)		+= ocfs2/
>  obj-$(CONFIG_GFS2_FS)           += gfs2/
> +obj-$(CONFIG_AZ_FS)		+= azfs.o
> diff -Nuar linux-2.6.26-rc5/fs/azfs.c linux-2.6.26-rc5-azfs/fs/azfs.c
> --- linux-2.6.26-rc5/fs/azfs.c	1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.26-rc5-azfs/fs/azfs.c	2008-06-06 17:46:23.587653053 +0200
> @@ -0,0 +1,1179 @@
> +/*
> + * (C) Copyright IBM Deutschland Entwicklung GmbH 2007
> + *
> + * Author: Maxim Shchetynin <maxim@de.ibm.com>
> + *
> + * Non-buffered filesystem driver.
> + * It registers a filesystem which may be used for all kind of block devices
> + * which have a direct_access() method in block_device_operations.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2, or (at your option)
> + * any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> + */
> +
> +#include <linux/backing-dev.h>
> +#include <linux/blkdev.h>
> +#include <linux/cache.h>
> +#include <linux/dcache.h>
> +#include <linux/device.h>
> +#include <linux/err.h>
> +#include <linux/fs.h>
> +#include <linux/genhd.h>
> +#include <linux/kernel.h>
> +#include <linux/limits.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/mount.h>
> +#include <linux/mm.h>
> +#include <linux/mm_types.h>
> +#include <linux/mutex.h>
> +#include <linux/namei.h>
> +#include <linux/pagemap.h>
> +#include <linux/parser.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/stat.h>
> +#include <linux/statfs.h>
> +#include <linux/string.h>
> +#include <linux/time.h>
> +#include <linux/types.h>
> +#include <linux/aio.h>
> +#include <linux/uio.h>
> +#include <asm/bug.h>
> +#include <asm/page.h>
> +#include <asm/pgtable.h>
> +#include <asm/string.h>
> +
> +#define AZFS_FILESYSTEM_NAME		"azfs"
> +#define AZFS_FILESYSTEM_FLAGS		FS_REQUIRES_DEV
> +
> +#define AZFS_SUPERBLOCK_MAGIC		0xABBA1972
> +#define AZFS_SUPERBLOCK_FLAGS		MS_NOEXEC | \
> +					MS_SYNCHRONOUS | \
> +					MS_DIRSYNC | \
> +					MS_ACTIVE
> +
> +#define AZFS_BDI_CAPABILITIES		BDI_CAP_NO_ACCT_DIRTY | \
> +					BDI_CAP_NO_WRITEBACK | \
> +					BDI_CAP_MAP_COPY | \
> +					BDI_CAP_MAP_DIRECT | \
> +					BDI_CAP_VMFLAGS
> +
> +#define AZFS_CACHE_FLAGS		SLAB_HWCACHE_ALIGN | \
> +					SLAB_RECLAIM_ACCOUNT | \
> +					SLAB_MEM_SPREAD
> +
> +enum azfs_direction {
> +	AZFS_MMAP,
> +	AZFS_READ,
> +	AZFS_WRITE
> +};
> +
> +struct azfs_super {
> +	struct list_head		list;
> +	unsigned long			media_size;
> +	unsigned long			block_size;
> +	unsigned short			block_shift;
> +	unsigned long			sector_size;
> +	unsigned short			sector_shift;
> +	uid_t				uid;
> +	gid_t				gid;
> +	unsigned long			ph_addr;
> +	unsigned long			io_addr;
> +	struct block_device		*blkdev;
> +	struct dentry			*root;
> +	struct list_head		block_list;
> +	rwlock_t			lock;
> +};
> +
> +struct azfs_super_list {
> +	struct list_head		head;
> +	spinlock_t			lock;
> +};
> +
> +struct azfs_block {
> +	struct list_head		list;
> +	unsigned long			id;
> +	unsigned long			count;
> +};
> +
> +struct azfs_znode {
> +	struct list_head		block_list;
> +	rwlock_t			lock;
> +	loff_t				size;
> +	struct inode			vfs_inode;
> +};
> +
> +static struct azfs_super_list		super_list;
> +static struct kmem_cache		*azfs_znode_cache __read_mostly = NULL;
> +static struct kmem_cache		*azfs_block_cache __read_mostly = NULL;
> +
> +#define I2Z(inode) \
> +	container_of(inode, struct azfs_znode, vfs_inode)
> +
> +#define for_each_block(block, block_list) \
> +	list_for_each_entry(block, block_list, list)
> +#define for_each_block_reverse(block, block_list) \
> +	list_for_each_entry_reverse(block, block_list, list)
> +#define for_each_block_safe(block, ding, block_list) \
> +	list_for_each_entry_safe(block, ding, block_list, list)
> +#define for_each_block_safe_reverse(block, ding, block_list) \
> +	list_for_each_entry_safe_reverse(block, ding, block_list, list)
> +
> +/**
> + * azfs_block_init - create and initialise a new block in a list
> + * @block_list: destination list
> + * @id: block id
> + * @count: size of a block
> + */
> +static inline struct azfs_block*
> +azfs_block_init(struct list_head *block_list,
> +		unsigned long id, unsigned long count)
> +{
> +	struct azfs_block *block;
> +
> +	block = kmem_cache_alloc(azfs_block_cache, GFP_KERNEL);
> +	if (!block)
> +		return NULL;
> +
> +	block->id = id;
> +	block->count = count;
> +
> +	INIT_LIST_HEAD(&block->list);
> +	list_add_tail(&block->list, block_list);
> +
> +	return block;
> +}
> +
> +/**
> + * azfs_block_free - remove block from a list and free it back in cache
> + * @block: block to be removed
> + */
> +static inline void
> +azfs_block_free(struct azfs_block *block)
> +{
> +	list_del(&block->list);
> +	kmem_cache_free(azfs_block_cache, block);
> +}
> +
> +/**
> + * azfs_block_move - move block to another list
> + * @block: block to be moved
> + * @block_list: destination list
> + */
> +static inline void
> +azfs_block_move(struct azfs_block *block, struct list_head *block_list)
> +{
> +	list_move_tail(&block->list, block_list);
> +}
> +
> +/**
> + * azfs_recherche - get real address of a part of a file
> + * @inode: inode
> + * @direction: data direction
> + * @from: offset for read/write operation
> + * @size: pointer to a value of the amount of data to be read/written
> + */
> +static unsigned long
> +azfs_recherche(struct inode *inode, enum azfs_direction direction,

At the risk of being damned by the entire francophone world, I'd still
suggest using an English keyword for the function name here.
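
For instance, a name along the lines of azfs_lookup() (my invention,
pick whatever fits) would say what the routine does: map a file offset
to a location on the media by walking the inode's block list. Below is
a rough userspace model of that walk. struct extent, extent_lookup()
and the -1 return for an offset past the last extent are assumptions
made for illustration only; none of them come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for one run of consecutive media blocks. */
struct extent {
	unsigned long id;	/* first media block of the run */
	unsigned long count;	/* number of blocks in the run */
};

/*
 * Translate byte offset `from` within a file into a byte offset on the
 * media. On return *size is clamped to the bytes left in the extent
 * that was hit. Returns -1 if `from` lies beyond the last extent
 * (a choice made for this sketch).
 */
static long extent_lookup(const struct extent *ext, size_t n,
			  unsigned int block_shift,
			  unsigned long from, unsigned long *size)
{
	unsigned long block_size = 1UL << block_shift;
	unsigned long block_id = from >> block_shift;
	unsigned long within = from & (block_size - 1);
	unsigned long left;
	size_t i;

	assert(ext != NULL && size != NULL);

	/* Skip whole extents until the one containing block_id is found. */
	for (i = 0; i < n; i++) {
		if (ext[i].count > block_id)
			break;
		block_id -= ext[i].count;
	}
	if (i == n)
		return -1;

	/* Bytes from `from` to the end of this extent. */
	left = ((ext[i].count - block_id) << block_shift) - within;
	if (*size > left)
		*size = left;

	return (long)(((ext[i].id + block_id) << block_shift) + within);
}
```

With extents {10, 4} and {100, 2} and 4 KiB blocks, an offset of
5 * 4096 + 100 lands in the second block of the second extent, and a
request for 8000 bytes is clamped to the 3996 bytes remaining there.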

> +	       unsigned long from, unsigned long *size)
> +{
> +	struct azfs_super *super;
> +	struct azfs_znode *znode;
> +	struct azfs_block *block;
> +	unsigned long block_id, west, east;
> +
> +	super = inode->i_sb->s_fs_info;
> +	znode = I2Z(inode);
> +
> +	if (from + *size > znode->size) {
> +		i_size_write(inode, from + *size);
> +		inode->i_op->truncate(inode);
> +	}
> +
> +	read_lock(&znode->lock);
> +
> +	if (list_empty(&znode->block_list)) {
> +		read_unlock(&znode->lock);
> +		return 0;
> +	}
> +
> +	block_id = from >> super->block_shift;
> +
> +	for_each_block(block, &znode->block_list) {
> +		if (block->count > block_id)
> +			break;
> +		block_id -= block->count;
> +	}
> +
> +	west = from % super->block_size;
> +	east = ((block->count - block_id) << super->block_shift) - west;
> +
> +	if (*size > east)
> +		*size = east;
> +
> +	block_id = ((block->id + block_id) << super->block_shift) + west;
> +
> +	read_unlock(&znode->lock);
> +
> +	block_id += direction == AZFS_MMAP ? super->ph_addr : super->io_addr;
> +
> +	return block_id;
> +}
> +
> +static struct inode*
> +azfs_new_inode(struct super_block *, struct inode *, int, dev_t);

Would it not be better to move this function prototype, along with the
macro definitions you have above, into a private header?

> +
> +/**
> + * azfs_mknod - mknod() method for inode_operations
> + * @dir, @dentry, @mode, @dev: see inode_operations methods
> + */
> +static int
> +azfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
> +{
> +	struct inode *inode;
> +
> +	inode = azfs_new_inode(dir->i_sb, dir, mode, dev);
> +	if (!inode)
> +		return -ENOSPC;
> +
> +	if (S_ISREG(mode))
> +		I2Z(inode)->size = 0;
> +
> +	dget(dentry);
> +	d_instantiate(dentry, inode);
> +
> +	return 0;
> +}
> +
> +/**
> + * azfs_create - create() method for inode_operations
> + * @dir, @dentry, @mode, @nd: see inode_operations methods
> + */
> +static int
> +azfs_create(struct inode *dir, struct dentry *dentry, int mode,
> +	    struct nameidata *nd)
> +{
> +	return azfs_mknod(dir, dentry, mode | S_IFREG, 0);
> +}
> +
> +/**
> + * azfs_mkdir - mkdir() method for inode_operations
> + * @dir, @dentry, @mode: see inode_operations methods
> + */
> +static int
> +azfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
> +{
> +	int rc;
> +
> +	rc = azfs_mknod(dir, dentry, mode | S_IFDIR, 0);
> +	if (rc == 0)

Maybe "if (!rc)"?

> +		inc_nlink(dir);
> +
> +	return rc;
> +}
> +
> +/**
> + * azfs_symlink - symlink() method for inode_operations
> + * @dir, @dentry, @name: see inode_operations methods
> + */
> +static int
> +azfs_symlink(struct inode *dir, struct dentry *dentry, const char *name)
> +{
> +	struct inode *inode;
> +	int rc;
> +
> +	inode = azfs_new_inode(dir->i_sb, dir, S_IFLNK | S_IRWXUGO, 0);
> +	if (!inode)
> +		return -ENOSPC;
> +
> +	rc = page_symlink(inode, name, strlen(name) + 1);
> +	if (rc) {
> +		iput(inode);
> +		return rc;
> +	}
> +
> +	dget(dentry);
> +	d_instantiate(dentry, inode);
> +
> +	return 0;
> +}
> +
> +/**
> + * azfs_aio_read - aio_read() method for file_operations
> + * @iocb, @iov, @nr_segs, @pos: see file_operations methods
> + */
> +static ssize_t
> +azfs_aio_read(struct kiocb *iocb, const struct iovec *iov,
> +	      unsigned long nr_segs, loff_t pos)
> +{
> +	struct inode *inode;
> +	void *ziel;

void *target?

> +	unsigned long pin;
> +	unsigned long size, todo, step;
> +	ssize_t rc;
> +
> +	inode = iocb->ki_filp->f_mapping->host;
> +
> +	mutex_lock(&inode->i_mutex);
> +
> +	if (pos >= i_size_read(inode)) {
> +		rc = 0;
> +		goto out;
> +	}
> +
> +	ziel = iov->iov_base;
> +	todo = min((loff_t) iov->iov_len, i_size_read(inode) - pos);
> +
> +	for (step = todo; step; step -= size) {
> +		size = step;
> +		pin = azfs_recherche(inode, AZFS_READ, pos, &size);
> +		if (!pin) {
> +			rc = -ENOSPC;
> +			goto out;
> +		}
> +		if (copy_to_user(ziel, (void*) pin, size)) {
> +			rc = -EFAULT;
> +			goto out;
> +		}
> +
> +		iocb->ki_pos += size;
> +		pos += size;
> +		ziel += size;
> +	}
> +
> +	rc = todo;
> +
> +out:
> +	mutex_unlock(&inode->i_mutex);
> +
> +	return rc;
> +}
> +
> +/**
> + * azfs_aio_write - aio_write() method for file_operations
> + * @iocb, @iov, @nr_segs, @pos: see file_operations methods
> + */
> +static ssize_t
> +azfs_aio_write(struct kiocb *iocb, const struct iovec *iov,
> +	       unsigned long nr_segs, loff_t pos)
> +{
> +	struct inode *inode;
> +	void *quell;

void *source?

> +	unsigned long pin;
> +	unsigned long size, todo, step;
> +	ssize_t rc;
> +
> +	inode = iocb->ki_filp->f_mapping->host;
> +
> +	quell = iov->iov_base;
> +	todo = iov->iov_len;
> +
> +	mutex_lock(&inode->i_mutex);
> +
> +	for (step = todo; step; step -= size) {
> +		size = step;
> +		pin = azfs_recherche(inode, AZFS_WRITE, pos, &size);
> +		if (!pin) {
> +			rc = -ENOSPC;
> +			goto out;
> +		}
> +		if (copy_from_user((void*) pin, quell, size)) {
> +			rc = -EFAULT;
> +			goto out;
> +		}
> +
> +		iocb->ki_pos += size;
> +		pos += size;
> +		quell += size;
> +	}
> +
> +	rc = todo;
> +
> +out:
> +	mutex_unlock(&inode->i_mutex);
> +
> +	return rc;
> +}
> +
> +/**
> + * azfs_open - open() method for file_operations
> + * @inode, @file: see file_operations methods
> + */
> +static int
> +azfs_open(struct inode *inode, struct file *file)
> +{
> +	file->private_data = inode;
> +
> +	if (file->f_flags & O_TRUNC) {
> +		i_size_write(inode, 0);
> +		inode->i_op->truncate(inode);
> +	}
> +	if (file->f_flags & O_APPEND)
> +		inode->i_fop->llseek(file, 0, SEEK_END);
> +
> +	return 0;
> +}
> +
> +/**
> + * azfs_mmap - mmap() method for file_operations
> + * @file, @vm: see file_operations methods
> + */
> +static int
> +azfs_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct azfs_super *super;
> +	struct azfs_znode *znode;
> +	struct inode *inode;
> +	unsigned long cursor, pin;
> +	unsigned long todo, size, vm_start;
> +	pgprot_t page_prot;
> +
> +	inode = file->private_data;
> +	znode = I2Z(inode);
> +	super = inode->i_sb->s_fs_info;
> +
> +	if (super->block_size < PAGE_SIZE)
> +		return -EINVAL;
> +
> +	cursor = vma->vm_pgoff << super->block_shift;
> +	todo = vma->vm_end - vma->vm_start;
> +
> +	if (cursor + todo > i_size_read(inode))
> +		return -EINVAL;
> +
> +	page_prot = pgprot_val(vma->vm_page_prot);
> +	page_prot |= (_PAGE_NO_CACHE | _PAGE_RW);
> +	page_prot &= ~_PAGE_GUARDED;
> +	vma->vm_page_prot = __pgprot(page_prot);
> +
> +	vm_start = vma->vm_start;
> +	for (size = todo; todo; todo -= size, size = todo) {
> +		pin = azfs_recherche(inode, AZFS_MMAP, cursor, &size);
> +		if (!pin)
> +			return -EAGAIN;
> +		pin >>= PAGE_SHIFT;
> +		if (remap_pfn_range(vma, vm_start, pin, size, vma->vm_page_prot))
> +			return -EAGAIN;
> +
> +		vm_start += size;
> +		cursor += size;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * azfs_truncate - truncate() method for inode_operations
> + * @inode: see inode_operations methods
> + */
> +static void
> +azfs_truncate(struct inode *inode)
> +{
> +	struct azfs_super *super;
> +	struct azfs_znode *znode;
> +	struct azfs_block *block, *ding, *knoten, *west, *east;

The risk of me getting damned increases with this, but maybe it would be
better to use an English identifier instead of "knoten"?
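
To illustrate with English names, here is a userspace sketch of what I
understand the shrink path to be doing: a run of blocks is handed back
to the free list and merged with adjacent free runs. It uses "left" and
"right" for what the patch calls "west" and "east". The fixed-size
array, struct free_list and free_list_release() are hypothetical and
exist only in this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FREE 16	/* arbitrary capacity for the sketch */

struct extent {
	unsigned long id;	/* first block of the run */
	unsigned long count;	/* number of blocks in the run */
};

struct free_list {
	struct extent ext[MAX_FREE];
	size_t n;
};

/*
 * Give the run [id, id + count) back to the free list, merging it with
 * a free extent that ends at `id` (left neighbour) and/or one that
 * starts at `id + count` (right neighbour).
 * Returns 0 on success, -1 if a new entry is needed but the list is full.
 */
static int free_list_release(struct free_list *fl,
			     unsigned long id, unsigned long count)
{
	struct extent *left = NULL, *right = NULL;
	size_t i;

	assert(fl != NULL && count > 0);

	for (i = 0; i < fl->n; i++) {
		struct extent *e = &fl->ext[i];

		if (e->id + e->count == id)
			left = e;
		else if (id + count == e->id)
			right = e;
	}

	if (left && right) {
		/* The released run bridges two free runs: fuse all three. */
		left->count += count + right->count;
		*right = fl->ext[fl->n - 1];	/* drop `right` */
		fl->n--;
	} else if (left) {
		left->count += count;
	} else if (right) {
		right->id -= count;
		right->count += count;
	} else {
		if (fl->n == MAX_FREE)
			return -1;
		fl->ext[fl->n].id = id;
		fl->ext[fl->n].count = count;
		fl->n++;
	}

	return 0;
}
```

Releasing blocks 4..9 into a list holding {0, 4} and {10, 6} leaves a
single free extent {0, 16}.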

> +	unsigned long id, count;
> +	signed long delta;
> +
> +	super = inode->i_sb->s_fs_info;
> +	znode = I2Z(inode);
> +
> +	delta = i_size_read(inode) + (super->block_size - 1);
> +	delta >>= super->block_shift;
> +	delta -= inode->i_blocks;
> +
> +	if (delta == 0) {
> +		znode->size = i_size_read(inode);
> +		return;
> +	}
> +
> +	write_lock(&znode->lock);
> +
> +	while (delta > 0) {
> +		west = east = NULL;
> +
> +		write_lock(&super->lock);
> +
> +		if (list_empty(&super->block_list)) {
> +			write_unlock(&super->lock);
> +			break;
> +		}
> +
> +		for (count = delta; count; count--) {
> +			for_each_block(block, &super->block_list)
> +				if (block->count >= count) {
> +					east = block;
> +					break;
> +				}
> +			if (east)
> +				break;
> +		}
> +
> +		for_each_block_reverse(block, &znode->block_list) {
> +			if (block->id + block->count == east->id)
> +				west = block;
> +			break;
> +		}
> +
> +		if (east->count == count) {
> +			if (west) {
> +				west->count += east->count;
> +				azfs_block_free(east);
> +			} else {
> +				azfs_block_move(east, &znode->block_list);
> +			}
> +		} else {
> +			if (west) {
> +				west->count += count;
> +			} else {
> +				if (!azfs_block_init(&znode->block_list,
> +						east->id, count)) {
> +					write_unlock(&super->lock);
> +					break;
> +				}
> +			}
> +
> +			east->id += count;
> +			east->count -= count;
> +		}
> +
> +		write_unlock(&super->lock);
> +
> +		inode->i_blocks += count;
> +
> +		delta -= count;
> +	}
> +
> +	while (delta < 0) {
> +		for_each_block_safe_reverse(block, knoten, &znode->block_list) {
> +			id = block->id;
> +			count = block->count;
> +			if ((signed long) count + delta > 0) {
> +				block->count += delta;
> +				id += block->count;
> +				count -= block->count;
> +				block = NULL;
> +			}
> +
> +			west = east = NULL;
> +
> +			write_lock(&super->lock);
> +
> +			for_each_block(ding, &super->block_list) {
> +				if (!west && (ding->id + ding->count == id))
> +					west = ding;
> +				else if (!east && (id + count == ding->id))
> +					east = ding;
> +				if (west && east)
> +					break;
> +			}
> +
> +			if (west && east) {
> +				west->count += count + east->count;
> +				azfs_block_free(east);
> +				if (block)
> +					azfs_block_free(block);
> +			} else if (west) {
> +				west->count += count;
> +				if (block)
> +					azfs_block_free(block);
> +			} else if (east) {
> +				east->id -= count;
> +				east->count += count;
> +				if (block)
> +					azfs_block_free(block);
> +			} else {
> +				if (!block) {
> +					if (!azfs_block_init(&super->block_list,
> +							id, count)) {
> +						write_unlock(&super->lock);
> +						break;
> +					}
> +				} else {
> +					azfs_block_move(block, &super->block_list);
> +				}
> +			}
> +
> +			write_unlock(&super->lock);
> +
> +			inode->i_blocks -= count;
> +
> +			delta += count;
> +
> +			break;
> +		}
> +	}
> +
> +	write_unlock(&znode->lock);
> +
> +	znode->size = min(i_size_read(inode),
> +			(loff_t) inode->i_blocks << super->block_shift);
> +}
> +
> +/**
> + * azfs_getattr - getattr() method for inode_operations
> + * @mnt, @dentry, @stat: see inode_operations methods
> + */
> +static int
> +azfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> +{
> +	struct azfs_super *super;
> +	struct inode *inode;
> +	unsigned short shift;
> +
> +	inode = dentry->d_inode;
> +	super = inode->i_sb->s_fs_info;
> +
> +	generic_fillattr(inode, stat);
> +	stat->blocks = inode->i_blocks;
> +	shift = super->block_shift - super->sector_shift;
> +	if (shift)
> +		stat->blocks <<= shift;
> +
> +	return 0;
> +}
> +
> +static const struct address_space_operations azfs_aops = {
> +	.write_begin	= simple_write_begin,
> +	.write_end	= simple_write_end
> +};
> +
> +static struct backing_dev_info azfs_bdi = {
> +	.ra_pages	= 0,
> +	.capabilities	= AZFS_BDI_CAPABILITIES
> +};
> +
> +static struct inode_operations azfs_dir_iops = {
> +	.create		= azfs_create,
> +	.lookup		= simple_lookup,
> +	.link		= simple_link,
> +	.unlink		= simple_unlink,
> +	.symlink	= azfs_symlink,
> +	.mkdir		= azfs_mkdir,
> +	.rmdir		= simple_rmdir,
> +	.mknod		= azfs_mknod,
> +	.rename		= simple_rename
> +};
> +
> +static const struct file_operations azfs_reg_fops = {
> +	.llseek		= generic_file_llseek,
> +	.aio_read	= azfs_aio_read,
> +	.aio_write	= azfs_aio_write,
> +	.open		= azfs_open,
> +	.mmap		= azfs_mmap,
> +	.fsync		= simple_sync_file,
> +};
> +
> +static struct inode_operations azfs_reg_iops = {
> +	.truncate	= azfs_truncate,
> +	.getattr	= azfs_getattr
> +};
> +
> +/**
> + * azfs_new_inode - cook a new inode
> + * @sb: super-block
> + * @dir: parent directory
> + * @mode: file mode
> + * @dev: to be forwarded to init_special_inode()
> + */
> +static struct inode*
> +azfs_new_inode(struct super_block *sb, struct inode *dir, int mode, dev_t dev)
> +{
> +	struct azfs_super *super;
> +	struct inode *inode;
> +
> +	inode = new_inode(sb);
> +	if (!inode)
> +		return NULL;
> +
> +	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> +
> +	inode->i_mode = mode;
> +	if (dir) {
> +		dir->i_mtime = dir->i_ctime = inode->i_mtime;
> +		inode->i_uid = current->fsuid;
> +		if (dir->i_mode & S_ISGID) {
> +			if (S_ISDIR(mode))
> +				inode->i_mode |= S_ISGID;
> +			inode->i_gid = dir->i_gid;
> +		} else {
> +			inode->i_gid = current->fsgid;
> +		}
> +	} else {
> +		super = sb->s_fs_info;
> +		inode->i_uid = super->uid;
> +		inode->i_gid = super->gid;
> +	}
> +
> +	inode->i_blocks = 0;
> +	inode->i_mapping->a_ops = &azfs_aops;
> +	inode->i_mapping->backing_dev_info = &azfs_bdi;
> +
> +	switch (mode & S_IFMT) {
> +	case S_IFDIR:
> +		inode->i_op = &azfs_dir_iops;
> +		inode->i_fop = &simple_dir_operations;
> +		inc_nlink(inode);
> +		break;
> +
> +	case S_IFREG:
> +		inode->i_op = &azfs_reg_iops;
> +		inode->i_fop = &azfs_reg_fops;
> +		break;
> +
> +	case S_IFLNK:
> +		inode->i_op = &page_symlink_inode_operations;
> +		break;
> +
> +	default:
> +		init_special_inode(inode, mode, dev);
> +		break;
> +	}
> +
> +	return inode;
> +}
> +
> +/**
> + * azfs_alloc_inode - alloc_inode() method for super_operations
> + * @sb: see super_operations methods
> + */
> +static struct inode*
> +azfs_alloc_inode(struct super_block *sb)
> +{
> +	struct azfs_znode *znode;
> +
> +	znode = kmem_cache_alloc(azfs_znode_cache, GFP_KERNEL);
> +
> +	INIT_LIST_HEAD(&znode->block_list);
> +	rwlock_init(&znode->lock);
> +
> +	inode_init_once(&znode->vfs_inode);
> +
> +	return znode ? &znode->vfs_inode : NULL;
> +}
> +
> +/**
> + * azfs_destroy_inode - destroy_inode() method for super_operations
> + * @inode: see super_operations methods
> + */
> +static void
> +azfs_destroy_inode(struct inode *inode)
> +{
> +	kmem_cache_free(azfs_znode_cache, I2Z(inode));
> +}
> +
> +/**
> + * azfs_delete_inode - delete_inode() method for super_operations
> + * @inode: see super_operations methods
> + */
> +static void
> +azfs_delete_inode(struct inode *inode)
> +{
> +	if (S_ISREG(inode->i_mode)) {
> +		i_size_write(inode, 0);
> +		azfs_truncate(inode);
> +	}
> +	truncate_inode_pages(&inode->i_data, 0);
> +	clear_inode(inode);
> +}
> +
> +/**
> + * azfs_statfs - statfs() method for super_operations
> + * @dentry, @stat: see super_operations methods
> + */
> +static int
> +azfs_statfs(struct dentry *dentry, struct kstatfs *stat)
> +{
> +	struct super_block *sb;
> +	struct azfs_super *super;
> +	struct inode *inode;
> +	unsigned long inodes, blocks;
> +
> +	sb = dentry->d_sb;
> +	super = sb->s_fs_info;
> +
> +	inodes = blocks = 0;
> +	mutex_lock(&sb->s_lock);
> +	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		inodes++;
> +		blocks += inode->i_blocks;
> +	}
> +	mutex_unlock(&sb->s_lock);
> +
> +	stat->f_type = AZFS_SUPERBLOCK_MAGIC;
> +	stat->f_bsize = super->block_size;
> +	stat->f_blocks = super->media_size >> super->block_shift;
> +	stat->f_bfree = stat->f_blocks - blocks;
> +	stat->f_bavail = stat->f_blocks - blocks;
> +	stat->f_files = inodes + blocks;
> +	stat->f_ffree = blocks + 1;
> +	stat->f_namelen = NAME_MAX;
> +
> +	return 0;
> +}
> +
> +static struct super_operations azfs_ops = {
> +	.alloc_inode	= azfs_alloc_inode,
> +	.destroy_inode	= azfs_destroy_inode,
> +	.drop_inode	= generic_delete_inode,
> +	.delete_inode	= azfs_delete_inode,
> +	.statfs		= azfs_statfs
> +};
> +
> +enum {
> +	Opt_blocksize_short,
> +	Opt_blocksize_long,
> +	Opt_uid,
> +	Opt_gid,
> +	Opt_err
> +};
> +
> +static match_table_t tokens = {
> +	{Opt_blocksize_short, "bs=%u"},
> +	{Opt_blocksize_long, "blocksize=%u"},
> +	{Opt_uid, "uid=%u"},
> +	{Opt_gid, "gid=%u"},
> +	{Opt_err, NULL}
> +};
> +
> +/**
> + * azfs_parse_mount_parameters - parse options given to mount with -o
> + * @sb: super block
> + * @options: comma separated options
> + */
> +static int
> +azfs_parse_mount_parameters(struct super_block *sb, char *options)
> +{
> +	struct azfs_super *super;
> +	char *option;
> +	int token, value;
> +	substring_t args[MAX_OPT_ARGS];
> +
> +	super = sb->s_fs_info;
> +
> +	while ((option = strsep(&options, ",")) != NULL) {
> +		if (!*option)
> +			continue;
> +
> +		token = match_token(option, tokens, args);
> +		switch (token) {
> +		case Opt_blocksize_short:
> +		case Opt_blocksize_long:
> +			if (match_int(&args[0], &value))
> +				goto syntax_error;
> +			super->block_size = value;
> +			break;
> +
> +		case Opt_uid:
> +			if (match_int(&args[0], &value))
> +				goto syntax_error;
> +			super->uid = value;
> +			break;
> +
> +		case Opt_gid:
> +			if (match_int(&args[0], &value))
> +				goto syntax_error;
> +			super->gid = value;
> +			break;
> +
> +		default:
> +			goto syntax_error;
> +		}
> +	}
> +
> +	return 1;
> +
> +syntax_error:
> +	printk(KERN_ERR "%s: invalid mount option\n",
> +			AZFS_FILESYSTEM_NAME);
> +
> +	return 0;
> +}
> +
> +/**
> + * azfs_fill_super - fill_super routine for get_sb
> + * @sb, @data, @silent: see file_system_type methods
> + */
> +static int
> +azfs_fill_super(struct super_block *sb, void *data, int silent)
> +{
> +	struct gendisk *disk;
> +	struct azfs_super *super = NULL, *knoten;
> +	struct azfs_block *block = NULL;
> +	struct inode *inode = NULL;
> +	void *kaddr;
> +	unsigned long pfn;
> +	int rc;
> +
> +	BUG_ON(!sb->s_bdev);
> +
> +	disk = sb->s_bdev->bd_disk;
> +
> +	if (!disk || !disk->queue) {
> +		printk(KERN_ERR "%s needs a block device which has a gendisk "
> +				"with a queue\n",
> +				AZFS_FILESYSTEM_NAME);
> +		return -ENOSYS;
> +	}
> +
> +	if (!disk->fops->direct_access) {
> +		printk(KERN_ERR "%s needs a block device with a "
> +				"direct_access() method\n",
> +				AZFS_FILESYSTEM_NAME);
> +		return -ENOSYS;
> +	}
> +
> +	if (!get_device(disk->driverfs_dev)) {
> +		printk(KERN_ERR "%s cannot get reference to device driver\n",
> +				AZFS_FILESYSTEM_NAME);
> +		return -EFAULT;
> +	}
> +
> +	sb->s_magic = AZFS_SUPERBLOCK_MAGIC;
> +	sb->s_flags = AZFS_SUPERBLOCK_FLAGS;
> +	sb->s_op = &azfs_ops;
> +	sb->s_maxbytes = get_capacity(disk) * disk->queue->hardsect_size;
> +	sb->s_time_gran = 1;
> +
> +	spin_lock(&super_list.lock);
> +	list_for_each_entry(knoten, &super_list.head, list)
> +		if (knoten->blkdev == sb->s_bdev) {
> +			super = knoten;
> +			break;
> +		}
> +	spin_unlock(&super_list.lock);
> +
> +	if (super) {
> +		if (strlen((char*) data))
> +			printk(KERN_WARNING "/dev/%s was already mounted with "
> +					"%s before, it will be mounted with "
> +					"mount options used last time, "
> +					"options just given would be ignored\n",
> +					disk->disk_name, AZFS_FILESYSTEM_NAME);
> +		sb->s_fs_info = super;
> +	} else {
> +		super = kzalloc(sizeof(struct azfs_super), GFP_KERNEL);
> +		if (!super) {
> +			rc = -ENOMEM;
> +			goto failed;
> +		}
> +		sb->s_fs_info = super;
> +
> +		if (!azfs_parse_mount_parameters(sb, (char*) data)) {
> +			rc = -EINVAL;
> +			goto failed;
> +		}
> +
> +		inode = azfs_new_inode(sb, NULL, S_IFDIR | S_IRWXUGO, 0);
> +		if (!inode) {
> +			rc = -ENOMEM;
> +			goto failed;
> +		}
> +
> +		super->root = d_alloc_root(inode);
> +		if (!super->root) {
> +			rc = -ENOMEM;
> +			goto failed;
> +		}
> +		dget(super->root);
> +
> +		INIT_LIST_HEAD(&super->list);
> +		INIT_LIST_HEAD(&super->block_list);
> +		rwlock_init(&super->lock);
> +
> +		super->media_size = sb->s_maxbytes;
> +
> +		if (!super->block_size)
> +			super->block_size = sb->s_blocksize;
> +		super->block_shift = blksize_bits(super->block_size);
> +
> +		super->sector_size = disk->queue->hardsect_size;
> +		super->sector_shift = blksize_bits(super->sector_size);
> +
> +		super->blkdev = sb->s_bdev;
> +
> +		block = azfs_block_init(&super->block_list,
> +				0, super->media_size >> super->block_shift);
> +		if (!block) {
> +			rc = -ENOMEM;
> +			goto failed;
> +		}
> +
> +		rc = disk->fops->direct_access(super->blkdev, 0, &kaddr, &pfn);
> +		if (rc < 0) {
> +			rc = -EFAULT;
> +			goto failed;
> +		}
> +		super->ph_addr = (unsigned long) kaddr;
> +
> +		super->io_addr = (unsigned long) ioremap_flags(
> +				super->ph_addr, super->media_size, _PAGE_NO_CACHE);
> +		if (!super->io_addr) {
> +			rc = -EFAULT;
> +			goto failed;
> +		}
> +
> +		spin_lock(&super_list.lock);
> +		list_add(&super->list, &super_list.head);
> +		spin_unlock(&super_list.lock);
> +	}
> +
> +	sb->s_root = super->root;
> +	disk->driverfs_dev->driver_data = super;
> +	disk->driverfs_dev->platform_data = sb;
> +
> +	if (super->block_size < PAGE_SIZE)
> +		printk(KERN_INFO "Block size on %s is smaller then system "
> +				"page size: mmap() would not be supported\n",
> +				disk->disk_name);
> +
> +	return 0;
> +
> +failed:
> +	if (super) {
> +		sb->s_root = NULL;
> +		sb->s_fs_info = NULL;
> +		if (block)
> +			azfs_block_free(block);
> +		if (super->root)
> +			dput(super->root);
> +		if (inode)
> +			iput(inode);
> +		disk->driverfs_dev->driver_data = NULL;
> +		kfree(super);
> +		disk->driverfs_dev->platform_data = NULL;
> +		put_device(disk->driverfs_dev);
> +	}
> +
> +	return rc;
> +}
> +
> +/**
> + * azfs_get_sb - get_sb() method for file_system_type
> + * @fs_type, @flags, @dev_name, @data, @mount: see file_system_type methods
> + */
> +static int
> +azfs_get_sb(struct file_system_type *fs_type, int flags,
> +	    const char *dev_name, void *data, struct vfsmount *mount)
> +{
> +	return get_sb_bdev(fs_type, flags,
> +			dev_name, data, azfs_fill_super, mount);
> +}
> +
> +/**
> + * azfs_kill_sb - kill_sb() method for file_system_type
> + * @sb: see file_system_type methods
> + */
> +static void
> +azfs_kill_sb(struct super_block *sb)
> +{
> +	sb->s_root = NULL;
> +	kill_block_super(sb);
> +}
> +
> +static struct file_system_type azfs_fs = {
> +	.owner		= THIS_MODULE,
> +	.name		= AZFS_FILESYSTEM_NAME,
> +	.get_sb		= azfs_get_sb,
> +	.kill_sb	= azfs_kill_sb,
> +	.fs_flags	= AZFS_FILESYSTEM_FLAGS
> +};
> +
> +/**
> + * azfs_init
> + */
> +static int __init
> +azfs_init(void)
> +{
> +	int rc;
> +
> +	INIT_LIST_HEAD(&super_list.head);
> +	spin_lock_init(&super_list.lock);
> +
> +	azfs_znode_cache = kmem_cache_create("azfs_znode_cache",
> +			sizeof(struct azfs_znode), 0, AZFS_CACHE_FLAGS, NULL);
> +	if (!azfs_znode_cache) {
> +		printk(KERN_ERR "Could not allocate inode cache for %s\n",
> +				AZFS_FILESYSTEM_NAME);
> +		rc = -ENOMEM;
> +		goto failed;
> +	}
> +
> +	azfs_block_cache = kmem_cache_create("azfs_block_cache",
> +			sizeof(struct azfs_block), 0, AZFS_CACHE_FLAGS, NULL);
> +	if (!azfs_block_cache) {
> +		printk(KERN_ERR "Could not allocate block cache for %s\n",
> +				AZFS_FILESYSTEM_NAME);
> +		rc = -ENOMEM;
> +		goto failed;
> +	}
> +
> +	rc = register_filesystem(&azfs_fs);
> +	if (rc != 0) {
> +		printk(KERN_ERR "Could not register %s\n",
> +				AZFS_FILESYSTEM_NAME);
> +		goto failed;
> +	}
> +
> +	return 0;
> +
> +failed:
> +	if (azfs_block_cache)
> +		kmem_cache_destroy(azfs_block_cache);
> +
> +	if (azfs_znode_cache)
> +		kmem_cache_destroy(azfs_znode_cache);
> +
> +	return rc;
> +}
> +
> +/**
> + * azfs_exit
> + */
> +static void __exit
> +azfs_exit(void)
> +{
> +	struct azfs_super *super, *SUPER;

I think that yelling in deep desperation like that is not quite in
agreement with the kernel coding style.
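
For illustration only, here is a userspace sketch of the same
delete-while-iterating pattern with a lowercase second cursor. It
does not use the kernel list API: struct demo_super, demo_push() and
demo_free_all() are hypothetical names made up for this example.
Kernel code commonly calls the second cursor of
list_for_each_entry_safe() something short such as "tmp" or "n".

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct azfs_super; only the link matters. */
struct demo_super {
	int id;
	struct demo_super *next;
};

/*
 * Push a new entry on the front of the list. Returns the new head,
 * or the old head unchanged if the allocation fails.
 */
static struct demo_super *demo_push(struct demo_super *head, int id)
{
	struct demo_super *super = malloc(sizeof(*super));

	if (!super)
		return head;
	super->id = id;
	super->next = head;
	return super;
}

/*
 * Free every entry and return how many were freed. "tmp" saves the
 * successor before "super" is freed, which is the role the second
 * cursor plays in list_for_each_entry_safe().
 */
static int demo_free_all(struct demo_super **head)
{
	struct demo_super *super, *tmp;
	int freed = 0;

	for (super = *head; super; super = tmp) {
		tmp = super->next;
		free(super);
		freed++;
	}
	*head = NULL;
	return freed;
}
```

In azfs_exit() the equivalent change would presumably be just a
rename of the second cursor; the loop body stays as it is.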

> +	struct azfs_block *block, *knoten;
> +	struct gendisk *disk;
> +
> +	spin_lock(&super_list.lock);
> +	list_for_each_entry_safe(super, SUPER, &super_list.head, list) {
> +		disk = super->blkdev->bd_disk;
> +		list_del(&super->list);
> +		iounmap((void*) super->io_addr);
> +		write_lock(&super->lock);
> +		for_each_block_safe(block, knoten, &super->block_list)
> +			azfs_block_free(block);
> +		write_unlock(&super->lock);
> +		disk->driverfs_dev->driver_data = NULL;
> +		disk->driverfs_dev->platform_data = NULL;
> +		kfree(super);
> +		put_device(disk->driverfs_dev);
> +	}
> +	spin_unlock(&super_list.lock);
> +
> +	unregister_filesystem(&azfs_fs);
> +
> +	kmem_cache_destroy(azfs_block_cache);
> +	kmem_cache_destroy(azfs_znode_cache);
> +}
> +
> +module_init(azfs_init);
> +module_exit(azfs_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Maxim Shchetynin <maxim@de.ibm.com>");
> +MODULE_DESCRIPTION("Non-buffered file system for IO devices");
> 

The near-total lack of comments in this driver makes it hard for the
reader to follow. Besides, I personally think that a design document
would be extremely useful: basically, explain the purpose of the
filesystem, the basic ideas behind it, etc.

Thanks,
Dmitri

