Re: AZFS file system proposal

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Arnd Bergmann <arnd@arndb.de>
To: Maxim Shchetynin <maxim@linux.vnet.ibm.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@ozlabs.org
Subject: Re: AZFS file system proposal
Date: Tue, 1 Jul 2008 16:59:32 +0200	[thread overview]
Message-ID: <200807011659.33413.arnd@arndb.de> (raw)
In-Reply-To: <20080618160629.6cd749a8@mercedes-benz.boeblingen.de.ibm.com>

On Wednesday 18 June 2008, Maxim Shchetynin wrote:
> AZFS patch updated accordinly to comments of Christoph Hellwig and Dmitri Vorobiev.

Sorry for my not commenting earlier on this. I'm finally collecting my
2.6.27 patches and stumbled over it again. There are a few details
that I hope we can fix up quickly, other than that, it looks good now,
great work!

> Subject: azfs: initial submit of azfs, a non-buffered filesystem
 
Please make the patch subject the actual subject of your email next time,
and put the introductory text below the Signed-off-by: lines, separated
by a "---" line. That will make the standard tools work without extra
effort on my side. Also, please always Cc the person you want to merge
the patch, in this case probably me.

> diff -Nuar linux-2.6.26-rc6/fs/Makefile linux-2.6.26-rc6-azfs/fs/Makefile
> --- linux-2.6.26-rc6/fs/Makefile	2008-06-12 23:22:24.000000000 +0200
> +++ linux-2.6.26-rc6-azfs/fs/Makefile	2008-06-16 11:17:50.000000000 +0200
> @@ -119,3 +119,4 @@
>  obj-$(CONFIG_DEBUG_FS)		+= debugfs/
>  obj-$(CONFIG_OCFS2_FS)		+= ocfs2/
>  obj-$(CONFIG_GFS2_FS)           += gfs2/
> +obj-$(CONFIG_AZ_FS)		+= azfs.o
> diff -Nuar linux-2.6.26-rc6/fs/azfs.c linux-2.6.26-rc6-azfs/fs/azfs.c
> --- linux-2.6.26-rc6/fs/azfs.c	1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.26-rc6-azfs/fs/azfs.c	2008-06-18 15:56:13.252266896 +0200

All other file systems are in separate directories, so it would be better
to rename fs/azfs.c to fs/azfs/inode.c

> +#define AZFS_FILESYSTEM_NAME		"azfs"
> +#define AZFS_FILESYSTEM_FLAGS		FS_REQUIRES_DEV
> +
> +#define AZFS_SUPERBLOCK_MAGIC		0xABBA1972
> +#define AZFS_SUPERBLOCK_FLAGS		MS_NOEXEC | \
> +					MS_SYNCHRONOUS | \
> +					MS_DIRSYNC | \
> +					MS_ACTIVE

Why MS_NOEXEC? What happens on a remount if the user does not specifies
-o remount,exec?

> +/**
> + * azfs_block_find - get real address of a part of a file
> + * @inode: inode
> + * @direction: data direction
> + * @from: offset for read/write operation
> + * @size: pointer to a value of the amount of data to be read/written
> + */
> +static unsigned long
> +azfs_block_find(struct inode *inode, enum azfs_direction direction,
> +		unsigned long from, unsigned long *size)
> +{
> +	struct azfs_super *super;
> +	struct azfs_znode *znode;
> +	struct azfs_block *block;
> +	unsigned long block_id, west, east;
> +
> +	super = inode->i_sb->s_fs_info;
> +	znode = I2Z(inode);
> +
> +	if (from + *size > znode->size) {
> +		i_size_write(inode, from + *size);
> +		inode->i_op->truncate(inode);
> +	}
> +
> +	read_lock(&znode->lock);
> +
> +	if (list_empty(&znode->block_list)) {
> +		read_unlock(&znode->lock);
> +		return 0;
> +	}
> +
> +	block_id = from >> super->block_shift;
> +
> +	for_each_block(block, &znode->block_list) {
> +		if (block->count > block_id)
> +			break;
> +		block_id -= block->count;
> +	}
> +
> +	west = from % super->block_size;
> +	east = ((block->count - block_id) << super->block_shift) - west;
> +
> +	if (*size > east)
> +		*size = east;
> +
> +	block_id = ((block->id + block_id) << super->block_shift) + west;
> +
> +	read_unlock(&znode->lock);
> +
> +	block_id += direction == AZFS_MMAP ? super->ph_addr : super->io_addr;
> +
> +	return block_id;
> +}

This overloading of the return type to mean either a pointer or an offset
on the block device is rather confusing. Why not just return the raw block_id
before the last += and leave that part up to the caller?

static void __iomem *
azfs_block_addr(struct inode *inode, enum azfs_direction direction,
		unsigned long from, unsigned long *size)
{
	struct azfs_super *super;
	unsigned long offset;
	void __iomem *p;

	super = inode->i_sb->s_fs_info;
	offset = azfs_block_find(inode, super, 0, from, size);
	p = super->ph_addr + offset;

	return p;
}

> +	target = iov->iov_base;
> +	todo = min((loff_t) iov->iov_len, i_size_read(inode) - pos);
> +
> +	for (step = todo; step; step -= size) {
> +		size = step;
> +		pin = azfs_block_find(inode, AZFS_READ, pos, &size);
> +		if (!pin) {
> +			rc = -ENOSPC;
> +			goto out;
> +		}
> +		if (copy_to_user(target, (void*) pin, size)) {
> +			rc = -EFAULT;
> +			goto out;
> +		}

Question to the powerpc folks: is copy_to_user safe for an __iomem source?
Should there be two copies (memcpy_fromio and copy_to_user) instead?

> +	page_prot = pgprot_val(vma->vm_page_prot);
> +	page_prot |= (_PAGE_NO_CACHE | _PAGE_RW);
> +	page_prot &= ~_PAGE_GUARDED;
> +	vma->vm_page_prot = __pgprot(page_prot);

The pgprot modifications rely on powerpc specific flags, but the
file system should not really need to be powerpc only.

The flags we want are more or less the same as PAGE_AGP, because
both are I/O mapped memory that needs to be uncached but should
not be guarded, for performance reasons.

Maybe we can introduce a new PAGE_IOMEM here that we can use
in all places that need something like this. In spufs we need
the same flags for the local store mappings.

I wouldn't hold up merging the file system for this problem, but
until it is solved, the Kconfig entry should probably have
a "depends on PPC".

	Arnd <><

next prev parent reply	other threads:[~2008-07-01 15:00 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-18 14:06 AZFS file system proposal Maxim Shchetynin
2008-07-01 14:59 ` Arnd Bergmann [this message]
2008-07-07 15:39   ` Maxim Shchetynin
2008-07-08 14:42     ` Arnd Bergmann
2008-07-09  6:48       ` Benjamin Herrenschmidt
2008-07-07 15:42   ` azfs: initial submit of azfs, a non-buffered filesystem Maxim Shchetynin
2008-07-07 19:37     ` Uli Luckas
2008-07-08  9:10       ` Maxim Shchetynin
2008-07-09  8:58   ` AZFS file system proposal Benjamin Herrenschmidt
2008-07-09  9:14     ` Maxim Shchetynin
2008-07-09  9:23       ` Benjamin Herrenschmidt
2008-07-09 10:58         ` Maxim Shchetynin
  -- strict thread matches above, loose matches on Subject: below --
2008-06-09  8:46 Maxim Shchetynin
2008-06-09 12:55 ` Matthew Wilcox
2008-06-10  8:49   ` Maxim Shchetynin
2008-06-10 22:02     ` Jan Engelhardt
2008-06-17  9:06       ` Maxim Shchetynin
2008-06-17  9:35         ` Jan Engelhardt
2008-06-17 10:53           ` Jörn Engel
2008-06-17 14:06             ` Maxim Shchetynin
2008-06-17 14:45               ` Jörn Engel
2008-06-17 11:57           ` Maxim Shchetynin
2008-06-17 14:36             ` Jan Engelhardt
2008-06-17 15:51               ` Jörn Engel
2008-06-18 11:15               ` Maxim Shchetynin
2008-06-18 20:56                 ` Jörn Engel
2008-06-18 11:21               ` Maxim Shchetynin
2008-06-17 15:02 ` Dmitri Vorobiev
2008-06-18 14:01   ` Maxim Shchetynin
2008-06-18 11:27 ` Christoph Hellwig
2008-06-18 14:03   ` Maxim Shchetynin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200807011659.33413.arnd@arndb.de \
    --to=arnd@arndb.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=maxim@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).