linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Jörn Engel" <joern@lazybastard.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mtd@lists.infradead.org,
	Albert Cahalan <acahalan@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Jan Engelhardt <jengelh@linux01.gwdg.de>,
	Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
	Pekka Enberg <penberg@cs.helsinki.fi>, Greg KH <greg@kroah.com>,
	Ingo Oeser <ioe-lkml@rameria.de>
Subject: Re: [PATCH] LogFS take three
Date: Wed, 16 May 2007 02:06:31 +0200	[thread overview]
Message-ID: <20070516000630.GD1220@lazybastard.org> (raw)
In-Reply-To: <20070515133759.9ee434a2.akpm@linux-foundation.org>

On Tue, 15 May 2007 13:37:59 -0700, Andrew Morton wrote:
> > +
> > +config LOGFS_FSCK
> > +	bool "Run LogFS fsck at mount time"
> > +	depends on LOGFS
> > +	help
> > +	  Run a full filesystem check on every mount.  If any errors are
> > +	  found, mounting the filesystem will fail.  This is a debug option
> > +	  for developers.
> > +
> > +	  If unsure, say N.
> > +
> 
> No dependency on MTD,

Ack.

> Can this code be tested by people who don't have MTD hardware?  We used to
> ahve a fake-mtd-on-a-blockdev thing, whcih was in a state of some
> disrepair.  Maybe it got repaired.  Or removed.  I can't immediately locate
> it...

Got removed.  Block2mtd is the new one and works with logfs.  At least
my version of it does and I pushed my patches to David - not sure if
they made it to mainline already.  Mtdram should work as well.

> It's strange and a bit regrettable that an fs would have dependency on MTD,
> really.

It is and changing this wouldn't be too hard.  All device access goes
through five functions (read, write, erase, is_bad and mark_bad).  As
soon as someone seriously cares I will add a struct logfs_device_ops and
have a second set of these functions for block devices.

On hard disks it shouldn't make too much sense.  The filesystem will
fragment like a splinter bomb and be just as popular.

> ooh, documentation.  Quick, merge it!

:)

> > +/* memtree.c */
> > +void btree_init(struct btree_head *head);
> > +void *btree_lookup(struct btree_head *head, long val);
> > +int btree_insert(struct btree_head *head, long val, void *ptr);
> > +int btree_remove(struct btree_head *head, long val);
> 
> These names are too generic.  If we later add a btree library: blam.

My plan was to move this code to lib/ sooner or later.  If you consider
it useful in its current state, I can do it immediatly.  And if someone
else merged a superior btree library I'd happily remove mine and use the
new one instead.

Opinions?

> > +/* super.c */
> > +int mtdread(struct super_block *sb, loff_t ofs, size_t len, void *buf);
> > +int mtdwrite(struct super_block *sb, loff_t ofs, size_t len, void *buf);
> > +int mtderase(struct super_block *sb, loff_t ofs, size_t len);
> > +void logfs_crash_dump(struct super_block *sb);
> > +int all_ff(void *buf, size_t len);
> > +int logfs_statfs(struct dentry *dentry, struct kstatfs *stats);
> 
> Have you checked that all of this needs global scope?

Not within the last month.  Will recheck.

> > +
> > +/* progs/fsck.c */
> > +#ifdef CONFIG_LOGFS_FSCK
> > +int logfs_fsck(struct super_block *sb);
> > +#else
> > +#define logfs_fsck(sb) ({ 0; })
> 
> static inline int logfs_fsck(struct super_block *sb)
> {
> 	return 0;
> }
> 
> is better: nicer to look at, has typechecking.

That's what I tried first.  But the brick I carry on my neck decided to
compile and link the read logfs_fsck() unconditionally and so this
collided.

Will change the Makefile and this.

> > +static inline struct logfs_super *LOGFS_SUPER(struct super_block *sb)
> > +{
> > +	return sb->s_fs_info;
> > +}
> > +
> > +static inline struct logfs_inode *LOGFS_INODE(struct inode *inode)
> > +{
> > +	return container_of(inode, struct logfs_inode, vfs_inode);
> > +}
> 
> Do these need to be uppercase?

No more than EXT2_SB() or EXT2_I().

Since you are the second person to ask, I sense a majority vote to
change the case.

> > +static inline pgoff_t logfs_index(u64 pos)
> > +{
> > +	return pos / LOGFS_BLOCKSIZE;
> > +}
> 
> If the compiler goofs up here we'll end up trying to do a 64/32 divide and
> it won't link on 32-bit machines.  It would be safer to do
> 
> 	return pos >> LOGFS_BLOCKSHIFT;

Will change.

> > +int logfs_uncompress(void *in, void *out, size_t inlen, size_t outlen)
> > +{
> > +	int err, ret;
> > +
> > +	ret = -EIO;
> > +	mutex_lock(&compr_mutex);
> 
> A per-superblock lock and stream would be nicer.

Yes and no.  The zlib workspaces weigh about 300k.  On any decent SMP
machine, that hardly matters and getting the scalability is nice.  For
low-end embedded, that amount of memory per filesystem can matter.

I have no clue how many people would suffer one way or another.

> > +
> > +
> > +static inline loff_t file_end(struct inode *inode)
> > +{
> > +	return (i_size_read(inode) + inode->i_sb->s_blocksize - 1)
> > +		>> inode->i_sb->s_blocksize_bits;
> > +}
> > +static void logfs_set_name(struct logfs_disk_dentry *dd, struct qstr *name)
> > +{
> 
> The code has a strange mix of two-blank-lines-between-functions and
> no-blank-lines-between-functions.  One blank line is usual.

Will change.

> > +
> > +	if (dest) {
> > +		/* symlink */
> > +		ret = logfs_inode_write(inode, dest, destlen, 0);
> > +	} else {
> > +		/* creat/mkdir/mknod */
> > +		ret = __logfs_write_inode(inode);
> > +	}
> > +	super->s_victim_ino = 0;
> > +	if (ret) {
> > +		if (!dest)
> > +			li->li_flags |= LOGFS_IF_STILLBORN;
> > +		/* FIXME: truncate symlink */
> > +		inode->i_nlink--;
> > +		iput(inode);
> > +		goto out;
> > +	}
> > +
> > +	if (inode->i_mode & S_IFDIR)
> > +		dir->i_nlink++;
> 
> You have helper functions for i_nlink++, which remember to do
> mark_inode_dirty()?

I do.  Looks like I should use them here and in at least one other
place.  Will recheck them all.

> > +static struct inode_operations logfs_symlink_iops = {
> > +	.readlink	= generic_readlink,
> > +	.follow_link	= page_follow_link_light,
> > +};
> 
> Should be const.

As should be all oder struct foo_ops.  Will do.

> > +static int logfs_permission(struct inode *inode, int mask, struct nameidata *nd)
> > +{
> > +	return generic_permission(inode, mask, NULL);
> > +}
> 
> Does this need to exist?

It does not.  Will remove.

> const
> const

Yep.

> > +
> > +	BUG_ON(PAGE_CACHE_SIZE != inode->i_sb->s_blocksize);
> 
> This check can be done once, at mount time.

Will do.

> > +	buf = kmap(page);
> > +	ret = logfs_write_buf(inode, index, buf);
> > +	kunmap(page);
> 
> kmap() is lame.  The preferred approach would be to pass the page* down to
> the lower layers and to use kmap_atomic() at the lowest possible point.

Interesting.  I'll have a look at this.

> const
> const
> const

Yep.

> > +/*
> > + * cookie is set to 1 if we hand out a cached inode, 0 otherwise.
> > + * this allows logfs_iput to do the right thing later
> > + */
> > +struct inode *logfs_iget(struct super_block *sb, ino_t ino, int *cookie)
> > +{
> > +	struct logfs_super *super = LOGFS_SUPER(sb);
> > +	struct logfs_inode *li;
> > +
> > +	if (ino == LOGFS_INO_MASTER)
> > +		return super->s_master_inode;
> > +
> > +	spin_lock(&inode_lock);
> > +	list_for_each_entry(li, &super->s_freeing_list, li_freeing_list)
> > +		if (li->vfs_inode.i_ino == ino) {
> > +			spin_unlock(&inode_lock);
> > +			*cookie = 1;
> > +			return &li->vfs_inode;
> > +		}
> > +	spin_unlock(&inode_lock);
> > +
> > +	*cookie = 0;
> > +	return __logfs_iget(sb, ino);
> > +}
> 
> A filesystem playing with inode_lock: not good.  What's going on here?
> 
> As a minimum, the reasons for this should be clearly spelled out in code
> comments, because this sticks out like a sore thumb.

There is an explanation in the top 30 lines of the file.  Should I add a
reference here?

> > +struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
> > +{
> > +	struct inode *inode;
> > +
> > +	inode = logfs_alloc_inode(sb);
> > +	if (!inode)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	logfs_init_inode(inode);
> > +	inode->i_mode = 0;
> > +	inode->i_ino = ino;
> > +	inode->i_sb = sb;
> > +
> > +	/* This is a blatant copy of alloc_inode code.  We'd need alloc_inode
> > +	 * to be nonstatic, alas. */
> > +	{
> > +		static const struct address_space_operations empty_aops;
> > +		struct address_space * const mapping = &inode->i_data;
> > +
> > +		mapping->a_ops = &empty_aops;
> > +		mapping->host = inode;
> > +		mapping->flags = 0;
> > +		mapping_set_gfp_mask(mapping, GFP_HIGHUSER);
> > +		mapping->assoc_mapping = NULL;
> > +		mapping->backing_dev_info = &default_backing_dev_info;
> > +		inode->i_mapping = mapping;
> > +	}
> > +
> > +	return inode;
> > +}
> 
> This function would benefit from some comments.  What's it doing, and why
> is it special?  I mean, new_inode() calls alloc_inode() anyway, so you're
> unable to use new_inode().  The reader wonders why.

Inodes are stored in an inode file.  The inode file's inode (master
inode) caused me a bit of headache.  It needs to be released as the very
last, so that any regular inodes can still use it for writeback.

Doing so would gave the user an unpleasant message:
		if (invalidate_inodes(sb)) {
			printk("VFS: Busy inodes after unmount of %s. "
			   "Self-destruct in 5 seconds.  Have a nice day...\n",
			   sb->s_id);
		}

This function mainly exists to smuggle the master inode around this
check.  I didn't know any better way to deal with the problem.

Add this as comment or is there a better strategy I should persue?

> > +static struct timespec be64_to_timespec(__be64 betime)
> > +{
> > +	u64 time = be64_to_cpu(betime);
> > +	struct timespec tsp;
> > +
> > +	tsp.tv_sec = time >> 32;
> > +	tsp.tv_nsec = time & 0xffffffff;
> > +	WARN_ON(tsp.tv_nsec > 999999999);
> > +	return tsp;
> > +}
> 
> Could use ns_to_timespec(be64_to_cpu(betime)) here.

Would this help.  Looks like the function does a division and adds any
remainder to tsp.tv_sec.  But any value for tsp.tv_nsec >= NSEC_PER_SEC
is caused by either a bug or data corruption.

If the goal is to prevent illegal values, I could do something cheaper
like:
	if (tsp.tv_nsec >= NSEC_PER_SEC)
		tsp.tv_nsec = 0;

> Should use >= NSEC_PER_SEC here.

Will do.

> > +
> > +static __be64 timespec_to_be64(struct timespec tsp)
> > +{
> > +	u64 time = ((u64)tsp.tv_sec << 32) + (tsp.tv_nsec & 0xffffffff);
> > +
> > +	WARN_ON(tsp.tv_nsec > 999999999);
> > +	return cpu_to_be64(time);
> > +}
> 
> Dittos.

Yep.

> > +	spin_lock(&super->s_ino_lock);
> > +	ino = super->s_last_ino;
> > +	super->s_last_ino++;
> > +	spin_unlock(&super->s_ino_lock);
> > +	return ino;
> > +}
> 
> Could use atomic64_add_return() here.

tag not found: atomic64_add_return

That one seems to be missing from include/asm-i386/

> > +static void logfs_init_once(void *_li, struct kmem_cache *cachep,
> > +		unsigned long flags)
> > +{
> > +	struct logfs_inode *li = _li;
> > +	int i;
> > +
> > +	if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) ==
> > +			SLAB_CTOR_CONSTRUCTOR) {
> 
> This won't compile in mainline (SLAB_CTOR_VERIFY has gone)
> 
> And it won't compile in -mm (SLAB_CTOR_CONSTRUCTOR has gone).
> 
> Just remove the test altogether.

Will do.

> const

Yep.

> > +
> > +int logfs_init_inode_cache(void)
> > +{
> > +	logfs_inode_cache = kmem_cache_create("logfs_inode_cache",
> > +			sizeof(struct logfs_inode), 0, SLAB_RECLAIM_ACCOUNT,
> > +			logfs_init_once, NULL);
> 
> Use KMEM_CACHE() helper

Neat!  Will do.

> <attention span ran out, sorry>

No worries!  Thank you a lot for what you've found.  It seems to include
some bugs.

Jörn

-- 
It is better to die of hunger having lived without grief and fear,
than to live with a troubled spirit amid abundance.
-- Epictetus
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2007-05-16  0:10 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-15 15:19 [PATCH] LogFS take three Jörn Engel
2007-05-15 15:21 ` Review status (Re: [PATCH] LogFS take three) Jörn Engel
2007-05-17 16:03   ` Evgeniy Polyakov
2007-05-17 17:10     ` Jörn Engel
2007-05-20 17:30       ` Evgeniy Polyakov
2007-05-23 12:58         ` Jörn Engel
2007-05-23 15:07           ` Evgeniy Polyakov
2007-05-23 15:14             ` Jörn Engel
2007-05-23 15:53               ` Evgeniy Polyakov
2007-05-15 18:37 ` [PATCH] LogFS take three Sam Ravnborg
2007-05-15 19:10   ` Jörn Engel
2007-05-15 19:07 ` John Stoffel
2007-05-15 19:19   ` Jörn Engel
2007-05-16  4:54     ` David Woodhouse
2007-05-16 11:09       ` Jörn Engel
2007-05-16 11:34         ` Jamie Lokier
2007-05-16 11:38           ` Artem Bityutskiy
2007-05-16 12:25             ` Jamie Lokier
2007-05-16 12:49               ` Jörn Engel
2007-05-16 11:50           ` Jörn Engel
2007-05-16 12:06             ` CaT
2007-05-17 17:07               ` Jan Engelhardt
2007-05-16 12:29             ` Evgeniy Polyakov
2007-05-16 12:55               ` Jörn Engel
2007-05-17 17:08                 ` Jan Engelhardt
2007-05-16 13:41             ` John Stoffel
2007-05-16 13:53               ` Jörn Engel
2007-05-16 14:04                 ` David Woodhouse
2007-05-16 14:13                   ` Artem Bityutskiy
2007-05-16 14:17                   ` Kevin Bowling
2007-05-17  9:12                     ` Pavel Machek
2007-05-17  9:52                       ` David Woodhouse
2007-05-19 15:00                     ` Bill Davidsen
2007-05-16 14:29                 ` CaT
2007-05-17 17:14                 ` Jan Engelhardt
2007-05-19 14:45                 ` David Weinehall
2007-05-19 16:17                   ` Jamie Lokier
2007-05-19 16:27                     ` Evgeniy Polyakov
2007-05-17 16:59         ` Jan Engelhardt
2007-05-17 11:39     ` Arnd Bergmann
2007-05-15 20:37 ` Andrew Morton
2007-05-16  0:06   ` Jörn Engel [this message]
2007-05-16  2:11     ` Jörn Engel
2007-05-16  5:22     ` Willy Tarreau
2007-05-16 11:23       ` Jörn Engel
2007-05-17 17:26     ` Jan Engelhardt
2007-05-17 17:45       ` Evgeniy Polyakov
2007-05-18  1:00         ` Kyle Moffett
2007-05-18  6:16           ` Jan Engelhardt
2007-05-16 12:07   ` David Woodhouse
2007-05-16 15:34     ` Andrew Morton
2007-05-16 15:49       ` David Woodhouse
2007-05-16 16:41         ` Jörn Engel
     [not found]           ` <7fe698080705162312t4e7ed90byd10ef8e664027b17@mail.gmail.com>
2007-05-17  6:25             ` David Woodhouse
2007-05-17  8:20               ` Dongjun Shin
2007-05-17  8:43                 ` David Woodhouse
2007-05-17 12:05                   ` Jörn Engel
2007-05-17 20:18                 ` Pavel Machek
2007-05-18  0:01                   ` Dongjun Shin
2007-05-18  6:17                     ` Jan Engelhardt
2007-05-18  6:47                       ` David Woodhouse
2007-05-19 15:18                 ` Bill Davidsen
2007-05-19  6:15   ` Rob Landley
2007-05-19  9:24     ` Jan Engelhardt
2007-05-19 17:26       ` Rob Landley
2007-05-15 23:26 ` Albert Cahalan
2007-05-16  0:09   ` Jörn Engel
2007-05-16 11:18   ` Jamie Lokier
2007-05-16 12:08     ` Pekka Enberg
2007-05-16 12:58       ` Jörn Engel
2007-05-16  2:37 ` Roland Dreier
2007-05-16 11:35   ` Jörn Engel
2007-05-16 10:21 ` Pekka J Enberg
2007-05-16 12:26   ` Jörn Engel
2007-05-16 12:36     ` Pekka Enberg
2007-05-16 12:38       ` Pekka Enberg
2007-05-16 13:20       ` Jörn Engel
2007-05-17 20:58         ` Pekka Enberg
2007-05-17 21:36           ` Arnd Bergmann
2007-05-17 21:50             ` Jörn Engel
2007-05-16 19:17 ` Pavel Machek
2007-05-16 19:23   ` Jörn Engel
2007-05-17 15:08 ` Arnd Bergmann
2007-05-17 20:21   ` Jörn Engel
2007-05-17 21:00     ` Arnd Bergmann
2007-05-17 21:30       ` Jörn Engel
2007-05-17 22:01     ` Jamie Lokier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070516000630.GD1220@lazybastard.org \
    --to=joern@lazybastard.org \
    --cc=acahalan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=greg@kroah.com \
    --cc=ioe-lkml@rameria.de \
    --cc=jengelh@linux01.gwdg.de \
    --cc=johnpol@2ka.mipt.ru \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=penberg@cs.helsinki.fi \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).