linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Magenheimer <dan.magenheimer@oracle.com>
To: Al Viro <viro@ZenIV.linux.org.uk>, linux-fsdevel@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Sage Weil <sage@newdream.net>,
	Andreas Dilger <aedilger@gmail.com>,
	Chris Mason <chris.mason@oracle.com>
Subject: RE: [RFC] ->encode_fh() and related breakage
Date: Fri, 30 Dec 2011 08:48:07 -0800 (PST)	[thread overview]
Message-ID: <d7fec7cf-b016-4697-98db-9feee986d8c8@default> (raw)
In-Reply-To: <20111230062719.GU23916@ZenIV.linux.org.uk>

> From: Al Viro [mailto:viro@ZenIV.linux.org.uk]
> Sent: Thursday, December 29, 2011 11:27 PM
> To: linux-fsdevel@vger.kernel.org
> Cc: Linus Torvalds; Sage Weil; Dan Magenheimer
> Subject: [RFC] ->encode_fh() and related breakage
> 
> 	a) cleancache is abusing ->encode_fh().  Badly.  In particular,
> on-stack(!) struct dentry is a lousy idea.

Indeed. See:  https://lkml.org/lkml/2011/2/24/378 
Chris Mason (now cc'ed) suggested (offlist) this short-term hack, which
was a fix to a reported issue** from a user of btrfs+cleancache+zcache
and the above lkml post was intended to log the hack for later review.

Thanks, Al, for picking up the ball I dropped and neglected.

** https://lkml.org/lkml/2011/2/13/163 
 
> 	c) cleancache relies on (unguaranteed) property of ->encode_fh() -
> if the last argument is 0, most of the instances do not look at anything
> other than dentry->d_inode.  Guess what happens with ceph?  Right, we
> end up doing spin_lock(&d.d_lock) and dget(d.d_parent).  Even though
> struct dentry d is auto, has no initializer and the only thing done to
> it is assignment to d.d_inode.  IOW, that spinlock has undefined contents
> and in case if you happen to pass that spin_lock(), dget(random_pointer)
> will get you an increment of value in memory at random address.  Fun...
> Moreover, on FAT we get dereference (read-only, but still it's at least an
> oops) of uninitialized pointer - d.d_parent->d_inode is followed.

Fortunately, cleancache is per-filesystem opt-in so that encode_fh
call from cleancache_get_key (in mm/cleancache.c) would never occur
for those filesystems.

> 	e) in case of default encode_fh, cleancache gets one difference from
> exportfs_encode_fh() behaviour - it ignores ->i_generation.  Note that for
> non-default encode_fh() i_generation is *not* ignored.
> 
> 	f) in case of _no_ export operations being present, cleancache is
> basically praying for inumbers being stable for some unknown reason.  Which
> is about as efficient as prayers tend to be...

What cleancache is looking for is a persistent 192-bit file identifier
that is unique and reproducible, regardless of whether the dentry is in-cache.
There should be no way for that unique identifier to become incoherent
with pages on disk belonging to that inode unless the inode is
in memory and cleancache_flush/invalidate_inode is called. Any fs
that cannot guarantee this uniqueness and coherency should not enable
cleancache, which is a good reason why cleancache is per-fs opt-in.

The use case (in case you are baffled):  Think of cleancache as being
huge, NOT part of the main memory of the system and so NOT subject
to normal system memory pressure.  We want the data pages to remain
in cleancache as long as possible, regardless of whether the inode
or dentry are in-cache.

> 	One kinda-sorta solution (besides kicking cleancache out of tree)

8-/

> would be to modify ->encode_fh() API - instead of dentry + connectable
> pass it inode + NULL-or-parent-inode, with ->d_lock held by caller.

> -static int export_encode_fh(struct dentry *dentry, struct fid *fid,
> -		int *max_len, int connectable)
> +static int export_encode_fh(struct inode *inode, struct fid *fid,
> +		int *max_len, struct inode *parent)

I won't pretend to be worthy to comment on the rest of the patch
or the change-of-API issues but FWIW this API change and the matching
cleancache changes below:

Acked-by: Dan Magenheimer <dan.magenheimer.com>

> diff --git a/mm/cleancache.c b/mm/cleancache.c
> index bcaae4c..668c74f 100644
> --- a/mm/cleancache.c
> +++ b/mm/cleancache.c
> @@ -75,7 +75,7 @@ EXPORT_SYMBOL(__cleancache_init_shared_fs);
>  static int cleancache_get_key(struct inode *inode,
>  			      struct cleancache_filekey *key)
>  {
> -	int (*fhfn)(struct dentry *, __u32 *fh, int *, int);
> +	int (*fhfn)(struct inode *, __u32 *fh, int *, struct inode *);
>  	int len = 0, maxlen = CLEANCACHE_KEY_MAX;
>  	struct super_block *sb = inode->i_sb;
> 
> @@ -83,9 +83,7 @@ static int cleancache_get_key(struct inode *inode,
>  	if (sb->s_export_op != NULL) {
>  		fhfn = sb->s_export_op->encode_fh;
>  		if  (fhfn) {
> -			struct dentry d;
> -			d.d_inode = inode;
> -			len = (*fhfn)(&d, &key->u.fh[0], &maxlen, 0);
> +			len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL);
>  			if (len <= 0 || len == 255)
>  				return -1;
>  			if (maxlen > CLEANCACHE_KEY_MAX)

  parent reply	other threads:[~2011-12-30 16:48 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-30  6:27 [RFC] ->encode_fh() and related breakage Al Viro
2011-12-30  7:14 ` Andreas Dilger
2011-12-30  7:57   ` Al Viro
2012-01-09 13:55     ` Bernd Schubert
2011-12-30 16:48 ` Dan Magenheimer [this message]
2012-01-03 11:31 ` Steven Whitehouse
2012-01-03 17:00 ` Sage Weil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d7fec7cf-b016-4697-98db-9feee986d8c8@default \
    --to=dan.magenheimer@oracle.com \
    --cc=aedilger@gmail.com \
    --cc=chris.mason@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sage@newdream.net \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).