linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/2] [RFC] Adding the MAY_CREATE flag to ->permission()
@ 2009-09-03  1:06 Joel Becker
  2009-09-03  1:06 ` [PATCH 1/2] vfs: Add MAY_CREATE to the permission() flags Joel Becker
  2009-09-03  1:06 ` [PATCH 2/2] ocfs2: Use MAY_CREATE in ocfs2_permission() Joel Becker
  0 siblings, 2 replies; 6+ messages in thread
From: Joel Becker @ 2009-09-03  1:06 UTC
  To: ocfs2-devel, linux-fsdevel, viro; +Cc: hch

Hey,
	Ran into a fun problem in ocfs2.  ocfs2, being a cluster
filesystem, has cluster locks.  Being nice to our users, we allow
signals to interrupt the cluster locking layer if it hasn't gotten too
far yet (sleeping on local locking rather than the cluster).
	Now, system calls are only allowed to return -ERESTARTSYS if
they can be safely restarted.  In ocfs2_mknod(), which underlies
mkdir(2), mknod(2), and creat(2), we allow signals to interrupt us while
we gather our locks, but once we start changing things, there's no going
back.  Everyone else does the same thing.
	The problem is open(O_CREAT|O_EXCL).  See, ocfs2_mknod() will
successfully create the file.  Then we get back to
__open_namei_create(), which promptly calls may_open().  This is
backended by ocfs2_permission(), and it needs the cluster lock to
check the new inode's permissions.  Send a signal here, and the ocfs2
code will return -ERESTARTSYS.  (This is easily verified via
'git-checkout').  When entry.S restarts the open(O_CREAT|O_EXCL), it
gets -EEXIST.  Ouch!
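
The failure mode described above can be imitated from userspace without
ocfs2 or signals at all. The sketch below is only an illustration, not
part of the patch series: it issues the same exclusive create twice in
a row, which is roughly what a restarted open(O_CREAT|O_EXCL) looks
like once the first pass has already created the file. The helper
names and the scratch file name are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: try an exclusive create.
 * Returns 0 on success, or the errno value on failure.
 */
static int exclusive_create(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd < 0)
		return errno;
	close(fd);
	return 0;
}

/*
 * Imitate "the create succeeded, then the syscall was restarted".
 * Returns the result of the second attempt, or -1 if the first
 * attempt could not create the file at all.
 */
static int restarted_exclusive_create(const char *path)
{
	int first, second;

	unlink(path);				/* start from a clean slate */
	first = exclusive_create(path);		/* the real create */
	if (first != 0)
		return -1;
	second = exclusive_create(path);	/* the "restart" */
	unlink(path);				/* clean up the scratch file */
	return second;
}
```

Calling restarted_exclusive_create() on a scratch path in a writable
directory should yield EEXIST: the second attempt trips over the file
the first attempt made, which is the error userspace ends up seeing in
the scenario above.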
	We can't naively block signals in ocfs2_permission().  The
majority of calls are not for O_CREAT|O_EXCL.  So how do we let
ocfs2_permission() know about this case?
	Christoph's suggestion was a new flag to ->permission().  I've
picked MAY_CREATE, but I'm totally open to a better name.  I'm open to a
better solution too.
	Following this are the MAY_CREATE patch and the ocfs2 patch to
make use of it.

Joel


* [PATCH 1/2] vfs: Add MAY_CREATE to the permission() flags.
  2009-09-03  1:06 [PATCH 0/2] [RFC] Adding the MAY_CREATE flag to ->permission() Joel Becker
@ 2009-09-03  1:06 ` Joel Becker
  2009-09-03  1:06 ` [PATCH 2/2] ocfs2: Use MAY_CREATE in ocfs2_permission() Joel Becker
  1 sibling, 0 replies; 6+ messages in thread
From: Joel Becker @ 2009-09-03  1:06 UTC
  To: ocfs2-devel, linux-fsdevel, viro; +Cc: hch

A simple rule of system calls is that you cannot return -ERESTARTSYS
after you've made non-idempotent changes.  ocfs2 has run into this with
open(O_CREAT|O_EXCL).  Once you've created the file, you can't restart
the open(), because O_CREAT|O_EXCL will trigger -EEXIST.

The problem is that ocfs2 is catching the signal in ->permission(), called
by may_open().  This happens after ->create() has successfully created
the file.  ocfs2_permission() has to get a cluster lock, and this is
what can be interrupted by a signal.  Now, obviously we want to block
signals in the O_CREAT|O_EXCL case, but ocfs2_permission() has no way of
knowing it just got called from __open_namei_create().

So we add the MAY_CREATE flag to permission().  __open_namei_create() will
pass it to may_open(), and then ocfs2 can block signals in
ocfs2_permission() as appropriate.  The same is true of any other
filesystem that has to do work in may_open().
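
As a rough illustration of how a filesystem might consume the new bit,
here is a small userspace sketch. The MAY_* values are the ones in
include/linux/fs.h with this patch applied; the three helper functions
are hypothetical, and the sketch assumes may_open() ORs MAY_OPEN into
the mask before it reaches ->permission().

```c
#include <assert.h>
#include <stdbool.h>

/* Permission mask bits, including the proposed MAY_CREATE. */
#define MAY_EXEC	1
#define MAY_WRITE	2
#define MAY_READ	4
#define MAY_APPEND	8
#define MAY_ACCESS	16
#define MAY_OPEN	32
#define MAY_CREATE	64

/*
 * Hypothetical: the mask a ->permission() hook would see for a file
 * that open(O_CREAT) has just created (assumes MAY_OPEN is added on
 * the way down).
 */
static int mask_for_new_file(int acc_mode)
{
	/* MAY_CREATE is a hint from the VFS, not a caller access mode */
	assert((acc_mode & MAY_CREATE) == 0);
	return acc_mode | MAY_OPEN | MAY_CREATE;
}

/* Hypothetical: should the filesystem refuse to be interrupted? */
static bool must_block_signals(int mask)
{
	return (mask & MAY_CREATE) != 0;
}

/* Hypothetical: the rwx bits proper, with the hint bits stripped. */
static int access_bits(int mask)
{
	return mask & (MAY_READ | MAY_WRITE | MAY_EXEC);
}
```

Because MAY_CREATE occupies its own bit, it rides along with whatever
access bits are requested: an ordinary open() never sets it, so only
the just-created case takes the signal-blocking path.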

Signed-off-by: Joel Becker <joel.becker@oracle.com>
---
 fs/namei.c         |    2 +-
 include/linux/fs.h |    1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index f3c5b27..b33a87c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1613,7 +1613,7 @@ out_unlock:
 	if (error)
 		return error;
 	/* Don't check for write permission, don't truncate */
-	return may_open(&nd->path, 0, flag & ~O_TRUNC);
+	return may_open(&nd->path, MAY_CREATE, flag & ~O_TRUNC);
 }
 
 /*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 67888a9..31928cc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -53,6 +53,7 @@ struct inodes_stat_t {
 #define MAY_APPEND 8
 #define MAY_ACCESS 16
 #define MAY_OPEN 32
+#define MAY_CREATE 64
 
 /*
  * flags in file.f_mode.  Note that FMODE_READ and FMODE_WRITE must correspond
-- 
1.6.3.3


* [PATCH 2/2] ocfs2: Use MAY_CREATE in ocfs2_permission()
  2009-09-03  1:06 [PATCH 0/2] [RFC] Adding the MAY_CREATE flag to ->permission() Joel Becker
  2009-09-03  1:06 ` [PATCH 1/2] vfs: Add MAY_CREATE to the permission() flags Joel Becker
@ 2009-09-03  1:06 ` Joel Becker
  2009-09-03  1:45   ` Sunil Mushran
  1 sibling, 1 reply; 6+ messages in thread
From: Joel Becker @ 2009-09-03  1:06 UTC
  To: ocfs2-devel, linux-fsdevel, viro; +Cc: hch

ocfs2 has a problem with open(O_CREAT|O_EXCL).  Once you've created the
file, you can't restart the open(), because O_CREAT|O_EXCL will trigger
-EEXIST.

The problem is that ocfs2 is catching the signal in ->permission(), called
by may_open().  This happens after ->create() has successfully created
the file.  ocfs2_permission() has to get a cluster lock, and this is
what can be interrupted by a signal.  Now, obviously we want to block
signals in the O_CREAT|O_EXCL case, but ocfs2_permission() has no way of
knowing it just got called from __open_namei_create().

We key on the MAY_CREATE flag passed to permission to block signals.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
---
 fs/ocfs2/file.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index aa501d3..508a2db 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1095,9 +1095,18 @@ bail:
 int ocfs2_permission(struct inode *inode, int mask)
 {
 	int ret;
+	sigset_t oldset;
 
 	mlog_entry_void();
 
+	/*
+	 * If this inode was just created by open(O_CREAT|O_EXCL), we
+	 * can't allow signal restarting.  So we need to block signals
+	 * around the cluster locking.
+	 */
+	if (mask & MAY_CREATE)
+		ocfs2_block_signals(&oldset);
+
 	ret = ocfs2_inode_lock(inode, NULL, 0);
 	if (ret) {
 		if (ret != -ENOENT)
@@ -1108,6 +1117,10 @@ int ocfs2_permission(struct inode *inode, int mask)
 	ret = generic_permission(inode, mask, ocfs2_check_acl);
 
 	ocfs2_inode_unlock(inode, 0);
+
+	if (mask & MAY_CREATE)
+		ocfs2_unblock_signals(&oldset);
+
 out:
 	mlog_exit(ret);
 	return ret;
-- 
1.6.3.3


* Re: [PATCH 2/2] ocfs2: Use MAY_CREATE in ocfs2_permission()
  2009-09-03  1:06 ` [PATCH 2/2] ocfs2: Use MAY_CREATE in ocfs2_permission() Joel Becker
@ 2009-09-03  1:45   ` Sunil Mushran
  2009-09-03 19:12     ` Joel Becker
  0 siblings, 1 reply; 6+ messages in thread
From: Sunil Mushran @ 2009-09-03  1:45 UTC
  To: Joel Becker; +Cc: hch, viro, ocfs2-devel, linux-fsdevel

Joel Becker wrote:
> ocfs2 has a problem with open(O_CREAT|O_EXCL).  Once you've created the
> file, you can't restart the open(), because O_CREAT|O_EXCL will trigger
> -EEXIST.
>
> The problem is that ocfs2 is catching the signal ->permission(), called
> by may_open().  This happens after ->create() has successfully created
> the file.  ocfs2_permission() has to get a cluster lock, and this is
> what can be interrupted by a signal.  Now, obviously we want to block
> signals in the O_CREAT|O_EXCL case, but ocfs2_permission() has no way of
> knowing it just got called from open_namei_create().
>
> We key on the MAY_CREATE flag passed to permission to block signals.
>
> Signed-off-by: Joel Becker <joel.becker@oracle.com>
> ---
>  fs/ocfs2/file.c |   13 +++++++++++++
>  1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index aa501d3..508a2db 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1095,9 +1095,18 @@ bail:
>  int ocfs2_permission(struct inode *inode, int mask)
>  {
>  	int ret;
> +	sigset_t oldset;
>  
>  	mlog_entry_void();
>  
> +	/*
> +	 * If this inode was just created by open(O_CREAT|O_EXCL), we
> +	 * can't allow signal restarting.  So we need to block signals
> +	 * around the cluster locking.
> +	 */
> +	if (mask & MAY_CREATE)
> +		ocfs2_block_signals(&oldset);
> +
>  	ret = ocfs2_inode_lock(inode, NULL, 0);
>  	if (ret) {
>  		if (ret != -ENOENT)
> @@ -1108,6 +1117,10 @@ int ocfs2_permission(struct inode *inode, int mask)
>  	ret = generic_permission(inode, mask, ocfs2_check_acl);
>  
>  	ocfs2_inode_unlock(inode, 0);
> +
> +	if (mask & MAY_CREATE)
> +		ocfs2_unblock_signals(&oldset);
> +
>  out:
>  	mlog_exit(ret);
>  	return ret;
>   

Maybe I am missing something, but shouldn't we be unblocking the signal
after the out label?
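
The ordering in question can be sketched in userspace. This is only an
analogue of the kernel code, not a replacement for the patch: it uses
POSIX sigprocmask() in place of ocfs2_block_signals() and
ocfs2_unblock_signals(), and fake_inode_lock(), checked_permission()
and sigusr1_blocked() are hypothetical stand-ins. The point is that
the mask is restored below the out label, so the lock-failure path
restores it too.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

#define MAY_CREATE 64	/* value from patch 1/2 */

/* Hypothetical stand-in for ocfs2_inode_lock(); fails on request. */
static int fake_inode_lock(bool fail)
{
	return fail ? -1 : 0;
}

/* Userspace analogue of the permission hook with one exit path. */
static int checked_permission(int mask, bool lock_fails)
{
	sigset_t all, oldset;
	int ret;

	if (mask & MAY_CREATE) {
		sigfillset(&all);
		sigprocmask(SIG_BLOCK, &all, &oldset);
	}

	ret = fake_inode_lock(lock_fails);
	if (ret)
		goto out;	/* skips the unlock, not the restore */

	/* ... permission check and unlock would go here ... */

out:
	/* Restoring here covers both the success and the error path. */
	if (mask & MAY_CREATE)
		sigprocmask(SIG_SETMASK, &oldset, NULL);
	return ret;
}

/* Helper for checking the result: is SIGUSR1 currently blocked? */
static bool sigusr1_blocked(void)
{
	sigset_t cur;

	sigprocmask(SIG_SETMASK, NULL, &cur);	/* query only */
	return sigismember(&cur, SIGUSR1) == 1;
}
```

With the restore placed above the label instead, a failed lock would
return with every signal still blocked in the calling task.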


* Re: [PATCH 2/2] ocfs2: Use MAY_CREATE in ocfs2_permission()
  2009-09-03  1:45   ` Sunil Mushran
@ 2009-09-03 19:12     ` Joel Becker
  0 siblings, 0 replies; 6+ messages in thread
From: Joel Becker @ 2009-09-03 19:12 UTC
  To: Sunil Mushran; +Cc: hch, viro, ocfs2-devel, linux-fsdevel

On Wed, Sep 02, 2009 at 06:45:47PM -0700, Sunil Mushran wrote:
> Joel Becker wrote:
> >@@ -1108,6 +1117,10 @@ int ocfs2_permission(struct inode *inode, int mask)
> > 	ret = generic_permission(inode, mask, ocfs2_check_acl);
> > 	ocfs2_inode_unlock(inode, 0);
> >+
> >+	if (mask & MAY_CREATE)
> >+		ocfs2_unblock_signals(&oldset);
> >+
> > out:
> > 	mlog_exit(ret);
> > 	return ret;
> 
> Maybe I am missing something but shouldn't we be unblocking the signal
> after the out label.

	Yes.

Joel

-- 

"Heav'n hath no rage like love to hatred turn'd, nor Hell a fury,
 like a woman scorn'd."
        - William Congreve

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* [PATCH 0/2] [RFC] Adding the MAY_CREATE flag to ->permission()
@ 2009-10-14  9:57 Joel Becker
  0 siblings, 0 replies; 6+ messages in thread
From: Joel Becker @ 2009-10-14  9:57 UTC
  To: ocfs2-devel, linux-fsdevel, linux-kernel, viro, hch, torvalds

Hey,
	Ran into a fun problem in ocfs2.  ocfs2, being a cluster
filesystem, has cluster locks.  Being nice to our users, we allow
signals to interrupt the cluster locking layer if it hasn't gotten too
far yet (sleeping on local locking rather than the cluster).
	Now, system calls are only allowed to return -ERESTARTSYS if
they can be safely restarted.  In ocfs2_mknod(), which underlies
mkdir(2), mknod(2), and creat(2), we allow signals to interrupt us while
we gather our locks, but once we start changing things, there's no going
back.  Everyone else does the same thing.
	The problem is open(O_CREAT|O_EXCL).  See, ocfs2_mknod() will
successfully create the file.  Then we get back to
__open_namei_create(), which promptly calls may_open().  This is
backended by ocfs2_permission(), and it needs the cluster lock to
check the new inode's permissions.  Send a signal here, and the ocfs2
code will return -ERESTARTSYS.  (This is easily verified via
'git-checkout').  When entry.S restarts the open(O_CREAT|O_EXCL), it
gets -EEXIST.  Ouch!
	We can't naively block signals in ocfs2_permission().  The
majority of calls are not for O_CREAT|O_EXCL.  So how do we let
ocfs2_permission() know about this case?
	Christoph's suggestion was a new flag to ->permission().  I've
picked MAY_CREATE, but I'm totally open to a better name.  I'm open to a
better solution too.
	Following this are the MAY_CREATE patch and the ocfs2 patch to
make use of it.

Joel

