linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fs: Fix hang with BSD accounting on frozen filesystem
@ 2012-11-07 22:31 Jan Kara
  2012-11-08  5:22 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2012-11-07 22:31 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Nikola Ciprich, Jan Kara

When BSD process accounting is enabled and logs information to a filesystem
which gets frozen, system easily becomes unusable because each attempt to
account process information blocks. Thus e.g. every task gets blocked in exit.

It seems better to drop accounting information (which can already happen when
filesystem is running out of space) instead of locking system up. This is
implemented using a special flag FMODE_NO_FREEZE_WAIT in file->f_mode of a
file to which accounting information is written.

Reported-and-tested-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 This feels a bit hacky but I didn't find a better solution... If someone
has idea, please speak up.

 fs/btrfs/file.c    |    3 ++-
 fs/cifs/file.c     |    3 ++-
 fs/fuse/file.c     |    3 ++-
 fs/ntfs/file.c     |    3 ++-
 fs/ocfs2/file.c    |    3 ++-
 fs/open.c          |    2 +-
 fs/xfs/xfs_file.c  |    3 ++-
 include/linux/fs.h |   14 ++++++++++++++
 kernel/acct.c      |    1 +
 mm/filemap.c       |    3 ++-
 10 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9ab1bed..6eb2e30 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1411,7 +1411,8 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 	ssize_t err = 0;
 	size_t count, ocount;
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 
 	mutex_lock(&inode->i_mutex);
 
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index edb25b4..1629e47 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2448,7 +2448,8 @@ cifs_writev(struct kiocb *iocb, const struct iovec *iov,
 
 	BUG_ON(iocb->ki_pos != pos);
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 
 	/*
 	 * We need to hold the sem to be sure nobody modifies lock list
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 78d2837..641df9e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -947,7 +947,8 @@ static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 		return err;
 
 	count = ocount;
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 	mutex_lock(&inode->i_mutex);
 
 	/* We can write back this queue in page reclaim */
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index 1ecf464..028b349 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2118,7 +2118,8 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 
 	BUG_ON(iocb->ki_pos != pos);
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 	mutex_lock(&inode->i_mutex);
 	ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 5a4ee77..93ef34d 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2265,7 +2265,8 @@ static ssize_t ocfs2_file_aio_write(struct kiocb *iocb,
 	if (iocb->ki_left == 0)
 		return 0;
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 
 	appending = file->f_flags & O_APPEND ? 1 : 0;
 	direct_io = file->f_flags & O_DIRECT ? 1 : 0;
diff --git a/fs/open.c b/fs/open.c
index 59071f5..42bd875 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -808,7 +808,7 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
 		op->mode = 0;
 
 	/* Must never be set by userspace */
-	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC;
+	flags &= ~FMODE_NONOTIFY & ~O_CLOEXEC & ~FMODE_NO_FREEZE_WAIT;
 
 	/*
 	 * O_SYNC is implemented as __O_SYNC|O_DSYNC.  As many places only
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index aa473fa..7d8af61 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -771,7 +771,8 @@ xfs_file_aio_write(
 	if (ocount == 0)
 		return 0;
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
 		ret = -EIO;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b33cfc9..c040a6c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -123,6 +123,9 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File was opened by fanotify and shouldn't generate fanotify events */
 #define FMODE_NONOTIFY		((__force fmode_t)0x1000000)
 
+/* Write to file should fail on frozen fs rather than block */
+#define FMODE_NO_FREEZE_WAIT	((__force fmode_t)0x2000000)
+
 /*
  * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
  * that indicates that they should check the contents of the iovec are
@@ -1401,6 +1404,17 @@ static inline int sb_start_write_trylock(struct super_block *sb)
 	return __sb_start_write(sb, SB_FREEZE_WRITE, false);
 }
 
+/*
+ * We use trylock semantics if write originates in kernel and normal lock
+ * semantics otherwise. This is a hack but solves problems with deadlocking
+ * of e.g. psacct when filesystem is frozen.
+ */
+static inline int sb_start_file_write(struct file *file)
+{
+	return __sb_start_write(file->f_mapping->host->i_sb, SB_FREEZE_WRITE,
+				!(file->f_mode & FMODE_NO_FREEZE_WAIT));
+}
+
 /**
  * sb_start_pagefault - get write access to a superblock from a page fault
  * @sb: the super we write to
diff --git a/kernel/acct.c b/kernel/acct.c
index 051e071..0b5f231 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -183,6 +183,7 @@ static void acct_file_reopen(struct bsd_acct_struct *acct, struct file *file,
 		acct->needcheck = jiffies + ACCT_TIMEOUT*HZ;
 		acct->active = 1;
 		list_add(&acct->list, &acct_list);
+		file->f_mode |= FMODE_NO_FREEZE_WAIT;
 	}
 	if (old_acct) {
 		mnt_unpin(old_acct->f_path.mnt);
diff --git a/mm/filemap.c b/mm/filemap.c
index 83efee7..3b2812b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2527,7 +2527,8 @@ ssize_t generic_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
 
 	BUG_ON(iocb->ki_pos != pos);
 
-	sb_start_write(inode->i_sb);
+	if (!sb_start_file_write(file))
+		return -EAGAIN;
 	mutex_lock(&inode->i_mutex);
 	ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);
 	mutex_unlock(&inode->i_mutex);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] fs: Fix hang with BSD accounting on frozen filesystem
  2012-11-07 22:31 [PATCH] fs: Fix hang with BSD accounting on frozen filesystem Jan Kara
@ 2012-11-08  5:22 ` Dave Chinner
  2012-11-08  8:09   ` Marco Stornelli
  2012-11-08 10:18   ` Jan Kara
  0 siblings, 2 replies; 4+ messages in thread
From: Dave Chinner @ 2012-11-08  5:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, Nikola Ciprich

On Wed, Nov 07, 2012 at 11:31:39PM +0100, Jan Kara wrote:
> When BSD process accounting is enabled and logs information to a filesystem
> which gets frozen, system easily becomes unusable because each attempt to
> account process information blocks. Thus e.g. every task gets blocked in exit.
> 
> It seems better to drop accounting information (which can already happen when
> filesystem is running out of space) instead of locking system up. This is
> implemented using a special flag FMODE_NO_FREEZE_WAIT in file->f_mode of a
> file to which accounting information is written.

I have no problems with making freeze waiting non-blocking, by why
invent a new flag for what is essentially an O_NONBLOCK operation?

Indeed, if someone opens a file O_NONBLOCK, shouldn't if behave
exactly the same on a frozen filesystem as this special
FMODE_NO_FREEZE_WAIT flag?

FWIW, nfsd could use this as well so that it doesn't block all the
nfsd threads trying to write to a frozen filesystem but instead
returns EJUKEBOX to the client to tell it ot wait for a while before
trying the operation again...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] fs: Fix hang with BSD accounting on frozen filesystem
  2012-11-08  5:22 ` Dave Chinner
@ 2012-11-08  8:09   ` Marco Stornelli
  2012-11-08 10:18   ` Jan Kara
  1 sibling, 0 replies; 4+ messages in thread
From: Marco Stornelli @ 2012-11-08  8:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jan Kara, Al Viro, linux-fsdevel, Nikola Ciprich

2012/11/8 Dave Chinner <david@fromorbit.com>:
> On Wed, Nov 07, 2012 at 11:31:39PM +0100, Jan Kara wrote:
>> When BSD process accounting is enabled and logs information to a filesystem
>> which gets frozen, system easily becomes unusable because each attempt to
>> account process information blocks. Thus e.g. every task gets blocked in exit.
>>
>> It seems better to drop accounting information (which can already happen when
>> filesystem is running out of space) instead of locking system up. This is
>> implemented using a special flag FMODE_NO_FREEZE_WAIT in file->f_mode of a
>> file to which accounting information is written.
>
> I have no problems with making freeze waiting non-blocking, by why
> invent a new flag for what is essentially an O_NONBLOCK operation?
>

It's a general problem. I didn't try but I don't think O_NONBLOCK
works on a frozen fs, even without BSD accounting. And this behaviour
it's not related with the Jan's fsfreeze patches.

Marco

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] fs: Fix hang with BSD accounting on frozen filesystem
  2012-11-08  5:22 ` Dave Chinner
  2012-11-08  8:09   ` Marco Stornelli
@ 2012-11-08 10:18   ` Jan Kara
  1 sibling, 0 replies; 4+ messages in thread
From: Jan Kara @ 2012-11-08 10:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jan Kara, Al Viro, linux-fsdevel, Nikola Ciprich

On Thu 08-11-12 16:22:56, Dave Chinner wrote:
> On Wed, Nov 07, 2012 at 11:31:39PM +0100, Jan Kara wrote:
> > When BSD process accounting is enabled and logs information to a filesystem
> > which gets frozen, system easily becomes unusable because each attempt to
> > account process information blocks. Thus e.g. every task gets blocked in exit.
> > 
> > It seems better to drop accounting information (which can already happen when
> > filesystem is running out of space) instead of locking system up. This is
> > implemented using a special flag FMODE_NO_FREEZE_WAIT in file->f_mode of a
> > file to which accounting information is written.
> 
> I have no problems with making freeze waiting non-blocking, by why
> invent a new flag for what is essentially an O_NONBLOCK operation?
> 
> Indeed, if someone opens a file O_NONBLOCK, shouldn't if behave
> exactly the same on a frozen filesystem as this special
> FMODE_NO_FREEZE_WAIT flag?
  Originally I didn't want to cause user visible effects for other files
with the fix. But you are right that O_NONBLOCK means exactly what we need
and userspace could actually expect something like this to work. I'm
somewhat worried about broken applications which don't expect EAGAIN from
O_NONBLOCK write to a file (as traditionally O_NONBLOCK was no-op for
buffered file writes) but I guess it's worth a try. Thanks for idea, I'll
send an updated patch shortly.
 
> FWIW, nfsd could use this as well so that it doesn't block all the
> nfsd threads trying to write to a frozen filesystem but instead
> returns EJUKEBOX to the client to tell it ot wait for a while before
> trying the operation again...
  Yes, that would be another possible use of O_NONBLOCK.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-11-08 10:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-07 22:31 [PATCH] fs: Fix hang with BSD accounting on frozen filesystem Jan Kara
2012-11-08  5:22 ` Dave Chinner
2012-11-08  8:09   ` Marco Stornelli
2012-11-08 10:18   ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).