From: Wendy Cheng <wcheng@redhat.com>
To: suparna@in.ibm.com
Cc: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC][WIP] DIO simplification and AIO-DIO stability
Date: Thu, 23 Feb 2006 14:12:04 -0500
Message-ID: <43FE0904.3020900@redhat.com>
In-Reply-To: <20060223072955.GA14244@in.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 1929 bytes --]
Suparna Bhattacharya wrote:
>http://www.kernel.org/pub/linux/kernel/people/suparna/DIO-simplify.txt
>(also inlined below)
>
>
Hi, Suparna,
It would be nice to ensure that the lock sequence will not cause issues
for out-of-tree external kernel modules (e.g. cluster file systems) that
require extra locking for various purposes. We've found several
deadlock issues in the Global File System (GFS) Direct IO path due to the
lock order enforced by the VFS layer:
1) In sys_ftruncate()->do_truncate(), the VFS layer grabs
* i_sem (i_mutex)
* then i_alloc_sem
* and then calls the filesystem's setattr().
2) In a Direct IO read, the VFS layer calls
* the filesystem's direct_IO(), which
* grabs i_sem (i_mutex)
* followed by i_alloc_sem (inside __blockdev_direct_IO()).
In our case, both gfs_setattr() and gfs_direct_IO() need their own
(global) locks to synchronize inter-node (and inter-process) access to
control structures, but gfs_direct_IO() later ends up in the
__blockdev_direct_IO() path, which deadlocks against i_sem (i_mutex) and
i_alloc_sem.
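To make the inversion concrete, here is a minimal userspace sketch (plain
pthreads, not kernel code; the two mutexes merely stand in for i_sem and
the GFS cluster lock, and the gfs_* call sites are only mirrored in the
comments):

/*
 * ABBA illustration only: truncate_path() mimics
 * sys_ftruncate()->do_truncate()->gfs_setattr(), dio_read_path() mimics
 * an O_DIRECT read through gfs_direct_IO()->__blockdev_direct_IO().
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t i_sem = PTHREAD_MUTEX_INITIALIZER; /* inode->i_sem (i_mutex)  */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER; /* GFS global/cluster lock */

static void *truncate_path(void *arg)
{
	pthread_mutex_lock(&i_sem);	/* do_truncate(): i_sem first        */
	pthread_mutex_lock(&glock);	/* gfs_setattr(): then the glock ... */
	pthread_mutex_unlock(&glock);
	pthread_mutex_unlock(&i_sem);
	return NULL;
}

static void *dio_read_path(void *arg)
{
	pthread_mutex_lock(&glock);	/* gfs_direct_IO(): glock first           */
	pthread_mutex_lock(&i_sem);	/* __blockdev_direct_IO(): then i_sem ... */
	pthread_mutex_unlock(&i_sem);
	pthread_mutex_unlock(&glock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, truncate_path, NULL);
	pthread_create(&t2, NULL, dio_read_path, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("lucky timing, no deadlock this run\n");
	return 0;
}

With unlucky scheduling the first thread holds i_sem and waits for the
glock while the second holds the glock and waits for i_sem -- the same
hang we hit in GFS.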
A new DIO flag was added to our distribution (2.6.9 based) to work
around the problem by moving the inode semaphore acquisition done within
__blockdev_direct_IO() (patch attached) out into the GFS code path (so the
lock order can be rearranged). The new lock granularity is not ideal, but it
gets us out of this deadlock.
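Roughly, the rearranged caller looks like the sketch below (illustration
only, not the actual GFS code: gfs_glock_lock()/gfs_glock_unlock() and
gfs_get_blocks() are placeholder names, i_alloc_sem and AIO completion
handling are omitted, and the only point is that the order now matches the
truncate path, i.e. i_sem before the cluster lock):

/* Sketch against a 2.6.9-era tree with the attached patch applied. */
#include <linux/fs.h>
#include <linux/uio.h>

static ssize_t gfs_direct_IO(int rw, struct kiocb *iocb,
			     const struct iovec *iov, loff_t offset,
			     unsigned long nr_segs)
{
	struct inode *inode = iocb->ki_filp->f_mapping->host;
	ssize_t ret;

	if (rw == READ)
		down(&inode->i_sem);	/* writers already hold i_sem via the VFS */
	gfs_glock_lock(inode);		/* cluster lock taken after i_sem, matching
					 * the i_sem -> glock order of gfs_setattr() */

	/* DIO_CLUSTER_LOCKING: __blockdev_direct_IO() skips i_sem/i_alloc_sem */
	ret = blockdev_direct_IO_cluster_locking(rw, iocb, inode,
			inode->i_sb->s_bdev, iov, offset, nr_segs,
			gfs_get_blocks, NULL);

	gfs_glock_unlock(inode);
	if (rw == READ)
		up(&inode->i_sem);
	return ret;
}

Whether the filesystem should also take i_alloc_sem here (and keep it held
across AIO completion the way DIO_LOCKING does) is a separate question; the
point of the flag is only that the filesystem gets to pick the order.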
We haven't had a chance to go through your mail (and patch) in detail yet,
but would like to bring up this issue early, before it gets messy.
-- Wendy
[-- Attachment #2: linux-2.6.9-dio-gfs-locking.patch --]
[-- Type: text/plain, Size: 2648 bytes --]
--- linux-2.6.9-22.EL/include/linux/fs.h 2005-12-07 12:43:55.000000000 -0500
+++ linux.truncate/include/linux/fs.h 2005-12-02 00:25:22.000000000 -0500
@@ -1509,7 +1509,8 @@ ssize_t __blockdev_direct_IO(int rw, str
int lock_type);
enum {
- DIO_LOCKING = 1, /* need locking between buffered and direct access */
+ DIO_CLUSTER_LOCKING = 0, /* allow (cluster) fs handle its own locking */
+ DIO_LOCKING, /* need locking between buffered and direct access */
DIO_NO_LOCKING, /* bdev; no locking at all between buffered/direct */
DIO_OWN_LOCKING, /* filesystem locks buffered and direct internally */
};
@@ -1541,6 +1542,15 @@ static inline ssize_t blockdev_direct_IO
nr_segs, get_blocks, end_io, DIO_OWN_LOCKING);
}
+static inline ssize_t blockdev_direct_IO_cluster_locking(int rw, struct kiocb *iocb,
+ struct inode *inode, struct block_device *bdev, const struct iovec *iov,
+ loff_t offset, unsigned long nr_segs, get_blocks_t get_blocks,
+ dio_iodone_t end_io)
+{
+ return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
+ nr_segs, get_blocks, end_io, DIO_CLUSTER_LOCKING);
+}
+
extern struct file_operations generic_ro_fops;
#define special_file(m) (S_ISCHR(m)||S_ISBLK(m)||S_ISFIFO(m)||S_ISSOCK(m))
--- linux-2.6.9-22.EL/fs/direct-io.c 2005-11-09 17:26:02.000000000 -0500
+++ linux.truncate/fs/direct-io.c 2005-12-07 12:27:17.000000000 -0500
@@ -515,7 +515,7 @@ static int get_more_blocks(struct dio *d
fs_count++;
create = dio->rw == WRITE;
- if (dio->lock_type == DIO_LOCKING) {
+ if ((dio->lock_type == DIO_LOCKING) || (dio->lock_type == DIO_CLUSTER_LOCKING)) {
if (dio->block_in_file < (i_size_read(dio->inode) >>
dio->blkbits))
create = 0;
@@ -1183,9 +1183,16 @@ __blockdev_direct_IO(int rw, struct kioc
* For regular files using DIO_OWN_LOCKING,
* neither readers nor writers take any locks here
* (i_sem is already held and release for writers here)
+ * DIO_CLUSTER_LOCKING allows a (cluster) filesystem to manage its own
+ * locking (bypassing the i_sem and i_alloc_sem handling within
+ * __blockdev_direct_IO()).
*/
+
dio->lock_type = dio_lock_type;
- if (dio_lock_type != DIO_NO_LOCKING) {
+ if (dio_lock_type == DIO_CLUSTER_LOCKING)
+ goto cluster_skip_locking;
+
+ if (dio_lock_type != DIO_NO_LOCKING) {
if (rw == READ) {
struct address_space *mapping;
@@ -1205,6 +1212,9 @@ __blockdev_direct_IO(int rw, struct kioc
if (dio_lock_type == DIO_LOCKING)
down_read(&inode->i_alloc_sem);
}
+
+cluster_skip_locking:
+
/*
* For file extending writes updating i_size before data
* writeouts complete can expose uninitialized blocks. So