From: Dean <seattleplus-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Alexandre Depoutovitch
	<adepoutovitch-pghWNbHTmq7QT0dZR+AlfA@public.gmane.org>
Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH RFC v2] Performing direct I/O on sector-aligned requests
Date: Fri, 11 May 2012 11:36:52 -0700
Message-ID: <4FAD5C44.50407@gmail.com>
In-Reply-To: <1508773761.4854678.1335731939770.JavaMail.root-uUpdlAIx0AHkdGAVcyJ/gDSPNL9O62GLZeezCHUQhQ4@public.gmane.org>



On 4/29/12 2:03 PM, Alexandre Depoutovitch wrote:
> A new flag is exposed to users through the /proc/fs/nfsd/direct_io node.
> The default value of 1 results in the above behavior. Writing 0 to the
> node turns off direct I/O completely and forces the NFS daemon to always
> use buffered IO (as it has done before). Writing 2 to the node tells the
> NFS daemon to use direct I/O whenever possible, even if requests are
> aligned at the file system block boundary.

Not sure if this was previously discussed, but I assume the default 
would remain the same (value 0)?

Dean


>
> In order to test the patch, the following has been done: I deployed two
> Linux machines with the 3.0 kernel and my modifications. One acted as an
> NFS server, the other as an NFS client. The NFS volume was mounted in
> sync mode.
> The number of NFS daemons was increased to 64 in order to have a higher
> chance of catching concurrency issues. The volume was formatted with the
> ext4 file system and was located on a hardware RAID10 array of eight 10K
> 450GB SAS drives. The RAID adapter was an HP P410i.
>
> 1. During first set of experiments, the client machine created a 200 GB
> file by writing to it. Then it performed the following access patterns:
> Read, random (4K)
> Write, random (4K)
> Read, sequential (4K)
> Write, sequential (4K)
> Read, sequential (4K, first access at a 512-byte offset)
> Write, sequential (4K, first access at a 512-byte offset)
> Read, sequential (32K)
> Write, sequential (32K)
> Read, sequential (32K, first access at a 512-byte offset)
> Write, sequential (32K, first access at a 512-byte offset)
> Read, sequential (256K)
> Write, sequential (256K)
> All accesses were done while keeping 64 outstanding IO requests on the
> client. I compared performance of the above patterns on vanilla Linux and
> Linux with my changes. All numbers (IOPS, latency) were the same for all
> cases except for random writes, where IOPS increased 14 times.
>
> In addition, I have done several correctness tests.
>
> 2. Allocated three 200GB files using (a) explicit writes to a file, (b)
> the fallocate() system call, and (c) seeking to the end of the file and
> writing one sector there.
> Then I did random and sequential writes to the files. After that, I
> verified that the files were indeed modified and contained the latest
> data. The test for each file ran for 2 hours.
>
> 3. Allocated a 200GB file and started sequential reads to trigger the
> read-ahead mechanism. Every 100 read operations, one file-system-unaligned
> write immediately after the current read position was issued in order to
> trigger a direct write. After that, reading continued. All writes
> contained a predefined value, so that the reads could check for it. I did
> this in order to be sure that a direct write correctly invalidates the
> already cached in-memory data.
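
If I'm reading test (3) right, the client-side loop is roughly the sketch
below (my reconstruction for discussion, not code from the patch; run_test3
and the constants are made up, and the 512-byte offset is what makes the
write FS-block unaligned on ext4):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define CHUNK   4096
#define PATTERN 0x5A

static int run_test3(const char *path, long chunks)
{
    char buf[CHUNK], expect[CHUNK];
    int fd = open(path, O_RDWR);
    long i;

    if (fd < 0)
        return -1;
    memset(expect, PATTERN, sizeof(expect));

    for (i = 0; i < chunks; i++) {
        off_t pos = (off_t)i * CHUNK;

        /* every 100 reads, drop an unaligned write just past the
           current read position, then keep reading through it */
        if (i && i % 100 == 0 &&
            pwrite(fd, expect, 512, pos + CHUNK + 512) != 512)
            break;
        if (pread(fd, buf, CHUNK, pos) != CHUNK)
            break;
        /* chunks previously overwritten must now contain PATTERN;
           bookkeeping of the overwritten ranges omitted for brevity */
    }
    close(fd);
    return (i == chunks) ? 0 : -1;
}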
>
>
> The current implementation performs synchronous direct I/O and may incur
> higher latencies when the NFS volume is mounted in asynchronous mode. In
> order to avoid this, as per Trond Myklebust's suggestion, the iov_iter
> interface with asynchronous reads and writes can be used. This is why,
> currently, direct I/O can be enabled or disabled at boot or at run time,
> without an NFS server restart, through the /proc/fs/nfsd/direct_io node.
>
>
>
> --------------------------------------------------------------------------
>
> diff -uNr linux-orig/fs/direct-io.c linux-3.0.7-0.7.2.8796.vmw/fs/direct-io.c
> --- linux-orig/fs/direct-io.c   2011-10-24 14:06:32.000000000 -0400
> +++ linux-3.0.7-0.7.2.8796.vmw/fs/direct-io.c   2012-04-25 16:34:30.000000000 -0400
> @@ -152,11 +152,30 @@
>      int nr_pages;
>
>      nr_pages = min(dio->total_pages - dio->curr_page, DIO_PAGES);
> -   ret = get_user_pages_fast(
> -       dio->curr_user_address,     /* Where from? */
> -       nr_pages,           /* How many pages? */
> -       dio->rw == READ,        /* Write to memory? */
> -       &dio->pages[0]);        /* Put results here */
> +
> +   if (current->mm) {
> +       ret = get_user_pages_fast(
> +           dio->curr_user_address,     /* Where from? */
> +           nr_pages,           /* How many pages? */
> +           dio->rw == READ,        /* Write to memory? */
> +           &dio->pages[0]);        /* Put results here */
> +   } else {
> +       /* For kernel threads mm is NULL, so all we need is to increment
> +          the page's reference count and add the page to the dio->pages
> +          array */
> +       int i;
> +       struct page *page;
> +       unsigned long start_pfn = virt_to_phys((void *)dio->curr_user_address) >> PAGE_SHIFT;
> +       /* For kernel threads the buffer must be in kernel memory */
> +       BUG_ON(dio->curr_user_address < TASK_SIZE_MAX);
> +
> +       for (i = 0; i < nr_pages; i++) {
> +           page = pfn_to_page(start_pfn + i);
> +           page_cache_get(page);
> +           dio->pages[i] = page;
> +       }
> +       /* No need to lock the pages: this is a kernel thread and the
> +          pages are in kernel memory as well */
> +       ret = nr_pages;
> +   }
>
>      if (ret < 0 && dio->blocks_available && (dio->rw & WRITE)) {
>          struct page *page = ZERO_PAGE(0);
> @@ -972,7 +991,11 @@
>                  break;
>          }
>
> -       /* Drop the ref which was taken in get_user_pages() */
> +       /*
> +        * Drop the ref which was taken in dio_refill_pages: either
> +        * directly (for kernel-space buffers) or via
> +        * get_user_pages_fast (for user-space buffers)
> +        */
>          page_cache_release(page);
>          block_in_page = 0;
>      }
> diff -uNr linux-orig/fs/nfsd/lockd.c linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/lockd.c
> --- linux-orig/fs/nfsd/lockd.c  2011-10-24 14:06:32.000000000 -0400
> +++ linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/lockd.c  2012-03-28 15:40:29.000000000 -0400
> @@ -36,7 +36,7 @@
>      fh.fh_export = NULL;
>
>      exp_readlock();
> -   nfserr = nfsd_open(rqstp, &fh, S_IFREG, NFSD_MAY_LOCK, filp);
> +   nfserr = nfsd_open(rqstp, &fh, S_IFREG, NFSD_MAY_LOCK, filp, 0, 0);
>      fh_put(&fh);
>      exp_readunlock();
>      /* We return nlm error codes as nlm doesn't know
> diff -uNr linux-orig/fs/nfsd/nfs4state.c linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/nfs4state.c
> --- linux-orig/fs/nfsd/nfs4state.c  2011-10-24 14:06:32.000000000 -0400
> +++ linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/nfs4state.c  2012-03-28 15:40:29.000000000 -0400
> @@ -2557,7 +2557,7 @@
>
>      if (!fp->fi_fds[oflag]) {
>          status = nfsd_open(rqstp, cur_fh, S_IFREG, access,
> -               &fp->fi_fds[oflag]);
> +               &fp->fi_fds[oflag], 0, 0);
>          if (status)
>              return status;
>      }
> @@ -3951,7 +3951,7 @@
>      struct file *file;
>      int err;
>
> -   err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
> +   err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file, 0, 0);
>      if (err)
>          return err;
>      err = vfs_test_lock(file, lock);
> diff -uNr linux-orig/fs/nfsd/nfsctl.c linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/nfsctl.c
> --- linux-orig/fs/nfsd/nfsctl.c 2011-10-24 14:06:32.000000000 -0400
> +++ linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/nfsctl.c 2012-03-28 15:40:29.000000000 -0400
> @@ -46,6 +46,7 @@
>      NFSD_TempPorts,
>      NFSD_MaxBlkSize,
>      NFSD_SupportedEnctypes,
> +   NFSD_DirectIO,
>      /*
>       * The below MUST come last.  Otherwise we leave a hole in nfsd_files[]
>       * with !CONFIG_NFSD_V4 and simple_fill_super() goes oops
> @@ -78,6 +79,7 @@
>   static ssize_t write_ports(struct file *file, char *buf, size_t size);
>   static ssize_t write_temp_ports(struct file *file, char *buf, size_t size);
>   static ssize_t write_maxblksize(struct file *file, char *buf, size_t size);
> +static ssize_t write_directio(struct file *file, char *buf, size_t size);
>   #ifdef CONFIG_NFSD_V4
>   static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
>   static ssize_t write_gracetime(struct file *file, char *buf, size_t size);
> @@ -103,6 +105,7 @@
>      [NFSD_Ports] = write_ports,
>      [NFSD_TempPorts] = write_temp_ports,
>      [NFSD_MaxBlkSize] = write_maxblksize,
> +   [NFSD_DirectIO] = write_directio,
>   #ifdef CONFIG_NFSD_V4
>      [NFSD_Leasetime] = write_leasetime,
>      [NFSD_Gracetime] = write_gracetime,
> @@ -1348,6 +1351,58 @@
>                              nfsd_max_blksize);
>   }
>
> +int nfsd_directio_mode = DIO_NEVER;
> +
> +/**
> + * write_directio - sets the conditions under which direct IO is activated
> + *
> + * Input:
> + *         buf:        ignored
> + *         size:       zero
> + *
> + * OR
> + *
> + * Input:
> + *             buf:        C string containing an unsigned
> + *                     integer value representing the new
> + *                     NFS direct IO mode
> + *         size:       non-zero length of C string in @buf
> + * Output:
> + * On success: passed-in buffer filled with '\n'-terminated C string
> + *         containing numeric value of the current direct IO mode
> + *         return code is the size in bytes of the string
> + *
> + * Possible modes are:
> + *     DIO_NEVER (0) - never use direct I/O
> + *     DIO_FS_UNALIGNED (1) - use direct I/O only for requests that are
> + *         FS unaligned and block device aligned
> + *     DIO_BDEV_ALIGNED (2) - use direct I/O for all block device
> + *         aligned IO
> + * On error:   return code is zero or a negative errno value
> + */
> +static ssize_t write_directio(struct file *file, char *buf, size_t size)
> +{
> +   char *mesg = buf;
> +   if (size > 0) {
> +       int mode;
> +       int rv = get_int(&mesg, &mode);
> +       if (rv)
> +           return rv;
> +       if (mode < DIO_NEVER || mode > DIO_BDEV_ALIGNED)
> +           return -EINVAL;
> +       /*
> +        * There is no need for synchronization here. No harm is done
> +        * even if the mode changes between opening a file and choosing
> +        * the direct or buffered path: when we choose a path, we make
> +        * sure the file has been opened in a compatible mode.
> +        */
> +       nfsd_directio_mode = mode;
> +       printk(KERN_WARNING "NFSD direct I/O mode changed to %d.",
> +           nfsd_directio_mode);
> +   }
> +
> +   return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%d\n", nfsd_directio_mode);
> +}
> +
>   #ifdef CONFIG_NFSD_V4
>   static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size, time_t *time)
>   {
> @@ -1525,6 +1580,7 @@
>          [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
>          [NFSD_TempPorts] = {"tempportlist", &transaction_ops, S_IWUSR|S_IRUGO},
>          [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
> +       [NFSD_DirectIO] = {"direct_io", &transaction_ops, S_IWUSR|S_IRUGO},
>   #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE)
>          [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", &supported_enctypes_ops, S_IRUGO},
>   #endif /* CONFIG_SUNRPC_GSS or CONFIG_SUNRPC_GSS_MODULE */
> diff -uNr linux-orig/fs/nfsd/nfsd.h linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/nfsd.h
> --- linux-orig/fs/nfsd/nfsd.h   2011-10-24 14:06:32.000000000 -0400
> +++ linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/nfsd.h   2012-04-17 11:45:55.000000000 -0400
> @@ -68,6 +68,14 @@
>
>   extern int nfsd_max_blksize;
>
> +enum {
> +   DIO_NEVER = 0,          // Never use Direct I/O. The first value
> +   DIO_FS_UNALIGNED = 1,   // Use Direct I/O when the request is FS unaligned
> +   DIO_BDEV_ALIGNED = 2,   // Always use Direct I/O when possible. The last value
> +};
> +
> +extern int nfsd_directio_mode;
> +
>   static inline int nfsd_v4client(struct svc_rqst *rq)
>   {
>      return rq->rq_prog == NFS_PROGRAM && rq->rq_vers == 4;
> diff -uNr linux-orig/fs/nfsd/vfs.c linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/vfs.c
> --- linux-orig/fs/nfsd/vfs.c    2011-10-24 14:06:32.000000000 -0400
> +++ linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/vfs.c    2012-04-25 14:21:38.000000000 -0400
> @@ -28,6 +28,7 @@
>   #include <asm/uaccess.h>
>   #include <linux/exportfs.h>
>   #include <linux/writeback.h>
> +#include <linux/blkdev.h>
>
>   #ifdef CONFIG_NFSD_V3
>   #include "xdr3.h"
> @@ -718,6 +719,255 @@
>      return break_lease(inode, mode | O_NONBLOCK);
>   }
>
> +/*
> + Copies data between two iovec arrays. Individual array elements might
> + have different sizes, but the total size of the data described by the
> + two arrays must be the same.
> +*/
> +static int nfsd_copy_iovec(const struct iovec *svec, const unsigned int scount,
> +               struct iovec *dvec, const unsigned int dcount, size_t size)
> +{
> +   size_t cur_size, soff, doff, tocopy, srem, drem;
> +   unsigned int di, si;
> +
> +   cur_size = iov_length(svec, scount);
> +   if (cur_size != iov_length(dvec, dcount))
> +       return -EINVAL;
> +
> +   srem = drem = 0;
> +   di = si = 0;
> +   soff = doff = 0;
> +   while (cur_size > 0) {
> +       if (si >= scount || di >= dcount)
> +           return -EFAULT;
> +
> +       srem = svec[si].iov_len - soff;
> +       drem = dvec[di].iov_len - doff;
> +       tocopy = (srem > drem) ? drem : srem;
> +       memcpy((char *)(dvec[di].iov_base) + doff, (char *)(svec[si].iov_base) + soff, tocopy);
> +       cur_size -= tocopy;
> +       srem -= tocopy;
> +       drem -= tocopy;
> +       doff += tocopy;
> +       soff += tocopy;
> +       if (srem == 0) {
> +           si++;
> +           soff = 0;
> +       }
> +       if (drem == 0) {
> +           di++;
> +           doff = 0;
> +       }
> +   }
> +   if (si != scount || di != dcount || srem != 0 || drem != 0) {
> +       printk(KERN_WARNING "In copy_iovec: si=%lu, scount=%lu, di=%lu, dcount=%lu, srem=%lu, drem=%lu",
> +           (unsigned long)si, (unsigned long)scount, (unsigned long)di,
> +           (unsigned long)dcount, (unsigned long)srem, (unsigned long)drem);
> +       return -EFAULT;
> +   }
> +
> +   return 0;
> +}
> +
> +/*
> + Allocates an iovec array where each element has a page-aligned base
> + address and the size of one page. Needed for direct I/O to be possible
> + from this array.
> + */
> +static int nfsd_allocate_paged_iovec(size_t size, unsigned int *pcount,
> +       struct iovec **pvec)
> +{
> +   unsigned int i;
> +   unsigned int page_num = size / PAGE_SIZE;
> +   struct iovec *vec = NULL;
> +
> +   *pvec = NULL;
> +   *pcount = 0;
> +   if (page_num * PAGE_SIZE != size)
> +       page_num++;
> +
> +   vec = kmalloc(sizeof(struct iovec) * page_num, GFP_KERNEL);
> +   if (!vec)
> +       return -ENOMEM;
> +   memset(vec, 0, sizeof(struct iovec) * page_num);
> +   *pvec = vec;
> +   *pcount = page_num;
> +
> +   for (i = 0; i < page_num; i++) {
> +       vec[i].iov_base = (void *)__get_free_page(GFP_KERNEL);
> +       if (!vec[i].iov_base)
> +           return -ENOMEM;
> +       vec[i].iov_len = PAGE_SIZE;
> +   }
> +
> +   if (size % PAGE_SIZE)
> +       vec[page_num - 1].iov_len = size % PAGE_SIZE;
> +
> +   return 0;
> +}
> +
> +/*
> + Deallocates an iovec array allocated by nfsd_allocate_paged_iovec
> +*/
> +static void nfsd_free_paged_iovec(unsigned int count, struct iovec *vec)
> +{
> +   unsigned int i;
> +   if (vec) {
> +       for (i = 0; i < count; i++)
> +           if (vec[i].iov_base)
> +               free_page((unsigned long)(vec[i].iov_base));
> +       kfree(vec);
> +   }
> +}
> +
> +/*
> + Performs direct I/O for a given NFS write request
> +*/
> +static ssize_t nfsd_vfs_write_direct(struct file *file, const struct iovec *vec,
> +          unsigned long vlen, loff_t *pos)
> +{
> +   ssize_t result = -EINVAL;
> +   unsigned int page_num;
> +   struct iovec *aligned_vec = NULL;
> +
> +   // Total size of the request
> +   size_t size = iov_length(vec, vlen);
> +
> +   if (size == 0)
> +       return vfs_writev(file, (struct iovec __user *)vec, vlen, pos);
> +
> +   // Allocate the necessary number of pages
> +   result = nfsd_allocate_paged_iovec(size, &page_num, &aligned_vec);
> +   if (result) {
> +       printk(KERN_WARNING "Cannot allocate aligned_vec.");
> +       goto out;
> +   }
> +
> +   // Copy data
> +   result = nfsd_copy_iovec(vec, vlen, aligned_vec, page_num, size);
> +   if (result) {
> +       printk(KERN_WARNING "Wrong amount of data copied to aligned buffer.");
> +       goto out;
> +   }
> +
> +   // Call further
> +   result = vfs_writev(file, (struct iovec __user *)aligned_vec, page_num, pos);
> +
> +out:
> +   nfsd_free_paged_iovec(page_num, aligned_vec);
> +   return result;
> +}
> +
> +
> +/*
> + Performs direct I/O for a given NFS read request
> +*/
> +static ssize_t nfsd_vfs_read_direct(struct file *file, struct iovec *vec,
> +          unsigned long vlen, loff_t *pos)
> +{
> +   unsigned int page_num;
> +   struct iovec *aligned_vec = NULL;
> +   ssize_t result = -EINVAL;
> +   size_t size;
> +
> +   // Total size of the request
> +   size = iov_length(vec, vlen);
> +
> +   if (size == 0)
> +       return vfs_readv(file, (struct iovec __user *)vec, vlen, pos);
> +
> +   // Allocate the necessary number of pages
> +   result = nfsd_allocate_paged_iovec(size, &page_num, &aligned_vec);
> +   if (result) {
> +       printk(KERN_WARNING "Cannot allocate aligned_vec.");
> +       goto out;
> +   }
> +
> +   // Call further
> +   result = vfs_readv(file, (struct iovec __user *)aligned_vec, page_num, pos);
> +   if (result < 0) {
> +       printk(KERN_WARNING "Error during read operation.");
> +       goto out;
> +   }
> +
> +   // Copy data back
> +   if (nfsd_copy_iovec(aligned_vec, page_num, vec, vlen, size)) {
> +       printk(KERN_WARNING "Wrong amount of data copied from aligned buffer.");
> +       result = -EFAULT;
> +       goto out;
> +   }
> +
> +out:
> +   nfsd_free_paged_iovec(page_num, aligned_vec);
> +
> +   return result;
> +}
> +
> +// Returns the number of trailing zero bits of a number (its alignment)
> +static unsigned int get_alignment(loff_t n)
> +{
> +   unsigned int i = 0;
> +   if (n == 0)
> +       return (unsigned int)-1; // 0 is aligned to any boundary
> +   while ((n & 1) == 0 && n > 0) {
> +       n = n >> 1;
> +       i++;
> +   }
> +   return i;
> +}
> +
> +// Returns the alignment of an I/O request
> +static unsigned int io_alignment(const loff_t offset,
> +       const unsigned long size)
> +{
> +   unsigned int i1, i2;
> +
> +   i1 = get_alignment(offset);
> +   i2 = get_alignment(size);
> +
> +   return i1 > i2 ? i2 : i1;
> +}
> +
> +
> +/*
> + Based on the I/O request and file system parameters determines if
> + direct I/O can be used to perform the given request
> + Either file or sb are needed to retrieve file system and device
> + paramters
> +*/
> +static int can_use_direct_io(const struct file *file,
> +       const struct super_block* sb,
> +       const loff_t offset, const unsigned long size) {
> +   unsigned int blkbits = 0;
> +   struct inode *inode;
> +   unsigned int fsblkbits = 0;
> +   unsigned int alignment  = io_alignment(offset, size);
> +
> +   if (alignment == 0)
> +       return 0;
> +
> +   if (file == NULL&&  sb == NULL)
> +       return 0;
> +
> +   if (nfsd_directio_mode == DIO_NEVER)
> +       return 0;
> +
> +   if (file != NULL&&  sb == NULL) {
> +       inode = file->f_path.dentry->d_inode;
> +       sb = inode->i_sb;
> +       fsblkbits = inode->i_blkbits;
> +   }
> +
> +   if (sb !=NULL) {
> +       blkbits = sb->s_blocksize_bits;
> +       fsblkbits = sb->s_blocksize_bits;
> +       if (sb->s_bdev)
> +           blkbits = blksize_bits(bdev_logical_block_size(sb->s_bdev));
> +   } else
> +       blkbits = fsblkbits;
> +
> +   if (alignment>= fsblkbits&&  fsblkbits>  0&&  nfsd_directio_mode !=
> DIO_BDEV_ALIGNED)
> +       return 0;
> +
> +   if (alignment<  blkbits)
> +       return 0;
> +
> +   return 1;
> +}
> +
> +
>   /*
>    * Open an existing file or directory.
>    * The access argument indicates the type of open (read/write/lock)
> @@ -725,13 +975,15 @@
>    */
>   __be32
>   nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> -           int access, struct file **filp)
> +           int access, struct file **filp,
> +           const loff_t offset, const unsigned long size)
>   {
>      struct dentry   *dentry;
>      struct inode    *inode;
>      int     flags = O_RDONLY|O_LARGEFILE;
>      __be32      err;
>      int     host_err = 0;
> +   struct super_block *sb;
>
>      validate_process_creds();
>
> @@ -774,6 +1026,11 @@
>          else
>              flags = O_WRONLY|O_LARGEFILE;
>      }
> +
> +   sb = fhp->fh_export->ex_path.mnt->mnt_sb;
> +   if (size && can_use_direct_io(NULL, sb, offset, size))
> +       flags |= O_DIRECT;
> +
>      *filp = dentry_open(dget(dentry), mntget(fhp->fh_export->ex_path.mnt),
>                  flags, current_cred());
>      if (IS_ERR(*filp))
> @@ -885,8 +1142,10 @@
>      return __splice_from_pipe(pipe, sd, nfsd_splice_actor);
>   }
>
> +
> +
>   static __be32
> -nfsd_vfs_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
> + nfsd_vfs_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
>                 loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
>   {
>      mm_segment_t    oldfs;
> @@ -899,21 +1158,29 @@
>      if (rqstp->rq_vers >= 3)
>          file->f_flags |= O_NONBLOCK;
>
> -   if (file->f_op->splice_read && rqstp->rq_splice_ok) {
> -       struct splice_desc sd = {
> -           .len        = 0,
> -           .total_len  = *count,
> -           .pos        = offset,
> -           .u.data     = rqstp,
> -       };
> -
> -       rqstp->rq_resused = 1;
> -       host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
> -   } else {
> +   if (file->f_flags & O_DIRECT) {
> +       // So far we do not support splice IO, so always do regular
>          oldfs = get_fs();
>          set_fs(KERNEL_DS);
> -       host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
> +       host_err = nfsd_vfs_read_direct(file, (struct iovec *)vec, vlen, &offset);
>          set_fs(oldfs);
> +   } else {
> +       if (file->f_op->splice_read && rqstp->rq_splice_ok) {
> +           struct splice_desc sd = {
> +               .len        = 0,
> +               .total_len  = *count,
> +               .pos        = offset,
> +               .u.data     = rqstp,
> +           };
> +
> +           rqstp->rq_resused = 1;
> +           host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
> +       } else {
> +           oldfs = get_fs();
> +           set_fs(KERNEL_DS);
> +           host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
> +           set_fs(oldfs);
> +       }
>      }
>
>      if (host_err >= 0) {
> @@ -1024,7 +1291,11 @@
>
>      /* Write the data. */
>      oldfs = get_fs(); set_fs(KERNEL_DS);
> -   host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &offset);
> +   if (file->f_flags & O_DIRECT)
> +       host_err = nfsd_vfs_write_direct(file, (struct iovec *)vec, vlen, &offset);
> +   else
> +       host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &offset);
> +
>      set_fs(oldfs);
>      if (host_err < 0)
>          goto out_nfserr;
> @@ -1064,8 +1335,9 @@
>      struct inode *inode;
>      struct raparms  *ra;
>      __be32 err;
> +   unsigned long size = iov_length((struct iovec *)vec, vlen);
>
> -   err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
> +   err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file, offset, size);
>      if (err)
>          return err;
>
> @@ -1133,7 +1405,8 @@
>          err = nfsd_vfs_write(rqstp, fhp, file, offset, vec, vlen, cnt,
>                  stablep);
>      } else {
> -       err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
> +       unsigned long size = iov_length((struct iovec *)vec, vlen);
> +       err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file, offset, size);
>          if (err)
>              goto out;
>
> @@ -1173,7 +1446,7 @@
>      }
>
>      err = nfsd_open(rqstp, fhp, S_IFREG,
> -           NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &file);
> +           NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &file, 0, 0);
>      if (err)
>          goto out;
>      if (EX_ISSYNC(fhp->fh_export)) {
> @@ -2018,7 +2291,7 @@
>      struct file *file;
>      loff_t      offset = *offsetp;
>
> -   err = nfsd_open(rqstp, fhp, S_IFDIR, NFSD_MAY_READ, &file);
> +   err = nfsd_open(rqstp, fhp, S_IFDIR, NFSD_MAY_READ, &file, 0, 0);
>      if (err)
>          goto out;
>
> diff -uNr linux-orig/fs/nfsd/vfs.h linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/vfs.h
> --- linux-orig/fs/nfsd/vfs.h    2011-10-24 14:06:32.000000000 -0400
> +++ linux-3.0.7-0.7.2.8796.vmw/fs/nfsd/vfs.h    2012-03-28 15:40:29.000000000 -0400
> @@ -66,7 +66,7 @@
>                  loff_t, unsigned long);
>   #endif /* CONFIG_NFSD_V3 */
>   __be32     nfsd_open(struct svc_rqst *, struct svc_fh *, int,
> -               int, struct file **);
> +               int, struct file **, const loff_t, const unsigned long);
>   void       nfsd_close(struct file *);
>   __be32         nfsd_read(struct svc_rqst *, struct svc_fh *,
>                  loff_t, struct kvec *, int, unsigned long *);
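
One more note on the alignment check: since everything hinges on
io_alignment(), here is the patch's helper pair pulled out into a
stand-alone user-space harness (same logic as the patch, just long long in
place of loff_t) with a few worked cases:

#include <stdio.h>

/* number of trailing zero bits; the patch treats this as "alignment" */
static unsigned int get_alignment(long long n)
{
    unsigned int i = 0;

    if (n == 0)
        return (unsigned int)-1;    /* 0 is aligned to any boundary */
    while ((n & 1) == 0 && n > 0) {
        n >>= 1;
        i++;
    }
    return i;
}

/* a request is only as aligned as the worse of its offset and size */
static unsigned int io_alignment(long long offset, unsigned long size)
{
    unsigned int i1 = get_alignment(offset);
    unsigned int i2 = get_alignment(size);

    return i1 > i2 ? i2 : i1;
}

int main(void)
{
    /* 4K at 4K offset: 12 bits -- FS-block aligned on 4K ext4, so it
       stays buffered in mode 1 and goes direct only in mode 2 */
    printf("%u\n", io_alignment(4096, 4096));
    /* 4K at a 512-byte offset: 9 bits -- sector aligned but FS
       unaligned, exactly the case mode 1 sends through direct I/O */
    printf("%u\n", io_alignment(512, 4096));
    /* odd offset/size: 0 bits -- never direct */
    printf("%u\n", io_alignment(513, 1));
    return 0;
}

That matches the checks in can_use_direct_io() above: the request must be
at least device-block aligned, and in mode 1 it must additionally fall
below the file system block alignment, otherwise it stays on the buffered
path.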