From: Jens Axboe <jens.axboe@oracle.com>
To: Anton Blanchard <anton@samba.org>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Fix regression in O_DIRECT|O_SYNC writes to block devices
Date: Thu, 15 Apr 2010 12:04:15 +0200
Message-ID: <20100415100415.GU27497@kernel.dk>
In-Reply-To: <20100415044039.GJ11751@kryten>

On Thu, Apr 15 2010, Anton Blanchard wrote:
> 
> We are seeing a large regression in database performance on recent kernels.
> The database opens a block device with O_DIRECT|O_SYNC and a number of threads
> write to different regions of the file at the same time.
> 
> A simple test case is below. I haven't defined DEVICE to anything since getting
> it wrong will destroy your data :) On a 3-disk LVM with a 64k chunk size we
> see about 17MB/sec and only a few threads in IO wait:
> 
> procs  -----io---- -system-- -----cpu------
>  r  b     bi    bo   in   cs us sy id wa st
>  0  3      0 16170  656 2259  0  0 86 14  0
>  0  2      0 16704  695 2408  0  0 92  8  0
>  0  2      0 17308  744 2653  0  0 86 14  0
>  0  2      0 17933  759 2777  0  0 89 10  0
> 
> Most threads are blocking in vfs_fsync_range, which has:
> 
>         mutex_lock(&mapping->host->i_mutex);
>         err = fop->fsync(file, dentry, datasync);
>         if (!ret)
>                 ret = err;
>         mutex_unlock(&mapping->host->i_mutex);
> 
> Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new helpers for
> syncing after writing to O_SYNC file or IS_SYNC inode) offers some explanation
> of what is going on:
> 
>     Use these new helpers for syncing from generic VFS functions. This makes
>     O_SYNC writes to block devices acquire i_mutex for syncing. If we really
>     care about this, we can make block_fsync() drop the i_mutex and reacquire
>     it before it returns.
> 
> Thanks Jan for such a good commit message! The patch below drops the i_mutex
> in blkdev_fsync as suggested. With it the testcase improves from 17MB/sec to
> 68MB/sec:
> 
> procs  -----io---- -system-- -----cpu------
>  r  b     bi    bo   in   cs us sy id wa st
>  0  7      0 65536 1000 3878  0  0 70 30  0
>  0 34      0 69632 1016 3921  0  1 46 53  0
>  0 57      0 69632 1000 3921  0  0 55 45  0
>  0 53      0 69640  754 4111  0  0 81 19  0
> 
> I'd appreciate any comments from the I/O guys on whether this is the right approach.
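
For readers without the original test case at hand (it is not reproduced in
this message), the workload described above can be sketched roughly as
follows: several threads share one O_DIRECT|O_SYNC descriptor and each writes
to its own region. This is an illustration, not Anton's test case. The names
run_writers, BLOCK_SIZE and MAX_THREADS are hypothetical, and the sketch
assumes a scratch file path rather than a block device. It falls back to
O_SYNC alone where O_DIRECT is unavailable or rejected.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0               /* not exposed here: degrade to O_SYNC only */
#endif

#define BLOCK_SIZE  4096         /* assumed write size and buffer alignment */
#define MAX_THREADS 64           /* arbitrary cap for this sketch */

struct writer {
    int   fd;                    /* descriptor shared by all threads */
    off_t start;                 /* first byte of this thread's region */
    int   nblocks;               /* blocks to write in that region */
    long  written;               /* bytes written, or -1 on error */
};

static void *writer_thread(void *arg)
{
    struct writer *w = arg;
    void *buf = NULL;
    long total = 0;

    w->written = -1;
    /* O_DIRECT wants an aligned buffer, offset and length. */
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return NULL;
    memset(buf, 0xab, BLOCK_SIZE);

    for (int i = 0; i < w->nblocks; i++) {
        off_t off = w->start + (off_t)i * BLOCK_SIZE;

        /* With O_SYNC each pwrite() returns only after the data is synced. */
        if (pwrite(w->fd, buf, BLOCK_SIZE, off) != BLOCK_SIZE) {
            total = -1;
            break;
        }
        total += BLOCK_SIZE;
    }
    w->written = total;
    free(buf);
    return NULL;
}

/* Returns the total number of bytes written, or -1 on any failure. */
long run_writers(const char *path, int nthreads, int nblocks)
{
    struct writer w[MAX_THREADS];
    pthread_t tid[MAX_THREADS];
    long total = 0;
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS && nblocks > 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0600);
    if (fd < 0 && errno == EINVAL)   /* filesystem rejects O_DIRECT */
        fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    if (fd < 0)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        /* Give every thread a disjoint region of the file. */
        w[i] = (struct writer){ fd, (off_t)i * nblocks * BLOCK_SIZE,
                                nblocks, -1 };
        if (pthread_create(&tid[i], NULL, writer_thread, &w[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        if (w[i].written < 0 || total < 0)
            total = -1;
        else
            total += w[i].written;
    }
    close(fd);
    return started == nthreads ? total : -1;
}
```

Calling run_writers("/path/to/scratch-file", 8, 1024) from a small main() and
watching vmstat alongside it gives numbers in the same format as the tables
quoted above. Build with -pthread. Pointing it at a real block device
overwrites that device, so the warning in the quoted text applies here too.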

Looks good to me. I see Jan has already made a few style suggestions.
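
As the quoted text describes it, the patch has blkdev_fsync drop i_mutex
around the flush and retake it before returning to vfs_fsync_range, which
still expects to unlock it. A userspace analogue of that drop-and-reacquire
shape, using a pthread mutex, might look like the sketch below. It is not the
kernel patch: struct fake_inode, probe_flush, fsync_holding_lock,
fsync_dropping_lock and fake_vfs_fsync are all hypothetical names, and the
"flush" only probes whether the lock is free while it runs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for an inode; only the lock matters here. */
struct fake_inode {
    pthread_mutex_t i_mutex;
};

/*
 * Stand-in for the slow device flush. Returns 0 if i_mutex was free while
 * the flush ran (other syncers could proceed), or a negative errno value,
 * -EBUSY, if it was held.
 */
static int probe_flush(struct fake_inode *inode)
{
    int rc = pthread_mutex_trylock(&inode->i_mutex);

    if (rc != 0)
        return -rc;
    pthread_mutex_unlock(&inode->i_mutex);
    return 0;
}

/* Flush with the caller's lock still held: every syncer serializes here. */
int fsync_holding_lock(struct fake_inode *inode)
{
    return probe_flush(inode);
}

/*
 * Called with i_mutex held and returns with it held, but releases it for
 * the duration of the flush, which does not need it.
 */
int fsync_dropping_lock(struct fake_inode *inode)
{
    int rc, err;

    rc = pthread_mutex_unlock(&inode->i_mutex);
    assert(rc == 0);             /* the caller holds the lock */
    err = probe_flush(inode);
    rc = pthread_mutex_lock(&inode->i_mutex);
    assert(rc == 0);
    (void)rc;                    /* silence unused warning under NDEBUG */
    return err;
}

/* Same caller shape as the vfs_fsync_range snippet quoted above. */
int fake_vfs_fsync(struct fake_inode *inode,
                   int (*fsync_op)(struct fake_inode *))
{
    int err;

    pthread_mutex_lock(&inode->i_mutex);
    err = fsync_op(inode);
    pthread_mutex_unlock(&inode->i_mutex);
    return err;
}
```

With a default-initialized mutex, fake_vfs_fsync(&inode, fsync_holding_lock)
reports -EBUSY and fake_vfs_fsync(&inode, fsync_dropping_lock) reports 0,
which is the difference the patch is after: the lock is only held on entry
and exit, not across the flush itself.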

-- 
Jens Axboe


Thread overview: 8+ messages
2010-04-15  4:40 [PATCH] Fix regression in O_DIRECT|O_SYNC writes to block devices Anton Blanchard
2010-04-15  8:47 ` Jan Kara
2010-04-15 10:04 ` Jens Axboe [this message]
2010-04-15 10:42 ` Christoph Hellwig
2010-04-15 13:34   ` Jan Kara
2010-04-20  2:26   ` Anton Blanchard
2010-04-20  2:30   ` Anton Blanchard
2010-04-22 19:25     ` Jan Kara