From: Jens Axboe <jens.axboe@oracle.com>
To: Anton Blanchard <anton@samba.org>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Fix regression in O_DIRECT|O_SYNC writes to block devices
Date: Thu, 15 Apr 2010 12:04:15 +0200
Message-ID: <20100415100415.GU27497@kernel.dk>
In-Reply-To: <20100415044039.GJ11751@kryten>

On Thu, Apr 15 2010, Anton Blanchard wrote:
> 
> We are seeing a large regression in database performance on recent kernels.
> The database opens a block device with O_DIRECT|O_SYNC and a number of threads
> write to different regions of the file at the same time.
> 
> A simple test case is below. I haven't defined DEVICE to anything since getting
> it wrong will destroy your data :) On a 3 disk LVM with a 64k chunk size we
> see about 17MB/sec and only a few threads in IO wait:
> 
> procs  -----io---- -system-- -----cpu------
>  r  b     bi    bo   in   cs us sy id wa st
>  0  3      0 16170  656 2259  0  0 86 14  0
>  0  2      0 16704  695 2408  0  0 92  8  0
>  0  2      0 17308  744 2653  0  0 86 14  0
>  0  2      0 17933  759 2777  0  0 89 10  0
> 
> Most threads are blocking in vfs_fsync_range, which has:
> 
>         mutex_lock(&mapping->host->i_mutex);
>         err = fop->fsync(file, dentry, datasync);
>         if (!ret)
>                 ret = err;
>         mutex_unlock(&mapping->host->i_mutex);
> 
> Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new helpers for
> syncing after writing to O_SYNC file or IS_SYNC inode) offers some explanation
> of what is going on:
> 
>     Use these new helpers for syncing from generic VFS functions. This makes
>     O_SYNC writes to block devices acquire i_mutex for syncing. If we really
>     care about this, we can make block_fsync() drop the i_mutex and reacquire
>     it before it returns.
> 
> Thanks Jan for such a good commit message! The patch below drops the i_mutex
> in blkdev_fsync as suggested. With it the testcase improves from 17MB/sec to
> 68MB/sec:
> 
> procs  -----io---- -system-- -----cpu------
>  r  b     bi    bo   in   cs us sy id wa st
>  0  7      0 65536 1000 3878  0  0 70 30  0
>  0 34      0 69632 1016 3921  0  1 46 53  0
>  0 57      0 69632 1000 3921  0  0 55 45  0
>  0 53      0 69640  754 4111  0  0 81 19  0
> 
> I'd appreciate any comments from the I/O guys on whether this is the right
> approach.

Looks good to me, I see Jan already made a few style suggestions.

-- 
Jens Axboe

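Anton's test case is not quoted in this reply. As a rough sketch of the kind of
workload he describes (several threads writing to disjoint regions of one file
opened with O_DIRECT|O_SYNC), something like the following could be used. The
function name run_sync_writers(), the thread count, block size and region
layout are all assumptions made for illustration, not taken from the original
test. The path must already exist, and pointing it at a real block device will
overwrite data there.

```c
#define _GNU_SOURCE             /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* headers hide it: sketch degrades to plain O_SYNC */
#endif

#define NR_THREADS        4     /* hypothetical values, not from the original test */
#define BLOCK_SIZE        4096  /* assumed to satisfy O_DIRECT alignment rules */
#define WRITES_PER_THREAD 16
#define REGION_SIZE       ((off_t)WRITES_PER_THREAD * BLOCK_SIZE)

struct writer {
	int fd;
	off_t base;             /* start of this thread's private region */
	long written;           /* bytes written, or -1 on error */
};

static void *writer_thread(void *arg)
{
	struct writer *w = arg;
	void *buf;
	long total = 0;
	int i;

	/* O_DIRECT wants an aligned buffer, length and file offset. */
	if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE))
		return NULL;
	memset(buf, 0xab, BLOCK_SIZE);

	for (i = 0; i < WRITES_PER_THREAD; i++) {
		off_t pos = w->base + (off_t)i * BLOCK_SIZE;

		/* With O_SYNC, each write is followed by a sync in the kernel. */
		if (pwrite(w->fd, buf, BLOCK_SIZE, pos) != BLOCK_SIZE) {
			free(buf);
			return NULL;
		}
		total += BLOCK_SIZE;
	}
	free(buf);
	w->written = total;
	return NULL;
}

/* Returns the total number of bytes written, or -1 on any failure. */
long run_sync_writers(const char *path)
{
	pthread_t tid[NR_THREADS];
	struct writer w[NR_THREADS];
	long total = 0;
	int i, fd, started = 0;

	fd = open(path, O_WRONLY | O_DIRECT | O_SYNC);
	if (fd < 0 && errno == EINVAL)  /* e.g. a filesystem without O_DIRECT */
		fd = open(path, O_WRONLY | O_SYNC);
	if (fd < 0)
		return -1;

	for (i = 0; i < NR_THREADS; i++) {
		w[i].fd = fd;
		w[i].base = (off_t)i * REGION_SIZE;
		w[i].written = -1;
		if (pthread_create(&tid[i], NULL, writer_thread, &w[i]) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++) {
		pthread_join(tid[i], NULL);
		if (w[i].written < 0)
			total = -1;
		else if (total >= 0)
			total += w[i].written;
	}
	if (started < NR_THREADS)
		total = -1;
	close(fd);
	return total;
}
```

Calling run_sync_writers() on a scratch file while watching vmstat should give
a view comparable to the tables quoted above, although the absolute numbers
depend entirely on the storage underneath.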

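The locking change under discussion can be illustrated in userspace. The sketch
below is not the kernel patch: a pthread mutex stands in for i_mutex, a short
sleep stands in for the device flush, and every name in it (inode_lock,
slow_flush(), fsync_sketch(), peak_flush_concurrency()) is hypothetical. It
only shows the pattern the commit message suggests, where the callee drops the
caller's lock around the slow operation and reacquires it before returning.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_SYNCERS 16

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for i_mutex */
static atomic_int in_flush;     /* threads currently inside slow_flush() */
static atomic_int max_in_flush; /* highest concurrency observed so far */

/* Stand-in for the slow device flush: record concurrency, then sleep 20ms. */
static void slow_flush(void)
{
	struct timespec ts = { 0, 20 * 1000 * 1000 };
	int now = atomic_fetch_add(&in_flush, 1) + 1;
	int seen = atomic_load(&max_in_flush);

	/* Raise the recorded maximum; a failed CAS reloads 'seen'. */
	while (now > seen &&
	       !atomic_compare_exchange_weak(&max_in_flush, &seen, now))
		;
	nanosleep(&ts, NULL);
	atomic_fetch_sub(&in_flush, 1);
}

/* Called with inode_lock held, as ->fsync is in the quoted code. */
static void fsync_sketch(int drop_lock)
{
	if (drop_lock)
		pthread_mutex_unlock(&inode_lock);
	slow_flush();
	if (drop_lock)
		pthread_mutex_lock(&inode_lock); /* caller expects it held again */
}

static void *syncer(void *arg)
{
	int drop_lock = *(int *)arg;

	pthread_mutex_lock(&inode_lock);
	fsync_sketch(drop_lock);
	pthread_mutex_unlock(&inode_lock);
	return NULL;
}

/* Runs nthreads syncers and returns the peak concurrency inside slow_flush(). */
int peak_flush_concurrency(int drop_lock, int nthreads)
{
	pthread_t tid[MAX_SYNCERS];
	int i, started = 0;

	if (nthreads > MAX_SYNCERS)
		nthreads = MAX_SYNCERS;
	atomic_store(&in_flush, 0);
	atomic_store(&max_in_flush, 0);

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[i], NULL, syncer, &drop_lock) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&max_in_flush);
}
```

With drop_lock set to 0 the flushes are fully serialised, so the peak is always
1. With drop_lock set to 1 several threads can sit in the flush at once, which
is the same effect as the jump in the "b" column between the two vmstat samples
quoted above.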