From: Zheng Liu <gnehzuil.liu@gmail.com>
To: Allison Henderson <achender@linux.vnet.ibm.com>
Cc: Lukas Czerner <lczerner@redhat.com>, Tao Ma <tm@tao.ma>,
Ext4 Developers List <linux-ext4@vger.kernel.org>,
xfs@oss.sgi.com
Subject: Re: working on extent locks for i_mutex
Date: Wed, 18 Jan 2012 20:02:23 +0800
Message-ID: <20120118120223.GA4322@gmail.com>
In-Reply-To: <4F146275.8090304@linux.vnet.ibm.com>
On Mon, Jan 16, 2012 at 10:46:29AM -0700, Allison Henderson wrote:
> On 01/15/2012 04:57 PM, Dave Chinner wrote:
> >On Fri, Jan 13, 2012 at 01:50:52PM -0700, Allison Henderson wrote:
> >>On 01/12/2012 09:34 PM, Dave Chinner wrote:
> >>>On Thu, Jan 12, 2012 at 08:01:43PM -0700, Allison Henderson wrote:
> >>>>Hi All,
> >>>>
> >>>>I know this is an old topic, but I am poking it again because I've
> >>>>had some work items wrap up, and I'm planning on picking up on this
> >>>>one again. I am thinking about implementing extent locks to replace
> >>>>i_mutex. So I just wanted to touch base with folks and see what
> >>>>people are working on, because I know there were some folks out there
> >>>>that were thinking about doing similar solutions.
> >>>
> >>>What locking API are you looking at? If you are looking at
> >>>something like:
> >>>
> >>>read_range_{try}lock(lock, off, len)
> >>>read_range_unlock(lock, off, len)
> >>>write_range_{try}lock(lock, off, len)
> >>>write_range_unlock(lock, off, len)
> >>>
> >>>and implementing with an rbtree or a btree for tracking, then I
> >>>definitely have a use for it in XFS - replacing the current rwsem
> >>>that is used for the iolock. Range locks like this are the only
> >>>thing we need to allow concurrent buffered writes to the same file
> >>>while still maintaining the per-write exclusion that POSIX requires.
> >>
> >>Yes that is generally the idea I was thinking about doing, but at
> >>the time, I was not thinking outside the scope of ext4. You are
> >>thinking maybe it should be in the VFS layer so that it's something that
> >>all the filesystems will use? That seems to be the impression I'm
> >>getting from folks. Thx!
> >
> >Yes, that's what I'm suggesting. Not so much a vfs layer function,
> >but a library (range locks could be useful outside filesystems) so
> >locating it in lib/ was what I was thinking....
> >
> >Cheers,
> >
> >Dave.
>
> Alrighty, that sounds good to me. I will aim to keep it as
> general-purpose as I can. I am going to start some prototyping and will
> post back when I get something working. Thx for the feedback all!
> :)
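
(For context, below is a minimal userspace sketch of the read/write
range-lock API Dave outlines above. It is purely illustrative: the struct
and function names are made up, error handling and the trylock variants are
omitted, and a real lib/ implementation would track ranges in an rbtree or
btree with proper wait queues rather than a linked list and a single
condition variable. It does show the intended semantics: overlapping ranges
only block each other when at least one side is a writer.)

#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* One held range: [off, off + len), shared (read) or exclusive (write). */
struct range_entry {
        off_t off;
        off_t len;
        bool exclusive;
        struct range_entry *next;
};

/*
 * Toy range lock.  Initialize with:
 *   struct range_lock rl = { PTHREAD_MUTEX_INITIALIZER,
 *                            PTHREAD_COND_INITIALIZER, NULL };
 */
struct range_lock {
        pthread_mutex_t mtx;
        pthread_cond_t cond;
        struct range_entry *held;
};

static bool ranges_overlap(const struct range_entry *e, off_t off, off_t len)
{
        return off < e->off + e->len && e->off < off + len;
}

/* A new range conflicts with a held one only if they overlap and at
 * least one of the two is exclusive (a writer). */
static bool range_conflicts(struct range_lock *rl, off_t off, off_t len,
                            bool exclusive)
{
        const struct range_entry *e;

        for (e = rl->held; e; e = e->next)
                if (ranges_overlap(e, off, len) && (exclusive || e->exclusive))
                        return true;
        return false;
}

static void range_lock_common(struct range_lock *rl, off_t off, off_t len,
                              bool exclusive)
{
        struct range_entry *e = malloc(sizeof(*e));

        e->off = off;
        e->len = len;
        e->exclusive = exclusive;

        pthread_mutex_lock(&rl->mtx);
        while (range_conflicts(rl, off, len, exclusive))
                pthread_cond_wait(&rl->cond, &rl->mtx);
        e->next = rl->held;
        rl->held = e;
        pthread_mutex_unlock(&rl->mtx);
}

void read_range_lock(struct range_lock *rl, off_t off, off_t len)
{
        range_lock_common(rl, off, len, false);
}

void write_range_lock(struct range_lock *rl, off_t off, off_t len)
{
        range_lock_common(rl, off, len, true);
}

/* Drop the first held range matching [off, off + len).  Readers and
 * writers never hold overlapping ranges at the same time, so a single
 * unlock routine serves both sides. */
void range_unlock(struct range_lock *rl, off_t off, off_t len)
{
        struct range_entry **p, *e;

        pthread_mutex_lock(&rl->mtx);
        for (p = &rl->held; (e = *p) != NULL; p = &e->next) {
                if (e->off == off && e->len == len) {
                        *p = e->next;
                        free(e);
                        break;
                }
        }
        pthread_cond_broadcast(&rl->cond);
        pthread_mutex_unlock(&rl->mtx);
}
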
Hi Allison,

Do you have a schedule for this project? Would you like to share it with
me? This lock contention heavily impacts the performance of direct IO in
our production environment, so we hope to improve it as soon as possible.
I have done some direct IO benchmarks comparing ext4 with xfs, using fio
on an Intel SSD. The results show that, for direct IO, xfs outperforms
both ext4 and ext4 with dioread_nolock.

To understand the effect of the lock contention, I changed
ext4_file_aio_write() to call __generic_file_aio_write() without acquiring
the i_mutex lock. Meanwhile, I removed the DIO_LOCKING flag from the
__blockdev_direct_IO() call and ran the same benchmarks. With these
changes, ext4 performs almost the same as xfs, which shows that i_mutex
heavily impacts the performance. Hopefully the result is useful for
you. :-)
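
(To make the change concrete, here is a rough sketch of the sort of hack
described above, against a 3.x-era kernel. It is not the actual patch: the
name ext4_file_aio_write_nolock() is made up, and the body just mirrors
generic_file_aio_write() from mm/filemap.c with the i_mutex locking
dropped. It is for benchmarking only and is not safe for general use.)

/*
 * Illustrative only: take the generic write path without serializing
 * on i_mutex.  generic_file_aio_write() normally wraps the
 * __generic_file_aio_write() call below in
 * mutex_lock(&inode->i_mutex) / mutex_unlock(&inode->i_mutex).
 */
static ssize_t
ext4_file_aio_write_nolock(struct kiocb *iocb, const struct iovec *iov,
                           unsigned long nr_segs, loff_t pos)
{
        struct file *file = iocb->ki_filp;
        ssize_t ret;

        BUG_ON(iocb->ki_pos != pos);

        /* No i_mutex taken here; that is the whole point of the test. */
        ret = __generic_file_aio_write(iocb, iov, nr_segs, &iocb->ki_pos);

        if (ret > 0 || ret == -EIOCBQUEUED) {
                ssize_t err;

                err = generic_write_sync(file, pos, ret);
                if (err < 0 && ret > 0)
                        ret = err;
        }
        return ret;
}

The direct IO half of the experiment simply clears DIO_LOCKING from the
flags that ext4 passes down to __blockdev_direct_IO(), as described above.
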
I post the results below.

config file:
[global]
filesize=64G
size=64G
bs=16k
ioengine=psync
direct=1
filename=/mnt/ext4/benchmark
runtime=600
group_reporting
thread
[randrw]
numjobs=32
rw=randrw
rwmixread=90
result (each column is one run; values are read/write):

iops                       run 1              run 2              run 3
ext4                       5584/622           5726/636           5719/636
ext4+dioread_nolock        7105/789           7117/793           7129/795
ext4+dio_nolock            8920/992           8956/995           8976/997
xfs                        8726/971           8962/994           8975/998

bandwidth (KB/s)           run 1              run 2              run 3
ext4                       89359/9955.3       91621/10186        91519/10185
ext4+dioread_nolock        113691/12635       113882/12692       114066/12728
ext4+dio_nolock            142731/15888       143301/15930       143617/15959
xfs                        139627/15537       143400/15914       143603/15980

latency (usec)             run 1              run 2              run 3
ext4                       5163.28/5048.31    5037.81/4914.82    5041.49/4932.81
ext4+dioread_nolock        1220.04/29510.5    1213.67/29418.9    1208.77/29361.49
ext4+dio_nolock            3226.61/3194.35    3214.59/3178.09    3207.34/3173.78
xfs                        3299.87/3266.32    3213.73/3182.20    3208.16/3178.10
Regards,
Zheng
Thread overview: 9+ messages
[not found] <4F0F9E97.1090403@linux.vnet.ibm.com>
2012-01-13 4:34 ` working on extent locks for i_mutex Dave Chinner
2012-01-13 7:14 ` Tao Ma
2012-01-13 11:52 ` Dave Chinner
2012-01-13 11:57 ` Tao Ma
2012-01-13 20:50 ` Allison Henderson
2012-01-15 23:57 ` Dave Chinner
[not found] ` <4F146275.8090304@linux.vnet.ibm.com>
2012-01-18 12:02 ` Zheng Liu [this message]
2012-01-19 21:16 ` Frank Mayhar
2012-01-20 2:26 ` Zheng Liu