From: Dmitry Monakhov <dmonakhov@openvz.org>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH,RFC] ext4: add lazytime mount option
Date: Fri, 14 Nov 2014 14:34:34 +0300
Message-ID: <87h9y2t3qt.fsf@openvz.org>
In-Reply-To: <20141113160710.GE5235@thunk.org>

Theodore Ts'o <tytso@mit.edu> writes:

> On Wed, Nov 12, 2014 at 04:47:42PM +0300, Dmitry Monakhov wrote:
>> Also, synchronous mtime updates are a great pain for an AIO submitter,
>> because AIO submission may be blocked for seconds (up to 5 seconds in my case)
>> if the inode is part of the currently committing transaction; see do_get_write_access
>
> 5 seconds?!?  So you're seeing cases where the jbd2 layer is taking
> that long to close a commit?  It might be worth looking at that so we
> can understand why that is happening, and to see if there's anything
> we might do to improve things on that front.  Even if we can get rid
> of most of the mtime updates, there will be other cases where a commit
> that takes a long time to complete will cause all sorts of other very
> nasty latencies on the entire system.
Our chunk server workload is quite generic:
submit_task: performs aio-dio requests to multiple chunk files from
             several threads; this task should not block for too long.
sync_task: performs fsync/fdatasync on demand for modified chunk files before
           we can ACK the write-op to the user; this task may block.


Here is the chunk server simulation load:
# TEST_CASE assumes that the target fs is mounted at /mnt
# Performs random aio-dio writes (bsz:64k) to preallocated files (size:128M), threads:32,
# and performs fdatasync on every 32nd write operation
$ fio ./aio-dio.fio
# Measure AIO-DIO write submission latency
$ dd if=/dev/zero of=/mnt/f bs=1M count=1
$ ioping -A  -C -D  -WWW /mnt/f
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=1 time=410 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=2 time=430 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=3 time=370 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=4 time=400 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=5 time=1.9 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=6 time=4.2 s 
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=7 time=3.8 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=8 time=3.7 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=9 time=4.1 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=10 time=1.9 s
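
To make the jump in the numbers above easier to see, here is a rough Python sketch (not part of the original test case) that parses ioping lines of this shape, converts each latency to seconds, and lists the requests that stalled. The function names and the 1-second threshold are illustrative assumptions; the sample lines are copied from the run above.

```python
import re

# Sample lines copied from the ioping run above.
SAMPLE = """\
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=1 time=410 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=2 time=430 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=3 time=370 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=4 time=400 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=5 time=1.9 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=6 time=4.2 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=7 time=3.8 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=8 time=3.7 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=9 time=4.1 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=10 time=1.9 s
"""

# Time units as they appear in the output above, mapped to seconds.
UNIT_TO_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
LINE_RE = re.compile(r"request=(\d+) time=([\d.]+) (ns|us|ms|s)\b")


def parse_latencies(text):
    """Return a list of (request_number, latency_in_seconds) tuples."""
    results = []
    for match in LINE_RE.finditer(text):
        request, value, unit = match.groups()
        results.append((int(request), float(value) * UNIT_TO_SECONDS[unit]))
    return results


def stalled_requests(latencies, threshold_s=1.0):
    """Return request numbers whose latency is at or above threshold_s."""
    return [request for request, seconds in latencies if seconds >= threshold_s]


if __name__ == "__main__":
    latencies = parse_latencies(SAMPLE)
    print("stalled requests:", stalled_requests(latencies))
    print("worst latency (s):", max(seconds for _, seconds in latencies))
```

With the sample above, every request from the fifth onward is at or above the threshold, which is the point where the submission latency goes from hundreds of microseconds to seconds.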
>
>> Yeah, we also have a ticket for that :)
>> https://jira.sw.ru/browse/PSBM-20411
>
> Is this supposed to be a URL to a publicly visible web page?
>
> 	Host jira.sw.ru not found: 3(NXDOMAIN)
Oh, unfortunately this host is not visible from outside.
>
>> > +	if (flags & S_VERSION)
>> > +		inode_inc_iversion(inode);
> 	  ....
>> Since we want to update all in-memory data, we also have to explicitly update inode->i_version,
>> which was previously updated implicitly here:
>> mark_inode_dirty_sync()
>> ->__mark_inode_dirty
>>   ->ext4_dirty_inode
>>     ->ext4_mark_inode_dirty
>>       ->ext4_mark_iloc_dirty
>>         ->inode_inc_iversion(inode);
>
> It's not necessary to add another call to inode_inc_iversion() since
> we already incremented the i_version if S_VERSION is set, and
> S_VERSION gets set when it's necessary to handle incrementing
> i_version.
>
> The inode_inc_iversion() in ext4_mark_iloc_dirty() is probably not
> necessary, since we already should be incrementing i_version whenever
> ctime and mtime get updated.  The inode_inc_iversion() there is more
> of a "belt and suspenders" safety thing, on the theory that the extra
> bump in i_version won't hurt anything.
>
> Cheers,
>
> 					- Ted
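
The i_version accounting discussed above can be illustrated with a small toy model in Python. This is only a sketch of the counting argument, not kernel code: ToyInode, the lazytime parameter, and the numeric value of S_VERSION are illustrative assumptions, and the functions only borrow the names of the kernel helpers mentioned in this thread.

```python
# Illustrative flag value; only its role as a bit flag matters in this toy.
S_VERSION = 8


class ToyInode:
    """Hypothetical stand-in for the in-memory inode; not the kernel struct."""

    def __init__(self):
        self.i_version = 0
        self.dirty = False


def inode_inc_iversion(inode):
    inode.i_version += 1


def ext4_mark_iloc_dirty(inode):
    # Toy stand-in for the end of the mark_inode_dirty_sync() call chain,
    # which carries the extra "belt and suspenders" bump.
    inode_inc_iversion(inode)
    inode.dirty = True


def update_time(inode, flags, lazytime):
    # Explicit bump when the caller asked for a version update.
    if flags & S_VERSION:
        inode_inc_iversion(inode)
    # Without lazytime the inode is dirtied right away, so the implicit bump
    # also happens; in this toy, lazytime simply skips that step.
    if not lazytime:
        ext4_mark_iloc_dirty(inode)


if __name__ == "__main__":
    eager, lazy = ToyInode(), ToyInode()
    update_time(eager, S_VERSION, lazytime=False)
    update_time(lazy, S_VERSION, lazytime=True)
    print("eager i_version:", eager.i_version)
    print("lazy i_version:", lazy.i_version)
```

In this model the S_VERSION check alone is enough to advance i_version on the lazy path, and the bump at the end of the dirtying chain only adds a second increment on the eager path.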
