Re: Plan for reducing i_mutex in ext4

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Allison Henderson <achender@linux.vnet.ibm.com>
To: Lukas Czerner <lczerner@redhat.com>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
	"Ted Ts'o" <tytso@mit.edu>, Christoph Hellwig <hch@infradead.org>
Subject: Re: Plan for reducing i_mutex in ext4
Date: Thu, 06 Oct 2011 10:36:46 -0700	[thread overview]
Message-ID: <4E8DE72E.2060103@linux.vnet.ibm.com> (raw)
In-Reply-To: <alpine.LFD.2.00.1110041024470.7833@dhcp-27-109.brq.redhat.com>

On 10/04/2011 01:38 AM, Lukas Czerner wrote:
> On Mon, 3 Oct 2011, Allison Henderson wrote:
>
>> Hi all,
>>
>> I've been working on locating all the existing uses of i_mutex in the current
>> ext4 code because I know we are planning to reduce the usage of i_mutex in
>> ext4.  So I've gone through the ext4 code and also the vfs code and come up
>> with a list of ext4 items that appear to be protected under i_mutex.  I'm
>> thinking about doing a patch to replace i_mutex with a private ext4 mutex, and
>> I wanted to update folks on this idea and pick up any feed back people might
>> have.
>>
>> I'm thinking maybe we can have a separate mutex for functions that only modify
>> meta data like ext4_ioctl and ext4_setattr to help relieve unneeded
>> contention.  And then the rest of functions that are modifying data can go
>> under a data mutex (including truncate since sometimes ext4_ioctl and
>> ext4_setattr will call ext4_truncate if they modify i_size).
>
> Just the other day I was talking with Christoph (adding him to cc) about
> this, but unfortunately I still did not have time to look at this, but I
> am glad that someone did.
>
> His suggestion was a bit more general than creating separate ext4
> specific mutex. His idea was to change i_mutex to union of plain mutex
> for directories and a rwlock for regular files. Then this union can be
> used in other file systems as well, for example to replace xfs_iolock in
> xfs.
>
> Also it might be nice to do something smarter than just a rwlock for
> regular files. It would be nice to have an structure of extent locks, so
> we can use it for file system using extents, which will improve
> scalability while hammering a single file from different processes.
>
> Note that currently ext4 concurrent read/write are atomic only wrt
> individual pages, but not on the system call as the whole. This might
> cause read() to return data mixed from several different writes, which
> is not posix conform. That could be solved with the generic rwlock for
> files, or even better with the system of extent locking.
>
> But Christoph, can probably describe hi idea a bit better.
>
> Thanks!
> -Lukas

Hi Lukas,

Sorry for the delay, and thanks for the response :)  Alrighty, I will 
have to do some prototyping and see if I can work in some of these 
concepts into a solution.  At the moment, Im trying to make sure I come 
up with something that still provides all the existing functionality so 
I dont introduce any new race problems, but there's certainly a lot of 
room for optimizing too.  Thx!

Allison Henderson
>
>>
>> So these are ext4 functions that currently lock i_mutex:
>>
>> ext4_sync_file
>> ext4_fallocate
>> ext4_move_extents via two helper routines:
>> 	mext_inode_double_lock and mext_inode_double_unlock
>> ext4_ioctl (for the EXT4_IOC_SETFLAGS ioctl)
>> ext4_quota_write
>> ext4_llseek
>> ext4_end_io_work
>> ext4_evict_inode (only while calling ext4_flush_completed_IO)
>> ext4_ind_direct_IO (only while calling ext4_flush_completed_IO)
>>
>>
>> And these are ext4 functions that have i_mutex locked by the vfs layer.  So we
>> will need to lock the new private mutex here too if we want them to be
>> synchronous with the above functions.
>>
>> ext4_setattr
>> ext4_da_writepages
>> ext4_rmdir
>> ext4_unlink
>> ext4_symlink
>> ext4_link
>> ext4_rename
>>
>> And one unique case:
>> ext4_fiemap calls generic_block_fiemap and passes it a function pointer to
>> ext4_get_block.  generic_block_fiemap will lock i_mutex before calling the
>> pointer.  I dont think ext4_get_block needs i_mutex locked all the time, so I
>> think we can just make a wrapper for ext4_get_block that locks the new private
>> mutex and then we can pass a pointer to the wrapper.
>>
>>
>> That's my list so far, if anyone knows of one I missed please let me know, and
>> also if you spot any other places where we can reduce unneeded contention by
>> using a separate lock.  Thx!
>>
>> Allison Henderson
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>

next prev parent reply	other threads:[~2011-10-06 17:45 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-03 19:00 Plan for reducing i_mutex in ext4 Allison Henderson
2011-10-04  8:38 ` Lukas Czerner
2011-10-06 17:36   ` Allison Henderson [this message]
2011-10-04  8:57 ` Dmitry Monakhov
2011-10-04 19:13   ` Allison Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E8DE72E.2060103@linux.vnet.ibm.com \
    --to=achender@linux.vnet.ibm.com \
    --cc=hch@infradead.org \
    --cc=lczerner@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.