All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Mahoney <jeffm@suse.com>
To: Jan Kara <jack@suse.cz>
Cc: ReiserFS List <reiserfs-list@namesys.com>,
	"E. Gryaznova" <grev@namesys.com>
Subject: Re: [PATCH 00/11] reiserfs: xattr rework
Date: Thu, 02 Mar 2006 16:53:57 -0500	[thread overview]
Message-ID: <44076975.70609@suse.com> (raw)
In-Reply-To: <20060301123406.GA20367@atrey.karlin.mff.cuni.cz>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jan Kara wrote:
>   Hello,
> 
>> Following this message is a series of 11 patches that rework the reiserfs
>> xattr code. The current implementation open codes a number of functions that
>> are well tested and stable elsewhere, but will slight modifications. It also
>> does a number of things suboptimally, such as operations that affect all of
>> the xattrs associated with an inode, as well as not handling journalling as
>> well as can be.
>>
>> I've run them through a weekend of stress testing successfully, but I'd like
>> some additional testing before considering them safe.
>>
>> Here's the run down:
>>
>> * 01 - Make internal xattr operations use vfs ops
>>   This eliminates the open coding of the read/write loops in favor of using
>>   vfs_read and vfs_write. Performance-wise, it's very little difference
>>   from the open coding, and implementation-wise, it's more tested. Yes, this
>>   violates the no-file-io-in-the-kernel rules, but this rule has been violated
>>   since the day the original patches were accepted.
>   I'm not completely sure but from briefly looking at the patches I
> think there might be the following problem: you first start a transaction
> and then call a VFS write operation. That will lock a page it wants to
> write. But if bdflush works on the xattr file and sends some dirty data
> to disk, it will first take page lock and then start the transaction.
> This introduces essentially a lock inversion (as starting a transaction
> behaves like taking a lock) and hence deadlocks... I've been solving
> these for the quota code some time ago (quota also has similar needs as
> your xattr code) - I really had some deadlock reports for heavily loaded
> machines. There the only reasonable solution was an extra write
> functions bypassing the page cache. Maybe in your case you can solve the
> problems differently as you don't need the working solution for all the
> filesystems but just for Reiserfs. 

Sigh. Ok. The way you describe it definitely makes it a lock inversion
issue. I haven't run into it yet, but as you say, it occurs on heavily
loaded machines. I've done some load testing, but apparently not enough,
since your analysis is sound.

But, I think there is a silver lining after all. It sounds like you've
already worked around these issues for the journaled quota code. What do
you think about turning the quota read/write functions into something
more generic and using that for xattrs as well as quotas?

Ultimately, I think that quota files and xattrs are the same things -
"normal" files read from and written to during the I/O path. The changes
to journaled quotas would be minimal - just turn
reiserfs_quota_{read,write} into small wrappers that make the
type->inode mapping and then call the remainder of the existing code as
reiserfs_internal_{read,write} with the appropriate inode. The xattr
code could juse the reiserfs_internal_{read,write} similarly and get all
the deadlock avoidance work you've already done for free.

- -Jeff

- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEB2l0LPWxlyuTD7IRAsuaAKCHnEKlcgxcIGEV5n5f3m4CY5c/pACeK4CH
h1cj7wLkY+irYbaagmUHtm4=
=o4nE
-----END PGP SIGNATURE-----

  reply	other threads:[~2006-03-02 21:53 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-20 20:14 [PATCH 00/11] reiserfs: xattr rework Jeff Mahoney
2006-03-01 12:34 ` Jan Kara
2006-03-02 21:53   ` Jeff Mahoney [this message]
2006-03-03  0:29     ` Jan Kara
2006-03-03  0:56       ` Jeff Mahoney
2006-03-03 10:04         ` Jan Kara
2006-03-03 22:17           ` Jeff Mahoney
2006-03-06 11:59             ` Jan Kara
2006-03-07 21:39               ` Jeff Mahoney
2006-03-08 18:20                 ` Jan Kara
2006-03-08 19:12                   ` Jeff Mahoney
2006-03-08 23:14                     ` Andreas Dilger
2006-03-09 18:26                       ` Hans Reiser
2006-03-09 19:11                         ` Jeff Mahoney
2006-03-09 19:52                           ` Hans Reiser

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44076975.70609@suse.com \
    --to=jeffm@suse.com \
    --cc=grev@namesys.com \
    --cc=jack@suse.cz \
    --cc=reiserfs-list@namesys.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.