public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Steve Lord <lord@xfs.org>
To: Miquel van Smoorenburg <miquels@cistron.nl>
Cc: nathans@sgi.com, hch@infradead.org, linux-kernel@vger.kernel.org
Subject: Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
Date: Sat, 31 Jan 2004 09:59:06 -0600	[thread overview]
Message-ID: <401BD0CA.5070204@xfs.org> (raw)
In-Reply-To: <20040131114638.GA29609@drinkel.cistron.nl>

Miquel van Smoorenburg wrote:
> On Sat, 31 Jan 2004 02:38:51, Andrew Morton wrote:
> 
>>Miquel van Smoorenburg <miquels@cistron.nl> wrote:
>>
>>>But the XFS problem appears to be vn_revalidate which calls i_size_write()
>>>without holding i_sem:
>>
>>There's your bug.
> 
> 
> Okay. It seems that XFS uses its own locking with xfs_ilock() etc,
> so I am not sure if this should be fixed by using down(&inode->i_sem)
> or by using xfs_ilock().
> 
> Perhaps xfs_ilock() should also get the inode->i_sem semaphore
> in the XFS_ILOCK_EXCL case ?
> 
> Also, there are one or two more places that call i_size_write()
> that should be looked at I guess.
> 

Actually it gets more messy than this. i_size_write is called by
different spots by xfs without the i_sem held. The call in
generic_file_aio_write_nolock() is made without the lock for O_DIRECT 
writes, XFS allows multiple parallel writes for O_DIRECT
writes.

The vn_revalidate call is called out of linvfs_setattr,
which is called with the i_sem held, it is also potentially called out
of linvfs_getattr, although since the i_size is always maintained
as it is changed, this call should not actually be updating the size.
Possibly changing the code in vn_revalidate to do this:

	if (i_size_read(inode) < va.va_size))
		i_size_write(inode, va.va_size);

Would be a good starting point, I suspect those calls from the nfs
revalidate call are not really going to change the inode size. My
guess is this will make your problem go away.

Probably some larger code restructure is needed so that revalidate
knows if the i_sem is held or not at this point.

The O_DIRECT write case is the hard one. In XFS's internal view of
the world, the inode size is maintained via the XFS_ILOCK, but we
only hold that across metadata manipulation within the fs code,
not across I/O such as a call to generic_file_aio_write_nolock.
Right now the only way I see of dealing with that is to make
writes which we know will extent the file hold the i_sem for
the duration in the O_DIRECT case.

Steve

  reply	other threads:[~2004-02-01 16:00 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-01-28 17:17 2.6.2-rc2 nfsd+xfs spins in i_size_read() Miquel van Smoorenburg
2004-01-29  6:25 ` Andrew Morton
2004-01-29 23:20   ` Miquel van Smoorenburg
2004-02-04  0:03     ` Christoph Hellwig
2004-02-03 14:13       ` Steve Lord
2004-02-04 15:16         ` Christoph Hellwig
2004-02-04 18:17         ` Christoph Hellwig
2004-02-04  0:06       ` David Weinehall
2004-02-04  0:07         ` Christoph Hellwig
2004-01-30 16:01   ` Miquel van Smoorenburg
2004-01-30 20:21   ` Miquel van Smoorenburg
2004-01-30 22:13     ` Miquel van Smoorenburg
2004-01-30 22:34       ` Andrew Morton
2004-01-30 22:53         ` Christoph Hellwig
2004-01-30 23:13           ` Andrew Morton
2004-01-31  1:25             ` Miquel van Smoorenburg
2004-01-31  1:38               ` Andrew Morton
2004-01-31 11:46                 ` Miquel van Smoorenburg
2004-01-31 15:59                   ` Steve Lord [this message]
2004-02-01 16:15                     ` Christoph Hellwig
2004-01-31 16:41                       ` Steve Lord
2004-01-31 17:07               ` Christoph Hellwig
2004-02-01  1:46               ` Anton Blanchard
2004-01-30 23:07         ` Nathan Scott
2004-01-29  6:30 ` Nathan Scott

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=401BD0CA.5070204@xfs.org \
    --to=lord@xfs.org \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miquels@cistron.nl \
    --cc=nathans@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox