linux-fsdevel.vger.kernel.org archive mirror
From: David Chinner <dgc@sgi.com>
To: Zach Brown <zach.brown@oracle.com>
Cc: David Chinner <dgc@sgi.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: correct use of vmtruncate()?
Date: Wed, 30 Apr 2008 07:52:07 +1000
Message-ID: <20080429215207.GT108924158@sgi.com>
In-Reply-To: <481756A3.20601@oracle.com>

On Tue, Apr 29, 2008 at 10:10:59AM -0700, Zach Brown wrote:
> 
> > The obvious fix for this is that block_write_begin() and
> > friends should be calling ->setattr to do the truncation and hence
> > follow normal convention for truncating blocks off an inode.
> > However, even that appears to have thorns. e.g. in XFS we hold the
> > iolock exclusively when we call block_write_begin(), but it is not
> > held in all cases where ->setattr is currently called. Hence calling
> > ->setattr from block_write_begin in this failure case will deadlock
> > unless we also pass a "nolock" flag as well. XFS already
> > supports this (e.g. see the XFS fallocate implementation) but no other
> > filesystem does (some probably don't need to).
> 
> This paragraph in particular reminds me of an outstanding bug with
> O_DIRECT and ext*.  It isn't truncating partial allocations when a dio
> fails with ENOSPC.  This was noticed by a user who saw that fsck found
> blocks outside i_size in the file that saw ENOSPC if they tried to
> unmount and check the volume after the failed write.

That sounds very similar. ENOSPC seems to be one "easy" way of
triggering the error that exposes this problem, but I'm sure there
are others as well...

> So, whether we decide that failed writes should call setattr or
> vmtruncate, we should also keep the generic O_DIRECT path in
> consideration.  Today it doesn't even try the supposed generic method of
> calling vmtruncate().

Agreed, though the locking in this path will certainly be
entertaining....
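A minimal userspace sketch of the locking problem quoted above, assuming a pthread mutex stands in for the XFS iolock: a hypothetical write path takes the lock exclusively, and when the write fails it trims the blocks it allocated past the old EOF by calling a hypothetical setattr-style helper with a "nolock" flag, so the helper does not try to take the lock a second time. All names here (toy_inode, toy_setattr_size, TOY_SETATTR_NOLOCK, toy_write) are illustrative only; they are not XFS or VFS interfaces.

```c
#include <pthread.h>

/* Hypothetical flag: the caller already holds iolock, so don't take it again. */
#define TOY_SETATTR_NOLOCK 0x1

/* Toy stand-in for an inode; not a real kernel structure. */
struct toy_inode {
    pthread_mutex_t iolock;   /* stands in for the XFS iolock */
    long long i_size;         /* file size (EOF) */
    long long alloc_end;      /* end offset of blocks allocated to the file */
};

/* setattr-style size change: set i_size and free any blocks beyond it. */
static int toy_setattr_size(struct toy_inode *ip, long long newsize, int flags)
{
    if (!(flags & TOY_SETATTR_NOLOCK))
        pthread_mutex_lock(&ip->iolock);

    ip->i_size = newsize;
    if (ip->alloc_end > newsize)
        ip->alloc_end = newsize;      /* "truncate" blocks past the new EOF */

    if (!(flags & TOY_SETATTR_NOLOCK))
        pthread_mutex_unlock(&ip->iolock);
    return 0;
}

/* Write path: holds iolock exclusively for the whole operation. */
static int toy_write(struct toy_inode *ip, long long pos, long long len,
                     int simulate_failure)
{
    int err = 0;

    pthread_mutex_lock(&ip->iolock);
    long long old_size = ip->i_size;

    if (pos + len > ip->alloc_end)
        ip->alloc_end = pos + len;    /* allocate blocks for the write */

    if (simulate_failure) {
        /*
         * The copy failed after allocation: trim the blocks past the old
         * EOF. Without TOY_SETATTR_NOLOCK the helper would try to lock
         * iolock again while this thread already holds it, which for a
         * default pthread mutex is undefined behaviour (in practice
         * usually a self-deadlock).
         */
        toy_setattr_size(ip, old_size, TOY_SETATTR_NOLOCK);
        err = -1;
    } else if (pos + len > ip->i_size) {
        ip->i_size = pos + len;
    }

    pthread_mutex_unlock(&ip->iolock);
    return err;
}
```

The flag only works because the caller promises it already holds the lock. As noted above, other filesystems' ->setattr implementations have no equivalent flag, which is why this is not a drop-in fix for block_write_begin().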

> (Though I'm sure XFS' dio code already handles freeing blocks :))

Not the dio code as such, but the close path does. Blocks beyond EOF get
truncated off in ->release or ->clear_inode (unless they were specifically
preallocated), and because dio does not use delayed allocation, it does not
need ->setattr to truncate them away on ENOSPC. That is, after the error
occurs and the application closes the fd, the blocks get truncated away.

Basically, the problem I described is that delayed allocation blocks are left
beyond EOF without any page cache mappings to indicate they are there;
allocated blocks beyond EOF are not a problem...
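A toy userspace model of the cleanup described above, under simplifying assumptions: a direct write allocates real blocks immediately, a failure (standing in for ENOSPC, and simplified so the whole range is still allocated) leaves them beyond EOF, and a hypothetical release hook trims them later unless they were explicitly preallocated. The names toy_file, toy_dio_write and toy_release are illustrative; this is not how XFS implements ->release or ->clear_inode, and it deliberately does not model delayed-allocation blocks, which are the problematic case.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of one file; not a real XFS structure. */
struct toy_file {
    long long i_size;     /* EOF as seen by applications */
    long long alloc_end;  /* end offset of real (non-delalloc) blocks */
    bool prealloc;        /* blocks past EOF were explicitly preallocated */
};

/*
 * Direct I/O allocates real blocks up front (no delayed allocation).
 * If the write then fails, the blocks stay allocated beyond EOF.
 * Returns 0 or a negative errno, kernel style.
 */
static int toy_dio_write(struct toy_file *f, long long pos, long long len,
                         bool fail_enospc)
{
    if (pos + len > f->alloc_end)
        f->alloc_end = pos + len;
    if (fail_enospc)
        return -ENOSPC;           /* i_size is not extended */
    if (pos + len > f->i_size)
        f->i_size = pos + len;
    return 0;
}

/* Release-time cleanup: trim blocks past EOF unless they were preallocated. */
static void toy_release(struct toy_file *f)
{
    if (!f->prealloc && f->alloc_end > f->i_size)
        f->alloc_end = f->i_size;
}
```

Because the stale blocks are real allocations with a known extent, the close path can find and free them later; a delayed-allocation reservation with no page cache behind it offers no such handle.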

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Thread overview: 8+ messages
2008-04-29 10:06 correct use of vmtruncate()? David Chinner
2008-04-29 17:10 ` Zach Brown
2008-04-29 21:52   ` David Chinner [this message]
2008-04-30  7:24   ` Aneesh Kumar K.V
2008-04-30 15:55     ` Zach Brown
2008-04-30  3:46 ` David Chinner
2008-04-30  7:47 ` Aneesh Kumar K.V
2008-04-30 10:15   ` David Chinner
