Re: [RFC PATCH 3/3] DIO: don't fall back to buffered writes

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com
Subject: Re: [RFC PATCH 3/3] DIO: don't fall back to buffered writes
Date: Fri, 3 Nov 2006 12:55:45 +0530	[thread overview]
Message-ID: <20061103072545.GB13999@in.ibm.com> (raw)
In-Reply-To: <20061101124719.GG5620@think.oraclecorp.com>

On Wed, Nov 01, 2006 at 07:47:19AM -0500, Chris Mason wrote:
> On Wed, Nov 01, 2006 at 04:21:37PM +0530, Suparna Bhattacharya wrote:
> > > +	/*
> > > +	 * extend the file if needed, and drop i_mutex for non-aio writes.
> > > +	 * We extend the file by creating a hole that is later filled in
> > > +	 * during the O_DIRECT.  Because pages are locked or placeholders
> > > +	 * are inserted, the locking rules end up being the same as
> > > +	 * mmap'd writes using writepage to fill holes
> > 
> > I like the idea of extending the size in the beginning to avoid having
> > to hold i_sem for the extended writes case. The only question is whether
> > there is a semantics issue here - e.g. if there isn't enough space, would the
> > app which did an append still see the size change even though the blocks
> > themselves are not populated ? Of course we already have this happen in the
> > -EIO case for AIO-DIO writes, no am not sure if this is an issue.
> 
> You're right, there is a semantic issue, I'm being a little loose with
> the traditional write() expectations.  The current code creates a hole
> and does not remove it in the face of errors.
> 
> > 
> > > +	 */
> > > +	dio->reacquire_i_mutex = 0;
> > > +	if (dio_lock_type == DIO_LOCKING && (rw & WRITE)) {
> > 
> > Hmm, we really do want to be able to avoid all these various locking
> > mode checks inside the DIO code to simplify things. I thought you
> > had mostly managed to do away with these in your previous patch. 
> > Unfortunately this change while it should help concurrency, brings
> > that check back in. One of the plans I had earlier was to pull this
> > up to a higher level - i.e. in the locking version of blockdev_direct_IO()
> > thus doing away with the flag and related confusion. Do you think
> > that would make sense ?
> 
> I went the other direction, making the flags a little more formal (patch
> will go out today).  I don't have strong feelings about this, but it
> seems easiest to me to keep the flags inside blockdev_direct_IO.

Do we have some kind of description of what conditions each flags would
be used / who would use them (not just what they do). For example I
have forgotton why DIO_CREATE was actually introduced for XFS. Is it
still needed now that we no longer needed to do the buffered IO fallback
for overwriting holes ? 

That would help the discussion.

And then there used to be special flags for cluster
filesystems which was introduced by GFS ... all of it got to be quite
confusing. The formality helps, but only to an extent.

> 
> > 
> > 
> > > +		/* if our write goes past i_size, do an expanding
> > > +		 * truncate to fill it before dropping i_mutex
> > > +		 */
> > > +		if (end > i_size_read(inode) && iocb->ki_filp) {
> > > +			struct iattr newattrs;
> > > +			newattrs.ia_size = end;
> > > +			newattrs.ia_file = iocb->ki_filp;
> > > +			newattrs.ia_valid = ATTR_SIZE | ATTR_FILE;
> > > +			retval = notify_change(iocb->ki_filp->f_dentry,
> > > +					       &newattrs);
> > > +			if (retval)
> > > +				goto out;
> > > +		}
> > > +		if (is_sync_kiocb(iocb)) {
> > > +			dio->reacquire_i_mutex = 1;
> > > +			mutex_unlock(&inode->i_mutex);
> > 
> > BTW, don't the page locks cover the AIO case as well ? I'm wondering
> > why we need this special case here.
> 
> Only because i_mutex is already dropped implicitly by the aio code when
> we return.  Ideally we could drop it sooner inside fs/direct-io.c, but
> I'll leave that tuning for a later patch.

OK, I see.

There is one other thing that I'm not sure we have covered yet. That is
ensuring ordered mode guarantees with AIO-DIO. As long as we were falling
back to buffered IO in cases involving block allocation or forcing extends
to be synchronous, we probably didn't need to worry about this. How do
we make sure we address that now.

Regards
Suparna

> 
> -chris

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India

     prev parent reply	other threads:[~2006-11-03  7:22 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-27 18:22 [RFC PATCH 0/3] O_DIRECT locking rework Chris Mason
2006-10-27 18:23 ` [RFC PATCH 1/3] Introduce a place holder page for the pagecache Chris Mason
2006-11-01 10:23   ` Suparna Bhattacharya
2006-11-01 12:41     ` Chris Mason
2006-11-03  7:12       ` Suparna Bhattacharya
2006-11-03 14:10         ` Chris Mason
2006-10-27 18:25 ` [RFC PATCH 2/3] Change O_DIRECT to use placeholders Chris Mason
2006-10-27 18:26 ` [RFC PATCH 3/3] DIO: don't fall back to buffered writes Chris Mason
2006-11-01 10:51   ` Suparna Bhattacharya
2006-11-01 12:47     ` Chris Mason
2006-11-03  7:25       ` Suparna Bhattacharya [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061103072545.GB13999@in.ibm.com \
    --to=suparna@in.ibm.com \
    --cc=akpm@osdl.org \
    --cc=chris.mason@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).