Re: [PATCH V2] Btrfs: Full direct I/O and AIO read implementation.

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Chris Mason <chris.mason@oracle.com>
To: jim owens <owens6336@gmail.com>
Cc: Josef Bacik <josef@redhat.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH V2] Btrfs: Full direct I/O and AIO read implementation.
Date: Tue, 16 Feb 2010 11:01:07 -0500	[thread overview]
Message-ID: <20100216160107.GF3497@think> (raw)
In-Reply-To: <4B799DF7.9090008@gmail.com>

On Mon, Feb 15, 2010 at 02:18:15PM -0500, jim owens wrote:
> Chris Mason wrote:
> >Thanks to both Jim and Josef for their time on this one.  More comments
> >below.
> >
> >On Sat, Feb 13, 2010 at 08:30:07PM -0500, jim owens wrote:
> >>Josef Bacik wrote:
> >>>On Wed, Feb 10, 2010 at 01:53:50PM -0500, jim owens wrote:
> >>>>+	/* traditional 512-byte device sector alignment is the
> >>The problem that I did not explain in my comment is that other fs's
> >>get the blocksize from 1 device bdi, which usually will be 512.  But
> >>the btrfs bdi covers all volumes and gives the btrf logical 4096 size.
> >>If we do what is done in direct-io.c then we only allow access on 4k
> >>(or in the future larger logical) boundaries.
> >
> >We can record the smallest alignment while scanning the devices and just
> >use that.  In general 4K is good enough, although some other database
> >company might be more worried about smaller numbers.
> 
> So you are saying at mount time we will set a new field in a
> btrfs data structure that we can use here instead of the hard-code.
> With today's disks this will usually be 512.

Yes please.

[ use i_mutex for reads? ]

> >
> >But, we're making the performance of the common case (parallel reads)
> >dramatically slower to close a small race.  Reads and writes are allowed
> >to race, and the results are allowed to be undefined.  The only
> >requirement is that we not return garbage (zeros, or good data only).
> 
> I agree with both why we don't want the mutex and that we don't care
> about the read returning old or new data.
> 
> But in spite of what you said above, I found lock_extent() was
> not enough and I needed the i_mutex.
> 
> The problem I had was that if btrfs_wait_ordered_range() does not
> run on the new data, that data can still be on disk but the btree
> checksum is wrong. I need the checksum forced into the btree so
> I don't fail the comparison.
> 
> Even an overwrite when nodatacow is set will cause the same problem.
> 
> I can try to code something like btrfs_page_mkwrite() does and see
> if that works, though I really question if that will be a dramatic
> improvement in parallel read because now we have to take the lock
> and search the ordered tree.

But, we already need the code that btrfs_page_mkwrite uses.  It should
be enough to wait for the ordered extents and have the extent range
locked.

> To me this will be a net loss of
> performance on files that are not being modified as the i_mutex
> does not slow down parallel readers except when the area they
> are reading needs flushed. 

The cost of i_mutex on parallel readers + high speed devices can be
surprising.

> And if the file is being modified in
> the spot we are reading, without i_mutex we may need to flush
> it multiple times, endlessly.  And unlike btrfs_page_mkwrite()
> this code is not doing just 1 page.

If the file is being modified by buffered writes, they get what they
deserve.  That's not what O_DIRECT is for.

> 
> Or am I getting something wrong?
> 
> >I think the temp buffer setup is the best place to start for
> >O_DIRECT + compressed.
> 
> Not sure what you mean by that.

I meant I think your way is right ;)

-chris

next prev parent reply	other threads:[~2010-02-16 16:01 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-10 18:53 [PATCH V2] Btrfs: Full direct I/O and AIO read implementation jim owens
2010-02-12 19:28 ` Josef Bacik
2010-02-14  1:30   ` jim owens
2010-02-15 16:42     ` Chris Mason
2010-02-15 19:18       ` jim owens
2010-02-16 16:01         ` Chris Mason [this message]
2010-02-16 17:09           ` jim owens
2010-02-15 21:58       ` Christoph Hellwig
2010-02-15 22:26         ` jim owens
2010-02-15 22:32           ` Christoph Hellwig
2010-02-15 22:40             ` jim owens
2010-02-16 15:49               ` Chris Mason
2010-02-15 22:01     ` rk
2010-02-15 22:31       ` jim owens
2010-02-16 19:28   ` jim owens
2010-02-16 19:39     ` Josef Bacik
2010-03-03 18:54       ` jim owens

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100216160107.GF3497@think \
    --to=chris.mason@oracle.com \
    --cc=josef@redhat.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=owens6336@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox