From: Neil Brown <neilb@suse.de>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: "Ing. Daniel Rozsnyó" <daniel@rozsnyo.com>,
"Milan Broz" <mbroz@redhat.com>,
"Marti Raudsepp" <marti@juffo.org>,
linux-kernel@vger.kernel.org,
"Trond Myklebust" <Trond.Myklebust@netapp.com>,
"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: bio too big - in nested raid setup
Date: Fri, 29 Jan 2010 09:14:57 +1100
Message-ID: <20100129091457.0088c4af@notabene>
In-Reply-To: <4B617E03.1050403@panasas.com>
On Thu, 28 Jan 2010 14:07:31 +0200
Boaz Harrosh <bharrosh@panasas.com> wrote:
> On 01/28/2010 12:50 PM, Neil Brown wrote:
> >
> > Both raid0 and linear register a 'bvec_mergeable' function (or whatever it is
> > called today).
> > This allows for the fact that these devices have restrictions that cannot be
> > expressed simply with request sizes. In particular they only handle requests
> > that don't cross a chunk boundary.
> >
> > As raid1 never calls the bvec_mergeable function of its components (it would
> > be very hard to get that to work reliably, maybe impossible), it treats any
> > device with a bvec_mergeable function as though the max_sectors were one page.
> > This is because the interface guarantees that a one page request will always
> > be handled.
> >
>
> I'm also guilty of doing some mirror work, in exofs, over osd objects.
>
> I was thinking about that reliability problem with mirrors, also related
> to that infamous problem of copying the mirrored buffers so they do not
> change while writing at the page cache level.
So this is a totally new topic, right?
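The two restrictions in the quoted explanation can be modelled in a few lines of C. First, raid0/linear accept only requests that stay inside one chunk. Second, raid1 never consults a component's merge function, so it falls back to one-page requests for any such component. This is a userspace sketch, not the md code. The helper names `room_in_chunk` and `mirror_max_sectors`, `struct component`, and the 4 KiB page and 512-byte sector sizes are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; every name here is hypothetical, not a kernel API.
 * Assumptions: sizes in 512-byte sectors, power-of-two chunk sizes,
 * 4 KiB pages. */
#define SECTOR_SHIFT 9
#define MODEL_PAGE_SECTORS (4096u >> SECTOR_SHIFT)

/* raid0/linear-style rule: bytes that can still be appended to a request
 * that starts at 'sector' and already covers 'bio_sectors', without the
 * request crossing the next chunk boundary. A merge function answers this
 * question for every page the upper layer tries to add. */
static unsigned room_in_chunk(uint64_t sector, unsigned bio_sectors,
                              unsigned chunk_sectors)
{
    assert(chunk_sectors && (chunk_sectors & (chunk_sectors - 1)) == 0);
    uint64_t end = (sector & (chunk_sectors - 1)) + bio_sectors;
    if (end >= chunk_sectors)
        return 0;
    return (unsigned)(chunk_sectors - end) << SECTOR_SHIFT;
}

/* One mirror component as raid1 sees it (hypothetical). */
struct component {
    unsigned max_sectors;  /* plain size limit of the component */
    bool has_merge_fn;     /* e.g. a raid0 or linear array underneath */
};

/* raid1-style fallback: the mirror never calls a component's merge
 * function, so any component that has one caps requests at a single
 * page, the size the merge interface promises to always handle. */
static unsigned mirror_max_sectors(const struct component *c, size_t n,
                                   unsigned limit)
{
    for (size_t i = 0; i < n; i++) {
        if (c[i].max_sectors < limit)
            limit = c[i].max_sectors;
        if (c[i].has_merge_fn && limit > MODEL_PAGE_SECTORS)
            limit = MODEL_PAGE_SECTORS;
    }
    return limit;
}
```

For example, a mirror whose components are two plain disks limited to 256 and 128 sectors gets a 128-sector limit. A mirror with a raid0 underneath drops to 8 sectors (one page), however large the raid0's own limit is. That cap trades throughput for correctness, since calling every component's merge function reliably would be very hard.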
>
> So what if we don't fight it? What if we just keep a journal of the mirror's
> unbalanced state and do not mark pages uptodate until the mirror is finally
> balanced? Only then can pages be dropped from the cache and the journal cleared.
I am not sure I see what you are suggesting, but it seems like a layering violation.
The block device level cannot see anything about whether the page is up to
date or not. The page it has may not even be in the page cache.
The only thing that the block device can do is make a copy of the page and
write that out twice.
If we could have a flag which the filesystem can send to say "I promise not
to change this page until the IO completes", then that copy could be
optimised away in lots of common cases.
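The copy-then-write-twice approach, and what a "won't modify until completion" promise would let the mirror skip, can be sketched as follows. This is a synchronous userspace model, so it cannot reproduce the race the copy guards against. It only shows where the bounce copy sits. `struct mirror`, `mirror_write` and its `stable` parameter are hypothetical names, not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BLK 4096  /* assumption: one 4 KiB page per write */

/* Hypothetical in-memory "mirror": two copies of one block. */
struct mirror {
    unsigned char disk[2][BLK];
};

/* Write one block to both halves of the mirror. Unless the caller
 * promises the buffer stays unchanged until the I/O completes (the
 * hypothetical 'stable' flag), snapshot it first so both halves receive
 * identical data even if the caller modifies 'page' mid-flight.
 * Returns 0 on success, -1 if the bounce buffer cannot be allocated. */
static int mirror_write(struct mirror *m, const unsigned char *page,
                        bool stable)
{
    const unsigned char *src = page;
    unsigned char *bounce = NULL;

    if (!stable) {
        bounce = malloc(BLK);
        if (!bounce)
            return -1;
        memcpy(bounce, page, BLK);  /* the extra copy */
        src = bounce;
    }
    for (int i = 0; i < 2; i++)
        memcpy(m->disk[i], src, BLK);  /* stands in for two async writes */
    free(bounce);                      /* free(NULL) is a no-op */
    return 0;
}
```

With `stable` set, the `malloc`/`memcpy` pair disappears. That is the optimisation a filesystem-supplied flag would enable in the common case where nothing touches the page during writeback.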
>
> (A balanced mirror page is one that has participated in an IO to all devices
> without being marked dirty from the start of the IO until its completion.)
>
The block device cannot see the 'dirty' flag.
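For reference, the per-page bookkeeping implied by the "balanced mirror page" definition is small. The difficulty is its input. In this hypothetical C sketch, `page_redirtied()` stands for a notification that would have to come from above the block layer, which is exactly the information a block device does not have. `struct mirror_page` and all function names are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state for the "balanced mirror page" idea.
 * Assumption: some layer above the block device calls page_redirtied()
 * whenever the page is dirtied. */
struct mirror_page {
    unsigned pending;  /* mirror writes issued but not yet completed */
    bool redirtied;    /* page dirtied while those writes were in flight */
};

/* Begin writing the page to all 'ncopies' mirror devices. */
static void start_mirror_io(struct mirror_page *p, unsigned ncopies)
{
    p->pending = ncopies;
    p->redirtied = false;
}

/* Record a dirtying event; it only matters while writes are in flight. */
static void page_redirtied(struct mirror_page *p)
{
    if (p->pending)
        p->redirtied = true;
}

/* One mirror write has completed. */
static void mirror_io_done(struct mirror_page *p)
{
    assert(p->pending > 0);
    p->pending--;
}

/* Balanced: every copy finished and the page was not dirtied meanwhile,
 * so all mirrors hold the same data and a journal entry could be cleared. */
static bool mirror_page_balanced(const struct mirror_page *p)
{
    return p->pending == 0 && !p->redirtied;
}
```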
> I think Trond's latest work adding that un_updated-but-committed state to
> pages could help with that, though I do understand that it is a major
> conceptual change to the VFS-BLOCKS relationship, letting the block devices
> participate in the page state machine (and md keeping a journal). Sigh
>
> ??
> Boaz
NeilBrown