From: "Ed L. Cashin" <ecashin@coraid.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@osdl.org>,
xfs@oss.sgi.com, Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: Re: bio pages with zero page reference count
Date: Fri, 19 Jan 2007 11:21:08 -0500
Message-ID: <20070119162108.GG16715@coraid.com>
In-Reply-To: <20061218225343.GA30167@infradead.org>
On Mon, Dec 18, 2006 at 10:53:43PM +0000, Christoph Hellwig wrote:
> On Mon, Dec 18, 2006 at 05:21:09PM -0500, Ed L. Cashin wrote:
...
> > If anyone has a better reference, I'd like to see it.
>
> I searched around a little bit and found these:
>
> http://groups.google.at/group/open-iscsi/browse_frm/thread/17fbe253cf1f69dd/f26cf19b0fee9147?tvc=1&q=kmalloc+iscsi+%22christoph+hellwig%22&hl=de#f26cf19b0fee9147
> http://www.ussg.iu.edu/hypermail/linux/kernel/0408.3/0061.html
>
> But that's not the conclusion I was looking for.
So it sounds like you've been advocating a general discussion of this
issue for a few years now.
To summarize the issue:

  1) users of the block layer assume that it's fine to associate pages
     that have a zero reference count with a bio before requesting
     I/O,

  2) intermediaries like iscsi, aoe, and drbd associate the pages
     with the frags of skbuffs, but

  3) when the network layer has to linearize the skbuff for a network
     device that doesn't support scatter gather, it winds up doing a
     get_page and put_page on each page in the frags, despite the fact
     that the page reference count on each may already be zero.  The
     network layer is assuming that it's OK to use these operations
     on any page in the frags.
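
The three points above can be modeled in userspace. The following is
only a sketch under stated assumptions: struct mock_page and the mock_*
helpers are hypothetical stand-ins invented for illustration, not the
kernel's struct page, get_page, or put_page. It assumes that dropping
the count to zero hands the page back to the allocator, which is the
behavior the summary describes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a page; NOT the kernel's struct page. */
struct mock_page {
	int count;        /* models the page reference count */
	int times_freed;  /* how often the page was handed back to the allocator */
};

static void mock_get_page(struct mock_page *p)
{
	p->count++;
}

/* Assumption: the put that drops the count to zero releases the page. */
static void mock_put_page(struct mock_page *p)
{
	if (--p->count == 0)
		p->times_freed++;
}

/* Models the transient get/put pair taken while copying frag data. */
static void mock_linearize_frag(struct mock_page *p)
{
	mock_get_page(p);
	/* ... data would be copied out of the page here ... */
	mock_put_page(p);
}

/* Returns how many times the zero-count page was freed by the pair. */
static int demo_spurious_free(void)
{
	struct mock_page owned = { 1, 0 };         /* someone holds a reference */
	struct mock_page unreferenced = { 0, 0 };  /* handed over with count zero */

	mock_linearize_frag(&owned);
	mock_linearize_frag(&unreferenced);

	/* The owned page survives: its count never reaches zero. */
	assert(owned.count == 1 && owned.times_freed == 0);

	printf("owned:        count=%d freed=%d\n",
	       owned.count, owned.times_freed);
	printf("unreferenced: count=%d freed=%d\n",
	       unreferenced.count, unreferenced.times_freed);
	return unreferenced.times_freed;
}
```

Calling demo_spurious_free() from a main() shows the asymmetry: the
page that started at zero goes 0 -> 1 -> 0, and in this model the
put releases it while its original user still expects it to be valid.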
Maybe the discussion is slow to start because too many parts of the
kernel are involved. Here are a couple of specific questions. Maybe
they'll help get the ball rolling.
  1) What are the disadvantages of making the network layer *not*
     assume it's correct to use get/put_page on the frags when it
     linearizes an sk_buff?

     For example, the network layer could omit the get/put_page when
     the page reference count is zero.

  2) What are the disadvantages of having one part of the kernel (e.g.,
     XFS) reference a page before handing it off to another part of the
     kernel, e.g., in a bio?

     This change would require multiple parts of the kernel to change
     behavior, but it seems conceptually cleaner, since the reference
     count would reflect the reality that the page does have an owner
     (XFS or whoever).  I don't know how practical the implementation
     would be.

  3) It seems messy to handle this in each of the individual
     intermediary drivers that sit between the block and network
     layers, but if that really is the place to do it, then is there a
     problem with simply incrementing the page reference counts upon
     getting a bio from the block layer, and later decrementing them
     before giving them back with bio_endio?

       bio_for_each_segment(bv, bio, i)
               atomic_inc(&bv->bv_page->_count);

       ... [and later]

       bio_for_each_segment(bv, bio, i)
               atomic_dec(&bv->bv_page->_count);
       bio_endio(bio, bytes_done, error);

     That seems to eliminate problems aoe users have with XFS on AoE
     devices that are accessible via network devices that don't support
     scatter gather, but is it the right fix?

     Andrew Morton changed "count" to "_count" to stop folks from
     directly manipulating the page struct member, but I don't see any
     get/put_page type operations that fit what the aoe driver has to
     do in this case.
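
Questions 1 and 3 can be compared in the same kind of userspace model.
This is a sketch only: struct mock_page, struct mock_bio, and every
mock_* function are hypothetical names made up for illustration, and
none of them is a kernel API. The unpin step is a plain decrement, as
in the atomic_dec loop above, so it never releases a page even when the
count returns to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's struct page or struct bio. */
struct mock_page {
	int count;        /* models the page reference count */
	int times_freed;  /* how often the page was handed back to the allocator */
};

struct mock_bio {
	struct mock_page **pages;
	size_t nr_pages;
};

static void mock_get_page(struct mock_page *p)
{
	p->count++;
}

/* Assumption: the put that drops the count to zero releases the page. */
static void mock_put_page(struct mock_page *p)
{
	if (--p->count == 0)
		p->times_freed++;
}

/* Unconditional get/put pair, as in the linearize path described above. */
static void mock_linearize(struct mock_page *p)
{
	mock_get_page(p);
	mock_put_page(p);
}

/*
 * Question 1: skip the pair when nobody holds a reference.  In this
 * single-threaded model the check is trivially safe; concurrent code
 * would have to consider the count changing between check and use.
 */
static void mock_linearize_guarded(struct mock_page *p)
{
	int referenced = p->count > 0;

	if (referenced)
		mock_get_page(p);
	/* ... data would be copied out of the page here ... */
	if (referenced)
		mock_put_page(p);
}

/* Question 3: raise each count when the driver receives the bio. */
static void mock_bio_pin(struct mock_bio *bio)
{
	for (size_t i = 0; i < bio->nr_pages; i++)
		bio->pages[i]->count++;
}

/* Plain decrement before completing the bio; deliberately never frees. */
static void mock_bio_unpin(struct mock_bio *bio)
{
	for (size_t i = 0; i < bio->nr_pages; i++) {
		assert(bio->pages[i]->count > 0);
		bio->pages[i]->count--;
	}
}
```

In this model, a bio whose pages start at zero can be pinned with
mock_bio_pin(), passed through mock_linearize(), and unpinned with
mock_bio_unpin() without any page being freed, and every count ends
where it started.  mock_linearize_guarded() gets the same result
without pinning, at the cost of special-casing zero in the network
path.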
--
Ed L Cashin <ecashin@coraid.com>
Thread overview: 4+ messages
[not found] <20061209234305.c65b4e14.akpm@osdl.org>
2006-12-18 17:53 ` [PATCH 2.6.19.1] fix aoe without scatter-gather [Bug 7662] Ed L. Cashin
2006-12-18 22:21 ` bio pages with zero page reference count Ed L. Cashin
2006-12-18 22:53 ` Christoph Hellwig
2007-01-19 16:21 ` Ed L. Cashin [this message]