PATCH 2.6.21-rc1 aoe: handle zero

public inbox for linux-kernel@vger.kernel.org

* PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
@ 2007-03-01 23:15 Ed L. Cashin
  2007-03-02  1:42 ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Ed L. Cashin @ 2007-03-01 23:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Greg KH, Andrew Morton, ecashin

This patch works around a problem discussed here and on the XFS
mailing list in January.

  http://lkml.org/lkml/2007/1/19/56

To summarize the issue: If XFS (or any other creator of bios) gives
the aoe driver a bio with pages that have a zero page _count, and then
the aoe driver hands the page to the network layer in an sk_buff's
frags, and if the network card does not support the scatter gather
feature, then the network layer will eventually try to put_page on the
page, and the kernel winds up panicking.

There is a disconnect between the assumptions of the bio creator (that
pages don't need to have a non-zero _count), and the assumptions of
the network layer (where it's assumed that pages will always have a
positive count).  There was no response, though, to a call in January
for ideas about resolving the disconnect.

So to work around the issue, the simple patch below increments the
page _count before handing it to the network layer and decrements it
after the network layer is done with the page.  This patch eliminates
panics for XFS on aoe users who lack scatter gather support in their
network cards.
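
The effect of the workaround can be sketched in a small userspace model.
This is only an illustration: struct page_model and the model_* helpers
are hypothetical stand-ins, and the model assumes the network path takes
its own reference and later drops it with put_page()-like semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page. */
struct page_model {
	int count;   /* stands in for page->_count */
	bool freed;  /* set once a put drops the count to zero */
};

/* Models get_page(): take one reference. */
static void model_get_page(struct page_model *p)
{
	p->count++;
}

/* Models put_page(): drop one reference, free on the last one. */
static void model_put_page(struct page_model *p)
{
	assert(p->count > 0);
	if (--p->count == 0)
		p->freed = true;
}

/* Assumption: the network path takes its own reference and drops it. */
static void model_network_use(struct page_model *p)
{
	model_get_page(p);
	model_put_page(p);
}

/* Returns true if the page was not freed behind its owner's back. */
static bool model_page_survives(int owner_count, bool aoe_workaround)
{
	struct page_model p = { owner_count, false };

	if (aoe_workaround)
		p.count++;      /* like bio_refpages() in the patch */
	model_network_use(&p);
	if (aoe_workaround)
		p.count--;      /* like aoe_bio_done(): raw decrement, no free */
	return !p.freed;
}
```

In this model a zero-count page is freed by the network side's final
put unless the driver holds the extra count across the transmit.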

It's regrettable that _count is manipulated directly, because Andrew
Morton changed the page "count" member to a _count to prevent exactly
this kind of direct manipulation of the data.  There does not appear
to be a "right" way to increment and decrement the count, however,
inside a driver without unwanted side effects.  The closest candidates
are in mm/internal.h and are presumably intended to be used
exclusively by mm/*.c.

Signed-off-by: "Ed L. Cashin" <ecashin@coraid.com>

diff -upr -X linux-2.6.21-rc1.dontdiff linux-2.6.21-rc1.orig/drivers/block/aoe/aoe.h linux-2.6.21-rc1/drivers/block/aoe/aoe.h
--- linux-2.6.21-rc1.orig/drivers/block/aoe/aoe.h	2007-02-27 14:11:06.249132000 -0500
+++ linux-2.6.21-rc1/drivers/block/aoe/aoe.h	2007-02-27 17:43:22.037069000 -0500
@@ -150,6 +150,7 @@ int aoeblk_init(void);
 void aoeblk_exit(void);
 void aoeblk_gdalloc(void *);
 void aoedisk_rm_sysfs(struct aoedev *d);
+void aoe_bio_done(struct bio *bio, unsigned int bytes_done, int error);
 
 int aoechr_init(void);
 void aoechr_exit(void);
diff -upr -X linux-2.6.21-rc1.dontdiff linux-2.6.21-rc1.orig/drivers/block/aoe/aoeblk.c linux-2.6.21-rc1/drivers/block/aoe/aoeblk.c
--- linux-2.6.21-rc1.orig/drivers/block/aoe/aoeblk.c	2007-02-27 14:11:06.253132000 -0500
+++ linux-2.6.21-rc1/drivers/block/aoe/aoeblk.c	2007-02-27 17:43:22.037069000 -0500
@@ -14,6 +14,29 @@
 
 static struct kmem_cache *buf_pool_cache;
 
+/* workaround for XFS and bios with zero pageref pages in general */
+void
+aoe_bio_done(struct bio *bio, unsigned int bytes_done, int error)
+{
+	struct bio_vec *bv;
+	int i;
+
+	bio_for_each_segment(bv, bio, i)
+		atomic_dec(&bv->bv_page->_count);
+
+	bio_endio(bio, bytes_done, error);
+}
+
+static void
+bio_refpages(struct bio *bio)
+{
+	struct bio_vec *bv;
+	int i;
+
+	bio_for_each_segment(bv, bio, i)
+		atomic_inc(&bv->bv_page->_count);
+}
+
 static ssize_t aoedisk_show_state(struct gendisk * disk, char *page)
 {
 	struct aoedev *d = disk->private_data;
@@ -147,6 +170,7 @@ aoeblk_make_request(request_queue_t *q, 
 	buf->bio = bio;
 	buf->resid = bio->bi_size;
 	buf->sector = bio->bi_sector;
+	bio_refpages(bio);
 	buf->bv = &bio->bi_io_vec[bio->bi_idx];
 	WARN_ON(buf->bv->bv_len == 0);
 	buf->bv_resid = buf->bv->bv_len;
@@ -159,7 +183,7 @@ aoeblk_make_request(request_queue_t *q, 
 			d->aoemajor, d->aoeminor);
 		spin_unlock_irqrestore(&d->lock, flags);
 		mempool_free(buf, d->bufpool);
-		bio_endio(bio, bio->bi_size, -ENXIO);
+		aoe_bio_done(bio, bio->bi_size, -ENXIO);
 		return 0;
 	}
 
diff -upr -X linux-2.6.21-rc1.dontdiff linux-2.6.21-rc1.orig/drivers/block/aoe/aoecmd.c linux-2.6.21-rc1/drivers/block/aoe/aoecmd.c
--- linux-2.6.21-rc1.orig/drivers/block/aoe/aoecmd.c	2007-02-27 14:11:06.253132000 -0500
+++ linux-2.6.21-rc1/drivers/block/aoe/aoecmd.c	2007-02-27 17:43:22.037069000 -0500
@@ -649,7 +649,7 @@ aoecmd_ata_rsp(struct sk_buff *skb)
 			disk_stat_add(disk, sectors[rw], n_sect);
 			disk_stat_add(disk, io_ticks, duration);
 			n = (buf->flags & BUFFL_FAIL) ? -EIO : 0;
-			bio_endio(buf->bio, buf->bio->bi_size, n);
+			aoe_bio_done(buf->bio, buf->bio->bi_size, n);
 			mempool_free(buf, d->bufpool);
 		}
 	}
diff -upr -X linux-2.6.21-rc1.dontdiff linux-2.6.21-rc1.orig/drivers/block/aoe/aoedev.c linux-2.6.21-rc1/drivers/block/aoe/aoedev.c
--- linux-2.6.21-rc1.orig/drivers/block/aoe/aoedev.c	2007-02-27 14:11:06.253132000 -0500
+++ linux-2.6.21-rc1/drivers/block/aoe/aoedev.c	2007-02-27 17:43:22.041069250 -0500
@@ -119,7 +119,7 @@ aoedev_downdev(struct aoedev *d)
 		bio = buf->bio;
 		if (--buf->nframesout == 0) {
 			mempool_free(buf, d->bufpool);
-			bio_endio(bio, bio->bi_size, -EIO);
+			aoe_bio_done(bio, bio->bi_size, -EIO);
 		}
 		skb_shinfo(f->skb)->nr_frags = f->skb->data_len = 0;
 	}
@@ -130,7 +130,7 @@ aoedev_downdev(struct aoedev *d)
 		list_del(d->bufq.next);
 		bio = buf->bio;
 		mempool_free(buf, d->bufpool);
-		bio_endio(bio, bio->bi_size, -EIO);
+		aoe_bio_done(bio, bio->bi_size, -EIO);
 	}
 
 	if (d->gd)


-- 
  Ed L Cashin <ecashin@coraid.com>


* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-01 23:15 PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios Ed L. Cashin
@ 2007-03-02  1:42 ` Andrew Morton
  2007-03-02  2:29   ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2007-03-02  1:42 UTC (permalink / raw)
  To: support; +Cc: Ed L. Cashin, linux-kernel, Greg KH

On Thu, 1 Mar 2007 18:15:10 -0500
"Ed L. Cashin" <ecashin@coraid.com> wrote:

> This patch works around a problem discussed here and on the XFS
> mailing list in January.
> 
>   http://lkml.org/lkml/2007/1/19/56
> 
> To summarize the issue: If XFS (or any other creator of bios) gives
> the aoe driver a bio with pages that have a zero page _count, and then
> the aoe driver hands the page to the network layer in an sk_buff's
> frags, and if the network card does not support the scatter gather
> feature, then the network layer will eventually try to put_page on the
> page, and the kernel winds up panicking.
> 
> There is a disconnect between the assumptions of the bio creator (that
> pages don't need to have a non-zero _count), and the assumptions of
> the network layer (where it's assumed that pages will always have a
> positive count).  There was no response, though, to a call in January
> for ideas about resolving the disconnect.
> 
> So to work around the issue, the simple patch below increments the
> page _count before handing it to the network layer and decrements it
> after the network layer is done with the page.  This patch eliminates
> panics for XFS on aoe users who lack scatter gather support in their
> network cards.

Something funny is going on here.

Generally, one should increment the refcount of a page when it is put into
some container.  That means that the page should get +1 when it is added to
a bio.  (direct-io does this, but the mpage.c pagecache code cheats, and
relies upon PG_locked and PG_writeback protecting the page).

Similarly, the network code (or its caller) should be incrementing the
page's refcount as the page goes into a container (ie: the skb) and
decrementing it as the page is removed.

But someone somewhere is breaking those rules.  Who?

> It's regrettable that _count is manipulated directly, because Andrew
> Morton changed the page "count" member to a _count to prevent exactly
> this kind of direct manipulation of the data.  There does not appear
> to be a "right" way to increment and decrement the count, however,
> inside a driver without unwanted side effects.  The closest candidates
> are in mm/internal.h and are presumably intended to be used
> exclusively by mm/*.c.
> 

Odd.  You _should_ be able to use plain old get_page() and put_page().  If
you use a raw decrement of page->count then there's a risk that this was
the final reference and the page will leak forever.  We do need to use
put_page()'s return-it-to-the-allocator-if-this-was-the-last-use logic.
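
The difference between a raw decrement and a put can be shown with a
minimal userspace sketch. The names below are hypothetical and only
model the counting behaviour described above, not the kernel's actual
implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page. */
struct page_model {
	int count;                   /* stands in for page->_count */
	bool returned_to_allocator;  /* set when the last put frees the page */
};

/* Models a raw atomic_dec() on the count: nothing is ever freed. */
static void model_raw_dec(struct page_model *p)
{
	p->count--;
}

/* Models put_page(): the final reference returns the page. */
static void model_put_page(struct page_model *p)
{
	assert(p->count > 0);
	if (--p->count == 0)
		p->returned_to_allocator = true;
}

/* A page is leaked when nobody references it and nobody freed it. */
static bool model_leaked(const struct page_model *p)
{
	return p->count == 0 && !p->returned_to_allocator;
}
```

Dropping the last reference with model_raw_dec() leaves the page in the
leaked state, while model_put_page() hands it back.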

So.  Who is breaking refcounting protocol here?  Perhaps it is AOE, failing
to increment the refcount on pages as they are added to an skb?

(Do we know which callsite in XFS is adding zero-ref pages to a BIO, btw?)


* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-02  1:42 ` Andrew Morton
@ 2007-03-02  2:29   ` Christoph Hellwig
  2007-03-02  3:22     ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2007-03-02  2:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: support, Ed L. Cashin, linux-kernel, Greg KH

On Thu, Mar 01, 2007 at 05:42:04PM -0800, Andrew Morton wrote:
> Something funny is going on here.

Not so funny for those who've tried to sort out the issue over
the past years and just got ignored..

> Generally, one should increment the refcount of a page when it is put into
> some container.  That means that the page should get +1 when it is added to
> a bio.  (direct-io does this, but the mpage.c pagecache code cheats, and
> relies upon PG_locked and PG_writeback protecting the page).

It's a slab page, and slab pages aren't refcounted (which is a good thing
as you don't own the whole page)
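
To illustrate why a slab object's owner does not own the page, here is
a userspace sketch in which several small objects share one page. The
page size, the 512-byte object size, and all model_* names are
assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096  /* assumed page size */
#define MODEL_OBJ_SIZE   512  /* assumed slab object size */

/* Backing store large enough to contain one page-aligned page. */
static unsigned char model_backing[2 * MODEL_PAGE_SIZE];

/* Returns the first page-aligned address inside the backing store. */
static unsigned char *model_slab_page(void)
{
	uintptr_t a = (uintptr_t)model_backing;
	uintptr_t aligned = (a + MODEL_PAGE_SIZE - 1) &
			    ~(uintptr_t)(MODEL_PAGE_SIZE - 1);

	return model_backing + (aligned - a);
}

/* Hypothetical allocator: fixed-size objects carved from the one page. */
static void *model_slab_object(unsigned int index)
{
	unsigned int per_page = MODEL_PAGE_SIZE / MODEL_OBJ_SIZE;

	return model_slab_page() + (index % per_page) * MODEL_OBJ_SIZE;
}

/* Models virt_to_page(): map an address to its page number. */
static uintptr_t model_page_of(const void *addr)
{
	return (uintptr_t)addr / MODEL_PAGE_SIZE;
}
```

Two distinct objects map to the same page number, so a per-page count
cannot describe the lifetime of either object on its own.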

> Similarly, the network code (or its caller) should be incrementing the
> page's refcount as the page goes into a container (ie: the skb) and
> decrementing it as the page is removed.
> 
> But someone somewhere is breaking those rules.  Who?

slab code.  

> So.  Who is breaking refcounting protocol here?  Perhaps it is AOE, failing
> to increment the refcount on pages as they are added to an skb?
> 
> (Do we know which callsite in XFS is adding zero-ref pages to a BIO, btw?)

For example, all log I/O is done from kmalloc'ed pages.

Anyway, to rehash what I've been trying to get clarified for ages:


 (1) should we allow to pass slab pages into bios

and

 (2) if yes what's the way lower layers are supposed to handle them
     for any possible refcounting operations like networking or rdma.

There's also a potential caller in ext3 that can send down kmalloc'ed
buffers: journal_write_metadata_buffer() in need_copy_out && !done_copy_out
case.  But apparently that's an almost dead code path as I've never
seen anyone tripping this one, it's always XFS that people report.


* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-02  2:29   ` Christoph Hellwig
@ 2007-03-02  3:22     ` Andrew Morton
  2007-03-02  4:30       ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2007-03-02  3:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: support, Ed L. Cashin, linux-kernel, Greg KH

On Fri, 2 Mar 2007 02:29:19 +0000 Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Mar 01, 2007 at 05:42:04PM -0800, Andrew Morton wrote:
> > Something funny is going on here.
> 
> Not so funny for those who've tried to sort out the issue over
> the past years and just got ignored..
> 
> > Generally, one should increment the refcount of a page when it is put into
> > some container.  That means that the page should get +1 when it is added to
> > a bio.  (direct-io does this, but the mpage.c pagecache code cheats, and
> > relies upon PG_locked and PG_writeback protecting the page).
> 
> It's a slab page, and slab pages aren't refcounted (which is a good thing
> as you don't own the whole page)

ah, I see.

> > Similarly, the network code (or its caller) should be incrementing the
> > page's refcount as the page goes into a container (ie: the skb) and
> > decrementing it as the page is removed.
> > 
> > But someone somewhere is breaking those rules.  Who?
> 
> slab code.  

Well I spose slab _could_ take a ref on these pages.

> > So.  Who is breaking refcounting protocol here?  Perhaps it is AOE, failing
> > to increment the refcount on pages as they are added to an skb?
> > 
> > (Do we know which callsite in XFS is adding zero-ref pages to a BIO, btw?)
> 
> For example, all log I/O is done from kmalloc'ed pages.
> 
> Anyway, to rehash what I've been trying to get clarified for ages:
> 
> 
>  (1) should we allow to pass slab pages into bios
> 
> and
> 
>  (2) if yes what's the way lower layers are supposed to handle them
>      for any possible refcounting operations like networking or rdma.
> 
> There's also a potential caller in ext3 that can send down kmalloc'ed
> buffers: journal_write_metadata_buffer() in need_copy_out && !done_copy_out
> case.  But apparently that's an almost dead code path as I've never
> seen anyone tripping this one, it's always XFS that people report.

OK.  Let's go through it.

Networking internally maintains caller memory lifetimes, and it assumes
that the caller allocated memory via __alloc_pages() - because it uses
get_page() and put_page().

BIO, however, does not internally manage caller memory lifetime.  This is
because the caller's ->bi_end_io is always called, so the caller can do it.

So where we've come unstuck is in a module which has gone and fed BIO
memory into networking.  The differing design philosophies are clashing.

I'm surprised this doesn't happen in other places - aren't there any other
drivers which take a BIO and stuff it down the network?

Anyway, where's the bug?

Really, I'd say it's XFS (and ext3).  Even though BIO doesn't presently
manage page lifetimes, it _could_.  After all, the function is called
bio_add_page(), not bio_add_virtual_address().  It's a bit hacky to kmalloc
some memory, run virt_to_page() and to then present that page to BIO even
though the caller (thanks to the slab optimisation) doesn't actually have
control of that page's lifetime.

So we have a few options to look at:

a) kludge things in AOE.  Unpleasing, and might cause memory leaks
   (although it won't, because the caller hasn't run bi_end_io yet).

b) Take a ref on slab pages in slab.  A bit costly, perhaps.

c) teach ext3 and XFS to take a ref on these pages as they are added to
   the BIOs, undo that ref in bi_end_io.

I think c)?
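
Option c) might look roughly like the following userspace sketch. The
bio_model and page_model types and both helpers are hypothetical; the
real change would live in the filesystem's submission path and in its
bi_end_io handler.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BIO_MAX_PAGES 4  /* arbitrary limit for the illustration */

/* Hypothetical userspace stand-in for struct page. */
struct page_model {
	int count;  /* stands in for page->_count */
};

/* Hypothetical userspace stand-in for struct bio. */
struct bio_model {
	struct page_model *pages[MODEL_BIO_MAX_PAGES];
	size_t nr_pages;
};

/*
 * Submitter side: take a reference as the page goes into the bio.
 * Returns 0 on success, -1 if the bio is full.
 */
static int model_bio_add_page_ref(struct bio_model *bio,
				  struct page_model *page)
{
	if (bio->nr_pages == MODEL_BIO_MAX_PAGES)
		return -1;
	page->count++;
	bio->pages[bio->nr_pages++] = page;
	return 0;
}

/* Completion side: what the submitter's bi_end_io would undo. */
static void model_bio_end_io(struct bio_model *bio)
{
	for (size_t i = 0; i < bio->nr_pages; i++) {
		assert(bio->pages[i]->count > 0);
		bio->pages[i]->count--;
	}
	bio->nr_pages = 0;
}
```

The count is non-zero for exactly as long as the page sits in the bio,
which is the property the lower layers rely on.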


* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-02  3:22     ` Andrew Morton
@ 2007-03-02  4:30       ` Christoph Hellwig
  2007-03-02  4:48         ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2007-03-02  4:30 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, support, Ed L. Cashin, linux-kernel, Greg KH

On Thu, Mar 01, 2007 at 07:22:45PM -0800, Andrew Morton wrote:
> Well I spose slab _could_ take a ref on these pages.

What it would need to do is:

 - add a reference for every object touching this page
 - don't give the page back to the page allocator or reuse any
   single object inside it until there are no more reference to the page.

I don't think this is a very good idea, although the networking references
tend to be rather short-term ones, making this not that bad a burden.

> Networking internally maintains caller memory lifetimes, and it assumes
> that the caller allocated memory via __alloc_pages() - because it uses
> get_page() and put_page().
> 
> BIO, however, does not internally manage caller memory lifetime.  This is
> because the caller's ->bi_end_io is always called, so the caller can do it.
> 
> So where we've come unstuck is in a module which has gone and fed BIO
> memory into networking.  The differing design philosophies are clashing.
> 
> I'm surprised this doesn't happen in other places - aren't there any other
> drivers which take a BIO and stuff it down the network?
> 
> Anyway, where's the bug?
> 
> Really, I'd say it's XFS (and ext3).  Even though BIO doesn't presently
> manage page lifetimes, it _could_.  After all, the function is called
> bio_add_page(), not bio_add_virtual_address().  It's a bit hacky to kmalloc
> some memory, run virt_to_page() and to then present that page to BIO even
> though the caller (thanks to the slab optimisation) doesn't actually have
> control of that page's lifetime.

That was the conclusion I came to when this was brought up initially.
Fixing up XFS would be easyish and only waste a tiny amount of memory,
and the same is true for ext3 (I did in fact suggest just using get_free_page
for this case but got shot down for stupid reasons when the slab debug
alignment issues in that area came up)

But in this case we'd really need to enforce this, and add a
BUG_ON(PageSlab(page)) in bio_add_page to trip everyone submitting
this kind of page.

> So we have a few options to look at:
> 
> a) kludge things in AOE.  Unpleasing, and might cause memory leaks
>    (although it won't, because the caller hasn't run bi_end_io yet).
> 
> b) Take a ref on slab pages in slab.  A bit costly, perhaps.
> 
> c) teach ext3 and XFS to take a ref on these pages as they are added to
>    the BIOs, undo that ref in bi_end_io.
> 
> I think c)?

Yes.  I'm perfectly fine with this as long as we document and enforce
this.


* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-02  4:30       ` Christoph Hellwig
@ 2007-03-02  4:48         ` Andrew Morton
  2007-03-02  4:49           ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2007-03-02  4:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: support, Ed L. Cashin, linux-kernel, Greg KH

On Fri, 2 Mar 2007 04:30:39 +0000 Christoph Hellwig <hch@infradead.org> wrote:

> But in this case we'd really need to enforce this, and add a
> BUG_ON(PageSlab(page)) in bio_add_page to trip everyone submitting
> this kind of page.

That would be

	BUG_ON(PageSlab(page) && page_count(page) == 0)?


> > So we have a few options to look at:
> > 
> > a) kludge things in AOE.  Unpleasing, and might cause memory leaks
> >    (although it won't, because the caller hasn't run bi_end_io yet).
> > 
> > b) Take a ref on slab pages in slab.  A bit costly, perhaps.
> > 
> > c) teach ext3 and XFS to take a ref on these pages as they are added to
> >    the BIOs, undo that ref in bi_end_io.
> > 
> > I think c)?
> 
> Yes.  I'm perfectly fine with this as long as we document and enforce
> this.

And write the patch ;)


* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-02  4:48         ` Andrew Morton
@ 2007-03-02  4:49           ` Christoph Hellwig
  2007-03-02  5:00             ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2007-03-02  4:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, support, Ed L. Cashin, linux-kernel, Greg KH

On Thu, Mar 01, 2007 at 08:48:06PM -0800, Andrew Morton wrote:
> On Fri, 2 Mar 2007 04:30:39 +0000 Christoph Hellwig <hch@infradead.org> wrote:
> 
> > But in this case we'd really need to enforce this, and add a
> > BUG_ON(PageSlab(page)) in bio_add_page to trip everyone submitting
> > this kind of page.
> 
> That would be
> 
> 	BUG_ON(PageSlab(page) && page_count(page) == 0)?

No, all slab pages.  Currently they all have a reference count of
zero, but we generally don't want people to pass in pages that
come from a non-refcounted allocator.
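
The two checks differ only for slab pages that happen to carry a
reference. A small userspace sketch of both predicates, using a
hypothetical page_model in place of struct page:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page. */
struct page_model {
	bool slab;  /* stands in for PageSlab(page) */
	int count;  /* stands in for page_count(page) */
};

/* Models BUG_ON(PageSlab(page)): reject every slab page. */
static bool model_reject_all_slab(const struct page_model *p)
{
	return p->slab;
}

/* Models BUG_ON(PageSlab(page) && page_count(page) == 0). */
static bool model_reject_unreffed_slab(const struct page_model *p)
{
	return p->slab && p->count == 0;
}
```

A slab page with a non-zero count passes the second check but still
trips the first, which is the stricter rule argued for here.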



* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-02  4:49           ` Christoph Hellwig
@ 2007-03-02  5:00             ` Andrew Morton
  2007-03-02  5:03               ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2007-03-02  5:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: support, Ed L. Cashin, linux-kernel, Greg KH

On Fri, 2 Mar 2007 04:49:10 +0000 Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Mar 01, 2007 at 08:48:06PM -0800, Andrew Morton wrote:
> > On Fri, 2 Mar 2007 04:30:39 +0000 Christoph Hellwig <hch@infradead.org> wrote:
> > 
> > > But in this case we'd really need to enforce this, and add a
> > > BUG_ON(PageSlab(page)) in bio_add_page to trip everyone submitting
> > > this kind of page.
> > 
> > That would be
> > 
> > 	BUG_ON(PageSlab(page) && page_count(page) == 0)?
> 
> No, all slab pages.  Currently they all have a reference count of
> zero, but we generally don't want people to pass in pages that
> come from a non-refcounted allocator.

In that case we're talking about different things.

I thought the proposal was to continue to use slab pages, but to take a ref
on them as they're added to the bio, drop that ref in bi_end_io()?



* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-02  5:00             ` Andrew Morton
@ 2007-03-02  5:03               ` Christoph Hellwig
  2007-03-02  5:09                 ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2007-03-02  5:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, support, Ed L. Cashin, linux-kernel, Greg KH

On Thu, Mar 01, 2007 at 09:00:44PM -0800, Andrew Morton wrote:
> In that case we're talking about different things.
> 
> I thought the proposal was to continue to use slab pages, but to take a ref
> on them as they're added to the bio, drop that ref in bi_end_io()?

That would give you silent memory corruption in case the networking code
holds a reference after the memory gets returned to slab and reused.

We need to either stop allowing to pass slab memory to the block layer,
or document that drivers need to handle it specially and give them a
way to find out about them. (Or do the horrible slab refcounting hack
I wrote up above)


* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-02  5:03               ` Christoph Hellwig
@ 2007-03-02  5:09                 ` Andrew Morton
  2007-03-02  5:15                   ` Christoph Hellwig
  2007-03-02 15:51                   ` Sam Hopkins
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2007-03-02  5:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: support, Ed L. Cashin, linux-kernel, Greg KH

On Fri, 2 Mar 2007 05:03:51 +0000 Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Mar 01, 2007 at 09:00:44PM -0800, Andrew Morton wrote:
> > In that case we're talking about different things.
> > 
> > I thought the proposal was to continue to use slab pages, but to take a ref
> > on them as they're added to the bio, drop that ref in bi_end_io()?
> 
> That would give you silent memory corruption in case the networking code
> holds a reference after the memory gets returned to slab and reused.

Well, given that bi_end_io() is called after the "io" has completed, I'm
assuming that networking has completely finished with the memory by the
time bi_end_io() gets called.

I guess one can envisage situations where that might not happen, but they'd
be terribly buggy ones, surely.

> We need to either stop allowing to pass slab memory to the block layer,
> or document that drivers need to handle it specially and give them a
> way to find out about them. (Or do the horrible slab refcounting hack
> I wrote up above)

OK.  So you're proposing that XFS and ext3 simply stop using slab for this
memory?



* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-02  5:09                 ` Andrew Morton
@ 2007-03-02  5:15                   ` Christoph Hellwig
  2007-03-02 15:51                   ` Sam Hopkins
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2007-03-02  5:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, support, Ed L. Cashin, linux-kernel, Greg KH

On Thu, Mar 01, 2007 at 09:09:42PM -0800, Andrew Morton wrote:
> > or document that drivers need to handle it specially and give them a
> > way to find out about them. (Or do the horrible slab refcounting hack
> > I wrote up above)
> 
> OK.  So you're proposing that XFS and ext3 simply stop using slab for this
> memory?

Yes.


* Re: PATCH 2.6.21-rc1 aoe: handle zero _count pages in bios
  2007-03-02  5:09                 ` Andrew Morton
  2007-03-02  5:15                   ` Christoph Hellwig
@ 2007-03-02 15:51                   ` Sam Hopkins
  1 sibling, 0 replies; 12+ messages in thread
From: Sam Hopkins @ 2007-03-02 15:51 UTC (permalink / raw)
  To: akpm, hch; +Cc: support, ecashin, linux-kernel, greg

> Well, given that bi_end_io() is called after the "io" has completed, I'm
> assuming that networking has completely finished with the memory by the
> time bi_end_io() gets called.
> 
> I guess one can envisage situations where that might not happen, but they'd
> be terribly buggy ones, surely.

This is actually quite common when using broadcom chipsets that take a
long time to clean out the tx ring.  We send a command skb out to
write some data, get the response some tens of ms later and the
command skb (with the pages) still sits in the tx ring.  I've gone to
some lengths to limit the skb memory used in aoe to help with the
OOM/swap issue and this has given me headaches.

Sam
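
The ordering described above can be captured in a tiny userspace state
model. All names are hypothetical; the sketch only tracks whether the
bio has completed while the command skb is still queued in the tx ring.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one in-flight AoE write command. */
struct model_state {
	bool skb_in_tx_ring;  /* NIC has not yet cleaned the descriptor */
	bool bio_completed;   /* bi_end_io ran; submitter may reuse memory */
};

/* The AoE response arrives and the driver completes the bio. */
static void model_response_received(struct model_state *s)
{
	s->bio_completed = true;
}

/* The NIC driver finally cleans the command skb out of the tx ring. */
static void model_tx_ring_cleaned(struct model_state *s)
{
	s->skb_in_tx_ring = false;
}

/* True while the skb still points at memory the submitter may reuse. */
static bool model_reuse_window_open(const struct model_state *s)
{
	return s->bio_completed && s->skb_in_tx_ring;
}
```

With the response arriving before the ring is cleaned, the window is
open between the two events, so a reference dropped in bi_end_io does
not cover the whole time the skb refers to the pages.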
