From: Jens Axboe <jens.axboe@oracle.com>
To: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: davem@davemloft.net, linux-kernel@vger.kernel.org
Subject: Re: IDE crash...
Date: Tue, 23 Oct 2007 09:23:24 +0200
Message-ID: <20071023072324.GG25962@kernel.dk>
In-Reply-To: <20071023161416W.fujita.tomonori@lab.ntt.co.jp>

On Tue, Oct 23 2007, FUJITA Tomonori wrote:
> On Tue, 23 Oct 2007 09:09:33 +0200
> Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Tue, Oct 23 2007, Jens Axboe wrote:
> > > On Mon, Oct 22 2007, David Miller wrote:
> > > > 
> > > > I'm debugging a blk_rq_map_sg() crash that I'm getting on sparc64 as
> > > > root is mounted over IDE.  I think I know what is happening now.
> > > > 
> > > > The IDE sg table is allocated and initialized like this in
> > > > drivers/ide/ide-probe.c:
> > > > 
> > > > 	x = kmalloc(sizeof(struct scatterlist) * nents, GFP_XXX);
> > > > 	sg_init_table(x, nents);
> > > > 
> > > > So far, so good.
> > > > 
> > > > Now, ide_map_sg() passes requests down to blk_rq_map_sg() like this in
> > > > drivers/ide/ide-io.c:
> > > > 
> > > > 		hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > > > 
> > > > Ok, so what does blk_rq_map_sg() do?
> > > > 
> > > > 	sg = NULL;
> > > > 	rq_for_each_segment(bvec, rq, iter) {
> > > >  ...
> > > > 		if (bvprv && cluster) {
> > > >  ...
> > > > 		} else {
> > > > new_segment:
> > > > 			if (!sg)
> > > > 				sg = sglist;
> > > > 			else
> > > > 				sg = sg_next(sg);
> > > >  ...
> > > > 		}
> > > > 		bvprv = bvec;
> > > > 	} /* segments in rq */
> > > > 
> > > > 	if (sg)
> > > > 		__sg_mark_end(sg);
> > > > 
> > > > So let's say the first request comes in and needs 2 segs.
> > > > This will mark sg[1].page_link with 0x2
> > > > 
> > > > If the next request from IDE needs 4 segs, we'll OOPS because
> > > > sg_next() on &sg[1] will see page_link bit 0x2 is set and
> > > > therefore return NULL.
> > > > 
> > > > A quick look shows that if you're testing on SCSI (or something
> > > > layered on top of it like SATA or PATA) you won't see this seemingly
> > > > guaranteed crash because the SCSI mid-layer allocates a fresh sglist
> > > > via mempool_alloc() and runs sg_init_table() on it for every I/O
> > > > request.
> > > 
> > > We should never see the end pointer in blk_rq_map_sg(), or that's a bug
> > > in the driver. So it should be OK to just clear the end pointer always
> > > in there, even if it's not the prettiest solution...
> > > 
> > > This just needs to be wrapped up in some scatterlist.h macro/function.
> > > 
> > > diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
> > > index 61c2e39..a3bda2f 100644
> > > --- a/block/ll_rw_blk.c
> > > +++ b/block/ll_rw_blk.c
> > > @@ -1354,6 +1354,12 @@ new_segment:
> > >  			else
> > >  				sg = sg_next(sg);
> > >  
> > > +			/*
> > > +			 * Clear end-of-table pointer, we'll mark a new one
> > > +			 * at the end
> > > +			 */
> > > +			sg->page_link &= ~0x2;
> > > +
> > >  			sg_dma_len(sg) = 0;
> > >  			sg_dma_address(sg) = 0;
> > >  			sg_set_page(sg, bvec->bv_page);
> > 
> > Eh, this won't work, it's the wrong entry... Here's a temporary
> > work-around.
> 
> Yeah, it won't work. Now we must call sg_init_table for every I/O
> request (it's not nice).

I think the fix would be to have a sg_next_and_clear() or something that
doesn't honor the 0x02 termination bit and clears it, for the cases
where you KNOW that there are more entries.
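
To make the failure and the proposed helper concrete, here is a minimal
userspace sketch, not kernel code. It assumes a stripped-down stand-in for
struct scatterlist (struct sg_model) that holds only page_link; the names
sg_model_*, map_segments() and SG_END are hypothetical and exist only for
illustration. Only the 0x02 termination bit and the general shape of the
blk_rq_map_sg() loop are taken from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

#define SG_END 0x02UL	/* termination bit kept in page_link (from the thread) */

/* Hypothetical, simplified model of a scatterlist entry. */
struct sg_model {
	unsigned long page_link;
};

/* Model of sg_init_table(): clear every entry, mark the last one as the end. */
static inline void sg_model_init_table(struct sg_model *sgl, size_t nents)
{
	size_t i;

	assert(nents > 0);
	for (i = 0; i < nents; i++)
		sgl[i].page_link = 0;
	sgl[nents - 1].page_link |= SG_END;
}

/* Model of sg_next(): honors the termination bit and returns NULL at the end. */
static inline struct sg_model *sg_model_next(struct sg_model *sg)
{
	if (sg->page_link & SG_END)
		return NULL;
	return sg + 1;
}

/* Model of the proposed helper: drop a stale end mark, then advance. */
static inline struct sg_model *sg_model_next_force(struct sg_model *sg)
{
	sg->page_link &= ~SG_END;
	return sg_model_next(sg);
}

/*
 * Model of the mapping loop: walk nsegs entries and mark the last one.
 * Returns the number of entries reached; a short count stands for the
 * point where the real code would dereference a NULL sg.
 */
static inline size_t map_segments(struct sg_model *sglist, size_t nsegs,
				  int force)
{
	struct sg_model *sg = NULL;
	size_t i, mapped = 0;

	for (i = 0; i < nsegs; i++) {
		if (!sg)
			sg = sglist;
		else
			sg = force ? sg_model_next_force(sg) : sg_model_next(sg);
		if (!sg)
			return mapped;
		mapped++;
	}
	if (sg)
		sg->page_link |= SG_END;	/* what __sg_mark_end() does */
	return mapped;
}
```

In this model, reusing one table for a 2-segment request and then a
4-segment request stops after two entries when force is 0, because the
first request left the end mark on entry 1. With force set, the second
request walks all four entries and the end mark moves to entry 3.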

> I think there are other blk_rq_map_sg users that need such a fix.

Possibly, I did do quite a few of them. Alternatively, we can remove
__sg_mark_end() and leave that up to the driver.


diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 61c2e39..290836f 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1352,7 +1352,7 @@ new_segment:
 			if (!sg)
 				sg = sglist;
 			else
-				sg = sg_next(sg);
+				sg = sg_next_force(sg);
 
 			sg_dma_len(sg) = 0;
 			sg_dma_address(sg) = 0;
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 42daf5e..a98a2ee 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -99,6 +99,22 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg)
 	return sg;
 }
 
+/**
+ * sg_next_force - return the next scatterlist entry in a list
+ * @sg:		   The current sg entry
+ *
+ * Description:
+ *   Must only be used when more entries beyond this one are known to exist,
+ *   as it clears the termination bit. Useful to avoid adding a full sg
+ *   table init on every mapping.
+ *
+ **/
+static inline struct scatterlist *sg_next_force(struct scatterlist *sg)
+{
+	sg->page_link &= ~0x02;
+	return sg_next(sg);
+}
+
 /*
  * Loop over each sg element, following the pointer to a new list if necessary
  */

-- 
Jens Axboe

