public inbox for linux-kernel@vger.kernel.org
From: Jens Axboe <jens.axboe@oracle.com>
To: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: davem@davemloft.net, linux-kernel@vger.kernel.org
Subject: Re: IDE crash...
Date: Tue, 23 Oct 2007 09:23:24 +0200
Message-ID: <20071023072324.GG25962@kernel.dk>
In-Reply-To: <20071023161416W.fujita.tomonori@lab.ntt.co.jp>

On Tue, Oct 23 2007, FUJITA Tomonori wrote:
> On Tue, 23 Oct 2007 09:09:33 +0200
> Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > On Tue, Oct 23 2007, Jens Axboe wrote:
> > > On Mon, Oct 22 2007, David Miller wrote:
> > > > 
> > > > I'm debugging a blk_rq_map_sg() crash that I'm getting on sparc64 as
> > > > root is mounted over IDE.  I think I know what is happening now.
> > > > 
> > > > The IDE sg table is allocated and initialized like this in
> > > > drivers/ide/ide-probe.c:
> > > > 
> > > > 	x = kmalloc(sizeof(struct scatterlist) * nents, GFP_XXX);
> > > > 	sg_init_table(x, nents);
> > > > 
> > > > So far, so good.
> > > > 
> > > > Now, ide_map_sg() passes requests down to blk_rq_map_sg() like this in
> > > > drivers/ide/ide-io.c:
> > > > 
> > > > 		hwif->sg_nents = blk_rq_map_sg(drive->queue, rq, sg);
> > > > 
> > > > Ok, so what does blk_rq_map_sg() do?
> > > > 
> > > > 	sg = NULL;
> > > > 	rq_for_each_segment(bvec, rq, iter) {
> > > >  ...
> > > > 		if (bvprv && cluster) {
> > > >  ...
> > > > 		} else {
> > > > new_segment:
> > > > 			if (!sg)
> > > > 				sg = sglist;
> > > > 			else
> > > > 				sg = sg_next(sg);
> > > >  ...
> > > > 		}
> > > > 		bvprv = bvec;
> > > > 	} /* segments in rq */
> > > > 
> > > > 	if (sg)
> > > > 		__sg_mark_end(sg);
> > > > 
> > > > So let's say the first request comes in and needs 2 segs.
> > > > This will mark sg[1].page_link with 0x2
> > > > 
> > > > If the next request from IDE needs 4 segs, we'll OOPS because
> > > > sg_next() on &sg[1] will see page_link bit 0x2 is set and
> > > > therefore return NULL.
> > > > 
> > > > A quick look shows that if you're testing on SCSI (or something
> > > > layered on top of it like SATA or PATA) you won't see this seemingly
> > > > guaranteed crash because the SCSI mid-layer allocates a fresh sglist
> > > > via mempool_alloc() and runs sg_init_table() on it for every I/O
> > > > request.
> > > 
> > > We should never see the end pointer in blk_rq_map_sg(), or that's a bug
> > > in the driver. So it should be OK to just clear the end pointer always
> > > in there, even if it's not the prettiest solution...
> > > 
> > > This just needs to be wrapped up in some scatterlist.h macro/function.
> > > 
> > > diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
> > > index 61c2e39..a3bda2f 100644
> > > --- a/block/ll_rw_blk.c
> > > +++ b/block/ll_rw_blk.c
> > > @@ -1354,6 +1354,12 @@ new_segment:
> > >  			else
> > >  				sg = sg_next(sg);
> > >  
> > > +			/*
> > > +			 * Clear end-of-table pointer, we'll mark a new one
> > > +			 * at the end
> > > +			 */
> > > +			sg->page_link &= ~0x2;
> > > +
> > >  			sg_dma_len(sg) = 0;
> > >  			sg_dma_address(sg) = 0;
> > >  			sg_set_page(sg, bvec->bv_page);
> > 
> > Eh, this won't work, it's the wrong entry... Here's a temporary
> > work-around.
> 
> Yeah, it won't work. Now we must call sg_init_table for every I/O
> request (it's not nice).

I think the fix would be to have a sg_next_and_clear() or something that
doesn't honor the 0x02 termination bit and clears it, for the cases
where you KNOW that there are more entries.
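
To make the failure and the proposed fix concrete, here is a minimal
user-space sketch. Everything in it is a simplified model and not kernel
code: struct sg_model, model_sg_next(), model_sg_next_force() and
model_map() are hypothetical stand-ins, sg chaining is ignored, and
page_link carries nothing but the 0x02 termination bit.

```c
#include <assert.h>
#include <stddef.h>

#define SG_END 0x02UL	/* termination bit, same value as in the thread */

/* Simplified stand-in for struct scatterlist: only the field that matters. */
struct sg_model {
	unsigned long page_link;
};

/* Model of sg_next(): stop (return NULL) at an entry marked as the end. */
static struct sg_model *model_sg_next(struct sg_model *sg)
{
	assert(sg != NULL);
	if (sg->page_link & SG_END)
		return NULL;
	return sg + 1;
}

/* Model of the proposed helper: clear a stale end mark, then advance. */
static struct sg_model *model_sg_next_force(struct sg_model *sg)
{
	sg->page_link &= ~SG_END;
	return model_sg_next(sg);
}

/*
 * Model of the mapping loop: walk nsegs entries of a reused table with the
 * given "next" function and mark the last one as the end. Returns nsegs, or
 * -1 where the real code would have dereferenced a NULL sg pointer.
 */
static int model_map(struct sg_model *table, int nsegs,
		     struct sg_model *(*next)(struct sg_model *))
{
	struct sg_model *sg = NULL;
	int i;

	for (i = 0; i < nsegs; i++) {
		sg = sg ? next(sg) : table;
		if (!sg)
			return -1;
	}
	if (sg)
		sg->page_link |= SG_END;
	return nsegs;
}
```

Mapping 2 segments and then 4 into the same table with model_sg_next()
fails at the third entry, because entry 1 still carries the end mark from
the first request. With model_sg_next_force() the stale mark is cleared on
the way past and the new end lands on entry 3.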

> I think there are other blk_rq_map_sg users that need such a fix.

Possibly, I did do quite a few of them. Alternatively, we can remove
__sg_mark_end() and leave that up to the driver.
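
A rough sketch of what that alternative could look like from the caller's
side, again as a user-space model under assumptions: the mapping step
leaves no end mark in the table and only returns the number of entries
used (the way the IDE caller already stores the return value in
hwif->sg_nents), and the consumer bounds its walk by that count. The names
seg_model, model_map_no_mark() and model_total_len() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified segment descriptor; not the kernel's struct scatterlist. */
struct seg_model {
	unsigned long addr;
	unsigned int len;
};

/*
 * Fill the first nsegs entries of a reused table and return how many were
 * used. No end mark is written, so nothing stale is left for the next call.
 */
static int model_map_no_mark(struct seg_model *table, int table_len,
			     const unsigned int *lens, int nsegs)
{
	int i;

	assert(nsegs <= table_len);
	for (i = 0; i < nsegs; i++) {
		table[i].addr = 0x1000UL * (unsigned long)(i + 1); /* dummy address */
		table[i].len = lens[i];
	}
	return nsegs;
}

/* Consumer side: walk exactly nents entries instead of looking for a mark. */
static unsigned int model_total_len(const struct seg_model *table, int nents)
{
	unsigned int total = 0;
	int i;

	for (i = 0; i < nents; i++)
		total += table[i].len;
	return total;
}
```

The trade-off in this model is that every consumer has to carry the count
around; code that walks the list until it sees a termination mark would
still need the driver to set one.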


diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 61c2e39..290836f 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1352,7 +1352,7 @@ new_segment:
 			if (!sg)
 				sg = sglist;
 			else
-				sg = sg_next(sg);
+				sg = sg_next_force(sg);
 
 			sg_dma_len(sg) = 0;
 			sg_dma_address(sg) = 0;
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 42daf5e..a98a2ee 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -99,6 +99,22 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg)
 	return sg;
 }
 
+/**
+ * sg_next_force - return the next scatterlist entry in a list
+ * @sg:		   The current sg entry
+ *
+ * Description:
+ *   Must only be used when more entries beyond this one are known to exist,
+ *   as it clears the termination bit. Useful to avoid adding a full sg
+ *   table init on every mapping.
+ *
+ **/
+static inline struct scatterlist *sg_next_force(struct scatterlist *sg)
+{
+	sg->page_link &= ~0x02;
+	return sg_next(sg);
+}
+
 /*
  * Loop over each sg element, following the pointer to a new list if necessary
  */

-- 
Jens Axboe

