All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [TEST PATCH] xen/blkfront: use blk_rq_map_sg to generate ring entries
       [not found] <13313891.8411201233953050380.JavaMail.root@ouachita>
@ 2009-02-06 20:47 ` Greg Harris
  2009-02-06 21:16   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 4+ messages in thread
From: Greg Harris @ 2009-02-06 20:47 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Xen-devel, Jens Axboe

It looks like this patch applied against 2.6.28.2 resolves the issue for us -- our spinner has been running for two days now without a panic.  We've been testing this by spinning around an installer which creates file systems across two disks and then installs a base Linux system and some additional software.  Previously we were running into these panics towards the end of file-system creation.  

Will this patch be part of the next upstream kernel release?

Thank you all for your help in tracking down this issue.  

---

Greg Harris
System Administrator
MetaCarta, Inc.

(O) +1 (617) 301-5530
(M) +1 (781) 258-4474

----- "Jeremy Fitzhardinge" <jeremy@goop.org> wrote:

> From: Jens Axboe <jens.axboe@oracle.com>
> 
> On occasion, the request will apparently have more segments than we
> fit into the ring. Jens says:
> 
> > The second problem is that the block layer then appears to create
> one
> > too many segments, but from the dump it has rq->nr_phys_segments ==
> > BLKIF_MAX_SEGMENTS_PER_REQUEST. I suspect the latter is due to
> > xen-blkfront not handling the merging on its own. It should check
> that
> > the new page doesn't form part of the previous page. The
> > rq_for_each_segment() iterates all single bits in the request, not
> dma
> > segments. The "easiest" way to do this is to call blk_rq_map_sg()
> and
> > then iterate the mapped sg list. That will give you what you are
> > looking for.
> 
> > Here's a test patch, compiles but otherwise untested. I spent more
> > time figuring out how to enable XEN than to code it up, so YMMV!
> > Probably the sg list wants to be put inside the ring and only
> > initialized on allocation, then you can get rid of the sg on stack
> and
> > sg_init_table() loop call in the function. I'll leave that, and the
> > testing, to you.
> 
> [Moved sg array into info structure, and initialize once. -J]
> 
> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> ---
>  drivers/block/xen-blkfront.c |   30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> ===================================================================
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -40,6 +40,7 @@
>  #include <linux/hdreg.h>
>  #include <linux/cdrom.h>
>  #include <linux/module.h>
> +#include <linux/scatterlist.h>
>  
>  #include <xen/xenbus.h>
>  #include <xen/grant_table.h>
> @@ -82,6 +83,7 @@
>  	enum blkif_state connected;
>  	int ring_ref;
>  	struct blkif_front_ring ring;
> +	struct scatterlist sg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  	unsigned int evtchn, irq;
>  	struct request_queue *rq;
>  	struct work_struct work;
> @@ -204,12 +206,11 @@
>  	struct blkfront_info *info = req->rq_disk->private_data;
>  	unsigned long buffer_mfn;
>  	struct blkif_request *ring_req;
> -	struct req_iterator iter;
> -	struct bio_vec *bvec;
>  	unsigned long id;
>  	unsigned int fsect, lsect;
> -	int ref;
> +	int i, ref;
>  	grant_ref_t gref_head;
> +	struct scatterlist *sg;
>  
>  	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
>  		return 1;
> @@ -238,12 +239,13 @@
>  	if (blk_barrier_rq(req))
>  		ring_req->operation = BLKIF_OP_WRITE_BARRIER;
>  
> -	ring_req->nr_segments = 0;
> -	rq_for_each_segment(bvec, req, iter) {
> -		BUG_ON(ring_req->nr_segments == BLKIF_MAX_SEGMENTS_PER_REQUEST);
> -		buffer_mfn = pfn_to_mfn(page_to_pfn(bvec->bv_page));
> -		fsect = bvec->bv_offset >> 9;
> -		lsect = fsect + (bvec->bv_len >> 9) - 1;
> +	ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
> +	BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
> +
> +	for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
> +		buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg)));
> +		fsect = sg->offset >> 9;
> +		lsect = fsect + (sg->length >> 9) - 1;
>  		/* install a grant reference. */
>  		ref = gnttab_claim_grant_reference(&gref_head);
>  		BUG_ON(ref == -ENOSPC);
> @@ -254,16 +256,12 @@
>  				buffer_mfn,
>  				rq_data_dir(req) );
>  
> -		info->shadow[id].frame[ring_req->nr_segments] =
> -				mfn_to_pfn(buffer_mfn);
> -
> -		ring_req->seg[ring_req->nr_segments] =
> +		info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn);
> +		ring_req->seg[i] =
>  				(struct blkif_request_segment) {
>  					.gref       = ref,
>  					.first_sect = fsect,
>  					.last_sect  = lsect };
> -
> -		ring_req->nr_segments++;
>  	}
>  
>  	info->ring.req_prod_pvt++;
> @@ -622,6 +620,8 @@
>  	SHARED_RING_INIT(sring);
>  	FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE);
>  
> +	sg_init_table(info->sg, BLKIF_MAX_SEGMENTS_PER_REQUEST);
> +
>  	err = xenbus_grant_ring(dev, virt_to_mfn(info->ring.sring));
>  	if (err < 0) {
>  		free_page((unsigned long)sring);

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [TEST PATCH] xen/blkfront: use blk_rq_map_sg to generate ring entries
  2009-02-06 20:47 ` [TEST PATCH] xen/blkfront: use blk_rq_map_sg to generate ring entries Greg Harris
@ 2009-02-06 21:16   ` Jeremy Fitzhardinge
  2009-02-09  9:52     ` [TEST PATCH] xen/blkfront: use blk_rq_map_sg togenerate " Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-06 21:16 UTC (permalink / raw)
  To: Greg Harris; +Cc: Xen-devel, Jens Axboe

Greg Harris wrote:
> It looks like this patch applied against 2.6.28.2 resolves the issue for us -- our spinner has been running for two days now without a panic.  We've been testing this by spinning around an installer which creates file systems across two disks and then installs a base Linux system and some additional software.  Previously we were running into these panics towards the end of file-system creation.  
>   

Thanks very much for testing.

> Will this patch be part of the next upstream kernel release?
>   

Yes, and it should go into the next stable kernel too.

    J

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [TEST PATCH] xen/blkfront: use blk_rq_map_sg togenerate ring entries
  2009-02-06 21:16   ` Jeremy Fitzhardinge
@ 2009-02-09  9:52     ` Jan Beulich
  2009-02-09 17:35       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2009-02-09  9:52 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Greg Harris; +Cc: Xen-devel, Jens Axboe

>>> Jeremy Fitzhardinge <jeremy@goop.org> 06.02.09 22:16 >>>
>Greg Harris wrote:
>> It looks like this patch applied against 2.6.28.2 resolves the issue for us -- our spinner has been running for two days now without a panic.  We've been testing this by spinning around an installer which creates file systems across two disks and then installs a base Linux system and some additional software.  Previously we were running into these panics towards the end of file-system creation.  
>>   
>
>Thanks very much for testing.
>
>> Will this patch be part of the next upstream kernel release?
>>   
>
>Yes, and it should go into the next stable kernel too.

But isn't this just reducing the likelihood of hitting the problem? Without
understanding why the entry limit gets exceeded (and fixing that), the
potential for this to happen again is still there.

Of course, the change made here has its own benefits, so I fully agree
it should be applied (though I wonder whether it's indeed a -stable
candidate).

Jan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Re: [TEST PATCH] xen/blkfront: use blk_rq_map_sg togenerate ring entries
  2009-02-09  9:52     ` [TEST PATCH] xen/blkfront: use blk_rq_map_sg togenerate " Jan Beulich
@ 2009-02-09 17:35       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 4+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-09 17:35 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Greg Harris, Xen-devel, Jens Axboe

Jan Beulich wrote:
> But isn't this just reducing the likelihood of hitting the problem? Without
> understanding why the entry limit gets exceeded (and fixing that), the
> potential for this to happen again is still there.
>   

I had the same thought, but Jens says it fixes the real problem, so that 
satisfies me ;)

Greg reported that simply decreasing the max segments to one less than 
the ring size was sufficient to clear up the symptoms, which indicates 
it was never submitting more than one extra segment.  Jens writes:
> The second problem is
> that the block layer then appears to create one too many segments, but
> from the dump it has rq->nr_phys_segments ==
> BLKIF_MAX_SEGMENTS_PER_REQUEST. I suspect the latter is due to
> xen-blkfront not handling the merging on its own. It should check that
> the new page doesn't form part of the previous page.
which suggests that blkfront was omitting a merging check that driver 
should perform, results in this problem.  I don't really understand why 
the upper layers can't do this merge check, but I'm sure Jens has a good 
reason.

> Of course, the change made here has its own benefits, so I fully agree
> it should be applied (though I wonder whether it's indeed a -stable
> candidate).

Well, it appears to be the proper fix to a real bug observed in the 
field, which makes it an ideal -stable candidate.

    J

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-02-09 17:35 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <13313891.8411201233953050380.JavaMail.root@ouachita>
2009-02-06 20:47 ` [TEST PATCH] xen/blkfront: use blk_rq_map_sg to generate ring entries Greg Harris
2009-02-06 21:16   ` Jeremy Fitzhardinge
2009-02-09  9:52     ` [TEST PATCH] xen/blkfront: use blk_rq_map_sg togenerate " Jan Beulich
2009-02-09 17:35       ` Jeremy Fitzhardinge

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.