From mboxrd@z Thu Jan  1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: [PATCH V4 2/3] xen-blkfront: teach blkfront driver to handle discard requests
Date: Tue, 13 Sep 2011 09:15:17 -0400
Message-ID: <20110913131517.GB25632@phenom.oracle.com>
References: <20110901152534.GA6965@dumpdata.com>
To: Li Dongyang
Cc: xen-devel@lists.xensource.com, owen.smith@citrix.com, JBeulich@novell.com
List-Id: xen-devel@lists.xenproject.org

On Tue, Sep 13, 2011 at 04:33:27PM +0800, Li Dongyang wrote:
> Hi, Konrad
> On Thu, Sep 1, 2011 at 11:25 PM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Sep 01, 2011 at 06:39:09PM +0800, Li Dongyang wrote:
> >> The blkfront driver now will read discard related nodes from xenstore,
> >> and set up the request queue, then we can forward the
> >> discard requests to backend driver.
> >
> >
> > A better description is as follows:
> >
> > xen-blkfront: Handle discard requests.
> >
> >     If the backend advertises 'feature-discard', then interrogate
> >     the backend for alignment, granularity, and max discard block size.
> >     Set up the request queue with the appropriate values and send the
> >     discard operation as required.
> >
> >
> >>
> >> Signed-off-by: Li Dongyang
> >> ---
> >>  drivers/block/xen-blkfront.c |  111 +++++++++++++++++++++++++++++++++---------
> >>  1 files changed, 88 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> >> index 9ea8c25..86e2c63 100644
> >> --- a/drivers/block/xen-blkfront.c
> >> +++ b/drivers/block/xen-blkfront.c
> >> @@ -98,6 +98,9 @@ struct blkfront_info
> >>       unsigned long shadow_free;
> >>       unsigned int feature_flush;
> >>       unsigned int flush_op;
> >> +     unsigned int feature_discard;
> >> +     unsigned int discard_granularity;
> >> +     unsigned int discard_alignment;
> >>       int is_ready;
> >>  };
> >>
> >> @@ -302,29 +305,36 @@ static int blkif_queue_request(struct request *req)
> >>               ring_req->operation = info->flush_op;
> >>       }
> >>
> >> -     ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
> >> -     BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
> >> +     if (unlikely(req->cmd_flags & REQ_DISCARD)) {
> >> +             /* id, sector_number and handle are set above. */
> >> +             ring_req->operation = BLKIF_OP_DISCARD;
> >> +             ring_req->nr_segments = 0;
> >> +             ring_req->u.discard.nr_sectors = blk_rq_sectors(req);
> >> +     } else {
> >> +             ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
> >> +             BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
> >>
> >> -     for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
> >> -             buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg)));
> >> -             fsect = sg->offset >> 9;
> >> -             lsect = fsect + (sg->length >> 9) - 1;
> >> -             /* install a grant reference. */
> >> -             ref = gnttab_claim_grant_reference(&gref_head);
> >> -             BUG_ON(ref == -ENOSPC);
> >> +             for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
> >> +                     buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg)));
> >> +                     fsect = sg->offset >> 9;
> >> +                     lsect = fsect + (sg->length >> 9) - 1;
> >> +                     /* install a grant reference. */
> >> +                     ref = gnttab_claim_grant_reference(&gref_head);
> >> +                     BUG_ON(ref == -ENOSPC);
> >>
> >> -             gnttab_grant_foreign_access_ref(
> >> -                             ref,
> >> -                             info->xbdev->otherend_id,
> >> -                             buffer_mfn,
> >> -                             rq_data_dir(req) );
> >> -
> >> -             info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn);
> >> -             ring_req->u.rw.seg[i] =
> >> -                             (struct blkif_request_segment) {
> >> -                                     .gref       = ref,
> >> -                                     .first_sect = fsect,
> >> -                                     .last_sect  = lsect };
> >> +                     gnttab_grant_foreign_access_ref(
> >> +                                     ref,
> >> +                                     info->xbdev->otherend_id,
> >> +                                     buffer_mfn,
> >> +                                     rq_data_dir(req));
> >> +
> >> +                     info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn);
> >> +                     ring_req->u.rw.seg[i] =
> >> +                                     (struct blkif_request_segment) {
> >> +                                             .gref       = ref,
> >> +                                             .first_sect = fsect,
> >> +                                             .last_sect  = lsect };
> >> +             }
> >>       }
> >>
> >>       info->ring.req_prod_pvt++;
> >> @@ -399,6 +409,7 @@ wait:
> >>  static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
> >>  {
> >>       struct request_queue *rq;
> >> +     struct blkfront_info *info = gd->private_data;
> >>
> >>       rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
> >>       if (rq == NULL)
> >> @@ -406,6 +417,13 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
> >>
> >>       queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
> >>
> >> +     if (info->feature_discard) {
> >> +             queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq);
> >> +             blk_queue_max_discard_sectors(rq, get_capacity(gd));
> >
> > This is not correct. I took a look at the SCSI support for this
> > ('sd_config_discard') and if you look there carefully you will see that
> > the value can be 64KB, for example.
> >
> > You need to provide a mechanism similar to 'discard-*' to fetch that data
> > from the backend.
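
To make this concrete, something along these lines is what I have in mind
(a rough, compile-untested sketch; the helper name and the
'discard-granularity'/'discard-alignment' node names are only my assumption
of what the backend would publish alongside 'feature-discard'):

    /* Sketch: ask the backend, via xenstore, for its discard parameters. */
    static void blkfront_setup_discard(struct blkfront_info *info)
    {
            unsigned int discard_granularity;
            unsigned int discard_alignment;
            int err;

            err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
                                "discard-granularity", "%u", &discard_granularity,
                                "discard-alignment", "%u", &discard_alignment,
                                NULL);
            if (err) {
                    /* Backend did not publish the details; leave discard off. */
                    info->feature_discard = 0;
                    return;
            }

            info->feature_discard = 1;
            info->discard_granularity = discard_granularity;
            info->discard_alignment = discard_alignment;
            /*
             * A similar node could carry the maximum discard size, which
             * would then be fed to blk_queue_max_discard_sectors() instead
             * of get_capacity(gd).
             */
    }

That would be called once the backend connects (i.e. from blkfront_connect()),
before xlvbd_init_blk_queue() sets up the queue limits.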
> >
> >> +             rq->limits.discard_granularity = info->discard_granularity;
> >> +             rq->limits.discard_alignment = info->discard_alignment;
> >> +     }
> >> +
> >>       /* Hard sector size and max sectors impersonate the equiv. hardware. */
> >>       blk_queue_logical_block_size(rq, sector_size);
> >>       blk_queue_max_hw_sectors(rq, 512);
> >> @@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> >>
> >>               error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
> >>               switch (bret->operation) {
> >> +             case BLKIF_OP_DISCARD:
> >> +                     if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
> >> +                             struct request_queue *rq = info->rq;
> >> +                             printk(KERN_WARNING "blkfront: %s: discard op failed\n",
> >> +                                    info->gd->disk_name);
> >> +                             error = -EOPNOTSUPP;
> >> +                             info->feature_discard = 0;
> >> +                             spin_lock(rq->queue_lock);
> We should not take the spin_lock here: rq->queue_lock is actually
> blkif_io_lock (we pass blkif_io_lock to blk_init_queue when we set up
> the queue), and blkif_io_lock has already been acquired above in
> blkif_interrupt(), so we would deadlock here.
>
> I'm sorry if this is already in your tree; I tested the BLKIF_RSP_ERROR
> case but forgot the EOPNOTSUPP one. I feel stupid.
> Could you get rid of the spin_lock and the spin_unlock below? Thanks

Just send me a patch with the change and I will apply it to the tree.

Thanks for testing it even further!
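
FWIW, with the extra locking dropped I would expect that branch to end up
looking roughly like this (untested sketch; it assumes, as you describe, that
blkif_io_lock -- the same lock as rq->queue_lock -- is already held in
blkif_interrupt(), so queue_flag_clear() can be called directly):

            case BLKIF_OP_DISCARD:
                    if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
                            struct request_queue *rq = info->rq;
                            printk(KERN_WARNING "blkfront: %s: discard op failed\n",
                                   info->gd->disk_name);
                            error = -EOPNOTSUPP;
                            info->feature_discard = 0;
                            /*
                             * blkif_io_lock (rq->queue_lock) is already held
                             * by blkif_interrupt(), so no spin_lock/unlock.
                             */
                            queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
                    }
                    __blk_end_request_all(req, error);
                    break;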