From mboxrd@z Thu Jan 1 00:00:00 1970 From: Konrad Rzeszutek Wilk Subject: Re: [PATCH V4 2/3] xen-blkfront: teach blkfront driver to handle discard requests Date: Fri, 2 Sep 2011 09:28:30 -0400 Message-ID: <20110902132830.GB5088@dumpdata.com> References: <20110901152534.GA6965@dumpdata.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Return-path: Content-Disposition: inline In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Li Dongyang Cc: xen-devel@lists.xensource.com, owen.smith@citrix.com, JBeulich@novell.com List-Id: xen-devel@lists.xenproject.org On Fri, Sep 02, 2011 at 02:30:03PM +0800, Li Dongyang wrote: > On Thu, Sep 1, 2011 at 11:25 PM, Konrad Rzeszutek Wilk > wrote: > > On Thu, Sep 01, 2011 at 06:39:09PM +0800, Li Dongyang wrote: > >> The blkfront driver now will read discard related nodes from xenstor= e, > >> and set up the request queue, then we can forward the > >> discard requests to backend driver. > > > > > > A better description is as follow: > > > > xen-blkfront: Handle discard requests. > > > > =A0 =A0If the backend advertises 'feature-discard', then interrogate > > =A0 =A0the backend for alignment, granularity, and max discard block = size. > > =A0 =A0Setup the request queue with the appropiate values and send th= e > > =A0 =A0discard operation as required. > > > thanks for the description, way better than mine :-) OK. Lets use that. .. snip.. > >> + =A0 =A0 if (info->feature_discard) { > >> + =A0 =A0 =A0 =A0 =A0 =A0 queue_flag_set_unlocked(QUEUE_FLAG_DISCARD= , rq); > >> + =A0 =A0 =A0 =A0 =A0 =A0 blk_queue_max_discard_sectors(rq, get_capa= city(gd)); > > > > This is not correct. I took a look at the SCSI support for this > > ('sd_config_discard') and if you look there carefully you will see th= at > > the value can be 64KB for example. > well I don't think so, max_discard_sectors are used to split the bio, > make them not to > exceed the max_discard_sectors, see blkdev_issue_discard, and > that's fine if we use the capacity as max_discard_sectors > in guest: for file backend, there's no need for max_discard_sectors, > and for phy backend, we'll make the blkfront sent out a single big > discard request, Right, one sector for the whole disk size. > and when we deal with that in blkback by calling blkdev_issue_discard, > the real max_discard_sectors > from phy dev will be used, and take care of splitting bios, Thanks Ah, you are right: if (nr_sects > max_discard_sectors) { bio->bi_size =3D max_discard_sectors << 9; nr_sects -=3D max_discard_sectors; sector +=3D max_discard_sectors; } else { handles when that value (nr_sects =3D=3D get_capacity(x)) is extremely large. OK, that was the only issue I had - and that has been resolved. So stuck your patchset series in my tree - but they are not yet visisble at kernel.org >=20 > > > > You need to provide a mechanism similar to 'discard-*' to fetch that = data > > from the backend. > > > >> + =A0 =A0 =A0 =A0 =A0 =A0 rq->limits.discard_granularity =3D info->d= iscard_granularity; > >> + =A0 =A0 =A0 =A0 =A0 =A0 rq->limits.discard_alignment =3D info->dis= card_alignment; > >> + =A0 =A0 } > >> + > >> =A0 =A0 =A0 /* Hard sector size and max sectors impersonate the equi= v. hardware. */ > >> =A0 =A0 =A0 blk_queue_logical_block_size(rq, sector_size); > >> =A0 =A0 =A0 blk_queue_max_hw_sectors(rq, 512); > >> @@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, voi= d *dev_id) > >> > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 error =3D (bret->status =3D=3D BLKIF_RSP= _OKAY) ? 0 : -EIO; > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 switch (bret->operation) { > >> + =A0 =A0 =A0 =A0 =A0 =A0 case BLKIF_OP_DISCARD: > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (unlikely(bret->status = =3D=3D BLKIF_RSP_EOPNOTSUPP)) { > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct req= uest_queue *rq =3D info->rq; > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 printk(KER= N_WARNING "blkfront: %s: discard op failed\n", > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0info->gd->disk_name); > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 error =3D = -EOPNOTSUPP; > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 info->feat= ure_discard =3D 0; > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_lock(= rq->queue_lock); > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 queue_flag= _clear(QUEUE_FLAG_DISCARD, rq); > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_unloc= k(rq->queue_lock); > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 __blk_end_request_all(req,= error); > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 case BLKIF_OP_FLUSH_DISKCACHE: > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 case BLKIF_OP_WRITE_BARRIER: > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (unlikely(bret->statu= s =3D=3D BLKIF_RSP_EOPNOTSUPP)) { > >> @@ -1098,6 +1129,33 @@ blkfront_closing(struct blkfront_info *info) > >> =A0 =A0 =A0 bdput(bdev); > >> =A0} > >> > >> +static void blkfront_setup_discard(struct blkfront_info *info) > >> +{ > >> + =A0 =A0 int err; > >> + =A0 =A0 char *type; > >> + =A0 =A0 unsigned int discard_granularity; > >> + =A0 =A0 unsigned int discard_alignment; > >> + > >> + =A0 =A0 type =3D xenbus_read(XBT_NIL, info->xbdev->otherend, "type= ", NULL); > >> + =A0 =A0 if (IS_ERR(type)) > >> + =A0 =A0 =A0 =A0 =A0 =A0 return; > >> + > >> + =A0 =A0 if (strncmp(type, "phy", 3) =3D=3D 0) { > >> + =A0 =A0 =A0 =A0 =A0 =A0 err =3D xenbus_gather(XBT_NIL, info->xbdev= ->otherend, > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 "discard-granularity", "%u= ", &discard_granularity, > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 "discard-alignment", "%u",= &discard_alignment, > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 NULL); > >> + =A0 =A0 =A0 =A0 =A0 =A0 if (!err) { > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 info->feature_discard =3D = 1; > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 info->discard_granularity = =3D discard_granularity; > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 info->discard_alignment =3D= discard_alignment; > >> + =A0 =A0 =A0 =A0 =A0 =A0 } > >> + =A0 =A0 } else if (strncmp(type, "file", 4) =3D=3D 0) > >> + =A0 =A0 =A0 =A0 =A0 =A0 info->feature_discard =3D 1; > >> + > >> + =A0 =A0 kfree(type); > >> +} > >> + > >> =A0/* > >> =A0 * Invoked when the backend is finally 'ready' (and has told prod= uced > >> =A0 * the details about the physical device - #sectors, size, etc). > >> @@ -1108,7 +1166,7 @@ static void blkfront_connect(struct blkfront_i= nfo *info) > >> =A0 =A0 =A0 unsigned long sector_size; > >> =A0 =A0 =A0 unsigned int binfo; > >> =A0 =A0 =A0 int err; > >> - =A0 =A0 int barrier, flush; > >> + =A0 =A0 int barrier, flush, discard; > >> > >> =A0 =A0 =A0 switch (info->connected) { > >> =A0 =A0 =A0 case BLKIF_STATE_CONNECTED: > >> @@ -1178,7 +1236,14 @@ static void blkfront_connect(struct blkfront_= info *info) > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 info->feature_flush =3D REQ_FLUSH; > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 info->flush_op =3D BLKIF_OP_FLUSH_DISKCA= CHE; > >> =A0 =A0 =A0 } > >> - > >> + > >> + =A0 =A0 err =3D xenbus_gather(XBT_NIL, info->xbdev->otherend, > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 "feature-discard",= "%d", &discard, > >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 NULL); > >> + > >> + =A0 =A0 if (!err && discard) > >> + =A0 =A0 =A0 =A0 =A0 =A0 blkfront_setup_discard(info); > >> + > >> =A0 =A0 =A0 err =3D xlvbd_alloc_gendisk(sectors, info, binfo, sector= _size); > >> =A0 =A0 =A0 if (err) { > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 xenbus_dev_fatal(info->xbdev, err, "xlvb= d_add at %s", > >> -- > >> 1.7.6 > >