xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Bob Liu <bob.liu@oracle.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Juergen Gross <jgross@suse.com>,
	Ian.Jackson@citrix.com, Paul.Durrant@citrix.com,
	david.vrabel@citrix.com, xen-devel@lists.xen.org
Subject: Re: [RFC PATCH] Data integrity extension support for xen-block
Date: Fri, 08 Apr 2016 18:13:22 +0800	[thread overview]
Message-ID: <57078442.3040901@oracle.com> (raw)
In-Reply-To: <alpine.OSX.2.20.1604081144030.18723@mac>


On 04/08/2016 05:44 PM, Roger Pau Monné wrote:
> On Fri, 8 Apr 2016, Bob Liu wrote:
>>
>> On 04/07/2016 11:55 PM, Juergen Gross wrote:
>>> On 07/04/16 12:00, Bob Liu wrote:
>>>> * What's data integrity extension and why?
>>>> Modern filesystems feature checksumming of data and metadata to protect against
>>>> data corruption.  However, the detection of the corruption is done at read time
>>>> which could potentially be months after the data was written.  At that point the
>>>> original data that the application tried to write is most likely lost.
>>>>
>>>> The solution in Linux is the data integrity framework which enables protection
>>>> information to be pinned to I/Os and sent to/received from controllers that
>>>> support it. struct bio has been extended with a pointer to a struct bip which
>>>> in turn contains the integrity metadata. The bip is essentially a trimmed down
>>>> bio with a bio_vec and some housekeeping.
>>>>
>>>> * Issues when xen-block get involved.
>>>> xen-blkfront only transmits the normal data of struct bio while the integrity
>>>> metadata buffer(struct bio_integrity_payload in each bio) is ignored.
>>>>
>>>> * Proposal of transmitting bio integrity payload.
>>>> Adding an extra request following the normal data request, this extra request
>>>> contains the integrity payload.
>>>> The xen-blkback will reconstruct an new bio with both received normal data and
>>>> integrity metadata.
>>>>
>>>> Welcome any better ideas, thank you!
>>>>
>>>> [1] http://lwn.net/Articles/280023/
>>>> [2] https://www.kernel.org/doc/Documentation/block/data-integrity.txt
>>>>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>>  xen/include/public/io/blkif.h |   50 +++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 50 insertions(+)
>>>>
>>>> diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
>>>> index 99f0326..3d8d39f 100644
>>>> --- a/xen/include/public/io/blkif.h
>>>> +++ b/xen/include/public/io/blkif.h
>>>> @@ -635,6 +635,28 @@
>>>>  #define BLKIF_OP_INDIRECT          6
>>>>  
>>>>  /*
>>>> + * Recognized only if "feature-extra-request" is present in backend xenbus info.
>>>> + * A request with BLKIF_OP_EXTRA_FLAG indicates an extra request is followed
>>>> + * in the shared ring buffer.
>>>> + *
>>>> + * By this way, extra data like bio integrity payload can be transmitted from
>>>> + * frontend to backend.
>>>> + *
>>>> + * The 'wire' format is like:
>>>> + *  Request 1: xen_blkif_request
>>>> + * [Request 2: xen_blkif_extra_request]    (only if request 1 has BLKIF_OP_EXTRA_FLAG)
>>>> + *  Request 3: xen_blkif_request
>>>> + *  Request 4: xen_blkif_request
>>>> + * [Request 5: xen_blkif_extra_request]    (only if request 4 has BLKIF_OP_EXTRA_FLAG)
>>>> + *  ...
>>>> + *  Request N: xen_blkif_request
>>>> + *
>>>> + * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not* create the
>>>> + * "feature-extra-request" node!
>>>> + */
>>>> +#define BLKIF_OP_EXTRA_FLAG (0x80)
>>>> +
>>>> +/*
>>>>   * Maximum scatter/gather segments per request.
>>>>   * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
>>>>   * NB. This could be 12 if the ring indexes weren't stored in the same page.
>>>> @@ -703,6 +725,34 @@ struct blkif_request_indirect {
>>>>  };
>>>>  typedef struct blkif_request_indirect blkif_request_indirect_t;
>>>>  
>>>> +enum blkif_extra_request_type {
>>>> +	BLKIF_EXTRA_TYPE_DIX = 1,		/* Data integrity extension payload.  */
>>>> +};
>>>> +
>>>> +struct bio_integrity_req {
>>>> +	/*
>>>> +	 * Grant mapping for transmitting bio integrity payload to backend.
>>>> +	 */
>>>> +	grant_ref_t *gref;
>>>> +	unsigned int nr_grefs;
>>>> +	unsigned int len;
>>>> +};
>>>
>>> How does the payload look like? It's structure should be defined here
>>> or a reference to it's definition in case it is a standard should be
>>> given.
>>>
>>
>> The payload is also described using struct bio_vec(the same as bio).
>>
>> /*
>>  * bio integrity payload
>>  */
>> struct bio_integrity_payload {
>> 	struct bio		*bip_bio;	/* parent bio */
>>
>> 	struct bvec_iter	bip_iter;
>>
>> 	bio_end_io_t		*bip_end_io;	/* saved I/O completion fn */
>>
>> 	unsigned short		bip_slab;	/* slab the bip came from */
>> 	unsigned short		bip_vcnt;	/* # of integrity bio_vecs */
>> 	unsigned short		bip_max_vcnt;	/* integrity bio_vec slots */
>> 	unsigned short		bip_flags;	/* control flags */
>>
>> 	struct work_struct	bip_work;	/* I/O completion */
>>
>> 	struct bio_vec		*bip_vec;
>> 	struct bio_vec		bip_inline_vecs[0];/* embedded bvec array */
>> };
> 
> There's no way we are going to embed such a Linux specific payload into 
> the PV block protocol. Also, I have the feeling there are a lot of fields 
> in this struct that make no sense to transmit on the ring (work_struct?).
> 

Only the bio_vec data bip_vec pointed is necessary to be transmitted.

> TBH, I don't know much about this integrity thing. Why does the frontend 
> needs to create and pass this bio_integrity_payload around? Couldn't this 
> be created from blkback before sending the request down to the disk? Then 
> blkback would check the result and either return BLKIF_RSP_OKAY or 
> BLKIF_RSP_ERROR if the metadata doesn't match?
>  

Yes, but that's only one use case.

The Linux data integrity framework also allows the user space application or filesystem generating the metadata.
* A filesystem in Guest that is integrity-aware can prepare I/Os with metadata attached.
* Filesystems in Guest are capable of transferring metadata from user space.
Those metadata get lost if we don't pass them through in blkfront.

You may have a look at:
[1] http://lwn.net/Articles/280023/
[2] https://www.kernel.org/doc/Documentation/block/data-integrity.txt

Thanks,
Bob

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2016-04-08 10:13 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-07 10:00 [RFC PATCH] Data integrity extension support for xen-block Bob Liu
2016-04-07 15:32 ` Ian Jackson
2016-04-07 15:55 ` Juergen Gross
2016-04-08  1:24   ` Bob Liu
2016-04-08  4:04     ` Juergen Gross
2016-04-08  9:44     ` Roger Pau Monné
2016-04-08 10:13       ` Bob Liu [this message]
2016-04-08 13:42         ` Ian Jackson
2016-04-11 15:04         ` Konrad Rzeszutek Wilk
2016-04-08 14:16 ` David Vrabel
2016-04-08 14:20   ` Ian Jackson
2016-04-08 14:32     ` David Vrabel
2016-04-11 12:32       ` Bob Liu
2016-04-13 12:22 ` Bob Liu
2016-04-13 13:19   ` Ian Jackson
2016-04-13 14:43   ` Juergen Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57078442.3040901@oracle.com \
    --to=bob.liu@oracle.com \
    --cc=Ian.Jackson@citrix.com \
    --cc=Paul.Durrant@citrix.com \
    --cc=david.vrabel@citrix.com \
    --cc=jgross@suse.com \
    --cc=roger.pau@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).