All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bob Liu <bob.liu@oracle.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Juergen Gross <jgross@suse.com>,
	Ian.Jackson@citrix.com, Paul.Durrant@citrix.com,
	david.vrabel@citrix.com, xen-devel@lists.xen.org
Subject: Re: [RFC PATCH] Data integrity extension support for xen-block
Date: Fri, 08 Apr 2016 18:13:22 +0800	[thread overview]
Message-ID: <57078442.3040901@oracle.com> (raw)
In-Reply-To: <alpine.OSX.2.20.1604081144030.18723@mac>


On 04/08/2016 05:44 PM, Roger Pau Monné wrote:
> On Fri, 8 Apr 2016, Bob Liu wrote:
>>
>> On 04/07/2016 11:55 PM, Juergen Gross wrote:
>>> On 07/04/16 12:00, Bob Liu wrote:
>>>> * What's data integrity extension and why?
>>>> Modern filesystems feature checksumming of data and metadata to protect against
>>>> data corruption.  However, the detection of the corruption is done at read time
>>>> which could potentially be months after the data was written.  At that point the
>>>> original data that the application tried to write is most likely lost.
>>>>
>>>> The solution in Linux is the data integrity framework which enables protection
>>>> information to be pinned to I/Os and sent to/received from controllers that
>>>> support it. struct bio has been extended with a pointer to a struct bip which
>>>> in turn contains the integrity metadata. The bip is essentially a trimmed down
>>>> bio with a bio_vec and some housekeeping.
>>>>
>>>> * Issues when xen-block get involved.
>>>> xen-blkfront only transmits the normal data of struct bio while the integrity
>>>> metadata buffer(struct bio_integrity_payload in each bio) is ignored.
>>>>
>>>> * Proposal of transmitting bio integrity payload.
>>>> Adding an extra request following the normal data request, this extra request
>>>> contains the integrity payload.
>>>> The xen-blkback will reconstruct an new bio with both received normal data and
>>>> integrity metadata.
>>>>
>>>> Welcome any better ideas, thank you!
>>>>
>>>> [1] http://lwn.net/Articles/280023/
>>>> [2] https://www.kernel.org/doc/Documentation/block/data-integrity.txt
>>>>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>>  xen/include/public/io/blkif.h |   50 +++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 50 insertions(+)
>>>>
>>>> diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
>>>> index 99f0326..3d8d39f 100644
>>>> --- a/xen/include/public/io/blkif.h
>>>> +++ b/xen/include/public/io/blkif.h
>>>> @@ -635,6 +635,28 @@
>>>>  #define BLKIF_OP_INDIRECT          6
>>>>  
>>>>  /*
>>>> + * Recognized only if "feature-extra-request" is present in backend xenbus info.
>>>> + * A request with BLKIF_OP_EXTRA_FLAG indicates an extra request is followed
>>>> + * in the shared ring buffer.
>>>> + *
>>>> + * By this way, extra data like bio integrity payload can be transmitted from
>>>> + * frontend to backend.
>>>> + *
>>>> + * The 'wire' format is like:
>>>> + *  Request 1: xen_blkif_request
>>>> + * [Request 2: xen_blkif_extra_request]    (only if request 1 has BLKIF_OP_EXTRA_FLAG)
>>>> + *  Request 3: xen_blkif_request
>>>> + *  Request 4: xen_blkif_request
>>>> + * [Request 5: xen_blkif_extra_request]    (only if request 4 has BLKIF_OP_EXTRA_FLAG)
>>>> + *  ...
>>>> + *  Request N: xen_blkif_request
>>>> + *
>>>> + * If a backend does not recognize BLKIF_OP_EXTRA_FLAG, it should *not* create the
>>>> + * "feature-extra-request" node!
>>>> + */
>>>> +#define BLKIF_OP_EXTRA_FLAG (0x80)
>>>> +
>>>> +/*
>>>>   * Maximum scatter/gather segments per request.
>>>>   * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE.
>>>>   * NB. This could be 12 if the ring indexes weren't stored in the same page.
>>>> @@ -703,6 +725,34 @@ struct blkif_request_indirect {
>>>>  };
>>>>  typedef struct blkif_request_indirect blkif_request_indirect_t;
>>>>  
>>>> +enum blkif_extra_request_type {
>>>> +	BLKIF_EXTRA_TYPE_DIX = 1,		/* Data integrity extension payload.  */
>>>> +};
>>>> +
>>>> +struct bio_integrity_req {
>>>> +	/*
>>>> +	 * Grant mapping for transmitting bio integrity payload to backend.
>>>> +	 */
>>>> +	grant_ref_t *gref;
>>>> +	unsigned int nr_grefs;
>>>> +	unsigned int len;
>>>> +};
>>>
>>> How does the payload look like? It's structure should be defined here
>>> or a reference to it's definition in case it is a standard should be
>>> given.
>>>
>>
>> The payload is also described using struct bio_vec(the same as bio).
>>
>> /*
>>  * bio integrity payload
>>  */
>> struct bio_integrity_payload {
>> 	struct bio		*bip_bio;	/* parent bio */
>>
>> 	struct bvec_iter	bip_iter;
>>
>> 	bio_end_io_t		*bip_end_io;	/* saved I/O completion fn */
>>
>> 	unsigned short		bip_slab;	/* slab the bip came from */
>> 	unsigned short		bip_vcnt;	/* # of integrity bio_vecs */
>> 	unsigned short		bip_max_vcnt;	/* integrity bio_vec slots */
>> 	unsigned short		bip_flags;	/* control flags */
>>
>> 	struct work_struct	bip_work;	/* I/O completion */
>>
>> 	struct bio_vec		*bip_vec;
>> 	struct bio_vec		bip_inline_vecs[0];/* embedded bvec array */
>> };
> 
> There's no way we are going to embed such a Linux specific payload into 
> the PV block protocol. Also, I have the feeling there are a lot of fields 
> in this struct that make no sense to transmit on the ring (work_struct?).
> 

Only the bio_vec data bip_vec pointed is necessary to be transmitted.

> TBH, I don't know much about this integrity thing. Why does the frontend 
> needs to create and pass this bio_integrity_payload around? Couldn't this 
> be created from blkback before sending the request down to the disk? Then 
> blkback would check the result and either return BLKIF_RSP_OKAY or 
> BLKIF_RSP_ERROR if the metadata doesn't match?
>  

Yes, but that's only one use case.

The Linux data integrity framework also allows the user space application or filesystem generating the metadata.
* A filesystem in Guest that is integrity-aware can prepare I/Os with metadata attached.
* Filesystems in Guest are capable of transferring metadata from user space.
Those metadata get lost if we don't pass them through in blkfront.

You may have a look at:
[1] http://lwn.net/Articles/280023/
[2] https://www.kernel.org/doc/Documentation/block/data-integrity.txt

Thanks,
Bob

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2016-04-08 10:13 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-07 10:00 [RFC PATCH] Data integrity extension support for xen-block Bob Liu
2016-04-07 15:32 ` Ian Jackson
2016-04-07 15:55 ` Juergen Gross
2016-04-08  1:24   ` Bob Liu
2016-04-08  4:04     ` Juergen Gross
2016-04-08  9:44     ` Roger Pau Monné
2016-04-08 10:13       ` Bob Liu [this message]
2016-04-08 13:42         ` Ian Jackson
2016-04-11 15:04         ` Konrad Rzeszutek Wilk
2016-04-08 14:16 ` David Vrabel
2016-04-08 14:20   ` Ian Jackson
2016-04-08 14:32     ` David Vrabel
2016-04-11 12:32       ` Bob Liu
2016-04-13 12:22 ` Bob Liu
2016-04-13 13:19   ` Ian Jackson
2016-04-13 14:43   ` Juergen Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57078442.3040901@oracle.com \
    --to=bob.liu@oracle.com \
    --cc=Ian.Jackson@citrix.com \
    --cc=Paul.Durrant@citrix.com \
    --cc=david.vrabel@citrix.com \
    --cc=jgross@suse.com \
    --cc=roger.pau@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.