* [PATCH v2 0/4] Introduce bulk mode for crypto engine framework @ 2016-03-15 7:47 Baolin Wang 2016-03-15 7:47 ` [PATCH v2 1/4] scatterlist: Introduce some helper functions Baolin Wang ` (4 more replies) 0 siblings, 5 replies; 27+ messages in thread From: Baolin Wang @ 2016-03-15 7:47 UTC (permalink / raw) To: herbert, davem, agk, snitzer, axboe, dm-devel Cc: akpm, david.s.gordon, thomas.lendacky, robert.jarzmik, yamada.masahiro, smueller, tadeusz.struk, standby24x7, shli, broonie, linus.walleij, arnd, baolin.wang, linux-kernel, linux-crypto, linux-raid Now some cipher hardware engines prefer to handle bulk block by merging requests to increase the block size and thus increase the hardware engine processing speed. This patchset introduces request bulk mode to help the crypto hardware drivers improve in efficiency. Changes since v1: - Modify the sg_is_contiguous() function. Baolin Wang (4): scatterlist: Introduce some helper functions crypto: Introduce some helper functions to help to merge requests crypto: Introduce the bulk mode for crypto engine framework md: dm-crypt: Initialize the sector number for one request crypto/Kconfig | 1 + crypto/ablk_helper.c | 135 ++++++++++++++++++++++++++++++++++++++++++ crypto/crypto_engine.c | 122 +++++++++++++++++++++++++++++++++++++- drivers/crypto/omap-aes.c | 2 +- drivers/md/dm-crypt.c | 1 + include/crypto/ablk_helper.h | 3 + include/crypto/algapi.h | 23 ++++++- include/linux/crypto.h | 5 ++ include/linux/scatterlist.h | 33 +++++++++++ lib/scatterlist.c | 69 +++++++++++++++++++++ 10 files changed, 389 insertions(+), 5 deletions(-) -- 1.7.9.5 ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 1/4] scatterlist: Introduce some helper functions 2016-03-15 7:47 [PATCH v2 0/4] Introduce bulk mode for crypto engine framework Baolin Wang @ 2016-03-15 7:47 ` Baolin Wang 2016-04-02 15:00 ` Robert Jarzmik 2016-03-15 7:48 ` [PATCH v2 2/4] crypto: Introduce some helper functions to help to merge requests Baolin Wang ` (3 subsequent siblings) 4 siblings, 1 reply; 27+ messages in thread From: Baolin Wang @ 2016-03-15 7:47 UTC (permalink / raw) To: herbert, davem, agk, snitzer, axboe, dm-devel Cc: akpm, david.s.gordon, thomas.lendacky, robert.jarzmik, yamada.masahiro, smueller, tadeusz.struk, standby24x7, shli, broonie, linus.walleij, arnd, baolin.wang, linux-kernel, linux-crypto, linux-raid In crypto engine framework, one request can combine (copy) other requests' scatterlists into its dynamic sg table to manage them together as a bulk block , which can improve engine efficency with handling bulk block. Thus we need some helper functions to manage dynamic scattertables. This patch introduces 'sg_is_contiguous()' function to check if two scatterlists are contiguous, 'sg_alloc_empty_table()' function to allocate one empty dynamic sg table, 'sg_add_sg_to_table()' function to copy one mapped scatterlist into sg table and 'sg_table_is_empty' function to check if the sg table is empty. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> --- include/linux/scatterlist.h | 33 +++++++++++++++++++++ lib/scatterlist.c | 69 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+) diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 556ec1e..c1ed9f4 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -212,6 +212,20 @@ static inline void sg_unmark_end(struct scatterlist *sg) } /** + * sg_table_is_empty - Check if the sg table is empty + * @sgt: sg table + * + * Description: + * The 'orig_nents' member of one sg table is used to indicate how many + * scatterlists in the sg table. 
+ * + **/ +static inline bool sg_table_is_empty(struct sg_table *sgt) +{ + return !sgt->orig_nents; +} + +/** * sg_phys - Return physical address of an sg entry * @sg: SG entry * @@ -241,6 +255,23 @@ static inline void *sg_virt(struct scatterlist *sg) return page_address(sg_page(sg)) + sg->offset; } +/** + * sg_is_contiguous - Check if the scatterlists are contiguous + * @sga: SG entry + * @sgb: SG entry + * + * Description: + * If the sga scatterlist is contiguous with the sgb scatterlist, + * that means they can be merged together. + * + **/ +static inline bool sg_is_contiguous(struct scatterlist *sga, + struct scatterlist *sgb) +{ + return *(unsigned long *)sg_virt(sga) + sga->length == + *(unsigned long *)sg_virt(sgb); +} + int sg_nents(struct scatterlist *sg); int sg_nents_for_len(struct scatterlist *sg, u64 len); struct scatterlist *sg_next(struct scatterlist *); @@ -261,6 +292,8 @@ void sg_free_table(struct sg_table *); int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int, struct scatterlist *, gfp_t, sg_alloc_fn *); int sg_alloc_table(struct sg_table *, unsigned int, gfp_t); +int sg_alloc_empty_table(struct sg_table *, unsigned int, gfp_t); +int sg_add_sg_to_table(struct sg_table *, struct scatterlist *); int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages, unsigned int n_pages, unsigned long offset, unsigned long size, diff --git a/lib/scatterlist.c b/lib/scatterlist.c index 004fc70..6d3f3b0 100644 --- a/lib/scatterlist.c +++ b/lib/scatterlist.c @@ -370,6 +370,75 @@ int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask) EXPORT_SYMBOL(sg_alloc_table); /** + * sg_add_sg_to_table - Add one scatterlist into sg table + * @sgt: The sg table header to use + * @src: The sg need to be added into sg table + * + * Description: + * The 'nents' member indicates how many mapped scatterlists has been added + * in the dynamic sg table. The 'orig_nents' member indicates the size of the + * dynamic sg table. 
+ *
+ * Copy one mapped @src@ scatterlist into the dynamic sg table and increase
+ * the 'nents' member.
+ *
+ **/
+int sg_add_sg_to_table(struct sg_table *sgt, struct scatterlist *src)
+{
+	unsigned int i = 0, orig_nents = sgt->orig_nents;
+	struct scatterlist *sgl = sgt->sgl;
+	struct scatterlist *sg;
+
+	/* Check if there is enough space for the new sg to be added */
+	if (sgt->nents >= sgt->orig_nents)
+		return -EINVAL;
+
+	for_each_sg(sgl, sg, orig_nents, i) {
+		if (sgt->nents > 0 && i == (sgt->nents - 1)) {
+			sg_unmark_end(sg);
+		} else if (i == sgt->nents) {
+			memcpy(sg, src, sizeof(struct scatterlist));
+			sg_mark_end(sg);
+			sgt->nents++;
+			break;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * sg_alloc_empty_table - Allocate one empty dynamic sg table
+ * @sgt:	The sg table header to use
+ * @nents:	Number of entries in sg list
+ * @gfp_mask:	GFP allocation mask
+ *
+ * Description:
+ *   Allocate and initialize one dynamic sg table. A dynamic sg table is one
+ *   that allocates @nents@ empty scatterlist entries, into which other
+ *   mapped scatterlists can later be copied.
+ *
+ *   The 'nents' member indicates how many scatterlists have been copied
+ *   into the dynamic sg table. It is initialized to 0, meaning no mapped
+ *   scatterlists have been added to this sg table yet.
+ *
+ *   The 'orig_nents' member indicates the size of the dynamic sg table.
+ *
+ **/
+int sg_alloc_empty_table(struct sg_table *sgt, unsigned int nents,
+			 gfp_t gfp_mask)
+{
+	int ret;
+
+	ret = sg_alloc_table(sgt, nents, gfp_mask);
+	if (ret)
+		return ret;
+
+	sgt->nents = 0;
+	return 0;
+}
+
+/**
 * sg_alloc_table_from_pages - Allocate and initialize an sg table from
 *			       an array of pages
 * @sgt:	The sg table header to use

-- 
1.7.9.5
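[Editor's note] The bookkeeping proposed above can be followed outside the kernel tree with a small userspace model. Everything below uses hypothetical `toy_*` names and simplified structs, not the real `struct scatterlist` / `sg_table` API: `nents` counts entries filled so far, `orig_nents` is the capacity, and appending an entry moves the end marker, mirroring `sg_unmark_end()` / `sg_mark_end()` in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for struct scatterlist (hypothetical, userspace only). */
struct toy_sg {
	void *addr;          /* stands in for page_link + offset */
	unsigned int length;
	bool is_end;         /* stands in for the 0x02 end marker */
};

/* Toy stand-in for struct sg_table. */
struct toy_sg_table {
	struct toy_sg *sgl;
	unsigned int nents;      /* entries filled so far */
	unsigned int orig_nents; /* capacity of the table */
};

/* Mirrors sg_alloc_empty_table(): capacity orig_nents, zero filled. */
static int toy_alloc_empty_table(struct toy_sg_table *sgt, unsigned int nents)
{
	sgt->sgl = calloc(nents, sizeof(*sgt->sgl));
	if (!sgt->sgl)
		return -1;
	sgt->nents = 0;
	sgt->orig_nents = nents;
	return 0;
}

/* Mirrors sg_add_sg_to_table(): append one entry, move the end mark. */
static int toy_add_sg_to_table(struct toy_sg_table *sgt, const struct toy_sg *src)
{
	if (sgt->nents >= sgt->orig_nents)
		return -1;                        /* table full, like -EINVAL */
	if (sgt->nents > 0)
		sgt->sgl[sgt->nents - 1].is_end = false;  /* sg_unmark_end() */
	sgt->sgl[sgt->nents] = *src;                      /* memcpy of the sg */
	sgt->sgl[sgt->nents].is_end = true;               /* sg_mark_end()    */
	sgt->nents++;
	return 0;
}
```

This only models the success path of the patch; the review below questions whether indexing by `nents` is safe once DMA mapping has coalesced entries.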
* Re: [PATCH v2 1/4] scatterlist: Introduce some helper functions 2016-03-15 7:47 ` [PATCH v2 1/4] scatterlist: Introduce some helper functions Baolin Wang @ 2016-04-02 15:00 ` Robert Jarzmik 2016-04-05 7:10 ` Baolin Wang 0 siblings, 1 reply; 27+ messages in thread From: Robert Jarzmik @ 2016-04-02 15:00 UTC (permalink / raw) To: Baolin Wang Cc: herbert, davem, agk, snitzer, axboe, dm-devel, akpm, david.s.gordon, thomas.lendacky, yamada.masahiro, smueller, tadeusz.struk, standby24x7, shli, broonie, linus.walleij, arnd, linux-kernel, linux-crypto, linux-raid Baolin Wang <baolin.wang@linaro.org> writes: > +/** > + * sg_is_contiguous - Check if the scatterlists are contiguous > + * @sga: SG entry > + * @sgb: SG entry > + * > + * Description: > + * If the sga scatterlist is contiguous with the sgb scatterlist, > + * that means they can be merged together. > + * > + **/ > +static inline bool sg_is_contiguous(struct scatterlist *sga, > + struct scatterlist *sgb) > +{ > + return *(unsigned long *)sg_virt(sga) + sga->length == > + *(unsigned long *)sg_virt(sgb); As I already said, I don't like casts. But let's take some height : you're needing this function to decide to merge scatterlists. That means that you think the probability of having 2 scatterlist mergeable is not meaningless, ie. 50% or more. I suppose your scatterlists are allocated through kmalloc(). I'd like to know, through your testing, what is the success rate of sg_is_contiguous(), ie. I'd like to know how many times sg_is_contiguous() was called, and amongst these calls how many times it returned true. That will tell me how "worth" is this optimization. > + * sg_add_sg_to_table - Add one scatterlist into sg table > + * @sgt: The sg table header to use > + * @src: The sg need to be added into sg table > + * > + * Description: > + * The 'nents' member indicates how many mapped scatterlists has been added > + * in the dynamic sg table. The 'orig_nents' member indicates the size of the > + * dynamic sg table. 
> + * > + * Copy one mapped @src@ scatterlist into the dynamic sg table and increase > + * 'nents' member. > + * > + **/ Okay, I still believe this one is wrong, because we don't understand each other. So let's take an example : sg_table = { .sgl = { { .page_link = PAGE_48, .offset = 0, .length = 2048, .dma_address = 0x30000, .dma_length = 4096, }, { .page_link = PAGE_48 | 0x02, .offset = 2048, .length = 2048, .dma_address = 0, .dma_length = 0, }, }, .nents = 1, .orig_nents = 2, }; In this example, by sheer luck the 2 scatterlist entries were physically contiguous, and the mapping function coallesced the 2 entries into only one (dma_address, dma_length) entry. That could also happen with an IOMMU by the way. Therefore, sg_table.nents = 1. If I understand your function correctly, it will erase sg_table.sgl[1], and that looks incorrect to me. This is why I can't understand how your code can be correct, and why I say you add a new "meaning" to sg_table->nents, which is not consistent with the meaning I understand. Cheers. -- Robert ^ permalink raw reply [flat|nested] 27+ messages in thread
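[Editor's note] Robert's objection can be made concrete in a userspace sketch (hypothetical `toy_*` / `naive_*` names, not kernel code): after DMA mapping coalesces two physically contiguous entries, `nents` counts DMA segments (1) while `orig_nents` still counts CPU-side entries (2), so an append keyed on index `nents` lands on `sgl[1]`, which still holds live CPU-side data.

```c
#include <assert.h>

/* Toy scatterlist carrying both CPU-side and DMA-side fields,
 * as in Robert's example (hypothetical, userspace only). */
struct toy_sg {
	unsigned long page;
	unsigned int offset, length;
	unsigned long dma_address;
	unsigned int dma_length;
};

struct toy_sg_table {
	struct toy_sg sgl[4];
	unsigned int nents;      /* DMA segments after mapping      */
	unsigned int orig_nents; /* CPU-side entries before mapping */
};

/* The patch's append logic, reduced to its core: write at index nents. */
static void naive_append(struct toy_sg_table *sgt, const struct toy_sg *src)
{
	sgt->sgl[sgt->nents] = *src; /* clobbers whatever lives there */
	sgt->nents++;
}
```

Running the appended scenario shows the second CPU-side entry being overwritten, which is exactly the inconsistency Robert describes.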
* Re: [PATCH v2 1/4] scatterlist: Introduce some helper functions 2016-04-02 15:00 ` Robert Jarzmik @ 2016-04-05 7:10 ` Baolin Wang 2016-04-20 7:34 ` Baolin Wang 0 siblings, 1 reply; 27+ messages in thread From: Baolin Wang @ 2016-04-05 7:10 UTC (permalink / raw) To: Robert Jarzmik Cc: Herbert Xu, David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid Hi Robert, Sorry for the late reply. On 2 April 2016 at 23:00, Robert Jarzmik <robert.jarzmik@free.fr> wrote: > Baolin Wang <baolin.wang@linaro.org> writes: > >> +/** >> + * sg_is_contiguous - Check if the scatterlists are contiguous >> + * @sga: SG entry >> + * @sgb: SG entry >> + * >> + * Description: >> + * If the sga scatterlist is contiguous with the sgb scatterlist, >> + * that means they can be merged together. >> + * >> + **/ >> +static inline bool sg_is_contiguous(struct scatterlist *sga, >> + struct scatterlist *sgb) >> +{ >> + return *(unsigned long *)sg_virt(sga) + sga->length == >> + *(unsigned long *)sg_virt(sgb); > As I already said, I don't like casts. OK. Could you give me a good example. Thanks. > > But let's take some height : you're needing this function to decide to merge > scatterlists. That means that you think the probability of having 2 scatterlist > mergeable is not meaningless, ie. 50% or more. > > I suppose your scatterlists are allocated through kmalloc(). I'd like to know, > through your testing, what is the success rate of sg_is_contiguous(), ie. I'd > like to know how many times sg_is_contiguous() was called, and amongst these > calls how many times it returned true. > > That will tell me how "worth" is this optimization. I think this is just one useful API for users, If the rate is only 1%, we also need to check if they are contiguous to decide if they can be coalesced. 
> >> + * sg_add_sg_to_table - Add one scatterlist into sg table >> + * @sgt: The sg table header to use >> + * @src: The sg need to be added into sg table >> + * >> + * Description: >> + * The 'nents' member indicates how many mapped scatterlists has been added >> + * in the dynamic sg table. The 'orig_nents' member indicates the size of the >> + * dynamic sg table. >> + * >> + * Copy one mapped @src@ scatterlist into the dynamic sg table and increase >> + * 'nents' member. >> + * >> + **/ > Okay, I still believe this one is wrong, because we don't understand each other. > So let's take an example : > sg_table = { > .sgl = { > { > .page_link = PAGE_48, > .offset = 0, > .length = 2048, > .dma_address = 0x30000, > .dma_length = 4096, > }, > { > .page_link = PAGE_48 | 0x02, > .offset = 2048, > .length = 2048, > .dma_address = 0, > .dma_length = 0, > }, > }, > .nents = 1, > .orig_nents = 2, > }; > > In this example, by sheer luck the 2 scatterlist entries were physically > contiguous, and the mapping function coallesced the 2 entries into only one > (dma_address, dma_length) entry. That could also happen with an IOMMU by the > way. Therefore, sg_table.nents = 1. > > If I understand your function correctly, it will erase sg_table.sgl[1], and that > looks incorrect to me. This is why I can't understand how your code can be > correct, and why I say you add a new "meaning" to sg_table->nents, which is not > consistent with the meaning I understand. I think you misunderstood my point. The 'sg_add_sg_to_table()' function is used to add one mapped scatterlist into the dynamic sg table. For example: 1. 
How to add one mapped sg into dynamic sg table (1) If we allocate one dynamic sg table with 3 (small size can be showed easily) empty scatterlists.: sg_table = { .sgl = { { .page_link = 0, .offset = 0, .length = 0, }, { .page_link = 0, .offset = 0, .length = 0, }, { .page_link = 0 | 0x02, .offset = 0, .length = 0, }, }, .nents = 0, .orig_nents = 3, }; (2) If some module (like dm-crypt module) always sends one mapped scatterlist (size is always 512bytes) to another module (crypto driver) to handle the mapped scatterlist at one time. But we want to collect the mapped scatterlist into one dynamic sg table to manage them together, means we can send multiple mapped scatterlists (from sg_table->sgl) to the crypto driver to handle them together to improve its efficiency. So we add one mapped scatterlists into the dynamic sg table. mapped sg = { .page_link = PAGE_20, .offset = 0, .length = 512, }, Add into synamic sg table -------> sg_table = { .sgl = { { .page_link = PAGE_20 | 0x02, .offset = 0, .length = 512, }, { .page_link = 0, .offset = 0, .length = 0, }, { .page_link = 0, .offset = 0, .length = 0, }, }, .nents = 1, .orig_nents = 3, }; Another two mapped scatterlists are added into synamic sg table -------> sg_table = { .sgl = { { .page_link = PAGE_20, .offset = 0, .length = 512, }, { .page_link = PAGE_25, .offset = 0, .length = 512, }, { .page_link = PAGE_28 | 0x20, .offset = 0, .length = 512, }, }, .nents = 3, .orig_nents = 3, }; Then we can send the dynamic sg table to the crypto engine driver to handle them together at one time. (If the dynamic sg table size is 512, then the crypto engine driver can handle 512 scatterlists at one time, which can improve much efficiency). That's the reason why we want to introduce the dynamic sg table. 2. 
How to coalesce
(1) If one mapped scatterlist has already been added into the dynamic sg table:
sg_table = {
	.sgl = {
		{
			.page_link = PAGE_20 | 0x02,
			.offset = 0,
			.length = 512,
		},
		{
			.page_link = 0,
			.offset = 0,
			.length = 0,
		},
		{
			.page_link = 0,
			.offset = 0,
			.length = 0,
		},
	},
	.nents = 1,
	.orig_nents = 3,
};

(2) Another mapped scatterlist comes in:
mapped sg = {
	.page_link = PAGE_20,
	.offset = 512,
	.length = 512,
},

So we check whether the new mapped scatterlist can be coalesced into the
previous one in the dynamic sg table, like this:
sg_table = {
	.sgl = {
		{
			.page_link = PAGE_20 | 0x02,
			.offset = 0,
			.length = 1024,
		},
		{
			.page_link = 0,
			.offset = 0,
			.length = 0,
		},
		{
			.page_link = 0,
			.offset = 0,
			.length = 0,
		},
	},
	.nents = 1,
	.orig_nents = 3,
};

It's done. We coalesced one scatterlist into another one to expand the
scatterlist's length.
Thanks for your comments.

>
> Cheers.
>
> --
> Robert

--
Baolin.wang
Best Regards
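[Editor's note] The coalescing step in the walkthrough above boils down to an adjacency check plus a length extension. A minimal userspace sketch (hypothetical `toy_*` names; a plain `char *` stands in for `sg_virt()`, avoiding the cast Robert objects to) looks like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy scatterlist entry: a virtual address plus a length. */
struct toy_sg {
	char *addr;
	unsigned int length;
};

/* Two entries are "contiguous" when one ends exactly where the next
 * begins, mirroring sg_virt(sga) + sga->length == sg_virt(sgb). */
static bool toy_is_contiguous(const struct toy_sg *a, const struct toy_sg *b)
{
	return a->addr + a->length == b->addr;
}

/* Coalesce b into a when adjacent; returns true when merged. */
static bool toy_try_coalesce(struct toy_sg *a, const struct toy_sg *b)
{
	if (!toy_is_contiguous(a, b))
		return false;
	a->length += b->length; /* expand the existing entry */
	return true;
}
```

Note this models only virtual-address adjacency; whether that check is meaningful for DMA-mapped entries is precisely what the reviewers dispute.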
* Re: [PATCH v2 1/4] scatterlist: Introduce some helper functions 2016-04-05 7:10 ` Baolin Wang @ 2016-04-20 7:34 ` Baolin Wang 2016-04-20 8:38 ` Herbert Xu 0 siblings, 1 reply; 27+ messages in thread From: Baolin Wang @ 2016-04-20 7:34 UTC (permalink / raw) To: Robert Jarzmik Cc: Herbert Xu, David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid Hi Robert, On 5 April 2016 at 15:10, Baolin Wang <baolin.wang@linaro.org> wrote: > Hi Robert, > > Sorry for the late reply. > > On 2 April 2016 at 23:00, Robert Jarzmik <robert.jarzmik@free.fr> wrote: >> Baolin Wang <baolin.wang@linaro.org> writes: >> >>> +/** >>> + * sg_is_contiguous - Check if the scatterlists are contiguous >>> + * @sga: SG entry >>> + * @sgb: SG entry >>> + * >>> + * Description: >>> + * If the sga scatterlist is contiguous with the sgb scatterlist, >>> + * that means they can be merged together. >>> + * >>> + **/ >>> +static inline bool sg_is_contiguous(struct scatterlist *sga, >>> + struct scatterlist *sgb) >>> +{ >>> + return *(unsigned long *)sg_virt(sga) + sga->length == >>> + *(unsigned long *)sg_virt(sgb); >> As I already said, I don't like casts. > > OK. Could you give me a good example. Thanks. > >> >> But let's take some height : you're needing this function to decide to merge >> scatterlists. That means that you think the probability of having 2 scatterlist >> mergeable is not meaningless, ie. 50% or more. >> >> I suppose your scatterlists are allocated through kmalloc(). I'd like to know, >> through your testing, what is the success rate of sg_is_contiguous(), ie. I'd >> like to know how many times sg_is_contiguous() was called, and amongst these >> calls how many times it returned true. >> >> That will tell me how "worth" is this optimization. 
> > I think this is just one useful API for users, If the rate is only 1%, > we also need to check if they are contiguous to decide if they can be > coalesced. > >> >>> + * sg_add_sg_to_table - Add one scatterlist into sg table >>> + * @sgt: The sg table header to use >>> + * @src: The sg need to be added into sg table >>> + * >>> + * Description: >>> + * The 'nents' member indicates how many mapped scatterlists has been added >>> + * in the dynamic sg table. The 'orig_nents' member indicates the size of the >>> + * dynamic sg table. >>> + * >>> + * Copy one mapped @src@ scatterlist into the dynamic sg table and increase >>> + * 'nents' member. >>> + * >>> + **/ >> Okay, I still believe this one is wrong, because we don't understand each other. >> So let's take an example : >> sg_table = { >> .sgl = { >> { >> .page_link = PAGE_48, >> .offset = 0, >> .length = 2048, >> .dma_address = 0x30000, >> .dma_length = 4096, >> }, >> { >> .page_link = PAGE_48 | 0x02, >> .offset = 2048, >> .length = 2048, >> .dma_address = 0, >> .dma_length = 0, >> }, >> }, >> .nents = 1, >> .orig_nents = 2, >> }; >> >> In this example, by sheer luck the 2 scatterlist entries were physically >> contiguous, and the mapping function coallesced the 2 entries into only one >> (dma_address, dma_length) entry. That could also happen with an IOMMU by the >> way. Therefore, sg_table.nents = 1. >> >> If I understand your function correctly, it will erase sg_table.sgl[1], and that >> looks incorrect to me. This is why I can't understand how your code can be >> correct, and why I say you add a new "meaning" to sg_table->nents, which is not >> consistent with the meaning I understand. > > I think you misunderstood my point. The 'sg_add_sg_to_table()' > function is used to add one mapped scatterlist into the dynamic sg > table. For example: > 1. 
How to add one mapped sg into dynamic sg table > (1) If we allocate one dynamic sg table with 3 (small size can be > showed easily) empty scatterlists.: > sg_table = { > .sgl = { > { > .page_link = 0, > .offset = 0, > .length = 0, > }, > { > .page_link = 0, > .offset = 0, > .length = 0, > }, > { > .page_link = 0 | 0x02, > .offset = 0, > .length = 0, > }, > }, > .nents = 0, > .orig_nents = 3, > }; > > (2) If some module (like dm-crypt module) always sends one mapped > scatterlist (size is always 512bytes) to another module (crypto > driver) to handle the mapped scatterlist at one time. But we want to > collect the mapped scatterlist into one dynamic sg table to manage > them together, means we can send multiple mapped scatterlists (from > sg_table->sgl) to the crypto driver to handle them together to improve > its efficiency. So we add one mapped scatterlists into the dynamic sg > table. > mapped sg = { > .page_link = PAGE_20, > .offset = 0, > .length = 512, > }, > > Add into synamic sg table -------> > sg_table = { > .sgl = { > { > .page_link = PAGE_20 | 0x02, > .offset = 0, > .length = 512, > }, > { > .page_link = 0, > .offset = 0, > .length = 0, > }, > { > .page_link = 0, > .offset = 0, > .length = 0, > }, > }, > .nents = 1, > .orig_nents = 3, > }; > > Another two mapped scatterlists are added into synamic sg table -------> > sg_table = { > .sgl = { > { > .page_link = PAGE_20, > .offset = 0, > .length = 512, > }, > { > .page_link = PAGE_25, > .offset = 0, > .length = 512, > }, > { > .page_link = PAGE_28 | 0x20, > .offset = 0, > .length = 512, > }, > }, > .nents = 3, > .orig_nents = 3, > }; > > Then we can send the dynamic sg table to the crypto engine driver to > handle them together at one time. (If the dynamic sg table size is > 512, then the crypto engine driver can handle 512 scatterlists at one > time, which can improve much efficiency). That's the reason why we > want to introduce the dynamic sg table. > > 2. 
How to coalesce > (1) If one mapped scatterlsit already has been added into dynamic sg table: > sg_table = { > .sgl = { > { > .page_link = PAGE_20 | 0x02, > .offset = 0, > .length = 512, > }, > { > .page_link = 0, > .offset = 0, > .length = 0, > }, > { > .page_link = 0, > .offset = 0, > .length = 0, > }, > }, > .nents = 1, > .orig_nents = 3, > }; > > (2) Another mapped scatterlist comes. > mapped sg = { > .page_link = PAGE_20, > .offset = 512, > .length = 512, > }, > > So we check the new mapped scatterlist can be coalesced into previous > one in dynamic sg table like this: > sg_table = { > .sgl = { > { > .page_link = PAGE_20 | 0x02, > .offset = 0, > .length = 1024, > }, > { > .page_link = 0, > .offset = 0, > .length = 0, > }, > { > .page_link = 0, > .offset = 0, > .length = 0, > }, > }, > .nents = 1, > .orig_nents = 3, > }; > > It's done. We coalesced one scatterlist into antoher one to expand the > scatterlist's length. > Thanks for your comments. > It seems there are no more comments about this patchset? We really want to these APIs to coalesce scatterlists to improve the crypto engine efficiency. Thanks. -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/4] scatterlist: Introduce some helper functions 2016-04-20 7:34 ` Baolin Wang @ 2016-04-20 8:38 ` Herbert Xu 0 siblings, 0 replies; 27+ messages in thread From: Herbert Xu @ 2016-04-20 8:38 UTC (permalink / raw) To: Baolin Wang Cc: Robert Jarzmik, David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On Wed, Apr 20, 2016 at 03:34:47PM +0800, Baolin Wang wrote: > It seems there are no more comments about this patchset? We really > want to these APIs to coalesce scatterlists to improve the crypto > engine efficiency. Thanks. Your patch-set makes no sense. If the data can be coalesced then it shouldn't have been split up in the first place. I'm nacking the whole series. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 2/4] crypto: Introduce some helper functions to help to merge requests
  2016-03-15  7:47 [PATCH v2 0/4] Introduce bulk mode for crypto engine framework Baolin Wang
  2016-03-15  7:47 ` [PATCH v2 1/4] scatterlist: Introduce some helper functions Baolin Wang
@ 2016-03-15  7:48 ` Baolin Wang
  2016-03-15  7:48 ` [PATCH v2 3/4] crypto: Introduce the bulk mode for crypto engine framework Baolin Wang
  ` (2 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Baolin Wang @ 2016-03-15  7:48 UTC (permalink / raw)
To: herbert, davem, agk, snitzer, axboe, dm-devel
Cc: akpm, david.s.gordon, thomas.lendacky, robert.jarzmik,
    yamada.masahiro, smueller, tadeusz.struk, standby24x7, shli,
    broonie, linus.walleij, arnd, baolin.wang, linux-kernel,
    linux-crypto, linux-raid

Usually the dm-crypt subsystem sends encryption/decryption requests to the
crypto layer one block at a time, making each request 512 bytes long, which
is far smaller than what a hardware engine needs to reach its best
performance.

Some cipher hardware engines prefer to handle bulk blocks rather than the
one sector (512 bytes) created by dm-crypt, because these cipher engines
can handle the intermediate values (IV) by themselves within one bulk
block. This means we can increase the size of a request by merging
requests, rather than always sending 512 bytes, and thus increase the
hardware engine's processing speed.

This patch introduces some helper functions to help merge requests to
improve hardware engine efficiency.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org> --- crypto/ablk_helper.c | 135 ++++++++++++++++++++++++++++++++++++++++++ include/crypto/ablk_helper.h | 3 + include/linux/crypto.h | 5 ++ 3 files changed, 143 insertions(+) diff --git a/crypto/ablk_helper.c b/crypto/ablk_helper.c index e1fcf53..3cf15cb 100644 --- a/crypto/ablk_helper.c +++ b/crypto/ablk_helper.c @@ -26,6 +26,7 @@ #include <linux/kernel.h> #include <linux/crypto.h> +#include <linux/device-mapper.h> #include <linux/init.h> #include <linux/module.h> #include <linux/hardirq.h> @@ -34,6 +35,140 @@ #include <crypto/ablk_helper.h> #include <asm/simd.h> +/** + * ablk_link_request_if_contigous - try to link one request into previous one + * if the page address is contiguous. + * @list_req: the request from queue list + * @req: the new request need to be merged + * + * If the listed request and new request's pages of the scatterlists are + * contiguous, then merge the scatterlists of new request into the listed one. + * + * Return true on success, others means failed. + */ +static bool ablk_link_request_if_contigous(struct ablkcipher_request *list_req, + struct ablkcipher_request *req) +{ + struct scatterlist *last_src_sg = + sg_last(list_req->sgt_src.sgl, list_req->sgt_src.nents); + struct scatterlist *last_dst_sg = + sg_last(list_req->sgt_dst.sgl, list_req->sgt_dst.nents); + struct scatterlist *req_src_sg = req->src; + struct scatterlist *req_dst_sg = req->dst; + + if (!last_src_sg || !last_dst_sg) + return false; + + /* Check if the src/dst scatterlists are contiguous */ + if (!sg_is_contiguous(last_src_sg, req_src_sg) || + !sg_is_contiguous(last_dst_sg, req_dst_sg)) + return false; + + /* + * If the request can be merged into the listed request after the + * checking, then expand the listed request scatterlists' length. 
+ */ + last_src_sg->length += req_src_sg->length; + last_dst_sg->length += req_dst_sg->length; + list_req->nbytes += req->nbytes; + + return true; +} + +/** + * ablk_merge_request - try to merge one request into previous one + * @list_req: the request from queue list + * @req: the request need to be merged + * + * This function will create a dynamic scatterlist table for both source + * and destination if the request is the first coming in. + * + * Return true on success, others means failed. + */ +static bool ablk_merge_request(struct ablkcipher_request *list_req, + struct ablkcipher_request *req) +{ + struct sg_table *sgt_src = &list_req->sgt_src; + struct sg_table *sgt_dst = &list_req->sgt_dst; + unsigned int nents = SG_MAX_SINGLE_ALLOC; + + if (sg_table_is_empty(sgt_src)) { + if (sg_alloc_empty_table(sgt_src, nents, GFP_ATOMIC)) + return false; + + if (sg_add_sg_to_table(sgt_src, list_req->src)) + return false; + } + + if (sg_table_is_empty(sgt_dst)) { + if (sg_alloc_empty_table(sgt_dst, nents, GFP_ATOMIC)) + return false; + + if (sg_add_sg_to_table(sgt_dst, list_req->dst)) + return false; + } + + /* + * Check if the new request is contiguous for the listed request, + * if it is contiguous then merge the new request into the listed one. + */ + if (ablk_link_request_if_contigous(list_req, req)) + return true; + + if (sg_add_sg_to_table(sgt_src, req->src)) + return false; + + if (sg_add_sg_to_table(sgt_dst, req->dst)) + return false; + + list_req->nbytes += req->nbytes; + return true; +} + +/** + * ablk_try_merge - try to merge one request into previous one + * @queue: the crypto queue list + * @req: the request need to be merged + * + * Note: The merging action should be under the spinlock or mutex protection. + * + * Return 0 on success and others are failed. 
+ */
+int ablk_try_merge(struct crypto_queue *queue,
+		   struct ablkcipher_request *req)
+{
+	struct ablkcipher_request *list_req;
+	struct crypto_async_request *async_req;
+
+	list_for_each_entry(async_req, &queue->list, list) {
+		list_req = ablkcipher_request_cast(async_req);
+
+		if (list_req->base.flags != req->base.flags)
+			continue;
+
+		/* Check that the request adds up to an even number of sectors */
+		if (!IS_ALIGNED(list_req->nbytes, (1U << SECTOR_SHIFT)))
+			continue;
+
+		if (list_req->nbytes + req->nbytes > UINT_MAX)
+			continue;
+
+		/*
+		 * We first check that the sectors are adjacent so we don't
+		 * mistakenly coalesce something that is contiguous in memory
+		 * but not contiguous on disk.
+		 */
+		if (list_req->sector + list_req->nbytes /
+		    (1U << SECTOR_SHIFT) == req->sector) {
+			if (ablk_merge_request(list_req, req))
+				return 0;
+		}
+	}
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(ablk_try_merge);
+
 int ablk_set_key(struct crypto_ablkcipher *tfm, const u8 *key,
		 unsigned int key_len)
 {
diff --git a/include/crypto/ablk_helper.h b/include/crypto/ablk_helper.h
index 4f93df5..12ae00d 100644
--- a/include/crypto/ablk_helper.h
+++ b/include/crypto/ablk_helper.h
@@ -8,6 +8,7 @@
 #include <linux/crypto.h>
 #include <linux/kernel.h>
 #include <crypto/cryptd.h>
+#include <crypto/algapi.h>

 struct async_helper_ctx {
	struct cryptd_ablkcipher *cryptd_tfm;
@@ -28,4 +29,6 @@ extern int ablk_init_common(struct crypto_tfm *tfm, const char *drv_name);
 extern int ablk_init(struct crypto_tfm *tfm);

+extern int ablk_try_merge(struct crypto_queue *queue,
+			  struct ablkcipher_request *req);
 #endif /* _CRYPTO_ABLK_HELPER_H */
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index e71cb70..f878bb1 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -21,6 +21,7 @@
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/bug.h>
+#include <linux/scatterlist.h>
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/uaccess.h>
@@ -170,6 +171,10 @@ struct ablkcipher_request {
	struct scatterlist *src;
	struct scatterlist *dst;

+	struct sg_table sgt_src;
+	struct sg_table sgt_dst;
+	sector_t sector;
+
	void *__ctx[] CRYPTO_MINALIGN_ATTR;
};

-- 
1.7.9.5
* [PATCH v2 3/4] crypto: Introduce the bulk mode for crypto engine framework 2016-03-15 7:47 [PATCH v2 0/4] Introduce bulk mode for crypto engine framework Baolin Wang 2016-03-15 7:47 ` [PATCH v2 1/4] scatterlist: Introduce some helper functions Baolin Wang 2016-03-15 7:48 ` [PATCH v2 2/4] crypto: Introduce some helper functions to help to merge requests Baolin Wang @ 2016-03-15 7:48 ` Baolin Wang 2016-03-15 7:48 ` [PATCH v2 4/4] md: dm-crypt: Initialize the sector number for one request Baolin Wang 2016-04-15 13:48 ` [PATCH v2 0/4] Introduce bulk mode for crypto engine framework Herbert Xu 4 siblings, 0 replies; 27+ messages in thread From: Baolin Wang @ 2016-03-15 7:48 UTC (permalink / raw) To: herbert, davem, agk, snitzer, axboe, dm-devel Cc: akpm, david.s.gordon, thomas.lendacky, robert.jarzmik, yamada.masahiro, smueller, tadeusz.struk, standby24x7, shli, broonie, linus.walleij, arnd, baolin.wang, linux-kernel, linux-crypto, linux-raid Now some cipher hardware engines prefer to handle bulk block by merging requests to increase the block size and thus increase the hardware engine processing speed. This patch introduces request bulk mode to help the crypto hardware drivers improve in efficiency, and chooses the suitable mode (SECTOR_MODE) for initializing omap aes engine. 
Signed-off-by: Baolin Wang <baolin.wang@linaro.org> --- crypto/Kconfig | 1 + crypto/crypto_engine.c | 122 +++++++++++++++++++++++++++++++++++++++++++-- drivers/crypto/omap-aes.c | 2 +- include/crypto/algapi.h | 23 ++++++++- 4 files changed, 143 insertions(+), 5 deletions(-) diff --git a/crypto/Kconfig b/crypto/Kconfig index c844227..6a2f9a6 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -229,6 +229,7 @@ config CRYPTO_GLUE_HELPER_X86 config CRYPTO_ENGINE tristate + select CRYPTO_ABLK_HELPER comment "Authenticated Encryption with Associated Data" diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c index a55c82d..0de5829 100644 --- a/crypto/crypto_engine.c +++ b/crypto/crypto_engine.c @@ -14,6 +14,7 @@ #include <linux/err.h> #include <linux/delay.h> +#include <crypto/ablk_helper.h> #include "internal.h" #define CRYPTO_ENGINE_MAX_QLEN 10 @@ -84,6 +85,17 @@ static void crypto_pump_requests(struct crypto_engine *engine, req = ablkcipher_request_cast(async_req); + /* + * If the engine supports bulk mode and the request has an sg table + * allocated to hold the expanded scatterlist entries, then point the + * request's scatterlists at the sg table. + */ + if (engine->mode == SECTOR_BULK_MODE && req->sgt_src.orig_nents && + req->sgt_dst.orig_nents) { + req->src = req->sgt_src.sgl; + req->dst = req->sgt_dst.sgl; + } + engine->cur_req = req; if (backlog) backlog->complete(backlog, -EINPROGRESS); @@ -137,9 +149,46 @@ static void crypto_pump_work(struct kthread_work *work) } /** + * crypto_merge_request_to_engine - try to merge one request into previous one + * @engine: the hardware engine + * @req: the request to be merged + * + * If the crypto engine supports bulk mode, then try to merge the new request + * into the listed one from engine queue to handle them together. + * + * Return 0 on success, non-zero on failure. 
+ */ +static bool crypto_merge_request_to_engine(struct crypto_engine *engine, + struct ablkcipher_request *req) +{ + /* + * The request is allocated from a memory pool in dm-crypt, so the + * sg tables need to be initialized here in case they hold random + * values. + */ + req->sgt_src.orig_nents = 0; + req->sgt_dst.orig_nents = 0; + + /* + * If the hardware engine can not support bulk mode encryption, + * just return 1 to indicate that merging failed. + */ + if (engine->mode != SECTOR_BULK_MODE) + return 1; + + return ablk_try_merge(&engine->queue, req); +} + +/** * crypto_transfer_request - transfer the new request into the engine queue * @engine: the hardware engine * @req: the request need to be listed into the engine queue + * + * First check whether the new request can be merged into the previous one + * (their sector numbers must be contiguous); if not, list it into the + * engine queue. + * + * If the new request can be merged into the previous request, then just + * finalize the new request. */ int crypto_transfer_request(struct crypto_engine *engine, struct ablkcipher_request *req, bool need_pump) @@ -154,6 +203,26 @@ int crypto_transfer_request(struct crypto_engine *engine, return -ESHUTDOWN; } + + /* + * Check whether the request can be merged into the previous request. + * If the hardware engine supports bulk block encryption, the new + * request can be merged into the previous one when their sector + * numbers are contiguous, so both can be handled together by the + * engine to improve the encryption efficiency. + * Returning -EINPROGRESS means it has been merged into the previous + * request, so just complete this request. + */ + ret = crypto_merge_request_to_engine(engine, req); + if (!ret) { + spin_unlock_irqrestore(&engine->queue_lock, flags); + crypto_finalize_request(engine, req, 0); + return -EINPROGRESS; + } + + /* + * If the request can not be merged into the previous request, list it + * into the engine queue to be handled by the kthread worker. 
+ */ ret = ablkcipher_enqueue_request(&engine->queue, req); if (!engine->busy && need_pump) @@ -178,7 +247,8 @@ int crypto_transfer_request_to_engine(struct crypto_engine *engine, EXPORT_SYMBOL_GPL(crypto_transfer_request_to_engine); /** - * crypto_finalize_request - finalize one request if the request is done + * crypto_finalize_request - finalize one request if the request is done or + * merged into previous request * @engine: the hardware engine * @req: the request need to be finalized * @err: error number @@ -208,9 +278,18 @@ void crypto_finalize_request(struct crypto_engine *engine, spin_unlock_irqrestore(&engine->queue_lock, flags); } + sg_free_table(&req->sgt_src); + sg_free_table(&req->sgt_dst); req->base.complete(&req->base, err); - queue_kthread_work(&engine->kworker, &engine->pump_requests); + /* + * If the request was finalized by being merged into the previous + * request from the engine queue, there is no need to queue the kthread + * work: other requests from the same block may still need to be merged + * into the listed request, so just wait for further merging. + */ + if (finalize_cur_req) + queue_kthread_work(&engine->kworker, &engine->pump_requests); } EXPORT_SYMBOL_GPL(crypto_finalize_request); @@ -279,15 +358,45 @@ int crypto_engine_stop(struct crypto_engine *engine) EXPORT_SYMBOL_GPL(crypto_engine_stop); /** + * crypto_engine_change_mode - Change the mode for hardware engine + * @engine: the hardware engine + * @mode: engine mode to be set + * + * This function can change the hardware engine mode while the engine is running. + * Return 0 on success, a negative error code on failure. 
+ */ +int crypto_engine_change_mode(struct crypto_engine *engine, + enum engine_mode mode) +{ + unsigned long flags; + int ret; + + ret = crypto_engine_stop(engine); + if (ret) { + pr_warn("could not change engine mode now\n"); + return ret; + } + + spin_lock_irqsave(&engine->queue_lock, flags); + engine->mode = mode; + spin_unlock_irqrestore(&engine->queue_lock, flags); + + return crypto_engine_start(engine); +} +EXPORT_SYMBOL_GPL(crypto_engine_change_mode); + +/** * crypto_engine_alloc_init - allocate crypto hardware engine structure and * initialize it. * @dev: the device attached with one hardware engine + * @mode: crypto engine mode * @rt: whether this queue is set to run as a realtime task * * This must be called from context that can sleep. * Return: the crypto engine structure on success, else NULL. */ -struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt) +struct crypto_engine *crypto_engine_alloc_init(struct device *dev, + enum engine_mode mode, bool rt) { struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 }; struct crypto_engine *engine; @@ -299,6 +408,13 @@ struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt) if (!engine) return NULL; + /* + * If the hardware engine can handle the IV by itself, it needs only + * one initial IV for multiple requests from one block. So requests + * from one block can be merged into one request, which improves the + * hardware engine efficiency. 
+ */ + engine->mode = mode; engine->rt = rt; engine->running = false; engine->busy = false; diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c index d420ec7..946f11f 100644 --- a/drivers/crypto/omap-aes.c +++ b/drivers/crypto/omap-aes.c @@ -1230,7 +1230,7 @@ static int omap_aes_probe(struct platform_device *pdev) } /* Initialize crypto engine */ - dd->engine = crypto_engine_alloc_init(dev, 1); + dd->engine = crypto_engine_alloc_init(dev, SECTOR_MODE, 1); if (!dd->engine) goto err_algs; diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h index b09d43f..69fb43e 100644 --- a/include/crypto/algapi.h +++ b/include/crypto/algapi.h @@ -130,6 +130,22 @@ struct ablkcipher_walk { }; #define ENGINE_NAME_LEN 30 + +/* + * enum engine_mode - crypto engine mode + * @SECTOR_MODE: do encryption/decryption one request at a time (each + * request is one sector long); requests are not coalesced. + * @SECTOR_BULK_MODE: coalesce contiguous requests (each one sector long) + * into one bulk request, which can be + * handled by the crypto engine at one time. 
+ * @MAX_MODE: engine mode numbers + */ +enum engine_mode { + SECTOR_MODE, + SECTOR_BULK_MODE, + MAX_MODE, +}; + /* * struct crypto_engine - crypto hardware engine * @name: the engine name @@ -140,6 +156,7 @@ struct ablkcipher_walk { * @list: link with the global crypto engine list * @queue_lock: spinlock to syncronise access to request queue * @queue: the crypto queue of the engine + * @mode: crypto engine mode * @rt: whether this queue is set to run as a realtime task * @prepare_crypt_hardware: a request will soon arrive from the queue * so the subsystem requests the driver to prepare the hardware @@ -167,6 +184,7 @@ struct crypto_engine { spinlock_t queue_lock; struct crypto_queue queue; + enum engine_mode mode; bool rt; int (*prepare_crypt_hardware)(struct crypto_engine *engine); @@ -195,7 +213,10 @@ void crypto_finalize_request(struct crypto_engine *engine, struct ablkcipher_request *req, int err); int crypto_engine_start(struct crypto_engine *engine); int crypto_engine_stop(struct crypto_engine *engine); -struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt); +int crypto_engine_change_mode(struct crypto_engine *engine, + enum engine_mode mode); +struct crypto_engine *crypto_engine_alloc_init(struct device *dev, + enum engine_mode mode, bool rt); int crypto_engine_exit(struct crypto_engine *engine); extern const struct crypto_type crypto_ablkcipher_type; -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v2 4/4] md: dm-crypt: Initialize the sector number for one request 2016-03-15 7:47 [PATCH v2 0/4] Introduce bulk mode for crypto engine framework Baolin Wang ` (2 preceding siblings ...) 2016-03-15 7:48 ` [PATCH v2 3/4] crypto: Introduce the bulk mode for crypto engine framework Baolin Wang @ 2016-03-15 7:48 ` Baolin Wang 2016-04-15 13:48 ` [PATCH v2 0/4] Introduce bulk mode for crypto engine framework Herbert Xu 4 siblings, 0 replies; 27+ messages in thread From: Baolin Wang @ 2016-03-15 7:48 UTC (permalink / raw) To: herbert, davem, agk, snitzer, axboe, dm-devel Cc: akpm, david.s.gordon, thomas.lendacky, robert.jarzmik, yamada.masahiro, smueller, tadeusz.struk, standby24x7, shli, broonie, linus.walleij, arnd, baolin.wang, linux-kernel, linux-crypto, linux-raid If the crypto engine can support the bulk mode, that means the contiguous requests from one block can be merged into one request to be handled by crypto engine. If so, the crypto engine needs the sector number of each request to do the merging. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> --- drivers/md/dm-crypt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 3147c8d..9e2dbfd 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -866,6 +866,7 @@ static int crypt_convert_block(struct crypt_config *cc, return r; } + req->sector = ctx->cc_sector; ablkcipher_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out, 1 << SECTOR_SHIFT, iv); -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-03-15 7:47 [PATCH v2 0/4] Introduce bulk mode for crypto engine framework Baolin Wang ` (3 preceding siblings ...) 2016-03-15 7:48 ` [PATCH v2 4/4] md: dm-crypt: Initialize the sector number for one request Baolin Wang @ 2016-04-15 13:48 ` Herbert Xu 2016-04-18 5:31 ` Baolin Wang 4 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2016-04-15 13:48 UTC (permalink / raw) To: Baolin Wang Cc: davem, agk, snitzer, axboe, dm-devel, akpm, david.s.gordon, thomas.lendacky, robert.jarzmik, yamada.masahiro, smueller, tadeusz.struk, standby24x7, shli, broonie, linus.walleij, arnd, linux-kernel, linux-crypto, linux-raid On Tue, Mar 15, 2016 at 03:47:58PM +0800, Baolin Wang wrote: > Now some cipher hardware engines prefer to handle bulk block by merging requests > to increase the block size and thus increase the hardware engine processing speed. > > This patchset introduces request bulk mode to help the crypto hardware drivers > improve in efficiency. Could you please explain why this merging can't be done in dm-crypt instead? Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-15 13:48 ` [PATCH v2 0/4] Introduce bulk mode for crypto engine framework Herbert Xu @ 2016-04-18 5:31 ` Baolin Wang 2016-04-18 5:45 ` Herbert Xu 2016-04-18 13:36 ` Mike Snitzer 0 siblings, 2 replies; 27+ messages in thread From: Baolin Wang @ 2016-04-18 5:31 UTC (permalink / raw) To: Herbert Xu Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid Hi Herbert, On 15 April 2016 at 21:48, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Tue, Mar 15, 2016 at 03:47:58PM +0800, Baolin Wang wrote: >> Now some cipher hardware engines prefer to handle bulk block by merging requests >> to increase the block size and thus increase the hardware engine processing speed. >> >> This patchset introduces request bulk mode to help the crypto hardware drivers >> improve in efficiency. > > Could you please explain why this merging can't be done in dm-crypt > instead? We've tried to do this in dm-crypt, but it failed. The dm-crypt maintainer explained to me that I should optimize the driver, not add strange hw-dependent crypto modes to dm-crypt, this is not the first crypto accelerator that is just not suited for this kind of use. He thought if it can process batch of chunks of data each with own IV, then it can work with dm-crypt, but he thought such optimized code should be inside crypto API, not in dmcrypt. I think his suggestion is reasonable, so we introduce the crypto engine framework to factor out the common patterns for driving the queue of operations. Then it will be more reasonable to do the bulk mode optimization in crypto engine framework. Thanks. 
> > Thanks, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 5:31 ` Baolin Wang @ 2016-04-18 5:45 ` Herbert Xu 2016-04-18 6:02 ` Baolin Wang 2016-04-18 13:36 ` Mike Snitzer 1 sibling, 1 reply; 27+ messages in thread From: Herbert Xu @ 2016-04-18 5:45 UTC (permalink / raw) To: Baolin Wang Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On Mon, Apr 18, 2016 at 01:31:09PM +0800, Baolin Wang wrote: > > We've tried to do this in dm-crypt, but it failed. > The dm-crypt maintainer explained to me that I should optimize the > driver, not add strange hw-dependent crypto modes to dm-crypt, this is > not the first crypto accelerator that is just not suited for this kind > of use. > He thought if it can process batch of chunks of data each with own IV, > then it can work with dm-crypt, but he thought such optimized code > should be inside crypto API, not in dmcrypt. That's a completely bogus argument. The user always has more information available than the underlying API. So it is totally stupid to have the API try to extract information that the user could have provided in the first place. I'm not taking this patch-set. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 5:45 ` Herbert Xu @ 2016-04-18 6:02 ` Baolin Wang 2016-04-18 7:04 ` Herbert Xu 0 siblings, 1 reply; 27+ messages in thread From: Baolin Wang @ 2016-04-18 6:02 UTC (permalink / raw) To: Herbert Xu Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On 18 April 2016 at 13:45, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Apr 18, 2016 at 01:31:09PM +0800, Baolin Wang wrote: >> >> We've tried to do this in dm-crypt, but it failed. >> The dm-crypt maintainer explained to me that I should optimize the >> driver, not add strange hw-dependent crypto modes to dm-crypt, this is >> not the first crypto accelerator that is just not suited for this kind >> of use. >> He thought if it can process batch of chunks of data each with own IV, >> then it can work with dm-crypt, but he thought such optimized code >> should be inside crypto API, not in dmcrypt. > > That's a completely bogus argument. The user always has more > information available than the underlying API. So it is totally > stupid to have the API try to extract information that the user > could have provided in the first place. If the crypto hardware engine can support bulk data encryption/decryption, so the engine driver can select bulk mode to handle the requests. I think it is a totally driver things, not in dmcrypt. The dmcrypt can not get the hardware engine's attributes. > > I'm not taking this patch-set. > > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 6:02 ` Baolin Wang @ 2016-04-18 7:04 ` Herbert Xu 2016-04-18 7:21 ` Baolin Wang 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2016-04-18 7:04 UTC (permalink / raw) To: Baolin Wang Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On Mon, Apr 18, 2016 at 02:02:51PM +0800, Baolin Wang wrote: > > If the crypto hardware engine can support bulk data > encryption/decryption, so the engine driver can select bulk mode to > handle the requests. I think it is a totally driver things, not in > dmcrypt. The dmcrypt can not get the hardware engine's attributes. It has nothing to do with the hardware attributes. dm-crypt should be sending maximal requests in the first place. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 7:04 ` Herbert Xu @ 2016-04-18 7:21 ` Baolin Wang 2016-04-18 7:24 ` Herbert Xu 0 siblings, 1 reply; 27+ messages in thread From: Baolin Wang @ 2016-04-18 7:21 UTC (permalink / raw) To: Herbert Xu Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On 18 April 2016 at 15:04, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Apr 18, 2016 at 02:02:51PM +0800, Baolin Wang wrote: >> >> If the crypto hardware engine can support bulk data >> encryption/decryption, so the engine driver can select bulk mode to >> handle the requests. I think it is a totally driver things, not in >> dmcrypt. The dmcrypt can not get the hardware engine's attributes. > > It has nothing to do with the hardware attributes. dm-crypt should > be sending maximal requests in the first place. I don't think so, the dm-crypt can not send maximal requests at some situations. For example, the 'cbc(aes)' cipher, it must be handled sector by sector (IV is dependency for each sector), so the dm-crypt can not send maximal requests, but must sector by sector in this situation. > > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 7:21 ` Baolin Wang @ 2016-04-18 7:24 ` Herbert Xu 2016-04-18 7:58 ` Baolin Wang 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2016-04-18 7:24 UTC (permalink / raw) To: Baolin Wang Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On Mon, Apr 18, 2016 at 03:21:16PM +0800, Baolin Wang wrote: > > I don't think so, the dm-crypt can not send maximal requests at some > situations. For example, the 'cbc(aes)' cipher, it must be handled > sector by sector (IV is dependency for each sector), so the dm-crypt > can not send maximal requests, but must sector by sector in this > situation. If you can't merge them as requests, then how is the hardware going to benefit from batching them? Or are you talking about parallel processing similar to sha-mb? Even with batching we should be involving the user because only the user knows (if anyone does) whether more data will be forthcoming. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 7:24 ` Herbert Xu @ 2016-04-18 7:58 ` Baolin Wang 2016-04-18 8:04 ` Herbert Xu 0 siblings, 1 reply; 27+ messages in thread From: Baolin Wang @ 2016-04-18 7:58 UTC (permalink / raw) To: Herbert Xu Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On 18 April 2016 at 15:24, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Apr 18, 2016 at 03:21:16PM +0800, Baolin Wang wrote: >> >> I don't think so, the dm-crypt can not send maximal requests at some >> situations. For example, the 'cbc(aes)' cipher, it must be handled >> sector by sector (IV is dependency for each sector), so the dm-crypt >> can not send maximal requests, but must sector by sector in this >> situation. > > If you can't merge them as requests, then how is the hardware going > to benefit from batching them? Or are you talking about parallel That depends on the hardware engine. Some cipher hardware engines (like xts(aes) engine) can handle the intermediate values (IV) by themselves in one bulk block, which means we can increase the size of the request by merging request rather than always 512 bytes and thus increase the hardware engine processing speed. But for some other hardware engines (like cbc(aes) engine), they can not support bulk block, must sector by sector. So the engine drivers can select the suitable mode to do encryption/decryption. > processing similar to sha-mb? > > Even with batching we should be involving the user because only the > user knows (if anyone does) whether more data will be forthcoming. If this cipher engine can support bulk block encryption, the crypto engine framework can merge requests if they are eligible automatically. 
Don't need to worry about how many data will be forthcoming. > > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 7:58 ` Baolin Wang @ 2016-04-18 8:04 ` Herbert Xu 2016-04-18 8:14 ` Baolin Wang 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2016-04-18 8:04 UTC (permalink / raw) To: Baolin Wang Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On Mon, Apr 18, 2016 at 03:58:59PM +0800, Baolin Wang wrote: > > That depends on the hardware engine. Some cipher hardware engines > (like xts(aes) engine) can handle the intermediate values (IV) by > themselves in one bulk block, which means we can increase the size of > the request by merging request rather than always 512 bytes and thus > increase the hardware engine processing speed. But for some other > hardware engines (like cbc(aes) engine), they can not support bulk > block, must sector by sector. So the engine drivers can select the > suitable mode to do encryption/decryption. So what is this supposed to handle, xts or cbc? > > Even with batching we should be involving the user because only the > > user knows (if anyone does) whether more data will be forthcoming. > > If this cipher engine can support bulk block encryption, the crypto > engine framework can merge requests if they are eligible > automatically. Don't need to worry about how many data will be > forthcoming. Merging is simply wrong when the data is coming in as one piece and you've just artifically broken it up, only to merge it later. If the data can be merged then it should have stayed as one piece rather than being fragmented. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 8:04 ` Herbert Xu @ 2016-04-18 8:14 ` Baolin Wang 2016-04-18 8:17 ` Herbert Xu 0 siblings, 1 reply; 27+ messages in thread From: Baolin Wang @ 2016-04-18 8:14 UTC (permalink / raw) To: Herbert Xu Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On 18 April 2016 at 16:04, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Apr 18, 2016 at 03:58:59PM +0800, Baolin Wang wrote: >> >> That depends on the hardware engine. Some cipher hardware engines >> (like xts(aes) engine) can handle the intermediate values (IV) by >> themselves in one bulk block, which means we can increase the size of >> the request by merging request rather than always 512 bytes and thus >> increase the hardware engine processing speed. But for some other >> hardware engines (like cbc(aes) engine), they can not support bulk >> block, must sector by sector. So the engine drivers can select the >> suitable mode to do encryption/decryption. > > So what is this supposed to handle, xts or cbc? As I know, now cbc engine also need to handle requests sector by sector, but for xts/ecb engine can support bulk block, which means can merge requests. > >> > Even with batching we should be involving the user because only the >> > user knows (if anyone does) whether more data will be forthcoming. >> >> If this cipher engine can support bulk block encryption, the crypto >> engine framework can merge requests if they are eligible >> automatically. Don't need to worry about how many data will be >> forthcoming. > > Merging is simply wrong when the data is coming in as one piece > and you've just artifically broken it up, only to merge it later. 
It will not break it up, and it will check whether the requests coming from dm-crypt can be merged together. > > If the data can be merged then it should have stayed as one piece > rather than being fragmented. Yes, usually one whole block can be merged into one request as the latency. > > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 8:14 ` Baolin Wang @ 2016-04-18 8:17 ` Herbert Xu 2016-04-18 8:28 ` Baolin Wang 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2016-04-18 8:17 UTC (permalink / raw) To: Baolin Wang Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On Mon, Apr 18, 2016 at 04:14:48PM +0800, Baolin Wang wrote: > On 18 April 2016 at 16:04, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > On Mon, Apr 18, 2016 at 03:58:59PM +0800, Baolin Wang wrote: > >> > >> That depends on the hardware engine. Some cipher hardware engines > >> (like xts(aes) engine) can handle the intermediate values (IV) by > >> themselves in one bulk block, which means we can increase the size of > >> the request by merging request rather than always 512 bytes and thus > >> increase the hardware engine processing speed. But for some other > >> hardware engines (like cbc(aes) engine), they can not support bulk > >> block, must sector by sector. So the engine drivers can select the > >> suitable mode to do encryption/decryption. > > > > So what is this supposed to handle, xts or cbc? > > As I know, now cbc engine also need to handle requests sector by > sector, but for xts/ecb engine can support bulk block, which means can > merge requests. If it's just xts then why can't dm-crypt merge it and send a single request? Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 8:17 ` Herbert Xu @ 2016-04-18 8:28 ` Baolin Wang 2016-04-18 8:31 ` Herbert Xu 0 siblings, 1 reply; 27+ messages in thread From: Baolin Wang @ 2016-04-18 8:28 UTC (permalink / raw) To: Herbert Xu Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On 18 April 2016 at 16:17, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Apr 18, 2016 at 04:14:48PM +0800, Baolin Wang wrote: >> On 18 April 2016 at 16:04, Herbert Xu <herbert@gondor.apana.org.au> wrote: >> > On Mon, Apr 18, 2016 at 03:58:59PM +0800, Baolin Wang wrote: >> >> >> >> That depends on the hardware engine. Some cipher hardware engines >> >> (like xts(aes) engine) can handle the intermediate values (IV) by >> >> themselves in one bulk block, which means we can increase the size of >> >> the request by merging request rather than always 512 bytes and thus >> >> increase the hardware engine processing speed. But for some other >> >> hardware engines (like cbc(aes) engine), they can not support bulk >> >> block, must sector by sector. So the engine drivers can select the >> >> suitable mode to do encryption/decryption. >> > >> > So what is this supposed to handle, xts or cbc? >> >> As I know, now cbc engine also need to handle requests sector by >> sector, but for xts/ecb engine can support bulk block, which means can >> merge requests. > > If it's just xts then why can't dm-crypt merge it and send a single > request? What I mean is that if the xts engine can support bulk blocks, the engine driver can select bulk mode for encryption; but if the xts engine cannot support bulk mode, which depends on the hardware design, the engine driver cannot select bulk mode. 
So dm-crypt cannot know which mode the engine driver will select, and it cannot send one bulk block each time. > > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 27+ messages in thread
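Under the proposed scheme the choice lives in the engine driver, not in dm-crypt. A minimal sketch of that dispatch decision, assuming a hypothetical per-engine capability flag (the names and the request representation below are invented for illustration; this is not the actual crypto engine API):

```python
# Sketch: an engine driver advertises whether its hardware can process a
# merged bulk block; the dispatch path then either merges the queued
# requests into one request or hands them to the hardware one by one.

class Engine:
    def __init__(self, supports_bulk):
        # Hypothetical capability flag, set by the engine driver based
        # on the hardware design.
        self.supports_bulk = supports_bulk

def dispatch(engine, queue):
    """queue: list of request sizes in bytes. Returns the list of block
    sizes actually submitted to the hardware."""
    if engine.supports_bulk:
        # Bulk mode: merge the whole queue into one large block.
        return [sum(queue)] if queue else []
    # Sector-by-sector mode: submit each request unchanged.
    return list(queue)
```

With this split, dm-crypt always submits small requests and never needs to know which mode the driver picked.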
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 8:28 ` Baolin Wang @ 2016-04-18 8:31 ` Herbert Xu 2016-04-18 8:40 ` Baolin Wang 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2016-04-18 8:31 UTC (permalink / raw) To: Baolin Wang Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On Mon, Apr 18, 2016 at 04:28:46PM +0800, Baolin Wang wrote: > > What I meaning is if the xts engine can support bulk block, then the > engine driver can select bulk mode to do encryption, but if their xts > engine can not support bulk mode, which depends on hardware design, > the engine driver can not select bulk mode. So the dm-crypt can not > know what will be selected by the engine driver, it can not send one > bulk block each time. Why can't the xts code just break it up if it can't handle it? You want to postpone splitting as much as possible. Even if the underlying xts code couldn't handle it, it would still make sense for the crypto API to see the request in one piece. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 8:31 ` Herbert Xu @ 2016-04-18 8:40 ` Baolin Wang 2016-04-18 8:41 ` Herbert Xu 0 siblings, 1 reply; 27+ messages in thread From: Baolin Wang @ 2016-04-18 8:40 UTC (permalink / raw) To: Herbert Xu Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On 18 April 2016 at 16:31, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Apr 18, 2016 at 04:28:46PM +0800, Baolin Wang wrote: >> >> What I meaning is if the xts engine can support bulk block, then the >> engine driver can select bulk mode to do encryption, but if their xts >> engine can not support bulk mode, which depends on hardware design, >> the engine driver can not select bulk mode. So the dm-crypt can not >> know what will be selected by the engine driver, it can not send one >> bulk block each time. > > Why can't the xts code just break it up if it can't handle it? Simply put, there are now many different hardware engines from different vendors; some engines can support bulk blocks and some cannot (or there is no cipher hardware engine at all), so dm-crypt cannot know the hardware engine's features. If dm-crypt sends one bulk block down to a low-level driver whose engine cannot support bulk blocks, it will crash. So we did the merging at the driver level, not at the dm-crypt level. > > You want to postpone splitting as much as possible. Even if the > underlying xts code couldn't handle it, it would still make sense > for the crypto API to see the request in one piece. 
> > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 8:40 ` Baolin Wang @ 2016-04-18 8:41 ` Herbert Xu 2016-04-18 8:51 ` Baolin Wang 0 siblings, 1 reply; 27+ messages in thread From: Herbert Xu @ 2016-04-18 8:41 UTC (permalink / raw) To: Baolin Wang Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On Mon, Apr 18, 2016 at 04:40:36PM +0800, Baolin Wang wrote: > > Simply to say, now there are many different hardware engines for > different vendors, some engines can support bulk block but some can > not (or no cipher hardware engine), then the dm-crypt can not know > your hardware engine features. If the dm-crypt send one bulk block to > low level, but the engine driver can not support bulk block, then it > will crash. So we did the merging action in driver level not dm-crypt > level. Surely we can handle it in the crypto API layer, just as we do GSO in the network stack for drivers that can't handle TSO? Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 27+ messages in thread
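The GSO/TSO analogy above suggests the inverse of merging in the driver: let the upper layer submit one large request, and have the API layer split it back into sector-sized pieces only for hardware that cannot take a bulk block. A rough model of that software fallback (the 512-byte sector size and function names are illustrative assumptions, not the crypto API):

```python
# Sketch of a GSO-style fallback: a merged request passes through whole
# when the engine supports bulk blocks, and is segmented into
# sector-sized chunks otherwise.

SECTOR_SIZE = 512

def maybe_split(length, hw_supports_bulk):
    """length: size in bytes of one merged request. Returns the chunk
    sizes the hardware will actually see."""
    if hw_supports_bulk:
        # Capable hardware sees the request in one piece.
        return [length]
    # Otherwise emit full sectors, plus any short tail.
    chunks = [SECTOR_SIZE] * (length // SECTOR_SIZE)
    if length % SECTOR_SIZE:
        chunks.append(length % SECTOR_SIZE)
    return chunks
```

This mirrors how the network stack segments a large send in software (GSO) for NICs that lack TSO, while capable NICs receive it unsplit.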
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 8:41 ` Herbert Xu @ 2016-04-18 8:51 ` Baolin Wang 0 siblings, 0 replies; 27+ messages in thread From: Baolin Wang @ 2016-04-18 8:51 UTC (permalink / raw) To: Herbert Xu Cc: David Miller, Alasdair G Kergon, Mike Snitzer, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On 18 April 2016 at 16:41, Herbert Xu <herbert@gondor.apana.org.au> wrote: > On Mon, Apr 18, 2016 at 04:40:36PM +0800, Baolin Wang wrote: >> >> Simply to say, now there are many different hardware engines for >> different vendors, some engines can support bulk block but some can >> not (or no cipher hardware engine), then the dm-crypt can not know >> your hardware engine features. If the dm-crypt send one bulk block to >> low level, but the engine driver can not support bulk block, then it >> will crash. So we did the merging action in driver level not dm-crypt >> level. > > Surely we can handle it in the crypto API layer, just as we do GSO > in the network stack for drivers that can't handle TSO? Yes, it is similar. Thanks. > > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- Baolin.wang Best Regards ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 5:31 ` Baolin Wang 2016-04-18 5:45 ` Herbert Xu @ 2016-04-18 13:36 ` Mike Snitzer 2016-04-18 21:22 ` Milan Broz 1 sibling, 1 reply; 27+ messages in thread From: Mike Snitzer @ 2016-04-18 13:36 UTC (permalink / raw) To: Baolin Wang Cc: Herbert Xu, David Miller, Alasdair G Kergon, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid, Milan Broz On Mon, Apr 18 2016 at 1:31am -0400, Baolin Wang <baolin.wang@linaro.org> wrote: > Hi Herbert, > > On 15 April 2016 at 21:48, Herbert Xu <herbert@gondor.apana.org.au> wrote: > > On Tue, Mar 15, 2016 at 03:47:58PM +0800, Baolin Wang wrote: > >> Now some cipher hardware engines prefer to handle bulk block by merging requests > >> to increase the block size and thus increase the hardware engine processing speed. > >> > >> This patchset introduces request bulk mode to help the crypto hardware drivers > >> improve in efficiency. > > > > Could you please explain why this merging can't be done in dm-crypt > > instead? > > We've tried to do this in dm-crypt, but it failed. > The dm-crypt maintainer explained to me that I should optimize the > driver, not add strange hw-dependent crypto modes to dm-crypt, this is > not the first crypto accelerator that is just not suited for this kind > of use. As a DM maintainer my only contribution to this line of discussion was relative to your proposal to train the dm-crypt target (which is bio-based) to also provide request-based support, see: https://www.redhat.com/archives/dm-devel/2015-November/msg00112.html But follow-up discussion occurred, primarily with Milan Broz, which led to this bulk mode support in the crypto layer. Pretty strange Milan wasn't cc'd on your patchset posting (I've now cc'd him). 
Mike ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework 2016-04-18 13:36 ` Mike Snitzer @ 2016-04-18 21:22 ` Milan Broz 0 siblings, 0 replies; 27+ messages in thread From: Milan Broz @ 2016-04-18 21:22 UTC (permalink / raw) To: Mike Snitzer, Baolin Wang Cc: Herbert Xu, David Miller, Alasdair G Kergon, Jens Axboe, dm-devel, Andrew Morton, david.s.gordon, Tom Lendacky, Robert Jarzmik, Masahiro Yamada, smueller, tadeusz.struk, Masanari Iida, shli, Mark Brown, Linus Walleij, Arnd Bergmann, LKML, linux-crypto, linux-raid On 04/18/2016 03:36 PM, Mike Snitzer wrote: > On Mon, Apr 18 2016 at 1:31am -0400, > Baolin Wang <baolin.wang@linaro.org> wrote: > >> Hi Herbert, >> >> On 15 April 2016 at 21:48, Herbert Xu <herbert@gondor.apana.org.au> wrote: >>> On Tue, Mar 15, 2016 at 03:47:58PM +0800, Baolin Wang wrote: >>>> Now some cipher hardware engines prefer to handle bulk block by merging requests >>>> to increase the block size and thus increase the hardware engine processing speed. >>>> >>>> This patchset introduces request bulk mode to help the crypto hardware drivers >>>> improve in efficiency. >>> >>> Could you please explain why this merging can't be done in dm-crypt >>> instead? >> >> We've tried to do this in dm-crypt, but it failed. >> The dm-crypt maintainer explained to me that I should optimize the >> driver, not add strange hw-dependent crypto modes to dm-crypt, this is >> not the first crypto accelerator that is just not suited for this kind >> of use. > > As a DM mainatiner my only contribution to this line of discussion was > relative to your proposal to train the dm-crypt target (which is > bio-based) to also provide request-based support, see: > https://www.redhat.com/archives/dm-devel/2015-November/msg00112.html > > But follow-up discussion occured, primarily with Milan Broz, which led > to this bulk mode support in the crypto layer. Pretty strange Milan > wasn't cc'd on your patchset posting (I've now cc'd him). 
My complaint was mainly that the proposed dmcrypt-based version just did not work properly. https://lkml.org/lkml/2016/1/2/109 (I did not test the new version we are replying to here. I wonder how the problem I mentioned is fixed though. Also see Mikulas' comments https://www.redhat.com/archives/dm-devel/2016-January/msg00145.html) Anyway, all this seems to optimize the case for specific crypto hw that is not able to perform optimally with the pattern dmcrypt produces. I do not think dmcrypt should do optimizations for specific hw. But if we decide that it is needed there, it should not cause any performance or compatibility problems elsewhere... Milan ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2016-04-20 8:38 UTC | newest] Thread overview: 27+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-03-15 7:47 [PATCH v2 0/4] Introduce bulk mode for crypto engine framework Baolin Wang 2016-03-15 7:47 ` [PATCH v2 1/4] scatterlist: Introduce some helper functions Baolin Wang 2016-04-02 15:00 ` Robert Jarzmik 2016-04-05 7:10 ` Baolin Wang 2016-04-20 7:34 ` Baolin Wang 2016-04-20 8:38 ` Herbert Xu 2016-03-15 7:48 ` [PATCH v2 2/4] crypto: Introduce some helper functions to help to merge requests Baolin Wang 2016-03-15 7:48 ` [PATCH v2 3/4] crypto: Introduce the bulk mode for crypto engine framework Baolin Wang 2016-03-15 7:48 ` [PATCH v2 4/4] md: dm-crypt: Initialize the sector number for one request Baolin Wang 2016-04-15 13:48 ` [PATCH v2 0/4] Introduce bulk mode for crypto engine framework Herbert Xu 2016-04-18 5:31 ` Baolin Wang 2016-04-18 5:45 ` Herbert Xu 2016-04-18 6:02 ` Baolin Wang 2016-04-18 7:04 ` Herbert Xu 2016-04-18 7:21 ` Baolin Wang 2016-04-18 7:24 ` Herbert Xu 2016-04-18 7:58 ` Baolin Wang 2016-04-18 8:04 ` Herbert Xu 2016-04-18 8:14 ` Baolin Wang 2016-04-18 8:17 ` Herbert Xu 2016-04-18 8:28 ` Baolin Wang 2016-04-18 8:31 ` Herbert Xu 2016-04-18 8:40 ` Baolin Wang 2016-04-18 8:41 ` Herbert Xu 2016-04-18 8:51 ` Baolin Wang 2016-04-18 13:36 ` Mike Snitzer 2016-04-18 21:22 ` Milan Broz