Re[4]: [PATCH 02/11][v3] async_tx: add support for asynchronous GF multiplication

From: Yuri Tikhonov <yur@emcraft.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-raid@vger.kernel.org, linuxppc-dev@ozlabs.org, wd@denx.de,
	dzu@denx.de, yanok@emcraft.com
Subject: Re[4]: [PATCH 02/11][v3] async_tx: add support for asynchronous GF multiplication
Date: Sat, 17 Jan 2009 15:19:37 +0300
Message-ID: <81938970.20090117151937@emcraft.com>
In-Reply-To: <e9c3a7c20901161028v3320b14ay68df8360fff2b8a5@mail.gmail.com>

Hello Dan,

On Friday, January 16, 2009 you wrote:

> On Fri, Jan 16, 2009 at 4:41 AM, Yuri Tikhonov <yur@emcraft.com> wrote:
>>> I don't think this will work as we will be mixing Q into the new P and
>>> P into the new Q.  In order to support (src_cnt > device->max_pq) we
>>> need to explicitly tell the driver that the operation is being
>>> continued (DMA_PREP_CONTINUE) and to apply different coefficients to
>>> P and Q to cancel the effect of including them as sources.
>>
>>  With DMA_PREP_ZERO_P/Q approach, the Q isn't mixed into new P, and P
>> isn't mixed into new Q. For your example of max_pq=4:
>>
>>  p, q = PQ(src0, src1, src2, src3, src4, COEF({01}, {02}, {04}, {08}, {10}))
>>
>>  with the current implementation will be split into:
>>
>>  p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08}))
>>  p`, q` = PQ(src4, COEF({10}))
>>
>>  which will result in the following:
>>
>>  p = ((dma_flags & DMA_PREP_ZERO_P) ? 0 : old_p) + src0 + src1 + src2 + src3
>>  q = ((dma_flags & DMA_PREP_ZERO_Q) ? 0 : old_q) + {01}*src0 + {02}*src1 + {04}*src2 + {08}*src3
>>
>>  p` = p + src4
>>  q` = q + {10}*src4
>>

> Huh?  Does the ppc440spe engine have some notion of flagging a source
> as old_p/old_q?  Otherwise I do not see how the engine will not turn
> this into:

> p` = p + src4 + q
> q` = q + {10}*src4 + {x}*p

> I think you missed the fact that we have passed p and q back in as
> sources.  Unless we have multiple p destinations and multiple q
> destinations, or hardware support for continuations I do not see how
> you can guarantee this split.

 I think I see your point. You are missing the fact that the
destinations for 'p' and 'q' are passed to the device_prep_dma_pq()
method separately from the sources. In your terms: we do not have
multiple destinations across the iterations of the while() loop; the
destinations are the same in each pass.

 Please look at the do_async_pq() implementation more carefully:
'blocks' is a pointer to 'src_cnt' sources _plus_ two destination pages
(as stated in the async_pq() description). Before entering the while()
loop we save the destinations in the dma_dest[] array, and then pass it
to device_prep_dma_pq() in each (src_cnt/max_pq) iteration. That is, we
do not pass the destinations as sources explicitly: we just clear the
DMA_PREP_ZERO_P/Q flags to notify the ADMA level that it has to XOR the
current contents of the destination(s) with the result of the new
operation.
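
 The loop described above can be modelled with a small sketch. This is
a toy in Python, not the kernel code: gf_mul(), prep_pq() and
chunked_pq() are hypothetical names, the field is assumed to be GF(2^8)
with the RAID-6 polynomial 0x11d, and the first pass is assumed to zero
both destinations. prep_pq() stands in for an engine that, like the one
described here, accumulates into P/Q when the zero flags are clear.

```python
# Toy model: one P+Q operation split into passes of at most max_pq sources.
# All names are illustrative; this is not the async_tx or ADMA API.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11d (assumed RAID-6 polynomial)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return r

def prep_pq(dest, srcs, coefs, zero_p, zero_q):
    """One engine pass. dest holds the 'p' and 'q' buffers and is updated in
    place; zero_p/zero_q play the role of DMA_PREP_ZERO_P/Q."""
    n = len(dest['p'])
    p = [0] * n if zero_p else list(dest['p'])
    q = [0] * n if zero_q else list(dest['q'])
    for src, coef in zip(srcs, coefs):
        for i in range(n):
            p[i] ^= src[i]                # P is a plain XOR
            q[i] ^= gf_mul(coef, src[i])  # Q is a coefficient-weighted XOR
    dest['p'], dest['q'] = p, q

def chunked_pq(dest, srcs, coefs, max_pq):
    """Same destinations in every pass; only the first pass zeroes them."""
    first = True
    for off in range(0, len(srcs), max_pq):
        prep_pq(dest, srcs[off:off + max_pq], coefs[off:off + max_pq],
                zero_p=first, zero_q=first)
        first = False

srcs = [[0x11, 0x22], [0x33, 0x44], [0x55, 0x66], [0x77, 0x88], [0x99, 0xaa]]
coefs = [0x01, 0x02, 0x04, 0x08, 0x10]

one_shot = {'p': [0xff, 0xff], 'q': [0xff, 0xff]}  # stale buffer contents
prep_pq(one_shot, srcs, coefs, zero_p=True, zero_q=True)

split = {'p': [0xff, 0xff], 'q': [0xff, 0xff]}
chunked_pq(split, srcs, coefs, max_pq=4)

assert split == one_shot  # 4 + 1 sources give the same P/Q as all 5 at once
```

 The point of the model is that P and Q never appear in the source
list: the second pass only relies on the engine reading back its own
destinations.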

>>  I'm afraid that the difference (13/4, 125/32) is very significant, so
>> getting rid of DMA_PREP_ZERO_P/Q will eat most of the improvement
>> which could be achieved with the current approach.

> Data corruption is a slightly higher cost :-).

>>
>>>  but at this point I do not see a cleaner alternative for engines like iop13xx.
>>
>>  I can't find any description of iop13xx processors at Intel's
>> web-site, only 3xx:
>>
>> http://www.intel.com/design/iio/index.htm?iid=ipp_embed+embed_io
>>
>>  So, it's hard for me to make any suggestions. I just wonder - doesn't
>> iop13xx allow users to program destination addresses into the source
>> fields of descriptors?

> Yes it does, but the engine does not know it is a destination.

> Take a look at page 496 of the following and tell me if you come to a
> different conclusion.
> http://download.intel.com/design/iio/docs/31503602.pdf

 I see. The major difference in how the ppc440spe DMA engines support
P+Q is that ppc440spe allows including (XORing) the previous contents
of P_Result and/or Q_Result just by setting a corresponding indication
in the destination (P_Result and/or Q_Result) address(es).

 The "5.7.5 P+Q Update Operation" case won't help here since, if I
understand it correctly, it doesn't allow setting up different
multipliers for Old and New Data.

 So, it looks like your approach:

p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10}))

 is the only possible way of including the previous P/Q contents in
the calculation.
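
 To check why this coefficient list works, here is a small sketch in
Python with single-byte buffers. gf_mul() and pq() are hypothetical
helper names, and GF(2^8) with the polynomial 0x11d is an assumption.
Since P is a plain XOR, passing q twice cancels it out of p'; the {00}
coefficients keep p, and the second copy of q, out of q'.

```python
# Toy check of the continuation coefficients; not driver code.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11d (assumed RAID-6 polynomial)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return r

def pq(srcs, coefs):
    """One-shot P/Q over single bytes, with no knowledge of old P/Q."""
    p = q = 0
    for src, coef in zip(srcs, coefs):
        p ^= src
        q ^= gf_mul(coef, src)
    return p, q

# First pass over src0..src3.
p, q = pq([0x11, 0x33, 0x55, 0x77], [0x01, 0x02, 0x04, 0x08])
src4 = 0x99

# What the continued operation is supposed to produce.
want = (p ^ src4, q ^ gf_mul(0x10, src4))

# Continuation: old p and q are ordinary sources, q is listed twice.
got = pq([p, q, q, src4], [0x00, 0x01, 0x00, 0x10])
assert got == want

# Listing q only once lets it leak into the new P.
naive = pq([p, q, src4], [0x00, 0x01, 0x10])
assert naive[0] != want[0]
```

 The cost is visible in the source list: three of the four source slots
go to old P/Q, so only one new source fits per continuation pass when
max_pq=4.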

 But I still think that this p'/q' hack belongs at the ADMA level, not
in ASYNC_TX. It is more generic if ASYNC_TX assumes that ADMA is
capable of p'=p+src / q'=q+{}*src. Otherwise, we impose an overhead on
the DMA engines which could work without it.

 In your case, the IOP ADMA driver should handle the situation where it
receives 4 sources to be P+Qed with the previous contents of the
destinations, for example, by generating a sequence of 4 descriptors
to process such a request.

 Regards, Yuri

 --
 Yuri Tikhonov, Senior Software Engineer
 Emcraft Systems, www.emcraft.com
