linux-crypto.vger.kernel.org archive mirror
* [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization
@ 2007-10-25 11:43 Denys Vlasenko
  2007-10-25 11:45 ` [PATCH 1/5] camellia: cleanup Denys Vlasenko
From: Denys Vlasenko @ 2007-10-25 11:43 UTC
  To: Herbert Xu; +Cc: linux-crypto

[-- Attachment #1: Type: text/plain, Size: 5455 bytes --]

Hi Herbert,

Please review the following patches and, if they look good, propagate them upstream.

camellia1.diff:
    Move code blocks around so that related pieces are closer together:
    e.g. the CAMELLIA_ROUNDSM macro does not need to be separated
    from the rest of the code by a huge array of constants.

    Remove unused macros (COPY4WORD, SWAP4WORD, XOR4WORD, XOR4WORD2).

    Drop the SUBL() and SUBR() macros, which only obscure things.
    Same for the CAMELLIA_SP1110() family of macros and the KEY_TABLE_TYPE typedef.

    Remove useless comments:
    /* encryption */ -- well it's obvious enough already!
    void camellia_encrypt128(...)

    Combine swap with copying at the beginning/end of encrypt/decrypt.
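    A minimal userspace sketch of what "combining the swap with the copy"
    means, assuming a 16-byte block handled as four 32-bit words. The
    names swab32_sketch(), load_block_two_pass() and load_block_one_pass()
    are hypothetical, not the kernel's; the swap is unconditional here for
    simplicity, which matches a be32-to-cpu conversion only on little-endian
    hosts.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unconditionally reverse the byte order of a word. */
static uint32_t swab32_sketch(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Two passes: copy the whole block first, then swap every word in place. */
static void load_block_two_pass(uint32_t dst[4], const uint32_t src[4])
{
	int i;

	memcpy(dst, src, 4 * sizeof(uint32_t));
	for (i = 0; i < 4; i++)
		dst[i] = swab32_sketch(dst[i]);
}

/* One pass: swap each word while copying it, so dst is written only once. */
static void load_block_one_pass(uint32_t dst[4], const uint32_t src[4])
{
	int i;

	for (i = 0; i < 4; i++)
		dst[i] = swab32_sketch(src[i]);
}
```

    Both produce the same words; the one-pass form simply avoids a second
    walk over the destination.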


camellia2.diff
    Rename some macros to shorter names: CAMELLIA_RR8 -> ROR8,
    which makes it easier to see that it is just a right rotation
    with nothing Camellia-specific about it;
    CAMELLIA_SUBKEY_L() -> SUBKEY_L(), just shorter.
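    For illustration, a sketch comparing the original CAMELLIA_RR8 (copied
    from the diff below) with a plain right-rotation spelling. The exact
    definition of ROR8 in the patch is not shown here, so this ROR8 is an
    assumed equivalent. Because the two shifted halves never overlap,
    "+" and "|" give the same result.

```c
#include <stdint.h>

/* Original macro from camellia.c: right rotation by 8, written with "+". */
#define CAMELLIA_RR8(x) (((x) >> 8) + ((x) << 24))

/* Assumed generic spelling: a plain 32-bit rotate right by 8 bits.
 * The argument must be a uint32_t so the left shift truncates to 32 bits. */
#define ROR8(x) (((x) >> 8) | ((x) << 24))
```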

    Move the be32 <-> cpu conversions out of en/decrypt128/256 and into
    camellia_en/decrypt: there is no reason to have that code duplicated.


camellia3.diff
    Optimize GETU32 to use a 4-byte memcpy (modern gcc converts
    such a memcpy into a single move instruction on i386).
    The original GETU32 did four byte fetches and shifted/XORed them together.
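    A userspace sketch of the two GETU32 styles. It assumes the memcpy is
    paired with a big-endian-to-host conversion; ntohl() from POSIX
    <arpa/inet.h> stands in for that here (in the kernel, be32_to_cpu()
    would presumably play this role). GETU32_BYTES and getu32_memcpy are
    hypothetical names.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* ntohl(): big-endian (network) order to host order */

/* Original style: four byte loads combined with shifts and XOR. */
#define GETU32_BYTES(pt) (((uint32_t)(pt)[0] << 24)	\
			 ^ ((uint32_t)(pt)[1] << 16)	\
			 ^ ((uint32_t)(pt)[2] <<  8)	\
			 ^ ((uint32_t)(pt)[3]))

/* memcpy style: one 4-byte copy, which the compiler can turn into a single
 * (possibly unaligned) load, followed by a byte-order conversion. */
static uint32_t getu32_memcpy(const unsigned char *pt)
{
	uint32_t v;

	memcpy(&v, pt, sizeof(v));
	return ntohl(v);
}
```

    Using memcpy instead of a pointer cast keeps the unaligned access
    well-defined in C while still letting the compiler emit one load.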


camellia4.diff
    Move the huge unrolled pieces of code (three screenfuls) at the end of
    the 128- and 256-bit key setup routines into a common camellia_setup_tail(),
    and convert them into a loop there.
    The loop is still unrolled six times, so the performance hit is very small,
    while the code size win is big.
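    A sketch of how such a loop could look for the "apply the inverse of the
    last half of P-function" step at the end of camellia_setup128() in the
    diff below (dw = L ^ R; dw = RL8(dw); R = L ^ dw; L = dw). It assumes,
    from the diff's comments, that round subkeys come in runs of six
    (indices 2-7, 10-15, 18-23, plus 26-31 for longer keys) separated by
    the FL/FLinv pairs at 8-9, 16-17 and 24-25. setup_tail_sketch() and
    INV_P_STEP are hypothetical names, not the patch's code.

```c
#include <stdint.h>

#define SUBKEY_L(i) (subkey[(i) * 2])
#define SUBKEY_R(i) (subkey[(i) * 2 + 1])
#define ROL8(x) (((x) << 8) | ((x) >> 24))

/* One step of "apply the inverse of the last half of P-function". */
#define INV_P_STEP(i) do {					\
		uint32_t dw = SUBKEY_L(i) ^ SUBKEY_R(i);	\
		dw = ROL8(dw);					\
		SUBKEY_R(i) = SUBKEY_L(i) ^ dw;			\
		SUBKEY_L(i) = dw;				\
	} while (0)

/* Loop form: six steps per iteration (still unrolled six times), then
 * skip the two FL/FLinv subkeys. max is 24 for 128-bit keys, 32 otherwise. */
static void setup_tail_sketch(uint32_t *subkey, int max)
{
	int i = 2;

	do {
		INV_P_STEP(i + 0);
		INV_P_STEP(i + 1);
		INV_P_STEP(i + 2);
		INV_P_STEP(i + 3);
		INV_P_STEP(i + 4);
		INV_P_STEP(i + 5);
		i += 8;
	} while (i < max);
}
```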


camellia5.diff
    Use an alternative key setup implementation built mostly on 64-bit
    operations if BITS_PER_LONG >= 64. It is both much smaller and much faster.
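    To illustrate why 64-bit operations help, here is a sketch of the
    128-bit key rotation used throughout key setup. The 32-bit form mirrors
    CAMELLIA_ROLDQ from the diff below (four words, valid only for
    0 < bits < 32, hence the separate ROLDQo32 variant); the 64-bit form
    needs only two words and one helper for 0 < bits < 64. This is not the
    code from camellia5.diff, and both function names are hypothetical.

```c
#include <stdint.h>

/* 32-bit style, as in CAMELLIA_ROLDQ: rotate ll:lr:rl:rr (ll most
 * significant) left by bits, valid for 0 < bits < 32. */
static void roldq32(uint32_t *ll, uint32_t *lr, uint32_t *rl, uint32_t *rr,
		    unsigned bits)
{
	uint32_t w0 = *ll;

	*ll = (*ll << bits) + (*lr >> (32 - bits));
	*lr = (*lr << bits) + (*rl >> (32 - bits));
	*rl = (*rl << bits) + (*rr >> (32 - bits));
	*rr = (*rr << bits) + (w0 >> (32 - bits));
}

/* 64-bit style: the same rotation on hi:lo, valid for 0 < bits < 64,
 * so one helper covers both the ROLDQ and ROLDQo32 cases. */
static void rol128_64(uint64_t *hi, uint64_t *lo, unsigned bits)
{
	uint64_t h = *hi;

	*hi = (h << bits) | (*lo >> (64 - bits));
	*lo = (*lo << bits) | (h >> (64 - bits));
}
```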

    Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
    The code was nearly identical; with just one additional if() both key
    lengths can use the same code.

    If CONFIG_CC_OPTIMIZE_FOR_SIZE is defined, use a loop in
    camellia_do_en/decrypt instead of unrolled code, at the cost of
    a ~5% encrypt/decrypt slowdown.
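    A control-flow sketch of the unified encryption schedule, in an unrolled
    form with the one extra if() and a loop form selected by
    CONFIG_CC_OPTIMIZE_FOR_SIZE. It assumes, from the subkey comments in the
    diff below, that round subkeys are 2-7, 10-15, 18-23 (plus 26-31 for
    longer keys) and FL/FLinv layers start at 8, 16 (and 24), with max = 24
    or 32. The rounds and FL layers are replaced by hypothetical stubs that
    only record subkey indices, so this shows the schedule, not Camellia's
    arithmetic.

```c
/* Hypothetical trace of which subkey indices the schedule touches. */
struct trace {
	int round_idx[24];	/* subkey index used by each Feistel round */
	int fl_idx[3];		/* first subkey index of each FL/FLinv layer */
	int nrounds, nfl;
};

static void do_round(struct trace *t, int i) { t->round_idx[t->nrounds++] = i; }
static void do_fl(struct trace *t, int i) { t->fl_idx[t->nfl++] = i; }

/* Six rounds using subkeys i + 2 .. i + 7. */
static void six_rounds(struct trace *t, int i)
{
	int j;

	for (j = 2; j < 8; j++)
		do_round(t, i + j);
}

/* Unrolled: one extra if() covers the longer keys (max == 32). */
static void schedule_unrolled(struct trace *t, int max)
{
	six_rounds(t, 0);
	do_fl(t, 8);
	six_rounds(t, 8);
	do_fl(t, 16);
	six_rounds(t, 16);
	if (max == 32) {
		do_fl(t, 24);
		six_rounds(t, 24);
	}
}

/* Loop: smaller code, slightly slower. max must be 24 or 32. */
static void schedule_loop(struct trace *t, int max)
{
	int i = 0;

	for (;;) {
		six_rounds(t, i);
		i += 8;
		if (i == max)
			break;
		do_fl(t, i);
	}
}

static void camellia_schedule_sketch(struct trace *t, int max)
{
	t->nrounds = t->nfl = 0;
#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
	schedule_loop(t, max);
#else
	schedule_unrolled(t, max);
#endif
}
```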

    Replace (x & 0xff) with (u8)x: gcc is not smart enough to realize
    that it can compute (x & 0xff) this way, which is smaller (at least on i386).

    Don't do (x & 0xff) in a few places where x cannot exceed 255 anyway:
        t1 = ir >> 16; v = camellia_sp0222[(t1 >> 8) & 0xff];
    ir is u32, so t1 has at most 16 significant bits and (t1 >> 8) is a single byte!
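    A small sketch of the three ways to form an S-box index discussed above,
    using uint8_t in place of the kernel's u8. The helper names are
    hypothetical; the point is that all forms yield the same index, and
    the smaller code for the cast form is the patch's i386 observation,
    not something this snippet demonstrates.

```c
#include <stdint.h>

/* Masking form: explicit & 0xff on the shifted word. */
static unsigned idx_mask(uint32_t x, unsigned shift)
{
	return (x >> shift) & 0xff;
}

/* Cast form: truncating to uint8_t keeps exactly the same low byte. */
static unsigned idx_cast(uint32_t x, unsigned shift)
{
	return (uint8_t)(x >> shift);
}

/* No mask needed: t1 = ir >> 16 has at most 16 significant bits,
 * so t1 >> 8 can never exceed 255. */
static unsigned idx_top_byte(uint32_t ir)
{
	uint32_t t1 = ir >> 16;

	return t1 >> 8;
}
```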



Benchmarking was done in userspace (see the attached tarball for the code).
All times are in microseconds; each measurement was run twice to give some idea of test variability.
"Setup NN: NNNNNN NNNNNN" - time taken by 100000 key setups with an NN-byte key (two runs).
"Encrypt: NNNNNN NNNNNN" - time taken by 1000 encryptions of an 8K buffer.
"Decrypt: NNNNNN NNNNNN" - time taken by 1000 decryptions of an 8K buffer.
"(matches)" - the encrypt/decrypt round trip reproduced the original plaintext.
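As a worked example of reading these numbers, take the first camellia line below (Setup 16: 32779 us, Encrypt: 153582 us), assuming "8K" means 8192 bytes:

```latex
\[
\text{throughput}_{\text{enc}} = \frac{1000 \times 8192\ \text{B}}{153582\ \mu\text{s}}
  \approx 53.3\ \text{B}/\mu\text{s} = 53.3\ \text{MB/s},
\qquad
t_{\text{setup}} = \frac{32779\ \mu\text{s}}{100000} \approx 0.33\ \mu\text{s per 128-bit key setup}.
\]
```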

CONFIG_CC_OPTIMIZE_FOR_SIZE is not set:

$ ./camellia
Setup 16:32779 33169 Encrypt:153582 153740 Decrypt:150985 149811 (matches)
Setup 24:49333 48987 Encrypt:197973 198853 Decrypt:201240 197585 (matches)
Setup 32:46700 47680 Encrypt:195650 195800 Decrypt:195450 195469 (matches)
$ ./camellia5
Setup 16:33417 32968 Encrypt:149195 149095 Decrypt:148593 148661 (matches)
Setup 24:50082 50064 Encrypt:201214 199204 Decrypt:197078 197579 (matches)
Setup 32:48938 48824 Encrypt:200231 199545 Decrypt:198954 198996 (matches)
$ ./camellia_64
Setup 16:22247 22473 Encrypt:152321 149860 Decrypt:149058 148451 (matches)
Setup 24:33832 34017 Encrypt:200428 202969 Decrypt:196789 195524 (matches)
Setup 32:32884 32821 Encrypt:200414 200640 Decrypt:197857 195987 (matches)

$ size camellia.o camellia5.o camellia_64.o
   text    data     bss     dec     hex filename
  24586       0       0   24586    600a camellia.o
  21714       0       0   21714    54d2 camellia5.o
  18666       0       0   18666    48ea camellia_64.o

Very small speed loss from camellia to camellia5, with noticeably smaller code
(24586 -> 21714 bytes of text).
Big key setup speedup in the 64-bit camellia_64 (about 30% less time),
and it is even smaller (18666 bytes).


CONFIG_CC_OPTIMIZE_FOR_SIZE is set:

$ ./camellia_Os
Setup 16:32573 34985 Encrypt:151825 152011 Decrypt:147581 147630 (matches)
Setup 24:48528 49250 Encrypt:196223 199056 Decrypt:198811 196394 (matches)
Setup 32:46650 47538 Encrypt:197466 196412 Decrypt:196290 196550 (matches)
$ ./camellia5_Os
Setup 16:33360 34487 Encrypt:154718 154499 Decrypt:157432 157135 (matches)
Setup 24:53969 54304 Encrypt:205184 205818 Decrypt:210675 208552 (matches)
Setup 32:53064 52904 Encrypt:205350 205439 Decrypt:211654 208468 (matches)
$ ./camellia_64_Os
Setup 16:24696 25894 Encrypt:155903 155747 Decrypt:157385 155696 (matches)
Setup 24:33873 33230 Encrypt:206111 206385 Decrypt:208111 207650 (matches)
Setup 32:32799 32325 Encrypt:209715 205973 Decrypt:207578 207644 (matches)

$ size camellia_Os.o camellia5_Os.o camellia_64_Os.o
   text    data     bss     dec     hex filename
  24586       0       0   24586    600a camellia_Os.o
  15906       0       0   15906    3e22 camellia5_Os.o
  13098       0       0   13098    332a camellia_64_Os.o

Some speed loss from camellia to camellia5, but much smaller code
(24586 -> 15906 bytes of text).
Big key setup speedup in the 64-bit camellia_64, and it is smaller still (13098 bytes).


The sizes above are for the userspace test programs; kernel sizes follow the same trend.
For example, here are the kernel module sizes with CONFIG_CC_OPTIMIZE_FOR_SIZE set on AMD64:

$ size */camellia.o
   text    data     bss     dec     hex filename
  23208     272       0   23480    5bb8 crypto.org/camellia.o
  11328     272       0   11600    2d50 crypto/camellia.o

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: test_camellia.tar.bz2 --]
[-- Type: application/x-tbz, Size: 18097 bytes --]


* [PATCH 1/5] camellia: cleanup
  2007-10-25 11:43 [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization Denys Vlasenko
@ 2007-10-25 11:45 ` Denys Vlasenko
  2007-10-26  8:43   ` Noriaki TAKAMIYA
  2007-11-06 14:17   ` Herbert Xu
  2007-10-25 11:45 ` [PATCH 2/5] " Denys Vlasenko
From: Denys Vlasenko @ 2007-10-25 11:45 UTC
  To: Herbert Xu; +Cc: linux-crypto

[-- Attachment #1: Type: text/plain, Size: 824 bytes --]

On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> Hi Herbert,
> 
> Please review the following patches and, if they look good, propagate them upstream.
> 
> camellia1.diff:
>     Move code blocks around so that related pieces are closer together:
>     e.g. the CAMELLIA_ROUNDSM macro does not need to be separated
>     from the rest of the code by a huge array of constants.
> 
>     Remove unused macros (COPY4WORD, SWAP4WORD, XOR4WORD, XOR4WORD2).
> 
>     Drop the SUBL() and SUBR() macros, which only obscure things.
>     Same for the CAMELLIA_SP1110() family of macros and the KEY_TABLE_TYPE typedef.
> 
>     Remove useless comments:
>     /* encryption */ -- well it's obvious enough already!
>     void camellia_encrypt128(...)
> 
>     Combine swap with copying at the beginning/end of encrypt/decrypt.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: camellia1.diff --]
[-- Type: text/x-diff, Size: 43463 bytes --]

--- linux-2.6.23.src/crypto/camellia0.c	2007-10-24 19:01:54.000000000 +0100
+++ linux-2.6.23.src/crypto/camellia.c	2007-10-24 19:03:05.000000000 +0100
@@ -36,176 +36,6 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 
-
-#define CAMELLIA_MIN_KEY_SIZE        16
-#define CAMELLIA_MAX_KEY_SIZE        32
-#define CAMELLIA_BLOCK_SIZE 16
-#define CAMELLIA_TABLE_BYTE_LEN 272
-#define CAMELLIA_TABLE_WORD_LEN (CAMELLIA_TABLE_BYTE_LEN / 4)
-
-typedef u32 KEY_TABLE_TYPE[CAMELLIA_TABLE_WORD_LEN];
-
-
-/* key constants */
-
-#define CAMELLIA_SIGMA1L (0xA09E667FL)
-#define CAMELLIA_SIGMA1R (0x3BCC908BL)
-#define CAMELLIA_SIGMA2L (0xB67AE858L)
-#define CAMELLIA_SIGMA2R (0x4CAA73B2L)
-#define CAMELLIA_SIGMA3L (0xC6EF372FL)
-#define CAMELLIA_SIGMA3R (0xE94F82BEL)
-#define CAMELLIA_SIGMA4L (0x54FF53A5L)
-#define CAMELLIA_SIGMA4R (0xF1D36F1CL)
-#define CAMELLIA_SIGMA5L (0x10E527FAL)
-#define CAMELLIA_SIGMA5R (0xDE682D1DL)
-#define CAMELLIA_SIGMA6L (0xB05688C2L)
-#define CAMELLIA_SIGMA6R (0xB3E6C1FDL)
-
-struct camellia_ctx {
-	int key_length;
-	KEY_TABLE_TYPE key_table;
-};
-
-
-/*
- *  macros
- */
-
-
-# define GETU32(pt) (((u32)(pt)[0] << 24)	\
-		     ^ ((u32)(pt)[1] << 16)	\
-		     ^ ((u32)(pt)[2] <<  8)	\
-		     ^ ((u32)(pt)[3]))
-
-#define COPY4WORD(dst, src)			\
-    do {					\
-	(dst)[0]=(src)[0];			\
-	(dst)[1]=(src)[1];			\
-	(dst)[2]=(src)[2];			\
-	(dst)[3]=(src)[3];			\
-    }while(0)
-
-#define SWAP4WORD(word)				\
-    do {					\
-	CAMELLIA_SWAP4((word)[0]);		\
-	CAMELLIA_SWAP4((word)[1]);		\
-	CAMELLIA_SWAP4((word)[2]);		\
-	CAMELLIA_SWAP4((word)[3]);		\
-    }while(0)
-
-#define XOR4WORD(a, b)/* a = a ^ b */		\
-    do {					\
-	(a)[0]^=(b)[0];				\
-	(a)[1]^=(b)[1];				\
-	(a)[2]^=(b)[2];				\
-	(a)[3]^=(b)[3];				\
-    }while(0)
-
-#define XOR4WORD2(a, b, c)/* a = b ^ c */	\
-    do {					\
-	(a)[0]=(b)[0]^(c)[0];			\
-	(a)[1]=(b)[1]^(c)[1];			\
-	(a)[2]=(b)[2]^(c)[2];			\
-	(a)[3]=(b)[3]^(c)[3];			\
-    }while(0)
-
-#define CAMELLIA_SUBKEY_L(INDEX) (subkey[(INDEX)*2])
-#define CAMELLIA_SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
-
-/* rotation right shift 1byte */
-#define CAMELLIA_RR8(x) (((x) >> 8) + ((x) << 24))
-/* rotation left shift 1bit */
-#define CAMELLIA_RL1(x) (((x) << 1) + ((x) >> 31))
-/* rotation left shift 1byte */
-#define CAMELLIA_RL8(x) (((x) << 8) + ((x) >> 24))
-
-#define CAMELLIA_ROLDQ(ll, lr, rl, rr, w0, w1, bits)	\
-    do {						\
-	w0 = ll;					\
-	ll = (ll << bits) + (lr >> (32 - bits));	\
-	lr = (lr << bits) + (rl >> (32 - bits));	\
-	rl = (rl << bits) + (rr >> (32 - bits));	\
-	rr = (rr << bits) + (w0 >> (32 - bits));	\
-    } while(0)
-
-#define CAMELLIA_ROLDQo32(ll, lr, rl, rr, w0, w1, bits)	\
-    do {						\
-	w0 = ll;					\
-	w1 = lr;					\
-	ll = (lr << (bits - 32)) + (rl >> (64 - bits));	\
-	lr = (rl << (bits - 32)) + (rr >> (64 - bits));	\
-	rl = (rr << (bits - 32)) + (w0 >> (64 - bits));	\
-	rr = (w0 << (bits - 32)) + (w1 >> (64 - bits));	\
-    } while(0)
-
-#define CAMELLIA_SP1110(INDEX) (camellia_sp1110[(INDEX)])
-#define CAMELLIA_SP0222(INDEX) (camellia_sp0222[(INDEX)])
-#define CAMELLIA_SP3033(INDEX) (camellia_sp3033[(INDEX)])
-#define CAMELLIA_SP4404(INDEX) (camellia_sp4404[(INDEX)])
-
-#define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-    do {							\
-	il = xl ^ kl;						\
-	ir = xr ^ kr;						\
-	t0 = il >> 16;						\
-	t1 = ir >> 16;						\
-	yl = CAMELLIA_SP1110(ir & 0xff)				\
-	    ^ CAMELLIA_SP0222((t1 >> 8) & 0xff)			\
-	    ^ CAMELLIA_SP3033(t1 & 0xff)			\
-	    ^ CAMELLIA_SP4404((ir >> 8) & 0xff);		\
-	yr = CAMELLIA_SP1110((t0 >> 8) & 0xff)			\
-	    ^ CAMELLIA_SP0222(t0 & 0xff)			\
-	    ^ CAMELLIA_SP3033((il >> 8) & 0xff)			\
-	    ^ CAMELLIA_SP4404(il & 0xff);			\
-	yl ^= yr;						\
-	yr = CAMELLIA_RR8(yr);					\
-	yr ^= yl;						\
-    } while(0)
-
-
-/*
- * for speed up
- *
- */
-#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
-    do {								\
-	t0 = kll;							\
-	t2 = krr;							\
-	t0 &= ll;							\
-	t2 |= rr;							\
-	rl ^= t2;							\
-	lr ^= CAMELLIA_RL1(t0);						\
-	t3 = krl;							\
-	t1 = klr;							\
-	t3 &= rl;							\
-	t1 |= lr;							\
-	ll ^= t1;							\
-	rr ^= CAMELLIA_RL1(t3);						\
-    } while(0)
-
-#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-    do {								\
-	ir =  CAMELLIA_SP1110(xr & 0xff);				\
-	il =  CAMELLIA_SP1110((xl>>24) & 0xff);				\
-	ir ^= CAMELLIA_SP0222((xr>>24) & 0xff);				\
-	il ^= CAMELLIA_SP0222((xl>>16) & 0xff);				\
-	ir ^= CAMELLIA_SP3033((xr>>16) & 0xff);				\
-	il ^= CAMELLIA_SP3033((xl>>8) & 0xff);				\
-	ir ^= CAMELLIA_SP4404((xr>>8) & 0xff);				\
-	il ^= CAMELLIA_SP4404(xl & 0xff);				\
-	il ^= kl;							\
-	ir ^= il ^ kr;							\
-	yl ^= ir;							\
-	yr ^= CAMELLIA_RR8(il) ^ ir;					\
-    } while(0)
-
-/**
- * Stuff related to the Camellia key schedule
- */
-#define SUBL(x) subL[(x)]
-#define SUBR(x) subR[(x)]
-
-
 static const u32 camellia_sp1110[256] = {
 	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
 	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
@@ -475,6 +305,122 @@ static const u32 camellia_sp4404[256] = 
 };
 
 
+#define CAMELLIA_MIN_KEY_SIZE        16
+#define CAMELLIA_MAX_KEY_SIZE        32
+#define CAMELLIA_BLOCK_SIZE          16
+#define CAMELLIA_TABLE_BYTE_LEN     272
+
+
+/* key constants */
+
+#define CAMELLIA_SIGMA1L (0xA09E667FL)
+#define CAMELLIA_SIGMA1R (0x3BCC908BL)
+#define CAMELLIA_SIGMA2L (0xB67AE858L)
+#define CAMELLIA_SIGMA2R (0x4CAA73B2L)
+#define CAMELLIA_SIGMA3L (0xC6EF372FL)
+#define CAMELLIA_SIGMA3R (0xE94F82BEL)
+#define CAMELLIA_SIGMA4L (0x54FF53A5L)
+#define CAMELLIA_SIGMA4R (0xF1D36F1CL)
+#define CAMELLIA_SIGMA5L (0x10E527FAL)
+#define CAMELLIA_SIGMA5R (0xDE682D1DL)
+#define CAMELLIA_SIGMA6L (0xB05688C2L)
+#define CAMELLIA_SIGMA6R (0xB3E6C1FDL)
+
+/*
+ *  macros
+ */
+
+# define GETU32(pt) (((u32)(pt)[0] << 24)	\
+		     ^ ((u32)(pt)[1] << 16)	\
+		     ^ ((u32)(pt)[2] <<  8)	\
+		     ^ ((u32)(pt)[3]))
+
+/* rotation right shift 1byte */
+#define CAMELLIA_RR8(x) (((x) >> 8) + ((x) << 24))
+/* rotation left shift 1bit */
+#define CAMELLIA_RL1(x) (((x) << 1) + ((x) >> 31))
+/* rotation left shift 1byte */
+#define CAMELLIA_RL8(x) (((x) << 8) + ((x) >> 24))
+
+#define CAMELLIA_ROLDQ(ll, lr, rl, rr, w0, w1, bits)	\
+    do {						\
+	w0 = ll;					\
+	ll = (ll << bits) + (lr >> (32 - bits));	\
+	lr = (lr << bits) + (rl >> (32 - bits));	\
+	rl = (rl << bits) + (rr >> (32 - bits));	\
+	rr = (rr << bits) + (w0 >> (32 - bits));	\
+    } while(0)
+
+#define CAMELLIA_ROLDQo32(ll, lr, rl, rr, w0, w1, bits)	\
+    do {						\
+	w0 = ll;					\
+	w1 = lr;					\
+	ll = (lr << (bits - 32)) + (rl >> (64 - bits));	\
+	lr = (rl << (bits - 32)) + (rr >> (64 - bits));	\
+	rl = (rr << (bits - 32)) + (w0 >> (64 - bits));	\
+	rr = (w0 << (bits - 32)) + (w1 >> (64 - bits));	\
+    } while(0)
+
+
+#define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
+    do {							\
+	il = xl ^ kl;						\
+	ir = xr ^ kr;						\
+	t0 = il >> 16;						\
+	t1 = ir >> 16;						\
+	yl = camellia_sp1110[ir & 0xff]				\
+	   ^ camellia_sp0222[(t1 >> 8) & 0xff]			\
+	   ^ camellia_sp3033[t1 & 0xff]				\
+	   ^ camellia_sp4404[(ir >> 8) & 0xff];			\
+	yr = camellia_sp1110[(t0 >> 8) & 0xff]			\
+	   ^ camellia_sp0222[t0 & 0xff]				\
+	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
+	   ^ camellia_sp4404[il & 0xff];			\
+	yl ^= yr;						\
+	yr = CAMELLIA_RR8(yr);					\
+	yr ^= yl;						\
+    } while(0)
+
+
+/*
+ * for speed up
+ *
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll;							\
+	t2 = krr;							\
+	t0 &= ll;							\
+	t2 |= rr;							\
+	rl ^= t2;							\
+	lr ^= CAMELLIA_RL1(t0);						\
+	t3 = krl;							\
+	t1 = klr;							\
+	t3 &= rl;							\
+	t1 |= lr;							\
+	ll ^= t1;							\
+	rr ^= CAMELLIA_RL1(t3);						\
+    } while(0)
+
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
+    do {								\
+	ir =  camellia_sp1110[xr & 0xff];				\
+	il =  camellia_sp1110[(xl>>24) & 0xff];				\
+	ir ^= camellia_sp0222[(xr>>24) & 0xff];				\
+	il ^= camellia_sp0222[(xl>>16) & 0xff];				\
+	ir ^= camellia_sp3033[(xr>>16) & 0xff];				\
+	il ^= camellia_sp3033[(xl>>8) & 0xff];				\
+	ir ^= camellia_sp4404[(xr>>8) & 0xff];				\
+	il ^= camellia_sp4404[xl & 0xff];				\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= CAMELLIA_RR8(il) ^ ir;					\
+    } while(0)
+
+
+#define CAMELLIA_SUBKEY_L(INDEX) (subkey[(INDEX)*2])
+#define CAMELLIA_SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
 static void camellia_setup128(const unsigned char *key, u32 *subkey)
 {
@@ -495,47 +441,47 @@ static void camellia_setup128(const unsi
 	 * generate KL dependent subkeys
 	 */
 	/* kw1 */
-	SUBL(0) = kll; SUBR(0) = klr;
+	subL[0] = kll; subR[0] = klr;
 	/* kw2 */
-	SUBL(1) = krl; SUBR(1) = krr;
+	subL[1] = krl; subR[1] = krr;
 	/* rotation left shift 15bit */
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k3 */
-	SUBL(4) = kll; SUBR(4) = klr;
+	subL[4] = kll; subR[4] = klr;
 	/* k4 */
-	SUBL(5) = krl; SUBR(5) = krr;
+	subL[5] = krl; subR[5] = krr;
 	/* rotation left shift 15+30bit */
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 30);
 	/* k7 */
-	SUBL(10) = kll; SUBR(10) = klr;
+	subL[10] = kll; subR[10] = klr;
 	/* k8 */
-	SUBL(11) = krl; SUBR(11) = krr;
+	subL[11] = krl; subR[11] = krr;
 	/* rotation left shift 15+30+15bit */
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k10 */
-	SUBL(13) = krl; SUBR(13) = krr;
+	subL[13] = krl; subR[13] = krr;
 	/* rotation left shift 15+30+15+17 bit */
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 17);
 	/* kl3 */
-	SUBL(16) = kll; SUBR(16) = klr;
+	subL[16] = kll; subR[16] = klr;
 	/* kl4 */
-	SUBL(17) = krl; SUBR(17) = krr;
+	subL[17] = krl; subR[17] = krr;
 	/* rotation left shift 15+30+15+17+17 bit */
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 17);
 	/* k13 */
-	SUBL(18) = kll; SUBR(18) = klr;
+	subL[18] = kll; subR[18] = klr;
 	/* k14 */
-	SUBL(19) = krl; SUBR(19) = krr;
+	subL[19] = krl; subR[19] = krr;
 	/* rotation left shift 15+30+15+17+17+17 bit */
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 17);
 	/* k17 */
-	SUBL(22) = kll; SUBR(22) = klr;
+	subL[22] = kll; subR[22] = klr;
 	/* k18 */
-	SUBL(23) = krl; SUBR(23) = krr;
+	subL[23] = krl; subR[23] = krr;
 
 	/* generate KA */
-	kll = SUBL(0); klr = SUBR(0);
-	krl = SUBL(1); krr = SUBR(1);
+	kll = subL[0]; klr = subR[0];
+	krl = subL[1]; krr = subR[1];
 	CAMELLIA_F(kll, klr,
 		   CAMELLIA_SIGMA1L, CAMELLIA_SIGMA1R,
 		   w0, w1, il, ir, t0, t1);
@@ -555,152 +501,150 @@ static void camellia_setup128(const unsi
 
 	/* generate KA dependent subkeys */
 	/* k1, k2 */
-	SUBL(2) = kll; SUBR(2) = klr;
-	SUBL(3) = krl; SUBR(3) = krr;
+	subL[2] = kll; subR[2] = klr;
+	subL[3] = krl; subR[3] = krr;
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k5,k6 */
-	SUBL(6) = kll; SUBR(6) = klr;
-	SUBL(7) = krl; SUBR(7) = krr;
+	subL[6] = kll; subR[6] = klr;
+	subL[7] = krl; subR[7] = krr;
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* kl1, kl2 */
-	SUBL(8) = kll; SUBR(8) = klr;
-	SUBL(9) = krl; SUBR(9) = krr;
+	subL[8] = kll; subR[8] = klr;
+	subL[9] = krl; subR[9] = krr;
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k9 */
-	SUBL(12) = kll; SUBR(12) = klr;
+	subL[12] = kll; subR[12] = klr;
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k11, k12 */
-	SUBL(14) = kll; SUBR(14) = klr;
-	SUBL(15) = krl; SUBR(15) = krr;
+	subL[14] = kll; subR[14] = klr;
+	subL[15] = krl; subR[15] = krr;
 	CAMELLIA_ROLDQo32(kll, klr, krl, krr, w0, w1, 34);
 	/* k15, k16 */
-	SUBL(20) = kll; SUBR(20) = klr;
-	SUBL(21) = krl; SUBR(21) = krr;
+	subL[20] = kll; subR[20] = klr;
+	subL[21] = krl; subR[21] = krr;
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 17);
 	/* kw3, kw4 */
-	SUBL(24) = kll; SUBR(24) = klr;
-	SUBL(25) = krl; SUBR(25) = krr;
-
+	subL[24] = kll; subR[24] = klr;
+	subL[25] = krl; subR[25] = krr;
 
 	/* absorb kw2 to other subkeys */
 	/* round 2 */
-	SUBL(3) ^= SUBL(1); SUBR(3) ^= SUBR(1);
+	subL[3] ^= subL[1]; subR[3] ^= subR[1];
 	/* round 4 */
-	SUBL(5) ^= SUBL(1); SUBR(5) ^= SUBR(1);
+	subL[5] ^= subL[1]; subR[5] ^= subR[1];
 	/* round 6 */
-	SUBL(7) ^= SUBL(1); SUBR(7) ^= SUBR(1);
-	SUBL(1) ^= SUBR(1) & ~SUBR(9);
-	dw = SUBL(1) & SUBL(9),
-		SUBR(1) ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl2) */
+	subL[7] ^= subL[1]; subR[7] ^= subR[1];
+	subL[1] ^= subR[1] & ~subR[9];
+	dw = subL[1] & subL[9],
+		subR[1] ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl2) */
 	/* round 8 */
-	SUBL(11) ^= SUBL(1); SUBR(11) ^= SUBR(1);
+	subL[11] ^= subL[1]; subR[11] ^= subR[1];
 	/* round 10 */
-	SUBL(13) ^= SUBL(1); SUBR(13) ^= SUBR(1);
+	subL[13] ^= subL[1]; subR[13] ^= subR[1];
 	/* round 12 */
-	SUBL(15) ^= SUBL(1); SUBR(15) ^= SUBR(1);
-	SUBL(1) ^= SUBR(1) & ~SUBR(17);
-	dw = SUBL(1) & SUBL(17),
-		SUBR(1) ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl4) */
+	subL[15] ^= subL[1]; subR[15] ^= subR[1];
+	subL[1] ^= subR[1] & ~subR[17];
+	dw = subL[1] & subL[17],
+		subR[1] ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl4) */
 	/* round 14 */
-	SUBL(19) ^= SUBL(1); SUBR(19) ^= SUBR(1);
+	subL[19] ^= subL[1]; subR[19] ^= subR[1];
 	/* round 16 */
-	SUBL(21) ^= SUBL(1); SUBR(21) ^= SUBR(1);
+	subL[21] ^= subL[1]; subR[21] ^= subR[1];
 	/* round 18 */
-	SUBL(23) ^= SUBL(1); SUBR(23) ^= SUBR(1);
+	subL[23] ^= subL[1]; subR[23] ^= subR[1];
 	/* kw3 */
-	SUBL(24) ^= SUBL(1); SUBR(24) ^= SUBR(1);
+	subL[24] ^= subL[1]; subR[24] ^= subR[1];
 
 	/* absorb kw4 to other subkeys */
-	kw4l = SUBL(25); kw4r = SUBR(25);
+	kw4l = subL[25]; kw4r = subR[25];
 	/* round 17 */
-	SUBL(22) ^= kw4l; SUBR(22) ^= kw4r;
+	subL[22] ^= kw4l; subR[22] ^= kw4r;
 	/* round 15 */
-	SUBL(20) ^= kw4l; SUBR(20) ^= kw4r;
+	subL[20] ^= kw4l; subR[20] ^= kw4r;
 	/* round 13 */
-	SUBL(18) ^= kw4l; SUBR(18) ^= kw4r;
-	kw4l ^= kw4r & ~SUBR(16);
-	dw = kw4l & SUBL(16),
+	subL[18] ^= kw4l; subR[18] ^= kw4r;
+	kw4l ^= kw4r & ~subR[16];
+	dw = kw4l & subL[16],
 		kw4r ^= CAMELLIA_RL1(dw); /* modified for FL(kl3) */
 	/* round 11 */
-	SUBL(14) ^= kw4l; SUBR(14) ^= kw4r;
+	subL[14] ^= kw4l; subR[14] ^= kw4r;
 	/* round 9 */
-	SUBL(12) ^= kw4l; SUBR(12) ^= kw4r;
+	subL[12] ^= kw4l; subR[12] ^= kw4r;
 	/* round 7 */
-	SUBL(10) ^= kw4l; SUBR(10) ^= kw4r;
-	kw4l ^= kw4r & ~SUBR(8);
-	dw = kw4l & SUBL(8),
+	subL[10] ^= kw4l; subR[10] ^= kw4r;
+	kw4l ^= kw4r & ~subR[8];
+	dw = kw4l & subL[8],
 		kw4r ^= CAMELLIA_RL1(dw); /* modified for FL(kl1) */
 	/* round 5 */
-	SUBL(6) ^= kw4l; SUBR(6) ^= kw4r;
+	subL[6] ^= kw4l; subR[6] ^= kw4r;
 	/* round 3 */
-	SUBL(4) ^= kw4l; SUBR(4) ^= kw4r;
+	subL[4] ^= kw4l; subR[4] ^= kw4r;
 	/* round 1 */
-	SUBL(2) ^= kw4l; SUBR(2) ^= kw4r;
+	subL[2] ^= kw4l; subR[2] ^= kw4r;
 	/* kw1 */
-	SUBL(0) ^= kw4l; SUBR(0) ^= kw4r;
-
+	subL[0] ^= kw4l; subR[0] ^= kw4r;
 
 	/* key XOR is end of F-function */
-	CAMELLIA_SUBKEY_L(0) = SUBL(0) ^ SUBL(2);/* kw1 */
-	CAMELLIA_SUBKEY_R(0) = SUBR(0) ^ SUBR(2);
-	CAMELLIA_SUBKEY_L(2) = SUBL(3);       /* round 1 */
-	CAMELLIA_SUBKEY_R(2) = SUBR(3);
-	CAMELLIA_SUBKEY_L(3) = SUBL(2) ^ SUBL(4); /* round 2 */
-	CAMELLIA_SUBKEY_R(3) = SUBR(2) ^ SUBR(4);
-	CAMELLIA_SUBKEY_L(4) = SUBL(3) ^ SUBL(5); /* round 3 */
-	CAMELLIA_SUBKEY_R(4) = SUBR(3) ^ SUBR(5);
-	CAMELLIA_SUBKEY_L(5) = SUBL(4) ^ SUBL(6); /* round 4 */
-	CAMELLIA_SUBKEY_R(5) = SUBR(4) ^ SUBR(6);
-	CAMELLIA_SUBKEY_L(6) = SUBL(5) ^ SUBL(7); /* round 5 */
-	CAMELLIA_SUBKEY_R(6) = SUBR(5) ^ SUBR(7);
-	tl = SUBL(10) ^ (SUBR(10) & ~SUBR(8));
-	dw = tl & SUBL(8),  /* FL(kl1) */
-		tr = SUBR(10) ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(7) = SUBL(6) ^ tl; /* round 6 */
-	CAMELLIA_SUBKEY_R(7) = SUBR(6) ^ tr;
-	CAMELLIA_SUBKEY_L(8) = SUBL(8);       /* FL(kl1) */
-	CAMELLIA_SUBKEY_R(8) = SUBR(8);
-	CAMELLIA_SUBKEY_L(9) = SUBL(9);       /* FLinv(kl2) */
-	CAMELLIA_SUBKEY_R(9) = SUBR(9);
-	tl = SUBL(7) ^ (SUBR(7) & ~SUBR(9));
-	dw = tl & SUBL(9),  /* FLinv(kl2) */
-		tr = SUBR(7) ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(10) = tl ^ SUBL(11); /* round 7 */
-	CAMELLIA_SUBKEY_R(10) = tr ^ SUBR(11);
-	CAMELLIA_SUBKEY_L(11) = SUBL(10) ^ SUBL(12); /* round 8 */
-	CAMELLIA_SUBKEY_R(11) = SUBR(10) ^ SUBR(12);
-	CAMELLIA_SUBKEY_L(12) = SUBL(11) ^ SUBL(13); /* round 9 */
-	CAMELLIA_SUBKEY_R(12) = SUBR(11) ^ SUBR(13);
-	CAMELLIA_SUBKEY_L(13) = SUBL(12) ^ SUBL(14); /* round 10 */
-	CAMELLIA_SUBKEY_R(13) = SUBR(12) ^ SUBR(14);
-	CAMELLIA_SUBKEY_L(14) = SUBL(13) ^ SUBL(15); /* round 11 */
-	CAMELLIA_SUBKEY_R(14) = SUBR(13) ^ SUBR(15);
-	tl = SUBL(18) ^ (SUBR(18) & ~SUBR(16));
-	dw = tl & SUBL(16), /* FL(kl3) */
-		tr = SUBR(18) ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(15) = SUBL(14) ^ tl; /* round 12 */
-	CAMELLIA_SUBKEY_R(15) = SUBR(14) ^ tr;
-	CAMELLIA_SUBKEY_L(16) = SUBL(16);     /* FL(kl3) */
-	CAMELLIA_SUBKEY_R(16) = SUBR(16);
-	CAMELLIA_SUBKEY_L(17) = SUBL(17);     /* FLinv(kl4) */
-	CAMELLIA_SUBKEY_R(17) = SUBR(17);
-	tl = SUBL(15) ^ (SUBR(15) & ~SUBR(17));
-	dw = tl & SUBL(17), /* FLinv(kl4) */
-		tr = SUBR(15) ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(18) = tl ^ SUBL(19); /* round 13 */
-	CAMELLIA_SUBKEY_R(18) = tr ^ SUBR(19);
-	CAMELLIA_SUBKEY_L(19) = SUBL(18) ^ SUBL(20); /* round 14 */
-	CAMELLIA_SUBKEY_R(19) = SUBR(18) ^ SUBR(20);
-	CAMELLIA_SUBKEY_L(20) = SUBL(19) ^ SUBL(21); /* round 15 */
-	CAMELLIA_SUBKEY_R(20) = SUBR(19) ^ SUBR(21);
-	CAMELLIA_SUBKEY_L(21) = SUBL(20) ^ SUBL(22); /* round 16 */
-	CAMELLIA_SUBKEY_R(21) = SUBR(20) ^ SUBR(22);
-	CAMELLIA_SUBKEY_L(22) = SUBL(21) ^ SUBL(23); /* round 17 */
-	CAMELLIA_SUBKEY_R(22) = SUBR(21) ^ SUBR(23);
-	CAMELLIA_SUBKEY_L(23) = SUBL(22);     /* round 18 */
-	CAMELLIA_SUBKEY_R(23) = SUBR(22);
-	CAMELLIA_SUBKEY_L(24) = SUBL(24) ^ SUBL(23); /* kw3 */
-	CAMELLIA_SUBKEY_R(24) = SUBR(24) ^ SUBR(23);
+	CAMELLIA_SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
+	CAMELLIA_SUBKEY_R(0) = subR[0] ^ subR[2];
+	CAMELLIA_SUBKEY_L(2) = subL[3];       /* round 1 */
+	CAMELLIA_SUBKEY_R(2) = subR[3];
+	CAMELLIA_SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
+	CAMELLIA_SUBKEY_R(3) = subR[2] ^ subR[4];
+	CAMELLIA_SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
+	CAMELLIA_SUBKEY_R(4) = subR[3] ^ subR[5];
+	CAMELLIA_SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
+	CAMELLIA_SUBKEY_R(5) = subR[4] ^ subR[6];
+	CAMELLIA_SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
+	CAMELLIA_SUBKEY_R(6) = subR[5] ^ subR[7];
+	tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = tl & subL[8],  /* FL(kl1) */
+		tr = subR[10] ^ CAMELLIA_RL1(dw);
+	CAMELLIA_SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
+	CAMELLIA_SUBKEY_R(7) = subR[6] ^ tr;
+	CAMELLIA_SUBKEY_L(8) = subL[8];       /* FL(kl1) */
+	CAMELLIA_SUBKEY_R(8) = subR[8];
+	CAMELLIA_SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
+	CAMELLIA_SUBKEY_R(9) = subR[9];
+	tl = subL[7] ^ (subR[7] & ~subR[9]);
+	dw = tl & subL[9],  /* FLinv(kl2) */
+		tr = subR[7] ^ CAMELLIA_RL1(dw);
+	CAMELLIA_SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
+	CAMELLIA_SUBKEY_R(10) = tr ^ subR[11];
+	CAMELLIA_SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
+	CAMELLIA_SUBKEY_R(11) = subR[10] ^ subR[12];
+	CAMELLIA_SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
+	CAMELLIA_SUBKEY_R(12) = subR[11] ^ subR[13];
+	CAMELLIA_SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
+	CAMELLIA_SUBKEY_R(13) = subR[12] ^ subR[14];
+	CAMELLIA_SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
+	CAMELLIA_SUBKEY_R(14) = subR[13] ^ subR[15];
+	tl = subL[18] ^ (subR[18] & ~subR[16]);
+	dw = tl & subL[16], /* FL(kl3) */
+		tr = subR[18] ^ CAMELLIA_RL1(dw);
+	CAMELLIA_SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
+	CAMELLIA_SUBKEY_R(15) = subR[14] ^ tr;
+	CAMELLIA_SUBKEY_L(16) = subL[16];     /* FL(kl3) */
+	CAMELLIA_SUBKEY_R(16) = subR[16];
+	CAMELLIA_SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
+	CAMELLIA_SUBKEY_R(17) = subR[17];
+	tl = subL[15] ^ (subR[15] & ~subR[17]);
+	dw = tl & subL[17], /* FLinv(kl4) */
+		tr = subR[15] ^ CAMELLIA_RL1(dw);
+	CAMELLIA_SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
+	CAMELLIA_SUBKEY_R(18) = tr ^ subR[19];
+	CAMELLIA_SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
+	CAMELLIA_SUBKEY_R(19) = subR[18] ^ subR[20];
+	CAMELLIA_SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
+	CAMELLIA_SUBKEY_R(20) = subR[19] ^ subR[21];
+	CAMELLIA_SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
+	CAMELLIA_SUBKEY_R(21) = subR[20] ^ subR[22];
+	CAMELLIA_SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
+	CAMELLIA_SUBKEY_R(22) = subR[21] ^ subR[23];
+	CAMELLIA_SUBKEY_L(23) = subL[22];     /* round 18 */
+	CAMELLIA_SUBKEY_R(23) = subR[22];
+	CAMELLIA_SUBKEY_L(24) = subL[24] ^ subL[23]; /* kw3 */
+	CAMELLIA_SUBKEY_R(24) = subR[24] ^ subR[23];
 
 	/* apply the inverse of the last half of P-function */
 	dw = CAMELLIA_SUBKEY_L(2) ^ CAMELLIA_SUBKEY_R(2),
@@ -775,11 +719,8 @@ static void camellia_setup128(const unsi
 		dw = CAMELLIA_RL8(dw);/* round 18 */
 	CAMELLIA_SUBKEY_R(23) = CAMELLIA_SUBKEY_L(23) ^ dw,
 		CAMELLIA_SUBKEY_L(23) = dw;
-
-	return;
 }
 
-
 static void camellia_setup256(const unsigned char *key, u32 *subkey)
 {
 	u32 kll,klr,krl,krr;           /* left half of key */
@@ -805,56 +746,56 @@ static void camellia_setup256(const unsi
 
 	/* generate KL dependent subkeys */
 	/* kw1 */
-	SUBL(0) = kll; SUBR(0) = klr;
+	subL[0] = kll; subR[0] = klr;
 	/* kw2 */
-	SUBL(1) = krl; SUBR(1) = krr;
+	subL[1] = krl; subR[1] = krr;
 	CAMELLIA_ROLDQo32(kll, klr, krl, krr, w0, w1, 45);
 	/* k9 */
-	SUBL(12) = kll; SUBR(12) = klr;
+	subL[12] = kll; subR[12] = klr;
 	/* k10 */
-	SUBL(13) = krl; SUBR(13) = krr;
+	subL[13] = krl; subR[13] = krr;
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* kl3 */
-	SUBL(16) = kll; SUBR(16) = klr;
+	subL[16] = kll; subR[16] = klr;
 	/* kl4 */
-	SUBL(17) = krl; SUBR(17) = krr;
+	subL[17] = krl; subR[17] = krr;
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 17);
 	/* k17 */
-	SUBL(22) = kll; SUBR(22) = klr;
+	subL[22] = kll; subR[22] = klr;
 	/* k18 */
-	SUBL(23) = krl; SUBR(23) = krr;
+	subL[23] = krl; subR[23] = krr;
 	CAMELLIA_ROLDQo32(kll, klr, krl, krr, w0, w1, 34);
 	/* k23 */
-	SUBL(30) = kll; SUBR(30) = klr;
+	subL[30] = kll; subR[30] = klr;
 	/* k24 */
-	SUBL(31) = krl; SUBR(31) = krr;
+	subL[31] = krl; subR[31] = krr;
 
 	/* generate KR dependent subkeys */
 	CAMELLIA_ROLDQ(krll, krlr, krrl, krrr, w0, w1, 15);
 	/* k3 */
-	SUBL(4) = krll; SUBR(4) = krlr;
+	subL[4] = krll; subR[4] = krlr;
 	/* k4 */
-	SUBL(5) = krrl; SUBR(5) = krrr;
+	subL[5] = krrl; subR[5] = krrr;
 	CAMELLIA_ROLDQ(krll, krlr, krrl, krrr, w0, w1, 15);
 	/* kl1 */
-	SUBL(8) = krll; SUBR(8) = krlr;
+	subL[8] = krll; subR[8] = krlr;
 	/* kl2 */
-	SUBL(9) = krrl; SUBR(9) = krrr;
+	subL[9] = krrl; subR[9] = krrr;
 	CAMELLIA_ROLDQ(krll, krlr, krrl, krrr, w0, w1, 30);
 	/* k13 */
-	SUBL(18) = krll; SUBR(18) = krlr;
+	subL[18] = krll; subR[18] = krlr;
 	/* k14 */
-	SUBL(19) = krrl; SUBR(19) = krrr;
+	subL[19] = krrl; subR[19] = krrr;
 	CAMELLIA_ROLDQo32(krll, krlr, krrl, krrr, w0, w1, 34);
 	/* k19 */
-	SUBL(26) = krll; SUBR(26) = krlr;
+	subL[26] = krll; subR[26] = krlr;
 	/* k20 */
-	SUBL(27) = krrl; SUBR(27) = krrr;
+	subL[27] = krrl; subR[27] = krrr;
 	CAMELLIA_ROLDQo32(krll, krlr, krrl, krrr, w0, w1, 34);
 
 	/* generate KA */
-	kll = SUBL(0) ^ krll; klr = SUBR(0) ^ krlr;
-	krl = SUBL(1) ^ krrl; krr = SUBR(1) ^ krrr;
+	kll = subL[0] ^ krll; klr = subR[0] ^ krlr;
+	krl = subL[1] ^ krrl; krr = subR[1] ^ krrr;
 	CAMELLIA_F(kll, klr,
 		   CAMELLIA_SIGMA1L, CAMELLIA_SIGMA1R,
 		   w0, w1, il, ir, t0, t1);
@@ -887,208 +828,207 @@ static void camellia_setup256(const unsi
 	/* generate KA dependent subkeys */
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k5 */
-	SUBL(6) = kll; SUBR(6) = klr;
+	subL[6] = kll; subR[6] = klr;
 	/* k6 */
-	SUBL(7) = krl; SUBR(7) = krr;
+	subL[7] = krl; subR[7] = krr;
 	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 30);
 	/* k11 */
-	SUBL(14) = kll; SUBR(14) = klr;
+	subL[14] = kll; subR[14] = klr;
 	/* k12 */
-	SUBL(15) = krl; SUBR(15) = krr;
+	subL[15] = krl; subR[15] = krr;
 	/* rotation left shift 32bit */
 	/* kl5 */
-	SUBL(24) = klr; SUBR(24) = krl;
+	subL[24] = klr; subR[24] = krl;
 	/* kl6 */
-	SUBL(25) = krr; SUBR(25) = kll;
+	subL[25] = krr; subR[25] = kll;
 	/* rotation left shift 49 from k11,k12 -> k21,k22 */
 	CAMELLIA_ROLDQo32(kll, klr, krl, krr, w0, w1, 49);
 	/* k21 */
-	SUBL(28) = kll; SUBR(28) = klr;
+	subL[28] = kll; subR[28] = klr;
 	/* k22 */
-	SUBL(29) = krl; SUBR(29) = krr;
+	subL[29] = krl; subR[29] = krr;
 
 	/* generate KB dependent subkeys */
 	/* k1 */
-	SUBL(2) = krll; SUBR(2) = krlr;
+	subL[2] = krll; subR[2] = krlr;
 	/* k2 */
-	SUBL(3) = krrl; SUBR(3) = krrr;
+	subL[3] = krrl; subR[3] = krrr;
 	CAMELLIA_ROLDQ(krll, krlr, krrl, krrr, w0, w1, 30);
 	/* k7 */
-	SUBL(10) = krll; SUBR(10) = krlr;
+	subL[10] = krll; subR[10] = krlr;
 	/* k8 */
-	SUBL(11) = krrl; SUBR(11) = krrr;
+	subL[11] = krrl; subR[11] = krrr;
 	CAMELLIA_ROLDQ(krll, krlr, krrl, krrr, w0, w1, 30);
 	/* k15 */
-	SUBL(20) = krll; SUBR(20) = krlr;
+	subL[20] = krll; subR[20] = krlr;
 	/* k16 */
-	SUBL(21) = krrl; SUBR(21) = krrr;
+	subL[21] = krrl; subR[21] = krrr;
 	CAMELLIA_ROLDQo32(krll, krlr, krrl, krrr, w0, w1, 51);
 	/* kw3 */
-	SUBL(32) = krll; SUBR(32) = krlr;
+	subL[32] = krll; subR[32] = krlr;
 	/* kw4 */
-	SUBL(33) = krrl; SUBR(33) = krrr;
+	subL[33] = krrl; subR[33] = krrr;
 
 	/* absorb kw2 to other subkeys */
 	/* round 2 */
-	SUBL(3) ^= SUBL(1); SUBR(3) ^= SUBR(1);
+	subL[3] ^= subL[1]; subR[3] ^= subR[1];
 	/* round 4 */
-	SUBL(5) ^= SUBL(1); SUBR(5) ^= SUBR(1);
+	subL[5] ^= subL[1]; subR[5] ^= subR[1];
 	/* round 6 */
-	SUBL(7) ^= SUBL(1); SUBR(7) ^= SUBR(1);
-	SUBL(1) ^= SUBR(1) & ~SUBR(9);
-	dw = SUBL(1) & SUBL(9),
-		SUBR(1) ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl2) */
+	subL[7] ^= subL[1]; subR[7] ^= subR[1];
+	subL[1] ^= subR[1] & ~subR[9];
+	dw = subL[1] & subL[9],
+		subR[1] ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl2) */
 	/* round 8 */
-	SUBL(11) ^= SUBL(1); SUBR(11) ^= SUBR(1);
+	subL[11] ^= subL[1]; subR[11] ^= subR[1];
 	/* round 10 */
-	SUBL(13) ^= SUBL(1); SUBR(13) ^= SUBR(1);
+	subL[13] ^= subL[1]; subR[13] ^= subR[1];
 	/* round 12 */
-	SUBL(15) ^= SUBL(1); SUBR(15) ^= SUBR(1);
-	SUBL(1) ^= SUBR(1) & ~SUBR(17);
-	dw = SUBL(1) & SUBL(17),
-		SUBR(1) ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl4) */
+	subL[15] ^= subL[1]; subR[15] ^= subR[1];
+	subL[1] ^= subR[1] & ~subR[17];
+	dw = subL[1] & subL[17],
+		subR[1] ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl4) */
 	/* round 14 */
-	SUBL(19) ^= SUBL(1); SUBR(19) ^= SUBR(1);
+	subL[19] ^= subL[1]; subR[19] ^= subR[1];
 	/* round 16 */
-	SUBL(21) ^= SUBL(1); SUBR(21) ^= SUBR(1);
+	subL[21] ^= subL[1]; subR[21] ^= subR[1];
 	/* round 18 */
-	SUBL(23) ^= SUBL(1); SUBR(23) ^= SUBR(1);
-	SUBL(1) ^= SUBR(1) & ~SUBR(25);
-	dw = SUBL(1) & SUBL(25),
-		SUBR(1) ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl6) */
+	subL[23] ^= subL[1]; subR[23] ^= subR[1];
+	subL[1] ^= subR[1] & ~subR[25];
+	dw = subL[1] & subL[25],
+		subR[1] ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl6) */
 	/* round 20 */
-	SUBL(27) ^= SUBL(1); SUBR(27) ^= SUBR(1);
+	subL[27] ^= subL[1]; subR[27] ^= subR[1];
 	/* round 22 */
-	SUBL(29) ^= SUBL(1); SUBR(29) ^= SUBR(1);
+	subL[29] ^= subL[1]; subR[29] ^= subR[1];
 	/* round 24 */
-	SUBL(31) ^= SUBL(1); SUBR(31) ^= SUBR(1);
+	subL[31] ^= subL[1]; subR[31] ^= subR[1];
 	/* kw3 */
-	SUBL(32) ^= SUBL(1); SUBR(32) ^= SUBR(1);
-
+	subL[32] ^= subL[1]; subR[32] ^= subR[1];
 
 	/* absorb kw4 to other subkeys */
-	kw4l = SUBL(33); kw4r = SUBR(33);
+	kw4l = subL[33]; kw4r = subR[33];
 	/* round 23 */
-	SUBL(30) ^= kw4l; SUBR(30) ^= kw4r;
+	subL[30] ^= kw4l; subR[30] ^= kw4r;
 	/* round 21 */
-	SUBL(28) ^= kw4l; SUBR(28) ^= kw4r;
+	subL[28] ^= kw4l; subR[28] ^= kw4r;
 	/* round 19 */
-	SUBL(26) ^= kw4l; SUBR(26) ^= kw4r;
-	kw4l ^= kw4r & ~SUBR(24);
-	dw = kw4l & SUBL(24),
+	subL[26] ^= kw4l; subR[26] ^= kw4r;
+	kw4l ^= kw4r & ~subR[24];
+	dw = kw4l & subL[24],
 		kw4r ^= CAMELLIA_RL1(dw); /* modified for FL(kl5) */
 	/* round 17 */
-	SUBL(22) ^= kw4l; SUBR(22) ^= kw4r;
+	subL[22] ^= kw4l; subR[22] ^= kw4r;
 	/* round 15 */
-	SUBL(20) ^= kw4l; SUBR(20) ^= kw4r;
+	subL[20] ^= kw4l; subR[20] ^= kw4r;
 	/* round 13 */
-	SUBL(18) ^= kw4l; SUBR(18) ^= kw4r;
-	kw4l ^= kw4r & ~SUBR(16);
-	dw = kw4l & SUBL(16),
+	subL[18] ^= kw4l; subR[18] ^= kw4r;
+	kw4l ^= kw4r & ~subR[16];
+	dw = kw4l & subL[16],
 		kw4r ^= CAMELLIA_RL1(dw); /* modified for FL(kl3) */
 	/* round 11 */
-	SUBL(14) ^= kw4l; SUBR(14) ^= kw4r;
+	subL[14] ^= kw4l; subR[14] ^= kw4r;
 	/* round 9 */
-	SUBL(12) ^= kw4l; SUBR(12) ^= kw4r;
+	subL[12] ^= kw4l; subR[12] ^= kw4r;
 	/* round 7 */
-	SUBL(10) ^= kw4l; SUBR(10) ^= kw4r;
-	kw4l ^= kw4r & ~SUBR(8);
-	dw = kw4l & SUBL(8),
+	subL[10] ^= kw4l; subR[10] ^= kw4r;
+	kw4l ^= kw4r & ~subR[8];
+	dw = kw4l & subL[8],
 		kw4r ^= CAMELLIA_RL1(dw); /* modified for FL(kl1) */
 	/* round 5 */
-	SUBL(6) ^= kw4l; SUBR(6) ^= kw4r;
+	subL[6] ^= kw4l; subR[6] ^= kw4r;
 	/* round 3 */
-	SUBL(4) ^= kw4l; SUBR(4) ^= kw4r;
+	subL[4] ^= kw4l; subR[4] ^= kw4r;
 	/* round 1 */
-	SUBL(2) ^= kw4l; SUBR(2) ^= kw4r;
+	subL[2] ^= kw4l; subR[2] ^= kw4r;
 	/* kw1 */
-	SUBL(0) ^= kw4l; SUBR(0) ^= kw4r;
+	subL[0] ^= kw4l; subR[0] ^= kw4r;
 
 	/* key XOR is end of F-function */
-	CAMELLIA_SUBKEY_L(0) = SUBL(0) ^ SUBL(2);/* kw1 */
-	CAMELLIA_SUBKEY_R(0) = SUBR(0) ^ SUBR(2);
-	CAMELLIA_SUBKEY_L(2) = SUBL(3);       /* round 1 */
-	CAMELLIA_SUBKEY_R(2) = SUBR(3);
-	CAMELLIA_SUBKEY_L(3) = SUBL(2) ^ SUBL(4); /* round 2 */
-	CAMELLIA_SUBKEY_R(3) = SUBR(2) ^ SUBR(4);
-	CAMELLIA_SUBKEY_L(4) = SUBL(3) ^ SUBL(5); /* round 3 */
-	CAMELLIA_SUBKEY_R(4) = SUBR(3) ^ SUBR(5);
-	CAMELLIA_SUBKEY_L(5) = SUBL(4) ^ SUBL(6); /* round 4 */
-	CAMELLIA_SUBKEY_R(5) = SUBR(4) ^ SUBR(6);
-	CAMELLIA_SUBKEY_L(6) = SUBL(5) ^ SUBL(7); /* round 5 */
-	CAMELLIA_SUBKEY_R(6) = SUBR(5) ^ SUBR(7);
-	tl = SUBL(10) ^ (SUBR(10) & ~SUBR(8));
-	dw = tl & SUBL(8),  /* FL(kl1) */
-		tr = SUBR(10) ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(7) = SUBL(6) ^ tl; /* round 6 */
-	CAMELLIA_SUBKEY_R(7) = SUBR(6) ^ tr;
-	CAMELLIA_SUBKEY_L(8) = SUBL(8);       /* FL(kl1) */
-	CAMELLIA_SUBKEY_R(8) = SUBR(8);
-	CAMELLIA_SUBKEY_L(9) = SUBL(9);       /* FLinv(kl2) */
-	CAMELLIA_SUBKEY_R(9) = SUBR(9);
-	tl = SUBL(7) ^ (SUBR(7) & ~SUBR(9));
-	dw = tl & SUBL(9),  /* FLinv(kl2) */
-		tr = SUBR(7) ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(10) = tl ^ SUBL(11); /* round 7 */
-	CAMELLIA_SUBKEY_R(10) = tr ^ SUBR(11);
-	CAMELLIA_SUBKEY_L(11) = SUBL(10) ^ SUBL(12); /* round 8 */
-	CAMELLIA_SUBKEY_R(11) = SUBR(10) ^ SUBR(12);
-	CAMELLIA_SUBKEY_L(12) = SUBL(11) ^ SUBL(13); /* round 9 */
-	CAMELLIA_SUBKEY_R(12) = SUBR(11) ^ SUBR(13);
-	CAMELLIA_SUBKEY_L(13) = SUBL(12) ^ SUBL(14); /* round 10 */
-	CAMELLIA_SUBKEY_R(13) = SUBR(12) ^ SUBR(14);
-	CAMELLIA_SUBKEY_L(14) = SUBL(13) ^ SUBL(15); /* round 11 */
-	CAMELLIA_SUBKEY_R(14) = SUBR(13) ^ SUBR(15);
-	tl = SUBL(18) ^ (SUBR(18) & ~SUBR(16));
-	dw = tl & SUBL(16), /* FL(kl3) */
-		tr = SUBR(18) ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(15) = SUBL(14) ^ tl; /* round 12 */
-	CAMELLIA_SUBKEY_R(15) = SUBR(14) ^ tr;
-	CAMELLIA_SUBKEY_L(16) = SUBL(16);     /* FL(kl3) */
-	CAMELLIA_SUBKEY_R(16) = SUBR(16);
-	CAMELLIA_SUBKEY_L(17) = SUBL(17);     /* FLinv(kl4) */
-	CAMELLIA_SUBKEY_R(17) = SUBR(17);
-	tl = SUBL(15) ^ (SUBR(15) & ~SUBR(17));
-	dw = tl & SUBL(17), /* FLinv(kl4) */
-		tr = SUBR(15) ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(18) = tl ^ SUBL(19); /* round 13 */
-	CAMELLIA_SUBKEY_R(18) = tr ^ SUBR(19);
-	CAMELLIA_SUBKEY_L(19) = SUBL(18) ^ SUBL(20); /* round 14 */
-	CAMELLIA_SUBKEY_R(19) = SUBR(18) ^ SUBR(20);
-	CAMELLIA_SUBKEY_L(20) = SUBL(19) ^ SUBL(21); /* round 15 */
-	CAMELLIA_SUBKEY_R(20) = SUBR(19) ^ SUBR(21);
-	CAMELLIA_SUBKEY_L(21) = SUBL(20) ^ SUBL(22); /* round 16 */
-	CAMELLIA_SUBKEY_R(21) = SUBR(20) ^ SUBR(22);
-	CAMELLIA_SUBKEY_L(22) = SUBL(21) ^ SUBL(23); /* round 17 */
-	CAMELLIA_SUBKEY_R(22) = SUBR(21) ^ SUBR(23);
-	tl = SUBL(26) ^ (SUBR(26)
-			 & ~SUBR(24));
-	dw = tl & SUBL(24), /* FL(kl5) */
-		tr = SUBR(26) ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(23) = SUBL(22) ^ tl; /* round 18 */
-	CAMELLIA_SUBKEY_R(23) = SUBR(22) ^ tr;
-	CAMELLIA_SUBKEY_L(24) = SUBL(24);     /* FL(kl5) */
-	CAMELLIA_SUBKEY_R(24) = SUBR(24);
-	CAMELLIA_SUBKEY_L(25) = SUBL(25);     /* FLinv(kl6) */
-	CAMELLIA_SUBKEY_R(25) = SUBR(25);
-	tl = SUBL(23) ^ (SUBR(23) &
-			 ~SUBR(25));
-	dw = tl & SUBL(25), /* FLinv(kl6) */
-		tr = SUBR(23) ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(26) = tl ^ SUBL(27); /* round 19 */
-	CAMELLIA_SUBKEY_R(26) = tr ^ SUBR(27);
-	CAMELLIA_SUBKEY_L(27) = SUBL(26) ^ SUBL(28); /* round 20 */
-	CAMELLIA_SUBKEY_R(27) = SUBR(26) ^ SUBR(28);
-	CAMELLIA_SUBKEY_L(28) = SUBL(27) ^ SUBL(29); /* round 21 */
-	CAMELLIA_SUBKEY_R(28) = SUBR(27) ^ SUBR(29);
-	CAMELLIA_SUBKEY_L(29) = SUBL(28) ^ SUBL(30); /* round 22 */
-	CAMELLIA_SUBKEY_R(29) = SUBR(28) ^ SUBR(30);
-	CAMELLIA_SUBKEY_L(30) = SUBL(29) ^ SUBL(31); /* round 23 */
-	CAMELLIA_SUBKEY_R(30) = SUBR(29) ^ SUBR(31);
-	CAMELLIA_SUBKEY_L(31) = SUBL(30);     /* round 24 */
-	CAMELLIA_SUBKEY_R(31) = SUBR(30);
-	CAMELLIA_SUBKEY_L(32) = SUBL(32) ^ SUBL(31); /* kw3 */
-	CAMELLIA_SUBKEY_R(32) = SUBR(32) ^ SUBR(31);
+	CAMELLIA_SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
+	CAMELLIA_SUBKEY_R(0) = subR[0] ^ subR[2];
+	CAMELLIA_SUBKEY_L(2) = subL[3];       /* round 1 */
+	CAMELLIA_SUBKEY_R(2) = subR[3];
+	CAMELLIA_SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
+	CAMELLIA_SUBKEY_R(3) = subR[2] ^ subR[4];
+	CAMELLIA_SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
+	CAMELLIA_SUBKEY_R(4) = subR[3] ^ subR[5];
+	CAMELLIA_SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
+	CAMELLIA_SUBKEY_R(5) = subR[4] ^ subR[6];
+	CAMELLIA_SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
+	CAMELLIA_SUBKEY_R(6) = subR[5] ^ subR[7];
+	tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = tl & subL[8],  /* FL(kl1) */
+		tr = subR[10] ^ CAMELLIA_RL1(dw);
+	CAMELLIA_SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
+	CAMELLIA_SUBKEY_R(7) = subR[6] ^ tr;
+	CAMELLIA_SUBKEY_L(8) = subL[8];       /* FL(kl1) */
+	CAMELLIA_SUBKEY_R(8) = subR[8];
+	CAMELLIA_SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
+	CAMELLIA_SUBKEY_R(9) = subR[9];
+	tl = subL[7] ^ (subR[7] & ~subR[9]);
+	dw = tl & subL[9],  /* FLinv(kl2) */
+		tr = subR[7] ^ CAMELLIA_RL1(dw);
+	CAMELLIA_SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
+	CAMELLIA_SUBKEY_R(10) = tr ^ subR[11];
+	CAMELLIA_SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
+	CAMELLIA_SUBKEY_R(11) = subR[10] ^ subR[12];
+	CAMELLIA_SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
+	CAMELLIA_SUBKEY_R(12) = subR[11] ^ subR[13];
+	CAMELLIA_SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
+	CAMELLIA_SUBKEY_R(13) = subR[12] ^ subR[14];
+	CAMELLIA_SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
+	CAMELLIA_SUBKEY_R(14) = subR[13] ^ subR[15];
+	tl = subL[18] ^ (subR[18] & ~subR[16]);
+	dw = tl & subL[16], /* FL(kl3) */
+		tr = subR[18] ^ CAMELLIA_RL1(dw);
+	CAMELLIA_SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
+	CAMELLIA_SUBKEY_R(15) = subR[14] ^ tr;
+	CAMELLIA_SUBKEY_L(16) = subL[16];     /* FL(kl3) */
+	CAMELLIA_SUBKEY_R(16) = subR[16];
+	CAMELLIA_SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
+	CAMELLIA_SUBKEY_R(17) = subR[17];
+	tl = subL[15] ^ (subR[15] & ~subR[17]);
+	dw = tl & subL[17], /* FLinv(kl4) */
+		tr = subR[15] ^ CAMELLIA_RL1(dw);
+	CAMELLIA_SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
+	CAMELLIA_SUBKEY_R(18) = tr ^ subR[19];
+	CAMELLIA_SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
+	CAMELLIA_SUBKEY_R(19) = subR[18] ^ subR[20];
+	CAMELLIA_SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
+	CAMELLIA_SUBKEY_R(20) = subR[19] ^ subR[21];
+	CAMELLIA_SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
+	CAMELLIA_SUBKEY_R(21) = subR[20] ^ subR[22];
+	CAMELLIA_SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
+	CAMELLIA_SUBKEY_R(22) = subR[21] ^ subR[23];
+	tl = subL[26] ^ (subR[26]
+			 & ~subR[24]);
+	dw = tl & subL[24], /* FL(kl5) */
+		tr = subR[26] ^ CAMELLIA_RL1(dw);
+	CAMELLIA_SUBKEY_L(23) = subL[22] ^ tl; /* round 18 */
+	CAMELLIA_SUBKEY_R(23) = subR[22] ^ tr;
+	CAMELLIA_SUBKEY_L(24) = subL[24];     /* FL(kl5) */
+	CAMELLIA_SUBKEY_R(24) = subR[24];
+	CAMELLIA_SUBKEY_L(25) = subL[25];     /* FLinv(kl6) */
+	CAMELLIA_SUBKEY_R(25) = subR[25];
+	tl = subL[23] ^ (subR[23] &
+			 ~subR[25]);
+	dw = tl & subL[25], /* FLinv(kl6) */
+		tr = subR[23] ^ CAMELLIA_RL1(dw);
+	CAMELLIA_SUBKEY_L(26) = tl ^ subL[27]; /* round 19 */
+	CAMELLIA_SUBKEY_R(26) = tr ^ subR[27];
+	CAMELLIA_SUBKEY_L(27) = subL[26] ^ subL[28]; /* round 20 */
+	CAMELLIA_SUBKEY_R(27) = subR[26] ^ subR[28];
+	CAMELLIA_SUBKEY_L(28) = subL[27] ^ subL[29]; /* round 21 */
+	CAMELLIA_SUBKEY_R(28) = subR[27] ^ subR[29];
+	CAMELLIA_SUBKEY_L(29) = subL[28] ^ subL[30]; /* round 22 */
+	CAMELLIA_SUBKEY_R(29) = subR[28] ^ subR[30];
+	CAMELLIA_SUBKEY_L(30) = subL[29] ^ subL[31]; /* round 23 */
+	CAMELLIA_SUBKEY_R(30) = subR[29] ^ subR[31];
+	CAMELLIA_SUBKEY_L(31) = subL[30];     /* round 24 */
+	CAMELLIA_SUBKEY_R(31) = subR[30];
+	CAMELLIA_SUBKEY_L(32) = subL[32] ^ subL[31]; /* kw3 */
+	CAMELLIA_SUBKEY_R(32) = subR[32] ^ subR[31];
 
 	/* apply the inverse of the last half of P-function */
 	dw = CAMELLIA_SUBKEY_L(2) ^ CAMELLIA_SUBKEY_R(2),
@@ -1187,8 +1127,6 @@ static void camellia_setup256(const unsi
 		dw = CAMELLIA_RL8(dw);/* round 24 */
 	CAMELLIA_SUBKEY_R(31) = CAMELLIA_SUBKEY_L(31) ^ dw,
 		CAMELLIA_SUBKEY_L(31) = dw;
-
-	return;
 }
 
 static void camellia_setup192(const unsigned char *key, u32 *subkey)
@@ -1197,20 +1135,16 @@ static void camellia_setup192(const unsi
 	u32 krll, krlr, krrl,krrr;
 
 	memcpy(kk, key, 24);
-	memcpy((unsigned char *)&krll, key+16,4);
-	memcpy((unsigned char *)&krlr, key+20,4);
+	memcpy((unsigned char *)&krll, key+16, 4);
+	memcpy((unsigned char *)&krlr, key+20, 4);
 	krrl = ~krll;
 	krrr = ~krlr;
 	memcpy(kk+24, (unsigned char *)&krrl, 4);
 	memcpy(kk+28, (unsigned char *)&krrr, 4);
 	camellia_setup256(kk, subkey);
-	return;
 }
 
 
-/**
- * Stuff related to camellia encryption/decryption
- */
 static void camellia_encrypt128(const u32 *subkey, __be32 *io_text)
 {
 	u32 il,ir,t0,t1;               /* temporary valiables */
@@ -1222,11 +1156,11 @@ static void camellia_encrypt128(const u3
 	io[2] = be32_to_cpu(io_text[2]);
 	io[3] = be32_to_cpu(io_text[3]);
 
-	/* pre whitening but absorb kw2*/
+	/* pre whitening but absorb kw2 */
 	io[0] ^= CAMELLIA_SUBKEY_L(0);
 	io[1] ^= CAMELLIA_SUBKEY_R(0);
-	/* main iteration */
 
+	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 CAMELLIA_SUBKEY_L(2),CAMELLIA_SUBKEY_R(2),
 			 io[2],io[3],il,ir,t0,t1);
@@ -1298,19 +1232,10 @@ static void camellia_encrypt128(const u3
 	io[2] ^= CAMELLIA_SUBKEY_L(24);
 	io[3] ^= CAMELLIA_SUBKEY_R(24);
 
-	t0 = io[0];
-	t1 = io[1];
-	io[0] = io[2];
-	io[1] = io[3];
-	io[2] = t0;
-	io[3] = t1;
-
-	io_text[0] = cpu_to_be32(io[0]);
-	io_text[1] = cpu_to_be32(io[1]);
-	io_text[2] = cpu_to_be32(io[2]);
-	io_text[3] = cpu_to_be32(io[3]);
-
-	return;
+	io_text[0] = cpu_to_be32(io[2]);
+	io_text[1] = cpu_to_be32(io[3]);
+	io_text[2] = cpu_to_be32(io[0]);
+	io_text[3] = cpu_to_be32(io[1]);
 }
 
 static void camellia_decrypt128(const u32 *subkey, __be32 *io_text)
@@ -1324,7 +1249,7 @@ static void camellia_decrypt128(const u3
 	io[2] = be32_to_cpu(io_text[2]);
 	io[3] = be32_to_cpu(io_text[3]);
 
-	/* pre whitening but absorb kw2*/
+	/* pre whitening but absorb kw2 */
 	io[0] ^= CAMELLIA_SUBKEY_L(24);
 	io[1] ^= CAMELLIA_SUBKEY_R(24);
 
@@ -1400,25 +1325,12 @@ static void camellia_decrypt128(const u3
 	io[2] ^= CAMELLIA_SUBKEY_L(0);
 	io[3] ^= CAMELLIA_SUBKEY_R(0);
 
-	t0 = io[0];
-	t1 = io[1];
-	io[0] = io[2];
-	io[1] = io[3];
-	io[2] = t0;
-	io[3] = t1;
-
-	io_text[0] = cpu_to_be32(io[0]);
-	io_text[1] = cpu_to_be32(io[1]);
-	io_text[2] = cpu_to_be32(io[2]);
-	io_text[3] = cpu_to_be32(io[3]);
-
-	return;
+	io_text[0] = cpu_to_be32(io[2]);
+	io_text[1] = cpu_to_be32(io[3]);
+	io_text[2] = cpu_to_be32(io[0]);
+	io_text[3] = cpu_to_be32(io[1]);
 }
 
-
-/**
- * stuff for 192 and 256bit encryption/decryption
- */
 static void camellia_encrypt256(const u32 *subkey, __be32 *io_text)
 {
 	u32 il,ir,t0,t1;           /* temporary valiables */
@@ -1430,7 +1342,7 @@ static void camellia_encrypt256(const u3
 	io[2] = be32_to_cpu(io_text[2]);
 	io[3] = be32_to_cpu(io_text[3]);
 
-	/* pre whitening but absorb kw2*/
+	/* pre whitening but absorb kw2 */
 	io[0] ^= CAMELLIA_SUBKEY_L(0);
 	io[1] ^= CAMELLIA_SUBKEY_R(0);
 
@@ -1530,22 +1442,12 @@ static void camellia_encrypt256(const u3
 	io[2] ^= CAMELLIA_SUBKEY_L(32);
 	io[3] ^= CAMELLIA_SUBKEY_R(32);
 
-	t0 = io[0];
-	t1 = io[1];
-	io[0] = io[2];
-	io[1] = io[3];
-	io[2] = t0;
-	io[3] = t1;
-
-	io_text[0] = cpu_to_be32(io[0]);
-	io_text[1] = cpu_to_be32(io[1]);
-	io_text[2] = cpu_to_be32(io[2]);
-	io_text[3] = cpu_to_be32(io[3]);
-
-	return;
+	io_text[0] = cpu_to_be32(io[2]);
+	io_text[1] = cpu_to_be32(io[3]);
+	io_text[2] = cpu_to_be32(io[0]);
+	io_text[3] = cpu_to_be32(io[1]);
 }
 
-
 static void camellia_decrypt256(const u32 *subkey, __be32 *io_text)
 {
 	u32 il,ir,t0,t1;           /* temporary valiables */
@@ -1557,7 +1459,7 @@ static void camellia_decrypt256(const u3
 	io[2] = be32_to_cpu(io_text[2]);
 	io[3] = be32_to_cpu(io_text[3]);
 
-	/* pre whitening but absorb kw2*/
+	/* pre whitening but absorb kw2 */
 	io[0] ^= CAMELLIA_SUBKEY_L(32);
 	io[1] ^= CAMELLIA_SUBKEY_R(32);
 
@@ -1657,22 +1559,18 @@ static void camellia_decrypt256(const u3
 	io[2] ^= CAMELLIA_SUBKEY_L(0);
 	io[3] ^= CAMELLIA_SUBKEY_R(0);
 
-	t0 = io[0];
-	t1 = io[1];
-	io[0] = io[2];
-	io[1] = io[3];
-	io[2] = t0;
-	io[3] = t1;
-
-	io_text[0] = cpu_to_be32(io[0]);
-	io_text[1] = cpu_to_be32(io[1]);
-	io_text[2] = cpu_to_be32(io[2]);
-	io_text[3] = cpu_to_be32(io[3]);
-
-	return;
+	io_text[0] = cpu_to_be32(io[2]);
+	io_text[1] = cpu_to_be32(io[3]);
+	io_text[2] = cpu_to_be32(io[0]);
+	io_text[3] = cpu_to_be32(io[1]);
 }
 
 
+struct camellia_ctx {
+	int key_length;
+	u32 key_table[CAMELLIA_TABLE_BYTE_LEN / 4];
+};
+
 static int
 camellia_set_key(struct crypto_tfm *tfm, const u8 *in_key,
 		 unsigned int key_len)
@@ -1688,7 +1586,7 @@ camellia_set_key(struct crypto_tfm *tfm,
 
 	cctx->key_length = key_len;
 
-	switch(key_len) {
+	switch (key_len) {
 	case 16:
 		camellia_setup128(key, cctx->key_table);
 		break;
@@ -1698,14 +1596,11 @@ camellia_set_key(struct crypto_tfm *tfm,
 	case 32:
 		camellia_setup256(key, cctx->key_table);
 		break;
-	default:
-		break;
 	}
 
 	return 0;
 }
 
-
 static void camellia_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 {
 	const struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
@@ -1725,14 +1620,11 @@ static void camellia_encrypt(struct cryp
 	case 32:
 		camellia_encrypt256(cctx->key_table, tmp);
 		break;
-	default:
-		break;
 	}
 
 	memcpy(dst, tmp, CAMELLIA_BLOCK_SIZE);
 }
 
-
 static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 {
 	const struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
@@ -1752,14 +1644,11 @@ static void camellia_decrypt(struct cryp
 	case 32:
 		camellia_decrypt256(cctx->key_table, tmp);
 		break;
-	default:
-		break;
 	}
 
 	memcpy(dst, tmp, CAMELLIA_BLOCK_SIZE);
 }
 
-
 static struct crypto_alg camellia_alg = {
 	.cra_name		=	"camellia",
 	.cra_driver_name	=	"camellia-generic",
@@ -1786,16 +1675,13 @@ static int __init camellia_init(void)
 	return crypto_register_alg(&camellia_alg);
 }
 
-
 static void __exit camellia_fini(void)
 {
 	crypto_unregister_alg(&camellia_alg);
 }
 
-
 module_init(camellia_init);
 module_exit(camellia_fini);
 
-
 MODULE_DESCRIPTION("Camellia Cipher Algorithm");
 MODULE_LICENSE("GPL");
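
The final hunks of each en/decrypt routine replace "swap the 64-bit halves, then store" with stores that read the halves in swapped order. Below is a minimal sketch that checks the two output paths agree. It assumes plain C99. to_be32() is a hypothetical stand-in for the kernel's cpu_to_be32(), and store_old()/store_new() are illustrative names, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for cpu_to_be32(): returns a value whose
 * in-memory byte order is big-endian, on any host. */
static uint32_t to_be32(uint32_t x)
{
	unsigned char b[4] = {
		(unsigned char)(x >> 24), (unsigned char)(x >> 16),
		(unsigned char)(x >> 8),  (unsigned char)x
	};
	uint32_t r;

	memcpy(&r, b, sizeof(r));
	return r;
}

/* Old output path: swap the halves in place, then convert in order. */
static void store_old(uint32_t io[4], uint32_t out[4])
{
	uint32_t t0 = io[0], t1 = io[1];

	io[0] = io[2];
	io[1] = io[3];
	io[2] = t0;
	io[3] = t1;
	for (int i = 0; i < 4; i++)
		out[i] = to_be32(io[i]);
}

/* New output path: fold the swap into the stores themselves. */
static void store_new(const uint32_t io[4], uint32_t out[4])
{
	out[0] = to_be32(io[2]);
	out[1] = to_be32(io[3]);
	out[2] = to_be32(io[0]);
	out[3] = to_be32(io[1]);
}
```

The io[] words are not read again after the final stores, so writing them out directly in the order 2, 3, 0, 1 gives the same result without the temporaries.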


* [PATCH 2/5] camellia: cleanup
  2007-10-25 11:43 [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization Denys Vlasenko
  2007-10-25 11:45 ` [PATCH 1/5] camellia: cleanup Denys Vlasenko
@ 2007-10-25 11:45 ` Denys Vlasenko
  2007-10-26  8:44   ` Noriaki TAKAMIYA
  2007-11-06 14:19   ` Herbert Xu
  2007-10-25 11:46 ` [PATCH 3/5] " Denys Vlasenko
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-10-25 11:45 UTC
  To: Herbert Xu; +Cc: linux-crypto

[-- Attachment #1: Type: text/plain, Size: 601 bytes --]

On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> Hi Herbert,
> 
> Please review and maybe propagate upstream following patches.
> 
> camellia2.diff
>     Rename some macros to shorter names: CAMELLIA_RR8 -> ROR8,
>     making it easier to understand that it is just a right rotation,
>     nothing camellia-specific in it.
>     CAMELLIA_SUBKEY_L() -> SUBKEY_L() - just shorter.
> 
>     Move be32 <-> cpu conversions out of en/decrypt128/256 and into
>     camellia_en/decrypt - no reason to have that code duplicated twice.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda
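
The renamed macros are ordinary 32-bit rotations. The sketch below restates ROR8/ROL1/ROL8 as functions, using the same expressions as the patch, and checks them against a generic rotate-left. It assumes C99 <stdint.h> types. rotl32() is a hypothetical helper and not part of the patch; the kernel's <linux/bitops.h> also provides generic rol32()/ror32() helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Same expressions as the ROR8/ROL1/ROL8 macros in the patch.
 * '+' behaves like '|' here because the two shifted parts never overlap. */
static uint32_t ror8(uint32_t x) { return (x >> 8) + (x << 24); }
static uint32_t rol1(uint32_t x) { return (x << 1) + (x >> 31); }
static uint32_t rol8(uint32_t x) { return (x << 8) + (x >> 24); }

/* Hypothetical generic rotate-left by n bits (n taken mod 32);
 * n == 0 is special-cased to avoid an undefined shift by 32. */
static uint32_t rotl32(uint32_t x, unsigned int n)
{
	n &= 31;
	return n ? (x << n) | (x >> (32 - n)) : x;
}
```

A right rotation by 8 equals a left rotation by 24, so ror8(x) == rotl32(x, 24). Nothing in these helpers is cipher-specific.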

[-- Attachment #2: camellia2.diff --]
[-- Type: text/x-diff, Size: 56541 bytes --]

--- linux-2.6.23.src/crypto/camellia1.c	2007-10-24 19:03:05.000000000 +0100
+++ linux-2.6.23.src/crypto/camellia.c	2007-10-24 19:03:22.000000000 +0100
@@ -336,13 +336,13 @@ static const u32 camellia_sp4404[256] = 
 		     ^ ((u32)(pt)[3]))
 
 /* rotation right shift 1byte */
-#define CAMELLIA_RR8(x) (((x) >> 8) + ((x) << 24))
+#define ROR8(x) (((x) >> 8) + ((x) << 24))
 /* rotation left shift 1bit */
-#define CAMELLIA_RL1(x) (((x) << 1) + ((x) >> 31))
+#define ROL1(x) (((x) << 1) + ((x) >> 31))
 /* rotation left shift 1byte */
-#define CAMELLIA_RL8(x) (((x) << 8) + ((x) >> 24))
+#define ROL8(x) (((x) << 8) + ((x) >> 24))
 
-#define CAMELLIA_ROLDQ(ll, lr, rl, rr, w0, w1, bits)	\
+#define ROLDQ(ll, lr, rl, rr, w0, w1, bits)		\
     do {						\
 	w0 = ll;					\
 	ll = (ll << bits) + (lr >> (32 - bits));	\
@@ -351,7 +351,7 @@ static const u32 camellia_sp4404[256] = 
 	rr = (rr << bits) + (w0 >> (32 - bits));	\
     } while(0)
 
-#define CAMELLIA_ROLDQo32(ll, lr, rl, rr, w0, w1, bits)	\
+#define ROLDQo32(ll, lr, rl, rr, w0, w1, bits)		\
     do {						\
 	w0 = ll;					\
 	w1 = lr;					\
@@ -377,7 +377,7 @@ static const u32 camellia_sp4404[256] = 
 	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
 	   ^ camellia_sp4404[il & 0xff];			\
 	yl ^= yr;						\
-	yr = CAMELLIA_RR8(yr);					\
+	yr = ROR8(yr);						\
 	yr ^= yl;						\
     } while(0)
 
@@ -393,13 +393,13 @@ static const u32 camellia_sp4404[256] = 
 	t0 &= ll;							\
 	t2 |= rr;							\
 	rl ^= t2;							\
-	lr ^= CAMELLIA_RL1(t0);						\
+	lr ^= ROL1(t0);							\
 	t3 = krl;							\
 	t1 = klr;							\
 	t3 &= rl;							\
 	t1 |= lr;							\
 	ll ^= t1;							\
-	rr ^= CAMELLIA_RL1(t3);						\
+	rr ^= ROL1(t3);							\
     } while(0)
 
 #define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
@@ -415,12 +415,12 @@ static const u32 camellia_sp4404[256] = 
 	il ^= kl;							\
 	ir ^= il ^ kr;							\
 	yl ^= ir;							\
-	yr ^= CAMELLIA_RR8(il) ^ ir;					\
+	yr ^= ROR8(il) ^ ir;						\
     } while(0)
 
 
-#define CAMELLIA_SUBKEY_L(INDEX) (subkey[(INDEX)*2])
-#define CAMELLIA_SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
+#define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
+#define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
 static void camellia_setup128(const unsigned char *key, u32 *subkey)
 {
@@ -445,35 +445,35 @@ static void camellia_setup128(const unsi
 	/* kw2 */
 	subL[1] = krl; subR[1] = krr;
 	/* rotation left shift 15bit */
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k3 */
 	subL[4] = kll; subR[4] = klr;
 	/* k4 */
 	subL[5] = krl; subR[5] = krr;
 	/* rotation left shift 15+30bit */
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 30);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 30);
 	/* k7 */
 	subL[10] = kll; subR[10] = klr;
 	/* k8 */
 	subL[11] = krl; subR[11] = krr;
 	/* rotation left shift 15+30+15bit */
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k10 */
 	subL[13] = krl; subR[13] = krr;
 	/* rotation left shift 15+30+15+17 bit */
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 17);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 17);
 	/* kl3 */
 	subL[16] = kll; subR[16] = klr;
 	/* kl4 */
 	subL[17] = krl; subR[17] = krr;
 	/* rotation left shift 15+30+15+17+17 bit */
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 17);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 17);
 	/* k13 */
 	subL[18] = kll; subR[18] = klr;
 	/* k14 */
 	subL[19] = krl; subR[19] = krr;
 	/* rotation left shift 15+30+15+17+17+17 bit */
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 17);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 17);
 	/* k17 */
 	subL[22] = kll; subR[22] = klr;
 	/* k18 */
@@ -503,26 +503,26 @@ static void camellia_setup128(const unsi
 	/* k1, k2 */
 	subL[2] = kll; subR[2] = klr;
 	subL[3] = krl; subR[3] = krr;
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k5,k6 */
 	subL[6] = kll; subR[6] = klr;
 	subL[7] = krl; subR[7] = krr;
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* kl1, kl2 */
 	subL[8] = kll; subR[8] = klr;
 	subL[9] = krl; subR[9] = krr;
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k9 */
 	subL[12] = kll; subR[12] = klr;
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k11, k12 */
 	subL[14] = kll; subR[14] = klr;
 	subL[15] = krl; subR[15] = krr;
-	CAMELLIA_ROLDQo32(kll, klr, krl, krr, w0, w1, 34);
+	ROLDQo32(kll, klr, krl, krr, w0, w1, 34);
 	/* k15, k16 */
 	subL[20] = kll; subR[20] = klr;
 	subL[21] = krl; subR[21] = krr;
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 17);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 17);
 	/* kw3, kw4 */
 	subL[24] = kll; subR[24] = klr;
 	subL[25] = krl; subR[25] = krr;
@@ -536,7 +536,7 @@ static void camellia_setup128(const unsi
 	subL[7] ^= subL[1]; subR[7] ^= subR[1];
 	subL[1] ^= subR[1] & ~subR[9];
 	dw = subL[1] & subL[9],
-		subR[1] ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl2) */
+		subR[1] ^= ROL1(dw); /* modified for FLinv(kl2) */
 	/* round 8 */
 	subL[11] ^= subL[1]; subR[11] ^= subR[1];
 	/* round 10 */
@@ -545,7 +545,7 @@ static void camellia_setup128(const unsi
 	subL[15] ^= subL[1]; subR[15] ^= subR[1];
 	subL[1] ^= subR[1] & ~subR[17];
 	dw = subL[1] & subL[17],
-		subR[1] ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl4) */
+		subR[1] ^= ROL1(dw); /* modified for FLinv(kl4) */
 	/* round 14 */
 	subL[19] ^= subL[1]; subR[19] ^= subR[1];
 	/* round 16 */
@@ -565,7 +565,7 @@ static void camellia_setup128(const unsi
 	subL[18] ^= kw4l; subR[18] ^= kw4r;
 	kw4l ^= kw4r & ~subR[16];
 	dw = kw4l & subL[16],
-		kw4r ^= CAMELLIA_RL1(dw); /* modified for FL(kl3) */
+		kw4r ^= ROL1(dw); /* modified for FL(kl3) */
 	/* round 11 */
 	subL[14] ^= kw4l; subR[14] ^= kw4r;
 	/* round 9 */
@@ -574,7 +574,7 @@ static void camellia_setup128(const unsi
 	subL[10] ^= kw4l; subR[10] ^= kw4r;
 	kw4l ^= kw4r & ~subR[8];
 	dw = kw4l & subL[8],
-		kw4r ^= CAMELLIA_RL1(dw); /* modified for FL(kl1) */
+		kw4r ^= ROL1(dw); /* modified for FL(kl1) */
 	/* round 5 */
 	subL[6] ^= kw4l; subR[6] ^= kw4r;
 	/* round 3 */
@@ -585,140 +585,104 @@ static void camellia_setup128(const unsi
 	subL[0] ^= kw4l; subR[0] ^= kw4r;
 
 	/* key XOR is end of F-function */
-	CAMELLIA_SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
-	CAMELLIA_SUBKEY_R(0) = subR[0] ^ subR[2];
-	CAMELLIA_SUBKEY_L(2) = subL[3];       /* round 1 */
-	CAMELLIA_SUBKEY_R(2) = subR[3];
-	CAMELLIA_SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
-	CAMELLIA_SUBKEY_R(3) = subR[2] ^ subR[4];
-	CAMELLIA_SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
-	CAMELLIA_SUBKEY_R(4) = subR[3] ^ subR[5];
-	CAMELLIA_SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
-	CAMELLIA_SUBKEY_R(5) = subR[4] ^ subR[6];
-	CAMELLIA_SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
-	CAMELLIA_SUBKEY_R(6) = subR[5] ^ subR[7];
+	SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
+	SUBKEY_R(0) = subR[0] ^ subR[2];
+	SUBKEY_L(2) = subL[3];       /* round 1 */
+	SUBKEY_R(2) = subR[3];
+	SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
+	SUBKEY_R(3) = subR[2] ^ subR[4];
+	SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
+	SUBKEY_R(4) = subR[3] ^ subR[5];
+	SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
+	SUBKEY_R(5) = subR[4] ^ subR[6];
+	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
+	SUBKEY_R(6) = subR[5] ^ subR[7];
 	tl = subL[10] ^ (subR[10] & ~subR[8]);
 	dw = tl & subL[8],  /* FL(kl1) */
-		tr = subR[10] ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
-	CAMELLIA_SUBKEY_R(7) = subR[6] ^ tr;
-	CAMELLIA_SUBKEY_L(8) = subL[8];       /* FL(kl1) */
-	CAMELLIA_SUBKEY_R(8) = subR[8];
-	CAMELLIA_SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
-	CAMELLIA_SUBKEY_R(9) = subR[9];
+		tr = subR[10] ^ ROL1(dw);
+	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
+	SUBKEY_R(7) = subR[6] ^ tr;
+	SUBKEY_L(8) = subL[8];       /* FL(kl1) */
+	SUBKEY_R(8) = subR[8];
+	SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
+	SUBKEY_R(9) = subR[9];
 	tl = subL[7] ^ (subR[7] & ~subR[9]);
 	dw = tl & subL[9],  /* FLinv(kl2) */
-		tr = subR[7] ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
-	CAMELLIA_SUBKEY_R(10) = tr ^ subR[11];
-	CAMELLIA_SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
-	CAMELLIA_SUBKEY_R(11) = subR[10] ^ subR[12];
-	CAMELLIA_SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
-	CAMELLIA_SUBKEY_R(12) = subR[11] ^ subR[13];
-	CAMELLIA_SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
-	CAMELLIA_SUBKEY_R(13) = subR[12] ^ subR[14];
-	CAMELLIA_SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
-	CAMELLIA_SUBKEY_R(14) = subR[13] ^ subR[15];
+		tr = subR[7] ^ ROL1(dw);
+	SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
+	SUBKEY_R(10) = tr ^ subR[11];
+	SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
+	SUBKEY_R(11) = subR[10] ^ subR[12];
+	SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
+	SUBKEY_R(12) = subR[11] ^ subR[13];
+	SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
+	SUBKEY_R(13) = subR[12] ^ subR[14];
+	SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
+	SUBKEY_R(14) = subR[13] ^ subR[15];
 	tl = subL[18] ^ (subR[18] & ~subR[16]);
 	dw = tl & subL[16], /* FL(kl3) */
-		tr = subR[18] ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
-	CAMELLIA_SUBKEY_R(15) = subR[14] ^ tr;
-	CAMELLIA_SUBKEY_L(16) = subL[16];     /* FL(kl3) */
-	CAMELLIA_SUBKEY_R(16) = subR[16];
-	CAMELLIA_SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
-	CAMELLIA_SUBKEY_R(17) = subR[17];
+		tr = subR[18] ^ ROL1(dw);
+	SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
+	SUBKEY_R(15) = subR[14] ^ tr;
+	SUBKEY_L(16) = subL[16];     /* FL(kl3) */
+	SUBKEY_R(16) = subR[16];
+	SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
+	SUBKEY_R(17) = subR[17];
 	tl = subL[15] ^ (subR[15] & ~subR[17]);
 	dw = tl & subL[17], /* FLinv(kl4) */
-		tr = subR[15] ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
-	CAMELLIA_SUBKEY_R(18) = tr ^ subR[19];
-	CAMELLIA_SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
-	CAMELLIA_SUBKEY_R(19) = subR[18] ^ subR[20];
-	CAMELLIA_SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
-	CAMELLIA_SUBKEY_R(20) = subR[19] ^ subR[21];
-	CAMELLIA_SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
-	CAMELLIA_SUBKEY_R(21) = subR[20] ^ subR[22];
-	CAMELLIA_SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
-	CAMELLIA_SUBKEY_R(22) = subR[21] ^ subR[23];
-	CAMELLIA_SUBKEY_L(23) = subL[22];     /* round 18 */
-	CAMELLIA_SUBKEY_R(23) = subR[22];
-	CAMELLIA_SUBKEY_L(24) = subL[24] ^ subL[23]; /* kw3 */
-	CAMELLIA_SUBKEY_R(24) = subR[24] ^ subR[23];
+		tr = subR[15] ^ ROL1(dw);
+	SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
+	SUBKEY_R(18) = tr ^ subR[19];
+	SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
+	SUBKEY_R(19) = subR[18] ^ subR[20];
+	SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
+	SUBKEY_R(20) = subR[19] ^ subR[21];
+	SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
+	SUBKEY_R(21) = subR[20] ^ subR[22];
+	SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
+	SUBKEY_R(22) = subR[21] ^ subR[23];
+	SUBKEY_L(23) = subL[22];     /* round 18 */
+	SUBKEY_R(23) = subR[22];
+	SUBKEY_L(24) = subL[24] ^ subL[23]; /* kw3 */
+	SUBKEY_R(24) = subR[24] ^ subR[23];
 
 	/* apply the inverse of the last half of P-function */
-	dw = CAMELLIA_SUBKEY_L(2) ^ CAMELLIA_SUBKEY_R(2),
-		dw = CAMELLIA_RL8(dw);/* round 1 */
-	CAMELLIA_SUBKEY_R(2) = CAMELLIA_SUBKEY_L(2) ^ dw,
-		CAMELLIA_SUBKEY_L(2) = dw;
-	dw = CAMELLIA_SUBKEY_L(3) ^ CAMELLIA_SUBKEY_R(3),
-		dw = CAMELLIA_RL8(dw);/* round 2 */
-	CAMELLIA_SUBKEY_R(3) = CAMELLIA_SUBKEY_L(3) ^ dw,
-		CAMELLIA_SUBKEY_L(3) = dw;
-	dw = CAMELLIA_SUBKEY_L(4) ^ CAMELLIA_SUBKEY_R(4),
-		dw = CAMELLIA_RL8(dw);/* round 3 */
-	CAMELLIA_SUBKEY_R(4) = CAMELLIA_SUBKEY_L(4) ^ dw,
-		CAMELLIA_SUBKEY_L(4) = dw;
-	dw = CAMELLIA_SUBKEY_L(5) ^ CAMELLIA_SUBKEY_R(5),
-		dw = CAMELLIA_RL8(dw);/* round 4 */
-	CAMELLIA_SUBKEY_R(5) = CAMELLIA_SUBKEY_L(5) ^ dw,
-		CAMELLIA_SUBKEY_L(5) = dw;
-	dw = CAMELLIA_SUBKEY_L(6) ^ CAMELLIA_SUBKEY_R(6),
-		dw = CAMELLIA_RL8(dw);/* round 5 */
-	CAMELLIA_SUBKEY_R(6) = CAMELLIA_SUBKEY_L(6) ^ dw,
-		CAMELLIA_SUBKEY_L(6) = dw;
-	dw = CAMELLIA_SUBKEY_L(7) ^ CAMELLIA_SUBKEY_R(7),
-		dw = CAMELLIA_RL8(dw);/* round 6 */
-	CAMELLIA_SUBKEY_R(7) = CAMELLIA_SUBKEY_L(7) ^ dw,
-		CAMELLIA_SUBKEY_L(7) = dw;
-	dw = CAMELLIA_SUBKEY_L(10) ^ CAMELLIA_SUBKEY_R(10),
-		dw = CAMELLIA_RL8(dw);/* round 7 */
-	CAMELLIA_SUBKEY_R(10) = CAMELLIA_SUBKEY_L(10) ^ dw,
-		CAMELLIA_SUBKEY_L(10) = dw;
-	dw = CAMELLIA_SUBKEY_L(11) ^ CAMELLIA_SUBKEY_R(11),
-		dw = CAMELLIA_RL8(dw);/* round 8 */
-	CAMELLIA_SUBKEY_R(11) = CAMELLIA_SUBKEY_L(11) ^ dw,
-		CAMELLIA_SUBKEY_L(11) = dw;
-	dw = CAMELLIA_SUBKEY_L(12) ^ CAMELLIA_SUBKEY_R(12),
-		dw = CAMELLIA_RL8(dw);/* round 9 */
-	CAMELLIA_SUBKEY_R(12) = CAMELLIA_SUBKEY_L(12) ^ dw,
-		CAMELLIA_SUBKEY_L(12) = dw;
-	dw = CAMELLIA_SUBKEY_L(13) ^ CAMELLIA_SUBKEY_R(13),
-		dw = CAMELLIA_RL8(dw);/* round 10 */
-	CAMELLIA_SUBKEY_R(13) = CAMELLIA_SUBKEY_L(13) ^ dw,
-		CAMELLIA_SUBKEY_L(13) = dw;
-	dw = CAMELLIA_SUBKEY_L(14) ^ CAMELLIA_SUBKEY_R(14),
-		dw = CAMELLIA_RL8(dw);/* round 11 */
-	CAMELLIA_SUBKEY_R(14) = CAMELLIA_SUBKEY_L(14) ^ dw,
-		CAMELLIA_SUBKEY_L(14) = dw;
-	dw = CAMELLIA_SUBKEY_L(15) ^ CAMELLIA_SUBKEY_R(15),
-		dw = CAMELLIA_RL8(dw);/* round 12 */
-	CAMELLIA_SUBKEY_R(15) = CAMELLIA_SUBKEY_L(15) ^ dw,
-		CAMELLIA_SUBKEY_L(15) = dw;
-	dw = CAMELLIA_SUBKEY_L(18) ^ CAMELLIA_SUBKEY_R(18),
-		dw = CAMELLIA_RL8(dw);/* round 13 */
-	CAMELLIA_SUBKEY_R(18) = CAMELLIA_SUBKEY_L(18) ^ dw,
-		CAMELLIA_SUBKEY_L(18) = dw;
-	dw = CAMELLIA_SUBKEY_L(19) ^ CAMELLIA_SUBKEY_R(19),
-		dw = CAMELLIA_RL8(dw);/* round 14 */
-	CAMELLIA_SUBKEY_R(19) = CAMELLIA_SUBKEY_L(19) ^ dw,
-		CAMELLIA_SUBKEY_L(19) = dw;
-	dw = CAMELLIA_SUBKEY_L(20) ^ CAMELLIA_SUBKEY_R(20),
-		dw = CAMELLIA_RL8(dw);/* round 15 */
-	CAMELLIA_SUBKEY_R(20) = CAMELLIA_SUBKEY_L(20) ^ dw,
-		CAMELLIA_SUBKEY_L(20) = dw;
-	dw = CAMELLIA_SUBKEY_L(21) ^ CAMELLIA_SUBKEY_R(21),
-		dw = CAMELLIA_RL8(dw);/* round 16 */
-	CAMELLIA_SUBKEY_R(21) = CAMELLIA_SUBKEY_L(21) ^ dw,
-		CAMELLIA_SUBKEY_L(21) = dw;
-	dw = CAMELLIA_SUBKEY_L(22) ^ CAMELLIA_SUBKEY_R(22),
-		dw = CAMELLIA_RL8(dw);/* round 17 */
-	CAMELLIA_SUBKEY_R(22) = CAMELLIA_SUBKEY_L(22) ^ dw,
-		CAMELLIA_SUBKEY_L(22) = dw;
-	dw = CAMELLIA_SUBKEY_L(23) ^ CAMELLIA_SUBKEY_R(23),
-		dw = CAMELLIA_RL8(dw);/* round 18 */
-	CAMELLIA_SUBKEY_R(23) = CAMELLIA_SUBKEY_L(23) ^ dw,
-		CAMELLIA_SUBKEY_L(23) = dw;
+	dw = SUBKEY_L(2) ^ SUBKEY_R(2); dw = ROL8(dw);/* round 1 */
+	SUBKEY_R(2) = SUBKEY_L(2) ^ dw; SUBKEY_L(2) = dw;
+	dw = SUBKEY_L(3) ^ SUBKEY_R(3); dw = ROL8(dw);/* round 2 */
+	SUBKEY_R(3) = SUBKEY_L(3) ^ dw; SUBKEY_L(3) = dw;
+	dw = SUBKEY_L(4) ^ SUBKEY_R(4); dw = ROL8(dw);/* round 3 */
+	SUBKEY_R(4) = SUBKEY_L(4) ^ dw; SUBKEY_L(4) = dw;
+	dw = SUBKEY_L(5) ^ SUBKEY_R(5); dw = ROL8(dw);/* round 4 */
+	SUBKEY_R(5) = SUBKEY_L(5) ^ dw; SUBKEY_L(5) = dw;
+	dw = SUBKEY_L(6) ^ SUBKEY_R(6); dw = ROL8(dw);/* round 5 */
+	SUBKEY_R(6) = SUBKEY_L(6) ^ dw; SUBKEY_L(6) = dw;
+	dw = SUBKEY_L(7) ^ SUBKEY_R(7); dw = ROL8(dw);/* round 6 */
+	SUBKEY_R(7) = SUBKEY_L(7) ^ dw; SUBKEY_L(7) = dw;
+	dw = SUBKEY_L(10) ^ SUBKEY_R(10); dw = ROL8(dw);/* round 7 */
+	SUBKEY_R(10) = SUBKEY_L(10) ^ dw; SUBKEY_L(10) = dw;
+	dw = SUBKEY_L(11) ^ SUBKEY_R(11); dw = ROL8(dw);/* round 8 */
+	SUBKEY_R(11) = SUBKEY_L(11) ^ dw; SUBKEY_L(11) = dw;
+	dw = SUBKEY_L(12) ^ SUBKEY_R(12); dw = ROL8(dw);/* round 9 */
+	SUBKEY_R(12) = SUBKEY_L(12) ^ dw; SUBKEY_L(12) = dw;
+	dw = SUBKEY_L(13) ^ SUBKEY_R(13); dw = ROL8(dw);/* round 10 */
+	SUBKEY_R(13) = SUBKEY_L(13) ^ dw; SUBKEY_L(13) = dw;
+	dw = SUBKEY_L(14) ^ SUBKEY_R(14); dw = ROL8(dw);/* round 11 */
+	SUBKEY_R(14) = SUBKEY_L(14) ^ dw; SUBKEY_L(14) = dw;
+	dw = SUBKEY_L(15) ^ SUBKEY_R(15); dw = ROL8(dw);/* round 12 */
+	SUBKEY_R(15) = SUBKEY_L(15) ^ dw; SUBKEY_L(15) = dw;
+	dw = SUBKEY_L(18) ^ SUBKEY_R(18); dw = ROL8(dw);/* round 13 */
+	SUBKEY_R(18) = SUBKEY_L(18) ^ dw; SUBKEY_L(18) = dw;
+	dw = SUBKEY_L(19) ^ SUBKEY_R(19); dw = ROL8(dw);/* round 14 */
+	SUBKEY_R(19) = SUBKEY_L(19) ^ dw; SUBKEY_L(19) = dw;
+	dw = SUBKEY_L(20) ^ SUBKEY_R(20); dw = ROL8(dw);/* round 15 */
+	SUBKEY_R(20) = SUBKEY_L(20) ^ dw; SUBKEY_L(20) = dw;
+	dw = SUBKEY_L(21) ^ SUBKEY_R(21); dw = ROL8(dw);/* round 16 */
+	SUBKEY_R(21) = SUBKEY_L(21) ^ dw; SUBKEY_L(21) = dw;
+	dw = SUBKEY_L(22) ^ SUBKEY_R(22); dw = ROL8(dw);/* round 17 */
+	SUBKEY_R(22) = SUBKEY_L(22) ^ dw; SUBKEY_L(22) = dw;
+	dw = SUBKEY_L(23) ^ SUBKEY_R(23); dw = ROL8(dw);/* round 18 */
+	SUBKEY_R(23) = SUBKEY_L(23) ^ dw; SUBKEY_L(23) = dw;
 }
 
 static void camellia_setup256(const unsigned char *key, u32 *subkey)
@@ -734,7 +698,6 @@ static void camellia_setup256(const unsi
 	 *  key = (kll || klr || krl || krr || krll || krlr || krrl || krrr)
 	 *  (|| is concatination)
 	 */
-
 	kll  = GETU32(key     );
 	klr  = GETU32(key +  4);
 	krl  = GETU32(key +  8);
@@ -749,49 +712,49 @@ static void camellia_setup256(const unsi
 	subL[0] = kll; subR[0] = klr;
 	/* kw2 */
 	subL[1] = krl; subR[1] = krr;
-	CAMELLIA_ROLDQo32(kll, klr, krl, krr, w0, w1, 45);
+	ROLDQo32(kll, klr, krl, krr, w0, w1, 45);
 	/* k9 */
 	subL[12] = kll; subR[12] = klr;
 	/* k10 */
 	subL[13] = krl; subR[13] = krr;
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* kl3 */
 	subL[16] = kll; subR[16] = klr;
 	/* kl4 */
 	subL[17] = krl; subR[17] = krr;
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 17);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 17);
 	/* k17 */
 	subL[22] = kll; subR[22] = klr;
 	/* k18 */
 	subL[23] = krl; subR[23] = krr;
-	CAMELLIA_ROLDQo32(kll, klr, krl, krr, w0, w1, 34);
+	ROLDQo32(kll, klr, krl, krr, w0, w1, 34);
 	/* k23 */
 	subL[30] = kll; subR[30] = klr;
 	/* k24 */
 	subL[31] = krl; subR[31] = krr;
 
 	/* generate KR dependent subkeys */
-	CAMELLIA_ROLDQ(krll, krlr, krrl, krrr, w0, w1, 15);
+	ROLDQ(krll, krlr, krrl, krrr, w0, w1, 15);
 	/* k3 */
 	subL[4] = krll; subR[4] = krlr;
 	/* k4 */
 	subL[5] = krrl; subR[5] = krrr;
-	CAMELLIA_ROLDQ(krll, krlr, krrl, krrr, w0, w1, 15);
+	ROLDQ(krll, krlr, krrl, krrr, w0, w1, 15);
 	/* kl1 */
 	subL[8] = krll; subR[8] = krlr;
 	/* kl2 */
 	subL[9] = krrl; subR[9] = krrr;
-	CAMELLIA_ROLDQ(krll, krlr, krrl, krrr, w0, w1, 30);
+	ROLDQ(krll, krlr, krrl, krrr, w0, w1, 30);
 	/* k13 */
 	subL[18] = krll; subR[18] = krlr;
 	/* k14 */
 	subL[19] = krrl; subR[19] = krrr;
-	CAMELLIA_ROLDQo32(krll, krlr, krrl, krrr, w0, w1, 34);
+	ROLDQo32(krll, krlr, krrl, krrr, w0, w1, 34);
 	/* k19 */
 	subL[26] = krll; subR[26] = krlr;
 	/* k20 */
 	subL[27] = krrl; subR[27] = krrr;
-	CAMELLIA_ROLDQo32(krll, krlr, krrl, krrr, w0, w1, 34);
+	ROLDQo32(krll, krlr, krrl, krrr, w0, w1, 34);
 
 	/* generate KA */
 	kll = subL[0] ^ krll; klr = subR[0] ^ krlr;
@@ -826,12 +789,12 @@ static void camellia_setup256(const unsi
 	krll ^= w0; krlr ^= w1;
 
 	/* generate KA dependent subkeys */
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 15);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 15);
 	/* k5 */
 	subL[6] = kll; subR[6] = klr;
 	/* k6 */
 	subL[7] = krl; subR[7] = krr;
-	CAMELLIA_ROLDQ(kll, klr, krl, krr, w0, w1, 30);
+	ROLDQ(kll, klr, krl, krr, w0, w1, 30);
 	/* k11 */
 	subL[14] = kll; subR[14] = klr;
 	/* k12 */
@@ -842,7 +805,7 @@ static void camellia_setup256(const unsi
 	/* kl6 */
 	subL[25] = krr; subR[25] = kll;
 	/* rotation left shift 49 from k11,k12 -> k21,k22 */
-	CAMELLIA_ROLDQo32(kll, klr, krl, krr, w0, w1, 49);
+	ROLDQo32(kll, klr, krl, krr, w0, w1, 49);
 	/* k21 */
 	subL[28] = kll; subR[28] = klr;
 	/* k22 */
@@ -853,17 +816,17 @@ static void camellia_setup256(const unsi
 	subL[2] = krll; subR[2] = krlr;
 	/* k2 */
 	subL[3] = krrl; subR[3] = krrr;
-	CAMELLIA_ROLDQ(krll, krlr, krrl, krrr, w0, w1, 30);
+	ROLDQ(krll, krlr, krrl, krrr, w0, w1, 30);
 	/* k7 */
 	subL[10] = krll; subR[10] = krlr;
 	/* k8 */
 	subL[11] = krrl; subR[11] = krrr;
-	CAMELLIA_ROLDQ(krll, krlr, krrl, krrr, w0, w1, 30);
+	ROLDQ(krll, krlr, krrl, krrr, w0, w1, 30);
 	/* k15 */
 	subL[20] = krll; subR[20] = krlr;
 	/* k16 */
 	subL[21] = krrl; subR[21] = krrr;
-	CAMELLIA_ROLDQo32(krll, krlr, krrl, krrr, w0, w1, 51);
+	ROLDQo32(krll, krlr, krrl, krrr, w0, w1, 51);
 	/* kw3 */
 	subL[32] = krll; subR[32] = krlr;
 	/* kw4 */
@@ -878,7 +841,7 @@ static void camellia_setup256(const unsi
 	subL[7] ^= subL[1]; subR[7] ^= subR[1];
 	subL[1] ^= subR[1] & ~subR[9];
 	dw = subL[1] & subL[9],
-		subR[1] ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl2) */
+		subR[1] ^= ROL1(dw); /* modified for FLinv(kl2) */
 	/* round 8 */
 	subL[11] ^= subL[1]; subR[11] ^= subR[1];
 	/* round 10 */
@@ -887,7 +850,7 @@ static void camellia_setup256(const unsi
 	subL[15] ^= subL[1]; subR[15] ^= subR[1];
 	subL[1] ^= subR[1] & ~subR[17];
 	dw = subL[1] & subL[17],
-		subR[1] ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl4) */
+		subR[1] ^= ROL1(dw); /* modified for FLinv(kl4) */
 	/* round 14 */
 	subL[19] ^= subL[1]; subR[19] ^= subR[1];
 	/* round 16 */
@@ -896,7 +859,7 @@ static void camellia_setup256(const unsi
 	subL[23] ^= subL[1]; subR[23] ^= subR[1];
 	subL[1] ^= subR[1] & ~subR[25];
 	dw = subL[1] & subL[25],
-		subR[1] ^= CAMELLIA_RL1(dw); /* modified for FLinv(kl6) */
+		subR[1] ^= ROL1(dw); /* modified for FLinv(kl6) */
 	/* round 20 */
 	subL[27] ^= subL[1]; subR[27] ^= subR[1];
 	/* round 22 */
@@ -916,7 +879,7 @@ static void camellia_setup256(const unsi
 	subL[26] ^= kw4l; subR[26] ^= kw4r;
 	kw4l ^= kw4r & ~subR[24];
 	dw = kw4l & subL[24],
-		kw4r ^= CAMELLIA_RL1(dw); /* modified for FL(kl5) */
+		kw4r ^= ROL1(dw); /* modified for FL(kl5) */
 	/* round 17 */
 	subL[22] ^= kw4l; subR[22] ^= kw4r;
 	/* round 15 */
@@ -925,7 +888,7 @@ static void camellia_setup256(const unsi
 	subL[18] ^= kw4l; subR[18] ^= kw4r;
 	kw4l ^= kw4r & ~subR[16];
 	dw = kw4l & subL[16],
-		kw4r ^= CAMELLIA_RL1(dw); /* modified for FL(kl3) */
+		kw4r ^= ROL1(dw); /* modified for FL(kl3) */
 	/* round 11 */
 	subL[14] ^= kw4l; subR[14] ^= kw4r;
 	/* round 9 */
@@ -934,7 +897,7 @@ static void camellia_setup256(const unsi
 	subL[10] ^= kw4l; subR[10] ^= kw4r;
 	kw4l ^= kw4r & ~subR[8];
 	dw = kw4l & subL[8],
-		kw4r ^= CAMELLIA_RL1(dw); /* modified for FL(kl1) */
+		kw4r ^= ROL1(dw); /* modified for FL(kl1) */
 	/* round 5 */
 	subL[6] ^= kw4l; subR[6] ^= kw4r;
 	/* round 3 */
@@ -945,188 +908,138 @@ static void camellia_setup256(const unsi
 	subL[0] ^= kw4l; subR[0] ^= kw4r;
 
 	/* key XOR is end of F-function */
-	CAMELLIA_SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
-	CAMELLIA_SUBKEY_R(0) = subR[0] ^ subR[2];
-	CAMELLIA_SUBKEY_L(2) = subL[3];       /* round 1 */
-	CAMELLIA_SUBKEY_R(2) = subR[3];
-	CAMELLIA_SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
-	CAMELLIA_SUBKEY_R(3) = subR[2] ^ subR[4];
-	CAMELLIA_SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
-	CAMELLIA_SUBKEY_R(4) = subR[3] ^ subR[5];
-	CAMELLIA_SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
-	CAMELLIA_SUBKEY_R(5) = subR[4] ^ subR[6];
-	CAMELLIA_SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
-	CAMELLIA_SUBKEY_R(6) = subR[5] ^ subR[7];
+	SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
+	SUBKEY_R(0) = subR[0] ^ subR[2];
+	SUBKEY_L(2) = subL[3];       /* round 1 */
+	SUBKEY_R(2) = subR[3];
+	SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
+	SUBKEY_R(3) = subR[2] ^ subR[4];
+	SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
+	SUBKEY_R(4) = subR[3] ^ subR[5];
+	SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
+	SUBKEY_R(5) = subR[4] ^ subR[6];
+	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
+	SUBKEY_R(6) = subR[5] ^ subR[7];
 	tl = subL[10] ^ (subR[10] & ~subR[8]);
 	dw = tl & subL[8],  /* FL(kl1) */
-		tr = subR[10] ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
-	CAMELLIA_SUBKEY_R(7) = subR[6] ^ tr;
-	CAMELLIA_SUBKEY_L(8) = subL[8];       /* FL(kl1) */
-	CAMELLIA_SUBKEY_R(8) = subR[8];
-	CAMELLIA_SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
-	CAMELLIA_SUBKEY_R(9) = subR[9];
+		tr = subR[10] ^ ROL1(dw);
+	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
+	SUBKEY_R(7) = subR[6] ^ tr;
+	SUBKEY_L(8) = subL[8];       /* FL(kl1) */
+	SUBKEY_R(8) = subR[8];
+	SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
+	SUBKEY_R(9) = subR[9];
 	tl = subL[7] ^ (subR[7] & ~subR[9]);
 	dw = tl & subL[9],  /* FLinv(kl2) */
-		tr = subR[7] ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
-	CAMELLIA_SUBKEY_R(10) = tr ^ subR[11];
-	CAMELLIA_SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
-	CAMELLIA_SUBKEY_R(11) = subR[10] ^ subR[12];
-	CAMELLIA_SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
-	CAMELLIA_SUBKEY_R(12) = subR[11] ^ subR[13];
-	CAMELLIA_SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
-	CAMELLIA_SUBKEY_R(13) = subR[12] ^ subR[14];
-	CAMELLIA_SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
-	CAMELLIA_SUBKEY_R(14) = subR[13] ^ subR[15];
+		tr = subR[7] ^ ROL1(dw);
+	SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
+	SUBKEY_R(10) = tr ^ subR[11];
+	SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
+	SUBKEY_R(11) = subR[10] ^ subR[12];
+	SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
+	SUBKEY_R(12) = subR[11] ^ subR[13];
+	SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
+	SUBKEY_R(13) = subR[12] ^ subR[14];
+	SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
+	SUBKEY_R(14) = subR[13] ^ subR[15];
 	tl = subL[18] ^ (subR[18] & ~subR[16]);
 	dw = tl & subL[16], /* FL(kl3) */
-		tr = subR[18] ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
-	CAMELLIA_SUBKEY_R(15) = subR[14] ^ tr;
-	CAMELLIA_SUBKEY_L(16) = subL[16];     /* FL(kl3) */
-	CAMELLIA_SUBKEY_R(16) = subR[16];
-	CAMELLIA_SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
-	CAMELLIA_SUBKEY_R(17) = subR[17];
+		tr = subR[18] ^ ROL1(dw);
+	SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
+	SUBKEY_R(15) = subR[14] ^ tr;
+	SUBKEY_L(16) = subL[16];     /* FL(kl3) */
+	SUBKEY_R(16) = subR[16];
+	SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
+	SUBKEY_R(17) = subR[17];
 	tl = subL[15] ^ (subR[15] & ~subR[17]);
 	dw = tl & subL[17], /* FLinv(kl4) */
-		tr = subR[15] ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
-	CAMELLIA_SUBKEY_R(18) = tr ^ subR[19];
-	CAMELLIA_SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
-	CAMELLIA_SUBKEY_R(19) = subR[18] ^ subR[20];
-	CAMELLIA_SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
-	CAMELLIA_SUBKEY_R(20) = subR[19] ^ subR[21];
-	CAMELLIA_SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
-	CAMELLIA_SUBKEY_R(21) = subR[20] ^ subR[22];
-	CAMELLIA_SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
-	CAMELLIA_SUBKEY_R(22) = subR[21] ^ subR[23];
-	tl = subL[26] ^ (subR[26]
-			 & ~subR[24]);
+		tr = subR[15] ^ ROL1(dw);
+	SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
+	SUBKEY_R(18) = tr ^ subR[19];
+	SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
+	SUBKEY_R(19) = subR[18] ^ subR[20];
+	SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
+	SUBKEY_R(20) = subR[19] ^ subR[21];
+	SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
+	SUBKEY_R(21) = subR[20] ^ subR[22];
+	SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
+	SUBKEY_R(22) = subR[21] ^ subR[23];
+	tl = subL[26] ^ (subR[26] & ~subR[24]);
 	dw = tl & subL[24], /* FL(kl5) */
-		tr = subR[26] ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(23) = subL[22] ^ tl; /* round 18 */
-	CAMELLIA_SUBKEY_R(23) = subR[22] ^ tr;
-	CAMELLIA_SUBKEY_L(24) = subL[24];     /* FL(kl5) */
-	CAMELLIA_SUBKEY_R(24) = subR[24];
-	CAMELLIA_SUBKEY_L(25) = subL[25];     /* FLinv(kl6) */
-	CAMELLIA_SUBKEY_R(25) = subR[25];
-	tl = subL[23] ^ (subR[23] &
-			 ~subR[25]);
+		tr = subR[26] ^ ROL1(dw);
+	SUBKEY_L(23) = subL[22] ^ tl; /* round 18 */
+	SUBKEY_R(23) = subR[22] ^ tr;
+	SUBKEY_L(24) = subL[24];     /* FL(kl5) */
+	SUBKEY_R(24) = subR[24];
+	SUBKEY_L(25) = subL[25];     /* FLinv(kl6) */
+	SUBKEY_R(25) = subR[25];
+	tl = subL[23] ^ (subR[23] & ~subR[25]);
 	dw = tl & subL[25], /* FLinv(kl6) */
-		tr = subR[23] ^ CAMELLIA_RL1(dw);
-	CAMELLIA_SUBKEY_L(26) = tl ^ subL[27]; /* round 19 */
-	CAMELLIA_SUBKEY_R(26) = tr ^ subR[27];
-	CAMELLIA_SUBKEY_L(27) = subL[26] ^ subL[28]; /* round 20 */
-	CAMELLIA_SUBKEY_R(27) = subR[26] ^ subR[28];
-	CAMELLIA_SUBKEY_L(28) = subL[27] ^ subL[29]; /* round 21 */
-	CAMELLIA_SUBKEY_R(28) = subR[27] ^ subR[29];
-	CAMELLIA_SUBKEY_L(29) = subL[28] ^ subL[30]; /* round 22 */
-	CAMELLIA_SUBKEY_R(29) = subR[28] ^ subR[30];
-	CAMELLIA_SUBKEY_L(30) = subL[29] ^ subL[31]; /* round 23 */
-	CAMELLIA_SUBKEY_R(30) = subR[29] ^ subR[31];
-	CAMELLIA_SUBKEY_L(31) = subL[30];     /* round 24 */
-	CAMELLIA_SUBKEY_R(31) = subR[30];
-	CAMELLIA_SUBKEY_L(32) = subL[32] ^ subL[31]; /* kw3 */
-	CAMELLIA_SUBKEY_R(32) = subR[32] ^ subR[31];
+		tr = subR[23] ^ ROL1(dw);
+	SUBKEY_L(26) = tl ^ subL[27]; /* round 19 */
+	SUBKEY_R(26) = tr ^ subR[27];
+	SUBKEY_L(27) = subL[26] ^ subL[28]; /* round 20 */
+	SUBKEY_R(27) = subR[26] ^ subR[28];
+	SUBKEY_L(28) = subL[27] ^ subL[29]; /* round 21 */
+	SUBKEY_R(28) = subR[27] ^ subR[29];
+	SUBKEY_L(29) = subL[28] ^ subL[30]; /* round 22 */
+	SUBKEY_R(29) = subR[28] ^ subR[30];
+	SUBKEY_L(30) = subL[29] ^ subL[31]; /* round 23 */
+	SUBKEY_R(30) = subR[29] ^ subR[31];
+	SUBKEY_L(31) = subL[30];     /* round 24 */
+	SUBKEY_R(31) = subR[30];
+	SUBKEY_L(32) = subL[32] ^ subL[31]; /* kw3 */
+	SUBKEY_R(32) = subR[32] ^ subR[31];
 
 	/* apply the inverse of the last half of P-function */
-	dw = CAMELLIA_SUBKEY_L(2) ^ CAMELLIA_SUBKEY_R(2),
-		dw = CAMELLIA_RL8(dw);/* round 1 */
-	CAMELLIA_SUBKEY_R(2) = CAMELLIA_SUBKEY_L(2) ^ dw,
-		CAMELLIA_SUBKEY_L(2) = dw;
-	dw = CAMELLIA_SUBKEY_L(3) ^ CAMELLIA_SUBKEY_R(3),
-		dw = CAMELLIA_RL8(dw);/* round 2 */
-	CAMELLIA_SUBKEY_R(3) = CAMELLIA_SUBKEY_L(3) ^ dw,
-		CAMELLIA_SUBKEY_L(3) = dw;
-	dw = CAMELLIA_SUBKEY_L(4) ^ CAMELLIA_SUBKEY_R(4),
-		dw = CAMELLIA_RL8(dw);/* round 3 */
-	CAMELLIA_SUBKEY_R(4) = CAMELLIA_SUBKEY_L(4) ^ dw,
-		CAMELLIA_SUBKEY_L(4) = dw;
-	dw = CAMELLIA_SUBKEY_L(5) ^ CAMELLIA_SUBKEY_R(5),
-		dw = CAMELLIA_RL8(dw);/* round 4 */
-	CAMELLIA_SUBKEY_R(5) = CAMELLIA_SUBKEY_L(5) ^ dw,
-	CAMELLIA_SUBKEY_L(5) = dw;
-	dw = CAMELLIA_SUBKEY_L(6) ^ CAMELLIA_SUBKEY_R(6),
-		dw = CAMELLIA_RL8(dw);/* round 5 */
-	CAMELLIA_SUBKEY_R(6) = CAMELLIA_SUBKEY_L(6) ^ dw,
-		CAMELLIA_SUBKEY_L(6) = dw;
-	dw = CAMELLIA_SUBKEY_L(7) ^ CAMELLIA_SUBKEY_R(7),
-		dw = CAMELLIA_RL8(dw);/* round 6 */
-	CAMELLIA_SUBKEY_R(7) = CAMELLIA_SUBKEY_L(7) ^ dw,
-		CAMELLIA_SUBKEY_L(7) = dw;
-	dw = CAMELLIA_SUBKEY_L(10) ^ CAMELLIA_SUBKEY_R(10),
-		dw = CAMELLIA_RL8(dw);/* round 7 */
-	CAMELLIA_SUBKEY_R(10) = CAMELLIA_SUBKEY_L(10) ^ dw,
-		CAMELLIA_SUBKEY_L(10) = dw;
-	dw = CAMELLIA_SUBKEY_L(11) ^ CAMELLIA_SUBKEY_R(11),
-	    dw = CAMELLIA_RL8(dw);/* round 8 */
-	CAMELLIA_SUBKEY_R(11) = CAMELLIA_SUBKEY_L(11) ^ dw,
-		CAMELLIA_SUBKEY_L(11) = dw;
-	dw = CAMELLIA_SUBKEY_L(12) ^ CAMELLIA_SUBKEY_R(12),
-		dw = CAMELLIA_RL8(dw);/* round 9 */
-	CAMELLIA_SUBKEY_R(12) = CAMELLIA_SUBKEY_L(12) ^ dw,
-		CAMELLIA_SUBKEY_L(12) = dw;
-	dw = CAMELLIA_SUBKEY_L(13) ^ CAMELLIA_SUBKEY_R(13),
-		dw = CAMELLIA_RL8(dw);/* round 10 */
-	CAMELLIA_SUBKEY_R(13) = CAMELLIA_SUBKEY_L(13) ^ dw,
-		CAMELLIA_SUBKEY_L(13) = dw;
-	dw = CAMELLIA_SUBKEY_L(14) ^ CAMELLIA_SUBKEY_R(14),
-		dw = CAMELLIA_RL8(dw);/* round 11 */
-	CAMELLIA_SUBKEY_R(14) = CAMELLIA_SUBKEY_L(14) ^ dw,
-		CAMELLIA_SUBKEY_L(14) = dw;
-	dw = CAMELLIA_SUBKEY_L(15) ^ CAMELLIA_SUBKEY_R(15),
-		dw = CAMELLIA_RL8(dw);/* round 12 */
-	CAMELLIA_SUBKEY_R(15) = CAMELLIA_SUBKEY_L(15) ^ dw,
-		CAMELLIA_SUBKEY_L(15) = dw;
-	dw = CAMELLIA_SUBKEY_L(18) ^ CAMELLIA_SUBKEY_R(18),
-		dw = CAMELLIA_RL8(dw);/* round 13 */
-	CAMELLIA_SUBKEY_R(18) = CAMELLIA_SUBKEY_L(18) ^ dw,
-		CAMELLIA_SUBKEY_L(18) = dw;
-	dw = CAMELLIA_SUBKEY_L(19) ^ CAMELLIA_SUBKEY_R(19),
-		dw = CAMELLIA_RL8(dw);/* round 14 */
-	CAMELLIA_SUBKEY_R(19) = CAMELLIA_SUBKEY_L(19) ^ dw,
-		CAMELLIA_SUBKEY_L(19) = dw;
-	dw = CAMELLIA_SUBKEY_L(20) ^ CAMELLIA_SUBKEY_R(20),
-		dw = CAMELLIA_RL8(dw);/* round 15 */
-	CAMELLIA_SUBKEY_R(20) = CAMELLIA_SUBKEY_L(20) ^ dw,
-		CAMELLIA_SUBKEY_L(20) = dw;
-	dw = CAMELLIA_SUBKEY_L(21) ^ CAMELLIA_SUBKEY_R(21),
-		dw = CAMELLIA_RL8(dw);/* round 16 */
-	CAMELLIA_SUBKEY_R(21) = CAMELLIA_SUBKEY_L(21) ^ dw,
-		CAMELLIA_SUBKEY_L(21) = dw;
-	dw = CAMELLIA_SUBKEY_L(22) ^ CAMELLIA_SUBKEY_R(22),
-		dw = CAMELLIA_RL8(dw);/* round 17 */
-	CAMELLIA_SUBKEY_R(22) = CAMELLIA_SUBKEY_L(22) ^ dw,
-		CAMELLIA_SUBKEY_L(22) = dw;
-	dw = CAMELLIA_SUBKEY_L(23) ^ CAMELLIA_SUBKEY_R(23),
-		dw = CAMELLIA_RL8(dw);/* round 18 */
-	CAMELLIA_SUBKEY_R(23) = CAMELLIA_SUBKEY_L(23) ^ dw,
-		CAMELLIA_SUBKEY_L(23) = dw;
-	dw = CAMELLIA_SUBKEY_L(26) ^ CAMELLIA_SUBKEY_R(26),
-		dw = CAMELLIA_RL8(dw);/* round 19 */
-	CAMELLIA_SUBKEY_R(26) = CAMELLIA_SUBKEY_L(26) ^ dw,
-		CAMELLIA_SUBKEY_L(26) = dw;
-	dw = CAMELLIA_SUBKEY_L(27) ^ CAMELLIA_SUBKEY_R(27),
-		dw = CAMELLIA_RL8(dw);/* round 20 */
-	CAMELLIA_SUBKEY_R(27) = CAMELLIA_SUBKEY_L(27) ^ dw,
-		CAMELLIA_SUBKEY_L(27) = dw;
-	dw = CAMELLIA_SUBKEY_L(28) ^ CAMELLIA_SUBKEY_R(28),
-		dw = CAMELLIA_RL8(dw);/* round 21 */
-	CAMELLIA_SUBKEY_R(28) = CAMELLIA_SUBKEY_L(28) ^ dw,
-		CAMELLIA_SUBKEY_L(28) = dw;
-	dw = CAMELLIA_SUBKEY_L(29) ^ CAMELLIA_SUBKEY_R(29),
-		dw = CAMELLIA_RL8(dw);/* round 22 */
-	CAMELLIA_SUBKEY_R(29) = CAMELLIA_SUBKEY_L(29) ^ dw,
-		CAMELLIA_SUBKEY_L(29) = dw;
-	dw = CAMELLIA_SUBKEY_L(30) ^ CAMELLIA_SUBKEY_R(30),
-		dw = CAMELLIA_RL8(dw);/* round 23 */
-	CAMELLIA_SUBKEY_R(30) = CAMELLIA_SUBKEY_L(30) ^ dw,
-		CAMELLIA_SUBKEY_L(30) = dw;
-	dw = CAMELLIA_SUBKEY_L(31) ^ CAMELLIA_SUBKEY_R(31),
-		dw = CAMELLIA_RL8(dw);/* round 24 */
-	CAMELLIA_SUBKEY_R(31) = CAMELLIA_SUBKEY_L(31) ^ dw,
-		CAMELLIA_SUBKEY_L(31) = dw;
+	dw = SUBKEY_L(2) ^ SUBKEY_R(2); dw = ROL8(dw);/* round 1 */
+	SUBKEY_R(2) = SUBKEY_L(2) ^ dw; SUBKEY_L(2) = dw;
+	dw = SUBKEY_L(3) ^ SUBKEY_R(3); dw = ROL8(dw);/* round 2 */
+	SUBKEY_R(3) = SUBKEY_L(3) ^ dw; SUBKEY_L(3) = dw;
+	dw = SUBKEY_L(4) ^ SUBKEY_R(4); dw = ROL8(dw);/* round 3 */
+	SUBKEY_R(4) = SUBKEY_L(4) ^ dw; SUBKEY_L(4) = dw;
+	dw = SUBKEY_L(5) ^ SUBKEY_R(5); dw = ROL8(dw);/* round 4 */
+	SUBKEY_R(5) = SUBKEY_L(5) ^ dw; SUBKEY_L(5) = dw;
+	dw = SUBKEY_L(6) ^ SUBKEY_R(6); dw = ROL8(dw);/* round 5 */
+	SUBKEY_R(6) = SUBKEY_L(6) ^ dw; SUBKEY_L(6) = dw;
+	dw = SUBKEY_L(7) ^ SUBKEY_R(7); dw = ROL8(dw);/* round 6 */
+	SUBKEY_R(7) = SUBKEY_L(7) ^ dw; SUBKEY_L(7) = dw;
+	dw = SUBKEY_L(10) ^ SUBKEY_R(10); dw = ROL8(dw);/* round 7 */
+	SUBKEY_R(10) = SUBKEY_L(10) ^ dw; SUBKEY_L(10) = dw;
+	dw = SUBKEY_L(11) ^ SUBKEY_R(11); dw = ROL8(dw);/* round 8 */
+	SUBKEY_R(11) = SUBKEY_L(11) ^ dw; SUBKEY_L(11) = dw;
+	dw = SUBKEY_L(12) ^ SUBKEY_R(12); dw = ROL8(dw);/* round 9 */
+	SUBKEY_R(12) = SUBKEY_L(12) ^ dw; SUBKEY_L(12) = dw;
+	dw = SUBKEY_L(13) ^ SUBKEY_R(13); dw = ROL8(dw);/* round 10 */
+	SUBKEY_R(13) = SUBKEY_L(13) ^ dw; SUBKEY_L(13) = dw;
+	dw = SUBKEY_L(14) ^ SUBKEY_R(14); dw = ROL8(dw);/* round 11 */
+	SUBKEY_R(14) = SUBKEY_L(14) ^ dw; SUBKEY_L(14) = dw;
+	dw = SUBKEY_L(15) ^ SUBKEY_R(15); dw = ROL8(dw);/* round 12 */
+	SUBKEY_R(15) = SUBKEY_L(15) ^ dw; SUBKEY_L(15) = dw;
+	dw = SUBKEY_L(18) ^ SUBKEY_R(18); dw = ROL8(dw);/* round 13 */
+	SUBKEY_R(18) = SUBKEY_L(18) ^ dw; SUBKEY_L(18) = dw;
+	dw = SUBKEY_L(19) ^ SUBKEY_R(19); dw = ROL8(dw);/* round 14 */
+	SUBKEY_R(19) = SUBKEY_L(19) ^ dw; SUBKEY_L(19) = dw;
+	dw = SUBKEY_L(20) ^ SUBKEY_R(20); dw = ROL8(dw);/* round 15 */
+	SUBKEY_R(20) = SUBKEY_L(20) ^ dw; SUBKEY_L(20) = dw;
+	dw = SUBKEY_L(21) ^ SUBKEY_R(21); dw = ROL8(dw);/* round 16 */
+	SUBKEY_R(21) = SUBKEY_L(21) ^ dw; SUBKEY_L(21) = dw;
+	dw = SUBKEY_L(22) ^ SUBKEY_R(22); dw = ROL8(dw);/* round 17 */
+	SUBKEY_R(22) = SUBKEY_L(22) ^ dw; SUBKEY_L(22) = dw;
+	dw = SUBKEY_L(23) ^ SUBKEY_R(23); dw = ROL8(dw);/* round 18 */
+	SUBKEY_R(23) = SUBKEY_L(23) ^ dw; SUBKEY_L(23) = dw;
+	dw = SUBKEY_L(26) ^ SUBKEY_R(26); dw = ROL8(dw);/* round 19 */
+	SUBKEY_R(26) = SUBKEY_L(26) ^ dw; SUBKEY_L(26) = dw;
+	dw = SUBKEY_L(27) ^ SUBKEY_R(27); dw = ROL8(dw);/* round 20 */
+	SUBKEY_R(27) = SUBKEY_L(27) ^ dw; SUBKEY_L(27) = dw;
+	dw = SUBKEY_L(28) ^ SUBKEY_R(28); dw = ROL8(dw);/* round 21 */
+	SUBKEY_R(28) = SUBKEY_L(28) ^ dw; SUBKEY_L(28) = dw;
+	dw = SUBKEY_L(29) ^ SUBKEY_R(29); dw = ROL8(dw);/* round 22 */
+	SUBKEY_R(29) = SUBKEY_L(29) ^ dw; SUBKEY_L(29) = dw;
+	dw = SUBKEY_L(30) ^ SUBKEY_R(30); dw = ROL8(dw);/* round 23 */
+	SUBKEY_R(30) = SUBKEY_L(30) ^ dw; SUBKEY_L(30) = dw;
+	dw = SUBKEY_L(31) ^ SUBKEY_R(31); dw = ROL8(dw);/* round 24 */
+	SUBKEY_R(31) = SUBKEY_L(31) ^ dw; SUBKEY_L(31) = dw;
 }
 
 static void camellia_setup192(const unsigned char *key, u32 *subkey)
@@ -1145,424 +1058,400 @@ static void camellia_setup192(const unsi
 }
 
 
-static void camellia_encrypt128(const u32 *subkey, __be32 *io_text)
+static void camellia_encrypt128(const u32 *subkey, u32 *io_text)
 {
-	u32 il,ir,t0,t1;               /* temporary valiables */
+	u32 il,ir,t0,t1;               /* temporary variables */
 
 	u32 io[4];
 
-	io[0] = be32_to_cpu(io_text[0]);
-	io[1] = be32_to_cpu(io_text[1]);
-	io[2] = be32_to_cpu(io_text[2]);
-	io[3] = be32_to_cpu(io_text[3]);
-
 	/* pre whitening but absorb kw2 */
-	io[0] ^= CAMELLIA_SUBKEY_L(0);
-	io[1] ^= CAMELLIA_SUBKEY_R(0);
+	io[0] = io_text[0] ^ SUBKEY_L(0);
+	io[1] = io_text[1] ^ SUBKEY_R(0);
+	io[2] = io_text[2];
+	io[3] = io_text[3];
 
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(2),CAMELLIA_SUBKEY_R(2),
+			 SUBKEY_L(2),SUBKEY_R(2),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(3),CAMELLIA_SUBKEY_R(3),
+			 SUBKEY_L(3),SUBKEY_R(3),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(4),CAMELLIA_SUBKEY_R(4),
+			 SUBKEY_L(4),SUBKEY_R(4),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(5),CAMELLIA_SUBKEY_R(5),
+			 SUBKEY_L(5),SUBKEY_R(5),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(6),CAMELLIA_SUBKEY_R(6),
+			 SUBKEY_L(6),SUBKEY_R(6),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(7),CAMELLIA_SUBKEY_R(7),
+			 SUBKEY_L(7),SUBKEY_R(7),
 			 io[0],io[1],il,ir,t0,t1);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     CAMELLIA_SUBKEY_L(8),CAMELLIA_SUBKEY_R(8),
-		     CAMELLIA_SUBKEY_L(9),CAMELLIA_SUBKEY_R(9),
+		     SUBKEY_L(8),SUBKEY_R(8),
+		     SUBKEY_L(9),SUBKEY_R(9),
 		     t0,t1,il,ir);
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(10),CAMELLIA_SUBKEY_R(10),
+			 SUBKEY_L(10),SUBKEY_R(10),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(11),CAMELLIA_SUBKEY_R(11),
+			 SUBKEY_L(11),SUBKEY_R(11),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(12),CAMELLIA_SUBKEY_R(12),
+			 SUBKEY_L(12),SUBKEY_R(12),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(13),CAMELLIA_SUBKEY_R(13),
+			 SUBKEY_L(13),SUBKEY_R(13),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(14),CAMELLIA_SUBKEY_R(14),
+			 SUBKEY_L(14),SUBKEY_R(14),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(15),CAMELLIA_SUBKEY_R(15),
+			 SUBKEY_L(15),SUBKEY_R(15),
 			 io[0],io[1],il,ir,t0,t1);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     CAMELLIA_SUBKEY_L(16),CAMELLIA_SUBKEY_R(16),
-		     CAMELLIA_SUBKEY_L(17),CAMELLIA_SUBKEY_R(17),
+		     SUBKEY_L(16),SUBKEY_R(16),
+		     SUBKEY_L(17),SUBKEY_R(17),
 		     t0,t1,il,ir);
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(18),CAMELLIA_SUBKEY_R(18),
+			 SUBKEY_L(18),SUBKEY_R(18),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(19),CAMELLIA_SUBKEY_R(19),
+			 SUBKEY_L(19),SUBKEY_R(19),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(20),CAMELLIA_SUBKEY_R(20),
+			 SUBKEY_L(20),SUBKEY_R(20),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(21),CAMELLIA_SUBKEY_R(21),
+			 SUBKEY_L(21),SUBKEY_R(21),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(22),CAMELLIA_SUBKEY_R(22),
+			 SUBKEY_L(22),SUBKEY_R(22),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(23),CAMELLIA_SUBKEY_R(23),
+			 SUBKEY_L(23),SUBKEY_R(23),
 			 io[0],io[1],il,ir,t0,t1);
 
 	/* post whitening but kw4 */
-	io[2] ^= CAMELLIA_SUBKEY_L(24);
-	io[3] ^= CAMELLIA_SUBKEY_R(24);
-
-	io_text[0] = cpu_to_be32(io[2]);
-	io_text[1] = cpu_to_be32(io[3]);
-	io_text[2] = cpu_to_be32(io[0]);
-	io_text[3] = cpu_to_be32(io[1]);
+	io_text[0] = io[2] ^ SUBKEY_L(24);
+	io_text[1] = io[3] ^ SUBKEY_R(24);
+	io_text[2] = io[0];
+	io_text[3] = io[1];
 }
 
-static void camellia_decrypt128(const u32 *subkey, __be32 *io_text)
+static void camellia_decrypt128(const u32 *subkey, u32 *io_text)
 {
-	u32 il,ir,t0,t1;               /* temporary valiables */
+	u32 il,ir,t0,t1;               /* temporary variables */
 
 	u32 io[4];
 
-	io[0] = be32_to_cpu(io_text[0]);
-	io[1] = be32_to_cpu(io_text[1]);
-	io[2] = be32_to_cpu(io_text[2]);
-	io[3] = be32_to_cpu(io_text[3]);
-
 	/* pre whitening but absorb kw2 */
-	io[0] ^= CAMELLIA_SUBKEY_L(24);
-	io[1] ^= CAMELLIA_SUBKEY_R(24);
+	io[0] = io_text[0] ^ SUBKEY_L(24);
+	io[1] = io_text[1] ^ SUBKEY_R(24);
+	io[2] = io_text[2];
+	io[3] = io_text[3];
 
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(23),CAMELLIA_SUBKEY_R(23),
+			 SUBKEY_L(23),SUBKEY_R(23),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(22),CAMELLIA_SUBKEY_R(22),
+			 SUBKEY_L(22),SUBKEY_R(22),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(21),CAMELLIA_SUBKEY_R(21),
+			 SUBKEY_L(21),SUBKEY_R(21),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(20),CAMELLIA_SUBKEY_R(20),
+			 SUBKEY_L(20),SUBKEY_R(20),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(19),CAMELLIA_SUBKEY_R(19),
+			 SUBKEY_L(19),SUBKEY_R(19),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(18),CAMELLIA_SUBKEY_R(18),
+			 SUBKEY_L(18),SUBKEY_R(18),
 			 io[0],io[1],il,ir,t0,t1);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     CAMELLIA_SUBKEY_L(17),CAMELLIA_SUBKEY_R(17),
-		     CAMELLIA_SUBKEY_L(16),CAMELLIA_SUBKEY_R(16),
+		     SUBKEY_L(17),SUBKEY_R(17),
+		     SUBKEY_L(16),SUBKEY_R(16),
 		     t0,t1,il,ir);
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(15),CAMELLIA_SUBKEY_R(15),
+			 SUBKEY_L(15),SUBKEY_R(15),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(14),CAMELLIA_SUBKEY_R(14),
+			 SUBKEY_L(14),SUBKEY_R(14),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(13),CAMELLIA_SUBKEY_R(13),
+			 SUBKEY_L(13),SUBKEY_R(13),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(12),CAMELLIA_SUBKEY_R(12),
+			 SUBKEY_L(12),SUBKEY_R(12),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(11),CAMELLIA_SUBKEY_R(11),
+			 SUBKEY_L(11),SUBKEY_R(11),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(10),CAMELLIA_SUBKEY_R(10),
+			 SUBKEY_L(10),SUBKEY_R(10),
 			 io[0],io[1],il,ir,t0,t1);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     CAMELLIA_SUBKEY_L(9),CAMELLIA_SUBKEY_R(9),
-		     CAMELLIA_SUBKEY_L(8),CAMELLIA_SUBKEY_R(8),
+		     SUBKEY_L(9),SUBKEY_R(9),
+		     SUBKEY_L(8),SUBKEY_R(8),
 		     t0,t1,il,ir);
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(7),CAMELLIA_SUBKEY_R(7),
+			 SUBKEY_L(7),SUBKEY_R(7),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(6),CAMELLIA_SUBKEY_R(6),
+			 SUBKEY_L(6),SUBKEY_R(6),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(5),CAMELLIA_SUBKEY_R(5),
+			 SUBKEY_L(5),SUBKEY_R(5),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(4),CAMELLIA_SUBKEY_R(4),
+			 SUBKEY_L(4),SUBKEY_R(4),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(3),CAMELLIA_SUBKEY_R(3),
+			 SUBKEY_L(3),SUBKEY_R(3),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(2),CAMELLIA_SUBKEY_R(2),
+			 SUBKEY_L(2),SUBKEY_R(2),
 			 io[0],io[1],il,ir,t0,t1);
 
 	/* post whitening but kw4 */
-	io[2] ^= CAMELLIA_SUBKEY_L(0);
-	io[3] ^= CAMELLIA_SUBKEY_R(0);
-
-	io_text[0] = cpu_to_be32(io[2]);
-	io_text[1] = cpu_to_be32(io[3]);
-	io_text[2] = cpu_to_be32(io[0]);
-	io_text[3] = cpu_to_be32(io[1]);
+	io_text[0] = io[2] ^ SUBKEY_L(0);
+	io_text[1] = io[3] ^ SUBKEY_R(0);
+	io_text[2] = io[0];
+	io_text[3] = io[1];
 }
 
-static void camellia_encrypt256(const u32 *subkey, __be32 *io_text)
+static void camellia_encrypt256(const u32 *subkey, u32 *io_text)
 {
-	u32 il,ir,t0,t1;           /* temporary valiables */
+	u32 il,ir,t0,t1;           /* temporary variables */
 
 	u32 io[4];
 
-	io[0] = be32_to_cpu(io_text[0]);
-	io[1] = be32_to_cpu(io_text[1]);
-	io[2] = be32_to_cpu(io_text[2]);
-	io[3] = be32_to_cpu(io_text[3]);
-
 	/* pre whitening but absorb kw2 */
-	io[0] ^= CAMELLIA_SUBKEY_L(0);
-	io[1] ^= CAMELLIA_SUBKEY_R(0);
+	io[0] = io_text[0] ^ SUBKEY_L(0);
+	io[1] = io_text[1] ^ SUBKEY_R(0);
+	io[2] = io_text[2];
+	io[3] = io_text[3];
 
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(2),CAMELLIA_SUBKEY_R(2),
+			 SUBKEY_L(2),SUBKEY_R(2),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(3),CAMELLIA_SUBKEY_R(3),
+			 SUBKEY_L(3),SUBKEY_R(3),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(4),CAMELLIA_SUBKEY_R(4),
+			 SUBKEY_L(4),SUBKEY_R(4),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(5),CAMELLIA_SUBKEY_R(5),
+			 SUBKEY_L(5),SUBKEY_R(5),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(6),CAMELLIA_SUBKEY_R(6),
+			 SUBKEY_L(6),SUBKEY_R(6),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(7),CAMELLIA_SUBKEY_R(7),
+			 SUBKEY_L(7),SUBKEY_R(7),
 			 io[0],io[1],il,ir,t0,t1);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     CAMELLIA_SUBKEY_L(8),CAMELLIA_SUBKEY_R(8),
-		     CAMELLIA_SUBKEY_L(9),CAMELLIA_SUBKEY_R(9),
+		     SUBKEY_L(8),SUBKEY_R(8),
+		     SUBKEY_L(9),SUBKEY_R(9),
 		     t0,t1,il,ir);
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(10),CAMELLIA_SUBKEY_R(10),
+			 SUBKEY_L(10),SUBKEY_R(10),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(11),CAMELLIA_SUBKEY_R(11),
+			 SUBKEY_L(11),SUBKEY_R(11),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(12),CAMELLIA_SUBKEY_R(12),
+			 SUBKEY_L(12),SUBKEY_R(12),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(13),CAMELLIA_SUBKEY_R(13),
+			 SUBKEY_L(13),SUBKEY_R(13),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(14),CAMELLIA_SUBKEY_R(14),
+			 SUBKEY_L(14),SUBKEY_R(14),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(15),CAMELLIA_SUBKEY_R(15),
+			 SUBKEY_L(15),SUBKEY_R(15),
 			 io[0],io[1],il,ir,t0,t1);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     CAMELLIA_SUBKEY_L(16),CAMELLIA_SUBKEY_R(16),
-		     CAMELLIA_SUBKEY_L(17),CAMELLIA_SUBKEY_R(17),
+		     SUBKEY_L(16),SUBKEY_R(16),
+		     SUBKEY_L(17),SUBKEY_R(17),
 		     t0,t1,il,ir);
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(18),CAMELLIA_SUBKEY_R(18),
+			 SUBKEY_L(18),SUBKEY_R(18),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(19),CAMELLIA_SUBKEY_R(19),
+			 SUBKEY_L(19),SUBKEY_R(19),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(20),CAMELLIA_SUBKEY_R(20),
+			 SUBKEY_L(20),SUBKEY_R(20),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(21),CAMELLIA_SUBKEY_R(21),
+			 SUBKEY_L(21),SUBKEY_R(21),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(22),CAMELLIA_SUBKEY_R(22),
+			 SUBKEY_L(22),SUBKEY_R(22),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(23),CAMELLIA_SUBKEY_R(23),
+			 SUBKEY_L(23),SUBKEY_R(23),
 			 io[0],io[1],il,ir,t0,t1);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     CAMELLIA_SUBKEY_L(24),CAMELLIA_SUBKEY_R(24),
-		     CAMELLIA_SUBKEY_L(25),CAMELLIA_SUBKEY_R(25),
+		     SUBKEY_L(24),SUBKEY_R(24),
+		     SUBKEY_L(25),SUBKEY_R(25),
 		     t0,t1,il,ir);
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(26),CAMELLIA_SUBKEY_R(26),
+			 SUBKEY_L(26),SUBKEY_R(26),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(27),CAMELLIA_SUBKEY_R(27),
+			 SUBKEY_L(27),SUBKEY_R(27),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(28),CAMELLIA_SUBKEY_R(28),
+			 SUBKEY_L(28),SUBKEY_R(28),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(29),CAMELLIA_SUBKEY_R(29),
+			 SUBKEY_L(29),SUBKEY_R(29),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(30),CAMELLIA_SUBKEY_R(30),
+			 SUBKEY_L(30),SUBKEY_R(30),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(31),CAMELLIA_SUBKEY_R(31),
+			 SUBKEY_L(31),SUBKEY_R(31),
 			 io[0],io[1],il,ir,t0,t1);
 
 	/* post whitening but kw4 */
-	io[2] ^= CAMELLIA_SUBKEY_L(32);
-	io[3] ^= CAMELLIA_SUBKEY_R(32);
-
-	io_text[0] = cpu_to_be32(io[2]);
-	io_text[1] = cpu_to_be32(io[3]);
-	io_text[2] = cpu_to_be32(io[0]);
-	io_text[3] = cpu_to_be32(io[1]);
+	io_text[0] = io[2] ^ SUBKEY_L(32);
+	io_text[1] = io[3] ^ SUBKEY_R(32);
+	io_text[2] = io[0];
+	io_text[3] = io[1];
 }
 
-static void camellia_decrypt256(const u32 *subkey, __be32 *io_text)
+static void camellia_decrypt256(const u32 *subkey, u32 *io_text)
 {
-	u32 il,ir,t0,t1;           /* temporary valiables */
+	u32 il,ir,t0,t1;           /* temporary variables */
 
 	u32 io[4];
 
-	io[0] = be32_to_cpu(io_text[0]);
-	io[1] = be32_to_cpu(io_text[1]);
-	io[2] = be32_to_cpu(io_text[2]);
-	io[3] = be32_to_cpu(io_text[3]);
-
 	/* pre whitening but absorb kw2 */
-	io[0] ^= CAMELLIA_SUBKEY_L(32);
-	io[1] ^= CAMELLIA_SUBKEY_R(32);
+	io[0] = io_text[0] ^ SUBKEY_L(32);
+	io[1] = io_text[1] ^ SUBKEY_R(32);
+	io[2] = io_text[2];
+	io[3] = io_text[3];
 
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(31),CAMELLIA_SUBKEY_R(31),
+			 SUBKEY_L(31),SUBKEY_R(31),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(30),CAMELLIA_SUBKEY_R(30),
+			 SUBKEY_L(30),SUBKEY_R(30),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(29),CAMELLIA_SUBKEY_R(29),
+			 SUBKEY_L(29),SUBKEY_R(29),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(28),CAMELLIA_SUBKEY_R(28),
+			 SUBKEY_L(28),SUBKEY_R(28),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(27),CAMELLIA_SUBKEY_R(27),
+			 SUBKEY_L(27),SUBKEY_R(27),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(26),CAMELLIA_SUBKEY_R(26),
+			 SUBKEY_L(26),SUBKEY_R(26),
 			 io[0],io[1],il,ir,t0,t1);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     CAMELLIA_SUBKEY_L(25),CAMELLIA_SUBKEY_R(25),
-		     CAMELLIA_SUBKEY_L(24),CAMELLIA_SUBKEY_R(24),
+		     SUBKEY_L(25),SUBKEY_R(25),
+		     SUBKEY_L(24),SUBKEY_R(24),
 		     t0,t1,il,ir);
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(23),CAMELLIA_SUBKEY_R(23),
+			 SUBKEY_L(23),SUBKEY_R(23),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(22),CAMELLIA_SUBKEY_R(22),
+			 SUBKEY_L(22),SUBKEY_R(22),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(21),CAMELLIA_SUBKEY_R(21),
+			 SUBKEY_L(21),SUBKEY_R(21),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(20),CAMELLIA_SUBKEY_R(20),
+			 SUBKEY_L(20),SUBKEY_R(20),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(19),CAMELLIA_SUBKEY_R(19),
+			 SUBKEY_L(19),SUBKEY_R(19),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(18),CAMELLIA_SUBKEY_R(18),
+			 SUBKEY_L(18),SUBKEY_R(18),
 			 io[0],io[1],il,ir,t0,t1);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     CAMELLIA_SUBKEY_L(17),CAMELLIA_SUBKEY_R(17),
-		     CAMELLIA_SUBKEY_L(16),CAMELLIA_SUBKEY_R(16),
+		     SUBKEY_L(17),SUBKEY_R(17),
+		     SUBKEY_L(16),SUBKEY_R(16),
 		     t0,t1,il,ir);
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(15),CAMELLIA_SUBKEY_R(15),
+			 SUBKEY_L(15),SUBKEY_R(15),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(14),CAMELLIA_SUBKEY_R(14),
+			 SUBKEY_L(14),SUBKEY_R(14),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(13),CAMELLIA_SUBKEY_R(13),
+			 SUBKEY_L(13),SUBKEY_R(13),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(12),CAMELLIA_SUBKEY_R(12),
+			 SUBKEY_L(12),SUBKEY_R(12),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(11),CAMELLIA_SUBKEY_R(11),
+			 SUBKEY_L(11),SUBKEY_R(11),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(10),CAMELLIA_SUBKEY_R(10),
+			 SUBKEY_L(10),SUBKEY_R(10),
 			 io[0],io[1],il,ir,t0,t1);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     CAMELLIA_SUBKEY_L(9),CAMELLIA_SUBKEY_R(9),
-		     CAMELLIA_SUBKEY_L(8),CAMELLIA_SUBKEY_R(8),
+		     SUBKEY_L(9),SUBKEY_R(9),
+		     SUBKEY_L(8),SUBKEY_R(8),
 		     t0,t1,il,ir);
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(7),CAMELLIA_SUBKEY_R(7),
+			 SUBKEY_L(7),SUBKEY_R(7),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(6),CAMELLIA_SUBKEY_R(6),
+			 SUBKEY_L(6),SUBKEY_R(6),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(5),CAMELLIA_SUBKEY_R(5),
+			 SUBKEY_L(5),SUBKEY_R(5),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(4),CAMELLIA_SUBKEY_R(4),
+			 SUBKEY_L(4),SUBKEY_R(4),
 			 io[0],io[1],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[0],io[1],
-			 CAMELLIA_SUBKEY_L(3),CAMELLIA_SUBKEY_R(3),
+			 SUBKEY_L(3),SUBKEY_R(3),
 			 io[2],io[3],il,ir,t0,t1);
 	CAMELLIA_ROUNDSM(io[2],io[3],
-			 CAMELLIA_SUBKEY_L(2),CAMELLIA_SUBKEY_R(2),
+			 SUBKEY_L(2),SUBKEY_R(2),
 			 io[0],io[1],il,ir,t0,t1);
 
 	/* post whitening but kw4 */
-	io[2] ^= CAMELLIA_SUBKEY_L(0);
-	io[3] ^= CAMELLIA_SUBKEY_R(0);
-
-	io_text[0] = cpu_to_be32(io[2]);
-	io_text[1] = cpu_to_be32(io[3]);
-	io_text[2] = cpu_to_be32(io[0]);
-	io_text[3] = cpu_to_be32(io[1]);
+	io_text[0] = io[2] ^ SUBKEY_L(0);
+	io_text[1] = io[3] ^ SUBKEY_R(0);
+	io_text[2] = io[0];
+	io_text[3] = io[1];
 }
 
 
@@ -1607,9 +1496,12 @@ static void camellia_encrypt(struct cryp
 	const __be32 *src = (const __be32 *)in;
 	__be32 *dst = (__be32 *)out;
 
-	__be32 tmp[4];
+	u32 tmp[4];
 
-	memcpy(tmp, src, CAMELLIA_BLOCK_SIZE);
+	tmp[0] = be32_to_cpu(src[0]);
+	tmp[1] = be32_to_cpu(src[1]);
+	tmp[2] = be32_to_cpu(src[2]);
+	tmp[3] = be32_to_cpu(src[3]);
 
 	switch (cctx->key_length) {
 	case 16:
@@ -1622,7 +1514,10 @@ static void camellia_encrypt(struct cryp
 		break;
 	}
 
-	memcpy(dst, tmp, CAMELLIA_BLOCK_SIZE);
+	dst[0] = cpu_to_be32(tmp[0]);
+	dst[1] = cpu_to_be32(tmp[1]);
+	dst[2] = cpu_to_be32(tmp[2]);
+	dst[3] = cpu_to_be32(tmp[3]);
 }
 
 static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
@@ -1631,9 +1526,12 @@ static void camellia_decrypt(struct cryp
 	const __be32 *src = (const __be32 *)in;
 	__be32 *dst = (__be32 *)out;
 
-	__be32 tmp[4];
+	u32 tmp[4];
 
-	memcpy(tmp, src, CAMELLIA_BLOCK_SIZE);
+	tmp[0] = be32_to_cpu(src[0]);
+	tmp[1] = be32_to_cpu(src[1]);
+	tmp[2] = be32_to_cpu(src[2]);
+	tmp[3] = be32_to_cpu(src[3]);
 
 	switch (cctx->key_length) {
 	case 16:
@@ -1646,7 +1544,10 @@ static void camellia_decrypt(struct cryp
 		break;
 	}
 
-	memcpy(dst, tmp, CAMELLIA_BLOCK_SIZE);
+	dst[0] = cpu_to_be32(tmp[0]);
+	dst[1] = cpu_to_be32(tmp[1]);
+	dst[2] = cpu_to_be32(tmp[2]);
+	dst[3] = cpu_to_be32(tmp[3]);
 }
 
 static struct crypto_alg camellia_alg = {

* [PATCH 3/5] camellia: cleanup
  2007-10-25 11:43 [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization Denys Vlasenko
  2007-10-25 11:45 ` [PATCH 1/5] camellia: cleanup Denys Vlasenko
  2007-10-25 11:45 ` [PATCH 2/5] " Denys Vlasenko
@ 2007-10-25 11:46 ` Denys Vlasenko
  2007-10-26  8:44   ` Noriaki TAKAMIYA
  2007-11-06 14:21   ` Herbert Xu
  2007-10-25 11:47 ` [PATCH 4/5] camellia: de-unrolling Denys Vlasenko
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-10-25 11:46 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto

[-- Attachment #1: Type: text/plain, Size: 416 bytes --]

On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> Hi Herbert,
> 
> Please review and maybe propagate upstream following patches.
> 
> camellia3.diff
>     Optimize GETU32 to use 4-byte memcpy (modern gcc will convert
>     such memcpy to single move instruction on i386).
>     Original GETU32 did four byte fetches, and shifted/XORed those.
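
A minimal userspace sketch of the two GETU32 variants. It assumes POSIX ntohl() as a stand-in for the kernel's be32_to_cpu(), and the helper names are hypothetical. It checks that the memcpy-based load yields the same big-endian word as the original shift/XOR form at every byte offset. Whether gcc actually emits a single move depends on the compiler version and target, so the sketch checks correctness only, not code generation.

```c
#include <arpa/inet.h>  /* ntohl(): big-endian (network) order to host order */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pre-patch GETU32: four byte fetches combined with shifts and XORs. */
static uint32_t getu32_bytes(const unsigned char *pt)
{
	return ((uint32_t)pt[0] << 24) ^ ((uint32_t)pt[1] << 16) ^
	       ((uint32_t)pt[2] << 8) ^ (uint32_t)pt[3];
}

/*
 * Post-patch GETU32: one 4-byte memcpy, which the compiler may lower to a
 * single load, then a big-endian to host-order conversion.
 * ntohl() stands in for the kernel's be32_to_cpu() in this userspace sketch.
 */
static uint32_t getu32_memcpy(const unsigned char *pt)
{
	uint32_t v;

	memcpy(&v, pt, sizeof(v));
	return ntohl(v);
}

/* Check that both loaders agree at every offset, aligned or not. */
static void check_getu32_equivalent(const unsigned char *buf, size_t len)
{
	for (size_t off = 0; off + 4 <= len; off++)
		assert(getu32_bytes(buf + off) == getu32_memcpy(buf + off));
}
```

Because memcpy() has no alignment requirement, the memcpy form stays valid for a key pointer that is not 4-byte aligned; i386 permits unaligned loads, so a single mov is a legal lowering there.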

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: camellia3.diff --]
[-- Type: text/x-diff, Size: 2113 bytes --]

--- linux-2.6.23.src/crypto/camellia2.c	2007-10-24 19:03:22.000000000 +0100
+++ linux-2.6.23.src/crypto/camellia.c	2007-10-24 19:03:27.000000000 +0100
@@ -330,10 +330,12 @@ static const u32 camellia_sp4404[256] = 
  *  macros
  */
 
-# define GETU32(pt) (((u32)(pt)[0] << 24)	\
-		     ^ ((u32)(pt)[1] << 16)	\
-		     ^ ((u32)(pt)[2] <<  8)	\
-		     ^ ((u32)(pt)[3]))
+# define GETU32(v, pt) \
+    do { \
+	/* latest breed of gcc is clever enough to use move */ \
+	memcpy(&(v), (pt), 4); \
+	(v) = be32_to_cpu(v); \
+    } while(0)
 
 /* rotation right shift 1byte */
 #define ROR8(x) (((x) >> 8) + ((x) << 24))
@@ -433,10 +435,11 @@ static void camellia_setup128(const unsi
 	/**
 	 *  k == kll || klr || krl || krr (|| is concatination)
 	 */
-	kll = GETU32(key     );
-	klr = GETU32(key +  4);
-	krl = GETU32(key +  8);
-	krr = GETU32(key + 12);
+	GETU32(kll, key     );
+	GETU32(klr, key +  4);
+	GETU32(krl, key +  8);
+	GETU32(krr, key + 12);
+
 	/**
 	 * generate KL dependent subkeys
 	 */
@@ -687,8 +690,8 @@ static void camellia_setup128(const unsi
 
 static void camellia_setup256(const unsigned char *key, u32 *subkey)
 {
-	u32 kll,klr,krl,krr;           /* left half of key */
-	u32 krll,krlr,krrl,krrr;       /* right half of key */
+	u32 kll, klr, krl, krr;        /* left half of key */
+	u32 krll, krlr, krrl, krrr;    /* right half of key */
 	u32 il, ir, t0, t1, w0, w1;    /* temporary variables */
 	u32 kw4l, kw4r, dw, tl, tr;
 	u32 subL[34];
@@ -698,14 +701,14 @@ static void camellia_setup256(const unsi
 	 *  key = (kll || klr || krl || krr || krll || krlr || krrl || krrr)
 	 *  (|| is concatination)
 	 */
-	kll  = GETU32(key     );
-	klr  = GETU32(key +  4);
-	krl  = GETU32(key +  8);
-	krr  = GETU32(key + 12);
-	krll = GETU32(key + 16);
-	krlr = GETU32(key + 20);
-	krrl = GETU32(key + 24);
-	krrr = GETU32(key + 28);
+	GETU32(kll,  key     );
+	GETU32(klr,  key +  4);
+	GETU32(krl,  key +  8);
+	GETU32(krr,  key + 12);
+	GETU32(krll, key + 16);
+	GETU32(krlr, key + 20);
+	GETU32(krrl, key + 24);
+	GETU32(krrr, key + 28);
 
 	/* generate KL dependent subkeys */
 	/* kw1 */

* [PATCH 4/5] camellia: de-unrolling
  2007-10-25 11:43 [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization Denys Vlasenko
                   ` (2 preceding siblings ...)
  2007-10-25 11:46 ` [PATCH 3/5] " Denys Vlasenko
@ 2007-10-25 11:47 ` Denys Vlasenko
  2007-10-26  8:45   ` Noriaki TAKAMIYA
  2007-11-06 14:21   ` Herbert Xu
  2007-10-25 11:48 ` [PATCH 5/5] camellia: de-unrolling, 64bit-ization Denys Vlasenko
  2007-10-25 11:57 ` [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization Denys Vlasenko
  5 siblings, 2 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-10-25 11:47 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> Hi Herbert,
> 
> Please review and maybe propagate upstream following patches.
> 
> camellia4.diff
>     Move huge unrolled pieces of code (3 screenfuls) at the end of
>     128/256 key setup routines into common camellia_setup_tail(),
>     convert it to loop there.
>     Loop is still unrolled six times, so performance hit is very small,
>     code size win is big.
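
A userspace sketch that checks the loop form touches the same subkey pairs as the unrolled 128-bit sequence: pairs 2-7, 10-15 and 18-23, skipping the FL/FL^-1 slots 8-9 and 16-17. It assumes ROL8() is a rotate left by 8 bits (its definition is not shown in this excerpt), and the helper names are hypothetical. For brevity the six steps per group are written here as an inner for loop, whereas camellia_setup_tail() spells them out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed rotate-left-by-8, written in the same style as the patch's ROR8(). */
#define ROL8(x) (((x) << 8) + ((x) >> 24))
#define SUBKEY_L(INDEX) (subkey[(INDEX) * 2])
#define SUBKEY_R(INDEX) (subkey[(INDEX) * 2 + 1])

/* One "inverse of the last half of the P-function" step on subkey pair idx. */
static void tail_step(uint32_t *subkey, int idx)
{
	uint32_t dw = SUBKEY_L(idx) ^ SUBKEY_R(idx);

	dw = ROL8(dw);
	SUBKEY_R(idx) = SUBKEY_L(idx) ^ dw;
	SUBKEY_L(idx) = dw;
}

/* Loop form: six steps per group, then skip the two FL/FL^-1 subkey slots. */
static void setup_tail_loop(uint32_t *subkey, int max)
{
	int i = 2;

	do {
		for (int j = 0; j < 6; j++)
			tail_step(subkey, i + j);
		i += 8;
	} while (i < max);
}

/* Unrolled reference for a 128-bit key: the index list the patch removes. */
static void setup_tail_unrolled128(uint32_t *subkey)
{
	static const int idx[] = { 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15,
				   18, 19, 20, 21, 22, 23 };

	for (unsigned k = 0; k < sizeof(idx) / sizeof(idx[0]); k++)
		tail_step(subkey, idx[k]);
}
```

With max = 32 the same loop runs one more group, covering pairs 26-31, which matches the removed 256-bit sequence.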

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: camellia4.diff --]
[-- Type: text/x-diff, Size: 6849 bytes --]

--- linux-2.6.23.src/crypto/camellia3.c	2007-10-24 19:03:27.000000000 +0100
+++ linux-2.6.23.src/crypto/camellia.c	2007-10-24 19:03:57.000000000 +0100
@@ -424,6 +424,27 @@ static const u32 camellia_sp4404[256] = 
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
+static void camellia_setup_tail(u32 *subkey, int max)
+{
+	u32 dw;
+	int i = 2;
+	do {
+		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
+		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
+		dw = SUBKEY_L(i + 1) ^ SUBKEY_R(i + 1); dw = ROL8(dw);/* round 2 */
+		SUBKEY_R(i + 1) = SUBKEY_L(i + 1) ^ dw; SUBKEY_L(i + 1) = dw;
+		dw = SUBKEY_L(i + 2) ^ SUBKEY_R(i + 2); dw = ROL8(dw);/* round 3 */
+		SUBKEY_R(i + 2) = SUBKEY_L(i + 2) ^ dw; SUBKEY_L(i + 2) = dw;
+		dw = SUBKEY_L(i + 3) ^ SUBKEY_R(i + 3); dw = ROL8(dw);/* round 4 */
+		SUBKEY_R(i + 3) = SUBKEY_L(i + 3) ^ dw; SUBKEY_L(i + 3) = dw;
+		dw = SUBKEY_L(i + 4) ^ SUBKEY_R(i + 4); dw = ROL8(dw);/* round 5 */
+		SUBKEY_R(i + 4) = SUBKEY_L(i + 4) ^ dw; SUBKEY_L(i + 4) = dw;
+		dw = SUBKEY_L(i + 5) ^ SUBKEY_R(i + 5); dw = ROL8(dw);/* round 6 */
+		SUBKEY_R(i + 5) = SUBKEY_L(i + 5) ^ dw; SUBKEY_L(i + 5) = dw;
+		i += 8;
+	} while (i < max);
+}
+
 static void camellia_setup128(const unsigned char *key, u32 *subkey)
 {
 	u32 kll, klr, krl, krr;
@@ -650,42 +671,7 @@ static void camellia_setup128(const unsi
 	SUBKEY_R(24) = subR[24] ^ subR[23];
 
 	/* apply the inverse of the last half of P-function */
-	dw = SUBKEY_L(2) ^ SUBKEY_R(2); dw = ROL8(dw);/* round 1 */
-	SUBKEY_R(2) = SUBKEY_L(2) ^ dw; SUBKEY_L(2) = dw;
-	dw = SUBKEY_L(3) ^ SUBKEY_R(3); dw = ROL8(dw);/* round 2 */
-	SUBKEY_R(3) = SUBKEY_L(3) ^ dw; SUBKEY_L(3) = dw;
-	dw = SUBKEY_L(4) ^ SUBKEY_R(4); dw = ROL8(dw);/* round 3 */
-	SUBKEY_R(4) = SUBKEY_L(4) ^ dw; SUBKEY_L(4) = dw;
-	dw = SUBKEY_L(5) ^ SUBKEY_R(5); dw = ROL8(dw);/* round 4 */
-	SUBKEY_R(5) = SUBKEY_L(5) ^ dw; SUBKEY_L(5) = dw;
-	dw = SUBKEY_L(6) ^ SUBKEY_R(6); dw = ROL8(dw);/* round 5 */
-	SUBKEY_R(6) = SUBKEY_L(6) ^ dw; SUBKEY_L(6) = dw;
-	dw = SUBKEY_L(7) ^ SUBKEY_R(7); dw = ROL8(dw);/* round 6 */
-	SUBKEY_R(7) = SUBKEY_L(7) ^ dw; SUBKEY_L(7) = dw;
-	dw = SUBKEY_L(10) ^ SUBKEY_R(10); dw = ROL8(dw);/* round 7 */
-	SUBKEY_R(10) = SUBKEY_L(10) ^ dw; SUBKEY_L(10) = dw;
-	dw = SUBKEY_L(11) ^ SUBKEY_R(11); dw = ROL8(dw);/* round 8 */
-	SUBKEY_R(11) = SUBKEY_L(11) ^ dw; SUBKEY_L(11) = dw;
-	dw = SUBKEY_L(12) ^ SUBKEY_R(12); dw = ROL8(dw);/* round 9 */
-	SUBKEY_R(12) = SUBKEY_L(12) ^ dw; SUBKEY_L(12) = dw;
-	dw = SUBKEY_L(13) ^ SUBKEY_R(13); dw = ROL8(dw);/* round 10 */
-	SUBKEY_R(13) = SUBKEY_L(13) ^ dw; SUBKEY_L(13) = dw;
-	dw = SUBKEY_L(14) ^ SUBKEY_R(14); dw = ROL8(dw);/* round 11 */
-	SUBKEY_R(14) = SUBKEY_L(14) ^ dw; SUBKEY_L(14) = dw;
-	dw = SUBKEY_L(15) ^ SUBKEY_R(15); dw = ROL8(dw);/* round 12 */
-	SUBKEY_R(15) = SUBKEY_L(15) ^ dw; SUBKEY_L(15) = dw;
-	dw = SUBKEY_L(18) ^ SUBKEY_R(18); dw = ROL8(dw);/* round 13 */
-	SUBKEY_R(18) = SUBKEY_L(18) ^ dw; SUBKEY_L(18) = dw;
-	dw = SUBKEY_L(19) ^ SUBKEY_R(19); dw = ROL8(dw);/* round 14 */
-	SUBKEY_R(19) = SUBKEY_L(19) ^ dw; SUBKEY_L(19) = dw;
-	dw = SUBKEY_L(20) ^ SUBKEY_R(20); dw = ROL8(dw);/* round 15 */
-	SUBKEY_R(20) = SUBKEY_L(20) ^ dw; SUBKEY_L(20) = dw;
-	dw = SUBKEY_L(21) ^ SUBKEY_R(21); dw = ROL8(dw);/* round 16 */
-	SUBKEY_R(21) = SUBKEY_L(21) ^ dw; SUBKEY_L(21) = dw;
-	dw = SUBKEY_L(22) ^ SUBKEY_R(22); dw = ROL8(dw);/* round 17 */
-	SUBKEY_R(22) = SUBKEY_L(22) ^ dw; SUBKEY_L(22) = dw;
-	dw = SUBKEY_L(23) ^ SUBKEY_R(23); dw = ROL8(dw);/* round 18 */
-	SUBKEY_R(23) = SUBKEY_L(23) ^ dw; SUBKEY_L(23) = dw;
+	camellia_setup_tail(subkey, 24);
 }
 
 static void camellia_setup256(const unsigned char *key, u32 *subkey)
@@ -995,54 +981,7 @@ static void camellia_setup256(const unsi
 	SUBKEY_R(32) = subR[32] ^ subR[31];
 
 	/* apply the inverse of the last half of P-function */
-	dw = SUBKEY_L(2) ^ SUBKEY_R(2); dw = ROL8(dw);/* round 1 */
-	SUBKEY_R(2) = SUBKEY_L(2) ^ dw; SUBKEY_L(2) = dw;
-	dw = SUBKEY_L(3) ^ SUBKEY_R(3); dw = ROL8(dw);/* round 2 */
-	SUBKEY_R(3) = SUBKEY_L(3) ^ dw; SUBKEY_L(3) = dw;
-	dw = SUBKEY_L(4) ^ SUBKEY_R(4); dw = ROL8(dw);/* round 3 */
-	SUBKEY_R(4) = SUBKEY_L(4) ^ dw; SUBKEY_L(4) = dw;
-	dw = SUBKEY_L(5) ^ SUBKEY_R(5); dw = ROL8(dw);/* round 4 */
-	SUBKEY_R(5) = SUBKEY_L(5) ^ dw; SUBKEY_L(5) = dw;
-	dw = SUBKEY_L(6) ^ SUBKEY_R(6); dw = ROL8(dw);/* round 5 */
-	SUBKEY_R(6) = SUBKEY_L(6) ^ dw; SUBKEY_L(6) = dw;
-	dw = SUBKEY_L(7) ^ SUBKEY_R(7); dw = ROL8(dw);/* round 6 */
-	SUBKEY_R(7) = SUBKEY_L(7) ^ dw; SUBKEY_L(7) = dw;
-	dw = SUBKEY_L(10) ^ SUBKEY_R(10); dw = ROL8(dw);/* round 7 */
-	SUBKEY_R(10) = SUBKEY_L(10) ^ dw; SUBKEY_L(10) = dw;
-	dw = SUBKEY_L(11) ^ SUBKEY_R(11); dw = ROL8(dw);/* round 8 */
-	SUBKEY_R(11) = SUBKEY_L(11) ^ dw; SUBKEY_L(11) = dw;
-	dw = SUBKEY_L(12) ^ SUBKEY_R(12); dw = ROL8(dw);/* round 9 */
-	SUBKEY_R(12) = SUBKEY_L(12) ^ dw; SUBKEY_L(12) = dw;
-	dw = SUBKEY_L(13) ^ SUBKEY_R(13); dw = ROL8(dw);/* round 10 */
-	SUBKEY_R(13) = SUBKEY_L(13) ^ dw; SUBKEY_L(13) = dw;
-	dw = SUBKEY_L(14) ^ SUBKEY_R(14); dw = ROL8(dw);/* round 11 */
-	SUBKEY_R(14) = SUBKEY_L(14) ^ dw; SUBKEY_L(14) = dw;
-	dw = SUBKEY_L(15) ^ SUBKEY_R(15); dw = ROL8(dw);/* round 12 */
-	SUBKEY_R(15) = SUBKEY_L(15) ^ dw; SUBKEY_L(15) = dw;
-	dw = SUBKEY_L(18) ^ SUBKEY_R(18); dw = ROL8(dw);/* round 13 */
-	SUBKEY_R(18) = SUBKEY_L(18) ^ dw; SUBKEY_L(18) = dw;
-	dw = SUBKEY_L(19) ^ SUBKEY_R(19); dw = ROL8(dw);/* round 14 */
-	SUBKEY_R(19) = SUBKEY_L(19) ^ dw; SUBKEY_L(19) = dw;
-	dw = SUBKEY_L(20) ^ SUBKEY_R(20); dw = ROL8(dw);/* round 15 */
-	SUBKEY_R(20) = SUBKEY_L(20) ^ dw; SUBKEY_L(20) = dw;
-	dw = SUBKEY_L(21) ^ SUBKEY_R(21); dw = ROL8(dw);/* round 16 */
-	SUBKEY_R(21) = SUBKEY_L(21) ^ dw; SUBKEY_L(21) = dw;
-	dw = SUBKEY_L(22) ^ SUBKEY_R(22); dw = ROL8(dw);/* round 17 */
-	SUBKEY_R(22) = SUBKEY_L(22) ^ dw; SUBKEY_L(22) = dw;
-	dw = SUBKEY_L(23) ^ SUBKEY_R(23); dw = ROL8(dw);/* round 18 */
-	SUBKEY_R(23) = SUBKEY_L(23) ^ dw; SUBKEY_L(23) = dw;
-	dw = SUBKEY_L(26) ^ SUBKEY_R(26); dw = ROL8(dw);/* round 19 */
-	SUBKEY_R(26) = SUBKEY_L(26) ^ dw; SUBKEY_L(26) = dw;
-	dw = SUBKEY_L(27) ^ SUBKEY_R(27); dw = ROL8(dw);/* round 20 */
-	SUBKEY_R(27) = SUBKEY_L(27) ^ dw; SUBKEY_L(27) = dw;
-	dw = SUBKEY_L(28) ^ SUBKEY_R(28); dw = ROL8(dw);/* round 21 */
-	SUBKEY_R(28) = SUBKEY_L(28) ^ dw; SUBKEY_L(28) = dw;
-	dw = SUBKEY_L(29) ^ SUBKEY_R(29); dw = ROL8(dw);/* round 22 */
-	SUBKEY_R(29) = SUBKEY_L(29) ^ dw; SUBKEY_L(29) = dw;
-	dw = SUBKEY_L(30) ^ SUBKEY_R(30); dw = ROL8(dw);/* round 23 */
-	SUBKEY_R(30) = SUBKEY_L(30) ^ dw; SUBKEY_L(30) = dw;
-	dw = SUBKEY_L(31) ^ SUBKEY_R(31); dw = ROL8(dw);/* round 24 */
-	SUBKEY_R(31) = SUBKEY_L(31) ^ dw; SUBKEY_L(31) = dw;
+	camellia_setup_tail(subkey, 32);
 }
 
 static void camellia_setup192(const unsigned char *key, u32 *subkey)

* [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-10-25 11:43 [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization Denys Vlasenko
                   ` (3 preceding siblings ...)
  2007-10-25 11:47 ` [PATCH 4/5] camellia: de-unrolling Denys Vlasenko
@ 2007-10-25 11:48 ` Denys Vlasenko
  2007-10-26  8:45   ` Noriaki TAKAMIYA
  2007-11-06 14:23   ` Herbert Xu
  2007-10-25 11:57 ` [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization Denys Vlasenko
  5 siblings, 2 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-10-25 11:48 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto

[-- Attachment #1: Type: text/plain, Size: 1001 bytes --]

On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> Hi Herbert,
> 
> Please review and maybe propagate upstream following patches.
> 
> camellia5.diff
>     Use alternative key setup implementation with mostly 64-bit ops
>     if BITS_PER_LONG >= 64. Both much smaller and much faster.
> 
>     Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
>     Code was similar, with just one additional if() we can use the same code.
> 
>     If CONFIG_CC_OPTIMIZE_FOR_SIZE is defined,
>     use loop in camellia_do_en/decrypt instead of unrolled code.
>     ~5% encrypt/decrypt slowdown.
> 
>     Replace (x & 0xff) with (u8)x, gcc is not smart enough to realize
>     that it can do (x & 0xff) this way (which is smaller at least on i386).
> 
>     Don't do (x & 0xff) in a few places where x cannot be > 255 anyway:
>         t1 = ir >> 16; v = camellia_sp0222[(t1 >> 8) & 0xff];
>     ir is u32, thus (t1 >> 8) is one byte!
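
A small userspace sketch of the two byte-extraction claims in the description. The helper names are hypothetical, and uint8_t stands in for the kernel's u8. It checks only that the masked, cast and unmasked forms compute the same table index. Whether the cast really gives smaller i386 code depends on the gcc version and is not something this sketch can show.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch byte extraction: shift, then mask with 0xff. */
static unsigned byte_by_mask(uint32_t x, unsigned shift)
{
	return (x >> shift) & 0xff;
}

/* Post-patch byte extraction: shift, then truncate with a (u8)-style cast. */
static unsigned byte_by_cast(uint32_t x, unsigned shift)
{
	return (uint8_t)(x >> shift);
}

/*
 * With t1 = ir >> 16 and ir a 32-bit value, t1 has at most 16 significant
 * bits, so t1 >> 8 is already below 256 and needs neither mask nor cast.
 */
static unsigned top_byte_unmasked(uint32_t ir)
{
	uint32_t t1 = ir >> 16;

	return t1 >> 8;
}
```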

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: camellia5.diff --]
[-- Type: text/x-diff, Size: 55724 bytes --]

--- linux-2.6.23.src/crypto/camellia4.c	2007-10-24 19:03:57.000000000 +0100
+++ linux-2.6.23.src/crypto/camellia.c	2007-10-25 11:57:16.000000000 +0100
@@ -36,6 +36,13 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 
+#if BITS_PER_LONG >= 64
+
+/* Use alternative implementation with mostly 64-bit ops */
+#include "camellia_64.c"
+
+#else
+
 static const u32 camellia_sp1110[256] = {
 	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
 	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
@@ -329,7 +336,6 @@ static const u32 camellia_sp4404[256] = 
 /*
  *  macros
  */
-
 # define GETU32(v, pt) \
     do { \
 	/* latest breed of gcc is clever enough to use move */ \
@@ -364,63 +370,28 @@ static const u32 camellia_sp4404[256] = 
     } while(0)
 
 
+/*
+ * Key setup
+ */
 #define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
     do {							\
 	il = xl ^ kl;						\
 	ir = xr ^ kr;						\
 	t0 = il >> 16;						\
 	t1 = ir >> 16;						\
-	yl = camellia_sp1110[ir & 0xff]				\
-	   ^ camellia_sp0222[(t1 >> 8) & 0xff]			\
-	   ^ camellia_sp3033[t1 & 0xff]				\
-	   ^ camellia_sp4404[(ir >> 8) & 0xff];			\
-	yr = camellia_sp1110[(t0 >> 8) & 0xff]			\
-	   ^ camellia_sp0222[t0 & 0xff]				\
-	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
-	   ^ camellia_sp4404[il & 0xff];			\
+	yl = camellia_sp1110[(u8)(ir     )]			\
+	   ^ camellia_sp0222[    (t1 >> 8)]			\
+	   ^ camellia_sp3033[(u8)(t1     )]			\
+	   ^ camellia_sp4404[(u8)(ir >> 8)];			\
+	yr = camellia_sp1110[    (t0 >> 8)]			\
+	   ^ camellia_sp0222[(u8)(t0     )]			\
+	   ^ camellia_sp3033[(u8)(il >> 8)]			\
+	   ^ camellia_sp4404[(u8)(il     )];			\
 	yl ^= yr;						\
 	yr = ROR8(yr);						\
 	yr ^= yl;						\
     } while(0)
 
-
-/*
- * for speed up
- *
- */
-#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
-    do {								\
-	t0 = kll;							\
-	t2 = krr;							\
-	t0 &= ll;							\
-	t2 |= rr;							\
-	rl ^= t2;							\
-	lr ^= ROL1(t0);							\
-	t3 = krl;							\
-	t1 = klr;							\
-	t3 &= rl;							\
-	t1 |= lr;							\
-	ll ^= t1;							\
-	rr ^= ROL1(t3);							\
-    } while(0)
-
-#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-    do {								\
-	ir =  camellia_sp1110[xr & 0xff];				\
-	il =  camellia_sp1110[(xl>>24) & 0xff];				\
-	ir ^= camellia_sp0222[(xr>>24) & 0xff];				\
-	il ^= camellia_sp0222[(xl>>16) & 0xff];				\
-	ir ^= camellia_sp3033[(xr>>16) & 0xff];				\
-	il ^= camellia_sp3033[(xl>>8) & 0xff];				\
-	ir ^= camellia_sp4404[(xr>>8) & 0xff];				\
-	il ^= camellia_sp4404[xl & 0xff];				\
-	il ^= kl;							\
-	ir ^= il ^ kr;							\
-	yl ^= ir;							\
-	yr ^= ROR8(il) ^ ir;						\
-    } while(0)
-
-
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
@@ -622,7 +593,7 @@ static void camellia_setup128(const unsi
 	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
 	SUBKEY_R(6) = subR[5] ^ subR[7];
 	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
+	dw = tl & subL[8];  /* FL(kl1) */
 		tr = subR[10] ^ ROL1(dw);
 	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
 	SUBKEY_R(7) = subR[6] ^ tr;
@@ -1000,400 +971,173 @@ static void camellia_setup192(const unsi
 }
 
 
-static void camellia_encrypt128(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;               /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
-
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
+/*
+ * Encrypt/decrypt
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll;							\
+	t2 = krr;							\
+	t0 &= ll;							\
+	t2 |= rr;							\
+	rl ^= t2;							\
+	lr ^= ROL1(t0);							\
+	t3 = krl;							\
+	t1 = klr;							\
+	t3 &= rl;							\
+	t1 |= lr;							\
+	ll ^= t1;							\
+	rr ^= ROL1(t3);							\
+    } while(0)
 
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(24);
-	io_text[1] = io[3] ^ SUBKEY_R(24);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir)		\
+    do {								\
+	ir =  camellia_sp1110[(u8)xr];					\
+	il =  camellia_sp1110[    (xl >> 24)];				\
+	ir ^= camellia_sp0222[    (xr >> 24)];				\
+	il ^= camellia_sp0222[(u8)(xl >> 16)];				\
+	ir ^= camellia_sp3033[(u8)(xr >> 16)];				\
+	il ^= camellia_sp3033[(u8)(xl >> 8)];				\
+	ir ^= camellia_sp4404[(u8)(xr >> 8)];				\
+	il ^= camellia_sp4404[(u8)xl];					\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= ROR8(il) ^ ir;						\
+    } while(0)
 
-static void camellia_decrypt128(const u32 *subkey, u32 *io_text)
+/* max = 24: 128-bit key, max = 32: 192- or 256-bit key */
+static void camellia_do_encrypt(const u32 *subkey, u32 *io, unsigned max)
 {
 	u32 il,ir,t0,t1;               /* temporary variables */
 
-	u32 io[4];
-
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(24);
-	io[1] = io_text[1] ^ SUBKEY_R(24);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(0);
+	io[1] ^= SUBKEY_R(0);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
-
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
-
-static void camellia_encrypt256(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     t0,t1,il,ir); \
+} while (0)
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+	{
+		unsigned i = 0;
+		while (1) {
+			ROUNDS(i);
+			i += 8;
+			if (i >= max)
+				break;
+			FLS(i);
+		}
+	}
+#else
+	ROUNDS(0);
+	FLS(8);
+	ROUNDS(8);
+	FLS(16);
+	ROUNDS(16);
+	if (max == 32) {
+		FLS(24);
+		ROUNDS(24);
+	}
+#endif
 
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[0],io[1],il,ir,t0,t1);
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(32);
-	io_text[1] = io[3] ^ SUBKEY_R(32);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(max);
+	io[3] ^= SUBKEY_R(max);
+	/* NB: io[0],[1] should be swapped with [2],[3] by caller! */
 }
 
-static void camellia_decrypt256(const u32 *subkey, u32 *io_text)
+static void camellia_do_decrypt(const u32 *subkey, u32 *io, unsigned i)
 {
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
+	u32 il,ir,t0,t1;               /* temporary variables */
 
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(32);
-	io[1] = io_text[1] ^ SUBKEY_R(32);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(i);
+	io[1] ^= SUBKEY_R(i);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     t0,t1,il,ir); \
+} while (0)
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+	while (1) {
+		i -= 8;
+		ROUNDS(i);
+		if (i == 0)
+			break;
+		FLS(i);
+	}
+#else
+	if (i == 32) {
+		ROUNDS(24);
+		FLS(24);
+	}
+	ROUNDS(16);
+	FLS(16);
+	ROUNDS(8);
+	FLS(8);
+	ROUNDS(0);
+#endif
+
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(0);
+	io[3] ^= SUBKEY_R(0);
+	/* NB: io[0],[1] should be swapped with [2],[3] by caller! */
 }
 
 
@@ -1445,21 +1189,15 @@ static void camellia_encrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_encrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_encrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_encrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* 32 for 24- and 32-byte keys */
+	);
+
+	/* do_encrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
@@ -1475,21 +1213,15 @@ static void camellia_decrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_decrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_decrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_decrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* 32 for 24- and 32-byte keys */
+	);
+
+	/* do_decrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static struct crypto_alg camellia_alg = {
@@ -1528,3 +1260,5 @@ module_exit(camellia_fini);
 
 MODULE_DESCRIPTION("Camellia Cipher Algorithm");
 MODULE_LICENSE("GPL");
+
+#endif /* if BITS_PER_LONG < 64 */
--- /dev/null	2006-05-22 15:25:23.000000000 +0100
+++ linux-2.6.23.src/crypto/camellia_64.c	2007-10-25 12:32:16.000000000 +0100
@@ -0,0 +1,1172 @@
+/*
+ * Copyright (C) 2006
+ * NTT (Nippon Telegraph and Telephone Corporation).
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ */
+
+/*
+ * Algorithm Specification
+ *  http://info.isl.ntt.co.jp/crypt/eng/camellia/specifications.html
+ */
+
+/*
+ *
+ * NOTE --- NOTE --- NOTE --- NOTE
+ * This implementation assumes that all memory addresses passed
+ * as parameters are four-byte aligned.
+ *
+ */
+
+/* #included from camellia.c if long is 64bit */
+
+/*
+#include <linux/crypto.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+*/
+
+static const u32 camellia_sp1110[256] = {
+	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
+	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
+	0xe4e4e400,0x85858500,0x57575700,0x35353500,
+	0xeaeaea00,0x0c0c0c00,0xaeaeae00,0x41414100,
+	0x23232300,0xefefef00,0x6b6b6b00,0x93939300,
+	0x45454500,0x19191900,0xa5a5a500,0x21212100,
+	0xededed00,0x0e0e0e00,0x4f4f4f00,0x4e4e4e00,
+	0x1d1d1d00,0x65656500,0x92929200,0xbdbdbd00,
+	0x86868600,0xb8b8b800,0xafafaf00,0x8f8f8f00,
+	0x7c7c7c00,0xebebeb00,0x1f1f1f00,0xcecece00,
+	0x3e3e3e00,0x30303000,0xdcdcdc00,0x5f5f5f00,
+	0x5e5e5e00,0xc5c5c500,0x0b0b0b00,0x1a1a1a00,
+	0xa6a6a600,0xe1e1e100,0x39393900,0xcacaca00,
+	0xd5d5d500,0x47474700,0x5d5d5d00,0x3d3d3d00,
+	0xd9d9d900,0x01010100,0x5a5a5a00,0xd6d6d600,
+	0x51515100,0x56565600,0x6c6c6c00,0x4d4d4d00,
+	0x8b8b8b00,0x0d0d0d00,0x9a9a9a00,0x66666600,
+	0xfbfbfb00,0xcccccc00,0xb0b0b000,0x2d2d2d00,
+	0x74747400,0x12121200,0x2b2b2b00,0x20202000,
+	0xf0f0f000,0xb1b1b100,0x84848400,0x99999900,
+	0xdfdfdf00,0x4c4c4c00,0xcbcbcb00,0xc2c2c200,
+	0x34343400,0x7e7e7e00,0x76767600,0x05050500,
+	0x6d6d6d00,0xb7b7b700,0xa9a9a900,0x31313100,
+	0xd1d1d100,0x17171700,0x04040400,0xd7d7d700,
+	0x14141400,0x58585800,0x3a3a3a00,0x61616100,
+	0xdedede00,0x1b1b1b00,0x11111100,0x1c1c1c00,
+	0x32323200,0x0f0f0f00,0x9c9c9c00,0x16161600,
+	0x53535300,0x18181800,0xf2f2f200,0x22222200,
+	0xfefefe00,0x44444400,0xcfcfcf00,0xb2b2b200,
+	0xc3c3c300,0xb5b5b500,0x7a7a7a00,0x91919100,
+	0x24242400,0x08080800,0xe8e8e800,0xa8a8a800,
+	0x60606000,0xfcfcfc00,0x69696900,0x50505000,
+	0xaaaaaa00,0xd0d0d000,0xa0a0a000,0x7d7d7d00,
+	0xa1a1a100,0x89898900,0x62626200,0x97979700,
+	0x54545400,0x5b5b5b00,0x1e1e1e00,0x95959500,
+	0xe0e0e000,0xffffff00,0x64646400,0xd2d2d200,
+	0x10101000,0xc4c4c400,0x00000000,0x48484800,
+	0xa3a3a300,0xf7f7f700,0x75757500,0xdbdbdb00,
+	0x8a8a8a00,0x03030300,0xe6e6e600,0xdadada00,
+	0x09090900,0x3f3f3f00,0xdddddd00,0x94949400,
+	0x87878700,0x5c5c5c00,0x83838300,0x02020200,
+	0xcdcdcd00,0x4a4a4a00,0x90909000,0x33333300,
+	0x73737300,0x67676700,0xf6f6f600,0xf3f3f300,
+	0x9d9d9d00,0x7f7f7f00,0xbfbfbf00,0xe2e2e200,
+	0x52525200,0x9b9b9b00,0xd8d8d800,0x26262600,
+	0xc8c8c800,0x37373700,0xc6c6c600,0x3b3b3b00,
+	0x81818100,0x96969600,0x6f6f6f00,0x4b4b4b00,
+	0x13131300,0xbebebe00,0x63636300,0x2e2e2e00,
+	0xe9e9e900,0x79797900,0xa7a7a700,0x8c8c8c00,
+	0x9f9f9f00,0x6e6e6e00,0xbcbcbc00,0x8e8e8e00,
+	0x29292900,0xf5f5f500,0xf9f9f900,0xb6b6b600,
+	0x2f2f2f00,0xfdfdfd00,0xb4b4b400,0x59595900,
+	0x78787800,0x98989800,0x06060600,0x6a6a6a00,
+	0xe7e7e700,0x46464600,0x71717100,0xbababa00,
+	0xd4d4d400,0x25252500,0xababab00,0x42424200,
+	0x88888800,0xa2a2a200,0x8d8d8d00,0xfafafa00,
+	0x72727200,0x07070700,0xb9b9b900,0x55555500,
+	0xf8f8f800,0xeeeeee00,0xacacac00,0x0a0a0a00,
+	0x36363600,0x49494900,0x2a2a2a00,0x68686800,
+	0x3c3c3c00,0x38383800,0xf1f1f100,0xa4a4a400,
+	0x40404000,0x28282800,0xd3d3d300,0x7b7b7b00,
+	0xbbbbbb00,0xc9c9c900,0x43434300,0xc1c1c100,
+	0x15151500,0xe3e3e300,0xadadad00,0xf4f4f400,
+	0x77777700,0xc7c7c700,0x80808000,0x9e9e9e00,
+};
+
+static const u32 camellia_sp0222[256] = {
+	0x00e0e0e0,0x00050505,0x00585858,0x00d9d9d9,
+	0x00676767,0x004e4e4e,0x00818181,0x00cbcbcb,
+	0x00c9c9c9,0x000b0b0b,0x00aeaeae,0x006a6a6a,
+	0x00d5d5d5,0x00181818,0x005d5d5d,0x00828282,
+	0x00464646,0x00dfdfdf,0x00d6d6d6,0x00272727,
+	0x008a8a8a,0x00323232,0x004b4b4b,0x00424242,
+	0x00dbdbdb,0x001c1c1c,0x009e9e9e,0x009c9c9c,
+	0x003a3a3a,0x00cacaca,0x00252525,0x007b7b7b,
+	0x000d0d0d,0x00717171,0x005f5f5f,0x001f1f1f,
+	0x00f8f8f8,0x00d7d7d7,0x003e3e3e,0x009d9d9d,
+	0x007c7c7c,0x00606060,0x00b9b9b9,0x00bebebe,
+	0x00bcbcbc,0x008b8b8b,0x00161616,0x00343434,
+	0x004d4d4d,0x00c3c3c3,0x00727272,0x00959595,
+	0x00ababab,0x008e8e8e,0x00bababa,0x007a7a7a,
+	0x00b3b3b3,0x00020202,0x00b4b4b4,0x00adadad,
+	0x00a2a2a2,0x00acacac,0x00d8d8d8,0x009a9a9a,
+	0x00171717,0x001a1a1a,0x00353535,0x00cccccc,
+	0x00f7f7f7,0x00999999,0x00616161,0x005a5a5a,
+	0x00e8e8e8,0x00242424,0x00565656,0x00404040,
+	0x00e1e1e1,0x00636363,0x00090909,0x00333333,
+	0x00bfbfbf,0x00989898,0x00979797,0x00858585,
+	0x00686868,0x00fcfcfc,0x00ececec,0x000a0a0a,
+	0x00dadada,0x006f6f6f,0x00535353,0x00626262,
+	0x00a3a3a3,0x002e2e2e,0x00080808,0x00afafaf,
+	0x00282828,0x00b0b0b0,0x00747474,0x00c2c2c2,
+	0x00bdbdbd,0x00363636,0x00222222,0x00383838,
+	0x00646464,0x001e1e1e,0x00393939,0x002c2c2c,
+	0x00a6a6a6,0x00303030,0x00e5e5e5,0x00444444,
+	0x00fdfdfd,0x00888888,0x009f9f9f,0x00656565,
+	0x00878787,0x006b6b6b,0x00f4f4f4,0x00232323,
+	0x00484848,0x00101010,0x00d1d1d1,0x00515151,
+	0x00c0c0c0,0x00f9f9f9,0x00d2d2d2,0x00a0a0a0,
+	0x00555555,0x00a1a1a1,0x00414141,0x00fafafa,
+	0x00434343,0x00131313,0x00c4c4c4,0x002f2f2f,
+	0x00a8a8a8,0x00b6b6b6,0x003c3c3c,0x002b2b2b,
+	0x00c1c1c1,0x00ffffff,0x00c8c8c8,0x00a5a5a5,
+	0x00202020,0x00898989,0x00000000,0x00909090,
+	0x00474747,0x00efefef,0x00eaeaea,0x00b7b7b7,
+	0x00151515,0x00060606,0x00cdcdcd,0x00b5b5b5,
+	0x00121212,0x007e7e7e,0x00bbbbbb,0x00292929,
+	0x000f0f0f,0x00b8b8b8,0x00070707,0x00040404,
+	0x009b9b9b,0x00949494,0x00212121,0x00666666,
+	0x00e6e6e6,0x00cecece,0x00ededed,0x00e7e7e7,
+	0x003b3b3b,0x00fefefe,0x007f7f7f,0x00c5c5c5,
+	0x00a4a4a4,0x00373737,0x00b1b1b1,0x004c4c4c,
+	0x00919191,0x006e6e6e,0x008d8d8d,0x00767676,
+	0x00030303,0x002d2d2d,0x00dedede,0x00969696,
+	0x00262626,0x007d7d7d,0x00c6c6c6,0x005c5c5c,
+	0x00d3d3d3,0x00f2f2f2,0x004f4f4f,0x00191919,
+	0x003f3f3f,0x00dcdcdc,0x00797979,0x001d1d1d,
+	0x00525252,0x00ebebeb,0x00f3f3f3,0x006d6d6d,
+	0x005e5e5e,0x00fbfbfb,0x00696969,0x00b2b2b2,
+	0x00f0f0f0,0x00313131,0x000c0c0c,0x00d4d4d4,
+	0x00cfcfcf,0x008c8c8c,0x00e2e2e2,0x00757575,
+	0x00a9a9a9,0x004a4a4a,0x00575757,0x00848484,
+	0x00111111,0x00454545,0x001b1b1b,0x00f5f5f5,
+	0x00e4e4e4,0x000e0e0e,0x00737373,0x00aaaaaa,
+	0x00f1f1f1,0x00dddddd,0x00595959,0x00141414,
+	0x006c6c6c,0x00929292,0x00545454,0x00d0d0d0,
+	0x00787878,0x00707070,0x00e3e3e3,0x00494949,
+	0x00808080,0x00505050,0x00a7a7a7,0x00f6f6f6,
+	0x00777777,0x00939393,0x00868686,0x00838383,
+	0x002a2a2a,0x00c7c7c7,0x005b5b5b,0x00e9e9e9,
+	0x00eeeeee,0x008f8f8f,0x00010101,0x003d3d3d,
+};
+
+static const u32 camellia_sp3033[256] = {
+	0x38003838,0x41004141,0x16001616,0x76007676,
+	0xd900d9d9,0x93009393,0x60006060,0xf200f2f2,
+	0x72007272,0xc200c2c2,0xab00abab,0x9a009a9a,
+	0x75007575,0x06000606,0x57005757,0xa000a0a0,
+	0x91009191,0xf700f7f7,0xb500b5b5,0xc900c9c9,
+	0xa200a2a2,0x8c008c8c,0xd200d2d2,0x90009090,
+	0xf600f6f6,0x07000707,0xa700a7a7,0x27002727,
+	0x8e008e8e,0xb200b2b2,0x49004949,0xde00dede,
+	0x43004343,0x5c005c5c,0xd700d7d7,0xc700c7c7,
+	0x3e003e3e,0xf500f5f5,0x8f008f8f,0x67006767,
+	0x1f001f1f,0x18001818,0x6e006e6e,0xaf00afaf,
+	0x2f002f2f,0xe200e2e2,0x85008585,0x0d000d0d,
+	0x53005353,0xf000f0f0,0x9c009c9c,0x65006565,
+	0xea00eaea,0xa300a3a3,0xae00aeae,0x9e009e9e,
+	0xec00ecec,0x80008080,0x2d002d2d,0x6b006b6b,
+	0xa800a8a8,0x2b002b2b,0x36003636,0xa600a6a6,
+	0xc500c5c5,0x86008686,0x4d004d4d,0x33003333,
+	0xfd00fdfd,0x66006666,0x58005858,0x96009696,
+	0x3a003a3a,0x09000909,0x95009595,0x10001010,
+	0x78007878,0xd800d8d8,0x42004242,0xcc00cccc,
+	0xef00efef,0x26002626,0xe500e5e5,0x61006161,
+	0x1a001a1a,0x3f003f3f,0x3b003b3b,0x82008282,
+	0xb600b6b6,0xdb00dbdb,0xd400d4d4,0x98009898,
+	0xe800e8e8,0x8b008b8b,0x02000202,0xeb00ebeb,
+	0x0a000a0a,0x2c002c2c,0x1d001d1d,0xb000b0b0,
+	0x6f006f6f,0x8d008d8d,0x88008888,0x0e000e0e,
+	0x19001919,0x87008787,0x4e004e4e,0x0b000b0b,
+	0xa900a9a9,0x0c000c0c,0x79007979,0x11001111,
+	0x7f007f7f,0x22002222,0xe700e7e7,0x59005959,
+	0xe100e1e1,0xda00dada,0x3d003d3d,0xc800c8c8,
+	0x12001212,0x04000404,0x74007474,0x54005454,
+	0x30003030,0x7e007e7e,0xb400b4b4,0x28002828,
+	0x55005555,0x68006868,0x50005050,0xbe00bebe,
+	0xd000d0d0,0xc400c4c4,0x31003131,0xcb00cbcb,
+	0x2a002a2a,0xad00adad,0x0f000f0f,0xca00caca,
+	0x70007070,0xff00ffff,0x32003232,0x69006969,
+	0x08000808,0x62006262,0x00000000,0x24002424,
+	0xd100d1d1,0xfb00fbfb,0xba00baba,0xed00eded,
+	0x45004545,0x81008181,0x73007373,0x6d006d6d,
+	0x84008484,0x9f009f9f,0xee00eeee,0x4a004a4a,
+	0xc300c3c3,0x2e002e2e,0xc100c1c1,0x01000101,
+	0xe600e6e6,0x25002525,0x48004848,0x99009999,
+	0xb900b9b9,0xb300b3b3,0x7b007b7b,0xf900f9f9,
+	0xce00cece,0xbf00bfbf,0xdf00dfdf,0x71007171,
+	0x29002929,0xcd00cdcd,0x6c006c6c,0x13001313,
+	0x64006464,0x9b009b9b,0x63006363,0x9d009d9d,
+	0xc000c0c0,0x4b004b4b,0xb700b7b7,0xa500a5a5,
+	0x89008989,0x5f005f5f,0xb100b1b1,0x17001717,
+	0xf400f4f4,0xbc00bcbc,0xd300d3d3,0x46004646,
+	0xcf00cfcf,0x37003737,0x5e005e5e,0x47004747,
+	0x94009494,0xfa00fafa,0xfc00fcfc,0x5b005b5b,
+	0x97009797,0xfe00fefe,0x5a005a5a,0xac00acac,
+	0x3c003c3c,0x4c004c4c,0x03000303,0x35003535,
+	0xf300f3f3,0x23002323,0xb800b8b8,0x5d005d5d,
+	0x6a006a6a,0x92009292,0xd500d5d5,0x21002121,
+	0x44004444,0x51005151,0xc600c6c6,0x7d007d7d,
+	0x39003939,0x83008383,0xdc00dcdc,0xaa00aaaa,
+	0x7c007c7c,0x77007777,0x56005656,0x05000505,
+	0x1b001b1b,0xa400a4a4,0x15001515,0x34003434,
+	0x1e001e1e,0x1c001c1c,0xf800f8f8,0x52005252,
+	0x20002020,0x14001414,0xe900e9e9,0xbd00bdbd,
+	0xdd00dddd,0xe400e4e4,0xa100a1a1,0xe000e0e0,
+	0x8a008a8a,0xf100f1f1,0xd600d6d6,0x7a007a7a,
+	0xbb00bbbb,0xe300e3e3,0x40004040,0x4f004f4f,
+};
+
+static const u32 camellia_sp4404[256] = {
+	0x70700070,0x2c2c002c,0xb3b300b3,0xc0c000c0,
+	0xe4e400e4,0x57570057,0xeaea00ea,0xaeae00ae,
+	0x23230023,0x6b6b006b,0x45450045,0xa5a500a5,
+	0xeded00ed,0x4f4f004f,0x1d1d001d,0x92920092,
+	0x86860086,0xafaf00af,0x7c7c007c,0x1f1f001f,
+	0x3e3e003e,0xdcdc00dc,0x5e5e005e,0x0b0b000b,
+	0xa6a600a6,0x39390039,0xd5d500d5,0x5d5d005d,
+	0xd9d900d9,0x5a5a005a,0x51510051,0x6c6c006c,
+	0x8b8b008b,0x9a9a009a,0xfbfb00fb,0xb0b000b0,
+	0x74740074,0x2b2b002b,0xf0f000f0,0x84840084,
+	0xdfdf00df,0xcbcb00cb,0x34340034,0x76760076,
+	0x6d6d006d,0xa9a900a9,0xd1d100d1,0x04040004,
+	0x14140014,0x3a3a003a,0xdede00de,0x11110011,
+	0x32320032,0x9c9c009c,0x53530053,0xf2f200f2,
+	0xfefe00fe,0xcfcf00cf,0xc3c300c3,0x7a7a007a,
+	0x24240024,0xe8e800e8,0x60600060,0x69690069,
+	0xaaaa00aa,0xa0a000a0,0xa1a100a1,0x62620062,
+	0x54540054,0x1e1e001e,0xe0e000e0,0x64640064,
+	0x10100010,0x00000000,0xa3a300a3,0x75750075,
+	0x8a8a008a,0xe6e600e6,0x09090009,0xdddd00dd,
+	0x87870087,0x83830083,0xcdcd00cd,0x90900090,
+	0x73730073,0xf6f600f6,0x9d9d009d,0xbfbf00bf,
+	0x52520052,0xd8d800d8,0xc8c800c8,0xc6c600c6,
+	0x81810081,0x6f6f006f,0x13130013,0x63630063,
+	0xe9e900e9,0xa7a700a7,0x9f9f009f,0xbcbc00bc,
+	0x29290029,0xf9f900f9,0x2f2f002f,0xb4b400b4,
+	0x78780078,0x06060006,0xe7e700e7,0x71710071,
+	0xd4d400d4,0xabab00ab,0x88880088,0x8d8d008d,
+	0x72720072,0xb9b900b9,0xf8f800f8,0xacac00ac,
+	0x36360036,0x2a2a002a,0x3c3c003c,0xf1f100f1,
+	0x40400040,0xd3d300d3,0xbbbb00bb,0x43430043,
+	0x15150015,0xadad00ad,0x77770077,0x80800080,
+	0x82820082,0xecec00ec,0x27270027,0xe5e500e5,
+	0x85850085,0x35350035,0x0c0c000c,0x41410041,
+	0xefef00ef,0x93930093,0x19190019,0x21210021,
+	0x0e0e000e,0x4e4e004e,0x65650065,0xbdbd00bd,
+	0xb8b800b8,0x8f8f008f,0xebeb00eb,0xcece00ce,
+	0x30300030,0x5f5f005f,0xc5c500c5,0x1a1a001a,
+	0xe1e100e1,0xcaca00ca,0x47470047,0x3d3d003d,
+	0x01010001,0xd6d600d6,0x56560056,0x4d4d004d,
+	0x0d0d000d,0x66660066,0xcccc00cc,0x2d2d002d,
+	0x12120012,0x20200020,0xb1b100b1,0x99990099,
+	0x4c4c004c,0xc2c200c2,0x7e7e007e,0x05050005,
+	0xb7b700b7,0x31310031,0x17170017,0xd7d700d7,
+	0x58580058,0x61610061,0x1b1b001b,0x1c1c001c,
+	0x0f0f000f,0x16160016,0x18180018,0x22220022,
+	0x44440044,0xb2b200b2,0xb5b500b5,0x91910091,
+	0x08080008,0xa8a800a8,0xfcfc00fc,0x50500050,
+	0xd0d000d0,0x7d7d007d,0x89890089,0x97970097,
+	0x5b5b005b,0x95950095,0xffff00ff,0xd2d200d2,
+	0xc4c400c4,0x48480048,0xf7f700f7,0xdbdb00db,
+	0x03030003,0xdada00da,0x3f3f003f,0x94940094,
+	0x5c5c005c,0x02020002,0x4a4a004a,0x33330033,
+	0x67670067,0xf3f300f3,0x7f7f007f,0xe2e200e2,
+	0x9b9b009b,0x26260026,0x37370037,0x3b3b003b,
+	0x96960096,0x4b4b004b,0xbebe00be,0x2e2e002e,
+	0x79790079,0x8c8c008c,0x6e6e006e,0x8e8e008e,
+	0xf5f500f5,0xb6b600b6,0xfdfd00fd,0x59590059,
+	0x98980098,0x6a6a006a,0x46460046,0xbaba00ba,
+	0x25250025,0x42420042,0xa2a200a2,0xfafa00fa,
+	0x07070007,0x55550055,0xeeee00ee,0x0a0a000a,
+	0x49490049,0x68680068,0x38380038,0xa4a400a4,
+	0x28280028,0x7b7b007b,0xc9c900c9,0xc1c100c1,
+	0xe3e300e3,0xf4f400f4,0xc7c700c7,0x9e9e009e,
+};
+
+
+#define CAMELLIA_MIN_KEY_SIZE        16
+#define CAMELLIA_MAX_KEY_SIZE        32
+#define CAMELLIA_BLOCK_SIZE          16
+#define CAMELLIA_TABLE_BYTE_LEN     272
+
+
+/* key constants */
+
+#define CAMELLIA_SIGMA1 (0xA09E667F3BCC908B)
+#define CAMELLIA_SIGMA2 (0xB67AE8584CAA73B2)
+#define CAMELLIA_SIGMA3 (0xC6EF372FE94F82BE)
+#define CAMELLIA_SIGMA4 (0x54FF53A5F1D36F1C)
+#define CAMELLIA_SIGMA5 (0x10E527FADE682D1D)
+#define CAMELLIA_SIGMA6 (0xB05688C2B3E6C1FD)
+
+/*
+ *  macros
+ */
+#define GETU64(v, pt) \
+    do { \
+	/* modern gcc compiles this memcpy into a single move */ \
+	memcpy(&(v), (pt), 8); \
+	(v) = be64_to_cpu(v); \
+    } while(0)
+
+/* rotate right by 1 byte */
+#define ROR8(x) (((x) >> 8) + ((x) << (sizeof(x)*8 - 8)))
+/* rotate left by 1 bit */
+#define ROL1(x) (((x) << 1) + ((x) >> (sizeof(x)*8 - 1)))
+/* rotate left by 1 byte */
+#define ROL8(x) (((x) << 8) + ((x) >> (sizeof(x)*8 - 8)))
+
+#define ROLDQ(l, r, w, bits)				\
+    do {						\
+	w = l;						\
+	l = (l << bits) + (r >> (64 - bits));		\
+	r = (r << bits) + (w >> (64 - bits));		\
+    } while(0)
+
+/*
+ * NB: L and R below stand for 'left' and 'right' as in written numbers.
+ * That is, in (xxxL,xxxR) pair xxxL holds most significant digits,
+ * _not_ least significant ones!
+ */
+
+
+/*
+ * Key setup
+ */
+#define CAMELLIA_F(x, k, y, i)					\
+    do {							\
+	u32 yl, yr;						\
+	i = x ^ k;						\
+	yl = camellia_sp1110[(u8)i]				\
+	   ^ camellia_sp0222[(u8)(i >> 24)]			\
+	   ^ camellia_sp3033[(u8)(i >> 16)]			\
+	   ^ camellia_sp4404[(u8)(i >> 8)];			\
+	yr = camellia_sp1110[    (i >> 56)]			\
+	   ^ camellia_sp0222[(u8)(i >> 48)]			\
+	   ^ camellia_sp3033[(u8)(i >> 40)]			\
+	   ^ camellia_sp4404[(u8)(i >> 32)];			\
+	yl ^= yr;						\
+	yr = ROR8(yr);						\
+	yr ^= yl;						\
+	y = ((u64)yl << 32) + yr;				\
+    } while(0)
+
+#define SUBKEY(INDEX) (subkey[(INDEX)])
+
+#ifdef __BIG_ENDIAN
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#else
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2])
+#endif
+
+static void camellia_setup_tail(u64 *subkey, int max)
+{
+	u32 dw;
+	int i = 2;
+	do {
+		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
+		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
+		dw = SUBKEY_L(i + 1) ^ SUBKEY_R(i + 1); dw = ROL8(dw);/* round 2 */
+		SUBKEY_R(i + 1) = SUBKEY_L(i + 1) ^ dw; SUBKEY_L(i + 1) = dw;
+		dw = SUBKEY_L(i + 2) ^ SUBKEY_R(i + 2); dw = ROL8(dw);/* round 3 */
+		SUBKEY_R(i + 2) = SUBKEY_L(i + 2) ^ dw; SUBKEY_L(i + 2) = dw;
+		dw = SUBKEY_L(i + 3) ^ SUBKEY_R(i + 3); dw = ROL8(dw);/* round 4 */
+		SUBKEY_R(i + 3) = SUBKEY_L(i + 3) ^ dw; SUBKEY_L(i + 3) = dw;
+		dw = SUBKEY_L(i + 4) ^ SUBKEY_R(i + 4); dw = ROL8(dw);/* round 5 */
+		SUBKEY_R(i + 4) = SUBKEY_L(i + 4) ^ dw; SUBKEY_L(i + 4) = dw;
+		dw = SUBKEY_L(i + 5) ^ SUBKEY_R(i + 5); dw = ROL8(dw);/* round 6 */
+		SUBKEY_R(i + 5) = SUBKEY_L(i + 5) ^ dw; SUBKEY_L(i + 5) = dw;
+		i += 8;
+	} while (i < max);
+}
+
+#ifdef __BIG_ENDIAN
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#else
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
+#endif
+
+static void camellia_setup128(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;
+	u64 i, t, w;
+	u64 kw4;
+	u32 dw;
+	u64 sub[26];
+
+	/**
+	 *  k == kl || kr (|| is concatenation)
+	 */
+	GETU64(kl, key     );
+	GETU64(kr, key +  8);
+
+	/**
+	 * generate KL dependent subkeys
+	 */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	/* rotate left by 15 bits */
+	ROLDQ(kl, kr, w, 15);
+	/* k3 */
+	sub[4] = kl;
+	/* k4 */
+	sub[5] = kr;
+	/* rotate left by 15+30 bits */
+	ROLDQ(kl, kr, w, 30);
+	/* k7 */
+	sub[10] = kl;
+	/* k8 */
+	sub[11] = kr;
+	/* rotate left by 15+30+15 bits */
+	ROLDQ(kl, kr, w, 15);
+	/* k10 */
+	sub[13] = kr;
+	/* rotate left by 15+30+15+17 bits */
+	ROLDQ(kl, kr, w, 17);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	/* rotate left by 15+30+15+17+17 bits */
+	ROLDQ(kl, kr, w, 17);
+	/* k13 */
+	sub[18] = kl;
+	/* k14 */
+	sub[19] = kr;
+	/* rotate left by 15+30+15+17+17+17 bits */
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+
+	/* generate KA */
+	kl = sub[0];
+	kr = sub[1];
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	/* current status == (kl, w) */
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KA dependent subkeys */
+	/* k1, k2 */
+	sub[2] = kl;
+	sub[3] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k5,k6 */
+	sub[6] = kl;
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl1, kl2 */
+	sub[8] = kl;
+	sub[9] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k9 */
+	sub[12] = kl;
+	ROLDQ(kl, kr, w, 15);
+	/* k11, k12 */
+	sub[14] = kl;
+	sub[15] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k15, k16 */
+	sub[20] = kl;
+	sub[21] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* kw3, kw4 */
+	sub[24] = kl;
+	sub[25] = kr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	/* kw3 */
+	sub[24] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[25];
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32; //kw4l ^= kw4r & ~subR(16);
+	dw = (u32)(kw4 >> 32) & subL(16); // kw4l & subL[16],
+	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32; //kw4l ^= kw4r & ~subR[8];
+	dw = (u32)(kw4 >> 32) & subL(8);
+	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t; /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	SUBKEY(23) = sub[22];     /* round 18 */
+	SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 24);
+}
+
+static void camellia_setup256(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;        /* left half of key */
+	u64 krl, krr;      /* right half of key */
+	u64 i, t, w;       /* temporary variables */
+	u64 kw4;
+	u32 dw;
+	u64 sub[34];
+
+	/**
+	 *  key = (kl || kr || krl || krr)
+	 *  (|| is concatenation)
+	 */
+	GETU64(kl,  key     );
+	GETU64(kr,  key +  8);
+	GETU64(krl, key + 16);
+	GETU64(krr, key + 24);
+
+	/* generate KL dependent subkeys */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	ROLDQ(kl, kr, w, 45);
+	/* k9 */
+	sub[12] = kl;
+	/* k10 */
+	sub[13] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k23 */
+	sub[30] = kl;
+	/* k24 */
+	sub[31] = kr;
+
+	/* generate KR dependent subkeys */
+	ROLDQ(krl, krr, w, 15);
+	/* k3 */
+	sub[4] = krl;
+	/* k4 */
+	sub[5] = krr;
+	ROLDQ(krl, krr, w, 15);
+	/* kl1 */
+	sub[8] = krl;
+	/* kl2 */
+	sub[9] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k13 */
+	sub[18] = krl;
+	/* k14 */
+	sub[19] = krr;
+	ROLDQ(krl, krr, w, 34);
+	/* k19 */
+	sub[26] = krl;
+	/* k20 */
+	sub[27] = krr;
+	ROLDQ(krl, krr, w, 34);
+
+	/* generate KA */
+	kl = sub[0] ^ krl;
+	kr = sub[1] ^ krr;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	kl ^= krl;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w ^ krr;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KB */
+	krl ^= kl;
+	krr ^= kr;
+	CAMELLIA_F(krl, CAMELLIA_SIGMA5, w, i);
+	krr ^= w;
+	CAMELLIA_F(krr, CAMELLIA_SIGMA6, w, i);
+	krl ^= w;
+
+	/* generate KA dependent subkeys */
+	ROLDQ(kl, kr, w, 15);
+	/* k5 */
+	sub[6] = kl;
+	/* k6 */
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 30);
+	/* k11 */
+	sub[14] = kl;
+	/* k12 */
+	sub[15] = kr;
+	/* kl5 */
+	ROLDQ(kl, kr, w, 32);
+	sub[24] = kl;
+	/* kl6 */
+	sub[25] = kr;
+	/* rotation left shift 49 from k11,k12 -> k21,k22 */
+	ROLDQ(kl, kr, w, (49 - 32));
+	/* k21 */
+	sub[28] = kl;
+	/* k22 */
+	sub[29] = kr;
+
+	/* generate KB dependent subkeys */
+	/* k1 */
+	sub[2] = krl;
+	/* k2 */
+	sub[3] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k7 */
+	sub[10] = krl;
+	/* k8 */
+	sub[11] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k15 */
+	sub[20] = krl;
+	/* k16 */
+	sub[21] = krr;
+	ROLDQ(krl, krr, w, 51);
+	/* kw3 */
+	sub[32] = krl;
+	/* kw4 */
+	sub[33] = krr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(25);
+	dw = subL(1) & subL(25),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl6) */
+	/* round 20 */
+	sub[27] ^= sub[1];
+	/* round 22 */
+	sub[29] ^= sub[1];
+	/* round 24 */
+	sub[31] ^= sub[1];
+	/* kw3 */
+	sub[32] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[33];
+	/* round 23 */
+	sub[30] ^= kw4;
+	/* round 21 */
+	sub[28] ^= kw4;
+	/* round 19 */
+	sub[26] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(24)) << 32; //kw4l ^= kw4r & ~subR[24];
+	dw = (u32)(kw4 >> 32) & subL(24),
+		kw4 ^= ROL1(dw); /* modified for FL(kl5) */
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(16),
+		kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(8),
+		kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); //tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	t = subL(26) ^ (subR(26) & ~subR(24));
+	dw = (u32)t & subL(24); /* FL(kl5) */
+	t = (t << 32) | (subR(26) ^ ROL1(dw));
+	SUBKEY(23) = sub[22] ^ t; /* round 18 */
+	SUBKEY(24) = sub[24];     /* FL(kl5) */
+	SUBKEY(25) = sub[25];     /* FLinv(kl6) */
+	t = subL(23) ^ (subR(23) & ~subR(25));
+	dw = (u32)t & subL(25); /* FLinv(kl6) */
+	t = (t << 32) | (subR(23) ^ ROL1(dw));
+	SUBKEY(26) = t ^ sub[27]; /* round 19 */
+	SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
+	SUBKEY(28) = sub[27] ^ sub[29]; /* round 21 */
+	SUBKEY(29) = sub[28] ^ sub[30]; /* round 22 */
+	SUBKEY(30) = sub[29] ^ sub[31]; /* round 23 */
+	SUBKEY(31) = sub[30];     /* round 24 */
+	SUBKEY(32) = sub[32] ^ sub[31]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 32);
+}
+
+static void camellia_setup192(const unsigned char *key, u64 *subkey)
+{
+	unsigned char kk[32];
+	u64 krl, krr;
+
+	memcpy(kk, key, 24);
+	memcpy((unsigned char *)&krl, key+16, 8);
+	krr = ~krl;
+	memcpy(kk+24, (unsigned char *)&krr, 8);
+	camellia_setup256(kk, subkey);
+}
+
+
+/*
+ * Encrypt/decrypt
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll & ll;							\
+	t2 = krr | rr;							\
+	rl ^= t2;							\
+	lr ^= ROL1(t0);							\
+	t3 = krl & rl;							\
+	t1 = klr | lr;							\
+	ll ^= t1;							\
+	rr ^= ROL1(t3);							\
+    } while(0)
+
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir)		\
+    do {								\
+	ir =  camellia_sp1110[(u8)xr];					\
+	il =  camellia_sp1110[    (xl >> 24)];				\
+	ir ^= camellia_sp0222[    (xr >> 24)];				\
+	il ^= camellia_sp0222[(u8)(xl >> 16)];				\
+	ir ^= camellia_sp3033[(u8)(xr >> 16)];				\
+	il ^= camellia_sp3033[(u8)(xl >> 8)];				\
+	ir ^= camellia_sp4404[(u8)(xr >> 8)];				\
+	il ^= camellia_sp4404[(u8)xl];					\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= ROR8(il) ^ ir;						\
+    } while(0)
+
+/* max = 24: 128-bit key; max = 32: 192- and 256-bit keys */
+static void camellia_do_encrypt(const u64 *subkey, u32 *io, unsigned max)
+{
+	u32 il,ir,t0,t1;               /* temporary variables */
+
+	/* pre whitening but absorb kw2 */
+	io[0] ^= SUBKEY_L(0);
+	io[1] ^= SUBKEY_R(0);
+
+	/* main iteration */
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     t0,t1,il,ir); \
+} while (0)
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+	{
+		unsigned i = 0;
+		while (1) {
+			ROUNDS(i);
+			i += 8;
+			if (i >= max)
+				break;
+			FLS(i);
+		}
+	}
+#else
+	ROUNDS(0);
+	FLS(8);
+	ROUNDS(8);
+	FLS(16);
+	ROUNDS(16);
+	if (max == 32) {
+		FLS(24);
+		ROUNDS(24);
+	}
+#endif
+
+#undef ROUNDS
+#undef FLS
+
+	/* post whitening but kw4 */
+	io[2] ^= SUBKEY_L(max);
+	io[3] ^= SUBKEY_R(max);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
+}
+
+static void camellia_do_decrypt(const u64 *subkey, u32 *io, unsigned i)
+{
+	u32 il,ir,t0,t1;               /* temporary variables */
+
+	/* pre whitening but absorb kw2 */
+	io[0] ^= SUBKEY_L(i);
+	io[1] ^= SUBKEY_R(i);
+
+	/* main iteration */
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     t0,t1,il,ir); \
+} while (0)
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+	while (1) {
+		i -= 8;
+		ROUNDS(i);
+		if (i == 0)
+			break;
+		FLS(i);
+	}
+#else
+	if (i == 32) {
+		ROUNDS(24);
+		FLS(24);
+	}
+	ROUNDS(16);
+	FLS(16);
+	ROUNDS(8);
+	FLS(8);
+	ROUNDS(0);
+#endif
+
+#undef ROUNDS
+#undef FLS
+
+	/* post whitening but kw4 */
+	io[2] ^= SUBKEY_L(0);
+	io[3] ^= SUBKEY_R(0);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
+}
+
+
+struct camellia_ctx {
+	int key_length;
+	u64 key_table[CAMELLIA_TABLE_BYTE_LEN / 8];
+};
+
+static int
+camellia_set_key(struct crypto_tfm *tfm, const u8 *in_key,
+		 unsigned int key_len)
+{
+	struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const unsigned char *key = (const unsigned char *)in_key;
+	u32 *flags = &tfm->crt_flags;
+
+	if (key_len != 16 && key_len != 24 && key_len != 32) {
+		*flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+		return -EINVAL;
+	}
+
+	cctx->key_length = key_len;
+
+	switch (key_len) {
+	case 16:
+		camellia_setup128(key, cctx->key_table);
+		break;
+	case 24:
+		camellia_setup192(key, cctx->key_table);
+		break;
+	case 32:
+		camellia_setup256(key, cctx->key_table);
+		break;
+	}
+
+	return 0;
+}
+
+static void camellia_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const __be32 *src = (const __be32 *)in;
+	__be32 *dst = (__be32 *)out;
+
+	u32 tmp[4];
+
+	tmp[0] = be32_to_cpu(src[0]);
+	tmp[1] = be32_to_cpu(src[1]);
+	tmp[2] = be32_to_cpu(src[2]);
+	tmp[3] = be32_to_cpu(src[3]);
+
+	camellia_do_encrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_encrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
+}
+
+static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const __be32 *src = (const __be32 *)in;
+	__be32 *dst = (__be32 *)out;
+
+	u32 tmp[4];
+
+	tmp[0] = be32_to_cpu(src[0]);
+	tmp[1] = be32_to_cpu(src[1]);
+	tmp[2] = be32_to_cpu(src[2]);
+	tmp[3] = be32_to_cpu(src[3]);
+
+	camellia_do_decrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_decrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
+}
+
+static struct crypto_alg camellia_alg = {
+	.cra_name		=	"camellia",
+	.cra_driver_name	=	"camellia-generic",
+	.cra_priority		=	100,
+	.cra_flags		=	CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		=	CAMELLIA_BLOCK_SIZE,
+	.cra_ctxsize		=	sizeof(struct camellia_ctx),
+	.cra_alignmask		=	3,
+	.cra_module		=	THIS_MODULE,
+	.cra_list		=	LIST_HEAD_INIT(camellia_alg.cra_list),
+	.cra_u			=	{
+		.cipher = {
+			.cia_min_keysize	=	CAMELLIA_MIN_KEY_SIZE,
+			.cia_max_keysize	=	CAMELLIA_MAX_KEY_SIZE,
+			.cia_setkey		=	camellia_set_key,
+			.cia_encrypt		=	camellia_encrypt,
+			.cia_decrypt		=	camellia_decrypt
+		}
+	}
+};
+
+static int __init camellia_init(void)
+{
+	return crypto_register_alg(&camellia_alg);
+}
+
+static void __exit camellia_fini(void)
+{
+	crypto_unregister_alg(&camellia_alg);
+}
+
+module_init(camellia_init);
+module_exit(camellia_fini);
+
+MODULE_DESCRIPTION("Camellia Cipher Algorithm");
+MODULE_LICENSE("GPL");

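ROLDQ() is defined earlier in the file and is not shown here. Judging from how it is used (k3/k4 come from KL rotated by 15, and later subkeys from cumulative rotations such as 15+30 or 60+34), it appears to rotate the 128-bit value kl || kr left in place, with w as scratch. Below is a minimal userspace sketch of that operation under this assumption. rol128() is a hypothetical name and is not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for ROLDQ(kl, kr, w, bits): rotate the 128-bit
 * value hi || lo left by 'bits' in place. Only valid for 0 < bits < 64,
 * because a shift by 64 is undefined in C. Every count used by the key
 * schedule above falls in that range.
 */
static void rol128(uint64_t *hi, uint64_t *lo, unsigned bits)
{
	uint64_t w = *hi;	/* scratch copy, like the 'w' argument */

	*hi = (*hi << bits) | (*lo >> (64 - bits));
	*lo = (*lo << bits) | (w >> (64 - bits));
}
```

Two rotations that add up to 64 bits simply swap the halves, which makes a quick sanity check for any helper of this kind.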
* Re: [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization
  2007-10-25 11:43 [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization Denys Vlasenko
                   ` (4 preceding siblings ...)
  2007-10-25 11:48 ` [PATCH 5/5] camellia: de-unrolling, 64bit-ization Denys Vlasenko
@ 2007-10-25 11:57 ` Denys Vlasenko
  5 siblings, 0 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-10-25 11:57 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto

Hi HERBERT (with B!)

On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> Hi Hervert,
        ^

Sorry.
--
vda

* Re: [PATCH 1/5] camellia: cleanup
  2007-10-25 11:45 ` [PATCH 1/5] camellia: cleanup Denys Vlasenko
@ 2007-10-26  8:43   ` Noriaki TAKAMIYA
  2007-11-06 14:17   ` Herbert Xu
  1 sibling, 0 replies; 40+ messages in thread
From: Noriaki TAKAMIYA @ 2007-10-26  8:43 UTC (permalink / raw)
  To: vda.linux; +Cc: herbert, linux-crypto, camellia-oss

Hi,

>> Thu, 25 Oct 2007 12:45:04 +0100
>> [Subject: [PATCH 1/5] camellia: cleanup]
>> Denys Vlasenko <vda.linux@googlemail.com> wrote...

> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Hervert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia1.diff:
> >     Move code blocks around so that related pieces are closer together:
> >     e.g. CAMELLIA_ROUNDSM macro does not need to be separated
> >     from the rest of the code by huge array of constants.
> > 
> >     Remove unused macros (COPY4WORD, SWAP4WORD, XOR4WORD[2])
> > 
> >     Drop SUBL(), SUBR() macros which only obscure things.
> >     Same for CAMELLIA_SP1110() macro and KEY_TABLE_TYPE typedef.
> > 
> >     Remove useless comments:
> >     /* encryption */ -- well it's obvious enough already!
> >     void camellia_encrypt128(...)
> > 
> >     Combine swap with copying at the beginning/end of encrypt/decrypt.
> 
> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>

Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>

--
Noriaki TAKAMIYA

* Re: [PATCH 2/5] camellia: cleanup
  2007-10-25 11:45 ` [PATCH 2/5] " Denys Vlasenko
@ 2007-10-26  8:44   ` Noriaki TAKAMIYA
  2007-11-06 14:19   ` Herbert Xu
  1 sibling, 0 replies; 40+ messages in thread
From: Noriaki TAKAMIYA @ 2007-10-26  8:44 UTC (permalink / raw)
  To: vda.linux; +Cc: herbert, linux-crypto, camellia-oss

>> Thu, 25 Oct 2007 12:45:42 +0100
>> [Subject: [PATCH 2/5] camellia: cleanup]
>> Denys Vlasenko <vda.linux@googlemail.com> wrote...

> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Hervert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia2.diff
> >     Rename some macros to shorter names: CAMELLIA_RR8 -> ROR8,
> >     making it easier to understand that it is just a right rotation,
> >     nothing camellia-specific in it.
> >     CAMELLIA_SUBKEY_L() -> SUBKEY_L() - just shorter.
> > 
> >     Move be32 <-> cpu conversions out of en/decrypt128/256 and into
> >     camellia_en/decrypt - no reason to have that code duplicated twice.
> 
> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>

Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
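After this change, byte-order handling lives only in camellia_encrypt()/camellia_decrypt(). Those functions load four big-endian words, run the core, and store the result back with the halves swapped. The userspace sketch below shows that load/store convention. It uses portable byte shifts in place of be32_to_cpu()/cpu_to_be32(), and all helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-ins (assumption) for be32_to_cpu()/cpu_to_be32() on bytes. */
static uint32_t load_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Read a 16-byte block into host-order words, as the tmp[] code does. */
static void block_load(uint32_t io[4], const uint8_t in[16])
{
	for (int i = 0; i < 4; i++)
		io[i] = load_be32(in + 4 * i);
}

/*
 * The core leaves the halves swapped (io[2], io[3] hold the first
 * half of the output), so undo that while storing.
 */
static void block_store_swapped(uint8_t out[16], const uint32_t io[4])
{
	store_be32(out + 0,  io[2]);
	store_be32(out + 4,  io[3]);
	store_be32(out + 8,  io[0]);
	store_be32(out + 12, io[1]);
}
```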

--
Noriaki TAKAMIYA

* Re: [PATCH 3/5] camellia: cleanup
  2007-10-25 11:46 ` [PATCH 3/5] " Denys Vlasenko
@ 2007-10-26  8:44   ` Noriaki TAKAMIYA
  2007-11-06 14:21   ` Herbert Xu
  1 sibling, 0 replies; 40+ messages in thread
From: Noriaki TAKAMIYA @ 2007-10-26  8:44 UTC (permalink / raw)
  To: vda.linux; +Cc: herbert, linux-crypto, camellia-oss

>> Thu, 25 Oct 2007 12:46:35 +0100
>> [Subject: [PATCH 3/5] camellia: cleanup]
>> Denys Vlasenko <vda.linux@googlemail.com> wrote...

> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Hervert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia3.diff
> >     Optimize GETU32 to use 4-byte memcpy (modern gcc will convert
> >     such memcpy to single move instruction on i386).
> >     Original GETU32 did four byte fetches, and shifted/XORed those.
> 
> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>

Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>

--
Noriaki TAKAMIYA

* Re: [PATCH 4/5] camellia: de-unrolling
  2007-10-25 11:47 ` [PATCH 4/5] camellia: de-unrolling Denys Vlasenko
@ 2007-10-26  8:45   ` Noriaki TAKAMIYA
  2007-11-06 14:21   ` Herbert Xu
  1 sibling, 0 replies; 40+ messages in thread
From: Noriaki TAKAMIYA @ 2007-10-26  8:45 UTC (permalink / raw)
  To: vda.linux; +Cc: herbert, linux-crypto, takamiya

Hi,

>> Thu, 25 Oct 2007 12:47:16 +0100
>> [Subject: [PATCH 4/5] camellia: de-unrolling]
>> Denys Vlasenko <vda.linux@googlemail.com> wrote...

> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Hervert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia4.diff
> >     Move huge unrolled pieces of code (3 screenfuls) at the end of
> >     128/256 key setup routines into common camellia_setup_tail(),
> >     convert it to loop there.
> >     Loop is still unrolled six times, so performance hit is very small,
> >     code size win is big.
> 
> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
> --
> vda
Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>

--
Noriaki TAKAMIYA

* Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-10-25 11:48 ` [PATCH 5/5] camellia: de-unrolling, 64bit-ization Denys Vlasenko
@ 2007-10-26  8:45   ` Noriaki TAKAMIYA
  2007-11-06 14:23   ` Herbert Xu
  1 sibling, 0 replies; 40+ messages in thread
From: Noriaki TAKAMIYA @ 2007-10-26  8:45 UTC (permalink / raw)
  To: vda.linux; +Cc: herbert, linux-crypto, camellia-oss

>> Thu, 25 Oct 2007 12:48:29 +0100
>> [Subject: [PATCH 5/5] camellia: de-unrolling, 64bit-ization]
>> Denys Vlasenko <vda.linux@googlemail.com> wrote...

> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Hervert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia5.diff
> >     Use alternative key setup implementation with mostly 64-bit ops
> >     if BITS_PER_LONG >= 64. Both much smaller and much faster.
> > 
> >     Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
> >     The code was similar; with just one additional if() we can use the same code.
> > 
> >     If CONFIG_CC_OPTIMIZE_FOR_SIZE is defined,
> >     use loop in camellia_do_en/decrypt instead of unrolled code.
> >     ~5% encrypt/decrypt slowdown.
> > 
> >     Replace (x & 0xff) with (u8)x, gcc is not smart enough to realize
> >     that it can do (x & 0xff) this way (which is smaller at least on i386).
> > 
> >     Don't do (x & 0xff) in a few places where x cannot be > 255 anyway:
> >         t0 = il >> 16; v = camellia_sp0222[(t0 >> 8) & 0xff];
> >     il is u32, thus t0 = il >> 16 has at most 16 bits and (t0 >> 8) is one byte!
> 
> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
> --
> vda
Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
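The last two items make claims that can be checked independently of code size. First, the cast and the mask give the same value for every u32. Second, after il >> 16, a further >> 8 leaves at most 8 significant bits. Whether gcc actually emits smaller code for (u8)x is the author's observation on i386 and depends on the compiler version. This sketch only shows that the values agree, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Two ways to take the low byte of a 32-bit word; the values are identical. */
static unsigned low_byte_mask(uint32_t x) { return x & 0xff; }
static unsigned low_byte_cast(uint32_t x) { return (uint8_t)x; }

/*
 * For a 32-bit il, t = il >> 16 has at most 16 significant bits,
 * so t >> 8 already fits in one byte and "& 0xff" changes nothing.
 */
static unsigned top_byte_masked(uint32_t il)
{
	uint32_t t = il >> 16;
	return (t >> 8) & 0xff;
}

static unsigned top_byte_unmasked(uint32_t il)
{
	uint32_t t = il >> 16;
	return t >> 8;
}
```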

--
Noriaki TAKAMIYA

* Re: [PATCH 1/5] camellia: cleanup
  2007-10-25 11:45 ` [PATCH 1/5] camellia: cleanup Denys Vlasenko
  2007-10-26  8:43   ` Noriaki TAKAMIYA
@ 2007-11-06 14:17   ` Herbert Xu
  1 sibling, 0 replies; 40+ messages in thread
From: Herbert Xu @ 2007-11-06 14:17 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: linux-crypto

On Thu, Oct 25, 2007 at 12:45:04PM +0100, Denys Vlasenko wrote:
> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Hervert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia1.diff:
> >     Move code blocks around so that related pieces are closer together:
> >     e.g. CAMELLIA_ROUNDSM macro does not need to be separated
> >     from the rest of the code by huge array of constants.
> > 
> >     Remove unused macros (COPY4WORD, SWAP4WORD, XOR4WORD[2])
> > 
> >     Drop SUBL(), SUBR() macros which only obscure things.
> >     Same for CAMELLIA_SP1110() macro and KEY_TABLE_TYPE typedef.
> > 
> >     Remove useless comments:
> >     /* encryption */ -- well it's obvious enough already!
> >     void camellia_encrypt128(...)
> > 
> >     Combine swap with copying at the beginning/end of encrypt/decrypt.
> 
> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>

Patch applied.  Thanks Denis!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH 2/5] camellia: cleanup
  2007-10-25 11:45 ` [PATCH 2/5] " Denys Vlasenko
  2007-10-26  8:44   ` Noriaki TAKAMIYA
@ 2007-11-06 14:19   ` Herbert Xu
  1 sibling, 0 replies; 40+ messages in thread
From: Herbert Xu @ 2007-11-06 14:19 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: linux-crypto

On Thu, Oct 25, 2007 at 12:45:42PM +0100, Denys Vlasenko wrote:
> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Hervert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia2.diff
> >     Rename some macros to shorter names: CAMELLIA_RR8 -> ROR8,
> >     making it easier to understand that it is just a right rotation,
> >     nothing camellia-specific in it.
> >     CAMELLIA_SUBKEY_L() -> SUBKEY_L() - just shorter.
> > 
> >     Move be32 <-> cpu conversions out of en/decrypt128/256 and into
> >     camellia_en/decrypt - no reason to have that code duplicated twice.
> 
> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>

Patch applied.  BTW, we have a function called ror32 that can
replace all of these ROR* macros.
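The definition of ROR8(x) is not part of this excerpt. If it is a plain 32-bit rotate right by 8, it is a special case of a generic rotate like the ror32() mentioned above. Below is a userspace sketch of such helpers. rotr32()/rotl32() are illustrative re-implementations, not the kernel header, and the ROR8 macro is the assumed definition:

```c
#include <assert.h>
#include <stdint.h>

/* Generic 32-bit rotates; masking the counts keeps every shift below 32. */
static inline uint32_t rotr32(uint32_t w, unsigned s)
{
	return (w >> (s & 31)) | (w << (-s & 31));
}

static inline uint32_t rotl32(uint32_t w, unsigned s)
{
	return (w << (s & 31)) | (w >> (-s & 31));
}

/* Assumed definition of the fixed-count macro being replaced. */
#define ROR8(x) (((x) >> 8) | ((x) << 24))
```

In kernel code the same expressions would become ror32(x, 8) and rol32(x, 1).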

Thanks,

* Re: [PATCH 3/5] camellia: cleanup
  2007-10-25 11:46 ` [PATCH 3/5] " Denys Vlasenko
  2007-10-26  8:44   ` Noriaki TAKAMIYA
@ 2007-11-06 14:21   ` Herbert Xu
  1 sibling, 0 replies; 40+ messages in thread
From: Herbert Xu @ 2007-11-06 14:21 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: linux-crypto

On Thu, Oct 25, 2007 at 12:46:35PM +0100, Denys Vlasenko wrote:
> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Hervert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia3.diff
> >     Optimize GETU32 to use 4-byte memcpy (modern gcc will convert
> >     such memcpy to single move instruction on i386).
> >     Original GETU32 did four byte fetches, and shifted/XORed those.
> 
> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>

Patch applied.

> +# define GETU32(v, pt) \
> +    do { \
> +	/* latest breed of gcc is clever enough to use move */ \
> +	memcpy(&(v), (pt), 4); \
> +	(v) = be32_to_cpu(v); \
> +    } while(0)

You can get rid of this memcpy too since camellia declares an
alignmask of 3 which means that the key is 32-bit aligned.
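The point here is that with cra_alignmask = 3, the key buffer handed to the cipher is 4-byte aligned, so it can be read with a plain 32-bit load plus a byte swap. The userspace sketch below compares three variants: byte fetches, memcpy, and an aligned load. ntohl() stands in for be32_to_cpu(), and the function names are hypothetical. The aligned variant takes a uint32_t pointer because that form is only safe when the caller guarantees alignment:

```c
#include <arpa/inet.h>	/* ntohl(): big-endian to host order */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Original style: four byte fetches, shifted and combined. */
static uint32_t getu32_bytes(const unsigned char *pt)
{
	return (uint32_t)pt[0] << 24 | (uint32_t)pt[1] << 16 |
	       (uint32_t)pt[2] << 8  | (uint32_t)pt[3];
}

/* Patch 3/5 style: memcpy into a word, then convert byte order. */
static uint32_t getu32_memcpy(const unsigned char *pt)
{
	uint32_t v;

	memcpy(&v, pt, 4);
	return ntohl(v);
}

/* Suggested style: alignment is guaranteed, so load the word directly. */
static uint32_t getu32_aligned(const uint32_t *pt)
{
	return ntohl(*pt);
}
```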

Thanks,

* Re: [PATCH 4/5] camellia: de-unrolling
  2007-10-25 11:47 ` [PATCH 4/5] camellia: de-unrolling Denys Vlasenko
  2007-10-26  8:45   ` Noriaki TAKAMIYA
@ 2007-11-06 14:21   ` Herbert Xu
  1 sibling, 0 replies; 40+ messages in thread
From: Herbert Xu @ 2007-11-06 14:21 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: linux-crypto

On Thu, Oct 25, 2007 at 12:47:16PM +0100, Denys Vlasenko wrote:
> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Hervert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia4.diff
> >     Move huge unrolled pieces of code (3 screenfuls) at the end of
> >     128/256 key setup routines into common camellia_setup_tail(),
> >     convert it to loop there.
> >     Loop is still unrolled six times, so performance hit is very small,
> >     code size win is big.
> 
> Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>

Good work! Patch applied.
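camellia_setup_tail() starts at subkey 2 and steps by 8, transforming six consecutive subkeys on each pass. The two subkeys it skips each time (8-9, 16-17, 24-25) hold the FL/FLinv subkeys, which the tail step does not touch. The sketch below records the visited indices for max = 24 and max = 32. It also shows one transform step on a subkey split into 32-bit halves. Both helpers are hypothetical, and rol8() assumes ROL8 is a 32-bit rotate left by 8:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Same loop shape as camellia_setup_tail(), but it records each subkey
 * index it would transform instead of transforming it.
 * Returns the number of indices written to out[].
 */
static size_t setup_tail_indices(int max, int out[], size_t cap)
{
	size_t n = 0;
	int i = 2;

	do {
		for (int k = 0; k < 6 && n < cap; k++)
			out[n++] = i + k;	/* six round subkeys */
		i += 8;				/* skip the FL/FLinv pair */
	} while (i < max);
	return n;
}

static uint32_t rol8(uint32_t x) { return (x << 8) | (x >> 24); }

/* One tail step on a subkey split into left/right 32-bit halves. */
static void tail_step(uint32_t *l, uint32_t *r)
{
	uint32_t dw = rol8(*l ^ *r);

	*r = *l ^ dw;
	*l = dw;
}
```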

Thanks,

* Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-10-25 11:48 ` [PATCH 5/5] camellia: de-unrolling, 64bit-ization Denys Vlasenko
  2007-10-26  8:45   ` Noriaki TAKAMIYA
@ 2007-11-06 14:23   ` Herbert Xu
  2007-11-07 13:22     ` Denys Vlasenko
  1 sibling, 1 reply; 40+ messages in thread
From: Herbert Xu @ 2007-11-06 14:23 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: linux-crypto

On Thu, Oct 25, 2007 at 12:48:29PM +0100, Denys Vlasenko wrote:
> On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > Hi Hervert,
> > 
> > Please review and maybe propagate upstream following patches.
> > 
> > camellia5.diff
> >     Use alternative key setup implementation with mostly 64-bit ops
> >     if BITS_PER_LONG >= 64. Both much smaller and much faster.
> > 
> >     Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
> >     The code was similar; with just one additional if() we can use the same code.
> > 
> >     If CONFIG_CC_OPTIMIZE_FOR_SIZE is defined,
> >     use loop in camellia_do_en/decrypt instead of unrolled code.
> >     ~5% encrypt/decrypt slowdown.

Having two versions of the code is unmaintainable.  So please
either decide that 5% is worth it or isn't.

The rest of this patch looks fine.

Cheers,


* Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-06 14:23   ` Herbert Xu
@ 2007-11-07 13:22     ` Denys Vlasenko
  2007-11-08 13:30       ` Herbert Xu
  0 siblings, 1 reply; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-07 13:22 UTC
  To: Herbert Xu; +Cc: linux-crypto

[-- Attachment #1: Type: text/plain, Size: 1506 bytes --]

On Tuesday 06 November 2007 14:23, Herbert Xu wrote:
> On Thu, Oct 25, 2007 at 12:48:29PM +0100, Denys Vlasenko wrote:
> > On Thursday 25 October 2007 12:43, Denys Vlasenko wrote:
> > > Hi Herbert,
> > > 
> > > Please review and maybe propagate upstream following patches.
> > > 
> > > camellia5.diff
> > >     Use alternative key setup implementation with mostly 64-bit ops
> > >     if BITS_PER_LONG >= 64. Both much smaller and much faster.
> > > 
> > >     Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
> > >     The code was similar; with just one additional if() we can use the same code.
> > > 
> > >     If CONFIG_CC_OPTIMIZE_FOR_SIZE is defined,
> > >     use loop in camellia_do_en/decrypt instead of unrolled code.
> > >     ~5% encrypt/decrypt slowdown.
> 
> Having two versions of the code is unmaintainable.  So please
> either decide that 5% is worth it or that it isn't.

*I* am happy with a 5% speed sacrifice. I'm afraid other people won't be.

I just want to escape the vicious cycle of -Os people arguing with
-O2 people to no end. I don't want somebody to come along later
and unroll the loop again, and then me to come along
and de-unroll it again...

It's better for everybody to recognize that both POVs are valid,
and to give the user (the person who builds the binary) a way
to tune the size/speed tradeoff.

That's why I made a patch where unrolling can be enabled by CONFIG_xxx.
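
Denys's proposal amounts to compiling one of two equivalent code shapes depending on a configuration symbol. The minimal sketch below makes some assumptions. `round_fn()` is a placeholder for CAMELLIA_ROUNDSM, and `run_six_rounds()` is a hypothetical name. CONFIG_CC_OPTIMIZE_FOR_SIZE is used as the switch, as in the original camellia5 description. Compile with and without `-DCONFIG_CC_OPTIMIZE_FOR_SIZE` to get the looped or the straight-line variant.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder for CAMELLIA_ROUNDSM: XOR in a key word, rotate left by 8. */
static uint32_t round_fn(uint32_t x, uint32_t k)
{
	x ^= k;
	return (x << 8) | (x >> 24);
}

/*
 * Hypothetical helper: six rounds, in one of two shapes chosen at build
 * time.  In the kernel, CONFIG_CC_OPTIMIZE_FOR_SIZE is the Kconfig symbol
 * that also selects -Os; here it is just a preprocessor macro.
 */
static uint32_t run_six_rounds(uint32_t x, const uint32_t *key)
{
#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
	unsigned i;

	/* -Os flavour: compact loop */
	for (i = 0; i < 6; i++)
		x = round_fn(x, key[i]);
#else
	/* -O2 flavour: straight-line code, no loop overhead */
	x = round_fn(x, key[0]);
	x = round_fn(x, key[1]);
	x = round_fn(x, key[2]);
	x = round_fn(x, key[3]);
	x = round_fn(x, key[4]);
	x = round_fn(x, key[5]);
#endif
	return x;
}
```

Both branches compute the same result but must be kept in sync by hand. That is the maintenance cost Herbert objects to.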

If you disagree with me and don't want this kind of selectability,
an updated patch that always uses the loop is attached.

Thanks!
--
vda

[-- Attachment #2: camellia5.diff --]
[-- Type: text/x-diff, Size: 55385 bytes --]

--- linux-2.6.23.src/crypto/camellia4.c	2007-10-24 19:03:57.000000000 +0100
+++ linux-2.6.23.src/crypto/camellia.c	2007-11-07 13:06:48.000000000 +0000
@@ -36,6 +36,13 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 
+#if BITS_PER_LONG >= 64
+
+/* Use alternative implementation with mostly 64-bit ops */
+#include "camellia_64.c"
+
+#else
+
 static const u32 camellia_sp1110[256] = {
 	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
 	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
@@ -329,7 +336,6 @@ static const u32 camellia_sp4404[256] = 
 /*
  *  macros
  */
-
 # define GETU32(v, pt) \
     do { \
 	/* latest breed of gcc is clever enough to use move */ \
@@ -364,63 +370,28 @@ static const u32 camellia_sp4404[256] = 
     } while(0)
 
 
+/*
+ * Key setup
+ */
 #define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
     do {							\
 	il = xl ^ kl;						\
 	ir = xr ^ kr;						\
 	t0 = il >> 16;						\
 	t1 = ir >> 16;						\
-	yl = camellia_sp1110[ir & 0xff]				\
-	   ^ camellia_sp0222[(t1 >> 8) & 0xff]			\
-	   ^ camellia_sp3033[t1 & 0xff]				\
-	   ^ camellia_sp4404[(ir >> 8) & 0xff];			\
-	yr = camellia_sp1110[(t0 >> 8) & 0xff]			\
-	   ^ camellia_sp0222[t0 & 0xff]				\
-	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
-	   ^ camellia_sp4404[il & 0xff];			\
+	yl = camellia_sp1110[(u8)(ir     )]			\
+	   ^ camellia_sp0222[    (t1 >> 8)]			\
+	   ^ camellia_sp3033[(u8)(t1     )]			\
+	   ^ camellia_sp4404[(u8)(ir >> 8)];			\
+	yr = camellia_sp1110[    (t0 >> 8)]			\
+	   ^ camellia_sp0222[(u8)(t0     )]			\
+	   ^ camellia_sp3033[(u8)(il >> 8)]			\
+	   ^ camellia_sp4404[(u8)(il     )];			\
 	yl ^= yr;						\
 	yr = ROR8(yr);						\
 	yr ^= yl;						\
     } while(0)
 
-
-/*
- * for speed up
- *
- */
-#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
-    do {								\
-	t0 = kll;							\
-	t2 = krr;							\
-	t0 &= ll;							\
-	t2 |= rr;							\
-	rl ^= t2;							\
-	lr ^= ROL1(t0);							\
-	t3 = krl;							\
-	t1 = klr;							\
-	t3 &= rl;							\
-	t1 |= lr;							\
-	ll ^= t1;							\
-	rr ^= ROL1(t3);							\
-    } while(0)
-
-#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-    do {								\
-	ir =  camellia_sp1110[xr & 0xff];				\
-	il =  camellia_sp1110[(xl>>24) & 0xff];				\
-	ir ^= camellia_sp0222[(xr>>24) & 0xff];				\
-	il ^= camellia_sp0222[(xl>>16) & 0xff];				\
-	ir ^= camellia_sp3033[(xr>>16) & 0xff];				\
-	il ^= camellia_sp3033[(xl>>8) & 0xff];				\
-	ir ^= camellia_sp4404[(xr>>8) & 0xff];				\
-	il ^= camellia_sp4404[xl & 0xff];				\
-	il ^= kl;							\
-	ir ^= il ^ kr;							\
-	yl ^= ir;							\
-	yr ^= ROR8(il) ^ ir;						\
-    } while(0)
-
-
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
@@ -622,7 +593,7 @@ static void camellia_setup128(const unsi
 	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
 	SUBKEY_R(6) = subR[5] ^ subR[7];
 	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
+	dw = tl & subL[8];  /* FL(kl1) */
 		tr = subR[10] ^ ROL1(dw);
 	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
 	SUBKEY_R(7) = subR[6] ^ tr;
@@ -1000,400 +971,144 @@ static void camellia_setup192(const unsi
 }
 
 
-static void camellia_encrypt128(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;               /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
-
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
+/*
+ * Encrypt/decrypt
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll;							\
+	t2 = krr;							\
+	t0 &= ll;							\
+	t2 |= rr;							\
+	rl ^= t2;							\
+	lr ^= ROL1(t0);							\
+	t3 = krl;							\
+	t1 = klr;							\
+	t3 &= rl;							\
+	t1 |= lr;							\
+	ll ^= t1;							\
+	rr ^= ROL1(t3);							\
+    } while(0)
 
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(24);
-	io_text[1] = io[3] ^ SUBKEY_R(24);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir)		\
+    do {								\
+	ir =  camellia_sp1110[(u8)xr];					\
+	il =  camellia_sp1110[    (xl >> 24)];				\
+	ir ^= camellia_sp0222[    (xr >> 24)];				\
+	il ^= camellia_sp0222[(u8)(xl >> 16)];				\
+	ir ^= camellia_sp3033[(u8)(xr >> 16)];				\
+	il ^= camellia_sp3033[(u8)(xl >> 8)];				\
+	ir ^= camellia_sp4404[(u8)(xr >> 8)];				\
+	il ^= camellia_sp4404[(u8)xl];					\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= ROR8(il) ^ ir;						\
+    } while(0)
 
-static void camellia_decrypt128(const u32 *subkey, u32 *io_text)
+/* max = 24: 128bit encrypt, max = 32: 256bit encrypt */
+static void camellia_do_encrypt(const u32 *subkey, u32 *io, unsigned max)
 {
 	u32 il,ir,t0,t1;               /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(24);
-	io[1] = io_text[1] ^ SUBKEY_R(24);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
-
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
-
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
-
-static void camellia_encrypt256(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
+	unsigned i;
 
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(0);
+	io[1] ^= SUBKEY_R(0);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[0],io[1],il,ir,t0,t1);
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     t0,t1,il,ir); \
+} while (0)
+	i = 0;
+	while (1) {
+		ROUNDS(i);
+		i += 8;
+		if (i >= max)
+			break;
+		FLS(i);
+	}
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(32);
-	io_text[1] = io[3] ^ SUBKEY_R(32);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(max);
+	io[3] ^= SUBKEY_R(max);
+	/* NB: io[0],[1] should be swapped with [2],[3] by caller! */
 }
 
-static void camellia_decrypt256(const u32 *subkey, u32 *io_text)
+static void camellia_do_decrypt(const u32 *subkey, u32 *io, unsigned i)
 {
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
+	u32 il,ir,t0,t1;               /* temporary variables */
 
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(32);
-	io[1] = io_text[1] ^ SUBKEY_R(32);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(i);
+	io[1] ^= SUBKEY_R(i);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     t0,t1,il,ir); \
+} while (0)
+	while (1) {
+		i -= 8;
+		ROUNDS(i);
+		if (i == 0)
+			break;
+		FLS(i);
+	}
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(0);
+	io[3] ^= SUBKEY_R(0);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
 }
 
 
@@ -1445,21 +1160,15 @@ static void camellia_encrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_encrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_encrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_encrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_encrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
@@ -1475,21 +1184,15 @@ static void camellia_decrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_decrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_decrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_decrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_decrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static struct crypto_alg camellia_alg = {
@@ -1528,3 +1231,5 @@ module_exit(camellia_fini);
 
 MODULE_DESCRIPTION("Camellia Cipher Algorithm");
 MODULE_LICENSE("GPL");
+
+#endif /* if BITS_PER_LONG < 64 */
--- /dev/null	2006-05-22 15:25:23.000000000 +0100
+++ linux-2.6.23.src/crypto/camellia_64.c	2007-10-25 12:32:16.000000000 +0100
@@ -0,0 +1,1172 @@
+/*
+ * Copyright (C) 2006
+ * NTT (Nippon Telegraph and Telephone Corporation).
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ */
+
+/*
+ * Algorithm Specification
+ *  http://info.isl.ntt.co.jp/crypt/eng/camellia/specifications.html
+ */
+
+/*
+ *
+ * NOTE --- NOTE --- NOTE --- NOTE
+ * This implementation assumes that all memory addresses passed
+ * as parameters are four-byte aligned.
+ *
+ */
+
+/* #included from camellia.c if long is 64bit */
+
+/*
+#include <linux/crypto.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+*/
+
+static const u32 camellia_sp1110[256] = {
+	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
+	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
+	0xe4e4e400,0x85858500,0x57575700,0x35353500,
+	0xeaeaea00,0x0c0c0c00,0xaeaeae00,0x41414100,
+	0x23232300,0xefefef00,0x6b6b6b00,0x93939300,
+	0x45454500,0x19191900,0xa5a5a500,0x21212100,
+	0xededed00,0x0e0e0e00,0x4f4f4f00,0x4e4e4e00,
+	0x1d1d1d00,0x65656500,0x92929200,0xbdbdbd00,
+	0x86868600,0xb8b8b800,0xafafaf00,0x8f8f8f00,
+	0x7c7c7c00,0xebebeb00,0x1f1f1f00,0xcecece00,
+	0x3e3e3e00,0x30303000,0xdcdcdc00,0x5f5f5f00,
+	0x5e5e5e00,0xc5c5c500,0x0b0b0b00,0x1a1a1a00,
+	0xa6a6a600,0xe1e1e100,0x39393900,0xcacaca00,
+	0xd5d5d500,0x47474700,0x5d5d5d00,0x3d3d3d00,
+	0xd9d9d900,0x01010100,0x5a5a5a00,0xd6d6d600,
+	0x51515100,0x56565600,0x6c6c6c00,0x4d4d4d00,
+	0x8b8b8b00,0x0d0d0d00,0x9a9a9a00,0x66666600,
+	0xfbfbfb00,0xcccccc00,0xb0b0b000,0x2d2d2d00,
+	0x74747400,0x12121200,0x2b2b2b00,0x20202000,
+	0xf0f0f000,0xb1b1b100,0x84848400,0x99999900,
+	0xdfdfdf00,0x4c4c4c00,0xcbcbcb00,0xc2c2c200,
+	0x34343400,0x7e7e7e00,0x76767600,0x05050500,
+	0x6d6d6d00,0xb7b7b700,0xa9a9a900,0x31313100,
+	0xd1d1d100,0x17171700,0x04040400,0xd7d7d700,
+	0x14141400,0x58585800,0x3a3a3a00,0x61616100,
+	0xdedede00,0x1b1b1b00,0x11111100,0x1c1c1c00,
+	0x32323200,0x0f0f0f00,0x9c9c9c00,0x16161600,
+	0x53535300,0x18181800,0xf2f2f200,0x22222200,
+	0xfefefe00,0x44444400,0xcfcfcf00,0xb2b2b200,
+	0xc3c3c300,0xb5b5b500,0x7a7a7a00,0x91919100,
+	0x24242400,0x08080800,0xe8e8e800,0xa8a8a800,
+	0x60606000,0xfcfcfc00,0x69696900,0x50505000,
+	0xaaaaaa00,0xd0d0d000,0xa0a0a000,0x7d7d7d00,
+	0xa1a1a100,0x89898900,0x62626200,0x97979700,
+	0x54545400,0x5b5b5b00,0x1e1e1e00,0x95959500,
+	0xe0e0e000,0xffffff00,0x64646400,0xd2d2d200,
+	0x10101000,0xc4c4c400,0x00000000,0x48484800,
+	0xa3a3a300,0xf7f7f700,0x75757500,0xdbdbdb00,
+	0x8a8a8a00,0x03030300,0xe6e6e600,0xdadada00,
+	0x09090900,0x3f3f3f00,0xdddddd00,0x94949400,
+	0x87878700,0x5c5c5c00,0x83838300,0x02020200,
+	0xcdcdcd00,0x4a4a4a00,0x90909000,0x33333300,
+	0x73737300,0x67676700,0xf6f6f600,0xf3f3f300,
+	0x9d9d9d00,0x7f7f7f00,0xbfbfbf00,0xe2e2e200,
+	0x52525200,0x9b9b9b00,0xd8d8d800,0x26262600,
+	0xc8c8c800,0x37373700,0xc6c6c600,0x3b3b3b00,
+	0x81818100,0x96969600,0x6f6f6f00,0x4b4b4b00,
+	0x13131300,0xbebebe00,0x63636300,0x2e2e2e00,
+	0xe9e9e900,0x79797900,0xa7a7a700,0x8c8c8c00,
+	0x9f9f9f00,0x6e6e6e00,0xbcbcbc00,0x8e8e8e00,
+	0x29292900,0xf5f5f500,0xf9f9f900,0xb6b6b600,
+	0x2f2f2f00,0xfdfdfd00,0xb4b4b400,0x59595900,
+	0x78787800,0x98989800,0x06060600,0x6a6a6a00,
+	0xe7e7e700,0x46464600,0x71717100,0xbababa00,
+	0xd4d4d400,0x25252500,0xababab00,0x42424200,
+	0x88888800,0xa2a2a200,0x8d8d8d00,0xfafafa00,
+	0x72727200,0x07070700,0xb9b9b900,0x55555500,
+	0xf8f8f800,0xeeeeee00,0xacacac00,0x0a0a0a00,
+	0x36363600,0x49494900,0x2a2a2a00,0x68686800,
+	0x3c3c3c00,0x38383800,0xf1f1f100,0xa4a4a400,
+	0x40404000,0x28282800,0xd3d3d300,0x7b7b7b00,
+	0xbbbbbb00,0xc9c9c900,0x43434300,0xc1c1c100,
+	0x15151500,0xe3e3e300,0xadadad00,0xf4f4f400,
+	0x77777700,0xc7c7c700,0x80808000,0x9e9e9e00,
+};
+
+static const u32 camellia_sp0222[256] = {
+	0x00e0e0e0,0x00050505,0x00585858,0x00d9d9d9,
+	0x00676767,0x004e4e4e,0x00818181,0x00cbcbcb,
+	0x00c9c9c9,0x000b0b0b,0x00aeaeae,0x006a6a6a,
+	0x00d5d5d5,0x00181818,0x005d5d5d,0x00828282,
+	0x00464646,0x00dfdfdf,0x00d6d6d6,0x00272727,
+	0x008a8a8a,0x00323232,0x004b4b4b,0x00424242,
+	0x00dbdbdb,0x001c1c1c,0x009e9e9e,0x009c9c9c,
+	0x003a3a3a,0x00cacaca,0x00252525,0x007b7b7b,
+	0x000d0d0d,0x00717171,0x005f5f5f,0x001f1f1f,
+	0x00f8f8f8,0x00d7d7d7,0x003e3e3e,0x009d9d9d,
+	0x007c7c7c,0x00606060,0x00b9b9b9,0x00bebebe,
+	0x00bcbcbc,0x008b8b8b,0x00161616,0x00343434,
+	0x004d4d4d,0x00c3c3c3,0x00727272,0x00959595,
+	0x00ababab,0x008e8e8e,0x00bababa,0x007a7a7a,
+	0x00b3b3b3,0x00020202,0x00b4b4b4,0x00adadad,
+	0x00a2a2a2,0x00acacac,0x00d8d8d8,0x009a9a9a,
+	0x00171717,0x001a1a1a,0x00353535,0x00cccccc,
+	0x00f7f7f7,0x00999999,0x00616161,0x005a5a5a,
+	0x00e8e8e8,0x00242424,0x00565656,0x00404040,
+	0x00e1e1e1,0x00636363,0x00090909,0x00333333,
+	0x00bfbfbf,0x00989898,0x00979797,0x00858585,
+	0x00686868,0x00fcfcfc,0x00ececec,0x000a0a0a,
+	0x00dadada,0x006f6f6f,0x00535353,0x00626262,
+	0x00a3a3a3,0x002e2e2e,0x00080808,0x00afafaf,
+	0x00282828,0x00b0b0b0,0x00747474,0x00c2c2c2,
+	0x00bdbdbd,0x00363636,0x00222222,0x00383838,
+	0x00646464,0x001e1e1e,0x00393939,0x002c2c2c,
+	0x00a6a6a6,0x00303030,0x00e5e5e5,0x00444444,
+	0x00fdfdfd,0x00888888,0x009f9f9f,0x00656565,
+	0x00878787,0x006b6b6b,0x00f4f4f4,0x00232323,
+	0x00484848,0x00101010,0x00d1d1d1,0x00515151,
+	0x00c0c0c0,0x00f9f9f9,0x00d2d2d2,0x00a0a0a0,
+	0x00555555,0x00a1a1a1,0x00414141,0x00fafafa,
+	0x00434343,0x00131313,0x00c4c4c4,0x002f2f2f,
+	0x00a8a8a8,0x00b6b6b6,0x003c3c3c,0x002b2b2b,
+	0x00c1c1c1,0x00ffffff,0x00c8c8c8,0x00a5a5a5,
+	0x00202020,0x00898989,0x00000000,0x00909090,
+	0x00474747,0x00efefef,0x00eaeaea,0x00b7b7b7,
+	0x00151515,0x00060606,0x00cdcdcd,0x00b5b5b5,
+	0x00121212,0x007e7e7e,0x00bbbbbb,0x00292929,
+	0x000f0f0f,0x00b8b8b8,0x00070707,0x00040404,
+	0x009b9b9b,0x00949494,0x00212121,0x00666666,
+	0x00e6e6e6,0x00cecece,0x00ededed,0x00e7e7e7,
+	0x003b3b3b,0x00fefefe,0x007f7f7f,0x00c5c5c5,
+	0x00a4a4a4,0x00373737,0x00b1b1b1,0x004c4c4c,
+	0x00919191,0x006e6e6e,0x008d8d8d,0x00767676,
+	0x00030303,0x002d2d2d,0x00dedede,0x00969696,
+	0x00262626,0x007d7d7d,0x00c6c6c6,0x005c5c5c,
+	0x00d3d3d3,0x00f2f2f2,0x004f4f4f,0x00191919,
+	0x003f3f3f,0x00dcdcdc,0x00797979,0x001d1d1d,
+	0x00525252,0x00ebebeb,0x00f3f3f3,0x006d6d6d,
+	0x005e5e5e,0x00fbfbfb,0x00696969,0x00b2b2b2,
+	0x00f0f0f0,0x00313131,0x000c0c0c,0x00d4d4d4,
+	0x00cfcfcf,0x008c8c8c,0x00e2e2e2,0x00757575,
+	0x00a9a9a9,0x004a4a4a,0x00575757,0x00848484,
+	0x00111111,0x00454545,0x001b1b1b,0x00f5f5f5,
+	0x00e4e4e4,0x000e0e0e,0x00737373,0x00aaaaaa,
+	0x00f1f1f1,0x00dddddd,0x00595959,0x00141414,
+	0x006c6c6c,0x00929292,0x00545454,0x00d0d0d0,
+	0x00787878,0x00707070,0x00e3e3e3,0x00494949,
+	0x00808080,0x00505050,0x00a7a7a7,0x00f6f6f6,
+	0x00777777,0x00939393,0x00868686,0x00838383,
+	0x002a2a2a,0x00c7c7c7,0x005b5b5b,0x00e9e9e9,
+	0x00eeeeee,0x008f8f8f,0x00010101,0x003d3d3d,
+};
+
+static const u32 camellia_sp3033[256] = {
+	0x38003838,0x41004141,0x16001616,0x76007676,
+	0xd900d9d9,0x93009393,0x60006060,0xf200f2f2,
+	0x72007272,0xc200c2c2,0xab00abab,0x9a009a9a,
+	0x75007575,0x06000606,0x57005757,0xa000a0a0,
+	0x91009191,0xf700f7f7,0xb500b5b5,0xc900c9c9,
+	0xa200a2a2,0x8c008c8c,0xd200d2d2,0x90009090,
+	0xf600f6f6,0x07000707,0xa700a7a7,0x27002727,
+	0x8e008e8e,0xb200b2b2,0x49004949,0xde00dede,
+	0x43004343,0x5c005c5c,0xd700d7d7,0xc700c7c7,
+	0x3e003e3e,0xf500f5f5,0x8f008f8f,0x67006767,
+	0x1f001f1f,0x18001818,0x6e006e6e,0xaf00afaf,
+	0x2f002f2f,0xe200e2e2,0x85008585,0x0d000d0d,
+	0x53005353,0xf000f0f0,0x9c009c9c,0x65006565,
+	0xea00eaea,0xa300a3a3,0xae00aeae,0x9e009e9e,
+	0xec00ecec,0x80008080,0x2d002d2d,0x6b006b6b,
+	0xa800a8a8,0x2b002b2b,0x36003636,0xa600a6a6,
+	0xc500c5c5,0x86008686,0x4d004d4d,0x33003333,
+	0xfd00fdfd,0x66006666,0x58005858,0x96009696,
+	0x3a003a3a,0x09000909,0x95009595,0x10001010,
+	0x78007878,0xd800d8d8,0x42004242,0xcc00cccc,
+	0xef00efef,0x26002626,0xe500e5e5,0x61006161,
+	0x1a001a1a,0x3f003f3f,0x3b003b3b,0x82008282,
+	0xb600b6b6,0xdb00dbdb,0xd400d4d4,0x98009898,
+	0xe800e8e8,0x8b008b8b,0x02000202,0xeb00ebeb,
+	0x0a000a0a,0x2c002c2c,0x1d001d1d,0xb000b0b0,
+	0x6f006f6f,0x8d008d8d,0x88008888,0x0e000e0e,
+	0x19001919,0x87008787,0x4e004e4e,0x0b000b0b,
+	0xa900a9a9,0x0c000c0c,0x79007979,0x11001111,
+	0x7f007f7f,0x22002222,0xe700e7e7,0x59005959,
+	0xe100e1e1,0xda00dada,0x3d003d3d,0xc800c8c8,
+	0x12001212,0x04000404,0x74007474,0x54005454,
+	0x30003030,0x7e007e7e,0xb400b4b4,0x28002828,
+	0x55005555,0x68006868,0x50005050,0xbe00bebe,
+	0xd000d0d0,0xc400c4c4,0x31003131,0xcb00cbcb,
+	0x2a002a2a,0xad00adad,0x0f000f0f,0xca00caca,
+	0x70007070,0xff00ffff,0x32003232,0x69006969,
+	0x08000808,0x62006262,0x00000000,0x24002424,
+	0xd100d1d1,0xfb00fbfb,0xba00baba,0xed00eded,
+	0x45004545,0x81008181,0x73007373,0x6d006d6d,
+	0x84008484,0x9f009f9f,0xee00eeee,0x4a004a4a,
+	0xc300c3c3,0x2e002e2e,0xc100c1c1,0x01000101,
+	0xe600e6e6,0x25002525,0x48004848,0x99009999,
+	0xb900b9b9,0xb300b3b3,0x7b007b7b,0xf900f9f9,
+	0xce00cece,0xbf00bfbf,0xdf00dfdf,0x71007171,
+	0x29002929,0xcd00cdcd,0x6c006c6c,0x13001313,
+	0x64006464,0x9b009b9b,0x63006363,0x9d009d9d,
+	0xc000c0c0,0x4b004b4b,0xb700b7b7,0xa500a5a5,
+	0x89008989,0x5f005f5f,0xb100b1b1,0x17001717,
+	0xf400f4f4,0xbc00bcbc,0xd300d3d3,0x46004646,
+	0xcf00cfcf,0x37003737,0x5e005e5e,0x47004747,
+	0x94009494,0xfa00fafa,0xfc00fcfc,0x5b005b5b,
+	0x97009797,0xfe00fefe,0x5a005a5a,0xac00acac,
+	0x3c003c3c,0x4c004c4c,0x03000303,0x35003535,
+	0xf300f3f3,0x23002323,0xb800b8b8,0x5d005d5d,
+	0x6a006a6a,0x92009292,0xd500d5d5,0x21002121,
+	0x44004444,0x51005151,0xc600c6c6,0x7d007d7d,
+	0x39003939,0x83008383,0xdc00dcdc,0xaa00aaaa,
+	0x7c007c7c,0x77007777,0x56005656,0x05000505,
+	0x1b001b1b,0xa400a4a4,0x15001515,0x34003434,
+	0x1e001e1e,0x1c001c1c,0xf800f8f8,0x52005252,
+	0x20002020,0x14001414,0xe900e9e9,0xbd00bdbd,
+	0xdd00dddd,0xe400e4e4,0xa100a1a1,0xe000e0e0,
+	0x8a008a8a,0xf100f1f1,0xd600d6d6,0x7a007a7a,
+	0xbb00bbbb,0xe300e3e3,0x40004040,0x4f004f4f,
+};
+
+static const u32 camellia_sp4404[256] = {
+	0x70700070,0x2c2c002c,0xb3b300b3,0xc0c000c0,
+	0xe4e400e4,0x57570057,0xeaea00ea,0xaeae00ae,
+	0x23230023,0x6b6b006b,0x45450045,0xa5a500a5,
+	0xeded00ed,0x4f4f004f,0x1d1d001d,0x92920092,
+	0x86860086,0xafaf00af,0x7c7c007c,0x1f1f001f,
+	0x3e3e003e,0xdcdc00dc,0x5e5e005e,0x0b0b000b,
+	0xa6a600a6,0x39390039,0xd5d500d5,0x5d5d005d,
+	0xd9d900d9,0x5a5a005a,0x51510051,0x6c6c006c,
+	0x8b8b008b,0x9a9a009a,0xfbfb00fb,0xb0b000b0,
+	0x74740074,0x2b2b002b,0xf0f000f0,0x84840084,
+	0xdfdf00df,0xcbcb00cb,0x34340034,0x76760076,
+	0x6d6d006d,0xa9a900a9,0xd1d100d1,0x04040004,
+	0x14140014,0x3a3a003a,0xdede00de,0x11110011,
+	0x32320032,0x9c9c009c,0x53530053,0xf2f200f2,
+	0xfefe00fe,0xcfcf00cf,0xc3c300c3,0x7a7a007a,
+	0x24240024,0xe8e800e8,0x60600060,0x69690069,
+	0xaaaa00aa,0xa0a000a0,0xa1a100a1,0x62620062,
+	0x54540054,0x1e1e001e,0xe0e000e0,0x64640064,
+	0x10100010,0x00000000,0xa3a300a3,0x75750075,
+	0x8a8a008a,0xe6e600e6,0x09090009,0xdddd00dd,
+	0x87870087,0x83830083,0xcdcd00cd,0x90900090,
+	0x73730073,0xf6f600f6,0x9d9d009d,0xbfbf00bf,
+	0x52520052,0xd8d800d8,0xc8c800c8,0xc6c600c6,
+	0x81810081,0x6f6f006f,0x13130013,0x63630063,
+	0xe9e900e9,0xa7a700a7,0x9f9f009f,0xbcbc00bc,
+	0x29290029,0xf9f900f9,0x2f2f002f,0xb4b400b4,
+	0x78780078,0x06060006,0xe7e700e7,0x71710071,
+	0xd4d400d4,0xabab00ab,0x88880088,0x8d8d008d,
+	0x72720072,0xb9b900b9,0xf8f800f8,0xacac00ac,
+	0x36360036,0x2a2a002a,0x3c3c003c,0xf1f100f1,
+	0x40400040,0xd3d300d3,0xbbbb00bb,0x43430043,
+	0x15150015,0xadad00ad,0x77770077,0x80800080,
+	0x82820082,0xecec00ec,0x27270027,0xe5e500e5,
+	0x85850085,0x35350035,0x0c0c000c,0x41410041,
+	0xefef00ef,0x93930093,0x19190019,0x21210021,
+	0x0e0e000e,0x4e4e004e,0x65650065,0xbdbd00bd,
+	0xb8b800b8,0x8f8f008f,0xebeb00eb,0xcece00ce,
+	0x30300030,0x5f5f005f,0xc5c500c5,0x1a1a001a,
+	0xe1e100e1,0xcaca00ca,0x47470047,0x3d3d003d,
+	0x01010001,0xd6d600d6,0x56560056,0x4d4d004d,
+	0x0d0d000d,0x66660066,0xcccc00cc,0x2d2d002d,
+	0x12120012,0x20200020,0xb1b100b1,0x99990099,
+	0x4c4c004c,0xc2c200c2,0x7e7e007e,0x05050005,
+	0xb7b700b7,0x31310031,0x17170017,0xd7d700d7,
+	0x58580058,0x61610061,0x1b1b001b,0x1c1c001c,
+	0x0f0f000f,0x16160016,0x18180018,0x22220022,
+	0x44440044,0xb2b200b2,0xb5b500b5,0x91910091,
+	0x08080008,0xa8a800a8,0xfcfc00fc,0x50500050,
+	0xd0d000d0,0x7d7d007d,0x89890089,0x97970097,
+	0x5b5b005b,0x95950095,0xffff00ff,0xd2d200d2,
+	0xc4c400c4,0x48480048,0xf7f700f7,0xdbdb00db,
+	0x03030003,0xdada00da,0x3f3f003f,0x94940094,
+	0x5c5c005c,0x02020002,0x4a4a004a,0x33330033,
+	0x67670067,0xf3f300f3,0x7f7f007f,0xe2e200e2,
+	0x9b9b009b,0x26260026,0x37370037,0x3b3b003b,
+	0x96960096,0x4b4b004b,0xbebe00be,0x2e2e002e,
+	0x79790079,0x8c8c008c,0x6e6e006e,0x8e8e008e,
+	0xf5f500f5,0xb6b600b6,0xfdfd00fd,0x59590059,
+	0x98980098,0x6a6a006a,0x46460046,0xbaba00ba,
+	0x25250025,0x42420042,0xa2a200a2,0xfafa00fa,
+	0x07070007,0x55550055,0xeeee00ee,0x0a0a000a,
+	0x49490049,0x68680068,0x38380038,0xa4a400a4,
+	0x28280028,0x7b7b007b,0xc9c900c9,0xc1c100c1,
+	0xe3e300e3,0xf4f400f4,0xc7c700c7,0x9e9e009e,
+};
+
+
+#define CAMELLIA_MIN_KEY_SIZE        16
+#define CAMELLIA_MAX_KEY_SIZE        32
+#define CAMELLIA_BLOCK_SIZE          16
+#define CAMELLIA_TABLE_BYTE_LEN     272
+
+
+/* key constants */
+
+#define CAMELLIA_SIGMA1 (0xA09E667F3BCC908B)
+#define CAMELLIA_SIGMA2 (0xB67AE8584CAA73B2)
+#define CAMELLIA_SIGMA3 (0xC6EF372FE94F82BE)
+#define CAMELLIA_SIGMA4 (0x54FF53A5F1D36F1C)
+#define CAMELLIA_SIGMA5 (0x10E527FADE682D1D)
+#define CAMELLIA_SIGMA6 (0xB05688C2B3E6C1FD)
+
+/*
+ *  macros
+ */
+#define GETU64(v, pt) \
+    do { \
+	/* latest breed of gcc is clever enough to use move */ \
+	memcpy(&(v), (pt), 8); \
+	(v) = be64_to_cpu(v); \
+    } while(0)
+
+/* rotation right shift 1byte */
+#define ROR8(x) (((x) >> 8) + ((x) << (sizeof(x)*8 - 8)))
+/* rotation left shift 1bit */
+#define ROL1(x) (((x) << 1) + ((x) >> (sizeof(x)*8 - 1)))
+/* rotation left shift 1byte */
+#define ROL8(x) (((x) << 8) + ((x) >> (sizeof(x)*8 - 8)))
+
+#define ROLDQ(l, r, w, bits)				\
+    do {						\
+	w = l;						\
+	l = (l << bits) + (r >> (64 - bits));		\
+	r = (r << bits) + (w >> (64 - bits));		\
+    } while(0)
+
+/*
+ * NB: L and R below stand for 'left' and 'right' as in written numbers.
+ * That is, in (xxxL,xxxR) pair xxxL holds most significant digits,
+ * _not_ least significant ones!
+ */
+
+
+/*
+ * Key setup
+ */
+#define CAMELLIA_F(x, k, y, i)					\
+    do {							\
+	u32 yl, yr;						\
+	i = x ^ k;						\
+	yl = camellia_sp1110[(u8)i]				\
+	   ^ camellia_sp0222[(u8)(i >> 24)]			\
+	   ^ camellia_sp3033[(u8)(i >> 16)]			\
+	   ^ camellia_sp4404[(u8)(i >> 8)];			\
+	yr = camellia_sp1110[    (i >> 56)]			\
+	   ^ camellia_sp0222[(u8)(i >> 48)]			\
+	   ^ camellia_sp3033[(u8)(i >> 40)]			\
+	   ^ camellia_sp4404[(u8)(i >> 32)];			\
+	yl ^= yr;						\
+	yr = ROR8(yr);						\
+	yr ^= yl;						\
+	y = ((u64)yl << 32) + yr;				\
+    } while(0)
+
+#define SUBKEY(INDEX) (subkey[(INDEX)])
+
+#ifdef __BIG_ENDIAN
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#else
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2])
+#endif
+
+static void camellia_setup_tail(u64 *subkey, int max)
+{
+	u32 dw;
+	int i = 2;
+	do {
+		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
+		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
+		dw = SUBKEY_L(i + 1) ^ SUBKEY_R(i + 1); dw = ROL8(dw);/* round 2 */
+		SUBKEY_R(i + 1) = SUBKEY_L(i + 1) ^ dw; SUBKEY_L(i + 1) = dw;
+		dw = SUBKEY_L(i + 2) ^ SUBKEY_R(i + 2); dw = ROL8(dw);/* round 3 */
+		SUBKEY_R(i + 2) = SUBKEY_L(i + 2) ^ dw; SUBKEY_L(i + 2) = dw;
+		dw = SUBKEY_L(i + 3) ^ SUBKEY_R(i + 3); dw = ROL8(dw);/* round 4 */
+		SUBKEY_R(i + 3) = SUBKEY_L(i + 3) ^ dw; SUBKEY_L(i + 3) = dw;
+		dw = SUBKEY_L(i + 4) ^ SUBKEY_R(i + 4); dw = ROL8(dw);/* round 5 */
+		SUBKEY_R(i + 4) = SUBKEY_L(i + 4) ^ dw; SUBKEY_L(i + 4) = dw;
+		dw = SUBKEY_L(i + 5) ^ SUBKEY_R(i + 5); dw = ROL8(dw);/* round 6 */
+		SUBKEY_R(i + 5) = SUBKEY_L(i + 5) ^ dw; SUBKEY_L(i + 5) = dw;
+		i += 8;
+	} while (i < max);
+}
+
+#ifdef __BIG_ENDIAN
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#else
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
+#endif
+
+static void camellia_setup128(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;
+	u64 i, t, w;
+	u64 kw4;
+	u32 dw;
+	u64 sub[26];
+
+	/**
+	 *  k == kl || kr (|| is concatenation)
+	 */
+	GETU64(kl, key     );
+	GETU64(kr, key +  8);
+
+	/**
+	 * generate KL dependent subkeys
+	 */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	/* rotation left shift 15bit */
+	ROLDQ(kl, kr, w, 15);
+	/* k3 */
+	sub[4] = kl;
+	/* k4 */
+	sub[5] = kr;
+	/* rotation left shift 15+30bit */
+	ROLDQ(kl, kr, w, 30);
+	/* k7 */
+	sub[10] = kl;
+	/* k8 */
+	sub[11] = kr;
+	/* rotation left shift 15+30+15bit */
+	ROLDQ(kl, kr, w, 15);
+	/* k10 */
+	sub[13] = kr;
+	/* rotation left shift 15+30+15+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	/* rotation left shift 15+30+15+17+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* k13 */
+	sub[18] = kl;
+	/* k14 */
+	sub[19] = kr;
+	/* rotation left shift 15+30+15+17+17+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+
+	/* generate KA */
+	kl = sub[0];
+	kr = sub[1];
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	/* current status == (kl, w) */
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KA dependent subkeys */
+	/* k1, k2 */
+	sub[2] = kl;
+	sub[3] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k5,k6 */
+	sub[6] = kl;
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl1, kl2 */
+	sub[8] = kl;
+	sub[9] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k9 */
+	sub[12] = kl;
+	ROLDQ(kl, kr, w, 15);
+	/* k11, k12 */
+	sub[14] = kl;
+	sub[15] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k15, k16 */
+	sub[20] = kl;
+	sub[21] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* kw3, kw4 */
+	sub[24] = kl;
+	sub[25] = kr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	/* kw3 */
+	sub[24] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[25];
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32; //kw4l ^= kw4r & ~subR(16);
+	dw = (u32)(kw4 >> 32) & subL(16); // kw4l & subL[16],
+	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32; //kw4l ^= kw4r & ~subR[8];
+	dw = (u32)(kw4 >> 32) & subL(8);
+	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t; /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	SUBKEY(23) = sub[22];     /* round 18 */
+	SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 24);
+}
+
+static void camellia_setup256(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;        /* left half of key */
+	u64 krl, krr;      /* right half of key */
+	u64 i, t, w;       /* temporary variables */
+	u64 kw4;
+	u32 dw;
+	u64 sub[34];
+
+	/**
+	 *  key = (kl || kr || krl || krr)
+	 *  (|| is concatenation)
+	 */
+	GETU64(kl,  key     );
+	GETU64(kr,  key +  8);
+	GETU64(krl, key + 16);
+	GETU64(krr, key + 24);
+
+	/* generate KL dependent subkeys */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	ROLDQ(kl, kr, w, 45);
+	/* k9 */
+	sub[12] = kl;
+	/* k10 */
+	sub[13] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k23 */
+	sub[30] = kl;
+	/* k24 */
+	sub[31] = kr;
+
+	/* generate KR dependent subkeys */
+	ROLDQ(krl, krr, w, 15);
+	/* k3 */
+	sub[4] = krl;
+	/* k4 */
+	sub[5] = krr;
+	ROLDQ(krl, krr, w, 15);
+	/* kl1 */
+	sub[8] = krl;
+	/* kl2 */
+	sub[9] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k13 */
+	sub[18] = krl;
+	/* k14 */
+	sub[19] = krr;
+	ROLDQ(krl, krr, w, 34);
+	/* k19 */
+	sub[26] = krl;
+	/* k20 */
+	sub[27] = krr;
+	ROLDQ(krl, krr, w, 34);
+
+	/* generate KA */
+	kl = sub[0] ^ krl;
+	kr = sub[1] ^ krr;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	kl ^= krl;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w ^ krr;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KB */
+	krl ^= kl;
+	krr ^= kr;
+	CAMELLIA_F(krl, CAMELLIA_SIGMA5, w, i);
+	krr ^= w;
+	CAMELLIA_F(krr, CAMELLIA_SIGMA6, w, i);
+	krl ^= w;
+
+	/* generate KA dependent subkeys */
+	ROLDQ(kl, kr, w, 15);
+	/* k5 */
+	sub[6] = kl;
+	/* k6 */
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 30);
+	/* k11 */
+	sub[14] = kl;
+	/* k12 */
+	sub[15] = kr;
+	/* kl5 */
+	ROLDQ(kl, kr, w, 32);
+	sub[24] = kl;
+	/* kl6 */
+	sub[25] = kr;
+	/* rotation left shift 49 from k11,k12 -> k21,k22 */
+	ROLDQ(kl, kr, w, (49 - 32));
+	/* k21 */
+	sub[28] = kl;
+	/* k22 */
+	sub[29] = kr;
+
+	/* generate KB dependent subkeys */
+	/* k1 */
+	sub[2] = krl;
+	/* k2 */
+	sub[3] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k7 */
+	sub[10] = krl;
+	/* k8 */
+	sub[11] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k15 */
+	sub[20] = krl;
+	/* k16 */
+	sub[21] = krr;
+	ROLDQ(krl, krr, w, 51);
+	/* kw3 */
+	sub[32] = krl;
+	/* kw4 */
+	sub[33] = krr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(25);
+	dw = subL(1) & subL(25),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl6) */
+	/* round 20 */
+	sub[27] ^= sub[1];
+	/* round 22 */
+	sub[29] ^= sub[1];
+	/* round 24 */
+	sub[31] ^= sub[1];
+	/* kw3 */
+	sub[32] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[33];
+	/* round 23 */
+	sub[30] ^= kw4;
+	/* round 21 */
+	sub[28] ^= kw4;
+	/* round 19 */
+	sub[26] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(24)) << 32; //kw4l ^= kw4r & ~subR[24];
+	dw = (u32)(kw4 >> 32) & subL(24),
+		kw4 ^= ROL1(dw); /* modified for FL(kl5) */
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(16),
+		kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(8),
+		kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); //tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	t = subL(26) ^ (subR(26) & ~subR(24));
+	dw = (u32)t & subL(24); /* FL(kl5) */
+	t = (t << 32) | (subR(26) ^ ROL1(dw));
+	SUBKEY(23) = sub[22] ^ t; /* round 18 */
+	SUBKEY(24) = sub[24];     /* FL(kl5) */
+	SUBKEY(25) = sub[25];     /* FLinv(kl6) */
+	t = subL(23) ^ (subR(23) & ~subR(25));
+	dw = (u32)t & subL(25); /* FLinv(kl6) */
+	t = (t << 32) | (subR(23) ^ ROL1(dw));
+	SUBKEY(26) = t ^ sub[27]; /* round 19 */
+	SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
+	SUBKEY(28) = sub[27] ^ sub[29]; /* round 21 */
+	SUBKEY(29) = sub[28] ^ sub[30]; /* round 22 */
+	SUBKEY(30) = sub[29] ^ sub[31]; /* round 23 */
+	SUBKEY(31) = sub[30];     /* round 24 */
+	SUBKEY(32) = sub[32] ^ sub[31]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 32);
+}
+
+static void camellia_setup192(const unsigned char *key, u64 *subkey)
+{
+	unsigned char kk[32];
+	u64 krl, krr;
+
+	memcpy(kk, key, 24);
+	memcpy((unsigned char *)&krl, key+16, 8);
+	krr = ~krl;
+	memcpy(kk+24, (unsigned char *)&krr, 8);
+	camellia_setup256(kk, subkey);
+}
+
+
+/*
+ * Encrypt/decrypt
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll & ll;							\
+	t2 = krr | rr;							\
+	rl ^= t2;							\
+	lr ^= ROL1(t0);							\
+	t3 = krl & rl;							\
+	t1 = klr | lr;							\
+	ll ^= t1;							\
+	rr ^= ROL1(t3);							\
+    } while(0)
+
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir)		\
+    do {								\
+	ir =  camellia_sp1110[(u8)xr];					\
+	il =  camellia_sp1110[    (xl >> 24)];				\
+	ir ^= camellia_sp0222[    (xr >> 24)];				\
+	il ^= camellia_sp0222[(u8)(xl >> 16)];				\
+	ir ^= camellia_sp3033[(u8)(xr >> 16)];				\
+	il ^= camellia_sp3033[(u8)(xl >> 8)];				\
+	ir ^= camellia_sp4404[(u8)(xr >> 8)];				\
+	il ^= camellia_sp4404[(u8)xl];					\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= ROR8(il) ^ ir;						\
+    } while(0)
+
+/* max = 24: 128bit encrypt, max = 32: 192/256bit encrypt */
+static void camellia_do_encrypt(const u64 *subkey, u32 *io, unsigned max)
+{
+	u32 il,ir,t0,t1;               /* temporary variables */
+
+	/* pre whitening but absorb kw2 */
+	io[0] ^= SUBKEY_L(0);
+	io[1] ^= SUBKEY_R(0);
+
+	/* main iteration */
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     t0,t1,il,ir); \
+} while (0)
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+	{
+		unsigned i = 0;
+		while (1) {
+			ROUNDS(i);
+			i += 8;
+			if (i >= max)
+				break;
+			FLS(i);
+		}
+	}
+#else
+	ROUNDS(0);
+	FLS(8);
+	ROUNDS(8);
+	FLS(16);
+	ROUNDS(16);
+	if (max == 32) {
+		FLS(24);
+		ROUNDS(24);
+	}
+#endif
+
+#undef ROUNDS
+#undef FLS
+
+	/* post whitening but kw4 */
+	io[2] ^= SUBKEY_L(max);
+	io[3] ^= SUBKEY_R(max);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
+}
+
+static void camellia_do_decrypt(const u64 *subkey, u32 *io, unsigned i)
+{
+	u32 il,ir,t0,t1;               /* temporary variables */
+
+	/* pre whitening but absorb kw2 */
+	io[0] ^= SUBKEY_L(i);
+	io[1] ^= SUBKEY_R(i);
+
+	/* main iteration */
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     t0,t1,il,ir); \
+} while (0)
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+	while (1) {
+		i -= 8;
+		ROUNDS(i);
+		if (i == 0)
+			break;
+		FLS(i);
+	}
+#else
+	if (i == 32) {
+		ROUNDS(24);
+		FLS(24);
+	}
+	ROUNDS(16);
+	FLS(16);
+	ROUNDS(8);
+	FLS(8);
+	ROUNDS(0);
+#endif
+
+#undef ROUNDS
+#undef FLS
+
+	/* post whitening but kw4 */
+	io[2] ^= SUBKEY_L(0);
+	io[3] ^= SUBKEY_R(0);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
+}
+
+
+struct camellia_ctx {
+	int key_length;
+	u64 key_table[CAMELLIA_TABLE_BYTE_LEN / 8];
+};
+
+static int
+camellia_set_key(struct crypto_tfm *tfm, const u8 *in_key,
+		 unsigned int key_len)
+{
+	struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const unsigned char *key = (const unsigned char *)in_key;
+	u32 *flags = &tfm->crt_flags;
+
+	if (key_len != 16 && key_len != 24 && key_len != 32) {
+		*flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+		return -EINVAL;
+	}
+
+	cctx->key_length = key_len;
+
+	switch (key_len) {
+	case 16:
+		camellia_setup128(key, cctx->key_table);
+		break;
+	case 24:
+		camellia_setup192(key, cctx->key_table);
+		break;
+	case 32:
+		camellia_setup256(key, cctx->key_table);
+		break;
+	}
+
+	return 0;
+}
+
+static void camellia_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const __be32 *src = (const __be32 *)in;
+	__be32 *dst = (__be32 *)out;
+
+	u32 tmp[4];
+
+	tmp[0] = be32_to_cpu(src[0]);
+	tmp[1] = be32_to_cpu(src[1]);
+	tmp[2] = be32_to_cpu(src[2]);
+	tmp[3] = be32_to_cpu(src[3]);
+
+	camellia_do_encrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_encrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
+}
+
+static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const __be32 *src = (const __be32 *)in;
+	__be32 *dst = (__be32 *)out;
+
+	u32 tmp[4];
+
+	tmp[0] = be32_to_cpu(src[0]);
+	tmp[1] = be32_to_cpu(src[1]);
+	tmp[2] = be32_to_cpu(src[2]);
+	tmp[3] = be32_to_cpu(src[3]);
+
+	camellia_do_decrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_decrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
+}
+
+static struct crypto_alg camellia_alg = {
+	.cra_name		=	"camellia",
+	.cra_driver_name	=	"camellia-generic",
+	.cra_priority		=	100,
+	.cra_flags		=	CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		=	CAMELLIA_BLOCK_SIZE,
+	.cra_ctxsize		=	sizeof(struct camellia_ctx),
+	.cra_alignmask		=	3,
+	.cra_module		=	THIS_MODULE,
+	.cra_list		=	LIST_HEAD_INIT(camellia_alg.cra_list),
+	.cra_u			=	{
+		.cipher = {
+			.cia_min_keysize	=	CAMELLIA_MIN_KEY_SIZE,
+			.cia_max_keysize	=	CAMELLIA_MAX_KEY_SIZE,
+			.cia_setkey		=	camellia_set_key,
+			.cia_encrypt		=	camellia_encrypt,
+			.cia_decrypt		=	camellia_decrypt
+		}
+	}
+};
+
+static int __init camellia_init(void)
+{
+	return crypto_register_alg(&camellia_alg);
+}
+
+static void __exit camellia_fini(void)
+{
+	crypto_unregister_alg(&camellia_alg);
+}
+
+module_init(camellia_init);
+module_exit(camellia_fini);
+
+MODULE_DESCRIPTION("Camellia Cipher Algorithm");
+MODULE_LICENSE("GPL");

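The key schedule in the patch relies on ROLDQ to rotate the 128-bit value held in a (left, right) pair of u64 variables, with the left half most significant, as the NB comment in the patch explains. Below is a minimal userspace sketch, not part of the patch, that restates the macro as a function with <stdint.h> types instead of the kernel's u64 and checks it against a naive one-bit-at-a-time rotation. The helper names `roldq`, `rol1_128` and `roldq_matches_reference` are hypothetical. It assumes 0 < bits < 64, which holds for every shift count the patch uses (15, 17, 30, 32, 34, 45 and 51); a shift by 64 would be undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rotate the 128-bit value (l:r) left by `bits`; `l` holds the most
 * significant 64 bits, as in the patch's ROLDQ macro.
 * Only 0 < bits < 64 is valid.
 */
static void roldq(uint64_t *l, uint64_t *r, unsigned bits)
{
	uint64_t w = *l;	/* old left half, needed after *l changes */

	assert(bits > 0 && bits < 64);
	*l = (*l << bits) | (*r >> (64 - bits));
	*r = (*r << bits) | (w >> (64 - bits));
}

/* Reference: rotate the 128-bit value (l:r) left by exactly one bit. */
static void rol1_128(uint64_t *l, uint64_t *r)
{
	uint64_t top = *l >> 63;	/* bit that wraps around into r */

	*l = (*l << 1) | (*r >> 63);
	*r = (*r << 1) | top;
}

/* Returns 1 if roldq() agrees with `bits` single-bit rotations. */
static int roldq_matches_reference(uint64_t l, uint64_t r, unsigned bits)
{
	uint64_t fl = l, fr = r;
	unsigned k;

	roldq(&fl, &fr, bits);
	for (k = 0; k < bits; k++)
		rol1_128(&l, &r);
	return fl == l && fr == r;
}
```

The patch combines the two shifted parts with `+`; the sketch uses `|`, which gives the same result because the shifted parts never overlap. Composing two rotations adds their counts, which is why camellia_setup128 can describe successive ROLDQ calls with comments such as "15+30bit".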
^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-07 13:22     ` Denys Vlasenko
@ 2007-11-08 13:30       ` Herbert Xu
  2007-11-13  6:07         ` Noriaki TAKAMIYA
  0 siblings, 1 reply; 40+ messages in thread
From: Herbert Xu @ 2007-11-08 13:30 UTC
  To: Denys Vlasenko; +Cc: linux-crypto, Noriaki TAKAMIYA

On Wed, Nov 07, 2007 at 01:22:52PM +0000, Denys Vlasenko wrote:
>
> *I* am happy with 5% speed sacrifice. I'm afraid other people won't be.

I'd like to hear the opinion of the author.

Takamiya-san, what do you think about this change?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-08 13:30       ` Herbert Xu
@ 2007-11-13  6:07         ` Noriaki TAKAMIYA
  2007-11-13  6:25           ` [camellia-oss:00952] " Noriaki TAKAMIYA
  0 siblings, 1 reply; 40+ messages in thread
From: Noriaki TAKAMIYA @ 2007-11-13  6:07 UTC
  To: herbert; +Cc: vda.linux, linux-crypto

Hi,

  Sorry for the late reply.

>> Thu, 8 Nov 2007 21:30:20 +0800
>> [Subject: Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization]
>> Herbert Xu <herbert@gondor.apana.org.au> wrote...

> On Wed, Nov 07, 2007 at 01:22:52PM +0000, Denys Vlasenko wrote:
> >
> > *I* am happy with 5% speed sacrifice. I'm afraid other people won't be.
> 
> I'd like to hear the opinion of the author.
> 
> Takamiya-san, what do you think about this change?

  For IPsec processing, I think performance is important.

  If this fix improves the performance, it is acceptable.

  But there are many duplicate declarations between camellia.c and
  camellia_64.c...
  (e.g., CAMELLIA_MIN_KEY_SIZE and so on)

  Regards,

--
Noriaki TAKAMIYA

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-13  6:07         ` Noriaki TAKAMIYA
@ 2007-11-13  6:25           ` Noriaki TAKAMIYA
  2007-11-13 22:34             ` Denys Vlasenko
  0 siblings, 1 reply; 40+ messages in thread
From: Noriaki TAKAMIYA @ 2007-11-13  6:25 UTC
  To: herbert; +Cc: vda.linux, linux-crypto

Hi,

  Sorry again.

>> Tue, 13 Nov 2007 15:07:02 +0900 (JST) 
>> [Subject: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization] 
>> Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> wrote...

> > I'd like to hear the opinion of the author.
> > 
> > Takamiya-san, what do you think about this change?
> 
>   For IPsec processing, I think performance is important.
> 
>   If this fix improves the performance, it is acceptable.

  I misunderstood the meaning. If this fix decreases the performance,
  I would not prefer this patch (and the point below is another
  reason).

>   But there are many duplicate declarations between camellia.c and
>   camellia_64.c...
>   (e.g., CAMELLIA_MIN_KEY_SIZE and so on)

  Regards,

--
Noriaki TAKAMIYA

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-13  6:25           ` [camellia-oss:00952] " Noriaki TAKAMIYA
@ 2007-11-13 22:34             ` Denys Vlasenko
  2007-11-14  1:41               ` David Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-13 22:34 UTC
  To: Noriaki TAKAMIYA; +Cc: herbert, linux-crypto

[-- Attachment #1: Type: text/plain, Size: 4186 bytes --]

On Monday 12 November 2007 23:25, Noriaki TAKAMIYA wrote:
> Hi,
>
>   sorry, again.
>
> >> Tue, 13 Nov 2007 15:07:02 +0900 (JST)
> >> [Subject: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling,
> >> 64bit-ization] Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> wrote...
> >>
> > > I'd like to hear the opinion of the author.
> > >
> > > Takamiya-san, what do you think about this change?
> >
> >   For IPsec processing, I think performance is important.
> >
> >   If this fix improves the performance, it is acceptable.
>
>   I misunderstood the meaning. If this fix decreases the performance,
>   I would not prefer this patch (and the point below is another
>   reason).

My preferred solution is to make loop unrolling conditional on
CONFIG_CC_OPTIMIZE_FOR_SIZE - and this is what is done in my
(first) patch (see attached). This part:

+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+       while (1) {
+               i -= 8;
+               ROUNDS(i);
+               if (i == 0)
+                       break;
+               FLS(i);
+       }
+#else
+       if (i == 32) {
+               ROUNDS(24);
+               FLS(24);
+       }
+       ROUNDS(16);
+       FLS(16);
+       ROUNDS(8);
+       FLS(8);
+       ROUNDS(0);
+#endif

Do you agree that this solution does not look too ugly
and would satisfy both "speed" and "size" camps?
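One way to check that the two branches quoted above issue the same sequence of operations is to replace ROUNDS() and FLS() with stubs that only record their argument. The sketch below does that in userspace; `note`, `decrypt_loop` and `decrypt_unrolled` are hypothetical names, and it verifies only the call order, not the cipher arithmetic. It assumes, as camellia_decrypt() guarantees, that the initial i is 24 (128-bit key) or 32 (192/256-bit key).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append a token such as "R16 " (ROUNDS(16)) or "F8 " (FLS(8)) to buf. */
static void note(char *buf, size_t size, char op, unsigned i)
{
	size_t len = strlen(buf);

	snprintf(buf + len, size - len, "%c%u ", op, i);
}

/* Loop form, as used under CONFIG_CC_OPTIMIZE_FOR_SIZE. */
static void decrypt_loop(char *buf, size_t size, unsigned i)
{
	assert(i == 24 || i == 32);
	buf[0] = '\0';
	while (1) {
		i -= 8;
		note(buf, size, 'R', i);
		if (i == 0)
			break;
		note(buf, size, 'F', i);
	}
}

/* Fully unrolled form, as used otherwise. */
static void decrypt_unrolled(char *buf, size_t size, unsigned i)
{
	assert(i == 24 || i == 32);
	buf[0] = '\0';
	if (i == 32) {
		note(buf, size, 'R', 24);
		note(buf, size, 'F', 24);
	}
	note(buf, size, 'R', 16);
	note(buf, size, 'F', 16);
	note(buf, size, 'R', 8);
	note(buf, size, 'F', 8);
	note(buf, size, 'R', 0);
}
```

For i == 24 both versions record "R16 F16 R8 F8 R0"; for i == 32 both prepend "R24 F24". The same stub technique applies to the encrypt side.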


For reference, size and speed numbers again:

All times are in microseconds. Two runs give some idea of test variability.
"Setup NN: NNNNNN NNNNNN" - time taken by 100000 key setups (two runs).
"Encrypt: NNNNNN NNNNNN" - time taken by 1000 encryptions of 8K buffer.
"Decrypt: NNNNNN NNNNNN" - time taken by 1000 decryptions of 8K buffer.
"(matches)" - the encrypt/decrypt round trip reproduced the original plaintext.

CONFIG_CC_OPTIMIZE_FOR_SIZE is not set:

$ ./camellia
Setup 16:32779 33169 Encrypt:153582 153740 Decrypt:150985 149811 (matches)
Setup 24:49333 48987 Encrypt:197973 198853 Decrypt:201240 197585 (matches)
Setup 32:46700 47680 Encrypt:195650 195800 Decrypt:195450 195469 (matches)
$ ./camellia5
Setup 16:33417 32968 Encrypt:149195 149095 Decrypt:148593 148661 (matches)
Setup 24:50082 50064 Encrypt:201214 199204 Decrypt:197078 197579 (matches)
Setup 32:48938 48824 Encrypt:200231 199545 Decrypt:198954 198996 (matches)
$ ./camellia_64
Setup 16:22247 22473 Encrypt:152321 149860 Decrypt:149058 148451 (matches)
Setup 24:33832 34017 Encrypt:200428 202969 Decrypt:196789 195524 (matches)
Setup 32:32884 32821 Encrypt:200414 200640 Decrypt:197857 195987 (matches)
$ size camellia.o camellia5.o camellia_64.o
   text    data     bss     dec     hex filename
  24586       0       0   24586    600a camellia.o
  21714       0       0   21714    54d2 camellia5.o
  18666       0       0   18666    48ea camellia_64.o

Very small speed loss in camellia -> camellia5, noticeably smaller size.
Big key setup speedup in 64-bit camellia_64, and it is even smaller.

CONFIG_CC_OPTIMIZE_FOR_SIZE is set:

$ ./camellia_Os
Setup 16:32573 34985 Encrypt:151825 152011 Decrypt:147581 147630 (matches)
Setup 24:48528 49250 Encrypt:196223 199056 Decrypt:198811 196394 (matches)
Setup 32:46650 47538 Encrypt:197466 196412 Decrypt:196290 196550 (matches)
$ ./camellia5_Os
Setup 16:33360 34487 Encrypt:154718 154499 Decrypt:157432 157135 (matches)
Setup 24:53969 54304 Encrypt:205184 205818 Decrypt:210675 208552 (matches)
Setup 32:53064 52904 Encrypt:205350 205439 Decrypt:211654 208468 (matches)
$ ./camellia_64_Os
Setup 16:24696 25894 Encrypt:155903 155747 Decrypt:157385 155696 (matches)
Setup 24:33873 33230 Encrypt:206111 206385 Decrypt:208111 207650 (matches)
Setup 32:32799 32325 Encrypt:209715 205973 Decrypt:207578 207644 (matches)
$ size camellia_Os.o camellia5_Os.o camellia_64_Os.o
   text    data     bss     dec     hex filename
  24586       0       0   24586    600a camellia_Os.o
  15906       0       0   15906    3e22 camellia5_Os.o
  13098       0       0   13098    332a camellia_64_Os.o

~5% speed loss in camellia -> camellia5, much smaller size.
Big key setup speedup in 64-bit camellia_64, and it is even smaller still.
--
vda

[-- Attachment #2: camellia5.diff --]
[-- Type: text/x-diff, Size: 55724 bytes --]

--- linux-2.6.23.src/crypto/camellia4.c	2007-10-24 19:03:57.000000000 +0100
+++ linux-2.6.23.src/crypto/camellia.c	2007-10-25 11:57:16.000000000 +0100
@@ -36,6 +36,13 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 
+#if BITS_PER_LONG >= 64
+
+/* Use alternative implementation with mostly 64-bit ops */
+#include "camellia_64.c"
+
+#else
+
 static const u32 camellia_sp1110[256] = {
 	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
 	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
@@ -329,7 +336,6 @@ static const u32 camellia_sp4404[256] = 
 /*
  *  macros
  */
-
 # define GETU32(v, pt) \
     do { \
 	/* latest breed of gcc is clever enough to use move */ \
@@ -364,63 +370,28 @@ static const u32 camellia_sp4404[256] = 
     } while(0)
 
 
+/*
+ * Key setup
+ */
 #define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
     do {							\
 	il = xl ^ kl;						\
 	ir = xr ^ kr;						\
 	t0 = il >> 16;						\
 	t1 = ir >> 16;						\
-	yl = camellia_sp1110[ir & 0xff]				\
-	   ^ camellia_sp0222[(t1 >> 8) & 0xff]			\
-	   ^ camellia_sp3033[t1 & 0xff]				\
-	   ^ camellia_sp4404[(ir >> 8) & 0xff];			\
-	yr = camellia_sp1110[(t0 >> 8) & 0xff]			\
-	   ^ camellia_sp0222[t0 & 0xff]				\
-	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
-	   ^ camellia_sp4404[il & 0xff];			\
+	yl = camellia_sp1110[(u8)(ir     )]			\
+	   ^ camellia_sp0222[    (t1 >> 8)]			\
+	   ^ camellia_sp3033[(u8)(t1     )]			\
+	   ^ camellia_sp4404[(u8)(ir >> 8)];			\
+	yr = camellia_sp1110[    (t0 >> 8)]			\
+	   ^ camellia_sp0222[(u8)(t0     )]			\
+	   ^ camellia_sp3033[(u8)(il >> 8)]			\
+	   ^ camellia_sp4404[(u8)(il     )];			\
 	yl ^= yr;						\
 	yr = ROR8(yr);						\
 	yr ^= yl;						\
     } while(0)
 
-
-/*
- * for speed up
- *
- */
-#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
-    do {								\
-	t0 = kll;							\
-	t2 = krr;							\
-	t0 &= ll;							\
-	t2 |= rr;							\
-	rl ^= t2;							\
-	lr ^= ROL1(t0);							\
-	t3 = krl;							\
-	t1 = klr;							\
-	t3 &= rl;							\
-	t1 |= lr;							\
-	ll ^= t1;							\
-	rr ^= ROL1(t3);							\
-    } while(0)
-
-#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-    do {								\
-	ir =  camellia_sp1110[xr & 0xff];				\
-	il =  camellia_sp1110[(xl>>24) & 0xff];				\
-	ir ^= camellia_sp0222[(xr>>24) & 0xff];				\
-	il ^= camellia_sp0222[(xl>>16) & 0xff];				\
-	ir ^= camellia_sp3033[(xr>>16) & 0xff];				\
-	il ^= camellia_sp3033[(xl>>8) & 0xff];				\
-	ir ^= camellia_sp4404[(xr>>8) & 0xff];				\
-	il ^= camellia_sp4404[xl & 0xff];				\
-	il ^= kl;							\
-	ir ^= il ^ kr;							\
-	yl ^= ir;							\
-	yr ^= ROR8(il) ^ ir;						\
-    } while(0)
-
-
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
@@ -622,7 +593,7 @@ static void camellia_setup128(const unsi
 	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
 	SUBKEY_R(6) = subR[5] ^ subR[7];
 	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
+	dw = tl & subL[8];  /* FL(kl1) */
 		tr = subR[10] ^ ROL1(dw);
 	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
 	SUBKEY_R(7) = subR[6] ^ tr;
@@ -1000,400 +971,173 @@ static void camellia_setup192(const unsi
 }
 
 
-static void camellia_encrypt128(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;               /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
-
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
+/*
+ * Encrypt/decrypt
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll;							\
+	t2 = krr;							\
+	t0 &= ll;							\
+	t2 |= rr;							\
+	rl ^= t2;							\
+	lr ^= ROL1(t0);							\
+	t3 = krl;							\
+	t1 = klr;							\
+	t3 &= rl;							\
+	t1 |= lr;							\
+	ll ^= t1;							\
+	rr ^= ROL1(t3);							\
+    } while(0)
 
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(24);
-	io_text[1] = io[3] ^ SUBKEY_R(24);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir)		\
+    do {								\
+	ir =  camellia_sp1110[(u8)xr];					\
+	il =  camellia_sp1110[    (xl >> 24)];				\
+	ir ^= camellia_sp0222[    (xr >> 24)];				\
+	il ^= camellia_sp0222[(u8)(xl >> 16)];				\
+	ir ^= camellia_sp3033[(u8)(xr >> 16)];				\
+	il ^= camellia_sp3033[(u8)(xl >> 8)];				\
+	ir ^= camellia_sp4404[(u8)(xr >> 8)];				\
+	il ^= camellia_sp4404[(u8)xl];					\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= ROR8(il) ^ ir;						\
+    } while(0)
 
-static void camellia_decrypt128(const u32 *subkey, u32 *io_text)
+/* max = 24: 128-bit key, max = 32: 192- or 256-bit key */
+static void camellia_do_encrypt(const u32 *subkey, u32 *io, unsigned max)
 {
 	u32 il,ir,t0,t1;               /* temporary variables */
 
-	u32 io[4];
-
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(24);
-	io[1] = io_text[1] ^ SUBKEY_R(24);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(0);
+	io[1] ^= SUBKEY_R(0);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
-
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
-
-static void camellia_encrypt256(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     t0,t1,il,ir); \
+} while (0)
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+	{
+		unsigned i = 0;
+		while (1) {
+			ROUNDS(i);
+			i += 8;
+			if (i >= max)
+				break;
+			FLS(i);
+		}
+	}
+#else
+	ROUNDS(0);
+	FLS(8);
+	ROUNDS(8);
+	FLS(16);
+	ROUNDS(16);
+	if (max == 32) {
+		FLS(24);
+		ROUNDS(24);
+	}
+#endif
 
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[0],io[1],il,ir,t0,t1);
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(32);
-	io_text[1] = io[3] ^ SUBKEY_R(32);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(max);
+	io[3] ^= SUBKEY_R(max);
+	/* NB: io[0],[1] should be swapped with [2],[3] by caller! */
 }
 
-static void camellia_decrypt256(const u32 *subkey, u32 *io_text)
+static void camellia_do_decrypt(const u32 *subkey, u32 *io, unsigned i)
 {
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
+	u32 il,ir,t0,t1;               /* temporary variables */
 
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(32);
-	io[1] = io_text[1] ^ SUBKEY_R(32);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(i);
+	io[1] ^= SUBKEY_R(i);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     t0,t1,il,ir); \
+} while (0)
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+	while (1) {
+		i -= 8;
+		ROUNDS(i);
+		if (i == 0)
+			break;
+		FLS(i);
+	}
+#else
+	if (i == 32) {
+		ROUNDS(24);
+		FLS(24);
+	}
+	ROUNDS(16);
+	FLS(16);
+	ROUNDS(8);
+	FLS(8);
+	ROUNDS(0);
+#endif
+
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(0);
+	io[3] ^= SUBKEY_R(0);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
 }
 
 
@@ -1445,21 +1189,15 @@ static void camellia_encrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_encrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_encrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_encrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_encrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
@@ -1475,21 +1213,15 @@ static void camellia_decrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_decrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_decrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_decrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_decrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static struct crypto_alg camellia_alg = {
@@ -1528,3 +1260,5 @@ module_exit(camellia_fini);
 
 MODULE_DESCRIPTION("Camellia Cipher Algorithm");
 MODULE_LICENSE("GPL");
+
+#endif /* if BITS_PER_LONG < 64 */
--- /dev/null	2006-05-22 15:25:23.000000000 +0100
+++ linux-2.6.23.src/crypto/camellia_64.c	2007-10-25 12:32:16.000000000 +0100
@@ -0,0 +1,1172 @@
+/*
+ * Copyright (C) 2006
+ * NTT (Nippon Telegraph and Telephone Corporation).
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ */
+
+/*
+ * Algorithm Specification
+ *  http://info.isl.ntt.co.jp/crypt/eng/camellia/specifications.html
+ */
+
+/*
+ *
+ * NOTE --- NOTE --- NOTE --- NOTE
+ * This implementation assumes that all memory addresses passed
+ * as parameters are four-byte aligned.
+ *
+ */
+
+/* #included from camellia.c if long is 64bit */
+
+/*
+#include <linux/crypto.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+*/
+
+static const u32 camellia_sp1110[256] = {
+	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
+	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
+	0xe4e4e400,0x85858500,0x57575700,0x35353500,
+	0xeaeaea00,0x0c0c0c00,0xaeaeae00,0x41414100,
+	0x23232300,0xefefef00,0x6b6b6b00,0x93939300,
+	0x45454500,0x19191900,0xa5a5a500,0x21212100,
+	0xededed00,0x0e0e0e00,0x4f4f4f00,0x4e4e4e00,
+	0x1d1d1d00,0x65656500,0x92929200,0xbdbdbd00,
+	0x86868600,0xb8b8b800,0xafafaf00,0x8f8f8f00,
+	0x7c7c7c00,0xebebeb00,0x1f1f1f00,0xcecece00,
+	0x3e3e3e00,0x30303000,0xdcdcdc00,0x5f5f5f00,
+	0x5e5e5e00,0xc5c5c500,0x0b0b0b00,0x1a1a1a00,
+	0xa6a6a600,0xe1e1e100,0x39393900,0xcacaca00,
+	0xd5d5d500,0x47474700,0x5d5d5d00,0x3d3d3d00,
+	0xd9d9d900,0x01010100,0x5a5a5a00,0xd6d6d600,
+	0x51515100,0x56565600,0x6c6c6c00,0x4d4d4d00,
+	0x8b8b8b00,0x0d0d0d00,0x9a9a9a00,0x66666600,
+	0xfbfbfb00,0xcccccc00,0xb0b0b000,0x2d2d2d00,
+	0x74747400,0x12121200,0x2b2b2b00,0x20202000,
+	0xf0f0f000,0xb1b1b100,0x84848400,0x99999900,
+	0xdfdfdf00,0x4c4c4c00,0xcbcbcb00,0xc2c2c200,
+	0x34343400,0x7e7e7e00,0x76767600,0x05050500,
+	0x6d6d6d00,0xb7b7b700,0xa9a9a900,0x31313100,
+	0xd1d1d100,0x17171700,0x04040400,0xd7d7d700,
+	0x14141400,0x58585800,0x3a3a3a00,0x61616100,
+	0xdedede00,0x1b1b1b00,0x11111100,0x1c1c1c00,
+	0x32323200,0x0f0f0f00,0x9c9c9c00,0x16161600,
+	0x53535300,0x18181800,0xf2f2f200,0x22222200,
+	0xfefefe00,0x44444400,0xcfcfcf00,0xb2b2b200,
+	0xc3c3c300,0xb5b5b500,0x7a7a7a00,0x91919100,
+	0x24242400,0x08080800,0xe8e8e800,0xa8a8a800,
+	0x60606000,0xfcfcfc00,0x69696900,0x50505000,
+	0xaaaaaa00,0xd0d0d000,0xa0a0a000,0x7d7d7d00,
+	0xa1a1a100,0x89898900,0x62626200,0x97979700,
+	0x54545400,0x5b5b5b00,0x1e1e1e00,0x95959500,
+	0xe0e0e000,0xffffff00,0x64646400,0xd2d2d200,
+	0x10101000,0xc4c4c400,0x00000000,0x48484800,
+	0xa3a3a300,0xf7f7f700,0x75757500,0xdbdbdb00,
+	0x8a8a8a00,0x03030300,0xe6e6e600,0xdadada00,
+	0x09090900,0x3f3f3f00,0xdddddd00,0x94949400,
+	0x87878700,0x5c5c5c00,0x83838300,0x02020200,
+	0xcdcdcd00,0x4a4a4a00,0x90909000,0x33333300,
+	0x73737300,0x67676700,0xf6f6f600,0xf3f3f300,
+	0x9d9d9d00,0x7f7f7f00,0xbfbfbf00,0xe2e2e200,
+	0x52525200,0x9b9b9b00,0xd8d8d800,0x26262600,
+	0xc8c8c800,0x37373700,0xc6c6c600,0x3b3b3b00,
+	0x81818100,0x96969600,0x6f6f6f00,0x4b4b4b00,
+	0x13131300,0xbebebe00,0x63636300,0x2e2e2e00,
+	0xe9e9e900,0x79797900,0xa7a7a700,0x8c8c8c00,
+	0x9f9f9f00,0x6e6e6e00,0xbcbcbc00,0x8e8e8e00,
+	0x29292900,0xf5f5f500,0xf9f9f900,0xb6b6b600,
+	0x2f2f2f00,0xfdfdfd00,0xb4b4b400,0x59595900,
+	0x78787800,0x98989800,0x06060600,0x6a6a6a00,
+	0xe7e7e700,0x46464600,0x71717100,0xbababa00,
+	0xd4d4d400,0x25252500,0xababab00,0x42424200,
+	0x88888800,0xa2a2a200,0x8d8d8d00,0xfafafa00,
+	0x72727200,0x07070700,0xb9b9b900,0x55555500,
+	0xf8f8f800,0xeeeeee00,0xacacac00,0x0a0a0a00,
+	0x36363600,0x49494900,0x2a2a2a00,0x68686800,
+	0x3c3c3c00,0x38383800,0xf1f1f100,0xa4a4a400,
+	0x40404000,0x28282800,0xd3d3d300,0x7b7b7b00,
+	0xbbbbbb00,0xc9c9c900,0x43434300,0xc1c1c100,
+	0x15151500,0xe3e3e300,0xadadad00,0xf4f4f400,
+	0x77777700,0xc7c7c700,0x80808000,0x9e9e9e00,
+};
+
+static const u32 camellia_sp0222[256] = {
+	0x00e0e0e0,0x00050505,0x00585858,0x00d9d9d9,
+	0x00676767,0x004e4e4e,0x00818181,0x00cbcbcb,
+	0x00c9c9c9,0x000b0b0b,0x00aeaeae,0x006a6a6a,
+	0x00d5d5d5,0x00181818,0x005d5d5d,0x00828282,
+	0x00464646,0x00dfdfdf,0x00d6d6d6,0x00272727,
+	0x008a8a8a,0x00323232,0x004b4b4b,0x00424242,
+	0x00dbdbdb,0x001c1c1c,0x009e9e9e,0x009c9c9c,
+	0x003a3a3a,0x00cacaca,0x00252525,0x007b7b7b,
+	0x000d0d0d,0x00717171,0x005f5f5f,0x001f1f1f,
+	0x00f8f8f8,0x00d7d7d7,0x003e3e3e,0x009d9d9d,
+	0x007c7c7c,0x00606060,0x00b9b9b9,0x00bebebe,
+	0x00bcbcbc,0x008b8b8b,0x00161616,0x00343434,
+	0x004d4d4d,0x00c3c3c3,0x00727272,0x00959595,
+	0x00ababab,0x008e8e8e,0x00bababa,0x007a7a7a,
+	0x00b3b3b3,0x00020202,0x00b4b4b4,0x00adadad,
+	0x00a2a2a2,0x00acacac,0x00d8d8d8,0x009a9a9a,
+	0x00171717,0x001a1a1a,0x00353535,0x00cccccc,
+	0x00f7f7f7,0x00999999,0x00616161,0x005a5a5a,
+	0x00e8e8e8,0x00242424,0x00565656,0x00404040,
+	0x00e1e1e1,0x00636363,0x00090909,0x00333333,
+	0x00bfbfbf,0x00989898,0x00979797,0x00858585,
+	0x00686868,0x00fcfcfc,0x00ececec,0x000a0a0a,
+	0x00dadada,0x006f6f6f,0x00535353,0x00626262,
+	0x00a3a3a3,0x002e2e2e,0x00080808,0x00afafaf,
+	0x00282828,0x00b0b0b0,0x00747474,0x00c2c2c2,
+	0x00bdbdbd,0x00363636,0x00222222,0x00383838,
+	0x00646464,0x001e1e1e,0x00393939,0x002c2c2c,
+	0x00a6a6a6,0x00303030,0x00e5e5e5,0x00444444,
+	0x00fdfdfd,0x00888888,0x009f9f9f,0x00656565,
+	0x00878787,0x006b6b6b,0x00f4f4f4,0x00232323,
+	0x00484848,0x00101010,0x00d1d1d1,0x00515151,
+	0x00c0c0c0,0x00f9f9f9,0x00d2d2d2,0x00a0a0a0,
+	0x00555555,0x00a1a1a1,0x00414141,0x00fafafa,
+	0x00434343,0x00131313,0x00c4c4c4,0x002f2f2f,
+	0x00a8a8a8,0x00b6b6b6,0x003c3c3c,0x002b2b2b,
+	0x00c1c1c1,0x00ffffff,0x00c8c8c8,0x00a5a5a5,
+	0x00202020,0x00898989,0x00000000,0x00909090,
+	0x00474747,0x00efefef,0x00eaeaea,0x00b7b7b7,
+	0x00151515,0x00060606,0x00cdcdcd,0x00b5b5b5,
+	0x00121212,0x007e7e7e,0x00bbbbbb,0x00292929,
+	0x000f0f0f,0x00b8b8b8,0x00070707,0x00040404,
+	0x009b9b9b,0x00949494,0x00212121,0x00666666,
+	0x00e6e6e6,0x00cecece,0x00ededed,0x00e7e7e7,
+	0x003b3b3b,0x00fefefe,0x007f7f7f,0x00c5c5c5,
+	0x00a4a4a4,0x00373737,0x00b1b1b1,0x004c4c4c,
+	0x00919191,0x006e6e6e,0x008d8d8d,0x00767676,
+	0x00030303,0x002d2d2d,0x00dedede,0x00969696,
+	0x00262626,0x007d7d7d,0x00c6c6c6,0x005c5c5c,
+	0x00d3d3d3,0x00f2f2f2,0x004f4f4f,0x00191919,
+	0x003f3f3f,0x00dcdcdc,0x00797979,0x001d1d1d,
+	0x00525252,0x00ebebeb,0x00f3f3f3,0x006d6d6d,
+	0x005e5e5e,0x00fbfbfb,0x00696969,0x00b2b2b2,
+	0x00f0f0f0,0x00313131,0x000c0c0c,0x00d4d4d4,
+	0x00cfcfcf,0x008c8c8c,0x00e2e2e2,0x00757575,
+	0x00a9a9a9,0x004a4a4a,0x00575757,0x00848484,
+	0x00111111,0x00454545,0x001b1b1b,0x00f5f5f5,
+	0x00e4e4e4,0x000e0e0e,0x00737373,0x00aaaaaa,
+	0x00f1f1f1,0x00dddddd,0x00595959,0x00141414,
+	0x006c6c6c,0x00929292,0x00545454,0x00d0d0d0,
+	0x00787878,0x00707070,0x00e3e3e3,0x00494949,
+	0x00808080,0x00505050,0x00a7a7a7,0x00f6f6f6,
+	0x00777777,0x00939393,0x00868686,0x00838383,
+	0x002a2a2a,0x00c7c7c7,0x005b5b5b,0x00e9e9e9,
+	0x00eeeeee,0x008f8f8f,0x00010101,0x003d3d3d,
+};
+
+static const u32 camellia_sp3033[256] = {
+	0x38003838,0x41004141,0x16001616,0x76007676,
+	0xd900d9d9,0x93009393,0x60006060,0xf200f2f2,
+	0x72007272,0xc200c2c2,0xab00abab,0x9a009a9a,
+	0x75007575,0x06000606,0x57005757,0xa000a0a0,
+	0x91009191,0xf700f7f7,0xb500b5b5,0xc900c9c9,
+	0xa200a2a2,0x8c008c8c,0xd200d2d2,0x90009090,
+	0xf600f6f6,0x07000707,0xa700a7a7,0x27002727,
+	0x8e008e8e,0xb200b2b2,0x49004949,0xde00dede,
+	0x43004343,0x5c005c5c,0xd700d7d7,0xc700c7c7,
+	0x3e003e3e,0xf500f5f5,0x8f008f8f,0x67006767,
+	0x1f001f1f,0x18001818,0x6e006e6e,0xaf00afaf,
+	0x2f002f2f,0xe200e2e2,0x85008585,0x0d000d0d,
+	0x53005353,0xf000f0f0,0x9c009c9c,0x65006565,
+	0xea00eaea,0xa300a3a3,0xae00aeae,0x9e009e9e,
+	0xec00ecec,0x80008080,0x2d002d2d,0x6b006b6b,
+	0xa800a8a8,0x2b002b2b,0x36003636,0xa600a6a6,
+	0xc500c5c5,0x86008686,0x4d004d4d,0x33003333,
+	0xfd00fdfd,0x66006666,0x58005858,0x96009696,
+	0x3a003a3a,0x09000909,0x95009595,0x10001010,
+	0x78007878,0xd800d8d8,0x42004242,0xcc00cccc,
+	0xef00efef,0x26002626,0xe500e5e5,0x61006161,
+	0x1a001a1a,0x3f003f3f,0x3b003b3b,0x82008282,
+	0xb600b6b6,0xdb00dbdb,0xd400d4d4,0x98009898,
+	0xe800e8e8,0x8b008b8b,0x02000202,0xeb00ebeb,
+	0x0a000a0a,0x2c002c2c,0x1d001d1d,0xb000b0b0,
+	0x6f006f6f,0x8d008d8d,0x88008888,0x0e000e0e,
+	0x19001919,0x87008787,0x4e004e4e,0x0b000b0b,
+	0xa900a9a9,0x0c000c0c,0x79007979,0x11001111,
+	0x7f007f7f,0x22002222,0xe700e7e7,0x59005959,
+	0xe100e1e1,0xda00dada,0x3d003d3d,0xc800c8c8,
+	0x12001212,0x04000404,0x74007474,0x54005454,
+	0x30003030,0x7e007e7e,0xb400b4b4,0x28002828,
+	0x55005555,0x68006868,0x50005050,0xbe00bebe,
+	0xd000d0d0,0xc400c4c4,0x31003131,0xcb00cbcb,
+	0x2a002a2a,0xad00adad,0x0f000f0f,0xca00caca,
+	0x70007070,0xff00ffff,0x32003232,0x69006969,
+	0x08000808,0x62006262,0x00000000,0x24002424,
+	0xd100d1d1,0xfb00fbfb,0xba00baba,0xed00eded,
+	0x45004545,0x81008181,0x73007373,0x6d006d6d,
+	0x84008484,0x9f009f9f,0xee00eeee,0x4a004a4a,
+	0xc300c3c3,0x2e002e2e,0xc100c1c1,0x01000101,
+	0xe600e6e6,0x25002525,0x48004848,0x99009999,
+	0xb900b9b9,0xb300b3b3,0x7b007b7b,0xf900f9f9,
+	0xce00cece,0xbf00bfbf,0xdf00dfdf,0x71007171,
+	0x29002929,0xcd00cdcd,0x6c006c6c,0x13001313,
+	0x64006464,0x9b009b9b,0x63006363,0x9d009d9d,
+	0xc000c0c0,0x4b004b4b,0xb700b7b7,0xa500a5a5,
+	0x89008989,0x5f005f5f,0xb100b1b1,0x17001717,
+	0xf400f4f4,0xbc00bcbc,0xd300d3d3,0x46004646,
+	0xcf00cfcf,0x37003737,0x5e005e5e,0x47004747,
+	0x94009494,0xfa00fafa,0xfc00fcfc,0x5b005b5b,
+	0x97009797,0xfe00fefe,0x5a005a5a,0xac00acac,
+	0x3c003c3c,0x4c004c4c,0x03000303,0x35003535,
+	0xf300f3f3,0x23002323,0xb800b8b8,0x5d005d5d,
+	0x6a006a6a,0x92009292,0xd500d5d5,0x21002121,
+	0x44004444,0x51005151,0xc600c6c6,0x7d007d7d,
+	0x39003939,0x83008383,0xdc00dcdc,0xaa00aaaa,
+	0x7c007c7c,0x77007777,0x56005656,0x05000505,
+	0x1b001b1b,0xa400a4a4,0x15001515,0x34003434,
+	0x1e001e1e,0x1c001c1c,0xf800f8f8,0x52005252,
+	0x20002020,0x14001414,0xe900e9e9,0xbd00bdbd,
+	0xdd00dddd,0xe400e4e4,0xa100a1a1,0xe000e0e0,
+	0x8a008a8a,0xf100f1f1,0xd600d6d6,0x7a007a7a,
+	0xbb00bbbb,0xe300e3e3,0x40004040,0x4f004f4f,
+};
+
+static const u32 camellia_sp4404[256] = {
+	0x70700070,0x2c2c002c,0xb3b300b3,0xc0c000c0,
+	0xe4e400e4,0x57570057,0xeaea00ea,0xaeae00ae,
+	0x23230023,0x6b6b006b,0x45450045,0xa5a500a5,
+	0xeded00ed,0x4f4f004f,0x1d1d001d,0x92920092,
+	0x86860086,0xafaf00af,0x7c7c007c,0x1f1f001f,
+	0x3e3e003e,0xdcdc00dc,0x5e5e005e,0x0b0b000b,
+	0xa6a600a6,0x39390039,0xd5d500d5,0x5d5d005d,
+	0xd9d900d9,0x5a5a005a,0x51510051,0x6c6c006c,
+	0x8b8b008b,0x9a9a009a,0xfbfb00fb,0xb0b000b0,
+	0x74740074,0x2b2b002b,0xf0f000f0,0x84840084,
+	0xdfdf00df,0xcbcb00cb,0x34340034,0x76760076,
+	0x6d6d006d,0xa9a900a9,0xd1d100d1,0x04040004,
+	0x14140014,0x3a3a003a,0xdede00de,0x11110011,
+	0x32320032,0x9c9c009c,0x53530053,0xf2f200f2,
+	0xfefe00fe,0xcfcf00cf,0xc3c300c3,0x7a7a007a,
+	0x24240024,0xe8e800e8,0x60600060,0x69690069,
+	0xaaaa00aa,0xa0a000a0,0xa1a100a1,0x62620062,
+	0x54540054,0x1e1e001e,0xe0e000e0,0x64640064,
+	0x10100010,0x00000000,0xa3a300a3,0x75750075,
+	0x8a8a008a,0xe6e600e6,0x09090009,0xdddd00dd,
+	0x87870087,0x83830083,0xcdcd00cd,0x90900090,
+	0x73730073,0xf6f600f6,0x9d9d009d,0xbfbf00bf,
+	0x52520052,0xd8d800d8,0xc8c800c8,0xc6c600c6,
+	0x81810081,0x6f6f006f,0x13130013,0x63630063,
+	0xe9e900e9,0xa7a700a7,0x9f9f009f,0xbcbc00bc,
+	0x29290029,0xf9f900f9,0x2f2f002f,0xb4b400b4,
+	0x78780078,0x06060006,0xe7e700e7,0x71710071,
+	0xd4d400d4,0xabab00ab,0x88880088,0x8d8d008d,
+	0x72720072,0xb9b900b9,0xf8f800f8,0xacac00ac,
+	0x36360036,0x2a2a002a,0x3c3c003c,0xf1f100f1,
+	0x40400040,0xd3d300d3,0xbbbb00bb,0x43430043,
+	0x15150015,0xadad00ad,0x77770077,0x80800080,
+	0x82820082,0xecec00ec,0x27270027,0xe5e500e5,
+	0x85850085,0x35350035,0x0c0c000c,0x41410041,
+	0xefef00ef,0x93930093,0x19190019,0x21210021,
+	0x0e0e000e,0x4e4e004e,0x65650065,0xbdbd00bd,
+	0xb8b800b8,0x8f8f008f,0xebeb00eb,0xcece00ce,
+	0x30300030,0x5f5f005f,0xc5c500c5,0x1a1a001a,
+	0xe1e100e1,0xcaca00ca,0x47470047,0x3d3d003d,
+	0x01010001,0xd6d600d6,0x56560056,0x4d4d004d,
+	0x0d0d000d,0x66660066,0xcccc00cc,0x2d2d002d,
+	0x12120012,0x20200020,0xb1b100b1,0x99990099,
+	0x4c4c004c,0xc2c200c2,0x7e7e007e,0x05050005,
+	0xb7b700b7,0x31310031,0x17170017,0xd7d700d7,
+	0x58580058,0x61610061,0x1b1b001b,0x1c1c001c,
+	0x0f0f000f,0x16160016,0x18180018,0x22220022,
+	0x44440044,0xb2b200b2,0xb5b500b5,0x91910091,
+	0x08080008,0xa8a800a8,0xfcfc00fc,0x50500050,
+	0xd0d000d0,0x7d7d007d,0x89890089,0x97970097,
+	0x5b5b005b,0x95950095,0xffff00ff,0xd2d200d2,
+	0xc4c400c4,0x48480048,0xf7f700f7,0xdbdb00db,
+	0x03030003,0xdada00da,0x3f3f003f,0x94940094,
+	0x5c5c005c,0x02020002,0x4a4a004a,0x33330033,
+	0x67670067,0xf3f300f3,0x7f7f007f,0xe2e200e2,
+	0x9b9b009b,0x26260026,0x37370037,0x3b3b003b,
+	0x96960096,0x4b4b004b,0xbebe00be,0x2e2e002e,
+	0x79790079,0x8c8c008c,0x6e6e006e,0x8e8e008e,
+	0xf5f500f5,0xb6b600b6,0xfdfd00fd,0x59590059,
+	0x98980098,0x6a6a006a,0x46460046,0xbaba00ba,
+	0x25250025,0x42420042,0xa2a200a2,0xfafa00fa,
+	0x07070007,0x55550055,0xeeee00ee,0x0a0a000a,
+	0x49490049,0x68680068,0x38380038,0xa4a400a4,
+	0x28280028,0x7b7b007b,0xc9c900c9,0xc1c100c1,
+	0xe3e300e3,0xf4f400f4,0xc7c700c7,0x9e9e009e,
+};
+
+
+#define CAMELLIA_MIN_KEY_SIZE        16
+#define CAMELLIA_MAX_KEY_SIZE        32
+#define CAMELLIA_BLOCK_SIZE          16
+#define CAMELLIA_TABLE_BYTE_LEN     272
+
+
+/* key constants */
+
+#define CAMELLIA_SIGMA1 (0xA09E667F3BCC908B)
+#define CAMELLIA_SIGMA2 (0xB67AE8584CAA73B2)
+#define CAMELLIA_SIGMA3 (0xC6EF372FE94F82BE)
+#define CAMELLIA_SIGMA4 (0x54FF53A5F1D36F1C)
+#define CAMELLIA_SIGMA5 (0x10E527FADE682D1D)
+#define CAMELLIA_SIGMA6 (0xB05688C2B3E6C1FD)
+
+/*
+ *  macros
+ */
+#define GETU64(v, pt) \
+    do { \
+	/* latest breed of gcc is clever enough to use move */ \
+	memcpy(&(v), (pt), 8); \
+	(v) = be64_to_cpu(v); \
+    } while(0)
+
+/* rotate right by 8 bits (one byte) */
+#define ROR8(x) (((x) >> 8) + ((x) << (sizeof(x)*8 - 8)))
+/* rotate left by 1 bit */
+#define ROL1(x) (((x) << 1) + ((x) >> (sizeof(x)*8 - 1)))
+/* rotate left by 8 bits (one byte) */
+#define ROL8(x) (((x) << 8) + ((x) >> (sizeof(x)*8 - 8)))
+
+#define ROLDQ(l, r, w, bits)				\
+    do {						\
+	w = l;						\
+	l = (l << bits) + (r >> (64 - bits));		\
+	r = (r << bits) + (w >> (64 - bits));		\
+    } while(0)
+
+/*
+ * NB: L and R below stand for 'left' and 'right' as in written numbers.
+ * That is, in (xxxL,xxxR) pair xxxL holds most significant digits,
+ * _not_ least significant ones!
+ */
+
+
+/*
+ * Key setup
+ */
+#define CAMELLIA_F(x, k, y, i)					\
+    do {							\
+	u32 yl, yr;						\
+	i = x ^ k;						\
+	yl = camellia_sp1110[(u8)i]				\
+	   ^ camellia_sp0222[(u8)(i >> 24)]			\
+	   ^ camellia_sp3033[(u8)(i >> 16)]			\
+	   ^ camellia_sp4404[(u8)(i >> 8)];			\
+	yr = camellia_sp1110[    (i >> 56)]			\
+	   ^ camellia_sp0222[(u8)(i >> 48)]			\
+	   ^ camellia_sp3033[(u8)(i >> 40)]			\
+	   ^ camellia_sp4404[(u8)(i >> 32)];			\
+	yl ^= yr;						\
+	yr = ROR8(yr);						\
+	yr ^= yl;						\
+	y = ((u64)yl << 32) + yr;				\
+    } while(0)
+
+#define SUBKEY(INDEX) (subkey[(INDEX)])
+
+#ifdef __BIG_ENDIAN
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#else
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2])
+#endif
+
+static void camellia_setup_tail(u64 *subkey, int max)
+{
+	u32 dw;
+	int i = 2;
+	do {
+		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
+		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
+		dw = SUBKEY_L(i + 1) ^ SUBKEY_R(i + 1); dw = ROL8(dw);/* round 2 */
+		SUBKEY_R(i + 1) = SUBKEY_L(i + 1) ^ dw; SUBKEY_L(i + 1) = dw;
+		dw = SUBKEY_L(i + 2) ^ SUBKEY_R(i + 2); dw = ROL8(dw);/* round 3 */
+		SUBKEY_R(i + 2) = SUBKEY_L(i + 2) ^ dw; SUBKEY_L(i + 2) = dw;
+		dw = SUBKEY_L(i + 3) ^ SUBKEY_R(i + 3); dw = ROL8(dw);/* round 4 */
+		SUBKEY_R(i + 3) = SUBKEY_L(i + 3) ^ dw; SUBKEY_L(i + 3) = dw;
+		dw = SUBKEY_L(i + 4) ^ SUBKEY_R(i + 4); dw = ROL8(dw);/* round 5 */
+		SUBKEY_R(i + 4) = SUBKEY_L(i + 4) ^ dw; SUBKEY_L(i + 4) = dw;
+		dw = SUBKEY_L(i + 5) ^ SUBKEY_R(i + 5); dw = ROL8(dw);/* round 6 */
+		SUBKEY_R(i + 5) = SUBKEY_L(i + 5) ^ dw; SUBKEY_L(i + 5) = dw;
+		i += 8;
+	} while (i < max);
+}
+
+#ifdef __BIG_ENDIAN
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#else
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
+#endif
+
+static void camellia_setup128(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;
+	u64 i, t, w;
+	u64 kw4;
+	u32 dw;
+	u64 sub[26];
+
+	/**
+	 *  k == kl || kr (|| is concatenation)
+	 */
+	GETU64(kl, key     );
+	GETU64(kr, key +  8);
+
+	/**
+	 * generate KL dependent subkeys
+	 */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	/* rotation left shift 15bit */
+	ROLDQ(kl, kr, w, 15);
+	/* k3 */
+	sub[4] = kl;
+	/* k4 */
+	sub[5] = kr;
+	/* rotation left shift 15+30bit */
+	ROLDQ(kl, kr, w, 30);
+	/* k7 */
+	sub[10] = kl;
+	/* k8 */
+	sub[11] = kr;
+	/* rotation left shift 15+30+15bit */
+	ROLDQ(kl, kr, w, 15);
+	/* k10 */
+	sub[13] = kr;
+	/* rotation left shift 15+30+15+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	/* rotation left shift 15+30+15+17+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* k13 */
+	sub[18] = kl;
+	/* k14 */
+	sub[19] = kr;
+	/* rotation left shift 15+30+15+17+17+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+
+	/* generate KA */
+	kl = sub[0];
+	kr = sub[1];
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	/* current status == (kl, w) */
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KA dependent subkeys */
+	/* k1, k2 */
+	sub[2] = kl;
+	sub[3] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k5,k6 */
+	sub[6] = kl;
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl1, kl2 */
+	sub[8] = kl;
+	sub[9] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k9 */
+	sub[12] = kl;
+	ROLDQ(kl, kr, w, 15);
+	/* k11, k12 */
+	sub[14] = kl;
+	sub[15] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k15, k16 */
+	sub[20] = kl;
+	sub[21] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* kw3, kw4 */
+	sub[24] = kl;
+	sub[25] = kr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	/* kw3 */
+	sub[24] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[25];
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32; //kw4l ^= kw4r & ~subR(16);
+	dw = (u32)(kw4 >> 32) & subL(16); // kw4l & subL[16],
+	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32; //kw4l ^= kw4r & ~subR[8];
+	dw = (u32)(kw4 >> 32) & subL(8);
+	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t; /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	SUBKEY(23) = sub[22];     /* round 18 */
+	SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 24);
+}
+
+static void camellia_setup256(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;        /* left half of key */
+	u64 krl, krr;      /* right half of key */
+	u64 i, t, w;       /* temporary variables */
+	u64 kw4;
+	u32 dw;
+	u64 sub[34];
+
+	/**
+	 *  key = (kl || kr || krl || krr)
+	 *  (|| is concatenation)
+	 */
+	GETU64(kl,  key     );
+	GETU64(kr,  key +  8);
+	GETU64(krl, key + 16);
+	GETU64(krr, key + 24);
+
+	/* generate KL dependent subkeys */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	ROLDQ(kl, kr, w, 45);
+	/* k9 */
+	sub[12] = kl;
+	/* k10 */
+	sub[13] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k23 */
+	sub[30] = kl;
+	/* k24 */
+	sub[31] = kr;
+
+	/* generate KR dependent subkeys */
+	ROLDQ(krl, krr, w, 15);
+	/* k3 */
+	sub[4] = krl;
+	/* k4 */
+	sub[5] = krr;
+	ROLDQ(krl, krr, w, 15);
+	/* kl1 */
+	sub[8] = krl;
+	/* kl2 */
+	sub[9] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k13 */
+	sub[18] = krl;
+	/* k14 */
+	sub[19] = krr;
+	ROLDQ(krl, krr, w, 34);
+	/* k19 */
+	sub[26] = krl;
+	/* k20 */
+	sub[27] = krr;
+	ROLDQ(krl, krr, w, 34);
+
+	/* generate KA */
+	kl = sub[0] ^ krl;
+	kr = sub[1] ^ krr;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	kl ^= krl;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w ^ krr;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KB */
+	krl ^= kl;
+	krr ^= kr;
+	CAMELLIA_F(krl, CAMELLIA_SIGMA5, w, i);
+	krr ^= w;
+	CAMELLIA_F(krr, CAMELLIA_SIGMA6, w, i);
+	krl ^= w;
+
+	/* generate KA dependent subkeys */
+	ROLDQ(kl, kr, w, 15);
+	/* k5 */
+	sub[6] = kl;
+	/* k6 */
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 30);
+	/* k11 */
+	sub[14] = kl;
+	/* k12 */
+	sub[15] = kr;
+	/* kl5 */
+	ROLDQ(kl, kr, w, 32);
+	sub[24] = kl;
+	/* kl6 */
+	sub[25] = kr;
+	/* rotation left shift 49 from k11,k12 -> k21,k22 */
+	ROLDQ(kl, kr, w, (49 - 32));
+	/* k21 */
+	sub[28] = kl;
+	/* k22 */
+	sub[29] = kr;
+
+	/* generate KB dependent subkeys */
+	/* k1 */
+	sub[2] = krl;
+	/* k2 */
+	sub[3] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k7 */
+	sub[10] = krl;
+	/* k8 */
+	sub[11] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k15 */
+	sub[20] = krl;
+	/* k16 */
+	sub[21] = krr;
+	ROLDQ(krl, krr, w, 51);
+	/* kw3 */
+	sub[32] = krl;
+	/* kw4 */
+	sub[33] = krr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(25);
+	dw = subL(1) & subL(25),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl6) */
+	/* round 20 */
+	sub[27] ^= sub[1];
+	/* round 22 */
+	sub[29] ^= sub[1];
+	/* round 24 */
+	sub[31] ^= sub[1];
+	/* kw3 */
+	sub[32] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[33];
+	/* round 23 */
+	sub[30] ^= kw4;
+	/* round 21 */
+	sub[28] ^= kw4;
+	/* round 19 */
+	sub[26] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(24)) << 32; //kw4l ^= kw4r & ~subR[24];
+	dw = (u32)(kw4 >> 32) & subL(24),
+		kw4 ^= ROL1(dw); /* modified for FL(kl5) */
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(16),
+		kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(8),
+		kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); //tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	t = subL(26) ^ (subR(26) & ~subR(24));
+	dw = (u32)t & subL(24); /* FL(kl5) */
+	t = (t << 32) | (subR(26) ^ ROL1(dw));
+	SUBKEY(23) = sub[22] ^ t; /* round 18 */
+	SUBKEY(24) = sub[24];     /* FL(kl5) */
+	SUBKEY(25) = sub[25];     /* FLinv(kl6) */
+	t = subL(23) ^ (subR(23) & ~subR(25));
+	dw = (u32)t & subL(25); /* FLinv(kl6) */
+	t = (t << 32) | (subR(23) ^ ROL1(dw));
+	SUBKEY(26) = t ^ sub[27]; /* round 19 */
+	SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
+	SUBKEY(28) = sub[27] ^ sub[29]; /* round 21 */
+	SUBKEY(29) = sub[28] ^ sub[30]; /* round 22 */
+	SUBKEY(30) = sub[29] ^ sub[31]; /* round 23 */
+	SUBKEY(31) = sub[30];     /* round 24 */
+	SUBKEY(32) = sub[32] ^ sub[31]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 32);
+}
+
+static void camellia_setup192(const unsigned char *key, u64 *subkey)
+{
+	unsigned char kk[32];
+	u64 krl, krr;
+
+	memcpy(kk, key, 24);
+	memcpy((unsigned char *)&krl, key+16, 8);
+	krr = ~krl;
+	memcpy(kk+24, (unsigned char *)&krr, 8);
+	camellia_setup256(kk, subkey);
+}
+
+
+/*
+ * Encrypt/decrypt
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll & ll;							\
+	t2 = krr | rr;							\
+	rl ^= t2;							\
+	lr ^= ROL1(t0);							\
+	t3 = krl & rl;							\
+	t1 = klr | lr;							\
+	ll ^= t1;							\
+	rr ^= ROL1(t3);							\
+    } while(0)
+
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir)		\
+    do {								\
+	ir =  camellia_sp1110[(u8)xr];					\
+	il =  camellia_sp1110[    (xl >> 24)];				\
+	ir ^= camellia_sp0222[    (xr >> 24)];				\
+	il ^= camellia_sp0222[(u8)(xl >> 16)];				\
+	ir ^= camellia_sp3033[(u8)(xr >> 16)];				\
+	il ^= camellia_sp3033[(u8)(xl >> 8)];				\
+	ir ^= camellia_sp4404[(u8)(xr >> 8)];				\
+	il ^= camellia_sp4404[(u8)xl];					\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= ROR8(il) ^ ir;						\
+    } while(0)
+
+/* max = 24: 128bit encrypt, max = 32: 192/256bit encrypt */
+static void camellia_do_encrypt(const u64 *subkey, u32 *io, unsigned max)
+{
+	u32 il,ir,t0,t1;               /* temporary variables */
+
+	/* pre whitening but absorb kw2 */
+	io[0] ^= SUBKEY_L(0);
+	io[1] ^= SUBKEY_R(0);
+
+	/* main iteration */
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     t0,t1,il,ir); \
+} while (0)
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+	{
+		unsigned i = 0;
+		while (1) {
+			ROUNDS(i);
+			i += 8;
+			if (i >= max)
+				break;
+			FLS(i);
+		}
+	}
+#else
+	ROUNDS(0);
+	FLS(8);
+	ROUNDS(8);
+	FLS(16);
+	ROUNDS(16);
+	if (max == 32) {
+		FLS(24);
+		ROUNDS(24);
+	}
+#endif
+
+#undef ROUNDS
+#undef FLS
+
+	/* post whitening but kw4 */
+	io[2] ^= SUBKEY_L(max);
+	io[3] ^= SUBKEY_R(max);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
+}
+
+static void camellia_do_decrypt(const u64 *subkey, u32 *io, unsigned i)
+{
+	u32 il,ir,t0,t1;               /* temporary variables */
+
+	/* pre whitening but absorb kw2 */
+	io[0] ^= SUBKEY_L(i);
+	io[1] ^= SUBKEY_R(i);
+
+	/* main iteration */
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     t0,t1,il,ir); \
+} while (0)
+
+#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
+	while (1) {
+		i -= 8;
+		ROUNDS(i);
+		if (i == 0)
+			break;
+		FLS(i);
+	}
+#else
+	if (i == 32) {
+		ROUNDS(24);
+		FLS(24);
+	}
+	ROUNDS(16);
+	FLS(16);
+	ROUNDS(8);
+	FLS(8);
+	ROUNDS(0);
+#endif
+
+#undef ROUNDS
+#undef FLS
+
+	/* post whitening but kw4 */
+	io[2] ^= SUBKEY_L(0);
+	io[3] ^= SUBKEY_R(0);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
+}
+
+
+struct camellia_ctx {
+	int key_length;
+	u64 key_table[CAMELLIA_TABLE_BYTE_LEN / 8];
+};
+
+static int
+camellia_set_key(struct crypto_tfm *tfm, const u8 *in_key,
+		 unsigned int key_len)
+{
+	struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const unsigned char *key = (const unsigned char *)in_key;
+	u32 *flags = &tfm->crt_flags;
+
+	if (key_len != 16 && key_len != 24 && key_len != 32) {
+		*flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+		return -EINVAL;
+	}
+
+	cctx->key_length = key_len;
+
+	switch (key_len) {
+	case 16:
+		camellia_setup128(key, cctx->key_table);
+		break;
+	case 24:
+		camellia_setup192(key, cctx->key_table);
+		break;
+	case 32:
+		camellia_setup256(key, cctx->key_table);
+		break;
+	}
+
+	return 0;
+}
+
+static void camellia_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const __be32 *src = (const __be32 *)in;
+	__be32 *dst = (__be32 *)out;
+
+	u32 tmp[4];
+
+	tmp[0] = be32_to_cpu(src[0]);
+	tmp[1] = be32_to_cpu(src[1]);
+	tmp[2] = be32_to_cpu(src[2]);
+	tmp[3] = be32_to_cpu(src[3]);
+
+	camellia_do_encrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_encrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
+}
+
+static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const __be32 *src = (const __be32 *)in;
+	__be32 *dst = (__be32 *)out;
+
+	u32 tmp[4];
+
+	tmp[0] = be32_to_cpu(src[0]);
+	tmp[1] = be32_to_cpu(src[1]);
+	tmp[2] = be32_to_cpu(src[2]);
+	tmp[3] = be32_to_cpu(src[3]);
+
+	camellia_do_decrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_decrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
+}
+
+static struct crypto_alg camellia_alg = {
+	.cra_name		=	"camellia",
+	.cra_driver_name	=	"camellia-generic",
+	.cra_priority		=	100,
+	.cra_flags		=	CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		=	CAMELLIA_BLOCK_SIZE,
+	.cra_ctxsize		=	sizeof(struct camellia_ctx),
+	.cra_alignmask		=	3,
+	.cra_module		=	THIS_MODULE,
+	.cra_list		=	LIST_HEAD_INIT(camellia_alg.cra_list),
+	.cra_u			=	{
+		.cipher = {
+			.cia_min_keysize	=	CAMELLIA_MIN_KEY_SIZE,
+			.cia_max_keysize	=	CAMELLIA_MAX_KEY_SIZE,
+			.cia_setkey		=	camellia_set_key,
+			.cia_encrypt		=	camellia_encrypt,
+			.cia_decrypt		=	camellia_decrypt
+		}
+	}
+};
+
+static int __init camellia_init(void)
+{
+	return crypto_register_alg(&camellia_alg);
+}
+
+static void __exit camellia_fini(void)
+{
+	crypto_unregister_alg(&camellia_alg);
+}
+
+module_init(camellia_init);
+module_exit(camellia_fini);
+
+MODULE_DESCRIPTION("Camellia Cipher Algorithm");
+MODULE_LICENSE("GPL");

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-13 22:34             ` Denys Vlasenko
@ 2007-11-14  1:41               ` David Miller
  2007-11-14  2:47                 ` Denys Vlasenko
  0 siblings, 1 reply; 40+ messages in thread
From: David Miller @ 2007-11-14  1:41 UTC (permalink / raw)
  To: vda.linux; +Cc: takamiya, herbert, linux-crypto

From: Denys Vlasenko <vda.linux@googlemail.com>
Date: Tue, 13 Nov 2007 15:34:33 -0700

> My preferred solution is to make loop unrolling conditional on
> CONFIG_CC_OPTIMIZE_FOR_SIZE - and this is what is done in my
> (first) patch (see attached). This part:

The default build is going to be CONFIG_CC_OPTIMIZE_FOR_SIZE
basically for everyone, this is what people get by default
and this is what every distribution uses.

Therefore 99.9999% of folks will get the slowdown.

So in my book this is not an acceptable way to deal with
this problem.

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-14  1:41               ` David Miller
@ 2007-11-14  2:47                 ` Denys Vlasenko
  2007-11-14  3:49                   ` David Miller
  2007-11-14  4:18                   ` Noriaki TAKAMIYA
  0 siblings, 2 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-14  2:47 UTC (permalink / raw)
  To: David Miller; +Cc: takamiya, herbert, linux-crypto

On Tuesday 13 November 2007 18:41, David Miller wrote:
> From: Denys Vlasenko <vda.linux@googlemail.com>
> Date: Tue, 13 Nov 2007 15:34:33 -0700
>
> > My preferred solution is to make loop unrolling conditional on
> > CONFIG_CC_OPTIMIZE_FOR_SIZE - and this is what is done in my
> > (first) patch (see attached). This part:
>
> The default build is going to be CONFIG_CC_OPTIMIZE_FOR_SIZE
> basically for everyone, this is what people get by default
> and this is what every distribution uses.
>
> Therefore 99.9999% of folks will get the slowdown.
>
> So in my book this is not an acceptable way to deal with
> this problem.

Loop unrolling here amounts to 25% code growth:

   text    data     bss     dec     hex filename
  21714       0       0   21714    54d2 camellia5.o
  15906       0       0   15906    3e22 camellia5_Os.o

Saving 25% of code size and going 5% slower is a perfectly acceptable
tradeoff for some users. NB: I'm not saying all, but a significant
part of users would like to be able to have this choice.

If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
do you have other ideas?
--
vda

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-14  2:47                 ` Denys Vlasenko
@ 2007-11-14  3:49                   ` David Miller
  2007-11-14  5:30                     ` Denys Vlasenko
  2007-11-14  4:18                   ` Noriaki TAKAMIYA
  1 sibling, 1 reply; 40+ messages in thread
From: David Miller @ 2007-11-14  3:49 UTC (permalink / raw)
  To: vda.linux; +Cc: takamiya, herbert, linux-crypto

From: Denys Vlasenko <vda.linux@googlemail.com>
Date: Tue, 13 Nov 2007 19:47:08 -0700

> If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
> do you have other ideas?

Look at ways to make the code run faster without loop unrolling?

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-14  2:47                 ` Denys Vlasenko
  2007-11-14  3:49                   ` David Miller
@ 2007-11-14  4:18                   ` Noriaki TAKAMIYA
  1 sibling, 0 replies; 40+ messages in thread
From: Noriaki TAKAMIYA @ 2007-11-14  4:18 UTC (permalink / raw)
  To: vda.linux; +Cc: davem, herbert, linux-crypto

Hi,

>> Tue, 13 Nov 2007 19:47:08 -0700
>> [Subject: Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization]
>> Denys Vlasenko <vda.linux@googlemail.com> wrote...

> On Tuesday 13 November 2007 18:41, David Miller wrote:
> > From: Denys Vlasenko <vda.linux@googlemail.com>
> > Date: Tue, 13 Nov 2007 15:34:33 -0700
> >
> > > My preferred solution is to make loop unrolling conditional on
> > > CONFIG_CC_OPTIMIZE_FOR_SIZE - and this is what is done in my
> > > (first) patch (see attached). This part:
> >
> > The default build is going to be CONFIG_CC_OPTIMIZE_FOR_SIZE
> > basically for everyone, this is what people get by default
> > and this is what every distribution uses.
> >
> > Therefore 99.9999% of folks will get the slowdown.
> >
> > So in my book this is not an acceptable way to deal with
> > this problem.
> 
> Loop unrolling here amounts to 25% code growth:
> 
>    text    data     bss     dec     hex filename
>   21714       0       0   21714    54d2 camellia5.o
>   15906       0       0   15906    3e22 camellia5_Os.o
> 
> Saving 25% of code size and going 5% slower is a perfectly acceptable
> tradeoff for some users. NB: I'm not saying all, but a significant
> part of users would like to be able to have this choice.

  IMHO, if you are going to use camellia on an embedded system, code
  size will be important.

  On the other hand, CPU performance is typically limited on embedded
  systems, so code performance will be important too...

  I'm not sure whether a 5% slowdown is important or not. It will
  depend on the system.

  Regards,

--
Noriaki TAKAMIYA

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-14  3:49                   ` David Miller
@ 2007-11-14  5:30                     ` Denys Vlasenko
  2007-11-14  6:10                       ` David Miller
  2007-11-14  7:15                       ` Denys Vlasenko
  0 siblings, 2 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-14  5:30 UTC (permalink / raw)
  To: David Miller; +Cc: takamiya, herbert, linux-crypto

On Tuesday 13 November 2007 20:49, David Miller wrote:
> From: Denys Vlasenko <vda.linux@googlemail.com>
> Date: Tue, 13 Nov 2007 19:47:08 -0700
>
> > If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
> > do you have other ideas?
>
> Look at ways to make the code run faster without loop unrolling?

I did it. I noticed that key setup is mostly operating on 64-bit
quantities, and provided alternative implementation which
exploits that fact. It's smaller and faster.

However, after I've done that, the question still stands:
should I unroll the loop or not?

The situation we are in now is exactly the situation I want to
avoid:

On Wednesday 07 November 2007 06:22, Denys Vlasenko wrote:
> > Having two versions of the code is unmaintainable.  So please
> > either decide that 5% is worth it or isn't.
>
> *I* am happy with 5% speed sacrifice. I'm afraid other people won't be.
>
> I just want to escape vicious cycle of -Os people arguing with
> -O2 people to no end. I don't want somebody to come later
> and unroll the loop again. And then me to come
> and de-unroll it again...
>
> It's better for everybody to recognize that both POVs are valid,
> and to have provisions for tuning the size/speed tradeoff by the user
> (the person who builds the binary).

That's why I made a patch where unrolling can be enabled by CONFIG_xxx.
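A minimal sketch of what such a build-time size/speed switch could look like. It assumes a hypothetical Kconfig symbol `CONFIG_CAMELLIA_UNROLL` (the actual symbol name is not given in this thread) and uses a toy round function in place of the real Camellia round; both branches compute the same value, only code size and speed differ.

```c
#include <assert.h>
#include <stdint.h>

/* Toy round: XOR a subkey, then rotate left by 8 bits.
 * This is NOT Camellia's F-function, only a stand-in. */
static inline uint32_t toy_round(uint32_t x, uint32_t k)
{
	x ^= k;
	return (x << 8) | (x >> 24);
}

/* Six rounds with subkeys k[0..5]. When the hypothetical
 * CONFIG_CAMELLIA_UNROLL is defined, the body is unrolled for speed;
 * otherwise a compact loop is used. */
static uint32_t six_rounds(uint32_t x, const uint32_t k[6])
{
#ifdef CONFIG_CAMELLIA_UNROLL
	x = toy_round(x, k[0]);
	x = toy_round(x, k[1]);
	x = toy_round(x, k[2]);
	x = toy_round(x, k[3]);
	x = toy_round(x, k[4]);
	x = toy_round(x, k[5]);
#else
	int i;

	for (i = 0; i < 6; i++)
		x = toy_round(x, k[i]);
#endif
	return x;
}
```

With all-zero subkeys, six rotations by 8 bits equal one rotation by 16 bits, so `six_rounds(0x01020304, zeros)` yields `0x03040102` in either configuration.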

I will resubmit the patch without de-unrolling.
Meanwhile, I'd like to ask you guys to think about ways
to make size/speed tradeoffs selectable at build time.
--
vda


* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-14  5:30                     ` Denys Vlasenko
@ 2007-11-14  6:10                       ` David Miller
  2007-11-14  7:38                         ` Denys Vlasenko
  2007-11-14  7:15                       ` Denys Vlasenko
  1 sibling, 1 reply; 40+ messages in thread
From: David Miller @ 2007-11-14  6:10 UTC (permalink / raw)
  To: vda.linux; +Cc: takamiya, herbert, linux-crypto

From: Denys Vlasenko <vda.linux@googlemail.com>
Date: Tue, 13 Nov 2007 22:30:47 -0700

> On Tuesday 13 November 2007 20:49, David Miller wrote:
> > From: Denys Vlasenko <vda.linux@googlemail.com>
> > Date: Tue, 13 Nov 2007 19:47:08 -0700
> >
> > > If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
> > > do you have other ideas?
> >
> > Look at ways to make the code run faster without loop unrolling?
> 
> I did it. I noticed that key setup is mostly operating on 64-bit
> quantities, and provided alternative implementation which
> exploits that fact. It's smaller and faster.

Great, then you don't have to unroll the loop and performance
is at least as good as before _and_ you save code space.

It's perfect, you don't need compile time checks or anything
silly like that.

Please submit this new version :-)


* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-14  5:30                     ` Denys Vlasenko
  2007-11-14  6:10                       ` David Miller
@ 2007-11-14  7:15                       ` Denys Vlasenko
  2007-11-14 14:14                         ` Herbert Xu
  1 sibling, 1 reply; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-14  7:15 UTC (permalink / raw)
  To: David Miller, takamiya, herbert; +Cc: linux-crypto

[-- Attachment #1: Type: text/plain, Size: 977 bytes --]

On Tuesday 13 November 2007 22:30, Denys Vlasenko wrote:
> I will resubmit the patch without de-unrolling.
> Meanwhile, I'd like to ask you guys to think about ways
> to make size/speed tradeoffs selectable at build time.

Here is the patch which has loops still unrolled,
but otherwise unchanged.

Description:

    Use alternative key setup implementation with mostly 64-bit ops
    if BITS_PER_LONG >= 64. Both much smaller and much faster.

    Unify camellia_en/decrypt128/256 into camellia_do_en/decrypt.
    The code was similar; with just one additional if() we can use the same code.
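A minimal sketch (index bookkeeping only, no cipher arithmetic) of why a single `max` argument is enough for the unified routines: it records the order in which camellia_do_encrypt() and camellia_do_decrypt() in the patch below read SUBKEY_L/SUBKEY_R pairs. The helper names `enc_schedule` and `dec_schedule` are hypothetical. The model shows that the 256-bit schedule is the 128-bit one plus one more FL layer and six rounds, that decryption reads exactly the encryption order reversed, and that index 1 is never read by these routines.

```c
#include <assert.h>

/* max is 24 for 128-bit keys, 32 for 192/256-bit keys.
 * Writes subkey-pair indices in the order they are used; returns count. */
static int enc_schedule(unsigned max, unsigned *out)
{
	int n = 0;
	unsigned i, r;

	assert(max == 24 || max == 32);
	out[n++] = 0;                   /* pre-whitening */
	for (i = 0; i < max; i += 8) {
		if (i) {                /* FLS(i) between round groups */
			out[n++] = i;
			out[n++] = i + 1;
		}
		for (r = 2; r <= 7; r++) /* ROUNDS(i): six Feistel rounds */
			out[n++] = i + r;
	}
	out[n++] = max;                 /* post-whitening */
	return n;
}

static int dec_schedule(unsigned max, unsigned *out)
{
	int n = 0;
	int i, r;

	assert(max == 24 || max == 32);
	out[n++] = max;                 /* pre-whitening */
	for (i = (int)max - 8; i >= 0; i -= 8) {
		for (r = 7; r >= 2; r--) /* ROUNDS(i), subkeys in reverse */
			out[n++] = (unsigned)(i + r);
		if (i) {                /* FLS(i) with its two pairs swapped */
			out[n++] = (unsigned)(i + 1);
			out[n++] = (unsigned)i;
		}
	}
	out[n++] = 0;                   /* post-whitening */
	return n;
}
```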

    Replace (x & 0xff) with (u8)x; gcc is not smart enough to realize
    that it can do (x & 0xff) this way (which is smaller, at least on i386).

    Don't do (x & 0xff) in a few places where x cannot be > 255 anyway:
        t1 = ir >> 16; v = camellia_sp0222[(t1 >> 8) & 0xff];
    ir is u32, thus (t1 >> 8) is one byte!
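A minimal sketch of the two facts the description relies on, using local `u8`/`u32` typedefs as stand-ins for the kernel types: for an unsigned 32-bit `x`, `(u8)x` and `(x & 0xff)` yield the same value (conversion to an unsigned type is defined modulo 2^8 here), and `(x >> 16) >> 8` is just `x >> 24`, which can never exceed 255, so it indexes a 256-entry table safely without a mask. Whether the cast form produces smaller code is the patch author's observation for gcc on i386; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;     /* stand-ins for the kernel's u8/u32 */
typedef uint32_t u32;

/* Both forms extract the low byte of x. */
static unsigned low_byte_mask(u32 x) { return x & 0xff; }
static unsigned low_byte_cast(u32 x) { return (u8)x; }

/* Mirrors t1 = ir >> 16; index = t1 >> 8: the result equals x >> 24,
 * which is at most 255 for a 32-bit x, so no & 0xff is needed. */
static unsigned top_byte(u32 x)
{
	u32 t = x >> 16;

	return t >> 8;
}
```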

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: linux-2.6.23.1.camellia5.diff --]
[-- Type: text/x-diff, Size: 55383 bytes --]

diff -urpN linux-2.6.23.1.camellia/crypto/camellia.c linux-2.6.23.1.camellia5/crypto/camellia.c
--- linux-2.6.23.1.camellia/crypto/camellia.c	2007-11-13 22:47:28.000000000 -0700
+++ linux-2.6.23.1.camellia5/crypto/camellia.c	2007-11-13 22:57:54.000000000 -0700
@@ -36,6 +36,13 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 
+#if BITS_PER_LONG >= 64
+
+/* Use alternative implementation with mostly 64-bit ops */
+#include "camellia_64.c"
+
+#else
+
 static const u32 camellia_sp1110[256] = {
 	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
 	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
@@ -329,7 +336,6 @@ static const u32 camellia_sp4404[256] = 
 /*
  *  macros
  */
-
 # define GETU32(v, pt) \
     do { \
 	/* latest breed of gcc is clever enough to use move */ \
@@ -364,63 +370,28 @@ static const u32 camellia_sp4404[256] = 
     } while(0)
 
 
+/*
+ * Key setup
+ */
 #define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
     do {							\
 	il = xl ^ kl;						\
 	ir = xr ^ kr;						\
 	t0 = il >> 16;						\
 	t1 = ir >> 16;						\
-	yl = camellia_sp1110[ir & 0xff]				\
-	   ^ camellia_sp0222[(t1 >> 8) & 0xff]			\
-	   ^ camellia_sp3033[t1 & 0xff]				\
-	   ^ camellia_sp4404[(ir >> 8) & 0xff];			\
-	yr = camellia_sp1110[(t0 >> 8) & 0xff]			\
-	   ^ camellia_sp0222[t0 & 0xff]				\
-	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
-	   ^ camellia_sp4404[il & 0xff];			\
+	yl = camellia_sp1110[(u8)(ir     )]			\
+	   ^ camellia_sp0222[    (t1 >> 8)]			\
+	   ^ camellia_sp3033[(u8)(t1     )]			\
+	   ^ camellia_sp4404[(u8)(ir >> 8)];			\
+	yr = camellia_sp1110[    (t0 >> 8)]			\
+	   ^ camellia_sp0222[(u8)(t0     )]			\
+	   ^ camellia_sp3033[(u8)(il >> 8)]			\
+	   ^ camellia_sp4404[(u8)(il     )];			\
 	yl ^= yr;						\
 	yr = ROR8(yr);						\
 	yr ^= yl;						\
     } while(0)
 
-
-/*
- * for speed up
- *
- */
-#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
-    do {								\
-	t0 = kll;							\
-	t2 = krr;							\
-	t0 &= ll;							\
-	t2 |= rr;							\
-	rl ^= t2;							\
-	lr ^= ROL1(t0);							\
-	t3 = krl;							\
-	t1 = klr;							\
-	t3 &= rl;							\
-	t1 |= lr;							\
-	ll ^= t1;							\
-	rr ^= ROL1(t3);							\
-    } while(0)
-
-#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-    do {								\
-	ir =  camellia_sp1110[xr & 0xff];				\
-	il =  camellia_sp1110[(xl>>24) & 0xff];				\
-	ir ^= camellia_sp0222[(xr>>24) & 0xff];				\
-	il ^= camellia_sp0222[(xl>>16) & 0xff];				\
-	ir ^= camellia_sp3033[(xr>>16) & 0xff];				\
-	il ^= camellia_sp3033[(xl>>8) & 0xff];				\
-	ir ^= camellia_sp4404[(xr>>8) & 0xff];				\
-	il ^= camellia_sp4404[xl & 0xff];				\
-	il ^= kl;							\
-	ir ^= il ^ kr;							\
-	yl ^= ir;							\
-	yr ^= ROR8(il) ^ ir;						\
-    } while(0)
-
-
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
@@ -622,7 +593,7 @@ static void camellia_setup128(const unsi
 	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
 	SUBKEY_R(6) = subR[5] ^ subR[7];
 	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
+	dw = tl & subL[8];  /* FL(kl1) */
 		tr = subR[10] ^ ROL1(dw);
 	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
 	SUBKEY_R(7) = subR[6] ^ tr;
@@ -1000,400 +971,150 @@ static void camellia_setup192(const unsi
 }
 
 
-static void camellia_encrypt128(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;               /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
-
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
+/*
+ * Encrypt/decrypt
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll;							\
+	t2 = krr;							\
+	t0 &= ll;							\
+	t2 |= rr;							\
+	rl ^= t2;							\
+	lr ^= ROL1(t0);							\
+	t3 = krl;							\
+	t1 = klr;							\
+	t3 &= rl;							\
+	t1 |= lr;							\
+	ll ^= t1;							\
+	rr ^= ROL1(t3);							\
+    } while(0)
 
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(24);
-	io_text[1] = io[3] ^ SUBKEY_R(24);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir)		\
+    do {								\
+	ir =  camellia_sp1110[(u8)xr];					\
+	il =  camellia_sp1110[    (xl >> 24)];				\
+	ir ^= camellia_sp0222[    (xr >> 24)];				\
+	il ^= camellia_sp0222[(u8)(xl >> 16)];				\
+	ir ^= camellia_sp3033[(u8)(xr >> 16)];				\
+	il ^= camellia_sp3033[(u8)(xl >> 8)];				\
+	ir ^= camellia_sp4404[(u8)(xr >> 8)];				\
+	il ^= camellia_sp4404[(u8)xl];					\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= ROR8(il) ^ ir;						\
+    } while(0)
 
-static void camellia_decrypt128(const u32 *subkey, u32 *io_text)
+/* max = 24: 128bit encrypt, max = 32: 256bit encrypt */
+static void camellia_do_encrypt(const u32 *subkey, u32 *io, unsigned max)
 {
 	u32 il,ir,t0,t1;               /* temporary variables */
 
-	u32 io[4];
-
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(24);
-	io[1] = io_text[1] ^ SUBKEY_R(24);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(0);
+	io[1] ^= SUBKEY_R(0);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
-
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
-
-static void camellia_encrypt256(const u32 *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     t0,t1,il,ir); \
+} while (0)
+
+	ROUNDS(0);
+	FLS(8);
+	ROUNDS(8);
+	FLS(16);
+	ROUNDS(16);
+	if (max == 32) {
+		FLS(24);
+		ROUNDS(24);
+	}
 
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[0],io[1],il,ir,t0,t1);
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(32);
-	io_text[1] = io[3] ^ SUBKEY_R(32);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(max);
+	io[3] ^= SUBKEY_R(max);
+	/* NB: io[0],[1] should be swapped with [2],[3] by caller! */
 }
 
-static void camellia_decrypt256(const u32 *subkey, u32 *io_text)
+static void camellia_do_decrypt(const u32 *subkey, u32 *io, unsigned i)
 {
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
+	u32 il,ir,t0,t1;               /* temporary variables */
 
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(32);
-	io[1] = io_text[1] ^ SUBKEY_R(32);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(i);
+	io[1] ^= SUBKEY_R(i);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     t0,t1,il,ir); \
+} while (0)
+
+	if (i == 32) {
+		ROUNDS(24);
+		FLS(24);
+	}
+	ROUNDS(16);
+	FLS(16);
+	ROUNDS(8);
+	FLS(8);
+	ROUNDS(0);
+
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(0);
+	io[3] ^= SUBKEY_R(0);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
 }
 
 
@@ -1445,21 +1166,15 @@ static void camellia_encrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_encrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_encrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_encrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_encrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
@@ -1475,21 +1190,15 @@ static void camellia_decrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_decrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_decrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_decrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_decrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static struct crypto_alg camellia_alg = {
@@ -1528,3 +1237,5 @@ module_exit(camellia_fini);
 
 MODULE_DESCRIPTION("Camellia Cipher Algorithm");
 MODULE_LICENSE("GPL");
+
+#endif /* if BITS_PER_LONG < 64 */
diff -urpN linux-2.6.23.1.camellia/crypto/camellia_64.c linux-2.6.23.1.camellia5/crypto/camellia_64.c
--- linux-2.6.23.1.camellia/crypto/camellia_64.c	1969-12-31 17:00:00.000000000 -0700
+++ linux-2.6.23.1.camellia5/crypto/camellia_64.c	2007-11-13 22:57:16.000000000 -0700
@@ -0,0 +1,1149 @@
+/*
+ * Copyright (C) 2006
+ * NTT (Nippon Telegraph and Telephone Corporation).
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ */
+
+/*
+ * Algorithm Specification
+ *  http://info.isl.ntt.co.jp/crypt/eng/camellia/specifications.html
+ */
+
+/*
+ *
+ * NOTE --- NOTE --- NOTE --- NOTE
+ * This implementation assumes that all memory addresses passed
+ * as parameters are four-byte aligned.
+ *
+ */
+
+/* #included from camellia.c if long is 64bit */
+
+/*
+#include <linux/crypto.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+*/
+
+static const u32 camellia_sp1110[256] = {
+	0x70707000,0x82828200,0x2c2c2c00,0xececec00,
+	0xb3b3b300,0x27272700,0xc0c0c000,0xe5e5e500,
+	0xe4e4e400,0x85858500,0x57575700,0x35353500,
+	0xeaeaea00,0x0c0c0c00,0xaeaeae00,0x41414100,
+	0x23232300,0xefefef00,0x6b6b6b00,0x93939300,
+	0x45454500,0x19191900,0xa5a5a500,0x21212100,
+	0xededed00,0x0e0e0e00,0x4f4f4f00,0x4e4e4e00,
+	0x1d1d1d00,0x65656500,0x92929200,0xbdbdbd00,
+	0x86868600,0xb8b8b800,0xafafaf00,0x8f8f8f00,
+	0x7c7c7c00,0xebebeb00,0x1f1f1f00,0xcecece00,
+	0x3e3e3e00,0x30303000,0xdcdcdc00,0x5f5f5f00,
+	0x5e5e5e00,0xc5c5c500,0x0b0b0b00,0x1a1a1a00,
+	0xa6a6a600,0xe1e1e100,0x39393900,0xcacaca00,
+	0xd5d5d500,0x47474700,0x5d5d5d00,0x3d3d3d00,
+	0xd9d9d900,0x01010100,0x5a5a5a00,0xd6d6d600,
+	0x51515100,0x56565600,0x6c6c6c00,0x4d4d4d00,
+	0x8b8b8b00,0x0d0d0d00,0x9a9a9a00,0x66666600,
+	0xfbfbfb00,0xcccccc00,0xb0b0b000,0x2d2d2d00,
+	0x74747400,0x12121200,0x2b2b2b00,0x20202000,
+	0xf0f0f000,0xb1b1b100,0x84848400,0x99999900,
+	0xdfdfdf00,0x4c4c4c00,0xcbcbcb00,0xc2c2c200,
+	0x34343400,0x7e7e7e00,0x76767600,0x05050500,
+	0x6d6d6d00,0xb7b7b700,0xa9a9a900,0x31313100,
+	0xd1d1d100,0x17171700,0x04040400,0xd7d7d700,
+	0x14141400,0x58585800,0x3a3a3a00,0x61616100,
+	0xdedede00,0x1b1b1b00,0x11111100,0x1c1c1c00,
+	0x32323200,0x0f0f0f00,0x9c9c9c00,0x16161600,
+	0x53535300,0x18181800,0xf2f2f200,0x22222200,
+	0xfefefe00,0x44444400,0xcfcfcf00,0xb2b2b200,
+	0xc3c3c300,0xb5b5b500,0x7a7a7a00,0x91919100,
+	0x24242400,0x08080800,0xe8e8e800,0xa8a8a800,
+	0x60606000,0xfcfcfc00,0x69696900,0x50505000,
+	0xaaaaaa00,0xd0d0d000,0xa0a0a000,0x7d7d7d00,
+	0xa1a1a100,0x89898900,0x62626200,0x97979700,
+	0x54545400,0x5b5b5b00,0x1e1e1e00,0x95959500,
+	0xe0e0e000,0xffffff00,0x64646400,0xd2d2d200,
+	0x10101000,0xc4c4c400,0x00000000,0x48484800,
+	0xa3a3a300,0xf7f7f700,0x75757500,0xdbdbdb00,
+	0x8a8a8a00,0x03030300,0xe6e6e600,0xdadada00,
+	0x09090900,0x3f3f3f00,0xdddddd00,0x94949400,
+	0x87878700,0x5c5c5c00,0x83838300,0x02020200,
+	0xcdcdcd00,0x4a4a4a00,0x90909000,0x33333300,
+	0x73737300,0x67676700,0xf6f6f600,0xf3f3f300,
+	0x9d9d9d00,0x7f7f7f00,0xbfbfbf00,0xe2e2e200,
+	0x52525200,0x9b9b9b00,0xd8d8d800,0x26262600,
+	0xc8c8c800,0x37373700,0xc6c6c600,0x3b3b3b00,
+	0x81818100,0x96969600,0x6f6f6f00,0x4b4b4b00,
+	0x13131300,0xbebebe00,0x63636300,0x2e2e2e00,
+	0xe9e9e900,0x79797900,0xa7a7a700,0x8c8c8c00,
+	0x9f9f9f00,0x6e6e6e00,0xbcbcbc00,0x8e8e8e00,
+	0x29292900,0xf5f5f500,0xf9f9f900,0xb6b6b600,
+	0x2f2f2f00,0xfdfdfd00,0xb4b4b400,0x59595900,
+	0x78787800,0x98989800,0x06060600,0x6a6a6a00,
+	0xe7e7e700,0x46464600,0x71717100,0xbababa00,
+	0xd4d4d400,0x25252500,0xababab00,0x42424200,
+	0x88888800,0xa2a2a200,0x8d8d8d00,0xfafafa00,
+	0x72727200,0x07070700,0xb9b9b900,0x55555500,
+	0xf8f8f800,0xeeeeee00,0xacacac00,0x0a0a0a00,
+	0x36363600,0x49494900,0x2a2a2a00,0x68686800,
+	0x3c3c3c00,0x38383800,0xf1f1f100,0xa4a4a400,
+	0x40404000,0x28282800,0xd3d3d300,0x7b7b7b00,
+	0xbbbbbb00,0xc9c9c900,0x43434300,0xc1c1c100,
+	0x15151500,0xe3e3e300,0xadadad00,0xf4f4f400,
+	0x77777700,0xc7c7c700,0x80808000,0x9e9e9e00,
+};
+
+static const u32 camellia_sp0222[256] = {
+	0x00e0e0e0,0x00050505,0x00585858,0x00d9d9d9,
+	0x00676767,0x004e4e4e,0x00818181,0x00cbcbcb,
+	0x00c9c9c9,0x000b0b0b,0x00aeaeae,0x006a6a6a,
+	0x00d5d5d5,0x00181818,0x005d5d5d,0x00828282,
+	0x00464646,0x00dfdfdf,0x00d6d6d6,0x00272727,
+	0x008a8a8a,0x00323232,0x004b4b4b,0x00424242,
+	0x00dbdbdb,0x001c1c1c,0x009e9e9e,0x009c9c9c,
+	0x003a3a3a,0x00cacaca,0x00252525,0x007b7b7b,
+	0x000d0d0d,0x00717171,0x005f5f5f,0x001f1f1f,
+	0x00f8f8f8,0x00d7d7d7,0x003e3e3e,0x009d9d9d,
+	0x007c7c7c,0x00606060,0x00b9b9b9,0x00bebebe,
+	0x00bcbcbc,0x008b8b8b,0x00161616,0x00343434,
+	0x004d4d4d,0x00c3c3c3,0x00727272,0x00959595,
+	0x00ababab,0x008e8e8e,0x00bababa,0x007a7a7a,
+	0x00b3b3b3,0x00020202,0x00b4b4b4,0x00adadad,
+	0x00a2a2a2,0x00acacac,0x00d8d8d8,0x009a9a9a,
+	0x00171717,0x001a1a1a,0x00353535,0x00cccccc,
+	0x00f7f7f7,0x00999999,0x00616161,0x005a5a5a,
+	0x00e8e8e8,0x00242424,0x00565656,0x00404040,
+	0x00e1e1e1,0x00636363,0x00090909,0x00333333,
+	0x00bfbfbf,0x00989898,0x00979797,0x00858585,
+	0x00686868,0x00fcfcfc,0x00ececec,0x000a0a0a,
+	0x00dadada,0x006f6f6f,0x00535353,0x00626262,
+	0x00a3a3a3,0x002e2e2e,0x00080808,0x00afafaf,
+	0x00282828,0x00b0b0b0,0x00747474,0x00c2c2c2,
+	0x00bdbdbd,0x00363636,0x00222222,0x00383838,
+	0x00646464,0x001e1e1e,0x00393939,0x002c2c2c,
+	0x00a6a6a6,0x00303030,0x00e5e5e5,0x00444444,
+	0x00fdfdfd,0x00888888,0x009f9f9f,0x00656565,
+	0x00878787,0x006b6b6b,0x00f4f4f4,0x00232323,
+	0x00484848,0x00101010,0x00d1d1d1,0x00515151,
+	0x00c0c0c0,0x00f9f9f9,0x00d2d2d2,0x00a0a0a0,
+	0x00555555,0x00a1a1a1,0x00414141,0x00fafafa,
+	0x00434343,0x00131313,0x00c4c4c4,0x002f2f2f,
+	0x00a8a8a8,0x00b6b6b6,0x003c3c3c,0x002b2b2b,
+	0x00c1c1c1,0x00ffffff,0x00c8c8c8,0x00a5a5a5,
+	0x00202020,0x00898989,0x00000000,0x00909090,
+	0x00474747,0x00efefef,0x00eaeaea,0x00b7b7b7,
+	0x00151515,0x00060606,0x00cdcdcd,0x00b5b5b5,
+	0x00121212,0x007e7e7e,0x00bbbbbb,0x00292929,
+	0x000f0f0f,0x00b8b8b8,0x00070707,0x00040404,
+	0x009b9b9b,0x00949494,0x00212121,0x00666666,
+	0x00e6e6e6,0x00cecece,0x00ededed,0x00e7e7e7,
+	0x003b3b3b,0x00fefefe,0x007f7f7f,0x00c5c5c5,
+	0x00a4a4a4,0x00373737,0x00b1b1b1,0x004c4c4c,
+	0x00919191,0x006e6e6e,0x008d8d8d,0x00767676,
+	0x00030303,0x002d2d2d,0x00dedede,0x00969696,
+	0x00262626,0x007d7d7d,0x00c6c6c6,0x005c5c5c,
+	0x00d3d3d3,0x00f2f2f2,0x004f4f4f,0x00191919,
+	0x003f3f3f,0x00dcdcdc,0x00797979,0x001d1d1d,
+	0x00525252,0x00ebebeb,0x00f3f3f3,0x006d6d6d,
+	0x005e5e5e,0x00fbfbfb,0x00696969,0x00b2b2b2,
+	0x00f0f0f0,0x00313131,0x000c0c0c,0x00d4d4d4,
+	0x00cfcfcf,0x008c8c8c,0x00e2e2e2,0x00757575,
+	0x00a9a9a9,0x004a4a4a,0x00575757,0x00848484,
+	0x00111111,0x00454545,0x001b1b1b,0x00f5f5f5,
+	0x00e4e4e4,0x000e0e0e,0x00737373,0x00aaaaaa,
+	0x00f1f1f1,0x00dddddd,0x00595959,0x00141414,
+	0x006c6c6c,0x00929292,0x00545454,0x00d0d0d0,
+	0x00787878,0x00707070,0x00e3e3e3,0x00494949,
+	0x00808080,0x00505050,0x00a7a7a7,0x00f6f6f6,
+	0x00777777,0x00939393,0x00868686,0x00838383,
+	0x002a2a2a,0x00c7c7c7,0x005b5b5b,0x00e9e9e9,
+	0x00eeeeee,0x008f8f8f,0x00010101,0x003d3d3d,
+};
+
+static const u32 camellia_sp3033[256] = {
+	0x38003838,0x41004141,0x16001616,0x76007676,
+	0xd900d9d9,0x93009393,0x60006060,0xf200f2f2,
+	0x72007272,0xc200c2c2,0xab00abab,0x9a009a9a,
+	0x75007575,0x06000606,0x57005757,0xa000a0a0,
+	0x91009191,0xf700f7f7,0xb500b5b5,0xc900c9c9,
+	0xa200a2a2,0x8c008c8c,0xd200d2d2,0x90009090,
+	0xf600f6f6,0x07000707,0xa700a7a7,0x27002727,
+	0x8e008e8e,0xb200b2b2,0x49004949,0xde00dede,
+	0x43004343,0x5c005c5c,0xd700d7d7,0xc700c7c7,
+	0x3e003e3e,0xf500f5f5,0x8f008f8f,0x67006767,
+	0x1f001f1f,0x18001818,0x6e006e6e,0xaf00afaf,
+	0x2f002f2f,0xe200e2e2,0x85008585,0x0d000d0d,
+	0x53005353,0xf000f0f0,0x9c009c9c,0x65006565,
+	0xea00eaea,0xa300a3a3,0xae00aeae,0x9e009e9e,
+	0xec00ecec,0x80008080,0x2d002d2d,0x6b006b6b,
+	0xa800a8a8,0x2b002b2b,0x36003636,0xa600a6a6,
+	0xc500c5c5,0x86008686,0x4d004d4d,0x33003333,
+	0xfd00fdfd,0x66006666,0x58005858,0x96009696,
+	0x3a003a3a,0x09000909,0x95009595,0x10001010,
+	0x78007878,0xd800d8d8,0x42004242,0xcc00cccc,
+	0xef00efef,0x26002626,0xe500e5e5,0x61006161,
+	0x1a001a1a,0x3f003f3f,0x3b003b3b,0x82008282,
+	0xb600b6b6,0xdb00dbdb,0xd400d4d4,0x98009898,
+	0xe800e8e8,0x8b008b8b,0x02000202,0xeb00ebeb,
+	0x0a000a0a,0x2c002c2c,0x1d001d1d,0xb000b0b0,
+	0x6f006f6f,0x8d008d8d,0x88008888,0x0e000e0e,
+	0x19001919,0x87008787,0x4e004e4e,0x0b000b0b,
+	0xa900a9a9,0x0c000c0c,0x79007979,0x11001111,
+	0x7f007f7f,0x22002222,0xe700e7e7,0x59005959,
+	0xe100e1e1,0xda00dada,0x3d003d3d,0xc800c8c8,
+	0x12001212,0x04000404,0x74007474,0x54005454,
+	0x30003030,0x7e007e7e,0xb400b4b4,0x28002828,
+	0x55005555,0x68006868,0x50005050,0xbe00bebe,
+	0xd000d0d0,0xc400c4c4,0x31003131,0xcb00cbcb,
+	0x2a002a2a,0xad00adad,0x0f000f0f,0xca00caca,
+	0x70007070,0xff00ffff,0x32003232,0x69006969,
+	0x08000808,0x62006262,0x00000000,0x24002424,
+	0xd100d1d1,0xfb00fbfb,0xba00baba,0xed00eded,
+	0x45004545,0x81008181,0x73007373,0x6d006d6d,
+	0x84008484,0x9f009f9f,0xee00eeee,0x4a004a4a,
+	0xc300c3c3,0x2e002e2e,0xc100c1c1,0x01000101,
+	0xe600e6e6,0x25002525,0x48004848,0x99009999,
+	0xb900b9b9,0xb300b3b3,0x7b007b7b,0xf900f9f9,
+	0xce00cece,0xbf00bfbf,0xdf00dfdf,0x71007171,
+	0x29002929,0xcd00cdcd,0x6c006c6c,0x13001313,
+	0x64006464,0x9b009b9b,0x63006363,0x9d009d9d,
+	0xc000c0c0,0x4b004b4b,0xb700b7b7,0xa500a5a5,
+	0x89008989,0x5f005f5f,0xb100b1b1,0x17001717,
+	0xf400f4f4,0xbc00bcbc,0xd300d3d3,0x46004646,
+	0xcf00cfcf,0x37003737,0x5e005e5e,0x47004747,
+	0x94009494,0xfa00fafa,0xfc00fcfc,0x5b005b5b,
+	0x97009797,0xfe00fefe,0x5a005a5a,0xac00acac,
+	0x3c003c3c,0x4c004c4c,0x03000303,0x35003535,
+	0xf300f3f3,0x23002323,0xb800b8b8,0x5d005d5d,
+	0x6a006a6a,0x92009292,0xd500d5d5,0x21002121,
+	0x44004444,0x51005151,0xc600c6c6,0x7d007d7d,
+	0x39003939,0x83008383,0xdc00dcdc,0xaa00aaaa,
+	0x7c007c7c,0x77007777,0x56005656,0x05000505,
+	0x1b001b1b,0xa400a4a4,0x15001515,0x34003434,
+	0x1e001e1e,0x1c001c1c,0xf800f8f8,0x52005252,
+	0x20002020,0x14001414,0xe900e9e9,0xbd00bdbd,
+	0xdd00dddd,0xe400e4e4,0xa100a1a1,0xe000e0e0,
+	0x8a008a8a,0xf100f1f1,0xd600d6d6,0x7a007a7a,
+	0xbb00bbbb,0xe300e3e3,0x40004040,0x4f004f4f,
+};
+
+static const u32 camellia_sp4404[256] = {
+	0x70700070,0x2c2c002c,0xb3b300b3,0xc0c000c0,
+	0xe4e400e4,0x57570057,0xeaea00ea,0xaeae00ae,
+	0x23230023,0x6b6b006b,0x45450045,0xa5a500a5,
+	0xeded00ed,0x4f4f004f,0x1d1d001d,0x92920092,
+	0x86860086,0xafaf00af,0x7c7c007c,0x1f1f001f,
+	0x3e3e003e,0xdcdc00dc,0x5e5e005e,0x0b0b000b,
+	0xa6a600a6,0x39390039,0xd5d500d5,0x5d5d005d,
+	0xd9d900d9,0x5a5a005a,0x51510051,0x6c6c006c,
+	0x8b8b008b,0x9a9a009a,0xfbfb00fb,0xb0b000b0,
+	0x74740074,0x2b2b002b,0xf0f000f0,0x84840084,
+	0xdfdf00df,0xcbcb00cb,0x34340034,0x76760076,
+	0x6d6d006d,0xa9a900a9,0xd1d100d1,0x04040004,
+	0x14140014,0x3a3a003a,0xdede00de,0x11110011,
+	0x32320032,0x9c9c009c,0x53530053,0xf2f200f2,
+	0xfefe00fe,0xcfcf00cf,0xc3c300c3,0x7a7a007a,
+	0x24240024,0xe8e800e8,0x60600060,0x69690069,
+	0xaaaa00aa,0xa0a000a0,0xa1a100a1,0x62620062,
+	0x54540054,0x1e1e001e,0xe0e000e0,0x64640064,
+	0x10100010,0x00000000,0xa3a300a3,0x75750075,
+	0x8a8a008a,0xe6e600e6,0x09090009,0xdddd00dd,
+	0x87870087,0x83830083,0xcdcd00cd,0x90900090,
+	0x73730073,0xf6f600f6,0x9d9d009d,0xbfbf00bf,
+	0x52520052,0xd8d800d8,0xc8c800c8,0xc6c600c6,
+	0x81810081,0x6f6f006f,0x13130013,0x63630063,
+	0xe9e900e9,0xa7a700a7,0x9f9f009f,0xbcbc00bc,
+	0x29290029,0xf9f900f9,0x2f2f002f,0xb4b400b4,
+	0x78780078,0x06060006,0xe7e700e7,0x71710071,
+	0xd4d400d4,0xabab00ab,0x88880088,0x8d8d008d,
+	0x72720072,0xb9b900b9,0xf8f800f8,0xacac00ac,
+	0x36360036,0x2a2a002a,0x3c3c003c,0xf1f100f1,
+	0x40400040,0xd3d300d3,0xbbbb00bb,0x43430043,
+	0x15150015,0xadad00ad,0x77770077,0x80800080,
+	0x82820082,0xecec00ec,0x27270027,0xe5e500e5,
+	0x85850085,0x35350035,0x0c0c000c,0x41410041,
+	0xefef00ef,0x93930093,0x19190019,0x21210021,
+	0x0e0e000e,0x4e4e004e,0x65650065,0xbdbd00bd,
+	0xb8b800b8,0x8f8f008f,0xebeb00eb,0xcece00ce,
+	0x30300030,0x5f5f005f,0xc5c500c5,0x1a1a001a,
+	0xe1e100e1,0xcaca00ca,0x47470047,0x3d3d003d,
+	0x01010001,0xd6d600d6,0x56560056,0x4d4d004d,
+	0x0d0d000d,0x66660066,0xcccc00cc,0x2d2d002d,
+	0x12120012,0x20200020,0xb1b100b1,0x99990099,
+	0x4c4c004c,0xc2c200c2,0x7e7e007e,0x05050005,
+	0xb7b700b7,0x31310031,0x17170017,0xd7d700d7,
+	0x58580058,0x61610061,0x1b1b001b,0x1c1c001c,
+	0x0f0f000f,0x16160016,0x18180018,0x22220022,
+	0x44440044,0xb2b200b2,0xb5b500b5,0x91910091,
+	0x08080008,0xa8a800a8,0xfcfc00fc,0x50500050,
+	0xd0d000d0,0x7d7d007d,0x89890089,0x97970097,
+	0x5b5b005b,0x95950095,0xffff00ff,0xd2d200d2,
+	0xc4c400c4,0x48480048,0xf7f700f7,0xdbdb00db,
+	0x03030003,0xdada00da,0x3f3f003f,0x94940094,
+	0x5c5c005c,0x02020002,0x4a4a004a,0x33330033,
+	0x67670067,0xf3f300f3,0x7f7f007f,0xe2e200e2,
+	0x9b9b009b,0x26260026,0x37370037,0x3b3b003b,
+	0x96960096,0x4b4b004b,0xbebe00be,0x2e2e002e,
+	0x79790079,0x8c8c008c,0x6e6e006e,0x8e8e008e,
+	0xf5f500f5,0xb6b600b6,0xfdfd00fd,0x59590059,
+	0x98980098,0x6a6a006a,0x46460046,0xbaba00ba,
+	0x25250025,0x42420042,0xa2a200a2,0xfafa00fa,
+	0x07070007,0x55550055,0xeeee00ee,0x0a0a000a,
+	0x49490049,0x68680068,0x38380038,0xa4a400a4,
+	0x28280028,0x7b7b007b,0xc9c900c9,0xc1c100c1,
+	0xe3e300e3,0xf4f400f4,0xc7c700c7,0x9e9e009e,
+};
+
+
+#define CAMELLIA_MIN_KEY_SIZE        16
+#define CAMELLIA_MAX_KEY_SIZE        32
+#define CAMELLIA_BLOCK_SIZE          16
+#define CAMELLIA_TABLE_BYTE_LEN     272
+
+
+/* key constants */
+
+#define CAMELLIA_SIGMA1 (0xA09E667F3BCC908B)
+#define CAMELLIA_SIGMA2 (0xB67AE8584CAA73B2)
+#define CAMELLIA_SIGMA3 (0xC6EF372FE94F82BE)
+#define CAMELLIA_SIGMA4 (0x54FF53A5F1D36F1C)
+#define CAMELLIA_SIGMA5 (0x10E527FADE682D1D)
+#define CAMELLIA_SIGMA6 (0xB05688C2B3E6C1FD)
+
+/*
+ *  macros
+ */
+#define GETU64(v, pt) \
+    do { \
+	/* latest breed of gcc is clever enough to use move */ \
+	memcpy(&(v), (pt), 8); \
+	(v) = be64_to_cpu(v); \
+    } while(0)
+
+/* rotation right shift 1byte */
+#define ROR8(x) (((x) >> 8) + ((x) << (sizeof(x)*8 - 8)))
+/* rotation left shift 1bit */
+#define ROL1(x) (((x) << 1) + ((x) >> (sizeof(x)*8 - 1)))
+/* rotation left shift 1byte */
+#define ROL8(x) (((x) << 8) + ((x) >> (sizeof(x)*8 - 8)))
+
+#define ROLDQ(l, r, w, bits)				\
+    do {						\
+	w = l;						\
+	l = (l << bits) + (r >> (64 - bits));		\
+	r = (r << bits) + (w >> (64 - bits));		\
+    } while(0)
+
+/*
+ * NB: L and R below stand for 'left' and 'right' as in written numbers.
+ * That is, in (xxxL,xxxR) pair xxxL holds most significant digits,
+ * _not_ least significant ones!
+ */
+
+
+/*
+ * Key setup
+ */
+#define CAMELLIA_F(x, k, y, i)					\
+    do {							\
+	u32 yl, yr;						\
+	i = x ^ k;						\
+	yl = camellia_sp1110[(u8)i]				\
+	   ^ camellia_sp0222[(u8)(i >> 24)]			\
+	   ^ camellia_sp3033[(u8)(i >> 16)]			\
+	   ^ camellia_sp4404[(u8)(i >> 8)];			\
+	yr = camellia_sp1110[    (i >> 56)]			\
+	   ^ camellia_sp0222[(u8)(i >> 48)]			\
+	   ^ camellia_sp3033[(u8)(i >> 40)]			\
+	   ^ camellia_sp4404[(u8)(i >> 32)];			\
+	yl ^= yr;						\
+	yr = ROR8(yr);						\
+	yr ^= yl;						\
+	y = ((u64)yl << 32) + yr;				\
+    } while(0)
+
+#define SUBKEY(INDEX) (subkey[(INDEX)])
+
+#ifdef __BIG_ENDIAN
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#else
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2])
+#endif
+
+static void camellia_setup_tail(u64 *subkey, int max)
+{
+	u32 dw;
+	int i = 2;
+	do {
+		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
+		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
+		dw = SUBKEY_L(i + 1) ^ SUBKEY_R(i + 1); dw = ROL8(dw);/* round 2 */
+		SUBKEY_R(i + 1) = SUBKEY_L(i + 1) ^ dw; SUBKEY_L(i + 1) = dw;
+		dw = SUBKEY_L(i + 2) ^ SUBKEY_R(i + 2); dw = ROL8(dw);/* round 3 */
+		SUBKEY_R(i + 2) = SUBKEY_L(i + 2) ^ dw; SUBKEY_L(i + 2) = dw;
+		dw = SUBKEY_L(i + 3) ^ SUBKEY_R(i + 3); dw = ROL8(dw);/* round 4 */
+		SUBKEY_R(i + 3) = SUBKEY_L(i + 3) ^ dw; SUBKEY_L(i + 3) = dw;
+		dw = SUBKEY_L(i + 4) ^ SUBKEY_R(i + 4); dw = ROL8(dw);/* round 5 */
+		SUBKEY_R(i + 4) = SUBKEY_L(i + 4) ^ dw; SUBKEY_L(i + 4) = dw;
+		dw = SUBKEY_L(i + 5) ^ SUBKEY_R(i + 5); dw = ROL8(dw);/* round 6 */
+		SUBKEY_R(i + 5) = SUBKEY_L(i + 5) ^ dw; SUBKEY_L(i + 5) = dw;
+		i += 8;
+	} while (i < max);
+}
+
+#ifdef __BIG_ENDIAN
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#else
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
+#endif
+
+static void camellia_setup128(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;
+	u64 i, t, w;
+	u64 kw4;
+	u32 dw;
+	u64 sub[26];
+
+	/**
+	 *  k == kl || kr (|| is concatenation)
+	 */
+	GETU64(kl, key     );
+	GETU64(kr, key +  8);
+
+	/**
+	 * generate KL dependent subkeys
+	 */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	/* rotation left shift 15bit */
+	ROLDQ(kl, kr, w, 15);
+	/* k3 */
+	sub[4] = kl;
+	/* k4 */
+	sub[5] = kr;
+	/* rotation left shift 15+30bit */
+	ROLDQ(kl, kr, w, 30);
+	/* k7 */
+	sub[10] = kl;
+	/* k8 */
+	sub[11] = kr;
+	/* rotation left shift 15+30+15bit */
+	ROLDQ(kl, kr, w, 15);
+	/* k10 */
+	sub[13] = kr;
+	/* rotation left shift 15+30+15+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	/* rotation left shift 15+30+15+17+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* k13 */
+	sub[18] = kl;
+	/* k14 */
+	sub[19] = kr;
+	/* rotation left shift 15+30+15+17+17+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+
+	/* generate KA */
+	kl = sub[0];
+	kr = sub[1];
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	/* current status == (kl, w) */
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KA dependent subkeys */
+	/* k1, k2 */
+	sub[2] = kl;
+	sub[3] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k5,k6 */
+	sub[6] = kl;
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl1, kl2 */
+	sub[8] = kl;
+	sub[9] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k9 */
+	sub[12] = kl;
+	ROLDQ(kl, kr, w, 15);
+	/* k11, k12 */
+	sub[14] = kl;
+	sub[15] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k15, k16 */
+	sub[20] = kl;
+	sub[21] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* kw3, kw4 */
+	sub[24] = kl;
+	sub[25] = kr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	/* kw3 */
+	sub[24] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[25];
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32; //kw4l ^= kw4r & ~subR(16);
+	dw = (u32)(kw4 >> 32) & subL(16); // kw4l & subL[16],
+	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32; //kw4l ^= kw4r & ~subR[8];
+	dw = (u32)(kw4 >> 32) & subL(8);
+	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t; /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	SUBKEY(23) = sub[22];     /* round 18 */
+	SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 24);
+}
+
+static void camellia_setup256(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;        /* left half of key */
+	u64 krl, krr;      /* right half of key */
+	u64 i, t, w;       /* temporary variables */
+	u64 kw4;
+	u32 dw;
+	u64 sub[34];
+
+	/**
+	 *  key = (kl || kr || krl || krr)
+	 *  (|| is concatenation)
+	 */
+	GETU64(kl,  key     );
+	GETU64(kr,  key +  8);
+	GETU64(krl, key + 16);
+	GETU64(krr, key + 24);
+
+	/* generate KL dependent subkeys */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	ROLDQ(kl, kr, w, 45);
+	/* k9 */
+	sub[12] = kl;
+	/* k10 */
+	sub[13] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k23 */
+	sub[30] = kl;
+	/* k24 */
+	sub[31] = kr;
+
+	/* generate KR dependent subkeys */
+	ROLDQ(krl, krr, w, 15);
+	/* k3 */
+	sub[4] = krl;
+	/* k4 */
+	sub[5] = krr;
+	ROLDQ(krl, krr, w, 15);
+	/* kl1 */
+	sub[8] = krl;
+	/* kl2 */
+	sub[9] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k13 */
+	sub[18] = krl;
+	/* k14 */
+	sub[19] = krr;
+	ROLDQ(krl, krr, w, 34);
+	/* k19 */
+	sub[26] = krl;
+	/* k20 */
+	sub[27] = krr;
+	ROLDQ(krl, krr, w, 34);
+
+	/* generate KA */
+	kl = sub[0] ^ krl;
+	kr = sub[1] ^ krr;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	kl ^= krl;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w ^ krr;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KB */
+	krl ^= kl;
+	krr ^= kr;
+	CAMELLIA_F(krl, CAMELLIA_SIGMA5, w, i);
+	krr ^= w;
+	CAMELLIA_F(krr, CAMELLIA_SIGMA6, w, i);
+	krl ^= w;
+
+	/* generate KA dependent subkeys */
+	ROLDQ(kl, kr, w, 15);
+	/* k5 */
+	sub[6] = kl;
+	/* k6 */
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 30);
+	/* k11 */
+	sub[14] = kl;
+	/* k12 */
+	sub[15] = kr;
+	/* kl5 */
+	ROLDQ(kl, kr, w, 32);
+	sub[24] = kl;
+	/* kl6 */
+	sub[25] = kr;
+	/* rotation left shift 49 from k11,k12 -> k21,k22 */
+	ROLDQ(kl, kr, w, (49 - 32));
+	/* k21 */
+	sub[28] = kl;
+	/* k22 */
+	sub[29] = kr;
+
+	/* generate KB dependent subkeys */
+	/* k1 */
+	sub[2] = krl;
+	/* k2 */
+	sub[3] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k7 */
+	sub[10] = krl;
+	/* k8 */
+	sub[11] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k15 */
+	sub[20] = krl;
+	/* k16 */
+	sub[21] = krr;
+	ROLDQ(krl, krr, w, 51);
+	/* kw3 */
+	sub[32] = krl;
+	/* kw4 */
+	sub[33] = krr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(25);
+	dw = subL(1) & subL(25),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl6) */
+	/* round 20 */
+	sub[27] ^= sub[1];
+	/* round 22 */
+	sub[29] ^= sub[1];
+	/* round 24 */
+	sub[31] ^= sub[1];
+	/* kw3 */
+	sub[32] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[33];
+	/* round 23 */
+	sub[30] ^= kw4;
+	/* round 21 */
+	sub[28] ^= kw4;
+	/* round 19 */
+	sub[26] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(24)) << 32; //kw4l ^= kw4r & ~subR[24];
+	dw = (u32)(kw4 >> 32) & subL(24),
+		kw4 ^= ROL1(dw); /* modified for FL(kl5) */
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(16),
+		kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(8),
+		kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); //tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	t = subL(26) ^ (subR(26) & ~subR(24));
+	dw = (u32)t & subL(24); /* FL(kl5) */
+	t = (t << 32) | (subR(26) ^ ROL1(dw));
+	SUBKEY(23) = sub[22] ^ t; /* round 18 */
+	SUBKEY(24) = sub[24];     /* FL(kl5) */
+	SUBKEY(25) = sub[25];     /* FLinv(kl6) */
+	t = subL(23) ^ (subR(23) & ~subR(25));
+	dw = (u32)t & subL(25); /* FLinv(kl6) */
+	t = (t << 32) | (subR(23) ^ ROL1(dw));
+	SUBKEY(26) = t ^ sub[27]; /* round 19 */
+	SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
+	SUBKEY(28) = sub[27] ^ sub[29]; /* round 21 */
+	SUBKEY(29) = sub[28] ^ sub[30]; /* round 22 */
+	SUBKEY(30) = sub[29] ^ sub[31]; /* round 23 */
+	SUBKEY(31) = sub[30];     /* round 24 */
+	SUBKEY(32) = sub[32] ^ sub[31]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 32);
+}
+
+static void camellia_setup192(const unsigned char *key, u64 *subkey)
+{
+	unsigned char kk[32];
+	u64 krl, krr;
+
+	memcpy(kk, key, 24);
+	memcpy((unsigned char *)&krl, key+16, 8);
+	krr = ~krl;
+	memcpy(kk+24, (unsigned char *)&krr, 8);
+	camellia_setup256(kk, subkey);
+}
+
+
+/*
+ * Encrypt/decrypt
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll & ll;							\
+	t2 = krr | rr;							\
+	rl ^= t2;							\
+	lr ^= ROL1(t0);							\
+	t3 = krl & rl;							\
+	t1 = klr | lr;							\
+	ll ^= t1;							\
+	rr ^= ROL1(t3);							\
+    } while(0)
+
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir)		\
+    do {								\
+	ir =  camellia_sp1110[(u8)xr];					\
+	il =  camellia_sp1110[    (xl >> 24)];				\
+	ir ^= camellia_sp0222[    (xr >> 24)];				\
+	il ^= camellia_sp0222[(u8)(xl >> 16)];				\
+	ir ^= camellia_sp3033[(u8)(xr >> 16)];				\
+	il ^= camellia_sp3033[(u8)(xl >> 8)];				\
+	ir ^= camellia_sp4404[(u8)(xr >> 8)];				\
+	il ^= camellia_sp4404[(u8)xl];					\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= ROR8(il) ^ ir;						\
+    } while(0)
+
+/* max = 24: 128-bit key, max = 32: 192- or 256-bit key */
+static void camellia_do_encrypt(const u64 *subkey, u32 *io, unsigned max)
+{
+	u32 il,ir,t0,t1;               /* temporary variables */
+
+	/* pre whitening but absorb kw2 */
+	io[0] ^= SUBKEY_L(0);
+	io[1] ^= SUBKEY_R(0);
+
+	/* main iteration */
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     t0,t1,il,ir); \
+} while (0)
+
+	ROUNDS(0);
+	FLS(8);
+	ROUNDS(8);
+	FLS(16);
+	ROUNDS(16);
+	if (max == 32) {
+		FLS(24);
+		ROUNDS(24);
+	}
+
+#undef ROUNDS
+#undef FLS
+
+	/* post whitening but kw4 */
+	io[2] ^= SUBKEY_L(max);
+	io[3] ^= SUBKEY_R(max);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
+}
+
+static void camellia_do_decrypt(const u64 *subkey, u32 *io, unsigned i)
+{
+	u32 il,ir,t0,t1;               /* temporary variables */
+
+	/* pre whitening but absorb kw2 */
+	io[0] ^= SUBKEY_L(i);
+	io[1] ^= SUBKEY_R(i);
+
+	/* main iteration */
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     t0,t1,il,ir); \
+} while (0)
+
+	if (i == 32) {
+		ROUNDS(24);
+		FLS(24);
+	}
+	ROUNDS(16);
+	FLS(16);
+	ROUNDS(8);
+	FLS(8);
+	ROUNDS(0);
+
+#undef ROUNDS
+#undef FLS
+
+	/* post whitening but kw4 */
+	io[2] ^= SUBKEY_L(0);
+	io[3] ^= SUBKEY_R(0);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
+}
+
+
+struct camellia_ctx {
+	int key_length;
+	u64 key_table[CAMELLIA_TABLE_BYTE_LEN / 8];
+};
+
+static int
+camellia_set_key(struct crypto_tfm *tfm, const u8 *in_key,
+		 unsigned int key_len)
+{
+	struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const unsigned char *key = (const unsigned char *)in_key;
+	u32 *flags = &tfm->crt_flags;
+
+	if (key_len != 16 && key_len != 24 && key_len != 32) {
+		*flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+		return -EINVAL;
+	}
+
+	cctx->key_length = key_len;
+
+	switch (key_len) {
+	case 16:
+		camellia_setup128(key, cctx->key_table);
+		break;
+	case 24:
+		camellia_setup192(key, cctx->key_table);
+		break;
+	case 32:
+		camellia_setup256(key, cctx->key_table);
+		break;
+	}
+
+	return 0;
+}
+
+static void camellia_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const __be32 *src = (const __be32 *)in;
+	__be32 *dst = (__be32 *)out;
+
+	u32 tmp[4];
+
+	tmp[0] = be32_to_cpu(src[0]);
+	tmp[1] = be32_to_cpu(src[1]);
+	tmp[2] = be32_to_cpu(src[2]);
+	tmp[3] = be32_to_cpu(src[3]);
+
+	camellia_do_encrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_encrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
+}
+
+static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+	const struct camellia_ctx *cctx = crypto_tfm_ctx(tfm);
+	const __be32 *src = (const __be32 *)in;
+	__be32 *dst = (__be32 *)out;
+
+	u32 tmp[4];
+
+	tmp[0] = be32_to_cpu(src[0]);
+	tmp[1] = be32_to_cpu(src[1]);
+	tmp[2] = be32_to_cpu(src[2]);
+	tmp[3] = be32_to_cpu(src[3]);
+
+	camellia_do_decrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_decrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
+}
+
+static struct crypto_alg camellia_alg = {
+	.cra_name		=	"camellia",
+	.cra_driver_name	=	"camellia-generic",
+	.cra_priority		=	100,
+	.cra_flags		=	CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		=	CAMELLIA_BLOCK_SIZE,
+	.cra_ctxsize		=	sizeof(struct camellia_ctx),
+	.cra_alignmask		=	3,
+	.cra_module		=	THIS_MODULE,
+	.cra_list		=	LIST_HEAD_INIT(camellia_alg.cra_list),
+	.cra_u			=	{
+		.cipher = {
+			.cia_min_keysize	=	CAMELLIA_MIN_KEY_SIZE,
+			.cia_max_keysize	=	CAMELLIA_MAX_KEY_SIZE,
+			.cia_setkey		=	camellia_set_key,
+			.cia_encrypt		=	camellia_encrypt,
+			.cia_decrypt		=	camellia_decrypt
+		}
+	}
+};
+
+static int __init camellia_init(void)
+{
+	return crypto_register_alg(&camellia_alg);
+}
+
+static void __exit camellia_fini(void)
+{
+	crypto_unregister_alg(&camellia_alg);
+}
+
+module_init(camellia_init);
+module_exit(camellia_fini);
+
+MODULE_DESCRIPTION("Camellia Cipher Algorithm");
+MODULE_LICENSE("GPL");

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-14  6:10                       ` David Miller
@ 2007-11-14  7:38                         ` Denys Vlasenko
  0 siblings, 0 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-14  7:38 UTC
  To: David Miller; +Cc: takamiya, herbert, linux-crypto

On Tuesday 13 November 2007 23:10, David Miller wrote:
> From: Denys Vlasenko <vda.linux@googlemail.com>
> Date: Tue, 13 Nov 2007 22:30:47 -0700
>
> > On Tuesday 13 November 2007 20:49, David Miller wrote:
> > > From: Denys Vlasenko <vda.linux@googlemail.com>
> > > Date: Tue, 13 Nov 2007 19:47:08 -0700
> > >
> > > > If CONFIG_CC_OPTIMIZE_FOR_SIZE is not an acceptable method,
> > > > do you have other ideas?
> > >
> > > Look at ways to make the code run faster without loop unrolling?
> >
> > I did it. I noticed that key setup is mostly operating on 64-bit
> > quantities, and provided alternative implementation which
> > exploits that fact. It's smaller and faster.
>
> Great, then you don't have to unroll the loop and performance
> is at least as good as before _and_ you save code space.

Unfortunately, it applies only to key setup, while the unrolling
happens in the encryption routines.

But the point still stands: irrespective of other optimizations,
unrolled and non-unrolled forms will still have different sizes
and speeds, and in some cases (like this one) you can't
pick one form which fits all.

> Please submit this new version :-)

Just did it. It's linux-2.6.23.1.camellia5.diff
--
vda

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-14  7:15                       ` Denys Vlasenko
@ 2007-11-14 14:14                         ` Herbert Xu
  2007-11-14 21:28                           ` Denys Vlasenko
  0 siblings, 1 reply; 40+ messages in thread
From: Herbert Xu @ 2007-11-14 14:14 UTC
  To: Denys Vlasenko; +Cc: David Miller, takamiya, linux-crypto

On Wed, Nov 14, 2007 at 12:15:19AM -0700, Denys Vlasenko wrote:
>
>     Use alternative key setup implementation with mostly 64-bit ops
>     if BITS_PER_LONG >= 64. Both much smaller and much faster.

Can we please not have two versions of the same algorithm in C?
They're a pain to maintain and test.

Where performance is paramount you could look at doing an assembly
version.  Unlike two C versions at least that can be easily tested
by someone who has access to the platform in question.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-14 14:14                         ` Herbert Xu
@ 2007-11-14 21:28                           ` Denys Vlasenko
  2007-11-18 13:21                             ` Herbert Xu
  0 siblings, 1 reply; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-14 21:28 UTC
  To: Herbert Xu, Noriaki TAKAMIYA; +Cc: David Miller, linux-crypto

[-- Attachment #1: Type: text/plain, Size: 2481 bytes --]

On Wednesday 14 November 2007 07:14, Herbert Xu wrote:
> On Wed, Nov 14, 2007 at 12:15:19AM -0700, Denys Vlasenko wrote:
> >     Use alternative key setup implementation with mostly 64-bit ops
> >     if BITS_PER_LONG >= 64. Both much smaller and much faster.
>
> Can we please not have two versions of the same algorithm in C?
> They're a pain to maintain and test.
>
> Where performance is paramount you could look at doing an assembly
> version.  Unlike two C versions at least that can be easily tested
> by someone who has access to the platform in question.

Having two versions, one in C and another in assembly, cannot be easier
than having two C versions. Moreover, an asm version will be arch-specific:
one needs to write separate amd64/ppc64/sparc64/etc. versions,
which means even more versions to maintain.

It would be faster too, though, and I think it makes sense to do it
for the most popular arches at some point in the future.

What I have now is a generic 64-bit C implementation which is
likely to be much faster and a bit smaller than the 32-bit one
on _all_ 64-bit arches. For i386 it's 33% faster.

I think this win is big enough to justify having two versions.

I think you are right that having a separate camellia_64.c
with substantial duplication is bad. I reworked it so that
both the 32-bit and 64-bit code now live in camellia.c,
and I removed (merged) all duplicated stuff (constants, macros,
and the whole encryption/decryption part).

I also split this patch into two parts for easier review:
camellia5:
        adds 64-bit key setup
camellia6:
        unifies encrypt/decrypt routines for different key lengths.
        This reduces module size by ~25%, with tiny (less than 1%)
        speed impact.
        Also collapses encrypt/decrypt into a more readable
        (visually shorter) form using macros.
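The camellia6 idea of one routine per direction for all key lengths can be sketched as a loop over 8-entry subkey groups that stops at max (24 for 128-bit keys, 32 for 192/256-bit keys). This is only a structural sketch: toy_f() is a hypothetical stand-in, NOT Camellia's S-box F function, so the output is not Camellia ciphertext. The FL/FL^-1 layers follow the RFC 3713 definitions, and the subkey layout follows the patch's SUBKEY(n) usage: whitening at 0/1 and max/max+1, FL keys at 8b/8b+1, rounds at 8b+2..8b+7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy round function; stands in for the real F. */
static uint64_t toy_f(uint64_t x, uint64_t k)
{
	uint64_t t = (x ^ k) * 0x9E3779B97F4A7C15ULL;
	return t ^ (t >> 29);
}

static uint32_t rol1(uint32_t x) { return (x << 1) | (x >> 31); }

/* FL and its inverse on a 64-bit half, key split into (kl, kr). */
static uint64_t fl(uint64_t x, uint64_t k)
{
	uint32_t xl = x >> 32, xr = (uint32_t)x;

	xr ^= rol1(xl & (uint32_t)(k >> 32));
	xl ^= xr | (uint32_t)k;
	return ((uint64_t)xl << 32) | xr;
}

static uint64_t fl_inv(uint64_t y, uint64_t k)
{
	uint32_t yl = y >> 32, yr = (uint32_t)y;

	yl ^= yr | (uint32_t)k;
	yr ^= rol1(yl & (uint32_t)(k >> 32));
	return ((uint64_t)yl << 32) | yr;
}

/* One routine for both key sizes: max == 24 (18 rounds) or 32 (24 rounds).
 * k must have max + 2 entries. */
static void toy_encrypt(const uint64_t *k, int max, uint64_t io[2])
{
	uint64_t l = io[0] ^ k[0], r = io[1] ^ k[1];

	for (int base = 0; base < max; base += 8) {
		if (base) {		/* FL layer between 6-round groups */
			l = fl(l, k[base]);
			r = fl_inv(r, k[base + 1]);
		}
		for (int j = 0; j < 6; j += 2) {
			r ^= toy_f(l, k[base + 2 + j]);
			l ^= toy_f(r, k[base + 3 + j]);
		}
	}
	io[0] = r ^ k[max];	/* final swap plus post-whitening */
	io[1] = l ^ k[max + 1];
}

/* Same layout walked with descending subkey indices. */
static void toy_decrypt(const uint64_t *k, int max, uint64_t io[2])
{
	uint64_t r = io[0] ^ k[max], l = io[1] ^ k[max + 1];

	for (int base = max - 8; base >= 0; base -= 8) {
		for (int j = 4; j >= 0; j -= 2) {
			l ^= toy_f(r, k[base + 3 + j]);
			r ^= toy_f(l, k[base + 2 + j]);
		}
		if (base) {
			l = fl_inv(l, k[base]);
			r = fl(r, k[base + 1]);
		}
	}
	io[0] = l ^ k[0];
	io[1] = r ^ k[1];
}
```

The only difference between key lengths is how far the group loop runs, which is why merging the 128 and 256 routines costs so little speed.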

Compiled it on i386 and amd64:

   text    data     bss     dec     hex filename
  29724     224       0   29948    74fc 2.6.23.1.camellia.t/crypto/camellia.o
  29233     224       0   29457    7311 2.6.23.1.camellia5.t/crypto/camellia.o
  21190     224       0   21414    53a6 2.6.23.1.camellia6.t/crypto/camellia.o

  22498     288       0   22786    5902 2.6.23.1.camellia.t64/crypto/camellia.o
  21134     288       0   21422    53ae 2.6.23.1.camellia5.t64/crypto/camellia.o
  16067     288       0   16355    3fe3 2.6.23.1.camellia6.t64/crypto/camellia.o
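The size(1) figures above can be cross-checked with a few lines of C: dec is text + data + bss (hex is the same number in base 16), and the .text column gives each patch's reduction. The sketch assumes the .t directories are the i386 builds and the .t64 directories the amd64 builds; the struct and function names are hypothetical.

```c
#include <assert.h>

/* One line of size(1) output. */
struct size_row {
	unsigned long text, data, bss, dec;
};

/* Numbers copied from the table (.t = i386, .t64 = amd64). */
static const struct size_row i386_base  = { 29724, 224, 0, 29948 };
static const struct size_row i386_c5    = { 29233, 224, 0, 29457 };
static const struct size_row i386_c6    = { 21190, 224, 0, 21414 };
static const struct size_row amd64_base = { 22498, 288, 0, 22786 };
static const struct size_row amd64_c5   = { 21134, 288, 0, 21422 };
static const struct size_row amd64_c6   = { 16067, 288, 0, 16355 };

/* size(1) prints dec = text + data + bss. */
static int row_consistent(struct size_row r)
{
	return r.text + r.data + r.bss == r.dec;
}

/* Percentage by which .text shrank from 'before' to 'after'. */
static double text_shrink_pct(struct size_row before, struct size_row after)
{
	return 100.0 * ((double)before.text - (double)after.text)
		/ (double)before.text;
}
```

By this arithmetic camellia6 cuts .text by about 27.5% on i386 and about 24% on amd64 relative to camellia5, in line with the ~25% quoted above.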

Takamiya-san, can you review the attached patches, please?

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: linux-2.6.23.1.camellia5.diff --]
[-- Type: text/x-diff, Size: 32570 bytes --]

diff -urpN linux-2.6.23.1.camellia/crypto/camellia.c linux-2.6.23.1.camellia5/crypto/camellia.c
--- linux-2.6.23.1.camellia/crypto/camellia.c	2007-11-14 12:30:27.000000000 -0700
+++ linux-2.6.23.1.camellia5/crypto/camellia.c	2007-11-14 12:30:27.000000000 -0700
@@ -310,6 +310,589 @@ static const u32 camellia_sp4404[256] = 
 #define CAMELLIA_BLOCK_SIZE          16
 #define CAMELLIA_TABLE_BYTE_LEN     272
 
+/*
+ * NB: L and R below stand for 'left' and 'right' as in written numbers.
+ * That is, in (xxxL,xxxR) pair xxxL holds most significant digits,
+ * _not_ least significant ones!
+ */
+
+
+
+#if BITS_PER_LONG >= 64
+
+/*
+ * Key setup implementation with mostly 64-bit ops
+ */
+
+/* key constants */
+
+#define CAMELLIA_SIGMA1 (0xA09E667F3BCC908B)
+#define CAMELLIA_SIGMA2 (0xB67AE8584CAA73B2)
+#define CAMELLIA_SIGMA3 (0xC6EF372FE94F82BE)
+#define CAMELLIA_SIGMA4 (0x54FF53A5F1D36F1C)
+#define CAMELLIA_SIGMA5 (0x10E527FADE682D1D)
+#define CAMELLIA_SIGMA6 (0xB05688C2B3E6C1FD)
+
+/*
+ *  macros
+ */
+#define GETU64(v, pt) \
+    do { \
+	/* latest breed of gcc is clever enough to use move */ \
+	memcpy(&(v), (pt), 8); \
+	(v) = be64_to_cpu(v); \
+    } while(0)
+
+/* rotation right shift 1byte */
+#define ROR8(x) (((x) >> 8) + ((x) << (sizeof(x)*8 - 8)))
+/* rotation left shift 1bit */
+#define ROL1(x) (((x) << 1) + ((x) >> (sizeof(x)*8 - 1)))
+/* rotation left shift 1byte */
+#define ROL8(x) (((x) << 8) + ((x) >> (sizeof(x)*8 - 8)))
+
+#define ROLDQ(l, r, w, bits)				\
+    do {						\
+	w = l;						\
+	l = (l << bits) + (r >> (64 - bits));		\
+	r = (r << bits) + (w >> (64 - bits));		\
+    } while(0)
+
+#define CAMELLIA_F(x, k, y, i)					\
+    do {							\
+	u32 yl, yr;						\
+	i = x ^ k;						\
+	yl = camellia_sp1110[(u8)i]				\
+	   ^ camellia_sp0222[(u8)(i >> 24)]			\
+	   ^ camellia_sp3033[(u8)(i >> 16)]			\
+	   ^ camellia_sp4404[(u8)(i >> 8)];			\
+	yr = camellia_sp1110[    (i >> 56)]			\
+	   ^ camellia_sp0222[(u8)(i >> 48)]			\
+	   ^ camellia_sp3033[(u8)(i >> 40)]			\
+	   ^ camellia_sp4404[(u8)(i >> 32)];			\
+	yl ^= yr;						\
+	yr = ROR8(yr);						\
+	yr ^= yl;						\
+	y = ((u64)yl << 32) + yr;				\
+    } while(0)
+
+#define SUBKEY(INDEX) (subkey[(INDEX)])
+
+#ifdef __BIG_ENDIAN
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#else
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2])
+#endif
+
+static void camellia_setup_tail(u64 *subkey, int max)
+{
+	u32 dw;
+	int i = 2;
+	do {
+		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
+		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
+		dw = SUBKEY_L(i + 1) ^ SUBKEY_R(i + 1); dw = ROL8(dw);/* round 2 */
+		SUBKEY_R(i + 1) = SUBKEY_L(i + 1) ^ dw; SUBKEY_L(i + 1) = dw;
+		dw = SUBKEY_L(i + 2) ^ SUBKEY_R(i + 2); dw = ROL8(dw);/* round 3 */
+		SUBKEY_R(i + 2) = SUBKEY_L(i + 2) ^ dw; SUBKEY_L(i + 2) = dw;
+		dw = SUBKEY_L(i + 3) ^ SUBKEY_R(i + 3); dw = ROL8(dw);/* round 4 */
+		SUBKEY_R(i + 3) = SUBKEY_L(i + 3) ^ dw; SUBKEY_L(i + 3) = dw;
+		dw = SUBKEY_L(i + 4) ^ SUBKEY_R(i + 4); dw = ROL8(dw);/* round 5 */
+		SUBKEY_R(i + 4) = SUBKEY_L(i + 4) ^ dw; SUBKEY_L(i + 4) = dw;
+		dw = SUBKEY_L(i + 5) ^ SUBKEY_R(i + 5); dw = ROL8(dw);/* round 6 */
+		SUBKEY_R(i + 5) = SUBKEY_L(i + 5) ^ dw; SUBKEY_L(i + 5) = dw;
+		i += 8;
+	} while (i < max);
+}
+
+#ifdef __BIG_ENDIAN
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#else
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
+#endif
+
+static void camellia_setup128(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;
+	u64 i, t, w;
+	u64 kw4;
+	u32 dw;
+	u64 sub[26];
+
+	/**
+	 *  k == kl || kr (|| is concatenation)
+	 */
+	GETU64(kl, key     );
+	GETU64(kr, key +  8);
+
+	/**
+	 * generate KL dependent subkeys
+	 */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	/* rotation left shift 15bit */
+	ROLDQ(kl, kr, w, 15);
+	/* k3 */
+	sub[4] = kl;
+	/* k4 */
+	sub[5] = kr;
+	/* rotation left shift 15+30bit */
+	ROLDQ(kl, kr, w, 30);
+	/* k7 */
+	sub[10] = kl;
+	/* k8 */
+	sub[11] = kr;
+	/* rotation left shift 15+30+15bit */
+	ROLDQ(kl, kr, w, 15);
+	/* k10 */
+	sub[13] = kr;
+	/* rotation left shift 15+30+15+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	/* rotation left shift 15+30+15+17+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* k13 */
+	sub[18] = kl;
+	/* k14 */
+	sub[19] = kr;
+	/* rotation left shift 15+30+15+17+17+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+
+	/* generate KA */
+	kl = sub[0];
+	kr = sub[1];
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	/* current status == (kl, w) */
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KA dependent subkeys */
+	/* k1, k2 */
+	sub[2] = kl;
+	sub[3] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k5,k6 */
+	sub[6] = kl;
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl1, kl2 */
+	sub[8] = kl;
+	sub[9] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k9 */
+	sub[12] = kl;
+	ROLDQ(kl, kr, w, 15);
+	/* k11, k12 */
+	sub[14] = kl;
+	sub[15] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k15, k16 */
+	sub[20] = kl;
+	sub[21] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* kw3, kw4 */
+	sub[24] = kl;
+	sub[25] = kr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	/* kw3 */
+	sub[24] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[25];
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32; //kw4l ^= kw4r & ~subR(16);
+	dw = (u32)(kw4 >> 32) & subL(16); // kw4l & subL[16],
+	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32; //kw4l ^= kw4r & ~subR[8];
+	dw = (u32)(kw4 >> 32) & subL(8);
+	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t; /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	SUBKEY(23) = sub[22];     /* round 18 */
+	SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 24);
+}
+
+static void camellia_setup256(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;        /* left half of key */
+	u64 krl, krr;      /* right half of key */
+	u64 i, t, w;       /* temporary variables */
+	u64 kw4;
+	u32 dw;
+	u64 sub[34];
+
+	/**
+	 *  key = (kl || kr || krl || krr)
+	 *  (|| is concatenation)
+	 */
+	GETU64(kl,  key     );
+	GETU64(kr,  key +  8);
+	GETU64(krl, key + 16);
+	GETU64(krr, key + 24);
+
+	/* generate KL dependent subkeys */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	ROLDQ(kl, kr, w, 45);
+	/* k9 */
+	sub[12] = kl;
+	/* k10 */
+	sub[13] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k23 */
+	sub[30] = kl;
+	/* k24 */
+	sub[31] = kr;
+
+	/* generate KR dependent subkeys */
+	ROLDQ(krl, krr, w, 15);
+	/* k3 */
+	sub[4] = krl;
+	/* k4 */
+	sub[5] = krr;
+	ROLDQ(krl, krr, w, 15);
+	/* kl1 */
+	sub[8] = krl;
+	/* kl2 */
+	sub[9] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k13 */
+	sub[18] = krl;
+	/* k14 */
+	sub[19] = krr;
+	ROLDQ(krl, krr, w, 34);
+	/* k19 */
+	sub[26] = krl;
+	/* k20 */
+	sub[27] = krr;
+	ROLDQ(krl, krr, w, 34);
+
+	/* generate KA */
+	kl = sub[0] ^ krl;
+	kr = sub[1] ^ krr;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	kl ^= krl;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w ^ krr;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KB */
+	krl ^= kl;
+	krr ^= kr;
+	CAMELLIA_F(krl, CAMELLIA_SIGMA5, w, i);
+	krr ^= w;
+	CAMELLIA_F(krr, CAMELLIA_SIGMA6, w, i);
+	krl ^= w;
+
+	/* generate KA dependent subkeys */
+	ROLDQ(kl, kr, w, 15);
+	/* k5 */
+	sub[6] = kl;
+	/* k6 */
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 30);
+	/* k11 */
+	sub[14] = kl;
+	/* k12 */
+	sub[15] = kr;
+	/* kl5 */
+	ROLDQ(kl, kr, w, 32);
+	sub[24] = kl;
+	/* kl6 */
+	sub[25] = kr;
+	/* rotation left shift 49 from k11,k12 -> k21,k22 */
+	ROLDQ(kl, kr, w, (49 - 32));
+	/* k21 */
+	sub[28] = kl;
+	/* k22 */
+	sub[29] = kr;
+
+	/* generate KB dependent subkeys */
+	/* k1 */
+	sub[2] = krl;
+	/* k2 */
+	sub[3] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k7 */
+	sub[10] = krl;
+	/* k8 */
+	sub[11] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k15 */
+	sub[20] = krl;
+	/* k16 */
+	sub[21] = krr;
+	ROLDQ(krl, krr, w, 51);
+	/* kw3 */
+	sub[32] = krl;
+	/* kw4 */
+	sub[33] = krr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(25);
+	dw = subL(1) & subL(25),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl6) */
+	/* round 20 */
+	sub[27] ^= sub[1];
+	/* round 22 */
+	sub[29] ^= sub[1];
+	/* round 24 */
+	sub[31] ^= sub[1];
+	/* kw3 */
+	sub[32] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[33];
+	/* round 23 */
+	sub[30] ^= kw4;
+	/* round 21 */
+	sub[28] ^= kw4;
+	/* round 19 */
+	sub[26] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(24)) << 32; //kw4l ^= kw4r & ~subR[24];
+	dw = (u32)(kw4 >> 32) & subL(24),
+		kw4 ^= ROL1(dw); /* modified for FL(kl5) */
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(16),
+		kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(8),
+		kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); //tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	t = subL(26) ^ (subR(26) & ~subR(24));
+	dw = (u32)t & subL(24); /* FL(kl5) */
+	t = (t << 32) | (subR(26) ^ ROL1(dw));
+	SUBKEY(23) = sub[22] ^ t; /* round 18 */
+	SUBKEY(24) = sub[24];     /* FL(kl5) */
+	SUBKEY(25) = sub[25];     /* FLinv(kl6) */
+	t = subL(23) ^ (subR(23) & ~subR(25));
+	dw = (u32)t & subL(25); /* FLinv(kl6) */
+	t = (t << 32) | (subR(23) ^ ROL1(dw));
+	SUBKEY(26) = t ^ sub[27]; /* round 19 */
+	SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
+	SUBKEY(28) = sub[27] ^ sub[29]; /* round 21 */
+	SUBKEY(29) = sub[28] ^ sub[30]; /* round 22 */
+	SUBKEY(30) = sub[29] ^ sub[31]; /* round 23 */
+	SUBKEY(31) = sub[30];     /* round 24 */
+	SUBKEY(32) = sub[32] ^ sub[31]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 32);
+}
+
+static void camellia_setup192(const unsigned char *key, u64 *subkey)
+{
+	unsigned char kk[32];
+	u64 krl, krr;
+
+	memcpy(kk, key, 24);
+	memcpy((unsigned char *)&krl, key+16, 8);
+	krr = ~krl;
+	memcpy(kk+24, (unsigned char *)&krr, 8);
+	camellia_setup256(kk, subkey);
+}
+
+typedef u64 key_element;
+typedef const u64 const_key_element;
+
+
+
+#else /* BITS_PER_LONG < 64 */
+
+/*
+ * Key setup implementation with 32-bit ops
+ */
 
 /* key constants */
 
@@ -329,8 +912,7 @@ static const u32 camellia_sp4404[256] = 
 /*
  *  macros
  */
-
-# define GETU32(v, pt) \
+#define GETU32(v, pt) \
     do { \
 	/* latest breed of gcc is clever enough to use move */ \
 	memcpy(&(v), (pt), 4); \
@@ -363,64 +945,25 @@ static const u32 camellia_sp4404[256] = 
 	rr = (w0 << (bits - 32)) + (w1 >> (64 - bits));	\
     } while(0)
 
-
 #define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
     do {							\
 	il = xl ^ kl;						\
 	ir = xr ^ kr;						\
 	t0 = il >> 16;						\
 	t1 = ir >> 16;						\
-	yl = camellia_sp1110[ir & 0xff]				\
-	   ^ camellia_sp0222[(t1 >> 8) & 0xff]			\
-	   ^ camellia_sp3033[t1 & 0xff]				\
-	   ^ camellia_sp4404[(ir >> 8) & 0xff];			\
-	yr = camellia_sp1110[(t0 >> 8) & 0xff]			\
-	   ^ camellia_sp0222[t0 & 0xff]				\
-	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
-	   ^ camellia_sp4404[il & 0xff];			\
+	yl = camellia_sp1110[(u8)(ir     )]			\
+	   ^ camellia_sp0222[    (t1 >> 8)]			\
+	   ^ camellia_sp3033[(u8)(t1     )]			\
+	   ^ camellia_sp4404[(u8)(ir >> 8)];			\
+	yr = camellia_sp1110[    (t0 >> 8)]			\
+	   ^ camellia_sp0222[(u8)(t0     )]			\
+	   ^ camellia_sp3033[(u8)(il >> 8)]			\
+	   ^ camellia_sp4404[(u8)(il     )];			\
 	yl ^= yr;						\
 	yr = ROR8(yr);						\
 	yr ^= yl;						\
     } while(0)
 
-
-/*
- * for speed up
- *
- */
-#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
-    do {								\
-	t0 = kll;							\
-	t2 = krr;							\
-	t0 &= ll;							\
-	t2 |= rr;							\
-	rl ^= t2;							\
-	lr ^= ROL1(t0);							\
-	t3 = krl;							\
-	t1 = klr;							\
-	t3 &= rl;							\
-	t1 |= lr;							\
-	ll ^= t1;							\
-	rr ^= ROL1(t3);							\
-    } while(0)
-
-#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-    do {								\
-	ir =  camellia_sp1110[xr & 0xff];				\
-	il =  camellia_sp1110[(xl>>24) & 0xff];				\
-	ir ^= camellia_sp0222[(xr>>24) & 0xff];				\
-	il ^= camellia_sp0222[(xl>>16) & 0xff];				\
-	ir ^= camellia_sp3033[(xr>>16) & 0xff];				\
-	il ^= camellia_sp3033[(xl>>8) & 0xff];				\
-	ir ^= camellia_sp4404[(xr>>8) & 0xff];				\
-	il ^= camellia_sp4404[xl & 0xff];				\
-	il ^= kl;							\
-	ir ^= il ^ kr;							\
-	yl ^= ir;							\
-	yr ^= ROR8(il) ^ ir;						\
-    } while(0)
-
-
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
@@ -999,8 +1542,49 @@ static void camellia_setup192(const unsi
 	camellia_setup256(kk, subkey);
 }
 
+typedef u32 key_element;
+typedef const u32 const_key_element;
+
+#endif /* 32/64-bit key setup versions */
+
+
+
+/*
+ * Encrypt/decrypt
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll;							\
+	t2 = krr;							\
+	t0 &= ll;							\
+	t2 |= rr;							\
+	rl ^= t2;							\
+	lr ^= ROL1(t0);							\
+	t3 = krl;							\
+	t1 = klr;							\
+	t3 &= rl;							\
+	t1 |= lr;							\
+	ll ^= t1;							\
+	rr ^= ROL1(t3);							\
+    } while(0)
+
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir)		\
+    do {								\
+	ir =  camellia_sp1110[(u8)xr];					\
+	il =  camellia_sp1110[    (xl >> 24)];				\
+	ir ^= camellia_sp0222[    (xr >> 24)];				\
+	il ^= camellia_sp0222[(u8)(xl >> 16)];				\
+	ir ^= camellia_sp3033[(u8)(xr >> 16)];				\
+	il ^= camellia_sp3033[(u8)(xl >> 8)];				\
+	ir ^= camellia_sp4404[(u8)(xr >> 8)];				\
+	il ^= camellia_sp4404[(u8)xl];					\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= ROR8(il) ^ ir;						\
+    } while(0)
 
-static void camellia_encrypt128(const u32 *subkey, u32 *io_text)
+static void camellia_encrypt128(const_key_element *subkey, u32 *io_text)
 {
 	u32 il,ir,t0,t1;               /* temporary variables */
 
@@ -1015,22 +1599,22 @@ static void camellia_encrypt128(const u3
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(8),SUBKEY_R(8),
@@ -1039,22 +1623,22 @@ static void camellia_encrypt128(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(16),SUBKEY_R(16),
@@ -1063,22 +1647,22 @@ static void camellia_encrypt128(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	/* post whitening but kw4 */
 	io_text[0] = io[2] ^ SUBKEY_L(24);
@@ -1087,7 +1671,7 @@ static void camellia_encrypt128(const u3
 	io_text[3] = io[1];
 }
 
-static void camellia_decrypt128(const u32 *subkey, u32 *io_text)
+static void camellia_decrypt128(const_key_element *subkey, u32 *io_text)
 {
 	u32 il,ir,t0,t1;               /* temporary variables */
 
@@ -1102,22 +1686,22 @@ static void camellia_decrypt128(const u3
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(17),SUBKEY_R(17),
@@ -1126,22 +1710,22 @@ static void camellia_decrypt128(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(9),SUBKEY_R(9),
@@ -1150,22 +1734,22 @@ static void camellia_decrypt128(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	/* post whitening but kw4 */
 	io_text[0] = io[2] ^ SUBKEY_L(0);
@@ -1174,7 +1758,7 @@ static void camellia_decrypt128(const u3
 	io_text[3] = io[1];
 }
 
-static void camellia_encrypt256(const u32 *subkey, u32 *io_text)
+static void camellia_encrypt256(const_key_element *subkey, u32 *io_text)
 {
 	u32 il,ir,t0,t1;           /* temporary variables */
 
@@ -1189,22 +1773,22 @@ static void camellia_encrypt256(const u3
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(8),SUBKEY_R(8),
@@ -1213,22 +1797,22 @@ static void camellia_encrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(16),SUBKEY_R(16),
@@ -1237,22 +1821,22 @@ static void camellia_encrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(24),SUBKEY_R(24),
@@ -1261,22 +1845,22 @@ static void camellia_encrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	/* post whitening but kw4 */
 	io_text[0] = io[2] ^ SUBKEY_L(32);
@@ -1285,7 +1869,7 @@ static void camellia_encrypt256(const u3
 	io_text[3] = io[1];
 }
 
-static void camellia_decrypt256(const u32 *subkey, u32 *io_text)
+static void camellia_decrypt256(const_key_element *subkey, u32 *io_text)
 {
 	u32 il,ir,t0,t1;           /* temporary variables */
 
@@ -1300,22 +1884,22 @@ static void camellia_decrypt256(const u3
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(25),SUBKEY_R(25),
@@ -1324,22 +1908,22 @@ static void camellia_decrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(17),SUBKEY_R(17),
@@ -1348,22 +1932,22 @@ static void camellia_decrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(9),SUBKEY_R(9),
@@ -1372,22 +1956,22 @@ static void camellia_decrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	/* post whitening but kw4 */
 	io_text[0] = io[2] ^ SUBKEY_L(0);
@@ -1399,7 +1983,7 @@ static void camellia_decrypt256(const u3
 
 struct camellia_ctx {
 	int key_length;
-	u32 key_table[CAMELLIA_TABLE_BYTE_LEN / 4];
+	key_element key_table[CAMELLIA_TABLE_BYTE_LEN / sizeof(key_element)];
 };
 
 static int

[-- Attachment #3: linux-2.6.23.1.camellia6.diff --]
[-- Type: text/x-diff, Size: 15277 bytes --]

diff -urpN linux-2.6.23.1.camellia5/crypto/camellia.c linux-2.6.23.1.camellia6/crypto/camellia.c
--- linux-2.6.23.1.camellia5/crypto/camellia.c	2007-11-14 12:30:27.000000000 -0700
+++ linux-2.6.23.1.camellia6/crypto/camellia.c	2007-11-14 12:30:27.000000000 -0700
@@ -1584,400 +1584,115 @@ typedef const u32 const_key_element;
 	yr ^= ROR8(il) ^ ir;						\
     } while(0)
 
-static void camellia_encrypt128(const_key_element *subkey, u32 *io_text)
+/* max = 24: 128bit encrypt, max = 32: 256bit encrypt */
+static void camellia_do_encrypt(const_key_element *subkey, u32 *io, unsigned max)
 {
 	u32 il,ir,t0,t1;               /* temporary variables */
 
-	u32 io[4];
-
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(0);
+	io[1] ^= SUBKEY_R(0);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir);
-
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(24);
-	io_text[1] = io[3] ^ SUBKEY_R(24);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
-
-static void camellia_decrypt128(const_key_element *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;               /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(24);
-	io[1] = io_text[1] ^ SUBKEY_R(24);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     t0,t1,il,ir); \
+} while (0)
+
+	ROUNDS(0);
+	FLS(8);
+	ROUNDS(8);
+	FLS(16);
+	ROUNDS(16);
+	if (max == 32) {
+		FLS(24);
+		ROUNDS(24);
+	}
 
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir);
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(max);
+	io[3] ^= SUBKEY_R(max);
+	/* NB: io[0],[1] should be swapped with [2],[3] by caller! */
 }
 
-static void camellia_encrypt256(const_key_element *subkey, u32 *io_text)
+static void camellia_do_decrypt(const_key_element *subkey, u32 *io, unsigned i)
 {
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
+	u32 il,ir,t0,t1;               /* temporary variables */
 
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(i);
+	io[1] ^= SUBKEY_R(i);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[0],io[1],il,ir);
-
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(32);
-	io_text[1] = io[3] ^ SUBKEY_R(32);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
-
-static void camellia_decrypt256(const_key_element *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(32);
-	io[1] = io_text[1] ^ SUBKEY_R(32);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     t0,t1,il,ir); \
+} while (0)
+
+	if (i == 32) {
+		ROUNDS(24);
+		FLS(24);
+	}
+	ROUNDS(16);
+	FLS(16);
+	ROUNDS(8);
+	FLS(8);
+	ROUNDS(0);
 
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir);
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(0);
+	io[3] ^= SUBKEY_R(0);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
 }
 
 
@@ -2029,21 +1744,15 @@ static void camellia_encrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_encrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_encrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_encrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_encrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
@@ -2059,21 +1768,15 @@ static void camellia_decrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_decrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_decrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_decrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_decrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static struct crypto_alg camellia_alg = {


* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-14 21:28                           ` Denys Vlasenko
@ 2007-11-18 13:21                             ` Herbert Xu
  2007-11-19  4:30                               ` Denys Vlasenko
  0 siblings, 1 reply; 40+ messages in thread
From: Herbert Xu @ 2007-11-18 13:21 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Noriaki TAKAMIYA, David Miller, linux-crypto

On Wed, Nov 14, 2007 at 02:28:25PM -0700, Denys Vlasenko wrote:
>
> I also split this patch into two parts for easier review:
> camellia5:
>         adds 64-bit key setup

Sorry but this still duplicates way too much code.  Also key
setup is the slow path relatively speaking so it's even less
justifiable.

> camellia6:
>         unifies encrypt/decrypt routines for different key lengths.
>         This reduces module size by ~25%, with tiny (less than 1%)
>         speed impact.
>         Also collapses encrypt/decrypt into more readable
>         (visually shorter) form using macros.

This looks pretty neat though.  I'll merge it unless I hear any
objections.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-18 13:21                             ` Herbert Xu
@ 2007-11-19  4:30                               ` Denys Vlasenko
  2007-11-19 18:49                                 ` Noriaki TAKAMIYA
  2007-11-21  3:53                                 ` Herbert Xu
  0 siblings, 2 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-19  4:30 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Noriaki TAKAMIYA, David Miller, linux-crypto

[-- Attachment #1: Type: text/plain, Size: 1869 bytes --]

Hi Herbert,

On Sunday 18 November 2007 05:21, Herbert Xu wrote:
> On Wed, Nov 14, 2007 at 02:28:25PM -0700, Denys Vlasenko wrote:
> > I also split this patch into two parts for easier review:
> > camellia5:
> >         adds 64-bit key setup
>
> Sorry but this still duplicates way too much code.  Also key
> setup is the slow path relatively speaking so it's even less
> justifiable.

Oh, Herbert, have a heart! My camellia.c source file is already
smaller than the one I started from; it's not like it has doubled
in size.

The 64-bit key setup is not just faster, it is also ~4k smaller,
and that size benefit is always there, not only while a key is
being set up.

With the attached camellia7 patch, I further reduce the size
of the key setup routines by sharing the code at the end of
them. That is two screenfuls less code.

I hope this makes the code duplication a bit more tolerable.

> > camellia6:
> >         unifies encrypt/decrypt routines for different key lengths.
> >         This reduces module size by ~25%, with tiny (less than 1%)
> >         speed impact.
> >         Also collapses encrypt/decrypt into more readable
> >         (visually shorter) form using macros.

And here is

camellia7:
        Move the "key XOR is end of F-function" code into
        camellia_setup_tail(), since it is sufficiently similar
        between camellia_setup128 and camellia_setup256.
        This shaves off another 542 to 896 bytes, depending on the build:
          dec     hex filename
        21414    53a6 2.6.23.1.camellia6.t/crypto/camellia.o
        20518    5026 2.6.23.1.camellia7.t/crypto/camellia.o
        16355    3fe3 2.6.23.1.camellia6.t64/crypto/camellia.o
        15813    3dc5 2.6.23.1.camellia7.t64/crypto/camellia.o


At the moment I cannot run-test it; I will try to do so ASAP.

Takamiya-san, could you please review the attached patch?

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
-- 
vda

[-- Attachment #2: linux-2.6.23.1.camellia7.diff --]
[-- Type: text/x-diff, Size: 21989 bytes --]

diff -urpN linux-2.6.23.1.camellia6/crypto/camellia.c linux-2.6.23.1.camellia7/crypto/camellia.c
--- linux-2.6.23.1.camellia6/crypto/camellia.c	2007-11-14 11:30:27.000000000 -0800
+++ linux-2.6.23.1.camellia7/crypto/camellia.c	2007-11-18 20:15:19.000000000 -0800
@@ -380,15 +380,80 @@ static const u32 camellia_sp4404[256] = 
 #ifdef __BIG_ENDIAN
 #define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2])
 #define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
 #else
 #define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
 #define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2])
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
 #endif
 
-static void camellia_setup_tail(u64 *subkey, int max)
+static void camellia_setup_tail(u64 *subkey, u64 *sub, int max)
 {
+	u64 t;
 	u32 dw;
-	int i = 2;
+	int i;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	if (max == 24) {
+		SUBKEY(23) = sub[22];     /* round 18 */
+		SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
+	} else { 
+		t = subL(26) ^ (subR(26) & ~subR(24));
+		dw = (u32)t & subL(24); /* FL(kl5) */
+		t = (t << 32) | (subR(26) ^ ROL1(dw));
+		SUBKEY(23) = sub[22] ^ t; /* round 18 */
+		SUBKEY(24) = sub[24];     /* FL(kl5) */
+		SUBKEY(25) = sub[25];     /* FLinv(kl6) */
+		t = subL(23) ^ (subR(23) & ~subR(25));
+		dw = (u32)t & subL(25); /* FLinv(kl6) */
+		t = (t << 32) | (subR(23) ^ ROL1(dw));
+		SUBKEY(26) = t ^ sub[27]; /* round 19 */
+		SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
+		SUBKEY(28) = sub[27] ^ sub[29]; /* round 21 */
+		SUBKEY(29) = sub[28] ^ sub[30]; /* round 22 */
+		SUBKEY(30) = sub[29] ^ sub[31]; /* round 23 */
+		SUBKEY(31) = sub[30];     /* round 24 */
+		SUBKEY(32) = sub[32] ^ sub[31]; /* kw3 */
+	}
+
+	/* apply the inverse of the last half of P-function */
+	i = 2;
 	do {
 		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
 		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
@@ -406,31 +471,21 @@ static void camellia_setup_tail(u64 *sub
 	} while (i < max);
 }
 
-#ifdef __BIG_ENDIAN
-#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
-#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
-#else
-#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
-#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
-#endif
-
 static void camellia_setup128(const unsigned char *key, u64 *subkey)
 {
 	u64 kl, kr;
-	u64 i, t, w;
+	u64 i, w;
 	u64 kw4;
 	u32 dw;
 	u64 sub[26];
 
 	/**
-	 *  k == kl || kr (|| is concatination)
+	 *  k == kl || kr (|| is concatenation)
 	 */
 	GETU64(kl, key     );
 	GETU64(kr, key +  8);
 
-	/**
-	 * generate KL dependent subkeys
-	 */
+	/* generate KL dependent subkeys */
 	/* kw1 */
 	sub[0] = kl;
 	/* kw2 */
@@ -567,60 +622,21 @@ static void camellia_setup128(const unsi
 	/* kw1 */
 	sub[0] ^= kw4;
 
-	/* key XOR is end of F-function */
-	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
-	SUBKEY(2) = sub[3];       /* round 1 */
-	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
-	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
-	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
-	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
-	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = (u32)t & subL(8);  /* FL(kl1) */
-	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
-	SUBKEY(7) = sub[6] ^ t; /* round 6 */
-	SUBKEY(8) = sub[8];       /* FL(kl1) */
-	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
-	t = subL(7) ^ (subR(7) & ~subR(9));
-	dw = (u32)t & subL(9);  /* FLinv(kl2) */
-	t = (t << 32) | (subR(7) ^ ROL1(dw));
-	SUBKEY(10) = t ^ sub[11]; /* round 7 */
-	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
-	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
-	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
-	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
-	t = subL(18) ^ (subR(18) & ~subR(16));
-	dw = (u32)t & subL(16); /* FL(kl3) */
-	t = (t << 32) | (subR(18) ^ ROL1(dw));
-	SUBKEY(15) = sub[14] ^ t; /* round 12 */
-	SUBKEY(16) = sub[16];     /* FL(kl3) */
-	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
-	t = subL(15) ^ (subR(15) & ~subR(17));
-	dw = (u32)t & subL(17); /* FLinv(kl4) */
-	t = (t << 32) | (subR(15) ^ ROL1(dw));
-	SUBKEY(18) = t ^ sub[19]; /* round 13 */
-	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
-	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
-	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
-	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
-	SUBKEY(23) = sub[22];     /* round 18 */
-	SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
-
-	/* apply the inverse of the last half of P-function */
-	camellia_setup_tail(subkey, 24);
+	camellia_setup_tail(subkey, sub, 24);
 }
 
 static void camellia_setup256(const unsigned char *key, u64 *subkey)
 {
 	u64 kl, kr;        /* left half of key */
 	u64 krl, krr;      /* right half of key */
-	u64 i, t, w;       /* temporary variables */
+	u64 i, w;          /* temporary variables */
 	u64 kw4;
 	u32 dw;
 	u64 sub[34];
 
 	/**
 	 *  key = (kl || kr || krl || krr)
-	 *  (|| is concatination)
+	 *  (|| is concatenation)
 	 */
 	GETU64(kl,  key     );
 	GETU64(kr,  key +  8);
@@ -786,8 +802,8 @@ static void camellia_setup256(const unsi
 	/* round 19 */
 	sub[26] ^= kw4;
 	kw4 ^= (u64)((u32)kw4 & ~subR(24)) << 32; //kw4l ^= kw4r & ~subR[24];
-	dw = (u32)(kw4 >> 32) & subL(24),
-		kw4 ^= ROL1(dw); /* modified for FL(kl5) */
+	dw = (u32)(kw4 >> 32) & subL(24);
+	kw4 ^= ROL1(dw); /* modified for FL(kl5) */
 	/* round 17 */
 	sub[22] ^= kw4;
 	/* round 15 */
@@ -795,8 +811,8 @@ static void camellia_setup256(const unsi
 	/* round 13 */
 	sub[18] ^= kw4;
 	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32;
-	dw = (u32)(kw4 >> 32) & subL(16),
-		kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	dw = (u32)(kw4 >> 32) & subL(16);
+	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
 	/* round 11 */
 	sub[14] ^= kw4;
 	/* round 9 */
@@ -804,8 +820,8 @@ static void camellia_setup256(const unsi
 	/* round 7 */
 	sub[10] ^= kw4;
 	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32;
-	dw = (u32)(kw4 >> 32) & subL(8),
-		kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	dw = (u32)(kw4 >> 32) & subL(8);
+	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
 	/* round 5 */
 	sub[6] ^= kw4;
 	/* round 3 */
@@ -815,60 +831,7 @@ static void camellia_setup256(const unsi
 	/* kw1 */
 	sub[0] ^= kw4;
 
-	/* key XOR is end of F-function */
-	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
-	SUBKEY(2) = sub[3];       /* round 1 */
-	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
-	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
-	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
-	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
-	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = (u32)t & subL(8);  /* FL(kl1) */
-	t = (t << 32) | (subR(10) ^ ROL1(dw)); //tr = subR[10] ^ ROL1(dw);
-	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
-	SUBKEY(8) = sub[8];       /* FL(kl1) */
-	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
-	t = subL(7) ^ (subR(7) & ~subR(9));
-	dw = (u32)t & subL(9);  /* FLinv(kl2) */
-	t = (t << 32) | (subR(7) ^ ROL1(dw));
-	SUBKEY(10) = t ^ sub[11]; /* round 7 */
-	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
-	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
-	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
-	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
-	t = subL(18) ^ (subR(18) & ~subR(16));
-	dw = (u32)t & subL(16); /* FL(kl3) */
-	t = (t << 32) | (subR(18) ^ ROL1(dw));
-	SUBKEY(15) = sub[14] ^ t; /* round 12 */
-	SUBKEY(16) = sub[16];     /* FL(kl3) */
-	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
-	t = subL(15) ^ (subR(15) & ~subR(17));
-	dw = (u32)t & subL(17); /* FLinv(kl4) */
-	t = (t << 32) | (subR(15) ^ ROL1(dw));
-	SUBKEY(18) = t ^ sub[19]; /* round 13 */
-	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
-	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
-	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
-	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
-	t = subL(26) ^ (subR(26) & ~subR(24));
-	dw = (u32)t & subL(24); /* FL(kl5) */
-	t = (t << 32) | (subR(26) ^ ROL1(dw));
-	SUBKEY(23) = sub[22] ^ t; /* round 18 */
-	SUBKEY(24) = sub[24];     /* FL(kl5) */
-	SUBKEY(25) = sub[25];     /* FLinv(kl6) */
-	t = subL(23) ^ (subR(23) & ~subR(25));
-	dw = (u32)t & subL(25); /* FLinv(kl6) */
-	t = (t << 32) | (subR(23) ^ ROL1(dw));
-	SUBKEY(26) = t ^ sub[27]; /* round 19 */
-	SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
-	SUBKEY(28) = sub[27] ^ sub[29]; /* round 21 */
-	SUBKEY(29) = sub[28] ^ sub[30]; /* round 22 */
-	SUBKEY(30) = sub[29] ^ sub[31]; /* round 23 */
-	SUBKEY(31) = sub[30];     /* round 24 */
-	SUBKEY(32) = sub[32] ^ sub[31]; /* kw3 */
-
-	/* apply the inverse of the last half of P-function */
-	camellia_setup_tail(subkey, 32);
+	camellia_setup_tail(subkey, sub, 32);
 }
 
 static void camellia_setup192(const unsigned char *key, u64 *subkey)
@@ -967,10 +930,104 @@ typedef const u64 const_key_element;
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
-static void camellia_setup_tail(u32 *subkey, int max)
+static void camellia_setup_tail(u32 *subkey, u32 *subL, u32 *subR, int max)
 {
-	u32 dw;
-	int i = 2;
+	u32 dw, tl, tr;
+	int i;
+
+	/* key XOR is end of F-function */
+	SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
+	SUBKEY_R(0) = subR[0] ^ subR[2];
+	SUBKEY_L(2) = subL[3];       /* round 1 */
+	SUBKEY_R(2) = subR[3];
+	SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
+	SUBKEY_R(3) = subR[2] ^ subR[4];
+	SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
+	SUBKEY_R(4) = subR[3] ^ subR[5];
+	SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
+	SUBKEY_R(5) = subR[4] ^ subR[6];
+	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
+	SUBKEY_R(6) = subR[5] ^ subR[7];
+	tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = tl & subL[8],  /* FL(kl1) */
+		tr = subR[10] ^ ROL1(dw);
+	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
+	SUBKEY_R(7) = subR[6] ^ tr;
+	SUBKEY_L(8) = subL[8];       /* FL(kl1) */
+	SUBKEY_R(8) = subR[8];
+	SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
+	SUBKEY_R(9) = subR[9];
+	tl = subL[7] ^ (subR[7] & ~subR[9]);
+	dw = tl & subL[9],  /* FLinv(kl2) */
+		tr = subR[7] ^ ROL1(dw);
+	SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
+	SUBKEY_R(10) = tr ^ subR[11];
+	SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
+	SUBKEY_R(11) = subR[10] ^ subR[12];
+	SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
+	SUBKEY_R(12) = subR[11] ^ subR[13];
+	SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
+	SUBKEY_R(13) = subR[12] ^ subR[14];
+	SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
+	SUBKEY_R(14) = subR[13] ^ subR[15];
+	tl = subL[18] ^ (subR[18] & ~subR[16]);
+	dw = tl & subL[16], /* FL(kl3) */
+		tr = subR[18] ^ ROL1(dw);
+	SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
+	SUBKEY_R(15) = subR[14] ^ tr;
+	SUBKEY_L(16) = subL[16];     /* FL(kl3) */
+	SUBKEY_R(16) = subR[16];
+	SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
+	SUBKEY_R(17) = subR[17];
+	tl = subL[15] ^ (subR[15] & ~subR[17]);
+	dw = tl & subL[17], /* FLinv(kl4) */
+		tr = subR[15] ^ ROL1(dw);
+	SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
+	SUBKEY_R(18) = tr ^ subR[19];
+	SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
+	SUBKEY_R(19) = subR[18] ^ subR[20];
+	SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
+	SUBKEY_R(20) = subR[19] ^ subR[21];
+	SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
+	SUBKEY_R(21) = subR[20] ^ subR[22];
+	SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
+	SUBKEY_R(22) = subR[21] ^ subR[23];
+	if (max == 24) {
+		SUBKEY_L(23) = subL[22];     /* round 18 */
+		SUBKEY_R(23) = subR[22];
+		SUBKEY_L(24) = subL[24] ^ subL[23]; /* kw3 */
+		SUBKEY_R(24) = subR[24] ^ subR[23];
+	} else {
+		tl = subL[26] ^ (subR[26] & ~subR[24]);
+		dw = tl & subL[24], /* FL(kl5) */
+			tr = subR[26] ^ ROL1(dw);
+		SUBKEY_L(23) = subL[22] ^ tl; /* round 18 */
+		SUBKEY_R(23) = subR[22] ^ tr;
+		SUBKEY_L(24) = subL[24];     /* FL(kl5) */
+		SUBKEY_R(24) = subR[24];
+		SUBKEY_L(25) = subL[25];     /* FLinv(kl6) */
+		SUBKEY_R(25) = subR[25];
+		tl = subL[23] ^ (subR[23] & ~subR[25]);
+		dw = tl & subL[25], /* FLinv(kl6) */
+			tr = subR[23] ^ ROL1(dw);
+		SUBKEY_L(26) = tl ^ subL[27]; /* round 19 */
+		SUBKEY_R(26) = tr ^ subR[27];
+		SUBKEY_L(27) = subL[26] ^ subL[28]; /* round 20 */
+		SUBKEY_R(27) = subR[26] ^ subR[28];
+		SUBKEY_L(28) = subL[27] ^ subL[29]; /* round 21 */
+		SUBKEY_R(28) = subR[27] ^ subR[29];
+		SUBKEY_L(29) = subL[28] ^ subL[30]; /* round 22 */
+		SUBKEY_R(29) = subR[28] ^ subR[30];
+		SUBKEY_L(30) = subL[29] ^ subL[31]; /* round 23 */
+		SUBKEY_R(30) = subR[29] ^ subR[31];
+		SUBKEY_L(31) = subL[30];     /* round 24 */
+		SUBKEY_R(31) = subR[30];
+		SUBKEY_L(32) = subL[32] ^ subL[31]; /* kw3 */
+		SUBKEY_R(32) = subR[32] ^ subR[31];
+	}
+
+	/* apply the inverse of the last half of P-function */
+	i = 2;
 	do {
 		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
 		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
@@ -992,21 +1049,19 @@ static void camellia_setup128(const unsi
 {
 	u32 kll, klr, krl, krr;
 	u32 il, ir, t0, t1, w0, w1;
-	u32 kw4l, kw4r, dw, tl, tr;
+	u32 kw4l, kw4r, dw;
 	u32 subL[26];
 	u32 subR[26];
 
 	/**
-	 *  k == kll || klr || krl || krr (|| is concatination)
+	 *  k == kll || klr || krl || krr (|| is concatenation)
 	 */
 	GETU32(kll, key     );
 	GETU32(klr, key +  4);
 	GETU32(krl, key +  8);
 	GETU32(krr, key + 12);
 
-	/**
-	 * generate KL dependent subkeys
-	 */
+	/* generate KL dependent subkeys */
 	/* kw1 */
 	subL[0] = kll; subR[0] = klr;
 	/* kw2 */
@@ -1151,70 +1206,7 @@ static void camellia_setup128(const unsi
 	/* kw1 */
 	subL[0] ^= kw4l; subR[0] ^= kw4r;
 
-	/* key XOR is end of F-function */
-	SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
-	SUBKEY_R(0) = subR[0] ^ subR[2];
-	SUBKEY_L(2) = subL[3];       /* round 1 */
-	SUBKEY_R(2) = subR[3];
-	SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
-	SUBKEY_R(3) = subR[2] ^ subR[4];
-	SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
-	SUBKEY_R(4) = subR[3] ^ subR[5];
-	SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
-	SUBKEY_R(5) = subR[4] ^ subR[6];
-	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
-	SUBKEY_R(6) = subR[5] ^ subR[7];
-	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
-		tr = subR[10] ^ ROL1(dw);
-	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
-	SUBKEY_R(7) = subR[6] ^ tr;
-	SUBKEY_L(8) = subL[8];       /* FL(kl1) */
-	SUBKEY_R(8) = subR[8];
-	SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
-	SUBKEY_R(9) = subR[9];
-	tl = subL[7] ^ (subR[7] & ~subR[9]);
-	dw = tl & subL[9],  /* FLinv(kl2) */
-		tr = subR[7] ^ ROL1(dw);
-	SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
-	SUBKEY_R(10) = tr ^ subR[11];
-	SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
-	SUBKEY_R(11) = subR[10] ^ subR[12];
-	SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
-	SUBKEY_R(12) = subR[11] ^ subR[13];
-	SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
-	SUBKEY_R(13) = subR[12] ^ subR[14];
-	SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
-	SUBKEY_R(14) = subR[13] ^ subR[15];
-	tl = subL[18] ^ (subR[18] & ~subR[16]);
-	dw = tl & subL[16], /* FL(kl3) */
-		tr = subR[18] ^ ROL1(dw);
-	SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
-	SUBKEY_R(15) = subR[14] ^ tr;
-	SUBKEY_L(16) = subL[16];     /* FL(kl3) */
-	SUBKEY_R(16) = subR[16];
-	SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
-	SUBKEY_R(17) = subR[17];
-	tl = subL[15] ^ (subR[15] & ~subR[17]);
-	dw = tl & subL[17], /* FLinv(kl4) */
-		tr = subR[15] ^ ROL1(dw);
-	SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
-	SUBKEY_R(18) = tr ^ subR[19];
-	SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
-	SUBKEY_R(19) = subR[18] ^ subR[20];
-	SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
-	SUBKEY_R(20) = subR[19] ^ subR[21];
-	SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
-	SUBKEY_R(21) = subR[20] ^ subR[22];
-	SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
-	SUBKEY_R(22) = subR[21] ^ subR[23];
-	SUBKEY_L(23) = subL[22];     /* round 18 */
-	SUBKEY_R(23) = subR[22];
-	SUBKEY_L(24) = subL[24] ^ subL[23]; /* kw3 */
-	SUBKEY_R(24) = subR[24] ^ subR[23];
-
-	/* apply the inverse of the last half of P-function */
-	camellia_setup_tail(subkey, 24);
+	camellia_setup_tail(subkey, subL, subR, 24);
 }
 
 static void camellia_setup256(const unsigned char *key, u32 *subkey)
@@ -1222,13 +1214,13 @@ static void camellia_setup256(const unsi
 	u32 kll, klr, krl, krr;        /* left half of key */
 	u32 krll, krlr, krrl, krrr;    /* right half of key */
 	u32 il, ir, t0, t1, w0, w1;    /* temporary variables */
-	u32 kw4l, kw4r, dw, tl, tr;
+	u32 kw4l, kw4r, dw;
 	u32 subL[34];
 	u32 subR[34];
 
 	/**
 	 *  key = (kll || klr || krl || krr || krll || krlr || krrl || krrr)
-	 *  (|| is concatination)
+	 *  (|| is concatenation)
 	 */
 	GETU32(kll,  key     );
 	GETU32(klr,  key +  4);
@@ -1439,92 +1431,7 @@ static void camellia_setup256(const unsi
 	/* kw1 */
 	subL[0] ^= kw4l; subR[0] ^= kw4r;
 
-	/* key XOR is end of F-function */
-	SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
-	SUBKEY_R(0) = subR[0] ^ subR[2];
-	SUBKEY_L(2) = subL[3];       /* round 1 */
-	SUBKEY_R(2) = subR[3];
-	SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
-	SUBKEY_R(3) = subR[2] ^ subR[4];
-	SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
-	SUBKEY_R(4) = subR[3] ^ subR[5];
-	SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
-	SUBKEY_R(5) = subR[4] ^ subR[6];
-	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
-	SUBKEY_R(6) = subR[5] ^ subR[7];
-	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
-		tr = subR[10] ^ ROL1(dw);
-	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
-	SUBKEY_R(7) = subR[6] ^ tr;
-	SUBKEY_L(8) = subL[8];       /* FL(kl1) */
-	SUBKEY_R(8) = subR[8];
-	SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
-	SUBKEY_R(9) = subR[9];
-	tl = subL[7] ^ (subR[7] & ~subR[9]);
-	dw = tl & subL[9],  /* FLinv(kl2) */
-		tr = subR[7] ^ ROL1(dw);
-	SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
-	SUBKEY_R(10) = tr ^ subR[11];
-	SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
-	SUBKEY_R(11) = subR[10] ^ subR[12];
-	SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
-	SUBKEY_R(12) = subR[11] ^ subR[13];
-	SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
-	SUBKEY_R(13) = subR[12] ^ subR[14];
-	SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
-	SUBKEY_R(14) = subR[13] ^ subR[15];
-	tl = subL[18] ^ (subR[18] & ~subR[16]);
-	dw = tl & subL[16], /* FL(kl3) */
-		tr = subR[18] ^ ROL1(dw);
-	SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
-	SUBKEY_R(15) = subR[14] ^ tr;
-	SUBKEY_L(16) = subL[16];     /* FL(kl3) */
-	SUBKEY_R(16) = subR[16];
-	SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
-	SUBKEY_R(17) = subR[17];
-	tl = subL[15] ^ (subR[15] & ~subR[17]);
-	dw = tl & subL[17], /* FLinv(kl4) */
-		tr = subR[15] ^ ROL1(dw);
-	SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
-	SUBKEY_R(18) = tr ^ subR[19];
-	SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
-	SUBKEY_R(19) = subR[18] ^ subR[20];
-	SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
-	SUBKEY_R(20) = subR[19] ^ subR[21];
-	SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
-	SUBKEY_R(21) = subR[20] ^ subR[22];
-	SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
-	SUBKEY_R(22) = subR[21] ^ subR[23];
-	tl = subL[26] ^ (subR[26] & ~subR[24]);
-	dw = tl & subL[24], /* FL(kl5) */
-		tr = subR[26] ^ ROL1(dw);
-	SUBKEY_L(23) = subL[22] ^ tl; /* round 18 */
-	SUBKEY_R(23) = subR[22] ^ tr;
-	SUBKEY_L(24) = subL[24];     /* FL(kl5) */
-	SUBKEY_R(24) = subR[24];
-	SUBKEY_L(25) = subL[25];     /* FLinv(kl6) */
-	SUBKEY_R(25) = subR[25];
-	tl = subL[23] ^ (subR[23] & ~subR[25]);
-	dw = tl & subL[25], /* FLinv(kl6) */
-		tr = subR[23] ^ ROL1(dw);
-	SUBKEY_L(26) = tl ^ subL[27]; /* round 19 */
-	SUBKEY_R(26) = tr ^ subR[27];
-	SUBKEY_L(27) = subL[26] ^ subL[28]; /* round 20 */
-	SUBKEY_R(27) = subR[26] ^ subR[28];
-	SUBKEY_L(28) = subL[27] ^ subL[29]; /* round 21 */
-	SUBKEY_R(28) = subR[27] ^ subR[29];
-	SUBKEY_L(29) = subL[28] ^ subL[30]; /* round 22 */
-	SUBKEY_R(29) = subR[28] ^ subR[30];
-	SUBKEY_L(30) = subL[29] ^ subL[31]; /* round 23 */
-	SUBKEY_R(30) = subR[29] ^ subR[31];
-	SUBKEY_L(31) = subL[30];     /* round 24 */
-	SUBKEY_R(31) = subR[30];
-	SUBKEY_L(32) = subL[32] ^ subL[31]; /* kw3 */
-	SUBKEY_R(32) = subR[32] ^ subR[31];
-
-	/* apply the inverse of the last half of P-function */
-	camellia_setup_tail(subkey, 32);
+	camellia_setup_tail(subkey, subL, subR, 32);
 }
 
 static void camellia_setup192(const unsigned char *key, u32 *subkey)

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-19  4:30                               ` Denys Vlasenko
@ 2007-11-19 18:49                                 ` Noriaki TAKAMIYA
  2007-11-21  2:44                                   ` Denys Vlasenko
  2007-11-21  3:53                                 ` Herbert Xu
  1 sibling, 1 reply; 40+ messages in thread
From: Noriaki TAKAMIYA @ 2007-11-19 18:49 UTC (permalink / raw)
  To: vda.linux; +Cc: herbert, davem, linux-crypto

Hi,

>> Sun, 18 Nov 2007 20:30:16 -0800
>> [Subject: Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization]
>> Denys Vlasenko <vda.linux@googlemail.com> wrote...

> > > camellia6:
> > >         unifies encrypt/decrypt routines for different key lengths.
> > >         This reduces module size by ~25%, with tiny (less than 1%)
> > >         speed impact.
> > >         Also collapses encrypt/decrypt into more readable
> > >         (visually shorter) form using macros.
> 
> And here is
> 
> camellia7:
>         Move "key XOR is end of F-function" code part into
>         camellia_setup_tail(), it is sufficiently similar
>         between camellia_setup128 and camellia_setup256.
>         This shaves off another ~1k:
>           dec     hex filename
>         21414    53a6 2.6.23.1.camellia6.t/crypto/camellia.o
>         20518    5026 2.6.23.1.camellia7.t/crypto/camellia.o
>         16355    3fe3 2.6.23.1.camellia6.t64/crypto/camellia.o
>         15813    3dc5 2.6.23.1.camellia7.t64/crypto/camellia.o
> 
> 
> At the moment I cannot run-test it; I will try to do so ASAP.
> 
> Takamiya-san, can you review attached patch please?
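
The camellia7 change works because the "key XOR is end of F-function" step follows the same fixed pattern for both key sizes. Most entries are sub[i - 1] ^ sub[i + 1]. The FL/FLinv subkeys are folded in next to rounds 6/7 and 12/13, and also next to rounds 18/19 for 192/256-bit keys. Only the last few entries differ, and the `max` parameter selects between them.

Below is a minimal userspace sketch of that idea. It uses the 64-bit representation from camellia5 and assumes, as the NB comment in that patch says, that L is the most significant 32-bit half of each u64. The names fl_fold(), xor_neighbours() and key_xor_tail() are hypothetical illustrations, not the functions in the attached patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fold an FL/FLinv subkey into a neighbouring
 * subkey, mirroring "l ^= r & ~keyR; dw = l & keyL; r ^= ROL1(dw);".
 * Assumes L is the most significant 32-bit half of each value. */
static uint64_t fl_fold(uint64_t k, uint64_t fl_key)
{
	uint32_t l = (uint32_t)(k >> 32), r = (uint32_t)k;
	uint32_t key_l = (uint32_t)(fl_key >> 32), key_r = (uint32_t)fl_key;
	uint32_t dw;

	l ^= r & ~key_r;
	dw = l & key_l;
	r ^= (dw << 1) | (dw >> 31);	/* ROL1 on a 32-bit word */
	return ((uint64_t)l << 32) | r;
}

/* Plain rounds: subkey[i] = sub[i - 1] ^ sub[i + 1] for first..last. */
static void xor_neighbours(uint64_t *subkey, const uint64_t *sub,
			   int first, int last)
{
	for (int i = first; i <= last; i++)
		subkey[i] = sub[i - 1] ^ sub[i + 1];
}

/* Hypothetical shared "key XOR" stage. max is 24 (128-bit key) or
 * 32 (192/256-bit key). subkey[1] is not written, as in the patch. */
static void key_xor_tail(uint64_t *subkey, const uint64_t *sub, int max)
{
	assert(max == 24 || max == 32);

	subkey[0] = sub[0] ^ sub[2];				/* kw1 */
	subkey[2] = sub[3];					/* round 1 */
	xor_neighbours(subkey, sub, 3, 6);			/* rounds 2-5 */
	subkey[7] = sub[6] ^ fl_fold(sub[10], sub[8]);		/* round 6 */
	subkey[8] = sub[8];					/* FL(kl1) */
	subkey[9] = sub[9];					/* FLinv(kl2) */
	subkey[10] = fl_fold(sub[7], sub[9]) ^ sub[11];		/* round 7 */
	xor_neighbours(subkey, sub, 11, 14);			/* rounds 8-11 */
	subkey[15] = sub[14] ^ fl_fold(sub[18], sub[16]);	/* round 12 */
	subkey[16] = sub[16];					/* FL(kl3) */
	subkey[17] = sub[17];					/* FLinv(kl4) */
	subkey[18] = fl_fold(sub[15], sub[17]) ^ sub[19];	/* round 13 */
	xor_neighbours(subkey, sub, 19, 22);			/* rounds 14-17 */

	if (max == 24) {
		subkey[23] = sub[22];				/* round 18 */
		subkey[24] = sub[24] ^ sub[23];			/* kw3 */
		return;
	}
	subkey[23] = sub[22] ^ fl_fold(sub[26], sub[24]);	/* round 18 */
	subkey[24] = sub[24];					/* FL(kl5) */
	subkey[25] = sub[25];					/* FLinv(kl6) */
	subkey[26] = fl_fold(sub[23], sub[25]) ^ sub[27];	/* round 19 */
	xor_neighbours(subkey, sub, 27, 30);			/* rounds 20-23 */
	subkey[31] = sub[30];					/* round 24 */
	subkey[32] = sub[32] ^ sub[31];				/* kw3 */
}
```

In this sketch the 128-bit and 256-bit schedules share everything up to round 17. This is the duplication that camellia7 removes from camellia_setup128() and camellia_setup256().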

  Sorry for the late reply.

  I assume you're testing it now :-). If the speed impact is less than 1%
  as you say, I think it is acceptable.

  The smaller the code, the easier it is to enable camellia in
  embedded systems.

  Regards,

Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>

--
Noriaki TAKAMIYA

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-19 18:49                                 ` Noriaki TAKAMIYA
@ 2007-11-21  2:44                                   ` Denys Vlasenko
  0 siblings, 0 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-21  2:44 UTC (permalink / raw)
  To: Noriaki TAKAMIYA; +Cc: herbert, davem, linux-crypto

[-- Attachment #1: Type: text/plain, Size: 2882 bytes --]

Hi Herbert, Noriaki,

On Monday 19 November 2007 10:49, Noriaki TAKAMIYA wrote:
> > > > camellia6:
> > > >         unifies encrypt/decrypt routines for different key lengths.
> > > >         This reduces module size by ~25%, with tiny (less than 1%)
> > > >         speed impact.
> > > >         Also collapses encrypt/decrypt into more readable
> > > >         (visually shorter) form using macros.
> >
> > And here is
> >
> > camellia7:
> >         Move "key XOR is end of F-function" code part into
> >         camellia_setup_tail(), it is sufficiently similar
> >         between camellia_setup128 and camellia_setup256.
> >         This shaves off another ~1k:
> >           dec     hex filename
> >         21414    53a6 2.6.23.1.camellia6.t/crypto/camellia.o
> >         20518    5026 2.6.23.1.camellia7.t/crypto/camellia.o
> >         16355    3fe3 2.6.23.1.camellia6.t64/crypto/camellia.o
> >         15813    3dc5 2.6.23.1.camellia7.t64/crypto/camellia.o
> >
> >
> > At the moment I cannot run-test it; I will try to do so ASAP.
> >
> > Takamiya-san, can you review attached patch please?
>
>   Sorry for the late reply.
>
>   I assume you're testing it now :-). If the speed impact is less than 1%
>   as you say, I think it is acceptable.

Actually, I only tested it now. I explored the wonders of Ubuntu
package management. Why doesn't it rebuild the module
when I do "touch camellia.c" and then run their magic stuff?
I used to suspect that these .deb, .rpm, .whatever are evil,
but that changed today. Now I *know* it. ;)

Back to patches.

It turns out that two more key setup stages can easily be folded into
the common "tail". The attached patches do this.

I'm also attaching all previous patches not yet ACKed by Herbert
(patches 5, 6, 7).

camellia8:
        Analogously to the camellia7 patch, move the
        "absorb kw2 to other subkeys" and "absorb kw4 to other subkeys"
        code parts into camellia_setup_tail(). This further reduces
        source and object code size at the cost of two branches
        in the key setup code.
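
The two "absorb" stages that camellia8 moves into camellia_setup_tail() differ between key sizes only in whether the kl5/kl6 layer exists. In the sketch below, that difference is where the two branches come from, one per stage. The sketch makes the same assumptions as camellia5's 64-bit code: L is the most significant half, and max is 24 for 128-bit keys and 32 otherwise. absorb_kw2(), absorb_kw4() and their helpers are hypothetical names, not the patch's code.

There is one deliberate difference from the patch. Here kw2 is folded into a local copy, so sub[1] is left untouched, whereas the patch modifies subL(1)/subR(1) in place. This does not matter for the result, because nothing reads sub[1] afterwards in the code shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: FL/FLinv folding step (L = most significant half). */
static uint64_t fl_fold(uint64_t k, uint64_t fl_key)
{
	uint32_t l = (uint32_t)(k >> 32), r = (uint32_t)k;
	uint32_t key_l = (uint32_t)(fl_key >> 32), key_r = (uint32_t)fl_key;
	uint32_t dw;

	l ^= r & ~key_r;
	dw = l & key_l;
	r ^= (dw << 1) | (dw >> 31);	/* ROL1 on a 32-bit word */
	return ((uint64_t)l << 32) | r;
}

/* XOR v into sub[first], sub[first + 2], ..., sub[last]. */
static void xor_every_other(uint64_t *sub, uint64_t v, int first, int last)
{
	for (int i = first; i <= last; i += 2)
		sub[i] ^= v;
}

/* Hypothetical shared "absorb kw2 to other subkeys" stage. */
static void absorb_kw2(uint64_t *sub, int max)
{
	uint64_t kw2 = sub[1];

	assert(max == 24 || max == 32);
	xor_every_other(sub, kw2, 3, 7);	/* rounds 2, 4, 6 */
	kw2 = fl_fold(kw2, sub[9]);		/* FLinv(kl2) */
	xor_every_other(sub, kw2, 11, 15);	/* rounds 8, 10, 12 */
	kw2 = fl_fold(kw2, sub[17]);		/* FLinv(kl4) */
	xor_every_other(sub, kw2, 19, 23);	/* rounds 14, 16, 18 */
	if (max == 32) {			/* branch 1: kl6 layer */
		kw2 = fl_fold(kw2, sub[25]);	/* FLinv(kl6) */
		xor_every_other(sub, kw2, 27, 31); /* rounds 20, 22, 24 */
	}
	sub[max] ^= kw2;			/* kw3 */
}

/* Hypothetical shared "absorb kw4 to other subkeys" stage. */
static void absorb_kw4(uint64_t *sub, int max)
{
	uint64_t kw4 = sub[max + 1];

	assert(max == 24 || max == 32);
	if (max == 32) {			/* branch 2: kl5 layer */
		xor_every_other(sub, kw4, 26, 30); /* rounds 19, 21, 23 */
		kw4 = fl_fold(kw4, sub[24]);	/* FL(kl5) */
	}
	xor_every_other(sub, kw4, 18, 22);	/* rounds 13, 15, 17 */
	kw4 = fl_fold(kw4, sub[16]);		/* FL(kl3) */
	xor_every_other(sub, kw4, 10, 14);	/* rounds 7, 9, 11 */
	kw4 = fl_fold(kw4, sub[8]);		/* FL(kl1) */
	xor_every_other(sub, kw4, 0, 6);	/* kw1, rounds 1, 3, 5 */
}
```

Both stages must run before the "key XOR" stage, in the order kw2 then kw4, as in camellia_setup128()/camellia_setup256().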

Code sizes (starting from the state with patches 1-4 already applied):
64-bit:
dec      hex   filename
22786    5902  2.6.23.1.camellia4.t64/crypto/camellia.o
21422    53ae  2.6.23.1.camellia5.t64/crypto/camellia.o
16355    3fe3  2.6.23.1.camellia6.t64/crypto/camellia.o
15813    3dc5  2.6.23.1.camellia7.t64/crypto/camellia.o
15670    3d36  2.6.23.1.camellia8.t64/crypto/camellia.o
32-bit:
29948    74fc  2.6.23.1.camellia4.t/crypto/camellia.o
29457    7311  2.6.23.1.camellia5.t/crypto/camellia.o
21414    53a6  2.6.23.1.camellia6.t/crypto/camellia.o
20518    5026  2.6.23.1.camellia7.t/crypto/camellia.o
18454    4816  2.6.23.1.camellia8.t/crypto/camellia.o
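
Relative to camellia4, the series shrinks camellia.o by roughly 31% on 64-bit (22786 to 15670 bytes) and 38% on 32-bit (29948 to 18454 bytes). The small C helper below recomputes those figures from the dec column above. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* "dec" column of the size table above: camellia.o after each patch. */
struct camellia_size {
	const char *patch;
	long size64;	/* .t64 (64-bit) build */
	long size32;	/* .t (32-bit) build */
};

static const struct camellia_size sizes[] = {
	{ "camellia4", 22786, 29948 },
	{ "camellia5", 21422, 29457 },
	{ "camellia6", 16355, 21414 },
	{ "camellia7", 15813, 20518 },
	{ "camellia8", 15670, 18454 },
};

#define NSIZES (sizeof(sizes) / sizeof(sizes[0]))

/* Reduction from before to after, in tenths of a percent (truncated). */
static long shrink_permille(long before, long after)
{
	assert(before > 0);
	return (before - after) * 1000 / before;
}

/* Whole-series reduction, camellia4 -> camellia8. */
static long total_shrink_permille(int use_64bit)
{
	const struct camellia_size *first = &sizes[0];
	const struct camellia_size *last = &sizes[NSIZES - 1];

	return use_64bit ? shrink_permille(first->size64, last->size64)
			 : shrink_permille(first->size32, last->size32);
}
```

The results are 312 and 383 per mille, matching the 31% and 38% figures above. The camellia8 step on its own saves 143 bytes on 64-bit but 2064 bytes on 32-bit.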

The code is compile-tested for 32- and 64-bit x86 and run-tested on 32-bit
(all tcrypt tests pass).

Herbert, please let me know what you think about them.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
-- 
vda


[-- Attachment #2: linux-2.6.23.1.camellia5.diff --]
[-- Type: text/x-diff, Size: 32570 bytes --]

diff -urpN linux-2.6.23.1.camellia/crypto/camellia.c linux-2.6.23.1.camellia5/crypto/camellia.c
--- linux-2.6.23.1.camellia/crypto/camellia.c	2007-11-14 11:30:27.000000000 -0800
+++ linux-2.6.23.1.camellia5/crypto/camellia.c	2007-11-14 11:30:27.000000000 -0800
@@ -310,6 +310,589 @@ static const u32 camellia_sp4404[256] = 
 #define CAMELLIA_BLOCK_SIZE          16
 #define CAMELLIA_TABLE_BYTE_LEN     272
 
+/*
+ * NB: L and R below stand for 'left' and 'right' as in written numbers.
+ * That is, in (xxxL,xxxR) pair xxxL holds most significant digits,
+ * _not_ least significant ones!
+ */
+
+
+
+#if BITS_PER_LONG >= 64
+
+/*
+ * Key setup implementation with mostly 64-bit ops
+ */
+
+/* key constants */
+
+#define CAMELLIA_SIGMA1 (0xA09E667F3BCC908B)
+#define CAMELLIA_SIGMA2 (0xB67AE8584CAA73B2)
+#define CAMELLIA_SIGMA3 (0xC6EF372FE94F82BE)
+#define CAMELLIA_SIGMA4 (0x54FF53A5F1D36F1C)
+#define CAMELLIA_SIGMA5 (0x10E527FADE682D1D)
+#define CAMELLIA_SIGMA6 (0xB05688C2B3E6C1FD)
+
+/*
+ *  macros
+ */
+#define GETU64(v, pt) \
+    do { \
+	/* latest breed of gcc is clever enough to use move */ \
+	memcpy(&(v), (pt), 8); \
+	(v) = be64_to_cpu(v); \
+    } while(0)
+
+/* rotation right shift 1byte */
+#define ROR8(x) (((x) >> 8) + ((x) << (sizeof(x)*8 - 8)))
+/* rotation left shift 1bit */
+#define ROL1(x) (((x) << 1) + ((x) >> (sizeof(x)*8 - 1)))
+/* rotation left shift 1byte */
+#define ROL8(x) (((x) << 8) + ((x) >> (sizeof(x)*8 - 8)))
+
+#define ROLDQ(l, r, w, bits)				\
+    do {						\
+	w = l;						\
+	l = (l << bits) + (r >> (64 - bits));		\
+	r = (r << bits) + (w >> (64 - bits));		\
+    } while(0)
+
+#define CAMELLIA_F(x, k, y, i)					\
+    do {							\
+	u32 yl, yr;						\
+	i = x ^ k;						\
+	yl = camellia_sp1110[(u8)i]				\
+	   ^ camellia_sp0222[(u8)(i >> 24)]			\
+	   ^ camellia_sp3033[(u8)(i >> 16)]			\
+	   ^ camellia_sp4404[(u8)(i >> 8)];			\
+	yr = camellia_sp1110[    (i >> 56)]			\
+	   ^ camellia_sp0222[(u8)(i >> 48)]			\
+	   ^ camellia_sp3033[(u8)(i >> 40)]			\
+	   ^ camellia_sp4404[(u8)(i >> 32)];			\
+	yl ^= yr;						\
+	yr = ROR8(yr);						\
+	yr ^= yl;						\
+	y = ((u64)yl << 32) + yr;				\
+    } while(0)
+
+#define SUBKEY(INDEX) (subkey[(INDEX)])
+
+#ifdef __BIG_ENDIAN
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#else
+#define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2])
+#endif
+
+static void camellia_setup_tail(u64 *subkey, int max)
+{
+	u32 dw;
+	int i = 2;
+	do {
+		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
+		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
+		dw = SUBKEY_L(i + 1) ^ SUBKEY_R(i + 1); dw = ROL8(dw);/* round 2 */
+		SUBKEY_R(i + 1) = SUBKEY_L(i + 1) ^ dw; SUBKEY_L(i + 1) = dw;
+		dw = SUBKEY_L(i + 2) ^ SUBKEY_R(i + 2); dw = ROL8(dw);/* round 3 */
+		SUBKEY_R(i + 2) = SUBKEY_L(i + 2) ^ dw; SUBKEY_L(i + 2) = dw;
+		dw = SUBKEY_L(i + 3) ^ SUBKEY_R(i + 3); dw = ROL8(dw);/* round 4 */
+		SUBKEY_R(i + 3) = SUBKEY_L(i + 3) ^ dw; SUBKEY_L(i + 3) = dw;
+		dw = SUBKEY_L(i + 4) ^ SUBKEY_R(i + 4); dw = ROL8(dw);/* round 5 */
+		SUBKEY_R(i + 4) = SUBKEY_L(i + 4) ^ dw; SUBKEY_L(i + 4) = dw;
+		dw = SUBKEY_L(i + 5) ^ SUBKEY_R(i + 5); dw = ROL8(dw);/* round 6 */
+		SUBKEY_R(i + 5) = SUBKEY_L(i + 5) ^ dw; SUBKEY_L(i + 5) = dw;
+		i += 8;
+	} while (i < max);
+}
+
+#ifdef __BIG_ENDIAN
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#else
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
+#endif
+
+static void camellia_setup128(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;
+	u64 i, t, w;
+	u64 kw4;
+	u32 dw;
+	u64 sub[26];
+
+	/**
+	 *  k == kl || kr (|| is concatination)
+	 */
+	GETU64(kl, key     );
+	GETU64(kr, key +  8);
+
+	/**
+	 * generate KL dependent subkeys
+	 */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	/* rotation left shift 15bit */
+	ROLDQ(kl, kr, w, 15);
+	/* k3 */
+	sub[4] = kl;
+	/* k4 */
+	sub[5] = kr;
+	/* rotation left shift 15+30bit */
+	ROLDQ(kl, kr, w, 30);
+	/* k7 */
+	sub[10] = kl;
+	/* k8 */
+	sub[11] = kr;
+	/* rotation left shift 15+30+15bit */
+	ROLDQ(kl, kr, w, 15);
+	/* k10 */
+	sub[13] = kr;
+	/* rotation left shift 15+30+15+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	/* rotation left shift 15+30+15+17+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* k13 */
+	sub[18] = kl;
+	/* k14 */
+	sub[19] = kr;
+	/* rotation left shift 15+30+15+17+17+17 bit */
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+
+	/* generate KA */
+	kl = sub[0];
+	kr = sub[1];
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	/* current status == (kl, w) */
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KA dependent subkeys */
+	/* k1, k2 */
+	sub[2] = kl;
+	sub[3] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k5,k6 */
+	sub[6] = kl;
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl1, kl2 */
+	sub[8] = kl;
+	sub[9] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* k9 */
+	sub[12] = kl;
+	ROLDQ(kl, kr, w, 15);
+	/* k11, k12 */
+	sub[14] = kl;
+	sub[15] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k15, k16 */
+	sub[20] = kl;
+	sub[21] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* kw3, kw4 */
+	sub[24] = kl;
+	sub[25] = kr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	/* kw3 */
+	sub[24] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[25];
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32; //kw4l ^= kw4r & ~subR(16);
+	dw = (u32)(kw4 >> 32) & subL(16); // kw4l & subL[16],
+	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32; //kw4l ^= kw4r & ~subR[8];
+	dw = (u32)(kw4 >> 32) & subL(8);
+	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t; /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	SUBKEY(23) = sub[22];     /* round 18 */
+	SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 24);
+}
+
+static void camellia_setup256(const unsigned char *key, u64 *subkey)
+{
+	u64 kl, kr;        /* left half of key */
+	u64 krl, krr;      /* right half of key */
+	u64 i, t, w;       /* temporary variables */
+	u64 kw4;
+	u32 dw;
+	u64 sub[34];
+
+	/**
+	 *  key = (kl || kr || krl || krr)
+	 *  (|| is concatination)
+	 */
+	GETU64(kl,  key     );
+	GETU64(kr,  key +  8);
+	GETU64(krl, key + 16);
+	GETU64(krr, key + 24);
+
+	/* generate KL dependent subkeys */
+	/* kw1 */
+	sub[0] = kl;
+	/* kw2 */
+	sub[1] = kr;
+	ROLDQ(kl, kr, w, 45);
+	/* k9 */
+	sub[12] = kl;
+	/* k10 */
+	sub[13] = kr;
+	ROLDQ(kl, kr, w, 15);
+	/* kl3 */
+	sub[16] = kl;
+	/* kl4 */
+	sub[17] = kr;
+	ROLDQ(kl, kr, w, 17);
+	/* k17 */
+	sub[22] = kl;
+	/* k18 */
+	sub[23] = kr;
+	ROLDQ(kl, kr, w, 34);
+	/* k23 */
+	sub[30] = kl;
+	/* k24 */
+	sub[31] = kr;
+
+	/* generate KR dependent subkeys */
+	ROLDQ(krl, krr, w, 15);
+	/* k3 */
+	sub[4] = krl;
+	/* k4 */
+	sub[5] = krr;
+	ROLDQ(krl, krr, w, 15);
+	/* kl1 */
+	sub[8] = krl;
+	/* kl2 */
+	sub[9] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k13 */
+	sub[18] = krl;
+	/* k14 */
+	sub[19] = krr;
+	ROLDQ(krl, krr, w, 34);
+	/* k19 */
+	sub[26] = krl;
+	/* k20 */
+	sub[27] = krr;
+	ROLDQ(krl, krr, w, 34);
+
+	/* generate KA */
+	kl = sub[0] ^ krl;
+	kr = sub[1] ^ krr;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA1, w, i);
+	kr ^= w;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA2, kl, i);
+	kl ^= krl;
+	CAMELLIA_F(kl, CAMELLIA_SIGMA3, kr, i);
+	kr ^= w ^ krr;
+	CAMELLIA_F(kr, CAMELLIA_SIGMA4, w, i);
+	kl ^= w;
+
+	/* generate KB */
+	krl ^= kl;
+	krr ^= kr;
+	CAMELLIA_F(krl, CAMELLIA_SIGMA5, w, i);
+	krr ^= w;
+	CAMELLIA_F(krr, CAMELLIA_SIGMA6, w, i);
+	krl ^= w;
+
+	/* generate KA dependent subkeys */
+	ROLDQ(kl, kr, w, 15);
+	/* k5 */
+	sub[6] = kl;
+	/* k6 */
+	sub[7] = kr;
+	ROLDQ(kl, kr, w, 30);
+	/* k11 */
+	sub[14] = kl;
+	/* k12 */
+	sub[15] = kr;
+	/* kl5 */
+	ROLDQ(kl, kr, w, 32);
+	sub[24] = kl;
+	/* kl6 */
+	sub[25] = kr;
+	/* rotation left shift 49 from k11,k12 -> k21,k22 */
+	ROLDQ(kl, kr, w, (49 - 32));
+	/* k21 */
+	sub[28] = kl;
+	/* k22 */
+	sub[29] = kr;
+
+	/* generate KB dependent subkeys */
+	/* k1 */
+	sub[2] = krl;
+	/* k2 */
+	sub[3] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k7 */
+	sub[10] = krl;
+	/* k8 */
+	sub[11] = krr;
+	ROLDQ(krl, krr, w, 30);
+	/* k15 */
+	sub[20] = krl;
+	/* k16 */
+	sub[21] = krr;
+	ROLDQ(krl, krr, w, 51);
+	/* kw3 */
+	sub[32] = krl;
+	/* kw4 */
+	sub[33] = krr;
+
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(25);
+	dw = subL(1) & subL(25),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl6) */
+	/* round 20 */
+	sub[27] ^= sub[1];
+	/* round 22 */
+	sub[29] ^= sub[1];
+	/* round 24 */
+	sub[31] ^= sub[1];
+	/* kw3 */
+	sub[32] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+	kw4 = sub[33];
+	/* round 23 */
+	sub[30] ^= kw4;
+	/* round 21 */
+	sub[28] ^= kw4;
+	/* round 19 */
+	sub[26] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(24)) << 32; //kw4l ^= kw4r & ~subR[24];
+	dw = (u32)(kw4 >> 32) & subL(24),
+		kw4 ^= ROL1(dw); /* modified for FL(kl5) */
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(16),
+		kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32;
+	dw = (u32)(kw4 >> 32) & subL(8),
+		kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); //tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	t = subL(26) ^ (subR(26) & ~subR(24));
+	dw = (u32)t & subL(24); /* FL(kl5) */
+	t = (t << 32) | (subR(26) ^ ROL1(dw));
+	SUBKEY(23) = sub[22] ^ t; /* round 18 */
+	SUBKEY(24) = sub[24];     /* FL(kl5) */
+	SUBKEY(25) = sub[25];     /* FLinv(kl6) */
+	t = subL(23) ^ (subR(23) & ~subR(25));
+	dw = (u32)t & subL(25); /* FLinv(kl6) */
+	t = (t << 32) | (subR(23) ^ ROL1(dw));
+	SUBKEY(26) = t ^ sub[27]; /* round 19 */
+	SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
+	SUBKEY(28) = sub[27] ^ sub[29]; /* round 21 */
+	SUBKEY(29) = sub[28] ^ sub[30]; /* round 22 */
+	SUBKEY(30) = sub[29] ^ sub[31]; /* round 23 */
+	SUBKEY(31) = sub[30];     /* round 24 */
+	SUBKEY(32) = sub[32] ^ sub[31]; /* kw3 */
+
+	/* apply the inverse of the last half of P-function */
+	camellia_setup_tail(subkey, 32);
+}
+
+static void camellia_setup192(const unsigned char *key, u64 *subkey)
+{
+	unsigned char kk[32];
+	u64 krl, krr;
+
+	memcpy(kk, key, 24);
+	memcpy((unsigned char *)&krl, key+16, 8);
+	krr = ~krl;
+	memcpy(kk+24, (unsigned char *)&krr, 8);
+	camellia_setup256(kk, subkey);
+}
+
+typedef u64 key_element;
+typedef const u64 const_key_element;
+
+
+
+#else /* BITS_PER_LONG < 64 */
+
+/*
+ * Key setup implementation with 32-bit ops
+ */
 
 /* key constants */
 
@@ -329,8 +912,7 @@ static const u32 camellia_sp4404[256] = 
 /*
  *  macros
  */
-
-# define GETU32(v, pt) \
+#define GETU32(v, pt) \
     do { \
 	/* latest breed of gcc is clever enough to use move */ \
 	memcpy(&(v), (pt), 4); \
@@ -363,64 +945,25 @@ static const u32 camellia_sp4404[256] = 
 	rr = (w0 << (bits - 32)) + (w1 >> (64 - bits));	\
     } while(0)
 
-
 #define CAMELLIA_F(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
     do {							\
 	il = xl ^ kl;						\
 	ir = xr ^ kr;						\
 	t0 = il >> 16;						\
 	t1 = ir >> 16;						\
-	yl = camellia_sp1110[ir & 0xff]				\
-	   ^ camellia_sp0222[(t1 >> 8) & 0xff]			\
-	   ^ camellia_sp3033[t1 & 0xff]				\
-	   ^ camellia_sp4404[(ir >> 8) & 0xff];			\
-	yr = camellia_sp1110[(t0 >> 8) & 0xff]			\
-	   ^ camellia_sp0222[t0 & 0xff]				\
-	   ^ camellia_sp3033[(il >> 8) & 0xff]			\
-	   ^ camellia_sp4404[il & 0xff];			\
+	yl = camellia_sp1110[(u8)(ir     )]			\
+	   ^ camellia_sp0222[    (t1 >> 8)]			\
+	   ^ camellia_sp3033[(u8)(t1     )]			\
+	   ^ camellia_sp4404[(u8)(ir >> 8)];			\
+	yr = camellia_sp1110[    (t0 >> 8)]			\
+	   ^ camellia_sp0222[(u8)(t0     )]			\
+	   ^ camellia_sp3033[(u8)(il >> 8)]			\
+	   ^ camellia_sp4404[(u8)(il     )];			\
 	yl ^= yr;						\
 	yr = ROR8(yr);						\
 	yr ^= yl;						\
     } while(0)
 
-
-/*
- * for speed up
- *
- */
-#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
-    do {								\
-	t0 = kll;							\
-	t2 = krr;							\
-	t0 &= ll;							\
-	t2 |= rr;							\
-	rl ^= t2;							\
-	lr ^= ROL1(t0);							\
-	t3 = krl;							\
-	t1 = klr;							\
-	t3 &= rl;							\
-	t1 |= lr;							\
-	ll ^= t1;							\
-	rr ^= ROL1(t3);							\
-    } while(0)
-
-#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir, t0, t1)	\
-    do {								\
-	ir =  camellia_sp1110[xr & 0xff];				\
-	il =  camellia_sp1110[(xl>>24) & 0xff];				\
-	ir ^= camellia_sp0222[(xr>>24) & 0xff];				\
-	il ^= camellia_sp0222[(xl>>16) & 0xff];				\
-	ir ^= camellia_sp3033[(xr>>16) & 0xff];				\
-	il ^= camellia_sp3033[(xl>>8) & 0xff];				\
-	ir ^= camellia_sp4404[(xr>>8) & 0xff];				\
-	il ^= camellia_sp4404[xl & 0xff];				\
-	il ^= kl;							\
-	ir ^= il ^ kr;							\
-	yl ^= ir;							\
-	yr ^= ROR8(il) ^ ir;						\
-    } while(0)
-
-
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
@@ -999,8 +1542,49 @@ static void camellia_setup192(const unsi
 	camellia_setup256(kk, subkey);
 }
 
+typedef u32 key_element;
+typedef const u32 const_key_element;
+
+#endif /* 32/64-bit key setup versions */
+
+
+
+/*
+ * Encrypt/decrypt
+ */
+#define CAMELLIA_FLS(ll, lr, rl, rr, kll, klr, krl, krr, t0, t1, t2, t3) \
+    do {								\
+	t0 = kll;							\
+	t2 = krr;							\
+	t0 &= ll;							\
+	t2 |= rr;							\
+	rl ^= t2;							\
+	lr ^= ROL1(t0);							\
+	t3 = krl;							\
+	t1 = klr;							\
+	t3 &= rl;							\
+	t1 |= lr;							\
+	ll ^= t1;							\
+	rr ^= ROL1(t3);							\
+    } while(0)
+
+#define CAMELLIA_ROUNDSM(xl, xr, kl, kr, yl, yr, il, ir)		\
+    do {								\
+	ir =  camellia_sp1110[(u8)xr];					\
+	il =  camellia_sp1110[    (xl >> 24)];				\
+	ir ^= camellia_sp0222[    (xr >> 24)];				\
+	il ^= camellia_sp0222[(u8)(xl >> 16)];				\
+	ir ^= camellia_sp3033[(u8)(xr >> 16)];				\
+	il ^= camellia_sp3033[(u8)(xl >> 8)];				\
+	ir ^= camellia_sp4404[(u8)(xr >> 8)];				\
+	il ^= camellia_sp4404[(u8)xl];					\
+	il ^= kl;							\
+	ir ^= il ^ kr;							\
+	yl ^= ir;							\
+	yr ^= ROR8(il) ^ ir;						\
+    } while(0)
 
-static void camellia_encrypt128(const u32 *subkey, u32 *io_text)
+static void camellia_encrypt128(const_key_element *subkey, u32 *io_text)
 {
 	u32 il,ir,t0,t1;               /* temporary variables */
 
@@ -1015,22 +1599,22 @@ static void camellia_encrypt128(const u3
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(8),SUBKEY_R(8),
@@ -1039,22 +1623,22 @@ static void camellia_encrypt128(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(16),SUBKEY_R(16),
@@ -1063,22 +1647,22 @@ static void camellia_encrypt128(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	/* post whitening but kw4 */
 	io_text[0] = io[2] ^ SUBKEY_L(24);
@@ -1087,7 +1671,7 @@ static void camellia_encrypt128(const u3
 	io_text[3] = io[1];
 }
 
-static void camellia_decrypt128(const u32 *subkey, u32 *io_text)
+static void camellia_decrypt128(const_key_element *subkey, u32 *io_text)
 {
 	u32 il,ir,t0,t1;               /* temporary variables */
 
@@ -1102,22 +1686,22 @@ static void camellia_decrypt128(const u3
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(17),SUBKEY_R(17),
@@ -1126,22 +1710,22 @@ static void camellia_decrypt128(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(9),SUBKEY_R(9),
@@ -1150,22 +1734,22 @@ static void camellia_decrypt128(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	/* post whitening but kw4 */
 	io_text[0] = io[2] ^ SUBKEY_L(0);
@@ -1174,7 +1758,7 @@ static void camellia_decrypt128(const u3
 	io_text[3] = io[1];
 }
 
-static void camellia_encrypt256(const u32 *subkey, u32 *io_text)
+static void camellia_encrypt256(const_key_element *subkey, u32 *io_text)
 {
 	u32 il,ir,t0,t1;           /* temporary variables */
 
@@ -1189,22 +1773,22 @@ static void camellia_encrypt256(const u3
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(8),SUBKEY_R(8),
@@ -1213,22 +1797,22 @@ static void camellia_encrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(16),SUBKEY_R(16),
@@ -1237,22 +1821,22 @@ static void camellia_encrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(24),SUBKEY_R(24),
@@ -1261,22 +1845,22 @@ static void camellia_encrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	/* post whitening but kw4 */
 	io_text[0] = io[2] ^ SUBKEY_L(32);
@@ -1285,7 +1869,7 @@ static void camellia_encrypt256(const u3
 	io_text[3] = io[1];
 }
 
-static void camellia_decrypt256(const u32 *subkey, u32 *io_text)
+static void camellia_decrypt256(const_key_element *subkey, u32 *io_text)
 {
 	u32 il,ir,t0,t1;           /* temporary variables */
 
@@ -1300,22 +1884,22 @@ static void camellia_decrypt256(const u3
 	/* main iteration */
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(25),SUBKEY_R(25),
@@ -1324,22 +1908,22 @@ static void camellia_decrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(17),SUBKEY_R(17),
@@ -1348,22 +1932,22 @@ static void camellia_decrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
 		     SUBKEY_L(9),SUBKEY_R(9),
@@ -1372,22 +1956,22 @@ static void camellia_decrypt256(const u3
 
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 	CAMELLIA_ROUNDSM(io[0],io[1],
 			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir,t0,t1);
+			 io[2],io[3],il,ir);
 	CAMELLIA_ROUNDSM(io[2],io[3],
 			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir,t0,t1);
+			 io[0],io[1],il,ir);
 
 	/* post whitening but kw4 */
 	io_text[0] = io[2] ^ SUBKEY_L(0);
@@ -1399,7 +1983,7 @@ static void camellia_decrypt256(const u3
 
 struct camellia_ctx {
 	int key_length;
-	u32 key_table[CAMELLIA_TABLE_BYTE_LEN / 4];
+	key_element key_table[CAMELLIA_TABLE_BYTE_LEN / sizeof(key_element)];
 };
 
 static int

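The 64-bit camellia_setup192() hunk above builds the 32-byte key that camellia_setup256() expects. Bytes 0..23 are copied unchanged. Bytes 24..31 are the bitwise complement of key bytes 16..23. The kernel code loads those 8 bytes into a u64 with memcpy(), applies ~, and stores them back. This is independent of byte order, because ~ flips each bit on its own and never moves bits between bytes.

The sketch below is a stand-alone user-space check of that claim. It compares the u64 route against a byte-by-byte reference. The function names expand192_via_u64() and expand192_bytewise() are hypothetical, and uint64_t stands in for the kernel's u64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-alone copy of the expansion done by camellia_setup192():
 * copy the 24-byte key, then append the complement of key bytes 16..23.
 */
static void expand192_via_u64(const unsigned char *key, unsigned char kk[32])
{
	uint64_t krl, krr;

	memcpy(kk, key, 24);
	memcpy(&krl, key + 16, 8);	/* load bytes 16..23 in native order */
	krr = ~krl;			/* complement every bit; byte order is irrelevant */
	memcpy(kk + 24, &krr, 8);	/* store back in the same native order */
}

/* Reference version (hypothetical): complement the tail one byte at a time. */
static void expand192_bytewise(const unsigned char *key, unsigned char kk[32])
{
	int i;

	memcpy(kk, key, 24);
	for (i = 0; i < 8; i++)
		kk[24 + i] = (unsigned char)~key[16 + i];
}
```

Both helpers should produce identical 32-byte buffers on little- and big-endian hosts alike. This is why the kernel version needs no be64/le64 conversion at this step.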
[-- Attachment #3: linux-2.6.23.1.camellia6.diff --]
[-- Type: text/x-diff, Size: 15277 bytes --]

diff -urpN linux-2.6.23.1.camellia5/crypto/camellia.c linux-2.6.23.1.camellia6/crypto/camellia.c
--- linux-2.6.23.1.camellia5/crypto/camellia.c	2007-11-14 11:30:27.000000000 -0800
+++ linux-2.6.23.1.camellia6/crypto/camellia.c	2007-11-14 11:30:27.000000000 -0800
@@ -1584,400 +1584,115 @@ typedef const u32 const_key_element;
 	yr ^= ROR8(il) ^ ir;						\
     } while(0)
 
-static void camellia_encrypt128(const_key_element *subkey, u32 *io_text)
+/* max = 24: 128bit encrypt, max = 32: 256bit encrypt */
+static void camellia_do_encrypt(const_key_element *subkey, u32 *io, unsigned max)
 {
 	u32 il,ir,t0,t1;               /* temporary variables */
 
-	u32 io[4];
-
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(0);
+	io[1] ^= SUBKEY_R(0);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir);
-
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(24);
-	io_text[1] = io[3] ^ SUBKEY_R(24);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
-
-static void camellia_decrypt128(const_key_element *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;               /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(24);
-	io[1] = io_text[1] ^ SUBKEY_R(24);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     t0,t1,il,ir); \
+} while (0)
+
+	ROUNDS(0);
+	FLS(8);
+	ROUNDS(8);
+	FLS(16);
+	ROUNDS(16);
+	if (max == 32) {
+		FLS(24);
+		ROUNDS(24);
+	}
 
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir);
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(max);
+	io[3] ^= SUBKEY_R(max);
+	/* NB: io[0],[1] should be swapped with [2],[3] by caller! */
 }
 
-static void camellia_encrypt256(const_key_element *subkey, u32 *io_text)
+static void camellia_do_decrypt(const_key_element *subkey, u32 *io, unsigned i)
 {
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
+	u32 il,ir,t0,t1;               /* temporary variables */
 
 	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(0);
-	io[1] = io_text[1] ^ SUBKEY_R(0);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+	io[0] ^= SUBKEY_L(i);
+	io[1] ^= SUBKEY_R(i);
 
 	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[0],io[1],il,ir);
-
-	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(32);
-	io_text[1] = io[3] ^ SUBKEY_R(32);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
-}
-
-static void camellia_decrypt256(const_key_element *subkey, u32 *io_text)
-{
-	u32 il,ir,t0,t1;           /* temporary variables */
-
-	u32 io[4];
-
-	/* pre whitening but absorb kw2 */
-	io[0] = io_text[0] ^ SUBKEY_L(32);
-	io[1] = io_text[1] ^ SUBKEY_R(32);
-	io[2] = io_text[2];
-	io[3] = io_text[3];
+#define ROUNDS(i) do { \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 7),SUBKEY_R(i + 7), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 6),SUBKEY_R(i + 6), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 5),SUBKEY_R(i + 5), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 4),SUBKEY_R(i + 4), \
+			 io[0],io[1],il,ir); \
+	CAMELLIA_ROUNDSM(io[0],io[1], \
+			 SUBKEY_L(i + 3),SUBKEY_R(i + 3), \
+			 io[2],io[3],il,ir); \
+	CAMELLIA_ROUNDSM(io[2],io[3], \
+			 SUBKEY_L(i + 2),SUBKEY_R(i + 2), \
+			 io[0],io[1],il,ir); \
+} while (0)
+#define FLS(i) do { \
+	CAMELLIA_FLS(io[0],io[1],io[2],io[3], \
+		     SUBKEY_L(i + 1),SUBKEY_R(i + 1), \
+		     SUBKEY_L(i + 0),SUBKEY_R(i + 0), \
+		     t0,t1,il,ir); \
+} while (0)
+
+	if (i == 32) {
+		ROUNDS(24);
+		FLS(24);
+	}
+	ROUNDS(16);
+	FLS(16);
+	ROUNDS(8);
+	FLS(8);
+	ROUNDS(0);
 
-	/* main iteration */
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(31),SUBKEY_R(31),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(30),SUBKEY_R(30),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(29),SUBKEY_R(29),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(28),SUBKEY_R(28),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(27),SUBKEY_R(27),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(26),SUBKEY_R(26),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(25),SUBKEY_R(25),
-		     SUBKEY_L(24),SUBKEY_R(24),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(23),SUBKEY_R(23),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(22),SUBKEY_R(22),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(21),SUBKEY_R(21),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(20),SUBKEY_R(20),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(19),SUBKEY_R(19),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(18),SUBKEY_R(18),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(17),SUBKEY_R(17),
-		     SUBKEY_L(16),SUBKEY_R(16),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(15),SUBKEY_R(15),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(14),SUBKEY_R(14),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(13),SUBKEY_R(13),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(12),SUBKEY_R(12),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(11),SUBKEY_R(11),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(10),SUBKEY_R(10),
-			 io[0],io[1],il,ir);
-
-	CAMELLIA_FLS(io[0],io[1],io[2],io[3],
-		     SUBKEY_L(9),SUBKEY_R(9),
-		     SUBKEY_L(8),SUBKEY_R(8),
-		     t0,t1,il,ir);
-
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(7),SUBKEY_R(7),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(6),SUBKEY_R(6),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(5),SUBKEY_R(5),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(4),SUBKEY_R(4),
-			 io[0],io[1],il,ir);
-	CAMELLIA_ROUNDSM(io[0],io[1],
-			 SUBKEY_L(3),SUBKEY_R(3),
-			 io[2],io[3],il,ir);
-	CAMELLIA_ROUNDSM(io[2],io[3],
-			 SUBKEY_L(2),SUBKEY_R(2),
-			 io[0],io[1],il,ir);
+#undef ROUNDS
+#undef FLS
 
 	/* post whitening but kw4 */
-	io_text[0] = io[2] ^ SUBKEY_L(0);
-	io_text[1] = io[3] ^ SUBKEY_R(0);
-	io_text[2] = io[0];
-	io_text[3] = io[1];
+	io[2] ^= SUBKEY_L(0);
+	io[3] ^= SUBKEY_R(0);
+	/* NB: 0,1 should be swapped with 2,3 by caller! */
 }
 
 
@@ -2029,21 +1744,15 @@ static void camellia_encrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_encrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_encrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_encrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_encrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static void camellia_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
@@ -2059,21 +1768,15 @@ static void camellia_decrypt(struct cryp
 	tmp[2] = be32_to_cpu(src[2]);
 	tmp[3] = be32_to_cpu(src[3]);
 
-	switch (cctx->key_length) {
-	case 16:
-		camellia_decrypt128(cctx->key_table, tmp);
-		break;
-	case 24:
-		/* fall through */
-	case 32:
-		camellia_decrypt256(cctx->key_table, tmp);
-		break;
-	}
-
-	dst[0] = cpu_to_be32(tmp[0]);
-	dst[1] = cpu_to_be32(tmp[1]);
-	dst[2] = cpu_to_be32(tmp[2]);
-	dst[3] = cpu_to_be32(tmp[3]);
+	camellia_do_decrypt(cctx->key_table, tmp,
+		cctx->key_length == 16 ? 24 : 32 /* for key lengths of 24 and 32 */
+	);
+
+	/* do_decrypt returns 0,1 swapped with 2,3 */
+	dst[0] = cpu_to_be32(tmp[2]);
+	dst[1] = cpu_to_be32(tmp[3]);
+	dst[2] = cpu_to_be32(tmp[0]);
+	dst[3] = cpu_to_be32(tmp[1]);
 }
 
 static struct crypto_alg camellia_alg = {

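In the hunks above, the encrypt/decrypt wrappers choose the subkey bound from the key length: 16-byte keys use 24, and 24- or 32-byte keys use 32. The wrappers also undo the half swap left by the core while converting back to big-endian. The userspace sketch below shows that last step. `store_be32()`, `camellia_max_for_keylen()` and `store_output_swapped()` are hypothetical helpers that stand in for the kernel's `cpu_to_be32()` stores. They are not kernel APIs.

```c
#include <stdint.h>
#include <assert.h>

typedef uint8_t u8;
typedef uint32_t u32;

/* Hypothetical stand-in for "dst[i] = cpu_to_be32(x)": write x big-endian. */
static void store_be32(u8 *p, u32 x)
{
	p[0] = (u8)(x >> 24);
	p[1] = (u8)(x >> 16);
	p[2] = (u8)(x >> 8);
	p[3] = (u8)x;
}

/* Bound passed to the core: 24 for 128-bit keys, 32 for 192/256-bit keys. */
static int camellia_max_for_keylen(unsigned int key_length)
{
	return key_length == 16 ? 24 : 32;
}

/* The core leaves words 0,1 swapped with 2,3, so emit them as 2,3,0,1. */
static void store_output_swapped(u8 out[16], const u32 tmp[4])
{
	store_be32(out + 0, tmp[2]);
	store_be32(out + 4, tmp[3]);
	store_be32(out + 8, tmp[0]);
	store_be32(out + 12, tmp[1]);
}
```

Folding the swap into the final stores avoids an extra pass over `tmp[]`. The core simply skips the last exchange of halves, which is the point of the "NB: 0,1 should be swapped with 2,3 by caller!" comment.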
[-- Attachment #4: linux-2.6.23.1.camellia8.diff --]
[-- Type: text/x-diff, Size: 13738 bytes --]

diff -urpN linux-2.6.23.1.camellia7/crypto/camellia.c linux-2.6.23.1.camellia8/crypto/camellia.c
--- linux-2.6.23.1.camellia7/crypto/camellia.c	2007-11-18 20:15:19.000000000 -0800
+++ linux-2.6.23.1.camellia8/crypto/camellia.c	2007-11-20 18:29:39.000000000 -0800
@@ -391,10 +391,94 @@ static const u32 camellia_sp4404[256] = 
 
 static void camellia_setup_tail(u64 *subkey, u64 *sub, int max)
 {
+	u64 kw4;
 	u64 t;
 	u32 dw;
 	int i;
 
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	sub[3] ^= sub[1];
+	/* round 4 */
+	sub[5] ^= sub[1];
+	/* round 6 */
+	sub[7] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(9);
+	dw = subL(1) & subL(9),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	sub[11] ^= sub[1];
+	/* round 10 */
+	sub[13] ^= sub[1];
+	/* round 12 */
+	sub[15] ^= sub[1];
+	subL(1) ^= subR(1) & ~subR(17);
+	dw = subL(1) & subL(17),
+		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	sub[19] ^= sub[1];
+	/* round 16 */
+	sub[21] ^= sub[1];
+	/* round 18 */
+	sub[23] ^= sub[1];
+	if (max == 24) {
+		/* kw3 */
+		sub[24] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+		kw4 = sub[25];
+	} else {
+		subL(1) ^= subR(1) & ~subR(25);
+		dw = subL(1) & subL(25),
+			subR(1) ^= ROL1(dw); /* modified for FLinv(kl6) */
+		/* round 20 */
+		sub[27] ^= sub[1];
+		/* round 22 */
+		sub[29] ^= sub[1];
+		/* round 24 */
+		sub[31] ^= sub[1];
+		/* kw3 */
+		sub[32] ^= sub[1];
+
+	/* absorb kw4 to other subkeys */
+		kw4 = sub[33];
+		/* round 23 */
+		sub[30] ^= kw4;
+		/* round 21 */
+		sub[28] ^= kw4;
+		/* round 19 */
+		sub[26] ^= kw4;
+		kw4 ^= (u64)((u32)kw4 & ~subR(24)) << 32; //kw4l ^= kw4r & ~subR[24];
+		dw = (u32)(kw4 >> 32) & subL(24);
+		kw4 ^= ROL1(dw); /* modified for FL(kl5) */
+	}
+	/* round 17 */
+	sub[22] ^= kw4;
+	/* round 15 */
+	sub[20] ^= kw4;
+	/* round 13 */
+	sub[18] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32; //kw4l ^= kw4r & ~subR(16);
+	dw = (u32)(kw4 >> 32) & subL(16); // kw4l & subL[16],
+	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	sub[14] ^= kw4;
+	/* round 9 */
+	sub[12] ^= kw4;
+	/* round 7 */
+	sub[10] ^= kw4;
+	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32; //kw4l ^= kw4r & ~subR[8];
+	dw = (u32)(kw4 >> 32) & subL(8);
+	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	sub[6] ^= kw4;
+	/* round 3 */
+	sub[4] ^= kw4;
+	/* round 1 */
+	sub[2] ^= kw4;
+	/* kw1 */
+	sub[0] ^= kw4;
+
 	/* key XOR is end of F-function */
 	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
 	SUBKEY(2) = sub[3];       /* round 1 */
@@ -475,8 +559,6 @@ static void camellia_setup128(const unsi
 {
 	u64 kl, kr;
 	u64 i, w;
-	u64 kw4;
-	u32 dw;
 	u64 sub[26];
 
 	/**
@@ -565,63 +647,6 @@ static void camellia_setup128(const unsi
 	sub[24] = kl;
 	sub[25] = kr;
 
-	/* absorb kw2 to other subkeys */
-	/* round 2 */
-	sub[3] ^= sub[1];
-	/* round 4 */
-	sub[5] ^= sub[1];
-	/* round 6 */
-	sub[7] ^= sub[1];
-	subL(1) ^= subR(1) & ~subR(9);
-	dw = subL(1) & subL(9),
-		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
-	/* round 8 */
-	sub[11] ^= sub[1];
-	/* round 10 */
-	sub[13] ^= sub[1];
-	/* round 12 */
-	sub[15] ^= sub[1];
-	subL(1) ^= subR(1) & ~subR(17);
-	dw = subL(1) & subL(17),
-		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
-	/* round 14 */
-	sub[19] ^= sub[1];
-	/* round 16 */
-	sub[21] ^= sub[1];
-	/* round 18 */
-	sub[23] ^= sub[1];
-	/* kw3 */
-	sub[24] ^= sub[1];
-
-	/* absorb kw4 to other subkeys */
-	kw4 = sub[25];
-	/* round 17 */
-	sub[22] ^= kw4;
-	/* round 15 */
-	sub[20] ^= kw4;
-	/* round 13 */
-	sub[18] ^= kw4;
-	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32; //kw4l ^= kw4r & ~subR(16);
-	dw = (u32)(kw4 >> 32) & subL(16); // kw4l & subL[16],
-	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
-	/* round 11 */
-	sub[14] ^= kw4;
-	/* round 9 */
-	sub[12] ^= kw4;
-	/* round 7 */
-	sub[10] ^= kw4;
-	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32; //kw4l ^= kw4r & ~subR[8];
-	dw = (u32)(kw4 >> 32) & subL(8);
-	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
-	/* round 5 */
-	sub[6] ^= kw4;
-	/* round 3 */
-	sub[4] ^= kw4;
-	/* round 1 */
-	sub[2] ^= kw4;
-	/* kw1 */
-	sub[0] ^= kw4;
-
 	camellia_setup_tail(subkey, sub, 24);
 }
 
@@ -630,8 +655,6 @@ static void camellia_setup256(const unsi
 	u64 kl, kr;        /* left half of key */
 	u64 krl, krr;      /* right half of key */
 	u64 i, w;          /* temporary variables */
-	u64 kw4;
-	u32 dw;
 	u64 sub[34];
 
 	/**
@@ -756,81 +779,6 @@ static void camellia_setup256(const unsi
 	/* kw4 */
 	sub[33] = krr;
 
-	/* absorb kw2 to other subkeys */
-	/* round 2 */
-	sub[3] ^= sub[1];
-	/* round 4 */
-	sub[5] ^= sub[1];
-	/* round 6 */
-	sub[7] ^= sub[1];
-	subL(1) ^= subR(1) & ~subR(9);
-	dw = subL(1) & subL(9),
-		subR(1) ^= ROL1(dw); /* modified for FLinv(kl2) */
-	/* round 8 */
-	sub[11] ^= sub[1];
-	/* round 10 */
-	sub[13] ^= sub[1];
-	/* round 12 */
-	sub[15] ^= sub[1];
-	subL(1) ^= subR(1) & ~subR(17);
-	dw = subL(1) & subL(17),
-		subR(1) ^= ROL1(dw); /* modified for FLinv(kl4) */
-	/* round 14 */
-	sub[19] ^= sub[1];
-	/* round 16 */
-	sub[21] ^= sub[1];
-	/* round 18 */
-	sub[23] ^= sub[1];
-	subL(1) ^= subR(1) & ~subR(25);
-	dw = subL(1) & subL(25),
-		subR(1) ^= ROL1(dw); /* modified for FLinv(kl6) */
-	/* round 20 */
-	sub[27] ^= sub[1];
-	/* round 22 */
-	sub[29] ^= sub[1];
-	/* round 24 */
-	sub[31] ^= sub[1];
-	/* kw3 */
-	sub[32] ^= sub[1];
-
-	/* absorb kw4 to other subkeys */
-	kw4 = sub[33];
-	/* round 23 */
-	sub[30] ^= kw4;
-	/* round 21 */
-	sub[28] ^= kw4;
-	/* round 19 */
-	sub[26] ^= kw4;
-	kw4 ^= (u64)((u32)kw4 & ~subR(24)) << 32; //kw4l ^= kw4r & ~subR[24];
-	dw = (u32)(kw4 >> 32) & subL(24);
-	kw4 ^= ROL1(dw); /* modified for FL(kl5) */
-	/* round 17 */
-	sub[22] ^= kw4;
-	/* round 15 */
-	sub[20] ^= kw4;
-	/* round 13 */
-	sub[18] ^= kw4;
-	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32;
-	dw = (u32)(kw4 >> 32) & subL(16);
-	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
-	/* round 11 */
-	sub[14] ^= kw4;
-	/* round 9 */
-	sub[12] ^= kw4;
-	/* round 7 */
-	sub[10] ^= kw4;
-	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32;
-	dw = (u32)(kw4 >> 32) & subL(8);
-	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
-	/* round 5 */
-	sub[6] ^= kw4;
-	/* round 3 */
-	sub[4] ^= kw4;
-	/* round 1 */
-	sub[2] ^= kw4;
-	/* kw1 */
-	sub[0] ^= kw4;
-
 	camellia_setup_tail(subkey, sub, 32);
 }
 
@@ -933,8 +881,92 @@ typedef const u64 const_key_element;
 static void camellia_setup_tail(u32 *subkey, u32 *subL, u32 *subR, int max)
 {
 	u32 dw, tl, tr;
+	u32 kw4l, kw4r;
 	int i;
 
+	/* absorb kw2 to other subkeys */
+	/* round 2 */
+	subL[3] ^= subL[1]; subR[3] ^= subR[1];
+	/* round 4 */
+	subL[5] ^= subL[1]; subR[5] ^= subR[1];
+	/* round 6 */
+	subL[7] ^= subL[1]; subR[7] ^= subR[1];
+	subL[1] ^= subR[1] & ~subR[9];
+	dw = subL[1] & subL[9],
+		subR[1] ^= ROL1(dw); /* modified for FLinv(kl2) */
+	/* round 8 */
+	subL[11] ^= subL[1]; subR[11] ^= subR[1];
+	/* round 10 */
+	subL[13] ^= subL[1]; subR[13] ^= subR[1];
+	/* round 12 */
+	subL[15] ^= subL[1]; subR[15] ^= subR[1];
+	subL[1] ^= subR[1] & ~subR[17];
+	dw = subL[1] & subL[17],
+		subR[1] ^= ROL1(dw); /* modified for FLinv(kl4) */
+	/* round 14 */
+	subL[19] ^= subL[1]; subR[19] ^= subR[1];
+	/* round 16 */
+	subL[21] ^= subL[1]; subR[21] ^= subR[1];
+	/* round 18 */
+	subL[23] ^= subL[1]; subR[23] ^= subR[1];
+	if (max == 24) {
+		/* kw3 */
+		subL[24] ^= subL[1]; subR[24] ^= subR[1];
+
+	/* absorb kw4 to other subkeys */
+		kw4l = subL[25]; kw4r = subR[25];
+	} else {
+		subL[1] ^= subR[1] & ~subR[25];
+		dw = subL[1] & subL[25],
+			subR[1] ^= ROL1(dw); /* modified for FLinv(kl6) */
+		/* round 20 */
+		subL[27] ^= subL[1]; subR[27] ^= subR[1];
+		/* round 22 */
+		subL[29] ^= subL[1]; subR[29] ^= subR[1];
+		/* round 24 */
+		subL[31] ^= subL[1]; subR[31] ^= subR[1];
+		/* kw3 */
+		subL[32] ^= subL[1]; subR[32] ^= subR[1];
+
+	/* absorb kw4 to other subkeys */
+		kw4l = subL[33]; kw4r = subR[33];
+		/* round 23 */
+		subL[30] ^= kw4l; subR[30] ^= kw4r;
+		/* round 21 */
+		subL[28] ^= kw4l; subR[28] ^= kw4r;
+		/* round 19 */
+		subL[26] ^= kw4l; subR[26] ^= kw4r;
+		kw4l ^= kw4r & ~subR[24];
+		dw = kw4l & subL[24],
+			kw4r ^= ROL1(dw); /* modified for FL(kl5) */
+	}
+	/* round 17 */
+	subL[22] ^= kw4l; subR[22] ^= kw4r;
+	/* round 15 */
+	subL[20] ^= kw4l; subR[20] ^= kw4r;
+	/* round 13 */
+	subL[18] ^= kw4l; subR[18] ^= kw4r;
+	kw4l ^= kw4r & ~subR[16];
+	dw = kw4l & subL[16],
+		kw4r ^= ROL1(dw); /* modified for FL(kl3) */
+	/* round 11 */
+	subL[14] ^= kw4l; subR[14] ^= kw4r;
+	/* round 9 */
+	subL[12] ^= kw4l; subR[12] ^= kw4r;
+	/* round 7 */
+	subL[10] ^= kw4l; subR[10] ^= kw4r;
+	kw4l ^= kw4r & ~subR[8];
+	dw = kw4l & subL[8],
+		kw4r ^= ROL1(dw); /* modified for FL(kl1) */
+	/* round 5 */
+	subL[6] ^= kw4l; subR[6] ^= kw4r;
+	/* round 3 */
+	subL[4] ^= kw4l; subR[4] ^= kw4r;
+	/* round 1 */
+	subL[2] ^= kw4l; subR[2] ^= kw4r;
+	/* kw1 */
+	subL[0] ^= kw4l; subR[0] ^= kw4r;
+
 	/* key XOR is end of F-function */
 	SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
 	SUBKEY_R(0) = subR[0] ^ subR[2];
@@ -1049,7 +1081,6 @@ static void camellia_setup128(const unsi
 {
 	u32 kll, klr, krl, krr;
 	u32 il, ir, t0, t1, w0, w1;
-	u32 kw4l, kw4r, dw;
 	u32 subL[26];
 	u32 subR[26];
 
@@ -1149,63 +1180,6 @@ static void camellia_setup128(const unsi
 	subL[24] = kll; subR[24] = klr;
 	subL[25] = krl; subR[25] = krr;
 
-	/* absorb kw2 to other subkeys */
-	/* round 2 */
-	subL[3] ^= subL[1]; subR[3] ^= subR[1];
-	/* round 4 */
-	subL[5] ^= subL[1]; subR[5] ^= subR[1];
-	/* round 6 */
-	subL[7] ^= subL[1]; subR[7] ^= subR[1];
-	subL[1] ^= subR[1] & ~subR[9];
-	dw = subL[1] & subL[9],
-		subR[1] ^= ROL1(dw); /* modified for FLinv(kl2) */
-	/* round 8 */
-	subL[11] ^= subL[1]; subR[11] ^= subR[1];
-	/* round 10 */
-	subL[13] ^= subL[1]; subR[13] ^= subR[1];
-	/* round 12 */
-	subL[15] ^= subL[1]; subR[15] ^= subR[1];
-	subL[1] ^= subR[1] & ~subR[17];
-	dw = subL[1] & subL[17],
-		subR[1] ^= ROL1(dw); /* modified for FLinv(kl4) */
-	/* round 14 */
-	subL[19] ^= subL[1]; subR[19] ^= subR[1];
-	/* round 16 */
-	subL[21] ^= subL[1]; subR[21] ^= subR[1];
-	/* round 18 */
-	subL[23] ^= subL[1]; subR[23] ^= subR[1];
-	/* kw3 */
-	subL[24] ^= subL[1]; subR[24] ^= subR[1];
-
-	/* absorb kw4 to other subkeys */
-	kw4l = subL[25]; kw4r = subR[25];
-	/* round 17 */
-	subL[22] ^= kw4l; subR[22] ^= kw4r;
-	/* round 15 */
-	subL[20] ^= kw4l; subR[20] ^= kw4r;
-	/* round 13 */
-	subL[18] ^= kw4l; subR[18] ^= kw4r;
-	kw4l ^= kw4r & ~subR[16];
-	dw = kw4l & subL[16],
-		kw4r ^= ROL1(dw); /* modified for FL(kl3) */
-	/* round 11 */
-	subL[14] ^= kw4l; subR[14] ^= kw4r;
-	/* round 9 */
-	subL[12] ^= kw4l; subR[12] ^= kw4r;
-	/* round 7 */
-	subL[10] ^= kw4l; subR[10] ^= kw4r;
-	kw4l ^= kw4r & ~subR[8];
-	dw = kw4l & subL[8],
-		kw4r ^= ROL1(dw); /* modified for FL(kl1) */
-	/* round 5 */
-	subL[6] ^= kw4l; subR[6] ^= kw4r;
-	/* round 3 */
-	subL[4] ^= kw4l; subR[4] ^= kw4r;
-	/* round 1 */
-	subL[2] ^= kw4l; subR[2] ^= kw4r;
-	/* kw1 */
-	subL[0] ^= kw4l; subR[0] ^= kw4r;
-
 	camellia_setup_tail(subkey, subL, subR, 24);
 }
 
@@ -1214,7 +1188,6 @@ static void camellia_setup256(const unsi
 	u32 kll, klr, krl, krr;        /* left half of key */
 	u32 krll, krlr, krrl, krrr;    /* right half of key */
 	u32 il, ir, t0, t1, w0, w1;    /* temporary variables */
-	u32 kw4l, kw4r, dw;
 	u32 subL[34];
 	u32 subR[34];
 
@@ -1356,81 +1329,6 @@ static void camellia_setup256(const unsi
 	/* kw4 */
 	subL[33] = krrl; subR[33] = krrr;
 
-	/* absorb kw2 to other subkeys */
-	/* round 2 */
-	subL[3] ^= subL[1]; subR[3] ^= subR[1];
-	/* round 4 */
-	subL[5] ^= subL[1]; subR[5] ^= subR[1];
-	/* round 6 */
-	subL[7] ^= subL[1]; subR[7] ^= subR[1];
-	subL[1] ^= subR[1] & ~subR[9];
-	dw = subL[1] & subL[9],
-		subR[1] ^= ROL1(dw); /* modified for FLinv(kl2) */
-	/* round 8 */
-	subL[11] ^= subL[1]; subR[11] ^= subR[1];
-	/* round 10 */
-	subL[13] ^= subL[1]; subR[13] ^= subR[1];
-	/* round 12 */
-	subL[15] ^= subL[1]; subR[15] ^= subR[1];
-	subL[1] ^= subR[1] & ~subR[17];
-	dw = subL[1] & subL[17],
-		subR[1] ^= ROL1(dw); /* modified for FLinv(kl4) */
-	/* round 14 */
-	subL[19] ^= subL[1]; subR[19] ^= subR[1];
-	/* round 16 */
-	subL[21] ^= subL[1]; subR[21] ^= subR[1];
-	/* round 18 */
-	subL[23] ^= subL[1]; subR[23] ^= subR[1];
-	subL[1] ^= subR[1] & ~subR[25];
-	dw = subL[1] & subL[25],
-		subR[1] ^= ROL1(dw); /* modified for FLinv(kl6) */
-	/* round 20 */
-	subL[27] ^= subL[1]; subR[27] ^= subR[1];
-	/* round 22 */
-	subL[29] ^= subL[1]; subR[29] ^= subR[1];
-	/* round 24 */
-	subL[31] ^= subL[1]; subR[31] ^= subR[1];
-	/* kw3 */
-	subL[32] ^= subL[1]; subR[32] ^= subR[1];
-
-	/* absorb kw4 to other subkeys */
-	kw4l = subL[33]; kw4r = subR[33];
-	/* round 23 */
-	subL[30] ^= kw4l; subR[30] ^= kw4r;
-	/* round 21 */
-	subL[28] ^= kw4l; subR[28] ^= kw4r;
-	/* round 19 */
-	subL[26] ^= kw4l; subR[26] ^= kw4r;
-	kw4l ^= kw4r & ~subR[24];
-	dw = kw4l & subL[24],
-		kw4r ^= ROL1(dw); /* modified for FL(kl5) */
-	/* round 17 */
-	subL[22] ^= kw4l; subR[22] ^= kw4r;
-	/* round 15 */
-	subL[20] ^= kw4l; subR[20] ^= kw4r;
-	/* round 13 */
-	subL[18] ^= kw4l; subR[18] ^= kw4r;
-	kw4l ^= kw4r & ~subR[16];
-	dw = kw4l & subL[16],
-		kw4r ^= ROL1(dw); /* modified for FL(kl3) */
-	/* round 11 */
-	subL[14] ^= kw4l; subR[14] ^= kw4r;
-	/* round 9 */
-	subL[12] ^= kw4l; subR[12] ^= kw4r;
-	/* round 7 */
-	subL[10] ^= kw4l; subR[10] ^= kw4r;
-	kw4l ^= kw4r & ~subR[8];
-	dw = kw4l & subL[8],
-		kw4r ^= ROL1(dw); /* modified for FL(kl1) */
-	/* round 5 */
-	subL[6] ^= kw4l; subR[6] ^= kw4r;
-	/* round 3 */
-	subL[4] ^= kw4l; subR[4] ^= kw4r;
-	/* round 1 */
-	subL[2] ^= kw4l; subR[2] ^= kw4r;
-	/* kw1 */
-	subL[0] ^= kw4l; subR[0] ^= kw4r;
-
 	camellia_setup_tail(subkey, subL, subR, 32);
 }
 

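The camellia8 patch moves the kw2/kw4 absorption into `camellia_setup_tail()`. In the u64 variant, each FL step on `kw4` is three operations on a packed 64-bit word. These mirror the three-line u32 version (`kw4l ^= kw4r & ~subR[n]; dw = kw4l & subL[n]; kw4r ^= ROL1(dw);`). The sketch below checks that the two forms agree. It assumes `kw4l` is the high half and `kw4r` the low half of the packed word, matching the `//kw4l ^= ...` comments in the patch. `rol1()`, `fl_absorb32()`, `fl_absorb64()` and `fl_absorb_equivalent()` are hypothetical names for illustration and do not exist in crypto/camellia.c.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Rotate a 32-bit word left by one bit (assumed equivalent of ROL1). */
static u32 rol1(u32 x)
{
	return (x << 1) | (x >> 31);
}

/* u32 form: fold one FL layer keyed by (subl, subr) into (kw4l, kw4r). */
static void fl_absorb32(u32 *kw4l, u32 *kw4r, u32 subl, u32 subr)
{
	u32 dw;

	*kw4l ^= *kw4r & ~subr;
	dw = *kw4l & subl;
	*kw4r ^= rol1(dw);
}

/* u64 form: kw4l is the high half of kw4, kw4r the low half. */
static u64 fl_absorb64(u64 kw4, u32 subl, u32 subr)
{
	u32 dw;

	kw4 ^= (u64)((u32)kw4 & ~subr) << 32;	/* kw4l ^= kw4r & ~subr */
	dw = (u32)(kw4 >> 32) & subl;		/* dw = kw4l & subl */
	kw4 ^= rol1(dw);			/* kw4r ^= ROL1(dw); high half untouched */
	return kw4;
}

/* Returns true when both forms yield the same (kw4l, kw4r) pair. */
static bool fl_absorb_equivalent(u32 kw4l, u32 kw4r, u32 subl, u32 subr)
{
	u64 packed = ((u64)kw4l << 32) | kw4r;

	packed = fl_absorb64(packed, subl, subr);
	fl_absorb32(&kw4l, &kw4r, subl, subr);
	return (u32)(packed >> 32) == kw4l && (u32)packed == kw4r;
}
```

This exercises only the packing trick. It says nothing about the correctness of the full key schedule, which still has to be checked against the Camellia test vectors.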
[-- Attachment #5: linux-2.6.23.1.camellia7.diff --]
[-- Type: text/x-diff, Size: 21989 bytes --]

diff -urpN linux-2.6.23.1.camellia6/crypto/camellia.c linux-2.6.23.1.camellia7/crypto/camellia.c
--- linux-2.6.23.1.camellia6/crypto/camellia.c	2007-11-14 11:30:27.000000000 -0800
+++ linux-2.6.23.1.camellia7/crypto/camellia.c	2007-11-18 20:15:19.000000000 -0800
@@ -380,15 +380,80 @@ static const u32 camellia_sp4404[256] = 
 #ifdef __BIG_ENDIAN
 #define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2])
 #define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
 #else
 #define SUBKEY_L(INDEX) (((u32*)subkey)[(INDEX)*2 + 1])
 #define SUBKEY_R(INDEX) (((u32*)subkey)[(INDEX)*2])
+#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
+#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
 #endif
 
-static void camellia_setup_tail(u64 *subkey, int max)
+static void camellia_setup_tail(u64 *subkey, u64 *sub, int max)
 {
+	u64 t;
 	u32 dw;
-	int i = 2;
+	int i;
+
+	/* key XOR is end of F-function */
+	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
+	SUBKEY(2) = sub[3];       /* round 1 */
+	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
+	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
+	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
+	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
+	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = (u32)t & subL(8);  /* FL(kl1) */
+	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
+	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
+	SUBKEY(8) = sub[8];       /* FL(kl1) */
+	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
+	t = subL(7) ^ (subR(7) & ~subR(9));
+	dw = (u32)t & subL(9);  /* FLinv(kl2) */
+	t = (t << 32) | (subR(7) ^ ROL1(dw));
+	SUBKEY(10) = t ^ sub[11]; /* round 7 */
+	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
+	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
+	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
+	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
+	t = subL(18) ^ (subR(18) & ~subR(16));
+	dw = (u32)t & subL(16); /* FL(kl3) */
+	t = (t << 32) | (subR(18) ^ ROL1(dw));
+	SUBKEY(15) = sub[14] ^ t; /* round 12 */
+	SUBKEY(16) = sub[16];     /* FL(kl3) */
+	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
+	t = subL(15) ^ (subR(15) & ~subR(17));
+	dw = (u32)t & subL(17); /* FLinv(kl4) */
+	t = (t << 32) | (subR(15) ^ ROL1(dw));
+	SUBKEY(18) = t ^ sub[19]; /* round 13 */
+	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
+	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
+	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
+	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
+	if (max == 24) {
+		SUBKEY(23) = sub[22];     /* round 18 */
+		SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
+	} else { 
+		t = subL(26) ^ (subR(26) & ~subR(24));
+		dw = (u32)t & subL(24); /* FL(kl5) */
+		t = (t << 32) | (subR(26) ^ ROL1(dw));
+		SUBKEY(23) = sub[22] ^ t; /* round 18 */
+		SUBKEY(24) = sub[24];     /* FL(kl5) */
+		SUBKEY(25) = sub[25];     /* FLinv(kl6) */
+		t = subL(23) ^ (subR(23) & ~subR(25));
+		dw = (u32)t & subL(25); /* FLinv(kl6) */
+		t = (t << 32) | (subR(23) ^ ROL1(dw));
+		SUBKEY(26) = t ^ sub[27]; /* round 19 */
+		SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
+		SUBKEY(28) = sub[27] ^ sub[29]; /* round 21 */
+		SUBKEY(29) = sub[28] ^ sub[30]; /* round 22 */
+		SUBKEY(30) = sub[29] ^ sub[31]; /* round 23 */
+		SUBKEY(31) = sub[30];     /* round 24 */
+		SUBKEY(32) = sub[32] ^ sub[31]; /* kw3 */
+	}
+
+	/* apply the inverse of the last half of P-function */
+	i = 2;
 	do {
 		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
 		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
@@ -406,31 +471,21 @@ static void camellia_setup_tail(u64 *sub
 	} while (i < max);
 }
 
-#ifdef __BIG_ENDIAN
-#define subL(INDEX) (((u32*)sub)[(INDEX)*2])
-#define subR(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
-#else
-#define subL(INDEX) (((u32*)sub)[(INDEX)*2 + 1])
-#define subR(INDEX) (((u32*)sub)[(INDEX)*2])
-#endif
-
 static void camellia_setup128(const unsigned char *key, u64 *subkey)
 {
 	u64 kl, kr;
-	u64 i, t, w;
+	u64 i, w;
 	u64 kw4;
 	u32 dw;
 	u64 sub[26];
 
 	/**
-	 *  k == kl || kr (|| is concatination)
+	 *  k == kl || kr (|| is concatenation)
 	 */
 	GETU64(kl, key     );
 	GETU64(kr, key +  8);
 
-	/**
-	 * generate KL dependent subkeys
-	 */
+	/* generate KL dependent subkeys */
 	/* kw1 */
 	sub[0] = kl;
 	/* kw2 */
@@ -567,60 +622,21 @@ static void camellia_setup128(const unsi
 	/* kw1 */
 	sub[0] ^= kw4;
 
-	/* key XOR is end of F-function */
-	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
-	SUBKEY(2) = sub[3];       /* round 1 */
-	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
-	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
-	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
-	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
-	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = (u32)t & subL(8);  /* FL(kl1) */
-	t = (t << 32) | (subR(10) ^ ROL1(dw)); // tr = subR[10] ^ ROL1(dw);
-	SUBKEY(7) = sub[6] ^ t; /* round 6 */
-	SUBKEY(8) = sub[8];       /* FL(kl1) */
-	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
-	t = subL(7) ^ (subR(7) & ~subR(9));
-	dw = (u32)t & subL(9);  /* FLinv(kl2) */
-	t = (t << 32) | (subR(7) ^ ROL1(dw));
-	SUBKEY(10) = t ^ sub[11]; /* round 7 */
-	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
-	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
-	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
-	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
-	t = subL(18) ^ (subR(18) & ~subR(16));
-	dw = (u32)t & subL(16); /* FL(kl3) */
-	t = (t << 32) | (subR(18) ^ ROL1(dw));
-	SUBKEY(15) = sub[14] ^ t; /* round 12 */
-	SUBKEY(16) = sub[16];     /* FL(kl3) */
-	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
-	t = subL(15) ^ (subR(15) & ~subR(17));
-	dw = (u32)t & subL(17); /* FLinv(kl4) */
-	t = (t << 32) | (subR(15) ^ ROL1(dw));
-	SUBKEY(18) = t ^ sub[19]; /* round 13 */
-	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
-	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
-	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
-	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
-	SUBKEY(23) = sub[22];     /* round 18 */
-	SUBKEY(24) = sub[24] ^ sub[23]; /* kw3 */
-
-	/* apply the inverse of the last half of P-function */
-	camellia_setup_tail(subkey, 24);
+	camellia_setup_tail(subkey, sub, 24);
 }
 
 static void camellia_setup256(const unsigned char *key, u64 *subkey)
 {
 	u64 kl, kr;        /* left half of key */
 	u64 krl, krr;      /* right half of key */
-	u64 i, t, w;       /* temporary variables */
+	u64 i, w;          /* temporary variables */
 	u64 kw4;
 	u32 dw;
 	u64 sub[34];
 
 	/**
 	 *  key = (kl || kr || krl || krr)
-	 *  (|| is concatination)
+	 *  (|| is concatenation)
 	 */
 	GETU64(kl,  key     );
 	GETU64(kr,  key +  8);
@@ -786,8 +802,8 @@ static void camellia_setup256(const unsi
 	/* round 19 */
 	sub[26] ^= kw4;
 	kw4 ^= (u64)((u32)kw4 & ~subR(24)) << 32; //kw4l ^= kw4r & ~subR[24];
-	dw = (u32)(kw4 >> 32) & subL(24),
-		kw4 ^= ROL1(dw); /* modified for FL(kl5) */
+	dw = (u32)(kw4 >> 32) & subL(24);
+	kw4 ^= ROL1(dw); /* modified for FL(kl5) */
 	/* round 17 */
 	sub[22] ^= kw4;
 	/* round 15 */
@@ -795,8 +811,8 @@ static void camellia_setup256(const unsi
 	/* round 13 */
 	sub[18] ^= kw4;
 	kw4 ^= (u64)((u32)kw4 & ~subR(16)) << 32;
-	dw = (u32)(kw4 >> 32) & subL(16),
-		kw4 ^= ROL1(dw); /* modified for FL(kl3) */
+	dw = (u32)(kw4 >> 32) & subL(16);
+	kw4 ^= ROL1(dw); /* modified for FL(kl3) */
 	/* round 11 */
 	sub[14] ^= kw4;
 	/* round 9 */
@@ -804,8 +820,8 @@ static void camellia_setup256(const unsi
 	/* round 7 */
 	sub[10] ^= kw4;
 	kw4 ^= (u64)((u32)kw4 & ~subR(8)) << 32;
-	dw = (u32)(kw4 >> 32) & subL(8),
-		kw4 ^= ROL1(dw); /* modified for FL(kl1) */
+	dw = (u32)(kw4 >> 32) & subL(8);
+	kw4 ^= ROL1(dw); /* modified for FL(kl1) */
 	/* round 5 */
 	sub[6] ^= kw4;
 	/* round 3 */
@@ -815,60 +831,7 @@ static void camellia_setup256(const unsi
 	/* kw1 */
 	sub[0] ^= kw4;
 
-	/* key XOR is end of F-function */
-	SUBKEY(0) = sub[0] ^ sub[2];/* kw1 */
-	SUBKEY(2) = sub[3];       /* round 1 */
-	SUBKEY(3) = sub[2] ^ sub[4]; /* round 2 */
-	SUBKEY(4) = sub[3] ^ sub[5]; /* round 3 */
-	SUBKEY(5) = sub[4] ^ sub[6]; /* round 4 */
-	SUBKEY(6) = sub[5] ^ sub[7]; /* round 5 */
-	t = subL(10) ^ (subR(10) & ~subR(8)); // tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = (u32)t & subL(8);  /* FL(kl1) */
-	t = (t << 32) | (subR(10) ^ ROL1(dw)); //tr = subR[10] ^ ROL1(dw);
-	SUBKEY(7) = sub[6] ^ t;   /* round 6 */
-	SUBKEY(8) = sub[8];       /* FL(kl1) */
-	SUBKEY(9) = sub[9];       /* FLinv(kl2) */
-	t = subL(7) ^ (subR(7) & ~subR(9));
-	dw = (u32)t & subL(9);  /* FLinv(kl2) */
-	t = (t << 32) | (subR(7) ^ ROL1(dw));
-	SUBKEY(10) = t ^ sub[11]; /* round 7 */
-	SUBKEY(11) = sub[10] ^ sub[12]; /* round 8 */
-	SUBKEY(12) = sub[11] ^ sub[13]; /* round 9 */
-	SUBKEY(13) = sub[12] ^ sub[14]; /* round 10 */
-	SUBKEY(14) = sub[13] ^ sub[15]; /* round 11 */
-	t = subL(18) ^ (subR(18) & ~subR(16));
-	dw = (u32)t & subL(16); /* FL(kl3) */
-	t = (t << 32) | (subR(18) ^ ROL1(dw));
-	SUBKEY(15) = sub[14] ^ t; /* round 12 */
-	SUBKEY(16) = sub[16];     /* FL(kl3) */
-	SUBKEY(17) = sub[17];     /* FLinv(kl4) */
-	t = subL(15) ^ (subR(15) & ~subR(17));
-	dw = (u32)t & subL(17); /* FLinv(kl4) */
-	t = (t << 32) | (subR(15) ^ ROL1(dw));
-	SUBKEY(18) = t ^ sub[19]; /* round 13 */
-	SUBKEY(19) = sub[18] ^ sub[20]; /* round 14 */
-	SUBKEY(20) = sub[19] ^ sub[21]; /* round 15 */
-	SUBKEY(21) = sub[20] ^ sub[22]; /* round 16 */
-	SUBKEY(22) = sub[21] ^ sub[23]; /* round 17 */
-	t = subL(26) ^ (subR(26) & ~subR(24));
-	dw = (u32)t & subL(24); /* FL(kl5) */
-	t = (t << 32) | (subR(26) ^ ROL1(dw));
-	SUBKEY(23) = sub[22] ^ t; /* round 18 */
-	SUBKEY(24) = sub[24];     /* FL(kl5) */
-	SUBKEY(25) = sub[25];     /* FLinv(kl6) */
-	t = subL(23) ^ (subR(23) & ~subR(25));
-	dw = (u32)t & subL(25); /* FLinv(kl6) */
-	t = (t << 32) | (subR(23) ^ ROL1(dw));
-	SUBKEY(26) = t ^ sub[27]; /* round 19 */
-	SUBKEY(27) = sub[26] ^ sub[28]; /* round 20 */
-	SUBKEY(28) = sub[27] ^ sub[29]; /* round 21 */
-	SUBKEY(29) = sub[28] ^ sub[30]; /* round 22 */
-	SUBKEY(30) = sub[29] ^ sub[31]; /* round 23 */
-	SUBKEY(31) = sub[30];     /* round 24 */
-	SUBKEY(32) = sub[32] ^ sub[31]; /* kw3 */
-
-	/* apply the inverse of the last half of P-function */
-	camellia_setup_tail(subkey, 32);
+	camellia_setup_tail(subkey, sub, 32);
 }
 
 static void camellia_setup192(const unsigned char *key, u64 *subkey)
@@ -967,10 +930,104 @@ typedef const u64 const_key_element;
 #define SUBKEY_L(INDEX) (subkey[(INDEX)*2])
 #define SUBKEY_R(INDEX) (subkey[(INDEX)*2 + 1])
 
-static void camellia_setup_tail(u32 *subkey, int max)
+static void camellia_setup_tail(u32 *subkey, u32 *subL, u32 *subR, int max)
 {
-	u32 dw;
-	int i = 2;
+	u32 dw, tl, tr;
+	int i;
+
+	/* key XOR is end of F-function */
+	SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
+	SUBKEY_R(0) = subR[0] ^ subR[2];
+	SUBKEY_L(2) = subL[3];       /* round 1 */
+	SUBKEY_R(2) = subR[3];
+	SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
+	SUBKEY_R(3) = subR[2] ^ subR[4];
+	SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
+	SUBKEY_R(4) = subR[3] ^ subR[5];
+	SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
+	SUBKEY_R(5) = subR[4] ^ subR[6];
+	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
+	SUBKEY_R(6) = subR[5] ^ subR[7];
+	tl = subL[10] ^ (subR[10] & ~subR[8]);
+	dw = tl & subL[8],  /* FL(kl1) */
+		tr = subR[10] ^ ROL1(dw);
+	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
+	SUBKEY_R(7) = subR[6] ^ tr;
+	SUBKEY_L(8) = subL[8];       /* FL(kl1) */
+	SUBKEY_R(8) = subR[8];
+	SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
+	SUBKEY_R(9) = subR[9];
+	tl = subL[7] ^ (subR[7] & ~subR[9]);
+	dw = tl & subL[9],  /* FLinv(kl2) */
+		tr = subR[7] ^ ROL1(dw);
+	SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
+	SUBKEY_R(10) = tr ^ subR[11];
+	SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
+	SUBKEY_R(11) = subR[10] ^ subR[12];
+	SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
+	SUBKEY_R(12) = subR[11] ^ subR[13];
+	SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
+	SUBKEY_R(13) = subR[12] ^ subR[14];
+	SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
+	SUBKEY_R(14) = subR[13] ^ subR[15];
+	tl = subL[18] ^ (subR[18] & ~subR[16]);
+	dw = tl & subL[16], /* FL(kl3) */
+		tr = subR[18] ^ ROL1(dw);
+	SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
+	SUBKEY_R(15) = subR[14] ^ tr;
+	SUBKEY_L(16) = subL[16];     /* FL(kl3) */
+	SUBKEY_R(16) = subR[16];
+	SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
+	SUBKEY_R(17) = subR[17];
+	tl = subL[15] ^ (subR[15] & ~subR[17]);
+	dw = tl & subL[17], /* FLinv(kl4) */
+		tr = subR[15] ^ ROL1(dw);
+	SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
+	SUBKEY_R(18) = tr ^ subR[19];
+	SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
+	SUBKEY_R(19) = subR[18] ^ subR[20];
+	SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
+	SUBKEY_R(20) = subR[19] ^ subR[21];
+	SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
+	SUBKEY_R(21) = subR[20] ^ subR[22];
+	SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
+	SUBKEY_R(22) = subR[21] ^ subR[23];
+	if (max == 24) {
+		SUBKEY_L(23) = subL[22];     /* round 18 */
+		SUBKEY_R(23) = subR[22];
+		SUBKEY_L(24) = subL[24] ^ subL[23]; /* kw3 */
+		SUBKEY_R(24) = subR[24] ^ subR[23];
+	} else {
+		tl = subL[26] ^ (subR[26] & ~subR[24]);
+		dw = tl & subL[24], /* FL(kl5) */
+			tr = subR[26] ^ ROL1(dw);
+		SUBKEY_L(23) = subL[22] ^ tl; /* round 18 */
+		SUBKEY_R(23) = subR[22] ^ tr;
+		SUBKEY_L(24) = subL[24];     /* FL(kl5) */
+		SUBKEY_R(24) = subR[24];
+		SUBKEY_L(25) = subL[25];     /* FLinv(kl6) */
+		SUBKEY_R(25) = subR[25];
+		tl = subL[23] ^ (subR[23] & ~subR[25]);
+		dw = tl & subL[25], /* FLinv(kl6) */
+			tr = subR[23] ^ ROL1(dw);
+		SUBKEY_L(26) = tl ^ subL[27]; /* round 19 */
+		SUBKEY_R(26) = tr ^ subR[27];
+		SUBKEY_L(27) = subL[26] ^ subL[28]; /* round 20 */
+		SUBKEY_R(27) = subR[26] ^ subR[28];
+		SUBKEY_L(28) = subL[27] ^ subL[29]; /* round 21 */
+		SUBKEY_R(28) = subR[27] ^ subR[29];
+		SUBKEY_L(29) = subL[28] ^ subL[30]; /* round 22 */
+		SUBKEY_R(29) = subR[28] ^ subR[30];
+		SUBKEY_L(30) = subL[29] ^ subL[31]; /* round 23 */
+		SUBKEY_R(30) = subR[29] ^ subR[31];
+		SUBKEY_L(31) = subL[30];     /* round 24 */
+		SUBKEY_R(31) = subR[30];
+		SUBKEY_L(32) = subL[32] ^ subL[31]; /* kw3 */
+		SUBKEY_R(32) = subR[32] ^ subR[31];
+	}
+
+	/* apply the inverse of the last half of P-function */
+	i = 2;
 	do {
 		dw = SUBKEY_L(i + 0) ^ SUBKEY_R(i + 0); dw = ROL8(dw);/* round 1 */
 		SUBKEY_R(i + 0) = SUBKEY_L(i + 0) ^ dw; SUBKEY_L(i + 0) = dw;
@@ -992,21 +1049,19 @@ static void camellia_setup128(const unsi
 {
 	u32 kll, klr, krl, krr;
 	u32 il, ir, t0, t1, w0, w1;
-	u32 kw4l, kw4r, dw, tl, tr;
+	u32 kw4l, kw4r, dw;
 	u32 subL[26];
 	u32 subR[26];
 
 	/**
-	 *  k == kll || klr || krl || krr (|| is concatination)
+	 *  k == kll || klr || krl || krr (|| is concatenation)
 	 */
 	GETU32(kll, key     );
 	GETU32(klr, key +  4);
 	GETU32(krl, key +  8);
 	GETU32(krr, key + 12);
 
-	/**
-	 * generate KL dependent subkeys
-	 */
+	/* generate KL dependent subkeys */
 	/* kw1 */
 	subL[0] = kll; subR[0] = klr;
 	/* kw2 */
@@ -1151,70 +1206,7 @@ static void camellia_setup128(const unsi
 	/* kw1 */
 	subL[0] ^= kw4l; subR[0] ^= kw4r;
 
-	/* key XOR is end of F-function */
-	SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
-	SUBKEY_R(0) = subR[0] ^ subR[2];
-	SUBKEY_L(2) = subL[3];       /* round 1 */
-	SUBKEY_R(2) = subR[3];
-	SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
-	SUBKEY_R(3) = subR[2] ^ subR[4];
-	SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
-	SUBKEY_R(4) = subR[3] ^ subR[5];
-	SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
-	SUBKEY_R(5) = subR[4] ^ subR[6];
-	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
-	SUBKEY_R(6) = subR[5] ^ subR[7];
-	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
-		tr = subR[10] ^ ROL1(dw);
-	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
-	SUBKEY_R(7) = subR[6] ^ tr;
-	SUBKEY_L(8) = subL[8];       /* FL(kl1) */
-	SUBKEY_R(8) = subR[8];
-	SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
-	SUBKEY_R(9) = subR[9];
-	tl = subL[7] ^ (subR[7] & ~subR[9]);
-	dw = tl & subL[9],  /* FLinv(kl2) */
-		tr = subR[7] ^ ROL1(dw);
-	SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
-	SUBKEY_R(10) = tr ^ subR[11];
-	SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
-	SUBKEY_R(11) = subR[10] ^ subR[12];
-	SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
-	SUBKEY_R(12) = subR[11] ^ subR[13];
-	SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
-	SUBKEY_R(13) = subR[12] ^ subR[14];
-	SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
-	SUBKEY_R(14) = subR[13] ^ subR[15];
-	tl = subL[18] ^ (subR[18] & ~subR[16]);
-	dw = tl & subL[16], /* FL(kl3) */
-		tr = subR[18] ^ ROL1(dw);
-	SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
-	SUBKEY_R(15) = subR[14] ^ tr;
-	SUBKEY_L(16) = subL[16];     /* FL(kl3) */
-	SUBKEY_R(16) = subR[16];
-	SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
-	SUBKEY_R(17) = subR[17];
-	tl = subL[15] ^ (subR[15] & ~subR[17]);
-	dw = tl & subL[17], /* FLinv(kl4) */
-		tr = subR[15] ^ ROL1(dw);
-	SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
-	SUBKEY_R(18) = tr ^ subR[19];
-	SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
-	SUBKEY_R(19) = subR[18] ^ subR[20];
-	SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
-	SUBKEY_R(20) = subR[19] ^ subR[21];
-	SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
-	SUBKEY_R(21) = subR[20] ^ subR[22];
-	SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
-	SUBKEY_R(22) = subR[21] ^ subR[23];
-	SUBKEY_L(23) = subL[22];     /* round 18 */
-	SUBKEY_R(23) = subR[22];
-	SUBKEY_L(24) = subL[24] ^ subL[23]; /* kw3 */
-	SUBKEY_R(24) = subR[24] ^ subR[23];
-
-	/* apply the inverse of the last half of P-function */
-	camellia_setup_tail(subkey, 24);
+	camellia_setup_tail(subkey, subL, subR, 24);
 }
 
 static void camellia_setup256(const unsigned char *key, u32 *subkey)
@@ -1222,13 +1214,13 @@ static void camellia_setup256(const unsi
 	u32 kll, klr, krl, krr;        /* left half of key */
 	u32 krll, krlr, krrl, krrr;    /* right half of key */
 	u32 il, ir, t0, t1, w0, w1;    /* temporary variables */
-	u32 kw4l, kw4r, dw, tl, tr;
+	u32 kw4l, kw4r, dw;
 	u32 subL[34];
 	u32 subR[34];
 
 	/**
 	 *  key = (kll || klr || krl || krr || krll || krlr || krrl || krrr)
-	 *  (|| is concatination)
+	 *  (|| is concatenation)
 	 */
 	GETU32(kll,  key     );
 	GETU32(klr,  key +  4);
@@ -1439,92 +1431,7 @@ static void camellia_setup256(const unsi
 	/* kw1 */
 	subL[0] ^= kw4l; subR[0] ^= kw4r;
 
-	/* key XOR is end of F-function */
-	SUBKEY_L(0) = subL[0] ^ subL[2];/* kw1 */
-	SUBKEY_R(0) = subR[0] ^ subR[2];
-	SUBKEY_L(2) = subL[3];       /* round 1 */
-	SUBKEY_R(2) = subR[3];
-	SUBKEY_L(3) = subL[2] ^ subL[4]; /* round 2 */
-	SUBKEY_R(3) = subR[2] ^ subR[4];
-	SUBKEY_L(4) = subL[3] ^ subL[5]; /* round 3 */
-	SUBKEY_R(4) = subR[3] ^ subR[5];
-	SUBKEY_L(5) = subL[4] ^ subL[6]; /* round 4 */
-	SUBKEY_R(5) = subR[4] ^ subR[6];
-	SUBKEY_L(6) = subL[5] ^ subL[7]; /* round 5 */
-	SUBKEY_R(6) = subR[5] ^ subR[7];
-	tl = subL[10] ^ (subR[10] & ~subR[8]);
-	dw = tl & subL[8],  /* FL(kl1) */
-		tr = subR[10] ^ ROL1(dw);
-	SUBKEY_L(7) = subL[6] ^ tl; /* round 6 */
-	SUBKEY_R(7) = subR[6] ^ tr;
-	SUBKEY_L(8) = subL[8];       /* FL(kl1) */
-	SUBKEY_R(8) = subR[8];
-	SUBKEY_L(9) = subL[9];       /* FLinv(kl2) */
-	SUBKEY_R(9) = subR[9];
-	tl = subL[7] ^ (subR[7] & ~subR[9]);
-	dw = tl & subL[9],  /* FLinv(kl2) */
-		tr = subR[7] ^ ROL1(dw);
-	SUBKEY_L(10) = tl ^ subL[11]; /* round 7 */
-	SUBKEY_R(10) = tr ^ subR[11];
-	SUBKEY_L(11) = subL[10] ^ subL[12]; /* round 8 */
-	SUBKEY_R(11) = subR[10] ^ subR[12];
-	SUBKEY_L(12) = subL[11] ^ subL[13]; /* round 9 */
-	SUBKEY_R(12) = subR[11] ^ subR[13];
-	SUBKEY_L(13) = subL[12] ^ subL[14]; /* round 10 */
-	SUBKEY_R(13) = subR[12] ^ subR[14];
-	SUBKEY_L(14) = subL[13] ^ subL[15]; /* round 11 */
-	SUBKEY_R(14) = subR[13] ^ subR[15];
-	tl = subL[18] ^ (subR[18] & ~subR[16]);
-	dw = tl & subL[16], /* FL(kl3) */
-		tr = subR[18] ^ ROL1(dw);
-	SUBKEY_L(15) = subL[14] ^ tl; /* round 12 */
-	SUBKEY_R(15) = subR[14] ^ tr;
-	SUBKEY_L(16) = subL[16];     /* FL(kl3) */
-	SUBKEY_R(16) = subR[16];
-	SUBKEY_L(17) = subL[17];     /* FLinv(kl4) */
-	SUBKEY_R(17) = subR[17];
-	tl = subL[15] ^ (subR[15] & ~subR[17]);
-	dw = tl & subL[17], /* FLinv(kl4) */
-		tr = subR[15] ^ ROL1(dw);
-	SUBKEY_L(18) = tl ^ subL[19]; /* round 13 */
-	SUBKEY_R(18) = tr ^ subR[19];
-	SUBKEY_L(19) = subL[18] ^ subL[20]; /* round 14 */
-	SUBKEY_R(19) = subR[18] ^ subR[20];
-	SUBKEY_L(20) = subL[19] ^ subL[21]; /* round 15 */
-	SUBKEY_R(20) = subR[19] ^ subR[21];
-	SUBKEY_L(21) = subL[20] ^ subL[22]; /* round 16 */
-	SUBKEY_R(21) = subR[20] ^ subR[22];
-	SUBKEY_L(22) = subL[21] ^ subL[23]; /* round 17 */
-	SUBKEY_R(22) = subR[21] ^ subR[23];
-	tl = subL[26] ^ (subR[26] & ~subR[24]);
-	dw = tl & subL[24], /* FL(kl5) */
-		tr = subR[26] ^ ROL1(dw);
-	SUBKEY_L(23) = subL[22] ^ tl; /* round 18 */
-	SUBKEY_R(23) = subR[22] ^ tr;
-	SUBKEY_L(24) = subL[24];     /* FL(kl5) */
-	SUBKEY_R(24) = subR[24];
-	SUBKEY_L(25) = subL[25];     /* FLinv(kl6) */
-	SUBKEY_R(25) = subR[25];
-	tl = subL[23] ^ (subR[23] & ~subR[25]);
-	dw = tl & subL[25], /* FLinv(kl6) */
-		tr = subR[23] ^ ROL1(dw);
-	SUBKEY_L(26) = tl ^ subL[27]; /* round 19 */
-	SUBKEY_R(26) = tr ^ subR[27];
-	SUBKEY_L(27) = subL[26] ^ subL[28]; /* round 20 */
-	SUBKEY_R(27) = subR[26] ^ subR[28];
-	SUBKEY_L(28) = subL[27] ^ subL[29]; /* round 21 */
-	SUBKEY_R(28) = subR[27] ^ subR[29];
-	SUBKEY_L(29) = subL[28] ^ subL[30]; /* round 22 */
-	SUBKEY_R(29) = subR[28] ^ subR[30];
-	SUBKEY_L(30) = subL[29] ^ subL[31]; /* round 23 */
-	SUBKEY_R(30) = subR[29] ^ subR[31];
-	SUBKEY_L(31) = subL[30];     /* round 24 */
-	SUBKEY_R(31) = subR[30];
-	SUBKEY_L(32) = subL[32] ^ subL[31]; /* kw3 */
-	SUBKEY_R(32) = subR[32] ^ subR[31];
-
-	/* apply the inverse of the last half of P-function */
-	camellia_setup_tail(subkey, 32);
+	camellia_setup_tail(subkey, subL, subR, 32);
 }
 
 static void camellia_setup192(const unsigned char *key, u32 *subkey)
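A note for readers of the hunks above: in the unrolled code being removed, every plain round subkey follows one pattern, SUBKEY(i) = sub[i - 1] ^ sub[i + 1], for the L and R halves alike (SUBKEY_L(3) = subL[2] ^ subL[4], and so on). The sketch below folds runs of such assignments into a loop and spot-checks the loop against a few of the removed lines. xor_neighbours() and check_against_unrolled() are hypothetical names used only for illustration. This is not the body of the new camellia_setup_tail(), which this excerpt does not show.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/*
 * Hypothetical helper, not the kernel's camellia_setup_tail(): in the
 * unrolled code removed above, each plain round subkey is the XOR of its
 * two neighbours, SUBKEY(i) = sub[i - 1] ^ sub[i + 1], for the L and R
 * halves alike.  A run of such rounds becomes one loop.
 */
void xor_neighbours(u32 *out, const u32 *sub, int n, int first, int last)
{
	assert(first >= 1 && last + 1 < n);	/* sub[first - 1] .. sub[last + 1] must exist */
	for (int i = first; i <= last; i++)
		out[i] = sub[i - 1] ^ sub[i + 1];
}

/*
 * Plain-round runs (SUBKEY indices) in the removed code: 3..6, 11..14 and
 * 19..22 for 128-bit keys; the 256-bit code adds 27..30.  Subkeys next to
 * an FL/FLinv layer and the whitening subkeys use other formulas and are
 * not covered.  Returns 1 if the loop matches some unrolled assignments.
 */
int check_against_unrolled(void)
{
	u32 sub[34], out[34] = { 0 };

	for (int i = 0; i < 34; i++)
		sub[i] = 0x9e3779b9u * (u32)(i + 1);	/* arbitrary test data */

	xor_neighbours(out, sub, 34, 3, 6);
	xor_neighbours(out, sub, 34, 11, 14);
	xor_neighbours(out, sub, 34, 19, 22);
	xor_neighbours(out, sub, 34, 27, 30);

	/* spot checks against the unrolled lines, e.g. SUBKEY_L(3) = subL[2] ^ subL[4] */
	return out[3] == (sub[2] ^ sub[4]) &&
	       out[14] == (sub[13] ^ sub[15]) &&
	       out[22] == (sub[21] ^ sub[23]) &&
	       out[30] == (sub[29] ^ sub[31]);
}
```

Subkeys adjacent to an FL/FLinv layer (which fold in tl/tr) and the kw1/kw3 whitening subkeys still need their own handling in any de-unrolled version.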

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-19  4:30                               ` Denys Vlasenko
  2007-11-19 18:49                                 ` Noriaki TAKAMIYA
@ 2007-11-21  3:53                                 ` Herbert Xu
  2007-11-21  8:08                                   ` Denys Vlasenko
  1 sibling, 1 reply; 40+ messages in thread
From: Herbert Xu @ 2007-11-21  3:53 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Noriaki TAKAMIYA, David Miller, linux-crypto

On Sun, Nov 18, 2007 at 08:30:16PM -0800, Denys Vlasenko wrote:
>
> Oh, Herbert, have a heart: my camellia.c source file is smaller
> than the one I started from. It's not like it's twice as big.
> It's smaller already.
> 
> 64-bit key setup is not just faster, it is also smaller
> by ~4k, and this benefit is always there, not only when
> key setup is performed.

The key setup path is the slow path so I don't see why we can't
just switch to the 64-bit version if it's better.

BTW I tried to apply your -6 patch but it doesn't apply against
cryptodev-2.6 so I had to drop it.

Please make sure that you only send one patch per email and that
they apply against cryptodev-2.6.  If the patches have dependencies
then please make that clear as well.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-21  3:53                                 ` Herbert Xu
@ 2007-11-21  8:08                                   ` Denys Vlasenko
  2007-11-21  8:12                                     ` Herbert Xu
  0 siblings, 1 reply; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-21  8:08 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Noriaki TAKAMIYA, David Miller, linux-crypto

On Tuesday 20 November 2007 19:53, Herbert Xu wrote:
> On Sun, Nov 18, 2007 at 08:30:16PM -0800, Denys Vlasenko wrote:
> > Oh, Herbert, have a heart: my camellia.c source file is smaller
> > than the one I started from. It's not like it's twice as big.
> > It's smaller already.
> >
> > 64-bit key setup is not just faster, it is also smaller
> > by ~4k, and this benefit is always there, not only when
> > key setup is performed.
>
> The key setup path is the slow path so I don't see why we can't
> just switch to the 64-bit version if it's better.

Yes, with minor modifications the "64-bit" version
can be compiled and will work correctly on a 32-bit CPU.
But it will be larger. This is what I got on i386:

   text    data     bss     dec     hex filename
  18230     224       0   18454    4816 t/crypto/camellia.o
  20198     224       0   20422    4fc6 t_fake64/crypto/camellia.o
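For readers wondering why the forced-64-bit build comes out larger on i386: Camellia's key schedule rotates 128-bit quantities, and a 64-bit-ized version can hold such a value in two u64 halves instead of four u32 words. The sketch below shows both forms of a 128-bit left rotation; rol128_u32() and rol128_u64() are hypothetical names, not code from these patches. On a 64-bit CPU the u64 form needs half the shifts and ORs. On i386 the compiler has to build each u64 shift out of 32-bit instructions, which is one plausible contributor to the 1968-byte difference in the table above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/*
 * 32-bit style: a 128-bit value held as four u32 words, w[0] most
 * significant, rotated left by n bits (0 < n < 32).
 */
void rol128_u32(u32 w[4], unsigned int n)
{
	u32 w0 = w[0];	/* saved: w[3] needs the old top word */

	assert(n > 0 && n < 32);
	w[0] = (w[0] << n) | (w[1] >> (32 - n));
	w[1] = (w[1] << n) | (w[2] >> (32 - n));
	w[2] = (w[2] << n) | (w[3] >> (32 - n));
	w[3] = (w[3] << n) | (w0 >> (32 - n));
}

/*
 * 64-bit style: the same 128-bit value as two u64 halves, rotated left
 * by n bits (0 < n < 64).  Four shifts and two ORs instead of eight and
 * four; on a 32-bit CPU each u64 operation expands into several
 * 32-bit instructions.
 */
void rol128_u64(u64 *hi, u64 *lo, unsigned int n)
{
	u64 h = *hi, l = *lo;

	assert(n > 0 && n < 64);
	*hi = (h << n) | (l >> (64 - n));
	*lo = (l << n) | (h >> (64 - n));
}
```

For any 0 < n < 32, rotating four words with rol128_u32() and the same value packed as (w[0]:w[1], w[2]:w[3]) with rol128_u64() should give identical results.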

> BTW I tried to apply your -6 patch but it doesn't apply against
> cryptodev-2.6 so I had to drop it.

Will correct this and re-post.
--
vda


* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-21  8:08                                   ` Denys Vlasenko
@ 2007-11-21  8:12                                     ` Herbert Xu
  2007-11-21  8:38                                       ` Denys Vlasenko
  0 siblings, 1 reply; 40+ messages in thread
From: Herbert Xu @ 2007-11-21  8:12 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Noriaki TAKAMIYA, David Miller, linux-crypto

On Wed, Nov 21, 2007 at 12:08:57AM -0800, Denys Vlasenko wrote:
>
> Yes, with minor modifications the "64-bit" version
> can be compiled and will work correctly on a 32-bit CPU.
> But it will be larger. This is what I got on i386:
> 
>    text    data     bss     dec     hex filename
>   18230     224       0   18454    4816 t/crypto/camellia.o
>   20198     224       0   20422    4fc6 t_fake64/crypto/camellia.o

What are the size differences on x86-64?

Thanks,


* Re: [camellia-oss:00952] Re: [PATCH 5/5] camellia: de-unrolling, 64bit-ization
  2007-11-21  8:12                                     ` Herbert Xu
@ 2007-11-21  8:38                                       ` Denys Vlasenko
  0 siblings, 0 replies; 40+ messages in thread
From: Denys Vlasenko @ 2007-11-21  8:38 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Noriaki TAKAMIYA, David Miller, linux-crypto

On Wednesday 21 November 2007 00:12, Herbert Xu wrote:
> On Wed, Nov 21, 2007 at 12:08:57AM -0800, Denys Vlasenko wrote:
> > Yes, with minor modifications the "64-bit" version
> > can be compiled and will work correctly on a 32-bit CPU.
> > But it will be larger. This is what I got on i386:
> >
> >    text    data     bss     dec     hex filename
> >   18230     224       0   18454    4816 t/crypto/camellia.o
> >   20198     224       0   20422    4fc6 t_fake64/crypto/camellia.o
>
> What are the size differences on x86-64?

The above sizes were for the final code (with all patches applied)
built for i386, versus the same code with #if BITS_PER_LONG >= 64
replaced by #if 1 (plus a few fixes for "integer is too big for long"
warnings).

For 64-bit builds, replacing that #if is a no-op, so the sizes
will be the same.
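To make the "#if BITS_PER_LONG >= 64 replaced by #if 1" experiment concrete, here is a userspace sketch of that kind of compile-time switch. BITS_PER_LONG is a kernel macro, so the sketch derives a stand-in, SKETCH_BITS_PER_LONG (a hypothetical name), from <limits.h>. Both branches XOR a 32-bit (L, R) pair with Camellia's first key-schedule constant, Sigma1 = 0xA09E667F3BCC908B (RFC 3713), and give identical results; forcing the first branch on a 32-bit target only changes the generated code. The 64-bit literal is written with an explicit width via UINT64_C. An unsuffixed literal of that size on a target with a 32-bit long is presumably the kind of thing behind the "integer is too big for long" warnings mentioned above, but that is an assumption, not something the thread states.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical userspace stand-in for the kernel's BITS_PER_LONG. */
#if ULONG_MAX > 0xffffffffUL
#define SKETCH_BITS_PER_LONG 64
#else
#define SKETCH_BITS_PER_LONG 32
#endif

/* Camellia Sigma1 (RFC 3713), as one 64-bit literal and as two 32-bit halves. */
#define SIGMA1_64  UINT64_C(0xA09E667F3BCC908B)	/* explicit 64-bit width */
#define SIGMA1_L   0xA09E667FU
#define SIGMA1_R   0x3BCC908BU

/* XOR an (L, R) pair with Sigma1; both branches compute the same result. */
#if SKETCH_BITS_PER_LONG >= 64	/* "#if 1" here would force this branch on i386 too */
void xor_sigma1(u32 *l, u32 *r)
{
	u64 x = ((u64)*l << 32) | *r;	/* pack L into the high half */

	x ^= SIGMA1_64;			/* one 64-bit XOR for both halves */
	*l = (u32)(x >> 32);
	*r = (u32)x;
}
#else
void xor_sigma1(u32 *l, u32 *r)
{
	*l ^= SIGMA1_L;			/* two 32-bit XORs */
	*r ^= SIGMA1_R;
}
#endif

/* Self-check: applying xor_sigma1() twice must restore the input. */
void xor_sigma1_selftest(u32 l, u32 r)
{
	u32 l2 = l, r2 = r;

	xor_sigma1(&l2, &r2);
	xor_sigma1(&l2, &r2);
	assert(l2 == l && r2 == r);
}

/* Reports which branch was compiled in. */
int sketch_bits_per_long(void)
{
	return SKETCH_BITS_PER_LONG;
}
```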

If you are asking about a 64-bit size comparison *across patches*
5..8 (with camellia4 as the baseline), here they are:

64-bit:
dec      hex   filename
22786    5902  2.6.23.1.camellia4.t64/crypto/camellia.o
21422    53ae  2.6.23.1.camellia5.t64/crypto/camellia.o
16355    3fe3  2.6.23.1.camellia6.t64/crypto/camellia.o
15813    3dc5  2.6.23.1.camellia7.t64/crypto/camellia.o
15670    3d36  2.6.23.1.camellia8.t64/crypto/camellia.o
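The per-patch savings can be read straight off the 64-bit table. The small program below is a convenience sketch, not anything posted in the thread: it records the "dec" and "hex" columns, cross-checks them, and prints how many bytes each tree saves relative to the previous one, with camellia4 as the baseline. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* "dec" and "hex" columns of the 64-bit size table above (total bytes of camellia.o). */
static const struct {
	const char *tree;
	long dec;
	long hex;
} sizes64[] = {
	{ "camellia4", 22786, 0x5902 },
	{ "camellia5", 21422, 0x53ae },
	{ "camellia6", 16355, 0x3fe3 },
	{ "camellia7", 15813, 0x3dc5 },
	{ "camellia8", 15670, 0x3d36 },
};

#define NSIZES (sizeof(sizes64) / sizeof(sizes64[0]))

/* 1 if every "dec" entry matches its "hex" entry. */
int columns_agree(void)
{
	for (size_t i = 0; i < NSIZES; i++)
		if (sizes64[i].dec != sizes64[i].hex)
			return 0;
	return 1;
}

/* Bytes saved by tree i relative to tree i - 1. */
long saved_by(size_t i)
{
	assert(i >= 1 && i < NSIZES);
	return sizes64[i - 1].dec - sizes64[i].dec;
}

/* Print per-step savings and the overall change from camellia4 to camellia8. */
void print_savings(void)
{
	long total = sizes64[0].dec - sizes64[NSIZES - 1].dec;

	for (size_t i = 1; i < NSIZES; i++)
		printf("%s -> %s: %ld bytes smaller\n",
		       sizes64[i - 1].tree, sizes64[i].tree, saved_by(i));
	printf("total: %ld bytes (%.1f%%)\n", total,
	       100.0 * (double)total / (double)sizes64[0].dec);
}
```

With the posted numbers the steps save 1364, 5067, 542 and 143 bytes, 7116 bytes in total, or about 31% of the camellia4 object.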

--
vda



Thread overview: 40+ messages
2007-10-25 11:43 [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization Denys Vlasenko
2007-10-25 11:45 ` [PATCH 1/5] camellia: cleanup Denys Vlasenko
2007-10-26  8:43   ` Noriaki TAKAMIYA
2007-11-06 14:17   ` Herbert Xu
2007-10-25 11:45 ` [PATCH 2/5] " Denys Vlasenko
2007-10-26  8:44   ` Noriaki TAKAMIYA
2007-11-06 14:19   ` Herbert Xu
2007-10-25 11:46 ` [PATCH 3/5] " Denys Vlasenko
2007-10-26  8:44   ` Noriaki TAKAMIYA
2007-11-06 14:21   ` Herbert Xu
2007-10-25 11:47 ` [PATCH 4/5] camellia: de-unrolling Denys Vlasenko
2007-10-26  8:45   ` Noriaki TAKAMIYA
2007-11-06 14:21   ` Herbert Xu
2007-10-25 11:48 ` [PATCH 5/5] camellia: de-unrolling, 64bit-ization Denys Vlasenko
2007-10-26  8:45   ` Noriaki TAKAMIYA
2007-11-06 14:23   ` Herbert Xu
2007-11-07 13:22     ` Denys Vlasenko
2007-11-08 13:30       ` Herbert Xu
2007-11-13  6:07         ` Noriaki TAKAMIYA
2007-11-13  6:25           ` [camellia-oss:00952] " Noriaki TAKAMIYA
2007-11-13 22:34             ` Denys Vlasenko
2007-11-14  1:41               ` David Miller
2007-11-14  2:47                 ` Denys Vlasenko
2007-11-14  3:49                   ` David Miller
2007-11-14  5:30                     ` Denys Vlasenko
2007-11-14  6:10                       ` David Miller
2007-11-14  7:38                         ` Denys Vlasenko
2007-11-14  7:15                       ` Denys Vlasenko
2007-11-14 14:14                         ` Herbert Xu
2007-11-14 21:28                           ` Denys Vlasenko
2007-11-18 13:21                             ` Herbert Xu
2007-11-19  4:30                               ` Denys Vlasenko
2007-11-19 18:49                                 ` Noriaki TAKAMIYA
2007-11-21  2:44                                   ` Denys Vlasenko
2007-11-21  3:53                                 ` Herbert Xu
2007-11-21  8:08                                   ` Denys Vlasenko
2007-11-21  8:12                                     ` Herbert Xu
2007-11-21  8:38                                       ` Denys Vlasenko
2007-11-14  4:18                   ` Noriaki TAKAMIYA
2007-10-25 11:57 ` [PATCH0/5] camellia: cleanup, de-unrolling, and 64bit-ization Denys Vlasenko
