From: Konstantin Ananyev <konstantin.ananyev-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: dev-VfR2kkLFssw@public.gmane.org
Subject: [PATCH 13/17] librte_acl: move lo/hi dwords shuffle out from calc_addr
Date: Sun, 14 Dec 2014 18:10:55 +0000
Message-ID: <1418580659-12595-14-git-send-email-konstantin.ananyev@intel.com>
In-Reply-To: <1418580659-12595-1-git-send-email-konstantin.ananyev-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Reorganise the SSE code path a bit by moving the lo/hi dwords shuffle
out of calc_addr().
This makes calc_addr() for SSE and AVX2 practically identical
and opens an opportunity for further code deduplication.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
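Note for readers: the shuffle being moved splits four 64-bit transitions,
held in two 128-bit registers, into one register of low dwords and one of
high dwords. Below is a minimal standalone sketch of that step, assuming a
little-endian x86 target with SSE2. The helper name split_lo_hi() and the
plain-array interface are hypothetical and are not part of librte_acl; the
two immediates (0x88 and 0xdd) are the ones used in the patch.

```c
#include <stdint.h>
#include <xmmintrin.h>	/* _mm_shuffle_ps */
#include <emmintrin.h>	/* __m128i, unaligned load/store, cast intrinsics */

/*
 * Hypothetical helper (not part of librte_acl): split four 64-bit
 * values into their low and high 32-bit halves.
 * tr[0..1] play the role of *indices1, tr[2..3] the role of *indices2.
 */
static void
split_lo_hi(const uint64_t tr[4], uint32_t lo[4], uint32_t hi[4])
{
	__m128i indices1 = _mm_loadu_si128((const __m128i *)&tr[0]);
	__m128i indices2 = _mm_loadu_si128((const __m128i *)&tr[2]);

	/* 0x88 picks dwords 0 and 2 of each source: the low halves. */
	__m128i tr_lo = _mm_castps_si128(_mm_shuffle_ps(
		_mm_castsi128_ps(indices1), _mm_castsi128_ps(indices2),
		0x88));

	/* 0xdd picks dwords 1 and 3 of each source: the high halves. */
	__m128i tr_hi = _mm_castps_si128(_mm_shuffle_ps(
		_mm_castsi128_ps(indices1), _mm_castsi128_ps(indices2),
		0xdd));

	_mm_storeu_si128((__m128i *)lo, tr_lo);
	_mm_storeu_si128((__m128i *)hi, tr_hi);
}
```

With the split done in the caller, calc_addr_sse() only consumes tr_lo
(node type and address) and tr_hi (range bytes), which is what lets its
body line up with the AVX2 variant.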
 lib/librte_acl/acl_run_sse.h | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/lib/librte_acl/acl_run_sse.h b/lib/librte_acl/acl_run_sse.h
index 1b7870e..4a174e9 100644
--- a/lib/librte_acl/acl_run_sse.h
+++ b/lib/librte_acl/acl_run_sse.h
@@ -172,9 +172,9 @@ acl_match_check_x4(int slot, const struct rte_acl_ctx *ctx, struct parms *parms,
  */
 static inline __attribute__((always_inline)) xmm_t
 calc_addr_sse(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
-	xmm_t ones_16, xmm_t indices1, xmm_t indices2)
+	xmm_t ones_16, xmm_t tr_lo, xmm_t tr_hi)
 {
-	xmm_t addr, node_types, range, temp;
+	xmm_t addr, node_types;
 	xmm_t dfa_msk, dfa_ofs, quad_ofs;
 	xmm_t in, r, t;
 
@@ -187,18 +187,14 @@ calc_addr_sse(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
 	 * it reaches a match.
 	 */
 
-	/* Shuffle low 32 into temp and high 32 into indices2 */
-	temp = (xmm_t)MM_SHUFFLEPS((__m128)indices1, (__m128)indices2, 0x88);
-	range = (xmm_t)MM_SHUFFLEPS((__m128)indices1, (__m128)indices2, 0xdd);
-
 	t = MM_XOR(index_mask, index_mask);
 
 	/* shuffle input byte to all 4 positions of 32 bit value */
 	in = MM_SHUFFLE8(next_input, shuffle_input);
 
 	/* Calc node type and node addr */
-	node_types = MM_ANDNOT(index_mask, temp);
-	addr = MM_AND(index_mask, temp);
+	node_types = MM_ANDNOT(index_mask, tr_lo);
+	addr = MM_AND(index_mask, tr_lo);
 
 	/*
 	 * Calc addr for DFAs - addr = dfa_index + input_byte
@@ -211,7 +207,7 @@ calc_addr_sse(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
 	r = _mm_add_epi8(r, range_base);
 
 	t = _mm_srli_epi32(in, 24);
-	r = _mm_shuffle_epi8(range, r);
+	r = _mm_shuffle_epi8(tr_hi, r);
 
 	dfa_ofs = _mm_sub_epi32(t, r);
 
@@ -224,22 +220,22 @@ calc_addr_sse(xmm_t index_mask, xmm_t next_input, xmm_t shuffle_input,
 	 */
 
 	/* check ranges */
-	temp = MM_CMPGT8(in, range);
+	t = MM_CMPGT8(in, tr_hi);
 
 	/* convert -1 to 1 (bytes greater than input byte */
-	temp = MM_SIGN8(temp, temp);
+	t = MM_SIGN8(t, t);
 
 	/* horizontal add pairs of bytes into words */
-	temp = MM_MADD8(temp, temp);
+	t = MM_MADD8(t, t);
 
 	/* horizontal add pairs of words into dwords */
-	quad_ofs = MM_MADD16(temp, ones_16);
+	quad_ofs = MM_MADD16(t, ones_16);
 
-	/* mask to range type nodes */
-	temp = _mm_blendv_epi8(quad_ofs, dfa_ofs, dfa_msk);
+	/* blend DFA and QUAD/SINGLE. */
+	t = _mm_blendv_epi8(quad_ofs, dfa_ofs, dfa_msk);
 
 	/* add index into node position */
-	return MM_ADD32(addr, temp);
+	return MM_ADD32(addr, t);
 }
 
 /*
@@ -249,13 +245,19 @@ static inline __attribute__((always_inline)) xmm_t
 transition4(xmm_t next_input, const uint64_t *trans,
 	xmm_t *indices1, xmm_t *indices2)
 {
-	xmm_t addr;
+	xmm_t addr, tr_lo, tr_hi;
 	uint64_t trans0, trans2;
 
+	/* Shuffle low 32 into tr_lo and high 32 into tr_hi */
+	tr_lo = (xmm_t)_mm_shuffle_ps((__m128)*indices1, (__m128)*indices2,
+		0x88);
+	tr_hi = (xmm_t)_mm_shuffle_ps((__m128)*indices1, (__m128)*indices2,
+		0xdd);
+
 	 /* Calculate the address (array index) for all 4 transitions. */
 
 	addr = calc_addr_sse(xmm_index_mask.x, next_input, xmm_shuffle_input.x,
-		xmm_ones_16.x, *indices1, *indices2);
+		xmm_ones_16.x, tr_lo, tr_hi);
 
 	 /* Gather 64 bit transitions and pack back into 2 registers. */
 
-- 
1.8.5.3
