Re: [dpdk-dev] [RFC PATCH v1] regexdev: introduce regexdev subsystem

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Wang Xiang <xiang.w.wang@intel.com>
To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
Cc: Thomas Monjalon <thomas@monjalon.net>,
	"dev@dpdk.org" <dev@dpdk.org>,
	Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>,
	Shahaf Shuler <shahafs@mellanox.com>,
	Hemant Agrawal <hemant.agrawal@nxp.com>,
	Opher Reviv <opher@mellanox.com>,
	Alex Rosenbaum <alexr@mellanox.com>,
	Dovrat Zifroni <dovrat@marvell.com>,
	Prasun Kapoor <pkapoor@marvell.com>,
	Nipun Gupta <nipun.gupta@nxp.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	"Hong, Yang A" <yang.a.hong@intel.com>,
	"Chang, Harry" <harry.chang@intel.com>,
	"gu.jian1@zte.com.cn" <gu.jian1@zte.com.cn>,
	"shanjiangh@chinatelecom.cn" <shanjiangh@chinatelecom.cn>,
	"zhangy.yun@chinatelecom.cn" <zhangy.yun@chinatelecom.cn>,
	"lixingfu@huachentel.com" <lixingfu@huachentel.com>,
	"wushuai@inspur.com" <wushuai@inspur.com>,
	"yuyingxia@yxlink.com" <yuyingxia@yxlink.com>,
	"fanchenggang@sunyainfo.com" <fanchenggang@sunyainfo.com>,
	"davidfgao@tencent.com" <davidfgao@tencent.com>,
	"liuzhong1@chinaunicom.cn" <liuzhong1@chinaunicom.cn>,
	"zhaoyong11@huawei.com" <zhaoyong11@huawei.com>,
	"oc@yunify.com" <oc@yunify.com>,
	"jim@netgate.com" <jim@netgate.com>,
	"Ni, Hongjun" <hongjun.ni@intel.com>,
	"j.bromhead@titan-ic.com" <j.bromhead@titan-ic.com>,
	"deri@ntop.org" <deri@ntop.org>,
	"fc@napatech.com" <fc@napatech.com>,
	"arthur.su@lionic.com" <arthur.su@lionic.com>,
	Guy Kaneti <guyk@marvell.com>, Smadar Fuks <smadarf@marvell.com>,
	Liron Himi <lironh@marvell.com>,
	edwin.verplanke@intel.com, keith.wiles@intel.com
Subject: Re: [dpdk-dev] [RFC PATCH v1] regexdev: introduce regexdev subsystem
Date: Thu, 19 Sep 2019 09:58:57 -0400	[thread overview]
Message-ID: <20190919135857.GA82263@hs1> (raw)
In-Reply-To: <BYAPR18MB2424A91AF798B957AD29C9E2C8B60@BYAPR18MB2424.namprd18.prod.outlook.com>

Hi Jerin,

Thanks for your response. More comments below and inline.

1) I think the size of some varaibles (e.g. nb_matches, scan_size,
matching offset, etc) should be increased based on what Hyperscan supports.

    a) struct rte_regex_ops:

        uint16_t scan_size => uint32_t scan_size
        uint8_t nb_actual_matches => uint64 nb_actual_matches
        uint8_t nb_matches => uint64 nb__matches

    b) struct rte_regex_match:
        uint16_t offset => uint32_t offset
        uint16_t len => uint32_t len

    c) uint16_t
        rte_regex_rule_db_update(uint8_t dev_id, const struct rte_regex_rule *rules,
                                 uint16_t nb_rules);
    =>
       uint32_t
        rte_regex_rule_db_update(uint8_t dev_id, const struct rte_regex_rule *rules,
                                 uint32_t nb_rules);

    d) int
    rte_regex_queue_pair_setup(uint8_t dev_id, uint8_t queue_pair_id,
                    const struct rte_regex_qp_conf *qp_conf);
    =>
       int
    rte_regex_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
                    const struct rte_regex_qp_conf *qp_conf);

    e) struct rte_regex_dev_config:
        uint8_t nb_max_matches => uint64_t nb_max_matches

    f) struct rte_regex_dev_info:
        uint8_t max_matches => uint64_t max_matches

2) There are rte_regex_dev_attr_get() and rte_regex_dev_attr_set() defined.
Are all the attributes below could be set by users? Is any of them read-only?

/** Enumerates RegEx device attribute identifier */
enum rte_regex_dev_attr_id {
    RTE_REGEX_DEV_ATTR_SOCKET_ID,
    /**< The NUMA socket id to which the device is connected or
     * a default of zero if the socket could not be determined.
     * datatype: *int*
     * operation: *get*
     */
    RTE_REGEX_DEV_ATTR_MAX_MATCHES,
    /**< Maximum number of matches per scan.
     * datatype: *uint8_t*
     * operation: *get* and *set*
     *
     * @see RTE_REGEX_OPS_RSP_MAX_MATCH_F
     */
    RTE_REGEX_DEV_ATTR_MAX_SCAN_TIMEOUT,
    /**< Upper bound scan time in ns.
     * datatype: *uint16_t*
     * operation: *get* and *set*
     *
     * @see RTE_REGEX_OPS_RSP_MAX_SCAN_TIMEOUT_F
     */
    RTE_REGEX_DEV_ATTR_MAX_PREFIX,
    /**< Maximum number of prefix detected per scan.
     * This would be useful for denial of service detection.
     * datatype: *uint16_t*
     * operation: *get* and *set*
     *
     * @see RTE_REGEX_OPS_RSP_MAX_PREFIX_F
     */
};

3) Both RTE_REGEX_PCRE_RULE_* and
RTE_REGEX_DEV_PCRE_UNSUP_* can be viewed as device capabilities. Can we
merge them with RTE_REGEX_DEV_CAPA_RUNTIME_COMPILATION_F and have a
unified regex_dev_capa in struct rte_regex_dev_info.


4) It'll be good if we can also define synchronous matching API for users who
want to have a one-off scan and wait for the results.

On Tue, Sep 10, 2019 at 08:05:39AM +0000, Jerin Jacob Kollanukkaran wrote:
> Hi Xiang,
> 
> Sorry for delay in response(Was busy with 19.11 proposal deadline). Please see inline.
>  
> > 
> > Reply to Xiang's queries in main thread:
> > 
> > Hi all,
> > 
> > Some questions regarding APIs. Could you please give more insights?
> > 
> > 1) rte_regex_ops
> >       a) rsp_flags
> >       These two flags RTE_REGEX_OPS_RSP_PMI_SOJ_F and
> > RTE_REGEX_OPS_RSP_PMI_EOJ_F are used for cross buffer scan.
> >       RTE_REGEX_OPS_RSP_PMI_EOJ_F tells whether we have a partial match
> > at the end of current buffer after scan.
> >       What's the purpose of having RTE_REGEX_OPS_RSP_PMI_SOJ_F?
> > 
> > [Jerin] Since we need three states to represent partial match buffer,
> > RTE_REGEX_OPS_RSP_PMI_SOJ_F to
> > represent start of the buffer, intermediate buffers with no flag, and end of
> > the buffer with RTE_REGEX_OPS_RSP_PMI_EOJ
> 
> > [Xiang] How could a user leverage these flags for matching? Suppose a large
> > buffer is divided into multiple chunks. Will RTE_REGEX_OPS_RSP_PMI_SOJ_F
> > cause an early quit once it isn't set after scan the first chunk. Similarly,
> > RTE_REGEX_OPS_RSP_PMI_EOJ tells a user whether to stop matching future
> > buffers after finish the last chunk?
> 
> Let me describe with an example,
> 
> Assume,
> 1) struct rte_regex_dev_info:: max_payload_size set to 1024
> 2) rte_regex_dev_config:: dev_cfg_flags configured with RTE_REGEX_DEV_CFG_CROSS_BUFFER_SCAN_F
> 3) Device programmed with matching "hello\s+world" pattern
> 4) user enqueue struct rte_regex_ops:: buf_addr point following "data" and struct rte_regex_op:: scan_size = 1024
> 
> data[0..1021] = data don???t have hello world pattern
> data[1022] = 'h'
> data[1023] = 'e'
> 
> 5) user enqueue struct rte_regex_ops:: buf_addr point following "data" and struct rte_regex_op:: scan_size = 9
> 
> data[0] = 'l'
> data[1] = 'l'
> data[2] = 'o'
> data[3] = ' '
> data[4] = 'w'
> data[5] = 'o'
> data[6] = 'r'
> data[7] = 'l'
> data[8] = 'd'
> 
> If so,
> 
> Response to 4) will be RTE_REGEX_OPS_RSP_PMI_SOJ_F in rte_regex_ops:: rsp_flags on dequeue
> Where rte_regex_match:: offset is 1022 and len 2
> 
> Response to 5) will be RTE_REGEX_OPS_RSP_PMI_EOJ_F in rte_regex_ops:: rsp_flags on dequeue
> Where rte_regex_match:: offset is 0 and len 9
>
If the defined pattern is "hello.*world" instead of "hello\s+world", and
we enqueue following struct rte_regex_ops:

1) rte_regex_op:: scan_size = 1024

   data[0..1021] = data don???t have hello world pattern
   data[1022] = 'h'
   data[1023] = 'e'

2) rte_regex_op:: scan_size = 9
   data[0] = 'l'
   data[1] = 'l'
   data[2] = 'o'
   data[3] = ' '
   data[4] = 'w'
   data[5] = 'o'
   data[6] = 'r'
   data[7] = 'l'
   data[8] = 'd'

3) rte_regex_op:: scan_size = 5
   data[0] = 'w'
   data[1] = 'o'
   data[2] = 'r'
   data[3] = 'l'
   data[4] = 'd'

Will response to 3) have RTE_REGEX_OPS_RSP_PMI_EOJ_F in rte_regex_ops::
rsp_flags on dequeue
Where rte_regex_match:: offset is 0 and len 4?

I am wondering what's your expected behavior for .* or similar syntax and if
there are syntax compatability issues. We report all matches in Hyperscan,
e.g. report end match offsets 11 and 16 for pattern "hello.*world" and
corpus "hello worldworld".

BTW, not sure how other hardware devices handle cross buffer scan. Hyperscan
doesn't reports matches for start and intermediate buffers but only reports
end offset if a full match is found.

> 
> > 
> >       RTE_REGEX_OPS_RSP_MAX_PREFIX_F: This looks like a definition for a
> > specific hardware implementation. I am wondering what this PREFIX refers
> > to:)?
> > 
> > [Jerin] Yes. Looks like it is for hardware specific implementation. Introduced
> > rte_regex_dev_attr_set/get functions to make it portable and
> > To add new implementation specific fields.
> > For example, if a rule is
> > /ABCDEF.*XYZ/, ABCD is considered the prefix, and EF.*XYZ is considered the
> > factor. The prefix is a literal
> > string, while the factor can contain complex regular expression constructs. As
> > a result, rule matching occurs in
> > two stages: prefix matching and factor matching.
> > 
> >       b)  user_id or user_ptr
> >       Under what kind of circumstances should an application pass value into
> > these variables for enqueue and dequeuer operations?
> > 
> > [Jerin] Just like rte_crypto_ops, struct rte_regex_ops also allocated using
> > mempool normally, on enqueue, user can specify user_id
> > If needed to in order identify the op on dequeue if required. The use case
> > could be to store the sequence number from application
> > POV or storing the mbuf ptr in which pattern is requested etc.
> > 
> > 
> >  2) rte_regex_match
> >       a) offset; /**< Starting Byte Position for matched rule. */ and  uint16_t
> > len; /**< Length of match in bytes */
> >       Looks like the matching offset is defined as *starting matching offset*
> > instead of *end matching offset*, e.g. report the offset of "a" instead of "c"
> > for pattern "abc".
> >       If so, this makes it hard to integrate software regex libraries such as
> > Hyperscan and RE2 as they only report *end matching offset* without length
> > of match.
> >       Although Hyperscan has API for *starting matching offset*, it only delivers
> > partial syntax support. So I think we have to define *end of matching offset*
> > for software solutions.
> > 
> > [Jerin] I understand the hyperscan's HS_FLAG_SOM_LEFTMOST tradeoffs. I
> > thought application would need always the length of the match.
> > Probably we will see how other HW implementation (from Mellanox) etc. We
> > will try to abstract it, probably we can make it as function of "user
> > requested".
> > [Xiang] Yes, it will be good to make it per user request. At least from
> > Hyperscan user's point of view, start of match and match length are not
> > mandatory.
> 
> OK. I think, we can introduce RTE_REGEX_DEV_CFG_MATCH_AS_START
> In device configure.
> 
> Since offset+len == end, we can introduce following generic inline function.
> 
> static inline 
> rte_regex_match_end(truct rte_regex_match *match)
> {
> 	match->offset + match->len;
> }
> 
> Example:  pattern to match is  "hello\s+world"  and data is following
> data[4] = 'h'
> data[5] = 'e'
> data[6] = 'l'
> data[7] = 'l'
> data[8] = 'o'
> data[9] = ' '
> data[10] = 'w'
> data[11] = 'o'
> data[12] = 'r'
> data[13] = 'l'
> data[14] = 'd'
> 
> if device is configured with RTE_REGEX_DEV_CFG_MATCH_AS_START
> match->offset returns 4
> match->len returns 11
> 
> if device is NOT configured with RTE_REGEX_DEV_CFG_MATCH_AS_START
> driver MAY return the following(in hyperscan case)
> match->offset returns 0
> match->len returns 11 + 4
> 
> In both case(irrespective of flags, to make application life easy) rte_regex_match_end() would return 15.
> If application demands for MATCH_AS_START then driver can return match->offset returns 4 and match->len returns 11
> Aka set HS_FLAG_SOM_LEFTMOST in hyperscan driver, But application should use rte_regex_match_end()
> for finding the end of the match. To make, work in all cases.
> 
> Is it OK? 
> 
Can we replace len with end offset? So we can change "offset" to
"start_offset" and len to "end_ offset" in struct rte_regex_match. Users
interested in len could take "end_offset - start_offset".
We may also change RTE_REGEX_DEV_CFG_MATCH_AS_START to RTE_REGEX_DEV_CFG_MATCH_START

In your example,
if device is configured with RTE_REGEX_DEV_CFG_MATCH_START
match->start_offset returns 4
match->end_offset returns 15

if device is NOT configured with RTE_REGEX_DEV_CFG_MATCH_START
match->start_offset returns 0
match->end_offset returns 15

> > 
> > 3)  rte_regex_rule_db_update()
> >     Does this mean we can dynamically add or delete rules for an already
> > generated database without recompile from scratch for hardware Regex
> > implementation?
> >     If so, this isn't possible for software solutions as they don't support
> > dynamic database update and require recompile.
> > 
> > [Jerin] rte_regex_rule_db_update() internally it would call recompile
> > function for both HW and SW.
> > See rte_regex_dev_config::rule_db in rte_regex_dev_configure() for
> > precompiled rule database case.
> > [Xiang] OK, sounds like we have to save the original rule-set for the device in
> > order to do recompile. I see both ADD and REMOVE operators from
> > rte_regex_rule.
> > For rules with REMOVE operator, what's the expected behavior to handle
> > them for the old rule-set? Do we need to go through the old rule-set and
> > remove corresponding rules before doing recompile?
> 
> Yes.
>
I think it'll be better to change rte_regex_rule_db_update() to
rte_regex_rule_compile() and have users to provide a full rule-set.
So we don't have to maintain old rule-set and decide which one to keep
and remove. We can simply recompile new rule-set and get rid of
rte_regex_rule_op in this case.

next prev parent reply	other threads:[~2019-09-19  6:40 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-27 15:50 [dpdk-dev] [RFC PATCH v1] regexdev: introduce regexdev subsystem jerinj
2019-07-15  4:26 ` Jerin Jacob Kollanukkaran
2019-08-15  9:35 ` Thomas Monjalon
2019-08-15 11:34   ` Thomas Monjalon
2019-08-19  3:09     ` Jerin Jacob Kollanukkaran
2019-08-20  1:54       ` Wang, Xiang W
2019-09-10  8:05         ` Jerin Jacob Kollanukkaran
2019-09-19 13:58           ` Wang Xiang [this message]
2019-09-27 14:35             ` Jerin Jacob Kollanukkaran
2019-10-14 13:59               ` Wang Xiang
2020-01-26 11:55                 ` Ori Kam
2019-08-21  5:32     ` Shahaf Shuler
2019-08-21 15:12       ` John Bromhead
2019-09-10 10:31       ` Jerin Jacob Kollanukkaran
2019-09-10 11:02       ` Jerin Jacob Kollanukkaran
2019-09-27 14:45         ` Jerin Jacob Kollanukkaran
2019-10-02  5:53           ` Shahaf Shuler
2019-10-02  8:31             ` Jerin Jacob Kollanukkaran
2019-10-02  8:52               ` Shahaf Shuler
2019-10-02  9:34                 ` Jerin Jacob Kollanukkaran
2020-01-27 21:19 ` [dpdk-dev] [PATCH v2] net/regexdev: " Ori Kam
2020-01-28  9:00 ` [dpdk-dev] [PATCH v3] regexdev: " Ori Kam
2020-02-22 16:52   ` Jerin Jacob
2020-02-23  8:41     ` Ori Kam
2020-02-23  9:53       ` Jerin Jacob
2020-02-23 12:33         ` Ori Kam
2020-02-25  5:57           ` Jerin Jacob
2020-02-25  7:48             ` Ori Kam
2020-02-26  9:03               ` Wang Xiang
2020-02-26  8:36                 ` Ori Kam
2020-02-27  9:25                   ` Wang Xiang
2020-02-27  7:31                     ` Ori Kam
2020-02-27  9:16                       ` Wang Xiang
2020-02-27 14:40 ` [dpdk-dev] [RFC v4] " Ori Kam
2020-02-27 14:55   ` Jerin Jacob
2020-02-27 15:08 ` [dpdk-dev] [RFC v5] " Ori Kam
2020-03-01  6:13   ` [dpdk-dev] [EXT] " Pavan Nikhilesh Bhagavatula
2020-03-01  7:31     ` Ori Kam
2020-03-01 13:23       ` Pavan Nikhilesh Bhagavatula
2020-03-01 14:10         ` Ori Kam
2020-03-01 14:38           ` Pavan Nikhilesh Bhagavatula
2020-03-01 15:41             ` Ori Kam
2020-03-01 15:57               ` Pavan Nikhilesh Bhagavatula
2020-03-02  7:18                 ` Jerin Jacob
2020-03-03  7:06                   ` Ori Kam
2020-03-02  7:05   ` [dpdk-dev] " Wang Xiang
2020-03-03  7:44     ` Ori Kam
2020-03-03  7:54       ` Jerin Jacob
2020-03-10 10:32 ` [dpdk-dev] [RFC v6] " Ori Kam
2020-03-10 13:42   ` Pavan Nikhilesh Bhagavatula
2020-03-10 16:23     ` Ori Kam
2020-03-10 16:36       ` Pavan Nikhilesh Bhagavatula
2020-03-10 17:00         ` Ori Kam
2020-03-12 12:13           ` Ori Kam
2020-03-13  1:20   ` Wang Xiang
2020-03-15 10:05     ` Ori Kam
2020-03-16  1:25       ` Wang Xiang
2020-03-16  9:09         ` Ori Kam
2020-03-16 20:48           ` Wang Xiang
2020-03-16 13:49             ` Ori Kam
2020-03-16 21:10               ` Wang Xiang
  -- strict thread matches above, loose matches on Subject: below --
2019-10-20 14:09 [dpdk-dev] [RFC PATCH v1] " Jerin Jacob Kollanukkaran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190919135857.GA82263@hs1 \
    --to=xiang.w.wang@intel.com \
    --cc=alexr@mellanox.com \
    --cc=arthur.su@lionic.com \
    --cc=bruce.richardson@intel.com \
    --cc=davidfgao@tencent.com \
    --cc=deri@ntop.org \
    --cc=dev@dpdk.org \
    --cc=dovrat@marvell.com \
    --cc=edwin.verplanke@intel.com \
    --cc=fanchenggang@sunyainfo.com \
    --cc=fc@napatech.com \
    --cc=gu.jian1@zte.com.cn \
    --cc=guyk@marvell.com \
    --cc=harry.chang@intel.com \
    --cc=hemant.agrawal@nxp.com \
    --cc=hongjun.ni@intel.com \
    --cc=j.bromhead@titan-ic.com \
    --cc=jerinj@marvell.com \
    --cc=jim@netgate.com \
    --cc=keith.wiles@intel.com \
    --cc=lironh@marvell.com \
    --cc=liuzhong1@chinaunicom.cn \
    --cc=lixingfu@huachentel.com \
    --cc=nipun.gupta@nxp.com \
    --cc=oc@yunify.com \
    --cc=opher@mellanox.com \
    --cc=pbhagavatula@marvell.com \
    --cc=pkapoor@marvell.com \
    --cc=shahafs@mellanox.com \
    --cc=shanjiangh@chinatelecom.cn \
    --cc=smadarf@marvell.com \
    --cc=thomas@monjalon.net \
    --cc=wushuai@inspur.com \
    --cc=yang.a.hong@intel.com \
    --cc=yuyingxia@yxlink.com \
    --cc=zhangy.yun@chinatelecom.cn \
    --cc=zhaoyong11@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.