* [patch iproute2 v5 0/3] tc: Add -bs option to batch mode
@ 2018-01-03 2:55 Chris Mi
2018-01-03 2:55 ` [patch iproute2 v5 1/3] lib/libnetlink: Add a function rtnl_talk_msg Chris Mi
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Chris Mi @ 2018-01-03 2:55 UTC (permalink / raw)
To: netdev; +Cc: gerlitz.or, stephen, dsahern, marcelo.leitner
Currently in tc batch mode, only one command is read from the batch
file and sent to kernel to process. With this patchset, we can accumulate
several commands before sending to kernel. The batch size is specified
using option -bs or -batchsize.
To accumulate the commands in tc, client should allocate an array of
struct iovec. If batchsize is bigger than 1, only after the client
has accumulated enough commands, can the client call rtnl_talk_msg
to send the message that includes the iov array. One exception is
that there is no more command in the batch file.
But please note that kernel still processes the requests one by one.
To process the requests in parallel in kernel is another effort.
The time we're saving in this patchset is the user mode and kernel mode
context switch. So this patchset works on top of the current kernel.
Using the following script in kernel, we can generate 1,000,000 rules.
tools/testing/selftests/tc-testing/tdc_batch.py
Without this patchset, 'tc -b $file' exection time is:
real 0m15.125s
user 0m6.982s
sys 0m8.080s
With this patchset, 'tc -b $file -bs 10' exection time is:
real 0m12.772s
user 0m5.984s
sys 0m6.723s
The insertion rate is improved more than 10%.
In this patchset, we still ack for every rule. If we don't ack at all,
'tc -b $file' exection time is:
real 0m14.484s
user 0m6.919s
sys 0m7.498s
'tc -b $file -bs 10' exection time is:
real 0m11.664s
user 0m6.017s
sys 0m5.578s
We can see that the performance win is to send multiple messages instead
of no acking. I think that's because in tc, we don't spend too much time
processing the ack message.
v3
==
1. Instead of hacking function rtnl_talk directly, add a new function
rtnl_talk_msg.
2. remove most of global variables to use parameter passing
3. divide the previous patch into 4 patches.
v4
==
1. Remove function setcmdlinetotal. Now in function batch, we read one
more line to determine if we are reaching the end of file.
2. Remove function __rtnl_check_ack. Now __rtnl_talk calls __rtnl_talk_msg
directly.
3. if (batch_size < 1)
batch_size = 1;
v5
==
1. Fix a bug that can't deal with batch file with blank line.
2. Describe the limitation in man page.
Chris Mi (3):
lib/libnetlink: Add a function rtnl_talk_msg
tc: Add -bs option to batch mode
man: Add -bs option to tc manpage
include/libnetlink.h | 3 ++
lib/libnetlink.c | 59 ++++++++++++++++++-------
man/man8/tc.8 | 9 ++++
tc/m_action.c | 90 +++++++++++++++++++++++++++++---------
tc/tc.c | 70 +++++++++++++++++++++++------
tc/tc_common.h | 8 +++-
tc/tc_filter.c | 121 +++++++++++++++++++++++++++++++++++++--------------
7 files changed, 276 insertions(+), 84 deletions(-)
--
2.14.3
^ permalink raw reply [flat|nested] 9+ messages in thread* [patch iproute2 v5 1/3] lib/libnetlink: Add a function rtnl_talk_msg 2018-01-03 2:55 [patch iproute2 v5 0/3] tc: Add -bs option to batch mode Chris Mi @ 2018-01-03 2:55 ` Chris Mi 2018-01-03 4:08 ` David Ahern 2018-01-03 2:55 ` [patch iproute2 v5 2/3] tc: Add -bs option to batch mode Chris Mi ` (2 subsequent siblings) 3 siblings, 1 reply; 9+ messages in thread From: Chris Mi @ 2018-01-03 2:55 UTC (permalink / raw) To: netdev; +Cc: gerlitz.or, stephen, dsahern, marcelo.leitner rtnl_talk can only send a single message to kernel. Add a new function rtnl_talk_msg that can send multiple messages to kernel. Signed-off-by: Chris Mi <chrism@mellanox.com> --- include/libnetlink.h | 3 +++ lib/libnetlink.c | 59 ++++++++++++++++++++++++++++++++++++++-------------- 2 files changed, 46 insertions(+), 16 deletions(-) diff --git a/include/libnetlink.h b/include/libnetlink.h index a4d83b9e..01d98b16 100644 --- a/include/libnetlink.h +++ b/include/libnetlink.h @@ -96,6 +96,9 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth, int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, struct nlmsghdr **answer) __attribute__((warn_unused_result)); +int rtnl_talk_msg(struct rtnl_handle *rtnl, struct msghdr *m, + struct nlmsghdr **answer) + __attribute__((warn_unused_result)); int rtnl_talk_extack(struct rtnl_handle *rtnl, struct nlmsghdr *n, struct nlmsghdr **answer, nl_ext_ack_fn_t errfn) __attribute__((warn_unused_result)); diff --git a/lib/libnetlink.c b/lib/libnetlink.c index 00e6ce0c..cc02a139 100644 --- a/lib/libnetlink.c +++ b/lib/libnetlink.c @@ -581,32 +581,34 @@ static void rtnl_talk_error(struct nlmsghdr *h, struct nlmsgerr *err, strerror(-err->error)); } -static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, - struct nlmsghdr **answer, - bool show_rtnl_err, nl_ext_ack_fn_t errfn) +static int __rtnl_talk_msg(struct rtnl_handle *rtnl, struct msghdr *m, + struct nlmsghdr **answer, + bool show_rtnl_err, nl_ext_ack_fn_t errfn) { - int status; - unsigned int seq; - struct nlmsghdr *h; + int iovlen = m->msg_iovlen; + unsigned int seq = 0; + int i, status; + char *buf; + struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK }; - struct iovec iov = { - .iov_base = n, - .iov_len = n->nlmsg_len - }; + struct iovec iov, *v; + struct nlmsghdr *h; struct msghdr msg = { .msg_name = &nladdr, .msg_namelen = sizeof(nladdr), .msg_iov = &iov, .msg_iovlen = 1, }; - char *buf; - n->nlmsg_seq = seq = ++rtnl->seq; - - if (answer == NULL) - n->nlmsg_flags |= NLM_F_ACK; + for (i = 0; i < iovlen; i++) { + v = &m->msg_iov[i]; + h = v->iov_base; + h->nlmsg_seq = seq = ++rtnl->seq; + if (answer == NULL) + h->nlmsg_flags |= NLM_F_ACK; + } - status = sendmsg(rtnl->fd, &msg, 0); + status = sendmsg(rtnl->fd, m, 0); if (status < 0) { perror("Cannot talk to rtnetlink"); return -1; @@ -698,12 +700,37 @@ static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, } } +static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, + struct nlmsghdr **answer, + bool show_rtnl_err, nl_ext_ack_fn_t errfn) +{ + struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK }; + struct iovec iov = { + .iov_base = n, + .iov_len = n->nlmsg_len + }; + struct msghdr msg = { + .msg_name = &nladdr, + .msg_namelen = sizeof(nladdr), + .msg_iov = &iov, + .msg_iovlen = 1, + }; + + return __rtnl_talk_msg(rtnl, &msg, answer, show_rtnl_err, errfn); +} + int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, struct nlmsghdr **answer) { return __rtnl_talk(rtnl, n, answer, true, NULL); } +int rtnl_talk_msg(struct rtnl_handle *rtnl, struct msghdr *m, + struct nlmsghdr **answer) +{ + return __rtnl_talk_msg(rtnl, m, answer, true, NULL); +} + int rtnl_talk_extack(struct rtnl_handle *rtnl, struct nlmsghdr *n, struct nlmsghdr **answer, nl_ext_ack_fn_t errfn) -- 2.14.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [patch iproute2 v5 1/3] lib/libnetlink: Add a function rtnl_talk_msg 2018-01-03 2:55 ` [patch iproute2 v5 1/3] lib/libnetlink: Add a function rtnl_talk_msg Chris Mi @ 2018-01-03 4:08 ` David Ahern 2018-01-04 7:27 ` Chris Mi 0 siblings, 1 reply; 9+ messages in thread From: David Ahern @ 2018-01-03 4:08 UTC (permalink / raw) To: Chris Mi, netdev; +Cc: gerlitz.or, stephen, marcelo.leitner On 1/2/18 7:55 PM, Chris Mi wrote: > diff --git a/lib/libnetlink.c b/lib/libnetlink.c > index 00e6ce0c..cc02a139 100644 > --- a/lib/libnetlink.c > +++ b/lib/libnetlink.c > @@ -581,32 +581,34 @@ static void rtnl_talk_error(struct nlmsghdr *h, struct nlmsgerr *err, > strerror(-err->error)); > } > > -static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, > - struct nlmsghdr **answer, > - bool show_rtnl_err, nl_ext_ack_fn_t errfn) > +static int __rtnl_talk_msg(struct rtnl_handle *rtnl, struct msghdr *m, > + struct nlmsghdr **answer, > + bool show_rtnl_err, nl_ext_ack_fn_t errfn) > { > - int status; > - unsigned int seq; > - struct nlmsghdr *h; > + int iovlen = m->msg_iovlen; > + unsigned int seq = 0; > + int i, status; > + char *buf; > + > struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK }; > - struct iovec iov = { > - .iov_base = n, > - .iov_len = n->nlmsg_len > - }; > + struct iovec iov, *v; > + struct nlmsghdr *h; > struct msghdr msg = { > .msg_name = &nladdr, > .msg_namelen = sizeof(nladdr), > .msg_iov = &iov, > .msg_iovlen = 1, > }; > - char *buf; Reverse xmas tree is the coding standard for net code. Please adhere to it. Only dependencies between variables are an acceptable exception. Some of those (struct nlmsghdr *h and struct iovec *v) can be moved to the for loop which aligns with your intentions of grouping variables. > > - n->nlmsg_seq = seq = ++rtnl->seq; > - > - if (answer == NULL) > - n->nlmsg_flags |= NLM_F_ACK; > + for (i = 0; i < iovlen; i++) { > + v = &m->msg_iov[i]; > + h = v->iov_base; > + h->nlmsg_seq = seq = ++rtnl->seq; doesn't seq need to track the recvmsg loop? I think for batching you want it to start at the first seq number and then in the recvmsg loop increment it. As it stands this file: $ cat tc.batch filter add dev eth2 ingress protocol ip pref 21 flower dst_ip 192.168.1.0/16 action drop filter add dev eth2 ingress protocol ip pref 22 flower dst_ip 192.168.2.0/16 action drop filter add dev eth2 ingress protocol ip pref 22 flower dst_ip 192.168.3.0/16 action drop filter add dev eth2 ingress protocol ip pref 24 flower dst_ip 192.168.4.0/16 action drop filter add dev eth2 ingress protocol ip pref 25 flower dst_ip 192.168.5.0/16 action drop does not give me an error message: $ tc -b tc.batch -bs 5 <no output> Yet it failed to insert all filters: $ tc filter show dev eth2 ingress filter protocol ip pref 21 flower chain 0 filter protocol ip pref 21 flower chain 0 handle 0x1 eth_type ipv4 dst_ip 192.168.1.0/16 not_in_hw action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 filter protocol ip pref 22 flower chain 0 filter protocol ip pref 22 flower chain 0 handle 0x1 eth_type ipv4 dst_ip 192.168.2.0/16 not_in_hw action order 1: gact action drop random type none pass val 0 index 2 ref 1 bind 1 filter protocol ip pref 24 flower chain 0 filter protocol ip pref 24 flower chain 0 handle 0x1 eth_type ipv4 dst_ip 192.168.4.0/16 not_in_hw action order 1: gact action drop random type none pass val 0 index 3 ref 1 bind 1 filter protocol ip pref 25 flower chain 0 filter protocol ip pref 25 flower chain 0 handle 0x1 eth_type ipv4 dst_ip 192.168.5.0/16 not_in_hw action order 1: gact action drop random type none pass val 0 index 4 ref 1 bind 1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch iproute2 v5 1/3] lib/libnetlink: Add a function rtnl_talk_msg 2018-01-03 4:08 ` David Ahern @ 2018-01-04 7:27 ` Chris Mi 0 siblings, 0 replies; 9+ messages in thread From: Chris Mi @ 2018-01-04 7:27 UTC (permalink / raw) To: David Ahern, netdev; +Cc: gerlitz.or, stephen, marcelo.leitner 2018/1/3 12:08, David Ahern: > On 1/2/18 7:55 PM, Chris Mi wrote: >> diff --git a/lib/libnetlink.c b/lib/libnetlink.c >> index 00e6ce0c..cc02a139 100644 >> --- a/lib/libnetlink.c >> +++ b/lib/libnetlink.c >> @@ -581,32 +581,34 @@ static void rtnl_talk_error(struct nlmsghdr *h, struct nlmsgerr *err, >> strerror(-err->error)); >> } >> >> -static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n, >> - struct nlmsghdr **answer, >> - bool show_rtnl_err, nl_ext_ack_fn_t errfn) >> +static int __rtnl_talk_msg(struct rtnl_handle *rtnl, struct msghdr *m, >> + struct nlmsghdr **answer, >> + bool show_rtnl_err, nl_ext_ack_fn_t errfn) >> { >> - int status; >> - unsigned int seq; >> - struct nlmsghdr *h; >> + int iovlen = m->msg_iovlen; >> + unsigned int seq = 0; >> + int i, status; >> + char *buf; >> + >> struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK }; >> - struct iovec iov = { >> - .iov_base = n, >> - .iov_len = n->nlmsg_len >> - }; >> + struct iovec iov, *v; >> + struct nlmsghdr *h; >> struct msghdr msg = { >> .msg_name = &nladdr, >> .msg_namelen = sizeof(nladdr), >> .msg_iov = &iov, >> .msg_iovlen = 1, >> }; >> - char *buf; > Reverse xmas tree is the coding standard for net code. Please adhere to > it. Only dependencies between variables are an acceptable exception. OK, got it. > > Some of those (struct nlmsghdr *h and struct iovec *v) can be moved to > the for loop which aligns with your intentions of grouping variables. Done. > >> >> - n->nlmsg_seq = seq = ++rtnl->seq; >> - >> - if (answer == NULL) >> - n->nlmsg_flags |= NLM_F_ACK; >> + for (i = 0; i < iovlen; i++) { >> + v = &m->msg_iov[i]; >> + h = v->iov_base; >> + h->nlmsg_seq = seq = ++rtnl->seq; > doesn't seq need to track the recvmsg loop? I think for batching you > want it to start at the first seq number and then in the recvmsg loop > increment it. Yes, it is a bug. Thanks for your test case. > > As it stands this file: > $ cat tc.batch > filter add dev eth2 ingress protocol ip pref 21 flower dst_ip > 192.168.1.0/16 action drop > filter add dev eth2 ingress protocol ip pref 22 flower dst_ip > 192.168.2.0/16 action drop > filter add dev eth2 ingress protocol ip pref 22 flower dst_ip > 192.168.3.0/16 action drop > filter add dev eth2 ingress protocol ip pref 24 flower dst_ip > 192.168.4.0/16 action drop > filter add dev eth2 ingress protocol ip pref 25 flower dst_ip > 192.168.5.0/16 action drop > > does not give me an error message: > $ tc -b tc.batch -bs 5 > <no output> > > Yet it failed to insert all filters: > $ tc filter show dev eth2 ingress > filter protocol ip pref 21 flower chain 0 > filter protocol ip pref 21 flower chain 0 handle 0x1 > eth_type ipv4 > dst_ip 192.168.1.0/16 > not_in_hw > action order 1: gact action drop > random type none pass val 0 > index 1 ref 1 bind 1 > > filter protocol ip pref 22 flower chain 0 > filter protocol ip pref 22 flower chain 0 handle 0x1 > eth_type ipv4 > dst_ip 192.168.2.0/16 > not_in_hw > action order 1: gact action drop > random type none pass val 0 > index 2 ref 1 bind 1 > > filter protocol ip pref 24 flower chain 0 > filter protocol ip pref 24 flower chain 0 handle 0x1 > eth_type ipv4 > dst_ip 192.168.4.0/16 > not_in_hw > action order 1: gact action drop > random type none pass val 0 > index 3 ref 1 bind 1 > > filter protocol ip pref 25 flower chain 0 > filter protocol ip pref 25 flower chain 0 handle 0x1 > eth_type ipv4 > dst_ip 192.168.5.0/16 > not_in_hw > action order 1: gact action drop > random type none pass val 0 > index 4 ref 1 bind 1 > After fixing it, the test result is: # tc -b tc.batch -bs 5 RTNETLINK answers: File exists We have an error talking to the kernel, -1 Command failed 1.txt:0-4 We can't tell exactly which command causes this error, so we give a range which is less than the batch size. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch iproute2 v5 2/3] tc: Add -bs option to batch mode 2018-01-03 2:55 [patch iproute2 v5 0/3] tc: Add -bs option to batch mode Chris Mi 2018-01-03 2:55 ` [patch iproute2 v5 1/3] lib/libnetlink: Add a function rtnl_talk_msg Chris Mi @ 2018-01-03 2:55 ` Chris Mi 2018-01-03 4:25 ` David Ahern 2018-01-03 2:55 ` [patch iproute2 v5 3/3] man: Add -bs option to tc manpage Chris Mi 2018-01-03 2:57 ` [patch iproute2 v5 0/3] tc: Add -bs option to batch mode David Ahern 3 siblings, 1 reply; 9+ messages in thread From: Chris Mi @ 2018-01-03 2:55 UTC (permalink / raw) To: netdev; +Cc: gerlitz.or, stephen, dsahern, marcelo.leitner Signed-off-by: Chris Mi <chrism@mellanox.com> --- tc/m_action.c | 90 ++++++++++++++++++++++++++++++++---------- tc/tc.c | 70 ++++++++++++++++++++++++++------- tc/tc_common.h | 8 +++- tc/tc_filter.c | 121 +++++++++++++++++++++++++++++++++++++++++---------------- 4 files changed, 221 insertions(+), 68 deletions(-) diff --git a/tc/m_action.c b/tc/m_action.c index fc422364..2e79034d 100644 --- a/tc/m_action.c +++ b/tc/m_action.c @@ -23,6 +23,7 @@ #include <arpa/inet.h> #include <string.h> #include <dlfcn.h> +#include <errno.h> #include "utils.h" #include "tc_common.h" @@ -546,40 +547,87 @@ bad_val: return ret; } +typedef struct { + struct nlmsghdr n; + struct tcamsg t; + char buf[MAX_MSG]; +} tc_action_req; + +static tc_action_req *action_reqs; +static struct iovec msg_iov[MSG_IOV_MAX]; + +void free_action_reqs(void) +{ + free(action_reqs); +} + +static tc_action_req *get_action_req(int batch_size, int index) +{ + tc_action_req *req; + + if (action_reqs == NULL) { + action_reqs = malloc(batch_size * sizeof (tc_action_req)); + if (action_reqs == NULL) + return NULL; + } + req = &action_reqs[index]; + memset(req, 0, sizeof (*req)); + + return req; +} + static int tc_action_modify(int cmd, unsigned int flags, - int *argc_p, char ***argv_p) + int *argc_p, char ***argv_p, + int batch_size, int index, bool send) { int argc = *argc_p; char **argv = *argv_p; int ret = 0; - struct { - struct nlmsghdr n; - struct tcamsg t; - char buf[MAX_MSG]; - } req = { - .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)), - .n.nlmsg_flags = NLM_F_REQUEST | flags, - .n.nlmsg_type = cmd, - .t.tca_family = AF_UNSPEC, + tc_action_req *req; + struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK }; + struct iovec *iov = &msg_iov[index]; + + req = get_action_req(batch_size, index); + if (req == NULL) { + fprintf(stderr, "get_action_req error: not enough buffer\n"); + return -ENOMEM; + } + + req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)); + req->n.nlmsg_flags = NLM_F_REQUEST | flags; + req->n.nlmsg_type = cmd; + req->t.tca_family = AF_UNSPEC; + struct rtattr *tail = NLMSG_TAIL(&req->n); + + struct msghdr msg = { + .msg_name = &nladdr, + .msg_namelen = sizeof(nladdr), + .msg_iov = msg_iov, + .msg_iovlen = index + 1, }; - struct rtattr *tail = NLMSG_TAIL(&req.n); argc -= 1; argv += 1; - if (parse_action(&argc, &argv, TCA_ACT_TAB, &req.n)) { + if (parse_action(&argc, &argv, TCA_ACT_TAB, &req->n)) { fprintf(stderr, "Illegal \"action\"\n"); return -1; } - tail->rta_len = (void *) NLMSG_TAIL(&req.n) - (void *) tail; + tail->rta_len = (void *) NLMSG_TAIL(&req->n) - (void *) tail; + + *argc_p = argc; + *argv_p = argv; + + iov->iov_base = &req->n; + iov->iov_len = req->n.nlmsg_len; + + if (!send) + return 0; - if (rtnl_talk(&rth, &req.n, NULL) < 0) { + if (rtnl_talk_msg(&rth, &msg, NULL) < 0) { fprintf(stderr, "We have an error talking to the kernel\n"); ret = -1; } - *argc_p = argc; - *argv_p = argv; - return ret; } @@ -679,7 +727,7 @@ bad_val: return ret; } -int do_action(int argc, char **argv) +int do_action(int argc, char **argv, int batch_size, int index, bool send) { int ret = 0; @@ -689,12 +737,14 @@ int do_action(int argc, char **argv) if (matches(*argv, "add") == 0) { ret = tc_action_modify(RTM_NEWACTION, NLM_F_EXCL | NLM_F_CREATE, - &argc, &argv); + &argc, &argv, batch_size, + index, send); } else if (matches(*argv, "change") == 0 || matches(*argv, "replace") == 0) { ret = tc_action_modify(RTM_NEWACTION, NLM_F_CREATE | NLM_F_REPLACE, - &argc, &argv); + &argc, &argv, batch_size, + index, send); } else if (matches(*argv, "delete") == 0) { argc -= 1; argv += 1; diff --git a/tc/tc.c b/tc/tc.c index ad9f07e9..90ce4ce2 100644 --- a/tc/tc.c +++ b/tc/tc.c @@ -189,20 +189,20 @@ static void usage(void) fprintf(stderr, "Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }\n" " tc [-force] -batch filename\n" "where OBJECT := { qdisc | class | filter | action | monitor | exec }\n" - " OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] | -n[etns] name |\n" + " OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] | -bs | -batchsize [size] | -n[etns] name |\n" " -nm | -nam[es] | { -cf | -conf } path } | -j[son]\n"); } -static int do_cmd(int argc, char **argv) +static int do_cmd(int argc, char **argv, int batch_size, int index, bool send) { if (matches(*argv, "qdisc") == 0) return do_qdisc(argc-1, argv+1); if (matches(*argv, "class") == 0) return do_class(argc-1, argv+1); if (matches(*argv, "filter") == 0) - return do_filter(argc-1, argv+1); + return do_filter(argc-1, argv+1, batch_size, index, send); if (matches(*argv, "actions") == 0) - return do_action(argc-1, argv+1); + return do_action(argc-1, argv+1, batch_size, index, send); if (matches(*argv, "monitor") == 0) return do_tcmonitor(argc-1, argv+1); if (matches(*argv, "exec") == 0) @@ -217,11 +217,15 @@ static int do_cmd(int argc, char **argv) return -1; } -static int batch(const char *name) +static int batch(const char *name, int batch_size) { + bool lastline = false; + int msg_iov_index = 0; + char *line2 = NULL; char *line = NULL; size_t len = 0; int ret = 0; + bool send; batch_mode = 1; if (name && strcmp(name, "-") != 0) { @@ -240,23 +244,50 @@ static int batch(const char *name) } cmdlineno = 0; - while (getcmdline(&line, &len, stdin) != -1) { + if (getcmdline(&line, &len, stdin) == -1) + goto Exit; + do { char *largv[100]; int largc; + if (getcmdline(&line2, &len, stdin) == -1) + lastline = true; + largc = makeargs(line, largv, 100); + + line = line2; + line2 = NULL; + len = 0; + if (largc == 0) continue; /* blank line */ - if (do_cmd(largc, largv)) { - fprintf(stderr, "Command failed %s:%d\n", name, cmdlineno); + /* + * In batch mode, if we haven't accumulated enough commands + * and this is not the last command, don't send the message + * immediately. + */ + if (batch_size > 1 && msg_iov_index + 1 != batch_size + && !lastline) + send = false; + else + send = true; + + ret = do_cmd(largc, largv, batch_size, msg_iov_index++, send); + if (ret < 0) { + fprintf(stderr, "Command failed %s:%d\n", name, + cmdlineno); ret = 1; if (!force) break; } - } - if (line) - free(line); + msg_iov_index %= batch_size; + } while (!lastline); + + free_filter_reqs(); + free_action_reqs(); +Exit: + free(line); rtnl_close(&rth); return ret; @@ -267,6 +298,7 @@ int main(int argc, char **argv) { int ret; char *batch_file = NULL; + int batch_size = 1; while (argc > 1) { if (argv[1][0] != '-') @@ -297,6 +329,16 @@ int main(int argc, char **argv) if (argc <= 1) usage(); batch_file = argv[1]; + } else if (matches(argv[1], "-batchsize") == 0 || + matches(argv[1], "-bs") == 0) { + argc--; argv++; + if (argc <= 1) + usage(); + batch_size = atoi(argv[1]); + if (batch_size > MSG_IOV_MAX) + batch_size = MSG_IOV_MAX; + else if (batch_size < 0) + batch_size = 1; } else if (matches(argv[1], "-netns") == 0) { NEXT_ARG(); if (netns_switch(argv[1])) @@ -323,7 +365,7 @@ int main(int argc, char **argv) } if (batch_file) - return batch(batch_file); + return batch(batch_file, batch_size); if (argc <= 1) { usage(); @@ -341,7 +383,9 @@ int main(int argc, char **argv) goto Exit; } - ret = do_cmd(argc-1, argv+1); + ret = do_cmd(argc-1, argv+1, 1, 0, true); + free_filter_reqs(); + free_action_reqs(); Exit: rtnl_close(&rth); diff --git a/tc/tc_common.h b/tc/tc_common.h index 264fbdac..8a82439f 100644 --- a/tc/tc_common.h +++ b/tc/tc_common.h @@ -1,13 +1,14 @@ /* SPDX-License-Identifier: GPL-2.0 */ #define TCA_BUF_MAX (64*1024) +#define MSG_IOV_MAX 256 extern struct rtnl_handle rth; extern int do_qdisc(int argc, char **argv); extern int do_class(int argc, char **argv); -extern int do_filter(int argc, char **argv); -extern int do_action(int argc, char **argv); +extern int do_filter(int argc, char **argv, int batch_size, int index, bool send); +extern int do_action(int argc, char **argv, int batch_size, int index, bool send); extern int do_tcmonitor(int argc, char **argv); extern int do_exec(int argc, char **argv); @@ -24,5 +25,8 @@ struct tc_sizespec; extern int parse_size_table(int *p_argc, char ***p_argv, struct tc_sizespec *s); extern int check_size_table_opts(struct tc_sizespec *s); +extern void free_filter_reqs(void); +extern void free_action_reqs(void); + extern int show_graph; extern bool use_names; diff --git a/tc/tc_filter.c b/tc/tc_filter.c index 545cc3a1..6fecbb45 100644 --- a/tc/tc_filter.c +++ b/tc/tc_filter.c @@ -19,6 +19,7 @@ #include <arpa/inet.h> #include <string.h> #include <linux/if_ether.h> +#include <errno.h> #include "rt_names.h" #include "utils.h" @@ -42,18 +43,44 @@ static void usage(void) "OPTIONS := ... try tc filter add <desired FILTER_KIND> help\n"); } -static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv) +typedef struct { + struct nlmsghdr n; + struct tcmsg t; + char buf[MAX_MSG]; +} tc_filter_req; + +static tc_filter_req *filter_reqs; +static struct iovec msg_iov[MSG_IOV_MAX]; + +void free_filter_reqs(void) { - struct { - struct nlmsghdr n; - struct tcmsg t; - char buf[MAX_MSG]; - } req = { - .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)), - .n.nlmsg_flags = NLM_F_REQUEST | flags, - .n.nlmsg_type = cmd, - .t.tcm_family = AF_UNSPEC, - }; + free(filter_reqs); +} + +static tc_filter_req *get_filter_req(int batch_size, int index) +{ + tc_filter_req *req; + + if (filter_reqs == NULL) { + filter_reqs = malloc(batch_size * sizeof (tc_filter_req)); + if (filter_reqs == NULL) + return NULL; + } + req = &filter_reqs[index]; + memset(req, 0, sizeof (*req)); + + return req; +} + +static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv, + int batch_size, int index, bool send) +{ + tc_filter_req *req; + int ret; + + struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK }; + struct iovec *iov = &msg_iov[index]; + struct filter_util *q = NULL; __u32 prio = 0; __u32 protocol = 0; @@ -65,6 +92,24 @@ static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv) char k[FILTER_NAMESZ] = {}; struct tc_estimator est = {}; + req = get_filter_req(batch_size, index); + if (req == NULL) { + fprintf(stderr, "get_filter_req error: not enough buffer\n"); + return -ENOMEM; + } + + req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)); + req->n.nlmsg_flags = NLM_F_REQUEST | flags; + req->n.nlmsg_type = cmd; + req->t.tcm_family = AF_UNSPEC; + + struct msghdr msg = { + .msg_name = &nladdr, + .msg_namelen = sizeof(nladdr), + .msg_iov = msg_iov, + .msg_iovlen = index + 1, + }; + if (cmd == RTM_NEWTFILTER && flags & NLM_F_CREATE) protocol = htons(ETH_P_ALL); @@ -75,37 +120,37 @@ static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv) duparg("dev", *argv); strncpy(d, *argv, sizeof(d)-1); } else if (strcmp(*argv, "root") == 0) { - if (req.t.tcm_parent) { + if (req->t.tcm_parent) { fprintf(stderr, "Error: \"root\" is duplicate parent ID\n"); return -1; } - req.t.tcm_parent = TC_H_ROOT; + req->t.tcm_parent = TC_H_ROOT; } else if (strcmp(*argv, "ingress") == 0) { - if (req.t.tcm_parent) { + if (req->t.tcm_parent) { fprintf(stderr, "Error: \"ingress\" is duplicate parent ID\n"); return -1; } - req.t.tcm_parent = TC_H_MAKE(TC_H_CLSACT, + req->t.tcm_parent = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS); } else if (strcmp(*argv, "egress") == 0) { - if (req.t.tcm_parent) { + if (req->t.tcm_parent) { fprintf(stderr, "Error: \"egress\" is duplicate parent ID\n"); return -1; } - req.t.tcm_parent = TC_H_MAKE(TC_H_CLSACT, + req->t.tcm_parent = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS); } else if (strcmp(*argv, "parent") == 0) { __u32 handle; NEXT_ARG(); - if (req.t.tcm_parent) + if (req->t.tcm_parent) duparg("parent", *argv); if (get_tc_classid(&handle, *argv)) invarg("Invalid parent ID", *argv); - req.t.tcm_parent = handle; + req->t.tcm_parent = handle; } else if (strcmp(*argv, "handle") == 0) { NEXT_ARG(); if (fhandle) @@ -152,26 +197,26 @@ static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv) argc--; argv++; } - req.t.tcm_info = TC_H_MAKE(prio<<16, protocol); + req->t.tcm_info = TC_H_MAKE(prio<<16, protocol); if (chain_index_set) - addattr32(&req.n, sizeof(req), TCA_CHAIN, chain_index); + addattr32(&req->n, sizeof(*req), TCA_CHAIN, chain_index); if (k[0]) - addattr_l(&req.n, sizeof(req), TCA_KIND, k, strlen(k)+1); + addattr_l(&req->n, sizeof(*req), TCA_KIND, k, strlen(k)+1); if (d[0]) { ll_init_map(&rth); - req.t.tcm_ifindex = ll_name_to_index(d); - if (req.t.tcm_ifindex == 0) { + req->t.tcm_ifindex = ll_name_to_index(d); + if (req->t.tcm_ifindex == 0) { fprintf(stderr, "Cannot find device \"%s\"\n", d); return 1; } } if (q) { - if (q->parse_fopt(q, fhandle, argc, argv, &req.n)) + if (q->parse_fopt(q, fhandle, argc, argv, &req->n)) return 1; } else { if (fhandle) { @@ -190,10 +235,17 @@ static int tc_filter_modify(int cmd, unsigned int flags, int argc, char **argv) } if (est.ewma_log) - addattr_l(&req.n, sizeof(req), TCA_RATE, &est, sizeof(est)); + addattr_l(&req->n, sizeof(*req), TCA_RATE, &est, sizeof(est)); - if (rtnl_talk(&rth, &req.n, NULL) < 0) { - fprintf(stderr, "We have an error talking to the kernel\n"); + iov->iov_base = &req->n; + iov->iov_len = req->n.nlmsg_len; + + if (!send) + return 0; + + ret = rtnl_talk_msg(&rth, &msg, NULL); + if (ret < 0) { + fprintf(stderr, "We have an error talking to the kernel, %d\n", ret); return 2; } @@ -636,20 +688,23 @@ static int tc_filter_list(int argc, char **argv) return 0; } -int do_filter(int argc, char **argv) +int do_filter(int argc, char **argv, int batch_size, int index, bool send) { if (argc < 1) return tc_filter_list(0, NULL); if (matches(*argv, "add") == 0) return tc_filter_modify(RTM_NEWTFILTER, NLM_F_EXCL|NLM_F_CREATE, - argc-1, argv+1); + argc-1, argv+1, + batch_size, index, send); if (matches(*argv, "change") == 0) - return tc_filter_modify(RTM_NEWTFILTER, 0, argc-1, argv+1); + return tc_filter_modify(RTM_NEWTFILTER, 0, argc-1, argv+1, + batch_size, index, send); if (matches(*argv, "replace") == 0) return tc_filter_modify(RTM_NEWTFILTER, NLM_F_CREATE, argc-1, - argv+1); + argv+1, batch_size, index, send); if (matches(*argv, "delete") == 0) - return tc_filter_modify(RTM_DELTFILTER, 0, argc-1, argv+1); + return tc_filter_modify(RTM_DELTFILTER, 0, argc-1, argv+1, + batch_size, index, send); if (matches(*argv, "get") == 0) return tc_filter_get(RTM_GETTFILTER, 0, argc-1, argv+1); if (matches(*argv, "list") == 0 || matches(*argv, "show") == 0 -- 2.14.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [patch iproute2 v5 2/3] tc: Add -bs option to batch mode 2018-01-03 2:55 ` [patch iproute2 v5 2/3] tc: Add -bs option to batch mode Chris Mi @ 2018-01-03 4:25 ` David Ahern 2018-01-04 7:32 ` Chris Mi 0 siblings, 1 reply; 9+ messages in thread From: David Ahern @ 2018-01-03 4:25 UTC (permalink / raw) To: Chris Mi, netdev; +Cc: gerlitz.or, stephen, marcelo.leitner You need a patch description here ... On 1/2/18 7:55 PM, Chris Mi wrote: > static int tc_action_modify(int cmd, unsigned int flags, > - int *argc_p, char ***argv_p) > + int *argc_p, char ***argv_p, > + int batch_size, int index, bool send) > { > int argc = *argc_p; > char **argv = *argv_p; > int ret = 0; > - struct { > - struct nlmsghdr n; > - struct tcamsg t; > - char buf[MAX_MSG]; > - } req = { > - .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)), > - .n.nlmsg_flags = NLM_F_REQUEST | flags, > - .n.nlmsg_type = cmd, > - .t.tca_family = AF_UNSPEC, > + tc_action_req *req; > + struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK }; > + struct iovec *iov = &msg_iov[index]; Reverse xmas tree is the coding standard for net code. Please check all new code to conform to this standard. I have not reviewed all of this patch, but I firmly believe the batching size option needs to be able handle a file with mixed commands. Your use case is filter and action adds and deletes, but you should allow users (e.g., test suites) to benefit from this performance speed up with test cases that have single files with all of the commands. For example, $ cat tc.batch qdisc add dev eth2 ingress filter add dev eth2 ingress protocol ip pref 21 flower dst_ip 192.168.1.0/16 action drop filter add dev eth2 ingress protocol ip pref 22 flower dst_ip 192.168.2.0/16 action drop filter add dev eth2 ingress protocol ip pref 23 flower dst_ip 192.168.3.0/16 action drop filter add dev eth2 ingress protocol ip pref 24 flower dst_ip 192.168.4.0/16 action drop filter add dev eth2 ingress protocol ip pref 25 flower dst_ip 192.168.5.0/16 action drop qdisc del dev eth2 ingress (and consider this to be a huge file to really stress tc code paths for example). Right now, the above file fails: $ tc -b tc.batch -bs 5 Segmentation fault Also, your changes fail to break out on an error: $ tc -b tc.batch -bs 1 RTNETLINK answers: File exists We have an error talking to the kernel, -1 RTNETLINK answers: File exists We have an error talking to the kernel, -1 RTNETLINK answers: File exists We have an error talking to the kernel, -1 RTNETLINK answers: File exists We have an error talking to the kernel, -1 RTNETLINK answers: File exists We have an error talking to the kernel, -1 where as the existing command does this: $ tc -b tc.batch RTNETLINK answers: File exists We have an error talking to the kernel Command failed tc.batch:1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch iproute2 v5 2/3] tc: Add -bs option to batch mode 2018-01-03 4:25 ` David Ahern @ 2018-01-04 7:32 ` Chris Mi 0 siblings, 0 replies; 9+ messages in thread From: Chris Mi @ 2018-01-04 7:32 UTC (permalink / raw) To: David Ahern, netdev; +Cc: gerlitz.or, stephen, marcelo.leitner 2018/1/3 12:25, David Ahern: > You need a patch description here ... Done. > > On 1/2/18 7:55 PM, Chris Mi wrote: >> static int tc_action_modify(int cmd, unsigned int flags, >> - int *argc_p, char ***argv_p) >> + int *argc_p, char ***argv_p, >> + int batch_size, int index, bool send) >> { >> int argc = *argc_p; >> char **argv = *argv_p; >> int ret = 0; >> - struct { >> - struct nlmsghdr n; >> - struct tcamsg t; >> - char buf[MAX_MSG]; >> - } req = { >> - .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)), >> - .n.nlmsg_flags = NLM_F_REQUEST | flags, >> - .n.nlmsg_type = cmd, >> - .t.tca_family = AF_UNSPEC, >> + tc_action_req *req; >> + struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK }; >> + struct iovec *iov = &msg_iov[index]; > Reverse xmas tree is the coding standard for net code. Please check all > new code to conform to this standard. Done. > > I have not reviewed all of this patch, but I firmly believe the batching > size option needs to be able handle a file with mixed commands. Your use > case is filter and action adds and deletes, but you should allow users > (e.g., test suites) to benefit from this performance speed up with test > cases that have single files with all of the commands. Done. There is a little performance pernalty. But I think it is better than segfault. > > For example, > $ cat tc.batch > qdisc add dev eth2 ingress > filter add dev eth2 ingress protocol ip pref 21 flower dst_ip > 192.168.1.0/16 action drop > filter add dev eth2 ingress protocol ip pref 22 flower dst_ip > 192.168.2.0/16 action drop > filter add dev eth2 ingress protocol ip pref 23 flower dst_ip > 192.168.3.0/16 action drop > filter add dev eth2 ingress protocol ip pref 24 flower dst_ip > 192.168.4.0/16 action drop > filter add dev eth2 ingress protocol ip pref 25 flower dst_ip > 192.168.5.0/16 action drop > qdisc del dev eth2 ingress > > (and consider this to be a huge file to really stress tc code paths for > example). Right now, the above file fails: > > $ tc -b tc.batch -bs 5 > Segmentation fault > > > Also, your changes fail to break out on an error: > > $ tc -b tc.batch -bs 1 > RTNETLINK answers: File exists > We have an error talking to the kernel, -1 > RTNETLINK answers: File exists > We have an error talking to the kernel, -1 > RTNETLINK answers: File exists > We have an error talking to the kernel, -1 > RTNETLINK answers: File exists > We have an error talking to the kernel, -1 > RTNETLINK answers: File exists > We have an error talking to the kernel, -1 > > where as the existing command does this: > $ tc -b tc.batch > RTNETLINK answers: File exists > We have an error talking to the kernel > Command failed tc.batch:1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [patch iproute2 v5 3/3] man: Add -bs option to tc manpage 2018-01-03 2:55 [patch iproute2 v5 0/3] tc: Add -bs option to batch mode Chris Mi 2018-01-03 2:55 ` [patch iproute2 v5 1/3] lib/libnetlink: Add a function rtnl_talk_msg Chris Mi 2018-01-03 2:55 ` [patch iproute2 v5 2/3] tc: Add -bs option to batch mode Chris Mi @ 2018-01-03 2:55 ` Chris Mi 2018-01-03 2:57 ` [patch iproute2 v5 0/3] tc: Add -bs option to batch mode David Ahern 3 siblings, 0 replies; 9+ messages in thread From: Chris Mi @ 2018-01-03 2:55 UTC (permalink / raw) To: netdev; +Cc: gerlitz.or, stephen, dsahern, marcelo.leitner Signed-off-by: Chris Mi <chrism@mellanox.com> --- man/man8/tc.8 | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/man/man8/tc.8 b/man/man8/tc.8 index ff071b33..7338ed3b 100644 --- a/man/man8/tc.8 +++ b/man/man8/tc.8 @@ -601,6 +601,15 @@ must exist already. read commands from provided file or standard input and invoke them. First failure will cause termination of tc. +.TP +.BR "\-bs", " \-bs size", " \-batchsize", " \-batchsize size" +How many commands are accumulated before sending to kernel. +By default, it is 1. It only takes effect in batch mode. +Currently, it only supports filter add or actions add. +If there are mixed commands in the batch file, the result is unpredictable. +And there is a limitation that the last line in the batch file should not be blank. +Or you will lose at most batchsize - 1 rules. + .TP .BR "\-force" don't terminate tc on errors in batch mode. -- 2.14.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [patch iproute2 v5 0/3] tc: Add -bs option to batch mode 2018-01-03 2:55 [patch iproute2 v5 0/3] tc: Add -bs option to batch mode Chris Mi ` (2 preceding siblings ...) 2018-01-03 2:55 ` [patch iproute2 v5 3/3] man: Add -bs option to tc manpage Chris Mi @ 2018-01-03 2:57 ` David Ahern 3 siblings, 0 replies; 9+ messages in thread From: David Ahern @ 2018-01-03 2:57 UTC (permalink / raw) To: Chris Mi, netdev; +Cc: gerlitz.or, stephen, marcelo.leitner I have a day job outside of iproute2 patches; give a day or 2 for review by many people. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-01-04 7:33 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-01-03 2:55 [patch iproute2 v5 0/3] tc: Add -bs option to batch mode Chris Mi 2018-01-03 2:55 ` [patch iproute2 v5 1/3] lib/libnetlink: Add a function rtnl_talk_msg Chris Mi 2018-01-03 4:08 ` David Ahern 2018-01-04 7:27 ` Chris Mi 2018-01-03 2:55 ` [patch iproute2 v5 2/3] tc: Add -bs option to batch mode Chris Mi 2018-01-03 4:25 ` David Ahern 2018-01-04 7:32 ` Chris Mi 2018-01-03 2:55 ` [patch iproute2 v5 3/3] man: Add -bs option to tc manpage Chris Mi 2018-01-03 2:57 ` [patch iproute2 v5 0/3] tc: Add -bs option to batch mode David Ahern
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox