* Re: [PATCH net] net: get rid of an signed integer overflow in ip_idents_reserve()
From: David Miller @ 2016-09-22 6:42 UTC (permalink / raw)
To: eric.dumazet; +Cc: jiri, netdev
In-Reply-To: <1474419977.23058.53.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 20 Sep 2016 18:06:17 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> Jiri Pirko reported an UBSAN warning happening in ip_idents_reserve()
>
> [] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11
> [] signed integer overflow:
> [] -2117905507 + -695755206 cannot be represented in type 'int'
>
> Since we do not have uatomic_add_return() yet, use atomic_cmpxchg()
> so that the arithmetics can be done using unsigned int.
>
> Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Jiri Pirko <jiri@resnulli.us>
> ---
> David, Jiri, I removed the prandom_u32() stuff in favor of a traditional
> loop to meet stable requirements. Thanks !
Applied.
^ permalink raw reply
* [PATCH nf v4] netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ack
From: fgao @ 2016-09-22 6:29 UTC (permalink / raw)
To: pablo, kaber, netfilter-devel, netdev; +Cc: gfree.wind, Gao Feng
From: Gao Feng <fgao@ikuai8.com>
It is valid that the TCP RST packet which does not set ack flag, and bytes
of ack number are zero. But current seqadj codes would adjust the "0" ack
to invalid ack number. Actually seqadj need to check the ack flag before
adjust it for these RST packets.
The following is my test case
client is 10.26.98.245, and add one iptable rule:
iptables -I INPUT -p tcp --sport 12345 -m connbytes --connbytes 2:
--connbytes-dir reply --connbytes-mode packets -j REJECT --reject-with
tcp-reset
This iptables rule could generate on TCP RST without ack flag.
server:10.172.135.55
Enable the synproxy with seqadjust by the following iptables rules
iptables -t raw -A PREROUTING -i eth0 -p tcp -d 10.172.135.55 --dport 12345
-m tcp --syn -j CT --notrack
iptables -A INPUT -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m conntrack
--ctstate INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7
--mss 1460
iptables -A OUTPUT -o eth0 -p tcp -s 10.172.135.55 --sport 12345 -m conntrack
--ctstate INVALID,UNTRACKED -m tcp --tcp-flags SYN,RST,ACK SYN,ACK -j ACCEPT
The following is my test result.
1. packet trace on client
root@routers:/tmp# tcpdump -i eth0 tcp port 12345 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [S], seq 3695959829,
win 29200, options [mss 1460,sackOK,TS val 452367884 ecr 0,nop,wscale 7],
length 0
IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [S.], seq 546723266,
ack 3695959830, win 0, options [mss 1460,sackOK,TS val 15643479 ecr 452367884,
nop,wscale 7], length 0
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [.], ack 1, win 229,
options [nop,nop,TS val 452367885 ecr 15643479], length 0
IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [.], ack 1, win 226,
options [nop,nop,TS val 15643479 ecr 452367885], length 0
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [R], seq 3695959830,
win 0, length 0
2. seqadj log on server
[62873.867319] Adjusting sequence number from 602341895->546723267,
ack from 3695959830->3695959830
[62873.867644] Adjusting sequence number from 602341895->546723267,
ack from 3695959830->3695959830
[62873.869040] Adjusting sequence number from 3695959830->3695959830,
ack from 0->55618628
To summarize, it is clear that the seqadj codes adjust the 0 ack when receive
one TCP RST packet without ack.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
---
v4: Don't invoke nf_ct_sack_adjust when no ack flag
v3: Add the reproduce steps and packet trace
v2: Regenerate because the first patch is removed
v1: Initial patch
net/netfilter/nf_conntrack_seqadj.c | 37 ++++++++++++++++++++-----------------
1 file changed, 20 insertions(+), 17 deletions(-)
diff --git a/net/netfilter/nf_conntrack_seqadj.c b/net/netfilter/nf_conntrack_seqadj.c
index dff0f0c..80ab429 100644
--- a/net/netfilter/nf_conntrack_seqadj.c
+++ b/net/netfilter/nf_conntrack_seqadj.c
@@ -169,7 +169,7 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
s32 seqoff, ackoff;
struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
struct nf_ct_seqadj *this_way, *other_way;
- int res;
+ int res = 1;
this_way = &seqadj->seq[dir];
other_way = &seqadj->seq[!dir];
@@ -184,27 +184,30 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
else
seqoff = this_way->offset_before;
- if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
- other_way->correction_pos))
- ackoff = other_way->offset_after;
- else
- ackoff = other_way->offset_before;
-
newseq = htonl(ntohl(tcph->seq) + seqoff);
- newack = htonl(ntohl(tcph->ack_seq) - ackoff);
-
inet_proto_csum_replace4(&tcph->check, skb, tcph->seq, newseq, false);
- inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq, newack,
- false);
+ pr_debug("Adjusting sequence number from %u->%u\n",
+ ntohl(tcph->seq), ntohl(newseq));
+ tcph->seq = newseq;
- pr_debug("Adjusting sequence number from %u->%u, ack from %u->%u\n",
- ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
- ntohl(newack));
+ if (likely(tcph->ack)) {
+ if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
+ other_way->correction_pos))
+ ackoff = other_way->offset_after;
+ else
+ ackoff = other_way->offset_before;
- tcph->seq = newseq;
- tcph->ack_seq = newack;
+ newack = htonl(ntohl(tcph->ack_seq) - ackoff);
+ inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq,
+ newack, false);
+ pr_debug("Adjusting ack number from %u->%u, ack from %u->%u\n",
+ ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
+ ntohl(newack));
+ tcph->ack_seq = newack;
+
+ res = nf_ct_sack_adjust(skb, protoff, tcph, ct, ctinfo);
+ }
- res = nf_ct_sack_adjust(skb, protoff, tcph, ct, ctinfo);
spin_unlock_bh(&ct->lock);
return res;
--
1.9.1
^ permalink raw reply related
* [PATCH iproute2 2/2] ip rule: add selector support
From: Hangbin Liu @ 2016-09-22 6:28 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, Phil Sutter, Hangbin Liu
In-Reply-To: <1474525729-2845-1-git-send-email-liuhangbin@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
ip/iprule.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++++++--
man/man8/ip-rule.8 | 6 +-
2 files changed, 180 insertions(+), 6 deletions(-)
diff --git a/ip/iprule.c b/ip/iprule.c
index e18505f..42fb6af 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -20,6 +20,7 @@
#include <netinet/ip.h>
#include <arpa/inet.h>
#include <string.h>
+#include <linux/if.h>
#include <linux/fib_rules.h>
#include <errno.h>
@@ -41,7 +42,7 @@ static void usage(void)
{
fprintf(stderr, "Usage: ip rule { add | del } SELECTOR ACTION\n");
fprintf(stderr, " ip rule { flush | save | restore }\n");
- fprintf(stderr, " ip rule [ list ]\n");
+ fprintf(stderr, " ip rule [ list [ SELECTOR ]]\n");
fprintf(stderr, "SELECTOR := [ not ] [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK[/MASK] ]\n");
fprintf(stderr, " [ iif STRING ] [ oif STRING ] [ pref NUMBER ] [ l3mdev ]\n");
fprintf(stderr, "ACTION := [ table TABLE_ID ]\n");
@@ -55,6 +56,105 @@ static void usage(void)
exit(-1);
}
+static struct
+{
+ int not;
+ int l3mdev;
+ int iifmask, oifmask;
+ unsigned int tb;
+ unsigned int tos, tosmask;
+ unsigned int pref, prefmask;
+ unsigned int fwmark, fwmask;
+ char iif[IFNAMSIZ];
+ char oif[IFNAMSIZ];
+ inet_prefix src;
+ inet_prefix dst;
+} filter;
+
+static bool filter_nlmsg(struct nlmsghdr *n, struct rtattr **tb, int host_len)
+{
+ struct rtmsg *r = NLMSG_DATA(n);
+ inet_prefix src = { .family = r->rtm_family };
+ inet_prefix dst = { .family = r->rtm_family };
+ __u32 table;
+
+ if (preferred_family != AF_UNSPEC && r->rtm_family != preferred_family)
+ return false;
+
+ if (filter.prefmask &&
+ filter.pref ^ (tb[FRA_PRIORITY] ? rta_getattr_u32(tb[FRA_PRIORITY]) : 0))
+ return false;
+ if (filter.not && !(r->rtm_flags & FIB_RULE_INVERT))
+ return false;
+
+ if (filter.src.family) {
+ if (tb[FRA_SRC]) {
+ memcpy(&src.data, RTA_DATA(tb[FRA_SRC]),
+ (r->rtm_src_len + 7) / 8);
+ }
+ if (filter.src.family != r->rtm_family ||
+ filter.src.bitlen > r->rtm_src_len ||
+ inet_addr_match(&src, &filter.src, filter.src.bitlen))
+ return false;
+ }
+
+ if (filter.dst.family) {
+ if (tb[FRA_DST]) {
+ memcpy(&dst.data, RTA_DATA(tb[FRA_DST]),
+ (r->rtm_dst_len + 7) / 8);
+ }
+ if (filter.dst.family != r->rtm_family ||
+ filter.dst.bitlen > r->rtm_dst_len ||
+ inet_addr_match(&dst, &filter.dst, filter.dst.bitlen))
+ return false;
+ }
+
+ if (filter.tosmask && filter.tos ^ r->rtm_tos)
+ return false;
+
+ if (filter.fwmark) {
+ __u32 mark = 0;
+ if (tb[FRA_FWMARK])
+ mark = rta_getattr_u32(tb[FRA_FWMARK]);
+ if (filter.fwmark ^ mark)
+ return false;
+ }
+ if (filter.fwmask) {
+ __u32 mask = 0;
+ if (tb[FRA_FWMASK])
+ mask = rta_getattr_u32(tb[FRA_FWMASK]);
+ if (filter.fwmask ^ mask)
+ return false;
+ }
+
+ if (filter.iifmask) {
+ if (tb[FRA_IFNAME]) {
+ if (strcmp(filter.iif, rta_getattr_str(tb[FRA_IFNAME])) != 0)
+ return false;
+ } else {
+ return false;
+ }
+ }
+
+ if (filter.oifmask) {
+ if (tb[FRA_OIFNAME]) {
+ if (strcmp(filter.oif, rta_getattr_str(tb[FRA_OIFNAME])) != 0)
+ return false;
+ } else {
+ return false;
+ }
+ }
+
+ if (filter.l3mdev && !(tb[FRA_L3MDEV] && rta_getattr_u8(tb[FRA_L3MDEV])))
+ return false;
+
+ table = rtm_get_table(r, tb);
+ if (filter.tb > 0 && filter.tb ^ table)
+ return false;
+
+ return true;
+}
+
int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
{
FILE *fp = (FILE *)arg;
@@ -77,6 +177,9 @@ int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
host_len = af_bit_len(r->rtm_family);
+ if(!filter_nlmsg(n, tb, host_len))
+ return 0;
+
if (n->nlmsg_type == RTM_DELRULE)
fprintf(fp, "Deleted ");
@@ -287,9 +390,9 @@ static int iprule_list_flush_or_save(int argc, char **argv, int action)
if (af == AF_UNSPEC)
af = AF_INET;
- if (argc > 0) {
- fprintf(stderr,
- "\"ip rule list/flush/save\" does not take any arguments\n");
+ if (action != IPRULE_LIST && argc > 0) {
+ fprintf(stderr, "\"ip rule %s\" does not take any arguments.\n",
+ action == IPRULE_SAVE ? "save" : "flush");
return -1;
}
@@ -306,6 +409,75 @@ static int iprule_list_flush_or_save(int argc, char **argv, int action)
filter_fn = print_rule;
}
+ memset(&filter, 0, sizeof(filter));
+
+ while (argc > 0) {
+ if (matches(*argv, "preference") == 0 ||
+ matches(*argv, "order") == 0 ||
+ matches(*argv, "priority") == 0) {
+ __u32 pref;
+ NEXT_ARG();
+ if (get_u32(&pref, *argv, 0))
+ invarg("preference value is invalid\n", *argv);
+ filter.pref = pref;
+ filter.prefmask = 1;
+ } else if (strcmp(*argv, "not") == 0) {
+ filter.not = 1;
+ } else if (strcmp(*argv, "tos") == 0) {
+ __u32 tos;
+ NEXT_ARG();
+ if (rtnl_dsfield_a2n(&tos, *argv))
+ invarg("TOS value is invalid\n", *argv);
+ filter.tos = tos;
+ filter.tosmask = 1;
+ } else if (strcmp(*argv, "fwmark") == 0) {
+ char *slash;
+ __u32 fwmark, fwmask;
+ NEXT_ARG();
+ slash = strchr(*argv, '/');
+ if (slash != NULL)
+ *slash = '\0';
+ if (get_u32(&fwmark, *argv, 0))
+ invarg("fwmark value is invalid\n", *argv);
+ filter.fwmark = fwmark;
+ if (slash) {
+ if (get_u32(&fwmask, slash+1, 0))
+ invarg("fwmask value is invalid\n",
+ slash+1);
+ filter.fwmask = fwmask;
+ }
+ } else if (strcmp(*argv, "dev") == 0 ||
+ strcmp(*argv, "iif") == 0) {
+ NEXT_ARG();
+ strncpy(filter.iif, *argv, IFNAMSIZ);
+ filter.iifmask = 1;
+ } else if (strcmp(*argv, "oif") == 0) {
+ NEXT_ARG();
+ strncpy(filter.oif, *argv, IFNAMSIZ);
+ filter.oifmask = 1;
+ } else if (strcmp(*argv, "l3mdev") == 0) {
+ filter.l3mdev = 1;
+ } else if (matches(*argv, "lookup") == 0 ||
+ matches(*argv, "table") == 0 ) {
+ __u32 tid;
+ NEXT_ARG();
+ if (rtnl_rttable_a2n(&tid, *argv))
+ invarg("table id value is invalid\n", *argv);
+ filter.tb = tid;
+ } else if (matches(*argv, "from") == 0 ||
+ matches(*argv, "src") == 0) {
+ NEXT_ARG();
+ get_prefix(&filter.src, *argv, af);
+ } else {
+ if (matches(*argv, "dst") == 0 ||
+ matches(*argv, "to") == 0) {
+ NEXT_ARG();
+ }
+ get_prefix(&filter.dst, *argv, af);
+ }
+ argc--; argv++;
+ }
+
if (rtnl_wilddump_request(&rth, af, RTM_GETRULE) < 0) {
perror("Cannot send dump request");
return 1;
diff --git a/man/man8/ip-rule.8 b/man/man8/ip-rule.8
index 3508d80..ec0e31d 100644
--- a/man/man8/ip-rule.8
+++ b/man/man8/ip-rule.8
@@ -15,7 +15,8 @@ ip-rule \- routing policy database management
.ti -8
.B ip rule
-.RB "[ " list " ]"
+.RB "[ " list
+.I "[ " SELECTOR " ]]"
.ti -8
.B ip rule
@@ -42,7 +43,8 @@ ip-rule \- routing policy database management
.B oif
.IR STRING " ] [ "
.B pref
-.IR NUMBER " ]"
+.IR NUMBER " ] [ "
+.BR l3mdev " ]"
.ti -8
.IR ACTION " := [ "
--
2.5.5
^ permalink raw reply related
* [PATCH iproute2 1/2] ip rule: merge ip rule flush and list, save together
From: Hangbin Liu @ 2016-09-22 6:28 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, Phil Sutter, Hangbin Liu
In-Reply-To: <1474525729-2845-1-git-send-email-liuhangbin@gmail.com>
iprule_flush() and iprule_list_or_save() both call function
rtnl_wilddump_request() and rtnl_dump_filter(). So merge them
together just like other files do.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
ip/iprule.c | 121 +++++++++++++++++++++++++++---------------------------------
1 file changed, 54 insertions(+), 67 deletions(-)
diff --git a/ip/iprule.c b/ip/iprule.c
index 70562c5..e18505f 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -27,6 +27,12 @@
#include "utils.h"
#include "ip_common.h"
+enum list_action {
+ IPRULE_LIST,
+ IPRULE_FLUSH,
+ IPRULE_SAVE,
+};
+
extern struct rtnl_handle rth;
static void usage(void) __attribute__((noreturn));
@@ -243,24 +249,61 @@ static int save_rule(const struct sockaddr_nl *who,
return ret == n->nlmsg_len ? 0 : ret;
}
-static int iprule_list_or_save(int argc, char **argv, int save)
+static int flush_rule(const struct sockaddr_nl *who, struct nlmsghdr *n,
+ void *arg)
+{
+ struct rtnl_handle rth2;
+ struct rtmsg *r = NLMSG_DATA(n);
+ int len = n->nlmsg_len;
+ struct rtattr *tb[FRA_MAX+1];
+
+ len -= NLMSG_LENGTH(sizeof(*r));
+ if (len < 0)
+ return -1;
+
+ parse_rtattr(tb, FRA_MAX, RTM_RTA(r), len);
+
+ if (tb[FRA_PRIORITY]) {
+ n->nlmsg_type = RTM_DELRULE;
+ n->nlmsg_flags = NLM_F_REQUEST;
+
+ if (rtnl_open(&rth2, 0) < 0)
+ return -1;
+
+ if (rtnl_talk(&rth2, n, NULL, 0) < 0)
+ return -2;
+
+ rtnl_close(&rth2);
+ }
+
+ return 0;
+}
+
+static int iprule_list_flush_or_save(int argc, char **argv, int action)
{
- rtnl_filter_t filter = print_rule;
+ rtnl_filter_t filter_fn;
int af = preferred_family;
if (af == AF_UNSPEC)
af = AF_INET;
if (argc > 0) {
- fprintf(stderr, "\"ip rule %s\" does not take any arguments.\n",
- save ? "save" : "show");
+ fprintf(stderr,
+ "\"ip rule list/flush/save\" does not take any arguments\n");
return -1;
}
- if (save) {
+ switch (action) {
+ case IPRULE_SAVE:
if (save_rule_prep())
return -1;
- filter = save_rule;
+ filter_fn = save_rule;
+ break;
+ case IPRULE_FLUSH:
+ filter_fn = flush_rule;
+ break;
+ default:
+ filter_fn = print_rule;
}
if (rtnl_wilddump_request(&rth, af, RTM_GETRULE) < 0) {
@@ -268,7 +311,7 @@ static int iprule_list_or_save(int argc, char **argv, int save)
return 1;
}
- if (rtnl_dump_filter(&rth, filter, stdout) < 0) {
+ if (rtnl_dump_filter(&rth, filter_fn, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
return 1;
}
@@ -511,72 +554,16 @@ static int iprule_modify(int cmd, int argc, char **argv)
return 0;
}
-
-static int flush_rule(const struct sockaddr_nl *who, struct nlmsghdr *n,
- void *arg)
-{
- struct rtnl_handle rth2;
- struct rtmsg *r = NLMSG_DATA(n);
- int len = n->nlmsg_len;
- struct rtattr *tb[FRA_MAX+1];
-
- len -= NLMSG_LENGTH(sizeof(*r));
- if (len < 0)
- return -1;
-
- parse_rtattr(tb, FRA_MAX, RTM_RTA(r), len);
-
- if (tb[FRA_PRIORITY]) {
- n->nlmsg_type = RTM_DELRULE;
- n->nlmsg_flags = NLM_F_REQUEST;
-
- if (rtnl_open(&rth2, 0) < 0)
- return -1;
-
- if (rtnl_talk(&rth2, n, NULL, 0) < 0)
- return -2;
-
- rtnl_close(&rth2);
- }
-
- return 0;
-}
-
-static int iprule_flush(int argc, char **argv)
-{
- int af = preferred_family;
-
- if (af == AF_UNSPEC)
- af = AF_INET;
-
- if (argc > 0) {
- fprintf(stderr, "\"ip rule flush\" does not allow arguments\n");
- return -1;
- }
-
- if (rtnl_wilddump_request(&rth, af, RTM_GETRULE) < 0) {
- perror("Cannot send dump request");
- return 1;
- }
-
- if (rtnl_dump_filter(&rth, flush_rule, NULL) < 0) {
- fprintf(stderr, "Flush terminated\n");
- return 1;
- }
-
- return 0;
-}
-
int do_iprule(int argc, char **argv)
{
if (argc < 1) {
- return iprule_list_or_save(0, NULL, 0);
+ return iprule_list_flush_or_save(0, NULL, IPRULE_LIST);
} else if (matches(argv[0], "list") == 0 ||
matches(argv[0], "lst") == 0 ||
matches(argv[0], "show") == 0) {
- return iprule_list_or_save(argc-1, argv+1, 0);
+ return iprule_list_flush_or_save(argc-1, argv+1, IPRULE_LIST);
} else if (matches(argv[0], "save") == 0) {
- return iprule_list_or_save(argc-1, argv+1, 1);
+ return iprule_list_flush_or_save(argc-1, argv+1, IPRULE_SAVE);
} else if (matches(argv[0], "restore") == 0) {
return iprule_restore();
} else if (matches(argv[0], "add") == 0) {
@@ -584,7 +571,7 @@ int do_iprule(int argc, char **argv)
} else if (matches(argv[0], "delete") == 0) {
return iprule_modify(RTM_DELRULE, argc-1, argv+1);
} else if (matches(argv[0], "flush") == 0) {
- return iprule_flush(argc-1, argv+1);
+ return iprule_list_flush_or_save(argc-1, argv+1, IPRULE_FLUSH);
} else if (matches(argv[0], "help") == 0)
usage();
--
2.5.5
^ permalink raw reply related
* [PATCH iproute2 0/2] ip rule: merger iprule_flush and add selector support
From: Hangbin Liu @ 2016-09-22 6:28 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, Phil Sutter, Hangbin Liu
When merge iprule_flush() and iprule_list_or_save(). Renamed
rtnl_filter_t filter to filter_fn because we want to use global
variable 'filter' to filter nlmsg in the next patch.
Hangbin Liu (2):
ip rule: merge ip rule flush and list, save together
ip rule: add selector support
ip/iprule.c | 295 +++++++++++++++++++++++++++++++++++++++++------------
man/man8/ip-rule.8 | 6 +-
2 files changed, 231 insertions(+), 70 deletions(-)
--
2.5.5
^ permalink raw reply
* Re: [PATCH nf v3] netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ack
From: Pablo Neira Ayuso @ 2016-09-22 6:27 UTC (permalink / raw)
To: fgao; +Cc: kaber, netfilter-devel, netdev, gfree.wind, Eric Dumazet
In-Reply-To: <1474510965-2049-1-git-send-email-fgao@ikuai8.com>
On top of Eric's comments.
On Thu, Sep 22, 2016 at 10:22:45AM +0800, fgao@ikuai8.com wrote:
> diff --git a/net/netfilter/nf_conntrack_seqadj.c b/net/netfilter/nf_conntrack_seqadj.c
> index dff0f0c..3bd9c7e 100644
> --- a/net/netfilter/nf_conntrack_seqadj.c
> +++ b/net/netfilter/nf_conntrack_seqadj.c
> @@ -179,30 +179,34 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
>
> tcph = (void *)skb->data + protoff;
> spin_lock_bh(&ct->lock);
> +
> if (after(ntohl(tcph->seq), this_way->correction_pos))
> seqoff = this_way->offset_after;
> else
> seqoff = this_way->offset_before;
>
> - if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
> - other_way->correction_pos))
> - ackoff = other_way->offset_after;
> - else
> - ackoff = other_way->offset_before;
> -
> newseq = htonl(ntohl(tcph->seq) + seqoff);
> - newack = htonl(ntohl(tcph->ack_seq) - ackoff);
> -
> inet_proto_csum_replace4(&tcph->check, skb, tcph->seq, newseq, false);
> - inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq, newack,
> - false);
> -
> - pr_debug("Adjusting sequence number from %u->%u, ack from %u->%u\n",
> - ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
> - ntohl(newack));
>
> + pr_debug("Adjusting sequence number from %u->%u\n",
> + ntohl(tcph->seq), ntohl(newseq));
> tcph->seq = newseq;
> - tcph->ack_seq = newack;
> +
> + if (likely(tcph->ack)) {
I'd suggest:
if (!tcph->ack)
goto out;
given gcc sets goto branch as unlikely already, then you place an "out"
label...
> + if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
> + other_way->correction_pos))
> + ackoff = other_way->offset_after;
> + else
> + ackoff = other_way->offset_before;
> +
> + newack = htonl(ntohl(tcph->ack_seq) - ackoff);
> + inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq,
> + newack, false);
> +
> + pr_debug("Adjusting ack number from %u->%u\n",
> + ntohl(tcph->ack_seq), ntohl(newack));
> + tcph->ack_seq = newack;
> + }
>
> res = nf_ct_sack_adjust(skb, protoff, tcph, ct, ctinfo);
out: <- here
> spin_unlock_bh(&ct->lock);
This will get you a smaller patch fix.
^ permalink raw reply
* Re: [PATCHv3 net-next 0/2] Preparation for mv88e6390
From: David Miller @ 2016-09-22 6:20 UTC (permalink / raw)
To: andrew; +Cc: vivien.didelot, netdev
In-Reply-To: <1474414832-1638-1-git-send-email-andrew@lunn.ch>
From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 21 Sep 2016 01:40:30 +0200
> These two patches are a couple of preparation steps for supporting the
> the MV88E6390 family of chips. This is a new generation from Marvell,
> and will need more feature flags than are currently available in an
> unsigned long. Expand to an unsigned long long. The MV88E6390 also
> places its port registers somewhere else, so add a wrapper around port
> register access.
>
> v2:
> Rework wrappers to use mv88e6xxx_{read|write}
> Simpliy some (err < ) to (err)
> Add Reviewed by tag.
>
> v3::
> reg = reg & foo -> reg &= foo
> Fix over zealous s/ret/err
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH] ptp_clock: future-proofing drivers against PTP subsystem becoming optional
From: David Miller @ 2016-09-22 6:18 UTC (permalink / raw)
To: nicolas.pitre
Cc: richardcochran, john.stultz, tglx, josh, netdev, linux-kernel
In-Reply-To: <alpine.LFD.2.20.1609201915070.9311@knanqh.ubzr>
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Tue, 20 Sep 2016 19:25:58 -0400 (EDT)
>
> Drivers must be ready to accept NULL from ptp_clock_register() if the
> PTP clock subsystem is configured out.
>
> This patch documents that and ensures that all drivers cope well
> with a NULL return.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> Reviewed-by: Eugenia Emantayev <eugenia@mellanox.com>
>
> ---
>
> Let's have the basics merged now and work out the actual Kconfig issue
> separately. Richard, if you agree with this patch, I think this could go
> via the netdev tree.
Applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH 2/2] net: ethernet: hisilicon: hns: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2016-09-22 6:12 UTC (permalink / raw)
To: tremyfr
Cc: yisen.zhuang, salil.mehta, yankejian, huangdaode, lisheng011,
xieqianqian, arnd, lipeng321, andrew, chenny.xu, netdev,
linux-kernel
In-Reply-To: <1474403412-18748-2-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Tue, 20 Sep 2016 22:30:12 +0200
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH 1/2] net: ethernet: hisilicon: hns: use phydev from struct net_device
From: David Miller @ 2016-09-22 6:12 UTC (permalink / raw)
To: tremyfr
Cc: yisen.zhuang, salil.mehta, yankejian, huangdaode, lisheng011,
xieqianqian, arnd, lipeng321, andrew, chenny.xu, netdev,
linux-kernel
In-Reply-To: <1474403412-18748-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Tue, 20 Sep 2016 22:30:11 +0200
> The private structure contain a pointer to phydev, but the structure
> net_device already contain such pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH v2 0/2] make POSIX timers optional
From: David Miller @ 2016-09-22 6:09 UTC (permalink / raw)
To: nicolas.pitre
Cc: john.stultz, tglx, richardcochran, josh, netdev, linux-kernel
In-Reply-To: <1474401400-18491-1-git-send-email-nicolas.pitre@linaro.org>
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Tue, 20 Sep 2016 15:56:38 -0400
> Many embedded systems don't need the full POSIX timer support.
> Configuring them out provides a nice kernel image size reduction.
>
> When POSIX timers are configured out, the PTP clock subsystem should be
> left out as well. However a bunch of ethernet drivers currently *select*
> it in their Kconfig entries. Therefore some more tweaks were needed to
> break that hard dependency for those drivers to still be configured in
> if desired.
>
> It was agreed that the best path upstream for those patches is via
> John Stultz's timer tree.
Acked-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply
* Re: [PATCH] net: fec: set mac address unconditionally
From: Uwe Kleine-König @ 2016-09-22 6:08 UTC (permalink / raw)
To: Gavin Schenk; +Cc: fugang.duan, netdev, kernel
In-Reply-To: <1474464655-126940-1-git-send-email-g.schenk@eckelmann.de>
Hello,
just a few nitpicks:
On Wed, Sep 21, 2016 at 03:30:55PM +0200, Gavin Schenk wrote:
> Fixes: 9638d19e4816 ("net: fec: add netif status check before set mac address")
This line belongs to in the S-o-B area below.
> If the mac address origin is not dt, you can only safe assign a
s/safe/safely/
> mac address after "link up" of the device. If the link is down the
> clocks are disabled and because of issues assigning registers when
> clocks are down the new mac address is discarded on some soc's. This fix
s/down/off/; s/is discarded/cannot be written in .ndo_set_mac_address()/
> sets the mac address unconditionally in fec_restart(...) and ensures
> consistens between fec registers and the network layer.
s/consistens/consistency/
Other than that:
Acked-by: Uwe Kleine-König <u.kleine-koenig@eckelmann.de>
Thanks
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
^ permalink raw reply
* Re: [PATCH net-next] net: ethernet: mediatek: fix missing changes merged for conflicts overlapping commits
From: David Miller @ 2016-09-22 6:05 UTC (permalink / raw)
To: sean.wang; +Cc: john, nbd, netdev, linux-mediatek, keyhaede, objelf
In-Reply-To: <1474386804-11728-1-git-send-email-sean.wang@mediatek.com>
From: <sean.wang@mediatek.com>
Date: Tue, 20 Sep 2016 23:53:24 +0800
> From: Sean Wang <sean.wang@mediatek.com>
>
> add the missing commits about
> 1)
> Commit d3bd1ce4db8e843dce421e2f8f123e5251a9c7d3
> ("remove redundant free_irq for devm_request_ir allocated irq")
> 2)
> Commit 7c6b0d76fa02213393815e3b6d5e4a415bf3f0e2
> ("fix logic unbalance between probe and remove")
>
> during merge for conflicts overlapping commits by
> Commit b20b378d49926b82c0a131492fa8842156e0e8a9
> ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
>
> Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Applied, thanks for fixing this up for me.
^ permalink raw reply
* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
From: David Miller @ 2016-09-22 5:52 UTC (permalink / raw)
To: jbenc; +Cc: sowmini.varadhan, netdev, hannes, aduyck, daniel, pabeni
In-Reply-To: <20160920190929.57ddaeb0@griffin>
From: Jiri Benc <jbenc@redhat.com>
Date: Tue, 20 Sep 2016 19:09:29 +0200
> But the point stands, we have much greater problems here than VXLAN.
> And I don't think that wrapping all IP address accesses into
> get/put_unaligned all around the stack is the solution.
Right, and I don't like marking things as packed either.
We need something that really solves the problem. We can't
change the existing protocols, but we can perhaps change the
geometry of the SKB when we deal with such protocols.
For example, we can memmove() to align the headers at skb->data and
then for the skb->data portion past the headers we insert a frag
pointing to it at the front of the frag list.
So we "memmove" down, creating a gap, and then past the gap
is the post-header area which gets inserted into the head of
the SKB's fraglist.
That will align all of the subsequent headers and avoid the
unaligned accesses after the vxlan header.
Alternatively we can do Alexander Duyck's trick, by pushing
the headers into the frag list, forcing a pull and realignment
by the next protocol layer.
This is so much better than the little hacks sprinkled all
over the problem and tackles the fundamental issue.
Thanks.
^ permalink raw reply
* Re: [PATCH] cxgb4: fix signed wrap around when decrementing index idx
From: David Miller @ 2016-09-22 5:49 UTC (permalink / raw)
To: colin.king; +Cc: hariprasad, netdev, linux-kernel
In-Reply-To: <20160920144846.21765-1-colin.king@canonical.com>
From: Colin King <colin.king@canonical.com>
Date: Tue, 20 Sep 2016 15:48:45 +0100
> From: Colin Ian King <colin.king@canonical.com>
>
> Change predecrement compare to post decrement compare to avoid an
> unsigned integer wrap-around comparison when decrementing idx in
> the while loop.
>
> For example, when idx is zero, the current situation will
> predecrement idx in the while loop, wrapping idx to the maximum
> signed integer and cause out of bounds reads on rxq_info->msix_tbl[idx].
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
This doesn't apply cleanly to the 'net' tree, please respin.
^ permalink raw reply
* Re: [PATCH net] net/mlx4_core: Fix to clean devlink resources
From: David Miller @ 2016-09-22 5:41 UTC (permalink / raw)
To: tariqt; +Cc: netdev, eranbe, kamalh
In-Reply-To: <1474372531-27120-1-git-send-email-tariqt@mellanox.com>
From: Tariq Toukan <tariqt@mellanox.com>
Date: Tue, 20 Sep 2016 14:55:31 +0300
> From: Kamal Heib <kamalh@mellanox.com>
>
> This patch cleans devlink resources by calling devlink_port_unregister()
> to avoid the following issues:
>
> - Kernel panic when triggering reset flow.
> - Memory leak due to unfreed resources in mlx4_init_port_info().
>
> Fixes: 09d4d087cd48 ("mlx4: Implement devlink interface")
> Signed-off-by: Kamal Heib <kamalh@mellanox.com>
> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> ---
> Please push it to -stable >= 4.6 as well. Thanks.
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH net-next v3 0/5] cxgb4: add support for offloading TC u32 filters
From: David Miller @ 2016-09-22 5:40 UTC (permalink / raw)
To: rahul.lakkireddy; +Cc: netdev, hariprasad, leedom, nirranjan, indranil
In-Reply-To: <cover.1474029677.git.rahul.lakkireddy@chelsio.com>
From: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Date: Tue, 20 Sep 2016 17:13:05 +0530
> This series of patches add support to offload TC u32 filters onto
> Chelsio NICs.
>
> Patch 1 moves current common filter code to separate files
> in order to provide a common api for performing packet classification
> and filtering in Chelsio NICs.
>
> Patch 2 enables filters for normal NIC configuration and implements
> common api for setting and deleting filters.
>
> Patches 3-5 add support for TC u32 offload via ndo_setup_tc.
Series applied.
^ permalink raw reply
* Re: [RFC PATCH v3 2/7] proc: Reduce cache miss in {snmp,netstat}_seq_show
From: hejianet @ 2016-09-22 5:38 UTC (permalink / raw)
To: Marcelo
Cc: netdev, linux-sctp, linux-kernel, davem, Alexey Kuznetsov,
James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Vlad Yasevich,
Neil Horman, Steffen Klassert, Herbert Xu
In-Reply-To: <20160921182438.GG9323@localhost.localdomain>
On 9/22/16 2:24 AM, Marcelo wrote:
> On Thu, Sep 22, 2016 at 12:18:46AM +0800, hejianet wrote:
>> Hi Marcelo
>>
>> sorry for the late, just came back from a vacation.
> Hi, no problem. Hope your batteries are recharged now :-)
>
>> On 9/14/16 7:55 PM, Marcelo wrote:
>>> Hi Jia,
>>>
>>> On Wed, Sep 14, 2016 at 01:58:42PM +0800, hejianet wrote:
>>>> Hi Marcelo
>>>>
>>>>
>>>> On 9/13/16 2:57 AM, Marcelo wrote:
>>>>> On Fri, Sep 09, 2016 at 02:33:57PM +0800, Jia He wrote:
>>>>>> This is to use the generic interface snmp_get_cpu_field{,64}_batch to
>>>>>> aggregate the data by going through all the items of each cpu sequentially.
>>>>>> Then snmp_seq_show and netstat_seq_show are split into 2 parts to avoid build
>>>>>> warning "the frame size" larger than 1024 on s390.
>>>>> Yeah about that, did you test it with stack overflow detection?
>>>>> These arrays can be quite large.
>>>>>
>>>>> One more below..
>>>> Do you think it is acceptable if the stack usage is a little larger than 1024?
>>>> e.g. 1120
>>>> I can't find any other way to reduce the stack usage except use "static" before
>>>> unsigned long buff[TCP_MIB_MAX]
>>>>
>>>> PS. sizeof buff is about TCP_MIB_MAX(116)*8=928
>>>> B.R.
>>> That's pretty much the question. Linux has the option on some archs to
>>> run with 4Kb (4KSTACKS option), so this function alone would be using
>>> 25% of it in this last case. While on x86_64, it uses 16Kb (6538b8ea886e
>>> ("x86_64: expand kernel stack to 16K")).
>>>
>>> Adding static to it is not an option as it actually makes the variable
>>> shared amongst the CPUs (and then you have concurrency issues), plus the
>>> fact that it's always allocated, even while not in use.
>>>
>>> Others here certainly know better than me if it's okay to make such
>>> usage of the stach.
>> What about this patch instead?
>> It is a trade-off. I split the aggregation process into 2 parts, it will
>> increase the cache miss a little bit, but it can reduce the stack usage.
>> After this, stack usage is 672bytes
>> objdump -d vmlinux | ./scripts/checkstack.pl ppc64 | grep seq_show
>> 0xc0000000007f7cc0 netstat_seq_show_tcpext.isra.3 [vmlinux]:672
>>
>> diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
>> index c6ee8a2..cc41590 100644
>> --- a/net/ipv4/proc.c
>> +++ b/net/ipv4/proc.c
>> @@ -486,22 +486,37 @@ static const struct file_operations snmp_seq_fops = {
>> */
>> static int netstat_seq_show_tcpext(struct seq_file *seq, void *v)
>> {
>> - int i;
>> - unsigned long buff[LINUX_MIB_MAX];
>> + int i, c;
>> + unsigned long buff[LINUX_MIB_MAX/2 + 1];
>> struct net *net = seq->private;
>>
>> - memset(buff, 0, sizeof(unsigned long) * LINUX_MIB_MAX);
>> + memset(buff, 0, sizeof(unsigned long) * (LINUX_MIB_MAX/2 + 1));
>>
>> seq_puts(seq, "TcpExt:");
>> for (i = 0; snmp4_net_list[i].name; i++)
>> seq_printf(seq, " %s", snmp4_net_list[i].name);
>>
>> seq_puts(seq, "\nTcpExt:");
>> - snmp_get_cpu_field_batch(buff, snmp4_net_list,
>> - net->mib.net_statistics);
>> - for (i = 0; snmp4_net_list[i].name; i++)
>> + for_each_possible_cpu(c) {
>> + for (i = 0; i < LINUX_MIB_MAX/2; i++)
>> + buff[i] += snmp_get_cpu_field(
>> + net->mib.net_statistics,
>> + c, snmp4_net_list[i].entry);
>> + }
>> + for (i = 0; i < LINUX_MIB_MAX/2; i++)
>> seq_printf(seq, " %lu", buff[i]);
>>
>> + memset(buff, 0, sizeof(unsigned long) * (LINUX_MIB_MAX/2 + 1));
>> + for_each_possible_cpu(c) {
>> + for (i = LINUX_MIB_MAX/2; snmp4_net_list[i].name; i++)
>> + buff[i - LINUX_MIB_MAX/2] += snmp_get_cpu_field(
>> + net->mib.net_statistics,
>> + c,
>> + snmp4_net_list[i].entry);
>> + }
>> + for (i = LINUX_MIB_MAX/2; snmp4_net_list[i].name; i++)
>> + seq_printf(seq, " %lu", buff[i - LINUX_MIB_MAX/2]);
>> +
>> return 0;
>> }
> Yep, it halves the stack usage, but it doesn't look good heh
>
> But well, you may try to post the patchset (with or without this last
> change, you pick) officially and see how it goes. As you're posting as
> RFC, it's not being evaluated as seriously.
Thanks for the suggestion, I will remove it in future patch version
>
> FWIW, I tested your patches, using your test and /proc/net/snmp file on
> a x86_64 box, Intel(R) Xeon(R) CPU E5-2643 v3.
>
> Before the patches:
>
> Performance counter stats for './test /proc/net/snmp':
>
> 5.225 cache-misses
> 12.708.673.785 L1-dcache-loads
> 1.288.450.174 L1-dcache-load-misses # 10,14% of all L1-dcache hits
> 1.271.857.028 LLC-loads
> 4.122 LLC-load-misses # 0,00% of all LL-cache hits
>
> 9,174936524 seconds time elapsed
>
> After:
>
> Performance counter stats for './test /proc/net/snmp':
>
> 2.865 cache-misses
> 30.203.883.807 L1-dcache-loads
> 1.215.774.643 L1-dcache-load-misses # 4,03% of all L1-dcache hits
> 1.181.662.831 LLC-loads
> 2.685 LLC-load-misses # 0,00% of all LL-cache hits
>
> 13,374445056 seconds time elapsed
>
> Numbers were steady across multiple runs.
>
> Marcelo
Yes, I guess your X86 machine doesn't have the large cpu number as mine (cpu#=160).
The cache misses rate difference btw before and after this patch will be more
significant if the cpu number is large.
B.R.
Jia
>
>>>>>> +static int netstat_seq_show_ipext(struct seq_file *seq, void *v)
>>>>>> +{
>>>>>> + int i;
>>>>>> + u64 buff64[IPSTATS_MIB_MAX];
>>>>>> + struct net *net = seq->private;
>>>>>> seq_puts(seq, "\nIpExt:");
>>>>>> for (i = 0; snmp4_ipextstats_list[i].name != NULL; i++)
>>>>>> seq_printf(seq, " %s", snmp4_ipextstats_list[i].name);
>>>>>> seq_puts(seq, "\nIpExt:");
>>>>> You're missing a memset() call here.
>>> Not sure if you missed this one or not..
>> indeed, thanks
>> B.R.
>> Jia
>>> Thanks,
>>> Marcelo
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
^ permalink raw reply
* Re: [PATCH v3 net-next 2/2] net: skbuff: Coding: Use eth_type_vlan() instead of open coding it
From: David Miller @ 2016-09-22 5:36 UTC (permalink / raw)
To: shmulik.ladkani
Cc: jiri, daniel, pshelar, eric.dumazet, netdev, shmulik.ladkani
In-Reply-To: <1474364917-31710-2-git-send-email-shmulik.ladkani@gmail.com>
From: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Date: Tue, 20 Sep 2016 12:48:37 +0300
> Fix 'skb_vlan_pop' to use eth_type_vlan instead of directly comparing
> skb->protocol to ETH_P_8021Q or ETH_P_8021AD.
>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH v3 net-next 1/2] net: skbuff: Remove errornous length validation in skb_vlan_pop()
From: David Miller @ 2016-09-22 5:36 UTC (permalink / raw)
To: shmulik.ladkani
Cc: jiri, daniel, pshelar, eric.dumazet, netdev, shmulik.ladkani
In-Reply-To: <1474364917-31710-1-git-send-email-shmulik.ladkani@gmail.com>
From: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Date: Tue, 20 Sep 2016 12:48:36 +0300
> In 93515d53b1
> "net: move vlan pop/push functions into common code"
> skb_vlan_pop was moved from its private location in openvswitch to
> skbuff common code.
>
> In case skb has non hw-accel vlan tag, the original 'pop_vlan()' assured
> that skb->len is sufficient (if skb->len < VLAN_ETH_HLEN then pop was
> considered a no-op).
>
> This validation was moved as is into the new common 'skb_vlan_pop'.
>
> Alas, in its original location (openvswitch), there was a guarantee that
> 'data' points to the mac_header, therefore the 'skb->len < VLAN_ETH_HLEN'
> condition made sense.
> However there's no such guarantee in the generic 'skb_vlan_pop'.
>
> For short packets received in rx path going through 'skb_vlan_pop',
> this causes 'skb_vlan_pop' to fail pop-ing a valid vlan hdr (in the non
> hw-accel case) or to fail moving next tag into hw-accel tag.
>
> Remove the 'skb->len < VLAN_ETH_HLEN' condition entirely:
> It is superfluous since inner '__skb_vlan_pop' already verifies there
> are VLAN_ETH_HLEN writable bytes at the mac_header.
>
> Note this presents a slight change to skb_vlan_pop() users:
> In case total length is smaller than VLAN_ETH_HLEN, skb_vlan_pop() now
> returns an error, as opposed to previous "no-op" behavior.
> Existing callers (e.g. tc act vlan, ovs) usually drop the packet if
> 'skb_vlan_pop' fails.
>
> Fixes: 93515d53b1 ("net: move vlan pop/push functions into common code")
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH v2 0/2] act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action
From: David Miller @ 2016-09-22 5:34 UTC (permalink / raw)
To: shmulik.ladkani; +Cc: jiri, jhs, netdev
In-Reply-To: <1474301470-17965-1-git-send-email-shmulik.ladkani@gmail.com>
From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Date: Mon, 19 Sep 2016 19:11:08 +0300
> TCA_VLAN_ACT_MODIFY allows one to change an existing tag.
>
> It accepts same attributes as TCA_VLAN_ACT_PUSH (protocol, id,
> priority).
> If packet is vlan tagged, then the tag gets overwritten according to
> user specified attributes.
>
> For example, this allows user to replace a tag's vid while preserving
> its priority bits (as opposed to "action vlan pop pipe action vlan push").
Series applied.
^ permalink raw reply
* Re: [PATCH] net: explicitly whitelist sysctls for unpriv namespaces
From: David Miller @ 2016-09-22 5:31 UTC (permalink / raw)
To: jann; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1474232300-8423-1-git-send-email-jann@thejh.net>
From: Jann Horn <jann@thejh.net>
Date: Sun, 18 Sep 2016 22:58:20 +0200
> There were two net sysctls that could be written from unprivileged net
> namespaces, but weren't actually namespaced.
>
> To fix the existing issues and prevent stuff this from happening again in
> the future, explicitly whitelist permitted sysctls.
>
> Note: The current whitelist is "allow everything that was previously
> accessible and that doesn't obviously modify global state".
>
> On my system, this patch just removes the write permissions for
> ipv4/netfilter/ip_conntrack_max, which would have been usable for a local
> DoS. With a different config, the ipv4/vs/debug_level sysctl would also be
> affected.
>
> Maximum impact of this seems to be local DoS, and it's a fairly large
> commit, so I'm sending this publicly directly.
>
> An alternative (and much smaller) fix would be to just change the
> permissions of the two files in question to be 0444 in non-privileged
> namespaces, but I believe that this solution is slightly less error-prone.
> If you think I should switch to the simple fix, let me know.
>
> Signed-off-by: Jann Horn <jann@thejh.net>
And actually this patch dosn't apply cleanly to net-next, please
respin.
Thanks.
^ permalink raw reply
* Re: [PATCH] net: explicitly whitelist sysctls for unpriv namespaces
From: David Miller @ 2016-09-22 5:30 UTC (permalink / raw)
To: jann; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <1474232300-8423-1-git-send-email-jann@thejh.net>
From: Jann Horn <jann@thejh.net>
Date: Sun, 18 Sep 2016 22:58:20 +0200
> There were two net sysctls that could be written from unprivileged net
> namespaces, but weren't actually namespaced.
>
> To fix the existing issues and prevent stuff this from happening again in
> the future, explicitly whitelist permitted sysctls.
>
> Note: The current whitelist is "allow everything that was previously
> accessible and that doesn't obviously modify global state".
>
> On my system, this patch just removes the write permissions for
> ipv4/netfilter/ip_conntrack_max, which would have been usable for a local
> DoS. With a different config, the ipv4/vs/debug_level sysctl would also be
> affected.
>
> Maximum impact of this seems to be local DoS, and it's a fairly large
> commit, so I'm sending this publicly directly.
>
> An alternative (and much smaller) fix would be to just change the
> permissions of the two files in question to be 0444 in non-privileged
> namespaces, but I believe that this solution is slightly less error-prone.
> If you think I should switch to the simple fix, let me know.
>
> Signed-off-by: Jann Horn <jann@thejh.net>
I think this is fine for net-next and will apply it there.
But for 'net' and 'stable', please also submit the simpler fix.
Thanks.
^ permalink raw reply
* Re: [PATCH net-next 9/9] rxrpc: Reduce the number of ACK-Requests sent
From: kbuild test robot @ 2016-09-22 5:24 UTC (permalink / raw)
To: David Howells; +Cc: kbuild-all, netdev, dhowells, linux-afs, linux-kernel
In-Reply-To: <147450481095.14691.12793864306697245558.stgit@warthog.procyon.org.uk>
[-- Attachment #1: Type: text/plain, Size: 871 bytes --]
Hi David,
[auto build test ERROR on net-next/master]
url: https://github.com/0day-ci/linux/commits/David-Howells/rxrpc-Preparation-for-slow-start-algorithm/20160922-085242
config: arm-omap2plus_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm
All errors (new ones prefixed by >>):
>> ERROR: "__aeabi_ldivmod" [net/rxrpc/af-rxrpc.ko] undefined!
ERROR: "__aeabi_uldivmod" [net/rxrpc/af-rxrpc.ko] undefined!
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28256 bytes --]
^ permalink raw reply
* Re: [PATCH nf v3] netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ack
From: Eric Dumazet @ 2016-09-22 5:20 UTC (permalink / raw)
To: fgao; +Cc: pablo, kaber, netfilter-devel, netdev, gfree.wind
In-Reply-To: <1474510965-2049-1-git-send-email-fgao@ikuai8.com>
On Thu, 2016-09-22 at 10:22 +0800, fgao@ikuai8.com wrote:
> From: Gao Feng <fgao@ikuai8.com>
>
> It is valid that the TCP RST packet which does not set ack flag, and bytes
> of ack number are zero. But current seqadj codes would adjust the "0" ack
> to invalid ack number. Actually seqadj need to check the ack flag before
> adjust it for these RST packets.
>
> The following is my test case
>
> client is 10.26.98.245, and add one iptable rule:
> iptables -I INPUT -p tcp --sport 12345 -m connbytes --connbytes 2:
> --connbytes-dir reply --connbytes-mode packets -j REJECT --reject-with
> tcp-reset
> This iptables rule could generate on TCP RST without ack flag.
>
> server:10.172.135.55
> Enable the synproxy with seqadjust by the following iptables rules
> iptables -t raw -A PREROUTING -i eth0 -p tcp -d 10.172.135.55 --dport 12345
> -m tcp --syn -j CT --notrack
>
> iptables -A INPUT -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m conntrack
> --ctstate INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7
> --mss 1460
> iptables -A OUTPUT -o eth0 -p tcp -s 10.172.135.55 --sport 12345 -m conntrack
> --ctstate INVALID,UNTRACKED -m tcp --tcp-flags SYN,RST,ACK SYN,ACK -j ACCEPT
>
> The following is my test result.
>
> 1. packet trace on client
> root@routers:/tmp# tcpdump -i eth0 tcp port 12345 -n
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
> IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [S], seq 3695959829,
> win 29200, options [mss 1460,sackOK,TS val 452367884 ecr 0,nop,wscale 7],
> length 0
> IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [S.], seq 546723266,
> ack 3695959830, win 0, options [mss 1460,sackOK,TS val 15643479 ecr 452367884,
> nop,wscale 7], length 0
> IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [.], ack 1, win 229,
> options [nop,nop,TS val 452367885 ecr 15643479], length 0
> IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [.], ack 1, win 226,
> options [nop,nop,TS val 15643479 ecr 452367885], length 0
> IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [R], seq 3695959830,
> win 0, length 0
>
> 2. seqadj log on server
> [62873.867319] Adjusting sequence number from 602341895->546723267,
> ack from 3695959830->3695959830
> [62873.867644] Adjusting sequence number from 602341895->546723267,
> ack from 3695959830->3695959830
> [62873.869040] Adjusting sequence number from 3695959830->3695959830,
> ack from 0->55618628
>
> To summarize, it is clear that the seqadj codes adjust the 0 ack when receive
> one TCP RST packet without ack.
>
> Signed-off-by: Gao Feng <fgao@ikuai8.com>
> ---
> v3: Add the reproduce steps and packet trace
> v2: Regenerate because the first patch is removed
> v1: Initial patch
>
> net/netfilter/nf_conntrack_seqadj.c | 34 +++++++++++++++++++---------------
> 1 file changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_seqadj.c b/net/netfilter/nf_conntrack_seqadj.c
> index dff0f0c..3bd9c7e 100644
> --- a/net/netfilter/nf_conntrack_seqadj.c
> +++ b/net/netfilter/nf_conntrack_seqadj.c
> @@ -179,30 +179,34 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
>
> tcph = (void *)skb->data + protoff;
> spin_lock_bh(&ct->lock);
> +
Please do not add style change during a bug fix.
> if (after(ntohl(tcph->seq), this_way->correction_pos))
> seqoff = this_way->offset_after;
> else
> seqoff = this_way->offset_before;
>
> - if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
> - other_way->correction_pos))
> - ackoff = other_way->offset_after;
> - else
> - ackoff = other_way->offset_before;
> -
> newseq = htonl(ntohl(tcph->seq) + seqoff);
> - newack = htonl(ntohl(tcph->ack_seq) - ackoff);
> -
> inet_proto_csum_replace4(&tcph->check, skb, tcph->seq, newseq, false);
> - inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq, newack,
> - false);
> -
> - pr_debug("Adjusting sequence number from %u->%u, ack from %u->%u\n",
> - ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
> - ntohl(newack));
>
> + pr_debug("Adjusting sequence number from %u->%u\n",
> + ntohl(tcph->seq), ntohl(newseq));
> tcph->seq = newseq;
> - tcph->ack_seq = newack;
> +
> + if (likely(tcph->ack)) {
> + if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
> + other_way->correction_pos))
> + ackoff = other_way->offset_after;
> + else
> + ackoff = other_way->offset_before;
> +
> + newack = htonl(ntohl(tcph->ack_seq) - ackoff);
> + inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq,
> + newack, false);
> +
> + pr_debug("Adjusting ack number from %u->%u\n",
> + ntohl(tcph->ack_seq), ntohl(newack));
> + tcph->ack_seq = newack;
> + }
>
If tcph->ack is not set, why are we calling nf_ct_sack_adjust() ?
> res = nf_ct_sack_adjust(skb, protoff, tcph, ct, ctinfo);
> spin_unlock_bh(&ct->lock);
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox