* Re: Silently dropped UDP packets on kernel 4.14
From: Michal Kubecek @ 2018-05-03 9:42 UTC (permalink / raw)
To: Florian Westphal
Cc: Kristian Evensen, Netfilter Development Mailing list,
Network Development
In-Reply-To: <20180503050345.iyasach2ogf25dt3@breakpoint.cc>
On Thu, May 03, 2018 at 07:03:45AM +0200, Florian Westphal wrote:
> Kristian Evensen <kristian.evensen@gmail.com> wrote:
> > I went for the early-insert approach and have patched
>
> I'm sorry for suggesting that.
>
> It doesn't work, because of NAT.
> NAT rewrites packet content and changes the reply tuple, but the tuples
> determine the hash insertion location.
>
> I don't know how to solve this problem.
It's an old problem which surfaces from time to time when some special
conditions make it more visible. When I was facing it in 2015, I found
this thread from as early as 2009:
https://www.spinics.net/lists/linux-net/msg16712.html
In our case, the customer was using IPVS in "one packet scheduling" mode
(it drops the conntrack entry after each packet) which increased the
probability of insert collisions significantly. Using NFQUEUE
We were lucky, though, as it turned out the only reason the customer
needed connection tracking was to make sure fragments of long UDP
datagrams were not sent to different real servers. For newer kernels
after commit 6aafeef03b9d ("netfilter: push reasm skb through instead of
original frag skbs"), this was no longer necessary, so they could
disable connection tracking for these packets.
For older kernels without this change, I tried several ideas, each of
which didn't work for some reason. We ended up with a rather hacky
workaround: not dropping the packet on collision (so that its conntrack
wasn't inserted into the table and was dropped once the packet was
sent). It worked fine for our customer but, like the early insert
approach, it wouldn't work with NAT.
One of the ideas I had was this:
- also keep unconfirmed conntracks in some data structure
- also check new packets against unconfirmed conntracks
- if it matches an unconfirmed conntrack, defer its processing
until that conntrack is either inserted or discarded
But as it would be rather complicated to implement without races and
without harming performance, I didn't want to actually try it until I
ran out of other ideas. With NAT coming into play, there don't seem
to be many other options.
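The idea above could be sketched in userspace C roughly as follows. This is only an illustration and not kernel code: `struct tuple`, `unconfirmed_add()`, `unconfirmed_resolve()` and `classify_packet()` are hypothetical names, the fixed-size array merely stands in for "some data structure", and the hard parts (locking, races, NAT rewriting the tuple) are ignored entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified flow tuple (IPv4 only). */
struct tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

enum verdict { PKT_NEW, PKT_DEFER };

#define UNCONFIRMED_MAX 16

/* Stand-in for "some data structure" holding unconfirmed conntracks. */
struct unconfirmed_table {
	struct tuple entries[UNCONFIRMED_MAX];
	bool used[UNCONFIRMED_MAX];
};

static bool tuple_equal(const struct tuple *a, const struct tuple *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->proto == b->proto;
}

/* Remember a conntrack that was created but not yet confirmed. */
static bool unconfirmed_add(struct unconfirmed_table *t, const struct tuple *tp)
{
	for (size_t i = 0; i < UNCONFIRMED_MAX; i++) {
		if (!t->used[i]) {
			t->entries[i] = *tp;
			t->used[i] = true;
			return true;
		}
	}
	return false; /* table full */
}

/* Called once the conntrack is either inserted or discarded. */
static void unconfirmed_resolve(struct unconfirmed_table *t,
				const struct tuple *tp)
{
	for (size_t i = 0; i < UNCONFIRMED_MAX; i++) {
		if (t->used[i] && tuple_equal(&t->entries[i], tp))
			t->used[i] = false;
	}
}

/* A packet matching an unconfirmed conntrack is deferred, not dropped. */
static enum verdict classify_packet(const struct unconfirmed_table *t,
				    const struct tuple *tp)
{
	for (size_t i = 0; i < UNCONFIRMED_MAX; i++) {
		if (t->used[i] && tuple_equal(&t->entries[i], tp))
			return PKT_DEFER;
	}
	return PKT_NEW;
}
```

A second packet of the same flow arriving while the first one is still unconfirmed would get PKT_DEFER here and be released after unconfirmed_resolve(); how to do that without races is exactly the open question.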
Michal Kubecek
* [PATCH net-next] net: core: rework skb_probe_transport_header()
From: Paolo Abeni @ 2018-05-03 9:35 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Eric Dumazet, Jason Wang
When the transport header is not available, skb_probe_transport_header()
resorts to fully dissecting the flow keys, even if it only needs the
transport offset. We can obtain the latter using a simpler flow dissector -
flow_keys_buf_dissector - and a smaller struct for key storage.
The above gives ~50% performance improvement in micro benchmarking around
skb_probe_transport_header(), mostly due to the smaller memset. A small but
measurable improvement is also seen in macro benchmarking - raw xmit
tput from a VM.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/linux/skbuff.h | 7 +++++--
include/net/flow_dissector.h | 5 +++++
net/core/flow_dissector.c | 1 +
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 908d66e55b14..63cb523d3519 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2350,11 +2350,14 @@ static inline void skb_pop_mac_header(struct sk_buff *skb)
static inline void skb_probe_transport_header(struct sk_buff *skb,
const int offset_hint)
{
- struct flow_keys keys;
+ struct flow_keys_basic keys;
if (skb_transport_header_was_set(skb))
return;
- else if (skb_flow_dissect_flow_keys(skb, &keys, 0))
+
+ memset(&keys, 0, sizeof(keys));
+ if (__skb_flow_dissect(skb, &flow_keys_buf_dissector, &keys,
+ 0, 0, 0, 0, 0))
skb_set_transport_header(skb, keys.control.thoff);
else
skb_set_transport_header(skb, offset_hint);
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index 9a074776f70b..e81dab6e9ac6 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -226,6 +226,11 @@ struct flow_dissector {
unsigned short int offset[FLOW_DISSECTOR_KEY_MAX];
};
+struct flow_keys_basic {
+ struct flow_dissector_key_control control;
+ struct flow_dissector_key_basic basic;
+};
+
struct flow_keys {
struct flow_dissector_key_control control;
#define FLOW_KEYS_HASH_START_FIELD basic
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d29f09bc5ff9..ac7b4de4a0f0 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1418,6 +1418,7 @@ struct flow_dissector flow_keys_dissector __read_mostly;
EXPORT_SYMBOL(flow_keys_dissector);
struct flow_dissector flow_keys_buf_dissector __read_mostly;
+EXPORT_SYMBOL(flow_keys_buf_dissector);
static int __init init_default_flow_dissectors(void)
{
--
2.14.3
* [PATCH] net/mlx5e: fix spelling mistake: "loobpack" -> "loopback"
From: Colin King @ 2018-05-03 9:12 UTC (permalink / raw)
To: Saeed Mahameed, Matan Barak, Leon Romanovsky, netdev, linux-rdma
Cc: kernel-janitors, David S . Miller, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
Trivial fix to spelling mistake in netdev_err error message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index 707976482c09..027f54ac1ca2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -290,7 +290,7 @@ static int mlx5e_test_loopback(struct mlx5e_priv *priv)
if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
netdev_err(priv->netdev,
- "\tCan't perform loobpack test while device is down\n");
+ "\tCan't perform loopback test while device is down\n");
return -ENODEV;
}
--
2.17.0
* [PATCH V2] net/netlink: optimize seq_puts and seq_printf in af_netlink.c
From: YU Bo @ 2018-05-03 9:09 UTC (permalink / raw)
To: davem, xiyou.wangcong, yuzibode, tsu.yubo; +Cc: netdev, kernel-janitors
Before the patch, `cat /proc/net/netlink` produces output like:
https://clbin.com/BojZv
After the patch:
https://clbin.com/lnu4L
The change makes the output of `cat /proc/net/netlink` easier to read.
However, checkpatch will give a warning:
WARNING: quoted string split across lines
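One plausible reading of why the Pid column goes from `%-6u` to `%-10u`: a 32-bit value printed with `%u` can take up to 10 digits, so a 6-character column overflows for large port IDs and pushes the following columns out of alignment. The userspace sketch below illustrates this with snprintf(); `format_portid()` is a hypothetical helper, not something from af_netlink.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a port ID left-aligned in a column of the given minimum width.
 * Returns the number of characters produced (excluding the NUL), which
 * exceeds `width` whenever the value has more digits than the column. */
static int format_portid(char *buf, size_t len, int width, uint32_t portid)
{
	return snprintf(buf, len, "%-*u", width, (unsigned int)portid);
}
```

With a width of 6 the result is 6 characters for a small port ID but 10 for the largest 32-bit value, whereas a width of 10 yields 10 characters in both cases, so the columns stay aligned.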
Signed-off-by: Bo YU <tsu.yubo@gmail.com>
---
Changes in v2:
Do not break the indentation of the code line
---
net/netlink/af_netlink.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 55342c4d5cec..2e2dd88fc79f 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2606,13 +2606,13 @@ static int netlink_seq_show(struct seq_file *seq, void *v)
{
if (v == SEQ_START_TOKEN) {
seq_puts(seq,
- "sk Eth Pid Groups "
- "Rmem Wmem Dump Locks Drops Inode\n");
+ "sk Eth Pid Groups "
+ "Rmem Wmem Dump Locks Drops Inode\n");
} else {
struct sock *s = v;
struct netlink_sock *nlk = nlk_sk(s);
- seq_printf(seq, "%pK %-3d %-6u %08x %-8d %-8d %d %-8d %-8d %-8lu\n",
+ seq_printf(seq, "%pK %-3d %-10u %08x %-8d %-8d %-5d %-8d %-8d %-8lu\n",
s,
s->sk_protocol,
nlk->portid,
* Re: Silently dropped UDP packets on kernel 4.14
From: Kristian Evensen @ 2018-05-03 9:06 UTC (permalink / raw)
To: Florian Westphal; +Cc: Netfilter Development Mailing list, Network Development
In-Reply-To: <20180503050345.iyasach2ogf25dt3@breakpoint.cc>
Hi Florian,
On Thu, May 3, 2018 at 7:03 AM, Florian Westphal <fw@strlen.de> wrote:
> I'm sorry for suggesting that.
>
> It doesn't work, because of NAT.
> NAT rewrites packet content and changes the reply tuple, but the tuples
> determine the hash insertion location.
>
> I don't know how to solve this problem.
No problem. This has anyway served as a good exercise for getting more
familiar with the conntrack/nat code in the kernel. I did some more
tests and I see that on my router (or routers actually), just
replacing the ct solves the issue. While not a perfect solution, the
situation is improved considerably. Do you think a patch where the ct
is replaced would be acceptable, or would upstream rather wait for a
"proper" fix to this problem? When replacing the ct, it is at least
possible to work around the problem in userspace, while without
replacing ct we are stuck with the original entry.
BR,
Kristian
* [PATCH ipsec-next] xfrm: use a dedicated slab cache for struct xfrm_state
From: Mathias Krause @ 2018-05-03 8:55 UTC (permalink / raw)
To: Steffen Klassert; +Cc: Mathias Krause, Herbert Xu, David S. Miller, netdev
struct xfrm_state is rather large (768 bytes here) and therefore wastes
quite a lot of memory as it falls into the kmalloc-1024 slab cache,
leaving 256 bytes of unused memory per XFRM state object -- a net waste
of 25%.
Using a dedicated slab cache for struct xfrm_state reduces the level of
internal fragmentation to a minimum.
On my configuration SLUB chooses to create a slab cache covering 4
pages holding 21 objects, resulting in an average memory waste of ~13
bytes per object -- a net waste of only 1.6%.
In my tests this led to memory savings of roughly 2.3MB for 10k XFRM
states.
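The numbers above can be rechecked with a few lines of arithmetic. This is a sketch under stated assumptions: a page size of 4096 bytes (the message only says "4 pages") and no per-object allocator metadata. The helper names are made up for illustration.

```c
#include <stddef.h>

/* Bytes left unused when one object of obj_size is served from a
 * fixed-size bucket of bucket_size bytes (the kmalloc-1024 case). */
static size_t bucket_waste(size_t bucket_size, size_t obj_size)
{
	return bucket_size - obj_size;
}

/* Average unused bytes per object when objs_per_slab objects of obj_size
 * are packed into one slab of slab_bytes bytes (the dedicated-cache case). */
static double slab_waste_per_object(size_t slab_bytes, size_t obj_size,
				    size_t objs_per_slab)
{
	return (double)(slab_bytes - obj_size * objs_per_slab) /
	       (double)objs_per_slab;
}

/* Total bytes saved for nr_objs objects when the per-object footprint drops
 * from bucket_size to slab_bytes / objs_per_slab. */
static double total_savings(size_t bucket_size, size_t slab_bytes,
			    size_t objs_per_slab, size_t nr_objs)
{
	double footprint = (double)slab_bytes / (double)objs_per_slab;

	return ((double)bucket_size - footprint) * (double)nr_objs;
}
```

For 768-byte objects, bucket_waste(1024, 768) is 256 bytes (25% of the bucket). With 4 * 4096 = 16384 bytes per slab and 21 objects, 256 bytes are left over per slab, about 12.2 bytes per object or roughly 1.6% of the slab. For 10,000 objects total_savings() comes to about 2.44 million bytes, i.e. roughly 2.3 MiB.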
Signed-off-by: Mathias Krause <minipli@googlemail.com>
---
net/xfrm/xfrm_state.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f9d2f2233f09..73db0ea8692a 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -42,6 +42,7 @@
static unsigned int xfrm_state_hashmax __read_mostly = 1 * 1024 * 1024;
static __read_mostly seqcount_t xfrm_state_hash_generation = SEQCNT_ZERO(xfrm_state_hash_generation);
+static struct kmem_cache *xfrm_state_cache __ro_after_init;
static DECLARE_WORK(xfrm_state_gc_work, xfrm_state_gc_task);
static HLIST_HEAD(xfrm_state_gc_list);
@@ -451,7 +452,7 @@ static void xfrm_state_gc_destroy(struct xfrm_state *x)
}
xfrm_dev_state_free(x);
security_xfrm_state_free(x);
- kfree(x);
+ kmem_cache_free(xfrm_state_cache, x);
}
static void xfrm_state_gc_task(struct work_struct *work)
@@ -563,7 +564,7 @@ struct xfrm_state *xfrm_state_alloc(struct net *net)
{
struct xfrm_state *x;
- x = kzalloc(sizeof(struct xfrm_state), GFP_ATOMIC);
+ x = kmem_cache_alloc(xfrm_state_cache, GFP_ATOMIC | __GFP_ZERO);
if (x) {
write_pnet(&x->xs_net, net);
@@ -2307,6 +2308,10 @@ int __net_init xfrm_state_init(struct net *net)
{
unsigned int sz;
+ if (net_eq(net, &init_net))
+ xfrm_state_cache = KMEM_CACHE(xfrm_state,
+ SLAB_HWCACHE_ALIGN | SLAB_PANIC);
+
INIT_LIST_HEAD(&net->xfrm.state_all);
sz = sizeof(struct hlist_head) * 8;
--
1.7.10.4
* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Christoph Hellwig @ 2018-05-03 8:51 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Finn Thain, David S. Miller, linux-m68k, netdev,
Linux Kernel Mailing List, Christoph Hellwig
In-Reply-To: <CAMuHMdU1XBqt7hwEW6JTas64ZNGCGCMr5HMZwuLo0O-ZBCOWyA@mail.gmail.com>
On Thu, May 03, 2018 at 10:46:56AM +0200, Geert Uytterhoeven wrote:
> Perhaps you can add a new helper (platform_device_register_simple_dma()?)
> that takes the DMA mask, too?
> With people setting the mask to kill the WARNING splat, this may become
> more common.
>
> struct platform_device_info already has a dma_mask field, but
> platform_device_register_resndata() explicitly sets it to zero.
Yes, that would be useful. The other assumption could be that
platform devices always allow an all-0xff dma mask.
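For readers unfamiliar with the masks being discussed, here is a small userspace sketch. The DMA_BIT_MASK() expression is reproduced from the kernel's include/linux/dma-mapping.h; `dma_addr_reachable()` is a hypothetical helper added only to show what a mask means and is not a kernel API. An "all-0xff" mask corresponds to DMA_BIT_MASK(64).

```c
#include <stdint.h>

/* Same expression as the kernel's DMA_BIT_MASK(), reproduced for userspace. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: non-zero if every set bit of addr is covered by
 * mask, i.e. a device limited to `mask` could address `addr`. */
static int dma_addr_reachable(uint64_t mask, uint64_t addr)
{
	return (addr & ~mask) == 0;
}
```

DMA_BIT_MASK(32) is 0xffffffff, so an address at or above 4 GiB is not reachable under it, while DMA_BIT_MASK(64) accepts every address.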
* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Geert Uytterhoeven @ 2018-05-03 8:46 UTC (permalink / raw)
To: Finn Thain
Cc: David S. Miller, linux-m68k, netdev, Linux Kernel Mailing List,
Christoph Hellwig
In-Reply-To: <alpine.LNX.2.21.1805031801310.8@nippy.intranet>
Hi Finn,
CC Christoph
On Thu, May 3, 2018 at 10:38 AM, Finn Thain <fthain@telegraphics.com.au> wrote:
> On Thu, 3 May 2018, Geert Uytterhoeven wrote:
>> > --- a/drivers/net/ethernet/apple/macmace.c
>> > +++ b/drivers/net/ethernet/apple/macmace.c
>> > @@ -203,6 +203,10 @@ static int mace_probe(struct platform_device *pdev)
>> > unsigned char checksum = 0;
>> > int err;
>> >
>> > + err = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
>> > + if (err)
>> > + return err;
>> > +
>> > dev = alloc_etherdev(PRIV_BYTES);
>> > if (!dev)
>> > return -ENOMEM;
>>
>> Shouldn't this be handled in the platform code that instantiates the
>> device, i.e. in arch/m68k/mac/config.c:mac_platform_init()?
>
> I wondered about that too. The downside is that I'd have to convert
> platform_device_register_simple() into platform_device_register() and add
> all of the boilerplate that goes with that, for little gain.
>
>> Cfr. commit f61e64310b75733d ("m68k: set dma and coherent masks for
>> platform FEC ethernets").
>
> Yes, I looked at that patch before I sent this one. It makes sense to set
> the mask when defining the device since some devices tend to have inherent
> limitations (but that's not really applicable here).
>
> Moreover, it turns out that a number of platform drivers already call
> dma_set_mask_and_coherent() or dma_coerce_mask_and_coherent() or similar.
>
> I figured that platform drivers aren't expected to be particularly
> portable. Well, I'd expect macmace and macsonic to be portable to NuBus
> PowerMacs, but AFAIK the correct mask would remain DMA_BIT_MASK(32).
>
> So that's how I ended up with this patch. But if you are not persuaded by
> my reasoning then just say the word and I'll take another approach.
Perhaps you can add a new helper (platform_device_register_simple_dma()?)
that takes the DMA mask, too?
With people setting the mask to kill the WARNING splat, this may become
more common.
struct platform_device_info already has a dma_mask field, but
platform_device_register_resndata() explicitly sets it to zero.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Finn Thain @ 2018-05-03 8:38 UTC (permalink / raw)
To: Geert Uytterhoeven; +Cc: David S. Miller, linux-m68k, netdev, linux-kernel
In-Reply-To: <CAMuHMdUAxmVZLekekvVrnMbbL18oyY86sF9QX087idSqcKMiPQ@mail.gmail.com>
On Thu, 3 May 2018, Geert Uytterhoeven wrote:
> > --- a/drivers/net/ethernet/apple/macmace.c
> > +++ b/drivers/net/ethernet/apple/macmace.c
> > @@ -203,6 +203,10 @@ static int mace_probe(struct platform_device *pdev)
> > unsigned char checksum = 0;
> > int err;
> >
> > + err = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> > + if (err)
> > + return err;
> > +
> > dev = alloc_etherdev(PRIV_BYTES);
> > if (!dev)
> > return -ENOMEM;
>
> Shouldn't this be handled in the platform code that instantiates the
> device, i.e. in arch/m68k/mac/config.c:mac_platform_init()?
>
I wondered about that too. The downside is that I'd have to convert
platform_device_register_simple() into platform_device_register() and add
all of the boilerplate that goes with that, for little gain.
> Cfr. commit f61e64310b75733d ("m68k: set dma and coherent masks for
> platform FEC ethernets").
>
Yes, I looked at that patch before I sent this one. It makes sense to set
the mask when defining the device since some devices tend to have inherent
limitations (but that's not really applicable here).
Moreover, it turns out that a number of platform drivers already call
dma_set_mask_and_coherent() or dma_coerce_mask_and_coherent() or similar.
I figured that platform drivers aren't expected to be particularly
portable. Well, I'd expect macmace and macsonic to be portable to NuBus
PowerMacs, but AFAIK the correct mask would remain DMA_BIT_MASK(32).
So that's how I ended up with this patch. But if you are not persuaded by
my reasoning then just say the word and I'll take another approach.
--
> Gr{oetje,eeting}s,
>
> Geert
>
* Re: [PATCH net-next v7 3/7] sch_cake: Add optional ACK filter
From: kbuild test robot @ 2018-05-03 8:26 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: kbuild-all, netdev, cake
In-Reply-To: <152527387324.14936.10258520847821060114.stgit@alrua-kau>
Hi Toke,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on net-next/master]
url: https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/sched-Add-Common-Applications-Kept-Enhanced-cake-qdisc/20180503-073002
coccinelle warnings: (new ones prefixed by >>)
>> net/sched/sch_cake.c:1047:6-13: ERROR: PTR_ERR applied after initialization to constant on line 822
vim +1047 net/sched/sch_cake.c
812
813 static struct sk_buff *cake_ack_filter(struct cake_sched_data *q,
814 struct cake_flow *flow)
815 {
816 bool thisconn_redundant_seen = false, thisconn_seen_last = false;
817 bool aggressive = q->ack_filter == CAKE_ACK_AGGRESSIVE;
818 bool otherconn_ack_seen = false;
819 struct sk_buff *skb_check, *skb_check_prev;
820 struct sk_buff *otherconn_checked_to = NULL;
821 struct sk_buff *thisconn_checked_to = NULL;
822 struct sk_buff *thisconn_ack = NULL;
823 const struct ipv6hdr *ipv6h, *ipv6h_check;
824 const struct tcphdr *tcph, *tcph_check;
825 const struct iphdr *iph, *iph_check;
826 const struct sk_buff *skb;
827 struct ipv6hdr _iph, _iph_check;
828 struct tcphdr _tcph_check;
829 unsigned char _tcph[64]; /* need to hold maximum hdr size */
830 int seglen;
831
832 /* no other possible ACKs to filter */
833 if (flow->head == flow->tail)
834 return NULL;
835
836 skb = flow->tail;
837 tcph = cake_get_tcphdr(skb, _tcph, sizeof(_tcph));
838 iph = cake_get_iphdr(skb, &_iph);
839 if (!tcph)
840 return NULL;
841
842 /* the 'triggering' packet need only have the ACK flag set.
843 * also check that SYN is not set, as there won't be any previous ACKs.
844 */
845 if ((tcp_flag_word(tcph) &
846 (TCP_FLAG_ACK | TCP_FLAG_SYN)) != TCP_FLAG_ACK)
847 return NULL;
848
849 /* the 'triggering' ACK is at the end of the queue,
850 * we have already returned if it is the only packet in the flow.
851 * stop before last packet in queue, don't compare trigger ACK to itself
852 * start where we finished last time if recorded in ->ackcheck
853 * otherwise start from the head of the flow queue.
854 */
855 skb_check_prev = flow->ackcheck;
856 skb_check = flow->ackcheck ?: flow->head;
857
858 while (skb_check->next) {
859 bool pure_ack, thisconn;
860
861 /* don't increment if at head of flow queue (_prev == NULL) */
862 if (skb_check_prev) {
863 skb_check_prev = skb_check;
864 skb_check = skb_check->next;
865 if (!skb_check->next)
866 break;
867 } else {
868 skb_check_prev = ERR_PTR(-1);
869 }
870
871 iph_check = cake_get_iphdr(skb_check, &_iph_check);
872 tcph_check = cake_get_tcphdr(skb_check, &_tcph_check,
873 sizeof(_tcph_check));
874
875 if (!tcph_check || iph->version != iph_check->version)
876 continue;
877
878 if (iph->version == 4) {
879 seglen = ntohs(iph_check->tot_len) -
880 (4 * iph_check->ihl);
881
882 thisconn = (iph_check->saddr == iph->saddr &&
883 iph_check->daddr == iph->daddr);
884 } else if (iph->version == 6) {
885 ipv6h = (struct ipv6hdr *)iph;
886 ipv6h_check = (struct ipv6hdr *)iph_check;
887 seglen = ntohs(ipv6h_check->payload_len);
888
889 thisconn = (!ipv6_addr_cmp(&ipv6h_check->saddr,
890 &ipv6h->saddr) &&
891 !ipv6_addr_cmp(&ipv6h_check->daddr,
892 &ipv6h->daddr));
893 } else {
894 WARN_ON(1); /* shouldn't happen */
895 continue;
896 }
897
898 /* stricter criteria apply to ACKs that we may filter
899 * 3 reserved flags must be unset to avoid future breakage
900 * ECE/CWR/NS can be safely ignored
901 * ACK must be set
902 * All other flags URG/PSH/RST/SYN/FIN must be unset
903 * 0x0FFF0000 = all TCP flags (confirm ACK=1, others zero)
904 * 0x01C00000 = NS/CWR/ECE (safe to ignore)
905 * 0x0E3F0000 = 0x0FFF0000 & ~0x01C00000
906 * must be 'pure' ACK, contain zero bytes of segment data
907 * options are ignored
908 */
909 if ((tcp_flag_word(tcph_check) &
910 (TCP_FLAG_ACK | TCP_FLAG_SYN)) != TCP_FLAG_ACK)
911 continue;
912
913 else if (((tcp_flag_word(tcph_check) &
914 cpu_to_be32(0x0E3F0000)) != TCP_FLAG_ACK) ||
915 ((seglen - __tcp_hdrlen(tcph_check)) != 0))
916 pure_ack = false;
917
918 else
919 pure_ack = true;
920
921 /* if we find an ACK belonging to a different connection
922 * continue checking for other ACKs this round however
923 * restart checking from the other connection next time.
924 */
925 if (thisconn && (tcph_check->source != tcph->source ||
926 tcph_check->dest != tcph->dest))
927 thisconn = false;
928
929 /* new ack sequence must be greater
930 */
931 if (thisconn &&
932 ((int32_t)(ntohl(tcph_check->ack_seq) -
933 ntohl(tcph->ack_seq)) > 0))
934 continue;
935
936 /* DupACKs with an equal sequence number shouldn't be filtered,
937 * but we can filter if the triggering packet is a SACK
938 */
939 if (thisconn &&
940 (ntohl(tcph_check->ack_seq) == ntohl(tcph->ack_seq))) {
941 /* inspired by tcp_parse_options in tcp_input.c */
942 bool sack = false;
943 int length = __tcp_hdrlen(tcph) - sizeof(struct tcphdr);
944 const u8 *ptr = (const u8 *)(tcph + 1);
945
946 while (length > 0) {
947 int opcode = *ptr++;
948 int opsize;
949
950 if (opcode == TCPOPT_EOL)
951 break;
952 if (opcode == TCPOPT_NOP) {
953 length--;
954 continue;
955 }
956 opsize = *ptr++;
957 if (opsize < 2 || opsize > length)
958 break;
959 if (opcode == TCPOPT_SACK) {
960 sack = true;
961 break;
962 }
963 ptr += opsize - 2;
964 length -= opsize;
965 }
966 if (!sack)
967 continue;
968 }
969
970 /* somewhat complicated control flow for 'conservative'
971 * ACK filtering that aims to be more polite to slow-start and
972 * in the presence of packet loss.
973 * does not filter if there is one 'redundant' ACK in the queue.
974 * 'data' ACKs won't be filtered but do count as redundant ACKs.
975 */
976 if (thisconn) {
977 thisconn_seen_last = true;
978 /* if aggressive and this is a data ack we can skip
979 * checking it next time.
980 */
981 thisconn_checked_to = (aggressive && !pure_ack) ?
982 skb_check : skb_check_prev;
983 /* the first pure ack for this connection.
984 * record where it is, but only break if aggressive
985 * or already seen data ack from the same connection
986 */
987 if (pure_ack && !thisconn_ack) {
988 thisconn_ack = skb_check_prev;
989 if (aggressive || thisconn_redundant_seen)
990 break;
991 /* data ack or subsequent pure ack */
992 } else {
993 thisconn_redundant_seen = true;
994 /* this is the second ack for this connection
995 * break to filter the first pure ack
996 */
997 if (thisconn_ack)
998 break;
999 }
1000 /* track packets from non-matching tcp connections that will
1001 * need evaluation on the next run.
1002 * if there are packets from both the matching connection and
1003 * others that require checking next run, track which was updated
1004 * last and return the older of the two to ensure full coverage.
1005 * if a non-matching pure ack has been seen, cannot skip any
1006 * further on the next run so don't update.
1007 */
1008 } else if (!otherconn_ack_seen) {
1009 thisconn_seen_last = false;
1010 if (pure_ack) {
1011 otherconn_ack_seen = true;
1012 /* if aggressive we don't care about old data,
1013 * start from the pure ack.
1014 * otherwise if there is a previous data ack,
1015 * start checking from it next time.
1016 */
1017 if (aggressive || !otherconn_checked_to)
1018 otherconn_checked_to = skb_check_prev;
1019 } else {
1020 otherconn_checked_to = aggressive ?
1021 skb_check : skb_check_prev;
1022 }
1023 }
1024 }
1025
1026 /* skb_check is reused at this point
1027 * it is the pure ACK to be filtered (if any)
1028 */
1029 skb_check = NULL;
1030
1031 /* next time start checking from the older/nearest to head of unfiltered
1032 * but important tcp packets from this connection and other connections.
1033 * if none seen, start after the last packet evaluated in the loop.
1034 */
1035 if (thisconn_checked_to && otherconn_checked_to)
1036 flow->ackcheck = thisconn_seen_last ?
1037 otherconn_checked_to : thisconn_checked_to;
1038 else if (thisconn_checked_to)
1039 flow->ackcheck = thisconn_checked_to;
1040 else if (otherconn_checked_to)
1041 flow->ackcheck = otherconn_checked_to;
1042 else
1043 flow->ackcheck = skb_check_prev;
1044
1045 /* if filtering, remove the pure ACK from the flow queue */
1046 if (thisconn_ack && (aggressive || thisconn_redundant_seen)) {
> 1047 if (PTR_ERR(thisconn_ack) == -1) {
1048 skb_check = flow->head;
1049 flow->head = flow->head->next;
1050 } else {
1051 skb_check = thisconn_ack->next;
1052 thisconn_ack->next = thisconn_ack->next->next;
1053 }
1054 }
1055
1056 /* we just filtered that ack, fix up the list */
1057 if (flow->ackcheck == skb_check)
1058 flow->ackcheck = thisconn_ack;
1059 /* check the entire flow queue next time */
1060 if (PTR_ERR(flow->ackcheck) == -1)
1061 flow->ackcheck = NULL;
1062
1063 return skb_check;
1064 }
1065
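As far as can be told from the listing, the function uses ERR_PTR(-1) as a sentinel meaning "the predecessor is the head of the flow queue", distinct from NULL ("nothing recorded") and from a real skb pointer. The check at line 1047 is presumably flagged because thisconn_ack is initialized to the constant NULL on line 822, which is not an error pointer. The userspace sketch below shows the three states; ERR_PTR() and PTR_ERR() are stand-ins mirroring the kernel helpers, and `classify_prev()` and its enum are hypothetical names.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR() helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

enum prev_state { PREV_NONE, PREV_IS_HEAD, PREV_IS_NODE };

/* Distinguish "nothing recorded" (NULL), the head-of-queue sentinel
 * (ERR_PTR(-1)) and an ordinary predecessor pointer. */
static enum prev_state classify_prev(const void *prev)
{
	if (!prev)
		return PREV_NONE;
	if (PTR_ERR(prev) == -1)
		return PREV_IS_HEAD;
	return PREV_IS_NODE;
}
```

Note that PTR_ERR(NULL) is 0, never -1, so in a sketch like this the NULL and sentinel cases cannot be confused; whether the warning is a false positive for the actual patch is for its author to judge.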
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
* Re: r8169 doesn't work after boot until `transmit queue 0 timed out`
From: ojab // @ 2018-05-03 8:09 UTC (permalink / raw)
To: nic_swsd, netdev
In-Reply-To: <CAKzrAgTEy5N+To7E23JeL7Vk7ci8qctNVA3=qzaX7YTd3gsKYw@mail.gmail.com>
On Fri, Apr 27, 2018 at 8:22 PM, ojab // <ojab@ojab.ru> wrote:
> Oh hai!
>
> I've created bugzilla ticket about this, but I'm not sure if anyone
> reads it, so duplicating here.
>
> I have new motherboard (ASUS A320-K) with 10ec:8168 realtek network
> card installed and it doesn't work (i. e. `tcpdump` shows outgoing
> packets but no packets are actually transmitted and `tcpdump` doesn't
> show incoming packets while they are transmitted from the other side)
> then after 200-300 seconds I see [full stacktrace [1] and lspci [2]
> output are attached to bugzilla ticket]
>
> [ 256.996145] ------------[ cut here ]------------
> [ 256.997574] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> [ 256.998992] WARNING: CPU: 6 PID: 0 at dev_watchdog+0x1f2/0x200
> …
>
> [ 257.012243] RIP: 0010:dev_watchdog+0x1f2/0x200
> …
> [ 257.032044] <IRQ>
> [ 257.033829] ? pfifo_fast_init+0x150/0x150
> [ 257.035618] call_timer_fn+0x2b/0x120
> [ 257.037400] run_timer_softirq+0x2f4/0x410
> [ 257.039170] ? pfifo_fast_init+0x150/0x150
> [ 257.040931] ? timerqueue_add+0x52/0x80
> [ 257.042694] ? __hrtimer_run_queues+0x161/0x2e0
> [ 257.044462] __do_softirq+0x111/0x32c
> [ 257.046223] irq_exit+0x85/0x90
> [ 257.047966] smp_apic_timer_interrupt+0x6c/0x120
> [ 257.049720] apic_timer_interrupt+0xf/0x20
> [ 257.051475] </IRQ>
>
> and everything starts working normally. How can I make it work right after boot?
>
> The issue is reproducible in linux-4.16.5 & linux-4.17-rc2 with
> rtl_nic fw from linux-firmware git master.
>
> [1] https://bugzilla.kernel.org/attachment.cgi?id=275627
> [2] https://bugzilla.kernel.org/attachment.cgi?id=275629
>
ping?
//wbr ojab
* Re: [PATCH net-next 2/2] selftests: forwarding: Allow running specific tests
From: Jiri Pirko @ 2018-05-03 8:03 UTC (permalink / raw)
To: Ido Schimmel; +Cc: netdev, davem, petrm, dsahern, mlxsw
In-Reply-To: <20180503075133.17450-3-idosch@mellanox.com>
Thu, May 03, 2018 at 09:51:33AM CEST, idosch@mellanox.com wrote:
>Similar to commit a511858c7536 ("selftests: fib_tests: Allow user to run
>a specific test"), allow user to run only a subset of the tests using
>the TESTS environment variable.
>
>This is useful when not all the tests can pass on a given system.
>
>Example:
># export TESTS="ping_ipv4 ping_ipv6"
># ./bridge_vlan_aware.sh
>TEST: ping [PASS]
>TEST: ping6 [PASS]
>
>Signed-off-by: Petr Machata <petrm@mellanox.com>
>Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Looks fine:
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
* [PATCH net-next 2/2] selftests: forwarding: Allow running specific tests
From: Ido Schimmel @ 2018-05-03 7:51 UTC (permalink / raw)
To: netdev; +Cc: davem, petrm, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180503075133.17450-1-idosch@mellanox.com>
Similar to commit a511858c7536 ("selftests: fib_tests: Allow user to run
a specific test"), allow user to run only a subset of the tests using
the TESTS environment variable.
This is useful when not all the tests can pass on a given system.
Example:
# export TESTS="ping_ipv4 ping_ipv6"
# ./bridge_vlan_aware.sh
TEST: ping [PASS]
TEST: ping6 [PASS]
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
.../selftests/net/forwarding/bridge_vlan_aware.sh | 26 +++++++++++++---
.../net/forwarding/bridge_vlan_unaware.sh | 26 +++++++++++++---
tools/testing/selftests/net/forwarding/lib.sh | 9 ++++++
.../testing/selftests/net/forwarding/mirror_gre.sh | 36 +++++++++++++++++-----
.../selftests/net/forwarding/mirror_gre_bound.sh | 23 +++++++++++---
.../selftests/net/forwarding/mirror_gre_changes.sh | 29 ++++++++++++++---
.../selftests/net/forwarding/mirror_gre_flower.sh | 23 +++++++++++---
.../selftests/net/forwarding/mirror_gre_neigh.sh | 22 ++++++++++---
.../selftests/net/forwarding/mirror_gre_nh.sh | 8 +++--
tools/testing/selftests/net/forwarding/router.sh | 14 +++++++--
.../selftests/net/forwarding/router_multipath.sh | 15 +++++++--
.../testing/selftests/net/forwarding/tc_actions.sh | 25 ++++++++++-----
.../testing/selftests/net/forwarding/tc_chains.sh | 7 ++---
.../testing/selftests/net/forwarding/tc_flower.sh | 14 +++------
.../selftests/net/forwarding/tc_shblocks.sh | 5 +--
15 files changed, 219 insertions(+), 63 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh b/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh
index 75d922438bc9..d8313d0438b7 100755
--- a/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="ping_ipv4 ping_ipv6 learning flooding"
NUM_NETIFS=4
CHECK_TC="yes"
source lib.sh
@@ -75,14 +76,31 @@ cleanup()
vrf_cleanup
}
+ping_ipv4()
+{
+ ping_test $h1 192.0.2.2
+}
+
+ping_ipv6()
+{
+ ping6_test $h1 2001:db8:1::2
+}
+
+learning()
+{
+ learning_test "br0" $swp1 $h1 $h2
+}
+
+flooding()
+{
+ flood_test $swp2 $h1 $h2
+}
+
trap cleanup EXIT
setup_prepare
setup_wait
-ping_test $h1 192.0.2.2
-ping6_test $h1 2001:db8:1::2
-learning_test "br0" $swp1 $h1 $h2
-flood_test $swp2 $h1 $h2
+tests_run
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh b/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh
index 1cddf06f691d..c15c6c85c984 100755
--- a/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="ping_ipv4 ping_ipv6 learning flooding"
NUM_NETIFS=4
source lib.sh
@@ -73,14 +74,31 @@ cleanup()
vrf_cleanup
}
+ping_ipv4()
+{
+ ping_test $h1 192.0.2.2
+}
+
+ping_ipv6()
+{
+ ping6_test $h1 2001:db8:1::2
+}
+
+learning()
+{
+ learning_test "br0" $swp1 $h1 $h2
+}
+
+flooding()
+{
+ flood_test $swp2 $h1 $h2
+}
+
trap cleanup EXIT
setup_prepare
setup_wait
-ping_test $h1 192.0.2.2
-ping6_test $h1 2001:db8:1::2
-learning_test "br0" $swp1 $h1 $h2
-flood_test $swp2 $h1 $h2
+tests_run
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index a066ca536ac4..061c87bbf77c 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -477,6 +477,15 @@ matchall_sink_create()
action drop
}
+tests_run()
+{
+ local current_test
+
+ for current_test in ${TESTS:-$ALL_TESTS}; do
+ $current_test
+ done
+}
+
##############################################################################
# Tests
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre.sh b/tools/testing/selftests/net/forwarding/mirror_gre.sh
index a8abc736f67c..c6786d1b2b96 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre.sh
@@ -10,6 +10,14 @@
# traffic. Test that the payload is what is expected (ICMP ping request or
# reply, depending on test).
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+ test_gretap_mac
+ test_ip6gretap_mac
+ test_two_spans
+"
+
NUM_NETIFS=6
source lib.sh
source mirror_lib.sh
@@ -100,22 +108,36 @@ test_two_spans()
log_test "two simultaneously configured mirrors ($tcflags)"
}
-test_all()
+test_gretap()
{
- slow_path_trap_install $swp1 ingress
- slow_path_trap_install $swp1 egress
-
full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap"
- full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap"
full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap"
+}
+
+test_ip6gretap()
+{
+ full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap"
full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap"
+}
+test_gretap_mac()
+{
test_span_gre_mac gt4 ingress ip "mirror to gretap"
- test_span_gre_mac gt6 ingress ipv6 "mirror to ip6gretap"
test_span_gre_mac gt4 egress ip "mirror to gretap"
+}
+
+test_ip6gretap_mac()
+{
+ test_span_gre_mac gt6 ingress ipv6 "mirror to ip6gretap"
test_span_gre_mac gt6 egress ipv6 "mirror to ip6gretap"
+}
- test_two_spans
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
slow_path_trap_uninstall $swp1 egress
slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh b/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh
index 3708ac0f400a..360ca133bead 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh
@@ -42,6 +42,11 @@
# underlay manner, i.e. with a bound dummy device that marks underlay VRF where
# the encapsulated packed should be routed.
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+"
+
NUM_NETIFS=6
source lib.sh
source mirror_lib.sh
@@ -178,6 +183,18 @@ cleanup()
vrf_cleanup
}
+test_gretap()
+{
+ full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap w/ UL"
+ full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap w/ UL"
+}
+
+test_ip6gretap()
+{
+ full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap w/ UL"
+ full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap w/ UL"
+}
+
test_all()
{
RET=0
@@ -185,11 +202,7 @@ test_all()
slow_path_trap_install $swp1 ingress
slow_path_trap_install $swp1 egress
- full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap w/ UL"
- full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap w/ UL"
-
- full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap w/ UL"
- full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap w/ UL"
+ tests_run
slow_path_trap_uninstall $swp1 egress
slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh b/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
index 0ed288ac76d2..fdb612f69613 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
@@ -7,6 +7,13 @@
# Test how mirrors to gretap and ip6gretap react to changes to relevant
# configuration.
+ALL_TESTS="
+ test_ttl
+ test_tun_up
+ test_egress_up
+ test_remote_ip
+"
+
NUM_NETIFS=6
source lib.sh
source mirror_lib.sh
@@ -155,22 +162,36 @@ test_span_gre_remote_ip()
log_test "$what: remote address change ($tcflags)"
}
-test_all()
+test_ttl()
{
- slow_path_trap_install $swp1 ingress
- slow_path_trap_install $swp1 egress
-
test_span_gre_ttl gt4 gretap ip "mirror to gretap"
test_span_gre_ttl gt6 ip6gretap ipv6 "mirror to ip6gretap"
+}
+test_tun_up()
+{
test_span_gre_tun_up gt4 "mirror to gretap"
test_span_gre_tun_up gt6 "mirror to ip6gretap"
+}
+test_egress_up()
+{
test_span_gre_egress_up gt4 192.0.2.130 "mirror to gretap"
test_span_gre_egress_up gt6 2001:db8:2::2 "mirror to ip6gretap"
+}
+test_remote_ip()
+{
test_span_gre_remote_ip gt4 gretap 192.0.2.130 192.0.2.132 "mirror to gretap"
test_span_gre_remote_ip gt6 ip6gretap 2001:db8:2::2 2001:db8:2::4 "mirror to ip6gretap"
+}
+
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
slow_path_trap_uninstall $swp1 egress
slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh b/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh
index 178a42d771aa..2e54407d8954 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh
@@ -10,6 +10,11 @@
# this address, mirroring takes place, whereas when pinging the other one,
# there's no mirroring.
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+"
+
NUM_NETIFS=6
source lib.sh
source mirror_lib.sh
@@ -81,6 +86,18 @@ full_test_span_gre_dir_acl()
log_test "$direction $what ($tcflags)"
}
+test_gretap()
+{
+ full_test_span_gre_dir_acl gt4 ingress 8 0 192.0.2.4 "ACL mirror to gretap"
+ full_test_span_gre_dir_acl gt4 egress 0 8 192.0.2.3 "ACL mirror to gretap"
+}
+
+test_ip6gretap()
+{
+ full_test_span_gre_dir_acl gt6 ingress 8 0 192.0.2.4 "ACL mirror to ip6gretap"
+ full_test_span_gre_dir_acl gt6 egress 0 8 192.0.2.3 "ACL mirror to ip6gretap"
+}
+
test_all()
{
RET=0
@@ -88,11 +105,7 @@ test_all()
slow_path_trap_install $swp1 ingress
slow_path_trap_install $swp1 egress
- full_test_span_gre_dir_acl gt4 ingress 8 0 192.0.2.4 "ACL mirror to gretap"
- full_test_span_gre_dir_acl gt6 ingress 8 0 192.0.2.4 "ACL mirror to ip6gretap"
-
- full_test_span_gre_dir_acl gt4 egress 0 8 192.0.2.3 "ACL mirror to gretap"
- full_test_span_gre_dir_acl gt6 egress 0 8 192.0.2.3 "ACL mirror to ip6gretap"
+ tests_run
slow_path_trap_uninstall $swp1 egress
slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh b/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh
index 1ca29ba4f338..fc0508e40fca 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh
@@ -9,6 +9,11 @@
# is set up. Later on, the neighbor is deleted and it is expected to be
# reinitialized using the usual ARP process, and the mirroring offload updated.
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+"
+
NUM_NETIFS=6
source lib.sh
source mirror_lib.sh
@@ -69,15 +74,24 @@ test_span_gre_neigh()
log_test "$direction $what: neighbor change ($tcflags)"
}
-test_all()
+test_gretap()
{
- slow_path_trap_install $swp1 ingress
- slow_path_trap_install $swp1 egress
-
test_span_gre_neigh 192.0.2.130 gt4 ingress "mirror to gretap"
test_span_gre_neigh 192.0.2.130 gt4 egress "mirror to gretap"
+}
+
+test_ip6gretap()
+{
test_span_gre_neigh 2001:db8:2::2 gt6 ingress "mirror to ip6gretap"
test_span_gre_neigh 2001:db8:2::2 gt6 egress "mirror to ip6gretap"
+}
+
+test_all()
+{
+ slow_path_trap_install $swp1 ingress
+ slow_path_trap_install $swp1 egress
+
+ tests_run
slow_path_trap_uninstall $swp1 egress
slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh b/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
index 9ac70978541f..a0d1ad46a2bc 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
@@ -7,6 +7,11 @@
# Test that gretap and ip6gretap mirroring works when the other tunnel endpoint
# is reachable through a next-hop route (as opposed to directly-attached route).
+ALL_TESTS="
+ test_gretap
+ test_ip6gretap
+"
+
NUM_NETIFS=6
source lib.sh
source mirror_lib.sh
@@ -92,8 +97,7 @@ test_all()
slow_path_trap_install $swp1 ingress
slow_path_trap_install $swp1 egress
- test_gretap
- test_ip6gretap
+ tests_run
slow_path_trap_uninstall $swp1 egress
slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/router.sh b/tools/testing/selftests/net/forwarding/router.sh
index cc6a14abfa87..a75cb51cc5bd 100755
--- a/tools/testing/selftests/net/forwarding/router.sh
+++ b/tools/testing/selftests/net/forwarding/router.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="ping_ipv4 ping_ipv6"
NUM_NETIFS=4
source lib.sh
@@ -114,12 +115,21 @@ cleanup()
vrf_cleanup
}
+ping_ipv4()
+{
+ ping_test $h1 198.51.100.2
+}
+
+ping_ipv6()
+{
+ ping6_test $h1 2001:db8:2::2
+}
+
trap cleanup EXIT
setup_prepare
setup_wait
-ping_test $h1 198.51.100.2
-ping6_test $h1 2001:db8:2::2
+tests_run
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/router_multipath.sh b/tools/testing/selftests/net/forwarding/router_multipath.sh
index 2bd3d41354d0..6c4376289695 100755
--- a/tools/testing/selftests/net/forwarding/router_multipath.sh
+++ b/tools/testing/selftests/net/forwarding/router_multipath.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="ping_ipv4 ping_ipv6 multipath_test"
NUM_NETIFS=8
source lib.sh
@@ -364,13 +365,21 @@ cleanup()
vrf_cleanup
}
+ping_ipv4()
+{
+ ping_test $h1 198.51.100.2
+}
+
+ping_ipv6()
+{
+ ping6_test $h1 2001:db8:2::2
+}
+
trap cleanup EXIT
setup_prepare
setup_wait
-ping_test $h1 198.51.100.2
-ping6_test $h1 2001:db8:2::2
-multipath_test
+tests_run
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index 3a6385ebd5d0..813d02d1939d 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -1,6 +1,8 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="gact_drop_and_ok_test mirred_egress_redirect_test \
+ mirred_egress_mirror_test gact_trap_test"
NUM_NETIFS=4
source tc_common.sh
source lib.sh
@@ -111,6 +113,10 @@ gact_trap_test()
{
RET=0
+ if [[ "$tcflags" != "skip_sw" ]]; then
+ return 0;
+ fi
+
tc filter add dev $swp1 ingress protocol ip pref 1 handle 101 flower \
skip_hw dst_ip 192.0.2.2 action drop
tc filter add dev $swp1 ingress protocol ip pref 3 handle 103 flower \
@@ -179,24 +185,29 @@ cleanup()
ip link set $swp1 address $swp1origmac
}
+mirred_egress_redirect_test()
+{
+ mirred_egress_test "redirect"
+}
+
+mirred_egress_mirror_test()
+{
+ mirred_egress_test "mirror"
+}
+
trap cleanup EXIT
setup_prepare
setup_wait
-gact_drop_and_ok_test
-mirred_egress_test "redirect"
-mirred_egress_test "mirror"
+tests_run
tc_offload_check
if [[ $? -ne 0 ]]; then
log_info "Could not test offloaded functionality"
else
tcflags="skip_sw"
- gact_drop_and_ok_test
- mirred_egress_test "redirect"
- mirred_egress_test "mirror"
- gact_trap_test
+ tests_run
fi
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_chains.sh b/tools/testing/selftests/net/forwarding/tc_chains.sh
index 2fd15226974b..d2c783e94df3 100755
--- a/tools/testing/selftests/net/forwarding/tc_chains.sh
+++ b/tools/testing/selftests/net/forwarding/tc_chains.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="unreachable_chain_test gact_goto_chain_test"
NUM_NETIFS=2
source tc_common.sh
source lib.sh
@@ -107,16 +108,14 @@ trap cleanup EXIT
setup_prepare
setup_wait
-unreachable_chain_test
-gact_goto_chain_test
+tests_run
tc_offload_check
if [[ $? -ne 0 ]]; then
log_info "Could not test offloaded functionality"
else
tcflags="skip_sw"
- unreachable_chain_test
- gact_goto_chain_test
+ tests_run
fi
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_flower.sh b/tools/testing/selftests/net/forwarding/tc_flower.sh
index 0c54059f1875..20d1077e5a3d 100755
--- a/tools/testing/selftests/net/forwarding/tc_flower.sh
+++ b/tools/testing/selftests/net/forwarding/tc_flower.sh
@@ -1,6 +1,8 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="match_dst_mac_test match_src_mac_test match_dst_ip_test \
+ match_src_ip_test match_ip_flags_test"
NUM_NETIFS=2
source tc_common.sh
source lib.sh
@@ -245,22 +247,14 @@ trap cleanup EXIT
setup_prepare
setup_wait
-match_dst_mac_test
-match_src_mac_test
-match_dst_ip_test
-match_src_ip_test
-match_ip_flags_test
+tests_run
tc_offload_check
if [[ $? -ne 0 ]]; then
log_info "Could not test offloaded functionality"
else
tcflags="skip_sw"
- match_dst_mac_test
- match_src_mac_test
- match_dst_ip_test
- match_src_ip_test
- match_ip_flags_test
+ tests_run
fi
exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_shblocks.sh b/tools/testing/selftests/net/forwarding/tc_shblocks.sh
index 077b98048ef4..b5b917203815 100755
--- a/tools/testing/selftests/net/forwarding/tc_shblocks.sh
+++ b/tools/testing/selftests/net/forwarding/tc_shblocks.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+ALL_TESTS="shared_block_test"
NUM_NETIFS=4
source tc_common.sh
source lib.sh
@@ -109,14 +110,14 @@ trap cleanup EXIT
setup_prepare
setup_wait
-shared_block_test
+tests_run
tc_offload_check
if [[ $? -ne 0 ]]; then
log_info "Could not test offloaded functionality"
else
tcflags="skip_sw"
- shared_block_test
+ tests_run
fi
exit $EXIT_STATUS
--
2.14.3
* [PATCH net-next 1/2] selftests: forwarding: Increase maximum deviation in multipath test
From: Ido Schimmel @ 2018-05-03 7:51 UTC (permalink / raw)
To: netdev; +Cc: davem, petrm, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180503075133.17450-1-idosch@mellanox.com>
We sometimes observe failures in the test due to too large a discrepancy
between the measured and expected ratios. For example:
TEST: ECMP [FAIL]
Too large discrepancy between expected and measured ratios
INFO: Expected ratio 1.00 Measured ratio 1.11
Fix this by allowing a deviation of up to 15% between the two ratios.
Another possibility is to increase the number of generated flows, but
this will prolong the execution time of the test, which is already quite
high.
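To make the arithmetic concrete, here is a minimal sketch of the same relative-deviation check. It is not the code in router_multipath.sh, which uses bc on decimal ratios; the helper name deviation_exceeds and the convention of passing ratios in hundredths are assumptions made so that plain bash integer math suffices.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only.
# Succeeds (returns 0) when |expected - measured| / expected is larger than
# limit_pct percent. Ratios are given in hundredths, e.g. 1.11 -> 111.
deviation_exceeds()
{
	local expected=$1 measured=$2 limit_pct=$3
	local diff=$((expected - measured))

	# Absolute value, same trick as in multipath_eval().
	diff=${diff#-}

	# diff / expected > limit_pct / 100, rearranged to avoid division.
	[ $((diff * 100)) -gt $((limit_pct * expected)) ]
}
```

With the failure quoted above (expected 1.00, measured 1.11), the relative deviation is 11%, so a call such as deviation_exceeds 100 111 10 reports a violation while deviation_exceeds 100 111 15 does not.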
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
tools/testing/selftests/net/forwarding/router_multipath.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/router_multipath.sh b/tools/testing/selftests/net/forwarding/router_multipath.sh
index 3bc351008db6..2bd3d41354d0 100755
--- a/tools/testing/selftests/net/forwarding/router_multipath.sh
+++ b/tools/testing/selftests/net/forwarding/router_multipath.sh
@@ -191,7 +191,7 @@ multipath_eval()
diff=$(echo $weights_ratio - $packets_ratio | bc -l)
diff=${diff#-}
- test "$(echo "$diff / $weights_ratio > 0.1" | bc -l)" -eq 0
+ test "$(echo "$diff / $weights_ratio > 0.15" | bc -l)" -eq 0
check_err $? "Too large discrepancy between expected and measured ratios"
log_test "$desc"
log_info "Expected ratio $weights_ratio Measured ratio $packets_ratio"
--
2.14.3
* [PATCH net-next 0/2] selftests: forwarding: Two enhancements
From: Ido Schimmel @ 2018-05-03 7:51 UTC (permalink / raw)
To: netdev; +Cc: davem, petrm, dsahern, mlxsw, Ido Schimmel
First patch increases the maximum deviation in the multipath tests which
proved to be too low in some cases.
Second patch allows the user to run only specific tests from each file using
the TESTS environment variable. This granularity is needed in setups
where not all the tests can pass.
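As a rough illustration of the selection mechanism, here is a minimal sketch. The two test functions are stand-ins that only print their own name and are not the real selftests; tests_run follows the helper that the second patch adds to lib.sh.

```shell
#!/bin/bash
# Stand-in test list and test bodies, for illustration only.
ALL_TESTS="ping_ipv4 ping_ipv6"

ping_ipv4()
{
	echo "ping_ipv4"
}

ping_ipv6()
{
	echo "ping_ipv6"
}

# Run the tests named in TESTS, or everything in ALL_TESTS when TESTS is
# unset or empty.
tests_run()
{
	local current_test

	for current_test in ${TESTS:-$ALL_TESTS}; do
		$current_test
	done
}
```

Under these assumptions, invoking a converted script as TESTS="ping_ipv6" ./router.sh should run only the IPv6 ping test, while leaving TESTS unset runs the whole ALL_TESTS list in order.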
Ido Schimmel (2):
selftests: forwarding: Increase maximum deviation in multipath test
selftests: forwarding: Allow running specific tests
.../selftests/net/forwarding/bridge_vlan_aware.sh | 26 +++++++++++++---
.../net/forwarding/bridge_vlan_unaware.sh | 26 +++++++++++++---
tools/testing/selftests/net/forwarding/lib.sh | 9 ++++++
.../testing/selftests/net/forwarding/mirror_gre.sh | 36 +++++++++++++++++-----
.../selftests/net/forwarding/mirror_gre_bound.sh | 23 +++++++++++---
.../selftests/net/forwarding/mirror_gre_changes.sh | 29 ++++++++++++++---
.../selftests/net/forwarding/mirror_gre_flower.sh | 23 +++++++++++---
.../selftests/net/forwarding/mirror_gre_neigh.sh | 22 ++++++++++---
.../selftests/net/forwarding/mirror_gre_nh.sh | 8 +++--
tools/testing/selftests/net/forwarding/router.sh | 14 +++++++--
.../selftests/net/forwarding/router_multipath.sh | 17 +++++++---
.../testing/selftests/net/forwarding/tc_actions.sh | 25 ++++++++++-----
.../testing/selftests/net/forwarding/tc_chains.sh | 7 ++---
.../testing/selftests/net/forwarding/tc_flower.sh | 14 +++------
.../selftests/net/forwarding/tc_shblocks.sh | 5 +--
15 files changed, 220 insertions(+), 64 deletions(-)
--
2.14.3
* Re: [PATCH net] macsonic: Set platform device coherent_dma_mask
From: Geert Uytterhoeven @ 2018-05-03 7:25 UTC (permalink / raw)
To: Finn Thain; +Cc: David S. Miller, linux-m68k, netdev, Linux Kernel Mailing List
In-Reply-To: <S1752057AbeECEYP/20180503042418Z+1168@vger.kernel.org>
Hi Finn,
On Thu, May 3, 2018 at 6:24 AM, Finn Thain <fthain@telegraphics.com.au> wrote:
> Set the device's coherent_dma_mask to avoid a WARNING splat.
> Please see commit 205e1b7f51e4 ("dma-mapping: warn when there is
> no coherent_dma_mask").
>
> Cc: linux-m68k@lists.linux-m68k.org
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Thanks for your patch!
> --- a/drivers/net/ethernet/natsemi/macsonic.c
> +++ b/drivers/net/ethernet/natsemi/macsonic.c
> @@ -523,6 +523,10 @@ static int mac_sonic_platform_probe(struct platform_device *pdev)
> struct sonic_local *lp;
> int err;
>
> + err = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> + if (err)
> + return err;
> +
> dev = alloc_etherdev(sizeof(struct sonic_local));
> if (!dev)
> return -ENOMEM;
Shouldn't this be handled in the platform code that instantiates the device,
i.e. in arch/m68k/mac/config.c:mac_platform_init()?
Cfr. commit f61e64310b75733d ("m68k: set dma and coherent masks for platform
FEC ethernets").
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Geert Uytterhoeven @ 2018-05-03 7:25 UTC (permalink / raw)
To: Finn Thain; +Cc: David S. Miller, linux-m68k, netdev, Linux Kernel Mailing List
In-Reply-To: <S1751632AbeECEYA/20180503042400Z+254@vger.kernel.org>
Hi Finn,
On Thu, May 3, 2018 at 6:23 AM, Finn Thain <fthain@telegraphics.com.au> wrote:
> Set the device's coherent_dma_mask to avoid a WARNING splat.
> Please see commit 205e1b7f51e4 ("dma-mapping: warn when there is
> no coherent_dma_mask").
>
> Cc: linux-m68k@lists.linux-m68k.org
> Tested-by: Stan Johnson <userm57@yahoo.com>
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Thanks for your patch!
> --- a/drivers/net/ethernet/apple/macmace.c
> +++ b/drivers/net/ethernet/apple/macmace.c
> @@ -203,6 +203,10 @@ static int mace_probe(struct platform_device *pdev)
> unsigned char checksum = 0;
> int err;
>
> + err = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> + if (err)
> + return err;
> +
> dev = alloc_etherdev(PRIV_BYTES);
> if (!dev)
> return -ENOMEM;
Shouldn't this be handled in the platform code that instantiates the device,
i.e. in arch/m68k/mac/config.c:mac_platform_init()?
Cfr. commit f61e64310b75733d ("m68k: set dma and coherent masks for platform
FEC ethernets").
Gr{oetje,eeting}s,
Geert
* Re: [PATCH net-next v3 1/2] openvswitch: Add conntrack limit netlink definition
From: Pravin Shelar @ 2018-05-03 6:49 UTC (permalink / raw)
To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers
In-Reply-To: <1525123713-38891-2-git-send-email-yihung.wei@gmail.com>
On Mon, Apr 30, 2018 at 2:28 PM, Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> Define netlink messages and attributes to support user kernel
> communication that uses the conntrack limit feature.
>
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> ---
> include/uapi/linux/openvswitch.h | 62 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 62 insertions(+)
>
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 713e56ce681f..ca63c16375ce 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -937,4 +937,66 @@ enum ovs_meter_band_type {
>
> #define OVS_METER_BAND_TYPE_MAX (__OVS_METER_BAND_TYPE_MAX - 1)
>
> +/* Conntrack limit */
> +#define OVS_CT_LIMIT_FAMILY "ovs_ct_limit"
> +#define OVS_CT_LIMIT_MCGROUP "ovs_ct_limit"
> +#define OVS_CT_LIMIT_VERSION 0x1
> +
> +enum ovs_ct_limit_cmd {
> + OVS_CT_LIMIT_CMD_UNSPEC,
> + OVS_CT_LIMIT_CMD_SET, /* Add or modify ct limit. */
> + OVS_CT_LIMIT_CMD_DEL, /* Delete ct limit. */
> + OVS_CT_LIMIT_CMD_GET /* Get ct limit. */
> +};
> +
> +enum ovs_ct_limit_attr {
> + OVS_CT_LIMIT_ATTR_UNSPEC,
> + OVS_CT_LIMIT_ATTR_OPTION, /* Nested OVS_CT_LIMIT_ATTR_* */
> + __OVS_CT_LIMIT_ATTR_MAX
> +};
> +
> +#define OVS_CT_LIMIT_ATTR_MAX (__OVS_CT_LIMIT_ATTR_MAX - 1)
> +
> +/**
> + * @OVS_CT_ZONE_LIMIT_ATTR_SET_REQ: Contains either
> + * OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT or a pair of
> + * OVS_CT_ZONE_LIMIT_ATTR_ZONE and OVS_CT_ZONE_LIMIT_ATTR_LIMIT.
> + * @OVS_CT_ZONE_LIMIT_ATTR_DEL_REQ: Contains OVS_CT_ZONE_LIMIT_ATTR_ZONE.
> + * @OVS_CT_ZONE_LIMIT_ATTR_GET_REQ: Contains OVS_CT_ZONE_LIMIT_ATTR_ZONE.
> + * @OVS_CT_ZONE_LIMIT_ATTR_GET_RLY: Contains either
> + * OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT or a triple of
> + * OVS_CT_ZONE_LIMIT_ATTR_ZONE, OVS_CT_ZONE_LIMIT_ATTR_LIMIT and
> + * OVS_CT_ZONE_LIMIT_ATTR_COUNT.
> + */
> +enum ovs_ct_limit_option_attr {
> + OVS_CT_LIMIT_OPTION_ATTR_UNSPEC,
> + OVS_CT_ZONE_LIMIT_ATTR_SET_REQ, /* Nested OVS_CT_ZONE_LIMIT_ATTR_*
> + * attributes. */
> + OVS_CT_ZONE_LIMIT_ATTR_DEL_REQ, /* Nested OVS_CT_ZONE_LIMIT_ATTR_*
> + * attributes. */
> + OVS_CT_ZONE_LIMIT_ATTR_GET_REQ, /* Nested OVS_CT_ZONE_LIMIT_ATTR_*
> + * attributes. */
> + OVS_CT_ZONE_LIMIT_ATTR_GET_RLY, /* Nested OVS_CT_ZONE_LIMIT_ATTR_*
This option looks redundant to me. Can we just use ovs_ct_limit_cmd
and have nested attributes with ovs_ct_zone_limit_attr as parameters?
I do not see the need for ovs_ct_limit_attr either. These changes would
simplify the interface.
> + * attributes. */
> + __OVS_CT_LIMIT_OPTION_ATTR_MAX
> +};
> +
> +#define OVS_CT_LIMIT_OPTION_ATTR_MAX (__OVS_CT_LIMIT_OPTION_ATTR_MAX - 1)
> +
> +enum ovs_ct_zone_limit_attr {
> + OVS_CT_ZONE_LIMIT_ATTR_UNSPEC,
> + OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT, /* u32 default conntrack limit
> + * for all zones. */
> + OVS_CT_ZONE_LIMIT_ATTR_ZONE, /* u16 conntrack zone id. */
> + OVS_CT_ZONE_LIMIT_ATTR_LIMIT, /* u32 max number of conntrack
> + * entries allowed in the
> + * corresponding zone. */
> + OVS_CT_ZONE_LIMIT_ATTR_COUNT, /* u32 number of conntrack
> + * entries in the corresponding
> + * zone. */
> + __OVS_CT_ZONE_LIMIT_ATTR_MAX
> +};
> +
> +#define OVS_CT_ZONE_LIMIT_ATTR_MAX (__OVS_CT_ZONE_LIMIT_ATTR_MAX - 1)
> +
> #endif /* _LINUX_OPENVSWITCH_H */
> --
> 2.7.4
>
* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Jason Wang @ 2018-05-03 7:25 UTC (permalink / raw)
To: Tiwei Bie, Michael S. Tsirkin; +Cc: netdev, wexu, linux-kernel, virtualization
In-Reply-To: <20180503020949.5u3qz32gsk33z6vk@debian>
On 2018-05-03 10:09, Tiwei Bie wrote:
>>>> So how about we use the straightforward way then?
>>> You mean we do new += vq->vring_packed.num instead
>>> of event_idx -= vq->vring_packed.num before calling
>>> vring_need_event()?
>>>
>>> The problem is that, the second param (new_idx) of
>>> vring_need_event() will be used for:
>>>
>>> (__u16)(new_idx - event_idx - 1)
>>> (__u16)(new_idx - old)
>>>
>>> So if we change new, we will need to change old too.
>> I think that since we have a branch there anyway,
>> we are better off just special-casing if (wrap_counter != vq->wrap_counter).
>> Treat it differently and avoid casts.
>>
>>> And that would be an ugly hack..
>>>
>>> Best regards,
>>> Tiwei Bie
>> I consider casts and huge numbers with two's complement
>> games even uglier.
> The dependency on two's complement game is introduced
> since the split ring.
>
> In packed ring, old is calculated via:
>
> old = vq->next_avail_idx - vq->num_added;
>
> In split ring, old is calculated via:
>
> old = vq->avail_idx_shadow - vq->num_added;
>
> In both cases, when vq->num_added is bigger, old will
> be a big number.
>
> Best regards,
> Tiwei Bie
>
How about just doing something like vhost:
static u16 vhost_idx_diff(struct vhost_virtqueue *vq, u16 old, u16 new)
{
if (new > old)
return new - old;
return (new + vq->num - old);
}
static bool vhost_vring_packed_need_event(struct vhost_virtqueue *vq,
__u16 event_off, __u16 new,
__u16 old)
{
return (__u16)(vhost_idx_diff(vq, new, event_off) - 1) <
(__u16)vhost_idx_diff(vq, new, old);
}
?
* Re: DSA switch
From: Ran Shalit @ 2018-05-03 7:25 UTC (permalink / raw)
To: Jiri Pirko; +Cc: Andrew Lunn, netdev
In-Reply-To: <20180503071124.GM19250@nanopsycho>
On Thu, May 3, 2018 at 10:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, May 03, 2018 at 08:50:52AM CEST, ranshalit@gmail.com wrote:
>>On Wed, May 2, 2018 at 11:56 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>>> On Wed, May 02, 2018 at 11:20:05PM +0300, Ran Shalit wrote:
>>>> Hello,
>>>>
>>>> Is it possible to use switch just like external real switch,
>>>> connecting all ports to the same subnet ?
>>>
>>> Yes. Just bridge all ports/interfaces together and put your host IP
>>> address on the bridge.
>>>
>>> Andrew
>>
>>
>>Hi,
>>
>>I get an error when trying to add a bridge.
>>I am trying to understand which configuration is probably missing in my kernel.
>>I ran strace, but I am not sure: does it point to any missing configuration?
>>
>>root@dm814x-evm:~# ip link add br0 type bridge
>
> Is the bridge module enabled in the kernel config?
Yes, I've also added all configuration listed in
https://www.thelinuxfaq.com/355-rtnetlink-answers-operation-not-supported-on-centos
(we use an old kernel, 2.6.37, which supports TI's chip)
>
>
>>RTNETLINK answers: Operation not supported
I've managed to do it with brctl instead and it seems to work fine.
ifconfig lan0 0.0.0.0
ifconfig lan1 0.0.0.0
ifconfig lan2 0.0.0.0
ifconfig lan3 0.0.0.0
brctl addbr br0
brctl addif br0 lan0
brctl addif br0 lan1
brctl addif br0 lan2
brctl addif br0 lan3
ifconfig br0 150.42.40.222
Yet the brctl command seems to take time (about a second until it
returns), and we have a requirement for fast boot,
so I wonder why the "ip link add br0 type bridge" command gave those errors.
I also noticed the following in the strace I've pasted here:
open("/usr/lib//ip/link_bridge.so", O_RDONLY) = -1 ENOENT (No such file or directory)
There is really no such file in my filesystem /usr/lib//ip/link_bridge.so.
Why is it missing ?
Thank you,
ranran
* Re: [RFC V3 PATCH 1/8] vhost: move get_rx_bufs to vhost.c
From: Jason Wang @ 2018-05-03 7:19 UTC (permalink / raw)
To: Tiwei Bie; +Cc: mst, kvm, virtualization, netdev, linux-kernel, jfreimann, wexu
In-Reply-To: <20180502080518.h52wme46fnqpyfpf@debian>
On 2018-05-02 16:05, Tiwei Bie wrote:
> On Mon, Apr 23, 2018 at 01:34:53PM +0800, Jason Wang wrote:
>> Move get_rx_bufs() to vhost.c and rename it to
>> vhost_get_rx_bufs(). This helps to hide vring internal layout from
> A small typo. Based on the code change in this patch, it
> seems that this function is renamed to vhost_get_bufs().
>
> Thanks
>
Right, let me fix it in the next version.
Thanks
* Re: [PATCH v2 bpf-next 2/2] bpf: add selftest for stackmap with build_id in NMI context
From: Tobin C. Harding @ 2018-05-03 7:19 UTC (permalink / raw)
To: Song Liu; +Cc: netdev, kernel-team, qinteng
In-Reply-To: <20180502232030.3788284-3-songliubraving@fb.com>
On Wed, May 02, 2018 at 04:20:30PM -0700, Song Liu wrote:
> This new test captures stackmap with build_id with hardware event
> PERF_COUNT_HW_CPU_CYCLES.
>
> Because we only support one ips-to-build_id lookup per cpu in NMI
> context, stack_amap will not be able to do the lookup in this test.
stack_map ?
> Therefore, we didn't do compare_stack_ips(), as it will always fail.
>
> urandom_read.c is extended to run configurable cycles so that it can be
> caught by the perf event.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
> tools/testing/selftests/bpf/test_progs.c | 137 +++++++++++++++++++++++++++++
> tools/testing/selftests/bpf/urandom_read.c | 10 ++-
> 2 files changed, 145 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index aa336f0..00bb08c 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -1272,6 +1272,142 @@ static void test_stacktrace_build_id(void)
> return;
> }
>
> +static void test_stacktrace_build_id_nmi(void)
> +{
> + int control_map_fd, stackid_hmap_fd, stackmap_fd, stack_amap_fd;
> + const char *file = "./test_stacktrace_build_id.o";
> + int err, pmu_fd, prog_fd;
> + struct perf_event_attr attr = {
> + .sample_freq = 5000,
> + .freq = 1,
> + .type = PERF_TYPE_HARDWARE,
> + .config = PERF_COUNT_HW_CPU_CYCLES,
> + };
> + __u32 key, previous_key, val, duration = 0;
> + struct bpf_object *obj;
> + char buf[256];
> + int i, j;
> + struct bpf_stack_build_id id_offs[PERF_MAX_STACK_DEPTH];
> + int build_id_matches = 0;
> +
> + err = bpf_prog_load(file, BPF_PROG_TYPE_PERF_EVENT, &obj, &prog_fd);
> + if (CHECK(err, "prog_load", "err %d errno %d\n", err, errno))
> + goto out;
perhaps:
return;
> + pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
> + 0 /* cpu 0 */, -1 /* group id */,
> + 0 /* flags */);
> + if (CHECK(pmu_fd < 0, "perf_event_open",
> + "err %d errno %d. Does the test host support PERF_COUNT_HW_CPU_CYCLES?\n",
> + pmu_fd, errno))
> + goto close_prog;
> +
> + err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> + if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n",
> + err, errno))
> + goto close_pmu;
> +
> + err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> + if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n",
> + err, errno))
> + goto disable_pmu;
> +
> + /* find map fds */
> + control_map_fd = bpf_find_map(__func__, obj, "control_map");
> + if (CHECK(control_map_fd < 0, "bpf_find_map control_map",
> + "err %d errno %d\n", err, errno))
> + goto disable_pmu;
> +
> + stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
> + if (CHECK(stackid_hmap_fd < 0, "bpf_find_map stackid_hmap",
> + "err %d errno %d\n", err, errno))
> + goto disable_pmu;
> +
> + stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
> + if (CHECK(stackmap_fd < 0, "bpf_find_map stackmap", "err %d errno %d\n",
> + err, errno))
> + goto disable_pmu;
> +
> + stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap");
> + if (CHECK(stack_amap_fd < 0, "bpf_find_map stack_amap",
> + "err %d errno %d\n", err, errno))
> + goto disable_pmu;
> +
> + assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
> + == 0);
> + assert(system("taskset 0x1 ./urandom_read 100000") == 0);
> + /* disable stack trace collection */
> + key = 0;
> + val = 1;
> + bpf_map_update_elem(control_map_fd, &key, &val, 0);
> +
> + /* for every element in stackid_hmap, we can find a corresponding one
> + * in stackmap, and vice versa.
> + */
> + err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
> + if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
> + "err %d errno %d\n", err, errno))
> + goto disable_pmu;
> +
> + err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
> + if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
> + "err %d errno %d\n", err, errno))
> + goto disable_pmu;
> +
> + err = extract_build_id(buf, 256);
> +
> + if (CHECK(err, "get build_id with readelf",
> + "err %d errno %d\n", err, errno))
> + goto disable_pmu;
> +
> + err = bpf_map_get_next_key(stackmap_fd, NULL, &key);
> + if (CHECK(err, "get_next_key from stackmap",
> + "err %d, errno %d\n", err, errno))
> + goto disable_pmu;
> +
> + do {
> + char build_id[64];
> +
> + err = bpf_map_lookup_elem(stackmap_fd, &key, id_offs);
> + if (CHECK(err, "lookup_elem from stackmap",
> + "err %d, errno %d\n", err, errno))
> + goto disable_pmu;
> + for (i = 0; i < PERF_MAX_STACK_DEPTH; ++i)
> + if (id_offs[i].status == BPF_STACK_BUILD_ID_VALID &&
> + id_offs[i].offset != 0) {
> + for (j = 0; j < 20; ++j)
> + sprintf(build_id + 2 * j, "%02x",
> + id_offs[i].build_id[j] & 0xff);
> + if (strstr(buf, build_id) != NULL)
> + build_id_matches = 1;
> + }
> + previous_key = key;
> + } while (bpf_map_get_next_key(stackmap_fd, &previous_key, &key) == 0);
> +
> + if (CHECK(build_id_matches < 1, "build id match",
> + "Didn't find expected build ID from the map\n"))
> + goto disable_pmu;
> +
> + /*
> + * We intentionally skip compare_stack_ips(). This is because we
> + * only support one in_nmi() ips-to-build_id translation per cpu
> + * at any time, thus stack_amap here will always fallback to
> + * BPF_STACK_BUILD_ID_IP;
> + */
> +
> +disable_pmu:
> + ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
> +
> +close_pmu:
> + close(pmu_fd);
> +
> +close_prog:
> + bpf_object__close(obj);
> +
> +out:
> + return;
> +}
No real need for label 'out' right? We can just return directly and
remove the last three lines of this function.
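To make the suggestion concrete, here is a minimal userspace sketch of the
pattern. The helper acquire(), struct cleanup_log and run_steps() are
hypothetical names made up for illustration and are not part of the patch.
The first failure has nothing to undo, so it returns directly; only later
failures need a cleanup label.

```c
#include <stdbool.h>

/* Hypothetical record of which cleanup steps ran (illustration only). */
struct cleanup_log {
	bool did_work;
	bool closed_pmu;
	bool closed_prog;
};

/* Hypothetical stand-in for a setup step; returns 0 on success. */
static int acquire(bool ok)
{
	return ok ? 0 : -1;
}

static int run_steps(bool prog_ok, bool pmu_ok, bool enable_ok,
		     struct cleanup_log *log)
{
	int err;

	log->did_work = false;
	log->closed_pmu = false;
	log->closed_prog = false;

	err = acquire(prog_ok);
	if (err)
		return err;	/* nothing acquired yet, so no 'out' label */

	err = acquire(pmu_ok);
	if (err)
		goto close_prog;

	err = acquire(enable_ok);
	if (err)
		goto close_pmu;

	log->did_work = true;	/* the body of the test would go here */

close_pmu:
	log->closed_pmu = true;
close_prog:
	log->closed_prog = true;
	return err;
}
```

With this layout the function ends at the last real cleanup step, and the
success path simply falls through the same labels.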
Hope this helps,
Tobin.
* Re: DSA switch
From: Jiri Pirko @ 2018-05-03 7:11 UTC (permalink / raw)
To: Ran Shalit; +Cc: Andrew Lunn, netdev
In-Reply-To: <CAJ2oMhLMSmzUHGSMS6AFhRfWSTYwjTJ1-E7Gsx0Pn9Opmtb5YA@mail.gmail.com>
Thu, May 03, 2018 at 08:50:52AM CEST, ranshalit@gmail.com wrote:
>On Wed, May 2, 2018 at 11:56 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>> On Wed, May 02, 2018 at 11:20:05PM +0300, Ran Shalit wrote:
>>> Hello,
>>>
>>> Is it possible to use the switch just like a real external switch,
>>> connecting all ports to the same subnet?
>>
>> Yes. Just bridge all ports/interfaces together and put your host IP
>> address on the bridge.
>>
>> Andrew
>
>
>Hi,
>
>I get an error when trying to add a bridge.
>I am trying to understand which configuration is probably missing in my kernel.
>I ran strace, but I am not sure whether it points to any missing configuration.
>
>root@dm814x-evm:~# ip link add br0 type bridge
Is the bridge module (CONFIG_BRIDGE) enabled in the kernel config?
>RTNETLINK answers: Operation not supported
* Re: [PATCH v2 bpf-next 1/2] bpf: enable stackmap with build_id in nmi context
From: Tobin C. Harding @ 2018-05-03 7:03 UTC (permalink / raw)
To: Song Liu
Cc: netdev, kernel-team, qinteng, Alexei Starovoitov, Daniel Borkmann,
Peter Zijlstra
In-Reply-To: <20180502232030.3788284-2-songliubraving@fb.com>
On Wed, May 02, 2018 at 04:20:29PM -0700, Song Liu wrote:
> Currently, we cannot parse build_id in nmi context because of
> up_read(&current->mm->mmap_sem), this makes stackmap with build_id
> less useful. This patch enables parsing build_id in nmi by putting
> the up_read() call in irq_work. To avoid memory allocation in nmi
> context, we use per cpu variable for the irq_work. As a result, only
> one irq_work per cpu is allowed. If the irq_work is in-use, we
> fallback to only report ips.
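To check my understanding of the scheme described above, here is a minimal
userspace model. It is only a sketch: struct deferred_release,
lookup_in_nmi() and run_deferred() are hypothetical names, a plain int
stands in for the reader count of mmap_sem, and a bool stands in for
IRQ_WORK_BUSY. It assumes one slot per cpu, as the commit message says.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the single per-cpu irq_work slot. */
struct deferred_release {
	bool busy;	/* models IRQ_WORK_BUSY */
	int *sem;	/* models the semaphore whose release is deferred */
};

enum lookup_mode { LOOKUP_IPS_ONLY, LOOKUP_BUILD_ID };

/* Models one build_id lookup attempt made from nmi context. */
static enum lookup_mode lookup_in_nmi(struct deferred_release *slot, int *sem)
{
	if (slot->busy)
		return LOOKUP_IPS_ONLY;	/* slot in use: fall back to ips */

	(*sem)++;		/* models a successful down_read_trylock() */
	/* ... the ip to build_id translation would happen here ... */
	slot->sem = sem;
	slot->busy = true;	/* models queueing the irq_work */
	return LOOKUP_BUILD_ID;
}

/* Models the deferred callback that performs the up_read() later. */
static void run_deferred(struct deferred_release *slot)
{
	if (!slot->busy)
		return;
	(*slot->sem)--;		/* models up_read() */
	slot->sem = NULL;
	slot->busy = false;
}
```

In this model a second lookup that arrives before run_deferred() has run
gets LOOKUP_IPS_ONLY, which matches the fallback described in the commit
message.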
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
> init/Kconfig | 1 +
> kernel/bpf/stackmap.c | 59 +++++++++++++++++++++++++++++++++++++++++++++------
> 2 files changed, 54 insertions(+), 6 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index f013afc..480a4f2 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1391,6 +1391,7 @@ config BPF_SYSCALL
> bool "Enable bpf() system call"
> select ANON_INODES
> select BPF
> + select IRQ_WORK
> default n
> help
> Enable the bpf() system call that allows to manipulate eBPF
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3ba102b..51d4aea 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -11,6 +11,7 @@
> #include <linux/perf_event.h>
> #include <linux/elf.h>
> #include <linux/pagemap.h>
> +#include <linux/irq_work.h>
> #include "percpu_freelist.h"
>
> #define STACK_CREATE_FLAG_MASK \
> @@ -32,6 +33,23 @@ struct bpf_stack_map {
> struct stack_map_bucket *buckets[];
> };
>
> +/* irq_work to run up_read() for build_id lookup in nmi context */
> +struct stack_map_irq_work {
> + struct irq_work irq_work;
> + struct rw_semaphore *sem;
> +};
> +
> +static void do_up_read(struct irq_work *entry)
> +{
> + struct stack_map_irq_work *work = container_of(entry,
> + struct stack_map_irq_work, irq_work);
perhaps:
struct stack_map_irq_work *work;
work = container_of(entry, struct stack_map_irq_work, irq_work);
> + up_read(work->sem);
> + work->sem = NULL;
> +}
> +
> +static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work);
> +
> static inline bool stack_map_use_build_id(struct bpf_map *map)
> {
> return (map->map_flags & BPF_F_STACK_BUILD_ID);
> @@ -267,17 +285,27 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
> {
> int i;
> struct vm_area_struct *vma;
> + bool in_nmi_ctx = in_nmi();
> + bool irq_work_busy = false;
> + struct stack_map_irq_work *work;
> +
> + if (in_nmi_ctx) {
> + work = this_cpu_ptr(&up_read_work);
> + if (work->irq_work.flags & IRQ_WORK_BUSY)
> + /* cannot queue more up_read, fallback */
> + irq_work_busy = true;
> + }
>
> /*
> - * We cannot do up_read() in nmi context, so build_id lookup is
> - * only supported for non-nmi events. If at some point, it is
> - * possible to run find_vma() without taking the semaphore, we
> - * would like to allow build_id lookup in nmi context.
> + * We cannot do up_read() in nmi context. To do build_id lookup
> + * in nmi context, we need to run up_read() in irq_work. We use
> + * a percpu variable to do the irq_work. If the irq_work is
> + * already used by another lookup, we fall back to report ips.
> *
> * Same fallback is used for kernel stack (!user) on a stackmap
> * with build_id.
> */
> - if (!user || !current || !current->mm || in_nmi() ||
> + if (!user || !current || !current->mm || irq_work_busy ||
> down_read_trylock(&current->mm->mmap_sem) == 0) {
> /* cannot access current->mm, fall back to ips */
> for (i = 0; i < trace_nr; i++) {
> @@ -299,7 +327,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
> - vma->vm_start;
> id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
> }
> - up_read(&current->mm->mmap_sem);
> +
> + if (!in_nmi_ctx)
> + up_read(&current->mm->mmap_sem);
> + else {
perhaps:
if (!in_nmi_ctx) {
up_read(&current->mm->mmap_sem);
} else {
Hope this helps,
Tobin.
* Re: [PATCH RFC net-next] net: ipvs: Adjust gso_size for IPPROTO_TCP
From: Martin KaFai Lau @ 2018-05-03 7:01 UTC (permalink / raw)
To: Julian Anastasov
Cc: netdev, David Ahern, Tom Herbert, Eric Dumazet, Nikita Shirokov,
kernel-team, lvs-devel
In-Reply-To: <alpine.LFD.2.20.1805022143360.3301@ja.home.ssi.bg>
On Wed, May 02, 2018 at 10:30:32PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Wed, 2 May 2018, Martin KaFai Lau wrote:
>
> > On Wed, May 02, 2018 at 09:38:43AM +0300, Julian Anastasov wrote:
> > >
> > > - initial traffic for port 21 does not use GSO. But after
> > > every packet IPVS calls maybe_update_pmtu (rt->dst.ops->update_pmtu)
> > > to report the reduced MTU. These updates are stored in fnhe_pmtu
> > > but they do not go to any route, even if we try to get fresh
> > > output route. Why? Because the local routes are not cached, so
> > > they can not use the fnhe. This is what my patch for route.c
> > > will fix. With this fix FTP-DATA gets route with reduced PMTU.
> > For IPv6, the 'if (rt6->rt6i_flags & RTF_LOCAL)' gate in
> > __ip6_rt_update_pmtu() may need to be lifted also.
>
> Probably. I completely forgot the IPv6 part
> but as I don't know the IPv6 code enough, it may take
> some time to understand what can be the problem there...
> I'm not sure whether everything started with commit 0a6b2a1dc2a2,
> so that in some configurations before that commit things
> worked and problem was not noticed.
>
> I think, we should focus on such direction for IPv6:
>
> - do we remember per-VIP PMTU for the local routes
IPv6 used to not create a cache route for a DST_HOST route, i.e.
a /128 route (that includes the local /128 route).
Because of this, it had a bug where a PMTU update for the DST_HOST
route triggered dst.ops->update_pmtu(), which then set
an expiry on the permanent /128 route instead of on a cache
route. The permanent route was then unexpectedly expired/removed
later.
The fix, in 653437d02f1f and 7035870d1219, was to allow creating
a /128 cache route as long as it is not RTF_LOCAL. The
first post spelled out the problem better:
https://patchwork.ozlabs.org/patch/456050/
Later, when we changed to only create a cache route after seeing
a PMTU in 45e4fd26683c, this RTF_LOCAL check was carried over
to __ip6_rt_update_pmtu().
Off the top of my head, I don't see an issue with removing the
RTF_LOCAL check from __ip6_rt_update_pmtu().
DavidA, what do you think?
>
> - when exactly we start to use the new PMTU, eg. what happens
> in case socket caches the route, whether route is killed via
> dst->obsolete. Or may be while the PMTU expiration is handled
> per-packet, the PMTU change is noticed only on ICMP...
Before the sk can reuse its dst cache, it will notice that
the cached dst is no longer valid by calling dst_check().
dst_check() should return NULL, which is one of the side
effects of the earlier update_pmtu(). This dst_check()
is usually only called when the sk needs to do output,
so the new PMTU route (i.e. the RTF_CACHE IPv6 route)
only takes effect for later packets.
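A toy userspace model of that sequence, in case it helps. This is only a
sketch: struct toy_dst, struct toy_sock and the toy_*() functions are
hypothetical simplifications, not the kernel's struct dst_entry or
dst_check(), and the "lookup" is reduced to handing in a ready-made entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified route entry. */
struct toy_dst {
	bool obsolete;
	unsigned int pmtu;
};

/* Hypothetical socket holding a cached route. */
struct toy_sock {
	struct toy_dst *cached;
};

/* Models dst_check(): NULL means the cached entry must not be reused. */
static struct toy_dst *toy_dst_check(struct toy_dst *dst)
{
	return (dst && !dst->obsolete) ? dst : NULL;
}

/* Models a PMTU update: invalidate the old entry, set up the cache route. */
static void toy_update_pmtu(struct toy_dst *old, struct toy_dst *fresh,
			    unsigned int pmtu)
{
	old->obsolete = true;
	fresh->obsolete = false;
	fresh->pmtu = pmtu;
}

/*
 * Models the output path: revalidate the cached entry and redo the
 * lookup (here: take lookup_result) if the check fails. Returns the
 * PMTU used for this packet.
 */
static unsigned int toy_output(struct toy_sock *sk,
			       struct toy_dst *lookup_result)
{
	if (!toy_dst_check(sk->cached))
		sk->cached = lookup_result;
	return sk->cached->pmtu;
}
```

Note that toy_update_pmtu() does not touch the socket; the socket only
switches to the new entry on its next toy_output() call, which is why the
new PMTU applies to later packets only.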
>
> - as IPVS reports the PMTU via dst.ops->update_pmtu() long
> before any large packets are sent, do we propagate the
> PMTU. Also, for IPv4 __ip_rt_update_pmtu() has some protection
> from such per-packet updates that do not change the PMTU.
>
> - if IPVS starts to send ICMP when gso_size exceeds PMTU,
> like in my draft patch, whether the PMTU is propagated
> to route and then to socket. As for the gso_size decrease,
> playing in IPVS is not very safe, at least, we need help
> from GSO experts to know how we should use it.
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>