Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] netdevsim: correctly check return value of debugfs_create_dir
From: David Miller @ 2017-12-19 14:11 UTC (permalink / raw)
  To: bhole_prashant_q7; +Cc: jakub.kicinski, netdev
In-Reply-To: <014801d37884$3cc5d290$b65177b0$@lab.ntt.co.jp>

From: "Prashant Bhole" <bhole_prashant_q7@lab.ntt.co.jp>
Date: Tue, 19 Dec 2017 13:45:47 +0900

> I tried to evaluate whether fixing return value of debugfs_create_dir() (and
> friends) will be useful or not because it has not been changed since very
> long time. Now I am not much convinced about changing this api. 
> 
> Important and possible error codes could be -EEXIST and -ENOMEM. Suppose
> -EEXIST is returned, IMO the directory shouldn't exists in the first place
> because it is specific to particular module. Also, there is no point in
> creating file in such directory, because directory owner (creator) might
> remove it too. This means there are less chances that api change will be
> useful. Please let me know your opinion on it.
> 
> If you are ok with above explanation, shall I submit v2 for this patch?

Well, something is seriously wrong if the directory exists already.

It could be that two netdevsim modules, independantly compiled, are trying
to be loaded.

Wouldn't it clearly be desirable to fail and not load the module in
that case?

This is why I think ignoring debugfs errors is foolish.

^ permalink raw reply

* Re: [PATCH V4 14/26] pch_gbe: deprecate pci_get_bus_and_slot()
From: David Miller @ 2017-12-19 14:13 UTC (permalink / raw)
  To: okaya
  Cc: linux-pci, timur, linux-arm-msm, linux-arm-kernel, keescook,
	edumazet, tklauser, andriy.shevchenko, netdev, linux-kernel
In-Reply-To: <1513661883-28662-15-git-send-email-okaya@codeaurora.org>

From: Sinan Kaya <okaya@codeaurora.org>
Date: Tue, 19 Dec 2017 00:37:50 -0500

> pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
> where a PCI device is present. This restricts the device drivers to be
> reused for other domain numbers.
> 
> Getting ready to remove pci_get_bus_and_slot() function in favor of
> pci_get_domain_bus_and_slot().
> 
> Use the domain information from pdev while calling into
> pci_get_domain_bus_and_slot() function.
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [PATCH V4 13/26] bnx2x: deprecate pci_get_bus_and_slot()
From: David Miller @ 2017-12-19 14:14 UTC (permalink / raw)
  To: okaya
  Cc: linux-pci, timur, linux-arm-msm, linux-arm-kernel, ariel.elior,
	everest-linux-l2, netdev, linux-kernel
In-Reply-To: <1513661883-28662-14-git-send-email-okaya@codeaurora.org>

From: Sinan Kaya <okaya@codeaurora.org>
Date: Tue, 19 Dec 2017 00:37:49 -0500

> pci_get_bus_and_slot() is restrictive such that it assumes domain=0 as
> where a PCI device is present. This restricts the device drivers to be
> reused for other domain numbers.
> 
> Getting ready to remove pci_get_bus_and_slot() function in favor of
> pci_get_domain_bus_and_slot().
> 
> Introduce bnx2x_vf_domain() function to extract the domain information
> and save it to VF specific data structure.
> 
> Use the saved domain value while calling pci_get_domain_bus_and_slot().
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* [PATCH net] ipv4: fib: Fix metrics match when deleting a route
From: Phil Sutter @ 2017-12-19 14:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Xin Long

The recently added fib_metrics_match() causes a regression for routes
with both RTAX_FEATURES and RTAX_CC_ALGO if the latter has
TCP_CONG_NEEDS_ECN flag set:

| # ip link add d0 type dummy
| # ip link set d0 up
| # ip route add 172.29.29.0/24 dev d0 features ecn congctl dctcp
| # ip route del 172.29.29.0/24 dev d0 features ecn congctl dctcp
| RTNETLINK answers: No such process

During route insertion, fib_convert_metrics() detects that the given CC
algo requires ECN and hence sets DST_FEATURE_ECN_CA bit in
RTAX_FEATURES.

During route deletion though, fib_metrics_match() compares stored
RTAX_FEATURES value with that from userspace (which obviously has no
knowledge about DST_FEATURE_ECN_CA) and fails.

Fixes: 5f9ae3d9e7e4a ("ipv4: do metrics match when looking up and deleting a route")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 net/ipv4/fib_semantics.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index f04d944f8abe0..c586597da20db 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -698,7 +698,7 @@ bool fib_metrics_match(struct fib_config *cfg, struct fib_info *fi)
 
 	nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
 		int type = nla_type(nla);
-		u32 val;
+		u32 fi_val, val;
 
 		if (!type)
 			continue;
@@ -715,7 +715,11 @@ bool fib_metrics_match(struct fib_config *cfg, struct fib_info *fi)
 			val = nla_get_u32(nla);
 		}
 
-		if (fi->fib_metrics->metrics[type - 1] != val)
+		fi_val = fi->fib_metrics->metrics[type - 1];
+		if (type == RTAX_FEATURES)
+			fi_val &= ~DST_FEATURE_ECN_CA;
+
+		if (fi_val != val)
 			return false;
 	}
 
-- 
2.13.1

^ permalink raw reply related

* Re: [PATCH 2/3] rhashtable: Add rhashtable_walk_curr
From: David Miller @ 2017-12-19 14:35 UTC (permalink / raw)
  To: agruenba; +Cc: herbert, cluster-devel, tgraf, netdev, tom
In-Reply-To: <CAHc6FU5JLviUSt9SO3oXZfWRE5ScBZ7O2f+2YubPJwCxQGW-2g@mail.gmail.com>

From: Andreas Gruenbacher <agruenba@redhat.com>
Date: Tue, 19 Dec 2017 09:35:47 +0100

> By the way, git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
> master doesn't merge cleanly with current mainline.

Yes, this is the case more often than not :-)

^ permalink raw reply

* [RFC PATCH net-next] tools/bpf: fix build with binutils >= 2.28
From: Roman Gushchin @ 2017-12-19 14:38 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel-team, Roman Gushchin, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann

Bpftool build is broken with binutils version 2.28 and later.
The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
in the binutils repo, which changed the disassembler() function
signature.

Fix this by checking binutils version and use an appropriate
disassembler() signature.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/bpf/Makefile             | 6 ++++++
 tools/bpf/bpf_jit_disasm.c     | 7 +++++++
 tools/bpf/bpftool/Makefile     | 6 ++++++
 tools/bpf/bpftool/jit_disasm.c | 5 +++++
 4 files changed, 24 insertions(+)

diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
index 07a6697466ef..3fd32fd0daa1 100644
--- a/tools/bpf/Makefile
+++ b/tools/bpf/Makefile
@@ -6,8 +6,14 @@ LEX = flex
 YACC = bison
 MAKE = make
 
+BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
+BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
+BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
+
 CFLAGS += -Wall -O2
 CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
+CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
+CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
 
 %.yacc.c: %.y
 	$(YACC) -o $@ -d $<
diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
index 75bf526a0168..3ef7c8bdc0f3 100644
--- a/tools/bpf/bpf_jit_disasm.c
+++ b/tools/bpf/bpf_jit_disasm.c
@@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int opcodes)
 
 	disassemble_init_for_target(&info);
 
+#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
+	disassemble = disassembler(bfd_get_arch(bfdf),
+				   bfd_big_endian(bfdf),
+				   bfd_get_mach(bfdf),
+				   bfdf);
+#else
 	disassemble = disassembler(bfdf);
+#endif
 	assert(disassemble);
 
 	do {
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 3f17ad317512..94ad51bf14b5 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -40,6 +40,12 @@ CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
 CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
 LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
 
+BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
+BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
+BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
+CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
+CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
+
 INSTALL ?= install
 RM ?= rm -f
 
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
index 1551d3918d4c..eaa7127e9eeb 100644
--- a/tools/bpf/bpftool/jit_disasm.c
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -107,7 +107,12 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
 
 	disassemble_init_for_target(&info);
 
+#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
+	disassemble = disassembler(bfd_get_arch(bfdf), bfd_big_endian(bfdf),
+				   bfd_get_mach(bfdf), bfdf);
+#else
 	disassemble = disassembler(bfdf);
+#endif
 	assert(disassemble);
 
 	if (json_output)
-- 
2.14.3

^ permalink raw reply related

* Re: pull-request: mac80211 2017-12-19
From: David Miller @ 2017-12-19 14:39 UTC (permalink / raw)
  To: johannes; +Cc: netdev, linux-wireless
In-Reply-To: <20171219095710.16889-1-johannes@sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Tue, 19 Dec 2017 10:57:09 +0100

> Other work has been hectic, and I got caught by rc4. We still
> have a few more fixes though - and more build issues were
> reported.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks Johannes.

^ permalink raw reply

* Re: [PATCH V4 14/26] pch_gbe: deprecate pci_get_bus_and_slot()
From: David Miller @ 2017-12-19 14:53 UTC (permalink / raw)
  To: okaya
  Cc: andriy.shevchenko, linux-pci, timur, linux-arm-msm,
	linux-arm-kernel, keescook, edumazet, tklauser, netdev,
	linux-kernel
In-Reply-To: <ef7d09ce-06af-ad3e-1048-3b413cc32e9b@codeaurora.org>

From: Sinan Kaya <okaya@codeaurora.org>
Date: Tue, 19 Dec 2017 07:17:41 -0500

> Then hard code the domain number as 0 while calling
> pci_get_domain_bus_and_slot() if you know for sure that this device
> will never work on a non-zero domain.

Agreed, it's so much better to be explicit about this.

^ permalink raw reply

* Re: [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets
From: Marcelo Ricardo Leitner @ 2017-12-19 15:05 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: netdev@vger.kernel.org, davem@davemloft.net, davejwatson@fb.com,
	tom@herbertland.com, hannes@stressinduktion.org, Boris Pismenny,
	Aviad Yehezkel, Liran Liss, Steffen Klassert
In-Reply-To: <AM4PR0501MB2723B5452037B69253A0459BD40F0@AM4PR0501MB2723.eurprd05.prod.outlook.com>

On Tue, Dec 19, 2017 at 07:51:02AM +0000, Ilya Lesokhin wrote:
> On Monday, December 18, 2017 9:18 PM, Marcelo Ricardo Leitner wrote:
> 
> > > +
> > > +	if (sk && sk_fullsock(sk) && sk->sk_offload_check)
> > 
> > Isn't this going to hurt the fast path, checking for sk fields here?
> > 
> 
> We do add code to the fast path but it seems unavoidable if you want to have SW fallback.
> The XFRM device offload also does that
> http://elixir.free-electrons.com/linux/v4.14.7/source/net/core/dev.c#L3058

Right, although a bit different. It's accessing skb->sp and not the
socket and depending on how compiler is doing things, the check
http://elixir.free-electrons.com/linux/v4.14.7/source/net/xfrm/xfrm_device.c#L32
will help in some cases.

But more importantly, all the above only exists if CONFIG_XFRM_OFFLOAD
is enabled.

> 
> The check can be optimized but I didn't want to do that before I saw that it's an issue.
> I'm also not sure what the correct solution is.
> I don't like that fact that each "stateful protocol" we offload requires its own check. 
> We need to think if we can find a generic way of doing it.
> 
> Perhaps we can hold the expected netdev somewhere in the SKB and only if we don't
> Go out of the expected netdev go to a slow path that does a check for each protocol.

This could be a good switch, yes.

Thanks,
Marcelo

^ permalink raw reply

* Re: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure.
From: Marcelo Ricardo Leitner @ 2017-12-19 15:11 UTC (permalink / raw)
  To: Ilya Lesokhin
  Cc: netdev@vger.kernel.org, davem@davemloft.net, davejwatson@fb.com,
	tom@herbertland.com, hannes@stressinduktion.org, Boris Pismenny,
	Aviad Yehezkel, Liran Liss
In-Reply-To: <AM4PR0501MB27235B47E8E4DDA2B9E8B757D40F0@AM4PR0501MB2723.eurprd05.prod.outlook.com>

On Tue, Dec 19, 2017 at 07:31:24AM +0000, Ilya Lesokhin wrote:
> On Mon, Monday, December 18, 2017 9:54 PM, Marcelo Ricardo Leitner wrote:
> 
> > On Mon, Dec 18, 2017 at 01:10:33PM +0200, Ilya Lesokhin wrote:
> > > This patch adds a generic infrastructure to offload TLS crypto to a
> > > network devices. It enables the kernel TLS socket to skip encryption
> > > and authentication operations on the transmit side of the data path.
> > > Leaving those computationally expensive operations to the NIC.
> > 
> > I have a hard time understanding why this was named 'tls_device' if no
> > net_device's are registered.
> > 
> I'm not quite sure what you mean by "no net_device's are registered"
> Presumably you mean there is no device that implements the 
> NETIF_F_HW_TLS_TX capability yet.

Not really. Let me try again. This patchset is using the expression
"tls_device". When I read that, I expect a new interface type, like a
tunnel, that would be created on top of another interface that has the
offloading capability. That's why I'm confused. IMHO "tls_offload" is
a better fit. Makes sense?

> I'll just say that the IPSEC device offload infrastructure was also submitted
> https://github.com/torvalds/linux/commit/d77e38e612a017480157fe6d2c1422f42cb5b7e3
> before the first implementation
> https://github.com/torvalds/linux/commit/bebb23e6cb02d2fc752905e39d09ff6152852c6c
> 
> And we did provide a link to an implementation 
> https://github.com/Mellanox/tls-offload/tree/tls_device_v3
> for people who want to take a look.
> Unfortunately it is not ready for upstream submission yet

Yep, although I still have to get there.

Thanks,
Marcelo

^ permalink raw reply

* Re: [patch iproute2] tc: add -bs option for batch mode
From: Stephen Hemminger @ 2017-12-19 15:15 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev, gerlitz.or
In-Reply-To: <20171219063346.19300-1-chrism@mellanox.com>

On Tue, 19 Dec 2017 15:33:46 +0900
Chris Mi <chrism@mellanox.com> wrote:

> Currently in tc batch mode, only one command is read from the batch
> file and sent to kernel to process. With this patch, we can accumulate
> several commands before sending to kernel. The batch size is specified
> using option -bs or -batchsize.
> 
> To accumulate the commands in tc, we allocate an array of struct iovec.
> If batchsize is bigger than 1 and we haven't accumulated enough
> commands, rtnl_talk() will return without actually sending the message.
> One exception is that there is no more command in the batch file.
> 
> But please note that kernel still processes the requests one by one.
> To process the requests in parallel in kernel is another effort.
> The time we're saving in this patch is the user mode and kernel mode
> context switch. So this patch works on top of the current kernel.
> 
> Using the following script in kernel, we can generate 1,000,000 rules.
> 	tools/testing/selftests/tc-testing/tdc_batch.py
> 
> Without this patch, 'tc -b $file' exection time is:
> 
> real    0m14.916s
> user    0m6.808s
> sys     0m8.046s
> 
> With this patch, 'tc -b $file -bs 10' exection time is:
> 
> real    0m12.286s
> user    0m5.903s
> sys     0m6.312s
> 
> The insertion rate is improved more than 10%.
> 
> Signed-off-by: Chris Mi <chrism@mellanox.com>

I think this is probably optmizing for an unrealistic use case.
If there are 1M rules then it should be managed by a dynmaic process
like a routing daemon with a real database and API. Such a daemon
would talk to kernel directly. Plus at scale netlink gets overwhelmed
and the daemon has to handle resync.

Not convinced that the added complexity and error handling issues
warrant the change. The batch mode already sucks at handling
errors.

^ permalink raw reply

* Re: [PATCH v7 2/3] sock: Move the socket inuse to namespace.
From: David Miller @ 2017-12-19 15:19 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: xiangxia.m.yue, netdev
In-Reply-To: <CAM_iQpXhuQ03LB+DsVU4z=m5BCOWKTP1j1oUnBX7-rWC=z_oSA@mail.gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 18 Dec 2017 13:38:39 -0800

> On Mon, Dec 18, 2017 at 11:30 AM, David Miller <davem@davemloft.net> wrote:
>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>> Date: Thu, 14 Dec 2017 05:51:58 -0800
>>
>>> In some case, we want to know how many sockets are in use in
>>> different _net_ namespaces. It's a key resource metric.
>>
>> Useful or not, you're not exporting this value.
>>
>> All this patch series does is convert the existing export of the
>> global tally to add up the per-net values.
>>
>> So if you're not exporting the per-net value on it's own in any way,
>> this patch series isn't achieving the stated goal.
>>
>> I'm not applying this series, sorry.
> 
> 
> This value is already exported via procfs:
> sockstat_seq_show() -> socket_seq_show().
> 
> And the proc file itself should already be per-net:
> 
> static int sockstat_seq_open(struct inode *inode, struct file *file)
> {
>         return single_open_net(inode, file, sockstat_seq_show);
> }
> 
> 
> This patch just makes that value to be per-net too.

Ok, I've applied this patch series.

Thanks for your patience and understanding.

^ permalink raw reply

* Re: [PATCH -tip v3 3/6] net: sctp: Add SCTP ACK tracking trace event
From: Steven Rostedt @ 2017-12-19 15:20 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Ian McDonald, Vlad Yasevich, Stephen Hemminger,
	Peter Zijlstra, Thomas Gleixner, LKML, H . Peter Anvin,
	Gerrit Renker, David S . Miller, Neil Horman, dccp, netdev,
	linux-sctp, Stephen Rothwell
In-Reply-To: <151367390483.32364.4432870291926458046.stgit@devbox>

On Tue, 19 Dec 2017 17:58:25 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> +TRACE_EVENT(sctp_probe,
> +
> +	TP_PROTO(const struct sctp_endpoint *ep,
> +		 const struct sctp_association *asoc,
> +		 struct sctp_chunk *chunk),
> +
> +	TP_ARGS(ep, asoc, chunk),
> +
> +	TP_STRUCT__entry(
> +		__field(__u64, asoc)
> +		__field(__u32, mark)
> +		__field(__u16, bind_port)
> +		__field(__u16, peer_port)
> +		__field(__u32, pathmtu)
> +		__field(__u32, rwnd)
> +		__field(__u16, unack_data)
> +	),
> +
> +	TP_fast_assign(
> +		struct sctp_transport *sp;
> +		struct sk_buff *skb = chunk->skb;
> +
> +		__entry->asoc = (__u64)asoc;
> +		__entry->mark = skb->mark;
> +		__entry->bind_port = ep->base.bind_addr.port;
> +		__entry->peer_port = asoc->peer.port;
> +		__entry->pathmtu = asoc->pathmtu;
> +		__entry->rwnd = asoc->peer.rwnd;
> +		__entry->unack_data = asoc->unack_data;
> +
> +		if (trace_sctp_probe_path_enabled()) {
> +			list_for_each_entry(sp, &asoc->peer.transport_addr_list,
> +					    transports) {
> +				trace_sctp_probe_path(sp, asoc);
> +			}
> +		}

I thought you were going to move this into the code, like I suggested?

-- Steve

> +	),
> +
> +	TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d "
> +		  "rwnd=%u unack_data=%d",
> +		  __entry->asoc, __entry->mark, __entry->bind_port,
> +		  __entry->peer_port, __entry->pathmtu, __entry->rwnd,
> +		  __entry->unack_data)
> +);
> +

^ permalink raw reply

* Re: [patch iproute2] tc: add -bs option for batch mode
From: Stephen Hemminger @ 2017-12-19 15:22 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev, gerlitz.or
In-Reply-To: <20171219063346.19300-1-chrism@mellanox.com>

On Tue, 19 Dec 2017 15:33:46 +0900
Chris Mi <chrism@mellanox.com> wrote:

> Currently in tc batch mode, only one command is read from the batch
> file and sent to kernel to process. With this patch, we can accumulate
> several commands before sending to kernel. The batch size is specified
> using option -bs or -batchsize.
> 
> To accumulate the commands in tc, we allocate an array of struct iovec.
> If batchsize is bigger than 1 and we haven't accumulated enough
> commands, rtnl_talk() will return without actually sending the message.
> One exception is that there is no more command in the batch file.
> 
> But please note that kernel still processes the requests one by one.
> To process the requests in parallel in kernel is another effort.
> The time we're saving in this patch is the user mode and kernel mode
> context switch. So this patch works on top of the current kernel.
> 
> Using the following script in kernel, we can generate 1,000,000 rules.
> 	tools/testing/selftests/tc-testing/tdc_batch.py
> 
> Without this patch, 'tc -b $file' exection time is:
> 
> real    0m14.916s
> user    0m6.808s
> sys     0m8.046s
> 
> With this patch, 'tc -b $file -bs 10' exection time is:
> 
> real    0m12.286s
> user    0m5.903s
> sys     0m6.312s
> 
> The insertion rate is improved more than 10%.
> 
> Signed-off-by: Chris Mi <chrism@mellanox.com>
> ---
>  include/libnetlink.h | 27 ++++++++++++++++
>  include/utils.h      |  8 +++++
>  lib/libnetlink.c     | 30 +++++++++++++-----
>  lib/utils.c          | 20 ++++++++++++
>  man/man8/tc.8        |  5 +++
>  tc/m_action.c        | 63 ++++++++++++++++++++++++++++---------
>  tc/tc.c              | 27 ++++++++++++++--
>  tc/tc_common.h       |  3 ++
>  tc/tc_filter.c       | 89 ++++++++++++++++++++++++++++++++++++----------------
>  9 files changed, 221 insertions(+), 51 deletions(-)

In addition to my earlier comments, these are the implementation
issues.

> 
> diff --git a/include/libnetlink.h b/include/libnetlink.h
> index a4d83b9e..07e88c94 100644
> --- a/include/libnetlink.h
> +++ b/include/libnetlink.h
> @@ -13,6 +13,8 @@
>  #include <linux/netconf.h>
>  #include <arpa/inet.h>
>  
> +#define MSG_IOV_MAX 256
> +
>  struct rtnl_handle {
>  	int			fd;
>  	struct sockaddr_nl	local;
> @@ -93,6 +95,31 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth,
>  			void *arg, __u16 nc_flags);
>  #define rtnl_dump_filter(rth, filter, arg) \
>  	rtnl_dump_filter_nc(rth, filter, arg, 0)
> +
> +extern int msg_iov_index;
> +static inline int get_msg_iov_index(void)
> +{
> +	return msg_iov_index;
> +}
> +static inline void set_msg_iov_index(int index)
> +{
> +	msg_iov_index = index;
> +}
> +static inline void incr_msg_iov_index(void)
> +{
> +	++msg_iov_index;
> +}
> +
> +extern int batch_size;
> +static inline int get_batch_size(void)
> +{
> +	return batch_size;
> +}
> +static inline void set_batch_size(int size)
> +{
> +	batch_size = size;
> +}

Iproute2 is C not C++; no accessors for every variable. 


>  int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
>  	      struct nlmsghdr **answer)
>  	__attribute__((warn_unused_result));
> diff --git a/include/utils.h b/include/utils.h
> index d3895d56..66cb4747 100644
> --- a/include/utils.h
> +++ b/include/utils.h
> @@ -235,6 +235,14 @@ void print_nlmsg_timestamp(FILE *fp, const struct nlmsghdr *n);
>  
>  extern int cmdlineno;
>  ssize_t getcmdline(char **line, size_t *len, FILE *in);
> +
> +extern int cmdlinetotal;
> +static inline int getcmdlinetotal(void)
> +{
> +	return cmdlinetotal;
> +}
> +void setcmdlinetotal(const char *name);
> +
>  int makeargs(char *line, char *argv[], int maxargs);
>  int inet_get_addr(const char *src, __u32 *dst, struct in6_addr *dst6);
>  
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index 00e6ce0c..f9be1c6d 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -24,6 +24,7 @@
>  #include <sys/uio.h>
>  
>  #include "libnetlink.h"
> +#include "utils.h"
>  
>  #ifndef SOL_NETLINK
>  #define SOL_NETLINK 270
> @@ -581,6 +582,10 @@ static void rtnl_talk_error(struct nlmsghdr *h, struct nlmsgerr *err,
>  		strerror(-err->error));
>  }
>  
> +static struct iovec msg_iov[MSG_IOV_MAX];
> +int msg_iov_index;
> +int batch_size = 1;
> +
>  static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
>  		       struct nlmsghdr **answer,
>  		       bool show_rtnl_err, nl_ext_ack_fn_t errfn)
> @@ -589,23 +594,34 @@ static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
>  	unsigned int seq;
>  	struct nlmsghdr *h;
>  	struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
> -	struct iovec iov = {
> -		.iov_base = n,
> -		.iov_len = n->nlmsg_len
> -	};
> +	struct iovec *iov = &msg_iov[get_msg_iov_index()];
> +	int index;
> +	char *buf;
> +
> +	iov->iov_base = n;
> +	iov->iov_len = n->nlmsg_len;
> +
> +	incr_msg_iov_index();
>  	struct msghdr msg = {
>  		.msg_name = &nladdr,
>  		.msg_namelen = sizeof(nladdr),
> -		.msg_iov = &iov,
> -		.msg_iovlen = 1,
> +		.msg_iov = msg_iov,
> +		.msg_iovlen = get_msg_iov_index(),
>  	};
> -	char *buf;
>  
>  	n->nlmsg_seq = seq = ++rtnl->seq;
>  
>  	if (answer == NULL)
>  		n->nlmsg_flags |= NLM_F_ACK;

Your real performance win is just not asking for ACK for every rule.
If you switched to pure streaming mode, then the batching wouldn't
be necessary.

>  
> +	index = get_msg_iov_index() % get_batch_size();
> +	if (index != 0 && get_batch_size() > 1 &&
> +	    cmdlineno != getcmdlinetotal() &&
> +	    (n->nlmsg_type == RTM_NEWTFILTER ||
> +	    n->nlmsg_type == RTM_NEWACTION))
> +		return 3;
> +	set_msg_iov_index(index);
> +
>  	status = sendmsg(rtnl->fd, &msg, 0);
>  	if (status < 0) {
>  		perror("Cannot talk to rtnetlink");
> diff --git a/lib/utils.c b/lib/utils.c
> index 7ced8c06..53ca389f 100644
> --- a/lib/utils.c
> +++ b/lib/utils.c
> @@ -1202,6 +1202,26 @@ ssize_t getcmdline(char **linep, size_t *lenp, FILE *in)
>  	return cc;
>  }
>  
> +int cmdlinetotal;
> +
> +void setcmdlinetotal(const char *name)
> +{
> +	char *line = NULL;
> +	size_t len = 0;
> +
> +	if (name && strcmp(name, "-") != 0) {
> +		if (freopen(name, "r", stdin) == NULL) {
> +			fprintf(stderr, "Cannot open file \"%s\" for reading: %s\n",
> +				name, strerror(errno));
> +			return;
> +		}
> +	}
> +
> +	cmdlinetotal = 0;
> +	while (getcmdline(&line, &len, stdin) != -1)
> +		cmdlinetotal++;
> +}
> +
>  /* split command line into argument vector */
>  int makeargs(char *line, char *argv[], int maxargs)
>  {
> diff --git a/man/man8/tc.8 b/man/man8/tc.8
> index ff071b33..de137e16 100644
> --- a/man/man8/tc.8
> +++ b/man/man8/tc.8
> @@ -601,6 +601,11 @@ must exist already.
>  read commands from provided file or standard input and invoke them.
>  First failure will cause termination of tc.
>  
> +.TP
> +.BR "\-bs", " \-bs size", " \-batchsize", " \-batchsize size"
> +How many commands are accumulated before sending to kernel.
> +By default, it is 1. It only takes effect in batch mode.
> +
>  .TP
>  .BR "\-force"
>  don't terminate tc on errors in batch mode.
> diff --git a/tc/m_action.c b/tc/m_action.c
> index 13f942bf..574b78ca 100644
> --- a/tc/m_action.c
> +++ b/tc/m_action.c
> @@ -23,6 +23,7 @@
>  #include <arpa/inet.h>
>  #include <string.h>
>  #include <dlfcn.h>
> +#include <errno.h>
>  
>  #include "utils.h"
>  #include "tc_common.h"
> @@ -546,33 +547,65 @@ bad_val:
>  	return ret;
>  }
>  
> +typedef struct {
> +	struct nlmsghdr		n;
> +	struct tcamsg		t;
> +	char			buf[MAX_MSG];
> +} tc_action_req;
> +
> +static tc_action_req *action_reqs;
> +
> +void free_action_reqs(void)
> +{
> +	if (action_reqs)
> +		free(action_reqs);
> +}
> +
> +static tc_action_req *get_action_req(void)
> +{
> +	tc_action_req *req;
> +
> +	if (action_reqs == NULL) {
> +		action_reqs = malloc(get_batch_size() * sizeof (tc_action_req));
> +		if (action_reqs == NULL)
> +			return NULL;
> +	}
> +	req = &action_reqs[get_msg_iov_index()];
> +	memset(req, 0, sizeof (*req));
> +
> +	return req;
> +}
> +
>  static int tc_action_modify(int cmd, unsigned int flags,
>  			    int *argc_p, char ***argv_p)
>  {
>  	int argc = *argc_p;
>  	char **argv = *argv_p;
>  	int ret = 0;
> -	struct {
> -		struct nlmsghdr         n;
> -		struct tcamsg           t;
> -		char                    buf[MAX_MSG];
> -	} req = {
> -		.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)),
> -		.n.nlmsg_flags = NLM_F_REQUEST | flags,
> -		.n.nlmsg_type = cmd,
> -		.t.tca_family = AF_UNSPEC,
> -	};
> -	struct rtattr *tail = NLMSG_TAIL(&req.n);
> +	tc_action_req *req;
> +
> +	req = get_action_req();
> +	if (req == NULL) {
> +		fprintf(stderr, "get_action_req error: not enough buffer\n");
> +		return -ENOMEM;
> +	}
> +
> +	req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg));
> +	req->n.nlmsg_flags = NLM_F_REQUEST | flags;
> +	req->n.nlmsg_type = cmd;
> +	req->t.tca_family = AF_UNSPEC;
> +	struct rtattr *tail = NLMSG_TAIL(&req->n);
>  
>  	argc -= 1;
>  	argv += 1;
> -	if (parse_action(&argc, &argv, TCA_ACT_TAB, &req.n)) {
> +	if (parse_action(&argc, &argv, TCA_ACT_TAB, &req->n)) {
>  		fprintf(stderr, "Illegal \"action\"\n");
>  		return -1;
>  	}
> -	tail->rta_len = (void *) NLMSG_TAIL(&req.n) - (void *) tail;
> +	tail->rta_len = (void *) NLMSG_TAIL(&req->n) - (void *) tail;
>  
> -	if (rtnl_talk(&rth, &req.n, NULL) < 0) {
> +	ret = rtnl_talk(&rth, &req->n, NULL);
> +	if (ret < 0) {
>  		fprintf(stderr, "We have an error talking to the kernel\n");
>  		ret = -1;
>  	}
> @@ -739,5 +772,5 @@ int do_action(int argc, char **argv)
>  			return -1;
>  	}
>  
> -	return 0;
> +	return ret;
>  }
> diff --git a/tc/tc.c b/tc/tc.c
> index ad9f07e9..9c8e828b 100644
> --- a/tc/tc.c
> +++ b/tc/tc.c
> @@ -189,7 +189,7 @@ static void usage(void)
>  	fprintf(stderr, "Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }\n"
>  			"       tc [-force] -batch filename\n"
>  			"where  OBJECT := { qdisc | class | filter | action | monitor | exec }\n"
> -	                "       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] | -n[etns] name |\n"
> +	                "       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] | -bs | -batchsize [size] | -n[etns] name |\n"
>  			"                    -nm | -nam[es] | { -cf | -conf } path } | -j[son]\n");
>  }
>  
> @@ -223,6 +223,9 @@ static int batch(const char *name)
>  	size_t len = 0;
>  	int ret = 0;
>  
> +	if (get_batch_size() > 1)
> +		setcmdlinetotal(name);
> +
>  	batch_mode = 1;
>  	if (name && strcmp(name, "-") != 0) {
>  		if (freopen(name, "r", stdin) == NULL) {
> @@ -248,15 +251,22 @@ static int batch(const char *name)
>  		if (largc == 0)
>  			continue;	/* blank line */
>  
> -		if (do_cmd(largc, largv)) {
> +		ret = do_cmd(largc, largv);
> +		if (ret != 0 && ret != 3) {
>  			fprintf(stderr, "Command failed %s:%d\n", name, cmdlineno);
>  			ret = 1;
>  			if (!force)
>  				break;
>  		}
>  	}
> +	if (ret == 3)
> +		fprintf(stderr,
> +			"Command is not sent to kernel due to internal error %s:%d\n",
> +			name, cmdlineno);
>  	if (line)
>  		free(line);
> +	free_filter_reqs();
> +	free_action_reqs();
>  
>  	rtnl_close(&rth);
>  	return ret;
> @@ -267,6 +277,7 @@ int main(int argc, char **argv)
>  {
>  	int ret;
>  	char *batch_file = NULL;
> +	int size;
>  
>  	while (argc > 1) {
>  		if (argv[1][0] != '-')
> @@ -297,6 +308,16 @@ int main(int argc, char **argv)
>  			if (argc <= 1)
>  				usage();
>  			batch_file = argv[1];
> +		} else if (matches(argv[1], "-batchsize") == 0 ||
> +				matches(argv[1], "-bs") == 0) {
> +			argc--;	argv++;
> +			if (argc <= 1)
> +				usage();
> +			size = atoi(argv[1]);
> +			if (size > MSG_IOV_MAX)
> +				set_batch_size(MSG_IOV_MAX);
> +			else
> +				set_batch_size(size);
>  		} else if (matches(argv[1], "-netns") == 0) {
>  			NEXT_ARG();
>  			if (netns_switch(argv[1]))
> @@ -342,6 +363,8 @@ int main(int argc, char **argv)
>  	}
>  
>  	ret = do_cmd(argc-1, argv+1);
> +	free_filter_reqs();
> +	free_action_reqs();
>  Exit:
>  	rtnl_close(&rth);
>  
> diff --git a/tc/tc_common.h b/tc/tc_common.h
> index 264fbdac..fde8db1b 100644
> --- a/tc/tc_common.h
> +++ b/tc/tc_common.h
> @@ -24,5 +24,8 @@ struct tc_sizespec;
>  extern int parse_size_table(int *p_argc, char ***p_argv, struct tc_sizespec *s);
>  extern int check_size_table_opts(struct tc_sizespec *s);
>  
> +extern void free_filter_reqs(void);
> +extern void free_action_reqs(void);
> +
>  extern int show_graph;
>  extern bool use_names;
> diff --git a/tc/tc_filter.c b/tc/tc_filter.c
> index 545cc3a1..dc53c128 100644
> --- a/tc/tc_filter.c
> +++ b/tc/tc_filter.c
> @@ -19,6 +19,7 @@
>  #include <arpa/inet.h>
>  #include <string.h>
>  #include <linux/if_ether.h>
> +#include <errno.h>
>  
>  #include "rt_names.h"
>  #include "utils.h"
> @@ -42,18 +43,51 @@ static void usage(void)
>  		"OPTIONS := ... try tc filter add <desired FILTER_KIND> help\n");
>  }
>  
> +typedef struct {
> +	struct nlmsghdr		n;
> +	struct tcmsg		t;
> +	char			buf[MAX_MSG];
> +} tc_filter_req;
> +
> +static tc_filter_req *filter_reqs;
> +
> +void free_filter_reqs(void)
> +{
> +	if (filter_reqs)
> +		free(filter_reqs);
> +}

You don't need to check for NULL when calling free.
free(NULL) is a nop. 

^ permalink raw reply

* Re: [PATCH] net: arc_emac: fix arc_emac_rx() error paths
From: David Miller @ 2017-12-19 15:22 UTC (permalink / raw)
  To: al.kochet; +Cc: netdev, linux-kernel, f.fainelli, edumazet
In-Reply-To: <1513358406-32503-1-git-send-email-al.kochet@gmail.com>

From: Alexander Kochetkov <al.kochet@gmail.com>
Date: Fri, 15 Dec 2017 20:20:06 +0300

> arc_emac_rx() has some issues found by code review.
> 
> In case netdev_alloc_skb_ip_align() or dma_map_single() failure
> rx fifo entry will not be returned to EMAC.
> 
> In case dma_map_single() failure previously allocated skb became
> lost to driver. At the same time address of newly allocated skb
> will not be provided to EMAC.
> 
> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>

This patch adds quite a few bugs, here is one which shows this is not
functionally tested:

> -		/* receive_skb only if new skb was allocated to avoid holes */
> -		netif_receive_skb(skb);
> -
> -		addr = dma_map_single(&ndev->dev, (void *)rx_buff->skb->data,
> +		addr = dma_map_single(&ndev->dev, (void *)skb->data,
>  				      EMAC_BUFFER_SIZE, DMA_FROM_DEVICE);

Map the new SKB.

>  		if (dma_mapping_error(&ndev->dev, addr)) {
>  			if (net_ratelimit())
> -				netdev_err(ndev, "cannot dma map\n");
> -			dev_kfree_skb(rx_buff->skb);
> +				netdev_err(ndev, "cannot map dma buffer\n");
> +			dev_kfree_skb(skb);
> +			/* Return ownership to EMAC */
> +			rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
>  			stats->rx_errors++;
> +			stats->rx_dropped++;
>  			continue;
>  		}
> +
> +		/* unmap previosly mapped skb */
> +		dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
> +				 dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);

And then you unmap it.  "addr" is the new DMA mapping, not the old one.

^ permalink raw reply

* Re: [PATCH] tg3: Fix rx hang on MTU change with 5717/5719
From: David Miller @ 2017-12-19 15:25 UTC (permalink / raw)
  To: brking
  Cc: siva.kallam, prashant, mchan, benjamin.kun, netdev, maurosr,
	muvic, brking, stable
In-Reply-To: <1513372910-32121-1-git-send-email-brking@linux.vnet.ibm.com>

From: Brian King <brking@linux.vnet.ibm.com>
Date: Fri, 15 Dec 2017 15:21:50 -0600

> This fixes a hang issue seen when changing the MTU size from 1500 MTU
> to 9000 MTU on both 5717 and 5719 chips. In discussion with Broadcom,
> they've indicated that these chipsets have the same phy as the 57766
> chipset, so the same workarounds apply. This has been tested by IBM
> on both Power 8 and Power 9 systems as well as by Broadcom on x86
> hardware and has been confirmed to resolve the hang issue.
> 
> Cc: stable <stable@vger.kernel.org>
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>

Applied and queued up for -stable.

Please do not add explicit CC: stable tags for networking patches, as
documented in Documentation/networking/netdev-FAQ.txt

Thank you.

^ permalink raw reply

* Re: Linux ECN Handling
From: Neal Cardwell @ 2017-12-19 15:28 UTC (permalink / raw)
  To: Steve Ibanez
  Cc: Eric Dumazet, Yuchung Cheng, Daniel Borkmann, Netdev,
	Florian Westphal, Mohammad Alizadeh, Lawrence Brakmo
In-Reply-To: <CACJspmKJkHQ+yMO5JCjc8PiZ4VeZd_vKbUEUqzX-MmdXKJFAZA@mail.gmail.com>

On Tue, Dec 19, 2017 at 12:16 AM, Steve Ibanez <sibanez@stanford.edu> wrote:
>
> Hi Neal,
>
> I started looking into this receiver ACKing issue today. Strangely,
> when I tried adding printk statements at the top of the
> tcp_v4_do_rcv(), tcp_rcv_established(), __tcp_ack_snd_check() and
> tcp_send_delayed_ack() functions they were never executed on the
> machine running the iperf3 server (i.e. the destination of the flows).
> Maybe the iperf3 server is using its own TCP stack?
>
> In any case, the ACKing problem is reproducible using just normal
> iperf for which I do see my printk statements being executed. I can
> now confirm that when the CWR marked packet (for which no ACK is sent)
> arrives at the receiver, the __tcp_ack_snd_check() function is never
> invoked; and hence neither is the tcp_send_delayed_ack() function.
> Hopefully this helps narrow down where the issue might be? I started
> adding some printk statements into the tcp_rcv_established() function,
> but I'm not sure where the best places to look would be so I wanted to
> ask your advice on this.

Thanks for the detailed report!

As a next step to narrow down why the CWR-marked packet is not acked,
I would suggest adding printk statements at the bottom of
tcp_rcv_established() in all the spots where we have a goto or return
that would cause us to short-circuit and not reach the
tcp_ack_snd_check() at the bottom of the function. This could be much
like your existing nice debugging printks, that log any time we get to
that spot and if (sk->sk_family == AF_INET && th->cwr). And these
could be in the following spots (marked "here"):

slow_path:
        if (len < (th->doff << 2) || tcp_checksum_complete(skb))
                goto csum_error;                /* <=== here */

        if (!th->ack && !th->rst && !th->syn)
                goto discard;                /* <=== here */

        /*
         *      Standard slow path.
         */

        if (!tcp_validate_incoming(sk, skb, th, 1))
                return;                /* <=== here */

step5:
        if (tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT) < 0)
                goto discard;                /* <=== here */


thanks,
neal

^ permalink raw reply

* Re: [PATCH v2,net-next] ip6_gre: fix a pontential issue in ip6erspan_rcv
From: David Miller @ 2017-12-19 15:34 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, yoshfuji, netdev, linux-kernel, u9012063
In-Reply-To: <1513391125-28227-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 16 Dec 2017 10:25:25 +0800

> pskb_may_pull() can change skb->data, so we need to load ipv6h/ershdr at
> the right place.
> 
> Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
> Acked-by: William Tu <u9012063@gmail.com>
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

This patch does not apply:

> +	ipv6h = ipv6_hdr(skb);
> +	ershdr = (struct erspan_base_hdr *)skb->data;
>  	ver = (ntohs(ershdr->ver_vlan) & VER_MASK) >> VER_OFFSET;
>  	tpi->key = cpu_to_be32(ntohs(ershdr->session_id) & ID_MASK);
>  	pkt_md = (struct erspan_metadata *)(ershdr + 1);

There is not "pkt_md = ..." assignment in net-next on this line.

^ permalink raw reply

* Re: [PATCH v2,net-next 1/2] ip_gre: fix potential memory leak in erspan_rcv
From: David Miller @ 2017-12-19 15:36 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, yoshfuji, netdev, linux-kernel, u9012063
In-Reply-To: <1513392519-30127-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 16 Dec 2017 10:48:38 +0800

> If md is NULL, tun_dst must be freed, otherwise it will cause memory
> leak.
> 
> Fixes: 1a66a836da6 ("gre: add collect_md mode to ERSPAN tunnel")
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> 
> Change since v2:
>   * Rebase on latest master branch.
>   * Correct wrong fix information.

Please do not put a changelog after the fixes and signoff tags, those tags must
appear last in the commit message.

Thank you.

^ permalink raw reply

* Re: [PATCH v2,net-next 2/2] ip6_gre: fix potential memory leak in ip6erspan_rcv
From: David Miller @ 2017-12-19 15:36 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, yoshfuji, netdev, linux-kernel, u9012063
In-Reply-To: <1513392519-30127-2-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 16 Dec 2017 10:48:39 +0800

> If md is NULL, tun_dst must be freed, otherwise it will cause memory
> leak.
> 
> Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> 
> Change since v2:
>   * Rebase on latest master branch.
>   * Correct wrong fix information.

Likewise, please place the tags at the end of the commit message.

^ permalink raw reply

* RE: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure.
From: Ilya Lesokhin @ 2017-12-19 15:38 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev@vger.kernel.org, davem@davemloft.net, davejwatson@fb.com,
	tom@herbertland.com, hannes@stressinduktion.org, Boris Pismenny,
	Aviad Yehezkel, Liran Liss
In-Reply-To: <20171219151138.GD6122@localhost.localdomain>

Tuesday, December 19, 2017 5:12 PM, Marcelo Ricardo Leitner wrote:

> > I'm not quite sure what you mean by "no net_device's are registered"
> > Presumably you mean there is no device that implements the
> > NETIF_F_HW_TLS_TX capability yet.
> 
> Not really. Let me try again. This patchset is using the expression "tls_device".
> When I read that, I expect a new interface type, like a tunnel, that would be
> created on top of another interface that has the offloading capability. That's
> why I'm confused. IMHO "tls_offload" is a better fit. Makes sense?
> 

We don't expose a new interface. An existing netdev does the offload.

The xfrm layer also calls the offload layer xfrm_device and It also doesn't need to
add another interface to offload ipsec to a netdev.

I thought about calling it tls_hw or tls_hw_offload.
The problem is that the important distinction here is that the 
offload is done by a netdev.
tls_sw can also use hw offload if you have the required 
memory to memory crypto engine and crypto_alloc_aead("gcm(aes)", 0, 0); 
decides on using it.

^ permalink raw reply

* Re: [RFC PATCH] virtio_net: Extend virtio to use VF datapath when available
From: Michael S. Tsirkin @ 2017-12-19 15:47 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: stephen, netdev, virtualization, alexander.duyck
In-Reply-To: <1513644036-45230-1-git-send-email-sridhar.samudrala@intel.com>

I'll need to look at this more, in particular the feature
bit is missing here. For now one question:

On Mon, Dec 18, 2017 at 04:40:36PM -0800, Sridhar Samudrala wrote:
> @@ -56,6 +58,8 @@ module_param(napi_tx, bool, 0644);
>   */
>  DECLARE_EWMA(pkt_len, 0, 64)
>  
> +#define VF_TAKEOVER_INT	(HZ / 10)
> +
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>  
>  static const unsigned long guest_offloads[] = {

Why is this delay necessary? And why by 100ms?

^ permalink raw reply

* Re: [PATCH v4 net-next 1/6] lwt: Add net to build_state argument
From: Roopa Prabhu @ 2017-12-19 15:47 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev, rohit
In-Reply-To: <20171215182800.10248-2-tom@quantonium.net>

On Fri, Dec 15, 2017 at 10:27 AM, Tom Herbert <tom@quantonium.net> wrote:
> Users of LWT need to know net if they want to have per net operations
> in LWT.
>
> Signed-off-by: Tom Herbert <tom@quantonium.net>

Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH] net: arc_emac: fix arc_emac_rx() error paths
From: Alexander Kochetkov @ 2017-12-19 15:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, f.fainelli, edumazet
In-Reply-To: <20171219.102254.1959679351283055093.davem@davemloft.net>

> 19 дек. 2017 г., в 18:22, David Miller <davem@davemloft.net> написал(а):
> 
> From: Alexander Kochetkov <al.kochet@gmail.com>
> Date: Fri, 15 Dec 2017 20:20:06 +0300
> 
>> arc_emac_rx() has some issues found by code review.
>> 
>> In case netdev_alloc_skb_ip_align() or dma_map_single() failure
>> rx fifo entry will not be returned to EMAC.
>> 
>> In case dma_map_single() failure previously allocated skb became
>> lost to driver. At the same time address of newly allocated skb
>> will not be provided to EMAC.
>> 
>> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
> 
> This patch adds quite a few bugs, here is one which shows this is not
> functionally tested:

May be I don’t understand correctly, but I don’t see bug here.

Wrong dma mapping usage should immediately manifest itself (kernel
instability, koops and so on). The patch was tested on rk3188 and work fine for me.
Also I did simulations of netdev_alloc_skb_ip_align() and dma_map_single()
faults and can confirm that new error handling work.

Could someone else with ARC EMAC test the patch? I would be very grateful for the help.
Florian or Eric, can you test it on your hardware?

>> +
>> +		/* unmap previosly mapped skb */
>> +		dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
>> +				 dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
> 
> And then you unmap it.  "addr" is the new DMA mapping, not the old one.

The old mapping should be unmapped here. It refer to old skb what contains already
received packet. Because buffer doesn’t belong to EMAC anymore.
That old mapping point to old skb buffer. And netif_receive_skb() will be
called for old skb after that.

The new mapping «addr" will be unmapped then EMAC receive
new packet. Later by the code (after netif_receive_skb()) there are lines for
saving «addr» value:

    rx_buff->skb = skb;
    dma_unmap_addr_set(rx_buff, addr, addr);
    dma_unmap_len_set(rx_buff, len, EMAC_BUFFER_SIZE);

Regards,
Alexander.

^ permalink raw reply

* Re: [PATCH net-next v5 0/5] Introduce NETIF_F_GRO_HW
From: David Miller @ 2017-12-19 15:50 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, andrew.gospodarek
In-Reply-To: <1513411784-17653-1-git-send-email-michael.chan@broadcom.com>

From: Michael Chan <michael.chan@broadcom.com>
Date: Sat, 16 Dec 2017 03:09:39 -0500

> Introduce NETIF_F_GRO_HW feature flag and convert drivers that support
> hardware GRO to use the new flag.

Series applied, thanks for following through with this work.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox