Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH iproute2 net-next v1 0/7] tipc: updates for neighbour monitor
From: Stephen Hemminger @ 2016-09-20 16:39 UTC (permalink / raw)
  To: Parthasarathy Bhuvaragan; +Cc: netdev, tipc-discussion, jon.maloy, maloy
In-Reply-To: <1473693441-14254-1-git-send-email-parthasarathy.bhuvaragan@ericsson.com>

On Mon, 12 Sep 2016 17:17:14 +0200
Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> wrote:

> We add configuration support for the new link monitoring attributes.
> 
> Parthasarathy Bhuvaragan (7):
>   tipc: remove dead code
>   tipc: add link monitor set threshold
>   tipc: add link monitor get threshold
>   tipc: add link monitor summary
>   tipc: refractor bearer to facilitate link monitor
>   tipc: add link monitor list
>   tipc: update man page for link monitor
> 
>  man/man8/tipc-link.8 | 104 +++++++++++++
>  tipc/bearer.c        |  75 ++++++----
>  tipc/bearer.h        |   4 +
>  tipc/link.c          | 408 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 556 insertions(+), 35 deletions(-)
> 

Applied, then I did a style scrub on tipc/link.c

^ permalink raw reply

* Re: [PATCH v4 net-next 01/16] tcp: cdg: rename struct minmax in tcp_cdg.c to avoid a naming conflict
From: Kenneth Klette Jonassen @ 2016-09-20 16:40 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David Miller, netdev, Soheil Hassas Yeganeh, Yuchung Cheng,
	Eric Dumazet, Kenneth Klette Jonassen
In-Reply-To: <1474342763-16715-2-git-send-email-ncardwell@google.com>

On Tue, Sep 20, 2016 at 5:39 AM, Neal Cardwell <ncardwell@google.com> wrote:
>
> From: Soheil Hassas Yeganeh <soheil@google.com>
>
> The upcoming change "lib/win_minmax: windowed min or max estimator"
> introduces a struct called minmax, which is then included in
> include/linux/tcp.h in the upcoming change "tcp: use windowed min
> filter library for TCP min_rtt estimation". This would create a
> compilation error for tcp_cdg.c, which defines its own minmax
> struct. To avoid this naming conflict (and potentially others in the
> future), this commit renames the version used in tcp_cdg.c to
> cdg_minmax.
>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>

Acked-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>

^ permalink raw reply

* Re: [PATCH 3/6] ip xfrm state: add save/restore
From: Stephen Hemminger @ 2016-09-20 16:43 UTC (permalink / raw)
  To: Pavel Tikhomirov; +Cc: netdev, Konstantin Khorenko, crml
In-Reply-To: <1472806946-16575-4-git-send-email-ptikhomirov@virtuozzo.com>

On Fri,  2 Sep 2016 12:02:23 +0300
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> wrote:

> +static __u32 state_dump_magic = 0x71706987;
> +
> +static int xfrm_state_list_deleteall_or_save(int argc, char **argv, int deleteall, int save)

I have no problem with the overall functionality of this patch set, but this
function_name_is_too_long_for_rational_humans.

Please make the function names (especially the local static ones) shorter and more reasonable
and resubmit

^ permalink raw reply

* Re: [PATCH net-next 8/8] net: qualcomm: add QCA7000 UART driver
From: Stefan Wahren @ 2016-09-20 16:43 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Jiri Slaby, Greg Kroah-Hartman, netdev, linux-kernel,
	David S. Miller
In-Reply-To: <201609202253.jYnEZtwg%fengguang.wu@intel.com>


> kbuild test robot <lkp@intel.com> hat am 20. September 2016 um 16:44
> geschrieben:
> 
> 
> Hi Stefan,
> 
> [auto build test ERROR on net-next/master]
> 
> url:
>    https://github.com/0day-ci/linux/commits/Stefan-Wahren/net-qualcomm-add-QCA7000-UART-driver/20160920-210908
> config: i386-allmodconfig (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386 
> 
> All errors (new ones prefixed by >>):
> 
>    drivers/net/ethernet/qualcomm/qca_uart.c: In function
> 'qcauart_netdev_xmit':
> >> drivers/net/ethernet/qualcomm/qca_uart.c:302:5: error: 'struct net_device'
> >> has no member named 'trans_start'; did you mean 'mem_start'?
>      dev->trans_start = jiffies;

This should be replaced by netif_trans_update(dev)

>         ^~
>    drivers/net/ethernet/qualcomm/qca_uart.c: In function
> 'qcauart_netdev_tx_timeout':
>    drivers/net/ethernet/qualcomm/qca_uart.c:314:29: error: 'struct net_device'
> has no member named 'trans_start'; did you mean 'mem_start'?
>           jiffies, jiffies - dev->trans_start);

and this by dev_trans_start(dev) in the next version.

Sorry about that :-(

^ permalink raw reply

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Daniel Mack @ 2016-09-20 16:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160920142931.GA31438@salvia>

Hi Pablo,

On 09/20/2016 04:29 PM, Pablo Neira Ayuso wrote:
> On Mon, Sep 19, 2016 at 10:56:14PM +0200, Daniel Mack wrote:
> [...]
>> Why would we artificially limit the use-cases of this implementation if
>> the way it stands, both filtering and introspection are possible?
> 
> Why should we place infrastructure in the kernel to filter packets so
> late, and why at postrouting btw, when we can do this way earlier
> before any packet is actually sent?

The point is that from an application's perspective, restricting the
ability to bind a port and dropping packets that are being sent is a
very different thing. Applications will start to behave differently if
they can't bind to a port, and that's something we do not want to happen.

Looking at packets and making a verdict on them is the only way to
implement what we have in mind. Given that's in line with what netfilter
does, it can't be all that wrong, can it?

> No performance impact, no need for
> skbuff allocation and *no cycles wasted to evaluate if every packet is
> wanted or not*.

Hmm, not sure why this keeps coming up. As I said - for accounting,
there is no other option than to look at every packet and its size.

Regarding the performance concerns, are you saying a netfilter based
implementation that uses counters for that purpose would be more
efficient? Because in my tests, just loading the netfilter modules with
no rules in place at all has more impact than running the code from 6/6
on every packet.

As stated before, I see no reason why we shouldn't have a netfilter
based implementation that can achieve the same, function-wise. And I
would also like to compare their throughput.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
From: Jiri Benc @ 2016-09-20 16:43 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni
In-Reply-To: <20160920163108.GP8920@oracle.com>

On Tue, 20 Sep 2016 12:31:08 -0400, Sowmini Varadhan wrote:
> The vxlan header is after the ethernet header (14 bytes),
> IP header (20 bytes, assuming no options) and udp header (8 bytes).

If the VXLAN header is not aligned to 4 bytes, then IP header is not
aligned to 4 bytes, too, and you have greater problems than just VXLAN.

> Post the skb_reserve adjustments (see computations in in mld_newpack(),
> for example), this triggers an unaligned access on sparc.

IPv6 header is certainly not 20 bytes.

If you see unaligned access, something is wrong. But I very much doubt
that the problem is at the place you're trying to fix. Could you share
the traces with us?

 Jiri

^ permalink raw reply

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
From: Hannes Frederic Sowa @ 2016-09-20 16:45 UTC (permalink / raw)
  To: Sowmini Varadhan, netdev; +Cc: davem, jbenc, aduyck, daniel, pabeni
In-Reply-To: <20160920142700.GJ8920@oracle.com>

On 20.09.2016 16:27, Sowmini Varadhan wrote:
> The vxlan header is at offset (14 + 20 + 8) into the packet,
> so the vxh is not aligned in vxlan_build_skb. Use [get/put]_unaligned
> functions to modify flags and vni field in the vxh.
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
>  drivers/net/vxlan.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index e7d1668..f903fa4 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1751,15 +1751,17 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
>  		goto out_free;
>  
>  	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -	vxh->vx_flags = VXLAN_HF_VNI;
> -	vxh->vx_vni = vxlan_vni_field(vni);
> +	put_unaligned(VXLAN_HF_VNI, &vxh->vx_flags);
> +	put_unaligned(vxlan_vni_field(vni), &vxh->vx_vni);
>  
>  	if (type & SKB_GSO_TUNNEL_REMCSUM) {
>  		unsigned int start;
> +		__be32 tmpvni = get_unaligned(&vxh->vx_vni);
>  
>  		start = skb_checksum_start_offset(skb) - sizeof(struct vxlanhdr);
> -		vxh->vx_vni |= vxlan_compute_rco(start, skb->csum_offset);
> -		vxh->vx_flags |= VXLAN_HF_RCO;
> +		tmpvni |= vxlan_compute_rco(start, skb->csum_offset);
> +		put_unaligned(tmpvni, &vxh->vx_vni);
> +		put_unaligned(VXLAN_HF_VNI | VXLAN_HF_RCO, &vxh->vx_flags);
>  
>  		if (!skb_is_gso(skb)) {
>  			skb->ip_summed = CHECKSUM_NONE;
> 

The easiest way would be to make struct vxlanhdr __packed. Probably the
best way out of this instead of randomly adding put/get unaligned.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH net-next 7/8] net/mlx5e: XDP TX forwarding support
From: Alexei Starovoitov @ 2016-09-20 16:45 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, Tom Herbert, Tariq Toukan, Tariq Toukan,
	David S. Miller, Linux Kernel Network Developers, Eran Ben Elisha,
	Saeed Mahameed, Rana Shahout
In-Reply-To: <20160920182717.2de9541e@redhat.com>

On Tue, Sep 20, 2016 at 06:27:17PM +0200, Jesper Dangaard Brouer wrote:
> On Tue, 20 Sep 2016 09:13:56 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Tue, 2016-09-20 at 18:06 +0200, Jesper Dangaard Brouer wrote:
> > > On Tue, 20 Sep 2016 08:58:30 -0700
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:  
> > 
> > > > Same for XDP_TX if/when packet is dropped because output ring is full.  
> > > 
> > > For the XDP_TX case a counter is already incremented[1] but it is a
> > > local ring counter (ring->tx_dropped++).
> > > 
> > > Do you think we should maintain separate counters for XDP? (to have a
> > > more consistent interface across drivers...)  
> > 
> > No, as long as the admin can learn drops are occurring.
> 
> Okay, so recording these drops is important for an admin, agreed.  Now
> that we have the chance to define the API, wouldn't is be nice if the
> admin across drive drivers knew what counter to look for???
> 
> > "perf ... -e skb:kfree_skb ..." or drop monitor wont work,
> > unfortunately.
> 
> That is actually a good idea... why not add a trace point for these
> rare drop cases, which is a hassle to debug for an admin?
> Let's actually take advantage of all the nice infrastructure the kernel
> provides(?)

that's a great idea. Instead of adding aborted, ring_full, blahblah counters
everywhere that cost performance, let's add tracepoints that are nops
when not in use.
And if all drivers call the same tracepoints it will be even better.
Monitoring trace_xdp_aborted tracepoint would be generic way to debug
'divide by zero'.

To your other question:
> Please explain why a eBPF program error (div by zero) must be a silent drop?

because 'div by zero' is an abnormal situation that shouldn't be exploited.
Meaning if xdp program is doing DoS prevention and it has a bug that
attacker can now exploit by sending a crafted packet that causes
'div by zero' and kernel will warn then attack got successful.
Therefore it has to be silent drop.
tracpoint in such case is great, since the user can do debugging with it
and even monitoring 24/7 and if suddenly the control plan sees a lot
of such trace_xdp_abotred events, it can disable that tracepoint to avoid
spam and adjust the program or act on attack some other way.
Hardcoded warnings and counters are not generic enough for all
the use cases people want to throw at XDP.
The tracepoints idea is awesome, in a sense that it's optional.

^ permalink raw reply

* Re: [RFC v2 04/12] qedr: Add support for user context verbs
From: Leon Romanovsky @ 2016-09-20 16:45 UTC (permalink / raw)
  To: Ram Amrani
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA,
	Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA,
	Yuval.Mintz-YGCgFSpz5w/QT0dZR+AlfA,
	rajesh.borundia-YGCgFSpz5w/QT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1474367764-9555-5-git-send-email-Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 2215 bytes --]

On Tue, Sep 20, 2016 at 01:35:56PM +0300, Ram Amrani wrote:
> From: Ram amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>
> Add support for ucontext, query port, add and del gid verbs.
>
> Signed-off-by: Rajesh Borundia <rajesh.borundia-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> ---

<...>

> +++ b/drivers/infiniband/hw/qedr/verbs.c
> @@ -0,0 +1,417 @@
> +/* QLogic qed NIC Driver
> + * Copyright (c) 2015 QLogic Corporation
> + *
> + * This software is available under the terms of the GNU General Public License
> + * (GPL) Version 2, available from the file COPYING in the main directory of
> + * this source tree.
> + */

<...>

> --- /dev/null
> +++ b/drivers/infiniband/hw/qedr/verbs.h
> @@ -0,0 +1,26 @@
> +/* QLogic qed NIC Driver
> + * Copyright (c) 2015 QLogic Corporation
> + *
> + * This software is available under the terms of the GNU General Public License
> + * (GPL) Version 2, available from the file COPYING in the main directory of
> + * this source tree.
> + */

<...>

> +++ b/include/uapi/rdma/providers/qedr-abi.h
> @@ -0,0 +1,27 @@
> +/* QLogic qed NIC Driver
> + * Copyright (c) 2015 QLogic Corporation
> + *
> + * This software is available under the terms of the GNU General Public License
> + * (GPL) Version 2, available from the file COPYING in the main directory of
> + * this source tree.
> + */

Are you sure that you want GPL-only license?


> +#ifndef __QEDR_USER_H__
> +#define __QEDR_USER_H__
> +
> +#define QEDR_ABI_VERSION		(6)

Is it possible to start new file with new driver from ABI version 1 and
not 6?

> +
> +/* user kernel communication data structures. */
> +
> +struct qedr_alloc_ucontext_resp {
> +	u64 db_pa;
> +	u32 db_size;
> +
> +	u32 max_send_wr;
> +	u32 max_recv_wr;
> +	u32 max_srq_wr;
> +	u32 sges_per_send_wr;
> +	u32 sges_per_recv_wr;
> +	u32 sges_per_srq_wr;
> +	int max_cqes;
> +};
> +#endif /* __QEDR_USER_H__ */
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH iproute2 2/2] vxlan: group address requires net device
From: Stephen Hemminger @ 2016-09-20 16:47 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev
In-Reply-To: <339db8bf7d0b0bdc9dc69209f4d1d00d45f4132d.1473067954.git.jbenc@redhat.com>

On Mon,  5 Sep 2016 11:35:28 +0200
Jiri Benc <jbenc@redhat.com> wrote:

> This is now enforced in the kernel, check also in iproute to get a better
> error message.
> 
> Signed-off-by: Jiri Benc <jbenc@redhat.com>

Applied

^ permalink raw reply

* Re: [iproute PATCH] macsec: fix input range of 'icvlen' parameter
From: Stephen Hemminger @ 2016-09-20 16:49 UTC (permalink / raw)
  To: Davide Caratti; +Cc: netdev, Sabrina Dubroca, Phil Sutter
In-Reply-To: <b1f730436465e511edb56e4486299e4f3835599b.1473428548.git.dcaratti@redhat.com>

On Fri,  9 Sep 2016 16:02:22 +0200
Davide Caratti <dcaratti@redhat.com> wrote:

> the maximum possible ICV length in a MACsec frame is 16 octects, not 32:
> fix get_icvlen() accordingly, so that a proper error message is displayed
> in case input 'icvlen' is greater than 16.
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied

^ permalink raw reply

* Re: [PATCH] ip: (ipvlan) introduce L3s mode
From: Stephen Hemminger @ 2016-09-20 16:51 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: netdev, Mahesh Bandewar
In-Reply-To: <1474321180-31717-1-git-send-email-mahesh@bandewar.net>

On Mon, 19 Sep 2016 14:39:40 -0700
Mahesh Bandewar <mahesh@bandewar.net> wrote:

> From: Mahesh Bandewar <maheshb@google.com>
> 
> The new mode 'l3s' can be set like -
> 
>   ip link add link <master> dev <IPvlan-slave> type ipvlan mode l3s
> 
>   e.g. ip link add link eth0 dev ipvl0 type ipvlan mode l3s
> 
> Also did some trivial code restructuring.
> 
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>

Applied to net-next

^ permalink raw reply

* Re: [iproute PATCH] tc: don't accept qdisc 'handle' greater than ffff
From: Stephen Hemminger @ 2016-09-20 16:46 UTC (permalink / raw)
  To: Davide Caratti; +Cc: netdev, Phil Sutter, Hangbin Liu
In-Reply-To: <bdc0d8e58be9344b44fcd64a56b5b62004c108a1.1473930575.git.dcaratti@redhat.com>

On Fri, 16 Sep 2016 10:30:00 +0200
Davide Caratti <dcaratti@redhat.com> wrote:

> since get_qdisc_handle() truncates the input value to 16 bit, return an
> error and prompt "invalid qdisc ID" in case input 'handle' parameter needs
> more than 16 bit to be stored.
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied

^ permalink raw reply

* Re: [PATCHv2 iproute2] ip route: check ftell, fseek return value
From: Stephen Hemminger @ 2016-09-20 16:53 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev, David Laight, Phil Sutter
In-Reply-To: <1473301617-9123-1-git-send-email-liuhangbin@gmail.com>

On Thu,  8 Sep 2016 10:26:57 +0800
Hangbin Liu <liuhangbin@gmail.com> wrote:

> ftell() may return -1 in error case, which is not handled and therefore pass a
> negative offset to fseek(). The return code of fseek() is also not checked.
> 
> Reported-by: Phil Sutter <phil@nwl.cc>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

Applied

^ permalink raw reply

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
From: Thomas Graf @ 2016-09-20 16:53 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Daniel Mack, htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20160920142931.GA31438@salvia>

On 09/20/16 at 04:29pm, Pablo Neira Ayuso wrote:
> On Mon, Sep 19, 2016 at 10:56:14PM +0200, Daniel Mack wrote:
> [...]
> > Why would we artificially limit the use-cases of this implementation if
> > the way it stands, both filtering and introspection are possible?
> 
> Why should we place infrastructure in the kernel to filter packets so
> late, and why at postrouting btw, when we can do this way earlier
> before any packet is actually sent? No performance impact, no need for
> skbuff allocation and *no cycles wasted to evaluate if every packet is
> wanted or not*.

I won't argue against filtering at socket level, it is very valuable.
A difference is transparency of enforcement. Dropping at networking
level is accounted for in the usual counters and well understood by
admins. "Dropping" at bind socket level would either involve returning
an error to the application (which may change behaviour of the application)
or require a new form of accounting.

I also want to see packet size, selected source address, outgoing device,
and attach tunnel key metadata to the skb based on the cgroup. All of
which are not available at socket level.

^ permalink raw reply

* Re: lsm naming dilemma. Re: [RFC v3 07/22] landlock: Handle file comparisons
From: Mickaël Salaün @ 2016-09-20 16:58 UTC (permalink / raw)
  To: Sargun Dhillon, Alexei Starovoitov
  Cc: LKML, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, Eric W . Biederman,
	James Morris, Kees Cook, Paul Moore, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening, linux-api, LSM, netdev, cgroups
In-Reply-To: <CAMp4zn8_GhvL5gGLrkADVSTP9Z4VO78bXtX8wbzFSa1P7j-yfA@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 2137 bytes --]


On 20/09/2016 03:10, Sargun Dhillon wrote:
> I'm fine giving up the Checmate name. Landlock seems easy enough to
> Google. I haven't gotten a chance to look through the entire patchset
> yet, but it does seem like they are somewhat similar.

Excellent! I'm looking forward for your review.


> 
> On Mon, Sep 19, 2016 at 5:12 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Thu, Sep 15, 2016 at 11:25:10PM +0200, Mickaël Salaün wrote:
>>>>> Agreed. With this RFC, the Checmate features (i.e. network helpers)
>>>>> should be able to sit on top of Landlock.
>>>>
>>>> I think neither of them should be called fancy names for no technical reason.
>>>> We will have only one bpf based lsm. That's it and it doesn't
>>>> need an obscure name. Directory name can be security/bpf/..stuff.c
>>>
>>> I disagree on an LSM named "BPF". I first started with the "seccomp LSM"
>>> name (first RFC) but I later realized that it is confusing because
>>> seccomp is associated to its syscall and the underlying features. Same
>>> thing goes for BPF. It is also artificially hard to grep on a name too
>>> used in the kernel source tree.
>>> Making an association between the generic eBPF mechanism and a security
>>> centric approach (i.e. LSM) seems a bit reductive (for BPF). Moreover,
>>> the seccomp interface [1] can still be used.
>>
>> agree with above.
>>
>>> Landlock is a nice name to depict a sandbox as an enclave (i.e. a
>>> landlocked country/state). I want to keep this name, which is simple,
>>> express the goal of Landlock nicely and is comparable to other sandbox
>>> mechanisms as Seatbelt or Pledge.
>>> Landlock should not be confused with the underlying eBPF implementation.
>>> Landlock could use more than only eBPF in the future and eBPF could be
>>> used in other LSM as well.
>>
>> there will not be two bpf based LSMs.
>> Therefore unless you can convince Sargun to give up his 'checmate' name,
>> nothing goes in.
>> The features you both need are 90% the same, so they must be done
>> as part of single LSM whatever you both agree to call it.
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 7/8] net/mlx5e: XDP TX forwarding support
From: Thomas Graf @ 2016-09-20 16:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, Eric Dumazet, Tom Herbert, Tariq Toukan,
	Tariq Toukan, David S. Miller, Linux Kernel Network Developers,
	Eran Ben Elisha, Saeed Mahameed, Rana Shahout
In-Reply-To: <20160920164526.GB99429@ast-mbp.thefacebook.com>

On 09/20/16 at 09:45am, Alexei Starovoitov wrote:
> that's a great idea. Instead of adding aborted, ring_full, blahblah counters
> everywhere that cost performance, let's add tracepoints that are nops
> when not in use.
> And if all drivers call the same tracepoints it will be even better.
> Monitoring trace_xdp_aborted tracepoint would be generic way to debug
> 'divide by zero'.

Not opposed but I still need an indication to start tracing ;-) A
single counter would be enough though.

^ permalink raw reply

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Mickaël Salaün @ 2016-09-20 17:02 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: Alexei Starovoitov, Andy Lutomirski, linux-kernel@vger.kernel.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Serge E . Hallyn, Tejun Heo, Will Drewry,
	kernel-hardening@lists.openwall.com
In-Reply-To: <20160920043744.GA13906@ircssh.c.rugged-nimbus-611.internal>


[-- Attachment #1.1: Type: text/plain, Size: 7168 bytes --]


On 20/09/2016 06:37, Sargun Dhillon wrote:
> On Thu, Sep 15, 2016 at 09:41:33PM +0200, Mickaël Salaün wrote:
>>
>> On 15/09/2016 06:48, Alexei Starovoitov wrote:
>>> On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
>>>> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>> On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
>>>>>> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>> On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
>>>>>>>>>>>
>>>>>>>>>>> This RFC handle both cgroup and seccomp approaches in a similar way. I
>>>>>>>>>>> don't see why building on top of cgroup v2 is a problem. Is there
>>>>>>>>>>> security issues with delegation?
>>>>>>>>>>
>>>>>>>>>> What I mean is: cgroup v2 delegation has a functionality problem.
>>>>>>>>>> Tejun says [1]:
>>>>>>>>>>
>>>>>>>>>> We haven't had to face this decision because cgroup has never properly
>>>>>>>>>> supported delegating to applications and the in-use setups where this
>>>>>>>>>> happens are custom configurations where there is no boundary between
>>>>>>>>>> system and applications and adhoc trial-and-error is good enough a way
>>>>>>>>>> to find a working solution.  That wiggle room goes away once we
>>>>>>>>>> officially open this up to individual applications.
>>>>>>>>>>
>>>>>>>>>> Unless and until that changes, I think that landlock should stay away
>>>>>>>>>> from cgroups.  Others could reasonably disagree with me.
>>>>>>>>>
>>>>>>>>> Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
>>>>>>>>> and not for sandboxing. So the above doesn't matter in such contexts.
>>>>>>>>> lsm hooks + cgroups provide convenient scope and existing entry points.
>>>>>>>>> Please see checmate examples how it's used.
>>>>>>>>>
>>>>>>>>
>>>>>>>> To be clear: I'm not arguing at all that there shouldn't be
>>>>>>>> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
>>>>>>>> landlock interface shouldn't expose any cgroup integration, at least
>>>>>>>> until the cgroup situation settles down a lot.
>>>>>>>
>>>>>>> ahh. yes. we're perfectly in agreement here.
>>>>>>> I'm suggesting that the next RFC shouldn't include unpriv
>>>>>>> and seccomp at all. Once bpf+lsm+cgroup is merged, we can
>>>>>>> argue about unpriv with cgroups and even unpriv as a whole,
>>>>>>> since it's not a given. Seccomp integration is also questionable.
>>>>>>> I'd rather not have seccomp as a gate keeper for this lsm.
>>>>>>> lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
>>>>>>> don't have one to one relationship, so mixing them up is only
>>>>>>> asking for trouble further down the road.
>>>>>>> If we really need to carry some information from seccomp to lsm+bpf,
>>>>>>> it's easier to add eBPF support to seccomp and let bpf side deal
>>>>>>> with passing whatever information.
>>>>>>>
>>>>>>
>>>>>> As an argument for keeping seccomp (or an extended seccomp) as the
>>>>>> interface for an unprivileged bpf+lsm: seccomp already checks off most
>>>>>> of the boxes for safely letting unprivileged programs sandbox
>>>>>> themselves.
>>>>>
>>>>> you mean the attach part of seccomp syscall that deals with no_new_priv?
>>>>> sure, that's reusable.
>>>>>
>>>>>> Furthermore, to the extent that there are use cases for
>>>>>> unprivileged bpf+lsm that *aren't* expressible within the seccomp
>>>>>> hierarchy, I suspect that syscall filters have exactly the same
>>>>>> problem and that we should fix seccomp to cover it.
>>>>>
>>>>> not sure what you mean by 'seccomp hierarchy'. The normal process
>>>>> hierarchy ?
>>>>
>>>> Kind of.  I mean the filter layers that are inherited across fork(),
>>>> the TSYNC mechanism, etc.
>>>>
>>>>> imo the main deficiency of secccomp is inability to look into arguments.
>>>>> One can argue that it's a blessing, since composite args
>>>>> are not yet copied into the kernel memory.
>>>>> But in a lot of cases the seccomp arguments are FDs pointing
>>>>> to kernel objects and if programs could examine those objects
>>>>> the sandboxing scope would be more precise.
>>>>> lsm+bpf solves that part and I'd still argue that it's
>>>>> orthogonal to seccomp's pass/reject flow.
>>>>> I mean if seccomp says 'ok' the syscall should continue executing
>>>>> as normal and whatever LSM hooks were triggered by it may have
>>>>> their own lsm+bpf verdicts.
>>>>
>>>> I agree with all of this...
>>>>
>>>>> Furthermore in the process hierarchy different children
>>>>> should be able to set their own lsm+bpf filters that are not
>>>>> related to parallel seccomp+bpf hierarchy of programs.
>>>>> seccomp syscall can be an interface to attach programs
>>>>> to lsm hooks, but nothing more than that.
>>>>
>>>> I'm not sure what you mean.  I mean that, logically, I think we should
>>>> be able to do:
>>>>
>>>> seccomp(attach a syscall filter);
>>>> fork();
>>>> child does seccomp(attach some lsm filters);
>>>>
>>>> I think that they *should* be related to the seccomp+bpf hierarchy of
>>>> programs in that they are entries in the same logical list of filter
>>>> layers installed.  Some of those layers can be syscall filters and
>>>> some of the layers can be lsm filters.  If we subsequently add a way
>>>> to attach a removable seccomp filter or a way to attach a seccomp
>>>> filter that logs failures to some fd watched by an outside monitor, I
>>>> think that should work for lsm, too, with more or less the same
>>>> interface.
>>>>
>>>> If we need a way for a sandbox manager to opt different children into
>>>> different subsets of fancy filters, then I think that syscall filters
>>>> and lsm filters should use the same mechanism.
>>>>
>>>> I think we might be on the same page here and just saying it different ways.
>>>
>>> Sounds like it :)
>>> All of the above makes sense to me.
>>> The 'orthogonal' part is that the user should be able to use
>>> this seccomp-managed hierarchy without actually enabling
>>> TIF_SECCOMP for the task and syscalls should still go through
>>> fast path and all the way till lsm hooks as normal.
>>> I don't want to pay _any_ performance penalty for this feature
>>> for lsm hooks (and all syscalls) that don't have bpf programs attached.
>>
>> Yes, it seems that we are all on the same page here, and that match this
>> RFC implementation. So, using the seccomp(2) *interface* to attach
>> Landlock programs to a process hierarchy is still on track. :)
>>
> 
> So, I'm catching up on this after a little while away. I really like the 
> simplicity of the approach Daniel took with his patches. I began to have 
> difficulty reading your patchset once you got into using seccomp + unprivileged 
> mode. I would love to see a separate patchset that only have the verifier, and
> lsm hook changes. Do you think you could decompose your patchset into an MVP?
> 

OK, I'll try to split the common parts from the seccomp part, but there
is already a dedicated patch for the LSM hooks [06/22].


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply

* Re: [RFC 02/11] Add RoCE driver framework
From: Leon Romanovsky @ 2016-09-20 17:03 UTC (permalink / raw)
  To: Elior, Ariel
  Cc: Mintz, Yuval, dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Mark Bloch, Ram Amrani, David Miller, Ariel Elior,
	Michal Kalderon, Rajesh Borundia,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <CY1PR0701MB13376438A5767BC844EA35C290F70-UpKza+2NMNLi6bjPjkn3FE5OhdzP3rhOnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 3713 bytes --]

On Tue, Sep 20, 2016 at 03:04:12PM +0000, Elior, Ariel wrote:
> > From: Leon Romanovsky [mailto:leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org]
> > On Thu, Sep 15, 2016 at 05:11:03AM +0000, Mintz, Yuval wrote:
> > > > As a summary, I didn't see in your responses any real life example where you will
> > > > need global debug level for your driver.
> > >
> > > Not sure what you you're expecting - a list of BZs /private e-mails where
> > > user reproductions were needed?
> > > You're basically ignoring my claims that such are used, instead wanting
> > > "evidence". I'm not going to try and produce any such.
> >
> > I asked an example and not evidence, where "modprobe your_driver
> > debug=1" will be superior to "modprobe your_driver dyndbg==pmf".
> >
> > https://www.kernel.org/doc/Documentation/dynamic-debug-howto.txt
> Hi,
>
> dyndbg vs module param:
> Dynamic debug has two very nice features: dynamic activation/configuration and per line/file/module/format activation. The module param has neither, but it does have a few merits which I am not sure dyndbg has:
> (1)
> It can activate printouts according to *flow*. A lot of thought has been put into associating the right printout in our driver at the right verbosity level with the right "flow tag" (e.g. QEDR_MSG_INIT, QEDR_MSG_QP). The module parameter accepts a bitmask which allows setting any subset of these flows. This means that with the correct values for the parameter I can open only "init" printouts, or only "Memory Region" printouts, even if these cross multiple files / functions and don't share a common format. Presumably, one would claim that we could add the "flow tag" to the format to every printout according to its flow, but that would encumber the printouts, and also doesn't scale well to printouts which belong to multiple flows, where the current approach allows this (QEDR_MSG_SQ | QEDR_
 MSG_RQ).

Dynamic prints are enabled per-print. The best possible granularity
which you can achieve.

> (2)
> As Yuval pointed out, there are users out there which have no trouble loading a driver with a module parameter, at probe or at kernel boot, but would be at a loss in mounting debugfs and dumping a matchspec script into a node there. As kernel developers, educating users is part of what we do, but it comes with a cost.

You are free to add this parameter to your out-of-tree driver. As I said
before to Yuval your module "debug" is equal to "dyndbg". Your users can
use it and is already available in kernel.

> (3)
> debugfs can be compiled out of the kernel in some codesize sensitive environments, or may not exist in an emergency shell or kdump kernel, whereas the module parameter would always be there.
>
> Simply allowing our module parameter mechanism to be dynamically activated is very straightforward - we planned on adding a debugfs node for that anyway. But I would keep the module parameter for the sake of those less capable users and also for when debugfs is not available as detailed in (3) above.

This is an example why we don't want this module parameter, once added it
will be with us forever.

>
> I certainly see the merits of joining an existing infrastructure instead of implementing our own thing, but I would like to know if there are ways of obtaining the merits I listed for our approach via that infrastructure.
>
> Leon, aside of commenting on the above, can you give an example of a driver which in your opinion does a good job of using dyndbg?

All modern drivers in this subsystem are supporting dyndbg out of the
box.

General note, can you please configure your email to wrap lines? It is
very hard to read, follow and respond to lines with >800 symbols in them.

>
> Thanks,
> Ariel
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
From: Sowmini Varadhan @ 2016-09-20 17:07 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni
In-Reply-To: <20160920184333.17444823@griffin>

On (09/20/16 18:43), Jiri Benc wrote:
> 
> IPv6 header is certainly not 20 bytes.

vxlan encapsulates an IPv6/ethernet frame in a UDP/IPv4/ethernet packet.
the ipv4 header is 20 bytes (without options).
mld_newpack is dealing with the vxlan net_device in this case,
which has ->hard_header_len of 14, and ->needed_headroom of 64,
so hlen (used in mld_newpack) comes out to be 80.

> If you see unaligned access, something is wrong. But I very much doubt
> that the problem is at the place you're trying to fix. Could you share
> the traces with us?

Exactly what traces do you want? I think you can check this easily
by using printk's predicated on IS_ALIGNED checks in vxlan_build_skb().
IPv6 MLD triggers this quite easily.

I will try out Hannes' solution (which makes sense) in a moment, 
and report back.

--Sowmini

^ permalink raw reply

* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
From: Mickaël Salaün @ 2016-09-20 17:08 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev
In-Reply-To: <20160915091902.GA13132@amd>


[-- Attachment #1.1: Type: text/plain, Size: 3190 bytes --]


On 15/09/2016 11:19, Pavel Machek wrote:
> Hi!
> 
>> This series is a proof of concept to fill some missing part of seccomp as the
>> ability to check syscall argument pointers or creating more dynamic security
>> policies. The goal of this new stackable Linux Security Module (LSM) called
>> Landlock is to allow any process, including unprivileged ones, to create
>> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
>> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
>> bugs or unexpected/malicious behaviors in userland applications.
>>
>> The first RFC [1] was focused on extending seccomp while staying at the syscall
>> level. This brought a working PoC but with some (mitigated) ToCToU race
>> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
>> syscall argument evaluation (hence the LSM hooks).
> 
> Long and nice description follows. Should it go to Documentation/
> somewhere?
> 
> Because some documentation would be useful...
> 								Pavel

Right, but I was looking for feedback before investing in documentation. :)


> 
>>  include/linux/bpf.h                   |  41 +++++
>>  include/linux/lsm_hooks.h             |   5 +
>>  include/linux/seccomp.h               |  54 ++++++-
>>  include/uapi/asm-generic/errno-base.h |   1 +
>>  include/uapi/linux/bpf.h              | 103 ++++++++++++
>>  include/uapi/linux/seccomp.h          |   2 +
>>  kernel/bpf/arraymap.c                 | 222 +++++++++++++++++++++++++
>>  kernel/bpf/syscall.c                  |  18 ++-
>>  kernel/bpf/verifier.c                 |  32 +++-
>>  kernel/fork.c                         |  41 ++++-
>>  kernel/seccomp.c                      | 211 +++++++++++++++++++++++-
>>  samples/Makefile                      |   2 +-
>>  samples/landlock/.gitignore           |   1 +
>>  samples/landlock/Makefile             |  16 ++
>>  samples/landlock/sandbox.c            | 295 ++++++++++++++++++++++++++++++++++
>>  security/Kconfig                      |   1 +
>>  security/Makefile                     |   2 +
>>  security/landlock/Kconfig             |  19 +++
>>  security/landlock/Makefile            |   3 +
>>  security/landlock/checker_cgroup.c    |  96 +++++++++++
>>  security/landlock/checker_cgroup.h    |  18 +++
>>  security/landlock/checker_fs.c        | 183 +++++++++++++++++++++
>>  security/landlock/checker_fs.h        |  20 +++
>>  security/landlock/lsm.c               | 228 ++++++++++++++++++++++++++
>>  security/security.c                   |   1 +
>>  25 files changed, 1592 insertions(+), 23 deletions(-)
>>  create mode 100644 samples/landlock/.gitignore
>>  create mode 100644 samples/landlock/Makefile
>>  create mode 100644 samples/landlock/sandbox.c
>>  create mode 100644 security/landlock/Kconfig
>>  create mode 100644 security/landlock/Makefile
>>  create mode 100644 security/landlock/checker_cgroup.c
>>  create mode 100644 security/landlock/checker_cgroup.h
>>  create mode 100644 security/landlock/checker_fs.c
>>  create mode 100644 security/landlock/checker_fs.h
>>  create mode 100644 security/landlock/lsm.c
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
From: Jiri Benc @ 2016-09-20 17:09 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni
In-Reply-To: <20160920184333.17444823@griffin>

On Tue, 20 Sep 2016 18:43:33 +0200, Jiri Benc wrote:
> On Tue, 20 Sep 2016 12:31:08 -0400, Sowmini Varadhan wrote:
> > The vxlan header is after the ethernet header (14 bytes),
> > IP header (20 bytes, assuming no options) and udp header (8 bytes).
> 
> If the VXLAN header is not aligned to 4 bytes, then IP header is not
> aligned to 4 bytes, too, and you have greater problems than just VXLAN.

...which is indeed the case. Sorry, I guess I'm too much focused on
VXLAN-GPE nowadays and I missed that we're building this before the
inner Ethernet header.

But the point stands, we have much greater problems here than VXLAN.
And I don't think that wrapping all IP address accesses into
get/put_unaligned all around the stack is the solution.

 Jiri

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] net: skbuff: Remove errornous length validation in skb_vlan_pop()
From: pravin shelar @ 2016-09-20 17:12 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: David S. Miller, Jiri Pirko, Daniel Borkmann, Eric Dumazet,
	Linux Kernel Network Developers, Shmulik Ladkani
In-Reply-To: <1474364917-31710-1-git-send-email-shmulik.ladkani@gmail.com>

On Tue, Sep 20, 2016 at 2:48 AM, Shmulik Ladkani
<shmulik.ladkani@ravellosystems.com> wrote:
> In 93515d53b1
>   "net: move vlan pop/push functions into common code"
> skb_vlan_pop was moved from its private location in openvswitch to
> skbuff common code.
>
> In case skb has non hw-accel vlan tag, the original 'pop_vlan()' assured
> that skb->len is sufficient (if skb->len < VLAN_ETH_HLEN then pop was
> considered a no-op).
>
> This validation was moved as is into the new common 'skb_vlan_pop'.
>
> Alas, in its original location (openvswitch), there was a guarantee that
> 'data' points to the mac_header, therefore the 'skb->len < VLAN_ETH_HLEN'
> condition made sense.
> However there's no such guarantee in the generic 'skb_vlan_pop'.
>
> For short packets received in rx path going through 'skb_vlan_pop',
> this causes 'skb_vlan_pop' to fail pop-ing a valid vlan hdr (in the non
> hw-accel case) or to fail moving next tag into hw-accel tag.
>
> Remove the 'skb->len < VLAN_ETH_HLEN' condition entirely:
> It is superfluous since inner '__skb_vlan_pop' already verifies there
> are VLAN_ETH_HLEN writable bytes at the mac_header.
>
> Note this presents a slight change to skb_vlan_pop() users:
> In case total length is smaller than VLAN_ETH_HLEN, skb_vlan_pop() now
> returns an error, as opposed to previous "no-op" behavior.
> Existing callers (e.g. tc act vlan, ovs) usually drop the packet if
> 'skb_vlan_pop' fails.
>
> Fixes: 93515d53b1 ("net: move vlan pop/push functions into common code")
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> Cc: Pravin Shelar <pshelar@ovn.org>
> ---
>  v3: Elaborate log message to explain the change presented to
>      'skb_vlan_pop' users for packets smaller than VLAN_ETH_HLEN
>
>  v2: Remove 'skb->len < VLAN_ETH_HLEN' condition entirely (instead of
>      testing skb->mac_len as was in v1), suggested by Pravin Shelar
>
>  net/core/skbuff.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>

Reviewed-by: Pravin B Shelar <pshelar@ovn.org>

^ permalink raw reply

* Re: [PATCH v3 net-next 2/2] net: skbuff: Coding: Use eth_type_vlan() instead of open coding it
From: pravin shelar @ 2016-09-20 17:13 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: David S. Miller, Jiri Pirko, Daniel Borkmann, Eric Dumazet,
	Linux Kernel Network Developers, Shmulik Ladkani
In-Reply-To: <1474364917-31710-2-git-send-email-shmulik.ladkani@gmail.com>

On Tue, Sep 20, 2016 at 2:48 AM, Shmulik Ladkani
<shmulik.ladkani@ravellosystems.com> wrote:
> Fix 'skb_vlan_pop' to use eth_type_vlan instead of directly comparing
> skb->protocol to ETH_P_8021Q or ETH_P_8021AD.
>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Reviewed-by: Pravin B Shelar <pshelar@ovn.org>

^ permalink raw reply

* Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()
From: Jiri Benc @ 2016-09-20 17:15 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, davem, hannes, aduyck, daniel, pabeni
In-Reply-To: <20160920170742.GR8920@oracle.com>

On Tue, 20 Sep 2016 13:07:42 -0400, Sowmini Varadhan wrote:
> I will try out Hannes' solution (which makes sense) in a moment, 
> and report back.

No objections to marking the struct as packed. Will only push the
problem to a different place, though.

 Jiri

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox