Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v3 02/31] usercopy: Enforce slab cache usercopy region boundaries
From: Christopher Lameter @ 2017-09-21 15:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Laura Abbott, Ingo Molnar,
	Mark Rutland, linux-mm, linux-xfs, linux-fsdevel, netdev,
	kernel-hardening
In-Reply-To: <1505940337-79069-3-git-send-email-keescook@chromium.org>

On Wed, 20 Sep 2017, Kees Cook wrote:

> diff --git a/mm/slab.c b/mm/slab.c
> index 87b6e5e0cdaf..df268999cf02 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -4408,7 +4408,9 @@ module_init(slab_proc_init);
>
>  #ifdef CONFIG_HARDENED_USERCOPY
>  /*
> - * Rejects objects that are incorrectly sized.
> + * Rejects incorrectly sized objects and objects that are to be copied
> + * to/from userspace but do not fall entirely within the containing slab
> + * cache's usercopy region.
>   *
>   * Returns NULL if check passes, otherwise const char * to name of cache
>   * to indicate an error.
> @@ -4428,11 +4430,15 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
>  	/* Find offset within object. */
>  	offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
>
> -	/* Allow address range falling entirely within object size. */
> -	if (offset <= cachep->object_size && n <= cachep->object_size - offset)
> -		return NULL;
> +	/* Make sure object falls entirely within cache's usercopy region. */
> +	if (offset < cachep->useroffset)
> +		return cachep->name;
> +	if (offset - cachep->useroffset > cachep->usersize)
> +		return cachep->name;
> +	if (n > cachep->useroffset - offset + cachep->usersize)
> +		return cachep->name;
>
> -	return cachep->name;
> +	return NULL;
>  }
>  #endif /* CONFIG_HARDENED_USERCOPY */

Looks like this is almost the same for all allocators. Can we put this
into mm/slab_common.c?

^ permalink raw reply

* Re: [PATCH net-next 3/4] cxgb4: add support to offload action vlan
From: Kumar Sanghvi @ 2017-09-21 15:23 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Rahul Lakkireddy, netdev, davem, ganeshgr, nirranjan, indranil
In-Reply-To: <20170921085508.GA2028@nanopsycho>

Hi Jiri,

On Thursday, September 09/21/17, 2017 at 10:55:08 +0200, Jiri Pirko wrote:
> Thu, Sep 21, 2017 at 09:33:36AM CEST, rahul.lakkireddy@chelsio.com wrote:
> >From: Kumar Sanghvi <kumaras@chelsio.com>
> >
> >Add support for offloading tc-flower flows having
> >vlan actions: pop, push and modify.
> >
> >Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
> >Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
> >Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
> >---
> > .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   | 43 ++++++++++++++++++++++
> > 1 file changed, 43 insertions(+)
> >
> >diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
> 
> [...]
> 
> 
> >+			switch (vlan_action) {
> >+			case TCA_VLAN_ACT_POP:
> >+				break;
> >+			case TCA_VLAN_ACT_PUSH:
> >+			case TCA_VLAN_ACT_MODIFY:
> >+				if (proto != ETH_P_8021Q) {
> >+					netdev_err(dev,
> >+						   "%s: Unsupp. vlan proto\n",
> 
> Don't wrap this. Also "Unsupp."vs"Unsupported". Please be consistent.

Thank you for pointing this.
I will take care of this in V2.

> 
> 
> >+						   __func__);
> >+					return -EOPNOTSUPP;
> >+				}
> >+				break;
> >+			default:
> >+				netdev_err(dev, "%s: Unsupported vlan action\n",
> >+					   __func__);
> >+				return -EOPNOTSUPP;
> >+			}
> > 		} else {
> > 			netdev_err(dev, "%s: Unsupported action\n", __func__);
> > 			return -EOPNOTSUPP;
> >-- 
> >2.14.1
> >

^ permalink raw reply

* Re: [patch net-next 07/12] mlxsw: spectrum: Add the multicast routing offloading logic
From: Andrew Lunn @ 2017-09-21 15:26 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, yotamg, idosch, mlxsw
In-Reply-To: <20170921064338.1282-8-jiri@resnulli.us>

> +static void mlxsw_sp_mr_route_stats_update(struct mlxsw_sp *mlxsw_sp,
> +					   struct mlxsw_sp_mr_route *mr_route)
> +{
> +	struct mlxsw_sp_mr *mr = mlxsw_sp->mr;
> +	u64 packets, bytes;
> +
> +	if (mr_route->route_action == MLXSW_SP_MR_ROUTE_ACTION_TRAP)
> +		return;
> +
> +	mr->mr_ops->route_stats(mlxsw_sp, mr_route->route_priv, &packets,
> +				&bytes);
> +
> +	switch (mr_route->mr_table->proto) {
> +	case MLXSW_SP_L3_PROTO_IPV4:
> +		mr_route->mfc4->mfc_un.res.pkt = packets;
> +		mr_route->mfc4->mfc_un.res.bytes = bytes;

What about wrong_if and lastuse? 

Is an mfc with iif on the host, not the switch, not offloaded?

   Andrew

^ permalink raw reply

* Re: [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches
From: Christopher Lameter @ 2017-09-21 15:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, David Windsor, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-xfs, linux-fsdevel,
	netdev, kernel-hardening
In-Reply-To: <1505940337-79069-4-git-send-email-keescook@chromium.org>

On Wed, 20 Sep 2017, Kees Cook wrote:

> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1291,7 +1291,8 @@ void __init kmem_cache_init(void)
>  	 */
>  	kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
>  				kmalloc_info[INDEX_NODE].name,
> -				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
> +				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
> +				0, kmalloc_size(INDEX_NODE));
>  	slab_state = PARTIAL_NODE;
>  	setup_kmalloc_cache_index_table();

Ok this presumes that at some point we will be able to restrict the number
of bytes writeable and thus set the offset and size field to different
values. Is that realistic?

We already whitelist all kmalloc caches (see first patch).

So what is the point of this patch?

^ permalink raw reply

* Re: [PATCH net-next v2] bridge: also trigger RTM_NEWLINK when interface is released from bridge
From: Vincent Bernat @ 2017-09-21 15:31 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Stephen Hemminger, David Ahern, David Miller, bridge,
	netdev@vger.kernel.org
In-Reply-To: <CAJieiUgKO-O3YFc3iTQXMmXc287RGgw4FjYhaLn-7+pmA_Fh+g@mail.gmail.com>

 ❦ 21 septembre 2017 08:09 -0700, Roopa Prabhu <roopa@cumulusnetworks.com> :

>>> The one concern is that ports added or removed through ioctl should
>>> cause same events as doing the same thing via netlink. Some users use
>>> brctl (ioctl) and others use newer bridge (netlink) API.
>>
>> I'll make a third iteration to have the same notifications when using
>> ioctl() with details in the commit message.
>> --
>
> as long as the ioctl path for bridge port removal is generating a:
> RTM_DELLINK with AF_BRIDGE and
> RTM_NEWLINK with AF_UNSPEC with 'master' removed
>
> we should be good. These are the most important ones.
>
> are there other specific bridge-events missing ?. you might want to
> run `bridge monitor link` also. and un-slaving of a port also triggers
> fdb events which are visible under `bridge monitor fdb`

With the patch, bridge monitor link generates the same event with
ioctl() than with netlink (like for ip monitor link, netlink generates
one extra duplicate event when enslaving).

For bridge monitor fdb, there is a difference. With ioctl(), I don't get
the event for VLAN1:

Deleted ca:18:06:bc:f6:11 dev dummy1 vlan 1 master bridge0 permanent

I suppose this is an expected difference due to the inability to manage
VLAN-aware bridges with ioctl().
-- 
Use the fundamental control flow constructs.
            - The Elements of Programming Style (Kernighan & Plauger)

^ permalink raw reply

* Re: [PATCH net-next 2/4] cxgb4: add basic tc flower offload support
From: Kumar Sanghvi @ 2017-09-21 15:31 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Rahul Lakkireddy, netdev, davem, ganeshgr, nirranjan, indranil
In-Reply-To: <cf99ae13-6ae1-2a1f-80cb-9390ab9e3af0@huawei.com>

Hi Yunsheng,

On Thursday, September 09/21/17, 2017 at 18:55:26 +0800, Yunsheng Lin wrote:
> Hi, Kumar
> 
> On 2017/9/21 15:33, Rahul Lakkireddy wrote:
> > From: Kumar Sanghvi <kumaras@chelsio.com>
> > 
> > Add support to add/remove flows for offload.  Following match
> > and action are supported for offloading a flow:
> > 
> > Match: ether-protocol, IPv4/IPv6 addresses, L4 ports (TCP/UDP)
> > Action: drop, redirect to another port on the device.
> > 
> > The qualifying flows can have accompanying mask information.
> > 
> > Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
> > Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
> > Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
> > ---

[...]

> > diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
> > index 45b5853ca2f1..07a4619e2164 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
> > @@ -148,6 +148,32 @@ static int get_filter_steerq(struct net_device *dev,
> >  	return iq;
> >  }
> >  
> > +int cxgb4_get_free_ftid(struct net_device *dev, int family)
> > +{
> > +	struct adapter *adap = netdev2adap(dev);
> > +	struct tid_info *t = &adap->tids;
> > +	int ftid;
> > +
> > +	spin_lock_bh(&t->ftid_lock);
> > +	if (family == PF_INET) {
> > +		ftid = find_first_zero_bit(t->ftid_bmap, t->nftids);
> > +		if (ftid >= t->nftids)
> > +			ftid = -1;
> > +	} else {
> > +		ftid = bitmap_find_free_region(t->ftid_bmap, t->nftids, 2);
> > +		if (ftid < 0) {
> > +			ftid = -1;
> 
> ftid = -1 is not needed?

You are right, its not needed. Thank you for pointing this.
I will take care of this in V2.


> 
> > +			goto out_unlock;
> > +		}
> > +
> > +		/* this is only a lookup, keep the found region unallocated */
> > +		bitmap_release_region(t->ftid_bmap, ftid, 2);
> > +	}
> > +out_unlock:
> > +	spin_unlock_bh(&t->ftid_lock);
> > +	return ftid;
> > +}

[...]

> > diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
> > index 16dff71e4d02..1af01101faaf 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
> > @@ -38,16 +38,292 @@
> >  #include "cxgb4.h"
> >  #include "cxgb4_tc_flower.h"
> > 

[...]

> >  int cxgb4_tc_flower_replace(struct net_device *dev,
> >  			    struct tc_cls_flower_offload *cls)
> >  {
> > -	return -EOPNOTSUPP;
> > +	struct adapter *adap = netdev2adap(dev);
> > +	struct ch_tc_flower_entry *ch_flower;
> > +	struct ch_filter_specification *fs;
> > +	struct filter_ctx ctx;
> > +	int fidx;
> > +	int ret;
> > +
> > +	if (cxgb4_validate_flow_actions(dev, cls))
> > +		return -EOPNOTSUPP;
> > +
> > +	if (cxgb4_validate_flow_match(dev, cls))
> > +		return -EOPNOTSUPP;
> > +
> > +	ch_flower = allocate_flower_entry();
> > +	if (!ch_flower) {
> > +		netdev_err(dev, "%s: ch_flower alloc failed.\n", __func__);
> > +		ret = -ENOMEM;
> > +		goto err;
> 
> Just return, err label is needed?

Yes, err label is not needed. I will take care of this in V2.

> 
> > +	}
> > +
> > +	fs = &ch_flower->fs;
> > +	fs->hitcnts = 1;
> > +	cxgb4_process_flow_actions(dev, cls, fs);
> > +	cxgb4_process_flow_match(dev, cls, fs);

[...]


> >  int cxgb4_tc_flower_destroy(struct net_device *dev,
> >  			    struct tc_cls_flower_offload *cls)
> >  {
> > -	return -EOPNOTSUPP;
> > +	struct adapter *adap = netdev2adap(dev);
> > +	struct ch_tc_flower_entry *ch_flower;
> > +	int ret;
> > +
> > +	ch_flower = ch_flower_lookup(adap, cls->cookie);
> > +	if (!ch_flower) {
> > +		ret = -ENOENT;
> > +		goto err;
> 
> Same as above

I will take care of this in V2.
Thank you.

^ permalink raw reply

* [PATCH net-next] test_rhashtable: remove initdata annotation
From: Florian Westphal @ 2017-09-21 15:36 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

kbuild test robot reported a section mismatch warning w. gcc 4.x:
WARNING: lib/test_rhashtable.o(.text+0x139e):
Section mismatch in reference from the function rhltable_insert.clone.3() to the variable .init.data:rhlt

so remove this annotation.

Fixes: cdd4de372ea06 ("test_rhashtable: add test case for rhl_table interface")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
I don't see this warning with same .config and gcc 6.3.x; however
the annotation isn't essential so just remove it.

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index de4d0584631a..8e83cbdc049c 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -251,7 +251,7 @@ static s64 __init test_rhashtable(struct rhashtable *ht, struct test_obj *array,
 }
 
 static struct rhashtable ht;
-static struct rhltable rhlt __initdata;
+static struct rhltable rhlt;
 
 static int __init test_rhltable(unsigned int entries)
 {
-- 
2.13.5

^ permalink raw reply related

* Re: [PATCH net 1/4] net:ethernet:aquantia: Setup max_mtu in ndev to enable jumbo frames
From: Andrew Lunn @ 2017-09-21 15:36 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: David S . Miller, netdev, David Arcari, Pavel Belous,
	Nadezhda Krupnina, Simon Edelhaus
In-Reply-To: <1b556b3a2fd12c566912541f3c969d4bdc5dde9c.1505915085.git.igor.russkikh@aquantia.com>

> @@ -283,6 +282,8 @@ int aq_nic_ndev_init(struct aq_nic_s *self)
>  	self->ndev->features = aq_hw_caps->hw_features;
>  	self->ndev->priv_flags = aq_hw_caps->hw_priv_flags;
>  	self->ndev->mtu = aq_nic_cfg->mtu - ETH_HLEN;
> +	self->ndev->min_mtu = ETH_MIN_MTU;

This is not required. It will default to ETH_MIN_MTU.

     Andrew

> +	self->ndev->max_mtu = self->aq_hw_caps.mtu - ETH_FCS_LEN - ETH_HLEN;
>  
>  	return 0;
>  }
> @@ -695,7 +696,7 @@ int aq_nic_set_mtu(struct aq_nic_s *self, int new_mtu)
>  {
>  	int err = 0;
>  
> -	if (new_mtu > self->aq_hw_caps.mtu) {
> +	if (new_mtu + ETH_FCS_LEN > self->aq_hw_caps.mtu) {

If you have set max_mtu correctly, this cannot happen.  But it seems
odd you don't have ETH_HLEN here, where as when setting max_mtu you
do.

   Andrew

^ permalink raw reply

* Re: [PATCH net 3/4] net:ethernet:aquantia: Fix transient invalid link down/up indications
From: Andrew Lunn @ 2017-09-21 15:38 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: David S . Miller, netdev, David Arcari, Pavel Belous,
	Nadezhda Krupnina, Simon Edelhaus
In-Reply-To: <6c8d54cad231f22ba46e08d3ec0fe9d6fcef7468.1505915085.git.igor.russkikh@aquantia.com>

On Thu, Sep 21, 2017 at 01:53:41PM +0300, Igor Russkikh wrote:
> Due to a bug in aquantia atlantic card firmware, it sometimes reports
> invalid link speed bits. That caused driver to report link down events,
> although link itself is totally fine.
> 
> This patch ignores such out of blue readings.
> 
> Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
> ---
>  drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
> index 4f5ec9a..ab5d3cb 100644
> --- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
> +++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
> @@ -351,8 +351,7 @@ int hw_atl_utils_mpi_get_link_status(struct aq_hw_s *self)
>  			break;
>  
>  		default:
> -			link_status->mbps = 0U;
> -			break;
> +			return -1;

Hi Igno

Please use a proper error code.

       Andrew

^ permalink raw reply

* Re: [PATCH net-next 00/14] gtp: Additional feature support
From: Harald Welte @ 2017-09-21 15:38 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Tom Herbert, David S. Miller, Linux Kernel Network Developers,
	Pablo Neira Ayuso, Rohit Seth
In-Reply-To: <CAPDqMep60z-7QhAQAyBk_mZbqHgV8=tkn=v6BxRwK0hqMX4+tg@mail.gmail.com>

Hi Tom,

On Tue, Sep 19, 2017 at 04:47:11PM -0700, Tom Herbert wrote:
> On Tue, Sep 19, 2017 at 4:19 PM, Harald Welte <laforge@gnumonks.org> wrote:
>
> > I think there has to be a clear plan/architecture on how to implement
> > those bits in terms of the kernel/userspace split, and at least a proof
> > of concept implementation that we can show works with some real phones
> > out there - otherwise there's no point in having IPv6 support that works
> > well with some custom tools.
> >
> OTOH, I will argue that the GTP patches should never have been allowed
> in the kernel in the first place without IPv6 support! ;-) 

Well, it could be shown that the code works with integration to OpenGGSN
(and later ergw, and now OsmoGGSN) and works within a complete 3GPP
network.  So we were not merging something that we hypothesized it would
work once the rest would be implemented, but we could actually show it
was working before it got merged.

> I think the best plan forward is to get the IPv6 data path running
> that so can demonstrate a functional GTP/IPv6 datapath 

Yes, but a functional "datapath" (= GTP-U) unfortunately includes the
way how address allocation/assignment is done.  Putting something into
the mainline kernel that we know will for sure not work/interop in a
real scenario is not a good idea.  I think what's worse than not
supporting a feature is to implying support for it while actually
doing it in an incomplete/incompliant way.

So from my point of view, what's needed is
* making sure router advertisement/solicitation are covered in some
  way, either by doing it in the kernel or having a clear strategy how
  it could be done from userspace while not a) introducing races regarding
  who owns the sequence numbers
* making sure the implementation covers entire /64 prefixes for each
  PDP context and not single addresses.  That is non-negotiable and
  mandatory by 3GPP specs.

> Since "real" configuration path doesn't use the path to set up a
> standalone interface, I would presume that that will be fleshed when
> someone has cycles and expertise to work on both sides of the problem.

The configuration will be different, yes. but we need to ensure that the
actual *implementation* of the data path does what it is expected to do,
no matter who configures it via which interface.

> Even if this requires structural changes to how IPv6 is managed in
> GTP, I doubt that the fundamental TX/RX, GRO/GSO data paths will
> change much.  In other words, please consider this to be a step on an
> evolutionary path. More work is required to reach the ultimate
> deployable solution.

Agreed. But then I'm still against merging something that we know for
sure will not be compatible with real-world use case.  It should be kept
out of mainline until we are sure of that, at the very least
theoretically, but even better which we can prove in practise will do
what it claims to do.

> As for testing on real phones, that is cannot be a requirement for a
> kernel feature. 

GTP is not implemented on phones.  GTP is implemented only inside the
fixed (land-side) of the cellular network.  However, the inner IP data
originates from phones, and large parts of what you see on GTP
originates from phones in a different format.  The inner IP data inside
GTP-U originates from phones.  And that's where address configuration
for IPv6 works.

> If you expect Linux community to support this, then we
> need a way to be develop and test on commodity PC hardware. 

I'm not arguing you need to run any of the code on a different
architecture such as a phone.  I'm arguing for you to run a cellular
network including the kernel GTP code, and then use that cellular
network from a real phone.  This is the only way to know for sure you
interoperate.  See my other mail related to the Open Soruce based
configurations for both 2G and 3G that can be used for this.

As a replacement, one can e.g. look at protocol traces of real phones
and then simulate the behavior of one or several different phones and
implement that as test caess.  This is what I did in the GGSN_Test
pointed out in my other mail in this thread.  And btw, all I've asked is
for showing it works with *one* phone model at all.  I'm not talking
about the various different implementation specifics, such as whether or
not the phone will insist on using neighbor solicitation and mandate
neighbor advertisement (on a point-to-point link, how absurd!) after
the (mandatory) router discovery.

> That is one of the major values of creating a standalone interface
> configuration-- we can test the datapath just like any other
> encapsulation supported by the kernel.

Well, you cannot.  You might be able to do some benchmarking to compare
if an old version of the kernel gtp driver will perform better or worse
than some optimizations introduced.  But you can *not* have a realistic
functional test that will tell you if your implementation is 'valid'.
I'm not even talking about being 'complete' here, but simply about being
broken or not.  Or test whether it will interoperate.  Particularly for
IPv6 this is impossible, due to the conflated way of involving both
GTP-C and GTP-U with router advertisement+solicitation for PDP context
activation, as outlined several times in this thread.

I wish it was simpler, but I haven't created GTP, sorry :)

-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply

* Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
From: Harald Welte @ 2017-09-21 15:12 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Andreas Schultz, Tom Herbert, David S. Miller,
	Linux Kernel Network Developers, Pablo Neira Ayuso, Rohit Seth
In-Reply-To: <CAPDqMeox74aEdV5_cr7HJ=kh0-etLdFE-u5=g8OqRzmS8OjOuQ@mail.gmail.com>

Hi Tom,

On Wed, Sep 20, 2017 at 05:55:01PM -0700, Tom Herbert wrote:
> You have the point of view of someone who has a lot of experience
> dealing with this protocol. Try to imagine if you were some random
> kernel network programmer with no experience in the area. If they
> happen to find a one-off bug and want to do the right thing by running
> a test, you want to make that as easy as possible. 

Agreed.  But we're not talking abut fixing a random bug in your patch
series, but we're talking about adding significant new features - and
those features need to be tested in real use caes, not just in an
artificial test setup that holds assumptions that are not true.

To improve performance, or to fix simple bugs that only affect the
processing of the GTP-U G-PDU, a much more limited and hence
"unrealistic" test scenario is probably sufficient/acceptable.

> From that perspective, building protocol specific libraries and
> finding the right cmd line to run is significant hoops (I can attest
> to this).

I understand your argument.  But then, there is actually quite some
tools to help you (see further below), as well as the wiki page at
http://osmocom.org/projects/linux-kernel-gtp-u/wiki/Basic_Testing

Of course, existing tools and existing wiki pages also only document
existing features of the kernel code :)

Yes, the documentation could be better.  But then, how much more can you
expect from somebody who's doing this mostly for fun and who - despite
working in his dayjob on FOSS cellular projects - has no single
commercial project/context that uses the kernel GTP code.

In any case, working on a specific protocol or technology will require
that you understand that technology to some extent, including the
available tools.  There's always a learning curve involved.

> There are other examples in the kernel of systems bigger than GTP that
> require a whole lot of effort just to run a simple test; you'll notice
> for those it's rare that best developers ever bother to look at them
> unless they're making a global change that affects the code. We don't
> want GTP to take be like that!

I'm all for following your argument.  My point is simply: You cannot
develop code solely based on mock-ups without any 'realistic' test
scenarios.  Otherwise you will end up with something that works only in
your artificial lab setup, and follows all the best practises of the way
how the Linux kernel traditionally approaches tunneling implementations,
but it will never work/interop in the real world.

And I'm very strongly opposed to merging code where we have not been
able to show that it will inter-operate in at least one realistic
scenario.  This would raise wrong expectations with users and all sorts
of downstream problems.

So let's say we merge your IPv6 support as-is, and kernels get released
+ shipped with it.  Later on, we find that in order to turn it into a
standards-compliant implementation together with all the required bits
in userspace and on the control plane, we need to change some parts of
it, particularly those parts that affect the netlink or any other
exposed userspace interface.  At that point, we cannot change the
interface as the kernel has a strict rule of never breaking userspace
ABI.  But we must change it in order to make it work in the real world.
So what do we do?  Add lots of cruft in order to emulate backwards
compatibility?

> So the forward looking question now is how to get to be able to run a
> "realistic setup"?

You can run this realistic setup entirely using Osmocom components.

For running a 2G network: OsmoBTS+OsmoNITB+OsmoSGSN
For running a 3G network: OsmoHNBGW+OsmoMSC+OsmoHLR+OsmoSGSN

Both above stacks/combinations will provide you with GTP-C and GTP-U
against a GGSN.  As GGSN, you can then use either OpenGGSN, or OsmoGGSN,
or ergw.  For OpenGGSN and ergw, this will work with kernel GTP today,
for those features present in kernel GTP (i.e. IPv4-only).

In both cases you need some RF hardware.  I'm happy to contribute
related hardware (and support getting it set up) free of charge from my
company sysmocom to anyone who has a realistic prospect of either
* integrating your IPv6 support or other significant feature patches with
  libgtpnl + OsmoGGSN (at which point you can run a complete setup with
  real phones to verify it works end-to-end)
* building and documenting or operating a continuous integration setup
  that would run tests on each new kernel version (or net-next, or
  whatever tree makes sense) to help us catch any regressions as the
  code proceeds

In order to have a smaller, but still realistic test scenario, I
implemented a series of GTP tests in
http://git.osmocom.org/osmo-ttcn3-hacks/tree/ggsn_tests/GGSN_Tests.ttcn

This code basically emulates the combination of everything from Phone to
SGSN, so that you have only two entities:
* the implementation under test = IUT (a GGSN implementing GTP-C + GTP-U)
* the test itself (GGSN_Test) executing against the IUT

The code so far implements PDP context activation + address allocation
for IPv4-only and IPv6-only cases and can be run against a GGSN
implementing those.  The IPv6-only PDP context unit tests include the
convoluted two-phase address assignment including sending the router
solicitation from the simulated phone as well as verifying the router
advertisement sent in response from the GGSN.

Yes, I know they're written in an unknown niche programming language
called TTCN-3, but this was the best tool at hand for the job I could
find, and the tests are open source as is the Eclipse Titan toolchain
for compiling it.

We even have a Dockerfile that will build you a docker container
containing the compiled GGSN_test at
http://git.osmocom.org/docker-playground/tree/ggsn-test

That docker container is used by a jenkins build test job to test
current OsmoGGSN master every night against the test suite.

I was stupid enough to break the testsuite with an accidential commit,
so tests between August 27th and today failed.  I've just reverted that
accident - don't let that mistake of mine mislead you.

I'm happy to contribute further to this by adding actual user-IP GTP-U
functional testing beyond the router soliciatation/advertisement to it.

So my suggestion in terms of a "realistic testbed without having to
configure + run dozens of programs and using real RF + phones" is to use
that testsuite.

What one cannot get around is having to implement support for new
features added on the kernel side such as IPv6 in libgtpnl and at least
one of the GGSN's using it, sorry.  Without that, there is no way to
know if that code would do anything useful.  You simply cannot
realistically test GTP-U alone without GTP-C.

I'd love to offer help on this, but it's really impossible right now.
I'm on holidays on a motorbike tour through rural Taiwan's mountains,
had to deal with a flat tire today, have limited connectivity and am
already cutting down hard on sleep every night to be able to respond to
the absolute minimally required work e-mails.  And review/follow-up to
your much appreciated patch series the last couple of days has also used
a lot of (unexpected not scheduled for) time.  I'm not complaining, I'm
just saying I am really not able to contribute more to this effort right
now beyond my review, the offer of free hardware for a real cellular
network, and the extension of the test cases for GTP-U beyond the
already implemented very important IPv6 address allocation/assignment
which I believe your current code would not pass.

Regards,
	Harald

-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply

* usb/net/hso: global-out-of-bounds in hso_probe
From: Andrey Konovalov @ 2017-09-21 15:39 UTC (permalink / raw)
  To: David S. Miller, Andreas Kemnade, Johan Hovold, Johannes Berg,
	yuan linyu, Ingo Molnar, USB list, netdev, LKML
  Cc: Dmitry Vyukov, Kostya Serebryany, syzkaller

Hi!

I've got the following report while fuzzing the kernel with syzkaller.

On commit ebb2c2437d8008d46796902ff390653822af6cc4 (Sep 18).

Is seems there's no check on the if_num value when it's used in ((u32
*)(id->driver_info))[if_num].

==================================================================
BUG: KASAN: global-out-of-bounds in hso_probe+0x1379/0x1b70
Read of size 4 at addr ffffffff85c582f8 by task kworker/1:2/1846

CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
4.14.0-rc1-42251-gebb2c2437d80 #215
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x292/0x395 lib/dump_stack.c:52
 print_address_description+0x1d9/0x280 mm/kasan/report.c:252
 kasan_report_error mm/kasan/report.c:351
 kasan_report+0x22f/0x340 mm/kasan/report.c:409
 __asan_report_load4_noabort+0x19/0x20 mm/kasan/report.c:429
 hso_probe+0x1379/0x1b70 drivers/net/usb/hso.c:2880
 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
 really_probe drivers/base/dd.c:413
 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
 __device_attach+0x26e/0x3d0 drivers/base/dd.c:710
 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
 device_add+0xd0b/0x1660 drivers/base/core.c:1835
 usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932
 generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174
 usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266
 really_probe drivers/base/dd.c:413
 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
 __device_attach+0x26e/0x3d0 drivers/base/dd.c:710
 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
 device_add+0xd0b/0x1660 drivers/base/core.c:1835
 usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457
 hub_port_connect drivers/usb/core/hub.c:4903
 hub_port_connect_change drivers/usb/core/hub.c:5009
 port_event drivers/usb/core/hub.c:5115
 hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
 process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
 worker_thread+0x221/0x1850 kernel/workqueue.c:2253
 kthread+0x3a1/0x470 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

The buggy address belongs to the variable:
 driver_name+0xb8/0xee0

Memory state around the buggy address:
 ffffffff85c58180: 00 00 04 fa fa fa fa fa 00 00 fa fa fa fa fa fa
 ffffffff85c58200: 06 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
>ffffffff85c58280: 00 fa fa fa fa fa fa fa 06 fa fa fa fa fa fa fa
                                                                ^
 ffffffff85c58300: 00 02 fa fa fa fa fa fa 00 00 00 00 00 07 fa fa
 ffffffff85c58380: fa fa fa fa 00 07 fa fa fa fa fa fa 00 00 00 06
==================================================================

^ permalink raw reply

* usb/net/hso: warning in hso_free_net_device
From: Andrey Konovalov @ 2017-09-21 15:39 UTC (permalink / raw)
  To: David S. Miller, Andreas Kemnade, Johan Hovold, Johannes Berg,
	yuan linyu, Ingo Molnar, USB list, netdev, LKML
  Cc: Dmitry Vyukov, Kostya Serebryany, syzkaller

Hi!

I've got the following report while fuzzing the kernel with syzkaller.

On commit ebb2c2437d8008d46796902ff390653822af6cc4 (Sep 18).

Looks like hso_create_net_device() can do goto exit before registering
network device.

hso 1-1:4.0: Can't find BULK IN endpoint
------------[ cut here ]------------
WARNING: CPU: 0 PID: 24 at net/core/dev.c:7117
rollback_registered_many+0x235/0xd90
Modules linked in:
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc1-42251-gebb2c2437d80 #215
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
task: ffff88006befe300 task.stack: ffff88006bf80000
RIP: 0010:rollback_registered_many+0x235/0xd90 net/core/dev.c:7117
RSP: 0018:ffff88006bf86058 EFLAGS: 00010297
RAX: ffff88006befe300 RBX: 0000000000000000 RCX: 1ffff1000d40644e
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006a032778
RBP: ffff88006bf86210 R08: ffff88006befe300 R09: ffffffff849e79cc
R10: ffff88006bf86450 R11: 1ffff1000d7dfefb R12: ffff88006a032200
R13: ffff88006a032270 R14: ffff88006bf86258 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff88006c800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1c4eebb000 CR3: 0000000069b63000 CR4: 00000000000006f0
Call Trace:
 rollback_registered+0x1ac/0x3c0 net/core/dev.c:7192
 unregister_netdevice_queue+0x2f3/0x580 net/core/dev.c:8180
 unregister_netdevice ./include/linux/netdevice.h:2427
 unregister_netdev+0x21/0x30 net/core/dev.c:8221
 hso_free_net_device+0xf2/0x2e0 drivers/net/usb/hso.c:2379
 hso_create_net_device+0x3f9/0x9d0 drivers/net/usb/hso.c:2567
 hso_probe+0xaa7/0x1b70 drivers/net/usb/hso.c:2896
 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
 really_probe drivers/base/dd.c:413
 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
 __device_attach+0x26e/0x3d0 drivers/base/dd.c:710
 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
 device_add+0xd0b/0x1660 drivers/base/core.c:1835
 usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932
 generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174
 usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266
 really_probe drivers/base/dd.c:413
 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
 __device_attach+0x26e/0x3d0 drivers/base/dd.c:710
 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
 device_add+0xd0b/0x1660 drivers/base/core.c:1835
 usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457
 hub_port_connect drivers/usb/core/hub.c:4903
 hub_port_connect_change drivers/usb/core/hub.c:5009
 port_event drivers/usb/core/hub.c:5115
 hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
 process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
 worker_thread+0x221/0x1850 kernel/workqueue.c:2253
 kthread+0x3a1/0x470 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
Code: 48 c1 e8 03 42 0f b6 04 38 84 c0 74 08 84 c0 0f 8e e7 09 00 00
41 0f b6 9c 24 78 05 00 00 84 db 0f 85 57 ff ff ff e8 0b 92 b0 fc <0f>
ff 4c 89 ef e8 21 15 b7 fd 84 c0 0f 84 e1 00 00 00 e8 f4 91
---[ end trace 94e5147be32b2d45 ]---

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches
From: Kees Cook @ 2017-09-21 15:40 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: LKML, David Windsor, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, linux-xfs, linux-fsdevel@vger.kernel.org,
	Network Development, kernel-hardening@lists.openwall.com
In-Reply-To: <alpine.DEB.2.20.1709211024120.14427@nuc-kabylake>

On Thu, Sep 21, 2017 at 8:27 AM, Christopher Lameter <cl@linux.com> wrote:
> On Wed, 20 Sep 2017, Kees Cook wrote:
>
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -1291,7 +1291,8 @@ void __init kmem_cache_init(void)
>>        */
>>       kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
>>                               kmalloc_info[INDEX_NODE].name,
>> -                             kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
>> +                             kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
>> +                             0, kmalloc_size(INDEX_NODE));
>>       slab_state = PARTIAL_NODE;
>>       setup_kmalloc_cache_index_table();
>
> Ok this presumes that at some point we will be able to restrict the number
> of bytes writeable and thus set the offset and size field to different
> values. Is that realistic?
>
> We already whitelist all kmalloc caches (see first patch).
>
> So what is the point of this patch?

The DMA kmalloc caches are not whitelisted:

>>                         kmalloc_dma_caches[i] = create_kmalloc_cache(n,
>> -                               size, SLAB_CACHE_DMA | flags);
>> +                               size, SLAB_CACHE_DMA | flags, 0, 0);

So this is creating the distinction between the kmallocs that go to
userspace and those that don't. The expectation is that future work
can start to distinguish between "for userspace" and "only kernel"
kmalloc allocations, as is already done here for DMA.

-Kees

-- 
Kees Cook
Pixel Security

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] net: phy: Fix truncation of large IRQ numbers in phy_attached_print()
From: Andrew Lunn @ 2017-09-21 15:40 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Florian Fainelli, David S . Miller, Romain Perier, netdev,
	linux-kernel
In-Reply-To: <1505993222-16721-1-git-send-email-geert+renesas@glider.be>

On Thu, Sep 21, 2017 at 01:27:02PM +0200, Geert Uytterhoeven wrote:
> Given NR_IRQS is 2048 on sparc64, and even 32784 on alpha, 3 digits is
> not enough to represent interrupt numbers on all architectures.  Hence
> PHY interrupt numbers may be truncated during printing.
> 
> Increase the buffer size from 4 to 8 bytes to fix this.
> 
> Fixes: 5e369aefdce4818c ("net: stmmac: Delete dead code for MDIO registration")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [net-next v3] bridge: trigger RTM_NEWLINK when interface is modified by bridge ioctl
From: Vincent Bernat @ 2017-09-21 15:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Ahern, David Miller, bridge, netdev
In-Reply-To: <20170921081542.4c266c83@xeon-e3>

 ❦ 21 septembre 2017 08:15 -0700, Stephen Hemminger <stephen@networkplumber.org> :

>> Currently, there is a difference in netlink events received when an
>> interface is modified through bridge ioctl() or through netlink. This
>> patch generates additional events when an interface is added to or
>> removed from a bridge via ioctl().
>> 
>> When adding then removing an interface from a bridge with netlink, we
>> get:
>> 
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
>> 
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> Deleted 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
>> 
>> When using ioctl():
>> 
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
>> 
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> Deleted 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>>     link/ether 9e:da:60:ee:cf:c8
>> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
>> 
>> Without this patch, the last netlink notification is not sent.
>> 
>> Signed-off-by: Vincent Bernat <vincent@bernat.im>
>
> This makes sense, you should probably add a Fixes: tag to help maintainers
> of long term stable kernels.
>
> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

I wouldn't know which commit would be fixed since this is not a
regression, just a behavior difference.
-- 
Make sure special cases are truly special.
            - The Elements of Programming Style (Kernighan & Plauger)

^ permalink raw reply

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Alexei Starovoitov @ 2017-09-21 15:52 UTC (permalink / raw)
  To: Edward Cree; +Cc: David Miller, netdev, Daniel Borkmann
In-Reply-To: <7013ee9d-a8e6-13fd-cc5f-86cf3d8bf4e0@solarflare.com>

On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote:
> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
>  different structure: it has a size in insn->imm (even if it's BPF_X) and
>  uses the BPF_SRC (X or K) to indicate which endianness to use.  So it
>  needs different code to print it.
> 
> Signed-off-by: Edward Cree <ecree@solarflare.com>
> ---
> It's not the same format as the new LLVM asm uses, does that matter?
> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all
>  endian ops are necessarily swaps (rather than sometimes nops).

that is being fixed and we will fix asm format too.
Let's pick good format first.

>  kernel/bpf/verifier.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 799b245..e7657a4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct bpf_verifier_env *env,
>  	u8 class = BPF_CLASS(insn->code);
>  
>  	if (class == BPF_ALU || class == BPF_ALU64) {
> -		if (BPF_SRC(insn->code) == BPF_X)
> +		if (BPF_OP(insn->code) == BPF_END) {
> +			if (class == BPF_ALU64)
> +				verbose("BUG_alu64_%02x\n", insn->code);
> +			else
> +				verbose("(%02x) (u%d) r%d %s %s\n",
> +					insn->code, insn->imm, insn->dst_reg,
> +					bpf_alu_string[BPF_END >> 4],
> +					BPF_SRC(insn->code) == BPF_X ? "be" : "le");

yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE.
imo
(u16) r4 endian be
isn't intuitive.
Can we come up with some better syntax?
Like
bswap16be r4
bswap32le r4

or

to_be16 r4
to_le32 r4

It will be more obvious what's happening.

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v3 03/31] usercopy: Mark kmalloc caches as usercopy caches
From: Christopher Lameter @ 2017-09-21 16:04 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, David Windsor, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, linux-xfs, linux-fsdevel@vger.kernel.org,
	Network Development, kernel-hardening@lists.openwall.com
In-Reply-To: <CAGXu5j+X6dWCGocG=P7pszTY-5OZ6Jmp-RsnDKox75M5rmVe4g@mail.gmail.com>

On Thu, 21 Sep 2017, Kees Cook wrote:

> > So what is the point of this patch?
>
> The DMA kmalloc caches are not whitelisted:

The DMA kmalloc caches are pretty obsolete and mostly there for obscure
drivers.

??

> >>                         kmalloc_dma_caches[i] = create_kmalloc_cache(n,
> >> -                               size, SLAB_CACHE_DMA | flags);
> >> +                               size, SLAB_CACHE_DMA | flags, 0, 0);
>
> So this is creating the distinction between the kmallocs that go to
> userspace and those that don't. The expectation is that future work
> can start to distinguish between "for userspace" and "only kernel"
> kmalloc allocations, as is already done here for DMA.

The creation of the kmalloc caches in earlier patches already setup the
"whitelisting". Why do it twice?

^ permalink raw reply

* [PATCH net] net: prevent dst uses after free
From: Eric Dumazet @ 2017-09-21 16:15 UTC (permalink / raw)
  To: Paweł Staszewski
  Cc: Wei Wang, Cong Wang, Linux Kernel Network Developers,
	Eric Dumazet
In-Reply-To: <1506005799.29839.130.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

In linux-4.13, Wei worked hard to convert dst to a traditional
refcounted model, removing GC.

We now want to make sure a dst refcount can not transition from 0 back
to 1.

The problem here is that input path attached a not refcounted dst to an
skb. Then later, because packet is forwarded and hits skb_dst_force()
before exiting RCU section, we might try to take a refcount on one dst
that is about to be freed, if another cpu saw 1 -> 0 transition in
dst_release() and queued the dst for freeing after one RCU grace period.

Lets unify skb_dst_force() and skb_dst_force_safe(), since we should
always perform the complete check against dst refcount, and not assume
it is not zero.

Bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=197005

[  989.919496]  skb_dst_force+0x32/0x34
[  989.919498]  __dev_queue_xmit+0x1ad/0x482
[  989.919501]  ? eth_header+0x28/0xc6
[  989.919502]  dev_queue_xmit+0xb/0xd
[  989.919504]  neigh_connected_output+0x9b/0xb4
[  989.919507]  ip_finish_output2+0x234/0x294
[  989.919509]  ? ipt_do_table+0x369/0x388
[  989.919510]  ip_finish_output+0x12c/0x13f
[  989.919512]  ip_output+0x53/0x87
[  989.919513]  ip_forward_finish+0x53/0x5a
[  989.919515]  ip_forward+0x2cb/0x3e6
[  989.919516]  ? pskb_trim_rcsum.part.9+0x4b/0x4b
[  989.919518]  ip_rcv_finish+0x2e2/0x321
[  989.919519]  ip_rcv+0x26f/0x2eb
[  989.919522]  ? vlan_do_receive+0x4f/0x289
[  989.919523]  __netif_receive_skb_core+0x467/0x50b
[  989.919526]  ? tcp_gro_receive+0x239/0x239
[  989.919529]  ? inet_gro_receive+0x226/0x238
[  989.919530]  __netif_receive_skb+0x4d/0x5f
[  989.919532]  netif_receive_skb_internal+0x5c/0xaf
[  989.919533]  napi_gro_receive+0x45/0x81
[  989.919536]  ixgbe_poll+0xc8a/0xf09
[  989.919539]  ? kmem_cache_free_bulk+0x1b6/0x1f7
[  989.919540]  net_rx_action+0xf4/0x266
[  989.919543]  __do_softirq+0xa8/0x19d
[  989.919545]  irq_exit+0x5d/0x6b
[  989.919546]  do_IRQ+0x9c/0xb5
[  989.919548]  common_interrupt+0x93/0x93
[  989.919548]  </IRQ>


Similarly dst_clone() can use dst_hold() helper to have additional
debugging, as a follow up to commit 44ebe79149ff ("net: add debug
atomic_inc_not_zero() in dst_hold()")

In net-next we will convert dst atomic_t to refcount_t for peace of
mind.

Fixes: a4c2fd7f7891 ("net: remove DST_NOCACHE flag")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
Bisected-by: Paweł Staszewski <pstaszewski@itcare.pl>
---
 include/net/dst.h   |   22 ++++------------------
 include/net/route.h |    2 +-
 include/net/sock.h  |    2 +-
 3 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..06a6765da074449e6f1fe42ee05e711e898ad372 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
 static inline struct dst_entry *dst_clone(struct dst_entry *dst)
 {
 	if (dst)
-		atomic_inc(&dst->__refcnt);
+		dst_hold(dst);
 	return dst;
 }
 
@@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb
 	__skb_dst_copy(nskb, oskb->_skb_refdst);
 }
 
-/**
- * skb_dst_force - makes sure skb dst is refcounted
- * @skb: buffer
- *
- * If dst is not yet refcounted, let's do it
- */
-static inline void skb_dst_force(struct sk_buff *skb)
-{
-	if (skb_dst_is_noref(skb)) {
-		WARN_ON(!rcu_read_lock_held());
-		skb->_skb_refdst &= ~SKB_DST_NOREF;
-		dst_clone(skb_dst(skb));
-	}
-}
-
 /**
  * dst_hold_safe - Take a reference on a dst if possible
  * @dst: pointer to dst entry
@@ -339,16 +324,17 @@ static inline bool dst_hold_safe(struct dst_entry *dst)
 }
 
 /**
- * skb_dst_force_safe - makes sure skb dst is refcounted
+ * skb_dst_force - makes sure skb dst is refcounted
  * @skb: buffer
  *
  * If dst is not yet refcounted and not destroyed, grab a ref on it.
  */
-static inline void skb_dst_force_safe(struct sk_buff *skb)
+static inline void skb_dst_force(struct sk_buff *skb)
 {
 	if (skb_dst_is_noref(skb)) {
 		struct dst_entry *dst = skb_dst(skb);
 
+		WARN_ON(!rcu_read_lock_held());
 		if (!dst_hold_safe(dst))
 			dst = NULL;
 
diff --git a/include/net/route.h b/include/net/route.h
index 1b09a9368c68d46f0c5ee8ce3cefe566000c1ec1..57dfc6850d378e4b96f13b140eef554d66c24cdf 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -190,7 +190,7 @@ static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 	rcu_read_lock();
 	err = ip_route_input_noref(skb, dst, src, tos, devin);
 	if (!err) {
-		skb_dst_force_safe(skb);
+		skb_dst_force(skb);
 		if (!skb_dst(skb))
 			err = -EINVAL;
 	}
diff --git a/include/net/sock.h b/include/net/sock.h
index 03a362568357acc7278a318423dd3873103f90ca..a6b9a8d1a6df3f72df8f1aac0f577257fa6452d0 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -856,7 +856,7 @@ void sk_stream_write_space(struct sock *sk);
 static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
 	/* dont let skb dst not refcounted, we are going to leave rcu lock */
-	skb_dst_force_safe(skb);
+	skb_dst_force(skb);
 
 	if (!sk->sk_backlog.tail)
 		sk->sk_backlog.head = skb;

^ permalink raw reply related

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Edward Cree @ 2017-09-21 16:24 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, netdev, Daniel Borkmann
In-Reply-To: <20170921155215.jta52sesbiq54vri@ast-mbp>

On 21/09/17 16:52, Alexei Starovoitov wrote:
> On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote:
>> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
>>  different structure: it has a size in insn->imm (even if it's BPF_X) and
>>  uses the BPF_SRC (X or K) to indicate which endianness to use.  So it
>>  needs different code to print it.
>>
>> Signed-off-by: Edward Cree <ecree@solarflare.com>
>> ---
>> It's not the same format as the new LLVM asm uses, does that matter?
>> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all
>>  endian ops are necessarily swaps (rather than sometimes nops).
> that is being fixed and we will fix asm format too.
> Let's pick good format first.
Agreed.
>>  kernel/bpf/verifier.c | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 799b245..e7657a4 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct bpf_verifier_env *env,
>>  	u8 class = BPF_CLASS(insn->code);
>>  
>>  	if (class == BPF_ALU || class == BPF_ALU64) {
>> -		if (BPF_SRC(insn->code) == BPF_X)
>> +		if (BPF_OP(insn->code) == BPF_END) {
>> +			if (class == BPF_ALU64)
>> +				verbose("BUG_alu64_%02x\n", insn->code);
>> +			else
>> +				verbose("(%02x) (u%d) r%d %s %s\n",
>> +					insn->code, insn->imm, insn->dst_reg,
>> +					bpf_alu_string[BPF_END >> 4],
>> +					BPF_SRC(insn->code) == BPF_X ? "be" : "le");
> yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE.
Good point.
> imo
> (u16) r4 endian be
> isn't intuitive.
> Can we come up with some better syntax?
> Like
> bswap16be r4
> bswap32le r4
Hmm, I don't like these, since bswapbe is a swap on *le* and a nop on be.
> or
>
> to_be16 r4
> to_le32 r4
And the problem here is that it's not just to_be, it's also from_be.
 Otherwise we could write `(be16) r4 = endian (u16) r4` and be much more
 explicit about what's happening.
Really the operation is something like `cpu_tofrom_be16 r4`, but that also
 seems a bit clumsy and longwinded.  Also it's inconsistent with the other
 ops that all indicate sizes with these (u16) etc casts.
`endian (be16) r4`, perhaps?

^ permalink raw reply

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Y Song @ 2017-09-21 16:30 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Edward Cree, David Miller, netdev, Daniel Borkmann
In-Reply-To: <20170921155215.jta52sesbiq54vri@ast-mbp>

On Thu, Sep 21, 2017 at 8:52 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote:
>> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
>>  different structure: it has a size in insn->imm (even if it's BPF_X) and
>>  uses the BPF_SRC (X or K) to indicate which endianness to use.  So it
>>  needs different code to print it.
>>
>> Signed-off-by: Edward Cree <ecree@solarflare.com>
>> ---
>> It's not the same format as the new LLVM asm uses, does that matter?
>> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all
>>  endian ops are necessarily swaps (rather than sometimes nops).
>
> that is being fixed and we will fix asm format too.
> Let's pick good format first.
>
>>  kernel/bpf/verifier.c | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 799b245..e7657a4 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct bpf_verifier_env *env,
>>       u8 class = BPF_CLASS(insn->code);
>>
>>       if (class == BPF_ALU || class == BPF_ALU64) {
>> -             if (BPF_SRC(insn->code) == BPF_X)
>> +             if (BPF_OP(insn->code) == BPF_END) {
>> +                     if (class == BPF_ALU64)
>> +                             verbose("BUG_alu64_%02x\n", insn->code);
>> +                     else
>> +                             verbose("(%02x) (u%d) r%d %s %s\n",
>> +                                     insn->code, insn->imm, insn->dst_reg,
>> +                                     bpf_alu_string[BPF_END >> 4],
>> +                                     BPF_SRC(insn->code) == BPF_X ? "be" : "le");
>
> yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE.
> imo
> (u16) r4 endian be
> isn't intuitive.
> Can we come up with some better syntax?
> Like
> bswap16be r4
> bswap32le r4
>
> or
>
> to_be16 r4
> to_le32 r4

Currently, llvm bpf backend uses "bswap16 r4" "bswap32 r2" "bswap64 r2" syntax.

I prefer something like "to_be16 r4" "to_le32 r4", or "bswap2be16"
"bswap2le32" or something similar.
This captures what the operation really is.

Right the support to bswap2le will be added to LLVM soon.

>
> It will be more obvious what's happening.
>

^ permalink raw reply

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Alexei Starovoitov @ 2017-09-21 16:40 UTC (permalink / raw)
  To: Edward Cree; +Cc: David Miller, netdev, Daniel Borkmann
In-Reply-To: <4cfac985-4f99-cf85-fc15-c3ad1f8ff123@solarflare.com>

On Thu, Sep 21, 2017 at 05:24:10PM +0100, Edward Cree wrote:
> On 21/09/17 16:52, Alexei Starovoitov wrote:
> > On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote:
> >> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
> >>  different structure: it has a size in insn->imm (even if it's BPF_X) and
> >>  uses the BPF_SRC (X or K) to indicate which endianness to use.  So it
> >>  needs different code to print it.
> >>
> >> Signed-off-by: Edward Cree <ecree@solarflare.com>
> >> ---
> >> It's not the same format as the new LLVM asm uses, does that matter?
> >> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all
> >>  endian ops are necessarily swaps (rather than sometimes nops).
> > that is being fixed and we will fix asm format too.
> > Let's pick good format first.
> Agreed.
> >>  kernel/bpf/verifier.c | 13 +++++++++++--
> >>  1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index 799b245..e7657a4 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct bpf_verifier_env *env,
> >>  	u8 class = BPF_CLASS(insn->code);
> >>  
> >>  	if (class == BPF_ALU || class == BPF_ALU64) {
> >> -		if (BPF_SRC(insn->code) == BPF_X)
> >> +		if (BPF_OP(insn->code) == BPF_END) {
> >> +			if (class == BPF_ALU64)
> >> +				verbose("BUG_alu64_%02x\n", insn->code);
> >> +			else
> >> +				verbose("(%02x) (u%d) r%d %s %s\n",
> >> +					insn->code, insn->imm, insn->dst_reg,
> >> +					bpf_alu_string[BPF_END >> 4],
> >> +					BPF_SRC(insn->code) == BPF_X ? "be" : "le");
> > yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE.
> Good point.
> > imo
> > (u16) r4 endian be
> > isn't intuitive.
> > Can we come up with some better syntax?
> > Like
> > bswap16be r4
> > bswap32le r4
> Hmm, I don't like these, since bswapbe is a swap on *le* and a nop on be.
> > or
> >
> > to_be16 r4
> > to_le32 r4
> And the problem here is that it's not just to_be, it's also from_be.
>  Otherwise we could write `(be16) r4 = endian (u16) r4` and be much more
>  explicit about what's happening.
> Really the operation is something like `cpu_tofrom_be16 r4`, but that also
>  seems a bit clumsy and longwinded.  Also it's inconsistent with the other
>  ops that all indicate sizes with these (u16) etc casts.
> `endian (be16) r4`, perhaps?

How about:
r4 = (be16) (u16) r4 
r4 = (le64) (u64) r4 
most C like and most explicit ?
it should be easy to grasp that (be16) cast on big-endian arch is a nop and
swap on little.

^ permalink raw reply

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Y Song @ 2017-09-21 16:40 UTC (permalink / raw)
  To: Edward Cree; +Cc: Alexei Starovoitov, David Miller, netdev, Daniel Borkmann
In-Reply-To: <4cfac985-4f99-cf85-fc15-c3ad1f8ff123@solarflare.com>

On Thu, Sep 21, 2017 at 9:24 AM, Edward Cree <ecree@solarflare.com> wrote:
> On 21/09/17 16:52, Alexei Starovoitov wrote:
>> On Thu, Sep 21, 2017 at 04:09:34PM +0100, Edward Cree wrote:
>>> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
>>>  different structure: it has a size in insn->imm (even if it's BPF_X) and
>>>  uses the BPF_SRC (X or K) to indicate which endianness to use.  So it
>>>  needs different code to print it.
>>>
>>> Signed-off-by: Edward Cree <ecree@solarflare.com>
>>> ---
>>> It's not the same format as the new LLVM asm uses, does that matter?
>>> AFAIK the LLVM format doesn't comprehend BPF_TO_LE, just assumes that all
>>>  endian ops are necessarily swaps (rather than sometimes nops).
>> that is being fixed and we will fix asm format too.
>> Let's pick good format first.
> Agreed.
>>>  kernel/bpf/verifier.c | 13 +++++++++++--
>>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 799b245..e7657a4 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -331,20 +331,29 @@ static void print_bpf_insn(const struct bpf_verifier_env *env,
>>>      u8 class = BPF_CLASS(insn->code);
>>>
>>>      if (class == BPF_ALU || class == BPF_ALU64) {
>>> -            if (BPF_SRC(insn->code) == BPF_X)
>>> +            if (BPF_OP(insn->code) == BPF_END) {
>>> +                    if (class == BPF_ALU64)
>>> +                            verbose("BUG_alu64_%02x\n", insn->code);
>>> +                    else
>>> +                            verbose("(%02x) (u%d) r%d %s %s\n",
>>> +                                    insn->code, insn->imm, insn->dst_reg,
>>> +                                    bpf_alu_string[BPF_END >> 4],
>>> +                                    BPF_SRC(insn->code) == BPF_X ? "be" : "le");
>> yes the bit the same, but please use BPF_SRC(insn->code) == BPF_TO_BE.
> Good point.
>> imo
>> (u16) r4 endian be
>> isn't intuitive.
>> Can we come up with some better syntax?
>> Like
>> bswap16be r4
>> bswap32le r4
> Hmm, I don't like these, since bswapbe is a swap on *le* and a nop on be.
>> or
>>
>> to_be16 r4
>> to_le32 r4
> And the problem here is that it's not just to_be, it's also from_be.

Could you explain what is "from_be" here? Do not quite understand.

>  Otherwise we could write `(be16) r4 = endian (u16) r4` and be much more
>  explicit about what's happening.
> Really the operation is something like `cpu_tofrom_be16 r4`, but that also
>  seems a bit clumsy and longwinded.  Also it's inconsistent with the other
>  ops that all indicate sizes with these (u16) etc casts.
> `endian (be16) r4`, perhaps?

^ permalink raw reply

* Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
From: Tom Herbert @ 2017-09-21 16:43 UTC (permalink / raw)
  To: Harald Welte
  Cc: Andreas Schultz, Tom Herbert, David S. Miller,
	Linux Kernel Network Developers, Pablo Neira Ayuso, Rohit Seth
In-Reply-To: <20170921151253.4udm54dcq7x3ed4q@nataraja>

On Thu, Sep 21, 2017 at 8:12 AM, Harald Welte <laforge@netfilter.org> wrote:
> Hi Tom,
>
> On Wed, Sep 20, 2017 at 05:55:01PM -0700, Tom Herbert wrote:
>> You have the point of view of someone who has a lot of experience
>> dealing with this protocol. Try to imagine if you were some random
>> kernel network programmer with no experience in the area. If they
>> happen to find a one-off bug and want to do the right thing by running
>> a test, you want to make that as easy as possible.
>
> Agreed.  But we're not talking abut fixing a random bug in your patch
> series, but we're talking about adding significant new features - and
> those features need to be tested in real use caes, not just in an
> artificial test setup that holds assumptions that are not true.
>
> To improve performance, or to fix simple bugs that only affect the
> processing of the GTP-U G-PDU, a much more limited and hence
> "unrealistic" test scenario is probably sufficient/acceptable.
>
>> From that perspective, building protocol specific libraries and
>> finding the right cmd line to run is significant hoops (I can attest
>> to this).
>
> I understand your argument.  But then, there is actually quite some
> tools to help you (see further below), as well as the wiki page at
> http://osmocom.org/projects/linux-kernel-gtp-u/wiki/Basic_Testing
>
> Of course, existing tools and existing wiki pages also only document
> existing features of the kernel code :)
>
> Yes, the documentation could be better.  But then, how much more can you
> expect from somebody who's doing this mostly for fun and who - despite
> working in his dayjob on FOSS cellular projects - has no single
> commercial project/context that uses the kernel GTP code.
>
> In any case, working on a specific protocol or technology will require
> that you understand that technology to some extent, including the
> available tools.  There's always a learning curve involved.
>
>> There are other examples in the kernel of systems bigger than GTP that
>> require a whole lot of effort just to run a simple test; you'll notice
>> for those it's rare that best developers ever bother to look at them
>> unless they're making a global change that affects the code. We don't
>> want GTP to take be like that!
>
> I'm all for following your argument.  My point is simply: You cannot
> develop code solely based on mock-ups without any 'realistic' test
> scenarios.  Otherwise you will end up with something that works only in
> your artificial lab setup, and follows all the best practises of the way
> how the Linux kernel traditionally approaches tunneling implementations,
> but it will never work/interop in the real world.
>

> And I'm very strongly opposed to merging code where we have not been
> able to show that it will inter-operate in at least one realistic
> scenario.  This would raise wrong expectations with users and all sorts
> of downstream problems.
>
Harald,

Please see the cover letter for the original GTP kernel patches dated
May 10, 2016. My first question on those was "Is there a timeline for
adding IPv6 support?". To which Pablo replied that there was a
preliminary patch for it that has not been released. That was almost a
year and half ago and we have not heard anything since. If you don't
like my patches or don't think that can be adapted to fully support
the GTP specification, that's fine. But then you need to provide a
viable alternative. We are at the point where a kernel networking
feature that only supports IPv4 when it could support IPv6 must be
considered incomplete.

Thanks,
Tom

^ permalink raw reply

* Re: [net-next v3] bridge: trigger RTM_NEWLINK when interface is modified by bridge ioctl
From: David Ahern @ 2017-09-21 16:43 UTC (permalink / raw)
  To: Vincent Bernat, Stephen Hemminger, David Miller, bridge, netdev
In-Reply-To: <20170921100525.20395-1-vincent@bernat.im>

On 9/21/17 4:05 AM, Vincent Bernat wrote:
> Currently, there is a difference in netlink events received when an
> interface is modified through bridge ioctl() or through netlink. This
> patch generates additional events when an interface is added to or
> removed from a bridge via ioctl().
> 
> When adding then removing an interface from a bridge with netlink, we
> get:
> 
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> Deleted 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 
> When using ioctl():
> 
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UNKNOWN group default
>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> Deleted 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
>     link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 
> Without this patch, the last netlink notification is not sent.
> 
> Signed-off-by: Vincent Bernat <vincent@bernat.im>
> ---
>  net/bridge/br_ioctl.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/bridge/br_ioctl.c b/net/bridge/br_ioctl.c
> index 7970f8540cbb..66cd98772051 100644
> --- a/net/bridge/br_ioctl.c
> +++ b/net/bridge/br_ioctl.c
> @@ -102,6 +102,9 @@ static int add_del_if(struct net_bridge *br, int ifindex, int isadd)
>  	else
>  		ret = br_del_if(br, dev);
>  
> +	if (!ret)
> +		rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_MASTER, GFP_KERNEL);
> +
>  	return ret;
>  }
>  
> 

Agreed that this is needed for userspace to know about the master change
when done through ioctl. The bridge code is emitting a lot of what
appears to be redundant messages for both paths (netlink and ioctl).

Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox