Netdev List

Netdev List
 help / color / mirror / Atom feed

* [net 0/5][pull request] Intel Wired LAN Driver Updates 2017-10-09
From: Jeff Kirsher @ 2017-10-09 15:12 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to ixgbe and arch/Kconfig.

Mark fixes a case where PHY register access is not supported and we were
returning a PHY address, when we should have been returning -EOPNOTSUPP.

Sabrina Dubroca fixes the use of a logical "and" when it should have been
the bitwise "and" operator.

Ding Tianhong reverts the commit that added the Kconfig bool option
ARCH_WANT_RELAX_ORDER, since there is now a new flag
PCI_DEV_FLAGS_NO_RELAXED_ORDERING that has been added to indicate that
Relaxed Ordering Attributes should not be used for Transaction Layer
Packets.  Then follows up with making the needed changes to ixgbe to
use the new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag.

John Fastabend fixes an issue in the ring accounting when the transmit
ring parameters are changed via ethtool when an XDP program is attached.

The following are changes since commit a9e2971b8cd3ef469de0112ba15778b5b98ad72e:
  tipc: Unclone message at secondary destination lookup
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 10GbE

Ding Tianhong (2):
  Revert commit 1a8b6d76dc5b ("net:add one common config...")
  net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

John Fastabend (1):
  ixgbe: incorrect xdp ring accounting in ethtool tx_frame param

Mark D Rustad (1):
  ixgbe: Return error when getting PHY address if PHY access is not
    supported

Sabrina Dubroca (1):
  ixgbe: fix masking of bits read from IXGBE_VXLANCTRL register

 arch/Kconfig                                     |  3 ---
 arch/sparc/Kconfig                               |  1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c   | 22 ----------------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c  | 19 -------------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 16 ++++++++--------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  6 +++++-
 6 files changed, 13 insertions(+), 54 deletions(-)

-- 
2.14.2

^ permalink raw reply

* [net 1/5] ixgbe: Return error when getting PHY address if PHY access is not supported
From: Jeff Kirsher @ 2017-10-09 15:12 UTC (permalink / raw)
  To: davem; +Cc: Mark D Rustad, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009151251.53939-1-jeffrey.t.kirsher@intel.com>

From: Mark D Rustad <mark.d.rustad@intel.com>

In cases where PHY register access is not supported, don't mislead
a caller into thinking that it is supported by returning a PHY
address. Instead, return -EOPNOTSUPP when PHY access is not
supported.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d962368d08d0..822cdb4f2c25 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8529,6 +8529,10 @@ static int ixgbe_ioctl(struct net_device *netdev, struct ifreq *req, int cmd)
 		return ixgbe_ptp_set_ts_config(adapter, req);
 	case SIOCGHWTSTAMP:
 		return ixgbe_ptp_get_ts_config(adapter, req);
+	case SIOCGMIIPHY:
+		if (!adapter->hw.phy.ops.read_reg)
+			return -EOPNOTSUPP;
+		/* fall through */
 	default:
 		return mdio_mii_ioctl(&adapter->hw.phy.mdio, if_mii(req), cmd);
 	}
-- 
2.14.2

^ permalink raw reply related

* Re: [PATCH net] net: enable interface alias removal via rtnl
From: David Ahern @ 2017-10-09 14:02 UTC (permalink / raw)
  To: nicolas.dichtel, Oliver Hartkopp, davem
  Cc: netdev, Oliver Hartkopp, Stephen Hemminger
In-Reply-To: <6e1671c7-93d7-0090-54f6-6b36abb8ba89@6wind.com>

On 10/9/17 2:23 AM, Nicolas Dichtel wrote:
> Le 06/10/2017 à 22:10, Oliver Hartkopp a écrit :
>>
>>
>> On 10/06/2017 08:18 PM, David Ahern wrote:
>>> On 10/5/17 4:19 AM, Nicolas Dichtel wrote:
>>>> IFLA_IFALIAS is defined as NLA_STRING. It means that the minimal length of
>>>> the attribute is 1 ("\0"). However, to remove an alias, the attribute
>>>> length must be 0 (see dev_set_alias()).
>>>
>>> why not add a check in dev_set_alias that if len is 1 and the 1
>>> character is '\0' it means remove the alias?
> Because it requires an iproute2 patch. iproute2 doesn't send the '\0'. With the
> command 'ip link set dummy0 alias ""', the attribute length is 0.

iproute2 needs the feature for 0-len strings or perhaps a 'noalias' option.

You can reset the alias using the sysfs file. Given that there is a
workaround for existing kernels and userspace, upstream can get fixed
without changing the UAPI.

> A kernel patch is probably enough for this problem. Updating iproute2 on old
> distributions is not always easy.

Can't say I have ever heard someone suggest that a kernel is easier to
change than userspace.

^ permalink raw reply

* [PATCH net-next 2/3] net/mlx4_core: Fix cast warning in fw.c
From: Tariq Toukan @ 2017-10-09 13:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Tariq Toukan
In-Reply-To: <1507557590-17747-1-git-send-email-tariqt@mellanox.com>

Fix the following SPARSE warning, in MLX4_GET() macro:
drivers/net/ethernet/mellanox/mlx4/fw.c:233:9: warning: cast to restricted __be64

Fixes: 17d5ceb6e43e ("net/mlx4_core: Fix unaligned accesses")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 16c09949afd5..634f603f941c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -57,12 +57,12 @@ enum {
 #define MLX4_GET(dest, source, offset)				      \
 	do {							      \
 		void *__p = (char *) (source) + (offset);	      \
-		u64 val;                                              \
-		switch (sizeof(dest)) {			      \
+		__be64 val;                                           \
+		switch (sizeof(dest)) {				      \
 		case 1: (dest) = *(u8 *) __p;	    break;	      \
 		case 2: (dest) = be16_to_cpup(__p); break;	      \
 		case 4: (dest) = be32_to_cpup(__p); break;	      \
-		case 8: val = get_unaligned((u64 *)__p);              \
+		case 8: val = get_unaligned((__be64 *)__p);           \
 			(dest) = be64_to_cpu(val);  break;            \
 		default: __buggy_use_of_MLX4_GET();		      \
 		}						      \
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 3/3] net/mlx4_en: Use __force to fix a sparse warning in TX datapath
From: Tariq Toukan @ 2017-10-09 13:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Tariq Toukan
In-Reply-To: <1507557590-17747-1-git-send-email-tariqt@mellanox.com>

In TX data-path, we intentionally do not byte-swap, as documented
in code and in the cited commit log.
This fixes sparse warning:
en_tx.c:720:23: warning: incorrect type in argument 1 (different base types)
en_tx.c:720:23:    expected unsigned int [unsigned] [usertype] <noident>
en_tx.c:720:23:    got restricted __be32 [usertype] doorbell_qpn

Fixes: 492f5add4be8 ("net/mlx4_en: Doorbell is byteswapped in Little Endian archs")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 8a32a8f7f9c0..2cc82dc07397 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -718,7 +718,7 @@ void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring)
 #else
 	iowrite32be(
 #endif
-		  ring->doorbell_qpn,
+		  (__force u32)ring->doorbell_qpn,
 		  ring->bf.uar->map + MLX4_SEND_DOORBELL);
 }

-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 1/3] net/mlx4: Fix endianness issue in qp context params
From: Tariq Toukan @ 2017-10-09 13:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Tariq Toukan
In-Reply-To: <1507557590-17747-1-git-send-email-tariqt@mellanox.com>

Should take care of the endianness before assigning to params2 field.

Fixes: 53f33ae295a5 ("net/mlx4_core: Port aggregation upper layer interface")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_resources.c     | 2 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c               | 2 +-
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_resources.c b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
index 5a47f9669621..6883ac75d37f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_resources.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
@@ -53,7 +53,7 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride,
 	if (is_tx) {
 		context->sq_size_stride = ilog2(size) << 3 | (ilog2(stride) - 4);
 		if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_PORT_REMAP)
-			context->params2 |= MLX4_QP_BIT_FPP;
+			context->params2 |= cpu_to_be32(MLX4_QP_BIT_FPP);
 
 	} else {
 		context->sq_size_stride = ilog2(TXBB_SIZE) - 4;
diff --git a/drivers/net/ethernet/mellanox/mlx4/qp.c b/drivers/net/ethernet/mellanox/mlx4/qp.c
index 728a2fb1f5c0..203320923340 100644
--- a/drivers/net/ethernet/mellanox/mlx4/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx4/qp.c
@@ -925,7 +925,7 @@ int mlx4_qp_to_ready(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
 		context->flags &= cpu_to_be32(~(0xf << 28));
 		context->flags |= cpu_to_be32(states[i + 1] << 28);
 		if (states[i + 1] != MLX4_QP_STATE_RTR)
-			context->params2 &= ~MLX4_QP_BIT_FPP;
+			context->params2 &= ~cpu_to_be32(MLX4_QP_BIT_FPP);
 		err = mlx4_qp_modify(dev, mtt, states[i], states[i + 1],
 				     context, 0, 0, qp);
 		if (err) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index fabb53379727..04304dd894c6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -3185,7 +3185,7 @@ static int verify_qp_parameters(struct mlx4_dev *dev,
 	optpar	= be32_to_cpu(*(__be32 *) inbox->buf);
 
 	if (slave != mlx4_master_func_num(dev)) {
-		qp_ctx->params2 &= ~MLX4_QP_BIT_FPP;
+		qp_ctx->params2 &= ~cpu_to_be32(MLX4_QP_BIT_FPP);
 		/* setting QP rate-limit is disallowed for VFs */
 		if (qp_ctx->rate_limit_params)
 			return -EPERM;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 0/3] Fix mlx4 static checker warnings
From: Tariq Toukan @ 2017-10-09 13:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Tariq Toukan

Hi Dave,

This patchset contains fixes for static checker warnings
in the mlx4 Core and Eth drivers.

Patch 1 fixes an actual bug discovered by the checker.
Patches 2 and 3 fix the warnings without functional changes.

Series generated against net-next commit:
c49c777f9c87 qed: Delete redundant check on dcb_app priority

Thanks,
Tariq.

Tariq Toukan (3):
  net/mlx4: Fix endianness issue in qp context params
  net/mlx4_core: Fix cast warning in fw.c
  net/mlx4_en: Use __force to fix a sparse warning in TX datapath

 drivers/net/ethernet/mellanox/mlx4/en_resources.c     | 2 +-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c            | 2 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c               | 6 +++---
 drivers/net/ethernet/mellanox/mlx4/qp.c               | 2 +-
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [pull request][for-next 0/9] Mellanox, mlx5 updates 2017-10-06
From: Doug Ledford @ 2017-10-09 13:47 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller; +Cc: netdev, linux-rdma, Leon Romanovsky
In-Reply-To: <20171006233749.25545-1-saeedm@mellanox.com>

On Fri, 2017-10-06 at 16:37 -0700, Saeed Mahameed wrote:
> The following changes since commit
> e19b205be43d11bff638cad4487008c48d21c103:
> 
>   Linux 4.14-rc2 (2017-09-24 16:38:56 -0700)

Thanks for keeping the base at rc2 like I requested.  Pulled.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

^ permalink raw reply

* Re: [net-next V5 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: Daniel Borkmann @ 2017-10-09 13:40 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev
  Cc: jakub.kicinski, Michael S. Tsirkin, pavel.odintsov, Jason Wang,
	mchan, John Fastabend, peter.waskiewicz.jr, Daniel Borkmann,
	Alexei Starovoitov, Andy Gospodarek
In-Reply-To: <59DB7A29.5050906@iogearbox.net>

On 10/09/2017 03:31 PM, Daniel Borkmann wrote:
> On 10/06/2017 06:12 PM, Jesper Dangaard Brouer wrote:
[...]
>> +    /* Pre-limit array size based on NR_CPUS, not final CPU check */
>> +    if (cmap->map.max_entries > NR_CPUS)
>
> Nit: needs to be >= NR_CPUS.

Scratch that comment, you bail out on key_cpu >= cmap->map.max_entries
in the other handlers, so that's fine.

^ permalink raw reply

* Re: [net-next V5 PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: Daniel Borkmann @ 2017-10-09 13:31 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev
  Cc: jakub.kicinski, Michael S. Tsirkin, pavel.odintsov, Jason Wang,
	mchan, John Fastabend, peter.waskiewicz.jr, Daniel Borkmann,
	Alexei Starovoitov, Andy Gospodarek
In-Reply-To: <150730636196.22839.17119032803741721925.stgit@firesoul>

On 10/06/2017 06:12 PM, Jesper Dangaard Brouer wrote:
[...]
> +static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
> +{
> +	struct bpf_cpu_map *cmap;
> +	int err = -ENOMEM;

err init here is basically not needed since overriden later anyway
w/o being read, but ...

> +	u64 cost;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return ERR_PTR(-EPERM);
> +
> +	/* check sanity of attributes */
> +	if (attr->max_entries == 0 || attr->key_size != 4 ||
> +	    attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
> +		return ERR_PTR(-EINVAL);
> +
> +	cmap = kzalloc(sizeof(*cmap), GFP_USER);
> +	if (!cmap)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/* mandatory map attributes */
> +	cmap->map.map_type = attr->map_type;
> +	cmap->map.key_size = attr->key_size;
> +	cmap->map.value_size = attr->value_size;
> +	cmap->map.max_entries = attr->max_entries;
> +	cmap->map.map_flags = attr->map_flags;
> +	cmap->map.numa_node = bpf_map_attr_numa_node(attr);
> +
> +	/* Pre-limit array size based on NR_CPUS, not final CPU check */
> +	if (cmap->map.max_entries > NR_CPUS)

Nit: needs to be >= NR_CPUS.

> +		return ERR_PTR(-E2BIG);
> +
> +	/* make sure page count doesn't overflow */
> +	cost = (u64) cmap->map.max_entries * sizeof(struct bpf_cpu_map_entry *);
> +	cost += cpu_map_bitmap_size(attr) * num_possible_cpus();
> +	if (cost >= U32_MAX - PAGE_SIZE)
> +		goto free_cmap;
> +	cmap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
> +
> +	/* Notice returns -EPERM on if map size is larger than memlock limit */
> +	err = bpf_map_precharge_memlock(cmap->map.pages);
> +	if (err)
> +		goto free_cmap;

... here, you need to set err = -ENOMEM.

> +	/* A per cpu bitfield with a bit per possible CPU in map  */
> +	cmap->flush_needed = __alloc_percpu(cpu_map_bitmap_size(attr),
> +					    __alignof__(unsigned long));
> +	if (!cmap->flush_needed)
> +		goto free_cmap;

Otherwise when we fail here or in error case for bpf_map_area_alloc()
below, we still return 0 although it's really -ENOMEM. And returning 0,
would mean that find_and_alloc_map() will miss this since it only tests
for IS_ERR(), and we'll crash later on thinking we have a valid map
pointer.

> +	/* Alloc array for possible remote "destination" CPUs */
> +	cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
> +					   sizeof(struct bpf_cpu_map_entry *),
> +					   cmap->map.numa_node);
> +	if (!cmap->cpu_map)
> +		goto free_cmap;
> +
> +	return &cmap->map;
> +free_cmap:
> +	free_percpu(cmap->flush_needed);
> +	kfree(cmap);
> +	return ERR_PTR(err);
> +}
> +
[...]
> +int cpu_map_update_elem(struct bpf_map *map, void *key, void *value,
> +				u64 map_flags)
> +{
> +	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
> +	struct bpf_cpu_map_entry *rcpu;
> +
> +	/* Array index key correspond to CPU number */
> +	u32 key_cpu = *(u32 *)key;
> +	/* Value is the queue size */
> +	u32 qsize = *(u32 *)value;
> +
> +	/* Make sure CPU is a valid possible cpu */
> +	if (!cpu_possible(key_cpu))
> +		return -ENODEV;

Nit: cpu_possible() expects that key_cpu < NR_CPUS, otherwise you'd
access the bitmap out of bounds.

Better move the below test for 'key_cpu >= cmap->map.max_entries'
first as on map alloc you enforce upper limit of NR_CPUS on the
max_entries, then above cpu_possible() test will be valid, too.

> +	if (unlikely(map_flags > BPF_EXIST))
> +		return -EINVAL;
> +	if (unlikely(key_cpu >= cmap->map.max_entries))
> +		return -E2BIG;
> +	if (unlikely(map_flags == BPF_NOEXIST))
> +		return -EEXIST;
> +	if (unlikely(qsize > 16384)) /* sanity limit on qsize */
> +		return -EOVERFLOW;
> +
> +	if (qsize == 0) {
> +		rcpu = NULL; /* Same as deleting */
> +	} else {
> +		/* Updating qsize cause re-allocation of bpf_cpu_map_entry */
> +		rcpu = __cpu_map_entry_alloc(qsize, key_cpu, map->id);
> +		if (!rcpu)
> +			return -ENOMEM;
> +	}
> +	rcu_read_lock();
> +	__cpu_map_entry_replace(cmap, key_cpu, rcpu);
> +	rcu_read_unlock();
> +	return 0;
> +}
[...]
> +struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
> +{
> +	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
> +	struct bpf_cpu_map_entry *rcpu;
> +
> +	if (key >= map->max_entries)
> +		return NULL;
> +
> +	rcpu = READ_ONCE(cmap->cpu_map[key]);
> +	return rcpu;
> +}
> +
> +static void *cpu_map_lookup_elem(struct bpf_map *map, void *key)
> +{
> +	struct bpf_cpu_map_entry *rcpu =
> +		__cpu_map_lookup_elem(map, *(u32 *)key);
> +
> +	return rcpu ? &rcpu->qsize : NULL;

I still think from my prior email/comment that we should use per-cpu
scratch buffer here. Would be nice to keep the guarantee that noone
can modify it, it's just a tiny change.

> +}
> +
> +static int cpu_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
> +{
> +	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
> +	u32 index = key ? *(u32 *)key : U32_MAX;
> +	u32 *next = next_key;
> +
> +	if (index >= cmap->map.max_entries) {
> +		*next = 0;
> +		return 0;
> +	}
> +
> +	if (index == cmap->map.max_entries - 1)
> +		return -ENOENT;
> +	*next = index + 1;
> +	return 0;
> +}
> +
> +const struct bpf_map_ops cpu_map_ops = {
> +	.map_alloc		= cpu_map_alloc,
> +	.map_free		= cpu_map_free,
> +	.map_delete_elem	= cpu_map_delete_elem,
> +	.map_update_elem	= cpu_map_update_elem,
> +	.map_lookup_elem	= cpu_map_lookup_elem,
> +	.map_get_next_key	= cpu_map_get_next_key,
> +};

^ permalink raw reply

* Re: [PATCH 10/13] timer: Remove expires and data arguments from DEFINE_TIMER
From: Ralf Baechle @ 2017-10-09 13:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Andrew Morton, Arnd Bergmann,
	Benjamin Herrenschmidt, Chris Metcalf, Geert Uytterhoeven,
	Greg Kroah-Hartman, Guenter Roeck, Harish Patil, Heiko Carstens,
	James E.J. Bottomley, John Stultz, Julian Wiedmann, Kalle Valo,
	Lai Jiangshan, Len Brown, Manish Chopra, Mark Gross,
	"Martin K. Petersen"
In-Reply-To: <1507159627-127660-11-git-send-email-keescook@chromium.org>

On Wed, Oct 04, 2017 at 04:27:04PM -0700, Kees Cook wrote:

> Subject: [PATCH 10/13] timer: Remove expires and data arguments from
>  DEFINE_TIMER
> 
> Drop the arguments from the macro and adjust all callers with the
> following script:
> 
>   perl -pi -e 's/DEFINE_TIMER\((.*), 0, 0\);/DEFINE_TIMER($1);/g;' \
>     $(git grep DEFINE_TIMER | cut -d: -f1 | sort -u | grep -v timer.h)
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # for m68k parts
> ---
>  arch/arm/mach-ixp4xx/dsmg600-setup.c      | 2 +-
>  arch/arm/mach-ixp4xx/nas100d-setup.c      | 2 +-
>  arch/m68k/amiga/amisound.c                | 2 +-
>  arch/m68k/mac/macboing.c                  | 2 +-
>  arch/mips/mti-malta/malta-display.c       | 2 +-
>  arch/parisc/kernel/pdc_cons.c             | 2 +-
>  arch/s390/mm/cmm.c                        | 2 +-
>  drivers/atm/idt77105.c                    | 4 ++--
>  drivers/atm/iphase.c                      | 2 +-
>  drivers/block/ataflop.c                   | 8 ++++----
>  drivers/char/dtlk.c                       | 2 +-
>  drivers/char/hangcheck-timer.c            | 2 +-
>  drivers/char/nwbutton.c                   | 2 +-
>  drivers/char/rtc.c                        | 2 +-
>  drivers/input/touchscreen/s3c2410_ts.c    | 2 +-
>  drivers/net/cris/eth_v10.c                | 6 +++---
>  drivers/net/hamradio/yam.c                | 2 +-
>  drivers/net/wireless/atmel/at76c50x-usb.c | 2 +-
>  drivers/staging/speakup/main.c            | 2 +-
>  drivers/staging/speakup/synth.c           | 2 +-
>  drivers/tty/cyclades.c                    | 2 +-
>  drivers/tty/isicom.c                      | 2 +-
>  drivers/tty/moxa.c                        | 2 +-
>  drivers/tty/rocket.c                      | 2 +-
>  drivers/tty/vt/keyboard.c                 | 2 +-
>  drivers/tty/vt/vt.c                       | 2 +-
>  drivers/watchdog/alim7101_wdt.c           | 2 +-
>  drivers/watchdog/machzwd.c                | 2 +-
>  drivers/watchdog/mixcomwd.c               | 2 +-
>  drivers/watchdog/sbc60xxwdt.c             | 2 +-
>  drivers/watchdog/sc520_wdt.c              | 2 +-
>  drivers/watchdog/via_wdt.c                | 2 +-
>  drivers/watchdog/w83877f_wdt.c            | 2 +-
>  drivers/xen/grant-table.c                 | 2 +-
>  fs/pstore/platform.c                      | 2 +-
>  include/linux/timer.h                     | 4 ++--
>  kernel/irq/spurious.c                     | 2 +-
>  lib/random32.c                            | 2 +-
>  net/atm/mpc.c                             | 2 +-
>  net/decnet/dn_route.c                     | 2 +-
>  net/ipv6/ip6_flowlabel.c                  | 2 +-
>  net/netrom/nr_loopback.c                  | 2 +-
>  security/keys/gc.c                        | 2 +-
>  sound/oss/midibuf.c                       | 2 +-
>  sound/oss/soundcard.c                     | 2 +-
>  sound/oss/sys_timer.c                     | 2 +-
>  sound/oss/uart6850.c                      | 2 +-
>  47 files changed, 54 insertions(+), 54 deletions(-)

Acked-by: Ralf Baechle <ralf@linux-mips.org>

Thanks,

  Ralf

^ permalink raw reply

* Re: [PATCH 09/13] timer: Remove users of expire and data arguments to DEFINE_TIMER
From: Ralf Baechle @ 2017-10-09 13:23 UTC (permalink / raw)
  To: Kees Cook
  Cc: Thomas Gleixner, Wim Van Sebroeck, Guenter Roeck,
	Geert Uytterhoeven, linux-mips, linux-watchdog, Andrew Morton,
	Arnd Bergmann, Benjamin Herrenschmidt, Chris Metcalf,
	Greg Kroah-Hartman, Harish Patil, Heiko Carstens,
	James E.J. Bottomley, John Stultz, Julian Wiedmann, Kalle Valo,
	Lai Jiangshan, Len Brown, Manis
In-Reply-To: <1507159627-127660-10-git-send-email-keescook@chromium.org>

On Wed, Oct 04, 2017 at 04:27:03PM -0700, Kees Cook wrote:

> Subject: [PATCH 09/13] timer: Remove users of expire and data arguments to
>  DEFINE_TIMER
> 
> The expire and data arguments of DEFINE_TIMER are only used in two places
> and are ignored by the code (malta-display.c only uses mod_timer(),
> never add_timer(), so the preset expires value is ignored). Set both
> sets of arguments to zero.
> 
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Wim Van Sebroeck <wim@iguana.be>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: linux-mips@linux-mips.org
> Cc: linux-watchdog@vger.kernel.org
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/mips/mti-malta/malta-display.c | 6 +++---
>  drivers/watchdog/alim7101_wdt.c     | 4 ++--
>  2 files changed, 5 insertions(+), 5 deletions(-)

For malta-display:

Acked-by: Ralf Baechle <ralf@linux-mips.org>

  Ralf

^ permalink raw reply

* Re: [PATCH net-next] ipv6: avoid zeroing per cpu data again
From: Tejun Heo @ 2017-10-09 13:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Martin KaFai Lau
In-Reply-To: <1507554097.31614.11.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, Oct 09, 2017 at 06:01:37AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> per cpu allocations are already zeroed, no need to clear them again.
> 
> Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Tejun Heo <tj@kernel.org>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply

* [PATCH] thunderbolt: Initialize Thunderbolt bus earlier
From: Mika Westerberg @ 2017-10-09 13:22 UTC (permalink / raw)
  To: David S . Miller
  Cc: Andreas Noever, Michael Jamet, Yehezkel Bernat, kernel test robot,
	Greg Kroah-Hartman, Andy Shevchenko, Mika Westerberg, netdev,
	linux-kernel

The 0day kbuild robot reports following crash:

  BUG: unable to handle kernel NULL pointer dereference at 00000004
  IP: tb_property_find+0xe/0x41
  *pde = 00000000
  Oops: 0000 [#1]
  CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.0-rc1-00741-ge69b6c0 #412
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
  task: 89c80000 task.stack: 89c7c000
  EIP: tb_property_find+0xe/0x41
  EFLAGS: 00210246 CPU: 0
  EAX: 00000000 EBX: 7a368f47 ECX: 00000044 EDX: 7a368f47
  ESI: 8851d340 EDI: 7a368f47 EBP: 89c7df0c ESP: 89c7defc
   DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
  CR0: 80050033 CR2: 00000004 CR3: 027a2000 CR4: 00000690
  Call Trace:
   tb_register_property_dir+0x49/0xb9
   ? cdc_mbim_driver_init+0x1b/0x1b
   tbnet_init+0x77/0x9f
   ? cdc_mbim_driver_init+0x1b/0x1b
   do_one_initcall+0x7e/0x145
   ? parse_args+0x10c/0x1b3
   ? kernel_init_freeable+0xbe/0x159
   kernel_init_freeable+0xd1/0x159
   ? rest_init+0x110/0x110
   kernel_init+0xd/0xd0
   ret_from_fork+0x19/0x30

The reason is that both Thunderbolt bus and thunderbolt-net are build
into the kernel image, and the latter is linked first because
drivers/net comes before drivers/thunderbolt. Since both use
module_init() thunderbolt-net ends up calling Thunderbolt bus functions
too early triggering the above crash.

Fix this by moving Thunderbolt bus initialization to happen earlier to
make sure all the data structures are ready when Thunderbolt service
drivers are initialized. To be on the safe side also add a check for
properly initialized xdomain_property_dir to tb_register_property_dir().

Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
Hi David,

This fixes a crash introduced in the Thunderbolt networking patches, so I'm
wondering could you take this to your net-next as well?

Thanks.

 drivers/thunderbolt/nhi.c     | 2 +-
 drivers/thunderbolt/xdomain.c | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 0e79eebfcbb7..419a7a90bce0 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -1144,5 +1144,5 @@ static void __exit nhi_unload(void)
 	tb_domain_exit();
 }
 
-module_init(nhi_init);
+fs_initcall(nhi_init);
 module_exit(nhi_unload);
diff --git a/drivers/thunderbolt/xdomain.c b/drivers/thunderbolt/xdomain.c
index f2d06f6f7be9..138027537d29 100644
--- a/drivers/thunderbolt/xdomain.c
+++ b/drivers/thunderbolt/xdomain.c
@@ -1487,6 +1487,9 @@ int tb_register_property_dir(const char *key, struct tb_property_dir *dir)
 {
 	int ret;
 
+	if (WARN_ON(!xdomain_property_dir))
+		return -EAGAIN;
+
 	if (!key || strlen(key) > 8)
 		return -EINVAL;
 
-- 
2.14.2

^ permalink raw reply related

* Re: [PATCH v2] netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
From: Pablo Neira Ayuso @ 2017-10-09 13:18 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Shmulik Ladkani, netfilter-devel, netdev, Rafael Buchbinder,
	Shmulik Ladkani, Willem de Bruijn
In-Reply-To: <59DB6D22.6080507@iogearbox.net>

On Mon, Oct 09, 2017 at 02:35:46PM +0200, Daniel Borkmann wrote:
> On 10/09/2017 02:27 PM, Shmulik Ladkani wrote:
> >From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> >
> >Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
> >support for attaching an eBPF object by an fd, with the
> >'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
> >IPT_SO_SET_REPLACE call.
> >
> >However this breaks subsequent iptables calls:
> >
> >  # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
> >  # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
> >  iptables: Invalid argument. Run `dmesg' for more information.
> >
> >That's because iptables works by loading exising rules using
> >IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
> >the replacement set.
> >
> >However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
> >(from the initial "iptables -m bpf" invocation) - so when 2nd invocation
> >occurs, userspace passes a bogus fd number, which leads to
> >'bpf_mt_check_v1' to fail.
> >
> >One suggested solution [1] was to hack iptables userspace, to perform a
> >"entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
> >process-local fd per every 'xt_bpf_info_v1' entry seen.
> >
> >However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
> >depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.
> >
> >This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
> >'.fd' and instead perform an in-kernel lookup for the bpf object given
> >the provided '.path'.
> >
> >It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
> >XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
> >expected to provide the path of the pinned object.
> >
> >Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.
> >
> >References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
> >             [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
> >
> >Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> >Cc: Willem de Bruijn <willemb@google.com>
> >Reported-by: Rafael Buchbinder <rafi@rbk.ms>
> >Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> 
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>

Applied, thanks everyone.

^ permalink raw reply

* Re: [PATCH net-next] net: mvpp2: phylink support
From: Russell King - ARM Linux @ 2017-10-09 13:09 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, andrew, gregory.clement, thomas.petazzoni, miquel.raynal,
	nadavh, linux-kernel, mw, stefanc, netdev
In-Reply-To: <20171009125527.GA24414@kwain>

On Mon, Oct 09, 2017 at 02:55:27PM +0200, Antoine Tenart wrote:
> Hi Russell,
> 
> On Mon, Sep 25, 2017 at 11:55:14AM +0200, Antoine Tenart wrote:
> > On Fri, Sep 22, 2017 at 12:07:31PM +0100, Russell King - ARM Linux wrote:
> > > On Thu, Sep 21, 2017 at 03:45:22PM +0200, Antoine Tenart wrote:
> > 
> > > > +static int mvpp2_phylink_mac_link_state(struct net_device *dev,
> > > > +					struct phylink_link_state *state)
> > > > +{
> > > > +	struct mvpp2_port *port = netdev_priv(dev);
> > > > +	u32 val;
> > > > +
> > > > +	if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
> > > > +	    port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> > > > +		return 0;
> > > 
> > > You're blocking this for 1000base-X and 10G connections, which is not
> > > correct.  The expectation is that this function returns the current
> > > MAC state irrespective of the interface mode.
> > 
> > I moved what was already supported in the PPv2 driver and did not
> > implemented the full set of what is supported. It's not perfect, but it
> > does move what was already supported.
> > 
> > Any reason not to first move what's already supported to phylink, and
> > then add more supported modes in separate patches?
> 
> Any thoughts on this?

You're asking me to comment about something I know little about as
I've not used mvpp2.c.  I don't know the details of what your "already
supported" statement refers to.  Maybe you could give some clues -
maybe produce a list of what mvpp2 currently supports?

Here's the link modes that phylink supports:
1. PHY based links
2. PHYless fixed links with details specified in DT, in the same way as
   the existing "fixed-link" support works, but without needing to create
   fake PHYs.
3. PHYless fixed links with GPIO link indication (again, same way as the
   existing fixed-link support.)
4. Direct fibre connections via fixed-link or SFP.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply

* [PATCH net-next] ipv6: avoid zeroing per cpu data again
From: Eric Dumazet @ 2017-10-09 13:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Martin KaFai Lau, Tejun Heo

From: Eric Dumazet <edumazet@google.com>

per cpu allocations are already zeroed, no need to clear them again.

Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Tejun Heo <tj@kernel.org>
---
 net/ipv6/route.c |   12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 606e80325b21c0e10a02e9c7d5b3fcfbfc26a003..3d7d4e09301ef4deae1985412599c6f4e973c46f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -377,17 +377,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
 
 	if (rt) {
 		rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
-		if (rt->rt6i_pcpu) {
-			int cpu;
-
-			for_each_possible_cpu(cpu) {
-				struct rt6_info **p;
-
-				p = per_cpu_ptr(rt->rt6i_pcpu, cpu);
-				/* no one shares rt */
-				*p =  NULL;
-			}
-		} else {
+		if (!rt->rt6i_pcpu) {
 			dst_release_immediate(&rt->dst);
 			return NULL;
 		}

^ permalink raw reply related

* [RFC net-next] ppp: allow usage in namespaces
From: Matteo Croce @ 2017-10-09 12:59 UTC (permalink / raw)
  To: Paul Mackerras, linux-ppp, netdev; +Cc: David Miller

Check for CAP_NET_ADMIN with ns_capable() instead of capable()
to allow usage of ppp in user namespace other than the init one.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
 drivers/net/ppp/ppp_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index a404552555d4..be5634b4835b 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -389,7 +389,7 @@ static int ppp_open(struct inode *inode, struct file *file)
 	/*
 	 * This could (should?) be enforced by the permissions on /dev/ppp.
 	 */
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(file->f_cred->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	return 0;
 }
-- 
2.13.6

^ permalink raw reply related

* Re: [PATCH net-next] net: mvpp2: phylink support
From: Antoine Tenart @ 2017-10-09 12:55 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Antoine Tenart, davem, andrew, gregory.clement, thomas.petazzoni,
	miquel.raynal, nadavh, linux-kernel, mw, stefanc, netdev
In-Reply-To: <20170925095514.GA19364@kwain>

Hi Russell,

On Mon, Sep 25, 2017 at 11:55:14AM +0200, Antoine Tenart wrote:
> On Fri, Sep 22, 2017 at 12:07:31PM +0100, Russell King - ARM Linux wrote:
> > On Thu, Sep 21, 2017 at 03:45:22PM +0200, Antoine Tenart wrote:
> 
> > > +static int mvpp2_phylink_mac_link_state(struct net_device *dev,
> > > +					struct phylink_link_state *state)
> > > +{
> > > +	struct mvpp2_port *port = netdev_priv(dev);
> > > +	u32 val;
> > > +
> > > +	if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
> > > +	    port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> > > +		return 0;
> > 
> > You're blocking this for 1000base-X and 10G connections, which is not
> > correct.  The expectation is that this function returns the current
> > MAC state irrespective of the interface mode.
> 
> I moved what was already supported in the PPv2 driver and did not
> implemented the full set of what is supported. It's not perfect, but it
> does move what was already supported.
> 
> Any reason not to first move what's already supported to phylink, and
> then add more supported modes in separate patches?

Any thoughts on this?

> > > +static void mvpp2_mac_config(struct net_device *dev, unsigned int mode,
> > > +			     const struct phylink_link_state *state)
> > > +{
> > > +	struct mvpp2_port *port = netdev_priv(dev);
> > > +	u32 val;
> > > +
> > > +	/* disable current port for reconfiguration */
> > > +	mvpp2_interrupts_disable(port);
> > > +	netif_carrier_off(port->dev);
> > > +	mvpp2_port_disable(port);
> > > +	phy_power_off(port->comphy);
> > > +
> > > +	/* comphy reconfiguration */
> > > +	port->phy_interface = state->interface;
> > > +	mvpp22_comphy_init(port);
> > > +
> > > +	/* gop/mac reconfiguration */
> > > +	mvpp22_gop_init(port);
> > > +	mvpp2_port_mii_set(port);
> > > +
> > > +	if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
> > > +	    port->phy_interface != PHY_INTERFACE_MODE_SGMII)
> > > +		return;
> > 
> > Again, 1000base-X is excluded, which will break it.  You do need
> > to avoid touching the GMAC for 10G connections however.
> 
> Same comment as above.

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* [PATCH net] udp: fix bcast packet reception
From: Paolo Abeni @ 2017-10-09 12:52 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Hannes Frederic Sowa

The commit bc044e8db796 ("udp: perform source validation for
mcast early demux") does not take into account that broadcast packets
lands in the same code path and they need different checks for the
source address - notably, zero source address are valid for bcast
and invalid for mcast.

As a result, 2nd and later broadcast packets with 0 source address
landing to the same socket are dropped. This breaks dhcp servers.

Since we don't have stringent performance requirements for ingress
broadcast traffic, fix it by disabling UDP early demux such traffic.

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: bc044e8db796 ("udp: perform source validation for mcast early demux")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5676237d2b0f..e45177ceb0ee 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2240,20 +2240,16 @@ int udp_v4_early_demux(struct sk_buff *skb)
 	iph = ip_hdr(skb);
 	uh = udp_hdr(skb);

-	if (skb->pkt_type == PACKET_BROADCAST ||
-	    skb->pkt_type == PACKET_MULTICAST) {
+	if (skb->pkt_type == PACKET_MULTICAST) {
 		in_dev = __in_dev_get_rcu(skb->dev);

 		if (!in_dev)
 			return 0;

-		/* we are supposed to accept bcast packets */
-		if (skb->pkt_type == PACKET_MULTICAST) {
-			ours = ip_check_mc_rcu(in_dev, iph->daddr, iph->saddr,
-					       iph->protocol);
-			if (!ours)
-				return 0;
-		}
+		ours = ip_check_mc_rcu(in_dev, iph->daddr, iph->saddr,
+				       iph->protocol);
+		if (!ours)
+			return 0;

 		sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
 						   uh->source, iph->saddr,
-- 
2.13.6

^ permalink raw reply related

* Re: [PATCH 2/4] ravb: Add optional PHY reset during system resume
From: Sergei Shtylyov @ 2017-10-09 12:50 UTC (permalink / raw)
  To: Florian Fainelli, Geert Uytterhoeven
  Cc: Geert Uytterhoeven, David S . Miller, Simon Horman, Magnus Damm,
	Andrew Lunn, Niklas Söderlund, netdev@vger.kernel.org,
	Linux-Renesas, devicetree@vger.kernel.org
In-Reply-To: <316a7f99-d2ca-cdb2-2eca-deb044085c52@cogentembedded.com>

[-- Attachment #1: Type: text/plain, Size: 2880 bytes --]

On 10/09/2017 12:37 PM, Sergei Shtylyov wrote:

>>>>>>> If the optional "reset-gpios" property is specified in DT, the generic
>>>>>>> MDIO bus code takes care of resetting the PHY during device probe.
>>>>>>> However, the PHY may still have to be reset explicitly after system
>>>>>>> resume.
>>>>>>>
>>>>>>> This allows to restore Ethernet operation after resume from s2ram on
>>>>>>> Salvator-XS, where the enable pin of the regulator providing PHY power
>>>>>>> is connected to PRESETn, and PSCI suspend powers down the SoC.
>>>>>>>
>>>>>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>>>>>>> ---
>>>>>>>    drivers/net/ethernet/renesas/ravb_main.c | 9 +++++++++
>>>>>>>    1 file changed, 9 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c
>>>>>>> b/drivers/net/ethernet/renesas/ravb_main.c
>>>>>>> index fdf30bfa403bf416..96d1d48e302f8c9a 100644
>>>>>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>>>>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>> [...]
>>>>>>> @@ -2302,6 +2304,13 @@ static int __maybe_unused ravb_resume(struct
>>>>>>> device *dev)he patches
>>>>>>>          * reopen device if it was running before system suspended.
>>>>>>>          */
>>>>>>>
>>>>>>> +     /* PHY reset */
>>>>>>> +     if (bus->reset_gpiod) {
>>>>>>> +             gpiod_set_value_cansleep(bus->reset_gpiod, 1);
>>>>>>> +             udelay(bus->reset_delay_us);
>>>>>>> +             gpiod_set_value_cansleep(bus->reset_gpiod, 0);
>>>>>>> +     }
>>>>>>
>>>>>> This is a clever hack, but unfortunately this is also misusing the MDIO
>>>>>> bus reset line into a PHY reset line. As commented in patch 3, if this
>>>>>> reset line is tied to the PHY, then this should be a PHY property and
>>>>>
>>>>> OK.
>>>>>
>>>>>> you cannot (ab)use the MDIO bus GPIO reset logic anymore...
>>>>>
>>>>> And then I should add reset-gpios support to drivers/net/phy/micrel.c?
>>>>> Or is there already generic code to handle per-PHY reset? I couldn't
>>>>> find it.
>>>>
>>>> There is not such a thing unfortunately, but it would presumably be
>>>
>>>     It's strange you don't remember about my (abandoned) patches to
>>> handle per=PHY reset GPIOs -- perhaps it's time to unearth them. Here
>>> they are:
>>>
>>> http://patchwork.ozlabs.org/patch/616495/
>>> http://patchwork.ozlabs.org/patch/616501/
>>>
>>>     I had v3 in the works before abandoning this series -- it doesn't
>>> apply now.
>>
>> Should Geert pick-up where you left and address the feedback given in
>> v2, or do you plan to post a rebased v3?
> 
>     I was going to address the rejects in v3 and give the patchset to Geert... 
> unfortunately, this took so-o-o long. I'm going to do it today, at last.

    Here you are! Attaching both the patches against net-next.git...
Perhaps I also will be able to return to this subject in the near future...

MBR, Sergei

[-- Attachment #2: phylib-add-device-reset-GPIO-support-v3.patch --]
[-- Type: text/x-patch, Size: 11599 bytes --]

Subject: phylib: add device reset GPIO support

The PHY devices sometimes do have their reset signal (maybe even power
supply?) tied to some GPIO and sometimes it also does happen that a boot
loader does not leave it deasserted. So far this issue has been attacked
from (as I believe) a wrong angle: by teaching the MAC driver to manipulate
the GPIO in question; that solution, when applied to the device trees, led
to adding the PHY reset GPIO properties to the MAC device node, with one
exception: Cadence MACB driver which could handle the "reset-gpios" prop
in a PHY device subnode. I believe that the correct approach is to teach
the 'phylib' to get the MDIO device reset GPIO from the device tree node
corresponding to this device -- which this patch is doing...

Note that I had to modify the AT803x PHY driver as it would stop working
otherwise -- it made use of the reset GPIO for its own purposes...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
Changes in version 3:
- fixed the fwnode_get_named_gpiod() call due to the added parameters (which
  allowed to eliminate gpiod_direction_output() call);
- undeleted one blank line in the AT803x driver;
- resolved rejects, refreshed the patch;
- reworded/reformatted the changelog.

Changes in version 2:
- reformatted the changelog;
- resolved rejects, refreshed the patch.

 Documentation/devicetree/bindings/net/phy.txt |    2 +
 drivers/net/phy/at803x.c                      |   18 ++------------
 drivers/net/phy/mdio_bus.c                    |    4 +++
 drivers/net/phy/mdio_device.c                 |   27 +++++++++++++++++++--
 drivers/net/phy/phy_device.c                  |   33 ++++++++++++++++++++++++--
 drivers/of/of_mdio.c                          |   16 ++++++++++++
 include/linux/mdio.h                          |    3 ++
 include/linux/phy.h                           |    5 +++
 8 files changed, 89 insertions(+), 19 deletions(-)

Index: net-next/Documentation/devicetree/bindings/net/phy.txt
===================================================================
--- net-next.orig/Documentation/devicetree/bindings/net/phy.txt
+++ net-next/Documentation/devicetree/bindings/net/phy.txt
@@ -53,6 +53,8 @@ Optional Properties:
   to ensure the integrated PHY is used. The absence of this property indicates
   the muxers should be configured so that the external PHY is used.
 
+- reset-gpios: The GPIO phandle and specifier for the PHY reset signal.
+
 Example:
 
 ethernet-phy@0 {
Index: net-next/drivers/net/phy/at803x.c
===================================================================
--- net-next.orig/drivers/net/phy/at803x.c
+++ net-next/drivers/net/phy/at803x.c
@@ -71,7 +71,6 @@ MODULE_LICENSE("GPL");
 
 struct at803x_priv {
 	bool phy_reset:1;
-	struct gpio_desc *gpiod_reset;
 };
 
 struct at803x_context {
@@ -254,22 +253,11 @@ static int at803x_probe(struct phy_devic
 {
 	struct device *dev = &phydev->mdio.dev;
 	struct at803x_priv *priv;
-	struct gpio_desc *gpiod_reset;
 
 	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
 	if (!priv)
 		return -ENOMEM;
 
-	if (phydev->drv->phy_id != ATH8030_PHY_ID)
-		goto does_not_require_reset_workaround;
-
-	gpiod_reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW);
-	if (IS_ERR(gpiod_reset))
-		return PTR_ERR(gpiod_reset);
-
-	priv->gpiod_reset = gpiod_reset;
-
-does_not_require_reset_workaround:
 	phydev->priv = priv;
 
 	return 0;
@@ -343,14 +331,14 @@ static void at803x_link_change_notify(st
 	 * cannot recover from by software.
 	 */
 	if (phydev->state == PHY_NOLINK) {
-		if (priv->gpiod_reset && !priv->phy_reset) {
+		if (phydev->mdio.reset && !priv->phy_reset) {
 			struct at803x_context context;
 
 			at803x_context_save(phydev, &context);
 
-			gpiod_set_value(priv->gpiod_reset, 1);
+			phy_device_reset(phydev, 1);
 			msleep(1);
-			gpiod_set_value(priv->gpiod_reset, 0);
+			phy_device_reset(phydev, 0);
 			msleep(1);
 
 			at803x_context_restore(phydev, &context);
Index: net-next/drivers/net/phy/mdio_bus.c
===================================================================
--- net-next.orig/drivers/net/phy/mdio_bus.c
+++ net-next/drivers/net/phy/mdio_bus.c
@@ -38,6 +38,7 @@
 #include <linux/phy.h>
 #include <linux/io.h>
 #include <linux/uaccess.h>
+#include <linux/gpio/consumer.h>
 
 #include <asm/irq.h>
 
@@ -420,6 +421,9 @@ void mdiobus_unregister(struct mii_bus *
 		if (!mdiodev)
 			continue;
 
+		if (mdiodev->reset)
+			gpiod_put(mdiodev->reset);
+
 		mdiodev->device_remove(mdiodev);
 		mdiodev->device_free(mdiodev);
 	}
Index: net-next/drivers/net/phy/mdio_device.c
===================================================================
--- net-next.orig/drivers/net/phy/mdio_device.c
+++ net-next/drivers/net/phy/mdio_device.c
@@ -12,6 +12,8 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/errno.h>
+#include <linux/gpio.h>
+#include <linux/gpio/consumer.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
@@ -114,6 +116,13 @@ void mdio_device_remove(struct mdio_devi
 }
 EXPORT_SYMBOL(mdio_device_remove);
 
+void mdio_device_reset(struct mdio_device *mdiodev, int value)
+{
+	if (mdiodev->reset)
+		gpiod_set_value(mdiodev->reset, value);
+}
+EXPORT_SYMBOL(mdio_device_reset);
+
 /**
  * mdio_probe - probe an MDIO device
  * @dev: device to probe
@@ -128,9 +137,16 @@ static int mdio_probe(struct device *dev
 	struct mdio_driver *mdiodrv = to_mdio_driver(drv);
 	int err = 0;
 
-	if (mdiodrv->probe)
+	if (mdiodrv->probe) {
+		/* Deassert the reset signal */
+		mdio_device_reset(mdiodev, 0);
+
 		err = mdiodrv->probe(mdiodev);
 
+		/* Assert the reset signal */
+		mdio_device_reset(mdiodev, 1);
+	}
+
 	return err;
 }
 
@@ -140,9 +156,16 @@ static int mdio_remove(struct device *de
 	struct device_driver *drv = mdiodev->dev.driver;
 	struct mdio_driver *mdiodrv = to_mdio_driver(drv);
 
-	if (mdiodrv->remove)
+	if (mdiodrv->remove) {
+		/* Deassert the reset signal */
+		mdio_device_reset(mdiodev, 0);
+
 		mdiodrv->remove(mdiodev);
 
+		/* Assert the reset signal */
+		mdio_device_reset(mdiodev, 1);
+	}
+
 	return 0;
 }
 
Index: net-next/drivers/net/phy/phy_device.c
===================================================================
--- net-next.orig/drivers/net/phy/phy_device.c
+++ net-next/drivers/net/phy/phy_device.c
@@ -632,6 +632,9 @@ int phy_device_register(struct phy_devic
 	if (err)
 		return err;
 
+	/* Deassert the reset signal */
+	phy_device_reset(phydev, 0);
+
 	/* Run all of the fixups for this PHY */
 	err = phy_scan_fixups(phydev);
 	if (err) {
@@ -647,9 +650,15 @@ int phy_device_register(struct phy_devic
 		goto out;
 	}
 
+	/* Assert the reset signal */
+	phy_device_reset(phydev, 1);
+
 	return 0;
 
  out:
+	/* Assert the reset signal */
+	phy_device_reset(phydev, 1);
+
 	mdiobus_unregister_device(&phydev->mdio);
 	return err;
 }
@@ -849,6 +858,9 @@ int phy_init_hw(struct phy_device *phyde
 {
 	int ret = 0;
 
+	/* Deassert the reset signal */
+	phy_device_reset(phydev, 0);
+
 	if (!phydev->drv || !phydev->drv->config_init)
 		return 0;
 
@@ -1126,6 +1138,9 @@ void phy_detach(struct phy_device *phyde
 	put_device(&phydev->mdio.dev);
 	if (ndev_owner != bus->owner)
 		module_put(bus->owner);
+
+	/* Assert the reset signal */
+	phy_device_reset(phydev, 1);
 }
 EXPORT_SYMBOL(phy_detach);
 
@@ -1811,9 +1826,16 @@ static int phy_probe(struct device *dev)
 	/* Set the state to READY by default */
 	phydev->state = PHY_READY;
 
-	if (phydev->drv->probe)
+	if (phydev->drv->probe) {
+		/* Deassert the reset signal */
+		phy_device_reset(phydev, 0);
+
 		err = phydev->drv->probe(phydev);
 
+		/* Assert the reset signal */
+		phy_device_reset(phydev, 1);
+	}
+
 	mutex_unlock(&phydev->lock);
 
 	return err;
@@ -1829,8 +1851,15 @@ static int phy_remove(struct device *dev
 	phydev->state = PHY_DOWN;
 	mutex_unlock(&phydev->lock);
 
-	if (phydev->drv && phydev->drv->remove)
+	if (phydev->drv && phydev->drv->remove) {
+		/* Deassert the reset signal */
+		phy_device_reset(phydev, 0);
+
 		phydev->drv->remove(phydev);
+
+		/* Assert the reset signal */
+		phy_device_reset(phydev, 1);
+	}
 	phydev->drv = NULL;
 
 	return 0;
Index: net-next/drivers/of/of_mdio.c
===================================================================
--- net-next.orig/drivers/of/of_mdio.c
+++ net-next/drivers/of/of_mdio.c
@@ -47,6 +47,7 @@ static int of_get_phy_id(struct device_n
 static void of_mdiobus_register_phy(struct mii_bus *mdio,
 				    struct device_node *child, u32 addr)
 {
+	struct gpio_desc *gpiod;
 	struct phy_device *phy;
 	bool is_c45;
 	int rc;
@@ -55,10 +56,16 @@ static void of_mdiobus_register_phy(stru
 	is_c45 = of_device_is_compatible(child,
 					 "ethernet-phy-ieee802.3-c45");
 
+	/* Deassert the reset signal */
+	gpiod = fwnode_get_named_gpiod(&child->fwnode, "reset-gpios", 0,
+				       GPIOD_OUT_LOW, "PHY reset");
 	if (!is_c45 && !of_get_phy_id(child, &phy_id))
 		phy = phy_device_create(mdio, addr, phy_id, 0, NULL);
 	else
 		phy = get_phy_device(mdio, addr, is_c45);
+	/* Assert the reset signal again */
+	if (!IS_ERR(gpiod))
+		gpiod_set_value(gpiod, 1);
 	if (IS_ERR(phy))
 		return;
 
@@ -78,6 +85,9 @@ static void of_mdiobus_register_phy(stru
 	of_node_get(child);
 	phy->mdio.dev.of_node = child;
 
+	if (!IS_ERR(gpiod))
+		phy->mdio.reset = gpiod;
+
 	/* All data is now stored in the phy struct;
 	 * register it */
 	rc = phy_device_register(phy);
@@ -95,6 +105,7 @@ static void of_mdiobus_register_device(s
 				       struct device_node *child, u32 addr)
 {
 	struct mdio_device *mdiodev;
+	struct gpio_desc *gpiod;
 	int rc;
 
 	mdiodev = mdio_device_create(mdio, addr);
@@ -107,6 +118,11 @@ static void of_mdiobus_register_device(s
 	of_node_get(child);
 	mdiodev->dev.of_node = child;
 
+	gpiod = fwnode_get_named_gpiod(&child->fwnode, "reset-gpios", 0,
+				       GPIOD_ASIS, "PHY reset");
+	if (!IS_ERR(gpiod))
+		mdiodev->reset = gpiod;
+
 	/* All data is now stored in the mdiodev struct; register it. */
 	rc = mdio_device_register(mdiodev);
 	if (rc) {
Index: net-next/include/linux/mdio.h
===================================================================
--- net-next.orig/include/linux/mdio.h
+++ net-next/include/linux/mdio.h
@@ -12,6 +12,7 @@
 #include <uapi/linux/mdio.h>
 #include <linux/mod_devicetable.h>
 
+struct gpio_desc;
 struct mii_bus;
 
 /* Multiple levels of nesting are possible. However typically this is
@@ -39,6 +40,7 @@ struct mdio_device {
 	/* Bus address of the MDIO device (0-31) */
 	int addr;
 	int flags;
+	struct gpio_desc *reset;
 };
 #define to_mdio_device(d) container_of(d, struct mdio_device, dev)
 
@@ -71,6 +73,7 @@ void mdio_device_free(struct mdio_device
 struct mdio_device *mdio_device_create(struct mii_bus *bus, int addr);
 int mdio_device_register(struct mdio_device *mdiodev);
 void mdio_device_remove(struct mdio_device *mdiodev);
+void mdio_device_reset(struct mdio_device *mdiodev, int value);
 int mdio_driver_register(struct mdio_driver *drv);
 void mdio_driver_unregister(struct mdio_driver *drv);
 int mdio_device_bus_match(struct device *dev, struct device_driver *drv);
Index: net-next/include/linux/phy.h
===================================================================
--- net-next.orig/include/linux/phy.h
+++ net-next/include/linux/phy.h
@@ -847,6 +847,11 @@ static inline int phy_read_status(struct
 	return phydev->drv->read_status(phydev);
 }
 
+static inline void phy_device_reset(struct phy_device *phydev, int value)
+{
+	mdio_device_reset(&phydev->mdio, value);
+}
+
 #define phydev_err(_phydev, format, args...)	\
 	dev_err(&_phydev->mdio.dev, format, ##args)
 

[-- Attachment #3: macb-kill-PHY-reset-code-v3.patch --]
[-- Type: text/x-patch, Size: 2800 bytes --]

Subject: macb: kill PHY reset code

With  the phylib now  being aware of the "reset-gpios"  PHY node property,
there should  be no need to frob the PHY  reset in this driver anymore...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
Changes in version 3:
- resolved rejects due to the file being renamed, refreshed the patch;
- added the code to reset PHY on a probe error;
- edited the patch description.

 drivers/net/ethernet/cadence/macb.h      |    1 -
 drivers/net/ethernet/cadence/macb_main.c |   21 ---------------------
 2 files changed, 22 deletions(-)

Index: net-next/drivers/net/ethernet/cadence/macb.h
===================================================================
--- net-next.orig/drivers/net/ethernet/cadence/macb.h
+++ net-next/drivers/net/ethernet/cadence/macb.h
@@ -1032,7 +1032,6 @@ struct macb {
 	unsigned int		dma_burst_length;
 
 	phy_interface_t		phy_interface;
-	struct gpio_desc	*reset_gpio;
 
 	/* AT91RM9200 transmit */
 	struct sk_buff *skb;			/* holds skb until xmit interrupt completes */
Index: net-next/drivers/net/ethernet/cadence/macb_main.c
===================================================================
--- net-next.orig/drivers/net/ethernet/cadence/macb_main.c
+++ net-next/drivers/net/ethernet/cadence/macb_main.c
@@ -3393,7 +3393,6 @@ static int macb_probe(struct platform_de
 					      = macb_config->clk_init;
 	int (*init)(struct platform_device *) = macb_config->init;
 	struct device_node *np = pdev->dev.of_node;
-	struct device_node *phy_node;
 	struct clk *pclk, *hclk = NULL, *tx_clk = NULL, *rx_clk = NULL;
 	unsigned int queue_mask, num_queues;
 	struct macb_platform_data *pdata;
@@ -3499,18 +3498,6 @@ static int macb_probe(struct platform_de
 	else
 		macb_get_hwaddr(bp);
 
-	/* Power up the PHY if there is a GPIO reset */
-	phy_node =  of_get_next_available_child(np, NULL);
-	if (phy_node) {
-		int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
-
-		if (gpio_is_valid(gpio)) {
-			bp->reset_gpio = gpio_to_desc(gpio);
-			gpiod_direction_output(bp->reset_gpio, 1);
-		}
-	}
-	of_node_put(phy_node);
-
 	err = of_get_phy_mode(np);
 	if (err < 0) {
 		pdata = dev_get_platdata(&pdev->dev);
@@ -3554,10 +3541,6 @@ err_out_unregister_mdio:
 	mdiobus_unregister(bp->mii_bus);
 	mdiobus_free(bp->mii_bus);
 
-	/* Shutdown the PHY if there is a GPIO reset */
-	if (bp->reset_gpio)
-		gpiod_set_value(bp->reset_gpio, 0);
-
 err_out_free_netdev:
 	free_netdev(dev);
 
@@ -3585,10 +3568,6 @@ static int macb_remove(struct platform_d
 		dev->phydev = NULL;
 		mdiobus_free(bp->mii_bus);
 
-		/* Shutdown the PHY if there is a GPIO reset */
-		if (bp->reset_gpio)
-			gpiod_set_value(bp->reset_gpio, 0);
-
 		unregister_netdev(dev);
 		clk_disable_unprepare(bp->tx_clk);
 		clk_disable_unprepare(bp->hclk);

^ permalink raw reply

* Re: [PATCH v2] netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
From: Daniel Borkmann @ 2017-10-09 12:35 UTC (permalink / raw)
  To: Shmulik Ladkani, Pablo Neira Ayuso, netfilter-devel
  Cc: netdev, Rafael Buchbinder, Shmulik Ladkani, Willem de Bruijn
In-Reply-To: <20171009122715.19010-1-shmulik@nsof.io>

On 10/09/2017 02:27 PM, Shmulik Ladkani wrote:
> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>
> Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
> support for attaching an eBPF object by an fd, with the
> 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
> IPT_SO_SET_REPLACE call.
>
> However this breaks subsequent iptables calls:
>
>   # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
>   # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
>   iptables: Invalid argument. Run `dmesg' for more information.
>
> That's because iptables works by loading exising rules using
> IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
> the replacement set.
>
> However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
> (from the initial "iptables -m bpf" invocation) - so when 2nd invocation
> occurs, userspace passes a bogus fd number, which leads to
> 'bpf_mt_check_v1' to fail.
>
> One suggested solution [1] was to hack iptables userspace, to perform a
> "entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
> process-local fd per every 'xt_bpf_info_v1' entry seen.
>
> However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
> depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.
>
> This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
> '.fd' and instead perform an in-kernel lookup for the bpf object given
> the provided '.path'.
>
> It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
> XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
> expected to provide the path of the pinned object.
>
> Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.
>
> References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
>              [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: Willem de Bruijn <willemb@google.com>
> Reported-by: Rafael Buchbinder <rafi@rbk.ms>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH v2] netlink: do not set cb_running if dump's start() errs
From: Johannes Berg @ 2017-10-09 12:31 UTC (permalink / raw)
  To: Jason A. Donenfeld, davem, Netdev, linux-kernel
In-Reply-To: <20171009121451.26815-1-Jason@zx2c4.com>

On Mon, 2017-10-09 at 14:14 +0200, Jason A. Donenfeld wrote:
> It turns out that multiple places can call netlink_dump(), which
> means
> it's still possible to dereference partially initialized values in
> dump() that were the result of a faulty returned start().
> 
> This fixes the issue by calling start() _before_ setting cb_running
> to
> true, so that there's no chance at all of hitting the dump() function
> through any indirect paths.
> 
> It also moves the call to start() to be when the mutex is held. This
> has
> the nice side effect of serializing invocations to start(), which is
> likely desirable anyway. It also prevents any possible other races
> that
> might come out of this logic.

I'm not necessarily sure it's _nice_, but I do think it doesn't matter,
so that's just splitting hairs. If you do have a genl family with
parallel_ops, you'd better be prepared to handle parallel things, and
then this could also be in parallel :-)

> In testing this with several different pieces of tricky code to
> trigger
> these issues, this commit fixes all avenues that I'm aware of.
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>

johannes

^ permalink raw reply

* Re: [PATCH] netlink: do not set cb_running if dump's start() errs
From: Johannes Berg @ 2017-10-09 12:27 UTC (permalink / raw)
  To: Jason A. Donenfeld, davem, Netdev, linux-kernel
In-Reply-To: <1507550326.26041.39.camel@sipsolutions.net>

Just decided to take another look:

On Mon, 2017-10-09 at 13:58 +0200, Johannes Berg wrote:
> On Mon, 2017-10-09 at 13:56 +0200, Jason A. Donenfeld wrote:
> 
> > @@ -2266,16 +2266,17 @@ int __netlink_dump_start(struct sock *ssk,
> > struct sk_buff *skb,
> >  	cb->min_dump_alloc = control->min_dump_alloc;
> >  	cb->skb = skb;
> >  
> > +	if (cb->start) {
> > +		ret = cb->start(cb);
> > +		if (ret)
> > +			goto error_unlock;
> > +	}
> > +
> >  	nlk->cb_running = true;
> >  
> >  	mutex_unlock(nlk->cb_mutex);
> 
> Hmm. Now start is invoked with the mutex held, I'm not sure it
> actually _matters_, but that should probably be reviewed and
> mentioned in the commit log?

It sort of seems designed to run ->start outside the lock, otherwise we
wouldn't really have to acquire it again in netlink_dump() but could
just keep it across the call (with some locking changes in
netlink_recvmsg())?

Then again, clearly none of the (few) existing users actually care.

Btw - we should (separately) also remove "start" from struct
netlink_callback, it's only ever used within this function and we can
use control->start instead of cb->start here.

johannes

^ permalink raw reply

* [PATCH v2] netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
From: Shmulik Ladkani @ 2017-10-09 12:27 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel
  Cc: netdev, Daniel Borkmann, shmulik, Rafael Buchbinder,
	Shmulik Ladkani, Willem de Bruijn

From: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Commit 2c16d6033264 ("netfilter: xt_bpf: support ebpf") introduced
support for attaching an eBPF object by an fd, with the
'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
IPT_SO_SET_REPLACE call.

However this breaks subsequent iptables calls:

 # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
 # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
 iptables: Invalid argument. Run `dmesg' for more information.

That's because iptables works by loading exising rules using
IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
the replacement set.

However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
(from the initial "iptables -m bpf" invocation) - so when 2nd invocation
occurs, userspace passes a bogus fd number, which leads to
'bpf_mt_check_v1' to fail.

One suggested solution [1] was to hack iptables userspace, to perform a
"entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
process-local fd per every 'xt_bpf_info_v1' entry seen.

However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.

This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
'.fd' and instead perform an in-kernel lookup for the bpf object given
the provided '.path'.

It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
expected to provide the path of the pinned object.

Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.

References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
            [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: Rafael Buchbinder <rafi@rbk.ms>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
v2: Fix xt_bpf.c compilation if CONFIG_BPF_SYSCALL=n
---
 include/linux/bpf.h                   |  5 +++++
 include/uapi/linux/netfilter/xt_bpf.h |  1 +
 kernel/bpf/inode.c                    |  1 +
 net/netfilter/xt_bpf.c                | 22 ++++++++++++++++++++--
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a67daea731ab..a49b321a1262 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -407,6 +407,11 @@ static inline void __bpf_prog_uncharge(struct user_struct *user, u32 pages)
 {
 }
 
+static inline int bpf_obj_get_user(const char __user *pathname)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline struct net_device  *__dev_map_lookup_elem(struct bpf_map *map,
 						       u32 key)
 {
diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h
index b97725af2ac0..da161b56c79e 100644
--- a/include/uapi/linux/netfilter/xt_bpf.h
+++ b/include/uapi/linux/netfilter/xt_bpf.h
@@ -23,6 +23,7 @@ enum xt_bpf_modes {
 	XT_BPF_MODE_FD_PINNED,
 	XT_BPF_MODE_FD_ELF,
 };
+#define XT_BPF_MODE_PATH_PINNED XT_BPF_MODE_FD_PINNED
 
 struct xt_bpf_info_v1 {
 	__u16 mode;
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index e833ed914358..be1dde967208 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -363,6 +363,7 @@ int bpf_obj_get_user(const char __user *pathname)
 	putname(pname);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(bpf_obj_get_user);
 
 static void bpf_evict_inode(struct inode *inode)
 {
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
index 38986a95216c..29123934887b 100644
--- a/net/netfilter/xt_bpf.c
+++ b/net/netfilter/xt_bpf.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/syscalls.h>
 #include <linux/skbuff.h>
 #include <linux/filter.h>
 #include <linux/bpf.h>
@@ -49,6 +50,22 @@ static int __bpf_mt_check_fd(int fd, struct bpf_prog **ret)
 	return 0;
 }
 
+static int __bpf_mt_check_path(const char *path, struct bpf_prog **ret)
+{
+	mm_segment_t oldfs = get_fs();
+	int retval, fd;
+
+	set_fs(KERNEL_DS);
+	fd = bpf_obj_get_user(path);
+	set_fs(oldfs);
+	if (fd < 0)
+		return fd;
+
+	retval = __bpf_mt_check_fd(fd, ret);
+	sys_close(fd);
+	return retval;
+}
+
 static int bpf_mt_check(const struct xt_mtchk_param *par)
 {
 	struct xt_bpf_info *info = par->matchinfo;
@@ -66,9 +83,10 @@ static int bpf_mt_check_v1(const struct xt_mtchk_param *par)
 		return __bpf_mt_check_bytecode(info->bpf_program,
 					       info->bpf_program_num_elem,
 					       &info->filter);
-	else if (info->mode == XT_BPF_MODE_FD_PINNED ||
-		 info->mode == XT_BPF_MODE_FD_ELF)
+	else if (info->mode == XT_BPF_MODE_FD_ELF)
 		return __bpf_mt_check_fd(info->fd, &info->filter);
+	else if (info->mode == XT_BPF_MODE_PATH_PINNED)
+		return __bpf_mt_check_path(info->path, &info->filter);
 	else
 		return -EINVAL;
 }
-- 
2.14.2


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox