Netdev List
 help / color / mirror / Atom feed
* [Patch net 01/11] net: hns3: add error handler for hns3_nic_init_vector_data()
From: Huazhong Tan @ 2018-10-27  2:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	Huazhong Tan
In-Reply-To: <1540608118-27449-1-git-send-email-tanhuazhong@huawei.com>

When hns3_nic_init_vector_data() failed for mapping ring to vector,
it should cancel the netif_napi_add() that have been successfully done
and then exit.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 32f3aca8..d9066c5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2821,7 +2821,7 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv)
 	struct hnae3_handle *h = priv->ae_handle;
 	struct hns3_enet_tqp_vector *tqp_vector;
 	int ret = 0;
-	u16 i;
+	int i, j;
 
 	hns3_nic_set_cpumask(priv);
 
@@ -2868,13 +2868,19 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv)
 		hns3_free_vector_ring_chain(tqp_vector, &vector_ring_chain);
 
 		if (ret)
-			return ret;
+			goto map_ring_fail;
 
 		netif_napi_add(priv->netdev, &tqp_vector->napi,
 			       hns3_nic_common_poll, NAPI_POLL_WEIGHT);
 	}
 
 	return 0;
+
+map_ring_fail:
+	for (j = i - 1; j >= 0; j--)
+		netif_napi_del(&priv->tqp_vector[j].napi);
+
+	return ret;
 }
 
 static int hns3_nic_alloc_vector_data(struct hns3_nic_priv *priv)
-- 
2.7.4

^ permalink raw reply related

* [Patch net 06/11] net: hns3: bugfix for is_valid_csq_clean_head()
From: Huazhong Tan @ 2018-10-27  2:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	Huazhong Tan
In-Reply-To: <1540608118-27449-1-git-send-email-tanhuazhong@huawei.com>

The HEAD pointer of the hardware command queue maybe equal to the command
queue's next_to_use the driver, so that does not belong to the invalid
HEAD pointer, since the hardware may not process the command in time,
causing the HEAD pointer to be too late to update. The variables' name
in this function is unreadable, so give them a more readable one.

Fixes: 3ff504908f95 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index 68026a5..690f62e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -24,15 +24,15 @@ static int hclge_ring_space(struct hclge_cmq_ring *ring)
 	return ring->desc_num - used - 1;
 }
 
-static int is_valid_csq_clean_head(struct hclge_cmq_ring *ring, int h)
+static int is_valid_csq_clean_head(struct hclge_cmq_ring *ring, int head)
 {
-	int u = ring->next_to_use;
-	int c = ring->next_to_clean;
+	int ntu = ring->next_to_use;
+	int ntc = ring->next_to_clean;
 
-	if (unlikely(h >= ring->desc_num))
-		return 0;
+	if (ntu > ntc)
+		return head >= ntc && head <= ntu;
 
-	return u > c ? (h > c && h <= u) : (h > c || h <= u);
+	return head >= ntc || head <= ntu;
 }
 
 static int hclge_alloc_cmd_desc(struct hclge_cmq_ring *ring)
-- 
2.7.4

^ permalink raw reply related

* [Patch net 05/11] net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring()
From: Huazhong Tan @ 2018-10-27  2:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	Huazhong Tan
In-Reply-To: <1540608118-27449-1-git-send-email-tanhuazhong@huawei.com>

It is not necessary to reset the queue in the hns3_uninit_all_ring(),
since the queue is stopped in the down operation, and will be resetted
in the up operaton. And the judgment of the HCLGE_STATE_RST_HANDLING
flag in the hclge_reset_tqp() is not correct, because we need to reset
tqp during pf reset, otherwise it may cause queue not be resetted to
working state problem.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c         | 3 ---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 6f0fd62..a80ecfb 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -3240,9 +3240,6 @@ int hns3_uninit_all_ring(struct hns3_nic_priv *priv)
 	int i;
 
 	for (i = 0; i < h->kinfo.num_tqps; i++) {
-		if (h->ae_algo->ops->reset_queue)
-			h->ae_algo->ops->reset_queue(h, i);
-
 		hns3_fini_ring(priv->ring_data[i].ring);
 		hns3_fini_ring(priv->ring_data[i + h->kinfo.num_tqps].ring);
 	}
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 2a63147..4dd0506 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -6116,9 +6116,6 @@ void hclge_reset_tqp(struct hnae3_handle *handle, u16 queue_id)
 	u16 queue_gid;
 	int ret;
 
-	if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
-		return;
-
 	queue_gid = hclge_covert_handle_qid_global(handle, queue_id);
 
 	ret = hclge_tqp_enable(hdev, queue_id, 0, false);
-- 
2.7.4

^ permalink raw reply related

* [Patch net 07/11] net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
From: Huazhong Tan @ 2018-10-27  2:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	Huazhong Tan
In-Reply-To: <1540608118-27449-1-git-send-email-tanhuazhong@huawei.com>

When there is a PHY, the driver needs to complete some operations through
MDIO during reset reinitialization, so HCLGE_STATE_CMD_DISABLE is more
suitable than HCLGE_STATE_RST_HANDLING to prevent the MDIO operation from
being sent during the hardware reset.

Fixes: b50ae26c57cb ("net: hns3: never send command queue message to IMP when reset)
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
index 24b1f2a..0301863 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
@@ -52,7 +52,7 @@ static int hclge_mdio_write(struct mii_bus *bus, int phyid, int regnum,
 	struct hclge_desc desc;
 	int ret;
 
-	if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
+	if (test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state))
 		return 0;
 
 	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MDIO_CONFIG, false);
@@ -90,7 +90,7 @@ static int hclge_mdio_read(struct mii_bus *bus, int phyid, int regnum)
 	struct hclge_desc desc;
 	int ret;
 
-	if (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
+	if (test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state))
 		return 0;
 
 	hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MDIO_CONFIG, true);
-- 
2.7.4

^ permalink raw reply related

* [Patch net 04/11] net: hns3: bugfix for the initialization of command queue's spin lock
From: Huazhong Tan @ 2018-10-27  2:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	Huazhong Tan
In-Reply-To: <1540608118-27449-1-git-send-email-tanhuazhong@huawei.com>

The spin lock of the command queue only needs to be initialized once
when the driver initializes the command queue. It is not necessary to
initialize the spin lock when resetting. At the same time, the
modification of the queue member should be performed after acquiring
the lock.

Fixes: 3efb960f056d ("net: hns3: Refactor the initialization of command queue")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index ac13cb2..68026a5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -304,6 +304,10 @@ int hclge_cmd_queue_init(struct hclge_dev *hdev)
 {
 	int ret;
 
+	/* Setup the lock for command queue */
+	spin_lock_init(&hdev->hw.cmq.csq.lock);
+	spin_lock_init(&hdev->hw.cmq.crq.lock);
+
 	/* Setup the queue entries for use cmd queue */
 	hdev->hw.cmq.csq.desc_num = HCLGE_NIC_CMQ_DESC_NUM;
 	hdev->hw.cmq.crq.desc_num = HCLGE_NIC_CMQ_DESC_NUM;
@@ -337,18 +341,20 @@ int hclge_cmd_init(struct hclge_dev *hdev)
 	u32 version;
 	int ret;
 
+	spin_lock_bh(&hdev->hw.cmq.csq.lock);
+	spin_lock_bh(&hdev->hw.cmq.crq.lock);
+
 	hdev->hw.cmq.csq.next_to_clean = 0;
 	hdev->hw.cmq.csq.next_to_use = 0;
 	hdev->hw.cmq.crq.next_to_clean = 0;
 	hdev->hw.cmq.crq.next_to_use = 0;
 
-	/* Setup the lock for command queue */
-	spin_lock_init(&hdev->hw.cmq.csq.lock);
-	spin_lock_init(&hdev->hw.cmq.crq.lock);
-
 	hclge_cmd_init_regs(&hdev->hw);
 	clear_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state);
 
+	spin_unlock_bh(&hdev->hw.cmq.crq.lock);
+	spin_unlock_bh(&hdev->hw.cmq.csq.lock);
+
 	ret = hclge_cmd_query_firmware_version(&hdev->hw, &version);
 	if (ret) {
 		dev_err(&hdev->pdev->dev,
-- 
2.7.4

^ permalink raw reply related

* [Patch net 00/11] Bugfix for the HNS3 driver
From: Huazhong Tan @ 2018-10-27  2:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	Huazhong Tan

This patch series include bugfix for the HNS3 ethernet
controller driver.

Huazhong Tan (11):
  net: hns3: add error handler for hns3_nic_init_vector_data()
  net: hns3: add error handler for
    hns3_get_ring_config/hns3_queue_to_ring
  net: hns3: bugfix for reporting unknown vector0 interrupt repeatly
    problem
  net: hns3: bugfix for the initialization of command queue's spin lock
  net: hns3: remove unnecessary queue reset in the
    hns3_uninit_all_ring()
  net: hns3: bugfix for is_valid_csq_clean_head()
  net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read
  net: hns3: fix incorrect return value/type of some functions
  net: hns3: bugfix for handling mailbox while the command queue
    reinitialized
  net: hns3: bugfix for rtnl_lock's range in the hclge_reset()
  net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset()

 drivers/net/ethernet/hisilicon/hns3/hnae3.h        |   6 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 106 +++++++++++++++------
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h    |   2 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c |  26 +++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    |  40 ++++----
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |   2 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c |   6 ++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c    |   4 +-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  |  12 ++-
 9 files changed, 133 insertions(+), 71 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [Patch net 03/11] net: hns3: bugfix for reporting unknown vector0 interrupt repeatly problem
From: Huazhong Tan @ 2018-10-27  2:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	Huazhong Tan
In-Reply-To: <1540608118-27449-1-git-send-email-tanhuazhong@huawei.com>

The current driver supports handling two vector0 interrupts, reset and
mailbox. When the hardware reports an interrupt of another type of
interrupt source, if the driver does not process the interrupt and
enables the interrupt, the hardware will repeatedly report the unknown
interrupt.

Therefore, the driver enables the vector0 interrupt after clearing the
known type of interrupt source. Other conditions are not enabled.

Fixes: cd8c5c269b1d ("net: hns3: Fix for hclge_reset running repeatly problem")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 5234b53..2a63147 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2236,7 +2236,7 @@ static irqreturn_t hclge_misc_irq_handle(int irq, void *data)
 	}
 
 	/* clear the source of interrupt if it is not cause by reset */
-	if (event_cause != HCLGE_VECTOR0_EVENT_RST) {
+	if (event_cause == HCLGE_VECTOR0_EVENT_MBX) {
 		hclge_clear_event_cause(hdev, event_cause, clearval);
 		hclge_enable_vector(&hdev->misc_vector, true);
 	}
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net v2] net: udp: fix handling of CHECKSUM_COMPLETE packets
From: Maciej Żenczykowski @ 2018-10-27  1:53 UTC (permalink / raw)
  To: David Miller
  Cc: stranche, Eric Dumazet, Linux NetDev, samanthakumar, Eric Dumazet
In-Reply-To: <20181024.142105.1871980098995024381.davem@davemloft.net>

Possibly worth mentioning that without this fix you can also end up
with valid udp packets being dropped (ie. the reverse of the commit
description which talks about receiving invalid ones).

The (approximate?) requirement is:
(a) nic generates CHECKSUM_COMPLETE, but gets the actual checksum wrong
(b) udp packet is longer then 76 bytes L2 (CHECKSUM_BREAK), so we
don't calculate checksum inline prior to pulling udp header
(c) udp socket has no bpf filter, so we delay checksum until we do
copy to userspace from recvmsg()

76 bytes is 76 - 14 ether - 20 ipv4 - 8 udp header = 34 udp ipv4 payload bytes
On Wed, Oct 24, 2018 at 2:22 PM David Miller <davem@davemloft.net> wrote:
>
> From: Sean Tranchetti <stranche@codeaurora.org>
> Date: Tue, 23 Oct 2018 16:04:31 -0600
>
> > Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
> > incorrect for any packet that has an incorrect checksum value.
> >
> > udp4/6_csum_init() will both make a call to
> > __skb_checksum_validate_complete() to initialize/validate the csum
> > field when receiving a CHECKSUM_COMPLETE packet. When this packet
> > fails validation, skb->csum will be overwritten with the pseudoheader
> > checksum so the packet can be fully validated by software, but the
> > skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
> > the stack can later warn the user about their hardware spewing bad
> > checksums. Unfortunately, leaving the SKB in this state can cause
> > problems later on in the checksum calculation.
> >
> > Since the the packet is still marked as CHECKSUM_COMPLETE,
> > udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
> > from skb->csum instead of adding it, leaving us with a garbage value
> > in that field. Once we try to copy the packet to userspace in the
> > udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
> > to checksum the packet data and add it in the garbage skb->csum value
> > to perform our final validation check.
> >
> > Since the value we're validating is not the proper checksum, it's possible
> > that the folded value could come out to 0, causing us not to drop the
> > packet. Instead, we believe that the packet was checksummed incorrectly
> > by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
> > to warn the user with netdev_rx_csum_fault(skb->dev);
> >
> > Unfortunately, since this is the UDP path, skb->dev has been overwritten
> > by skb->dev_scratch and is no longer a valid pointer, so we end up
> > reading invalid memory.
>
> Just want to say that it has always been complicated in this area due to
> the fact that we do this deferral of final checksum validation to when we
> copy the packet into userspace.  For example, poll() needs to do special
> things, etc.
>
> Because we have to make it seem as if we dropped the packet with a bad
> checksum from the point of view of what the user sees during recvmsg()
> and poll() calls.  But until we do that checksum validation, we don't
> know exactly what the situation is.
>
> > This patch addresses this problem in two ways:
> >       1) Do not use the dev pointer when calling netdev_rx_csum_fault()
> >          from skb_copy_and_csum_datagram_msg(). Since this gets called
> >          from the UDP path where skb->dev has been overwritten, we have
> >          no way of knowing if the pointer is still valid. Also for the
> >          sake of consistency with the other uses of
> >          netdev_rx_csum_fault(), don't attempt to call it if the
> >          packet was checksummed by software.
> >
> >       2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
> >          If we receive a packet that's CHECKSUM_COMPLETE that fails
> >          verification (i.e. skb->csum_valid == 0), check who performed
> >          the calculation. It's possible that the checksum was done in
> >          software by the network stack earlier (such as Netfilter's
> >          CONNTRACK module), and if that says the checksum is bad,
> >          we can drop the packet immediately instead of waiting until
> >          we try and copy it to userspace. Otherwise, we need to
> >          mark the SKB as CHECKSUM_NONE, since the skb->csum field
> >          no longer contains the full packet checksum after the
> >          call to __skb_checksum_validate_complete().
> >
> > Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
>
> Can't count on my hands how many regressions are a result of that change and
> it's subtle side effects. :-/
>
> > Fixes: c84d949057ca ("udp: copy skb->truesize in the first cache line")
> > Cc: Sam Kumar <samanthakumar@google.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
>
> Applied and queued up for -stable, thank you.

^ permalink raw reply

* [net:master 17/19] net/bridge/br_multicast.c:1432:31: error: 'union <anonymous>' has no member named 'ip6'
From: kbuild test robot @ 2018-10-27  1:49 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: kbuild-all, netdev

[-- Attachment #1: Type: text/plain, Size: 1841 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
head:   aab456dfa404f3a16d6f1780e62a6a8533c4d008
commit: 5a2de63fd1a59c30c02526d427bc014b98adf508 [17/19] bridge: do not add port to router list when receives query with source 0.0.0.0
config: i386-randconfig-x0-10270816 (attached as .config)
compiler: gcc-5 (Debian 5.5.0-3) 5.4.1 20171010
reproduce:
        git checkout 5a2de63fd1a59c30c02526d427bc014b98adf508
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   net/bridge/br_multicast.c: In function 'br_multicast_query_received':
>> net/bridge/br_multicast.c:1432:31: error: 'union <anonymous>' has no member named 'ip6'
          !ipv6_addr_any(&saddr->u.ip6)))
                                  ^

vim +1432 net/bridge/br_multicast.c

  1414	
  1415	static void br_multicast_query_received(struct net_bridge *br,
  1416						struct net_bridge_port *port,
  1417						struct bridge_mcast_other_query *query,
  1418						struct br_ip *saddr,
  1419						unsigned long max_delay)
  1420	{
  1421		if (!br_multicast_select_querier(br, port, saddr))
  1422			return;
  1423	
  1424		br_multicast_update_query_timer(br, query, max_delay);
  1425	
  1426		/* Based on RFC4541, section 2.1.1 IGMP Forwarding Rules,
  1427		 * the arrival port for IGMP Queries where the source address
  1428		 * is 0.0.0.0 should not be added to router port list.
  1429		 */
  1430		if ((saddr->proto == htons(ETH_P_IP) && saddr->u.ip4) ||
  1431		    (saddr->proto == htons(ETH_P_IPV6) &&
> 1432		     !ipv6_addr_any(&saddr->u.ip6)))
  1433			br_multicast_mark_router(br, port);
  1434	}
  1435	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34058 bytes --]

^ permalink raw reply

* [Bug?] ss command display unix domain port overflow
From: Suoben @ 2018-10-27  1:34 UTC (permalink / raw)
  To: Stephen Hemminger, netdev@vger.kernel.org
  Cc: Wencongyang (UVP), sangxu, guijianfeng, Wanghui (John),
	chenminhua (A)

[-- Attachment #1: Type: text/plain, Size: 2618 bytes --]


> Dear all,
> 
> When l use ss command to display unix domain socket, the ss shows as below, the local address port and peer address port may overflow which looks like equal to the socket fd inode .
> 
> /**********
> # ss –apx
> Netid  State      Recv-Q Send-Q     Local Address:Port                                                                      Peer Address:Port
> u_str  ESTAB      0      0         /var/run/libvirt/libvirt-sock -1351067523                                                    * -1351073721           users:(("libvirtd",pid=2295,fd=39))
> u_str  ESTAB      0      0         /var/run/libvirt/libvirt-sock -1351121518                                                    * -1351223988           users:(("libvirtd",pid=2295,fd=21))
> u_str  ESTAB      0      0         /var/run/libvirt/libvirt-sock -1351245849                                                    * -1351128250           users:(("libvirtd",pid=2295,fd=40))
> u_str  ESTAB      0      0         /var/run/libvirt/libvirt-sock -1351042552                                                    * -1351050742           users:(("libvirtd",pid=2295,fd=51))
> 
> #ll /proc/2295/fd/51
> lrwx------ 1 root root 64 Oct 10 10:34 51 -> socket:[2943924744]
> 
> **********/
> 
> When I read the iproute source code., I find that the ‘lport’ and 
> ‘rport’ was defined as type ‘int’ in ‘struct sockstat’, which I think 
> should be ‘unsigned int’, also the ‘fd’ in ‘struct user_ent’ should 
> be.  And,
> 
> when printing the port, In function ‘static const char 
> *resolve_service(int port)’ uses “%u” to print tcp port, however, in 
> function ‘static void unix_stats_print(struct sockstat *s, struct 
> filter *f)’ uses “%d” to
> 
> print port when showing unix info, what is the problem I think.
> 
> Can you tell me if there are some reasons by using the type ‘int’ to define the port? Or is it a bug? Can I use “%u” to print the port when showing the unix socket info?
> 
> /**********
> static void unix_stats_print(struct sockstat *s, struct filter *f) {
>          char port_name[30] = {};
> 
>          sock_state_print(s);
> 
>          sock_addr_print(s->name ?: "*", " ",
>                             int_to_str(s->lport, port_name), NULL);
>          sock_addr_print(s->peer_name ?: "*", " ",
>                             int_to_str(s->rport, port_name), NULL);
> 
>          proc_ctx_print(s);
> }
> **********/
> 
> 
> Best regards,
> SangXu

Please  report problems to netdev@vger.kernel.org.
Yes, it looks like a bug.

[-- Attachment #2: 0001-ss-Use-u-to-print-port-when-show-unix-socket.patch --]
[-- Type: application/octet-stream, Size: 1539 bytes --]

From 777eba52309e229853a4dd08566e3df4980de740 Mon Sep 17 00:00:00 2001
From: sangxu <sangxu@huawei.com>
Date: Fri, 26 Oct 2018 16:34:43 +0800
Subject: [PATCH] ss:Use "%u" to print port when show unix socket

When using ss command to display unix domain socket, the ss shows
as below, the local address port and peer address port may
overflow which looks like equal to the socket fd inode.

Netid  State  Recv-Q Send-Q Local Address:Port Peer Address:Port
u_str  ESTAB  0      0  /var/run/libvirt/libvirt-sock -1351042552
 * -1351050742           users:(("libvirtd",pid=2295,fd=51))

lrwx------ 1 root root 64 Oct 10 10:34 51 -> socket:[2943924744]

As we can see, it uses "%u" to print port when printing tcp socket
in function "resolve_service". So, it seems that it should use "%u"
to pirnt port when showing unix socket info as well.

---
 misc/ss.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index f99b687..abee4e4 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -3540,10 +3540,11 @@ static void unix_stats_print(struct sockstat *s, struct filter *f)
 
 	sock_state_print(s);
 
-	sock_addr_print(s->name ?: "*", " ",
-			int_to_str(s->lport, port_name), NULL);
-	sock_addr_print(s->peer_name ?: "*", " ",
-			int_to_str(s->rport, port_name), NULL);
+	sprintf(port_name, "%u", s->lport);
+	sock_addr_print(s->name ?: "*", " ", port_name, NULL);
+
+	sprintf(port_name, "%u", s->rport);
+	sock_addr_print(s->peer_name ?: "*", " ", port_name, NULL);
 
 	proc_ctx_print(s);
 }
-- 
1.8.3.1


^ permalink raw reply related

* checksumming on non-local forward path
From: Jason A. Donenfeld @ 2018-10-27  0:58 UTC (permalink / raw)
  To: Netdev

Hey netdev,

In a protocol like wireguard, if a packet arrives on the other end of
the tunnel, and the crypto checks out, then we're absolutely certain
that the packet hasn't been modified in transit. In this case, there's
no useful information that validating the inner checksums of the
various headers would give us. Every byte of the packet has arrived
intact, and we're certain of it.

Therefore, you might think in a situation like this, before calling
netif_receive_skb or the like, we can just set ip_summed to
CHECKSUM_UNNECESSARY, csum_level to ~0, and feel glad that we're not
wasting cycles unnecessarily validating the checksum.

But what if the receiving computer forwards the packet on across a
real physical network? In that case, it's probably important that the
kernel that originally produced the packet in the first place ensures
it has a valid checksum before encrypting it and sending it through
the wireguard tunnel. That's fine, but it does mean that in the case
of the packet being locally received on the other end, and not
forwarded, the checksumming by the original sender was not needed and
was therefore wasteful.

What would you think of a flag on the receiving end like,
"CHECKSUM_INVALID_BUT_UNNECESSARY"? It would be treated as
CHECKSUM_UNNECESSARY in the case that the the packet is locally
received. But if the packet is going to be forwarded instead, then
skb_checksum_help is called on it before forwarding onward.

AFAICT, wireguard isn't the only thing that could benefit from this:
virtio is another case where it's not always necessary for the sender
to call skb_checksum_help, when the receiver could just do it
conditionally based on whether it's being forwarded.

Thoughts?

Regards,
Jason

^ permalink raw reply

* [net:master 17/19] net//bridge/br_multicast.c:1432:32: error: 'union <anonymous>' has no member named 'ip6'; did you mean 'ip4'?
From: kbuild test robot @ 2018-10-27  0:50 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: kbuild-all, netdev

[-- Attachment #1: Type: text/plain, Size: 2074 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
head:   aab456dfa404f3a16d6f1780e62a6a8533c4d008
commit: 5a2de63fd1a59c30c02526d427bc014b98adf508 [17/19] bridge: do not add port to router list when receives query with source 0.0.0.0
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 5a2de63fd1a59c30c02526d427bc014b98adf508
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   net//bridge/br_multicast.c: In function 'br_multicast_query_received':
>> net//bridge/br_multicast.c:1432:32: error: 'union <anonymous>' has no member named 'ip6'; did you mean 'ip4'?
          !ipv6_addr_any(&saddr->u.ip6)))
                                   ^~~
                                   ip4

vim +1432 net//bridge/br_multicast.c

  1414	
  1415	static void br_multicast_query_received(struct net_bridge *br,
  1416						struct net_bridge_port *port,
  1417						struct bridge_mcast_other_query *query,
  1418						struct br_ip *saddr,
  1419						unsigned long max_delay)
  1420	{
  1421		if (!br_multicast_select_querier(br, port, saddr))
  1422			return;
  1423	
  1424		br_multicast_update_query_timer(br, query, max_delay);
  1425	
  1426		/* Based on RFC4541, section 2.1.1 IGMP Forwarding Rules,
  1427		 * the arrival port for IGMP Queries where the source address
  1428		 * is 0.0.0.0 should not be added to router port list.
  1429		 */
  1430		if ((saddr->proto == htons(ETH_P_IP) && saddr->u.ip4) ||
  1431		    (saddr->proto == htons(ETH_P_IPV6) &&
> 1432		     !ipv6_addr_any(&saddr->u.ip6)))
  1433			br_multicast_mark_router(br, port);
  1434	}
  1435	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23983 bytes --]

^ permalink raw reply

* [PATCH net] net: bridge: remove ipv6 zero address check in mcast queries
From: Nikolay Aleksandrov @ 2018-10-27  9:07 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov, roopa, bridge, yinxu, liuhangbin, davem
In-Reply-To: <90c5f2fe-1743-6b17-2e44-eba58cdbbb35@cumulusnetworks.com>

Recently a check was added which prevents marking of routers with zero
source address, but for IPv6 that cannot happen as the relevant RFCs
actually forbid such packets:
RFC 2710 (MLDv1):
"To be valid, the Query message MUST
 come from a link-local IPv6 Source Address, be at least 24 octets
 long, and have a correct MLD checksum."

Same goes for RFC 3810.

And also it can be seen as a requirement in ipv6_mc_check_mld_query()
which is used by the bridge to validate the message before processing
it. Thus any queries with :: source address won't be processed anyway.
So just remove the check for zero IPv6 source address from the query
processing function.

Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 net/bridge/br_multicast.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 41cdafbf2ebe..6bac0d6b7b94 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1428,8 +1428,7 @@ static void br_multicast_query_received(struct net_bridge *br,
 	 * is 0.0.0.0 should not be added to router port list.
 	 */
 	if ((saddr->proto == htons(ETH_P_IP) && saddr->u.ip4) ||
-	    (saddr->proto == htons(ETH_P_IPV6) &&
-	     !ipv6_addr_any(&saddr->u.ip6)))
+	    saddr->proto == htons(ETH_P_IPV6))
 		br_multicast_mark_router(br, port);
 }
 
-- 
2.17.2

^ permalink raw reply related

* pull-request: bpf 2018-10-27
From: Daniel Borkmann @ 2018-10-27  0:07 UTC (permalink / raw)
  To: davem; +Cc: daniel, ast, netdev

Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix toctou race in BTF header validation, from Martin and Wenwen.

2) Fix devmap interface comparison in notifier call which was
   neglecting netns, from Taehee.

3) Several fixes in various places, for example, correcting direct
   packet access and helper function availability, from Daniel.

4) Fix BPF kselftest config fragment to include af_xdp and sockmap,
   from Naresh.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit 42d0f71c9b5fd48861d61cfc05c9e001f847c9d5:

  octeontx2-af: Use GFP_ATOMIC under spin lock (2018-10-25 11:36:29 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to d8fd9e106fbc291167ebb675ad69234597d0fd98:

  bpf: fix wrong helper enablement in cgroup local storage (2018-10-26 16:03:30 -0700)

----------------------------------------------------------------
Alexei Starovoitov (1):
      Merge branch 'pkt-access-fixes'

Daniel Borkmann (9):
      bpf: fix test suite to enable all unpriv program types
      bpf: disallow direct packet access for unpriv in cg_skb
      bpf: fix direct packet access for flow dissector progs
      bpf: fix cg_skb types to hint access type in may_access_direct_pkt_data
      bpf: fix direct packet write into pop/peek helpers
      bpf: fix leaking uninitialized memory on pop/peek helpers
      bpf: make direct packet write unclone more robust
      bpf: add bpf_jit_limit knob to restrict unpriv allocations
      bpf: fix wrong helper enablement in cgroup local storage

Martin Lau (1):
      bpf, btf: fix a missing check bug in btf_parse

Naresh Kamboju (1):
      selftests/bpf: add config fragments BPF_STREAM_PARSER and XDP_SOCKETS

Taehee Yoo (1):
      bpf: devmap: fix wrong interface selection in notifier_call

 Documentation/sysctl/net.txt                |  8 ++++
 include/linux/filter.h                      |  1 +
 kernel/bpf/btf.c                            | 58 +++++++++++++----------------
 kernel/bpf/core.c                           | 49 ++++++++++++++++++++++--
 kernel/bpf/devmap.c                         |  3 +-
 kernel/bpf/helpers.c                        |  2 -
 kernel/bpf/queue_stack_maps.c               |  2 +
 kernel/bpf/verifier.c                       | 13 +++++--
 net/core/filter.c                           | 21 +++++++++--
 net/core/sysctl_net_core.c                  | 10 ++++-
 tools/testing/selftests/bpf/config          |  2 +
 tools/testing/selftests/bpf/test_verifier.c | 15 +++++++-
 12 files changed, 133 insertions(+), 51 deletions(-)

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix wrong helper enablement in cgroup local storage
From: Alexei Starovoitov @ 2018-10-26 23:05 UTC (permalink / raw)
  To: Roman Gushchin; +Cc: Daniel Borkmann, ast@kernel.org, netdev@vger.kernel.org
In-Reply-To: <20181026225709.GA12467@tower.DHCP.thefacebook.com>

On Fri, Oct 26, 2018 at 10:57:18PM +0000, Roman Gushchin wrote:
> On Sat, Oct 27, 2018 at 12:49:02AM +0200, Daniel Borkmann wrote:
> > Commit cd3394317653 ("bpf: introduce the bpf_get_local_storage()
> > helper function") enabled the bpf_get_local_storage() helper also
> > for BPF program types where it does not make sense to use them.
> > 
> > They have been added both in sk_skb_func_proto() and sk_msg_func_proto()
> > even though both program types are not invoked in combination with
> > cgroups, and neither through BPF_PROG_RUN_ARRAY(). In the latter the
> > bpf_cgroup_storage_set() is set shortly before BPF program invocation.
> > 
> > Later, the helper bpf_get_local_storage() retrieves this prior set
> > up per-cpu pointer and hands the buffer to the BPF program. The map
> > argument in there solely retrieves the enum bpf_cgroup_storage_type
> > from a local storage map associated with the program and based on the
> > type returns either the global or per-cpu storage. However, there
> > is no specific association between the program's map and the actual
> > content in bpf_cgroup_storage[].
> > 
> > Meaning, any BPF program that would have been properly run from the
> > cgroup side through BPF_PROG_RUN_ARRAY() where bpf_cgroup_storage_set()
> > was performed, and that is later unloaded such that prog / maps are
> > teared down will cause a use after free if that pointer is retrieved
> > from programs that are not run through BPF_PROG_RUN_ARRAY() but have
> > the cgroup local storage helper enabled in their func proto.
> > 
> > Lets just remove it from the two sock_map program types to fix it.
> > Auditing through the types where this helper is enabled, it appears
> > that these are the only ones where it was mistakenly allowed.
> > 
> > Fixes: cd3394317653 ("bpf: introduce the bpf_get_local_storage() helper function")
> > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Roman Gushchin <guro@fb.com>
> > Acked-by: John Fastabend <john.fastabend@gmail.com>
> 
> 
> Acked-by: Roman Gushchin <guro@fb.com>

Applied, Thanks

^ permalink raw reply

* Re: [PATCH net] net/neigh: fix NULL deref in pneigh_dump_table()
From: David Miller @ 2018-10-26 23:04 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, dsahern
In-Reply-To: <20181026163327.219797-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Fri, 26 Oct 2018 09:33:27 -0700

> pneigh can have NULL device pointer, so we need to make
> neigh_master_filtered() and neigh_ifindex_filtered() more robust.
 ...
> Fixes: 6f52f80e8530 ("net/neigh: Extend dump filter to proxy neighbor dumps")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: David Ahern <dsahern@gmail.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH] net: allow traceroute with a specified interface in a vrf
From: David Miller @ 2018-10-26 23:03 UTC (permalink / raw)
  To: mmanning; +Cc: netdev
In-Reply-To: <20181026112435.12159-1-mmanning@vyatta.att-mail.com>

From: Mike Manning <mmanning@vyatta.att-mail.com>
Date: Fri, 26 Oct 2018 12:24:35 +0100

> Traceroute executed in a vrf succeeds if no device is given or if the
> vrf is given as the device, but fails if the interface is given as the
> device. This is for default UDP probes, it succeeds for TCP SYN or ICMP
> ECHO probes. As the skb bound dev is the interface and the sk dev is
> the vrf, sk lookup fails for ICMP_DEST_UNREACH and ICMP_TIME_EXCEEDED
> messages. The solution is for the secondary dev to be passed so that
> the interface is available for the device match to succeed, in the same
> way as is already done for non-error cases.
> 
> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] bridge: do not add port to router list when receives query with source 0.0.0.0
From: David Miller @ 2018-10-26 23:02 UTC (permalink / raw)
  To: liuhangbin; +Cc: netdev, nikolay, jiri, linus.luessing
In-Reply-To: <1540520923-17589-1-git-send-email-liuhangbin@gmail.com>

From: Hangbin Liu <liuhangbin@gmail.com>
Date: Fri, 26 Oct 2018 10:28:43 +0800

> Based on RFC 4541, 2.1.1.  IGMP Forwarding Rules
> 
>   The switch supporting IGMP snooping must maintain a list of
>   multicast routers and the ports on which they are attached.  This
>   list can be constructed in any combination of the following ways:
> 
>   a) This list should be built by the snooping switch sending
>      Multicast Router Solicitation messages as described in IGMP
>      Multicast Router Discovery [MRDISC].  It may also snoop
>      Multicast Router Advertisement messages sent by and to other
>      nodes.
> 
>   b) The arrival port for IGMP Queries (sent by multicast routers)
>      where the source address is not 0.0.0.0.
> 
> We should not add the port to router list when receives query with source
> 0.0.0.0.
> 
> Reported-by: Ying Xu <yinxu@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net V2 1/1] net/smc: fix smc_buf_unuse to use the lgr pointer
From: David Miller @ 2018-10-26 23:00 UTC (permalink / raw)
  To: ubraun; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl
In-Reply-To: <20181025112528.77307-1-ubraun@linux.ibm.com>

From: Ursula Braun <ubraun@linux.ibm.com>
Date: Thu, 25 Oct 2018 13:25:28 +0200

> From: Karsten Graul <kgraul@linux.ibm.com>
> 
> The pointer to the link group is unset in the smc connection structure
> right before the call to smc_buf_unuse. Provide the lgr pointer to
> smc_buf_unuse explicitly.
> And move the call to smc_lgr_schedule_free_work to the end of
> smc_conn_free.
> 
> Fixes: a6920d1d130c ("net/smc: handle unregistered buffers")
> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>

Applied and queued up for -stable, thanks!

^ permalink raw reply

* Re: [PATCH net] ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
From: David Miller @ 2018-10-26 22:58 UTC (permalink / raw)
  To: sbrivio; +Cc: yoshfuji, netdev
In-Reply-To: <7acb12cb7a35c7a5f9670fc5b1373610b4d5ed67.1540384081.git.sbrivio@redhat.com>

From: Stefano Brivio <sbrivio@redhat.com>
Date: Wed, 24 Oct 2018 14:37:21 +0200

> Commit a61bbcf28a8c ("[NET]: Store skb->timestamp as offset to a base
> timestamp") introduces a neighbour control buffer and zeroes it out in
> ndisc_rcv(), as ndisc_recv_ns() uses it.
> 
> Commit f2776ff04722 ("[IPV6]: Fix address/interface handling in UDP and
> DCCP, according to the scoping architecture.") introduces the usage of the
> IPv6 control buffer in protocol error handlers (e.g. inet6_iif() in
> present-day __udp6_lib_err()).
> 
> Now, with commit b94f1c0904da ("ipv6: Use icmpv6_notify() to propagate
> redirect, instead of rt6_redirect()."), we call protocol error handlers
> from ndisc_redirect_rcv(), after the control buffer is already stolen and
> some parts are already zeroed out. This implies that inet6_iif() on this
> path will always return zero.
> 
> This gives unexpected results on UDP socket lookup in __udp6_lib_err(), as
> we might actually need to match sockets for a given interface.
> 
> Instead of always claiming the control buffer in ndisc_rcv(), do that only
> when needed.
> 
> Fixes: b94f1c0904da ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Applied and queued up for -stable, thank you.

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix wrong helper enablement in cgroup local storage
From: Roman Gushchin @ 2018-10-26 22:57 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: ast@kernel.org, netdev@vger.kernel.org
In-Reply-To: <20181026224902.12455-1-daniel@iogearbox.net>

On Sat, Oct 27, 2018 at 12:49:02AM +0200, Daniel Borkmann wrote:
> Commit cd3394317653 ("bpf: introduce the bpf_get_local_storage()
> helper function") enabled the bpf_get_local_storage() helper also
> for BPF program types where it does not make sense to use them.
> 
> They have been added both in sk_skb_func_proto() and sk_msg_func_proto()
> even though both program types are not invoked in combination with
> cgroups, and neither through BPF_PROG_RUN_ARRAY(). In the latter the
> bpf_cgroup_storage_set() is set shortly before BPF program invocation.
> 
> Later, the helper bpf_get_local_storage() retrieves this prior set
> up per-cpu pointer and hands the buffer to the BPF program. The map
> argument in there solely retrieves the enum bpf_cgroup_storage_type
> from a local storage map associated with the program and based on the
> type returns either the global or per-cpu storage. However, there
> is no specific association between the program's map and the actual
> content in bpf_cgroup_storage[].
> 
> Meaning, any BPF program that would have been properly run from the
> cgroup side through BPF_PROG_RUN_ARRAY() where bpf_cgroup_storage_set()
> was performed, and that is later unloaded such that prog / maps are
> teared down will cause a use after free if that pointer is retrieved
> from programs that are not run through BPF_PROG_RUN_ARRAY() but have
> the cgroup local storage helper enabled in their func proto.
> 
> Lets just remove it from the two sock_map program types to fix it.
> Auditing through the types where this helper is enabled, it appears
> that these are the only ones where it was mistakenly allowed.
> 
> Fixes: cd3394317653 ("bpf: introduce the bpf_get_local_storage() helper function")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Roman Gushchin <guro@fb.com>
> Acked-by: John Fastabend <john.fastabend@gmail.com>


Acked-by: Roman Gushchin <guro@fb.com>

Thanks, Daniel!

^ permalink raw reply

* Re: ethernet "bus" number in DTS ?
From: Florian Fainelli @ 2018-10-26 22:57 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Joakim Tjernlund, linuxppc-dev@lists.ozlabs.org,
	netdev@vger.kernel.org, andrew@lunn.ch, robh
In-Reply-To: <20181024082239.5ee41017@naga.suse.cz>

On 10/23/18 11:22 PM, Michal Suchánek wrote:
> On Tue, 23 Oct 2018 11:20:36 -0700
> Florian Fainelli <f.fainelli@gmail.com> wrote:
> 
>> On 10/23/18 11:02 AM, Joakim Tjernlund wrote:
>>> On Tue, 2018-10-23 at 10:03 -0700, Florian Fainelli wrote:  
> 
>>>
>>> I also noted that using status = "disabled" didn't work either to
>>> create a fix name scheme. Even worse, all the eth I/F after gets
>>> renumbered. It seems to me there is value in having stability in
>>> eth I/F naming at boot. Then userspace(udev) can rename if need be. 
>>>
>>> Sure would like to known more about why this feature is not wanted ?
>>>
>>> I found
>>>   https://patchwork.kernel.org/patch/4122441/
>>> You quote policy as reason but surely it must be better to
>>> have something stable, connected to the hardware name, than
>>> semirandom naming?  
>>
>> If the Device Tree nodes are ordered by ascending base register
>> address, my understanding is that you get the same order as far as
>> platform_device creation goes, this may not be true in the future if
>> Rob decides to randomize that, but AFAICT this is still true. This
>> may not work well with status = disabled properties being inserted
>> here and there, but we have used that here and it has worked for as
>> far as I can remember doing it.
> 
> So this is unstable in several respects. First is changing the
> enabled/disabled status in the deivecetrees provided by the kernel.
> 
> Second is if you have hardware hotplug mechanism either by firmware or
> by loading device overlays.
> 
> Third is the case when the devicetree is not built as part of the
> kernel but is instead provided by firmware that initializes the
> low-level hardware details. Then the ordering by address is not
> guaranteed nor is that the same address will be used to access the same
> interface every time. There might be multiple ways to configure the
> hardware depending on firmware configuration and/or version.
> 
>  
>> Second, you might want to name network devices ethX, but what if I
>> want to name them ethernetX or fooX or barX? Should we be accepting a
>> mechanism in the kernel that would allow someone to name the
>> interfaces the way they want straight from a name being provided in
>> Device Tree?
> 
> Clearly if there is text Ethernet1 printed above the Ethernet port we
> should provide a mechanism to name the port Ethernet1 by default.

Yes, but then have a specific property that is flexible enough to
retrieve things like "vital product information". For DSA switches, we
have an optional "label" property which names the network device
directly like it would be found on the product's case. Ideally this
should be much more flexible such that it can point to the appropriate
node/firmware service/whatever to get that name, which is what some
people have been working on lately, see [1].

[1]: https://lkml.org/lkml/2018/8/14/1039

The point is don't re-purpose the aliases which is something that exists
within Device Tree to basically provide a shorter path to a specific set
of nodes for the boot program to mangle and muck with instead of having
to resolve a full path to a node. At least that is how I conceive it.

Now what complicates the matter is that some OSes like Linux tend to use
it to also generate seemingly stable index for peripherals that are
numbered by index such as SPI, I2C, etc buses, which is why we are
having this conversation here, see below for more.

> 
>>
>> Aliases are fine for providing relative stability within the Device
>> Tree itself and boot programs that might need to modify the Device
>> Tree (e.g: inserting MAC addresses) such that you don't have to
>> encode logic to search for nodes by compatible strings etc. but
>> outside of that use case, it seems to me that you can resolve every
>> naming decision in user-space.
> 
> However, this is pushing platform-specific knowledge to userspace. The
> way the Ethernet interface is attached and hence the device properties
> usable for identifying the device uniquely are platform-specific.

There is always going to be some amount of platform specific knowledge
to user-space, what matters is the level of abstraction that is
presented to you.

> 
> On the other hand, aliases are universal when provided. If they are
> good enough to assign a MAC address they are good enough to provide a
> stable default name.

We are not talking about the same aliases then. The special Device Tree
node named "aliases" is a way to create shorted Device Tree node paths,
it is not by any means an equivalent for what I would rather call a
"label", or "port name" or silk screen annotation on a PCB for instance.

> 
> I think this is indeed forcing the userspace to reinvent several wheels
> for no good reason.

udev or systemd will come up with some stable names for your network
device right off the bat. If you are deeply embedded maybe you don't
want those, but then use something like the full path in /sys to get
some stable names based on the SoC's topology.

> 
> What is the problem with adding the aliases?

aliases is IMHO the wrong tool for the job because it is too limiting,
you want it to be used to have Ethernet controller instance N to be
named "ethN", what if someone tomorrows says, no this is not good, I
want the aliases to automatically name my network devices
"ethernet-controllerN" or "blahblahN"? You see where I am getting with that?

And yes, I do realize that Linux has traditionally named Ethernet
adapters ethN, but also allows people to name interfaces just the way
they want even from within the drivers themselves.

Now imagine your platform uses ACPI, and there are no aliases there to
point a node with a shorter name, how we would go about naming your
Ethernet controller unless there is a specific VPD/label/sticker
property that can be somehow be retried?
-- 
Florian

^ permalink raw reply

* Re: KASAN: use-after-free Read in sctp_outq_select_transport
From: Xin Long @ 2018-10-27  7:33 UTC (permalink / raw)
  To: syzbot+56a40ceee5fb35932f4d
  Cc: davem, LKML, linux-sctp, Marcelo Ricardo Leitner, network dev,
	Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <00000000000073bd310578e29d8d@google.com>

On Tue, Oct 23, 2018 at 7:14 PM syzbot
<syzbot+56a40ceee5fb35932f4d@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    23469de647c4 Merge git://git.kernel.org/pub/scm/linux/kern..
> git tree:       net
> console output: https://syzkaller.appspot.com/x/log.txt?x=13580ce5400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b3f55cb3dfcc6c33
> dashboard link: https://syzkaller.appspot.com/bug?extid=56a40ceee5fb35932f4d
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+56a40ceee5fb35932f4d@syzkaller.appspotmail.com
>
> ==================================================================
> BUG: KASAN: use-after-free in sctp_outq_select_transport+0x77e/0x9a0
> net/sctp/outqueue.c:840
> Read of size 4 at addr ffff8801c5ff2614 by task syz-executor5/8275
>
> CPU: 0 PID: 8275 Comm: syz-executor5 Not tainted 4.19.0-rc8+ #152
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>   <IRQ>
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0x1c4/0x2b6 lib/dump_stack.c:113
>   print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
>   kasan_report_error mm/kasan/report.c:354 [inline]
>   kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
>   __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
>   sctp_outq_select_transport+0x77e/0x9a0 net/sctp/outqueue.c:840
I think we need a fix like:
@@ -586,6 +586,10 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc,
                                sctp_transport_hold(active);
        }

+        list_for_each_entry(ch, &asoc->outqueue.out_chunk_list, list)
+               if (ch->transport == transport)
+                       ch->transport = NULL;
+
        asoc->peer.transport_count--;

        sctp_transport_free(peer);

>   sctp_outq_flush_data net/sctp/outqueue.c:1100 [inline]
>   sctp_outq_flush+0x14f7/0x34f0 net/sctp/outqueue.c:1209
>   sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>   sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>   sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>   sctp_do_sm+0x608/0x7190 net/sctp/sm_sideeffect.c:1191
>   sctp_assoc_bh_rcv+0x346/0x670 net/sctp/associola.c:1067
>   sctp_inq_push+0x280/0x370 net/sctp/inqueue.c:95
>   sctp_rcv+0x2d93/0x3c00 net/sctp/input.c:268
>   sctp6_rcv+0x15/0x30 net/sctp/ipv6.c:1061
>   ip6_input_finish+0x3fc/0x1aa0 net/ipv6/ip6_input.c:383
>   NF_HOOK include/linux/netfilter.h:289 [inline]
>   ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:426
>   dst_input include/net/dst.h:450 [inline]
>   ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
>   NF_HOOK include/linux/netfilter.h:289 [inline]
>   ipv6_rcv+0x120/0x640 net/ipv6/ip6_input.c:271
>   __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
>   __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
>   process_backlog+0x218/0x6f0 net/core/dev.c:5829
>   napi_poll net/core/dev.c:6249 [inline]
>   net_rx_action+0x7c5/0x1950 net/core/dev.c:6315
> IPv6: ADDRCONF(NETDEV_CHANGE): vcan0: link becomes ready
> IPv6: ADDRCONF(NETDEV_CHANGE): vcan0: link becomes ready
>   __do_softirq+0x30c/0xb03 kernel/softirq.c:292
>   do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1047
>   </IRQ>
>   do_softirq.part.13+0x126/0x160 kernel/softirq.c:336
>   do_softirq kernel/softirq.c:328 [inline]
>   __local_bh_enable_ip+0x21d/0x260 kernel/softirq.c:189
>   local_bh_enable include/linux/bottom_half.h:32 [inline]
>   rcu_read_unlock_bh include/linux/rcupdate.h:723 [inline]
>   ip6_finish_output2+0xce4/0x27a0 net/ipv6/ip6_output.c:121
>   ip6_finish_output+0x58c/0xc60 net/ipv6/ip6_output.c:154
>   NF_HOOK_COND include/linux/netfilter.h:278 [inline]
>   ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171
>   dst_output include/net/dst.h:444 [inline]
>   NF_HOOK include/linux/netfilter.h:289 [inline]
>   ip6_xmit+0xf69/0x2420 net/ipv6/ip6_output.c:275
>   sctp_v6_xmit+0x3b2/0x790 net/sctp/ipv6.c:230
>   sctp_packet_transmit+0x298e/0x3db0 net/sctp/output.c:656
>   sctp_packet_singleton net/sctp/outqueue.c:791 [inline]
>   sctp_outq_flush_ctrl.constprop.11+0x7a9/0xe50 net/sctp/outqueue.c:922
>   sctp_outq_flush+0x310/0x34f0 net/sctp/outqueue.c:1204
>   sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
> kobject: 'rfkill9' (0000000067be229d): fill_kobj_path: path
> = '/devices/virtual/mac80211_hwsim/hwsim7/ieee80211/phy7/rfkill9'
>   sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1349 [inline]
>   sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>   sctp_do_sm+0x4f25/0x7190 net/sctp/sm_sideeffect.c:1191
> ieee80211 phy7: Selected rate control algorithm 'minstrel_ht'
> kobject: 'net' (00000000cdd117dc): kobject_add_internal: parent: 'hwsim7',
> set: '(null)'
>   sctp_endpoint_bh_rcv+0x465/0x960 net/sctp/endpointola.c:456
> kobject: 'wlan0' (00000000c9d333fc): kobject_add_internal: parent: 'net',
> set: 'devices'
>   sctp_inq_push+0x280/0x370 net/sctp/inqueue.c:95
>   sctp_backlog_rcv+0x1a8/0xd50 net/sctp/input.c:351
> kobject: 'wlan0' (00000000c9d333fc): kobject_uevent_env
> kobject: 'wlan0' (00000000c9d333fc): fill_kobj_path: path
> = '/devices/virtual/mac80211_hwsim/hwsim7/net/wlan0'
>   sk_backlog_rcv include/net/sock.h:931 [inline]
>   __release_sock+0x12f/0x3a0 net/core/sock.c:2336
> kobject: 'queues' (00000000c5d009e9): kobject_add_internal:
> parent: 'wlan0', set: '<NULL>'
>   release_sock+0xad/0x2c0 net/core/sock.c:2849
>   sock_setsockopt+0x60d/0x2280 net/core/sock.c:1054
> kobject: 'queues' (00000000c5d009e9): kobject_uevent_env
> kobject: 'queues' (00000000c5d009e9): kobject_uevent_env: filter function
> caused the event to drop!
>   __sys_setsockopt+0x304/0x3c0 net/socket.c:1898
> kobject: 'rx-0' (000000007736ba36): kobject_add_internal: parent: 'queues',
> set: 'queues'
>   __do_sys_setsockopt net/socket.c:1913 [inline]
>   __se_sys_setsockopt net/socket.c:1910 [inline]
>   __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1910
>   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> kobject: 'rx-0' (000000007736ba36): kobject_uevent_env
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> kobject: 'rx-0' (000000007736ba36): fill_kobj_path: path
> = '/devices/virtual/mac80211_hwsim/hwsim7/net/wlan0/queues/rx-0'
> RIP: 0033:0x457569
> Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
> ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007f0613beec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
> RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000457569
> RDX: 000000000000001a RSI: 0000000000000001 RDI: 0000000000000003
> RBP: 000000000072c040 R08: 0000000000000010 R09: 0000000000000000
> R10: 00000000200000c0 R11: 0000000000000246 R12: 00007f0613bef6d4
> R13: 00000000004c3b61 R14: 00000000004d5c88 R15: 00000000ffffffff
>
> Allocated by task 8261:
>   save_stack+0x43/0xd0 mm/kasan/kasan.c:448
> kobject: 'tx-0' (000000008f0eae4c): kobject_add_internal: parent: 'queues',
> set: 'queues'
>   set_track mm/kasan/kasan.c:460 [inline]
>   kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
>   kmem_cache_alloc_trace+0x152/0x750 mm/slab.c:3620
>   kmalloc include/linux/slab.h:513 [inline]
>   kzalloc include/linux/slab.h:707 [inline]
>   sctp_transport_new+0x10a/0x880 net/sctp/transport.c:111
>   sctp_assoc_add_peer+0x2de/0x10d0 net/sctp/associola.c:630
>   sctp_process_param net/sctp/sm_make_chunk.c:2540 [inline]
>   sctp_process_init+0xfc0/0x29e0 net/sctp/sm_make_chunk.c:2356
>   sctp_cmd_process_init net/sctp/sm_sideeffect.c:682 [inline]
>   sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1410 [inline]
>   sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>   sctp_do_sm+0x13b9/0x7190 net/sctp/sm_sideeffect.c:1191
>   sctp_assoc_bh_rcv+0x346/0x670 net/sctp/associola.c:1067
>   sctp_inq_push+0x280/0x370 net/sctp/inqueue.c:95
>   sctp_backlog_rcv+0x1a8/0xd50 net/sctp/input.c:351
>   sk_backlog_rcv include/net/sock.h:931 [inline]
>   __release_sock+0x12f/0x3a0 net/core/sock.c:2336
> kobject: 'tx-0' (000000008f0eae4c): kobject_uevent_env
>   release_sock+0xad/0x2c0 net/core/sock.c:2849
>   sctp_wait_for_connect+0x391/0x640 net/sctp/socket.c:8667
>   sctp_sendmsg_to_asoc+0x1d0f/0x2230 net/sctp/socket.c:1985
>   sctp_sendmsg+0x13c2/0x1da0 net/sctp/socket.c:2131
>   inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
>   sock_sendmsg_nosec net/socket.c:621 [inline]
>   sock_sendmsg+0xd5/0x120 net/socket.c:631
>   __sys_sendto+0x3d7/0x670 net/socket.c:1788
>   __do_sys_sendto net/socket.c:1800 [inline]
>   __se_sys_sendto net/socket.c:1796 [inline]
>   __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
>   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> kobject: 'tx-0' (000000008f0eae4c): fill_kobj_path: path
> = '/devices/virtual/mac80211_hwsim/hwsim7/net/wlan0/queues/tx-0'
>
> Freed by task 9:
>   save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>   set_track mm/kasan/kasan.c:460 [inline]
>   __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
>   kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
>   __cache_free mm/slab.c:3498 [inline]
>   kfree+0xcf/0x230 mm/slab.c:3813
>   sctp_transport_destroy_rcu+0x4a/0x60 net/sctp/transport.c:163
>   __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
>   rcu_do_batch kernel/rcu/tree.c:2576 [inline]
>   invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
>   __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
>   rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
>   __do_softirq+0x30c/0xb03 kernel/softirq.c:292
> kobject: 'tx-1' (000000009aaee194): kobject_add_internal: parent: 'queues',
> set: 'queues'
>
> The buggy address belongs to the object at ffff8801c5ff24c0
>   which belongs to the cache kmalloc-1024 of size 1024
> The buggy address is located 340 bytes inside of
>   1024-byte region [ffff8801c5ff24c0, ffff8801c5ff28c0)
> The buggy address belongs to the page:
> page:ffffea000717fc80 count:1 mapcount:0 mapping:ffff8801da800ac0 index:0x0
> compound_mapcount: 0
> flags: 0x2fffc0000008100(slab|head)
> raw: 02fffc0000008100 ffffea00070ed808 ffffea00071b2488 ffff8801da800ac0
> raw: 0000000000000000 ffff8801c5ff2040 0000000100000007 0000000000000000
> kobject: 'tx-1' (000000009aaee194): kobject_uevent_env
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>   ffff8801c5ff2500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff8801c5ff2580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ffff8801c5ff2600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                           ^
>   ffff8801c5ff2680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff8801c5ff2700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
> syzbot.

^ permalink raw reply

* [PATCH net] net: sched: gred: pass the right attribute to gred_change_table_def()
From: Jakub Kicinski @ 2018-10-26 22:51 UTC (permalink / raw)
  To: davem; +Cc: tgraf, jhs, xiyou.wangcong, netdev, oss-drivers, Jakub Kicinski

gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute,
and expects it will be able to interpret its contents as
struct tc_gred_sopt.  Pass the correct gred attribute, instead of
TCA_OPTIONS.

This bug meant the table definition could never be changed after
Qdisc was initialized (unless whatever TCA_OPTIONS contained both
passed netlink validation and was a valid struct tc_gred_sopt...).

Old behaviour:
$ ip link add type dummy
$ tc qdisc replace dev dummy0 parent root handle 7: \
     gred setup vqs 4 default 0
$ tc qdisc replace dev dummy0 parent root handle 7: \
     gred setup vqs 4 default 0
RTNETLINK answers: Invalid argument

Now:
$ ip link add type dummy
$ tc qdisc replace dev dummy0 parent root handle 7: \
     gred setup vqs 4 default 0
$ tc qdisc replace dev dummy0 parent root handle 7: \
     gred setup vqs 4 default 0
$ tc qdisc replace dev dummy0 parent root handle 7: \
     gred setup vqs 4 default 0

Fixes: f62d6b936df5 ("[PKT_SCHED]: GRED: Use central VQ change procedure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 net/sched/sch_gred.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index cbe4831f46f4..4a042abf844c 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -413,7 +413,7 @@ static int gred_change(struct Qdisc *sch, struct nlattr *opt,
 	if (tb[TCA_GRED_PARMS] == NULL && tb[TCA_GRED_STAB] == NULL) {
 		if (tb[TCA_GRED_LIMIT] != NULL)
 			sch->limit = nla_get_u32(tb[TCA_GRED_LIMIT]);
-		return gred_change_table_def(sch, opt);
+		return gred_change_table_def(sch, tb[TCA_GRED_DPS]);
 	}
 
 	if (tb[TCA_GRED_PARMS] == NULL ||
-- 
2.17.1

^ permalink raw reply related

* [PATCH bpf] bpf: fix wrong helper enablement in cgroup local storage
From: Daniel Borkmann @ 2018-10-26 22:49 UTC (permalink / raw)
  To: ast; +Cc: netdev, Daniel Borkmann, Roman Gushchin

Commit cd3394317653 ("bpf: introduce the bpf_get_local_storage()
helper function") enabled the bpf_get_local_storage() helper also
for BPF program types where it does not make sense to use them.

They have been added both in sk_skb_func_proto() and sk_msg_func_proto()
even though both program types are not invoked in combination with
cgroups, and neither through BPF_PROG_RUN_ARRAY(). In the latter the
bpf_cgroup_storage_set() is set shortly before BPF program invocation.

Later, the helper bpf_get_local_storage() retrieves this prior set
up per-cpu pointer and hands the buffer to the BPF program. The map
argument in there solely retrieves the enum bpf_cgroup_storage_type
from a local storage map associated with the program and based on the
type returns either the global or per-cpu storage. However, there
is no specific association between the program's map and the actual
content in bpf_cgroup_storage[].

Meaning, any BPF program that would have been properly run from the
cgroup side through BPF_PROG_RUN_ARRAY() where bpf_cgroup_storage_set()
was performed, and that is later unloaded such that prog / maps are
teared down will cause a use after free if that pointer is retrieved
from programs that are not run through BPF_PROG_RUN_ARRAY() but have
the cgroup local storage helper enabled in their func proto.

Lets just remove it from the two sock_map program types to fix it.
Auditing through the types where this helper is enabled, it appears
that these are the only ones where it was mistakenly allowed.

Fixes: cd3394317653 ("bpf: introduce the bpf_get_local_storage() helper function")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Roman Gushchin <guro@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/filter.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 35c6933..4d7c511 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5264,8 +5264,6 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_msg_pull_data_proto;
 	case BPF_FUNC_msg_push_data:
 		return &bpf_msg_push_data_proto;
-	case BPF_FUNC_get_local_storage:
-		return &bpf_get_local_storage_proto;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
@@ -5296,8 +5294,6 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sk_redirect_map_proto;
 	case BPF_FUNC_sk_redirect_hash:
 		return &bpf_sk_redirect_hash_proto;
-	case BPF_FUNC_get_local_storage:
-		return &bpf_get_local_storage_proto;
 #ifdef CONFIG_INET
 	case BPF_FUNC_sk_lookup_tcp:
 		return &bpf_sk_lookup_tcp_proto;
-- 
2.9.5

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox