Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next v2 0/3] tipc: some small fixes
From: Jon Maloy @ 2016-04-05 20:35 UTC (permalink / raw)
  To: davem
  Cc: netdev, Paul Gortmaker, parthasarathy.bhuvaragan, richard.alpe,
	ying.xue, maloy, tipc-discussion, Jon Maloy

When running TIPC in large clusters we experience behavior that
may potentially become problematic in the future. This series
picks some low-hanging fruit in this regard, and also fixes a
couple of other minor issues.

v2: Corrected typos in commit #3, as per feedback from S. Shtylyov

Jon Maloy (3):
  tipc: eliminate buffer leak in bearer layer
  tipc: stricter filtering of packets in bearer layer
  tipc: reduce transmission rate of reset messages when link is down

 net/tipc/bearer.c   | 101 ++++++++++++++++++++++++++++++----------------------
 net/tipc/discover.c |   7 ++--
 net/tipc/discover.h |   2 +-
 net/tipc/link.c     |  10 +++---
 net/tipc/msg.h      |   5 +++
 5 files changed, 73 insertions(+), 52 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH net] bridge, netem: mark mailing lists as moderated
From: Stephen Hemminger @ 2016-04-05 20:43 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

I moderate these (lightly loaded) lists to block spam.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 67d99dd..8355536 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4303,7 +4303,7 @@ F:	drivers/net/ethernet/agere/
 
 ETHERNET BRIDGE
 M:	Stephen Hemminger <stephen@networkplumber.org>
-L:	bridge@lists.linux-foundation.org
+L:	bridge@lists.linux-foundation.org (moderated for non-subscribers)
 L:	netdev@vger.kernel.org
 W:	http://www.linuxfoundation.org/en/Net:Bridge
 S:	Maintained
@@ -7576,7 +7576,7 @@ F:	drivers/infiniband/hw/nes/
 
 NETEM NETWORK EMULATOR
 M:	Stephen Hemminger <stephen@networkplumber.org>
-L:	netem@lists.linux-foundation.org
+L:	netem@lists.linux-foundation.org (moderated for non-subscribers)
 S:	Maintained
 F:	net/sched/sch_netem.c
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH] Revert "netpoll: Fix extra refcount release in netpoll_cleanup()"
From: Bjorn Helgaas @ 2016-04-05 20:58 UTC (permalink / raw)
  To: David S. Miller
  Cc: Neil Horman, Nikolay Aleksandrov, netdev, linux-kernel,
	Alexander Duyck, Bart Van Assche

This reverts commit 543e3a8da5a4c453e992d5351ef405d5e32f27d7.

Direct callers of __netpoll_setup() depend on it to set np->dev,
so we can't simply move that assignment up to netpoll_stup().

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 net/core/netpoll.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index a57bd17..94acfc8 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -603,6 +603,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
 	const struct net_device_ops *ops;
 	int err;
 
+	np->dev = ndev;
 	strlcpy(np->dev_name, ndev->name, IFNAMSIZ);
 	INIT_WORK(&np->cleanup_work, netpoll_async_cleanup);
 
@@ -669,7 +670,6 @@ int netpoll_setup(struct netpoll *np)
 		goto unlock;
 	}
 	dev_hold(ndev);
-	np->dev = ndev;
 
 	if (netdev_master_upper_dev_get(ndev)) {
 		np_err(np, "%s is a slave device, aborting\n", np->dev_name);
@@ -770,7 +770,6 @@ int netpoll_setup(struct netpoll *np)
 	return 0;
 
 put:
-	np->dev = NULL;
 	dev_put(ndev);
 unlock:
 	rtnl_unlock();

^ permalink raw reply related

* [RESEND PATCH net-next v2 0/3] bcmgenet cleanups
From: Petri Gynther @ 2016-04-05 20:59 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, opendmb, jaedon.shin, Petri Gynther

Three cleanup patches for bcmgenet.

Petri Gynther (3):
  net: bcmgenet: cleanup for bcmgenet_xmit()
  net: bcmgenet: cleanup for bcmgenet_xmit_frag()
  net: bcmgenet: cleanup for dmadesc_set()

 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 27 ++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply

* [RESEND PATCH net-next v2 1/3] net: bcmgenet: cleanup for bcmgenet_xmit()
From: Petri Gynther @ 2016-04-05 20:59 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, opendmb, jaedon.shin, Petri Gynther
In-Reply-To: <1459890001-78171-1-git-send-email-pgynther@google.com>

1. Readability: Move nr_frags assignment a few lines down in order
   to bundle index -> ring -> txq calculations together.
2. Readability: Add parentheses around nr_frags + 1.
3. Minor fix: Stop the Tx queue and throw the error message only if
   the Tx queue hasn't already been stopped.

Signed-off-by: Petri Gynther <pgynther@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index cf6445d..7f85a84 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1447,15 +1447,19 @@ static netdev_tx_t bcmgenet_xmit(struct sk_buff *skb, struct net_device *dev)
 	else
 		index -= 1;
 
-	nr_frags = skb_shinfo(skb)->nr_frags;
 	ring = &priv->tx_rings[index];
 	txq = netdev_get_tx_queue(dev, ring->queue);
 
+	nr_frags = skb_shinfo(skb)->nr_frags;
+
 	spin_lock_irqsave(&ring->lock, flags);
-	if (ring->free_bds <= nr_frags + 1) {
-		netif_tx_stop_queue(txq);
-		netdev_err(dev, "%s: tx ring %d full when queue %d awake\n",
-			   __func__, index, ring->queue);
+	if (ring->free_bds <= (nr_frags + 1)) {
+		if (!netif_tx_queue_stopped(txq)) {
+			netif_tx_stop_queue(txq);
+			netdev_err(dev,
+				   "%s: tx ring %d full when queue %d awake\n",
+				   __func__, index, ring->queue);
+		}
 		ret = NETDEV_TX_BUSY;
 		goto out;
 	}
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [RESEND PATCH net-next v2 2/3] net: bcmgenet: cleanup for bcmgenet_xmit_frag()
From: Petri Gynther @ 2016-04-05 21:00 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, opendmb, jaedon.shin, Petri Gynther
In-Reply-To: <1459890001-78171-1-git-send-email-pgynther@google.com>

Add frag_size = skb_frag_size(frag) and use it when needed.

Signed-off-by: Petri Gynther <pgynther@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 7f85a84..d77cd6d 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1331,6 +1331,7 @@ static int bcmgenet_xmit_frag(struct net_device *dev,
 	struct bcmgenet_priv *priv = netdev_priv(dev);
 	struct device *kdev = &priv->pdev->dev;
 	struct enet_cb *tx_cb_ptr;
+	unsigned int frag_size;
 	dma_addr_t mapping;
 	int ret;
 
@@ -1338,10 +1339,12 @@ static int bcmgenet_xmit_frag(struct net_device *dev,
 
 	if (unlikely(!tx_cb_ptr))
 		BUG();
+
 	tx_cb_ptr->skb = NULL;
 
-	mapping = skb_frag_dma_map(kdev, frag, 0,
-				   skb_frag_size(frag), DMA_TO_DEVICE);
+	frag_size = skb_frag_size(frag);
+
+	mapping = skb_frag_dma_map(kdev, frag, 0, frag_size, DMA_TO_DEVICE);
 	ret = dma_mapping_error(kdev, mapping);
 	if (ret) {
 		priv->mib.tx_dma_failed++;
@@ -1351,10 +1354,10 @@ static int bcmgenet_xmit_frag(struct net_device *dev,
 	}
 
 	dma_unmap_addr_set(tx_cb_ptr, dma_addr, mapping);
-	dma_unmap_len_set(tx_cb_ptr, dma_len, frag->size);
+	dma_unmap_len_set(tx_cb_ptr, dma_len, frag_size);
 
 	dmadesc_set(priv, tx_cb_ptr->bd_addr, mapping,
-		    (frag->size << DMA_BUFLENGTH_SHIFT) | dma_desc_flags |
+		    (frag_size << DMA_BUFLENGTH_SHIFT) | dma_desc_flags |
 		    (priv->hw_params->qtag_mask << DMA_TX_QTAG_SHIFT));
 
 	return 0;
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [RESEND PATCH net-next v2 3/3] net: bcmgenet: cleanup for dmadesc_set()
From: Petri Gynther @ 2016-04-05 21:00 UTC (permalink / raw)
  To: netdev; +Cc: davem, f.fainelli, opendmb, jaedon.shin, Petri Gynther
In-Reply-To: <1459890001-78171-1-git-send-email-pgynther@google.com>

dmadesc_set() is used for setting the Tx buffer DMA address, length,
and status bits on a Tx ring descriptor when a frame is being Tx'ed.

Always set the Tx buffer DMA address first, before updating the length
and status bits, i.e. giving the Tx descriptor to the hardware.

The reason this is a cleanup rather than a fix is that the hardware
won't transmit anything from a Tx ring until the TDMA producer index
has been incremented. As long as the dmadesc_set() writes complete
before the TDMA producer index write, life is good.

Signed-off-by: Petri Gynther <pgynther@google.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index d77cd6d..f7b42b9 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -104,8 +104,8 @@ static inline void dmadesc_set_addr(struct bcmgenet_priv *priv,
 static inline void dmadesc_set(struct bcmgenet_priv *priv,
 			       void __iomem *d, dma_addr_t addr, u32 val)
 {
-	dmadesc_set_length_status(priv, d, val);
 	dmadesc_set_addr(priv, d, addr);
+	dmadesc_set_length_status(priv, d, val);
 }

 static inline dma_addr_t dmadesc_get_addr(struct bcmgenet_priv *priv,
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* Re: [RFC PATCH 0/6] ppp: add rtnetlink support
From: Guillaume Nault @ 2016-04-05 21:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, linux-ppp, Paul Mackerras, David Miller
In-Reply-To: <20160405082745.6d2d2aa6@xeon-e3>

.On Tue, Apr 05, 2016 at 08:27:45AM -0700, Stephen Hemminger wrote:
> On Tue, 5 Apr 2016 02:56:17 +0200
> Guillaume Nault <g.nault@alphalink.fr> wrote:
> 
> > The rtnetlink handlers implemented in this series are minimal, and can
> > only replace the PPPIOCNEWUNIT ioctl. The rest of PPP ioctls remains
> > necessary for any other operation on channels and units.
> > It is perfectly to possible to mix PPP devices created by rtnl
> > and by ioctl(PPPIOCNEWUNIT). Devices will behave in the same way,
> > except for a few specific cases (as detailed in patch #6).
> 
> What blocks PPP from being fully netlink (use attributes),
> 
I just didn't implement other netlink attributes because I wanted to
get the foundations validated first. Implementing PPP unit ioctls with
rtnetlink attributes shouldn't be a problem because there's a 1:1
mapping between units and netdevices. So we could have some kind of
feature parity (I'm not sure if all ioctls are worth a netlink
attribute though).

But there's the problem of getting the unit identifier of a PPP device.
If that device was created with kernel assigned name and index, then
the user space daemon has no ifindex or ifname for building an
RTM_GETLINK message. So the ability to retrieve the unit identifer with
rtnetlink wouldn't be enough to fully replace ioctls on unit.

If by "fully netlink", you also meant implementing a netlink
replacement for all supported ioctls, then that's going to be even
trickier. A genetlink API would probably need to be created for
handling generic operations on PPP channels. But that wouldn't be
enough since unknown ioctls on channels are passed to the
chan->ops->ioctl() callback. So netlink support would also have to be
added to the channel handlers (pptp, pppoatm, sync_ppp, irda...).

> and work with same API set independent of how device was created.
> Special cases are nuisance and source of bugs.
> 
It looks like handling rtnetlink messages in ioctl based PPP devices is
just a matter of assigning ->rtnl_link_ops in ppp_create_interface().
I'll consider that for v3.

> > I'm sending the series only as RFC this time, because there are a few
> > points I'm unsatisfied with.
> > 
> > First, I'm not fond of passing file descriptors as netlink attributes,
> > as done with IFLA_PPP_DEV_FD (which is filled with a /dev/ppp fd). But
> > given how PPP units work, we have to associate a /dev/ppp fd somehow.
> > 
> > More importantly, the locking constraints of PPP are quite problematic.
> > The rtnetlink handler has to associate the new PPP unit with the
> > /dev/ppp file descriptor passed as parameter. This requires holding the
> > ppp_mutex (see e8e56ffd9d29 "ppp: ensure file->private_data can't be
> > overridden"), while the rtnetlink callback is already protected by
> > rtnl_lock(). Since other parts of the module take these locks in
> > reverse order, most of this series deals with preparing the code for
> > inverting the dependency between rtnl_lock and ppp_mutex. Some more
> > work is needed on that part (see patch #4 for details), but I wanted
> > to be sure that approach it worth it before spending some more time on
> > it.
> 
> One other way to handle the locking is to use trylock. Yes it justs
> pushs the problem back to userspace, but that is how lock reordering was
> handled in sysfs.
>
If that's considered a valid approach, then I'll use it for v3. That'd
simplify things nicely.

^ permalink raw reply

* Re: [RFC PATCH 5/6] ppp: define reusable device creation functions
From: Guillaume Nault @ 2016-04-05 21:14 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, linux-ppp, Paul Mackerras, David Miller
In-Reply-To: <20160405082832.534257df@xeon-e3>

On Tue, Apr 05, 2016 at 08:28:32AM -0700, Stephen Hemminger wrote:
> On Tue, 5 Apr 2016 02:56:29 +0200
> Guillaume Nault <g.nault@alphalink.fr> wrote:
> 
> > Move PPP device initialisation and registration out of
> > ppp_create_interface().
> > This prepares code for device registration with rtnetlink.
> > 
> 
> Does PPP module autoload correctly based on the netlink attributes?
> 
Patch #6 has MODULE_ALIAS_RTNL_LINK("ppp"). This works fine for
auto-loading ppp_generic when creating a PPP device with rtnetlink.
Is there anything else required?


^ permalink raw reply

* Re: [RFC PATCH 6/6] ppp: add rtnetlink device creation support
From: Guillaume Nault @ 2016-04-05 21:22 UTC (permalink / raw)
  To: walter harms; +Cc: netdev, linux-ppp, Paul Mackerras, David Miller
In-Reply-To: <5703F356.6050107@bfs.de>

On Tue, Apr 05, 2016 at 07:18:14PM +0200, walter harms wrote:
> 
> 
> Am 05.04.2016 02:56, schrieb Guillaume Nault:
> > @@ -1043,12 +1048,39 @@ static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
> >  			     const struct ppp_config *conf)
> >  {
> >  	struct ppp *ppp = netdev_priv(dev);
> > +	struct file *file;
> >  	int indx;
> > +	int err;
> > +
> > +	if (conf->fd < 0) {
> > +		file = conf->file;
> > +		if (!file) {
> > +			err = -EBADF;
> > +			goto out;
> 
> why not just return -EBADF;
> 
> > +		}
> > +	} else {
> > +		file = fget(conf->fd);
> > +		if (!file) {
> > +			err = -EBADF;
> > +			goto out;
> 	
> why not just return -EBADF;
> 
Just because the 'out' label is declared anyway and because this
centralises the return point. But I agree returning -EBADF directly
could be more readable. I don't have strong opinion.

^ permalink raw reply

* Re: [PATCH net-next 05/10] bnxt_en: Add get_eee() and set_eee() ethtool support.
From: Ben Hutchings @ 2016-04-05 21:22 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, netdev
In-Reply-To: <CACKFLinJMDC52Bbc_OFQLFAJuoC-DC5kdawkRo1J75Q5v3XOpA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1418 bytes --]

On Tue, 2016-04-05 at 03:36 -0700, Michael Chan wrote:
> On Tue, Apr 5, 2016 at 3:07 AM, Ben Hutchings <ben@decadent.org.uk> wrote:
[...]
> > > +static int bnxt_get_eee(struct net_device *dev, struct ethtool_eee *edata)
> > > +{
> > > +     struct bnxt *bp = netdev_priv(dev);
> > > +
> > > +     if (!(bp->flags & BNXT_FLAG_EEE_CAP))
> > > +             return -EOPNOTSUPP;
> > > +
> > > +     *edata = bp->eee;
> > > +     if (!bp->eee.eee_enabled) {
> > > +             edata->advertised = 0;
> > > +             edata->tx_lpi_enabled = 0;
> > What about tx_lpi_timer?
> We want to keep the tx_lpi_timer value so that it can be used again
> when it is turned on again.
> 
> The user doesn't have to figure out what value to use if he just wants
> to use the default or the last value.

OK, that seems like a good reason.

> > 
> > 
> > And, wouldn't it make more sense to do these fixups to the internal
> > state in bnxt_set_eee()?
> I don't understand.  If the user is enabling EEE, we take all the
> parameters.  If he is disabling, we don't take any of the parameters.

Right - it's just a bit weird that you keep the internal state in a
struct ethtool_eee but get_eee() returns a slightly different version
of the state.

Ben.

-- 
Ben Hutchings
No political challenge can be met by shopping. - George Monbiot

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [PATCH] ipv6: icmp: Add protection from concurrent users in the function icmpv6_echo_reply
From: Bastien Philbert @ 2016-04-05 21:27 UTC (permalink / raw)
  To: davem; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

This adds protection from concurrenct users in the function
icmpv6_echo_reply around the call to the function __in6_dev_get
by locking/unlocking around this call with calls to the functions
rtnl_lock and rtnl_unlock to protect against concurrent users
when calling this function in icmpv6_echo_reply as stated in the
comments for locking requirements for the function, __in6_dev_get.

Signed-off-by: Bastien Philbert <bastienphilbert@gmail.com>
---
 net/ipv6/icmp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 0a37ddc..798434f 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -607,7 +607,9 @@ static void icmpv6_echo_reply(struct sk_buff *skb)

 	hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);

+	rtnl_lock();
 	idev = __in6_dev_get(skb->dev);
+	rtnl_unlock();

 	msg.skb = skb;
 	msg.offset = 0;
-- 
2.5.0

^ permalink raw reply related

* Re: [PATCH] ipv6: icmp: Add protection from concurrent users in the function icmpv6_echo_reply
From: Hannes Frederic Sowa @ 2016-04-05 21:36 UTC (permalink / raw)
  To: Bastien Philbert, davem
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <1459891676-20836-1-git-send-email-bastienphilbert@gmail.com>

On 05.04.2016 23:27, Bastien Philbert wrote:
> This adds protection from concurrenct users in the function
> icmpv6_echo_reply around the call to the function __in6_dev_get
> by locking/unlocking around this call with calls to the functions
> rtnl_lock and rtnl_unlock to protect against concurrent users
> when calling this function in icmpv6_echo_reply as stated in the
> comments for locking requirements for the function, __in6_dev_get.
>
> Signed-off-by: Bastien Philbert <bastienphilbert@gmail.com>
> ---
>   net/ipv6/icmp.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
> index 0a37ddc..798434f 100644
> --- a/net/ipv6/icmp.c
> +++ b/net/ipv6/icmp.c
> @@ -607,7 +607,9 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
>
>   	hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
>
> +	rtnl_lock();
>   	idev = __in6_dev_get(skb->dev);
> +	rtnl_unlock();
>
>   	msg.skb = skb;
>   	msg.offset = 0;
>

We can't hold rtnl_lock in bh context. Have you seen a rcu verifier 
report? I am sure we hold rcu read lock at this point.

Bye,
Hannes

^ permalink raw reply

* [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete
From: Bastien Philbert @ 2016-04-05 21:36 UTC (permalink / raw)
  To: vyasevich; +Cc: nhorman, davem, linux-sctp, netdev, linux-kernel

This fixes error handling for the switch statement case
SCTP_CMD_SEND_PKT by making the error value of the call
to sctp_packet_transmit equal the variable error due to
this function being able to fail with a error code. In
addition allow the call to sctp_ootb_pkt_free afterwards
to free up the no longer in use sctp packet even if the
call to the function sctp_packet_transmit fails in order
to avoid a memory leak here for not freeing the sctp

Signed-off-by: Bastien Philbert <bastienphilbert@gmail.com>
---
 net/sctp/sm_sideeffect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 7fe56d0..f3a8b58 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1434,7 +1434,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 		case SCTP_CMD_SEND_PKT:
 			/* Send a full packet to our peer.  */
 			packet = cmd->obj.packet;
-			sctp_packet_transmit(packet, gfp);
+			error = sctp_packet_transmit(packet, gfp);
 			sctp_ootb_pkt_free(packet);
 			break;

-- 
2.5.0

^ permalink raw reply related

* Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete
From: Daniel Borkmann @ 2016-04-05 21:53 UTC (permalink / raw)
  To: Bastien Philbert, vyasevich
  Cc: nhorman, davem, linux-sctp, netdev, linux-kernel
In-Reply-To: <1459892201-21397-1-git-send-email-bastienphilbert@gmail.com>

On 04/05/2016 11:36 PM, Bastien Philbert wrote:
> This fixes error handling for the switch statement case
> SCTP_CMD_SEND_PKT by making the error value of the call
> to sctp_packet_transmit equal the variable error due to
> this function being able to fail with a error code. In

What actual issue have you observed that you fix?

> addition allow the call to sctp_ootb_pkt_free afterwards
> to free up the no longer in use sctp packet even if the
> call to the function sctp_packet_transmit fails in order
> to avoid a memory leak here for not freeing the sctp

Not sure how this relates to your code?

> Signed-off-by: Bastien Philbert <bastienphilbert@gmail.com>
> ---
>   net/sctp/sm_sideeffect.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index 7fe56d0..f3a8b58 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -1434,7 +1434,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>   		case SCTP_CMD_SEND_PKT:
>   			/* Send a full packet to our peer.  */
>   			packet = cmd->obj.packet;
> -			sctp_packet_transmit(packet, gfp);
> +			error = sctp_packet_transmit(packet, gfp);
>   			sctp_ootb_pkt_free(packet);
>   			break;
>
>

^ permalink raw reply

* Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete
From: Bastien Philbert @ 2016-04-05 21:58 UTC (permalink / raw)
  To: Daniel Borkmann, vyasevich
  Cc: nhorman, davem, linux-sctp, netdev, linux-kernel
In-Reply-To: <570433F0.6040506@iogearbox.net>



On 2016-04-05 05:53 PM, Daniel Borkmann wrote:
> On 04/05/2016 11:36 PM, Bastien Philbert wrote:
>> This fixes error handling for the switch statement case
>> SCTP_CMD_SEND_PKT by making the error value of the call
>> to sctp_packet_transmit equal the variable error due to
>> this function being able to fail with a error code. In
> 
> What actual issue have you observed that you fix?
> 
The issue here is basically that sctp_packet_transmit
can return a error if it unsuccessfully transmit the
sk_buff as a parameter. Seems that we should signal
the user/caller(s) when a sctp packet transmission
fails here. If you would like I can resend with a better
commit message in a V2 if this explains the issue better.
Bastien
>> addition allow the call to sctp_ootb_pkt_free afterwards
>> to free up the no longer in use sctp packet even if the
>> call to the function sctp_packet_transmit fails in order
>> to avoid a memory leak here for not freeing the sctp
> 
> Not sure how this relates to your code?
> 
>> Signed-off-by: Bastien Philbert <bastienphilbert@gmail.com>
>> ---
>>   net/sctp/sm_sideeffect.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
>> index 7fe56d0..f3a8b58 100644
>> --- a/net/sctp/sm_sideeffect.c
>> +++ b/net/sctp/sm_sideeffect.c
>> @@ -1434,7 +1434,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>>           case SCTP_CMD_SEND_PKT:
>>               /* Send a full packet to our peer.  */
>>               packet = cmd->obj.packet;
>> -            sctp_packet_transmit(packet, gfp);
>> +            error = sctp_packet_transmit(packet, gfp);
>>               sctp_ootb_pkt_free(packet);
>>               break;
>>
>>
> 

^ permalink raw reply

* Re: [RFC PATCH 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Alexei Starovoitov @ 2016-04-05 22:06 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Brenden Blanco, John Fastabend, Tom Herbert, Daniel Borkmann,
	David S. Miller, Linux Kernel Network Developers, ogerlitz
In-Reply-To: <20160405112905.66b84e13@redhat.com>

On Tue, Apr 05, 2016 at 11:29:05AM +0200, Jesper Dangaard Brouer wrote:
> > 
> > Of course, there are other pieces to accelerate:
> >  12.71%  ksoftirqd/1    [mlx4_en]         [k] mlx4_en_alloc_frags
> >   6.87%  ksoftirqd/1    [mlx4_en]         [k] mlx4_en_free_frag
> >   4.20%  ksoftirqd/1    [kernel.vmlinux]  [k] get_page_from_freelist
> >   4.09%  swapper        [mlx4_en]         [k] mlx4_en_process_rx_cq
> > and I think Jesper's work on batch allocation is going help that a lot.
> 
> Actually, it looks like all of this "overhead" comes from the page
> alloc/free (+ dma unmap/map). We would need a page-pool recycle
> mechanism to solve/remove this overhead.  For the early drop case we
> might be able to hack recycle the page directly in the driver (and also
> avoid dma_unmap/map cycle).

Exactly. A cache of allocated and mapped pages will help a lot both drop
and redirect use cases. After tx completion we can recycle still mmaped
page into the cache (need to make sure to map them PCI_DMA_BIDIRECTIONAL)
and rx can refill the ring with it. For load balancer steady state
we won't have any calls to page allocator and dma.
Being able to do cheap percpu pool like this is a huge advantage
that any kernel bypass cannot have. I'm pretty sure it will be
possible to avoid local_cmpxchg as well.

^ permalink raw reply

* Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete
From: Marcelo Ricardo Leitner @ 2016-04-05 22:12 UTC (permalink / raw)
  To: Bastien Philbert
  Cc: vyasevich, nhorman, davem, linux-sctp, netdev, linux-kernel
In-Reply-To: <1459892201-21397-1-git-send-email-bastienphilbert@gmail.com>

On Tue, Apr 05, 2016 at 05:36:41PM -0400, Bastien Philbert wrote:
> This fixes error handling for the switch statement case
> SCTP_CMD_SEND_PKT by making the error value of the call
> to sctp_packet_transmit equal the variable error due to
> this function being able to fail with a error code. In
> addition allow the call to sctp_ootb_pkt_free afterwards
> to free up the no longer in use sctp packet even if the
> call to the function sctp_packet_transmit fails in order
> to avoid a memory leak here for not freeing the sctp

This leak shouldn't exist as sctp_packet_transmit() will free the packet
if it returns ENOMEM, through the nomem: handling.

But about making it visible to the user, that looks interesting to me
although I cannot foresee yet its effects, like the comment at the end
of sctp_packet_transmit() on not returning EHOSTUNREACH. Did you check
it?

> 
> Signed-off-by: Bastien Philbert <bastienphilbert@gmail.com>
> ---
>  net/sctp/sm_sideeffect.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index 7fe56d0..f3a8b58 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -1434,7 +1434,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>  		case SCTP_CMD_SEND_PKT:
>  			/* Send a full packet to our peer.  */
>  			packet = cmd->obj.packet;
> -			sctp_packet_transmit(packet, gfp);
> +			error = sctp_packet_transmit(packet, gfp);
>  			sctp_ootb_pkt_free(packet);
>  			break;
>  
> -- 
> 2.5.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* [net 0/3][pull request] Intel Wired LAN Driver Updates 2016-04-05
From: Jeff Kirsher @ 2016-04-05 22:30 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene, john.ronciak

This series contains updates to i40e and e1000.

Jesse fixes an issue where code was added by a previous commit but the
flag to enable it was never set.

Alex fixes the e1000 driver from grossly overestimated the descriptors
needed to transmit a frame.

The following are changes since commit eb8e97715f29a1240cdf67b0df725be27433259f:
  sctp: use list_* in sctp_list_dequeue
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue master

Alexander Duyck (2):
  e1000: Do not overestimate descriptor counts in Tx pre-check
  e1000: Double Tx descriptors needed check for 82544

Jesse Brandeburg (1):
  i40e: fix errant PCIe bandwidth message

 drivers/net/ethernet/intel/e1000/e1000_main.c | 21 +++++++++++++++++++--
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  1 +
 2 files changed, 20 insertions(+), 2 deletions(-)

-- 
2.5.5

^ permalink raw reply

* [net 3/3] e1000: Double Tx descriptors needed check for 82544
From: Jeff Kirsher @ 2016-04-05 22:30 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1459895451-45300-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <aduyck@mirantis.com>

The 82544 has code that adds one additional descriptor per data buffer.
However we weren't taking that into account when determining the descriptors
needed for the next transmit at the end of the xmit_frame path.

This change takes that into account by doubling the number of descriptors
needed for the 82544 so that we can avoid a potential issue where we could
hang the Tx ring by loading frames with xmit_more enabled and then stopping
the ring without writing the tail.

In addition it adds a few more descriptors to account for some additional
workarounds that have been added over time.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index d213fb4..ae90d4f 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3256,12 +3256,29 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 			     nr_frags, mss);
 
 	if (count) {
+		/* The descriptors needed is higher than other Intel drivers
+		 * due to a number of workarounds.  The breakdown is below:
+		 * Data descriptors: MAX_SKB_FRAGS + 1
+		 * Context Descriptor: 1
+		 * Keep head from touching tail: 2
+		 * Workarounds: 3
+		 */
+		int desc_needed = MAX_SKB_FRAGS + 7;
+
 		netdev_sent_queue(netdev, skb->len);
 		skb_tx_timestamp(skb);
 
 		e1000_tx_queue(adapter, tx_ring, tx_flags, count);
+
+		/* 82544 potentially requires twice as many data descriptors
+		 * in order to guarantee buffers don't end on evenly-aligned
+		 * dwords
+		 */
+		if (adapter->pcix_82544)
+			desc_needed += MAX_SKB_FRAGS + 1;
+
 		/* Make sure there is space in the ring for the next send. */
-		e1000_maybe_stop_tx(netdev, tx_ring, MAX_SKB_FRAGS + 2);
+		e1000_maybe_stop_tx(netdev, tx_ring, desc_needed);
 
 		if (!skb->xmit_more ||
 		    netif_xmit_stopped(netdev_get_tx_queue(netdev, 0))) {
-- 
2.5.5

^ permalink raw reply related

* [net 1/3] i40e: fix errant PCIe bandwidth message
From: Jeff Kirsher @ 2016-04-05 22:30 UTC (permalink / raw)
  To: davem
  Cc: Jesse Brandeburg, netdev, nhorman, sassmann, jogreene,
	Anjali Singhai Jain, Stefan Assman, Jeff Kirsher
In-Reply-To: <1459895451-45300-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

There was an error introduced with commit 3fced535079a ("i40e: X722 is
on the IOSF bus and does not report the PCI bus info"), where code was
added but the enabling flag is never set.

CC: Anjali Singhai Jain <anjali.singhai@intel.com>
CC: Stefan Assman <sassman@redhat.com>
Fixes: 3fced535079a ("i40e: X722 is on the IOSF bus ...")
Reported-by: Steve Best <sbest@redhat.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6700643..3449129 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8559,6 +8559,7 @@ static int i40e_sw_init(struct i40e_pf *pf)
 			     I40E_FLAG_OUTER_UDP_CSUM_CAPABLE |
 			     I40E_FLAG_WB_ON_ITR_CAPABLE |
 			     I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE |
+			     I40E_FLAG_NO_PCI_LINK_CHECK |
 			     I40E_FLAG_100M_SGMII_CAPABLE |
 			     I40E_FLAG_USE_SET_LLDP_MIB |
 			     I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
-- 
2.5.5

^ permalink raw reply related

* [net 2/3] e1000: Do not overestimate descriptor counts in Tx pre-check
From: Jeff Kirsher @ 2016-04-05 22:30 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1459895451-45300-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <aduyck@mirantis.com>

The current code path is capable of grossly overestimating the number of
descriptors needed to transmit a new frame.  This specifically occurs if
the skb contains a number of 4K pages.  The issue is that the logic for
determining the descriptors needed is ((S) >> (X)) + 1.  When X is 12 it
means that we were indicating that we required 2 descriptors for each 4K
page when we only needed one.

This change corrects this by instead adding (1 << (X)) - 1 to the S value
instead of adding 1 after the fact.  This way we get an accurate descriptor
needed count as we are essentially doing a DIV_ROUNDUP().

Reported-by: Ivan Suzdal <isuzdal@mirantis.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 3fc7bde..d213fb4 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3106,7 +3106,7 @@ static int e1000_maybe_stop_tx(struct net_device *netdev,
 	return __e1000_maybe_stop_tx(netdev, size);
 }

-#define TXD_USE_COUNT(S, X) (((S) >> (X)) + 1)
+#define TXD_USE_COUNT(S, X) (((S) + ((1 << (X)) - 1)) >> (X))
 static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 				    struct net_device *netdev)
 {
-- 
2.5.5

^ permalink raw reply related

* Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete
From: Bastien Philbert @ 2016-04-05 23:03 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: vyasevich, nhorman, davem, linux-sctp, netdev, linux-kernel
In-Reply-To: <20160405221220.GA15005@localhost.localdomain>



On 2016-04-05 06:12 PM, Marcelo Ricardo Leitner wrote:
> On Tue, Apr 05, 2016 at 05:36:41PM -0400, Bastien Philbert wrote:
>> This fixes error handling for the switch statement case
>> SCTP_CMD_SEND_PKT by making the error value of the call
>> to sctp_packet_transmit equal the variable error due to
>> this function being able to fail with a error code. In
>> addition allow the call to sctp_ootb_pkt_free afterwards
>> to free up the no longer in use sctp packet even if the
>> call to the function sctp_packet_transmit fails in order
>> to avoid a memory leak here for not freeing the sctp
> 
> This leak shouldn't exist as sctp_packet_transmit() will free the packet
> if it returns ENOMEM, through the nomem: handling.
> 
> But about making it visible to the user, that looks interesting to me
> although I cannot foresee yet its effects, like the comment at the end
> of sctp_packet_transmit() on not returning EHOSTUNREACH. Did you check
> it?
> 
I was aware of the -EHOSTUNREACH issue but assumed that this needs to be
known to functions internal to the kernel. TO rephase does it matter if
the callers of this function known if sctp_packet_transmit or care if it 
fails or is this just unnecessary as we do cleanup else where which is 
enough so the new error check is not needed? Again if their is a certain
test would like me to run on this patch too to make sure it's OK I don't
mind, just let me known :).
Cheers,
Bastien
>>
>> Signed-off-by: Bastien Philbert <bastienphilbert@gmail.com>
>> ---
>>  net/sctp/sm_sideeffect.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
>> index 7fe56d0..f3a8b58 100644
>> --- a/net/sctp/sm_sideeffect.c
>> +++ b/net/sctp/sm_sideeffect.c
>> @@ -1434,7 +1434,7 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>>  		case SCTP_CMD_SEND_PKT:
>>  			/* Send a full packet to our peer.  */
>>  			packet = cmd->obj.packet;
>> -			sctp_packet_transmit(packet, gfp);
>> +			error = sctp_packet_transmit(packet, gfp);
>>  			sctp_ootb_pkt_free(packet);
>>  			break;
>>  
>> -- 
>> 2.5.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply

* Re: [RESEND PATCH net-next v2 3/3] net: bcmgenet: cleanup for dmadesc_set()
From: Florian Fainelli @ 2016-04-05 23:20 UTC (permalink / raw)
  To: Petri Gynther; +Cc: netdev, David Miller, opendmb, Jaedon Shin
In-Reply-To: <1459890001-78171-4-git-send-email-pgynther@google.com>

2016-04-05 14:00 GMT-07:00 Petri Gynther <pgynther@google.com>:
> dmadesc_set() is used for setting the Tx buffer DMA address, length,
> and status bits on a Tx ring descriptor when a frame is being Tx'ed.
>
> Always set the Tx buffer DMA address first, before updating the length
> and status bits, i.e. giving the Tx descriptor to the hardware.
>
> The reason this is a cleanup rather than a fix is that the hardware
> won't transmit anything from a Tx ring until the TDMA producer index
> has been incremented. As long as the dmadesc_set() writes complete
> before the TDMA producer index write, life is good.
>
> Signed-off-by: Petri Gynther <pgynther@google.com>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

^ permalink raw reply

* Re: [PATCH] sctp: Fix error handling for switch statement case in the function sctp_cmd_interprete
From: David Miller @ 2016-04-05 23:29 UTC (permalink / raw)
  To: daniel
  Cc: bastienphilbert, vyasevich, nhorman, linux-sctp, netdev,
	linux-kernel
In-Reply-To: <570433F0.6040506@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 05 Apr 2016 23:53:52 +0200

> On 04/05/2016 11:36 PM, Bastien Philbert wrote:
>> This fixes error handling for the switch statement case
>> SCTP_CMD_SEND_PKT by making the error value of the call
>> to sctp_packet_transmit equal the variable error due to
>> this function being able to fail with a error code. In
> 
> What actual issue have you observed that you fix?
> 
>> addition allow the call to sctp_ootb_pkt_free afterwards
>> to free up the no longer in use sctp packet even if the
>> call to the function sctp_packet_transmit fails in order
>> to avoid a memory leak here for not freeing the sctp
> 
> Not sure how this relates to your code?

Bastien, I'm seeing a clear negative pattern with the bug fixes
you are submitting.

Just now you submitted the ICMP change which obviously was never
tested because it tried to take the RTNL mutex in atomic context,
and now this sctp thing.

If you don't start actually testing your changes and expalining
clearly what the problem actually is, how you discovered it,
and how you actually tested your patch, I will start completely
ignoring your patch submissions.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox