Netdev List


* Re: [PATCH V3 4/6] vhost_net: determine whether or not to use zerocopy at one time
From: Michael S. Tsirkin @ 2013-09-23  7:16 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <5227F274.9040506@redhat.com>

On Thu, Sep 05, 2013 at 10:54:44AM +0800, Jason Wang wrote:
> On 09/04/2013 07:59 PM, Michael S. Tsirkin wrote:
> > On Mon, Sep 02, 2013 at 04:40:59PM +0800, Jason Wang wrote:
> >> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
> >> upend_idx != done_idx we still set zcopy_used to true and roll back this choice
> >> later. This can be avoided by deciding on zerocopy once, checking all
> >> conditions up front.
> >>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >>  drivers/vhost/net.c |   47 ++++++++++++++++++++---------------------------
> >>  1 files changed, 20 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index 8a6dd0d..3f89dea 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -404,43 +404,36 @@ static void handle_tx(struct vhost_net *net)
> >>  			       iov_length(nvq->hdr, s), hdr_size);
> >>  			break;
> >>  		}
> >> -		zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
> >> -				       nvq->upend_idx != nvq->done_idx);
> >> +
> >> +		zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
> >> +				   && (nvq->upend_idx + 1) % UIO_MAXIOV !=
> >> +				      nvq->done_idx
> > Thinking about this, this looks strange.
> > The original idea was that once we start doing zcopy, we keep
> > using the heads ring even for short packets until no zcopy is outstanding.
> 
> What's the reason for continuing to use the heads ring?

To keep completions in order.

> >
> > What's the logic behind (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
> > here?
> 
> Because we initialize both upend_idx and done_idx to zero, upend_idx
> != done_idx cannot be used to check whether or not the heads ring
> is full.

But what does ring full have to do with zerocopy use?

> >> +				   && vhost_net_tx_select_zcopy(net);
> >>  
> >>  		/* use msg_control to pass vhost zerocopy ubuf info to skb */
> >>  		if (zcopy_used) {
> >> +			struct ubuf_info *ubuf;
> >> +			ubuf = nvq->ubuf_info + nvq->upend_idx;
> >> +
> >>  			vq->heads[nvq->upend_idx].id = head;
> >> -			if (!vhost_net_tx_select_zcopy(net) ||
> >> -			    len < VHOST_GOODCOPY_LEN) {
> >> -				/* copy don't need to wait for DMA done */
> >> -				vq->heads[nvq->upend_idx].len =
> >> -							VHOST_DMA_DONE_LEN;
> >> -				msg.msg_control = NULL;
> >> -				msg.msg_controllen = 0;
> >> -				ubufs = NULL;
> >> -			} else {
> >> -				struct ubuf_info *ubuf;
> >> -				ubuf = nvq->ubuf_info + nvq->upend_idx;
> >> -
> >> -				vq->heads[nvq->upend_idx].len =
> >> -					VHOST_DMA_IN_PROGRESS;
> >> -				ubuf->callback = vhost_zerocopy_callback;
> >> -				ubuf->ctx = nvq->ubufs;
> >> -				ubuf->desc = nvq->upend_idx;
> >> -				msg.msg_control = ubuf;
> >> -				msg.msg_controllen = sizeof(ubuf);
> >> -				ubufs = nvq->ubufs;
> >> -				kref_get(&ubufs->kref);
> >> -			}
> >> +			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
> >> +			ubuf->callback = vhost_zerocopy_callback;
> >> +			ubuf->ctx = nvq->ubufs;
> >> +			ubuf->desc = nvq->upend_idx;
> >> +			msg.msg_control = ubuf;
> >> +			msg.msg_controllen = sizeof(ubuf);
> >> +			ubufs = nvq->ubufs;
> >> +			kref_get(&ubufs->kref);
> >>  			nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
> >> -		} else
> >> +		} else {
> >>  			msg.msg_control = NULL;
> >> +			ubufs = NULL;
> >> +		}
> >>  		/* TODO: Check specific error and bomb out unless ENOBUFS? */
> >>  		err = sock->ops->sendmsg(NULL, sock, &msg, len);
> >>  		if (unlikely(err < 0)) {
> >>  			if (zcopy_used) {
> >> -				if (ubufs)
> >> -					vhost_net_ubuf_put(ubufs);
> >> +				vhost_net_ubuf_put(ubufs);
> >>  				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
> >>  					% UIO_MAXIOV;
> >>  			}
> >> -- 
> >> 1.7.1
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
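
For readers following the index arithmetic in this thread, here is a minimal userspace sketch (not the vhost code). It assumes a ring in which both indices start at zero and one slot is always left unused; RING_SIZE stands in for UIO_MAXIOV, and the struct and function names are invented for the example. It shows that upend != done only separates "empty" from "non-empty", while (upend + 1) % size == done detects a full ring. Whether ring fullness should affect the zerocopy decision is the separate question raised above.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 1024	/* stands in for UIO_MAXIOV; the value is illustrative */

/* Producer index (upend) and consumer index (done); both start at 0. */
struct ring_idx {
	unsigned int upend;
	unsigned int done;
};

/* Empty: nothing outstanding between done and upend. */
static bool ring_empty(const struct ring_idx *r)
{
	return r->upend == r->done;
}

/* Full: advancing upend once more would make it equal to done. */
static bool ring_full(const struct ring_idx *r)
{
	return (r->upend + 1) % RING_SIZE == r->done;
}

/* Claim one slot; returns false (and changes nothing) when the ring is full. */
static bool ring_push(struct ring_idx *r)
{
	assert(r->upend < RING_SIZE && r->done < RING_SIZE);
	if (ring_full(r))
		return false;
	r->upend = (r->upend + 1) % RING_SIZE;
	return true;
}
```

With this convention the ring holds at most RING_SIZE - 1 outstanding entries, which is the price of telling "full" apart from "empty" without a separate counter.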

* [PATCH net 0/6] bnx2x: Bug fixes patch series
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong

Hi Dave,

This patch series contains various bug fixes, half of which are SR-IOV related
(some fix issues in the recently added VF RSS support), while the others fix
a wide assortment of issues in the driver.

Please consider applying these patches to `net'.

Thanks,
Yuval Mintz

* [PATCH net 1/6] bnx2x: Prevent mistaken hangup between driver & FW
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

From: Eilon Greenstein <eilong@broadcom.com>

When the system CPU is stressed, the driver may be unable to pulse the FW
every second, which causes the log to fill up with error messages.

Increasing the threshold to 5 seconds seems to be enough to eliminate the
issue.

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index a6704b5..f403c6b 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -5447,26 +5447,24 @@ static void bnx2x_timer(unsigned long data)
 	if (IS_PF(bp) &&
 	    !BP_NOMCP(bp)) {
 		int mb_idx = BP_FW_MB_IDX(bp);
-		u32 drv_pulse;
-		u32 mcp_pulse;
+		u16 drv_pulse;
+		u16 mcp_pulse;
 
 		++bp->fw_drv_pulse_wr_seq;
 		bp->fw_drv_pulse_wr_seq &= DRV_PULSE_SEQ_MASK;
-		/* TBD - add SYSTEM_TIME */
 		drv_pulse = bp->fw_drv_pulse_wr_seq;
 		bnx2x_drv_pulse(bp);
 
 		mcp_pulse = (SHMEM_RD(bp, func_mb[mb_idx].mcp_pulse_mb) &
 			     MCP_PULSE_SEQ_MASK);
 		/* The delta between driver pulse and mcp response
-		 * should be 1 (before mcp response) or 0 (after mcp response)
+		 * should not get too big. If the MFW is more than 5 pulses
+		 * behind, we should worry about it enough to generate an error
+		 * log.
 		 */
-		if ((drv_pulse != mcp_pulse) &&
-		    (drv_pulse != ((mcp_pulse + 1) & MCP_PULSE_SEQ_MASK))) {
-			/* someone lost a heartbeat... */
-			BNX2X_ERR("drv_pulse (0x%x) != mcp_pulse (0x%x)\n",
+		if (((drv_pulse - mcp_pulse) & MCP_PULSE_SEQ_MASK) > 5)
+			BNX2X_ERR("MFW seems hanged: drv_pulse (0x%x) != mcp_pulse (0x%x)\n",
 				  drv_pulse, mcp_pulse);
-		}
 	}
 
 	if (bp->state == BNX2X_STATE_OPEN)
-- 
1.8.1.4
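
The new check in the patch above relies on masked subtraction of two wrapping sequence numbers. A minimal userspace sketch of that idea follows; it is not the driver code. The 15-bit mask value is an assumption standing in for MCP_PULSE_SEQ_MASK, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_MASK	0x7fffu	/* assumed width; stands in for MCP_PULSE_SEQ_MASK */
#define MAX_PULSE_LAG	5u	/* threshold taken from the patch description */

/*
 * Return true when the firmware counter trails the driver counter by more
 * than MAX_PULSE_LAG steps. Masking the difference keeps the comparison
 * correct when the driver counter has wrapped past zero and the firmware
 * counter has not.
 */
static bool fw_lagging(uint16_t drv_pulse, uint16_t mcp_pulse)
{
	assert((drv_pulse & ~SEQ_MASK) == 0 && (mcp_pulse & ~SEQ_MASK) == 0);

	uint16_t delta = (uint16_t)((drv_pulse - mcp_pulse) & SEQ_MASK);

	return delta > MAX_PULSE_LAG;
}
```

For instance, a driver counter of 2 against a firmware counter of 0x7ffe gives a masked delta of 4, so no error is reported even though the raw values look far apart.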

* [PATCH net 4/6] bnx2x: prevent masking error from cnic
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

During error flows while loading cnic, the return value was incorrectly replaced
by that of bnx2x_set_real_num_queues(). If that function finished
successfully, the cnic would mistakenly think the load had ended
successfully, causing issues (and panics) later on.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 61726af..e66beff 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2481,8 +2481,7 @@ load_error_cnic2:
 load_error_cnic1:
 	bnx2x_napi_disable_cnic(bp);
 	/* Update the number of queues without the cnic queues */
-	rc = bnx2x_set_real_num_queues(bp, 0);
-	if (rc)
+	if (bnx2x_set_real_num_queues(bp, 0))
 		BNX2X_ERR("Unable to set real_num_queues not including cnic\n");
 load_error_cnic0:
 	BNX2X_ERR("CNIC-related load failed\n");
-- 
1.8.1.4
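
The bug class fixed above (an error code overwritten by the result of a cleanup call) can be reproduced in a few lines of userspace C. This is only a sketch under invented names: cleanup_step(), load_masks_error() and load_keeps_error() are hypothetical and do not correspond to driver functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical cleanup helper: returns 0 on success, negative errno on failure. */
static int cleanup_step(int fail)
{
	return fail ? -EIO : 0;
}

/* Buggy pattern: the cleanup result replaces the original failure code. */
static int load_masks_error(int load_rc, int cleanup_fails)
{
	int rc = load_rc;

	if (rc) {
		rc = cleanup_step(cleanup_fails);	/* original error is lost here */
		if (rc)
			fprintf(stderr, "cleanup failed\n");
	}
	return rc;
}

/* Fixed pattern: test the cleanup result without assigning it to rc. */
static int load_keeps_error(int load_rc, int cleanup_fails)
{
	int rc = load_rc;

	if (rc) {
		if (cleanup_step(cleanup_fails))
			fprintf(stderr, "cleanup failed\n");
	}
	return rc;
}
```

In the buggy variant a failed load followed by a successful cleanup returns 0, so the caller believes the load succeeded; the fixed variant always returns the original error.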

* [PATCH net 6/6] bnx2x: handle known but unsupported VF messages
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

From: Ariel Elior <ariele@broadcom.com>

Commit b9871bcf "bnx2x: VF RSS support - PF side" deprecated one of
the previously existing messages. If an old VF driver sends this message
to the PF, the PF will not reply and will leave the mailbox in an unstable
state (causing a timeout on the VF side).

Wait until the firmware ack is written before unlocking the channel.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c | 50 ++++++++++++------------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index 6cfb887..da16953 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -1765,28 +1765,28 @@ static void bnx2x_vf_mbx_request(struct bnx2x *bp, struct bnx2x_virtf *vf,
 		switch (mbx->first_tlv.tl.type) {
 		case CHANNEL_TLV_ACQUIRE:
 			bnx2x_vf_mbx_acquire(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_INIT:
 			bnx2x_vf_mbx_init_vf(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_SETUP_Q:
 			bnx2x_vf_mbx_setup_q(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_SET_Q_FILTERS:
 			bnx2x_vf_mbx_set_q_filters(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_TEARDOWN_Q:
 			bnx2x_vf_mbx_teardown_q(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_CLOSE:
 			bnx2x_vf_mbx_close_vf(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_RELEASE:
 			bnx2x_vf_mbx_release_vf(bp, vf, mbx);
-			break;
+			return;
 		case CHANNEL_TLV_UPDATE_RSS:
 			bnx2x_vf_mbx_update_rss(bp, vf, mbx);
-			break;
+			return;
 		}
 
 	} else {
@@ -1802,26 +1802,24 @@ static void bnx2x_vf_mbx_request(struct bnx2x *bp, struct bnx2x_virtf *vf,
 		for (i = 0; i < 20; i++)
 			DP_CONT(BNX2X_MSG_IOV, "%x ",
 				mbx->msg->req.tlv_buf_size.tlv_buffer[i]);
+	}
 
-		/* test whether we can respond to the VF (do we have an address
-		 * for it?)
-		 */
-		if (vf->state == VF_ACQUIRED || vf->state == VF_ENABLED) {
-			/* mbx_resp uses the op_rc of the VF */
-			vf->op_rc = PFVF_STATUS_NOT_SUPPORTED;
+	/* can we respond to VF (do we have an address for it?) */
+	if (vf->state == VF_ACQUIRED || vf->state == VF_ENABLED) {
+		/* mbx_resp uses the op_rc of the VF */
+		vf->op_rc = PFVF_STATUS_NOT_SUPPORTED;
 
-			/* notify the VF that we do not support this request */
-			bnx2x_vf_mbx_resp(bp, vf);
-		} else {
-			/* can't send a response since this VF is unknown to us
-			 * just ack the FW to release the mailbox and unlock
-			 * the channel.
-			 */
-			storm_memset_vf_mbx_ack(bp, vf->abs_vfid);
-			mmiowb();
-			bnx2x_unlock_vf_pf_channel(bp, vf,
-						   mbx->first_tlv.tl.type);
-		}
+		/* notify the VF that we do not support this request */
+		bnx2x_vf_mbx_resp(bp, vf);
+	} else {
+		/* can't send a response since this VF is unknown to us
+		 * just ack the FW to release the mailbox and unlock
+		 * the channel.
+		 */
+		storm_memset_vf_mbx_ack(bp, vf->abs_vfid);
+		/* Firmware ack should be written before unlocking channel */
+		mmiowb();
+		bnx2x_unlock_vf_pf_channel(bp, vf, mbx->first_tlv.tl.type);
 	}
 }
 
-- 
1.8.1.227.g44fe835
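
The control-flow change in this patch can be summarised as: handled message types return immediately, and everything else (unknown types as well as known types without a handler) reaches one shared "not supported" tail. A minimal sketch of that shape, with invented enum values and function names rather than the real CHANNEL_TLV_* constants:

```c
#include <assert.h>

/* Illustrative message types; MSG_DEPRECATED is known but has no handler. */
enum msg_type { MSG_ACQUIRE, MSG_INIT, MSG_DEPRECATED, MSG_TYPE_COUNT };

enum reply { REPLY_HANDLED, REPLY_NOT_SUPPORTED, REPLY_ACK_ONLY };

static int type_is_known(int type)
{
	return type >= 0 && type < MSG_TYPE_COUNT;
}

/*
 * can_respond models "do we have an address for the sender?".
 * Handled types return from inside the switch; all other cases fall
 * through to the common tail, so the sender is never left without
 * either a reply or an ack.
 */
static enum reply dispatch(int type, int can_respond)
{
	assert(can_respond == 0 || can_respond == 1);

	if (type_is_known(type)) {
		switch (type) {
		case MSG_ACQUIRE:
		case MSG_INIT:
			return REPLY_HANDLED;	/* handler sends its own response */
		}
		/* known but unsupported: leave the switch and use the tail */
	}

	return can_respond ? REPLY_NOT_SUPPORTED : REPLY_ACK_ONLY;
}
```

Before the change, the equivalent of MSG_DEPRECATED would have left the switch via the closing brace and skipped the reply entirely, because the reply lived only in the "unknown type" branch.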

* [PATCH net 5/6] bnx2x: prevent masked MCP parities from appearing
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

During flows which mask block attentions (e.g., register dump) all parities
are masked. However, unlike other blocks, the MCP's attention is not masked
inside the block; rather, the indication to the driver is masked. If another
attention (e.g., link change) occurs while there's an MCP parity, the driver will
ignore the fact that the parity is masked and erroneously report a parity.

This patch forces the driver to read the MCP masking while checking for
parities.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index f403c6b..82b658d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -4703,6 +4703,14 @@ bool bnx2x_chk_parity_attn(struct bnx2x *bp, bool *global, bool print)
 	attn.sig[3] = REG_RD(bp,
 		MISC_REG_AEU_AFTER_INVERT_4_FUNC_0 +
 			     port*4);
+	/* Since MCP attentions can't be disabled inside the block, we need to
+	 * read AEU registers to see whether they're currently disabled
+	 */
+	attn.sig[3] &= ((REG_RD(bp,
+				!port ? MISC_REG_AEU_ENABLE4_FUNC_0_OUT_0
+				      : MISC_REG_AEU_ENABLE4_FUNC_1_OUT_0) &
+			 MISC_AEU_ENABLE_MCP_PRTY_BITS) |
+			~MISC_AEU_ENABLE_MCP_PRTY_BITS);
 
 	if (!CHIP_IS_E1x(bp))
 		attn.sig[4] = REG_RD(bp,
-- 
1.8.1.4
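
The expression added by this patch has the form sig &= (enable & BITS) | ~BITS, which clears a bit in sig only if it belongs to BITS and is disabled in enable. A small userspace sketch of that idiom follows. The bit positions are illustrative and are not the real MISC_AEU_ENABLE_MCP_PRTY_BITS layout; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MCP_PRTY_BITS	0xf0000000u	/* illustrative positions only */

/*
 * Filter an attention signature against an enable register:
 *  - bits outside MCP_PRTY_BITS always pass through unchanged;
 *  - bits inside MCP_PRTY_BITS pass only if set in enable_reg.
 */
static uint32_t mask_mcp_parity(uint32_t sig, uint32_t enable_reg)
{
	uint32_t keep = (enable_reg & MCP_PRTY_BITS) | ~MCP_PRTY_BITS;

	return sig & keep;
}
```

For example, mask_mcp_parity(0xf0000001u, 0) yields 0x00000001: all four parity bits are dropped because none is enabled, while the unrelated low bit survives.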

* [PATCH net 2/6] bnx2x: Fix support for VFs on some PFs
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

From: Ariel Elior <ariele@broadcom.com>

Due to incorrect usage of PF macros when reading information relating to
interrupts, some PFs were erroneously unable to support VFs.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index 2604b62..d9370d4 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -1819,7 +1819,7 @@ bnx2x_get_vf_igu_cam_info(struct bnx2x *bp)
 		fid = GET_FIELD((val), IGU_REG_MAPPING_MEMORY_FID);
 		if (fid & IGU_FID_ENCODE_IS_PF)
 			current_pf = fid & IGU_FID_PF_NUM_MASK;
-		else if (current_pf == BP_ABS_FUNC(bp))
+		else if (current_pf == BP_FUNC(bp))
 			bnx2x_vf_set_igu_info(bp, sb_id,
 					      (fid & IGU_FID_VF_NUM_MASK));
 		DP(BNX2X_MSG_IOV, "%s[%d], igu_sb_id=%d, msix=%d\n",
-- 
1.8.1.4

* [PATCH net 3/6] bnx2x: add missing VF resource allocation during init
From: Yuval Mintz @ 2013-09-23  7:12 UTC (permalink / raw)
  To: davem, netdev; +Cc: ariele, eilong, Yuval Mintz
In-Reply-To: <1379920375-10303-1-git-send-email-yuvalmin@broadcom.com>

From: Ariel Elior <ariele@broadcom.com>

bnx2x_iov_static_resc() should be called after the IGU has been read for
information on the number of available VFs, so that resources are set correctly.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index d9370d4..9ad012b 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -3180,6 +3180,7 @@ int bnx2x_enable_sriov(struct bnx2x *bp)
 		/* set local queue arrays */
 		vf->vfqs = &bp->vfdb->vfqs[qcount];
 		qcount += vf_sb_count(vf);
+		bnx2x_iov_static_resc(bp, vf);
 	}
 
 	/* prepare msix vectors in VF configuration space */
@@ -3187,6 +3188,8 @@ int bnx2x_enable_sriov(struct bnx2x *bp)
 		bnx2x_pretend_func(bp, HW_VF_HANDLE(bp, vf_idx));
 		REG_WR(bp, PCICFG_OFFSET + GRC_CONFIG_REG_VF_MSIX_CONTROL,
 		       num_vf_queues);
+		DP(BNX2X_MSG_IOV, "set msix vec num in VF %d cfg space to %d\n",
+		   vf_idx, num_vf_queues);
 	}
 	bnx2x_pretend_func(bp, BP_ABS_FUNC(bp));
 
-- 
1.8.1.4

* Re: Bug - regression - Via velocity interface coming up freezes kernel
From: Dirk Kraft @ 2013-09-23  7:05 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, Julia Lawall
In-Reply-To: <CAFES+iKYWTn-DZRCnWer56rWeGh0t86GAx66OgZ7Jnvm-9fo9w@mail.gmail.com>

On Mon, Sep 23, 2013 at 8:29 AM, Dirk Kraft <dirk.kraft@gmail.com> wrote:
[...]
> By applying the below patch to 3.11-rc1 the problem is gone.

Oops, I meant 3.12-rc1. Sorry.

* [PATCH net-next] ipv6: Not need to set fl6.flowi6_flags as zero
From: roy.qing.li @ 2013-09-23  6:55 UTC (permalink / raw)
  To: netdev

From: Li RongQing <roy.qing.li@gmail.com>

Setting fl6.flowi6_flags to zero after memset() is redundant; remove it.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
---
 net/ipv6/route.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c979dd9..c6b2e1c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1137,7 +1137,6 @@ void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
 	memset(&fl6, 0, sizeof(fl6));
 	fl6.flowi6_oif = oif;
 	fl6.flowi6_mark = mark;
-	fl6.flowi6_flags = 0;
 	fl6.daddr = iph->daddr;
 	fl6.saddr = iph->saddr;
 	fl6.flowlabel = ip6_flowinfo(iph);
@@ -1236,7 +1235,6 @@ void ip6_redirect(struct sk_buff *skb, struct net *net, int oif, u32 mark)
 	memset(&fl6, 0, sizeof(fl6));
 	fl6.flowi6_oif = oif;
 	fl6.flowi6_mark = mark;
-	fl6.flowi6_flags = 0;
 	fl6.daddr = iph->daddr;
 	fl6.saddr = iph->saddr;
 	fl6.flowlabel = ip6_flowinfo(iph);
@@ -1258,7 +1256,6 @@ void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif,
 	memset(&fl6, 0, sizeof(fl6));
 	fl6.flowi6_oif = oif;
 	fl6.flowi6_mark = mark;
-	fl6.flowi6_flags = 0;
 	fl6.daddr = msg->dest;
 	fl6.saddr = iph->daddr;
 
-- 
1.7.10.4
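
The reasoning behind this cleanup is simply that memset(&s, 0, sizeof(s)) already zeroes every integer member. A minimal userspace sketch with a made-up struct (flow_key is not struct flowi6, and build_key() is hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Made-up stand-in for a flow key; not the kernel's struct flowi6. */
struct flow_key {
	int		oif;
	unsigned int	mark;
	unsigned char	flags;
	unsigned int	daddr;
};

static struct flow_key build_key(int oif, unsigned int mark)
{
	struct flow_key k;

	memset(&k, 0, sizeof(k));	/* every member, including flags, is now 0 */
	k.oif = oif;
	k.mark = mark;
	/* an explicit "k.flags = 0;" here would be redundant */
	return k;
}
```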

* [PATCH net-next] net ipv4: Convert ipv4.ip_local_port_range to be per netns
From: Eric W. Biederman @ 2013-09-23  6:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev


- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net.
- Manually expand inet_get_local_port_range into ipv4_local_port_range
  because the struct net is not known there.
- Move the initialization of sysctl_local_ports into
  sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c.

Originally-by: Samya <samya@twitter.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 drivers/infiniband/core/cma.c   |    2 +-
 drivers/net/vxlan.c             |    2 +-
 include/net/ip.h                |    7 +----
 include/net/netns/ipv4.h        |    6 +++++
 net/ipv4/inet_connection_sock.c |   20 +++++---------
 net/ipv4/inet_hashtables.c      |    2 +-
 net/ipv4/ping.c                 |    4 +--
 net/ipv4/sysctl_net_ipv4.c      |   57 ++++++++++++++++++++++++++-------------
 net/ipv4/udp.c                  |    2 +-
 net/sctp/socket.c               |    2 +-
 security/selinux/hooks.c        |    3 ++-
 11 files changed, 61 insertions(+), 46 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7c0f953..9627545 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2302,7 +2302,7 @@ static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 	int low, high, remaining;
 	unsigned int rover;
 
-	inet_get_local_port_range(&low, &high);
+	inet_get_local_port_range(&init_net, &low, &high);
 	remaining = (high - low) + 1;
 	rover = net_random() % remaining + low;
 retry:
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 767f7af..a105376 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1501,7 +1501,7 @@ static void vxlan_setup(struct net_device *dev)
 	vxlan->age_timer.function = vxlan_cleanup;
 	vxlan->age_timer.data = (unsigned long) vxlan;
 
-	inet_get_local_port_range(&low, &high);
+	inet_get_local_port_range(dev_net(dev), &low, &high);
 	vxlan->port_min = low;
 	vxlan->port_max = high;
 	vxlan->dst_port = htons(vxlan_port);
diff --git a/include/net/ip.h b/include/net/ip.h
index a68f838..5e46435 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -195,12 +195,7 @@ static inline u64 snmp_fold_field64(void __percpu *mib[], int offt, size_t syncp
 #endif
 extern int snmp_mib_init(void __percpu *ptr[2], size_t mibsize, size_t align);
 extern void snmp_mib_free(void __percpu *ptr[2]);
-
-extern struct local_ports {
-	seqlock_t	lock;
-	int		range[2];
-} sysctl_local_ports;
-extern void inet_get_local_port_range(int *low, int *high);
+extern void inet_get_local_port_range(struct net *net, int *low, int *high);
 
 extern unsigned long *sysctl_local_reserved_ports;
 static inline int inet_is_reserved_local_port(int port)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 2ba9de8..d685e50 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -15,6 +15,10 @@ struct fib_rules_ops;
 struct hlist_head;
 struct fib_table;
 struct sock;
+struct local_ports {
+	seqlock_t	lock;
+	int		range[2];
+};
 
 struct netns_ipv4 {
 #ifdef CONFIG_SYSCTL
@@ -62,6 +66,8 @@ struct netns_ipv4 {
 	int sysctl_icmp_ratemask;
 	int sysctl_icmp_errors_use_inbound_ifaddr;
 
+	struct local_ports sysctl_local_ports;
+
 	int sysctl_tcp_ecn;
 
 	kgid_t sysctl_ping_group_range[2];
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 6acb541..7ac7aa1 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -29,27 +29,19 @@ const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n";
 EXPORT_SYMBOL(inet_csk_timer_bug_msg);
 #endif
 
-/*
- * This struct holds the first and last local port number.
- */
-struct local_ports sysctl_local_ports __read_mostly = {
-	.lock = __SEQLOCK_UNLOCKED(sysctl_local_ports.lock),
-	.range = { 32768, 61000 },
-};
-
 unsigned long *sysctl_local_reserved_ports;
 EXPORT_SYMBOL(sysctl_local_reserved_ports);
 
-void inet_get_local_port_range(int *low, int *high)
+void inet_get_local_port_range(struct net *net, int *low, int *high)
 {
 	unsigned int seq;
 
 	do {
-		seq = read_seqbegin(&sysctl_local_ports.lock);
+		seq = read_seqbegin(&net->ipv4.sysctl_local_ports.lock);
 
-		*low = sysctl_local_ports.range[0];
-		*high = sysctl_local_ports.range[1];
-	} while (read_seqretry(&sysctl_local_ports.lock, seq));
+		*low = net->ipv4.sysctl_local_ports.range[0];
+		*high = net->ipv4.sysctl_local_ports.range[1];
+	} while (read_seqretry(&net->ipv4.sysctl_local_ports.lock, seq));
 }
 EXPORT_SYMBOL(inet_get_local_port_range);
 
@@ -116,7 +108,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		int remaining, rover, low, high;
 
 again:
-		inet_get_local_port_range(&low, &high);
+		inet_get_local_port_range(net, &low, &high);
 		remaining = (high - low) + 1;
 		smallest_rover = rover = net_random() % remaining + low;
 
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 7bd8983..2779037 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -494,7 +494,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 		u32 offset = hint + port_offset;
 		struct inet_timewait_sock *tw = NULL;
 
-		inet_get_local_port_range(&low, &high);
+		inet_get_local_port_range(net, &low, &high);
 		remaining = (high - low) + 1;
 
 		local_bh_disable();
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 746427c..d71ecc4 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -237,11 +237,11 @@ static void inet_get_ping_group_range_net(struct net *net, kgid_t *low,
 	unsigned int seq;
 
 	do {
-		seq = read_seqbegin(&sysctl_local_ports.lock);
+		seq = read_seqbegin(&net->ipv4.sysctl_local_ports.lock);
 
 		*low = data[0];
 		*high = data[1];
-	} while (read_seqretry(&sysctl_local_ports.lock, seq));
+	} while (read_seqretry(&net->ipv4.sysctl_local_ports.lock, seq));
 }
 
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 610e324..b91f963 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -42,12 +42,12 @@ static int ip_ping_group_range_min[] = { 0, 0 };
 static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
 
 /* Update system visible IP port range */
-static void set_local_port_range(int range[2])
+static void set_local_port_range(struct local_ports *ports, int range[2])
 {
-	write_seqlock(&sysctl_local_ports.lock);
-	sysctl_local_ports.range[0] = range[0];
-	sysctl_local_ports.range[1] = range[1];
-	write_sequnlock(&sysctl_local_ports.lock);
+	write_seqlock(&ports->lock);
+	ports->range[0] = range[0];
+	ports->range[1] = range[1];
+	write_sequnlock(&ports->lock);
 }
 
 /* Validate changes from /proc interface. */
@@ -55,6 +55,9 @@ static int ipv4_local_port_range(struct ctl_table *table, int write,
 				 void __user *buffer,
 				 size_t *lenp, loff_t *ppos)
 {
+	struct local_ports *ports =
+		container_of(table->data, struct local_ports, range);
+	unsigned int seq;
 	int ret;
 	int range[2];
 	struct ctl_table tmp = {
@@ -65,14 +68,19 @@ static int ipv4_local_port_range(struct ctl_table *table, int write,
 		.extra2 = &ip_local_port_range_max,
 	};
 
-	inet_get_local_port_range(range, range + 1);
+	do {
+		seq = read_seqbegin(&ports->lock);
+		range[0] = ports->range[0];
+		range[1] = ports->range[1];
+	} while (read_seqretry(&ports->lock, seq));
+
 	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
 
 	if (write && ret == 0) {
 		if (range[1] < range[0])
 			ret = -EINVAL;
 		else
-			set_local_port_range(range);
+			set_local_port_range(ports, range);
 	}
 
 	return ret;
@@ -82,23 +90,27 @@ static int ipv4_local_port_range(struct ctl_table *table, int write,
 static void inet_get_ping_group_range_table(struct ctl_table *table, kgid_t *low, kgid_t *high)
 {
 	kgid_t *data = table->data;
+	struct netns_ipv4 *ipv4 =
+		container_of(table->data, struct netns_ipv4, sysctl_ping_group_range);
 	unsigned int seq;
 	do {
-		seq = read_seqbegin(&sysctl_local_ports.lock);
+		seq = read_seqbegin(&ipv4->sysctl_local_ports.lock);
 
 		*low = data[0];
 		*high = data[1];
-	} while (read_seqretry(&sysctl_local_ports.lock, seq));
+	} while (read_seqretry(&ipv4->sysctl_local_ports.lock, seq));
 }
 
 /* Update system visible IP port range */
 static void set_ping_group_range(struct ctl_table *table, kgid_t low, kgid_t high)
 {
 	kgid_t *data = table->data;
-	write_seqlock(&sysctl_local_ports.lock);
+	struct netns_ipv4 *ipv4 =
+		container_of(table->data, struct netns_ipv4, sysctl_ping_group_range);
+	write_seqlock(&ipv4->sysctl_local_ports.lock);
 	data[0] = low;
 	data[1] = high;
-	write_sequnlock(&sysctl_local_ports.lock);
+	write_sequnlock(&ipv4->sysctl_local_ports.lock);
 }
 
 /* Validate changes from /proc interface. */
@@ -474,13 +486,6 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
-		.procname	= "ip_local_port_range",
-		.data		= &sysctl_local_ports.range,
-		.maxlen		= sizeof(sysctl_local_ports.range),
-		.mode		= 0644,
-		.proc_handler	= ipv4_local_port_range,
-	},
-	{
 		.procname	= "ip_local_reserved_ports",
 		.data		= NULL, /* initialized in sysctl_ipv4_init */
 		.maxlen		= 65536,
@@ -837,6 +842,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
+		.procname	= "ip_local_port_range",
+		.maxlen		= sizeof(init_net.ipv4.sysctl_local_ports.range),
+		.data		= &init_net.ipv4.sysctl_local_ports.range,
+		.mode		= 0644,
+		.proc_handler	= ipv4_local_port_range,
+	},
+	{
 		.procname	= "tcp_mem",
 		.maxlen		= sizeof(init_net.ipv4.sysctl_tcp_mem),
 		.mode		= 0644,
@@ -871,6 +883,8 @@ static __net_init int ipv4_sysctl_init_net(struct net *net)
 			&net->ipv4.sysctl_ping_group_range;
 		table[7].data =
 			&net->ipv4.sysctl_tcp_ecn;
+		table[8].data =
+			&net->ipv4.sysctl_local_ports.range;
 
 		/* Don't export sysctls to unprivileged users */
 		if (net->user_ns != &init_user_ns)
@@ -884,6 +898,13 @@ static __net_init int ipv4_sysctl_init_net(struct net *net)
 	net->ipv4.sysctl_ping_group_range[0] = make_kgid(&init_user_ns, 1);
 	net->ipv4.sysctl_ping_group_range[1] = make_kgid(&init_user_ns, 0);
 
+	/*
+	 * Set defaults for local port range
+	 */
+	seqlock_init(&net->ipv4.sysctl_local_ports.lock);
+	net->ipv4.sysctl_local_ports.range[0] =  32768;
+	net->ipv4.sysctl_local_ports.range[1] =  61000;
+
 	tcp_init_mem(net);
 
 	net->ipv4.ipv4_hdr = register_net_sysctl(net, "net/ipv4", table);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 766e6ba..d0c3529 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -219,7 +219,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		unsigned short first, last;
 		DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);
 
-		inet_get_local_port_range(&low, &high);
+		inet_get_local_port_range(net, &low, &high);
 		remaining = (high - low) + 1;
 
 		rand = net_random();
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index c6670d2..09f46fb 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -5893,7 +5893,7 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
 		int low, high, remaining, index;
 		unsigned int rover;
 
-		inet_get_local_port_range(&low, &high);
+		inet_get_local_port_range(sock_net(sk), &low, &high);
 		remaining = (high - low) + 1;
 		rover = net_random() % remaining + low;
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c956390..558d0d9 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3898,6 +3898,7 @@ static int selinux_socket_post_create(struct socket *sock, int family,
 static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, int addrlen)
 {
 	struct sock *sk = sock->sk;
+	struct net *net = sock_net(sk);
 	u16 family;
 	int err;
 
@@ -3934,7 +3935,7 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
 		if (snum) {
 			int low, high;
 
-			inet_get_local_port_range(&low, &high);
+			inet_get_local_port_range(net, &low, &high);
 
 			if (snum < max(PROT_SOCK, low) || snum > high) {
 				err = sel_netport_sid(sk->sk_protocol,
-- 
1.7.10.4
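
The sysctl handler in this patch recovers the per-namespace struct local_ports from table->data, which points at its range member. The pointer arithmetic behind container_of() can be tried in userspace with the sketch below. demo_container_of and struct local_ports_demo are stand-ins defined here for illustration (the lock is replaced by a plain int), not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct local_ports; the seqlock is replaced by an int. */
struct local_ports_demo {
	int lock_seq;
	int range[2];
};

/* Same idea as the kernel's container_of(), without its type checking. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Given a pointer to the range member, find the enclosing structure. */
static struct local_ports_demo *ports_from_range(void *range_ptr)
{
	return demo_container_of(range_ptr, struct local_ports_demo, range);
}

/* Mirrors the handler's sanity check: the low port must not exceed the high port. */
static int range_is_valid(const struct local_ports_demo *p)
{
	return p->range[0] <= p->range[1];
}
```

Because each namespace carries its own instance, the same handler works for all of them: whichever range pointer is stored in the table entry leads back to that namespace's lock and values.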

* Re: Bug - regression - Via velocity interface coming up freezes kernel
From: Dirk Kraft @ 2013-09-23  6:29 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, Julia Lawall
In-Reply-To: <20130922221109.GA14246@electric-eye.fr.zoreil.com>

Hi

On Mon, Sep 23, 2013 at 12:11 AM, Francois Romieu <romieu@fr.zoreil.com> wrote:
[...]
> You can try this one as a wild guess before I have more time to analyze.

By applying the below patch to 3.11-rc1 the problem is gone.

I was only able to run a short test, so I cannot say whether this has any side effects.

Thanks,
Dirk

>
> diff --git a/drivers/net/ethernet/via/via-velocity.c b/drivers/net/ethernet/via/via-velocity.c
> index d022bf9..64c42be 100644
> --- a/drivers/net/ethernet/via/via-velocity.c
> +++ b/drivers/net/ethernet/via/via-velocity.c
> @@ -2172,16 +2172,13 @@ static int velocity_poll(struct napi_struct *napi, int budget)
>         unsigned int rx_done;
>         unsigned long flags;
>
> -       spin_lock_irqsave(&vptr->lock, flags);
>         /*
>          * Do rx and tx twice for performance (taken from the VIA
>          * out-of-tree driver).
>          */
> -       rx_done = velocity_rx_srv(vptr, budget / 2);
> -       velocity_tx_srv(vptr);
> -       rx_done += velocity_rx_srv(vptr, budget - rx_done);
> +       rx_done = velocity_rx_srv(vptr, budget);
> +       spin_lock_irqsave(&vptr->lock, flags);
>         velocity_tx_srv(vptr);
> -
>         /* If budget not fully consumed, exit the polling mode */
>         if (rx_done < budget) {
>                 napi_complete(napi);

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next] xen-netfront: convert to GRO API and advertise this feature
From: annie li @ 2013-09-23  6:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Anirban Chakraborty, Wei Liu, <netdev@vger.kernel.org>,
	Ian Campbell, <xen-devel@lists.xen.org>
In-Reply-To: <523FCB4D.30801@redhat.com>


On 2013-9-23 13:02, Jason Wang wrote:
> On 09/23/2013 07:04 AM, Anirban Chakraborty wrote:
>> On Sep 22, 2013, at 5:09 AM, Wei Liu <wei.liu2@citrix.com> wrote:
>>
>>> On Sun, Sep 22, 2013 at 02:29:15PM +0800, Jason Wang wrote:
>>>> On 09/22/2013 12:05 AM, Wei Liu wrote:
>>>>> Anirban was seeing netfront received MTU size packets, which downgraded
>>>>> throughput. The following patch makes netfront use GRO API which
>>>>> improves throughput for that case.
>>>>>
>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>>> Signed-off-by: Anirban Chakraborty <abchak@juniper.net>
>>>>> Cc: Ian Campbell <ian.campbell@citrix.com>
>>>> Maybe a dumb question: doesn't Xen depends on the driver of host card to
>>>> do GRO and pass it to netfront? What the case that netfront can receive
>>> The would be the ideal situation. Netback pushes large packets to
>>> netfront and netfront sees large packets.
>>>
>>>> a MTU size packet, for a card that does not support GRO in host? Doing
>>> However Anirban saw the case when backend interface receives large
>>> packets but netfront sees MTU size packets, so my thought is there is
>>> certain configuration that leads to this issue. As we cannot tell
>>> users what to enable and what not to enable so I would like to solve
>>> this within our driver.
>>>
>>>> GRO twice may introduce extra overheads.
>>>>
>>> AIUI if the packet that frontend sees is large already then the GRO path
>>> is quite short which will not introduce heavy penalty, while on the
>>> other hand if packet is segmented doing GRO improves throughput.
>>>
>> Thanks Wei, for explaining and submitting the patch. I would like add following to what you have already mentioned.
>> In my configuration, I was seeing netback was pushing large packets to the guest (Centos 6.4) but the netfront was receiving MTU sized packets. With this patch on, I do see large packets received on the guest interface. As a result there was substantial throughput improvement in the guest side (2.8 Gbps to 3.8 Gbps). Also, note that the host NIC driver was enabled for GRO already.
>>
>> -Anirban
> In this case, even if you still want to do GRO. It's better to find the
> root cause of why the GSO packet were segmented

Totally agree, we need to find out why large packets are segmented
only in the different-host case.

> (maybe GSO were not
> enabled for netback?), since it introduces extra overheads.

From Anirban's feedback, large packets can be seen on the vif interface,
and even on guests running on the same host.

Thanks
Annie

^ permalink raw reply

* Re: [PATCH net 0/5] bnx2x: Link fixes
From: David Miller @ 2013-09-23  6:10 UTC (permalink / raw)
  To: yanivr; +Cc: netdev, eilong
In-Reply-To: <1379851166-11959-1-git-send-email-yanivr@broadcom.com>

From: "Yaniv Rosner" <yanivr@broadcom.com>
Date: Sun, 22 Sep 2013 14:59:21 +0300

> The following patch series contain few link fixes.
> Please consider applying it to net.

All applied, thanks a lot.

^ permalink raw reply

* Re: [PATCH net-next 11/11] sfc: Add static tracepoints to datapath
From: David Miller @ 2013-09-23  6:08 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1379788592.1681.45.camel@bwh-desktop.uk.level5networks.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Sat, 21 Sep 2013 19:36:32 +0100

> These tracepoints support the driver-specific datapath feature tests
> we're running internally, though they might be useful for other
> purposes.  The skb fields are chosen to cover driver features
> implemented now or likely to be added soon.
> 
> (Includes a bug fix from Edward Cree.)
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Make this generic, rather than specific to this driver.

You're putting a tracepoint right before calls to the generic core
receive interfaces.  That makes no sense at all.

^ permalink raw reply

* Re: [PATCH net-next] xen-netfront: convert to GRO API and advertise this feature
From: Eric Dumazet @ 2013-09-23  5:58 UTC (permalink / raw)
  To: Anirban Chakraborty
  Cc: Wei Liu, <netdev@vger.kernel.org>,
	<xen-devel@lists.xen.org>, Ian Campbell
In-Reply-To: <D310C490-3B7D-4937-95DC-AFC1682B60BE@juniper.net>

On Sun, 2013-09-22 at 23:09 +0000, Anirban Chakraborty wrote:
> On Sep 22, 2013, at 7:55 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Sat, 2013-09-21 at 17:05 +0100, Wei Liu wrote:
> >> Anirban was seeing netfront received MTU size packets, which downgraded
> >> throughput. The following patch makes netfront use GRO API which
> >> improves throughput for that case.
> > 
> >> -	netdev->hw_features	= NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO;
> >> +	netdev->hw_features	= NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO |
> >> +				  NETIF_F_GRO;
> > 
> > 
> > This part is not needed.
> 
> Shouldn't the flag be set? In dev_gro_receive() we do check if this flag is set or not:
> 
>         if (!(skb->dev->features & NETIF_F_GRO) || netpoll_rx_on(skb))
>                goto normal;

Drivers do not set NETIF_F_GRO themselves, they do not need to.

Look at other drivers which are GRO ready: NETIF_F_GRO is enabled by
default by the core networking stack, in register_netdevice():


dev->hw_features |= NETIF_F_SOFT_FEATURES;
dev->features |= NETIF_F_SOFT_FEATURES;
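
The point above can be sketched in userspace C. This is only an illustration: the F_* bit values, struct netdev_sketch and both helper functions are made up for the example and are not the kernel's definitions. The only thing carried over from the mail is that the core ORs the soft features (which include GRO) into both masks at registration time, so a driver that never mentions GRO still passes the features check.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; the kernel's real NETIF_F_* values differ. */
#define F_SG            (1u << 0)
#define F_IP_CSUM       (1u << 1)
#define F_TSO           (1u << 2)
#define F_GSO           (1u << 3)
#define F_GRO           (1u << 4)
#define F_SOFT_FEATURES (F_GSO | F_GRO)

/* Hypothetical, stripped-down stand-in for struct net_device. */
struct netdev_sketch {
	uint32_t hw_features;	/* features the user may toggle */
	uint32_t features;	/* features currently enabled */
};

/* Mirrors the two lines quoted above from register_netdevice(). */
static void register_netdevice_sketch(struct netdev_sketch *dev)
{
	dev->hw_features |= F_SOFT_FEATURES;
	dev->features |= F_SOFT_FEATURES;
}

/* Mirrors the features half of the check quoted from dev_gro_receive(). */
static int gro_allowed(const struct netdev_sketch *dev)
{
	return (dev->features & F_GRO) != 0;
}
```

A driver that sets only F_IP_CSUM | F_SG | F_TSO in hw_features ends up with F_GRO set in both masks after register_netdevice_sketch(), which is why the hunk adding NETIF_F_GRO is redundant.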

^ permalink raw reply

* Re: [PATCH 01/12] ping.h: Remove extern from function prototypes
From: David Miller @ 2013-09-23  5:51 UTC (permalink / raw)
  To: joe; +Cc: netdev, linux-kernel
In-Reply-To: <b692aba58032f629907ea2d462c99b87906645ca.1379870986.git.joe@perches.com>


Series applied, thanks Joe.

^ permalink raw reply

* Re: [Xen-devel] [PATCH net-next] xen-netfront: convert to GRO API and advertise this feature
From: Jason Wang @ 2013-09-23  5:02 UTC (permalink / raw)
  To: Anirban Chakraborty, Wei Liu
  Cc: <netdev@vger.kernel.org>, Ian Campbell,
	<xen-devel@lists.xen.org>
In-Reply-To: <9C83E3AC-719D-4290-8C19-A06356C4BFFA@juniper.net>

On 09/23/2013 07:04 AM, Anirban Chakraborty wrote:
> On Sep 22, 2013, at 5:09 AM, Wei Liu <wei.liu2@citrix.com> wrote:
>
>> On Sun, Sep 22, 2013 at 02:29:15PM +0800, Jason Wang wrote:
>>> On 09/22/2013 12:05 AM, Wei Liu wrote:
>>>> Anirban was seeing netfront received MTU size packets, which downgraded
>>>> throughput. The following patch makes netfront use GRO API which
>>>> improves throughput for that case.
>>>>
>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>> Signed-off-by: Anirban Chakraborty <abchak@juniper.net>
>>>> Cc: Ian Campbell <ian.campbell@citrix.com>
>>> Maybe a dumb question: doesn't Xen depends on the driver of host card to
>>> do GRO and pass it to netfront? What the case that netfront can receive
>> The would be the ideal situation. Netback pushes large packets to
>> netfront and netfront sees large packets.
>>
>>> a MTU size packet, for a card that does not support GRO in host? Doing
>> However Anirban saw the case when backend interface receives large
>> packets but netfront sees MTU size packets, so my thought is there is
>> certain configuration that leads to this issue. As we cannot tell
>> users what to enable and what not to enable so I would like to solve
>> this within our driver.
>>
>>> GRO twice may introduce extra overheads.
>>>
>> AIUI if the packet that frontend sees is large already then the GRO path
>> is quite short which will not introduce heavy penalty, while on the
>> other hand if packet is segmented doing GRO improves throughput.
>>
> Thanks Wei, for explaining and submitting the patch. I would like add following to what you have already mentioned.
> In my configuration, I was seeing netback was pushing large packets to the guest (Centos 6.4) but the netfront was receiving MTU sized packets. With this patch on, I do see large packets received on the guest interface. As a result there was substantial throughput improvement in the guest side (2.8 Gbps to 3.8 Gbps). Also, note that the host NIC driver was enabled for GRO already. 
>
> -Anirban

In this case, even if you still want to do GRO, it's better to find the
root cause of why the GSO packets were segmented (maybe GSO was not
enabled for netback?), since it introduces extra overhead.

^ permalink raw reply

* Re: [PATCH v2] qlge: call ql_core_dump() only if dump memory was allocated.
From: Jitendra Kalsaria @ 2013-09-23  4:41 UTC (permalink / raw)
  To: Malahal Naineni, netdev
In-Reply-To: <1379712077-31750-1-git-send-email-malahal@us.ibm.com>

On 9/20/13 2:21 PM, "Malahal Naineni" <malahal@us.ibm.com> wrote:

>Also changed a log message to indicate that memory was not allocated
>instead of memory not available!
>
>Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
>---
> drivers/net/ethernet/qlogic/qlge/qlge_dbg.c | 4 ++--
> drivers/net/ethernet/qlogic/qlge/qlge_mpi.c | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)

Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

Thanks,
    Jitendra

^ permalink raw reply

* [PATCH] i40e: using for_each_set_bit to simplify the code
From: Wei Yongjun @ 2013-09-23  3:39 UTC (permalink / raw)
  To: jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	peter.p.waskiewicz.jr, alexander.h.duyck, john.ronciak,
	tushar.n.dave, shannon.nelson, mitch.a.williams
  Cc: yongjun_wei, e1000-devel, netdev

From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Using for_each_set_bit() to simplify the code.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 64 ++++------------------
 1 file changed, 12 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 8967e58..84d7675 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -260,23 +260,17 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_idx,
 		goto irq_list_done;
 	}
 	tempmap = vecmap->rxq_map;
-	vsi_queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-	while (vsi_queue_id < I40E_MAX_VSI_QP) {
+	for_each_set_bit(vsi_queue_id, &tempmap, I40E_MAX_VSI_QP) {
 		linklistmap |= (1 <<
 				(I40E_VIRTCHNL_SUPPORTED_QTYPES *
 				 vsi_queue_id));
-		vsi_queue_id =
-		    find_next_bit(&tempmap, I40E_MAX_VSI_QP, vsi_queue_id + 1);
 	}
 
 	tempmap = vecmap->txq_map;
-	vsi_queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-	while (vsi_queue_id < I40E_MAX_VSI_QP) {
+	for_each_set_bit(vsi_queue_id, &tempmap, I40E_MAX_VSI_QP) {
 		linklistmap |= (1 <<
 				(I40E_VIRTCHNL_SUPPORTED_QTYPES * vsi_queue_id
 				 + 1));
-		vsi_queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-					     vsi_queue_id + 1);
 	}
 
 	next_q = find_first_bit(&linklistmap,
@@ -1291,27 +1285,21 @@ static int i40e_vc_config_irq_map_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 
 		/* lookout for the invalid queue index */
 		tempmap = map->rxq_map;
-		vsi_queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-		while (vsi_queue_id < I40E_MAX_VSI_QP) {
+		for_each_set_bit(vsi_queue_id, &tempmap, I40E_MAX_VSI_QP) {
 			if (!i40e_vc_isvalid_queue_id(vf, vsi_id,
 						      vsi_queue_id)) {
 				aq_ret = I40E_ERR_PARAM;
 				goto error_param;
 			}
-			vsi_queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-						     vsi_queue_id + 1);
 		}
 
 		tempmap = map->txq_map;
-		vsi_queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-		while (vsi_queue_id < I40E_MAX_VSI_QP) {
+		for_each_set_bit(vsi_queue_id, &tempmap, I40E_MAX_VSI_QP) {
 			if (!i40e_vc_isvalid_queue_id(vf, vsi_id,
 						      vsi_queue_id)) {
 				aq_ret = I40E_ERR_PARAM;
 				goto error_param;
 			}
-			vsi_queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-						     vsi_queue_id + 1);
 		}
 
 		i40e_config_irq_link_list(vf, vsi_id, map);
@@ -1356,31 +1344,23 @@ static int i40e_vc_enable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 	}
 
 	tempmap = vqs->rx_queues;
-	queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-	while (queue_id < I40E_MAX_VSI_QP) {
+	for_each_set_bit(queue_id, &tempmap, I40E_MAX_VSI_QP) {
 		if (!i40e_vc_isvalid_queue_id(vf, vsi_id, queue_id)) {
 			aq_ret = I40E_ERR_PARAM;
 			goto error_param;
 		}
 		i40e_ctrl_vsi_rx_queue(vf, vsi_id, queue_id,
 				       I40E_QUEUE_CTRL_ENABLE);
-
-		queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-					 queue_id + 1);
 	}
 
 	tempmap = vqs->tx_queues;
-	queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-	while (queue_id < I40E_MAX_VSI_QP) {
+	for_each_set_bit(queue_id, &tempmap, I40E_MAX_VSI_QP) {
 		if (!i40e_vc_isvalid_queue_id(vf, vsi_id, queue_id)) {
 			aq_ret = I40E_ERR_PARAM;
 			goto error_param;
 		}
 		i40e_ctrl_vsi_tx_queue(vf, vsi_id, queue_id,
 				       I40E_QUEUE_CTRL_ENABLE);
-
-		queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-					 queue_id + 1);
 	}
 
 	/* Poll the status register to make sure that the
@@ -1389,29 +1369,23 @@ static int i40e_vc_enable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 	udelay(10);
 
 	tempmap = vqs->rx_queues;
-	queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-	while (queue_id < I40E_MAX_VSI_QP) {
+	for_each_set_bit(queue_id, &tempmap, I40E_MAX_VSI_QP) {
 		if (i40e_ctrl_vsi_rx_queue(vf, vsi_id, queue_id,
 					   I40E_QUEUE_CTRL_ENABLECHECK)) {
 			dev_err(&pf->pdev->dev,
 				"Queue control check failed on RX queue %d of VSI %d VF %d\n",
 				queue_id, vsi_id, vf->vf_id);
 		}
-		queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-					 queue_id + 1);
 	}
 
 	tempmap = vqs->tx_queues;
-	queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-	while (queue_id < I40E_MAX_VSI_QP) {
+	for_each_set_bit(queue_id, &tempmap, I40E_MAX_VSI_QP) {
 		if (i40e_ctrl_vsi_tx_queue(vf, vsi_id, queue_id,
 					   I40E_QUEUE_CTRL_ENABLECHECK)) {
 			dev_err(&pf->pdev->dev,
 				"Queue control check failed on TX queue %d of VSI %d VF %d\n",
 				queue_id, vsi_id, vf->vf_id);
 		}
-		queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-					 queue_id + 1);
 	}
 
 error_param:
@@ -1455,31 +1429,23 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 	}
 
 	tempmap = vqs->rx_queues;
-	queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-	while (queue_id < I40E_MAX_VSI_QP) {
+	for_each_set_bit(queue_id, &tempmap, I40E_MAX_VSI_QP) {
 		if (!i40e_vc_isvalid_queue_id(vf, vsi_id, queue_id)) {
 			aq_ret = I40E_ERR_PARAM;
 			goto error_param;
 		}
 		i40e_ctrl_vsi_rx_queue(vf, vsi_id, queue_id,
 				       I40E_QUEUE_CTRL_DISABLE);
-
-		queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-					 queue_id + 1);
 	}
 
 	tempmap = vqs->tx_queues;
-	queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-	while (queue_id < I40E_MAX_VSI_QP) {
+	for_each_set_bit(queue_id, &tempmap, I40E_MAX_VSI_QP) {
 		if (!i40e_vc_isvalid_queue_id(vf, vsi_id, queue_id)) {
 			aq_ret = I40E_ERR_PARAM;
 			goto error_param;
 		}
 		i40e_ctrl_vsi_tx_queue(vf, vsi_id, queue_id,
 				       I40E_QUEUE_CTRL_DISABLE);
-
-		queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-					 queue_id + 1);
 	}
 
 	/* Poll the status register to make sure that the
@@ -1488,29 +1454,23 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
 	udelay(10);
 
 	tempmap = vqs->rx_queues;
-	queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-	while (queue_id < I40E_MAX_VSI_QP) {
+	for_each_set_bit(queue_id, &tempmap, I40E_MAX_VSI_QP) {
 		if (i40e_ctrl_vsi_rx_queue(vf, vsi_id, queue_id,
 					   I40E_QUEUE_CTRL_DISABLECHECK)) {
 			dev_err(&pf->pdev->dev,
 				"Queue control check failed on RX queue %d of VSI %d VF %d\n",
 				queue_id, vsi_id, vf->vf_id);
 		}
-		queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-					 queue_id + 1);
 	}
 
 	tempmap = vqs->tx_queues;
-	queue_id = find_first_bit(&tempmap, I40E_MAX_VSI_QP);
-	while (queue_id < I40E_MAX_VSI_QP) {
+	for_each_set_bit(queue_id, &tempmap, I40E_MAX_VSI_QP) {
 		if (i40e_ctrl_vsi_tx_queue(vf, vsi_id, queue_id,
 					   I40E_QUEUE_CTRL_DISABLECHECK)) {
 			dev_err(&pf->pdev->dev,
 				"Queue control check failed on TX queue %d of VSI %d VF %d\n",
 				queue_id, vsi_id, vf->vf_id);
 		}
-		queue_id = find_next_bit(&tempmap, I40E_MAX_VSI_QP,
-					 queue_id + 1);
 	}
 
 error_param:
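
For readers who want to convince themselves that the conversion above is behavior-preserving, here is a minimal userspace sketch. The find_first_bit()/find_next_bit() stand-ins below are simplified to a single unsigned long and are assumptions made for the example, not the kernel implementations; the macro is written the way the kernel's for_each_set_bit() is usually described (first bit, bound check, next bit from bit + 1). The sum_* helpers are hypothetical and exist only to compare the two loop shapes.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Simplified single-word stand-in; returns size when no further bit is set. */
static unsigned long find_next_bit(const unsigned long *addr,
				   unsigned long size, unsigned long offset)
{
	unsigned long bit;

	for (bit = offset; bit < size; bit++)
		if (*addr & (1UL << bit))
			return bit;
	return size;
}

static unsigned long find_first_bit(const unsigned long *addr,
				    unsigned long size)
{
	return find_next_bit(addr, size, 0);
}

#define for_each_set_bit(bit, addr, size)			\
	for ((bit) = find_first_bit((addr), (size));		\
	     (bit) < (size);					\
	     (bit) = find_next_bit((addr), (size), (bit) + 1))

/* Open-coded walk, shaped like the lines the patch removes. */
static unsigned long sum_open_coded(unsigned long map, unsigned long size)
{
	unsigned long id, sum = 0;

	assert(size <= BITS_PER_LONG);
	id = find_first_bit(&map, size);
	while (id < size) {
		sum += id;
		id = find_next_bit(&map, size, id + 1);
	}
	return sum;
}

/* Same walk using the macro, shaped like the lines the patch adds. */
static unsigned long sum_for_each(unsigned long map, unsigned long size)
{
	unsigned long id, sum = 0;

	assert(size <= BITS_PER_LONG);
	for_each_set_bit(id, &map, size)
		sum += id;
	return sum;
}
```

Both helpers visit exactly the set bits below size in ascending order, so an early exit from the loop body (as with the goto error_param in the patch) behaves the same in either form.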

^ permalink raw reply related

* [PATCH] i40e: remove unused including <linux/version.h>
From: Wei Yongjun @ 2013-09-23  3:39 UTC (permalink / raw)
  To: jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	peter.p.waskiewicz.jr, alexander.h.duyck, john.ronciak,
	tushar.n.dave, shannon.nelson
  Cc: yongjun_wei, e1000-devel, netdev

From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Remove the include of <linux/version.h>, which is not needed.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
---
 drivers/net/ethernet/intel/i40e/i40e.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index b5252eb..3f232ab 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -46,7 +46,6 @@
 #include <linux/sctp.h>
 #include <linux/pkt_sched.h>
 #include <linux/ipv6.h>
-#include <linux/version.h>
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
 #include <linux/ethtool.h>

^ permalink raw reply related

* [PATCH] i40e: fix error return code in i40e_probe()
From: Wei Yongjun @ 2013-09-23  2:47 UTC (permalink / raw)
  To: jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	peter.p.waskiewicz.jr, alexander.h.duyck, john.ronciak,
	tushar.n.dave, shannon.nelson
  Cc: yongjun_wei, e1000-devel, netdev

From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

Fix to return -ENOMEM in the memory alloc error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 601d482..117e014 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7060,8 +7060,10 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 */
 	len = sizeof(struct i40e_vsi *) * pf->hw.func_caps.num_vsis;
 	pf->vsi = kzalloc(len, GFP_KERNEL);
-	if (!pf->vsi)
+	if (!pf->vsi) {
+		err = -ENOMEM;
 		goto err_switch_setup;
+	}
 
 	err = i40e_setup_pf_switch(pf);
 	if (err) {
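
The class of bug fixed here can be reproduced in a small userspace sketch. Everything below is hypothetical (probe_sketch(), its parameters and the err_unwind label are invented for illustration); it only assumes what the commit message states, namely that err still holds 0 from earlier successful steps when the allocation fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a probe routine with goto-based unwinding.
 * fail_alloc simulates an allocation failure; fixed selects whether the
 * assignment added by the patch is present.
 */
static int probe_sketch(size_t nelems, int fail_alloc, int fixed)
{
	int err = 0;	/* earlier steps succeeded, so err is 0 here */
	void **table;

	table = fail_alloc ? NULL : calloc(nelems, sizeof(*table));
	if (!table) {
		if (fixed)
			err = -ENOMEM;	/* the assignment the patch adds */
		goto err_unwind;
	}

	free(table);	/* simplification: a real probe would keep it */
	return 0;

err_unwind:
	/* Without the assignment, a failed probe reports success. */
	return err;
}
```

With fixed == 0 a simulated allocation failure returns 0, which a caller would read as a successful probe; with fixed == 1 it returns -ENOMEM.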

^ permalink raw reply related

* Re: [PATCH 01/12] ping.h: Remove extern from function prototypes
From: Joe Perches @ 2013-09-23  2:43 UTC (permalink / raw)
  To: Ryan Mallon; +Cc: netdev, David S. Miller, linux-kernel
In-Reply-To: <523FA98E.4080908@gmail.com>

On Mon, 2013-09-23 at 12:38 +1000, Ryan Mallon wrote:
> A checkpatch rule might help,

Extant.

^ permalink raw reply

* Re: [PATCH 01/12] ping.h: Remove extern from function prototypes
From: Ryan Mallon @ 2013-09-23  2:38 UTC (permalink / raw)
  To: Joe Perches; +Cc: netdev, David S. Miller, linux-kernel
In-Reply-To: <1379902599.3575.9.camel@joe-AO722>

On 23/09/13 12:16, Joe Perches wrote:
> On Mon, 2013-09-23 at 11:59 +1000, Ryan Mallon wrote:
>> This seems like a lot of code churn for very little benefit. At a quick
>> glance:
>>
>>   git grep extern include/ | wc -l
>>   11427
>>
>> Not all of those will need to be removed, but that is still a huge
>> number to change, and doesn't include extern usage in C files or local
>> headers. You are probably never going to remove all the instances, so
>> what is the point of just randomly doing a handful?
> 
> Rather more than a handful.
> 
> The ratio of function prototypes without extern to
> function prototypes with extern is currently ~2.5:1
> 
> So:
> 
> Standardization without extern
> Line count reduction (~10%)
> Miscellaneous neatening at the same time
> Removal of all unnecessary externs from include/net
> 
> There are ~8500 instances in include/
> There are ~1500 instances in include/net/
> 
> After this series, 0 in include/net/
> 
> Start somewhere, go from there...
> 
> $ git grep -E "^\s*\bextern(\s+\w+){1,4}\s*\(\s*[^\*]" include/ | wc -l
> 8395
> $ git grep -E "^\s*\bextern(\s+\w+){1,4}\s*\(\s*[^\*]" include/net/ | wc -l
> 1471

Right, and:

  $ git grep -E "^\s*\bextern(\s+\w+){1,4}\s*\(\s*[^\*]" | wc -l
  29104

Since there are lots of local/arch headers, and there are uses of extern
function prototypes in C files.

I don't see the real benefit though. It's like trying to "clean-up" the
difference between "unsigned x" and "unsigned int x", or any number of
other minor style differences. Either version, with or without the
extern, is correct, valid C code. Plus you will get people adding new
instances of extern because they don't know any better. A checkpatch
rule might help, but we all know how often people run that...

~Ryan
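
As a small illustration of the point that both spellings are valid and equivalent C: function declarations at file scope have external linkage by default, so the two forms can even be mixed for the same function. The function names below are made up for the example.

```c
#include <assert.h>

/* Prototype written with extern, the style the series removes. */
extern int add_with_extern(int a, int b);

/* Prototype written without extern, the style the series converges on. */
int add_without_extern(int a, int b);

/* Redeclaring each function in the other style is accepted as compatible. */
int add_with_extern(int a, int b);
extern int add_without_extern(int a, int b);

int add_with_extern(int a, int b)
{
	return a + b;
}

int add_without_extern(int a, int b)
{
	return a + b;
}
```

A compiler may warn about the redundant redeclarations if asked to, but the program is well-formed, which is why the change is purely stylistic.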

^ permalink raw reply

* Re: [net-next] hp100: replace hardcoded name in /proc/interrupts with interface name
From: Matthew Whitehead @ 2013-09-23  2:26 UTC (permalink / raw)
  To: Mihir Singh; +Cc: netdev
In-Reply-To: <1379789289-10961-1-git-send-email-me@mihirsingh.com>

On Sat, Sep 21, 2013 at 06:48:09PM +0000, Mihir Singh wrote:
> The /proc/interrupts file displays hp100, which is not the accepted style. Printing eth%d is more helpful.
> 
> Signed-off-by: Mihir Singh <me@mihirsingh.com>
> ---
>  drivers/net/ethernet/hp/hp100.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/hp/hp100.c b/drivers/net/ethernet/hp/hp100.c
> index 91227d0..3786009 100644
> --- a/drivers/net/ethernet/hp/hp100.c
> +++ b/drivers/net/ethernet/hp/hp100.c
> @@ -1098,7 +1098,7 @@ static int hp100_open(struct net_device *dev)
>  	if (request_irq(dev->irq, hp100_interrupt,
>  			lp->bus == HP100_BUS_PCI || lp->bus ==
>  			HP100_BUS_EISA ? IRQF_SHARED : 0,
> -			"hp100", dev)) {
> +			dev->name, dev)) {
>  		printk("hp100: %s: unable to get IRQ %d\n", dev->name, dev->irq);
>  		return -EAGAIN;
>  	}
> -- 
> 1.8.1.2
> 

Reviewed-By: Matthew Whitehead <tedheadster@gmail.com>

^ permalink raw reply
