Netdev List


* Re: [PATCH 4/4] Phonet: restore flow control credits when sending fails
From: Kumar SANGHVI @ 2010-09-30  8:31 UTC (permalink / raw)
  To: netdev@vger.kernel.org, davem@davemloft.net,
	remi.denis-courmont@nokia.com, "eric.dumazet@gmail.co
  Cc: Gulshan KARMANI, Linus WALLEIJ
In-Reply-To: <1285835105-20293-1-git-send-email-kumar.sanghvi@stericsson.com>

Hi All,

On Thu, Sep 30, 2010 at 10:25:05 +0200, Kumar A SANGHVI wrote:
> From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
> 
> This patch restores the flow control patch below, submitted by Rémi
> Denis-Courmont, which was accidentally lost due to the Pipe controller
> patch on Phonet.
> 
> 	commit 1a98214feef2221cd7c24b17cd688a5a9d85b2ea
> 	Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
> 	Date:   Mon Aug 30 12:57:03 2010 +0000
> 
> 	Phonet: restore flow control credits when sending fails
> 
> 	Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
> 	Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
> Acked-by: Linus Walleij <linus.walleij@stericsson.com>

Please discard this.
I will send a new patch.

Thanks,
Kumar. 

^ permalink raw reply

* [PATCH 4/4] Phonet: restore flow control credits when sending fails
From: Kumar A Sanghvi @ 2010-09-30  8:25 UTC (permalink / raw)
  To: netdev, davem, remi.denis-courmont, eric.dumazet
  Cc: gulshan.karmani, Kumar Sanghvi, Linus Walleij

From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>

This patch restores the flow control patch below, submitted by Rémi
Denis-Courmont, which was accidentally lost due to the Pipe controller
patch on Phonet.

	commit 1a98214feef2221cd7c24b17cd688a5a9d85b2ea
	Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
	Date:   Mon Aug 30 12:57:03 2010 +0000

	Phonet: restore flow control credits when sending fails

	Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
---
 net/phonet/pep.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 9746c6d..aa3d870 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -1289,6 +1289,7 @@ static int pipe_skb_send(struct sock *sk, struct sk_buff *skb)
 {
 	struct pep_sock *pn = pep_sk(sk);
 	struct pnpipehdr *ph;
+	int err;
 #ifdef CONFIG_PHONET_PIPECTRLR
 	struct sockaddr_pn spn = {
 		.spn_family = AF_PHONET,
@@ -1315,10 +1316,15 @@ static int pipe_skb_send(struct sock *sk, struct sk_buff *skb)
 		ph->message_id = PNS_PIPE_DATA;
 	ph->pipe_handle = pn->pipe_handle;
 #ifdef CONFIG_PHONET_PIPECTRLR
-	return pn_skb_send(sk, skb, &spn);
+	err = pn_skb_send(sk, skb, &spn);
 #else
-	return pn_skb_send(sk, skb, &pipe_srv);
+	err = pn_skb_send(sk, skb, &pipe_srv);
 #endif
+
+	if (err && pn_flow_safe(pn->tx_fc))
+		atomic_inc(&pn->tx_credits);
+	return err;
+
 }
 
 static int pep_sendmsg(struct kiocb *iocb, struct sock *sk,
-- 
1.7.2.dirty


^ permalink raw reply related
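
For readers following the credit logic in the patch above, here is a
minimal user-space sketch of the same idea: a sender consumes one credit
before transmitting and gives it back if the send fails. It is not the
kernel code. The names struct pipe_state, take_credit(), pipe_send(),
send_ok() and send_fail() are hypothetical, C11 atomics stand in for the
kernel's atomic_t, and the credit_based flag stands in for
pn_flow_safe(pn->tx_fc). It also assumes the caller has already taken a
credit before calling pipe_send().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pipe socket state; only credits are modelled. */
struct pipe_state {
	atomic_int tx_credits;
	bool credit_based;	/* assumption: plays the role of pn_flow_safe() */
};

/* Hypothetical transport hook: 0 on success, non-zero on failure. */
typedef int (*send_fn)(const void *buf, unsigned len);

/* Example transports used only to exercise the sketch. */
static int send_ok(const void *buf, unsigned len)
{
	(void)buf; (void)len;
	return 0;
}

static int send_fail(const void *buf, unsigned len)
{
	(void)buf; (void)len;
	return -1;
}

/* Consume one credit; returns false when no credit is available. */
static bool take_credit(struct pipe_state *ps)
{
	int old = atomic_load(&ps->tx_credits);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ps->tx_credits, &old, old - 1))
			return true;
	}
	return false;
}

/* Send, and hand the credit back if the transport reports an error. */
static int pipe_send(struct pipe_state *ps, send_fn xmit,
		     const void *buf, unsigned len)
{
	int err = xmit(buf, len);

	if (err && ps->credit_based)
		atomic_fetch_add(&ps->tx_credits, 1);
	return err;
}
```

Without the restore step, every failed send would permanently shrink the
sender's window, which is the leak the patch is meant to close.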

* Re: VLAN packets silently dropped in promiscuous mode
From: Roger Luethi @ 2010-09-30  8:07 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev, Patrick McHardy
In-Reply-To: <AANLkTi=QAoVH9KU8FAH6iscWt_mKnZpU3P9QqgFRLqXM@mail.gmail.com>

On Wed, 29 Sep 2010 10:44:26 -0700, Jesse Gross wrote:
> On Wed, Sep 29, 2010 at 4:37 AM, Roger Luethi <rl@hellgate.ch> wrote:
> > I noticed packets for unknown VLANs getting silently dropped even in
> > promiscuous mode (this is true only for the hardware accelerated path).
> > netif_nit_deliver was introduced specifically to prevent that, but the
> > function gets called only _after_ packets from unknown VLANs have been
> > dropped.
> 
> Some drivers are fixing this on a case by case basis by disabling
> hardware accelerated VLAN stripping when in promiscuous mode, i.e.:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5f6c01819979afbfec7e0b15fe52371b8eed87e8
> 
> However, at this point it is more or less random which drivers do
> this.  It would obviously be much better if it were consistent.

My understanding is this. Hardware VLAN tagging and stripping can always be
enabled. The kernel passes 802.1Q information along with the stripped
header to libpcap which reassembles the original header where necessary.
Works for me.

Hardware VLAN filtering, on the other hand, must be disabled in promiscuous
mode. But doing that in the driver makes no difference now as the current
VLAN code drops the packets so preserved before they are passed to the pcap
interface. That appears to be a bug.

Roger

^ permalink raw reply

* [net-next-2.6 PATCH 2/3] e1000e: reset PHY after errors detected on 82574 Si
From: Jeff Kirsher @ 2010-09-30  7:39 UTC (permalink / raw)
  To: davem
  Cc: netdev, gospo, bphilips, Jesse Brandeburg, Carolyn Wyborny,
	Jeff Kirsher
In-Reply-To: <20100930073814.13378.4212.stgit@localhost.localdomain>

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

Some errors can be induced in the PHY via environmental testing
(specifically extreme temperature changes and electrostatic
discharge testing). If the PHY hangs as a result, this patch
detects the problem and resets the PHY so that operation can continue.
This issue only applies to 82574 silicon.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/82571.c  |   40 +++++++++++++++++++++++++++++++++++++++-
 drivers/net/e1000e/e1000.h  |    3 +++
 drivers/net/e1000e/netdev.c |   25 +++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 1 deletions(-)

diff --git a/drivers/net/e1000e/82571.c b/drivers/net/e1000e/82571.c
index ca663f1..db5ef6e 100644
--- a/drivers/net/e1000e/82571.c
+++ b/drivers/net/e1000e/82571.c
@@ -52,6 +52,10 @@
 			      (ID_LED_DEF1_DEF2))
 
 #define E1000_GCR_L1_ACT_WITHOUT_L0S_RX 0x08000000
+#define E1000_BASE1000T_STATUS          10
+#define E1000_IDLE_ERROR_COUNT_MASK     0xFF
+#define E1000_RECEIVE_ERROR_COUNTER     21
+#define E1000_RECEIVE_ERROR_MAX         0xFFFF
 
 #define E1000_NVM_INIT_CTRL2_MNGM 0x6000 /* Manageability Operation Mode mask */
 
@@ -1243,6 +1247,39 @@ static s32 e1000_led_on_82574(struct e1000_hw *hw)
 }
 
 /**
+ *  e1000_check_phy_82574 - check 82574 phy hung state
+ *  @hw: pointer to the HW structure
+ *
+ *  Returns whether phy is hung or not
+ **/
+bool e1000_check_phy_82574(struct e1000_hw *hw)
+{
+	u16 status_1kbt = 0;
+	u16 receive_errors = 0;
+	bool phy_hung = false;
+	s32 ret_val = 0;
+
+	/*
+	 * Read PHY Receive Error counter first, if it is max - all F's then
+	 * read the Base1000T status register. If both are max then PHY is hung.
+	 */
+	ret_val = e1e_rphy(hw, E1000_RECEIVE_ERROR_COUNTER, &receive_errors);
+
+	if (ret_val)
+		goto out;
+	if (receive_errors == E1000_RECEIVE_ERROR_MAX)  {
+		ret_val = e1e_rphy(hw, E1000_BASE1000T_STATUS, &status_1kbt);
+		if (ret_val)
+			goto out;
+		if ((status_1kbt & E1000_IDLE_ERROR_COUNT_MASK) ==
+		    E1000_IDLE_ERROR_COUNT_MASK)
+			phy_hung = true;
+	}
+out:
+	return phy_hung;
+}
+
+/**
  *  e1000_setup_link_82571 - Setup flow control and link settings
  *  @hw: pointer to the HW structure
  *
@@ -1858,7 +1895,8 @@ struct e1000_info e1000_82574_info = {
 				  | FLAG_RX_CSUM_ENABLED
 				  | FLAG_HAS_SMART_POWER_DOWN
 				  | FLAG_HAS_AMT
-				  | FLAG_HAS_CTRLEXT_ON_LOAD,
+				  | FLAG_HAS_CTRLEXT_ON_LOAD
+				  | FLAG2_CHECK_PHY_HANG,
 	.pba			= 36,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,
diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index 5ec0af5..da3b82f 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -397,6 +397,7 @@ struct e1000_adapter {
 	struct work_struct print_hang_task;
 
 	bool idle_check;
+	int phy_hang_count;
 };
 
 struct e1000_info {
@@ -453,6 +454,7 @@ struct e1000_info {
 #define FLAG2_HAS_PHY_STATS               (1 << 4)
 #define FLAG2_HAS_EEE                     (1 << 5)
 #define FLAG2_DMA_BURST                   (1 << 6)
+#define FLAG2_CHECK_PHY_HANG              (1 << 7)
 
 #define E1000_RX_DESC_PS(R, i)	    \
 	(&(((union e1000_rx_desc_packet_split *)((R).desc))[i]))
@@ -630,6 +632,7 @@ extern s32 e1000_get_phy_info_ife(struct e1000_hw *hw);
 extern s32 e1000_check_polarity_ife(struct e1000_hw *hw);
 extern s32 e1000_phy_force_speed_duplex_ife(struct e1000_hw *hw);
 extern s32 e1000_check_polarity_igp(struct e1000_hw *hw);
+extern bool e1000_check_phy_82574(struct e1000_hw *hw);
 
 static inline s32 e1000_phy_hw_reset(struct e1000_hw *hw)
 {
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 1aa4228..a8b55ab 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4060,6 +4060,28 @@ static void e1000e_enable_receives(struct e1000_adapter *adapter)
 	}
 }
 
+static void e1000e_check_82574_phy_workaround(struct e1000_adapter *adapter)
+{
+	struct e1000_hw *hw = &adapter->hw;
+
+	if (!(adapter->flags2 & FLAG2_CHECK_PHY_HANG))
+		return;
+
+	/*
+	 * With 82574 controllers, PHY needs to be checked periodically
+	 * for hung state and reset, if two calls return true
+	 */
+	if (e1000_check_phy_82574(hw))
+		adapter->phy_hang_count++;
+	else
+		adapter->phy_hang_count = 0;
+
+	if (adapter->phy_hang_count > 1) {
+		adapter->phy_hang_count = 0;
+		schedule_work(&adapter->reset_task);
+	}
+}
+
 /**
  * e1000_watchdog - Timer Call-back
  * @data: pointer to adapter cast into an unsigned long
@@ -4295,6 +4317,9 @@ link_up:
 	if (e1000e_get_laa_state_82571(hw))
 		e1000e_rar_set(hw, adapter->hw.mac.addr, 0);
 
+	if (adapter->flags2 & FLAG2_CHECK_PHY_HANG)
+		e1000e_check_82574_phy_workaround(adapter);
+
 	/* Reset the timer */
 	if (!test_bit(__E1000_DOWN, &adapter->state))
 		mod_timer(&adapter->watchdog_timer,


^ permalink raw reply related
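
The hang detection in the patch above has two parts: a register test and
a "two consecutive hits" debounce in the watchdog. The following is a
rough stand-alone sketch of both, with the PHY register reads replaced by
plain arguments. The names phy_looks_hung(), struct hang_tracker and
hang_tracker_update() are hypothetical; the constants are copied from the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define RECEIVE_ERROR_MAX	0xFFFF	/* E1000_RECEIVE_ERROR_MAX in the patch */
#define IDLE_ERROR_COUNT_MASK	0xFF	/* E1000_IDLE_ERROR_COUNT_MASK in the patch */

/*
 * Same test as e1000_check_phy_82574(), but taking the two register
 * values as arguments instead of reading them from the PHY.
 */
static bool phy_looks_hung(uint16_t receive_errors, uint16_t status_1kbt)
{
	return receive_errors == RECEIVE_ERROR_MAX &&
	       (status_1kbt & IDLE_ERROR_COUNT_MASK) == IDLE_ERROR_COUNT_MASK;
}

/* Hypothetical holder for what the patch keeps in adapter->phy_hang_count. */
struct hang_tracker {
	int count;
};

/*
 * Feed one watchdog observation. Returns true when a reset should be
 * scheduled, i.e. after two consecutive "hung" observations.
 */
static bool hang_tracker_update(struct hang_tracker *t, bool hung)
{
	if (hung)
		t->count++;
	else
		t->count = 0;

	if (t->count > 1) {
		t->count = 0;
		return true;
	}
	return false;
}
```

Requiring two consecutive hits means a single saturated counter reading
does not trigger a reset on its own.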

* Re: [PATCH] net: Fix IPv6 PMTU disc. w/ asymmetric routes
From: David Miller @ 2010-09-30  7:41 UTC (permalink / raw)
  To: zenczykowski; +Cc: netdev, yoshfuji
In-Reply-To: <AANLkTi=9THOcD9FieK3uy635C1kNDc=uEdtDe4qm1WU6@mail.gmail.com>

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Tue, 28 Sep 2010 15:37:26 -0700

> * I still think that handling the saddr == NULL ie. INADDR_ANY case is
> entirely superfluous, since it doesn't actually iterate through all
> possible source addresses.  With IPv6 there can be many, many possible
> source addresses (just think of link local vs global public vs privacy
> addresses and then tack on 6to4 and mobility, etc... for example I see
> 13 ipv6 addresses on eth0 on my desktop at home, 12 of them globally
> reachable).

I only have 100% confidence in the reasoning behind why ipv4 handles
things this way, so I'll discuss this in those terms and then try
to tie it into the ipv6 side.

When we are looking up an ipv4 output route, there are 2 "source
address" objects.

1) The one specified in the "struct flowi" for the lookup
   (the flp->fl4_src passed into ip_route_output_flow) which
   is also the one that ends up in the routing cache entry's
   ->fl.fl4_src member.

2) The one contained in the routing cache entry's specification.
   Ie. rth->rt_src

These are distinct.  #1 is what is used to hash and find a matching
routing cache entry.

Since a source address of INADDR_ANY is allowed for routing lookups,
routing cache entries for the same daddr/saddr pair can exist in more
than one hash chain.

Therefore, if we didn't iterate over INADDR_ANY and the specific
address in the icmp PMTU message, we'd miss some routing cache
entries.

Look at the PMTU loops in ipv4 ip_rt_frag_needed():

	for (k = 0; k < 2; k++) {
		for (i = 0; i < 2; i++) {
			unsigned hash = rt_hash(daddr, skeys[i], ikeys[k],
						rt_genid(net));

("ANY vs. specific" ifindex and saddr are used for the hash
 computation)

				if (rth->fl.fl4_dst != daddr ||
				    rth->fl.fl4_src != skeys[i] ||
				    rth->rt_dst != daddr ||
				    rth->rt_src != iph->saddr ||
				    rth->fl.oif != ikeys[k] ||
				    rth->fl.iif != 0 ||

(and for the routing cache entry flow member comparisons)

But the routing cache entry "rt_src" member is always compared to
"iph->saddr", it doesn't use the "ANY vs. specific" skeys[] value.

Unless ipv6 does not allow INADDR_ANY source address specifications
during route lookups, it ought to have the same issue too.

My understanding is that ipv6 uses a two-layered tree based scheme,
one layer to key off of the source address and one layer to key off
of the destination address.

So it seems to me that the lookups would have the same aliasing issue
that ipv4 does, and thus require checking both the specific saddr
and also the saddr INADDR_ANY.

Maybe the problem is that the ipv6 side uses the same saddr for both
the lookup and the entry comparison in these PMTU code paths?  Does it
not allow specifying them separately as the ipv4 PMTU (and incidentally
the RT redirect) code paths do?

Or is this not an issue on the ipv6 side for some reason?

^ permalink raw reply
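
The "ANY vs. specific" iteration described in the message above can be
illustrated with a toy model. This is only a sketch under stated
assumptions: struct cache_entry and update_pmtu() are hypothetical, a
linear scan over a small array stands in for walking the hash chain
selected by rt_hash(), and 0 is used as the ANY value for both the source
address and the interface index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified routing cache entry. */
struct cache_entry {
	uint32_t key_dst;	/* lookup key, like rth->fl.fl4_dst */
	uint32_t key_src;	/* lookup key, like rth->fl.fl4_src; 0 = ANY */
	int	 key_oif;	/* lookup key, like rth->fl.oif; 0 = ANY */
	uint32_t rt_src;	/* source actually selected, like rth->rt_src */
	unsigned mtu;
};

/*
 * Lower the MTU on every entry that could carry the flow named in the
 * ICMP message. The lookup keys are tried as both "specific" and "ANY",
 * but rt_src is always compared against the specific source address.
 * Returns the number of entries whose MTU was lowered.
 */
static size_t update_pmtu(struct cache_entry *tbl, size_t n,
			  uint32_t daddr, uint32_t saddr, int ifindex,
			  unsigned new_mtu)
{
	const uint32_t skeys[2] = { saddr, 0 };
	const int ikeys[2] = { ifindex, 0 };
	size_t updated = 0;

	for (int k = 0; k < 2; k++) {
		for (int i = 0; i < 2; i++) {
			for (size_t e = 0; e < n; e++) {
				struct cache_entry *c = &tbl[e];

				if (c->key_dst != daddr ||
				    c->key_src != skeys[i] ||
				    c->key_oif != ikeys[k] ||
				    c->rt_src != saddr)
					continue;
				if (new_mtu < c->mtu) {
					c->mtu = new_mtu;
					updated++;
				}
			}
		}
	}
	return updated;
}
```

If only the specific keys were tried, an entry created by a lookup with
an ANY source address would keep its stale MTU even though it resolves to
the same source.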

* [net-next-2.6 PATCH 3/3] e1000e: 82579 performance improvements
From: Jeff Kirsher @ 2010-09-30  7:39 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Bruce Allan, Jeff Kirsher
In-Reply-To: <20100930073814.13378.4212.stgit@localhost.localdomain>

From: Bruce Allan <bruce.w.allan@intel.com>

The initial support for 82579 was tuned poorly for performance.  Adjust the
packet buffer allocation appropriately for both standard and jumbo frames;
and for jumbo frames increase the receive descriptor pre-fetch, disable
adaptive interrupt moderation and set the DMA latency tolerance.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/e1000.h   |    1 +
 drivers/net/e1000e/ich8lan.c |    2 +-
 drivers/net/e1000e/netdev.c  |   50 +++++++++++++++++++++++++++++++++++++-----
 3 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index da3b82f..8a79bbd 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -455,6 +455,7 @@ struct e1000_info {
 #define FLAG2_HAS_EEE                     (1 << 5)
 #define FLAG2_DMA_BURST                   (1 << 6)
 #define FLAG2_CHECK_PHY_HANG              (1 << 7)
+#define FLAG2_DISABLE_AIM                 (1 << 8)
 
 #define E1000_RX_DESC_PS(R, i)	    \
 	(&(((union e1000_rx_desc_packet_split *)((R).desc))[i]))
diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 57b5435..e3374d9 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -3986,7 +3986,7 @@ struct e1000_info e1000_pch2_info = {
 				  | FLAG_APME_IN_WUC,
 	.flags2			= FLAG2_HAS_PHY_STATS
 				  | FLAG2_HAS_EEE,
-	.pba			= 18,
+	.pba			= 26,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_ich8lan,
 	.mac_ops		= &ich8_mac_ops,
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index a8b55ab..c194804 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2290,6 +2290,11 @@ static void e1000_set_itr(struct e1000_adapter *adapter)
 		goto set_itr_now;
 	}
 
+	if (adapter->flags2 & FLAG2_DISABLE_AIM) {
+		new_itr = 0;
+		goto set_itr_now;
+	}
+
 	adapter->tx_itr = e1000_update_itr(adapter,
 				    adapter->tx_itr,
 				    adapter->total_tx_packets,
@@ -2338,7 +2343,10 @@ set_itr_now:
 		if (adapter->msix_entries)
 			adapter->rx_ring->set_itr = 1;
 		else
-			ew32(ITR, 1000000000 / (new_itr * 256));
+			if (new_itr)
+				ew32(ITR, 1000000000 / (new_itr * 256));
+			else
+				ew32(ITR, 0);
 	}
 }
 
@@ -2920,7 +2928,7 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
 
 	/* irq moderation */
 	ew32(RADV, adapter->rx_abs_int_delay);
-	if (adapter->itr_setting != 0)
+	if ((adapter->itr_setting != 0) && (adapter->itr != 0))
 		ew32(ITR, 1000000000 / (adapter->itr * 256));
 
 	ctrl_ext = er32(CTRL_EXT);
@@ -2965,11 +2973,13 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
 	 * packet size is equal or larger than the specified value (in 8 byte
 	 * units), e.g. using jumbo frames when setting to E1000_ERT_2048
 	 */
-	if (adapter->flags & FLAG_HAS_ERT) {
+	if ((adapter->flags & FLAG_HAS_ERT) ||
+	    (adapter->hw.mac.type == e1000_pch2lan)) {
 		if (adapter->netdev->mtu > ETH_DATA_LEN) {
 			u32 rxdctl = er32(RXDCTL(0));
 			ew32(RXDCTL(0), rxdctl | 0x3);
-			ew32(ERT, E1000_ERT_2048 | (1 << 13));
+			if (adapter->flags & FLAG_HAS_ERT)
+				ew32(ERT, E1000_ERT_2048 | (1 << 13));
 			/*
 			 * With jumbo frames and early-receive enabled,
 			 * excessive C-state transition latencies result in
@@ -3232,9 +3242,35 @@ void e1000e_reset(struct e1000_adapter *adapter)
 		fc->low_water = 0x05048;
 		fc->pause_time = 0x0650;
 		fc->refresh_time = 0x0400;
+		if (adapter->netdev->mtu > ETH_DATA_LEN) {
+			pba = 14;
+			ew32(PBA, pba);
+		}
 		break;
 	}
 
+	/*
+	 * Disable Adaptive Interrupt Moderation if 2 full packets cannot
+	 * fit in receive buffer and early-receive not supported.
+	 */
+	if (adapter->itr_setting & 0x3) {
+		if (((adapter->max_frame_size * 2) > (pba << 10)) &&
+		    !(adapter->flags & FLAG_HAS_ERT)) {
+			if (!(adapter->flags2 & FLAG2_DISABLE_AIM)) {
+				dev_info(&adapter->pdev->dev,
+					"Interrupt Throttle Rate turned off\n");
+				adapter->flags2 |= FLAG2_DISABLE_AIM;
+				ew32(ITR, 0);
+			}
+		} else if (adapter->flags2 & FLAG2_DISABLE_AIM) {
+			dev_info(&adapter->pdev->dev,
+				 "Interrupt Throttle Rate turned on\n");
+			adapter->flags2 &= ~FLAG2_DISABLE_AIM;
+			adapter->itr = 20000;
+			ew32(ITR, 1000000000 / (adapter->itr * 256));
+		}
+	}
+
 	/* Allow time for pending master requests to run */
 	mac->ops.reset_hw(hw);
 
@@ -3553,7 +3589,8 @@ static int e1000_open(struct net_device *netdev)
 		e1000_update_mng_vlan(adapter);
 
 	/* DMA latency requirement to workaround early-receive/jumbo issue */
-	if (adapter->flags & FLAG_HAS_ERT)
+	if ((adapter->flags & FLAG_HAS_ERT) ||
+	    (adapter->hw.mac.type == e1000_pch2lan))
 		pm_qos_add_request(&adapter->netdev->pm_qos_req,
 				   PM_QOS_CPU_DMA_LATENCY,
 				   PM_QOS_DEFAULT_VALUE);
@@ -3662,7 +3699,8 @@ static int e1000_close(struct net_device *netdev)
 	if (adapter->flags & FLAG_HAS_AMT)
 		e1000_release_hw_control(adapter);
 
-	if (adapter->flags & FLAG_HAS_ERT)
+	if ((adapter->flags & FLAG_HAS_ERT) ||
+	    (adapter->hw.mac.type == e1000_pch2lan))
 		pm_qos_remove_request(&adapter->netdev->pm_qos_req);
 
 	pm_runtime_put_sync(&pdev->dev);


^ permalink raw reply related
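
Two pieces of arithmetic in the patch above are easy to check in
isolation: the conversion from an interrupt rate to an ITR register value
(now guarded against a zero rate) and the test that decides whether
adaptive interrupt moderation is disabled. The sketch below restates
both. The function names itr_to_reg() and should_disable_aim() are
hypothetical, the check on adapter->itr_setting is left out, and the ITR
register is assumed to count in 256 ns units, which is what the
1000000000 / (itr * 256) expression in the driver implies.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Convert an interrupt rate (interrupts per second) into an ITR register
 * value. A rate of 0 means "no throttling" and maps to 0, which avoids
 * the division by zero the patch guards against. Very large rates would
 * overflow the 32-bit multiplication; callers are assumed to stay well
 * below that.
 */
static uint32_t itr_to_reg(uint32_t itr)
{
	if (itr == 0)
		return 0;
	return 1000000000u / (itr * 256u);
}

/*
 * Same condition as the patch: disable adaptive moderation when two
 * full frames do not fit in the receive packet buffer (pba is in KB)
 * and early-receive is not available.
 */
static bool should_disable_aim(uint32_t max_frame_size, uint32_t pba_kb,
			       bool has_ert)
{
	return (max_frame_size * 2) > (pba_kb << 10) && !has_ert;
}
```

For example, with the jumbo-frame PBA of 14 KB from the patch, two frames
of 9018 bytes (an illustrative jumbo frame size) need 18036 bytes, more
than the 14336 available, so moderation is turned off on parts without
early-receive.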

* [net-next-2.6 PATCH 1/3] e1000e: use hardware writeback batching
From: Jeff Kirsher @ 2010-09-30  7:38 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Jesse Brandeburg, Jeff Kirsher

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

Most e1000e parts support batching writebacks.  The problem with this is
that when some of the TADV or TIDV timers are not set, Tx can sit forever.

This is solved in this patch with write flushes using the Flush Partial
Descriptors (FPD) bit in TIDV and RDTR.

This improves bus utilization and removes partial writes on e1000e,
particularly from 82571 parts in S5500 chipset based machines.

Only ES2LAN and 82571/2 parts are included in this optimization, to reduce
testing load.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000e/82571.c   |    6 +++--
 drivers/net/e1000e/defines.h |    2 ++
 drivers/net/e1000e/e1000.h   |   28 ++++++++++++++++++++++
 drivers/net/e1000e/es2lan.c  |    1 +
 drivers/net/e1000e/netdev.c  |   53 ++++++++++++++++++++++++++++++++++++++++++
 drivers/net/e1000e/param.c   |    2 --
 6 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/drivers/net/e1000e/82571.c b/drivers/net/e1000e/82571.c
index d3d4a57..ca663f1 100644
--- a/drivers/net/e1000e/82571.c
+++ b/drivers/net/e1000e/82571.c
@@ -1801,7 +1801,8 @@ struct e1000_info e1000_82571_info = {
 				  | FLAG_RESET_OVERWRITES_LAA /* errata */
 				  | FLAG_TARC_SPEED_MODE_BIT /* errata */
 				  | FLAG_APME_CHECK_PORT_B,
-	.flags2			= FLAG2_DISABLE_ASPM_L1, /* errata 13 */
+	.flags2			= FLAG2_DISABLE_ASPM_L1 /* errata 13 */
+				  | FLAG2_DMA_BURST,
 	.pba			= 38,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,
@@ -1819,7 +1820,8 @@ struct e1000_info e1000_82572_info = {
 				  | FLAG_RX_CSUM_ENABLED
 				  | FLAG_HAS_CTRLEXT_ON_LOAD
 				  | FLAG_TARC_SPEED_MODE_BIT, /* errata */
-	.flags2			= FLAG2_DISABLE_ASPM_L1, /* errata 13 */
+	.flags2			= FLAG2_DISABLE_ASPM_L1 /* errata 13 */
+				  | FLAG2_DMA_BURST,
 	.pba			= 38,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_82571,
diff --git a/drivers/net/e1000e/defines.h b/drivers/net/e1000e/defines.h
index 93b3bed..d3f7a9c 100644
--- a/drivers/net/e1000e/defines.h
+++ b/drivers/net/e1000e/defines.h
@@ -446,7 +446,9 @@
 
 /* Transmit Descriptor Control */
 #define E1000_TXDCTL_PTHRESH 0x0000003F /* TXDCTL Prefetch Threshold */
+#define E1000_TXDCTL_HTHRESH 0x00003F00 /* TXDCTL Host Threshold */
 #define E1000_TXDCTL_WTHRESH 0x003F0000 /* TXDCTL Writeback Threshold */
+#define E1000_TXDCTL_GRAN    0x01000000 /* TXDCTL Granularity */
 #define E1000_TXDCTL_FULL_TX_DESC_WB 0x01010000 /* GRAN=1, WTHRESH=1 */
 #define E1000_TXDCTL_MAX_TX_DESC_PREFETCH 0x0100001F /* GRAN=1, PTHRESH=31 */
 /* Enable the counting of desc. still to be processed. */
diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index f9a31c8..5ec0af5 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -153,6 +153,33 @@ struct e1000_info;
 /* Time to wait before putting the device into D3 if there's no link (in ms). */
 #define LINK_TIMEOUT		100
 
+#define DEFAULT_RDTR			0
+#define DEFAULT_RADV			8
+#define BURST_RDTR			0x20
+#define BURST_RADV			0x20
+
+/*
+ * in the case of WTHRESH, it appears at least the 82571/2 hardware
+ * writes back 4 descriptors when WTHRESH=5, and 3 descriptors when
+ * WTHRESH=4, and since we want 64 bytes at a time written back, set
+ * it to 5
+ */
+#define E1000_TXDCTL_DMA_BURST_ENABLE                          \
+	(E1000_TXDCTL_GRAN | /* set descriptor granularity */  \
+	 E1000_TXDCTL_COUNT_DESC |                             \
+	 (5 << 16) | /* wthresh must be +1 more than desired */\
+	 (1 << 8)  | /* hthresh */                             \
+	 0x1f)       /* pthresh */
+
+#define E1000_RXDCTL_DMA_BURST_ENABLE                          \
+	(0x01000000 | /* set descriptor granularity */         \
+	 (4 << 16)  | /* set writeback threshold    */         \
+	 (4 << 8)   | /* set prefetch threshold     */         \
+	 0x20)        /* set hthresh                */
+
+#define E1000_TIDV_FPD (1 << 31)
+#define E1000_RDTR_FPD (1 << 31)
+
 enum e1000_boards {
 	board_82571,
 	board_82572,
@@ -425,6 +452,7 @@ struct e1000_info {
 #define FLAG2_DISABLE_ASPM_L1             (1 << 3)
 #define FLAG2_HAS_PHY_STATS               (1 << 4)
 #define FLAG2_HAS_EEE                     (1 << 5)
+#define FLAG2_DMA_BURST                   (1 << 6)
 
 #define E1000_RX_DESC_PS(R, i)	    \
 	(&(((union e1000_rx_desc_packet_split *)((R).desc))[i]))
diff --git a/drivers/net/e1000e/es2lan.c b/drivers/net/e1000e/es2lan.c
index 45aebb4..24f8ac9 100644
--- a/drivers/net/e1000e/es2lan.c
+++ b/drivers/net/e1000e/es2lan.c
@@ -1494,6 +1494,7 @@ struct e1000_info e1000_es2_info = {
 				  | FLAG_APME_CHECK_PORT_B
 				  | FLAG_DISABLE_FC_PAUSE_TIME /* errata */
 				  | FLAG_TIPG_MEDIUM_FOR_80003ESLAN,
+	.flags2			= FLAG2_DMA_BURST,
 	.pba			= 38,
 	.max_hw_frame_size	= DEFAULT_JUMBO,
 	.get_variants		= e1000_get_variants_80003es2lan,
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index c69563c..1aa4228 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2650,6 +2650,26 @@ static void e1000_configure_tx(struct e1000_adapter *adapter)
 	/* Tx irq moderation */
 	ew32(TADV, adapter->tx_abs_int_delay);
 
+	if (adapter->flags2 & FLAG2_DMA_BURST) {
+		u32 txdctl = er32(TXDCTL(0));
+		txdctl &= ~(E1000_TXDCTL_PTHRESH | E1000_TXDCTL_HTHRESH |
+			    E1000_TXDCTL_WTHRESH);
+		/*
+		 * set up some performance related parameters to encourage the
+		 * hardware to use the bus more efficiently in bursts, depends
+		 * on the tx_int_delay to be enabled,
+		 * wthresh = 5 ==> burst write a cacheline (64 bytes) at a time
+		 * hthresh = 1 ==> prefetch when one or more available
+		 * pthresh = 0x1f ==> prefetch if internal cache 31 or less
+		 * BEWARE: this seems to work but should be considered first if
+		 * there are tx hangs or other tx related bugs
+		 */
+		txdctl |= E1000_TXDCTL_DMA_BURST_ENABLE;
+		ew32(TXDCTL(0), txdctl);
+		/* erratum work around: set txdctl the same for both queues */
+		ew32(TXDCTL(1), txdctl);
+	}
+
 	/* Program the Transmit Control Register */
 	tctl = er32(TCTL);
 	tctl &= ~E1000_TCTL_CT;
@@ -2872,6 +2892,29 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
 	e1e_flush();
 	msleep(10);
 
+	if (adapter->flags2 & FLAG2_DMA_BURST) {
+		/*
+		 * set the writeback threshold (only takes effect if the RDTR
+		 * is set). set GRAN=1 and write back up to 0x4 worth, and
+		 * enable prefetching of 0x20 rx descriptors
+		 * granularity = 01
+		 * wthresh = 04,
+		 * hthresh = 04,
+		 * pthresh = 0x20
+		 */
+		ew32(RXDCTL(0), E1000_RXDCTL_DMA_BURST_ENABLE);
+		ew32(RXDCTL(1), E1000_RXDCTL_DMA_BURST_ENABLE);
+
+		/*
+		 * override the delay timers for enabling bursting, only if
+		 * the value was not set by the user via module options
+		 */
+		if (adapter->rx_int_delay == DEFAULT_RDTR)
+			adapter->rx_int_delay = BURST_RDTR;
+		if (adapter->rx_abs_int_delay == DEFAULT_RADV)
+			adapter->rx_abs_int_delay = BURST_RADV;
+	}
+
 	/* set the Receive Delay Timer Register */
 	ew32(RDTR, adapter->rx_int_delay);
 
@@ -4235,6 +4278,16 @@ link_up:
 	/* Force detection of hung controller every watchdog period */
 	adapter->detect_tx_hung = 1;
 
+	/* flush partial descriptors to memory before detecting tx hang */
+	if (adapter->flags2 & FLAG2_DMA_BURST) {
+		ew32(TIDV, adapter->tx_int_delay | E1000_TIDV_FPD);
+		ew32(RDTR, adapter->rx_int_delay | E1000_RDTR_FPD);
+		/*
+		 * no need to flush the writes because the timeout code does
+		 * an er32 first thing
+		 */
+	}
+
 	/*
 	 * With 82571 controllers, LAA may be overwritten due to controller
 	 * reset from the other port. Set the appropriate LAA in RAR[0]
diff --git a/drivers/net/e1000e/param.c b/drivers/net/e1000e/param.c
index 34aeec1..3d36911 100644
--- a/drivers/net/e1000e/param.c
+++ b/drivers/net/e1000e/param.c
@@ -91,7 +91,6 @@ E1000_PARAM(TxAbsIntDelay, "Transmit Absolute Interrupt Delay");
  * Valid Range: 0-65535
  */
 E1000_PARAM(RxIntDelay, "Receive Interrupt Delay");
-#define DEFAULT_RDTR 0
 #define MAX_RXDELAY 0xFFFF
 #define MIN_RXDELAY 0
 
@@ -101,7 +100,6 @@ E1000_PARAM(RxIntDelay, "Receive Interrupt Delay");
  * Valid Range: 0-65535
  */
 E1000_PARAM(RxAbsIntDelay, "Receive Absolute Interrupt Delay");
-#define DEFAULT_RADV 8
 #define MAX_RXABSDELAY 0xFFFF
 #define MIN_RXABSDELAY 0
 


^ permalink raw reply related
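
The descriptor control values in the patch above are packed bitfields.
As a hedged illustration, the helper below builds such a value from
individual thresholds using the TXDCTL masks added to defines.h; it
assumes RXDCTL uses the same field positions, which matches the shifts in
E1000_RXDCTL_DMA_BURST_ENABLE. The name pack_thresholds() is
hypothetical, and the field names follow the bit positions given by the
TXDCTL masks.

```c
#include <stdint.h>

#define PTHRESH_MASK	0x0000003Fu	/* E1000_TXDCTL_PTHRESH */
#define HTHRESH_MASK	0x00003F00u	/* E1000_TXDCTL_HTHRESH */
#define WTHRESH_MASK	0x003F0000u	/* E1000_TXDCTL_WTHRESH */
#define GRAN_BIT	0x01000000u	/* E1000_TXDCTL_GRAN */

/*
 * Clear the three threshold fields in an existing register value, set
 * descriptor granularity, and insert the new thresholds. Bits outside
 * these fields are preserved.
 */
static uint32_t pack_thresholds(uint32_t old, unsigned pthresh,
				unsigned hthresh, unsigned wthresh)
{
	uint32_t v = old & ~(PTHRESH_MASK | HTHRESH_MASK | WTHRESH_MASK);

	v |= GRAN_BIT;
	v |= ((uint32_t)wthresh << 16) & WTHRESH_MASK;
	v |= ((uint32_t)hthresh << 8) & HTHRESH_MASK;
	v |= (uint32_t)pthresh & PTHRESH_MASK;
	return v;
}
```

With pthresh = 0x20, hthresh = 4 and wthresh = 4 this gives 0x01040420,
the value of E1000_RXDCTL_DMA_BURST_ENABLE. With pthresh = 0x1f,
hthresh = 1 and wthresh = 5 it gives 0x0105011F, which is the Tx burst
setting without the E1000_TXDCTL_COUNT_DESC bit.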

* [net-next-2.6 PATCH] ixgbe: fix link issues and panic with shared interrupts for 82598
From: Jeff Kirsher @ 2010-09-30  7:35 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, bphilips, Emil Tantilov, Jeff Kirsher

From: Emil Tantilov <emil.s.tantilov@intel.com>

Fix possible panic/hang with shared Legacy interrupts by not enabling
interrupts when interface is down.

Also fixes an intermittent link issue by enabling LSC (link status change)
interrupts upon exit from ixgbe_intr().

This patch adds flags to ixgbe_irq_enable() to allow for some flexibility
when enabling interrupts.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/ixgbe/ixgbe_main.c |   36 ++++++++++++++++++++++++++----------
 1 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index c35185c..c35e13c 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -2233,7 +2233,8 @@ static void ixgbe_set_itr(struct ixgbe_adapter *adapter)
  * ixgbe_irq_enable - Enable default interrupt generation settings
  * @adapter: board private structure
  **/
-static inline void ixgbe_irq_enable(struct ixgbe_adapter *adapter)
+static inline void ixgbe_irq_enable(struct ixgbe_adapter *adapter, bool queues,
+				    bool flush)
 {
 	u32 mask;
 
@@ -2254,8 +2255,10 @@ static inline void ixgbe_irq_enable(struct ixgbe_adapter *adapter)
 		mask |= IXGBE_EIMS_FLOW_DIR;
 
 	IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMS, mask);
-	ixgbe_irq_enable_queues(adapter, ~0);
-	IXGBE_WRITE_FLUSH(&adapter->hw);
+	if (queues)
+		ixgbe_irq_enable_queues(adapter, ~0);
+	if (flush)
+		IXGBE_WRITE_FLUSH(&adapter->hw);
 
 	if (adapter->num_vfs > 32) {
 		u32 eitrsel = (1 << (adapter->num_vfs - 32)) - 1;
@@ -2277,7 +2280,7 @@ static irqreturn_t ixgbe_intr(int irq, void *data)
 	u32 eicr;
 
 	/*
-	 * Workaround for silicon errata.  Mask the interrupts
+	 * Workaround for silicon errata on 82598.  Mask the interrupts
 	 * before the read of EICR.
 	 */
 	IXGBE_WRITE_REG(hw, IXGBE_EIMC, IXGBE_IRQ_CLEAR_MASK);
@@ -2286,10 +2289,15 @@ static irqreturn_t ixgbe_intr(int irq, void *data)
 	 * therefore no explict interrupt disable is necessary */
 	eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
 	if (!eicr) {
-		/* shared interrupt alert!
+		/*
+		 * shared interrupt alert!
 		 * make sure interrupts are enabled because the read will
-		 * have disabled interrupts due to EIAM */
-		ixgbe_irq_enable(adapter);
+		 * have disabled interrupts due to EIAM
+		 * finish the workaround of silicon errata on 82598.  Unmask
+		 * the interrupt that we masked before the EICR read.
+		 */
+		if (!test_bit(__IXGBE_DOWN, &adapter->state))
+			ixgbe_irq_enable(adapter, true, true);
 		return IRQ_NONE;	/* Not our interrupt */
 	}
 
@@ -2313,6 +2321,14 @@ static irqreturn_t ixgbe_intr(int irq, void *data)
 		__napi_schedule(&(q_vector->napi));
 	}
 
+	/*
+	 * re-enable link(maybe) and non-queue interrupts, no flush.
+	 * ixgbe_poll will re-enable the queue interrupts
+	 */
+
+	if (!test_bit(__IXGBE_DOWN, &adapter->state))
+		ixgbe_irq_enable(adapter, false, false);
+
 	return IRQ_HANDLED;
 }
 
@@ -3048,7 +3064,7 @@ static void ixgbe_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
 	vlan_group_set_device(adapter->vlgrp, vid, NULL);
 
 	if (!test_bit(__IXGBE_DOWN, &adapter->state))
-		ixgbe_irq_enable(adapter);
+		ixgbe_irq_enable(adapter, true, true);
 
 	/* remove VID from filter table */
 	hw->mac.ops.set_vfta(&adapter->hw, vid, pool_ndx, false);
@@ -3145,7 +3161,7 @@ static void ixgbe_vlan_rx_register(struct net_device *netdev,
 	ixgbe_vlan_rx_add_vid(netdev, 0);
 
 	if (!test_bit(__IXGBE_DOWN, &adapter->state))
-		ixgbe_irq_enable(adapter);
+		ixgbe_irq_enable(adapter, true, true);
 }
 
 static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter)
@@ -3546,7 +3562,7 @@ static int ixgbe_up_complete(struct ixgbe_adapter *adapter)
 
 	/* clear any pending interrupts, may auto mask */
 	IXGBE_READ_REG(hw, IXGBE_EICR);
-	ixgbe_irq_enable(adapter);
+	ixgbe_irq_enable(adapter, true, true);
 
 	/*
 	 * If this adapter has a fan, check to see if we had a failure


^ permalink raw reply related

* Re: [PATCH v2 1/2] Phonet: Implement Pipe Controller to support Nokia Slim Modems
From: Kumar SANGHVI @ 2010-09-30  7:19 UTC (permalink / raw)
  To: Rémi Denis-Courmont
  Cc: netdev@vger.kernel.org, STEricsson_nomadik_linux,
	Sudeep DIVAKARAN, Gulshan KARMANI, Linus WALLEIJ
In-Reply-To: <201009292121.18274.remi.denis-courmont@nokia.com>

Hi Rémi Denis-Courmont, 

On Wed, Sep 29, 2010 at 20:21:17 +0200, Rémi Denis-Courmont wrote:
> It seems to me that you really want to implement the connect() socket call, so 
> that one of the two endpoints will stand up for the missing controller.

Yes, implementing the connect() socket call would be nice.

> That's 
> still much cleaner than CREATE and DESTROY ioctl()'s.

I have not introduced any new ioctl()'s as part of Pipe controller
implementation.
The PIPE_CREATE/PIPE_DESTROY/PIPE_ENABLE/PIPE_DISABLE are all provided
as socket options.
So, user-space can call setsockopt for creating/enabling or
disabling/destroying pipe.

Regarding implementing the connect() socket call, a few questions:
1. It should carry out all the same steps which I am currently doing as part
   of the PIPE_CREATE socket option, right?
2. Currently, as part of the Pipe controller implementation, user-space
   follows the sequence below:
	socket()
	bind()
	listen()
	setsockopt(PIPE_CREATE)
	accept()

   In the phonet stack pipe controller logic, we wait for PEP_CONNECT_RESP
   from host-pep (GPRS socket or video telephony socket is a host-pep.
   pep_reply sends out the PEP_CONNECT_RESP) and remote-pep (modem),
   negotiate the best flow-control to be used, and then send
   PIPE_CREATED_IND, with selected flow-control to both pipe end-points.

   I am not sure how the sequence would be when using the connect() socket
   call.

Thanks for your inputs.

Thanks & regards,
Kumar.

^ permalink raw reply

* Re: [PATCH net-next] ip_gre: comments change
From: David Miller @ 2010-09-30  6:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1285824777.5211.663.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 30 Sep 2010 07:32:57 +0200

> HARD_TX_LOCK no longer protects tunnels from dead loops,
> but xmit_recursion percpu counter.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: Packet time delays on multi-core systems
From: Eric Dumazet @ 2010-09-30  6:33 UTC (permalink / raw)
  To: Alexey Vlasov; +Cc: Linux Kernel Mailing List, netdev
In-Reply-To: <20100930062419.GD86786@beaver.vrungel.ru>

Le jeudi 30 septembre 2010 à 10:24 +0400, Alexey Vlasov a écrit :
> Here I found some dude with the same problem:
> http://lkml.org/lkml/2010/7/9/340
> 

In your opinion it's the same problem.

But the description you gave is completely different.

You have time skew only when activating a particular iptables rule.

No ?

^ permalink raw reply

* Re: [PATCH] net: code cleanups
From: Eric Dumazet @ 2010-09-30  6:31 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <AANLkTimVngrNU+PLbaf8r42J3HrvURpY8KUPa+--KvLC@mail.gmail.com>

Le jeudi 30 septembre 2010 à 14:09 +0800, Changli Gao a écrit :
> On Thu, Sep 30, 2010 at 1:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le jeudi 30 septembre 2010 à 10:24 +0800, Changli Gao a écrit :
> >> Compare operations are more readable, and compilers generate the same code
> >> for the both.
> >>
> >
> > You have a buggy compiler then.
> 
> gcc version 4.4.3 (Gentoo 4.4.3-r2 p1.2)
> 
>              rth = rcu_dereference(rth->dst.rt_next)) {
>                 if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
>                      ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
>                      (rth->fl.iif ^ iif) |
>     2f12:       44 3b 80 dc 00 00 00    cmp    0xdc(%rax),%r8d
>     2f19:       0f 85 a2 00 00 00       jne    2fc1 <ip_route_input_common+0x145>
>                      rth->fl.oif |
>     2f1f:       83 b8 d8 00 00 00 00    cmpl   $0x0,0xd8(%rax)
>     2f26:       0f 85 95 00 00 00       jne    2fc1 <ip_route_input_common+0x145>
>         tos &= IPTOS_RT_MASK;
>         hash = rt_hash(daddr, saddr, iif, rt_genid(net));
> 
>         for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
>              rth = rcu_dereference(rth->dst.rt_next)) {
>                 if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
>     2f2c:       44 3b b8 e4 00 00 00    cmp    0xe4(%rax),%r15d
>     2f33:       0f 85 88 00 00 00       jne    2fc1 <ip_route_input_common+0x145>
>                      ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
>     2f39:       44 3b b0 e8 00 00 00    cmp    0xe8(%rax),%r14d
>     2f40:       75 7f                   jne    2fc1 <ip_route_input_common+0x145>
> 
> 
> >
> > I know this code is ugly, but please keep it as is, dont add conditional
> > branches on hot paths.
> >
> 
> If the compiler doesn't generate conditional branches, we have to
> touch every necessary field of all the cache entries in one hash
> bucket. Is it better than condition branch? I think the compiler
> developers know it better.

Famous last words.

Are you aware of cache lines (at least 64 bytes on typical CPUs), and
that all fields are already in the CPU L1 cache? I (and others) worked hard
on this in the past.

> 
> And the compiler reorders the conditional branches, is it expected?
> 

Your compiler added conditional branches to code that does not want them,
only because on _your_ cpu, these conditional branches might be cheap.

Now, try to compile for an i686 target and see the difference.

If there were no difference, your compiler would be _buggy_, because it
would not be generating optimal assembly.

Here I get :

c141dda9:       8b 55 e8                mov    -0x18(%ebp),%edx
c141ddac:       8b 81 9c 00 00 00       mov    0x9c(%ecx),%eax
c141ddb2:       33 91 a0 00 00 00       xor    0xa0(%ecx),%edx
c141ddb8:       31 f0                   xor    %esi,%eax
c141ddba:       09 d0                   or     %edx,%eax
c141ddbc:       8b 55 e0                mov    -0x20(%ebp),%edx
c141ddbf:       33 91 94 00 00 00       xor    0x94(%ecx),%edx
c141ddc5:       09 d0                   or     %edx,%eax
c141ddc7:       0f b6 55 e7             movzbl -0x19(%ebp),%edx
c141ddcb:       0b 81 90 00 00 00       or     0x90(%ecx),%eax
c141ddd1:       32 91 a4 00 00 00       xor    0xa4(%ecx),%dl
c141ddd7:       0f b6 d2                movzbl %dl,%edx
c141ddda:       09 d0                   or     %edx,%eax
c141dddc:       0f 85 9d 00 00 00       jne    c141de7f <ip_route_input_common+0x1b4>

As you can see, only one conditional branch.

Your patch is not welcomed, thanks.
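
For reference, the two forms discussed in this thread can be sketched in
plain C as below. This is only an illustration: struct flow_key and both
function names are hypothetical stand-ins, not the kernel's struct flowi or
the code in net/ipv4/route.c. The two functions always return the same
result; what differs, as the disassembly above suggests, is how a given
compiler and target may choose to emit them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a routing-cache key (not struct flowi). */
struct flow_key {
	uint32_t daddr;
	uint32_t saddr;
	int iif;
	int oif;
	uint8_t tos;
};

/*
 * XOR/OR form: every field difference is folded into one value with
 * non-short-circuit operators, leaving a single test against zero.
 */
bool key_match_xor(const struct flow_key *k, uint32_t daddr,
		   uint32_t saddr, int iif, uint8_t tos)
{
	return ((k->daddr ^ daddr) |
		(k->saddr ^ saddr) |
		(uint32_t)(k->iif ^ iif) |
		(uint32_t)k->oif |
		(uint32_t)(k->tos ^ tos)) == 0;
}

/*
 * Comparison form: same truth table, but && is allowed to stop at the
 * first mismatch, so a compiler may emit one branch per comparison.
 */
bool key_match_cmp(const struct flow_key *k, uint32_t daddr,
		   uint32_t saddr, int iif, uint8_t tos)
{
	return k->daddr == daddr && k->saddr == saddr &&
	       k->iif == iif && k->oif == 0 && k->tos == tos;
}
```

Compiling both functions with the same flags for different targets (for
example x86_64 and i686) and comparing the objdump output is one way to
check what a particular compiler does with each form.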



^ permalink raw reply

* Re: Packet time delays on multi-core systems
From: Alexey Vlasov @ 2010-09-30  6:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Mailing List, netdev
In-Reply-To: <1285796721.5211.156.camel@edumazet-laptop>

Here I found some dude with the same problem:
http://lkml.org/lkml/2010/7/9/340

On Wed, Sep 29, 2010 at 11:45:21PM +0200, Eric Dumazet wrote:
> Le mercredi 29 septembre 2010 à 23:18 +0400, Alexey Vlasov a écrit :
> > Hi.
> > 
> > I'm not sure actually that I should write here, may be I should ask in
> > netfilter maillist, but if is something wrong please correct me.
> > 
> 
> CC netdev
> 
> 
> > I've got rather large linux shared hosting, and on my new servers I
> > noticed some strange singularity, that this simple rule:
> > 
> > # iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags
> > FIN,SYN,RST,ACK SYN -j LOG --log-prefix "ipsec:SYN-OUTPUT "
> > --log-uid
> > 
> > gives essential time delays simply at ping from the adjacent server
> > on a local area network. I don't know precisely what's wrong whether the
> > reason is in the bad support by a kernel of new hardware, or it concerns
> > generally the new kernel, but now it leads to the situation that even at simple
> > DDOS attacks to client sites, it becomes difficult to make something, and in
> > general all works only worse.
> > 
> > It seems to me that with the increase of CPU cores' amount, it only becomes
> > worse and worse, and, obviously, iptables uses resources of only one processor,
> > which resources to it for any reason doesn't suffice.
> > 
> 
> Its not true. iptables can run on all cpus in //
> 
> > newbox # iptables -F
> > otherbox # ping -c 100 newbox
> > ...
> > 100 packets transmitted, 100 received, 0% packet loss, time 100044ms
> > rtt min/avg/max/mdev = 0.133/2.637/17.172/3.736 ms
> > 
> > OK.
> > 
> > newbox # iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN
> > -j LOG --log-prefix "ipsec:SYN-OUTPUT " --log-uid
> > otherbox # ping -c 100 newbox
> > ...
> > 64 bytes from (newbox): icmp_seq=3 ttl=64 time=1.58 ms
> > 64 bytes from (newbox): icmp_seq=4 ttl=64 time=98.7 ms
> > 64 bytes from (newbox): icmp_seq=5 ttl=64 time=18.2 ms
> > 64 bytes from (newbox): icmp_seq=6 ttl=64 time=6.13 ms
> > 64 bytes from (newbox): icmp_seq=7 ttl=64 time=108 ms
> > ...
> > 64 bytes from (newbox): icmp_seq=55 ttl=64 time=2.30 ms
> > 64 bytes from (newbox): icmp_seq=56 ttl=64 time=59.9 ms
> > 64 bytes from (newbox): icmp_seq=57 ttl=64 time=0.155 ms
> > ...
> > 64 bytes from (newbox): icmp_seq=61 ttl=64 time=13.4 ms
> > 64 bytes from (newbox): icmp_seq=62 ttl=64 time=55.0 ms
> > 64 bytes from (newbox): icmp_seq=63 ttl=64 time=0.233 ms
> > ...
> > 100 packets transmitted, 100 received, 0% packet loss, time 99957ms
> > rtt min/avg/max/mdev = 0.111/7.519/108.061/18.478 ms
> > 
> > newbox # iptables -L -v -n
> > Chain INPUT (policy ACCEPT 346K packets, 213M bytes)
> >  pkts bytes target     prot opt in     out     source               destination
> > 
> > Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
> >  pkts bytes target     prot opt in     out     source               destination
> > 
> > Chain OUTPUT (policy ACCEPT 296K packets, 290M bytes)
> >  pkts bytes target     prot opt in     out     source               destination
> >   234 14040 LOG        tcp  --  *      *       0.0.0.0/0            0.0.0.0/0    
> > tcp dpt:80 flags:0x17/0x02 LOG flags 8 level 4 prefix `ipsec:SYN-OUTPUT- '
> > 
> > My old server: Intel SR1500, Xeon 5430, kernel 2.6.24 - 2.6.28
> > Newbox: SR1620UR, 5650, kernel 2.6.32
> > 
> > Thanks in advance.
> > 
> 
> Seems strange indeed, since the LOG you add should not slowdown icmp
> trafic that much.
> 
> But if you send SYN packets in the same time, (logged), this might slow
> down the reception (and answers) of ICMP frames. LOG target can be quite
> expensive... 
> 
> Is using other rules gives same problem ?
> 
> iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN
> iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN
> iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN
> iptables -A OUTPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN
> 
> 
> 
> 
> 

-- 
BRGDS. Alexey Vlasov.

^ permalink raw reply

* Re: [PATCH] net: code cleanups
From: Changli Gao @ 2010-09-30  6:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1285823808.5211.627.camel@edumazet-laptop>

On Thu, Sep 30, 2010 at 1:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 30 septembre 2010 à 10:24 +0800, Changli Gao a écrit :
>> Compare operations are more readable, and compilers generate the same code
>> for the both.
>>
>
> You have a buggy compiler then.

gcc version 4.4.3 (Gentoo 4.4.3-r2 p1.2)

             rth = rcu_dereference(rth->dst.rt_next)) {
                if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
                     ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
                     (rth->fl.iif ^ iif) |
    2f12:       44 3b 80 dc 00 00 00    cmp    0xdc(%rax),%r8d
    2f19:       0f 85 a2 00 00 00       jne    2fc1 <ip_route_input_common+0x145>
                     rth->fl.oif |
    2f1f:       83 b8 d8 00 00 00 00    cmpl   $0x0,0xd8(%rax)
    2f26:       0f 85 95 00 00 00       jne    2fc1 <ip_route_input_common+0x145>
        tos &= IPTOS_RT_MASK;
        hash = rt_hash(daddr, saddr, iif, rt_genid(net));

        for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
             rth = rcu_dereference(rth->dst.rt_next)) {
                if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
    2f2c:       44 3b b8 e4 00 00 00    cmp    0xe4(%rax),%r15d
    2f33:       0f 85 88 00 00 00       jne    2fc1 <ip_route_input_common+0x145>
                     ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
    2f39:       44 3b b0 e8 00 00 00    cmp    0xe8(%rax),%r14d
    2f40:       75 7f                   jne    2fc1 <ip_route_input_common+0x145>


>
> I know this code is ugly, but please keep it as is, dont add conditional
> branches on hot paths.
>

If the compiler doesn't generate conditional branches, we have to
touch every necessary field of all the cache entries in one hash
bucket. Is it better than condition branch? I think the compiler
developers know it better.

And the compiler reorders the conditional branches, is it expected?

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH net-next] ip_gre: comments change
From: Eric Dumazet @ 2010-09-30  5:32 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

HARD_TX_LOCK no longer protects tunnels from dead loops,
but xmit_recursion percpu counter.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ip_gre.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 035db63..fbe2c47 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -64,13 +64,13 @@
    We cannot track such dead loops during route installation,
    it is infeasible task. The most general solutions would be
    to keep skb->encapsulation counter (sort of local ttl),
-   and silently drop packet when it expires. It is the best
+   and silently drop packet when it expires. It is a good
    solution, but it supposes maintaing new variable in ALL
    skb, even if no tunneling is used.
 
-   Current solution: HARD_TX_LOCK lock breaks dead loops.
-
-
+   Current solution: xmit_recursion breaks dead loops. This is a percpu
+   counter, since when we enter the first ndo_xmit(), cpu migration is
+   forbidden. We force an exit if this counter reaches RECURSION_LIMIT
 
    2. Networking dead loops would not kill routers, but would really
    kill network. IP hop limit plays role of "t->recursion" in this case,

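The idea described in the updated comment, a counter that is incremented
around each nested transmit and forces an exit at RECURSION_LIMIT, can be
sketched in user space roughly as follows. This is not the kernel code:
struct fake_dev, fake_xmit() and xmit_depth() are hypothetical names, the
limit value is an assumption, and a thread-local variable stands in for the
percpu counter.

```c
#include <stdbool.h>
#include <stddef.h>

#define RECURSION_LIMIT 3	/* assumed value, for illustration only */

/* Thread-local stand-in for the percpu xmit_recursion counter. */
static _Thread_local int xmit_recursion;

/* Hypothetical device: a tunnel re-sends through next_hop, NULL means a real device. */
struct fake_dev {
	struct fake_dev *next_hop;
};

/* Current nesting depth, exposed so callers can check it unwinds to zero. */
int xmit_depth(void)
{
	return xmit_recursion;
}

/* Returns true if the packet left the chain, false if it was dropped at the limit. */
bool fake_xmit(struct fake_dev *dev)
{
	bool ok;

	if (xmit_recursion >= RECURSION_LIMIT)
		return false;	/* dead loop suspected: force an exit */

	if (dev->next_hop == NULL)
		return true;	/* reached a real device */

	xmit_recursion++;
	ok = fake_xmit(dev->next_hop);
	xmit_recursion--;
	return ok;
}
```

With two devices pointing at each other, fake_xmit() gives up once the
nesting depth reaches the limit, and the counter is back at zero afterwards.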


^ permalink raw reply related

* Re: [PATCH] net: code cleanups
From: Eric Dumazet @ 2010-09-30  5:16 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1285813497-7384-1-git-send-email-xiaosuo@gmail.com>

Le jeudi 30 septembre 2010 à 10:24 +0800, Changli Gao a écrit :
> Compare operations are more readable, and compilers generate the same code
> for the both.
> 

You have a buggy compiler then.

I know this code is ugly, but please keep it as is, don't add conditional
branches on hot paths.

Thanks

> Use the macros fl4_* to shrink the length of the lines.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ---
>  net/ipv4/af_inet.c |    7 +++----
>  net/ipv4/route.c   |   27 ++++++++++++---------------
>  2 files changed, 15 insertions(+), 19 deletions(-)
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index f581f77..ef26640 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1338,10 +1338,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
>  
>  		iph2 = ip_hdr(p);
>  
> -		if ((iph->protocol ^ iph2->protocol) |
> -		    (iph->tos ^ iph2->tos) |
> -		    ((__force u32)iph->saddr ^ (__force u32)iph2->saddr) |
> -		    ((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) {
> +		if (iph->protocol != iph2->protocol || iph->tos != iph2->tos ||
> +		    (__force u32)iph->saddr != (__force u32)iph2->saddr ||
> +		    (__force u32)iph->daddr != (__force u32)iph2->daddr) {
>  			NAPI_GRO_CB(p)->same_flow = 0;
>  			continue;
>  		}
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 98beda4..6b00fde 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -683,19 +683,18 @@ static inline bool rt_caching(const struct net *net)
>  static inline bool compare_hash_inputs(const struct flowi *fl1,
>  					const struct flowi *fl2)
>  {
> -	return ((((__force u32)fl1->nl_u.ip4_u.daddr ^ (__force u32)fl2->nl_u.ip4_u.daddr) |
> -		((__force u32)fl1->nl_u.ip4_u.saddr ^ (__force u32)fl2->nl_u.ip4_u.saddr) |
> -		(fl1->iif ^ fl2->iif)) == 0);
> +	return (__force u32)fl1->fl4_dst == (__force u32)fl2->fl4_dst &&
> +	       (__force u32)fl1->fl4_src == (__force u32)fl2->fl4_src &&
> +	       fl1->iif == fl2->iif;
>  }
>  
>  static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
>  {
> -	return (((__force u32)fl1->nl_u.ip4_u.daddr ^ (__force u32)fl2->nl_u.ip4_u.daddr) |
> -		((__force u32)fl1->nl_u.ip4_u.saddr ^ (__force u32)fl2->nl_u.ip4_u.saddr) |
> -		(fl1->mark ^ fl2->mark) |
> -		(*(u16 *)&fl1->nl_u.ip4_u.tos ^ *(u16 *)&fl2->nl_u.ip4_u.tos) |
> -		(fl1->oif ^ fl2->oif) |
> -		(fl1->iif ^ fl2->iif)) == 0;
> +	return (__force u32)fl1->fl4_dst == (__force u32)fl2->fl4_dst &&
> +	       (__force u32)fl1->fl4_src == (__force u32)fl2->fl4_src &&
> +	       fl1->mark == fl2->mark &&
> +	       *(u16 *)&fl1->fl4_tos == *(u16 *)&fl2->fl4_tos &&
> +	       fl1->oif == fl2->oif && fl1->iif == fl2->iif;
>  }
>  
>  static inline int compare_netns(struct rtable *rt1, struct rtable *rt2)
> @@ -2286,12 +2285,10 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>  
>  	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
>  	     rth = rcu_dereference(rth->dst.rt_next)) {
> -		if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
> -		     ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
> -		     (rth->fl.iif ^ iif) |
> -		     rth->fl.oif |
> -		     (rth->fl.fl4_tos ^ tos)) == 0 &&
> -		    rth->fl.mark == skb->mark &&
> +		if ((__force u32)rth->fl.fl4_dst == (__force u32)daddr &&
> +		    (__force u32)rth->fl.fl4_src == (__force u32)saddr &&
> +		    rth->fl.iif == iif && rth->fl.oif == 0 &&
> +		    rth->fl.fl4_tos == tos && rth->fl.mark == skb->mark &&
>  		    net_eq(dev_net(rth->dst.dev), net) &&
>  		    !rt_is_expired(rth)) {
>  			if (noref) {



^ permalink raw reply

* RE: [PATCH net-next 1/2] enic: remove dead code
From: Vasanthy Kolluri (vkolluri) @ 2010-09-30  5:16 UTC (permalink / raw)
  To: David Miller, shemminger
  Cc: netdev, Scott Feldman (scofeldm), Roopa Prabhu (roprabhu)
In-Reply-To: <20100929.195720.241921649.davem@davemloft.net>

Thanks. I'll remove it.

-Vasanthy

-----Original Message-----
From: David Miller [mailto:davem@davemloft.net] 
Sent: Wednesday, September 29, 2010 7:57 PM
To: shemminger@vyatta.com
Cc: Vasanthy Kolluri (vkolluri); netdev@vger.kernel.org; Scott Feldman
(scofeldm); Roopa Prabhu (roprabhu)
Subject: Re: [PATCH net-next 1/2] enic: remove dead code

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 30 Sep 2010 11:49:58 +0900

> Ok, but Linux tree is not the repository for "possible future enhancements".
> Do it or remove it.

Agreed, remove it now and add it back when you submit the patches that
use the code.

The code is in the repository history so it's not like it's lost.

^ permalink raw reply

* Re: [PATCH v6 0/8] ptp: IEEE 1588 hardware clock support
From: Christian Riesch @ 2010-09-30  3:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rodolfo Giometti, Arnd Bergmann, Peter Zijlstra, john stultz,
	devicetree-discuss, linux-kernel, David Miller, netdev,
	linux-arm-kernel, linux-api, Thomas Gleixner, linuxppc-dev,
	Richard Cochran, Alan Cox, Krzysztof Halasa
In-Reply-To: <alpine.DEB.2.00.1009271035110.9258@router.home>

Quoting Christoph Lameter <cl@linux.com>:
> On Thu, 23 Sep 2010, Christian Riesch wrote:
>
>> > > It implies clock tuning in userspace for a potential sub microsecond
>> > > accurate clock. The clock accuracy will be limited by user space
>> > > latencies and noise. You wont be able to discipline the system clock
>> > > accurately.
>> >
>> > Noise matters, latency doesn't.
>>
>> Well put! That's why we need hardware support for PTP timestamping to reduce
>> the noise, but get along well with the clock servo that is steering
>> the PHC in user space.
>
> Even if I buy into the catch phrase above: User space is subject to noise
> that the in kernel code is not. If you do the tuning over long intervals
> then it hopefully averages out but it still causes jitter effects that
> affects the degree of accuracy (or sync) that you can reach. And the noise
> varies with the load on the system.

Yes and no. If you regard it as a control system: The latencies of the  
operating system are a dead time in the control system. The sampling  
time is quite large, one second, maybe around 100ms or 10ms in  
telecommunication applications, but that is still large compared to  
the latencies you expect to have in the operating system. Hence, these
latencies (=dead time) can be neglected and the important thing that
remains is the noise that you introduce in the measurements of the  
time stamps, which is therefore done in hardware.
I admit that my short statement above is not completely correct, I  
should have mentioned the rather large sampling time we are dealing  
with here.

Christian

^ permalink raw reply

* Re: [PATCHv3 net-next-2.6 4/5] XFRM,IPv6: Add IRO remapping hook in xfrm_input()
From: David Miller @ 2010-09-30  3:17 UTC (permalink / raw)
  To: arno; +Cc: eric.dumazet, herbert, yoshfuji, netdev
In-Reply-To: <cc9aa9a5b238d27701140d0d9230593954e53a39.1285749610.git.arno@natisbad.org>

From: Arnaud Ebalard <arno@natisbad.org>
Date: Wed, 29 Sep 2010 11:05:59 +0200

> +EXPORT_SYMBOL(xfrm4_input_addr_check);

> +EXPORT_SYMBOL(xfrm6_input_addr_check);

net/ipv{4,6}/xfrm{4,6}_{state,input}.c will be built together as a
group, so there is no need to export the address check symbol to
modules.

^ permalink raw reply

* Re: [PATCHv3 net-next-2.6 3/5] XFRM,IPv6: Add IRO src/dst address remapping XFRM types and i/o handlers
From: David Miller @ 2010-09-30  3:16 UTC (permalink / raw)
  To: arno; +Cc: eric.dumazet, herbert, yoshfuji, netdev
In-Reply-To: <fd4eec3c9486c46b535e89ceed479c7536f51fb9.1285749610.git.arno@natisbad.org>

From: Arnaud Ebalard <arno@natisbad.org>
Date: Wed, 29 Sep 2010 11:05:47 +0200

> +static int mip6_iro_src_reject(struct xfrm_state *x, struct sk_buff *skb, struct flowi *fl)
> +{
> +	int err = 0;
> +
> +	/* XXX We may need some reject handler at some point but it is not
> +	 * critical yet: see xfrm_secpath_reject() in net/xfrm/xfrm_policy.c
> +	 * and aslo what mip6_destopt_reject() implements */
> +
> +	printk("XXX FIXME: mip6_iro_src_reject() called\n");

pr_debug() or pr_err() or get rid of it altogether and use WARN_ON() or
similar.

> +	spin_lock(&x->lock);
> +	if (!ipv6_addr_equal(&iph->daddr, (struct in6_addr *)x->coaddr) &&
> +	    !ipv6_addr_any((struct in6_addr *)x->coaddr))
> +		err = -ENOENT;
> +	spin_unlock(&x->lock);

What are you actually protecting with this lock?  The moment you drop
it the x->coaddr can change which changes the result you should return
here.

I suspect you either don't need the lock, or you need to lock at a higher
level.
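
To illustrate the concern in user-space terms, here is a minimal sketch,
with a pthread mutex standing in for the spinlock and an int standing in
for x->coaddr. All names (struct demo_state, demo_check_drop_lock() and so
on) are hypothetical. The first function mirrors the quoted pattern: its
answer can be stale as soon as it returns. The last one shows what locking
at a higher level could look like.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state object; coaddr stands in for x->coaddr. */
struct demo_state {
	pthread_mutex_t lock;
	int coaddr;
};

/*
 * Pattern from the quoted hunk: the lock only covers the comparison.
 * Once it is dropped, coaddr may change and the result may be stale.
 */
bool demo_check_drop_lock(struct demo_state *s, int addr)
{
	bool match;

	pthread_mutex_lock(&s->lock);
	match = (s->coaddr == addr);
	pthread_mutex_unlock(&s->lock);
	return match;
}

/* Caller must already hold s->lock. */
bool demo_check_locked(const struct demo_state *s, int addr)
{
	return s->coaddr == addr;
}

/*
 * Locking at a higher level: the check and whatever depends on it
 * form a single critical section.
 */
int demo_check_and_use(struct demo_state *s, int addr)
{
	int err = 0;

	pthread_mutex_lock(&s->lock);
	if (!demo_check_locked(s, addr))
		err = -1;
	/* anything acting on s->coaddr would go here, still under the lock */
	pthread_mutex_unlock(&s->lock);
	return err;
}
```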

> +		printk(KERN_INFO "%s: spi is not 0: %u\n", __func__,

pr_info()

> +		printk(KERN_INFO "%s: state's mode is not %u: %u\n",

pr_info()

> +		       __func__, XFRM_MODE_ROUTEOPTIMIZATION,

Printing decimal values for CPP macro constants does not make log
messages very readable.
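
One possible way to make such a message readable is to map the constant to
a name before printing it. The sketch below is only an illustration under
assumptions: the enum and both functions are hypothetical local stand-ins,
not the kernel's XFRM definitions, and snprintf() stands in for the kernel
logging helpers.

```c
#include <stdio.h>

/* Local stand-ins; the real mode constants live in the kernel headers. */
enum demo_mode {
	DEMO_MODE_TRANSPORT,
	DEMO_MODE_TUNNEL,
	DEMO_MODE_ROUTEOPTIMIZATION,
};

/* Translate a mode value into a name suitable for a log line. */
const char *demo_mode_name(int mode)
{
	switch (mode) {
	case DEMO_MODE_TRANSPORT:
		return "transport";
	case DEMO_MODE_TUNNEL:
		return "tunnel";
	case DEMO_MODE_ROUTEOPTIMIZATION:
		return "route-optimization";
	default:
		return "unknown";
	}
}

/* Format the complaint into buf; returns what snprintf() returns. */
int demo_format_mode_error(char *buf, size_t len, int got)
{
	return snprintf(buf, len, "state's mode is not %s: %s (%d)",
			demo_mode_name(DEMO_MODE_ROUTEOPTIMIZATION),
			demo_mode_name(got), got);
}
```

The expected mode is printed by name, and the raw number is kept only for
the unexpected value, where it helps when the name comes out as "unknown".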

^ permalink raw reply

* Re: [PATCH] net: code cleanups
From: Joe Perches @ 2010-09-30  3:13 UTC (permalink / raw)
  To: Changli Gao
  Cc: Eric Dumazet, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev
In-Reply-To: <AANLkTinRm7dB=n7KG8+TU0_=gf-19LUPV+40_m2YxbU2@mail.gmail.com>

On Thu, 2010-09-30 at 11:07 +0800, Changli Gao wrote:
> On Thu, Sep 30, 2010 at 10:49 AM, Joe Perches <joe@perches.com> wrote:
> > On Thu, 2010-09-30 at 10:24 +0800, Changli Gao wrote:
> >> Compare operations are more readable, and compilers generate the same code
> >> for the both.
> > As far as I know, not all supported versions of gcc
> > generate the same code.
> Is the former better for the compilers?

Yes.  I don't know how much it matters though.

> > Also, you could probably now remove the (__force u32) casts.
> Maybe Eric doesn't think so.

Comparisons of equal types don't need (__force u32) casts. 

They needed to be cast to u32 for the bitwise or's to avoid
compiler warnings.



^ permalink raw reply

* Re: [PATCH] net: code cleanups
From: Changli Gao @ 2010-09-30  3:07 UTC (permalink / raw)
  To: Joe Perches, Eric Dumazet
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1285814979.1866.229.camel@Joe-Laptop>

On Thu, Sep 30, 2010 at 10:49 AM, Joe Perches <joe@perches.com> wrote:
> On Thu, 2010-09-30 at 10:24 +0800, Changli Gao wrote:
>> Compare operations are more readable, and compilers generate the same code
>> for the both.
>
> As far as I know, not all supported versions of gcc
> generate the same code.

Is the former better for the compilers?

>
> Also, you could probably now remove the (__force u32) casts.
>

Maybe Eric doesn't think so.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH net-next 1/2] enic: remove dead code
From: David Miller @ 2010-09-30  2:57 UTC (permalink / raw)
  To: shemminger; +Cc: vkolluri, netdev, scofeldm, roprabhu
In-Reply-To: <20100930114958.59c438d7@s6510>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 30 Sep 2010 11:49:58 +0900

> Ok, but Linux tree is not the repository for "possible future enhancements".
> Do it or remove it.

Agreed, remove it now and add it back when you submit the patches that
use the code.

The code is in the repository history so it's not like it's lost.

^ permalink raw reply

* Re: [PATCH net-next 1/2] enic: remove dead code
From: Stephen Hemminger @ 2010-09-30  2:49 UTC (permalink / raw)
  To: Vasanthy Kolluri (vkolluri)
  Cc: netdev, Scott Feldman (scofeldm), Roopa Prabhu (roprabhu),
	David Miller
In-Reply-To: <212AA327A3557741A058E787E0618873042E0BC6@xmb-sjc-219.amer.cisco.com>

On Wed, 29 Sep 2010 17:10:00 -0700
"Vasanthy Kolluri (vkolluri)" <vkolluri@cisco.com> wrote:

> Hi Stephen,
> 
>  
> 
> Thanks a lot for submitting this patch. However, I have got few
> concerns:
> 
>  
> 
> 1.       Need to retain vnic_dev_soft_reset and vnic_dev_soft_reset_done
> as they are used in vnic_dev_hang_reset and vnic_dev_hang_reset_done
> respectively
> 
> 2.       Want to retain enic_set_rss_key and enic_set_rss_cpu as we will
> be using those in the near future for adding multi rq functionality to
> enic. 
> 
> 3.       Additional cleanup in vnic_rss.h. FYI, the struct defines in
> vnic_rss.h are currently not in use. But I retained them for the same
> reason as in #2.

Ok, but Linux tree is not the repository for "possible future enhancements".
Do it or remove it.

^ permalink raw reply

* Re: [PATCH] net: code cleanups
From: Joe Perches @ 2010-09-30  2:49 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <1285813497-7384-1-git-send-email-xiaosuo@gmail.com>

On Thu, 2010-09-30 at 10:24 +0800, Changli Gao wrote:
> Compare operations are more readable, and compilers generate the same code
> for the both.

As far as I know, not all supported versions of gcc
generate the same code.

Also, you could probably now remove the (__force u32) casts.

> Use the macros fl4_* to shrink the length of the lines.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ---
>  net/ipv4/af_inet.c |    7 +++----
>  net/ipv4/route.c   |   27 ++++++++++++---------------
>  2 files changed, 15 insertions(+), 19 deletions(-)
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index f581f77..ef26640 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1338,10 +1338,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
>  
>  		iph2 = ip_hdr(p);
>  
> -		if ((iph->protocol ^ iph2->protocol) |
> -		    (iph->tos ^ iph2->tos) |
> -		    ((__force u32)iph->saddr ^ (__force u32)iph2->saddr) |
> -		    ((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) {
> +		if (iph->protocol != iph2->protocol || iph->tos != iph2->tos ||
> +		    (__force u32)iph->saddr != (__force u32)iph2->saddr ||
> +		    (__force u32)iph->daddr != (__force u32)iph2->daddr) {
>  			NAPI_GRO_CB(p)->same_flow = 0;
>  			continue;
>  		}
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 98beda4..6b00fde 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -683,19 +683,18 @@ static inline bool rt_caching(const struct net *net)
>  static inline bool compare_hash_inputs(const struct flowi *fl1,
>  					const struct flowi *fl2)
>  {
> -	return ((((__force u32)fl1->nl_u.ip4_u.daddr ^ (__force u32)fl2->nl_u.ip4_u.daddr) |
> -		((__force u32)fl1->nl_u.ip4_u.saddr ^ (__force u32)fl2->nl_u.ip4_u.saddr) |
> -		(fl1->iif ^ fl2->iif)) == 0);
> +	return (__force u32)fl1->fl4_dst == (__force u32)fl2->fl4_dst &&
> +	       (__force u32)fl1->fl4_src == (__force u32)fl2->fl4_src &&
> +	       fl1->iif == fl2->iif;
>  }
>  
>  static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
>  {
> -	return (((__force u32)fl1->nl_u.ip4_u.daddr ^ (__force u32)fl2->nl_u.ip4_u.daddr) |
> -		((__force u32)fl1->nl_u.ip4_u.saddr ^ (__force u32)fl2->nl_u.ip4_u.saddr) |
> -		(fl1->mark ^ fl2->mark) |
> -		(*(u16 *)&fl1->nl_u.ip4_u.tos ^ *(u16 *)&fl2->nl_u.ip4_u.tos) |
> -		(fl1->oif ^ fl2->oif) |
> -		(fl1->iif ^ fl2->iif)) == 0;
> +	return (__force u32)fl1->fl4_dst == (__force u32)fl2->fl4_dst &&
> +	       (__force u32)fl1->fl4_src == (__force u32)fl2->fl4_src &&
> +	       fl1->mark == fl2->mark &&
> +	       *(u16 *)&fl1->fl4_tos == *(u16 *)&fl2->fl4_tos &&
> +	       fl1->oif == fl2->oif && fl1->iif == fl2->iif;
>  }
>  
>  static inline int compare_netns(struct rtable *rt1, struct rtable *rt2)
> @@ -2286,12 +2285,10 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>  
>  	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
>  	     rth = rcu_dereference(rth->dst.rt_next)) {
> -		if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
> -		     ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
> -		     (rth->fl.iif ^ iif) |
> -		     rth->fl.oif |
> -		     (rth->fl.fl4_tos ^ tos)) == 0 &&
> -		    rth->fl.mark == skb->mark &&
> +		if ((__force u32)rth->fl.fl4_dst == (__force u32)daddr &&
> +		    (__force u32)rth->fl.fl4_src == (__force u32)saddr &&
> +		    rth->fl.iif == iif && rth->fl.oif == 0 &&
> +		    rth->fl.fl4_tos == tos && rth->fl.mark == skb->mark &&
>  		    net_eq(dev_net(rth->dst.dev), net) &&
>  		    !rt_is_expired(rth)) {
>  			if (noref) {




^ permalink raw reply
