Netdev List
* Regarding Routing cache
From: Ajith Adapa @ 2011-11-17  5:00 UTC
  To: netdev

Hi,

I have a question about the routing cache in the Linux kernel.

It seems ip_route_connect is the way to access routing cache entries.
For all locally generated packets, I see that struct dst_entry is
filled in by a lookup in the routing cache.

What about forwarded packets? I don't see the routing cache being used
to fill in struct dst_entry there. It seems we look directly at the
fib_rules or the FIB table to fill in the structure. If that is true,
wouldn't it be very slow?

Sorry if I have misread the code; please correct me if so.

Regards,
Ajith


* Re: [PATCH 2/2 v4] net/smsc911x: Add regulator support
From: Mike Frysinger @ 2011-11-17  5:07 UTC
  To: Robert MARKLUND
  Cc: netdev@vger.kernel.org, Steve Glendinning, Mathieu Poirier,
	Paul Mundt, linux-sh@vger.kernel.org, Sascha Hauer, Tony Lindgren,
	linux-omap@vger.kernel.org,
	uclinux-dist-devel@blackfin.uclinux.org, Linus Walleij
In-Reply-To: <2B1D156D95AE9B4EAD379CB9E465FE7324AECE3650@EXDCVYMBSTM005.EQ1STM.local>


On Wednesday 16 November 2011 07:59:41 Robert MARKLUND wrote:
> From: Mike Frysinger [mailto:vapier@gentoo.org]
> > On Monday 31 October 2011 08:38:39 Robert Marklund wrote:
> > > ChangeLog v3->v4:
> > > - Remove dual prints and old comment on Mike's request.
> > > - Split the request_free function at Mike's and Sascha's request.
> > 
> > would be nice if the enable/disable were split as well ...
> 
> I interpret this as "nice if"; if it's a "must be" then I'll change it.

i would prefer it were split
-mike



* Re: [PATCH 5/5] tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open
From: Ilpo Järvinen @ 2011-11-17  5:14 UTC
  To: Neal Cardwell
  Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng,
	Tom Herbert
In-Reply-To: <1321469885-10885-5-git-send-email-ncardwell@google.com>

On Wed, 16 Nov 2011, Neal Cardwell wrote:

> The problem: Senders were overriding cwnd values picked during an undo
> by calling tcp_moderate_cwnd() in tcp_try_to_open().

I think it's intentional. Because of bandwidth cheats by lying receivers,
all unlimited undos are a bit dangerous.

> The fix: Don't moderate cwnd in tcp_try_to_open() if we're in
> TCP_CA_Open, since doing so is generally unnecessary and specifically
> would override a DSACK-based undo of a cwnd reduction made in fast
> recovery.
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> ---
>  net/ipv4/tcp_input.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index a4efdd7..78dd38c 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2881,7 +2881,8 @@ static void tcp_try_to_open(struct sock *sk, int flag)
>  
>  	if (inet_csk(sk)->icsk_ca_state != TCP_CA_CWR) {
>  		tcp_try_keep_open(sk);
> -		tcp_moderate_cwnd(tp);
> +		if (inet_csk(sk)->icsk_ca_state != TCP_CA_Open)
> +			tcp_moderate_cwnd(tp);
>  	} else {
>  		tcp_cwnd_down(sk, flag);
>  	}

Wouldn't it be enough if tcp max burst is increased to match IW (iirc we 
had 3 still there as a magic number)?
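
For reference, a rough userspace sketch of what the moderation step does,
assuming tcp_moderate_cwnd() clamps cwnd to the packets in flight plus
tcp_max_burst. The struct and function names below are made up for
illustration and are not kernel code. With 4 packets in flight, an undone
cwnd of 20 is cut to 7 with a burst of 3, but only to 14 with a burst of
10:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; fields loosely mirror struct tcp_sock. */
struct cwnd_model {
	uint32_t snd_cwnd;	/* congestion window, in packets */
	uint32_t in_flight;	/* packets currently in flight */
	uint32_t max_burst;	/* burst allowed on top of in_flight */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Clamp cwnd so that at most max_burst new packets can leave at once. */
static void model_moderate_cwnd(struct cwnd_model *m)
{
	assert(m != NULL);
	m->snd_cwnd = min_u32(m->snd_cwnd, m->in_flight + m->max_burst);
}
```

A cwnd that is already below in_flight + max_burst is left untouched, so
raising the burst limit only matters right after an undo restores a large
window.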


-- 
 i.


* Re: [PATCH 4/5] tcp: allow undo from reordered DSACKs
From: Ilpo Järvinen @ 2011-11-17  5:18 UTC
  To: Neal Cardwell
  Cc: David Miller, Netdev, Nandita Dukkipati, Yuchung Cheng,
	Tom Herbert
In-Reply-To: <1321469885-10885-4-git-send-email-ncardwell@google.com>

On Wed, 16 Nov 2011, Neal Cardwell wrote:

> Previously, SACK-enabled connections hung around in TCP_CA_Disorder
> state while snd_una==high_seq, just waiting to accumulate DSACKs and
> hopefully undo a cwnd reduction. This could and did lead to the
> following unfortunate scenario: if some incoming ACKs advance snd_una
> beyond high_seq then we were setting undo_marker to 0 and moving to
> TCP_CA_Open, so if (due to reordering in the ACK return path) we
> shortly thereafter received a DSACK then we were no longer able to
> undo the cwnd reduction.
> 
> The change: Simplify the congestion avoidance state machine by
> removing the behavior where SACK-enabled connections hung around in
> the TCP_CA_Disorder state just waiting for DSACKs. Instead, when
> snd_una advances to high_seq or beyond we typically move to
> TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or
> TCP_CA_Disorder if we later receive enough DSACKs.
> 
> Other patches in this series will provide other changes that are
> necessary to fully fix this problem.
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> ---
>  net/ipv4/tcp_input.c |   15 ++-------------
>  1 files changed, 2 insertions(+), 13 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 751d390..a4efdd7 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2858,7 +2858,7 @@ static void tcp_try_keep_open(struct sock *sk)
>  	struct tcp_sock *tp = tcp_sk(sk);
>  	int state = TCP_CA_Open;
>  
> -	if (tcp_left_out(tp) || tcp_any_retrans_done(sk) || tp->undo_marker)
> +	if (tcp_left_out(tp) || tcp_any_retrans_done(sk))
>  		state = TCP_CA_Disorder;
>  
>  	if (inet_csk(sk)->icsk_ca_state != state) {
> @@ -3066,17 +3066,6 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked,
>  			}
>  			break;
>  
> -		case TCP_CA_Disorder:
> -			tcp_try_undo_dsack(sk);
> -			if (!tp->undo_marker ||
> -			    /* For SACK case do not Open to allow to undo
> -			     * catching for all duplicate ACKs. */
> -			    tcp_is_reno(tp) || tp->snd_una != tp->high_seq) {
> -				tp->undo_marker = 0;
> -				tcp_set_ca_state(sk, TCP_CA_Open);
> -			}
> -			break;
> -
>  		case TCP_CA_Recovery:
>  			if (tcp_is_reno(tp))
>  				tcp_reset_reno_sack(tp);
> @@ -3117,7 +3106,7 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked,
>  				tcp_add_reno_sack(sk);
>  		}
>  
> -		if (icsk->icsk_ca_state == TCP_CA_Disorder)
> +		if (icsk->icsk_ca_state <= TCP_CA_Disorder)
>  			tcp_try_undo_dsack(sk);
>  
>  		if (!tcp_time_to_recover(sk)) {

How about extending the Disorder state until the second cumulative ACK
that acks >= high_seq?

-- 
 i.


* Re: Regarding Routing cache
From: Ajith Adapa @ 2011-11-17  6:04 UTC
  To: netdev
In-Reply-To: <CADAe=+Lmp=e3NRH5NOYewJp=XZ0CncPzC-4V=Ct-TEuGjfq3nw@mail.gmail.com>

Hi,

Actually, my question is about IPv6 packets.

For an IPv6 packet, the ip6_route_output function is called to get
destination-related information, and ip6_route_output calls the
"fib6_rule_lookup" function. Why is the lookup done in the FIB table
instead of the routing cache for IPv6 packets?

For an IPv4 packet, ip_route_output checks the routing cache and, on a
cache miss, checks the FIB table.

Regards,
Ajith






* [PATCH 1/1]  PHY configuration for compatible issue
From: AriesLee @ 2011-11-17 14:05 UTC
  To: Guo-Fu Tseng, netdev; +Cc: AriesLee, Aries Lee

From: Aries Lee <AriesLee@jmicron.com>

Perform PHY calibration and set an EA value chosen by chip ID.
Whenever the NIC chip powers on, i.e. when booting or resuming, we need
to force the HW to calibrate the PHY parameters again and to set a
proper EA value, which was determined by experiment.

This process resolves the compatibility issue (the NIC is unable to
link up in some special cases) at gigabit speed.

Signed-off-by: Aries Lee <AriesLee@jmicron.com>
---
 drivers/net/ethernet/jme.c |  127 ++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/ethernet/jme.h |   19 +++++++
 2 files changed, 143 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c
index df3ab83..bd9633d 100644
--- a/drivers/net/ethernet/jme.c
+++ b/drivers/net/ethernet/jme.c
@@ -1743,6 +1743,126 @@ jme_phy_off(struct jme_adapter *jme)
 	if (new_phy_power_ctrl(jme->chip_main_rev))
 		jme_new_phy_off(jme);
 }
+static int
+jme_phy_calibration(struct jme_adapter *jme)
+{
+	u32 ctrl1000, bmcr, phy_addr, phy_data;
+
+	/*  Turn PHY off */
+	bmcr = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_BMCR);
+	bmcr |= BMCR_PDOWN;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_BMCR, bmcr);
+	/*  Turn PHY on */
+	bmcr = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_BMCR);
+	bmcr &= ~BMCR_PDOWN;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_BMCR, bmcr);
+	/*  Enable PHY test mode 1 */
+	ctrl1000 = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_CTRL1000);
+	ctrl1000 &= ~PHY_GAD_TEST_MODE_MSK;
+	ctrl1000 |= PHY_GAD_TEST_MODE_1;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_CTRL1000, ctrl1000);
+
+
+	phy_addr = JM_PHY_SPEC_REG_READ | JM_PHY_EXT_COMM_2_REG;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_ADDR_REG,
+			phy_addr);
+	phy_data = jme_mdio_read(jme->dev, jme->mii_if.phy_id,
+			JM_PHY_SPEC_DATA_REG);
+
+	phy_data &= ~JM_PHY_EXT_COMM_2_CALI_MODE_0;
+	phy_data |= JM_PHY_EXT_COMM_2_CALI_LATCH |
+			JM_PHY_EXT_COMM_2_CALI_ENABLE;
+
+	phy_addr = JM_PHY_SPEC_REG_WRITE | JM_PHY_EXT_COMM_2_REG;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_DATA_REG,
+			phy_data);
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_ADDR_REG,
+			phy_addr);
+
+	msleep(PHY_CALIBRATION_DELAY);
+
+	phy_addr = JM_PHY_SPEC_REG_READ | JM_PHY_EXT_COMM_2_REG;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_ADDR_REG,
+			phy_addr);
+	phy_data = jme_mdio_read(jme->dev, jme->mii_if.phy_id,
+			JM_PHY_SPEC_DATA_REG);
+
+	phy_data &= ~(JM_PHY_EXT_COMM_2_CALI_ENABLE |
+			JM_PHY_EXT_COMM_2_CALI_MODE_0 |
+			JM_PHY_EXT_COMM_2_CALI_LATCH);
+
+	phy_addr = JM_PHY_SPEC_REG_WRITE | JM_PHY_EXT_COMM_2_REG;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_DATA_REG,
+			phy_data);
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_ADDR_REG,
+			phy_addr);
+
+	/*  Disable PHY test mode */
+	ctrl1000 = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_CTRL1000);
+	ctrl1000 &= ~PHY_GAD_TEST_MODE_MSK;
+	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_CTRL1000, ctrl1000);
+	return 0;
+}
+
+static int
+jme_phy_setEA(struct jme_adapter *jme)
+{
+	u32 phy_addr, phy_comm0 = 0, phy_comm1 = 0;
+	u8 nic_ctrl;
+
+	pci_read_config_byte(jme->pdev, PCI_PRIV_SHARE_NICCTRL, &nic_ctrl);
+	if ((nic_ctrl & 0x3) == JME_FLAG_PHYEA_ENABLE)
+		return 0;
+
+	switch (jme->pdev->device) {
+	case PCI_DEVICE_ID_JMICRON_JMC250:
+		if (((jme->chip_main_rev == 5) &&
+			((jme->chip_sub_rev == 0) || (jme->chip_sub_rev == 1) ||
+			(jme->chip_sub_rev == 3))) ||
+			(jme->chip_main_rev >= 6)) {
+			phy_comm0 = 0x008A;
+			phy_comm1 = 0x4109;
+		}
+		if ((jme->chip_main_rev == 3) &&
+			((jme->chip_sub_rev == 1) || (jme->chip_sub_rev == 2)))
+			phy_comm0 = 0xE088;
+		break;
+	case PCI_DEVICE_ID_JMICRON_JMC260:
+		if (((jme->chip_main_rev == 5) &&
+			((jme->chip_sub_rev == 0) || (jme->chip_sub_rev == 1) ||
+			(jme->chip_sub_rev == 3))) ||
+			(jme->chip_main_rev >= 6)) {
+			phy_comm0 = 0x008A;
+			phy_comm1 = 0x4109;
+		}
+		if ((jme->chip_main_rev == 3) &&
+			((jme->chip_sub_rev == 1) || (jme->chip_sub_rev == 2)))
+			phy_comm0 = 0xE088;
+		if ((jme->chip_main_rev == 2) && (jme->chip_sub_rev == 0))
+			phy_comm0 = 0x608A;
+		if ((jme->chip_main_rev == 2) && (jme->chip_sub_rev == 2))
+			phy_comm0 = 0x408A;
+		break;
+	default:
+		return -ENODEV;
+	}
+	if (phy_comm0) {
+		phy_addr = JM_PHY_SPEC_REG_WRITE | JM_PHY_EXT_COMM_0_REG;
+		jme_mdio_write(jme->dev, jme->mii_if.phy_id,
+				JM_PHY_SPEC_DATA_REG, phy_comm0);
+		jme_mdio_write(jme->dev, jme->mii_if.phy_id,
+				JM_PHY_SPEC_ADDR_REG, phy_addr);
+	}
+	if (phy_comm1) {
+		phy_addr = JM_PHY_SPEC_REG_WRITE | JM_PHY_EXT_COMM_1_REG;
+		jme_mdio_write(jme->dev, jme->mii_if.phy_id,
+				JM_PHY_SPEC_DATA_REG, phy_comm1);
+		jme_mdio_write(jme->dev, jme->mii_if.phy_id,
+				JM_PHY_SPEC_ADDR_REG, phy_addr);
+	}
+
+	return 0;
+}
 
 static int
 jme_open(struct net_device *netdev)
@@ -1769,7 +1889,8 @@ jme_open(struct net_device *netdev)
 		jme_set_settings(netdev, &jme->old_ecmd);
 	else
 		jme_reset_phy_processor(jme);
-
+	jme_phy_calibration(jme);
+	jme_phy_setEA(jme);
 	jme_reset_link(jme);
 
 	return 0;
@@ -3184,7 +3305,8 @@ jme_resume(struct device *dev)
 		jme_set_settings(netdev, &jme->old_ecmd);
 	else
 		jme_reset_phy_processor(jme);
-
+	jme_phy_calibration(jme);
+	jme_phy_setEA(jme);
 	jme_start_irq(jme);
 	netif_device_attach(netdev);
 
@@ -3239,4 +3361,3 @@ MODULE_DESCRIPTION("JMicron JMC2x0 PCI Express Ethernet driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_VERSION);
 MODULE_DEVICE_TABLE(pci, jme_pci_tbl);
-
diff --git a/drivers/net/ethernet/jme.h b/drivers/net/ethernet/jme.h
index 02ea27c..47e47a9 100644
--- a/drivers/net/ethernet/jme.h
+++ b/drivers/net/ethernet/jme.h
@@ -760,6 +760,25 @@ enum jme_rxmcs_bits {
 				  RXMCS_CHECKSUM,
 };
 
+/*	Extern PHY common register 2	*/
+
+#define PHY_GAD_TEST_MODE_1			0x00002000
+#define PHY_GAD_TEST_MODE_MSK			0x0000E000
+#define JM_PHY_SPEC_REG_READ			0x00004000
+#define JM_PHY_SPEC_REG_WRITE			0x00008000
+#define PHY_CALIBRATION_DELAY			20
+#define JM_PHY_SPEC_ADDR_REG			0x1E
+#define JM_PHY_SPEC_DATA_REG			0x1F
+
+#define JM_PHY_EXT_COMM_0_REG			0x30
+#define JM_PHY_EXT_COMM_1_REG			0x31
+#define JM_PHY_EXT_COMM_2_REG			0x32
+#define JM_PHY_EXT_COMM_2_CALI_ENABLE		0x01
+#define JM_PHY_EXT_COMM_2_CALI_MODE_0		0x02
+#define JM_PHY_EXT_COMM_2_CALI_LATCH		0x10
+#define PCI_PRIV_SHARE_NICCTRL			0xF5
+#define JME_FLAG_PHYEA_ENABLE			0x2
+
 /*
  * Wakeup Frame setup interface registers
  */
-- 
1.7.4.4


* Re: Regarding Routing cache
From: Eric Dumazet @ 2011-11-17  6:26 UTC
  To: Ajith Adapa; +Cc: netdev
In-Reply-To: <CADAe=+K8bf_VOe=KszDZRqZnxQH2VFS3C05mpG=RarN+R5BSbA@mail.gmail.com>

Please don't top-post on netdev.

On Thursday, 17 November 2011 at 11:34 +0530, Ajith Adapa wrote:
> Hi,
> 
> Actually, my question is about IPv6 packets.
> 
> For an IPv6 packet, the ip6_route_output function is called to get
> destination-related information, and ip6_route_output calls the
> "fib6_rule_lookup" function. Why is the lookup done in the FIB table
> instead of the routing cache for IPv6 packets?
> 
> For an IPv4 packet, ip_route_output checks the routing cache and, on a
> cache miss, checks the FIB table.
> 

IPv6 has no routing cache, and won't have one, since we are trying to
remove the IPv4 routing cache :)



* Re: [PATCH net-next v6 7/9] forcedeth: implement ndo_get_stats64() API
From: Eric Dumazet @ 2011-11-17  6:34 UTC
  To: David Decotigny
  Cc: netdev, linux-kernel, David S. Miller, Ian Campbell, Jeff Kirsher,
	Ben Hutchings, Jiri Pirko, Joe Perches, Szymon Janc,
	Richard Jones, Ayaz Abdulla
In-Reply-To: <185a0f33c69ff07e8fe3482828b368371420bf47.1321481064.git.david.decotigny@google.com>

On Wednesday, 16 November 2011 at 14:15 -0800, David Decotigny wrote:
> This commit implements the ndo_get_stats64() API for forcedeth. Since
> hardware stats are being updated from different contexts (process and
> timer), this commit adds synchronization. For software stats, it
> relies on the u64_stats_sync.h API.
> 
> Tested:
>   - 16-way SMP x86_64 ->
>     RX bytes:7244556582 (7.2 GB)  TX bytes:181904254 (181.9 MB)
>   - pktgen + loopback: identical rx_bytes/tx_bytes and rx_packets/tx_packets
> 
> 
> 
> Signed-off-by: David Decotigny <david.decotigny@google.com>
> ---
>  drivers/net/ethernet/nvidia/forcedeth.c |  197 +++++++++++++++++++++++--------
>  1 files changed, 146 insertions(+), 51 deletions(-)
> 


> +static struct rtnl_link_stats64*
> +nv_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *storage)
> +	__acquires(&netdev_priv(dev)->hwstats_lock)
> +	__releases(&netdev_priv(dev)->hwstats_lock)
>  {
>  	struct fe_priv *np = netdev_priv(dev);
> +	unsigned int syncp_start;
> +
> +	/*
> +	 * Note: because HW stats are not always available and for
> +	 * consistency reasons, the following ifconfig stats are
> +	 * managed by software: rx_bytes, tx_bytes, rx_packets and
> +	 * tx_packets. The related hardware stats reported by ethtool
> +	 * should be equivalent to these ifconfig stats, with 4
> +	 * additional bytes per packet (Ethernet FCS CRC), except for
> +	 * tx_packets when TSO kicks in.
> +	 */
> +
> +	/* software stats */
> +	do {
> +		syncp_start = u64_stats_fetch_begin(&np->swstats_rx_syncp);
> +		storage->rx_packets       = np->stat_rx_packets;
> +		storage->rx_bytes         = np->stat_rx_bytes;
> +		storage->rx_missed_errors = np->stat_rx_missed_errors;
> +	} while (u64_stats_fetch_retry(&np->swstats_rx_syncp, syncp_start));
> +
> +	do {
> +		syncp_start = u64_stats_fetch_begin(&np->swstats_tx_syncp);
> +		storage->tx_packets = np->stat_tx_packets;
> +		storage->tx_bytes   = np->stat_tx_bytes;
> +		storage->tx_dropped = np->stat_tx_dropped;
> +	} while (u64_stats_fetch_retry(&np->swstats_tx_syncp, syncp_start));
>  

I have no idea why you think u64_stats_fetch_begin() is safe on a 32-bit
arch here.

Hint: on a CONFIG_SMP=n build, only preemption is disabled in
u64_stats_fetch_begin().

So an interrupt could come in and change your counters while you are
reading them.

It's very unlikely, but it's possible.

You should use the _bh variants.
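
A small userspace model of the race described above, assuming a 32-bit CPU
that has to load a 64-bit counter as two 32-bit halves. The names are
invented for the sketch, and the "interrupt" is simulated by calling the
update path between the two loads:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 64-bit counter as a 32-bit CPU sees it: two separate words. */
struct split_counter {
	uint32_t lo;
	uint32_t hi;
};

/* Update path (think: RX softirq accounting one more unit). */
static void counter_inc(struct split_counter *c)
{
	assert(c != NULL);
	if (++c->lo == 0)
		c->hi++;	/* carry into the high word */
}

/*
 * Reader. When irq_between_loads is non-zero, the update path runs
 * between the two 32-bit loads, as an interrupt could if the reader
 * has only disabled preemption.
 */
static uint64_t counter_read(struct split_counter *c, int irq_between_loads)
{
	uint32_t lo, hi;

	assert(c != NULL);
	lo = c->lo;
	if (irq_between_loads)
		counter_inc(c);
	hi = c->hi;
	return ((uint64_t)hi << 32) | lo;
}
```

Starting from 0x00000000ffffffff, the interrupted read returns
0x00000001ffffffff, which is neither the old value nor the new one
(0x0000000100000000). Keeping the update path out for the duration of the
read, which is what the _bh variants are for here, avoids the torn value.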


* [PATCH net-next] r8169: Add 64bit statistics
From: Junchang Wang @ 2011-11-17  6:48 UTC
  To: romieu, nic_swsd, eric.dumazet; +Cc: netdev


Switch to use ndo_get_stats64 to get 64bit statistics.
Per cpu data is used to avoid lock operations.


Signed-off-by: Junchang Wang <junchangwang@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c |  113 ++++++++++++++++++++++++++++------
 1 files changed, 93 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index cdf66d6..0165646 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -670,11 +670,31 @@ struct rtl8169_counters {
 	__le16	tx_underun;
 };
 
+struct rtl8169_pcpu_stats {
+	u64			rx_packets;
+	u64			rx_bytes;
+	u64			tx_packets;
+	u64			tx_bytes;
+	struct u64_stats_sync	syncp;
+	/*
+	 * The following variables are updated
+	 * without syncp protection.
+	 */
+	unsigned long		rx_dropped;
+	unsigned long		tx_dropped;
+	unsigned long		rx_length_errors;
+	unsigned long		rx_errors;
+	unsigned long		rx_crc_errors;
+	unsigned long		rx_fifo_errors;
+	unsigned long		rx_missed_errors;
+};
+
 struct rtl8169_private {
 	void __iomem *mmio_addr;	/* memory map physical address */
 	struct pci_dev *pci_dev;
 	struct net_device *dev;
 	struct napi_struct napi;
+	struct rtl8169_pcpu_stats __percpu *pcpu_stats;
 	spinlock_t lock;
 	u32 msg_enable;
 	u16 txd_version;
@@ -766,7 +786,9 @@ static void rtl_hw_start(struct net_device *dev);
 static int rtl8169_close(struct net_device *dev);
 static void rtl_set_rx_mode(struct net_device *dev);
 static void rtl8169_tx_timeout(struct net_device *dev);
-static struct net_device_stats *rtl8169_get_stats(struct net_device *dev);
+static struct rtnl_link_stats64 *rtl8169_get_stats64(struct net_device *dev,
+						     struct rtnl_link_stats64
+						     *stats);
 static int rtl8169_rx_interrupt(struct net_device *, struct rtl8169_private *,
 				void __iomem *, u32 budget);
 static int rtl8169_change_mtu(struct net_device *dev, int new_mtu);
@@ -3454,7 +3476,7 @@ static void rtl_disable_msi(struct pci_dev *pdev, struct rtl8169_private *tp)
 static const struct net_device_ops rtl8169_netdev_ops = {
 	.ndo_open		= rtl8169_open,
 	.ndo_stop		= rtl8169_close,
-	.ndo_get_stats		= rtl8169_get_stats,
+	.ndo_get_stats64	= rtl8169_get_stats64,
 	.ndo_start_xmit		= rtl8169_start_xmit,
 	.ndo_tx_timeout		= rtl8169_tx_timeout,
 	.ndo_validate_addr	= eth_validate_addr,
@@ -4138,6 +4160,7 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	tp->rtl_fw = RTL_FIRMWARE_UNKNOWN;
 
+	tp->pcpu_stats = alloc_percpu(struct rtl8169_pcpu_stats);
 	rc = register_netdev(dev);
 	if (rc < 0)
 		goto err_out_msi_4;
@@ -4196,6 +4219,7 @@ static void __devexit rtl8169_remove_one(struct pci_dev *pdev)
 
 	cancel_delayed_work_sync(&tp->task);
 
 	unregister_netdev(dev);
+	free_percpu(tp->pcpu_stats);
 
 	rtl_release_firmware(tp);
@@ -5310,7 +5334,7 @@ static void rtl8169_tx_clear_range(struct rtl8169_private *tp, u32 start,
 			rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 					     tp->TxDescArray + entry);
 			if (skb) {
-				tp->dev->stats.tx_dropped++;
+				this_cpu_inc(tp->pcpu_stats->tx_dropped);
 				dev_kfree_skb(skb);
 				tx_skb->skb = NULL;
 			}
@@ -5562,12 +5586,12 @@ err_dma_1:
 	rtl8169_unmap_tx_skb(d, tp->tx_skb + entry, txd);
 err_dma_0:
 	dev_kfree_skb(skb);
-	dev->stats.tx_dropped++;
+	this_cpu_inc(tp->pcpu_stats->tx_dropped);
 	return NETDEV_TX_OK;
 
 err_stop_0:
 	netif_stop_queue(dev);
-	dev->stats.tx_dropped++;
+	this_cpu_inc(tp->pcpu_stats->tx_dropped);
 	return NETDEV_TX_BUSY;
 }
 
@@ -5641,8 +5665,13 @@ static void rtl8169_tx_interrupt(struct net_device *dev,
 		rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 				     tp->TxDescArray + entry);
 		if (status & LastFrag) {
-			dev->stats.tx_packets++;
-			dev->stats.tx_bytes += tx_skb->skb->len;
+			struct rtl8169_pcpu_stats *pcpu_stats;
+
+			pcpu_stats = this_cpu_ptr(tp->pcpu_stats);
+			u64_stats_update_begin(&pcpu_stats->syncp);
+			pcpu_stats->tx_packets++;
+			pcpu_stats->tx_bytes += tx_skb->skb->len;
+			u64_stats_update_end(&pcpu_stats->syncp);
 			dev_kfree_skb(tx_skb->skb);
 			tx_skb->skb = NULL;
 		}
@@ -5728,20 +5757,21 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 		if (unlikely(status & RxRES)) {
 			netif_info(tp, rx_err, dev, "Rx ERROR. status = %08x\n",
 				   status);
-			dev->stats.rx_errors++;
+			this_cpu_inc(tp->pcpu_stats->rx_errors);
 			if (status & (RxRWT | RxRUNT))
-				dev->stats.rx_length_errors++;
+				this_cpu_inc(tp->pcpu_stats->rx_length_errors);
 			if (status & RxCRC)
-				dev->stats.rx_crc_errors++;
+				this_cpu_inc(tp->pcpu_stats->rx_crc_errors);
 			if (status & RxFOVF) {
 				rtl8169_schedule_work(dev, rtl8169_reset_task);
-				dev->stats.rx_fifo_errors++;
+				this_cpu_inc(tp->pcpu_stats->rx_fifo_errors);
 			}
 			rtl8169_mark_to_asic(desc, rx_buf_sz);
 		} else {
 			struct sk_buff *skb;
 			dma_addr_t addr = le64_to_cpu(desc->addr);
 			int pkt_size = (status & 0x00003fff) - 4;
+			struct rtl8169_pcpu_stats *pcpu_stats;
 
 			/*
 			 * The driver does not support incoming fragmented
@@ -5749,8 +5779,8 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 			 * sized frames.
 			 */
 			if (unlikely(rtl8169_fragmented_frame(status))) {
-				dev->stats.rx_dropped++;
-				dev->stats.rx_length_errors++;
+				this_cpu_inc(tp->pcpu_stats->rx_dropped);
+				this_cpu_inc(tp->pcpu_stats->rx_length_errors);
 				rtl8169_mark_to_asic(desc, rx_buf_sz);
 				continue;
 			}
@@ -5759,7 +5789,7 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 						  tp, pkt_size, addr);
 			rtl8169_mark_to_asic(desc, rx_buf_sz);
 			if (!skb) {
-				dev->stats.rx_dropped++;
+				this_cpu_inc(tp->pcpu_stats->rx_dropped);
 				continue;
 			}
 
@@ -5771,8 +5801,11 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 
 			napi_gro_receive(&tp->napi, skb);
 
-			dev->stats.rx_bytes += pkt_size;
-			dev->stats.rx_packets++;
+			pcpu_stats = this_cpu_ptr(tp->pcpu_stats);
+			u64_stats_update_begin(&pcpu_stats->syncp);
+			pcpu_stats->rx_bytes += pkt_size;
+			pcpu_stats->rx_packets++;
+			u64_stats_update_end(&pcpu_stats->syncp);
 		}
 
 		/* Work around for AMD plateform. */
@@ -5916,7 +5949,8 @@ static void rtl8169_rx_missed(struct net_device *dev, void __iomem *ioaddr)
 	if (tp->mac_version > RTL_GIGA_MAC_VER_06)
 		return;
 
-	dev->stats.rx_missed_errors += (RTL_R32(RxMissed) & 0xffffff);
+	this_cpu_add(tp->pcpu_stats->rx_missed_errors,
+		     (RTL_R32(RxMissed) & 0xffffff));
 	RTL_W32(RxMissed, 0);
 }
 
@@ -6034,16 +6068,24 @@ static void rtl_set_rx_mode(struct net_device *dev)
 }
 
 /**
- *  rtl8169_get_stats - Get rtl8169 read/write statistics
+ *  rtl8169_get_stats64 - Get rtl8169 read/write statistics
  *  @dev: The Ethernet Device to get statistics for
  *
  *  Get TX/RX statistics for rtl8169
  */
-static struct net_device_stats *rtl8169_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64 *
+rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->mmio_addr;
+	struct rtl8169_pcpu_stats *pcpu_stats;
+	u64 rx_packets, rx_bytes, tx_packets, tx_bytes;
+	unsigned long rx_dropped = 0, tx_dropped = 0, rx_length_errors = 0;
+	unsigned long rx_errors = 0, rx_crc_errors = 0, rx_fifo_errors = 0;
+	unsigned long rx_missed_errors = 0;
 	unsigned long flags;
+	unsigned int start;
+	int i;
 
 	if (netif_running(dev)) {
 		spin_lock_irqsave(&tp->lock, flags);
@@ -6051,7 +6093,38 @@ static struct net_device_stats *rtl8169_get_stats(struct net_device *dev)
 		spin_unlock_irqrestore(&tp->lock, flags);
 	}
 
-	return &dev->stats;
+	for_each_possible_cpu(i) {
+		pcpu_stats = per_cpu_ptr(tp->pcpu_stats, i);
+		do {
+			start = u64_stats_fetch_begin_bh(&pcpu_stats->syncp);
+			rx_packets	= pcpu_stats->rx_packets;
+			rx_bytes	= pcpu_stats->rx_bytes;
+			tx_packets	= pcpu_stats->tx_packets;
+			tx_bytes	= pcpu_stats->tx_bytes;
+		} while (u64_stats_fetch_retry_bh(&pcpu_stats->syncp, start));
+
+		stats->rx_packets	+= rx_packets;
+		stats->rx_bytes		+= rx_bytes;
+		stats->tx_packets	+= tx_packets;
+		stats->tx_bytes		+= tx_bytes;
+
+		rx_dropped		+= pcpu_stats->rx_dropped;
+		tx_dropped		+= pcpu_stats->tx_dropped;
+		rx_length_errors	+= pcpu_stats->rx_length_errors;
+		rx_errors		+= pcpu_stats->rx_errors;
+		rx_crc_errors		+= pcpu_stats->rx_crc_errors;
+		rx_fifo_errors		+= pcpu_stats->rx_fifo_errors;
+		rx_missed_errors	+= pcpu_stats->rx_missed_errors;
+	}
+	stats->rx_dropped	= rx_dropped;
+	stats->tx_dropped	= tx_dropped;
+	stats->rx_length_errors = rx_length_errors;
+	stats->rx_errors	= rx_errors;
+	stats->rx_crc_errors	= rx_crc_errors;
+	stats->rx_fifo_errors	= rx_fifo_errors;
+	stats->rx_missed_errors = rx_missed_errors;
+
+	return stats;
 }
 
 static void rtl8169_net_suspend(struct net_device *dev)


* Re: [PATCH net-next] r8169: Add 64bit statistics
From: Stephen Hemminger @ 2011-11-17  7:03 UTC
  To: Junchang Wang; +Cc: netdev, romieu, nic swsd, eric dumazet
In-Reply-To: <20111117064826.GA4429@Desktop-Junchang>


> Switch to use ndo_get_stats64 to get 64bit statistics.
> Per cpu data is used to avoid lock operations.
> 
> 
> Signed-off-by: Junchang Wang <junchangwang@gmail.com>

This was recently brought up in the proposed forcedeth patch.
You don't need per-cpu data, since Tx is locked by dev->xmit_lock and
Rx is implicitly single-threaded by NAPI. You do need two
u64_stats_sync entries (one for Tx and one for Rx).
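
Roughly, the layout being suggested could look like the userspace model
below: one plain stats struct, with a separate sequence counter for the
Rx group and for the Tx group. All names are hypothetical, seq_sync is
only a single-threaded stand-in for u64_stats_sync, and a real
implementation needs the memory barriers this sketch leaves out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single-threaded model of a sequence counter (no barriers). */
struct seq_sync {
	unsigned int seq;
};

static void sync_update_begin(struct seq_sync *s) { s->seq++; }	/* odd: write in progress */
static void sync_update_end(struct seq_sync *s)   { s->seq++; }	/* even: stable */

static unsigned int sync_fetch_begin(const struct seq_sync *s)
{
	return s->seq;
}

/* Non-zero if the snapshot raced with a writer and must be retried. */
static int sync_fetch_retry(const struct seq_sync *s, unsigned int start)
{
	return (start & 1) || s->seq != start;
}

struct nic_stats {
	struct seq_sync rx_sync;	/* guards rx_* only */
	uint64_t rx_packets, rx_bytes;
	struct seq_sync tx_sync;	/* guards tx_* only */
	uint64_t tx_packets, tx_bytes;
};

static void nic_rx_account(struct nic_stats *st, unsigned int len)
{
	sync_update_begin(&st->rx_sync);
	st->rx_packets++;
	st->rx_bytes += len;
	sync_update_end(&st->rx_sync);
}

static void nic_tx_account(struct nic_stats *st, unsigned int len)
{
	sync_update_begin(&st->tx_sync);
	st->tx_packets++;
	st->tx_bytes += len;
	sync_update_end(&st->tx_sync);
}

/* Take a consistent snapshot of the Rx counters. */
static void nic_read_rx(const struct nic_stats *st,
			uint64_t *packets, uint64_t *bytes)
{
	unsigned int start;

	assert(packets != NULL && bytes != NULL);
	do {
		start = sync_fetch_begin(&st->rx_sync);
		*packets = st->rx_packets;
		*bytes = st->rx_bytes;
	} while (sync_fetch_retry(&st->rx_sync, start));
}
```

Because each group has a single writer at a time, the Rx and Tx paths
never touch each other's counter, and a reader of the Rx numbers is not
forced to retry by Tx activity.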


* [PATCH net-next] IPV6 Fix a crash when trying to replace non existing route
From: Matti Vaittinen @ 2011-11-17  7:18 UTC
  To: davem; +Cc: netdev


This patch fixes a crash that occurs when an attempt is made to replace
a nonexistent IPv6 route.

When a new destination node was inserted in the middle of the FIB6 tree,
no relevant sanity checks were performed. The later route insertion could
then be rejected as an invalid request, leaving a node with no rt info in
the tree. When this node was accessed, a crash occurred.

The patch adds the missing checks in fib6_add_1().
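
The intended policy when no matching node exists can be summarised with
the standalone sketch below. It is only a model of the decision, not the
kernel function; the enum and the function name are invented, while
allow_create and replace_required mirror the variables used in the diff:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* What to do once the tree walk finds no node for the prefix. */
enum fib6_action {
	FIB6_CREATE_NODE,	/* NLM_F_CREATE given: create silently */
	FIB6_CREATE_NODE_WARN	/* no NLM_F_CREATE: create, but warn */
};

/*
 * Returns 0 and sets *action when a node may be created, or -ENOENT
 * when a replace was requested and there is nothing to replace. In
 * the -ENOENT case no node is added, so none is left without rt info.
 */
static int fib6_missing_node_policy(bool allow_create, bool replace_required,
				    enum fib6_action *action)
{
	assert(action != NULL);

	if (!allow_create) {
		if (replace_required)
			return -ENOENT;
		*action = FIB6_CREATE_NODE_WARN;
		return 0;
	}
	*action = FIB6_CREATE_NODE;
	return 0;
}
```

The same decision is applied at both places where fib6_add_1() would
otherwise add a node: when inserting above an existing node and when
appending a new leaf.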


Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com>
--
diff -uNr net-next-229a66e.orig/net/ipv6/ip6_fib.c net-next-229a66e.new/net/ipv6/ip6_fib.c
--- net-next-229a66e.orig/net/ipv6/ip6_fib.c	2011-11-16 16:03:27.000000000 +0200
+++ net-next-229a66e.new/net/ipv6/ip6_fib.c	2011-11-16 16:15:25.000000000 +0200
@@ -449,9 +449,15 @@
 		 */
 		if (plen < fn->fn_bit ||
 		    !ipv6_prefix_equal(&key->addr, addr, fn->fn_bit)) {
-			if (!allow_create)
+			if (!allow_create) {
+				if (replace_required) {
+					printk(KERN_WARNING
+					    "IPv6: Can't replace route, no match found\n");
+					return ERR_PTR(-ENOENT);
+				}
 				printk(KERN_WARNING
 				    "IPv6: NLM_F_CREATE should be set when creating new route\n");
+			}
 			goto insert_above;
 		}
 
@@ -482,7 +488,7 @@
 		fn = dir ? fn->right: fn->left;
 	} while (fn);
 
-	if (replace_required && !allow_create) {
+	if (!allow_create) {
 		/* We should not create new node because
 		 * NLM_F_REPLACE was specified without NLM_F_CREATE
 		 * I assume it is safe to require NLM_F_CREATE when
@@ -492,16 +498,17 @@
 		 * MUST be specified if new route is created.
 		 * That would keep IPv6 consistent with IPv4
 		 */
-		printk(KERN_WARNING
-		    "IPv6: NLM_F_CREATE should be set when creating new route - ignoring request\n");
-		return ERR_PTR(-ENOENT);
+		if (replace_required) {
+			printk(KERN_WARNING
+			    "IPv6: Can't replace route, no match found\n");
+			return ERR_PTR(-ENOENT);
+		}
+		printk(KERN_WARNING "IPv6: NLM_F_CREATE should be set when creating new route\n");
 	}
 	/*
 	 *	We walked to the bottom of tree.
 	 *	Create new leaf node without children.
 	 */
-	if (!allow_create)
-		printk(KERN_WARNING "IPv6: NLM_F_CREATE should be set when creating new route\n");
 
 	ln = node_alloc();
 


* Re: [PATCH net-next] r8169: Add 64bit statistics
From: Eric Dumazet @ 2011-11-17  7:21 UTC
  To: Junchang Wang; +Cc: romieu, nic_swsd, netdev
In-Reply-To: <20111117064826.GA4429@Desktop-Junchang>

On Thursday, 17 November 2011 at 14:48 +0800, Junchang Wang wrote:
> Switch to use ndo_get_stats64 to get 64bit statistics.
> Per cpu data is used to avoid lock operations.
> 
> 
> Signed-off-by: Junchang Wang <junchangwang@gmail.com>
> ---
>  drivers/net/ethernet/realtek/r8169.c |  113 ++++++++++++++++++++++++++++------
>  1 files changed, 93 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index cdf66d6..0165646 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -670,11 +670,31 @@ struct rtl8169_counters {
>  	__le16	tx_underun;
>  };
>  
> +struct rtl8169_pcpu_stats {
> +	u64			rx_packets;
> +	u64			rx_bytes;
> +	u64			tx_packets;
> +	u64			tx_bytes;
> +	struct u64_stats_sync	syncp;
> +	/*
> +	 * The following variables are updated
> +	 * without syncp protection.
> +	 */
> +	unsigned long		rx_dropped;
> +	unsigned long		tx_dropped;
> +	unsigned long		rx_length_errors;
> +	unsigned long		rx_errors;
> +	unsigned long		rx_crc_errors;
> +	unsigned long		rx_fifo_errors;
> +	unsigned long		rx_missed_errors;
> +};
> +

That's overkill. Don't copy what has been done for virtual devices
(loopback, tunnels, ...).

RX and TX paths are serialized (only one CPU can run at a time).

^ permalink raw reply

* Re: [PATCH 1/1]  PHY configuration for compatible issue
From: Guo-Fu Tseng @ 2011-11-17  7:15 UTC (permalink / raw)
  To: AriesLee, netdev; +Cc: AriesLee
In-Reply-To: <1321538742-3701-1-git-send-email-AriesLee@jmicron.com>

On Thu, 17 Nov 2011 22:05:42 +0800, AriesLee wrote
> From: Aries Lee <AriesLee@jmicron.com>
> 
> Perform PHY calibration and set a different EA value per chip ID.
> Whenever the NIC chip powers on, i.e. on boot or resume, we need to
> force the HW to calibrate the PHY parameters again and to set a proper
> EA value, which was determined experimentally.
> 
> This resolves the compatibility issue (the NIC is unable to link
> up in some special cases) at gigabit speed.
Thank you Aries.

Here are some suggestions after a quick review:

It would be better to implement read/write functions for the
extended PHY registers, instead of using JM_PHY_SPEC_ADDR_REG
and JM_PHY_SPEC_DATA_REG directly every time.

The jme_phy_on() and jme_phy_off() functions are already in place.
Could you simply use them?
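
As a rough illustration of the helper pair suggested above, here is a
userspace sketch. The register numbers and flags are taken from the
quoted jme.h hunk, and the access order (address with READ flag, then
data; data, then address with WRITE flag) follows the quoted patch.
Everything else is an assumption made for the example: struct mock_phy,
mock_mdio_read()/mock_mdio_write() and the names phy_specreg_read() and
phy_specreg_write() are hypothetical stand-ins, not existing driver
functions, and the mock's latch behaviour is only a guess at how the
hardware reacts.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers and flags as in the quoted jme.h hunk. */
#define JM_PHY_SPEC_ADDR_REG	0x1E
#define JM_PHY_SPEC_DATA_REG	0x1F
#define JM_PHY_SPEC_REG_READ	0x4000
#define JM_PHY_SPEC_REG_WRITE	0x8000
#define JM_PHY_EXT_COMM_0_REG	0x30

/* Hypothetical stand-in for the PHY behind jme_mdio_read()/jme_mdio_write(). */
struct mock_phy {
	uint16_t regs[32];	/* standard MII register space */
	uint16_t ext[256];	/* extended registers, reached via 0x1E/0x1F */
};

static uint16_t mock_mdio_read(struct mock_phy *phy, int reg)
{
	assert(reg >= 0 && reg < 32);
	return phy->regs[reg];
}

static void mock_mdio_write(struct mock_phy *phy, int reg, uint16_t val)
{
	assert(reg >= 0 && reg < 32);
	phy->regs[reg] = val;
	if (reg != JM_PHY_SPEC_ADDR_REG)
		return;
	/* Assumed behaviour: writing the address register triggers the transfer. */
	if (val & JM_PHY_SPEC_REG_WRITE)
		phy->ext[val & 0xff] = phy->regs[JM_PHY_SPEC_DATA_REG];
	else if (val & JM_PHY_SPEC_REG_READ)
		phy->regs[JM_PHY_SPEC_DATA_REG] = phy->ext[val & 0xff];
}

/* Read one extended register: select it with the READ flag, then fetch the data. */
static uint16_t phy_specreg_read(struct mock_phy *phy, uint8_t ext_reg)
{
	mock_mdio_write(phy, JM_PHY_SPEC_ADDR_REG,
			JM_PHY_SPEC_REG_READ | ext_reg);
	return mock_mdio_read(phy, JM_PHY_SPEC_DATA_REG);
}

/* Write one extended register: load the data, then commit with the WRITE flag. */
static void phy_specreg_write(struct mock_phy *phy, uint8_t ext_reg,
			      uint16_t val)
{
	mock_mdio_write(phy, JM_PHY_SPEC_DATA_REG, val);
	mock_mdio_write(phy, JM_PHY_SPEC_ADDR_REG,
			JM_PHY_SPEC_REG_WRITE | ext_reg);
}
```

With helpers of this shape, each read-modify-write block in
jme_phy_calibration() and each write in jme_phy_setEA() would shrink to
one or two calls.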

> 
> Signed-off-by: Aries Lee <AriesLee@jmicron.com>
> ---
>  drivers/net/ethernet/jme.c |  127 ++++++++++++++++++++++++++++++++++++++++++-
>  drivers/net/ethernet/jme.h |   19 +++++++
>  2 files changed, 143 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c
> index df3ab83..bd9633d 100644
> --- a/drivers/net/ethernet/jme.c
> +++ b/drivers/net/ethernet/jme.c
> @@ -1743,6 +1743,126 @@ jme_phy_off(struct jme_adapter *jme)
>  	if (new_phy_power_ctrl(jme->chip_main_rev))
>  		jme_new_phy_off(jme);
>  }
> +static int
> +jme_phy_calibration(struct jme_adapter *jme)
> +{
> +	u32 ctrl1000, bmcr, phy_addr, phy_data;
> +
> +	/*  Turn PHY off */
> +	bmcr = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_BMCR);
> +	bmcr |= BMCR_PDOWN;
> +	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_BMCR, bmcr);
> +	/*  Turn PHY on */
> +	bmcr = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_BMCR);
> +	bmcr &= ~BMCR_PDOWN;
> +	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_BMCR, bmcr);
> +	/*  Enable PHY test mode 1 */
> +	ctrl1000 = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_CTRL1000);
> +	ctrl1000 &= ~PHY_GAD_TEST_MODE_MSK;
> +	ctrl1000 |= PHY_GAD_TEST_MODE_1;
> +	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_CTRL1000, ctrl1000);
> +
> +
> +	phy_addr = JM_PHY_SPEC_REG_READ | JM_PHY_EXT_COMM_2_REG;
> +	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_ADDR_REG,
> +			phy_addr);
> +	phy_data = jme_mdio_read(jme->dev, jme->mii_if.phy_id,
> +			JM_PHY_SPEC_DATA_REG);
> +
> +	phy_data &= ~JM_PHY_EXT_COMM_2_CALI_MODE_0;
> +	phy_data |= JM_PHY_EXT_COMM_2_CALI_LATCH |
> +			JM_PHY_EXT_COMM_2_CALI_ENABLE;
> +
> +	phy_addr = JM_PHY_SPEC_REG_WRITE | JM_PHY_EXT_COMM_2_REG;
> +	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_DATA_REG,
> +			phy_data);
> +	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_ADDR_REG,
> +			phy_addr);
> +
> +	msleep(20);
> +
> +	phy_addr = JM_PHY_SPEC_REG_READ | JM_PHY_EXT_COMM_2_REG;
> +	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_ADDR_REG,
> +			phy_addr);
> +	phy_data = jme_mdio_read(jme->dev, jme->mii_if.phy_id,
> +			JM_PHY_SPEC_DATA_REG);
> +
> +	phy_data &= ~(JM_PHY_EXT_COMM_2_CALI_ENABLE |
> +			JM_PHY_EXT_COMM_2_CALI_MODE_0 |
> +			JM_PHY_EXT_COMM_2_CALI_LATCH);
> +
> +	phy_addr = JM_PHY_SPEC_REG_WRITE | JM_PHY_EXT_COMM_2_REG;
> +	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_DATA_REG,
> +			phy_data);
> +	jme_mdio_write(jme->dev, jme->mii_if.phy_id, JM_PHY_SPEC_ADDR_REG,
> +			phy_addr);
> +
> +	/*  Disable PHY test mode */
> +	ctrl1000 = jme_mdio_read(jme->dev, jme->mii_if.phy_id, MII_CTRL1000);
> +	ctrl1000 &= ~PHY_GAD_TEST_MODE_MSK;
> +	jme_mdio_write(jme->dev, jme->mii_if.phy_id, MII_CTRL1000, ctrl1000);
> +	return 0;
> +}
> +
> +static int
> +jme_phy_setEA(struct jme_adapter *jme)
> +{
> +	u32 phy_addr, phy_comm0 = 0, phy_comm1 = 0;
> +	u8 nic_ctrl;
> +
> +	pci_read_config_byte(jme->pdev, PCI_PRIV_SHARE_NICCTRL, &nic_ctrl);
> +	if ((nic_ctrl & 0x3) == JME_FLAG_PHYEA_ENABLE)
> +		return 0;
> +
> +	switch (jme->pdev->device) {
> +	case PCI_DEVICE_ID_JMICRON_JMC250:
> +		if (((jme->chip_main_rev == 5) &&
> +			((jme->chip_sub_rev == 0) || (jme->chip_sub_rev == 1) ||
> +			(jme->chip_sub_rev == 3))) ||
> +			(jme->chip_main_rev >= 6)) {
> +			phy_comm0 = 0x008A;
> +			phy_comm1 = 0x4109;
> +		}
> +		if ((jme->chip_main_rev == 3) &&
> +			((jme->chip_sub_rev == 1) || (jme->chip_sub_rev == 2)))
> +			phy_comm0 = 0xE088;
> +		break;
> +	case PCI_DEVICE_ID_JMICRON_JMC260:
> +		if (((jme->chip_main_rev == 5) &&
> +			((jme->chip_sub_rev == 0) || (jme->chip_sub_rev == 1) ||
> +			(jme->chip_sub_rev == 3))) ||
> +			(jme->chip_main_rev >= 6)) {
> +			phy_comm0 = 0x008A;
> +			phy_comm1 = 0x4109;
> +		}
> +		if ((jme->chip_main_rev == 3) &&
> +			((jme->chip_sub_rev == 1) || (jme->chip_sub_rev == 2)))
> +			phy_comm0 = 0xE088;
> +		if ((jme->chip_main_rev == 2) && (jme->chip_sub_rev == 0))
> +			phy_comm0 = 0x608A;
> +		if ((jme->chip_main_rev == 2) && (jme->chip_sub_rev == 2))
> +			phy_comm0 = 0x408A;
> +		break;
> +	default:
> +		return -ENODEV;
> +	}
> +	if (phy_comm0) {
> +		phy_addr = JM_PHY_SPEC_REG_WRITE | JM_PHY_EXT_COMM_0_REG;
> +		jme_mdio_write(jme->dev, jme->mii_if.phy_id,
> +				JM_PHY_SPEC_DATA_REG, phy_comm0);
> +		jme_mdio_write(jme->dev, jme->mii_if.phy_id,
> +				JM_PHY_SPEC_ADDR_REG, phy_addr);
> +	}
> +	if (phy_comm1) {
> +		phy_addr = JM_PHY_SPEC_REG_WRITE | JM_PHY_EXT_COMM_1_REG;
> +		jme_mdio_write(jme->dev, jme->mii_if.phy_id,
> +				JM_PHY_SPEC_DATA_REG, phy_comm1);
> +		jme_mdio_write(jme->dev, jme->mii_if.phy_id,
> +				JM_PHY_SPEC_ADDR_REG, phy_addr);
> +	}
> +
> +	return 0;
> +}
> 
>  static int
>  jme_open(struct net_device *netdev)
> @@ -1769,7 +1889,8 @@ jme_open(struct net_device *netdev)
>  		jme_set_settings(netdev, &jme->old_ecmd);
>  	else
>  		jme_reset_phy_processor(jme);
> -
> +	jme_phy_calibration(jme);
> +	jme_phy_setEA(jme);
>  	jme_reset_link(jme);
> 
>  	return 0;
> @@ -3184,7 +3305,8 @@ jme_resume(struct device *dev)
>  		jme_set_settings(netdev, &jme->old_ecmd);
>  	else
>  		jme_reset_phy_processor(jme);
> -
> +	jme_phy_calibration(jme);
> +	jme_phy_setEA(jme);
>  	jme_start_irq(jme);
>  	netif_device_attach(netdev);
> 
> @@ -3239,4 +3361,3 @@ MODULE_DESCRIPTION("JMicron JMC2x0 PCI Express Ethernet driver");
>  MODULE_LICENSE("GPL");
>  MODULE_VERSION(DRV_VERSION);
>  MODULE_DEVICE_TABLE(pci, jme_pci_tbl);
> -
> diff --git a/drivers/net/ethernet/jme.h b/drivers/net/ethernet/jme.h
> index 02ea27c..47e47a9 100644
> --- a/drivers/net/ethernet/jme.h
> +++ b/drivers/net/ethernet/jme.h
> @@ -760,6 +760,25 @@ enum jme_rxmcs_bits {
>  				  RXMCS_CHECKSUM,
>  };
> 
> +/*	Extern PHY common register 2	*/
> +
> +#define PHY_GAD_TEST_MODE_1			0x00002000
> +#define PHY_GAD_TEST_MODE_MSK			0x0000E000
> +#define JM_PHY_SPEC_REG_READ			0x00004000
> +#define JM_PHY_SPEC_REG_WRITE			0x00008000
> +#define PHY_CALIBRATION_DELAY			20
> +#define JM_PHY_SPEC_ADDR_REG			0x1E
> +#define JM_PHY_SPEC_DATA_REG			0x1F
> +
> +#define JM_PHY_EXT_COMM_0_REG			0x30
> +#define JM_PHY_EXT_COMM_1_REG			0x31
> +#define JM_PHY_EXT_COMM_2_REG			0x32
> +#define JM_PHY_EXT_COMM_2_CALI_ENABLE		0x01
> +#define JM_PHY_EXT_COMM_2_CALI_MODE_0		0x02
> +#define JM_PHY_EXT_COMM_2_CALI_LATCH		0x10
> +#define PCI_PRIV_SHARE_NICCTRL			0xF5
> +#define JME_FLAG_PHYEA_ENABLE			0x2
> +
>  /*
>   * Wakeup Frame setup interface registers
>   */
> -- 
> 1.7.4.4
> 


Guo-Fu Tseng

^ permalink raw reply

* Re: Regarding Routing cache
From: Eric Dumazet @ 2011-11-17  7:26 UTC (permalink / raw)
  To: Ajith Adapa; +Cc: netdev
In-Reply-To: <CADAe=+Jc0YinguPhTuv3sNA2bgEuuTpY9ocE7-_z-uKnsyC7Lw@mail.gmail.com>


Please, no private mails on this discussion. I added back netdev on CC.

Le jeudi 17 novembre 2011 à 12:09 +0530, Ajith Adapa a écrit :
> Hi,
> 
> Sorry about my naive post regarding the issue.
> 
> > IPv6 has no routing cache, and won't have one, since we are trying to
> > remove the IPv4 routing cache :)
> 
> But wouldn't that cause performance issues, since we always check the
> FIB table for the destination route?
> 
> Are there any references, posts or other material that explain the
> reasons for removing the IPv6 and IPv4 routing caches?
> 

The routing cache doesn't scale and was difficult to tune in some situations.

It uses a lot of memory and doesn't fit in CPU caches.

It was a good solution in the past, but it is better to have a scalable
lookup algorithm in the first place.

http://vger.kernel.org/netconf2011.html

http://vger.kernel.org/netconf2011_slides/davem_netconf2011.pdf

^ permalink raw reply

* Re: [PATCH net-next] r8169: Add 64bit statistics
From: Junchang Wang @ 2011-11-17  7:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: romieu, nic_swsd, netdev
In-Reply-To: <1321514484.3274.32.camel@edumazet-laptop>

On Thu, Nov 17, 2011 at 3:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 17 novembre 2011 à 14:48 +0800, Junchang Wang a écrit :
>> Switch to use ndo_get_stats64 to get 64bit statistics.
>> Per cpu data is used to avoid lock operations.
>>
>>
>> Signed-off-by: Junchang Wang <junchangwang@gmail.com>
>> ---
>>  drivers/net/ethernet/realtek/r8169.c |  113 ++++++++++++++++++++++++++++------
>>  1 files changed, 93 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
>> index cdf66d6..0165646 100644
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -670,11 +670,31 @@ struct rtl8169_counters {
>>       __le16  tx_underun;
>>  };
>>
>> +struct rtl8169_pcpu_stats {
>> +     u64                     rx_packets;
>> +     u64                     rx_bytes;
>> +     u64                     tx_packets;
>> +     u64                     tx_bytes;
>> +     struct u64_stats_sync   syncp;
>> +     /*
>> +      * The following variables are updated
>> +      * without syncp protection.
>> +      */
>> +     unsigned long           rx_dropped;
>> +     unsigned long           tx_dropped;
>> +     unsigned long           rx_length_errors;
>> +     unsigned long           rx_errors;
>> +     unsigned long           rx_crc_errors;
>> +     unsigned long           rx_fifo_errors;
>> +     unsigned long           rx_missed_errors;
>> +};
>> +
>
> That's overkill. Don't copy what has been done for virtual devices
> (loopback, tunnels, ...).
>
> RX and TX paths are serialized (only one CPU can run at a time).
>
Thanks. I'll submit a new version.



-- 
--Junchang

^ permalink raw reply

* Re: [PATCH net-next] r8169: Add 64bit statistics
From: Junchang Wang @ 2011-11-17  7:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, romieu, nic swsd, eric dumazet
In-Reply-To: <0a41dc06-15ab-4cf6-9bbd-3a1556993854@tahiti.vyatta.com>

> You don't need per-cpu since Tx is locked by dev->xmit_lock and
> rx is implicitly single threaded by NAPI.

Thanks.

> You do need to have
> two u64_stats_sync entries (one for Tx and one for Rx).

You mean that Rx and Tx can run on different cores at the same time,
so I need one sync for Tx to protect tx_xxx and another for Rx to
protect rx_xxx. Is that right?

Thanks.

-- 
--Junchang

^ permalink raw reply

* [PATCH 1/5] stmmac: use mdelay on timeout of sw reset
From: Giuseppe CAVALLARO @ 2011-11-17  7:57 UTC (permalink / raw)
  To: netdev; +Cc: francesco.virlinzi, srinivas.kandagatla

From: Francesco Virlinzi <francesco.virlinzi@st.com>

This patch uses mdelay() to make the timeout of the
SW reset independent of cpu_clk.

Signed-off-by: Francesco Virlinzi <francesco.virlinzi@st.com>
Reviewed-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c    |    3 ++-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c |    3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index da66ac5..4d5402a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -39,10 +39,11 @@ static int dwmac1000_dma_init(void __iomem *ioaddr, int pbl, u32 dma_tx,
 	/* DMA SW reset */
 	value |= DMA_BUS_MODE_SFT_RESET;
 	writel(value, ioaddr + DMA_BUS_MODE);
-	limit = 15000;
+	limit = 10;
 	while (limit--) {
 		if (!(readl(ioaddr + DMA_BUS_MODE) & DMA_BUS_MODE_SFT_RESET))
 			break;
+		mdelay(10);
 	}
 	if (limit < 0)
 		return -EBUSY;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
index 627f656..bc17fd0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
@@ -41,10 +41,11 @@ static int dwmac100_dma_init(void __iomem *ioaddr, int pbl, u32 dma_tx,
 	/* DMA SW reset */
 	value |= DMA_BUS_MODE_SFT_RESET;
 	writel(value, ioaddr + DMA_BUS_MODE);
-	limit = 15000;
+	limit = 10;
 	while (limit--) {
 		if (!(readl(ioaddr + DMA_BUS_MODE) & DMA_BUS_MODE_SFT_RESET))
 			break;
+		mdelay(10);
 	}
 	if (limit < 0)
 		return -EBUSY;
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH 2/5] stmmac: fix advertising 1000Base capabilties for non GMII iface
From: Giuseppe CAVALLARO @ 2011-11-17  7:57 UTC (permalink / raw)
  To: netdev; +Cc: francesco.virlinzi, srinivas.kandagatla
In-Reply-To: <1321516682-32208-1-git-send-email-peppe.cavallaro@st.com>

From: Srinivas Kandagatla <srinivas.kandagatla@st.com>

This patch fixes how the 1000Base capabilities are removed from the
advertised modes on non-GMII interfaces.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |   10 ++++------
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 20546bb..e079762 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -321,12 +321,10 @@ static int stmmac_init_phy(struct net_device *dev)
 	}
 
 	/* Stop Advertising 1000BASE Capability if interface is not GMII */
-	if ((interface) && ((interface == PHY_INTERFACE_MODE_MII) ||
-	    (interface == PHY_INTERFACE_MODE_RMII))) {
-		phydev->supported &= (PHY_BASIC_FEATURES | SUPPORTED_Pause |
-				      SUPPORTED_Asym_Pause);
-		phydev->advertising = phydev->supported;
-	}
+	if ((interface == PHY_INTERFACE_MODE_MII) ||
+	    (interface == PHY_INTERFACE_MODE_RMII))
+		phydev->advertising &= ~(SUPPORTED_1000baseT_Half |
+					 SUPPORTED_1000baseT_Full);
 
 	/*
 	 * Broken HW is sometimes missing the pull-up resistor on the
-- 
1.7.4.4
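
The effect of the new mask operation can be checked in isolation. The
sketch below is an illustration under assumptions: the bit positions
mirror the ADVERTISED_* link mode bits from linux/ethtool.h, while
enum phy_if and limit_advertising() are hypothetical names invented for
the example and do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Link mode bits, mirroring ADVERTISED_* in linux/ethtool.h. */
#define ADV_10baseT_Half	(1u << 0)
#define ADV_10baseT_Full	(1u << 1)
#define ADV_100baseT_Half	(1u << 2)
#define ADV_100baseT_Full	(1u << 3)
#define ADV_1000baseT_Half	(1u << 4)
#define ADV_1000baseT_Full	(1u << 5)
#define ADV_Autoneg		(1u << 6)

/* Hypothetical stand-in for the PHY_INTERFACE_MODE_* values. */
enum phy_if { PHY_IF_MII, PHY_IF_RMII, PHY_IF_GMII };

/* Clear only the gigabit modes; every other advertised bit is kept. */
static uint32_t limit_advertising(uint32_t advertising, enum phy_if interface)
{
	assert(interface <= PHY_IF_GMII);
	if (interface == PHY_IF_MII || interface == PHY_IF_RMII)
		advertising &= ~(ADV_1000baseT_Half | ADV_1000baseT_Full);
	return advertising;
}
```

Unlike the removed code, which rebuilt the advertised set from a fixed
feature list, this form leaves whatever else the PHY driver or the user
had already selected untouched.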

^ permalink raw reply related

* [PATCH 3/5] stmmac: parameters auto-tuning through HW cap reg
From: Giuseppe CAVALLARO @ 2011-11-17  7:58 UTC (permalink / raw)
  To: netdev; +Cc: francesco.virlinzi, srinivas.kandagatla, Giuseppe Cavallaro
In-Reply-To: <1321516682-32208-1-git-send-email-peppe.cavallaro@st.com>

New GMAC devices (newer than databook 3.50a) have a HW capability
register that reports which features are actually supported by the
hardware.

On old devices a lot of information has to be passed through the
platform, for example: enhanced descriptor structure,
TX COE etc. This is mandatory to configure the driver properly.
This remains valid because the driver has to support old
Synopsys devices, but it is now also able to override the platform
values with the ones from the HW capability register, if supported.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |    2 +-
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |    6 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |   93 ++++++++++++++------
 3 files changed, 72 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 9bafa6c..a140a8f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -72,7 +72,6 @@ struct stmmac_priv {
 	spinlock_t lock;
 	spinlock_t tx_lock;
 	int wolopts;
-	int wolenabled;
 	int wol_irq;
 #ifdef CONFIG_STMMAC_TIMER
 	struct stmmac_timer *tm;
@@ -80,6 +79,7 @@ struct stmmac_priv {
 	struct plat_stmmacenet_data *plat;
 	struct stmmac_counters mmc;
 	struct dma_features dma_cap;
+	int hw_cap_support;
 };
 
 extern int stmmac_mdio_unregister(struct net_device *ndev);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index e8eff09..0395f9e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -430,6 +430,12 @@ static int stmmac_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
 	struct stmmac_priv *priv = netdev_priv(dev);
 	u32 support = WAKE_MAGIC | WAKE_UCAST;
 
+	/* By default almost all GMAC devices support the WoL via
+	 * magic frame but we can disable it if the HW capability
+	 * register shows no support for pmt_magic_frame. */
+	if ((priv->hw_cap_support) && (!priv->dma_cap.pmt_magic_frame))
+		wol->wolopts &= ~WAKE_MAGIC;
+
 	if (!device_can_wakeup(priv->device))
 		return -EINVAL;
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index e079762..7f3ffd3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -805,8 +805,29 @@ static u32 stmmac_get_synopsys_id(struct stmmac_priv *priv)
 	return 0;
 }
 
-/* New GMAC chips support a new register to indicate the
- * presence of the optional feature/functions.
+/**
+ * stmmac_selec_desc_mode
+ * @priv : private device pointer
+ * Description: select the Enhanced/Alternate or Normal descriptors */
+static void stmmac_selec_desc_mode(struct stmmac_priv *priv)
+{
+	if (priv->plat->enh_desc) {
+		pr_info(" Enhanced/Alternate descriptors\n");
+		priv->hw->desc = &enh_desc_ops;
+	} else {
+		pr_info(" Normal descriptors\n");
+		priv->hw->desc = &ndesc_ops;
+	}
+}
+
+/**
+ * stmmac_get_hw_features
+ * @priv : private device pointer
+ * Description:
+ *  new GMAC chip generations have a new register to indicate the
+ *  presence of the optional feature/functions.
+ *  This can be also used to override the value passed through the
+ *  platform and necessary for old MAC10/100 and GMAC chips.
  */
 static int stmmac_get_hw_features(struct stmmac_priv *priv)
 {
@@ -827,7 +848,7 @@ static int stmmac_get_hw_features(struct stmmac_priv *priv)
 			(hw_cap & DMA_HW_FEAT_RWKSEL) >> 9;
 		priv->dma_cap.pmt_magic_frame =
 			(hw_cap & DMA_HW_FEAT_MGKSEL) >> 10;
-		/*MMC*/
+		/* MMC */
 		priv->dma_cap.rmon = (hw_cap & DMA_HW_FEAT_MMCSEL) >> 11;
 		/* IEEE 1588-2002*/
 		priv->dma_cap.time_stamp =
@@ -855,8 +876,7 @@ static int stmmac_get_hw_features(struct stmmac_priv *priv)
 		priv->dma_cap.enh_desc =
 			(hw_cap & DMA_HW_FEAT_ENHDESSEL) >> 24;
 
-	} else
-		pr_debug("\tNo HW DMA feature register supported");
+	}
 
 	return hw_cap;
 }
@@ -911,6 +931,44 @@ static int stmmac_open(struct net_device *dev)
 		goto open_error;
 	}
 
+	stmmac_get_synopsys_id(priv);
+
+	priv->hw_cap_support = stmmac_get_hw_features(priv);
+
+	if (priv->hw_cap_support) {
+		pr_info(" Support DMA HW capability register\n");
+
+		/* We can override some gmac/dma configuration fields: e.g.
+		 * enh_desc, tx_coe (e.g. that are passed through the
+		 * platform) with the values from the HW capability
+		 * register (if supported).
+		 */
+		priv->plat->enh_desc = priv->dma_cap.enh_desc;
+		priv->plat->tx_coe = priv->dma_cap.tx_coe;
+		priv->plat->pmt = priv->dma_cap.pmt_remote_wake_up;
+
+		/* By default disable wol on magic frame if not supported */
+		if (!priv->dma_cap.pmt_magic_frame)
+			priv->wolopts &= ~WAKE_MAGIC;
+
+	} else
+		pr_info(" No HW DMA feature register supported\n");
+
+	/* Select the enhanced/normal descriptor structures */
+	stmmac_selec_desc_mode(priv);
+
+	/* PMT module is not integrated in all the MAC devices. */
+	if (priv->plat->pmt) {
+		pr_info(" Remote wake-up capable\n");
+		device_set_wakeup_capable(priv->device, 1);
+	}
+
+	priv->rx_coe = priv->hw->mac->rx_coe(priv->ioaddr);
+	if (priv->rx_coe)
+		pr_info(" Checksum Offload Engine supported\n");
+	if (priv->plat->tx_coe)
+		pr_info(" Checksum insertion supported\n");
+
 	/* Create and initialize the TX/RX descriptors chains. */
 	priv->dma_tx_size = STMMAC_ALIGN(dma_txsize);
 	priv->dma_rx_size = STMMAC_ALIGN(dma_rxsize);
@@ -933,15 +991,6 @@ static int stmmac_open(struct net_device *dev)
 	/* Initialize the MAC Core */
 	priv->hw->mac->core_init(priv->ioaddr);
 
-	stmmac_get_synopsys_id(priv);
-
-	stmmac_get_hw_features(priv);
-
-	priv->rx_coe = priv->hw->mac->rx_coe(priv->ioaddr);
-	if (priv->rx_coe)
-		pr_info("stmmac: Rx Checksum Offload Engine supported\n");
-	if (priv->plat->tx_coe)
-		pr_info("\tTX Checksum insertion supported\n");
 	netdev_update_features(dev);
 
 	/* Request the IRQ lines */
@@ -1556,7 +1605,7 @@ static int stmmac_sysfs_dma_cap_read(struct seq_file *seq, void *v)
 	struct net_device *dev = seq->private;
 	struct stmmac_priv *priv = netdev_priv(dev);
 
-	if (!stmmac_get_hw_features(priv)) {
+	if (!priv->hw_cap_support) {
 		seq_printf(seq, "DMA HW features not supported\n");
 		return 0;
 	}
@@ -1764,12 +1813,6 @@ static int stmmac_mac_device_setup(struct net_device *dev)
 	if (!device)
 		return -ENOMEM;
 
-	if (priv->plat->enh_desc) {
-		device->desc = &enh_desc_ops;
-		pr_info("\tEnhanced descriptor structure\n");
-	} else
-		device->desc = &ndesc_ops;
-
 	priv->hw = device;
 	priv->hw->ring = &ring_mode_ops;
 
@@ -1843,11 +1886,6 @@ static int stmmac_dvr_probe(struct platform_device *pdev)
 
 	priv->ioaddr = addr;
 
-	/* PMT module is not integrated in all the MAC devices. */
-	if (plat_dat->pmt) {
-		pr_info("\tPMT module supported\n");
-		device_set_wakeup_capable(&pdev->dev, 1);
-	}
 	/*
 	 * On some platforms e.g. SPEAr the wake up irq differs from the mac irq
 	 * The external wake up irq can be passed through the platform code
@@ -1860,7 +1898,6 @@ static int stmmac_dvr_probe(struct platform_device *pdev)
 	if (priv->wol_irq == -ENXIO)
 		priv->wol_irq = ndev->irq;
 
-
 	platform_set_drvdata(pdev, ndev);
 
 	/* Set the I/O base addr */
@@ -1873,7 +1910,7 @@ static int stmmac_dvr_probe(struct platform_device *pdev)
 			goto out_free_ndev;
 	}
 
-	/* MAC HW revice detection */
+	/* MAC HW device detection */
 	ret = stmmac_mac_device_setup(ndev);
 	if (ret < 0)
 		goto out_plat_exit;
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH 4/5] stmmac: remove spin_lock in stmmac_ioctl.
From: Giuseppe CAVALLARO @ 2011-11-17  7:58 UTC (permalink / raw)
  To: netdev; +Cc: francesco.virlinzi, srinivas.kandagatla, Giuseppe Cavallaro
In-Reply-To: <1321516682-32208-1-git-send-email-peppe.cavallaro@st.com>

From: Srinivas Kandagatla <srinivas.kandagatla@st.com>

This patch removes the unneeded spin_lock in stmmac_ioctl around reading
and writing mdio registers. Code holding a spin_lock must be atomic,
which is not the case here because both mdiobus_read and mdiobus_write
take a mutex.

Without this patch, reading mdio registers via mii-tool results in the
BUG below:
mii-tool -vvv eth0
Using SIOCGMIIPHY=0x8947
BUG: sleeping function called from invalid context at kernel/mutex.c:287
in_atomic(): 1, irqs_disabled(): 0, pid: 614, name: mii-tool
2 locks held by mii-tool/614:
 #0:  (rtnl_mutex){......}, at: [<c01fd80c>] dev_ioctl+0x550/0x674
 #1:  (&priv->lock){......}, at: [<c01b34ec>] stmmac_ioctl+0x4c/0x78
[<c002ea14>] (unwind_backtrace+0x0/0xcc) from [<c0272c38>]
(mutex_lock_nested+0x24/0x35c)
[<c0272c38>] (mutex_lock_nested+0x24/0x35c) from [<c01b237c>]
(mdiobus_read+0x44/0x70)
[<c01b237c>] (mdiobus_read+0x44/0x70) from [<c01b0c64>]
(phy_mii_ioctl+0x4c/0x138)
[<c01b0c64>] (phy_mii_ioctl+0x4c/0x138) from [<c01b34fc>]
(stmmac_ioctl+0x5c/0x78)
[<c01b34fc>] (stmmac_ioctl+0x5c/0x78) from [<c01fcec8>]
(dev_ifsioc+0x2a4/0x2c8)
[<c01fcec8>] (dev_ifsioc+0x2a4/0x2c8) from [<c01fd81c>]
(dev_ioctl+0x560/0x674)
[<c01fd81c>] (dev_ioctl+0x560/0x674) from [<c00c36e0>]
(vfs_ioctl+0x2c/0x8c)
[<c00c36e0>] (vfs_ioctl+0x2c/0x8c) from [<c00c4130>]
(do_vfs_ioctl+0x530/0x578)
[<c00c4130>] (do_vfs_ioctl+0x530/0x578) from [<c00c41ac>]
(sys_ioctl+0x34/0x54)
[<c00c41ac>] (sys_ioctl+0x34/0x54) from [<c0028aa0>]
(ret_fast_syscall+0x0/0x2c)

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 7f3ffd3..29dd87c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1536,9 +1536,7 @@ static int stmmac_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 	if (!priv->phydev)
 		return -EINVAL;
 
-	spin_lock(&priv->lock);
 	ret = phy_mii_ioctl(priv->phydev, rq, cmd);
-	spin_unlock(&priv->lock);
 
 	return ret;
 }
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH 5/5] stmmac: fix pm functions avoiding sleep on spinlock
From: Giuseppe CAVALLARO @ 2011-11-17  7:58 UTC (permalink / raw)
  To: netdev; +Cc: francesco.virlinzi, srinivas.kandagatla, Giuseppe Cavallaro
In-Reply-To: <1321516682-32208-1-git-send-email-peppe.cavallaro@st.com>

From: Francesco Virlinzi <francesco.virlinzi@st.com>

This patch fixes the pm functions so that they do not
sleep while a spinlock is held.

Signed-off-by: Francesco Virlinzi <francesco.virlinzi@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 29dd87c..8ea770a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2011,12 +2011,13 @@ static int stmmac_suspend(struct device *dev)
 	if (!ndev || !netif_running(ndev))
 		return 0;
 
+	if (priv->phydev)
+		phy_stop(priv->phydev);
+
 	spin_lock(&priv->lock);
 
 	netif_device_detach(ndev);
 	netif_stop_queue(ndev);
-	if (priv->phydev)
-		phy_stop(priv->phydev);
 
 #ifdef CONFIG_STMMAC_TIMER
 	priv->tm->timer_stop();
@@ -2074,12 +2075,13 @@ static int stmmac_resume(struct device *dev)
 #endif
 	napi_enable(&priv->napi);
 
-	if (priv->phydev)
-		phy_start(priv->phydev);
-
 	netif_start_queue(ndev);
 
 	spin_unlock(&priv->lock);
+
+	if (priv->phydev)
+		phy_start(priv->phydev);
+
 	return 0;
 }
 
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH 1/6] net: add the nwhwconfig support
From: Giuseppe CAVALLARO @ 2011-11-17  8:01 UTC (permalink / raw)
  To: netdev
  Cc: mamoroso, shiraz.hashim, armando.visconti, Giuseppe Cavallaro,
	Stuart Menefy

Network drivers support hardware level configuration via utilities such as
ifconfig, ethtool and mii-tool. However sometimes these settings need to be
adjusted before a file system is available (typically if the root file system
uses NFS).

This patch adds a new facility called nwhwconf. It has been used on STLinux
embedded platforms for a long time. It adds a simple kernel command line
interface to configure some common network parameters, e.g. the MAC address.
In fact, some boards (like ST STBs) that embed the stmmac Ethernet device
do not have a MAC address programmed into a serial EEPROM, and nwhwconf
is used to provide it.

Enable this feature (CONFIG_NWHW_CONFIG) from the configuration menu as
follows:

Device Drivers ---> Networking Support --->
  Configure network hardware from the command line

This is an example of how to pass the MAC address on the boot command line:

nwhwconf=device:<dev>,hwaddr:<addr>

where:
    <dev> is the device name, normally eth0
    <addr> is the MAC address, in the form xx:xx:xx:xx:xx:xx
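
To make the expected <addr> format concrete, here is a standalone
sketch of a strict parser for the xx:xx:xx:xx:xx:xx form. It is not the
parse_addr() from this patch: parse_mac() and hex_val() are
hypothetical names, and requiring exactly six two-digit groups is a
design choice made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Parse "xx:xx:xx:xx:xx:xx" into mac[]. Returns 0 on success and -1 if
 * the string is not exactly six two-digit hex groups separated by ':'.
 */
static int parse_mac(const char *str, uint8_t mac[MAC_LEN])
{
	size_t i;

	assert(str != NULL && mac != NULL);
	for (i = 0; i < MAC_LEN; i++) {
		int hi = hex_val(str[0]);
		/* only look at str[1] when str[0] was a digit (not the NUL) */
		int lo = hi < 0 ? -1 : hex_val(str[1]);

		if (hi < 0 || lo < 0)
			return -1;
		mac[i] = (uint8_t)(hi << 4 | lo);
		str += 2;
		if (i < MAC_LEN - 1) {
			if (*str != ':')
				return -1;
			str++;
		}
	}
	return *str == '\0' ? 0 : -1;
}
```

Bounding the loop at six groups means an over-long string is rejected
before anything is written past the end of the output buffer.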

Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
 drivers/net/Kconfig      |   10 +++
 drivers/net/Makefile     |    1 +
 drivers/net/nwhwconfig.c |  173 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 184 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/nwhwconfig.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 654a5e9..921a8c8 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -338,4 +338,14 @@ config VMXNET3
 	  To compile this driver as a module, choose M here: the
 	  module will be called vmxnet3.
 
+config NWHW_CONFIG
+	bool "Configure network hardware from the command line"
+	help
+	  Many network drivers support hardware level configuration via
+	  utilities such as ifconfig, ethtool and mii-tool. However sometimes
+	  these settings need to be adjusted before a file system is
+	  available (typically if the root file system uses NFS).
+	  This option adds a simple kernel command line interface to configure
+	  some common network parameters.
+
 endif # NETDEVICES
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index fa877cd..0869bf7 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -66,3 +66,4 @@ obj-$(CONFIG_USB_USBNET)        += usb/
 obj-$(CONFIG_USB_ZD1201)        += usb/
 obj-$(CONFIG_USB_IPHETH)        += usb/
 obj-$(CONFIG_USB_CDC_PHONET)   += usb/
+obj-$(CONFIG_NWHW_CONFIG) += nwhwconfig.o
diff --git a/drivers/net/nwhwconfig.c b/drivers/net/nwhwconfig.c
new file mode 100644
index 0000000..76f3fdf
--- /dev/null
+++ b/drivers/net/nwhwconfig.c
@@ -0,0 +1,173 @@
+/*
+ * Configuration of network device hardware from the kernel command line.
+ *
+ * Official documentation available at:
+ *   http://www.stlinux.com/howto/network/ethernet-MAC
+ *
+ * Copyright (c) STMicroelectronics Limited
+ *
+ * Author: Stuart Menefy <stuart.menefy@st.com>
+ * Author: Giuseppe Cavallaro <peppe.cavallaro@st.com>
+ */
+
+#include <linux/string.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+#include <linux/ethtool.h>
+#include <linux/etherdevice.h>
+#include <net/ip.h>
+
+static struct eth_dev {
+	char dev_name[IFNAMSIZ];
+	char hw_addr[18];
+	int speed;
+	int duplex;
+} nwhwdev[NETDEV_BOOT_SETUP_MAX];
+
+static int parse_addr(char *str, struct sockaddr *addr)
+{
+	char *s;
+	char *mac = addr->sa_data;
+
+	while ((s = strsep(&str, ":")) != NULL) {
+		unsigned byte;
+		if (sscanf(s, "%x", &byte) != 1 || byte > 0xff)
+			return -1;
+		*mac++ = byte;
+	}
+	addr->sa_family = ARPHRD_ETHER;
+	return 0;
+}
+
+/**
+ * nwhw_config
+ * @dev : net device pointer
+ * Description: it sets the MAC address.
+ * Note that if the network device driver already uses a right
+ * address this function doesn't replace any value.
+ */
+static int __init nwhw_config(void)
+{
+	struct net_device *dev;
+	struct sockaddr s_addr;
+	int ndev = 0;
+
+	while ((ndev < NETDEV_BOOT_SETUP_MAX) &&
+	       (dev = __dev_get_by_name(&init_net, nwhwdev[ndev].dev_name))) {
+
+		if (!dev)
+			break;
+
+		if (!is_valid_ether_addr(dev->dev_addr)) {
+
+			if (nwhwdev[ndev].hw_addr[0]) {
+				int valid_ether =
+				    parse_addr(nwhwdev[ndev].hw_addr, &s_addr);
+				if (!valid_ether) {
+					rtnl_lock();
+					if (dev_set_mac_address(dev, &s_addr))
+						pr_err("%s: Error: cannot set MAC"
+						       " addr", __func__);
+					rtnl_unlock();
+					goto hw_mac_done;
+				} else
+					pr_err("%s: Error: Invalid MAC addr",
+					       __func__);
+			}
+			/* Although many drivers do that in case of
+			 * problems, we assume the nwhw_config always
+			 * has to exit with a good MAC address (even if
+			 * generated randomly). */
+			random_ether_addr(dev->dev_addr);
+			pr_warning("%s: generating random addr...", __func__);
+		}
+hw_mac_done:
+		pr_info("%s: (%s) setting mac address: %pM", __func__,
+			dev->name, dev->dev_addr);
+
+		if ((nwhwdev[ndev].speed != -1) ||
+		    (nwhwdev[ndev].duplex != -1)) {
+			struct ethtool_cmd cmd = { ETHTOOL_GSET };
+
+			if (!dev->ethtool_ops->get_settings ||
+			    (dev->ethtool_ops->get_settings(dev, &cmd) < 0))
+				pr_err("%s: cannot read ether device settings",
+				       __func__);
+			else {
+				cmd.cmd = ETHTOOL_SSET;
+				cmd.autoneg = AUTONEG_DISABLE;
+				if (nwhwdev[ndev].speed != -1)
+					cmd.speed = nwhwdev[ndev].speed;
+				if (nwhwdev[ndev].duplex != -1)
+					cmd.duplex = nwhwdev[ndev].duplex;
+				if (!dev->ethtool_ops->set_settings ||
+				    (dev->ethtool_ops->set_settings(dev, &cmd) <
+				     0))
+					pr_err("%s: cannot set the eth dev",
+					       __func__);
+			}
+		}
+		ndev++;
+	}
+	return 0;
+}
+
+device_initcall(nwhw_config);
+
+/**
+ * nwhw_config_setup - parse the nwhwconf parameter
+ * @str : pointer to the nwhwconf parameter
+ * Description:
+ * This function parses the nwhwconf command line arguments.
+ * Command line syntax:
+ * nwhwconf=device:eth0,hwaddr:<mac0>[,speed:<speed0>][,duplex:<duplex0>];
+ *	    device:eth1,hwaddr:<mac1>[,speed:<speed1>][,duplex:<duplex1>];
+ *	...
+ */
+static int __init nwhw_config_setup(char *str)
+{
+	char *opt;
+	int j = 0;
+
+	if (!str || !*str)
+		return 0;
+
+	while (((opt = strsep(&str, ";")) != NULL)
+	       && (j < NETDEV_BOOT_SETUP_MAX)) {
+		char *p;
+
+		nwhwdev[j].speed = -1;
+		nwhwdev[j].duplex = -1;
+
+		while ((p = strsep(&opt, ",")) != NULL) {
+			if (!strncmp(p, "device:", 7))
+				strlcpy(nwhwdev[j].dev_name, p + 7,
+					sizeof(nwhwdev[j].dev_name));
+			else if (!strncmp(p, "hwaddr:", 7))
+				strlcpy(nwhwdev[j].hw_addr, p + 7,
+					sizeof(nwhwdev[j].hw_addr));
+			else if (!strcmp(p, "duplex:full"))
+				nwhwdev[j].duplex = DUPLEX_FULL;
+
+			else if (!strcmp(p, "duplex:half"))
+				nwhwdev[j].duplex = DUPLEX_HALF;
+
+			else if (!strncmp(p, "speed:", 6)) {
+				unsigned long speed;
+
+				if (!(kstrtoul(p + 6, 0,
+						     &speed)))
+					if ((speed == SPEED_10) ||
+					    (speed == SPEED_100) ||
+					    (speed == SPEED_1000) ||
+					    (speed == SPEED_10000))
+						nwhwdev[j].speed = speed;
+			}
+
+		}
+		j++;
+	}
+	return 1;
+}
+
+__setup("nwhwconf=", nwhw_config_setup);
-- 
1.7.4.4
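
For readers following the nwhwconf= syntax documented in the patch, here is a minimal user-space sketch of the same strsep()-based parsing. It is not the patch's code: struct nw_cfg, parse_mac(), parse_entry() and parse_nwhwconf() are hypothetical names, and the sketch assumes that a MAC address must be exactly six hex octets, a bound that parse_addr() in the patch does not enforce.

```c
#define _DEFAULT_SOURCE		/* strsep() on glibc */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6		/* same value as the kernel's ETH_ALEN */

struct nw_cfg {			/* hypothetical mirror of the patch's struct eth_dev */
	char dev_name[16];
	unsigned char mac[MAC_LEN];
	int have_mac;
	int speed;		/* -1 when not given or not a supported value */
	int duplex;		/* -1 unset, 0 half, 1 full */
};

/* Parse "aa:bb:cc:dd:ee:ff"; accept exactly six hex octets, nothing else. */
static int parse_mac(const char *text, unsigned char mac[MAC_LEN])
{
	char buf[18];
	unsigned char tmp[MAC_LEN];
	char *cur = buf, *tok;
	int n = 0;

	if (strlen(text) >= sizeof(buf))
		return -1;
	strcpy(buf, text);
	while ((tok = strsep(&cur, ":")) != NULL) {
		char *end;
		unsigned long byte;

		if (n == MAC_LEN || *tok == '\0')
			return -1;	/* too many octets, or an empty one */
		byte = strtoul(tok, &end, 16);
		if (*end != '\0' || byte > 0xff)
			return -1;
		tmp[n++] = (unsigned char)byte;
	}
	if (n != MAC_LEN)
		return -1;
	memcpy(mac, tmp, MAC_LEN);	/* only written on success */
	return 0;
}

/* Parse one "device:..,hwaddr:..[,speed:N][,duplex:full|half]" entry. */
static int parse_entry(char *entry, struct nw_cfg *cfg)
{
	char *p;

	assert(entry && cfg);
	memset(cfg, 0, sizeof(*cfg));
	cfg->speed = -1;
	cfg->duplex = -1;

	while ((p = strsep(&entry, ",")) != NULL) {
		if (!strncmp(p, "device:", 7)) {
			snprintf(cfg->dev_name, sizeof(cfg->dev_name), "%s", p + 7);
		} else if (!strncmp(p, "hwaddr:", 7)) {
			if (parse_mac(p + 7, cfg->mac))
				return -1;
			cfg->have_mac = 1;
		} else if (!strcmp(p, "duplex:full")) {
			cfg->duplex = 1;
		} else if (!strcmp(p, "duplex:half")) {
			cfg->duplex = 0;
		} else if (!strncmp(p, "speed:", 6)) {
			char *end;
			unsigned long speed = strtoul(p + 6, &end, 0);

			if (end == p + 6 || *end != '\0')
				return -1;	/* not a number */
			if (speed == 10 || speed == 100 ||
			    speed == 1000 || speed == 10000)
				cfg->speed = (int)speed;
		}
	}
	return 0;
}

/* Split on ';' and fill at most max entries; returns the count or -1. */
static int parse_nwhwconf(char *str, struct nw_cfg *out, int max)
{
	char *entry;
	int n = 0;

	while (n < max && (entry = strsep(&str, ";")) != NULL) {
		if (*entry == '\0')
			continue;	/* tolerate a trailing ';' */
		if (parse_entry(entry, &out[n]))
			return -1;
		n++;
	}
	return n;
}
```

Like the patch, the sketch modifies the input string in place, so the caller must pass a writable buffer. Parsing the speed into an unsigned long and narrowing afterwards avoids storing a long through a pointer to int.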

^ permalink raw reply related

* Re: [PATCH 1/6] net: add the nwhwconfig support
From: David Miller @ 2011-11-17  8:06 UTC (permalink / raw)
  To: peppe.cavallaro
  Cc: netdev, mamoroso, shiraz.hashim, armando.visconti, stuart.menefy
In-Reply-To: <1321516900-616-1-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Thu, 17 Nov 2011 09:01:40 +0100

> Network drivers support hardware level configuration via utilities such as
> ifconfig, ethtool and mii-tool. However sometimes these settings need to be
> adjusted before a file system is available (typically if the root file system
> uses NFS).

No way, use an initial ramdisk.

^ permalink raw reply

* Re: Unable to flush ICMP redirect routes in kernel 3.0+
From: Ivan Zahariev @ 2011-11-17  8:10 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20111116223330.08de9e52@asterix.rh>

On 17.11.2011, 02:33, Flavio Leitner wrote:
> On Thu, 17 Nov 2011 00:32:18 +0200
> Ivan Zahariev <famzah@icdsoft.com> wrote:
>
>> On 11/15/2011 11:09 PM, Eric Dumazet wrote:
>>> On Tuesday 15 November 2011 at 22:23 +0200, Ivan Zahariev wrote:
>>>> Hello,
>>>>
>>>> We have changed nothing in our network infrastructure but only
>>>> upgraded from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem
>>>> we are experiencing:
>>>>
>>>> ICMP redirected routes are cached forever, and they can be cleared
>>>> only by a reboot.
>>>>
>> ### (bug #1) even though we flushed the route cache, the <redirected>
>> route resurrects from somewhere; even without making any TCP requests
>> ### this time what "ip" returns is consistent with the real
>> (incorrect) routing behavior of machine5
>> root@machine5:~# ip route flush cache
>> root@machine5:~# ip route list cache match 8.8.4.4
>> root@machine5:~# ip route get 8.8.4.4
>> 8.8.4.4 via 192.168.0.120 dev eth0  src 192.168.0.244
>>       cache <redirected>  ipid 0x303a
>>
>> ### only a reboot clears the cached <redirected> routes
> IIRC, the cache flush doesn't affect the inetpeer where the
> redirected gateway is now stored, so even after flushing the
> route cache, the inetpeer will restore the old info later.
>
> fbl
OK, I guess my questions now are:
* How can I flush the inetpeer (the cached redirect info) without
rebooting the machine?
* Why does "ip route" return an incorrect route? Example:

### (bug #2) what "ip route" returns is inconsistent, because in reality
we are using the <redirected> route via 192.168.0.120
### note that the number of route lines increased by one
root@machine5:~# ip route list cache match 8.8.4.4
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
     cache  ipid 0x303a
8.8.4.4 tos lowdelay via 192.168.0.8 dev eth0  src 192.168.0.244
     cache  ipid 0x303a
8.8.4.4 via 192.168.0.8 dev eth0  src 192.168.0.244
     cache
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
     cache  ipid 0x303a

### After "ip route flush cache", the output of "ip route" becomes
consistent with the real routing behavior of machine5
root@machine5:~# ip route flush cache
root@machine5:~# ip route list cache match 8.8.4.4
root@machine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.120 dev eth0  src 192.168.0.244
     cache <redirected>  ipid 0x303a

Thanks.
--Ivan

^ permalink raw reply

* Re: [PATCH net-next] r8169: Add 64bit statistics
From: Eric Dumazet @ 2011-11-17  8:11 UTC (permalink / raw)
  To: Junchang Wang; +Cc: Stephen Hemminger, netdev, romieu, nic swsd
In-Reply-To: <CABoNC82RO2uvn9TfToAygEspUZnNrefXgO6SGpZoSAayCp3QiA@mail.gmail.com>

On Thursday 17 November 2011 at 15:46 +0800, Junchang Wang wrote:
> > You don't need per-cpu since Tx is locked by dev->xmit_lock and
> > rx is implicitly single threaded by NAPI.
> 
> Thanks.
> 
> > You do need to have
> > two u64_stats_sync entries (one for Tx and one for Rx).
> 
> You mean Rx and Tx can run on different cores at the same time,
> so I need one sync for Tx to protect tx_xxx and another for Rx to
> protect rx_xxx. Is that right?
> 

Yes, look at sky2.c for a template.

drivers/net/ethernet/marvell/sky2.c contains code like this, with a
separate syncp for rx and tx:

TX path:
                        u64_stats_update_begin(&sky2->tx_stats.syncp);
                        ++sky2->tx_stats.packets;
                        sky2->tx_stats.bytes += skb->len;
                        u64_stats_update_end(&sky2->tx_stats.syncp);


RX path:

        u64_stats_update_begin(&sky2->rx_stats.syncp);
        sky2->rx_stats.packets += packets;
        sky2->rx_stats.bytes += bytes;
        u64_stats_update_end(&sky2->rx_stats.syncp);
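
The excerpts above show only the writer side. As an illustration of why each direction gets its own sync object, here is a minimal user-space sketch of the sequence-counter idea, assuming C11 atomics as a stand-in for the kernel's u64_stats_sync. The names dir_stats, nic_stats, stats_add() and stats_read() are hypothetical and are not taken from sky2.c or r8169; in the kernel the reader would use u64_stats_fetch_begin() and u64_stats_fetch_retry() (or their _bh variants) instead of the loop written out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one u64_stats_sync plus the counters it guards. */
struct dir_stats {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

/* One sequence counter per direction, since Rx and Tx have different writers. */
struct nic_stats {
	struct dir_stats rx;
	struct dir_stats tx;
};

/* Writer side. Assumes a single writer per dir_stats at any time. */
static void stats_add(struct dir_stats *st, uint64_t packets, uint64_t bytes)
{
	unsigned int seq = atomic_load_explicit(&st->seq, memory_order_relaxed);

	/* Mark the update as in progress (odd value). */
	atomic_store_explicit(&st->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&st->packets,
		atomic_load_explicit(&st->packets, memory_order_relaxed) + packets,
		memory_order_relaxed);
	atomic_store_explicit(&st->bytes,
		atomic_load_explicit(&st->bytes, memory_order_relaxed) + bytes,
		memory_order_relaxed);

	/* Publish: back to an even value. */
	atomic_store_explicit(&st->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until a snapshot was taken with no update in between. */
static void stats_read(struct dir_stats *st, uint64_t *packets, uint64_t *bytes)
{
	unsigned int begin, end;

	assert(packets && bytes);
	do {
		begin = atomic_load_explicit(&st->seq, memory_order_acquire);
		*packets = atomic_load_explicit(&st->packets, memory_order_relaxed);
		*bytes = atomic_load_explicit(&st->bytes, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&st->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);
}
```

The sketch relies on there being one writer per dir_stats, which matches the argument in the thread: Tx is serialized by dev->xmit_lock and Rx by NAPI. If both paths shared a single counter, two writers could run on different cores at once and corrupt the sequence.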

^ permalink raw reply

