Netdev List
* Re: pull request: wireless-next 2012-05-03
From: Joe Perches @ 2012-05-03 17:29 UTC (permalink / raw)
  To: David Miller; +Cc: linville, linux-wireless, netdev
In-Reply-To: <20120503.131707.112550136096227430.davem@davemloft.net>

On Thu, 2012-05-03 at 13:17 -0400, David Miller wrote:
>  ...
> -		if (hdev->discovery.type == DISCOV_TYPE_INTERLEAVED) {
> +		if (hdev->discovery.type == DISCOV_TYPE_INTERLEAVED &&
> +				hdev->discovery.state == DISCOVERY_FINDING) {
> 
> Really, we went through this a million times very recently and I'm
> not pulling anything into my tree that has garbage like this in it.

Perhaps the bluetooth folk can adopt using

scripts/checkpatch.pl --strict

or maybe checkpatch could be changed to use
--strict on patches in net and drivers/net
automatically.
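
For reference, the layout that --strict steers toward aligns the
continuation of a multi-line condition with the opening parenthesis,
which is presumably what the objection above is about. A minimal,
self-contained sketch follows; the enum values and struct names are
stand-ins for illustration, not the real Bluetooth definitions.

```c
#include <assert.h>

/* Stand-in definitions for illustration only; the real ones live in the
 * Bluetooth headers and differ in values and members. */
enum { DISCOV_TYPE_BREDR, DISCOV_TYPE_LE, DISCOV_TYPE_INTERLEAVED };
enum { DISCOVERY_STOPPED, DISCOVERY_FINDING };

struct discovery_state {
	int type;
	int state;
};

struct hci_dev_sketch {
	struct discovery_state discovery;
};

/* Returns 1 when an interleaved discovery is in its finding phase. */
static int interleaved_finding(const struct hci_dev_sketch *hdev)
{
	assert(hdev);

	/* The continuation starts under the first character after "if (". */
	if (hdev->discovery.type == DISCOV_TYPE_INTERLEAVED &&
	    hdev->discovery.state == DISCOVERY_FINDING)
		return 1;

	return 0;
}
```

Only the whitespace differs from the quoted hunk; the condition itself
is unchanged.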


* Re: [net-next 0/9][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-05-03 17:30 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1336038992-3144-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu,  3 May 2012 02:56:23 -0700

> This series of patches contains updates for e1000e and ixgbevf.
> 
> The following are changes since commit af94bf6db1d58d26f1cdab145b6312ad363254a6:
>   ixgbe: Fix use after free on module remove
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Pulled, thanks Jeff.


* Re: [PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors
From: David Miller @ 2012-05-03 17:55 UTC (permalink / raw)
  To: mst; +Cc: ian.campbell, netdev, eric.dumazet
In-Reply-To: <20120503154142.GB27671@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 3 May 2012 18:41:43 +0300

> On Thu, May 03, 2012 at 03:56:09PM +0100, Ian Campbell wrote:
>> This should be used by drivers which need to hold on to an skb for an extended
>> (perhaps unbounded) period of time. e.g. the tun driver which relies on
>> userspace consuming the skb.
>> 
>> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>> Cc: mst@redhat.com
> 
> 
> Right. But local sockets queue at socket forever as well.
> I think this should be called in skb_set_owner_r?
> 
> This might somewhat penalize speed for local clients in the name
> of correctness but these are rare so being correct is
> more important I think.

But, on the other hand, putting the check into skb_set_owner_r() is a
not so nice test to have in the fast path of every socket receive.


* pull request: wireless 2012-05-03
From: John W. Linville @ 2012-05-03 17:37 UTC (permalink / raw)
  To: davem; +Cc: linux-wireless, netdev, linux-kernel

commit 3c3052eac295678fd2765552c6a86d5441306cb4

Dave,

Here is another trio of fixes intended for 3.4.  The fix from Eric
Dumazet corrects skb truesize reporting in iwlwifi, avoiding a
potential exhaustion of kernel memory due to different socket memory
accounting for these skb allocations.  The fix from Franky Lin avoids
a double unlock on a spinlock in brcmfmac.  Finally, Rajkumar Manoharan
reverts an earlier patch that created a minor regression in ath9k.
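
To make the accounting gap concrete: the receive path attaches a
fragment that pins a whole receive buffer, so the memory charged to the
socket should be the buffer's allocation size rather than the frame
length. A rough userspace sketch of the arithmetic follows.
PAGE_SIZE_SKETCH, the page order and the frame length are illustrative
assumptions, not values taken from the driver.

```c
#include <assert.h>

/* Assumed page size for the sketch; the kernel's PAGE_SIZE is per-arch. */
#define PAGE_SIZE_SKETCH 4096u

/* Memory pinned by one receive buffer: PAGE_SIZE << rx_page_order. */
static unsigned int rx_buffer_truesize(unsigned int rx_page_order)
{
	assert(rx_page_order < 8);	/* keep the shift sane in the sketch */
	return PAGE_SIZE_SKETCH << rx_page_order;
}

/* Bytes that go unaccounted when the frame length is reported instead. */
static unsigned int truesize_shortfall(unsigned int frame_len,
				       unsigned int rx_page_order)
{
	unsigned int truesize = rx_buffer_truesize(rx_page_order);

	assert(frame_len <= truesize);
	return truesize - frame_len;
}
```

Under these assumptions an order-1 buffer is 8192 bytes, so a 1500-byte
frame reported by length alone leaves truesize_shortfall(1500, 1) = 6692
bytes per packet outside the socket's memory accounting.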

Please let me know if there are problems!

John

---

The following changes since commit 5a8887d39e1ba5ee2d4ccb94b14d6f2dce5ddfca:

  sungem: Fix WakeOnLan (2012-05-03 01:42:55 -0400)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless.git for-davem

Eric Dumazet (1):
      iwlwifi: fix skb truesize underestimation

Franky Lin (1):
      brcmfmac: fix a double spin_unlock_irqrestore issue in dpc

John W. Linville (1):
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

Rajkumar Manoharan (1):
      Revert "ath9k_hw: Fix incorrect spur_freq_sd for AR9003"

 drivers/net/wireless/ath/ath9k/ar9003_phy.c        |    4 +-
 drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c |    1 +
 drivers/net/wireless/iwlwifi/iwl-agn-rx.c          |   21 +++++++++++++------
 drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c   |    3 +-
 drivers/net/wireless/iwlwifi/iwl-trans.h           |    1 +
 5 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ar9003_phy.c b/drivers/net/wireless/ath/ath9k/ar9003_phy.c
index deb6cfb..600aca9 100644
--- a/drivers/net/wireless/ath/ath9k/ar9003_phy.c
+++ b/drivers/net/wireless/ath/ath9k/ar9003_phy.c
@@ -373,7 +373,7 @@ static void ar9003_hw_spur_ofdm_work(struct ath_hw *ah,
 			else
 				spur_subchannel_sd = 0;
 
-			spur_freq_sd = (freq_offset << 9) / 11;
+			spur_freq_sd = ((freq_offset + 10) << 9) / 11;
 
 		} else {
 			if (REG_READ_FIELD(ah, AR_PHY_GEN_CTRL,
@@ -382,7 +382,7 @@ static void ar9003_hw_spur_ofdm_work(struct ath_hw *ah,
 			else
 				spur_subchannel_sd = 1;
 
-			spur_freq_sd = (freq_offset << 9) / 11;
+			spur_freq_sd = ((freq_offset - 10) << 9) / 11;
 
 		}
 
diff --git a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
index eb3829b..e2b34e1 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/dhd_sdio.c
@@ -2637,6 +2637,7 @@ static int brcmf_sdbrcm_dpc_thread(void *data)
 				/* after stopping the bus, exit thread */
 				brcmf_sdbrcm_bus_stop(bus->sdiodev->dev);
 				bus->dpc_tsk = NULL;
+				spin_lock_irqsave(&bus->dpc_tl_lock, flags);
 				break;
 			}
 
diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-rx.c b/drivers/net/wireless/iwlwifi/iwl-agn-rx.c
index f4b84d1..2247460 100644
--- a/drivers/net/wireless/iwlwifi/iwl-agn-rx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-rx.c
@@ -773,8 +773,7 @@ static void iwlagn_pass_packet_to_mac80211(struct iwl_priv *priv,
 	struct sk_buff *skb;
 	__le16 fc = hdr->frame_control;
 	struct iwl_rxon_context *ctx;
-	struct page *p;
-	int offset;
+	unsigned int hdrlen, fraglen;
 
 	/* We only process data packets if the interface is open */
 	if (unlikely(!priv->is_open)) {
@@ -788,16 +787,24 @@ static void iwlagn_pass_packet_to_mac80211(struct iwl_priv *priv,
 	    iwlagn_set_decrypted_flag(priv, hdr, ampdu_status, stats))
 		return;
 
-	skb = dev_alloc_skb(128);
+	/* Dont use dev_alloc_skb(), we'll have enough headroom once
+	 * ieee80211_hdr pulled.
+	 */
+	skb = alloc_skb(128, GFP_ATOMIC);
 	if (!skb) {
-		IWL_ERR(priv, "dev_alloc_skb failed\n");
+		IWL_ERR(priv, "alloc_skb failed\n");
 		return;
 	}
+	hdrlen = min_t(unsigned int, len, skb_tailroom(skb));
+	memcpy(skb_put(skb, hdrlen), hdr, hdrlen);
+	fraglen = len - hdrlen;
 
-	offset = (void *)hdr - rxb_addr(rxb);
-	p = rxb_steal_page(rxb);
-	skb_add_rx_frag(skb, 0, p, offset, len, len);
+	if (fraglen) {
+		int offset = (void *)hdr + hdrlen - rxb_addr(rxb);
 
+		skb_add_rx_frag(skb, 0, rxb_steal_page(rxb), offset,
+				fraglen, rxb->truesize);
+	}
 	iwl_update_stats(priv, false, fc, len);
 
 	/*
diff --git a/drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c b/drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
index 8b1a798..aa7aea1 100644
--- a/drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
+++ b/drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
@@ -374,8 +374,9 @@ static void iwl_rx_handle_rxbuf(struct iwl_trans *trans,
 	if (WARN_ON(!rxb))
 		return;
 
+	rxcb.truesize = PAGE_SIZE << hw_params(trans).rx_page_order;
 	dma_unmap_page(trans->dev, rxb->page_dma,
-		       PAGE_SIZE << hw_params(trans).rx_page_order,
+		       rxcb.truesize,
 		       DMA_FROM_DEVICE);
 
 	rxcb._page = rxb->page;
diff --git a/drivers/net/wireless/iwlwifi/iwl-trans.h b/drivers/net/wireless/iwlwifi/iwl-trans.h
index 0c81cba..fdf9788 100644
--- a/drivers/net/wireless/iwlwifi/iwl-trans.h
+++ b/drivers/net/wireless/iwlwifi/iwl-trans.h
@@ -260,6 +260,7 @@ static inline void iwl_free_resp(struct iwl_host_cmd *cmd)
 
 struct iwl_rx_cmd_buffer {
 	struct page *_page;
+	unsigned int truesize;
 };
 
 static inline void *rxb_addr(struct iwl_rx_cmd_buffer *r)
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.
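
The one-line brcmfmac change in the diff above is easier to follow with
the locking pattern reduced to a toy. The sketch below is a guess at the
shape of the problem from the diff alone, not a copy of the driver: a
loop exits with break on a path where the lock has already been
released, and an unlock after the loop then releases it a second time.
toy_lock and all function names are hypothetical; the lock only counts,
so that an unbalanced unlock can be observed.

```c
#include <assert.h>

/* Toy lock: tracks hold state and records any unlock without a lock. */
struct toy_lock {
	int held;
	int bad_unlocks;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	if (!l->held)
		l->bad_unlocks++;	/* this is the double unlock */
	l->held = 0;
}

/* relock_before_break = 0 models the bug, 1 models the fix. */
static int run_loop(int bus_down_at, int relock_before_break)
{
	struct toy_lock lock = { 0, 0 };
	int i;

	toy_lock_acquire(&lock);
	for (i = 0; i < 4; i++) {
		toy_lock_release(&lock);	/* drop the lock around the work */
		if (i == bus_down_at) {
			/* "after stopping the bus, exit thread" */
			if (relock_before_break)
				toy_lock_acquire(&lock);
			break;
		}
		toy_lock_acquire(&lock);	/* normal path re-takes it */
	}
	toy_lock_release(&lock);		/* unlock after the loop */

	return lock.bad_unlocks;
}
```

In this model run_loop(2, 0) reports one unbalanced unlock, while
run_loop(2, 1), which re-takes the lock before break as the patch does,
reports none.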


* Re: sky2 still badly broken
From: Niccolò Belli @ 2012-05-03 18:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20120503082352.771acab3@nehalam.linuxnetplumber.net>

On 03/05/2012 17:23, Stephen Hemminger wrote:
> The receiver on some versions of the chip can't keep up with full speed
> of 1G bit/sec. The receive  FIFO has hardware issues, and since I don't
> work for Marvell, working around the problem is guesswork. Without exact
> information all that can be done is have a timeout and blunt force reset
> logic. The vendor driver sk98lin has the same brute force logic, but may
> just not print the message.

If I lower the speed to 100Mb/s I don't have rx errors anymore *BUT* 
when using dhcp after a while the network doesn't work anymore:

64 bytes from 8.8.8.8: icmp_req=590 ttl=47 time=61.7 ms
64 bytes from 8.8.8.8: icmp_req=591 ttl=47 time=62.0 ms
64 bytes from 8.8.8.8: icmp_req=592 ttl=47 time=62.0 ms
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable


I have no problems with dhcp using wifi or other NICs.

Niccolò


* Re: sky2 still badly broken
From: Stephen Hemminger @ 2012-05-03 18:15 UTC (permalink / raw)
  To: Niccolò Belli; +Cc: netdev
In-Reply-To: <4FA2C913.4080504@linuxsystems.it>

On Thu, 03 May 2012 20:06:11 +0200
Niccolò Belli <darkbasic@linuxsystems.it> wrote:

> On 03/05/2012 17:23, Stephen Hemminger wrote:
> > The receiver on some versions of the chip can't keep up with full speed
> > of 1G bit/sec. The receive  FIFO has hardware issues, and since I don't
> > work for Marvell, working around the problem is guesswork. Without exact
> > information all that can be done is have a timeout and blunt force reset
> > logic. The vendor driver sk98lin has the same brute force logic, but may
> > just not print the message.
> 
> If I lower the speed to 100Mb/s I don't have rx errors anymore *BUT* 
> when using dhcp after a while the network doesn't work anymore:
> 
> 64 bytes from 8.8.8.8: icmp_req=590 ttl=47 time=61.7 ms
> 64 bytes from 8.8.8.8: icmp_req=591 ttl=47 time=62.0 ms
> 64 bytes from 8.8.8.8: icmp_req=592 ttl=47 time=62.0 ms
> ping: sendmsg: Network is unreachable
> ping: sendmsg: Network is unreachable
> ping: sendmsg: Network is unreachable
> 
> 
> I have no problems with dhcp using wifi or other NICs.
> 
> Niccolò

Maybe ethtool registers have some info?

# ethtool -S eth0
NIC statistics:
     tx_bytes: 3640274
     rx_bytes: 61588953
     tx_broadcast: 641
     rx_broadcast: 9964
     tx_multicast: 126
     rx_multicast: 1501
     tx_unicast: 32683
     rx_unicast: 50415
     tx_mac_pause: 0
     rx_mac_pause: 0
     collisions: 0
     late_collision: 0
     aborted: 0
     single_collisions: 0
     multi_collisions: 0
     rx_short: 0
     rx_runt: 0
     rx_64_byte_packets: 8704
     rx_65_to_127_byte_packets: 4414
     rx_128_to_255_byte_packets: 6218
     rx_256_to_511_byte_packets: 1289
     rx_512_to_1023_byte_packets: 1777
     rx_1024_to_1518_byte_packets: 39478
     rx_1518_to_max_byte_packets: 0
     rx_too_long: 0
     rx_fifo_overflow: 0
     rx_jabber: 0
     rx_fcs_error: 0
     tx_64_byte_packets: 1628
     tx_65_to_127_byte_packets: 27483
     tx_128_to_255_byte_packets: 2885
     tx_256_to_511_byte_packets: 637
     tx_512_to_1023_byte_packets: 429
     tx_1024_to_1518_byte_packets: 388
     tx_1519_to_max_byte_packets: 0
     tx_fifo_underrun: 0


And if you enable the debugfs option there is more info
hidden there.

# mount -t debugfs debugfs /sys/kernel/debug
# cat /sys/kernel/debug/sky2/eth0


* [PATCH] net/niu: remove one superfluous dma mask check
From: Sebastian Andrzej Siewior @ 2012-05-03 18:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Sebastian Andrzej Siewior

The idea here seems to be to get a 44bit DMA mask working and, if this
fails, to fall back to a 32bit DMA mask. The dma_mask variable is
assigned once to 44bit and never updated. pci_set_dma_mask() and
pci_set_consistent_dma_mask() are both implemented as functions so there
is no evil macro which might update dma_mask. Looking at the assembly, I
see a call to dma_set_mask() followed by dma_supported() and then a jump
past the second dma_set_mask(). The only way to get to the second
dma_set_mask() call is by an error code in the first one.

So I hereby remove the check since it looks superfluous. Please ignore
the patch if there is black magic involved.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/ethernet/sun/niu.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c
index c99b3b0..703c8cc 100644
--- a/drivers/net/ethernet/sun/niu.c
+++ b/drivers/net/ethernet/sun/niu.c
@@ -9838,7 +9838,7 @@ static int __devinit niu_pci_init_one(struct pci_dev *pdev,
 			goto err_out_release_parent;
 		}
 	}
-	if (err || dma_mask == DMA_BIT_MASK(32)) {
+	if (err) {
 		err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
 		if (err) {
 			dev_err(&pdev->dev, "No usable DMA configuration, aborting\n");
-- 
1.7.10
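
The reasoning in the commit message can be checked with a small
userspace model. DMA_BIT_MASK below has the same shape as the kernel
macro of that name; try_set_mask() and pick_dma_mask() are hypothetical
stand-ins for the PCI calls, and supported_bits models what the platform
accepts. This is a sketch of the control flow only, not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(): the n low bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Stand-in for pci_set_dma_mask(): 0 on success, -1 if mask is too wide. */
static int try_set_mask(uint64_t mask, unsigned int supported_bits)
{
	return (mask & ~DMA_BIT_MASK(supported_bits)) ? -1 : 0;
}

/* Returns the mask that ends up in use, or 0 if none is usable. */
static uint64_t pick_dma_mask(unsigned int supported_bits)
{
	uint64_t dma_mask = DMA_BIT_MASK(44);	/* assigned once, never updated */
	int err = try_set_mask(dma_mask, supported_bits);

	/* "dma_mask == DMA_BIT_MASK(32)" can never hold at this point. */
	assert(dma_mask != DMA_BIT_MASK(32));

	if (err) {
		err = try_set_mask(DMA_BIT_MASK(32), supported_bits);
		if (err)
			return 0;	/* "No usable DMA configuration" */
		return DMA_BIT_MASK(32);
	}
	return dma_mask;
}
```

In the model the 32bit fallback is reached only through err, which is
the observation the patch relies on: the second half of the removed
condition never contributes.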


* RE: [PATCH] net: davinci_emac: Add pre_open, post_stop platform callbacks
From: Bedia, Vaibhav @ 2012-05-03 18:21 UTC (permalink / raw)
  To: Mark A. Greer
  Cc: netdev@vger.kernel.org, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20120503160917.GA11310@animalcreek.com>

On Thu, May 03, 2012 at 21:39:18, Mark A. Greer wrote:
> On Thu, May 03, 2012 at 10:44:44AM +0000, Bedia, Vaibhav wrote:
> > On Thu, May 03, 2012 at 05:17:18, Mark A. Greer wrote:
> > > From: "Mark A. Greer" <mgreer@animalcreek.com>
> > > 
> > > The davinci EMAC driver has been incorporated into the am35x
> > > family of SoCs, which is OMAP-based.  The incorporation is
> > > incomplete in that the EMAC cannot unblock the [ARM] core if
> > > it's blocked on a 'wfi' instruction.  This is an issue with
> > > the cpu_idle code because it has the core execute a 'wfi'
> > > instruction.
> > > 
> > > To work around this issue, add platform data callbacks which
> > > are called at the beginning of the open routine and at the
> > > end of the stop routine of the davinci_emac driver.  The
> > > callbacks allow the platform code to issue disable_hlt() and
> > > enable_hlt() calls appropriately.  Calling disable_hlt()
> > > prevents cpu_idle from issuing the 'wfi' instruction.
> > > 
> > > It is not sufficient to simply call disable_hlt() when
> > > there is an EMAC present because it could be present but
> > > not actually used in which case, we do want the 'wfi' to
> > > be executed.
> > > 
> > 
> > Are you trying to say that if ARM executes _just_ wfi and _absolutely
> > nothing else_ is done in the OMAP PM code, EMAC stops working?
> 
> No, I'm saying the EMAC can't wake the core from the wfi so if nothing
> else happens in the system, it's effectively hung.  If something else
> does happen in the system (e.g., a timer expires), then the system is
> extremely slow because it's only waking up when a timer (or
> something else) wakes it up, but not on net traffic.  This is very apparent
> when using an nfs-mounted rootfs. It doesn't hang but it's extremely
> slow because occasionally something else wakes up the core but it
> spends most of its time stuck in the wfi when it should be handling
> net/nfs traffic.
> 

So, if I understood this correctly, it's effectively like blocking a low power
state transition (here wfi execution) when EMAC is active?


* Re: [PATCH] net: davinci_emac: Add pre_open, post_stop platform callbacks
From: Mark A. Greer @ 2012-05-03 18:46 UTC (permalink / raw)
  To: Bedia, Vaibhav
  Cc: netdev@vger.kernel.org, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <B5906170F1614E41A8A28DE3B8D121433EA73591@DBDE01.ent.ti.com>

On Thu, May 03, 2012 at 06:21:27PM +0000, Bedia, Vaibhav wrote:
> On Thu, May 03, 2012 at 21:39:18, Mark A. Greer wrote:
> > On Thu, May 03, 2012 at 10:44:44AM +0000, Bedia, Vaibhav wrote:
> > > On Thu, May 03, 2012 at 05:17:18, Mark A. Greer wrote:
> > > > From: "Mark A. Greer" <mgreer@animalcreek.com>
> > > > 
> > > > The davinci EMAC driver has been incorporated into the am35x
> > > > family of SoCs, which is OMAP-based.  The incorporation is
> > > > incomplete in that the EMAC cannot unblock the [ARM] core if
> > > > it's blocked on a 'wfi' instruction.  This is an issue with
> > > > the cpu_idle code because it has the core execute a 'wfi'
> > > > instruction.
> > > > 
> > > > To work around this issue, add platform data callbacks which
> > > > are called at the beginning of the open routine and at the
> > > > end of the stop routine of the davinci_emac driver.  The
> > > > callbacks allow the platform code to issue disable_hlt() and
> > > > enable_hlt() calls appropriately.  Calling disable_hlt()
> > > > prevents cpu_idle from issuing the 'wfi' instruction.
> > > > 
> > > > It is not sufficient to simply call disable_hlt() when
> > > > there is an EMAC present because it could be present but
> > > > not actually used in which case, we do want the 'wfi' to
> > > > be executed.
> > > > 
> > > 
> > > Are you trying to say that if ARM executes _just_ wfi and _absolutely
> > > nothing else_ is done in the OMAP PM code, EMAC stops working?
> > 
> > No, I'm saying the EMAC can't wake the core from the wfi so if nothing
> > else happens in the system, it's effectively hung.  If something else
> > does happen in the system (e.g., a timer expires), then the system is
> > extremely slow because it's only waking up when a timer (or
> > something else) wakes it up, but not on net traffic.  This is very apparent
> > when using an nfs-mounted rootfs. It doesn't hang but it's extremely
> > slow because occasionally something else wakes up the core but it
> > spends most of its time stuck in the wfi when it should be handling
> > net/nfs traffic.
> > 
> 
> So, if I understood this correctly, it's effectively like blocking a low power
> state transition (here wfi execution) when EMAC is active?

Assuming "it" is my patch, correct.

Mark
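
As a rough model of the mechanism being discussed: the driver calls
optional platform callbacks around open and stop, and the platform uses
them to keep a count that stops the idle loop from executing 'wfi' while
an EMAC is in use. Everything named below (emac_pdata_sketch,
hlt_block_count, the sketch_* helpers) is a hypothetical stand-in for
illustration; the real patch's structure members and the kernel's
disable_hlt()/enable_hlt() are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

static int hlt_block_count;	/* > 0 means the idle loop must skip wfi */

static void sketch_disable_hlt(void)
{
	hlt_block_count++;
}

static void sketch_enable_hlt(void)
{
	assert(hlt_block_count > 0);	/* calls must be balanced */
	hlt_block_count--;
}

static int idle_may_wfi(void)
{
	return hlt_block_count == 0;
}

/* Platform data with optional hooks, as the patch subject describes. */
struct emac_pdata_sketch {
	void (*pre_open)(void);
	void (*post_stop)(void);
};

static void emac_open(const struct emac_pdata_sketch *pdata)
{
	if (pdata && pdata->pre_open)
		pdata->pre_open();	/* first thing in the open routine */
	/* ... bring the interface up ... */
}

static void emac_stop(const struct emac_pdata_sketch *pdata)
{
	/* ... shut the interface down ... */
	if (pdata && pdata->post_stop)
		pdata->post_stop();	/* last thing in the stop routine */
}

/* An am35x-like board wires the hooks; other boards leave them NULL. */
static const struct emac_pdata_sketch am35x_like_pdata = {
	.pre_open = sketch_disable_hlt,
	.post_stop = sketch_enable_hlt,
};
```

Because the hooks fire only on open and stop, an EMAC that is present
but never brought up leaves the count at zero and 'wfi' stays enabled,
which matches the last paragraph of the quoted commit message.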


* Re: 3.4-rc: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
From: Stefan Richter @ 2012-05-03 18:48 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev
In-Reply-To: <20120501200403.GA20763@electric-eye.fr.zoreil.com>

On May 01 Francois Romieu wrote:
[...]
> Can you apply the patch below on top of current ethtool
> (git://git.kernel.org/pub/scm/network/ethtool/ethtool.git) and see
> if it is enough to compare the register dumps (ethtool -d eth0).
[...]
> This is a firmware free chipset anyway. Nothing strange in the interface
> stats (ethtool -S eth0) ?
> 
> You may have to narrow things. Can you check if the r8169.c at
> 036dafa28da1e2565a8529de2ae663c37b7a0060 behaves the same ?

I will follow up on this eventually, but it may take quite some time due
to interfering work.

Thank you for looking into it and giving directions,
-- 
Stefan Richter
-=====-===-- -=-= ---==
http://arcgraph.de/sr/


* Re: [PATCH] net/niu: remove one superfluous dma mask check
From: David Miller @ 2012-05-03 18:55 UTC (permalink / raw)
  To: bigeasy; +Cc: netdev
In-Reply-To: <1336069320-23639-1-git-send-email-bigeasy@linutronix.de>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Thu,  3 May 2012 20:22:00 +0200

> The idea here seems to be to get a 44bit DMA mask working and, if this
> fails, to fall back to a 32bit DMA mask. The dma_mask variable is
> assigned once to 44bit and never updated. pci_set_dma_mask() and
> pci_set_consistent_dma_mask() are both implemented as functions so there
> is no evil macro which might update dma_mask. Looking at the assembly, I
> see a call to dma_set_mask() followed by dma_supported() and then a jump
> past the second dma_set_mask(). The only way to get to the second
> dma_set_mask() call is by an error code in the first one.
> 
> So I hereby remove the check since it looks superfluous. Please ignore
> the patch if there is black magic involved.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

It looks like it's been this way since day one. :-)

Applied, thanks.


* Re: sky2 still badly broken
From: Niccolò Belli @ 2012-05-03 19:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20120503111536.269490e3@nehalam.linuxnetplumber.net>

On 03/05/2012 20:15, Stephen Hemminger wrote:
> Maybe ethtool registers have some info?
>
> # ethtool -S eth0

After:
64 bytes from 8.8.8.8: icmp_req=592 ttl=47 time=61.0 ms
64 bytes from 8.8.8.8: icmp_req=593 ttl=47 time=64.9 ms
64 bytes from 8.8.8.8: icmp_req=594 ttl=47 time=61.8 ms
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
[...]

I get:

laptop ~ # ethtool eth0
Settings for eth0:
         Supported ports: [ TP ]
         Supported link modes:   10baseT/Half 10baseT/Full
                                 100baseT/Half 100baseT/Full
                                 1000baseT/Half 1000baseT/Full
         Supported pause frame use: No
         Supports auto-negotiation: Yes
         Advertised link modes:  100baseT/Full
         Advertised pause frame use: No
         Advertised auto-negotiation: Yes
         Speed: 100Mb/s
         Duplex: Full
         Port: Twisted Pair
         PHYAD: 0
         Transceiver: internal
         Auto-negotiation: on
         MDI-X: Unknown
         Supports Wake-on: pg
         Wake-on: g
         Current message level: 0x000000ff (255)
                                drv probe link timer ifdown ifup rx_err tx_err
         Link detected: yes
laptop ~ # ethtool -S eth0
NIC statistics:
      tx_bytes: 118170
      rx_bytes: 891033
      tx_broadcast: 36
      rx_broadcast: 346
      tx_multicast: 4
      rx_multicast: 332
      tx_unicast: 1131
      rx_unicast: 1235
      tx_mac_pause: 0
      rx_mac_pause: 0
      collisions: 0
      late_collision: 0
      aborted: 0
      single_collisions: 0
      multi_collisions: 0
      rx_short: 0
      rx_runt: 0
      rx_64_byte_packets: 344
      rx_65_to_127_byte_packets: 654
      rx_128_to_255_byte_packets: 362
      rx_256_to_511_byte_packets: 50
      rx_512_to_1023_byte_packets: 11
      rx_1024_to_1518_byte_packets: 492
      rx_1518_to_max_byte_packets: 0
      rx_too_long: 0
      rx_fifo_overflow: 0
      rx_jabber: 0
      rx_fcs_error: 0
      tx_64_byte_packets: 45
      tx_65_to_127_byte_packets: 1074
      tx_128_to_255_byte_packets: 0
      tx_256_to_511_byte_packets: 52
      tx_512_to_1023_byte_packets: 0
      tx_1024_to_1518_byte_packets: 0
      tx_1519_to_max_byte_packets: 0
      tx_fifo_underrun: 0

> And if you enable the debugfs option there is more info
> hidden there.
>
> # mount -t debugfs debugfs /sys/kernel/debug
> # cat /sys/kernel/debug/sky2/eth0

There is no sky2 directory, maybe I don't have enough debug options in 
my kernel config.

Niccolò


* RE: [PATCH] net: davinci_emac: Add pre_open, post_stop platform callbacks
From: Bedia, Vaibhav @ 2012-05-03 19:25 UTC (permalink / raw)
  To: Mark A. Greer
  Cc: netdev@vger.kernel.org, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20120503184632.GA28089@animalcreek.com>

On Fri, May 04, 2012 at 00:16:32, Mark A. Greer wrote:
[...]
> > 
> > So, if I understood this correctly, it's effectively like blocking a low power
> > state transition (here wfi execution) when EMAC is active?
> 
> Assuming "it" is my patch, correct.
> 

Recently I was thinking about how to get certain drivers to disallow some or all
low power states and to me this also seems to fall in a similar category.

One of the suggestions that I got was to check if the 'wakeup' entry associated with
the device under sysfs could be leveraged for this. The PM code could maintain
a whitelist (or blacklist) of devices and it decides the low power state to enter
based on the 'wakeup' entries associated with these devices. In this particular case,
maybe the driver could simply set this entry to non-wakeup capable when necessary and
then let the PM code take care of skipping the wfi execution.

Thoughts/brickbats welcome :)

Regards,
Vaibhav


* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
From: John Fastabend @ 2012-05-03 19:26 UTC (permalink / raw)
  To: shemminger
  Cc: Michael S. Tsirkin, bhutchings, sri, hadi, jeffrey.t.kirsher,
	netdev, gregory.v.rose, krkumar2, roprabhu
In-Reply-To: <20120503054859.GH8266@redhat.com>

On 5/2/2012 10:48 PM, Michael S. Tsirkin wrote:
> On Wed, May 02, 2012 at 02:52:33PM -0700, John Fastabend wrote:
>> On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
>>> On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
>>>> From: John Fastabend <john.r.fastabend@intel.com>
>>>> Date: Sun, 15 Apr 2012 09:43:51 -0700
>>>>
>>>>> The following series is a submission for net-next to allow
>>>>> embedded switches and other stacked devices other than the
>>>>> Linux bridge to manage a forwarding database.
>>>>>
>>>>> Previously discussed here,
>>>>>
>>>>> http://lists.openwall.net/netdev/2012/03/19/26
>>>>>
>>>>> v4: propagate return codes correctly for ndo_dflt_fdb_dump()
>>>>>
>>>>> v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>>>>>     error and add the flags field to change and get link routines.
>>>>>
>>>>> v2: addressed feedback from Ben Hutchings resolving a typo in the
>>>>>     multicast add/del routines and improving the error handling
>>>>>     when both NTF_SELF and NTF_MASTER are set.
>>>>>
>>>>> I've tested this with 'br' tool published by Stephen Hemminger
>>>>> soon to be renamed 'bridge' I believe and various traffic
>>>>> generators mostly pktgen, ping, and netperf.
>>>>
>>>> All applied, if we need any more tweaks we can just add them
>>>> on top of this work.
>>>>
>>>> Thanks John.
>>>
>>> John, do you plan to update kvm userspace to use this interface?
>>>
>>
>> No immediate plans. I would really appreciate it if you or one
>> of the IBM developers working in this space took it on. Of course
>> if no one steps up I guess I can eventually get at it but it will
>> be some time. For now I've been doing this manually with the bridge
>> tool yet to be published.
>>
>> .John
> 
> It'll be easier once you publish the tool, qemu can just run
> scripts like it does for ifup/ifdown now.
> 

Agreed.

Stephen, when do you think you will be able to submit 'br' renamed
'bridge' for the iproute2 package? I've been using it now for some time
without any issues and it seems to be working great for me.

Thanks,
John


* Re: sky2 still badly broken
From: Niccolò Belli @ 2012-05-03 19:36 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20120503082352.771acab3@nehalam.linuxnetplumber.net>

On 03/05/2012 17:23, Stephen Hemminger wrote:
> The receiver on some versions of the chip can't keep up with full speed
> of 1G bit/sec. The receive  FIFO has hardware issues, and since I don't
> work for Marvell, working around the problem is guesswork.

Just one question: if that is true, why don't I have any problems with a
point-to-point gigabit link? I transferred 3GB from RAM to RAM (ramfs)
at 100+MB/s without any issue in a point-to-point setup, but as soon as I
attach it to the switch I have problems (very low transfer rate and tons
of rx errors).

Niccolò


* [PATCH v4] tilegx network driver: initial support
From: Chris Metcalf @ 2012-05-03 16:41 UTC (permalink / raw)
  To: David Miller, arnd, linux-kernel, netdev
In-Reply-To: <20120503.014156.149171097979026872.davem@davemloft.net>

This change adds support for the tilegx network driver based on the
GXIO IORPC support in the tilegx software stack, using the on-chip
mPIPE packet processing engine.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
This version removes the USE_SIM_PRINTF hack from the driver.

 drivers/net/ethernet/tile/Kconfig  |    1 +
 drivers/net/ethernet/tile/Makefile |    4 +-
 drivers/net/ethernet/tile/tilegx.c | 1919 ++++++++++++++++++++++++++++++++++++
 3 files changed, 1922 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/tile/tilegx.c

diff --git a/drivers/net/ethernet/tile/Kconfig b/drivers/net/ethernet/tile/Kconfig
index 2d9218f..9184b61 100644
--- a/drivers/net/ethernet/tile/Kconfig
+++ b/drivers/net/ethernet/tile/Kconfig
@@ -7,6 +7,7 @@ config TILE_NET
 	depends on TILE
 	default y
 	select CRC32
+	select TILE_GXIO_MPIPE if TILEGX
 	---help---
 	  This is a standard Linux network device driver for the
 	  on-chip Tilera Gigabit Ethernet and XAUI interfaces.
diff --git a/drivers/net/ethernet/tile/Makefile b/drivers/net/ethernet/tile/Makefile
index f634f14..0ef9eef 100644
--- a/drivers/net/ethernet/tile/Makefile
+++ b/drivers/net/ethernet/tile/Makefile
@@ -4,7 +4,7 @@
 
 obj-$(CONFIG_TILE_NET) += tile_net.o
 ifdef CONFIG_TILEGX
-tile_net-objs := tilegx.o mpipe.o iorpc_mpipe.o dma_queue.o
+tile_net-y := tilegx.o
 else
-tile_net-objs := tilepro.o
+tile_net-y := tilepro.o
 endif
diff --git a/drivers/net/ethernet/tile/tilegx.c b/drivers/net/ethernet/tile/tilegx.c
new file mode 100644
index 0000000..297c074
--- /dev/null
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -0,0 +1,1919 @@
+/*
+ * Copyright 2012 Tilera Corporation. All Rights Reserved.
+ *
+ *   This program is free software; you can redistribute it and/or
+ *   modify it under the terms of the GNU General Public License
+ *   as published by the Free Software Foundation, version 2.
+ *
+ *   This program is distributed in the hope that it will be useful, but
+ *   WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ *   NON INFRINGEMENT.  See the GNU General Public License for
+ *   more details.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>      /* printk() */
+#include <linux/slab.h>        /* kmalloc() */
+#include <linux/errno.h>       /* error codes */
+#include <linux/types.h>       /* size_t */
+#include <linux/interrupt.h>
+#include <linux/in.h>
+#include <linux/irq.h>
+#include <linux/netdevice.h>   /* struct device, and other headers */
+#include <linux/etherdevice.h> /* eth_type_trans */
+#include <linux/skbuff.h>
+#include <linux/ioctl.h>
+#include <linux/cdev.h>
+#include <linux/hugetlb.h>
+#include <linux/in6.h>
+#include <linux/timer.h>
+#include <linux/io.h>
+#include <linux/ctype.h>
+#include <asm/checksum.h>
+#include <asm/homecache.h>
+
+#include <gxio/mpipe.h>
+
+/* For TSO */
+#include <linux/ip.h>
+#include <linux/tcp.h>
+
+
+#include <arch/sim.h>
+
+
+
+/* First, "tile_net_init_module()" initializes each network cpu to
+ * handle incoming packets, and initializes all the network devices.
+ *
+ * Then, "ifconfig DEVICE up" calls "tile_net_open()", which will
+ * turn on packet processing, if needed.
+ *
+ * If "ifconfig DEVICE down" is called, it uses "tile_net_stop()" to
+ * stop egress, and possibly turn off packet processing.
+ *
+ * We start out with the ingress IRQ enabled on each CPU.  When it
+ * fires, it is automatically disabled, and we call "napi_schedule()".
+ * This will cause "tile_net_poll()" to be called, which will pull
+ * packets from the netio queue, filtering them out, or passing them
+ * to "netif_receive_skb()".  If our budget is exhausted, we will
+ * return, knowing we will be called again later.  Otherwise, we
+ * reenable the ingress IRQ, and call "napi_complete()".
+ *
+ *
+ * NOTE: Failing to free completions for an arbitrarily long time
+ * (which is defined to be illegal) does in fact cause bizarre problems.
+ *
+ * NOTE: The egress code can be interrupted by the interrupt handler.
+ */
+
+
+/* HACK: Define to support GSO.
+ * ISSUE: This may actually hurt performance of the TCP blaster.
+ */
+#undef TILE_NET_GSO
+
+/* HACK: Define to support TSO. */
+#define TILE_NET_TSO
+
+/* Use 3000 to enable the Linux Traffic Control (QoS) layer, else 0. */
+#define TILE_NET_TX_QUEUE_LEN 0
+
+/* Define to dump packets (prints out the whole packet on tx and rx). */
+#undef TILE_NET_DUMP_PACKETS
+
+/* Define to use "round robin" distribution. */
+#undef TILE_NET_ROUND_ROBIN
+
+/* Default transmit lockup timeout period, in jiffies. */
+#define TILE_NET_TIMEOUT (5 * HZ)
+
+/* The maximum number of distinct channels (idesc.channel is 5 bits). */
+#define TILE_NET_CHANNELS 32
+
+/* Maximum number of idescs to handle per "poll". */
+#define TILE_NET_BATCH 128
+
+/* Maximum number of packets to handle per "poll". */
+#define TILE_NET_WEIGHT 64
+
+/* Number of entries in each iqueue. */
+#define IQUEUE_ENTRIES 512
+
+/* Number of entries in each equeue. */
+#define EQUEUE_ENTRIES 2048
+
+/* Total header bytes per equeue slot.  Must be big enough for 2 bytes
+ * of NET_IP_ALIGN alignment, plus 14 bytes (?) of L2 header, plus up to
+ * 60 bytes of actual TCP header.  We round up to align to cache lines.
+ */
+#define HEADER_BYTES 128
+
+/* Maximum completions per cpu per device (must be a power of two).
+ * ISSUE: What is the right number here?
+ */
+#define TILE_NET_MAX_COMPS 64
+
+
+#define ROUND_UP(n, align) (((n) + (align) - 1) & -(align))
+
+
+#define MAX_FRAGS (65536 / PAGE_SIZE + 2 + 1)
+
+
+MODULE_AUTHOR("Tilera Corporation");
+MODULE_LICENSE("GPL");
+
+
+
+/* A "packet fragment" (a chunk of memory). */
+struct frag {
+	void *buf;
+	size_t length;
+};
+
+
+/* A single completion. */
+struct tile_net_comp {
+	/* The "complete_count" when the completion will be complete. */
+	s64 when;
+	/* The buffer to be freed when the completion is complete. */
+	struct sk_buff *skb;
+};
+
+
+/* The completions for a given cpu and device. */
+struct tile_net_comps {
+	/* The completions. */
+	struct tile_net_comp comp_queue[TILE_NET_MAX_COMPS];
+	/* The number of completions used. */
+	unsigned long comp_next;
+	/* The number of completions freed. */
+	unsigned long comp_last;
+};
+
+
+/* Info for a specific cpu. */
+struct tile_net_info {
+	/* The NAPI struct. */
+	struct napi_struct napi;
+	/* Packet queue. */
+	gxio_mpipe_iqueue_t iqueue;
+	/* Our cpu. */
+	int my_cpu;
+	/* True if iqueue is valid. */
+	bool has_iqueue;
+	/* NAPI flags. */
+	bool napi_added;
+	bool napi_enabled;
+	/* Number of small sk_buffs which must still be provided. */
+	unsigned int num_needed_small_buffers;
+	/* Number of large sk_buffs which must still be provided. */
+	unsigned int num_needed_large_buffers;
+	/* A timer for handling egress completions. */
+	struct timer_list egress_timer;
+	/* True if "egress_timer" is scheduled. */
+	bool egress_timer_scheduled;
+	/* Comps for each egress channel. */
+	struct tile_net_comps *comps_for_echannel[TILE_NET_CHANNELS];
+};
+
+
+/* Info for egress on a particular egress channel. */
+struct tile_net_egress {
+	/* The "equeue". */
+	gxio_mpipe_equeue_t *equeue;
+	/* The headers for TSO. */
+	unsigned char *headers;
+};
+
+
+/* Info for a specific device. */
+struct tile_net_priv {
+	/* Our network device. */
+	struct net_device *dev;
+	/* The primary link. */
+	gxio_mpipe_link_t link;
+	/* The primary channel, if open, else -1. */
+	int channel;
+	/* The "loopify" egress link, if needed. */
+	gxio_mpipe_link_t loopify_link;
+	/* The "loopify" egress channel, if open, else -1. */
+	int loopify_channel;
+	/* The egress channel (channel or loopify_channel). */
+	int echannel;
+	/* Total stats. */
+	struct net_device_stats stats;
+};
+
+
+/* Egress info, indexed by "priv->echannel" (lazily created as needed). */
+static struct tile_net_egress egress_for_echannel[TILE_NET_CHANNELS];
+
+/* Devices currently associated with each channel.
+ * NOTE: The array entry can become NULL after ifconfig down, but
+ * we do not free the underlying net_device structures, so it is
+ * safe to use a pointer after reading it from this array.
+ */
+static struct net_device *tile_net_devs_for_channel[TILE_NET_CHANNELS];
+
+/* A mutex for "tile_net_devs_for_channel". */
+static DEFINE_MUTEX(tile_net_devs_for_channel_mutex);
+
+/* The per-cpu info. */
+static DEFINE_PER_CPU(struct tile_net_info, per_cpu_info);
+
+/* The "context" for all devices. */
+static gxio_mpipe_context_t context;
+
+/* The small/large "buffer stacks". */
+static int small_buffer_stack = -1;
+static int large_buffer_stack = -1;
+
+/* The buckets. */
+static int first_bucket = -1;
+static int num_buckets = 1;
+
+/* The ingress irq. */
+static int ingress_irq = -1;
+
+
+/* Text value of tile_net.cpus if passed as a module parameter. */
+static char *network_cpus_string;
+
+/* The actual cpus in "network_cpus". */
+static struct cpumask network_cpus_map;
+
+
+/* If "loopify=LINK" was specified, this is "LINK". */
+static char *loopify_link_name;
+
+
+/* The "tile_net.cpus" argument specifies the cpus that are dedicated
+ * to handle ingress packets.
+ *
+ * The parameter should be in the form "tile_net.cpus=m-n[,x-y]", where
+ * m, n, x, and y are integer cpu numbers.  None of the listed cpus may
+ * be a dedicated cpu or a dataplane cpu.
+ */
+static bool network_cpus_init(void)
+{
+	char buf[1024];
+	int rc;
+
+	if (network_cpus_string == NULL)
+		return false;
+
+	rc = cpulist_parse_crop(network_cpus_string, &network_cpus_map);
+	if (rc != 0) {
+		pr_warning("tile_net.cpus=%s: malformed cpu list\n",
+			   network_cpus_string);
+		return false;
+	}
+
+	/* Remove dedicated cpus. */
+	cpumask_and(&network_cpus_map, &network_cpus_map, cpu_possible_mask);
+
+
+	if (cpumask_empty(&network_cpus_map)) {
+		pr_warning("Ignoring empty tile_net.cpus='%s'.\n",
+			   network_cpus_string);
+		return false;
+	}
+
+	cpulist_scnprintf(buf, sizeof(buf), &network_cpus_map);
+	pr_info("Linux network CPUs: %s\n", buf);
+	return true;
+}
+
+module_param_named(cpus, network_cpus_string, charp, 0444);
+MODULE_PARM_DESC(cpus, "cpulist of cores that handle network interrupts");
+
+
+/* The "tile_net.loopify=LINK" argument causes the named device to
+ * actually use "loop0" for ingress, and "loop1" for egress.  This
+ * allows an app to sit between the actual link and linux, passing
+ * (some) packets along to linux, and forwarding (some) packets sent
+ * out by linux.
+ */
+module_param_named(loopify, loopify_link_name, charp, 0444);
+MODULE_PARM_DESC(loopify, "name the device to use loop0/1 for ingress/egress");
+
+
+#ifdef TILE_NET_DUMP_PACKETS
+/* Dump a packet. */
+static void dump_packet(unsigned char *data, unsigned long length, char *s)
+{
+	unsigned long i;
+	static unsigned int count;
+	char buf[128];
+
+	pr_info("Dumping %s packet of 0x%lx bytes at %p [%u]\n",
+		s, length, data, count++);
+
+	pr_info("\n");
+
+	for (i = 0; i < length; i++) {
+		if ((i & 0xf) == 0)
+			sprintf(buf, "%8.8lx:", i);
+		sprintf(buf + strlen(buf), " %02x", data[i]);
+		if ((i & 0xf) == 0xf || i == length - 1)
+			pr_info("%s\n", buf);
+	}
+
+	pr_info("\n");
+}
+#endif
+
+
+/* Allocate and push a buffer. */
+static bool tile_net_provide_buffer(bool small)
+{
+	int stack = small ? small_buffer_stack : large_buffer_stack;
+
+	/* Buffers must be aligned. */
+	const unsigned long align = 128;
+
+	/* Note that "dev_alloc_skb()" adds NET_SKB_PAD more bytes,
+	 * and also "reserves" that many bytes.
+	 */
+	int len = sizeof(struct sk_buff **) + align + (small ? 128 : 1664);
+
+	/* Allocate (or fail). */
+	struct sk_buff *skb = dev_alloc_skb(len);
+	if (skb == NULL)
+		return false;
+
+	/* Make room for a back-pointer to 'skb'. */
+	skb_reserve(skb, sizeof(struct sk_buff **));
+
+	/* Make sure we are aligned. */
+	skb_reserve(skb, -(long)skb->data & (align - 1));
+
+	/* Save a back-pointer to 'skb'. */
+	*(struct sk_buff **)(skb->data - sizeof(struct sk_buff **)) = skb;
+
+	/* Make sure "skb" and the back-pointer have been flushed. */
+	wmb();
+
+	gxio_mpipe_push_buffer(&context, stack,
+			       (void *)va_to_tile_io_addr(skb->data));
+
+	return true;
+}
+
+
+/* Provide linux buffers to mPIPE. */
+static void tile_net_provide_needed_buffers(struct tile_net_info *info)
+{
+	while (info->num_needed_small_buffers != 0) {
+		if (!tile_net_provide_buffer(true))
+			goto oops;
+		info->num_needed_small_buffers--;
+	}
+
+	while (info->num_needed_large_buffers != 0) {
+		if (!tile_net_provide_buffer(false))
+			goto oops;
+		info->num_needed_large_buffers--;
+	}
+
+	return;
+
+oops:
+
+	/* Add a description to the page allocation failure dump. */
+	pr_notice("Tile %d still needs some buffers\n", info->my_cpu);
+}
+
+
+/* Handle a packet.  Return true if "processed", false if "filtered". */
+static bool tile_net_handle_packet(struct tile_net_info *info,
+				    gxio_mpipe_idesc_t *idesc)
+{
+	struct net_device *dev = tile_net_devs_for_channel[idesc->channel];
+
+	void *va;
+
+	uint8_t l2_offset = gxio_mpipe_idesc_get_l2_offset(idesc);
+
+	void *buf;
+	unsigned long len;
+
+	int filter = 0;
+
+	/* Drop packets for which no buffer was available.
+	 * NOTE: This happens under heavy load.
+	 */
+	if (idesc->be) {
+		gxio_mpipe_iqueue_consume(&info->iqueue, idesc);
+		if (net_ratelimit())
+			pr_info("Dropping packet (insufficient buffers).\n");
+		return false;
+	}
+
+	/* Get the raw buffer VA. */
+	va = tile_io_addr_to_va((unsigned long)gxio_mpipe_idesc_get_va(idesc));
+
+	/* Get the actual packet start/length. */
+	buf = va + l2_offset;
+	len = gxio_mpipe_idesc_get_l2_length(idesc);
+
+	/* Point "va" at the raw buffer. */
+	va -= NET_IP_ALIGN;
+
+#ifdef TILE_NET_DUMP_PACKETS
+	dump_packet(buf, len, "rx");
+#endif /* TILE_NET_DUMP_PACKETS */
+
+	if (dev != NULL) {
+		/* ISSUE: Is this needed? */
+		dev->last_rx = jiffies;
+	}
+
+	if (dev == NULL || !(dev->flags & IFF_UP)) {
+		/* Filter packets received before we're up. */
+		filter = 1;
+	} else if (!(dev->flags & IFF_PROMISC)) {
+		/* ISSUE: "eth_type_trans()" implies that "IFF_PROMISC"
+		 * is set for "all silly devices", however, it appears
+		 * to NOT be set for us, so this code here DOES run.
+		 * FIXME: The classifier will soon detect "multicast".
+		 */
+		if (!is_multicast_ether_addr(buf)) {
+			/* Filter packets not for our address. */
+			const u8 *mine = dev->dev_addr;
+			filter = compare_ether_addr(mine, buf);
+		}
+	}
+
+	if (filter) {
+
+		/* ISSUE: Update "drop" statistics? */
+
+		gxio_mpipe_iqueue_drop(&info->iqueue, idesc);
+
+	} else {
+
+		struct tile_net_priv *priv = netdev_priv(dev);
+
+		/* Acquire the associated "skb". */
+		struct sk_buff **skb_ptr = va - sizeof(*skb_ptr);
+		struct sk_buff *skb = *skb_ptr;
+
+		/* Paranoia. */
+		if (skb->data != va)
+			panic("Corrupt linux buffer! "
+			      "buf=%p, skb=%p, skb->data=%p\n",
+			      buf, skb, skb->data);
+
+		/* Skip headroom, and any custom header. */
+		skb_reserve(skb, NET_IP_ALIGN + l2_offset);
+
+		/* Encode the actual packet length. */
+		skb_put(skb, len);
+
+		/* NOTE: This call also sets "skb->dev = dev".
+		 * ISSUE: The classifier provides us with "eth_type"
+		 * (aka "eth->h_proto"), which is basically the value
+		 * returned by "eth_type_trans()".
+		 * Note that "eth_type_trans()" computes "skb->pkt_type",
+		 * which would be useful for the "filter" check above,
+		 * if we had a (modifiable) "skb" to work with.
+		 */
+		skb->protocol = eth_type_trans(skb, dev);
+
+		/* Acknowledge "good" hardware checksums. */
+		if (idesc->cs && idesc->csum_seed_val == 0xFFFF)
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+		netif_receive_skb(skb);
+
+		/* Update stats. */
+		atomic_add(1, (atomic_t *)&priv->stats.rx_packets);
+		atomic_add(len, (atomic_t *)&priv->stats.rx_bytes);
+
+		/* Need a new buffer. */
+		if (idesc->size == GXIO_MPIPE_BUFFER_SIZE_128)
+			info->num_needed_small_buffers++;
+		else
+			info->num_needed_large_buffers++;
+	}
+
+	gxio_mpipe_iqueue_consume(&info->iqueue, idesc);
+
+	return !filter;
+}
+
+
+/* Handle some packets for the current CPU.
+ *
+ * This function handles up to TILE_NET_BATCH idescs per call.
+ *
+ * ISSUE: Since we do not provide new buffers until this function is
+ * complete, we must initially provide enough buffers for each network
+ * cpu to fill its iqueue and also its batched idescs.
+ *
+ * ISSUE: The "rotting packet" race condition occurs if a packet
+ * arrives after the queue appears to be empty, and before the
+ * hypervisor interrupt is re-enabled.
+ */
+static int tile_net_poll(struct napi_struct *napi, int budget)
+{
+	struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+
+	unsigned int work = 0;
+
+	gxio_mpipe_idesc_t *idesc;
+	int i, n;
+
+	/* Process packets. */
+	while ((n = gxio_mpipe_iqueue_try_peek(&info->iqueue, &idesc)) > 0) {
+		for (i = 0; i < n; i++) {
+			if (i == TILE_NET_BATCH)
+				goto done;
+			if (tile_net_handle_packet(info, idesc + i)) {
+				if (++work >= budget)
+					goto done;
+			}
+		}
+	}
+
+	/* There are no packets left. */
+	napi_complete(&info->napi);
+
+	/* Re-enable hypervisor interrupts. */
+	gxio_mpipe_enable_notif_ring_interrupt(&context, info->iqueue.ring);
+
+	/* HACK: Avoid the "rotting packet" problem. */
+	if (gxio_mpipe_iqueue_try_peek(&info->iqueue, &idesc) > 0)
+		napi_schedule(&info->napi);
+
+	/* ISSUE: Handle completions? */
+
+done:
+
+	tile_net_provide_needed_buffers(info);
+
+	return work;
+}
+
+
+/* Handle an ingress interrupt on the current cpu. */
+static irqreturn_t tile_net_handle_ingress_irq(int irq, void *unused)
+{
+	struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+	napi_schedule(&info->napi);
+	return IRQ_HANDLED;
+}
+
+
+/* Free some completions.  This must be called with interrupts blocked. */
+static void tile_net_free_comps(gxio_mpipe_equeue_t *equeue,
+				struct tile_net_comps *comps,
+				int limit, bool force_update)
+{
+	int n = 0;
+	while (comps->comp_last < comps->comp_next) {
+		unsigned int cid = comps->comp_last % TILE_NET_MAX_COMPS;
+		struct tile_net_comp *comp = &comps->comp_queue[cid];
+		if (!gxio_mpipe_equeue_is_complete(equeue, comp->when,
+						   force_update || n == 0))
+			return;
+		dev_kfree_skb_irq(comp->skb);
+		comps->comp_last++;
+		if (++n == limit)
+			return;
+	}
+}
+
+
+/* Make sure the egress timer is scheduled.
+ *
+ * Note that we use "schedule if not scheduled" logic instead of the more
+ * obvious "reschedule" logic, because "reschedule" is fairly expensive.
+ */
+static void tile_net_schedule_egress_timer(struct tile_net_info *info)
+{
+	if (!info->egress_timer_scheduled) {
+		mod_timer_pinned(&info->egress_timer, jiffies + 1);
+		info->egress_timer_scheduled = true;
+	}
+}
+
+
+/* The "function" for "info->egress_timer".
+ *
+ * This timer will reschedule itself as long as there are any pending
+ * completions expected for this tile.
+ */
+static void tile_net_handle_egress_timer(unsigned long arg)
+{
+	struct tile_net_info *info = (struct tile_net_info *)arg;
+
+	unsigned int i;
+
+	bool pending = false;
+
+	unsigned long irqflags;
+
+	local_irq_save(irqflags);
+
+	/* The timer is no longer scheduled. */
+	info->egress_timer_scheduled = false;
+
+	/* Free all possible comps for this tile. */
+	for (i = 0; i < TILE_NET_CHANNELS; i++) {
+		struct tile_net_egress *egress = &egress_for_echannel[i];
+		struct tile_net_comps *comps = info->comps_for_echannel[i];
+		if (comps->comp_last >= comps->comp_next)
+			continue;
+		tile_net_free_comps(egress->equeue, comps, -1, true);
+		pending = pending || (comps->comp_last < comps->comp_next);
+	}
+
+	/* Reschedule timer if needed. */
+	if (pending)
+		tile_net_schedule_egress_timer(info);
+
+	local_irq_restore(irqflags);
+}
+
+
+/* Prepare each CPU. */
+static void tile_net_prepare_cpu(void *unused)
+{
+	struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+
+	int my_cpu = smp_processor_id();
+
+	info->has_iqueue = false;
+
+	info->my_cpu = my_cpu;
+
+	/* Initialize the egress timer. */
+	init_timer(&info->egress_timer);
+	info->egress_timer.data = (unsigned long)info;
+	info->egress_timer.function = tile_net_handle_egress_timer;
+}
+
+
+/* Helper function for "tile_net_update()". */
+static void tile_net_update_cpu(void *arg)
+{
+	struct net_device *dev = arg;
+
+	struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+
+	if (info->has_iqueue) {
+		if (dev != NULL) {
+			if (!info->napi_added) {
+				/* FIXME: HACK: We use one of the devices.
+				 * ISSUE: We never call "netif_napi_del()".
+				 */
+				netif_napi_add(dev, &info->napi,
+					       tile_net_poll, TILE_NET_WEIGHT);
+				info->napi_added = true;
+			}
+			if (!info->napi_enabled) {
+				napi_enable(&info->napi);
+				info->napi_enabled = true;
+			}
+			enable_percpu_irq(ingress_irq, 0);
+		} else {
+			disable_percpu_irq(ingress_irq);
+			if (info->napi_enabled) {
+				napi_disable(&info->napi);
+				info->napi_enabled = false;
+			}
+			/* FIXME: Drain the iqueue. */
+		}
+	}
+}
+
+
+/* Helper function for tile_net_open() and tile_net_stop(). */
+static int tile_net_update(void)
+{
+	struct net_device *dev = NULL;
+	int channel;
+	long count = 0;
+	int cpu;
+
+	/* HACK: This is too big for the linux stack. */
+	static gxio_mpipe_rules_t rules;
+
+	gxio_mpipe_rules_init(&rules, &context);
+
+	/* TODO: Add support for "dmac" splitting? */
+	for (channel = 0; channel < TILE_NET_CHANNELS; channel++) {
+		if (tile_net_devs_for_channel[channel] == NULL)
+			continue;
+		if (dev == NULL) {
+			dev = tile_net_devs_for_channel[channel];
+			gxio_mpipe_rules_begin(&rules, first_bucket,
+					       num_buckets, NULL);
+			gxio_mpipe_rules_set_headroom(&rules, NET_IP_ALIGN);
+		}
+		gxio_mpipe_rules_add_channel(&rules, channel);
+		count++;
+	}
+
+	/* NOTE: This can happen if there is no classifier.
+	 * ISSUE: Can anything else cause it to happen?
+	 */
+	if (gxio_mpipe_rules_commit(&rules) != 0) {
+		pr_warning("Failed to update classifier rules!\n");
+		return -EIO;
+	}
+
+	/* Update all cpus, sequentially (to protect "netif_napi_add()"). */
+	for_each_online_cpu(cpu)
+		smp_call_function_single(cpu, tile_net_update_cpu, dev, 1);
+
+	/* HACK: Allow packets to flow. */
+	if (count != 0)
+		sim_enable_mpipe_links(0, -1);
+
+	return 0;
+}
+
+
+/* Helper function for "tile_net_init_cpus()". */
+static void tile_net_init_stacks(int network_cpus_count)
+{
+	int err;
+	int i;
+
+	gxio_mpipe_buffer_size_enum_t small_buf_size =
+		GXIO_MPIPE_BUFFER_SIZE_128;
+	gxio_mpipe_buffer_size_enum_t large_buf_size =
+		GXIO_MPIPE_BUFFER_SIZE_1664;
+
+	int num_buffers;
+
+	size_t stack_bytes;
+
+	pte_t pte = { 0 };
+
+	void *mem;
+
+	num_buffers =
+		network_cpus_count * (IQUEUE_ENTRIES + TILE_NET_BATCH);
+
+	/* Compute stack bytes, honoring the 64KB minimum alignment. */
+	stack_bytes = ROUND_UP(gxio_mpipe_calc_buffer_stack_bytes(num_buffers),
+			       64 * 1024);
+	if (stack_bytes > HPAGE_SIZE)
+		panic("Cannot allocate %d physically contiguous buffers.",
+		      num_buffers);
+
+#if 0
+	sim_printf("Using %d buffers for %d network cpus.\n",
+		   num_buffers, network_cpus_count);
+#endif
+
+	/* Allocate two buffer stacks. */
+	small_buffer_stack = gxio_mpipe_alloc_buffer_stacks(&context, 2, 0, 0);
+	if (small_buffer_stack < 0)
+		panic("Failure in 'gxio_mpipe_alloc_buffer_stacks()'");
+	large_buffer_stack = small_buffer_stack + 1;
+
+	/* Allocate the small buffer stack. */
+	mem = alloc_pages_exact(stack_bytes, GFP_KERNEL);
+	if (mem == NULL)
+		panic("Could not allocate buffer memory!");
+	err = gxio_mpipe_init_buffer_stack(&context, small_buffer_stack,
+					   small_buf_size,
+					   mem, stack_bytes, 0);
+	if (err != 0)
+		panic("Error %d in 'gxio_mpipe_init_buffer_stack()'.", err);
+
+	/* Allocate the large buffer stack. */
+	mem = alloc_pages_exact(stack_bytes, GFP_KERNEL);
+	if (mem == NULL)
+		panic("Could not allocate buffer memory!");
+	err = gxio_mpipe_init_buffer_stack(&context, large_buffer_stack,
+					   large_buf_size,
+					   mem, stack_bytes, 0);
+	if (err != 0)
+		panic("Error %d in 'gxio_mpipe_init_buffer_stack()'.", err);
+
+	/* Pin all the client memory. */
+	pte = pte_set_home(pte, PAGE_HOME_HASH);
+	err = gxio_mpipe_register_client_memory(&context, small_buffer_stack,
+						pte, 0);
+	if (err != 0)
+		panic("Error %d in 'gxio_mpipe_register_client_memory()'.",
+		      err);
+	err = gxio_mpipe_register_client_memory(&context, large_buffer_stack,
+						pte, 0);
+	if (err != 0)
+		panic("Error %d in 'gxio_mpipe_register_client_memory()'.",
+		      err);
+
+	/* Provide initial buffers. */
+	for (i = 0; i < num_buffers; i++) {
+		if (!tile_net_provide_buffer(true))
+			panic("Cannot provide initial buffers!");
+	}
+	for (i = 0; i < num_buffers; i++) {
+		if (!tile_net_provide_buffer(false))
+			panic("Cannot provide initial buffers!");
+	}
+}
+
+
+/* Actually initialize the mPIPE state. */
+static int tile_net_init_cpus(void)
+{
+	int network_cpus_count;
+
+	int ring;
+	int group;
+
+	int next_ring;
+
+	int cpu;
+
+	int i;
+
+#ifdef TILE_NET_ROUND_ROBIN
+	gxio_mpipe_bucket_mode_t mode = GXIO_MPIPE_BUCKET_ROUND_ROBIN;
+#else
+	/* Use random rebalancing. */
+	gxio_mpipe_bucket_mode_t mode = GXIO_MPIPE_BUCKET_STICKY_FLOW_LOCALITY;
+#endif
+
+	if (!hash_default) {
+		pr_warning("Networking requires hash_default!\n");
+		goto fail;
+	}
+
+	if (gxio_mpipe_init(&context, 0) != 0) {
+		pr_warning("Failed to initialize mPIPE!\n");
+		goto fail;
+	}
+
+	network_cpus_count = cpus_weight(network_cpus_map);
+
+	/* ISSUE: Handle failures more gracefully. */
+	tile_net_init_stacks(network_cpus_count);
+
+	/* Allocate one NotifRing for each network cpu. */
+	ring = gxio_mpipe_alloc_notif_rings(&context, network_cpus_count,
+					    0, 0);
+	if (ring < 0) {
+		pr_warning("Failed to allocate notif rings.\n");
+		goto fail;
+	}
+
+	/* ISSUE: Handle failures below more cleanly. */
+
+	/* Init NotifRings. */
+	next_ring = ring;
+
+	for_each_online_cpu(cpu) {
+
+		size_t notif_ring_size =
+			IQUEUE_ENTRIES * sizeof(gxio_mpipe_idesc_t);
+
+		int order;
+		struct page *page;
+		void *addr;
+
+		struct tile_net_info *info = &per_cpu(per_cpu_info, cpu);
+
+		/* ISSUE: This is overkill. */
+		size_t comps_size =
+			TILE_NET_CHANNELS * sizeof(struct tile_net_comps);
+
+		/* Allocate the "comps". */
+		order = get_order(comps_size);
+		page = homecache_alloc_pages(GFP_KERNEL, order, cpu);
+		if (page == NULL)
+			panic("Failed to allocate comps memory.");
+		addr = pfn_to_kaddr(page_to_pfn(page));
+		/* ISSUE: Is this needed? */
+		memset(addr, 0, comps_size);
+		for (i = 0; i < TILE_NET_CHANNELS; i++)
+			info->comps_for_echannel[i] =
+				addr + i * sizeof(struct tile_net_comps);
+
+		/* Only network cpus can receive packets. */
+		if (!cpu_isset(cpu, network_cpus_map))
+			continue;
+
+		/* Allocate the actual idescs array. */
+		order = get_order(notif_ring_size);
+		page = homecache_alloc_pages(GFP_KERNEL, order, cpu);
+		if (page == NULL)
+			panic("Failed to allocate iqueue memory.");
+		addr = pfn_to_kaddr(page_to_pfn(page));
+
+		if (gxio_mpipe_iqueue_init(&info->iqueue, &context, next_ring,
+					   addr, notif_ring_size, 0) != 0)
+			panic("Failure in 'gxio_mpipe_iqueue_init()'.");
+
+		info->has_iqueue = true;
+
+		next_ring++;
+	}
+
+	/* Allocate one NotifGroup. */
+	group = gxio_mpipe_alloc_notif_groups(&context, 1, 0, 0);
+	if (group < 0)
+		panic("Failure in 'gxio_mpipe_alloc_notif_groups()'.");
+
+#ifndef TILE_NET_ROUND_ROBIN
+	if (network_cpus_count > 4)
+		num_buckets = 256;
+	else if (network_cpus_count > 1)
+		num_buckets = 16;
+#endif
+
+	/* Allocate some buckets. */
+	first_bucket = gxio_mpipe_alloc_buckets(&context, num_buckets, 0, 0);
+	if (first_bucket < 0)
+		panic("Failure in 'gxio_mpipe_alloc_buckets()'.");
+
+	/* Init group and buckets. */
+	if (gxio_mpipe_init_notif_group_and_buckets(&context, group, ring,
+						    network_cpus_count,
+						    first_bucket, num_buckets,
+						    mode) != 0)
+		panic("Fail in 'gxio_mpipe_init_notif_group_and_buckets()'.");
+
+
+	/* Create an irq and register it. */
+	ingress_irq = create_irq();
+	if (ingress_irq < 0)
+		panic("Failed to create irq for ingress.");
+	tile_irq_activate(ingress_irq, TILE_IRQ_PERCPU);
+	BUG_ON(request_irq(ingress_irq, tile_net_handle_ingress_irq,
+			   0, NULL, NULL) != 0);
+
+	for_each_online_cpu(cpu) {
+
+		struct tile_net_info *info = &per_cpu(per_cpu_info, cpu);
+
+		int ring = info->iqueue.ring;
+
+		if (!info->has_iqueue)
+			continue;
+
+		gxio_mpipe_request_notif_ring_interrupt(&context,
+							cpu_x(cpu), cpu_y(cpu),
+							1, ingress_irq, ring);
+	}
+
+	return 0;
+
+fail:
+	return -EIO;
+}
+
+
+/* Create persistent egress info for a given egress channel.
+ *
+ * Note that this may be shared between, say, "gbe0" and "xgbe0".
+ *
+ * ISSUE: Defer header allocation until TSO is actually needed?
+ */
+static int tile_net_init_egress(int echannel)
+{
+	int headers_order;
+	struct page *headers_page;
+	unsigned char *headers;
+
+	size_t edescs_size;
+	int edescs_order;
+	struct page *edescs_page;
+	gxio_mpipe_edesc_t *edescs;
+
+	int equeue_order;
+	struct page *equeue_page;
+	gxio_mpipe_equeue_t *equeue;
+	int edma;
+
+	/* Only initialize once. */
+	if (egress_for_echannel[echannel].equeue != NULL)
+		return 0;
+
+	/* Allocate memory for the "headers". */
+	headers_order = get_order(EQUEUE_ENTRIES * HEADER_BYTES);
+	headers_page = alloc_pages(GFP_KERNEL, headers_order);
+	if (headers_page == NULL) {
+		pr_warning("Could not allocate memory for TSO headers.\n");
+		goto fail;
+	}
+	headers = pfn_to_kaddr(page_to_pfn(headers_page));
+
+	/* Allocate memory for the "edescs". */
+	edescs_size = EQUEUE_ENTRIES * sizeof(*edescs);
+	edescs_order = get_order(edescs_size);
+	edescs_page = alloc_pages(GFP_KERNEL, edescs_order);
+	if (edescs_page == NULL) {
+		pr_warning("Could not allocate memory for eDMA ring.\n");
+		goto fail_headers;
+	}
+	edescs = pfn_to_kaddr(page_to_pfn(edescs_page));
+
+	/* Allocate memory for the "equeue". */
+	equeue_order = get_order(sizeof(*equeue));
+	equeue_page = alloc_pages(GFP_KERNEL, equeue_order);
+	if (equeue_page == NULL) {
+		pr_warning("Could not allocate memory for equeue info.\n");
+		goto fail_edescs;
+	}
+	equeue = pfn_to_kaddr(page_to_pfn(equeue_page));
+
+	/* Allocate an edma ring. */
+	edma = gxio_mpipe_alloc_edma_rings(&context, 1, 0, 0);
+	if (edma < 0) {
+		pr_warning("Could not allocate edma ring.\n");
+		goto fail_equeue;
+	}
+
+	/* Initialize the equeue.  This should not fail. */
+	if (gxio_mpipe_equeue_init(equeue, &context, edma, echannel,
+				   edescs, edescs_size, 0) != 0)
+		panic("Failure in 'gxio_mpipe_equeue_init()'.");
+
+	/* Done. */
+	egress_for_echannel[echannel].equeue = equeue;
+	egress_for_echannel[echannel].headers = headers;
+	return 0;
+
+fail_equeue:
+	__free_pages(equeue_page, equeue_order);
+
+fail_edescs:
+	__free_pages(edescs_page, edescs_order);
+
+fail_headers:
+	__free_pages(headers_page, headers_order);
+
+fail:
+	return -EIO;
+}
+
+
+/* Help the kernel activate the given network interface. */
+static int tile_net_open(struct net_device *dev)
+{
+	struct tile_net_priv *priv = netdev_priv(dev);
+
+	/* Determine if this is the "loopify" device. */
+	bool loopify = (loopify_link_name != NULL) &&
+		!strcmp(dev->name, loopify_link_name);
+
+	int result;
+
+	mutex_lock(&tile_net_devs_for_channel_mutex);
+
+	if (ingress_irq < 0) {
+		result = tile_net_init_cpus();
+		if (result != 0)
+			goto fail;
+	}
+
+	if (priv->channel < 0) {
+		const char *ln = loopify ? "loop0" : dev->name;
+		if (gxio_mpipe_link_open(&priv->link, &context, ln, 0) < 0) {
+			netdev_err(dev, "Failed to open '%s'.\n", ln);
+			result = -EIO;
+			goto fail;
+		}
+		priv->channel = gxio_mpipe_link_channel(&priv->link);
+		BUG_ON(priv->channel < 0 ||
+		       priv->channel >= TILE_NET_CHANNELS);
+	}
+
+	if (loopify && priv->loopify_channel < 0) {
+		if (gxio_mpipe_link_open(&priv->loopify_link,
+					 &context, "loop1", 0) < 0) {
+			netdev_err(dev, "Failed to open 'loop1'.\n");
+			result = -EIO;
+			goto fail;
+		}
+		priv->loopify_channel =
+			gxio_mpipe_link_channel(&priv->loopify_link);
+		BUG_ON(priv->loopify_channel < 0 ||
+		       priv->loopify_channel >= TILE_NET_CHANNELS);
+	}
+
+	priv->echannel =
+		((priv->loopify_channel >= 0) ?
+		 priv->loopify_channel : priv->channel);
+
+	/* Initialize egress info (if needed). */
+	result = tile_net_init_egress(priv->echannel);
+	if (result != 0)
+		goto fail;
+
+	tile_net_devs_for_channel[priv->channel] = dev;
+
+	result = tile_net_update();
+	if (result != 0)
+		goto fail_channel;
+
+	mutex_unlock(&tile_net_devs_for_channel_mutex);
+
+	/* Start our transmit queue. */
+	netif_start_queue(dev);
+
+	netif_carrier_on(dev);
+
+	return 0;
+
+fail_channel:
+	tile_net_devs_for_channel[priv->channel] = NULL;
+
+fail:
+	if (priv->loopify_channel >= 0) {
+		if (gxio_mpipe_link_close(&priv->loopify_link) != 0)
+			pr_warning("Failed to close loopify link!\n");
+		else
+			priv->loopify_channel = -1;
+	}
+	if (priv->channel >= 0) {
+		if (gxio_mpipe_link_close(&priv->link) != 0)
+			pr_warning("Failed to close link!\n");
+		else
+			priv->channel = -1;
+	}
+
+	priv->echannel = -1;
+
+	mutex_unlock(&tile_net_devs_for_channel_mutex);
+	return result;
+}
+
+
+
+/* Help the kernel deactivate the given network interface. */
+static int tile_net_stop(struct net_device *dev)
+{
+	struct tile_net_priv *priv = netdev_priv(dev);
+
+	/* Stop our transmit queue. */
+	netif_stop_queue(dev);
+
+	mutex_lock(&tile_net_devs_for_channel_mutex);
+
+	tile_net_devs_for_channel[priv->channel] = NULL;
+
+	(void)tile_net_update();
+
+	if (priv->loopify_channel >= 0) {
+		if (gxio_mpipe_link_close(&priv->loopify_link) != 0)
+			pr_warning("Failed to close loopify link!\n");
+		priv->loopify_channel = -1;
+	}
+
+	if (priv->channel >= 0) {
+		if (gxio_mpipe_link_close(&priv->link) != 0)
+			pr_warning("Failed to close link!\n");
+		priv->channel = -1;
+	}
+
+	priv->echannel = -1;
+
+	mutex_unlock(&tile_net_devs_for_channel_mutex);
+
+	return 0;
+}
+
+
+/* Determine the VA for a fragment. */
+static inline void *tile_net_frag_buf(skb_frag_t *f)
+{
+	unsigned long pfn = page_to_pfn(skb_frag_page(f));
+	return pfn_to_kaddr(pfn) + f->page_offset;
+}
+
+
+/* This function takes "skb", consisting of a header template and a
+ * (presumably) huge payload, and egresses it as one or more segments
+ * (aka packets), each consisting of a (possibly modified) copy of the
+ * header plus a piece of the payload, via "tcp segmentation offload".
+ *
+ * Usually, "data" will contain the header template, of size "sh_len",
+ * and "sh->frags" will contain "skb->data_len" bytes of payload, and
+ * there will be "sh->gso_segs" segments.
+ *
+ * Sometimes, if "sendfile()" requires copying, we will be called with
+ * "data" containing the header and payload, with "frags" being empty.
+ *
+ * Sometimes, for example when using NFS over TCP, a single segment can
+ * span 3 fragments.  This requires special care below.
+ *
+ * See "emulate_large_send_offload()" for some reference code, which
+ * does not handle checksumming.
+ */
+static int tile_net_tx_tso(struct sk_buff *skb, struct net_device *dev)
+{
+	struct tile_net_priv *priv = netdev_priv(dev);
+
+	struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+
+	struct tile_net_egress *egress = &egress_for_echannel[priv->echannel];
+	gxio_mpipe_equeue_t *equeue = egress->equeue;
+
+	struct tile_net_comps *comps =
+		info->comps_for_echannel[priv->echannel];
+
+	unsigned int len = skb->len;
+	unsigned char *data = skb->data;
+
+	/* The ip header follows the ethernet header. */
+	struct iphdr *ih = ip_hdr(skb);
+	unsigned int ih_len = ih->ihl * 4;
+
+	/* Note that "nh" and "ih" point to the same place, by definition. */
+	unsigned char *nh = skb_network_header(skb);
+	unsigned int eh_len = nh - data;
+
+	/* The tcp header follows the ip header. */
+	struct tcphdr *th = (struct tcphdr *)(nh + ih_len);
+	unsigned int th_len = th->doff * 4;
+
+	/* The total number of header bytes. */
+	unsigned int sh_len = eh_len + ih_len + th_len;
+
+	/* Help compute "jh->check". */
+	unsigned int isum_hack =
+		((0xFFFF - ih->check) +
+		 (0xFFFF - ih->tot_len) +
+		 (0xFFFF - ih->id));
+
+	/* Help compute "uh->check". */
+	unsigned int tsum_hack = th->check + (0xFFFF ^ htons(len));
+
+	struct skb_shared_info *sh = skb_shinfo(skb);
+
+	/* The maximum payload size. */
+	unsigned int gso_size = sh->gso_size;
+
+	/* The size of the initial segments (including header). */
+	unsigned int mtu = sh_len + gso_size;
+
+	/* The size of the final segment (including header). */
+	unsigned int mtu2 = len - ((sh->gso_segs - 1) * gso_size);
+
+	/* Track tx stats. */
+	unsigned int tx_packets = 0;
+	unsigned int tx_bytes = 0;
+
+	/* Which segment are we on. */
+	unsigned int segment;
+
+	/* Get the initial ip "id". */
+	u16 id = ntohs(ih->id);
+
+	/* Get the initial tcp "seq". */
+	u32 seq = ntohl(th->seq);
+
+	/* The id of the current fragment (or -1). */
+	long f_id;
+
+	/* The size of the current fragment (or -1). */
+	long f_size;
+
+	/* The bytes used from the current fragment (or -1). */
+	long f_used;
+
+	/* The size of the current piece of payload. */
+	long n;
+
+	/* Prepare checksum info. */
+	unsigned int csum_start = skb_checksum_start_offset(skb);
+
+	/* The header/payload edesc's. */
+	gxio_mpipe_edesc_t edesc_head = { { 0 } };
+	gxio_mpipe_edesc_t edesc_body = { { 0 } };
+
+	/* Total number of edescs needed. */
+	unsigned int num_edescs = 0;
+
+	unsigned long irqflags;
+
+	/* First reserved egress slot. */
+	s64 slot;
+
+	int cid;
+
+	/* Empty packets (etc) would cause trouble below. */
+	BUG_ON(skb->data_len == 0);
+	BUG_ON(sh->nr_frags == 0);
+	BUG_ON(sh->gso_segs == 0);
+
+	/* We assume the frags contain the entire payload. */
+	BUG_ON(skb_headlen(skb) != sh_len);
+	BUG_ON(len != sh_len + skb->data_len);
+
+	/* Implicitly verify "gso_segs" and "gso_size". */
+	BUG_ON(mtu2 > mtu);
+
+	/* We only have HEADER_BYTES for each header. */
+	BUG_ON(NET_IP_ALIGN + sh_len > HEADER_BYTES);
+
+	/* Paranoia. */
+	BUG_ON(skb->protocol != htons(ETH_P_IP));
+	BUG_ON(ih->protocol != IPPROTO_TCP);
+	BUG_ON(skb->ip_summed != CHECKSUM_PARTIAL);
+	BUG_ON(csum_start != eh_len + ih_len);
+
+	/* NOTE: ".hwb = 0", so ".size" is unused.
+	 * NOTE: ".stack_idx" determines the TLB.
+	 */
+
+	/* Prepare to egress the headers. */
+	edesc_head.csum = 1;
+	edesc_head.csum_start = csum_start;
+	edesc_head.csum_dest = csum_start + skb->csum_offset;
+	edesc_head.xfer_size = sh_len;
+	edesc_head.stack_idx = large_buffer_stack;
+
+	/* Prepare to egress the body. */
+	edesc_body.stack_idx = large_buffer_stack;
+
+	/* Reset. */
+	f_id = f_size = f_used = -1;
+
+	/* Determine how many edesc's are needed. */
+	for (segment = 0; segment < sh->gso_segs; segment++) {
+
+		/* Detect the final segment. */
+		bool final = (segment == sh->gso_segs - 1);
+
+		/* The segment size (including header). */
+		unsigned int s_len = final ? mtu2 : mtu;
+
+		/* The size of the payload. */
+		unsigned int p_len = s_len - sh_len;
+
+		/* The bytes used from the payload. */
+		unsigned int p_used = 0;
+
+		/* One edesc for the header. */
+		num_edescs++;
+
+		/* One edesc for each piece of the payload. */
+		while (p_used < p_len) {
+
+			/* Advance as needed. */
+			while (f_used >= f_size) {
+				f_id++;
+				f_size = skb_frag_size(&sh->frags[f_id]);
+				f_used = 0;
+			}
+
+			/* Use bytes from the current fragment. */
+			n = p_len - p_used;
+			if (n > f_size - f_used)
+				n = f_size - f_used;
+			f_used += n;
+			p_used += n;
+
+			num_edescs++;
+		}
+	}
+
+	/* Verify all fragments consumed. */
+	BUG_ON(f_id + 1 != sh->nr_frags);
+	BUG_ON(f_used != f_size);
+
+	local_irq_save(irqflags);
+
+	/* Reserve slots, or return NETDEV_TX_BUSY if "full". */
+	slot = gxio_mpipe_equeue_try_reserve(equeue, num_edescs);
+	if (slot < 0) {
+		local_irq_restore(irqflags);
+		/* ISSUE: "Virtual device xxx asks to queue packet". */
+		return NETDEV_TX_BUSY;
+	}
+
+	/* Reset. */
+	f_id = f_size = f_used = -1;
+
+	/* Prepare all the headers. */
+	for (segment = 0; segment < sh->gso_segs; segment++) {
+
+		/* Detect the final segment. */
+		bool final = (segment == sh->gso_segs - 1);
+
+		/* The segment size (including header). */
+		unsigned int s_len = final ? mtu2 : mtu;
+
+		/* The size of the payload. */
+		unsigned int p_len = s_len - sh_len;
+
+		/* The bytes used from the payload. */
+		unsigned int p_used = 0;
+
+		/* Access the header memory for this segment. */
+		unsigned int bn = slot % EQUEUE_ENTRIES;
+		unsigned char *buf =
+			egress->headers + bn * HEADER_BYTES + NET_IP_ALIGN;
+
+		/* The soon-to-be copied "ip" header. */
+		struct iphdr *jh = (struct iphdr *)(buf + eh_len);
+
+		/* The soon-to-be copied "tcp" header. */
+		struct tcphdr *uh = (struct tcphdr *)(buf + eh_len + ih_len);
+
+		unsigned int jsum;
+
+		/* Copy the header. */
+		memcpy(buf, data, sh_len);
+
+		/* The packet size, not including ethernet header. */
+		jh->tot_len = htons(s_len - eh_len);
+
+		/* Update the ip "id". */
+		jh->id = htons(id);
+
+		/* Compute the "ip checksum". */
+		jsum = isum_hack + htons(s_len - eh_len) + htons(id);
+		jh->check = csum_long(jsum) ^ 0xffff;
+
+		/* Update the tcp "seq". */
+		uh->seq = htonl(seq);
+
+		/* Update some flags. */
+		if (!final)
+			uh->fin = uh->psh = 0;
+
+		/* Compute the tcp pseudo-header checksum. */
+		uh->check = csum_long(tsum_hack + htons(s_len));
+
+		/* Skip past the header. */
+		slot++;
+
+		/* Skip past the payload. */
+		while (p_used < p_len) {
+
+			/* Advance as needed. */
+			while (f_used >= f_size) {
+				f_id++;
+				f_size = skb_frag_size(&sh->frags[f_id]);
+				f_used = 0;
+			}
+
+			/* Use bytes from the current fragment. */
+			n = p_len - p_used;
+			if (n > f_size - f_used)
+				n = f_size - f_used;
+			f_used += n;
+			p_used += n;
+
+			slot++;
+		}
+
+		id++;
+		seq += p_len;
+	}
+
+	/* Reset "slot". */
+	slot -= num_edescs;
+
+	/* Flush the headers. */
+	wmb();
+
+	/* Reset. */
+	f_id = f_size = f_used = -1;
+
+	/* Egress all the edescs. */
+	for (segment = 0; segment < sh->gso_segs; segment++) {
+
+		/* Detect the final segment. */
+		bool final = (segment == sh->gso_segs - 1);
+
+		/* The segment size (including header). */
+		unsigned int s_len = final ? mtu2 : mtu;
+
+		/* The size of the payload. */
+		unsigned int p_len = s_len - sh_len;
+
+		/* The bytes used from the payload. */
+		unsigned int p_used = 0;
+
+		/* Access the header memory for this segment. */
+		unsigned int bn = slot % EQUEUE_ENTRIES;
+		unsigned char *buf =
+			egress->headers + bn * HEADER_BYTES + NET_IP_ALIGN;
+
+		void *va;
+
+		/* Egress the header. */
+		edesc_head.va = va_to_tile_io_addr(buf);
+		gxio_mpipe_equeue_put_at(equeue, edesc_head, slot);
+		slot++;
+
+		/* Egress the payload. */
+		while (p_used < p_len) {
+
+			/* Advance as needed. */
+			while (f_used >= f_size) {
+				f_id++;
+				f_size = skb_frag_size(&sh->frags[f_id]);
+				f_used = 0;
+			}
+
+			va = tile_net_frag_buf(&sh->frags[f_id]) + f_used;
+
+			/* Use bytes from the current fragment. */
+			n = p_len - p_used;
+			if (n > f_size - f_used)
+				n = f_size - f_used;
+			f_used += n;
+			p_used += n;
+
+			/* Egress a piece of the payload. */
+			edesc_body.va = va_to_tile_io_addr(va);
+			edesc_body.xfer_size = n;
+			edesc_body.bound = !(p_used < p_len);
+			gxio_mpipe_equeue_put_at(equeue, edesc_body, slot);
+			slot++;
+		}
+
+		tx_packets++;
+		tx_bytes += s_len;
+	}
+
+	/* Wait for a free completion entry.
+	 * ISSUE: Is this the best logic?
+	 * ISSUE: Can this cause undesirable "blocking"?
+	 */
+	while (comps->comp_next - comps->comp_last >= TILE_NET_MAX_COMPS - 1)
+		tile_net_free_comps(equeue, comps, 32, false);
+
+	/* Update the completions array. */
+	cid = comps->comp_next % TILE_NET_MAX_COMPS;
+	comps->comp_queue[cid].when = slot;
+	comps->comp_queue[cid].skb = skb;
+	comps->comp_next++;
+
+	/* Update stats. */
+	atomic_add(tx_packets, (atomic_t *)&priv->stats.tx_packets);
+	atomic_add(tx_bytes, (atomic_t *)&priv->stats.tx_bytes);
+
+	local_irq_restore(irqflags);
+
+	/* Make sure the egress timer is scheduled. */
+	tile_net_schedule_egress_timer(info);
+
+	return NETDEV_TX_OK;
+}
+
+
+/* Analyze the body and frags for a transmit request. */
+static unsigned int tile_net_tx_frags(struct frag *frags,
+				       struct sk_buff *skb,
+				       void *b_data, unsigned int b_len)
+{
+	unsigned int i, n = 0;
+
+	struct skb_shared_info *sh = skb_shinfo(skb);
+
+	if (b_len != 0) {
+		frags[n].buf = b_data;
+		frags[n++].length = b_len;
+	}
+
+	for (i = 0; i < sh->nr_frags; i++) {
+		skb_frag_t *f = &sh->frags[i];
+		frags[n].buf = tile_net_frag_buf(f);
+		frags[n++].length = skb_frag_size(f);
+	}
+
+	return n;
+}
+
+
+/* Help the kernel transmit a packet. */
+static int tile_net_tx(struct sk_buff *skb, struct net_device *dev)
+{
+	struct tile_net_priv *priv = netdev_priv(dev);
+
+	struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
+
+	struct tile_net_egress *egress = &egress_for_echannel[priv->echannel];
+	gxio_mpipe_equeue_t *equeue = egress->equeue;
+
+	struct tile_net_comps *comps =
+		info->comps_for_echannel[priv->echannel];
+
+	struct skb_shared_info *sh = skb_shinfo(skb);
+
+	unsigned int len = skb->len;
+	unsigned char *data = skb->data;
+
+	unsigned int num_frags;
+	struct frag frags[MAX_FRAGS];
+	gxio_mpipe_edesc_t edescs[MAX_FRAGS];
+
+	unsigned int i;
+
+	int cid;
+
+	s64 slot;
+
+	unsigned long irqflags;
+
+	/* Save the timestamp. */
+	dev->trans_start = jiffies;
+
+#ifdef TILE_NET_DUMP_PACKETS
+	/* ISSUE: Does not dump the "frags". */
+	dump_packet(data, skb_headlen(skb), "tx");
+#endif /* TILE_NET_DUMP_PACKETS */
+
+	if (sh->gso_size != 0)
+		return tile_net_tx_tso(skb, dev);
+
+	/* NOTE: This is usually 2, sometimes 3, for big writes. */
+	num_frags = tile_net_tx_frags(frags, skb, data, skb_headlen(skb));
+
+	/* Prepare the edescs. */
+	for (i = 0; i < num_frags; i++) {
+
+		/* NOTE: ".hwb = 0", so ".size" is unused.
+		 * NOTE: ".stack_idx" determines the TLB.
+		 */
+
+		gxio_mpipe_edesc_t edesc = { { 0 } };
+
+		/* Prepare the basic command. */
+		edesc.bound = (i == num_frags - 1);
+		edesc.xfer_size = frags[i].length;
+		edesc.va = va_to_tile_io_addr(frags[i].buf);
+		edesc.stack_idx = large_buffer_stack;
+
+		edescs[i] = edesc;
+	}
+
+	/* Add checksum info if needed. */
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		unsigned int csum_start = skb_checksum_start_offset(skb);
+		edescs[0].csum = 1;
+		edescs[0].csum_start = csum_start;
+		edescs[0].csum_dest = csum_start + skb->csum_offset;
+	}
+
+	local_irq_save(irqflags);
+
+	/* Reserve slots, or return NETDEV_TX_BUSY if "full". */
+	slot = gxio_mpipe_equeue_try_reserve(equeue, num_frags);
+	if (slot < 0) {
+		local_irq_restore(irqflags);
+		/* ISSUE: "Virtual device xxx asks to queue packet". */
+		return NETDEV_TX_BUSY;
+	}
+
+	for (i = 0; i < num_frags; i++)
+		gxio_mpipe_equeue_put_at(equeue, edescs[i], slot + i);
+
+	/* Wait for a free completion entry.
+	 * ISSUE: Is this the best logic?
+	 * ISSUE: Can this cause undesirable "blocking"?
+	 */
+	while (comps->comp_next - comps->comp_last >= TILE_NET_MAX_COMPS - 1)
+		tile_net_free_comps(equeue, comps, 32, false);
+
+	/* Update the completions array. */
+	cid = comps->comp_next % TILE_NET_MAX_COMPS;
+	comps->comp_queue[cid].when = slot + num_frags;
+	comps->comp_queue[cid].skb = skb;
+	comps->comp_next++;
+
+	/* HACK: Track "expanded" size for short packets (e.g. 42 < 60). */
+	atomic_add(1, (atomic_t *)&priv->stats.tx_packets);
+	atomic_add((len >= ETH_ZLEN) ? len : ETH_ZLEN,
+		   (atomic_t *)&priv->stats.tx_bytes);
+
+	local_irq_restore(irqflags);
+
+	/* Make sure the egress timer is scheduled. */
+	tile_net_schedule_egress_timer(info);
+
+	return NETDEV_TX_OK;
+}
+
+
+/* Deal with a transmit timeout. */
+static void tile_net_tx_timeout(struct net_device *dev)
+{
+	/* ISSUE: This doesn't seem useful for us. */
+	netif_wake_queue(dev);
+}
+
+
+/* Ioctl commands. */
+static int tile_net_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
+{
+	return -EOPNOTSUPP;
+}
+
+
+/* Get System Network Statistics.
+ *
+ * Returns the address of the device statistics structure.
+ */
+static struct net_device_stats *tile_net_get_stats(struct net_device *dev)
+{
+	struct tile_net_priv *priv = netdev_priv(dev);
+	return &priv->stats;
+}
+
+
+/* Change the "mtu". */
+static int tile_net_change_mtu(struct net_device *dev, int new_mtu)
+{
+	/* Check ranges. */
+	if ((new_mtu < 68) || (new_mtu > 1500))
+		return -EINVAL;
+
+	/* Accept the value. */
+	dev->mtu = new_mtu;
+
+	return 0;
+}
+
+
+/* Change the Ethernet Address of the NIC.
+ *
+ * The hypervisor driver does not support changing MAC address.  However,
+ * the hardware does not do anything with the MAC address, so the address
+ * which gets used on outgoing packets, and which is accepted on incoming
+ * packets, is completely up to us.
+ *
+ * Returns 0 on success, negative on failure.
+ */
+static int tile_net_set_mac_address(struct net_device *dev, void *p)
+{
+	struct sockaddr *addr = p;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EINVAL;
+
+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+
+	return 0;
+}
+
+
+#ifdef CONFIG_NET_POLL_CONTROLLER
+/* Polling 'interrupt' - used by things like netconsole to send skbs
+ * without having to re-enable interrupts. It's not called while
+ * the interrupt routine is executing.
+ */
+static void tile_net_netpoll(struct net_device *dev)
+{
+	disable_percpu_irq(ingress_irq);
+	tile_net_handle_ingress_irq(ingress_irq, NULL);
+	enable_percpu_irq(ingress_irq, 0);
+}
+#endif
+
+
+static const struct net_device_ops tile_net_ops = {
+	.ndo_open = tile_net_open,
+	.ndo_stop = tile_net_stop,
+	.ndo_start_xmit = tile_net_tx,
+	.ndo_do_ioctl = tile_net_ioctl,
+	.ndo_get_stats = tile_net_get_stats,
+	.ndo_change_mtu = tile_net_change_mtu,
+	.ndo_tx_timeout = tile_net_tx_timeout,
+	.ndo_set_mac_address = tile_net_set_mac_address,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_poll_controller = tile_net_netpoll,
+#endif
+};
+
+/* The setup function.
+ *
+ * This uses ether_setup() to assign various fields in dev, including
+ * setting IFF_BROADCAST and IFF_MULTICAST, then sets some extra fields.
+ */
+static void tile_net_setup(struct net_device *dev)
+{
+	ether_setup(dev);
+
+	dev->netdev_ops      = &tile_net_ops;
+	dev->watchdog_timeo  = TILE_NET_TIMEOUT;
+
+	/* We want lockless xmit. */
+	dev->features |= NETIF_F_LLTX;
+
+	/* We support hardware tx checksums. */
+	dev->features |= NETIF_F_HW_CSUM;
+
+	/* We support scatter/gather. */
+	dev->features |= NETIF_F_SG;
+
+#ifdef TILE_NET_GSO
+	/* We support GSO. */
+	dev->features |= NETIF_F_GSO;
+#endif
+
+#ifdef TILE_NET_TSO
+	/* We support TSO. */
+	dev->features |= NETIF_F_TSO;
+#endif
+
+	dev->tx_queue_len = TILE_NET_TX_QUEUE_LEN;
+
+	dev->mtu = 1500;
+}
+
+
+/* Allocate the device structure, register the device, and obtain the
+ * MAC address from the hypervisor.
+ */
+static void tile_net_dev_init(const char *name, const uint8_t *mac)
+{
+	int ret;
+	int i;
+	int nz_addr = 0;
+	struct net_device *dev;
+	struct tile_net_priv *priv;
+
+	/* HACK: Ignore "loop" links. */
+	if (strncmp(name, "loop", 4) == 0)
+		return;
+
+	/* Allocate the device structure.  This allocates "priv", calls
+	 * tile_net_setup(), and saves "name".  Normally, "name" is a
+	 * template, instantiated by register_netdev(), but not for us.
+	 */
+	dev = alloc_netdev(sizeof(*priv), name, tile_net_setup);
+	if (!dev) {
+		pr_err("alloc_netdev(%s) failed\n", name);
+		return;
+	}
+
+	priv = netdev_priv(dev);
+
+	/* Initialize "priv". */
+
+	memset(priv, 0, sizeof(*priv));
+
+	priv->dev = dev;
+
+	priv->channel = priv->loopify_channel = priv->echannel = -1;
+
+	/* Register the network device. */
+	ret = register_netdev(dev);
+	if (ret) {
+		netdev_err(dev, "register_netdev failed %d\n", ret);
+		free_netdev(dev);
+		return;
+	}
+
+	/* Get the MAC address and set it in the device struct; this must
+	 * be done before the device is opened.  If the MAC is all zeroes,
+	 * we use a random address, since we're probably on the simulator.
+	 */
+	for (i = 0; i < 6; i++)
+		nz_addr |= mac[i];
+
+	if (nz_addr) {
+		memcpy(dev->dev_addr, mac, 6);
+		dev->addr_len = 6;
+	} else {
+		random_ether_addr(dev->dev_addr);
+	}
+}
+
+
+/* Module initialization. */
+static int __init tile_net_init_module(void)
+{
+	int i;
+	char name[GXIO_MPIPE_LINK_NAME_LEN];
+	uint8_t mac[6];
+
+	pr_info("Tilera Network Driver\n");
+
+	mutex_init(&tile_net_devs_for_channel_mutex);
+
+	/* Initialize each CPU. */
+	on_each_cpu(tile_net_prepare_cpu, NULL, 1);
+
+	/* Find out what devices we have, and initialize them. */
+	for (i = 0; gxio_mpipe_link_enumerate_mac(i, name, mac) >= 0; i++)
+		tile_net_dev_init(name, mac);
+
+	if (!network_cpus_init())
+		network_cpus_map = *cpu_online_mask;
+
+	return 0;
+}
+
+module_init(tile_net_init_module);
-- 
1.6.5.2
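
The three loops in tile_net_tx_tso() above all walk the same grid of
segments and fragments. A minimal user-space sketch of that walk follows,
assuming plain arrays in place of the skb frags. The name count_edescs()
and its parameters are illustrative and not part of the patch, and the
sketch derives the segments from the payload length rather than from
"gso_segs".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): count the descriptors needed
 * to send "payload_len" bytes, held in "nr_frags" fragments, as segments
 * carrying at most "gso_size" payload bytes each.  Each segment needs one
 * descriptor for its header plus one per contiguous piece of payload.
 */
static unsigned int count_edescs(const unsigned int *frag_size,
				 size_t nr_frags,
				 unsigned int payload_len,
				 unsigned int gso_size)
{
	unsigned int num_edescs = 0;
	unsigned int sent = 0;
	unsigned int f_used = 0;
	size_t f_id = 0;

	assert(gso_size != 0);
	assert(payload_len == 0 || nr_frags != 0);

	while (sent < payload_len) {
		/* Payload carried by this segment. */
		unsigned int p_len = payload_len - sent;
		unsigned int p_used = 0;

		if (p_len > gso_size)
			p_len = gso_size;

		/* One descriptor for the header. */
		num_edescs++;

		/* One descriptor for each piece of the payload. */
		while (p_used < p_len) {
			unsigned int n;

			/* Advance past exhausted (or empty) fragments. */
			while (f_used >= frag_size[f_id]) {
				f_id++;
				assert(f_id < nr_frags);
				f_used = 0;
			}

			/* Use bytes from the current fragment. */
			n = p_len - p_used;
			if (n > frag_size[f_id] - f_used)
				n = frag_size[f_id] - f_used;
			f_used += n;
			p_used += n;

			num_edescs++;
		}

		sent += p_len;
	}

	return num_edescs;
}
```

With three 1000-byte fragments and a gso_size of 1448, the walk produces
segments of 1448, 1448 and 104 payload bytes, needing 3 + 3 + 2 = 8
descriptors, which shows how a single segment can straddle a fragment
boundary.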

^ permalink raw reply related

* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
From: John Fastabend @ 2012-05-03 19:38 UTC (permalink / raw)
  To: Sridhar Samudrala, Roopa Prabhu
  Cc: Michael S. Tsirkin, shemminger, bhutchings, hadi,
	jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2
In-Reply-To: <4FA1C50C.7010405@us.ibm.com>

On 5/2/2012 4:36 PM, Sridhar Samudrala wrote:
> On 5/2/2012 2:52 PM, John Fastabend wrote:
>> On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
>>> On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
>>>> From: John Fastabend<john.r.fastabend@intel.com>
>>>> Date: Sun, 15 Apr 2012 09:43:51 -0700
>>>>
>>>>> The following series is a submission for net-next to allow
>>>>> embedded switches and other stacked devices other than the
>>>>> Linux bridge to manage a forwarding database.
>>>>>
>>>>> Previously discussed here,
>>>>>
>>>>> http://lists.openwall.net/netdev/2012/03/19/26
>>>>>
>>>>> v4: propagate return codes correctly for ndo_dflt_fdb_dump()
>>>>>
>>>>> v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>>>>>      error and add the flags field to change and get link routines.
>>>>>
>>>>> v2: addressed feedback from Ben Hutchings resolving a typo in the
>>>>>      multicast add/del routines and improving the error handling
>>>>>      when both NTF_SELF and NTF_MASTER are set.
>>>>>
>>>>> I've tested this with 'br' tool published by Stephen Hemminger
>>>>> soon to be renamed 'bridge' I believe and various traffic
>>>>> generators mostly pktgen, ping, and netperf.
>>>> All applied, if we need any more tweaks we can just add them
>>>> on top of this work.
>>>>
>>>> Thanks John.
>>> John, do you plan to update kvm userspace to use this interface?
>>>
>> No immediate plans. I would really appreciate it if you or one
>> of the IBM developers working in this space took it on. Of course
>> if no one steps up I guess I can eventually get at it but it will
>> be some time. For now I've been doing this manually with the bridge
>> tool yet to be published.
>>
>>
> Does this mean that when we add an interface to a bridge, it need not be
> put in promiscuous mode, and that fdb entries are added/deleted
> dynamically instead?

The net/bridge will automatically put the interface in promisc mode
when the device is attached. We do need to add/delete fdb entries
though to allow forwarding packets from the virtual function and
any emulated devices e.g. tap devices on the bridge.

Currently I am doing this by manually running a tool Stephen created.
My hope would be to integrate this with KVM so that when I set up my
VM with an emulated device and have SR-IOV enabled (perhaps for the
direct assign use case), qemu/libvirt also adds the VM address to the
embedded switch FDB.

> Or are we talking only about VMs attached to macvtap?
> 

The macvlan bridge calls dev_uc_add and dev_uc_sync so in this case
we shouldn't need to explicitly add entries to the embedded bridge
on the physical function.

> Thanks
> Sridhar
> 

^ permalink raw reply

* RE: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
From: Wyborny, Carolyn @ 2012-05-03 20:12 UTC (permalink / raw)
  To: Nix, Kirsher, Jeffrey T
  Cc: davem@davemloft.net, Chris Boot, netdev@vger.kernel.org,
	gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <87d36ld1as.fsf@spindle.srvr.nix>



>-----Original Message-----
>From: Nix [mailto:nix@esperi.org.uk]
>Sent: Thursday, May 03, 2012 3:09 AM
>To: Kirsher, Jeffrey T
>Cc: davem@davemloft.net; Chris Boot; netdev@vger.kernel.org;
>gospo@redhat.com; sassmann@redhat.com; Wyborny, Carolyn
>Subject: Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
>
 [..]
>(reminder: this is known not to fix the instance of this problem I am
>experiencing, where ASPM is being re-enabled by something even if turned
>off via setpci during boot, though it does fix those instances seen by
>others where that doesn't happen. I'd have done more printf()-scattering
>debugging to see where it's turned back on if it wasn't that this is
>happening on an always-on server for which rebooting outside the dead of
>night is a long-winded chore...)
>
>FWIW I have also seen -- very rare -- lockups of the same nature on
>82574L links in 100MbE mode using non-jumbo frames. However they are far
>more common on GbE jumbo-framed links, normally taking less than an hour
>to take the link down with a wildly corrupted register set (as shown by
>ethtool).
>
>(It's annoying this firmware isn't flashable so we could just *fix* this
>bug rather than working around it. :( )
>
>
>I think I might cheat a bit next and printk_once() the state of ASPM L1
>on the errant PCI device from inside the scheduler when it flips from L1
>off to L1 on again. At 100 tests per second that should indicate at what
>time the thing is turned back on fairly tightly: even if not providing a
>direct clue as to which bit of the kernel is doing it, if I combine it
>with a set -x in userspace it should at least indicate what bit of the
>boot process is happening at the same time. It'll be the weekend before
>I can try that though.
>
>--
>NULL && (void)

Hello,

It would be good to know why/how your system is re-enabling the
setting.  The problem is not solvable in firmware unfortunately and is
somewhat platform dependent. MMIO-tracer might be used to try and see
when the re-enabling config space is written, but it might be too
heavyweight for a live production system.

I am also working on a patch to the driver to detect the condition and
then do a slot reset to avoid a whole system reboot.  Would you be
willing to test it in your problem system?

Thanks,

Carolyn
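
The sampling idea quoted above (report the moment ASPM L1 flips from off
to on) comes down to watching one bit of the PCIe Link Control register,
whose ASPM Control field occupies bits 1:0. A minimal user-space sketch
follows, assuming the raw 16-bit register value has already been read by
some other means. The struct and function names are hypothetical and not
taken from e1000e.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASPM Control field of the PCIe Link Control register (bits 1:0). */
#define LNKCTL_ASPM_L0S	0x0001u
#define LNKCTL_ASPM_L1	0x0002u

/* Hypothetical watcher state: remembers the previous sample. */
struct aspm_watch {
	bool seen;	/* at least one sample has been recorded */
	bool last_l1;	/* L1 state in the previous sample */
};

/* True if the sampled Link Control value has ASPM L1 entry enabled. */
static bool aspm_l1_enabled(uint16_t lnkctl)
{
	return (lnkctl & LNKCTL_ASPM_L1) != 0;
}

/* Record one sample; returns true only when L1 has gone from disabled
 * in the previous sample to enabled in this one.
 */
static bool aspm_l1_reenabled(struct aspm_watch *w, uint16_t lnkctl)
{
	bool now, flipped;

	assert(w != NULL);

	now = aspm_l1_enabled(lnkctl);
	flipped = w->seen && !w->last_l1 && now;

	w->seen = true;
	w->last_l1 = now;

	return flipped;
}
```

Calling aspm_l1_reenabled() on each sample and logging a timestamp the
first time it returns true would bound when the setting is turned back
on, to within one sampling interval.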

^ permalink raw reply

* [PATCH iproute2 ] Update man8 Makefile
From: Vijay Subramanian @ 2012-05-03 20:15 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Christoph J. Thompson, Vijay Subramanian

Commit 761a1e60 (iproute2 - Split up manual page installation)
introduced man/man8/Makefile but did not add all the man pages.
This patch adds the missing man pages for installation.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
---
 man/man8/Makefile |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/man/man8/Makefile b/man/man8/Makefile
index 5d64012..6873a4b 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -3,7 +3,11 @@ TARGETS = ip-address.8 ip-link.8 ip-route.8
 MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 rtmon.8 ss.8 \
 	tc-bfifo.8 tc-cbq-details.8 tc-cbq.8 tc-drr.8 tc-htb.8 \
 	tc-pfifo.8 tc-pfifo_fast.8 tc-prio.8 tc-red.8 tc-sfq.8 \
-	tc-tbf.8 tc.8 rtstat.8 ctstat.8 nstat.8 routef.8
+	tc-tbf.8 tc.8 rtstat.8 ctstat.8 nstat.8 routef.8 \
+	tc-sfb.8 tc-netem.8 tc-choke.8 ip-tunnel.8 ip-rule.8 ip-ntable.8 \
+	ip-monitor.8 tc-stab.8 tc-hfsc.8 ip-xfrm.8 ip-netns.8 \
+	ip-neighbour.8 ip-mroute.8 ip-maddress.8 ip-addrlabel.8
+
 
 all: $(TARGETS)
 
-- 
1.7.0.4

^ permalink raw reply related

* Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
From: Nix @ 2012-05-03 20:17 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, Chris Boot,
	netdev@vger.kernel.org, gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <9BBC4E0CF881AA4299206E2E1412B6261882C3A9@FMSMSX151.amr.corp.intel.com>

On 3 May 2012, Carolyn Wyborny told this:

> It would be good to know why/how your system is re-enabling the
> setting. The problem is not solvable in firmware unfortunately and is
> somewhat platform dependent. MMIO-tracer might be used to try and see

I entirely forgot about that tool! *Definitely* worth trying.

I'll give it a try this weekend.

> when the re-enabling config space is written, but it might be too
> heavyweight for a live production system.

Given that the re-enabling happens at around the same time as the boot
scripts finish running (it's done by the time I can log in), that's not
going to be a problem. Hence my speculation that it's being re-enabled
when the interface stabilizes (which is, of course, asynchronous) or
something like that.

> I am also working on a patch to the driver to detect the condition and
> then do a slot reset to avoid a whole system reboot. Would you be
> willing to test it in your problem system?

Yes, definitely. The whole-system reboot is irritating because the
system is headless, and with its NICs dead that means a big red switch
to reboot when this problem strikes, which gives me the heebie-jeebies
:)

(Turning off ASPM definitely completely fixes it, so it *is* this bug.
It's just getting the disabling to stick that's proving tricky.)

-- 
NULL && (void)

^ permalink raw reply

* Re: [PATCH 2/2] tcp: cleanup tcp_try_coalesce
From: Guy, Wey-Yi @ 2012-05-03 20:21 UTC (permalink / raw)
  To: John W. Linville
  Cc: David Miller, eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	alexander.duyck-Re5JQEeQqe8AvxtiuMwx3w,
	alexander.h.duyck-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, edumazet-hpIqsD4AKlfQT0dZR+AlfA,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, Stephen Rothwell
In-Reply-To: <20120503170727.GM9285-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

Hi John,

On Thu, 2012-05-03 at 13:07 -0400, John W. Linville wrote:
> On Thu, May 03, 2012 at 08:24:19AM -0700, Guy, Wey-Yi wrote:
> > Hi John,
> > 
> > On Thu, 2012-05-03 at 11:14 -0400, John W. Linville wrote:
> > > On Thu, May 03, 2012 at 01:25:02AM -0400, David Miller wrote:
> > > > From: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > > > Date: Thu, 03 May 2012 07:19:33 +0200
> > > > 
> > > > > My last patch against iwlwifi is still waiting to make its way into
> > > > > official tree.
> > > > > 
> > > > > http://www.spinics.net/lists/netdev/msg192629.html
> > > > 
> > > > John, please rectify this situation.
> > > > 
> > > > The Intel Wireless folks said they would test it, but that was more
> > > > than a month ago.
> > > > 
> > > > It's not acceptable to let bug fixes rot for that long, I don't care
> > > > what their special internal testing procedure is.
> > > > 
> > > > If they give you further pushback, please just ignore them and apply
> > > > Eric's fix directly.
> > > > 
> > > > Thank you.
> > > 
> > > I imagine that this somehow got lost in the shuffle during the
> > > merge window.  That doesn't excuse it, of course.
> > > 
> > > It has waited long enough already, so I'll just go ahead and take it.
> > > 
> > It was my mistake to lose this patch during the iwlwifi refactoring work;
> > the patch no longer applies, and I asked Eric to rebase it.
> > 
> > Sorry again for the mistake
> 
> Well, it seems like a fix needed for 3.4.  And, the patch applies there.
> 
> It does cause some merge breakage in wireless-testing (and presumably
> in linux-next).  I'll attach the commit diff for the wireless-testing
> merge fixup I did, for review and/or as a peace offering to sfr... :-)
> 
> Please take a look at the result in wireless-testing and let me know
> if it is broken...thanks!
> 
Looks good to me, thanks very much for helping with this.

Wey

> 



^ permalink raw reply

* RE: [PATCH] net: davinci_emac: Add pre_open, post_stop platform callbacks
From: Ben Hutchings @ 2012-05-03 20:22 UTC (permalink / raw)
  To: Bedia, Vaibhav
  Cc: Mark A. Greer, netdev@vger.kernel.org, linux-omap@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <B5906170F1614E41A8A28DE3B8D121433EA7366B@DBDE01.ent.ti.com>

On Thu, 2012-05-03 at 19:25 +0000, Bedia, Vaibhav wrote:
> On Fri, May 04, 2012 at 00:16:32, Mark A. Greer wrote:
> [...]
> > > 
> > > So, if I understood this correctly, it's effectively like blocking a low power
> > > state transition (here wfi execution) when EMAC is active?
> > 
> > Assuming "it" is my patch, correct.
> > 
> 
> Recently I was thinking about how to get certain drivers to disallow some or all
> low power states and to me this also seems to fall in a similar category.
> 
> One of the suggestions that I got was to check if the 'wakeup' entry associated with
> the device under sysfs could be leveraged for this. The PM code could maintain
> a whitelist (or blacklist) of devices and it decides the low power state to enter
> based on the 'wakeup' entries associated with these devices. In this particular case,
> maybe the driver could simply set this entry to non-wakeup capable when necessary and
> then let the PM code take care of skipping the wfi execution.
> 
> Thoughts/brickbats welcome :)

You can maybe (ab)use the pm_qos mechanism for this.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] [IPv6]: Treat ND option 31 as userland (DNSSL support)
From: Francesco Ruggeri @ 2012-05-03 20:51 UTC (permalink / raw)
  To: netdev

Does anybody know if this patch should be in v3.4? I did not see it in 3.4-rc5.
Thanks,
Francesco Ruggeri


Date:	Thu, 12 Apr 2012 15:57:51 -0400 (EDT)
From:	David Miller <davem@...emloft.net>
To:	raorn@...rn.name
Cc:	netdev@...r.kernel.org
Subject: Re: [PATCH] [IPv6]: Treat ND option 31 as userland (DNSSL support)

From: "Alexey I. Froloff" <raorn@...rn.name>
Date: Fri,  6 Apr 2012 19:50:58 +0400

> As specified in RFC6106, DNSSL option contains one or more domain names
> of DNS suffixes.  8-bit identifier of the DNSSL option type as assigned
> by the IANA is 31.  This option should also be treated as userland.
>
> Signed-off-by: Alexey I. Froloff <raorn@...rn.name>

Applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH 1/2] cs89x0_platform : Use ioread16/iowrite16 instead of inw/outw
From: Jaccon Bastiaansen @ 2012-05-03 20:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: s.hauer, gfm, davem, festevam, linux-arm-kernel, kernel, netdev
In-Reply-To: <201204301419.27119.arnd@arndb.de>

Hello Arnd,

2012/4/30 Arnd Bergmann <arnd@arndb.de>:
> On Monday 30 April 2012, Jaccon Bastiaansen wrote:
>> The use of the inw/outw functions by the cs89x0 platform driver
>> results in NULL pointer references.
>>
>> Signed-off-by: Jaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
>> ---
>>  drivers/net/ethernet/cirrus/cs89x0.c |   12 ++++++++++++
>>  1 files changed, 12 insertions(+), 0 deletions(-)
>
> It's actually broken on most platforms already, and the #ifdef is
> about to go away since IXP2xxx is getting removed in v3.5.
>
>> diff --git a/drivers/net/ethernet/cirrus/cs89x0.c b/drivers/net/ethernet/cirrus/cs89x0.c
>> index b9406cb..95737e9 100644
>> --- a/drivers/net/ethernet/cirrus/cs89x0.c
>> +++ b/drivers/net/ethernet/cirrus/cs89x0.c
>> @@ -369,6 +369,18 @@ writeword(unsigned long base_addr, int portno, u16 value)
>>  {
>>         __raw_writel(value, base_addr + (portno << 1));
>>  }
>> +#elif defined(CONFIG_CS89x0_PLATFORM)
>> +static u16
>> +readword(unsigned long base_addr, int portno)
>> +{
>> +       return ioread16(base_addr + portno);
>> +}
>> +
>> +static void
>> +writeword(unsigned long base_addr, int portno, u16 value)
>> +{
>> +       iowrite16(value, base_addr + portno);
>> +}
>>  #else
>>  static u16
>>  readword(unsigned long base_addr, int portno)
>
> I think the best solution would be to always use ioread32/iowrite32
> in the #else path, and change the ISA code to do an ioport_map
> for the base address, passing around the virtual address as an __iomem
> pointer.
>
>        Arnd
> --

So if I understand you correctly, you would like to have an
ioport_map() call in the cs89x0_probe() function and use the return
value of that ioport_map() call as the ioaddr parameter of the
cs89x0_probe1() function. Is this correct? This would make the
cs89x0_probe() function similar to the cs89x0_platform_probe()
function, where the return value of the ioremap() call is used as
the ioaddr parameter of the cs89x0_probe1() function.

But why do you want to convert the current 16 bit accesses in the
#else path to 32 bit accesses? Why not use ioread16()/iowrite16()?
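
To make the question concrete, here is a rough userspace sketch of what a
single pair of accessors could look like if both probe paths handed over a
mapped pointer. It is not driver code: ioread16()/iowrite16() are replaced
by simple stand-ins over a byte array, fake_map() is a hypothetical
substitute for the ioport_map()/ioremap() result, and the 16 bit width is
the assumption being asked about, not a settled answer.

```c
#include <stdint.h>
#include <string.h>

/* The kernel uses __iomem as a sparse annotation; empty in userspace. */
#define __iomem

/*
 * Userspace stand-ins, NOT the kernel implementations. They use host
 * byte order, whereas the kernel accessors are defined as little-endian.
 */
static uint16_t ioread16(const void __iomem *addr)
{
	uint16_t v;

	memcpy(&v, addr, sizeof(v));
	return v;
}

static void iowrite16(uint16_t value, void __iomem *addr)
{
	memcpy(addr, &value, sizeof(value));
}

/* Hypothetical register window standing in for the mapped device. */
static uint8_t fake_regs[32];

/* Hypothetical replacement for the ioport_map()/ioremap() return value. */
static void __iomem *fake_map(void)
{
	return fake_regs;
}

/* One accessor pair for both probe paths; base is a mapped pointer. */
static uint16_t readword(void __iomem *base, int portno)
{
	return ioread16((uint8_t __iomem *)base + portno);
}

static void writeword(void __iomem *base, int portno, uint16_t value)
{
	iowrite16(value, (uint8_t __iomem *)base + portno);
}
```

With something like this, the ISA and platform probe functions would differ
only in how they obtain the base pointer, and the #ifdef around
readword()/writeword() would no longer be needed.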


Regards,
  Jaccon

^ permalink raw reply

* Re: [PATCHv2 4/7] mISDN: Make layer1 timer 3 value configurable
From: David Miller @ 2012-05-03 21:08 UTC (permalink / raw)
  To: kkeil; +Cc: netdev, isdn
In-Reply-To: <1336060052-27119-5-git-send-email-kkeil@linux-pingi.de>

From: Karsten Keil <kkeil@linux-pingi.de>
Date: Thu,  3 May 2012 17:47:29 +0200

> @@ -372,6 +372,7 @@ clear_channelmap(u_int nr, u_char *map)
>  #define MISDN_CTRL_RX_OFF		0x0100
>  #define MISDN_CTRL_FILL_EMPTY		0x0200
>  #define MISDN_CTRL_GETPEER		0x0400
> +#define MISDN_CTRL_L1_TIMER3		0x0800
>  #define MISDN_CTRL_HW_FEATURES_OP	0x2000
>  #define MISDN_CTRL_HW_FEATURES		0x2001
>  #define MISDN_CTRL_HFC_OP		0x4000

This define is completely unused by this patch.

To say that I'm frustrated by this process would be an understatement.

^ permalink raw reply

