Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] net/fsl_pq_mdio: use spin_event_timeout() to poll the indicator register
From: Tabi Timur-B04825 @ 2012-07-09 14:31 UTC (permalink / raw)
  To: David Miller; +Cc: Fleming Andy-AFLEMING, netdev@vger.kernel.org
In-Reply-To: <20120708.235926.1117975937932919247.davem@davemloft.net>

David Miller wrote:

> Define a macro for the timeout value rather than use an arbitrary
> constant.

Ok.

>> +	status = spin_event_timeout(!(in_be32(&regs->miimind) &	MIIMIND_BUSY),
>> +		1000, 0);
>
> This indentation is absolutely terrible.

Can you give me a clue as to how you think it should look?  I could not 
come up with a good way to break up these lines and keep them under 80 
characters.

-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* Re: [RFC PATCH] ppp: add support for L2 multihop / tunnel switching
From: Benjamin LaHaise @ 2012-07-09 14:15 UTC (permalink / raw)
  To: James Chapman; +Cc: netdev, linux-ppp
In-Reply-To: <4FFAC5EF.6030003@katalix.com>

On Mon, Jul 09, 2012 at 12:52:15PM +0100, James Chapman wrote:
> As a mechanism for switching PPP interfaces together, this patch is
> good. For L2TP though, I prefer an approach that would be applicable for
> all L2TP traffic types, not just PPP.

*nod*  This seems like a reasonable consideration.

> L2TP supports many different pseudowire types, and this patch will only
> be useful for tunnel switching between PPP pseudowires. Whereas if we
> implement it within the L2TP core, rather than in the PPP code, we would
> get switching between all pseudowire types. If we add this patch and
> then subsequently add switching between other pseudowires in the L2TP
> core (which we're likely to want to do), then we're left with two
> different interfaces for doing L2TP tunnel switching in the kernel.

At least for ethernet pseudowires, it can already be implemented by using 
an ethernet bridge device.  Besides PPP and ethernet pseudowires, what 
other types are supported at present by the L2TP core?

> The L2TP core allows traffic to be passed directly into an L2TP session.
> In the case of PPPoE, for example, the PPP data can be  extracted from a
> PPPoE packet and passed into an L2TP tunnel/session, with no PPP
> interface(s) involved.
> 
> That said, your approach allows two PPP interfaces to be switched
> together, which has its own advantages.

I think the approach I'm using should be reasonably efficient for PPPoE 
to L2TP, although the locking overhead in the PPP core probably needs to 
be reduced to improve scaling.  I haven't yet done any benchmarking on this 
approach to see how much overhead there is compared to the other code I'd 
written which took a more direct approach (this wasn't on top of the 
ppp_generic core, but the old Babylon kernel modules which have had this 
functionality for a long time).

> > The reasoning behind using dev_queue_xmit() rather than outputting directly 
> > to another PPP channel is to enable the use of the traffic shaping and 
> > queuing features of the kernel on multihop sessions.
> 
> I'm not sure about using a pseudo packet type to do this. For L2TP, it
> would seem better to add netfilter/tc support for L2TP data packets,
> which would let people add rules for, say, traffic in L2TP tunnel x /
> session y. This would avoid the need for ETH_P_PPP and you could then
> output directly to the ppp channel.

The downside of an L2TP specific method is that all the mechanisms need to 
be duplicated, resulting in a much higher maintenance overhead for the 
code and functionality, not to mention all the tool changes to go along 
with that.

As for the pseudo packet type, it may indeed be better to avoid the pseudo 
packet type for known PPP packet types.  One of the benefits of going the 
network device route is that it makes it much easier to implement additional 
functionality like lawful intercept, which would be yet more functionality 
that would have to be implemented if the mechanism is L2TP specific.  The 
pseudo packet type would still be needed for forwarding PPP frames that the 
kernel doesn't know about (all the *CP packet types and MLPPP come to mind)

I had thought about doing the packet forwarding in a manner similar to the 
bridging code -- that is, as a pseudowire bridge in the network core that 
only works between 2 devices.  That approach might work better for L2TP, as 
it would be able to pass packets of any type between the 2 endpoints.

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply

* Re: TCP transmit performance regression
From: Eric Dumazet @ 2012-07-09 13:54 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller
In-Reply-To: <CACVXFVNYwNrdWS_EiQ4O0-ou369AQ2tB4qbvmzY1rt5T3QnyUw@mail.gmail.com>

On Mon, 2012-07-09 at 21:23 +0800, Ming Lei wrote:

> Looks the patch replaces skb_clone with netdev_alloc_skb_ip_align and
> introduces extra copies on incoming data, so would you mind explaining
> it in a bit detail? And why is skb_clone not OK for the purpose?

Problem with cloning is that some paths will have to make a private copy
of the skb.

So you dont see the cost here in the driver, but later in upper stacks.

Since this driver defaults to a huge RX area of more than 16Kbytes,
a copy to a much smaller skb (we call this 'copybreak' in our jargon )
is more than welcome to avoid OOM problems anyway.

TCP coalescing (skb_try_coalesce) for example wont work for cloned skbs,
so TCP receive window will close pretty fast, and performance sucks in
lossy environments (like the Internet)

Actually, since this driver lies about skb->truesize, a single UDP frame
consumes 32Kbytes of memory, escaping normal memory limits we have in
kernel by a factor of 64. Thats pretty bad, especially for a beagle
board.

^ permalink raw reply

* Re: [PATCH] smsc95xx: support ethtool get_regs
From: Émeric Vigier @ 2012-07-09 13:44 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Steve Glendinning, steve glendinning, netdev, Nancy Lin
In-Reply-To: <1341690959.25597.168.camel@deadeye.wl.decadent.org.uk>

----- Mail original -----
> On Sat, 2012-07-07 at 09:58 -0400, Émeric Vigier wrote:
> [...]
> > > > +	for (i = 0; i <= PHY_SPECIAL; i++)
> > > > +		data[j++] = smsc95xx_mdio_read(netdev, dev->mii.phy_id, i);
> > > > +}
> > > 
> > > Again, why use PHY_SPECIAL (+ 1) here as opposed to 32 in the
> > > calculation of the length?
> > 
> > 32 was ok, but I hesitated between defining a SMSC95XX_PHY_END or
> > using the last defined register.
> > Are 32 register-PHY generic to most devices? I mean could 32 be use
> > widely?
> 
> Yes, the address space for the original MDIO protocol ('clause 22')
> allows for 32 registers.  Perhaps that number should be named in
> <linux/mii.h>.

I have not found any definition of this in mii.h.

> 
> As another reviewer commented, though, MDIO PHY registers should be
> accessible with SIOCGMIIREG and mii-tool so it's not really necessary
> to
> duplicate them here.

Ok, I remove this then.

> 
> Ben.
> 
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
> 
> 

-- 
Emeric

^ permalink raw reply

* [PATCH 2/4] pch_gbe: fix transmit watchdog timeout
From: Andy Cress @ 2012-07-09 13:24 UTC (permalink / raw)
  To: netdev

Author: Andy Cress <andy.cress@us.kontron.com>

An extended ping and iperf test with 6 vlans resulted in transmit
timeout stats 
and sometimes a driver oops with a netdev watchdog transmit timeout.
Fix WATCHDOG_TIMEOUT to be more like e1000e at 5 * HZ, to avoid 
unnecessary transmit timeouts.

Signed-off-by: Andy Cress <andy.cress@us.kontron.com>

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 4c04843..a746064 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -35,7 +35,7 @@ const char pch_driver_version[] = DRV_VERSION;
 #define DSC_INIT16			0xC000
 #define PCH_GBE_DMA_ALIGN		0
 #define PCH_GBE_DMA_PADDING		2
-#define PCH_GBE_WATCHDOG_PERIOD		(1 * HZ)	/*
watchdog time */
+#define PCH_GBE_WATCHDOG_PERIOD		(5 * HZ)	/*
watchdog time */
 #define PCH_GBE_COPYBREAK_DEFAULT	256
 #define PCH_GBE_PCI_BAR			1
 #define PCH_GBE_RESERVE_MEMORY		0x200000	/* 2MB */

^ permalink raw reply related

* [PATCH 1/4] pch_gbe: fix the checksum fill to the error location
From: Andy Cress @ 2012-07-09 13:30 UTC (permalink / raw)
  To: netdev

From: Zhong Hongbo <hongbo.zhong@windriver.com>
Date: Mon, 9 Apr 2012 10:51:28 +0800

Due to some unknown hardware limitations the pch_gbe hardware cannot
calculate checksums when the length of network package is less
than 64 bytes, where we will surprisingly encounter a problem of
the destination IP incorrectly changed.

When forwarding network packages at the network layer the IP packages
won't be relayed to the upper transport layer and analyzed there,
consequently, skb->transport_header pointer will be mistakenly remained
the same as that of skb->network_header, resulting in TCP checksum
wrongly
filled into the field of destination IP in IP header.

We can fix this issue by manually calculate the offset of the TCP
checksum
 and update it accordingly.

Signed-off-by: Zhong Hongbo <hongbo.zhong@windriver.com>
Merged-by: Andy Cress <andy.cress@us.kontron.com>

---
 drivers/net/pch_gbe/pch_gbe_main.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 3787c64..4c04843 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -1180,30 +1180,32 @@ static void pch_gbe_tx_queue(struct
pch_gbe_adapter *adapter,
 	 * when the received data size is less than 64 bytes.
 	 */
 	if (skb->len < PCH_GBE_SHORT_PKT && skb->ip_summed !=
CHECKSUM_NONE) {
+		struct iphdr *iph = ip_hdr(skb);
 		frame_ctrl |= PCH_GBE_TXD_CTRL_APAD |
 			      PCH_GBE_TXD_CTRL_TCPIP_ACC_OFF;
 		if (skb->protocol == htons(ETH_P_IP)) {
-			struct iphdr *iph = ip_hdr(skb);
 			unsigned int offset;
-			offset = skb_transport_offset(skb);
+			offset = (unsigned char *)((u8 *)iph + iph->ihl
* 4) - skb->data;
 			if (iph->protocol == IPPROTO_TCP) {
+				struct tcphdr *tcphdr_point = (struct
tcphdr *)((u8 *)iph + iph->ihl * 4);
 				skb->csum = 0;
-				tcp_hdr(skb)->check = 0;
+				tcphdr_point->check = 0;
 				skb->csum = skb_checksum(skb, offset,
 							 skb->len -
offset, 0);
-				tcp_hdr(skb)->check =
+				tcphdr_point->check = 
 					csum_tcpudp_magic(iph->saddr,
 							  iph->daddr,
 							  skb->len -
offset,
 							  IPPROTO_TCP,
 							  skb->csum);
 			} else if (iph->protocol == IPPROTO_UDP) {
+				struct udphdr *udphdr_point = (struct
udphdr *)((u8 *)iph + iph->ihl * 4);
 				skb->csum = 0;
-				udp_hdr(skb)->check = 0;
+				udphdr_point->check = 0;
 				skb->csum =
 					skb_checksum(skb, offset,
 						     skb->len - offset,
0);
-				udp_hdr(skb)->check =
+				udphdr_point->check = 
 					csum_tcpudp_magic(iph->saddr,
 							  iph->daddr,
 							  skb->len -
offset,

^ permalink raw reply related

* [PATCH 4/4] pch_gbe: vlan skb len fix
From: Andy Cress @ 2012-07-09 13:30 UTC (permalink / raw)
  To: netdev

Author: Veaceslav Falico <vfalico@redhat.com>
Date:   Tue Apr 10 08:14:17 2012 +0200

    pch_gbe: correctly verify skb->len in vlan case
             to avoid bogus transfer length errors.

Signed-off-by: Andy Cress <andy.cress@us.kontron.com>

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 30ef285..04b0e49 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -2158,8 +2158,10 @@ static int pch_gbe_xmit_frame(struct sk_buff
*skb, struct net_device *netdev)
 	struct pch_gbe_adapter *adapter = netdev_priv(netdev);
 	struct pch_gbe_tx_ring *tx_ring = adapter->tx_ring;
 	unsigned long flags;
+	int offset;
 
-	if (unlikely(skb->len > (adapter->hw.mac.max_frame_size - 4))) {
+	offset = skb->protocol == htons(ETH_P_8021Q) ? 0 : 4;
+	if (unlikely(skb->len > (adapter->hw.mac.max_frame_size -
offset))) {
 		pr_err("Transfer length Error: skb len: %d > max: %d\n",
 		       skb->len, adapter->hw.mac.max_frame_size);
 		dev_kfree_skb_any(skb);

^ permalink raw reply related

* [PATCH 0/4] pch_gbe: avoiding transmit timeouts, etc.
From: Andy Cress @ 2012-07-09 13:19 UTC (permalink / raw)
  To: netdev

When the interface is stressed with 6 VLANs, some transmit timeout stats
were observed, which is a potential precursor to the more severe netdev
watchdog timeout oops.  Also we saw more than the expected number of
transmit restarts, which impacted performance.   The following patches
were applied and resolved the symptom of the transmit timeout stats, and
reduced the number of transmit restarts.  

This patch set includes the following patches:
0001-pch_gbe-Fix-the-checksum-fill-to-the-error-location.patch  
0002-pch_gbe-fix-transmit-watchdog-timeout.patch  
0003-pch_gbe-add-extra-clean-tx.patch  (which also includes bumping the
version to 1.01)
0004-pch_gbe-vlan-skb-len-fix.patch

The resulting pch_gbe 1.01 driver has been tested on Kontron Tunnel
Creek EG20T modules and Intel Crown Bay EG20T modules, so I believe that
these are appropriate for consideration in the upstream pch_gbe driver.

Please review and comment.

Thanks,
Andy

^ permalink raw reply

* [PATCH 3/4] pch_gbe: add extra clean tx
From: Andy Cress @ 2012-07-09 13:28 UTC (permalink / raw)
  To: netdev

Author: Andy Cress <andy.cress@us.kontron.com>

This adds extra cleaning to the pch_gbe_clean_tx routine to avoid 
transmit timeouts on some PHYs that have different timing.
Also update the DRV_VERSION to 1.01, and show it.

Signed-off-by: Andy Cress <andy.cress@us.kontron.com>

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index a746064..30ef285 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -26,7 +26,7 @@
 #include <linux/ptp_classify.h>
 #endif
 
-#define DRV_VERSION     "1.00"
+#define DRV_VERSION     "1.01"
 const char pch_driver_version[] = DRV_VERSION;
 
 #define PCI_DEVICE_ID_INTEL_IOH1_GBE	0x8802		/* Pci device ID
*/
@@ -1581,7 +1581,8 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter,
 	struct sk_buff *skb;
 	unsigned int i;
 	unsigned int cleaned_count = 0;
-	bool cleaned = true;
+	bool cleaned = false;
+	int unused, thresh;
 
 	pr_debug("next_to_clean : %d\n", tx_ring->next_to_clean);
 
@@ -1590,10 +1591,36 @@ pch_gbe_clean_tx(struct pch_gbe_adapter
*adapter,
 	pr_debug("gbec_status:0x%04x  dma_status:0x%04x\n",
 		 tx_desc->gbec_status, tx_desc->dma_status);
 
+	unused = PCH_GBE_DESC_UNUSED(tx_ring);
+	thresh = tx_ring->count - PCH_GBE_TX_WEIGHT;
+	if ((tx_desc->gbec_status == DSC_INIT16) && (unused < thresh))
+	{  /* current marked clean, tx queue filling up, do extra clean
*/
+		int j, k;
+		if (unused < 8) {  /* tx queue nearly full */
+			pr_debug("clean_tx: transmit queue warning
(%x,%x) unused=%d\n",
+
tx_ring->next_to_clean,tx_ring->next_to_use,unused);
+		}
+	   
+		/* current marked clean, scan for more that need
cleaning. */
+		k = i;
+		for (j = 0; j < PCH_GBE_TX_WEIGHT; j++) 
+		{
+			tx_desc = PCH_GBE_TX_DESC(*tx_ring, k);
+			if (tx_desc->gbec_status != DSC_INIT16) break;
/*found*/
+			if (++k >= tx_ring->count) k = 0;  /*increment,
wrap*/
+		}
+		if (j < PCH_GBE_TX_WEIGHT) {
+			pr_debug("clean_tx: unused=%d loops=%d found
tx_desc[%x,%x:%x].gbec_status=%04x\n",
+				unused,j, i,k, tx_ring->next_to_use,
tx_desc->gbec_status);
+			i = k;  /*found one to clean, usu
gbec_status==2000.*/
+		}
+	}
+
 	while ((tx_desc->gbec_status & DSC_INIT16) == 0x0000) {
 		pr_debug("gbec_status:0x%04x\n", tx_desc->gbec_status);
 		buffer_info = &tx_ring->buffer_info[i];
 		skb = buffer_info->skb;
+		cleaned = true;
 
 		if ((tx_desc->gbec_status & PCH_GBE_TXD_GMAC_STAT_ABT))
{
 			adapter->stats.tx_aborted_errors++;
@@ -1641,18 +1668,21 @@ pch_gbe_clean_tx(struct pch_gbe_adapter
*adapter,
 	}
 	pr_debug("called pch_gbe_unmap_and_free_tx_resource() %d
count\n",
 		 cleaned_count);
-	/* Recover from running out of Tx resources in xmit_frame */
-	spin_lock(&tx_ring->tx_lock);
-	if (unlikely(cleaned && (netif_queue_stopped(adapter->netdev))))
{
-		netif_wake_queue(adapter->netdev);
-		adapter->stats.tx_restart_count++;
-		pr_debug("Tx wake queue\n");
-	}
+	if (cleaned_count > 0)  { /*skip this if nothing cleaned*/
+		/* Recover from running out of Tx resources in
xmit_frame */
+		spin_lock(&tx_ring->tx_lock);
+		if (unlikely(cleaned &&
(netif_queue_stopped(adapter->netdev))))
+		{
+			netif_wake_queue(adapter->netdev);
+			adapter->stats.tx_restart_count++;
+			pr_debug("Tx wake queue\n");
+		}
 
-	tx_ring->next_to_clean = i;
+		tx_ring->next_to_clean = i;
 
-	pr_debug("next_to_clean : %d\n", tx_ring->next_to_clean);
-	spin_unlock(&tx_ring->tx_lock);
+		pr_debug("next_to_clean : %d\n",
tx_ring->next_to_clean);
+		spin_unlock(&tx_ring->tx_lock);
+	}
 	return cleaned;
 }
 
@@ -2389,7 +2419,7 @@ static int pch_gbe_napi_poll(struct napi_struct
*napi, int budget)
 	pch_gbe_clean_rx(adapter, adapter->rx_ring, &work_done, budget);
 	cleaned = pch_gbe_clean_tx(adapter, adapter->tx_ring);
 
-	if (!cleaned)
+	if (cleaned)
 		work_done = budget;
 	/* If no Tx and not enough Rx work done,
 	 * exit the polling mode
@@ -2795,6 +2825,7 @@ static int __init pch_gbe_init_module(void)
 {
 	int ret;
 
+	pr_info("EG20T PCH Gigabit Ethernet Driver - version
%s\n",DRV_VERSION);
 	ret = pci_register_driver(&pch_gbe_driver);
 	if (copybreak != PCH_GBE_COPYBREAK_DEFAULT) {
 		if (copybreak == 0) {

^ permalink raw reply related

* Re: TCP transmit performance regression
From: Ming Lei @ 2012-07-09 13:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller
In-Reply-To: <1341481760.2583.3579.camel@edumazet-glaptop>

On Thu, Jul 5, 2012 at 5:49 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-07-05 at 16:42 +0800, Ming Lei wrote:
>> On Thu, Jul 5, 2012 at 4:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Thu, 2012-07-05 at 16:27 +0800, Ming Lei wrote:
>> >
>> >> After some investigation, the problem is caused by enabling
>> >> DEBUG_SLAB, so it is not a regression.
>> >>
>> >
>> > Strange, unless your machine is a _very_ slow one maybe ?
>>
>> It is a beagle-xm board, and its cpu is ARMv7, 1GHz.
>
> OK, driver seems buggy, please try following patch (on both sides if
> possible)
>
>  drivers/net/usb/smsc95xx.c |   11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index b1112e7..0a4ae35 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1084,26 +1084,23 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
>                         if (skb->len == size) {
>                                 if (dev->net->features & NETIF_F_RXCSUM)
>                                         smsc95xx_rx_csum_offload(skb);
> -                               skb_trim(skb, skb->len - 4); /* remove fcs */
> +                               __skb_trim(skb, skb->len - 4); /* remove fcs */
>                                 skb->truesize = size + sizeof(struct sk_buff);
>
>                                 return 1;
>                         }
>
> -                       ax_skb = skb_clone(skb, GFP_ATOMIC);
> +                       ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
>                         if (unlikely(!ax_skb)) {
>                                 netdev_warn(dev->net, "Error allocating skb\n");
>                                 return 0;
>                         }
>
> -                       ax_skb->len = size;
> -                       ax_skb->data = packet;
> -                       skb_set_tail_pointer(ax_skb, size);
> +                       memcpy(skb_put(ax_skb, size), packet, size);
>
>                         if (dev->net->features & NETIF_F_RXCSUM)
>                                 smsc95xx_rx_csum_offload(ax_skb);
> -                       skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
> -                       ax_skb->truesize = size + sizeof(struct sk_buff);
> +                       __skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
>
>                         usbnet_skb_return(dev, ax_skb);
>                 }
>
>

Looks the patch replaces skb_clone with netdev_alloc_skb_ip_align and
introduces extra copies on incoming data, so would you mind explaining
it in a bit detail? And why is skb_clone not OK for the purpose?


Thanks,
-- 
Ming Lei

^ permalink raw reply

* Re: [PATCH] net: cgroup: fix out of bounds accesses
From: Neil Horman @ 2012-07-09 12:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, linux-kernel, netdev, lizefan, tj, Gao feng
In-Reply-To: <1341837625.3265.2748.camel@edumazet-glaptop>

On Mon, Jul 09, 2012 at 02:40:25PM +0200, Eric Dumazet wrote:
> On Mon, 2012-07-09 at 08:13 -0400, Neil Horman wrote:
> > On Mon, Jul 09, 2012 at 01:50:52PM +0200, Eric Dumazet wrote:
> > > On Mon, 2012-07-09 at 07:01 -0400, Neil Horman wrote:
> > > 
> > > > Thank you for doing this Eric, Gao.  Just to be sure (I asked in the previous
> > > > thread), would it be better to avoid the length check in skb_update_prio, and
> > > > instead update the netdev tables to be long enough in cgrp_create and in
> > > > netprio_device_event on device registration?
> > > 
> > > Yes probably, and it is even needed because extend_netdev_table() can
> > > acutally fail to expand the table if kzalloc() returned NULL.
> > > 
> > > Current code just ignores this allocation failure so we also can crash
> > > in write_priomap()
> > > 
> > ACK, can you follow up with a patch please?
> 
> Gao was working on this allocation problem (he privately sent me a v1 of
> his patch), so I think we can wait Gao submit a v2 to combine all the
> work/ideas in a single patch.
> 
> (ie make sure we dont need additional bound checkings in fast path)
> 
> 
Ok, I agree.  thanks!
Neil

> 
> 

^ permalink raw reply

* Re: [PATCH] net: cgroup: fix out of bounds accesses
From: Eric Dumazet @ 2012-07-09 12:40 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, linux-kernel, netdev, lizefan, tj, Gao feng
In-Reply-To: <20120709121350.GC9186@hmsreliant.think-freely.org>

On Mon, 2012-07-09 at 08:13 -0400, Neil Horman wrote:
> On Mon, Jul 09, 2012 at 01:50:52PM +0200, Eric Dumazet wrote:
> > On Mon, 2012-07-09 at 07:01 -0400, Neil Horman wrote:
> > 
> > > Thank you for doing this Eric, Gao.  Just to be sure (I asked in the previous
> > > thread), would it be better to avoid the length check in skb_update_prio, and
> > > instead update the netdev tables to be long enough in cgrp_create and in
> > > netprio_device_event on device registration?
> > 
> > Yes probably, and it is even needed because extend_netdev_table() can
> > acutally fail to expand the table if kzalloc() returned NULL.
> > 
> > Current code just ignores this allocation failure so we also can crash
> > in write_priomap()
> > 
> ACK, can you follow up with a patch please?

Gao was working on this allocation problem (he privately sent me a v1 of
his patch), so I think we can wait Gao submit a v2 to combine all the
work/ideas in a single patch.

(ie make sure we dont need additional bound checkings in fast path)

^ permalink raw reply

* Re: 82571EB: Detected Hardware Unit Hang
From: Joe Jin @ 2012-07-09 12:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, e1000-devel, linux-kernel@vger.kernel.org
In-Reply-To: <1341825677.3265.2330.camel@edumazet-glaptop>

On 07/09/12 17:21, Eric Dumazet wrote:
> On Mon, 2012-07-09 at 16:51 +0800, Joe Jin wrote:
>> Hi list,
>>
>> I'm seeing a Unit Hang even with the latest e1000e driver 2.0.0 when doing
>> scp test. this issue is easy do reproduced on SUN FIRE X2270 M2, just copy
>> a big file (>500M) from another server will hit it at once. 
>>
>> Would you please help on this?
>>
> 
> Its a known problem.
> 
> But apparently Intel guys are not very responsive, as they have another
> patch than the following :
> 
> http://permalink.gmane.org/gmane.linux.network/232669

Eris, 

Thanks for you reply, but seems this patch not help for me, 
applied the patch still hit the issue:

# dmesg
e1000e 0000:05:00.0: eth0: Detected Hardware Unit Hang:
  TDH                  <6f>
  TDT                  <7e>
  next_to_use          <7e>
  next_to_clean        <6e>
buffer_info[next_to_clean]:
  time_stamp           <fffd48dc>
  next_to_watch        <74>
  jiffies              <fffd5344>
  next_to_watch.status <0>
MAC Status             <80387>
PHY Status             <792d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e 0000:05:00.0: eth0: Detected Hardware Unit Hang:
  TDH                  <6f>
  TDT                  <7e>
  next_to_use          <7e>
  next_to_clean        <6e>
buffer_info[next_to_clean]:
  time_stamp           <fffd48dc>
  next_to_watch        <74>
  jiffies              <fffd5b14>
  next_to_watch.status <0>
MAC Status             <80387>
PHY Status             <792d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e 0000:05:00.0: eth0: Detected Hardware Unit Hang:
  TDH                  <6f>
  TDT                  <7e>
  next_to_use          <7e>
  next_to_clean        <6e>
buffer_info[next_to_clean]:
  time_stamp           <fffd48dc>
  next_to_watch        <74>
  jiffies              <fffd62e4>
  next_to_watch.status <0>
MAC Status             <80387>
PHY Status             <792d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e 0000:05:00.0: eth0: Detected Hardware Unit Hang:
  TDH                  <6f>
  TDT                  <7e>
  next_to_use          <7e>
  next_to_clean        <6e>
buffer_info[next_to_clean]:
  time_stamp           <fffd48dc>
  next_to_watch        <74>
  jiffies              <fffd6ab4>
  next_to_watch.status <0>
MAC Status             <80387>
PHY Status             <792d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x225/0x230()
Hardware name: SUN FIRE X2270 M2
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: autofs4 hidp rfcomm bluetooth rfkill lockd sunrpc cpufreq_ondemand acpi_cpufreq mperf be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi video sbs sbshc acpi_pad acpi_ipmi ipmi_msghandler parport_pc lp parport e1000e(U) snd_seq_dummy snd_seq_oss snd_seq_midi_event igb snd_seq snd_seq_device serio_raw snd_pcm_oss snd_mixer_oss snd_pcm tpm_infineon snd_timer snd soundcore i7core_edac iTCO_wdt iTCO_vendor_support snd_page_alloc edac_core i2c_i801 ioatdma i2c_core pcspkr ghes dca hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage sd_mod crc_t10dif sg ahci libahci ext3 jbd mbcache [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.39-200.24.1.el5uek #1
Call Trace:
 [<c07d9ac5>] ? dev_watchdog+0x225/0x230
 [<c045ba61>] warn_slowpath_common+0x81/0xa0
 [<c07d9ac5>] ? dev_watchdog+0x225/0x230
 [<c045bb23>] warn_slowpath_fmt+0x33/0x40
 [<c07d9ac5>] dev_watchdog+0x225/0x230
 [<c07d98a0>] ? dev_activate+0xb0/0xb0
 [<c0468e82>] call_timer_fn+0x32/0xf0
 [<c046a76d>] run_timer_softirq+0xed/0x1b0
 [<c07d98a0>] ? dev_activate+0xb0/0xb0
 [<c0461a81>] __do_softirq+0x91/0x1a0
 [<c04619f0>] ? local_bh_enable+0x80/0x80
 <IRQ>  [<c0462295>] ? irq_exit+0x95/0xa0
 [<c087f8b8>] ? smp_apic_timer_interrupt+0x38/0x42
 [<c08784f5>] ? apic_timer_interrupt+0x31/0x38
 [<c046007b>] ? do_exit+0x11b/0x370
 [<c065eae4>] ? intel_idle+0xa4/0x100
 [<c078d9b9>] ? cpuidle_idle_call+0xb9/0x1e0
 [<c0411d77>] ? cpu_idle+0x97/0xd0
 [<c085cbbd>] ? rest_init+0x5d/0x70
 [<c0b07a7a>] ? start_kernel+0x28a/0x340
 [<c0b074b0>] ? obsolete_checksetup+0xb0/0xb0
 [<c0b070a4>] ? i386_start_kernel+0x64/0xb0
---[ end trace 5d51553c2ad66677 ]---
e1000e 0000:05:00.0: eth0: Reset adapter
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

Any idea?

Thanks,
Joe

> 
> 
> We only have to wait they push their alternative patch, eventually.
> 
> In the mean time, you can use Hiroaki SHIMODA patch, it works.
> 
> 
> 



------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* Re: [PATCH] net: cgroup: fix out of bounds accesses
From: Neil Horman @ 2012-07-09 12:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, linux-kernel, netdev, lizefan, tj, Gao feng
In-Reply-To: <1341834652.3265.2642.camel@edumazet-glaptop>

On Mon, Jul 09, 2012 at 01:50:52PM +0200, Eric Dumazet wrote:
> On Mon, 2012-07-09 at 07:01 -0400, Neil Horman wrote:
> 
> > Thank you for doing this Eric, Gao.  Just to be sure (I asked in the previous
> > thread), would it be better to avoid the length check in skb_update_prio, and
> > instead update the netdev tables to be long enough in cgrp_create and in
> > netprio_device_event on device registration?
> 
> Yes probably, and it is even needed because extend_netdev_table() can
> acutally fail to expand the table if kzalloc() returned NULL.
> 
> Current code just ignores this allocation failure so we also can crash
> in write_priomap()
> 
ACK, can you follow up with a patch please?

Thanks!
Neil

> 
> 
> 

^ permalink raw reply

* Re: [RFC PATCH] bridge: netfilter: fix skb->nf_bridge NULL panic in br_nf_forward_finish
From: Lin Ming @ 2012-07-09 12:00 UTC (permalink / raw)
  To: Massimo Cetra
  Cc: Eric Dumazet, netdev, Stephen Hemminger, David S. Miller,
	Julian Anastasov
In-Reply-To: <4FFAA3CE.1090504@navynet.it>

On Mon, Jul 9, 2012 at 5:26 PM, Massimo Cetra
<ctrix+debianbugs@navynet.it> wrote:
> On 06/07/2012 16:19, Lin Ming wrote:
>>
>> I can reproduce similiar panic with 3.5-rc5 kernel as Massimo reported at:
>> http://marc.info/?l=linux-netdev&m=134089242113979&w=2
>>
>> The steps to reproduce as follow,
>>
>> 1. On Host1, setup brige br0(192.168.1.106)
>> 2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
>> 3. Start IPVS service on Host1
>>     ipvsadm -A -t 192.168.1.106:80 -s rr
>>     ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
>> 4. Run apache benchmark on Host2(192.168.1.101)
>>     ab -n 1000 http://192.168.1.106/
>
>
> Thank you Lin,
>
> i spent a couple of days trying to figure out how to reproduce but you were
> quicker and smarter than me.

Could you also test it ? :-)

Thanks.

>
> Massimo

^ permalink raw reply

* Re: [RFC PATCH] ppp: add support for L2 multihop / tunnel switching
From: James Chapman @ 2012-07-09 11:52 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: netdev, linux-ppp
In-Reply-To: <20120708214930.GI19462@kvack.org>

On 08/07/12 22:49, Benjamin LaHaise wrote:
> Hello folks,
> 
> Below is a first cut at implementing multihop L2TP, also known as tunnel 
> switching.  The feature is similar in scope to how PPPoE relaying works -- 
> L2 packets that are received on one PPP interface are forwarded to another.  
> This feature is typically used for traffic aggregation and backhaul for 
> ISPs, with incoming sessions (often PPPoE) being partially authenticated 
> by a LAC, and then forwarded over an L2TP session to an LNS (selected by the 
> user's domain) which then provides network access to the client.

As a mechanism for switching PPP interfaces together, this patch is
good. For L2TP though, I prefer an approach that would be applicable for
all L2TP traffic types, not just PPP.

L2TP supports many different pseudowire types, and this patch will only
be useful for tunnel switching between PPP pseudowires. Whereas if we
implement it within the L2TP core, rather than in the PPP code, we would
get switching between all pseudowire types. If we add this patch and
then subsequently add switching between other pseudowires in the L2TP
core (which we're likely to want to do), then we're left with two
different interfaces for doing L2TP tunnel switching in the kernel.

The L2TP core allows traffic to be passed directly into an L2TP session.
In the case of PPPoE, for example, the PPP data can be  extracted from a
PPPoE packet and passed into an L2TP tunnel/session, with no PPP
interface(s) involved.

That said, your approach allows two PPP interfaces to be switched
together, which has its own advantages.

> The reasoning behind using dev_queue_xmit() rather than outputting directly 
> to another PPP channel is to enable the use of the traffic shaping and 
> queuing features of the kernel on multihop sessions.

I'm not sure about using a pseudo packet type to do this. For L2TP, it
would seem better to add netfilter/tc support for L2TP data packets,
which would let people add rules for, say, traffic in L2TP tunnel x /
session y. This would avoid the need for ETH_P_PPP and you could then
output directly to the ppp channel.

-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

^ permalink raw reply

* Re: [PATCH] net: cgroup: fix out of bounds accesses
From: Eric Dumazet @ 2012-07-09 11:50 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, linux-kernel, netdev, lizefan, tj, Gao feng
In-Reply-To: <20120709110154.GB9186@hmsreliant.think-freely.org>

On Mon, 2012-07-09 at 07:01 -0400, Neil Horman wrote:

> Thank you for doing this Eric, Gao.  Just to be sure (I asked in the previous
> thread), would it be better to avoid the length check in skb_update_prio, and
> instead update the netdev tables to be long enough in cgrp_create and in
> netprio_device_event on device registration?

Yes probably, and it is even needed because extend_netdev_table() can
acutally fail to expand the table if kzalloc() returned NULL.

Current code just ignores this allocation failure so we also can crash
in write_priomap()

^ permalink raw reply

* Re: [PATCH] net: cgroup: fix out of bounds accesses
From: Neil Horman @ 2012-07-09 11:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, linux-kernel, netdev, lizefan, tj, Gao feng
In-Reply-To: <1341819910.3265.2106.camel@edumazet-glaptop>

On Mon, Jul 09, 2012 at 09:45:10AM +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> dev->priomap is allocated by extend_netdev_table() called from
> update_netdev_tables().
> And this is only called if write_priomap() is called.
> 
> But if write_priomap() is not called, it seems we can have out of bounds
> accesses in cgrp_destroy(), read_priomap() & skb_update_prio()
> 
> With help from Gao Feng
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Gao feng <gaofeng@cn.fujitsu.com>
> ---
> net/core/dev.c            |    8 ++++++--
> net/core/netprio_cgroup.c |    4 ++--
> 2 files changed, 8 insertions(+), 4 deletions(-)
> 

Thank you for doing this Eric, Gao.  Just to be sure (I asked in the previous
thread), would it be better to avoid the length check in skb_update_prio, and
instead update the netdev tables to be long enough in cgrp_create and in
netprio_device_event on device registration?

Neil

^ permalink raw reply

* Re: [PATCH v2] cgroup: fix panic in netprio_cgroup
From: Neil Horman @ 2012-07-09 10:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Gao feng, davem, netdev, linux-kernel, tj, lizefan
In-Reply-To: <1341777043.3265.1786.camel@edumazet-glaptop>

On Sun, Jul 08, 2012 at 09:50:43PM +0200, Eric Dumazet wrote:
> On Thu, 2012-07-05 at 17:28 +0800, Gao feng wrote:
> > we set max_prioidx to the first zero bit index of prioidx_map in
> > function get_prioidx.
> > 
> > So when we delete the low index netprio cgroup and adding a new
> > netprio cgroup again,the max_prioidx will be set to the low index.
> > 
> > when we set the high index cgroup's net_prio.ifpriomap,the function
> > write_priomap will call update_netdev_tables to alloc memory which
> > size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
> > so the size of array that map->priomap point to is max_prioidx +1,
> > which is low than what we actually need.
> > 
> > fix this by adding check in get_prioidx,only set max_prioidx when
> > max_prioidx low than the new prioidx.
> > 
> > Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> > ---
> >  net/core/netprio_cgroup.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> > index 5b8aa2f..aa907ed 100644
> > --- a/net/core/netprio_cgroup.c
> > +++ b/net/core/netprio_cgroup.c
> > @@ -49,8 +49,9 @@ static int get_prioidx(u32 *prio)
> >  		return -ENOSPC;
> >  	}
> >  	set_bit(prioidx, prioidx_map);
> > +	if (atomic_read(&max_prioidx) < prioidx)
> > +		atomic_set(&max_prioidx, prioidx);
> >  	spin_unlock_irqrestore(&prioidx_map_lock, flags);
> > -	atomic_set(&max_prioidx, prioidx);
> >  	*prio = prioidx;
> >  	return 0;
> >  }
> 
> This patch seems fine to me.
> 
> Acked-by: Eric Dumazet <edumazet@google.com>
> 
> Neil, looking at this file, I believe something is wrong.
> 
> dev->priomap is allocated by extend_netdev_table() called from
> update_netdev_tables(). And this is only called if write_priomap() is
> called.
> 
> But if write_priomap() is not called, it seems we can have out of bounds
> accesses in cgrp_destroy() and read_priomap()
> 
> What do you think of following patch ?
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index 5b8aa2f..80150d2 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -141,7 +141,7 @@ static void cgrp_destroy(struct cgroup *cgrp)
>  	rtnl_lock();
>  	for_each_netdev(&init_net, dev) {
>  		map = rtnl_dereference(dev->priomap);
> -		if (map)
> +		if (map && cs->prioidx < map->priomap_len)
>  			map->priomap[cs->prioidx] = 0;
>  	}
>  	rtnl_unlock();
> @@ -165,7 +165,7 @@ static int read_priomap(struct cgroup *cont, struct cftype *cft,
>  	rcu_read_lock();
>  	for_each_netdev_rcu(&init_net, dev) {
>  		map = rcu_dereference(dev->priomap);
> -		priority = map ? map->priomap[prioidx] : 0;
> +		priority = (map && prioidx < map->priomap_len) ? map->priomap[prioidx] : 0;
>  		cb->fill(cb, dev->name, priority);
>  	}
>  	rcu_read_unlock();
> 
> 
> 
You're right,  If we create a cgroup after a net device is registered the group
priority index will likely be out of bounds for those devices.  We can fix it
like you propose above (including the additional prioidx < map->priomap_len
check in skb_update_prio as Gao notes), or we can call update_netdev_tables
iteratively for every net device in cgrp_create, and on device_registration in
netprio_device_event.

I'm not sure how adventageous one is over the other, but it does seem that,
given that skb_update_prio is in the transmit path, it might be nice to avoid
the additional length check there if possible.

Thanks!
Neil

^ permalink raw reply

* Re: [PATCH net-next 4/6] 6lowpan: rework fragment-deleting routine
From: Eric Dumazet @ 2012-07-09 10:53 UTC (permalink / raw)
  To: Alexander Smirnov; +Cc: davem, netdev
In-Reply-To: <1341829351-18485-5-git-send-email-alex.bluesman.smirnov@gmail.com>

On Mon, 2012-07-09 at 14:22 +0400, Alexander Smirnov wrote:
> 6lowpan module starts collecting incomming frames and fragments
> right after lowpan_module_init() therefor it will be better to
> clean unfinished fragments in lowpan_cleanup_module() function
> instead of doing it when link goes down.
> 
> Changed spinlocks type to prevent deadlock with expired timer event
> and removed unused one.
> 
> Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
> ---
>  net/ieee802154/6lowpan.c |   28 ++++++++++++++++------------
>  1 files changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
> index b872515..e7de085 100644
> --- a/net/ieee802154/6lowpan.c
> +++ b/net/ieee802154/6lowpan.c
> @@ -113,7 +113,6 @@ struct lowpan_dev_record {
>  
>  struct lowpan_fragment {
>  	struct sk_buff		*skb;		/* skb to be assembled */
> -	spinlock_t		lock;		/* concurency lock */
>  	u16			length;		/* length to be assemled */
>  	u32			bytes_rcv;	/* bytes received */
>  	u16			tag;		/* current fragment tag */
> @@ -761,7 +760,7 @@ lowpan_process_data(struct sk_buff *skb)
>  		if ((frame->bytes_rcv == frame->length) &&
>  		     frame->timer.expires > jiffies) {
>  			/* if timer haven't expired - first of all delete it */
> -			del_timer(&frame->timer);
> +			del_timer_sync(&frame->timer);
>  			list_del(&frame->list);
>  			spin_unlock(&flist_lock);
>  
> @@ -1196,19 +1195,9 @@ static void lowpan_dellink(struct net_device *dev, struct list_head *head)
>  	struct lowpan_dev_info *lowpan_dev = lowpan_dev_info(dev);
>  	struct net_device *real_dev = lowpan_dev->real_dev;
>  	struct lowpan_dev_record *entry, *tmp;
> -	struct lowpan_fragment *frame, *tframe;
>  
>  	ASSERT_RTNL();
>  
> -	spin_lock(&flist_lock);
> -	list_for_each_entry_safe(frame, tframe, &lowpan_fragments, list) {
> -		del_timer(&frame->timer);
> -		list_del(&frame->list);
> -		dev_kfree_skb(frame->skb);
> -		kfree(frame);
> -	}
> -	spin_unlock(&flist_lock);
> -
>  	mutex_lock(&lowpan_dev_info(dev)->dev_list_mtx);
>  	list_for_each_entry_safe(entry, tmp, &lowpan_devices, list) {
>  		if (entry->ldev == dev) {
> @@ -1264,9 +1253,24 @@ out:
>  
>  static void __exit lowpan_cleanup_module(void)
>  {
> +	struct lowpan_fragment *frame, *tframe;
> +
>  	lowpan_netlink_fini();
>  
>  	dev_remove_pack(&lowpan_packet_type);
> +
> +	/* Now 6lowpan packet_type is removed, so no new fragments are
> +	 * expected on RX, therefore that's the time to clean incomplete
> +	 * fragments.
> +	 */
> +	spin_lock_bh(&flist_lock);
> +	list_for_each_entry_safe(frame, tframe, &lowpan_fragments, list) {
> +		del_timer_sync(&frame->timer);
> +		list_del(&frame->list);
> +		dev_kfree_skb(frame->skb);
> +		kfree(frame);
> +	}
> +	spin_unlock_bh(&flist_lock);
>  }
>  
>  module_init(lowpan_init_module);

Problem is lowpan_fragment_timer_expired() can race with this code.

del_timer_sync() might block here if lowpan_fragment_timer_expired() is
waiting/spinning for spin_lock(&flist_lock)

You cant call del_timer_sync() holding flist_lock, you should find
another way to solve the problem.

Its explained in kernel/timer.c :

#ifdef CONFIG_SMP
/**
 * del_timer_sync - deactivate a timer and wait for the handler to finish.
 * @timer: the timer to be deactivated
 *
 * This function only differs from del_timer() on SMP: besides deactivating
 * the timer it also makes sure the handler has finished executing on other
 * CPUs.
 *
 * Synchronization rules: Callers must prevent restarting of the timer,
 * otherwise this function is meaningless. It must not be called from
 * interrupt contexts. The caller must not hold locks which would prevent
 * completion of the timer's handler. The timer's handler must not call
 * add_timer_on(). Upon exit the timer is not queued and the handler is
 * not running on any CPU.
 *
 * Note: You must not hold locks that are held in interrupt context
 *   while calling this function. Even if the lock has nothing to do
 *   with the timer in question.  Here's why:
 *
 *    CPU0                             CPU1
 *    ----                             ----
 *                                   <SOFTIRQ>
 *                                   call_timer_fn();
 *                                     base->running_timer = mytimer;
 *  spin_lock_irq(somelock);
 *                                     <IRQ>
 *                                        spin_lock(somelock);
 *  del_timer_sync(mytimer);
 *   while (base->running_timer == mytimer);
 *
 * Now del_timer_sync() will never return and never release somelock.
 * The interrupt on the other CPU is waiting to grab somelock but
 * it has interrupted the softirq that CPU0 is waiting to finish.
 *
 * The function returns whether it has deactivated a pending timer or not.
 */

^ permalink raw reply

* Re: =? ?q?=5BPATCH=20firmware=203/4=5D=20rtl=5Fnic=3A=20add=20new=20firmware=20for=20RTL8106E?=
From: Jan Ceuleers @ 2012-07-09 10:33 UTC (permalink / raw)
  To: Hayes Wang; +Cc: dwmw2, ben, romieu, netdev, linux-kernel
In-Reply-To: <1341803512-1309-3-git-send-email-hayeswang@realtek.com>

On 07/09/2012 05:11 AM, Hayes Wang wrote:
> File: rtl_nic/rtl8106e-1.fw
> Version: 0.0.1
> 
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>

Hi

I'm afraid that the subject lines of patches 3/4 and 4/4 were mangled.
Could you try again?

Thanks, Jan

^ permalink raw reply

* Re: [PATCH 4/4] asix: Add a new driver for the AX88172A
From: Christian Riesch @ 2012-07-09 10:30 UTC (permalink / raw)
  To: michael
  Cc: Ben Hutchings, netdev, Oliver Neukum, Eric Dumazet, Allan Chou,
	Mark Lord, Grant Grundler, Ming Lei
In-Reply-To: <1341761975.2038.28.camel@schesaplana>

Hi Ben and Michael,

On Sun, Jul 8, 2012 at 5:39 PM, Michael Riesch <michael@riesch.at> wrote:
> On Fri, 2012-07-06 at 18:37 +0100, Ben Hutchings wrote:
>> > +   priv->mdio->priv = (void *)dev;
>> > +   priv->mdio->read = &asix_mdio_bus_read;
>> > +   priv->mdio->write = &asix_mdio_bus_write;
>> > +   priv->mdio->name = "Asix MDIO Bus";
>> > +   snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "asix-%s",
>> > +            dev_name(dev->net->dev.parent));
>> [...]
>>
>> I think you need to ensure that the bus identifier is unique throughout
>> its lifetime, but net devices can be renamed and that could lead to a
>> collision.  Perhaps you could use the ifindex or the USB device path
>
> Ben,
>
> the dev_name function in the code above returns the sysfs filename of
> the USB device (e.g. 1-0:1.0).
>
>> (though that might be too long).
>
> This may be a problem. The bus identifier may be 17 characters long, so
> if we leave the endpoint/configuration part (:1.0) and the prefix away
> it should be fine in any "normal" system. However, on a system with a
> more-than-9-root-hubs 5-tier 127-devices-each USB infrastructure it
> results in collisions. So is this approach acceptable?
>
> Using the ifindex sounds good to me,
>
> snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "asix-%d",
>         dev->net->ifindex);
>
> works on any system with less than 10^12 network interfaces.

Ok, I'll change that to use ifindex. I didn't like the mdio bus name
with lots of colons, dashes, and periods anyway...
Thank you!

Regards, Christian

^ permalink raw reply

* [PATCH net-next 6/6] mac802154: sparse warnings: make symbols static
From: Alexander Smirnov @ 2012-07-09 10:22 UTC (permalink / raw)
  To: davem, eric.dumazet; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1341829351-18485-1-git-send-email-alex.bluesman.smirnov@gmail.com>

Make symbols static to avoid the following warning shown up
by sparse:

    warning: symbol ... was not declared. Should it be static?

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
---
 net/mac802154/mac_cmd.c |    2 +-
 net/mac802154/mib.c     |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mac802154/mac_cmd.c b/net/mac802154/mac_cmd.c
index 5d9a47b..d8d2770 100644
--- a/net/mac802154/mac_cmd.c
+++ b/net/mac802154/mac_cmd.c
@@ -55,7 +55,7 @@ static int mac802154_mlme_start_req(struct net_device *dev,
 	return 0;
 }
 
-struct wpan_phy *mac802154_get_phy(const struct net_device *dev)
+static struct wpan_phy *mac802154_get_phy(const struct net_device *dev)
 {
 	struct mac802154_sub_if_data *priv = netdev_priv(dev);
 
diff --git a/net/mac802154/mib.c b/net/mac802154/mib.c
index 5c66b8f..f47781a 100644
--- a/net/mac802154/mib.c
+++ b/net/mac802154/mib.c
@@ -39,7 +39,7 @@ struct hw_addr_filt_notify_work {
 	unsigned long changed;
 };
 
-struct mac802154_priv *mac802154_slave_get_priv(struct net_device *dev)
+static struct mac802154_priv *mac802154_slave_get_priv(struct net_device *dev)
 {
 	struct mac802154_sub_if_data *priv = netdev_priv(dev);
 
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH net-next 5/6] 6lowpan: get extra headroom in allocated frame
From: Alexander Smirnov @ 2012-07-09 10:22 UTC (permalink / raw)
  To: davem, eric.dumazet; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1341829351-18485-1-git-send-email-alex.bluesman.smirnov@gmail.com>

Use netdev_alloc_skb_ip_align() instead of alloc_skb() to get some
extra headroom in case we need to forward this frame in a tunnel or
something else.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
---
 net/ieee802154/6lowpan.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index e7de085..90ca7ba 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -660,8 +660,8 @@ lowpan_alloc_new_frame(struct sk_buff *skb, u8 iphc0, u8 len, u8 tag)
 	frame->tag = tag;
 
 	/* allocate buffer for frame assembling */
-	frame->skb = alloc_skb(frame->length +
-			       sizeof(struct ipv6hdr), GFP_ATOMIC);
+	frame->skb = netdev_alloc_skb_ip_align(skb->dev, frame->length +
+					       sizeof(struct ipv6hdr));
 
 	if (!frame->skb)
 		goto skb_err;
-- 
1.7.2.3

^ permalink raw reply related

* [PATCH net-next 4/6] 6lowpan: rework fragment-deleting routine
From: Alexander Smirnov @ 2012-07-09 10:22 UTC (permalink / raw)
  To: davem, eric.dumazet; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1341829351-18485-1-git-send-email-alex.bluesman.smirnov@gmail.com>

6lowpan module starts collecting incomming frames and fragments
right after lowpan_module_init() therefor it will be better to
clean unfinished fragments in lowpan_cleanup_module() function
instead of doing it when link goes down.

Changed spinlocks type to prevent deadlock with expired timer event
and removed unused one.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
---
 net/ieee802154/6lowpan.c |   28 ++++++++++++++++------------
 1 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index b872515..e7de085 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -113,7 +113,6 @@ struct lowpan_dev_record {
 
 struct lowpan_fragment {
 	struct sk_buff		*skb;		/* skb to be assembled */
-	spinlock_t		lock;		/* concurency lock */
 	u16			length;		/* length to be assemled */
 	u32			bytes_rcv;	/* bytes received */
 	u16			tag;		/* current fragment tag */
@@ -761,7 +760,7 @@ lowpan_process_data(struct sk_buff *skb)
 		if ((frame->bytes_rcv == frame->length) &&
 		     frame->timer.expires > jiffies) {
 			/* if timer haven't expired - first of all delete it */
-			del_timer(&frame->timer);
+			del_timer_sync(&frame->timer);
 			list_del(&frame->list);
 			spin_unlock(&flist_lock);
 
@@ -1196,19 +1195,9 @@ static void lowpan_dellink(struct net_device *dev, struct list_head *head)
 	struct lowpan_dev_info *lowpan_dev = lowpan_dev_info(dev);
 	struct net_device *real_dev = lowpan_dev->real_dev;
 	struct lowpan_dev_record *entry, *tmp;
-	struct lowpan_fragment *frame, *tframe;
 
 	ASSERT_RTNL();
 
-	spin_lock(&flist_lock);
-	list_for_each_entry_safe(frame, tframe, &lowpan_fragments, list) {
-		del_timer(&frame->timer);
-		list_del(&frame->list);
-		dev_kfree_skb(frame->skb);
-		kfree(frame);
-	}
-	spin_unlock(&flist_lock);
-
 	mutex_lock(&lowpan_dev_info(dev)->dev_list_mtx);
 	list_for_each_entry_safe(entry, tmp, &lowpan_devices, list) {
 		if (entry->ldev == dev) {
@@ -1264,9 +1253,24 @@ out:
 
 static void __exit lowpan_cleanup_module(void)
 {
+	struct lowpan_fragment *frame, *tframe;
+
 	lowpan_netlink_fini();
 
 	dev_remove_pack(&lowpan_packet_type);
+
+	/* Now 6lowpan packet_type is removed, so no new fragments are
+	 * expected on RX, therefore that's the time to clean incomplete
+	 * fragments.
+	 */
+	spin_lock_bh(&flist_lock);
+	list_for_each_entry_safe(frame, tframe, &lowpan_fragments, list) {
+		del_timer_sync(&frame->timer);
+		list_del(&frame->list);
+		dev_kfree_skb(frame->skb);
+		kfree(frame);
+	}
+	spin_unlock_bh(&flist_lock);
 }
 
 module_init(lowpan_init_module);
-- 
1.7.2.3

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox