Netdev List


* Re: [PATCH net-next] tcp: be more strict before accepting ECN negociation
From: Eric Dumazet @ 2012-05-04 19:05 UTC (permalink / raw)
  To: Rick Jones
  Cc: David Miller, netdev, Perry Lorier, Matt Mathis, Yuchung Cheng,
	Neal Cardwell, Tom Herbert, Wilmer van der Gaast, Dave Täht,
	Ankur Jain
In-Reply-To: <4FA42491.2020104@hp.com>

On Fri, 2012-05-04 at 11:48 -0700, Rick Jones wrote:
> On 05/04/2012 11:23 AM, Eric Dumazet wrote:
> > On Fri, 2012-05-04 at 11:09 -0700, Rick Jones wrote:
> >> What sort of networks were these?  Any chance it was some sort of
> >> attempt to add ECN to FastOpen?
> >
> > Nothing to do with fastopen.
> >
> > Just take a look at a random http server and sample all SYN packets it
> > receives.
> >
> > Some of them have TOS bits 0 or 1 set, or even both bits set.
> 
> I'll fire-up tcpdump on netperf.org:
> 
> tcpdump -i eth0 -vvv '(tcp[tcpflags] & tcp-syn != 0) && (ip[1] != 0x0)'
> 
> and see what appears.
> 
> rick

or (ip[1] & 3 != 0)


Note that you could catch SYNACK with this filter (if your machine
initiates some active TCP sessions), since SYNACK might have ECT bits,
if some stacks implemented :

http://tools.ietf.org/html/draft-kuzmanovic-ecn-syn-00  ( Adding
Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK
Packets )

http://tools.ietf.org/id/draft-ietf-tcpm-ecnsyn-04.txt
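
For reference, the two low-order bits of the second IPv4 header byte (the
bits that "ip[1] & 3" selects) carry the ECN codepoint defined in RFC 3168.
Below is a minimal userspace sketch of that decoding. The enum and function
names are made up for illustration and are not taken from the kernel or
from tcpdump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ECN codepoints from RFC 3168, carried in the low two bits of the
 * IPv4 TOS byte (ip[1] in pcap filter syntax).
 */
enum ecn_codepoint {
	ECN_NOT_ECT = 0,	/* 00: not an ECN-capable transport */
	ECN_ECT_1   = 1,	/* 01: ECT(1) */
	ECN_ECT_0   = 2,	/* 10: ECT(0) */
	ECN_CE      = 3,	/* 11: congestion experienced */
};

/* Hypothetical helper: keep only the two ECN bits of the TOS byte. */
static enum ecn_codepoint ecn_from_tos(uint8_t tos)
{
	return (enum ecn_codepoint)(tos & 0x3);
}

/* Hypothetical helper: printable name for a codepoint. */
static const char *ecn_name(enum ecn_codepoint cp)
{
	static const char *const names[] = {
		"Not-ECT", "ECT(1)", "ECT(0)", "CE",
	};

	assert((size_t)cp < sizeof(names) / sizeof(names[0]));
	return names[cp];
}
```

With this, a TOS byte of 0x03 decodes to CE, while a byte that only carries
DSCP bits (for example 0xb8) decodes to Not-ECT.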

^ permalink raw reply

* Re: [PATCH] bnx2x: bug fix when loading after SAN boot
From: Eilon Greenstein @ 2012-05-04 19:15 UTC (permalink / raw)
  To: David Miller; +Cc: ariele, netdev
In-Reply-To: <20120504.115753.1701261151183409987.davem@davemloft.net>

On Fri, 2012-05-04 at 11:57 -0400, David Miller wrote:
> From: "Ariel Elior" <ariele@broadcom.com>
> Date: Thu, 3 May 2012 11:22:00 +0300
> 
> > +	/* clear hw from errors which mnay have resulted from an interrupted
> > +	 * dmae transaction.
> > +	 */
> 
> Please fix the typos in this comment.
> 

Sure - thanks for pointing that out:

Subject: [PATCH net v2] bnx2x: bug fix when loading after SAN boot

From: Ariel Elior <ariele@broadcom.com>

This is a bug fix for an "interface fails to load" issue.
The issue occurs when bnx2x driver loads after UNDI driver was previously
loaded over the chip. In such a scenario the UNDI driver is loaded and operates
in the pre-boot kernel, within its own specific host memory address range.
When the pre-boot stage is complete, the real kernel is loaded, in a new and
distinct host memory address range. The transition from pre-boot stage to boot
is asynchronous from UNDI point of view.

A race condition occurs when UNDI driver triggers a DMAE transaction to valid
host addresses in the pre-boot stage, when control is diverted to the real
kernel. This results in access to illegal addresses by our HW as the addresses
which were valid in the preboot stage are no longer considered valid.
Specifically, the 'was_error' bit in the pci glue of our device is set. This
causes all following pci transactions from chip to host to time out (in
accordance with the pci spec).

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   23 +++++++++++++++++++++-
 1 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index e077d25..795fc49 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9122,13 +9122,34 @@ static int __devinit bnx2x_prev_unload_common(struct bnx2x *bp)
 	return bnx2x_prev_mcp_done(bp);
 }
 
+/* previous driver DMAE transaction may have occurred when pre-boot stage ended
+ * and boot began, or when kdump kernel was loaded. Either case would invalidate
+ * the addresses of the transaction, resulting in was-error bit set in the pci
+ * causing all hw-to-host pcie transactions to timeout. If this happened we want
+ * to clear the interrupt which detected this from the pglueb and the was done
+ * bit
+ */
+static void __devinit bnx2x_prev_interrupted_dmae(struct bnx2x *bp)
+{
+	u32 val = REG_RD(bp, PGLUE_B_REG_PGLUE_B_INT_STS);
+	if (val & PGLUE_B_PGLUE_B_INT_STS_REG_WAS_ERROR_ATTN) {
+		BNX2X_ERR("was error bit was found to be set in pglueb upon startup. Clearing");
+		REG_WR(bp, PGLUE_B_REG_WAS_ERROR_PF_7_0_CLR, 1 << BP_FUNC(bp));
+	}
+}
+
 static int __devinit bnx2x_prev_unload(struct bnx2x *bp)
 {
 	int time_counter = 10;
 	u32 rc, fw, hw_lock_reg, hw_lock_val;
 	BNX2X_DEV_INFO("Entering Previous Unload Flow\n");
 
-       /* Release previously held locks */
+	/* clear hw from errors which may have resulted from an interrupted
+	 * dmae transaction.
+	 */
+	bnx2x_prev_interrupted_dmae(bp);
+
+	/* Release previously held locks */
 	hw_lock_reg = (BP_FUNC(bp) <= 5) ?
 		      (MISC_REG_DRIVER_CONTROL_1 + BP_FUNC(bp) * 8) :
 		      (MISC_REG_DRIVER_CONTROL_7 + (BP_FUNC(bp) - 6) * 8);

^ permalink raw reply related

* Re: [REGRESSION][PATCH] Fix an old sky2 WOL regression
From: dilieto @ 2012-05-04 20:17 UTC (permalink / raw)
  To: Knut_Petersen
  Cc: Linus Torvalds, Andrew Morton, Stephen Hemminger, David S. Miller,
	arekm, Jared, linux-kernel, netdev

Dear Knut,

I am sorry it took a while to try it due to other commitments, however
I can now confirm the patch works on my ASUS P5LD2 motherboard.

Many thanks
Nicola

On 2012-03-21 12:40, Knut Petersen wrote:
> Sky2 Wake on LAN is broken since February 2010 on a number of systems.
> Yes. More than two years.
>
> We know about the problem and the cause since October 2010
> (Bugzilla bug #19492). It's commit 87b09f1f25cd1e01d7c50bf423c7fe33027d7511.
>
> Stephen, David: You signed off that commit.
>
> Andrew: You called it a regression in October 2010.
>
> It has been proposed to revert the commit that caused the problem.
> Nothing happened.
>
> I proposed to re-establish the old code for dmi_match()ed systems.
> Without success.
>
> Now it is proposed to re-establish the old code as a configuration option.
> If nothing happens again I will propose a module parameter ;-)
>
> Stephen, I don't want to be a pain in the neck, and it is not my intention
> to offend you by my "attitude". But I simply cannot understand why this
> known regression is not fixed. The bit we talk about is documented,
> and in fact it was set for a number of kernel versions unconditionally.
> Nobody complained about ruined hardware or minor problems.
>
> The systems affected are old enough that no manufacturer cares about
> them, but they are still quite usable for a lot of jobs (kernel 3.3 compile
> time here is below 15 minutes).
>
> If there is a problem in the kernel and if we do know an easy solution,
> that solution should be committed to the kernel, no matter what is written
> in some random documentation, no matter if we could blame some
> BIOS authors. That's the way Linux works - at least I thought so.
>
> cu,
> Knut

^ permalink raw reply

* Re: [PATCH net-next] tcp: be more strict before accepting ECN negociation
From: Rick Jones @ 2012-05-04 20:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Perry Lorier, Matt Mathis, Yuchung Cheng,
	Neal Cardwell, Tom Herbert, Wilmer van der Gaast, Dave Täht,
	Ankur Jain
In-Reply-To: <1336158359.3752.382.camel@edumazet-glaptop>

On 05/04/2012 12:05 PM, Eric Dumazet wrote:
> On Fri, 2012-05-04 at 11:48 -0700, Rick Jones wrote:
>> I'll fire-up tcpdump on netperf.org:
>>
>> tcpdump -i eth0 -vvv '(tcp[tcpflags] & tcp-syn != 0) && (ip[1] != 0x0)'
>>
>> and see what appears.
>>
>> rick
>
> or (ip[1] & 3 != 0)

True, I'm looking at more than the ECN bits, but in the 90 minutes the 
tcpdump has been running there have been no packets with any of the 
8 bits at ip[1] being 1 anyway :)  Netperf.org doesn't get a massive 
quantity of traffic.  It may go the entire week-end or longer without 
seeing such a packet.

> Note that you could catch SYNACK with this filter (if your machine
> initiates some active TCP sessions), since SYNACK might have ECT bits,
> if some stacks implemented :
>
> http://tools.ietf.org/html/draft-kuzmanovic-ecn-syn-00  ( Adding
> Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK
> Packets )
>
> http://tools.ietf.org/id/draft-ietf-tcpm-ecnsyn-04.txt

True.  I suspect that 99 times out of 10, the outbound connections 
established by netperf.org are in response to traffic to netperf-talk, 
which is itself a rather quiet list, so I'm not too worried about the 
output being cluttered with false hits.

rick

^ permalink raw reply

* Re: [PATCH net-next] tcp: be more strict before accepting ECN negociation
From: Rick Jones @ 2012-05-04 20:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Perry Lorier, Matt Mathis, Yuchung Cheng,
	Neal Cardwell, Tom Herbert, Wilmer van der Gaast, Dave Täht,
	Ankur Jain
In-Reply-To: <4FA43A03.4090707@hp.com>

On 05/04/2012 01:20 PM, Rick Jones wrote:
> True, I'm looking at more than the ECN bits, but in the 90 minutes the
> tcpdump has been running there have been no packets with any of the
> 8 bits at ip[1] being 1 anyway :) Netperf.org doesn't get a massive
> quantity of traffic. It may go the entire week-end or longer without
> seeing such a packet.

I see fate is working as intended, or someone decided to try to feed me 
my words :) for within 6 minutes of my sending the above I got:

13:26:16.866007 IP (tos 0x3,CE, ttl 41, id 28850, offset 0, flags [DF], proto TCP (6), length 64)
     somesystemin.de.55363 > www.netperf.org.www: Flags [S], cksum 0x4cfc (correct), seq 304457158, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288116308 ecr 0,sackOK,eol], length 0
13:26:17.831880 IP (tos 0x3,CE, ttl 41, id 6911, offset 0, flags [DF], proto TCP (6), length 64)
     somesystemin.de.55367 > www.netperf.org.www: Flags [S], cksum 0x17aa (correct), seq 586073737, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
13:26:17.831929 IP (tos 0x3,CE, ttl 41, id 28924, offset 0, flags [DF], proto TCP (6), length 64)
     somesystemin.de.55368 > www.netperf.org.www: Flags [S], cksum 0x07cc (correct), seq 1513398047, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288117271 ecr 0,sackOK,eol], length 0
13:26:17.831952 IP (tos 0x3,CE, ttl 41, id 2494, offset 0, flags [DF], proto TCP (6), length 64)
     somesystemin.de.55366 > www.netperf.org.www: Flags [S], cksum 0x75f4 (correct), seq 1153058420, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
13:26:17.832177 IP (tos 0x3,CE, ttl 41, id 6854, offset 0, flags [DF], proto TCP (6), length 64)
     somesystemin.de.55365 > www.netperf.org.www: Flags [S], cksum 0xfca0 (correct), seq 2332522875, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
13:26:17.832239 IP (tos 0x3,CE, ttl 41, id 64733, offset 0, flags [DF], proto TCP (6), length 64)
     somesystemin.de.55364 > www.netperf.org.www: Flags [S], cksum 0x7414 (correct), seq 1544827132, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
13:26:38.649126 IP (tos 0x3,CE, ttl 41, id 9860, offset 0, flags [DF], proto TCP (6), length 64)
     somesystemin.de.55369 > www.netperf.org.www: Flags [S], cksum 0x6270 (correct), seq 683091230, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288137968 ecr 0,sackOK,eol], length 0
13:26:39.417589 IP (tos 0x3,CE, ttl 41, id 13478, offset 0, flags [DF], proto TCP (6), length 64)
     somesystemin.de.55370 > www.netperf.org.www: Flags [S], cksum 0x2862 (correct), seq 3168323595, win 65535, options [mss 1460,nop,wscale 3,nop,nop,TS val 288138734 ecr 0,sackOK,eol], length 0

rick

^ permalink raw reply

* Re: [PATCH net-next] tcp: be more strict before accepting ECN negociation
From: Eric Dumazet @ 2012-05-04 20:49 UTC (permalink / raw)
  To: Rick Jones
  Cc: David Miller, netdev, Perry Lorier, Matt Mathis, Yuchung Cheng,
	Neal Cardwell, Tom Herbert, Wilmer van der Gaast, Dave Täht,
	Ankur Jain
In-Reply-To: <4FA43DCE.8040901@hp.com>

On Fri, 2012-05-04 at 13:36 -0700, Rick Jones wrote:
> On 05/04/2012 01:20 PM, Rick Jones wrote:
> > True, I'm looking at more than the ECN bits, but in the 90 minutes the
> > tcpdump has been running there have been no packets with any of the
> > 8 bits at ip[1] being 1 anyway :) Netperf.org doesn't get a massive
> > quantity of traffic. It may go the entire week-end or longer without
> > seeing such a packet.
> 
> I see fate is working as intended, or someone decided to try to feed me 
> my words :) for within 6 minutes of my sending the above I got:
> 
> 13:26:16.866007 IP (tos 0x3,CE, ttl 41, id 28850, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      somesystemin.de.55363 > www.netperf.org.www: Flags [S], cksum 
> 0x4cfc (correct), seq 304457158, win 65535, options [mss 1460,nop,wscale 
> 3,nop,nop,TS val 288116308 ecr 0,sackOK,eol], length 0
> 13:26:17.831880 IP (tos 0x3,CE, ttl 41, id 6911, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      somesystemin.de.55367 > www.netperf.org.www: Flags [S], cksum 
> 0x17aa (correct), seq 586073737, win 65535, options [mss 1460,nop,wscale 
> 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
> 13:26:17.831929 IP (tos 0x3,CE, ttl 41, id 28924, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      somesystemin.de.55368 > www.netperf.org.www: Flags [S], cksum 
> 0x07cc (correct), seq 1513398047, win 65535, options [mss 
> 1460,nop,wscale 3,nop,nop,TS val 288117271 ecr 0,sackOK,eol], length 0
> 13:26:17.831952 IP (tos 0x3,CE, ttl 41, id 2494, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      somesystemin.de.55366 > www.netperf.org.www: Flags [S], cksum 
> 0x75f4 (correct), seq 1153058420, win 65535, options [mss 
> 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
> 13:26:17.832177 IP (tos 0x3,CE, ttl 41, id 6854, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      somesystemin.de.55365 > www.netperf.org.www: Flags [S], cksum 
> 0xfca0 (correct), seq 2332522875, win 65535, options [mss 
> 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
> 13:26:17.832239 IP (tos 0x3,CE, ttl 41, id 64733, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      somesystemin.de.55364 > www.netperf.org.www: Flags [S], cksum 
> 0x7414 (correct), seq 1544827132, win 65535, options [mss 
> 1460,nop,wscale 3,nop,nop,TS val 288117270 ecr 0,sackOK,eol], length 0
> 13:26:38.649126 IP (tos 0x3,CE, ttl 41, id 9860, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      somesystemin.de.55369 > www.netperf.org.www: Flags [S], cksum 
> 0x6270 (correct), seq 683091230, win 65535, options [mss 1460,nop,wscale 
> 3,nop,nop,TS val 288137968 ecr 0,sackOK,eol], length 0
> 13:26:39.417589 IP (tos 0x3,CE, ttl 41, id 13478, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      somesystemin.de.55370 > www.netperf.org.www: Flags [S], cksum 
> 0x2862 (correct), seq 3168323595, win 65535, options [mss 
> 1460,nop,wscale 3,nop,nop,TS val 288138734 ecr 0,sackOK,eol], length 0
> 
> rick

Interesting indeed ;)

Did you check if it was spoofed?

(Did the 3WHS really complete?)

^ permalink raw reply

* Re: [PATCH v2] RPS: Sparse connection optimizations - v2
From: Eric Dumazet @ 2012-05-04 20:53 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Deng-Cheng Zhu, davem, netdev
In-Reply-To: <1336146448.3752.349.camel@edumazet-glaptop>

On Fri, 2012-05-04 at 17:47 +0200, Eric Dumazet wrote:
> On Fri, 2012-05-04 at 08:31 -0700, Tom Herbert wrote:
> > > I think the mechanisms of rps_dev_flow_table and cpu_flow (in this
> > > patch) are different: The former works along with rps_sock_flow_table
> > > whose CPU info is based on recvmsg by the application. But for the tests
> > > like what I did, there's no application involved.
> > >
> > While rps_sock_flow_table is currently only managed by recvmsg, it
> > still is the general mechanism that maps flows to CPUs for steering.
> > There should be nothing preventing you from populating and managing
> > entries in other ways.
> 
> It might be done from a netfilter module, activated in FORWARD chain for
> example.
> 
> 

A good indicator of the network load of a cpu would be to gather
&per_cpu(softnet_data, cpu)->input_pkt_queue.qlen in an EWMA.
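
To make the EWMA idea concrete, here is a small fixed-point sketch in
userspace C. It is only an illustration: the struct, the function names and
the scaling constant are assumptions, not existing kernel code, and the
sampling of input_pkt_queue.qlen itself is left out.

```c
#include <assert.h>
#include <stdint.h>

/* The average is stored scaled by 2^EWMA_SCALE_SHIFT to keep some
 * fractional precision without floating point.
 */
#define EWMA_SCALE_SHIFT 10

struct qlen_ewma {
	uint64_t scaled_avg;
	unsigned int weight_shift;	/* 3 means each sample weighs 1/8 */
};

static void qlen_ewma_init(struct qlen_ewma *e, unsigned int weight_shift)
{
	assert(weight_shift > 0 && weight_shift < 32);
	e->scaled_avg = 0;
	e->weight_shift = weight_shift;
}

/* avg = avg - avg / 2^w + sample / 2^w */
static void qlen_ewma_add(struct qlen_ewma *e, uint32_t qlen)
{
	uint64_t sample = (uint64_t)qlen << EWMA_SCALE_SHIFT;

	/* avg >> w never exceeds avg, so the subtraction cannot wrap */
	e->scaled_avg -= e->scaled_avg >> e->weight_shift;
	e->scaled_avg += sample >> e->weight_shift;
}

static uint32_t qlen_ewma_read(const struct qlen_ewma *e)
{
	return (uint32_t)(e->scaled_avg >> EWMA_SCALE_SHIFT);
}
```

A larger weight_shift smooths more and reacts more slowly, which is the
knob one would tune against the sampling period.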


We could dynamically adjust active cpus in RPS set given the load of the
machine.

On low load, cpu handling NIC interrupt could also bypass RPS and avoid
IPI to other cpus for low overhead.

tcpu = map->cpus[((u64) skb->rxhash * map->len) >> 32];

->

	if (map->curlen) {
		tcpu = map->cpus[((u64) skb->rxhash * map->curlen) >> 32];
		if (cpu_online(tcpu)) 
			return tcpu;
	}	
	return -1;

Every second or so (to reduce Out Of Order impact), allow curlen to be
incremented/decremented in [0 .. map->len] if load is
increasing/lowering.
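
The same idea as a standalone userspace sketch, for anyone who wants to
play with it outside the kernel. Everything here is hypothetical: the
struct stands in for struct rps_map plus the proposed curlen field, the
load thresholds are invented parameters, and the cpu_online() test is
omitted.

```c
#include <assert.h>
#include <stdint.h>

#define RPS_NO_CPU (-1)

/* Hypothetical stand-in for struct rps_map with the proposed curlen. */
struct rps_map_sketch {
	unsigned int len;	/* number of configured CPUs */
	unsigned int curlen;	/* number currently in use, 0..len */
	const uint16_t *cpus;
};

/* Scale a 32-bit hash into [0, n) without a division:
 * (hash * n) >> 32 equals floor(hash * n / 2^32).
 */
static unsigned int hash_to_index(uint32_t hash, unsigned int n)
{
	assert(n > 0);
	return (unsigned int)(((uint64_t)hash * n) >> 32);
}

static int pick_cpu(const struct rps_map_sketch *map, uint32_t rxhash)
{
	assert(map->curlen <= map->len);
	if (map->curlen == 0)
		return RPS_NO_CPU;	/* low load: no steering, no IPI */
	return map->cpus[hash_to_index(rxhash, map->curlen)];
}

/* Meant to run about once per second: move curlen by at most one slot. */
static void adjust_curlen(struct rps_map_sketch *map, uint32_t load,
			  uint32_t low_mark, uint32_t high_mark)
{
	if (load > high_mark && map->curlen < map->len)
		map->curlen++;
	else if (load < low_mark && map->curlen > 0)
		map->curlen--;
}
```

Note that changing curlen remaps most hashes to a different CPU, which is
why the adjustment has to be infrequent to limit reordering.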

^ permalink raw reply

* Re: [PATCH net-next] tcp: be more strict before accepting ECN negociation
From: Rick Jones @ 2012-05-04 21:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Perry Lorier, Matt Mathis, Yuchung Cheng,
	Neal Cardwell, Tom Herbert, Wilmer van der Gaast, Dave Täht,
	Ankur Jain
In-Reply-To: <1336164546.3752.460.camel@edumazet-glaptop>

>
> Interesting indeed ;)
>
> Did you check if it was spoofed ?
>
> (did the 3WHS really completed)


Well, the tcpdump command was still:


tcpdump -i eth0 -vvv '(tcp[tcpflags] & tcp-syn != 0) && (ip[1] != 0x0)'

I didn't see any SYN|ACKs go out, but netperf.org would have had to set 
ECT for me to see a SYN|ACK going out.   FWIW, this is on a 2.6.31-15 
(Ubuntu) kernel with net.ipv4.tcp_ecn = 2 and I don't think the SYNs 
themselves were negotiating ECN:

13:26:16.866007 IP (tos 0x3,CE, ttl 41, id 28850, offset 0, flags [DF], 
proto TCP (6), length 64)
     somesystemin.de.55363 > www.netperf.org.www: Flags [S], cksum 
0x4cfc (correct), seq 304457158, win 65535, options [mss 1460,nop,wscale

rick

^ permalink raw reply

* Re: [PATCH] net: davinci_emac: Add pre_open, post_stop platform callbacks
From: Kevin Hilman @ 2012-05-04 21:02 UTC (permalink / raw)
  To: Mark A. Greer
  Cc: Bedia, Vaibhav, nsekhar, Ben Hutchings, netdev@vger.kernel.org,
	linux-omap@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <20120504182938.GB28910@animalcreek.com>

"Mark A. Greer" <mgreer@animalcreek.com> writes:

> On Fri, May 04, 2012 at 07:31:30AM -0700, Kevin Hilman wrote:

[...]

>> Come to think of it, the right solution here is probably to use runtime
>> PM.  We could then add some custom hooks for davinci_emac in the
>> device code to use enable_hlt/disable_hlt based on activity.
>
> That was my first thought, actually, but that only works if its
> okay for the driver to call enable_hlt/disable_hlt directly (i.e.,
> have runtime_suspend() call enable_hlt() and runtime_resume() call
> disable_hlt()).  However, I assumed it would _not_ be acceptable for
> the driver to issue those calls directly.  

I agree.

> Its a platform-specific issue that we shouldn't be polluting the
> driver with and there are currently no drivers that call them under
> the drivers directory.

Using runtime PM we don't have to have any platform specific calls in
the driver.  We handle it inside the platform-specific runtime PM
implementation.

IOW, we don't have to call enable_hlt/disable_hlt in the driver.  It can
be completely transparent to the driver.  We can call
enable_hlt/disable_hlt in device specific code.  That way, davinci
platforms with this same IP won't use

To demonstrate, assume the davinci_emac is runtime PM converted and does
a pm_runtime_get_sync() instead of clk_enable().  

For the first call to pm_runtime_get_sync() (when runtime PM count goes
from zero to 1), this will trigger the runtime PM core to runtime PM
enable the device.  Using the driver model's 'PM domain' layer, we've
plugged in our omap_device layer, so the omap_device layer is called for
runtime PM transitions.  (c.f. omap_device_pm_domain in plat-omap/omap_device.c)

Specifically, on a runtime PM 'get', the PM domain's
->runtime_resume() callback is called.  For an omap_device, that is
_od_runtime_resume().  After enabling the device using
omap_device_enable() the driver's ->runtime_resume callback is called.

So, to summarize, the (simplified) flow looks like this:

pm_runtime_get_sync()
    <PM domain>->runtime_resume()   /* _od_runtime_resume() */
        omap_device_enable()
        pm_generic_runtime_resume()
            <driver>->runtime_resume()

However, you're still wondering where we would sneak in the call to
disable_hlt.  Well, I'm glad you asked....

Looking closer at omap_device_enable(), you'll see that it calls
_omap_device_activate() which uses a function pointer in the
omap_device_pm_latency structure to actually enable the device.

By default, this function is omap_device_enable_hwmods() for all
omap_devices, which in turn uses the hwmod layer to enable the HW
(including clock enable, PM init, etc.)

Now, here's the magic....

On a per-device basis, that activate function can be customized.  In the
custom function, you can add custom calls (e.g. disable_hlt) and then
call the normal omap_device_* functions to continue the default
behavior.

This is getting messy, so let me give a concrete example in the form of
a patch.  Starting from the GPIO driver, which is already runtime PM
converted.  If I wanted to add disable_hlt/enable_hlt whenever the device
is runtime PM enabled/disabled, it would be as simple as the patch below.

So in summary, whenever pm_runtime_get_sync() is called (and the usecount
goes from zero to 1), the omap_device 'activate' function is called
(which can call disable_hlt()), and whenever pm_runtime_put() is called
(and the usecount reaches zero), the omap_device 'deactivate' function
is called, and enable_hlt() can be called.
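
The usecount behaviour described above can be modelled in a few lines of
plain C. This is a deliberately simplified, synchronous toy model and not
the runtime PM or omap_device API: all struct and function names are
invented for illustration, and a counter stands in for the real
disable_hlt()/enable_hlt() calls.

```c
#include <assert.h>

struct dev_sketch;

/* Hypothetical per-device hooks, loosely modelled on the activate_func /
 * deactivate_func pair in struct omap_device_pm_latency.
 */
struct dev_hooks {
	int (*activate)(struct dev_sketch *dev);
	int (*deactivate)(struct dev_sketch *dev);
};

struct dev_sketch {
	int usecount;
	const struct dev_hooks *hooks;
	int hlt_disable_depth;	/* stands in for disable_hlt()/enable_hlt() */
};

/* Models pm_runtime_get_sync(): only the 0 -> 1 transition activates. */
static int sketch_get_sync(struct dev_sketch *dev)
{
	if (dev->usecount++ == 0)
		return dev->hooks->activate(dev);
	return 0;
}

/* Models pm_runtime_put(): only the 1 -> 0 transition deactivates. */
static int sketch_put(struct dev_sketch *dev)
{
	assert(dev->usecount > 0);
	if (--dev->usecount == 0)
		return dev->hooks->deactivate(dev);
	return 0;
}

static int emac_activate(struct dev_sketch *dev)
{
	dev->hlt_disable_depth++;	/* disable_hlt() would go here */
	return 0;			/* then the default enable path */
}

static int emac_deactivate(struct dev_sketch *dev)
{
	dev->hlt_disable_depth--;	/* enable_hlt() would go here */
	return 0;			/* then the default idle path */
}

static const struct dev_hooks emac_hooks = {
	.activate = emac_activate,
	.deactivate = emac_deactivate,
};
```

Two nested get calls followed by two put calls leave the depth counter at
zero again, with the hooks having run exactly once each.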

The example I give below customizes the hooks for *all* SoCs, but in the
specific case we're trying to solve, we would only need to add custom
hooks for the devices without wakeups.

Note that all of this presumes that the driver is runtime PM converted
*and* the device itself is built using omap_device_build().  That means
that the device init code in am35xx_emac.c needs to be converted to use
omap_device_build instead of the normal platform_device_* calls.  (Note,
though, that omap_device_build() will also create/register the
platform_device.)

Kevin

diff --git a/arch/arm/mach-omap2/gpio.c b/arch/arm/mach-omap2/gpio.c
index a80e093..3acd1eb 100644
--- a/arch/arm/mach-omap2/gpio.c
+++ b/arch/arm/mach-omap2/gpio.c
@@ -26,8 +26,30 @@
 #include <plat/omap_device.h>
 #include <plat/omap-pm.h>

+#include <asm/system.h>
+
 #include "powerdomain.h"

+static int omap2_gpio_deactivate_func(struct omap_device *od)
+{
+	enable_hlt();
+	return omap_device_idle_hwmods(od);
+}
+
+static int omap2_gpio_activate_func(struct omap_device *od)
+{
+	disable_hlt();
+	return omap_device_enable_hwmods(od);
+}
+
+struct omap_device_pm_latency pm_lats[] __initdata = {
+	{
+		.activate_func = omap2_gpio_activate_func,
+		.deactivate_func = omap2_gpio_deactivate_func,
+		.flags = OMAP_DEVICE_LATENCY_AUTO_ADJUST,
+	},
+};
+
 static int __init omap2_gpio_dev_init(struct omap_hwmod *oh, void *unused)
 {
 	struct platform_device *pdev;
@@ -128,7 +150,8 @@ static int __init omap2_gpio_dev_init(struct omap_hwmod *oh, void *unused)
 	pdata->loses_context = pwrdm_can_ever_lose_context(pwrdm);

 	pdev = omap_device_build(name, id - 1, oh, pdata,
-				sizeof(*pdata),	NULL, 0, false);
+				 sizeof(*pdata),	pm_lats,
+				 ARRAY_SIZE(pm_lats), false);
 	kfree(pdata);

 	if (IS_ERR(pdev)) {

^ permalink raw reply related

* [RFC] MTU discovery not working over GRE
From: Stephen Hemminger @ 2012-05-04 21:04 UTC (permalink / raw)
  To: Herbert Xu, David Miller; +Cc: netdev

When using gretap, I am seeing that Path MTU discovery is not
working for packets going out over the tunnel interface. What happens
is that the driver correctly identifies the DF bit, and sees IPv4
but since skb_dst(skb) is NULL, the icmp_send() ends up doing nothing.

The following fixes the problem but does not seem like the correct general
solution. IPv6 almost certainly has the same problem.
Perhaps we should just set the skb_dst() earlier (before the
MTU checks).

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 1017460..da57bb0 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -838,8 +838,9 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev

 		if ((old_iph->frag_off&htons(IP_DF)) &&
 		    mtu < ntohs(old_iph->tot_len)) {
+			skb_dst_drop(skb);
+			skb_dst_set(skb, &rt->dst);
 			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
-			ip_rt_put(rt);
 			goto tx_error;
 		}
 	}

^ permalink raw reply related

* Re: [PATCH net-next] tcp: be more strict before accepting ECN negociation
From: Eric Dumazet @ 2012-05-04 21:14 UTC (permalink / raw)
  To: Rick Jones
  Cc: David Miller, netdev, Perry Lorier, Matt Mathis, Yuchung Cheng,
	Neal Cardwell, Tom Herbert, Wilmer van der Gaast, Dave Täht,
	Ankur Jain
In-Reply-To: <4FA443B6.9010106@hp.com>

On Fri, 2012-05-04 at 14:01 -0700, Rick Jones wrote:
> >
> > Interesting indeed ;)
> >
> > Did you check if it was spoofed ?
> >
> > (did the 3WHS really completed)
> 
> 
> Well, the tcpdump command was still:
> 
> 
> tcpdump -i eth0 -vvv '(tcp[tcpflags] & tcp-syn != 0) && (ip[1] != 0x0)'
> 
> I didn't see any SYN|ACKs go out, but netperf.org would have had to set 
> ECT for me to see a SYN|ACK going out.   FWIW, this is on a 2.6.31-15 
> (Ubuntu) kernel with net.ipv4.tcp_ecn = 2 and I don't think the SYNs 
> themselves were negotiating ECN:
> 
> 13:26:16.866007 IP (tos 0x3,CE, ttl 41, id 28850, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      somesystemin.de.55363 > www.netperf.org.www: Flags [S], cksum 
> 0x4cfc (correct), seq 304457158, win 65535, options [mss 1460,nop,wscale

Probably not, or else you would see :

13:26:16.866007 IP (tos 0x3,CE, ttl 41, id 28850, offset 0, flags
[DF],proto TCP (6), length 64)
    somesystemin.de.55363 > www.netperf.org.www: Flags [SEW], cksum ...
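
(tcpdump prints E for ECE and W for CWR. Per RFC 3168 an ECN-setup SYN has
SYN, ECE and CWR set and ACK clear.) A sketch of that test on a raw TCP
header, in userspace C; the function name and the flag macros are made up
for this example, and only the offset of the flags byte (13, the same byte
tcp[tcpflags] refers to) comes from the TCP header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_FLAGS_OFFSET 13	/* same byte as tcp[tcpflags] in pcap */

#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_ACK 0x10
#define TCP_FLAG_ECE 0x40
#define TCP_FLAG_CWR 0x80

/* Return 1 if the TCP header is an ECN-setup SYN (RFC 3168):
 * SYN, ECE and CWR set, ACK clear. Return 0 otherwise, including
 * when the buffer is too short to contain the flags byte.
 */
static int is_ecn_setup_syn(const uint8_t *tcp, size_t len)
{
	const uint8_t want = TCP_FLAG_SYN | TCP_FLAG_ECE | TCP_FLAG_CWR;

	assert(tcp != NULL);
	if (len <= TCP_FLAGS_OFFSET)
		return 0;
	return (tcp[TCP_FLAGS_OFFSET] & (want | TCP_FLAG_ACK)) == want;
}
```

A plain SYN (flags byte 0x02) does not match, while 0xc2 does.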

^ permalink raw reply

* Re: [PATCH 04/15] drivers/net: Do not free an IRQ if its request failed
From: Lee Jones @ 2012-05-04 21:18 UTC (permalink / raw)
  To: Linus Walleij
  Cc: linux, linus.walleij, arnd, netdev, grant.likely, cjb,
	linux-arm-kernel
In-Reply-To: <CACRpkdaB+mAmo50uSDvVcq=V9_BF01-zu2WHUU+cc2WPU1NEXA@mail.gmail.com>

On 04/05/12 19:22, Linus Walleij wrote:
> On Fri, May 4, 2012 at 4:10 PM, Lee Jones<lee.jones@linaro.org>  wrote:
>> On 19/04/12 21:36, Lee Jones wrote:
>
>>> -               goto out_free_irq;
>>> +               goto out_disable_resources;
>>>         }
>>>
>>>         retval = register_netdev(dev);
>>
>>
>> Anything on this from the Net guys?
>
> This was merged for 3.4-rc5, check:
> https://lkml.org/lkml/2012/4/29/2

I actually found that out a few hours ago by accident.

Thanks for letting me know though.

Kind regards,
Lee

-- 
Lee Jones
Linaro ST-Ericsson Landing Team Lead
M: +44 77 88 633 515
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* [patch 1/1] connector/userns: replace netlink uses of cap_raised() with capable()
From: akpm @ 2012-05-04 21:34 UTC (permalink / raw)
  To: davem
  Cc: netdev, akpm, ebiederm, dhowells, james.l.morris, kaber, morgan,
	philipp.reisner, segoon, serge.hallyn

From: Eric W. Biederman <ebiederm@xmission.com>
Subject: connector/userns: replace netlink uses of cap_raised() with capable()

In 2009 Philipp Reisner noticed that a few users of netlink connector
interface needed a capability check and added the idiom
cap_raised(nsp->eff_cap, CAP_SYS_ADMIN) to a few of them, on the premise
that netlink was asynchronous.

In 2011 Patrick McHardy noticed we were being silly because netlink is
synchronous and removed eff_cap from the netlink_skb_params and changed
the idiom to cap_raised(current_cap(), CAP_SYS_ADMIN).

Looking at those spots with a fresh eye we should be calling
capable(CAP_SYS_ADMIN).  The only reason I can see for not calling capable
is that it once appeared we were not in the same task as the caller which
would have made calling capable() impossible.

In the initial user_namespace the only differences between
cap_raised(current_cap(), CAP_SYS_ADMIN) and capable(CAP_SYS_ADMIN) are a
few sanity checks and the fact that capable(CAP_SYS_ADMIN) sets
PF_SUPERPRIV if we use the capability.

Since we are going to be using root privilege setting PF_SUPERPRIV seems
the right thing to do.

The motivation for this patch is that in a child user namespace
cap_raised(current_cap(),...) tests your capabilities with respect to that
child user namespace not capabilities in the initial user namespace and
thus will allow processes that should be unprivileged to use the kernel
services that are only protected with cap_raised(current_cap(),..).

To fix possible user_namespace issues and to just clean up the code
replace cap_raised(current_cap(), CAP_SYS_ADMIN) with
capable(CAP_SYS_ADMIN).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/block/drbd/drbd_nl.c           |    2 +-
 drivers/md/dm-log-userspace-transfer.c |    2 +-
 drivers/video/uvesafb.c                |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff -puN drivers/block/drbd/drbd_nl.c~connector-userns-replace-netlink-uses-of-cap_raised-with-capable drivers/block/drbd/drbd_nl.c
--- a/drivers/block/drbd/drbd_nl.c~connector-userns-replace-netlink-uses-of-cap_raised-with-capable
+++ a/drivers/block/drbd/drbd_nl.c
@@ -2297,7 +2297,7 @@ static void drbd_connector_callback(stru
 		return;
 	}

-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN)) {
+	if (!capable(CAP_SYS_ADMIN)) {
 		retcode = ERR_PERM;
 		goto fail;
 	}
diff -puN drivers/md/dm-log-userspace-transfer.c~connector-userns-replace-netlink-uses-of-cap_raised-with-capable drivers/md/dm-log-userspace-transfer.c
--- a/drivers/md/dm-log-userspace-transfer.c~connector-userns-replace-netlink-uses-of-cap_raised-with-capable
+++ a/drivers/md/dm-log-userspace-transfer.c
@@ -134,7 +134,7 @@ static void cn_ulog_callback(struct cn_m
 {
 	struct dm_ulog_request *tfr = (struct dm_ulog_request *)(msg + 1);

-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
+	if (!capable(CAP_SYS_ADMIN))
 		return;

 	spin_lock(&receiving_list_lock);
diff -puN drivers/video/uvesafb.c~connector-userns-replace-netlink-uses-of-cap_raised-with-capable drivers/video/uvesafb.c
--- a/drivers/video/uvesafb.c~connector-userns-replace-netlink-uses-of-cap_raised-with-capable
+++ a/drivers/video/uvesafb.c
@@ -73,7 +73,7 @@ static void uvesafb_cn_callback(struct c
 	struct uvesafb_task *utask;
 	struct uvesafb_ktask *task;

-	if (!cap_raised(current_cap(), CAP_SYS_ADMIN))
+	if (!capable(CAP_SYS_ADMIN))
 		return;

 	if (msg->seq >= UVESAFB_TASKS_MAX)
_

^ permalink raw reply

* [PATCH] irda: irda-usb.c URL to sigmatel stir 4210/4220/4116 fw from archive.org
From: Xose Vazquez Perez @ 2012-05-04 21:37 UTC (permalink / raw)
  To: xose.vazquez, samuel, netdev, linux-kernel

Sigmatel was bought by Freescale, the web is dead and the firmware was lost.

Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
---
 drivers/net/irda/irda-usb.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/irda/irda-usb.c b/drivers/net/irda/irda-usb.c
index 72f687b..0c929f2 100644
--- a/drivers/net/irda/irda-usb.c
+++ b/drivers/net/irda/irda-usb.c
@@ -1081,6 +1081,7 @@ static int stir421x_patch_device(struct irda_usb_cb *self)
         /*
          * Known firmware patch file names for STIR421x dongles
          * are "42101001.sb" or "42101002.sb"
+         * http://web.archive.org/web/http://www.sigmatel.com/documents/stir4210_4220_4116_patch_files.tar.gz
          */
         sprintf(stir421x_fw_name, "4210%4X.sb",
                 self->usbdev->descriptor.bcdDevice);
-- 
1.7.10.1

^ permalink raw reply related

* Re: [PATCH] net: davinci_emac: Add pre_open, post_stop platform callbacks
From: Mark A. Greer @ 2012-05-04 21:47 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Bedia, Vaibhav, nsekhar, Ben Hutchings, netdev@vger.kernel.org,
	linux-omap@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <87397faccs.fsf@ti.com>

On Fri, May 04, 2012 at 02:02:43PM -0700, Kevin Hilman wrote:
> "Mark A. Greer" <mgreer@animalcreek.com> writes:
> 
> > On Fri, May 04, 2012 at 07:31:30AM -0700, Kevin Hilman wrote:
> 
> [...]
> 
> >> Come to think of it, the right solution here is probably to use runtime
> >> PM.  We could then add some custom hooks for davinci_emac in the
> >> device code to use enable_hlt/disable_hlt based on activity.
> >
> > That was my first thought, actually, but that only works if it's
> > okay for the driver to call enable_hlt/disable_hlt directly (i.e.,
> > have runtime_suspend() call enable_hlt() and runtime_resume() call
> > disable_hlt()).  However, I assumed it would _not_ be acceptable for
> > the driver to issue those calls directly.  
> 
> I agree.
> 
> > It's a platform-specific issue that we shouldn't be polluting the
> > driver with and there are currently no drivers that call them under
> > the drivers directory.
> 
> Using runtime PM we don't have to have any platform specific calls in
> the driver.  We handle it inside the platform-specific runtime PM
> implementation.

FYI, with some further discussion via IRC, I'm going to implement what
Kevin has laid out here.  There is a dependency on davinci adding support
too but I'll coordinate with the people/person doing that.
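
To illustrate the layering Kevin describes, here is a toy userspace model,
not kernel code.  Every toy_* name is hypothetical; the hlt counter only
mimics the idea that idle (hlt) is permitted while the counter is zero.
The point of the sketch is that the driver's open/stop paths only call the
generic get/put helpers, and the platform hooks are the only place that
touch the hlt counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the platform's hlt counter: idle is allowed only at 0. */
static int toy_hlt_counter;

static inline void toy_disable_hlt(void) { toy_hlt_counter++; }
static inline void toy_enable_hlt(void)  { toy_hlt_counter--; }
static inline bool toy_idle_allowed(void) { return toy_hlt_counter == 0; }

/* Hypothetical device with a runtime PM style usage count. */
struct toy_dev {
	int usage_count;
};

/* Platform-specific hooks: the only code that knows about hlt. */
static inline void toy_plat_runtime_resume(struct toy_dev *dev)
{
	(void)dev;
	toy_disable_hlt();	/* device active: keep the CPU out of hlt */
}

static inline void toy_plat_runtime_suspend(struct toy_dev *dev)
{
	(void)dev;
	toy_enable_hlt();	/* device idle: hlt is fine again */
}

/* Generic helpers: call the platform hooks on 0 <-> 1 transitions only. */
static inline void toy_pm_get(struct toy_dev *dev)
{
	if (dev->usage_count++ == 0)
		toy_plat_runtime_resume(dev);
}

static inline void toy_pm_put(struct toy_dev *dev)
{
	if (--dev->usage_count == 0)
		toy_plat_runtime_suspend(dev);
}

/* Driver code: no platform-specific calls at all. */
static inline void toy_emac_open(struct toy_dev *dev) { toy_pm_get(dev); }
static inline void toy_emac_stop(struct toy_dev *dev) { toy_pm_put(dev); }

/* Usage: idle is blocked exactly while the interface is open. */
static inline void toy_example(void)
{
	struct toy_dev dev = { 0 };

	toy_emac_open(&dev);
	assert(!toy_idle_allowed());
	toy_emac_stop(&dev);
	assert(toy_idle_allowed());
}
```

In this shape the driver stays portable, and a platform that has no such
wakeup limitation would simply provide empty hooks.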

Please disregard this patch.

Thanks for the help everyone.

Mark

^ permalink raw reply

* [BUG/PATCH] ppp_mppe discards 50% of packets from some servers
From: Phil Hord @ 2012-05-04 22:28 UTC (permalink / raw)
  To: netdev

[BUG] ppp_mppe discards 50% of packets from some servers

[2.] Full description of the problem/report:

When I connect to a server using MPPE, I receive about every other
packet with the FLUSHED bit turned off. These packets are dropped by
ppp_mppe.c just like the SPEC says they should be. I am not able to
maintain tcp connections through the VPN with these packets being discarded.

I found a patch discussed here:
http://ubuntuforums.org/showthread.php?t=1095189

When I stopped dropping those packets (by making this change to the driver;
see patch at end), the problem went away.

When I connect to a different server, I do not have this problem.

When I connect using Windows XP or Windows 7 clients, I do not have any
packet loss on either server, though I have not confirmed whether every
other packet still has the FLUSHED bit turned off.

I could see the symptom clearly by pinging a machine on the VPN. Every
other ping packet is lost with the old driver. Once I installed the new
driver, every ping was answered correctly.

I do not know any details about the two servers (one that works and one
that doesn't) except both are managed by the same mega-company but in
different countries (China and Czech).
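
For readers unfamiliar with the header in question, here is a small
userspace sketch of the check being discussed.  It assumes the header
layout from RFC 3078 as I understand it (four flag bits A/B/C/D followed by
a 12-bit coherency count, with A being the FLUSHED bit).  The helper names
are made up for illustration and are not the functions in ppp_mppe.c, and
the two header bytes are passed in directly rather than at their real
offset inside a PPP frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits in the first MPPE header byte (layout assumed from RFC 3078). */
#define MPPE_BIT_FLUSHED   0x80	/* "A" bit: tables were (re)initialized */
#define MPPE_BIT_ENCRYPTED 0x10	/* "D" bit: payload is encrypted */

struct mppe_hdr {
	bool flushed;
	bool encrypted;
	uint16_t ccount;	/* 12-bit coherency count */
};

/* Decode the two header bytes; hypothetical helper for illustration. */
static inline struct mppe_hdr mppe_parse(const uint8_t hdr[2])
{
	struct mppe_hdr h;

	h.flushed = (hdr[0] & MPPE_BIT_FLUSHED) != 0;
	h.encrypted = (hdr[0] & MPPE_BIT_ENCRYPTED) != 0;
	h.ccount = (uint16_t)(((hdr[0] & 0x0f) << 8) | hdr[1]);
	return h;
}

/*
 * The sanity rule at issue: in stateless mode every packet is expected to
 * carry the FLUSHED bit, so a packet without it is treated as an error.
 */
static inline bool mppe_stateless_would_drop(const uint8_t hdr[2])
{
	return !mppe_parse(hdr).flushed;
}

/* Usage: two consecutive packets as seen from the failing server. */
static inline void mppe_example(void)
{
	const uint8_t good[2] = { 0x90, 0x01 };	/* FLUSHED + ENCRYPTED */
	const uint8_t bad[2]  = { 0x10, 0x02 };	/* ENCRYPTED only */

	assert(!mppe_stateless_would_drop(good));
	assert(mppe_stateless_would_drop(bad));
}
```

With a server that clears the bit on alternate packets, this rule rejects
exactly half of them, which matches the 50% loss seen with ping.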

--- /etc/ppp/peers/server1
pty "pptp remote-server1 --nolaunchpppd"
name "username"
remotename PPTP
require-mppe-128
file /etc/ppp/options.pptp

--- /etc/ppp/options.pptp:
lock
noauth
refuse-pap
refuse-eap
refuse-chap
refuse-mschap
nobsdcomp
nodeflate



[3.] Keywords (i.e., modules, networking, kernel):
ppp_mppe.ko, networking, kernel, mppe

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
Linux version 3.0.0-14-generic (buildd@allspice) (gcc version 4.6.1
(Ubuntu/Linaro 4.6.1-9ubuntu3) ) #23-Ubuntu SMP Mon Nov 21 20:28:43 UTC 2011


[4.2.] Kernel .config file:
*shrug*

[5.] Most recent kernel version which did not have the bug:

I have never seen a version able to connect to the failing server reliably.

[6.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/oops-tracing.txt)

N/A


[7.] A small shell script or example program which triggers the
     problem (if possible)

$ sudo pon server2
$ ping some-machine.server2
...
^C
20 packets transmitted, 10 received, 50% packet loss



[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux iptv-lnx-hordp 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 20:28:43
UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
 
Gnu C                  4.6.1
Gnu make               3.81
binutils               2.21.53.20110810
util-linux             2.19.1
mount                  support
module-init-tools      3.16
e2fsprogs              1.41.14
pcmciautils            015
PPP                    2.4.5
Linux C Library        2.13
Dynamic linker (ldd)   2.13
Procps                 3.2.8
Net-tools              1.60
Kbd                    1.15.2
Sh-utils               8.5
wireless-tools         30
Modules Loaded         ppp_mppe des_generic md4 nls_utf8 cifs ppp_async
crc_ccitt usb_storage uas mos7840 asix usbserial usbnet hid_logitech
ff_memless bnep rfcomm pci_stub vboxpci vboxnetadp vboxnetflt vboxdrv
kvm_intel kvm parport_pc ppdev autofs4 binfmt_misc dm_crypt
snd_hda_codec_hdmi nvidia uvcvideo videodev arc4 v4l2_compat_ioctl32
joydev btusb bluetooth snd_hda_codec_conexant iwlagn thinkpad_acpi
snd_seq_midi snd_hda_intel snd_rawmidi mac80211 snd_hda_codec psmouse
snd_hwdep snd_seq_midi_event serio_raw cfg80211 snd_seq snd_pcm
snd_seq_device snd_timer snd nvram tpm_tis video soundcore
snd_page_alloc wmi mei lp parport usbhid hid mmc_block ahci sdhci_pci
sdhci libahci xhci_hcd e1000e

[8.2.] Processor information (from /proc/cpuinfo):
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping        : 7
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx
lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority
ept vpid
bogomips        : 4784.69
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping        : 7
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx
lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority
ept vpid
bogomips        : 4784.17
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping        : 7
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 1
cpu cores       : 4
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx
lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority
ept vpid
bogomips        : 4785.08
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping        : 7
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 1
cpu cores       : 4
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx
lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority
ept vpid
bogomips        : 4784.56
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping        : 7
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 2
cpu cores       : 4
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx
lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority
ept vpid
bogomips        : 4784.34
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping        : 7
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 2
cpu cores       : 4
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx
lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority
ept vpid
bogomips        : 4784.48
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 6
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping        : 7
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx
lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority
ept vpid
bogomips        : 4784.48
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
stepping        : 7
cpu MHz         : 800.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx
lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority
ept vpid
bogomips        : 4784.52
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:


[8.3.] Module information (from /proc/modules):
ppp_mppe 13035 4 - Live 0xffffffffa0fbe000
des_generic 21415 0 - Live 0xffffffffa0fc9000
md4 12595 0 - Live 0xffffffffa0fb9000
nls_utf8 12557 1 - Live 0xffffffffa0fc4000
cifs 273872 2 - Live 0xffffffffa0f75000
ppp_async 17539 2 - Live 0xffffffffa0f6f000
crc_ccitt 12667 1 ppp_async, Live 0xffffffffa0ef5000
usb_storage 57901 0 - Live 0xffffffffa0f5f000
uas 18027 0 - Live 0xffffffffa0f47000
mos7840 36023 0 - Live 0xffffffffa0f55000
asix 22704 0 - Live 0xffffffffa0f4e000
usbserial 47107 1 mos7840, Live 0xffffffffa0f3a000
usbnet 26212 1 asix, Live 0xffffffffa0f00000
hid_logitech 17677 0 - Live 0xffffffffa0efa000
ff_memless 13097 1 hid_logitech, Live 0xffffffffa0ee9000
bnep 18436 2 - Live 0xffffffffa0eef000
rfcomm 47946 12 - Live 0xffffffffa0e85000
pci_stub 12622 1 - Live 0xffffffffa0e02000
vboxpci 23200 0 - Live 0xffffffffa0e7e000
vboxnetadp 13382 0 - Live 0xffffffffa0251000
vboxnetflt 23441 0 - Live 0xffffffffa0e77000
vboxdrv 282548 3 vboxpci,vboxnetadp,vboxnetflt, Live 0xffffffffa0ea3000
kvm_intel 61643 0 - Live 0xffffffffa0e92000
kvm 383781 1 kvm_intel, Live 0xffffffffa0e18000
parport_pc 36962 0 - Live 0xffffffffa0e0d000
ppdev 17113 0 - Live 0xffffffffa0e07000
autofs4 37024 3 - Live 0xffffffffa0df7000
binfmt_misc 17540 1 - Live 0xffffffffa0df1000
dm_crypt 23199 0 - Live 0xffffffffa0de6000
snd_hda_codec_hdmi 32040 4 - Live 0xffffffffa0fec000
nvidia 11713772 45 - Live 0xffffffffa02b9000 (P)
uvcvideo 72711 0 - Live 0xffffffffa0ff5000
videodev 92992 1 uvcvideo, Live 0xffffffffa0fd4000
arc4 12529 6 - Live 0xffffffffa0242000
v4l2_compat_ioctl32 17083 1 videodev, Live 0xffffffffa021a000
joydev 17693 0 - Live 0xffffffffa0214000
btusb 18600 2 - Live 0xffffffffa023c000
bluetooth 166112 23 bnep,rfcomm,btusb, Live 0xffffffffa0f0b000
snd_hda_codec_conexant 62197 1 - Live 0xffffffffa0203000
iwlagn 314213 0 - Live 0xffffffffa0256000
thinkpad_acpi 81819 0 - Live 0xffffffffa02a4000
snd_seq_midi 13324 0 - Live 0xffffffffa0166000
snd_hda_intel 33390 5 - Live 0xffffffffa0247000
snd_rawmidi 30547 1 snd_seq_midi, Live 0xffffffffa015d000
mac80211 462092 1 iwlagn, Live 0xffffffffa0191000
snd_hda_codec 104802 3
snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel, Live
0xffffffffa0221000
psmouse 73882 0 - Live 0xffffffffa017d000
snd_hwdep 13668 1 snd_hda_codec, Live 0xffffffffa00fb000
snd_seq_midi_event 14899 1 snd_seq_midi, Live 0xffffffffa010d000
serio_raw 13166 0 - Live 0xffffffffa00e1000
cfg80211 199587 2 iwlagn,mac80211, Live 0xffffffffa012b000
snd_seq 61896 2 snd_seq_midi,snd_seq_midi_event, Live 0xffffffffa016c000
snd_pcm 96714 4 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec, Live
0xffffffffa0112000
snd_seq_device 14540 3 snd_seq_midi,snd_rawmidi,snd_seq, Live
0xffffffffa00d6000
snd_timer 29991 3 snd_seq,snd_pcm, Live 0xffffffffa0104000
snd 68266 19
snd_hda_codec_hdmi,snd_hda_codec_conexant,thinkpad_acpi,snd_hda_intel,snd_rawmidi,snd_hda_codec,snd_hwdep,snd_seq,snd_pcm,snd_seq_device,snd_timer,
Live 0xffffffffa00e9000
nvram 14413 1 thinkpad_acpi, Live 0xffffffffa00b1000
tpm_tis 18546 0 - Live 0xffffffffa00db000
video 19412 0 - Live 0xffffffffa00d0000
soundcore 12680 1 snd, Live 0xffffffffa007a000
snd_page_alloc 18529 2 snd_hda_intel,snd_pcm, Live 0xffffffffa00ca000
wmi 19256 0 - Live 0xffffffffa00b6000
mei 41480 0 - Live 0xffffffffa00a5000 (C)
lp 17799 0 - Live 0xffffffffa004a000
parport 46562 3 parport_pc,ppdev,lp, Live 0xffffffffa0098000
usbhid 47198 1 hid_logitech, Live 0xffffffffa00bd000
hid 95463 2 hid_logitech,usbhid, Live 0xffffffffa007f000
mmc_block 22923 0 - Live 0xffffffffa0029000
ahci 26002 4 - Live 0xffffffffa0072000
sdhci_pci 14032 0 - Live 0xffffffffa0069000
sdhci 32166 1 sdhci_pci, Live 0xffffffffa005c000
libahci 26861 1 ahci, Live 0xffffffffa0050000
xhci_hcd 82820 0 - Live 0xffffffffa0034000
e1000e 160582 0 - Live 0xffffffffa0000000

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
 cat /proc/ioports /proc/iomem
0000-0cf7 : PCI Bus 0000:00
  0000-001f : dma1
  0020-0021 : pic1
  0040-0043 : timer0
  0050-0053 : timer1
  0060-0060 : keyboard
  0062-0062 : EC data
  0064-0064 : keyboard
  0066-0066 : EC cmd
  0070-0071 : rtc0
  0080-008f : dma page reg
  00a0-00a1 : pic2
  00c0-00df : dma2
  00f0-00ff : fpu
  03c0-03df : vga+
  0400-047f : pnp 00:02
    0400-0403 : ACPI PM1a_EVT_BLK
    0404-0405 : ACPI PM1a_CNT_BLK
    0408-040b : ACPI PM_TMR
    0410-0415 : ACPI CPU throttle
    0420-042f : ACPI GPE0_BLK
    0450-0450 : ACPI PM2_CNT_BLK
  0500-057f : pnp 00:02
  0800-080f : pnp 00:02
0cf8-0cff : PCI conf1
0d00-ffff : PCI Bus 0000:00
  15e0-15ef : pnp 00:02
  1600-167f : pnp 00:02
  3000-3fff : PCI Bus 0000:0d
  4000-4fff : PCI Bus 0000:05
  5000-5fff : PCI Bus 0000:01
    5000-507f : 0000:01:00.0
  6020-603f : 0000:00:1f.2
    6020-603f : ahci
  6040-605f : 0000:00:19.0
  6060-6067 : 0000:00:1f.2
    6060-6067 : ahci
  6068-606f : 0000:00:1f.2
    6068-606f : ahci
  6070-6077 : 0000:00:16.3
    6070-6077 : serial
  6078-607b : 0000:00:1f.2
    6078-607b : ahci
  607c-607f : 0000:00:1f.2
    607c-607f : ahci
  efa0-efbf : 0000:00:1f.3
00000000-0000ffff : reserved
00010000-0009d7ff : System RAM
0009d800-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000c8000-000cbfff : pnp 00:00
000cc000-000cffff : pnp 00:00
000d0000-000d3fff : pnp 00:00
000d4000-000d7fff : pnp 00:00
000d8000-000dbfff : pnp 00:00
000dc000-000dffff : pnp 00:00
000e0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-bf19efff : System RAM
  01000000-015f753e : Kernel code
  015f753f-01aba6ff : Kernel data
  01bb9000-01d0bfff : Kernel bss
bf19f000-bf69efff : reserved
bf69f000-bf79efff : ACPI Non-volatile Storage
bf79f000-bf7fefff : ACPI Tables
bf7ff000-bf7fffff : System RAM
bf800000-bfffffff : reserved
c0000000-febfffff : PCI Bus 0000:00
  c0000000-d1ffffff : PCI Bus 0000:01
    c0000000-cfffffff : 0000:01:00.0
    d0000000-d1ffffff : 0000:01:00.0
  d2000000-d30fffff : PCI Bus 0000:01
    d2000000-d2ffffff : 0000:01:00.0
      d2000000-d2ffffff : nvidia
    d3000000-d3003fff : 0000:01:00.1
      d3000000-d3003fff : ICH HD audio
    d3080000-d30fffff : 0000:01:00.0
  d3100000-d38fffff : PCI Bus 0000:05
  d3900000-d40fffff : PCI Bus 0000:0d
  d4100000-d41fffff : PCI Bus 0000:0e
    d4100000-d4101fff : 0000:0e:00.0
      d4100000-d4101fff : xhci_hcd
  d4200000-d49fffff : PCI Bus 0000:0d
    d4200000-d42000ff : 0000:0d:00.0
      d4200000-d42000ff : mmc0
  d4a00000-d51fffff : PCI Bus 0000:05
  d5200000-d52fffff : PCI Bus 0000:03
    d5200000-d5201fff : 0000:03:00.0
      d5200000-d5201fff : iwlagn
  d5300000-d531ffff : 0000:00:19.0
    d5300000-d531ffff : e1000e
  d5320000-d5323fff : 0000:00:1b.0
    d5320000-d5323fff : ICH HD audio
  d5324000-d53240ff : 0000:00:1f.3
  d5325000-d532500f : 0000:00:16.0
    d5325000-d532500f : mei
  d5328000-d53287ff : 0000:00:1f.2
    d5328000-d53287ff : ahci
  d5329000-d53293ff : 0000:00:1d.0
    d5329000-d53293ff : ehci_hcd
  d532a000-d532a3ff : 0000:00:1a.0
    d532a000-d532a3ff : ehci_hcd
  d532b000-d532bfff : 0000:00:19.0
    d532b000-d532bfff : e1000e
  d532c000-d532cfff : 0000:00:16.3
  f8000000-fbffffff : PCI MMCONFIG 0000 [bus 00-3f]
    f8000000-fbffffff : reserved
      f8000000-fbffffff : pnp 00:02
fec00000-fec00fff : reserved
fed00000-fed003ff : HPET 0
fed08000-fed08fff : reserved
fed10000-fed19fff : reserved
  fed10000-fed13fff : pnp 00:02
  fed18000-fed18fff : pnp 00:02
  fed19000-fed19fff : pnp 00:02
fed1c000-fed1ffff : reserved
  fed1c000-fed1ffff : pnp 00:02
fed40000-fed4bfff : PCI Bus 0000:00
  fed45000-fed4bfff : pnp 00:02
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
ffd20000-ffffffff : reserved
100000000-43dffffff : System RAM
43e000000-43fffffff : RAM buffer

[X.] Other notes, patches, fixes, workarounds:


Patch follows:

-->8--

Subject: [PATCH] ppp_mppe: Don't discard non-FLUSHED-bit packets

In mppe_decompress, packets without the FLUSHED bit set
are discarded.  Some servers clear this bit on every other
packet and so streams are impossible to maintain.

Fix this by no longer discarding packets that arrive without the
FLUSHED bit set in stateless mode.

This may turn out to be the wrong way to fix this, but it
has worked for me so far.  I only connect to three servers,
though.

Signed-off-by: Phil Hord <hordp@cisco.com>
---
 drivers/net/ppp/ppp_mppe.c |    6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ppp/ppp_mppe.c b/drivers/net/ppp/ppp_mppe.c
index 9a1849a..765c28a 100644
--- a/drivers/net/ppp/ppp_mppe.c
+++ b/drivers/net/ppp/ppp_mppe.c
@@ -517,12 +517,6 @@ mppe_decompress(void *arg, unsigned char *ibuf, int isize, unsigned char *obuf,
                state->sanity_errors += 100;
                sanity = 1;
        }
-       if (!state->stateful && !flushed) {
-               printk(KERN_DEBUG "mppe_decompress[%d]: FLUSHED bit not set in "
-                      "stateless mode!\n", state->unit);
-               state->sanity_errors += 100;
-               sanity = 1;
-       }
        if (state->stateful && ((ccount & 0xff) == 0xff) && !flushed) {
                printk(KERN_DEBUG "mppe_decompress[%d]: FLUSHED bit not set on "
                       "flag packet!\n", state->unit);
-- 
1.7.10.509.g991919

^ permalink raw reply related

* RE: [PATCH 2/4] ipgre: follow state of lower device
From: Christian Benvenuti (benve) @ 2012-05-04 23:34 UTC (permalink / raw)
  To: Stephen Hemminger, David Miller; +Cc: netdev, kaber
In-Reply-To: <20120503154025.0845359e@nehalam.linuxnetplumber.net>

Is this the same issue I described in the email below?

  Subject: Route flush on linkdown: physical vs virtual/stacked interfaces
  http://marc.info/?l=linux-netdev&m=133468470719285&w=2

(ie, need to propagate carrier changes to upper layer device/s)

Thanks
/Chris

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Stephen Hemminger
> Sent: Thursday, May 03, 2012 3:40 PM
> To: David Miller
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH 2/4] ipgre: follow state of lower device
> 
> On Sat, 14 Apr 2012 14:53:02 -0400 (EDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Stephen Hemminger <shemminger@vyatta.com>
> > Date: Thu, 12 Apr 2012 09:31:17 -0700
> >
> > > GRE tunnels like other layered devices should propagate
> > > carrier and RFC2863 state from lower device to tunnel.
> > >
> > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> >
> > Like others I don't like the ugly hash traversal.
> >
> > A small hash on ifindex, iflink, or whatever ought to be easy and
> > make the code look much nicer.
> >
> > Longer term project is that a lot of this tunneling code can be
> > commonized at some point.
> 
> The whole set of tunnels needs to be cleaned up to be something
> modular, clean and cached like the code in OpenVswitch.

^ permalink raw reply

* [PATCH 0/3] First pass of cleanups for pskb_expand_head
From: Alexander Duyck @ 2012-05-05  0:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, jeffrey.t.kirsher

After looking over the TCP coalescing and GRO code a couple of days ago it
occurred to me that pskb_expand_head has a few flaws, some of which are
addressed in this patch series.

This change set takes care of some of the minor cleanup items.  One thing
that caught my eye is the fact the memmove code in the fast-path is likely
no longer doing anything but burning cycles on a call that doesn't
actually move any memory.
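
As a rough illustration of why that fast-path ends up moving nothing, here
is a userspace sketch of the arithmetic.  The numbers are assumptions
picked for the example: a 64-byte cache line, a made-up 328-byte shared
info, and a head whose ksize is a multiple of the cache line with shared
info placed at ksize minus its aligned size.  None of the helper names
exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_BYTES 64	/* assumed cache line size */
#define DATA_ALIGN(x) \
	(((size_t)(x) + (CACHE_BYTES - 1)) & ~(size_t)(CACHE_BYTES - 1))
#define SHINFO_SIZE 328	/* hypothetical unaligned sizeof(shared info) */

/* Where shared info sits when it is placed at the optimal spot. */
static inline size_t end_offset(size_t ksize)
{
	return ksize - DATA_ALIGN(SHINFO_SIZE);
}

/* The "size" computed at the top of the expand routine. */
static inline size_t new_size(size_t ksize, size_t nhead, size_t ntail)
{
	return DATA_ALIGN(nhead + end_offset(ksize) + ntail);
}

/* The removed check: it uses the unaligned shared info size. */
static inline bool old_fastpath_taken(size_t ksize, size_t nhead, size_t ntail)
{
	return new_size(ksize, nhead, ntail) + SHINFO_SIZE <= ksize;
}

/* How far shared info would be shifted if the fast-path ran. */
static inline size_t shinfo_shift(size_t ksize, size_t nhead, size_t ntail)
{
	return new_size(ksize, nhead, ntail) - end_offset(ksize);
}

/* Usage: with a 1024-byte head, only a zero-growth request passes. */
static inline void fastpath_example(void)
{
	assert(old_fastpath_taken(1024, 0, 0));
	assert(shinfo_shift(1024, 0, 0) == 0);	/* nothing actually moves */
	assert(!old_fastpath_taken(1024, 16, 0));	/* real growth never fits */
}
```

Under these assumptions the slack between the unaligned and aligned shared
info size is always smaller than one cache line, so any request that really
grows the buffer fails the check and falls through to the allocation path.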

The other change is a follow-on that drops the fastpath variable, which
really just indicates whether the skb is cloned or not.

The final change in this set just adds an inline for getting the end offset
since there were multiple places where we were computing end - head to get
the offset and if we are storing it as an offset it makes more sense to
just pull the actual value.
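
A minimal sketch of what such a helper looks like, using a toy struct
rather than the real sk_buff.  The toy_* names and the TOY_DATA_USES_OFFSET
switch are stand-ins I made up to mirror the two ways the end can be
stored; the actual helper is in the skbuff.h hunk of patch 3.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the buffer descriptor; only the fields needed here. */
struct toy_skb {
	unsigned char *head;
#ifdef TOY_DATA_USES_OFFSET
	unsigned int end;	/* end stored as an offset from head */
#else
	unsigned char *end;	/* end stored as a pointer */
#endif
};

static inline void toy_set_end(struct toy_skb *skb, unsigned int off)
{
#ifdef TOY_DATA_USES_OFFSET
	skb->end = off;
#else
	skb->end = skb->head + off;
#endif
}

static inline unsigned char *toy_end_pointer(const struct toy_skb *skb)
{
#ifdef TOY_DATA_USES_OFFSET
	return skb->head + skb->end;
#else
	return skb->end;
#endif
}

/* The new helper: hand back the offset directly when it is stored as one. */
static inline unsigned int toy_end_offset(const struct toy_skb *skb)
{
#ifdef TOY_DATA_USES_OFFSET
	return skb->end;	/* no head + end - head round trip */
#else
	return (unsigned int)(skb->end - skb->head);
#endif
}

/* Usage: both spellings agree, callers just get the shorter one. */
static inline void toy_offset_example(void)
{
	static unsigned char buf[1024];
	struct toy_skb skb = { .head = buf };

	toy_set_end(&skb, 640);
	assert(toy_end_offset(&skb) == 640);
	assert(toy_end_pointer(&skb) - skb.head == 640);
}
```

Callers that used to write the end pointer minus head can then use the
offset helper and get the same value in either configuration.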

There are a few more items that I will try to get to next week.  The big one
is the fact that pskb_expand_head can mess up the truesize since it can
allocate a new head but never updates the truesize.  I plan on adding a helper
function for the cases where we are just using it to unshare the head so I can
identify the places where we are actually modifying the size.

---

Alexander Duyck (3):
      skb: Add inline helper for getting the skb end offset from head
      skb: Drop "fastpath" variable for skb_cloned check in pskb_expand_head
      skb: Drop bad code from pskb_expand_head

 drivers/atm/ambassador.c             |    2 +
 drivers/atm/idt77252.c               |    2 +
 drivers/net/wimax/i2400m/usb-rx.c    |    2 +
 drivers/staging/octeon/ethernet-tx.c |    2 +
 include/linux/skbuff.h               |   12 ++++++++-
 net/core/skbuff.c                    |   46 ++++++++++------------------------
 6 files changed, 29 insertions(+), 37 deletions(-)

-- 
Thanks,

Alex

^ permalink raw reply

* [PATCH 1/3] skb: Drop bad code from pskb_expand_head
From: Alexander Duyck @ 2012-05-05  0:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, jeffrey.t.kirsher
In-Reply-To: <20120505001059.21292.31647.stgit@gitlad.jf.intel.com>

The fast-path for pskb_expand_head contains a check where the size plus the
unaligned size of skb_shared_info is compared against the size of the data
buffer.  This code path has two issues.  First is the fact that after the
recent changes by Eric Dumazet to __alloc_skb and build_skb the shared info
is always placed in the optimal spot for a buffer size making this check
unnecessary.  The second issue is the fact that the check doesn't take into
account the aligned size of shared info.  As a result the code burns cycles
doing a memmove with nothing actually being shifted.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 net/core/skbuff.c |   12 ------------
 1 files changed, 0 insertions(+), 12 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c199aa4..4d085d4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -951,17 +951,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
 	}
 
-	if (fastpath && !skb->head_frag &&
-	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
-		memmove(skb->head + size, skb_shinfo(skb),
-			offsetof(struct skb_shared_info,
-				 frags[skb_shinfo(skb)->nr_frags]));
-		memmove(skb->head + nhead, skb->head,
-			skb_tail_pointer(skb) - skb->head);
-		off = nhead;
-		goto adjust_others;
-	}
-
 	data = kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
 		       gfp_mask);
 	if (!data)
@@ -997,7 +986,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 
 	skb->head     = data;
 	skb->head_frag = 0;
-adjust_others:
 	skb->data    += off;
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
 	skb->end      = size;

^ permalink raw reply related

* [PATCH 2/3] skb: Drop "fastpath" variable for skb_cloned check in pskb_expand_head
From: Alexander Duyck @ 2012-05-05  0:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, jeffrey.t.kirsher
In-Reply-To: <20120505001059.21292.31647.stgit@gitlad.jf.intel.com>

Since there is now only one spot that actually uses "fastpath" there isn't
much point in carrying it.  Instead we can just use a check for skb_cloned
to verify if we can perform the fast-path free for the head or not.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 net/core/skbuff.c |   22 ++++++++--------------
 1 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4d085d4..17e4b1e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -932,7 +932,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	u8 *data;
 	int size = nhead + (skb_end_pointer(skb) - skb->head) + ntail;
 	long off;
-	bool fastpath;
 
 	BUG_ON(nhead < 0);
 
@@ -941,16 +940,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 
 	size = SKB_DATA_ALIGN(size);
 
-	/* Check if we can avoid taking references on fragments if we own
-	 * the last reference on skb->head. (see skb_release_data())
-	 */
-	if (!skb->cloned)
-		fastpath = true;
-	else {
-		int delta = skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1;
-		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
-	}
-
 	data = kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
 		       gfp_mask);
 	if (!data)
@@ -966,9 +955,12 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	       skb_shinfo(skb),
 	       offsetof(struct skb_shared_info, frags[skb_shinfo(skb)->nr_frags]));
 
-	if (fastpath) {
-		skb_free_head(skb);
-	} else {
+	/*
+	 * if shinfo is shared we must drop the old head gracefully, but if it
+	 * is not we can just drop the old head and let the existing refcount
+	 * be since all we did is relocate the values
+	 */
+	if (skb_cloned(skb)) {
 		/* copy this zero copy skb frags */
 		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
 			if (skb_copy_ubufs(skb, gfp_mask))
@@ -981,6 +973,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 			skb_clone_fraglist(skb);
 
 		skb_release_data(skb);
+	} else {
+		skb_free_head(skb);
 	}
 	off = (data + nhead) - skb->head;
 

^ permalink raw reply related

* [PATCH 3/3] skb: Add inline helper for getting the skb end offset from head
From: Alexander Duyck @ 2012-05-05  0:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, jeffrey.t.kirsher
In-Reply-To: <20120505001059.21292.31647.stgit@gitlad.jf.intel.com>

With the recent changes for how we compute the skb truesize it occurs to me
we are probably going to have a lot of calls to skb_end_pointer -
skb->head.  Instead of running all over the place doing that, it would make
more sense to just make it a separate inline skb_end_offset(skb).  That way
we can return the correct value without gcc having to do all the
optimization to cancel out skb->head - skb->head.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 drivers/atm/ambassador.c             |    2 +-
 drivers/atm/idt77252.c               |    2 +-
 drivers/net/wimax/i2400m/usb-rx.c    |    2 +-
 drivers/staging/octeon/ethernet-tx.c |    2 +-
 include/linux/skbuff.h               |   12 +++++++++++-
 net/core/skbuff.c                    |   12 ++++++------
 6 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/atm/ambassador.c b/drivers/atm/ambassador.c
index f8f41e0..89b30f3 100644
--- a/drivers/atm/ambassador.c
+++ b/drivers/atm/ambassador.c
@@ -802,7 +802,7 @@ static void fill_rx_pool (amb_dev * dev, unsigned char pool,
     }
     // cast needed as there is no %? for pointer differences
     PRINTD (DBG_SKB, "allocated skb at %p, head %p, area %li",
-	    skb, skb->head, (long) (skb_end_pointer(skb) - skb->head));
+	    skb, skb->head, (long) skb_end_offset(skb));
     rx.handle = virt_to_bus (skb);
     rx.host_address = cpu_to_be32 (virt_to_bus (skb->data));
     if (rx_give (dev, &rx, pool))
diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c
index 1c05212..8974bd2 100644
--- a/drivers/atm/idt77252.c
+++ b/drivers/atm/idt77252.c
@@ -1258,7 +1258,7 @@ idt77252_rx_raw(struct idt77252_dev *card)
 	tail = readl(SAR_REG_RAWCT);
 
 	pci_dma_sync_single_for_cpu(card->pcidev, IDT77252_PRV_PADDR(queue),
-				    skb_end_pointer(queue) - queue->head - 16,
+				    skb_end_offset(queue) - 16,
 				    PCI_DMA_FROMDEVICE);
 
 	while (head != tail) {
diff --git a/drivers/net/wimax/i2400m/usb-rx.c b/drivers/net/wimax/i2400m/usb-rx.c
index e325768..b78ee67 100644
--- a/drivers/net/wimax/i2400m/usb-rx.c
+++ b/drivers/net/wimax/i2400m/usb-rx.c
@@ -277,7 +277,7 @@ retry:
 		d_printf(1, dev, "RX: size changed to %d, received %d, "
 			 "copied %d, capacity %ld\n",
 			 rx_size, read_size, rx_skb->len,
-			 (long) (skb_end_pointer(new_skb) - new_skb->head));
+			 (long) skb_end_offset(new_skb));
 		goto retry;
 	}
 		/* In most cases, it happens due to the hardware scheduling a
diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c
index 56d74dc..418ed03 100644
--- a/drivers/staging/octeon/ethernet-tx.c
+++ b/drivers/staging/octeon/ethernet-tx.c
@@ -344,7 +344,7 @@ int cvm_oct_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 	if (unlikely
 	    (skb->truesize !=
-	     sizeof(*skb) + skb_end_pointer(skb) - skb->head)) {
+	     sizeof(*skb) + skb_end_offset(skb))) {
 		/*
 		   printk("TX buffer truesize has been changed\n");
 		 */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 37f5391..91ad5e2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -645,11 +645,21 @@ static inline unsigned char *skb_end_pointer(const struct sk_buff *skb)
 {
 	return skb->head + skb->end;
 }
+
+static inline unsigned int skb_end_offset(const struct sk_buff *skb)
+{
+	return skb->end;
+}
 #else
 static inline unsigned char *skb_end_pointer(const struct sk_buff *skb)
 {
 	return skb->end;
 }
+
+static inline unsigned int skb_end_offset(const struct sk_buff *skb)
+{
+	return skb->end - skb->head;
+}
 #endif
 
 /* Internal */
@@ -2558,7 +2568,7 @@ static inline bool skb_is_recycleable(const struct sk_buff *skb, int skb_size)
 		return false;
 
 	skb_size = SKB_DATA_ALIGN(skb_size + NET_SKB_PAD);
-	if (skb_end_pointer(skb) - skb->head < skb_size)
+	if (skb_end_offset(skb) < skb_size)
 		return false;
 
 	if (skb_shared(skb) || skb_cloned(skb))
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 17e4b1e..2c35da8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -829,7 +829,7 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
 {
 	int headerlen = skb_headroom(skb);
-	unsigned int size = (skb_end_pointer(skb) - skb->head) + skb->data_len;
+	unsigned int size = skb_end_offset(skb) + skb->data_len;
 	struct sk_buff *n = alloc_skb(size, gfp_mask);
 
 	if (!n)
@@ -930,7 +930,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 {
 	int i;
 	u8 *data;
-	int size = nhead + (skb_end_pointer(skb) - skb->head) + ntail;
+	int size = nhead + skb_end_offset(skb) + ntail;
 	long off;
 
 	BUG_ON(nhead < 0);
@@ -2727,14 +2727,13 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 			if (unlikely(!nskb))
 				goto err;
 
-			hsize = skb_end_pointer(nskb) - nskb->head;
+			hsize = skb_end_offset(nskb);
 			if (skb_cow_head(nskb, doffset + headroom)) {
 				kfree_skb(nskb);
 				goto err;
 			}
 
-			nskb->truesize += skb_end_pointer(nskb) - nskb->head -
-					  hsize;
+			nskb->truesize += skb_end_offset(nskb) - hsize;
 			skb_release_head_state(nskb);
 			__skb_push(nskb, doffset);
 		} else {
@@ -2883,7 +2882,8 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		skb_frag_size_sub(frag, offset);
 
 		/* all fragments truesize : remove (head size + sk_buff) */
-		delta_truesize = skb->truesize - SKB_TRUESIZE(skb_end_pointer(skb) - skb->head);
+		delta_truesize = skb->truesize -
+				 SKB_TRUESIZE(skb_end_offset(skb));
 
 		skb->truesize -= skb->data_len;
 		skb->len -= skb->data_len;
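
For anyone reading this outside the kernel tree, here is a minimal userspace
sketch of the two variants of the helper.  The toy_skb_off and toy_skb_ptr
structs are hypothetical stand-ins for struct sk_buff with and without
NET_SKBUFF_DATA_USES_OFFSET; only the helper bodies mirror the patch above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: 'end' is stored as an offset from head
 * (the NET_SKBUFF_DATA_USES_OFFSET layout). */
struct toy_skb_off {
	unsigned char *head;
	unsigned int end;
};

/* Hypothetical stand-in: 'end' is stored as a pointer. */
struct toy_skb_ptr {
	unsigned char *head;
	unsigned char *end;
};

static inline unsigned char *toy_end_pointer_off(const struct toy_skb_off *skb)
{
	return skb->head + skb->end;
}

static inline unsigned int toy_end_offset_off(const struct toy_skb_off *skb)
{
	/* No head + end - head round trip for the compiler to cancel. */
	return skb->end;
}

static inline unsigned int toy_end_offset_ptr(const struct toy_skb_ptr *skb)
{
	/* The end of the buffer never precedes its head. */
	assert(skb->end >= skb->head);
	return (unsigned int)(skb->end - skb->head);
}
```

Both layouts report the same offset for the same buffer, so callers can use
the offset helper without caring which layout is compiled in.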

^ permalink raw reply related

* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
From: John Fastabend @ 2012-05-05  5:00 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Sridhar Samudrala, Michael S. Tsirkin, shemminger, bhutchings,
	hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2
In-Reply-To: <CAGe6so8q26X=HoQx+P-wkoLMtq1NhRerk98-v0cxhUpvMH4zmQ@mail.gmail.com>

On 5/4/2012 1:34 PM, Roopa Prabhu wrote:
> 
> 
> On Thu, May 3, 2012 at 10:43 PM, Sridhar Samudrala <sri@us.ibm.com> wrote:
> 
>     On 5/3/2012 12:38 PM, John Fastabend wrote:
> 
>         On 5/2/2012 4:36 PM, Sridhar Samudrala wrote:
> 
>             On 5/2/2012 2:52 PM, John Fastabend wrote:
> 
>                 On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
> 
>                     On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
> 
>                         From: John Fastabend <john.r.fastabend@intel.com>
>                         Date: Sun, 15 Apr 2012 09:43:51 -0700
> 
>                             The following series is a submission for net-next to allow
>                             embedded switches and other stacked devices other than the
>                             Linux bridge to manage a forwarding database.
> 
>                             Previously discussed here,
> 
>                             http://lists.openwall.net/netdev/2012/03/19/26
> 
>                             v4: propagate return codes correctly for ndo_dflt_fdb_dump()
> 
>                             v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>                                  error and add the flags field to change and get link routines.
> 
>                             v2: addressed feedback from Ben Hutchings resolving a typo in the
>                                  multicast add/del routines and improving the error handling
>                                  when both NTF_SELF and NTF_MASTER are set.
> 
>                             I've tested this with 'br' tool published by Stephen Hemminger
>                             soon to be renamed 'bridge' I believe and various traffic
>                             generators mostly pktgen, ping, and netperf.
> 
>                         All applied, if we need any more tweaks we can just add them
>                         on top of this work.
> 
>                         Thanks John.
> 
>                     John, do you plan to update kvm userspace to use this interface?
> 
>                 No immediate plans. I would really appreciate it if you or one
>                 of the IBM developers working in this space took it on. Of course
>                 if no one steps up I guess I can eventually get at it but it will
>                 be some time. For now I've been doing this manually with the bridge
>                 tool yet to be published.
> 
> 
>             Does this mean that when we add an interface to a bridge, it need not be put in promiscuous mode, and
>             that we add/delete fdb entries dynamically instead?
> 
>         The net/bridge will automatically put the interface in promisc mode
>         when the device is attached. We do need to add/delete fdb entries
>         though to allow forwarding packets from the virtual function and
>         any emulated devices e.g. tap devices on the bridge.
> 
> 
>     Consider the following scenario where we have a SR-IOV NIC with 1 PF
>     and 2 VFs (VF1 & VF2).
>     - eth0 is the PF which is attached to bridge br0 and connected to 2 VMs VM1 and VM2.
>     - eth1 is the VF1 terminated on the host and assigned to VM3 via macvtap0 in passthru mode.
>     - VF2 is directly assigned to VM4 via pci-device assignment.
> 
>      VM1      VM2         VM3           VM4
>     (mac1)  (mac2)     (mac3)         (mac4)
>      |        |           |             |
>      |        |           |             |
>     vnet0   vnet1         |             |
>      |        |           |             |
>      \        /           |             |
>       \      /            |             |
>        br0            macvtap0         |
>         |              (mac3)          |
>         |                |             |
>        eth0            eth1            |
>         |              (mac3)          |
>         |               |              |
>       ------------------------------------
>      | PF              VF1           VF2  |
>      |                                    |
>      |                 VEB                |
>       ------------------------------------
> 
>     In this setup, I think when VM1 and VM2 come up, mac1 and mac2 have to be added to the
>     embedded bridge's fdb.  Once we add these 2 entries, all the 4 VMs can talk to each other.
>     Is this correct?
> 

Correct as Roopa indicated.
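
As a rough illustration of what adding mac1 or mac2 to the embedded bridge
looks like at the netlink level, the sketch below only builds the request in
memory.  It assumes the entry is added with RTM_NEWNEIGH in the AF_BRIDGE
family with NTF_SELF set, as in this series; struct toy_fdb_req and
toy_build_fdb_add() are hypothetical names, and opening the NETLINK_ROUTE
socket and sending the message are left out.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

/* Hypothetical request layout: netlink header, ndmsg, room for attributes. */
struct toy_fdb_req {
	struct nlmsghdr n;
	struct ndmsg ndm;
	char buf[64];
};

/* Fill 'req' with a request that adds 'mac' to the FDB of the device
 * itself (NTF_SELF) rather than to a master bridge.  'ifindex' comes
 * from the caller, e.g. from if_nametoindex(). */
static void toy_build_fdb_add(struct toy_fdb_req *req, int ifindex,
			      const unsigned char mac[6])
{
	struct rtattr *rta;

	memset(req, 0, sizeof(*req));
	req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg));
	req->n.nlmsg_type = RTM_NEWNEIGH;
	req->n.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
	req->ndm.ndm_family = AF_BRIDGE;
	req->ndm.ndm_ifindex = ifindex;
	req->ndm.ndm_state = NUD_PERMANENT;
	req->ndm.ndm_flags = NTF_SELF;

	/* Append the NDA_LLADDR attribute carrying the MAC address. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->n.nlmsg_len));
	rta->rta_type = NDA_LLADDR;
	rta->rta_len = RTA_LENGTH(6);
	memcpy(RTA_DATA(rta), mac, 6);
	req->n.nlmsg_len = NLMSG_ALIGN(req->n.nlmsg_len) +
			   RTA_ALIGN(rta->rta_len);
}
```

Setting NTF_MASTER instead of NTF_SELF would direct the same request at the
master device the port is attached to, if I read the flag handling in the
series correctly.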

>     Now, if VM1 or VM2 wants to add secondary mac addresses, I think we need qemu to add a new fdb
>     entry when it receives add mac address command via virtio control vq.
> 
> 
> yes. I had used (with some tweaks) some existing qemu patches on patchwork to try this out with my implementation.
> 
> The links to the patches on patchwork are listed in my cover mail at http://marc.info/?l=linux-netdev&m=131534911001054&w=2
> 
>  
> 
>     Can we add multiple mac addresses to VFs? For example VM3 and VM4 trying to add a secondary mac address.

Yes, this is why we also added the fdb interface to the macvlan device.

> 
>     What about VMs trying to create VLANs? I think this will work on VM1 and VM2. However, with VM3
>     and VM4, I think we need qemu to add vlans to the VFs when the VMs create them.
> 
> 
> yes for vlans too, the qemu patches pointed out above can be reused.
> 
> Thanks,
> Roopa
>  
> 

^ permalink raw reply

* Re: [PATCH 1/3] skb: Drop bad code from pskb_expand_head
From: Eric Dumazet @ 2012-05-05  5:35 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher
In-Reply-To: <20120505002645.21292.38368.stgit@gitlad.jf.intel.com>

On Fri, 2012-05-04 at 17:26 -0700, Alexander Duyck wrote:
> The fast-path for pskb_expand_head contains a check where the size plus the
> unaligned size of skb_shared_info is compared against the size of the data
> buffer.  This code path has two issues.  First is the fact that after the
> recent changes by Eric Dumazet to __alloc_skb and build_skb the shared info
> is always placed in the optimal spot for a buffer size making this check
> unnecessary.  The second issue is the fact that the check doesn't take into
> account the aligned size of shared info.  As a result the code burns cycles
> doing a memcpy with nothing actually being shifted.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
>  net/core/skbuff.c |   12 ------------
>  1 files changed, 0 insertions(+), 12 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index c199aa4..4d085d4 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -951,17 +951,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
>  	}
>  
> -	if (fastpath && !skb->head_frag &&
> -	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
> -		memmove(skb->head + size, skb_shinfo(skb),
> -			offsetof(struct skb_shared_info,
> -				 frags[skb_shinfo(skb)->nr_frags]));
> -		memmove(skb->head + nhead, skb->head,
> -			skb_tail_pointer(skb) - skb->head);
> -		off = nhead;
> -		goto adjust_others;
> -	}
> -
>  	data = kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
>  		       gfp_mask);
>  	if (!data)
> @@ -997,7 +986,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  
>  	skb->head     = data;
>  	skb->head_frag = 0;
> -adjust_others:
>  	skb->data    += off;
>  #ifdef NET_SKBUFF_DATA_USES_OFFSET
>  	skb->end      = size;
> 

I totally agree this code is no longer needed, since we already have the
skb_shared_info at the end of the buffer.

Acked-by: Eric Dumazet <edumazet@google.com>
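
To make the arithmetic in the changelog concrete, here is a small userspace
sketch with made-up sizes.  TOY_CACHE_BYTES and TOY_SHINFO_SIZE are
assumptions (a 64-byte cache line and a hypothetical unaligned
sizeof(struct skb_shared_info)), and the buffer size is assumed to be a
multiple of the cache line.  It illustrates the numbers only and proves
nothing about every configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_BYTES 64	/* assumed SMP_CACHE_BYTES */
#define TOY_ALIGN(x) \
	(((x) + (TOY_CACHE_BYTES - 1)) & ~(size_t)(TOY_CACHE_BYTES - 1))
#define TOY_SHINFO_SIZE 328	/* hypothetical, deliberately unaligned */

/* Offset of shinfo when it already sits at the very end of the buffer. */
static size_t toy_end_offset(size_t ksize)
{
	assert(ksize >= TOY_ALIGN(TOY_SHINFO_SIZE));
	return ksize - TOY_ALIGN(TOY_SHINFO_SIZE);
}

/* The removed test: does the grown layout still fit in the old head?
 * The offset where shinfo would be moved to is returned in *new_end. */
static bool toy_fits_in_place(size_t ksize, size_t nhead, size_t ntail,
			      size_t *new_end)
{
	size_t size = TOY_ALIGN(nhead + toy_end_offset(ksize) + ntail);

	*new_end = size;
	return size + TOY_SHINFO_SIZE <= ksize;
}

/* True if every in-place hit leaves shinfo exactly where it already was. */
static bool toy_hits_never_move(size_t ksize, size_t max_room)
{
	size_t nhead, ntail, new_end;

	for (nhead = 0; nhead <= max_room; nhead++)
		for (ntail = 0; ntail <= max_room; ntail++)
			if (toy_fits_in_place(ksize, nhead, ntail, &new_end) &&
			    new_end != toy_end_offset(ksize))
				return false;
	return true;
}
```

With these numbers the test only passes when nhead and ntail are both zero,
in which case the shared info is "moved" onto itself.  Any real growth pushes
the aligned size up by a full cache line, which is more than the slack left
by using the unaligned sizeof.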

^ permalink raw reply

* Re: [PATCH 2/3] skb: Drop "fastpath" variable for skb_cloned check in pskb_expand_head
From: Eric Dumazet @ 2012-05-05  5:37 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher
In-Reply-To: <20120505002651.21292.19680.stgit@gitlad.jf.intel.com>

On Fri, 2012-05-04 at 17:26 -0700, Alexander Duyck wrote:
> Since there is now only one spot that actually uses "fastpath" there isn't
> much point in carrying it.  Instead we can just use a check for skb_cloned
> to verify if we can perform the fast-path free for the head or not.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
>  net/core/skbuff.c |   22 ++++++++--------------
>  1 files changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 4d085d4..17e4b1e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -932,7 +932,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  	u8 *data;
>  	int size = nhead + (skb_end_pointer(skb) - skb->head) + ntail;
>  	long off;
> -	bool fastpath;
>  
>  	BUG_ON(nhead < 0);
>  
> @@ -941,16 +940,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  
>  	size = SKB_DATA_ALIGN(size);
>  
> -	/* Check if we can avoid taking references on fragments if we own
> -	 * the last reference on skb->head. (see skb_release_data())
> -	 */
> -	if (!skb->cloned)
> -		fastpath = true;
> -	else {
> -		int delta = skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1;
> -		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
> -	}
> -
>  	data = kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
>  		       gfp_mask);
>  	if (!data)
> @@ -966,9 +955,12 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  	       skb_shinfo(skb),
>  	       offsetof(struct skb_shared_info, frags[skb_shinfo(skb)->nr_frags]));
>  
> -	if (fastpath) {
> -		skb_free_head(skb);
> -	} else {
> +	/*
> +	 * if shinfo is shared we must drop the old head gracefully, but if it
> +	 * is not we can just drop the old head and let the existing refcount
> +	 * be since all we did is relocate the values
> +	 */
> +	if (skb_cloned(skb)) {
>  		/* copy this zero copy skb frags */
>  		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
>  			if (skb_copy_ubufs(skb, gfp_mask))
> @@ -981,6 +973,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  			skb_clone_fraglist(skb);
>  
>  		skb_release_data(skb);
> +	} else {
> +		skb_free_head(skb);
>  	}
>  	off = (data + nhead) - skb->head;
>  
> 

Excellent

Acked-by: Eric Dumazet <edumazet@google.com>
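
For the record, a quick userspace model of why !skb_cloned() can stand in for
the old "fastpath" flag.  struct toy_skb is a hypothetical stand-in for the
few fields involved.  The model assumes the usual dataref accounting: the low
16 bits count all references, the high 16 bits count payload-only (nohdr)
references, and the skb being tested holds one reference of the matching kind.

```c
#include <assert.h>
#include <stdbool.h>

#define SKB_DATAREF_SHIFT 16
#define SKB_DATAREF_MASK ((1 << SKB_DATAREF_SHIFT) - 1)

/* Hypothetical stand-in for the sk_buff/shinfo fields used here. */
struct toy_skb {
	bool cloned;
	bool nohdr;
	int dataref;	/* low half: all refs, high half: payload-only refs */
};

/* Same expression as the kernel's skb_cloned() helper. */
static bool toy_skb_cloned(const struct toy_skb *skb)
{
	return skb->cloned && (skb->dataref & SKB_DATAREF_MASK) != 1;
}

/* The "fastpath" computation removed by the patch. */
static bool toy_old_fastpath(const struct toy_skb *skb)
{
	int delta;

	if (!skb->cloned)
		return true;
	delta = skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1;
	return skb->dataref == delta;
}

/* Walk every consistent state with up to max_refs references and check
 * that the old flag always equals !skb_cloned(). */
static bool toy_paths_agree(int max_refs)
{
	int total, payload, c, n;

	for (total = 1; total <= max_refs; total++)
		for (payload = 0; payload <= total; payload++)
			for (c = 0; c <= 1; c++)
				for (n = 0; n <= 1; n++) {
					struct toy_skb skb = { c != 0, n != 0,
						(payload << SKB_DATAREF_SHIFT) | total };

					/* A nohdr skb owns one payload-only ref. */
					if (n && payload < 1)
						continue;
					/* A normal skb owns one full ref. */
					if (!n && payload == total)
						continue;
					if (toy_old_fastpath(&skb) != !toy_skb_cloned(&skb))
						return false;
				}
	return true;
}
```

Under that assumption the two tests agree for every state the model visits,
which matches the intent of the patch: the dataref comparison was already
hidden inside skb_cloned().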

^ permalink raw reply

* Re: [PATCH 3/3] skb: Add inline helper for getting the skb end offset from head
From: Eric Dumazet @ 2012-05-05  5:39 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher
In-Reply-To: <20120505002656.21292.89799.stgit@gitlad.jf.intel.com>

On Fri, 2012-05-04 at 17:26 -0700, Alexander Duyck wrote:
> With the recent changes for how we compute the skb truesize it occurs to me
> we are probably going to have a lot of calls to skb_end_pointer -
> skb->head.  Instead of running all over the place doing that, it would make
> more sense to just make it a separate inline skb_end_offset(skb).  That way
> we can return the correct value without gcc having to do all the
> optimization to cancel out skb->head - skb->head.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
>  drivers/atm/ambassador.c             |    2 +-
>  drivers/atm/idt77252.c               |    2 +-
>  drivers/net/wimax/i2400m/usb-rx.c    |    2 +-
>  drivers/staging/octeon/ethernet-tx.c |    2 +-
>  include/linux/skbuff.h               |   12 +++++++++++-
>  net/core/skbuff.c                    |   12 ++++++------
>  6 files changed, 21 insertions(+), 11 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply
