Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: David Miller @ 2011-07-22  0:08 UTC (permalink / raw)
  To: nhorman; +Cc: greearb, eric.dumazet, jpirko, netdev, adobriyan, robert.olsson
In-Reply-To: <20110721235049.GA29489@hmsreliant.think-freely.org>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Thu, 21 Jul 2011 19:50:49 -0400

> I'm happy to go down this route Dave, and agree, its a more solid solution, but
> I think the problem with it (which Ben may have been alluding to previously) is
> that pktgen doesn't use dev_queue_xmit or dev_hard_start_xmit to send frames.

Neil, that's THE WHOLE POINT, and HOW MY IDEA WORKS.

Pktgen bypasses those functions, so it won't trigger the check and
therefore won't trigger making a copy of the SKB, since copying isn't
needed.

Only layering devices will end up eventually calling into those two
functions and trigger the "need to copy because this SKB is pktgen
shared" check.

^ permalink raw reply

* Re: ipvs oops in 3.0-rc7
From: Simon Horman @ 2011-07-22  0:08 UTC (permalink / raw)
  To: Huajun Li
  Cc: Julian Anastasov, Randy Dunlap, netdev, lvs-devel, Wensong Zhang
In-Reply-To: <CA+v9cxbhpt3LSZ_m+OA_Kujt79mvDZ7VnGqjbbPQxS_gY6+pTA@mail.gmail.com>

On Thu, Jul 21, 2011 at 09:37:29PM +0800, Huajun Li wrote:
> 2011/7/21 Julian Anastasov <ja@ssi.bg>:
> >
> >        Hello,
> >
> > On Wed, 20 Jul 2011, Randy Dunlap wrote:
> >
> >> I'm seeing the following Oops in 3.0-rc7 on x86_64, just loading and unloading
> >> modules.  Any chance this is already fixed?  I can test current git, but I
> >> wanted to ask first.
> >>
> >> Looks like it is on the second module load of ip_vs (i.e.,
> >> modprobe ip_vs; rmmod ip_vs; modprobe ip_vs).
> >
> >        I think, this problem was fixed by this patch:
> >
> > http://www.spinics.net/lists/lvs-devel/msg02051.html
> >
> >        But it seems it was lost somewhere ...
> >
> 
> That's great, SB. can help to apply it again, thanks.

Sorry about that. For some reason I thought that would be / had been
picked up. I'll send a pull request ASAP.

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv6: make fragment identifications less predictable
From: David Miller @ 2011-07-22  0:07 UTC (permalink / raw)
  To: fernando; +Cc: eric.dumazet, security, eugeneteo, netdev, mpm
In-Reply-To: <4E28B84C.2090305@gont.com.ar>

From: Fernando Gont <fernando@gont.com.ar>
Date: Thu, 21 Jul 2011 20:37:48 -0300

> While I haven't had a chance to look at the patch yet, I was wondering
> whether it defines both a separe "offset" and "counter" for each "flow".

Why wonder, just take a look.

If you aren't willing to even bother looking at the solution we came
up with, why should we feel compelled to spend any time at all looking
at your "anaylsis" and "concerns"?

^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Neil Horman @ 2011-07-21 23:50 UTC (permalink / raw)
  To: David Miller
  Cc: greearb, eric.dumazet, jpirko, netdev, adobriyan, robert.olsson
In-Reply-To: <20110721.151903.297506479006061401.davem@davemloft.net>

On Thu, Jul 21, 2011 at 03:19:03PM -0700, David Miller wrote:
> From: Ben Greear <greearb@candelatech.com>
> Date: Thu, 21 Jul 2011 15:14:32 -0700
> 
> > On 07/21/2011 03:01 PM, David Miller wrote:
> >> From: Neil Horman<nhorman@tuxdriver.com>
> >> Date: Wed, 20 Jul 2011 11:18:27 -0400
> >>
> >>> On Wed, Jul 20, 2011 at 06:24:15AM +0200, Eric Dumazet wrote:
> >>>> Le mardi 19 juillet 2011 à 22:07 -0400, Neil Horman a écrit :
> >>>>>>
> >>>>> I think this is a good idea.  It lets pktgen dynamically make the
> >>>>> clone/share
> >>>>> decision dynamically and only impacts performance for those systems.
> >>>>>
> >>>>
> >>>> Just let pktgen refuse to use clone_skb command for these devices.
> >>>>
> >>> copy that, This is by no means final, but what do you think of this?
> >>> If its
> >>> agreeable to you, Ben, et al. I can add this to my local tree and
> >>> start auditing
> >>> all the drivers that may need to have the flag set.
> >>
> >> I think there is a much simpler solution.
> >>
> >> Set a flag in the SKB when pktgen does SKB sharing.
> >>
> >> In dev_queue_xmit() (or perhaps, dev_hard_start_xmit()), check the
> >> flag
> >> and if it's set then we copy the SKB.
> >>
> >> If this works, then we fix the crash and no driver changes are
> >> necessary both now and in the future.
> > 
> > Doesn't that make clone-skb in pktgen much less efficient
> > in all cases?
> 
> No, the copy only happens if we enter dev_queue_xmit() which pktgen
> doesn't do, it calls the driver's ->ndo_start_xmit() method directly.
> 
> That's the whole idea.  Only these encapsulating software devices
> will trigger the condition.
> 


I'm happy to go down this route Dave, and agree, its a more solid solution, but
I think the problem with it (which Ben may have been alluding to previously) is
that pktgen doesn't use dev_queue_xmit or dev_hard_start_xmit to send frames.
It mimics the locking of dev_hard_start_xmit (but ignores the other checks that
function does), and just calls the ndo_start_xmit routine of the driver
directly.  So theres no common code that an skb traverses from pktgen to a given
driver where we can check such a flag

Again, I'm happy to change that so that pktgen uses dev_hard_start_xmit, but I
wonder if thats going to get the same sort of pushback about performance that my
origional patch did.  Eric, et al., thoughts?

Regards
Neil


^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv6: make fragment identifications less predictable
From: Fernando Gont @ 2011-07-21 23:37 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, security, eugeneteo, netdev, mpm
In-Reply-To: <20110721.151750.995903739612693126.davem@davemloft.net>

On 07/21/2011 07:17 PM, David Miller wrote:
>> Does it make sense to go in this direction rather than simply randomize
>> the IPv6 Fragment Identification?
> 
> We could, but that's actually a bit more work.
> 
> You have to avoid recycling IDs to the same destination host otherwise
> a retransmit could use the same ID and overlap with a previous set of
> frags, causing corruption.
> 
> This means if you go the "pure random" route, you have to make sure
> that the 32-bit series produced by the random number generator is
> maximally long.  This is why openbsd uses an ID generator based upon
> skip32 etc.

That scenario assumes packet reordering and/or packet loss. In that
case, and at those packet rates, I'd argue that you shoudn't be relying
on fragmentation, anyway (e.g., use PMTUD).


> So we have to look at it like a constrained resource, and therefore
> doing it on a per-destination basis like ipv4 makes a lot of sense.
> 
> I think Eric's work is the way forward and I'll be applying his patches.

While I haven't had a chance to look at the patch yet, I was wondering
whether it defines both a separe "offset" and "counter" for each "flow".

(assuming that Identification values are selected as a result of ID =
offset + counter, where counter is incremented for each packet that is
sent, and offset is some value that depends on (src IP, dst ip)).

If you have separate "offset" for each flow, you solve the problem of
"predictable Identification values". However, if there's a single global
"counter", you're still subject of being exploited for an "idle scan").

Thanks,
-- 
Fernando Gont
e-mail: fernando@gont.com.ar || fgont@acm.org
PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE A9EF D076 FFF1




^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv6: make fragment identifications less predictable
From: David Miller @ 2011-07-21 23:13 UTC (permalink / raw)
  To: rick.jones2; +Cc: fernando, eric.dumazet, security, eugeneteo, netdev, mpm
In-Reply-To: <4E28AC3D.1010907@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Thu, 21 Jul 2011 15:46:21 -0700

> On 07/21/2011 03:17 PM, David Miller wrote:
>> From: Fernando Gont<fernando@gont.com.ar>
>> Date: Wed, 20 Jul 2011 22:32:18 -0300
>>
>>> Does it make sense to go in this direction rather than simply
>>> randomize
>>> the IPv6 Fragment Identification?
>>
>> We could, but that's actually a bit more work.
>>
>> You have to avoid recycling IDs to the same destination host otherwise
>> a retransmit could use the same ID and overlap with a previous set of
>> frags, causing corruption.
> 
> I think you mean ID reuse rather than packet retransmit no?
> 
> It is the same "frankengram" issue present in IPv4 with its now puny
> 16 bit id field. Doesn't that pretty much rely on layer4 or higher
> checksums to avoid corruption?

We've had documented cases where checksums match after recycling the
fragment ID space in ipv4, that's why we have all of the special
code in the ipv4 fragmentation handling to work around that problem.

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv6: make fragment identifications less predictable
From: Rick Jones @ 2011-07-21 22:46 UTC (permalink / raw)
  To: David Miller; +Cc: fernando, eric.dumazet, security, eugeneteo, netdev, mpm
In-Reply-To: <20110721.151750.995903739612693126.davem@davemloft.net>

On 07/21/2011 03:17 PM, David Miller wrote:
> From: Fernando Gont<fernando@gont.com.ar>
> Date: Wed, 20 Jul 2011 22:32:18 -0300
>
>> Does it make sense to go in this direction rather than simply randomize
>> the IPv6 Fragment Identification?
>
> We could, but that's actually a bit more work.
>
> You have to avoid recycling IDs to the same destination host otherwise
> a retransmit could use the same ID and overlap with a previous set of
> frags, causing corruption.

I think you mean ID reuse rather than packet retransmit no?

It is the same "frankengram" issue present in IPv4 with its now puny 16 
bit id field. Doesn't that pretty much rely on layer4 or higher 
checksums to avoid corruption?

> This means if you go the "pure random" route, you have to make sure
> that the 32-bit series produced by the random number generator is
> maximally long.  This is why openbsd uses an ID generator based upon
> skip32 etc.
>
> And I cannot say that about our RNG infrastructure.
>
> Also, 32-bits seems like a lot, but on a 40Gb link we can exhaust this
> space in ~20 minutes (1554 byte packet over standard ethernet MTU at
> 40Gbit is ~3454767 ipv6 frag IDs per second).  So while maybe not a
> serious issue right now, we seem to go up by a factor of 10 every few
> years, therefore at ~400Gb it's down to 2 minutes.

With mode-rr and bonding (and enough CPU) you could probably get it down 
to 2 minutes without having to wait a few years.

rick jones

> So we have to look at it like a constrained resource, and therefore
> doing it on a per-destination basis like ipv4 makes a lot of sense.
>
> I think Eric's work is the way forward and I'll be applying his patches.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* Re: [PATCH 4/4] stmmac: improve and up-to-date the documentation
From: David Miller @ 2011-07-21 22:30 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev
In-Reply-To: <1311156324-23928-4-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Wed, 20 Jul 2011 12:05:24 +0200

> This patch adds new information for the driver
> especially about its platform structure fields.
> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

^ permalink raw reply

* Re: [PATCH 3/4] stmmac: unify MAC and PHY configuration parameters (V2)
From: David Miller @ 2011-07-21 22:30 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev, stuart.menefy
In-Reply-To: <1311156324-23928-3-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Wed, 20 Jul 2011 12:05:23 +0200

> Prior to this change, most PHY configuration parameters were passed
> into the STMMAC device as a separate PHY device. As well as being
> unusual, this made it difficult to make changes to the MAC/PHY
> relationship.
> 
> This patch moves all the PHY parameters into the MAC configuration
> structure, mainly as a separate structure. This allows us to completely
> ignore the MDIO bus attached to a stmmac if desired, and not create
> the PHY bus. It also allows the stmmac driver to use a different PHY
> from the one it is connected to, for example a fixed PHY or bit banging
> PHY.
> 
> Also derive the stmmac/PHY connection type (MII/RMII etc) from the
> mode can be passed into <platf>_configure_ethernet.
> STLinux kernel at git://git.stlinux.com/stm/linux-sh4-2.6.32.y.git
> provides several examples how to use this new infrastructure (that
> actually is easier to maintain and clearer).
> 
> Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/4] stmmac: remove warning when compile as built-in (V2)
From: David Miller @ 2011-07-21 22:30 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev
In-Reply-To: <1311156324-23928-2-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Wed, 20 Jul 2011 12:05:22 +0200

> The patch removes the following serie of warnings
> when the driver is compiled as built-in:
> 
> drivers/net/stmmac/stmmac_main.c: In function stmmac_cmdline_opt:
> drivers/net/stmmac/stmmac_main.c:1855:12: warning: ignoring return
> value of kstrtoul, declared with attribute warn_unused_result
> [snip]
> 
> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/4] stmmac: update the version (V2)
From: David Miller @ 2011-07-21 22:30 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev
In-Reply-To: <1311156324-23928-1-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Wed, 20 Jul 2011 12:05:21 +0200

> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied.

^ permalink raw reply

* Re: Just one more byte, it is wafer thin...
From: Rick Jones @ 2011-07-21 22:28 UTC (permalink / raw)
  To: netdev; +Cc: Eli Cohen, Yevgeny Petrilin
In-Reply-To: <4E2764A0.90003@hp.com>

On 07/20/2011 04:28 PM, Rick Jones wrote:
> and just to be completely pedantic about it, set rx-usecs-high to 0:
>
> # HDR="-P 1";for r in 4344 4345; do netperf -H mumble.3.21 -t TCP_RR
> $HDR -- -r ${r},1; HDR="-P 0"; done
> MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> to mumble.3.21 (mumble.3.21) port 0 AF_INET : histogram : first burst 0
> Local /Remote
> Socket Size Request Resp. Elapsed Trans.
> Send Recv Size Size Time Rate
> bytes Bytes bytes bytes secs. per sec
>
> 16384 87380 4344 1 10.00 14274.03
> 16384 87380
> 16384 87380 4345 1 10.00 13697.11
> 16384 87380
>
> and got a somewhat unexpected result - I've no idea why then they both
> went up - perhaps it was sensing "high" occasionally even in the 4344
> byte request case. Still, is this suggesting that perhaps the adaptive
> bits are being a bit to aggressive about sensing high? Over what
> interval is that measurement supposed to be happening?

So, from a 2.6.38 tree in drivers/net/mlx4/en_netdev:

        /* Apply auto-moderation only when packet rate exceeds a rate that
          * it matters */
         if (rate > MLX4_EN_RX_RATE_THRESH) {
                 /* If tx and rx packet rates are not balanced, assume that
                  * traffic is mainly BW bound and apply maximum moderation.
                  * Otherwise, moderate according to packet rate */
                 if (2 * tx_pkt_diff > 3 * rx_pkt_diff &&
                     rx_pkt_diff / rx_byte_diff <
                     MLX4_EN_SMALL_PKT_SIZE)
                         moder_time = priv->rx_usecs_low;
                 else if (2 * rx_pkt_diff > 3 * tx_pkt_diff)
                         moder_time = priv->rx_usecs_high;
                 else {
                         if (rate < priv->pkt_rate_low)
                                 moder_time = priv->rx_usecs_low;
                         else if (rate > priv->pkt_rate_high)
                                 moder_time = priv->rx_usecs_high;
                         else
                                 moder_time = (rate - priv->pkt_rate_low) *
                                         (priv->rx_usecs_high - 
priv->rx_usecs_low) /
                                         (priv->pkt_rate_high - 
priv->pkt_rate_low) +
                                         priv->rx_usecs_low;
                 }
         } else {
                 /* When packet rate is low, use default moderation 
rather than
                  * 0 to prevent interrupt storms if traffic suddenly 
increases */
                 moder_time = priv->rx_usecs;
         }

It would seem that the assume is involved here.  The TCP_RR test I was 
running (or NFS Read, or NFS Write, or I suspect SMB/CIFS reads and 
writes etc) will have either request much larger than response or vice 
versa.  That leaves the tx and rx packet rates decidedly not balanced 
even when the traffic is not BW bound, particularly when there will not 
be all that many requests outstanding at one time.  And it becomes even 
more unbalanced when LRO/GRO stretches the ACKs.

rick jones

^ permalink raw reply

* Re: [PATCH net-2.6] ethtool: Allow zero-length register dumps again
From: David Miller @ 2011-07-21 22:25 UTC (permalink / raw)
  To: bhutchings; +Cc: linville, kvalo, netdev, linux-wireless
In-Reply-To: <1311270840.28569.34.camel@localhost>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 21 Jul 2011 19:54:00 +0200

> Some drivers (ab)use the ethtool_ops::get_regs operation to expose
> only a hardware revision ID.  Commit
> a77f5db361ed9953b5b749353ea2c7fed2bf8d93 ('ethtool: Allocate register
> dump buffer with vmalloc()') had the side-effect of breaking these, as
> vmalloc() returns a null pointer for size=0 whereas kmalloc() did not.
> 
> For backward-compatibility, allow zero-length dumps again.
> 
> Reported-by: Kalle Valo <kvalo@qca.qualcomm.com>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> Cc: stable@kernel.org [2.6.37+]

Applied to net-next-2.6, I left the CC: stable tag in there so
-stable will pick it up once it hits Linus's tree during the
merge window.

Thanks.

^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Ben Greear @ 2011-07-21 22:26 UTC (permalink / raw)
  To: David Miller
  Cc: nhorman, eric.dumazet, jpirko, netdev, adobriyan, robert.olsson
In-Reply-To: <20110721.151903.297506479006061401.davem@davemloft.net>

On 07/21/2011 03:19 PM, David Miller wrote:
> From: Ben Greear<greearb@candelatech.com>
> Date: Thu, 21 Jul 2011 15:14:32 -0700
>
>> On 07/21/2011 03:01 PM, David Miller wrote:
>>> From: Neil Horman<nhorman@tuxdriver.com>
>>> Date: Wed, 20 Jul 2011 11:18:27 -0400
>>>
>>>> On Wed, Jul 20, 2011 at 06:24:15AM +0200, Eric Dumazet wrote:
>>>>> Le mardi 19 juillet 2011 à 22:07 -0400, Neil Horman a écrit :
>>>>>>>
>>>>>> I think this is a good idea.  It lets pktgen dynamically make the
>>>>>> clone/share
>>>>>> decision dynamically and only impacts performance for those systems.
>>>>>>
>>>>>
>>>>> Just let pktgen refuse to use clone_skb command for these devices.
>>>>>
>>>> copy that, This is by no means final, but what do you think of this?
>>>> If its
>>>> agreeable to you, Ben, et al. I can add this to my local tree and
>>>> start auditing
>>>> all the drivers that may need to have the flag set.
>>>
>>> I think there is a much simpler solution.
>>>
>>> Set a flag in the SKB when pktgen does SKB sharing.
>>>
>>> In dev_queue_xmit() (or perhaps, dev_hard_start_xmit()), check the
>>> flag
>>> and if it's set then we copy the SKB.
>>>
>>> If this works, then we fix the crash and no driver changes are
>>> necessary both now and in the future.
>>
>> Doesn't that make clone-skb in pktgen much less efficient
>> in all cases?
>
> No, the copy only happens if we enter dev_queue_xmit() which pktgen
> doesn't do, it calls the driver's ->ndo_start_xmit() method directly.
>
> That's the whole idea.  Only these encapsulating software devices
> will trigger the condition.

It seems there may be some Ethernet drivers that have similar issues,
though I don't know of any personally (many years ago, Chelsio 10G NICs had
this issue, but it may be fixed now.)

But, it should be no worse than what we have today, and now that I
better understand what you suggest, it sounds OK to me.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [PATCH net-2.6] jme: Fix unmap error (Causing system freeze)
From: David Miller @ 2011-07-21 22:23 UTC (permalink / raw)
  To: chrisw
  Cc: jason, netdev, cooldavid, jason, marcus.disi, arieslee, devinchiu,
	mschiff, stable
In-Reply-To: <20110721193008.GC9766@sequoia.sous-sol.org>

From: Chris Wright <chrisw@sous-sol.org>
Date: Thu, 21 Jul 2011 12:30:08 -0700

> * cooldavid@cooldavid.org (cooldavid@cooldavid.org) wrote:
>> From: Guo-Fu Tseng <cooldavid@cooldavid.org>
>> 
>> This patch add the missing dma_unmap().
>> Which solved the critical issue of system freeze on heavy load.
>> 
>> Michal Miroslaw's rejected patch:
>> [PATCH v2 10/46] net: jme: convert to generic DMA API
>> Pointed out the issue also, thank you Michal.
>> But the fix was incorrect. It would unmap needed address
>> when low memory.
>> 
>> Got lots of feedback from End user and Gentoo Bugzilla.
>> https://bugs.gentoo.org/show_bug.cgi?id=373109
>> Thank you all. :)
> 
> Also referred to in kernel bugzilla:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=39312
> 
>> Cc: stable@kernel.org
>> Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
> 
> Acked-by: Chris Wright <chrisw@sous-sol.org>

Applied, but it's too late for 3.0 so I'll just submit it to all
the -stable branches after 3.0-final is released.

Thanks.

_______________________________________________
stable mailing list
stable@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/stable

^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: David Miller @ 2011-07-21 22:19 UTC (permalink / raw)
  To: greearb; +Cc: nhorman, eric.dumazet, jpirko, netdev, adobriyan, robert.olsson
In-Reply-To: <4E28A4C8.8040707@candelatech.com>

From: Ben Greear <greearb@candelatech.com>
Date: Thu, 21 Jul 2011 15:14:32 -0700

> On 07/21/2011 03:01 PM, David Miller wrote:
>> From: Neil Horman<nhorman@tuxdriver.com>
>> Date: Wed, 20 Jul 2011 11:18:27 -0400
>>
>>> On Wed, Jul 20, 2011 at 06:24:15AM +0200, Eric Dumazet wrote:
>>>> Le mardi 19 juillet 2011 à 22:07 -0400, Neil Horman a écrit :
>>>>>>
>>>>> I think this is a good idea.  It lets pktgen dynamically make the
>>>>> clone/share
>>>>> decision dynamically and only impacts performance for those systems.
>>>>>
>>>>
>>>> Just let pktgen refuse to use clone_skb command for these devices.
>>>>
>>> copy that, This is by no means final, but what do you think of this?
>>> If its
>>> agreeable to you, Ben, et al. I can add this to my local tree and
>>> start auditing
>>> all the drivers that may need to have the flag set.
>>
>> I think there is a much simpler solution.
>>
>> Set a flag in the SKB when pktgen does SKB sharing.
>>
>> In dev_queue_xmit() (or perhaps, dev_hard_start_xmit()), check the
>> flag
>> and if it's set then we copy the SKB.
>>
>> If this works, then we fix the crash and no driver changes are
>> necessary both now and in the future.
> 
> Doesn't that make clone-skb in pktgen much less efficient
> in all cases?

No, the copy only happens if we enter dev_queue_xmit() which pktgen
doesn't do, it calls the driver's ->ndo_start_xmit() method directly.

That's the whole idea.  Only these encapsulating software devices
will trigger the condition.

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv6: make fragment identifications less predictable
From: David Miller @ 2011-07-21 22:17 UTC (permalink / raw)
  To: fernando; +Cc: eric.dumazet, security, eugeneteo, netdev, mpm
In-Reply-To: <4E2781A2.8020905@gont.com.ar>

From: Fernando Gont <fernando@gont.com.ar>
Date: Wed, 20 Jul 2011 22:32:18 -0300

> Does it make sense to go in this direction rather than simply randomize
> the IPv6 Fragment Identification?

We could, but that's actually a bit more work.

You have to avoid recycling IDs to the same destination host otherwise
a retransmit could use the same ID and overlap with a previous set of
frags, causing corruption.

This means if you go the "pure random" route, you have to make sure
that the 32-bit series produced by the random number generator is
maximally long.  This is why openbsd uses an ID generator based upon
skip32 etc.

And I cannot say that about our RNG infrastructure.

Also, 32-bits seems like a lot, but on a 40Gb link we can exhaust this
space in ~20 minutes (1554 byte packet over standard ethernet MTU at
40Gbit is ~3454767 ipv6 frag IDs per second).  So while maybe not a
serious issue right now, we seem to go up by a factor of 10 every few
years, therefore at ~400Gb it's down to 2 minutes.

So we have to look at it like a constrained resource, and therefore
doing it on a per-destination basis like ipv4 makes a lot of sense.

I think Eric's work is the way forward and I'll be applying his patches.

^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: Ben Greear @ 2011-07-21 22:14 UTC (permalink / raw)
  To: David Miller
  Cc: nhorman, eric.dumazet, jpirko, netdev, adobriyan, robert.olsson
In-Reply-To: <20110721.150107.25773992475689131.davem@davemloft.net>

On 07/21/2011 03:01 PM, David Miller wrote:
> From: Neil Horman<nhorman@tuxdriver.com>
> Date: Wed, 20 Jul 2011 11:18:27 -0400
>
>> On Wed, Jul 20, 2011 at 06:24:15AM +0200, Eric Dumazet wrote:
>>> Le mardi 19 juillet 2011 à 22:07 -0400, Neil Horman a écrit :
>>>>>
>>>> I think this is a good idea.  It lets pktgen dynamically make the clone/share
>>>> decision dynamically and only impacts performance for those systems.
>>>>
>>>
>>> Just let pktgen refuse to use clone_skb command for these devices.
>>>
>> copy that, This is by no means final, but what do you think of this?  If its
>> agreeable to you, Ben, et al. I can add this to my local tree and start auditing
>> all the drivers that may need to have the flag set.
>
> I think there is a much simpler solution.
>
> Set a flag in the SKB when pktgen does SKB sharing.
>
> In dev_queue_xmit() (or perhaps, dev_hard_start_xmit()), check the flag
> and if it's set then we copy the SKB.
>
> If this works, then we fix the crash and no driver changes are
> necessary both now and in the future.

Doesn't that make clone-skb in pktgen much less efficient
in all cases?

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [PATCH] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods
From: David Miller @ 2011-07-21 22:01 UTC (permalink / raw)
  To: nhorman; +Cc: eric.dumazet, greearb, jpirko, netdev, adobriyan, robert.olsson
In-Reply-To: <20110720151827.GD12349@hmsreliant.think-freely.org>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 20 Jul 2011 11:18:27 -0400

> On Wed, Jul 20, 2011 at 06:24:15AM +0200, Eric Dumazet wrote:
>> Le mardi 19 juillet 2011 à 22:07 -0400, Neil Horman a écrit :
>> > > 
>> > I think this is a good idea.  It lets pktgen dynamically make the clone/share
>> > decision dynamically and only impacts performance for those systems.
>> > 
>> 
>> Just let pktgen refuse to use clone_skb command for these devices.
>> 
> copy that, This is by no means final, but what do you think of this?  If its
> agreeable to you, Ben, et al. I can add this to my local tree and start auditing
> all the drivers that may need to have the flag set.

I think there is a much simpler solution.

Set a flag in the SKB when pktgen does SKB sharing.

In dev_queue_xmit() (or perhaps, dev_hard_start_xmit()), check the flag
and if it's set then we copy the SKB.

If this works, then we fix the crash and no driver changes are
necessary both now and in the future.

^ permalink raw reply

* Re: [patch net-next-2.6 v2] skbuff: fix error handling in pskb_copy()
From: David Miller @ 2011-07-21 21:48 UTC (permalink / raw)
  To: eric.dumazet; +Cc: error27, xma, mirq-linux, netdev, kernel-janitors
In-Reply-To: <1311152386.2338.8.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Jul 2011 10:59:46 +0200

> Le mercredi 20 juillet 2011 à 11:51 +0300, Dan Carpenter a écrit :
>> There are two problems:
>> 1) "n" was allocated with alloc_skb() so we should free it with
>>    kfree_skb() instead of regular kfree().
>> 2) We return the freed pointer instead of NULL.
>> 
>> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> Thanks Dan
> 
> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks everyone.

^ permalink raw reply

* Re: [patch net-next-2.6 37/47] igb: do vlan cleanup
From: Jesse Gross @ 2011-07-21 21:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, shemminger, eric.dumazet, greearb, mirqus,
	jeffrey.t.kirsher, jesse.brandeburg, peter.p.waskiewicz.jr,
	bruce.w.allan, carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	alexander.h.duyck, john.ronciak, e1000-devel
In-Reply-To: <20110721065721.GA2199@minipsycho>

On Wed, Jul 20, 2011 at 11:57 PM, Jiri Pirko <jpirko@redhat.com> wrote:
> Thu, Jul 21, 2011 at 01:58:10AM CEST, jesse@nicira.com wrote:
>>On Wed, Jul 20, 2011 at 12:10 PM, Jiri Pirko <jpirko@redhat.com> wrote:
>>> Wed, Jul 20, 2011 at 07:35:33PM CEST, jesse@nicira.com wrote:
>>>>On Wed, Jul 20, 2011 at 7:54 AM, Jiri Pirko <jpirko@redhat.com> wrote:
>>>>> @@ -2943,7 +2944,7 @@ static void igb_rlpml_set(struct igb_adapter *adapter)
>>>>>        struct e1000_hw *hw = &adapter->hw;
>>>>>        u16 pf_id = adapter->vfs_allocated_count;
>>>>>
>>>>> -       if (adapter->vlgrp)
>>>>> +       if (igb_vlan_used(adapter))
>>>>>                max_frame_size += VLAN_TAG_SIZE;
>>>>
>>>>There are similar issues here as with the VF driver.  I think you're
>>>>also confusing vlan acceleration with vlan filtering.  If no vlan
>>>>filters are in use but the card is in promiscuous mode, the buffer
>>>>will be undersized and we lose tagged packets.
>>>
>>> I'm certainly not confusing vlan accel and filtering. Here is the
>>> intension is the behaviour remains intact as well. I believe it's true.
>>
>>I believe the underlying issue for all three of these threads is the
>>same, so I'll just respond to them all here.
>>
>>I agree that this doesn't change the behavior of the driver but I
>>don't think that should be the goal.  When I originally designed this
>>new vlan model my intention was to eliminate a whole class of driver
>>bugs that I was repeatedly hitting in various forms.  In the example
>>above, if you run tcpdump on this device without configuring a vlan
>>group on it then you will see that MTU sized packets are missing
>>because the receive buffer was undersized.
>>
>>The common theme for these problems is that they all occur in
>>situations where vlans are not configured on the device and the driver
>>does something different as a result of this.  The solution was to
>>prevent drivers from changing their behavior in such situations by
>>completely removing the concept of a vlan group from them and letting
>>the networking core tell them when to make the changes instead of
>>doing it implicitly.  That's why I don't see the fact that this change
>>essentially emulates the knowledge of configuring a group to be a
>>plus.  By the way, plenty of your other patches change the behavior of
>>the drivers - on any of the NICs that always enable stripping, try
>>running tcpdump on the interface without configuring a vlan group.
>>Before the change you will see that tags have disappeared and
>>afterwards the tags are intact.  So I think that changing the behavior
>>of drivers in this regard is a positive thing.
>>
>>As an aside, thank you for taking the time to work on all of these
>>drivers.  The only reason why I'm complaining about these few drivers
>>is because I'd like to close the door on this class of problems, which
>>is finally in reach thanks to your work.
>
>
> Okay now it's clear to me. I tried to stay with the code as much similar
> as unpatched. But I see your arguments. I will review and repost
> patches which are enabling/disabling vlan accel on add_vid/kill_vid and
> convert it to set_features.
>
> Thanks. Jesse.

All the new changes look good to me, thanks Jiri.

^ permalink raw reply

* Re: [PATCH 1/2] igb: Allow extra 4 bytes on RX for vlan tags.
From: Jesse Gross @ 2011-07-21 21:44 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: jeffrey.t.kirsher, Ben Greear, netdev@vger.kernel.org,
	Duyck, Alexander H
In-Reply-To: <CAKgT0UfaEEvRTSpu-U+0_oj0KnEkyx5hRAwZDiCAAdtY4YhQUQ@mail.gmail.com>

On Wed, Jul 20, 2011 at 11:35 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Wed, Jul 20, 2011 at 6:21 PM, Jeff Kirsher
> <jeffrey.t.kirsher@intel.com> wrote:
>> On Wed, 2011-07-20 at 17:27 -0700, Ben Greear wrote:
>>> On 07/20/2011 05:18 PM, Jesse Gross wrote:
>>> > On Thu, Feb 17, 2011 at 9:28 AM, Ben Greear<greearb@candelatech.com>  wrote:
>>> >> On 02/17/2011 03:04 AM, Jeff Kirsher wrote:
>>> >>>
>>> >>> On Thu, Feb 10, 2011 at 13:59,<greearb@candelatech.com>    wrote:
>>> >>>>
>>> >>>> From: Ben Greear<greearb@candelatech.com>
>>> >>>>
>>> >>>> This allows the NIC to receive 1518 byte (not counting
>>> >>>> FCS) packets when MTU is 1500, thus allowing 1500 MTU
>>> >>>> VLAN frames to be received.  Please note that no VLANs
>>> >>>> were actually configured on the NIC...it was just acting
>>> >>>> as pass-through device.
>>> >>>>
>>> >>>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>>> >>>> ---
>>> >>>> :100644 100644 58c665b... 30c9cc6... M  drivers/net/igb/igb_main.c
>>> >>>>   drivers/net/igb/igb_main.c |    5 +++--
>>> >>>>   1 files changed, 3 insertions(+), 2 deletions(-)
>>> >>>>
>>> >>>> diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
>>> >>>> index 58c665b..30c9cc6 100644
>>> >>>> --- a/drivers/net/igb/igb_main.c
>>> >>>> +++ b/drivers/net/igb/igb_main.c
>>> >>>> @@ -2281,7 +2281,8 @@ static int __devinit igb_sw_init(struct igb_adapter
>>> >>>> *adapter)
>>> >>>>         adapter->rx_itr_setting = IGB_DEFAULT_ITR;
>>> >>>>         adapter->tx_itr_setting = IGB_DEFAULT_ITR;
>>> >>>>
>>> >>>> -       adapter->max_frame_size = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
>>> >>>> +       adapter->max_frame_size = (netdev->mtu + ETH_HLEN + ETH_FCS_LEN
>>> >>>> +                                  + VLAN_HLEN);
>>> >>>>         adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
>>> >>>>
>>> >>>>         spin_lock_init(&adapter->stats64_lock);
>>> >>>> @@ -4303,7 +4304,7 @@ static int igb_change_mtu(struct net_device
>>> >>>> *netdev, int new_mtu)
>>> >>>>   {
>>> >>>>         struct igb_adapter *adapter = netdev_priv(netdev);
>>> >>>>         struct pci_dev *pdev = adapter->pdev;
>>> >>>> -       int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
>>> >>>> +       int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>>> >>>>         u32 rx_buffer_len, i;
>>> >>>>
>>> >>>>         if ((new_mtu<    68) || (max_frame>    MAX_JUMBO_FRAME_SIZE)) {
>>> >>>
>>> >>> While testing this patch, validation found that the patch reduces the
>>> >>> maximum mtu size
>>> >>> by 4 bytes (reduces it from 9216 to 9212).  This is not a desired side
>>> >>> effect of this patch.
>>> >>
>>> >> You could add handling for that case and have it act as it used to when
>>> >> new_mtu is greater than 9212?
>>> >>
>>> >> I tested e1000e and it worked w/out hacking at 1500 MTU, so maybe
>>> >> check how it does it?
>>> >
>>> > I just wanted to bring this up again to see if any progress had been
>>> > made.  We were looking at this driver and trying to figure out the
>>> > best way to convert it to use the new vlan model but I'm not familiar
>>>
>>> I've been watching :)
>>>
>>> > enough with the hardware to know.  It seems that all of the other
>>> > Intel drivers unconditionally add space for the vlan tag to the
>>> > receive buffer (and would therefore have similar effects as this
>>> > patch), is there something different about this card?
>>> >
>>> > I believe that Alex was working on something in this area (in the
>>> > context of one of my patches from a long time ago) but I'm not sure
>>> > what came of that.
>>>
>>> Truth is, I don't really see why it's a problem to decrease the
>>> maximum MTU slightly in order to make it work with VLANs.
>>>
>>> I'm not sure if there is some way to make it work with VLANs
>>> and not decrease the maximum MTU.
>>
>> This was the reason this did not get accepted.  I was looking into what
>> could be done so that we did not decease the maximum MTU, but I got
>> side-tracked and have not done anything on it in several months.
>>
>
> I can take a look at fixing this most likely tomorrow.  I have some
> work planned for igb anyway over the next few days.
>
> Odds are it is just a matter of where the VLAN_HLEN is added.  As I
> recall for our drivers the correct spot is in the setting of
> rx_buffer_len since that is the area more concerned with maximum
> receive frame size versus the mtu section which is more concerned with
> the transmit side of things.

Thanks Alex.

^ permalink raw reply

* Re: [patch net-next-2.6 00/47] vlan cleanup
From: David Miller @ 2011-07-21 20:57 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, shemminger, eric.dumazet, greearb, mirqus
In-Reply-To: <1311173689-17419-1-git-send-email-jpirko@redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 20 Jul 2011 16:54:02 +0200

> This patchset converts several drivers to new vlan model.
> Also kills several vlan helpers:
> lro_vlan_hwaccel_receive_skb
> lro_vlan_hwaccel_receive_frags
> __vlan_hwaccel_rx
> vlan_hwaccel_rx
> vlan_hwaccel_receive_skb
> vlan_gro_frags
> vlan_gro_receive
> 
> Removes ndo_vlan_rx_register
> 
> note gianfar patch is dependent on "[PATCH] gianfar: rx parser"

Ok, I applied the latest version of all of the patches (except the
'rx parser' one of course :-), after merging net-2.6 into net-next-2.6

Thanks!

^ permalink raw reply

* Re: [PATCH net-next 5/5] bnx2x: Broken self-test in SF mode on 578xx
From: David Miller @ 2011-07-21 20:01 UTC (permalink / raw)
  To: vladz; +Cc: netdev, eilong
In-Reply-To: <201107212058.54522.vladz@broadcom.com>

From: "Vlad Zolotarov" <vladz@broadcom.com>
Date: Thu, 21 Jul 2011 20:58:54 +0300

> This patch fixes both the failure in the self-test on 578xx
> and a hole in a parity recovery flow that this failure
> has discovered: 
>  - internal 'pending' state in a VLAN_MAC object wasn't been cleared
> when the object state change was called with DRV_ONLY flag, which in
> particular happens when a parity error happens during the self-test.
>  - bp->sp_state wasn't cleared in the similar circumstances as described 
> above.
> 
> Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 4/5] bnx2x: Parity errors recovery for 578xx
From: David Miller @ 2011-07-21 20:01 UTC (permalink / raw)
  To: vladz; +Cc: netdev, eilong
In-Reply-To: <201107212058.36462.vladz@broadcom.com>

From: "Vlad Zolotarov" <vladz@broadcom.com>
Date: Thu, 21 Jul 2011 20:58:36 +0300

> Fix the parity errors recovery flow for 578xx:
>     - Add a separate column for the 578xx in the parity mask
>       registers DB.
>     - Fix the bnx2x_process_kill_chip_reset() to handle the blocks
>       newly introduced in the 578xx.
> 
> Cover ATC and PGLUE_B blocks for 57712 and 578xx.
> 
> Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox