Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2] sctp: be more restrictive in transport selection on bundled sacks
From: Neil Horman @ 2012-06-28 18:07 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, David S. Miller, linux-sctp
In-Reply-To: <4FEC7F40.8070707@gmail.com>

On Thu, Jun 28, 2012 at 11:58:56AM -0400, Vlad Yasevich wrote:
> On 06/28/2012 11:33 AM, Neil Horman wrote:
> >On Wed, Jun 27, 2012 at 03:44:22PM -0400, Vlad Yasevich wrote:
> >>On 06/27/2012 01:28 PM, Neil Horman wrote:
> >>>On Wed, Jun 27, 2012 at 11:10:26AM -0400, Vlad Yasevich wrote:
> >>>>
> >>>>I didn't think of this yesterday, but I think it would be much
> >>>>better to use pkt->transport here since you are adding the chunk to
> >>>>the packet and it will go out on the transport of the packet.  You
> >>>>are also guaranteed that pkt->transport is set.
> >>>>
> >>>I don't think it really matters, as the chunk transport is used to lookup the
> >>>packet that we append to, and if the chunk transport is unset, its somewhat
> >>>questionable as to weather we should bundle, but if packet->transport is set,
> >>>its probably worth it to avoid the extra conditional.
> >>>
> >>
> >>Just looked at the code flow.  chunk->transport may not be set until
> >>the end of sctp_packet_append_chunk.  For new data, transport may
> >>not be set.  For retransmitted data, transport is set to last
> >>transport data was sent on.  So, we could be looking at the wrong
> >>transport.  What you are trying to decided is if the current
> >>transport we will be used can take the SACK, but you may not be
> >>looking at the current transport. Looking at packet->transport is
> >>the correct thing to do.
> >>
> >>-vlad
> >>
> >So, I agree after what you said above, that this is the right thing to do.  That
> >said, I just tested the change with the SCTP_RR test in netperf, and it wound up
> >giving me horrid performance (Its reporting about 5 transactions per second).
> >It appears that whats happening is that, because the test alternates which
> >transports it sends out, and because it waits for a sack of teh prior packet
> >before it sends out the next transaction, we're always missing the bundle
> >opportunity, and always waiting for the 200ms timeout for the sack to occur.
> >While I know this is a pessimal case, it really seems bad to me.  It seems that
> >because I was using chunk->transport previously, I luckily got the transport
> >wrong sometimes, and it managed to bundle more often.
> >
> >So I'm not sure what to do here.  I had really wanted to avoid adding a sysctl
> >here, but given that this is likely a corner cases, it seems that might be the
> >best approach.  Do you have any thoughts?
> >
> >Neil
> >
> 
> that's strange.  did you modify the SCTP_RR to alternate transports?
> Seems like responses in the RR test need to go the address of the
> sender so that we don't see things like:
> Request (t) --->
>             <--- Response (t2)
> 
> Should be:
> Request (t1) --->
>              <--- Response (t1)
> 
> 
> -vlad
That would seem to me to be the case too....

However, having looked at this some more, it seems I just jumped the gun on
this. Its happening because sctp_eat_data variants are issuing a SCTP_GEN_SACK
command at the end of every received packet, which causes the moved_ctsn value
to get cleared.  We follow the sack every other packet rule instead of taking
the opportunity to bundle on send, so we're sending a packet with a sack, and a
second packet with a 1 byte data chunk (thats part of the SCTP_RR test).

I'm not sure why I didn't see this when I was using the chunk->transport
pointer.  Maybe I was just getting lucky with timing...

I'll see how I can go about fixing this.

Neil

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH net-next] fq_codel: report congestion notification at enqueue time
From: Eric Dumazet @ 2012-06-28 18:12 UTC (permalink / raw)
  To: Dave Taht
  Cc: Nandita Dukkipati, netdev, codel, Yuchung Cheng, Neal Cardwell,
	David Miller, Matt Mathis
In-Reply-To: <CAA93jw7agn2J6Hd7x22KWhENY5jqVjnk6uRr=3LJ5Anw7EgacQ@mail.gmail.com>

On Thu, 2012-06-28 at 10:51 -0700, Dave Taht wrote:

> clever idea. A problem is there are other forms of network traffic on
> a link, and this is punishing a single tcp
> stream that may not be the source of the problem in the first place,
> and basically putting it into double jeopardy.
> 

Why ? In fact this patch helps the tcp session being signaled (as it
will be anyway) at enqueue time, instead of having to react to packet
losses indications given (after RTT) by receiver.

Avoiding losses help receiver to consume data without having to buffer
it into Out Of Order queue.

So its not jeopardy, but early congestion notification without RTT
delay.

NET_XMIT_CN is a soft signal, far more disruptive than a DROP.

> I am curious as to how often an enqueue is actually dropping in the
> codel/fq_codel case, the hope was that there would be plenty of
> headroom under far more circumstances on this qdisc.
> 

"tc -s qdisc show dev eth0" can show you all the counts.

We never drop a packet at enqueue time, unless you hit the emergency
limit (10240 packets for fq_codel). When you reach this limit, you are
under trouble.

^ permalink raw reply

* RE: [PATCH v2] l2tp: use per-cpu variables for u64_stats updates
From: Ben Hutchings @ 2012-06-28 18:17 UTC (permalink / raw)
  To: David Laight
  Cc: Eric Dumazet, Rick Jones, Ben Greear, Stephen Hemminger,
	Tom Parkin, netdev, James Chapman
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B6F6B@saturn3.aculab.com>

On Thu, 2012-06-28 at 09:46 +0100, David Laight wrote:
> > [1] : LLTX drivers case 
> >    since ndo_start_xmit() can be run concurrently by many cpus, safely
> > updating an "unsigned long" requires additional hassle :
> > 
> >    1) Use of a spinlock to protect the update.
> >    2) Use atomic_long_t instead of "unsigned long"
> >    3) Use percpu data
> 
> 4) These are statistics so it shouldn't really matter if
>    they are out by a small number.

You might be surprised just how much end users care about statistics.

>    So the errors
>    introduced by unlocked updates can probably be ignored.
>    So just use 'unsigned long'.

This can result in statistics being decremented in case of a race.  That
should never be allowed to happen.

Ben.

>    Might be worth putting in gcc barriers so that the
>    load and store instructions aren't separated too far.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH v2] sctp: be more restrictive in transport selection on bundled sacks
From: Vlad Yasevich @ 2012-06-28 18:22 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, David S. Miller, linux-sctp
In-Reply-To: <20120628180721.GC29277@hmsreliant.think-freely.org>

On 06/28/2012 02:07 PM, Neil Horman wrote:
> On Thu, Jun 28, 2012 at 11:58:56AM -0400, Vlad Yasevich wrote:
>> On 06/28/2012 11:33 AM, Neil Horman wrote:
>>> On Wed, Jun 27, 2012 at 03:44:22PM -0400, Vlad Yasevich wrote:
>>>> On 06/27/2012 01:28 PM, Neil Horman wrote:
>>>>> On Wed, Jun 27, 2012 at 11:10:26AM -0400, Vlad Yasevich wrote:
>>>>>>
>>>>>> I didn't think of this yesterday, but I think it would be much
>>>>>> better to use pkt->transport here since you are adding the chunk to
>>>>>> the packet and it will go out on the transport of the packet.  You
>>>>>> are also guaranteed that pkt->transport is set.
>>>>>>
>>>>> I don't think it really matters, as the chunk transport is used to lookup the
>>>>> packet that we append to, and if the chunk transport is unset, its somewhat
>>>>> questionable as to weather we should bundle, but if packet->transport is set,
>>>>> its probably worth it to avoid the extra conditional.
>>>>>
>>>>
>>>> Just looked at the code flow.  chunk->transport may not be set until
>>>> the end of sctp_packet_append_chunk.  For new data, transport may
>>>> not be set.  For retransmitted data, transport is set to last
>>>> transport data was sent on.  So, we could be looking at the wrong
>>>> transport.  What you are trying to decided is if the current
>>>> transport we will be used can take the SACK, but you may not be
>>>> looking at the current transport. Looking at packet->transport is
>>>> the correct thing to do.
>>>>
>>>> -vlad
>>>>
>>> So, I agree after what you said above, that this is the right thing to do.  That
>>> said, I just tested the change with the SCTP_RR test in netperf, and it wound up
>>> giving me horrid performance (Its reporting about 5 transactions per second).
>>> It appears that whats happening is that, because the test alternates which
>>> transports it sends out, and because it waits for a sack of teh prior packet
>>> before it sends out the next transaction, we're always missing the bundle
>>> opportunity, and always waiting for the 200ms timeout for the sack to occur.
>>> While I know this is a pessimal case, it really seems bad to me.  It seems that
>>> because I was using chunk->transport previously, I luckily got the transport
>>> wrong sometimes, and it managed to bundle more often.
>>>
>>> So I'm not sure what to do here.  I had really wanted to avoid adding a sysctl
>>> here, but given that this is likely a corner cases, it seems that might be the
>>> best approach.  Do you have any thoughts?
>>>
>>> Neil
>>>
>>
>> that's strange.  did you modify the SCTP_RR to alternate transports?
>> Seems like responses in the RR test need to go the address of the
>> sender so that we don't see things like:
>> Request (t) --->
>>              <--- Response (t2)
>>
>> Should be:
>> Request (t1) --->
>>               <--- Response (t1)
>>
>>
>> -vlad
> That would seem to me to be the case too....
>
> However, having looked at this some more, it seems I just jumped the gun on
> this. Its happening because sctp_eat_data variants are issuing a SCTP_GEN_SACK
> command at the end of every received packet, which causes the moved_ctsn value
> to get cleared.

Ok, that should only happen the very first time as we are supposed to 
ack the first data immediately.  On subsequent packets it should just 
start a timer because we are following the 2pkt/200ms rule.
Then, when the response happens, we should bundle the SACK as long as 
the data is leaving on the transport that moved the CTSN.

So we might be using the wrong transport and as result you send data and 
then end up waiting for a SACK.

-vlad

>  We follow the sack every other packet rule instead of taking
> the opportunity to bundle on send, so we're sending a packet with a sack, and a
> second packet with a 1 byte data chunk (thats part of the SCTP_RR test).
>
> I'm not sure why I didn't see this when I was using the chunk->transport
> pointer.  Maybe I was just getting lucky with timing...
>
> I'll see how I can go about fixing this.
>
> Neil
>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply

* Re: Sending Packet Much Larger Than MSS
From: George B. @ 2012-06-28 18:28 UTC (permalink / raw)
  To: netdev
In-Reply-To: <CAJ2iOjQGrnhG7L1jDAXybJiG7O5OpnaDzw5UBEuN2kEWXYGUnQ@mail.gmail.com>

On Thu, Jun 28, 2012 at 10:21 AM, George B. <georgeb@gmail.com> wrote:

> Is this a known issue with that kernel?
>
> Thanks.
>
> George

Ok, I am probably seeing GSO/TSO here.  Let me turn that off and try
again.  Sorry to bother.

George

^ permalink raw reply

* Re: [PATCH v2] sctp: be more restrictive in transport selection on bundled sacks
From: Neil Horman @ 2012-06-28 18:36 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, David S. Miller, linux-sctp
In-Reply-To: <4FECA0CF.10400@gmail.com>

On Thu, Jun 28, 2012 at 02:22:07PM -0400, Vlad Yasevich wrote:
> On 06/28/2012 02:07 PM, Neil Horman wrote:
> >On Thu, Jun 28, 2012 at 11:58:56AM -0400, Vlad Yasevich wrote:
> >>On 06/28/2012 11:33 AM, Neil Horman wrote:
> >>>On Wed, Jun 27, 2012 at 03:44:22PM -0400, Vlad Yasevich wrote:
> >>>>On 06/27/2012 01:28 PM, Neil Horman wrote:
> >>>>>On Wed, Jun 27, 2012 at 11:10:26AM -0400, Vlad Yasevich wrote:
> >>>>>>
> >>>>>>I didn't think of this yesterday, but I think it would be much
> >>>>>>better to use pkt->transport here since you are adding the chunk to
> >>>>>>the packet and it will go out on the transport of the packet.  You
> >>>>>>are also guaranteed that pkt->transport is set.
> >>>>>>
> >>>>>I don't think it really matters, as the chunk transport is used to lookup the
> >>>>>packet that we append to, and if the chunk transport is unset, its somewhat
> >>>>>questionable as to weather we should bundle, but if packet->transport is set,
> >>>>>its probably worth it to avoid the extra conditional.
> >>>>>
> >>>>
> >>>>Just looked at the code flow.  chunk->transport may not be set until
> >>>>the end of sctp_packet_append_chunk.  For new data, transport may
> >>>>not be set.  For retransmitted data, transport is set to last
> >>>>transport data was sent on.  So, we could be looking at the wrong
> >>>>transport.  What you are trying to decided is if the current
> >>>>transport we will be used can take the SACK, but you may not be
> >>>>looking at the current transport. Looking at packet->transport is
> >>>>the correct thing to do.
> >>>>
> >>>>-vlad
> >>>>
> >>>So, I agree after what you said above, that this is the right thing to do.  That
> >>>said, I just tested the change with the SCTP_RR test in netperf, and it wound up
> >>>giving me horrid performance (Its reporting about 5 transactions per second).
> >>>It appears that whats happening is that, because the test alternates which
> >>>transports it sends out, and because it waits for a sack of teh prior packet
> >>>before it sends out the next transaction, we're always missing the bundle
> >>>opportunity, and always waiting for the 200ms timeout for the sack to occur.
> >>>While I know this is a pessimal case, it really seems bad to me.  It seems that
> >>>because I was using chunk->transport previously, I luckily got the transport
> >>>wrong sometimes, and it managed to bundle more often.
> >>>
> >>>So I'm not sure what to do here.  I had really wanted to avoid adding a sysctl
> >>>here, but given that this is likely a corner cases, it seems that might be the
> >>>best approach.  Do you have any thoughts?
> >>>
> >>>Neil
> >>>
> >>
> >>that's strange.  did you modify the SCTP_RR to alternate transports?
> >>Seems like responses in the RR test need to go the address of the
> >>sender so that we don't see things like:
> >>Request (t) --->
> >>             <--- Response (t2)
> >>
> >>Should be:
> >>Request (t1) --->
> >>              <--- Response (t1)
> >>
> >>
> >>-vlad
> >That would seem to me to be the case too....
> >
> >However, having looked at this some more, it seems I just jumped the gun on
> >this. Its happening because sctp_eat_data variants are issuing a SCTP_GEN_SACK
> >command at the end of every received packet, which causes the moved_ctsn value
> >to get cleared.
> 
> Ok, that should only happen the very first time as we are supposed
> to ack the first data immediately.  On subsequent packets it should
> just start a timer because we are following the 2pkt/200ms rule.
> Then, when the response happens, we should bundle the SACK as long
> as the data is leaving on the transport that moved the CTSN.
> 
> So we might be using the wrong transport and as result you send data
> and then end up waiting for a SACK.
> 
Thats what I'm looking into now.  I'll let you know when I've figured it out
Neil

^ permalink raw reply

* pull request: wireless 2012-06-28
From: John W. Linville @ 2012-06-28 18:40 UTC (permalink / raw)
  To: davem; +Cc: linux-wireless, netdev, linux-kernel

commit de03309bd209b6fb025e9359266e0cbb441f7441

Dave,

These fixes are intended for 3.5...

Amitkumar Karwar gives us two mwifiex fixes: one fixes some skb
manipulations when handling some event messages; and another that
does some similar fixing on an error path.

Avinash Patil gives us a fix for for a memory leak in mwifiex.

Dan Rosenberg offers an NFC NCI fix to enforce some message length
limits to prevent buffer overflows.

Eliad Peller provides a mac80211 fix to prevent some frames from
being built with an invalid BSSID.

Eric Dumazet sends an NFC fix to prevent a BUG caused by a NULL
pointer dereference.

Felix Fietkau has an ath9k fix for a regression causing
LEAP-authenticated connection failures.

Johannes Berg provides an iwlwifi fix that eliminates some log SPAM
after an authentication/association timeout.  He also provides a
mac80211 fix to prevent incorrectly addressing certain action frames
(and in so doing, to comply with the 802.11 specs).

Larry Finger provides a few USB IDs for the rtl8192cu driver --
should be harmless.

Panayiotis Karabassis provices a one-liner to fix kernel bug 42903
(a system freeze).

Randy Dunlap provides a one-line Kconfig change to prevent build
failures with some configurations.

Stone Piao provides an mwifiex sequence numbering fix and a fix
to prevent mwifiex from attempting to include eapol frames in an
aggregation frame.

Finally, Tom Hughes provides an ath9k fix for a NULL pointer
dereference.

Please let me know if there are problems!

John

---

The following changes since commit a969dd139cc2f2bccdcb11894f0695517cf84d4d:

  Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can (2012-06-27 15:27:24 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless.git for-davem

for you to fetch changes up to de03309bd209b6fb025e9359266e0cbb441f7441:

  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem (2012-06-28 13:47:53 -0400)

----------------------------------------------------------------

Amitkumar Karwar (2):
      mwifiex: fix bugs in event handling code
      mwifiex: improve error path handling in usb.c

Avinash Patil (1):
      mwifiex: fix memory leak associated with IE manamgement

Dan Rosenberg (1):
      NFC: Prevent multiple buffer overflows in NCI

Eliad Peller (1):
      mac80211: clear ifmgd->bssid only after building DELBA

Eric Dumazet (1):
      NFC: Return from rawsock_release when sk is NULL

Felix Fietkau (1):
      ath9k: fix dynamic WEP related regression

Johannes Berg (2):
      iwlwifi: fix activating inactive stations
      mac80211: correct behaviour on unrecognised action frames

John W. Linville (3):
      Merge branch 'for-john' of git://git.kernel.org/.../jberg/mac80211
      Merge branch 'for-wireless' of git://git.kernel.org/.../sameo/nfc-3.0
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

Larry Finger (1):
      rtlwifi: rtl8192cu: New USB IDs

Panayiotis Karabassis (1):
      ath9k: enable serialize_regmode for non-PCIE AR9287

Randy Dunlap (1):
      wlcore: drop INET dependency

Stone Piao (2):
      mwifiex: fix 11n rx packet drop issue
      mwifiex: fix WPS eapol handshake failure

Tom Hughes (1):
      ath9k: fix panic caused by returning a descriptor we have queued for reuse

 drivers/net/wireless/ath/ath.h               |    1 +
 drivers/net/wireless/ath/ath9k/hw.c          |    2 +-
 drivers/net/wireless/ath/ath9k/recv.c        |    7 ++++---
 drivers/net/wireless/ath/key.c               |    4 ++++
 drivers/net/wireless/iwlwifi/iwl-mac80211.c  |   12 +++++++++++
 drivers/net/wireless/mwifiex/11n_rxreorder.c |    5 +++--
 drivers/net/wireless/mwifiex/11n_rxreorder.h |    7 +++++++
 drivers/net/wireless/mwifiex/ie.c            |    1 +
 drivers/net/wireless/mwifiex/sdio.c          |    6 +++---
 drivers/net/wireless/mwifiex/sta_event.c     |    9 ++++-----
 drivers/net/wireless/mwifiex/usb.c           |   28 ++++++++++++++++++--------
 drivers/net/wireless/mwifiex/wmm.c           |    3 +++
 drivers/net/wireless/rtlwifi/rtl8192cu/sw.c  |    3 +++
 drivers/net/wireless/ti/wlcore/Kconfig       |    1 -
 net/mac80211/mlme.c                          |   13 ++++++------
 net/mac80211/rx.c                            |    5 ++++-
 net/nfc/nci/ntf.c                            |   10 ++++-----
 net/nfc/rawsock.c                            |    5 ++++-
 18 files changed, 85 insertions(+), 37 deletions(-)

diff --git a/drivers/net/wireless/ath/ath.h b/drivers/net/wireless/ath/ath.h
index c54b7d37..420d69b 100644
--- a/drivers/net/wireless/ath/ath.h
+++ b/drivers/net/wireless/ath/ath.h
@@ -143,6 +143,7 @@ struct ath_common {
 	u32 keymax;
 	DECLARE_BITMAP(keymap, ATH_KEYMAX);
 	DECLARE_BITMAP(tkip_keymap, ATH_KEYMAX);
+	DECLARE_BITMAP(ccmp_keymap, ATH_KEYMAX);
 	enum ath_crypt_caps crypt_caps;
 
 	unsigned int clockrate;
diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c
index 1c68e56..995ca8e 100644
--- a/drivers/net/wireless/ath/ath9k/hw.c
+++ b/drivers/net/wireless/ath/ath9k/hw.c
@@ -622,7 +622,7 @@ static int __ath9k_hw_init(struct ath_hw *ah)
 
 	if (NR_CPUS > 1 && ah->config.serialize_regmode == SER_REG_MODE_AUTO) {
 		if (ah->hw_version.macVersion == AR_SREV_VERSION_5416_PCI ||
-		    ((AR_SREV_9160(ah) || AR_SREV_9280(ah)) &&
+		    ((AR_SREV_9160(ah) || AR_SREV_9280(ah) || AR_SREV_9287(ah)) &&
 		     !ah->is_pciexpress)) {
 			ah->config.serialize_regmode =
 				SER_REG_MODE_ON;
diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c
index e1fcc68..0735aeb 100644
--- a/drivers/net/wireless/ath/ath9k/recv.c
+++ b/drivers/net/wireless/ath/ath9k/recv.c
@@ -695,9 +695,9 @@ static bool ath_edma_get_buffers(struct ath_softc *sc,
 			__skb_unlink(skb, &rx_edma->rx_fifo);
 			list_add_tail(&bf->list, &sc->rx.rxbuf);
 			ath_rx_edma_buf_link(sc, qtype);
-		} else {
-			bf = NULL;
 		}
+
+		bf = NULL;
 	}
 
 	*dest = bf;
@@ -822,7 +822,8 @@ static bool ath9k_rx_accept(struct ath_common *common,
 	 * descriptor does contain a valid key index. This has been observed
 	 * mostly with CCMP encryption.
 	 */
-	if (rx_stats->rs_keyix == ATH9K_RXKEYIX_INVALID)
+	if (rx_stats->rs_keyix == ATH9K_RXKEYIX_INVALID ||
+	    !test_bit(rx_stats->rs_keyix, common->ccmp_keymap))
 		rx_stats->rs_status &= ~ATH9K_RXERR_KEYMISS;
 
 	if (!rx_stats->rs_datalen) {
diff --git a/drivers/net/wireless/ath/key.c b/drivers/net/wireless/ath/key.c
index 0e81904..5c54aa4 100644
--- a/drivers/net/wireless/ath/key.c
+++ b/drivers/net/wireless/ath/key.c
@@ -556,6 +556,9 @@ int ath_key_config(struct ath_common *common,
 		return -EIO;
 
 	set_bit(idx, common->keymap);
+	if (key->cipher == WLAN_CIPHER_SUITE_CCMP)
+		set_bit(idx, common->ccmp_keymap);
+
 	if (key->cipher == WLAN_CIPHER_SUITE_TKIP) {
 		set_bit(idx + 64, common->keymap);
 		set_bit(idx, common->tkip_keymap);
@@ -582,6 +585,7 @@ void ath_key_delete(struct ath_common *common, struct ieee80211_key_conf *key)
 		return;
 
 	clear_bit(key->hw_key_idx, common->keymap);
+	clear_bit(key->hw_key_idx, common->ccmp_keymap);
 	if (key->cipher != WLAN_CIPHER_SUITE_TKIP)
 		return;
 
diff --git a/drivers/net/wireless/iwlwifi/iwl-mac80211.c b/drivers/net/wireless/iwlwifi/iwl-mac80211.c
index 3ee23134..0136803 100644
--- a/drivers/net/wireless/iwlwifi/iwl-mac80211.c
+++ b/drivers/net/wireless/iwlwifi/iwl-mac80211.c
@@ -796,6 +796,18 @@ int iwlagn_mac_sta_state(struct ieee80211_hw *hw,
 	switch (op) {
 	case ADD:
 		ret = iwlagn_mac_sta_add(hw, vif, sta);
+		if (ret)
+			break;
+		/*
+		 * Clear the in-progress flag, the AP station entry was added
+		 * but we'll initialize LQ only when we've associated (which
+		 * would also clear the in-progress flag). This is necessary
+		 * in case we never initialize LQ because association fails.
+		 */
+		spin_lock_bh(&priv->sta_lock);
+		priv->stations[iwl_sta_id(sta)].used &=
+			~IWL_STA_UCODE_INPROGRESS;
+		spin_unlock_bh(&priv->sta_lock);
 		break;
 	case REMOVE:
 		ret = iwlagn_mac_sta_remove(hw, vif, sta);
diff --git a/drivers/net/wireless/mwifiex/11n_rxreorder.c b/drivers/net/wireless/mwifiex/11n_rxreorder.c
index 9c44088..900ee12 100644
--- a/drivers/net/wireless/mwifiex/11n_rxreorder.c
+++ b/drivers/net/wireless/mwifiex/11n_rxreorder.c
@@ -256,7 +256,8 @@ mwifiex_11n_create_rx_reorder_tbl(struct mwifiex_private *priv, u8 *ta,
 	else
 		last_seq = priv->rx_seq[tid];
 
-	if (last_seq >= new_node->start_win)
+	if (last_seq != MWIFIEX_DEF_11N_RX_SEQ_NUM &&
+	    last_seq >= new_node->start_win)
 		new_node->start_win = last_seq + 1;
 
 	new_node->win_size = win_size;
@@ -596,5 +597,5 @@ void mwifiex_11n_cleanup_reorder_tbl(struct mwifiex_private *priv)
 	spin_unlock_irqrestore(&priv->rx_reorder_tbl_lock, flags);
 
 	INIT_LIST_HEAD(&priv->rx_reorder_tbl_ptr);
-	memset(priv->rx_seq, 0, sizeof(priv->rx_seq));
+	mwifiex_reset_11n_rx_seq_num(priv);
 }
diff --git a/drivers/net/wireless/mwifiex/11n_rxreorder.h b/drivers/net/wireless/mwifiex/11n_rxreorder.h
index f1bffeb..6c9815a 100644
--- a/drivers/net/wireless/mwifiex/11n_rxreorder.h
+++ b/drivers/net/wireless/mwifiex/11n_rxreorder.h
@@ -37,6 +37,13 @@
 
 #define ADDBA_RSP_STATUS_ACCEPT 0
 
+#define MWIFIEX_DEF_11N_RX_SEQ_NUM	0xffff
+
+static inline void mwifiex_reset_11n_rx_seq_num(struct mwifiex_private *priv)
+{
+	memset(priv->rx_seq, 0xff, sizeof(priv->rx_seq));
+}
+
 int mwifiex_11n_rx_reorder_pkt(struct mwifiex_private *,
 			       u16 seqNum,
 			       u16 tid, u8 *ta,
diff --git a/drivers/net/wireless/mwifiex/ie.c b/drivers/net/wireless/mwifiex/ie.c
index ceb82cd..383820a 100644
--- a/drivers/net/wireless/mwifiex/ie.c
+++ b/drivers/net/wireless/mwifiex/ie.c
@@ -213,6 +213,7 @@ mwifiex_update_uap_custom_ie(struct mwifiex_private *priv,
 		/* save assoc resp ie index after auto-indexing */
 		*assoc_idx = *((u16 *)pos);
 
+	kfree(ap_custom_ie);
 	return ret;
 }
 
diff --git a/drivers/net/wireless/mwifiex/sdio.c b/drivers/net/wireless/mwifiex/sdio.c
index e037747..fc8a9bf 100644
--- a/drivers/net/wireless/mwifiex/sdio.c
+++ b/drivers/net/wireless/mwifiex/sdio.c
@@ -978,10 +978,10 @@ static int mwifiex_decode_rx_packet(struct mwifiex_adapter *adapter,
 		dev_dbg(adapter->dev, "info: --- Rx: Event ---\n");
 		adapter->event_cause = *(u32 *) skb->data;
 
-		skb_pull(skb, MWIFIEX_EVENT_HEADER_LEN);
-
 		if ((skb->len > 0) && (skb->len  < MAX_EVENT_SIZE))
-			memcpy(adapter->event_body, skb->data, skb->len);
+			memcpy(adapter->event_body,
+			       skb->data + MWIFIEX_EVENT_HEADER_LEN,
+			       skb->len);
 
 		/* event cause has been saved to adapter->event_cause */
 		adapter->event_received = true;
diff --git a/drivers/net/wireless/mwifiex/sta_event.c b/drivers/net/wireless/mwifiex/sta_event.c
index 4ace5a3..11e731f 100644
--- a/drivers/net/wireless/mwifiex/sta_event.c
+++ b/drivers/net/wireless/mwifiex/sta_event.c
@@ -406,9 +406,9 @@ int mwifiex_process_sta_event(struct mwifiex_private *priv)
 		break;
 
 	case EVENT_UAP_STA_ASSOC:
-		skb_pull(adapter->event_skb, MWIFIEX_UAP_EVENT_EXTRA_HEADER);
 		memset(&sinfo, 0, sizeof(sinfo));
-		event = (struct mwifiex_assoc_event *)adapter->event_skb->data;
+		event = (struct mwifiex_assoc_event *)
+			(adapter->event_body + MWIFIEX_UAP_EVENT_EXTRA_HEADER);
 		if (le16_to_cpu(event->type) == TLV_TYPE_UAP_MGMT_FRAME) {
 			len = -1;
 
@@ -433,9 +433,8 @@ int mwifiex_process_sta_event(struct mwifiex_private *priv)
 				 GFP_KERNEL);
 		break;
 	case EVENT_UAP_STA_DEAUTH:
-		skb_pull(adapter->event_skb, MWIFIEX_UAP_EVENT_EXTRA_HEADER);
-		cfg80211_del_sta(priv->netdev, adapter->event_skb->data,
-				 GFP_KERNEL);
+		cfg80211_del_sta(priv->netdev, adapter->event_body +
+				 MWIFIEX_UAP_EVENT_EXTRA_HEADER, GFP_KERNEL);
 		break;
 	case EVENT_UAP_BSS_IDLE:
 		priv->media_connected = false;
diff --git a/drivers/net/wireless/mwifiex/usb.c b/drivers/net/wireless/mwifiex/usb.c
index 49ebf20..22a5916 100644
--- a/drivers/net/wireless/mwifiex/usb.c
+++ b/drivers/net/wireless/mwifiex/usb.c
@@ -49,6 +49,7 @@ static int mwifiex_usb_recv(struct mwifiex_adapter *adapter,
 	struct device *dev = adapter->dev;
 	u32 recv_type;
 	__le32 tmp;
+	int ret;
 
 	if (adapter->hs_activated)
 		mwifiex_process_hs_config(adapter);
@@ -69,16 +70,19 @@ static int mwifiex_usb_recv(struct mwifiex_adapter *adapter,
 		case MWIFIEX_USB_TYPE_CMD:
 			if (skb->len > MWIFIEX_SIZE_OF_CMD_BUFFER) {
 				dev_err(dev, "CMD: skb->len too large\n");
-				return -1;
+				ret = -1;
+				goto exit_restore_skb;
 			} else if (!adapter->curr_cmd) {
 				dev_dbg(dev, "CMD: no curr_cmd\n");
 				if (adapter->ps_state == PS_STATE_SLEEP_CFM) {
 					mwifiex_process_sleep_confirm_resp(
 							adapter, skb->data,
 							skb->len);
-					return 0;
+					ret = 0;
+					goto exit_restore_skb;
 				}
-				return -1;
+				ret = -1;
+				goto exit_restore_skb;
 			}
 
 			adapter->curr_cmd->resp_skb = skb;
@@ -87,20 +91,22 @@ static int mwifiex_usb_recv(struct mwifiex_adapter *adapter,
 		case MWIFIEX_USB_TYPE_EVENT:
 			if (skb->len < sizeof(u32)) {
 				dev_err(dev, "EVENT: skb->len too small\n");
-				return -1;
+				ret = -1;
+				goto exit_restore_skb;
 			}
 			skb_copy_from_linear_data(skb, &tmp, sizeof(u32));
 			adapter->event_cause = le32_to_cpu(tmp);
-			skb_pull(skb, sizeof(u32));
 			dev_dbg(dev, "event_cause %#x\n", adapter->event_cause);
 
 			if (skb->len > MAX_EVENT_SIZE) {
 				dev_err(dev, "EVENT: event body too large\n");
-				return -1;
+				ret = -1;
+				goto exit_restore_skb;
 			}
 
-			skb_copy_from_linear_data(skb, adapter->event_body,
-						  skb->len);
+			memcpy(adapter->event_body, skb->data +
+			       MWIFIEX_EVENT_HEADER_LEN, skb->len);
+
 			adapter->event_received = true;
 			adapter->event_skb = skb;
 			break;
@@ -124,6 +130,12 @@ static int mwifiex_usb_recv(struct mwifiex_adapter *adapter,
 	}
 
 	return -EINPROGRESS;
+
+exit_restore_skb:
+	/* The buffer will be reused for further cmds/events */
+	skb_push(skb, INTF_HEADER_LEN);
+
+	return ret;
 }
 
 static void mwifiex_usb_rx_complete(struct urb *urb)
diff --git a/drivers/net/wireless/mwifiex/wmm.c b/drivers/net/wireless/mwifiex/wmm.c
index f3fc655..3fa4d41 100644
--- a/drivers/net/wireless/mwifiex/wmm.c
+++ b/drivers/net/wireless/mwifiex/wmm.c
@@ -404,6 +404,8 @@ mwifiex_wmm_init(struct mwifiex_adapter *adapter)
 		priv->add_ba_param.tx_win_size = MWIFIEX_AMPDU_DEF_TXWINSIZE;
 		priv->add_ba_param.rx_win_size = MWIFIEX_AMPDU_DEF_RXWINSIZE;
 
+		mwifiex_reset_11n_rx_seq_num(priv);
+
 		atomic_set(&priv->wmm.tx_pkts_queued, 0);
 		atomic_set(&priv->wmm.highest_queued_prio, HIGH_PRIO_TID);
 	}
@@ -1221,6 +1223,7 @@ mwifiex_dequeue_tx_packet(struct mwifiex_adapter *adapter)
 
 	if (!ptr->is_11n_enabled ||
 	    mwifiex_is_ba_stream_setup(priv, ptr, tid) ||
+	    priv->wps.session_enable ||
 	    ((priv->sec_info.wpa_enabled ||
 	      priv->sec_info.wpa2_enabled) &&
 	     !priv->wpa_is_gtk_set)) {
diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
index d228358..9970c2b 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
@@ -301,9 +301,11 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
 	{RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/
 	{RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/
 	{RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/
+	{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CC&C*/
 	{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
 	{RTL_USB_DEVICE(0x0df6, 0x005c, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
 	{RTL_USB_DEVICE(0x0eb0, 0x9071, rtl92cu_hal_cfg)}, /*NO Brand - Etop*/
+	{RTL_USB_DEVICE(0x4856, 0x0091, rtl92cu_hal_cfg)}, /*NetweeN - Feixun*/
 	/* HP - Lite-On ,8188CUS Slim Combo */
 	{RTL_USB_DEVICE(0x103c, 0x1629, rtl92cu_hal_cfg)},
 	{RTL_USB_DEVICE(0x13d3, 0x3357, rtl92cu_hal_cfg)}, /* AzureWave */
@@ -346,6 +348,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
 	{RTL_USB_DEVICE(0x07b8, 0x8178, rtl92cu_hal_cfg)}, /*Funai -Abocom*/
 	{RTL_USB_DEVICE(0x0846, 0x9021, rtl92cu_hal_cfg)}, /*Netgear-Sercomm*/
 	{RTL_USB_DEVICE(0x0b05, 0x17ab, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/
+	{RTL_USB_DEVICE(0x0bda, 0x8186, rtl92cu_hal_cfg)}, /*Realtek 92CE-VAU*/
 	{RTL_USB_DEVICE(0x0df6, 0x0061, rtl92cu_hal_cfg)}, /*Sitecom-Edimax*/
 	{RTL_USB_DEVICE(0x0e66, 0x0019, rtl92cu_hal_cfg)}, /*Hawking-Edimax*/
 	{RTL_USB_DEVICE(0x2001, 0x3307, rtl92cu_hal_cfg)}, /*D-Link-Cameo*/
diff --git a/drivers/net/wireless/ti/wlcore/Kconfig b/drivers/net/wireless/ti/wlcore/Kconfig
index 54156b0..d7b907e 100644
--- a/drivers/net/wireless/ti/wlcore/Kconfig
+++ b/drivers/net/wireless/ti/wlcore/Kconfig
@@ -1,7 +1,6 @@
 config WLCORE
 	tristate "TI wlcore support"
 	depends on WL_TI && GENERIC_HARDIRQS && MAC80211
-	depends on INET
 	select FW_LOADER
 	---help---
 	  This module contains the main code for TI WLAN chips.  It abstracts
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 66e4fcd..a4bb856 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -1342,7 +1342,6 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
 	struct ieee80211_local *local = sdata->local;
 	struct sta_info *sta;
 	u32 changed = 0;
-	u8 bssid[ETH_ALEN];
 
 	ASSERT_MGD_MTX(ifmgd);
 
@@ -1354,10 +1353,7 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
 
 	ieee80211_stop_poll(sdata);
 
-	memcpy(bssid, ifmgd->associated->bssid, ETH_ALEN);
-
 	ifmgd->associated = NULL;
-	memset(ifmgd->bssid, 0, ETH_ALEN);
 
 	/*
 	 * we need to commit the associated = NULL change because the
@@ -1377,7 +1373,7 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
 	netif_carrier_off(sdata->dev);
 
 	mutex_lock(&local->sta_mtx);
-	sta = sta_info_get(sdata, bssid);
+	sta = sta_info_get(sdata, ifmgd->bssid);
 	if (sta) {
 		set_sta_flag(sta, WLAN_STA_BLOCK_BA);
 		ieee80211_sta_tear_down_BA_sessions(sta, tx);
@@ -1386,13 +1382,16 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
 
 	/* deauthenticate/disassociate now */
 	if (tx || frame_buf)
-		ieee80211_send_deauth_disassoc(sdata, bssid, stype, reason,
-					       tx, frame_buf);
+		ieee80211_send_deauth_disassoc(sdata, ifmgd->bssid, stype,
+					       reason, tx, frame_buf);
 
 	/* flush out frame */
 	if (tx)
 		drv_flush(local, false);
 
+	/* clear bssid only after building the needed mgmt frames */
+	memset(ifmgd->bssid, 0, ETH_ALEN);
+
 	/* remove AP and TDLS peers */
 	sta_info_flush(local, sdata);
 
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 7bcecf7..965e6ec 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2455,7 +2455,7 @@ ieee80211_rx_h_action_return(struct ieee80211_rx_data *rx)
 	 * frames that we didn't handle, including returning unknown
 	 * ones. For all other modes we will return them to the sender,
 	 * setting the 0x80 bit in the action category, as required by
-	 * 802.11-2007 7.3.1.11.
+	 * 802.11-2012 9.24.4.
 	 * Newer versions of hostapd shall also use the management frame
 	 * registration mechanisms, but older ones still use cooked
 	 * monitor interfaces so push all frames there.
@@ -2465,6 +2465,9 @@ ieee80211_rx_h_action_return(struct ieee80211_rx_data *rx)
 	     sdata->vif.type == NL80211_IFTYPE_AP_VLAN))
 		return RX_DROP_MONITOR;
 
+	if (is_multicast_ether_addr(mgmt->da))
+		return RX_DROP_MONITOR;
+
 	/* do not return rejected action frames */
 	if (mgmt->u.action.category & 0x80)
 		return RX_DROP_UNUSABLE;
diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
index cb26461..2ab196a 100644
--- a/net/nfc/nci/ntf.c
+++ b/net/nfc/nci/ntf.c
@@ -106,7 +106,7 @@ static __u8 *nci_extract_rf_params_nfca_passive_poll(struct nci_dev *ndev,
 	nfca_poll->sens_res = __le16_to_cpu(*((__u16 *)data));
 	data += 2;
 
-	nfca_poll->nfcid1_len = *data++;
+	nfca_poll->nfcid1_len = min_t(__u8, *data++, NFC_NFCID1_MAXSIZE);
 
 	pr_debug("sens_res 0x%x, nfcid1_len %d\n",
 		 nfca_poll->sens_res, nfca_poll->nfcid1_len);
@@ -130,7 +130,7 @@ static __u8 *nci_extract_rf_params_nfcb_passive_poll(struct nci_dev *ndev,
 			struct rf_tech_specific_params_nfcb_poll *nfcb_poll,
 						     __u8 *data)
 {
-	nfcb_poll->sensb_res_len = *data++;
+	nfcb_poll->sensb_res_len = min_t(__u8, *data++, NFC_SENSB_RES_MAXSIZE);
 
 	pr_debug("sensb_res_len %d\n", nfcb_poll->sensb_res_len);
 
@@ -145,7 +145,7 @@ static __u8 *nci_extract_rf_params_nfcf_passive_poll(struct nci_dev *ndev,
 						     __u8 *data)
 {
 	nfcf_poll->bit_rate = *data++;
-	nfcf_poll->sensf_res_len = *data++;
+	nfcf_poll->sensf_res_len = min_t(__u8, *data++, NFC_SENSF_RES_MAXSIZE);
 
 	pr_debug("bit_rate %d, sensf_res_len %d\n",
 		 nfcf_poll->bit_rate, nfcf_poll->sensf_res_len);
@@ -331,7 +331,7 @@ static int nci_extract_activation_params_iso_dep(struct nci_dev *ndev,
 	switch (ntf->activation_rf_tech_and_mode) {
 	case NCI_NFC_A_PASSIVE_POLL_MODE:
 		nfca_poll = &ntf->activation_params.nfca_poll_iso_dep;
-		nfca_poll->rats_res_len = *data++;
+		nfca_poll->rats_res_len = min_t(__u8, *data++, 20);
 		pr_debug("rats_res_len %d\n", nfca_poll->rats_res_len);
 		if (nfca_poll->rats_res_len > 0) {
 			memcpy(nfca_poll->rats_res,
@@ -341,7 +341,7 @@ static int nci_extract_activation_params_iso_dep(struct nci_dev *ndev,
 
 	case NCI_NFC_B_PASSIVE_POLL_MODE:
 		nfcb_poll = &ntf->activation_params.nfcb_poll_iso_dep;
-		nfcb_poll->attrib_res_len = *data++;
+		nfcb_poll->attrib_res_len = min_t(__u8, *data++, 50);
 		pr_debug("attrib_res_len %d\n", nfcb_poll->attrib_res_len);
 		if (nfcb_poll->attrib_res_len > 0) {
 			memcpy(nfcb_poll->attrib_res,
diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c
index ec1134c..8b8a6a2 100644
--- a/net/nfc/rawsock.c
+++ b/net/nfc/rawsock.c
@@ -54,7 +54,10 @@ static int rawsock_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 
-	pr_debug("sock=%p\n", sock);
+	pr_debug("sock=%p sk=%p\n", sock, sk);
+
+	if (!sk)
+		return 0;
 
 	sock_orphan(sk);
 	sock_put(sk);
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply related

* Re: [patch net-next 2/4] virtio_net: use IFF_LIFE_ADDR_CHANGE priv_flag
From: Michael S. Tsirkin @ 2012-06-28 19:21 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, shimoda.hiroaki, virtualization, danny.kukawka, edumazet,
	davem
In-Reply-To: <1340892639-1111-3-git-send-email-jpirko@redhat.com>

On Thu, Jun 28, 2012 at 04:10:37PM +0200, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

FWIW

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c |   11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 36a16d5..6a0f526 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -679,12 +679,11 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	struct virtio_device *vdev = vi->vdev;
> -	struct sockaddr *addr = p;
> +	int ret;
>  
> -	if (!is_valid_ether_addr(addr->sa_data))
> -		return -EADDRNOTAVAIL;
> -	memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
> -	dev->addr_assign_type &= ~NET_ADDR_RANDOM;
> +	ret = eth_mac_addr(dev, p);
> +	if (ret)
> +		return ret;
>  
>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
>  		vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
> @@ -1063,7 +1062,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		return -ENOMEM;
>  
>  	/* Set up network device as normal. */
> -	dev->priv_flags |= IFF_UNICAST_FLT;
> +	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIFE_ADDR_CHANGE;
>  	dev->netdev_ops = &virtnet_netdev;
>  	dev->features = NETIF_F_HIGHDMA;
>  
> -- 
> 1.7.10.4

^ permalink raw reply

* [PATCH] net: unlock busylock before servicing qdisc
From: Eric Dumazet @ 2012-06-28 19:22 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Parkin

From: Eric Dumazet <edumazet@google.com>

commit 79640a4ca6 (net: add additional lock to qdisc to increase
throughput) added a LOCKDEP occurrence on bonding/l2tp or other stacked
devices, if several levels use a qdisc.

We should release the busylock before the call to sch_direct_xmit() or
else LOCKDEP complains we try to lock busylock again in the lower dev
qdisc handler.

This of course is a false lockdep positive, but this
patch is the simplest way to get rid of it.

Reported-by: Tom Parkin <tparkin@katalix.com>
Tested-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
---
 net/core/dev.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6df2140..d265c67 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2412,13 +2412,13 @@ static inline int __dev_xmit_skb(struct sk_buff
*skb, struct Qdisc *q,
 
 		qdisc_bstats_update(q, skb);
 
-		if (sch_direct_xmit(skb, q, dev, txq, root_lock)) {
-			if (unlikely(contended)) {
-				spin_unlock(&q->busylock);
-				contended = false;
-			}
+		if (unlikely(contended)) {
+			spin_unlock(&q->busylock);
+			contended = false;
+		}
+		if (sch_direct_xmit(skb, q, dev, txq, root_lock))
 			__qdisc_run(q);
-		} else
+		else
 			qdisc_run_end(q);
 
 		rc = NET_XMIT_SUCCESS;

^ permalink raw reply related

* Re: 3.4.x regression: rtl8169: frequent resets
From: Francois Romieu @ 2012-06-28 19:30 UTC (permalink / raw)
  To: Nix; +Cc: netdev, linux-kernel
In-Reply-To: <87y5n7qup0.fsf@spindle.srvr.nix>

Nix <nix@esperi.org.uk> :
> I recently upgraded from 3.3.x to 3.4.4, and am now experiencing
> networking problems with my desktop box's r8169 card. The symptoms are
> that all traffic ceases for five to ten seconds, then the card appears
> to reset and everything is back to normal -- until it happens again. It
> can happen quite a lot:

Can you try and revert 036dafa28da1e2565a8529de2ae663c37b7a0060 ?

I would welcome a complete dmesg including the XID line from the
r8169 driver.

Thanks.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH v2] sctp: be more restrictive in transport selection on bundled sacks
From: Neil Horman @ 2012-06-28 20:14 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, David S. Miller, linux-sctp
In-Reply-To: <4FECA0CF.10400@gmail.com>

On Thu, Jun 28, 2012 at 02:22:07PM -0400, Vlad Yasevich wrote:
> On 06/28/2012 02:07 PM, Neil Horman wrote:
> >On Thu, Jun 28, 2012 at 11:58:56AM -0400, Vlad Yasevich wrote:
> >>On 06/28/2012 11:33 AM, Neil Horman wrote:
> >>>On Wed, Jun 27, 2012 at 03:44:22PM -0400, Vlad Yasevich wrote:
> >>>>On 06/27/2012 01:28 PM, Neil Horman wrote:
> >>>>>On Wed, Jun 27, 2012 at 11:10:26AM -0400, Vlad Yasevich wrote:
> >>>>>>
> >>>>>>I didn't think of this yesterday, but I think it would be much
> >>>>>>better to use pkt->transport here since you are adding the chunk to
> >>>>>>the packet and it will go out on the transport of the packet.  You
> >>>>>>are also guaranteed that pkt->transport is set.
> >>>>>>
> >>>>>I don't think it really matters, as the chunk transport is used to lookup the
> >>>>>packet that we append to, and if the chunk transport is unset, its somewhat
> >>>>>questionable as to weather we should bundle, but if packet->transport is set,
> >>>>>its probably worth it to avoid the extra conditional.
> >>>>>
> >>>>
> >>>>Just looked at the code flow.  chunk->transport may not be set until
> >>>>the end of sctp_packet_append_chunk.  For new data, transport may
> >>>>not be set.  For retransmitted data, transport is set to last
> >>>>transport data was sent on.  So, we could be looking at the wrong
> >>>>transport.  What you are trying to decided is if the current
> >>>>transport we will be used can take the SACK, but you may not be
> >>>>looking at the current transport. Looking at packet->transport is
> >>>>the correct thing to do.
> >>>>
> >>>>-vlad
> >>>>
> >>>So, I agree after what you said above, that this is the right thing to do.  That
> >>>said, I just tested the change with the SCTP_RR test in netperf, and it wound up
> >>>giving me horrid performance (Its reporting about 5 transactions per second).
> >>>It appears that whats happening is that, because the test alternates which
> >>>transports it sends out, and because it waits for a sack of teh prior packet
> >>>before it sends out the next transaction, we're always missing the bundle
> >>>opportunity, and always waiting for the 200ms timeout for the sack to occur.
> >>>While I know this is a pessimal case, it really seems bad to me.  It seems that
> >>>because I was using chunk->transport previously, I luckily got the transport
> >>>wrong sometimes, and it managed to bundle more often.
> >>>
> >>>So I'm not sure what to do here.  I had really wanted to avoid adding a sysctl
> >>>here, but given that this is likely a corner cases, it seems that might be the
> >>>best approach.  Do you have any thoughts?
> >>>
> >>>Neil
> >>>
> >>
> >>that's strange.  did you modify the SCTP_RR to alternate transports?
> >>Seems like responses in the RR test need to go the address of the
> >>sender so that we don't see things like:
> >>Request (t) --->
> >>             <--- Response (t2)
> >>
> >>Should be:
> >>Request (t1) --->
> >>              <--- Response (t1)
> >>
> >>
> >>-vlad
> >That would seem to me to be the case too....
> >
> >However, having looked at this some more, it seems I just jumped the gun on
> >this. Its happening because sctp_eat_data variants are issuing a SCTP_GEN_SACK
> >command at the end of every received packet, which causes the moved_ctsn value
> >to get cleared.
> 
> Ok, that should only happen the very first time as we are supposed
> to ack the first data immediately.  On subsequent packets it should
> just start a timer because we are following the 2pkt/200ms rule.
> Then, when the response happens, we should bundle the SACK as long
> as the data is leaving on the transport that moved the CTSN.
> 
> So we might be using the wrong transport and as result you send data
> and then end up waiting for a SACK.
> 

So, good news, and more good news - 

The good news is that I found the problem - and its me :)
When I modified the code to use pkt->transport over chunk->transport I removed
the ! in front of the test on accident, and and so was not bundling when I
should have been.  Quite stupid of me, sorry for the noise.

The other good news is that while doing this I think I have a way to save us
from having to do that for loop in sctp_make_sack, so this should be a bit more
scalable.  I'll post when I finish testing tomorrow.

Best
Neil

^ permalink raw reply

* Re: [patch net-next 1/4] net: introduce new priv_flag indicating iface capable of change mac when running
From: Michael S. Tsirkin @ 2012-06-28 20:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, netdev, shimoda.hiroaki, virtualization,
	danny.kukawka, edumazet, davem
In-Reply-To: <1340897092.13187.110.camel@edumazet-glaptop>

On Thu, Jun 28, 2012 at 05:24:52PM +0200, Eric Dumazet wrote:
> On Thu, 2012-06-28 at 16:10 +0200, Jiri Pirko wrote:
> > Introduce IFF_LIFE_ADDR_CHANGE priv_flag and use it to disable
> > netif_running() check in eth_mac_addr()
> > 
> > Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> > ---
> >  include/linux/if.h |    2 ++
> >  net/ethernet/eth.c |    2 +-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/if.h b/include/linux/if.h
> > index f995c66..fd9ee7c 100644
> > --- a/include/linux/if.h
> > +++ b/include/linux/if.h
> > @@ -81,6 +81,8 @@
> >  #define IFF_UNICAST_FLT	0x20000		/* Supports unicast filtering	*/
> >  #define IFF_TEAM_PORT	0x40000		/* device used as team port */
> >  #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
> > +#define IFF_LIFE_ADDR_CHANGE 0x100000	/* device supports hardware address
> > +					 * change when it's running */
> >  
> > 
> >  #define IF_GET_IFACE	0x0001		/* for querying only */
> > diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
> > index 36e5880..8f8ded4 100644
> > --- a/net/ethernet/eth.c
> > +++ b/net/ethernet/eth.c
> > @@ -283,7 +283,7 @@ int eth_mac_addr(struct net_device *dev, void *p)
> >  {
> >  	struct sockaddr *addr = p;
> >  
> > -	if (netif_running(dev))
> > +	if (!(dev->priv_flags & IFF_LIFE_ADDR_CHANGE) && netif_running(dev))
> >  		return -EBUSY;
> >  	if (!is_valid_ether_addr(addr->sa_data))
> >  		return -EADDRNOTAVAIL;
> 
> Since the memcpy() is not atomic, there is a small window where a reader
> could get a half-changed mac address. I guess its a detail.
> 

At least for virtio nothing changes - we had this bug forever.
How'd you fix this?

-- 
MST

^ permalink raw reply

* Re: 3.4.x regression: rtl8169: frequent resets
From: Nix @ 2012-06-28 21:12 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel
In-Reply-To: <20120628193026.GA31627@electric-eye.fr.zoreil.com>

[-- Attachment #1: Type: text/plain, Size: 2495 bytes --]

On 28 Jun 2012, Francois Romieu stated:

> Nix <nix@esperi.org.uk> :
>> I recently upgraded from 3.3.x to 3.4.4, and am now experiencing
>> networking problems with my desktop box's r8169 card. The symptoms are
>> that all traffic ceases for five to ten seconds, then the card appears
>> to reset and everything is back to normal -- until it happens again. It
>> can happen quite a lot:
>
> Can you try and revert 036dafa28da1e2565a8529de2ae663c37b7a0060 ?

I can try, but there's been a *lot* of code motion since then, 'git
revert' fails hilariously (trying to patch obviously the wrong places):
I'll have to do it by hand.

I'll try that tomorrow. (But, as before, it might be several days before
we know anything one way or the other. Assuming I can revert it without
fouling something else up.)

I'm not using BQL (yet, anyway).

I note that at some time (after the first reset?), my MTU either flipped
back to 1500, from its initial jumbo default, or simply refused to go
jumbo in the first place. I bring it up like so:

ip link set fastnet up multicast on txqueuelen 100 mtu 7200
ip addr add local 192.168.16.20/24 broadcast 192.168.16.255 dev fastnet

but its MTU is now shown as 1500 :( so at some point either jumbo frames
have stopped working or the reset is flipping them off. (It used to
work, with a warning yelling about how terribly dangerous jumbo frames
were because any attacker on the local subnet could subvert my machine.
Any attacker on the local subnet will be sitting in my lap and/or will
have root on the NFS server from which this machine is getting all its
data, so I don't care one jot about that. This may be because rx
checksumming is turned on by default, and as of last year that forces
jumbo frames off: I've turned that off and will see if jumbo frames
start working next time I bring the interface up. I *thought* my local
network was awful slow lately...)

> I would welcome a complete dmesg including the XID line from the
> r8169 driver.

Now that I can do :) it's long, so here's the relevant bit: complete
gzipped dmesg attached.

[    1.338060] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    1.339700] r8169 0000:06:00.0: irq 70 for MSI/MSI-X
[    1.339793] r8169 0000:06:00.0: eth0: RTL8168c/8111c at 0xffffc90000048000, 00:24:8c:0f:64:18, XID 1c4000c0 IRQ 70
[    1.341389] r8169 0000:06:00.0: eth0: jumbo features [frames: 6128 bytes, tx checksumming: ko]

(the interface is renamed to 'fastnet' by udev a little later in the
boot process.)


[-- Attachment #2: gzipped complete dmesg --]
[-- Type: application/octet-stream, Size: 18208 bytes --]

^ permalink raw reply

* Re: [patch net-next 1/4] net: introduce new priv_flag indicating iface capable of change mac when running
From: Ben Hutchings @ 2012-06-28 22:17 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: mst, netdev, shimoda.hiroaki, virtualization, danny.kukawka,
	edumazet, davem
In-Reply-To: <1340892639-1111-2-git-send-email-jpirko@redhat.com>

On Thu, 2012-06-28 at 16:10 +0200, Jiri Pirko wrote:
> Introduce IFF_LIFE_ADDR_CHANGE priv_flag and use it to disable
> netif_running() check in eth_mac_addr()
>
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> ---
>  include/linux/if.h |    2 ++
>  net/ethernet/eth.c |    2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/if.h b/include/linux/if.h
> index f995c66..fd9ee7c 100644
> --- a/include/linux/if.h
> +++ b/include/linux/if.h
> @@ -81,6 +81,8 @@
>  #define IFF_UNICAST_FLT	0x20000		/* Supports unicast filtering	*/
>  #define IFF_TEAM_PORT	0x40000		/* device used as team port */
>  #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
> +#define IFF_LIFE_ADDR_CHANGE 0x100000	/* device supports hardware address
> +					 * change when it's running */
[...]

Any device that has IFF_UNICAST_FLT can update the unicast MAC filter
while it's running; doesn't that go hand-in-hand with being able to
handle changes to the primary MAC address?  Is the new flag really
necessary at all?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: 3.4.x regression: rtl8169: frequent resets
From: Francois Romieu @ 2012-06-28 22:23 UTC (permalink / raw)
  To: Nix; +Cc: netdev, linux-kernel
In-Reply-To: <87fw9fp2e1.fsf@spindle.srvr.nix>

Nix <nix@esperi.org.uk> :
> Francois Romieu <romieu@fr.zoreil.com> :
> > Can you try and revert 036dafa28da1e2565a8529de2ae663c37b7a0060 ?
> 
> I can try, but there's been a *lot* of code motion since then, 'git
> revert' fails hilariously (trying to patch obviously the wrong places):
> I'll have to do it by hand.

There is a single line reject in rtl8169_start_xmit. Other than that it
should patch -p1 -R fine.

[...]
> I note that at some time (after the first reset?), my MTU either flipped
> back to 1500, from its initial jumbo default, or simply refused to go
> jumbo in the first place. I bring it up like so:
> 
> ip link set fastnet up multicast on txqueuelen 100 mtu 7200
> ip addr add local 192.168.16.20/24 broadcast 192.168.16.255 dev fastnet
> 
> but its MTU is now shown as 1500 :( so at some point either jumbo frames
> have stopped working or the reset is flipping them off.
[...]
> [    1.341389] r8169 0000:06:00.0: eth0: jumbo features [frames: 6128 bytes, tx checksumming: ko]

This chipset is not supposed to be pushed beyond 6128 bytes.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH net-next] fq_codel: report congestion notification at enqueue time
From: Yuchung Cheng @ 2012-06-28 22:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dave Taht, David Miller, netdev, codel, Tom Herbert, Matt Mathis,
	Nandita Dukkipati, Neal Cardwell
In-Reply-To: <1340907151.13187.169.camel@edumazet-glaptop>

On Thu, Jun 28, 2012 at 11:12 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-06-28 at 10:51 -0700, Dave Taht wrote:
>
>> clever idea. A problem is there are other forms of network traffic on
>> a link, and this is punishing a single tcp
Dave: it won't just punish a single TCP, all protocols that react to
XMIT_CN will share similar fate.

>> stream that may not be the source of the problem in the first place,
>> and basically putting it into double jeopardy.
>>
>
> Why ? In fact this patch helps the tcp session being signaled (as it
> will be anyway) at enqueue time, instead of having to react to packet
> losses indications given (after RTT) by receiver.
>
> Avoiding losses help receiver to consume data without having to buffer
> it into Out Of Order queue.
>
> So its not jeopardy, but early congestion notification without RTT
> delay.
>
> NET_XMIT_CN is a soft signal, far more disruptive than a DROP.
I don't read here: you mean far "less" disruptive in terms of performance?

>
>> I am curious as to how often an enqueue is actually dropping in the
>> codel/fq_codel case, the hope was that there would be plenty of
>> headroom under far more circumstances on this qdisc.
>>
>
> "tc -s qdisc show dev eth0" can show you all the counts.
>
> We never drop a packet at enqueue time, unless you hit the emergency
> limit (10240 packets for fq_codel). When you reach this limit, you are
> under trouble.

Another nice feature is to resume TCP xmit operation at when the skb
hits the wire.
My hutch is that Eric is working on this too.
>
>
>

^ permalink raw reply

* Re: [PATCH] ipv4: Create and use fib_compute_spec_dst() helper.
From: Julian Anastasov @ 2012-06-28 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120628.035946.1026666621448892198.davem@davemloft.net>


	Hello,

On Thu, 28 Jun 2012, David Miller wrote:

> +__be32 fib_compute_spec_dst(struct sk_buff *skb)
> +{
> +	struct net_device *dev = skb->dev;
> +	struct in_device *in_dev;
> +	struct fib_result res;
> +	struct flowi4 fl4;
> +	struct net *net;
> +

	This is bad for looped m/b-cast: ip_mc_output calls
ip_dev_loopback_xmit where pkt_type is set to PACKET_LOOPBACK.
May be we have to check skb_dst as below.

	Also, for rt_spec_dst __mkroute_output used fl4->daddr for
looped unicast traffic and fl4->saddr for
RTCF_BROADCAST | RTCF_MULTICAST. Can we check rt_flags instead?
It will catch some pkt_type values such as PACKET_LOOPBACK.
For example, check similar to icmp_send():

	struct rtable *rt = skb_rtable(skb);

	/* input or output route destined to local stack?
	 * daddr can be mcast/bcast
	 */
	if (rt && rt->rt_flags & RTCF_LOCAL) {
		if (rt_is_input_route(rt)) {
			if (!(rt->rt_flags & (RTCF_BROADCAST |
					      RTCF_MULTICAST)))
				return ip_hdr(skb)->daddr;
			/* select saddr */
		} else
			return (rt->rt_flags & (RTCF_BROADCAST | 
						RTCF_MULTICAST)) ?
				ip_hdr(skb)->saddr : ip_hdr(skb)->daddr;
	}

	Isn't pkt_type just a hint from sender? We can not use
it for such places. What if hackers use unicast mac (PACKET_HOST)
combined with multicast daddr? I'm not sure that pkt_type
is a strong validator for layer 3 addresses.

> +	if (skb->pkt_type != PACKET_BROADCAST &&
> +	    skb->pkt_type != PACKET_MULTICAST)
> +		return ip_hdr(skb)->daddr;

	So, above check should be removed if we check rt_flags.

> +
> +	in_dev = __in_dev_get_rcu(dev);
> +	BUG_ON(!in_dev);
> +	fl4.flowi4_oif = 0;
> +	fl4.flowi4_iif = 0;

	It is not clear to me why ip_route_input_mc and
ip_route_input_slow (brd_input) call fib_validate_source()
with arg oif=0. How would fib_rule_match match flowi_iif
for such traffic then?

	May be iif should be always set just like in
ip_route_output_slow because now we do output lookup to
unicast target?:

	fl4.flowi4_iif = net->loopback_dev->ifindex;

> +	fl4.daddr = ip_hdr(skb)->saddr;
> +	fl4.saddr = ip_hdr(skb)->daddr;

	Here only 0 is allowed for m/b-cast daddr, we are not
going to use ip_hdr(skb)->daddr, so no need to provide it:

	fl4.saddr = 0;

	fib_validate_source was called with dst=0 for bcast/mcast.

> +	fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
> +	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
> +	fl4.flowi4_mark = IN_DEV_SRC_VMARK(in_dev) ? skb->mark : 0;
> +
> +	net = dev_net(dev);

	Now net will be needed above for iif.

> +	if (!fib_lookup(net, &fl4, &res))
> +		return FIB_RES_PREFSRC(net, res);
> +	else

	What about providing ip_hdr(skb)->saddr as
2nd argument here (it is validated by input routing
to be unicast or 0.0.0.0):

> +		return inet_select_addr(dev, 0, RT_SCOPE_UNIVERSE);

	By this way we will prefer local address from the
same subnet as iph->saddr. Also, we should not call
fib_lookup if ipv4_is_zeronet(ip_hdr(skb)->saddr), we
should use RT_SCOPE_LINK for inet_select_addr in this case.
By this way we will prefer addresses with scope link for
target 0.0.0.0. There is such logic that uses RT_SCOPE_LINK in
ip_route_input_mc() and ip_route_input_slow().

>  int ip_options_compile(struct net *net,
>  		       struct ip_options *opt, struct sk_buff *skb)
>  {
> -	int l;
> -	unsigned char *iph;
> -	unsigned char *optptr;
> -	int optlen;
> +	__be32 spec_dst = (__force __be32) 0;
>  	unsigned char *pp_ptr = NULL;
> -	struct rtable *rt = NULL;
> +	unsigned char *optptr;
> +	unsigned char *iph;
> +	int optlen, l;
>  
>  	if (skb != NULL) {
> -		rt = skb_rtable(skb);
> +		spec_dst = fib_compute_spec_dst(skb);

	If we use rt_flags above I'm not sure whether
ip_options_compile is called with valid rt_flags from the
bridging code.

Regards
--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: 3.4.x regression: rtl8169: frequent resets
From: Nix @ 2012-06-28 23:10 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel
In-Reply-To: <20120628222325.GA1579@electric-eye.fr.zoreil.com>

On 28 Jun 2012, Francois Romieu stated:

> Nix <nix@esperi.org.uk> :
>> Francois Romieu <romieu@fr.zoreil.com> :
>> > Can you try and revert 036dafa28da1e2565a8529de2ae663c37b7a0060 ?
>> 
>> I can try, but there's been a *lot* of code motion since then, 'git
>> revert' fails hilariously (trying to patch obviously the wrong places):
>> I'll have to do it by hand.
>
> There is a single line reject in rtl8169_start_xmit. Other than that it
> should patch -p1 -R fine.

... and indeed it does. Weird, why does git revert fail so badly when
patch and git apply are both happy?!

I'll reboot into this kernel tomorrow, and report back in a few days (or
sooner if it goes wrong).

>> [    1.341389] r8169 0000:06:00.0: eth0: jumbo features [frames: 6128 bytes, tx checksumming: ko]
>
> This chipset is not supposed to be pushed beyond 6128 bytes.

Interesting. It's always worked flawlessly at 7200 for me before, until,
uh, last year when it stopped working and I never noticed (in fact
a 7200-byte MTU was how the machine was shipped to me :) ).

I guess I'll knock it down to 6128 then, less than 1000 bytes isn't
going to ruin performance by any means...

... aand that works. thanks! (Let's see if the link stuttering continues. I
expect it will, though it's been three hours since the last stutter...)

-- 
NULL && (void)

^ permalink raw reply

* Re: [PATCH] ipv4: Create and use fib_compute_spec_dst() helper.
From: David Miller @ 2012-06-28 23:27 UTC (permalink / raw)
  To: ja; +Cc: netdev
In-Reply-To: <alpine.LFD.2.00.1206282309340.1722@ja.ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Fri, 29 Jun 2012 02:16:25 +0300 (EEST)

> 	Hello,

I really appreciate your detailed analysis, I'll look deeply into
all of your feedback.

Thanks!

^ permalink raw reply

* Re: 3.4.x regression: rtl8169: frequent resets
From: Stefan Lippers-Hollmann @ 2012-06-28 23:31 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Nix, netdev, linux-kernel
In-Reply-To: <20120628193026.GA31627@electric-eye.fr.zoreil.com>

[-- Attachment #1: Type: Text/Plain, Size: 1386 bytes --]

Hi

On Thursday 28 June 2012, Francois Romieu wrote:
> Nix <nix@esperi.org.uk> :
> > I recently upgraded from 3.3.x to 3.4.4, and am now experiencing
> > networking problems with my desktop box's r8169 card. The symptoms are
> > that all traffic ceases for five to ten seconds, then the card appears
> > to reset and everything is back to normal -- until it happens again. It
> > can happen quite a lot:
> 
> Can you try and revert 036dafa28da1e2565a8529de2ae663c37b7a0060 ?
> 
> I would welcome a complete dmesg including the XID line from the
> r8169 driver.

I received the same oops from a 3.4.4 user with these onboard network 
cards:

r8169 0000:04:00.0: eth0: RTL8168d/8111d at 0xffffc90000c72000, 00:24:1d:72:7c:75, XID 081000c0 IRQ 44
r8169 0000:05:00.0: eth1: RTL8168d/8111d at 0xffffc90000c70000, 00:24:1d:72:7c:77, XID 081000c0 IRQ 45

Reverting 036dafa28da1e2565a8529de2ae663c37b7a0060 (Nix, trivial 
backport to 3.4.4 attached) did improve the situation, no oops in 21
hours uptime so far (while it usually shows up within about an hour).
Unfortunately his oops report was cut brief, so I've asked him to try 
reproducing it with an unpatched kernel again, to collect a full dmesg
(the test is still going on, past the one hour mark, but the oops 
hasn't triggered yet). I'll report back, as soon as I get confirmation 
and a full dmesg.

Regards
	Stefan Lippers-Hollmann

[-- Attachment #2: revert-r8169-add-byte-queue-limit-support.patch --]
[-- Type: text/x-patch, Size: 1832 bytes --]

revert r8169: add byte queue limit support.

--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5000,7 +5000,6 @@ static void rtl8169_tx_clear(struct rtl8
 {
 	rtl8169_tx_clear_range(tp, tp->dirty_tx, NUM_TX_DESC);
 	tp->cur_tx = tp->dirty_tx = 0;
-	netdev_reset_queue(tp->dev);
 }
 
 static void rtl_reset_work(struct rtl8169_private *tp)
@@ -5155,8 +5154,6 @@ static netdev_tx_t rtl8169_start_xmit(st
 
 	txd->opts2 = cpu_to_le32(opts[1]);
 
-	netdev_sent_queue(dev, skb->len);
-
 	skb_tx_timestamp(skb);
 
 	wmb();
@@ -5253,16 +5250,9 @@ static void rtl8169_pcierr_interrupt(str
 	rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
 }
 
-struct rtl_txc {
-	int packets;
-	int bytes;
-};
-
 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 {
-	struct rtl8169_stats *tx_stats = &tp->tx_stats;
 	unsigned int dirty_tx, tx_left;
-	struct rtl_txc txc = { 0, 0 };
 
 	dirty_tx = tp->dirty_tx;
 	smp_rmb();
@@ -5281,24 +5271,17 @@ static void rtl_tx(struct net_device *de
 		rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 				     tp->TxDescArray + entry);
 		if (status & LastFrag) {
-			struct sk_buff *skb = tx_skb->skb;
-
-			txc.packets++;
-			txc.bytes += skb->len;
-			dev_kfree_skb(skb);
+			u64_stats_update_begin(&tp->tx_stats.syncp);
+			tp->tx_stats.packets++;
+			tp->tx_stats.bytes += tx_skb->skb->len;
+			u64_stats_update_end(&tp->tx_stats.syncp);
+			dev_kfree_skb(tx_skb->skb);
 			tx_skb->skb = NULL;
 		}
 		dirty_tx++;
 		tx_left--;
 	}
 
-	u64_stats_update_begin(&tx_stats->syncp);
-	tx_stats->packets += txc.packets;
-	tx_stats->bytes += txc.bytes;
-	u64_stats_update_end(&tx_stats->syncp);
-
-	netdev_completed_queue(dev, txc.packets, txc.bytes);
-
 	if (tp->dirty_tx != dirty_tx) {
 		tp->dirty_tx = dirty_tx;
 		/* Sync with rtl8169_start_xmit:

^ permalink raw reply

* Re: 3.4.x regression: rtl8169: frequent resets
From: Nix @ 2012-06-28 23:42 UTC (permalink / raw)
  To: Stefan Lippers-Hollmann; +Cc: Francois Romieu, netdev, linux-kernel
In-Reply-To: <201206290131.49150.s.L-H@gmx.de>

On 29 Jun 2012, Stefan Lippers-Hollmann uttered the following:
> I received the same oops from a 3.4.4 user with these onboard network 
> cards:

Pedant point: it's not an oops, just a slowpath warning.

>                                                        no oops in 21
> hours uptime so far (while it usually shows up within about an hour).

Interesting. I wonder why it took two days to show up for me. It's not
like I'm a very light network user or anything, there are easily 300
packets per second in each direction on this interface every second of
the day (and often much more).

-- 
NULL && (void)

^ permalink raw reply

* [PATCH net-next 2/4] tcp: pass fl6 to inet6_csk_route_req()
From: Neal Cardwell @ 2012-06-28 22:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet, Neal Cardwell
In-Reply-To: <1340922861-11684-1-git-send-email-ncardwell@google.com>

This commit changes inet_csk_route_req() so that it uses a pointer to
a struct flowi6, rather than allocating its own on the stack. This
brings its behavior in line with its IPv4 cousin,
inet_csk_route_req(), and allows a follow-on patch to fix a dst leak.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 include/net/inet6_connection_sock.h |    1 +
 net/ipv6/inet6_connection_sock.c    |   24 ++++++++++++------------
 net/ipv6/tcp_ipv6.c                 |    9 ++++++---
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h
index 1866a67..df2a857 100644
--- a/include/net/inet6_connection_sock.h
+++ b/include/net/inet6_connection_sock.h
@@ -26,6 +26,7 @@ extern int inet6_csk_bind_conflict(const struct sock *sk,
 				   const struct inet_bind_bucket *tb, bool relax);
 
 extern struct dst_entry* inet6_csk_route_req(struct sock *sk,
+					     struct flowi6 *fl6,
 					     const struct request_sock *req);
 
 extern struct request_sock *inet6_csk_search_req(const struct sock *sk,
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index e23d354..bceb144 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -55,26 +55,26 @@ int inet6_csk_bind_conflict(const struct sock *sk,
 EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict);
 
 struct dst_entry *inet6_csk_route_req(struct sock *sk,
+				      struct flowi6 *fl6,
 				      const struct request_sock *req)
 {
 	struct inet6_request_sock *treq = inet6_rsk(req);
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct in6_addr *final_p, final;
 	struct dst_entry *dst;
-	struct flowi6 fl6;
-
-	memset(&fl6, 0, sizeof(fl6));
-	fl6.flowi6_proto = IPPROTO_TCP;
-	fl6.daddr = treq->rmt_addr;
-	final_p = fl6_update_dst(&fl6, np->opt, &final);
-	fl6.saddr = treq->loc_addr;
-	fl6.flowi6_oif = treq->iif;
-	fl6.flowi6_mark = sk->sk_mark;
-	fl6.fl6_dport = inet_rsk(req)->rmt_port;
-	fl6.fl6_sport = inet_rsk(req)->loc_port;
-	security_req_classify_flow(req, flowi6_to_flowi(&fl6));
 
-	dst = ip6_dst_lookup_flow(sk, &fl6, final_p, false);
+	memset(fl6, 0, sizeof(*fl6));
+	fl6->flowi6_proto = IPPROTO_TCP;
+	fl6->daddr = treq->rmt_addr;
+	final_p = fl6_update_dst(fl6, np->opt, &final);
+	fl6->saddr = treq->loc_addr;
+	fl6->flowi6_oif = treq->iif;
+	fl6->flowi6_mark = sk->sk_mark;
+	fl6->fl6_dport = inet_rsk(req)->rmt_port;
+	fl6->fl6_sport = inet_rsk(req)->loc_port;
+	security_req_classify_flow(req, flowi6_to_flowi(fl6));
+
+	dst = ip6_dst_lookup_flow(sk, fl6, final_p, false);
 	if (IS_ERR(dst))
 		return NULL;
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 26a8862..3b79e94 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -477,7 +477,8 @@ out:
 }
 
 
-static int tcp_v6_send_synack(struct sock *sk, struct request_sock *req,
+static int tcp_v6_send_synack(struct sock *sk,
+			      struct request_sock *req,
 			      struct request_values *rvp,
 			      u16 queue_mapping)
 {
@@ -1058,6 +1059,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	struct tcp_sock *tp = tcp_sk(sk);
 	__u32 isn = TCP_SKB_CB(skb)->when;
 	struct dst_entry *dst = NULL;
+	struct flowi6 fl6;
 	bool want_cookie = false;
 
 	if (skb->protocol == htons(ETH_P_IP))
@@ -1177,7 +1179,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 		 */
 		if (tmp_opt.saw_tstamp &&
 		    tcp_death_row.sysctl_tw_recycle &&
-		    (dst = inet6_csk_route_req(sk, req)) != NULL &&
+		    (dst = inet6_csk_route_req(sk, &fl6, req)) != NULL &&
 		    (peer = rt6_get_peer((struct rt6_info *)dst)) != NULL &&
 		    ipv6_addr_equal((struct in6_addr *)peer->daddr.addr.a6,
 				    &treq->rmt_addr)) {
@@ -1246,6 +1248,7 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 #ifdef CONFIG_TCP_MD5SIG
 	struct tcp_md5sig_key *key;
 #endif
+	struct flowi6 fl6;
 
 	if (skb->protocol == htons(ETH_P_IP)) {
 		/*
@@ -1308,7 +1311,7 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 		goto out_overflow;
 
 	if (!dst) {
-		dst = inet6_csk_route_req(sk, req);
+		dst = inet6_csk_route_req(sk, &fl6, req);
 		if (!dst)
 			goto out;
 	}
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH net-next 3/4] tcp: use inet6_csk_route_req() in tcp_v6_send_synack()
From: Neal Cardwell @ 2012-06-28 22:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet, Neal Cardwell
In-Reply-To: <1340922861-11684-1-git-send-email-ncardwell@google.com>

With the recent change (earlier in this patch series) to set
flowi6_oif to treq->iif in inet6_csk_route_req(), the dst lookup in
these two functions is now identical, so tcp_v6_send_synack() can now
just call inet6_csk_route_req(), to reduce code duplication and keep
things closer to the IPv4 side, which is structured this way.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv6/tcp_ipv6.c |   29 ++++++-----------------------
 1 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 3b79e94..c221086 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -485,34 +485,17 @@ static int tcp_v6_send_synack(struct sock *sk,
 	struct inet6_request_sock *treq = inet6_rsk(req);
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct sk_buff * skb;
-	struct ipv6_txoptions *opt = NULL;
-	struct in6_addr * final_p, final;
+	struct ipv6_txoptions *opt = np->opt;
 	struct flowi6 fl6;
 	struct dst_entry *dst;
-	int err;
-
-	memset(&fl6, 0, sizeof(fl6));
-	fl6.flowi6_proto = IPPROTO_TCP;
-	fl6.daddr = treq->rmt_addr;
-	fl6.saddr = treq->loc_addr;
-	fl6.flowlabel = 0;
-	fl6.flowi6_oif = treq->iif;
-	fl6.flowi6_mark = sk->sk_mark;
-	fl6.fl6_dport = inet_rsk(req)->rmt_port;
-	fl6.fl6_sport = inet_rsk(req)->loc_port;
-	security_req_classify_flow(req, flowi6_to_flowi(&fl6));
-
-	opt = np->opt;
-	final_p = fl6_update_dst(&fl6, opt, &final);
+	int err = -ENOMEM;
 
-	dst = ip6_dst_lookup_flow(sk, &fl6, final_p, false);
-	if (IS_ERR(dst)) {
-		err = PTR_ERR(dst);
-		dst = NULL;
+	dst = inet6_csk_route_req(sk, &fl6, req);
+	if (!dst)
 		goto done;
-	}
+
 	skb = tcp_make_synack(sk, dst, req, rvp);
-	err = -ENOMEM;
+
 	if (skb) {
 		__tcp_v6_send_check(skb, &treq->loc_addr, &treq->rmt_addr);
 
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH net-next 1/4] tcp: fix inet6_csk_route_req() for link-local addresses
From: Neal Cardwell @ 2012-06-28 22:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet, Neal Cardwell

Fix inet6_csk_route_req() to use as the flowi6_oif the treq->iif,
which is correctly fixed up in tcp_v6_conn_request() to handle the
case of link-local addresses. This brings it in line with the
tcp_v6_send_synack() code, which is already correctly using the
treq->iif in this way.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv6/inet6_connection_sock.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index e6cee52..e23d354 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -68,7 +68,7 @@ struct dst_entry *inet6_csk_route_req(struct sock *sk,
 	fl6.daddr = treq->rmt_addr;
 	final_p = fl6_update_dst(&fl6, np->opt, &final);
 	fl6.saddr = treq->loc_addr;
-	fl6.flowi6_oif = sk->sk_bound_dev_if;
+	fl6.flowi6_oif = treq->iif;
 	fl6.flowi6_mark = sk->sk_mark;
 	fl6.fl6_dport = inet_rsk(req)->rmt_port;
 	fl6.fl6_sport = inet_rsk(req)->loc_port;
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH net-next 4/4] tcp: plug dst leak in tcp_v6_conn_request()
From: Neal Cardwell @ 2012-06-28 22:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eric Dumazet, Neal Cardwell
In-Reply-To: <1340922861-11684-1-git-send-email-ncardwell@google.com>

The code in tcp_v6_conn_request() was implicitly assuming that
tcp_v6_send_synack() would take care of dst_release(), much as
tcp_v4_send_synack() already does. This resulted in
tcp_v6_conn_request() leaking a dst if sysctl_tw_recycle is enabled.

This commit restructures tcp_v6_send_synack() so that it accepts a dst
pointer and takes care of releasing the dst that is passed in, to plug
the leak and avoid future surprises by bringing the IPv6 behavior in
line with the IPv4 side.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv6/tcp_ipv6.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c221086..5edabf6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -477,7 +477,8 @@ out:
 }
 
 
-static int tcp_v6_send_synack(struct sock *sk,
+static int tcp_v6_send_synack(struct sock *sk, struct dst_entry *dst,
+			      struct flowi6 *fl6,
 			      struct request_sock *req,
 			      struct request_values *rvp,
 			      u16 queue_mapping)
@@ -486,12 +487,10 @@ static int tcp_v6_send_synack(struct sock *sk,
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct sk_buff * skb;
 	struct ipv6_txoptions *opt = np->opt;
-	struct flowi6 fl6;
-	struct dst_entry *dst;
 	int err = -ENOMEM;
 
-	dst = inet6_csk_route_req(sk, &fl6, req);
-	if (!dst)
+	/* First, grab a route. */
+	if (!dst && (dst = inet6_csk_route_req(sk, fl6, req)) == NULL)
 		goto done;
 
 	skb = tcp_make_synack(sk, dst, req, rvp);
@@ -499,9 +498,9 @@ static int tcp_v6_send_synack(struct sock *sk,
 	if (skb) {
 		__tcp_v6_send_check(skb, &treq->loc_addr, &treq->rmt_addr);
 
-		fl6.daddr = treq->rmt_addr;
+		fl6->daddr = treq->rmt_addr;
 		skb_set_queue_mapping(skb, queue_mapping);
-		err = ip6_xmit(sk, skb, &fl6, opt, np->tclass);
+		err = ip6_xmit(sk, skb, fl6, opt, np->tclass);
 		err = net_xmit_eval(err);
 	}
 
@@ -514,8 +513,10 @@ done:
 static int tcp_v6_rtx_synack(struct sock *sk, struct request_sock *req,
 			     struct request_values *rvp)
 {
+	struct flowi6 fl6;
+
 	TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_RETRANSSEGS);
-	return tcp_v6_send_synack(sk, req, rvp, 0);
+	return tcp_v6_send_synack(sk, NULL, &fl6, req, rvp, 0);
 }
 
 static void tcp_v6_reqsk_destructor(struct request_sock *req)
@@ -1200,7 +1201,7 @@ have_isn:
 
 	security_inet_conn_request(sk, skb, req);
 
-	if (tcp_v6_send_synack(sk, req,
+	if (tcp_v6_send_synack(sk, dst, &fl6, req,
 			       (struct request_values *)&tmp_ext,
 			       skb_get_queue_mapping(skb)) ||
 	    want_cookie)
-- 
1.7.4.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox