* [RFC PATCH] af_packet: don't to defrag shared skb
From: Eric Leblond @ 2012-12-07 18:56 UTC (permalink / raw)
To: netdev; +Cc: Eric Leblond
This patch is adding a check on skb before trying to defrag the
packet for the hash computation in fanout mode. The goal of this
patch is to avoid an kernel crash in pskb_expand_head.
It appears that under some specific condition there is a shared
skb reaching the defrag code and this lead to a crash due to the
following code:
if (skb_shared(skb))
BUG();
I've observed this crash under the following condition:
1. a program is listening to an wifi interface (let say wlan0)
2. it is using fanout capture in flow load balancing mode
3. defrag option is on on the fanout socket
4. the interface disconnect (radio down for example)
5. the interface reconnect (radio switched up)
6. once reconnected a single packet is seen with skb->users=2
7. the kernel crash in pskb_expand_head at skbuff.c:1035
[BBB55:744364] [<ffffffff812a2761>] ? __pskb_pull_tail+0x43x0x26f
[BB8S5.744395] [<ffffffff812d29Tb>] ? ip_check_defrag+ox3a/0x14a
[BBB55.744422] [<ffffffffB1344459>] ? packet_rcv_fanout+ox5e/oxf9
[BBBS5.7444S0] [<ffffffffB12aaS9b>] ? __netif_receive_skb+ox444/ox4f9
[BBB55.T4447B] [<ffffffffB12aa?e1>] ? netif_receive_skb+ox6d/0x?3
[BBB55.T4447B] [<ffffffffB12aa?e1>] ? ieee80211_deliver_skb+0xbd/0xfa [mac80211]
[BBB55.T4447B] [<ffffffffB12aa?e1>] ? ieee80211_rx_h_data+0x1e0/0x21a [mac80211]
[BBB55.T4447B] [<ffffffffB12aa?e1>] ? ieee80211_rx_handlers+0x3d5/0x480 [mac80211]
[BBB55.T4447B] [<ffffffffB12aa?e1>] ? __wake_up
[BBB55.T4447B] [<ffffffffB12aa?e1>] ? evdev_eventr+0xc0/0xcf [evdev]
Signed-off-by: Eric Leblond <eric@regit.org>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index e639645..4b453f8 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1110,7 +1110,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
switch (f->type) {
case PACKET_FANOUT_HASH:
default:
- if (f->defrag) {
+ if (f->defrag && !skb_shared(skb)) {
skb = ip_check_defrag(skb, IP_DEFRAG_AF_PACKET);
if (!skb)
return 0;
--
1.7.10.4
^ permalink raw reply related
* pull request: wireless-next 2012-12-07
From: John W. Linville @ 2012-12-07 18:57 UTC (permalink / raw)
To: davem; +Cc: linux-wireless, netdev
[-- Attachment #1: Type: text/plain, Size: 18514 bytes --]
Dave,
This pull request is intended for 3.8...
This includes a Bluetooth pull. Gustavo says:
"A few more patches to 3.8, I hope they can still make it to mainline!
The most important ones are the socket option for the SCO protocol to allow
accept/refuse new connections from userspace. Other than that I added some
fixes and Andrei did more AMP work."
Also, a mac80211 pull. Johannes says:
"If you think there's any chance this might make it still, please pull my
mac80211-next tree (per below). This contains a relatively large number
of fixes to the previous code, as well as a few small features:
* VHT association in mac80211
* some new debugfs files
* P2P GO powersave configuration
* masked MAC address verification
The biggest patch is probably the BSS struct changes to use RCU for
their IE buffers to fix potential races. I've not tagged this for stable
because it's pretty invasive and nobody has ever seen any bugs in this
area as far as I know."
Several other drivers get some attention, including ath9k, brcmfmac,
brcmsmac, and a number of others. Also, Hauke gives us a series that
improves watchdog support for the bcma and ssb busses. Finally, Bill
Pemberton delivers a group of "remove __dev* attributes" for wireless
drivers -- these generate some "section mismatch" warnings, but Greg
K-H assures me that they will disappear by the time -rc1 is released.
This also includes a pull of the wireless tree to avoid merge
conflicts.
Please let me know if there are problems!
Thanks,
John
---
The following changes since commit fd3065b25b69ce345073bbd294a73343a608fd8b:
chelsio: remove get_clock and use ktime_get (2012-12-07 12:56:22 -0500)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next.git for-davem
for you to fetch changes up to 8024dc191025d6b981563236df02da5c0db0854d:
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem (2012-12-07 13:03:50 -0500)
----------------------------------------------------------------
Andrei Emeltchenko (6):
Bluetooth: Refactor l2cap_send_disconn_req
Bluetooth: AMP: Mark controller radio powered down after HCIDEVDOWN
Bluetooth: AMP: Check that AMP is present and active
Bluetooth: Fix missing L2CAP EWS Conf parameter
Bluetooth: Process receiving FCS_NONE in L2CAP Conf Rsp
Bluetooth: trivial: Change NO_FCS_RECV to RECV_NO_FCS
Antonio Quartulli (1):
mac80211: allow userspace registration for probe requests in IBSS
Arend van Spriel (3):
brcm80211: update the MAINTAINERS file
brcmfmac: get rid of struct brcmf_cfg80211_info::link_up attribute
brcmfmac: remove mode from struct brcmf_cfg80211_conf
Avinash Patil (1):
mwifiex: advertise GreenField, 40MHz intolerance support to cfg80211
Bill Pemberton (18):
rfkill: remove __dev* attributes
wireless: remove __dev* attributes
ath5k: remove __dev* attributes
atmel: remove __dev* attributes
b43: remove __dev* attributes
brcm80211: remove __dev* attributes
ipw2x00: remove __dev* attributes
iwlegacy: remove __dev* attributes
iwlwifi: remove __dev* attributes
libertas: remove __dev* attributes
mwl8k: remove __dev* attributes
orinoco: remove __dev* attributes
p54: remove __dev* attributes
rt2x00: remove __dev* attributes
rtl8187: remove __dev* attributes
rtl8187: remove __dev* attributes
wlcore/wl18xx/wl12xx: remove __dev* attributes
rtlwifi: remove __dev* attributes
Christian Lamparter (2):
carl9170: fix signal strength reporting issues
carl9170: explain why sta cannot be NULL for ampdus
Dan Carpenter (2):
mac80211: fix potential NULL dereference
p54: potential signedness issue in p54_parse_rssical()
Daniel Stamer (1):
rtlwifi: rtl8192se: Fixed coding style issues in the driver
Emmanuel Grumbach (1):
iwlwifi: fix the basic CCK rates calculation
Florian Fainelli (1):
mwl8k: remove useless pci shutdown callback and stray debugging
Frédéric Dalleau (2):
Bluetooth: Add BT_DEFER_SETUP option to sco socket
Bluetooth: Implement deferred sco socket setup
Gabor Juhos (4):
rt2x00: rt2800lib: fix indentation of some rt2x00_rt calls
rt2x00: rt2800lib: fix indentation in rt2800_init_rfcsr
rt2x00: rt2800lib: remove trailing semicolons from RFCSR3_* defines
rt2x00: rt2800lib: introduce RFCSR3_VCOCAL_EN
Gustavo Padovan (4):
Bluetooth: Add missing lock nesting notation
Bluetooth: cancel power_on work when unregistering the device
Bluetooth: Move double negation to macros
Revert "Bluetooth: Fix possible deadlock in SCO code"
Hante Meuleman (2):
brcmfmac: fix bug in setting mgmt ie and parsing vndrs ie.
brcmfmac: change debug output for received event.
Hauke Mehrtens (16):
bcma: handle return value of pci_assign_resource
ssb: extif: fix compile errors
ath9k: use SIMPLE_DEV_PM_OPS
p54pci: use SIMPLE_DEV_PM_OPS
rtlwifi: use SIMPLE_DEV_PM_OPS
ssb/bcma: add common header for watchdog
bcma: add bcma_chipco_alp_clock
bcma: set the pmu watchdog if available
bcma: add methods for watchdog driver
bcma: register watchdog driver
ssb: get alp clock from devices with PMU
ssb: set the PMU watchdog if available
ssb: add methods for watchdog driver
ssb: extif: add check for max value before setting watchdog register
ssb: extif: add methods for watchdog driver
ssb: register watchdog driver
Helmut Schaa (4):
mac80211: reject setting masked mac addresses
mac80211: don't reinit rate control when mesh sta exists
rt2x00: Use addr_mask to disallow invalid MAC addresses in mutli-bssid mode
rt2x00: Only specify interface combinations if more then one interface is possible
Johannes Berg (15):
mac80211: fix remain-on-channel (non-)cancelling
cfg80211: rework chandef checking and export it
mac80211: support VHT association
mac80211: support (partial) VHT radiotap information
nl80211: support P2P GO powersave configuration
mac80211: support P2P GO powersave configuration
nl80211: remove unnecessary checks
cfg80211: don't BUG_ON BSS struct issues
cfg80211: fix whitespace in scan handling
cfg80211: fix cmp_hidden_bss
mac80211: make ieee80211_build_preq_ies safer
mac80211: remove probe response temporary buffer allocation
cfg80211: fix BSS struct IE access races
mac80211: simplify loop in minstrel_ht
mwifiex: fix struct member mismatch
John W. Linville (5):
Merge branch 'for-john' of git://git.kernel.org/.../iwlwifi/iwlwifi-fixes
Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth-next
Merge tag 'nfc-fixes-3.7-2' of git://git.kernel.org/.../sameo/nfc-3.0
Merge branch 'for-john' of git://git.kernel.org/.../jberg/mac80211-next
Merge branch 'master' of git://git.kernel.org/.../linville/wireless-next into for-davem
Luis R. Rodriguez (1):
brcmfmac: convert struct spinlock to spinlock_t
Mahesh Palivela (1):
cfg80211: Remove unused VHT chan code
Marco Porsch (2):
mac80211: fix for mesh sync to indicate TBTT adjustment
cfg80211: fix channel error on mesh join
Masanari Iida (1):
wireless: mwifiex: Fix typo in wireless/mwifiex driver
Piotr Haber (6):
brcmsmac: handle packet drop during transmit correctly
brcmsmac: cleanup in isr code
brcmsmac: fix bounds checking in tx/rx
brcmsmac: hardware info in debugfs
brcmsmac: move PHY functions
brcmsmac: support 4313iPA
Rajkumar Manoharan (1):
ath9k: Fix buffer overflow error
Saravana (4):
mac80211: add debugfs file for last ack signal
mac80211: add debugfs file for current tx rate
mac80211: re-organize the rx rate calculation logic
mac80211: add debugfs file for last rx rate
Seth Forshee (1):
brcmsmac: Fix possible NULL pointer dereference in _dma_ctrlflags()
Simon Wunderlich (3):
nl80211: Fix HT_IBSS feature check in ibss_join
mac80211: Fix typo in mac80211.h
mac80211: return if CSA is not handle
Sujith Manoharan (4):
ath9k: Move ethtool functions to debug.c
ath9k: Replace WME_NUM_TID with IEEE80211_NUM_TIDS
ath9k: Implement sta_add_debugfs/sta_remove_debugfs
ath9k: Remove redundant NULL assignment
Vladimir Kondratiev (1):
wireless: allow Atheros card to not depend on ath.ko
Waldemar Rymarkiewicz (1):
NFC: Fix incorrect llcp pointer dereference
Wei Yongjun (2):
brcmsmac: remove duplicated include from debug.c
ipw2200: return error code on error in ipw_wx_get_auth()
MAINTAINERS | 3 +-
drivers/bcma/bcma_private.h | 2 +
drivers/bcma/driver_chipcommon.c | 114 ++++-
drivers/bcma/driver_pci_host.c | 10 +-
drivers/bcma/main.c | 8 +
drivers/net/wireless/adm8211.c | 6 +-
drivers/net/wireless/airo.c | 6 +-
drivers/net/wireless/ath/Kconfig | 7 +-
drivers/net/wireless/ath/ar5523/Kconfig | 1 +
drivers/net/wireless/ath/ath5k/Kconfig | 1 +
drivers/net/wireless/ath/ath5k/base.c | 4 +-
drivers/net/wireless/ath/ath5k/led.c | 2 +-
drivers/net/wireless/ath/ath5k/pci.c | 6 +-
drivers/net/wireless/ath/ath9k/Kconfig | 1 +
drivers/net/wireless/ath/ath9k/ath9k.h | 7 +-
drivers/net/wireless/ath/ath9k/common.h | 1 -
drivers/net/wireless/ath/ath9k/debug.c | 209 +++++++++
drivers/net/wireless/ath/ath9k/debug.h | 17 +-
drivers/net/wireless/ath/ath9k/main.c | 137 +-----
drivers/net/wireless/ath/ath9k/mci.c | 3 +-
drivers/net/wireless/ath/ath9k/pci.c | 15 +-
drivers/net/wireless/ath/ath9k/xmit.c | 9 +-
drivers/net/wireless/ath/carl9170/Kconfig | 1 +
drivers/net/wireless/ath/carl9170/rx.c | 2 +
drivers/net/wireless/ath/carl9170/tx.c | 7 +
drivers/net/wireless/atmel_pci.c | 6 +-
drivers/net/wireless/b43/pcmcia.c | 6 +-
drivers/net/wireless/b43/sdio.c | 6 +-
drivers/net/wireless/brcm80211/brcmfmac/fweh.c | 4 +-
drivers/net/wireless/brcm80211/brcmfmac/fweh.h | 2 +-
.../net/wireless/brcm80211/brcmfmac/wl_cfg80211.c | 141 +++---
.../net/wireless/brcm80211/brcmfmac/wl_cfg80211.h | 3 -
drivers/net/wireless/brcm80211/brcmsmac/debug.c | 112 +++++
drivers/net/wireless/brcm80211/brcmsmac/debug.h | 23 +
drivers/net/wireless/brcm80211/brcmsmac/dma.c | 4 +-
.../net/wireless/brcm80211/brcmsmac/mac80211_if.c | 33 +-
drivers/net/wireless/brcm80211/brcmsmac/main.c | 61 +--
.../net/wireless/brcm80211/brcmsmac/phy/phy_lcn.c | 471 +++++++++++++--------
.../wireless/brcm80211/brcmsmac/phy/phytbl_lcn.c | 64 +--
drivers/net/wireless/brcm80211/brcmsmac/pub.h | 5 +-
drivers/net/wireless/ipw2x00/ipw2100.c | 4 +-
drivers/net/wireless/ipw2x00/ipw2200.c | 12 +-
drivers/net/wireless/iwlegacy/3945-mac.c | 4 +-
drivers/net/wireless/iwlegacy/4965-mac.c | 4 +-
drivers/net/wireless/iwlwifi/dvm/rxon.c | 12 +-
drivers/net/wireless/iwlwifi/pcie/drv.c | 4 +-
drivers/net/wireless/libertas/cfg.c | 9 +-
drivers/net/wireless/libertas/if_spi.c | 6 +-
drivers/net/wireless/mac80211_hwsim.c | 7 +-
drivers/net/wireless/mwifiex/cfg80211.c | 15 +
drivers/net/wireless/mwifiex/fw.h | 2 +
drivers/net/wireless/mwifiex/sta_ioctl.c | 36 +-
drivers/net/wireless/mwifiex/usb.c | 2 +-
drivers/net/wireless/mwl8k.c | 14 +-
drivers/net/wireless/orinoco/orinoco_nortel.c | 4 +-
drivers/net/wireless/orinoco/orinoco_pci.c | 4 +-
drivers/net/wireless/orinoco/orinoco_plx.c | 4 +-
drivers/net/wireless/orinoco/orinoco_tmd.c | 4 +-
drivers/net/wireless/p54/eeprom.c | 5 +-
drivers/net/wireless/p54/p54pci.c | 19 +-
drivers/net/wireless/p54/p54spi.c | 6 +-
drivers/net/wireless/p54/p54usb.c | 6 +-
drivers/net/wireless/rt2x00/rt2400pci.c | 2 +-
drivers/net/wireless/rt2x00/rt2500pci.c | 2 +-
drivers/net/wireless/rt2x00/rt2800.h | 6 +-
drivers/net/wireless/rt2x00/rt2800lib.c | 142 +++----
drivers/net/wireless/rt2x00/rt2800pci.c | 4 +-
drivers/net/wireless/rt2x00/rt2x00dev.c | 10 +
drivers/net/wireless/rt2x00/rt61pci.c | 2 +-
drivers/net/wireless/rtl818x/rtl8180/dev.c | 6 +-
drivers/net/wireless/rtl818x/rtl8187/dev.c | 6 +-
drivers/net/wireless/rtlwifi/pci.c | 4 +-
drivers/net/wireless/rtlwifi/pci.h | 4 +-
drivers/net/wireless/rtlwifi/rtl8192ce/sw.c | 9 +-
drivers/net/wireless/rtlwifi/rtl8192de/sw.c | 11 +-
drivers/net/wireless/rtlwifi/rtl8192se/def.h | 3 +-
drivers/net/wireless/rtlwifi/rtl8192se/dm.c | 20 +-
drivers/net/wireless/rtlwifi/rtl8192se/hw.c | 3 +-
drivers/net/wireless/rtlwifi/rtl8192se/hw.h | 2 +-
drivers/net/wireless/rtlwifi/rtl8192se/sw.c | 20 +-
drivers/net/wireless/rtlwifi/rtl8723ae/sw.c | 9 +-
drivers/net/wireless/rtlwifi/usb.c | 2 +-
drivers/net/wireless/rtlwifi/usb.h | 2 +-
drivers/net/wireless/ti/wl1251/main.c | 4 +-
drivers/net/wireless/ti/wl1251/sdio.c | 4 +-
drivers/net/wireless/ti/wl1251/spi.c | 6 +-
drivers/net/wireless/ti/wl12xx/main.c | 6 +-
drivers/net/wireless/ti/wl18xx/main.c | 6 +-
drivers/net/wireless/ti/wlcore/cmd.c | 4 +-
drivers/net/wireless/ti/wlcore/main.c | 4 +-
drivers/net/wireless/ti/wlcore/sdio.c | 8 +-
drivers/net/wireless/ti/wlcore/spi.c | 6 +-
drivers/net/wireless/ti/wlcore/wlcore.h | 4 +-
drivers/ssb/driver_chipcommon.c | 100 ++++-
drivers/ssb/driver_chipcommon_pmu.c | 27 ++
drivers/ssb/driver_extif.c | 24 +-
drivers/ssb/driver_mipscore.c | 14 +-
drivers/ssb/embedded.c | 35 ++
drivers/ssb/main.c | 8 +
drivers/ssb/ssb_private.h | 31 ++
include/linux/bcm47xx_wdt.h | 19 +
include/linux/bcma/bcma_driver_chipcommon.h | 7 +-
include/linux/ieee80211.h | 15 +
include/linux/ssb/ssb.h | 2 +
include/linux/ssb/ssb_driver_chipcommon.h | 5 +-
include/linux/ssb/ssb_driver_extif.h | 52 ++-
include/net/bluetooth/hci_core.h | 31 +-
include/net/bluetooth/l2cap.h | 2 +-
include/net/cfg80211.h | 69 ++-
include/net/ieee80211_radiotap.h | 24 ++
include/net/mac80211.h | 12 +-
include/uapi/linux/nl80211.h | 16 +
net/bluetooth/hci_core.c | 5 +
net/bluetooth/hci_event.c | 56 ++-
net/bluetooth/l2cap_core.c | 100 ++---
net/bluetooth/mgmt.c | 10 +-
net/bluetooth/rfcomm/sock.c | 4 +-
net/bluetooth/sco.c | 86 +++-
net/mac80211/cfg.c | 77 ++--
net/mac80211/debugfs_sta.c | 36 ++
net/mac80211/ieee80211_i.h | 5 +-
net/mac80211/iface.c | 45 ++
net/mac80211/main.c | 5 +-
net/mac80211/mesh_plink.c | 3 +-
net/mac80211/mesh_sync.c | 4 +
net/mac80211/mlme.c | 431 +++++++++++++++----
net/mac80211/offchannel.c | 2 -
net/mac80211/rc80211_minstrel_ht.c | 8 +-
net/mac80211/rx.c | 42 +-
net/mac80211/scan.c | 21 +-
net/mac80211/sta_info.h | 4 +
net/mac80211/status.c | 3 +
net/mac80211/tx.c | 9 +-
net/mac80211/util.c | 58 ++-
net/nfc/llcp/llcp.c | 5 +-
net/rfkill/rfkill-gpio.c | 2 +-
net/rfkill/rfkill-regulator.c | 6 +-
net/wireless/chan.c | 92 ++--
net/wireless/core.h | 8 -
net/wireless/mesh.c | 3 +-
net/wireless/nl80211.c | 155 +++----
net/wireless/reg.c | 2 +-
net/wireless/reg.h | 2 +-
net/wireless/scan.c | 452 ++++++++++----------
net/wireless/sme.c | 13 +-
net/wireless/util.c | 9 +-
net/wireless/wext-sme.c | 8 +-
147 files changed, 2860 insertions(+), 1428 deletions(-)
create mode 100644 include/linux/bcm47xx_wdt.h
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply
* Re: pull request: wireless-next 2012-12-07
From: David Miller @ 2012-12-07 19:10 UTC (permalink / raw)
To: linville; +Cc: linux-wireless, netdev
In-Reply-To: <20121207185754.GF1869@tuxdriver.com>
From: "John W. Linville" <linville@tuxdriver.com>
Date: Fri, 7 Dec 2012 13:57:54 -0500
> This includes a Bluetooth pull. Gustavo says:
>
> "A few more patches to 3.8, I hope they can still make it to mainline!
> The most important ones are the socket option for the SCO protocol to allow
> accept/refuse new connections from userspace. Other than that I added some
> fixes and Andrei did more AMP work."
>
> Also, a mac80211 pull. Johannes says:
>
> "If you think there's any chance this might make it still, please pull my
> mac80211-next tree (per below). This contains a relatively large number
> of fixes to the previous code, as well as a few small features:
> * VHT association in mac80211
> * some new debugfs files
> * P2P GO powersave configuration
> * masked MAC address verification
>
> The biggest patch is probably the BSS struct changes to use RCU for
> their IE buffers to fix potential races. I've not tagged this for stable
> because it's pretty invasive and nobody has ever seen any bugs in this
> area as far as I know."
>
> Several other drivers get some attention, including ath9k, brcmfmac,
> brcmsmac, and a number of others. Also, Hauke gives us a series that
> improves watchdog support for the bcma and ssb busses. Finally, Bill
> Pemberton delivers a group of "remove __dev* attributes" for wireless
> drivers -- these generate some "section mismatch" warnings, but Greg
> K-H assures me that they will disappear by the time -rc1 is released.
>
> This also includes a pull of the wireless tree to avoid merge
> conflicts.
Pulled, thanks John.
^ permalink raw reply
* Re: [RFC PATCH] af_packet: don't to defrag shared skb
From: David Miller @ 2012-12-07 19:10 UTC (permalink / raw)
To: eric; +Cc: netdev
In-Reply-To: <1354906561-4695-1-git-send-email-eric@regit.org>
From: Eric Leblond <eric@regit.org>
Date: Fri, 7 Dec 2012 19:56:01 +0100
> This patch is adding a check on skb before trying to defrag the
> packet for the hash computation in fanout mode. The goal of this
> patch is to avoid an kernel crash in pskb_expand_head.
> It appears that under some specific condition there is a shared
> skb reaching the defrag code and this lead to a crash due to the
> following code:
>
> if (skb_shared(skb))
> BUG();
>
> I've observed this crash under the following condition:
> 1. a program is listening to an wifi interface (let say wlan0)
> 2. it is using fanout capture in flow load balancing mode
> 3. defrag option is on on the fanout socket
> 4. the interface disconnect (radio down for example)
> 5. the interface reconnect (radio switched up)
> 6. once reconnected a single packet is seen with skb->users=2
> 7. the kernel crash in pskb_expand_head at skbuff.c:1035
...
> Signed-off-by: Eric Leblond <eric@regit.org>
Thanks Eric. I'll try to figure out if we should instead
change the wireless code to avoid sending shared SKBs into
the input path like that.
^ permalink raw reply
* Re: [patch v2] bridge: make buffer larger in br_setlink()
From: walter harms @ 2012-12-07 19:15 UTC (permalink / raw)
To: Dan Carpenter
Cc: Stephen Hemminger, David S. Miller, bridge, netdev,
kernel-janitors, Thomas Graf
In-Reply-To: <20121207185359.GP22569@mwanda>
Am 07.12.2012 19:53, schrieb Dan Carpenter:
> On Fri, Dec 07, 2012 at 05:07:24PM +0100, walter harms wrote:
>>
>>
>> Am 07.12.2012 12:10, schrieb Dan Carpenter:
>>> We pass IFLA_BRPORT_MAX to nla_parse_nested() so we need
>>> IFLA_BRPORT_MAX + 1 elements. Also Smatch complains that we read past
>>> the end of the array when in br_set_port_flag() when it's called with
>>> IFLA_BRPORT_FAST_LEAVE.
>>>
>>
>>
>>
>> I have no clue why nla_parse_nested() need IFLA_BRPORT_MAX elements.
>> but the majory of loop look like
>> for(i=0;i<max;++)
>> most programmers will think this way.
>> So it seems the place to fix is nla_parse_nested().
>> doing not so is asking for trouble (in the long run).
>> At least this function needs a big warning label that (max-1)
>> is actually needed.
>>
>
> Yeah, nla_parse_nested() is actually documented already.
>
documenting unexspected behavier is not as much helpfull as changing it.
just my 2 cents,
wh
^ permalink raw reply
* Re: [PATCH 0/2 v2] sctp: RCU protection when accessing assoc members for procfs
From: David Miller @ 2012-12-07 19:15 UTC (permalink / raw)
To: tgraf; +Cc: linux-sctp, netdev, vyasevich, nhorman
In-Reply-To: <cover.1354821623.git.tgraf@suug.ch>
From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 6 Dec 2012 20:25:03 +0100
> @Dave: This is a respin of the following patches:
> [PATCH] sctp: proc: protect bind_addr->address_list accesses with rcu_read_lock()
> [PATCH] sctp: Add RCU protection to assoc->transport_addr_list
>
> Thomas Graf (2):
> sctp: proc: protect bind_addr->address_list accesses with
> rcu_read_lock()
> sctp: Add RCU protection to assoc->transport_addr_list
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH net-next] rps: overflow prevention for saturated cpus
From: David Miller @ 2012-12-07 19:20 UTC (permalink / raw)
To: willemb; +Cc: netdev, edumazet, therbert
In-Reply-To: <1354826194-9289-1-git-send-email-willemb@google.com>
From: Willem de Bruijn <willemb@google.com>
Date: Thu, 6 Dec 2012 15:36:34 -0500
> This patch maintains flow affinity in normal conditions, but
> trades it for throughput when a cpu becomes saturated. Then, packets
> destined to that cpu (only) are redirected to the lightest loaded cpu
> in the rxqueue's rps_map. This breaks flow affinity under high load
> for some flows, in favor of processing packets up to the capacity
> of the complete rps_map cpuset in all circumstances.
We specifically built-in very strict checks to make sure we never
deliver packets out-of-order. Those mechanisms must be used and
enforced in any change of this nature.
^ permalink raw reply
* Re: [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets
From: Neil Horman @ 2012-12-07 19:20 UTC (permalink / raw)
To: Paul Gortmaker; +Cc: David Miller, netdev, Jon Maloy, Ying Xue
In-Reply-To: <1354890498-6448-4-git-send-email-paul.gortmaker@windriver.com>
On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
> From: Ying Xue <ying.xue@windriver.com>
>
> The sk_receive_queue limit control is currently performed for
> all arriving messages, disregarding socket and message type.
> But for connected sockets this check is redundant, since the protocol
> flow control already makes queue overflow impossible.
>
Can you explain where that occurs? I see where the tipc dispatch function calls
sk_add_backlog, which checks the per socket recieve queue (regardless of weather
the receiving socket is connection oriented or connectionless), but if the
receiver doesn't call receive very often, This just adds a check against your
global limit, doing nothing for your per-socket limits. In fact it seems to
repeat the same check twice, as in the worst case of the incomming message being
TIPC_LOW_IMPORTANCE, its just going to check that the global limit is exactly
OVERLOAD_LIMIT_BASE/2 again.
Neil
^ permalink raw reply
* Re: [PATCH] drivers/net: fix up function prototypes after __dev* removals
From: David Miller @ 2012-12-07 19:22 UTC (permalink / raw)
To: gregkh; +Cc: netdev, wfp5p
In-Reply-To: <20121207003056.GA30167@kroah.com>
From: Greg KH <gregkh@linuxfoundation.org>
Date: Thu, 6 Dec 2012 16:30:56 -0800
> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> The __dev* removal patches for the network drivers ended up messing up
> the function prototypes for a bunch of drivers. This patch fixes all of
> them back up to be properly aligned.
>
> Bonus is that this almost removes 100 lines of code, always a nice
> surprise.
>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Applied, thanks Greg.
^ permalink raw reply
* Re: [PATCH v3 1/4] net: Add support for hardware-offloaded encapsulation
From: David Miller @ 2012-12-07 19:28 UTC (permalink / raw)
To: joseph.gasparakis
Cc: bhutchings, alexander.h.duyck, shemminger, chrisw, gospo, netdev,
linux-kernel, dmitry, saeed.bishara, peter.p.waskiewicz.jr
In-Reply-To: <alpine.LFD.2.02.1212071013200.30243@morpheus.jf.intel.com>
From: Joseph Gasparakis <joseph.gasparakis@intel.com>
Date: Fri, 7 Dec 2012 10:24:17 -0800 (PST)
> So the idea here is that the driver will use the headers for checksumming
> if the skb->encapsulation bit is on. The bit should be set in the protocol
> driver.
>
> To answer the second comment, the flags that we use in this series of
> patches is NETIF_F_IP_CSUM, NETIF_F_IPV6_CSUM and NETIF_F_SG. These are
> the bits that we propose will be used for checksumming of encapsulation.
> As per a previous comment in v2, the hw_enc_features field should be used
> also in the future when NICs have more encap offloads, so one could
> indicate these features there from the driver.
>
> Furthermore, I submitted a patch for Rx checksumming, where NETIF_F_RXCSUM
> is used, again in conjunction with skb->encapsulation flag. As I mention
> in my logs, the driver is expected to set the ip_summed to UNNECESSARY and
> turn the skb->encapsulation on, to indicate that the inner headers are
> already HW checksummed.
>
This is the kind of language that belongs in the commit message and
code comments.
^ permalink raw reply
* Re: [PATCH net-next 2/2] net: doc: add default value for neighbour parameters
From: David Miller @ 2012-12-07 19:31 UTC (permalink / raw)
To: shanwei88; +Cc: bhutchings, eric.dumazet, netdev
In-Reply-To: <50C15427.9060803@gmail.com>
From: Shan Wei <shanwei88@gmail.com>
Date: Fri, 07 Dec 2012 10:27:51 +0800
> [PATCH net-next] net: doc : use more suitable word 'unexpected' to replace 'secluded'
>
> 'secluded' is used to describe places, not suitable here.
>
> Suggested-by: Ben Hutchings <bhutchings@solarflare.com>
> Signed-off-by: Shan Wei <davidshan@tencent.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next v5] bridge: export multicast database via netlink
From: David Miller @ 2012-12-07 19:33 UTC (permalink / raw)
To: tgraf; +Cc: amwang, netdev, bridge, herbert, brouer, shemminger
In-Reply-To: <20121207103629.GB2996@casper.infradead.org>
From: Thomas Graf <tgraf@suug.ch>
Date: Fri, 7 Dec 2012 10:36:29 +0000
> On 12/07/12 at 06:04pm, Cong Wang wrote:
>> From: Cong Wang <amwang@redhat.com>
>>
>> V5: fix two bugs pointed out by Thomas
>> remove seq check for now, mark it as TODO
>>
>> V4: remove some useless #include
>> some coding style fix
>>
>> V3: drop debugging printk's
>> update selinux perm table as well
>>
>> V2: drop patch 1/2, export ifindex directly
>> Redesign netlink attributes
>> Improve netlink seq check
>> Handle IPv6 addr as well
>>
>> This patch exports bridge multicast database via netlink
>> message type RTM_GETMDB. Similar to fdb, but currently bridge-specific.
>> We may need to support modify multicast database too (RTM_{ADD,DEL}MDB).
>>
>> (Thanks to Thomas for patient reviews)
>>
>> Cc: Herbert Xu <herbert@gondor.apana.org.au>
>> Cc: Stephen Hemminger <shemminger@vyatta.com>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Thomas Graf <tgraf@suug.ch>
>> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>> Signed-off-by: Cong Wang <amwang@redhat.com>
>
> Acked-by: Thomas Graf <tgraf@suug.ch>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH v3 1/4] net: Add support for hardware-offloaded encapsulation
From: David Miller @ 2012-12-07 19:37 UTC (permalink / raw)
To: joseph.gasparakis
Cc: bhutchings, alexander.h.duyck, shemminger, chrisw, gospo, netdev,
linux-kernel, dmitry, saeed.bishara, peter.p.waskiewicz.jr
In-Reply-To: <alpine.LFD.2.02.1212071139190.4985@morpheus.jf.intel.com>
From: Joseph Gasparakis <joseph.gasparakis@intel.com>
Date: Fri, 7 Dec 2012 11:41:46 -0800 (PST)
>
>
> On Fri, 7 Dec 2012, David Miller wrote:
>
>> From: Joseph Gasparakis <joseph.gasparakis@intel.com>
>> Date: Fri, 7 Dec 2012 10:24:17 -0800 (PST)
>>
>> > So the idea here is that the driver will use the headers for checksumming
>> > if the skb->encapsulation bit is on. The bit should be set in the protocol
>> > driver.
>> >
>> > To answer the second comment, the flags that we use in this series of
>> > patches is NETIF_F_IP_CSUM, NETIF_F_IPV6_CSUM and NETIF_F_SG. These are
>> > the bits that we propose will be used for checksumming of encapsulation.
>> > As per a previous comment in v2, the hw_enc_features field should be used
>> > also in the future when NICs have more encap offloads, so one could
>> > indicate these features there from the driver.
>> >
>> > Furthermore, I submitted a patch for Rx checksumming, where NETIF_F_RXCSUM
>> > is used, again in conjunction with skb->encapsulation flag. As I mention
>> > in my logs, the driver is expected to set the ip_summed to UNNECESSARY and
>> > turn the skb->encapsulation on, to indicate that the inner headers are
>> > already HW checksummed.
>> >
>>
>> This is the kind of language that belongs in the commit message and
>> code comments.
>>
> Sure. I'll wait to gather some more feedback if there is any and I will
> re-spin this off adding more code comments and clarify this in the logs.
Great.
Please note that this request applies to your receive side change too.
^ permalink raw reply
* Re: [patch v2] bridge: make buffer larger in br_setlink()
From: David Miller @ 2012-12-07 19:40 UTC (permalink / raw)
To: dan.carpenter; +Cc: tgraf, netdev, shemminger, bridge, kernel-janitors
In-Reply-To: <20121207111045.GA9676@elgon.mountain>
From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Fri, 7 Dec 2012 14:10:46 +0300
> We pass IFLA_BRPORT_MAX to nla_parse_nested() so we need
> IFLA_BRPORT_MAX + 1 elements. Also Smatch complains that we read past
> the end of the array when in br_set_port_flag() when it's called with
> IFLA_BRPORT_FAST_LEAVE.
>
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Applied.
^ permalink raw reply
* Re: [PATCH] tcp: bug fix Fast Open client retransmission
From: David Miller @ 2012-12-07 19:41 UTC (permalink / raw)
To: ncardwell; +Cc: ycheng, nanditad, edumazet, netdev
In-Reply-To: <CADVnQymE7ubywMytDCabQ=3gMWkp=gVrHZke0HpiufQNbF=1KQ@mail.gmail.com>
From: Neal Cardwell <ncardwell@google.com>
Date: Thu, 6 Dec 2012 14:50:58 -0500
> On Thu, Dec 6, 2012 at 1:45 PM, Yuchung Cheng <ycheng@google.com> wrote:
>> If SYN-ACK partially acks SYN-data, the client retransmits the
>> remaining data by tcp_retransmit_skb(). This increments lost recovery
>> state variables like tp->retrans_out in Open state. If loss recovery
>> happens before the retransmission is acked, it triggers the WARN_ON
>> check in tcp_fastretrans_alert(). For example: the client sends
>> SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends
>> another 4 data packets and get 3 dupacks.
>>
>> Since the retransmission is not caused by network drop it should not
>> update the recovery state variables. Further the server may return a
>> smaller MSS than the cached MSS used for SYN-data, so the retranmission
>> needs a loop. Otherwise some data will not be retransmitted until timeout
>> or other loss recovery events.
>>
>> Signed-off-by: Yuchung Cheng <ycheng@google.com>
>> ---
>
> Acked-by: Neal Cardwell <ncardwell@google.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: gro: fix possible panic in skb_gro_receive()
From: David Miller @ 2012-12-07 19:41 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1354838099.10983.10.camel@edumazet-glaptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 06 Dec 2012 15:54:59 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> commit 2e71a6f8084e (net: gro: selective flush of packets) added
> a bug for skbs using frag_list. This part of the GRO stack is rarely
> used, as it needs skb not using a page fragment for their skb->head.
>
> Most drivers do use a page fragment, but some of them use GFP_KERNEL
> allocations for the initial fill of their RX ring buffer.
>
> napi_gro_flush() overwrite skb->prev that was used for these skb to
> point to the last skb in frag_list.
>
> Fix this using a separate field in struct napi_gro_cb to point to the
> last fragment.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks Eric.
^ permalink raw reply
* Re: [PATCH v3 1/4] net: Add support for hardware-offloaded encapsulation
From: Joseph Gasparakis @ 2012-12-07 19:41 UTC (permalink / raw)
To: David Miller
Cc: joseph.gasparakis, bhutchings, alexander.h.duyck, shemminger,
chrisw, gospo, netdev, linux-kernel, dmitry, saeed.bishara,
peter.p.waskiewicz.jr
In-Reply-To: <20121207.142845.1145515122357569641.davem@davemloft.net>
On Fri, 7 Dec 2012, David Miller wrote:
> From: Joseph Gasparakis <joseph.gasparakis@intel.com>
> Date: Fri, 7 Dec 2012 10:24:17 -0800 (PST)
>
> > So the idea here is that the driver will use the headers for checksumming
> > if the skb->encapsulation bit is on. The bit should be set in the protocol
> > driver.
> >
> > To answer the second comment, the flags that we use in this series of
> > patches is NETIF_F_IP_CSUM, NETIF_F_IPV6_CSUM and NETIF_F_SG. These are
> > the bits that we propose will be used for checksumming of encapsulation.
> > As per a previous comment in v2, the hw_enc_features field should be used
> > also in the future when NICs have more encap offloads, so one could
> > indicate these features there from the driver.
> >
> > Furthermore, I submitted a patch for Rx checksumming, where NETIF_F_RXCSUM
> > is used, again in conjunction with skb->encapsulation flag. As I mention
> > in my logs, the driver is expected to set the ip_summed to UNNECESSARY and
> > turn the skb->encapsulation on, to indicate that the inner headers are
> > already HW checksummed.
> >
>
> This is the kind of language that belongs in the commit message and
> code comments.
>
Sure. I'll wait to gather some more feedback if there is any and I will
re-spin this off adding more code comments and clarify this in the logs.
^ permalink raw reply
* Re: [PATCH net-next] bonding: Fix check for ethtool get_link operation support
From: David Miller @ 2012-12-07 19:41 UTC (permalink / raw)
To: bhutchings; +Cc: fubar, andy, netdev
In-Reply-To: <1354896932.2707.5.camel@bwh-desktop.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 7 Dec 2012 16:15:32 +0000
> Since commit 2c60db037034 ('net: provide a default dev->ethtool_ops')
> all devices have a non-null ethtool_ops. Test only
> dev->ethtool_ops->get_link in both places where we care.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next 10/10] tipc: refactor accept() code for improved readability
From: Neil Horman @ 2012-12-07 19:42 UTC (permalink / raw)
To: Paul Gortmaker; +Cc: David Miller, netdev, Jon Maloy
In-Reply-To: <1354890498-6448-11-git-send-email-paul.gortmaker@windriver.com>
On Fri, Dec 07, 2012 at 09:28:18AM -0500, Paul Gortmaker wrote:
> In TIPC's accept() routine, there is a large block of code relating
> to initialization of a new socket, all within an if condition checking
> if the allocation succeeded.
>
> Here, we simply factor out that init code within the accept() function
> to its own separate function, which improves readability, and makes
> it easier to track which lock_sock() calls are operating on existing
> vs. new sockets.
>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
> net/tipc/socket.c | 93 +++++++++++++++++++++++++++++--------------------------
> 1 file changed, 49 insertions(+), 44 deletions(-)
>
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 38613cf..56661c8 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1507,6 +1507,53 @@ static int listen(struct socket *sock, int len)
> return res;
> }
>
> +static void tipc_init_socket(struct sock *sk, struct socket *new_sock,
> + struct sk_buff *buf)
> +{
Can you rename this to something more specific to its purpose? tipc_init_socket
makes me wonder why you're not calling it internally from tipc_create. This
seems more like a tipc_init_accept_sock, or some such. Alternatively, since
you're ony using it from your accept call, you might consider not factoring it
out at all, and just reversing the logic in your accept function so that you do:
res = tipc_create(...)
if (res)
goto exit;
<rest of tipc_init_socket goes here>
That gives you the same level of readability, without the additional potential
call instruction, plus you put the unlikely case inside the if conditional.
Neil
^ permalink raw reply
* Re: [PATCH] fix initializer entry defined twice issue
From: Patrick Trantham @ 2012-12-07 19:51 UTC (permalink / raw)
To: Cong Ding
Cc: David S. Miller, Otavio Salvador, Javier Martinez Canillas,
Jiri Kosina, netdev, linux-kernel
In-Reply-To: <20121207000511.GD2510@gmail.com>
On 12/06/2012 06:05 PM, Cong Ding wrote:
> On Thu, Dec 06, 2012 at 05:59:21PM -0600, Patrick Trantham wrote:
>> On 12/06/2012 05:16 PM, Cong Ding wrote:
>>> the ".config_intr" is defined twice in both line 208 and 212.
>>>
>>> Signed-off-by: Cong Ding <dinggnu@gmail.com>
>>> ---
>>> drivers/net/phy/smsc.c | 1 -
>>> 1 files changed, 0 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
>>> index 16dceed..5cee6bd 100644
>>> --- a/drivers/net/phy/smsc.c
>>> +++ b/drivers/net/phy/smsc.c
>>> @@ -205,7 +205,6 @@ static struct phy_driver smsc_phy_driver[] = {
>>> /* basic functions */
>>> .config_aneg = genphy_config_aneg,
>>> .read_status = lan87xx_read_status,
>>> - .config_intr = smsc_phy_config_intr,
>>> /* IRQ related */
>>> .ack_interrupt = smsc_phy_ack_interrupt,
>> Hi Cong,
>>
>> That looks like a mistake from my previous patch to that file. Copy
>> and paste fail.
>>
>> The line should read:
>> .config_init = smsc_phy_config_init,
>>
>> I can submit a patch once I get off work unless you beat me to it.
> Thanks for the note. It's better that you submit a patch together with your
> code.
>
> - cong
Patch submitted, thanks for pointing this out!
- Patrick
^ permalink raw reply
* Re: [PATCH v3 1/4] net: Add support for hardware-offloaded encapsulation
From: David Miller @ 2012-12-07 19:52 UTC (permalink / raw)
To: joseph.gasparakis
Cc: bhutchings, alexander.h.duyck, shemminger, chrisw, gospo, netdev,
linux-kernel, dmitry, saeed.bishara, peter.p.waskiewicz.jr
In-Reply-To: <alpine.LFD.2.02.1212071149150.4985@morpheus.jf.intel.com>
From: Joseph Gasparakis <joseph.gasparakis@intel.com>
Date: Fri, 7 Dec 2012 11:52:43 -0800 (PST)
>
>
> On Fri, 7 Dec 2012, David Miller wrote:
>
>> From: Joseph Gasparakis <joseph.gasparakis@intel.com>
>> Date: Fri, 7 Dec 2012 11:41:46 -0800 (PST)
>>
>> >
>> >
>> > On Fri, 7 Dec 2012, David Miller wrote:
>> >
>> >> From: Joseph Gasparakis <joseph.gasparakis@intel.com>
>> >> Date: Fri, 7 Dec 2012 10:24:17 -0800 (PST)
>> >>
>> >> > So the idea here is that the driver will use the headers for checksumming
>> >> > if the skb->encapsulation bit is on. The bit should be set in the protocol
>> >> > driver.
>> >> >
>> >> > To answer the second comment, the flags that we use in this series of
>> >> > patches is NETIF_F_IP_CSUM, NETIF_F_IPV6_CSUM and NETIF_F_SG. These are
>> >> > the bits that we propose will be used for checksumming of encapsulation.
>> >> > As per a previous comment in v2, the hw_enc_features field should be used
>> >> > also in the future when NICs have more encap offloads, so one could
>> >> > indicate these features there from the driver.
>> >> >
>> >> > Furthermore, I submitted a patch for Rx checksumming, where NETIF_F_RXCSUM
>> >> > is used, again in conjunction with skb->encapsulation flag. As I mention
>> >> > in my logs, the driver is expected to set the ip_summed to UNNECESSARY and
>> >> > turn the skb->encapsulation on, to indicate that the inner headers are
>> >> > already HW checksummed.
>> >> >
>> >>
>> >> This is the kind of language that belongs in the commit message and
>> >> code comments.
>> >>
>> > Sure. I'll wait to gather some more feedback if there is any and I will
>> > re-spin this off adding more code comments and clarify this in the logs.
>>
>> Great.
>>
>> Please note that this request applies to your receive side change too.
>>
> I believe the logs are sufficient in the rx patch:
>
> This patch adds capability in vxlan to identify received
> checksummed inner packets and signal them to the upper layers of
> the stack. The driver needs to set the skb->encapsulation bit
> and also set the skb->ip_summed to CHECKSUM_UNNECESSARY.
>
> ...but I can add more comments to the code if this is what you are
> referring to.
Yes, please do.
^ permalink raw reply
* Re: [PATCH v3 1/4] net: Add support for hardware-offloaded encapsulation
From: Joseph Gasparakis @ 2012-12-07 19:52 UTC (permalink / raw)
To: David Miller
Cc: joseph.gasparakis, bhutchings, alexander.h.duyck, shemminger,
chrisw, gospo, netdev, linux-kernel, dmitry, saeed.bishara,
peter.p.waskiewicz.jr
In-Reply-To: <20121207.143725.1501926484331042018.davem@davemloft.net>
On Fri, 7 Dec 2012, David Miller wrote:
> From: Joseph Gasparakis <joseph.gasparakis@intel.com>
> Date: Fri, 7 Dec 2012 11:41:46 -0800 (PST)
>
> >
> >
> > On Fri, 7 Dec 2012, David Miller wrote:
> >
> >> From: Joseph Gasparakis <joseph.gasparakis@intel.com>
> >> Date: Fri, 7 Dec 2012 10:24:17 -0800 (PST)
> >>
> >> > So the idea here is that the driver will use the headers for checksumming
> >> > if the skb->encapsulation bit is on. The bit should be set in the protocol
> >> > driver.
> >> >
> >> > To answer the second comment, the flags that we use in this series of
> >> > patches is NETIF_F_IP_CSUM, NETIF_F_IPV6_CSUM and NETIF_F_SG. These are
> >> > the bits that we propose will be used for checksumming of encapsulation.
> >> > As per a previous comment in v2, the hw_enc_features field should be used
> >> > also in the future when NICs have more encap offloads, so one could
> >> > indicate these features there from the driver.
> >> >
> >> > Furthermore, I submitted a patch for Rx checksumming, where NETIF_F_RXCSUM
> >> > is used, again in conjunction with skb->encapsulation flag. As I mention
> >> > in my logs, the driver is expected to set the ip_summed to UNNECESSARY and
> >> > turn the skb->encapsulation on, to indicate that the inner headers are
> >> > already HW checksummed.
> >> >
> >>
> >> This is the kind of language that belongs in the commit message and
> >> code comments.
> >>
> > Sure. I'll wait to gather some more feedback if there is any and I will
> > re-spin this off adding more code comments and clarify this in the logs.
>
> Great.
>
> Please note that this request applies to your receive side change too.
>
I believe the logs are sufficient in the rx patch:
This patch adds capability in vxlan to identify received
checksummed inner packets and signal them to the upper layers of
the stack. The driver needs to set the skb->encapsulation bit
and also set the skb->ip_summed to CHECKSUM_UNNECESSARY.
...but I can add more comments to the code if this is what you are
referring to.
^ permalink raw reply
* Re: [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit
From: Paul Gortmaker @ 2012-12-07 19:54 UTC (permalink / raw)
To: David Miller; +Cc: nhorman, netdev, jon.maloy, ying.xue
In-Reply-To: <20121207.123644.431593821406881255.davem@davemloft.net>
[Re: [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit] On 07/12/2012 (Fri 12:36) David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Fri, 7 Dec 2012 11:07:33 -0500
>
> > On Fri, Dec 07, 2012 at 09:28:10AM -0500, Paul Gortmaker wrote:
> >> From: Ying Xue <ying.xue@windriver.com>
> >>
> >> As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
> >> global atomic counter for the sum of sk_recv_queue sizes across all
> >> tipc sockets. When incremented, the counter is compared to an upper
> >> threshold value, and if this is reached, the message is rejected
> >> with error code TIPC_OVERLOAD.
> >>
> >> This check was originally meant to protect the node against
> >> buffer exhaustion and general CPU overload. However, all experience
> >> indicates that the feature not only is redundant on Linux, but even
> >> harmful. Users run into the limit very often, causing disturbances
> >> for their applications, while removing it seems to have no negative
> >> effects at all. We have also seen that overall performance is
> >> boosted significantly when this bottleneck is removed.
> >>
> >> Furthermore, we don't see any other network protocols maintaining
> >> such a mechanism, something strengthening our conviction that this
> >> control can be eliminated.
> >>
> >> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> >> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> >> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ...
> >> @@ -1241,11 +1241,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
> >> }
> >>
> >> /* Reject message if there isn't room to queue it */
> >> - recv_q_len = (u32)atomic_read(&tipc_queue_size);
> >> - if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
> >> - if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
> >> - return TIPC_ERR_OVERLOAD;
> >> - }
> > If you're going to remove the one place that you read this variable, don't you
> > also want to remove the points where you increment/decrement the atomic as well,
> > and for that matter eliminate the definition itself?
>
> There's another reader, a getsockopt() call.
>
> I would just make it return zero or similar.
Updated commit below for reference, just to double check that I'm
doing what is requested properly.
Paul.
--
From 9da3d475874f4da49057767913af95ce01063ba3 Mon Sep 17 00:00:00 2001
From: Ying Xue <ying.xue@windriver.com>
Date: Tue, 27 Nov 2012 06:15:27 -0500
Subject: [PATCH] tipc: eliminate aggregate sk_receive_queue limit
As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
global atomic counter for the sum of sk_recv_queue sizes across all
tipc sockets. When incremented, the counter is compared to an upper
threshold value, and if this is reached, the message is rejected
with error code TIPC_OVERLOAD.
This check was originally meant to protect the node against
buffer exhaustion and general CPU overload. However, all experience
indicates that the feature not only is redundant on Linux, but even
harmful. Users run into the limit very often, causing disturbances
for their applications, while removing it seems to have no negative
effects at all. We have also seen that overall performance is
boosted significantly when this bottleneck is removed.
Furthermore, we don't see any other network protocols maintaining
such a mechanism, something strengthening our conviction that this
control can be eliminated.
As a result, the atomic variable tipc_queue_size is now unused
and so it can be deleted. There is a getsockopt call that used
to allow reading it; we retain that but just return zero for
maximum compatibility.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
[PG: phase out tipc_queue_size as pointed out by Neil Horman]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 1a720c8..848be69 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2,7 +2,7 @@
* net/tipc/socket.c: TIPC socket API
*
* Copyright (c) 2001-2007, Ericsson AB
- * Copyright (c) 2004-2008, 2010-2011, Wind River Systems
+ * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
@@ -73,8 +73,6 @@ static struct proto tipc_proto;
static int sockets_enabled;
-static atomic_t tipc_queue_size = ATOMIC_INIT(0);
-
/*
* Revised TIPC socket locking policy:
*
@@ -128,7 +126,6 @@ static atomic_t tipc_queue_size = ATOMIC_INIT(0);
static void advance_rx_queue(struct sock *sk)
{
kfree_skb(__skb_dequeue(&sk->sk_receive_queue));
- atomic_dec(&tipc_queue_size);
}
/**
@@ -140,10 +137,8 @@ static void discard_rx_queue(struct sock *sk)
{
struct sk_buff *buf;
- while ((buf = __skb_dequeue(&sk->sk_receive_queue))) {
- atomic_dec(&tipc_queue_size);
+ while ((buf = __skb_dequeue(&sk->sk_receive_queue)))
kfree_skb(buf);
- }
}
/**
@@ -155,10 +150,8 @@ static void reject_rx_queue(struct sock *sk)
{
struct sk_buff *buf;
- while ((buf = __skb_dequeue(&sk->sk_receive_queue))) {
+ while ((buf = __skb_dequeue(&sk->sk_receive_queue)))
tipc_reject_msg(buf, TIPC_ERR_NO_PORT);
- atomic_dec(&tipc_queue_size);
- }
}
/**
@@ -280,7 +273,6 @@ static int release(struct socket *sock)
buf = __skb_dequeue(&sk->sk_receive_queue);
if (buf == NULL)
break;
- atomic_dec(&tipc_queue_size);
if (TIPC_SKB_CB(buf)->handle != 0)
kfree_skb(buf);
else {
@@ -1241,11 +1233,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
}
/* Reject message if there isn't room to queue it */
- recv_q_len = (u32)atomic_read(&tipc_queue_size);
- if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
- if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
- return TIPC_ERR_OVERLOAD;
- }
recv_q_len = skb_queue_len(&sk->sk_receive_queue);
if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
@@ -1254,7 +1241,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
/* Enqueue message (finally!) */
TIPC_SKB_CB(buf)->handle = 0;
- atomic_inc(&tipc_queue_size);
__skb_queue_tail(&sk->sk_receive_queue, buf);
/* Initiate connection termination for an incoming 'FIN' */
@@ -1578,7 +1564,6 @@ restart:
/* Disconnect and send a 'FIN+' or 'FIN-' message to peer */
buf = __skb_dequeue(&sk->sk_receive_queue);
if (buf) {
- atomic_dec(&tipc_queue_size);
if (TIPC_SKB_CB(buf)->handle != 0) {
kfree_skb(buf);
goto restart;
@@ -1717,7 +1702,7 @@ static int getsockopt(struct socket *sock,
/* no need to set "res", since already 0 at this point */
break;
case TIPC_NODE_RECVQ_DEPTH:
- value = (u32)atomic_read(&tipc_queue_size);
+ value = 0; /* was tipc_queue_size, now obsolete */
break;
case TIPC_SOCK_RECVQ_DEPTH:
value = skb_queue_len(&sk->sk_receive_queue);
--
1.7.12.1
>
> Paul please do so and respin this series.
>
> Thanks.
^ permalink raw reply related
* Re: [PATCH net-next 10/10] tipc: refactor accept() code for improved readability
From: Paul Gortmaker @ 2012-12-07 19:56 UTC (permalink / raw)
To: Neil Horman; +Cc: David Miller, netdev, Jon Maloy
In-Reply-To: <20121207194226.GB30339@hmsreliant.think-freely.org>
[Re: [PATCH net-next 10/10] tipc: refactor accept() code for improved readability] On 07/12/2012 (Fri 14:42) Neil Horman wrote:
> On Fri, Dec 07, 2012 at 09:28:18AM -0500, Paul Gortmaker wrote:
> > In TIPC's accept() routine, there is a large block of code relating
> > to initialization of a new socket, all within an if condition checking
> > if the allocation succeeded.
> >
> > Here, we simply factor out that init code within the accept() function
> > to its own separate function, which improves readability, and makes
> > it easier to track which lock_sock() calls are operating on existing
> > vs. new sockets.
> >
> > Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> > ---
> > net/tipc/socket.c | 93 +++++++++++++++++++++++++++++--------------------------
> > 1 file changed, 49 insertions(+), 44 deletions(-)
> >
> > diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> > index 38613cf..56661c8 100644
> > --- a/net/tipc/socket.c
> > +++ b/net/tipc/socket.c
> > @@ -1507,6 +1507,53 @@ static int listen(struct socket *sock, int len)
> > return res;
> > }
> >
> > +static void tipc_init_socket(struct sock *sk, struct socket *new_sock,
> > + struct sk_buff *buf)
> > +{
> Can you rename this to something more specific to its purpose? tipc_init_socket
> makes me wonder why you're not calling it internally from tipc_create. This
> seems more like a tipc_init_accept_sock, or some such. Alternatively, since
> you're ony using it from your accept call, you might consider not factoring it
> out at all, and just reversing the logic in your accept function so that you do:
>
> res = tipc_create(...)
> if (res)
> goto exit;
> <rest of tipc_init_socket goes here>
>
> That gives you the same level of readability, without the additional potential
> call instruction, plus you put the unlikely case inside the if conditional.
I'm inclined to do the latter and just flip the sense of the "if" since
I already scratched my head trying to think of a good name (and
apparently failed in the end).
Thanks,
Paul.
>
> Neil
>
^ permalink raw reply
* Re: BUG: scheduling while atomic: ifup-bonding/3711/0x00000002 -- V3.6.7
From: Linda Walsh @ 2012-12-07 20:06 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Cong Wang, LKML, Linux Kernel Network Developers
In-Reply-To: <4913.1354154236@death.nxdomain>
Sorry for the delay.... my distro (Suse) has made rebooting my system
a chore (have to often boot from rescue to get it to come up because
they put mount libs in /usr/lib expecting they will always boot
from their ram disk -- preventing those of use who boot directly
from disk from doing so easily...grrr.
Jay Vosburgh wrote:
> The miimon functionality is used to check link state and notice
> when slaves lose carrier.
---
If I am running 'rr' on 2 channels -- specifically for the purpose
of link speed aggregation (getting 1 20Gb channel out of 2 10Gb channels)
I'm not sure I see how miimon would provide benefit. -- if 1 link dies,
the other, being on the same card is likely to be dead too, so would
it really serve a purpose?
> Running without it will not detect failure of
> the bonding slaves, which is likely not what you want. The mode,
> balance-rr in your case, is what selects the load balance to use, and is
> separate from the miimon.
>
----
Wouldn't the entire link die if a slave dies -- like RAID0, 1 disk
dies, the entire link goes down?
The other end (windows) doesn't dynamically config for a static-link
aggregation, so I don't think it would provide benefit.
> That said, the problem you're seeing appears to be caused by two
> things: bonding holds a lock (in addition to RTNL) when calling
> __ethtool_get_settings, and an ixgbe function in the call path to
> retrieve the settings, ixgbe_acquire_swfw_sync_X540, can sleep.
>
> The test patch above handles one case in bond_enslave, but there
> is another case in bond_miimon_commit when a slave changes link state
> from down to up, which will occur shortly after the slave is added.
>
----
Added your 2nd patch -- no more error messages...
however -- likely unrelated, the max speed read or write I am seeing
is about 500MB/s, and that is rare -- usually it's barely <3x a 1Gb
network speed. (119/125 MB R/W). I'm not at all sure it's really
combining the links
properly. Anyway to verify that?
On the windows side it shows the bond-link as a 20Gb connection, but
I don't see anyplace for something similar on linux.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox