Netdev List
* Re: [RFCv4 PATCH 1/2] net: Introduce recvmmsg socket syscall
From: David Miller @ 2009-10-09 21:27 UTC
  To: acme
  Cc: caitlin.bestler, vanhoof, williams, nhorman, nir.tzachar, niv,
	paul.moore, remi.denis-courmont, steve, netdev
In-Reply-To: <20091009193520.GD12982@ghostprotocols.net>

From: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Fri, 9 Oct 2009 16:35:20 -0300

> 	The second patch in this series has issues; I still have to
> investigate it properly and study removing the skb_queue_head lock like
> TCP does. The first patch seems to be OK and is already providing good
> results, at least as reported by Nir. If there aren't any other concerns
> about the API, can we get it into net-next-2.6?

Please make a formal submission of that first patch with all proper
signoffs and without the "RFC" in the subject line and I'll apply it.

Thanks!


* Re: tg3 and Broadcom PHY driver
From: David Miller @ 2009-10-09 21:25 UTC
  To: bhutchings; +Cc: felix, mcarlson, netdev
In-Reply-To: <1254185639.27790.3.camel@localhost>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 29 Sep 2009 01:53:59 +0100

> On Mon, 2009-09-28 at 14:55 -0700, David Miller wrote:
>> From: Felix Radensky <felix@embedded-sol.com>
>> Date: Mon, 28 Sep 2009 23:52:54 +0200
>> 
>> > Yes, moving CONFIG_TIGON3 right after CONFIG_PHYLIB in
>> > drivers/net/Makefile fixes the problem for me.
>> 
>> Thanks for testing.
>> 
>> We really need to fix this generically.
>> 
>> Does anyone think that moving the MDIO/MII/PHY layer objects
>> to the top of drivers/net/Makefile will break anything?
>> 
>> If not, that's what we should do I think.
> 
> Only the phylib drivers actually need to be moved to fix the
> initialisation order, but moving the others shouldn't hurt.

Ok, I'm adding the following to net-2.6 to resolve this and
will queue it up for -stable too.

Thanks everyone.

net: Link in PHY drivers before others.

We need PHY drivers to initialize in a static kernel before
the MAC drivers that use them.  So link them in first.

Based upon a report by Felix Radensky.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/Makefile |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index d866b8c..48d82e9 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -2,6 +2,10 @@
 # Makefile for the Linux network (ethercard) device drivers.
 #
 
+obj-$(CONFIG_MII) += mii.o
+obj-$(CONFIG_MDIO) += mdio.o
+obj-$(CONFIG_PHYLIB) += phy/
+
 obj-$(CONFIG_TI_DAVINCI_EMAC) += davinci_emac.o
 
 obj-$(CONFIG_E1000) += e1000/
@@ -100,10 +104,6 @@ obj-$(CONFIG_SH_ETH) += sh_eth.o
 # end link order section
 #
 
-obj-$(CONFIG_MII) += mii.o
-obj-$(CONFIG_MDIO) += mdio.o
-obj-$(CONFIG_PHYLIB) += phy/
-
 obj-$(CONFIG_SUNDANCE) += sundance.o
 obj-$(CONFIG_HAMACHI) += hamachi.o
 obj-$(CONFIG_NET) += Space.o loopback.o
-- 
1.6.4.4
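
The ordering argument in the commit message can be illustrated with a
minimal userspace sketch in C. It assumes a simplified model in which
built-in init functions run in the order their objects were linked. The
names phy_driver_init, mac_driver_init and run_initcalls are hypothetical
and do not exist in the kernel, and the failure shown is deliberately
cruder than what tg3 actually does when its PHY driver is missing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a built-in driver's init function. */
typedef int (*initcall_t)(void);

/* Set once the (hypothetical) PHY driver has registered itself. */
static int phy_driver_registered;

static int phy_driver_init(void)
{
	phy_driver_registered = 1;
	return 0;
}

/* A MAC driver that needs its PHY driver; fails if it is not there yet. */
static int mac_driver_init(void)
{
	return phy_driver_registered ? 0 : -1;
}

/* Run the calls in array (link) order and return the number of failures. */
static int run_initcalls(const initcall_t *calls, size_t n)
{
	int failures = 0;
	size_t i;

	assert(calls != NULL);
	for (i = 0; i < n; i++)
		if (calls[i]() != 0)
			failures++;
	return failures;
}
```

With the array ordered { mac_driver_init, phy_driver_init },
run_initcalls() reports one failure; with the PHY entry first it reports
none, which is the effect the Makefile reordering is after.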



* pull request: wireless-next-2.6 2009-10-09
From: John W. Linville @ 2009-10-09 21:05 UTC
  To: davem; +Cc: linux-wireless, netdev

Dave,

Here is the usual big first post-window pull request for -next...
Mostly it is the usual suspects, lots of iwlwifi and ath* along
with a smattering of other bits.  There are even a few from me! :-)
Most of these have spent several days banging-around in -next (which
helped to find some Kconfig problems).

Please let me know if there are problems!

Thanks,

John

---

Individual patches are available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/wireless-next-2.6/

---

The following changes since commit d519e17e2d01a0ee9abe083019532061b4438065:
  Andy Gospodarek (1):
        net: export device speed and duplex via sysfs

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6.git master

Abhijeet Kolekar (2):
      iwlwifi/iwl3945 : unify apm stop operation
      iwlwifi: replace iwl_poll_direct_bit with iwl_poll_bit for CSR access

Amitkumar Karwar (2):
      libertas: Add auto deep sleep support for SD8385/SD8686/SD8688
      libertas: Use lbs_is_cmd_allowed() check in command handling routines.

Christian Lamparter (1):
      iwlwifi: drop lib80211 dependency

Daniel C Halperin (3):
      iwlwifi: clean up rs_tx_status
      iwlwifi: do not clear TX info flags when receiving BlockAckResponse
      iwlwifi: add aggregation tables to the rate scaling algorithm

Holger Schurig (5):
      nl80211: report age of scan results
      libertas: separate libertas' Kconfig in it's own file
      libertas: first stab at cfg80211 support
      libertas: remove extraneous select FW_LOADER
      libertas: depend on CONFIG_CFG80211

Huaxu Wan (2):
      iwlwifi: add module firmware info for 1000 series
      iwlwifi: clear the translate table area

Jaswinder Singh Rajput (1):
      b43: Comment unused functions lpphy_restore_dig_flt_state and lpphy_disable_rx_gain_override

Joerg Albert (3):
      ar9170: fixed coding style, moved define
      ar9170: add heavy clip handling
      ar9170: handle overflow in tsf_low register during get_tsf

Johannes Berg (10):
      iwlwifi: clean up ht config a little
      iwlwifi: clean up ht config naming
      iwlwifi: clarify and clean up chain settings
      iwlwifi: fix a typo
      iwlwifi: default to using all chains
      iwlwifi: support idle for 6000 series hw
      wext: refactor
      iwlwifi: device tracing
      iwlwifi: LED cleanup
      wireless: make wireless drivers select core

John W. Linville (6):
      wireless: implement basic ethtool support for cfg80211 devices
      mac80211: support ETHTOOL_GPERMADDR
      iwmc3200wifi: support ETHTOOL_GPERMADDR
      ipw2200: support ETHTOOL_GPERMADDR
      orinoco: support ETHTOOL_GPERMADDR
      net/wireless/ethtool.h: drop unnecessary include of linux/ethtool.h

Kalle Valo (3):
      wl1251: remove wl1251_netlink.h
      cfg80211: add firmware and hardware version to wiphy
      at76c50x-usb: set firmware and hardware version in wiphy

Larry Finger (1):
      staging: Add proper selection of WIRELESS_EXT and WEXT_PRIV

Luis R. Rodriguez (68):
      ath9k: use ath_hw for DPRINTF() and debug init/exit
      ath9k: move btcoex core driver info to its own struct
      ath9k: move hw specific btcoex info to ath_hw
      ath9k: split bluetooth hardware coex init into two helpers
      ath9k: move driver core helpers to main.c
      ath9k: split ath9k_hw_btcoex_enable() into two helpers
      ath9k: replaces SC_OP_BTCOEX_ENABLED with a bool
      ath9k: move bt_stomp_type to driver core
      ath9k: remove unused bt_duty_cycle
      ath9k: rename btcoex_scheme to just scheme
      ath9k: rename ath_btcoex_info to ath_btcoex_hw
      ath9k: simplify ath_btcoex_bt_stomp()
      ath9k: now move ath9k_hw_btcoex_set_weight() to btcoex.c
      ath9k: move ath_btcoex_config and ath_bt_mode to btcoex.c
      ath9k: rename ath_btcoex_supported() to ath9k_hw_btcoex_supported()
      ath9k: move ps helpers onto core driver when reseting tsf
      ath9k: move ath9k_ps_wakeup() and ath9k_ps_restore() to main.c
      ath9k: avoid usage of ath9k_hw_setpower() on hw.c
      ath9k: move ath9k_hw_setpower() to main.c
      ath9k: rename driver core and hw power save helpers
      ath: move ath_bcast_mac to common header
      atheros: use get_unaligned_le*() for bssid mask setting
      ath9k: make ath9k_hw_setbssidmask() and ath9k_hw_write_associd() use ath_hw
      ath9k: Use ath9k_hw_setbssidmask() on reset
      ath9k: use ath9k_hw_write_associd() on reset
      atheros/ath9k: move macaddr, curaid, curbssid and bssidmask to common
      ar9170: make use of common macaddr and curbssid
      ath5k: use common curbssid, bssidmask and macaddr
      ath5k: initialize eeprom struct early on attach
      ath9k: move ath_common to ath_hw
      ath5k: move ath_common to ath5k_hw
      ath9k: Define bus agnostic bluetooth coex prep helper
      atheros/ath9k: add common read/write ops and port ath9k to use it
      ath5k: allocate ath5k_hw prior to initializing hw
      ath5k: define ath_common ops
      atheros: define shared bssidmask setting
      atheros: add ieee80211_hw to ath_common
      ath9k: separate core driver and hw timer code
      atheros: add common debug printing
      atheros: move tx/rx chainmask to ath_common
      ath9k: remove ath9k 25 MHz HT40 spacing stuff
      ath9k: remove ath9k_ht_macmode
      ath9k: move ATH_AMPDU_LIMIT_MAX to hw.h
      ath9k: remove driver ASSERT, just use BUG_ON()
      ath9k: clarify what hw code is and remove ath9k.h from a few files
      ath9k: move ATH9K_RSSI_BAD to hw.h
      atheros: move bus ops to ath_common
      ath9k: make ath9k_common_ops const
      ath9k: use common read/write ops on pci and debug code
      ath9k: move hw code to its own module
      ath9k_hw: print device ID if not supported
      ath9k_hw: add AR9271 srev and device ID to allow hw to support ar9271
      atheros: define a common priv struct
      ath5k: fix regression on setting bssid mask on association
      ath5k: use ath_hw_setbssidmask() for bssid mask setting upon assoc
      ath5k: fix regression introduced upon the removal of AR5K_HIGH_ID()
      ath5k: simplify passed params to ath5k_hw_set_associd()
      ath5k: remove temporary low_id and high_id vars on ath5k_hw_set_associd()
      ath5k: fix regression which triggers an SME join upon assoc
      ath5k: enable Power-Save Polls by setting the association ID
      ath9k: move common->debug_mask setting to ath_init_softc()
      ath9k: initialize hw prior to debugfs
      ath9k: add helper to un-init the hw properly
      ath9k: add a helper to clean the core driver upon module unload
      ath9k: move ath_cleanup() below helpers to avoid forward declarations
      ath9k: rename ath_beaconq_setup() to ath9k_hw_beaconq_setup()
      ath9k: use right parameter for MODULE_PARM_DESC() for debug
      libertas: remove double assignment of dev->netdev_ops

Rafael J. Wysocki (1):
      Wireless / ath5k: Simplify suspend and resume callbacks

Randy Dunlap (1):
      wireless: fix CFG80211_WEXT build problems

Senthil Balasubramanian (5):
      ath9k: Allow PSPOLL only when the interface is configured in AP mode
      ath9k: Handle ATH9K_BEACON_RESET_TSF properly
      ath9k: Reduce PLL Settle time and eliminate redundant PLL calls.
      ath9k: Advertise midband for AR5416 devices
      ath9k: Fix bugs in handling TX power

Sujith (2):
      ath9k: Update INI release for AR9287
      ath9k: Fix RTC reset for AR5416

Vasanthakumar Thiagarajan (1):
      ath9k: Update initvals

Vivek Natarajan (1):
      ath9k: Add Calibration checks

Wey-Yi Guy (19):
      iwlwifi: modify LED blink index table
      iwlwifi: remove un-supported eeprom parameters
      iwlwifi: separate nic_config for different NIC
      iwlwifi: separate set_hw_params function for 6000 series
      iwlwifi: Adjust blink rate to compensate Clock difference
      iwlwifi: show NVM version in debugfs
      iwlwifi: Use RTS/CTS as the preferred protection mechanism for 6000 series
      iwlwifi: allow user change protection mechanism for HT
      iwlwifi: EEPROM version for 1000 and 6000 series
      iwlwifi: use S_IRUGO and S_IWUSR in module parameters
      iwlwifi: send cmd to uCode to configure valid tx antenna
      iwlwifi: update PCI Subsystem ID for 1000 series
      iwlwifi: update PCI Subsystem ID for 6000 series
      iwlwifi: add LED mode to support different LED behavior
      iwlwifi: Chain Noise Calibration for 6000 series
      iwlwifi: reliable entering of critical temperature state
      iwlwifi: change valid EEPROM version for 1000 series
      iwlwifi: set default aggregation frame count limit to 31
      iwlwifi: validate the signature for EEPROM and OTP

 drivers/net/wireless/Kconfig                 |   84 +-
 drivers/net/wireless/at76c50x-usb.c          |   10 +
 drivers/net/wireless/ath/Kconfig             |    8 +
 drivers/net/wireless/ath/Makefile            |    9 +-
 drivers/net/wireless/ath/ar9170/ar9170.h     |    4 +-
 drivers/net/wireless/ath/ar9170/cmd.c        |    3 +-
 drivers/net/wireless/ath/ar9170/cmd.h        |    1 +
 drivers/net/wireless/ath/ar9170/hw.h         |    2 +
 drivers/net/wireless/ath/ar9170/mac.c        |   15 +-
 drivers/net/wireless/ath/ar9170/main.c       |   30 +-
 drivers/net/wireless/ath/ar9170/phy.c        |   99 ++-
 drivers/net/wireless/ath/ath.h               |   41 +
 drivers/net/wireless/ath/ath5k/ath5k.h       |   40 +-
 drivers/net/wireless/ath/ath5k/attach.c      |   31 +-
 drivers/net/wireless/ath/ath5k/base.c        |  116 ++-
 drivers/net/wireless/ath/ath5k/base.h        |   12 -
 drivers/net/wireless/ath/ath5k/initvals.c    |    4 +-
 drivers/net/wireless/ath/ath5k/pcu.c         |  193 +---
 drivers/net/wireless/ath/ath5k/reg.h         |    8 +-
 drivers/net/wireless/ath/ath5k/reset.c       |   16 +-
 drivers/net/wireless/ath/ath9k/Kconfig       |    8 +
 drivers/net/wireless/ath/ath9k/Makefile      |   27 +-
 drivers/net/wireless/ath/ath9k/ahb.c         |   19 +-
 drivers/net/wireless/ath/ath9k/ani.c         |  141 ++-
 drivers/net/wireless/ath/ath9k/ath9k.h       |   73 +-
 drivers/net/wireless/ath/ath9k/beacon.c      |  112 +-
 drivers/net/wireless/ath/ath9k/btcoex.c      |  383 ++----
 drivers/net/wireless/ath/ath9k/btcoex.h      |   64 +-
 drivers/net/wireless/ath/ath9k/calib.c       |  391 ++++---
 drivers/net/wireless/ath/ath9k/calib.h       |    2 +
 drivers/net/wireless/ath/ath9k/debug.c       |   55 +-
 drivers/net/wireless/ath/ath9k/debug.h       |   36 +-
 drivers/net/wireless/ath/ath9k/eeprom.c      |    8 +-
 drivers/net/wireless/ath/ath9k/eeprom.h      |    9 +-
 drivers/net/wireless/ath/ath9k/eeprom_4k.c   |   90 +-
 drivers/net/wireless/ath/ath9k/eeprom_9287.c |   97 +-
 drivers/net/wireless/ath/ath9k/eeprom_def.c  |  183 ++-
 drivers/net/wireless/ath/ath9k/hw.c          |  595 +++++-----
 drivers/net/wireless/ath/ath9k/hw.h          |   63 +-
 drivers/net/wireless/ath/ath9k/initvals.h    |   72 +-
 drivers/net/wireless/ath/ath9k/mac.c         |  162 ++-
 drivers/net/wireless/ath/ath9k/mac.h         |   11 +-
 drivers/net/wireless/ath/ath9k/main.c        |  841 +++++++++----
 drivers/net/wireless/ath/ath9k/pci.c         |   37 +-
 drivers/net/wireless/ath/ath9k/phy.c         |   50 +-
 drivers/net/wireless/ath/ath9k/phy.h         |    1 +
 drivers/net/wireless/ath/ath9k/rc.c          |   33 +-
 drivers/net/wireless/ath/ath9k/recv.c        |   62 +-
 drivers/net/wireless/ath/ath9k/reg.h         |    5 +-
 drivers/net/wireless/ath/ath9k/virtual.c     |   22 +-
 drivers/net/wireless/ath/ath9k/xmit.c        |  113 +-
 drivers/net/wireless/ath/debug.c             |   32 +
 drivers/net/wireless/ath/debug.h             |   77 ++
 drivers/net/wireless/ath/hw.c                |  126 ++
 drivers/net/wireless/ath/reg.h               |   27 +
 drivers/net/wireless/b43/phy_lp.c            |    6 +
 drivers/net/wireless/hostap/Kconfig          |    2 +
 drivers/net/wireless/ipw2x00/Kconfig         |    7 +-
 drivers/net/wireless/ipw2x00/ipw2200.c       |    1 +
 drivers/net/wireless/iwlwifi/Kconfig         |   28 +-
 drivers/net/wireless/iwlwifi/Makefile        |   12 +-
 drivers/net/wireless/iwlwifi/iwl-1000.c      |   35 +-
 drivers/net/wireless/iwlwifi/iwl-3945-led.c  |  371 +-----
 drivers/net/wireless/iwlwifi/iwl-3945-led.h  |   22 +-
 drivers/net/wireless/iwlwifi/iwl-3945.c      |   65 +-
 drivers/net/wireless/iwlwifi/iwl-3945.h      |    2 +-
 drivers/net/wireless/iwlwifi/iwl-4965.c      |   71 +-
 drivers/net/wireless/iwlwifi/iwl-5000.c      |  127 +-
 drivers/net/wireless/iwlwifi/iwl-6000.c      |  245 ++++-
 drivers/net/wireless/iwlwifi/iwl-agn-led.c   |   85 ++
 drivers/net/wireless/iwlwifi/iwl-agn-led.h   |   32 +
 drivers/net/wireless/iwlwifi/iwl-agn-rs.c    |  466 ++++----
 drivers/net/wireless/iwlwifi/iwl-agn.c       |  124 ++-
 drivers/net/wireless/iwlwifi/iwl-calib.c     |   66 +-
 drivers/net/wireless/iwlwifi/iwl-commands.h  |   12 +-
 drivers/net/wireless/iwlwifi/iwl-core.c      |  209 ++--
 drivers/net/wireless/iwlwifi/iwl-core.h      |   31 +-
 drivers/net/wireless/iwlwifi/iwl-csr.h       |    7 +-
 drivers/net/wireless/iwlwifi/iwl-debug.h     |    2 -
 drivers/net/wireless/iwlwifi/iwl-debugfs.c   |   17 +-
 drivers/net/wireless/iwlwifi/iwl-dev.h       |   31 +-
 drivers/net/wireless/iwlwifi/iwl-devtrace.c  |   13 +
 drivers/net/wireless/iwlwifi/iwl-devtrace.h  |  178 +++
 drivers/net/wireless/iwlwifi/iwl-eeprom.c    |   45 +-
 drivers/net/wireless/iwlwifi/iwl-eeprom.h    |   17 +-
 drivers/net/wireless/iwlwifi/iwl-io.h        |   16 +-
 drivers/net/wireless/iwlwifi/iwl-led.c       |  323 +----
 drivers/net/wireless/iwlwifi/iwl-led.h       |   46 +-
 drivers/net/wireless/iwlwifi/iwl-power.c     |  149 ++-
 drivers/net/wireless/iwlwifi/iwl-power.h     |    3 +
 drivers/net/wireless/iwlwifi/iwl-scan.c      |    1 -
 drivers/net/wireless/iwlwifi/iwl-tx.c        |   26 +-
 drivers/net/wireless/iwlwifi/iwl3945-base.c  |   28 +-
 drivers/net/wireless/iwmc3200wifi/main.c     |    2 +
 drivers/net/wireless/libertas/Kconfig        |   39 +
 drivers/net/wireless/libertas/Makefile       |   15 +-
 drivers/net/wireless/libertas/README         |   26 +-
 drivers/net/wireless/libertas/cfg.c          |  198 +++
 drivers/net/wireless/libertas/cfg.h          |   16 +
 drivers/net/wireless/libertas/cmd.c          |  106 ++-
 drivers/net/wireless/libertas/cmdresp.c      |   12 +
 drivers/net/wireless/libertas/decl.h         |    3 +
 drivers/net/wireless/libertas/defs.h         |    2 +
 drivers/net/wireless/libertas/dev.h          |   19 +
 drivers/net/wireless/libertas/host.h         |    1 +
 drivers/net/wireless/libertas/if_cs.c        |    3 +
 drivers/net/wireless/libertas/if_sdio.c      |   56 +
 drivers/net/wireless/libertas/if_sdio.h      |    3 +-
 drivers/net/wireless/libertas/if_spi.c       |    3 +
 drivers/net/wireless/libertas/if_usb.c       |    3 +
 drivers/net/wireless/libertas/main.c         |  171 ++-
 drivers/net/wireless/libertas/wext.c         |   54 +-
 drivers/net/wireless/orinoco/Kconfig         |    4 +-
 drivers/net/wireless/orinoco/main.c          |    1 +
 drivers/net/wireless/wl12xx/wl1251_netlink.h |   30 -
 drivers/staging/rtl8187se/Kconfig            |    3 +-
 drivers/staging/rtl8192e/Kconfig             |    3 +-
 drivers/staging/vt6655/Kconfig               |    4 +-
 drivers/staging/vt6656/Kconfig               |    4 +-
 include/linux/nl80211.h                      |    2 +
 include/net/cfg80211.h                       |    9 +-
 include/net/iw_handler.h                     |   14 +-
 include/net/net_namespace.h                  |    2 +-
 include/net/wext.h                           |   49 +-
 net/core/net-sysfs.c                         |    6 +-
 net/mac80211/iface.c                         |    5 +-
 net/socket.c                                 |    4 +-
 net/wireless/Kconfig                         |   50 +-
 net/wireless/Makefile                        |   10 +-
 net/wireless/core.c                          |   17 +-
 net/wireless/ethtool.c                       |   45 +
 net/wireless/ethtool.h                       |    6 +
 net/wireless/ibss.c                          |   10 +-
 net/wireless/mlme.c                          |    2 +-
 net/wireless/nl80211.c                       |    6 +-
 net/wireless/scan.c                          |    6 +-
 net/wireless/sme.c                           |   12 +-
 net/wireless/wext-core.c                     | 1063 +++++++++++++++
 net/wireless/wext-priv.c                     |  248 ++++
 net/wireless/wext-proc.c                     |  155 +++
 net/wireless/wext-spy.c                      |  231 ++++
 net/wireless/wext.c                          | 1775 --------------------------
 142 files changed, 6953 insertions(+), 5229 deletions(-)
 create mode 100644 drivers/net/wireless/ath/debug.c
 create mode 100644 drivers/net/wireless/ath/debug.h
 create mode 100644 drivers/net/wireless/ath/hw.c
 create mode 100644 drivers/net/wireless/ath/reg.h
 create mode 100644 drivers/net/wireless/iwlwifi/iwl-agn-led.c
 create mode 100644 drivers/net/wireless/iwlwifi/iwl-agn-led.h
 create mode 100644 drivers/net/wireless/iwlwifi/iwl-devtrace.c
 create mode 100644 drivers/net/wireless/iwlwifi/iwl-devtrace.h
 create mode 100644 drivers/net/wireless/libertas/Kconfig
 create mode 100644 drivers/net/wireless/libertas/cfg.c
 create mode 100644 drivers/net/wireless/libertas/cfg.h
 delete mode 100644 drivers/net/wireless/wl12xx/wl1251_netlink.h
 create mode 100644 net/wireless/ethtool.c
 create mode 100644 net/wireless/ethtool.h
 create mode 100644 net/wireless/wext-core.c
 create mode 100644 net/wireless/wext-priv.c
 create mode 100644 net/wireless/wext-proc.c
 create mode 100644 net/wireless/wext-spy.c
 delete mode 100644 net/wireless/wext.c

Omnibus patch is available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/wireless-next-2.6-2009-10-09.patch.bz2

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: PATCH: Network Device Naming mechanism and policy
From: Matt Domsch @ 2009-10-09 21:09 UTC
  To: netdev, linux-hotplug; +Cc: Narendra_K, jordan_hargrave
In-Reply-To: <20091009140000.GA18765@mock.linuxdev.us.dell.com>

On Fri, Oct 09, 2009 at 09:00:01AM -0500, Narendra K wrote:
> On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote:
> > > example udev config:
> > > SUBSYSTEM=="net", SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"
> > 
> > work as well.  But coupling the ifindex to the MAC address like this
> > doesn't work.  (In general, coupling any two unrelated attributes when
> > trying to do persistent names doesn't work.)
> > 
> Attaching the latest patch incorporating review comments.

Same patch, rebased to linux-next.

By creating character devices for every network device, we can use
udev to maintain alternate naming policies for devices, including
additional names for the same device, without interfering with the
name that the kernel assigns a device.

This is conditional on CONFIG_NET_CDEV.  If enabled (the default),
device nodes will automatically be created in /dev/netdev/ for each
network device.  (/dev/net/ is already populated by the tun device.)

These device nodes are not functional at the moment - open() returns
-ENOSYS.  Their only purpose is to provide userspace with a kernel
name to ifindex mapping, in a form that udev can easily manage.

Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com>
Signed-off-by: Narendra K <Narendra_K@dell.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>

---
 include/linux/netdevice.h |    4 ++++
 net/Kconfig               |   10 ++++++++++
 net/core/Makefile         |    1 +
 net/core/cdev.c           |   42 ++++++++++++++++++++++++++++++++++++++++++
 net/core/cdev.h           |   13 +++++++++++++
 net/core/dev.c            |   10 ++++++++++
 net/core/net-sysfs.c      |   13 +++++++++++++
 7 files changed, 93 insertions(+), 0 deletions(-)
 create mode 100644 net/core/cdev.c
 create mode 100644 net/core/cdev.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b332eef..a2f23b4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -44,6 +44,7 @@
 #include <linux/workqueue.h>
 
 #include <linux/ethtool.h>
+#include <linux/cdev.h>
 #include <net/net_namespace.h>
 #include <net/dsa.h>
 #ifdef CONFIG_DCB
@@ -916,6 +917,9 @@ struct net_device
 	/* max exchange id for FCoE LRO by ddp */
 	unsigned int		fcoe_ddp_xid;
 #endif
+#ifdef CONFIG_NET_CDEV
+	struct cdev cdev;
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..bdc5bd7 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -43,6 +43,16 @@ config COMPAT_NETLINK_MESSAGES
 	  Newly written code should NEVER need this option but do
 	  compat-independent messages instead!
 
+config NET_CDEV
+       bool "/dev files for network devices"
+       default y
+       help
+         This option causes /dev entries to be created for each
+         network device.  This allows the use of udev to create
+         alternate device naming policies.
+
+	 If unsure, say Y.
+
 menu "Networking options"
 
 source "net/packet/Kconfig"
diff --git a/net/core/Makefile b/net/core/Makefile
index 796f46e..0b40d2c 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -19,4 +19,5 @@ obj-$(CONFIG_NET_DMA) += user_dma.o
 obj-$(CONFIG_FIB_RULES) += fib_rules.o
 obj-$(CONFIG_TRACEPOINTS) += net-traces.o
 obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o
+obj-$(CONFIG_NET_CDEV) += cdev.o
 
diff --git a/net/core/cdev.c b/net/core/cdev.c
new file mode 100644
index 0000000..1f36076
--- /dev/null
+++ b/net/core/cdev.c
@@ -0,0 +1,42 @@
+#include <linux/fs.h>
+#include <linux/cdev.h>
+#include <linux/netdevice.h>
+#include <linux/device.h>
+
+/* Used for network dynamic major number */
+static dev_t netdev_devt;
+
+static int netdev_cdev_open(struct inode *inode, struct file *filep)
+{
+	/* no operations on this device are implemented */
+	return -ENOSYS;
+}
+
+static const struct file_operations netdev_cdev_fops = {
+	.owner = THIS_MODULE,
+	.open = netdev_cdev_open,
+};
+
+void netdev_cdev_alloc(void)
+{
+	alloc_chrdev_region(&netdev_devt, 0, 1<<20, "net");
+}
+
+void netdev_cdev_init(struct net_device *dev)
+{
+	cdev_init(&dev->cdev, &netdev_cdev_fops);
+	cdev_add(&dev->cdev, MKDEV(MAJOR(netdev_devt), dev->ifindex), 1);
+
+}
+
+void netdev_cdev_del(struct net_device *dev)
+{
+	if (dev->cdev.dev)
+		cdev_del(&dev->cdev);
+}
+
+void netdev_cdev_kobj_init(struct device *dev, struct net_device *net)
+{
+	if (net->cdev.dev)
+		dev->devt = net->cdev.dev;
+}
diff --git a/net/core/cdev.h b/net/core/cdev.h
new file mode 100644
index 0000000..9cf5a90
--- /dev/null
+++ b/net/core/cdev.h
@@ -0,0 +1,13 @@
+#include <linux/netdevice.h>
+
+#ifdef CONFIG_NET_CDEV
+void netdev_cdev_alloc(void);
+void netdev_cdev_init(struct net_device *dev);
+void netdev_cdev_del(struct net_device *dev);
+void netdev_cdev_kobj_init(struct device *dev, struct net_device *net);
+#else
+static inline void netdev_cdev_alloc(void) {}
+static inline void netdev_cdev_init(struct net_device *dev) {}
+static inline void netdev_cdev_del(struct net_device *dev) {}
+static inline void netdev_cdev_kobj_init(struct device *dev, struct net_device *net) {}
+#endif
diff --git a/net/core/dev.c b/net/core/dev.c
index a74c8fd..d771438 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -129,6 +129,7 @@
 #include <trace/events/napi.h>
 
 #include "net-sysfs.h"
+#include "cdev.h"
 
 /* Instead of increasing this, you should create a hash table. */
 #define MAX_GRO_SKBS 8
@@ -4684,6 +4685,7 @@ static void rollback_registered(struct net_device *dev)
 
 	/* Remove entries from kobject tree */
 	netdev_unregister_kobject(dev);
+	netdev_cdev_del(dev);
 
 	synchronize_net();
 
@@ -4835,6 +4837,8 @@ int register_netdevice(struct net_device *dev)
 	if (dev->features & NETIF_F_SG)
 		dev->features |= NETIF_F_GSO;
 
+	netdev_cdev_init(dev);
+
 	netdev_initialize_kobject(dev);
 
 	ret = call_netdevice_notifiers(NETDEV_POST_INIT, dev);
@@ -4870,6 +4874,7 @@ out:
 	return ret;
 
 err_uninit:
+	netdev_cdev_del(dev);
 	if (dev->netdev_ops->ndo_uninit)
 		dev->netdev_ops->ndo_uninit(dev);
 	goto out;
@@ -5377,6 +5382,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	dev_addr_discard(dev);
 
 	netdev_unregister_kobject(dev);
+	netdev_cdev_del(dev);
 
 	/* Actually switch the network namespace */
 	dev_net_set(dev, net);
@@ -5393,6 +5399,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 			dev->iflink = dev->ifindex;
 	}
 
+	netdev_cdev_init(dev);
+
 	/* Fixup kobjects */
 	err = netdev_register_kobject(dev);
 	WARN_ON(err);
@@ -5626,6 +5634,8 @@ static int __init net_dev_init(void)
 
 	BUG_ON(!dev_boot_phase);
 
+	netdev_cdev_alloc();
+
 	if (dev_proc_init())
 		goto out;
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 753c420..f4ee557 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -19,6 +19,7 @@
 #include <net/wext.h>
 
 #include "net-sysfs.h"
+#include "cdev.h"
 
 #ifdef CONFIG_SYSFS
 static const char fmt_hex[] = "%#x\n";
@@ -501,6 +502,14 @@ static void netdev_release(struct device *d)
 	kfree((char *)dev - dev->padded);
 }
 
+#ifdef CONFIG_NET_CDEV
+static char *netdev_devnode(struct device *d, mode_t *mode)
+{
+	struct net_device *dev = to_net_dev(d);
+	return kasprintf(GFP_KERNEL, "netdev/%s", dev->name);
+}
+#endif
+
 static struct class net_class = {
 	.name = "net",
 	.dev_release = netdev_release,
@@ -510,6 +519,9 @@ static struct class net_class = {
 #ifdef CONFIG_HOTPLUG
 	.dev_uevent = netdev_uevent,
 #endif
+#ifdef CONFIG_NET_CDEV
+	.devnode = netdev_devnode,
+#endif
 };
 
 /* Delete sysfs entries but hold kobject reference until after all
@@ -536,6 +548,7 @@ int netdev_register_kobject(struct net_device *net)
 	dev->class = &net_class;
 	dev->platform_data = net;
 	dev->groups = groups;
+	netdev_cdev_kobj_init(dev, net);
 
 	dev_set_name(dev, "%s", net->name);
 
-- 
1.6.0.6



* Re: [PATCH] net: Fix struct sock bitfield annotation
From: Eric Dumazet @ 2009-10-09 20:41 UTC
  To: Christoph Lameter
  Cc: David S. Miller, Vegard Nossum, Linux Netdev List, Ingo Molnar
In-Reply-To: <alpine.DEB.1.10.0910091539170.32209@gentwo.org>

Christoph Lameter wrote:
> On Fri, 9 Oct 2009, Eric Dumazet wrote:
> 
>> For networking guys, here is the actual mess with "struct sock" on x86_64,
>> related to UDP handling (critical latencies for some people). We basically touch
>> all cache lines in every path, with bad effects on SMP...
> 
> Please keep me posted on this. I am very interested in this work.
> 
> Some simple shuffling around may do some good here.
> 

Sure, will do, but first I want to remove the lock_sock()/release_sock()
pair in the rx path that was added for the sk_forward_alloc accounting.
It really hurts because of the backlog handling.

I have a preliminary patch that restores the UDP latencies we had in the past ;)

The trick is that, for UDP, sk_forward_alloc is not updated by both tx
and rx, only by rx. So we can use the sk_receive_queue.lock to forbid
concurrent updates.

As this lock is already hot and only used by rx, we won't have to
dirty the sk_lock, which will then only be used by the tx path.

Then we can carefully reorder struct sock to lower the number of cache
lines needed for each path.
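
The locking idea can be modelled in userspace as follows. This is a
sketch only, not the kernel code: struct rx_sock, struct pkt and the
rx_* helpers are hypothetical, a pthread mutex stands in for
sk_receive_queue.lock, and a fixed budget replaces sk_rmem_schedule().
The point it shows is that one lock, already taken to touch the queue,
also covers the rx-only accounting field.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct pkt {
	struct pkt *next;
	int truesize;
};

struct rx_sock {
	pthread_mutex_t rxq_lock;	/* stands in for sk_receive_queue.lock */
	struct pkt *head, *tail;
	int forward_alloc;		/* stands in for sk_forward_alloc */
};

static void rx_sock_init(struct rx_sock *s, int budget)
{
	pthread_mutex_init(&s->rxq_lock, NULL);
	s->head = s->tail = NULL;
	s->forward_alloc = budget;
}

/* Charge the packet and queue it under the queue lock; -1 if over budget. */
static int rx_enqueue(struct rx_sock *s, struct pkt *p)
{
	int err = -1;

	assert(p->truesize >= 0);
	pthread_mutex_lock(&s->rxq_lock);
	if (s->forward_alloc >= p->truesize) {
		s->forward_alloc -= p->truesize;
		p->next = NULL;
		if (s->tail)
			s->tail->next = p;
		else
			s->head = p;
		s->tail = p;
		err = 0;
	}
	pthread_mutex_unlock(&s->rxq_lock);
	return err;
}

/* Unlink the first packet and give its charge back under the same lock. */
static struct pkt *rx_dequeue(struct rx_sock *s)
{
	struct pkt *p;

	pthread_mutex_lock(&s->rxq_lock);
	p = s->head;
	if (p) {
		s->head = p->next;
		if (!s->head)
			s->tail = NULL;
		s->forward_alloc += p->truesize;
	}
	pthread_mutex_unlock(&s->rxq_lock);
	return p;
}
```

Nothing in this model ever takes a second, socket-wide lock on the
receive side, which mirrors the goal of leaving sk_lock to the tx path.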


Patch against linux-2.6 git tree

 net/core/sock.c |    9 ++++
 net/ipv4/udp.c  |   89 ++++++++++++++++++++++------------------------
 2 files changed, 51 insertions(+), 47 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 7626b6a..45212d4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -276,6 +276,7 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	int err = 0;
 	int skb_len;
+	unsigned long flags;
 
 	/* Cast sk->rcvbuf to unsigned... It's pointless, but reduces
 	   number of warnings when compiling with -W --ANK
@@ -290,8 +291,12 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	if (err)
 		goto out;
 
+	skb_orphan(skb);
+
+	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
 	if (!sk_rmem_schedule(sk, skb->truesize)) {
 		err = -ENOBUFS;
+		spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
 		goto out;
 	}
 
@@ -305,7 +310,9 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	 */
 	skb_len = skb->len;
 
-	skb_queue_tail(&sk->sk_receive_queue, skb);
+	__skb_queue_tail(&sk->sk_receive_queue, skb);
+
+	spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
 
 	if (!sock_flag(sk, SOCK_DEAD))
 		sk->sk_data_ready(sk, skb_len);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ec6a8a..e8a1be4 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -841,6 +841,36 @@ out:
 	return ret;
 }
 
+
+/**
+ *	first_packet_length	- return length of first packet in receive queue
+ *	@sk: socket
+ *
+ *	Drops all bad checksum frames, until a valid one is found.
+ *	Returns the length of found skb, or 0 if none is found.
+ */
+static unsigned int first_packet_length(struct sock *sk)
+{
+	struct sk_buff_head *rcvq = &sk->sk_receive_queue;
+	struct sk_buff *skb;
+	unsigned int res;
+
+	spin_lock_bh(&rcvq->lock);
+
+	while ((skb = skb_peek(rcvq)) != NULL &&
+		udp_lib_checksum_complete(skb)) {
+		UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_INERRORS,
+				 IS_UDPLITE(sk));
+		__skb_unlink(skb, rcvq);
+		skb_kill_datagram(sk, skb, 0);
+	}
+	res = skb ? skb->len : 0;
+
+	spin_unlock_bh(&rcvq->lock);
+
+	return res;
+}
+
 /*
  *	IOCTL requests applicable to the UDP protocol
  */
@@ -857,21 +887,16 @@ int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 
 	case SIOCINQ:
 	{
-		struct sk_buff *skb;
-		unsigned long amount;
+		unsigned int amount = first_packet_length(sk);
 
-		amount = 0;
-		spin_lock_bh(&sk->sk_receive_queue.lock);
-		skb = skb_peek(&sk->sk_receive_queue);
-		if (skb != NULL) {
+		if (amount)
 			/*
 			 * We will only return the amount
 			 * of this packet since that is all
 			 * that will be read.
 			 */
-			amount = skb->len - sizeof(struct udphdr);
-		}
-		spin_unlock_bh(&sk->sk_receive_queue.lock);
+			amount -= sizeof(struct udphdr);
+
 		return put_user(amount, (int __user *)arg);
 	}
 
@@ -968,17 +993,17 @@ try_again:
 		err = ulen;
 
 out_free:
-	lock_sock(sk);
+	spin_lock_bh(&sk->sk_receive_queue.lock);
 	skb_free_datagram(sk, skb);
-	release_sock(sk);
+	spin_unlock_bh(&sk->sk_receive_queue.lock);
 out:
 	return err;
 
 csum_copy_err:
-	lock_sock(sk);
+	spin_lock_bh(&sk->sk_receive_queue.lock);
 	if (!skb_kill_datagram(sk, skb, flags))
-		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
-	release_sock(sk);
+		UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
+	spin_unlock_bh(&sk->sk_receive_queue.lock);
 
 	if (noblock)
 		return -EAGAIN;
@@ -1060,7 +1085,6 @@ drop:
 int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct udp_sock *up = udp_sk(sk);
-	int rc;
 	int is_udplite = IS_UDPLITE(sk);
 
 	/*
@@ -1140,16 +1164,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 			goto drop;
 	}
 
-	rc = 0;
-
-	bh_lock_sock(sk);
-	if (!sock_owned_by_user(sk))
-		rc = __udp_queue_rcv_skb(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
-	bh_unlock_sock(sk);
-
-	return rc;
+	return __udp_queue_rcv_skb(sk, skb);
 
 drop:
 	UDP_INC_STATS_BH(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
@@ -1540,29 +1555,11 @@ unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
 {
 	unsigned int mask = datagram_poll(file, sock, wait);
 	struct sock *sk = sock->sk;
-	int 	is_lite = IS_UDPLITE(sk);
 
 	/* Check for false positives due to checksum errors */
-	if ((mask & POLLRDNORM) &&
-	    !(file->f_flags & O_NONBLOCK) &&
-	    !(sk->sk_shutdown & RCV_SHUTDOWN)) {
-		struct sk_buff_head *rcvq = &sk->sk_receive_queue;
-		struct sk_buff *skb;
-
-		spin_lock_bh(&rcvq->lock);
-		while ((skb = skb_peek(rcvq)) != NULL &&
-		       udp_lib_checksum_complete(skb)) {
-			UDP_INC_STATS_BH(sock_net(sk),
-					UDP_MIB_INERRORS, is_lite);
-			__skb_unlink(skb, rcvq);
-			kfree_skb(skb);
-		}
-		spin_unlock_bh(&rcvq->lock);
-
-		/* nothing to see, move along */
-		if (skb == NULL)
-			mask &= ~(POLLIN | POLLRDNORM);
-	}
+	if ((mask & POLLRDNORM) && !(file->f_flags & O_NONBLOCK) &&
+	    !(sk->sk_shutdown & RCV_SHUTDOWN) && !first_packet_length(sk))
+		mask &= ~(POLLIN | POLLRDNORM);
 
 	return mask;
 

^ permalink raw reply related

* Re: behaviour question for igb on nehalem box
From: Brandeburg, Jesse @ 2009-10-09 20:22 UTC (permalink / raw)
  To: Chris Friesen
  Cc: e1000-list, Linux Network Development list, Allan, Bruce W,
	Ronciak, John, Kirsher, Jeffrey T
In-Reply-To: <4ACF8466.5030309@nortel.com>

On Fri, 9 Oct 2009, Chris Friesen wrote:
> I've got some general questions around the expected behaviour of the
> 82576 igb net device.  (On a dual quad-core Nehalem box, if it matters.)
> 
> As a caveat, the box is running Centos 5.3 with their 2.6.18 kernel.
> It's using the 1.3.16-k2 igb driver though, which looks to be the one
> from mainline linux.
> 
> The igb driver is being loaded with no parameters specified.  At driver
> init time, it's selecting 1 tx queue and 4 rx queues per device.
> 
> My first question is whether the number of queues makes sense.  I

It does for this kernel, because 2.6.18 doesn't support multiple tx 
queues.  The hardware supports RSS over receive queues, and the driver 
doesn't expose the multiple receive queues to the OS.

> couldn't figure out how this would happen since the rules for selecting
> the number of queues seems to be the same for rx and tx.  Also, it's not
> clear to me why it's limiting itself to 4 rx queues when I have 8
> physical cores (and 16 virtual ones with hyperthreading enabled).

For gigabit, more queues are not necessarily better, and MQ arguably isn't 
necessary at all for gigabit.  However, it can help for some workloads 
when spreading out RX traffic.  The hardware you have only supports 8 
queues (rx and tx) and the driver is configured to only set up 4 max.

> My second question is around how the rx queues are mapped to interrupts.
>  According to /proc/interrupts there appears to be a 1:1 mapping between
> queues and interrupts.  However, I've set up a test with a given amount
> of traffic coming in to the device (from 4 different IP addresses and 4
> ports).  Under this scenario, "ethtool -S" shows the number of packets
> increasing for only rx queue 0, but I see the interrupt count going up
> for two interrupts.

One transmit interrupt and one receive interrupt?  RSS will spread the 
receive work out in a flow based way, based on ip/xDP header.  Your test 
as described should be using more than one flow (and therefore more than 
one rx queue) unless you got caught out by the default arp_filter 
behavior (check arp -an).
 
> My final question is around smp affinity for the rx and tx queue
> interrupts.  Do I need to affine the interrupt for each rx queue to a
> single core to guarantee proper packet ordering, or can they be handled
> on arbitrary cores?  Should the tx queue be affined to a particular core
> or left to be handled by all cores?

On RHEL5.3 you can use irqbalance; you shouldn't need to hand affine 
anything.  Packets won't be received out of order unless you have the rx 
interrupts going to more than one CPU per queue (smp_affinity mask has 
more than one bit set).  RSS is doing flow steering.
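
As a rough way to check the "more than one bit set" condition, the sketch
below counts the CPUs selected by a mask string in the hex format shown by
/proc/irq/<n>/smp_affinity. The helper names affinity_cpu_count() and
queue_may_reorder() are made up for this example, and reading the file
itself is left out.

```c
#include <ctype.h>

/*
 * Count the CPUs selected by an smp_affinity-style hex mask such as
 * "04" or "00000000,000000ff".  Commas, newlines and any other
 * non-hex characters are skipped.
 */
static int affinity_cpu_count(const char *hex_mask)
{
	int count = 0;

	for (; *hex_mask; hex_mask++) {
		unsigned char c = (unsigned char)*hex_mask;
		unsigned int nibble;

		if (!isxdigit(c))
			continue;
		nibble = isdigit(c) ? (unsigned int)(c - '0')
				    : (unsigned int)(tolower(c) - 'a' + 10);
		/* Clear the lowest set bit until none remain. */
		while (nibble) {
			nibble &= nibble - 1;
			count++;
		}
	}
	return count;
}

/* Non-zero if the rx interrupt for a queue may be served by several CPUs. */
static int queue_may_reorder(const char *hex_mask)
{
	return affinity_cpu_count(hex_mask) > 1;
}
```

A mask of "01" pins the interrupt to a single CPU, while "03" lets two CPUs
service the same queue, which is the case described above as risking
reordering.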

Going to a 2.6.27 or newer kernel will get you full tx multiqueue support.

Hope this helps,
  Jesse

^ permalink raw reply

* Re: [PATCH] irda/sa1100_ir: check return value of startup hook
From: Dmitry Artamonow @ 2009-10-09 20:12 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: netdev, Samuel Ortiz, Russell King, David S. Miller,
	linux-arm-kernel
In-Reply-To: <4ACF293C.7070803@ru.mvista.com>

[-- Attachment #1: Type: text/plain, Size: 367 bytes --]

On 16:14 Fri 09 Oct     , Sergei Shtylyov wrote:

[...]
> > -	if (si->pdata->startup)
> > -		si->pdata->startup(si->dev);
> > +	if (si->pdata->startup)	{
> > +		ret = si->pdata->startup(si->dev);
> > +		if (ret)
> > +			return ret;
> > +		}
> 
>     Overindented brace.
> 

Nice catch, thanks!

Updated patch in attachment.

-- 
Best regards,
Dmitry "MAD" Artamonow


[-- Attachment #2: 0001-irda-sa1100_ir-check-return-value-of-startup-hook.patch --]
[-- Type: text/plain, Size: 933 bytes --]

From ba1fe701950634aae46aa59431633e99f8bd18cc Mon Sep 17 00:00:00 2001
From: Dmitry Artamonow <mad_soft@inbox.ru>
Date: Fri, 9 Oct 2009 21:56:21 +0400
Subject: [PATCH v2] irda/sa1100_ir: check return value of startup hook

Signed-off-by: Dmitry Artamonow <mad_soft@inbox.ru>
---
 drivers/net/irda/sa1100_ir.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/irda/sa1100_ir.c b/drivers/net/irda/sa1100_ir.c
index 38bf7cf..c412e80 100644
--- a/drivers/net/irda/sa1100_ir.c
+++ b/drivers/net/irda/sa1100_ir.c
@@ -232,8 +232,11 @@ static int sa1100_irda_startup(struct sa1100_irda *si)
 	/*
 	 * Ensure that the ports for this device are setup correctly.
 	 */
-	if (si->pdata->startup)
-		si->pdata->startup(si->dev);
+	if (si->pdata->startup)	{
+		ret = si->pdata->startup(si->dev);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Configure PPC for IRDA - we want to drive TXD2 low.
-- 
1.6.3.4


^ permalink raw reply related

* Re: [PATCH] net: Fix struct sock bitfield annotation
From: Christoph Lameter @ 2009-10-09 19:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Vegard Nossum, Linux Netdev List, Ingo Molnar
In-Reply-To: <4ACE95E1.30301@gmail.com>

On Fri, 9 Oct 2009, Eric Dumazet wrote:

> For networking guys, here is the actual mess with "struct sock" on x86_64,
> related to UDP handling (critical latencies for some people). We basically touch
> all cache lines, in every paths, bad effects on SMP...

Please keep me posted on this. I am very interested in this work.

Some simple shuffling around may do some good here.


^ permalink raw reply

* Re: [RFCv4 PATCH 1/2] net: Introduce recvmmsg socket syscall
From: Arnaldo Carvalho de Melo @ 2009-10-09 19:35 UTC (permalink / raw)
  To: David Miller
  Cc: Caitlin Bestler, Chris Van Hoof, Clark Williams, Neil Horman,
	Nir Tzachar, Nivedita Singhvi, Paul Moore,
	Rémi Denis-Courmont, Steven Whitehouse,
	Linux Networking Development Mailing List
In-Reply-To: <20090916170738.GC7699@ghostprotocols.net>

Em Wed, Sep 16, 2009 at 02:07:38PM -0300, Arnaldo Carvalho de Melo escreveu:
> Meaning receive multiple messages, reducing the number of syscalls and
> net stack entry/exit operations.
>
> Next patches will introduce mechanisms where protocols that want to
> optimize this operation will provide an unlocked_recvmsg operation.

Hi Dave,

	The second patch in this series has issues, I still have to
investigate it properly, study removing the skb_queue_head lock like TCP
does, but the first patch seems to be OK and already providing good
results at least as reported by Nir, if there aren't any other concerns
about the API, can we get it into net-next-2.6?
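
For readers who have not seen the call, here is a minimal sketch of batching
receives with recvmmsg(). It assumes the interface as exposed by current
glibc on Linux, which may differ in detail from the RFC revision discussed in
this thread. recv_batch(), BATCH and MSG_MAX are names invented for the
example.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH	8	/* datagrams requested per system call */
#define MSG_MAX	64	/* bytes reserved per datagram */

/*
 * Receive up to BATCH datagrams from fd with a single system call,
 * without blocking.  On success returns the number of datagrams
 * received and stores each length in lens[]; returns -1 on error
 * (for example when nothing is queued).
 */
static int recv_batch(int fd, char bufs[BATCH][MSG_MAX],
		      unsigned int lens[BATCH])
{
	struct mmsghdr msgs[BATCH];
	struct iovec iov[BATCH];
	int i, n;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < BATCH; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = MSG_MAX;
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
	for (i = 0; i < n; i++)
		lens[i] = msgs[i].msg_len;
	return n;
}
```

With three datagrams already queued on the socket, one call to recv_batch()
should hand back all three, where plain recvmsg() would need three calls.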

Best Regards,

- Arnaldo

^ permalink raw reply

* Re: [PATCH] Generalize socket rx gap / receive queue overflow cmsg (v2)
From: Neil Horman @ 2009-10-09 19:35 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, davem, socketcan, nhorman
In-Reply-To: <20091007180835.GB20524@hmsreliant.think-freely.org>

Ok, take two of this patch, taking in Eric's notes:

Change Notes:

1) Locking on dropcount cleaned up

2) Support for reading of dropcount moved to a lower level support function
(sock_recv_ts_and_drops, modeled after sock_recv_timestamp).  This should make
this work a good deal faster.

3) Socket flags moved to sk->sk_flags structure in support of (2)

Works well for me.


========================================================================

Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames.  This value was
exported via a SOL_PACKET level cmsg.  After I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option.  As such I've created this patch.  It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
overflowed between any two given frames.  It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count).  Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if that's
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me.  This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted
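
Regarding note 1, a user space consumer could pick the running total out of
the control data and compute the delta itself, roughly as sketched below.
The helper names get_drop_total() and drops_between() are invented for this
example, and the fallback value 40 for SO_RXQ_OVFL is taken from the
asm-generic hunk in this patch.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_RXQ_OVFL
#define SO_RXQ_OVFL 40	/* value from include/asm-generic/socket.h below */
#endif

/*
 * Look for the SOL_SOCKET/SO_RXQ_OVFL cmsg in a message filled in by
 * recvmsg().  Returns 1 and stores the running drop total if it is
 * present, 0 otherwise.
 */
static int get_drop_total(struct msghdr *msg, uint32_t *total)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SO_RXQ_OVFL) {
			memcpy(total, CMSG_DATA(cmsg), sizeof(*total));
			return 1;
		}
	}
	return 0;
}

/*
 * Drops that happened between two received frames.  Unsigned 32-bit
 * subtraction keeps the result correct across a counter wrap.
 */
static uint32_t drops_between(uint32_t prev_total, uint32_t cur_total)
{
	return cur_total - prev_total;
}
```

The caller would enable the option with setsockopt(fd, SOL_SOCKET,
SO_RXQ_OVFL, ...) and remember the previous total between recvmsg() calls.
With this revision the cmsg is only attached when the recorded count is
non-zero, so a missing cmsg would mean no drops had been recorded yet.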

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/asm-generic/socket.h |    1 +
 include/linux/skbuff.h       |    6 ++++--
 include/net/sock.h           |   13 +++++++++++++
 net/atm/common.c             |    2 +-
 net/bluetooth/af_bluetooth.c |    2 +-
 net/bluetooth/rfcomm/sock.c  |    2 +-
 net/can/bcm.c                |    2 +-
 net/can/raw.c                |    2 +-
 net/core/sock.c              |   17 ++++++++++++++++-
 net/ieee802154/dgram.c       |    2 +-
 net/ieee802154/raw.c         |    2 +-
 net/ipv4/raw.c               |    2 +-
 net/ipv4/udp.c               |    2 +-
 net/ipv6/raw.c               |    2 +-
 net/ipv6/udp.c               |    2 +-
 net/key/af_key.c             |    2 +-
 net/packet/af_packet.c       |    7 +++----
 net/rxrpc/ar-recvmsg.c       |    2 +-
 net/sctp/socket.c            |    2 +-
 net/socket.c                 |    7 +++++++
 20 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/include/asm-generic/socket.h b/include/asm-generic/socket.h
index 538991c..9a6115e 100644
--- a/include/asm-generic/socket.h
+++ b/include/asm-generic/socket.h
@@ -63,4 +63,5 @@
 #define SO_PROTOCOL		38
 #define SO_DOMAIN		39
 
+#define SO_RXQ_OVFL             40
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index df7b23a..8c866b5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -389,8 +389,10 @@ struct sk_buff {
 #ifdef CONFIG_NETWORK_SECMARK
 	__u32			secmark;
 #endif
-
-	__u32			mark;
+	union {
+		__u32		mark;
+		__u32		dropcount;
+	};
 
 	__u16			vlan_tci;
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 98398bd..ae48d99 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -505,6 +505,7 @@ enum sock_flags {
 	SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */
 	SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */
 	SOCK_FASYNC, /* fasync() active */
+	SOCK_RXQ_OVFL,
 };
 
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -1493,6 +1494,18 @@ sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 		sk->sk_stamp = kt;
 }
 
+extern void __sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
+	struct sk_buff *skb);
+
+static __inline__ void
+sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
+{
+	sock_recv_timestamp(msg, sk, skb);
+
+	if (sock_flag(sk, SOCK_RXQ_OVFL) && skb && skb->dropcount)
+		__sock_recv_ts_and_drops(msg, sk, skb);
+}
+
 /**
  * sock_tx_timestamp - checks whether the outgoing packet is to be time stamped
  * @msg:	outgoing packet
diff --git a/net/atm/common.c b/net/atm/common.c
index 950bd16..d61e051 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -496,7 +496,7 @@ int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	error = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
 	if (error)
 		return error;
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 	pr_debug("RcvM %d -= %d\n", atomic_read(&sk->sk_rmem_alloc), skb->truesize);
 	atm_return(vcc, skb->truesize);
 	skb_free_datagram(sk, skb);
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 1f6e49c..399e59c 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -257,7 +257,7 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	skb_reset_transport_header(skb);
 	err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
 	if (err == 0)
-		sock_recv_timestamp(msg, sk, skb);
+		sock_recv_ts_and_drops(msg, sk, skb);
 
 	skb_free_datagram(sk, skb);
 
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index c707865..d3bfc1b 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -703,7 +703,7 @@ static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 		copied += chunk;
 		size   -= chunk;
 
-		sock_recv_timestamp(msg, sk, skb);
+		sock_recv_ts_and_drops(msg, sk, skb);
 
 		if (!(flags & MSG_PEEK)) {
 			atomic_sub(chunk, &sk->sk_rmem_alloc);
diff --git a/net/can/bcm.c b/net/can/bcm.c
index 597da4f..2f47039 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1534,7 +1534,7 @@ static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return err;
 	}
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (msg->msg_name) {
 		msg->msg_namelen = sizeof(struct sockaddr_can);
diff --git a/net/can/raw.c b/net/can/raw.c
index b5e8979..962fc9f 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -702,7 +702,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return err;
 	}
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (msg->msg_name) {
 		msg->msg_namelen = sizeof(struct sockaddr_can);
diff --git a/net/core/sock.c b/net/core/sock.c
index 7626b6a..0897311 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -276,6 +276,8 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	int err = 0;
 	int skb_len;
+	unsigned long flags;
+	struct sk_buff_head *list = &sk->sk_receive_queue;
 
 	/* Cast sk->rcvbuf to unsigned... It's pointless, but reduces
 	   number of warnings when compiling with -W --ANK
@@ -305,7 +307,10 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	 */
 	skb_len = skb->len;
 
-	skb_queue_tail(&sk->sk_receive_queue, skb);
+	spin_lock_irqsave(&list->lock, flags);
+	skb->dropcount = atomic_read(&sk->sk_drops);
+	__skb_queue_tail(list, skb);
+	spin_unlock_irqrestore(&list->lock, flags);
 
 	if (!sock_flag(sk, SOCK_DEAD))
 		sk->sk_data_ready(sk, skb_len);
@@ -702,6 +707,12 @@ set_rcvbuf:
 
 		/* We implement the SO_SNDLOWAT etc to
 		   not be settable (1003.1g 5.3) */
+	case SO_RXQ_OVFL:
+		if (valbool)
+			sock_set_flag(sk, SOCK_RXQ_OVFL);
+		else
+			sock_reset_flag(sk, SOCK_RXQ_OVFL);
+		break;
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -901,6 +912,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_mark;
 		break;
 
+	case SO_RXQ_OVFL:
+		v.val = sock_flag(sk, SOCK_RXQ_OVFL);
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
index a413b1b..25ad956 100644
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -303,7 +303,7 @@ static int dgram_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (err)
 		goto done;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (flags & MSG_TRUNC)
 		copied = skb->len;
diff --git a/net/ieee802154/raw.c b/net/ieee802154/raw.c
index 30e74ee..769c8d1 100644
--- a/net/ieee802154/raw.c
+++ b/net/ieee802154/raw.c
@@ -191,7 +191,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (err)
 		goto done;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (flags & MSG_TRUNC)
 		copied = skb->len;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 757c917..f18172b 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -682,7 +682,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (err)
 		goto done;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	/* Copy the address. */
 	if (sin) {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6ec6a8a..bb96eee 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -951,7 +951,7 @@ try_again:
 		UDP_INC_STATS_USER(sock_net(sk),
 				UDP_MIB_INDATAGRAMS, is_udplite);
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	/* Copy the address. */
 	if (sin) {
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 4f24570..d8375bc 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -497,7 +497,7 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 			sin6->sin6_scope_id = IP6CB(skb)->iif;
 	}
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (np->rxopt.all)
 		datagram_recv_ctl(sk, msg, skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c6a303e..b51ee64 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -252,7 +252,7 @@ try_again:
 					UDP_MIB_INDATAGRAMS, is_udplite);
 	}
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	/* Copy the address. */
 	if (msg->msg_name) {
diff --git a/net/key/af_key.c b/net/key/af_key.c
index c078ae6..472f659 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3606,7 +3606,7 @@ static int pfkey_recvmsg(struct kiocb *kiocb,
 	if (err)
 		goto out_free;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	err = (flags & MSG_TRUNC) ? skb->len : copied;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index f87ed48..bf3a295 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -627,15 +627,14 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
 
 	spin_lock(&sk->sk_receive_queue.lock);
 	po->stats.tp_packets++;
+	skb->dropcount = atomic_read(&sk->sk_drops);
 	__skb_queue_tail(&sk->sk_receive_queue, skb);
 	spin_unlock(&sk->sk_receive_queue.lock);
 	sk->sk_data_ready(sk, skb->len);
 	return 0;
 
 drop_n_acct:
-	spin_lock(&sk->sk_receive_queue.lock);
-	po->stats.tp_drops++;
-	spin_unlock(&sk->sk_receive_queue.lock);
+	po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);
 
 drop_n_restore:
 	if (skb_head != skb->data && skb_shared(skb)) {
@@ -1478,7 +1477,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (err)
 		goto out_free;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (msg->msg_name)
 		memcpy(msg->msg_name, &PACKET_SKB_CB(skb)->sa,
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
index a39bf97..60c2b94 100644
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -146,7 +146,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				memcpy(msg->msg_name,
 				       &call->conn->trans->peer->srx,
 				       sizeof(call->conn->trans->peer->srx));
-			sock_recv_timestamp(msg, &rx->sk, skb);
+			sock_recv_ts_and_drops(msg, &rx->sk, skb);
 		}
 
 		/* receive the message */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index c8d0575..0970e92 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1958,7 +1958,7 @@ SCTP_STATIC int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (err)
 		goto out_free;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 	if (sctp_ulpevent_is_notification(event)) {
 		msg->msg_flags |= MSG_NOTIFICATION;
 		sp->pf->event_msgname(event, msg->msg_name, addr_len);
diff --git a/net/socket.c b/net/socket.c
index d53ad11..c82146c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -668,6 +668,13 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 
 EXPORT_SYMBOL_GPL(__sock_recv_timestamp);
 
+void __sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
+	struct sk_buff *skb)
+{
+	put_cmsg(msg, SOL_SOCKET, SO_RXQ_OVFL, sizeof(__u32), &skb->dropcount);
+}
+EXPORT_SYMBOL_GPL(__sock_recv_ts_and_drops);
+
 static inline int __sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 				 struct msghdr *msg, size_t size, int flags)
 {

^ permalink raw reply related

* Ath5k data aborts
From: Krzysztof Halasa @ 2009-10-09 19:16 UTC (permalink / raw)
  To: linux-wireless, ath5k-devel, netdev

Hi,

I have done a small investigation. IXP425 (ARM) in big-endian mode,
EABI, mini-PCI ath5k wifi card, hostapd.

Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)
Subsystem: Wistron NeWeb Corp. CM9 Wireless a/b/g MiniPCI Adapter
168c:0013 subsystem 185f:1012


Results:
Bad mode in data abort handler detected
Internal error: Oops - bad mode: 0 [#1]
LR is at ath5k_beacon_config+0x150/0x1d4 [ath5k]

This means the PCI device didn't respond on the bus or something
like that. Obviously the card is then unusable and the system needs to
be restarted.

Bisecting (I had to modify the procedure a bit since it only started to
show up after other unrelated code was merged) shows the guilty commit:
e8f055f0c3ba226ca599c14c2e5fe829f6f57cbb (ath5k: Update reset code).

The problem exists with 2.6.30, 2.6.31 and current Linus' tree.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>

----------------------------------------------
2.6.30 appears to be fixed by:

--- a/drivers/net/wireless/ath5k/reset.c
+++ b/drivers/net/wireless/ath5k/reset.c
@@ -476,7 +476,7 @@ static void ath5k_hw_set_sleep_clock(struct ath5k_hw *ah, bool enable)
 		(ah->ah_mac_version == (AR5K_SREV_AR2417 >> 4))) {
 			ath5k_hw_reg_write(ah, 0x26, AR5K_PHY_SLMT);
 			ath5k_hw_reg_write(ah, 0x0d, AR5K_PHY_SCAL);
-			ath5k_hw_reg_write(ah, 0x07, AR5K_PHY_SCLOCK);
+			ath5k_hw_reg_write(ah, 0x0C, AR5K_PHY_SCLOCK);
 			ath5k_hw_reg_write(ah, 0x3f, AR5K_PHY_SDELAY);
 			AR5K_REG_WRITE_BITS(ah, AR5K_PCICFG,
 				AR5K_PCICFG_SLEEP_CLOCK_RATE, 0x02);
@@ -490,8 +490,10 @@ static void ath5k_hw_set_sleep_clock(struct ath5k_hw *ah, bool enable)
 		}
 
 		/* Enable sleep clock operation */
+#if 0
 		AR5K_REG_ENABLE_BITS(ah, AR5K_PCICFG,
 				AR5K_PCICFG_SLEEP_CLOCK_EN);
+#endif
 
 	} else {
 


The AR5K_PHY_SCLOCK change brings the old value (before the commit in question)
back, I have no idea what it is. Leaving the new value causes the second
run of hostapd to make the driver fail, the chip seems to not respond.
It seems the value itself may be correct (as it works with 2.6.31+) but
there is some additional bug fixed after 2.6.30, gitk shows several
candidate patches for this.


Only disabling AR5K_PCICFG write makes the data abort go away.

----------------------------------------------
2.6.31 and Linus-current only need the AR5K_PCICFG change:

--- a/drivers/net/wireless/ath/ath5k/reset.c
+++ b/drivers/net/wireless/ath/ath5k/reset.c
@@ -489,9 +489,10 @@ static void ath5k_hw_set_sleep_clock(struct ath5k_hw *ah, bool enable)
 		}
 
 		/* Enable sleep clock operation */
+#if 0
 		AR5K_REG_ENABLE_BITS(ah, AR5K_PCICFG,
 				AR5K_PCICFG_SLEEP_CLOCK_EN);
-
+#endif
 	} else {
 
 		/* Disable sleep clock operation and


The question is, obviously, how to fix that for good. I can test the
result.


Full error message, not sure why the backtrace isn't printed.

Bad mode in data abort handler detected
Internal error: Oops - bad mode: 0 [#1]
Modules linked in: ohci_hcd ehci_hcd usbcore nls_base ixp4xx_hss ath5k ath ixp4xx_eth
CPU: 0    Not tainted  (2.6.32-rc3 #123)
PC is at 0xffff01fc
LR is at ath5k_beacon_config+0x150/0x1d4 [ath5k]
pc : [<ffff01fc>]    lr : [<bf028db0>]    psr: a0000092
sp : c7dbfb90  ip : 00008050  fp : c78aa000
r10: c7dbfbd8  r9 : c78ac1c0  r8 : 00003304
r7 : c78aa000  r6 : c78aa000  r5 : 00000013  r4 : c78ac900
r3 : c88e0000  r2 : c88d0024  r1 : c88d0048  r0 : 800924b5
Flags: NzCv  IRQs off  FIQs on  Mode IRQ_32  ISA ARM  Segment user
Control: 000039ff  Table: 067e0000  DAC: 00000015
Process hostapd (pid: 258, stack limit = 0xc7dbe278)
Stack: (0xc7dbfb90 to 0xc7dc0000)
fb80:                                     800924b5 c88d0048 c88d0024 c88e0000 
fba0: c78ac900 00000013 c78aa000 c78aa000 00003304 c78ac1c0 c7dbfbd8 c78aa000 
fbc0: 00008050 c7dbfb90 bf028db0 ffff01fc a0000092 ffffffff 00000003 00000000 
fbe0: 00080000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
fc00: c78ac924 c78ac900 c78ac924 c7d34628 00000300 c7d34620 00000013 bf028ec8 
fc20: c78ac1c0 c7d52980 c0487e90 c7d342c0 c78ac1c0 c67e7140 c7caef20 0000001a 
fc40: 00000004 00000000 00000024 c0391694 c67e7140 c7caef20 0000001a c7dbfc88 
fc60: c7d342c0 c039e974 c7d52440 00000033 c7dbfcc0 c039e998 c7dbfc88 c0487d30 
fc80: c7c6e810 c0384640 00000000 00000000 00000000 00000002 00000000 00000000 
fca0: c7d34000 c78ac000 c0487bb0 c7c6e800 c7d52440 c02c5448 c0488184 00000102 
fcc0: 00000080 00000102 c7c6e800 c7c6e810 c7c6e814 c788d000 c7c6e800 c7d52440 
fce0: c02c5258 c7c54600 c04a0710 00000038 c7dbfd1c c02c42dc c047f6ac c7d52440 
fd00: c7d52440 c02c5244 c788d200 00000000 c7d52440 c02c3f14 00000024 7fffffff 
fd20: 00000000 c7dbff5c c7c54600 c7d52440 c7dbfe18 00000024 00000000 00000000 
fd40: 00000000 c02c47cc c7dbfe38 c7c54600 00000102 00000000 00000000 00000000 
fd60: 00000000 c7dbff5c c7dbfe18 00000000 00000000 00000024 c7dbfefc 00000000 
fd80: 00000008 c0282298 00000000 c7dbff1c 00000000 00000001 ffffffff 00000000 
fda0: 00000000 00000000 00000000 00000000 c7839080 00000001 00000000 00000000 
fdc0: 00000000 c7839080 c0185ccc c7dbfdcc c7dbfdcc 0000092a c7dbfec8 c038eae4 
fde0: c7dbfe18 00008b24 c7dbfec8 c037ab88 c7dbfdec c7dbfe0c c67d7360 c01aaf9c 
fe00: c7dbfe38 c0161cfc 00000000 c67da1a4 00000040 00000000 00000000 c74231bc 
fe20: 00000015 00000024 c7489380 0001d000 c7dbfd50 c7dbff5c c7dbfe7c c7dbfefc 
fe40: c7dbfe7c c7dbfefc c015a048 c7dbfefc 00000008 00000000 c7dbff5c c7dbff5c 
fe60: c7489380 00000000 c7dbfefc c02823ec c7dbff3c c7dbff3c 00000000 00100000 
fe80: 00000000 00000000 00000020 00000000 00008933 c02941a4 c67e4380 c67e4000 
fea0: 0000000a c67e4380 c7dbe000 c047bc28 c786d940 c024df48 776c616e 30000000 
fec0: 00000000 00000000 00000006 00000000 00000000 0e000000 c67e40e0 c67e4084 
fee0: 00000000 c028171c c786d340 00008933 00000000 60000013 00000007 0005c754 
ff00: 00000000 00000000 00000000 00000000 c74890e8 c0472750 00200200 00100100 
ff20: c7497338 c7401498 00200200 00100100 ffffffff ffffffff c780d5a0 c01ce3a0 
ff40: c7497338 c0472750 00200200 c01cea04 c786d340 00000000 c7497338 c7dbfe7c 
ff60: 0000000c c7dbfefc 00000001 00000000 00000000 00000000 00000000 ffffff97 
ff80: c786d340 000598f8 400722b0 000598a0 00000128 c015a048 c7dbe000 00000000 
ffa0: 00000001 c0159ea0 000598f8 400722b0 00000004 be9dfb24 00000000 00000000 
ffc0: 000598f8 400722b0 000598a0 00000128 00000000 00000000 00000001 00000001 
ffe0: be9dfb24 be9dfaf8 40039c84 402b022c 60000010 00000004 00000000 00000000 
Code: 00000000 00000000 00000000 00000000 (00000000) 
---[ end trace ff977de942e87c2d ]---

-- 
Krzysztof Halasa

^ permalink raw reply

* behaviour question for igb on nehalem box
From: Chris Friesen @ 2009-10-09 18:43 UTC (permalink / raw)
  To: e1000-list, Linux Network Development list, Kirsher, Jeffrey T,
	"Brandeburg, Jesse" <jesse


Hi all,

I've got some general questions around the expected behaviour of the
82576 igb net device.  (On a dual quad-core Nehalem box, if it matters.)

As a caveat, the box is running CentOS 5.3 with their 2.6.18 kernel.
It's using the 1.3.16-k2 igb driver though, which looks to be the one
from mainline Linux.

The igb driver is being loaded with no parameters specified.  At driver
init time, it's selecting 1 tx queue and 4 rx queues per device.

My first question is whether the number of queues makes sense.  I
couldn't figure out how this would happen, since the rules for selecting
the number of queues seem to be the same for rx and tx.  Also, it's not
clear to me why it's limiting itself to 4 rx queues when I have 8
physical cores (and 16 virtual ones with hyperthreading enabled).

My second question is around how the rx queues are mapped to interrupts.
According to /proc/interrupts there appears to be a 1:1 mapping between
queues and interrupts.  However, I've set up a test with a given amount
of traffic coming in to the device (from 4 different IP addresses and 4
ports).  Under this scenario, "ethtool -S" shows the number of packets
increasing for only rx queue 0, but I see the interrupt count going up
for two interrupts.

My final question is around smp affinity for the rx and tx queue
interrupts.  Do I need to affine the interrupt for each rx queue to a
single core to guarantee proper packet ordering, or can they be handled
on arbitrary cores?  Should the tx queue be affined to a particular core
or left to be handled by all cores?
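
For reference, pinning an interrupt to a core is done by writing a
hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity.  A minimal Python
sketch of computing such a mask follows.  The helper name affinity_mask
is hypothetical, and the sketch assumes fewer than 32 CPUs so that no
comma-separated groups are needed.  It only illustrates the mask format;
it does not answer whether pinning is required for packet ordering.

```python
def affinity_mask(cpus):
    """Return the hex bitmask string for a set of CPU numbers.

    Hypothetical helper: bit N set means "CPU N may handle this IRQ".
    Assumes every CPU number is below 32.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("sketch only handles CPUs 0..31")
        mask |= 1 << cpu
    return format(mask, "x")


if __name__ == "__main__":
    # One rx queue interrupt per core, cores 0..3.
    for cpu in range(4):
        print(cpu, affinity_mask([cpu]))
```

The resulting string is what would be written (as root) to the
smp_affinity file of the interrupt in question.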

Thanks,

Chris


^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Greg KH @ 2009-10-09 17:22 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Narendra K, netdev, linux-hotplug, jordan_hargrave
In-Reply-To: <20091009171724.GA11004@auslistsprd01.us.dell.com>

On Fri, Oct 09, 2009 at 12:17:24PM -0500, Matt Domsch wrote:
> 
> uevents aren't namespaced.  Presumably that means /dev can't be
> polyinstantiated.  Therefore, all devnodes in /dev/netdev/* will be
> visible to all processes, where 'ifconfig' and friends would only show
> device names in the processes namespace.  This doesn't mean the app
> can _do_ anything (it's the same as if it tried to act on a device
> using an ifindex for a device not in its namespace), but yes, the fact
> that such a device exists will be exposed.

That's the problem that the sysfs namespace patches were trying to
address.

Now I'm not saying it is a valid thing to try to work with this kind of
crazy, I was just wondering how it would work out.  Looks like it
doesn't :)

thanks,

greg k-h

^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Matt Domsch @ 2009-10-09 17:17 UTC (permalink / raw)
  To: Greg KH; +Cc: Narendra K, netdev, linux-hotplug, jordan_hargrave
In-Reply-To: <20091009163613.GA3414@kroah.com>

On Fri, Oct 09, 2009 at 09:36:13AM -0700, Greg KH wrote:
> On Fri, Oct 09, 2009 at 09:00:01AM -0500, Narendra K wrote:
> > On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote:
> > > > example udev config:
> > > > SUBSYSTEM=="net",
> > > SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"
> > > 
> > > work as well.  But coupling the ifindex to the MAC address like this
> > > doesn't work.  (In general, coupling any two unrelated attributes when
> > > trying to do persistent names doesn't work.)
> > > 
> > Attaching the latest patch incorporating review comments.
> > 
> > By creating character devices for every network device, we can use
> > udev to maintain alternate naming policies for devices, including
> > additional names for the same device, without interfering with the
> > name that the kernel assigns a device.
> > 
> > This is conditionalized on CONFIG_NET_CDEV.  If enabled (the default),
> > device nodes will automatically be created in /dev/netdev/ for each
> > network device.  (/dev/net/ is already populated by the tun device.)
> > 
> > These device nodes are not functional at the moment - open() returns
> > -ENOSYS.  Their only purpose is to provide userspace with a kernel
> > name to ifindex mapping, in a form that udev can easily manage.
> 
> How does this patch work with the network namespace functionality?

There is a monotonically increasing static ifindex kept in
net/core/dev.c:dev_new_index(), which is shared across all namespaces.
The struct net_device ifindex field is assigned from this, so two devices
in two different namespaces can't share an ifindex value.  However,
the device can be present (or not) in the per-namespace dev_name_hash
and dev_index_hashes.  This patch doesn't change this at all.
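
The arrangement described above can be sketched as a toy model in
Python.  This is only an illustration under stated assumptions, not the
kernel's code: the class IfindexModel and its methods are hypothetical,
namespaces are plain strings, and wrap-around of the counter is ignored.
The point is that the counter is global while the lookup tables are
per namespace.

```python
import itertools


class IfindexModel:
    """Toy model: one global ifindex counter, per-namespace lookup tables."""

    def __init__(self):
        self._counter = itertools.count(1)  # shared by all namespaces
        self.by_name = {}   # namespace -> {device name: ifindex}
        self.by_index = {}  # namespace -> {ifindex: device name}

    def register(self, namespace, name):
        """Assign the next global ifindex and record it in one namespace."""
        ifindex = next(self._counter)
        self.by_name.setdefault(namespace, {})[name] = ifindex
        self.by_index.setdefault(namespace, {})[ifindex] = name
        return ifindex

    def lookup(self, namespace, ifindex):
        """Return the device name, or None if not visible in this namespace."""
        return self.by_index.get(namespace, {}).get(ifindex)


if __name__ == "__main__":
    model = IfindexModel()
    a = model.register("ns1", "eth0")
    b = model.register("ns2", "eth0")
    print(a, b, model.lookup("ns1", b))
```

Two devices named eth0 in different namespaces get different ifindex
values, and a lookup by ifindex from the wrong namespace finds nothing.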

uevents aren't namespaced.  Presumably that means /dev can't be
polyinstantiated.  Therefore, all devnodes in /dev/netdev/* will be
visible to all processes, whereas 'ifconfig' and friends would only show
device names in the process's namespace.  This doesn't mean the app
can _do_ anything (it's the same as if it tried to act on a device
using an ifindex for a device not in its namespace), but yes, the fact
that such a device exists will be exposed.

-- 
Matt Domsch
Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux

^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Marco d'Itri @ 2009-10-09 16:56 UTC (permalink / raw)
  To: Bryan Kadzban
  Cc: Matt Domsch, Narendra K, netdev, linux-hotplug, jordan_hargrave
In-Reply-To: <4ACF6367.8040401@kadzban.is-a-geek.net>

[-- Attachment #1: Type: text/plain, Size: 353 bytes --]

On Oct 09, Bryan Kadzban <bryan@kadzban.is-a-geek.net> wrote:

> > As has been noted here, MAC addresses are not necessarily unique to
> > an interface.
> Only in the case of e.g. qemu (virtual hardware), I think.  (Or some
> kinds of broken hardware.
Some Sun products have multiple interfaces sharing the same MAC address.

-- 
ciao,
Marco

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH] [CAIF-RFC 5/8-v2] CAIF Protocol Stack
From: Randy Dunlap @ 2009-10-09 16:43 UTC (permalink / raw)
  To: sjur.brandeland
  Cc: netdev, stefano.babic, randy.dunlap, kim.xx.lilliestierna,
	christian.bejram, daniel.martensson
In-Reply-To: <1255095571-6501-6-git-send-email-sjur.brandeland@stericsson.com>

On Fri, 09 Oct 2009 15:39:28 +0200 sjur.brandeland@stericsson.com wrote:

> From: Sjur Braendeland <sjur.brandeland@stericsson.com>
> 
> Change-Id: I205c5b3baf1542e1593637ce896d8684870415be
> Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
> ---
>  net/caif/Kconfig            |   61 ++
>  net/caif/Makefile           |   56 ++
>  net/caif/caif_chnlif.c      |  219 +++++++
>  net/caif/caif_chr.c         |  374 ++++++++++++
>  net/caif/caif_config_util.c |  167 ++++++
>  net/caif/chnl_chr.c         | 1393 +++++++++++++++++++++++++++++++++++++++++++
>  net/caif/chnl_net.c         |  492 +++++++++++++++
>  7 files changed, 2762 insertions(+), 0 deletions(-)
>  create mode 100644 net/caif/Kconfig
>  create mode 100644 net/caif/Makefile
>  create mode 100644 net/caif/caif_chnlif.c
>  create mode 100644 net/caif/caif_chr.c
>  create mode 100644 net/caif/caif_config_util.c
>  create mode 100644 net/caif/chnl_chr.c
>  create mode 100644 net/caif/chnl_net.c
> 
> diff --git a/net/caif/Kconfig b/net/caif/Kconfig
> new file mode 100644
> index 0000000..7fb9e9c
> --- /dev/null
> +++ b/net/caif/Kconfig
> @@ -0,0 +1,61 @@
> +#
> +# CAIF net configurations
> +#
> +
> +#menu "Caif Support"
> +comment "CAIF Support"
> +
> +menuconfig CAIF
> +	tristate "Enable Caif support"
> +	default n
> +	---help---
> +	Say Y here if you need to use a phone modem that uses CAIF as transport

	end above with period ('.').

> +	You will also need to say yes to any caif physical devices that your platform
> +	supports.
> +	This can be either built-in or as a loadable module, if you select to build it as module

s/,/;/

> +	the other CAIF also needs to built as modules

	the other CAIF {options or drivers or some other word here} also need  ... modules.
	(end with period)


> +	See Documentation/CAIF for a further explanation on how to use and configure.
> +
> +if CAIF
> +
> +config CAIF_CHARDEV
> +	tristate "CAIF character device"
> +	default CAIF
> +	---help---
> +	Say Y if you will be using the CAIF AT type character devices.
> +	This can be either built-in or as a loadable module,
> +	If you select to build it as a built in then the main caif device must also be a builtin.
> +	If unsure say Y.
> +
> +config CAIF_NETDEV
> +	tristate "CAIF Network device"
> +	default CAIF
> +	---help---
> +	Say Y if you will be using the CAIF based network device.
> +	This can be either built-in or as a loadable module,
> +	If you select to build it as a built in then the main caif device must also be a builtin.
> +	If unsure say Y.
> +
> +
> +config  CAIF_USE_PLAIN
> +	bool  "Use plain buffers instead of SKB in caif"
> +	default n
> +	---help---
> +	Use plain buffer to transport data,

	s/,/./

> +	Select what type of internal buffering CAIF should use,
> +	skb or plain.
> +	If unsure say N hre.
> +
> +config  CAIF_DEBUG
> +	bool "Enable Debug"
> +	default n
> +	--- help ---
> +	Enable the inclusion of debug code in the caif stack,
> +	be aware that doing this will impact performance.
> +	If unsure say N here.
> +
> +# Include physical drivers
> +# source "drivers/net/caif/Kconfig"

Drop the above line.

> +source "drivers/net/caif/Kconfig"
> +endif
> +#endmenu


---
~Randy

^ permalink raw reply

* Re: Real networking namespace
From: Stephen Smalley @ 2009-10-09 16:44 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: linux-security-module, Al Viro, netdev, Paul Moore, James Morris
In-Reply-To: <1255106246.2182.219.camel@moss-pluto.epoch.ncsc.mil>

On Fri, 2009-10-09 at 12:37 -0400, Stephen Smalley wrote:
> On Fri, 2009-10-09 at 08:38 -0700, Stephen Hemminger wrote:
> > The existing networking namespace model is unattractive for what I want,
> > has anyone investigated better alternatives?
> > 
> > I would like to be able to allow access to a network interface and associated objects
> > (routing tables etc), to be controlled by Mandatory Access Control API's.
> > I.e grant access to eth0 and to only certain processes.  Some the issues
> > with the existing models are:
> >   * eth0 and associated objects don't really exist in filesystem so
> >     not subject to LSM style control (SeLinux/SMACK/TOMOYO)
> >   * network namespaces do not allow object to exist in multiple namespaces.
> >     The current model is more restrictive than chroot jails. At least with
> >     chroot, put filesystem objects in multiple jails.
> > 
> > Since one of the first rules of security is "don't reinvent", surely
> > others have dealt with this issue. Any good ideas?
> 
> Is there something that prevents you from using the existing SELinux
> network access controls?  netif is a security class governed by SELinux
> policy, and routing table operations would be covered by the SELinux
> checks on netlink_route_socket.  SELinux uses a combination of LSM hooks
> and netfilter hooks to mediate network operations.

Also, depending on what you want to do, SECMARK may be useful to you.
That allows you to mark packets with security contexts via iptables, and
then use SELinux policy to control their flow.
http://paulmoore.livejournal.com/4281.html
http://james-morris.livejournal.com/11010.html

-- 
Stephen Smalley
National Security Agency


^ permalink raw reply

* Re: Ping Is Broken
From: Rob Townley @ 2009-10-09 16:44 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: CentOS mailing list, Omaha Linux User Group
In-Reply-To: <7e84ed60910090934y2a0d422cr158aa8d15e452f97-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

ping -I is broken

The following deals with a bug in ping that made it very difficult to set up a
system with two gateways.

Demonstration that ping -I is broken: when the source interface is given
to -I as an ethX name and that interface is not the default gateway
interface, ping fails.  When the interface is given as an IP address,
ping works.  Search for "Destination Host Unreachable" to find the bug.


eth0 = 4.3.2.8; the default gateway is reached through a different
interface, eth1.
eth1 = 192.168.168.155 is the device used to reach the default gateway.
FAILS: ping -I eth0 208.67.222.222
WORKS: ping -I 4.3.2.8 208.67.222.222
WORKS: ping -I eth1 208.67.222.222
WORKS: ping -I 192.168.168.155 208.67.222.222

The following are actual results which can be reproduced from an up-to-date
Fedora 11 or CentOS 5.3 box. Caused a very very long episode of frustration
when setting up multi gatewayed systems.


* ping using eth0 *:

ping -c 2 -B -I  eth0 208.67.222.222
PING 208.67.222.222 (208.67.222.222) from 4.3.2.8 eth0: 56(84) bytes of data.
From 4.3.2.8 icmp_seq=1 Destination Host Unreachable
From 4.3.2.8 icmp_seq=2 Destination Host Unreachable

--- 208.67.222.222 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 999ms
, pipe 2

--------------------------------------
The Following all WORK:
* ping using 4.3.2.8 *:

ping -c 2 -B -I  4.3.2.8 208.67.222.222
PING 208.67.222.222 (208.67.222.222) from 4.3.2.8 : 56(84) bytes of data.
64 bytes from 208.67.222.222: icmp_seq=1 ttl=55 time=562 ms
64 bytes from 208.67.222.222: icmp_seq=2 ttl=55 time=642 ms

--- 208.67.222.222 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 562.546/602.400/642.255/39.862 ms


* ping using eth1 *:

ping -c 2 -B -I  eth1 208.67.222.222
PING 208.67.222.222 (208.67.222.222) from 192.168.168.155 eth1: 56(84) bytes of data.
64 bytes from 208.67.222.222: icmp_seq=1 ttl=54 time=270 ms
64 bytes from 208.67.222.222: icmp_seq=2 ttl=54 time=629 ms

--- 208.67.222.222 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 270.128/449.766/629.405/179.639 ms


* ping using 192.168.168.155 *:

ping -c 2 -B -I  192.168.168.155 208.67.222.222
PING 208.67.222.222 (208.67.222.222) from 192.168.168.155 : 56(84) bytes of data.
64 bytes from 208.67.222.222: icmp_seq=1 ttl=54 time=585 ms
64 bytes from 208.67.222.222: icmp_seq=2 ttl=54 time=554 ms

--- 208.67.222.222 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 554.098/569.655/585.212/15.557 ms

My source route policy rules:

/sbin/ip rule show
0:	from all lookup 255
32762:	from 4.3.2.8 lookup nic0
32763:	from 192.168.168.155 lookup nic1
32764:	from 192.168.168.155 lookup nic1
32765:	from 4.3.2.8 lookup nic0
32766:	from all lookup main
32767:	from all lookup default



Print out routing tables using /sbin/ip route show table TABLENAME:
routing table  nic0 :
/sbin/ip route show table nic0
default via 4.3.2.1 dev eth0

routing table  nic1 :
/sbin/ip route show table nic1
default via 192.168.168.1 dev eth1

routing table  main :
/sbin/ip route show table main
4.3.2.1/27 dev eth0  proto kernel  scope link  src 4.3.2.8
192.168.168.0/24 dev eth1  proto kernel  scope link  src 192.168.168.155
169.254.0.0/16 dev eth1  scope link
default via 192.168.168.1 dev eth1

routing table  default :
/sbin/ip route show table default
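
How the rules and tables above are consulted can be sketched with a
small Python model.  This is a simplified illustration, not the kernel's
lookup code and not a diagnosis of the ping failure: rules are tried in
priority order, a "from ADDR" rule applies only when the packet's source
address is ADDR, and within a table the longest matching prefix wins.
The names RULES, TABLES and lookup are hypothetical, and table 255
(local) is modelled as empty.

```python
import ipaddress

# (priority, source selector or None for "from all", table), as listed
# by "ip rule show" above.
RULES = [
    (0, None, "255"),
    (32762, "4.3.2.8", "nic0"),
    (32763, "192.168.168.155", "nic1"),
    (32764, "192.168.168.155", "nic1"),
    (32765, "4.3.2.8", "nic0"),
    (32766, None, "main"),
    (32767, None, "default"),
]

# (prefix, gateway or None for on-link, device) per table, from the
# "ip route show table ..." output above.
TABLES = {
    "255": [],  # assumption: local routes omitted in this sketch
    "nic0": [("0.0.0.0/0", "4.3.2.1", "eth0")],
    "nic1": [("0.0.0.0/0", "192.168.168.1", "eth1")],
    "main": [
        ("4.3.2.1/27", None, "eth0"),
        ("192.168.168.0/24", None, "eth1"),
        ("169.254.0.0/16", None, "eth1"),
        ("0.0.0.0/0", "192.168.168.1", "eth1"),
    ],
    "default": [],
}


def lookup(dst, src=None):
    """Return (table, gateway, device) for dst, or None if unroutable."""
    dst_ip = ipaddress.ip_address(dst)
    for _priority, selector, table in RULES:
        if selector is not None and selector != src:
            continue  # "from ADDR" rule does not apply to this source
        best = None
        for prefix, gateway, dev in TABLES[table]:
            net = ipaddress.ip_network(prefix, strict=False)
            if dst_ip in net and (best is None or net.prefixlen > best[0]):
                best = (net.prefixlen, table, gateway, dev)
        if best is not None:
            return best[1:]
    return None


if __name__ == "__main__":
    print(lookup("208.67.222.222", src="4.3.2.8"))
    print(lookup("208.67.222.222"))
```

With a source address of 4.3.2.8 the model selects table nic0 and the
gateway on eth0; with no source address chosen yet, only the "from all"
rules apply and the default route in table main (via eth1) is used.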




NOTES: cat /etc/iproute2/rt_tables to get your own table names.

ping Maintainer YOSHIFUJI Hideaki / USAGI/WIDE Project
 http://www.skbuff.net/iputils/
Mailing List netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

man ping:
   -I interface address
        Set source address to specified interface address.
        Argument may be *numeric IP address or name of device*.
        When  pinging  IPv6  link-local  address  this option is required.

ping -V returns the latest version available on CentOS and Fedora and on the
maintainer's website:
ping utility, iputils-ss020927

^ permalink raw reply

* Re: Real networking namespace
From: Stephen Smalley @ 2009-10-09 16:37 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: linux-security-module, Al Viro, netdev, Paul Moore, James Morris
In-Reply-To: <20091009083807.16e55b08@nehalam>

On Fri, 2009-10-09 at 08:38 -0700, Stephen Hemminger wrote:
> The existing networking namespace model is unattractive for what I want,
> has anyone investigated better alternatives?
> 
> I would like to be able to allow access to a network interface and associated objects
> (routing tables etc), to be controlled by Mandatory Access Control API's.
> I.e grant access to eth0 and to only certain processes.  Some the issues
> with the existing models are:
>   * eth0 and associated objects don't really exist in filesystem so
>     not subject to LSM style control (SeLinux/SMACK/TOMOYO)
>   * network namespaces do not allow object to exist in multiple namespaces.
>     The current model is more restrictive than chroot jails. At least with
>     chroot, put filesystem objects in multiple jails.
> 
> Since one of the first rules of security is "don't reinvent", surely
> others have dealt with this issue. Any good ideas?

Is there something that prevents you from using the existing SELinux
network access controls?  netif is a security class governed by SELinux
policy, and routing table operations would be covered by the SELinux
checks on netlink_route_socket.  SELinux uses a combination of LSM hooks
and netfilter hooks to mediate network operations.

-- 
Stephen Smalley
National Security Agency


^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Greg KH @ 2009-10-09 16:36 UTC (permalink / raw)
  To: Narendra K; +Cc: netdev, linux-hotplug, matt_domsch, jordan_hargrave
In-Reply-To: <20091009140000.GA18765@mock.linuxdev.us.dell.com>

On Fri, Oct 09, 2009 at 09:00:01AM -0500, Narendra K wrote:
> On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote:
> > > example udev config:
> > > SUBSYSTEM=="net",
> > SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"
> > 
> > work as well.  But coupling the ifindex to the MAC address like this
> > doesn't work.  (In general, coupling any two unrelated attributes when
> > trying to do persistent names doesn't work.)
> > 
> Attaching the latest patch incorporating review comments.
> 
> By creating character devices for every network device, we can use
> udev to maintain alternate naming policies for devices, including
> additional names for the same device, without interfering with the
> name that the kernel assigns a device.
> 
> This is conditionalized on CONFIG_NET_CDEV.  If enabled (the default),
> device nodes will automatically be created in /dev/netdev/ for each
> network device.  (/dev/net/ is already populated by the tun device.)
> 
> These device nodes are not functional at the moment - open() returns
> -ENOSYS.  Their only purpose is to provide userspace with a kernel
> name to ifindex mapping, in a form that udev can easily manage.

How does this patch work with the network namespace functionality?

thanks,

greg k-h

^ permalink raw reply

* Re: Ping Is Broken
From: Rob Townley @ 2009-10-09 16:34 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA; +Cc: CentOS mailing list, Omaha Linux User Group
In-Reply-To: <7e84ed60910090316ne9224fat81d9c79c58fc713b-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 4027 bytes --]

The following deals with a bug in ping that made it very difficult to set up a
system with two gateways.

ping -I is broken

Demonstration that ping -I is broken: when the source interface is given
to -I as an ethX name and that interface is not the default gateway
interface, ping fails.  When the interface is given as an IP address,
ping works.  Search for "Destination Host Unreachable" to find the bug.


eth0 = 4.3.2.8; the default gateway is reached through a different
interface, eth1.
eth1 = 192.168.168.155 is the device used to reach the default gateway.
FAILS: ping -I eth0 208.67.222.222
WORKS: ping -I 4.3.2.8 208.67.222.222
WORKS: ping -I eth1 208.67.222.222
WORKS: ping -I 192.168.168.155 208.67.222.222

The following are actual results which can be reproduced from an up-to-date
Fedora 11 or CentOS 5.3 box. Caused a very very long episode of frustration
when setting up multi gatewayed systems.


* ping using eth0 *:

ping -c 2 -B -I  eth0 208.67.222.222
PING 208.67.222.222 (208.67.222.222) from 4.3.2.8 eth0: 56(84) bytes of data.
From 4.3.2.8 icmp_seq=1 Destination Host Unreachable
From 4.3.2.8 icmp_seq=2 Destination Host Unreachable

--- 208.67.222.222 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 999ms
, pipe 2


* ping using 4.3.2.8 *:

ping -c 2 -B -I  4.3.2.8 208.67.222.222
PING 208.67.222.222 (208.67.222.222) from 4.3.2.8 : 56(84) bytes of data.
64 bytes from 208.67.222.222: icmp_seq=1 ttl=55 time=562 ms
64 bytes from 208.67.222.222: icmp_seq=2 ttl=55 time=642 ms

--- 208.67.222.222 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 562.546/602.400/642.255/39.862 ms


* ping using eth1 *:

ping -c 2 -B -I  eth1 208.67.222.222
PING 208.67.222.222 (208.67.222.222) from 192.168.168.155 eth1: 56(84) bytes of data.
64 bytes from 208.67.222.222: icmp_seq=1 ttl=54 time=270 ms
64 bytes from 208.67.222.222: icmp_seq=2 ttl=54 time=629 ms

--- 208.67.222.222 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 270.128/449.766/629.405/179.639 ms


* ping using 192.168.168.155 *:

ping -c 2 -B -I  192.168.168.155 208.67.222.222
PING 208.67.222.222 (208.67.222.222) from 192.168.168.155 : 56(84) bytes of data.
64 bytes from 208.67.222.222: icmp_seq=1 ttl=54 time=585 ms
64 bytes from 208.67.222.222: icmp_seq=2 ttl=54 time=554 ms

--- 208.67.222.222 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 554.098/569.655/585.212/15.557 ms

My source route policy rules:

/sbin/ip rule show
0:	from all lookup 255
32762:	from 4.3.2.8 lookup nic0
32763:	from 192.168.168.155 lookup nic1
32764:	from 192.168.168.155 lookup nic1
32765:	from 4.3.2.8 lookup nic0
32766:	from all lookup main
32767:	from all lookup default



Print out routing tables using /sbin/ip route show table TABLENAME:
routing table  nic0 :
/sbin/ip route show table nic0
default via 4.3.2.1 dev eth0

routing table  nic1 :
/sbin/ip route show table nic1
default via 192.168.168.1 dev eth1

routing table  main :
/sbin/ip route show table main
4.3.2.1/27 dev eth0  proto kernel  scope link  src 4.3.2.8
192.168.168.0/24 dev eth1  proto kernel  scope link  src 192.168.168.155
169.254.0.0/16 dev eth1  scope link
default via 192.168.168.1 dev eth1

routing table  default :
/sbin/ip route show table default




NOTES: cat /etc/iproute2/rt_tables to get your own table names.

ping Maintainer YOSHIFUJI Hideaki / USAGI/WIDE Project
 http://www.skbuff.net/iputils/
Mailing List netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

man ping:
   -I interface address
        Set source address to specified interface address.
        Argument may be *numeric IP address or name of device*.
        When  pinging  IPv6  link-local  address  this option is required.

ping -V returns the latest version available on CentOS and Fedora and on the
maintainer's website:

ping utility, iputils-ss020927

[-- Attachment #1.2: Type: text/html, Size: 5458 bytes --]

^ permalink raw reply

* Re: [PATCH] net: Add netdev_alloc_skb_ip_align() helper
From: Eric Dumazet @ 2009-10-09 16:31 UTC (permalink / raw)
  To: David Miller; +Cc: thomas, netdev, thierry.reding, nios2-dev, linux-kernel
In-Reply-To: <20091007.224036.247820677.davem@davemloft.net>

David Miller wrote:

> Looks ok, but I want to look at how often this exact sequence
> would match.  If it applies to a lot of cases, I'll add this
> but I know of many exceptions in my head already :-)

Well, it was more as a reference. I believe about 20-30 call sites
could use it. Do you want me to provide a combo patch ?


^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Matt Domsch @ 2009-10-09 16:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Narendra K, netdev, linux-hotplug, jordan_hargrave
In-Reply-To: <20091009091247.0a9b60cb@nehalam>

On Fri, Oct 09, 2009 at 09:12:47AM -0700, Stephen Hemminger wrote:
> On Fri, 9 Oct 2009 11:04:43 -0500
> Narendra K <Narendra_K@dell.com> wrote:
> 
> > By creating character devices for every network device, we can use
> > udev to maintain alternate naming policies for devices, including
> > additional names for the same device, without interfering with the
> > name that the kernel assigns a device.
> > 
> What happens if interface is renamed by either networking API:
>   ip li set dev eth0 name eth-renamed-by-me

udev sees a KOBJ_MOVE uevent.  Today it does not handle these events
at all, but talking with Kay, he believes udev can be extended to
handle that pretty easily.


> or via
>    mv /dev/net/eth0 /dev/net/eth-renamed-by-user

There is no VFS magic today such that this 'mv' will translate into a
device_rename() function inside the kernel.

udev "owns" the /dev/netdev/eth0 device node name.  If a user (root)
does a 'mv', the symlink referents will be broken.  This is no
different than doing so for a disk device or any other udev-managed
device node.  If someone does a
  mv /dev/sda /dev/sda-mybootdisk
and is relying on the /dev/disk/by-label/mybootdisk -> /dev/sda
symlink in some way, the application will fail.

> or if both are done at same time (what is locking model?)

There is no locking model.  udev will serialize the rename events
though, as seen in userspace.

Thanks,
Matt

-- 
Matt Domsch
Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux

^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Bryan Kadzban @ 2009-10-09 16:23 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Narendra K, netdev, linux-hotplug, jordan_hargrave
In-Reply-To: <20091009145137.GD19218@mock.linuxdev.us.dell.com>

[-- Attachment #1: Type: text/plain, Size: 905 bytes --]

Matt Domsch wrote:
> Let me also note that we are prepared to have userspace consumers of 
> this new character device node.
> 
> http://linux.dell.com/wiki/index.php/Oss/libnetdevname
> 
> notes how the kernel patch will interact with udev, describes the new
> library helper function in libnetdevname, and has patches for 
> net-tools, iproute2, and ethtool to make use of the helper function.
> 
> As has been noted here, MAC addresses are not necessarily unique to
> an interface.

Only in the case of e.g. qemu (virtual hardware), I think.  (Or some
kinds of broken hardware.  Anything not on the udev whitelist from
75-persistent-net-generator.rules.)

The combination of (MAC, ifindex) is not unique, which is what I meant
earlier -- but the setup on the wiki seems to handle this properly.
Assuming there was a /dev/net/by-mac/00:01:02:03:04:05 link, it should
work fine...


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 260 bytes --]

^ permalink raw reply

* [PATCH] WAN: fix Cisco HDLC handshaking.
From: Krzysztof Halasa @ 2009-10-09 16:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

WAN: fix Cisco HDLC handshaking.

Cisco HDLC uses keepalive packets and sequence numbers to determine link
state. In rare cases both ends could transmit keepalive packets at the same
time, causing the received sequence numbers to be treated as incorrect.
Now we accept our current sequence number as well as the previous one.

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>

diff --git a/drivers/net/wan/hdlc_cisco.c b/drivers/net/wan/hdlc_cisco.c
index cf5fd17..f1bff98 100644
--- a/drivers/net/wan/hdlc_cisco.c
+++ b/drivers/net/wan/hdlc_cisco.c
@@ -58,8 +58,7 @@ struct cisco_state {
 	spinlock_t lock;
 	unsigned long last_poll;
 	int up;
-	int request_sent;
-	u32 txseq; /* TX sequence number */
+	u32 txseq; /* TX sequence number, 0 = none */
 	u32 rxseq; /* RX sequence number */
 };
 
@@ -163,6 +162,7 @@ static int cisco_rx(struct sk_buff *skb)
 	struct cisco_packet *cisco_data;
 	struct in_device *in_dev;
 	__be32 addr, mask;
+	u32 ack;
 
 	if (skb->len < sizeof(struct hdlc_header))
 		goto rx_error;
@@ -223,8 +223,10 @@ static int cisco_rx(struct sk_buff *skb)
 		case CISCO_KEEPALIVE_REQ:
 			spin_lock(&st->lock);
 			st->rxseq = ntohl(cisco_data->par1);
-			if (st->request_sent &&
-			    ntohl(cisco_data->par2) == st->txseq) {
+			ack = ntohl(cisco_data->par2);
+			if (ack && (ack == st->txseq ||
+				    /* our current REQ may be in transit */
+				    ack == st->txseq - 1)) {
 				st->last_poll = jiffies;
 				if (!st->up) {
 					u32 sec, min, hrs, days;
@@ -275,7 +277,6 @@ static void cisco_timer(unsigned long arg)
 
 	cisco_keepalive_send(dev, CISCO_KEEPALIVE_REQ, htonl(++st->txseq),
 			     htonl(st->rxseq));
-	st->request_sent = 1;
 	spin_unlock(&st->lock);
 
 	st->timer.expires = jiffies + st->settings.interval * HZ;
@@ -293,9 +294,7 @@ static void cisco_start(struct net_device *dev)
 	unsigned long flags;
 
 	spin_lock_irqsave(&st->lock, flags);
-	st->up = 0;
-	st->request_sent = 0;
-	st->txseq = st->rxseq = 0;
+	st->up = st->txseq = st->rxseq = 0;
 	spin_unlock_irqrestore(&st->lock, flags);
 
 	init_timer(&st->timer);
@@ -317,8 +316,7 @@ static void cisco_stop(struct net_device *dev)
 
 	spin_lock_irqsave(&st->lock, flags);
 	netif_dormant_on(dev);
-	st->up = 0;
-	st->request_sent = 0;
+	st->up = st->txseq = 0;
 	spin_unlock_irqrestore(&st->lock, flags);
 }
 

-- 
Krzysztof Halasa
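
The acceptance rule introduced by the patch above can be sketched in
Python as follows.  This is only an illustration of the check in
cisco_rx(), assuming 32-bit unsigned arithmetic as with the u32 fields;
the function name keepalive_ack_ok is hypothetical and not part of the
driver.

```python
U32_MASK = 0xFFFFFFFF  # emulate u32 wrap-around


def keepalive_ack_ok(ack, txseq):
    """Return True if the peer's acknowledged sequence number is accepted.

    ack   -- sequence number the peer reports having received (par2)
    txseq -- our current TX sequence number; 0 means none sent yet
    """
    if ack == 0:
        # Nothing acknowledged yet.
        return False
    previous = (txseq - 1) & U32_MASK
    # Accept the current number, or the previous one in case our
    # current request was still in transit when the peer replied.
    return ack == txseq or ack == previous


if __name__ == "__main__":
    print(keepalive_ack_ok(5, 5))  # peer has seen our latest request
    print(keepalive_ack_ok(4, 5))  # latest request still in transit
    print(keepalive_ack_ok(3, 5))  # too old
```

Treating an acknowledgement of 0 as "none" is what lets the patch drop
the separate request_sent flag.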

^ permalink raw reply related

